Book Review

Review: Transforming research methods in the social sciences: Case studies from South Africa

Book title: Transforming research methods in the social sciences: Case studies from South Africa
Authors: Sumaya Laher, Angelo Fynn, Sherianne Kramer
ISBN: 978-0-6398050-1-6
Publisher: Wits University Press, 2019, R550.00* (hardcover); open access: http://oapen.org/search?identifier=1004359
*Book price at the time of review
Reviewers: Werner de Klerk¹, Elinda Harmse¹
Affiliation: ¹School of Psychosocial Health, Community Psychosocial Research (COMPRES), North-West University, Potchefstroom, South Africa
Corresponding author: Werner de Klerk, 12998699@nwu.ac.za
How to cite this book review: De Klerk, W., & Harmse, E. (2020). Review: Transforming research methods in the social sciences: Case studies from South Africa. African Journal of Psychological Assessment, 2(0), a27. https://doi.org/10.4102/ajopa.v2i0.27
Copyright notice: © 2020. The Authors. Licensee: AOSIS. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The literature indicates that research methods play an important role in the quality of research and in the education of young researchers; however, how these methods are applied is often unclear, which can harm the field of psychology (Scholtz, De Klerk, & De Beer, 2020). The recently published research methodology book by Laher, Fynn and Kramer (2019) was, according to the authors, written in response to:

two fundamental issues facing social sciences in South Africa, namely the active production of knowledge relevant to the South African context and access beyond the sphere of university scholars with subsidised access to scholarly publications.
As such, this book is both intentionally open access and context specific. (Kramer, Fynn, & Laher, 2019, p. 1)

In the light of this statement, as well as the article by Scholtz et al. (2020), we as reviewers had to ask ourselves: did Laher et al. (2019) achieve what they set out to do?

Laher et al. (2019) mention that the field of social science is ever growing and expanding. It can therefore be challenging to keep up with the newest research methods and their development. As the authors note, there have been a number of attempts to create an all-encompassing research book to assist scholars, academics and anyone with an interest in research. Consequently, the authors moved away from the traditional research methods and instead took a more diverse perspective, using different South African case studies as examples. Accordingly, the book consists of three main sections: quantitative (eight chapters), qualitative (nine chapters) and transparadigmatic (six chapters).

Transforming Research Methods in the Social Sciences is a research methodology book, written by numerous experts, that focuses on research in the South African context. It provides techniques, procedures and research methods that can be applied to context-bound issues and that are relevant to the following fields: psychology, sociology, ethnography and anthropology. Besides the unique combination of theoretical and application issues, each chapter provides the reader with ethical considerations and issues that one may encounter when using that particular method within the South African context. In the words of Fynn, Kramer and Laher (2019, p. xi), ‘this book is simply the introductory chapter to, we hope, a larger body of work that will systematically transform how social science research is conducted within the global South’.
Taking this statement into consideration, the reviewers recommend that future editions of Transforming Research Methods in the Social Sciences also include more traditional research methods as case studies within the South African context, as there are many examples of these. Further reading suggestions include the research articles by Scholtz et al. (2020), Coetzee and Van Zyl (2014) and O’Neil and Koekemoer (2016), which provide a broad overview of the use of research methods internationally and nationally within the psychology research context. Other research books such as Doing Social Research (Wagner, Kawulich, & Garner, 2012), Research at Grass Roots (De Vos, Strydom, Fouché, & Delport, 2011), Doing Research in the Real World (Gray, 2014), First Steps in Research (Maree, 2016), The Practice of Social Research (Babbie & Mouton, 2001), and Research in Practice (Terre Blanche, Durrheim, & Painter, 2006) are also recommended as good reads for students, academics and researchers in practice.

In conclusion, the reviewers are of the opinion that Transforming Research Methods in the Social Sciences is highly relevant to the current South African research context. Students, academics, scholars and future researchers will benefit from reading this work, as it is broad in scope and offers a variety of relevant theoretical and applied knowledge. The reviewers acknowledge that the book does not claim to hold the ultimate knowledge or answers; however, it can be regarded as an important methodological guide that will set the tone and the bar for future research and findings in the social sciences.

References

Babbie, E., & Mouton, J. (2001). The practice of social research (South African edn.). Cape Town: Oxford University Press Southern Africa.
Coetzee, M., & Van Zyl, L.E. (2014). A review of a decade’s scholarly publications (2004–2013) in the South African Journal of Industrial Psychology. South African Journal of Industrial Psychology, 40(1), 1–16. https://doi.org/10.4102/sajip.v40i1.1227
De Vos, A.S., Strydom, H., Fouché, C., & Delport, C.S.L. (2011). Research at grass roots (4th edn.). Pretoria: Van Schaik Publishers.
Fynn, A., Kramer, S., & Laher, S. (2019). Preface. In S. Laher, A. Fynn, & S. Kramer (Eds.), Transforming research methods in the social sciences: Case studies from South Africa (pp. xi–xii). Johannesburg: Wits University Press. https://doi.org/10.18772/22019032750.4
Gray, D.E. (2014). Doing research in the real world (3rd edn.). London: Sage.
Kramer, S., Fynn, A., & Laher, S. (2019). Research as practice: Contextualising applied research in the South African context. In S. Laher, A. Fynn, & S. Kramer (Eds.), Transforming research methods in the social sciences: Case studies from South Africa (pp. 1–18). Johannesburg: Wits University Press. https://doi.org/10.18772/22019032750.6
Laher, S., Fynn, A., & Kramer, S. (2019). Transforming research methods in the social sciences: Case studies from South Africa. Johannesburg: Wits University Press. https://doi.org/10.18772/22019032750
Maree, K. (2016). First steps in research (2nd edn.). Pretoria: Van Schaik Publishers.
O’Neil, S., & Koekemoer, E. (2016). Two decades of qualitative research in psychology, industrial and organisational psychology and human resource management within South Africa: A critical review. South African Journal of Industrial Psychology, 42(1), 1–16. https://doi.org/10.4102/sajip.v42i1.1350
Scholtz, S.E., De Klerk, W., & De Beer, L.T. (2020). The use of research methods in psychological research: A systematised review. Frontiers in Research Metrics and Analytics, 5(1), 1–17. https://doi.org/10.3389/frma.2020.00001
Terre Blanche, M., Durrheim, K., & Painter, D. (2006). Research in practice: Applied methods for the social sciences (2nd rev. edn.). Cape Town: University of Cape Town Press.
Wagner, C., Kawulich, B., & Garner, M. (2012). Doing social research: A global context. Berkshire: McGraw-Hill Higher Education.

http://www.ajopa.org Open Access

Reviewer Acknowledgement

Acknowledgement to reviewers

In an effort to facilitate the selection of appropriate peer reviewers for the African Journal of Psychological Assessment, we ask that you take a moment to update your electronic portfolio on https://ajopa.org for our files, allowing us better access to your areas of interest and expertise, in order to match reviewers with submitted manuscripts.

If you would like to become a reviewer, please visit the journal website and register as a user. In order to be considered, please email submissions@ajopa.org indicating your intention to register as a reviewer for the journal.

To access your details on the website, you will need to follow these steps:
1. Log into the online journal at https://ajopa.org
2. In your ‘User Home’ [https://ajopa.org/index.php/ajopa/user] select ‘Edit My Profile’ under the heading ‘My Account’ and insert all relevant details, bio statement and reviewing interest(s).
3. It is good practice as a reviewer to update your personal details regularly to ensure contact with you throughout your professional term as reviewer to African Journal of Psychological Assessment.

Please do not hesitate to contact us if you require assistance in performing this task.
Publisher: publishing@aosis.co.za
Tel: +27 21 975 2602

The editorial team of the African Journal of Psychological Assessment recognises the value and importance of the peer reviewer in the overall publication process – not only in shaping the individual manuscript, but also in shaping the credibility and reputation of our journal. We are committed to the timely publication of all original, innovative contributions submitted for publication.
As such, the identification and selection of reviewers who have expertise and interest in the topics appropriate to each manuscript are essential elements in ensuring a timely, productive peer review process.

We would like to take this opportunity to thank and recognise the following reviewers for their precious time and dedication, regardless of whether the papers they reviewed were finally published. We apologise for any names that have been inadvertently left out. These individuals provided their services to the journal as a reviewer from 01 October 2021 to 30 September 2022.

Alban Burke, Aline Ferreira-Correia, Andries Masenge, Annelies Cramer, Brandon Morgan, Brian O’Connor, Casper J. van Zyl, Celeste M. Combrinck, Charles H. van Wijk, Crystal Clack, David Bischof, Dragos Iliescu, Elisabeth A. Kirkbride, Feziwe Mpondo, Kamleshie Mohangi, Kurt F. Geisinger, Leila Abdool-Gafoor, Leon T. de Beer, Mandy Wigdorowitz, Petrus Nel, Petrus C. Bester, Sadaf A. Milani, Sharon Truter, Simangele Mayisela, Sizwe Zondo, Stephen Stark, Tarique Variava, Tyrone B. Pretorius, Werner de Klerk

Table of Contents

African Journal of Psychological Assessment, Vol 1 (2019)
ISSN: 2707-1618 (print) | ISSN: 2617-2798 (online)

Editorial
Editorial: Psychological assessment in Africa: The time is now!
Sumaya Laher
African Journal of Psychological Assessment | Vol 1 | a11 | 27 March 2019

Original Research
The Five-Factor Model and individualism and collectivism in South Africa: Implications for personality assessment
Sumaya Laher, Safia Dockrat
African Journal of Psychological Assessment | Vol 1 | a4 | 28 March 2019

Original Research
Measuring cognitive emotion regulation in South Africa using the Cognitive Emotion Regulation Questionnaire-Short Form
Itai Propheta, Casper J.J. van Zyl
African Journal of Psychological Assessment | Vol 1 | a9 | 18 April 2019

Original Research
Methodological rigour and coherence in the construction of instruments: The Emotional Social Screening Tool for School Readiness
Erica Munnik, Mario R. Smith
African Journal of Psychological Assessment | Vol 1 | a2 | 24 June 2019

Original Research
Generalised anxiety disorder in adolescents in Ghana: Examination of the psychometric properties of the Generalised Anxiety Disorder-7 scale
Samuel Adjorlolo
African Journal of Psychological Assessment | Vol 1 | a10 | 18 July 2019

Original Research
A brief sailor resiliency scale for the South African Navy
Charles H. van Wijk, Jarred H. Martin
African Journal of Psychological Assessment | Vol 1 | a12 | 17 October 2019

Original Research
The Boston Naming Test-South African Short Form, Part I: Psychometric properties in a group of healthy English-speaking university students
Kevin G.F. Thomas, Lauren Baerecke, Chen Y. Pan, Helen L. Ferrett
African Journal of Psychological Assessment | Vol 1 | a15 | 22 November 2019

Reviewer Acknowledgement
African Journal of Psychological Assessment | Vol 1 | a21 | 12 December 2019

Reviewer Acknowledgement
Acknowledgement to reviewers

We would like to take this opportunity to thank all reviewers who participated in shaping this volume of the African Journal of Psychological Assessment. We appreciate the time taken to perform your review(s) successfully.
Aline Ferreira-Correia, Brandon Morgan, Cas H. Prinsloo, Casper van Zyl, Celeste M. Combrinck, David Bischof, Fatima Peters, Ingrid Opperman, Kate Cockcroft, Kgope Moalusi, Nabeelah Bemath, Nafisa Cassimjee, Neo Pule, Nicola Taylor, Prudence Mdletshe, Safiyyah Pahad, Sharon Truter

Table of Contents

African Journal of Psychological Assessment, Vol 2 (2020)
ISSN: 2707-1618 (print) | ISSN: 2617-2798 (online)

Editorial
Ignorance is not an excuse – irresponsible neurocognitive test use highlights the need for appropriate training
Kate Cockcroft
African Journal of Psychological Assessment | Vol 2 | a28 | 21 July 2020

Original Research
The need for contextually appropriate career counselling assessment: Using narrative approaches in career counselling assessment in African contexts
Kobus Maree
African Journal of Psychological Assessment | Vol 2 | a18 | 03 March 2020

Original Research
A psychometric evaluation of the 17-itemed Utrecht Work Engagement Scale in Uganda
Ibrahim A. Musenze, Thomas S. Mayende
African Journal of Psychological Assessment | Vol 2 | a8 | 29 January 2020

Original Research
The impact of different time limits and test versions on reliability in South Africa
Danille E. Arendse
African Journal of Psychological Assessment | Vol 2 | a14 | 03 March 2020

Original Research
Standardising the Single and Double Letter Cancellation Test for South African military personnel
Chevon P. Haarhoff, Christi Gadd, Boshadi Semenya, René van Eeden
African Journal of Psychological Assessment | Vol 2 | a19 | 08 June 2020

Original Research
Time limits and English proficiency tests: Predicting academic performance
Ingrid Opperman
African Journal of Psychological Assessment | Vol 2 | a20 | 25 June 2020

Original Research
Examining the internal structure of the Executive Functioning Inventory amongst South African students
Candice Britz, Casper J.J. van Zyl
African Journal of Psychological Assessment | Vol 2 | a26 | 21 September 2020

Original Research
Beyond factor analysis: Insights into the dimensionality of the Fortitude Questionnaire through bifactor statistical analysis
Tyrone B. Pretorius, Anita Padmanabhanunni
African Journal of Psychological Assessment | Vol 2 | a30 | 20 October 2020

Review Article
Relevance of the person-environment fit approach to career assessment in South Africa – a review
Nabeelah Bemath
African Journal of Psychological Assessment | Vol 2 | a22 | 18 June 2020

Review Article
Gamification in psychological assessment in South Africa: A narrative review
Yaseerah Akoodie
African Journal of Psychological Assessment | Vol 2 | a24 | 10 September 2020

Book Review
Review: Transforming research methods in the social sciences: Case studies from South Africa
Werner de Klerk, Elinda Harmse
African Journal of Psychological Assessment | Vol 2 | a27 | 11 June 2020

Reviewer Acknowledgement
African Journal of Psychological Assessment | Vol 2 | a43 | 21 December 2020

Reviewer Acknowledgement

Acknowledgement to reviewers

We would like to take this opportunity to thank all reviewers who participated in shaping this volume of the African Journal of Psychological Assessment. We appreciate the time taken to perform your review(s) successfully.

Adri Vorster, Celeste M. Combrinck, Cheryl Foxcroft, Ghouwa Ismail, Hamsa Venkatakrishnan, Ingrid Opperman, James Takalani, Kate Cockcroft, Malose Makhubela, Maria A.
Florence, Nafisa Cassimjee, Regis Chireshe, René van Eeden, Safiyyah Pahad, Samantha Adams, Samuel Adjorlolo, Solomon Mashegoane

Table of Contents

African Journal of Psychological Assessment, Vol 4 (2022)
ISSN: 2707-1618 (print) | ISSN: 2617-2798 (online)

Editorial
Looking inward: Reflections on the African Journal of Psychological Assessment and the way forward
Sumaya Laher
African Journal of Psychological Assessment | Vol 4 | a132 | 15 December 2022

Original Research
The applicability of the UCLA Loneliness Scale in South Africa: Factor structure and dimensionality
Tyrone B. Pretorius
African Journal of Psychological Assessment | Vol 4 | a63 | 10 January 2022

Original Research
Factor structure of the Dispositional Hope Scale amongst South Africans: An exploratory structural equation modelling study
Itumeleng P. Khumalo, Tharina Guse
African Journal of Psychological Assessment | Vol 4 | a66 | 31 January 2022

Original Research
Montreal Cognitive Assessment: Exploring the impact of demographic variables, internal consistency reliability and discriminant validity in a South African sample
Elisabeth Kirkbride, Aline Ferreira-Correia, Mlinganisi Sibandze
African Journal of Psychological Assessment | Vol 4 | a73 | 24 February 2022

Original Research
Psychometric description of the Life Orientation Test-Revised in a South African sample: A pilot study
Charles H. van Wijk
African Journal of Psychological Assessment | Vol 4 | a51 | 25 February 2022

Original Research
Measuring the Big Five personality factors in South African adolescents: Psychometric properties of the Basic Traits Inventory
Gideon P. de Bruin, Nicola Taylor, Șerban A. Zanfirescu
African Journal of Psychological Assessment | Vol 4 | a85 | 31 March 2022

Original Research
Reliability, minimum detectable change and sociodemographic biases of selected neuropsychological tests among people living with HIV in south-eastern Nigeria
Martins C. Nweke, Nalini Govender, Aderonke Akinpelu, Adesola Ogunniyi, Nombeko Mshunqane
African Journal of Psychological Assessment | Vol 4 | a84 | 28 April 2022

Original Research
Preliminary normative data for the Hooper Visual Organization Test for a South African sample
Saleha Mahomed-Kola, Aline Ferreira-Correia, Casper J.J. van Zyl
African Journal of Psychological Assessment | Vol 4 | a64 | 30 May 2022

Original Research
Investigating the validity of the short form Burnout Assessment Tool: A job demands-resources approach
Leon T. de Beer, Wilmar B. Schaufeli, Arnold B. Bakker
African Journal of Psychological Assessment | Vol 4 | a95 | 09 June 2022

Original Research
Assessing the cognitive component of subjective well-being: Revisiting the Satisfaction with Life Scale with classical test theory and item response theory
Tyrone B. Pretorius, Anita Padmanabhanunni
African Journal of Psychological Assessment | Vol 4 | a106 | 19 July 2022

Original Research
The Sensory Classroom Teacher Questionnaire: A tool for assessing conducive classroom conditions for children with ADHD
Hannelie du Preez, Celeste-Marié Combrinck
African Journal of Psychological Assessment | Vol 4 | a107 | 30 August 2022

Original Research
The development of the Quality of Translation and Linguistic Equivalence Checklist
Mario R. Smith, Nuraan Adams, Erica Munnik
African Journal of Psychological Assessment | Vol 4 | a108 | 26 October 2022

Original Research
Psychometric properties of the Brief Sailor Resiliency Scale in the South African Army
David J. Schoeman, Nafisa Cassimjee
African Journal of Psychological Assessment | Vol 4 | a100 | 26 October 2022

Original Research
The Molteno Adapted Scale: A child development screening tool for healthcare settings
Priscilla E. Springer, Barbara Laughton, Tonya M. Esterhuizen, Amy L. Slogrove, Mariana Kruger
African Journal of Psychological Assessment | Vol 4 | a92 | 04 November 2022

Reviewer Acknowledgement
African Journal of Psychological Assessment | Vol 4 | a124 | 14 December 2022

Editorial

Ignorance is not an excuse – irresponsible neurocognitive test use highlights the need for appropriate training

Kate Cockcroft

About the author(s): Kate Cockcroft, Department of Psychology, School of Human and Community Development, University of the Witwatersrand, Johannesburg, South Africa
Citation: Cockcroft, K. (2020). Ignorance is not an excuse – irresponsible neurocognitive test use highlights the need for appropriate training. African Journal of Psychological Assessment, 2(0), a28. https://doi.org/10.4102/ajopa.v2i0.28
Published: 21 July 2020
Copyright: © 2020. The Author(s). Licensee: AOSIS. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Neurocognitive assessment is a complex endeavour. Its focus is traditionally on brain–behaviour relationships and how injury or disease may impact the cognitive, emotional, physical, sensorimotor and adaptive abilities of the individual (Vanderploeg, 2000). The main aim of such an assessment is typically the identification of impairment for the purposes of rehabilitation, therapy and treatment, and research into neurocognitive assessment is based on the intention to advance this aim.
In Africa, possibly the greatest challenge for those conducting neurocognitive assessment and related research is the linguistic, cultural, educational and socioeconomic diversity and complexity of our populations, and how to ethically and fairly evaluate normative and non-normative behaviour in such contexts. Increasingly, even in Western, educated, industrialised, rich and democratic (WEIRD; Henrich, Heine, & Norenzayan, 2010) contexts, the identification of neurocognitive impairment is somewhat nebulous and may be operationalised in different ways by different practitioners and researchers.

Researchers and practitioners emphasise that neurocognitive test scores should always be evaluated within the broader context of a person’s sociocultural, educational, psychological, linguistic and occupational history and are aware that interpretation of test results extends beyond the neurocognitive evaluation of the individual (Lezak, Howieson, & Loring, 2004). Similarly, researchers and practitioners understand that neurocognitive tests need to be examined critically in terms of their validity as measures of neurocognition for the person with whom they are being used. Testing (for either diagnostic or research purposes) without a critical consideration of the instruments used and the individuals they are used with will result in unethical, culturally insensitive use and interpretation of neurocognitive measures. Africa and its people have long been the recipients of such uncritical testing, and one would anticipate that there would be greater sensitivity in the use of such tests by now (Laher & Cockcroft, 2014).

However, the consequences of such practice have been felt recently. Nieuwoudt, Dickie, Coetsee, Engelbrecht and Terblanche (2019, p. 1) published an article in the journal Aging, Neuropsychology, and Cognition (since retracted by the journal), in which they used the Montreal Cognitive Assessment (MoCA), a screening test for cognitive impairment, with a South African sample in order to argue that coloured¹ women had ‘low cognitive function’. Despite evidence that the psychometric properties of the MoCA vary across different countries and demographic variables, with performance highly sensitive to age and level of education (Ashworth et al., 2014), Nieuwoudt et al. (2019) used this instrument to make flawed, racist and reckless generalisations. See Hendricks, Kramer and Ratele (2019) for a detailed critique of this research.

Another article (Asongu & Kodila-Tedika, 2019, p. 1) argued that African countries whose citizens have higher intellectual ability were ‘more likely to experience lower levels of slave exports … probably due to comparatively better capacities to organize, co-operate, oversee and confront slave traders’. In this article, intelligence is measured by means of the historic intelligence quotient (IQ), defined as the ‘national average intelligence quotients of populations, including estimates of indigenous populations for the colonized countries’ (Asongu & Kodila-Tedika, 2019, p. 4). The questionable analyses in this paper aside, the very assumption on which the measurement and use of IQ are based ignores a long, shameful history of the abuse of IQ tests to exclude and control marginalised communities and is blind to the issues of construct validity and bias. It fails to consider acculturation variables, such as language usage, test-wiseness (test-taking skill, motivation and perceptions of test face validity), socio-economic status (SES), home and school environments and level and quality of education, which are widely known to be key factors in considering a person’s IQ test performance (Cockcroft, Alloway, Copello, & Milligan, 2015; Shuttleworth-Edwards et al., 2004).
Such unethical use and interpretation of neurocognitive tests and their latent constructs stems partly from the fact that the people using these tests are not trained in them (researchers from a department of sport science in the instance of the first article and from a business school in the second). Practitioners and researchers are morally culpable if they ignore the limitations of using and interpreting neurocognitive data obtained from tests that are used in Africa but were standardised in global North contexts (Watts & Shuttleworth-Edwards, 2016). It is our role, as practitioners and researchers in Africa, not to allow such problematic research and to make a positive contribution to the body of knowledge through sound and ethical practices.

Ensuring such practice is difficult given that many tests of neurocognition are openly available. Training in psychometrics and the appropriate use of psychological assessment instruments is generally only included in postgraduate programmes in psychology. However, many other disciplines undertake research that employs measurement of neurocognitive abilities, whether in the form of self-report questionnaires, scales or standardised assessments. The advent of digital neuropsychology means that such tests are increasingly available in open-source and digital formats, leading to a high probability of misuse. If we are serious about ethical and responsible research that is sensitive to our context and our people, all research methods training should incorporate an introduction to psychometric principles and practices so that researchers can use open-access tests ethically. Such introductory courses can, for example, demonstrate how tests of neurocognition hold cross-cultural biases, and that these are most evident (though not exclusively so) in tasks that tap crystallised, long-term learning, irrespective of whether the format is verbal or non-verbal.
certain measures tapping fluid processing (such as processing speed) also appear to hold cultural and experiential biases (cockcroft et al., 2015; shuttleworth-edwards et al., 2004). training that emphasises the challenges in measuring and identifying individual differences in neurocognition will go a long way towards ensuring sensitive and responsible assessment, as well as rigorous research that strengthens the credibility of the field. references asongu, s., & kodila-tedika, o. (2019). intelligence and slave exports from africa. journal of interdisciplinary economics, 31, 1–15. https://doi.org/10.1177/0260107919829963 ashworth, b., dilks, l., hutchinson, k., hayes, s., moore, m., orozoco, a., … barnett, o. (2014). a pilot study of age and education norms for the montreal cognitive assessment. archives of clinical neuropsychology, 29(6), 527–528. https://doi.org/10.1093/arclin/acu038.67 cockcroft, k., alloway, t., copello, e., & milligan, r. (2015). a cross-cultural comparison between south african and british students on the wechsler adult intelligence scales third edition (wais-iii). frontiers in psychology, 6, 1–11. https://doi.org/10.3389/fpsyg.2015.00297 hendricks, l., kramer, s., & ratele, k. (2019). research shouldn’t be a dirty thought, but race is a problematic construct. south african journal of psychology, 49(3), 308–311. https://doi.org/10.1177/0081246319852548 heinrich, j., heine, s.j., & norenzayan, a. (2010). the weirdest people in the world? behavioral and brain sciences, 33(2–3), 61–83. https://doi.org/10.1017/s0140525x0999152x laher, s., & cockcroft, k. (2014). psychological assessment in post-apartheid south africa: the way forward. south african journal of psychology, 44(3), 303–314. https://doi.org/10.1177/0081246314533634 lezak, m.d., howieson, d.b., & loring, d.w. (2004). neuropsychological assessment (4th edn.). new york, ny: oxford university press. nieuwoudt, s., dickie, k.e., coetsee, c., engelbrecht, l., & terblanche, e. (2019). 
retracted article: age-and education-related effects on cognitive functioning in colored south african women. aging, neuropsychology, and cognition, 27(3), 1–17. https://doi.org/10.1080/13825585.2019.1598538 shuttleworth-edwards, a.b., kemp, r.d., rust, a.l., muirhead, j.g.l., hartman, n.p., & radloff, s.e. (2004). cross-cultural effects on iq test performance: a review and preliminary normative indications on wais-iii test performance. journal of clinical and experimental neuropsychology, 26(7), 903–920. https://doi.org/10.1080/13803390490510824 vanderploeg, r. (2000). clinician’s guide to neuropsychological assessment (2nd edn.). mahwah, nj: lawrence erlbaum associates. watts, a.d., & shuttleworth-edwards, a.b. (2016). neuropsychology in south africa: confronting the challenges of specialist practice in a culturally diverse developing country. the clinical neuropsychologist, 30(8), 1–20. https://doi.org/10.1080/13854046.2016.1212098 footnotes 1. ‘coloured’ refers to the official south african government racial classification, as used by nieuwoudt, dickie, coetsee, engelbrecht and terblanche (2019, p. 1). references about the author(s) sumaya laher department of psychology, faculty of humanities, university of the witwatersrand, johannesburg, south africa citation laher, s. (2022). looking inward: reflections on the african journal of psychological assessment and the way forward. african journal of psychological assessment, 4(0), a132. https://doi.org/10.4102/ajopa.v4i0.132 editorial looking inward: reflections on the african journal of psychological assessment and the way forward sumaya laher copyright: © 2022. the author(s). licensee: aosis. this is an open access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. 
three and a half years ago, i had the opportunity to write the inaugural editorial for the african journal of psychological assessment (ajopa). at that time, the editors, editorial board and the psychological society of south africa (psyssa) shared a vision about creating a platform to collate the disparate research being conducted in psychological assessment as well as open up opportunities for collaboration and indigenous knowledge production. this vision is slowly being realised as ajopa grows its contributions with every issue. the most recent ajopa report indicates that the journal has tripled its user count since 2019 with a large percentage of users visiting more than one page on the journal site. submissions to the journal have also increased with the journal publishing relevant and contemporary research consistently. what is even more encouraging are the download statistics and citation counts for the articles in the journal that continue to increase exponentially each year. these are strong empirical indicators that the research published in ajopa is actively contributing to the field. the 2022 issue of ajopa provides contributions broadly located within the educational, organisational, neuropsychological and clinical areas of psychological assessment, all of which present rigorous research interrogating the relevance of assessment tools for the south african, african and global south contexts. the health and wellbeing of young adults is a core concern particularly post the pandemic and the many lockdowns. in this fourth volume of the journal, pretorius (2022) discusses the validity of a loneliness scale to be used among emerging adults in the western cape, south africa finding that the university of california los angeles (ucla) loneliness scale does have some merits for the south african context. de bruin et al. 
(2022) found support for the utility of the locally developed basic traits inventory as a measure of personality in adolescents in south africa. similarly, khumalo and guse (2022) explored the psychometric properties of the dispositional hope scale (dhs) among university students in south africa, arguing that hope is associated with several positive mental health outcomes for students and that a scale like the dhs could be used in interventions to enhance hope and consequently improve mental health for students. interestingly, the study did not find support for the factor structure of the dhs and the model of hope on which it is based. the authors acknowledge that this could be because of measurement instability but raise an important question around the conceptualisation of hope in the south african context making reference to the relevance of a cross culturally transported theoretical construct (p. 7). there is an urgent need to screen and assess children for developmental delay in both clinical and research contexts. in this volume, springer et al. (2022) report on the use of the molteno adapted scale, a developmental screening tool for children under 5 years that assesses language skills, personal and social development as well as fine and gross motor domains. the results suggest that this tool only be used by trained professionals as the evidence for its validity in younger children was variable. du preez and combrinck (2022) describe the development and validation of the sensory classroom teacher questionnaire (sctq) concluding that sctq is useful to ‘gauge and plan effecting attention (co)regulation, learning space design and sensory modulation and synergy for the child diagnosed with adhd’. however, the authors acknowledge that their studies were conducted on learners from well-resourced schools. hence, further research is needed to determine the sctq’s utility for the broader population in south africa. 
other articles in this volume consider the use of assessments with individuals living with human immunodeficiency virus (hiv). human immunodeficiency virus-associated neurocognitive disorder (hand) is prevalent across sub-saharan africa. there is therefore a huge need for assessments for people living with hiv, which will facilitate accurate classification of hand especially in low- and middle-income countries (lmic) where sophisticated laboratory and neuroimaging techniques are not easily accessible. nweke et al. (2022) explored the psychometric properties of the hopkins verbal learning test-revised (hvlt-r), controlled oral word association test (cowat), trail making test-a (tmt-a) and -b (tmt-b), digit span test-forward (dst-f) and -backward (dst-b) in a sample of 60 people living with hiv in nigeria. they found support for the use of tmt-a, dst-f and dst-b, but the tmt-b, hvlt-r and cowat exhibited at least one form of sociodemographic bias. in south africa, kirkbride et al. (2022) explored the psychometric properties of the montreal cognitive assessment (moca) in a sample of people living with hiv as well as a control group. kirkbride et al. (2022) found that the moca had limited utility as a screening or diagnostic tool for cognitive impairment. they discuss issues related to methodological choices as well as contextual factors in explaining the lack of discriminant validity as well as the overall applicability of the moca in detecting cognitive impairment in south africans. other articles in this volume advocate for instruments in the neuropsychological, educational and organisational psychology fields. mahomed-kola et al.
(2022) found support for the use of the hooper visual organization test (hvot) in a sample of clinical and non-clinical individuals, and they provide preliminary stratified normative data for south african adults who do not speak english as a first language and who attended public schools, arguing that the provision of such normative data is vital for clinicians working in the area. van wijk (2022) found support for the bidimensional structure of the life orientation test-revised (lot-r) in a south african sample and provided preliminary normative data based on a sample of workers against which individual scores can be interpreted. similarly, pretorius and padmanabhanunni (2022) found support for the use of the satisfaction with life scale in a sample of south african school teachers. de beer et al. (2022) argue for the necessity of a locally validated burnout tool especially as job stress has been compounded during the pandemic. they found support for the use of the burnout assessment tool (bat-12) in a sample of south african employees. the bat-12 is also available as an open access, online application for employees to estimate their risk of burnout. schoeman and cassimjee (2022) report on the efficacy of the brief sailor resiliency scale (bsrs) for use with the south african army. their findings concur with those of van wijk and martin (2019) who found the scale to be an effective measure of resilience in members of the south african navy. the evidence in support of using the bsrs to measure resilience in military personnel as well as the support for the use of the bat-12 is important as this allows for appropriate screening of personnel to identify individuals for further assessment and targeted intervention by appropriate support providers.
the final article in this volume provides evidence in favour of using the quality of translation and linguistic equivalence checklist (qtlc) arguing for the necessity of an instrument to ensure that test translations are rigorous and can be reliably used in south africa as well as across africa. the diversity of research across this and the other volumes in ajopa speaks to the broad scale applicability of psychological assessment within all the sub disciplines of psychology. however, the collection of research in this volume highlights once again that the ‘transport and test’ approach is not viable. it is necessary for research to establish the equivalence and contextual relevance of assessment instruments interrogating the theories on which these assessments are based as well as the ways in which the constructs being measured are understood. it is encouraging to note the number of individuals developing scales for use in the local context. research into the use of instruments that are freely available must also be encouraged given the resource difficulties faced across the continent. psychological assessment is an active and continuously growing field of research and practice in africa. however, this work is not adequately reflected in ajopa with a majority of the articles stemming from south africa. given that the journal is only in its fourth issue and has its origins at the psyssa, the dominance of south african research was expected. however, going forward, the editorial board would like to strongly encourage submissions from across the continent. as ajopa is not yet accredited by scopus or web of science, it may be less attractive to researchers who wish to publish in journals listed in international databases. 
what is encouraging is ajopa’s inclusion in the directory of open access journals and subsequently in the south african department of higher education and training database (a year sooner than anticipated, bearing testament to the increasing recognition of the scholarship in the journal). another possible detracting factor for publication in ajopa might be the introduction of article processing charges (apcs). smith et al. (2022b) analysed articles published open access in the elsevier system and found that authors in the global south were indeed underrepresented in journals charging apcs and concluded that apcs are a barrier to open access publication for these researchers. this is hugely problematic as it is evident from the literature that publishing research as open access allows authors to get more online views, have higher download rates and ultimately receive more citations. a number of journals that offer this route also have higher impact factors so that publishing in these journals increases the scientific standing of the researcher (wang et al., 2015; nabyonga-orem et al., 2020). hence, there is a need for more affordable open access publishing options. currently, local open access journals like ajopa offer this, but there is a need for further conversation on the best models of publication that are accessible, inclusive and affordable.
references de beer, l.t., schaufeli, w.b., & bakker, a.b. (2022). investigating the validity of the short form burnout assessment tool: a job demands-resources approach. african journal of psychological assessment, 4, a95. https://doi.org/10.4102/ajopa.v4i0.95 de bruin, g.p., taylor, n., & zanfirescu, ș.a. (2022). measuring the big five personality factors in south african adolescents: psychometric properties of the basic traits inventory. african journal of psychological assessment, 4, a85. https://doi.org/10.4102/ajopa.v4i0.85 du preez, h., & combrinck, c.-m. (2022).
the sensory classroom teacher questionnaire: a tool for assessing conducive classroom conditions for children with adhd. african journal of psychological assessment, 4, a107. https://doi.org/10.4102/ajopa.v4i0.107 khumalo, i., & guse, t. (2022). factor structure of the dispositional hope scale amongst south africans: an exploratory structural equation modelling study. african journal of psychological assessment, 4, 1–9. https://doi.org/10.4102/ajopa.v4i0.66 kirkbride, e., ferreira-correia, a., & sibandze, m. (2022). montreal cognitive assessment: exploring the impact of demographic variables, internal consistency reliability and discriminant validity in a south african sample. african journal of psychological assessment, 4, a73. https://doi.org/10.4102/ajopa.v4i0.73 mahomed-kola, s., ferreira-correia, a., & van zyl, c.j.j. (2022). preliminary normative data for the hooper visual organization test for a south african sample. african journal of psychological assessment, 4, a64. https://doi.org/10.4102/ajopa.v4i0.64 nweke, m.c., govender, n., akinpelu, a., ogunniyi, a., & mshunqane, n. (2022). reliability, minimum detectable change and sociodemographic biases of selected neuropsychological tests among people living with hiv in south-eastern nigeria. african journal of psychological assessment, 4, a84. https://doi.org/10.4102/ajopa.v4i0.84 nabyonga-orem, j., asamani, j.a., nyirenda, t., & abimbola, s. (2020). article processing charges are stalling the progress of african researchers: a call for urgent reforms. bmj global health, 5(9), e003650. https://doi.org/10.1136/bmjgh-2020-003650 pretorius, t. (2022). the applicability of the ucla loneliness scale in south africa: factor structure and dimensionality. african journal of psychological assessment, 4, 1–8. https://doi.org/10.4102/ajopa.v4i0.63 pretorius, t.b., & padmanabhanunni, a. (2022).
assessing the cognitive component of subjective well-being: revisiting the satisfaction with life scale with classical test theory and item response theory. african journal of psychological assessment, 4, a106. https://doi.org/10.4102/ajopa.v4i0.106 schoeman, d., & cassimjee, n. (2022). psychometric properties of the brief sailor resiliency scale in the south african army. african journal of psychological assessment, 4, 1–9. https://doi.org/10.4102/ajopa.v4i0.100 smith, m., adams, n., & munnik, e. (2022a). the development of the quality of translation and linguistic equivalence checklist. african journal of psychological assessment, 4, 1–15. https://doi.org/10.4102/ajopa.v4i0.108 smith, a.c., merz, l., borden, j.b., gulick, c.k., kshirsagar, a.r., & bruna, e.m. (2022b). assessing the effect of article processing charges on the geographic diversity of authors using elsevier’s “mirror journal” system. quantitative science studies, 2(4), 1123–1143. https://doi.org/10.1162/qss_a_00157 springer, p.e., laughton, b., esterhuizen, t.m., slogrove, a.l., & kruger, m. (2022). the molteno adapted scale: a child development screening tool for healthcare settings. african journal of psychological assessment, 4(0), a92. https://doi.org/10.4102/ajopa.v4i0.92 van wijk, c.h. (2022). psychometric description of the life orientation test-revised in a south african sample: a pilot study. african journal of psychological assessment, 4, a51. https://doi.org/10.4102/ajopa.v4i0.51 van wijk, c.h., & martin, j. (2019). a brief sailor resiliency scale for the south african navy. african journal of psychological assessment, 1, 1–8. https://doi.org/10.4102/ajopa.v1i0.12 wang, x., liu, c., mao, w., & fang, z. (2015). the open access advantage considering citation, article usage and social media attention. scientometrics, 103(2), 555–564. 
https://doi.org/10.1007/s11192-015-1547-0 book review cross-cultural cognitive assessment: data from africa book title: cross-cultural cognitive test norms: an advanced collation from africa authors: shuttleworth-edwards, a.b. & truter, s. (2022) isbn: 978-0-620-98214-6 publisher: inter-ed publishers, paardevlei, south africa r1855.00* *book price at time of review review title: cross-cultural cognitive assessment: data from africa reviewer: kate cockcroft1 affiliation: 1department of psychology, faculty of humanities, university of the witwatersrand, johannesburg, south africa corresponding author: kate cockcroft, kate.cockcroft@wits.ac.za how to cite this article: cockcroft, k. (2023). cross-cultural cognitive assessment: data from africa. african journal of psychological assessment, 5(0), a139. https://doi.org/10.4102/ajopa.v5i0.139 copyright notice: © 2023. the authors. licensee: aosis openjournals. this is an open access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. shuttleworth-edwards’ and truter’s publication is an african version of the compendia of normative data in the tradition of mitrushina et al. (2005) and strauss et al. (2006). as the interpretation of cognitive test scores profoundly affects the quality and utility of psychological assessment, reports and research, such a collation containing most of the available normative data for commonly used cognitive tests is invaluable. before this book became available, only those with resources (time, staff, journal access) to conduct exhaustive library searches were able to uncover normative reports for a specific test. some of this information exists only as grey literature, so locating it is no small endeavour. 
this book carefully collates these dispersed norms for 83 cognitive tests from 16 african countries and provides an invaluable guide for the busy clinician, researcher, lecturer, and graduate student. it does more than this too: it offers a critical reflection of the challenges in cross-cultural cognitive assessment. there has been a growing realisation that many of the assumptions underlying neuropsychological testing are not culturally universal (cockcroft, 2022). in addition, problems with intentional and unintentional racial, ethnic, linguistic and socioeconomic discrimination caused by cognitive tests and their users are well documented (cockcroft, 2020; laher & cockcroft, 2013). access to normative data from people with similar demographic backgrounds to the person you intend to assess is crucial in mitigating such discrimination. shuttleworth-edwards and truter highlight that many practitioners, especially those in training, are unaware of the importance of, and/or where to locate, demographically appropriate test choices and norms, especially in cases of socioeconomic, language and educational diversity. their book addresses this concern as the authors included only those studies, which provided all of the core demographic variables, namely language, age, level of education, as well as some indication of socioeconomic status (ses) and/or quality of education. the book has a clear aim to alert practitioners and researchers about these demographic features, which are deemed vital for optimal normative data. some of these, such as quality of education, are not always included in normative data. this unjust feature of the south african educational landscape means that the type of education received ranges from extremely well-resourced and on par with advantaged socioeconomic conditions elsewhere in the world to egregiously under-resourced resulting in low levels of technological sophistication and inadequate literacy. 
it is therefore important to consider this factor in the assessment of cognitive functioning particularly because there is considerable evidence that quality of education is a much more useful variable than either level of education or race group (shuttleworth-edwards, 2016). in addition to a failure to consider the role of quality of education in evaluating cognitive performance, shuttleworth-edwards and truter also point out another common error made by practitioners, which is ignoring a client’s timed versus untimed performance on tests. slow processing speed may conceal an otherwise intact function when the client is assessed under time constraints. bearing this in mind, the book has been structured so that visuospatial and executive functions are divided into separate chapters based on whether the tests tapping these functions are timed or not. other functional domains could not be as clearly divided, and instead include constant reminders to the practitioner to consider the role of processing speed when interpreting the reason for suboptimal performance on a timed task. in some instances, the normative data in this book reassuringly highlight some well-established influences on cognitive test performance, such as those of age and level of education across all tests and all domains (mitrushina et al., 2005; strauss et al., 2006). this reminds us that these variables should never be neglected in the interpretation of cognitive functioning. shuttleworth-edwards and truter also observe that there is a need for more refined stratification of norms within these variables (age and level of education), for example, the separation of older and oldest-old adults, children versus adolescents, pre-primary versus primary school children. these provide further indications of the kind of research that is needed in the field. another well-established finding was the general lack of influence of sex on normative data (mitrushina et al., 2005; strauss et al., 2006).
consideration is also given to an extremely important issue for multilingual settings such as south africa, namely the language of test administration for individuals whose first language is not english. the authors show how language of test administration interacts with age so that home language is not always the best language for assessment. they give a detailed discussion of the nuances in deciding on the appropriate language of assessment. shuttleworth-edwards’ and truter’s text provides all this information in a carefully organised, accessible and user-friendly manner. make sure to read the book’s preface, which provides a detailed rationale for undertaking this enormous project, as well as the clinical and demographic scope of the book. part 1 gives a detailed introduction of cross-cultural test norm challenges and some proposed solutions, as well as the theoretical and conceptual underpinnings of the book. part 2 covers the step-by-step process of applying and interpreting test norms, and part 3 provides the collated normative test data for core functional modalities. i like this organisation, which corresponds with the functional domains of a brain-behaviour model of cognitive assessment. this is preferable to a text-oriented approach, as it allows for a more ‘clinically contextualised’ neuropsychological approach to the assessment of cognitive strengths and weaknesses. in organising the book in this manner, the authors acknowledge that the multifunctional nature of cognitive tests makes their separation into distinct functional modalities artificial, but that this separation is necessary in order to impose organisational structure on the vast information. they also caution that practitioners should refrain from conceptualising tests too narrowly as belonging solely to a single functional category and that it is important to acknowledge that, in addition to the core functions tapped by a test, other functions would also be drawn on by a particular test.
this book addresses the long-standing need for demographically focused african norms with more refined levels of stratification than is usually available from test standardisation data. this is important because this continent has an enormous population with very varying levels of technological skill, educational backgrounds, socioeconomic conditions, literacy, and test-wiseness. while the compendium is a comprehensive and accessible practitioner resource, it also has considerable value for postgraduate professional training and research. it highlights many areas where additional research could fill existing norming gaps. one of these gaps is the lack of an in-depth critical comparison of norms derived for the various african countries, across all the tests and functional domains included in the book. cross-cultural cognitive test norms: an advanced collation from africa is much more than a collection of normative studies. it includes a thoughtful and critical engagement, which draws attention to each study’s strengths and limitations, while stressing administration, scoring and interpretation issues relevant to sound neuropsychological practice. this book fills what has been an enduring gap in standardising the presentation of norms on commonly used tests of cognitive functioning in an african context. references cockcroft, k. (2020). ignorance is not an excuse – irresponsible neurocognitive test use highlights the need for appropriate training. african journal of psychological assessment, 2(0), 1–2. https://doi.org/10.4102/ajopa.v2i0.28 cockcroft, k. (2022). are working memory models weird? testing working memory models in a non-weird sample. neuropsychology, 36(5), 456–467. https://doi.org/10.1037/neu0000811 laher, s., & cockcroft, k. (eds.). (2013). psychological assessment in south africa: research and applications. wits university press. mitrushina, m., boone, k.b., razani, j., & d’elia, l.f. (2005). 
handbook of normative data for neuropsychological assessment (2nd ed.). oxford university press. shuttleworth-edwards, a. b. (2016). generally representative is representative of none: commentary on the pitfalls of iq test standardization in multicultural settings. clinical neuropsychologist, 30(7), 975–998. https://doi.org/10.1080/13854046.2016.1204011 strauss, e.s., sherman, e.m., & spreen, o. (2006). a compendium of neuropsychological tests: administration, norms, and commentary. oxford university press.
about the author(s) mandy wigdorowitz department of theoretical and applied linguistics, faculty of modern and medieval languages and linguistics, university of cambridge, cambridge, united kingdom department of psychology, faculty of humanities, university of johannesburg, johannesburg, south africa pakeezah rajab department of product management, jvr psychometrics, randburg, south africa tasneem hassem department of psychology, faculty of humanities, university of the witwatersrand, johannesburg, south africa neziswa titi children’s institute, faculty of health sciences, university of cape town, cape town, south africa citation wigdorowitz, m., rajab, p., hassem, t., & titi, n. (2021). the impact of covid-19 on psychometric assessment across industry and academia in south africa. african journal of psychological assessment, 3(0), a38. https://doi.org/10.4102/ajopa.v3i0.38 original research the impact of covid-19 on psychometric assessment across industry and academia in south africa mandy wigdorowitz, pakeezah rajab, tasneem hassem, neziswa titi received: 15 oct. 2020; accepted: 05 mar. 2021; published: 31 mar. 2021 copyright: © 2021. the author(s). licensee: aosis.
this is an open access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. abstract the coronavirus disease 2019 (covid-19) pandemic had changed the world in unexpected ways and psychometric assessment was no exception. despite the advancements made in online psychometric assessment implementation, the authors of this commentary reflected on their own experiences in the context of the psychology profession in south africa, where psychology professionals had been faced with the dilemma of halting, postponing or adapting assessments for remote implementation. remote implementation had many challenges, notably shifting the logistics, financial burden and accountability onto the test-taker. in addition, when implementing remote testing, supervised and unsupervised testing need to be considered in terms of flexibility, control, test-taker comfort, standardisation, costs, ethical concerns and crisis management. whilst in the private sector, remote psychometric assessment had been met with resilience and innovation, in academia, remote psychometric research was faced with unique challenges which affect all aspects of the research process and access to participation. across both industry and academia where psychometric assessments were conducted, the scores and results need to be interpreted with reflection and caution as the pandemic had led to an increase in psychological distress in addition to the unique contextual challenges that south africa already faced. keywords: academia; covid-19 pandemic; industry; mental health; psychometric assessment; south africa; remote testing.
on 26 march 2020, all south africans were directly affected by the coronavirus disease 2019 (covid-19) pandemic when a nation-wide lockdown was imposed to curb the spread of the disease, following an announcement by the world health organization (who; cucinotta & vanelli, 2020). this brought various economic and academic activities to a near standstill. one area within psychology that was directly affected was the administration of psychometric assessments, both across industry (private and public sectors) and academia. this is because the options for psychometric evaluation and in-person testing abruptly changed, whereby new ways of performing daily and structured operations were adjusted with unprecedented speed and agility in order to adhere to the lockdown regulations. whilst the implementation of assessments is generally dynamic and constantly improving, with widespread consideration for in-person and online procedures, it has been advised that, in certain cases, examiner–examinee contact is necessary to carry out valid and reliable testing (international test commission, 2005, 2010; naglieri et al., 2008). this is largely dependent on the psychological construct being measured and for what purpose it is to be used. whilst, for instance, a metric of depression may be obtained via an online questionnaire, an in-depth cognitive evaluation may require an in-person approach.1 south african psychology professionals (pps) – which include all those with a psychology registration under the auspices of the health professions council of south africa – working with assessments were faced with the dilemma to either postpone or halt assessment administration or adapt and shift the assessments to an online platform, where possible. for those implementing the latter solution, two options became available. firstly, supervised remote testing is conducted, whereby the test-taker (tt) completes the assessments whilst guided by a pp via a web-based interface. 
secondly, unsupervised remote testing is conducted, where the tt independently completes the assessments under no pp guidance (see table 1 for considerations for each testing scenario). with working-from-home as the imposed ‘new normal’, guidance was available for pps to adopt a virtual exchange, by means of tele-psychology (chipise, wassenaar, & wilkinson, 2019; evans, 2018). yet, focus was placed on appropriate tele-practice across a range of psychological offerings with limited emphasis on assessment. crucially, evans (2018) noted that when psychological assessments were conducted via tele-psychology, the integrity of the assessment needed to be preserved and the recommended administrative procedures were followed. this sentiment is shared by the american psychological association in its guidance on psychological tele-assessment during covid-19, but it caveats that where administrative procedures are now unable to: [m]imic or at least approximate the standardized protocols presented in test manuals… psychologists should take steps to collect data that are as high quality as possible and use caution and clinical expertise when interpreting those data. (wright, mihura, pade, & mccord, 2020, para. 40) table 1: considerations for in-person, supervised and unsupervised remote testing. as such, in conjunction with adjusting assessment practices, there are psychological effects that should be acknowledged which may have an effect on both tts and pps involved in the assessment process. widespread psychological distress has been reported as a serious consequence of this pandemic (united nations, 2020), and alarmingly, mental health-related conditions prior to the pandemic already result in annual economic losses in excess of $1 trillion, and globally, more than 80% of individuals with a mental health need, do not receive any form of quality or affordable care (who, 2019). the south african population is no exception to experiencing dire mental health outcomes. 
in an online survey aimed to gauge the mental health of south africans as a consequence of the pandemic, the south african depression and anxiety group (2020) reported that the effects of covid-19 and the imposed lockdown restrictions were linked to an alarming elevation in people reporting mental health symptoms, stemming from financial distress to familial living conditions and an overall increase in general anxieties. moreover, many south africans live below the poverty line, making depression and anxiety an expected reality in the country (francis & webster, 2019). the coronavirus disease 2019 was therefore an additional obstacle to pre-existing adversities experienced by numerous citizens (mayosi et al., 2012; prince et al., 2007) and had a significant impact on how psychometric assessment practices were carried out. in this commentary, the authors, who have diverse backgrounds in academia and the private sector, provide their reflections on psychometric assessment across industry and academia from a psychology praxis perspective. we invite pps to think together towards more innovative and adaptable approaches for assessment across various sectors in south africa that take into consideration the constraints to access and affordability of psychometric assessments, and the mental health effects of covid-19 on tts and pps, which ultimately accentuate existing inequalities. assessment in industry a search for selection and development assessments will quickly demonstrate that all local assessment providers have online platforms with immediate reporting functionalities hosting an array of ability, personality, interest and values assessments. online administration of assessments, many of which are unsupervised (such as personality and values questionnaires) was therefore not something pps in this field, especially those in the private sector, necessarily needed to adjust to. 
this is to say, for a number of years already, many of these assessments have been standardised and were available to be administered electronically, so, at face value, it can be assumed that the psychometric integrity of these assessments is intact. however, we were faced with questions that a technical manual written prior to the pandemic could not have answered: should pps working in the corporate sector adjust their interpretations with reference to the impact that the pandemic may have? are such adjustments even possible given the unparalleled experience of testing under such conditions? it is impossible to exhaustively answer these questions at present, given the copious factors at play, even when little change needs to be imposed for administering certain assessments, but they should be deliberated on in accordance with the progression of assessment throughout and after the pandemic. the administration of cognitive and ability-based assessments has become exceedingly challenging, as unsupervised remote testing was traditionally rarely employed as a suitable option in south africa (cf. brearly et al., 2017). whilst most assessments used for development purposes, such as personality and values assessments which are seen as lower ‘risk’, may be less influenced by time, context and supervision of test-taking, it is best practice to complete cognitive assessments, in particular, in the morning and without distraction, especially if the results will be used for selection and succession decisions (ngo, biss, & hasher, 2018; sievertsen, gino, & piovesan, 2016). normally, in in-person settings, pps ensure that these assessments are proctored to confirm the tt’s identity, prevent cheating and the sharing of items and provide administrative and query support. 
with the expansion of video-conferencing options, pps are still able to supervise some cognitive assessments, but there are aspects that the pp has lost control over, which could still ultimately result in inaccurate scores. completing assessments at home impedes the home-work boundary and shifts the logistical onus (stable internet connection, quiet setting), financial burden (electricity and data charges) and accountability (self-paced completion, distractions) onto the tt. for unsupervised remote testing, tts can complete assessments outside of typical working hours, which could also negatively affect the integrity of the assessment. for instance, it is impossible to know whether the tt or someone else completed the assessment (which has always been a concern with unsupervised testing), or whether the items are viewed by non-intended audiences, such as spouses or roommates who are in the room at the time of assessment completion, and/or other staff members in the case of shared email addresses or shared screens if the tt chooses to complete the assessment between meetings and forgets to sign off. despite the changed context in which assessments must be conducted, it is clear that assessment practices within industry have adjusted, and although there are specific limitations to remote testing, overall, the private sector has exhibited resilience and innovation in dealing with assessment administration during the pandemic. assessment in academia psychometric assessment in academia has faced unique challenges too. psychology professionals in academia utilising assessments for research purposes have had to adapt, modify or discontinue their research. if suitable assessment adaptation was not a viable option, the research needed to be altered, possibly leading to new aims and research questions, or stopped altogether, regardless of the amount of data already collected. 
any of these changes could result in drastic theoretical, methodological and/or ethical implications both central to the research study itself and more peripherally, concerning the research process. of significance is how the pandemic has affected all aspects of the research process, including the timeline and completion of the research, availability and duration of funding, degree or grant completion requirements and deadlines, research aims and hypotheses, ethical amendments and clearance, participant testing environment and additional cost implications (hedding, greve, breetzke, nel, & van vuuren, 2020). prior to determining the most suitable remote research strategy, the researcher would also need to consider the implications of both supervised and unsupervised remote testing (see table 1) and the impact each of these settings may have on the above-mentioned aspects of the research. in south africa, a particular concern with research involving assessments – both generally and more crucially during and post the pandemic – is the availability of sufficient funding as well as concerns about budget cuts to higher education (see naidu & dell, 2020). research grants and adaptation of assessments would be contingent on the funding available for a given project, as, for example, various experts may need to be consulted in order to ensure optimal functionality of the assessment in line with test manual guidance. in addition, the researcher has to consider the psychological impact and financial burden placed on participants. unlike industry, where the tts choose to complete assessments for selection processes or are expected to partake in assessments as part of development initiatives, tts in academia are mostly unpaid volunteers with little incentive to complete assessments. thus, adding further obstacles, such as data costs and connectivity issues in more rural areas, may reduce the number of participants either willing or able to partake in remote research. 
these factors will further aggravate the financial strain on a population already dealing with financial difficulties (francis & webster, 2019). these socio-economic circumstances inadvertently result in a less diverse and representative sample, as they limit access to participation in research amongst certain social groups in the country. although access limitations have been accentuated in academic research because of the economic inequalities re-introduced by the pandemic, one possible benefit is that researchers can gain insight into the feasibility and psychometric integrity of remote assessment. concluding remarks the covid-19 pandemic almost brought economic and academic activities to a standstill in a country already confronted with pervasive inequality and adversity (francis & webster, 2019). participation in psychometric assessment during this time, either through industry or in an academic setting, may only be suitably accessible to those with some economic and social privileges. it is, therefore, essential that psychometric assessments, and the methodological and ethical principles that accompany them, consider how the contextual landscape of south africa has changed and continues to evolve over the course of the pandemic. equally worth consideration is what the long-term effects will be once the pandemic is brought under control. it is undeniable that the pandemic has fostered creativity in the way pps adapt and administer psychometric assessments to tts, but in implementing such innovation, it is important to recognise the specific limitations and possible social, economic and mental health effects that may emerge beyond what is normally acknowledged as key observations within the field (irwing, booth, & hughes, 2018). particular focus should be placed on equitable access, affordability and the psychological well-being of pps and especially tts, given the existing pressures involved in completing an assessment and performing well in it. 
test-takers’ psychological well-being is an important factor that must be taken into account when selecting participants and interpreting their assessment results, given that the effects of the pandemic may further negatively affect the social, economic and infrastructural conditions in which south africans live, quite possibly inflating depression and anxiety rates amongst normal samples. we, therefore, recommend that pps administering assessments are attentive to possible stressors experienced by tts, as a temporary or long-term consequence of the pandemic, which could affect performance (wright et al., 2020). psychology professionals need to reconsider the tools they draw on to evaluate certain constructs, and in doing so, should be sensitive to the diversity of tts in terms of their scores as well as their personal and social characteristics. the pandemic has also created an opportunity for test developers to demonstrate the stability of their measures, with avenues for future, post-covid research to be conducted on the same assessments that have been used before and during the pandemic. in a post-covid world, it would further benefit the profession to reflect on the changes enforced during this period and consider their impacts; that is, what changes will remain after the pandemic and in what ways will assessment practices shift or how might they return to the pre-covid standard. acknowledgements competing interests the authors declare that they have no financial or personal relationships that may have inappropriately influenced them in writing this article. authors’ contributions m.w. conceptualised the presented article structure and wrote the manuscript together with p.r., t.h. and n.t. ethical considerations this article followed all ethical standards for research without direct contact with human or animal subjects. funding information this research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors. 
data availability the authors confirm that the data supporting the findings of this study are available within the article. disclaimer the views and opinions expressed in this article are those of the authors and do not necessarily reflect the official policy or position of any affiliated agency of the authors. references brearly, t.w., shura, r.d., martindale, s.l., lazowski, r.a., luxton, d.d., shenal, b.v., & rowland, j.a. (2017). neuropsychological test administration by videoconference: a systematic review and meta-analysis. neuropsychology review, 27(2), 174–186. https://doi.org/10.1007/s11065-017-9349-1 chipise, e., wassenaar, d., & wilkinson, a. (2019). towards new ethics guidelines: the ethics of online therapy in south africa. south african journal of psychology, 49(3), 337–352. https://doi.org/10.1177/0081246318811562 cucinotta, d., & vanelli, m. (2020). who declares covid-19 a pandemic. acta biomedica, 91(1), 157–160. https://doi.org/10.23750/abm.v91i1.9397 evans, d.j. (2018). some guidelines for telepsychology in south africa. south african journal of psychology, 48(2), 166–170. https://doi.org/10.1177/0081246318757943 francis, d., & webster, e. (2019). poverty and inequality in south africa: critical reflections. development southern africa, 36(6), 788–802. https://doi.org/10.1080/0376835x.2019.1666703 hedding, d.w., greve, m., breetzke, g.d., nel, w., & van vuuren, b.j. (2020). covid-19 and the academe in south africa: not business as usual. south african journal of science, 116(7–8), 1–3. https://doi.org/10.17159/sajs.2020/8298 international test commission. (2005). international guidelines on computer-based and internet delivered testing. international journal of testing, 6(2), 143–172. https://doi.org/10.1207/s15327574ijt0602_4 international test commission. (2010). a test-taker’s guide to technology-based testing. retrieved from http://www.intestcom.org irwing, p., booth, t., & hughes, d.j. (2018). 
the wiley handbook of psychometric testing: a multidisciplinary reference on survey, scale and test development. chichester: john wiley & sons ltd. mayosi, b.m., lawn, j.e., van niekerk, a., bradshaw, d., abdoolkarim, s.s., coovadia, h.m., & lancet south africa team. (2012). health in south africa: changes and challenges since 2009. the lancet, 380(9858), 2029–2043. https://doi.org/10.1016/s0140-6736(12)61814-5 naglieri, j.a., drasgow, f., schmit, m., handler, l., prifitera, a., margolis, a., & velasquez, r. (2008). psychological testing on the internet: new problems, old issues. in d.n. bersoff (ed.), ethical conflicts in psychology (4th edn., pp. 306–312). washington, dc: american psychological association. naidu, e., & dell, s. (2020, july). concern over cuts to higher education, science budgets. university world news africa edition. retrieved from https://www.universityworldnews.com/post.php?story=202007230657559 ngo, k.w.j., biss, r.k., & hasher, l. (2018). time of day effects on the use of distraction to minimise forgetting. quarterly journal of experimental psychology, 71(11), 2334–2341. https://doi.org/10.1177/1747021817740808 prince, m., patel, v., saxena, s., maj, m., maselko, j., phillips, m.r., & rahman, a. (2007). no health without mental health. the lancet, 370(9590), 859–877. https://doi.org/10.1016/s0140-6736(07)61238-0 sievertsen, h.h., gino, f., & piovesan, m. (2016). cognitive fatigue influences students’ performance on standardized tests. proceedings of the national academy of sciences of the united states of america (pnas), 113(10), 2621–2624. https://doi.org/10.1073/pnas.1516947113 the south african depression and anxiety group. (2020). sadag’s online survey findings on covid-19 and mental health (april 2020). retrieved from http://www.sadag.org/index.php?option=com_content&view=article&id=3091&itemid=483 united nations. (2020, may). policy brief: covid-19 and the need for action on mental health. 
retrieved from https://www.un.org/sites/un2.un.org/files/un_policy_brief-covid_and_mental_health_final.pdf world health organisation. (2019). the who special initiative for mental health (2019–2023): universal health coverage for mental health. retrieved from https://www.who.int/mental_health/evidence/special_initiative_2019_2023/en/ wright, a.j., mihura, j.l., pade, h., & mccord, d.m. (2020, may). guidance on psychological tele-assessment during the covid-19 crisis. american psychological association services, inc. retrieved from https://www.apaservices.org/practice/reimbursement/health-codes/testing/tele-assessment-covid-19 footnote 1. however, see brearly et al. (2017) for a review on neuropsychological test administration by video-conference. about the author(s) tyrone b. pretorius department of psychology, faculty of community and health sciences, university of the western cape, cape town, south africa citation pretorius, t.b. (2021). over reliance on model fit indices in confirmatory factor analyses may lead to incorrect inferences about bifactor models: a cautionary note. african journal of psychological assessment, 3(0), a35. https://doi.org/10.4102/ajopa.v3i0.35 original research over reliance on model fit indices in confirmatory factor analyses may lead to incorrect inferences about bifactor models: a cautionary note tyrone b. pretorius received: 29 sept. 2020; accepted: 21 jan. 2021; published: 12 mar. 2021 copyright: © 2021. the author(s). licensee: aosis. this is an open access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. 
abstract this brief article attempts to describe the importance of relying not just on model fit indices, but also on ancillary bifactor indices, when confirmatory factor analysis is used to examine the factor structure of instruments presumed to be multidimensional. three ancillary bifactor indices (explained common variance, omega hierarchical and percentage uncontaminated correlations) were calculated for three instruments that have been described as multidimensional in published research. one of these instruments, the normative beliefs about aggression scale (nobags), demonstrated strong evidence of multidimensionality. the second instrument, the problem-solving inventory, demonstrated some evidence of multidimensionality, but must be considered essentially unidimensional because of lack of sufficient evidence. the third instrument, the cyberchondria severity scale, demonstrated essential unidimensionality with little evidence of multidimensionality. these findings support the argument that using only model fit statistics may lead researchers to draw incorrect conclusions about the dimensionality of an instrument. keywords: ancillary bifactor indices; bifactor models; model-fit indices; multidimensionality; unidimensionality. researchers may be interested in the scores produced by measuring instruments only to the extent that they can be used to test a proposed hypothesis. however, a critical first step in using a measuring instrument is to conduct an examination of its psychometric properties, such as reliability and validity. the investigation of the psychometric properties of an instrument should be undertaken even when the purpose of the research is not necessarily focused on the investigation of the psychometric properties. researchers need to be certain that the scores obtained from any instrument are useful. this is critical as measurement properties are not inherent qualities of tests but rather of scores (zangaro, 2019). 
as such, measuring instruments can have different properties in different applications and with different samples in different contexts. one such property of measuring instruments is validity, which refers to the extent that a measuring instrument actually measures the construct it claims to measure (clark & watson, 1995). construct validity is a type of validity that is often examined by use of confirmatory factor analysis (cfa) (brown & moore, 2012). if the hypothesised underlying structure of an instrument is replicated, the replication is considered indicative of construct validity. however, it should be noted that confirming that the structure of an instrument holds is only one part of validity. it is necessary to show that an instrument has both internal and external validity before using the instrument in practice. in cfa, the scale items are regarded as observed measurements and the hypothesised factors are regarded as latent variables. if an instrument is hypothesised to have a total score and subscale scores, cfa will typically be used to examine several models of the structure of the instrument to determine which model best fits the data. studies generally compare three conceptualisations of the factor structure of an instrument that is hypothesised to consist of a total scale and several subscales: a one-factor model, a second-order factor model and a bifactor model (see figure 1). for example, reynolds and keith (2017) used a one-factor model, a second-order factor model and a bifactor model to examine the structure of the wechsler intelligence scale for children. figure 1: three conceptualisations of the factor structure of a hypothetical eight-item scale: (a) one-factor model; (b) second-order factor model; and (c) bifactor model. rectangles are observed variables and ellipses are latent variables. in the one-factor model, all items load on a total scale score. 
in the bifactor model, however, items load on both a total scale score (referred to as the ‘general factor’) and several subscale scores (referred to as ‘specific factors’: reise et al., 2013). in the second-order model, items load first on several subscales, and the subscale scores in turn load on a total scale. therefore, the relationship between the total scale and the observed items is mediated by the subscales in the second-order model (brown & moore, 2012). many researchers rely solely on model fit indices to compare and select the best-fitting model (morgan et al., 2015). the most common fit indices are the chi-squared (χ2), root mean square error of approximation (rmsea), comparative fit index (cfi), standardised root mean square residual (srmr), tucker–lewis index (tli), goodness-of-fit index (gfi), relative fit index (rfi), normed fit index (nfi), bollen’s fit index (bl89) and akaike’s information criterion. if the fit indices indicate that the one-factor model is the best fit for the data, researchers often conclude that the scale is unidimensional, whereas fit indices that support either a second-order or bifactor model are taken as evidence of multidimensionality. however, rodriguez et al. (2016b) have called these conclusions an ‘overly simplistic conceptualization of the dimensionality of psychological data’ (p. 231). there is also growing scepticism about relying on fit indices alone. for example, morgan et al. (2015) describe these fit indices as useful but caution that ‘the exclusive use of approximate fit statistics is perilous’ (p. 17). judgements about the dimensionality of a measuring instrument based solely on model fit indices are problematic for two reasons. firstly, it has been demonstrated that these indices generally favour bifactor models even in instances where the item loadings on general and specific factors may be relatively low (bornovalova et al., 2020). 
secondly, model fit indices fail to capture the relative strength of the general factor and specific factors (reise et al., 2013). if model fit indices support the bifactor model as the best fit, at least three possible conclusions can be drawn: (1) the instrument is essentially unidimensional, because the specific factors do not account for unique variance beyond that explained by the general factor; (2) some limited evidence of multidimensionality exists, but is not sufficient to exclude a unidimensional interpretation; or (3) the specific factors account for sufficient reliable variance in addition to the variance accounted for by the general factor to support the interpretation of the instrument as multidimensional. to examine the dimensionality of an instrument, rodriguez et al. (2016a) have urged researchers to calculate ancillary bifactor indices in addition to model fit indices. ancillary bifactor indices include explained common variance (ecv), omega hierarchical (omegah) and percentage of uncontaminated correlations (puc). indices such as these enable an evaluation of dimensionality. it is also possible to compute mcdonald’s omega, a model-based estimate of reliability. explained common variance refers to the percentage of common variance amongst all items that can be explained by each factor (ecv for the general factor and ecv_s for specific factors). percentage of uncontaminated correlations is the percentage of unique correlations amongst items that can be explained by the general factor alone. omegah measures the proportion of systematic variance in total scores that can be attributed to individual differences on the general factor (rodriguez et al., 2016a). the purpose of this commentary is to demonstrate the importance of calculating ancillary bifactor indices in addition to model fit indices to examine the dimensionality of an instrument. 
to this end, bifactor indices were calculated for three published papers that concluded that a bifactor model was the best fitting model for the study data. method three published studies were selected to demonstrate the three possible outcomes of examining the dimensionality of an instrument, as described in the introduction. the studies are described below: padmanabhanunni (2017) examined the factor structure of the normative beliefs about aggression scale (nobags: huesmann et al., 2011). a cfa confirmed that a bifactor model with a total scale (approval of aggression) and two subscales (retaliation beliefs and general beliefs) was the best fitting model (χ2 > 0.05, gfi, rfi, nfi > 0.95 and rmsea = 0.05). heppner et al. (2002) examined the generalisability of problem-solving appraisal amongst black south africans and investigated the psychometric properties of the problem solving inventory (psi: heppner, 1988). the results of cfa (χ2 < 0.05, cfi > 0.95, nfi > 0.90, bl89 > 0.95 and rmsea = 0.08) supported the hypothesised bifactor structure of the psi as a total scale of problem-solving appraisal and three subscales (problem-solving confidence, approach-avoidance style and personal control). norr et al. (2015) examined a bifactor model of the cyberchondria severity scale (css: mcelroy & shevlin, 2014), which assesses anxiety and behaviours associated with seeking online health information. a cfa confirmed a bifactor structure (χ2 > 0.05, cfi > 0.95, rmsea = 0.07) consisting of a total cyberchondria scale and four subscales (reassurance, excessiveness, distress and compulsion). analysis the standardised regression loadings reported in the three studies were used to calculate the bifactor indices necessary to assess the instruments’ dimensionality. the bifactor indices calculator (dueber, 2017) was used for these calculations. the existing literature provides guidelines regarding the interpretation of these indices. 
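as a rough illustration of how such indices follow from standardised loadings, the sketch below computes ecv, omega, omegah and puc for a bifactor solution. it is not the bifactor indices calculator itself: the function name `bifactor_indices`, its arguments and the example loadings are hypothetical, and the sketch assumes orthogonal factors, with every item loading on the general factor and on exactly one specific factor.

```python
from collections import defaultdict


def bifactor_indices(general, specific, groups):
    """Return ECV, per-factor ECV shares, omega, omegaH and PUC.

    general  -- standardised general-factor loading of each item
    specific -- standardised specific-factor loading of each item
    groups   -- label of the single specific factor each item loads on
    (Hypothetical helper; assumes orthogonal factors.)
    """
    n_items = len(general)
    gen_ss = sum(lam ** 2 for lam in general)  # sum of squared general loadings

    spec_ss = defaultdict(float)   # sum of squared loadings per specific factor
    spec_sum = defaultdict(float)  # sum of loadings per specific factor
    size = defaultdict(int)        # number of items per specific factor
    for lam, label in zip(specific, groups):
        spec_ss[label] += lam ** 2
        spec_sum[label] += lam
        size[label] += 1

    # ECV: share of the common variance attributable to each factor.
    common = gen_ss + sum(spec_ss.values())
    ecv = gen_ss / common
    ecv_specific = {label: ss / common for label, ss in spec_ss.items()}

    # Unique (error) variance of a standardised item is 1 minus its communality.
    error = sum(1 - g ** 2 - s ** 2 for g, s in zip(general, specific))
    gen_part = sum(general) ** 2
    spec_part = sum(total ** 2 for total in spec_sum.values())
    total_var = gen_part + spec_part + error
    omega = (gen_part + spec_part) / total_var
    omega_h = gen_part / total_var

    # PUC: share of item pairs whose items belong to different specific factors.
    within = sum(k * (k - 1) // 2 for k in size.values())
    puc = 1 - within / (n_items * (n_items - 1) // 2)

    return {"ecv": ecv, "ecv_specific": ecv_specific,
            "omega": omega, "omega_h": omega_h, "puc": puc}


# Made-up loadings: six items, two specific factors of three items each.
example = bifactor_indices(
    general=[0.6] * 6,
    specific=[0.4] * 6,
    groups=["s1", "s1", "s1", "s2", "s2", "s2"],
)
```

with these made-up loadings the general factor and the two specific factors together account for all of the common variance, so `example["ecv"]` and the values in `example["ecv_specific"]` sum to one.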
explained common variance provides an indication of the relative strength of factors, such that a higher ecv (>0.80: rodriguez et al., 2016b) is associated with a strong general factor and indicates that the instrument is essentially unidimensional. omegah indicates the proportion of systematic variance in total scores that is attributable to individual differences on the general factor. it has been suggested by rodriguez et al. (2016b) that an omegah greater than 0.80 indicates that the instrument is essentially unidimensional. finally, it has also been recommended that researchers consider ecv and omegah in conjunction with puc, and reise et al. (2013) suggest that puc values lower than 0.80, together with general ecv values greater than 0.60 and omegah of the general factor greater than 0.70, would indicate the presence of some multidimensionality that is not strong enough to rule out the interpretation of the instrument as essentially unidimensional. results the results of the ancillary bifactor analyses are reported in table 1. table 1: bifactor indices for three studies. in the padmanabhanunni (2017) study, the general factor of the nobags accounted for 54% of the common variance, and the two specific factors accounted for 46% of the common variance (20% and 26%, respectively). omegah was 0.60, well below the cut-off of 0.80 suggested by rodriguez et al. (2016b). when considered with puc, the ecv of the general factor was below 0.60 and omegah was below 0.70. these bifactor indices clearly support the interpretation of the nobags as multidimensional. in the heppner et al. (2002) study, the general factor of the psi accounted for 63% of the variance, and the three specific factors accounted for 14%, 6% and 16% of the variance, respectively. omegah was below 0.80, which suggests that the instrument may have some multidimensionality. however, when considered with puc, ecv was greater than 0.60, omegah was greater than 0.70 and puc was lower than 0.80. 
these findings indicate that there is some evidence of multidimensionality, but the evidence is not strong enough to overrule the interpretation of the psi as unidimensional. finally, in the norr et al. (2015) study, the general factor of the css accounted for 80% of the variance, and just 20% of the variance was explained by the four specific factors. the variance explained by each of the four specific factors ranged from 3% to 7%. omegah was above 0.80, which indicates that this instrument is essentially unidimensional. its unidimensionality was further confirmed when puc, ecv and omegah were considered together (puc < 0.80, ecv > 0.60 and omegah > 0.70). conclusion the aim of this commentary was to demonstrate that model fit indices alone provide insufficient evidence to draw conclusions about the dimensionality of a measuring instrument. three published papers that drew such conclusions based on fit indices of a bifactor model were subjected to ancillary bifactor analyses, in which ecv, omegah and puc were used to determine the relative strength of the general factor and the specific factors. these analyses indicated that one instrument (nobags) demonstrated sufficient evidence of multidimensionality. one instrument (psi) demonstrated some evidence of multidimensionality, but the evidence was not strong enough to rule out the possibility of the instrument being unidimensional. the third instrument (css) did not demonstrate evidence of multidimensionality and was determined to be essentially unidimensional. these findings highlight the insufficiency of solely relying on cfa model fit indices to draw conclusions about the hypothesised structure of a measuring instrument. the model fit indices reported in the method section above were acceptable for all three studies. however, the bifactor indices demonstrated that the assumption of multidimensionality is not always tenable. 
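the cut-offs applied in this commentary can be read as a simple decision rule, sketched below. this is one possible reading of the guidelines rather than a published algorithm: the function name `classify_dimensionality` is hypothetical, the commentary gives no rule for the case where puc is 0.80 or higher and neither 0.80 cut-off is met (the sketch returns 'undetermined'), and in the usage lines only the ecv values and the nobags omegah come from the results above. every puc value and the psi and css omegah values are placeholders chosen to match the reported inequalities.

```python
def classify_dimensionality(ecv, omega_h, puc):
    """Apply the cut-offs cited in the text (hypothetical helper).

    ecv     -- explained common variance of the general factor (0 to 1)
    omega_h -- omega hierarchical of the general factor (0 to 1)
    puc     -- percentage of uncontaminated correlations, as a proportion
    """
    # ECV or omegaH above 0.80: strong general factor.
    if ecv > 0.80 or omega_h > 0.80:
        return "essentially unidimensional"
    if puc < 0.80:
        # PUC < 0.80 with ECV > 0.60 and omegaH > 0.70: some multidimensionality,
        # but not enough to rule out a unidimensional interpretation.
        if ecv > 0.60 and omega_h > 0.70:
            return "some multidimensionality"
        return "multidimensional"
    # The text gives no guidance for this case.
    return "undetermined"


# ECV values and the NOBAGS omegaH are taken from the results in the text;
# all PUC values and the PSI and CSS omegaH values are placeholders.
nobags = classify_dimensionality(ecv=0.54, omega_h=0.60, puc=0.50)
psi = classify_dimensionality(ecv=0.63, omega_h=0.75, puc=0.65)
css = classify_dimensionality(ecv=0.80, omega_h=0.85, puc=0.75)
```

under these placeholder inputs the three calls reproduce the three outcomes described for the nobags, psi and css, respectively.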
researchers investigating bifactor models are urged to go beyond model fit indices by inspecting the pattern of item loadings and calculating ancillary bifactor indices.
acknowledgements
competing interests
the author declares that he has no financial or personal relationships that may have inappropriately influenced him in writing this research article.
author’s contribution
t.b.p. is the sole author of this research article.
ethical considerations
this article followed all ethical standards for research without direct contact with human or animal subjects.
funding information
this research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.
data availability
data sharing is not applicable to this article as no new data were created or analysed in this study.
disclaimer
the views and opinions expressed in this article are those of the author and do not necessarily reflect the official policy or position of any affiliated agency of the author.
references
bornovalova, m.a., choate, a.m., fatimah, h., petersen, k.j., & wiernik, b.m. (2020). appropriate use of bifactor analysis in psychopathology research: appreciating benefits and limitations. biological psychiatry, 88(1), 18–27. https://doi.org/10.1016/j.biopsych.2020.01.013 brown, t.a., & moore, m.t. (2012). confirmatory factor analysis. in handbook of structural equation modeling (pp. 361–379). retrieved from https://www.researchgate.net/profile/michael_moore8/publication/251573889_hoyle_cfa_chapter_-_final/links/0deec51f14d2070566000000/hoyle-cfa-chapter-final.pdf clark, l.a., & watson, d. (1995). constructing validity: basic issues in objective scale development. psychological assessment, 7(3), 309–319. retrieved from http://www.bwgriffin.com/gsu/courses/edur9131/content/clark_validity_scaledevelopment.pdf dueber, d.m. (2017). bifactor indices calculator: a microsoft excel-based tool to calculate various indices relevant to bifactor cfa models.
https://doi.org/10.13023/edp.tool.01 heppner, p.p. (1988). the problem-solving inventory. palo alto, ca: consulting psychologist press. heppner, p.p., pretorius, t.b., wei, m., lee, d.g., & wang, y.w. (2002). examining the generalizability of problem-solving appraisal in black south africans. journal of counseling psychology, 49(4), 484. https://doi.org/10.1037/0022-0167.49.4.484 huesmann, l.r., guerra, n.g., miller, l., & zelli, a. (2011). the normative beliefs about aggression scale [nobags]. retrieved from https://rcgd.isr.umich.edu/aggr/measures/normativebeliefsaboutaggscale.2011.pdf mcelroy, e., & shevlin, m. (2014). the development and initial validation of the cyberchondria severity scale (css). journal of anxiety disorders, 28(2), 259–265. https://doi.org/10.1016/j.janxdis.2013.12.007 morgan, g.b., hodge, k.j., wells, k.e., & watkins, m.w. (2015). are fit indices biased in favor of bi-factor models in cognitive ability research?: a comparison of fit in correlated factors, higher-order, and bi-factor models via monte carlo simulations. journal of intelligence, 3(1), 2–20. https://doi.org/10.3390/jintelligence3010002 norr, a.m., allan, n.p., boffa, j.w., raines, a.m., & schmidt, n.b. (2015). validation of the cyberchondria severity scale (css): replication and extension with bifactor modeling. journal of anxiety disorders, 31, 58–64. https://doi.org/10.1016/j.janxdis.2015.02.001 padmanabhanunni, a. (2017). the factor structure of the normative beliefs about aggression scale as used with a sample of adolescents in low socio-economic areas of south africa. south african journal of psychology, 49(1), 27–38. https://doi.org/10.1177%2f0081246317743185 reise, s.p., scheines, r., widaman, k.f., & haviland, m.g. (2013). multidimensionality and structural coefficient bias in structural equation modeling a bifactor perspective. educational and psychological measurement, 73(1), 5–26. https://doi.org/10.1177%2f0013164412449831 reynolds, m.r., & keith, t.z. (2017). 
multi-group and hierarchical confirmatory factor analysis of the wechsler intelligence scale for children – fifth edition: what does it measure? intelligence, 62, 31–47. https://doi.org/10.1016/j.intell.2017.02.005 rodriguez, a., reise, s.p., & haviland, m.g. (2016a). evaluating bifactor models: calculating and interpreting statistical indices. psychological methods, 21(2), 137. https://doi.org/10.1037/met0000045 rodriguez, a., reise, s.p., & haviland, m.g. (2016b). applying bifactor statistical indices in the evaluation of psychological measures. journal of personality assessment, 98(3), 223–237. https://doi.org/10.1080/00223891.2015.1089249 zangaro, g.a. (2019). importance of reporting psychometric properties of instruments used in nursing research. western journal of nursing research, 41(11), 1548–1550. https://doi.org/10.1177/0193945919866827
about the author(s) karina mostert management cybernetics research entity, faculty of economic and management sciences, north-west university, potchefstroom, south africa leon t. de beer workwell research unit, faculty of economic and management sciences, north-west university, potchefstroom, south africa ronalda de beer management cybernetics research entity, faculty of economic and management sciences, north-west university, potchefstroom, south africa
citation mostert, k., de beer, l.t., & de beer, r. (2023). psychometric properties of the flourishing scale for south african first-year students. african journal of psychological assessment, 5(0), a130. https://doi.org/10.4102/ajopa.v5i0.130
review article psychometric properties of the flourishing scale for south african first-year students karina mostert, leon t. de beer, ronalda de beer received: 10 nov. 2022; accepted: 29 jan. 2023; published: 24 mar. 2023 copyright: © 2023. the author(s). licensee: aosis.
this is an open access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. abstract this study focused on a positive construct of well-being, namely flourishing. in a multicultural and diverse country such as south africa, it is a legal requirement to provide evidence that measures of psychological constructs, like flourishing, are fair, unbiased, and equivalent for diverse groups in the country. the aim was to test the psychometric properties of the flourishing scale, a purpose-made scale that measures positive functioning across various areas of life. this study tested the factorial validity, item bias, measurement invariance and reliability of the flourishing scale in a sample of 1088 south african first-year university students. a unidimensional structure was confirmed. although three items showed statistically significant uniform and total bias for language and campus groups, the magnitude and practical impact were negligible. no evidence of bias across gender groups was found. configural, metric and partial scalar invariance were established for language and campus groups. full measurement invariance was established across gender groups. cronbach’s alpha coefficient was 0.91, indicating high reliability. the study provided promising results for using the flourishing scale among south african university students to measure flourishing as an aspect of well-being. contribution: this study contributes to the field of student well-being in south africa. no studies could be found that test for item bias or measurement invariance of the flourishing scale, specifically for south african first-year students. this study is the first to test these psychometric properties of a flourishing scale in a multicultural setting for students from different languages. 
keywords: flourishing; factorial validity; item bias; differential item functioning; measurement invariance; internal consistency; first-year students; university. introduction it is well established that first-year students face various challenges when transitioning from secondary to tertiary education (kelly & finlayson 2016; nair & fisher 2000; van zyl 2016). as students are often far from their loved ones, they feel alone, isolated, and stressed (eagan et al. 2015). as a result, transitioning to higher education and adjusting to all the unfamiliar challenges encountered during the first year can negatively affect students’ well-being (eagan et al. 2015; vuckovic, riley & floyd 2019). however, it is also essential to identify and support students who are doing well and provide resources to help them flourish. the idea of flourishing has emerged as a critical component of subjective well-being (diener et al. 2010). high levels of positive feelings characterise flourishing – the sense that one has a purpose in life, fosters positive relationships with others, cultivates optimism and strengthens high self-esteem (diener et al. 2010). flourishing also refers to a person’s knowledge of their life or how well they believe it to be and is linked to hedonic and eudemonic well-being (keyes 2002). knowledge of students’ levels of flourishing could help higher education institutions (heis) to motivate students to make an effort to achieve their academic objectives, enhance their welfare, and help train productive employees (botha, mostert & jacobs 2019). the flourishing of first-year students is essential to heis, as this affects the process of graduation and their readiness to work (jayawickreme & dahill-brown 2016; schneiderman, ironson & siegel 2005). diener et al. (2010) developed a psychometric scale, the flourishing scale, to answer the need for a purpose-made scale to measure psychological flourishing. 
although the scale does not provide separate measures of the different aspects of flourishing, it gives an overview of positive functioning across areas of life that are generally perceived to be significant. the scale measures universal human psychological needs, meaning and purpose in life, optimism, and feelings of competence (diener et al. 2010). this scale can be a valuable tool for heis to identify students’ levels of flourishing so that they can develop effective interventions, enhance understanding, and learn from students who are doing well and thriving at university. it is crucial to use scales that prove to be psychometrically sound. in a multicultural and diverse country such as south africa, it is vital to test measures of psychological constructs to ensure they are fair, unbiased and equivalent for all ethnicities, languages, and other diverse groups in south africa. south african law requires evidence that tests are appropriate, impartial and unbiased. this is stipulated in the employment equity act 55 of 1998, section 8 (government gazette 1998), which states that any form of psychological test or similar assessment is prohibited unless the test or assessment being used is valid and reliable, can be applied fairly to all employees, and is not biased or discriminating against any employee or group. the rigorous testing of measures in diverse contexts is applicable not only to south africa but also to other countries with diverse student populations. with increasing migration and globalisation, many countries have become more diverse and multicultural (van de vijver & rothmann 2004). this is also true for heis, where there is an influx of international students who need support (mckay, o’neill & petrakieva 2018). multicultural testing is therefore of interest to other diverse settings, including student populations. central to multicultural assessment are the concepts of bias and equivalence (van de vijver & rothmann 2004).
bias refers to certain nuisance factors that impede the comparability of test scores. equivalence testing ensures the comparability of test scores across cultures or groups. when test scores are free of bias and demonstrate equivalence (or invariance), the scores can be compared across cultures or different sub-groups. of particular interest are item bias and measurement invariance. item bias (also referred to as differential item functioning [dif]) occurs when respondents from different groups score differently on the item, even though they have the same standing on the underlying construct. familiar sources of item bias include: differential response styles, poor item translation and ambiguous items, and the connotative meaning and appropriateness of the item content based on cultural specifics. measurement invariance has: (1) configural invariance (the extent to which a factor structure can be replicated across groups), (2) metric invariance (equal factor loadings for similar items across groups), and (3) scalar invariance (similar meaning or interpretation for different groups) (laher 2008; van de vijver & rothmann 2004). in addition, confirmatory factor analysis (cfa) and internal consistency (cronbach’s coefficient alpha) were used to test the factor structure and reliability of the flourishing scale. concerning factorial validity, the scale has a one-factor structure (didino et al. 2019; duan & xie 2019; muñoz & nieto 2019; singh, junnarkar & jaswal 2016), also in student samples (hone, jarden & schofield 2014; senol-durak & durak 2019; sumi 2014). many studies have shown that the flourishing scale has a high level of internal consistency, with cronbach’s alpha coefficients ranging from 0.80 to 0.91 (choudhry et al. 2018; didino et al. 2019; muñoz & nieto 2019; singh et al. 2016). no studies could be found that test for item bias or measurement invariance of the flourishing scale, specifically for south african first-year students. 
therefore, this study aims to provide psychometric evidence for the applicability of the flourishing scale in the diverse context of a south african university. more specifically, this study tested the factorial validity, item bias, configural, metric and scalar invariance, and internal consistency of the scale among first-year university students.
methods
participants
the study’s target demographic group was first-year university students enrolled at a south african university. a sample of 1088 participants was used, of whom 72.4% were between the ages of 17 and 20 years and 16.7% were between 21 and 22 years. south africa has 11 official languages distributed in different parts of the country. the languages most frequently used by students of the participating university were included in the analyses: afrikaans (260, 23.9%), setswana (199, 18.3%), sesotho (152, 14.0%) and english (94, 8.6%). the university has three campuses: campus 1 is located in a peri-urban area (131, 12%), campus 2 is located in a medium-sized urban city (478, 43%), and campus 3 is a smaller campus located in a large industrial city. in total, 689 (63.3%) females and 319 (29.3%) males participated in the study. most participants were black students (62.3%), followed by white students (22.2%).
instrument
the flourishing scale (diener et al. 2010) is a concise eight-item measure of respondents’ self-perceived performance in critical life domains such as relationships, self-esteem, intention, and optimism. a 7-point likert scale was used, ranging from 1 (strongly disagree) to 7 (strongly agree). an example item is: ‘i lead a purposeful and meaningful life’. a high score indicates that the individual possesses psychological resources and strengths. the scale showed good psychometric qualities. the cronbach’s alpha coefficient is reported as 0.82 (diener et al. 2010).
procedure
the participating university accepted and authorised the project, and the study was granted ethics clearance.
a secure direct link to the questionnaire was put on the university’s online portal. throughout the study’s duration, students were informed about the research and encouraged to participate voluntarily. this was accomplished through field workers who presented brief awareness sessions in classrooms. before completing the questionnaire, participants were required to sign an informed consent form. furthermore, participants were assured that their reported responses would be anonymous, that the data gathered in the study would adhere to the project’s confidentiality criteria, and that the findings would be carefully stored in a secure database that would be password protected. data analysis mplus 8.6 (muthén & muthén 2021) was used to conduct the statistical analyses. confirmatory factor analysis was used to test the factorial validity of the flourishing scale. maximum likelihood estimation was used, with the covariance matrix as input. the following fit indices were considered to assess the fit of the measurement model: the χ² statistic, the comparative fit index (cfi), the tucker–lewis index (tli), the root mean square error of approximation (rmsea), and the standardised root mean square residual (srmr). proper fit is considered at a value of 0.90 and above for the cfi and tli (byrne 2001; hoyle 1995). for the rmsea, a value of 0.05 or less indicates a good fit, whereas values between 0.05 and 0.08 are considered an acceptable model fit (browne & cudeck 1993; chen et al. 2008). differential item functioning was used to test for the presence of item bias for language (four of the languages most frequently used by students at the participating university: afrikaans, setswana, sesotho and english), campus (the three campuses described here) and also included males and females. two forms of bias were tested: uniform and non-uniform bias. 
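As a sketch of what the uniform and non-uniform bias tests compare: in the ordinal logistic regression framework of Choi et al. (2011), three nested models are fitted for each item. The notation below is chosen here for illustration and is an assumption rather than a quotation from the article: $u_i$ is the response to item $i$, $k$ a response category, $\theta$ the estimate of the underlying construct, and $g$ the group indicator (language, campus or gender).

```latex
\begin{align}
\text{Model 1:}\quad & \operatorname{logit} P(u_i \ge k) = \alpha_k + \beta_1\,\theta \\
\text{Model 2:}\quad & \operatorname{logit} P(u_i \ge k) = \alpha_k + \beta_1\,\theta + \beta_2\,g \\
\text{Model 3:}\quad & \operatorname{logit} P(u_i \ge k) = \alpha_k + \beta_1\,\theta + \beta_2\,g + \beta_3\,\theta g
\end{align}
```

Read this way, a significant improvement from Model 1 to Model 2 reflects a group effect (uniform bias), from Model 2 to Model 3 a group-by-construct interaction (non-uniform bias), and from Model 1 to Model 3 the total effect.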
uniform bias refers to a systematic difference between the compared groups in the likelihood of a response that is constant across all levels of the underlying construct (swaminathan & rogers 1990; teresi & fleishman 2007). non-uniform bias refers to a difference between groups in the likelihood of a response that varies across levels of the underlying construct (swaminathan & rogers 1990; teresi & fleishman 2007). the lordif package (choi, gibbons & crane 2011) in rstudio (rstudio team 2020) was used. three nested ordinal logistic regression models (models 1, 2 and 3) were compared to test for uniform and non-uniform bias, generating three likelihood-ratio χ² statistics (choi et al. 2011). biased items are flagged when the comparison of the models’ log-likelihood values is statistically significant (p < 0.01): uniform bias when comparing models 1 and 2, non-uniform bias when comparing models 2 and 3, and a total dif effect when comparing models 1 and 3 (choi et al. 2011). the pseudo-mcfadden r2 statistic is used to quantify the impact or practically significant effect of dif, classifying the magnitude of dif as negligible (< 0.13), moderate (between 0.13 and 0.26), or large (> 0.26) (zumbo 1999). in addition, the impact of uniform dif can be determined from the change in the β1 coefficient when models 1 and 2 are compared (crane, van belle & larson 2004). different thresholds have been used for this change, with a difference of 10% between models 1 and 2 among those taken to indicate a practically meaningful effect (crane et al. 2004; maldonado & greenland 1993).
measurement invariance was investigated for the same language, campus, and gender groups. this was carried out in a multigroup analysis framework including the (1) configural invariance model (i.e.
the baseline model for the more constrained models and the test if a similar underlying latent factor is evident in the different groups); (2) metric invariance model (assumes the invariance or similarity of the factor loading in the different groups); and (3) scalar invariance model (test if the factor loadings and item intercepts are invariant or similar in the different groups) (preti et al. 2013). the cfi and rmsea values were used. for cfi, the fit is considered adequate if values are > 0.90 and better if they are > 0.95. for rmsea, the cut-off value is < 0.08, but better is < 0.05 (van de schoot, lugtig & hox 2012). in addition, changes in cfi were used as recommended by shi et al. (2019). a δcfi value higher than 0.01 between two nested models indicates that the added group constraints have led to a poorer fit; in other words, the more constrained model is rejected. by freeing the loading of items, partial metric invariance can be achieved (cheung & rensvold 2002; preti et al. 2013). cronbach’s alpha coefficient was used to determine the reliability of the scales. a cut-off point of 0.70 is deemed satisfactory (nunnally & bernstein 1994). ethical considerations the study was approved by the ethics committee, faculty of economic and management sciences (ec-ems) (ethics no.: nwu-hs-2014-0165-a4). before completing the questionnaire, participants were required to sign an informed consent form. in addition, participants were assured that their reported responses would be anonymous, that the data gathered in the study would adhere to the project’s confidentiality criteria, and that the findings would be stored in a secure database that is password protected. results factorial validity with regard to the factorial validity of the flourishing scale, a one-factor structure showed a good fit to the data (χ2 = 180.11; df = 19; cfi = 0.94; tli = 0.91; rmsea = 0.079; srmr = 0.04). the standardised loadings are shown in table 1. table 1: standardised factor loadings. 
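The fit criteria listed in the data analysis section can be applied mechanically to the values reported above. The sketch below is illustrative only: the function name `evaluate_fit`, the dictionary keys and the label for RMSEA values above 0.08 are assumptions made here, while the cut-offs are the ones the article cites (0.90 for CFI and TLI; 0.05 and 0.08 for RMSEA).

```python
def evaluate_fit(cfi: float, tli: float, rmsea: float) -> dict:
    """Compare fit indices with the cut-offs cited in the article."""
    # CFI and TLI: values of 0.90 and above are treated as adequate.
    incremental_ok = cfi >= 0.90 and tli >= 0.90
    # RMSEA: <= 0.05 good, 0.05-0.08 acceptable; the label beyond 0.08
    # is an assumption, since the article does not name that band.
    if rmsea <= 0.05:
        rmsea_band = "good"
    elif rmsea <= 0.08:
        rmsea_band = "acceptable"
    else:
        rmsea_band = "outside cited bands"
    return {"cfi_tli_adequate": incremental_ok, "rmsea_band": rmsea_band}


# Values reported for the one-factor model of the flourishing scale.
print(evaluate_fit(cfi=0.94, tli=0.91, rmsea=0.079))
```

With the reported values, CFI and TLI clear 0.90, while the RMSEA of 0.079 falls in the 0.05 to 0.08 band that the cited criteria describe as acceptable.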
all items had high factor loadings (λ) (shevlin et al. 1998), ranging from 0.65 (item 8) to 0.80 (item 1).
item bias (differential item functioning)
uniform, non-uniform and total bias were tested (see table 2).
table 2: differential item functioning.
items 2, 3 and 7 showed statistically significant uniform and total bias for the included language and campus groups, while no bias was detected between males and females. to determine whether the magnitude of dif for these three items was of practical significance, pseudo-mcfadden r2 values and the difference in the β1 coefficient were inspected. in addition, graphs are provided for each item to demonstrate the effect between language and campus groups (figures 1 to 6). each of these figures presents four graphs providing additional diagnostic information: the item characteristic curve for the different groups (in this case, language and campus groups; upper-left graph); the item response functions for the parameter estimates for each group (lower-left graph); the absolute difference between item characteristic curves for sub-groups (upper-right graph); and the absolute difference between the item characteristic curves of the sub-groups weighted by the score distribution (lower-right graph) (choi et al. 2011).
figure 1: graphical display of item 2, which shows uniform and non-uniform differential item functioning with respect to language groups. (a) items true score functions item 2; (b) differences in items true score functions; (c) item response functions; (d) impact (weighted by density).
figure 2: graphical display of item 3, which shows uniform and non-uniform differential item functioning with respect to language groups. (a) items true score functions item 3; (b) differences in items true score functions; (c) item response functions; (d) impact (weighted by density).
figure 3: graphical display of item 7, which shows uniform and non-uniform differential item functioning with respect to language groups. (a) items true score functions item 7; (b) differences in items true score functions; (c) item response functions; (d) impact (weighted by density).
figure 4: graphical display of item 2, which shows uniform and non-uniform differential item functioning with respect to campuses. (a) items true score functions item 2; (b) differences in items true score functions; (c) item response functions; (d) impact (weighted by density).
figure 5: graphical display of item 3, which shows uniform and non-uniform differential item functioning with respect to campuses. (a) items true score functions item 3; (b) differences in items true score functions; (c) item response functions; (d) impact (weighted by density).
figure 6: graphical display of item 7, which shows uniform and non-uniform differential item functioning with respect to campuses. (a) items true score functions item 7; (b) differences in items true score functions; (c) item response functions; (d) impact (weighted by density).
for all three items, the curves differed slightly between the language groups and between the campus groups; however, these differences were negligible, as can be seen in the density-weighted impact in each figure (bottom-right plots). also, the pseudo-mcfadden r2 values were all smaller than 0.13 and the differences in β1 coefficients were smaller than 5%. as a result, the magnitude or practical impact of dif for these three items can be classified as negligible.
measurement invariance
the results of the configural, metric and scalar invariance testing across the language, campus, and gender groups included in this study are shown in table 3.
table 3: measurement invariance analysis.
with regard to language and campus, configural and metric invariance were established.
the results of scalar invariance showed that δcfi for language was –0.024 and for campus –0.018 (both changes larger than 0.01 in absolute value). consequently, partial scalar invariance was established by releasing the intercepts of items 4 and 7 in the afrikaans and english language groups and of items 3 and 7 in all three campus groups. configural, metric and scalar invariance were confirmed for gender.
internal consistency
cronbach’s alpha coefficient was calculated as a measure of the internal consistency of the flourishing scale. with α = 0.91, the flourishing scale was found to be reliable (nunnally & bernstein 1994).
discussion
this study aimed to test the psychometric properties of the flourishing scale to determine if this scale is valid and reliable for assessing flourishing, a positive construct of psychological well-being, in south african first-year university students. the study’s primary objective was to determine the factorial validity, item bias, configural, metric and scalar invariance, and internal consistency. concerning the factorial validity, the results showed that a one-factor structure was a good fit for the data. the findings are consistent with previous studies, where a one-factor structure was confirmed in student samples from new zealand, turkey, and japan (hone et al. 2014; senol-durak & durak 2019; sumi 2014). differential item functioning was used to determine uniform and non-uniform bias. statistically significant uniform and total bias were found across language and campus groups for items 2, 3 and 7. however, the magnitude or practical impact of this bias was negligible. this means that, on a practical level, the language, campus, and gender sub-groups included in this study understood the items in the same way, and that no incongruities at the item level exist for participants in these sub-groups (cleary & hilton 1968; van de vijver & tanzer 2004).
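The scalar invariance decision reported in the results follows from the δcfi rule given in the data analysis section: when CFI changes by more than 0.01 between two nested models, the more constrained model is rejected. A minimal sketch of that rule is shown below; the function name `constrained_model_rejected` is an assumption made for illustration, and the gender value is a hypothetical placeholder because the text does not quote it.

```python
def constrained_model_rejected(delta_cfi: float, threshold: float = 0.01) -> bool:
    """Return True when the change in CFI between nested models exceeds the threshold."""
    # The sign of delta CFI depends on the direction of subtraction,
    # so the magnitude of the change is compared with the threshold.
    return abs(delta_cfi) > threshold


# Language and campus values are reported in the text; the gender value
# is HYPOTHETICAL and only illustrates a change within the threshold.
delta_cfi_scalar = {"language": -0.024, "campus": -0.018, "gender": -0.005}

for grouping, delta in delta_cfi_scalar.items():
    outcome = "rejected" if constrained_model_rejected(delta) else "retained"
    print(f"{grouping}: full scalar model {outcome}")
```

For language and campus the full scalar model is rejected under this rule, which is why selected item intercepts were released to reach partial scalar invariance.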
regarding measurement invariance, configural invariance was established for all included sub-groups. the results show that the one-factor structure of the flourishing scale has the same pattern and fits the data equally well in all groups. therefore, the factor structure can be replicated similarly for different language, campus and gender groups (byrne, shavelson & muthén 1989; putnick & bornstein 2016). metric invariance was also established for all sub-groups, indicating that the loading of each item contributes equally to the latent construct of flourishing across the different groups. although scalar invariance was confirmed for gender, only partial scalar invariance was established for language and campus groups because of the δcfi values higher than 0.01 (cheung & rensvold 2002; preti et al. 2013). this implies that specific item intercepts were not equivalent between language and campus groups. as a result, the intercepts of items 4 and 7 of two language groups (i.e. afrikaans and english) and items 3 and 7 in all three campus groups had to be released to establish partial invariance. even though these parameters can vary across groups, valid inferences can still be made when at least two intercepts and factor loadings are equally constrained, which is in line with the findings of previous studies (laguna et al. 2017; van de schoot et al. 2012). the cronbach’s alpha coefficient was calculated to determine the internal consistency of the flourishing scale and showed a reliability coefficient of 0.91. various research studies have found that the flourishing scale has a high level of internal consistency, with cronbach’s alpha coefficients ranging from 0.80 to 0.91 (choudhry et al. 2018; didino et al. 2019; muñoz & nieto 2019; singh et al. 2016). limitations and recommendations even though the findings of this study are promising, several limitations must be mentioned. the study’s primary focus was on first-year university students in south africa. 
therefore, the study should be replicated for senior students, other universities, and other countries with multicultural populations. south africa has 11 official languages, of which only 4 were included in this study. other language groups should also be included in future studies. three items seemed to be somewhat problematic (items 3, 4 and 7) regarding bias and invariance. even though the practical effect was small and negligible, future studies should investigate how these items function in other samples. conclusion this study provides initial support for using the flourishing scale in a south african sample of first-year university students and opens the way for its further use in other student samples. the scale demonstrated high reliability, and the dif and invariance analyses confirmed that no practically significant incongruities exist between language, campus, and gender groups. acknowledgements competing interests the authors declare that they have no financial or personal relationships that may have inappropriately influenced them in writing this article. authors’ contributions k.m. conceived of the presented idea and supervised the study. k.m. and l.t.d.b. verified the analytical methods. l.t.d.b. conducted the statistical analyses. r.d.b. wrote the original draft and k.m. supervised the study while reviewing and editing the manuscript. l.t.d.b. assisted with the interpretation of the results. k.m. provided necessary resources and acquired the funding for the project. all authors discussed the results and contributed to the final manuscript. funding information the material described in this article is based on work supported by: (1) the office of the deputy vice-chancellor: teaching and learning at the university. data availability derived data supporting the findings of this study are available from the corresponding author, k.m., on request. 
disclaimer the views and opinions expressed in this article are those of the authors and do not necessarily reflect the official policy or position of any affiliated agency of the authors and the publisher.
about the author(s) cebokazi n. mtati department of psychology, faculty of community and health sciences, university of the western cape, cape town, south africa erica munnik department of psychology, faculty of community and health sciences, university of the western cape, cape town, south africa
citation mtati, c.n., & munnik, e. (2023). instruments measuring emotional-social competence in preschoolers in south africa: a review study. african journal of psychological assessment, 5(0), a111. https://doi.org/10.4102/ajopa.v5i0.111
review article instruments measuring emotional-social competence in preschoolers in south africa: a review study cebokazi n. mtati, erica munnik received: 08 apr. 2022; accepted: 17 aug. 2022; published: 11 jan. 2023 copyright: © 2023. the author(s). licensee: aosis. this is an open access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
abstract south african children still enter mainstream education with their emotional and social well-being compromised. therefore, an awareness of and emphasis on emotional social competencies as a domain of school readiness is essential.
this review aimed to identify and describe instruments measuring emotional and social competency as a domain of school readiness in preschoolers and report on their psychometric properties. the study utilised a systematic review design. peer-reviewed articles that met the inclusion criteria were identified from six literature databases using boolean phrases. grey literature was also considered. the title search yielded 3872 articles. fifty-four articles were screened based on the abstract. from among these, four articles met the minimum threshold of 80% in the appraisal phase, proceeded to data extraction and were subjected to thematic synthesis in the summation phase. the emotional social screening tool for school readiness (e3sr), the emotional competence for screening for preschoolers (sce), the social competence for screening for preschoolers (scs), the preschool behavioural and emotional rating scale (prebers) and the school readiness screening instrument for grade 00 (pre–grade r) were identified as instruments that measure domains of emotional social competence in preschool children. the instruments displayed good psychometric characteristics. the e3sr and the school readiness screening instrument for grade 00 were locally constructed and deemed contextually appropriate for use in the south african context. the need for locally developed, standardised, cost-effective measures to supplement assessment in the educational environment remains a focus for further research. contribution: this review contributed to the body of knowledge related to contextually appropriate, psychometrically sound, accessible and affordable screenings available to schools and parents to assess emotional and social competencies in preschoolers in south africa. keywords: assessment; emotional social competency; instruments; preschoolers; school readiness; south africa. 
introduction the development of emotional and social skills or competencies is imperative and plays a vital role in children’s school readiness and adjustment (blair & peters, 2003; denham et al., 2014). in this article, competency is understood as the emotional and social age-appropriate behaviours that children possess and utilise effectively, resulting in emotional and social competence to enter formal schooling. it also refers to learned skills more broadly defined to include ‘the acquisition or development of specific capacities, abilities, aptitudes or competencies’ (gilbert et al., 2004). according to swim (2007), the attainment of social and emotional competencies is influenced by the social and cultural context in which children develop. bustin (2007) and mohamed (2013) acknowledge the fact that most preschoolers are unprepared to enter formal schooling because of inadequate exposure to early childhood learning opportunities and socio-economic challenges. the need to consider contextual factors that impact schools and communities to accommodate children’s unique learning needs is receiving ongoing attention (kokkalia et al., 2019). contextual factors such as the readiness of educational institutions to accommodate diversity, the family’s responsiveness towards children’s readiness and broader community factors such as the effect of violence or substances on the developmental trajectory of the child (kokkalia et al., 2019; munnik & smith, 2019) need to be kept in mind. a recent study conducted by wu et al. 
(2020) found that mothers in lower socio-economic environments who were diagnosed with depression tended to experience challenges in their marriages and parenting practices, which impacted negatively on their children’s ability to develop the emotional and social skills required to establish and maintain interpersonal relationships in the early school environment. this finding testifies to the importance of keeping contextual factors in mind when a child is assessed for school readiness. children from disadvantaged backgrounds may find school adjustment and learning challenging as they need to adjust to a new school environment and establish relationships with peers and teachers (munnik & smith, 2019). if they are unable to establish these relationships, they will struggle to adjust to the various demands of conventional or formal schooling (puckett & black, 2002). given that many south african children enter mainstream schooling with their emotional, social, physical and intellectual well-being compromised (laher & cockcroft, 2013), it is of utmost importance that emphasis is placed on the development of children’s emotional and social skills. in addition to cognitive skills, emotional and social skills are identified as important in the establishment of children’s readiness to enter mainstream education. children who struggle with emotional regulation or management, specifically in dealing with negative emotions, may struggle to focus on learning, whereas those who have acquired adequate emotional regulation skills, or who manage their emotions in socially acceptable ways, are better able to engage in classroom activities, which makes learning easier for them (denham et al., 2014). furthermore, schultz et al.
(2010) indicate that emotional regulation skills help preschoolers to solve social problems and to engage in prosocial behaviour and effective communication instead of aggressive or oppositional behaviour. similarly, rademacher and koglin (2019) indicate that children who lack emotional skills have difficulty accessing competent solutions in the face of challenging situations and tasks and tend to react in oppositional or aggressive ways to solve problems, in comparison to children who have established these skills. it is clear that age-appropriate emotional and social skills remain vital for school readiness and academic success for the preschooler (mtati, 2020). school readiness assessments are one way to establish whether children are ready to enter mainstream schooling. as part of establishing grade r learners’ readiness in south africa, foundation phase teachers conduct continuous assessments primarily through observation, as prescribed by the department of basic education (dbe, 2014). in addition, collateral information from parents and other role players such as paediatricians, social workers, occupational therapists, speech therapists and psychologists may be used to learn about the learners’ abilities. school readiness assessments are seen as an additional source of information that might be used to establish if children are ready to enter mainstream education (laher & cockcroft, 2013). school readiness assessment measures can be classified as either screening or diagnostic measures. screening measures are usually cost effective, easy to use and used by multiple raters to establish if further in-depth assessment is deemed necessary (munnik, 2018). ştefan et al. (2009) propose that screening measures provide a relatively good indication of whether a child is likely to have mastered the targeted construct or ability.
in contrast, diagnostic instruments are usually used to establish a formal diagnosis to inform specific treatment plans (foxcroft & roodt, 2013). diagnostic measures are usually used by trained professionals such as psychologists or psychiatrists. laher and cockcroft (2013) emphasised the lack of south africa–based literature on emotional and social competency as a domain of school readiness. in addition, amod and heafield (2013) argue that there is a lack of psychometrically sound locally developed school readiness assessment tools in south africa. munnik (2018) adds that most of the existing measures are not appropriate for use and are not able to cater to the range of children attending schools from diverse cultural and social backgrounds in the south african context. according to the literature, most of the instruments were developed more than 20 years ago, and in a post-apartheid south african setting, these assessments are out of date and inappropriate (mohamed, 2013; munnik et al., 2021). a few examples of assessments still used by practitioners in south africa to establish children’s readiness for school are the junior south african individual scales (jsais) (madge et al., 1985), which assess cognitive abilities; the griffiths developmental scales iii (stroud, 2016), which assess foundations of learning, memory and social emotional development; the aptitude test for school beginners (asb), which assesses aptitudes necessary to be school ready (human sciences research council of south africa [hsrc], 2010); and the vineland adaptive behaviour scales, which assess communication, daily living skills, socialisation, motor skills and maladaptive behaviour (roopesh, 2019). however, most of these instruments were developed abroad, with only the jsais and asb being developed locally more than 20 years ago. limited research has been conducted on the validity and reliability of these instruments for use in a multicultural south africa (mtati, 2020).
school readiness assessment practices prioritise motor development and broader cognitive and academic abilities and competencies as a domain of school readiness (amod & heafield, 2013) and exclude the assessment of the emotional and social aspects of the child (munnik & smith, 2019). therefore, more effective school readiness screening instruments that assess emotional social skills are important for the accurate measurement of young children’s emotional and social abilities or competencies during their preschool years (munnik, 2018). this review consolidated recent literature (2008–2018) on psychometric assessments that assess emotional or social competency as a domain of school readiness. the following research questions were investigated: what is the methodological quality of the studies related to psychometric assessments that assess emotional and social competency as an identified area or domain of school readiness? which instruments developed locally or abroad are currently available and appropriate to assess emotional and social competence or skills as a domain of school readiness in a multicultural south african context? how is emotional social competence operationalised? what are the technical qualities of the identified psychometric assessments that assess emotional and social competency in school-ready children? methods research design this study used a systematic review methodology and considered peer-reviewed, full-text studies that used a quantitative design, published from 2008 to 2018. the target population was preschool children between the ages of 4 and 6 years. this study expanded on the systematic review project conducted by munnik et al. (2015). search process the present study adopted the preferred reporting items for systematic reviews and meta-analyses (prisma) model cited in liberati et al. (2009). 
preferred reporting items for systematic reviews and meta-analyses recommends that systematic reviews comprise four levels of review which include identification, screening, eligibility (quality appraisal) and summation. studies were retrieved from two core sources: database searches and grey literature. based on their focus on psychology and education, the following databases were searched: academic search complete, ebscohost, education resources information center (eric), google scholar, psycarticles, psycinfo, sabinet, sage online and socindex. grey literature was included in the form of unpublished south african doctoral dissertations. the following search terms were used and combined in 11 boolean phrases: ‘emotional social competency’, ‘assessment’, ‘emotional competency’, ‘social competency’, ‘school readiness instrument’, ‘preschool’, ‘emotional social intelligence’, ‘emotional social readiness’ and ‘screening instrument’ in the identification phase of filtering. the articles that made it through the title and abstract searches were appraised by the use of the smith franciscus swartbooi (sfs) quality appraisal tool developed by smith et al. (2015). two reviewers were independently involved in the title and abstract search process with the aim of promoting and maintaining methodological rigour. a third reviewer was identified to assist with the appraisal of the extracted articles. appraisal was done independently. after the individual appraisals, the reviewers’ scores were compared; scores that differed were discussed until consensus was reached. there were minor discrepancies noted between the scores of the reviewers initially. most of the discrepancies were because of differences in scoring for the methodological rigour subsection of the sfs. both reviewers read through the sections of articles related to methodological rigour, noting the reason for discrepancies. this was resolved through discussions until an agreement was reached. 
finally, once the discrepancies were dealt with, only articles with a score of 80% and above, the set threshold on sfs, were accepted to proceed to the summation phase. study eligibility and appraisal the systematic review considered south african and international studies that included school readiness assessment instruments with a focus on emotional and social competency as an identified area or domain of school readiness in preschool children between the ages of 4 and 6 years. in south africa, the preschool population is defined by the south african schools act (republic of south africa, 1996), specifying that children need to enter grade 1 in the year that they are 7 years old. peer-reviewed, full-text studies that used a quantitative design that contained the highest level of evidence from 2008 to 2018 were included. the sfs appraisal tool was used to appraise articles. it has a total of 28 questions within three sections which include purpose of the measure, methodological rigour and general considerations. based on the overall quality of the article, each article was appraised and scored to obtain a total score (percentage) categorised as either weak (0% – 40%), moderate (41% – 60%), strong (61% – 80%) or excellent (80% – 100%). in order to be included in the current study, each article had to achieve 80% or above to ensure that only high-quality articles were used to extract relevant information in the summation phase. summation the focus of the review was descriptive, not statistical, generalisability; therefore, thematic synthesis was employed (gough et al., 2017), which involves the integration of findings and results aiming to provide a broad description of the research phenomenon. a self-developed data-extraction table was used to extract descriptive data (type of design, methodology and outcomes) to report on the study characteristics. thematic synthesis was employed to gather and synthesise information relating to the research aims. 
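the appraisal bands and the inclusion rule described under study eligibility and appraisal can be sketched in a few lines of python. this is only an illustration: the function names are hypothetical and are not part of the sfs tool, the per-question scoring of the 28 items is not described in the text and is therefore not modelled, and because the text lists both strong (61% – 80%) and excellent (80% – 100%), the sketch assumes that a score of exactly 80% counts as excellent, in line with the stated requirement of 80% or above.

```python
def sfs_band(score_pct: float) -> str:
    """Map an SFS total score (a percentage) to the quality band named in the text."""
    if not 0 <= score_pct <= 100:
        raise ValueError("score must be a percentage between 0 and 100")
    # Assumption: the text lists strong as 61-80% and excellent as 80-100%.
    # Inclusion required "80% or above", so exactly 80 is treated as excellent.
    if score_pct >= 80:
        return "excellent"
    # Assumption: fractional scores between two listed bands fall into the higher band.
    if score_pct > 60:
        return "strong"
    if score_pct > 40:
        return "moderate"
    return "weak"


def proceeds_to_summation(score_pct: float, threshold: float = 80.0) -> bool:
    """Return True when an appraised study meets the inclusion threshold."""
    return score_pct >= threshold


if __name__ == "__main__":
    # Hypothetical scores, used only to show the banding; they are not from the review.
    for score in (35.0, 55.0, 75.0, 80.0, 92.0):
        print(score, sfs_band(score), proceeds_to_summation(score))
```

under this reading, a study rated strong at 75% would still be excluded, because only scores at or above the 80% threshold proceed to the summation phase.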
process results identification the title search yielded a result of 3872 articles via the database. during title search, 157 duplicates were identified and removed, and a total of 3663 titles were excluded from the review as they were considered inappropriate at face value. screening fifty-two articles were screened by abstract based on the inclusion criteria. thirty-seven abstracts were excluded because of their focus on intervention as well as the age of the participants not meeting the requirements of 4–6 years old as stipulated. policy reports, reviews and correlation studies were excluded, as well as articles that purely focused on cognitive abilities. at this stage, a decision was made to include grey literature in the form of unpublished south african doctoral dissertations, because insufficient south africa–based articles were found. the dissertation by munnik (2018) was found on google scholar, and the dissertation by mohamed (2013) was found via a preliminary search on google. eligibility at the end of the screening stage, 15 articles and two unpublished theses were retained for quality appraisal. of these, only two articles and the two unpublished theses were eligible for inclusion in the final summation based on their scores above the 80% threshold obtained on the sfs. articles excluded lacked detail in methodological rigour and did not report on item selection, assembling of the items, development of administration instructions and gender appropriateness. ethical considerations permission to conduct the study (reference number: hs19/6/7) was obtained from the humanities and social science research ethics committee of the university of the western cape. ethical guidelines to conduct a systematic review included using systematic, explicit, unbiased, transparent, rigorous and reproducible methods to synthesise and integrate evidence. 
to ensure that reliable and valid sources of data were used in the systematic review, search databases endorsed by the university of the western cape were used. permission to use the smith franciscus swartbooi (sfs) appraisal tool was also obtained from the developer. the authors of the original work were appropriately cited, so that there was no violation of copyright or intellectual property. summation of the review findings study characteristics the included studies (n = 4) represented various countries: two studies were conducted in south africa (mohamed, 2013; munnik, 2018), one in europe (romania) (ştefan et al., 2009) and one in america (washington, dc) (epstein et al., 2009). the studies provided an overview of the development of the instrument and proceeded with a detailed discussion of the technical qualities and psychometric characteristics of the instruments. most studies used survey design as the primary data-collection method to establish the factor structures of the tests. qualitative methods were used to report on content and face validity. sample sizes ranged from 310 preschool children (ştefan et al., 2009) to 1471 preschool children (epstein et al., 2009). urban samples were used in all studies, with the exception of ştefan et al. (2009), who included urban and rural samples. stratification of samples was employed in all studies, including children from low, medium and high socio-economic groupings, with similar ratios for boys and girls. english versions of the protocols were used in all studies, except the scs and sce (ştefan et al., 2004), where protocols were administered in romanian or english.
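the record counts reported for the identification, screening and eligibility stages can be cross-checked with simple arithmetic. the sketch below uses only the figures stated in the text; the function name and dictionary keys are hypothetical labels chosen for illustration. adding the two dissertations to the articles screened by abstract gives 54 documents, which may be why the abstract reports fifty-four, although the text does not say so explicitly.

```python
def screening_flow() -> dict:
    """Recompute the record counts reported for each stage of the review."""
    identified = 3872          # articles returned by the title search
    duplicates = 157           # duplicates removed during the title search
    excluded_on_title = 3663   # titles judged inappropriate at face value
    screened_abstracts = identified - duplicates - excluded_on_title

    excluded_on_abstract = 37  # abstracts excluded against the inclusion criteria
    articles_retained = screened_abstracts - excluded_on_abstract

    grey_literature = 2        # unpublished South African doctoral dissertations
    appraised = articles_retained + grey_literature

    included = 2 + 2           # two articles and two dissertations above the SFS threshold
    return {
        "screened_abstracts": screened_abstracts,
        "articles_retained": articles_retained,
        "appraised": appraised,
        "included": included,
    }


if __name__ == "__main__":
    for stage, count in screening_flow().items():
        print(f"{stage}: {count}")
```

the derived values agree with the text: 52 articles screened by abstract, 15 articles retained, 17 documents appraised and 4 studies included in the summation.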
instruments and their characteristics the identified instruments measuring emotional and social skills in preschoolers include the emotional social screening tool for school readiness (e3sr) (munnik, 2018), the school readiness screening instrument for grade 00 (pre–grade r) (mohamed, 2013), the emotional competence screening for preschoolers (sce) and social competency screening for preschoolers (scs) (ştefan et al., 2009) and the preschool behavioural and emotional rating scale (prebers) (epstein et al., 2009). all of the measures were developed in the years specified above. the prebers and e3sr were identified as strength-based measures, designed to assess preschoolers’ emotional social skills and competencies, while the sce, scs and the school readiness screening instrument for grade 00 (pre–grade r) were identified as measures to identify developmental and academic risk in preschool children. all instruments were appropriate for use across the preschool age group, although two of these instruments focus on the age groups 3–5 years (prebers) and 4–5.5 years (school readiness screening instrument for grade 00 [pre–grade r]). the e3sr focuses on the 5–7 years age group and the scs and sce on 5–7.5 years (the scs and sce also have scales for the younger age groups, 2.5–4 years and 4–5 years), thus targeting a broader age group. as the age requirement for grade 1 is 7 years in south africa, it can be assumed that the scales developed for the 5–7 age group might be the most appropriate scales to use to establish readiness on an emotional social level before entry to mainstream education in grade 1. in terms of administration, likert scales were used in all the instruments as the preferred rating scale. there was variability across the instruments concerning the duration of administration, ranging from 10 minutes (scs & sce) to 15–20 minutes (e3sr). likewise, the number of items varied across instruments, ranging from 42 to 57 items.
the screening instruments require either parents or teachers who are familiar with the child’s skills and behavioural traits to complete the questionnaires. the school readiness screening instrument for grade 00 (pre–grade r) (mohamed, 2013) is the only one of the identified instruments that has a shortened version. shortened versions are usually easy to administer and more cost effective, and they assist with screening to establish if a more comprehensive assessment needs to be conducted (kruyen et al., 2013). theoretical and operational definitions the instruments operationalised emotional and social competence by covering multiple subdomains, with their respective items linked to each domain. the items included in the various domains and subdomains of each instrument were closely linked to their theoretical and operational definitions. table 1 provides an overview of the theoretical definitions as well as the domains and subdomains as operationalised in the instruments. table 1: theoretical and operational definitions. table 1 shows that mohamed (2013), munnik (2018) and ştefan et al. (2009) provided theoretical definitions of emotional and social competency as separate constructs. they divided social and emotional skills into two distinct but interrelated domains. there were similarities in the definitions, as they all viewed emotional competency as inclusive of the way that a child deals with and is able to cope with emotions in different contexts. for munnik (2018), emotional competency is inward-focused behaviour that is driven by the child’s internal sense of self and that allows the child to manage age-appropriate challenges. for mohamed (2013), emotional competency is the ability to express and understand emotions and the ability for emotional regulation in self and others. for ştefan et al. (2009), emotional competency is related to the child’s independence in dealing with emotion-provoking situations.
the authors’ definitions of social competency also portrayed similar understandings, being inclusive of interactions and engagement with the social environment to achieve certain goals or tasks. munnik (2018) defined social competency as focusing on relationships with the external environment and on interactional relationships with people and cooperative activities such as play. mohamed (2013) viewed social competency as the child’s way of thinking, feeling and behaving to achieve social tasks. ştefan et al. (2009) describe social competency as the ability to exhibit socially acceptable behaviours with positive outcomes that allow children to achieve their goals. epstein et al. (2009) did not include theoretical or conceptual definitions in their article, as the main focus of the article was on the establishment of the scientific standards of the prebers and not on the construction per se. operational definitions the most comprehensive coverage was provided by munnik (2018), who included five subdomains of emotional competency (emotional maturity, emotional management, independence, sense of self and mental well-being and alertness) and four subdomains of social competency (social skills or confidence, prosocial behaviour, compliance with rules and communication skills). ştefan et al. (2009) covered three subdomains of emotional competency (emotional understanding, emotional expression, emotional regulation) and three subdomains of social competency (compliance with rules, interpersonal skills, prosocial behaviour). similarly, mohamed (2013) covered three subdomains within the emotional domain (empathy, emotional regulation and self-confidence) and three subdomains within the social domain (interpersonal competencies, social regulation behaviour and social graces). epstein et al. (2009) viewed emotional and social competency as one construct with four subdomains (emotional regulation, school readiness, social confidence and family involvement). 
emotional regulation and social or interpersonal skills were important domains identified in all of the studies. psychometric properties of the instruments table 2 provides a summary of the instruments’ scientific characteristics, inclusive of validity and reliability indices. table 2: validity and reliability indices per instrument. reliability internal consistency: all instruments demonstrated good to excellent internal consistency. cronbach’s alpha analysis indicated good to excellent reliability over 0.95 in the identified domains and subdomains of the e3sr. the cronbach’s alphas were high, with values over 0.80 for the domains in the sce and scs scales, and high values over 0.70 for the emotional and social domains in the school readiness screening instrument for grade 00 (pre–grade r). cronbach’s alpha for the domains of the prebers was also high, with values over 0.83. test–retest reliability: test–retest coefficients for teacher and parent forms of the sce and scs for the 5–7.5 age group at a 3-month interval indicated values in the 0.72–0.83 range. thus, test–retest coefficients showed good stability of the scales over a 3-month interval. test–retest reliability was not assessed or reported for the e3sr, prebers or the school readiness screening instrument for grade 00 (pre–grade r). it was mentioned as a focus for future research. inter-rater reliability: correlation coefficients were significant at p < 0.05 and in the range of low agreement for both sce and scs. inter-rater reliability was also not assessed or reported in the other studies. it was recommended as a focus for future research. validity face and content validity: ştefan et al. (2009) used experts to establish if constructs are measured similarly in the parents’ and teachers’ forms of the sce and scs, while munnik (2018) used experts to establish if the items are representative of the stated domains and subdomains of the e3sr.
the establishment of face and content validity for the prebers and the school readiness screening instrument for grade 00 (pre–grade r) was mentioned but not expanded upon in epstein et al. (2009) and mohamed (2013). construct validity: munnik (2018) employed exploratory factor analysis that yielded an eight-factor structure and reduced the total number of items for the e3sr to 41. she also employed confirmatory factor analysis, the results of which suggested a move towards model fit. mohamed (2013) performed exploratory factor analysis that confirmed a three-factor structure for the emotional subdomain and a four-factor structure for the social subdomain, reducing the total number of items to 34. furthermore, mohamed (2013) also created a shortened version of the questionnaire with six items in the emotional and eight items in the social domain. epstein et al. (2009) employed an exploratory factor analysis which yielded a four-factor structure with a total of 57 items. ştefan et al. (2009) did not perform factor analysis. convergent and concurrent validity: the sce and scs were validated against the social skills rating system (ssrs, self–controlled scale form). correlations were in the medium to high range for the 5–7.5 age group on the sce and scs parents’ and teachers’ formats. correlations between scs–p (parents’ version) and scs–e (educators’ version) and the behaviour problem scale from the ssrs parents’ and teachers’ versions were medium negative correlations. criterion validity: epstein et al. (2009) concluded that the prebers was able to distinguish between children with and without disabilities. ştefan et al. (2009), epstein et al. (2009) and munnik (2018) concluded that the prebers, scs and sce and e3sr can be used with confidence to identify children’s strengths and weaknesses in the domain of emotional and social competence as a prerequisite for entry into mainstream education.
the authors also provided guidance on future directions for research such as the establishment of convergent validity, concurrent validity and further validation studies (epstein et al., 2009; munnik, 2018). the need for longitudinal studies was also emphasised (ştefan et al., 2009). in sum, the articles provided a synopsis of the construction of the instruments, their theoretical definitions and how the instruments were operationalised. they also reported on the research conducted to establish the psychometric properties of the respective instruments and the methodological criteria used to investigate reliability, factor structure and validity of the instruments. implications and recommendations the primary contribution of this review is that it assists in the identification of instruments that measure social and emotional skills as a domain of school readiness that might be applicable for use in the south african context. the review expands existing early childhood research by identifying the underlying constructs and their operationalisation in the assessment of emotional and social competence in preschoolers. this study highlighted the need for ongoing refinement of existing scales and argues for a focus on the development of more instruments to complement and aid in existing practices in the educational environment to assess emotional and social skills in preschool children. future research should include the development of screening and diagnostic measures that focus on assessing emotional and social competencies and skills as a domain of school readiness, which are easily accessible, culturally appropriate and available for use by educators, parents and professionals such as psychologists in the south african context. conclusion there is a lack of screening and diagnostic measures currently available to assess emotional and social skills as an area or domain of school readiness in preschoolers in south africa. 
the perception that many developed assessment tools are not effective and undervalue the emotional and social competencies as part of school readiness assessment is still the dominant perception. the screening and school readiness assessment measures available abroad are not standardised for the south african population and therefore not appropriate for use within a multicultural south african context. more effective school readiness screening instruments that assess emotional and social skills are important for the accurate screening of young children’s emotional and social competencies during the preschool years. this review highlighted the need for ongoing engagement in research pertaining to children’s emotional and social skills as an important area or domain of school readiness. the need for appropriate diagnostic instruments is also highlighted as a means to identify learners in need of further intervention. acknowledgements the third reviewer, mrs c. meyburgh, involved in the appraisal stage of the systematic review, is hereby acknowledged for her contribution to the review. competing interests the authors declare that they have no financial or personal relationships that may have inappropriately influenced them in writing this review article. authors’ contributions c.n.m. conducted the systematic review as part of the research towards a postgraduate qualification. she also contributed towards the writing of the article. e.m. supervised the review process and contributed to the conceptualisation and writing of the article. she also acted as the corresponding author. funding information this research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors. data availability data sharing does not apply to this review as no new data were generated or analysed in this study. 
disclaimer the views and opinions expressed in the article are those of the authors and do not necessarily reflect the official policy or position of any affiliated agency of the authors. references amod, z., & heafield, d. (2013). psychological assessment in south africa: research and applications. wits university press, university of the witwatersrand. blair, c., & peters, r. (2003). physiological and neurocognitive correlates of adaptive behaviour in preschool among children in head start. developmental neuropsychology, 24(1), 479–497. https://doi.org/10.1207/s15326942dn2401_04 bustin, c. (2007). the development and validation of a social emotional school readiness scale. doctoral dissertation. university of the free state. denham, s.a., bassett, h.h., zinsser, k., & wyatt, t.m. (2014). how preschoolers’ social–emotional learning predicts their early school success: developing theory-promoting, competency-based assessments. infant and child development, 23(4), 426–454. https://doi.org/10.1002/icd.1840 department of education. (2014). policy on screening, identification, assessment and support (sias). pretoria: department of basic education. retrieved march 04, 2022, from https://wcedonline.westerncape.gov.za/specialised-ed/documents/sias-2014.pdf epstein, m.h., synhorst, l.l., cress, c.j., & allen, e.a. (2009). development and standardization of a test to measure the emotional and behavioural strengths of preschool children. journal of emotional and behavioural disorders, 17(1), 29–37. https://doi.org/10.1177/1063426608319223 foxcroft, c., & roodt, g. (2013). introduction to psychological assessment in the south african context (4th ed.). oxford university press. gilbert, r., balatti, j., turner, p., & whitehouse, h. (2004). the generic skills debate in research higher degrees. higher education research & development, 23(3), 375–388. https://doi.org/10.1080/0729436042000235454 gough, d., oliver, s., & thomas, j. (eds.). (2017).
an introduction to systematic reviews. sage. kokkalia, g., drigas, a. s., economou, a., & roussos, p. (2019). school readiness from kindergarten to primary school. international journal of emerging technologies in learning (online), 14(11), 4. kruyen, p.m., emons, w.h., & sijtsma, k. (2013). on the shortcomings of shortened tests: a literature review. international journal of testing, 13(3), 223–248. https://doi.org/10.1080/15305058.2012.703734 laher, s., & cockcroft, k. (2013). psychological assessment in south africa: research and applications. wits university press. liberati, a., altman, d.g., tetzlaff, j., mulrow, c., gøtzsche, p.c., ioannidis, j.p., & moher, d. (2009). the prisma statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. plos med, 6(7), e1000100. https://doi.org/10.1371/journal.pmed.1000100 madge, e.m., van den berg, a.r., & robinson, m. (1985). manual for the junior south african individual scales (jsais). human sciences research council. mohamed, s.a. (2013). the development of a school readiness screening instrument for grade 00 (pre-grade r) learners. doctoral dissertation. university of the free state. mtati, c.n. (2020). a systematic review: instruments that measure emotional and social competency as a domain of school readiness of preschool children in south africa. master’s dissertation. university of the western cape. retrieved march 02, 2022, from http://hdl.handle.net/11394/7668 munnik, e. (2018). the development of a screening tool for assessing emotional social competency in preschoolers as a domain of school readiness. doctoral thesis. university of the western cape. munnik, e., & smith, m.r. (2019). contextualising school readiness in south africa: stakeholders’ perspectives. south african journal of childhood education, 9(1), a680. https://doi.org/10.4102/sajce.v9i1.680 munnik, e., hargey, m., meyburgh, c., gaika, m., & mariens, m. (2015).
a systematic review of screening tools for emotional social competency as a domain of school readiness. in m. smith (chair), symposium on methodological rigor and coherence: deconstructing the quality appraisal tool in systematic review methodology, conducted at the 21st national conference of the psychological association of south africa, 14 december 2015. johannesburg. munnik, e., wagener, e., & smith, m. (2021). validation of the emotional social screening tool for school readiness. african journal of psychological assessment, 3(0), a42. https://doi.org/10.4102/ajopa.v3i0.42 puckett, m.b., & black, j.k. (2002). the young child: development from prebirth through eight (3rd ed.). prentice-hall. rademacher, a., & koglin, u. (2019). the concept of self-regulation and preschoolers’ social-emotional development: a systematic review. early child development and care, 189(14), 2299–2317. https://doi.org/10.1080/03004430.2018.1450251 republic of south africa. (1996). south african schools act (no. 84 of 1996). retrieved march 02, 2022, from http://www.info.gov.za/acts/1996/a84-96.pdf roopesh, b.n. (2019). vineland social maturity scale: an update on administration and scoring. indian journal of clinical psychology, 46(2), 91–102. schultz, d., ambike, a., logie, k.s., bohner, k.e., stapleton, l.m., vanderwalde, h., betkowski, j.a. (2010). assessment of social information processing in early childhood: development and initial validation of the schultz test of emotion processing – preliminary version. journal of abnormal child psychology, 38, 601–613. https://doi.org/10.1007/s10802-010-9390-5 smith, m.r., franciscus, g., swartbooi, c., jacobs, w., & munnik, e. (2015, september). developing a critical appraisal tool: the sfs scoring system. in m.
smith (chair), symposium on methodological rigor and coherence: deconstructing the quality appraisal tool in systematic review methodology, conducted at the 21st national conference of the psychological association of south africa, strength in unity, sep 15–18, johannesburg, psyssa. ştefan, c.a., bălaj, a., porumb, m., albu, m., & miclea, m. (2009). preschool screening for social and emotional competencies-development and psychometric properties. cognition, brain, behaviour: an interdisciplinary journal, 13(2), 121–146. stroud, l. (2016). scale a: foundations of learning. powerpoint presentation on the developments in the griffiths iii on the aricd site. swim, t.j. (2007). theories of child development: building blocks of developmentally appropriate practices. retrieved march 02, 2022, from http://www.earlychildhoodnews.com/earlychildhood/article_print.aspx?articleid=411 wu, z., hu, b.y., wu, h., winsler, a., & chen, l. (2020). family socioeconomic status and chinese preschoolers’ social skills: examining underlying family processes. journal of family psychology, 34(8), 969–979. https://doi.org/10.1037/fam0000674

about the author(s) sumaya laher department of psychology, university of the witwatersrand, johannesburg, south africa

citation laher, s. (2019). editorial: psychological assessment in africa: the time is now! african journal of psychological assessment, 1(0), a11. https://doi.org/10.4102/ajopa.v1i0.11

editorial
editorial: psychological assessment in africa: the time is now! sumaya laher

copyright: © 2019. the author(s). licensee: aosis. this is an open access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

it is indeed an honour to pen the inaugural editorial for the african journal of psychological assessment (ajopa).
whilst psychological assessment in its present form has largely been the domain of western psychologies located primarily in the global north, african researchers and practitioners have in recent decades adopted some of these techniques and used them successfully. others have adapted the techniques to suit the african context, often amalgamating these techniques within indigenous knowledge systems and contexts. some african researchers and practitioners have developed new and exciting methods congruent with local belief systems that tend to have better contextual fit. contrary to popular belief, psychology and psychological assessment in particular are active areas of engagement and robust debate in africa. south africa, for example, has an intimate history with psychological assessment: the field was abused first to justify the ineducability of the native and then to support apartheid politics, and post-apartheid it was almost outlawed in the country (see laher & cockcroft, 2014). the field currently is a hotbed of development and discussion. in the policy sphere, the employment equity act 55, section 8 (government gazette, 1998) states that psychological assessment of an employee is prohibited unless the test or assessment is reliable, valid, unbiased and can be applied fairly to all employees. south africa is one of the few countries that has legislation pertaining to the use of psychological assessments. however, botswana recently embarked on a process to develop a framework for the use of tests in the country’s schools (see mpofu, oakland, ntinda, seeco, & maree, 2014). whilst having legislation for assessment is progressive, ensuring compliance with the legislation, as with the employment equity act in the case of south africa, is a challenge. added to this is the difficulty in ensuring that ethical procedures are followed with psychological assessment. test review processes in south africa are also under discussion.
despite these challenges, research in the field is ongoing and in diverse areas of psychology, ranging from the more traditional, clinical and industrial psychology areas through to more community-based approaches (see laher & cockcroft, 2013, 2017). drawing on the south african context yet again (as this is the context most familiar to the editors), the economics that underlie psychological assessment in south africa are interesting and have relevance across the continent. south africa is home to many companies that specialise in psychological assessment and also supply material across african countries. the interplay between profit-making, negotiations with bigger multinational companies in the global north and the ethical responsibility to ensure access to quality assessments across the country and continent makes for discussion not often encountered in current assessment journals. south africa, like some other countries in africa, evidences two economies: one that is not too dissimilar to western contexts, generally with an educated, employed population, and one in which most people are unemployed, have little access to quality education and by and large live in poverty (leibrandt, woolard, finn, & argent, 2010). hence, the assessment field is split in that one caters for the more advantaged parts of the population who can afford to do assessments and who have the level of education necessary to do so and are demographically very similar to the western populations on whom tests are developed and normed. researchers and practitioners in this segment of the population are very much in sync with western developments in assessment, with the most recent being the focus on the fourth industrial revolution and the use of digital technologies and artificial intelligence, amongst other aspects (ayentimi & burgess, 2018; schwab, 2016). 
gamification, in particular, features strongly in this sphere (see armstrong, ferrell, collmus, & landers, 2016; herzig, ameling, & schill, 2015). the second economy is characterised by poverty and inequality. communities with no or little access to education, low literacy, large-scale unemployment and high crime rates have unique challenges for assessment (see laher & cockcroft, 2017). traditional pen-and-paper assessments from the west are often inappropriate in these settings, where most people do not have english as a first language. these communities are prevalent throughout africa, with other countries having contexts of famine, war and unrest in addition to poverty and illiteracy (see kagaari & kibanja, in press; tchombe, asangha, melem, wirdze, & ndzetar, in press). emic measures rooted in the philosophies of indigenous knowledge and emanating from local customs and context are increasingly being used in such contexts. the work of colleagues in zambia on the panga muthu test, the zambia child assessment tool and the object-based pattern reasoning assessment evidence this (matafwali & serpell, 2014; zuilkowski, mccoy, serpell, matafwali, & fink, 2016). research on the use of graphogame to enhance literacy in children (see jere-folotiya et al., 2014) also demonstrates the possibility of using digital technology in developing contexts, suggesting that the fourth industrial revolution is happening in a different way for assessment in african contexts. another interesting example of indigenous research can be found in the area of personality assessment. thalmayer (2018) collected data from 116 maasai herders in rural kenya and 114 supyire-senufo agriculturalists in mali and found evidence to match a big-two model (social self-regulation and dynamism) and a four-factor model (anger, laziness, virtue and happiness) that was common between the maasai and senufo. research is currently being conducted with the khoekhoe people in namibia (thalmayer, 2018). 
research using the south african personality inventory (sapi), an indigenous personality instrument, also supported a two-factor model (an agentic or personal growth factor and a social relational cluster) (see valchev, van de vijver, nel, rothmann, & meiring, 2013; valchev et al., 2014). in the area of cognitive assessment, serpell (2011), using empirical data from zambian children, argues for the relevance of social responsibility as a dimension of intelligence. super, harkness, barry and zeitlin (2011) use research conducted in loupa, senegal, to critically examine the concept of socially responsible intelligence, arguing for reciprocal relationships between african and western contexts – the need to ‘think globally but act locally’. from this brief snapshot it is clear that psychological assessment is a vibrant and active field in africa. however, much of this research is not disseminated across the continent. the ajopa aims to serve as the platform for the current disparate research being conducted in psychometrics and psychological assessment in africa. furthermore, ajopa will open up opportunities for collaboration and indigenous knowledge production. submissions that analyse and debate the current eurocentric and western cultural hegemonic practices that dominate the field of psychological assessment are encouraged, as this will lend much support to international debates in psychological theory and assessment. hence, the journal aims to be of relevance to local and international policy, research and practice. manuscripts in the areas of psychometrics and psychological assessment are invited. manuscript submissions must demonstrate a clear contribution to the field and must be of relevance to the african context. manuscripts can focus on, but are not limited to, ethics in assessment, establishing the psychometric properties of an instrument, methods in assessment, research on core issues in psychological assessment (e.g. 
assessment in low-resource settings, multicultural assessment, acculturation and assessment, language and assessment and assessment of people with disabilities), specific areas in assessment (e.g. cognitive, personality, vocational, intelligence and aptitude assessment) and/or particular settings (e.g. clinical, educational, forensic, organisational and neuropsychological assessment). manuscripts may take the form of original research studies, theoretical papers, case studies, test reviews or methods papers. african journal of psychological assessment is fully open access and charges no article processing fees. the editorial team looks forward to interacting with authors, reviewers and readers. acknowledgements the author would like to thank prof. kate cockcroft and prof. david maree, as well as trudie retief for their valuable feedback. competing interests the author declares that she has no financial or personal relationships that may have inappropriately influenced her in writing this article. references armstrong, m.b., ferrell, j., collmus, a.b., & landers, r.n. (2016). correcting misconceptions about gamification of assessment: more than sjts and badges. industrial and organizational psychology, 9, 671–677. https://doi.org/10.1017/iop.2016.69 ayentimi, d.t., & burgess, j. (2018). is the fourth industrial revolution relevant to sub-saharan africa? technology analysis & strategic management. https://doi.org/10.1080/09537325.2018.1542129 herzig, p., ameling, m., & schill, a. (2015). workplace psychology and gamification: theory and application. in t. reiners & l. wood (eds.), gamification in education and business (pp. 451–457). cham: springer. jere-folotiya, j., chansa-kabali, t., munachaka, j., yalukanda, c., sampa, f., westerholm, j., … lyytinen, h. (2014). the effect of using a mobile literacy game to improve literacy levels of grade one learners in zambian schools. educational technology research & development, 62, 417–436.
https://doi.org/10.1007/s11423-014-9342-9 kagaari, j., & kibanja, g. (in press). the chronicle of psychological assessment in eastern africa: cultural, legal, and professional limitations. in s. laher (ed.), the international histories of psychological assessment. cambridge, uk: cambridge university press. laher, s., & cockcroft, k. (2013). contextualising psychological assessment in south africa. in s. laher & k. cockcroft (eds.). psychological assessment in south africa: research and applications (pp. 1–16). johannesburg: wits university press. laher, s., & cockcroft, k. (2014). psychological assessment in post-apartheid south africa: the way forward. south african journal of psychology, 44, 303–314. https://doi.org/10.1177/0081246314533634 laher, s., & cockcroft, k. (2017). moving from culturally biased to culturally responsive assessment practices in low resource, multicultural settings. professional psychology: research and practice, 48(2), 115. leibrandt, m., woolard, i., finn, a., & argent, j. (2010). trends in south african income distribution and poverty since the fall of apartheid. oecd social, employment and migration working papers, no. 101. retrieved from https://www.oecd-ilibrary.org/social-issues-migration-health/trends-in-south-african-income-distribution-and-poverty-since-the-fall-of-apartheid_5kmms0t7p1ms-en matafwali, b., & serpell, r. (2014). design and validation of assessment tests for young children in zambia. new directions for child and adolescent development, 146, 77–96. https://doi.org/10.1002/cad.20074 mpofu, e., oakland, t., ntinda, k., seeco, e., & maree, j.g. (2014). constructing a framework for the use of tests within a developing nation’s school system. international perspectives in psychology: research, practice, consultation, 3, 106–122. https://doi.org/10.1037/ipp0000015 schwab, k. (2016). the fourth industrial revolution. new york: crown publishing. serpell, r. (2011). 
social responsibility as a dimension of intelligence, and as an educational goal: insights from programmatic research in an african society. child development perspectives, 5, 126–133. https://doi.org/10.1111/j.1750-8606.2011.00167.x super, c.m., harkness, s., barry, o., & zeitlin, m. (2011). think locally, act globally: contributions of african research to child development. child development perspectives, 5, 119–125. https://doi.org/10.1111/j.1750-8606.2011.00166.x tchombe, t., asangha, m., melem, l., wirdze, l., & ndzetar, e. (in press). psychological testing and inclusive schooling: issues and prospects in central africa. in s. laher (ed.), the international histories of psychological assessment. cambridge, uk: cambridge university press. thalmayer, a.g. (2018). personality structure in africa: lexical studies of personality in maa, senufo, and khoekhoe. paper presented at the tilburg conference on methods and culture in psychology, tilburg university, 15 june 2018. valchev, v., van de vijver, f., nel, a., rothmann, s., & meiring, d. (2013). the use of traits and contextual information in free personality descriptions across ethnocultural groups in south africa. journal of personality and social psychology, 104, 1077–1091. https://doi.org/10.1037/a0032276 valchev, v.h., van de vijver, f.j.r., meiring, d., nel, j.a., hill, c., laher, s., & adams, b.g. (2014). beyond agreeableness: social–relational personality concepts from an indigenous and cross-cultural perspective. journal of research in personality, 48, 17–32. https://doi.org/10.1016/j.jrp.2013.10.003 zuilkowski, s.s., mccoy, d.c., serpell, r., matafwali, b., & fink, g. (2016). dimensionality and the development of cognitive assessments for children in sub-saharan africa. journal of cross-cultural psychology, 47, 341–354. 
https://doi.org/10.1177/0022022115624155

about the author(s) feziwe mpondo dsi-nrf centre of excellence (coe), faculty of health sciences, university of the witwatersrand, johannesburg, south africa samrc developmental pathways for health research unit, department of paediatrics, university of the witwatersrand, johannesburg, south africa charlotte wray department of psychiatry, university of oxford, oxford, united kingdom shane a. norris samrc developmental pathways for health research unit, department of paediatrics, university of the witwatersrand, johannesburg, south africa dsi-nrf centre of excellence in human development, faculty of health sciences, university of the witwatersrand, johannesburg, south africa aryeh d. stein hubert department of global health, rollins school of public health, emory university, atlanta, ga, united states of america alan stein department of psychiatry, university of oxford, oxford, united kingdom linda m. richter dsi-nrf centre of excellence in human development, faculty of health sciences, university of the witwatersrand, johannesburg, south africa

citation mpondo, f., wray, c., norris, s.a., stein, a.d., stein, a., & richter, l.m. (2021). assessing psychological well-being measures among south african adults in the birth to twenty plus cohort. african journal of psychological assessment, 3(0), a44. https://doi.org/10.4102/ajopa.v3i0.44

original research
assessing psychological well-being measures among south african adults in the birth to twenty plus cohort feziwe mpondo, charlotte wray, shane a. norris, aryeh d. stein, alan stein, linda m. richter

received: 30 nov. 2020; accepted: 23 june 2021; published: 16 aug. 2021

copyright: © 2021. the author(s). licensee: aosis.
this is an open access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. abstract mental health and substance use disorders account for a significant proportion of disability worldwide. in many developing countries like south africa, mental healthcare services are often inadequate, forcing people to find their own way of coping with distress and give meaning to their experiences. this situation necessitates the conceptualisation and characterisation of quality-of-life indicators, as well as psychosocial strategies to promote mental well-being. the objective of this study was to assess the psychometric properties of psychological well-being (pwb) measures in the context of urban soweto. data were collected from participants in the birth to twenty plus cohort (n = 1327) in 2018–2019. exploratory and confirmatory factor analyses were conducted for measures of hope, faith, social support, general self-efficacy, and life satisfaction taken from the national institutes of health (nih) toolbox emotion battery. cronbach’s alpha was used to determine internal consistencies; discriminant validity was assessed using pearson correlations. test-retest reliability analysis was conducted on a subset of participants at three time points which were at least 2 months apart. overall, the measures of pwb were characterised as having unidimensional factor structures, good model fit indices, high internal consistency and reliability. this study demonstrated that the pwb measures evaluated here are psychometrically sound, and suitable to be used in the south african context. keywords: psychological well-being; validity; test-retest; reliability; hope; faith; self-efficacy; general life satisfaction.
introduction mental health and substance use disorders as well as injury account for a significant proportion of disability worldwide, especially in low-to-middle income countries (lmics) such as south africa (collaborators & ärnlöv, 2020; who, 2017). in many developing countries, mental healthcare services are often absent or inadequate (jansen et al., 2015). as a consequence, people who live in community settings with complex social problems often have to find their own way of coping with distress, build strength capacities, cultivate resilience and give meaning to their experiences (gil-rivas, handrup, tanner, & walker, 2019; jansen et al., 2015; sankoh, sevalie, & weston, 2018). the world health organization (who) as cited by masten and reed (2002) defines mental health as: a state of well-being in which the individual realises his or her abilities, can cope with normal stresses of life, can work productively and fruitfully, and can contribute positively to their community. (masten & reed, 2002, p. 90; who 2001a, b) this definition suggests the necessity of conceptualising and characterising quality-of-life indicators, as well as psychosocial strategies to promote mental well-being. this is particularly important for south africa because even though it is an upper-middle-income country, people still face huge socioeconomic, structural, and public health issues that tax their emotional resources. there are high levels of unemployment; in the last quarter of 2019 alone 27.6% of people of productive age were without jobs. crime is also on the rise in urban areas; between 2016 and 2017, 1.6 million individuals experienced contact crime such as murder, robbery and sexual offences, and a huge part of the population lives in overcrowded neighbourhoods with poor infrastructure, which makes it difficult to monitor crime (statistics south africa, 2019).
well-being is classified into two dimensions, namely subjective well-being (swb) and psychological well-being (pwb) (ryan, huta, & deci, 2008; smith & yang, 2017; van de weijer, baselmans, van der deijl, & bartels, 2018). subjective well-being can be operationalised with constructs that measure affect as well as those that cover cognitive aspects, for example the harmony in life scale (nima, cloninger, persson, sikström, & garcia, 2020). psychological well-being refers to measures that assess efficacious or non-efficacious functioning at inter- and intra-individual levels, and is operationalised through constructs such as personal growth, purpose in life and self-acceptance (ryff, 2014). in this article, we evaluate specific domains of the national institutes of health (nih) toolbox emotion battery (salsman et al., 2013), namely hope, faith, social support, general self-efficacy, and life satisfaction, which are measures of pwb. these pwb scales were selected because they are widely used in the south african public health and community research contexts (brinker & cheruvu, 2017; pacico, bastianello, zanon, & hutz, 2013; van zyl & dhurup, 2018). it is important to re-evaluate the psychometric properties of these validated scales, especially in a local context, to see how a particular measurement theory is reflected in local empirical data (flora & flake, 2017). there is a paucity of psychometric data on pwb scales in south africa, which makes it difficult to tell whether the scales are measuring latent constructs per the original design. hope is considered a psychological strength used to ensure that goals are attained through planning, overcoming behavioural or physical health issues, and dealing with any unintended outcomes from stressful life events (pacico et al., 2013; savahl, casas, & adams, 2016).
faith has been defined as how an individual understands their ‘ultimate reality’ (fowler, 1981) by putting confidence in a higher power or being pious (bai, lazenby, jeon, dixon, & mccorkle, 2015). the faith construct has been shown to have positive associations with physical and mental health as well as other measures such as coping, and self-esteem (abdel-khalek & tekke, 2019). social support has been shown to buffer adverse life events through the action of others and belief of support, which leads to an appraisal of life situations as non-threatening. social support is widely incorporated into interventions and used to explain behaviour change (cohen, 2004). many types of social support were evaluated and shown to be consistent; for example in relationships and risky behaviours, and in promoting physical activity (brinker & cheruvu, 2017; cohen, 2004; ory et al., 2018; simoni, frick, & huang, 2006; wright, 2016). self-efficacy is the belief that one can accomplish tasks and goals in unpredictable circumstances. efficacious individuals welcome challenging tasks as motivating factors, while inefficacious individuals dwell on their weaknesses (bandura, 1986; mpondo et al., 2015). self-efficacy has been used extensively in health promotion studies and interventions (dennis, brennenstuhl, & abbass-dick, 2018; ory et al., 2018). general life satisfaction is an individual’s judgement of the consonance of their living conditions and standards without comparing themselves to others (diener, emmons, larsen, & griffin, 1985). according to veenhoven (1993, p. 213), ‘general life satisfaction is the degree to which a person evaluates their life’. recent studies have looked at general life satisfaction in association with self-rated health and social capital constructs (gigantesco et al., 2019; maass, kloeckner, lindstrøm, & lillefjell, 2016). the objective of this study was to assess the psychometric properties of pwb measures in the context of urban south africa. 
we conducted exploratory factor analyses (efas) to evaluate structure patterns, and confirmatory factor analysis (cfa) to obtain fit indices. we checked internal consistency using cronbach’s alpha and scale validity by calculating correlations between all the scales, and we also assessed test-retest reliability, including intraclass correlations (iccs). methods sampling the birth to twenty plus (bt20+) cohort was established to observe growth, development and health of children and adolescents in an urban cohort, following the democratic transition in the republic of south africa. the cohort enrolled 3273 singleton babies from soweto and johannesburg, south africa, who were born between 23 april and 8 june 1990, and who continued to live in the area for the first 6 months of the child’s life. since birth, information on socioeconomic, family and personal factors influencing physical and psychological health and well-being has been collected 21 times. this article uses data collected when cohort members were 28 years old. a detailed description of the study and its cohort is published elsewhere (richter, norris, pettifor, yach, & cameron, 2007). data used here were collected between june 2018 and june 2019 from 1327 individuals, who had data on all the measures. we collected test-retest reliability data from a subset of cohort participants (n = 43), who were seen at three time points (t1, t2 and t3). the average t1 – t2 time point interval was 57 days, t2 – t3 was 14 days, and t1 – t3 was 191 days. participants completed the same questionnaires and were seen by the same assessor at each time point. measures all measures come from the nih toolbox emotion battery, which identified and developed measures suitable for use in epidemiology research across different ethnicities and cultures in high income countries (salsman et al., 2013). hope was measured using the who quality of life assessment (whoqol) study (group, 1998).
the scale comprises four likert scale items with answer options ranging from 1 (not at all) to 5 (extremely). these items were shown to have good psychometric properties, that is, a coefficient alpha of 0.74 in the original whoqol study (group, 1998) under the psychological facet-spirituality domain. faith was also measured using the whoqol assessment (group, 1998). the scale comprises four likert scale items with answer options ranging from 1 (not at all) to 5 (extremely). this measure had good psychometric properties, that is, a coefficient alpha of 0.74 in the whoqol validation study (group, 1998) under the psychological facet-spirituality domain. social support was measured using the nih toolbox social support questionnaire (salsman et al., 2013). this scale comprises eight self-report items with response options ranging from 1 (never) to 5 (always). these items have been shown to have a good model fit (i.e., cfi = 0.99; root mean square error of approximation [rmsea] = 0.112) and excellent psychometric properties (i.e., coefficient alpha 0.96) in the nih toolbox validation study (salsman et al., 2013). life satisfaction was measured using the nih toolbox life satisfaction scale (salsman et al., 2013), which comprises five likert scale items, with response options ranging from 1 (strongly disagree) to 5 (strongly agree). the psychometric properties of the scale in the nih toolbox validation study were good (i.e., coefficient alphas of 0.79–0.89; salsman et al., 2013). self-efficacy was measured using the nih toolbox general self-efficacy scale, which comprises nine items with response options ranging from 1 (never) to 5 (very often). this scale has been shown to have excellent psychometric properties (i.e., coefficient alpha of 0.93; cfi = 0.99 and rmsea = 0.73; salsman et al., 2013).
analysis a total sample of 1327 participants was used to conduct efas to evaluate the factor structure patterns of the hope, faith, social support, general life satisfaction and self-efficacy measures. we used the kaiser-meyer-olkin-bartlett’s (kmo) test for sampling adequacy: kmo values between 0.8 and 1 indicate sampling adequacy, values < 0.6 indicate inadequacy of the sample, and kmo values close to zero indicate widespread correlation. to understand the structure of variable clusters and identify latent variables we used the principal factor (pf) estimation technique. we also used stata’s estat anti command to check for variables that were too highly correlated. we chose oblique oblimin rotation to get the simplest factor structure. to extract factors, we used kaiser’s criterion by checking the scree plots. items with loadings of 0.30 or higher were considered components of one domain; at least three items needed to load onto a domain for it to be considered a valid factor. to obtain fit indices we conducted cfa using maximum likelihood (ml) estimation and default bootstrap settings. the fit indices calculated were: chi-square (χ2), the chi-square/degrees of freedom ratio (χ2/df), the comparative fit index (cfi; hu & bentler, 1999), the tucker-lewis index (tli; hu & bentler, 1999), the rmsea (steiger, 1990), and the standardised root mean square residual (srmr; hu & bentler, 1999). best practice guidelines suggest that χ2/df should be less than 5 and srmr should be close to zero; an rmsea < 0.05 indicates a good fit, a value < 0.08 indicates a reasonable model, and values exceeding that indicate a mediocre or a poor fit (byrne, 2010). for a good fit, the cfi and tli are recommended to be ≥ 0.90 (byrne, 2010; hu & bentler, 1999). internal consistency and reliability were determined using cronbach’s alpha (α). to determine scale validity, we used pearson’s correlation matrix. stata version 14 was used for analysis (statacorp, 2015).
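as an illustration only, the internal-consistency statistic and the fit-index guidelines described above can be sketched in a few lines of python. the study itself used stata 14, so this is not the authors’ code: the function names (`cronbach_alpha`, `rmsea_band`, `fit_summary`) and the example scores are hypothetical, sample variances are assumed for alpha, and an rmsea of exactly 0.08 is assumed to fall in the ‘mediocre or poor’ band because the text does not say how to treat the boundary.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    `items` is a list of columns, one list of scores per item, with
    respondents in the same order in every column (hypothetical layout).
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Summed scale score for each respondent.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(column) for column in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def rmsea_band(rmsea):
    """Label an RMSEA value using the cut-offs quoted in the text."""
    if rmsea < 0.05:
        return "good"
    if rmsea < 0.08:
        return "reasonable"
    # Assumption: the boundary value 0.08 itself is placed in this band.
    return "mediocre or poor"


def fit_summary(chi2, df, cfi, tli, rmsea):
    """Check a fitted model against the guideline values quoted in the text."""
    return {
        "chi2_df_below_5": chi2 / df < 5,
        "cfi_at_least_0.90": cfi >= 0.90,
        "tli_at_least_0.90": tli >= 0.90,
        "rmsea_band": rmsea_band(rmsea),
    }


if __name__ == "__main__":
    # Made-up scores: three items answered by five respondents.
    example_items = [
        [4, 5, 3, 4, 2],
        [4, 4, 3, 5, 2],
        [5, 5, 2, 4, 3],
    ]
    print(round(cronbach_alpha(example_items), 2))
    # Made-up fit statistics, not values from the study.
    print(fit_summary(chi2=12.0, df=4, cfi=0.95, tli=0.89, rmsea=0.06))
```

keeping each guideline as a separate entry in the summary, rather than collapsing them into a single pass/fail flag, mirrors how the fit indices are reported one by one in the cfa results.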
for the test-retest reliability, we evaluated practice effects using t-tests and effect sizes. cohen’s d was used to determine the magnitude of the practice effects: 0.2 is interpreted as a small effect, 0.5 as moderate and 0.8 as a large effect (cohen, 2004). we also used iccs to determine test-retest reliability. intraclass correlation coefficients were interpreted as poor (< 0.5), moderate (0.50–0.74), good (0.75–0.90) or excellent (> 0.90) test-retest reliability (koo & li, 2016). ethical considerations the human research ethics committee of the university of the witwatersrand (south africa) granted ethical clearance for this study (reference number: m180225) and the study was conducted in line with the principles of the declaration of helsinki for research involving human subjects. participants provided written informed consent. results a total of 1327 participants were interviewed, and about 99% of those had complete data for variables of interest. about 639 (48%) were male and 698 (52%) were female. results of normality are presented in table 1 and item means for the pwb measures stratified by sex are presented in table 2. table 1: tests of normality for psychological well-being measures. table 2: mean and standard deviation by gender of item scores of socio-emotional measures (n = 1327). factor analysis all measures had kmo test values between 0.8 and 1, and were thus suitable for further factor analysis. the scree plots showed that all latent variables converged into a single higher-order factor (eigenvalues > 1). table 3 displays exploratory factor analysis (efa) results. factors were regarded as stable if at least three items had significant loadings; this was the case for all measures. table 3: exploratory factor analyses and kaiser-meyer-olkin test for sampling suitability (n = 1327). the cfa results are presented in table 4. for the hope scale the unadjusted model fit was poor, that is, rmsea = 0.13; cfi = 0.60, and tli = 0.82.
we identified items that may have been ambiguous or unclear to participants and that had lower factor loadings, that is, hope item 4 ‘how optimistic are you to remain in times of uncertainty’. for the faith measure, the unadjusted model had estimates: rmsea = 0.09; cfi = 0.80 and tli = 0.82, and we removed item 4 ‘to what extent does faith help you enjoy your life’. the social support scale also had poor fit indices (rmsea = 0.13; cfi = 0.64 and tli = 0.81), therefore two items were removed: item 6 ‘in the past month, please describe how often you had someone you trust to talk with about your feelings’, and item 8 ‘in the past month, please describe how often you had someone to turn to for suggestions about how to deal with a problem’. table 4: confirmatory factor analysis for fit indices. scale consistency and validity the mean and standard deviations of the summed scores of all the measures are presented in table 4. the individual scales for hope, faith, social support, general life satisfaction, and self-efficacy produced high internal consistencies (α’s). general life satisfaction and hope showed α’s > 0.70; faith, social support and self-efficacy showed α’s > 0.80 (figure 1). figure 1: reliability scores (α) of each socio-emotional measure. the pearson correlation coefficients are presented in table 5. most of the correlations showed significant positive associations of medium magnitudes, and faith vs. hope and self-efficacy vs. hope showed strong correlations. table 5: correlation of all socio-emotional measures. test-retest we assessed participants at three time points: t1 and t2, which each had 43 participants, and t3, which had 30 participants (see table 6 for means and sd at each time point). from t1 to t2, self-efficacy showed significant practice effects with a small magnitude, and hope had moderate non-significant practice effects. all other practice effects at t1 and t2 were small and non-significant.
at t2 and t3, general life satisfaction had small and significant effects, self-efficacy had large non-significant effects, and all other measures had small non-significant effects. table 6: test-retest mean assessment scores over time. table 7 depicts icc test-retest reliability estimates. from t1 to t2, the reliability estimate was moderate for all pwb measures. from t2 to t3, the reliability estimates for hope, general life satisfaction, and self-efficacy were moderate, whereas faith and social support showed good reliability. table 7: intraclass correlation coefficient for test-retest measures. discussion this article aimed to evaluate the psychometric properties of pwb measures: hope, faith, social support, self-efficacy, and general life satisfaction, in a sample of young adult urban south africans. the factor structures for all the measures were unidimensional, similar to other studies (de maria, vellone, durante, biagioli, & matarese, 2018; hinz et al., 2018; nel & boshoff, 2014; salsman et al., 2013). we removed some items in our cfa to improve fit indices (for hope, faith, and social support). this suggests that the language of the removed items needs to be re-evaluated to ensure acceptability to local understandings. the correlations allowed comparison of the magnitude of associations between the measures; faith, hope, self-efficacy, and general life satisfaction were shown to be valid as confirmed by good cronbach’s alphas (westen & rosenthal, 2003). this result suggests future studies can potentially assess these measures together. the test-retest results showed small practice effects for self-efficacy and general life satisfaction. this was to some extent expected, given the relatively short test-retest intervals and because participants had become familiar with the measures.
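the interpretation bands used for the practice effects and the iccs, as set out in the analysis section, can be written down as a small python sketch. this is illustrative only and not the authors’ stata code. the function names are hypothetical; the pooled-standard-deviation form of cohen’s d is an assumption, since the text does not state which variant was used; and treating the quoted benchmarks (0.2, 0.5, 0.8) as lower bounds of each band, with anything below 0.2 labelled ‘negligible’, is likewise an assumption.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(first, second):
    """Standardised mean difference (second minus first), pooled-SD form.

    Assumption: the pooled-SD variant; the text does not say which
    formula was applied to the repeated measurements.
    """
    n1, n2 = len(first), len(second)
    pooled_sd = sqrt(
        ((n1 - 1) * stdev(first) ** 2 + (n2 - 1) * stdev(second) ** 2)
        / (n1 + n2 - 2)
    )
    return (mean(second) - mean(first)) / pooled_sd


def effect_size_label(d):
    """Label |d| using the 0.2 / 0.5 / 0.8 benchmarks quoted in the text."""
    size = abs(d)
    if size < 0.2:
        return "negligible"  # assumed label; the text names no band below 0.2
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "moderate"
    return "large"


def icc_label(icc):
    """Label an ICC using the bands attributed to koo & li (2016) in the text."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.90:
        return "good"
    return "excellent"


if __name__ == "__main__":
    # Made-up summed scores at two visits, not data from the study.
    visit_1 = [12, 14, 15, 13, 16]
    visit_2 = [13, 14, 16, 14, 17]
    d = cohens_d(visit_1, visit_2)
    print(effect_size_label(d), icc_label(0.68))
```

the labels describe magnitude only; whether a practice effect is statistically significant is a separate question, answered in the study by t-tests.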
intraclass correlations were moderate at t1 – t2 for all the measures and good for faith and social support at t2 – t3, implying that there were only small variations originating from the instruments or the circumstances under which measurements were taken. this suggests that the measures were reliable for application in the south african context (de vet, terwee, knol, & bouter, 2006; koo & li, 2016). participants reported moderate to high levels of hope, faith, social support and general life satisfaction, and low to moderate levels of self-efficacy. because these measures have been shown to have a buffering effect against mental health disorders, and to enhance one’s reserves of social cognitive and problem-solving capabilities, they can be targeted for mental health promotion interventions (nyqvist, forsman, giuntoli, & cattan, 2013). the interventions can be delivered in various ways, for example by community healthcare workers using a community-based model or through digital technology (e.g. zero-rated platforms on cellular phones). the interventions could teach people how to cultivate positive feelings, exercise cognitive flexibility and self-compassion, and have hope and optimism, while providing and using support resources intentionally. another study conducted on coping in the soweto population showed that religious activity (i.e. gathering for prayer in a group or praying) was perceived to be a good source of resilience and coping (kim, kaiser, bosire, shahbazian, & mendenhall, 2019). this is a pre-existing psychosocial resource that can be incorporated into interventions, not to endorse religion per se, in the organisational sense, but to use some of the tenets embodied therein such as altruism, forgiveness, gratitude and social support as tools to buffer against mental health issues (sharma & singh, 2019).
one limitation of this study pertains to the generalisability of some of the measures (hope, faith and social support) because some items were removed to improve fit indices and indeed reliability. it may therefore be difficult to compare our findings to other validation studies. however, the removal of the items was in line with the purpose of testing the psychometric properties of a scale in a local context. removing items is warranted when those items have weak loadings or are ambiguous in terms of how a participant interacts with an item (i.e. obscure, sophisticated or complex vocabulary). literature shows that the removal of items from a scale does not compromise the reliability of that scale (mccrae, kurtz, yamagata, & terracciano, 2011). another limitation is the sample size used for the test-retest: it too might affect the generalisability of the results. because of time constraints we could not collect repeat measures for the pwb scales for a bigger sample. conclusion in conclusion, the fact that all pwb measures were shown to have high internal consistency, validity and reliability when used within an urban and multi-cultural context is a strength and points to the usefulness of the tools for assessing whether individuals are languishing or thriving. therefore, the measures are relevant for the community and/or research setting to be administered by trained non-clinical assessors. acknowledgements our gratitude goes to all the participants of the birth-to-twenty plus cohort, their parents and 298 relatives for contributing for more than 27 years to this study. competing interests the authors declare that they have no financial or personal relationships that may have inappropriately influenced them in writing this article. authors’ contributions all authors conceived and/or designed the work that led to the submission, revised the manuscript, approved the final version, and agreed to be accountable for all aspects of the work. f.m. and l.m.r.
carried out the data analyses and interpreted the data. f.m., c.w. and l.m.r. carried out the writing of the manuscript. s.a.n., a.d.s., and a.s. made significant contributions in interpreting the results and revising the manuscript. funding information the study was funded by the bill and melinda gates foundation (opp1164115). data availability data is under an embargo from the date of data collection (september 2019) until june 2021, thereafter the data will be made freely available; hyperlinks will be published. the data that support the findings of this study are available from the corresponding author, f.m., upon reasonable request. disclaimer the views and opinions expressed in this article are those of the authors and do not necessarily reflect the official policy or position of any affiliated agency of the authors. references abdel-khalek, a.m., & tekke, m. (2019). the association between religiosity, well-being, and mental health among college students from malaysia. revista mexicana de psicología, 36(1), 5–16. bai, m., lazenby, m., jeon, s., dixon, j., & mccorkle, r. (2015). exploring the relationship between spiritual well-being and quality of life among patients newly diagnosed with advanced cancer. palliative & supportive care, 13(4), 927–935. https://doi.org/10.1017/s1478951514000820 bandura, a. (1986). the explanatory and predictive scope of self-efficacy theory. journal of social and clinical psychology, 4(3), 359–373. https://doi.org/10.1521/jscp.1986.4.3.359 brinker, j., & cheruvu, v.k. (2017). social and emotional support as a protective factor against current depression among individuals with adverse childhood experiences. preventive medicine reports, 5, 127–133. https://doi.org/10.1016/j.pmedr.2016.11.018 byrne, b.m. (2010). structural equation modeling with amos basic concepts, applications, and programming (multivariate applications series). new york, ny: routledge. cohen, s. (2004). social relationships and health. 
american psychologist, 59(8), 676. https://doi.org/10.1037/0003-066x.59.8.676 collaborators, g.b.d., & ärnlöv, j. (2020). global burden of 87 risk factors in 204 countries and territories, 1990–2019: a systematic analysis for the global burden of disease study 2019. the lancet, 396(10258), 1223–1249. https://doi.org/10.1016/s0140-6736(20)30752-2 de maria, m., vellone, e., durante, a., biagioli, v., & matarese, m. (2018). psychometrics evaluation of the multidimensional scale of perceived social support (mspss) in people with chronic disease. annali dell’istituto superiore di sanità, 54(4), 308–315. dennis, c.-l., brennenstuhl, s., & abbass-dick, j. (2018). measuring paternal breastfeeding self-efficacy: a psychometric evaluation of the breastfeeding self-efficacy scale-short form among fathers. midwifery, 64, 17–22. https://doi.org/10.1016/j.midw.2018.05.005 de vet, h.c.w., terwee, c.b., knol, d.l., & bouter, l.m. (2006). when to use agreement versus reliability measures. journal of clinical epidemiology, 59(10), 1033–1039. https://doi.org/10.1016/j.jclinepi.2005.10.015 diener, e.d., emmons, r.a., larsen, r.j., & griffin, s. (1985). the satisfaction with life scale. journal of personality assessment, 49(1), 71–75. https://doi.org/10.1207/s15327752jpa4901_13 flora, d.b., & flake, j.k. (2017). the purpose and practice of exploratory and confirmatory factor analysis in psychological research: decisions for scale development and validation. canadian journal of behavioural science/revue canadienne des sciences du comportement, 49(2), 78. https://doi.org/10.1037/cbs0000069 fowler, j.w. (1981). faith and human development. new york, ny: harper & row. gigantesco, a., fagnani, c., toccaceli, v., stazi, m.a., lucidi, f., violani, c., … picardi, a. (2019). the relationship between satisfaction with life and depression symptoms by gender. frontiers in psychiatry, 10, 419. https://doi.org/10.3389/fpsyt.2019.00419 gil-rivas, v., handrup, c.t., tanner, e., & walker, d.k. (2019). 
global mental health: a call to action. american journal of orthopsychiatry, 89(4), 420. https://doi.org/10.1037/ort0000373 group, t.w. (1998). the world health organization quality of life assessment (whoqol): development and general psychometric properties. social science & medicine, 46(12), 1569–1585. https://doi.org/10.1016/s0277-9536(98)00009-4 hinz, a., conrad, i., schroeter, m.l., glaesmer, h., brähler, e., zenger, m., … herzberg, p.y. (2018). psychometric properties of the satisfaction with life scale (swls), derived from a large german community sample. quality of life research, 27(6), 1661–1670. https://doi.org/10.1007/s11136-018-1844-1 hu, l., & bentler, p.m. (1999). cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. structural equation modeling: a multidisciplinary journal, 6(1), 1–55. https://doi.org/10.1080/10705519909540118 jansen, s., white, r., hogwood, j., jansen, a., gishoma, d., mukamana, d., & richters, a. (2015). the ‘treatment gap’ in global mental health reconsidered: sociotherapy for collective trauma in rwanda. european journal of psychotraumatology, 6(1), 28706. https://doi.org/10.3402/ejpt.v6.28706 kim, a.w., kaiser, b., bosire, e., shahbazian, k., & mendenhall, e. (2019). idioms of resilience among cancer patients in urban south africa: an anthropological heuristic for the study of culture and resilience. transcultural psychiatry, 56(4), 720–747. https://doi.org/10.1177/1363461519858798 koo, t.k., & li, m.y. (2016). a guideline of selecting and reporting intraclass correlation coefficients for reliability research. journal of chiropractic medicine, 15(2), 155–163. https://doi.org/10.1016/j.jcm.2016.02.012 maass, r., kloeckner, c.a., lindstrøm, b., & lillefjell, m. (2016). the impact of neighborhood social capital on life satisfaction and self-rated health: a possible pathway for health promotion? health & place, 42, 120–128. 
situating international histories of psychological assessment in a changed scientific landscape

book review

book title: international histories of psychological assessment
author: laher s. (ed) (2022)
isbn: 9781108755078
publisher: university printing house, cambridge, united kingdom £26.99 (gbp) *book price at time of review
review title: situating international histories of psychological assessment in a changed scientific landscape
reviewer: david j.f. maree1
affiliation: 1department of psychology, faculty of humanities, hatfield campus, university of pretoria, pretoria, south africa
corresponding author: david maree, david.maree@up.ac.za
how to cite this book review: maree, d.j.f. (2023). situating international histories of psychological assessment in a changed scientific landscape. african journal of psychological assessment, 5(0), a142. https://doi.org/10.4102/ajopa.v5i0.142
copyright notice: © 2023. the author(s). licensee: aosis. this is an open access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

introduction

one can imagine that asking a multitude of authors to provide an overview of psychological assessment in their countries is a daunting task, but laher (2022) managed the process excellently by including section editors and providing contributing authors with a clear structure, or discussion template, that allowed authors to organise content comparably. writing histories is fraught with difficulties, but the authors of this collection of international histories of assessment did an excellent job.
the structure of the book

the work is introduced by laher and section editors in chapter 1, giving a short introduction to the brief given to authors and introducing the reader to the global terrain of assessment. the sections include africa, arab levant, europe, asia, oceania and the americas. although the coverage is comprehensive, some areas, such as east africa, are not covered, but as the field grows and access to psychology increases, future editions of this work will hopefully be expanded. the chapters utilised the discussion template to reflect a brief overview of the country under discussion and phases in the assessment development, such as pre-history, the 19th century and the development during the 20th century. prominent tests used in a particular country were discussed, as well as limitations and future directions. not many chapters addressed the latter, but as laher (2022) pointed out in chapter 19, most concluded that assessment and test development need to progress much more, gain more ground, focus on indigenous test development and allow psychology to flourish as a science and practice. of course, it depends on which part of the world was discussed. as expected, the west (north america and western europe) dominates the test development scene in terms of progress with creating, adapting and analysing tests as well as providing guidelines for test construction, adaptation and application. the editor tempered the importance of these localities by placing the chapter on north america (and canada; chapter 18) last among the chapters devoted to historical discussion, a nice symbolic touch to emphasise the ‘other’ international voices in assessment. the international test commission (itc), which supported this publication, plays a pervasive role in providing training, guidance and standards along with the european federation of psychologists’ associations (efpa) and others.
the progress or lack thereof in various latin american, african and asian countries can be associated with institutionalised support on the local governmental or international association level. assessment and test development go hand in hand with accepting psychology as an academic discipline and practice in countries. in some instances, psychology and assessment are supported by local governments, even with the establishment of local psychological regulative bodies. it is also apparent that some countries struggle with establishing these bodies, so psychology and assessment suffer greatly (chapter 4). of course, some governments view the psychological project as colonial and mainly a westernised endeavour, which makes it so much more difficult for psychology to take root (see zambia as an example in chapters 1 and 3). a lesson to be learned from global history is that some form of regulation of psychological practice is required if a country would like psychology to thrive. it need not be based on a westernised model. still, the sole reason for regulation is to do justice to communities at the receiving end: they deserve high-quality and standardised assessment and therapeutic interventions. a form of regulation does ensure training and assessment standards but should take a form suited to local conditions. from the different narratives, it seems as if progress in assessment is also associated with how well psychology becomes institutionalised as an academic discipline along with the goal of training psychologists. iran is one example (chapter 11), and brazil (chapter 16) is another. worldwide assessment and test development trends show that western countries found their feet many years ago and took the lead with standards and guidelines (chapters 7, 9). in contrast, other countries went through initial development phases, promising progress and then sudden interruption and decline (chapter 16).
even in these countries, like chile (chapter 16) and south africa (chapter 2), assessment and test development need to be re-invigorated, marketed and supported. political factors greatly impacted the growth and decline of psychology and assessment (chapters 11 and 13). the communist regime’s dismantling of psychology in eastern-european countries is an example (chapter 10). other countries struggle with the exigencies of having psychology as a subject in their education and training institutions. some countries only recently managed to formalise psychology, training and associated regulative bodies, such as malaysia and peru (chapters 14 and 16), while others, such as brazil, have been growing strongly since the 2000s with the support of the itc (chapter 16).

writing history

the introductory chapter 1 sets the tone for contributors by providing some guidance about writing the assessment history. at first glance, the guidelines harbour an epistemological tension against the background of the old and new styles of history writing in psychology (lovett, 2006). the old refers to a manner of presenting the history of psychology, of which boring (1950) is a prime example. the new justifies its approach by contrasting it with particular features of the old. the characteristics of old history writing as contrasted with the new are (lovett, 2006): (1) providing grand histories of prominent men in psychology as opposed to a focus on the historical context or zeitgeist (watrin, 2017), (2) focusing on development within psychological science (internalist) without considering the influence of the socio-political and historical context on the development of psychology (externalist), (3) writing about past events that have relevance for the present.
thus, what and how to report depend on present concerns (presentism) as opposed to interpreting events through the eyes of the past (historicism), (4) current knowledge of psychologists is viewed as progressive when compared to the past; we know more than those in the past (whig history) as opposed to an anti-progressionist view of development and progress in psychology, (5) old histories apparently rely on secondary sources rather than first-hand accounts, which are the preference of the new histories, and finally, (6) old histories were largely written by psychologists with no formal training in historiography, while trained historians are responsible for the new history. the reason for focusing on new history would be to avoid the supposed bias inherent in old history writing. thus, amateur historians tend to promote one-sidedly a great person’s or grand narrative’s influence on psychology, and to perpetuate these beliefs as ‘facts’ relevant to the present by relying on narrowly focused internal history. the proponents of the new history justify it as critical because it aims to unmask perpetuated biases and beliefs when telling the story of psychology and, in this case, assessment (teo, 2005; watrin, 2017). thus, supporting the critical intent of the new history but avoiding a dogmatic use of the dichotomies, one would do well, as watrin (2017) urges, to view the writing of the history of psychology as a mix of approaches. accordingly, laher et al. (2022) aptly require:

[a]uthors … to situate the chapters somewhere between narrative and historiography. hence the chapters assume a more critical stance in reporting the history of psychological assessment that recognises that history is never fact and always represents the subjective position of the author. (p. 2)

they utilise both an internalist and externalist perspective, accommodating both the present and the historical, and secondary and first-hand sources when required.
authors provided ‘facts’ of assessment in their countries and also employed their knowledge of psychology and the assessment enterprise. these accounts did not glorify the achievements of the past but acknowledged the modern roots of anglo-european developments in assessment; authors also, where applicable, pointed out the indigenous roots of assessment even if they hark back hundreds of years. the story of assessment is not one of linear and cumulative progress. to provide ethical and non-discriminatory psychological service to countries and their people, psychologists need to consider the historical story of assessment: where did we do injustice, against whom did we discriminate and in what manner; how did we employ assessment and psychological science to commit epistemological violence (teo, 2008, 2010)? the contributors’ view of history (writing) recreates the epistemological tension on another level as well, because the topic of the work is assessment, an activity and project primarily located within the quantitative domain of psychological practice. if it remains in the metatheoretical domain of modernist science with close alliances to natural science, positivism and associated epistemologies, then its ability to be critical, as its history stance would like to be, can be stymied. this issue is addressed next.

psychology as science

laher (2022, p. 359), in chapter 19, claims that most of those working with assessment and assessment development are guided ‘… by a particular way of understanding science as espoused by the scientists working within modernist assumptions of what science should be’. that the modernist assumptions of what science should be are still widely accepted is probably true. authors mostly wrote carefully about these assumptions. laher (2022, p. 359) rightly credits cross-cultural psychology for its critical take on classical assumptions; the resulting sensitivity was displayed in abundance in most chapters.
most called for the translation and adaptation of tests and the development of indigenous tests, and realised that emic-developed assessments were preferable to mere translation and adaptation. however, as teo (2005, pp. 161–162) remarks, the critical propensity of cross-cultural psychology is not as incisive as that of postcolonial critique: the former remains squarely within western (and thus modernist) assumptions and methodologies and might fail in dissolving the epistemological tension referred to above. although the metatheoretical considerations of natural science moved beyond positivism, naïve realism and empiricism a long time ago, the image of science psychologists are stuck in can rightfully be labelled as modernist. we have to thank our empirical social science and psychology methodology textbooks for this. the modernist view of science is informed mainly by what michell (2003) calls the quantitative imperative, namely, the view that measurement is a necessary characteristic of science. for various reasons, psychology invested in this modernist view of science, which applies primarily to some natural science disciplines (michell, 2000, 2008). the modernist view of science became so entrenched in our approach to methodology that it is not even questioned in psychometrics. if we accept the socio-historical nature of our psychological constructs, the demands of various postmodern and postcolonial positions make sense. to varying degrees, these positions provide a necessary voice to those treated unjustly, marginalised and misappropriated (teo, 2005). in chapter 9, laher (2022) reiterated that the origin of psychometrics as we know it lay with galton; the further development of assessment and tests is a westernised project which delivered processes and products that justified the judgement of inferiority of certain races and cultures. laher’s (2022, p.
360) warning has postcolonial overtones: ‘… assessment is not, as with all fields of knowledge, exempt from agendas linked to power’, where in this instance, power refers to economic exploitation: the proliferation of western assessments is profitable. thus, it is easy to see how not attending to emic epistemologies and methodologies is possible. but, laher (2022, pp. 361–362) calls for a combination of emic and etic approaches and correctly points out that it is an error to think that eurocentric constructs have universal applicability just as it is an error to think that the emically developed tests, methods and constructs have only local validity. the same applies to methodology, methods and our concept of science. emic and etic perspectives can enrich and even change these. for the moment though, we can address our modernist assumptions about what science is. a critical realist view of science allows us to maintain a position between constructionism and realism by distinguishing between an intransitive and transitive domain (bhaskar, 1975/2008). the latter comprises our constructions, so to speak, about the real. the former acknowledges a mind-independent reality. science is the process of examining, confronting and questioning reality whilst forming explanatory theories about how things work, and we know that our theories, facts and knowledge may be false or shown to be empty constructions. on some levels, we can and do measure phenomena. the fortunate advantage of critical realism is its methodological pluralism implying that the nature of the thing under investigation determines the applicability of the method (danermark et al., 2019). the epistemological relativism, realism and critical orientation of critical realism provide a proper postmodern metatheoretical framework that can address our postcolonial concerns and global assessment aspirations (tinsley, 2022). 
this illuminating collection of chapters intentionally steps into a new scientific landscape no longer modernist. it has to negotiate between the old and new and land in a metatheoretical space where epistemological tensions are superseded.

references

bhaskar, r. (1975/2008). a realist theory of science. routledge.
boring, e.g. (1950). a history of experimental psychology (2nd ed.). appleton-century-crofts.
danermark, b., ekström, m., & karlsson, j.c. (2019). explaining society: critical realism in the social sciences (2nd ed.). routledge.
laher, s. (ed.). (2022). international histories of psychological assessment. cambridge university press.
laher, s., gan, y., geisinger, k.f., iliescu, d., macqueen, p., & zeinoun, p.a. (2022). histories of psychological assessment: an introduction. in s. laher (ed.), international histories of psychological assessment (pp. 1–20). cambridge university press.
lovett, b.j. (2006). the new history of psychology: a review and critique [historical article]. history of psychology, 9(1), 17–37. https://doi.org/10.1037/1093-4510.9.1.17
michell, j. (2000). normal science, pathological science and psychometrics. theory & psychology, 10(5), 639–667. https://doi.org/10.1177/0959354300105004
michell, j. (2003). pragmatism, positivism and the quantitative imperative. theory & psychology, 13(1), 45–52. https://doi.org/10.1177/0959354303013001761
michell, j. (2008). is psychometrics pathological science? measurement: interdisciplinary research and perspectives, 6(1–2), 7–24. https://doi.org/10.1080/15366360802035489
teo, t. (2005). the critique of psychology: from kant to postcolonial theory. springer.
teo, t. (2008). from speculation to epistemological violence: a critical-hermeneutic reconstruction. theory & psychology, 18(1), 47–67. https://doi.org/10.1177/0959354307086922
teo, t. (2010). what is epistemological violence in the empirical social sciences? social and personality psychology compass, 4(5), 295–303.
https://doi.org/10.1111/j.1751-9004.2010.00265.x
tinsley, m. (2022). towards a postcolonial critical realism. critical sociology, 48(2), 235–250. https://doi.org/10.1177/08969205211003962
watrin, j.p. (2017). the ‘new history of psychology’ and the uses and abuses of dichotomies. theory & psychology, 27(1), 69–86. https://doi.org/10.1177/0959354316685450

about the author(s)
sumaya laher, department of psychology, university of the witwatersrand, johannesburg, south africa
safia dockrat, department of psychology, university of the witwatersrand, johannesburg, south africa

citation
laher, s. & dockrat, s. (2019). the five-factor model and individualism and collectivism in south africa: implications for personality assessment. african journal of psychological assessment, 1(0), a4. https://doi.org/10.4102/ajopa.v1i0.4

original research

the five-factor model and individualism and collectivism in south africa: implications for personality assessment

sumaya laher, safia dockrat
received: 20 nov. 2018; accepted: 29 jan. 2019; published: 28 mar. 2019
copyright: © 2019. the author(s). licensee: aosis. this is an open access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

abstract

the five-factor model (ffm) of personality is one of the prominent models in contemporary psychology and defines personality in terms of five broad factors, namely neuroticism, extraversion, openness to experience, agreeableness and conscientiousness. recent research, however, questions the applicability of the ffm in non-western cultures, suggesting that it is not exhaustive enough and that it does not account for some other personality factors, most notably individualism and collectivism.
yet, it remains the gold standard against which all personality instruments are compared. this study investigated whether the ffm of personality is related to individualism and/or collectivism in a sample of 272 south africans from the general johannesburg area. individuals completed a questionnaire consisting of a demographic section, the horizontal–vertical individualism/collectivism scale and the neo-pi-3. exploratory factor analysis was used to analyse the data. the results indicated support for an individualism–collectivism dimension. these results are discussed within the context of the universal applicability of the ffm. keywords: collectivism; five-factor model; individualism; neo-pi-3; personality assessment introduction the five-factor model (ffm) of personality that the neo personality inventory (neo-pi-3) is based upon has dominated personality theory and assessment over the last decade (laher, 2013). according to the ffm, human personality can be described by five personality traits, namely neuroticism, extraversion, openness to experience, agreeableness and conscientiousness. research into the cross-cultural applicability of the ffm has shown differences between asian and western cultures with the five factors not replicating clearly in these cultures (see cheung et al., 2008; laher, 2013; mccrae et al., 2005b; valchev et al., 2014). in some cultures, evidence has also been found for a sixth factor. for example, ashton and lee (2005) found evidence for an honesty and humility factor in addition to the five factors. studies in china have found that interpersonal relatedness can be regarded as a potential sixth dimension in describing asian personality (cheung et al., 2008). in south africa, nel et al. (2012) found evidence for nine personality clusters, namely extraversion, soft-heartedness, conscientiousness, emotional stability, intellect, openness, integrity, relationship harmony and facilitating. 
it is evident that the first six clusters are more closely related to the ffm and the last three represent more indigenous personality constructs. two arguments may be noted from the literature. the first stems from the argument for the universality of the ffm in that if the five factors are universal and vary only in the intensity of presentation across cultures and individuals, it is possible that collectivist expressions of the five factors may need to be incorporated into the current ffm. the second argument suggests that there may be a sixth domain to the five factors of personality, and this sixth domain is best defined by some social or interpersonal relatedness factor. thus, this study explores the relationship between personality and the individualism and collectivism dimensions using the neo-pi-3. the neo-pi-3 is the most recent version of the neo inventories with mccrae, costa and martin (2005a) arguing for its greater applicability given the removal of problematic items and the simplification of language. this relationship between personality, individualism and collectivism is explored with a view of contributing to the debate on the role of the individualism and collectivism dimensions in relation to the ffm of personality. understanding individualism and collectivism individualism and collectivism were chosen in this study as representations of the social relational dimension as presently they are amongst the most widely used constructs in research about cultural differences (taras et al., 2014). according to hofstede’s model (1980), individualism–collectivism can be viewed as opposite poles representing an independent position from groups on the one hand, to a dependence on groups on the other. within an individualistic society, people are viewed as independent from the group, and personal goals are given preference over shared ones; behaviour is thus based on personal attitudes rather than group norms. 
collectivist societies, on the other hand, emphasise interdependence within the group (as seen in the chinese model), and peoples’ behaviours are controlled depending on group norms rather than personal attitudes. this results in people in collectivist societies seeking to avoid conflict and maintain relationships (laher, 2013). according to triandis (2001), although individualism and collectivism are useful in terms of analysis, it would be gross stereotyping to assume that every individual within a certain culture would have all the characteristics of that culture. as a result, a distinction can be drawn between different types of individualistic and collectivist societies. this difference is because of the degree of emphasis placed on what have been termed horizontal and vertical social relationships. the former (horizontal) describes equality amongst individuals and the latter (vertical) describes a hierarchical structure where individuals differ in status. using these two dimensions, four distinct patterns within cultures have been identified, namely horizontal individualism (hi), vertical individualism (vi), horizontal collectivism (hc) and vertical collectivism (vc) (triandis & gelfand, 1998). horizontal individualism describes a society with people who want to be distinct from the group, and are highly self-reliant but not interested in the acquisition of status. with vi, people are competitive with others for the purpose of acquiring status. vertical individualism recognises and accepts inequality amongst individuals (triandis & gelfand, 1998). in collectivist societies, hc can be observed when individuals emphasise interdependence, sociability and sharing common goals but do not necessarily submit to authority easily. in vc, individuals are greatly concerned with the integrity of the in-group. they are willing to sacrifice their own desires and goals for the betterment of the in-group and promote competition between the out-group and the in-group. 
inequality and hierarchy within the collective is accepted (triandis, 2001; triandis & gelfand, 1998). for this study, this more nuanced understanding of individualism and collectivism was adopted. research on personality and individualism and/or collectivism markus and kitayama (1998) contrast the interdependent view of the person in collectivist cultures with the independent, self-contained, autonomous being in individualistic cultures and refer to the collective construction of personality in asia that fosters relationality. furthermore, cross and markus (1999, cited in mccrae et al., 2004) argue that: personality traits, as distinctive and enduring aspects of individuals are essentially a western phenomenon; in non-western, collectivist societies, personality characteristics are fluid, determined more by transient interpersonal situations than by enduring traits. (p. 180) this is supported by research in the indian, chinese and african contexts (see cheung et al., 2008; laungani, 1999; lodhi, deo & belhekar, 2002; ma & schoeneman, 1997; mpofu, 2001; mwamwenda, 2004). in the south african context, eaton and louw (2002) found that compared to english speakers, african-language speakers tended to use more interdependent and concrete descriptions characteristic of the collectivist dimension. vogt and laher (2009) provided support for individualism and collectivism as a separate factor to be considered in personality psychology. laher (2010a) argued that this collectivist dimension in south africa is best captured by the indigenous term ‘ubuntu’ [humanness]. ubuntu originates from an african aphorism, umuntu ngumuntu ngabantu (isizulu version) or motho ke motho ka batho (sesotho version), which translates as, ‘a person is a person through persons’. ubuntu as it is concerned with relationships towards others is defined by reverence, respect, sympathy, tolerance, loyalty, courtesy, patience, generosity, hospitality and co-operativeness (louw, 2001). 
this argument is supported by valchev et al. (2014) who present findings using the south african personality inventory (sapi) that support agentic versus communal dimensions to personality. valchev et al. (2014) also make reference to ubuntu in understanding the communal aspects found in the sapi. this exposition of ubuntu is important for a number of reasons. firstly, it clearly brings across the collectivist understanding of the individual in community. secondly, the use of indigenous languages to explain the essence of ubuntu suggests that the ffm by virtue of its location in the english lexicon may well have not considered these traits. thirdly, it could be argued that the description of traits associated with ubuntu (generous, hospitable, friendly, caring, compassionate, open and available to others, affirming of others, does not feel threatened that others are able and good) are traits that are linked to extraversion and agreeableness in the ffm. hence, it may be argued that these are subsumed in the ffm. however, we would like to argue that the presentation of extraversion and agreeableness in the ffm is more individualist and therefore cannot subsume the communal aspects of generosity, caring, etc. furthermore, ubuntu encapsulates an openness and availability to others that is not captured in the ffm, not even in the openness to experience domain. all of the domains measure personality as an expression of individual traits and behaviours. items on the neo-pi-r are also phrased at that level. to conclude, there is sufficient evidence to suggest that an individualism–collectivism distinction in personality, particularly in the ffm, is necessary. however, the arguments presented above indicate that aspects of this collectivist dimension might be tapped in the domains of agreeableness, extraversion and openness to experience but in an individualistic way. 
it is unclear both from the literature presented and the current conceptualisation as to whether the individualism–collectivism dimension should be a separate factor measured across individuals or whether it is an underlying cultural mechanism that needs to be incorporated into items, scales and factors in the ffm. thus, this study explores the relationship between personality and the individualism and collectivism dimensions in a sample of south african individuals in johannesburg and surrounding areas.

methods

sample

a non-probability, convenience sample of 272 people from the communities in johannesburg and surrounding areas voluntarily completed the questionnaire. individuals in the sample were aged between 14 and 90 years (mean = 36.52, sd = 14.53). from table 1, it is evident that the majority of the sample were female (n = 85, 66.9%). in terms of race, 39.7% were black people (n = 108), 8.8% were mixed race people (n = 24), 23.2% were indian people (n = 63) and 27.6% were white people (n = 75). a total of 153 (56.3%) individuals spoke english, while 115 (42.2%) spoke a language other than english. two questions were included in the questionnaire that asked participants whose home language was not english to rate their english reading skills and english comprehension skills from 1 to 5, with 1 being ‘not so good’ and 5 being ‘excellent’. for individuals who had english as a second language (n = 115), the majority of the sample (n = 88; 76.5%) reported excellent to good english reading and english comprehension ability, while 20% reported a satisfactory english reading and english comprehension ability (n = 23), thus controlling for issues of language proficiency in the study.

table 1: descriptive statistics for the sample.

instruments

a questionnaire consisting of three sections was distributed to participants, namely a section on demographics, the neo-pi-3 and the horizontal–vertical individualism/collectivism scale.
demographic variables collected included gender, education, occupation, race, language, english reading ability, english comprehension ability and test familiarity. demographic variables were used for descriptive purposes only. neo-pi-3 the neo-pi-3 consists of 240 items and measures the five domains and 30 facets of personality, as proposed by the ffm. the neo-pi-3 is a revised version of the neo-pi-r. the test can be used with adolescents aged 14 years and above (mccrae & costa, 2010). internal consistency reliabilities for the five domains in the neo-pi-3 ranged from 0.85 to 0.89 for form s (self-rating phrased in the first person) and from 0.84 to 0.93 for form r (other-rating phrased in the third person) (mccrae et al., 2005a). the revised instrument retained the proposed factor structure and showed slightly improved internal consistency, ‘cross-observer agreement’ and readability (mccrae et al., 2005a, p. 261). evidence suggests that the neo-pi-3 scales have convergent and discriminant validity when used in an adolescent population. for the general population, the psychometric properties remained fairly similar to that of the neo-pi-r’s generally good performance, with slight improvements (mccrae et al., 2005a). internal consistency reliability coefficients for the five domains ranged from 0.78 to 0.92 in this study, while facet reliability coefficients were all above 0.60 except for actions (α = 0.53), values (α = 0.52), straightforwardness (α = 0.49), modesty (α = 0.58) and tender-mindedness (α = 0.53). horizontal–vertical individualism/collectivism scale triandis and gelfand (1998) designed a 16-item scale to measure four dimensions of individualism and collectivism. the four dimensions are as follows: vc, vi, hc and hi (further description of each discussed in literature review). all items are answered on a 9-point scale ranging from 1, which represents never or definitely no, to 9, representing always or definitely yes. 
each dimension’s items are summed up separately to create a vc, vi, hc and hi score. internal consistency reliability scores, using cronbach’s alpha, range from 0.73 to 0.82 for the four dimensions described above (triandis & gelfand, 1998). good convergent and divergent validity for this scale was found. a strong relation to other individualism–collectivism scales was also found (triandis & gelfand, 1998). cronbach’s alpha coefficients for the four scales ranged from 0.60 to 0.66 in this study. research design located in the quantitative paradigm, this study used a non-experimental, cross-sectional design where participants completed a questionnaire at one point in time. there was no control group or manipulation of variables and the study was largely exploratory. hence, a non-experimental design was suitable for this study. research procedure a group of postgraduate psychology students collected data by administering the questionnaire to volunteers in the community. once all data had been collected, they were captured and scored as per the test developer specifications. thereafter, the data were analysed using the spss computer program (version 23, ibm, 2015). data analysis all data were first analysed using descriptive statistics. the nominal variables, namely gender, race and home language, were examined using frequencies, while for the interval variable, namely neo-pi-3 scale, means, standard deviations, minimum and maximum values and skewness coefficients were calculated. an exploratory factor analysis was run to determine the independence between the neo-pi-3 scales and the horizontal–vertical individualism/collectivism scale. principal component analysis was the method selected as it is a simple but effective method of determining factors that explain all the variance including the error variance in any particular correlation matrix (huck, 2012). 
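as an illustration only, the principal component step described above can be sketched in python with numpy. the sketch assumes synthetic data and a hypothetical helper, pca_loadings; it is not the spss procedure used in the study, and the numbers it produces do not correspond to any reported result. components are extracted from the correlation matrix, so the eigenvalues sum to the number of variables and each loading is the correlation between a variable and a component.

```python
import numpy as np


def pca_loadings(data, n_components):
    """Unrotated principal component loadings from the correlation matrix.

    Hypothetical helper for illustration; not the SPSS routine used in the study.
    """
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]            # re-sort to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Scale each eigenvector by the square root of its eigenvalue
    loadings = eigvecs[:, :n_components] * np.sqrt(eigvals[:n_components])
    return eigvals, loadings


# Synthetic data (assumption): 272 respondents, 6 scales driven by 2 latent factors
rng = np.random.default_rng(0)
latent = rng.normal(size=(272, 2))
weights = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
                    [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
data = latent @ weights.T + rng.normal(scale=0.6, size=(272, 6))

eigvals, loadings = pca_loadings(data, n_components=2)
communalities = (loadings ** 2).sum(axis=1)      # variance explained per scale

print(np.round(eigvals, 3))
print(np.round(loadings, 3))
```

because the loadings are scaled eigenvectors, the sum of squared loadings in each column equals that component's eigenvalue, which is the quantity that the factor retention rules inspect.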
varimax rotation was utilised as it aims to maximise the sum of variances of squared loadings in the columns of the factor matrix. this produces in each column loadings that are either high or near zero, thereby assisting interpretation (laher, 2010b). ethical considerations ethical clearance to conduct the research was obtained from the human research ethics committee (hrec) at the university of the witwatersrand (protocol number: h16/02/14). results table 2 presents the means, standard deviations, minimum and maximum values, and skewness coefficients for the domain and facet scales of the neo-pi-3. it is evident that all the domains and facets are approximately normally distributed as the skewness coefficients were within the range of +1 to -1 (huck, 2012). table 2: descriptive statistics for the neo-pi-3. the relationship between the neo-pi-3 and horizontal–vertical i/c scale results for the independence of the neo-pi-3 scales and that of the horizontal–vertical i/c (hvic) scale are presented below, using factor analyses. in this study, both empirical and theoretical techniques were used to determine the number of factors to extract. theoretically, the neo-pi-3 proposes five factors; individualism–collectivism would add one further factor if it loads as a single sixth factor, and two further factors if it loads as two separate constructs, that is, individualism as one factor and collectivism as another factor. as indicated in table 3, according to the guttman–kaiser greater-than-one (k1) rule, eight factors should have been extracted. according to the scree plot (see figure 1) and parallel analysis (see table 3), six factors should be extracted. hence, five-, six- and eight-factor solutions were explored using varimax rotation. figure 1: cattell’s scree plot for the neo-pi-3 and horizontal–vertical i/c scale. table 3: eigenvalues and parallel analysis results.
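the two empirical retention rules mentioned above can be sketched as follows. this is a minimal python illustration on synthetic data, assuming hypothetical helper names (eigenvalues, kaiser_count, parallel_analysis) and a common variant of parallel analysis that compares observed eigenvalues with the 95th percentile of eigenvalues from random normal data of the same size. the source does not state which variant or settings were used, so these choices are assumptions.

```python
import numpy as np


def eigenvalues(data):
    """Eigenvalues of the correlation matrix, largest first."""
    corr = np.corrcoef(data, rowvar=False)
    return np.sort(np.linalg.eigvalsh(corr))[::-1]


def kaiser_count(eigvals):
    """Guttman-Kaiser (K1) rule: count eigenvalues greater than one."""
    return int(np.sum(eigvals > 1.0))


def parallel_analysis(data, n_sims=200, percentile=95, seed=0):
    """Retain leading components whose eigenvalues exceed those of random data.

    n_sims, percentile and the normal reference data are assumptions,
    not settings reported in the article.
    """
    rng = np.random.default_rng(seed)
    n, p = data.shape
    sims = np.empty((n_sims, p))
    for i in range(n_sims):
        sims[i] = eigenvalues(rng.normal(size=(n, p)))
    threshold = np.percentile(sims, percentile, axis=0)
    observed = eigenvalues(data)
    exceeds = observed > threshold
    # Stop at the first component that fails to beat its random counterpart
    n_retain = p if exceeds.all() else int(np.argmin(exceeds))
    return n_retain, observed, threshold


# Synthetic data (assumption): 272 respondents, 8 scales, 2 latent factors
rng = np.random.default_rng(1)
latent = rng.normal(size=(272, 2))
weights = np.zeros((8, 2))
weights[:4, 0] = 0.8
weights[4:, 1] = 0.8
data = latent @ weights.T + rng.normal(scale=0.6, size=(272, 8))

n_retain, observed, threshold = parallel_analysis(data)
print("k1 rule:", kaiser_count(observed))
print("parallel analysis:", n_retain)
```

on real questionnaire data the k1 rule often suggests more factors than parallel analysis, which is the pattern reported in table 3; the clean synthetic data here are too simple to show that divergence.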
the five-factor solution explored whether the individualism–collectivism dimension could be subsumed by the neo-pi-3 as mccrae and costa (2003) argue. the six-factor model addressed whether individualism–collectivism can be considered as an additional construct. finally, the eight-factor solution addressed whether individualism and collectivism are in fact separate constructs, in line with the empirical conclusion using the guttman–kaiser greater-than-one rule. these results are presented here. all loadings above 0.40 or below -0.40 were considered as a loading on that particular factor for each analysis and are represented in bold font in the relevant tables. five-factor solution for the neo-pi-3 and the horizontal–vertical i/c scale table 4 presents the results for the five-factor solution. factor 1 loads as the conscientiousness factor as all six facets of this domain load positively on this factor. in addition, impulsiveness and vulnerability both load on this factor. however, impulsiveness only has its secondary negative loading on this factor, whereas vulnerability’s highest loading appears on factor 1. factor 2 loads all of the openness facets and five of the six extraversion facets (excluding values, which does not load on any factor). altruism also loads on factor 2 (0.465), but this is the secondary loading for altruism. its primary loading appears on factor 4 with the rest of the agreeableness facets. all six of the neuroticism facets load on factor 3, with moderate-to-high loadings of above 0.5. in addition, gregariousness has a primary loading on this factor of -0.441. factor 4 is characterised as the agreeableness factor, with all six of the facets loading above 0.4. vertical individualism also has a secondary negative loading on this fourth factor of -0.521. the fifth factor is in fact the individualism–collectivism dimension, consisting of the four subscales.
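the rotation and the salience cut-off used in tables 4 to 6 can be illustrated with a short python sketch. it assumes a widely used iterative, svd-based varimax routine (without kaiser normalisation) and an invented loading matrix; spss may differ in normalisation and convergence settings, so this shows the technique rather than reproducing the analysis. loadings above 0.40 or below -0.40 are flagged, mirroring the bold entries in the tables.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation (no Kaiser normalisation; an assumption)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the varimax criterion with respect to the rotation
        gradient = loadings.T @ (
            rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p
        )
        u, s, vt = np.linalg.svd(gradient)
        rotation = u @ vt                        # nearest orthogonal matrix
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):
            break                                # no meaningful improvement
        criterion = new_criterion
    return loadings @ rotation, rotation


# Invented loadings (assumption): a simple structure blurred by a 30-degree turn
simple = np.array([[0.8, 0.0], [0.7, 0.1], [0.6, 0.0],
                   [0.0, 0.8], [0.1, 0.7], [0.0, 0.6]])
angle = np.deg2rad(30)
mix = np.array([[np.cos(angle), -np.sin(angle)],
                [np.sin(angle), np.cos(angle)]])
unrotated = simple @ mix

rotated, rotation = varimax(unrotated)
salient = np.abs(rotated) > 0.40                 # cut-off used in the study

print(np.round(rotated, 3))
print(salient)
```

because the rotation is orthogonal, each variable's communality is unchanged; only the distribution of variance across factors shifts, which is why rotated solutions are easier to read without fitting the data any better.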
all four subscales, hi, vi, hc and vc, have positive and moderate-to-high loadings on factor 5. horizontal individualism has a loading of 0.614, vi 0.570, hc 0.768 and vc 0.808. table 4: five-factor solution for the joint factor analysis of the neo-pi-3 and the individualism and collectivism dimensions. six-factor solution for the neo-pi-3 and the horizontal–vertical i/c scale table 5 presents the six-factor solution. given that openness and extraversion loaded on the same factor in the five-factor solution, it was concluded that this was not tenable. in this solution, the five factors of the neo-pi-3 now load as five separate factors, as the theory indicates. extraversion and openness no longer load on the same factor as seen in table 4. factor 1 continues as the conscientiousness factor, with impulsiveness and vulnerability loading negatively as before. factor 2 is now characterised by the neuroticism domain. all six neuroticism facets load moderately to high on factor 2. assertiveness (-0.431) now loads negatively on factor 2 as well as actions (-0.422). all six of the openness facets have small to high positive loadings on factor 3. factor 4 is characterised by small to high positive loadings for five of the six extraversion facets. assertiveness, the sixth facet of extraversion, loads positively on factor 6, which is the individualism–collectivism domain. it also has a small negative loading on factor 2. table 5: six-factor solution for the joint factor analysis of the neo-pi-3 and the individualism and collectivism dimensions. factor 5 can be considered the agreeableness factor. five of the six facets load positively, with small-to-moderate loadings. factor 6 finally is primarily the individualism–collectivism factor. all four dimensions of the individualism–collectivism dimension have positive, moderate-to-high loadings on this factor. in addition, impulsiveness loads primarily on this factor. 
both assertiveness and activity have small secondary, positive loadings on factor 6. compliance, which does not have a significant positive loading on any other factor, has a high negative loading of -0.761 on factor 6. eight-factor solution for the neo-pi-3 and the horizontal–vertical i/c scale table 6 presents the eight-factor solution for the joint factor analysis of the neo-pi-3 and the individualism–collectivism dimension. factor 1 remains the same, loading as the conscientiousness factor with impulsiveness and vulnerability negatively loading as well. factor 2 loads as in table 5, characterised by the neuroticism domain. factor 3 is characterised as the openness factor, with all six facets loading positively. in addition, excitement-seeking and positive emotions both have positive, secondary loadings on factor 3. factor 4 is characterised by five of the six extraversion facets, excluding assertiveness, with moderate-to-high positive loadings. activity (0.52) and assertiveness (0.47) both have positive primary loadings on factor 6 instead. factor 5 is seen as the agreeableness factor, with five of the six facets loading positively on this factor. the sixth facet, compliance, loads negatively on the sixth factor with a loading of -0.784. the sixth factor is not characterised by any of the other factors or scales. as discussed, activity, assertiveness and compliance all load on this factor, with a positive secondary loading for impulsiveness (0.478). the seventh factor is the individualism factor, where hi and vi load positively. finally, the eighth factor is characterised as the collectivism factor, with hc and vc both loading positively. no other cross-loadings are evident for these two factors. table 6: eight-factor solution for the joint factor analysis of the neo-pi-3 and the individualism and collectivism dimensions. discussion this study sought to examine the relationship between the ffm and individualism and collectivism.
this was done by examining five-, six- and eight-factor solutions for data obtained from the neo-pi-3 and the hvic scale. in the five-factor solution, we set out to test if the individualism–collectivism dimension is subsumed by the neo-pi-3 as mccrae and costa (2003) argue. we found that neuroticism, agreeableness and conscientiousness all loaded on separate factors; however, extraversion and openness loaded on the same factor. the fifth factor could be labelled as the individualism–collectivism factor as all four constructs for individualism and collectivism had loaded strongly on this factor (hi, vi, hc and vc). vertical individualism also had a negative and moderate secondary loading on factor 4: the agreeableness factor. vertical individualism is reflected in the desire for individuals to compete with other individuals, therefore recognising and accepting inequality amongst individuals, and a concern with becoming distinguished and acquiring status is evident (triandis & gelfand, 1998). this can be said to be rather contrary to facets such as compliance, altruism and tender-mindedness. these are all facets constituting the agreeableness domain. thus, this negative loading appears justified. aside from the vi cross-loading, no other individualism–collectivism dimensions loaded with the five factors, suggesting that individualism and collectivism are not subsumed in the five factors of the neo-pi-3. this is in keeping with prior research in the field (cheung et al., 2001; laher, 2014; vogt & laher, 2009). the six-factor solution that followed aimed to test if the ffm would load as five separate factors, and the individualism–collectivism dimension would load on the sixth factor as a separate construct. in this solution, the five factors of the neo-pi-3 loaded as five separate factors, as the theory indicates, with a sixth separate individualism–collectivism dimension. this result concurs with other research. cheung et al.
(2008) confirmed that a six-factor solution is ideal, with a collectivism dimension included via an interpersonal relatedness factor. similarly, valchev et al. (2014) found support for separate communal personality traits. the results of the eight-factor solution are the most interesting of the factor solutions as they suggest a new way of defining the individualism–collectivism dimension. they suggest a separation of the construct into two distinct constructs that can be individually explored further. overall though, the five-, six- and eight-factor solutions echo the need for the individualism and collectivism dimensions to be included in the understanding of personality as they are not subsumed in the ffm of personality as operationalised by the neo-pi-3. these results provide further empirical support to the arguments calling for an expansion of the ffm that are in line with previous research in the field (cheung et al., 2008; laher, 2013). while these findings suggest the expansion of the ffm, it is necessary to note the limitations of the sample used in terms of sample size and representivity. the use of etic instruments also needs to be noted (laher & cockcroft, 2014). further research with larger and more representative samples is recommended. the use of several measures of individualism and collectivism would be important, as literature has shown that the use of a single measure might provide too simplistic a view for these complex variables (see taras et al., 2014). further, the development of an emic tool that can account for south african definitions of individualism and collectivism may be very useful to such a study within the south african context. as discussed in the literature review, further understandings and exploration of ‘ubuntu’ as a useful way of defining a specifically south african collectivism would prove very useful for better appreciation and accommodation of the unique south african context.
conclusion it is evident from the findings that individualism and collectivism were not found to be subsumed in the ffm as operationalised by the neo-pi-3. the six-factor solution, for the inclusion of the five factors of the neo-pi-3 and the individualism–collectivism dimension, is the most informative in supporting calls for the inclusion of a sixth factor, while the eight-factor solution provided an interesting finding by splitting the dimension into individualism as one construct and collectivism as another. this finding contributes to debates on the understanding of the ic construct as either a single construct on a continuum or separate bipolar constructs (see taras et al., 2014). overall, the findings provide support for the need to reconsider the universality of the ffm in its current form. this finding has implications for personality assessment where the majority of the instruments still utilise the ffm as the gold standard for understanding the measurement of personality. acknowledgements competing interests the authors declare that they have no financial or personal relationships that may have inappropriately influenced them in writing this article. authors’ contributions s.l. conceptualised the question, supervised the project and wrote the article. s.d. collected the data and contributed to the write-up of the literature review, methods and results in the article. funding this work is based on the research supported in part by the national research foundation of south africa (grant number: 116327). references ashton, m.c., & lee, k. (2005). honesty-humility, the big five, and the five factor model. journal of personality, 73, 1321–1353. https://doi.org/10.1111/j.1467-6494.2005.00351.x cheung, f.m., cheung, s., zhang, j., leung, k., leong, f., & yeh, k.h. (2008). relevance of openness as a personality dimension in chinese culture: aspects of its cultural relevance. journal of cross-cultural psychology, 39, 81–108. 
https://doi.org/10.1177/0022022107311968 cheung, f.m., leung, k., zhang, j.x., sun, h.f., gan, y.q., song, w.z., & xie, d. (2001). indigenous chinese personality constructs: is the five-factor model complete? journal of cross-cultural psychology, 32, 407–433. https://doi.org/10.1177/0022022101032004003 eaton, l., & louw, j. (2002). culture and self in sa: individualism and collectivism predictions. journal of social psychology, 140, 210–217. https://doi.org/10.1080/00224540009600461 hofstede, g. (1980). culture’s consequences: international differences in work related values. beverley hills, ca: sage. huck, s.w. (2012). reading statistics and research (6th edn.). knoxville, tn: pearson education. louw, d.j. (2001). ubuntu and the challenges of multiculturalism in post-apartheid south africa. retrieved from http://www.phys.uu.nl/~unitwin/ubuntu.html laher, s. (2010a). the applicability of the neo-pi-r and the cpai-2 in south africa. (unpublished phd dissertation). university of the witwatersrand, johannesburg. laher, s. (2010b). using exploratory factor analysis in personality research: best practice recommendations. south african journal of industrial psychology, 36, 1–7. https://doi.org/10.4102/sajip.v36i1.873 laher, s. (2013). understanding the five factor model and five factor theory through a south african cultural lens. south african journal of psychology, 43, 208–221. https://doi.org/10.1177/0081246313483522 laher, s., & cockcroft, k. (2014). psychological assessment in post-apartheid south africa: the way forward. south african journal of psychology, 44, 303–314. https://doi.org/10.1177/0081246314533634 laungani, p. (1999). cultural influences on identity and behaviour: india and britain. in y.t. lee, c.r. mccauley, & j.g. draguns (eds.), personality and person perception across cultures (pp. 191–212). mahwah, nj: lawrence erlbaum associates. lodhi, p.h., deo, s., & belhekar, v.m. (2002). 
the five-factor model of personality: measurement and correlates in the indian context. in r.r. mccrae & j. allik (eds.), the five-factor model of personality across cultures (pp. 227–248). new york: kluwer academic. ma, v., & schoeneman, t.j. (1997). individualism versus collectivism: a comparison of kenyan and american self-concepts. basic and applied social psychology, 19, 261–273. https://doi.org/10.1207/s15324834basp1902_7 markus, h.r., & kitayama, s. (1998). the cultural psychology of personality. journal of cross-cultural psychology, 29, 63–87. mccrae, r.r., & costa, p.t., jr. (2003). personality in adulthood: a five-factor theory perspective (2nd edn.). new york: guilford press. mccrae, r.r., & costa, p.t., jr. (2010). neo-inventories for the neo personality inventory-3, neo five factor inventory-3 and neo personality inventory-revised. professional manual. lutz, fl: psychological assessment resources, inc. mccrae, r.r., costa, p.t., & martin, t.a. (2005a). the neo–pi–3: a more readable revised neo personality inventory. journal of personality assessment, 84, 261–270. https://doi.org/10.1207/s15327752jpa8403_05 mccrae, r.r., terracciano, a., & 78 members of the personality profiles of cultures project. (2005b). personality profiles of cultures: aggregate personality traits. journal of personality and social psychology, 89, 407–425. https://doi.org/10.1037/0022-3514.89.3.407 mpofu, e. (2001). exploring the self-concept in african culture. journal of genetic psychology, 155, 341–354. https://doi.org/10.1080/00221325.1994.9914784 mwamwenda, t.s. (2004). educational psychology: an african perspective (3rd edn.). sandton: heinemann publishers. nel, j.a., valchev, v.h., rothmann, s., van de vijver, f.j.r., meiring, d., & de bruin, g.p. (2012). exploring the personality structure in the 11 languages of south africa. journal of personality, 80, 915–948.
https://doi.org/10.1111/j.1467-6494.2011.00751.x taras, v., sarola, r., muchinsky, p., kemmelmeier, m., singelis, t.m., avsec, a., … sinclair, h.c. (2014). opposite ends of the same stick: multi-method test of the dimensionality of individualism and collectivism. journal of cross-cultural psychology, 45, 213–246. https://doi.org/10.1177/0022022113509132 triandis, h.c. (2001). individualism-collectivism and personality. journal of personality, 69, 907–924. https://doi.org/10.1111/1467-6494.696169 triandis, h.c., & gelfand, m.j. (1998). converging measurement of horizontal and vertical individualism and collectivism. journal of personality & social psychology, 59, 1006–1020. https://doi.org/10.1037/0022-3514.74.1.118 valchev, v.h., van de vijver, f.j.r., meiring, d., nel, j.a., hill, c., laher, s., & adams, b.g. (2014). beyond agreeableness: social–relational personality concepts from an indigenous and cross-cultural perspective. journal of research in personality, 48, 17–32. https://doi.org/10.1016/j.jrp.2013.10.003 vogt, l., & laher, s. (2009). the relationship between individualism/collectivism and the five factor model of personality: an exploratory study. psychology in society, 37, 39–54. 
about the author(s) kobus maree department of industrial psychology, faculty of economic and management sciences, university of the free state, bloemfontein, south africa citation maree, k. (2020). the need for contextually appropriate career counselling assessment: using narrative approaches in career counselling assessment in african contexts. african journal of psychological assessment, 2(0), a18. https://doi.org/10.4102/ajopa.v2i0.18 research project registration: project number: 115505 original research the need for contextually appropriate career counselling assessment: using narrative approaches in career counselling assessment in african contexts kobus maree received: 26 aug. 2019; accepted: 18 dec. 2019; published: 03 mar. 2020 copyright: © 2020. the author(s). licensee: aosis. this is an open access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. abstract this article reports on the value of using narrative approaches in career counselling assessment in african contexts in addition to quantitative approaches to enhance the contextual relevance of assessment.
the narrative represents a response to calls for local research on career counselling assessment theory and practice. a number of factors that co-determine the development of theory and assessment-related strategies are elaborated herein. the view is expressed that the feasibilities of the 21st-century labour markets should co-determine assessment-related strategies and theory development. in addition, the imperative to constantly innovate assessment in career counselling is emphasised against the background of the shift in emphasis from climbing the career ladder to flourishing in unstructured occupational contexts before some caveats for the successful implementation of postmodern approaches to assessment in career counselling are explicated briefly. the call for advancing the theory base in career counselling assessment-related matters in global south contexts in general and african contexts in particular is repeated. it is concluded that career counselling assessment theory and practice should be conceptualised from the key perspective that it should particularly meet the basic criteria of contextual relevance. keywords: narrative approaches in career counselling; assessment in global south contexts; quantitative approaches; enhancing contextual relevance; innovation in career counselling assessment. introduction many authors have not only called for innovation in particular, and reshaping of the theory and practice of career psychology in general but also for greater embedment of assessment in career counselling in national contexts (e.g. gerryts & maree, 2019a; maree & gerryts, 2019; maree & molepo, 2016; stead & watson, 2017). 
critics of global north (north american and eurocentric) career counselling theory and intervention in global south (developing) countries have especially called for research on approaches to career counselling theory and practice that are more harmonious with the conditions of a developing country (lopez levers, may, & vogel, 2011), rather than uncritically accepting approaches, questionnaires and interventions developed elsewhere in the world. researchers and practitioners alike have called for cultural, educational, gender and socio-economic factors to be considered to advance the relevance of career counselling in these contexts in particular (alika & egbochuku, 2009; metz & guichard, 2009). this aim is in keeping with the following sentiments expressed by the health professions council of south africa (hpcsa, 2010): the history of development and use of psychometric measuring devices, instruments, methods and techniques in south africa have been tainted by the legacy of segregation which influenced certain stereotypical attitudes and culturally insensitive and inappropriate interventions. as a result, very few tests are available that have been developed and applied with the necessary appreciation of cultural and other diversity concerns with a view to standardizing same for all south africans. (p. 1) this statement makes clear that psychologists in general should carefully and consistently re-consider whether the approaches and associated instruments and interventions they draw on (i.e. their assessment-related activities in general) adequately equip them with the know-how to assess and prepare their clients for rapidly changing global contexts. below, the author elaborates on what has been said so far.
the need for updating local research on career counselling assessment theory and practice watson (2013) maintained that career theory in general – and career construction theory, practice and assessment in particular – may fail to meet the needs of clients from all cultural groups, especially non-white, non-western, non-standard populations (‘non-career … the underclass, the underprivileged, the disadvantaged, the disaffected’ (p. 6) in particular). it is of particular concern that only a small number of psychometric tests and corresponding and trustworthy qualitative assessment techniques and strategies have been designed specifically for diverse african populations (maree, 2013). some models, strategies, questionnaires and tests developed in north american and eurocentric contexts can be adapted, re-standardised and subsequently implemented especially in third world (developing country) contexts. in addition, as mentioned earlier, research should at the same time be conducted to design and develop models, strategies, interventions, questionnaires (qualitative) and tests locally to facilitate career counselling in developing countries. in this regard, the author supports stead and watson’s (2017) call for appropriate assessment and establishment of the applicability of career counselling theory and practice to population groups that differ from the population groups for whom they were developed originally instead of uncritically assuming its applicability. these views are consistent with the sentiments expressed by oakland (2004), who maintained that the design and development of (qualitative and quantitative) assessment instruments and matching interventions and so on should form part of the broader socio-economic aims of any country. 
from climbing the career ladder to flourishing in unstructured and rapidly changing occupational contexts up until the latter part of the previous century, individuals would mostly choose a career, gradually climb the ladder within that career and its clearly demarcated structures, and stay in it for the rest of their career-lives, often staying with one enterprise throughout. however, the situation has changed considerably: few people in global north contexts especially, but, to an increasing extent, in global south contexts too, do likewise. on the contrary, workers of today move from one job and one work environment to another repeatedly. in addition, in the current uncertain and fluid career world, they are expected to negotiate and navigate multiple work-related transitions (and deal regularly with work-related trauma) in the course of their career-lives. this ever-changing situation is fundamentally affecting the way in which career counselling should be provided to address the needs of contemporary career counsellors and respond adequately to changes taking place in the world of work. simply asking clients a number of questions that relate to their personal and family history, letting them complete a few interest and personality inventories, inquiring about their study orientation and letting them complete intelligence or aptitude tests, after which they are ‘told’ what they ‘must’ do to ensure an optimal fit between their traits and possible work environments by an expert career counsellor (in other words, career education or career guidance is provided) no longer suffices. the view that a linear career ‘path’ should be chosen is considered defunct today. such an approach contributes to a large extent to the inordinately high dropout rates at tertiary level.
the vast majority of prospective students are never allowed to recount their career-life stories (narratability), reflect on these stories and draw on these stories reflexively ((auto)biographicity) (savickas, 2019) to uncover their key career-life themes and draw on their own inner advice (see later). consequently, many students lack a sense of meaning and purpose when they enter tertiary training environments, display a curious inability to adapt to changing circumstances and subsequently either drop out or migrate to another field of study (without having received avant-garde career counselling). many of them eventually either end up feeling stuck in a world of work that they experience as frustrating and unfulfilling or end up being inadequately employable. fugate and kinicki (2008) referred to the importance of individuals’ dispositional employability as ‘a constellation of individual differences that predispose employees to (pro-) actively adapt to their work and career environments’ and confirmed that ‘employability is a disposition that captured individual characteristics that foster adaptive behaviours and positive employment outcomes’ (p. 504). in other words, adaptability could be regarded as the vehicle that influences and shapes people’s employability potential. seen from this perspective, the ‘new’ approach serves to bolster people’s potential to find sustainable and decent work. moreover, this approach aims to help people make meaning of their career-lives and experience a sense of purpose and hope in their career-lives instead of merely deciding on a field of study, completing their studies or training, and finding work. next, the author briefly elaborates on a number of factors that co-determine the development of theory and assessment-related strategies.
factors that co-determine career theory development and assessment-related strategies

multiple global occupational changes, especially during the past few decades, call for timely and appropriate responses from career counselling theorists, practitioners and researchers to ensure that people seeking career counselling receive interventions that are compatible with developments in the field. establishing success in a career, for instance, is becoming ever more unpredictable. the notion of ‘climbing the ladder’ is becoming redundant in many workplace contexts (bimrose, 2010), prompting career psychology theorists, practitioners and researchers to theorise about career counselling and implement avant-garde practical career counselling strategies to help people deal with the uncertainty, insecurity and anxiety that inevitably characterise new occupational environments. in these environments, large numbers of people lose their jobs, fail to find employment or discover that their qualifications have become redundant and no longer suffice to help them find employment. globally, consensus is growing that the focus of career counselling has shifted from ‘matching’ people’s personalities to ‘fitting’ career environments towards blending careers into people’s lives and lifestyles more satisfactorily. moreover, there is general agreement that people’s careers should be integrated into their lifestyle rather than the other way round. in addition, workers are increasingly being subjected to trauma at the workplace. for these and other reasons, innovative ways must be found to help workers deal with this trauma, which is inherently a side effect of recurrent workplace transitions. the above paragraph confirms the view that the feasibilities of the 21st-century labour markets should co-determine assessment-related strategies and theory development. this statement is evidenced by the changing vocabulary of the career counselling discourse.
concepts such as internationalisation, globalisation of the workforce, labour surplus, diversification, multi-skilling and the gig economy reflect researchers’ and practitioners’ almost desperate attempts to regain a sense of control over a situation that sometimes appears to be slipping out of control. likewise, there is no contesting the key importance of acquiring information communication technology (ict) skills, especially in the light of the fourth and maybe even fifth industrial revolution (work 4.0 and work 5.0) (maree, 2019b; schwab, 2016; schwab & samans, 2016). the fourth wave in career counselling (ca. 1990–2010) was characterised by uncertainty, protean and boundaryless careers, de-jobbing and the disappearance of standard jobs (savickas, 2015). the fifth wave in career counselling will be characterised by attempts to enable people to navigate repeated transitions brought about by fundamental changes in ict triggered by the digital revolution (gillwald, 2019; gurri, 2013). gillwald (2019) predicts that the sixth wave, that is, the kondratiev wave, will be related to renewable energy. in the meantime, we are bombarded by predictions of changing employment patterns, such as the prediction that robots will soon take over the vast majority of jobs and that, in the future, human beings will have to deal with a situation where notifications such as ‘no jobs available’ will be supplemented by ones that state ‘humans need not apply’. irrespective of individuals’ perspectives on these matters, there seems to be consensus that helping people make meaning and lead a purposeful life in a ‘jobless world’ can become, or has already become, a key element of career counsellors’ job.
gillwald (2019) further contends that schumpeter’s (1942, 1982 [1934]) idea of ‘creative destruction’ (the disassembling of time-honoured practices to pave the way for innovation) currently underpins a great deal of postmodern innovation theory and informs speculations about the value of disruption as a theoretically constructive economic and social influence (henton & held, 2013). this view is wholly in line with current thinking in career counselling that advocates the idea of actively mastering what has been passively suffered and of converting challenges or ‘problems’ into opportunities, prospects, positive expectations, possibilities and hope (savickas, 2019). whereas, during the fourth wave, the predominant helping models in career psychology have been career counselling and life-design counselling (models that will, in all likelihood, still prevail during the first part of the fifth wave at least), from my vantage point, during the fifth wave (extending into the sixth wave), life purpose counselling could become the predominant helping model, focused on inspiration, promoting social justice, purpose, ethical behaviour and the common good to marry the needs of humans with the skills of robots. from this perspective, contextualising and innovating assessment in career counselling to equip career counsellors fittingly for their task of assessing and counselling their clients appositely in these rapidly changing times is non-negotiable.

contextualising career counselling theory and practice in global south contexts

global south contexts are typically characterised by poverty and a severe lack of resources. fewer work opportunities are available than in global north contexts and unemployment is mostly widespread and ever-increasing. however, in the vast majority of contexts, there are pockets of affluence. therefore, whilst the majority of contexts are seriously deprived (disadvantaged), a few will display a flourishing character.
for this reason, in global south contexts, career counselling as a field should cater for the needs of both privileged and disadvantaged groups. different kinds of career counselling styles (based on different theoretical orientations) will need to be drawn on to address the divergent needs of individuals from diverse contexts. in affluent contexts (that resemble those contexts where newer, postmodern theory and intervention were initially developed), and especially in one-on-one contexts, postmodern approaches, including self-construction and career construction (guichard, 2009; savickas, 2019; savickas et al., 2009) (and life-design counselling), are increasingly gaining traction. this is not the case in poverty-stricken regions, where group career counselling is the order of the day in the few regions where such a service is provided. in these contexts, the traditional person–environment fit model (vocational guidance) still prevails. researchers such as albien (2018), maree (2017a, 2017b, 2017c), maree, cook and fletcher (2018) and watson (2013) have demonstrated the value of contextualising self-construction, career construction and life-design counselling and then applying them successfully in (south) african contexts, including individual african contexts. maree (2017b), in particular, has shown how life-design counselling could be adjusted for use in group-based african contexts. however, the application of these newer approaches remains woefully inadequate in african contexts.

innovating assessment in career counselling in global south contexts

the author concurs with phares’ (1992) statement that the primary allegiance of psychologists should be to devising the best ways to be helpful to clients instead of to any given theoretical approach and associated techniques and interventions.
therefore, to promote the rigour (validity, reliability and trustworthiness) of any assessment and intervention technique or strategy, it is key to constantly innovate and carefully contextualise conceptual frameworks and associated intervention strategies before using them in contexts that differ from those for which they were originally conceptualised. reflecting theoretically and making practical suggestions on reshaping and adapting career theory and practice should therefore, in developing country contexts, occur at the following two levels: innovating career counselling constantly and actively, in combination with constantly contextualising career counselling. innovation is particularly important to facilitate avant-garde career counselling in collectivist, third world (under-developed and developing country) contexts, especially where people (either the majority of people or members of minority groups) have been subjugated or colonised for decades and longer. a few matters that lie at the heart of our efforts to update, innovate and contextualise career counselling are highlighted below. firstly, the need to provide career counselling to all people across the diversity spectrum. the south african career development association’s (sacda) efforts in this regard should therefore be applauded and strongly supported (department of higher education and training (dhet), 2017). secondly, ensuring that career counselling is provided in various formats depending on the most pressing need that exists in any given context. this includes the provision of (1) basic career counselling-related information (which could be provided by suitably trained practitioners with a grade 12 certificate), (2) career education, guidance and development (services that could be provided by non-psychologists with apposite training) and (3) career counselling (i.e.
the psychological dimension of career counselling) that could only be provided by psychologists. thirdly, understanding the great need not only to provide group-based career counselling but also to promote group-based integrative qualitative–quantitative career counselling across (south) africa. whilst there will always be a need for one-on-one career counselling, the greater need is to make career counselling available to all people. the caveat is the importance of upholding and promoting the rigour of all interventions. fourthly, a willingness by all to embrace newer developments in addition to a commitment to receive adequate training in these newer developments in theory and practice. fifthly, an understanding of the fact that inadequate or inappropriate career counselling, without any doubt, contributes to worrying dropout rates at institutions of education and training. lastly, acceptance of the key role of career counselling in promoting global efforts to provide sustainable decent work for all. in the next section, the value of utilising and integrating qualitative and quantitative interventions is discussed briefly.

the value of utilising and integrating qualitative and quantitative interventions

inadequate research has been conducted on the value of using subjective (qualitative, storied or narrative) interventions and associated techniques in career counselling along with quantitative approaches in global south contexts, especially in the context of a developing country (maree, 2010, 2015, 2016a, 2016b). in many global north contexts, however, the use of integrative career counselling approaches (integrating qualitative and quantitative approaches and interventions) has grown exponentially in popularity (mcmahon & watson, 2015).
against this background, part of the solution to contemporary career counselling challenges in the context of a developing country seems to be the use of an integrative qualitative–quantitative approach to career counselling premised on the principles of self-construction and career construction (that comprise the basis of life-design counselling). summarised, from a postmodern and storied perspective, self-construction and career construction interventions entail elicitation of clients’ career-life stories (phase 1), validation of these stories by the clients (phase 2) and the planning and enactment of the future chapters in these stories (phase 3) by carrying out action steps that are jointly conceptualised and agreed upon by counsellors and their clients. these steps promote clients’ career adaptability (including their career concern, career control, career curiosity and career confidence). consequently, their employability, too, is enhanced (fugate, kinicki, & ashforth, 2004). when people share their autobiographies, they could be helped to identify their key life themes and find out what really drives or motivates them. this approach has been shown to identify people’s deep-seated strengths and motives and to help them reflect first on their career-life stories and then on their reflections (meta-reflect) in the context of a ‘typical’ developing country (maree, 2017a). these meta-reflections facilitate scaffolding for action and forward movement (savickas, 2015). this approach could enable career counsellors to help people regain a sense of meaning, purpose and hope in their career-lives, even, i surmise, in a ‘jobless world’.
the integrative approach is premised on the acknowledgement of the need to make use of subjective (stories) and objective (scores) factors to help people confronting repeated crossroads manage past, present and future trauma and triumph over challenges by reflexively constructing, deconstructing, reconstructing and co-constructing their career-life stories, and transforming ‘problems’ or challenges into opportunities, disappointments into accomplishments, and hurt into healing. this approach endorses duffy and dik’s (2009) view that:

[o]nce clients have addressed the broad level of life purpose or meaning, a next step is for counselors to help clients connect what they view as their life purpose or role in the larger society with their activity within the work role. (p. 441)

the integrative approach is founded on the notion of career as a story and is closely aligned with self-construction and career construction counselling theories. these theories are encapsulated in the encompassing life-design approach and intervention. the life-design approach represents the first-ever coordinated career counselling theory (savickas et al., 2009). it was induced by the need for lifelong, holistic and contextual career counselling interventions to help people preserve perspective and hope in rapidly changing work contexts (maree, 2010, 2015). life-design intervention comprises a space-bound, time-bound and context-bound type of career counselling intervention. it builds on and integrates career construction and self-construction with those idiosyncratic aspects that feature in people’s career-lives. eliciting and drawing on people’s career-life stories represents an invaluable mechanism for enabling life design (mcadams, 2001). life design ‘emphasizes narratability to tell one’s story coherently, adaptability to cope with changes in self and situation, and intentionality to design a successful life’ (hartung, 2013, p. 11).
it could be utilised to help people actualise their potential and make meaningful social contributions. this approach seems especially germane for use in african contexts (maree, 2019b), which are shaped by ubuntu, isinti, ujamaa [broadly speaking, the emphasis on the extended family, brotherhood, and familyhood; the belief that human beings become persons through other people or the community] and associated principles (nussbaum, palsule, & mkhize, 2010) and are the birthplace of the notion of storytelling. curiously, in the corridors of conference venues, the author was asked questions such as ‘but will the storied approach “work” in a developing country such as south africa?’ essentially, this kind of question reflects ignorance about the fact that storytelling lies at the very heart of african cultures and that drawing on the elicitation of stories comes naturally in african contexts (sonn, stevens, & duncan, 2013; stevens, duncan, & sonn, 2013). using a storied approach in these contexts makes perfect sense. in fact, using the storied approach is unquestionably harmonious with global south contexts (especially african contexts). indeed, narrating stories on various occasions and for various reasons has been ‘practised’ across africa for millennia. stevens, duncan, and sonn (2013, p. 18), for instance, contend that ‘black history … has been passed on through the art of storytelling’. underlying the tradition of ‘storytelling’ is the paradigm of social constructionism. stated briefly, this theory advances identity formation in preference to personality traits, career adaptability in place of ‘maturity’, using ‘stories’ or narratives (a qualitative approach) together with ‘scores’ (numbers; a quantitative approach) and action as opposed to inaction and intent. this approach also advances the importance of becoming employable rather than finding work or a job (employment) (often the sole aim in disadvantaged contexts where job opportunities are extremely limited).
in the long run, this approach strives to help people negotiate and deal with repeated occupational transitions during their working lives and transform career counselling assessment into intervention. in the process, (critical) self-reflection, reflexivity as well as dialogical (social) meanings of co-construction (between clients and their counsellors) are advanced (blustein, palladino schultheiss, & flum, 2004; cardoso, silva, gonçalves, & duarte, 2014). at the very least, it seems key to contextualise theory and practice developed elsewhere when applying them in contexts that differ from the contexts in which these theories and practices were initially developed. this matter is briefly discussed next.

some caveats for the successful implementation of postmodern approaches to assessment (and intervention) in career counselling

the successful introduction of the integrative approach depends strongly on a number of factors, including the following: allowing sufficient time and facilitating sufficient training opportunities for career counsellors to acquaint and familiarise themselves with the approach. acceptance of the value and importance of the approach as a mechanism to enable them to respond appropriately to sweeping changes in the occupational world. adjustment of career counselling training programmes to reflect the contents of the integrative approach by the education and psychology departments of universities and other tertiary training institutions. based on the personal experiences of the writer as a lecturer of career counselling at various levels and at different training institutions, the integrative approach is embraced and greatly appreciated not only by master’s students but also by students from other levels of training.
career counsellors who have been trained in traditional modes of career counselling, too, should have access to further training opportunities to introduce them to newer, different and more respectful career counselling approaches that more satisfactorily respond to the different demands associated with the rapidly changing world of work. acceptance by career counsellors that their allegiance in career counselling should be to promote the best interests of their clients. as there is global acceptance of the merit and importance of the integrative approach, and because research has shown conclusively the value of different postmodern approaches to career counselling, there appears to be no excuse for not doing so (duarte, paixão, & da silva, 2019; hartung, 2019; maree, 2019a).

conclusion

this article endeavours to demonstrate the importance of using narrative approaches in career counselling assessment in african contexts in addition to quantitative approaches to enhance contextual relevance. it highlights the need for approaches that are more harmonious with the conditions of a developing country. the author has argued that providing career counselling is a multidimensional enterprise. contextualisation and constant innovation of career counselling are especially important in collectivist, global south (developing country) contexts where people have been suppressed and colonised for decades, where the cultures often differ widely from the contexts in which theories and practices were initially developed, and where authentic career counselling has often been obstructed. the article is concluded by reiterating that career counselling assessment theory and practice should be conceptualised from the key perspective that it should particularly meet the basic criteria of contextual relevance.

an afterthought

it seems important to note sharf’s (2013) assertion that:

[n]o career theories of development have been formulated to apply specifically to one culture or another.
however, research has been done on the applicability of particular career development theories to specific cultural groups. (p. 17)

the author concurs with watson (2013) that the first part of sharf’s statement is particularly problematic. recently, shuttleworth-edwards (2018) asked whether population-based norms are becoming ‘obsolete’, citing suchy’s (2016) assertion that ‘defining a population by country borders only makes sense to the extent that countries are characterised by a single language and a unified educational content’. this is, of course, no longer the case in the majority of contexts because of ‘globalization, migration, and population diversity around the world’ (suchy, 2016, p. 973). noting the aforementioned observations, the author of this article cannot help but wonder: should sharf’s (2013) assertion not maybe be tweaked to read ‘career theories of development [can no longer be conceptualised as if they] should be formulated to apply specifically to one culture [or context] or another’ (sharf, 2013, p. 17) (my own emphasis added)?

acknowledgements

the author would like to thank mr tim steward (language editor).

competing interests

the author has declared that no competing interests exist.

authors’ contributions

i declare that i am the sole author of this research article.

ethical consideration

this article followed all ethical standards for research without direct contact with human or animal subjects.

funding information

this research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.

data availability statement

data sharing is not applicable to this article as no new data were created or analysed in this study.

disclaimer

the views and opinions expressed in this article are those of the author and do not necessarily reflect the official policy or position of any affiliated agency of the author.

references

albien, a.j. (2018).
a mixed-methods analysis of black adolescents’ vocational identity status and career adaptability competencies in a south african township. doctoral dissertation. stellenbosch university, stellenbosch. alika, h.i., & egbochuku, e.o. (2009). vocational interest, counselling, socio-economic status and age as correlates of re-entry of girls into schools in edo state. edo journal of counselling, 2(1), 9–16. https://doi.org/10.4314/ejc.v2i1.52649 bimrose, j. (2010). adapting in a changing world: dealing with repeated career transitions. in k. maree (ed.), career counselling: methods that work (pp. 118–127). cape town: juta. blustein, d.l., palladino schultheiss, d.e., & flum, h. (2004). toward a relational perspective of the psychology of careers and working: a social constructionist analysis. journal of vocational behavior, 64, 423–440. https://doi.org/10.1016/j.jvb.2003.12.008 cardoso, p., silva, j.r., gonçalves, m.m., & duarte, m.e. (2014). narrative innovation in life design counseling: the case of ryan. journal of vocational behavior, 85, 276–286. https://doi.org/10.1016/j.jvb.2014.08.001 department of higher education and training (dhet). (2017). national policy for an integrated career development system for south africa. pretoria: government printers. duarte, m.e., paixão, m.p., & da silva, j.t. (2019). life-design counseling from an innovative career counseling perspective. in j.g. maree (ed.), handbook of innovative career counselling (pp. 35–52). new york: springer. duffy, r.d., & dik, b.j. (2009). beyond the self: external influences in the career development process. the career development quarterly, 58, 29–43. https://doi.org/10.1002/j.2161-0045.2009.tb00171.x fugate, m., & kinicki, a.j. (2008). a dispositional approach to employability: development of a measure and test of implications for employee reactions to organizational change. journal of occupational and organizational psychology, 81(3), 503–527.
https://doi.org/10.1348/096317907x241579 fugate, m., kinicki, a.j., & ashforth, b.e. (2004). employability: a psycho-social construct, its dimensions, and applications. journal of vocational behavior, 65, 14–38. https://doi.org/10.1016/j.jvb.2003.10.005 gerryts, e., & maree, j.g. (2019). enhancing the employability of young adults from socio-economically challenged contexts: theoretical overview. in j.g. maree (ed.), handbook of innovative career counselling (pp. 425–444). new york: springer. gillwald, a. (2019). south africa is caught in the global hype of the fourth industrial revolution. the conversation. retrieved june 13, 2019 from https://theconversation.com/south-africa-is-caught-in-the-global-hype-of-the-fourth-industrial-revolution-121189 guichard, j. (2009). self-constructing. journal of vocational behavior, 75, 251–258. https://doi.org/10.1016/j.jvb.2009.03.004 gurri, m. (2013). the revolt of the public and the crisis of authority in the new millennium (kindle dx version). retrieved from amazon.com hartung, p.j. (2013). career as story: making the narrative turn. in w.b. walsh, m.l. savickas, & p.j. hartung (eds.), handbook of vocational psychology (4th edn., pp. 33–52). new york: routledge. hartung, p.j. (2019). life design: a paradigm for innovating career counselling in global context. in j.g. maree (ed.), handbook of innovative career counselling (pp. 3–18). new york: springer. health professions council of south africa (hpcsa). (2010). policy on the classification of psychometric measuring devices, methods and techniques. pretoria: hpcsa. henton, d., & held, k. (2013). the dynamics of silicon valley: creative destruction and the evolution of the innovation habitat. social science information, 52(4), 539–557. https://doi.org/10.1177/0539018413497542 lopez levers, l.l., may, m., & vogel, g. (2011). research on counseling in african settings. in e. mpofu (ed.), counseling people of african ancestry (pp. 57–74). new york: cambridge university press. 
maree, j.g. (2010). career story interviewing using the three anecdotes technique. journal of psychology in africa, 20, 369–380. maree, j.g. (2013). counselling for career construction. connecting life themes to construct life portraits: turning pain into hope. rotterdam: sense. maree, j.g. (2015). life themes and narratives. in p.j. hartung, m.l. savickas & w.b. walsh (eds.), apa handbook of career intervention. applications (vol. 2, pp. 225–239). new york: american psychological association (apa). maree, j.g. (2016a). career construction counselling with a mid-career black male. career development quarterly, 64, 20–35. https://doi.org/10.1002/cdq.12038 maree, j.g. (2016b). using interpersonal process during career construction counselling to promote reflexivity and expedite change. journal of vocational behavior, 96, 22–30. https://doi.org/10.1016/j.jvb.2016.07.009 maree, j.g. (2017a). the career interest profile (version 6). randburg: jvr psychometrics. maree, j.g. (2017b). utilizing career adaptability and career resilience to promote employability and decent work and alleviate poverty. in j.g. maree (ed.), handbook of career adaptability, employability, and resilience (pp. 349–373). new york: springer. maree, j.g. (2017c). life design counselling. in g. stead & m. watson (eds.), career psychology (3rd edn., pp. 105–118). pretoria: van schaik. maree, j.g. (2019b). contextualisation as a determining factor for career counselling throughout the world. in j.a. athanasou & h.n. perera (eds.), international handbook of career guidance (2nd edn., pp. 555–578). new york: springer. maree, j.g. (ed.). (2019a). handbook of innovative career counselling. new york: springer. maree, j.g., & gerryts, e. (2019). enhancing the employability of unskilled and unemployed young adults: practical guidelines. in j.g. maree (ed.), handbook of innovative career counselling (pp. 444–469). new york: springer. maree, j.g., & molepo, m. (2016).
implementing a qualitative (narrative) approach in cross-cultural career counselling. in m. mcmahon (ed.), career counselling: constructivist approaches (2nd edn., pp. 65–78). rotterdam: sense. maree, j.g., cook, a., & fletcher, l. (2018). assessment of the value of group-based counselling for career construction. international journal of adolescence and youth, 23(1), 118–132. mcadams, d.p. (2001). the psychology of life stories. review of general psychology, 5(2), 100–122. https://doi.org/10.1037//1089-2680.5.2.100 mcmahon, m., & watson, m. (2015). qualitative career assessment: future directions. in m. mcmahon & m. watson (eds.), career assessment: qualitative approaches (pp. 257–262). rotterdam: sense. metz, a.j., & guichard, j. (2009). vocational psychology and new challenges. the career development quarterly, 57, 310–318. https://doi.org/10.1002/j.2161-0045.2009.tb00116.x nussbaum, b., palsule, s., & mkhize, v. (2010). personal growth african style. johannesburg: penguin. oakland, t. (2004). use of educational and psychological tests internationally. applied psychology: an international review, 53, 157–172. https://doi.org/10.1111/j.1464-0597.2004.00166.x phares, e.j. (1992). clinical psychology: concepts, methods and profession. pacific grove, ca: brooks/cole. savickas, m.l. (2015). career counseling paradigms: guiding, developing, and designing. in p.j. hartung, m.l. savickas, & w.b. walsh (eds.), apa handbook of career intervention: vol. 1. foundations (pp. 129–142). washington, dc: american psychological association (apa). savickas, m.l. (2019). career counselling (2nd edn.). washington, dc: american psychological association (apa). savickas, m.l., nota, l., rossier, j., dauwalder, j.p., duarte, m.e., guichard, j., … van vianen, a.e.m. (2009). life designing: a paradigm for career construction in the 21st century. journal of vocational behavior, 75(3), 239–250. https://doi.org/10.1016/j.jvb.2009.04.004 schumpeter, j.a. (1942).
capitalism, socialism and democracy. new york: harper torchbooks. schumpeter, j.a. (1982 [1934]). the theory of economic development. piscataway, nj: transaction. schwab, k. (2019). the fourth industrial revolution: what it means, how to respond. retrieved august 16, 2019 from http://www.weforum.org/agenda/2016/01/the-fourth-industrial-revolution-what-it-means-and-how-to-respond schwab, k., & samans, r. (2016). the future of jobs. employment, skills and workforce strategy for the fourth industrial revolution. world economic forum. retrieved june 11, 2016 from http://www.nmi.is/media/338436/the_global_competitiveness_report_2016-2017.pdf sharf, r.s. (2013). applying career development theory to counselling (6th edn.). independence, ky: brooks/cole. shuttleworth-edwards, a.b. (2018). response to taylor and de beer (2018) on population-based norms and iq testing. south african journal of psychology, 48(2), 175–178. https://doi.org/10.1177/0081246317747170 sonn, c.c, stevens, g., & duncan, n. (2013). decolonisation, critical methodologies, and why stories matter. in g. stevens, n. duncan, & d. hook (eds.), race, memory and the apartheid archive: towards a transformative psychosocial praxis (pp. 295–314). london: palgrave macmillan. stead, g., & watson, m. (eds.). (2017). career psychology (3rd edn.). pretoria: van schaik. stevens, g., duncan, n., & sonn, c.c. (2013). memory, narrative and voice as liberatory praxis in the apartheid archive. in g. stevens, n. duncan, & d. hook (eds.), race, memory and the apartheid archive: towards a transformative psychosocial praxis (pp. 25–44). london: palgrave macmillan. suchy, y. (2016). population-based norms in crisis. the clinical neuropsychologist, 30, 973–974. https://doi.org/10.1080/13854046.2016.1225363 watson, m. (2013). deconstruction, reconstruction, co-construction: career construction theory in a developing world context. indian journal of career and livelihood planning, 2, 1–12. 
about the author(s)

kim e. dowdeswell
tts top talent solutions, pretoria, south africa
department of human resource management, faculty of economic and management sciences, university of pretoria, pretoria, south africa

hennie j. kriek
tts top talent solutions, pretoria, south africa
department of industrial and organisational psychology, faculty of economic and management sciences, university of south africa, pretoria, south africa

citation
dowdeswell, k.e., & kriek, h.j. (2021). shifting assessment practices in the age of covid-19. african journal of psychological assessment, 3(0), a50. https://doi.org/10.4102/ajopa.v3i0.50

original research

shifting assessment practices in the age of covid-19

kim e. dowdeswell, hennie j. kriek

received: 04 jan. 2021; accepted: 13 apr. 2021; published: 28 may 2021

copyright: © 2021. the author(s). licensee: aosis. this is an open access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

abstract

the coronavirus disease 2019 (covid-19) pandemic has had an unprecedented impact on the world of work, and we see a corresponding shift in the talent management and assessment spheres. this commentary reflects on the impact the pandemic has had on organisations’ human resource (hr) practices in general and on assessment practices in particular.
informed by insights drawn from a series of in-depth interviews with representatives of organisations in south africa and on the broader african continent, we consider recent trends in unproctored internet testing (uit) and virtual or video interviewing technologies that appear central to how organisations have adapted their assessment practices in a covid-19 world. we also consider the role of various assessment practices in retrenchment and restructuring applications. finally, potential implications for organisations and their assessment practices when moving towards a post-covid-19 world are discussed.

keywords: talent assessment practices; covid-19; unproctored internet testing; virtual interviewing; remote working; digitalisation; retrenchment.

undoubtedly, the coronavirus disease 2019 (covid-19) pandemic has had an unprecedented disruptive impact on the world of work, forcing fundamental changes to the way organisations function and how people work. virtually overnight, organisations had to adapt their business strategies and operating models while workers found themselves either unable to work or working out how to work from home to limit the spread of the coronavirus. in this article, we reflect on the impact of the pandemic on organisations’ human resource (hr) practices in general and on assessment practices (see footnote 1) in particular and discuss implications for organisations on the way forward.

gaining insight into the impact of covid-19 on organisations

our reflections are informed by observations arising from a series of in-depth interviews undertaken by the tts top talent solutions team with 41 key clients across a variety of industries.
the semi-structured interviews sought to gain a thorough understanding of the pandemic’s impact on clients’ businesses and covered several key aspects of talent management and business processes in addition to understanding the impact the pandemic and lockdown had on organisations’ hiring practices, talent assessment practices and adoption of new talent technologies (tts top talent solutions, 2020). the interviews were conducted during july–september 2020 within the context of our client service provider relationships, with all participants providing their consent to be interviewed prior to our asking of their opinions on how the pandemic had influenced their organisations’ use of assessments, amongst other related topics. two-thirds of the participants were either executives, directors and heads of functions or held overall accountability for their organisation’s assessment practices. we would argue that the seniority of the majority of participants, together with the range in participating organisations’ size, industry and geographic location, lends credibility to the insights and opinions shared (see table 1).

table 1: description of participating organisations.

the commercial and strategic business impact of covid-19

the commercial impact of the pandemic and ensuing national lockdowns around the world has been substantial. in south africa in particular, gross domestic product (gdp) dropped by just over 16% between the first and second quarters of 2020, resulting in a negative annualised growth rate substantially worse than the contraction experienced during the 2009 global financial crisis (statistics south africa, 2020). in line with this economic downturn and gdp contraction, we found almost three-quarters of participating client companies had experienced a negative to substantially negative commercial impact on their business.
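to make the link between the quarterly gdp drop and the annualised growth rate mentioned above concrete, the short sketch below compounds a quarter-on-quarter change over four quarters. it assumes the conventional compounding approach to annualisation and uses a rounded 16% drop as an illustrative input (the text says "just over 16%"), so the result is indicative only and is not the official statistics south africa figure; the function name is hypothetical.

```python
def annualise_quarterly_change(quarterly_change: float) -> float:
    """compound a quarter-on-quarter change over four quarters.

    quarterly_change is a fraction, e.g. -0.16 for a 16% drop.
    """
    return (1.0 + quarterly_change) ** 4 - 1.0


# illustrative input only: a rounded 16% quarter-on-quarter drop (assumption)
quarterly_drop = -0.16
annualised = annualise_quarterly_change(quarterly_drop)
print(f"annualised rate: {annualised:.1%}")  # prints "annualised rate: -50.2%"
```

because the change is compounded rather than multiplied by four, a 16% quarterly drop corresponds to roughly a halving of output on an annualised basis, not a 64% decline.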
while only one participating organisation was undergoing a major restructuring specifically because of the covid-19 crisis at the time of the interviews, one in five participating organisations had seen some staff retrenched and one in four anticipated retrenchments within the next 6 months. the impact of the covid-19 pandemic has been felt not only commercially and on client organisations’ business models but also on their hr initiatives, with a substantial impact on organisations’ hiring practices in particular. unsurprisingly, given the pandemic’s largely negative commercial impact, most participants noted a short-term reduction (42%) or a complete freeze (26%) on hiring. for the remaining participants noting either no impact or that hiring was continuing for critical roles only, the limitations on face-to-face contact imposed by lockdown regulations brought the opportunities offered by technology-enabled assessment practices into sharp focus. in the next section, we highlight two assessment practices that we believe became particularly salient during lockdown, supported by the observations of our client organisations.

leveraging technology in assessment practices in a covid-19 world

unproctored internet testing

firstly, we believe the widespread use of unproctored internet testing (uit), enabled by technological advancements in assessment during the early 2000s, offered an avenue for many organisations to continue assessing people without risking face-to-face exposure. in an illustration of how internet-delivered assessment has become the norm rather than the exception in the past decade (foxcroft & roodt, 2018), kantrowitz, tuzinski and raines (2018) reported the prevalence of online assessments in hiring applications to be 76% in a middle east and african sample, and 84% in the global sample.
in the south african context specifically during lockdown, more than half of our participating organisations indicated that the covid-19 pandemic had no impact on how they were using assessments (presumably, already using online assessments prior to the covid-19 crisis), while a further quarter reported either moving to online assessments (having not used them previously) or using more online assessments than previously. participants also noted undertaking more structured validation interviews with candidates, given the increased usage of online unsupervised (but controlled) assessments, an increased focus on development rather than on hiring, and a change in criterion focus with specific consideration for individuals’ suitability to remote working. the temporary shuttering of workplaces and public facilities such as universities and libraries during the lockdown may give rise to renewed concerns about unequal candidate access to computer technology, which were prevalent during the early days of uit adoption (laher & cockcroft, 2013; tippins et al., 2006). while such concerns are not unique to south africa, they were and are particularly salient given the extreme inequality that persists in the country (world bank, 2018). however, the increasing use of mobile-delivered assessments, again enabled by advancements in both technology and assessment science, could go a long way to mitigate this risk as tablets and smartphones are more affordable and ubiquitous than computers (laher & cockcroft, 2013), and with south africa’s smartphone penetration reaching 91.2% in 2019 (independent communications authority of south africa, 2020). furthermore, recent studies have demonstrated the equivalence of mobile assessments both internationally (mcclure johnson, capman, siemsen, martin, & boyce, 2019) and in south africa (meyer, clifton, & dowdeswell, 2020), offering further support for their use in the local context. 
virtual or video interviewing

while many organisations placed a freeze on hiring in response to the commercial impact of the covid-19 pandemic, in our interviews with clients we noted a substantial uptake in the use of virtual or video interviewing. prior to the covid-19 crisis, virtual interviewing was used by less than one-third of participants; subsequently, this figure nearly doubled to almost 60% of participants, with a further one in five participants planning to adopt virtual interviewing at the time of the research. interestingly, however, over two-thirds of participants reported utilising two-way or synchronous ‘live’ virtual interviews. this highlights that although several organisations have adopted virtual interviewing during the crisis, they are not necessarily tapping into the efficiency and speed benefits offered by asynchronous virtual interviewing technology platforms such as hirevue (hirevue.com), vidrecruiter (vidrecruiter.com), recright (www.recright.com) and interview rocket (interviewrocket.com). this is an important point that bears consideration as the light at the end of the tunnel (i.e. the recovery of economies and easing of financial constraints on organisations) draws closer: with job losses because of covid-19 numbering in the millions around the world, it is reasonable to expect recruiters to be inundated with applications when organisations start opening up their hiring initiatives again. the utilisation of well-designed asynchronous virtual interviewing applications can support recruiters in dealing with the expected increase in applications in a fair and efficient manner, leveraging the demonstrated benefits of technology-assisted evaluation of candidates (campion, campion, campion, & reider, 2016).
assessment practices in retrenchment and restructuring

while leveraging technology enabled several organisations to successfully adapt their assessment practices to a covid-19 world, the place of assessment practices in the retrenchment and restructuring necessitated by the economic crisis in many organisations is also an important discussion point. as a first consideration, because south african labour legislation requires personnel-related decisions to be based on job-inherent requirements, we asked affected organisations whether they had clear job or role specifications in place to assist with their retrenchment processes. the responses were of substantial concern: almost half of participating organisations affected by or anticipating retrenchments did not have clear job or role specifications in place to guide what assessment practices would be appropriate to inform decision-making. if retrenchment decisions are taken based on inaccurate job requirements, the organisation not only runs the risk of not retaining staff with the key knowledge, skills, abilities and other characteristics needed for successful recovery, but also opens itself up to possible legal challenges. a second key consideration informing the role of assessment practices in retrenchment and restructuring is the nature of the process: whether an organisation is purely reducing headcount or restructuring and redesigning roles. where job requirements are not changing (i.e. in the case of downsizing), decisions should be made on the basis of employees’ current job performance and other relevant criteria such as tenure and technical skills. in contrast, where job requirements are changing and new requirements are brought in, future-oriented assessment methods such as personality measures and cognitive ability tests can provide valuable insight into employees’ potential to succeed in the new role (bywater & thompson, 2005).
this perspective is upheld by recent case law whereby the labour court found that psychometric tests may be used as assessment criteria to fill vacant posts during a retrenchment process (pratten v afrizun kzn (pty) ltd, 2020), building on a previous labour appeal court judgement that psychometric tests should not be used to select employees for retrenchments (south african breweries v louw, 2018).

organisations’ perspectives of a post-covid-19 world

in september 2020, the world bank reported anticipating that a full global economic recovery from the covid-19 pandemic could take up to 5 years to achieve (nagarajan, 2020), and exactly how the economic recovery will play out remains uncertain. when we asked our clients what they thought the talent management space would look like in a post-covid-19 world, over 80% of participants stated that their talent functions would return to normal within the next year at least. perhaps more aligned to the world bank view, one in seven participants felt that the process could take up to 2 years and was more likely to require adapting to a new normal rather than returning to the old normal. for example, increased remote working and flexible hours was expected to be one of the top three anticipated changes, in line with gartner’s reported 82% of leaders intending to allow remote working at least some of the time and 47% of leaders supportive of remote working full-time (gartner, 2020). reasons given for this expectation include increased efficiencies and effectiveness leading to time and cost savings (for both organisations and staff), as well as broadening the reach of talent sourcing and applicant attraction beyond the organisation’s immediate geographic locations or applicants’ willingness to relocate.
the importance of the central role that technology-enabled assessment practices now hold in organisations is underscored by our clients’ expectations that remote working will continue past the current crisis, and that ongoing support for digitalisation and digital transformation initiatives will continue to accelerate as it has done during the crisis. surprisingly, then, given the disruptive impact of the covid-19 pandemic on virtually all aspects of life, in our discussions with clients we found that 25% of participating organisations did not anticipate having to make any fundamental changes to how their organisations would operate in a post-covid-19 world. while this view may be specific to a certain industry and not widespread across industries or organisational types, it may make preparing for a ‘newly uncertain’ future (deloitte, 2020) more challenging.

closing thoughts

the covid-19 pandemic has had wide-reaching implications for how organisations operate, organisational performance, and how talent and assessment practices are approached. a clear learning from 2020 is the central role played by technology in enabling operations to continue despite the ensuing national lockdowns. in this way, the pandemic has focused attention on how organisations orientate themselves to new ways of working and new technologies that enable this change. despite some organisations intending to return to their old ways of working in the post-covid-19 world, we believe that the pandemic has served as a flashpoint, emphasising the importance of utilising best-of-breed talent assessments and technologies to maintain and improve effectiveness, efficiency and candidate experience while insulating the organisation and its practices from future shocks as much as possible.

acknowledgements

competing interests

the authors declared that they have no financial or personal relationships that may have inappropriately influenced them in writing this article.

authors’ contributions

h.j.k.
was responsible for conceptualisation, project design and article review. k.e.d. was responsible for project design, analysis and article drafting and editing.

ethical considerations

this article followed all ethical standards for research without direct contact with human or animal subjects.

funding information

this research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.

data availability

data sharing is not applicable to this article as the views shared are the opinions of the authors.

disclaimer

the views and opinions expressed in this article are those of the authors and do not necessarily reflect the official policy or position of any affiliated agency of the authors.

references

bywater, j., & thompson, d. (2005). personality questionnaires in a redundancy/restructuring setting: what do we know now? selection and development review, 21, 7–13.
campion, m.c., campion, m.a., campion, e.d., & reider, m.h. (2016). initial investigation into computer scoring of candidate essays for personnel selection. journal of applied psychology, 101(7), 958–975. https://doi.org/10.1037/apl0000108
deloitte. (2020). recovering from covid-19: considering economic scenarios for resilient leaders. white paper. retrieved from https://www2.deloitte.com/content/dam/deloitte/za/documents/about-deloitte/za-deloitte-scenarios-for-resilient-leaders-april-2020-2.pdf
foxcroft, c., & roodt, g. (2018). introduction to psychological assessment in the south african context (5th edn.). oxford university press, cape town, south africa.
gartner. (2020, july 14). gartner survey reveals 82% of company leaders plan to allow employees to work remotely some of the time. press release. retrieved from https://www.gartner.com/en/newsroom/press-releases/2020-07-14-gartner-survey-reveals-82-percent-of-company-leaders-plan-to-allow-employees-to-work-remotely-some-of-the-time
independent communications authority of south africa. (2020).
the state of the ict sector report in south africa, 2020. retrieved from https://www.icasa.org.za/uploads/files/state-of-the-ict-sector-report-march-2020.pdf
kantrowitz, t.m., tuzinski, k.a., & raines, j.m. (2018). 2018 global assessment trends report. white paper. retrieved from https://www.shl.com/en/assessments/trends/global-assessment-trends-report/
laher, s., & cockcroft, k. (2013). psychological assessment in south africa: research and applications 2000–2010. wits university press, johannesburg, south africa. https://doi.org/10.18772/22013015782
mcclure johnson, t.k., capman, j.f., siemsen, a., martin, n.r., & boyce, a.s. (2019, april 04–06). exploring equivalence and applicant reactions to a mobile cognitive assessment battery [symposium]. 34th annual conference of the society for industrial and organizational psychology, national harbor, md.
meyer, a., clifton, s., & dowdeswell, k.e. (2020, december 01–03). assessments on the go: equivalence of smartphone vs. non-smartphone delivered cognitive tests [master tutorial]. 22nd annual conference of the society for industrial and organisational psychology of south africa. virtual.
nagarajan, s. (2020, september 18). full global recovery from covid-19 may take 5 years, world bank chief economist says. business insider. retrieved from https://www.businessinsider.co.za/world-bank-global-economic-recovery-will-take-5-years-2020-9
pratten v afrizun kzn (pty) ltd (d439/15) [2020] zalcd 9; (2020) 41 ilj 2899 (lc) (17 april 2020). retrieved from http://www.saflii.org/za/cases/zalcd/2020/9.html
south african breweries (pty) ltd v louw (ca16/2016, c285/2014) [2017] zalac 63; [2018] 1 bllr 26 (lac); (2018) 39 ilj 189 (lac) (24 october 2017). retrieved from http://www.saflii.org/za/cases/zalac/2017/63.html
statistics south africa. (2020, september 08). steep slump in gdp as covid-19 takes its toll on the economy.
retrieved from http://www.statssa.gov.za/?p=13601
tippins, n.t., beaty, j., drasgow, f., gibson, w.m., pearlman, k., segall, d.o., & shepherd, w. (2006). unproctored internet testing in employment settings. personnel psychology, 59(1), 189–225. https://doi.org/10.1111/j.1744-6570.2006.00909.x
tts top talent solutions. (2020, december 08). covid-19’s impact on work and talent assessment practices. white paper. retrieved from https://www.tts-talent.com/blog/whitepaper-the-impact-of-covid-19-on-work-and-talent-assessment-practices/
world bank. (2018). overcoming poverty and inequality in south africa: an assessment of drivers, constraints and opportunities. retrieved from http://documents.worldbank.org/curated/en/530481521735906534/pdf/124521-rev-ouo-south-africa-poverty-and-inequality-assessment-report-2018-final-web.pdf

footnote

1. in this commentary, the term ‘assessment practices’ is used to encompass both psychological assessments, such as personality measures and cognitive ability tests, and ‘other similar assessments’, in line with section 8 of south africa’s employment equity act. this perspective acknowledges that other measures commonly used in employment contexts that are not necessarily psychological in nature, such as interviews and assessment centres, still need to have demonstrable evidence of the appropriateness of their use and that practitioners need to be mindful in how they utilise such measures in practice.

about the author(s)

hannelie du preez
department of humanities education, faculty of education, university of pretoria, pretoria, south africa

celeste-marié combrinck
department of science, mathematics and technology education, faculty of education, university of pretoria, pretoria, south africa

citation
du preez, h., & combrinck, c-m. (2022).
the sensory classroom teacher questionnaire: a tool for assessing conducive classroom conditions for children with adhd. african journal of psychological assessment, 4(0), a107. https://doi.org/10.4102/ajopa.v4i0.107

original research

the sensory classroom teacher questionnaire: a tool for assessing conducive classroom conditions for children with adhd

hannelie du preez, celeste-marié combrinck

received: 18 feb. 2022; accepted: 30 june 2022; published: 30 aug. 2022

copyright: © 2022. the author(s). licensee: aosis. this is an open access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

abstract

physical classrooms provide immense sensory stimulation to children and inform behaviour, cognitive processes and psychological state of mind. children diagnosed with any subtype of attention-deficit hyperactivity disorder (adhd) are more likely to exhibit sensory integration/processing impairments that contribute to inappropriate behavioural and learning responses. teachers need good information and user-friendly psycho-educational instruments to meet the needs of children diagnosed with any adhd subtype. the sensory classroom teacher questionnaire (sctq) utilises adhd symptomatology to evaluate learning spaces that support children in regulating their response to sensory input. we report on the piloted design and refinement of the sctq based on best practices. a convenience sample of south african early childhood teachers administered the first (n = 313) and second (n = 72) versions of the sctq at various primary schools. cross-disciplinary specialists appraised the sctq for content validity, while the rasch rating scale model was applied to assess internal construct reliability and validity. the structure of the latent constructs was assessed using bayesian confirmatory factor analysis.
following the first pilot, we refined the sctq by combining or deleting unnecessary items and reducing the five-point likert scale to a three-point scale. revising the likert scale in version one was necessary to improve category functioning. adjusting the three-point scale in the revised sctq indicated good item and scale functioning. we show the conceptual framework, refinement process, all results and the most recent version of the sctq for teachers to use and educational researchers to adapt further.

keywords: attention-deficit hyperactivity disorder (adhd); early childhood development and education (ecde); inclusive communal learning spaces; sensory classroom teacher questionnaire (sctq); psycho-educational assessment instruments; sensory integration/processing; sensory ergonomics; south africa.

introduction

children instinctively observe and process sensory stimuli as indicators of how to act, feel and behave (lópez et al., 2018; mahdjoubi & akplotsyi, 2012). children can subconsciously judge if a communal learning space is supportive of their educational and developmental needs (davies, 2020; dupaul & stoner, 2015). children whose central nervous system is challenged to integrate and process sensory stimuli might interpret their classroom conditions as unconducive. feeling uncomfortable in their learning environment will affect their behaviour, learning and well-being (ayres, 1979; zimmer et al., 2012). therefore, educators play a vital role in designing conducive communal learning spaces so that children can best absorb, retain and process new information.

attention-deficit hyperactivity disorder and sensory processing disorder

difficulty in integrating and processing environmental and sensory information has been observed in various neurodevelopmental disorders, especially attention-deficit hyperactivity disorder (adhd) (ghanizadeh, 2011; zimmer et al., 2012).
the american psychiatric association (apa, 2013) considers adhd the most prevalent psychiatric disorder among the young child population. in a communal classroom of ± 30 children, two to three are diagnosed with adhd (barkley, 2018; micoulaud-franchi et al., 2016), increasing the likelihood for them also to experience sensory integration or processing difficulties (ghanizadeh, 2011). table 1 summarises the responses and characteristics the child is confronted with to a different degree, intensity and nature.

table 1: attention-deficit hyperactivity disorder characteristics and sensory processing disorder response regulation.

sensory ergonomics as an intervention approach in early education

professionals can benefit from using psychometric-founded instruments to interpret environmental and sensory stimuli to adapt to communal spaces, for example, the classroom climate scale (lópez et al., 2018), classroom sensory environment assessment (miller-kuhaneck & kellehers, 2018) and the sensory gating deficit and distractibility questionnaire for adults with adhd (micoulaud-franchi et al., 2016). however, these mentioned instruments focus predominantly on a specific aspect of sensory integration (e.g. environmental design) or an age group (adults), which does not cater for the needs of a south african early childhood teacher. delaying interventions will increase the prospects of children with adhd being unsuccessful in their school trajectory (huerta, 2017). therefore, looking to early childhood teachers as ‘path changers’ is not unfounded. introducing young children to quality learning environments is not a luxury but rather a necessity. learning spaces ought to be perceived as emotionally safe, socially smart, environmentally friendly and cognitively supportive (jensen, 2003). ergonomics for children integrates a wide swath of disciplines (e.g. psychology, rehabilitation, education, architecture, law) to ensure developmentally appropriate practices (lueder & rice, 2007).
more specifically, sensory ergonomics is considered a trusted strategy to cater for children’s special educational and developmental needs by constructing conducive classroom conditions (lombard, 2015; eds. lueder & rice, 2007). studying children’s ability to self-regulate physical, emotional and cognitive responses (sensation) within the learning environment (ergonomics) offers insight into whether their sensory nervous system is promoting or hindering their functioning (brown, 2002; lombard, 2015).

early childhood education challenges in south africa

early childhood education in south africa, also known as foundation phase, is children’s first compulsory entry to schooling. six-year-old children enter the schooling system as preschoolers in an informal classroom before transitioning to a more formal school setting in subsequent grades. although informal (grade r) schooling and formal (grades 1–3) schooling are clustered as early childhood education, the curriculum design, pedagogical approach, teaching and learning support material, and classroom setup for preschoolers and schoolers differ significantly (van heerden & du preez, 2021). teachers in south african schools face several challenges: firstly, it is not uncommon for developing countries to have mainstream and multi-aged classrooms in the early years that are significantly larger (n ≥ 45) than international classrooms (n = ± 24) (howie et al., 2017), suggesting that larger classes increase the likelihood of hosting more children with adhd (perold, louw, & kleynhans, 2010). secondly, south african teachers are often situated in disempowering learning environments in terms of physical size, socio-economic status, high child-to-teacher ratios, limited access to multidisciplinary teams for guidance and a lack of developmentally appropriate resources and equipment (balfour, mitchell, & moletsane, 2008). thirdly, foundation phase teachers may not be well informed about neurological disorders (e.g.
adhd, sensory processing disorder [spd] and associated subtypes) and interdisciplinary interventions (e.g. sensory ergonomics) to adapt to learning environments (brown, 2002; perold et al., 2010). fourthly, optimising an environment for rich sensory stimuli and spd is the specialisation field of registered occupational therapists (ayres, 1979; lombard, 2015), leaving south african teachers to decipher child–environment synergy for themselves. fifthly, too few training opportunities for continuous professional teacher development (cptd) are offered to broaden teachers’ knowledge and skills on special educational needs (de clercq & phiri, 2013). lastly, teacher-friendly psycho-educational resources to assist teachers in creating conducive learning environments are rarely freely available.

research gap and contribution

considering all mentioned dichotomies, a prospect is offered to recouple cross-disciplinary knowledge systems to promote conducive classroom conditions for children with adhd who could also experience sensory integration/processing challenges. the nexus this article would like to present is a well-designed psycho-educational assessment instrument for grade 1–3 teachers when creating inclusive, conducive and sensory ergonomic learning spaces for children diagnosed with adhd. the researchers designed an instrument that assesses environmental conditions which teachers could use to enhance the classroom climate. this study’s main objectives and processes are as follows:

- conceptualising a psycho-educational instrument guided by principles of sensory ergonomics, sensory integration/processing and the triad of characteristics of children with adhd symptoms
- analysing and refining the sensory classroom teacher questionnaire (sctq) through two rounds of piloting
- offering an inclusive and pedagogical practice-oriented psycho-educational assessment instrument for early childhood teachers.
methods evidence of behaviour was systematically and objectively gathered, and items were created from which inferences can be drawn (du preez & de klerk, 2019; murphy & davidshofer, 2005). mixed methods research is considered appropriate and necessary for instrument conceptualisation, development and validation (zhou, 2019) as it enables test developers to be iterative and intentional in abstracting, simplifying and categorising qualitative and quantitative evidence (du preez & de klerk, 2019). the three phases used to develop this instrument are proposed by creswell and plano clark (2017) in combination with a sequential mixed methods research design. these phases are given as follows: qualitatively defining the latent constructs of the instrument; qualitatively conceptualising and revising items from a psycho-philosophical viewpoint; and quantitatively piloting the instrument. this study followed best practices in validating scales, by means of item generation based on theory, improving content validity aided by subject-matter experts, pretesting items, item reduction and refinement, and piloting the instrument for reliability, validity and dimensionality (boateng, neilands, frongillo, melgar-quiñonez, & young, 2018). conceptualising the psycho-educational instrument evidence for instrument validity commences with an adept conceptualisation phase that requires identifying a suitable conceptual or theoretical framework, relevant extensive scholarly literature and a panel of experts to scrutinise the construct(s) and provide feedback on the content validity (du preez & de klerk, 2019; michell, 1997; zhou, 2019). existing scholarly theories were utilised to generate items for the sctq to measure sensory ergonomics as a latent construct. the sctq is based on meaning-making frameworks about sensory integration/processing, sensory ergonomics and the triad characteristics of adhd.
modulating sensory input is imperative for everyday functioning as it influences productivity, focus, attention, communication and interaction (alnajjar et al., 2015; apa, 2013; brown, 2002; lombard, 2015), hitherto a challenge for children with adhd. factual statements (items) were derived from literature and categorised into subscales to represent the dimensions of the latent construct. the aim was to create concise and unambiguous items that would measure one central idea and remain consistent with the purpose of the measurement (murphy & davidshofer, 2005; eds. schweizer & distefano, 2016). the final version of the sctq is summarised in table 2, listing the construct, then the question number and the item statement followed by likert scale options. the questionnaire is self-administered and can be used by early childhood teachers themselves or informed observers. table 2: the sensory classroom teacher questionnaire (final version). as presented in table 2, the three underlying latent constructs, or subscales, represent the conceptual framework. these latent constructs offer item statements that suggest practical ways to adapt, change and manipulate the existing communal learning space. sensory integration/processing interventions assist children with adhd in modifying and regulating their own social and academic behaviour, which they find challenging, as alluded to in the triad of characteristics. the instrument is a multidimensional questionnaire that assesses aspects of overarching sensory ergonomics in the early childhood classroom. the sctq’s constructs are given as follows: attention (co)regulation: teacher and peers can model, coach or assist the child with adhd to manage his or her deployment of attention, by offering a learning space that is supportive of maintaining attention, staying alert and directing attention to task goals, and ignoring distracting or irrelevant stimuli. the maximum score for the section is 24. 
learning space design: teachers, in collaboration with the school principal, staff and/or colleagues, intentionally consider which developmentally appropriate educational material, equipment and beautifications can be utilised in the communal learning space to produce a sense of security, inclusiveness, safety and sensory synergy. the maximum score for the section is 21. sensory modulation and synergy: a child with adhd utilises his or her communal learning space to organise and regulate his or her reaction to the sensory stimuli in an adaptive manner. a child with adhd can uphold a functional level of attentiveness or alertness (alone or with help from a peer) to respond appropriately to the present sensory stimuli. the maximum score for the section is 27. please note that the olfactory system (smell and taste) is not included for manipulation, as lombard (2015) strongly advises to keep it neutral or natural. sampling of the panel of experts, fieldworkers and teacher participants a purposive and convenience sampling technique was used to recruit the panel of experts, fieldworkers and teacher participants. firstly, researchers consulted with the panel of experts to appraise the content validity of the sctq and its generated statements (items). the panel of experts was multidisciplinary, including two early childhood specialists, one registered occupational therapist, two registered educational psychologists and one research psychologist. secondly, a protocol document was compiled and the fieldworkers were trained on handling topics such as ethical procedure to obtain informed consent, ensuring safe and anonymous participation, how to self-administer the sctq, authoring a qualitative interview report and capturing the raw data in microsoft word and excel. the fieldworkers were qualified early childhood teachers enrolled for their postgraduate studies in learning support. 
they were authorised by the department of basic education to visit any primary school in south africa and issue a copy of the registered ethics certificate granted by the university of pretoria’s ethics committee of the faculty of education. thirdly, the participants, who self-administered the sctq, were qualified and appointed in-service teachers within the early childhood sector in grade 1, 2 or 3. the participants were mainly female, as most south african teachers in foundation phase are women (petersen, 2014; sak, 2015). the teacher participants’ biographical profiles are tabulated in table 3. table 3: biographical profile of teacher participants. as depicted in table 3, both rounds of piloting presented a teacher sample profile of having less experience in teaching in the early years; schools with more nongovernment income; and teachers perceiving their adhd knowledge as adequate. data analysis qualitative content analysis a panel of subject-matter experts was assembled to assess the quality of the sctq items as a qualitative content validation method. the feedback generated from the panel of experts provided qualitative evidence for content validity (zhou, 2019). the cross-disciplinary specialists commented on the sctq’s purpose, the phrasing of items and the items’ relevance to measuring the constructs. the panel examined the length, appropriateness and format of the instrument used to improve, reduce and refine the items after the first round of data collection.
as for the fieldworkers, who observed the completion of the sctq by the teacher participants, they generated a report on additional information about observations and discussions on (co)regulation to effect attention, such as sensory signals, labelling and schedules; learning space design elements, including regulating sensory inputs through less decoration, providing age-appropriate resources and offering input that the child can see, hear or feel; and regulation of overall functioning by becoming aware of sensory properties that may compromise sensory synergy, for example, seating arrangements, transition areas and designated areas. rasch measurement theory (rmt) was used as the guiding psychometric model to refine and assess the internal reliability and validity of the sctq (bond & fox, 2015; retief, potgieter, & lutz, 2013). bayesian confirmatory factor analysis (cfa) was used to assess the structure of the latent traits (taylor, 2019). rasch rating scale model rasch analysis is a family of probabilistic models that assess items and instruments (andrich & marais, 2019; linacre, 2021). rasch measurement theory evaluates the psychometric properties of the items by modelling the log-odd probability of each rating selected by the teachers who endorsed the construct overall (boone, staver, & yale, 2014). rasch unidimensional models for measurement (rumm 2030) software was utilised (andrich, sheridan, & luo, 2009). the model, item, person and category fit, and invariance and dimensionality were investigated (andrich et al., 2009). descriptive statistics were calculated in ibm’s statistical package for the social sciences (spss). 
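a minimal sketch of the category probabilities under the andrich rating scale model, the polytomous rasch model named above, may help to make the log-odds formulation concrete. the function name, argument names and example values are illustrative assumptions; they are not taken from rumm 2030 or from the study's data. the sketch only shows how the log-odds of adjacent rating categories depend on the person location, the item location and a shared set of thresholds.

```python
import math


def rsm_category_probabilities(theta, delta, taus):
    """Category probabilities for one item under the rating scale model.

    theta: person location in logits (hypothetical value)
    delta: item location in logits (hypothetical value)
    taus:  list of m thresholds shared by all items, giving m + 1 categories
    """
    # Category 0 has exponent 0; category k adds (theta - delta - tau_k)
    # to the exponent of category k - 1.
    exponents = [0.0]
    for tau in taus:
        exponents.append(exponents[-1] + (theta - delta - tau))

    # Subtract the largest exponent before exponentiating for numerical stability.
    largest = max(exponents)
    weights = [math.exp(e - largest) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


# Three ordered categories (two thresholds), as in a collapsed Likert scale.
probs = rsm_category_probabilities(theta=0.5, delta=0.0, taus=[-1.0, 1.0])
```

in this formulation the log-odds of choosing category k rather than k − 1 equals theta − delta − tau_k, which is why ordered thresholds are expected when the rating categories work as intended.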
adequate evidence for internal reliability and validity of inferences required the following (bond & fox, 2015): data fit to model: chi-square goodness-of-fit statistic not significant; items to fit the model: small and nonsignificant residual values (within ± 2.5); likert scale categories: ordered monotonically and contribute to the measurement; lack of significant secondary constructs: unidimensionality assessed with principal component analysis (pca) and eigenvalues below 2; reliability indices: above 0.70; differential item functioning (dif) absent: bonferroni corrected p-values nonsignificant. bayesian confirmatory factor analysis a bayesian analysis was used to conduct the cfa because of the ordered-categorical nature of the data (arbuckle, 2017). bayesian analysis outperforms maximum likelihood when items have fewer than four categories (stenling, ivarsson, johnson, & lindwall, 2015). in ibm amos (arbuckle, 2021), the markov chain monte carlo algorithm is applied and considered to offer advantages to cfas with categorical data (taylor, 2019). the cfa was used to assess the structural validity of the instrument. a null hypothesis approach was used with noninformative priors. the bayesian model would indicate a good fit for the cfa when (gelman, 2013; harindranath & jayanth, 2018): the posterior predictive p is close to or equal to 0.5; the 95% highest posterior density interval does not contain 0; and the convergence statistics (cs) are below 1.100. ethical considerations ethical approval was obtained from the university of pretoria, faculty of education, ethics committee (reference number: up09/04/01). results the results are presented as two rounds of quantitative data collection. after pretesting the items in the first round, considerable instrument refinement was carried out. the final version was qualitatively examined by subject-matter experts and piloted on another grade 1–3 in-service teacher sample.
the second version is considered the final version to date (see table 2), and evidence from both rounds of piloting is presented. pretesting of sensory classroom teacher questionnaire items the combination of rasch statistics, qualitative interviews of the fieldworkers and the review of the items by panel experts led to the revision of the instrument (cavanagh & romanoski, 2006; eds. cavanagh & waugh, 2011). the first version of the sctq had 55 items, and the model fit was significantly different from the data (p < 0.05). four items had significant fit residuals, indicating potential problems. thirty-one out of the original 55 items displayed disordered thresholds and categories, with fewer than 10% of participants endorsing the option. the five-point likert scale categories were combined into three categories, which resulted in a better fit and no disordered categories. differential item functioning was detected for five items compared to all the demographic characteristics. the refining of the items resolved the dif. principal component analysis in rumm 2030 was used to investigate unidimensionality, and the three factors were confirmed to be independent. the cross-disciplinary panel of experts revised the first version of the sctq so that items could be combined or deleted. the revised instrument has 24 items and an improved overall rmt fit (χ² = 132.744, p = 0.015). pilot of sensory classroom teacher questionnaire the next round of data collection for the sctq revealed that each construct fit the rasch model, had acceptable reliability and was unidimensional (table 4). table 4: rasch analysis findings for three sensory classroom teacher questionnaire constructs. four items were identified that had problematic categories. reliability indices were above 0.700 and deemed acceptable (linacre, 2021). differential item functioning was absent for grade, school type, teacher knowledge of adhd and teaching experience.
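the bonferroni correction applied to the dif tests can be sketched as follows. the function name and the example p-values are hypothetical, and the number of tests entering the correction is an assumption, because the study does not report how many comparisons were corrected for. the sketch only illustrates the rule that each p-value is compared with the nominal alpha divided by the number of tests.

```python
def bonferroni_flags(p_values, alpha=0.05):
    """Return the Bonferroni-adjusted threshold and one flag per test.

    A flag is True when the p-value stays significant after correction,
    which for a DIF test would mark the item for closer inspection.
    """
    if not p_values:
        raise ValueError("at least one p-value is required")
    threshold = alpha / len(p_values)
    flags = [p < threshold for p in p_values]
    return threshold, flags


# Hypothetical p-values for one item tested against four person factors.
example_p_values = [0.20, 0.03, 0.61, 0.008]
threshold, flags = bonferroni_flags(example_p_values)
```

dif is treated as absent for an item when none of its flags is true, which corresponds to the criterion of nonsignificant bonferroni corrected p-values.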
the second version of the sctq showed evidence of internally reliable and valid inferences to assess sensory ergonomics in the classroom. the individual item fit statistics are shown in table 5 with their standard error (se), fit residual values, degrees of freedom (df), chi-square and probability values. table 5: item location, standard error and fit residuals. as seen in table 5, none of the items had fit residuals beyond ± 2.5, and none of the items significantly misfit the rasch model. the correlations among the latent traits are shown in table 6. as derived from the rasch logit conversion, the constructs had moderate to strong relationships (0.378–0.628). the logit scales ranged from −3 to +5, and attention (co)regulation was the most difficult to implement in the classroom (m = 0.361, se = 1.426). table 6: means, standard deviations, minimum, maximum and correlations between constructs. an examination of the items and latent trait structures was conducted with a bayesian cfa. the posterior predictive p was close to 0.50 (p = 0.38). the 95% highest posterior density interval did not contain 0. all items had convergence statistics lower than 1.1. the posterior predictive p showed a good fit, but the model could be improved. one item did not significantly load onto its factor, question 6 (‘labelling of resources and designated areas’). the convergence statistics were acceptable for the specified structure. discussion sensory integration/processing is a reserved and specialised field of occupational therapists, which leaves south african teachers less likely to benefit from such a knowledge system to create conducive classroom conditions for children with adhd.
with the development of this psycho-educational instrument, south african early childhood teachers can now (re)design their learning spaces from sensory ergonomics and sensory integration and processing standpoints, which are sensitive to the special educational and developmental needs of learners with adhd (see table 1). the first round of excessive statements (items) included in the sctq is based on scholarly literature and was appraised by a panel of cross-disciplinary experts. the first version had an excessive number of items recommended in practice, which according to liu (2020) is common when designing a new instrument. the pretesting of these items led to refinement and reduction of the items through rasch analysis and panel of experts content analysis. the second version of the instrument was piloted on a smaller sample of in-service early childhood teachers, and evidence was found for the internal reliability and validity of the sctq’s items. the structure of the latent constructs was established with a bayesian cfa, and the model fit was considered adequate. the results from the pilot showed that valid and reliable inferences could be derived from the psycho-educational instrument for effecting attention (co)regulation, learning space design and sensory modulation and synergy. however, the loadings of items onto the latent traits could be further explored in future studies to derive a more robust model with larger samples and re-examine the categories. the questionnaire presents best practices and thus identifies potential areas which require attention to create conducive classroom conditions using sensory ergonomics. the sequential mixed methods research design for scale development integrated scientific reality with psychometric validity. mixing modes of inquiry occurred at four levels, namely conceptual, operational, piloting and analysis. 
the conceptual framework of the psycho-educational instrument is based on theories on sensory integration and processing, sensory ergonomics, and the triad characteristics of adhd. the study’s limitations are that the assessment instrument was piloted twice in south african schools that were well resourced and self-rated by teachers who had less than 11 years of experience in teaching in the early years. another limitation is that the predictive validity of the instrument has not yet been examined. future research should include applying the sctq in a wider variety of contexts and testing the predictive validity of the instrument through longitudinal studies. the study has implications for early childhood education practices, as teachers need psycho-educational instruments to evaluate communal learning spaces to reassess their conducive conditions. teachers could use the sctq as a guide to (re)design classroom spaces that reflect sensory ergonomics and inclusive education principles. conclusions empowering early childhood teachers to create conducive learning environments will create a sense of belonging, safety and inclusion among children diagnosed with adhd who could also experience sensory integration and processing challenges. the early childhood teacher should serve as a gatekeeper by becoming more aware of the child–environment relationship by using the sctq, a psycho-educational instrument for assessing conducive classroom principles. the sctq utilised sensory integration and processing, sensory ergonomics and the triad characteristics of adhd as the meaning-making framework. the sctq offers guidelines on how to adapt, change or manipulate the learning environment to meet the needs of children with special educational and developmental requirements. the sensory ergonomics construct was operationalised by assembling a cross-disciplinary panel of experts for their qualitative content validity input and applying rmt and bayesian cfa.
the designed sctq is ‘quantitatively defensible and qualitatively meaningful’ (bond & fox, 2015, p. 329). utilising rmt provided evidence for the instrument’s internal validity and reliability and offered additional guidance to cross-disciplinary specialists. teachers and researchers can use the current instrument to gauge and plan effecting attention (co)regulation, learning space design and sensory modulation and synergy for the child diagnosed with adhd. thus, the teacher, researcher or observer can screen for irrelevant and undesirable environmental stimuli by becoming the synergy mediator before the child with adhd enters the communal learning space. to amplify the importance of this topic in early childhood education, this article concludes with a quote from jensen (2003): environments are the medium in which we live. we can feel them every day, all day long. at school only the quality of the teacher is a greater determinant of student success than the environment. one environment brings out the best in us and another brings out the worst in us. they can be nourishing or toxic, support or draining. environments are never neutral. how important are they? how important is water to fish? (p. v) acknowledgements the leading author and primary investigator of this article received a research development programme (rdp) grant to conduct this research and attended a writing workshop sponsored by the department of research and innovation (dri) at the university of pretoria as part of a capacity-building partnership. competing interests the authors declare that they have no financial or personal relationships that may have inappropriately influenced them in writing this article. authors’ contributions h.d.p. conceived the presented idea, designed the methodology and was responsible for data collection, data capturing and introduction. c-m.c. conducted the analysis and wrote the methods and results section. 
both authors were involved in writing the discussion and conclusions. funding information this work is based on the research supported in part by a research development programme (rdp) grant sponsored by the department of research and innovation (dri). data availability the data that support the findings of this study are available from the corresponding author upon reasonable request. disclaimer the views and opinions expressed in this article are those of the authors and do not necessarily reflect the official policy or position of any affiliated agency of the authors. references alnajjar, f., itkonen, m., berenz, v., tournier, m., nagai, c., & shimoda, s. (2015). sensory synergy as environmental input integration. frontiers in neuroscience, 8(436), 1–11. https://doi.org/10.3389/fnins.2014.00436 american psychiatric association (apa). (2013). diagnostic and statistical manual of mental disorders (5th ed.). arlington, va: american psychiatric publishing. andrich, d., & marais, i. (2019). a course in rasch measurement theory. singapore: springer. andrich, d., sheridan, b., & luo, g. (2009). interpreting rumm2030. perth: laboratory pty ltd. arbuckle, j.l. (2017). amos 26.0 user’s guide. chicago, il: ibm spss. retrieved from https://www.ibm.com/docs/en/sslvmb_26.0.0/pdf/amos/ibm_spss_amos_user_guide.pdf arbuckle, j.l. (2021). ibm spss amos (version 27.0) [computer program]. ibm corporation. ayres, a.j. (1979). sensory integration and the child. los angeles, ca: western psychological services. balfour, r., mitchell, c., & moletsane, r. (2008). troubling contexts: towards a generative theory of rurality as education research. journal of rural and community development, 3(3), 100–111. barkley, r.a. (ed.). (2018). attention-deficit hyperactivity disorder: a handbook for diagnosis and treatment (4th ed.). new york, ny: guilford. boateng, g.o., neilands, t.b., frongillo, e.a., melgar-quiñonez, h.r., & young, s.l. (2018).
best practices for developing and validating scales for health, social, and behavioral research: a primer. frontiers in public health, 6(149), 1–18. https://doi.org/10.3389/fpubh.2018.00149 bond, t.g., & fox, c.m. (2015). applying the rasch model: fundamental measurement in the human sciences (3rd ed.). new york, ny: routledge. boone, w.j., staver, j.r., & yale, m.s. (2014). rasch analysis in the human sciences. dordrecht: springer. brown, c. (2002). what is the best environment for me? a sensory processing perspective. occupational therapy in mental health, 17(3–4), 115–125. https://doi.org/10.1300/j004v17n03_08 cavanagh, r.f., & romanoski, j.t. (2006). rating scale instruments and measurement. learning environments research, 9(3), 273–289, netherlands. https://doi.org/10.1007/s10984-006-9011-y cavanagh, r.f., & waugh, r.f. (eds.). (2011). applications of rasch measurement in learning environments research. rotterdam: sense publishers. creswell, j.w., & plano clark, v.l. (2017). designing and conducting mixed methods research (3rd ed.). los angeles, ca: sage. davies, c. (ed.). (2020). creating multi-sensory environments: practical ideas for teaching and learning (2nd ed.). new york, ny: routledge. de clercq, f., & phiri, r. (2013). the challenges of school-based teacher development initiatives in south africa and the potential of cluster teaching. perspectives in education, 31(1), 77–86. retrieved from https://hdl.handle.net/10520/ejc133226 dupaul, g.j., & stoner, g. (2015). adhd in the schools. assessment and intervention strategies (3rd ed.). new york, ny: guilford press. du preez, h., & de klerk, w. (2019). a psycho-philosophical view on the ‘conceptualisation’ of psychological measure development. south african journal of industrial psychology, 45(1), 1–10. https://doi.org/10.4102/sajip.v45i0.1593 gelman, a. (2013). two simple examples for understanding posterior p-values whose distributions are far from uniform.
electronic journal of statistics, 7, 2595–2602. https://doi.org/10.1214/13-ejs854 ghanizadeh, a. (2011). sensory processing problems in children with adhd, a systematic review. psychiatry investigation, 8(2), 89–94. https://doi.org/10.4306/pi.2011.8.2.89 harindranath, r.m., & jayanth, j. (2018). bayesian structural equation modelling tutorial for novice management researchers. management research review, 41(11), 1254–1270. https://doi.org/10.1108/mrr-11-2017-0377 howie, s.j., combrinck, c., roux, k., tshele, m., mokoena, g.m., & mcleod palane, n. (2017). pirls literacy 2016 progress in international reading literacy study 2016: south african children’s reading literacy achievement. pretoria: centre for evaluation and assessment. huerta, m. (2017). meeting the needs of students with adhd. edutopia. retrieved from https://www.edutopia.org/blog/bridging-the-adhd-gap-merle-huerta jensen, e. (2003). environments for learning. san diego, ca: the brainstore®, inc. linacre, j.m. (2021). a user’s guide to winsteps® ministep. rasch-model computer program. program manual 5.1.7. winsteps®. liu, x. (2020). using and developing measurement instruments in science education: a rasch modelling approach (science & engineering education sources). charlotte, nc: information age publishing. lombard, a. (2015). sensory intelligence. why it matters more than i.q. and e.q. cape town: metz press. lópez, v., torres‑vallejos, j., ascorra, p., villalobos‑parada, b., bilbao, m., & valdés, r. (2018). construction and validation of a classroom climate scale: a mixed-methods approach. learning environments research, 21(3), 407–422. https://doi.org/10.1007/s10984-018-9258-0 lueder, r., & rice, v.j.b. (eds.). (2007). ergonomics for children: designing products and places for toddler to teens. boca raton, fl: crc press. mahdjoubi, l., & akplotsyi, r. (2012). the impact of sensory learning modalities on children’s sensitivity to sensory cues in the perception of their school environment.
journal of environmental psychology, 32(3), 208–215. https://doi.org/10.1016/j.jenvp.2012.02.002 michell, j. (1997). quantitative science and the definition of measurement in psychology. british journal of psychology, 88(3), 355–383. https://doi.org/10.1111/j.2044-8295.1997.tb02641.x micoulaud-franchi, j., lopez, r., michel, p., brandejsky, l., bioulac, s., philip, p., … boyer, l. (2016). the development of the sgi-16: a shortened sensory gating deficit and distractibility questionnaire for adults with adhd. adhd attention deficit and hyperactivity disorders, 9(3), 179–187. https://doi.org/10.1007/s12402-016-0215-4 miller-kuhaneck, h., & kelleher, j. (2018). the classroom sensory environment assessment as an educational tool for teachers. journal of occupational therapy, schools, & early intervention, 11(2), 161–171. https://doi.org/10.1080/19411243.2018.1432442 murphy, k.r., & davidshofer, c.o. (2005). psychological testing: principles and applications (6th ed.). upper saddle river, nj: pearson education. perold, m., louw, c., & kleynhans, s. (2010). primary school teachers’ knowledge and misperceptions of attention deficit hyperactivity disorder (adhd). south african journal of education, 30(3), 457–473. https://doi.org/10.15700/saje.v30n3a364 petersen, n. (2014). the ‘good’, the ‘bad’ and the ‘ugly’? views on male teachers in foundation phase education. south african journal of education, 34(1), 1–13. https://doi.org/10.15700/201412120926 retief, l., potgieter, m., & lutz, m. (2013). the usefulness of the rasch model for the refinement of likert scale questionnaires. african journal of research in mathematics, science and technology education, 17(1–2), 126–138. https://doi.org/10.1080/10288457.2013.828407 sak, r. (2015). comparison of self-efficacy between male and female pre-service early childhood teachers. early child development and care, 185(10), 1629–1640. https://doi.org/10.1080/03004430.2015.1014353 schweizer, k., & distefano, c. (eds.). (2016).
principles and methods of test construction. standards and recent advances (vol. 3). göttingen: hogrefe publishing. stenling, a., ivarsson, a., johnson, u., & lindwall, m. (2015). bayesian structural equation modelling in sport and exercise psychology. journal of sport and exercise psychology, 37(4), 410–420. https://doi.org/10.1123/jsep.2014-0330 taylor, j.m. (2019). overview and illustration of bayesian confirmatory factor analysis with ordinal indicators. practical assessment, research & evaluation, 24(4), 1–27. https://doi.org/10.7275/vk6g-0075 van heerden, j., & du preez, h. (2021). creating learning environments that promote play. in j. van heerden, & a. veldsman (eds.), rethinking learning through play (pp. 99–120). pretoria: van schaik publishers. zhou, y. (2019). a mixed-methods model of scale development and validation analysis. measurement: interdisciplinary research and perspectives, 17(1), 38–47. https://doi.org/10.1080/15366367.2018.1479088 zimmer, m., desch, l., rosen, l.d., bailey, m.l., becker, d., culbert, t.p., … wiley, s.e. (2012). sensory integration therapies for children with developmental and behavioral disorders. pediatrics, 129(6), 1186–1189. https://doi.org/10.1542/peds.2012-0876

about the author(s) sumaya laher department of psychology, faculty of humanities, university of the witwatersrand, johannesburg, south africa citation laher, s. (2021). advancing psychological assessment in africa: contributions from the african journal of psychological assessment. african journal of psychological assessment, 3(0), a88. https://doi.org/10.4102/ajopa.v3i0.88 editorial advancing psychological assessment in africa: contributions from the african journal of psychological assessment sumaya laher copyright: © 2021. the author(s). licensee: aosis.
this is an open access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. the last 2 years have been challenging across all spheres of our lives. from march 2020 to date, things we all knew and took for granted – from work to school, to relationships and social interactions – have fundamentally changed. the coronavirus disease 2019 (covid-19) pandemic and subsequent restrictions placed on individual movements and interactions have necessitated a rethinking in all fields in psychology including psychological assessment. in this volume a special section guest edited by august and mashegoane (2021), provides some insight into the effects of the pandemic on the field of psychological assessment. this ranges from practical and logistical issues such as how to make assessment remotely or face to face whilst complying with preventative measures, as well as ethical issues linked to conducting tests in the online space. dowdeswell and kriek (2021) discuss recent trends in unproctored internet testing (uit) and virtual or video interviewing technologies, and reflect on the role of assessment practices in retrenchment and restructuring in corporates. the advent of covid-19 had a huge impact on higher education, particularly on professional training programmes which involve substantial practical and hands-on training. munnik, smith, adams tucker and human (2021) present a case study from the masters in clinical psychology programme at the university of western cape discussing the challenges faced in teaching psychological assessment and the methods adopted in adjusting during this time. wigdorowitz, rajab, hassem and titi (2021) discuss the impact of covid-19 across both industry and academia, providing suggestions for conducting assessments remotely and issues to consider. 
makhubela and mashegoane (2021) discuss the psychometric properties of the fear of covid-19 scale (fcv-19s) in a south african context. whilst it is clear across the articles that switching to virtual modes of assessing has its drawbacks, august and mashegoane (2021) note importantly in their editorial the positive contributions to the field in terms of ensuring greater access to assessment. despite the pandemic, the african journal of psychological assessment (ajopa) has continued to grow as evidenced by the increasing number, and varied nature, of submissions. the inaugural volume included a modest offering of six research articles representing the diverse and creative work from cognitive and personality assessment through to emotional screening. volume 2 expanded to include seven research articles, two review articles, and a book review with content spanning organisational, neuropsychological, vocational and educational assessments. cockcroft’s (2020) editorial highlighted the misuses of assessment in south africa arguing that it is imperative for ‘practitioners and researchers in africa, not to allow such problematic research and to make a positive contribution to the body of knowledge through sound and ethical practices’ (p. 2). this year the journal has expanded further to include 10 research articles which reflect current research in south africa. in this volume, the assessment work that traverses the organisational, clinical and developmental fields has been published. munnik, wagener and smith (2021) consider the screening of emotional and social readiness for school, whilst abdool gafoor, burke and fourie (2021) investigate the efficacy of the senior south african individual scale-revised (ssais-r) in children with attention-deficit/hyperactivity disorder (adhd). both these articles touch on the need to consider the lived realities of children in south africa when assessing. 
psychological wellbeing is a core concern, more so because this has been identified by the world health organization (who) as a sustainable development goal (sdg). this is even more crucial given the actual and predicted mental health outcomes during and post the pandemic. mpondo et al. (2021) and khumalo, ejoke, oppong asante and rugira (2021) provide excellent input on measures of psychological wellbeing, whilst van wijk (2021) considers the usefulness of the stress overload scale for employed south africans. van lill and taylor (2021) provide recommendations for the more effective use of personality facets in predicting work-related outcomes using the locally developed basic traits inventory. pienaar and theron (2021) present research on the development and validation of a graduate leader competency questionnaire, arguing strongly for a measure that is able to assess a future generation of leaders, thereby allowing for effective succession planning. ajopa also publishes articles that discuss methodological developments in the field. in this volume, pretorius (2021) argues for employing additional indicators beyond model fit indices to examine the factor structure of multidimensional instruments. if one examines the download trends, it is clear that the journal is increasingly becoming recognised as the forum for assessment research in africa, attracting consistently more readers and submissions. this year, the journal was included in the directory of open access journals (doaj) and on the ebsco database, and was accredited by the south african department of higher education and training. however, it is necessary for the journal to attract research from elsewhere in africa beyond the southern region. going forward, it is important that the journal considers strategies for the inclusion of more voices in the field.
it is also vital for the journal to invite more contributions on the use of indigenous knowledge, beliefs and systems in psychological assessment. in february 2020, academics from the university of zambia presented a 2-day workshop at the university of the witwatersrand, south africa titled, ‘approaches to psychological assessment in africa: responsiveness to african cultural contexts in designing assessment methods’. the panga munthu test, the zambia cognitive assessment test, the home environment potential assessment schedule and the south african personality inventory were presented to stimulate discussions on projects for developing local and/or regional assessment measures that may be more suited to african contexts (fetvadjiev, meiring, van de vijver, & nel, 2018; matafwali & serpell, 2014; nabuzoka, 1993). the discussions held much promise for local and international projects wanting to focus on cross-cultural assessment. of particular note was the interesting discussion on gamification in assessment and the use of indigenous games like masikitlane – a local stone throwing game – as forms of assessment. whilst there are many discussions and workshops of this nature across the african continent, much of this research remains unpublished. hence, ajopa provides a platform for the sharing of this research to find agile solutions for assessment across africa. on behalf of the editors, editorial board and the publishers (aosis (pty) ltd and psyssa), i would like to thank you for supporting this journal, to encourage you to submit your work to the journal, and to continue the conversation on psychological assessment across the continent. references abdool gafoor, l., burke, a., & fourie, j. (2021). the efficacy of the senior south african individual scale revised in distinguishing between attention deficit hyperactivity disorder, normal and sluggish cognitive tempo children. african journal of psychological assessment, 3(0), a45. 
https://doi.org/10.4102/ajopa.v3i0.45 august, j., & mashegoane, s. (2021). psychological assessment during and after the covid-19 pandemic. african journal of psychological assessment, 3(0), a74. https://doi.org/10.4102/ajopa.v3i0.74 cockcroft, k. (2020). ignorance is not an excuse – irresponsible neurocognitive test use highlights the need for appropriate training. african journal of psychological assessment, 2(0), a28. https://doi.org/10.4102/ajopa.v2i0.28 dowdeswell, k., & kriek, h. (2021). shifting assessment practices in the age of covid-19. african journal of psychological assessment, 3(0), a50. https://doi.org/10.4102/ajopa.v3i0.50 fetvadjiev, v.h., meiring, d., van de vijver, f.j.r., & nel, j.a. (2018). indigenous personality structure and measurement in south africa. in a.t. church (ed.), the praeger handbook of personality across cultures (pp. 137–160). santa barbara, ca: praeger. khumalo, i., ejoke, u., oppong asante, k., & rugira, j. (2021). measuring social well-being in africa: an exploratory structural equation modelling study. african journal of psychological assessment, 3(0), a37. https://doi.org/10.4102/ajopa.v3i0.37 makhubela, m., & mashegoane, s. (2021). psychometric properties of the fear of covid-19 scale amongst black south african university students. african journal of psychological assessment, 3(0), a57. https://doi.org/10.4102/ajopa.v3i0.57 matafwali, b., & serpell, r. (2014). design and validation of assessment tests for young children in zambia. in r. serpell & k. marfo (eds.), child development in africa: views from inside. new directions for child and adolescent development, 2014(146), 77–96. https://doi.org/10.1002/cad.20074 mpondo, f., wray, c., norris, s., stein, a., stein, a., & richter, l. (2021). assessing psychological well-being measures amongst south african adults in the birth to twenty plus cohort. african journal of psychological assessment, 3(0), a44.
https://doi.org/10.4102/ajopa.v3i0.44 munnik, e., smith, m., adams tucker, l., & human, w. (2021). covid-19 and psychological assessment teaching practices – reflections from a south african university. african journal of psychological assessment, 3(0), a40. https://doi.org/10.4102/ajopa.v3i0.40 munnik, e., wagener, e., & smith, m. (2021). validation of the emotional social screening tool for school readiness. african journal of psychological assessment, 3(0), a42. https://doi.org/10.4102/ajopa.v3i0.42 nabuzoka, d. (1993). how to define, involve and assess the care unit? experiences and research from a cbr programme in zambia. in h. finkenflügel (ed.), the handicapped community. the relation between primary health care and community based rehabilitation (pp. 73–88). amsterdam: vu university press. pienaar, j., & theron, c. (2021). the development and validation of a graduate leader competency questionnaire: arguing the need for a graduate leader performance measure. african journal of psychological assessment, 3(0), a61. https://doi.org/10.4102/ajopa.v3i0.61 pretorius, t. (2021). over reliance on model fit indices in confirmatory factor analyses may lead to incorrect inferences about bifactor models: a cautionary note. african journal of psychological assessment, 3(0), a35. https://doi.org/10.4102/ajopa.v3i0.35 van lill, x., & taylor, n. (2021). the manifestation of the 10 personality aspects amongst the facets of the basic traits inventory. african journal of psychological assessment, 3(0), a31. https://doi.org/10.4102/ajopa.v3i0.31 van wijk, c. (2021). usefulness of the english version of the stress overload scale in a sample of employed south africans. african journal of psychological assessment, 3(0), a41. https://doi.org/10.4102/ajopa.v3i0.41 wigdorowitz, m., rajab, p., hassem, t., & titi, n. (2021). the impact of covid-19 on psychometric assessment across industry and academia in south africa. african journal of psychological assessment, 3(0), a38. 
https://doi.org/10.4102/ajopa.v3i0.38 about the author(s) tyrone b. pretorius department of psychology, faculty of community and health sciences, university of the western cape, cape town, south africa anita padmanabhanunni department of psychology, faculty of community and health sciences, university of the western cape, cape town, south africa citation pretorius, t.b., & padmanabhanunni, a. (2022). assessing the cognitive component of subjective well-being: revisiting the satisfaction with life scale with classical test theory and item response theory. african journal of psychological assessment, 4(0), a106. https://doi.org/10.4102/ajopa.v4i0.106 original research assessing the cognitive component of subjective well-being: revisiting the satisfaction with life scale with classical test theory and item response theory tyrone b. pretorius, anita padmanabhanunni received: 17 feb. 2022; accepted: 06 june 2022; published: 19 july 2022 copyright: © 2022. the author(s). licensee: aosis. this is an open access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. abstract life satisfaction is generally regarded as the cognitive component of subjective well-being, as opposed to positive and negative affect, which are regarded as the affective components. this topic has been extensively studied worldwide and has been linked to a variety of outcomes related to the work context as well as psychological well-being. in this study, we examine the psychometric properties of the satisfaction with life scale (swls), one of the most widely used measures of life satisfaction, using three different approaches: classical test theory, rasch analysis and mokken analysis.
combining these three approaches provides a more comprehensive validation of an instrument. in this study, schoolteachers (n = 355) completed the swls, the trait scale of the state-trait anxiety inventory, the center for epidemiological studies depression scale, the beck hopelessness scale and the university of california, los angeles loneliness scale. the three approaches confirmed the reliability, validity and unidimensional nature of the swls, thus supporting its continued use as a measure of life satisfaction in the south african context. keywords: mokken analysis; rasch analysis; classical test theory; satisfaction with life scale; reliability; validity. introduction positive psychology is a movement based on the seminal work of seligman and others (e.g. seligman, 2002; seligman & csikszentmihalyi, 2014), which sparked renewed interest in focusing on what can go right rather than what can go wrong, also known as psychological strength. during the tanner lecture series at the university of michigan, seligman (2010) bemoaned psychology’s obsession with ‘what is wrong with life: suicide, depression, schizophrenia, and all the brick walls that can fall on you’ (p. 232) and ‘[we] tried to create a field in which we asked the question, “what makes life worth living, and how can we build it?”’ (p. 232) this sparked interest in research on positive psychological variables such as happiness, life satisfaction and subjective well-being. these terms are often used interchangeably. for example, in a multination study of subjective well-being worldwide, life satisfaction and happiness were used as variables with which to compare different types of economic systems in terms of subjective well-being (tsai, 2009). whilst life satisfaction and happiness are often used as equivalents to subjective well-being, the latter concept is generally regarded as a multidimensional concept that consists of both affective and cognitive dimensions (pavot & diener, 2008). 
in this context, positive and negative affect are typically regarded as the affective components, whereas life satisfaction is regarded as the cognitive dimension (prasoon & chaturvedi, 2016). in this regard, diener, emmons, larsen and griffin (1985) described life satisfaction as a cognitive judgement of people’s level of satisfaction based on a comparison with a standard. moreover, a meta-analytic review provided some evidence for the hierarchical conceptualisation of subjective well-being and showed that positive affect, negative affect and life satisfaction load on a latent subjective well-being construct (busseri, 2018). busseri (2018) used meta-analytic correlations to estimate a latent subjective well-being factor, which had moderate to strong loadings on positive and negative affect as well as life satisfaction. life satisfaction is linked to a wide variety of outcomes related to the work context and psychological well-being. it has also been found to be negatively associated with depression, anxiety and stress in migrant and nonmigrant samples in the united states of america and russia (brailovskaia, schönfeld, kochetkov, & margraf, 2019), as well as in university students in brazil (lopes & nihei, 2021) and school students in china (tang, xiang, cheung, & xiang, 2021). in a study on work–family conflict and its correlates, it was found that respondents with high levels of work–family conflict have decreased levels of life satisfaction (cazan, truţă, & pavalache-ilie, 2019). the same study also revealed positive correlations between life satisfaction and psychological well-being and positive affectivity, as well as a negative correlation with negative affectivity. 
amongst the work-related variables that have been found to be associated with life satisfaction are job satisfaction and workplace attachment (cazan et al., 2019), turnover intentions (lin, hu, danaee, alias, & wong, 2021; ohunakin, adeniji, oludayo, osibanjo, & oduyoye, 2019) and innovative work behaviour and job performance (chughtai, 2021). overall, life satisfaction has been found to be related to a vast range of organisational and health-related variables. the satisfaction with life scale (swls) is arguably the most extensively used measure of life satisfaction and has, in fact, been described as the gold standard for measuring life satisfaction (kaczmarek, bujacz, & eid, 2015). this scale has been used in various countries and is available in several languages (e.g. dutch: van loon, tijhuis, surtees, & ormel, 2001; spanish: extremera & fernandez-berrocal, 2005; japanese: oishi & sullivan, 2005; korean: cha, 2003; and chinese: liang & zhu, 2015). notably, the psychometric properties of the swls are largely based on classical test theory (ctt). for example, in the original study of the scale development, the authors reported a test–retest reliability and an alpha coefficient of 0.82 and 0.87, respectively. exploratory factor analysis (efa) resulted in a single factor extracted that explained 66% of the variance, and validity was established through positive correlations between the swls and a range of other measures of subjective well-being (diener et al., 1985). similarly, other studies have reported satisfactory reliability, validity and evidence for the unidimensionality of the swls based on the ctt in other cultures (e.g. mexico: lópez-ortega, torres-castro, & rosas-carrasco, 2016; iran: maroufizadeh, ghaheri, samani, & ezabadi, 2016; and pakistan: barki, choudhry, & munawar, 2020). relatively few studies have employed item response theory (irt), either parametric or nonparametric. 
oishi (2006) examined the cross-cultural equivalence of the swls between american and chinese samples, using the two-parameter logistic model of the irt. they found that the item difficulty estimates of two of the items of the swls were different between the two groups. they also found that the mean satisfaction score of the american sample was substantially higher than that of the chinese sample. several studies have also examined the psychometric properties of the swls using rasch analysis (e.g. akif, 2021; løvereide & hagell, 2016; schutte, negri, delle fave, & wissing, 2021) but with mixed results. schutte et al. (2021) found that the results supported the unidimensional structure of the swls, that one item did not fit the model and that there was no differential item functioning (dif) between south african and italian samples. however, akif (2021) and løvereide and hagell (2016) found that the irt supported the reliability of the scale and that all items fitted the model well. both schutte et al. (2021) and løvereide and hagell (2016) suggested the use of fewer response categories, rather than the seven-point likert scale, which is the current format of the swls. to a certain extent, oishi (2006) and akif (2021) used a combination of both ctt and parametric irt to examine the psychometric properties of the swls. in both studies, the authors used efa and confirmatory factor analysis (cfa) to confirm the unidimensionality of the scale, in addition to an irt analysis. only one study was found that used a combination of ctt, parametric irt and nonparametric irt to examine the swls (avşar, 2021). however, this was not, strictly speaking, an examination of the psychometric properties of the swls but rather an examination of the impact of excluding participants who provided aberrant responses to items of the swls. the results demonstrated that after the aberrant individuals were excluded, a better fit was obtained for the cfa, mokken model and graded response model. 
in this study, the reliability, validity and dimensionality of the swls are examined from three perspectives: ctt, parametric irt (rasch analysis) and nonparametric irt (mokken analysis). generally, combining the irt and ctt provides a comprehensive picture of the psychometric properties of an instrument (akif, 2021; oishi, 2006). for example, oishi (2006) found that if only structural equation modelling was used in the analysis of the swls, they would have erroneously concluded that only one item of the swls had dif, whereas irt revealed that four items had dif. in ctt, the instrument is the unit of analysis, whereas in irt, the item is the unit of analysis. in this regard, ctt focuses more on instrument-level indices such as reliability or standard error of a scale, whilst irt, as its name indicates, focuses more on item-level indices such as item difficulty and dif (abedalaziz & leng, 2018). moreover, irt indices are less sample dependent than ctt indices. in this regard, magno (2009) empirically demonstrated that unlike ctt, item difficulty indices and estimates of reliability in irt were more stable across different samples. in addition, irt provides information regarding person–item interactions which is not provided by ctt (akif, 2021; oishi, 2006). mokken analysis is a non-parametric alternative to rasch analysis; thus, it has fewer assumptions than rasch analysis. in addition, the rasch model assumes that all items have the same response function. item response function refers to the probability that respondents with a high level of the latent trait will endorse an item whilst respondents with a lower level of the latent trait will not endorse the item. in mokken analysis, no such assumption is made, and the item response function could differ for different items. hence, using more than one approach provides a comprehensive picture of the instrument under investigation. 
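as a hedged illustration of the item response function described above, the following python sketch evaluates the dichotomous rasch model, in which the probability of endorsing an item depends only on the difference between a person's trait level and the item's difficulty. the swls items are polytomous, so this is a simplification for intuition and not the model fitted in this study; the function name, the trait levels and the item difficulty are assumptions made purely for illustration.

```python
import math


def rasch_probability(theta: float, difficulty: float) -> float:
    """Probability of endorsing a dichotomous item under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


# Hypothetical trait levels (in logits) and a hypothetical item difficulty.
trait_levels = [-2.0, -1.0, 0.0, 1.0, 2.0]
item_difficulty = 0.5

probabilities = [rasch_probability(t, item_difficulty) for t in trait_levels]
for t, p in zip(trait_levels, probabilities):
    print(f"theta = {t:+.1f}  P(endorse) = {p:.3f}")
```

under this model every item shares the same curve shape and differs only in its location on the trait continuum, whereas mokken analysis only requires that the curve does not decrease as the trait level increases.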
method participants both primary and secondary schoolteachers (n = 355) from across south africa participated in this study. most of them were based in the province of the western cape (82.3%) and taught primary school students (61.1%). the sample was largely urban (61.7%) and predominantly women (76.6%), and the mean age of the sample was 41.9 years (± 12.4). the mean number of years that the participants had worked in the field of teaching was 15.7 years (± 11.8). instruments all participants completed the following instruments: a brief demographic survey, the swls (diener et al., 1985), the trait scale of the state-trait anxiety inventory (stai-t; spielberger, 1988), the center for epidemiological studies depression scale (ces-d; radloff, 1977), the beck hopelessness scale (bhs; beck, weissman, lester, & trexler, 1974) and the university of california, los angeles loneliness scale (ucla-ls; russell, 1996). as indicated, the swls measures the cognitive component of subjective well-being and consists of five items scored on a 7-point likert scale, ranging from 1 (strongly disagree) to 7 (strongly agree). the swls has generally demonstrated satisfactory psychometric properties, as previously indicated. pavot and diener (2008) provided an extensive review of the psychometric properties of the swls up until 2008. they also confirmed that factor analytic studies have replicated the one-factor structure of the scale. however, they highlighted that item 5 (‘if i could live my life over, i would change almost nothing’) typically has lower factor loadings and item-total correlations than those of the rest of the items. nevertheless, they argued for the retention of item 5 based on its high correlation with the other items. more recent studies have also confirmed the satisfactory psychometric properties of the swls. for example, lópez-ortega et al. (2016) reported a reliability coefficient of 0.74.
moreover, in terms of validity, they found that the swls significantly correlates with depression and perceived health, amongst other factors. they also confirmed the single-factor structure through an efa. similarly, barki et al. (2020) reported a reliability estimate of 0.89 and found that the swls significantly correlates with anxiety and depression. they also confirmed the unidimensional structure with a cfa. other researchers have also used the swls in south africa and reported satisfactory internal consistency reliability (padmanabhanunni & pretorius, 2021a). the stai-t is a measure of trait anxiety that consists of 20 items scored on a 4-point likert scale, ranging from 1 (almost never) to 4 (almost always). a favourable estimate of reliability has been reported both in the original study (α = 0.86–0.92; spielberger, 1988) and in more recent studies (e.g. bee seok, abd hamid, mutang, & ismail, 2018; hallit et al., 2020; stojanović et al., 2020). in south africa, the stai-t has also demonstrated satisfactory reliability when used with students (pretorius & padmanabhanunni, 2021) as well as with a manganese-exposed community when translated into several south african languages (racette et al., 2021). the ces-d is a widely used 20-item measure of depressive symptomatology relying on a 4-point scale with response options ranging from 0 (rarely or none of the time) to 3 (most of or all the time). this measure has generally demonstrated satisfactory internal consistency (e.g. ilic, babic, dimitrijevic, ilic, & grujicic, 2019; nia et al., 2019; singh, zaki, farid, & kaur, 2021). in south africa, hassem (2021) developed a 19-item adapted version of the ces-d to be used as an online screening tool for depression. the ces-d has also been used to assess depression in students (padmanabhanunni & pretorius, 2020; pretorius, 1991) and teachers (padmanabhanunni, pretorius, stiegler, & bouchard, 2022), and it has generally demonstrated satisfactory internal consistency.
the bhs is a 20-item self-report measure of hopelessness scored on a dichotomous true-or-false response format. satisfactory reliability estimates have been reported for the bhs both in the original study (kuder–richardson-20 = 0.93; beck et al., 1974) and in several different contexts (e.g. colombia: kocalevent et al., 2017; nigeria: aloba, olabisi, ajao, & aloba, 2017; and japan: sueki, 2020). similar satisfactory estimates of reliability have also been reported for student samples in south africa (heppner, pretorius, wei, lee, & wang, 2002; padmanabhanunni & pretorius, 2021b). the ucla-ls is a 20-item self-report measure of loneliness scored on a 4-point likert scale ranging from 1 (i never feel this way) to 4 (i often feel this way). the author of the ucla-ls reported alpha coefficients ranging between 0.89 and 0.94 for different samples of students, nurses, teachers and older individuals. in more recent studies, the ucla-ls has consistently demonstrated satisfactory reliability (e.g. arimoto & tadaka, 2019; zeas-sigüenza, oliveira, ferreira, ganho, & ruisoto, 2021). the ucla-ls has also demonstrated acceptable reliability in south africa (padmanabhanunni & pretorius, 2021c; pretorius, 1993). in a study in south africa, pretorius (2022) examined the dimensionality of the ucla-ls using cfa and bifactor indices and concluded that it is best used as a total scale with three subscales. procedure an electronic version of the above-mentioned instruments was first constructed using google forms. then, a link to the form was posted on teacher facebook groups after permission was obtained from the administrators of these groups. the school liaison officers of the university also sent the link to schools with which they had a working relationship. data analysis ibm spss statistics version 27 for windows (ibm corp., armonk, ny, usa) was used to perform the ctt analyses, and ibm spss amos version 27 (ibm corp.) was used to conduct cfa.
in addition, winsteps version 5.1.4 (linacre, 2021a) was used to perform the rasch analysis, and r (r core team, 2017) was used to conduct the mokken analysis with the ‘mokken’ package (van der ark, 2012). the reliability of the swls was assessed in terms of cronbach’s alpha (α), composite reliability (cr) and mokken scale reliability (msrho). conventionally, a reliability coefficient greater than 0.70 is considered evidence of satisfactory reliability (taber, 2018). to determine the construct validity of the swls, the item-total correlations (ctt), item and person separation indices (rasch analysis) and scalability coefficients for each item (hi, mokken analysis) were evaluated. in general, item-total correlations greater than 0.50 (devon et al., 2007; hajjar, 2018) indicate that all items contribute to the measurement of the latent construct (i.e. life satisfaction). the hi coefficient serves the same function as the item-total correlations in the sense that it indicates the extent to which each item contributes to the total scale. according to mokken (1971), hi coefficients greater than 0.30 indicate well-fitting items that contribute to the measurement of the latent construct. with regard to person and item separation indices, linacre (2021b) recommends that a person separation index of > 2 together with person reliability of > 0.80 and an item separation index of > 3 together with item separation reliability of > 0.80 are acceptable. if these criteria are met, this would indicate that the scale can distinguish between different levels of ‘performers’ (i.e. those with high and low scores on the latent construct; person separation) and that an item difficulty hierarchy exists (item separation). for each item, the rasch analysis also provides fit statistics called the infit and outfit mean square (mnsq), which are used to determine the extent to which each item fits the rasch model.
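to make the reliability criterion above concrete, the following python sketch computes cronbach's alpha from a respondents-by-items matrix using the standard variance-based formula. it is a minimal illustration and not the spss routine used in this study; the function name and the six hypothetical response rows are assumptions and do not come from the study data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the total (summed) score.
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses of six people to five items on a 7-point scale.
responses = [
    [6, 6, 5, 6, 5],
    [2, 3, 2, 2, 1],
    [4, 4, 5, 4, 3],
    [7, 6, 7, 7, 6],
    [3, 3, 2, 3, 2],
    [5, 5, 5, 4, 4],
]

alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f}, satisfactory = {alpha > 0.70}")
```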
linacre (2021b) suggested that mean square values below 0.50 and above 1.5 are indicative of misfitting items. mokken analysis also provides an indication of whether items discriminate between participants who have high or low levels of life satisfaction (monotonicity) and whether there are items that respondents with the same level of life satisfaction may have endorsed in significantly different ways (invariant item ordering [iio]; sijtsma & van der ark, 2017). for these two assumptions, monotonicity and iio, mokken analysis provides a crit value, which is used to assess potential violations. according to sijtsma and van der ark (2017), a crit value greater than 80 indicates serious violations, whereas a crit value between 40 and 80 indicates minor but acceptable violations. to assess the measurement invariance between men and women, dif was calculated using rasch analysis. in this context, a dif value smaller than 0.50 would indicate that the items measure the same construct across different groups (linacre, 2021b). given reported findings that women generally reported higher levels of life satisfaction (e.g. joshanloo & jovanović, 2020), it is important to demonstrate that the swls measures the same construct in the two groups. other types of construct validity include convergent, discriminant and concurrent validity. firstly, to demonstrate convergent validity, the average variance extracted (ave), cr and factor loadings were used. in general, significant factor loadings (posch et al., 2019), an ave value greater than 0.50 and an ave value smaller than the cr value are evidence of convergent validity. secondly, with regard to discriminant validity, an ave value greater than the maximum shared variance (msv) and average shared variance (asv) is indicative of discriminant validity. 
this is because it indicates that the latent construct explains a greater proportion of the variance in the items that contribute to its measurement compared with the proportion of variance shared with other related constructs (almén, lundberg, sundin, & jansson, 2018). finally, concurrent validity was established through the associations between life satisfaction and the indices of psychological distress which have been consistently linked to life satisfaction in the literature, namely anxiety, hopelessness, loneliness and depression (e.g. brailovskaia et al., 2019; tang et al., 2021). the dimensionality of the swls was evaluated using all three approaches. to perform a factor analysis, both efa (principal axis) and cfa were conducted. however, before the efa was conducted, the suitability of the data for factor analysis was examined using the kaiser–meyer–olkin (kmo) measure of sampling adequacy and bartlett’s test of sphericity. in general, a kmo value above 0.5, both at scale level and individual item level, and a significance level for bartlett’s test below 0.05 suggests a substantial correlation in the data. thus, it would be appropriate to proceed with factor analysis (hadi, abdullah, & sentosa, 2016). the following fit indices were used in the cfa to measure the model fit (kline, 2005): chi-square (χ2, best if p > 0.05), comparative fit index (cfi, best if above 0.90), root-mean-square error of approximation (rmsea, best if below 0.08), the tucker–lewis index (tli, best if above 0.90) and the goodness-of-fit index (gfi, best if above 0.95). mokken analysis provides an algorithm, an automated item selection procedure (aisp) that partitions items into scales. items that are not selected through the aisp are regarded as unscalable (sijtsma & van der ark, 2017). in addition to providing an h-coefficient for each item (hi), mokken analysis also provides a scalability coefficient (h) for the entire scale to reflect the strength of the scale. 
the following rule of thumb is typically used to evaluate h: h ≥ 0.50 reflects a strong scale, 0.40 ≤ h < 0.50 reflects an intermediate scale and h < 0.40 reflects a weak scale (wind, 2017). after the presumed latent trait is removed, a principal component analysis (pca) of the residuals is used in the rasch analysis to detect multidimensionality. if a possible additional dimension, as indicated by the pca (called the ‘first contrast’), has an eigenvalue of > 2, then this suggests two or more items loading on a possible second factor, thus indicating multidimensionality (linacre, 2021b). ethical considerations ethical approval for this study was obtained from the humanities and social sciences ethics committee of the university of the western cape (reference number: hs21/3/8). participation was voluntary, and all participants provided informed consent before they were allowed to proceed with the electronic survey. results table 1 shows the reliabilities, descriptive statistics and intercorrelations between the study variables. overall, the reliabilities of all scales can be considered to be satisfactory (α > 0.70). notably, the mean life satisfaction score in the current study was 21.9 (± 7.3). this is significantly lower than the value reported by diener et al. (1985) in the original study of the scale (m = 23.5, standard deviation [sd] = 6.43, t = −4.15, p < 0.001). it is also significantly lower than the value reported more recently by jovanović and brdar (2018)1 for three different countries: austria (m = 25.6, sd = 5.95, t = −9.58, p < 0.001), croatia (m = 24.1, sd = 5.1, t = −5.71, p < 0.001) and serbia (m = 23.3, sd = 5.7, t = −3.64, p < 0.001). however, it is comparable to the mean life satisfaction score reported in the same study for two countries: bosnia and herzegovina (m = 21.6, sd = 6.7, t = 0.76, p = 0.449) and montenegro (m = 22.4, sd = 6.5, t = −1.31, p = 0.191). table 1: intercorrelations, reliabilities and descriptive statistics for variables.
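the comparisons of the current mean with previously published means can be approximated from the summary statistics alone. the python sketch below assumes that each comparison was a one-sample t-test of the current sample against the published mean; the text does not state which test was used, so this is an assumption, as are the function and variable names. because the inputs are the rounded values reported above, the resulting t-values are close to, but not identical to, the reported ones.

```python
import math


def one_sample_t(sample_mean, sample_sd, n, reference_mean):
    """t statistic for a sample mean tested against a fixed reference mean."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - reference_mean) / standard_error


# Rounded summary statistics reported in the text for the current sample.
SAMPLE_MEAN, SAMPLE_SD, N = 21.9, 7.3, 355

# Published means the current sample is compared with (from the text).
reference_means = {
    "diener et al. (1985)": 23.5,
    "austria": 25.6,
    "croatia": 24.1,
    "serbia": 23.3,
    "bosnia and herzegovina": 21.6,
    "montenegro": 22.4,
}

t_values = {
    label: one_sample_t(SAMPLE_MEAN, SAMPLE_SD, N, mean)
    for label, mean in reference_means.items()
}
for label, t in t_values.items():
    print(f"{label}: t({N - 1}) = {t:.2f}")
```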
table 1 also indicates a significant negative relationship between life satisfaction and the indices of psychological distress: anxiety (r = −0.52, p < 0.001, 95% confidence interval [ci] [−0.59, −0.44]), depression (r = −0.55, p < 0.001, 95% ci [−0.62, −0.47]), hopelessness (r = −0.62, p < 0.001, 95% ci [−0.68, −0.55]) and loneliness (r = −0.53, p < 0.001, 95% ci [−0.61, −0.45]). this indicates that high levels of life satisfaction are associated with low levels of anxiety, depression, hopelessness and loneliness, thus providing evidence of concurrent validity. table 2 shows the ctt, rasch and mokken indices for the items of the swls. it can be observed that the inter-item correlations were all significant and above 0.50. moreover, the item-total correlations ranged between 0.63 and 0.84. item 5 (‘if i could live my life over, i would change almost nothing’) exhibited the lowest correlation with the latent variable, but it was higher than 0.50. the hi coefficients ranged between 0.58 and 0.73, thus exceeding the suggested cut-off of > 0.30. the factor loadings were all above 0.70 (hair, ringle, & sarstedt, 2011), ranged between 0.74 and 0.91 and were all significant. in addition, there were no significant violations of monotonicity and only one minor violation of iio (item 5, crit = 61). the infit and outfit mnsq values were all within the range of 0.50 to 1.5, indicating the absence of misfitting items. the dif value for all items across gender was < 0.50, indicating measurement invariance across men and women. table 2: classical test theory and item response theory indices for the satisfaction with life scale at the item level. the results of the kmo and bartlett’s tests indicated that the data set was adequately sampled and that there was substantial correlation in the data set (kmo = 0.863, kmo for individual items = 0.82–0.91, bartlett’s test = 1125.88, p < 0.001). they therefore indicated that factor analysis of the data is appropriate.
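The suitability rule applied here (a kmo above 0.5 at both scale and item level, plus a bartlett's test significant at 0.05) can be written out as a short check. The following is a minimal sketch in Python, not the authors' analysis code; the class and function names are hypothetical, and because the bartlett p-value is reported only as "< 0.001", the value 0.001 is used as an assumed upper bound.

```python
from dataclasses import dataclass


@dataclass
class SuitabilityResult:
    """Summary statistics needed for the decision rule (hypothetical container)."""
    kmo_overall: float    # scale-level KMO measure of sampling adequacy
    kmo_items_min: float  # lowest item-level KMO value
    bartlett_p: float     # p-value of Bartlett's test of sphericity


def suitable_for_factor_analysis(result, kmo_cutoff=0.5, alpha=0.05):
    """Return True when KMO exceeds the cutoff at scale and item level
    and Bartlett's test is significant at the chosen alpha."""
    return (
        result.kmo_overall > kmo_cutoff
        and result.kmo_items_min > kmo_cutoff
        and result.bartlett_p < alpha
    )


# Values reported in the text. The p-value is given only as "< 0.001",
# so 0.001 is an assumed conservative upper bound.
reported = SuitabilityResult(kmo_overall=0.863, kmo_items_min=0.82, bartlett_p=0.001)
print(suitable_for_factor_analysis(reported))  # prints True
```

Checking the lowest item-level kmo is sufficient because every other item value is at least as large.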
the efa extracted one factor, which accounted for 71.15% of the variance, hence demonstrating a dominant factor underlying the swls. as shown in figure 1, a one-factor model of the swls was examined using cfa. figure 1: one-factor model of the satisfaction with life scale with the indices of psychological well-being as outcome variables. the results of cfa are reported in table 3. the fit indices all met the suggested best-fit criteria (χ2 = ns, gfi = 0.99, tli = 0.99, cfi = 0.99 and rmsea = 0.05), indicating that a one-factor representation of the swls is a favourable fit for the data. table 3: confirmatory factor analysis fit indices for the satisfaction with life scale: one-factor model. table 4 shows the ctt, rasch and mokken indices for the swls at the scale level, together with the suggested cut-off values. with regard to reliability, the cronbach alpha (0.90), cr (0.93) and msrho (0.90) values were all above 0.70, and the ave value (0.73) was larger than 0.50 and also larger than the msv value (0.38) and the asv (0.31). in this regard, the highest correlation coefficient between life satisfaction and the indices of psychological well-being was with hopelessness (r = −0.62). thus, the squared correlation (msv) was 0.38. average shared variance is the mean of the squared correlations between life satisfaction, anxiety, depression, hopelessness and loneliness. the separation indices were at an acceptable level (item separation index = 5.91, person separation index = 2.37). similarly, the separation reliabilities also exceeded the suggested cut-off values (item separation reliability = 0.97, person separation reliability = 0.85). the eigenvalue associated with a possible additional factor in the rasch analysis was found to be 1.82, indicating that the scale is essentially unidimensional. however, at least two items (items four and five) were reflected on the first contrast.
the disattenuated correlation between the rasch dimension and the first contrast was 0.73, indicating that the two clusters of items have more than half of their variance in common, which would support a possible interpretation that the two clusters of items measure the same latent variable (linacre, 2021b). the scalability coefficient in the mokken analysis also indicated the existence of a very strong scale (h = 0.66). table 4: classical test theory and item response theory indices for the satisfaction with life scale at the scale level. discussion in this study, we used ctt and irt to examine the replicability of the psychometric properties of the swls in a sample of south african teachers. the results obtained support the findings in the literature regarding the psychometric properties of the swls as examined through ctt (e.g. barki et al., 2020; diener et al., 1985; maroufizadeh et al., 2016) and irt (e.g. akif, 2021; oishi, 2006). firstly, the mean life satisfaction score for the current sample of teachers in south africa was found to be significantly lower than the values reported in other countries. pavot and diener (2009) asserted that it is typical for citizens in western countries to score highly on a range of measures of well-being, including the swls. this assertion seems to be corroborated by south african studies that have also reported low life satisfaction scores prior to the pandemic (e.g. westaway, maritz, & golele, 2003: m = 21.7, sd = 8.8; field & buitendach, 2011: m = 17.47, sd = 6.33). however, there have also been south african studies that reported very high life satisfaction scores amongst south african samples (le roux, kagee, van der merwe, & parker, 2008: m = 28.7, sd = 7.8; roothman, kirsten, & wissing, 2003: m = 24.9 and sd = 5.4 for men and m = 24.8 and sd = 6.0 for women). the available evidence therefore does not allow for a definitive statement of the impact of the pandemic on the life satisfaction of teachers. 
rather, it merely reflects the fact that they have low levels of life satisfaction, which may have been the case even prior to the pandemic. secondly, all the indices of reliability (cronbach’s alpha, cr and msrho) exceeded the conventional cut-off (> 0.70), thus demonstrating that the swls has very satisfactory reliability that supports its continued use as a research instrument. thirdly, both the ctt and the irt confirmed that the swls demonstrates sufficient validity. with regard to construct validity, all items were found to highly correlate with the total scale. in addition, the scalability coefficient of the individual items (hi) indicated that all items contributed to the measurement of life satisfaction. moreover, the person and item separation indices confirmed that the swls can distinguish between respondents with low and high scores on life satisfaction (person separation – rasch; monotonicity – mokken) and that an item difficulty hierarchy exists (item separation). the mokken analysis also confirmed that there was no violation of the assumption of iio, and thus that there was consistency in the way respondents with the same level of satisfaction responded to items. differential item functioning also demonstrated that there were no gender differences in the measurement of the construct. notably, the convergent, discriminant and concurrent validity of the swls were also confirmed. the significant factor loadings of the five items and the fact that the ave value was above 0.50 and below the cr value confirmed the convergent validity of the scale. the total life satisfaction score accounted for a greater proportion of the variance in the five items (ave) as opposed to the variance it shared with the indices of psychological well-being (msv, asv), thus demonstrating discriminant validity. finally, the significant associations between life satisfaction and the indices of psychological distress provided evidence of concurrent validity. 
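The discriminant validity comparison described above rests on simple arithmetic: msv is the largest squared correlation between life satisfaction and the distress indices, and asv is the mean of those squared correlations. The sketch below reproduces that arithmetic from the correlations and the ave value reported in the results; the function and variable names are illustrative assumptions rather than part of the original analysis.

```python
# Correlations between life satisfaction and each distress index,
# as reported in the results section.
correlations = {
    "anxiety": -0.52,
    "depression": -0.55,
    "hopelessness": -0.62,
    "loneliness": -0.53,
}
AVE = 0.73  # average variance extracted, as reported


def shared_variance(corrs):
    """Return (MSV, ASV): the maximum and the mean of the squared correlations."""
    squared = [r ** 2 for r in corrs.values()]
    return max(squared), sum(squared) / len(squared)


msv, asv = shared_variance(correlations)
print(round(msv, 2), round(asv, 2))  # prints 0.38 0.31

# Discriminant validity criterion: AVE exceeds both MSV and ASV.
discriminant_validity = AVE > msv and AVE > asv
print(discriminant_validity)  # prints True
```

The computed values match the reported msv (0.38) and asv (0.31), both of which fall well below the ave of 0.73.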
the ctt, rasch and mokken analyses provided complementary evidence of the unidimensional nature of the scale through efa, cfa (ctt), the scalability coefficient (mokken) and pca of the residuals after the rasch factor was extracted. some concerns have been expressed regarding item 5 (‘if i could live my life over, i would change almost nothing’; oishi, 2006; pavot & diener, 2008; schutte et al., 2021). this item seems to be conceptually different from the other four items in that it focuses on the past, whereas the other four focus on the present. as in these other studies, item 5 has also been found to have the lowest item-total correlation and the lowest factor loading in the current study. however, both ctt and irt seem to suggest that, whilst this item is conceptually different, its inclusion in the scale is probably warranted. in cfa, the factor loading of item 5 was lower than the other four items but still significant, whilst in rasch analysis the disattenuated correlation coefficient suggested that the two clusters of items measure the same underlying construct. this requires further and more detailed investigation in future studies. in summary, the three approaches mentioned in this study provide complementary evidence of the reliability, validity and unidimensional nature of the swls. the swls has largely been used as a research rather than a diagnostic instrument, and the evidence from three different perspectives supports its continued use in research on subjective well-being. limitations this study has some limitations. for example, as we have largely used self-report measures, it is important to acknowledge the potential self-report bias. the study results, however, are comparable to previously reported results. in addition, most of the teachers were from one province only, thereby limiting the generalisability of the study. therefore, in future studies, researchers should attempt to select more representative samples.
conclusion the swls is a popular and widely used measure of life satisfaction that is extensively used in south africa. to our knowledge, this is the first study in which ctt, rasch and mokken analyses are used in a complementary manner to evaluate the psychometric properties of the swls. the results indicate that the swls is a reliable, valid and unidimensional measure of the cognitive component of subjective well-being. acknowledgements competing interests the authors declare that they have no financial or personal relationships that may have inappropriately influenced them in writing this research article. authors’ contributions a.p. and t.b.p. contributed equally to the conceptualisation, data collection, writing, review and editing of this article. t.b.p. was responsible for the data analysis. funding information this research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors. data availability the data sets generated and/or analysed during the current study are available from the corresponding author upon reasonable request. disclaimer the views and opinions expressed in this article are those of the authors and do not necessarily reflect the official policy or position of any affiliated agency of the authors. references abedalaziz, n., & leng, c.h. (2018). the relationship between ctt and irt approaches in analyzing item characteristics. mojes: malaysian online journal of educational sciences, 1(1), 64–70. retrieved from https://www.researchgate.net/publication/316172903 akif, a.v.c.u. (2021). item response theory-based psychometric investigation of swls for university students. international journal of psychology and educational studies, 8(2), 27–37. https://doi.org/10.52380/ijpes.2021.8.2.265 almén, n., lundberg, h., sundin, ö., & jansson, b. (2018). the reliability and factorial validity of the swedish version of the recovery experience questionnaire. nordic psychology, 70(4), 324–333. 
https://doi.org/10.1080/19012276.2018.1443280 aloba, o., olabisi, o., ajao, o., & aloba, t. (2017). the beck hopelessness scale: factor structure, validity, and reliability in a non-clinical sample of student nurses in south-western nigeria. journal of behavioral health, 6(1), 58–65. https://doi.org/10.5455/jbh.20161022032400 arimoto, a., & tadaka, e. (2019). reliability and validity of japanese versions of the ucla loneliness scale version 3 for use among mothers with infants and toddlers: a cross-sectional study. bmc women’s health, 19(1), 1–9. https://doi.org/10.1186/s12905-019-0792-4 avşar, a.ş. (2021). aberrant individuals’ effects on fit indices both of confirmatory factor analysis and polytomous irt models. current psychology, 1–10. https://doi.org/10.1007/s12144-021-01563-4 barki, n., choudhry, f.r., & munawar, k. (2020). the satisfaction with life scale: psychometric properties in pakistani population. medical journal of the islamic republic of iran, 34, 159. https://doi.org/10.47176/mjiri.34.159 beck, a.t., weissman, a., lester, d., & trexler, l. (1974). the measurement of pessimism: the hopelessness scale. journal of consulting and clinical psychology, 42(6), 861–865. https://doi.org/10.1037/h0037562 bee seok, c., abd hamid, h.s., mutang, j.a., & ismail, r. (2018). psychometric properties of the state-trait anxiety inventory (form y) among malaysian university students. sustainability, 10(9), 3311. https://doi.org/10.3390/su10093311 brailovskaia, j., schönfeld, p., kochetkov, y., & margraf, j. (2019). what does migration mean to us? usa and russia: relationship between migration, resilience, social support, happiness, life satisfaction, depression, anxiety and stress. current psychology, 38(2), 421–431. https://doi.org/10.1007/s12144-017-9627-3 busseri, m.a. (2018). examining the structure of subjective well-being through meta-analysis of the associations among positive affect, negative affect, and life satisfaction.
personality and individual differences, 122(1), 68–71. https://doi.org/10.1016/j.paid.2017.10.003 cazan, a.m., truţă, c., & pavalache-ilie, m. (2019). the work-life conflict and satisfaction with life: correlates and the mediating role of the work-family conflict. romanian journal of psychology, 21(1), 3–10. https://doi.org/10.24913/rjap.21.1.02 cha, k.h. (2003). subjective well-being among college students. social indicators research, 62(1), 455–477. https://doi.org/10.1023/a:1022669906470 chughtai, a.a. (2021). a closer look at the relationship between life satisfaction and job performance. applied research in quality of life, 16(2), 805–825. https://doi.org/10.1007/s11482-019-09793-2 devon, h.a., block, m.e., moyle-wright, p., ernst, d.m., hayden, s.j., lazzara, d.j., … kostas-polston, e. (2007). a psychometric toolbox for testing validity and reliability. journal of nursing scholarship, 39(2), 155–164. https://doi.org/10.1111/j.1547-5069.2007.00161.x diener, e., emmons, r.a., larsen, r.j., & griffin, s. (1985). the satisfaction with life scale. journal of personality assessment, 49(1), 71–75. https://doi.org/10.1207/s15327752jpa4901_13 extremera, n., & fernandez-berrocal, p. (2005). perceived emotional intelligence and life satisfaction: predictive and incremental validity using the trait meta-mood scale. personality and individual differences, 39(5), 937–948. https://doi.org/10.1016/j.paid.2005.03.012 field, l.k., & buitendach, j.h. (2011). happiness, work engagement and organisational commitment of support staff at a tertiary education institution in south africa. sa journal of industrial psychology, 37(1), 1–10. https://doi.org/10.4102/sajip.v37i1.946 hadi, n.u., abdullah, n., & sentosa, i. (2016). an easy approach to exploratory factor analysis: marketing perspective. journal of educational and social research, 6(1), 215. retrieved from https://www.mcser.org/journal/index.php/jesr/article/view/8799 hair, j.f., ringle, c.m., & sarstedt, m. (2011). 
pls-sem: indeed a silver bullet. journal of marketing theory and practice, 19(2), 139–152. https://doi.org/10.2753/mtp1069-6679190202 hajjar, s.t. (2018). statistical analysis: internal-consistency reliability and construct validity. international journal of quantitative and qualitative research methods, 6(1), 46–57. retrieved from https://www.eajournals.org/wp-content/uploads/statistical-analysis-internal-consistency-reliability-and-construct-validity-1.pdf hallit, s., haddad, c., hallit, r., akel, m., obeid, s., haddad, g., … salameh, p. (2020). validation of the hamilton anxiety rating scale and state trait anxiety inventory a and b in arabic among the lebanese population. clinical epidemiology and global health, 8(4), 1104–1109. https://doi.org/10.1016/j.cegh.2020.03.028 hassem, t. (2021). establishing the content validity of an online depression screening tool for south africa. african journal of psychological assessment, 3, a62. https://doi.org/10.4102/ajopa.v3i0.62 heppner, p.p., pretorius, t.b., wei, m., lee, d.g., & wang, y.w. (2002). examining the generalizability of problem-solving appraisal in black south africans. journal of counseling psychology, 49(4), 484–498. https://doi.org/10.1037/0022-0167.49.4.484 ilic, i., babic, g., dimitrijevic, a., ilic, m., & grujicic, s.s. (2019). reliability and validity of the center for epidemiologic studies depression (ces-d) scale in serbian women with abnormal papanicolaou smear results. international journal of gynecologic cancer, 29(6), 996–1002. retrieved from https://ijgc.bmj.com/content/29/6/996.abstract joshanloo, m., & jovanović, v. (2020). the relationship between gender and life satisfaction: analysis across demographic groups and global regions. archives of women’s mental health, 23(3), 331–338. https://doi.org/10.1007/s00737-019-00998-w jovanović, v., & brdar, i. (2018). the cross-national measurement invariance of the satisfaction with life scale in a sample of undergraduate students. 
personality and individual differences, 128(1), 7–9. https://doi.org/10.1016/j.paid.2018.02.010 kaczmarek, l.d., bujacz, a., & eid, m. (2015). comparative latent state – trait analysis of satisfaction with life measures: the steen happiness index and the satisfaction with life scale. journal of happiness studies, 16(2), 443–453. https://doi.org/10.1007/s10902-014-9517-4 kline, r.b. (2005). principles and practice of structural equation modeling (2nd ed.). new york: guilford. kocalevent, r.d., finck, c., pérez-trujillo, m., sautier, l., zill, j., & hinz, a. (2017). standardization of the beck hopelessness scale in the general population. journal of mental health, 26(6), 516–522. https://doi.org/10.1080/09638237.2016.1244717 le roux, m.c., kagee, a., van der merwe, m., & parker, f. (2008). subjective well-being of primary health care patients in the western cape, south africa. south african family practice, 50(3), 68–68. https://doi.org/10.1080/20786204.2008.10873723 liang, y., & zhu, d. (2015). subjective well-being of chinese landless peasants in relatively developed regions: measurement using panas and swls. social indicators research, 123(3), 817–835. https://doi.org/10.1007/s11205-014-0762-z lin, y., hu, z., danaee, m., alias, h., & wong, l.p. (2021). the impact of the covid-19 pandemic on future nursing career turnover intention among nursing students. risk management and healthcare policy, 14, 3605–3615. https://doi.org/10.2147/rmhp.s322764 linacre, j.m. (2021a). winsteps® rasch measurement computer program (version 5.1.4). portland, or: winsteps.com. linacre, j.m. (2021b). winsteps® rasch measurement computer program user’s guide. version 5.1.1. portland, or: winsteps.com. lopes, a.r., & nihei, o.k. (2021). depression, anxiety and stress symptoms in brazilian university students during the covid-19 pandemic: predictors and association with life satisfaction, psychological well-being and coping strategies. plos one, 16(10), e0258493.
https://doi.org/10.1371/journal.pone.0258493 lópez-ortega, m., torres-castro, s., & rosas-carrasco, o. (2016). psychometric properties of the satisfaction with life scale (swls): secondary analysis of the mexican health and aging study. health and quality of life outcomes, 14(1), 1–7. https://doi.org/10.1186/s12955-016-0573-9 løvereide, l., & hagell, p. (2016). measuring life satisfaction in parkinson’s disease and healthy controls using the satisfaction with life scale. plos one, 11(10), e0163931. https://doi.org/10.1371/journal.pone.0163931 magno, c. (2009). demonstrating the difference between classical test theory and item response theory using derived test data. the international journal of educational and psychological assessment, 1(1), 1–11. retrieved from https://ssrn.com/abstract=1426043 maroufizadeh, s., ghaheri, a., samani, r.o., & ezabadi, z. (2016). psychometric properties of the satisfaction with life scale (swls) in iranian infertile women. international journal of reproductive biomedicine, 14(1), 57–62. retrieved from https://www.ncbi.nlm.nih.gov/pmc/articles/pmc4837918/ mokken, r.j. (1971). a theory and procedure of scale analysis. the hague: mouton. nia, h.s., rezapour, m., allen, k.a., sharif, s.p., jafari, a., torkmandi, h., … goudarzian, a.h. (2019). the psychometric properties of the center for epidemiological studies depression scale (ces-d) for iranian cancer patients. asian pacific journal of cancer prevention: apjcp, 20(9), 2803–2809. https://doi.org/10.31557/apjcp.2019.20.9.2803 ohunakin, f., adeniji, a.a., oludayo, o.a., osibanjo, a.o., & oduyoye, o.o. (2019). employees’ retention in nigeria’s hospitality industry: the role of transformational leadership style and job satisfaction. journal of human resources in hospitality & tourism, 18(4), 441–470. https://doi.org/10.1080/15332845.2019.1626795 oishi, s. (2006). the concept of life satisfaction across cultures: an irt analysis. journal of research in personality, 40(4), 411–423. 
https://doi.org/10.1016/j.jrp.2005.02.002 oishi, s., & sullivan, h.w. (2005). the mediating role of parental expectations in culture and well-being. journal of personality, 73(5), 1267–1294. https://doi.org/10.1111/j.1467-6494.2005.00349.x padmanabhanunni, a., & pretorius, t. (2021a). the loneliness – life satisfaction relationship: the parallel and serial mediating role of hopelessness, depression and ego-resilience among young adults in south africa during covid-19. international journal of environmental research and public health, 18(7), 3613. https://doi.org/10.3390/ijerph18073613 padmanabhanunni, a., & pretorius, t. (2021b). behaviour is the key in a pandemic: the direct and indirect effects of covid-19-related variables on psychological wellbeing. psychological reports. https://doi.org/10.1177/00332941211025269 padmanabhanunni, a., & pretorius, t.b. (2020). when coping resources fail: the health-sustaining and moderating role of fortitude in the relationship between covid-19-related worries and psychological distress. african safety promotion: a journal of injury and violence prevention, 18(2), 28–47. retrieved from https://www.ajol.info/index.php/asp/article/download/211538/199456 padmanabhanunni, a., & pretorius, t.b. (2021c). the unbearable loneliness of covid-19: covid-19-related correlates of loneliness in south africa in young adults. psychiatry research, 296, 113658. https://doi.org/10.1016/j.psychres.2020.113658 padmanabhanunni, a., pretorius, t.b., stiegler, n., & bouchard, j.p. (2022). a serial model of the interrelationship between perceived vulnerability to disease, fear of covid-19, and psychological distress among teachers in south africa. annales médico-psychologiques, revue psychiatrique, 180(1), 23–28. https://doi.org/10.1016/j.amp.2021.11.007 pavot, w., & diener, e. (2008). the satisfaction with life scale and the emerging construct of life satisfaction. the journal of positive psychology, 3(2), 137–152. 
https://doi.org/10.1080/17439760701756946 pavot, w., & diener, e. (2009). review of the satisfaction with life scale. in e. diener (ed.), assessing wellbeing: the collected works of ed diener (pp. 101–117). dordrecht: springer. posch, l., bleier, a., lechner, c.m., danner, d., flöck, f., & strohmaier, m. (2019). measuring motivations of crowdworkers: the multidimensional crowdworker motivation scale. acm transactions on social computing, 2(2), 1–34. https://doi.org/10.1145/3335081 prasoon, r., & chaturvedi, k.r. (2016). life satisfaction: a literature review. the researcher: international journal of management, humanities and social sciences, 1(02), 24–31. retrieved from https://researcher.galgotiapublications.com/researcher/article/download/9/9 pretorius, t.b. (1991). cross-cultural application of the center for epidemiological studies depression scale: a study of black south african students. psychological reports, 69(suppl 3), 1179–1185. https://doi.org/10.2466/pr0.1991.69.3f.1179 pretorius, t.b. (1993). the metric equivalence of the ucla loneliness scale for a sample of south african students. educational and psychological measurement, 53(1), 233–239. https://doi.org/10.1177/0013164493053001026 pretorius, t.b. (2022). the applicability of the ucla loneliness scale in south africa: factor structure and dimensionality. african journal of psychological assessment, 4, 8. https://doi.org/10.4102/ajopa.v4i0.63 pretorius, t., & padmanabhanunni, a. (2021). a looming mental health pandemic in the time of covid-19? role of fortitude in the interrelationship between loneliness, anxiety, and life satisfaction among young adults. south african journal of psychology, 51(2), 256–268. https://doi.org/10.1177/0081246321991030 racette, b.a., nelson, g., dlamini, w.w., hershey, t., prathibha, p., turner, j.r., … nielsen, s.s. (2021). depression and anxiety in a manganese-exposed community. neurotoxicology, 85, 222–233. https://doi.org/10.1016/j.neuro.2021.05.017 radloff, l.s. 
(1977). the ces-d scale: a self-report depression scale for research in the general population. applied psychological measurement, 1(3), 385–401. https://doi.org/10.1177/014662167700100306 r core team. (2017). r: a language and environment for statistical computing. r foundation for statistical computing. vienna. retrieved from https://www.r-project.org/ roothman, b., kirsten, d.k., & wissing, m.p. (2003). gender differences in aspects of psychological well-being. south african journal of psychology, 33(4), 212–218. https://doi.org/10.1177/008124630303300403 russell, d.w. (1996). ucla loneliness scale (version 3): reliability, validity, and factor structure. journal of personality assessment, 66(1), 20–40. https://doi.org/10.1207/s15327752jpa6601_2 schutte, l., negri, l., delle fave, a., & wissing, m.p. (2021). rasch analysis of the satisfaction with life scale across countries: findings from south africa and italy. current psychology, 40(10), 4908–4917. https://doi.org/10.1007/s12144-019-00424-5 seligman, m. (2010). flourish: positive psychology and positive interventions. the tanner lectures on human values, 31(4), 1–56. retrieved from http://www.isbm.at/pics/flourish_seligman.pdf seligman, m.e. (2002). authentic happiness: using the new positive psychology to realize your potential for lasting fulfillment. new york: simon and schuster. seligman, m.e., & csikszentmihalyi, m. (2014). positive psychology: an introduction. in csikszentmihalyi, m., & larson, r. (eds.), flow and the foundations of positive psychology (pp. 279–298). dordrecht: springer. sijtsma, k., & van der ark, l.a. (2017). a tutorial on how to do a mokken scale analysis on your test and questionnaire data. british journal of mathematical and statistical psychology, 70(1), 137–158. https://doi.org/10.1111/bmsp.12078 singh, s., zaki, r.a., farid, n.d.n., & kaur, k. (2021). 
reliability analysis of the malay version of the center for epidemiologic studies-depression scale (cesd) among adolescents in malaysia. preventive medicine reports, 24, 101585. https://doi.org/10.1016/j.pmedr.2021.101585 spielberger, c.d. (1988). state-trait anger expression inventory: revised research manual. odessa, fl: psychological assessment resources. stojanović, n.m., ranđelović, p.j., nikolić, g., stojiljković, n., ilić, s., stoiljković, b., … radulović, n.s. (2020). reliability and validity of the spielberger’s state-trait anxiety inventory (stai) in serbian university student and psychiatric non-psychotic outpatient populations. acta facultatis medicae naissensis, 37(2), 149–159. https://doi.org/10.5937/afmnai37-25011 sueki, h. (2020). relationship between beck hopelessness scale and suicidal ideation: a short-term longitudinal study. death studies, 46(2), 1–6. https://doi.org/10.1080/07481187.2020.1740833 taber, k.s. (2018). the use of cronbach’s alpha when developing and reporting research instruments in science education. research in science education, 48, 1273–1296. https://doi.org/10.1007/s11165-016-9602-2 tang, s., xiang, m., cheung, t., & xiang, y.t. (2021). mental health and its correlates among children and adolescents during covid-19 school closure: the importance of parent-child discussion. journal of affective disorders, 279, 353–360. https://doi.org/10.1016/j.jad.2020.10.016 tsai, m.c. (2009). market openness, transition economies and subjective wellbeing. journal of happiness studies, 10(5), 523–539. https://doi.org/10.1007/s10902-008-9107-4 van der ark, l.a. (2012). ‘new developments in mokken scale analysis in r’. journal of statistical software, 48(5), 1–27. https://doi.org/10.18637/jss.v048.i05 van loon, a.j.m., tijhuis, m., surtees, p.g., & ormel, j. (2001). personality and coping: their relationship with lifestyle risk factors for cancer. personality and individual differences, 31(4), 541–553. 
https://doi.org/10.1016/s0191-8869(00)00158-6 westaway, m.s., maritz, c., & golele, n.j. (2003). empirical testing of the satisfaction with life scale: a south african pilot study. psychological reports, 92(2), 551–554. https://doi.org/10.2466/pr0.2003.92.2.551 wind, s.a. (2017). an instructional module on mokken scale analysis. educational measurement: issues and practice, 36(2), 50–66. https://doi.org/10.1111/emip.12153 zeas-sigüenza, a., oliveira, s., ferreira, c., ganho, a., & ruisoto, p. (2021). psychometric properties and factor structure of the ucla loneliness scale v3: the european portuguese version. psyarxiv. retrieved from https://psyarxiv.com/eyxvn/download/?format=pdf footnote 1. the jovanović and brdar (2018) study reported scaled mean scores. about the author(s) david s. semmelink department of psychology, faculty of humanities, university of pretoria, pretoria, south africa david j.f. maree department of psychology, faculty of humanities, university of pretoria, pretoria, south africa citation semmelink, d.s., & maree, d.j.f. (2023). a rasch analysis of the high potential trait indicator: a south african sample. african journal of psychological assessment, 5(0), a115. https://doi.org/10.4102/ajopa.v5i0.115 original research a rasch analysis of the high potential trait indicator: a south african sample david s. semmelink, david j.f. maree received: 19 aug. 2022; accepted: 01 dec. 2022; published: 08 feb. 2023 copyright: © 2023. the author(s). licensee: aosis. this is an open access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
abstract the reliability and validity of the six traits comprising the high potential trait indicator (hpti) were evaluated using rasch analysis. focus was designated to the unidimensionality and local independence of each subscale; fit to the rasch model; person reliability and separation; and differential item functioning (dif). secondary data, obtained from intellectual property rights holder thomas international, were used for analysis with a sample of 1257 south african respondents. one of the six traits, curiosity (0.73), was found to be reliable. traits adjustment (0.69) and competitiveness (0.69) border on the accepted cut-off of 0.70. risk approach (0.64) obtained the lowest reliability, closely followed by conscientiousness (0.65) and ambiguity acceptance (0.65). six of the 78 hpti items did not fit the rasch model, all of which underfit the model. trait curiosity was found not to be unidimensional, while the ambiguity acceptance scale approached the value at which a scale is considered multidimensional. one item was identified to be threatening the unidimensionality of the curiosity scale based on both the factor loadings of the principal components analysis of the residuals and underfitting the rasch model. the differential item functioning (dif) analysis found no item bias between genders, female and male. eleven items displayed dif across ethnicities and home language groups. the most severe instance of dif occurred in trait competitiveness, yet it had only one item experiencing dif. trait conscientiousness, however, contained four items experiencing various severities of dif. contribution: this study highlighted the shortcomings of the current hpti in the south african context through rasch analysis. the findings illustrate the difficult nature of creating ideal personality instruments in the south african context, thus contributing to the body of knowledge of personality assessments in south africa. 
keywords: psychometric properties; high potential trait indicator (hpti); rasch model fit; person reliability; differential item functioning. introduction the use of psychometric testing in decision-making is commonplace in various sectors. sectors include education, human resources, coaching, forensics, counselling, medical and clinical applications and economic and financial sectors (arráiz et al., 2016; bichi, 2016; coaley, 2010; foxcroft & roodt, 2018). psychometric tools are therefore important in various settings globally, as they provide a measurement of psychological constructs not easily observed (foxcroft & roodt, 2018). it is also a requirement in south africa that psychological assessments used in areas of employment show scientific evidence that they are valid and reliable, can be applied fairly and show no bias towards groups (employment equity act no. 55 of 1998, government gazette, 2014). the high potential trait indicator (hpti; macrae & furnham, 2016) is one such psychometric assessment used in south africa, and it therefore needs to comply with employment equity (ee) requirements. the high potential trait indicator the hpti is a self-reporting six-trait personality-based questionnaire with a seven-point likert-type scale. it was developed in the united kingdom to identify high performers (macrae & furnham, 2016; 2020) and has since been used globally. the six traits are: conscientiousness, adjustment, curiosity, risk approach (also known as courage), ambiguity acceptance and competitiveness (macrae & furnham, 2016; 2020). the instrument comprises 78 items, 13 per trait, which respondents are required to rate from strongly disagree (1) to strongly agree (7) per item. in its initial development, the six hpti scales achieved sufficient internal consistency reliability, with alpha coefficients above 0.70 (macrae & furnham, 2020). the initial sample consisted of 779 working professionals across 25 countries (macrae & furnham, 2020). 
regarding structural validity, macrae and furnham (2020) reported structural equation modelling statistics. the comparative fit indices (cfis) ranged from 0.727 (curiosity) to 0.876 (ambiguity acceptance). root mean square error of approximation (rmsea) indices ranged from 0.062 (ambiguity acceptance) to 0.109 (curiosity). standardised root mean square residual (srmr) indices ranged from 0.047 (ambiguity acceptance) to 0.078 (curiosity). the hpti also significantly correlated with aspects of other assessments such as the hogan development survey (hds; hogan & hogan, 2009), neo personality inventory form s (neo-pi-r; mccrae & costa, 1985) and trait emotional intelligence questionnaire (teique; petrides, 2009) and demonstrated sufficient predictive validity (macrae & furnham, 2016, 2020). definition of the six traits according to macrae and furnham (2020), conscientiousness is a higher-order personality trait in the five factor model. it comprises industriousness, self-control, responsibility, order, traditionalism and virtue. this trait has been found to have a moderate correlation with job success and other job metrics (barrick et al., 2001; macrae & furnham, 2016). adjustment is described by macrae and furnham (2020) as emotional resilience to stress and positive affect and is the inverse of trait neuroticism in the five factor model. higher levels of adjustment were found to be associated with better teamwork and higher performance, while lower levels were associated with low job satisfaction and subjective well-being (judge & locke, 1992; macrae & furnham, 2020). curiosity is synonymous with the openness trait of the five factor model. the trait is characterised as being open to new ideas and experiences, as well as being creative, reflective and innovative (macrae & furnham, 2020). curiosity and openness were associated with job satisfaction, trainability and learning outcomes (barrick et al., 2001; judge et al., 1999; linden et al., 2010; macrae & furnham, 2020). 
risk approach is defined as how an individual handles challenging, difficult or threatening situations (macrae & furnham, 2016). it is the mitigation of negative, threat-based emotions that cause a strong drive to avoid that situation, restricting the potential range of responses to avoidance (macrae & furnham, 2020). ambiguity acceptance is a measure of how an individual perceives and processes unfamiliarity and that which is not clear (macrae & furnham, 2020). herman et al. (2010) suggest that tolerance for ambiguity involves unfamiliarity, change, challenging perspectives and valuing diversity. high-fliers and senior leadership are thought to require a tolerance for and adaption to ambiguity because of the need to make sense of and incorporate multiple streams of mixed information to make effective decisions (keenan & mcbain, 1979; macrae & furnham, 2020; mccall, 1997). finally, macrae and furnham (2020) describe competitiveness as a dimension that drives self-improvement and the desire for success. in a study of sales performance, wang and netemeyer (2002) found competitiveness to be a significant predictor of performance. rasch measurement rasch measurement theory (rmt; rasch, 1960) is mathematically identical to the one-parameter logistic model (1pl) level of item response theory (irt). rasch, therefore, is synonymous with 1pl irt (finch et al., 2016). however, the two theories developed in separate areas of the world around the same time (lord & novick, 1968; rasch, 1960). a key difference between the two psychometric paradigms is that irt requires the model to fit the data, whereas rmt prescribes a model which the data must fit (petrillo et al., 2015). boone et al. (2014, p. 220) provide an adequate summary of the function of the rasch model, stating that ‘the rasch model is a definition of measurement. if persons and items do not fit the model, then those items and persons are not contributing to useful measurement’. 
wright and mok (2004) maintain that the rasch model is the only model to satisfy the five requirements of a measurement model. these are that the measurement model must: (a) produce linear measures, (b) overcome missing data, (c) give estimates of precision, (d) have devices for detecting misfitting items/persons, and (e) keep the parameters of the object being measured and of the measurement instrument separable (wright & mok, 2004, p. 4). to satisfy the requirements of measurement from a rasch perspective and psychometric evaluation, the analyses, then, are to evaluate the person reliability, fit to the rasch model and differential item functioning (dif) (bond et al., 2020). assumptions about the latent traits exist for rasch analyses (fan & bond, 2019). these are that the scales are unidimensional (measure one dimension) and that the items are locally independent (responses to an item do not rely on the response to another item). tests for unidimensionality and local independence are therefore also necessary in conducting rasch analyses (fan & bond, 2019). fit statistics rasch analysis provides two fit statistics for persons and items: infit and outfit statistics. winsteps (linacre, 2020a) reports each of these in two forms, namely mean-square (mnsq), which is an average value of squared residuals, and z-standardised (zstd), which is a t-statistic (boone et al., 2014; bond et al., 2020). a reasonable fit statistic range for rating scales, such as likert-type scales, contains mnsq values between 0.6 and 1.4 (wright & linacre, 1994). mnsq values less than 0.6 with a zstd less than –2.0 are indicative of an overfitting item and can be interpreted as being more than 40% (1.0 – 0.6 = 0.4) less varied than the rasch model expects (bond et al., 2020). a mnsq value greater than 1.4 with a zstd greater than 2.0 is indicative of an underfitting item and can be interpreted as being more than 40% more varied than the rasch model expects (bond et al., 2020). 
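the fit cut-offs described above can be expressed as a small decision rule. the following is a minimal python sketch, not the winsteps procedure itself: the function name `classify_item_fit` and the example values are hypothetical, and the sketch assumes that overfit is flagged by a zstd below –2.0 (a mean-square below 1 standardises to a negative value) and underfit by a zstd above 2.0.

```python
def classify_item_fit(mnsq: float, zstd: float,
                      low: float = 0.6, high: float = 1.4,
                      z_crit: float = 2.0) -> str:
    """Label an item 'underfit', 'overfit' or 'fit'.

    Assumed rule (hypothetical helper, not a Winsteps API):
      underfit: mean-square above `high` and ZSTD above  z_crit
      overfit:  mean-square below `low`  and ZSTD below -z_crit
    """
    if mnsq > high and zstd > z_crit:
        return "underfit"   # more varied than the model expects
    if mnsq < low and zstd < -z_crit:
        return "overfit"    # less varied than the model expects
    return "fit"


# Hypothetical (mnsq, zstd) pairs, for illustration only.
examples = {"item_a": (1.48, 9.90), "item_b": (0.55, -3.10), "item_c": (1.05, 0.40)}
labels = {name: classify_item_fit(m, z) for name, (m, z) in examples.items()}
print(labels)
```

both conditions must hold for an item to be flagged, so a large mean-square with a non-significant zstd is treated as acceptable under this rule.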
according to linacre (2020a), outfit is outlier sensitive, and high outfit values tend to be the result of random responses from lower performers. infit, on the other hand, is information weighted and therefore less influenced by outliers. high infit values are an indication of the items mis-performing and are a greater threat to validity. linacre (2020a) then suggests that outfit be examined before infit. however, bond et al. (2020) indicate that deviant infit statistics are more concerning than deviant outfit statistics. therefore, while outfit will be reported, infit statistics will be the focus, as this statistic is a greater concern to the validity of the scale. person reliability and separation the reliability of an instrument is its degree of consistency at measuring what it purports to measure (roodt & de kock, 2018). a typical measure of a questionnaire’s reliability is internal consistency reliability, traditionally evaluated by cronbach’s (1951) alpha coefficient. in rasch measurement, internal consistency reliability is reported as two metrics: person reliability and person separation (bond et al., 2020). rasch reliability statistics indicate the reproducibility of the person ordering or placement (wright & masters, 1982). that is, if the same group of respondents took an equivalent test, would they be placed in a similar order based on their measures? boone et al. (2014) indicate that the person reliability statistics produced by winsteps are interpreted similarly to traditional reliability indices. therefore, using conventional guidelines, a person reliability index of 0.7 or higher is considered acceptable (devellis, 2017; kaplan & saccuzzo, 2017; macrae & furnham, 2016; nunnally, 1978; pallant, 2020; yang & green, 2011). person separation, on the other hand, is the spread of the respondents on that measure (bond et al., 2020). 
a higher separation statistic indicates a larger spread of person measures, with an index of 1.50 regarded as acceptable and 2.00 and 3.00 as good and excellent, respectively (wright & masters, 1982). unidimensionality and local independence unidimensionality and local independence are two interrelated conditions required for rasch measurement (fan & bond, 2019; heffernan et al., 2019). unidimensionality refers to the measurement of a single construct (or latent trait or dimension). for example, the trait extraversion can be considered a single construct. a scale measuring extraversion alone is therefore unidimensional. a scale that measures extraversion and anxiety, then, is multidimensional (measuring more than one dimension). in rasch measurement, the requirement, then, is that each latent trait be measured one at a time (fan & bond, 2019), so that each scale is unidimensional. perfect unidimensionality, however, is not a realistic expectation. instruments are then required to have a close approximation to unidimensionality. the unidimensionality of an instrument is estimated through a principal component analysis of the residuals (pcar) (fan & bond, 2019), available in software such as winsteps (linacre, 2020b). from the pcar, contrasts with eigenvalues at or greater than 2 indicate the possibility of the scale possessing more than one dimension, whereas contrasts with eigenvalues less than 2 are regarded as insignificant (bond et al., 2020). furthermore, items underfitting the rasch model provide additional concerns to the threat of unidimensionality (fan & bond, 2019). local independence is the condition in which an individual’s response to an item is not affected by their responses to any other items (fan & bond, 2019). for example, an item regarding reading several times a week would affect, or be affected by, an item regarding reading once a week. 
in such a case, the items are dependent rather than independent and therefore violate the condition of local independence (fan & bond, 2019). however, like unidimensionality, it is unrealistic to expect perfect independence. an estimate of the correlation between item residuals is therefore required to determine whether there are items that are significantly dependent on each other (fan & bond, 2019; linacre, 2020a). according to linacre (2020a), positive correlations of 0.7 are the beginning of concern for dependency. furthermore, items overfitting the rasch model provide additional concerns to the threat of local independence (fan & bond, 2019). differential item functioning differential item functioning is an evaluation of how congruently the items of a measure define a construct between certain groups (boone et al., 2014). in winsteps (linacre, 2020b), average measures of the relevant groups (e.g. male and female groups) are presented in logits and are compared against each other. boone et al. (2014) indicate that dif may be present in comparisons with a significant p-value (p < 0.05) of the rasch–welch statistic. linacre (2020a) adds that, in addition to statistical significance between groups, an effect size ≥ 0.64 is considered moderate to large, whereas an effect size between 0.43 and 0.64 is considered slight to moderate. an effect size below 0.43 is considered negligible and insufficient to flag items as having dif present. a dif effect size in a rasch analysis is provided in winsteps as ‘dif contrast’ (linacre, 2020a). methods sample table 1 provides a breakdown of the sample. the sample consisted of 1257 respondents who completed the hpti through the south african subsidiary of thomas international and had agreed to participate in further research. slightly more than half of the sample were female (n = 684, 54.4%) and the rest male (n = 573, 45.6%). table 1: demographic statistics of the sample. regarding ethnic background, slightly less than half reported as white (n = 577, 45.9%). 
less than a third reported as black (n = 380, 30.2%), followed by mixed race (n = 184, 14.6%), then asian and indian (n = 106, 8.4%). ten (0.8%) respondents reported other. when indicating their home language, 558 (44.4%) reported english and 370 (29.4%) indicated afrikaans as their home language. isixhosa was the next most frequently reported home language with 76 (6.0%) respondents, followed by isizulu (n = 59, 4.7%), setswana (n = 49, 3.9%), sesotho (n = 48, 3.8%), sepedi (n = 47, 3.7%), tshivenda (n = 20, 1.6%), xitsonga (n = 16, 1.3%), siswati (n = 9, 0.7%), french (n = 3, 0.2%) and isindebele (n = 2, 0.2%). nearly half of the respondents reported gauteng (n = 586, 46.6%) as their residential province. the western cape (n = 313, 24.9%) was the next most frequently reported residential province, followed by the free state (n = 123, 9.8%), kwazulu-natal (n = 112, 8.9%), the eastern cape (n = 80, 6.4%), mpumalanga (n = 15, 1.2%), limpopo (n = 13, 1.0%), north west (n = 10, 0.8%) and the northern cape (n = 5, 0.4%). table 2 contains the median, mode, oldest, youngest and range of birth years of the sample. the youngest respondent was born in 1999 and the oldest in 1945. the most commonly occurring year of birth was 1985. table 2: descriptive statistics of the years of birth of respondents. data collection secondary data were obtained from thomas international ltd – the intellectual property right holder of the hpti. the dataset includes the raw data, with scores from strongly disagree (1) to strongly agree (7) of 1257 individuals. the participants completed the hpti for various purposes, including third-party recruitment and research conducted by thomas international ltd. only data of respondents who completed the hpti through the south african division of the organisation and had indicated their voluntary participation in further research were obtained. negatively phrased items were reverse scored. 
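reverse scoring on a seven-point scale maps 1 to 7, 2 to 6 and so on, leaving the midpoint 4 unchanged. a minimal python sketch of that step follows; the function name, the item labels and the choice of which item is negatively phrased are hypothetical, since the text does not list the reversed hpti items.

```python
def reverse_score(response: int, scale_min: int = 1, scale_max: int = 7) -> int:
    """Reverse a Likert-type response so that high and low ends swap."""
    if not scale_min <= response <= scale_max:
        raise ValueError(f"response {response} outside {scale_min}-{scale_max}")
    return scale_min + scale_max - response


# Hypothetical respondent record; 'item_02' is assumed to be negatively phrased.
responses = {"item_01": 6, "item_02": 2, "item_03": 4}
negatively_phrased = {"item_02"}

scored = {
    item: reverse_score(value) if item in negatively_phrased else value
    for item, value in responses.items()
}
print(scored)
```

after this step, a higher score on every item points in the same direction of the trait, which is what the rating scale analysis requires.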
data analysis the primary data analyses were conducted in winsteps (linacre, 2020b) using the rasch rating scale model. the descriptive statistics were constructed in microsoft excel (microsoft corporation, redmond, washington, united states) and winsteps (linacre, 2020b). each of the six traits was analysed on person fit; descriptive statistics; item reliability and separation; item fit; unidimensionality and local independence; and dif. descriptive statistics descriptive statistics were examined using microsoft excel to outline the trends in the demographics of the sample (table 1 and table 2), and responses of the sample (table 3). the scale statistics (table 3) were calculated from the person measures obtained from winsteps. table 3: descriptive statistics of person measure scores and reliability indices of each high potential trait indicator trait. person reliability and separation the person reliability indices of each trait were evaluated against the criteria that a person reliability of at least 0.70 and a separation of at least 1.50 are regarded as sufficiently reliable. item fit misfitting items in the item fit analysis for each hpti trait were detected and labelled as either underfitting or overfitting the model based on the infit statistics: mean-square (mnsq) ≤ 0.60 or mnsq ≥ 1.40, together with |z-standardised (zstd)| ≥ 2. unidimensionality and local independence the unidimensionality of each hpti scale was evaluated through pcar in winsteps (linacre, 2020b). scales demonstrating contrasts with eigenvalues ≥ 2 are considered to be in violation of unidimensionality (fan & bond, 2019). winsteps (linacre, 2020b) provides the eigenvalues of more than one contrast. the eigenvalue of the first contrast is provided for each trait (see table 4, column ‘1c load’). table 4: item statistics and fit status. evidence for local independence of the items of each hpti scale was based on correlations between item residuals. 
from these estimates, positive correlations between items of 0.7 or higher are considered to be in violation of local independence (fan & bond, 2019). differential item functioning the dif was analysed across the items of each hpti trait and between the relevant subgroups by means of a significant difference in the rasch–welch statistic and a sufficiently large dif contrast. a p-value less than 0.05 in winsteps indicates a significant difference, and an effect size of at least 0.43, as indicated by dif contrast in winsteps, is considered large enough. the subgroups for the dif analysis are gender, ethnicity and language. however, because of the differences in sample sizes between the european languages (english and afrikaans) and african languages (sesotho, isixhosa, isizulu, setswana, sepedi, tshivenda, xitsonga, isindebele and siswati), the african languages were collapsed to form the group ‘african languages’. the resultant language groups are thus african languages (n = 326, 25.9%), afrikaans (n = 370, 29.4%) and english (n = 558, 44.4%). this is not intended to minimise the differences between the african languages, which may warrant further investigation with larger samples in the individual african languages. ethical considerations the secondary data obtained from thomas international ltd (thomas.co) contained the anonymised responses of individuals who completed the hpti and indicated their voluntary participation in further research. respondents were presented with the opportunity to indicate their voluntary participation in further research after completing the hpti. ethical approval was obtained from the psychology research and ethics committee at the university of pretoria (reference number: hum037/0720). results reliability the reliability indices ranged from adequate to inadequate (see table 3). curiosity obtained the highest reliability indices with a person reliability of 0.73 and separation of 1.62. 
risk approach (0.64, 1.33) had the lowest reliability indices, followed closely by ambiguity acceptance (0.65, 1.35), conscientiousness (0.65, 1.37), then competitiveness (0.69, 1.48) and adjustment (0.69, 1.49). item fit outfit and infit statistics were evaluated for the items of the hpti traits, with precedence given to infit. the infit and outfit statistics of the items can be viewed in table 4. conscientiousness contained two misfitting items: cn01 and cn10 underfit the model (in.mnsq = 1.42 and 2.02, in.zstd = 5.32 and 9.90, respectively). adjustment had one misfitting item, where aj06 underfit the model (in.mnsq = 1.42, in.zstd = 7.19). risk approach contained one underfitting item: ra13 (in.mnsq = 1.44, in.zstd = 9.37). ambiguity acceptance had two misfitting items: aa01 (in.mnsq = 1.48, in.zstd = 9.90) and aa13 (in.mnsq = 1.39, in.zstd = 8.88) underfit the model. unidimensionality and local independence the unidimensionality of each scale was examined through pcar. the item loadings of the first contrast can be seen in table 4 as ‘1c load.’. the results revealed that curiosity had the highest first contrast eigenvalue (λ = 2.24), followed by ambiguity acceptance (λ = 1.87), adjustment (λ = 1.75), conscientiousness (λ = 1.70), risk approach (λ = 1.70) and competitiveness (λ = 1.61). the largest standardised residual correlations were analysed to evaluate the local independence of the items of the scale. no item pairs were found to be above the correlation of 0.70, indicating that none of the items of the scales are in violation of local independence. differential item functioning gender the analysis of dif on gender revealed no items of concern across all hpti scales. while statistically significant differences were detected, the significant items are not described because the dif effect sizes of these items were negligible, indicating no practical significance. table 5 provides the items with significant p-values for dif between gender groups. 
table 5: the dif between gender groups. ethnicity the presence of dif was evaluated between ethnicities. table 6 displays the items across the traits with statistically significant and sufficiently large dif effect sizes. other instances of statistically significant differences were present; however, upon the evaluation of their dif contrasts, their effect sizes were found to be negligible and were not included. the adjustment scale had no instances exhibiting dif between ethnicities. table 6: differential item functioning between ethnicity groups. four instances where dif may be present in the conscientiousness scale were identified. these instances spanned across two items: cn04 and cn08. item cn04 had the most instances in which dif may be present between ethnicities, with three of the four occurrences. item cn04 also had the greatest effect size of the four instances (dif contrast = –0.63) between the black african and white groups. nine instances of possible dif were identified in the curiosity scale. instances occurred in items cu02, cu04, cu06 and cu13. item cu04 had the highest number of dif instances and the instance with the largest effect size (dif contrast = –0.65) between the black african and mixed-race groups, followed closely by the instance between the black african and white groups (dif contrast = –0.61). two dif instances were revealed across one risk approach item, ra13. the largest instance, in terms of effect size, was between the mixed race and white ethnic groups with a slight to moderate effect (dif contrast = –0.51). ambiguity acceptance had four instances identified across two items. item aa02 had the largest effect size (dif contrast = 0.62) between the black african and white groups. competitiveness had four instances found across one item. item cm02 had the largest effect size of all hpti items (dif contrast = –0.85) between the black african and white groups. 
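the two-part flagging rule used for dif (a significant rasch–welch p-value plus a sufficiently large dif contrast) can be sketched in python as follows. this is an illustrative helper, not winsteps output: the function name is hypothetical, and the p-value of 0.01 is an assumed placeholder because the p-values are reported in table 6 rather than in the text. the dif contrasts are those quoted above for cm02, cn04 and aa02.

```python
def classify_dif(dif_contrast: float, p_value: float, alpha: float = 0.05) -> str:
    """Classify a DIF instance from its contrast (in logits) and p-value.

    Thresholds follow the cut-offs quoted from Linacre (2020a) in the text:
    |contrast| >= 0.64 moderate to large, 0.43 to 0.64 slight to moderate,
    below 0.43 negligible. Non-significant comparisons are never flagged.
    """
    if p_value >= alpha:
        return "not significant"
    size = abs(dif_contrast)  # the sign only shows which group is favoured
    if size >= 0.64:
        return "moderate to large"
    if size >= 0.43:
        return "slight to moderate"
    return "negligible"


# DIF contrasts quoted in the text; p = 0.01 is an assumed placeholder.
reported = {"cm02": -0.85, "cn04": -0.63, "aa02": 0.62}
flags = {item: classify_dif(contrast, p_value=0.01) for item, contrast in reported.items()}
print(flags)
```

taking the absolute value reflects that the sign of a dif contrast only indicates the direction of the difference between the two groups, not its size.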
first-language groups the potential presence of dif was then evaluated between first-language groups. table 7 displays the items across the traits with statistically significant and sufficiently large effect sizes. other instances of statistically significant differences were present; however, upon the evaluation of their dif contrasts their effect sizes were found to be negligible and not included. adjustment and risk approach were found not to have items experiencing dif. table 7: differential item functioning between language groups. three instances of dif were detected across two conscientiousness items, cn02 and cn04, of which the instance with the largest effect size occurred in item cn04 between the african languages and afrikaans groups with slight to moderate effect (dif contrast = 0.50). curiosity had five instances identified, all of which were slight to moderate and between the african languages group and either english or afrikaans groups. the largest effect size was found to be in item cu13 between the english-speaking group and african language–speaking group (dif = –0.57). one item had been identified with two instances of potential dif in the ambiguity acceptance scale. item aa02 had a slight to moderate effect size between the afrikaans and african languages groups (dif contrast = –0.52) and english and african languages groups (dif contrast = –0.57). the competitiveness trait also had two instances across one item. item cm02 had a moderate to large effect size between the afrikaans and african languages groups (dif contrast = 0.71) and a slight to moderate effect size between the english and african languages groups (dif contrast = 0.53). discussion reliability this study set out to evaluate the psychometric properties of the hpti, a personality assessment. the psychometric properties were evaluated through rasch analysis, namely person reliability and separation, fit to the rasch model and the rasch version of dif. 
the reliability indices of five of the six hpti scales would not be considered reliable against the widely accepted minimum standards. the five scales are adjustment (0.69), competitiveness (0.69), conscientiousness (0.65), ambiguity acceptance (0.65) and risk approach (0.64), of which adjustment and competitiveness bordered on the minimum value required to be regarded as being reliable. when evaluating other personality-based psychological assessments in the south african context, de bruin et al.’s (2022) evaluation of the basic traits inventory revealed reliability indices (cronbach’s alpha) ranging from 0.87 (openness) to 0.94 (conscientiousness) in the adult sample. similarly, the myers–briggs type indicator® (mbti®; myers et al., 1998) obtained high alphas ranging from 0.88 (both sensing–intuition and thinking–feeling) to 0.91 (both extraversion–introversion and judging–perceiving). the rasch person reliability of the judging–perceiving dichotomy was 0.83, with the rest being 0.84 (van zyl & taylor, 2012). the south african personality inventory (sapi; fetvadjiev et al., 2015) achieved mean alphas from 0.71 (social relation – negative) to 0.81 (social relation – positive), although subscales had alphas as low as 0.61 (deceitfulness). when re-evaluating the sapi, morton et al. (2018) obtained alphas ranging from 0.61 (neuroticism) to 0.88 (social relation – positive). hill et al. (2021) evaluated the psychometric properties of the tshivenda and southern sotho versions of the sapi. the tshivenda version obtained mean alphas ranging from 0.61 (extraversion) to 0.72 (social relation – positive). the southern sotho version obtained alphas ranging from 0.50 (extraversion) to 0.77 (social relation – negative). boshoff and laher (2015) reviewed the utility of the neo-pi-3 (mccrae et al., 2005) in the south african context. they found reliability coefficients ranging from 0.61 (agreeableness) to 0.79 (extraversion) for the domains of the neo-pi-3. 
thus, the reliability findings of the hpti subscales are neither irregular nor the worst in the south african context but can certainly be improved upon. person reliability, according to linacre (2020a), is largely dependent on the dispersion of the characteristics of the sample – in other words, a sample with varying degrees of the trait being measured – the length of the instrument, the number of response options per item and the targeting of the sample and items. some traits’ reliability indices could potentially be impacted by the inadequate targeting, evident in the large differences between the average person scores in traits conscientiousness, adjustment, curiosity and risk approach and the constrained item measure average of zero. on the matter of targeting, boone et al. (2014) recommend the revision of the difficulty of the items accordingly, making them either more or less difficult to endorse. on the other hand, the scales that appear well targeted – ambiguity acceptance and competitiveness – may have their respective reliability indices impacted by the inability to adequately separate the higher scorers on the trait from those who have lower person measures of that trait, resulting in a lower person separation index and therefore reliability index. given the results, it may be difficult to defend the reliability of the hpti subscales in the south african context, with curiosity being the most defensible. item fit item fit statistics evaluated how well the items of each hpti trait conformed to the rasch model. fit mean-squared statistics greater than 1.40 indicate that the item is 40% less predictable (more varied) than the model. the same statistic under 0.60 indicates that the item is 40% more predictable (less varied) than the model expects (bond et al., 2020). 
six of the 78 hpti items (7.7%) underfit the rasch model: conscientiousness and ambiguity acceptance with two (15.4%) items each, adjustment and risk approach with one (7.7%) item each, while curiosity and competitiveness had no items underfitting the model. curiosity, however, had one item that bordered on the underfitting criterion of 1.40, item cu13. in contrast, the evaluation of the mbti® form m in the south african context had no items across all four dichotomous dimensions overfitting or underfitting the model at the criteria employed in this study (van zyl & taylor, 2012). according to tennant and conaghan (2007), however, fit to the rasch model can be influenced by item bias such as dif. differential item functioning the dif is an evaluation of how congruently the items of a measure define a construct between certain groups (boone, et al., 2014). it is especially important in cross-cultural settings (tennant et al., 2004). across all scales, while statistically significant dif was found between gender groups, the findings were not practically significant, as measured by dif contrast (linacre, 2020a). this suggests that the hpti scales do not contain items with definitions that are interpreted differently between men and women. therefore, none of the items of the hpti could be considered biased towards either men or women. in contrast, van zyl and taylor (2012) found 12 (13%) of the 93 items with a dif contrast above 0.43 (slight to moderate effect; linacre, 2020a) when evaluating dif between gender, two (2%) of which were above 0.64 (moderate to large effect; linacre, 2020a). apart from the adjustment scale, dif was discovered in several items in the ethnic groups and first-language groups comparisons. the severity varied across the scales. 
between ethnic or racial groups, the largest and second largest exhibitions of dif occurred in item cm02 in trait competitiveness between the black group and white group (–0.85) and the asian and indian group and mixed-race group (-0.73). most instances of dif involved the black african group. similarly, most instances of dif between language groups involved the african languages group. the dif between black and white ethnic groups was also found in van zyl and taylor (2012) and between african languages and both english and afrikaans in grobler and de beer’s (2015) evaluation of the basic traits inventory. the findings of the item-level bias are not dissimilar to historic findings of personality questionnaires in the south african context, in which bias between african and european ethnic groups and language groups is usually found and recommended for further investigation (abrahams & mauer, 1999; spence, 1982; taylor & boeyens, 1991). it may therefore be required that the relevant hpti items be re-examined in the south african context to reduce the item-level bias. limitations this study is not without limitations. firstly, with respect to the methodology, is the use of secondary data obtained from an organisation whose use in psychometric tools is largely in the recruitment sector (thomas.co, n.d.). this falls into the disadvantage of secondary data research expressed by boslaugh (2007), in which secondary data are often collected for purposes other than that of the research question using the secondary data. secondly, the limitation inherent in self-reporting personality assessments, especially used in decision-making, is one in which respondents may respond in a way that distorts or misrepresents them (coaley, 2010). it is therefore not unimaginable to have obtained data with some responses skewed towards the purposes the hpti was originally administered for, such as applications to employment. 
to address these two limitations, further research is encouraged in which respondents are randomly selected, the administration is standardised and the purpose of completing the assessment is exclusively for research and not for, say, the application process for employment. conclusion the study aimed to evaluate the psychometric properties of the six personality subscales of the hpti through rasch analysis and in the south african context. the core properties in question were reliability, fit to the rasch model and dif. the results indicate that while some subscales have some redeeming qualities, all subscales have their shortcomings and could be improved on for use in the south african environment. similar shortcomings have been acknowledged historically and found in more recent evaluations of personality instruments in south africa. two other personality instruments, using similar evaluation techniques, achieved high reliability and good fit to the rasch model but still experienced dif in certain items between either ethnicity or home language (grobler & de beer, 2015; van zyl & taylor, 2012). this illustrates the difficult, but not impossible, nature of creating an ideal personality instrument in the south african context, thus contributing to the wider body of knowledge of personality assessments in south africa, while simultaneously recommending further improvements upon an instrument used widely in the country. acknowledgements the authors would like to acknowledge and appreciate stephen cuppello at thomas international ltd for granting permission to use the organisation’s data, as well as providing the data for the authors to utilise in this study. competing interests the authors declare that, at the time of submission, d.s.s. was employed by the intellectual property rights holders, thomas international ltd. authors’ contributions d.s.s. 
was responsible for conceptualisation, methodology, formal analysis, data curation, resources and writing the original draft. d.j.f.m. was the supervisor of the study and contributed toward the conceptualisation and the reviewing and editing of the manuscript. the authors have declared that, with the tutelage and reviewership of d.j.f.m., the primary contribution of this article is credited to d.s.s. funding information this research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors. data availability the data that support the findings of this study are available from the corresponding author, upon reasonable request. disclaimer the views and opinions expressed in this article are those of the authors and do not necessarily reflect the official policy or position of any affiliated agency of the authors. references abrahams, f., & mauer, k.f. (1999). qualitative and statistical impacts of home language on responses to the items of the sixteen personality factor questionnaire (16pf) in south africa. south african journal of psychology, 29(2), 76–86. https://doi.org/10.1177/008124639902900204 arráiz, i., bruhn, m., & stucchi, r. (2016). psychometrics as a tool to improve credit information. the world bank economic review, 30(1), 67–76. https://doi.org/10.1093/wber/lhw016 barrick, m.r., mount, m.k., & judge, t.a. (2001). personality and performance at the beginning of the new millennium: what do we know and where do we go next? international journal of selection and assessment, 9(1), 9–30. https://doi.org/10.1111/1468-2389.00160 bichi, a.a. (2016). classical test theory: an introduction to linear modelling approach to test and item analysis. international journal for social studies, 2(9), 27–33. retrieved from https://www.researchgate.net/publication/317012320 bond, t.g., yi, z., & heene, m. (2020). applying the rasch model: fundamental measurement in the human sciences (4th ed.). routledge.
boone, w.j., staver, j.r., & yale, m.s. (2014). rasch analysis in the human sciences. springer. https://doi.org/10.1007/978-94-007-6857-4 boshoff, e., & laher, s. (2015). the utility of the neo-pi-3 in a sample of south african adolescents. new voices in psychology, 11(2), 16–35. https://doi.org/10.25159/1812-6371/1739 boslaugh, s. (2007). secondary data sources for public health: a practical guide. cambridge. https://doi.org/10.1017/cbo9780511618802 coaley, k. (2010). an introduction to psychological assessment and psychometrics. sage. cronbach, l.j. (1951). coefficient alpha and the internal structure of tests. psychometrika, 16, 297–334. de bruin, g.p., taylor, n., & zanfirescu, s.a. (2022). measuring the big five personality factors in south african adolescents: psychometric properties of the basic traits inventory. african journal of psychological assessment, 4, art. 85. https://doi.org/10.4102/ajopa.v4i0.85 devellis, r.f. (2017). scale development: theory and applications (4th ed.). sage. fan, j., & bond, t. (2019). applying the rasch measurement model in language assessment: unidimensionality and local independence. in v. aryadoust & m. raquel (eds.), quantitative data analysis for language assessment volume 1: fundamental techniques (pp. 81–176). routledge. fetvadjiev, v.h., meiring, d., van de vijver, f.j.r., nel, j.a., & hill, c. (2015). the south african personality inventory (sapi): a culture-informed instrument for the country’s main ethnocultural groups. psychological assessment, 27(3), 827–837. https://doi.org/10.1037/pas0000078 finch, w.h., immekus, j.c., & french, b.f. (2016). applied psychometrics using spss and amos. information age publishing. foxcroft, c., & roodt, g. (2018). an overview of assessment: definition and scope. in c. foxcroft & g. roodt (eds.), an introduction to psychological assessment in the south african context (5th ed., pp. 3–11). oxford university press. government gazette. (1998). republic of south africa, vol. 400, no.
19370. government gazette. (2014). employment equity act (55/1998): as amended: draft employment equity regulations, 2014. regulation gazette, 10127(584), pretoria, 28 february 2014, no. 37338 labour department of government notice r. 124. grobler, s., & de beer, m. (2015). psychometric evaluation of the basic traits inventory in the multilingual south african environment. journal of psychology in africa, 25(1), 50–55. https://doi.org/10.1080/14330237.2014.997033 heffernan, e., maidment, d.w., barry, j.g., & ferguson, m.a. (2019). refinement and validation of the social participation restrictions questionnaire: an application of rasch analysis and traditional psychometric analysis techniques. ear & hearing, 40(2), 328–339. https://doi.org/10.1097/aud.0000000000000618 herman, j.l., stevens, m.j., bird, a., mendenhall, m., & oddou, g. (2010). the tolerance for ambiguity scale: towards a more refined measure for international management research. international journal of intercultural relations, 34(1), 58–65. https://doi.org/10.1016/j.ijintrel.2009.09.004 hill, c., hlahleni, m., & legodi, l. (2021). validating the indigenous versions of the south african personality inventory. frontiers in psychology, 12(1), art. 556565. https://doi.org/10.3389/fpsyg.2021.556565 hogan, r., & hogan, j. (2009). hogan development survey manual (2nd ed.). hogan assessment systems. judge, t.a., & locke, e.a. (1992). the effect of dysfunctional thought processes on subjective well-being and job satisfaction. journal of applied psychology, 78(3), 475–490. https://doi.org/10.1037/0021-9010.78.3.475 judge, t.a., higgins, c.a., thoresen, c.j., & barrick, m.r. (1999). the big five personality traits, general mental ability, and career success across the lifespan. personnel psychology, 52(3), 621–652. https://doi.org/10.1111/j.1744-6570.1999.tb00174.x kaplan, r.m., & saccuzzo, d.p. (2018). psychological testing: principles, applications, & issues (9th ed.). boston, ma: cengage learning.
keenan, a., & mcbain, g.d.m. (1979). effects of type a behaviour, intolerance of ambiguity, and locus of control on the relationship between role stress and work-related outcomes. journal of occupational and organizational psychology, 52(4), 277–285. https://doi.org/10.1111/j.2044-8325.1979.tb00462.x linacre, j.m. (2020a). winsteps® rasch measurement computer program user’s guide. version 4.5.0. portland, oregon: winsteps.com linacre, j.m. (2020b). winsteps® (version 4.5.0) [computer software]. portland, oregon: winsteps.com. retrieved from https://www.winsteps.com/ linden, d., nijenhuis, j., & bakker, a.b. (2010). the general factor of personality: a meta-analysis of the big five intercorrelations and a criterion-related validity study. journal of research in personality, 44(3), 315–327. https://doi.org/10.1016/j.jrp.2010.03.003 lord, f.m., & novick, m.r. (1968). statistical theories of mental test scores. information age publishing. macrae, i., & furnham, a. (2016). high potential traits inventory: leadership capacity testing manual. high potential psychology ltd. macrae, i., & furnham, a. (2020). a psychometric analysis of the high potential trait inventory (hpti). psychology, 11(8), 1125–1140. https://doi.org/10.4236/psych.2020.118074 mccall, m.w. (1997). high flyers: developing the next generation of leaders. harvard business school. mccrae, r.r., & costa, p.t. (1985). updating norman’s ‘adequacy taxonomy’: intelligence and personality dimensions in natural language and in questionnaires. journal of personality and social psychology, 49(3), 710. https://doi.org/10.1037/0022-3514.49.3.710 mccrae, r.r., costa, t.p. jr., & martin, t.a. (2005). the neo-pi-3: a more readable revised neo personality inventory. journal of personality assessment, 84, 261–270. morton, n., hill, c., & meiring, d. (2018). validating the south african personality inventory (sapi): examining green behavior and job crafting within a nomological network of personality. 
international journal of personality psychology, 4(1), 25–38. myers, i.b., mccaulley, m.h., quenck, n.l., & hammer, a.l. (1998). mbti® manual (3rd ed.). cpp, inc. nunnally, j.c. (1978). psychometric theory. mcgraw-hill. pallant, j. (2020). spss survival manual: a step by step guide to data analysis using ibm spss (4th ed.). routledge. petrides, k.v. (2009). technical manual for the trait emotional intelligence questionnaires (teique). london psychometric laboratory. petrillo, j.p., cano, s.j., mcleod, l.d., & coon, c.d. (2015). using classical test theory, item response theory, and rasch measurement theory to evaluate patient-reported outcome measures: a comparison of worked examples. value in health, 18, 25–34. rasch, g. (1960). probabilistic models for some intelligence and attainment tests. university of chicago press. roodt, g., & de kock, f. (2018). reliability: basic concepts and measures. in foxcroft & roodt (eds.), introduction to psychological assessment in the south african context (5th ed., pp. 59–68). cape town, south africa: oxford university press. spence, b.a. (1982). a psychological investigation into the characteristics of black guidance teachers. unpublished master’s dissertation, university of south africa. taylor, t.r., & boeyens, j.c. (1991). the comparability of the scores of blacks and whites on the south african personality questionnaire: an exploratory study. south african journal of psychology, 21(1), 1–11. https://doi.org/10.1177/008124639102100101 tennant, a., & conaghan, p.g. (2007). the rasch measurement model in rheumatology: what is it and why use it? when should it be applied, and what should one look for in a rasch paper? arthritis and rheumatism, 57(8), 1358–1362. https://doi.org/10.1002/art.23108 tennant, a., penta, m., tesio, l., grimby, g., thonnard, j., slade, a., lawton, g., simone, a., carter, j., lundgren-nilsson, å, tripolski, m., ring, h., biering-sørensen, f., marincek, č, burger, h., & phillips, s. (2004). 
assessing and adjusting for cross-cultural validity of impairment and activity limitation scales through differential item functioning within the framework of the rasch model: the pro-esor project. medical care, 42(1), 137–148. https://doi.org/10.1097/01.mlr.0000103529.6313277 van zyl, c.j.j., & taylor, n. (2012). evaluating the mbti® form m in a south african context. south african journal of industrial psychology, 38(1), art. 977, 15 pages. https://doi.org/10.4102/sajip.v38i1.977 wang, g., & netemeyer, r.g. (2002). the effect of job autonomy, customer demandingness, and trait competitiveness on salesperson learning, self-efficacy, and performance. journal of the academy of marketing science, 30(3), 217–227. https://doi.org/10.1177/00970302030003003 wright, b.d., & linacre, j.m. (1994). reasonable mean-square fit values. rasch measurement transactions, 8, 370–371. wright, b.d., & masters, g.n. (1982). rating scale analysis: rasch measurement. mesa press. wright, b.d., & mok, m.c.m. (2004). an overview of the family of rasch measurement models. in e.v. smith & r.m. smith (eds.), introduction to rasch measurement (pp. 1–24). jam press. yang, y., & green, s.b. (2011). coefficient alpha: a reliability coefficient of the 21st century? journal of psychoeducational assessment, 29(4), 377–392. https://doi.org/10.1177/0734282911406668 about the author(s) nabeelah bemath department of psychology, faculty of humanities, university of the witwatersrand, johannesburg, south africa citation bemath, n. (2020). relevance of the person-environment fit approach to career assessment in south africa – a review. african journal of psychological assessment, 2(0), a22. https://doi.org/10.4102/ajopa.v2i0.22 review article relevance of the person-environment fit approach to career assessment in south africa – a review nabeelah bemath received: 11 dec.
2019; accepted: 30 apr. 2020; published: 18 june 2020 copyright: © 2020. the author(s). licensee: aosis. this is an open access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. abstract despite concerns regarding its relevance, the person-environment fit approach to career counselling assessment remains a popular one in the south african context. this may be due to a lack of awareness of, or regard for, these concerns among career counselling assessment practitioners working in south africa. this narrative review thus aimed to summarise literature regarding the relevance of the person-environment fit approach to career counselling assessment in south africa and alternatives to this approach. keywords were used to search for, and identify, literature on several electronic databases. additional literature was identified through citations and citing publications in the initial literature obtained. given the nature of a narrative review, no inclusion, exclusion or appraisal criteria were specified. based on the review of literature, the following themes and subthemes were identified: questionable relevance of the person-environment approach (inadequate reliability and validity of tests in the south african context, western-based theoretical underpinnings, language and socio-economic bias, and inadequate norms for the south african context) and alternative directions of career counselling assessment in this context (development of emic tests, qualitative assessment approaches and integrated assessment approaches). the findings suggest that an integrated quantitative-qualitative approach to career counselling assessment may be a feasible alternative to the person-environment fit approach. 
however, further research and development regarding the person-environment fit approach and other career counselling assessment approaches is required in order to move towards a more relevant career counselling assessment practice in south africa. keywords: person-environment fit approach; career assessment; career counselling; relevance; south africa. introduction the person-environment fit approach to career counselling assessment is widely used in the south african context (watson & mcmahon, 2013). this approach requires the client to know themselves and the world of work, and thus find a fit between the two to make an appropriate career-related decision (de bruin & de bruin, 2013). to achieve this, assessment is conducted using the following categories of psychological tests: cognitive or aptitude, personality, interests and values (de bruin & de bruin, 2013). however, challenges in using this approach to career counselling assessment in south africa (e.g. de bruin & de bruin, 2013; maree & beck, 2004) suggest that the relevance of this approach in this context is questionable. what is problematic is that many career assessment practitioners in south africa remain unaware of, or ignore, these issues (maree, 2013; watson & stead, 2002). there is thus a need for a scholarly summation of the relevance of this approach in the south african context. findings from such a summation may contribute to knowledge of the relevance of this approach in south africa and potentially facilitate the awareness, development and practice of a more contextually relevant assessment practice in this context.
this narrative review thus aimed to describe literature regarding the relevance of the person-environment fit approach to career counselling assessment in south africa, with a focus on literature pertaining to specific aptitude, personality and interest tests that are commonly used in this context, namely, the differential aptitude test (dat) which assesses aptitude in terms of potential to obtain an ability with a given degree of training, the 16 personality factor questionnaire (16pf) which measures personality in terms of 16 primary personal traits and the relationship between these through five underlying second-order factors, and the self-directed search (sds) which assesses interests using six broad domains of interest as per holland’s theory of vocational personalities and work environment (du toit & de bruin, 2002; foxcroft, paterson, le roux, & herbst, 2004; van eeden, taylor, & prinsloo, 2013). the objectives of this review are (1) to identify, summarise and describe literature regarding the psychometric properties and use of the dat, 16pf and sds for career counselling assessment in the south african context, (2) to appraise the relevance of the person-environment fit approach in this context using the above-mentioned literature and (3) to identify and describe alternative approaches to the person-environment fit approach to career counselling assessment in south africa. methods a narrative literature review was conducted. this comprised an examination of published literature on a broad topic, in order to consolidate and summarise information on that topic and identify gaps in the knowledge area that need to be addressed (grant & booth, 2009). in contrast to systematic reviews, which aim to provide a comprehensive synthesis of knowledge on a topic, narrative reviews do not entail a structured approach to the search and selection of literature to include and review (grant & booth, 2009; greenhalgh, thorne, & malterud, 2018). 
for this reason, an exhaustive and detailed search process was not adopted and reported for the current review. this review used combinations of keywords such as ‘person-environment fit approach south africa’, ‘career assessment south africa’ and ‘16 personality factor questionnaire south africa’ to search for and identify literature, on several electronic databases (e.g. sabinet, ebscohost) and the search engines google and google scholar. articles that addressed the above-mentioned objectives were included. further relevant literature was identified through consulting the citations and citing publications of the initial literature obtained. themes across the literature obtained were identified and are presented. ethical consideration this article followed all ethical standards for research without direct contact with human or animal subjects. review findings questionable relevance of the person-environment approach to career assessment in south africa although the dat, 16pf and sds are used widely in south africa, research pertaining to these tests raises concerns regarding their relevance (and thus the relevance of the person-environment fit approach) in this context (see de bruin & taylor, 2013; gevers, du toit, & harilall, 1995; van eeden & de beer, 2013; van eeden et al., 2013). based on the literature, the relevance of this approach can be considered in terms of the following subthemes: inadequate reliability and validity of tests in the south african context, western-based theoretical underpinnings, language and socio-economic bias, and inadequate norms for the south african context. 
inadequate reliability and validity of tests in the south african context the existing evidence for the dat (coetzee & vosloo, 2000; owen, 2000; patel, 2004), 16pf (mcdonald & van eeden, 2014; jvr academy, n.d.; jvr psychometrics, 2011; schepers & hassett, 2006) and sds (du toit & de bruin, 2002; gevers et al., 1995) generally indicates that these tests demonstrate good reliability and validity. however, minimal research has explored the psychometric properties of these tests, particularly the dat and sds, in south africa. a search on google, google scholar and the search engines linked to ebscohost using the search terms of ‘differential aptitude test south africa’, ‘dat’, ‘dat south africa’, ‘aptitude tests in south africa’ and variations of the same indicated that there is a paucity of research on all four forms of the dat and consequently its psychometric properties. literature has also specifically argued that little research has explored the reliability and validity of form k of the dat (laher & mokone, 2008). similarly, there is limited evidence of the psychometric properties of the sds in south africa (du toit & de bruin, 2002; watson, foxcroft, & allen, 2007). this indicates that support for the reliability and validity of these tests in this context is lacking. furthermore, evidence suggests that the psychometric properties of the dat, 16pf and sds may not replicate across different subgroups, rendering the relevance of the tests in south africa questionable. firstly, research pertaining to the cross-cultural validity of these tests is concerning. minimal research has explored the equivalence of cognitive tests (like the dat) (meiring, van de vijver, rothmann, & barrick, 2005) and the sds across groups within the south african context (allen, 2005; du toit & de bruin, 2002; watson et al., 2007). this raises concerns regarding the suitability of the dat and sds in south africa’s multicultural context.
while there is a more substantial body of evidence regarding the cross-cultural validity of the 16pf in south africa (see de bruin & taylor, 2013; jvr psychometrics, 2011), some research does not support its cross-cultural validity in this context (abrahams, 2002; abrahams & mauer, 1999b; jvr academy, n.d.; schepers & hassett, 2006; van eeden, taylor, & du toit, 1996, as cited in van eeden et al., 2013). this suggests that the test may be culturally biased (see jvr academy, n.d.; meiring et al., 2005; prinsloo & ebersöhn, 2002). the cross-cultural applicability of the 16pf and sds is particularly concerning as these tests were originally developed in western contexts (maree, 2013). western-based theoretical underpinnings the theoretical underpinnings of the 16pf and sds further suggest that these tests may be culturally biased in south africa. the 16pf may be culturally biased in south africa as indigenous manifestations of personality may not be measured in western-based tests (van eeden & mantsha, 2007). people from non-western backgrounds may attach different meanings to the constructs being assessed by this test (van eeden & mantsha, 2007). for example, research suggests that the warmth factor in the 16pf does not manifest within the tshivenda culture in terms of open expression of feelings, as they are an emotionally reserved culture (van eeden & mantsha, 2007). similarly, the cultural validity and relevance of holland’s model and the sds in south africa is questionable as the meaning ascribed to the six broad interests of holland’s model in this context may differ from that in the usa (du toit & de bruin, 2002). for example, the collectivistic value of ‘ubuntu’ (which emphasises helping others over oneself) can alter the meaning ascribed to career interests (du toit & de bruin, 2002). interests may play less of a role in cultures that value ubuntu, in comparison to cultures where individualism is emphasised (du toit & de bruin, 2002). 
the limited research that has explored the suitability of the sds and holland’s theory in south africa is equivocal, with some research supporting the cross-cultural validity of holland’s model in south africa while other research does not (see du toit & de bruin, 2002; morgan, de bruin, & de bruin, 2015a, 2015b; watson et al., 2007). consequently, the relevance of holland’s theory and the sds for certain south african population groups is questionable (du toit & de bruin, 2002; morgan et al., 2015b) and its use can result in unfair treatment of clients (morgan & de bruin, 2017). language and socio-economic bias research further suggests that the dat, 16pf and sds may be biased in terms of language. as test-takers who are completing a psychometric test in a second language tend to first mentally translate a test question into their home language and then select an answer, the performance of second language test-takers on the dat (a timed test) may be negatively affected by this process (kgosana, 2017). this is supported by south african research which found population mean score differences on the precursor to the dat-k (owen, 1991, as cited in laher & mokone, 2008) and significant differences in performance on the dat-k between english and african first language black south african students (laher & mokone, 2008). research conducted with the latter sample also found the coefficients for the verbal subtests to be lower than those reported in the test manual (laher & mokone, 2008). south african students have also criticised the language used in the dat-k (bischof & alexander, 2008). english language proficiency has also been shown to differentiate between learners’ scores on the dat-s (macfarlane, 2006). 
second language test-takers may thus perform lower than first language test-takers (macfarlane, 2006); this is a concern since the dat is available in only two of the 11 official languages in south africa (english and afrikaans; differential aptitude tests – forms r, s, k and l, n.d.). these lower scores would be a function of the language of the test and not their cognitive functioning (kgosana, 2017), making it difficult to know if test results reflect language problems or actual ability (laher & cockcroft, 2013). similarly, language competency has been said to affect south african test-takers’ responses to the 16pf (jvr academy, n.d.). research investigating the understanding of vocabulary used in the 16pf-sa92 (abrahams & mauer, 1999a) and 16pf5 (mcdonald & van eeden, 2014) found that first language english and afrikaans speakers scored significantly higher than second language speakers. poor internal consistency reliability coefficients were also found on the 16pf5 among a sample of tshivenda first language speakers (van eeden & mantsha, 2007). the findings suggest that the test may be biased against second language english or afrikaans speakers, where language proficiency affects test performance (abrahams & mauer, 1999a; mcdonald & van eeden, 2014). this may also account for the lack of construct equivalence found across language groups (meiring et al., 2005; van eeden & mantsha, 2007) while translation of the test into different languages may address these issues, attempts to translate the 16pf5 into zulu and tshivenda have faced several challenges. these challenges included: the presence of different zulu dialects across different regions, retaining the original meaning of test items and the absence of equivalent words and expressions in zulu and tshivenda (jvr academy, n.d.; mcdonald & van eeden, 2014; van eeden & mantsha, 2007). 
the difficulty in translating these tests so that they are appropriate for the south african context raises further concern regarding the test’s relevance in this context. the sds may also suffer from language bias, which casts doubt on the relevance of this test in south africa. research showing poor fit between south african test-takers’ sds scores and holland’s model has been argued to be a function of test-takers experiencing difficulty in fully comprehending the meanings of test items (du toit & de bruin, 2002). the effect of socio-economic factors on scores obtained on the dat, 16pf and sds also raises concern regarding the relevance of these tests in south africa. in terms of the dat, the measurement of aptitudes assumes that everyone who has taken the measure has had the same exposure to these aptitudes; however, this may not necessarily be the case (puchert, dodd, & viljoen, 2017; van eeeden & de beer, 2013). students from disadvantaged educational backgrounds are unlikely to have been exposed to the same knowledge and skills as students from advantaged backgrounds (kgosana, 2017). although the dat-k is suitable for learners from disadvantaged educational backgrounds (laher & mokone, 2008), this may negatively affect performance on cognitive tests like the dat (puchert et al., 2017). these results may be erroneously interpreted as poor cognitive ability as opposed to lack of skills or experience in comparison to advantaged peers, possibly leading to the unfair use and interpretation of test results (puchert et al., 2017). insufficient exposure and disadvantaged educational background have also been reported to affect performance on the 16pf5, where this had led to tests-takers experiencing difficulty with certain items (van eeden & mantsha, 2007). socio-environmental factors may also partly affect sds tests scores. 
the apartheid legacy of social inequality and employment and other labour conditions (which are likely to differ to those found in western contexts) may shape how test-takers in south africa perceive the world of work and thus respond to the sds (du toit & de bruin, 2002; morgan & de bruin, 2017; van wijk & fourie, 2017; watson, mcmahon, & longe, 2011). for example, socio-economic status may be a more influential factor in south africans’ career decisions than career interests (van wijk & fourie, 2017). these factors may yield response patterns that depict poor fit between the test results and holland’s model (du toit & de bruin, 2002; morgan et al., 2015a). these tests thus do not consider the impact that sociocultural factors have on career-related decisions. given the possible biases and challenges faced in using the dat, 16pf and sds in south africa, the relevance of these tests in this context is thus questionable. this is particularly because this can lead to test results that provide unreliable and invalid reflections of test-takers, which can in turn lead to inappropriate career guidance and counselling (mcdonald & van eeden, 2014; prinsloo & ebersöhn, 2002; wallis & birt, 2003). inadequate norms for the south african context the relevance of the dat, 16pf and sds is further questionable given the issues regarding the norms of these tests. the norms for the dat and sds can be considered outdated. in terms of the dat, the most recent norms available for most dat forms were published in 2000 (differential aptitude tests – forms r, s, k and l, n.d.). in terms of the sds, the south african version of the test is outdated (a problem in and of itself as the test does not align with the current world of work), thus suggesting that the norms are outdated as well (du toit & de bruin, 2002; van wijk & fourie, 2017). 
this is concerning as outdated norms do not align with societal changes that affect test performance and thus can result in inaccurate or non-meaningful interpretations (kaplan & saccuzzo, 2017). furthermore, information available on the norms of the dat and 16pf suggests that these norms are not representative of the south african population (see differential aptitude test form k, n.d.; jvr academy, n.d.; jvr psychometrics, 2016). the standardisation of these tests also failed to consider factors that influence test scores such as quality of education and language (kgosana, 2017; wallis & birt, 2003). this is problematic as these issues can also lead to erroneous, negative interpretation and use of test scores (kaplan & saccuzzo, 2017).

alternative approaches to career counselling assessment in south africa

considering the challenges currently facing the person-environment fit approach to career counselling assessment in south africa, alternative approaches to career counselling assessment in this context have been proposed to enable a more relevant assessment practice (maree, 2013; watson & stead, 2002). the following subthemes were identified: development of emic tests, qualitative assessment approaches and integrated assessment approaches.

development of emic tests

for instance, researchers and practitioners have engaged in developing and implementing new and contextually relevant assessments that can be utilised in career counselling (maree, 2013, 2016; mcdonald & van eeden, 2014). some examples of such tests are the south african personality inventory (van eeden & mantsha, 2007) and local interest instruments like the maree career matrix and south african career interest inventory (maree & taylor, 2016; morgan et al., 2015a, 2015b). these instruments are not without limitations.
for example, despite the south african career interest inventory being a few years into development, research on the instrument is limited by unrepresentative, homogeneous and small samples, and issues of bias and validity require further exploration (morgan et al., 2015a, 2015b). research regarding the possible cultural and linguistic biases of these tests is also lacking (allen, 2005; maree, 2010; rabie & naidoo, 2019). hence, the relevance of these tests is not well established.

qualitative assessment approaches

another alternative to the person-environment fit approach is the adoption of qualitative approaches to career counselling assessment (albien & naidoo, 2016; morgan, 2010). unlike the quantitative tests that comprise the person-environment fit approach, qualitative approaches allow for the effect of socio-environmental factors on career development in the country to be considered (albien & naidoo, 2016; buthelezi, alexander, & seabi, 2009). these approaches additionally provide the client with an opportunity to take on a more active role in the assessment, helping them to gain exposure to skills that they can use in future career-related decisions and thus develop themselves (morgan, 2010; watson & mcmahon, 2013). in doing so, they remove the notion that assessment practitioners are the all-knowing experts (maree, 2015), a notion that can have negative implications given the political history of assessment in south africa (bischof & alexander, 2008). there are various qualitative approaches that have been put forward. for instance, the systems theory considers the role of intrapersonal, interpersonal and broader socio-environmental elements in clients’ career development, and can make use of appropriate qualitative assessment instruments such as the my system of career influences (albien & naidoo, 2016; de bruin & de bruin, 2013).
similarly, the career construction theory focuses on clients’ life themes and construction of themselves and their careers. an offshoot of this theory is life-design counselling, which focuses on clients’ life and career narratives (cook & maree, 2016). examples of instruments that can be used in these approaches are: the career style interview (which is used to encourage clients to narrate and find meaning in their career stories; de bruin & de bruin, 2013) and the narrative component of the career interest profile (a locally developed qualitative instrument that provides narrative data regarding the client’s career interests; maree, 2017). it should be noted that while qualitative assessment approaches help to address some of the limitations of the person-environment approach, this does not negate the utility of quantitative psychometric tests (di fabio & maree, 2013; morgan, 2010). there are other limitations pertaining to the above-mentioned qualitative assessment approaches. for example, although the my system of career influences has been successfully used with south african clients, improvements are required particularly in terms of translating the instrument into different official languages (see albien & naidoo, 2016; watson & mcmahon, 2013). there is also a lack of locally relevant qualitative assessments that have been developed (maree & beck, 2004; watson & mcmahon, 2013). an additional limitation of these qualitative approaches is that they can be time and labour intensive (maree & beck, 2004; watson & mcmahon, 2013).

integrated assessment approaches

an alternative approach to career counselling assessment that has more recently been advocated for is the integration of quantitative and qualitative assessment approaches (maree, 2015, 2017). an integration of quantitative and qualitative approaches would involve obtaining clients’ objective psychometric test scores and subjective accounts of their career and life stories (maree, 2015).
information obtained from both approaches is then combined and drawn upon together during the counselling process (watson & mcmahon, 2013). this integration can be facilitated using instruments such as the career interest profile, which comprises both quantitative and qualitative components (di fabio & maree, 2013). the integration of quantitative and qualitative approaches to assessment provides a comprehensive approach that will best meet the needs of clients (maree & morgan, 2012). it allows for assessment results to be triangulated, thus eliciting more reliable and valid results (maree, 2013, 2015). in this way, the limitations of using older assessment approaches or using quantitative and qualitative approaches in isolation within the south african context may be addressed (maree, 2010; maree & beck, 2004). some studies have used an integrated assessment approach (cf. maree, 2014, 2018, 2019; maree, gerryts, fletcher, & olivier, 2019; mcmahon, watson, & zietsman, 2018; naidoo et al., 2019). while these show the promise of an integrated assessment approach to career counselling in the south african context, these studies are limited by the lack of longitudinal research, lack of translated instruments or use of translators with samples who are non-english first language speakers, use of a few specific quantitative and qualitative assessments, and lack of diverse samples.

implications and recommendations

based on the problems discussed in relation to the dat, 16pf and sds, it appears that these tests may not be relevant in this context. in their current forms, these tests appear to be culturally and linguistically biased and do not consider the subjective experiences of clients where external factors shape career-related decisions (de bruin & de bruin, 2013; maree, 2015; van wijk & fourie, 2017).
using these tests on their own, and by implication the person-environment fit approach, for career counselling assessment may be inappropriate and potentially harmful in this context (bischof & alexander, 2008; watson et al., 2007). indeed, this approach has previously been critiqued in this regard (de bruin & de bruin, 2013; maree, 2015). although there has been some development of contextually relevant tests that can be used for career counselling assessment, the persistent popularity of the person-environment fit approach in south africa indicates that there is a need for existing tests to be updated and adapted (maree, 2013; 2016). the use of flexible assessment practices, further research and development of emic instruments are also encouraged (cf. laher & cockcroft, 2017; watson & mcmahon, 2013). of the alternatives to the person-environment fit approach that were identified in the literature, the integrated approach appears to be the more suitable alternative given its comprehensive nature. however, merely integrating quantitative and qualitative approaches may be insufficient for several reasons. integrating quantitative and qualitative approaches without addressing the limitations present in each approach is problematic. issues such as the need to translate instruments into the different official languages and to develop locally relevant assessments would still apply. integrating quantitative and qualitative assessments thus does not resolve the issue of language bias in career counselling assessment. assessments used in both approaches would need to be evaluated for this, and may require translation or the assistance of appropriately trained translators when assessing clients who are non-english first language speakers (cf. naidoo et al., 2019). similarly, the theoretical frameworks and constructs that underpin the chosen integrated approach also need to be evaluated for their suitability within the south african context (cf. arthur & mcmahon, 2018). 
careful consideration of the appropriateness of the assessments chosen when using an integrated approach in the south african context is thus necessary, whether this be an alternative assessment such as the career interest profile (di fabio & maree, 2013) or integration of individual quantitative and qualitative assessments. additional considerations may be required for possible qualitative assessments, as research and evidence regarding qualitative career counselling assessments is limited, with most existing research lacking in rigour (mcmahon, 2019) and located in western contexts (mcmahon, watson, & lee, 2019). assessments that have been evaluated in the south african context should preferably be used. further research would also be required on etic qualitative assessments that have not been evaluated in this context, and emic assessments that have not been evaluated across diverse groups. some qualitative career counselling assessments are also located within a positivist paradigm, and thus may not complement the quantitative assessment in providing the co-construction of realities traditionally associated with the use of qualitative assessments (mcmahon, 2019). the underlying paradigm and framework of the chosen assessment would thus also need to be considered. the choice of quantitative assessments is no less important, as quantitative scores may determine the interpretation of the qualitative assessment results (cf. mcmahon et al., 2018). if the quantitative assessment is psychometrically concerning in the south african context, this may problematically impact the interpretation and application of the integrated assessment results. therefore, without addressing the limitations of the quantitative and qualitative approaches, and carefully selecting the assessments to be used in an integrated approach, it is unlikely that one approach would compensate for the limitations of the other despite being used in an integrated manner. 
further research regarding the use of an integrated approach in the south african context is also required, as this body of research appears to be minimal and has several limitations (cf. maree, 2014, 2018, 2019; maree et al., 2019; mcmahon et al., 2018; naidoo et al., 2019). the rigour of an integrated approach is also said to be unclear (see maree, 2018), suggesting this too needs to be evaluated. another matter that needs to be addressed is practitioners’ tendency to disregard calls for using an integrated approach, which is often linked to lack of knowledge and guidance, or concerns, regarding qualitative approaches (maree, 2013; mcmahon, 2019). in addition, practitioners in south africa may experience difficulty in integrating these approaches given the additional time and resources required in incorporating a qualitative approach (maree & beck, 2004). hence, it is unlikely that practitioners will rely on both approaches. there is thus a need for the education and training of students and practitioners in both qualitative and integrated approaches. user guides to these approaches in the south african context may also require development. given the resource demands of an integrated approach, coupled with the minimal number of psychological practitioners in south africa and inaccessibility to career counselling assessment in lower socio-economic contexts in the country (cf. maree et al., 2019), the feasibility of the approach in the south african context also needs evaluation. while the integrated approach to career counselling assessment thus appears to be a suitable alternative to the person-environment fit approach in south africa, further research, development and education regarding this approach is required.
conclusion

given the person-environment fit approach’s reliance on primarily eurocentric, outdated psychometric tests whose psychometric properties and cross-cultural applicability are concerning in the south african context, the relevance of this approach to career counselling assessment in this context is questionable. alternatives to this approach also have several limitations. these concerns suggest that researchers and practitioners involved in career counselling assessment need to engage in test adaptation and development, address limitations present in current quantitative and qualitative approaches to assessment, educate students and professionals to critically engage with both approaches when conducting career counselling assessment, and conduct methodologically rigorous research regarding the feasibility and effectiveness of an integrated approach. in doing so, there may be a more reliable, valid, relevant and fairer practice of career counselling assessment in south africa that can better address the vocational needs of its population.

acknowledgements

competing interests

the author declares that she has no financial or personal relationships that may have inappropriately influenced her in writing this article.

author’s contributions

i declare that i am the sole author of this article.

funding information

no funding was received for this article.

data availability statement

data sharing is not applicable to this article as no new data were created or analysed in this study.

disclaimer

the views expressed in the submitted article are the author’s own and not an official position of the institution or funder.

references

abrahams, f. (2002). the (un)fair usage of the 16pf (sa92) in south africa: a response to c.h. prinsloo and i. ebersöhn. south african journal of psychology, 32(3), 58–61. https://doi.org/10.1177/008124630203200308 abrahams, f., & mauer, k.f. (1999a).
qualitative and statistical impacts of home language on responses to the items of the sixteen personality factor questionnaire (16pf) in south africa. south african journal of psychology, 29(2), 76–86. https://doi.org/10.1177/008124639902900204 abrahams, f., & mauer, k.f. (1999b). the comparability of the constructs of the 16pf in the south african context. south african journal of industrial psychology, 25(1), 53–59. https://doi.org/10.4102/sajip.v25i1.679 albien, a.j., & naidoo, a.v. (2016). social career influences of xhosa adolescents elicited using the systems theory framework in a peri-urban south african township. south african journal of higher education, 30(3), 111–137. https://doi.org/10.20853/30-3-668 allen, l.j. (2005). the appropriateness of holland’s interest code typology for south african field guides. unpublished master’s thesis, nelson mandela metropolitan university, durban. arthur, n., & mcmahon, m. (2018). contemporary career development theories: expanding international perspectives. in n. arthur & m. mcmahon (eds.), contemporary theories of career development (pp. 241–257). new york, ny: routledge. bischof, d., & alexander, d. (2008). post-modern career assessment for traditionally disadvantaged south african learners: moving away from the ‘expert opinion’. perspectives in education, 26(3), 7–17. buthelezi, t., alexander, d., & seabi, j. (2009). adolescents’ perceived career challenges and needs in a disadvantaged context in south africa from a social cognitive career theoretical perspective. south african journal of higher education, 23(1), 505–520. https://doi.org/10.4314/sajhe.v23i3.51033 coetzee, m.a., & vosloo, h.n. (2000). manual for the differential aptitude tests form k. pretoria: human sciences research council. cook, a., & maree, j.g. (2016). efficacy of using career and self-construction to help learners manage career-related transitions. south african journal of education, 36(1), 1225.
https://doi.org/10.15700/saje.v36n1a1225 de bruin, g.p., & de bruin, k. (2013). career counselling assessment. in c. foxcroft & g. roodt (eds.), introduction to psychological assessment in the south african context (4th edn., pp. 201–212). oxford university press southern africa, cape town. de bruin, g.p., & taylor, n. (2013). personality assessment. in c. foxcroft & g. roodt (eds.), introduction to psychological assessment in the south african context (4th edn., pp. 185–200). oxford university press southern africa, cape town. di fabio, a., & maree, j.g. (2013). career counselling: the usefulness of the career interest profile (cip). journal of psychology in africa, 23(1), 41–49. https://doi.org/10.1080/14330237.2013.10820592 differential aptitude test (dat) form k – 2008 edition. (n.d.). retrieved from http://www.mindmuzik.co.za/index.php?page=shop.product_details&flypage=flypage.tpl&product_id=149&category_id=3&option=com_virtuemart&itemid=78 differential aptitude tests – forms r, s, k and l (dat). (n.d.). retrieved from http://mindmuzik.com/index.php?page=shop.product_details&category_id=3&flypage=flypage.tpl&product_id=150&option=com_virtuemart&itemid=78 du toit, r., & de bruin, g.p. (2002). the structural validity of holland’s riasec model of vocational personality types for young black south african men and women. journal of career assessment, 10(1), 62–77. https://doi.org/10.1177/1069072702010001004 foxcroft, c., paterson, h., le roux, n., & herbst, d. (2004). psychological assessment in south africa: a needs analysis. the test use patterns and needs of psychological assessment practitioners. pretoria: human sciences research council. gevers, j., du toit, r., & harilall, r. (1995). manual for the self-directed search questionnaire. pretoria: human sciences research council. grant, m.j., & booth, a. (2009). a typology of reviews: an analysis of 14 review types and associated methodologies. health information & libraries journal, 26(2), 91–108.
https://doi.org/10.1111/j.1471-1842.2009.00848.x greenhalgh, t., thorne, s., & malterud, k. (2018). time to challenge the spurious hierarchy of systematic over narrative reviews? european journal of clinical investigation, 48(6), e12931. https://doi.org/10.1111/eci.12931 jvr academy. (n.d.). 16pf® fifth edition training manual. jvr academy. retrieved from https://www.jvracademy.co.za/?dl_name=16pf5_manual_academy_colours.pdf jvr psychometrics. (2011). investigating the ethnic equivalence of the 16pf5-sa. jvr academy, johannesburg. jvr psychometrics. (2016, 29 may). 16pf norm update [blog post]. retrieved from https://jvrafricagroup.co.za/16pf5-re-standardised-south-africa/ kaplan, r.m., & saccuzzo, d.p. (2017). psychological testing: principles, applications, and issues (9th edn.). boston, ma: cengage learning. kgosana, m.c. (2017). affirmative action and psychometric tests use in the south african national defense force: are they complementary or conflicting forces? journal of defense management, 2, 112–118. https://doi.org/10.4172/2167-0374.1000112 laher, s., & cockcroft, k. (2013). current and future trends in psychological assessment in south africa: challenges and opportunities. in s. laher & k. cockcroft (eds.), psychological assessment in south africa: research and applications (pp. 535–552). wits university press, johannesburg. laher, s., & cockcroft, k. (2017). moving from culturally biased to culturally responsive assessment practices in low-resource, multicultural settings. professional psychology: research and practice, 48(2), 115–121. https://doi.org/10.1037/pro0000102 laher, s., & mokone, m. (2008). exploring the reliability and validity of the dat-k in grade 11 learners in a historically disadvantaged school in johannesburg, south africa. journal of psychology in africa, 18(2), 249–253. https://doi.org/10.1080/14330237.2008.10820193 macfarlane, m. (2006). predictors of academic achievement in multilingual learners. 
unpublished master’s thesis, university of the witwatersrand, johannesburg. maree, j.g. (2010). brief overview of the advancement of postmodern approaches to career counseling. journal of psychology in africa, 20(3), 361–367. https://doi.org/10.1080/14330237.2010.10820387 maree, j.g. (2013). latest developments in career counselling in south africa: towards a positive approach. south african journal of psychology, 43(4), 409–421. https://doi.org/10.1177/0081246313504691 maree, j.g. (2014). geïntegreerde, kwalitatiewe en kwantitatiewe beroepsvoorligting en beroepskonstruksie vir’n aandagafleibare seun met tegniese belangstelling en aanleg lei tot positiewe resultate: oorspronklike navorsing. suid-afrikaanse tydskrif vir natuurwetenskap en tegnologie, 33(1), 1–11. https://doi.org/10.4102/satnt.v33i1.1183 maree, j.g. (2015). blending retrospect and prospect in order to convert challenges into opportunities in career counselling. in j.g. maree & a.d. fabio (eds.), exploring new horizons in career counselling: turning challenge into opportunities (pp. 3–24). boston, ma: sense publishers. maree, j.g. (2016). revitalising career counselling to foster career adaptability and resilience during change and turbulence: part 1. south african journal of higher education, 30(3), 1–5. https://doi.org/10.20853/30-3-664 maree, j.g. (2017). promoting career development in the early years of a person’s life through self-and career construction counselling (using an integrated, qualitative+ quantitative approach): a case study. early child development and care, 18(4), 437–451. https://doi.org/10.1080/03004430.2017.1365361 maree, j.g. (2018). advancing career counselling research and practice using a novel quantitative+qualitative approach to elicit clients’ advice from within. south african journal of higher education, 32(4), 149–170. https://doi.org/10.20853/32-4-2558 maree, j.g. (2019). 
group career construction counseling: a mixed-methods intervention study with high school students. the career development quarterly, 67(1), 47–61. https://doi.org/10.1002/cdq.12162 maree, j.g., & beck, g. (2004). using various approaches in career counselling for traditionally disadvantaged (and other) learners: some limitations of a new frontier. south african journal of education, 24(1), 80–87. maree, j.g., & morgan, b. (2012). toward a combined qualitative-quantitative approach: advancing postmodern career counselling theory and practice. cypriot journal of educational sciences, 7(4), 311–325. maree, j.g., & taylor, n. (2016). development of the maree career matrix: a new interest inventory. south african journal of psychology, 46(4), 462–476. https://doi.org/10.1177/0081246316641558 maree, j.g., gerryts, e.w., fletcher, l., & olivier, j. (2019). using career counselling with group life design principles to improve the employability of disadvantaged young adults. journal of psychology in africa, 29(2), 110–120. https://doi.org/10.1080/14330237.2019.1594646 mcdonald, e., & van eeden, r. (2014). the impact of home language on the understanding of the vocabulary used in the south african version of the sixteen personality factor questionnaire fifth edition. south african journal of psychology, 44(2), 228–242. https://doi.org/10.1177/0081246314522366 mcmahon, m. (2019). qualitative career assessment: a higher profile in the 21st century? in j.a. athanasou & h.n. perera (eds.), international handbook of career guidance (2nd ed., pp. 735–754). dordrecht: springer. mcmahon, m., watson, m., & lee, m.c. (2019). qualitative career assessment: a review and reconsideration. journal of vocational behavior, 110(part b), 420–432. https://doi.org/10.1016/j.jvb.2018.03.009 mcmahon, m., watson, m., & zietsman, l. (2018). adults changing careers through university education: making meaning of quantitative career assessment scores through an integrative structured interview. 
south african journal of industrial psychology, 44(1), 1–10. meiring, d., van de vijver, a.j.r., rothmann, s., & barrick, m.r. (2005). construct, item and method bias of cognitive and personality tests in south africa. south african journal of industrial psychology, 31(1), 1–8. https://doi.org/10.4102/sajip.v31i1.182 morgan, b. (2010). career counselling in the 21st century: a reaction article. journal of psychology in africa, 20(3), 501–503. https://doi.org/10.1080/14330237.2010.10820406 morgan, b., & de bruin, g.p. (2017). structural validity of holland’s circumplex model of vocational personality types in africa. journal of career assessment, 26(2), 275–290. https://doi.org/10.1177/1069072717692747 morgan, b., de bruin, g.p., & de bruin, k. (2015a). constructing holland’s hexagon in south africa: development and initial validation of the south african career interest inventory. journal of career assessment, 23(3), 493–511. https://doi.org/10.1177/1069072714547615 morgan, b., de bruin, g.p., & de bruin, k. (2015b). gender differences in holland’s circular/circumplex interest structure as measured by the south african career interest inventory. south african journal of psychology, 45(3), 349–360. https://doi.org/10.1177/0081246315572514 naidoo, a.v., visser, m., de wet, m., rabie, s., van schalkwyk, i., boonzaier, m., … & venter, c. (2019). a group-based career guidance intervention for south african high school learners from low-income communities. in j.g. maree (ed.), handbook of innovative career counselling (pp. 665–685). new york: springer. owen, k. (2000). manual for the differential aptitude test form l. pretoria: human sciences research council. patel, a.b. (2004). the influence of mode testing (computer based vs. paper & pencil) on anxiety and test performance. unpublished master’s dissertation. university of the witwatersrand, johannesburg. prinsloo, c.h., & ebersöhn, i. (2002). 
fair usage of the 16pf in personality assessment in south africa: a response to abrahams and mauer with special reference to issues of research methodology. south african journal of psychology, 32(3), 48–57. https://doi.org/10.1177/008124630203200307 puchert, j.i., dodd, n., & viljoen, k.l. (2017). secondary education as a predictor of aptitude: implications for selection in the automotive sector. south african journal of industrial psychology, 43(1), art. #1416, 13 pages. https://doi.org/10.4102/sajip.v43i0.1416 rabie, s., & naidoo, a.v. (2019). validating the adaptation of the first career measure in isixhosa: the south african career interest inventory–isixhosa version. south african journal of psychology, 49(1), 109–121. https://doi.org/10.1177/0081246318772419 schepers, j.m., & hassett, c.f. (2006). the relationship between the fourth edition (2003) of the locus of control inventory and the sixteen personality factor questionnaire (version 5). south african journal of industrial psychology, 32(2), 9–18. https://doi.org/10.4102/sajip.v32i2.234 van eeden, r., & de beer, m. (2013). assessment of cognitive functioning. in c. foxcroft & g. roodt (eds.), introduction to psychological assessment in the south african context (4th edn., pp. 147–170). oxford university press southern africa, cape town. van eeden, r., & mantsha, t.r. (2007). theoretical and methodological considerations in the translation of the 16pf5 into an african language. south african journal of psychology, 37(1), 62–81. https://doi.org/10.1177/008124630703700105 van eeden, r., taylor, n., & prinsloo, c.h. (2013). the sixteen personality factor questionnaire in south africa. in s. laher & k. cockcroft (eds.), psychological assessment in south africa: research and applications (pp. 203–217). wits university press, johannesburg. van wijk, c.h., & fourie, m. (2017). 
about the author(s)
xander van lill: department of industrial psychology and people management, college of business and economics, university of johannesburg, johannesburg; department of product and research, jvr africa group, johannesburg, south africa
anneke stols: department of product and research, jvr africa group, johannesburg, south africa; department of psychology, faculty of social sciences, university of east anglia, norwich, united kingdom
pakeezah rajab: department of product and research, jvr africa group, johannesburg, south africa; department of psychology, faculty of humanities, university of pretoria, pretoria
jani wiggett: department of product and research, jvr africa group, johannesburg, south africa

citation
van lill, x.v., stols, a., rajab, p., & wiggett, j. (2023). the validity of a general factor of emotional intelligence in the south african context. african journal of psychological assessment, 5(0), a123. https://doi.org/10.4102/ajopa.v5i0.123

original research
the validity of a general factor of emotional intelligence in the south african context
xander van lill, anneke stols, pakeezah rajab, jani wiggett
received: 16 oct. 2022; accepted: 12 dec. 2022; published: 23 mar. 2023
copyright: © 2023. the author(s). licensee: aosis. this is an open access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

abstract
emotional intelligence (ei) plays an important role in the prediction of important work-related outcomes, such as work performance. southern african scholars frequently deploy total scores of ei without considering its hierarchical structure. this study investigated the presence of a general factor, as manifested among the subscales of the eq-i 2.0, using an archival dataset of 16 581 employees in southern africa.
orthogonal first-order, single-factor, higher-order, oblique lower-order and bifactor models were specified to investigate the hierarchical structure of ei. the evidence supports the notion that a total score could be calculated for ei based on the eq-i 2.0. a total ei score also appears to be predictive of employees’ individual work performance, as measured by their managers. it might, therefore, be practically meaningful for practitioners to calculate or use a total score when making selection decisions about employees based on the eq-i 2.0.
contribution: the findings of the present study offer insights into the theoretical and empirical structure of ei based on statistical techniques that have not been used on the construct in the southern african context. concurrent validity evidence provides additional support that an overall quantitative score, based on the eq-i 2.0, has utility in hiring practices, where the aim is to predict future work performance.
keywords: emotional quotient inventory 2.0; trait-based emotional intelligence; general factor; individual work performance; employee selection.

introduction
many jobs place high emotional demands on employees; for example, managers or health care workers might be required to – on a daily basis – accurately perceive, understand and regulate their own emotions in the service of fellow employees or patients (glomb et al., 2004). employees’ willingness to expend emotional labour or manage their feelings to project a certain public display is becoming increasingly important to performance in many job roles and, ultimately, the social functioning of human enterprise. the relationship between emotional intelligence (ei) and valued work-related outcomes, such as job performance, is well established. joseph and newman (2010) reported a correlation of 0.47 between ei and job performance but found a reduced correlation of 0.29 after further refinements in their meta-analytical study (joseph et al., 2015).
a recent meta-analytical study conducted by sackett et al. (2021) revealed that trait-based ei and cognitive ability appeared to be equally relevant predictors of job performance, with estimated validity coefficients of 0.30 and 0.31, respectively.
a host of studies on the predictive validity of ei has been conducted in south africa. the most recent study, conducted by sloan and geldenhuys (2021), investigated the moderating effect of self-focused ei in predicting managers’ in-role and extra-role performance, which yielded positive, significant correlations with ei of 0.27 and 0.32, respectively. nel and de villiers (2004) reported an even higher positive and significant correlation of 0.53 between overall ei and job performance in the call centre environment. in contrast to the findings of sloan and geldenhuys (2021) and nel and de villiers (2004), hayward et al. (2008) reported a non-significant and negligible effect of ei on job performance for managers in a parastatal. however, hayward et al. (2008) attributed the negligible effect to the limited variance in the performance variable.
while south african research on the predictive validity of ei looks promising, there is international debate regarding the legitimacy of using a general factor of ei, as performed by nel and de villiers (2004) and hayward et al. (2008), when predicting job performance. one point of concern is the limited investigation of the hierarchical structure of ei before a general score is calculated and used to predict work-related outcomes (dasborough et al., 2021). this problem appears to be endemic to studies conducted in south africa, with very few investigations conducted on the hierarchical structure of ei before inspecting the predictive validity of a general score of ei.
van zyl (2014) conducted the only identifiable study in south africa that inspected a higher-order model for ei, but did not find evidence, in terms of model-data fit based on a confirmatory factor model, to support a general factor of ei. since the study conducted by van zyl (2014), specific factor analytical procedures have been recommended by credé and harms (2015), which might shed some additional light on the hierarchical structure of ei in south africa. however, before the hierarchical structure of ei is addressed, more attention needs to be paid to the theoretical structure of ei in this study.

the theoretical structure of emotional intelligence
emotional intelligence is conceptualised differently across various measures, that is, different models are used in its measurement. these models include (1) ability-based ei measures, like the mayer-salovey-caruso emotional intelligence test (msceit) (mayer et al., 2003), (2) self-report (or peer-report) ei measures based on the same theoretical model as the msceit (mayer et al., 2003) and (3) measures of mixed ei models, which extend beyond the theoretical model included in the aforementioned two categories (ashkanasy & daus, 2005; dasborough et al., 2021). such mixed ei measures are considered assorted, because they include various items phrased similarly to those used in measuring personality and behavioural preferences (ashkanasy & daus, 2005). in contrast to the ability-based ei measured by the msceit (mayer et al., 2003), the self-report ei measures from the two latter models are often termed ‘trait-based ei’, as they have to do with individuals’ perceptions of their own emotional skills (joseph et al., 2015; petrides et al., 2016). all three of these models include scientifically sound ei-related constructs that could greatly contribute to what we know about work performance (dasborough et al., 2021; joseph et al., 2015).
within the categories of ability-based ei and trait/mixed ei, the two theoretical models that have received a lot of attention when exploring the relationships between ei and job performance were the models that underpin the msceit (mayer et al., 2003) and the bar-on emotional quotient inventory (joseph et al., 2015). this article focuses on the revised version of the latter model, namely the eq-i 2.0 model (wiechorek, 2011). the eq-i 2.0 model was established based on 25 years of research on the different aspects that constitute ei, including how these interrelate. the eq-i 2.0 model has a multidimensional structure that provides an indication of a person’s total ei, which is contextualised by the different facets believed to underlie ei. the total ei score reflects a ‘snapshot’ of a person’s overall ei and can be defined as (wiechorek, 2011):
[a] set of emotional and social skills that influence the way we perceive and express ourselves, develop and maintain social relationships, cope with challenges, and use emotional information in an effective and meaningful way. (p. 49)
this broad definition also speaks to the different facets included in the eq-i 2.0, which encompass 15 constructs that can be collapsed into five comparable categories or composites. a visual depiction of the theoretical structure of the eq-i 2.0, including the proposed general factor, is presented in figure 1.

figure 1: general, composite and subscale dimensions of the eq-i 2.0.

the meaning of the composites and subscales’ measures, as demonstrated in figure 1, is described in the wiechorek (2011) user’s handbook. the first three subscales, noted below, collectively quantify an individual’s self-perception, which describes how people see themselves:
1. self-regard: having self-respect and confidently accepting one’s gifts and flaws.
2. self-actualisation: consistently working to better oneself or to reach goals of importance.
3. emotional self-awareness: understanding one’s emotions and how they affect oneself.
the second, the self-expression composite scale, considers how people express their inner perception of themselves, which is jointly portrayed through the subscales noted below:
4. emotional expression: to share one’s emotions in a constructive way.
5. assertiveness: to respectfully communicate one’s feelings and views.
6. independence: relying on oneself and not depending on others emotionally.
the interpersonal composite scale considers the nature of the relationships that people build with others. the subscales that collectively inform this composite are:
7. interpersonal relationships: to form and uphold trusting relationships that are agreeable to all parties.
8. empathy: recognising and understanding others’ feelings and showing concern for them.
9. social responsibility: being socially conscious and helpful towards others in the community.
the decision-making composite scale considers how people use emotional information to effectively make decisions, which is jointly established through:
10. problem-solving: understanding how one’s emotions impact decisions and solving problems despite these emotions.
11. reality testing: to be aware of the reality of a situation and display one’s objectivity.
12. impulse control: being able to withstand an urge to act or make rash decisions.
the stress management composite scale measures people’s ability to deal with stressors, utilising multiple coping strategies and showing resilience despite setbacks, and incorporates:
13. stress tolerance: to cope with and manage stressful situations to achieve a positive outcome.
14. flexibility: to flexibly adapt one’s actions, feelings and thoughts to change.
15. optimism: to remain positive and resilient despite facing obstacles.
the scales, measured at every level of the eq-i 2.0, were designed with a specific function in mind.
the subscales serve to provide a more foundational understanding of employees’ relative strengths and weaknesses, which is valuable for development purposes. compared to a general factor, subscales are qualitatively more meaningful during psychometric feedback. psychometric feedback on subscales provides the opportunity to suggest actionable steps that employees could take to increase their overall ei at work. an ei total, or even a composite ei score, might be perceived as too ambiguous and less meaningful from a development perspective. by contrast, the composite and overall dimensions of the structure might provide more encompassing, and therefore also more consistent, dimensions that can be utilised for selection purposes (wiechorek, 2011). a study conducted by van zyl (2014) supported the existence of the composites, as set out in figure 1. van zyl (2014) was, however, unable to find support for a general factor of ei among the subscales of the eq-i 2.0, and did not follow the recently suggested sequence for the inspection of hierarchical structure (credé & harms, 2015), which was explored in the current study. the present researchers were particularly interested in the general factor of ei as based on subscales of the eq-i 2.0 and specified a bifactor model.

the theoretical structure of individual work performance
van lill and taylor’s (2022) framework underlying the individual work performance review (iwpr) was utilised to conceptualise and measure performance in the present study. five broad performance dimensions are differentiated by van lill and taylor (2022), including in-role, extra-role, adaptive, leadership and counterproductive performance. according to van lill and taylor (2022, pp. 3–5):
in-role performance refers to: ‘actions that are official or known requirements for employees (carpini et al., 2017; motowidlo & van scotter, 1994). these behaviours could be viewed as the technical core (borman & motowidlo, 1997) that employees must demonstrate to be perceived as proficient and able to contribute to the achievement of organisational goals’ (carpini et al., 2017).
extra-role performance refers to: ‘future- or change-orientated acts (carpini et al., 2017), aimed at benefitting co-workers and the team (organ, 1997), that are discretionary or not part of the employee’s existing work responsibilities’ (borman & motowidlo, 1997).
adaptive performance relates to: ‘employees’ demonstration of the ability to cope with and effectively respond to crises or uncertainty’ (carpini et al., 2017; pulakos et al., 2000).
leadership performance refers to: ‘the effectiveness with which an employee can influence co-workers to achieve collective goals’ (campbell & wiernik, 2015; hogan & sherman, 2020; yukl, 2012).
counterproductive performance reflects on the: ‘intentional or unintentional acts (spector & fox, 2005) by an employee that negatively affect the effectiveness with which an organisation achieves its goals and cause harm to its stakeholders’ (campbell & wiernik, 2015; marcus et al., 2016).
each of the five broad performance dimensions is represented by four narrow performance dimensions, as shown in figure 2. evidence in support of the five factors and definitions of the narrow dimensions can be obtained from van lill and taylor (2022). as portrayed in figure 2, it is theorised that a general factor stands at the apex of all the performance dimensions identified in the iwpr. van lill and van der vaart (2022) found that a general factor explained a similar amount of variance in south africa as that reported by viswesvaran et al.’s (2005) meta-analytical study. the present study focused on the predictive validity of total ei for general and broad dimensions of individual work performance.

figure 2: broad and narrow dimensions of the individual work performance review.

predictive validity of a general factor of emotional intelligence
the evidence suggests that the criterion validity of ei is replicable in the south african context (nel & de villiers, 2004; sloan & geldenhuys, 2021). however, it is less clear what specific work-related behaviours ei predicts when compared to more established international scientific literature. performance is a multidimensional construct, and van lill and taylor (2022) suggest a five-factor model for individual behaviours at work, namely in-role, extra-role, adaptive, leadership and counterproductive performance. the only local evidence that differentiated between performance dimensions was the study conducted by sloan and geldenhuys (2021), which focused on both in-role and extra-role performance as work-related outcomes.
meta-analytical evidence to date suggests that ei is predictive of in-role performance, also referred to as ‘task performance’ (joseph et al., 2015; sackett et al., 2021). emotional self-regulation is more frequently recognised as a core part of functioning in social enterprises and, therefore, the ability to succeed at essential tasks (joseph et al., 2015). there is also evidence in support of the relationship between ei and extra-role performance. employees with higher ei might be more empathetic and prosocial, which could, in turn, lead to greater displays of extra-role performance or actions aimed at doing more than what is required by their job descriptions (miao et al., 2017). emotional intelligence might also assist employees in better coping with negative emotions that arise from interpersonal strain or frustrating tasks. consequently, they are less inclined to engage in deviant intrapersonal or interpersonal behaviours that undermine collective goals (miao et al., 2017). it further appears that ei could assist individuals in coping with the strain associated with the complexities of change at work.
in this respect, ei could help individuals to downregulate negative emotions in response to uncertainty and help them increase positive feelings, in order to stay focused on solutions in response to change. emotional intelligence is, therefore, also argued to be related to adaptive performance (yang et al., 2022). finally, ei might translate into greater self-confidence, self-awareness and empathy, which are essential components of interpersonal influence and, therefore, leadership performance (harms & credé, 2010). an overview of the evidence presented suggests that ei is likely to be related to all five performance dimensions, namely in-role, extra-role, adaptive, leadership and counterproductive performance.
the eq-i 2.0 has been used across multiple industries in the united states of america, with accumulating evidence of the instrument’s utility in differentiating between high- and low-performing individuals (stein & book, 2011). however, limited research has been conducted on the predictive validity of the general factor of the eq-i 2.0 in south africa, which was one of the areas of focus of the current study.

research objective and hypotheses
the objective of this study was to determine the structural and criterion validity of a general dimension of ei in the eq-i 2.0 assessment. based on the current evidence reported in the present study, the following hypotheses were formulated:
study 1
h1: the general factor of ei explains covariance between the items, independent of the covariance that the 15 facets explain in the same set of items.
study 2
h2: general ei has a significant positive effect on overall job performance.
h2a: general ei has a significant positive effect on in-role performance.
h2b: general ei has a significant positive effect on extra-role performance.
h2c: general ei has a significant positive effect on adaptive performance.
h2d: general ei has a significant positive effect on leadership performance.
h2e: general ei has a significant negative effect on counterproductive performance.

method
participants
for study 1, a sample of 16 581 working adults living in southern africa was obtained via an online platform. the mean age of the respondents was 37.94 years (standard deviation [sd] = 8.86). most respondents self-identified as male (n = 9427; 57%), followed by female (n = 7154; 43%). the sample further included individuals who indicated their ethnicity as follows: black (n = 6755; 41%), white (n = 4915; 30%), coloured (mixed ancestry; n = 1675; 10%) and indian or asian (n = 1838; 11%). the researchers computed the power for the test model (degrees of freedom [df] = 6667), based on the computer software developed by preacher and coffman (2006). the models returned a value of unity, which suggested that an incorrect model would be correctly rejected (α = 0.05; null root mean square error of approximation [rmsea] = 0.05; alternative rmsea = 0.08).
for study 2, a total of 108 performance ratings of south african employees, who were also administered the eq-i 2.0, were completed by managers in two participating organisations, selected via a census or stratified sampling strategy. the sample represented the finance and professional services sectors. the mean age of employees was 38.88 years (sd = 7.78 years). most of the employees self-identified as white (n = 65; 60%), followed by black african (n = 18; 17%), indian (n = 16; 15%), coloured (individuals of mixed ancestry; n = 8; 7%) and asian (n = 1; 1%). the sample comprised more women (n = 65; 60%) than men (n = 43; 40%). most of the employees were registered professionals (n = 53; 49%), followed by low-level managers (n = 25; 23%), mid-level managers (n = 16; 15%), skilled employees (n = 12; 11%) and top-level managers (n = 2; 2%). the present researchers inspected the statistical power required for linear bivariate regression by using g*power (faul et al., 2007).
the calculation suggested that 64 participants should be sufficient (α = 0.05; power = 0.80) to detect a slope of 0.30, per prior meta-analytical validity estimates reported (sackett et al., 2021). the present sample was roughly double the recommended size based on this calculation.

instruments
the eq-i 2.0 assessment consists of 133 items; that is, an average of eight items in each of the 15 subscales (i.e. three subscales per composite). eight items contribute to a well-being indicator, also referred to as the ‘happiness scale’, while seven of these items are also used as a validity measurement. the eq-i 2.0 model comprises a 1-5-15 structure, where the 15 subscales underlie the five composites that all contribute to one total ei ‘snapshot’ (see figure 1). a five-point likert-type frequency scale provides the response options for each item. the response options range from 1 = never/rarely to 5 = almost always/always, with a qualitative interpretation guide connected to the meaning of each option (wiechorek, 2011). the internal consistency reliabilities for the south african sample on most of the eq-i 2.0 subscales were satisfactory (α and ω ≥ 0.71). only one subscale had an unsatisfactory internal consistency reliability coefficient of 0.66, namely independence. however, this reliability coefficient was still considered marginally acceptable (see table 1).

table 1: descriptive statistics for the eq-i 2.0 subscale factors.

the iwpr (van lill & taylor, 2022) consists of 80 items (4 items for each of the 20 narrow performance dimensions) that cover five factors, namely in-role performance, extra-role performance, adaptive performance, leadership performance and counterproductive performance. per the guidelines of aguinis (2019), each item was measured using a five-point behavioural frequency scale. word anchors defined the extreme points of each scale, namely, 1 = never demonstrated and 5 = always demonstrated.
qualitative interpretation of numeric values between the extreme points is provided, to better approximate an interval rating scale, namely 2 = rather infrequently demonstrated, 3 = demonstrated some of the time and 4 = quite often demonstrated. van lill and taylor (2022) demonstrated the internal consistency reliability of all the narrow dimensions of the iwpr (α and ω ≥ 0.83).

procedure
the data on the eq-i 2.0 (n = 16 581) were collected as part of several archival client projects on the jvr online platform. data were collected via online assessments for either selection or development purposes. a concurrent set of data was separately collected by asking managers of the 108 employees, who simultaneously completed the eq-i 2.0, to rate their employees’ performance. a study conducted by van lill and van der merwe (2022) revealed that employees greatly inflate self-ratings on the iwpr (van lill & taylor, 2022) when compared to managerial ratings, because of leniency bias. managers might therefore provide a more conservative and accurate estimate of work performance (van lill & van der merwe, 2022). managerial ratings also come with the added benefit of reduced method bias, because a rating source other than the self-ratings on the eq-i 2.0 is used (podsakoff et al., 2012).

data analysis
study 1: confirmatory factor analysis
confirmatory factor analysis (cfa) was performed using version 0.6–12 of the lavaan package (rosseel, 2012; rosseel et al., 2022) in r (r core team, 2016) to first inspect the inter-factor correlations between all the narrow ei factors, whereafter the hierarchical factor structure of the broad performance factors was investigated. a prior study conducted a higher-order factor analysis to inspect the general factor of ei in the eq-i 2.0 (van zyl, 2014).
recent best practice guidelines recommend testing a sequence of five models before the presence of hierarchical structure of a psychometric measure is confirmed or refuted, namely (1) orthogonal first-order, (2) single-factor, (3) higher-order, (4) oblique lower-order and (5) bifactor models (credé & harms, 2015). a visual example of the different factor models is portrayed for composite stress management in figure 3. single-factor models (all items load on one factor) and orthogonal first-order models (factor models with uncorrelated lower-order factors) represent parsimonious models, and, if these models display greater fit, it might discredit the existence of hierarchical structure in the data. by contrast, better fit for lower-order (factor models with correlated factors), higher-order (items load on lower-order factors, which, in turn, load onto second-order factors) and bifactor models (items are specified simultaneously on uncorrelated first- and second-order factors) provides more support for hierarchical structure (credé & harms, 2015).

figure 3: factor structure of stress management (a) orthogonal first-order model, (b) single-factor model, (c) higher-order model, (d) oblique lower-order model, (e) bifactor model.

first-order factors, as specified in higher-order models, mediate the relationship between manifest variables and second-order factors; the second-order factor, therefore, does not explain unique variance in the manifest variables over and above the ei subscales (beaujean, 2014; mcabee et al., 2014). bifactor models differ in this respect by accounting for the unique variance that a general factor explains in the manifest variables, beyond the variance explained by the uncorrelated lower-order ei subscales (beaujean, 2014; mcabee et al., 2014). bifactor models were, therefore, used to test the existence of a general factor in ei, as manifested among the subscales of the eq-i 2.0.
diagonally weighted least squares (dwls) estimation, with robust standard errors, was performed to inspect the hierarchical factor structure of ei (distefano & morgan, 2014; li, 2016). this method provides accurate estimates when larger samples are used (n > 500) and the data are multivariate non-normal, and is less sensitive when the data are based on an ordinal rating scale with five or more categories (distefano & morgan, 2014; li, 2016). the multivariate skewness (1234726.00; p < 0.001) and kurtosis (1200.95; p < 0.001) for the entire set of 118 items suggested that the data were non-normally distributed. model-data fit was considered acceptable if the rmsea and standardised root mean squared residual (srmr) were ≤ 0.08 (brown, 2015; browne & cudeck, 1992), and the comparative fit index (cfi) and tucker–lewis index (tli) were > 0.95 (brown, 2015; hu & bentler, 1999). even when cfi and tli display only a marginally good fit to the data (values in the range of 0.90 to 0.95), models might still be considered to display an acceptable fit if other indices (srmr and rmsea) are also within the acceptable range (brown, 2015).

study 2: regression analysis
the researchers conducted separate linear regressions by means of the lm function in r (r core team, 2016). for the different models, a summed raw total ei score was regressed on a general dimension, as well as separate broad dimensions, of individual work performance.

ethical considerations
the current study was low in risk, but precautions were taken to ensure that participation was voluntary and anonymous, that no harm was caused, that the questions were answered truthfully and that informed consent was given to use the results for research purposes. ethical clearance to conduct this study was obtained from the university of johannesburg department of industrial psychology and people management research ethics committee (reference number: ippm-2022-598).

results
study 1: confirmatory factor analysis
the mean item score and sd for each subscale of the eq-i 2.0, along with the alpha and omega reliability estimates and standardised inter-factor correlations, are reported in table 1. the inter-factor correlations were obtained by conducting an oblique lower-order confirmatory factor model. the fit statistics for the oblique lower-order confirmatory factor model of the entire eq-i 2.0 (χ² [df] = 254827.62 [6680]; cfi = 0.97; tli = 0.97; srmr = 0.05; rmsea = 0.05 [0.05; 0.05]) were satisfactory (brown, 2015). the oblique lower-order model, whose fit statistics are also reported in table 2, allows group factors to covary and enables an inspection of the inter-factor correlations between the dimensions for descriptive purposes.

table 2: fit statistics of different eq-i 2.0 factor models.

the size of the relationships reported in table 1 was mostly in the medium-to-large range, alluding to the possibility of a general factor. however, 95% of the standardised upper limit inter-factor correlations (ul) were below the cut-off recommended by rönkkö and cho (2020), namely ul < 0.80, suggesting a fair degree of discriminant validity at the subscale level. in a select few cases (5% of ul), insufficient evidence for discriminant validity existed. however, the lower levels of discriminant validity could be attributable to established theoretical relationships that have been reported between the scales in the past (rönkkö & cho, 2020; wiechorek, 2011). table 1 also contains the inter-item consistency reliabilities. all but one of the subscales obtained ordinal coefficient alpha and mcdonald’s omega values (α and ω ≥ 0.71) above the recommended threshold (cortina, 1993; cortina et al., 2020). only the independence subscale had an ω-value of 0.66, which was still accepted, as the ordinal α value was 0.77. these results suggest that the subscales reliably measure the respective scale constructs.
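as a rough plausibility check on the fit statistics reported for the oblique lower-order model, the sketch below recomputes a point estimate of the rmsea from the reported χ², df and sample size, and applies the cut-offs stated in the data analysis section. it is an illustration only: it uses the textbook rmsea formula with n − 1 in the denominator, whereas the published value comes from robust dwls estimation in lavaan and need not match to every decimal. the function names `rmsea` and `fit_label` are hypothetical helpers introduced here, not part of the original analysis.

```python
import math


def rmsea(chi_square: float, df: int, n: int) -> float:
    """Textbook RMSEA point estimate: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


def fit_label(rmsea_value: float, srmr: float, cfi: float, tli: float) -> str:
    """Classify model-data fit with the cut-offs stated in the method section.

    acceptable: RMSEA and SRMR <= 0.08, CFI and TLI > 0.95
    marginal:   RMSEA and SRMR <= 0.08, CFI and TLI between 0.90 and 0.95
    poor:       anything else
    """
    residuals_ok = rmsea_value <= 0.08 and srmr <= 0.08
    if residuals_ok and cfi > 0.95 and tli > 0.95:
        return "acceptable"
    if residuals_ok and cfi >= 0.90 and tli >= 0.90:
        return "marginal"
    return "poor"


# Values reported for the oblique lower-order model of the eq-i 2.0 (study 1).
estimate = rmsea(chi_square=254827.62, df=6680, n=16581)
print(f"approximate rmsea: {estimate:.3f}")
print(fit_label(estimate, srmr=0.05, cfi=0.97, tli=0.97))
```

the recomputed value rounds to the reported rmsea of 0.05, which suggests that the reported statistics are internally consistent under this simple formula.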
the fit of different factor models proposed by credé and harms (2015) was subsequently investigated to determine whether a general factor or alternative configurations explained the covariances between the subscale dimensions of the eq-i 2.0. the different models and their fit statistics are reported in table 2. the results indicated that the more parsimonious models, namely the orthogonal first-order and single-factor models, displayed a weaker fit to the data (credé & harms, 2015). by contrast, the more complex models (oblique lower-order, higher-order and bifactor models) displayed superior fit, supporting the existence of hierarchical structure in the data. the present study focussed on the manifestation of a general factor of ei and, therefore, necessitated a further inspection of the satisfactorily fitting bifactor model, instead of the slightly better fitting oblique lower-order factor model. it is recommended that bifactor statistical indices be calculated to determine the practical meaningfulness of general versus group factors in a bifactor analysis (rodriguez et al., 2016a, 2016b), such as the explained common variance (ecv), coefficient omega hierarchical (ωh), construct replicability (h), factor determinacy (fd), percentage of uncontaminated correlations (puc) and average relative parameter bias (arpb). group factors are considered plausible when ωh, h and fd2 are > 0.50, 0.70 and 0.70, respectively (dueber, 2017; reise et al., 2013). an explained common variance for the general factor > 0.70 and a puc > 0.80 are indicative of unidimensionality (reise et al., 2013). an average relative parameter bias below 10% – 15% indicates little difference in the factor loadings between a single-factor model and the general factor in a bifactor model (rodriguez et al., 2016a). bifactor statistical indices were calculated using version 0.2.0 of the bifactor indices calculator package (dueber, 2020) in r (r core team, 2016). the bifactor statistical indices are reported in table 3.
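to make three of the indices named above concrete, the following minimal sketch computes ecv, puc and ωh from standardised bifactor loadings, using the standard definitions of these indices. it is written in python for illustration and is not the bifactor indices calculator package used in the study. the function name `bifactor_indices`, the data layout and all loadings are hypothetical, and the sketch assumes that every item loads on the general factor and on exactly one group factor.

```python
def bifactor_indices(general, groups):
    """Compute ECV, PUC and omega hierarchical for a bifactor solution.

    general: list of standardised general-factor loadings, one per item.
    groups:  dict mapping a group-factor name to {item_index: loading}.
    Assumes each item belongs to exactly one group factor.
    """
    n_items = len(general)

    # ECV: share of common variance attributable to the general factor.
    general_sq = sum(l ** 2 for l in general)
    group_sq = sum(l ** 2 for items in groups.values() for l in items.values())
    ecv = general_sq / (general_sq + group_sq)

    # PUC: share of item pairs whose correlation reflects only the general
    # factor, i.e. pairs of items that sit in different group factors.
    total_pairs = n_items * (n_items - 1) / 2
    within_pairs = sum(len(items) * (len(items) - 1) / 2 for items in groups.values())
    puc = (total_pairs - within_pairs) / total_pairs

    # Omega hierarchical: general-factor variance of the total score divided
    # by total-score variance (general + group + unique components).
    communality = [l ** 2 for l in general]
    for items in groups.values():
        for index, loading in items.items():
            communality[index] += loading ** 2
    unique_var = sum(1 - h2 for h2 in communality)
    general_part = sum(general) ** 2
    group_part = sum(sum(items.values()) ** 2 for items in groups.values())
    omega_h = general_part / (general_part + group_part + unique_var)

    return {"ecv": ecv, "puc": puc, "omega_h": omega_h}


# Hypothetical loadings: six items, two group factors of three items each.
example = bifactor_indices(
    general=[0.6] * 6,
    groups={
        "group_a": {0: 0.4, 1: 0.4, 2: 0.4},
        "group_b": {3: 0.4, 4: 0.4, 5: 0.4},
    },
)
print({name: round(value, 3) for name, value in example.items()})
```

with only two group factors the puc is low in this toy case; an instrument with many small subscales has far more between-subscale item pairs, which is why a high puc is plausible for a multi-subscale measure.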
table 3: bifactor statistical indices for eq-i 2.0 subscale factors.

the general ei factor accounted for over half of the common variance. the ecv > 0.50, including the high puc > 0.80, suggests the presence of a strong general factor (reise et al., 2013). these results were further supported by the arpb value of 5%, which was well below the 10% – 15% mark, indicating no serious concern of measurement bias if the model was treated as unidimensional (rodriguez et al., 2016a). the subscale values all had a large difference between the ω and ωh values, with ωh ranging from 0.14 to 0.54. this normally suggests that the subscales mostly do not add additional unique variance, and that the factor model is unidimensional. however, morin (2023) cautions against the use of ω and ωhs, as both tend to underestimate the reliability of group factors. the general trend of the fd and h coefficients, as well as the evidence of the discriminant validity of the group factors presented in table 1, suggests that the subscales in the eq-i 2.0 add additional interpretive value for development purposes (dueber, 2017; reise et al., 2013). the subscales also still explain the remaining 42% of the variance in the items. consequently, at a cursory view of the hierarchical structure and without discounting the value of the subscales, it can be argued that a general ei factor exists, which supports hypothesis 1.

study 2: regression analysis

linear regressions were conducted to determine the relationship between overall ei and individual work performance. the regression coefficients of the different relationships are reported in table 4.

table 4: total emotional intelligence regressed on general and broad dimensions of individual work performance.
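a brief sketch may clarify why a simple linear regression can be read as a validity coefficient: with a single predictor, the standardised slope equals the pearson correlation between the two variables. the python code below illustrates this with ordinary least squares computed by hand. it is not the study's r code, and the function name `simple_regression`, the variable names and the scores are all hypothetical.

```python
from statistics import mean


def simple_regression(x, y):
    """Ordinary least squares with one predictor.

    Returns the raw slope, intercept, Pearson r and standardised slope.
    """
    mean_x, mean_y = mean(x), mean(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    # Rescaling the slope by the ratio of standard deviations standardises it.
    beta = slope * (sxx / syy) ** 0.5
    return {"slope": slope, "intercept": intercept, "r": r, "beta": beta}


# Hypothetical scores, not data from the study.
predictor = [1, 2, 3, 4, 5]
outcome = [2, 1, 4, 3, 5]
fit = simple_regression(predictor, outcome)
print(fit)
```

because the standardised slope and r coincide in the single-predictor case, the coefficients in such a table can be compared directly with meta-analytic correlations, although corrected estimates (ρ) and observed correlations (r) are not strictly like for like.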
the validity estimate for general performance in the present study (r = 0.39) appears to be slightly higher than prior meta-analytical estimates, namely ρ = 0.29 (joseph et al., 2015) and ρ = 0.30 (sackett et al., 2021). however, the estimate was still in the same direction and, similar to prior findings, moderate in size. hypothesis 2 was, therefore, confirmed. the correlations for the broad dimensions of performance also appeared mostly moderate in size (r = 0.25 to 0.42; m = 0.35) and in the theorised directions. therefore, hypotheses 2a to 2e could also be confirmed. the regression coefficient for leadership performance appeared to be pronounced, which could be attributed to the high emotional labour often associated with interpersonal influence (glomb et al., 2004).

discussion

study 1 supported the existence of a strong general factor of ei, which explained 58% of the common variance. practically, the findings suggest that a total score of ei, based on the eq-i 2.0, could be calculated. a total score might enable scientists to include an overall ei score in regression analyses. a total score could also aid practitioners in including an overall ei score in selection decisions, especially when other psychometric results must be considered simultaneously. however, a nuanced interpretation based on the subscales still has merit, as it further ‘colours’ a person’s strengths and weaknesses, especially for use in development feedback and work-based counselling. an overall ei score might come across as more ambiguous and less actionable from a development perspective, compared to a more nuanced interpretation (wiechorek, 2011). in terms of study 2, total ei appeared to have a pattern of relations with performance that replicates international results on the predictive validity of ei.
employees who display a fair degree of ei might be valued by supervisors as stable, well-functioning individuals and therefore be perceived as high performers. more specifically, ei could be a valuable psychological resource in uncertain conditions, such as the strain placed on employees during the coronavirus disease 2019 pandemic (moroń & biolik-moroń, 2021). in such conditions, based on the findings of the present study, ei might assist employees in adapting to change (adaptive performance) and initiate the necessary interpersonal influence (leadership performance) to ensure that organisational goals are achieved, whether within or away from the office setting. the predictive validity of total ei for one of the most valued work-related outcomes in psychology, namely individual work performance (campbell & wiernik, 2015), gives credence to the utility of such a score in employee selection processes.

limitations and recommendations for future research

mean group gender differences have been reported for ei in the past, with scholars suggesting that men might be adversely affected in selection procedures if a total score of ei is used in isolation to make a hiring decision (joseph & newman, 2010). multi-group cfas suggest that composite scales in the eq-i 2.0 are invariant across gender groups. an inspection of mean group differences further revealed only small effect size differences (stols & van lill, 2022). further research could inspect the invariance and mean group gender differences of a general dimension of ei based on the eq-i 2.0 in the southern african context. the predictive validity of total ei was mainly based on professional and managerial staff, many of whom were employed in the financial and health sector, for whom high ei might be a more pronounced occupational requirement.
most other southern african studies appeared to have sampled managerial employees, and future studies could inspect whether these relationships are replicable for other job families in south africa, such as skilled/semiskilled, clerical, military and law enforcement jobs.

conclusion

prior studies in south africa frequently used total scores on ei to predict work-related outcomes without considering the hierarchical structure of the measure (hayward et al., 2008; nel & de villiers, 2004; sloan & geldenhuys, 2021). this study inspected and presented evidence in favour of the calculation of a general factor of ei based on the eq-i 2.0. the evidence further suggests that the total ei score, in accordance with the findings of nel and de villiers (2004) and sloan and geldenhuys (2021), yields meaningful validity estimates for work-related outcomes, such as work performance. consequently, using the total ei score in a report might be meaningful when, for example, selection decisions need to be made about employees.

acknowledgements

we would like to thank our colleagues at jvr africa group for enriching our understanding, through conversation, on the structure and predictive validity of emotional intelligence.

competing interests

all four authors are employees of jvr africa group, which is the distributor of the eq-i 2.0 in south africa.

authors’ contributions

x.v.l., a.s., p.r. and j.w. developed the conceptual framework and devised the method. x.v.l. and a.s. analysed the data.

funding information

this research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.

data availability

coefficients based on the bifactor analysis are available upon reasonable request.

disclaimer

the views and opinions expressed in this article are those of the authors and do not necessarily reflect the official policy or position of any affiliated agency of the authors.

references

aguinis, h. (2019). performance management (4th ed.).
chicago business press. ashkanasy, n.m., & daus, c.s. (2005). rumors of the death of emotional intelligence in organizational behavior are vastly exaggerated. journal of organizational behavior, 26(4), 441–452. https://doi.org/10.1002/job.320 beaujean, a.a. (2014). latent variable modeling using r: a step-by-step guide. routledge. borman, w.c., & motowidlo, s.j. (1997). task performance and contextual performance: the meaning for personnel selection research. human performance, 10(2), 99–109. https://doi.org/10.1207/s15327043hup1002_3 brown, t.a. (2015). confirmatory factor analysis for applied research (2nd ed.). the guilford press. browne, m.w., & cudeck, r. (1992). alternative ways of assessing model fit. sociological methods & research, 21(2), 230–258. https://doi.org/10.1177/0049124192021002005 campbell, j.p., & wiernik, b.m. (2015). the modeling and assessment of work performance. annual review of organizational psychology and organizational behavior, 2(1), 47–74. https://doi.org/10.1146/annurev-orgpsych-032414-111427 carpini, j.a., parker, s.k., & griffin, m.a. (2017). a look back and a leap forward: a review and synthesis of the individual work performance literature. academy of management annals, 11(2), 825–885. https://doi.org/10.5465/annals.2015.0151 cortina, j.m. (1993). what is coefficient alpha? an examination of theory and applications. journal of applied psychology, 78(1), 98–104. https://doi.org/10.1037/0021-9010.78.1.98 cortina, j.m., sheng, z., keener, s.k., keeler, k.r., grubb, l.k., schmitt, n., tonidandel, s., summerville, k.m., heggestad, e.d., & banks, g.c. (2020). from alpha to omega and beyond! a look at the past, present, and (possible) future of psychometric soundness in the journal of applied psychology. journal of applied psychology, 105(12), 1351–1381. https://doi.org/10.1037/apl0000815 credé, m., & harms, p.d. (2015). 
25 years of higher-order confirmatory factor analysis in the organizational sciences: a critical review and development of reporting recommendations. journal of organizational behavior, 36(6), 845–872. https://doi.org/10.1002/job.2008 dasborough, m.t., ashkanasy, n.m., humphrey, r.h., harms, p.d., credé, m., & wood, d. (2021). does leadership still not need emotional intelligence? continuing ‘the great ei debate’. the leadership quarterly, 33(6), 101539. https://doi.org/10.1016/j.leaqua.2021.101539 distefano, c., & morgan, g.b. (2014). a comparison of diagonal weighted least squares robust estimation techniques for ordinal data. structural equation modeling, 21(3), 425–438. https://doi.org/10.1080/10705511.2014.915373 dueber, d.m. (2017). bifactor indices calculator: a microsoft excel-based tool to calculate various indices relevant to bifactor cfa models. uknowledge, university of kentucky. dueber, d.m. (2020). bifactor indices calculator. retrieved from https://cran.r-project.org/web/packages/bifactorindicescalculator/bifactorindicescalculator.pdf faul, f., erdfelder, e., lang, a.-g., & buchner, a. (2007). g*power 3: a flexible statistical power analysis program for the social, behavioral, and biomedical sciences. behavior research methods, 39(2), 175–191. https://doi.org/10.3758/bf03193146 glomb, t.m., kammeyer-mueller, j.d., & rotundo, m. (2004). emotional labor demands and compensating wage differentials. journal of applied psychology, 89(4), 700–714. https://doi.org/10.1037/0021-9010.89.4.700 harms, p.d., & credé, m. (2010). emotional intelligence and transformational and transactional leadership: a meta-analysis. journal of leadership and organizational studies, 17(1), 5–17. https://doi.org/10.1177/1548051809350894 hayward, b.a., baxter, j., & amos, t.l. (2008). employee performance, leadership style and emotional intelligence: an exploratory study in a south african parastatal. acta commercii, 8(1), a57. 
https://doi.org/10.4102/ac.v8i1.57 hogan, r., & sherman, r.a. (2020). personality theory and the nature of human nature. personality and individual differences, 152, 1–5. https://doi.org/10.1016/j.paid.2019.109561 hu, l., & bentler, p.m. (1999). cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. structural equation modeling: a multidisciplinary journal, 6(1), 1–55. https://doi.org/10.1080/10705519909540118 joseph, d.l., jin, j., newman, d.a., & o’boyle, e.h. (2015). why does self-reported emotional intelligence predict job performance? a meta-analytic investigation of mixed ei. journal of applied psychology, 100(2), 298–342. https://doi.org/10.1037/a0037681 joseph, d.l., & newman, d.a. (2010). emotional intelligence: an integrative meta-analysis and cascading model. journal of applied psychology, 95(1), 54–78. https://doi.org/10.1037/a0017286 li, c.-h. (2016). confirmatory factor analysis with ordinal data: comparing robust maximum likelihood and diagonally weighted least squares. behavior research methods, 48(3), 936–949. https://doi.org/10.3758/s13428-015-0619-7 marcus, b., taylor, o.a., hastings, s.e., sturm, a., & weigelt, o. (2016). the structure of counterproductive work behavior: a review, a structural meta-analysis, and a primary study. journal of management, 42(1), 203–233. https://doi.org/10.1177/0149206313503019 mayer, j.d., salovey, p., caruso, d.r., & sitarenios, g. (2003). measuring emotional intelligence with the msceit v2.0. emotion, 3(1), 97–105. https://doi.org/10.1037/1528-3542.3.1.97 mcabee, t.s., oswald, l.f., & connelly, s.b. (2014). bifactor models of personality and college student performance. european journal of personality, 28(6), 604–619. https://doi.org/10.1002/per.1975 miao, c., humphrey, r.h., & qian, s. (2017). are the emotionally intelligent good citizens or counterproductive? 
a meta-analysis of emotional intelligence and its relationships with organizational citizenship behavior and counterproductive work behavior. personality and individual differences, 116, 144–156. https://doi.org/10.1016/j.paid.2017.04.015 morin, a.j.s. (2023). exploratory structural equation modeling. in r.h. hoyle (ed.), handbook of structural equation modeling (2nd ed., pp. 503–524). guilford. moroń, m., & biolik-moroń, m. (2021). trait emotional intelligence and emotional experiences during the covid-19 pandemic outbreak in poland: a daily diary study. personality and individual differences, 168, 1–11. https://doi.org/10.1016/j.paid.2020.110348 motowidlo, s.j., & van scotter, j.r. (1994). evidence that task performance should be distinguished from contextual performance. journal of applied psychology, 79(4), 475–480. https://doi.org/10.1037/0021-9010.79.4.475 nel, h., & de villiers, w.s. (2004). the relationship between emotional intelligence and job performance in a call centre environment. sa journal of industrial psychology, 30(3), a159. https://doi.org/10.4102/sajip.v30i3.159 organ, d.w. (1997). organizational citizenship behavior: it’s construct clean-up time. human performance, 10(2), 85–97. https://doi.org/10.1207/s15327043hup1002_2 petrides, k.v., mikolajczak, m., mavroveli, s., sanchez-ruiz, m.-j., furnham, a., & pérez-gonzález, j.-c. (2016). developments in trait emotional intelligence research. emotion review, 8(4), 335–341. https://doi.org/10.1177/1754073916650493 podsakoff, p.m., mackenzie, s.b., & podsakoff, n.p. (2012). sources of method bias in social science research and recommendations on how to control it. annual review of psychology, 63(1), 539–569. https://doi.org/10.1146/annurev-psych-120710-100452 preacher, k.j., & coffman, d.l. (2006). computing power and minimum sample size for rmsea [computer software]. retrieved from http://quantpsy.org/ pulakos, e.d., arad, s., donovan, m.a., & plamondon, k.e. (2000). 
adaptability in the workplace: development of a taxonomy of adaptive performance. journal of applied psychology, 85(4), 612–624. https://doi.org/10.1037/0021-9010.85.4.612 r core team. (2016). r: a language and environment for statistical computing. reference index. retrieved from https://cran.r-project.org/doc/manuals/r-release/fullrefman.pdf reise, s.p., bonifay, w.e., & haviland, m.g. (2013). scoring and modeling psychological measures in the presence of multidimensionality. journal of personality assessment, 95(2), 129–140. https://doi.org/10.1080/00223891.2012.725437 rodriguez, a., reise, s.p., & haviland, m.g. (2016a). applying bifactor statistical indices in the evaluation of psychological measures. journal of personality assessment, 98(3), 223–237. https://doi.org/10.1080/00223891.2015.1089249 rodriguez, a., reise, s.p., & haviland, m.g. (2016b). evaluating bifactor models: calculating and interpreting statistical indices. psychological methods, 21(2), 137–150. https://doi.org/10.1037/met0000045 rönkkö, m., & cho, e. (2020). an updated guideline for assessing discriminant validity. organizational research methods, december, 25(1), 6–14. https://doi.org/10.1177/1094428120968614 rosseel, y. (2012). lavaan: an r package for structural equation modeling. journal of statistical software, 48(2), 1–36. https://doi.org/10.18637/jss.v048.i02 rosseel, y., jorgensen, t., & rockwood, n. (2022). latent variable analysis. retrieved from https://cran.r-project.org/web/packages/lavaan/index.html sackett, p.r., zhang, c., berry, c.m., & lievens, f. (2021). revisiting meta-analytic estimates of validity in personnel selection: addressing systematic overcorrection for restriction of range. journal of applied psychology, 107(11), 2040–2068. https://doi.org/10.1037/apl0000994 sloan, m., & geldenhuys, m. (2021). regulating emotions at work: the role of emotional intelligence in the process of conflict, job crafting and performance. sa journal of industrial psychology, 47, a1875. 
https://doi.org/10.4102/sajip.v47i0.1875 spector, p.e., & fox, s. (2005). the stressor–emotion model of counterproductive work behavior. in p.e. spector & s. fox (eds.), counterproductive work behavior (pp. 151–174). american psychological association. stein, s.j., & book, h.e. (2011). the eq edge: emotional intelligence and your success. john wiley & sons. stols, a., & van lill, x. (2022). emotional quotient inventory 2.0 (eq-i 2.0): a south african english technical manual supplement. multi-health systems and jvr africa group. van lill, x., & taylor, n. (2022). the validity of five broad generic dimensions of performance in south africa. south african journal of human resource management, 20(0), 1–15. https://doi.org/10.4102/sajhrm.v20i0.1844 van lill, x., & van der merwe, g. (2022). differences in self- and managerial ratings on generic performance dimensions. sa journal of industrial psychology, 48, 1–10. https://doi.org/10.4102/sajip.v48i0.2045 van lill, x., & van der vaart, l. (2023). the validity of a general factor of individual work performance in the south african context. manuscript submitted for publication. van zyl, c.j.j. (2014). the psychometric properties of the emotional quotient inventory 2.0 in south africa. sa journal of industrial psychology, 40(1), a1192. https://doi.org/10.4102/sajip.v40i1.1192 viswesvaran, c., schmidt, f.l., & ones, d.s. (2005). is there a general factor in ratings of job performance? a meta-analytic framework for disentangling substantive and error influences. journal of applied psychology, 90(1), 108–131. https://doi.org/10.1037/0021-9010.90.1.108 wiechorek, d. (2011). emotional quotient inventory 2.0: user’s handbook. multi-health systems. yang, h., weng, q., li, j., & wu, s. (2022). exploring the relationship between trait emotional intelligence and adaptive performance: the role of situational strength and self-efficacy. personality and individual differences, 196, 111711.
https://doi.org/10.1016/j.paid.2022.111711 yukl, g. (2012). effective leadership behavior: what we know and what questions need more attention. academy of management perspectives, 26(4), 66–85. https://doi.org/10.5465/amp.2012.0088

http://www.ajopa.org open access

reviewer acknowledgement

acknowledgement to reviewers

in an effort to facilitate the selection of appropriate peer reviewers for the african journal of psychological assessment, we ask that you take a moment to update your electronic portfolio on https://ajopa.org for our files, allowing us better access to your areas of interest and expertise, in order to match reviewers with submitted manuscripts. if you would like to become a reviewer, please visit the journal website and register as a reviewer. to access your details on the website, you will need to follow these steps:

1. log into the online journal at https://ajopa.org
2. in your ‘user home’ [https://ajopa.org/index.php/ajopa/user] select ‘edit my profile’ under the heading ‘my account’ and insert all relevant details, bio statement and reviewing interest(s).
3. it is good practice as a reviewer to update your personal details regularly to ensure contact with you throughout your professional term as reviewer to african journal of psychological assessment.

please do not hesitate to contact us if you require assistance in performing this task.

publisher: publishing@aosis.co.za
tel: +27 21 975 2602
tel: 086 1000 381

the editorial team of the african journal of psychological assessment recognises the value and importance of the peer reviewer in the overall publication process – not only in shaping the individual manuscript, but also in shaping the credibility and reputation of our journal.
we are committed to the timely publication of all original, innovative contributions submitted for publication. as such, the identification and selection of reviewers who have expertise and interest in the topics appropriate to each manuscript are essential elements in ensuring a timely, productive peer review process. we would like to take this opportunity to thank all reviewers who participated in shaping this volume of the african journal of psychological assessment. we appreciate the time taken to perform your review(s) successfully. amanda cromhout angelina wilson fadiji anwynne kern brandon morgan brian mumba casper j. van zyl celeste m. combrinck charles h. van wijk cobi hayes david j.f. maree erica munnik ghouwa ismail itumeleng p. khumalo jacques s. pienaar jarred h. martin justin o. august kate cockcroft kevin distiller kim e. dowdeswell maria damianova marilyn lucas nicola taylor nicoleen coetzee rené van eeden ruby patel solomon mashegoane suzanne bester tyrone b. pretorius vanessa scherman victoria williams xander van lill yaseen ally zaakirah mohamed

about the author(s)

leila abdool gafoor centre for psychological services and career development (psycad), university of johannesburg, johannesburg, south africa alban burke centre for psychological services and career development (psycad), university of johannesburg, johannesburg, south africa jean fourie department of education, faculty of education, university of johannesburg, johannesburg, south africa

citation

abdool gafoor, l., burke, a., & fourie, j. (2021). the efficacy of the senior south african individual scale revised in distinguishing between attention deficit hyperactivity disorder, normal and sluggish cognitive tempo children.
african journal of psychological assessment, 3(0), a45. https://doi.org/10.4102/ajopa.v3i0.45

original research

the efficacy of the senior south african individual scale revised in distinguishing between attention deficit hyperactivity disorder, normal and sluggish cognitive tempo children

leila abdool gafoor, alban burke, jean fourie

received: 11 dec. 2020; accepted: 18 june 2021; published: 29 july 2021

copyright: © 2021. the author(s). licensee: aosis. this is an open access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

abstract

the primary objective of this study was to determine whether attention deficit hyperactivity disorder (adhd), sluggish cognitive tempo (sct) and a non-clinical (nc) group of learners perform differently on the senior south african individual scale revised (ssais-r). the rationale for this study is based on literature that argues for sct to be considered as a separate and unique disorder to adhd. the ssais-r results of 618 (7–17 years of age) children were analysed for the purposes of this study. the total sample consisted of three groups, that is, adhd (n = 106), nc (n = 427) and sct (n = 85). between-group t-tests were performed to test for significant differences between the three groups with regard to the different ssais-r subtests. the results indicated significant differences between nc and adhd, nc and sct but not between adhd and sct. these results suggest that if sct is considered to be a separate disorder from adhd, then this is not evident in terms of the performance on the ssais-r. it is recommended that other cognitive and neuropsychological assessments be included in future research to ascertain whether sct, if it exists, affects performance differently to adhd.

keywords: cognitive performance; adhd; sct; ssais-r; attention; cognitive assessments.
background

many factors, such as psychosocial factors, learning disorders and other neurodevelopmental disorders, as described in the diagnostic and statistical manual of mental disorders (5th ed.; dsm-5) (american psychiatric association [apa], 2013), may affect the academic performance of south african learners. one of the most common neurodevelopmental disorders is attention deficit hyperactivity disorder (adhd), which is not the homogeneous disorder that many practitioners erroneously assume it to be (wilens & spencer, 2010). prevalence rates in south africa indicate that approximately 4% – 5% of children present with adhd (schellack & meyer, 2012). attention deficit hyperactivity disorder is a complicated, heterogeneous disorder as characterised by the different subtypes described in the dsm-5 (apa, 2013). clinically subthreshold symptoms and comorbid disorders pose an additional problem, as they complicate the diagnostic process. these problems may result in either over- or misdiagnosis of adhd (barkley, 2013). in addition, there seems to be a lack of standardised diagnostic procedures to assist with making a clear diagnosis. levy, hay, mcstephen, wood and waldman (1997) suggested that it may be better to describe adhd as a spectrum disorder where symptoms of attention, inhibition and motor activity regulation are placed on a continuum. as our understanding of adhd has become clearer, it has become evident that adhd is a complex developmental impairment that extends further than merely a problem of inattention (brown, 2002). although some authors such as milich, balentine and lynam (2001) placed an emphasis on attention problems, other authors such as barkley (1998) argued that adhd is a result of impaired inhibitory processes.
as a result of the different opinions regarding the role of inattentiveness in adhd there have been questions regarding the validity of the inattentive subtype of adhd and whether this subtype should not rather be considered to be a separate and unique disorder (barkley, 1998, 2001, 2016; lahey, 2001). in this regard, both barkley (2013, 2014) and becker (2019) have suggested that sluggish cognitive tempo (sct) is both similar to and different from adhd. the overlap seems to be mainly between sct and adhd of the inattentive subtype as illustrated by the following symptoms of sct: daydreaming, hypo-arousal, confusion, objectively inattentive, lethargy, slow psychomotor speed, difficulty in following instructions, drowsiness, apathy, internally distracted, slow task completion, lack of initiative and decline in sustained performance (barkley, 2018). the main difference between these two disorders seems to be that adhd is characterised by external distractibility whereas sct seems to be characterised by internal distractibility (becker & barkley, 2021). furthermore, impulsivity is one of the core categories of symptoms in adhd but is not a distinct symptom or cluster of symptoms of sct (barkley, 2005). another difference between the two disorders is that children with adhd tend to struggle with productivity whereas children with sct tend to struggle with accuracy (barkley, 2013). the pathogenesis of the two disorders also seems to be different (see table 1): adhd is characterised by an early onset whereas sct is characterised by a later onset (barkley, 2005). bruchmüller, margraf and schneider (2012) found in the south african context that adhd not only starts in childhood but persists into adolescence in most cases. it also seems as if there is stronger evidence for adhd being hereditary than for sct (barkley, 2005).

table 1: differences between sluggish cognitive tempo versus attention deficit hyperactivity disorder.
different socio-economic factors seem to play a role in these two disorders (barkley, 2012, 2013). sluggish cognitive tempo seems to be more prevalent in lower socio-economic groups than adhd (barkley, 2012), which implies that sct might be associated more with psychosocial difficulties than adhd is. the two disorders also seem to differ with regard to comorbid conditions where children with sct are more prone to internalising disorders whereas children with adhd are more prone to externalising disorders (barkley, 2005, 2011a, 2011b, 2012). although there may be some overlap between the symptoms of adhd and sct, the distinction is not only a question of semantics but, more importantly, a question of the treatment and management of children with sct. attention deficit hyperactivity disorder is a neurologically based disorder, which is characterised by a persistent pattern of inattention and/or hyperactivity-impulsivity that interferes with functioning or development (apa, 2013). some researchers argue that sct is nothing more than adhd of the inattentive subtype (jacobson et al., 2012) whereas others argue that sct could be conceptualised as a separate disorder (barkley, 2016; becker, 2013). however, many critics highlight a lack of a clear clinical description of sct and are also opposed to the name of the disorder as they view this as derogatory and misleading. as far as the latter is concerned there have been suggestions to change the name to concentration deficit disorder (cdd), which would be less offensive, keep the focus of the label on the disorder and summarise the core deficiency (barkley, 2014; becker, 2013). current studies on sct are gaining international momentum (lee et al., 2016), but none have been carried out in south africa.
given the link between sct and various psychosocial, socio-economic and cultural factors, it is important to investigate the possibility of sct in the south african context, as one should not underestimate the contribution of cultural influences on mental health (achenbach & rescorla, 2007). there is a distinct need for further research into sct, both nationally and internationally, especially as far as aetiology, diagnosis and treatment are concerned. accurate descriptions of sct symptoms may help to predict areas of functional difficulty in learners with poor academic performance (jacobson et al., 2012). in this regard, slow processing speed, which seems to be mainly attributed to adhd, and low arousal levels have been identified in children with sct (shanahan et al., 2006), which may explain poor academic performance. although becker (2019) argued that validated measures can be used to examine sct symptoms in different cultures, the reliability and validity of psychometric instruments that are used in south africa, such as the senior south african individual scale revised (ssais-r), remain a problem. many, if not most, of these assessments are outdated and have not been standardised for all cultural and language groups. foxcroft, paterson, le roux and herbst (2004) stated that it is concerning that most tests used by practitioners have not been adapted for the south african multicultural context but continue to be used for a wide variety of purposes such as identifying and diagnosing psychiatric conditions. there is clearly a void in the development and improvement of psychometric assessments in south africa, which is not being filled. as a result, practitioners have no other option than to use existing assessments despite all the problems mentioned here.
although the existence of a disorder such as sct is debatable, the premise of this study is that it possibly exists as a disorder distinct from adhd and that the performance of the two groups on the ssais-r would differ significantly. it is hypothesised that both these groups would perform significantly poorer on the ssais-r than the nc group.

method

a comparative research design was used to determine whether there are significant differences on the ssais-r amongst the three groups.

participants

archival data were used for the purposes of this study. purposive sampling was used: 734 clinical files of children between the ages of 7 and 17 years to whom the ssais-r had been administered were perused. these cases were then categorised into three groups: an adhd group (n = 106, 17%), an sct group (n = 85, 14%) and an nc group (n = 427, 69%); 103 cases were excluded. based on the clinical notes in the files, a formal diagnosis of adhd by mental health professionals was used as the inclusion criterion for the adhd group, and the proposed symptoms of sct as described by barkley (2005, 2011a, 2011b, 2012, 2013, 2014) were used as inclusion criteria for the sct group. cases with no clear disorders or diagnoses were included in the nc group and cases where there was evidence of other disorders were excluded from the study. from table 2 it can be deduced that there was not an equal distribution of males and females in the sample (χ² = 8.82; p = 0.01). however, given the higher prevalence rate of adhd in males than females, the adhd sample could be accepted as an accurate reflection of the demographics of children with neurodevelopmental disorders. it is difficult to conclude whether the higher prevalence rate of sct in males than females in this study is representative of the sct population.

table 2: gender distribution in the three groups.
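the chi-square test reported for the gender distribution can be sketched in a few lines of python. this is a minimal illustration, not the authors' analysis: the male and female counts below are hypothetical (table 2 is not reproduced here; only the group totals of 106, 85 and 427 come from the text), and the function names are assumptions. a table with three groups and two genders has 2 degrees of freedom, for which the upper-tail p-value has the closed form exp(−χ²/2).

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, df


def p_value_two_df(statistic):
    """Upper-tail p-value of a chi-square variable with exactly 2 degrees of freedom."""
    return math.exp(-statistic / 2)


# HYPOTHETICAL male/female counts per group (rows: ADHD, SCT, NC).
# Only the row totals (106, 85, 427) are taken from the text.
hypothetical_counts = [[75, 31], [50, 35], [240, 187]]

statistic, df = chi_square_independence(hypothetical_counts)
print(f"chi-square = {statistic:.2f}, df = {df}, p = {p_value_two_df(statistic):.3f}")

# Plugging in the statistic reported in the text
print(f"p for reported chi-square of 8.82: {p_value_two_df(8.82):.3f}")
```

with the reported statistic of 8.82 and 2 degrees of freedom, the closed form gives p ≈ 0.012, which agrees with the reported p = 0.01 after rounding.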
the higher number of males than females in the nc group is, however, of concern as this does not reflect the gender distribution in the normal population. the race distribution of the sample is not an accurate reflection of the demographics of the general south african population; however, given the absence of prevalence rates of neurodevelopmental disorders per race group, it is difficult to conclude whether the sample is representative of the general population. it must be noted that despite the ssais-r not being standardised for black south african learners, practitioners still use this test for psycho-educational purposes. although only data for learners between the ages of 7 and 17 years were included in the study, there was not an equal representation of all the age groups (χ² = 101.5; p < 0.001). this is, however, not surprising as both adhd and sct are identified at a young age whereas the learners in the nc group did not necessarily experience psycho-educational problems, which would explain why they only requested assessments at a later age (see table 3).

table 3: age distribution in the three groups.

instrument

despite the ssais-r not being standardised for the south african population, as the norms are limited to white, mixed race and indian groups, practitioners continue to use the ssais-r for psycho-educational and diagnostic purposes for children between the ages of 7 and 17 years. when data were retrieved from the case files, most of the cases had ssais-r data and only in a couple of cases were other assessments such as the wisc-iv utilised. based on this, only the cases where the ssais-r was used were included in the study. the ssais-r measures skills such as learning ability, general knowledge, spatial perception, visual motor skills, and basic perceptual and concept performing skills (van eeden, 1991). it typically evaluates a level of general intelligence as well as strengths and weaknesses (van eeden, 1997).
raw scores of the test are converted into norm scores for different age categories. the reliability varies from one subtest and one age group to another. the lowest reliability score was 0.59 for the missing parts subtest for the 13-year-old age group and the highest was 0.91 for the 8-, 10- and 12-year-old age groups (laher & cockcroft, 2013). the construct validity of the ssais-r was determined both by factor analysis, which yielded two broad factors, that is, verbal and non-verbal, and by correlation with a similar test that measures the same construct. the different subtests in table 4 were categorised as either verbal or non-verbal tests. one of the subtests, that is, number problems, loaded on both the verbal and non-verbal factors; however, the test developers decided to categorise it as a verbal test. another anomaly was the form board test, which did not load significantly on the two main factors; because it contributes to measuring non-verbal intelligence, it was categorised in the non-verbal scale.

table 4: subtests and descriptors of the senior south african individual scale revised that were utilised in this study.

the scores of the composite scales of the ssais-r were correlated with other tests that measure similar constructs. van eeden (1997) reported that subtests on the ssais-r correlated significantly with scores on similar tests.

procedure

all the files of children who were referred for psycho-educational assessments in the period 2006–2018 were studied and only those cases where ssais-r assessments were carried out were included initially. the diagnostic criteria for adhd as outlined in the dsm-5 and for sct as described by barkley (2011a, 2011b, 2012) were used to identify cases for these two groups. each case was reviewed independently by two raters and only the cases where the two raters agreed on the group in which the case should be categorised were ultimately included in the final sample.
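the two-rater inclusion rule described above can be illustrated with a minimal python sketch. the file numbers, group labels and the function name are hypothetical and serve only to show the filtering logic; they are not taken from the study's data.

```python
def agreed_cases(rater_one, rater_two):
    """Return only the cases that both raters assigned to the same group."""
    return {
        case: group
        for case, group in rater_one.items()
        if rater_two.get(case) == group
    }


# HYPOTHETICAL file numbers and categorisations, for illustration only
rater_one = {"F001": "adhd", "F002": "sct", "F003": "nc", "F004": "sct"}
rater_two = {"F001": "adhd", "F002": "adhd", "F003": "nc", "F004": "sct"}

final_sample = agreed_cases(rater_one, rater_two)
agreement_rate = len(final_sample) / len(rater_one)

print(final_sample)
print(f"raters agreed on {agreement_rate:.0%} of cases")
```

in this sketch, file F002 is dropped because the raters disagree; the simple agreement rate is a descriptive figure only and is not a chance-corrected index.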
the diagnosis, as reported in the file, was verified against the observations that were reported in the file. those cases where there were discrepancies between the diagnosis and the observations were excluded. it is unfortunate that many mental health professionals do not distinguish between the different subtypes of adhd when reporting the diagnosis. as a result, the adhd group could not be sub-categorised in terms of subtypes. identifying the sct group proved to be challenging for several reasons. in the absence of an official diagnostic category for sct, mental health professionals in south africa do not make an sct diagnosis. to overcome this problem, cases where there was evidence of concentration and attention difficulties but no formal adhd diagnosis, together with the typical signs and symptoms of sct as reported in the literature, were categorised as sct. cases where there was evidence of a related neurological impairment, such as epilepsy, were excluded. it is acknowledged that this process may have yielded both false positive and false negative categorisations of sct and adhd. cases where there were no indications of any pathology were categorised as nc. those cases where there was evidence of other forms of pathology were excluded.

data analysis

the aim of the study was to determine whether there would be a difference in the performance of adhd, sct and nc learners on the ssais-r. this dictates that statistical procedures that compare means between groups be utilised. to investigate the demographic composition of these three groups, descriptive statistics were calculated (see tables 2, 3 and 5). descriptive statistics, that is, means and standard deviations, were calculated for all the subtests and the composite scores.

table 5: race distribution in the three groups.

to determine whether the differences between the three groups were statistically significant, between-group t-tests were calculated.
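as a rough illustration of the between-group comparison, the sketch below computes the independent-samples t statistic using only the python standard library. the scores are hypothetical and the function names are assumptions. welch's variant, which does not assume equal variances, is included purely for comparison; the source reports the standard t-test and does not mention welch's correction.

```python
import math
from statistics import mean, variance


def students_t(group_a, group_b):
    """Independent-samples t statistic with pooled variance, and its degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    t = (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, n_a + n_b - 2


def welch_t(group_a, group_b):
    """Welch's t statistic (unequal variances) with Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    se_a, se_b = variance(group_a) / n_a, variance(group_b) / n_b
    t = (mean(group_a) - mean(group_b)) / math.sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df


# HYPOTHETICAL scaled scores on one subtest for two of the three groups
nc_scores = [11, 12, 10, 13, 9, 12, 11, 10]
sct_scores = [9, 8, 10, 11, 7, 9, 10, 8]

t_pooled, df_pooled = students_t(nc_scores, sct_scores)
t_welch, df_welch = welch_t(nc_scores, sct_scores)
print(f"pooled t = {t_pooled:.2f} (df = {df_pooled})")
print(f"welch t  = {t_welch:.2f} (df = {df_welch:.1f})")
```

with three groups, the same function would be applied to each pair (adhd versus nc, sct versus nc, adhd versus sct). converting t to a p-value requires the t distribution, which is left to a statistics package.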
levene’s test for the equality of variances was calculated and it was found that, across the different subtests and groups, between 15% and 30% of the variances were unequal. it was decided to report parametric statistical analysis, that is, the t-test, as opposed to the non-parametric equivalent, that is, the mann–whitney u test. both the t-test and the mann–whitney u test were run on the data and both produced the same results. maher, markey and ebert-may (2013) are of the opinion that effect size metrics provide information additional to the reporting of probability: the effect size indicates the magnitude of the differences between variables, whereas the significance test indicates the likelihood that the difference is because of chance. significance is sensitive to sample size and has the potential to be flawed; therefore, these authors suggest that researchers report effect size in addition to significance, as this would inform them whether their findings are practically meaningful or important (pallant, 2013; sullivan & feinn, 2012). for this reason, cohen’s d was calculated and reported in the results. an effect size of 0.2 is considered small, 0.5 medium and 0.8 and above large (cohen, 1992).

ethical considerations

the parents of the minor learners who come to the clinic for assessments are required to complete a consent form. this form has a section that gives permission for using information for research and training purposes. all data were captured by file number; thus no identifying details were used in the reporting of the results and it would be impossible to identify individuals from the results of the study. ethical clearance was provided by the faculty of education, research ethics committee at the university of johannesburg, reference number: sem 2 2018-029.
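returning to the effect-size reporting described under data analysis, cohen's d for two independent groups can be sketched as the mean difference divided by the pooled standard deviation. the scores and function names below are hypothetical, the pooled-standard-deviation form is an assumption (the source does not state which variant was used), and the "negligible" label for values under 0.2 is added here for completeness and is not from the source.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_sd = math.sqrt(
        ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b))
        / (n_a + n_b - 2)
    )
    return (mean(group_a) - mean(group_b)) / pooled_sd


def label_effect(d):
    """Benchmarks cited in the text (Cohen, 1992): 0.2 small, 0.5 medium, 0.8 large."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"  # label added for this sketch, not from the source


# HYPOTHETICAL scaled scores on one subtest
nc_scores = [11, 12, 10, 13, 9, 12, 11, 10]
adhd_scores = [10, 11, 9, 12, 10, 11, 9, 10]

d = cohens_d(nc_scores, adhd_scores)
print(f"d = {d:.2f} ({label_effect(d)})")
```

because d is expressed in standard deviation units, it does not grow with sample size in the way a t statistic does, which is why a difference can be statistically significant yet small in magnitude.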
results

from table 6 it is evident that the nc group generally scored better than the other two groups and that the adhd group generally scored better than the sct group in most of the subtests.

table 6: mean and standard deviation scores of the different subtests for the three groups.

as reported previously, the nc group performed significantly better in the subtests than both the other groups (see table 7). although the sct and the adhd groups did perform differently on the subtests, none of these differences were significant (see table 7). when looking at the effect sizes (cohen’s d), it can be seen that despite significant differences the effect sizes were small to medium, indicating that the verbal subtests of the ssais-r were not good predictors in distinguishing between adhd, nc and sct.

table 7: t-test scores for the differences in the mean scores of the verbal subtests between the three groups.

the nc group performed significantly better than the other two groups on the form board subtest (see table 8). when looking at the effect sizes (cohen’s d) of the non-verbal subtests, again the effect sizes were found to be small to medium, indicating that the differences between the adhd, nc and sct groups on the non-verbal subtests were not large despite the nc group performing significantly better on the form board and coding subtests. this again points to the ssais-r not being a good predictor in distinguishing between adhd, nc and sct.

table 8: t-test scores for the differences in the mean scores of the non-verbal subtests between the three groups.

discussion

this study investigated the differences in performance of the adhd, sct and nc groups on the ssais-r. although there were significant differences between the adhd and nc groups and between the nc and sct groups, the effect sizes were only small to medium.
the nc group performed significantly better than the other two groups on tests that either have a time limit for completion or where the time taken to complete a task is factored into the scoring of that test; however, the effect sizes on these tests were also small to medium. barkley (2012) and lee et al. (2016) argued that sct and adhd are two separate and distinct disorders; however, the only significant differences that were found in this study were that the adhd group performed better than the sct group on the form board and coding subtests. this finding is in line with the findings of flannery, luebbe and becker (2017) that children with adhd generally perform better on perceptual motor tasks than children with sct. as opposed to the sct group, the adhd group did not differ significantly from the nc group on these two subtests. it would therefore seem as if learners with sct tend to make more errors on these tasks than either the adhd or the nc groups. both the form board and coding subtests measure perceptual motor speed to a greater or lesser extent.

conclusion

there were several limitations to this study, which may have affected the results and conclusions. archival data were used; therefore, the reports of many practitioners were perused and discrepancies in test administration, scoring and interpretation cannot be accounted for in this study. a further limitation is that these practitioners used the ssais-r even though the test is not standardised for all race groups in south africa and this may also have affected the results of the study. keeping these limitations in mind, the results indicated that the adhd and sct groups differed significantly from the nc group, implying that they are more similar to each other in terms of performance on the ssais-r than they are to the nc group. it might be worthwhile to describe these two disorders in terms of an attention spectrum and to investigate which aspects of attention differ between these two groups.
both the adhd and sct groups differed significantly from the nc group; however, the sct group differed significantly from the nc group on more of the ssais-r subtests than did the adhd group. when considering disorders such as sct and adhd, it may be more appropriate to categorise these two disorders together as disorders of attention. if one accepts that attention, as a construct, is normally distributed in the general population, both the sct and adhd groups would have impaired attention. this would also be in line with the suggestion that sct should rather be named cdd (barkley, 2014; becker, 2013) as this name focuses on the impairment of attention and concentration in the disorder. further investigation is necessary to determine whether there are attention differences between adhd and sct, as well as to determine the locus of distraction, that is, either internal or external (see table 1). it is recommended that other forms of assessment, such as neuropsychological assessments, be considered when drawing a distinction between these two disorders. burke, austin and waldeck (2011) argued that a diagnosis of adhd must only be made based on multiple measures such as psychometric assessments, neuropsychological assessments, behavioural observations and physiological measures. in addition, clinical observations during a test session, keeping barkley’s symptoms of sct in mind, would be essential, as the reasons for poor performance and for inattentiveness would differ between adhd and sct. it can be expected that both groups would perform poorly on cognitive assessments, but for different reasons. it is expected that learners with adhd would battle to remain undistracted, show signs of impulsivity and be hyperactive whilst those with sct would battle to respond timeously and would require prompting and guidance to sustain their effort.
given that the results suggest that the performance on the ssais-r subtests did not differ significantly between the adhd and sct groups, we draw one of two conclusions. it could be that adhd and sct are not the two distinct disorders suggested by becker and barkley (2021), or that the two groups simply perform similarly on a cognitive assessment such as the ssais-r.

acknowledgements

the authors would like to thank the team of intern psychometrists who captured the data for this article over 2017 and 2018.

competing interests

the authors declare that they have no financial or personal relationships that may have inappropriately influenced them in writing this article.

authors’ contributions

l.a.g. contributed 40% towards the article in literature review, results, conclusion, discussion and referencing. a.b. contributed 40% towards the article in data analysis, results and the conclusion. j.f. contributed 20% towards the article in editing and conclusion.

funding information

this research project is supported through an earmarked grant allocated as part of the teaching and learning development capacity improvement programme (tldcip), a partnership between the department of higher education and training and the european union.

data availability

data sharing is not applicable to this article as no new data were created or analysed in this study.

disclaimer

the views and opinions expressed in this article are those of the authors and do not necessarily reflect the official policy or position of any affiliated agency of the authors.

references

achenbach, t.m., & rescorla, l.a. (2007). multicultural supplement to the manual for the aseba school-age forms & profiles. burlington: university of vermont, research center for children, youth & families. american psychiatric association. (2013). diagnostic and statistical manual of mental disorders (5th ed.). washington, dc: american psychiatric association. barkley, r.a. (1998).
attention-deficit hyperactivity disorder: a handbook for diagnosis and treatment (2nd ed.). new york: guilford. barkley, r.a. (2001). the executive functions and self-regulation: an evolutionary neuropsychological perspective. neuropsychology review, 11, 1–29. https://doi.org/10.1023/a:1009085417776 barkley, r.a. (2005). adhd and the nature of self-control (paperback ed.). new york, ny: guilford. barkley, r.a. (2011a). the barkley adult adhd rating scale – iv. new york, ny: guilford. barkley, r.a. (2011b). the barkley deficits in executive functioning scale. new york, ny: guilford. barkley, r.a. (2012). distinguishing sluggish cognitive tempo from attention deficit hyperactivity disorder in adults. journal of abnormal psychology, 121(4), 978–990. https://doi.org/10.1037/a0023961 barkley, r.a. (2013). two types of attention disorders now recognized by clinical scientists. in: taking charge of adhd: the complete, authoritative guide for parents (3rd ed., pp. 20–25). new york, ny: guilford. barkley, r.a. (2014). sluggish cognitive tempo (concentration deficit disorder?): status, future directions, and a plea to change the name. journal of abnormal child psychology, 42, 117–125. https://doi.org/10.1007/s10802-013-9824-y barkley, r.a. (2016). a brief note on the history of executive functioning. the adhd report, 24(1), 14. https://doi.org/10.1521/adhd.2016.24.1.14 barkley, r.a. (2018). barkley sluggish cognitive tempo scale – children and adolescents (bscts-ca). new york, ny: guilford. becker, s.p. (2013). topical review: sluggish cognitive tempo: research findings and relevance for paediatric psychology. journal of paediatric psychology, 38(10), 1051–1057. https://doi.org/10.1093/jpepsy/jst058 becker, s.p. (2019). sluggish cognitive tempo: the need for global inquiry. world psychiatry, 18(2), 237–238. https://doi.org/10.1002/wps.20639 becker, s.p., & barkley, r.a. (2021), field of daydreams? integrating mind wandering in the study of sluggish cognitive tempo and adhd. 
jcpp advances, 1(1), e12002. https://doi.org/10.1111/jcv2.12002 brown, t. (2002). dsm-iv: adhd and executive function impairments. advanced studies in medicine, 2(25), 910–914. bruchmüller, k., margraf, j., & schneider, s. (2012). is adhd diagnosed in accord with diagnostic criteria? overdiagnosis and influence of client gender on diagnosis. journal of consulting and clinical psychology, 80(1), 128–138. https://doi.org/10.1037/a0026582 burke, a., austin, t., & waldeck, c. (2011). adult adhd in a student population: preliminary findings. journal of psychology in africa, 21(1), 27–32. https://doi.org/10.1080/14330237.2011.10820426 cohen, j. (1992). a power primer. psychological bulletin, 112(1), 155–159. https://doi.org/10.1037/0033-2909.112.1.155 flannery, a.j., luebbe, a.m., & becker, s.p. (2017). sluggish cognitive tempo is associated with poorer study skills, more executive functioning deficits, and greater impairment in college students. journal of clinical psychology, 73(9), 1091–1113. https://doi.org/10.1002/jclp.22406 foxcroft, c., paterson, h., le roux, n., & herbst, d. (2004). the test use patterns and needs of psychological assessment practitioners. pretoria: human sciences research council. jacobson, l.a., murphy-bowman, s.c., pritchard, a.e., tart-zelvin, a., zabel, t.a., & mahone, e.m. (2012). factor structure of a sluggish cognitive tempo scale in clinically referred children. journal of abnormal child psychology, 40(8), 1327–1337. https://doi.org/10.1007/s10802-012-9643-6 laher, s., & cockcroft, k. (2013). psychological assessment in south africa: research and applications. johannesburg: wits university press. lahey, b.b. (2001). should the combined and predominantly inattentive types of adhd be considered distinct and unrelated disorders? not now, at least. clinical psychology: science and practice, 8(4), 494–497. https://doi.org/10.1093/clipsy.8.4.494 lee, s., burns, g.l., beauchaine, t.p., & becker, s.p. (2016). 
bifactor latent structure of attention-deficit/hyperactivity disorder (adhd)/opposition defiant disorder. psychological assessment, 28(8), 917–928. https://doi.org/10.1037/pas0000232 levy, f., hay, d.a., mcstephen, m., wood, c., & waldman, i. (1997). adhd: a category or a continuum? genetic analysis of a large scale twin study. journal of the american academy of child and adolescent psychiatry, 36, 737–744. journal of attention disorders, 2(2), 129–129. https://doi.org/10.1177/108705479700200206 maher, j.m., markey, j.c., & ebert-may, d. (2013). the other half of the story: effect size analysis in quantitative research. cbe life sciences education, 12(3), 345–351. https://doi.org/10.1187/cbe.13-04-0082 milich, r., balentine, a.c., & lynam, d.r. (2001). adhd combined type and adhd predominantly inattentive type are distinct and unrelated disorders. clinical psychology: science and practice, 8(4), 463–488. https://doi.org/10.1093/clipsy.8.4.463 pallant, j. (2013). spss survival manual: a step by step guide to data analysis using ibm spss (4th ed.). crows nest: allen & unwin. shanahan, m., pennington, b., yerys, b., scott, a., boada, r., willcutt, e., … defries, j. (2006). processing speed deficits in attention deficit/hyperactivity disorder and reading disability. journal of abnormal child psychology, 34(5), 584–601. https://doi.org/10.1007/s10802-006-9037-8 schellack, n., & meyer, j. (2012). the management of attention-deficit/ hyperactivity disorder in children. south african pharmacy journal, 79(10), 12–20. sullivan, g.m., & feinn, r. (2012). using effect size-or why the p value is not enough. journal of graduate medical education, 4(3), 279–282. https://doi.org/10.4300/jgme-d-12-00156.1 van eeden, r. (1991). manual for the senior south african individual scale-revised (ssais-r). part 1: background and standardization. pretoria: human sciences research council. van eeden, r. (1997). 
manual for the senior south african individual scale – revised (ssais-r): background and standardisation. pretoria: human sciences research council. wilens, t.e., & spencer, t.j. (2010). understanding attention-deficit/hyperactivity disorder from childhood to adulthood. postgraduate medicine, 122(5), 97–109. https://doi.org/10.3810/pgm.2010.09.2206

about the author(s)

candice britz, department of psychology, faculty of humanities, university of johannesburg, johannesburg, south africa

casper j.j. van zyl, department of psychology, faculty of humanities, university of johannesburg, johannesburg, south africa

citation

britz, c., & van zyl, c.j.j. (2020). examining the internal structure of the executive functioning inventory amongst south african students. african journal of psychological assessment, 2(0), a26. https://doi.org/10.4102/ajopa.v2i0.26

original research

examining the internal structure of the executive functioning inventory amongst south african students

candice britz, casper j.j. van zyl

received: 13 mar. 2020; accepted: 29 july 2020; published: 21 sept. 2020

copyright: © 2020. the author(s). licensee: aosis. this is an open access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

abstract

the role of executive functions in everyday life can hardly be overstated. its influence ranges from pathological behaviour on the negative side, to quality of life on the positive side of human functioning. assessment of executive functions includes both objective and subjective measures, which include self-report measures. most self-report measures, however, were developed for use in clinical populations. the executive functioning inventory (efi) is a brief self-report measure developed for use in healthy populations.
psychometrically, the measure appears to function reasonably well in american and european populations; however, its internal structure is yet to be examined in south africa. the aim of this study was to evaluate the internal consistency reliability, item functioning and factor structure of the efi in this context. the data (n = 1904) were collected amongst students at a large urban university of the gauteng province of south africa. mcdonald’s omega reliability estimates were mostly satisfactory with some exceptions, ranging between 0.59 and 0.76. a five-factor model consistent with a multidimensional view of executive functioning found modest support in this data. with the exception of two items, item response theory analysis further found the items of the efi to function well on their respective subscales. overall, the results were largely consistent with previous findings, providing initial support for its use in south africa, especially, for research studies seeking a brief index of executive functioning or as part of a comprehensive assessment of executive functioning, if required. keywords: executive functioning; self-report; reliability; validity; confirmatory factor analysis. introduction executive functioning (ef) is an umbrella term for the capacity to create, sustain and shift mental sets (suchy, 2009, 2016). broadly, it refers to a set of top-down cognitive processes responsible for the management of coordinated thought and action (gray-burrows et al., 2019). there appears to be some consensus amongst researchers that there are essentially three core domains of ef (diamond, 2013; gray-burrows et al., 2019; miyake & friedman, 2012). these include: inhibition (i.e. referring to cognitive and behavioural restraint and the determination of attentional focus); updating or working memory (i.e. momentarily holding information in memory for later processing) and set shifting (i.e. cognitive flexibility required to switch between mental tasks and operations). 
in combination, they facilitate several critical capabilities such as reasoning, generating goals and plans along with the ability to sustain attention and motivation to see them through (aron, 2008). they also include the mental flexibility required to adjust goals and plans in the event of changing circumstances. this family of behaviours is conscious and effortful, in contrast to intuitive, instinctive, routine, automatic or otherwise overlearned behaviours (diamond, 2013). whilst there may be broad consensus on the core functions, there is still no single definition or universally adopted conceptualisation of executive function (for reviews see goldstein, naglieri, princiotta, & otero, 2014; mccloskey & perkins, 2013). further, ef is considered a multidimensional construct rather than a single unitary trait (mccloskey, perkins, & van divner, 2009). thus, with more than 30 definitions of ef (goldstein et al., 2014) and as many constructs hypothesised to be contained under this umbrella (mccloskey & perkins, 2013), it should be clear that ef refers to an array of complex, multidimensional cognitive processes and abilities (otero & barker, 2014). executive functions are predominantly associated with the prefrontal cortex and associated areas (jacobs, anderson, & anderson, 2008; otero & barker, 2014). their developmental progression is prolonged, starting in infancy and continuing to adulthood (de luca & leventer, 2008). early research on executive functions and the parts of the brain with which they are associated involves the well-known story of phineas gage, a man who suffered severe damage to his ventromedial prefrontal cortex, which had particularly interesting effects on his executive functions (barkley, 2012). in subsequent years, interest in ef has continued to increase. this is not surprising, as the relevance of ef can hardly be overstated. executive functions play a role in just about every domain of life (diamond, 2013).
for example, they have been investigated in the context of school readiness (morrison, ponitz, & mcclelland, 2010), school success (borella, carretti, & pelgrina, 2010), job success (bailey, 2007), romantic relationships (eakin et al., 2004), health behaviours (crescioni et al., 2011; miller, barnes, & beaver, 2011), criminal and other potentially threatening behaviours (broidy et al., 2003; denson, pederson, friese, hahm, & roberts, 2011) and even quality of life studies (brown & landgraf, 2010; davis, marra, najafzadeh, & liu-ambrose, 2010). ef is also important to mental health: it has been implicated in schizophrenia, obsessive-compulsive disorder, depression, addictions, attention deficit hyperactivity disorder and conduct disorder, to name but a few (goldstein & naglieri, 2014). importantly, evidence suggests that conditions of disadvantage in early life are associated with adverse cognitive development from childhood through adolescence (berthelsen, hayes, white, & williams, 2018; hackman & farah, 2009; hackman, farah, & meaney, 2010; hackman, gallop, evans, & farah, 2015; mcewen & gianaros, 2010; sheridan, sarsour, jutte, d’esposito, & boyce, 2012). this is particularly relevant to south africa when considering the disadvantaged circumstances in which many children are raised, which render them particularly vulnerable to deficits in ef. as mentioned earlier, executive functions cover many constructs and behaviours. this has given rise to a number of different approaches to their measurement (egger, de mey, & janssen, 2007; spinella, 2005). broadly, these can be categorised into subjective and objective measures of executive functions (smithmyer, 2013). for example, a well-known objective assessment is the wisconsin card sorting test.
it assesses inhibition and mental flexibility as it requires an individual to maintain a task set, to be flexible in response to feedback and to avoid perseveration by inhibiting prior incorrect responses (salthouse, atkinson, & berish, 2003). another common objective measure is the stroop test. this measure requires inhibiting an overlearned response in order to engage with an incongruent stimulus (macleod, 1991). verbal fluency tests are another important class of objective measures. these require participants to generate several items related to some category, whilst monitoring for and avoiding repetition and using different retrieval strategies (strauss, sherman, & spreen, 2006). in contrast, subjective measures, commonly referred to as self-rated executive function (sref) measures, allow individuals to report on various aspects of ef, which provides an indication of their competence in complex, daily problem-solving activities (toplak, west, & stanovich, 2012). a well-known example is the behaviour rating inventory of executive function (brief). this instrument assesses ef behaviours in children and adolescents in home and school environments. there are two versions of this measure. one requires parents and teachers to complete separate forms and the other is self-report (brief-sr; guy, isquith, & gioia, 2004; toplak et al., 2012). other self-report measures of ef include the barkley deficits in executive function scale – children and adolescents (barkley, 2012), the delis rating of executive functions (d-ref; delis, 2012) and the comprehensive executive function inventory (cefi; naglieri & goldstein, 2013). however, there are limitations to all measures of ef (smithmyer, 2013). for example, some do not map well to real world settings, whilst others measure only a single aspect of ef, and some – indeed most – were developed for use in clinical populations.
owing to such limitations, spinella (2005) undertook development of the executive function index (efi), a self-report measure that seeks to index a broad spectrum of executive functions within a healthy population (egger et al., 2007). in contrast to objective assessments of ef, self-report measures have the added advantage of being cost-effective and easy to administer. the efi contains 27 items and 5 subscales, namely motivational drive (md), impulse control (ic), empathy (em), organisation (org) and strategic planning (sp). using parallel analysis, spinella (2005) found that a five-factor model best represented the data, which is consistent with the theoretical model. collectively these factors accounted for 49.7% of total variance (spinella, 2005). this five-factor structure has also found support in subsequent research (janssen, de mey, & egger, 2009; smithmyer, 2013). in a second-order factor analysis, spinella (2005) also found three higher-order factors and argued that this model is consistent with the way executive functions have been associated with the dorsolateral (sp, org), orbitofrontal (ic, em) and anterior cingulate (md) regions of the brain (cummings, 1993). as a reviewer correctly pointed out, such models reflect a time when executive function theories still mirrored functional divisions of the frontal lobes, a view no longer accepted today (chung, weyandt, & swentosky, 2014; otero & barker, 2014). indeed, this model has not found support in other studies examining higher-order models of the efi. for example, janssen et al. (2009) only found support for two higher-order factors. whilst there appears to be some support for the reliability and construct validity of the efi, to the authors’ knowledge, the efi has not been examined for use within the south african context. the purpose of this study is to investigate the internal psychometric properties of the efi amongst university students for use in this setting.
specifically, the study examines the reliability, factor structure and item functioning of the efi, because these properties are susceptible to variation when a measure is used in different populations.

method

participants

the data that were analysed for the study were collected as part of a larger project that explored wellness within an urban african context. participants were 1904 undergraduate psychology students (mean age = 20.07 years, sd = 2.3 years) at a large urban university in the gauteng province of south africa. the majority of participants (76%) were women. participants’ home languages included: isizulu (19.8%), isixhosa (5.8%), english (22.7%), isindebele (11.9%), sepedi (11.3%), sesotho (7.9%), setswana (10.5%), siswati (5%), afrikaans (4.5%), tshivenda (3.3%), xitsonga (6.5%) and unspecified (0.8%).

instruments

executive function index

the efi consists of 27 items. the items are divided into five subscales, namely md, org, ic, em and sp, consisting of four, five, five, six and seven items, respectively. the items of the md scale assess behavioural drive, activity level and interest in novelty (e.g. ‘i have a lot of enthusiasm to do things’). the org items assess the ability to carry out organised goal-directed behaviour through functions such as multitasking, sequencing and holding information in mind to make decisions (e.g. ‘i have trouble when doing two things at once’). the ic scale measures self-inhibition, risk-taking and social conduct (e.g. ‘i take risks, sometimes for fun’). the em scale addresses a person’s concern for the well-being of others, pro-social behaviour and a cooperative attitude (e.g. ‘i take other people’s feelings into account when i do something’). finally, the sp scale consists of items addressing a tendency to think ahead, plan and use strategies (e.g. ‘i think about the consequences of an action before i do it’) (spinella, 2005).
data analysis

reliability analysis

three measures of reliability were computed, namely cronbach’s alpha, guttman’s lambda 6 and mcdonald’s omega. this allows for a broad consideration of the efi’s reliability. cronbach’s alpha and guttman’s lambda 6 are provided as they are well known by practitioners and researchers alike, especially cronbach’s alpha. however, both have several limitations, such as the fact that they do not reflect the actual structure of a test (bentler, 2008; revelle & zinbarg, 2009; sijtsma, 2009). in addition, cronbach’s alpha assumes tau-equivalence, an assumption that is mostly violated, which means it will underestimate the reliability of a psychological measure (revelle & condon, 2019; sijtsma, 2009). by contrast, mcdonald’s omega is a latent variable modelling approach to reliability estimation, which models the structure of a test (mcdonald, 1999; revelle & condon, 2019). as such, inferences regarding reliability will be primarily based on the results from this method.

confirmatory factor analysis

in line with the theory informing the development of the efi, a five-factor model and a three-factor higher-order confirmatory factor model were tested, reflecting the multidimensional nature of ef according to spinella (2005). the analysis was computed using the ‘lavaan’ package (rosseel, 2012) in r (r core team, 2019). several goodness-of-fit indices were considered to evaluate the models, including the comparative fit index (cfi; bentler, 1990), tucker–lewis index (tli; tucker & lewis, 1973), root mean square error of approximation (rmsea; steiger & lind, 1980) and the standardised root mean square residual (srmr). satisfactory fit is typically reflected by cfi and tli values greater than 0.95 and by rmsea and srmr values less than 0.08 (hu & bentler, 1999). weighted least squares mean and variance corrected estimation (wlsmv) was used given its performance on ordered categorical data relative to maximum likelihood (ml) estimation (beauducel & herzberg, 2006).
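as a rough illustration of the first of the three reliability coefficients named under reliability analysis above, the sketch below computes cronbach's alpha from raw item scores in plain python. it is not the authors' code (their analyses were run in r); the function name and the toy responses are hypothetical. guttman's lambda 6 and mcdonald's omega additionally require squared multiple correlations and a fitted factor model, respectively, and are not shown.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item (column) across respondents.
    item_variances = [pvariance(column) for column in zip(*item_scores)]
    # Variance of each respondent's summed scale score.
    total_variance = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses of six people to a four-item Likert-type subscale.
toy_responses = [
    [4, 5, 4, 3],
    [2, 3, 2, 2],
    [5, 5, 4, 4],
    [3, 3, 3, 2],
    [1, 2, 2, 1],
    [4, 4, 5, 3],
]
print(round(cronbach_alpha(toy_responses), 3))
```

the formula uses only item variances and the total-score variance, which is why it cannot reflect the factor structure of a scale in the way a model-based coefficient such as omega can.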
item response theory analysis

both one-parameter logistic (1pl; rasch) and two-parameter logistic (2pl; graded response) models were computed to examine item functioning on the subscales of the efi more closely. winsteps version 4.5.4 was used for rasch rating scale analysis (rasch, 1960, 1980) and the ‘mirt’ package (chalmers, 2012), version 1.32.1, was used for graded response modelling (grm; samejima, 1969, 1997, 2013) in the r programming language (r core team, 2019). as rasch models philosophically require data to fit the model, infit mean-square values of less than 0.60 or greater than 1.40 were taken to indicate misfit on the likert-type items of the efi (bond & fox, 2007).

procedure

the ethics committee of the department of psychology and faculty of humanities at a large urban university in south africa granted permission for data collection. all the participants were informed about the purpose of the study and had to provide informed consent. participants were informed that they could withdraw from the study at any time if they so wished and that all data would be kept private and confidential. email was the primary format used to relay all necessary information to participants. furthermore, a link was emailed to participants directing them to the questionnaire containing demographic questions and the psychological measures. the study was limited to current students (student numbers required) from the relevant institution. non-students were not eligible for participation. no incentives were offered for participation. the findings were used for research purposes only.

ethical consideration

permission to conduct the study was obtained from the ethics committee of the faculty of humanities at a large urban university in south africa (ethical clearance number: ec010562016).

results

descriptive statistics summarising the subscales of the efi along with bivariate correlations are reported in table 1.
the correlations are all statistically significant, ranging between 0.06 and 0.42. however, the sample size is quite large, and the available statistical power enables the identification of very small and practically insignificant associations as statistically significant, for example, the associations between md and ic (r = 0.06) and org and em (r = 0.11).

table 1: bivariate correlations and descriptive statistics.

reliability analysis

reliability estimates for each of the subscales are reported in table 2. three types of internal consistency reliability were computed: mcdonald’s omega (total), guttman’s lambda 6 and cronbach’s alpha to allow a broad consideration of reliability. mcdonald’s omega is a latent variable-based method to compute reliability and is used for inference in this study, given some of the limitations of cronbach’s alpha and guttman’s lambda 6 mentioned earlier (revelle & condon, 2019). inspection of table 2 shows mcdonald’s omega coefficients ranging between 0.59 and 0.76. it is interesting to note that both cronbach’s alpha and guttman’s lambda 6 estimates were lower than mcdonald’s omega coefficients.

table 2: internal consistency reliability estimates for the subscales of the executive functioning inventory.

confirmatory factor analysis

confirmatory factor analytic (cfa) results for the models tested are presented in table 3. the efi is a multidimensional measure, meaning that these scales in combination represent the ef required for a range of behaviours, such as planning and goal achievement. thus, ef is not a unidimensional latent construct underlying the scales of the efi; hence, a five-factor higher-order model was not tested. the cfa results for the five-factor model, reported in table 3, are mixed when considering the goodness-of-fit values. whilst the absolute fit indices (rmsea and srmr) are satisfactory, the incremental fit values (cfi and tli) are not.
the same is true for the three-factor higher-order model, for which the goodness-of-fit is generally weaker compared to the five-factor model.

table 3: goodness-of-fit statistics for the models tested.

factor loadings for the stronger model (five-factor) are reported in table 4. in general, most items had satisfactory loadings (ranging from 0.50 to 0.74), although there are eight items with relatively weak loadings (ranging from 0.27 to 0.48), which would influence model fit. inspection of residual correlations and the modification indices suggested that adjustments can be made to improve the model; however, the correlated errors did not have sufficient content overlap that would justify amendments to the model. thus, no model re-specifications were considered.

table 4: standardised and unstandardised coefficients for the items of the executive functioning inventory.

item response theory analysis

both rasch and graded response models were applied separately to the subscales of the efi. the items of each scale mostly fit the rasch model well, with only item 12 (infit mean square = 1.62) of the em subscale underfitting the model, that is, exceeding the upper misfit criterion of 1.40. graded response model analysis further suggests that this is because of a relatively weak discrimination (‘a’) parameter of 0.640. these results are not overly problematic as they suggest the item is just not contributing much new information to the subscale. whilst not exceeding the rasch cut-off threshold for misfit, item 4 on the md subscale also had a high infit mean-square value (1.34) relative to the other items comprising the construct. graded response model analysis further suggested that item 4, in particular, functioned quite poorly, with improperly ordered option characteristic curves (occs) that arguably contribute more noise than signal to the measurement of md. this can be seen in the bottom right of figure 1 where the occs for the md items are displayed.
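to make the role of the discrimination parameter concrete, the sketch below evaluates the category probabilities of samejima's graded response model for a single item. only the discrimination value of 0.640 comes from the results above; the threshold values, the comparison discrimination of 2.0 and the assumption of five response categories are hypothetical and chosen purely for illustration. this is not the authors' code, which used the 'mirt' package in r.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Graded response model probabilities for one item at trait level theta.

    a          -- discrimination parameter
    thresholds -- ordered boundaries b_1 < ... < b_{K-1} for K response options
    Returns a list of K probabilities, one per response option.
    """
    def p_at_or_above(b):
        # Probability of responding in or above the category bounded by b.
        return 1.0 / (1.0 + math.exp(-a * (theta - b)))

    # Cumulative curve is 1 for the lowest option and 0 beyond the highest.
    cumulative = [1.0] + [p_at_or_above(b) for b in thresholds] + [0.0]
    # Each option's probability is the difference of adjacent cumulative curves.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]


# Hypothetical thresholds for a five-option Likert-type item.
hypothetical_thresholds = [-1.5, -0.5, 0.5, 1.5]

for a in (0.640, 2.0):  # 0.640 is the value reported for item 12; 2.0 is illustrative
    probs = grm_category_probs(0.0, a, hypothetical_thresholds)
    print(a, [round(p, 3) for p in probs])
```

with identical thresholds, the lower discrimination flattens the curves, so the response options separate people less sharply across the trait range. this is the sense in which a weakly discriminating item adds little information to its subscale.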
whilst such problematic items may be less influential overall in scales with many items, this subscale is the shortest on the efi, and contains only four items in total. items 4 and 12 are the only reverse-scored items on their respective subscales.

figure 1: option category curves for the items of the motivational drive subscale.

discussion

the aim of this study was to examine the psychometric evidence for the efi amongst south african students, specifically its internal consistency reliability, factor structure and item response functioning. previous studies have found reasonable support for the reliability and factor structure of the efi in different populations; however, no research has been conducted in south africa. the reliability results for the subscales of the efi were mostly satisfactory, with mcdonald’s omega coefficients ranging between 0.59 and 0.76, although md (ω = 0.59) and ic (ω = 0.64) were somewhat weaker than expected. the cronbach’s alpha estimates in this study are largely similar to those by janssen et al. (2009) who reported estimates ranging between 0.63 and 0.69. however, both these studies found one scale each to be notably weaker; ic (0.41) was weaker in the study by janssen et al. (2009), whilst md (0.59) was weaker in the present study. both these studies were conducted on university samples with very similar age and gender representation, although the present study is larger and culturally much more diverse. spinella’s (2005) initial alpha estimates were somewhat stronger than the present results, whereas a later study by smithmyer (2013) reported slightly weaker results in general compared with the present study. interestingly, smithmyer (2013) found sp (0.49) and org (0.59) to have the weakest reliability amongst the five subscales, neither of which was flagged for weak reliability in previous work.
overall, there appears to be some fluctuation across reliability estimates in the literature, although results are reasonable in general. all studies – with the exception of spinella (2005) – were conducted amongst university students, so it remains unclear to what degree these fluctuations are attributable to random sample variation. with respect to the previous samples, however, the present study was unique with regard to its diverse cultural representation. nonetheless, whilst some weak results were observed for most scales of the efi at least once in different studies, it is arguably a good thing that no one scale was consistently flagged as problematic. the fact that most estimates were cronbach’s alpha coefficients – which typically underestimate true reliability – suggests that the efi’s true reliability is under-reported in the literature and supports the notion that the efi’s reliability is mostly acceptable. turning to the confirmatory factor analysis, the results for both the five-factor and three-factor higher-order models were somewhat ambiguous when considering their goodness-of-fit values. whilst the absolute fit indices were satisfactory in general, the incremental fit values were weaker. however, when comparing the five-factor and three-factor higher-order models, the former had better fit than the latter. as such, the factor loadings of only the five-factor model were presented for consideration. most items had satisfactory loadings on their expected factors, although there were eight items with relatively weak loadings, which influenced model fit. these results provide support for spinella’s (2005) theoretical model, with five separate constructs contributing to the assessment of executive function, although model fit was modest. these results are also consistent with smithmyer (2013), who found support for a five-factor solution. in their evaluation of the dutch translated version of the efi, janssen et al.
(2009) were also able to replicate the efi structure proposed by spinella (2005), although three items (3, 9, 10) of the sp scale had primary loadings on a separate factor in a principal components analysis. in contrast, these items had satisfactory loadings in the present study, whilst item 13 had a relatively weak loading on this factor. compared with the five-factor model, the three-factor higher-order model found weak support in this study. this model corresponds to three major regions of the prefrontal cortex (cummings, 1993), for which spinella (2005) reported some support. like this study, janssen et al. (2009) also found little support for a three-factor higher-order model. item response theory analysis showed that the items of the efi generally function appropriately in their respective subscales. this view is supported by both rasch rating scale and graded response models. only the two reverse-scored items, 4 and 12, on the md and em subscales, respectively, were found to function quite poorly. it is recommended that researchers using the efi in south africa carefully examine the impact of these items in their own work. it would also be important to see if these items emerge as problematic in future studies conducted in this context to determine the robustness of the present results. some limitations of the present study should be noted. although the data were collected in a diverse sample of students within an urban setting, the study’s conclusions are necessarily limited to this population. as such, results cannot be generalised to the south african population broadly. further, as spinella (2005) indicated, the efi is potentially sensitive to cross-cultural factors as well as differences in age, gender and education levels. this is important when considering the efi – or any measure of ef – for use in the south african context.
as mentioned before, conditions of disadvantage are known to affect the development of executive functions, and this should be borne in mind given the socio-economic disparities across racial groups that remain present in south african society. future research is required to explore the degree to which scores on the efi are invariant across relevant demographic strata.

conclusion

the efi is a self-report measure of several constructs representing aspects of everyday ef. the present study examined the internal consistency reliability, factor structure and item functioning of the efi amongst university students in south africa. results show that the measure has mostly acceptable internal consistency reliability, and the confirmatory factor analysis found modest support for a five-factor model consistent with previous work (spinella, 2005). items of the efi also appear to function mostly well on their respective subscales. overall, the findings offer preliminary evidence that the efi can be used effectively in student populations of south africa as a brief self-report indicator of ef, noting the weaknesses and limitations described in this article. however, if a comprehensive assessment of ef is required, the efi should be supplemented by additional measures (e.g. objective measures) along with other clinical information where relevant.

acknowledgements

competing interests

the authors have declared that no competing interests exist.

authors’ contributions

both authors have contributed equally to this work.

funding information

this research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.

data availability statement

data are available on request.

disclaimer

the views and opinions expressed in this article are those of the authors and do not necessarily reflect the official policy or position of any affiliated agency of the authors.

references

aron, a.r. (2008).
progress in executive-function research: from tasks to functions to regions to networks. current directions in psychological science, 17(2), 124–129. https://doi.org/10.1111/j.1467-8721.2008.00561.x
bailey, c.e. (2007). cognitive accuracy and intelligent executive function in the brain and in business. annals of the new york academy of sciences, 1118(1), 122–141. https://doi.org/10.1196/annals.1412.011
barkley, r.a. (2012). executive functions: what they are, how they work, and why they evolved. new york, ny: the guilford press.
beauducel, a., & herzberg, p.y. (2006). on the performance of maximum likelihood versus means and variance adjusted weighted least squares estimation in cfa. structural equation modeling: a multidisciplinary journal, 13(2), 186–203. https://doi.org/10.1207/s15328007sem1302_2
bentler, p.m. (1990). comparative fit indexes in structural models. psychological bulletin, 107(2), 238–246. https://doi.org/10.1037/0033-2909.107.2.238
bentler, p.m. (2008). alpha, dimension-free, and model-based internal consistency reliability. psychometrika, 74(1), 137. https://doi.org/10.1007/s11336-008-9100-1
berthelsen, d., hayes, n., white, s.l.j., & williams, k.e. (2018). executive function in adolescence: associations with child and family risk factors and self-regulation in early childhood. in m. huizinga, d. baeyens, & j.a. burack (eds.), executive function and education (pp. 106–119). lausanne: frontiers media.
bond, t.g., & fox, c.m. (2007). applying the rasch model. mahwah, nj: lawrence erlbaum associates.
borella, e., carretti, b., & pelegrina, s. (2010). the specific role of inhibition in reading comprehension in good and poor comprehenders. journal of learning disabilities, 43(6), 541–552. https://doi.org/10.1177/0022219410371676
broidy, l.m., nagin, d.s., tremblay, r.e., brame, b., dodge, k.a., & fergusson, d.e. (2003). developmental trajectories of childhood disruptive behaviors and adolescent delinquency: a six-site cross-national study.
developmental psychology, 39(2), 222–245. https://doi.org/10.1037/0012-1649.39.2.222
brown, t.e., & landgraf, j.m. (2010). improvements in executive function correlate with enhanced performance and functioning and health-related quality of life: evidence from 2 large, double-blind, randomized, placebo-controlled trials in adhd. postgraduate medicine, 122(5), 42–51. https://doi.org/10.3810/pgm.2010.09.2200
chalmers, r.p. (2012). mirt: a multidimensional item response theory package for the r environment. journal of statistical software, 48(6), 1–29. http://dx.doi.org/10.18637/jss.v048.i06
chung, h.j., weyandt, l.l., & swentosky, a. (2014). the frontal lobes and executive functioning. in s. goldstein & j.a. naglieri (eds.), handbook of executive functioning (pp. 29–44). new york, ny: springer.
crescioni, a.w., ehrlinger, j., alquist, j.l., conlon, k.e., baumeister, r.f., schatschneider, c., & dutton, g.r. (2011). high trait self-control predicts positive health behaviors and success in weight loss. journal of health psychology, 16(5), 750–759. https://doi.org/10.1177/1359105310390247
cummings, j.l. (1993). frontal-subcortical circuits and human behavior. archives of neurology, 50(8), 873–880. https://doi.org/10.1001/archneur.1993.00540080076020
davis, j.c., marra, c.a., najafzadeh, m., & liu-ambrose, t. (2010). the independent contribution of executive functions to health-related quality of life in older women. bmc geriatrics, 10(1), 16. https://doi.org/10.1186/1471-2318-10-16
delis, d.c. (2012). delis rating of executive functions. bloomington, mn: pearson.
de luca, c.r., & leventer, r.j. (2008). developmental trajectories of executive functions across the lifespan. in v. anderson, r. jacobs, & p.j. anderson (eds.), executive functions and the frontal lobes: a lifespan perspective (pp. 23–56). new york, ny: taylor and francis group.
denson, t.f., pederson, w.c., friese, m., hahm, a., & roberts, l. (2011).
understanding impulsive aggression: angry rumination and reduced self-control capacity are mechanisms underlying the provocation-aggression relationship. personality and social psychology bulletin, 37(6), 850–862. https://doi.org/10.1177/0146167211401420
diamond, a. (2013). executive functions. annual review of psychology, 64(1), 135–168. https://doi.org/10.1146/annurev-psych-113011-143750
eakin, l., minde, k., hechtman, l., ochs, e., krane, e., bouffard, r., … looper, k. (2004). the marital and family functioning of adults with adhd and their spouses. journal of attention disorders, 8(1), 1–10. https://doi.org/10.1177/108705470400800101
egger, j., de mey, h., & janssen, g. (2007). assessment of executive functioning in psychiatric disorders: functional diagnosis as the overture of treatment. clinical neuropsychiatry, 4(3), 111–116.
goldstein, s., & naglieri, j.a. (2014). handbook of executive functioning. new york, ny: springer.
goldstein, s., naglieri, j.a., princiotta, d., & otero, t.m. (2014). introduction: a history of executive functioning as a theoretical and clinical construct. in the handbook of executive functioning. new york, ny: springer.
gray-burrows, k., taylor, n., o’connor, d., sutherland, e., stoet, g., & conner, m. (2019). a systematic review and meta-analysis of the executive function-health behaviour relationship. health psychology and behavioral medicine, 7(1), 253–268. https://doi.org/10.1080/21642850.2019.1637740
guy, s.c., isquith, p.k., & gioia, g.a. (2004). behavior rating inventory of executive function – self-report version. lutz, fl: psychological assessment resources.
hackman, d.a., & farah, m.j. (2009). socioeconomic status and the developing brain. trends in cognitive sciences, 13(2), 65–73. https://doi.org/10.1016/j.tics.2008.11.003
hackman, d.a., farah, m.j., & meaney, m.j. (2010). socioeconomic status and the brain: mechanistic insights from human and animal research. nature reviews neuroscience, 11, 651–659.
https://doi.org/10.1038/nrn2897
hackman, d.a., gallop, r., evans, g.w., & farah, m.j. (2015). socioeconomic status and executive function: developmental trajectories and mediation. developmental science, 18(5), 686–702. https://doi.org/10.1111/desc.12246
hu, l., & bentler, p.m. (1999). cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. structural equation modeling, 6(1), 1–55. https://doi.org/10.1080/10705519909540118
jacobs, r., anderson, v., & anderson, p.j. (2008). executive functions and the frontal lobes: a lifespan perspective. new york, ny: taylor & francis.
janssen, g.t.l., de mey, h.r.a.d., & egger, j.i.m. (2009). executive functioning in college students: evaluation of the dutch executive function index (efi-nl). international journal of neuroscience, 119(6), 792–805. https://doi.org/10.1080/00207450802333979
macleod, c.m. (1991). half a century of research on the stroop effect: an integrative review. psychological bulletin, 109(2), 163–203. https://doi.org/10.1037/0033-2909.109.2.163
mccloskey, g., & perkins, l.a. (2013). essentials of executive functions assessment. hoboken, nj: wiley.
mccloskey, g., perkins, l.a., & van divner, b.r. (2009). assessment and intervention for executive function difficulties. new york, ny: routledge.
mcdonald, r.p. (1999). test theory: a unified treatment. mahwah, nj: lawrence erlbaum associates.
mcewen, b.s., & gianaros, p.j. (2010). central role of the brain in stress and adaptation: links to socioeconomic status, health, and disease. annals of the new york academy of sciences, 1186(1), 190–222. https://doi.org/10.1111/j.1749-6632.2009.05331.x
miller, h.v., barnes, j.c., & beaver, k.m. (2011). self-control and health outcomes in a nationally representative sample. american journal of health behavior, 35(1), 15–27. https://doi.org/10.5993/ajhb.35.1.2
miyake, a., & friedman, n.p. (2012).
the nature and organization of individual differences in executive functions: four general conclusions. current directions in psychological science, 21(1), 8–14. https://doi.org/10.1177/0963721411429458
morrison, f.j., ponitz, c.c., & mcclelland, m.m. (2010). self-regulation and academic achievement in the transition to school. in s.d. calkins & m.a. bell (eds.), human brain development. child development at the intersection of emotion and cognition (pp. 203–224). american psychological association. https://doi.org/10.1037/12059-011
naglieri, j.a., & goldstein, s. (2013). comprehensive executive function inventory. toronto: multi health systems.
otero, t.m., & barker, l.a. (2014). the frontal lobes and executive functioning. in s. goldstein & j.a. naglieri (eds.), handbook of executive functioning (pp. 29–44). new york, ny: springer.
rasch, g. (1960, 1980). probabilistic models for some intelligence and attainment tests. chicago, il: university of chicago press.
r core team. (2019). r: a language and environment for statistical computing. vienna, austria: r foundation for statistical computing. retrieved from https://www.r-project.org
revelle, w., & condon, d.m. (2019). reliability from α to ω: a tutorial. psychological assessment, 31(12), 1395–1411. https://doi.org/10.1037/pas0000754
revelle, w., & zinbarg, r.e. (2009). coefficients alpha, beta, omega, and the glb: comments on sijtsma. psychometrika, 74(1), 145–154. https://doi.org/10.1007/s11336-008-9102-z
rosseel, y. (2012). lavaan: an r package for structural equation modeling. journal of statistical software, 48(2), 1–36. http://www.jstatsoft.org/v48/i02/
salthouse, t.a., atkinson, t.m., & berish, d.e. (2003). executive functioning as a potential mediator of age-related cognitive decline in normal adults. journal of experimental psychology: general, 132(4), 566–594. https://doi.org/10.1037/0096-3445.132.4.566
samejima, f. (1969). estimation of latent ability using a response pattern of graded scores.
psychometrika 17, 1–68. https://doi.org/10.1007/bf03372160
samejima, f. (1997). graded response model. in w.j. van der linden & r.k. hambleton (eds.), handbook of modern item response theory (pp. 85–100). new york, ny: springer.
samejima, f. (2013). graded response models. in w.j. van der linden (ed.), handbook of item response theory (vol. 1, pp. 95–108). boca raton, fl: taylor and francis.
sheridan, m.a., sarsour, k., jutte, d., d’esposito, m., & boyce, w.t. (2012). the impact of social disparity on prefrontal function in childhood. plos one, 7(4), e35744. https://doi.org/10.1371/journal.pone.0035744
sijtsma, k. (2009). on the use, the misuse, and the very limited usefulness of cronbach’s alpha. psychometrika, 74(1), 107. https://doi.org/10.1007/s11336-008-9101-0
smithmyer, p.j. (2013). validation of the executive function index. unpublished doctoral dissertation, xavier university.
spinella, m. (2005). self-rated executive function: development of the executive function index. international journal of neuroscience, 115(5), 649–667. https://doi.org/10.1080/00207450590524304
steiger, j.h., & lind, j.c. (1980). statistically based tests for the number of common factors. annual meeting of the psychonomic society, iowa city, ia, may 30.
strauss, e., sherman, e.m.s., & spreen, o. (2006). a compendium of neuropsychological tests (3rd edn.). new york, ny: oxford university press.
suchy, y. (2009). executive functioning: overview, assessment, and research issues for non-neuropsychologists. annals of behavioral medicine, 37(2), 106–116. https://doi.org/10.1007/s12160-009-9097-4
suchy, y. (2016). executive functioning: a comprehensive guide for clinical practice. new york, ny: oxford university press.
toplak, m.e., west, r.f., & stanovich, k.e. (2012). practitioner review: do performance-based measures and ratings of executive function assess the same construct? journal of child psychology and psychiatry, 54(2), 131–143.
https://doi.org/10.1111/jcpp.12001 tucker, l.r., & lewis, c. (1973). a reliability coefficient for maximum likelihood factor analysis. psychometrika, 38(1), 1–10. https://doi.org/10.1007/bf02291170
about the author(s) charles h. van wijk institute for maritime medicine, simon’s town, south africa department of global health, faculty of medicine and health sciences, stellenbosch university, cape town, south africa
citation van wijk, c.h. (2022). psychometric description of the life orientation test-revised in a south african sample: a pilot study. african journal of psychological assessment, 4(0), a51. https://doi.org/10.4102/ajopa.v4i0.51
original research psychometric description of the life orientation test-revised in a south african sample: a pilot study charles h. van wijk received: 12 jan. 2021; accepted: 05 dec. 2021; published: 25 feb. 2022
copyright: © 2022. the author(s). licensee: aosis. this is an open access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
abstract
the relevance of dispositional optimism – as measured by the life orientation test-revised (lot-r) – in health psychology has been convincingly demonstrated in numerous cross-national studies; however, empirical evidence of its psychometric quality and normative parameters in the south african context is lacking. firstly, this pilot study aimed to replicate previous international psychometric and normative data analyses, and secondly, to extend the investigation into associations with clinical measures of mental health and associated measures of general psychological well-being and resilience in a south african sample.
a sample of 755 adults from south african workplaces (42% women, aged 19–62 years) completed the lot-r and a selection of self-rated measures of clinical mental health and general psychological well-being and resilience. life orientation test-revised total mean scores were comparable with international samples, with normative reference data supplied to interpret individual scores. confirmatory factor analysis suggested a bi-dimensional model as best fit, and two independent factors were identified, namely, optimism and pessimism. significant correlations with measures of psychological health and well-being were observed. mental health constructs were better characterised by the presence of pessimism than the absence of optimism. no significant age or gender effects were observed but the role of language requires further clarification. this study provided a psychometric description of the lot-r in a south african sample, including support for both the bi-dimensionality of the lot-r in this context and its construct validity. the study further provided preliminary normative data for a local sample against which individual scores can be interpreted.
keywords: health psychology; normative data; optimism; pessimism; psychological well-being.
introduction
dispositional optimism is usually understood as a personality characteristic and conceptualised as a general tendency to expect positive outcomes (carver & scheier, 2014; carver, scheier, & segerstrom, 2010). the relevance of the construct of optimism in health psychology has been convincingly demonstrated in numerous studies. optimism has been associated with differences in mental and physical health, quality of life, adaptive coping styles, life satisfaction, recovery after severe illness and mortality (cf. hinz et al., 2017, p. 162, for a summary), and has been linked to a range of biological markers and pain responses (cf. schou-bredala et al., 2017, p. 217, for a summary).
pertinent to the context of this article, its association with markers of mental health and psychological well-being (e.g. depression, anxiety, fatigue, self-efficacy, perceived stress) has been established in various cross-continental contexts (yew, lim, haw, & gan, 2015; zenger et al., 2013; also cf. schou-bredala et al., 2017, p. 217, for a summary). internationally, the life orientation test-revised (lot-r; scheier, carver, & bridges, 1994) is the tool used most often for measuring dispositional optimism. the lot-r is a 10-item scale that comprises three items (reflecting optimism) that are scored positively, three items (reflecting pessimism) that are reverse scored and four filler items that are not scored. items are rated on a five-point likert scale (0 = strongly disagree, 4 = strongly agree). it has been translated into many languages and psychometrically tested in multiple studies, which included tests of its dimensional structure (cano-garcia et al., 2015; glaesmer et al., 2012; zenger et al., 2013), temporal stability (saboonchi et al., 2016) and item response theory analyses (chiesi, galli, primi, borgi, & bonacchi, 2013; steca, monzani, greco, chiesi, & primi, 2015). cross-national comparisons suggest that optimism varies between countries (gallagher, lopez, & pressman, 2013; schou-bredala et al., 2017). normative values of the general population are available for germany (glaesmer et al., 2012; hinz et al., 2017), colombia (zenger et al., 2013), brazil (bastianello, pacico, & hutz, 2014), the united kingdom (walsh et al., 2015) and norway (schou-bredala et al., 2017) amongst others. there is an ongoing debate regarding the dimensionality of the lot-r. the original authors described the scale as a continuum in which pessimism and optimism are viewed as polar opposites and not as separate dimensions (scheier et al., 1994), and continue to recommend that the lot-r be used as a unidimensional scale in primary analyses (carver et al., 2010).
in support, some recent studies endorsed the one-dimensionality of the lot-r and suggested that previously reported bi-factorial structures were artefacts of item wording (cano-garcia et al., 2015; monzani, steca, & greco, 2014; steca et al., 2015). however, most large sample studies using factor analysis tend to describe optimism and pessimism as two, at least partially, independent (but weakly related) factors (glaesmer et al., 2012; hinz et al., 2017; zenger et al., 2013). researchers further described increased independence of optimism and pessimism with increased age (creed, patton, & bartrum, 2002; glaesmer et al., 2012; hinz et al., 2017). age and gender effects appear to be marginal across international samples (bastianello et al., 2014; glaesmer et al., 2012; hinz et al., 2017; schou-bredala et al., 2017; steca et al., 2015; zenger et al., 2013). the lot-r as a measure for dispositional optimism has been established in south and north america (bastianello et al., 2014; scheier et al., 1994; trottier, mageau, trudel, & halliwell, 2008; zenger et al., 2013), europe and asia (glaesmer et al., 2012; hinz et al., 2017; lai & yue, 2000; schou-bredala et al., 2017; walsh et al., 2015; yew et al., 2015) and australia (creed et al., 2002). empirical evidence of its psychometric quality in african samples has not yet been established. the existing south african (sa) empirical studies using the lot-r have been conducted on a smaller scale (koen, van eeden, & wissing, 2011; maree, maree, & collins, 2008; rothmann, barkhuizen, & tytherleigh, 2008), each investigating highly specific samples, which limits the extent to which the outcomes could be generalised. before the lot-r can be considered for use in health research with general samples within sa, there is a need to examine the evidence of its validity in the local context. south africa has a diverse and multilingual population, with wide disparities in education, income and access to health care. 
in order to provide a psychometric description, a replication of cross-national studies reporting on psychometric properties and population-based norms of the lot-r is therefore indicated. this article describes a pilot study designed to establish the usefulness of continuing with population-based data collection for the lot-r. the study aimed to replicate previous psychometric and normative data analyses and also to extend the investigation into associations with clinical measures of mental health, and associated measures of general psychological well-being and psychological resilience. the study set three specific objectives, namely (1) to provide a psychometric description for a sa sample (using the standard english version of the lot-r), including dimensionality, internal consistency and socio-demographic effects; (2) to explore its associations with mental health and associated psychological markers, in order to consider construct validity; and (3) to provide provisional normative data for sa workplace samples for use in local health psychology research.
methods
participants
this pilot study used a sample from sa workplaces (n = 755). all participants were considered skilled workers, had a minimum of 10 years of schooling and identified themselves as proficient in english, although only about 20% reported english as their first language. the educational inclusion criterion was partly intended to ensure a level of english proficiency sufficient to complete the lot-r and other measures. as a result of a technical error, language data were only available for 82% of the sample, with the distribution presented in table 1. participants were recruited to complete the measures anonymously during visits to their workplaces, which comprised a wide range of occupational backgrounds (see table 1).
table 1: language and occupational backgrounds of sample.
measures
the lot-r was administered in its standard version in english.
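the scoring rule described in the introduction (three optimism items scored as rated, three pessimism items reverse scored, four unscored fillers, ratings from 0 to 4) can be sketched in python as follows. this is an illustrative sketch rather than the scoring procedure used in the study: the item numbers assigned to each subscale, the function name `score_lot_r` and the choice to report the pessimism subscale in its raw (unreversed) direction are assumptions that are not stated in this article.

```python
from typing import Dict, Mapping

# Assumed item key for the 10-item LOT-R (not stated in this article).
OPTIMISM_ITEMS = (1, 4, 10)   # scored as rated
PESSIMISM_ITEMS = (3, 7, 9)   # reverse scored when building the total
MAX_RATING = 4                # ratings run from 0 (strongly disagree) to 4 (strongly agree)


def score_lot_r(responses: Mapping[int, int]) -> Dict[str, int]:
    """Return optimism, pessimism and total scores for one respondent.

    `responses` maps item number (1-10) to a rating in 0..4. Items that are
    not in either subscale (the fillers) are ignored.
    """
    for item in OPTIMISM_ITEMS + PESSIMISM_ITEMS:
        rating = responses[item]
        if not 0 <= rating <= MAX_RATING:
            raise ValueError(f"item {item}: rating {rating} outside 0..{MAX_RATING}")

    optimism = sum(responses[i] for i in OPTIMISM_ITEMS)
    # Raw pessimism subscale (assumption): higher values mean more pessimism.
    pessimism = sum(responses[i] for i in PESSIMISM_ITEMS)
    # For the total score, each pessimism rating r is reversed to MAX_RATING - r.
    total = optimism + sum(MAX_RATING - responses[i] for i in PESSIMISM_ITEMS)
    return {"optimism": optimism, "pessimism": pessimism, "total": total}
```

under these assumptions the total score ranges from 0 to 24, with higher scores indicating greater dispositional optimism.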
the original normative study (scheier et al., 1994) reported a single factor accounting for 48% of variance, with α = 0.78. test–retest reliability ranged from 0.68 over 4 months to 0.79 over 28 months. international studies reported a range of alpha coefficients, from 0.58 to 0.80 (see table 2), whilst a local study (koen et al., 2011) reported α = 0.59. psychometric properties of the lot-r from various cross-national studies are summarised in table 2 for comparison with figures from the present sample. table 2: psychometric properties of the life orientation test-revised. study participants also completed a selection of other measures. not all participants completed all scales, and the total n for each scale will be indicated in the applicable tables. the following clinical measures of mental health were included in the study. the patient health questionnaire for depression (phq-9; gilbody, richards, & barkham, 2007) is a nine-item measure that is scored on a four-point likert scale (range 0–27), with higher scores indicating higher levels of depression. moderate correlations have previously been reported for the lot-r and phq-9 (glaesmer et al., 2012), and other scales of depression (zenger et al., 2013). the generalised anxiety disorder questionnaire (gad-7; löwe et al., 2008) is a seven-item measure that is scored on a four-point likert scale (range 0–21), with higher scores indicating higher levels of anxiety. moderate correlations have also been reported for the lot-r and gad-7 (glaesmer et al., 2012), and other scales of anxiety (zenger et al., 2013). the cage questionnaire for problematic alcohol use (dhalla & kopec, 2007) is a four-item measure, scored as yes/no (range 0–4), with higher scores indicating more problematic alcohol use. the following measures of general psychological well-being were also included in the study: the stress overload scale (sos; amirkhan, 2012) is used to indicate appraisals of demands and personal resources. 
it has 24 scored items using a 5-point likert scale (range 24–120), with higher scores indicating greater appraisal of stress overload. two factor scores can also be calculated, namely event load and personal vulnerability. moderate to strong correlations have previously been reported for the lot-r and perceived stress scale (pss; chang, 1998; yew et al., 2015). the state trait personality inventory, trait version (stpi; spielberger, 1996) reflects emotional disposition. it has four 10-item subscales, each scored on a 4-point likert scale (range 10–40), with higher scores indicating greater endorsement of the respective emotional dispositions (namely, trait anxiety, curiosity, anger and depression). a strong correlation with trait anxiety was reported in the original validation study (scheier et al., 1994). finally, two scales of psychological resilience were included to examine associations between the lot-r and other measures from positive psychology: the dispositional resilience scale (drs-15; bartone, 2007) is a 15-item measure that is scored on a 4-point likert scale (range 0–45), with higher scores indicating greater resilience. the mental toughness questionnaire (mtq-18; clough, earle, & sewell, 2002) is an 18-item measure that is scored on a five-point likert scale (range 18–90), with higher scores again indicating greater resilience. strong correlations have previously been reported for measures of dispositional optimism and resilience (sagone & de caroli, 2015), and the above two measures were specifically included because of their previous use for measuring resilience in sa (arendse, bester, & van wijk, 2020). participants also completed a brief health questionnaire and were asked to indicate their health status with regard to debilitating acute or chronic diseases. its purpose was to exclude severe medical conditions that could unduly influence responses to the psychological scales. 
analysis
all statistical analyses were conducted with the statistical package for social sciences (spss version 27) and analysis of moment structures (amos). internal consistency was examined with cronbach’s alpha, inter-item correlations and corrected item-total correlations. against the background of the ongoing debate on dimensionality, the lack of previous factor analytic studies from sa and the poor alpha coefficients found in the current sample, a confirmatory factor analysis (cfa) was conducted. confirmatory factor analysis is a special form of factor analysis used to test whether data fit a hypothesised measurement model (marker, 2002). the maximum likelihood estimator was used to explore the fit of one- and two-factor models. for a cfa, a small and non-significant global fit χ² is preferred. this is rarely achieved, and the following indices with cut points were also taken into consideration: the root mean square error of approximation (rmsea) should be < 0.06 to < 0.08 for continuous data, whilst both the comparative fit index (cfi) and the tucker–lewis index (tli) should be > 0.95 (schreiber, nora, stage, barlow, & king, 2006). the effects of socio-demographic variables were explored using pearson’s correlation coefficients (for age effects) and t-tests for independent samples (for gender and language effects). for this analysis, language was coded into two groups, namely, english first language (21.3%) and non-english first language (78.7%). as mentioned earlier, cross-national comparisons indicated variable scores between countries (gallagher et al., 2013; schou-bredala et al., 2017), requiring individual lot-r scores to be interpreted using local norms (glaesmer et al., 2012). in line with best practice for lot-r reporting (glaesmer et al., 2012), sa normative data will be presented using standardised scores.
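the internal-consistency statistics named in this analysis plan can be illustrated with a short python sketch. the study itself used spss; the function names below are hypothetical, and the code assumes complete data arranged as one row of item scores per respondent, with pessimism items already reverse scored where a total scale is analysed.

```python
from math import sqrt
from statistics import mean, variance
from typing import List, Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items matrix of scores."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    items = list(zip(*rows))                            # one tuple of scores per item
    item_variance_sum = sum(variance(col) for col in items)
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def corrected_item_total(rows: Sequence[Sequence[float]], item: int) -> float:
    """Correlation of one item (0-based index) with the sum of the other items."""
    item_scores: List[float] = [row[item] for row in rows]
    rest_scores: List[float] = [sum(row) - row[item] for row in rows]
    return pearson(item_scores, rest_scores)
```

the item-total correlation is called corrected because the item itself is removed from the total before correlating, which avoids inflating the coefficient through the item's correlation with itself.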
construct validity was explored by calculating the associations of lot-r scores with markers of clinical mental health (phq-9, gad-7, cage) and general psychological well-being and resilience (sos, stpi, drs-15, mtq-18) using correlation with correction for attenuation.
ethical considerations
this study was a voluntary, anonymous survey. approval to conduct the study was received from the stellenbosch university health research ethics committee (no. n20-07-078).
results
the sample of 755 participants (women = 42%, men = 58%) had a mean age of 32.8 (± 7.4; range 19–62). the sample reported a positive health status, self-reporting a general absence of debilitating acute or chronic disease. there were no meaningful differences in the composition of the five subsamples referenced in table 5 with regard to age, gender or language. the sample included a wide distribution across the working age, gender, home language and occupational categories. the lot-r total scale mean score was 16.4 (± 2.9), which differed significantly from the means reported by local sa studies presented in table 2 (t-tests for single samples not reported here). further basic psychometric properties are reported in table 2. the lot-r total score was normally distributed (skewness = 0.317, se = 0.089; kurtosis = –0.111, se = 0.178). in terms of internal consistency, the lot-r performed poorly with a total scale cronbach’s alpha of 0.39. no deletion of items improved the alpha. corrected item-total correlations ranged from 0.24 to 0.32 for optimism subscale items and from 0.29 to 0.40 for pessimism subscale items. inter-item correlations ranged from 0.15 to 0.26 for the optimism subscale and from 0.17 to 0.31 for the pessimism subscale.
dimensionality
the six scored items of the lot-r were subjected to cfa, and the results are presented in table 3. all model fit indices of the cfa indicated that the assumption of a bidimensional structure of the lot-r fits the data much better than the unidimensional structure.
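the three fit indices referred to here are functions of the chi-square statistics of the fitted model and of the baseline (independence) model. the following python sketch uses the commonly cited formulas; it is not output from amos, the function names are hypothetical, and software packages differ on details such as whether n or n − 1 appears in the rmsea denominator (n − 1 is assumed here).

```python
from math import sqrt


def rmsea(chi2: float, df: int, n: int) -> float:
    """Root mean square error of approximation for a model fitted to n cases."""
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model: float, df_model: int, chi2_base: float, df_base: int) -> float:
    """Comparative fit index relative to the baseline (independence) model."""
    d_model = max(chi2_model - df_model, 0.0)
    # Equivalent to max(chi2_base - df_base, chi2_model - df_model, 0).
    d_base = max(chi2_base - df_base, d_model)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base


def tli(chi2_model: float, df_model: int, chi2_base: float, df_base: int) -> float:
    """Tucker-Lewis index; unlike the CFI it is not restricted to the 0-1 range."""
    ratio_base = chi2_base / df_base
    ratio_model = chi2_model / df_model
    return (ratio_base - ratio_model) / (ratio_base - 1.0)
```

because the tli works with chi-square to degrees-of-freedom ratios, it penalises model complexity more heavily than the cfi, which is one reason the two indices can diverge for the same model.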
although the two-factor model did not obtain a non-significant χ², the value was not excessively high. the rmsea (0.053) was sufficiently small (< 0.06), and the cfi (0.93) approached the 0.95 cut point, whereas the tli (0.83) fell below it (table 3). the two subscales did not significantly correlate with each other (r = –0.007, p = 0.838), further suggesting two independent constructs, rather than a bipolar scale.
table 3: goodness-of-fit indices for confirmatory factor analysis.
socio-demographic effects
there were no significant age effects for the total score (r = 0.056, p = 0.125) or optimism subscale score (r = –0.031, p = 0.394), with a significant but very small effect for the pessimism subscale score (r = 0.096, p = 0.008). no significant gender differences were observed (table 4a). for the sub-sample where language data were available, there was a significant difference in the mean scores between the english first language and non-english first language subgroups (table 4b), although the mean difference was < 1, which may not be practically meaningful. differences across language groups for the optimism subscale were non-significant, but significant for the pessimism subscale (mean difference = 1).
table 4a: comparison of means of gender groups for life orientation test-revised total and subscale scores.
table 4b: comparison of means of language groups for life orientation test-revised total and subscale scores.
correlations with mental health and associated psychological markers
construct validity indicators are reported in table 5. dispositional optimism correlated with clinical measures of depression and anxiety, and perceived stress overload, with moderate effect sizes. correlations for the three clinical scales, as well as the sos, were stronger for the pessimism than for the optimism subscale.
table 5: correlations with selected mental health markers.
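correction for attenuation, as applied to these correlations, divides an observed correlation by the square root of the product of the two scales’ reliabilities (spearman’s classical formula). the python sketch below illustrates the calculation; the article does not state which reliability estimates were used, so the inputs and the function name are assumptions.

```python
from math import sqrt


def disattenuate(r_observed: float, reliability_x: float, reliability_y: float) -> float:
    """Spearman's correction for attenuation.

    r_observed    : observed correlation between scale x and scale y
    reliability_x : reliability estimate of scale x (e.g. Cronbach's alpha)
    reliability_y : reliability estimate of scale y
    """
    for name, value in (("reliability_x", reliability_x), ("reliability_y", reliability_y)):
        if not 0.0 < value <= 1.0:
            raise ValueError(f"{name} must lie in (0, 1], got {value}")
    # The result is deliberately not clipped: values beyond +/-1 signal that the
    # reliability estimates are too low for the correction to be trustworthy.
    return r_observed / sqrt(reliability_x * reliability_y)
```

because the divisor is at most 1, the corrected coefficient is never smaller in magnitude than the observed one, and low reliabilities inflate it substantially, so corrected values should be read with care.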
for measures of general psychological well-being, correlations with large effect sizes were observed for dispositional anxiety, curiosity and depression. furthermore, correlations with large effect sizes were found for the drs-15 and the mtq-18. again, in some cases (e.g. mtq-18), stronger correlations were observed for the pessimism than for the optimism subscale.
preliminary normative data
in the absence of significant age and gender effects, normative reference data were developed for the full sample (table 6).
table 6: preliminary reference norms for employed south africans.
discussion
comparisons with other countries and local studies
the total lot-r mean score of 16.4 was comparable with most international samples (with the notable exception of brazil; bastianello et al., 2014), as were the subscale means. the significant differences in the mean scores from local studies may reflect differences within sa society – the current sample mean fell between the two previously reported local means (which represented discrete and highly individualised samples), and the internationally comparable mean score could possibly be attributed to the wide range of occupational domains included in the present sample, as opposed to the previous sa samples. this may speak to the need for adequately diversified sampling when doing any general health psychology research in sa. despite similar lot-r mean scores, the distribution of standardised scores for the sa sample (table 6) differed in its spread from normative data from comparable international studies (cf. glaesmer et al., 2012; hinz et al., 2017; schou-bredala et al., 2017; zenger et al., 2013), emphasising the requirement for local reference norms to enable meaningful interpretation of individuals’ scores.
psychometrics
evidence of a two-factor scale structure was found in the results of the cfa, which suggested the hypothesised bi-dimensional model as best fit for this sa sample.
the two factors displayed no significant correlation with each other and further appeared to display different patterns of correlation with other measures. in this regard, the three clinical scales, as well as the sos and mtq-18, showed stronger associations with the pessimism than with the optimism factor. although the findings around dimensionality appear contrary to some recent reports, which suggested that the lot-r taps a single construct (cano-garcia et al., 2015), they do follow the pattern found with european, south american and asian population samples (glaesmer et al., 2012; lai & yue, 2000; zenger et al., 2013). more problematic is the poor internal consistency. the weak alpha stands in contrast with other reports and cautions against an uncritical use of the lot-r in the african context. language diversity, particularly in responding to negatively valenced items, may have contributed to the poor internal consistency. no significant age or gender effects were observed, which is consistent with previous studies. home language offered a more complex outcome: whilst there was a significant difference in mean scores between english first language and non-english first language speakers, the difference was very small, and any practical meaning is not yet clear. further research studies would be required to enhance confidence when using the english version test across sa language groups (at least in cases where appropriate english proficiency can be demonstrated). interestingly, there was no significant difference across the two language groups for optimism mean scores, but a significant and larger mean difference for pessimism scores.
in terms of direct language effect, the use of negatively valenced items – such as the three items of the pessimism subscale – has previously been suggested to be problematic in non-english first language-speaking sa samples (arendse et al., 2020), where the negative wording may require a higher level of english proficiency to interpret accurately. a similar split between positively and negatively worded items has also been observed in chinese samples (lai & yue, 2000). in terms of actual optimism, south africa’s political history resulted in individuals raised with different levels of access to resources and ensuing beliefs regarding future opportunities, which could conceivably have influenced the development of dispositional optimism across different subgroups (which historically were often associated with language). this, however, remains speculative, and further research would be required to investigate these issues formally.
correlations with markers of associated psychological constructs
evidence of construct validity was observed in the meaningful correlations with markers of clinical mental health, general psychological well-being and resilience, in this sample of healthy south africans. in general, correlations with mental health markers were similar to or slightly higher than those reported in previous studies. as expected, lot-r scores were associated with depressiveness and anxiety, as well as problematic alcohol use. emotional disposition, as a measure of general psychological well-being, and quantified by the stpi, showed the highest correlations with dispositional optimism, which closely reflected the original conceptualisation and reported correlations of scheier et al. (1994). furthermore, contrary to some previous reports (cf. glaesmer et al., 2012; zenger et al., 2013), a general pattern appeared where mental health constructs were better characterised by the presence of pessimism than the absence of optimism.
this observation supports previous reports that pessimism, rather than optimism, was the better predictor of longer term psychological and physical health outcomes (robinson-whelen, kim, maccallum, & kiecolt-glaser, 1997). the association with perceived stress, whilst in the expected direction, was not as strong as in previous reports (chang, 1998; yew et al., 2015), although this may be partly because of the different measures used (i.e. pss vs. sos). the association with resilience measures followed the expected direction. the comparatively weak correlation with the drs-15 may be instrument, rather than construct, related, as a previous study recommended caution when using the drs-15 for measuring resilience in the sa context (arendse et al., 2020). the strong correlations with the mtq-18 suggest that both the lot-r and the mtq-18 may be useful to measure constructs of positive psychology in sa. across the various instruments, full-scale correlations were stronger, and until further research is carried out, the use of total scores rather than subscale scores would be recommended for future sa health psychology studies. it was noteworthy that the pattern of correlations was consistent across measures of psychological distress (e.g. phq-9, gad-7), as well as measures of psychological well-being (e.g. stpi, mtq-18). the evidence of construct validity – in its association with measures of mental health and psychological well-being – provides support for the use of the lot-r in local health psychology research.
limitations and future directions
a number of limitations to this study need to be mentioned. it was a pilot study, with concomitant limited size, and the sample cannot necessarily be considered representative of a general population of proficient english speakers. furthermore, english proficiency was assumed.
the assumption was based partly on self-evaluated proficiency, and partly on reported educational attainment, and it is recognised that education may not be a good proxy for language proficiency in sa. future sa studies will need to expand sampling to clarify language effects, as well as repeat factor analysis and internal consistency calculations with larger samples. expanding studies to include other samples from sub-saharan africa would further elucidate the influence of localised environments. when further validation for the use of the lot-r in african contexts has been obtained, it can be productively applied to local health research. the lot-r was originally conceptualised to express relationships between dispositional optimism and long-term psychological and physiological health outcomes, and could be used for the same purpose in longitudinal studies to explore relationships between dispositional factors and health in local contexts.
conclusion
this study made a novel contribution, firstly, by providing support for the bi-dimensionality of the lot-r in a sa sample, and secondly, by presenting preliminary normative data for a sa sample against which individual scores can be interpreted. in terms of practical application, the wide distribution of participants supported a single set of reference data that can be used across gender and age variables. this study further provided support that the lot-r may contribute by extending health psychology research into multiple constructs of clinical mental health, as well as general psychological well-being and resilience, in the local context. however, caution must be observed for possible effects of language proficiency, whilst the poor internal consistency cautions against any uncritical use of the instrument in south african studies.
acknowledgements
competing interests
the author declares that he has no financial or personal relationships that may have inappropriately influenced him in writing this article.
author’s contributions
c.h.v.w. declares that he is the sole author of this article.
funding information
this research work received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.
data availability
the data that support the findings of this study are available from the author, upon reasonable request. the data are not publicly available because of privacy and ethical considerations.
disclaimer
the views and opinions expressed in this article are those of the author and do not necessarily reflect the official policy or position of any affiliated agency of the author.
references
amirkhan, j.h. (2012). stress overload: a new approach to the assessment of stress. american journal of community psychology, 49(1–2), 55–71. https://doi.org/10.1007/s10464-011-9438-x arendse, d., bester, p., & van wijk, c. (2020). exploring psychological resilience in the south african navy. in n.m. dodd, p.c. bester, & j. van der merwe (eds.), contemporary issues in south african military psychology (pp. 137–160). stellenbosch: african sun media. https://doi.org/10.18820/9781928480631/08 bartone, p.t. (2007). test-retest reliability of the dispositional resilience scale-15, a brief hardiness scale. psychological reports, 101(3 pt 1), 943–944. https://doi.org/10.2466/pr0.101.3.943-944 bastianello, m.r., pacico, j.c., & hutz, c.s. (2014). optimism, self-esteem and personality: adaptation and validation of the brazilian version of the revised life orientation test (lot-r). psico-usf, bragança paulista, 19(3), 523–531. https://doi.org/10.1590/1413-82712014019003014 cano-garcia, f.j., sanduvete-chaves, s., chacon-moscoso, s., rodriguez-franco, l., garcia-martinez, j., antuna-bellerin, m.a., & perez-gil, j.a. (2015). factor structure of the spanish version of the life orientation test-revised (lot-r): testing several models. international journal of clinical and health psychology, 15(2), 139–148.
https://doi.org/10.1016/j.ijchp.2015.01.003 carver, c.s., & scheier, m.f. (2014). dispositional optimism. trends in cognitive sciences, 18(6), 293–299. https://doi.org/10.1016/j.tics.2014.02.003 carver, c.s., scheier, m.f., & segerstrom, s.c. (2010). optimism. clinical psychology review, 30(7), 879–889. https://doi.org/10.1016/j.cpr.2010.01.006 chang, e.c. (1998). does dispositional optimism moderate the relation between perceived stress and psychological well-being?: a preliminary investigation. personality and individual differences, 25(2), 233–240. https://doi.org/10.1016/s0191-8869(98)00028-2 chiesi, f., galli, s., primi, c., borgi, p.i., & bonacchi, a. (2013). the accuracy of the life orientation test-revised (lot-r) in measuring dispositional optimism: evidence from item response theory analyses. journal of personality assessment, 95(5), 523–529. https://doi.org/10.1080/00223891.2013.781029 clough, p., earle, k., & sewell, d. (2002). mental toughness: the concept and its measurement. in i. cockerill (ed.), solutions in sport psychology (pp. 32–46). london: thomson learning. creed, p.a., patton, w., & bartrum, d. (2002). multidimensional properties of the lot-r: effects of optimism and pessimism on career and well-being related variables in adolescents. journal of career assessment, 10(1), 42–61. https://doi.org/10.1177/1069072702010001003 dhalla, s., & kopec, j.a. (2007). the cage questionnaire for alcohol misuse: a review of reliability and validity studies. clinical and investigative medicine, 30(1), 33–41. https://doi.org/10.25011/cim.v30i1.447 gallagher, m.w., lopez, s.j., & pressman, s.d. (2013). optimism is universal: exploring the presence and benefits of optimism in a representative sample of the world. journal of personality, 81(5), 429–440. https://doi.org/10.1111/jopy.12026 gilbody, s., richards, d., & barkham, m. (2007). diagnosing depression in primary care using self-completed instruments: uk validation of phq-9 and core-om.
british journal of general practice, 57, 650–652. glaesmer, h., rief, w., martin, a., mewes, r., brähler, e., zenger, m., & hinz, a. (2012), psychometric properties and population-based norms of the life orientation test revised (lot-r). british journal of health psychology, 17(2), 432–445. https://doi.org/10.1111/j.2044-8287.2011.02046.x hinz, a., sander, c., glaesmer, h., brähler, e., zenger, m., hilbert, a., & kocalevent, r.-d. (2017). optimism and pessimism in the general population: psychometric properties of the life orientation test (lot-r). international journal of clinical and health psychology, 17(2), 161–170. https://doi.org/10.1016/j.ijchp.2017.02.003 koen, m., van eeden, c., & wissing, m. (2011). the prevalence of resilience in a group of professional nurses. health sa gesondheid, 16(1), 1–11. https://doi.org/10.4102/hsag.v16i1.576 lai, j.c.l., & yue, x. (2000). measuring optimism in hong kong and mainland chinese with the revised life orientation test. personality and individual differences, 28(4), 781–796. https://doi.org/10.1016/s0191-8869(99)00138-5 löwe, b., decker, o., müller, s., brähler, e., schellberg, d., herzog, w., & herzberg, p.y. (2008). validation and standardization of the generalized anxiety disorder screener (gad-7) in the general population. medical care, 46(3), 266–274. https://doi.org/10.1097/mlr.0b013e318160d093 maree, d.j.f., maree, m., & collins, c. (2008). constructing a south african hope measure. journal of psychology in africa, 18(1), 167–178. https://doi.org/10.1080/14330237.2008.10820183 marker, d. (2002). model theory: an introduction. new york, ny: springer-verlag. monzani, d., steca, p., & greco, a. (2014). brief report: assessing dispositional optimism in adolescence – factor structure and concurrent validity of the life orientation test-revised. journal of adolescence, 37(2), 97–101. https://doi.org/10.1016/j.adolescence.2013.11.006 robinson-whelen, s., kim, c., maccallum, r.c., & kiecolt-glaser, j.k. (1997). 
distinguishing optimism from pessimism in older adults: is it more important to be optimistic or not to be pessimistic? journal of personality and social psychology, 73(6), 1345–1353. https://doi.org/10.1037/0022-3514.73.6.1345 rothmann, s., barkhuizen, n., & tytherleigh, m.y. (2008). model of work-related ill health of academic staff in a south african higher education institution. south african journal of higher education, 22(2), 404–422. https://doi.org/10.4314/sajhe.v22i2.25794 saboonchi, f., petersson, l.-m., alexanderson, k., branstrom, r., & wennman-larsen, a. (2016). expecting the best and being prepared for the worst: structure, profiles, and 2-year temporal stability of dispositional optimism in women with breast cancer. psycho-oncology, 25(8), 957–963. https://doi.org/10.1002/pon.4045 sagone, e., & de caroli, m.e. (2015). positive personality as a predictor of high resilience in adolescence. journal of psychology and behavioral science, 3(2), 45–53. https://doi.org/10.15640/jpbs.v3n2a6 scheier, m.f., carver, c.s., & bridges, m.w. (1994). distinguishing optimism from neuroticism (and trait anxiety, self-mastery, and self-esteem): a re-evaluation of the life orientation test. journal of personality and social psychology, 67(6), 1063–1078. https://doi.org/10.1037/0022-3514.67.6.1063 schou-bredala, i., heira, t., skogstad, l., bonsaksen, t., lerdal, a., grimholt, t., & ekeberg, ø. (2017). population-based norms of the life orientation test-revised. international journal of clinical and health psychology, 17(3), 216–224. https://doi.org/10.1016/j.ijchp.2017.07.005 schreiber, j.b., nora, a., stage, f.k., barlow, e.a., & king, j. (2006). reporting structural equation modeling and confirmatory factor analysis results: a review. the journal of educational research, 99(6), 323–338. https://doi.org/10.3200/joer.99.6.323-338 spielberger, c.d. (1996). preliminary manual for the state-trait personality inventory. tampa, fl: university of south florida. 
steca, p., monzani, d., creco, a., chiesi, f., & primi, c. (2015). items response theory analysis of the life orientation test-revised: age and gender differential item functioning analyses. assessment, 22(3), 341–350. https://doi.org/10.1177/1073191114544471 trottier, c., mageau, g., trudel, p., & halliwell, w.r. (2008). validation of the canadian-french version of life orientation test-revised. canadian journal of behavioural science/revue canadienne des sciences du comportement, 40(4), 238–243. https://doi.org/10.1037/a0013244 walsh, d., mccartney, g., mccullough, van der pol, m., buchanan, d., & jones, r. (2015). always looking on the bright side of life? exploring optimism and health in three uk post-industrial urban settings. journal of public health, 37(3), 389–397. https://doi.org/10.1093/pubmed/fdv077 yew, s., lim, k., haw, y., & gan, s. (2015). the association between perceived stress, life satisfaction, optimism, and physical health in the singapore asian context. asian journal of humanities and social sciences, 3(1), 56–66. zenger, m., finck, c., zanon, c., imenez, w., singer, s., & hinz, a. (2013). evaluation of the latin american version of the life orientation test-revised. international journal of clinical and health psychology, 13(3), 243–252. https://doi.org/10.1016/s1697-2600(13)70029-2 abstract introduction methods procedure and ethical considerations data analysis results discussion limitations and recommendations conclusion acknowledgements references about the author(s) itumeleng p. khumalo department of psychology, faculty of humanities, university of the free state, bloemfontein, south africa ufuoma p. 
ejoke department of psychology, faculty of humanities, university of the free state, bloemfontein, south africa kwaku oppong asante department of psychology, faculty of humanities, university of the free state, bloemfontein, south africa department of psychology, university of ghana, accra, ghana janvier rugira psychosocial wellbeing section, united nations high commission for refugees, pretoria, south africa citation khumalo, i.p., ejoke, u.p., oppong asante, k., & rugira, j. (2021). measuring social well-being in africa: an exploratory structural equation modelling study. african journal of psychological assessment, 3(0), a37. https://doi.org/10.4102/ajopa.v3i0.37 original research measuring social well-being in africa: an exploratory structural equation modelling study itumeleng p. khumalo, ufuoma p. ejoke, kwaku oppong asante, janvier rugira received: 01 oct. 2020; accepted: 19 may 2021; published: 28 june 2021 copyright: © 2021. the author(s). licensee: aosis. this is an open access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. abstract the study investigated the factor structure of the 15-item social well-being scale in an african context. social well-being is categorised into five dimensions: social integration, social contribution, social coherence, social actualisation and social acceptance. data were collected from 402 participants in south africa (50% male, average age of 21 years). confirmatory factor analysis (cfa) and exploratory structural equation modelling (esem) were conducted in mplus (version 8.1), on the 15-item measure. results showed advantages of esem’s flexibility, through which an unstable emic four factor solution emerged. for such complex multidimensional psychological constructs measured in novel contexts, esem is best suited for exploring factorial validity. 
although the present study’s findings should have implications for theory, future studies should further explore social well-being measurement using the long- and short-form instruments in diverse african samples.
keywords: africa; esem; factorial validity; measurement; social well-being.
introduction
the understanding of well-being as something with only an intrapersonal location misses the reality that people are both private and public beings whose lives are socially and communally embedded (keyes, 1998; kpanake, 2018; prilleltensky, 2005). white (2010) described well-being as a social process with material, relational and subjective dimensions and emphasised the centrality of relatedness. not only do a sense of belonging, community and relationships constitute well-being (ryff, 1989; white, 2010), but they also feature prominently in what gives meaning to life (wissing, 2014). according to helliwell, barrington-leigh, harris and huang (2010), people make more positive evaluations of their lives when they live in societies where they themselves and others have people to rely on. well-being is located in the social and cultural domains (white, 2010). the social and community embeddedness of people is an integral characteristic of the african socio-cultural orientation in which the social good takes precedence over separate personhood (kpanake, 2018; molefe, 2017; nyamnjoh, 2017, 2019; wissing & temane, 2013). it therefore makes sense that those interested in the study of well-being in an african context should take into account the social, relational and communal dimensions of well-being (see chilisa & tsheko, 2014; mertens, 2016; wilson, wissing, & schutte, 2019) and its measurement. from an african socio-cultural perspective, the nature of being is inherently relational (chilisa, major, & khudu-petersen, 2017). social well-being is important because it captures a socially oriented conceptualisation of well-being (patri, albanesi, & pietrantoni, 2016).
given the significance of a sense of community and relationships (neto & marujo, 2013; molefe, 2017), the present study explored the factor structure of keyes’ (1998) five-dimensional model of social well-being. social well-being captures how well an individual functions in their social life as a member of the greater community (keyes, 1998). keyes (1998, p.122) defined social well-being as ‘the appraisal of one’s circumstances and functioning in society’, and proposed a five-factor structure consisting of social integration, social acceptance, social contribution, social actualisation and social coherence. social integration refers to the quality of an individual’s perception of belonging and acceptance in the society (keyes, 1998). it is therefore the extent to which people feel they have things in common with members of their environment (keyes, 1998; keyes & shapiro, 2004). social acceptance captures the meaning that individuals construct of their society as one that is accepting, characterised by trust, social comfort and the belief that people are kind and industrious. social contribution is the evaluation of one’s social worth through one’s perceived ability to give to others in the community (keyes, 1998). it is intertwined with the evaluation of being an important member of the society and having the ability to contribute. social actualisation reflects the judgement that society has potential and is growing and developing along the right trajectory. social coherence captures the understanding of the social world as making sense and being organised, well functioning and predictable (keyes, 1998, 2006; keyes & shapiro, 2004). in the original measurement development and validation studies amongst adults in the united states of america, keyes (1998) confirmed the theoretically intended five-factor model using the longer form and shorter form.
the measure demonstrated good convergent and discriminant validity, as shown by theoretically expected relationships with generativity, health of neighbourhood, dysphoric symptoms and subjective physical health (keyes, 1998). except for social acceptance, which had a lower reliability index, keyes (1998) found that the other four subscales had good internal consistency. age was also found to be an influential factor in social well-being measurement. except for social coherence, the other four dimensions were found to increase with age, albeit more slowly each year (keyes, 1998). according to keyes (1998), the observation that social coherence is higher amongst younger people can be attributed to their experience of the world reflecting their popular culture. outside of the many studies concerned with the mental health continuum (e.g. joshanloo, bobowik, & basabe, 2016; joshanloo, wissing, khumalo, & lamers, 2013; lamers, westerhof, bohlmeijer, ten klooster, & keyes, 2011), only a few studies concerned specifically with the measurement of social well-being could be located (e.g. de jager, coetzee, & visser, 2008; shayeghian, amiri, vahedi-notash, karimi, & azizi, 2019). the use of the 15-item measure is an improvement on the five-item subscale of the mental health continuum – short form (mhc-sf) because each of the dimensions is measured using three items. shayeghian et al. (2019) applied exploratory factor analysis (efa) and confirmatory factor analysis (cfa) to validate the iranian version of the 15-item social well-being measure, which, albeit with minor modifications, retained the intended factor structure. their minor modifications included two pairs of covariances, in social integration and social coherence, and the removal of the item ‘people who do a favour expect nothing in return’ from the social acceptance dimension (shayeghian et al., 2019).
in portugal, cfa on the 33-item long-format portuguese version of the measure yielded the theoretically intended five-factor structure, with good concurrent validity (lages, magalhães, antunes, & ferreira, 2018). a south african study utilising the 15-item short-form version found an emic factor structure comprising three dimensions in a sample of employees in the motor manufacturing sector (de jager et al., 2008). although de jager et al. (2008) could not replicate keyes’ (1998) theoretically intended model, their interpretation of the factor solution was contextually meaningful and useful for our exploration. they named their three emergent factors social predictability and growth, social trust, and social value and belonging, leading them to the careful observation that ‘social well-being in south africa might be operationalised differently’ (de jager et al., 2008, p.57). none of these studies used exploratory structural equation modelling (esem). it is evident that the majority of previous studies relied on the use of cfa. the limitations of cfa, and the advantages of esem, are acknowledged by a number of scholars and methodologists (e.g. asparouhov & muthen, 2009; marsh, morin, parker, & kaur, 2014; perry, nicholls, clough, & crust, 2015). according to marsh et al. (2014), the multidimensional structures of many psychological scales cannot be sufficiently represented using simple cfa models. in fact, this practice results in poor model fit and overestimation of factor correlations (marsh et al., 2014). recent studies have supported the use of esem as its flexibility allows for better model fit and less inflated inter-factor correlations (marsh et al., 2014). the flexibility of esem is inherent in the sense that all items are specified to load on all the factors. this strategy allows cross-loadings, which tend to produce more realistically estimated factor correlations and better fit (marsh et al., 2014).
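the difference between the two specifications can be made concrete with a simple parameter count. the sketch below is illustrative only and is not the authors’ mplus analysis. it assumes 15 items, five factors with three items each, factor variances fixed to 1, freely estimated factor correlations, and the usual m(m − 1) rotation constraints for an obliquely rotated esem block with m factors. the function names are hypothetical.

```python
import numpy as np


def cfa_pattern(n_items: int, n_factors: int) -> np.ndarray:
    """Independent-clusters CFA pattern: 1 = free loading, 0 = fixed at zero.

    Assumes n_items is an exact multiple of n_factors.
    """
    per_factor = n_items // n_factors
    pattern = np.zeros((n_items, n_factors), dtype=int)
    for f in range(n_factors):
        pattern[f * per_factor:(f + 1) * per_factor, f] = 1
    return pattern


def esem_pattern(n_items: int, n_factors: int) -> np.ndarray:
    """ESEM pattern: every item is free to load on every factor."""
    return np.ones((n_items, n_factors), dtype=int)


def degrees_of_freedom(pattern: np.ndarray) -> int:
    """Covariance moments minus free parameters (factor variances fixed to 1)."""
    n_items, n_factors = pattern.shape
    moments = n_items * (n_items + 1) // 2
    free_loadings = int(pattern.sum())
    residual_variances = n_items
    factor_correlations = n_factors * (n_factors - 1) // 2
    # A fully free loading matrix needs m(m - 1) rotation constraints
    # on top of the m fixed factor variances to be identified.
    rotation_constraints = n_factors * (n_factors - 1) if pattern.all() else 0
    free = (free_loadings + residual_variances + factor_correlations
            - rotation_constraints)
    return moments - free


cfa = cfa_pattern(15, 5)
esem = esem_pattern(15, 5)
print(int(cfa.sum()), degrees_of_freedom(cfa))    # 15 free loadings, df = 80
print(int(esem.sum()), degrees_of_freedom(esem))  # 75 free loadings, df = 40
```

under these assumptions the cfa model estimates 15 loadings and the esem model 75, which gives 80 and 40 degrees of freedom respectively. the extra 60 loadings are the cross-loadings that cfa fixes at zero.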
examples of the use of esem in studying the factorial validity of multidimensional measures in positive psychology include benitez-borrego, guàrdia-olmos and urzúa-morales (2014), joshanloo (2016a, 2016b, 2016c) and joshanloo et al. (2016). in all of these studies, esem was found to be superior to cfa. according to marsh et al. (2014), esem incorporates cfa and efa, whilst efa is considered to be suboptimal to cfa because of its open-ended exploratory nature. the present study expands the research conducted by de jager et al. (2008), amongst others, by applying cfa and esem to investigate the factor structure of the social well-being measure in an african sample. as indicated by lages et al. (2018, p.16), ‘a proper understanding of mental health derives from the existence of valid and reliable measurement instruments, theoretically driven and adapted to their application contexts’. in line with this need, whether the factor structure of keyes’ (1998) social well-being model holds in an african sample needs to be examined. thus, we needed to respond to the question of whether the social well-being indicators, namely social integration, social contribution, social coherence, social actualisation and social acceptance, as operationalised in keyes’ (1998) model, should be used for the assessment of social well-being in africa.
methods
participants
quantitative data were collected using a cross-sectional survey in which 402 students in south africa participated. data collection took place in 2015 at a university of technology located in the gauteng province of south africa. the sample consisted of 199 male (49.5%) and 191 female (47.5%) students (12 people did not indicate their gender) between the ages of 18 and 34 years, with an average age of 21.74 (standard deviation [s.d.] = 2.34) years.
measuring instrument
social well-being scale short-form
the social well-being scale short-form (sws-sf) (keyes, 1998) is a 15-item scale designed to measure social well-being based on the five dimensions indicating how individuals appraise circumstances and functioning in society. it is scored on a 6-point likert scale ranging from strongly disagree (1) to strongly agree (7). the five dimensions, social integration, social contribution, social coherence, social actualisation and social acceptance, are each measured using three items. in the original study, keyes (1998) found the subscales, except for social acceptance (0.41), to be reliable, as shown by cronbach’s alpha coefficients ranging between 0.64 and 0.73. using cfa in iran, shayeghian et al. (2019) found modest reliability indices. in portugal, lages et al. (2018) found the long version to be reliable. in south africa, de jager et al.’s (2008) three-factor model also produced internally coherent dimensions: social predictability and growth (α = 0.62), social trust (α = 0.69), social value and belonging (α = 0.74).
procedure and ethical considerations
data in the present study were collected as part of a project named hope, motivation and social well-being: exploring eudaimonic well-being amongst youth (approved by the north-west university research ethics regulatory committee [reference number: nwu-00138-14-a8] and vaal university of technology research and innovation ethics committee [reference number: 20140425-1ms]). after the recruitment and consent process, the completion of questionnaires commenced under the supervision of research assistants and student tutors. guidelines from the helsinki declaration (world health organization [who], 2001) and the south african department of health (2014, 2015) were followed.
the written informed consent form contained all the necessary information about the study, including details of its procedures, risks and potential benefits, and participants’ ethically entrenched rights such as confidentiality, voluntary participation and withdrawal.
data analysis
the present study investigated the model fit of the sws-sf (15 items) using cfa and esem in mplus (muthén & muthén, 1998–2017). we used robust maximum likelihood (mlr) estimation, with oblique geomin rotation. the five-factor model was tested first with cfa and second with esem. model fit was assessed using chi-square (χ2), root mean square error of approximation (rmsea), standardised root mean square residual (srmr), comparative fit index (cfi), tucker–lewis index (tli), akaike information criterion (aic) and bayesian information criterion (bic) (geiser, 2013). for good fit, the following criteria were used: a smaller and non-significant χ2; rmsea and srmr of less than 0.06; cfi of more than 0.95; tli of more than 0.95; smaller aic and smaller bic (byrne, 2012; geiser, 2013; hu & bentler, 1999; wang & wang, 2012).
results
the cfa five-factor model, in which each item loaded only on its one intended factor (figure 1), yielded poor fit, χ2(80) = 305.149, p < 0.001; cfi = 0.706; rmsea = 0.091 [0.080, 0.102], p < 0.001. the esem model fitted the data better, χ2(40) = 69.195, p = 0.002; cfi = 0.950; rmsea = 0.047 [0.028, 0.065], p = 0.588. model fit indices are displayed in table 1.
figure 1: confirmatory factor analysis model.
table 1: model fit indices.
the standardised factor loadings for the cfa and esem models, based on the five-factor solution, are reported in table 2. in the cfa model, except for the social contribution subscale, all the others have only two of the three items with factor loadings above 0.30.
this unstable internal consistency renders not only the five dimensional structure proposed by keyes (1998) untenable but also makes for an ill-fitting, highly restrictive cfa model. table 2: confirmatory factor analysis and exploratory structural equation modelling standardised factor loadings for the south african sample. the following items had non-salient loadings on any of the factors: ‘i don’t feel i belong to anything i’d call a community’ (item 1 of social integration), ‘people do not care about other people’s problems’ (item 5 of social acceptance) and ‘i find it easy to predict what will happen next in society’ (item 15 of social coherence). with one item excluded, social integration is indicated by two items which speak to the community being a source of safety. social coherence is indicated by three items in total: two from social acceptance and one from social actualisation. two items from social coherence and two items from social actualisation had salient loadings on social acceptance. lastly, the three items of social contribution straddle between social contribution and social actualisation. item 9 ‘i have nothing important to contribute to society’ cross-loads on two dimensions, whilst item 7 ‘i have something valuable to give to the world’ was retained in social contribution and item 8 ‘my daily activities do not produce anything worthwhile for my community’ loads on social actualisation by itself. the inter-factor correlations for the ill-fitting cfa model (table 3), range between −0.33 and 0.705. the inter-factor correlations from the esem model, which range between −0.148 and 0.468 are shown (table 4). a clear difference between the two sets of correlations is the moderate range in the esem model as compared with some of the extreme correlation coefficients seen in the cfa model. however, the prevalence of non-target factor loadings of the indicator items in this esem model makes the hypothetical five factor structure untenable. 
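as a rough aid to reading the results, the sketch below compares the cfi and rmsea values reported in the text against the cut-offs stated in the data analysis section. it is an illustration and not part of the authors’ analysis: the class and function names are hypothetical, only the two indices quoted in the text are included (srmr, tli, aic and bic appear in table 1, which is not reproduced here), and a value that exactly meets the cfi cut-off is treated as acceptable, which is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FitResult:
    """Fit indices for one model, as quoted in the results text."""
    label: str
    chi2: float
    df: int
    cfi: float
    rmsea: float


def meets_cutoffs(fit: FitResult, cfi_min: float = 0.95,
                  rmsea_max: float = 0.06) -> dict:
    """Check CFI and RMSEA against the cut-offs used in the study.

    Assumption: a CFI reported as exactly 0.950 counts as meeting the cut-off.
    """
    return {
        "cfi_ok": fit.cfi >= cfi_min,
        "rmsea_ok": fit.rmsea < rmsea_max,
    }


# Values taken from the results section of the text.
cfa = FitResult("cfa", chi2=305.149, df=80, cfi=0.706, rmsea=0.091)
esem = FitResult("esem", chi2=69.195, df=40, cfi=0.950, rmsea=0.047)

for fit in (cfa, esem):
    print(fit.label, meets_cutoffs(fit))
```

on these two indices the cfa model misses both cut-offs and the esem model meets both, although the esem cfi sits on the boundary. acceptable global fit does not by itself support the five-factor structure, as the non-target loadings described above show.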
table 3: inter-factor correlations in confirmatory factor analysis. table 4: inter-factor correlations in exploratory structural equation modelling. discussion the findings of this study show a deviation from the theoretically intended five-factor model proposed by keyes (1998), with cfa yielding poor fit and esem being characterised by a number of non-target loadings. an emic four factor solution from the esem model, albeit with a degree of instability, was interpretable. the following four dimensions are observed: community as a source of safety (two items), the world as understandable (four items), the world as generous and kind (three items) and ability to contribute (two items). with four items excluded, this structure is made up of only 11 items. two of the factors consist of only two items, making them minor factors. the three items with no salient loadings on any of the factors, even with the esem option of flexible cross-loading were items 1 ‘i don’t feel i belong to anything i’d call a community’; item 5 ‘people do not care about other people’s problems’ and item 15 ‘i find it easy to predict what will happen next in society’. items 1 and 5 may suggest that there is a negative wording factor or that the two items hold contextual interpretation, which offers the reasons for exclusion. in the absence of a negative wording factor, the latter is more plausible. it is possible, as opined by ryff and singer (1998) that the questions of not belonging to a community and that people’s problems would not be cared about are incomprehensible in an african socio-cultural context. two possible reasons may explain why item 15 does not resonate with the respondents in this study. the first has to do with the strong shouldering of responsibility to predict and the second points to the item’s insistence on future-orientation and assumption of certainty of knowing what will happen next. 
the view that future-orientation does not enjoy salience in many african societies was popularised by, amongst others, mbiti (1991). placing a high value on predicting what will happen in society makes an assumption about the society’s tolerance for ambiguity (see hofstede, 2011). cultures that are comfortable with unstructured situations set and follow fewer rules, tend to live in the moment, are more tolerant of different opinions and are open to different experiences (hofstede, 2011). ryff and singer’s (1998) conception of practical wisdom and improvisation valued in an african socio-cultural context may offer a plausible explanation.
community as a source of safety
the community as a source of safety dimension is constituted by two items with salient loadings, namely item 2: ‘i feel close to other people in my community’ and item 3: ‘my community is a source of comfort’, which had been intended to be indicators of social integration. in a study by de jager et al. (2008), these two items, together with the two which are indicative of the ideas that the world is becoming a better place and that people are kind, were thought to represent social trust.
the world as understandable
the world as understandable dimension is made up of four items: item 11: ‘society has stopped making progress’; item 12: ‘society isn’t improving for people like me’; item 13: ‘the world is too complex for me’; and item 14: ‘i cannot make sense of what’s going on in the world’. the first two items were intended as indicators of social actualisation and the other two as indicators of social coherence. this dimension is reminiscent of the comprehensibility dimension of antonovsky’s (1993) sense of coherence model. it refers to the experience of the world as being ordered, constant, structured and clear. the opposite end of this spectrum, as expressed by antonovsky (1993), would be a world characterised by chaos, disorder, randomness and inexplicability.
the world as generous and kind
the world as generous and kind dimension is indicated by three items, which are item 4: ‘people who do a favour expect nothing in return’, item 6: ‘i believe that people are kind’ and item 10: ‘the world is becoming a better place for everyone’. this three-item factor is reminiscent of the (individual) strength of kindness (see park, peterson, & seligman, 2004), which is characterised by generosity, nurturance and care, when expressed at an individual level.
ability to contribute
this factor is indicated by item 7: ‘i have something valuable to give to the world’ and item 9: ‘i have nothing important to contribute to society’. missing from this social contribution factor is item 8: ‘my daily activities do not produce anything worthwhile for my community’, which loaded on a factor by itself. in keyes’ terms, it seems that one’s sense of meaningful membership of a society hinges on the belief that one has something of value to offer. in an african context, the magnitude of this generosity is never in question or under judgement. as ryff and singer (1998, p.5) observed, ‘africans have no conception of the person apart from the community’.
limitations and recommendations
the present study was conducted on a single sample. a multi-sample study using a series of different factor analytical approaches would help in providing greater evidence of the consistency of these seemingly novel findings. even when it may appear from this study ‘that esem is a more appropriate method for examining the factor structure of well-being scales’ (joshanloo et al., 2016, p.107), the inconclusive findings encourage future studies. we also used the short-form version of the scale. it is possible that a long-form version may produce more information about the factorial stability of the social well-being model.
lastly, qualitative studies, which use more inductive forms of exploration of a phenomenon, would help to define and describe social well-being from the perspective of laypeople (see delle fave et al., 2016; mozaffari, peyrovi, & nayeri, 2015; wilson fadiji, meiring, & wissing, 2019).
conclusion
in addition to the present study, at least two more empirical investigations (de jager et al., 2008; joshanloo, 2018) have produced results that attest to the heterogeneity and possible instability of the social well-being dimensions. these findings attest to the complexity of well-being as a socially embedded construct (see white, 2017). to acknowledge that well-being is not located only at the micro-level of individuals is also to understand that human lives are shaped by their ecology (gruner & csikszentmihalyi, 2018).
acknowledgements
competing interests
the authors declare that they have no financial or personal relationships that may have inappropriately influenced them in writing this article.
authors’ contributions
i.p.k., u.p.e., k.o.a., and j.r. all contributed equally to this work.
funding information
this research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.
data availability
a data set was generated through a cross-sectional survey conducted amongst students at a university of technology in south africa.
disclaimer
the views and opinions expressed in this article are those of the authors and do not necessarily reflect the official policy or position of any affiliated agency of the authors.
references
antonovsky, a. (1993). the structure and properties of the sense of coherence scale. social science and medicine, 36(6), 725–733. https://doi.org/10.1016/0277-9536(93)90033-z asparouhov, t., & muthen, b. (2009). exploratory structural equation modelling. structural equation modelling: a multidisciplinary journal, 16(3), 397–438.
https://doi.org/10.1080/10705510903008204 benitez-borrego, s., guàrdia-olmos, j., & urzúa-morales, a. (2014). factorial structural analysis of the spanish version of whoqol-bref: an exploratory structural equation model study. quality of life research, 23(8), 2205–2212. https://doi.org/10.1007/s11136-014-0663-2 byrne, b.m. (2012). structural equation modeling with mplus: basic concepts, applications, and programming. new york, ny: routledge. chilisa, b., major, t.e., & khudu-petersen, k. (2017). community engagement with a postcolonial, african-based relational paradigm. qualitative research, 17(3), 326–339. https://doi.org/10.1177/1468794117696176 chilisa, b., & tsheko, g.n., (2014). mixed methods in indigenous research: building relationships for sustainable intervention outcomes. journal of mixed methods research, 8(3), 222–233. https://doi.org/10.1177/1558689814527878 de jager, m., coetzee, s., & visser, d. (2008). dimensions of social well-being in a motor manufacturing organisation in south africa. journal of psychology in africa, 18(1), 57–64. https://doi.org/10.1080/14330237.2008.10820171 delle fave, a., brdar, i., wissing, m.p., araujo, u., castro solano, a., freire, t., … soosai-nathan, l. (2016). lay definitions of happiness across nations: the primacy of inner harmony and relational connectedness. frontiers in psychology, 7(30), 1–23. https://doi.org/10.3389/fpsyg.2016.00030 geiser, c. (2013). data analysis with mplus. new york, ny: guilford press. gruner, d.t., & csikszentmihalyi, m. (2018). towards a new measure of societal well-being. in n.j.l. brown, t. lomas, & f.j. eiroa-orosa (eds.), the routledge international handbook of critical positive psychology (pp. 377–391). london: routledge. helliwell, j.f., barrington-leigh, c., harris, a., & huang, h. (2010). international evidence on social context of well-being. in e. diener, j.f. helliwell, & d. kahneman (eds.), international differences in well-being (pp. 291–327). oxford university press. 
hofstede, g. (2011). dimensionalizing cultures: the hofstede model in context. online readings in psychology and culture, 2(1), 1–26. https://doi.org/10.9707/2307-0919.1014 hu, l.t., & bentler, p.m. (1999). cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. structural equation modeling, 6(1), 1–55. https://doi.org/10.1080/10705519909540118 joshanloo, m. (2016a). revisiting the empirical distinction between hedonic and eudaimonic aspects of well-being using exploratory structural equation modelling. journal of happiness studies, 17(5), 2023–2036. https://doi.org/10.1007/s10902-015-9683-z joshanloo, m. (2016b). a new look at the factor structure of the mhc-sf in iran and the united states using exploratory structural equation modeling. journal of clinical psychology, 72(7), 701–713. https://doi.org/10.1002/jclp.22287 joshanloo, m. (2016c). factor structure of subjective well-being in iran. journal of personality assessment, 98(4), 435–443. https://doi.org/10.1080/00223891.2015.1117473 joshanloo, m., bobowik, m., & basabe, n. (2016). factor structure of mental well-being: contributions of exploratory structural equation modeling. personality and individual differences, 102, 107–110. https://doi.org/10.1016/j.paid.2016.06.060 joshanloo, m., wissing, m.p., khumalo, i.p., & lamers, s.m.a. (2013). measurement invariance of the mental health continuum-short form (mhc-sf) across three cultural groups. personality and individual differences, 55(7), 755–759. https://doi.org/10.1016/j.paid.2013.06.002 joshanloo, m. (2018). the structure of the mhc-sf in a large american sample: contributions of multidimensional scaling. journal of mental health, 29(2), 139–143. https://doi.org/10.1080/09638237.2018.1466044 keyes, c.l.m., & shapiro, a.d. (2004). social well-being in the united states: a descriptive epidemiology. in o.g. brim, c.d. ryff, & r.c.
kessler (eds.), how healthy are we?: a national study of well-being at midlife (pp. 350–372). chicago, il: the university of chicago press. keyes, c.l.m. (2006). mental health in adolescence: is america’s youth flourishing? american journal of orthopsychiatry, 76, 395–402. https://doi.org/10.1037/0002-9432.76.3.395 keyes, c.l.m. (1998). social well-being. social psychology quarterly, 61(2), 121–140. https://doi.org/10.2307/2787065 kpanake, l. (2018). cultural concepts of the person and mental health in africa. transcultural psychiatry, 55(2), 198–218. https://doi.org/10.1177/1363461517749435 lages, a., magalhães, e., antunes, c., & ferreira, c. (2018). social well-being scales: validity and reliability evidence in the portuguese context. psicologia, 32(2), 15–26. https://doi.org/10.17575/rpsicol.v32i2.1334 lamers, s.a., westerhof, g.f., bohlmeijer, e.t., ten klooster, p.m., & keyes, c.l.m. (2011). evaluating the psychometric properties of the mental health continuum-short form (mhc-sf). journal of clinical psychology, 67(1), 99–110. https://doi.org/10.1002/jclp.20741 marsh, h.w., morin, a.j.s., parker, p.d., & kaur, g. (2014). exploratory structural equation modelling: an integration of the best features of exploratory and confirmatory factor analysis. annual review of clinical psychology, 10, 85–110. https://doi.org/10.1146/annurev-clinpsy-032813-153700 mbiti, j.s. (1991). introduction to african religion (2nd edn.). oxford: heinemann. mertens, d.m. (2016). advancing social change in south africa through transformative research. south african review of sociology, 47(1), 5–17. https://doi.org/10.1080/21528586.2015.1131622 molefe, m. (2017). critical comments on afro-communitarianism: the community versus individual. filosofia theoretica: journal of african philosophy, culture and religions, 6(1), 1–22. https://doi.org/10.4314/ft.v6i1.1 mozaffari, n., peyrovi, h., & nayeri, n.d. (2015).
the social well-being of nurses shows a thirst for a holistic support: a qualitative study. international journal of qualitative studies on health and well-being, 10(1), 1–8. https://doi.org/10.3402/qhw.v10.27749 muthén, l.k., & muthén, b.o. (1998–2017). mplus statistical analysis with latent variables: users’ guide (8th edn.). los angeles, ca: muthén & muthén. neto, l., & marujo, h. (2013). positive community psychology and positive community development: research and intervention as transformative-appreciative actions in h. marujo & l.m. neto (eds.), building positive nations and communities (pp. 209–230). dordrecht: springer. nyamnjoh, f.b. (2017). incompleteness: frontier africa and the currency of conviviality. journal of asian and african studies, 52(3), 253–270. https://doi.org/10.1177/0021909615580867 nyamnjoh, f.b. (2019, may 22). ubuntuism and africa: actualised, misappropriated, endangered and reappraised [lecture presentation]. africa day memorial lecture, university of the free state, bloemfontein. retrieved from https://bit.ly/37x8er6 park, n., peterson, c., & seligman, m.e.p. (2004). strengths of character and well-being. journal of social and clinical psychology, 23(5), 603–619. https://doi.org/10.1521/jscp.23.5.603.50748 patri, g., albanesi, c., & pietrantoni, l. (2016). the reciprocal relationship between sense of community and social well-being: a cross-lagged panel analysis. social indicators research, 127, 1321–1332. https://doi.org/10.1007/s11205-015-1012-8 perry, j.l., nicholls, a.r., clough, p.j., & crust, l. (2015). assessing model fit: caveats and recommendations for confirmatory factor analysis and exploratory structural equation modelling. measurement in physical education and exercise science, 19(1), 12–21. https://doi.org/10.1080/1091367x.2014.952370 prilleltensky, i. (2005). promoting well-being: time for a paradigm shift in health and human services. scandinavian journal of public health, 33(66), 53–60. 
https://doi.org/10.1080/14034950510033381 ryff, c.d. (1989). happiness is everything, or is it? explorations on the meaning of psychological well-being. journal of personality and social psychology, 57(6), 1069–1081. https://doi.org/10.1037/0022-3514.57.6.1069 ryff, c.d., & singer, b. (1998). the contours of positive human health. psychological inquiry, 9(1), 1–28. https://doi.org/10.1207/s15327965pli0901_1 shayeghian, z., amiri, p., vahedi-notash, g., karimi, m., & azizi, f. (2019). validity and reliability of the iranian version of the short-form social well-being scale in a general urban population. iranian journal of public health, 48(8), 1478–1487. https://doi.org/10.18502/ijph.v48i8.2988 south african department of health (doh). (2014, september 19). national health act (act no. 61 of 2003) regulations relating to research with human participants. government gazette no. r. 719. pretoria: government printing works. south african department of health (doh). (2015). ethics in health research: principles, structures and processes. retrieved from http://www.nhrec.org.za/docs/documents/ethicshealthresearchfinalaused.pdf wang, j., & wang, x. (2012). structural equation modeling: applications using mplus. west sussex: wiley. white, s.c. (2010). analysing well-being: a framework for development practice. development in practice, 20(2), 158–172. https://doi.org/10.1080/09614520903564199 white, s.c. (2017). relational well-being: re-centring the politics of happiness, policy and the self. policy & politics, 45(2), 121–136. https://doi.org/10.1332/030557317x14866576265970 wilson, a., wissing, m.p., & schutte, l. (2019). ‘we help each other’: relational patterns among older individual in south african samples. applied research in quality of life, 14, 1373–1392. https://doi.org/10.1007/s11482-018-9657-5 wilson fadiji, a., meiring, l., & wissing, m.p. (2019). 
understanding well-being in the ghanaian context: linkages between lay conceptions of well-being and measures of hedonic and eudaimonic well-being. applied research in quality of life: the official journal of the international society for quality-of-life studies, 16, 649–677. https://doi.org/10.1007/s11482-019-09777-2 wissing, m.p. (2014). meaning and relational well-being in cross-cultural perspectives. journal of psychology in africa, 24(1), iii–vi. https://doi.org/10.1080/14330237.2014.904092 wissing, m.p., & temane, m. (2013). south africa’s truth and reconciliation process as applied positive psychology in nation building. in h. marujo & l.m. neto (eds.), building positive nations and communities (pp. 149–170). dordrecht: springer. world health organization. (2001). world medical association declaration of helsinki: ethical principles for medical research involving human subjects. bulletin of the world health organization, 79(4), 373–374. about the author(s) charles h. van wijk institute for maritime medicine, simon’s town, south africa jarred h. martin institute for maritime medicine, simon’s town, south africa department of psychology, university of pretoria, pretoria, south africa citation van wijk, c.h., & martin, j.h. (2019). a brief sailor resiliency scale for the south african navy. african journal of psychological assessment, 1(0), a12. https://doi.org/10.4102/ajopa.v1i0.12 original research a brief sailor resiliency scale for the south african navy charles h. van wijk, jarred h. martin received: 18 mar. 2019; accepted: 06 sept. 2019; published: 17 oct. 2019 copyright: © 2019. the author(s). licensee: aosis. this is an open access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
abstract resilience constructs and measures in the military context are of particular interest because of their association with general performance and mental health outcomes. however, in spite of the reported advantages, the use of resilience assessment models faces two challenges: firstly, measurement and, secondly, operational application within the military environment. this article aimed to provide preliminary validation for a brief sailor resiliency scale (bsrs) for use in the south african navy (san) in order to discuss its operational application for individuals and groups. the study used a sample of active-duty san sailors, distributed throughout the fleet. participants (n = 1312) completed the bsrs, together with established measures of resiliency and emotional regulation, and also provided socio-demographic information. the psychometric structure of the scale was examined, firstly, through confirmatory factor analysis within structural equation modelling, and secondly socio-demographic effects and construct validity were also explored. the model yielded acceptable fit and high internal consistency. furthermore, the results support the construct validity of the scale. the data appear to support the contention that comprehensive resilience screening measures, while still brief and time-effective, could be employed to the benefit of navy personnel. this would facilitate a ‘screen-and-stream’ approach which allows military mental health practitioners (1) to screen military personnel comprehensively and (2) to identify and stream quickly those whose resilience appears to be compromised for further assessment and targeted intervention by appropriate support providers. keywords: measurement; resilience; south africa; screen-and-stream; ice environments. 
introduction the brief sailor resiliency scale (bsrs) is an instrument which aims to measure four core dimensions that are thought to contribute to a comprehensive and global measure of resilience within the military environment, namely mental, physical, social and spiritual fitness. the aim of this article is to demonstrate preliminary validity of the bsrs for local use, in order to discuss the potential operational and, in particular, occupational health applications of the bsrs beyond mere use in resilience research. background resilience is the process of adapting well in the face of adversity, trauma, tragedy or threats (american psychological association, 2019). a number of constructs fall under the umbrella of resilience, such as hardiness (kobasa, 1979) and mental toughness (clough, earle, & sewell, 2002). these constructs are generally conceptualised as psychological orientations that are associated with people who remain healthy and continue to perform well under a range of stressful conditions (bartone, roland, picano, & williams, 2008; kobasa, maddi, & kahn, 1982). an extensive body of research supports the idea that resilience constructs protect against the ill effects of stress on health and performance among a wide variety of civilian occupations and contexts (bartone, 1989; gerber et al., 2015; giles et al., 2018; maddi & hess, 1992; maddi & kobasa, 1984). in the military, constructs such as hardiness have been shown to influence outcomes among soldiers in various training and combat environments (bartone, 1996, 1999; bartone, johnsen, eid, laberg, & brun, 2002; bartone, ursano, wright, & ingraham, 1989). 
hardy soldiers further appear less likely to develop post-traumatic stress disorder (ptsd) and other mental health conditions after combat exposure (bartone, 1999, 2000; bartone, hystad, eid, & brevik, 2012; escolas, pitts, safer, & bartone, 2013; pietrzak, johnson, goldstein, malley, & southwick, 2009), and may adapt better both during and after operational deployments (britt, adler, & bartone, 2001). resilience constructs and measures in the military context are of particular importance through their association with general performance and mental health outcomes (lee, sudom, & zamorski, 2013). military life is traditionally associated with exposure to challenging conditions, where enhanced degrees of personal resilience are known to facilitate positive health benefits (simmons & yoder, 2013) and more meaningful modes of adaptation to the demands of operational work (morgan & bibb, 2011). in certain operational environments, such as those typically faced by naval forces, a number of occupational groups work in isolated, confined and/or extreme (ice) environments (e.g. on ships and in submarines), which adds an additional layer of potentially stressful environmental circumstances (smallidge et al., 2013). moreover, naval deployments have also been shown to give rise to peculiar operationally specific stressors and traumatic exposures, which can act as potential compromisers of sailors’ resilience (martin, van wijk, hans-arendse, & makhaba, 2013). in both instances, enhanced styles of resilience may be particularly beneficial for naval personnel in withstanding the rigours of military work and life. in this regard, the ability to meaningfully measure resilience in military populations has increasingly become important because of the occupational and operational advantages that such research yields (xie, peng, zuo, & li, 2016). 
for example, it helps to identify: (1) operationally at-risk individuals, in order to offer additional support; (2) behavioural targets for intervention; (3) resilience-associated protective, promotive and/or compromising factors; and (4) the effects of interventions on individual or organisational levels. however, there are challenges when directly measuring psychological resilience by means of psychometric measures, especially in unique ice environments. thus, within military contexts, resilience is often assessed through proxies, such as ‘adaptation, satisfaction, and other “competent functioning” indicators’ (wright, riviere, merrill, & cabrera, 2013, pp. 175–176), rather than resilience per se. as mentioned in the above definition, resilience is conceptualised as an iterative process of adaptation and adaptability. the proxies used to assess it may be considered as expressions of resiliency. resiliency is conceptualised as an outcome of the resilience process, which – at least in the military – is reflected in the successful performance of important personal and military life roles (bowen & martin, 2011). sailors’ ability (also called readiness) to fulfil their military roles is often referred to as their fitness for duty. fitness, in this use of the word, is a resilience resource (i.e. a resource that facilitates resiliency). the model investigated in this article provides for four fitness domains, which collate into a total fitness construct, using the united states air force (usaf) definitions (see table 1) that centre on ‘ability’ to cope and adapt (and that are measured by behavioural outcomes, i.e. resiliency indicators). total fitness has been reported to have a direct and positive influence on performance-based resiliency (bowen, jensen, & martin, 2016a; bowen & martin, 2011). table 1: united states air force definitions of four fitness domains.
in spite of the reported advantages, the use of resilience/resiliency models faces two challenges: firstly, measurement and, secondly, operational application in specific contexts. measuring fitness in isolated, confined and/or extreme contexts there are a multitude of scales available in the general literature that purport to measure aspects of resilience, many of them relatively effective in predicting resilience in the face of real adversity. however, naval – and ice – environments are often quite unique, and established measures are not always a good fit. further, tools often measure general dispositional orientation, rather than behaviours in specific contexts. given the naval context, a measure of behaviours and beliefs may be more useful in that it could provide sailors with a means (e.g. an action) to both measure and enhance their resilience. thus, fitness-in-context, as a building block of resilience, may be particularly appropriate. such a scale already exists, having been developed in the usaf context. bowen, jensen and martin (2016a, 2016b) developed a tool to assess comprehensive airman fitness (caf), and conducted rigorous factor and multiple group comparison analyses to empirically validate a 12-item measure of the four fitness components and an overall component of comprehensive fitness. their results verified the presence of the four distinct fitness factors (mental, physical, social and spiritual), each measured with three observed indicators, with high levels of internal consistency within factors. it also demonstrated that the four individual fitness constructs loaded onto a second-order latent construct of total fitness, thus confirming that the four fitness domains can be considered as a total measure of fitness. bowen et al. (2016a) further demonstrated construct validity of the caf measure by using a self-assessed performance-based measure of military resiliency. 
this measure was defined as a latent factor with three observed variables (bowen et al., 2016a), and the indicator was derived from measuring human performance within the inherently stressful conditions of military duties and service life (bowen & martin, 2013). their analyses also showed that the caf instrument was invariant across subgroups defined by military pay grade, gender, marital status and deployment status in the past 12 months – a desirable characteristic of any assessment tool used within a diverse target population such as the military (bowen et al., 2016b, p. 441). operational application of measures measures are typically used as markers of resilience in larger research projects, with their scores then associated with other data that are thought to reflect mental health or performance. however, there has been less discussion on how individual scores could be used to enhance resilience, and presumably mental health, in target groups or larger populations. the need to develop contextually appropriate and comprehensive measures of resilience in military populations has in recent years become an increasingly topical matter (adler & sowden, 2018; greene & staal, 2017). this is firstly because of the increasing demand on military mental health practitioners (mmhps) to render health support services – to large numbers of military personnel over the short periods of time that are allotted during pre- and post-deployment readiness and decompression cycles – in far more (cost-)efficient and time-effective ways (mcdonald, beckham, morey, & calhoun, 2009). secondly, there is a need to timeously identify (potential) psychological casualties whose fallout may be preventable through proactive and multidimensional health interventions to enhance their overall level of resilience (castro, engel, & adler, 2004; jones, hyams, & wessely, 2003).
in other words, the future trajectory of military resilience measures is likely to be informed by what the researchers of this study regard as a ‘screen-and-stream’ (or sas) approach, which allows mmhps: (1) to screen military personnel comprehensively by means of a timeously administered, scored and analysed resilience measure and (2) to quickly identify and stream those military personnel whose resilience appears to be compromised towards further assessment and targeted intervention by appropriate mmhps or support providers. operational application in the above framework refers to the practical use of a screening tool in a specific context to support positive outcomes, for example using the bsrs in the military context to identify poor resiliency in order to provide further support and targeted interventions. aims this article aimed to propose and discuss a practical application of resilience scales generally, and the bsrs in particular, supported by underlying psychometric data. it did so in two parts. firstly, the caf measure has been developed and validated by using data from the usaf. the current study thus aimed to provide a preliminary validation of a modified version of the measure (bsrs) for use in the south african navy (san). this was done by exploring three aspects of the psychometric properties of the scale in a san sample, namely: (1) exploring the psychometric structure of the scale by using confirmatory factor analysis (cfa) within structural equation modelling, and internal reliability analysis; (2) exploring socio-demographic associates, namely the effects of age, gender and experience (operationalised as years of military service, and number of operational deployments); and (3) exploring scale validity. construct validity was examined by correlating the four fitness components, and total fitness, with a resiliency measure (the military resiliency scale [mrs]) to replicate the analysis of bowen et al. (2016a).
convergent validity was further examined by correlating the four fitness components, and total fitness, with a measure of emotional self-regulation (the brunel mood state scale [brums]), as a proxy for psychological adaptation, in a sample of deployed sailors. thereafter, given the criticism of resilience scales regarding the management of individual or group specific findings in practical terms, and based on the demonstrated psychometric data, this article further aimed to propose and discuss an operational application of the bsrs beyond its limited use as a resilience marker in formal research. methods participants the study was conducted according to the principles set out in the declaration of helsinki (world medical association, 2013), and had received prior ethical approval from stellenbosch university. south african navy sailors on active duty were invited to participate anonymously in the study and were briefed that completion of the measures indicated consent. a total of 1312 san sailors (women = 21.4%, men = 78.6%) returned completed data sets. table 2 presents the sample composition. a subsample (n = 275), drawn from two warships, also completed the brums during operational deployments. they indicated code numbers on the bsrs for later correlation to their brums scores. all data were anonymised prior to analysis. table 2: south african navy sample composition (n = 1312). all participants had a minimum of 12 years of formal education, and all 11 south african official languages were spoken within the sample. the scale was administered in english, as the sample was considered to be proficient in english, which is the official command language of the san, and the language in which all training takes place. measurements brief sailor resiliency scale: the bsrs is based on the caf measure of bowen et al. (2016a, 2016b), and so named to fit into the terminology framework employed by the san. 
the bsrs is a 12-item measure of the four fitness components (namely mental, physical, social and spiritual), which can be calculated to obtain a comprehensive fitness score. each fitness component is measured by means of three indicators. data from the usaf indicate that the scale consists of four distinct first-order factors with high levels of internal consistency within factors, and that all four factors load onto a second-order factor of comprehensive fitness. good construct validity has been demonstrated (bowen et al., 2016a, 2016b). all items were completed using a likert scale format (anchored at 0 – not at all and 4 – completely). this 5-point likert scale differed from the original caf measure, which employed an 11-point scale; the range was chosen to be aligned to other measures used in the local san context, which typically uses 5-point scale formats. the 11-point range was further narrowed to accommodate second-language english speakers – who form the majority of this sample and the san in general. previous experience indicated that second-language english speakers found discerning the semantic nuances in an 11-point scale challenging. military resiliency scale: the mrs is a self-assessed performance-based measure of military resiliency (bowen et al., 2016a), consisting of three observed variables that tap human performance within military service life (bowen & martin, 2013). responses are indicated on a 5-point likert scale; support for validity indicators (bowen & martin, 2013), as well as an alpha coefficient of 0.81 (bowen et al., 2016a), was reported previously. brunel mood state scale: the brums is a 24-item self-report inventory that measures mood states on a 5-point likert scale (mcnair, heuchert, & shilony, 2003; terry, lane, & fogarty, 2003). good concurrent and criterion validity, as well as reliability, have been reported both internationally (mcnair et al., 2003; terry et al., 2003) and locally (terry, potgieter, & fogarty, 2003). 
among others, it has been used during military deployments to predict self-reported post-traumatic stress symptoms after maritime interdiction operations (van wijk, martin & hans-arendse, 2013). the brums is sensitive to changes in emotional regulation, and is regularly used in the san as an indicator of psychological adaptation during operational deployments in specific ice contexts (institute for maritime medicine, 2018). in this context, adaptation is a proxy for resiliency (i.e. an outcome of resilience), and brums total scores were used for correlation to bsrs scores to examine convergent validity. data analysis descriptive analysis was conducted through score distribution and tests of normality. the psychometric structure of the scale was examined by cfa within structural equation modelling, and internal reliability analysis. confirmatory factor analysis is a special form of factor analysis, used to test whether data fit a hypothesised measurement model (marker, 2002). with the original factor structure of the scale established in usaf samples, cfa was employed to verify the relationships between the observed variables and their underlying latent constructs in a local san sample. further information on cfa indices can be found in appendix 1 – table 2-a1. socio-demographic effects were examined by using bivariate correlation coefficients (for age and experience) and independent t-tests (for gender). construct validity was also examined by using bivariate correlation coefficients of bsrs scores and mrs and brums scores. all analyses were conducted by means of statistical package for social sciences (spss version 25) and analysis of moment structures (amos). ethical consideration the study received approval from the health research ethics committee of stellenbosch university (protocol number: n16/04/051). results normality distribution the total fitness score had a mean of 38.3 and a standard deviation of ± 6.4. 
it is graphically represented in appendix 1, figure 1-a1. tests of univariate normality were conducted, and it was found that all skew index values were less than 2 and all kurtosis index values were less than 3 (appendix 1, table 1-a1), thus indicating that the distributions of responses were not necessarily problematic (george & mallery, 2010). confirmatory factor analysis the 12-item bsrs was subjected to cfa, and the results associated with the final model are displayed in figure 1. the model yielded acceptable fit, as indicated by the following model fit indices: χ²(48) = 159.59, p < 0.001; root mean square error of approximation (rmsea) = 0.042 (95% confidence interval [ci]: 0.035–0.049); comparative fit index (cfi) = 0.998; goodness-of-fit index (gfi) = 0.998; and adjusted goodness-of-fit index (agfi) = 0.995. standardised first-order factor loadings ranged from 0.78 to 0.94, and second-order factor loadings ranged from 0.70 to 0.78. expanded goodness-of-fit indices and factor correlations can be found in appendix 1, tables 2-a1 and 3-a1. figure 1: final model with standardised estimated parameters. reliability the 12-item bsrs comprehensive fitness scale produced a cronbach’s alpha (α) of 0.874. mental fitness (α = 0.745), physical fitness (α = 0.851), social fitness (α = 0.873) and spiritual fitness (α = 0.892) all produced acceptable alphas. apart from mental fitness, which differed somewhat from α = 0.90 reported in the validation studies, the rest were similar to published reports (cf. bowen et al., 2016a, 2016b). socio-demographic associations the correlations between bsrs and age, years of military service and number of operational deployments are presented in table 3. men scored higher than women (t = 4.160, p < 0.001, cohen’s d = 0.28, mean difference = 1.8), although the actual size of the difference was very small. table 3: correlations between socio-demographic variables and brief sailor resiliency scale comprehensive fitness scores.
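as a hedged illustration of two of the statistics reported above, the sketch below computes an rmsea point estimate from a model chi-square and a cronbach's alpha from an item-by-respondent matrix. it assumes the common single-group rmsea formula with n − 1 in the denominator (some software uses n instead) and the standard variance-based formula for alpha; the function names and the toy responses are hypothetical and are not taken from the study, which ran its analyses in spss and amos.

```python
import math
from statistics import variance


def rmsea(chi2: float, df: int, n: int) -> float:
    """RMSEA point estimate (single-group form, assuming N - 1 in the denominator)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cronbach_alpha(rows: list[list[float]]) -> float:
    """Cronbach's alpha; each row is one respondent, each column one item."""
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of the respondents' summed scores.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Values reported for the final CFA model: chi-square(48) = 159.59, n = 1312.
print(round(rmsea(159.59, 48, 1312), 3))  # 0.042, consistent with the reported RMSEA

# Hypothetical toy data (three respondents, three items), NOT study data.
toy = [[0, 0, 0], [2, 2, 2], [4, 4, 4]]
print(cronbach_alpha(toy))  # close to 1 because the toy items agree perfectly
```

the rmsea check is only a consistency test on the published figures; reproducing the subscale alphas would require the item-level responses, which are not part of this article.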
scale validity the correlations between the bsrs first- and second-order factors and the measure of resiliency are presented in table 4, as are the correlations with a measure of emotional regulation. brief sailor resiliency scale’s total and component scores all predicted resiliency (p < 0.001 for all), as well as emotional regulation (as a proxy for psychological adaptation; p < 0.001 for all) during an operational deployment. table 4: brief sailor resiliency scale construct validity indicators. discussion preliminary validation of the brief sailor resiliency scale in the south african navy context the findings provide a preliminary validation of the bsrs for use in the san. the analysis confirmed the previously reported factor structure and internal reliability (bowen et al., 2016a, 2016b). the findings support the model of four distinct fitness domains (mental, physical, social and spiritual) that can be considered to contribute towards a more global measure, namely the second-order factor of comprehensive fitness. it was noteworthy that the variables age, years of military service and number of operational deployments, all displayed similar trajectories, suggesting that all three may be tapping into the same construct, perhaps in this case ‘life experience’, which is generally operationalised as age. although small variations across gender and age were observed, this could be because of the distribution of age and discrepant gender subgroups, and would likely not have practical significance in the application of the bsrs. furthermore, the findings replicated support for the construct validity of the scale (bowen et al., 2016a), and further extended support for its validity, in that bsrs score outcomes appeared to predict actual psychological adaptation in ice contexts.
in practice, this may mean that some indicators of problematic adaptation during ice missions could possibly be predicted in advance with the bsrs, leading the way towards considering timely intervention. operational application: towards a screen-and-stream approach the establishment of preliminary psychometric properties sets the scene for considering the operational application of resilience scales, referring to its use in specific contexts. in this regard, the findings of this study appear to support the contention that comprehensive resilience screening measures, while brief and time-effective, such as the bsrs, hold both occupational and operational health benefits for military personnel, broadly, and naval personnel, in particular. while the bsrs appears psychometrically valid in providing an overall measure of comprehensive fitness, it demonstrates its operational application through the psychometrically sound and nuanced rendering of resiliency through subscales of mental, physical, social and spiritual fitness. the use of the bsrs therefore provides a screening measure of specific core dimensions, which underwrite the overall resilience of military personnel, and in effect aid in assessing the resilience dimension of a particular sailor’s combat and operational readiness. in doing so, mmhps who use the bsrs are provided with clear psychometric indicators concerning those sailors – be they individuals or teams – who may require further assessment and perhaps benefit from targeted resilience-enhancing interventions. in the south african national defence force, much like many other militaries around the world (firth & smith, 2010), the occupational health and welfare of military service personnel is regarded as multifaceted and informed by physiological (e.g. biological), psychological (e.g. emotional), social (e.g. familial) and spiritual (e.g. religio-cultural belief system) determinants (south african military health service, 2008). 
to this effect, the practice of military health support and service provision is often circumscribed by the involvement of multi-professional teams which involve the co-participation of various health and support professionals, such as physicians, psychologists, social workers and chaplains, who work together to collectively manage, treat and proactively enhance the occupational well-being and, by extension, operational health and utility of military personnel. while comprehensive military health support systems are necessary for sustaining the health of operationally active military personnel, such systems are also prone to laborious, time-intensive and often over-burdened and under-staffed referral channels. as a consequence, operationally at-risk individuals may ‘get lost in the system’ or ‘fall through the cracks’ because of poorly articulated or inefficient referrals for further assessment and intervention. however, the bsrs provides a concise multidimensional screening of resilience, through which specific dimensions of potentially compromised resilience can be identified and ‘streamed’ to the most appropriate mmhp or support professional for further assessment and intervention. in this regard, the bsrs becomes especially valuable for the mmhps who work with the operationally active sailors of the san, whose military work and life is increasingly characterised by regular deployments and a high operational tempo by virtue of the leading role that the san plays in maritime border patrol (defenceweb, 2019) and multinational anti-piracy operations along the southern coasts of africa (defenceweb, 2018). importantly, the sas approach is by no means limited to naval or general military contexts, and could also be considered for translation and application in other occupational environments with exposure to challenging conditions, and which require ongoing and meaningful modes of adaptation to the operational demands placed on those personnel. 
such occupational contexts may range from the south african police service or emergency medical or rescue services to the offshore industry that typically lives and works in ice conditions. limitations and future directions the briefness of the subscales, and the very high values of the model fit indices, may suggest some over-fitting or saturation of the bsrs model (marker, 2002), and caution is advised when interpreting these results. further research using different data sets may assist to resolve this concern. the study used a limited array of markers to represent the outcome of resilience, and future studies may need to extend the measuring of psychological adaptation and mental health, as well as the measuring of actual work performance to enhance understanding of the relationship between comprehensive sailor fitness and well-being in the naval context. further research is also required to establish the extent to which this model can be transferred to other related occupational contexts. another issue worth noting is the conceptualisation of spiritual fitness within the bsrs. although the measure of spiritual fitness in this research was found to be both valid and reliable, the operationalisation of the spiritual fitness subscale and items adhered quite closely to the original caf-based articulation of spiritual fitness in the usaf environment. to this effect, it may still be necessary to broaden the conceptualisation of spiritual fitness for san sailors who draw from diverse, intersecting and often competing religio-cultural, indigenous and cosmological systems which inform how they understand and sustain spiritual fitness as a facet of personal resilience within the military environment. conclusion this study established the potential of the bsrs to assess for resilience outcomes among san sailors. 
it further proposes an operational application, namely a ‘screen-and-stream’ model, as a valid and cost-effective way to provide appropriate support to individuals and groups with potentially compromised aspects of resilience. acknowledgements the authors wish to thank prof. m. kidd for support with the structural equation modelling. competing interests the authors declare that they have no financial or personal relationships that may have inappropriately influenced them in writing this article. authors’ contributions all authors contributed equally to this work by collecting the data, analysing the results and writing the manuscript. funding no funding was received for this study. data availability statement the data is from a military sample, and thus not available for sharing. disclaimer the views expressed in this article are the authors’ own and not an official position of the institutions or funders. references adler, a.b., & sowden, w.j. (2018). resilience in the military: the double-edged sword of military culture. in l.w. roberts (ed.), military and veteran mental health: a comprehensive guide (pp. 43–54). new york: springer. air force instruction 90-506. (2014 april 02). comprehensive airman fitness (caf). washington, dc: department of the air force. american psychological association (2019). the road to resilience. retrieved from https://www.apa.org/helpcenter/road-resilience.aspx. bartone, p.t. (1989). predictors of stress-related illness in city bus drivers. journal of occupational medicine, 31(8), 657–663. https://doi.org/10.1097/00043764-198908000-00008 bartone, p.t. (1996, august). stress and hardiness in u.s. peacekeeping soldiers. paper presented at the 104th american psychological association meeting, toronto, canada. bartone, p.t. (1999). hardiness protects against war-related stress in army reserve forces. consulting psychology journal, 51(2), 72–82. https://doi.org/10.1037/1061-4087.51.2.72 bartone, p.t. (2000). 
hardiness as a resiliency factor for united states forces in the gulf war. in j.m. violanti, d. paton & c. dunning (eds.), posttraumatic stress intervention: challenges, issues, and perspectives (pp. 115–133). springfield, il: c.thomas. bartone, p.t., hystad, s.w., eid, j., & brevik, j.i. (2012). psychological hardiness and coping style as risk/resilience factors for alcohol abuse. military medicine, 177(5), 517–524. https://doi.org/10.7205/milmed-d-11-00200 bartone, p.t., johnsen, b.h., eid, j., laberg, l.c., & brun, w. (2002). factors influencing small unit cohesion in norwegian navy officer cadets. military psychology, 14(1), 1–22. https://doi.org/10.1207/s15327876mp1401_01 bartone, p.t., roland, r.r., picano, j.j., & williams, t.j. (2008). psychological hardiness predicts success in us army special forces candidates. international journal of selection and assessment, 16(1), 78–81. https://doi.org/10.1111/j.1468-2389.2008.00412.x bartone, p.t., ursano, r.j., wright, k.m., & ingraham, l.h. (1989). the impact of a military air disaster on the health of assistance workers: a prospective study. journal of nervous and mental disease, 177(6), 317–328. https://doi.org/10.1097/00005053-198906000-00001 bowen, g.l., jensen, t.m., & martin, j.a. (2016a). a measure of comprehensive airman fitness: construct validation and invariance across air force service components. military behavioral health, 4(2), 149–158. https://doi.org/10.1080/21635781.2015.1133345 bowen, g.l., jensen, t.m., & martin, j.a. (2016b). confirmatory factor analysis of a measure of comprehensive airman fitness. military behavioral health, 4(4), 409–419. https://doi.org/10.1080/21635781.2016.1199984 bowen, g.l., & martin, j.a. (2011). the resiliency model of role performance of service members, veterans, and their families. journal of human behavior in the social environment, 21(2), 162–178. https://doi.org/10.1080/10911359.2011.546198 bowen, g.l., & martin, j.a. (2013). 
support and resiliency inventory (sri-m): six-month utilization report (january–june 2013). charlotte, nc: flying bridge technologies. retrieved from https://doi.org/10.13140/2.1.1536.0327 britt, t.w., adler, a.b., & bartone, p.t. (2001). deriving benefits from stressful events: the role of engagement in meaningful work and hardiness. journal of occupational health psychology, 6(1), 53–63. https://doi.org/10.1037/1076-8998.6.1.53 castro, c.a., engel, c.c., & adler, a.b. (2004). the challenge of providing mental health prevention and early intervention in the us military. in b.t. litz (ed.), early intervention for trauma and traumatic loss (pp. 301–318). new york: guilford press. clough, p., earle, k., & sewell, d. (2002). mental toughness: the concept and its measurement. in i. cockerill (ed.), solutions in sport psychology (pp. 32–46). london: thomson learning. defenceweb. (2018). sa navy two platform deployment returns from op copper duty. retrieved from https://www.defenceweb.co.za/security/maritime-security/sa-navy-two-platform-deployment-returns-from-op-copper-duty/. defenceweb. (2019). makhanda on maritime resource protection tasking. retrieved from https://www.defenceweb.co.za/featured/makhanda-on-maritime-resource-protection-tasking/. escolas, s.m., pitts, b.l., safer, m.a., & bartone, p.t. (2013). the protective value of hardiness on military posttraumatic stress symptoms. military psychology, 25(2), 116–123. https://doi.org/10.1037/h0094953 firth, k.m., & smith, k. (2010). a survey of multidimensional health and fitness indexes. military medicine, 175(suppl 8), 110–117. https://doi.org/10.7205/milmed-d-10-00257 george, d., & mallery, m. (2010). spss for windows step by step: a simple guide and reference (17.0 update, 10a edn.). boston, ma: pearson. gerber, m., feldmeth, a.k., lang, c., brand, s., elliott, c., holsboer-trachsler, e., & pühse, u. (2015). 
the relationship between mental toughness, stress and burnout among adolescents: a longitudinal study with swiss vocational students. psychological reports: employment psychology & marketing, 117(3), 703–723. https://doi.org/10.2466/14.02.pr0.117c29z6 giles, b., goods, p.s.r., warner, d.r., quain, d., peeling, p., ducker, k.j., … gucciardi, d.f. (2018). mental toughness and behavioural perseverance: a conceptual replication and extension. journal of science and medicine in sport, 21(6), 640–645. https://doi.org/10.1016/j.jsams.2017.10.036 greene, c.h., & staal, m.a. (2017). resilience in us special operations forces. in s.v. bowles & p.t. bartone (eds.), handbook of military psychology: clinical and organizational practice (pp. 177–192). champaign, il: springer. institute for maritime medicine (2018). usefulness of the brums for mobilisation/demobilisation of ship-based maritime operations. technical report 14 december 2018. simon’s town: institute for maritime medicine. jones, e., hyams, k.c., & wessely, s. (2003). screening for vulnerability to psychological disorders in the military: an historical survey. journal of medical screening, 10(1), 40–46. https://doi.org/10.1258/096914103321610798 kobasa, s.c. (1979). stressful life events, personality and health: an inquiry into hardiness. journal of personality and social psychology, 37(1), 1–11. https://doi.org/10.1037/0022-3514.37.1.1 kobasa, s.c, maddi, s.r., & kahn, s. (1982). hardiness and health: a prospective study. journal of personality and social psychology, 42(1), 168–177. https://doi.org/10.1037/0022-3514.42.1.168 lee, j.e., sudom, k.a., & zamorski, m.a. (2013). longitudinal analysis of psychological resilience and mental health in canadian military personnel returning from overseas deployment. journal of occupational health psychology, 18(3), 327–337. https://doi.org/10.1037/a0033059 maddi, s.r., & hess, m. (1992). hardiness and success in basketball. 
international journal of sports psychology, 23(4), 360–368. maddi, s.r., & kobasa, s.c. (1984). the hardy executive: health under stress. homewood, il: dow jones irwin. marker, d. (2002). model theory: an introduction. new york: springer-verlag. martin, j., van wijk, c., hans-arendse, c., & makhaba, l. (2013). ‘missing in action’: the significance of bodies in african bereavement rituals. psychology in society, 44, 42–63. mcdonald, s.d., beckham, j.c., morey, r.a., & calhoun, p.s. (2009). the validity and diagnostic efficiency of the davidson trauma scale in military veterans who have served since september 11th, 2001. journal of anxiety disorders, 23(2), 247–255. https://doi.org/10.1016/j.janxdis.2008.07.007 mcnair, d.m., heuchert, j.w.p., & shilony, e. (2003). profile of mood states manual: bibliography 1964-2002. new york: multi-health systems inc. morgan, b.j., & bibb, s.c.g. (2011). assessment of military population-based psychological resilience programs. military medicine, 176(9), 976–985. https://doi.org/10.7205/milmed-d-10-00433 pietrzak, r.h., johnson, d.c., goldstein, m.b., malley, j.c., & southwick, s.m. (2009). psychological resilience and post-deployment social support protect against traumatic stress and depressive symptoms in soldiers returning from operations enduring freedom and iraqi freedom. depression and anxiety, 26, 745–751. https://doi.org/10.1002/da.20558 simmons, a., & yoder, l. (2013). military resilience: a concept analysis. nursing forum, 48(1), 17–25. https://doi.org/10.1111/nuf.12007 smallidge, t., jones, e., lamb, j., feyre, r., steed, r., & caras, a. (2013). modelling complex tactical team dynamics in observed submarine operations. in d.d. schmorrow & c.m. fidopiastis (eds.), foundations of augmented cognition (pp. 189–198). berlin: springer. south african military health service (2008). south african military health service: conventional doctrine. pretoria: south african military health service. 
terry, p.c., lane, a.m., & fogarty, g.j. (2003). construct validity of the poms-a for use with adults. psychology of sport and exercise, 4(2), 125–139. https://doi.org/10.1016/s1469-0292(01)00035-8 terry, p.c., potgieter, j.r., & fogarty, g.j. (2003). the stellenbosch mood scale: a dual-language measure of mood. international journal of sport and exercise psychology, 1(3), 231–245. https://doi.org/10.1080/1612197x.2003.9671716 van wijk, c.h., martin, j.h., & hans-arendse, c. (2013). clinical utility of the brums in screening for post-traumatic stress risk in a military population. military medicine, 178(4), 372–376. https://doi.org/10.7205/milmed-d-12-00422 world medical association. (2013). declaration of helsinki. journal of the american medical association, 310(20), 2191–2194. https://doi.org/10.1001/jama.2013.281053 wright, k.m., riviere, l.a., merrill, j.c., & cabrera, o.a. (2013). resilience in military families: a review of programs and empirical evidence. in r.r. sinclair & t.w. britt (eds.), building psychological resilience in military personnel: theory and practice (pp. 167–191). washington, dc: american psychological association. xie, y., peng, l., zuo, x., & li, m. (2016). the psychometric evaluation of the connor-davidson resilience scale using a chinese military sample. plos one, 11(2), e0148843. https://doi.org/10.1371/journal.pone.0148843 appendix 1 figure 1-a1: distribution of the comprehensive fitness score. table 1-a1: normality metrics for the brief sailor resiliency scale (n = 1312). table 2-a1: goodness-of-fit indices for the brief sailor resiliency scale. table 3-a1: brief sailor resiliency scale factor correlations. about the author(s) justin o. august department of psychology, faculty of health sciences, nelson mandela university, port elizabeth, south africa solomon mashegoane department of psychology, faculty of humanities, university of limpopo, polokwane, south africa citation august, j.o., & mashegoane, s.
(2021). psychological assessment during and after the covid-19 pandemic. african journal of psychological assessment, 3(0), a74. https://doi.org/10.4102/ajopa.v3i0.74 editorial psychological assessment during and after the covid-19 pandemic justin o. august, solomon mashegoane copyright: © 2021. the author(s). licensee: aosis. this is an open access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. the coronavirus disease 2019 (covid-19) has become a global pandemic, with increasing numbers of infected patients being reported daily. at the beginning of july 2021, the african continent had recorded a total of nearly 5.7 million covid-19 cases and just over 147 000 deaths, whilst just under 1.1% of individuals had been fully vaccinated (africa cdc covid-19 brief, 2021). countries across the globe, including those on the african continent, have imposed restrictions on movement and social distancing, with most of them implementing some form of lockdown. the psychological, social and economic effects of the pandemic are unprecedented, most certainly like nothing this generation has ever experienced. the current covid-19 climate has necessitated shifts in all aspects of human physical interaction, with remote working becoming the norm and teleservices for everything from healthcare to shopping taking prominence. the effects on marginalised communities remain a core concern (un committee on economic, social & cultural rights [uncescr], 2020), with the pandemic exacerbating structural inequalities and exposing the stark socio-economic realities that have often remained hidden in a pre-covid-19 era. psychology, and psychological assessment in particular, has had to re-examine its processes. the majority of assessment in africa is person to person.
assessment practitioners in the african context already face challenges of lack of appropriate test material, language difficulties and constraints with regard to cross-cultural applicability (mpofu & nyanungo, 1998; mpofu, peltzer, shumba, serpell, & mogaji, 2005; laher & cockcroft, 2013). online assessments are atypical. in countries outside of the african continent, tele-assessment approaches are deployed during the emergency lockdown to continue psychological assessments, thus minimising face-to-face contact (british psychological society [bps], 2021; farmer et al., 2020; health professions council of south africa [hpcsa], 2020; hewitt, rodgin, loring, pritchard, & jacobson, 2020). across africa, very few measures can be utilised via a tele-assessment medium, and even if they could, a large percentage of the population is at a disadvantage through this practice. technological inaccessibility is a reality across the continent, particularly amongst the most marginalised populations (mahler, montes, & locke, 2019). the inequalities faced by the marginalised populations in society were always prevalent in assessment (laher, serpell, ntinda, & chireshe, in press; oppong, oppong asante, & adote anum, in press), but have been exacerbated manifold during the pandemic. the teaching of psychological assessment at universities and training sites has also been a cause of concern. guidelines from african organisations such as the hpcsa and international guidelines from the bps have been generic, presenting overarching factors that practitioners should consider but input on the efficacy of these guidelines is still forthcoming. the practice of psychological testing and assessment requires rethinking, reconstructing and critical engagement (hewitt et al., 2020). 
the articles in this special section explore various ways in which psychological assessment can be conducted during covid-19 traversing a number of contexts from corporate organisations through to higher education in africa. dowdeswell and kriek (2021) explore the perceptions of 41 cross-industry clients for covid-19 and post-covid-19 human resources practices. one of the most interesting observations made by the authors is that the private sector is already on solid ground when it comes to online psychological assessment during the covid-19 pandemic. seemingly, the pandemic did not impose as harsh a need for transition in the sector as online testing was already being used in industry. they discuss the use of unproctored internet testing (uit) and virtual or video interviewing technologies and the role of assessment in retrenchment and restructuring applications in industry. of note is the argument around access to technology and the role of mobile devices in this process providing some equitable access even if far from ideal. dowdeswell and kriek predict that the post-covid-19 work environment will take advantage of existing technology and maintain the momentum of digitalisation, suggesting that the pandemic and lockdown have offered opportunities for re-examining the traditional mode of assessment practices. munnik, smith, adams tucker and human’s (2021) article highlights how a south african institution responded to the constraints imposed by the emergency lockdown to the teaching and training of psychological assessment to masters students in a clinical psychology programme. munnik et al. found that teaching and training under emergency lockdown conditions necessitated reprioritisation. the reconceptualised view of the teaching process meant that training in psychometrics can be seen as processual, involving varied stakeholders and settings. 
the authors discuss the use of multiple pedagogies in an attempt to incorporate theoretical and practical training online, reflecting on the efficacy of these going forward. wigdorowitz, rajab, hassem and titi (2021) invite test users to explore the complex challenges brought about by testing under conditions of online psychological assessment where physical contact is impossible. cognitive testing is challenging because, unlike personality testing, the requisite testing conditions require some form of direct observation to supervise the process, control time and set up context. wigdorowitz et al. considered all aspects of assessment in the private sector, including the added burden of shifting expenses to the test taker and the integrity of the testing process itself. for test users in academia, the incentives (e.g. employment requirements and developmental initiatives) that exist for test takers to engage in the testing tasks are virtually non-existent. thus, testing during the emergency lockdown in academia has been, in the view of wigdorowitz et al., especially challenging. that aside, the authors provide a useful list to consider when undertaking online assessments. unlike the other articles in this series that provided more of a meta-view on conducting assessments, the last article in the series (makhubela & mashegoane, 2021) discusses the validation of the fear of covid-19 scale. the sudden onset of the pandemic has meant that tools evaluating aspects of individuals’ physical and mental health in relation to the pandemic were a priority. fear of covid-19 is an everyday reality for the world’s population. in order to intervene on a large scale, this construct must be understood and amenable to comparison. makhubela and mashegoane found the psychometric properties of the scale to be sound in a sample of south african students.
furthermore, the results were comparable to what was found in other countries in and outside the african continent, suggesting that the instrument is useful for cross-national studies. the construct fear of covid-19 is also argued to be distinct from other fears. having a scale applicable across cultures bodes well for further conceptual developments and interventions in this area. in conclusion, the articles in the series show that whilst the toll that the covid-19 pandemic has exerted on many sectors of society in africa, including industry and academia, is undeniable, there are positive outcomes. the outstanding positive outcomes for assessment are the transition in, and improved understanding of, methods of training and the possible equitable use of online assessment technologies. acknowledgements competing interests the authors declare that they have no financial or personal relationships that may have inappropriately influenced them in writing this article. authors’ contributions j.o.a. and s.m. both contributed equally to the writing of this editorial. ethical considerations this article followed all ethical standards for research without direct contact with human or animal subjects. funding information the authors received no financial support for the research, authorship, and/or publication of this article. data availability data sharing is not applicable to this article, as no new data were created or analysed in this study. disclaimer the views and opinions expressed in this article are those of the authors and do not necessarily reflect the official policy or position of an affiliated agency of the authors. references africa cdc covid-19 brief. (2021, july 06). outbreak brief #77: coronavirus disease 2019 (covid-19) pandemic. retrieved from https://africacdc.org/download/outbreak-brief-77-coronavirus-disease-2019-covid-19-pandemic/ british psychological society. (2021). guidance for psychological professionals during the covid-19 pandemic.
retrieved from https://www.bps.org.uk/sites/www.bps.org.uk/files/policy/policy%20-%20files/guidance%20for%20psychological%20professionals%20during%20covid-19.pdf dowdeswell, k.e., & kriek, h.j. (2021). shifting assessment practices in the age of covid-19. african journal of psychological assessment, 3(0), a50. https://doi.org/10.4102/ajopa.v3i0.50 farmer, r.l., mcgill, r.j., dombrowski, s.c., benson, n.f., smith-kellen, s., lockwood, a.b. … stinnett, t.a. (2020). conducting psychoeducational assessments during the covid-19 crisis: the danger of good intentions. contemporary school psychology, 25, 27–32. https://doi.org/10.1007/s40688-020-00293-x health professions council of south africa. (2020, march 26). guidance on the application of telemedicine guidelines during the covid-19 pandemic. pretoria: hpcsa. retrieved from https://www.hpcsa.co.za/uploads/professional_practice/conduct%20%26%20ethics/booklet%2010%20telemedicine%20september%20%202016.pdf hewitt, k.c., rodgin, s., loring, d.w., pritchard, a.e., & jacobson, l.a. (2020). transitioning to telehealth neuropsychology service: considerations across adult and pediatric care settings. the clinical neuropsychologist, 34(7–8), 1335–1351. https://doi.org/10.1080/13854046.2020.1811891 laher, s., & cockcroft, k. (2013) psychological assessment in south africa: research and applications. johannesburg: wits university press. laher, s., serpell, r., ntinda, k., & chireshe, r. (in press). psychological assessment in southern africa. in s. laher (ed.), international histories of psychological assessment. cambridge: cambridge university press. mahler, d.g., montes, j., & locke, n.d. (2019). internet access in sub-saharan africa (english). poverty and equity note; no. 13. washington, dc: world bank group. retrieved from https://documents.worldbank.org/en/publication/documentsreports/documentdetail/518261552658319590/internet-access-in-sub-saharan-africa makhubela, m., & mashegoane, s. (2021). 
psychometric properties of the fear of covid-19 scale amongst black south african university students. african journal of psychological assessment, 3(0), a57. https://doi.org/10.4102/ajopa.v3i0.57 mpofu, e., & nyanungo, k.r.l. (1998). educational and psychological testing in zimbabwean schools: past, present and future. european journal of psychological assessment, 14(1), 71–90. https://doi.org/10.1027/1015-5759.14.1.71 mpofu, e., peltzer, k., shumba, a., serpell, r., & mogaji, a. (2005). school psychology in sub-saharan africa: results and implications of a six-country survey. in c.r. reynolds & c. frisby (eds.), comprehensive handbook of multicultural school psychology (pp. 1128–1151). new york, ny: john wiley. munnik, e., smith, m., adams tucker, l., & human, w. (2021). covid-19 and psychological assessment teaching practices – reflections from a south african university. african journal of psychological assessment, 3(0), a40. https://doi.org/10.4102/ajopa.v3i0.40 oppong, s., oppong asante, k., & adote anum, a. (in press). psychological assessment in west africa. in s. laher (ed.), international histories of psychological assessment. cambridge: cambridge university press. un committee on economic, social and cultural rights (uncescr). (2020, april 17). statement on the coronavirus disease (covid-19) pandemic and economic, social and cultural rights. retrieved from https://www.ohchr.org›cescr›stm_covid19 wigdorowitz, m., rajab, p., hassem, t., & titi, n. (2021). the impact of covid-19 on psychometric assessment across industry and academia in south africa. african journal of psychological assessment, 3(0), a38. https://doi.org/10.4102/ajopa.v3i0.38 about the author(s) samuel adjorlolo department of mental health, school of nursing and midwifery, college of health sciences, university of ghana, accra, ghana citation adjorlolo, s. (2019).
generalised anxiety disorder in adolescents in ghana: examination of the psychometric properties of the generalised anxiety disorder-7 scale. african journal of psychological assessment, 1(0), a10. https://doi.org/10.4102/ajopa.v1i0.10 original research generalised anxiety disorder in adolescents in ghana: examination of the psychometric properties of the generalised anxiety disorder-7 scale samuel adjorlolo received: 05 feb. 2019; accepted: 06 june 2019; published: 18 july 2019 copyright: © 2019. the author(s). licensee: aosis. this is an open access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. abstract the generalised anxiety disorder-7 (gad-7) is a self-report scale used to assess general anxiety symptoms. although the gad-7 has been found to be a valid scale among adults, studies examining its psychometric properties among adolescents in high-income countries are notably limited, and non-existent in low- and middle-income countries. the current study addresses this lacuna by investigating the factorial validity, construct validity, internal consistency and discriminant accuracy of the gad-7. data were collected from 553 adolescents (boys = 231; average age = 16.85) recruited from a senior high school in ghana, a sub-saharan african country, using cross-sectional self-report methodology. the results support a unidimensional structure of the gad-7 that was invariant across gender. the gad-7 correlates significantly with measures of anxiety, suicidal tendencies and mental well-being, suggesting construct validity. the internal consistency of the gad-7, based on the mean inter-item correlation value of 0.24 and cronbach’s α = 0.69, is adequate.
the gad-7 similarly discriminated between individuals at high risk of suicidal tendencies and depression from those with low or no risk, with area under curve values of 0.71 and 0.70, respectively. the gad-7 is a reliable and valid measure to screen for generalised anxiety disorder among adolescents in ghana. keywords: generalised anxiety disorder; gad-7; validation; psychometric properties; adolescents; africa. introduction adolescents are highly prone to developing a range of mental health problems, including generalised anxiety disorder (gad) which is characterised by excessive, uncontrollable, irrational anxiety and worry (apa, 2013; wittchen, zhao, kessler, & eaton, 1994). the lifetime prevalence rates of gad range from 1.5% to 3% in adolescents (merikangas et al., 2010) and from 7.3% to 13% in clinical samples (caballero, bobes, vilardaga, & rejas, 2009; chocrón bentata, vilalta franch, legazpi rodríguez, auquer, & franch, 1995). generalised anxiety disorder has been associated with poor health-related quality of life and functionality (bereza, machado, & einarson, 2009), general impairment (wittchen, 2002) and comorbidity with depression and social phobia (tiirikainen, haravuori, ranta, kaltiala-heino, & marttunen, 2019). unsurprisingly, gad has been reported to contribute to 17% of the disability-adjusted life years lost among 15–19-year-old adolescents (mokdad et al., 2016). the foregoing notwithstanding, many adolescents with gad, particularly in low- and middle-income countries (lmics), are often undetected and untreated (kroenke, spitzer, williams, monahan, & löwe, 2007; lieb, becker, & altamura, 2005) mainly because of the lack of reliable and valid screening and assessment measures (owen, baig, abbo, & baheretibeb, 2016; tran et al., 2018). 
this development has contributed greatly to the widening of the mental health treatment gap (owen et al., 2016), as well as to the lack of data on the burden of mental health problems in lmics (cortina, sodha, fazel, & ramchandani, 2012; tran et al., 2018). valid, reliable and easy-to-administer screening tests could help detect individuals at risk of gad for early intervention, as well as for large-scale epidemiological studies on the prevalence, risk factors and protective factors of gad. the generalised anxiety disorder-7 (gad-7) scale is a 7-item, easy-to-administer tool developed to screen for probable cases of gad based on the diagnostic and statistical manual of mental disorders (dsm-iv) criteria (spitzer, kroenke, williams, & löwe, 2006). a recent systematic and meta-analytic review has revealed that the gad-7 has acceptable psychometric properties in adult samples (plummer, manea, trepel, & mcmillan, 2016). the bulk of the studies suggest that the gad-7 has a unidimensional or one-factor structure (ito, takebayashi, muramatsu, & horikoshi, 2018; löwe et al., 2008; sawaya, atoui, hamadeh, zeinoun, & nahas, 2016; sousa et al., 2015; tiirikainen et al., 2019), although others have reported a two-factor structure (beard & björgvinsson, 2014; ito et al., 2018). additionally, the gad-7 has demonstrated sound diagnostic properties with a sensitivity of 89% and specificity of 82% at a cut-off of 10 points in clinical populations (kroenke et al., 2007; spitzer et al., 2006). as one of the widely used screening measures for gad, the gad-7 has been translated into different languages, including portuguese (sousa et al., 2015), dutch (donker, van straten, marks, & cuijpers, 2011), finnish (kujanpää et al., 2014), spanish (garcía-campayo et al., 2010), german (löwe et al., 2008) and malay (sidik, arroll, & goodyear-smith, 2012). 
however, there are questions regarding the utility of the gad-7 when administered to adolescent samples, given that studies investigating the psychometric properties of the gad-7 among adolescents are only beginning to emerge. to the best of our knowledge, only tiirikainen et al. (2019) have investigated the psychometric properties of the gad-7 in adolescents, in finland. the authors reported that the gad-7 is a valid and reliable measure with a unidimensional factor structure similar to those reported in adult populations. no study has investigated the psychometric properties of the gad-7 in adolescents living in sub-saharan african (ssa) countries, who differ from their counterparts in high-income countries on several factors. these include differences in socio-economic status and prevalence of risk factors of mental health problems such as war, trauma, child abuse and neglect, being orphaned, and food insecurity (cortina et al., 2012; reiss, 2013). cultural context significantly shapes behaviour, including how people feel, think and interact socially, as well as what constitutes distress and the threshold for distress (moleiro, 2018). it has long been observed that western, individualistic cultures tend to shape the behaviours of individuals somewhat differently from non-western, collectivist cultures (huntington, 1996; nqweni, pinderhughes, & hurley, 2010; padmanabhanunni, 2019). in this context, what constitutes anxiety and the threshold for detecting and endorsing the same may differ between individualistic and collectivist cultures. moreover, the acquisition and expression of behaviours tend to differ somewhat between men and women located in one cultural setting. relating specifically to ghana, boys are, for instance, socialised to develop hard and resilient personalities not only to be able to withstand adverse conditions in life but also to be able to provide for the needs of their parents in old age. 
in contrast, girls are more likely to be socialised within a vulnerability framework and also are encouraged to seek protection, mostly from boys (adjorlolo, adu-poku, andoh-arthur, botchway, & mlyakado, 2017b). the somewhat different socialisation processes for boys and girls could influence the extent to which they acquire, acknowledge and express psychopathological behaviours, suggesting that assessment measures for anxiety could perform differently across the genders. because the psychometric properties of western-based measures could be altered substantially when administered in ssa (adjorlolo, abdul-nasiru, chan, & bentum jr, 2017a; adjorlolo & watt, 2017), examining, for instance, the underlying factor structure of the gad-7 will help to determine whether this measure taps into gad in ssa (i.e. factorial validity). it follows that the call for more studies into the psychometric properties of the gad-7 (tiirikainen et al., 2019) should also take into consideration its cross-cultural validation (doi, ito, takebayashi, muramatsu, & horikoshi, 2018). consequently, the current study was designed to investigate the psychometric properties of the gad-7 among adolescents in ghana to contribute to the emerging (cross-cultural) literature regarding the application of the gad-7 in adolescents. in this regard, the study first and foremost examined the factorial validity of the gad-7 and further determined whether the factorial validity is the same for boys and girls (i.e. invariant across gender). gender invariance analysis examines the extent of similarity in the endorsement of the gad-7 items for boys and girls, thereby helping to ensure that the gad-7 scores are not biased (i.e. under- or overestimated) for one group (steinmetz, schmidt, tina-booh, wieczorek, & schwartz, 2009). secondly, the study investigated the internal consistency and the construct validity of the gad-7. 
lastly, given that anxiety is often comorbid with depression (brady & kendall, 1992) and correlates highly with suicidal tendencies (balázs et al., 2013), the ability of the gad-7 to discriminate between participants designated as high risk and low or no risk for depression and suicidal tendencies was examined. method sample the total student population at the time of data collection was 650. of the 600 questionnaires administered, 555 were returned, representing a response rate of 92.5%. two substantially uncompleted questionnaires were subsequently excluded. ghana’s educational system operates on a 6-3-3-4 system (i.e. primary school – 6 years, junior high school – 3 years, senior high school – 3 years and university bachelor’s degree – 4 years). the school was selected using a multi-stage sampling technique, involving a random selection of eastern region out of the 10 regions in ghana. this was followed by a random selection of a district in the region and one senior high school in the district. information relating to the various schools in the selected district was obtained from the ghana education service. english is the official language of instruction at the various levels of education (adjorlolo, 2016). more than half of the participants were girls (n = 322, 58.1%) with an average age of 16.85 (standard deviation [s.d.] = 1.32). measures a self-report methodology using a cross-sectional survey design was employed to gather data from senior high school students recruited from a school in the eastern region of ghana. a questionnaire consisting of a demographics section, the gad-7, phq, who-5 and sbq-r was used. generalised anxiety disorder-7 items are rated on a 4-point likert scale, ranging from 0 (not at all) to 3 (nearly always). the seven items are summed to generate a total score that ranges from 0 to 20 in the present study. higher scores indicate more severe symptoms of gad. 
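The scoring rule and the response-rate arithmetic described above can be illustrated with a short sketch. It is not the author's procedure: the function names are hypothetical, and the only facts taken from the text are that each of the seven GAD-7 items is rated from 0 to 3, that the items are summed, and that 555 of 600 questionnaires were returned. Note that seven items rated 0 to 3 give a theoretical maximum of 21, while the article reports an observed range of 0 to 20 in this sample.

```python
from typing import Sequence

GAD7_ITEMS = 7                      # number of items on the scale
MIN_RESPONSE, MAX_RESPONSE = 0, 3   # 0 = not at all, 3 = highest response option


def gad7_total(responses: Sequence[int]) -> int:
    """Sum the seven item ratings into a total score (hypothetical helper)."""
    if len(responses) != GAD7_ITEMS:
        raise ValueError(f"expected {GAD7_ITEMS} responses, got {len(responses)}")
    for rating in responses:
        if not MIN_RESPONSE <= rating <= MAX_RESPONSE:
            raise ValueError(f"rating {rating} is outside 0-3")
    return sum(responses)


def response_rate(returned: int, administered: int) -> float:
    """Percentage of administered questionnaires that were returned."""
    return 100.0 * returned / administered


if __name__ == "__main__":
    # Made-up responses for one participant, for illustration only.
    print(gad7_total([0, 1, 2, 3, 0, 1, 2]))
    # Figures reported in the sample description.
    print(response_rate(555, 600))
```

Raising an error on out-of-range ratings, rather than silently clipping them, keeps data-entry mistakes visible before any totals are analysed.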
patient health questionnaire-9 (phq; kroenke, spitzer, & williams, 2001) was administered to measure depressive symptoms in the participants. the phq-9 items are rated on a four-point scale ranging from ‘not at all’ (0) to ‘nearly every day’ (3). higher scores, obtained by summing the participants’ responses, indicate more depressive symptoms. the cronbach’s alpha recorded in the present sample was 0.71. world health organization well-being index (who-5; topp, østergaard, søndergaard, & bech, 2015) is a five-item scale used to index subjective, positive well-being. the items are scored from 5 (all of the time) to 0 (none of the time), with a total score ranging from 0 (absence of well-being) to 25 (maximal well-being). a cronbach’s alpha of 0.70 was observed for the present sample. suicidal behaviour questionnaire-revised (sbq-r; osman et al., 2001) is a four-item scale administered to screen for suicidal tendencies. the items are rated on likert scales as follows: item 1 = 1–4, item 2 = 1–5, item 3 = 1–3 and item 4 = 0–6. the sbq-r total score ranges from 3 to 18, with higher scores reflecting greater risk for suicidal tendencies. prior studies have found positive and significant correlations between suicidal tendencies and anxiety or gad (balázs et al., 2013; nepon, belik, bolton, & sareen, 2010). the cronbach’s alpha reported in the present study was 0.78. procedure data were collected from the students in their respective classes. in each class, the research team briefed the participants on the purpose of the study and their responsibilities as participants. they were encouraged to ask questions to allay any fear and anxiety pertaining to participating in the study. ethical issues, particularly those relating to confidentiality, anonymity and withdrawal from the study without being penalised, were communicated to the participants and ensured. 
to maintain anonymity, for instance, the questionnaires were devoid of identifying information of the participants such as name and student numbers. the participants were also informed that their responses would be treated strictly confidentially, and that they could withdraw from the study at any time without being penalised, or the research team could terminate their participation without their explicit consent. those expressing interest and willing to participate in the study were handed a pack of the questionnaires described previously. the questionnaires were handed over to the research team in each class upon completion. data were collected from the second- and third-year students because the first-year students were yet to commence school at the time of data collection. data analysis missing data analysis revealed that less than 5% of cases had data points missing on the sbq-r (2.4% – 3.1%), phq (2% – 3.3%), gad-7 (1.3% – 2.7%) and who-5 (1.3% – 2.4%). further analyses revealed that data were missing completely at random (little’s chi-square test, p > 0.05). consequently, the missing data points were imputed using the expectation-maximisation algorithm (adjorlolo & watt, 2017). confirmatory factor analysis (cfa), using the maximum likelihood method, was used to test for the factorial validity of the gad-7. a multi-group cfa was conducted to determine the invariance of the associations between the observed items and the latent factor. this was performed sequentially in the following ways (see adjorlolo & watt, 2017; byrne, 2010; steinmetz et al., 2009). firstly, separate models, called the baseline models, were estimated for boys and girls. secondly, an unconstrained model was established whereby the fixed and free parameters were estimated simultaneously for the groups (i.e. configural invariance). thirdly, the factor loadings were constrained across the groups to establish metric invariance. 
next, the factor variance was constrained, in addition to the factor loadings, to investigate invariance of factor variance. lastly, the factor loadings, factor variance and error variances were constrained to be equal across the groups to determine the invariance of error variances. constraining error variances is important for testing the equality of reliability of the gad-7 across the groups (byrne, 1988). evidence of gender invariance was evaluated using the difference in comparative fit index (cfi; δcfi) and chi-square (χ2; δχ2). a non-significant δχ2 and δcfi ≥ −0.01 between the restrictive and less restrictive or unconstrained models indicate the attainment of gender invariance. in case of discordance between the δχ2 and δcfi, the estimates provided by the latter were deemed reliable. this is because, unlike the δχ2, the δcfi is independent of the sample size and model complexity, and is also uncorrelated with the overall fit measures (cheung & rensvold, 2002). model fit was determined using χ2 and the following commonly used fit indicators: cfi, tucker–lewis index (tli), adjusted goodness of fit index (agfi) and a non-centrality-based index, the root mean square error of approximation (rmsea). comparative fit index and tli values close to 0.95, or greater, and rmsea value close to 0.06 or below indicate good model fit (hu & bentler, 1999). the cfa and multi-group cfa were performed using analysis of moment structures (amos) version 21. zero-order correlations with measures of depression, mental well-being and suicidal tendencies were performed to examine construct validity of gad-7, whereas internal consistency was investigated using cronbach’s alpha (α) and mean inter-item correlation (mic; clark & watson, 1995). the discriminant validity of the gad-7 was determined using the receiver operating characteristic (roc) curve, with the area under the curve (auc) of the roc indicating the overall discriminant accuracy. 
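As an illustration of what the AUC summarises, the sketch below computes it with the rank-based (Mann-Whitney) interpretation: the probability that a randomly chosen high-risk participant has a higher scale score than a randomly chosen low-risk participant, with ties counted as one half. This is an assumption-laden toy, not the study's analysis, which was run in SPSS; the function name and the scores are invented for the example.

```python
from typing import Sequence


def auc(high_risk_scores: Sequence[float], low_risk_scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of the two groups."""
    if not high_risk_scores or not low_risk_scores:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for high in high_risk_scores:
        for low in low_risk_scores:
            if high > low:
                wins += 1.0      # correctly ordered pair
            elif high == low:
                wins += 0.5      # tie counts as half
    return wins / (len(high_risk_scores) * len(low_risk_scores))


if __name__ == "__main__":
    # Hypothetical total scores, for illustration only.
    high = [12, 15, 9, 18]
    low = [3, 9, 5, 11]
    print(auc(high, low))
```

The pairwise loop is quadratic in the group sizes, which is fine for a teaching example; statistical packages use sorting-based equivalents for larger samples.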
the auc value ranges between 0.5 (no discriminative power) and 1.0 (maximum discriminative power). the zero-order correlations, internal consistency and roc analyses were conducted using ibm corp. spss version 23. ethical considerations the ethics committee for humanities of the university of ghana granted ethical approval for the study (ethical clearance number: ech 165/17-18). the school management granted institutional permission for the study. results prevalence of generalised anxiety disorder, depression and suicidal tendencies participants endorsing the highest response options on the items of the various measures were designated as high risk. using this criterion, 4.2% (n = 23), 17% (n = 95) and 18% (n = 102) of the participants were designated as at high risk for gad, suicidal tendencies and depression, respectively. factorial validity and gender invariance of generalised anxiety disorder-7 the results of the factorial validity and gender invariance analyses of the gad-7 are summarised in table 1. the gad-7 evidenced a unidimensional factor structure, providing a good fit to the data in the full sample (agfi = 0.98; tli = 0.97; cfi = 0.98; rmsea = 0.03), boys subsample (agfi = 0.96; tli = 0.97; cfi = 0.98; rmsea = 0.04) and girls subsample (agfi = 0.96; tli = 0.96; cfi = 0.98; rmsea = 0.03). measurement invariance analyses indicate that configural invariance (gfi = 0.97; tli = 0.96; cfi = 0.97; rmsea = 0.03), metric invariance (δcfi = −0.002), invariance of factor variance (δcfi = 0.000) and invariance of error variances (δcfi = 0.001) have been attained. as shown in table 2, the gad-7 items loaded satisfactorily and significantly onto the unidimensional factor structure. a two-factor model was also tested in accordance with previous studies (beard & björgvinsson, 2014; ito et al., 2018). it was observed that the model for the full sample did not provide a good model fit to the data (i.e. agfi = 0.88; tli = 0.74; cfi = 0.67; rmsea = 0.09). 
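The decision rules stated in the data analysis section can be applied mechanically to the full-sample indices reported above. The sketch below is only an illustration: the class and function names are hypothetical, and treating "close to 0.95" and "close to 0.06" as strict cut-offs is a simplifying assumption rather than the author's exact criterion.

```python
from dataclasses import dataclass


@dataclass
class FitIndices:
    cfi: float
    tli: float
    rmsea: float


def good_fit(fit: FitIndices, cfi_tli_min: float = 0.95, rmsea_max: float = 0.06) -> bool:
    """Assumed strict reading of the Hu and Bentler (1999) guidelines."""
    return fit.cfi >= cfi_tli_min and fit.tli >= cfi_tli_min and fit.rmsea <= rmsea_max


def invariance_holds(delta_cfi: float, threshold: float = -0.01) -> bool:
    """Delta-CFI rule from the text: invariance if delta CFI >= -0.01."""
    return delta_cfi >= threshold


if __name__ == "__main__":
    one_factor = FitIndices(cfi=0.98, tli=0.97, rmsea=0.03)   # reported, full sample
    two_factor = FitIndices(cfi=0.67, tli=0.74, rmsea=0.09)   # reported, full sample
    print(good_fit(one_factor), good_fit(two_factor))

    # Reported delta-CFI values: metric, factor variance, error variances.
    for step, delta in [("metric", -0.002), ("factor variance", 0.000), ("error variances", 0.001)]:
        print(step, invariance_holds(delta))
```

Under these assumed cut-offs the one-factor model passes and the two-factor model fails, which agrees with the conclusions drawn in the text.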
further analyses reveal that the girls (agfi = 0.75; tli = 0.92; cfi = 0.64; rmsea = 0.11) and boys (agfi = 0.89; tli = 0.94; cfi = 0.76; rmsea = 0.09) subsamples did not yield good model fits. inspection of the modification indices and coefficients suggests that no alteration would improve the fit of the models. given that these initial model assessments are critical to measurement invariance analysis, a decision was reached not to investigate the models further by constraining the model parameters. it was therefore concluded that a unidimensional model fit the data for the ghanaian adolescent sample. table 1: fit statistics for the generalised anxiety disorder-7 scale in adolescents. table 2: characteristics of generalised anxiety disorder-7 scale items (boys = 231, girls = 322). construct validity and internal consistency of generalised anxiety disorder-7 the gad-7 correlated significantly and positively with measures of depression (r = 0.67, p < 0.001; r = 0.68, p < 0.001; r = 0.67, p < 0.001), suicidal behaviour (r = 0.40, p < 0.001; r = 0.44, p < 0.001; r = 0.37, p < 0.001) and negatively with mental well-being (r = −0.35, p < 0.001; r = −0.32, p < 0.001; r = −0.37, p < 0.001) in the full sample, boys and girls subsamples, respectively. likewise, the internal consistency of the gad-7 for the full sample, based on the cronbach’s alpha (α), was 0.69, whereas the mic was 0.24. similar results were obtained for the boys (α = 0.72; mic = 0.26) and girls (α = 0.66; mic = 0.22) subsamples. discriminant accuracy of the generalised anxiety disorder-7 the gad-7 significantly discriminated between participants designated as high and low or no risk for depression (auc = 0.70; z = 6.77, p < 0.001; ci [0.66, 0.74]), as well as high and low or no risk of suicidal tendencies (auc = 0.71; z = 7.70, p < 0.001; ci [0.67, 0.75]). discussion the study primarily investigated the psychometric properties of the gad-7 among adolescents in ghana. 
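The two internal-consistency statistics reported in the results are linked by a standard identity: for a scale of k items with mean inter-item correlation r, the standardised Cronbach's alpha is k·r / (1 + (k − 1)·r). The sketch below applies that identity to the reported mean inter-item correlations as a plausibility check. It is an illustration only: the article presumably reports raw (covariance-based) alpha, and the correlations are rounded, so small discrepancies from the published alphas are expected.

```python
def standardized_alpha(n_items: int, mean_inter_item_r: float) -> float:
    """Standardised Cronbach's alpha from the mean inter-item correlation."""
    if n_items < 2:
        raise ValueError("alpha needs at least two items")
    k, r = n_items, mean_inter_item_r
    return (k * r) / (1.0 + (k - 1) * r)


if __name__ == "__main__":
    # Mean inter-item correlations reported for the seven-item scale.
    for group, mic in [("full sample", 0.24), ("boys", 0.26), ("girls", 0.22)]:
        print(group, round(standardized_alpha(7, mic), 2))
```

The identity also shows why a short scale can have a modest alpha while its inter-item correlations sit in an acceptable range: alpha rises with the number of items even when r is held fixed.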
in keeping with the findings of earlier multi-site, adult-based studies (ito et al., 2018; löwe et al., 2008; sawaya et al., 2016; sousa et al., 2015) and a recent finnish, adolescent-based study (tiirikainen et al., 2019), the gad-7 showed a unidimensional structure among ghanaian adolescents. the factor structure of the gad-7 is stable in the present sample, given that it was not data-driven (i.e. no change was made to the model based on the modification indices). this observation raises the possibility that a unidimensional structure of the gad-7 will emerge in similar or related adolescent samples from ghana (adjorlolo et al., 2017a). the gad-7 items loaded significantly onto the unidimensional factor structure, with all the factor loadings exceeding the conventional value of 0.30 (nunnally, 1978). likewise, the corrected item-total correlation values indicate that all the items contributed meaningfully to the gad-7 (barthel et al., 2015). more importantly, the unidimensional structure of the gad-7 was invariant for boys and girls, suggesting that the same or similar relationships can be expected between gad and its indicators for boys and girls (doi et al., 2018). thus, any gender-based mean-level difference or difference in prevalence and incidence estimates could not be attributed to the biases of the gad-7 for one group (baas et al., 2011). the study suggests that the gad-7 has the propensity to tap into gad across different settings for both boys and girls. stated alternatively, the expression of symptoms of anxiety as captured by the gad-7 may not necessarily be influenced by the cultural background of the participants, and both boys and girls are likely to demonstrate statistically similar response patterns on the gad-7. moreover, although the cronbach’s alpha of the gad-7 was comparatively low (i.e. 0.69), the mic values, which are largely independent of the number of items, are all in the recommended ranges (i.e. 
0.15–0.50) to be considered adequate (clark & watson, 1995). it is therefore posited tentatively that the gad-7 is internally consistent for boys and girls, in keeping with the findings of previous studies (ito et al., 2018; löwe et al., 2008; sawaya et al., 2016; sousa et al., 2015; tiirikainen et al., 2019). the gad-7 correlated significantly with measures of depression, suicide and mental well-being, thereby confirming the construct validity of the gad-7. likewise, the gad-7 discriminated between individuals at high risk of suicidal tendencies and depression from those with low or no risk, although the discriminant accuracy was moderate (swets, 1988). all in all, the gad-7 demonstrated sound psychometric properties and moderate discriminant accuracy among adolescents in ghana. limitations of the study firstly, the study did not include a gold standard measure of gad, thereby making it impossible to assess the diagnostic properties (e.g. sensitivity and specificity) of the gad-7 for gad, in addition to depression and suicidal tendencies. the use of self-report measures provides no mechanism to verify the accuracy of the participants’ responses. thus, inaccurate responses (underreporting or over-reporting) are possibilities. the generalisability of the study findings to adolescents with dissimilar background characteristics (e.g. those out of school) may be limited. again, using a clinical sample would have provided an added layer of validity, and therefore it would be useful in future studies to examine adolescents diagnosed with gad. future studies addressing the limitations noted here will help to further illuminate the psychometric appropriateness and utility of the gad-7 in adolescents in ghana and other lmics. conclusion the results suggest that the gad-7 is a reliable and valid measure to screen and identify adolescents at risk of gad in ghana, an ssa state. 
the findings have contributed to the repertoire of the cross-cultural literature on the assessment of psychopathologies or mental disorders. while the influence of cultural factors on psychopathological behaviours should not be discounted, it is equally important to acknowledge that measures developed to assess behaviours that supposedly have roots in western, individualistic cultures could prove useful in non-western, collectivist cultures. acknowledgements competing interests the author declares that no competing interests exist. authors’ contributions i declare that i am the sole author of this research article. funding this research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors. data availability statement data are available upon request from the author. disclaimer the views expressed in this article are the author’s own and not the official position of his institution. reference adjorlolo, s. (2016). ecological validity of executive function tests in moderate traumatic brain injury in ghana. the clinical neuropsychologist, 30(sup1), 1517–1537. https://doi.org/10.1080/13854046.2016.1172667 adjorlolo, s., abdul-nasiru, i., chan, h.c., & bentum jr, f. (2017a). attitudes toward the insanity defense: examination of the factor structure of insanity defense attitude-revised (ida-r) scale in ghana. international journal of forensic mental health, 16(1), 33–45. https://doi.org/10.1080/14999013.2016.1235628 adjorlolo, s., adu-poku, s., andoh-arthur, j., botchway, i., & mlyakado, b.p. (2017b). demographic factors, childhood maltreatment and psychological functioning among university students’ in ghana: a retrospective study. international journal of psychology, 52, 9–17. https://doi.org/10.1080/14999013.2016.1235628 adjorlolo, s., & watt, b.d. (2017). factorial and convergent validity of the youth psychopathic traits inventory-short version in ghana. international journal of psychology, 54(3), 388–396. 
https://doi.org/10.1002/ijop.12468 apa. (2013). diagnostic and statistical manual of mental disorders(dsm-iv). washington, dc: american psychiatric publishing. baas, k.d., cramer, a.o., koeter, m.w., van de lisdonk, e.h., van weert, h.c., & schene, a.h. (2011). measurement invariance with respect to ethnicity of the patient health questionnaire-9 (phq-9). journal of affective disorders, 129(1–3), 229–235. https://doi.org/10.1016/j.jad.2010.08.026 balázs, j., miklósi, m., keresztény, á., hoven, c.w., carli, v., wasserman, c., … cosman, d. (2013). adolescent subthreshold-depression and anxiety: psychopathology, functional impairment and increased suicide risk. journal of child psychology and psychiatry, 54(6), 670–677. https://doi.org/10.1111/jcpp.12016 barthel, d., barkmann, c., ehrhardt, s., schoppen, s., bindt, c., & international cds study group. (2015). screening for depression in pregnant women from côte d’ ivoire and ghana: psychometric properties of the patient health questionnaire-9. journal of affective disorders, 187, 232–240. https://doi.org/10.1016/j.jad.2015.06.042 beard, c., & björgvinsson, t. (2014). beyond generalized anxiety disorder: psychometric properties of the gad-7 in a heterogeneous psychiatric sample. journal of anxiety disorders, 28(6), 547–552. https://doi.org/10.1016/j.janxdis.2014.06.002 bereza, b.g., machado, m., & einarson, t.r. (2009). systematic review and quality assessment of economic evaluations and quality-of-life studies related to generalized anxiety disorder. clinical therapeutics, 31(6), 1279–1308. https://doi.org/10.1016/j.clinthera.2009.06.004 brady, e.u., & kendall, p.c. (1992). comorbidity of anxiety and depression in children and adolescents. psychological bulletin, 111(2), 244. https://doi.org/10.1037//0033-2909.111.2.244 byrne, b.m. (1988). the self description questionnaire iii: testing for equivalent factorial validity across ability. educational and psychological measurement, 48(2), 397–406. 
https://doi.org/10.1177/0013164488482012 byrne, b.m. (2010). structural equation modelling with amos: basic concepts, assumptions and programming (2nd edn.). new york, ny: routledge. caballero, l., bobes, j., vilardaga, i., & rejas, j. (2009). clinical prevalence and reason for visit of patients with generalized anxiety disorder seen in the psychiatry out-patient clinics in spain. results of the ligando study. actas esp psiquiatr, 37(1), 17–20. cheung, g.w., & rensvold, r.b. (2002). evaluating goodness-of-fit indexes for testing measurement invariance. structural equation modeling, 9(2), 233–255. https://doi.org/10.1207/s15328007sem0902_5 chocrón bentata, l., vilalta franch, j., legazpi rodríguez, i., auquer, k., & franch, l. (1995). prevalencia de psicopatología en un centro de atención primaria. atención primaria, 16(10), 586–593. clark, l.a., & watson, d. (1995). constructing validity: basic issues in objective scale development. psychological assessment, 7(3), 309. https://doi.org/10.1037/1040-3590.7.3.309 cortina, m.a., sodha, a., fazel, m., & ramchandani, p.g. (2012). prevalence of child mental health problems in sub-saharan africa: a systematic review. archives of pediatrics & adolescent medicine, 166(3), 276–281. https://doi.org/10.1001/archpediatrics.2011.592 doi, s., ito, m., takebayashi, y., muramatsu, k., & horikoshi, m. (2018). factorial validity and invariance of the patient health questionnaire (phq)-9 among clinical and non-clinical populations. plos one, 13(7), e0199235. https://doi.org/10.1371/journal.pone.0199235 donker, t., van straten, a., marks, i., & cuijpers, p. (2011). quick and easy self-rating of generalized anxiety disorder: validity of the dutch web-based gad-7, gad-2 and gad-si. psychiatry research, 188(1), 58–64. https://doi.org/10.1016/j.psychres.2011.01.016 garcía-campayo, j., zamorano, e., ruiz, m.a., pardo, a., pérez-páramo, m., lópez-gómez, v., … rejas, j. (2010). 
about the author(s)

malose makhubela, department of psychology, faculty of humanities, university of limpopo, polokwane, south africa
solomon mashegoane, department of psychology, faculty of humanities, university of limpopo, polokwane, south africa

citation: makhubela, m., & mashegoane, s. (2021). psychometric properties of the fear of covid-19 scale amongst black south african university students. african journal of psychological assessment, 3(0), a57. https://doi.org/10.4102/ajopa.v3i0.57

original research

psychometric properties of the fear of covid-19 scale amongst black south african university students

malose makhubela, solomon mashegoane

received: 04 mar. 2021; accepted: 07 june 2021; published: 23 july 2021

copyright: © 2021. the author(s). licensee: aosis. this is an open access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

abstract

coronavirus disease 2019 (covid-19) has spread widely, leading to a global public health crisis of pandemic proportions. whilst infection rates tend to fluctuate in south africa, covid-19 remains a life-threatening disease with the capacity to provoke fear and concern. the present study evaluated the psychometric qualities of the fear of covid-19 scale (fcv-19s) amongst black south african university students (n = 433; female: 58%; mean age = 23.51 [sd = 4.18]). the fcv-19s demonstrated a unidimensional factor structure and acceptable reliability, as estimated by cronbach’s alpha (α = 0.87), omega (ω = 0.88) and the greatest lower bound (glb = 0.90). in addition, discriminant validity was demonstrated when fcv-19s items loaded separately from ordinary fear.
the fcv-19s can be used as a measure of covid-19-related fear amongst black south african university students.

keywords: covid-19-related fear; factor structure; students; ordinary fear; validity.

introduction

coronavirus disease 2019 (covid-19) has spread widely, leading to a global public health crisis. aside from large-scale deaths, one of its consequences has been mental health problems (kim, nyengerai, & mendenhall, 2020). there are predictions that the pandemic-related mental health situation is likely to worsen (lin, 2020; qiu et al., 2020). nowhere has this been felt more than in university contexts, where there are reports of heightened general distress and anxiety amongst students because of pandemic-associated changes such as lockdown, interrupted academic programmes and migration to online tuition, which have rendered student life unpredictable (cao et al., 2020; dziech, 2020; hartocollis, 2020).

fear, a negative emotional response, is one of the likely and natural mental health outcomes when facing life-threatening events such as covid-19 (lin et al., 2020; ng & kemp, 2020). the emotional response itself may lead to fear-related behaviour, which eventually determines the progress and overall outcome of a major disease outbreak (shultz et al., 2016). indeed, much has been said about the impact of fear on general mental health and quality of life (ford et al., 2019). we assume that different types of threats to the organism will trigger unique fear responses (adolphs, 2013). thus, the fear of covid-19 is similar to fears associated with invisible and unique infectious diseases such as ebola. it is likely to be triggered and exacerbated by the incomplete understanding of the virus, its uncontrollable nature, its ever-changing status and, more recently, the discovery of more infectious strains (roberts, 2021; who, 2020).
it is exactly this situation that has led researchers to suspect that conventional models of, and clinical interventions for, general anxiety may not work for the fear associated with covid-19 (ahorsu et al., 2020; perz, lang, & harrington, 2020; rajkumar, 2020). to this end, ahorsu et al. (2020) developed a seven-item fear of covid-19 scale (fcv-19s) to evaluate anxiety specific to covid-19, based on the protection motivation theory (rogers, 1975). the measure has been translated into a number of languages and validated in a number of countries including bangladesh (sakib et al., 2020), china (chi et al., 2021), ethiopia (elemo, satici, & griffiths, 2020), france (mailliez, griffiths, & carre, 2020), greece (tsipropoulou et al., 2020), israel (bitan et al., 2020), italy (soraci et al., 2020), japan (masuyama, shinkawa, & kubo, 2020), malaysia (pang et al., 2020), mozambique (giordani, giolo, muhl, estavela, & gove, 2021), new zealand (winter et al., 2020), russia (reznik, gritsenko, konstantinov, khamenka, & isralowitz, 2020), saudi arabia (alyami, henning, krägeloh, & alyami, 2020), turkey (satici, gocet-tekin, deniz, & satici, 2020) and vietnam (nguyen et al., 2020). most of the studies consistently found that the fcv-19s has a unidimensional factor structure. a few studies have reported a bi-factor structure (bitan et al., 2020; chi et al., 2021; masuyama et al., 2020; reznik et al., 2020) in varied contexts such as israel, china, japan and russia, respectively. however, reznik et al. (2020) and bitan et al. (2020) have been criticised by researchers on account of incorrect use of factor analytic techniques (see pakpour, griffiths, & lin, 2020a; pakpour et al., 2020b).

it is not clear whether the psychometric properties observed in these cited studies will be found in south africa, given the unique socio-cultural context and the status of the pandemic, particularly amongst university students (mahlokwane, 2021; morapela, 2021; sunday times, 2021).
the sparse investigation of psychological measures across diverse populations has been shown to account for measurement problems when these measures are applied to groups for which they were not validated (ramírez et al., 2005). indeed, to be suitable for use with various groups of people, health outcome measures should at the minimum show that they measure the same constructs across populations. besides, university students have previously reported high levels of covid-19-related fear, and these levels of fear are associated with depression and anxiety (elsharkawy & abdelaziz, 2020; zolotov, reznik, bender, & isralowitz, 2020). measures of covid-19-related fear with good psychometric evidence are therefore necessary for counselling services at universities, particularly to aid in the identification of students with high levels of covid-19-associated fear and the prevention of the development of associated mental health problems.

south africa has seemingly emerged from a second wave of covid-19 infections and reported a new and more infectious variant of the virus compared with other countries where the fcv-19s has been studied (farber, 2021; fink, 2020). although the roll-out of vaccines has begun in earnest amongst essential health workers and the elderly, it remains uncertain whether university students will be vaccinated, and thus whether the vaccines will have any public health impact in institutions of higher learning (davis, 2021; govender, 2021). whilst the spread of covid-19 is not, by all accounts, completely out of control, there are indications that south africa is experiencing a ‘third wave’ of infections (brandt, 2021; savides, 2021). rising infection rates have been reported amongst students in universities (mahlokwane, 2021; morapela, 2021). for that reason, it is still necessary to validate the fcv-19s for use in a student population. the scale will be needed for as long as covid-19 interventions continue.
only one study from the united states of america has to date reported on the psychometric properties of the measure in university students (perz et al., 2020). the aim of this study was to validate the fcv-19s in south africa, examining the following psychometric properties of the measure in a sample of black african university students: (1) dimensionality, (2) discriminant validity and (3) reliability.

methods

participants

a convenience sample of 433 black african university students (female = 58%, mean age = 23.51, standard deviation [sd] = 4.18) was used for the study. the sample consisted predominantly of undergraduate students (88%) from across the faculty of humanities, whilst 76% of the participants resided in a rural area (see table 1).

table 1: sociodemographic characteristics (n = 433).

design

data were collected online, within a cross-sectional design.

instruments

fear of covid-19 scale

the seven-item fcv-19s (ahorsu et al., 2020) examined covid-specific anxiety on a five-point likert scale (i.e. 1 [strongly disagree] to 5 [strongly agree]). items of the fcv-19s include item 1: ‘… most afraid of covid-19’ and item 4: ‘… afraid of losing my life because of covid-19’. total scores range from 7 to 35. the measure achieved an internal consistency reliability of α > 0.80 in previous studies (e.g. elemo et al., 2020; perz et al., 2020; soraci et al., 2020). in this study reliability was estimated at α = 0.88.

jackson-5 fear scale

the j-5fs, a seven-item fear subscale of the jackson-5 scales (jackson, 2009), was used to measure ordinary fear. its response scale is anchored from 1 [completely disagree] to 7 [completely agree], so that a high score denotes a high level of reported fear. two items are reverse scored. jackson (2009) reported a reliability estimate of α = 0.69 for the measure, whilst we found a modest α = 0.50 in the present study.
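the scoring rules described for the two instruments can be sketched in python. this is a minimal illustration and not the authors’ procedure: the function names are hypothetical, the toy responses are invented, and because the source does not say which two j-5fs items are reverse scored, the reversal helper works on a single response and leaves the choice of items to the caller.

```python
from statistics import variance


def fcv19s_total(responses):
    """Sum seven FCV-19S item responses (each 1-5) into a total score (7-35)."""
    if len(responses) != 7:
        raise ValueError("the FCV-19S has seven items")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")
    return sum(responses)


def reverse_score(response, low=1, high=7):
    """Reverse a Likert response, e.g. 2 becomes 6 on the 1-7 J-5FS scale."""
    return low + high - response


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Invented responses from one hypothetical participant.
print(fcv19s_total([3, 2, 1, 4, 2, 1, 1]))  # 14
```

alpha is computed here from the item variances and the variance of the total score; omega and the greatest lower bound reported in the study need a fitted factor model or dedicated software and are not covered by this sketch.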
procedure

all participants consented to participation in the study before completing the questionnaire. participants were recruited online using class registers. the registers were used only to direct the communication to potential respondents, because they contained student-number formulated e-mail addresses (i.e. a university-generated e-mail address that contains only a student number and not the student’s name). respondents were directed to the website where the study questionnaire, designed using google forms, was posted. the survey was available only in english and required between 20 and 30 min to complete. the total sample consisted of two data sets (i.e. n = 202 and n = 231) that were collected around the same time, although data set 1 does not have all the measures covered in data set 2. the data were collected by two different research groups using the same data collection media, during the same period and from the same student population.

data analysis

factor structure

two data sets were used to conduct the main analyses. a combined data set (n = 433, combining data sets 1 and 2) was used to examine the factor structure of the fcv-19s using confirmatory factor analysis (cfa). the cfas (of the one-factor model consistently found in the literature) were conducted with the weighted least squares mean and variance adjusted (wlsmv) estimator for ordinal data in mplus 7.4 (muthén & muthén, 2017). model fit was evaluated with the comparative fit index (cfi), the tucker–lewis index (tli), the standardised root mean square residual (srmr) and the root mean square error of approximation (rmsea) (i.e. tli and cfi ≥ 0.95 [adequate at 0.92 to 0.94] and rmsea < 0.08) (makhubela & mashegoane, 2019). two alternative models were also tested. the two-factor model consisted of a fear thoughts factor (items 1, 2, 4, 5) and a physical response factor (items 3, 6, 7). the bi-factor model incorporated a general fear factor and two orthogonal factors (i.e. fear thoughts and physical response).
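the fit criteria quoted above can be expressed as a small screening helper. the sketch below is in python and is only an illustration: the function names and verdict labels are hypothetical, the input values are invented, and treating every cfi or tli value from 0.92 up to 0.95 as adequate is an assumption, since the text lists 0.92 to 0.94 only. no srmr rule is included because the text does not state one.

```python
def rate_incremental_index(value):
    """Rate a CFI or TLI value against the cut-offs quoted in the text."""
    if value >= 0.95:
        return "good"
    if value >= 0.92:
        # The text lists 0.92 to 0.94 as adequate; extending this band up to
        # (but not including) 0.95 is an assumption of this sketch.
        return "adequate"
    return "below cut-off"


def rate_rmsea(value):
    """Rate an RMSEA value against the < 0.08 cut-off quoted in the text."""
    return "acceptable" if value < 0.08 else "above cut-off"


def screen_model(cfi, tli, rmsea):
    """Return a per-index verdict for one fitted model."""
    return {
        "cfi": rate_incremental_index(cfi),
        "tli": rate_incremental_index(tli),
        "rmsea": rate_rmsea(rmsea),
    }


# Hypothetical fit indices, not taken from the study.
print(screen_model(cfi=0.96, tli=0.93, rmsea=0.05))
```

a screen of this kind only summarises the indices; as the analysis plan implies, a model also has to be judged on whether its parameter estimates are viable.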
discriminant validity

following cfa, only data set 2 (n = 231) was used to evaluate the discriminant validity of the measure using exploratory factor analysis (efa). the dimensionality of the fcv-19s and j-5fs items under efa was estimated with maximum likelihood using principal axis factoring (paf) with varimax rotation. factor selection was performed using parallel analysis (pa).

reliability

finally, based on the results of the first analysis, the internal consistency, omega and greatest lower bound (glb) reliability estimates of the unidimensional scale of the fcv-19s were examined.

ethical considerations

ethical compliance was approved by the turfloop research ethics committee of the university of limpopo, reference number: trec/375/2020: ir.

results

item analysis

item means, standard deviations and normality of the fcv-19s were examined. the univariate skewness and kurtosis of each of the seven items of the scale were within the normal range of –1.5 to 1.5 (see table 2).

table 2: normality, mean and standard deviation of the fcv-19s items.

confirmatory factor analysis

the cfa of the fcv-19s revealed a one-factor model that fitted the data well (see figure 1) (χ² = 51.637, degrees of freedom [df] = 14, p < 0.001, tli = 0.975, cfi = 0.983, srmr = 0.070, rmsea = 0.079, with a 90% ci [0.057–0.102]), and all parameters were viable. two alternative models were also tested: a two-factor model (χ² = 55.687, df = 13, p < 0.001, tli = 0.817, cfi = 0.887, srmr = 0.041, rmsea = 0.087, with a 90% ci [0.064–0.111]) and a bi-factor model (χ² = 22.793, df = 7, p < 0.005, tli = 0.966, cfi = 0.989, srmr = 0.033, rmsea = 0.072, with a 90% ci [0.041–0.106]). whilst the bi-factor model appears to fit well on the basis of the fit indices, it is nevertheless a poor model because not all model parameters were viable (i.e. items 2 and 4 load poorly on the first factor). as such, the overall results did not support the two alternative models.
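the reported rmsea values can be checked by hand from the reported chi-square, degrees of freedom and sample size. the python sketch below uses the common point-estimate formula, rmsea = sqrt(max(χ² − df, 0) / (df × (n − 1))). this formula is an assumption about the computation rather than a description of what the software did, because estimation programs may apply estimator-specific adjustments; the function name is hypothetical, and the inputs are the values reported for the combined sample (n = 433).

```python
from math import sqrt


def rmsea(chi_square, df, n):
    """Point estimate of RMSEA from a model chi-square, its df and sample size."""
    return sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


# (chi-square, df) as reported for the three competing models.
models = {
    "one-factor": (51.637, 14),
    "two-factor": (55.687, 13),
    "bi-factor": (22.793, 7),
}

for name, (chi_square, df) in models.items():
    print(name, round(rmsea(chi_square, df, 433), 3))
# one-factor 0.079
# two-factor 0.087
# bi-factor 0.072
```

under this formula the recomputed values agree with the reported ones to three decimals, which places the one-factor model just under the 0.08 cut-off and the two-factor model above it.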
figure 1: fear of covid-19 unidimensional structure.

discriminant validity

the items of the fcv-19s and the j-5fs were factor analysed together to establish discriminant validity. the kaiser–meyer–olkin (kmo) measure was 0.86, showing that the sample used in the study was adequate for efa (field, 2009), whilst bartlett’s test of sphericity was significant (χ² = 1410.98, df = 21, p < 0.001), indicating that the data were suitable for efa. principal axis factoring produced a two-factor solution. table 3 shows the factor loadings (> 0.30); there were no cross-loading items. two items of the j-5fs (i.e. j-5fs 1 and 4) did not load on either of the two factors. in sum, the efa results suggest that there is discriminant validity between the fcv-19s and the fear items of the j-5fs.

table 3: factor matrix of the fear of covid-19 scale and the jackson-5 fear scale.

reliability

the reliability estimates of the fcv-19s were acceptable (α = 0.87, ω = 0.88 and glb = 0.90).

discussion

this study set out to evaluate the psychometric properties of the fcv-19s in the south african context, using a predominantly black african student population in the limpopo province. the cfa results lend support to a unidimensional factor structure of the scale (ahorsu et al., 2020; alyami et al., 2020; elemo et al., 2020; mailliez et al., 2020; nguyen et al., 2020; pang et al., 2020; reznik et al., 2020; sakib et al., 2020; satici et al., 2020; soraci et al., 2020; tsipropoulou et al., 2020; winter et al., 2020). discriminant validity was established using a sample that can be considered large enough to provide stable factors (kmo measure ≥ 0.80; field, 2009). items of the fcv-19s and the j-5fs loaded separately, with all seven fcv-19s items loading on their own factor. the results demonstrate that the fear measured by the fcv-19s is unique to covid-19 (perz et al., 2020) and therefore worthy of being studied as a stand-alone construct.
in spite of the fcv-19s being unidimensional, there is a pattern of response to the items in which endorsements of items 3, 6 and 7 are comparatively low when contrasted with the scores of the remaining items. the same pattern can be observed in the mean scores reported by studies such as elemo et al. (2020), giordani et al. (2021), pang et al. (2020), perz et al. (2020), sakib et al. (2020), satici et al. (2020), soraci et al. (2020) and winter et al. (2020). a closer inspection of the items shows that they refer to physiological reactions arising from covid-19 fear (alyami et al., 2020). this pattern of response most likely explains why some studies (bitan et al., 2020; chi et al., 2021; masuyama et al., 2020) obtained a second factor, comprising the three low-scoring items, in their factor analyses.

the reliability estimate of the fcv-19s is comparable with those obtained in many other studies from different geographic contexts (cf. elemo et al., 2020). the omega obtained in this study (ω = 0.88) is the same as that found by elemo et al. (2020) and the glb is nearly the same as that obtained by pang et al. (2020). the reliability of the scale implies that the scores can be trusted because they can be reproduced whenever the measure is administered. reproducibility improves confidence in making clinical and other intervention decisions, which can be based on the results obtained with the fcv-19s. for instance, results of the fcv-19s can help in the pitching of messages related to covid-19, because it is well known that extreme fear and subsequent panic during a pandemic tend to minimise the reception of communications related to the illness (ng & kemp, 2020). the reason is that, beyond the fear of potential infection, fear of the unknown related to covid-19 has been shown to give rise to clinical anxiety symptoms, also affecting the mental health of healthy people (shigemura, ursano, morganstein, kurosawa, & benedek, 2020).
valid covid-19-related mental health screeners are necessary to assist with the early identification of people at risk of pandemic-related psychological distress, to enable preventative and supportive interventions. whilst there are a number of mental health screeners associated with covid-19, the fcv-19s has an advantage over many of them because of its brevity, which makes it more appropriate for resource- and time-constrained student counselling contexts. this benefit also holds for research purposes.

the limitation of the study is that the sample was not randomly drawn and therefore may not be fully representative of the targeted study population. more of the students were domiciled in a rural area, and they were over-represented in this sample. the study has to be replicated in a different research site and with non-student samples to confirm the results. additional psychometric properties that could offer more validity evidence for the fcv-19s, such as predictive validity, convergent validity and measurement invariance, were not assessed.

conclusion

this study established that the fcv-19s can be used as a unidimensional measure of covid-19-related fear amongst university students in south africa. it is also reliable. the fcv-19s has also been shown to measure a construct distinct from ordinary fear. its scores will assist with messaging pertaining to covid-19 prevention.

acknowledgements

the authors would like to acknowledge ramokone c. lebelo, kholofelo m. makgaila, princess m. malamule, diketso mosumi, sphiwe i. ramotlhoa, katlego m. rantho, koketso e. p. shai and pennelope m. theko who assisted with part of the data collection.

competing interests

the authors declare that they have no financial or personal relationships that may have inappropriately influenced them in writing this article.

authors’ contributions

both m.m. and s.m. contributed to designing the study, implementation, data analysis and writing of the manuscript.
funding information

this research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.

data availability

the data that support the findings of this study are available from the corresponding author, s.m., upon reasonable request.

disclaimer

the views and opinions expressed in this article are those of the authors and do not necessarily reflect the official policy or position of any affiliated agency of the authors.

references

adolphs, r. (2013). the biology of fear. current biology, 23(3), 79–93. https://doi.org/10.1016/j.cub.2012.11.055
ahorsu, d.k., lin, c.y., imani, v., saffari, m., griffiths, m.d., & pakpour, a.h. (2020). the fear of covid-19 scale: development and initial validation. international journal of mental health & addiction, 1–9. https://doi.org/10.1007/s11469-020-00270-8
alyami, m., henning, m., krägeloh, c.u., & alyami, h. (2020). psychometric evaluation of the arabic version of the fear of covid-19 scale. international journal of mental health & addiction, 1–14. https://doi.org/10.1007/s11469-020-00316-x
bitan, d.t., grossman-giron, a., bloch, y., mayer, y., shiffman, n., & mendlovic, s. (2020). fear of covid19 scale: psychometric characteristics, reliability and validity in the israeli population. psychiatry research, 289, 113100. https://doi.org/10.1016/j.psychres.2020.113100
brandt, k. (2021, july 6). nelson mandela bay metro in ec officially enters covid-19 third wave. retrieved from https://www.msn.com/en-za/news/other/nelson-mandela-bay-metro-in-ec-officially-enters-covid-19-third-wave/ar-aalq2dm
cao, w., fang, z., hou, g., han, m., xu, x., dong, j., & zheng, j. (2020). the psychological impact of the covid-19 epidemic on college students in china. psychiatry research, 287, 112934. https://doi.org/10.1016/j.psychres.2020.112934
chi, x., chen, s., chen, y., chen, d., yu, q., guo, t., … zou, l. (2021). psychometric evaluation of the fear of covid-19 scale among chinese population. international journal of mental health and addiction, 1–16; online first. https://doi.org/10.1007/s11469-020-00441-7
davis, r. (2021, february 28). [coronavirus] ramaphosa: south africa will move to alert level 1 after emerging from second wave of covid-19. daily maverick. retrieved from https://www.dailymaverick.co.za/article/2021-02-28-ramaphosa-south-africa-will-move-to-alert-level-1-after-emerging-from-second-wave-of-covid-19/?utm_source=ince_firstthing
dziech, b.w. (2020). what about the students? inside higher education. retrieved from https://www.insidehighered.com/views/2020/04/09/students-are-among-most-severe-and-overlooked-victims-pandemic-opinion
elemo, a.s., satici, s.a., & griffiths, m.d. (2020). the fear of covid-19 scale: psychometric properties of the ethiopian amharic version. international journal of mental health & addiction, 1–12; online first. https://doi.org/10.1007/s11469-020-00448-0
elsharkawy, n.b., & abdelaziz, e.m. (2020). levels of fear and uncertainty regarding the spread of coronavirus disease (covid-19) among university students. perspectives in psychiatric care, 1–9. https://doi.org/10.1111/ppc.12698
farber, t. (2021, may 11). as third wave looms, new covid variants throw scientists’ predictions awry. retrieved from https://select.timeslive.co.za/news/2021-05-10-as-third-wave-looms-new-covid-variants-throw-scientists-predictions-awry/?utm_source=&utm_me…
field, a. (2009). discovering statistics using spss (3rd edn.). london: sage.
fink, s. (2020, december 19). south africa announces a new coronavirus variant. new york times. retrieved from https://www.nytimes.com/2020/12/19/world/south-africa-announces-a-new-coronavirusvariant.html
ford, b.n., yolken, r.h., dickerson, f.b., teague, t.k., irwin, m.r., paulus, m.p., & savitz, j. (2019). reduced immunity to measles in adults with major depressive disorder. psychological medicine, 49(2), 243–249. https://doi.org/10.1017/s0033291718000661
giordani, r.c.f., giolo, s.r., muhl, c., estavela, a.j., & gove, j.i.m. (2021). validation of the fcv-19 scale and assessment of fear of covid-19 in the population of mozambique, east africa. psychology research & behavior management, 2021(14), 345–354. https://doi.org/10.2147/prbm.s298948
govender, p. (2021, january 18). covid leaves trail of devastation at tvet colleges. retrieved from https://www.sowetanlive.co.za/news/south-africa/2021-01-18-covid-leaves-trail-of-devastation-at-tvet-colleges/
hartocollis, a. (2020). scattered to the winds, college students mourn lost semester. new york times. retrieved from https://www.nytimes.com/2020/05/27/us/coronavirus-college-mental-health.html
jackson, c.j. (2009). jackson-5 scales of revised reinforcement sensitivity theory (r-rst) and their application to dysfunctional real world outcomes. journal of research in personality, 43(4), 556–569. https://doi.org/10.1016/j.jrp.2009.02.007
kim, a.w., nyengerai, t., & mendenhall, e. (2020). evaluating the mental health impacts of the covid-19 pandemic: perceived risk of covid-19 infection and childhood trauma predict adult depressive symptoms in urban south africa. psychological medicine, 1–13; online first. https://doi.org/10.1017/s0033291720003414
lin, c.-y. (2020). social reaction toward the 2019 novel coronavirus (covid-19). social health & behavior, 3(1), 1–2. https://doi.org/10.4103/shb.shb_11_20
mahlokwane, j. (2021, apr 6). covid-19: more than 100 university of pretoria students test positive. pretoria news. retrieved from https://www.iol.co.za/pretoria-news/news/covid-19-more-than-100-university-of-pretoria-students-test-positive-2c404727-464a-4e32-8d41-92c531efe72c
mailliez, m., griffiths, m.d., & carre, a. (2020). validation of the french version of the fear of covid-19 scale and its associations with depression, anxiety and differential emotions. research square preprints. https://doi.org/10.21203/rs.3.rs-46616/v1
makhubela, m.s., & mashegoane, s. (2019). establishing factorial validity of the rosenberg self-esteem scale. in s. laher, a. fynn, & s. kramer (eds.), transforming research methods in the social sciences: case studies from south africa (pp. 52–63). johannesburg: wits university. https://doi.org/10.18772/22019032750
masuyama, a., shinkawa, h., & kubo, t. (2020). validation and psychometric properties of the japanese version fear of covid-19 scale among adolescents. international journal of mental health & addiction, 1–11; online first. https://doi.org/10.31234/osf.io/jkmut
morapela, k. (2021, 22 may). #coronavirusfs: ufs concerned with rising #covid19 cases. ofm. retrieved from https://www.ofm.co.za/article/centralsa/304477/-coronavirusfs-ufs-concerned-with-rising-covid19-cases
muthén, l.k., & muthén, b.o. (1998–2017). mplus statistical analysis with latent variables: users’ guide (8th edn.). los angeles, ca: muthén & muthén.
ng, k.h., & kemp, r. (2020). understanding and reducing the fear of covid-19. journal of zhejiang university-science b (biomedicine & biotechnology), 21(9), 752–754. https://doi.org/10.1631/jzus.b2000228
nguyen, h.t., do, b.n., pham, k.m., kim, g.b., dam, h.t., nguyen, t.t., … duong, t.v. (2020). fear of covid-19 scale—associations of its scores with health literacy and health-related behaviors among medical students. international journal of environmental research & public health, 17(11), 4164. https://doi.org/10.3390/ijerph17114164
pakpour, a.h., griffiths, m.d., & lin, c.-y. (2020a). assessing the psychological response to the covid-19: a response to bitan et al. “fear of covid-19 scale: psychometric characteristics, reliability and validity in the israeli population”. psychiatry research, 290, 113127. https://doi.org/10.1016/j.psychres.2020.113127
pakpour, a.h., griffiths, m.d., chang, k.-c., chen, y.-p., kuo, y.-j., & lin, c.-y. (2020b). assessing the fear of covid-19 among different populations: a response to ransing et al. (2020). brain, behavior, & immunity, 89, 524–525. https://doi.org/10.1016/j.bbi.2020.06.006
pang, n.t.p., kamu, a., hambali, n.l.b., mun, h.c., kassim, m.a., mohamed, n.h., … jeffree, m.s. (2020). malay version of the fear of covid-19 scale: validity and reliability. international journal of mental health & addiction, 1–10; online first. https://doi.org/10.1007/s11469-020-00355-4
perz, c.a., lang, b.a., & harrington, r. (2020). validation of the fear of covid-19 scale in a us college sample. international journal of mental health & addiction, 1–11; online first. https://doi.org/10.1007/s11469-020-00356-3
qiu, j., shen, b., zhao, m., wang, z., xie, b., & xu, y. (2020). a nationwide survey of psychological distress among chinese people in the covid-19 epidemic: implications and policy recommendations. general psychiatry, 33(2), e100213. https://doi.org/10.1136/gpsych-2020-100213
rajkumar, r.p. (2020). covid-19 and mental health: a review of the existing literature. asian journal of psychiatry, 52, 102066. https://doi.org/10.1016/j.ajp.2020.102066
ramírez, m., ford, m.e., & stewart, a.l. (2005). measurement issues in health disparities research. health services research, 40(5 pt 2), 1640–1657. http://doi.org/10.1111/j.1475-6773.2005.00450.x
reznik, a., gritsenko, v., konstantinov, v., khamenka, n., & isralowitz, r. (2020). covid-19 fear in eastern europe: validation of the fear of covid-19 scale. international journal of mental health & addiction, 1–6; online first. https://doi.org/10.1007/s11469-020-00283-3
roberts, m. (2021, february 28). south africa coronavirus variant: what is the risk? bbc news online. retrieved from https://www.bbc.com/news/health-55534727
rogers, r.w. (1975). a protection motivation theory of fear appeals and attitude change. journal of psychology, 91(1), 93–114. https://doi.org/10.1080/00223980.1975.9915803
sakib, n., bhuiyan, a.i., hossain, s., al mamun, f., hosen, i., abdullah, a.h., … mamun, m.a. (2020). psychometric validation of the bangla fear of covid-19 scale: confirmatory factor analysis and rasch analysis. international journal of mental health & addiction, 1–12; online first. https://doi.org/10.1007/s11469-020-00289-x
satici, b., gocet-tekin, e., deniz, m.e., & satici, s.a. (2020). adaptation of the fear of covid-19 scale: its association with psychological distress and life satisfaction in turkey. international journal of mental health & addiction, 1–9; online first. https://doi.org/10.1007/s11469-020-00294-0
savides, m. (2021, may 17). it’s inland sa’s turn to feel the full impact of the third wave. retrieved from https://select.timeslive.co.za/news/2021-05-16-its-inland-sas-turn-to-feel-the-full-impact-of-the-third-ave/?utm_source=&utm_medium=email&ut…
shigemura, j., ursano, r.j., morganstein, j.c., kurosawa, m., & benedek, d.m. (2020). public responses to the novel 2019 coronavirus (2019-ncov) in japan: mental health consequences and target populations. psychiatry and clinical neurosciences, 74(4), 281–282. https://doi.org/10.1111/pcn.12988
shultz, j.m., cooper, j.l., baingana, f., oquendo, m.a., espinel, z., althouse, b.m., … rechkemmer, a. (2016). the role of fear-related behaviors in the 2013–2016 west africa ebola virus disease outbreak. current psychiatry reports, 18, 104. https://doi.org/10.1007/s11920-016-0741-y
soraci, p., ferrari, a., abbiati, f.a., del fante, e., de pace, r., urso, a., & griffiths, m.d. (2020). validation and psychometric evaluation of the italian version of the fear of covid-19 scale. international journal of mental health & addiction, 1–10; online first. https://doi.org/10.1007/s11469-020-00277-1
sunday times. (2021, 28 april). new research shows how covid-19 affected university students in 2020 crowdfunding bridges the gaps for hundreds of students. retrieved from https://www.timeslive.co.za/sunday-times/business/2021-04-28-native-new-research-shows-howcovid-19-affected-university-students-in-2020/#
tsipropoulou, v., nikopoulou, v.a., holeva, v., nasika, z., diakogiannis, i., sakka, s., … parlapani, e. (2020). psychometric properties of the greek version of fcv-19s. international journal of mental health & addiction, 1–10; online first. https://doi.org/10.1007/s11469-020-00319-8
who. (2020, december 31). emergencies preparedness, response: sars-cov-2 variants. disease outbreak news. retrieved from https://www.who.int/csr/don/31-december-2020-sars-cov2-variants/en/
winter, t., riordan, b.c., pakpour, a.h., griffiths, m.d., mason, a., poulgrain, j.w., & scarf, d. (2020). evaluation of the english version of the fear of covid-19 scale and its relationship with behaviour change and political beliefs. international journal of mental health & addiction, 1–10; online first. https://doi.org/10.1007/s11469-020-00342-9
zolotov, y., reznik, a., bender, s., & isralowitz, r. (2020). covid-19 fear, mental health, and substance use among israeli university students. international journal of mental health & addiction, 1–11; online first. https://doi.org/10.1007/s11469-020-00351-8

about the author(s)

erica munnik, department of psychology, faculty of community and health sciences, university of the western cape, cape town, south africa
emma wagener, department of psychology, faculty of community and health sciences, university of the western cape, cape town, south africa
mario smith, department of psychology, faculty of community and health sciences, university of the western cape, cape town, south africa

citation: munnik, e., wagener, e., & smith, m. (2021). validation of the emotional social screening tool for school readiness. african journal of psychological assessment, 3(0), a42.
https://doi.org/10.4102/ajopa.v3i0.42
original research
validation of the emotional social screening tool for school readiness
erica munnik, emma wagener, mario smith
received: 20 nov. 2020; accepted: 12 may 2021; published: 21 june 2021
copyright: © 2021. the author(s). licensee: aosis. this is an open access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
abstract
the need for contextually appropriate and accessible school readiness assessment instruments in south africa is well documented. the emotional social screening tool for school readiness (e3sr) screens for emotional and social competencies as a component of school readiness. this competency-based screening instrument was developed as a nine-factor model consisting of 56 items. this research study reports on the psychometric properties and factor structure of the e3sr, examined by exploratory factor analysis. ten preschool centres registered under the social welfare act in the cape town metropolitan region, situated in high-, middle- and low-socio-economic status (ses) areas, constituted the research setting. a pilot study using a survey design was conducted. the e3sr protocols were completed by teachers on grade r children during the fourth term of the academic year. the data set of 330 protocols satisfied the assumptions for inferential statistics, except for normal distribution. normality was violated statistically; however, given the time frame, learners were expected to have mastered the competencies measured. therefore, the violation of normal distribution was supported theoretically. exploratory factor analysis yielded a six-factor structure, including emotional maturity, emotional management, sense of self, social skills, readiness to learn and communication.
all extracted factors displayed adequate internal consistency, and the overall scale showed good reliability (α = 0.97). the e3sr can be shortened from 56 to 36 items without losing any important content. the e3sr can supplement formative assessments and enhance communication between role players to build children’s emotional and social competencies.
keywords: emotional social competence; e3sr; factor structure; school readiness; south africa; validation study.
introduction
moving from early learning experiences into formal schooling constitutes a profound change (yunas & dahlan, 2013). school readiness and associated skills create a platform for learning and lifelong growth (rimm-kaufman, pianta, & cox, 2000). early childhood development (ecd) and school readiness are exponentially compromised by contextual factors in developing countries (munnik & smith, 2019a; raikes et al., 2015). in south africa, ecd and school readiness are adversely affected by sociocultural and political factors (bruwer et al., 2014; foxcroft, 2013).
school readiness is regarded as a multidimensional concept. learning starts through early stimulation where external factors impact the personal readiness of the child, including the expectations of the parents, readiness of the school, preschool experiences and the child’s environment (bruwer et al., 2014; munnik & smith, 2019a). the primary domains identified in school readiness include cognition and general knowledge, language and literacy, perception, emotion regulation, social skills, approaches to learning, physical well-being, and neurological and motor development (mohamed, 2013; rimm-kaufman & sandilos, 2017). the inclusion of emotional and social readiness as a domain of readiness has received more focus (dbe, 2013; mohamed, 2013; munnik, 2018). emotional and social development is perceived as an important domain of school readiness (ngwaru, 2012).
self-understanding and awareness, social confidence, empathy and emotional growth, self and emotion regulation are identified as important competencies in the emotional and social realm (bustin, 2007). understanding, regulating and expressing of emotions are attributes of school readiness (ştefan, bălaj, porumb, albu, & miclea, 2009). similarly, compliance to rules, interpersonal skills and pro-social behaviour were identified as attributes of school readiness (mohamed, 2013). laher and cockcroft (2014) reported progress in the development of assessment protocols for educators and professionals. however, school readiness assessment in south africa remains a focus of further research. school readiness assessment needs to be seen as a multidimensional process. in south africa, formal assessment practices are still largely child focused. school readiness assessments are mainly performed by educators and healthcare professionals. preschool teachers use observations and assessment measures built into the national policy and curriculum statement to assess children’s readiness on a physical, cognitive, affective, normative, sociocultural and linguistic level (powell, 2010). instruments that are currently used by professionals to assess for school readiness include the aptitude test for school beginners (asb) (roodt et al., 2013), the junior south african individual scale (jsais) (madge, van den berg, & robinson, 1985), the school readiness evaluation by trained testers (setts) (hsrc, 1984), the griffiths mental developmental scales (gmds) (jacklin & cockcroft, 2013; luiz, barnard, knoetzen, & kotras, 2004), the school-entry group screening measure – sgsm (foxcroft, 1994) and the school readiness test of the university of pretoria (van rooyen & engelbrecht, 1997). the health professionals council of south africa has not included any additional tests to assess school readiness on its list of classified tests since 2007 (hpcsa, 2010). 
scientific evidence for the validity and reliability of currently used tests in the multi-cultural context is lacking. access and cost are barriers that limit applicability in south africa. assessment is often compromised by the limited availability of instruments that allow for variation in sociocultural status, multilingualism and access to available resources (amod & heafield, 2013). professional screening and assessment remain unaffordable for the general population (makhalemele & nel, 2016). problems of access and affordability, together with a bias towards cognitive functioning, create a need for contextually relevant measures of social emotional competencies (secs) as a domain of school readiness assessment (bustin, 2007). there is a need for accessible, affordable and contextually relevant instruments to assess social-emotional competencies in preschool-aged children (amod & heafield, 2013; munnik & smith, 2019b).
munnik (2018) developed the emotional social screening tool for school readiness (e3sr) in response to the expressed need for contextually relevant screening tools. the construction followed a multi-phased procedure in which each phase used distinct methodologies. munnik (2018) reported that multiple design approaches were used to ensure a strong theoretical foundation for the e3sr and recommended the examination of the psychometric properties of the e3sr. the theoretical foundation was derived from two systematic reviews that informed the definition of the constructs and the operationalised definitions and attributes that were used to build the model. applicability to the south african context was enhanced through consultation with stakeholders, including parents and professionals from the education and healthcare sectors. these processes ensured a strong theoretical basis for the construction of the original pilot version of the e3sr. the e3sr was constructed with two broad domains, namely, emotional competence and social competence.
the emotional competence domain consisted of five subdomains: emotional maturity, emotional management, independence, positive sense of self and mental well-being and alertness. the social competence domain included four subdomains: social skills, pro-social behaviour, compliance with rules and communication. the screening tool included 56 items across the nine theoretical subdomains. for a detailed description of the theoretical model underpinning the e3sr, refer to munnik (2018). the initial validation of the e3sr was assessed via a confirmatory factor analysis (cfa) that tested the theoretical underpinning of the instrument. munnik (2018) reported that the cfa provided support for the theoretical model with clear recommendations for further refinement. the recommendation was that there was room for further investigation of the instrument despite the fact that the theoretical model was supported. this article reports on a post hoc analysis and data reduction as a further exploration of the factor structure of the e3sr. methods participants grade r teachers working in 10 educare centres or preschools in the western cape, cape town area were recruited as the respondent group to complete the protocols of the e3sr. the teachers were full-time employees who currently taught grade r. they had to be familiar with the child’s behavioural patterns, abilities and general traits across environments through their day-to-day interaction with the child in the preschools. seventeen teachers gave consent to participate in the pilot study. a total of 330 protocols were received. the preschools included one alternative (n = 36), one private (n = 71), three governmental (n = 201) and five community-based (n = 22) preschools. all protocols were obtained from children between the age of 6 and 7 years. the demographic profile of the children on whom the protocols were based is presented in table 1. table 1: demographic composition of the target group/ children 6–7 years (n = 330). 
table 1 shows that in this sample, 64% of children were male and 36% were female. english was the most frequently spoken first language (56%), followed by afrikaans (37%) and xhosa (6%). other primary languages that were specified as mother tongues included french, congolese and other south african languages, sepedi or zulu (2%). research design a cross-sectional survey design was used for data collection. instruments the e3sr is a strength-based screening instrument designed to screen for emotional and social readiness in preschool children in south africa, and was used for data collection (munnik, 2018). the e3sr used a questionnaire format with a five-point likert scale (never, rarely, some of the time, most of the time and almost always). the instrument was used in a summative way. the instrument has clear instructions on how to complete the questionnaire. the e3sr consists of two sections. section a: the demographic section includes questions on the demographics of the child, such as the child’s chronological age, birth order, gender, ethnicity, language of instruction in school, home language, and history of illness or disability. information of the respondents is also recorded, such as the length of time that the child was known to the teacher, a rating on how well the child is known and if the child has been referred for special interventions. section b: this section included items for each of the domains (2) and sub-domains (9). the emotional competence domain comprised of 31 items across five sub-domains, namely, emotional maturity, emotional management, positive sense of self, independence, and mental well-being and alertness. the social competence domain comprised of 25 items across four sub-domains, namely, social skills, pro-social behaviour, compliance with rules and communication. 
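the summative use of the five-point scale can be illustrated with a short python sketch that adds item ratings into sub-domain, domain and full-scale composites. it is an illustration only: the numeric coding of the response options (1 to 5), the function name `score_protocol` and the example ratings are assumptions made here, and the actual scoring rules are those documented by munnik (2018).

```python
# Hypothetical numeric coding of the five response options (assumed, not from the source).
LIKERT = {
    "never": 1,
    "rarely": 2,
    "some of the time": 3,
    "most of the time": 4,
    "almost always": 5,
}

# Sub-domains per domain, as named for the pilot E3SR.
DOMAINS = {
    "emotional competence": [
        "emotional maturity", "emotional management", "positive sense of self",
        "independence", "mental well-being and alertness",
    ],
    "social competence": [
        "social skills", "pro-social behaviour", "compliance with rules", "communication",
    ],
}


def score_protocol(ratings):
    """Sum Likert ratings into sub-domain, domain and full-scale composites.

    `ratings` maps a sub-domain name to the list of response labels for its items.
    """
    sub_scores = {name: sum(LIKERT[label] for label in labels)
                  for name, labels in ratings.items()}
    domain_scores = {
        domain: sum(sub_scores.get(name, 0) for name in subs)
        for domain, subs in DOMAINS.items()
    }
    full_scale = sum(domain_scores.values())
    return sub_scores, domain_scores, full_scale


# Example with made-up ratings for two sub-domains.
example = {
    "social skills": ["never", "almost always"],
    "independence": ["rarely"],
}
sub, dom, full = score_protocol(example)
```

in this toy example the social competence composite is the sum of the two social skills ratings and the full-scale score is the sum of both domain composites.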
the theoretical and operational definitions for each scale and subscale with their personal attributes can be accessed from the unpublished doctoral thesis of munnik (2018). a composite score can be calculated for the full scale, each domain and sub-domain. the full-scale score reflects the level of readiness to enter mainstream education on the emotional–social level. the domain scores reflect the level of readiness on an emotional or social level. procedure a stratified sampling frame of preschools or educare centres registered under the social welfare act in the cape town metropolitan area was established. socio-economic status (ses) was used to stratify the sample into high, middle, and low geographical areas. schools within these areas were invited to take part in this study. the recruitment process entailed a multi-layered stakeholder consultation. firstly, an invitation was sent to the principals of the preschools, which included the proposed purpose of the pilot and an outline of what their involvement would entail. the principals discussed the invitation with the teachers and identified teachers who expressed their interest to participate. the research team then contacted the identified teachers for recruitment purposes. willing principals and teachers constituted willing schools. a meeting was convened with parents of preschool children at ‘willing schools’. the aim of the meeting was to explain that the school was participating in a pilot study and the value of the e3sr. parents were also informed that the information will be provided by the teachers as part of an administrative process as described above. thus, principals, parents and teachers had to agree in order for the school to be included. meetings between the research team, teachers and the principals were scheduled at identified settings. the main purpose was to give an outline of the research (pilot study) and to clarify what teachers’ involvement would entail. 
teachers were invited to ask questions about the study, the test administration, the layout of the e3sr and the dissemination strategy. the questionnaires were delivered to an identified teacher at each preschool for completion. the respondents (teachers) were able to contact a nominated researcher at any time to discuss reservations or difficulties that arose during the pilot study. these steps increased compliance with the administration guidelines and, by extension, the reliability of the data. the completed questionnaires were collected as soon as the teachers indicated that they had completed them.
data analysis
data analysis entailed: (1) data curing and testing thresholds for validation, (2) testing assumptions and (3) data reduction.
data curing
the data set included 330 protocols for children aged 6–7 years. of these, 107 (32.4%) protocols were received from two independent schools, 201 (60.9%) from three governmental schools and 22 (6.7%) from five community-based settings. the number of protocols in the data set (n = 330) exceeded the threshold requirements on the number of cases per item and the overall threshold for robustness recommended by devellis (2016). the minimum ratio for validation studies is five cases per item up to 300 cases, after which the ratio can be relaxed. the pilot e3sr consisted of 56 items, which set the minimum threshold at 280 (5 × 56 = 280). the recommended threshold sample size of 300 was exceeded, which increased the robustness of the analysis in this validation study (devellis, 2016).
testing assumptions
before the multivariate statistical analysis commenced, the assumptions for multivariate statistical analysis and data reduction were tested, as recommended by field (2013). normal distribution was assessed using the shapiro–wilk test. bartlett’s test of sphericity was used to test whether the correlations between items were large enough for factor analysis.
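the cases-per-item threshold and bartlett’s test of sphericity can both be sketched in a few lines of python. this is a minimal illustration and not the spss procedure used in the study: the function names are hypothetical, the statistic uses the standard large-sample formula χ² = −(n − 1 − (2p + 5)/6) · ln|r| with p(p − 1)/2 degrees of freedom, and the p-value lookup is left out.

```python
import math


def min_sample_size(n_items, cases_per_item=5):
    """Minimum number of cases under a fixed cases-per-item rule."""
    return n_items * cases_per_item


def log_det_spd(matrix):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    # det(R) is the squared product of the Cholesky diagonal.
    return 2.0 * sum(math.log(lower[i][i]) for i in range(n))


def bartlett_sphericity(corr, n_cases):
    """Return Bartlett's chi-square statistic and its degrees of freedom.

    `corr` is the item correlation matrix (list of lists), `n_cases` the sample size.
    """
    p = len(corr)
    chi_square = -(n_cases - 1 - (2 * p + 5) / 6) * log_det_spd(corr)
    dof = p * (p - 1) // 2
    return chi_square, dof


# Threshold for the 56-item pilot instrument: 5 x 56 = 280 cases.
threshold = min_sample_size(56)

# Toy two-item correlation matrix with r = 0.5 and n = 330 (illustrative values only).
chi_square, dof = bartlett_sphericity([[1.0, 0.5], [0.5, 1.0]], 330)
```

for an instrument with 56 items the degrees of freedom are 56 × 55 / 2 = 1540, and an identity correlation matrix gives a statistic of zero, which is the null hypothesis the test is designed to reject.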
the kaiser–meyer–olkin test (kmo) assessed sample adequacy. data reduction statistical analysis was conducted using spss (version 25). internal consistency was assessed with cronbach’s coefficient alpha. the dimensional structure of the e3sr was assessed by exploratory factor analysis (efa), as recommended by henson and roberts (2006). the main aim of the efa was to clarify how many items were loaded on the identified factors and to identify a reduced set of factors that would describe the structural inter-relationships amongst the domain and sub-domain scales in the e3sr (henson & roberts, 2006). principal axis factoring (paf) was used in this study, as the intention of paf was to explain the common variance amongst variables by means of factors (henson & roberts, 2006). the direct oblimin method was used as the rotation method, which allows for factors to be correlated (laher, 2010). factor loadings were pushed towards 0 or 1.0 by decreasing the standard errors of the loadings for the variables with small communalities or increasing those of the correlations amongst oblique factors (kline, 2013). decision criteria set for this study the interpretation and reviewing of items were performed by inspecting factor loadings, communalities and factor over-determinations (nunnally & burnstein, 1994). item inspection on the correlation matrix. items that did not correlate with at least one other item significantly above 0.3 were omitted. items with few significant correlations with other items above 0.3 were flagged for further inspection consistent with the recommendation from kline (2013). measures of sampling adequacy (msa) were also assessed per item, with a threshold of 0.90. communalities. communalities express the shared variance accounted for by all the extracted factors (field, 2013). high communalities indicate a reliable item. 
the desirable level of communality was set at 0.40, and communalities should not vary over a wide range (gaskin, 2016; osborne, costello, & kellow, 2014).
number of factors to extract. factor extraction was informed by cross-validation of methods: (1) eigenvalues that exceeded 1 and inspection of the scree plot, consistent with the recommendation of henson and roberts (2006), (2) parallel analysis (pa) (horn, 1965) and (3) velicer’s minimum average partial (map) test (velicer, eaton, & fava, 2000).
factor loadings. the threshold for factor loadings was set at 0.50, following costello and osborne (2005). items that loaded above 0.32 on more than one factor were considered as cross-loading. items that cross-loaded on two components were retained in the component on which they obtained the highest loading, on condition that they obtained a minimum loading of 0.50, based on the recommendation of williams, onsman and brown (2010). items that did not load on any factors were removed after examination.
ethical considerations
the humanities and social sciences research ethics committee at the university of the western cape granted ethical approval during the phd study and again for the post-phd project, reference number: hs19/24. permission to conduct research at the preschools was obtained from the department of basic education and the principals. an information sheet explained what the study entailed. permission to include protocols in the final assessments was provided by parents. teachers at participating schools consented to complete protocols as respondents. all participation was voluntary, and the right to withdraw without fear or negative consequence was upheld. all protocols were anonymised. participants were informed of their rights and recourse if dissatisfied. the learners were the unit of analysis; however, they did not participate directly in the study.
parents were appropriately informed of the study and consented to the school’s participation. teachers recorded their assessment based on observation of the children during the normal course of the execution of teaching responsibilities.
results
testing the data set for assumptions
the assumption of normality was violated in this sample for the overall scale, as well as all nine subscales (shapiro–wilk, p < 0.05). the results showed that the distribution was positively skewed, and the assumption of normality was not met for this group. this violation was in line with the expected results, as 6- to 7-year-old children are assumed to already have mastered most of the attributes of emotional social readiness in the fourth term of the academic year when data were collected. in other words, the distribution accurately reflected where the cohort should be in terms of the measured competencies. thus, the non-normal distribution was, in fact, an accurate representation of the target group.
the kmo statistic of 0.96 suggests that sampling adequacy was within the accepted range. all items reported individual msa values above 0.90, with the exception of item 3 on the mental well-being subscale (mw3) with an msa value of 0.73, which was still considered to be acceptable according to field (2013). bartlett’s test of sphericity indicated that correlations between items were sufficiently large for paf, suggestive of a correlation matrix and not an identity matrix (χ²(1540) = 18918.98, p < 0.01).
principal axis factoring
first extraction
the first extraction confirmed the nine-factor solution reported by munnik (2018). the nine factors accounted for 75.80% of the shared variance, which exceeded the threshold set by field (2013) where the extracted factors should account for a minimum of 60% of the variance in order to be a good fit.
thus, this initial extraction confirmed the theoretical formulation of the subscales of the e3sr and was used as a baseline to be refined in subsequent extractions. the findings confirmed the acceptability of the model, even though several items were found to be problematic. for further refinement, items in4, in6, sos1 and cr4 were removed because they cross-loaded on more than one factor, with none of the loadings reaching the 0.50 cut-off. all of these items had communalities below 0.6, and had low item-total correlations. item mw3 was also removed due to low msa (0.73) and an item-total correlation of 0.190.
second extraction
the second extraction resulted in seven factors based on eigenvalues exceeding 1 and inspection of the scree plot. this accounted for 74.97% of the shared variance. to confirm the number of factors for extraction, horn’s (1965) pa was conducted with glorfeld’s (1995) extension of sensitivity, using the 99th percentile. the analysis was run in spss using o’connor’s (2000) syntax. the results were not meaningful, as the eigenvalues above zero for this instrument consistently exceeded the random eigenvalues produced in the pa, which would mean retaining an extraordinarily high number of factors (31). this over-extraction of factors can be attributed to the following: firstly, the e3sr contains a large number of variables that are not discrete (jones, 2018). secondly, adjusted correlation matrices (e.g. principal axis factoring) use squared multiple correlations on the diagonal, which tends to suggest more factors than are justified (buja & eyuboglu, 1992). the number of factors was therefore cross-validated using the average partial correlations in velicer’s map test. the map test revealed a seven-factor solution, consistent with the result of the second extraction. upon inspection of the pattern matrix, several items were still loading poorly and cross-loading on multiple factors.
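the logic of parallel analysis with a 99th-percentile criterion can be sketched in python with numpy. this is a simplified illustration, not o’connor’s (2000) spss syntax: it assumes normally distributed random data and compares eigenvalues of the unreduced correlation matrix, whereas the run described above used an adjusted (principal axis) matrix. the function name `parallel_analysis` and the simulated two-factor data are hypothetical.

```python
import numpy as np


def parallel_analysis(data, n_sims=200, percentile=99, seed=0):
    """Horn-style parallel analysis on the unreduced correlation matrix.

    Counts leading observed eigenvalues that exceed the chosen percentile of
    eigenvalues from random normal data of the same shape.
    """
    rng = np.random.default_rng(seed)
    n_cases, n_items = data.shape

    def sorted_eigenvalues(matrix):
        # eigvalsh returns ascending eigenvalues of a symmetric matrix.
        return np.linalg.eigvalsh(np.corrcoef(matrix, rowvar=False))[::-1]

    observed = sorted_eigenvalues(data)
    simulated = np.empty((n_sims, n_items))
    for s in range(n_sims):
        simulated[s] = sorted_eigenvalues(rng.standard_normal((n_cases, n_items)))
    thresholds = np.percentile(simulated, percentile, axis=0)

    exceeds = observed > thresholds
    # Stop at the first eigenvalue that fails to beat its random counterpart.
    n_retain = n_items if exceeds.all() else int(np.argmin(exceeds))
    return n_retain, observed, thresholds


# Simulated example: 330 cases, 8 items, two uncorrelated factors (illustrative only).
rng = np.random.default_rng(1)
factors = rng.standard_normal((330, 2))
loadings = np.zeros((2, 8))
loadings[0, :4] = 0.8
loadings[1, 4:] = 0.8
items = factors @ loadings + 0.6 * rng.standard_normal((330, 8))
n_retain, observed, thresholds = parallel_analysis(items, n_sims=50)
```

with a clean simulated structure like this the procedure recovers the two generating factors; with reduced correlation matrices and many non-discrete items, as described above, the same comparison can retain far more factors than are interpretable.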
items emx6, emx7, in2, mw2, mw6 and ss6 were removed as these items had loadings less than 0.5, and low communalities. item emx3 was removed because of poor communality and low item-total correlation. third extraction the third extraction method was conducted on the reduced number of items and was based on the eigenvalues (> 1) and inspection of the scree plot. a six-factor solution was suggested, which accounted for 75.48% of the shared variance. after the third extraction, item in1 was removed as this item was not loading on any factors, with a communality of 0.27. item ss1 was loading below 0.5 and was removed. both items pb6 and cr5 were removed because of cross-loading on two factors, with loadings below the 0.5 threshold. subsequent extractions were carried out with the specification of a six-factor solution, as this appeared to be the best fit for the data. fourth extraction the fourth extraction method also yielded a six-factor solution, accounting for 77.92% of the shared variance. upon inspection of the pattern matrix, items in5 and ss4 were cross-loading on more than one factor, with loadings below the 0.5 threshold, and were removed. items pb4 and cr1 reported loadings below 0.5 and were also removed. final extraction the final extraction yielded a six-factor solution, accounting for 78.94% of the shared variance amongst the 36 items. the results of the final extraction are presented in table 2. all items had loadings above 0.5 on their respective factors. item cr6 was cross-loading on factors 4 and 6; however, the item was retained in factor 4, as this loading was the highest above 0.5. table 2: final factors and factor loadings after the fourth extraction. summary of factors and revised sub-domains factor 1 consisted of seven items from the original social skills or confidence and pro-social behaviour subscales, with loadings ranging from 0.504 to 0.897. 
the amended scale had a cronbach’s alpha of 0.94 and reliability would not increase meaningfully by further removal of any items. this component was retained and labelled, social skills. factor 2 consisted of five items from the original positive sense of self subscale, with loadings between 0.601 and 0.889. the reduced scale had a cronbach’s alpha of 0.93, and reliability would not increase meaningfully by further removal of any items. this component was retained and labelled, sense of self. factor 3 consisted of the seven original items from the communication subscale, with loadings between 0.642 and 0.967. the subscale had a cronbach’s alpha of 0.95, and reliability would not increase meaningfully by further removal of any items. the component was retained and labelled, communication. factor 4 consisted of seven items. one item was from the original independence subscale, three items from the mental well-being or alertness, and three items from the compliance with rules subscale. the factor had loadings between 0.549 and 0.796. as these constructs are related, the retained items were combined to form a new scale. the newly merged subscale had a cronbach’s alpha of 0.94, and reliability would not increase meaningfully by further removal of any items. this component was retained and renamed as readiness to learn. factor 5 consisted of five items from the original emotional management subscale, with the loadings on this factor ranging from 0.624 to 0.901. the reduced subscale had a cronbach’s alpha of 0.92, and reliability would not increase meaningfully by further removal of any items. the component was retained and labelled, emotional management. factor 6 consisted of five items from the original emotional maturity subscale. the loadings on this factor ranged from 0.596 to 0.808, with the subscale having a cronbach’s alpha of 0.95. the reliability would not increase meaningfully by further removal of any items. 
the component was retained and labelled emotional maturity.
twenty items were removed from the original set of 56 items. a six-factor solution was recommended with the following sub-domains: (1) emotional maturity, (2) sense of self, (3) communication, (4) emotional management, (5) readiness to learn and (6) social skills. the amended and reduced scale consisted of 36 items and obtained a high level of internal consistency, as evidenced by the cronbach’s alpha of 0.97. correlations between factors are presented in table 3. all sub-domains of the revised scale were found to be significantly correlated with one another (r = 0.48–0.81, p < 0.01).
table 3: means, standard deviations and correlations between factors.
figure 1 is a graphical depiction of the data reduction process and the revised sub-domains.
figure 1: original and revised emotional social screening tool for school readiness domains.
internal consistency reliability
table 4 provides an overview of the internal consistency for the original and revised scales and subscales of the e3sr. overall, the scale and subscales of the original e3sr were found to be internally consistent, as demonstrated by alpha levels indicative of good to excellent reliability. the reported internal consistency estimates suggest that the items hang together in a reliable way. the internal consistency results of the revised scale were also found to be excellent. the final cronbach’s alphas for the e3sr range from 0.92 to 0.95 across the six sub-domains. the revised scale had an overall cronbach’s alpha of 0.97 (36 items). the emotional competence domain reported a cronbach’s alpha of 0.95 (22 items), and the social competence domain had a cronbach’s alpha of 0.94 (14 items).
table 4: internal consistency for the original and revised scales and subscales of the emotional social screening tool for school readiness before and after principal axis factoring.
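cronbach’s alpha, the internal consistency coefficient reported above, can be computed directly from an item-score matrix. the sketch below uses the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of total scores); the function name `cronbach_alpha` and the toy ratings are hypothetical and are not data from the study.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(item_scores[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([row[j] for row in item_scores]) for j in range(k)]
    # Sample variance of the summed scale score.
    total_variance = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Toy ratings for four respondents on a two-item subscale (illustrative only).
toy_ratings = [[1, 2], [2, 1], [3, 4], [4, 3]]
alpha = cronbach_alpha(toy_ratings)
```

when items rise and fall together across respondents, the variance of the total score grows relative to the summed item variances and alpha approaches 1.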
discussion this research study evaluated the psychometric properties of the newly constructed e3sr by post hoc analysis on a data set of 330 protocols. the data set supported an exploration of the factorial structure of the e3sr. the results suggested a revised factor structure with six subscales instead of the nine-factor solution in the original instrument. the efa yielded a six-factor solution. four of the subscales, emotional maturity, emotional management, sense of self and communication, were retained as conceptualised in the theoretical model. the efa proposed two mergers: firstly, social skills and pro-social behavior sub-domains were two separate domains in the theoretical model, which were merged in the data reduction process. the social skills and pro-social behaviour merge is understandable, as these domains are inter-related and interdependent with attributes that tap into similar hypothetical constructs (munnik, 2018; stefan, et al., 2009). the merged subscale was termed, social skills. secondly, three separate domains in the theoretical model were merged into one. the proposed merger between independence, mental well-being and alertness, and compliance with rules highlighted the interdependent nature of these constructs. the attributes of these sub-domains spoke of the child’s general readiness for learning, including their awareness of surroundings and the ability to reason within the context of social rules. it includes the ability to follow and adhere to ground rules stipulated in specific contexts, to be responsive to feedback about one’s behaviour in relation to complying with rules, and to be able to focus and attend to tasks independently. the merger of these domains of competence is not limited to one area of development or functioning but embraces the interrelationships between skills and behaviours across domains of development and learning (mohamed, 2013; munnik, 2018). 
the inter-related attributes of the merged sub-domain resonated with research stating that a child’s attitude towards learning is linked to several constructs, such as task persistence, attention, creativity, initiative, curiosity and problem solving (amod & heafield, 2013; mohamed, 2013). the results suggested good psychometric properties. the emotional social competence scale had a cronbach’s alpha of 0.97, indicative of excellent reliability and suitability for use in psychological research. the emotional competence and social competence domains had cronbach’s alphas of 0.95 and 0.94 respectively, both indicative of excellent reliability as per the classification provided by taber (2018). the revised subscales showed excellent reliability as evidenced by cronbach’s alphas ranging from 0.92 to 0.95, indicative of internal consistency between items. the following limitations are noted: the timing of the data collection, that is, towards the end of the last term of the academic year, impacted the assumption of normality as a requirement for data reduction or multivariate analysis. the violation of normality detracts from the robustness of the analysis, even though the assumption of normality was supported theoretically. the sample for the initial validation was limited to the cape metropole in the western cape. thus, the results, however encouraging, must be interpreted cautiously until a more inclusive target group can be recruited. the psychometric properties cannot be retested on the same sample, and thus, a cfa can only be conducted on a new sample. conclusion the e3sr is a valid and reliable screening tool for emotional–social competence as a domain of school readiness. the data reduction process supported a six-factor model, consisting of (1) emotional maturity, (2) sense of self, (3) communication, (4) emotional management, (5) readiness to learn and (6) social skills.
the e3sr was successfully reduced to 36 items without losing important content. acknowledgements competing interests the authors declare that they have no financial or personal relationships that may have inappropriately influenced them in writing this article. authors’ contributions e.m., e.w. and m.s. all contributed equally to this work. funding information the authors thank the national research foundation for financial support using the following grants: nrf sabbatical grant for completion of phd 2018 and thutuka post phd track 2019–2021 awarded to erica munnik. the research study has not been commissioned nor does it represent the opinions of the nrf. no commissions or prohibitions have been placed on the study or dissemination protocol because of the funding. data availability data sharing is not applicable to this article as no new data were created or analysed in this study. disclaimer the views and opinions expressed in this article are those of the authors and do not necessarily reflect the official policy or position of any affiliated agency of the authors. references amod, z., & heafield, d. (2013). school readiness assessment in south africa. in k. cockcroft & s. laher (eds.), psychological assessment in south africa: research and applications (1st edn., pp. 74–85). johannesburg: wits university press. bruwer, m., hartell, c., & steyn, m. (2014). inclusive education and insufficient school readiness in grade 1: policy versus practice. south african journal of childhood education, 4(2), 18–35. https://doi.org/10.4102/sajce.v4i2.202 buja, a., & eyuboglu, n. (1992). remarks on parallel analysis. multivariate behavioral research, 27(4), 509–540. https://doi.org/10.1207/s15327906mbr2704_2 bustin, c. (2007). the development and validation of a social emotional school readiness scale. doctoral dissertation. bloemfontein, bl: university of the free state. costello, a.b., & osborne, j.w. (2005). 
best practices in exploratory factor analysis: four recommendations for getting the most from your analysis. practical assessment, research & evaluation, 10(7), 1–9. department of basic education. (2013). promotion requirements of the national curriculum statement grades r – 12). retrieved from ww.acsi.co.za/legallegislativeadvocacy/policy-documents/the-national-policy-pertaining-to-the-programme-and-promotion-requirements-of-the-national-curriculum-statement-grade-r-12/ devellis, r.f. (2016). scale development: theory and applications (vol. 26). sage. field, a. (2013). discovering statistics using spss. sage. foxcroft, c.d. (1994). the development of a group screening measure for south african children. unpublished manuscript. university of port elizabeth. foxcroft, c.d. (2013). developing a psychological measure. in c. foxcroft & g. roodt (eds.), introduction to psychological assessment in the south african context (4th edn., pp. 69–81). oxford university press. foxcroft, c.d., & roodt, g. (2013). introduction to psychological assessment in the south african context. (4th edn.). oxford university press. gaskin, j. (2016). exploratory factor analysis: communalities. gaskination’s stat wiki. retrieved from http://statwiki.kolobkreations.com/index.php?title=exploratory_factor_analysis#communalities glorfeld, l.w. (1995). an improvement on horn’s parallel analysis methodology for selecting the correct number of factors to retain. educational and psychological measurement, 55(3), 377–393. https://doi.org/10.1177/0013164495055003002 health professionals council of south africa. (2010). the professional board for psychology. health professions council of south africa. list of tests classified as being psychological tests. form 207. retrieved from http://www.hpcsa.co.za/uploads/editor/userfiles/downloads/psych/psychom_form_207.pdf henson, r.k., & roberts, j.k. (2006). use of exploratory analysis in published research: common errors and some comments on improved practice. 
educational and psychological measurement, 66(3), 393–416. https://doi.org/10.1177/0013164405282485 horn, j.l. (1965). a rationale and test for the number of factors in factor analysis. psychometrika, 30(2), 179–185. https://doi.org/10.1007/bf02289447 human sciences research council of south africa. (1984). manual for the school readiness evaluation by trained testers. pretoria: hsrc. jacklin, l., & cockcroft, k. (2013). the griffiths mental developmental scales: an overview and a consideration of their relevance for south africa. in k. cockcroft & s. laher (eds.) (1st edn., pp. 169–185). psychological assessment in south africa: research and applications. johannesburg: wits university press. jones, j. (2018). the influence of a proposed margin criterion on the accuracy of parallel analysis in conditions engendering under-extraction. masters theses & specialist projects. paper 2446. retrieved from https://digitalcommons.wku.edu/theses/2446 kline, r.b. (2013). exploratory and confirmatory factor analysis. in y. petscher & c. schatsschneider (eds.), applied quantitative analysis in the social sciences (pp. 171–207). new york, ny: routledge. laher, s. (2010). using exploratory factor analysis in personality research: best-practice recommendations. sa journal of industrial psychology, 36(1), 1–7. https://doi.org/10.4102/sajip.v36i1.873 laher, s., & cockcroft, k. (2014). psychological assessment in post-apartheid south africa: the way forward. south african journal of psychology, 44(3), 303–314. https://doi.org/10.1177/0081246314533634 luiz, d., barnard, a., knoetzen, n., & kotras, n. (2004). griffiths mental development scales extended revised. (gmds-er). technical manual. amsterdam: association for research in infant and child development (aricd). madge, e.m., van den berg, a.r., & robinson, m. (1985). manual for the junior south african individual scales (jsais). pretoria: human science research council. makhalemele, t., & nel, m. (2016). 
challenges experienced by district-based support teams in the execution of their functions in a specific south african province. international journal of inclusive education, 20(2), 168–184. https://doi.org/10.1080/13603116.2015.1079270 mohamed, s.a. (2013). the development of a school readiness screening instrument for grade 00 (pre-grade r) learners. doctoral dissertation. university of the free state. munnik, e. (2018). the development of a screening tool for assessing emotional social competence in preschoolers as a domain of school readiness (doctoral dissertation). university of the western cape. retrieved from http://hdl.handle.net/11394/6099. munnik, e., & smith, m.r. (2019a). contextualising school readiness in south africa: stakeholders perspectives. south african journal of childhood education, 9(1), a680. https://doi.org/10.4102/sajce.v9i1.680 munnik, e., & smith, m.r. (2019b). methodological rigour and coherence in the construction of instruments: the emotional social screening tool for school readiness. african journal of psychological assessment, 1(0), a2. https://doi.org/10.4102/ajopa.v1i0.2. ngwaru, j.m. (2012). parental involvement in early childhood care and education: promoting children’s sustainable access to early schooling through social-emotional and literacy development. southern african review of education, 18(2), 25–40. nunnally, j.c., & bernstein, i.h. (1994). psychometric theory. new york, ny: mcgraw-hill. o’connor, b.p. (2000). spss and sas programs for determining the number of components using parallel analysis and velicer’s map test. behavior research methods, instrumentation, and computers, 32, 396–402. https://doi.org/10.3758/bf03200807 osborne, j.w., costello, a.b., & kellow, j.t. (2014). best practices in exploratory factor analysis (pp. 86–99). louisville, ky: createspace independent publishing platform. powell, p.j. (2010). the messiness of readiness. phi delta kappan, 92(3), 26–28. 
https://doi.org/10.1177/003172171009200307 raikes, a., dua, t. & britto, r. measuring early childhood development: priorities post 2015. in early childhood matters, june 2014/124, 74–78. bernard van leer foundation. netherlands. rimm kaufman, s., & sandilos, l. (2017). school transition and school readiness: an outcome of early childhood development. updated july 2017. encyclopedia on early childhood development [online]. cceecd, skc-ecd. retrieved from https://www.child-encyclopedia.com/sites/default/files/dossiers-complets/en/school-readiness.pdf. rimm-kaufmann, s.e., pianta, r.c., & cox, m.j. (2000). teachers’ judgments of problems in the transition to kindergarten. early childhood research quarterly, 15(2), 147–166. https://doi.org/10.1016/s0885-2006(00)00049-1 roodt, g., stroud, l., foxcroft, c., & elkonin, d. (2013). the use of assessment measures in various applied context. in c foxcroft & g roodt (eds), introduction to psychological assessment in the south african context (4th ed, pp 240–249). cape town: oxford university press. ştefan, c.a., bălaj, a., porumb, m., albu, m., & miclea, m. (2009). preschool screening for social and emotional competencies – development and psychometric properties. cognition, brain, behavior. an interdisciplinary journal, 13(2), 121–146. taber, k.s. (2018). the use of cronbach’s alpha when developing and reporting research instruments in science education. research in science education, 48(6), 1273–1296. https://doi.org/10.1007/s11165-016-9602-2. van rooyen, a.e., & engelbrecht, p. (1997). die effektiwiteit van enkele skoolgereedheidstoetse vir die voorspelling van skolastiese prestasie by die skool beginner. south african journal of education, 17(1), 7–10. velicer, w.f., eaton, c.a., & fava, j.l. (2000). construct explication through factor or component analysis: a review and evaluation of alternative procedures for determining the number of factors or components. in r.d. goffin & e. 
helmes (eds.), problems and solutions in human assessment boston (pp. 41–71). new york: kluwer. williams, b., onsman, a., & brown, t. (2010). exploratory factor analysis: a five-step guide for novices. australasian journal of paramedicine, 8(3). https://doi.org/10.33151/ajp.8.3.93 yunus, k. r. m., & dahlan, n. a. (2013). child-rearing practices and socio-economic status: possible implications for children’s educational outcomes. procedia social and behavioral sciences, (90), 251–259. https://doi.org/10.1016/j.sbspro.2013.07.089. about the author(s) tyrone b. pretorius department of psychology, faculty of community and health sciences, university of the western cape, cape town, south africa anita padmanabhanunni department of psychology, faculty of community and health sciences, university of the western cape, cape town, south africa citation pretorius, t.b., & padmanabhanunni, a. (2020). beyond factor analysis: insights into the dimensionality of the fortitude questionnaire through bifactor statistical analysis. african journal of psychological assessment, 2(0), a30. https://doi.org/10.4102/ajopa.v2i0.30 original research beyond factor analysis: insights into the dimensionality of the fortitude questionnaire through bifactor statistical analysis tyrone b. pretorius, anita padmanabhanunni received: 09 june 2020; accepted: 11 sept. 2020; published: 20 oct. 2020 copyright: © 2020. the author(s). licensee: aosis. this is an open access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. abstract this study applied confirmatory factor analyses to explore the factor structure of the fortitude questionnaire (forq) in three samples: adolescents, students and lay counsellors.
for the student and lay counsellor samples, the analysis demonstrated that a bifactor and a three-factor second-order model of the forq were a much better fit than a one-factor model, whilst in the adolescent sample, there was no discernible difference between the three models. ancillary bifactor analysis was also conducted to examine the dimensionality of the forq. the bifactor measures confirmed that the forq is not unidimensional, but rather multidimensional for the student and lay counsellor samples. for the adolescent sample, there are some concerns as the general factor accounted for 77% of the variance, whilst the subscales accounted for only 23% of the variance. furthermore, in the standardised solution for the adolescent sample, only the factor loadings for the total scale were significant. in addition, the model-based estimates of reliability were low for the self-appraisal and support-appraisal subscales in the adolescent sample. this finding indicates that the forq was essentially unidimensional in the adolescent sample. these results suggest that for young adult and adult samples, the forq may be utilised as a total scale and three subscales, whilst caution needs to be applied in using the forq subscales with child and adolescent samples. however, further research that replicates this finding in adolescents and other samples is needed before a definite conclusion about the suitability of the forq in different age groups can be reached. keywords: forq; dimensionality; factor structure; fortitude; bifactor. introduction adversity is part of the human experience. for some people, the experience of adverse life events is associated with negative psychological outcomes, including depression, anxiety and post-traumatic stress disorder. for others, adversity leads to growth, adaptive functioning and effective coping.
to understand this heterogeneity in responses to life stressors, pretorius (1998) proposed the construct of fortitude, which is defined as the psychological strength to manage stress and stay well. fortitude arises from three inter-related positive or fortigenic appraisals of self, family and external sources of support. self-appraisals include both a global positive evaluation of oneself as well as more specific positive appraisals of one’s competence and capacities to manage stressors. family appraisals entail an evaluation of both the general family environment and family unit as cohesive, facilitative of emotional expression and responsive and accessible in times of stress. support appraisals include an evaluation of the availability, accessibility and value of support from others (e.g. friends). from this theoretical perspective, pretorius (1998) developed the fortitude questionnaire (forq), which measures the psychological strength associated with managing adversity. since its development, the forq has been extensively used in south africa (e.g., geldenhuys & van schalkwyk, 2019; padmanabhanunni, 2020), as well as in several other countries, such as nigeria (adejuwon, aderogba, & adekeye, 2015), canada (beattie, stewart, & walker, 2016), indonesia (yuwanto & atmadji, 2017) and the united arab emirates (hameed, khan, shahab, hameed, & qadeer, 2016). the forq has been applied to different populations, such as healthcare workers (adejuwon et al., 2015), university students (beattie et al., 2016), adolescents exposed to traumatic events (pretorius, padmanabhanunni, & campbell, 2016), nurses caring for patients with alzheimer’s disease (heyns, venter, esterhuyse, bam, & odendaal, 2003) and lay trauma counsellors (padmanabhanunni, 2020). 
it has also been used for a variety of purposes, including to investigate the role of fortitude in mental health-seeking behaviour (beattie et al., 2016) and in psychological outcomes following exposure to secondary trauma (padmanabhanunni, 2020), to evaluate a resiliency programme for children (de villiers & van den bergh, 2012), to establish levels of wellbeing amongst adolescents in high-risk communities (geldenhuys & van schalkwyk, 2019) and to assess the effect of programmes intended to enhance psychosocial wellbeing (van schalkwyk & wissing, 2013). the forq has generally demonstrated sound internal consistency (cronbach’s alpha) in previous studies, with a few exceptions mostly related to the self-appraisal subscale (laureano, grobbelaar, & nienaber, 2014; talbot, 2012). both exploratory (yuwanto & atmadji, 2017) and cfa (wissing, du toit, & michael temane, 2008) have provided support for the conceptualisation of the forq as consisting of a total scale and three subscales. despite this conceptualisation of the forq, the questionnaire has been used in a variety of ways in terms of structure, namely as a total scale (geldenhuys & van schalkwyk, 2019), as a total scale with subscales (padmanabhanunni, 2020), as subscales only (talbot, 2012) and as selected subscales (peters, 2005). however, in the original conceptualisation (pretorius, 1998), fortitude was conceived as arising from the interaction of three domains: self-appraisals, family-appraisals and support-appraisals. it is therefore questionable whether the use of only subscales or selected subscales reflects the original conceptualisation of fortitude. with respect to scales that are presumed to consist of a total scale and several subscales, it is important to examine whether the subscales account for a sufficient amount of the variance amongst the items to be regarded as independent scores. 
if the subscale scores have very little specific reliable variance, with most of the variance being shared with other subscales, they cannot be regarded as independent scores. for example, exploratory factor analysis (efa) could result in a three-factor solution, and confirmatory factor analysis (cfa) could confirm that such a three-factor model best fits the data. however, the existence of such factors does not address the ‘essential’ unidimensionality or multidimensionality of the instrument (raykov & pohl, 2013). if a scale is multidimensional, it means that (1) a general factor accounts for some of the variance between items and (2) beyond the variance accounted for by the general factor, sufficient variance remains that is accounted for by the subscales. in an essentially unidimensional scale, a general latent variable accounts for the majority of variance, with only a small proportion of variance accounted for by the subscales (rodriguez, reise, & haviland, 2016a). in this instance, where the general factor accounts for almost all of the variance, the subscale scores should not be interpreted as independent scores. the importance of underscoring the difference between factor structure and dimensionality was highlighted in a study that applied bifactor statistical indices to 50 published studies of different questionnaires (rodriguez, reise, & haviland, 2016b). the researchers concluded that, although all of the measures in the 50 studies had been described as multidimensional, the bifactor indices indicated that the variance in all of these measures was overwhelmingly accounted for by a single latent variable. the aim of this study is to provide evidence of the dimensionality of the forq in three samples through the use of cfa and bifactor statistical indices.
no published work to date has investigated the dimensionality of the forq, that is, the extent to which sufficient variance is accounted for by the subscales after isolating the variance attributable to the general scale. this kind of information can support or serve as a caution in the use of the total scale and/or subscales and highlight issues that need to be considered when the scale is used amongst specific population groups. method participants sample 1: padmanabhanunni (2020) investigated the role of fortitude in professional quality of life amongst lay trauma counsellors (n = 146) in the western cape province of south africa. the study employed a cross-sectional survey design and convenience sampling. the participants were lay trauma counsellors who worked for non-governmental organisations providing services in disadvantaged community contexts. the majority of participants were women (76.9%), and the mean age was 44 years. sample 2: pretorius et al. (2016) investigated the role of fortitude in affecting psychological outcomes after exposure to traumatic events amongst adolescents in two low-income communities in the western cape province of south africa. the participants were adolescents (n = 498) in grades 8–12. the majority of participants were women (51.2%) and afrikaans speaking (70.2%), and the mean age was 15.1 years. sample 3: the third data set is unpublished to date and focuses on fortitude in relation to psychological well-being. the associated study was conducted amongst undergraduate students (n = 454) at a university in the western cape province of south africa. the majority of participants were women (71.4%), and the mean age was 25.1 years. procedure study 1: participants received information on the nature and aims of the study and a request to participate in the study. consenting participants were provided with the questionnaires electronically or in person. the response rate was 58%. 
study 2: self-report measures were administered over a 2-week period. after the researchers determined the language preference of participants, the questionnaires were administered in english or afrikaans. when the use of afrikaans was appropriate, the forq was translated into afrikaans and back-translated into english. participants were provided with information regarding the nature and aims of the study, as well as the content and completion requirements of the questionnaire. study 3: participants received information regarding the nature and aims of the study during their regular classes. those interested in participating were provided with self-report measures that were completed anonymously. measures all three studies used the forq. the forq is a 20-item questionnaire that uses a four-point scale ranging from ‘does not apply’ to ‘applies very strongly’. the scale measures three domains of fortitude: self-appraisals, family-appraisals and support-appraisals. the sum of the three domains represents the individual’s level of fortitude. in a validation study, pretorius (1998) reported coefficient alphas of between 0.74 and 0.82 for the subscales and a coefficient of 0.85 for the full scale. other south african studies have reported reliability coefficients of between 0.77 and 0.88 (heyns et al., 2003; wissing et al., 2008). the forq is also correlated with measures of psychological distress and measures of self-appraisal (i.e. self-esteem), social support and the family environment (pretorius, 1998). in addition to the forq, the participants in the three studies completed the measures indicated below. study 1: participants completed two self-report measures: the life events checklist-5 (weathers et al., 2013) and the professional quality of life scale (stamm, 2005). study 2: participants completed the harvard trauma questionnaire (mollica et al., 1992). 
study 3: self-report measures included the satisfaction with life scale (diener, emmons, larsen, & griffin, 1985) and the positive and negative affect schedule (watson, clark, & tellegen, 1988). data analysis confirmatory factor analysis was used to test three conceptualisations of the factor structure in the three samples. in cfa, the items of the scale are regarded as the observed measurements, whilst the hypothesised factors are regarded as the latent variables represented by the items (bentler, 1995). the three conceptualisations of the factor structure of the forq that were examined were a one-factor model (representing a total fortitude score), a three-factor second-order model and a bifactor model. the bifactor model hypothesised that the forq consists of a single general factor with the three subscales as orthogonal factors reflecting the variance amongst clusters of items (mansolf & reise, 2017). more specifically, each specific factor captures the reliable variance that remains after removing the variance attributable to the general factor. in addition, using the bifactor indices calculator (dueber, 2017), ancillary bifactor measures were calculated to clarify the dimensionality of the forq. these measures include (1) explained common variance (ecv), which is the proportion of common variance explained by a given factor (general or specific); (2) omega, which is a model-based estimate of reliability (omegas for subscales); and (3) omega hierarchical (omegah), which indicates the proportion of systematic variance in total scores that can be attributed to individual differences on the general factor. in general, a high omegah (> 0.80) is an indication that the scale is essentially unidimensional. for subscales, omegahs represents the proportion of reliable systematic variance of a subscale score after excluding the variability attributed to the general factor (rodriguez et al., 2016a).
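the bifactor measures described above can be computed directly from a standardised bifactor solution. the python sketch below follows the commonly used formulas summarised by rodriguez et al. (2016a) and assumes orthogonal factors, standardised loadings and items that each load on exactly one specific factor. the function name `bifactor_indices`, the input layout and the toy loadings are illustrative assumptions; they are not the forq loadings, and this is not the code of the bifactor indices calculator used in the study.

```python
def bifactor_indices(general, specific):
    """Compute ECV, omega, omegaH and per-subscale omegaS / omegaHS.

    general  -- standardised general-factor loading for each item
    specific -- dict: subscale name -> {item index: specific-factor loading}
    """
    n_items = len(general)
    # Specific loading of each item (0.0 if the item has none)
    spec_of_item = [0.0] * n_items
    for loadings in specific.values():
        for i, lam in loadings.items():
            spec_of_item[i] = lam
    # Unique (error) variance of each item under orthogonal factors
    unique = [1 - g ** 2 - s ** 2 for g, s in zip(general, spec_of_item)]

    # ECV of the general factor: share of common variance it explains
    gen_ss = sum(g ** 2 for g in general)
    spec_ss = sum(s ** 2 for s in spec_of_item)
    ecv = gen_ss / (gen_ss + spec_ss)

    # Squared sums of loadings give the variance each factor adds to the total score
    gen_sq = sum(general) ** 2
    spec_sq = sum(sum(l.values()) ** 2 for l in specific.values())
    total_var = gen_sq + spec_sq + sum(unique)

    result = {
        "ecv": ecv,
        "omega": (gen_sq + spec_sq) / total_var,
        "omega_h": gen_sq / total_var,
        "subscales": {},
    }
    for name, loadings in specific.items():
        items = list(loadings)
        g_sq = sum(general[i] for i in items) ** 2
        s_sq = sum(loadings.values()) ** 2
        sub_var = g_sq + s_sq + sum(unique[i] for i in items)
        result["subscales"][name] = {
            "omega_s": (g_sq + s_sq) / sub_var,
            "omega_hs": s_sq / sub_var,
        }
    return result


# Hypothetical toy solution: four items, two specific factors (not forq data)
example = bifactor_indices(
    [0.6, 0.6, 0.6, 0.6],
    {"s1": {0: 0.4, 1: 0.4}, "s2": {2: 0.4, 3: 0.4}},
)
```

in this layout, a large `omega_h` together with small `omega_hs` values would point towards an essentially unidimensional scale, whereas sizeable `omega_hs` values suggest that the subscale scores carry reliable variance of their own.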
in cfa, the extent to which the hypothesised model fits the observed data is measured by the chi-square statistic (χ2), which tests the null hypothesis of a perfect fit. jöreskog, olsson and wallentin (2016), however, pointed out that the χ2 test has too much power in large samples and is very sensitive to violations of distributional assumptions. kline (2005) suggested that, in addition to the model χ2, at a minimum, the following indices should be reported: the root mean square error of approximation (rmsea: best if close to 0.08 or less), comparative fit index (cfi: best if close to 0.90 or greater) and standardised root mean square residual (srmr: best if close to 0.08 or less). additional indices include the goodness-of-fit index (gfi: best if close to 0.95 or greater) and tucker–lewis index (tli: best if close to 0.95 or greater; byrne, 1994; hu & bentler, 1999). arbuckle (2012) also proposed the inclusion of fit indices such as akaike’s information criterion (aic), which is used specifically for model comparisons. lower aic values are generally associated with a better model fit. with the exception of the bifactor indices, all analyses were performed using ibm spss amos (version 26; ibm corp., armonk, ny, usa). ethical consideration study 1: ethical approval for the study was provided by the humanities and social sciences research committee of the university of the western cape. assent was obtained from non-governmental organisation (ngo) directors to contact lay counsellors, and each participant completed an informed consent form. the questionnaires contained no identifying information. study 2: ethical approval for the study was provided by the university of the free state. the parents of the participants provided consent, and all questionnaires were completed anonymously. the nature and aims of the research were described to each class, and confidentiality was assured. 
study 3: ethical approval for the study was obtained from the research ethics committee of the university of the western cape. participants completed informed consent forms, and questionnaires were completed anonymously. results the three models that were tested with cfa in the three samples are presented in figures 1 and 2. figure 1: one-factor and bifactor models of the factor structure of fortitude questionnaire. rectangles are the observed measurements (items), and ellipses are latent variables. figure 2: three-factor higher order model of the factor structure of the fortitude questionnaire. rectangles are the observed measurements (items), and ellipses are latent variables. the one-factor model presumes that a single factor (fortitude) best explains the variance amongst the items, whilst the three-factor second-order model presumes that a second-order factor (total scale) best accounts for the variance amongst the first-order factors (subscales). the bifactor model, in contrast, presumes that a single general factor (fortitude) explains some of the variance, whilst three specific factors (subscales) account for the remainder of the variance. the fit indices for the three models in the three samples are reported in table 1. table 1: fit indices for two models of the structure of the fortitude questionnaire in three samples. as detailed in table 1, there was almost no difference in the fit indices of the three models in the adolescent sample. whilst the model comparison index (aic) was lower for the bifactor model (456.54 < 476.45 and 481.13), indicating a marginally better fit than the one-factor and three-factor second-order models, the other indices, including the rmsea (0.05) and srmr (0.06), indicated that all three models fit the data to an acceptable degree. in the student and lay counsellor samples, however, the bifactor model demonstrated a better fit than the one-factor model and marginally better than the three-factor second-order model. 
the model comparison index was much lower for the bifactor model in both samples (418.55 < 1132.58 and 464.58, as well as 370.64 < 597.80 and 413.33), and the cfi (0.94 for student sample), rmsea (0.05 and 0.07) and srmr (0.04 and 0.04) met the criteria for acceptable fit for the bifactor model. the cfi, tli and gfi in the case of the lay counsellor sample were not interpreted because, as a rule of thumb, incremental fit measures such as these are not very informative when the rmsea null model is below 0.158 (kenny, 2020). because of the small sample size of the lay counsellor sample, the rmsea null model in this instance was below 0.158 for all three models. the one-factor model failed to meet any of the criteria indicating a good fit in both samples, with the exception of the srmr (0.06) in the case of lay counsellors. the standardised solution for the three samples with respect to the cfa is reported in table 2. table 2: standardised solution for the fortitude questionnaire in the three samples. in the adolescent sample, all the loadings on the general factor (fortitude) were significant. the loadings on the specific factors (subscales), however, were all non-significant, with the exception of item 1 of the self-appraisal subscale. in the student sample, the loadings for the general factor as well as the specific factors were all significant. the same applied to the lay counsellor sample, except for item 6 of the self-appraisal subscale and item 2 of the support-appraisal subscale. despite the evidence provided by the cfa in relation to the bifactor structure of the forq, the cfa did not address the dimensionality of the questionnaire. more specifically, the cfa did not clarify the relative proportion of variance accounted for by the total scale and the subscales. for this reason, some authors have called for the use of bifactor indices to examine dimensionality (rodriguez et al., 2016a). 
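as an illustrative aside, the bifactor indices referred to above (ecv, omega, omegah, omegas and omegahs) can be computed directly from a standardised bifactor solution using the formulas described by rodriguez et al. (2016a). the python sketch below is not the procedure used in this study; the function name, the input layout and the loadings in the usage example are hypothetical, and the code assumes orthogonal factors with every item loading on the general factor and on exactly one specific factor.

```python
def bifactor_indices(items):
    """Compute bifactor indices from standardised loadings.

    items: list of (general_loading, subscale_name, specific_loading).
    Assumes orthogonal factors, so each item's unique variance is
    1 - general^2 - specific^2 (an assumption of this sketch).
    """
    for g, _, s in items:
        if g * g + s * s > 1.0:
            raise ValueError("communality above 1; loadings must be standardised")

    # Explained common variance: share of common variance due to the general factor.
    gen_sq = sum(g * g for g, _, _ in items)
    spec_sq = sum(s * s for _, _, s in items)
    ecv = gen_sq / (gen_sq + spec_sq)

    names = sorted({name for _, name, _ in items})
    gen_part = sum(g for g, _, _ in items) ** 2
    spec_parts = {n: sum(s for _, m, s in items if m == n) ** 2 for n in names}
    unique = sum(1.0 - g * g - s * s for g, _, s in items)
    total_var = gen_part + sum(spec_parts.values()) + unique

    result = {
        "ecv": ecv,
        # Omega: reliable variance in total scores from all common factors.
        "omega": (gen_part + sum(spec_parts.values())) / total_var,
        # OmegaH: variance in total scores attributable to the general factor only.
        "omega_h": gen_part / total_var,
        "subscales": {},
    }

    for n in names:
        sub = [(g, s) for g, m, s in items if m == n]
        g_part = sum(g for g, _ in sub) ** 2
        s_part = sum(s for _, s in sub) ** 2
        u_part = sum(1.0 - g * g - s * s for g, s in sub)
        denom = g_part + s_part + u_part
        result["subscales"][n] = {
            # OmegaS: reliability of the subscale score (general + specific).
            "omega_s": (g_part + s_part) / denom,
            # OmegaHS: reliable subscale variance left after removing the general factor.
            "omega_hs": s_part / denom,
        }
    return result


if __name__ == "__main__":
    # Hypothetical loadings, for illustration only (not the forq estimates).
    demo = [
        (0.70, "self", 0.20), (0.65, "self", 0.25), (0.60, "self", 0.15),
        (0.55, "family", 0.45), (0.60, "family", 0.50), (0.50, "family", 0.40),
    ]
    print(bifactor_indices(demo))
```

read against the text, a high ecv and omegah together with low omegahs values would point towards essential unidimensionality, whereas substantial omegahs values would support the interpretation of subscale scores.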
these indices for the forq across the three samples are reported in table 3. table 3: dimensionality indices for the fortitude questionnaire. explained common variance is the proportion of all common variance for all items explained by a factor. in the case of the student and lay counsellor samples, table 3 indicates that the general factor (fortitude) explained 44% and 57%, respectively, of the common variance. the specific factors (self-appraisal, support-appraisals and family-appraisals), therefore, explained 56% and 43% of the variance in the student and lay counsellor samples, respectively. this result confirms the multidimensionality of the forq for these samples, as the specific factors accounted for sufficient variance after the variance attributable to the general factor was taken into consideration. the omega/omegas coefficient, which is a model-based estimate of reliability, further confirmed that the self-appraisal (omegas: students = 0.82; lay counsellors = 0.74), support-appraisal (omegas: students = 0.78; lay counsellors = 0.85) and family-appraisal (omegas: students = 0.77; lay counsellors = 0.89) subscales demonstrated sufficient reliability. it also confirmed the reliability of the general factor (omega = 0.88 and 0.91). omegahs, which is the reliable variance of the subscales after removing the variance attributable to the general factor, was reasonable in both the lay counsellor (self-appraisals = 0.54, support-appraisals = 0.10, family-appraisals = 0.43) and student sample (self-appraisals = 0.45, support-appraisals = 0.53, family-appraisals = 0.30). additionally, the results of the bifactor analysis with respect to the adolescent sample suggest that the forq is essentially unidimensional in this sample. first, the general factor explained 77% of the common variance, whilst the specific factors explained only 23% of the variance. 
second, the model-based indicator of reliability, omegas, reflected low levels of reliability for the self-appraisal (0.42) and support-appraisal (0.52) subscales. third, omegah, which reflects the percentage of variance in total scores attributable to the general factor, was very close to the cut-off point suggested in the literature. reise, bonifay and haviland (2013) proposed that when omegah is greater than 0.80, the scale can be considered essentially unidimensional. lastly, omegahs, which indicates the percentage of reliable variance of the subscales after considering variability because of the general factor, was extremely low in the three subscales (self-appraisals = 0.05; support-appraisals = 0.01; family-appraisals = 0.04). discussion given the different structures in which the forq has been used in published research, this article sought to investigate, through cfa and bifactor statistical indices, the factor structure and dimensionality of the forq in three samples: adolescents, students and adults. the cfa demonstrated that, in the student and lay counsellor samples, the bifactor model is a better fit compared to the one-factor model and only marginally a better fit than the three-factor second-order model. however, both the bifactor model and the second-order model confirmed the conceptualisation of the forq as consisting of a total scale and three subscales in these two samples. for the adolescent sample, there were no significant differences in indices of fit amongst the one-factor, the second-order model and bifactor models. the standardised solution resulting from cfa suggested that only the general factor was meaningful in the adolescent sample, as only the loadings for the total scale were significant, whilst the loadings on the specific factors were non-significant. 
in the case of the student and lay counsellor samples, the loadings on both the general and specific factors were significant, with the exception of two items loading on the self-appraisal and support-appraisal subscales in the case of the lay counsellor sample. the bifactor statistical indices suggest that the forq was multidimensional in both the student and lay counsellor samples. the ecv indicates that the general factor accounted for only 44% and 57% of the common variance amongst items in the student and lay counsellor samples, respectively, whilst 56% and 43% of the variance was accounted for by the subscales. the factor loadings confirm this result, as all the loadings, with two exceptions, were significant. for the adolescent sample, the non-significant factor loadings and the bifactor statistical indices suggest that the forq in this sample is essentially unidimensional, as the general factor explained 77% of the common variance, whilst the specific factors explained only 23%. the model-based estimates of reliability were also very low for two of the subscales in this sample. it therefore appears that whilst the total scale (i.e., fortitude) is meaningful in this sample, the use of subscales may not be justified. this finding may be ascribed to the ongoing process of self-concept clarification that takes place during the adolescent phase of development and ultimately leads to the consolidation of appraisals of self, family and significant others. self-concept clarity refers to the extent to which beliefs about the self and others are clearly defined, internally consistent and stable across time (crocetti, rubini, branje, koot, & meeus, 2016). a core task in adolescence is the acquisition of an enduring self-concept, which occurs through interactions with peers, parents and significant others (laursen & hartl, 2013). 
during adolescence, more time is spent with peers than family members; therefore, the reference group for social experiences and social support shifts towards the peer group. indeed, peer acceptance has a significant impact on self-esteem and related appraisals of competence and worth (crocetti et al., 2016). the adolescent phase is also associated with increased conflict in the parent–child relationship, possibly owing to the adolescent’s increasing need for independence and autonomy. this conflict can lead to physical and emotional distancing from parents and other family members and can influence appraisals of the family environment and the family as a potential source of support (moed et al., 2015). the search for identity can lead to adolescents seeking out new experiences, taking on different roles and forming new relationships. some of these relationships may be dissolved within a short period of time owing to new interests (laursen & hartl, 2013). social perspective-taking abilities also increase, and adolescents come to better understand the extent to which others can be relied upon in times of need (raufelder, sahabandu, martínez, & escobar, 2015). as cognitive maturation is a gradual process, the adolescents in the study may have been in the process of consolidating their appraisals of self and others, which may account for the unidimensional nature of the forq in relation to the adolescent sample. taken together, the cfa and the bifactor analyses seem to suggest that caution needs to be applied in using the forq subscales with children and adolescents. however, further research that replicates this finding in adolescents and other samples is needed before a definite conclusion about the suitability of the forq in different age groups can be reached. finally, the article demonstrates the importance of going beyond overall model fit statistics, especially with regard to bifactor models, and implementing bifactor analyses. bornovalova et al. 
(2020) cautioned against the overreliance on overall model fit indices and highlighted the problem of overfitting in the case of the bifactor model. they argued that overall fit statistics favour the bifactor model over other models, and because of the flexibility of this model, the bifactor model ‘can exhibit good global fit even if the pattern of loadings does not resemble a bifactor structure in any meaningful sense’ (p. 2). limitations the small sample size with respect to the lay counsellors should be considered a limitation, and it has affected the interpretation of incremental fit indices. whilst it is possible that the forq is multidimensional for older age groups, this might just be a sample-specific result with a sample size of n = 146, and as pointed out earlier, this finding would need to be replicated several times before a definite conclusion can be made. similarly, there is also evidence in younger population groups that most of the variance might be attributable to the general factor, and this needs to be replicated as the results for the different language groups were pooled. however, whilst age and language were offered as potentially explaining the unidimensionality of the scale in adolescents, there are many other potentially confounding variables that could have played a role. for example, the adolescents were drawn from a marginalised and extremely disadvantaged community; thus, socio-economic status might also have been a confounding variable. conclusion this article confirms the multidimensionality of the forq and provides support for its use as a scale consisting of a general factor and three specific factors for young adults and adults. the results indicate, however, that for adolescents the forq is essentially unidimensional, which suggests that only the general factor should be used. however, further research is called for to replicate these tentative findings. 
the article also highlights the importance of going beyond overall model fit statistics, especially for bifactor models, and of implementing bifactor analyses to draw conclusions about the unidimensionality or multidimensionality of scales. acknowledgements competing interests the authors have declared that no competing interests exist. authors’ contributions both authors contributed equally to this work. funding information this research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors. data availability statement data sharing is not applicable to this article as no new data were created or analysed in this study. disclaimer the views and opinions expressed in this article are those of the authors and do not necessarily reflect the official policy or position of any affiliated agency of the authors. references adejuwon, g.a., aderogba, a., & adekeye, o.a. (2015). health workers’ commitment in delta state: influence of personality and workplace experiences. mediterranean journal of social sciences, 6(4), 258–258. https://doi.org/10.5901/mjss.2015.v6n4s2p258 arbuckle, j.l. (2012). amos 21.0 user’s guide. chicago, il: spss inc. beattie, b.e., stewart, d.w., & walker, j.r. (2016). a moderator analysis of the relationship between mental health help-seeking attitudes and behaviours among young adults. canadian journal of counselling and psychotherapy, 50(3), 290–314. retrieved from https://cjc-rcc.ucalgary.ca/article/view/61119 bentler, p.m. (1995). eqs: structural equations program manual. encino, ca: multivariate software. bornovalova, m.a., choate, a.m., fatimah, h., petersen, k.j., & wiernik, b.m. (2020). appropriate use of bifactor analysis in psychopathology research: appreciating benefits and limitations. biological psychiatry, 88(1), 18–27. https://doi.org/10.1016/j.biopsych.2020.01.013 byrne, b.m. (1994). 
testing for the factorial validity, replication, and invariance of a measuring instrument: a paradigmatic application based on the maslach burnout inventory. multivariate behavioral research, 29(3), 289–311. https://doi.org/10.1207/s15327906mbr2903_5 crocetti, e., rubini, m., branje, s., koot, h.m., & meeus, w. (2016). self-concept clarity in adolescents and parents: a six-wave longitudinal and multi-informant study on development and intergenerational transmission. journal of personality, 84(5), 580–593. https://doi.org/10.1111/jopy.12181 de villiers, m., & van den berg, h. (2012). the implementation and evaluation of a resiliency programme for children. south african journal of psychology, 42(1), 93–102. https://doi.org/10.1177/008124631204200110 diener, e.d., emmons, r.a., larsen, r.j., & griffin, s. (1985). the satisfaction with life scale. journal of personality assessment, 49(1), 71–75. https://doi.org/10.1207/s15327752jpa4901_13 dueber, d.m. (2017). bifactor indices calculator: a microsoft excel-based tool to calculate various indices relevant to bifactor cfa models. retrieved from http://sites.education.uky.edu/apslab/resources/ geldenhuys, o., & van schalkwyk, i. (2019). investigating the relational well-being of a group of adolescents in a south african high-risk community. psychology and behavioral science international journal, 12(2). retrieved from https://www.semanticscholar.org/paper/relational-well-being-of-a-group-of-adolescents-in-geldenhuys/66da2791de1955ba1622d140007d79447d03afa8 hameed, i., khan, m.b., shahab, a., hameed, i., & qadeer, f. (2016). science, technology and innovation through entrepreneurship education in the united arab emirates (uae). sustainability, 8(12), 1280. https://doi.org/10.3390/su8121280 heyns, p.m., venter, j.h., esterhuyse, k.g., bam, r.h., & odendaal, d.c. (2003). nurses caring for patients with alzheimer’s disease: their strengths and risk of burnout. south african journal of psychology, 33(2), 80–85. 
https://doi.org/10.1177/008124630303300202 hu, l.t., & bentler, p.m. (1999). cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. structural equation modeling: a multidisciplinary journal, 6(1), 1–55. https://doi.org/10.1080/10705519909540118 jöreskog, k.g., olsson, u.h., & wallentin, f.y. (2016). confirmatory factor analysis (cfa). in springer series in statistics, 283–339. https://doi.org/10.1007/978-3-319-33153-9 kenny, d.a. (2020). measuring model fit. retrieved from http://www.davidakenny.net/cm/fit.htm kline, r.b. (2005). principles and practice of structural equation modelling. 2nd edn. new york: guilford. laureano, c., grobbelaar, h.w., & nienaber, a.w. (2014). facilitating the coping self-efficacy and psychological well-being of student rugby players. south african journal of psychology, 44(4), 483–497. https://doi.org/10.1177/0081246314541635 laursen, b., & hartl, a.c. (2013). understanding loneliness during adolescence: developmental changes that increase the risk of perceived social isolation. journal of adolescence, 36(6), 1261–1268. https://doi.org/10.1016/j.adolescence.2013.06.003 mansolf, m., & reise, s.p. (2017). when and why the second-order and bifactor models are distinguishable. intelligence, 61(1), 120–129. https://doi.org/10.1016/j.intell.2017.01.012 moed, a., gershoff, e.t., eisenberg, n., hofer, c., losoya, s., spinrad, t.l., & liew, j. (2015). parent–adolescent conflict as sequences of reciprocal negative emotion: links with conflict resolution and adolescents’ behavior problems. journal of youth and adolescence, 44(8), 1607–1622. https://doi.org/10.1007/s10964-014-0209-5 mollica, r.f., capi-yavin, y., bollini, p., truong, t., tor, s., & lavelle, j. (1992). validating a cross-cultural instrument for measuring torture, trauma and post-traumatic stress disorder in indochinese refugees. journal of nervous and mental disease, 180(2), 111–116. 
https://doi.org/10.1097/00005053-199202000-00008 padmanabhanunni, a. (2020). caring does not always cost: the role of fortitude in the association between personal trauma exposure and professional quality of life among lay trauma counsellors. traumatology. advanced online publication. peters, e. (2005). neuropsychological executive functioning and psychosocial well-being. doctoral dissertation. potchefstroom: north-west university. retrieved from http://hdl.handle.net/10394/865 pretorius, t.b. (1998). fortitude as stress-resistance: development and validation of the fortitude questionnaire (forq). bellville: university of the western cape. retrieved from https://www.uwc.ac.za/rectorsoffice/pages/inferential_data_files.aspx pretorius, t.b., padmanabhanunni, a., & campbell, j. (2016). the role of fortitude in relation to exposure to violence among adolescents living in lower socio-economic areas in south africa. journal of child & adolescent mental health, 28(2), 153–162. https://doi.org/10.2989/17280583.2016.1200587 raufelder, d., sahabandu, d., martínez, g.s., & escobar, v. (2015). the mediating role of social relationships in the association of adolescents’ individual school self-concept and their school engagement, belonging and helplessness in school. educational psychology, 35(2), 137–157. https://doi.org/10.1080/01443410.2013.849327 raykov, t., & pohl, s. (2013). essential unidimensionality examination for multicomponent scales: an interrelationship decomposition approach. educational and psychological measurement, 73(4), 581–600. https://doi.org/10.1177/0013164412470451 reise, s.p., bonifay, w.e., & haviland, m.g. (2013). scoring and modeling psychological measures in the presence of multidimensionality. journal of personality assessment, 95(2), 129–140. https://doi.org/10.1080/00223891.2012.725437 rodriguez, a., reise, s.p., & haviland, m.g. (2016a). evaluating bifactor models: calculating and interpreting statistical indices. psychological methods, 21(2), 137. 
https://doi.org/10.1037/met0000045 rodriguez, a., reise, s.p., & haviland, m.g. (2016b). applying bifactor statistical indices in the evaluation of psychological measures. journal of personality assessment, 98(3), 223–237. https://doi.org/10.1080/00223891.2015.1089249 stamm, b.h. (2005). the proqol manual: the professional quality of life scale: compassion satisfaction, burnout & compassion fatigue/secondary trauma scales. baltimore, md: sidran. talbot, b.d. (2012). the prediction of psychological well-being in children and adolescents with chronic, life threatening illnesses. doctoral dissertation. university of the free state, bloemfontein. retrieved from http://hdl.handle.net/11660/1545 van schalkwyk, i., & wissing, m.p. (2013). ‘evaluation of a programme to enhance flourishing in adolescents’. in m.p. wissing (ed.). well-being research in south africa (pp. 581–605). dordrecht: springer. watson, d., clark, l.a., & tellegen, a. (1988). development and validation of brief measures of positive and negative affect: the panas scales. journal of personality and social psychology, 54(6), 1063. https://doi.org/10.1037/0022-3514.54.6.1063 weathers, f.w., blake, d.d., schnurr, p.p., kaloupek, d.g., marx, b.p., & keane, t.m. (2013). the life events checklist for dsm-5 (lec-5). national center for ptsd, white river junction, vt. retrieved from http://www.ptsd.va.gov wissing, j.a.b., wissing, m.p., du toit, m.m., & michael temane, q.m. (2008). psychometric properties of various scales measuring psychological well-being in a south african context: the fort 1 project. journal of psychology in africa, 18(4), 511–520. https://doi.org/10.1080/14330237.2008.10820230 yuwanto, l., & atmadji, g. (2017). pengembangan fortitude questionnaire versi indonesia. jurnal ilmiah psikologi mind set, 8(1), 31–36. 
retrieved from http://journal.univpancasila.ac.id/index.php/mindset/article/view/321 about the author(s) charles h. van wijk division of health systems and public health, department of global health, faculty of medicine and health sciences, stellenbosch university, cape town, south africa department of psychology, institute for maritime medicine, simon’s town, south africa citation van wijk, c.h. (2021). usefulness of the english version of the stress overload scale in a sample of employed south africans. african journal of psychological assessment, 3(0), a41. https://doi.org/10.4102/ajopa.v3i0.41 original research usefulness of the english version of the stress overload scale in a sample of employed south africans charles h. van wijk received: 23 oct. 2020; accepted: 23 may 2021; published: 25 june 2021 copyright: © 2021. the author(s). licensee: aosis. this is an open access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. abstract amidst reports of high levels of stress in south africa, it remained difficult to quantify psychological stress in the absence of locally validated measures. this study explored the english version of the stress overload scale (sos) in a south african sample. the first aim was to replicate the basic psychometric analysis of the original english version used in american samples, as well as the setswana version used in a south african rural community setting. the second aim was to investigate criterion validity to determine its appropriateness for use in south africa. a total of 2136 employed south africans with at least 9 years of schooling participated in this study. participants completed a range of mental health and well-being measures, both clinical and dispositional. 
responses were analysed to examine both scale characteristics and validity indices related to the sos. little sociodemographic influence (age, gender and first language) was found, with analyses supporting validity across most indices. furthermore, good predictive ability for mental (ill-) health was observed. this study, for the most part, replicated previous validation findings of the sos. validity was further confirmed by correlating the scale with measures of clinical mental health and dispositional well-being. given the positive support to its validity, when used amongst employed south africans with at least 9 years of education, the scale holds promise for application in local health-related research, for triage in primary healthcare contexts and for measuring outcomes of mental health interventions in therapeutic settings. keywords: anxiety; depression; occupational medicine; stress; stress overload scale; south africa; workplace health. introduction headlines such as ‘south africa second most stressed country in world’ (city press, 2019) or ‘work stress cost sa r40 billion’ (business report, 2016) certainly have shock value and, whilst seemingly intended to attract readers’ curiosity, are also based on (some) truth. the issue of how stress is understood locally in south africa (sa) has been brought into sharp focus during the recent coronavirus 2019 (covid-19) pandemic, where individuals had to face challenges from finding adequate shelter and food to managing social isolation and anxiety about the future. against this backdrop, psychologists are often asked questions such as ‘what is stress?’ or ‘how do psychologists measure stress?’ there are many models on stress in existence today, ranging from engineering to biology to psychology. within the field of medical physiology, the term stress was first used by hans selye in 1936 to describe the non-specific response of the body to any demand for change. 
within the field of psychology, stress is typically described from a transactional perspective (lazarus & folkman, 1984), where individuals evaluate both the events or demands they are facing (called primary appraisal) and the resources available to them (called secondary appraisal). in the transactional definition, stress may occur when individuals appraise the demands of their environments as exceeding their personal resources (overload). when overload is experienced, well-being is affected negatively (lazarus & folkman, 1984). putting it differently, stress is the degree to which a person feels overwhelmed or unable to cope as a result of pressures that are perceived as unmanageable (mental health foundation, 2018). there are many questionnaires and inventories purporting to measure stress, with some focussing on environmental demands and others on personal resources, but few take a transactional approach that emphasises overload as a result of a perception of excessive demands and/or depletion of resources. in response to this shortcoming of previous scales, amirkhan (2012) developed the stress overload scale (sos). this scale has strong theoretical underpinnings and captures overload through two underlying factors, namely, impinging demands (termed ‘event load [el]’) and the depletion of resources (termed ‘personal vulnerability [pv]’) to handle those demands (amirkhan, 2012, 2018). the 30-item sos has been found to predict illness and cortisol responses, as well as sick days and workdays missed. it also distinguishes between stressed and non-stressed populations (amirkhan, 2012; amirkhan, urizar, & clark, 2015). amirkhan (2018) also developed the stress overload scale–short form (sos-s) for application in contexts where the long form may prove impractical. the original english version sos and sos-s have been extensively validated in different community and college samples in the united states of america (amirkhan, 2012, 2018; amirkhan et al., 2015). 
as context influences both the kind of stressors and the availability of resources in a community, a setswana version was developed and tested in a rural community sample in sa (wilson, wissing, & schutte, 2018). that study reported good psychometric support for the short form. rationale and aims although high levels of stress have been reported amongst working south africans, the multilingual nature of sa society and potentially divergent understandings of stress and mental well-being pose challenges to the use of globally available measures to identify stress and predict possible negative consequences locally. the english version sos has not been validated in sa, but if sound psychometric support could be found, it would then allow for the use of a single-language version with a larger part of the employed population. this study set out to explore the english version sos in a sample of employed south africans with at least 9 years of schooling (to enable meaningful completion of the english language scale). it aimed to do so in two ways. firstly, it aimed to replicate aspects of two previous projects, namely, the validation of the original english version used in american validation studies with community- and college-based samples (amirkhan, 2012, 2018) and the setswana version used in a south african rural community setting (wilson et al., 2018). this was performed in order to report on basic psychometric properties. secondly, it aimed to investigate criterion validity by exploring associations across a range of mental health and well-being measures. this was carried out in order to report on the practical value of the scale (for possible use in research or clinical practice). analysis was performed on both the long and short versions of the sos. this study aimed to contribute to the validation of the english version of the sos (and sos-s) for use in sa workplace populations. 
if a single language version can be used on a larger segment of the population, it may enable the identification of stress overload, which in turn may pose a risk for poor mental health and emotional distress. where this can be identified, it could facilitate the allocation of resources to support mental health and well-being in the workplace. methods participants participants were recruited through workplace occupational health programmes and invited to complete the sos when they were completing their regular health surveillance questionnaires. potential participants were briefed that completion of the forms would be interpreted as implied consent and that the results would not form part of their occupational health screening outcome. participants were given time at work to complete the sos and other measures, in group settings, whilst sitting at individual work stations. all participants had a minimum of 9 years of formal schooling. this was to ensure a level of english proficiency sufficient to complete the indicated range of standard mental health measures. participants also completed a range of other measures of mental health and well-being, but because of practical considerations (e.g. different protocols used at different sites), not all participants were involved in all aspects of measurement. the numbers of completed measures are shown in table 1. furthermore, participants also underwent a semi-structured interview with a clinical psychologist, who was blind to the questionnaire outcome and who allocated a binary category of the presence of ‘any mental health disorder’ at the completion of each interview. for participants who presented with signs or symptoms of poor mental health, referrals were arranged to an appropriate mental health service provider (e.g. psychologist or medical practitioner, depending on the need at the time). table 1: descriptive and correlational statistics for the sample. 
measures stress overload scale the sos comprises 30 items and is designed to measure ‘stress overload’ (amirkhan, 2012). a 5-point likert scale (1 = not at all and 5 = a lot) is used to indicate subjective feelings and thoughts experienced during the previous week. there are two factors underlying overload, namely, pv and el, which are measured by two distinct but correlated subscales (12 items each); there are also six filler items included to discourage negative response sets, which are not scored (amirkhan, 2012). the scales can be summed to obtain a continuous total score, with higher scores indicating higher levels of stress overload. alternatively, the subscales can be split at their means to form a four-category diagnostic matrix; those scoring in the high el–high pv category have been shown to be at the greatest risk for subsequent pathology (amirkhan, 2012). only the continuous scoring was used in the current study. the sos is unique in the sense that it was empirically constructed through a sequenced series of factor analytic and psychometric studies, using community samples matched to us census demographic proportions (amirkhan, 2012). this provides three advantages. firstly, it is psychometrically strong, especially in terms of validity; secondly, it is appropriate to community research because of its brevity and fit to a broad demographic spectrum and thirdly, it is unique in its ability to cross-section individuals into risk categories (amirkhan, 2012). the sos has excellent internal consistency (with cronbach’s α > 0.94 for both subscales and full scale) and good test-retest reliability (with coefficients averaging 0.75 over 1 week; amirkhan, 2012). construct validity has been demonstrated through significant correlations with other measures of stress and illness (amirkhan, 2012; amirkhan et al., 2015). criterion validity has been shown in the sos ability to predict illness following a stressful event. 
furthermore, sos scores have been found to significantly correlate with illness, sick days and workdays missed (amirkhan, 2012; amirkhan et al., 2015). psychometric support for the setswana version of the sos was less convincing, with an inconclusive confirmatory factor analysis (cfa) reported. exploratory factor analysis (efa) identified four factors, which were not theoretically interpretable (wilson et al., 2018). this study used the english version sos, in its standard administration, with two small modifications. extensive piloting showed that two 1-word items proved somewhat difficult (14 out of 316 cases), namely, items 3 and 4. both were modified by including a rider in parentheses after the word. for item 3, ‘like you’re not up to the task’ was added in parentheses and for item 4, ‘like you need to do too many things’ was added in parentheses. the 10-item sos-s (amirkhan, 2018) preserved many of the features of the full measure, including the two-subscale structure that permits both continuous and categorical scoring. it also maintains the full measure’s excellent internal consistency (α > 0.94) and good test-retest reliability (r = 0.75; amirkhan, 2018). construct validity was demonstrated by significant convergence with the similar length perceived stress scale (pss-10). criterion validity was shown in its associations with both concurrent and future signs of illness, both symptomatic and behavioural (amirkhan, 2018). because filler items are missing in the sos-s, it is more vulnerable to response biases; however, even with social desirability and negative affectivity influences controlled, it maintained significant associations with signs of illness (amirkhan, 2018). positive psychometric support for the setswana version of the sos-s was demonstrated previously: a cfa reported good model fit and evidence of concurrent validity was found through significant correlations with the phq-9 (wilson et al., 2018). 
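The two scoring options described for the sos and sos-s (a continuous total, or a four-category matrix formed by splitting each subscale at its mean) can be illustrated with a minimal sketch. It assumes the el and pv subscale scores have already been summed per respondent, and it treats a score strictly above the sample mean as "high"; that tie-breaking rule, the function name and the example values are assumptions made for illustration and are not taken from the scale manual.

```python
from statistics import mean


def score_sos(el_scores, pv_scores):
    """Return a (total, category) tuple for each respondent.

    el_scores, pv_scores: summed subscale scores, one entry per respondent.
    Assumption: 'high' means strictly above the sample mean of that subscale.
    """
    el_mean = mean(el_scores)
    pv_mean = mean(pv_scores)
    results = []
    for el, pv in zip(el_scores, pv_scores):
        el_level = "high" if el > el_mean else "low"
        pv_level = "high" if pv > pv_mean else "low"
        # Continuous score is the sum of the two subscales.
        results.append((el + pv, f"{el_level} EL-{pv_level} PV"))
    return results


# Hypothetical subscale scores for four respondents.
example = score_sos([10, 20, 30, 40], [40, 10, 30, 20])
```

Because the split is made at sample means, the categories are relative to the group being scored; the current study used only the continuous total.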
markers of mental health and mental well-being mood, anxiety and alcohol abuse disorders are the most common mental health conditions in sa (herman et al., 2009) and four measures were included to examine correlations between the sos and these clinical constructs. to examine correlations with non-clinical measures, some participants also completed two dispositional measures that are often related to general mental well-being. the patient health questionnaire for depression (phq-9) is a well-established 9-item screening, diagnostic and monitoring tool measuring the severity of depression (gilbody, richards, & barkham, 2007; kroenke, spitzer, & williams, 2001) and previous use in sa reported cronbach’s α > 0.70 (bhana, rathod, selohilwe, kathree, & petersen, 2015; wilson et al., 2018). strong correlations were previously found for the sos-s and phq-9 in a setswana sample (el: r = 0.42, pv: r = –0.47, p < 0.001 for both; wilson et al., 2018) and with the related center for epidemiological studies-depression (ces-d) in a us sample (full scale: r = 0.53, el: r = 0.46, pv: r = 0.52, p < 0.0001 for all; amirkhan, 2012). the generalized anxiety disorder questionnaire (gad-7) is a well-established 7-item screening, diagnostic and monitoring tool measuring the severity of generalised anxiety (löwe et al., 2008; spitzer, kroenke, williams, & lowe, 2006). it also has utility for detecting panic and social anxiety disorder (kroenke, spitzer, williams, monahan, & löwe, 2007). cronbach’s α > 0.90 has been reported (spitzer et al., 2006). no correlations between the sos and markers of generalised anxiety have been reported previously. the primary care post-traumatic stress disorder (ptsd) screen for diagnostic and statistical manual of mental disorders, fifth edition (dsm-5) (pc-ptsd-5) was developed as a brief 5-item screen for ptsd in primary care settings, with cronbach’s α > 0.90 and high diagnostic accuracy (bovin et al., 2021; prins et al., 2016).
this measure was included against the background of sa’s high reported prevalence of post-traumatic stress (williams et al., 2007). no correlations between the sos and markers of post-traumatic stress have been reported previously. the 4-item cage questionnaire (ewing, 1984) has been developed for use in identifying problematic alcohol use, with high sensitivity and specificity reported (dhalla & kopec, 2007; o’brien, 2008; williams, 2014). it has been used extensively across the world and also in sa (labadarios, 2018; van wijk, cronje, & meintjes, 2020). no correlations between the sos and markers of problematic alcohol use have been reported previously. the state trait personality inventory, trait version (stpi-t; spielberger, 1996) is a 40-item measure of emotional disposition (including dispositional anxiety, curiosity, anger and depression) in adults (spielberger & reheiser, 2009). acceptable psychometric properties were reported for sa samples (du plessis, 2014; van wijk, 2017). these dispositional traits have been strongly correlated with measures of both work and non-work stress (hogan, carlson, & dua, 2002) and strong correlations with the pss-10 (r = 0.7) have been reported (silver, 2013). furthermore, significant correlations have been reported for stpi-t anxiety and primary appraisal tasks (r = 0.30, p < 0.001) but not secondary appraisal tasks (r = 0.15, p = 0.064; abdullatif, 2006). in contrast to the phq-9 and gad-7, the stpi-t is not a clinical (i.e. diagnostic) scale, but rather reflects personal disposition, manifested in general mental well-being (spielberger, 1996). the 15-item dispositional resilience scale (drs-15; bartone, 2007) measures hardiness, which has been described as a psychological orientation associated with people who remain healthy and continue to perform well in a range of stressful conditions (kobasa, maddi, & kahn, 1982). 
hardiness appears to protect against the ill effects of stress on health and performance amongst a wide variety of occupations and contexts (bartone, 1989; maddi & hess, 1992; maddi & kobasa, 1984; topf, 1989). whilst hardiness and its sub-components could theoretically affect both primary and secondary appraisal, and thus the interpretation of stress, its influence may be most visible on pv (related to secondary appraisal). in this regard, significant correlations with the drs-45 (a longer version of the same measure) were reported with the full-score sos (r = –0.31, p < 0.001), pv (r = –0.40, p < 0.001) and el (r = –0.17, p < 0.01) (amirkhan, 2012). sociodemographic information initial validation studies of the sos and sos-s observed only small associations (seldom reaching significance) for age and gender (amirkhan, 2012, 2018). age and gender information was available for the current sample and was included in the analysis. home language was retrospectively coded into two categories, namely, english first language and non-english first language, for further analysis. occupational domains are reported for sample description only. diagnostic marker for any mental health disorder after each interview, the consulting psychologist allocated a binary category to each participant, indicating the presence of ‘any mental health disorder’. this was based on a semi-structured interview and allocated at the discretion of each individual clinical psychologist. data analysis mean, standard deviation (sd) and range (for the full scale and its factors, as well as the short form and its factors) were calculated and are reported in table 1. sociodemographic effects were examined through pearson’s correlation coefficients (for age) and t-tests for independent samples (gender and language). cronbach’s alpha coefficient was calculated to describe internal consistency (for full and short forms and factors). 
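As an illustration of the internal consistency index named here, the following is a minimal sketch of Cronbach's alpha using the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of total scores). The study itself ran its analyses in SPSS; the function name and the toy data below are hypothetical, and sample variances (n − 1 denominator) are assumed throughout.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents' item scores.

    rows: one list per respondent, each containing k item scores.
    Uses sample variances consistently for items and totals.
    """
    k = len(rows[0])
    items = list(zip(*rows))  # transpose: one tuple of scores per item
    item_variance_sum = sum(variance(col) for col in items)
    total_variance = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical toy data: three items that rise and fall together exactly,
# so alpha reaches its upper bound of 1.0.
toy = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
alpha = cronbach_alpha(toy)
```

Real questionnaire data would give values below 1.0; the "alpha if item deleted" check can be approximated by recomputing the function with one column removed at a time.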
factor analysis was conducted using an efa, given that the previous sa study did not find meaningful outcomes from a cfa (wilson et al., 2018). this was performed for both the full scale (24 items) and short form (10 items). validity indices were examined through correlations with scores on the four measures of mental health and two measures of mental well-being. possible predictive associations with mental health diagnoses were calculated using receiver operating characteristic (roc) curve analysis, with the binary interview outcome as state variable. all analyses were performed using the statistical package for the social sciences (ibm spss for windows, version 24). ethical considerations the study was approved by the health research ethics committee of stellenbosch university (reference number: n20-07-078). results sample characteristics the sample consisted of 2136 employed south africans, with a mean age of 34.0 years (± 8.4), of whom 19.8% reported english as their first language. participants came from all sa language groups and a wide range of occupational domains. further breakdown of the sample composition can be found in tables 1 and 2. all participants had a minimum of 9 years of schooling and were considered skilled workers. table 2: home language and occupation field distribution of sample. sociodemographic effects age correlations with sos scores, whilst significant, were very small (see table 1) and might not have had much practical impact in the current sample with a limited age range (20–60 years). in the case of both gender and language variables, there were no significant differences between the mean scores of women and men, or of english first language and non-english first language speakers, on any of the sos full-scale or short form totals and subscale totals (see table 3).
given the lack of significant differences between gender and language groups, and the very small mean differences between them, the combined full sample was used for the remainder of the analyses. table 3: results of independent t-tests for gender and language. scale characteristics high internal reliability was found for the full scale (cronbach’s α = 0.946) and short form (cronbach’s α = 0.901). in neither case did any item deletion improve α. the subscales el and pv (using the original item allocation) were strongly correlated (r = 0.806, p < 0.001). the full scale and short form of the sos correlated strongly (r = 0.969, p < 0.001) and the full-scale and short-form subscales were also strongly correlated (see table 1). the results of the efa for the full scale showed a kaiser-meyer-olkin (kmo) index of 0.972 and the result of the bartlett sphericity test was significant (p < 0.001). using principal component analysis, two factors with eigenvalues > 1 could be extracted, explaining 53.1% of variance. there was substantial cross-loading (< 0.40) on eight items, mainly items of the el subscale loading onto the pv subscale (see table 4). the two factors were strongly correlated (r = 0.629, p < 0.001). as the item loadings did not follow the clear differentiation of el and pv in the original studies, this raised the question of language influences, and separate efas were conducted for the english first language and non-english first language groups. the analyses resulted in similar matrices for both groups, as presented in table 4 for the combined group. table 4: principal component analysis with varimax rotation. the results of the efa for the short form showed a kmo index of 0.921 and significant bartlett sphericity test (p < 0.001). principal component analysis identified a single factor (eigenvalue = 4.82) that explained 54.0% of variance.
validity indices descriptive statistics of mental health and well-being scales can be found in table 1, as well as their correlations with the sos full and short forms and subscales. all correlations between sos scores and measures of mental health and well-being were significant at p < 0.001. strong correlations were observed for the clinical scales phq-9 and gad-7, with correlations with pv stronger than with el in both cases. correlations were moderate with the pc-ptsd-5 and lower with the cage measure. strong correlations were also observed with the stpi-t subscales for dispositional anxiety and depression and in each case, pv displayed stronger correlations than el. a strong correlation with dispositional anger and moderate correlations with dispositional curiosity and hardiness (drs-15) were also found. the sos demonstrated positive predictive validity for ‘any mental health disorder’. the roc analysis showed highly significant areas under the curve (full scale = 0.916 [95% ci: 0.896–0.938]; short form = 0.898 [95% ci: 0.872–0.925]). optimal sensitivity and specificity appear to be around > 51 (86% sensitivity and 83% specificity) for the full scale and around > 20 (86% sensitivity and 81% specificity) for the short form (see table 5). table 5: sensitivity and specificity data for the stress overload scale and any mental health disorder. discussion this study used data from a large sample of employed adult south africans to replicate aspects of previous validation studies and extend previous exploration of associations with mental health constructs. the sociodemographic variables of age and gender appeared to have very little practical effect on sos scores in this sample. more importantly, first language in this group of relatively educated workers did not meaningfully influence scores on the sos or sos-s. it appears that the english version scale may be used across different language groups in sa, provided participants have at least a grade nine level education.
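The sensitivity, specificity and area-under-the-curve figures reported in the results can be related to their definitions with a short sketch. It assumes, following the "> 51" notation used for the cut-offs, that a score strictly above the cut-off counts as a positive screen, and it computes the area under the curve as the proportion of disorder/no-disorder pairs in which the person with a disorder has the higher score (ties counted as half). The function names and the six example cases are hypothetical and are not study data; the study's own analysis was run in SPSS.

```python
def sens_spec(scores, has_disorder, cutoff):
    """Sensitivity and specificity when score > cutoff is a positive screen."""
    pairs = list(zip(scores, has_disorder))
    tp = sum(1 for s, d in pairs if d and s > cutoff)
    fn = sum(1 for s, d in pairs if d and s <= cutoff)
    tn = sum(1 for s, d in pairs if not d and s <= cutoff)
    fp = sum(1 for s, d in pairs if not d and s > cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores, has_disorder):
    """Area under the ROC curve via pairwise comparison of cases and non-cases."""
    pos = [s for s, d in zip(scores, has_disorder) if d]
    neg = [s for s, d in zip(scores, has_disorder) if not d]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical scores and interview outcomes for six participants.
scores = [60, 55, 40, 52, 30, 45]
flags = [True, True, True, False, False, False]
sensitivity, specificity = sens_spec(scores, flags, cutoff=51)
area = auc(scores, flags)
```

Repeating `sens_spec` over a range of cut-offs yields the kind of sensitivity and specificity listing summarised in table 5, from which a working threshold can be chosen.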
regarding scale characteristics, high internal consistency was observed, mirroring the original validation studies (amirkhan, 2012; amirkhan et al., 2015). the strong correlation between the el and pv subscales was higher than previously reported (amirkhan, 2012), suggesting some overlap of the underlying constructs. furthermore, whilst the two factors identified during the efa generally adhered to the original subscales (with a few exceptions), the substantial cross-loading and high inter-factor correlation questioned the extent to which the factors could be viewed as distinct constructs. as with previous sa research using the setswana version (wilson et al., 2018), the factor structure poses a challenge to the uncritical acceptance of the sos’s structural validity. the sos demonstrated positive predictive validity and scores could predict the risk for mental health disorders with reasonable probability. current sensitivity and specificity appear adequate for research use. furthermore, the strong correlations between the full scale and short form suggest that the short form can be used confidently where there are concerns regarding time or respondent fatigue. positive criterion validity was demonstrated through significant correlations with all the measures of (clinical) mental health and (dispositional) well-being. strong correlation with the phq-9 and gad-7 supported earlier studies using comparable clinical measures (amirkhan, 2012; wilson et al., 2018). similarly, strong correlation with the stpi-t closely followed earlier studies using comparable measures (silver, 2013). one observation was of particular interest, namely, that clinical and dispositional indicators were more strongly associated with pv than el. however, it could be argued that whilst the pattern of effect sizes (i.e. higher for pv than for el) was consistent, the actual effect size difference may not have been that meaningful, given the high confidence intervals for significance reported. 
furthermore, the problematic factor analyses caution against confident interpretation of subscale scores. correlations with hardiness, a construct of personal orientation to life, were moderate and very similar to earlier reports (amirkhan, 2012), although the difference between el and pv was not as substantial as previously reported, likely because of the scale overlap observed here. local applications the sos can be used across a number of local applications: in primary healthcare (in both community and occupational settings), the sos could be used for screening – on a larger scale – to facilitate the streaming of high-risk individuals to appropriate support services (‘triage’). within research in the local health context, it could be used to explore associations of stress overload with specific health conditions (in a clinical health framework) and other health outcomes (in an occupational health framework). within therapeutic settings, the sos could be used productively to measure outcomes of psychotherapeutic (and other) interventions through longitudinal comparison. in a broader national context, it could possibly also be used for quantifying stress overload in different sectors of the sa economy. limitations this study has a number of limitations. determination of english language proficiency used years of formal education as proxy. the perils of using this criterion in sa, with its history of disparate educational opportunities, resource allocation and outcome standards, are recognised and future studies may need to include finer calibrated indicators of language competence. determination of ‘any mental health disorder’ was performed by different clinical psychologists as part of their clinical practice. there were a number of psychologists involved over time and it is recognised that the threshold for allocating a yes response might have differed amongst them.
although this might have been mitigated by them being very experienced in this type of work, future studies may need to standardise the criteria for such a category more explicitly to enhance inter-rater reliability. future directions the close association of stress overload with both clinical indicators and dispositional orientation raises the question of direction of influence, which may occur in opposite directions. it has previously been argued that appraisal of stress causally affects mental health because higher perceived stress would lead to a higher incidence of mental health diagnoses (de lange, taris, kompier, houtman, & bongers, 2004; diette, goldsmith, hamilton, & darity, 2012; shigemi, mino, & ohtsu, 2000). it has further been argued that personal disposition would causally affect the appraisal of stress in the sense that a resilient disposition would lead to lower perceived stress (abdullatif, 2006; amirkhan & greaves, 2003). future research will need to empirically test these hypotheses locally, to add to the understanding of the relationship amongst dispositional or personality constructs, stress appraisal and mental health disorders in the sa context. conclusion this study replicated, for the most part, previous validation of the sos and extended validity exploration across multiple measures. high internal consistency and positive criterion validation were confirmed. most of the tested indices provided evidence of validity of the original sos in the study context, suggesting that it could be usefully employed across different language groups where at least a grade nine-level education can be demonstrated. the subscales might not provide equal confidence, and further research is required to explore the factorial structure of the sos and the use of subscales as individual markers. there is some support for the use of the full scale and short form in research and clinical practice. 
for example, this sample was accessed prior to the covid-19 pandemic and their scores could be viewed as reflective of that period. using this as baseline, the full-scale sos (and sos-s) could be used constructively in studies to explore perceived stress, in local comparable populations, in the post-covid-19 era. acknowledgements the author would like to thank nazneen firfirey for her assistance in managing the collected data. competing interests the author declared that he has no financial or personal relationships that may have inappropriately influenced him in writing this article. author’s contributions c.h.v.w is the sole author for this article. funding information this research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors. data availability the data that support the findings of this study are available from the author, upon reasonable request. the data are not publicly available because of privacy and ethical considerations. disclaimer the views and opinions expressed in this article are those of the author and do not necessarily reflect the official policy or position of any affiliated agency of the author. references abdullatif, q. (2006). effects of trait anxiety and cognitive appraisals on emotional reactions to psychological and physical stressors. unpublished doctoral dissertation. university of south florida. retrieved from https://scholarcommons.usf.edu/cgi/viewcontent.cgi?referer=https://www.google.com/&httpsredir=1&article=3431&context=etd amirkhan, j.h. (2012). stress overload: a new approach to the assessment of stress. american journal of community psychology, 49(1–2), 55–71. https://doi.org/10.1007/s10464-011-9438-x amirkhan, j.h. (2018). a brief stress diagnostic tool: the short stress overload scale. assessment, 25(8), 1001–1013. https://doi.org/10.1177/1073191116673173 amirkhan, j.h., & greaves, h. (2003). sense of coherence and stress: the mechanics of a healthy disposition. 
psychology & health, 18(1), 31–62. https://doi.org/10.1080/0887044021000044233 amirkhan, j.h., urizar, g.g., & clark, s. (2015). criterion validation of a stress measure: the stress overload scale. psychological assessment, 27(3), 985–996. https://doi.org/10.1037/pas0000081 bartone, p.t. (1989). predictors of stress-related illness in city bus drivers. journal of occupational medicine, 31(8), 657–663. https://doi.org/10.1097/00043764-198908000-00008 bartone, p.t. (2007). test-retest reliability of the dispositional resilience scale-15, a brief hardiness scale. psychological reports, 101(3 pt 1), 943–944. https://doi.org/10.2466/pr0.101.3.943-944 bhana, a., rathod, s.d., selohilwe, o., kathree, t., & petersen, i. (2015). the validity of the patient health questionnaire for screening depression in chronic care patients in primary health care in south africa. bmc psychiatry, 15, a118. https://doi.org/10.1186/s12888-015-0503-0 bovin, m.j., kimerling, r., weathers, f.w., prins, a., marx, b.p., post, e.p., & schnurr, p.p. (2021). diagnostic accuracy and acceptability of the primary care posttraumatic stress disorder screen for the diagnostic and statistical manual of mental disorders (fifth edition) among us veterans. jama network open, 4(2), e2036733. https://doi.org/10.1001/jamanetworkopen.2020.36733 business report. (2016, october 10). work stress cost sa r40bn. retrieved from https://www.iol.co.za/business-report/economy/work-stress-costs-sa-r40bn-2077997 city press. (2019, april 23). sa is the second most stressed country in the world: here’s how you can cope. retrieved from https://city-press.news24.com/careers/sa-is-the-second-most-stressed-country-in-the-world-heres-how-you-can-cope-20190423. see also bloomberg, 2017, https://www.bloomberg.com/graphics/best-and-worst/#most-stressed-out-countries de lange, a.h., taris, t.w., kompier, m.a.j., houtman, i.l.d., & bongers, p.m. (2004).
the relationships between work characteristics and mental health: examining normal, reversed and reciprocal relationships in a 4-wave study. work & stress, 18(2), 149–166. https://doi.org/10.1080/02678370412331270860 dhalla, s., & kopec, j.a. (2007). the cage questionnaire for alcohol misuse: a review of reliability and validity studies. clinical & investigative medicine, 30(1), 33–41. https://doi.org/10.25011/cim.v30i1.447 diette, t.m., goldsmith, a.h., hamilton, d., & darity, w. (2012). causality in the relationship between mental health and unemployment. in l.d. appelbaum (ed.), reconnecting to work: policies to mitigate long-term unemployment and its consequences (pp. 63–94). kalamazoo, mi: w.e. upjohn. du plessis, k.e. (2014). an evaluation of the psychometric properties of the stpi (form y) for south african students. unpublished master’s thesis. university of pretoria, pretoria. retrieved from http://hdl.handle.net/2263/43322 ewing, j.a. (1984). detecting alcoholism: the cage questionnaire. journal of the american medical association, 252(14), 1905–1907. https://doi.org/10.1001/jama.1984.03350140051025 gilbody, s., richards, d., & barkham, m. (2007). diagnosing depression in primary care using self-completed instruments: uk validation of phq–9 and core–om. british journal of general practice, 57(541), 650–652. herman, a.a., stein, d.j., seedat, s., heeringa, s.g., moomal, h., & williams, d.r. (2009). the south african stress and health (sash) study: 12-month and lifetime prevalence of common mental disorders. south african medical journal, 99(5 pt 2), 339–344. hogan, j.m., carlson, j.g., & dua, j. (2002). stressors and stress reactions among university personnel. international journal of stress management, 9, 289–310. kobasa, s.c., maddi, s.r., & kahn, s. (1982). hardiness and health: a prospective study. journal of personality and social psychology, 42(1), 168–177. https://doi.org/10.1037/0022-3514.42.1.168 kroenke, k., spitzer, r.l., & williams, j.b. 
(2001). the phq-9: validity of a brief depression severity measure. journal of general internal medicine, 16(9), 606–613. https://doi.org/10.1046/j.1525-1497.2001.016009606.x kroenke, k., spitzer, r.l., williams, j.b.w., monahan, p.o., & löwe, b. (2007). anxiety disorders in primary care: prevalence, impairment, comorbidity, and detection. annals of internal medicine, 146(5), 317–325. https://doi.org/10.7326/0003-4819-146-5-200703060-00004 labadarios, g. (2018). determination of a brief audit screening questionnaire to identify women at risk of harmful and hazardous alcohol consumption in primary care settings. unpublished master’s thesis. university of cape town, cape town. retrieved from https://open.uct.ac.za/bitstream/handle/11427/29356/thesis_hsf_2018_labadarios_grace.pdf?sequence=1&isallowed=y lazarus, r.s., & folkman, s. (1984). stress, appraisal, and coping. new york, ny: springer. löwe, b., decker, o., müller, s., brähler, e., schellberg, d., herzog, w., & herzberg, p.y. (2008). validation and standardization of the generalized anxiety disorder screener (gad-7) in the general population. medical care, 46(3), 266–274. https://doi.org/10.1097/mlr.0b013e318160d093 maddi, s.r., & hess, m. (1992). personality hardiness and success in basketball. international journal of sports psychology, 23(4), 360–368. maddi, s.r., & kobasa, s.c. (1984). the hardy executive: health under stress. burr ridge, il: irwin professional publishing. mental health foundation. (2018). stress. retrieved from https://www.mentalhealth.org.uk/a-to-z/s/stress o’brien, c.p. (2008). the cage questionnaire for detection of alcoholism. jama, 300(17), 2054–2056. https://doi.org/10.1001/jama.2008.570 prins, a., bovin, m.j., smolenski, d.j., marx, b.p., kimerling, r., jenkins-guarnieri, m.a., kaloupek, d.g., … tiet, q.q. (2016). the primary care ptsd screen for dsm-5 (pc-ptsd-5): development and evaluation within a veteran primary care sample. journal of general internal medicine, 31, 1206–1211.
https://doi.org/10.1007/s11606-016-3703-5 shigemi, j., mino, y., & ohtsu, t. (2000). effects of perceived job stress on mental health: a longitudinal survey in a japanese electronics company. european journal of epidemiology, 16, 371–376. https://doi.org/10.1023/a:1007646323031 silver, r. (2013). coping with college stress: does sense of coherence influence the use of alcohol and otc medication? unpublished doctoral dissertation. syracuse university. retrieved from https://surface.syr.edu/cgi/viewcontent.cgi?article=1180&context=psy_etd sinclair, r.r., & tetrick, l.e. (2000). implications of item wording for hardiness structure, relation with neuroticism, and stress buffering. journal of research in personality, 34(1), 1–25. https://doi.org/10.1006/jrpe.1999.2265 spielberger, c.d. (1996). preliminary manual for the state-trait personality inventory. tampa, fl: university of south florida. spielberger, c.d., & reheiser, e.c. (2009). assessment of emotions: anxiety, anger, depression, and curiosity. applied psychology: health and well-being, 1(3), 271–302. https://doi.org/10.1111/j.1758-0854.2009.01017.x spitzer, r.l., kroenke, k., williams, j.b., & lowe, b. (2006). a brief measure for assessing generalized anxiety disorder: the gad-7. archives of internal medicine, 166(10), 1092–1097. https://doi.org/10.1001/archinte.166.10.1092 topf, m. (1989). personality hardiness, occupational stress, and burnout in critical care nurses. research in nursing health, 12(3), 179–186. https://doi.org/10.1002/nur.4770120308 van wijk, c.h. (2017). screening mental well-being in high demand occupational settings in south africa. european scientific journal, 13(14), 140–157. https://doi.org/10.19044/esj.2017.v13n14p140 van wijk, c.h., cronje, f.j., & meintjes, w.a.j. (2020). mental wellbeing monitoring in a sample of emergency medical service personnel. occupational diseases and environmental medicine, 8(1), 26–33. https://doi.org/10.4236/odem.2020.81002 williams, n. (2014). 
the cage questionnaire. occupational medicine, 64(6), 473–474. https://doi.org/10.1093/occmed/kqu058 williams, s.l., williams, d.r., stein, d.j., seedat, s., jackson, p.b., & moomal, h. (2007). multiple traumatic events and psychological distress: the south africa stress and health study. journal of traumatic stress, 20(5), 845–855. https://doi.org/10.1002/jts.20252 wilson, a., wissing, m.p., & schutte, l. (2018). validation of the stress overload scale and stress overload scale – short form among a setswana-speaking community in south africa. south african journal of psychology, 48(1), 21–31. https://doi.org/10.1177/0081246317705241 about the author(s) erica munnik department of psychology, university of the western cape, cape town, south africa mario r. smith department of psychology, university of the western cape, cape town, south africa citation munnik, e., & smith, m.r. (2019). methodological rigour and coherence in the construction of instruments: the emotional social screening tool for school readiness. african journal of psychological assessment, 1(0), a2. https://doi.org/10.4102/ajopa.v1i0.2 original research methodological rigour and coherence in the construction of instruments: the emotional social screening tool for school readiness erica munnik, mario r. smith received: 30 oct. 2018; accepted: 11 may 2019; published: 24 june 2019 copyright: © 2019. the author(s). licensee: aosis. this is an open access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
abstract the need for a contextually relevant and empirically grounded measure of emotional social competence in grade r children was identified in the literature. the aim of this study was to develop a contextually relevant instrument for emotional social competence in preschool children. the study adopted a four-phase approach with each phase using distinct methodological approaches. this article reports on the use of multiple research methods to achieve methodological rigour and coherence in the construction. phase 1 used systematic review methodology to establish a theoretical foundation for the instrument. the results identified two domains and nine subdomains that formed the theoretical model for the instrument. in phase 2 stakeholder perceptions of emotional and social competence were identified through concept mapping to increase contextual relevance and sensitivity. the results highlighted that early stimulation and contextual factors impacted school readiness and needed to be included. the construction of the instrument incorporated the findings from the first two stages. the draft instrument was presented to a panel of experts, using the delphi technique, for validation of content and scalar decisions in phase 3. the results supported the proposed format and content of the screening tool. the resulting instrument was piloted in phase 4 with survey research. good internal consistency was reported and the factor structure supported. the multiphase methodology provided an overarching framework with methodological rigour and coherence. the grounding in the literature, stakeholder consultation and rigorous validation processes enhanced the resultant instrument. the articulation of one phase into the next ensured methodological coherence. keywords: e3sr; test construction; systematic review; concept mapping; delphi study; survey research. 
introduction test construction process and models foxcroft (2011) identified that the lack of methodological coherence and scientific rigour followed in the construction and validation phases resulted in many instruments being perceived as inadequate. evidence of the strategies and the rigour in the process of test construction is essential to ensure that instruments are deemed adequate, reliable and valid for use in applied contexts (foxcroft, 2013). thus, a need exists for rigorous studies in scale construction that employ coherent design principles. this manuscript reports on the use of multiple methodologies to strengthen the methodological rigour and coherence in the construction of instruments. the construction process of the emotional social screening tool for school readiness (e3sr) is used as an illustrative case study. theoretical framework devellis (2016) conceptualised scale construction as a continuous, well-designed process with four distinct steps, namely (1) theoretical foundation, (2) scale construction, (3) structural validation and (4) preparation of manuals. this framework provides an overarching model for the process of scale construction that underscores the methodological decisions that must be taken in order to develop a sound scale. each step entails a series of activities that pursue the aim of the respective steps and feed into the overarching model. the first step in devellis’s model entails the establishment of a theoretical foundation. three core activities included here are the thorough consultation of the literature to identify current thinking and theory about the construct, available instruments, domains included and definitions used. a major concern in scale construction is the extent to which stakeholder consultation takes place. this is particularly important for enhancing contextual relevance and sensitivity in construct definition (foxcroft, 2011). stakeholder consultation also increases buy-in with users (kline, 2015). 
the theoretical and operational definitions for the proposed scale are developed from the literature and stakeholder consultation in this process. step 2 is focused on scale construction. in this step, the selection of items, pre-testing and revision of the scale receive attention. the scalar decisions made are often not reported explicitly or interrogated sufficiently. thus, this step must ensure that important aspects such as the user group, target group and scoring values are appropriate for the intended scale. during test construction, scalar decisions are often made without due consideration or without the recognition that each constitutes a methodological decision. the third step is focused on the structural validation of the scale. this step usually includes piloting of the newly constructed instrument. piloting is often performed with conveniently selected samples without due consideration for the methodological or design principles guiding this kind of research process. the resulting data set is then used to establish the psychometric properties of the scale. typical techniques used include cronbach’s alpha for internal consistency, factor analytic methods for construct validation and, where possible, convergent or discriminant analyses for criterion referencing. the challenge is that these methods are often applied without testing whether the data set satisfies the requirements for the respective statistical analysis or data reduction as recommended by kline (2015). in addition, statistical techniques are applied at a technical level without using theoretical formulations to guide the analytic process (kline, 2015). the fourth step in the model entails the writing of manuals. particular attention is paid to technical details about the scale construction, guidelines for administration and use, and instructions for scoring and interpretation.
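as a minimal sketch of the internal-consistency statistic named for the third step, cronbach's alpha can be computed from a respondents-by-items score table as k/(k - 1) × (1 - sum of item variances / variance of total scores), where k is the number of items. the function name `cronbach_alpha` and the ratings below are hypothetical illustrations, not e3sr data, and the use of sample (n - 1) variances is an assumption of this sketch.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for rows of respondents and columns of items."""
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    if any(len(row) != n_items for row in scores):
        raise ValueError("every respondent needs a score for every item")

    # Variance of each item (column) across respondents.
    item_variances = [variance(column) for column in zip(*scores)]
    # Variance of each respondent's total score.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings: five respondents, three items of one sub-scale.
ratings = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
    [3, 3, 4],
]
print(f"alpha = {cronbach_alpha(ratings):.3f}")
```

values closer to 1 indicate that the items of a sub-scale vary together; the statistic says nothing about whether the data meet the requirements of the later factor analytic steps, which need their own checks.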
a particular challenge is that many scales are constructed without the subsequent preparation of manuals detailing appropriate use and construction. the four steps in this model provide a coherent process that culminates in the accurate conceptualisation, construction and documentation of the scale (devellis, 2016). the model also underscores that construction is a continuous process (devellis, 2016). the revision and ongoing refinement of the instrument follows the same four step process which makes the model cyclical and continuous. the challenge often is that instruments are used with expanding populations and samples without using the feedback loop that contributes to further refinement. the scale becomes a means to an end without attention to the ongoing construction process. this is evidenced by the lack of reporting on psychometric properties of scales when used in subsequent studies (foxcroft, 2004). in short, this model provides a logical process for scale construction, but leaves the operationalisation of steps to the developers. research and development into scale construction often lacks rigorous attention to methodological principles at the various stages. thus, there is a need to demonstrate how multiple methodologies can be harnessed to strengthen the activities within each of the steps. figure 1 illustrates the model proposed by devellis (2016). figure 1: model of test construction. application of methodology step 1: theoretical foundation from the above discussion, two activities emerged as key considerations in the first step. the first consideration is that the existing body of literature must be consulted and consolidated to identify definitions of the identified construct, and scales measuring the identified construct in part or as a whole. narrative literature reviews are limited in that they do not provide a systematic, replicable process for filtering through the body of literature. 
in summative literature reviews, researchers often read specific sources at the expense of more comprehensive searches. the traditional approach to this step can be strengthened with the use of secondary research methods that specifically attempt to filter the body of literature following a specified set of procedures (gough, oliver, & thomas, 2017). for example, scoping reviews and systematic reviews are recognised research methods that provide a rigorous process for the identification of literature reporting on a particular construct. scoping reviews are recommended when researchers want to obtain an overview of the available literature reporting on a particular construct (grant & booth, 2009). systematic reviews reportedly are the highest form of evidence and provide a critical appraisal of the literature to identify good-quality research from which information about the construct can be extracted. wardlaw (2010) can be consulted for a comprehensive summary of systematic review methodology. the primary consideration is to strengthen this step by replacing narrative reviews with scoping or systematic reviews. a major advantage is that scale developers can identify existing measures and extract information about definitions and theoretical formulations of the construct. the second consideration is to consult stakeholders about the constructs under study. through this consultation, contextual relevance of the construct can be enhanced. stakeholder consultation should follow a rigorous methodological process and can draw on existing methodologies that have demonstrated efficacy in this. concept mapping is recommended for distilling the perceptions of a variety of people into one coherent whole (see pokharel, 2009 for a comprehensive overview). concept mapping can draw on qualitative methods if more exploratory work is required or quantitative methods if the construct needs further development (novak & cañas, 2006). 
through the use of methods like concept mapping, important insights can be gained for consideration in the development of the construct. the combination of consolidation of the literature and stakeholder consultation can strengthen the resulting theoretical and operational definitions of the construct under study. the use of these methodologies then operationalises at least two activities in this first step. each of these activities will be informed by well-established methodologies that lend rigour and methodological coherence to the establishment of a theoretical foundation for the proposed scale. step 2: scale construction the primary consideration in this step can be summarised in two activities. the first activity would be to make scalar decisions explicit. scalar decisions such as the intended user group of the scale, administration guidelines, scoring keys and the selection of items should all be documented clearly and the decisions substantiated. this process of careful and explicit documentation will become the basis of a draft manual that will be finalised in the final step. the primary consideration is to strengthen this step through improved documentation of decisions, with motivation, which will ensure engagement with a more systematic and methodical process of decision-making. the second activity entails testing the scalar decisions and the pool of draft items against an external panel. this testing can be performed through established techniques such as delphi studies. the delphi method is an iterative process to collect and distil the anonymous judgements of experts using a series of data collection and analysis techniques interspersed with feedback (boulkedid, abdoul, loustau, sibony, & alberti, 2011). this type of research method employs a qualitative methodology in which an interactive panel of experts is invited to share their expertise and opinions and to work towards a consensus about a set of indicators.
by employing this methodology, one is able to facilitate an organised discussion that analyses information individually, but also as a set. the steps that are usually followed include (1) identification of a clearly defined research problem, rationale, aim and objectives, (2) the selection of expert panellists, (3) the development of a stimulus document, (4) dissemination of information (stimulus document) in various rounds and (5) analyses of feedback after each round with incorporation of feedback into the next rounds until consensus is reached. see boulkedid et al. (2011) for a comprehensive review of the delphi methodology. delphi studies present stimulus documents in an iterative process to a panel. the panel provides feedback after each round and revisions are made until there is consensus on the items presented. the draft formulated in the first activity of this step can be used as the stimulus document. panels are carefully constituted and can include experts and/or stakeholder groups who can provide input from identified vantage points. delphi studies are well documented as an effective method to establish content validity (hasson, keeney, & mckenna, 2000). the resultant document will be a more refined version that is ready for piloting. the combination of documenting scalar decisions and delphi techniques can significantly enhance the construction process. it provides the constructor with an opportunity to record initial considerations and expands the construction team through the inclusion of the delphi panel. thus, the end product is strengthened through the introduction of rigorous methods and more explicit reporting that can provide insight into how the resultant scale performs. step 3: structural validation the primary consideration in this step includes two activities. the first activity is conducting a pilot study of the newly constructed instrument. 
the pilot study should be conceptualised well to ensure that methodological decisions such as sampling are taken into consideration. survey research provides a well-established framework for pilot studies that can enhance the methodological rigour of the pilot study and the quality of the resulting data set. the second activity is the calculation of psychometric properties. this process should be guided by a strong theoretical formulation. developers must identify whether they are testing a theoretical model or exploring how items load onto factors in a more organic process. the former would set out to test a theoretical model that has been conceptualised a priori. the latter uses a pool of items and examines how items would load onto factors and the number of factors in the solution. thus, the data reduction process is not merely a technical exercise, but a well thought out analytic process that follows a broader theoretical underpinning. the resulting data must be tested to determine whether the data conform to the requirements for the selected analysis or data reduction. testing the assumptions underpinning inferential statistics and data reduction must become empirical questions. this is an important step to ensure that the data support the selected analysis. establishing the psychometric properties of the scale can proceed with a greater measure of confidence if the assumptions for the data analyses or data reduction were tested. pilot studies that are more formalised and incorporate good practice methodological principles can strengthen this step substantially. it shifts the focus from technical aspects of establishing psychometric properties to the overall scientific and empirical value of the pilot study. step 4: revision and manuals the primary consideration here is to strengthen this step through two activities. the first activity prioritises the production of technical and instructional manuals. 
instructional manuals ensure that the resultant instrument is used appropriately. technical manuals capture the scientific rigour of the construction process and provide a template for other researchers. the second activity entails further piloting and refinement either by the primary developers or by other researchers who may use the scale. primary developers should actively conduct further research on the structural validity of the scale and refine it as indicated. permission should be granted to other researchers to use the scale with the proviso that feedback is provided about the psychometric properties of the scale in subsequent studies. this ensures that there is a clear commitment to continued refinement of the instrument. figure 2 illustrates the link between methods, activities and steps in the model. figure 2: steps of test construction and methodological choices. ethical consideration project registration and ethics clearance were granted by the senate research committee of the university of the western cape (ethical clearance number: 14/2/8). illustrative case: the emotional social screening tool for school readiness the e3sr was developed as part of a doctoral study by munnik (2018). the aim was to develop an instrument that could assess emotional and social competence in grade r children as part of school readiness assessment. the aims and objectives of the study reflected the first three steps of the devellis (2016) framework. the conceptual framework articulated into a four-phase study. the phases were conceptualised as separate studies with independent methodologies. the results of each phase fed into the succeeding phase to form a coherent whole resulting in the prototype of the e3sr. a comprehensive discussion of the results can be accessed in the unpublished thesis of munnik (2018). the phases below are described for illustrative purposes and do not constitute a detailed discussion of the results.
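the link between steps, activities, methods and study phases described so far can be sketched as a small lookup table. the names `Activity`, `E3SR_DESIGN` and `methods_for_step` are hypothetical, and the entries paraphrase the article's own description; step 4 carries no phase number because the four-phase study covered only the first three steps.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Activity:
    step: int             # step in the test construction model
    activity: str
    method: str
    phase: Optional[int]  # phase of the illustrative study, None if outside it


# Entries paraphrase the text; the labels are this sketch's wording.
E3SR_DESIGN = [
    Activity(1, "consolidation of the literature", "systematic review", 1),
    Activity(1, "stakeholder consultation", "concept mapping", 2),
    Activity(2, "making scalar decisions explicit", "documented draft", 3),
    Activity(2, "external validation of decisions and items", "delphi study", 3),
    Activity(3, "pilot study and psychometric properties", "survey research", 4),
    Activity(4, "manuals and continued refinement", "further piloting", None),
]


def methods_for_step(step):
    """Return the methods attached to one step, in the order listed."""
    return [a.method for a in E3SR_DESIGN if a.step == step]


for step in range(1, 5):
    print(step, methods_for_step(step))
```

keeping the mapping in one structure makes it easy to see which step of the model each phase serves and which step still lacks an empirical phase.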
step 1: establish a theoretical foundation the first step included two activities that articulated into two separate phases. phase 1: consolidation of the literature the first phase corresponded to the first activity that was the consolidation of the body of literature reporting on emotional social competence. systematic review methodology was adopted to conduct two reviews focusing on (1) definitions of emotional social readiness in pre-schoolers and (2) instruments measuring emotional social competence in pre-schoolers, respectively. the reviews took place at four levels: (1) identification of articles with specific keywords or phrases, (2) screening or filtering of the identified articles by abstract, (3) appraisal of the identified article with a quality appraisal tool and (4) the summation of the articles by means of data extraction and meta-synthesis. the smith franciscus swartbooi (sfs) scoring system was used to evaluate the identified studies for methodological quality (smith, franciscus, swartbooi, munnik, & jacobs, 2015). the preferred reporting items for systematic reviews and meta-analyses (prisma) informed the filtration process used to consolidate the literature (liberati et al., 2009). reviews were conducted by a team of reviewers. team meetings were facilitated in which reviewers discussed their assessments. after each operational step, reviewers were provided an opportunity to calibrate their findings. the first review identified existing definitions of emotional social readiness in pre-schoolers from good-quality literature. peer-reviewed, full text articles published between january 2003 and december 2013 were identified from a comprehensive search across eight databases selected on their relevance to psychology and education, as well as reference mining and grey literature. a total of 68 titles were identified of which seven articles were included in the final summation. 
theoretical and operational definitions and their underpinning behaviours or attributes were extracted. the results indicated that there is no consensus on the definition of emotional and social competence in preschool children. the second review identified instruments purported to measure emotional or social readiness or competence as part of school readiness. peer-reviewed full text articles with a quantitative design published between january 2002 and december 2012 were identified from a comprehensive search across eight databases. four articles were included in the final summation from 282 titles. four instruments were identified and data were extracted that included (1) a description of identified instrument, (2) type of instrument, (3) aim of the instrument, (4) target group, (5) theoretical and operational definitions, (6) sample items in domains, (7) administration, (8) language of construction and (9) psychometric properties. the review indicated the need for a single-form, strengths-based screening instrument rather than a diagnostic tool. the results indicated that ease of administration and interpretation would allow for a wider application across the health professions. an integrated instrument would thus be more applicable and beneficial in the south african context. the review identified the lack of psychometrically sound, contextually appropriate measures for school readiness more specifically emotional or social readiness as a domain of school readiness. phase 2: stakeholder consultation concept mapping was used to consult stakeholders about their perceptions of school readiness and emotional social competence as a domain of school readiness. five focus groups were conducted with a purposive sample of 23 educators, 9 professionals and 9 parents. two semi-structured interviews were conducted with an educator and a paediatrician, respectively, who were unable to attend the focus groups. 
participants were recruited from a mixture of socio-economic areas to provide a cross-section of contextual considerations at play. data collection and analysis happened concurrently until saturation was reached (creswell, 2007). the conventions of reflexivity, dependability and trustworthiness of data were adhered to. thematic analysis informed by braun and clarke (2006) was used and produced four core themes. the results were used to develop an interpretable conceptual framework, expressed in the language of the participants. this resulted in a more nuanced and contextualised understanding of emotional social readiness. the resultant concept map illustrated that understandings of children’s emotional social readiness cannot be separated from the systems within which they function. societal, community, educational and familial systems act as the overarching framework and influence children’s emotional social readiness before school entry. the findings from phases 1 and 2 formed the basis for developing theoretical and operational definitions of emotional and social competence as primary domains. nine subdomain definitions were operationalised for (1) emotional maturity, (2) emotional management, (3) independence, (4) positive sense of self, (5) mental well-being and alertness, (6) social skills or confidence, (7) pro-social behaviour, (8) compliance to rules and (9) communication. these definitions formed the basis for a contextually sensitive theoretical model for the proposed instrument. step 2: scale construction step 2 was achieved through the third phase of the study. phase 3 had two subsections that corresponded to the two activities identified in the second step. subsection a entailed the construction of the proposed measure. subsection b entailed a delphi study. sub-phase a entailed the development of the draft screening tool and a pool of test items. the developmental phase included steps as proposed by foxcroft (2013) and taguma (2000). 
firstly, the intended aim or purpose of the tool was established. secondly, the constructs were defined and operationalised and a pool of items generated. thirdly, decisions were made about the content and format of the test. all of these steps resulted in the prototype. sub-phase b entailed external validation. the delphi method fulfilled the validation process with a panel of 11 experts. the stimulus document included questions about the test construction (scalar) choices such as domain identification, theoretical and operational definitions and item writing. the stimulus document included three sections: (1) the aim and core constructs (i.e. aim, purpose, target population, theoretical and operational definitions), (2) the instrument (i.e. composition of the demographic section and the proposed items of the e3sr) and (3) technical aspects of the prototype (i.e. type of scale, scoring and general administration prompts). revisions were based on the feedback of the panellists. if consensus was reached (above 70%), the prompt or item was retained and not included again in subsequent rounds. stimulus prompts were revised if the level of agreement was between 50% and 70%. items that obtained levels of agreement that were below 35% were revised, replaced or omitted. the replacement stimulus prompts and revised prompts were included in the subsequent rounds. qualitative data were also obtained that assisted with the revision or refinement of constructs and/or items. during round 1, consensus was reached on the majority of questions about the form and function of the prototype. the majority of the items (n = 74) were retained in their original format. twenty items (n = 20) were retained and revised, and 28 items (n = 28) were omitted. seven new items were included in round 2. reversed items initially scored poorly, but were retained and identified as such in the second round. 
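the decision rule applied after each delphi round can be sketched as follows. the thresholds come from the description above; the function names, the way agreement is computed (the share of panellists endorsing a prompt) and the handling of exact boundary values are assumptions of this sketch. the description does not state what happened to prompts with agreement from 35% up to 50%, so that band is flagged rather than guessed.

```python
def agreement_pct(endorsements, panel_size):
    """Percentage of panellists endorsing a prompt (assumed definition)."""
    if panel_size <= 0 or not 0 <= endorsements <= panel_size:
        raise ValueError("endorsements must lie between 0 and panel_size")
    return 100.0 * endorsements / panel_size


def classify_prompt(pct):
    """Map an agreement percentage to the action described for the rounds."""
    if pct > 70:
        return "retain"  # consensus reached; not presented again
    if 50 <= pct <= 70:
        return "revise"  # revised and included in the next round
    if pct < 35:
        return "revise, replace or omit"
    return "not specified in the source"  # 35% up to 50%


PANEL_SIZE = 11  # size of the expert panel reported in the article
for votes in (9, 7, 5, 3):  # hypothetical endorsement counts
    pct = agreement_pct(votes, PANEL_SIZE)
    print(f"{votes}/{PANEL_SIZE} = {pct:.1f}% -> {classify_prompt(pct)}")
```

writing the rule out this way makes the scalar decision explicit and exposes the unassigned band, which a technical manual would need to resolve.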
consensus was reached on the form, function and content of the proposed screening tool after the second round after which the delphi was concluded. the delphi study established face and content validity. the findings were incorporated into a pilot version of the screening instrument now named the e3sr. step 3: structural validation the third step was achieved through phase 4. phase 4 entailed a pilot study that aimed to establish the psychometric properties of the instrument. the first activity consisted of a cross-sectional survey conducted with a local sample of 26 preschool teachers in the western cape region in south africa who completed 493 protocols in which they assessed preschool-aged children for emotional and social competence. the survey included a biographic questionnaire and the e3sr. the second activity comprised advanced statistical analysis to determine the psychometric properties of the scale. reliability was assessed through internal consistency. the nine sub-scales showed good to excellent cronbach’s alphas ranging from 0.794 to 0.951. construct validity was established using data reduction methods. the assumptions for data reduction were tested and the results proposed that the data would support factor analytic methods. confirmatory factor analyses supported the theoretical nine-factor solution of the e3sr, whilst exploratory factor analyses provided an improved seven-factor model. the results suggested revisions to increase the model fit. step 4: refinement and revision the fourth step will be achieved through a postdoctoral study. the first activity will entail the revision of the e3sr based on the recommendations of munnik (2018). the revised instrument will be piloted with new samples and the psychometric properties established. the second activity will entail the finalisation of instructional and technical manuals, as well as copyrighting of the e3sr. 
thereafter permission can be granted to other researchers to use the e3sr in their studies with the agreement to feed information about the scale in those studies back to the scale developer. in this way, the e3sr will be refined and revised. discussion munnik (2018) used the theoretical formulation of devellis (2016) to construct a screening tool for emotional and social competence in preschool children. the first three steps of the model articulated into a four-phase study that contributed to the empirical underpinning of the construction process. methodological rigour was applied to the conceptualisation of the instrument, including well-established methodologies such as systematic review, concept mapping, delphi study and survey research. a theoretical model was developed for the proposed scale from the theoretical foundation established through the consolidation of the literature (systematic reviews in phase 1) and stakeholder consultation (concept mapping in phase 2). the contextual sensitivity and relevance of the theoretical model were enhanced through consultation with stakeholder groups in the conceptualisation phase. this consultation also increased buy-in, consistent with the recommendation by pokharel (2009). the conceptual model was operationalised in the construction phase through scalar decisions and item writing, resulting in a prototype. the prototype was subjected to a delphi process that provided expert validation. the panel of multidisciplinary experts in the delphi also represented different cultural groupings, providing a second opportunity for enhancing contextual sensitivity. the expert panel endorsed all scalar decisions, establishing face and content validity in only two rounds, which attests to the enhanced quality of the prototype resulting from the more rigorous conceptualisation process.
the pilot study used a robust design with a larger than recommended sample to establish construct validity through the combination of exploratory factor analysis (efa) and confirmatory factor analysis (cfa). exploratory factor analysis provided insight into sources of variance, whereas cfa tested the theoretical model underpinning the instrument. the theoretical model for the e3sr was adopted and revisions were recommended for further refinement. further refinement will draw on the proposed revisions and the revised instrument will be piloted with new samples. this will be operationalised through a postdoctoral study aimed at refinement and preparation of the manuals. significance of the study the present study contributed to the identified limited research conducted in south africa on test construction in general and the design of instruments to measure emotional social skills or competencies as a domain of school readiness in particular (e.g. bustin, 2007). the present study also contributed to addressing the lack of reliable and valid instruments, resulting from adaptation, poor test design or inadequate piloting (laher & cockcroft, 2014). this multi-method approach was not mixed methodology. it constituted methodological triangulation between theory and method. this increased the methodological rigour and enhanced the resultant screening tool. the multi-method approach could act as a blueprint or framework for test construction in education and psychology. clinicians usually use a variety of diagnostic and screening tools without an appreciation and acknowledgement of the methodological and conceptual underpinning of the instrument. research and development is often dictated by clinical interest and a focus on content. this study would assist clinicians with shifting to a more balanced position where they are able to use empirical methods in test construction. 
it also provides a way for clinicians to evaluate new and existing instruments in that the model highlights the important psychometric aspects one has to consider in selecting a test. the present study was a collaboration between the department of education and the university of the western cape, demonstrating the powerful results that could result from such collaborative initiatives. this study forged important stakeholder relationships that paved the way for further adaptation and refinement of the resultant screening tool, ongoing collaborative research and knowledge exchange, as well as knowledge translation of assessment principles and developmental milestones for the target group. ultimately, this process increases the likelihood of adoption into a variety of health professional practices. the operationalisation of devellis’s model through multiple methods might make the theoretical and methodological underpinnings of test construction accessible, understandable and easier to use in scale construction. this, in turn, can foster more effective use and application of instruments, and promote construction and adaptation studies. concluding remarks the methodological choices in the case study contributed to the establishment of a contextually appropriate screening tool designed in and for the south african context. the construction of the e3sr illustrated how various methodologies can be used to strengthen overall design. clear methodological processes with sound methodological decisions assist in enhancing the end product without compromising the process of research. it underscores the importance of explicit methodological decisions and the benefits of using theoretical frameworks. the four-phase study with the respective methodologies proved to be a thorough process that contributed to methodological rigour and coherence despite being time-consuming. 
the rigour of the empirical process followed during construction provided a strong foundation for the screening instrument that ultimately increased the confidence with which the instrument could be applied in practice. acknowledgements we thank the national research foundation (nrf) for financial support of the research project. the research has not been commissioned nor does it represent the opinions of the nrf. no conditions or prohibitions were placed on the study or dissemination protocol because of the funding. competing interests the authors declare that they have no financial or personal relationships that may have inappropriately influenced them in writing this article. authors’ contributions both authors participated in the conceptualisation, design, composition, writing and critical revision of the manuscript. funding this research was supported via two grants from the nrf. the first grant was awarded in the thutuka phd funding track from 2014–2016 and the second grant was awarded in the nrf sabbatical grant for completion of phd track in 2018. data availability statement data sharing is not applicable to this article as no new data were created or analysed in this study. disclaimer this research has not been commissioned nor does it represent the opinions of the nrf or any affiliated agency of the authors. references boulkedid, r., abdoul, h., loustau, m., sibony, o., & alberti, c. (2011). using and reporting the delphi method for selecting healthcare quality indicators: a systematic review. plos one, 6(6), e20476. https://doi.org/10.1371/journal.pone.0020476 braun, v., & clarke, v. (2006). using thematic analysis in psychology. qualitative research in psychology, 3(2), 77–101. https://doi.org/10.1191/1478088706qp063oa bustin, c. (2007). the development and validation of a social emotional school readiness scale (doctoral dissertation), university of the free state. creswell, j.w. (2007). 
qualitative inquiry and research design: choosing among five approaches. 2nd edn. thousand oaks, ca: sage. devellis, r.f. (2016). scale development: theory and applications (vol. 26). us, university of north carolina. los angeles, ca: sage. foxcroft, c.d. (2004). planning a psychological test in the multicultural south african context. south african journal of industrial psychology, 30(4), 8–15. https://doi.org/10.4102/sajip.v30i4.171 foxcroft, c.d. (2011). ethical issues related to psychological testing in africa: what i have learned (so far). online readings in psychology and culture, 2(2), 7. https://doi.org/10.9707/2307-0919.1022 foxcroft, c.d. (2013). developing a psychological measure. in c. foxcroft & g. roodt (eds.), introduction to psychological assessment in the south african context (4th edn., pp. 69–81). cape town: oxford university press. gough, d., oliver, s., & thomas, j. (eds.). (2017). an introduction to systematic reviews. london: sage. grant, m.j., & booth, a. (2009). a typology of reviews: an analysis of 14 review types and associated methodologies. health information and libraries journal, 26(2), 91–108. https://doi.org/10.1111/j.1471-1842.2009.00848.x hasson, f., keeney, s., & mckenna, h. (2000). research guidelines for the delphi survey technique. journal of advanced nursing, 32(4), 1008–1015. https://doi.org/10.1046/j.1365-2648.2000.t01-1-01567.x kline, p. (2015). a handbook of test construction (psychology revivals): introduction to psychometric design. london: routledge. laher, s., & cockcroft, k. (2014). psychological assessment in post-apartheid south africa: the way forward. south african journal of psychology, 44(3), 303–314. https://doi.org/10.1177/0081246314533634 liberati, a., altman, d.g., tetzlaff, j., mulrow, c., gøtzsche, p.c., ioannidis, j.p., & moher, d. (2009). the prisma statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. 
plos medicine, 6(7), e1000100. https://doi.org/10.1371/journal.pmed.1000100 munnik, e. (2018). the development of a screening tool for assessing emotional social competence in preschoolers as a domain of school readiness (doctoral dissertation). university of the western cape. retrieved from http://hdl.handle.net/11394/6099. novak, j.d., & cañas, a.j. (2006). the theory underlying concept maps and how to construct them (technical report no. ihmc cmap tools 2006-01). pensacola, fl: institute for human and machine cognition. pokharel, b. (2009). concept mapping in social research. tribhuvan university journal, 26(1), 1–6. smith, m.r., franciscus, g., swartbooi, c., munnik, e., & jacobs w. (2015). the sfs scoring system. in m.r. smith (ed., chair), symposium on methodological rigour and coherence: deconstructing the quality appraisal tool in systematic review methodology conducted at the 21st national conference of the psychological association of south africa, south africa. taguma, j. (2000). steps in test construction. paper presented at the annual meeting of the southwestern psychological association, 20–22 april, texas a&m university, dallas, tx. wardlaw, j.m. (2010). advice on how to write a systematic review. retrieved from http://www.sbirc.ed.ac.uk/documents/advice%20on%20how%20to%20write%20a%20systematic%20review.pdf. about the author(s) david j. schoeman department of psychology, faculty of humanities, university of pretoria, pretoria, south africa nafisa cassimjee department of psychology, faculty of humanities, university of pretoria, pretoria, south africa citation schoeman, d.j., & cassimjee, n. (2022). psychometric properties of the brief sailor resiliency scale in the south african army. african journal of psychological assessment, 4(0), a100.
https://doi.org/10.4102/ajopa.v4i0.100 original research psychometric properties of the brief sailor resiliency scale in the south african army david j. schoeman, nafisa cassimjee received: 30 jan. 2022; accepted: 23 aug. 2022; published: 26 oct. 2022 copyright: © 2022. the author(s). licensee: aosis. this is an open access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. abstract serving in the military is considered one of the most stressful occupations; therefore, because of the potential mitigation effect resilience has against stressors, it has often been incorporated as a component in predeployment programmes for soldiers. consequently, assessing, facilitating and sustaining resilience is of particular importance in military environments. the brief sailor resiliency scale (bsrs) has been utilised within the south african navy (san) environment, where it yielded promising results as a measure of resilience. the aim of this article is to investigate the psychometric properties of the bsrs and the applicability thereof to the south african army (sa army). the study utilised a sample of sa army soldiers (n = 418) that completed the bsrs along with the brunel mood scale (brums), emotion regulation questionnaire (erq) and the dispositional resilience scale – ii (drs-ii). the psychometric properties of the bsrs were examined through confirmatory factor analysis (cfa) and structural equation modelling (sem), together with construct validity and internal reliability. the model yielded acceptable fit, and the construct validity was supported with high internal reliability of the scales. findings provided confirmatory evidence for the application of the bsrs as a resilience screening tool in the sa army. 
the utilisation of the bsrs as a valid screening instrument, together with the aligned interventions, can potentially contribute substantially to the combat readiness of the sa army. keywords: assessment; resilience; intervention; sandf; military; measurement. introduction the brief sailor resiliency scale (bsrs) (van wijk & martin, 2019) assesses four domains of resilience: mental, physical, spiritual and social fitness. the bsrs has been utilised locally and specifically within the south african navy (san) environment, where it yielded promising results as a measure of resilience (van wijk & martin, 2019). although the south african army (sa army) and san have different operational environments, resilience is arguably a valued attribute that could enhance individual functioning in both environments. the sa army is routinely utilised for internal and external deployments, and combat readiness of the soldier is a key driver of performance in military environments. therefore, it would be beneficial to adopt a resilience measurement such as the bsrs that can aid in enhancing the combat readiness of the sa army. the aim of this article is to investigate the psychometric properties of the bsrs and the applicability thereof to the sa army. resilience in the military resilience has often been described and defined in terms of the ability to bounce back or thrive and withstand the effects of stressful events (connor & davidson, 2003; smith et al., 2008). although there is some debate regarding the term, most definitions include two aspects: positive adaptation and adversity (fletcher & sarkar, 2013). resilience has been broadly characterised as the ability to maintain healthy psychological and physiological functioning in the presence of high stress and trauma (wu et al., 2013). consequently, assessing, facilitating and sustaining resilience is of particular importance in military environments. 
the canadian armed forces (caf) define resilience as the capacity of a soldier to recover quickly, resist and possibly even thrive in the face of direct and indirect traumatic events and adverse situations in garrison, training and operational environments (hellewell & cernak, 2018). the australian defence force (adf) defines resilience as the capacity of individuals, teams and organisation to adapt, recover and thrive in situations of risk, challenge, danger, complexity and adversity (gilmore, 2016). although similar to the caf definition, the adf includes teams and the organisation, thus taking a wider system perspective of the construct. it is, however, apparent that resilience is a multifaceted construct, and the ability to not only cope but perform at the best of one’s ability is emphasised. the increased focus on performance has led to concepts such as hardiness and resilience becoming increasingly important in the development of a high-performing soldier (krueckel et al., 2020). hardiness is a personality style that has emerged as a composite of interrelated attitudes of commitment, control and challenge (maddi et al., 2009). hardiness consists of cognitions and attitudes which act as buffers against the negative effects of traumatic and severe life stressors on individual well-being (stoppelbein et al., 2017). research indicated that hardiness could enhance individual resilience through the protection it provides against the effects of stress on health and performance (bartone et al., 2022). these hardy attitudes have been associated with resilience and high performance in both civilian and military samples, specifically under a range of stressful conditions (hystad et al., 2011; maddi et al., 2009). the increase in nontraditional military tasks regularly performed by an army’s soldiers has further underlined the risky, challenging, dangerous, complex and adverse environments soldiers are exposed to (gilmore, 2016). 
for military personnel to be able to cope with the stress of modern military operations and other aspects of a military career, the importance of optimal psychological resilience cannot be overstated (kamphuis et al., 2012). resilience is argued to play a decisive role in performance outcomes, as a lack of resilience has been found to contribute to poor military results and performance (gilmore, 2016). van wijk and martin (2019) pointed out that specific operational environments as faced by san personnel can have deleterious effects on soldiers’ well-being. consequently, enhanced resilience has been highlighted as particularly beneficial for naval personnel when withstanding the rigours of military work and life. combat readiness of military personnel pertains to the level of preparedness, both psychologically and physically, through training and interventions aimed at enhancing an individual’s capability to execute specific military tasks successfully (shinga, 2016). therefore, combat readiness of a soldier not only pertains to an absence of ill-health symptoms but also to a state of well-being and an overall resilient state that would empower soldiers to perform optimally in demanding situations and environments. evaluating mood states could provide an indication of psychological distress (van wijk et al., 2013), with a positive affect state being beneficial for individual resilience (daphne, 2020). troy and mauss (2011) proposed that those with a higher internal emotional regulation ability are more likely to display resilience after adversity. although numerous emotional regulation strategies exist, troy and mauss (2011) proposed that the utilisation of cognitive reappraisal strategies leads to more adaptive and less negative emotional responses and, subsequently, higher resilience. furthermore, cultivating positive emotions may be particularly useful to build resilience to stressful events (tugade & fredrickson, 2007).
reappraisers have been reported as experiencing and expressing a higher level of positive emotions and fewer negative emotions than suppressors (gross & john, 2003). thus, understanding emotional states and implementing interventions focused on reappraisal strategies of positive emotions may enhance resilience when an individual encounters adversity. measures of changes in emotional regulation have proven to be a useful indicator of psychological adaptation in operational deployments (institute for maritime medicine, 2018). as adaptation is an outcome of resilience (van wijk & martin, 2019), measurement of mood states and emotion regulation strategies are useful indicators for determining individual resilience. similar to the san deployments that were investigated by van wijk and martin (2019), the sa army also deploys to areas that can be considered isolated, confined and extreme (ice) environments. internal deployments are usually of a 6-month duration, whilst external deployments are 1 year long, with the possibility of extensions depending on circumstances. internal deployments along the country’s border require soldiers’ involvement with various safeguarding activities. south african national defence force (sandf) members are deployed externally to various countries in a peacekeeping capacity. although military personnel deployed in this capacity experience stressors different from those engaging in active warfare, they are vulnerable to developing stress-related symptoms (platania et al., 2020). serving in the military is considered one of the most stressful occupations (de visser et al., 2016), with major stressors reported by externally deployed soldiers related to the following themes: support, vehicles and equipment, country-related circumstances and conditions and family (semmelink et al., 2020). 
typical experiences of soldiers included a perceived lack of support, shortage of equipment or apparel, inconsistent delivery of subsistence and sustainment, exposure to the extreme country-specific environments as well as interpersonal family-related stressors such as working away from home for extended periods of time. these themes are indicative of the isolation and extremity of the environment that a soldier experiences on deployment. resilience development has been incorporated as part of a predeployment programme for soldiers because of the potential mitigation effect it has for certain stressors associated with health and performance outcomes (bartone et al., 2022). traumatic responses to events are influenced by pre-exposure resilience (doody et al., 2019). research has indicated that resilience is negatively associated with post-traumatic stress and serves a moderating role between post-deployment stressors and the development of post-traumatic stress symptoms amongst soldiers (wooten, 2012). as the sa army is routinely involved in peacekeeping missions that often place great demands on the individual because of operation-related stressors (koopman & van dyk, 2012), the screening and enhancement of individual soldier resilience during the predeployment phase is likely to hold substantial benefits for individuals functioning on deployment. utilisation of resilience screening measures the increased focus in the military environment on performance-related constructs such as hardiness and resilience (krueckel et al., 2020) highlights the importance of an assessment tool that is relevant for use in the military. ensuring the highest level of own force combat readiness is contingent on valid and reliable assessments providing accurate measures of performance-related aspects. 
the utilisation of accurate and psychometrically sound performance-related measurements provides potential benchmarks from which training and development can be initiated in order to empower soldiers to confront and overcome challenges that inhibit optimal performance (madrigal et al., 2013). van wijk and martin (2019) alluded to the existence of many available assessments which are relatively effective in predicting resilience in the face of adversity; however, they acknowledged that these instruments are often not a good fit because of the unique environments certain soldiers function in. the four fitness domains of the bsrs stem from the united states air force definitions of the respective fitness domains (air force instruction, 2014). mental fitness relates to the individual’s ability to effectively cope with mental stressors and challenges. physical fitness pertains to the ability to adopt and sustain healthy behaviours needed to enhance individual health and well-being. social fitness is defined as the ability to engage in healthy social networks that promote overall well-being and performance. spiritual fitness refers to adherence to beliefs, principles or values needed to persevere and prevail in accomplishing missions. the four domains perspective of the bsrs links well with the multifaceted conceptualisation of resilience and predeployment screening and assessment of combat readiness of soldiers prior to deployment. this supports the implementation of baseline resilience interventions instituted by the applicable military mental health practitioners for those individuals who appear to be experiencing some fitness challenges in respective domain(s) (van wijk & martin, 2019). following van wijk and martin’s (2019) findings regarding the utility of the bsrs amongst the san, this article explores the psychometric properties of the bsrs as a screening instrument to assess individual soldier resilience in the sa army. 
valid screening coupled with the appropriate interventions would serve as a useful means to enhance individual resilience as well as the broader combat readiness status of the sa army. method participants a total of 418 sa army soldiers participated in the study, with the majority of the sample categorised as infantry soldiers and the remaining participants functioning in different support capacities, such as signallers, engineers and military police. convenience sampling was adopted in order to obtain the largest possible number of participants. table 1 illustrates the composition of the sample. all questionnaires were completed anonymously as personal indicators were not included, and all questionnaires were administered in english. table 1: sociodemographic characteristics of participants. measurements brief sailor resiliency scale the brief sailor resiliency scale (bsrs) (van wijk & martin, 2019) is an adapted form of the comprehensive airman fitness instrument developed by bowen et al. (2016). the only adaptations were a change to a five-point likert scale and a name change. all original items were retained in the adapted version. the instrument assesses four domains of resilience: mental, physical, spiritual and social fitness. the instrument consists of 12 items. each respondent provides a rating on each statement and item responses range from not at all (0) to completely (4). the sum of the scores obtained for each of the four scales yields a total fitness score. the bsrs has been utilised locally within the san environment and yielded satisfactory psychometric properties, with alpha coefficients ranging from 0.745 to 0.892 for the respective subscales (van wijk & martin, 2019). model fit indices of the san study also indicated an acceptable fit for the originally developed model (van wijk & martin, 2019).
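the scoring rule described above (12 items rated 0 to 4, four subscale sums, and a total fitness score) can be illustrated with a minimal sketch. the item-to-subscale mapping below is hypothetical: the text does not state which items belong to which domain, so three consecutive items per domain is assumed purely for illustration, and the function name score_bsrs is not part of the instrument.

```python
from typing import Dict, List

# Hypothetical mapping: the source gives 12 items and four domains but not
# which items belong to which domain. Three consecutive items per domain is
# an assumption made for illustration only.
SUBSCALE_ITEMS: Dict[str, List[int]] = {
    "mental": [0, 1, 2],
    "physical": [3, 4, 5],
    "social": [6, 7, 8],
    "spiritual": [9, 10, 11],
}


def score_bsrs(responses: List[int]) -> Dict[str, int]:
    """Sum item ratings (0-4) into four subscale scores and a total."""
    if len(responses) != 12:
        raise ValueError("expected 12 item responses")
    if any(r not in range(5) for r in responses):
        raise ValueError("each response must be an integer from 0 to 4")

    scores = {
        name: sum(responses[i] for i in items)
        for name, items in SUBSCALE_ITEMS.items()
    }
    # The total fitness score is the sum of the four subscale scores.
    scores["total_fitness"] = sum(scores.values())
    return scores


example = score_bsrs([3, 4, 2, 4, 4, 4, 0, 1, 2, 3, 3, 3])
```

under these assumptions each subscale ranges from 0 to 12 and the total fitness score from 0 to 48.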
dispositional resilience scale – ii the dispositional resilience scale – ii (drs-ii) (sinclair et al., 2003) is an 18-item questionnaire designed to measure psychological hardiness. the instrument provides results for six factors: control, powerlessness, commitment, alienation, challenge and rigidity. the instrument incorporates the traditional three factors of hardiness (control, commitment and challenge) as well as an additional three factors. the traditional three factors (control, commitment and challenge) are referred to as the positive dimensions, where higher scores indicate a greater resource in dealing with stress. the additional three dimensions (powerlessness, alienation, rigidity), referred to as the negative dimensions, indicate a greater vulnerability to stress; thus, a lower score on these dimensions would result in a greater degree of hardiness. respondents are provided statements and asked to indicate the extent they feel the statement is true. a five-point likert scale is provided that ranges from definitely false (1) to definitely true (5). the drs-ii was found to be applicable for utilisation on military samples, with validity and reliability analyses showing acceptable results on different international military samples (delahaij et al., 2010; sinclair et al., 2003). brunel mood scale the brunel mood scale (brums) (terry et al., 1999) was developed from the profile of mood states (mcnair et al., 1971). the brums measures six identifiable mood states through a self-report inventory, with respondents rating a list of 24 adjectives. the adjectives are words that describe feelings people have. respondents provide a rating on a five-point likert scale of how they had been feeling the previous week. item responses range from not at all (0) to extremely (4). the six factor-based subscales measured by the scale are: tension, depression, anger, vigour, fatigue and confusion. 
a total mood distress (tmd) score can also be computed by summing all the subscale scores except for the subscale vigour, which gets subtracted. higher scores on the respective subscales are thus indicative of greater prevalence of the mood state, and a higher tmd score would also then indicate greater mood distress. the instrument has also been utilised locally and specifically within the military, with norms developed on the south african population. reported alpha coefficients ranged from 0.66 to 0.89 for respective subscales (van wijk, 2011). the brums provides an indication of mood changes and has been utilised in the san as a self-reported post-traumatic stress symptoms indicator after deployment (van wijk et al., 2013). emotion regulation questionnaire the emotion regulation questionnaire (erq) was developed to measure two specific aspects related to emotion control: reappraisal and suppression (gross & john, 2003). respondents self-report how they feel about a statement revolving around their emotional experience and expression. respondents are provided with 10 statements and asked to indicate the extent they disagree or agree with each statement. a seven-point likert scale is provided that ranges from strongly disagree (1) to strongly agree (7). calculated total scores of the scales are thus indicative of the preferred strategy of emotional regulation and also provide an indication of the degree of utilisation thereof. the instrument has been utilised locally (ginton et al., 2022; nicholson et al., 2021) with a reported alpha coefficient of 0.85 for the instrument (nicholson et al., 2021). procedure the researcher collected most of the data by visiting respective military units across the country and administering the measurements described here. registered psychologists staffed in the sandf assisted the researcher with data collection when practical constraints limited accessibility. 
potential participants were informed through the official command channels of the arranged dates for data collection. this procedure was followed to ensure the maximum number of available participants. all participants were briefed about the aim of the study and the voluntary nature of participation, and written consent was also obtained before commencing with the data collection. data analysis data were screened for accuracy, outliers, missing values and normality (hair et al., 2010). questionnaires not completed correctly were removed from the analyses. minimum and maximum values were investigated for each item, and where discrepancies were detected, they were clarified and corrected by referring to the raw data. missing values resulted in the removal of the participants’ data for that particular instrument. following this process, a sample of 418 participants was retained for analyses. a confirmatory factor analysis (cfa), using structural equation modelling (sem) with maximum likelihood estimation of the bsrs four-factor model in line with the originally developed structure, was conducted to determine model fit on the sample. in terms of goodness-of-fit indicators for the models, the following measures (table 2) were used to determine the overall fit of the models (hooper et al., 2008; hu & bentler, 1999). table 2: goodness-of-fit indicators for the models. reliability estimates (cronbach’s alpha coefficients) were computed in order to evaluate internal consistency of the instrument. coefficients > 0.6 are generally regarded as acceptable (field, 2005; hair et al., 2010). construct validity was also assessed utilising bivariate correlations with the results from the bsrs, drs-ii, brums and erq scores. bivariate correlations were conducted only on available data where participants completed every question across all the instruments (n = 366). 
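as an illustration of the reliability estimate mentioned above, the following is a minimal sketch of cronbach's alpha using the standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores). the study itself used spss; the function name, the toy data and the use of sample variances here are assumptions for illustration, not the authors' procedure.

```python
from statistics import variance
from typing import List


def cronbach_alpha(rows: List[List[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items table of scores."""
    if len(rows) < 2 or len(rows[0]) < 2:
        raise ValueError("need at least two respondents and two items")
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of each respondent's summed score.
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Toy data (hypothetical): five respondents, three items rated 0-4.
toy = [[2, 3, 3], [4, 4, 3], [1, 2, 2], [3, 3, 4], [0, 1, 1]]
alpha = cronbach_alpha(toy)
acceptable = alpha > 0.6  # threshold cited in the text
```

the comparison against 0.6 mirrors the acceptability threshold cited in the text; other sources use stricter cut-offs.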
convergent validity, which indicates the degree to which two measures of the same concept are correlated (hair et al., 2019), was investigated utilising the correlation results between the bsrs scales and applicable scores from the other assessments used in the study. the data for this study were analysed using the statistical package for the social sciences (spss) version 23 (ibm corporation, armonk, new york, united states) in combination with amos graphics 22 (ibm corporation, armonk, new york, united states). ethical considerations the study received approval from the faculty of humanities postgraduate research ethics committee of the university of pretoria (reference number: hum20190107). approval for submission and publication of this article has been provided by defence intelligence (reference number: di/dds/r/3/7). written informed consent was also obtained from the participants. results normality distribution tests of univariate normality were conducted in order to investigate the distribution of the data. hair et al. (2010) argued that data are considered to be normal if the absolute skewness value is below 2 and absolute kurtosis value below 7; however, kline (2011) suggested an absolute value for skewness of below 3 and absolute kurtosis value below 10. only item 5 had values that did not meet the criteria suggested by kline (2011) (skewness = 3.01; kurtosis = 10.83), which indicated a negatively skewed distribution. this could potentially be attributed to individuals responding in a socially desirable manner, which supports the concern highlighted by bowen et al. (2016) that certain items lend themselves to individuals responding in a favourable or expected manner. item 5 forms part of the mental fitness subscale. a comparison of means for this subscale with the san sample does not indicate an observable difference. 
non-normality can have significant effects when the sample size is small (< 50); however, the impact effectively diminishes when the sample size reaches 200 or more participants (hair et al., 2019). in consideration of the above, it would appear that the distribution of responses is within acceptable parameters. descriptive statistics descriptive statistics for the bsrs total fitness score as well as respective scales are indicated in table 3. the mean scores and standard deviations for the sa army sample and the san sample are provided in table 3. the mean total fitness score for the sa army sample (38.8) was very similar to that of the san sample (38.3) (van wijk & martin, 2019), although the standard deviation of 7.8 in this sample was slightly higher than that of the san sample (s.d. = 6.4). table 3: brief sailor resiliency scale descriptive statistics. confirmatory factor analysis confirmatory factor analysis was conducted using sem with maximum likelihood estimation in line with the originally developed structure. an item factor loading indicates the strength of relationship between the item and the factor. all items loaded significantly on the respective factors (≥ 0.4) (field, 2005). in terms of the overall fit of the model, the chi-square statistic was found to be statistically significant with χ²(50) = 150.827, p < 0.05, suggesting poor fit of the hypothesised model. the chi-square statistic, however, is sensitive to sample size, with larger samples tending to yield a significant result (bentler & bonett, 1980; hair et al., 2010). further goodness-of-fit statistics were thus also investigated in order to assess the overall fit of the hypothesised model.
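to show how a sample-size-adjusted index relates to the chi-square result above, the following minimal sketch computes the root mean square error of approximation (rmsea) from the chi-square value, its degrees of freedom and the sample size. it assumes the common single-group point-estimate formula, sqrt(max(χ² − df, 0) / (df × (n − 1))); the function name is illustrative and this is not presented as the authors' own computation.

```python
import math


def rmsea(chi_square: float, df: int, n: int) -> float:
    """RMSEA point estimate for a single-group model.

    Assumes the common formula sqrt(max(chi2 - df, 0) / (df * (n - 1))).
    """
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n greater than 1")
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


# Values taken from the text: chi-square(50) = 150.827, n = 418.
value = rmsea(150.827, 50, 418)
ratio = 150.827 / 50  # chi-square to degrees-of-freedom ratio, about 3.02
```

with the values from the text this gives roughly 0.0695, which rounds to the rmsea of 0.070 reported for the model.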
in contrast to the chi-square statistic, the following fit indices suggested a good fit of the hypothesised model: root mean square error of approximation (rmsea) = 0.070; comparative fit index = 0.961; goodness-of-fit index = 0.944; standardised root mean square residual = 0.0485. therefore, the original model was supported by the goodness-of-fit indices. the cfa verifies relationships of observed variables and their latent constructs in the sa army sample. figure 1 portrays the validated original structure of the bsrs of this study on the sa army sample. figure 1 portrays the respective subscales (mental, physical, social, spiritual) as second-order factors, which load onto a single higher-order factor (total fitness). figure 1: brief sailor resiliency scale factor structure. reliability in this study, the bsrs total fitness scale produced a cronbach’s alpha of 0.886. furthermore, all the fitness subscales (mental = 0.733; physical = 0.819; social = 0.862; spiritual = 0.875) were found to have good internal consistency and reliability (> 0.6) (field, 2005; hair et al., 2010). a comparison of cronbach’s alpha coefficients with the san sample is provided in table 4. alphas were found to be very similar in comparison to the san sample, which is indicative of consistency of the bsrs instrument across samples. alphas for the other instruments utilised in the study are also included in table 5. table 4: brief sailor resiliency scale reliability estimates – cronbach’s alpha coefficients. table 5: brief sailor resiliency scale construct validity coefficients and cronbach’s alphas. construct validity correlations between the bsrs first- and second-order factors were all significant, in accordance with the theoretical model as depicted in figure 1. the correlations between the bsrs and the other instruments utilised in the study are presented in table 5.
the bsrs total fitness score showed a significant positive association with the emotional regulation strategy of reappraisal (r = 0.15), although no significant relationship was found between the emotional regulation strategy of suppression and total fitness. positive hardiness factors were significantly correlated with bsrs total fitness (r = 0.36), together with all the bsrs subscales, except for the physical fitness scale. the bsrs total fitness score showed a significant negative correlation with the brums-tmd (r = −0.35), as well as with the negative hardiness factors assessed (r = −0.28). these results reflect the resilience and positive emotional regulation strategies utilised by the participants. a comparison of correlations between the bsrs scales and the brums-tmd score pertaining to the sa army and san samples indicated a similar trend, although in some cases the sa army sample correlations were not as strong compared with the san sample (table 6). in both samples, the strongest correlation manifested between the brums-tmd score and the mental fitness subscale, followed by the total fitness score. the similar trend and correlations between the samples is indicative of generalisability of the instrument also to the sa army. table 6: brief sailor resiliency scale and brunel mood scale correlations: south african navy† and south african army sample comparison. discussion the findings of the study provide preliminary validation results for the utilisation of the bsrs in the sa army. furthermore, the findings provide confirmatory validation of the originally developed factor structure along with the internal reliability of the scales (bowen et al., 2016). findings of this study also confirmed the construct validity in accordance with results reported for the san (van wijk & martin, 2019). 
soldiers deployed to dangerous, volatile environments confront numerous operational and performance stressors, and resilience has been established as a buffer for mitigating the stress induced by modern military operations and challenges unrelated to combat (kamphuis et al., 2012). resilience to the effects of stress is vital for maintaining performance and maintaining readiness for deployment (de visser et al., 2016). de visser et al. (2016) also argued that experienced military personnel may be able to mitigate and even utilise stress productively, which is indicative of resilience against the effect thereof. predicting successful adaptations in arduous deployment conditions holds both occupational and operational combat readiness benefits for soldiers (nindl et al., 2018). application of the bsrs in the sandf to assess individual resilience, coupled with appropriate interventions (if needed), could address areas of concern in a predeployment phase and be combined with a mid-deployment assessment as part of continuous monitoring, along with a post-deployment assessment in order to identify any domains for further intervention. the multifactorial nature of military stress (beckner et al., 2021), the performance correlates of resilience (georgoulas-sherry & kelly, 2019) and pre-emptive demands for evolving strategies of adaptation and adjustment to volatile environments foster a need for a multiphase dynamic assessment model. the bsrs addresses the unique reported areas of stress a deployed sandf soldier might experience. major stressor themes related to deployment environments and interpersonal or family relations (semmelink et al., 2020) align well with the bsrs scales of social and physical fitness. the mental fitness scale displays face validity of fostering the right cognitive and psychological outlook in order to deal well with unexpected challenges on deployment. 
physical fitness relates to physiological health and psychological resilience (nindl et al., 2018). the use of the instrument in the operational environment in a screening capacity could potentially highlight areas of concern for early intervention, which may fall beyond the scope of routine psychological screenings. as emotional regulation ability is known to influence mood and resilience, an integrative approach for resilience enhancement training is warranted (troy & mauss, 2011). certain cognitive emotional regulation strategies such as refocus on planning and positive reappraisal have been found to increase resilience among individuals with mood disorders (min et al., 2013). therefore, resilience predeployment training could integrate the training of certain regulation techniques in order to enhance resilience, sustain optimal performance and mitigate the impact of emotional dysregulatory predictors of stress-related disorders (platania et al., 2020). the results from this study suggest a focus on developing reappraisal as an emotional regulation strategy, as a positive significant correlation was found with resilience, whereas suppression did not yield a significant correlation. furthermore, positive hardiness factors showed stronger correlations than the negative factors with resilience and are potentially indicative of a focus point for resilience enhancement interventions. in conclusion, the bsrs provides the user with an assessment tool that can be utilised to promote and sustain resilience and contribute to the achievement and maintenance of a mission-ready force (bowen et al., 2016).

limitations and future research

as the bsrs displayed good psychometric properties, further research is needed on the use of the bsrs as a screening instrument in conjunction with relevant interventions and evaluation of interventions.
although the bsrs displayed adequate psychometric properties and provides a brief and accurate evaluation of individual resilience in terms of four different facets of resilience, the researcher is of the opinion that one should apply caution when interpreting the result of the social scale. items from the social scale pertain to family, unit or workplace members and friends, thus providing a general indication of social domain fitness. for intervention purposes, a more specific indication would perhaps be more beneficial. as family dynamics were reported as a major stressor on deployment, expansion of the social scale into different subcategories might especially be beneficial for application in the sandf. an expanded scale could assist the mental health professional with a clearer picture of the area of concern for adequate intervention planning. the utilisation of the drs-ii in this study was also a limitation, and further research with the drs-ii is proposed. to date, no published research referencing the validation of the drs-ii on a south african sample could be found by the researcher; consequently, numerous aspects were taken into consideration before the inclusion thereof. the drs-ii has been validated on other international military samples (delahaij et al., 2010; sinclair et al., 2003). the drs-ii results from refinements made to bartone’s dispositional resilience scale (bartone et al., 1989), which has been utilised on south african samples. both these instruments are an adaptation of the personal views survey (hardiness institute, 1985). furthermore, investigation of the drs-ii’s psychometric properties supported the six-factor model as proposed by the developers (sinclair et al., 2003). in the light of these considerations, the drs-ii was included in the study. 
as the drs-ii yielded favourable results, it is recommended that a separate study should be conducted on larger south african samples in order to further investigate the psychometric properties of this instrument. following from the results derived from two arms of services from the sandf (san and sa army), psychometric properties should be further investigated on samples from the other arms of services (south african military health service and south african air force) for potential utilisation of the bsrs across the broader sandf.

conclusion

the findings of the study provided preliminary confirmatory evidence for the application of the bsrs as a resilience screening tool in the sa army and support the application of the bsrs as a tool to screen and stream individuals (van wijk & martin, 2019). the utilisation across the sandf warrants further investigation, as it has the potential to make a significant contribution to combat readiness of soldiers and the implementation of multiphase intervention strategies.

acknowledgements

competing interests

the authors declare that they have no financial or personal relationships that may have inappropriately influenced them in writing this article.

authors’ contributions

all authors contributed equally to this work through design and implementation of the research, as well as the preparation of the manuscript.

funding information

this research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.

data availability

the data supporting the findings of this study is from a south african national defence force sample and therefore not available.

disclaimer

the views and opinions expressed in this article are those of the authors and do not necessarily reflect the official policy or position of any affiliated agency of the authors.

references

air force instruction 90 – 506 (2014). comprehensive airman fitness (caf). department of the us air force.
bartone, p.t., mcdonald, k., hansma, b.j., stermac-stein, j., escobar, e.m.r., stein, s.j., & ryznar, r. (2022). development and validation of an improved hardiness measure: the hardiness resilience gauge. european journal of psychological assessment. https://doi.org/10.1027/1015-5759/a000709 bartone, p., ursano, r., wright, k., & ingraham, l. (1989). the impact of a military air disaster on the health of assistance workers. the journal of nervous and mental disease, 177(6), 317–328. https://doi.org/10.1097/00005053-198906000-00001 beckner, m.e., main, l., tait, j.l., martin, b.j., conkright, w.r., & nindl, b.c. (2021). circulating biomarkers associated with performance and resilience during military operational stress. european journal of sport science, 22(1), 72–86. https://doi.org/10.1080/17461391.2021.1962983 bentler, p.m., & bonett, d.g. (1980). significance tests and goodness of fit in the analysis of covariance structures. psychological bulletin, 88(3), 588–606. https://doi.org/10.1037/0033-2909.88.3.588 bowen, g.l., jensen, t.m., & martin, j.a. (2016). a measure of comprehensive airman fitness: construct validation and invariance across air force service components. military behavioral health, 4(2), 149–158. https://doi.org/10.1080/21635781.2015.1133345 connor, k.m., & davidson, j.r.t. (2003). development of a new resilience scale: the connor-davidson resilience scale (cd-risc). depression and anxiety, 18(2), 76–82. https://doi.org/10.1002/da.10113 daphne, p. (2020). positive affect and mindfulness as predictors of resilience amongst women leaders in higher education institutions. south african journal of human resource management, 18, 10. https://doi.org/10.4102/sajhrm.v18i0.1260 delahaij, r., gaillard, a., & van dam, k. (2010). hardiness and the response to stressful situations: investigating mediating processes. personality and individual differences, 49(5), 386–390. 
https://doi.org/10.1016/j.paid.2010.04.002 de visser, e.j., dorfman, a., chartrand, d., lamon, j., freedy, e., & weltman, g. (2016). building resilience with the stress resilience training system: design validation and applications. work, 54(2), 351–366. https://doi.org/10.3233/wor-162295 doody, c.b., robertson, l., uphoff, n., bogue, j., egan, j., & sarma, k.m. (2019). pre-deployment programmes for building resilience in military and frontline emergency service personnel. the cochrane database of systematic reviews, 2019(1), cd013242. https://doi.org/10.1002/14651858.cd013242 field, a.p. (2005). discovering statistics using ibm spss statistics (2nd edn.). sage publications. fletcher, d., & sarkar, m. (2013). psychological resilience a review and critique of definitions, concepts, and theory. european psychologist, 18(1), 12–23. https://doi.org/10.1027/1016-9040/a000124 georgoulas-sherry, v., & kelly, d.r. (2019). resilience, grit, and hardiness: determining the relationships amongst these constructs through structural equation modeling techniques. journal of positive psychology & wellbeing, 3(2), 165–178. retrieved from https://journalppw.com/index.php/jpsp/article/view/90 gilmore, p.w. (2016). leading a resilient force: insights of an australian general. army research paper, p. 11. retrieved from https://researchcentre.army.gov.au/sites/default/files/161107_gilmore_resilient.pdf ginton, l.m., vuong, e., lake, m.t., nhapi, r.t., zar, h.j., yrttiaho, s., & stein, d.j. (2022). investigating pupillometry to detect emotional regulation difficulties in post-traumatic stress disorder. the world journal of biological psychiatry: the official journal of the world federation of societies of biological psychiatry, 23(2), 127–135. https://doi.org/10.1080/15622975.2021.1935316 gross, j.j., & john, o.p. (2003). individual differences in two emotion regulation processes: implications for affect, relationships, and well-being. 
journal of personality and social psychology, 85(2), 348–362. https://doi.org/10.1037/0022-3514.85.2.348 hair, j.f., black, w.c., babin, b.j., & anderson, r.e. (2010). multivariate data analysis (7th edn.). pearson education limited. hair, j.f., black, w.c., babin, b.j., & anderson, r.e. (2019). multivariate data analysis (8th edn.). pearson education limited. hardiness institute. (1985). personal views survey. the hardiness institute. hellewell, s.c., & cernak, i. (2018). measuring resilience to operational stress in canadian armed forces personnel. journal of traumatic stress, 31(1), 89–101. https://doi.org/10.1002/jts.22261 hooper, d., coughlan, j., & mullen, m. (2008). structural equation modelling: guidelines for determining model fit. electronic journal of business research methods, 6(1), 53–60. retrieved from https://core.ac.uk/download/pdf/297019805.pdf hu, l., & bentler, p.m. (1999). cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. structural equation modeling: a multidisciplinary journal, 6(1), 1–55. https://doi.org/10.1080/10705519909540118 hystad, s.w., eid, j., laberg, j.c., & bartone, p.t. (2011). psychological hardiness predicts admission into norwegian military officer schools. military psychology, 23(4), 381–389. https://doi.org/10.1080/08995605.2011.589333 institute for maritime medicine. (2018). usefulness of the brums for mobilisation/demobilisation of ship-based maritime operations. technical report 14 december 2018. institute for maritime medicine. kamphuis, w., venrooij, w., & van den berg, c. (2012). a model of psychological resilience for the netherlands armed forces. retrieved from https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.1086.9456&rep=rep1&type=pdf kline, r.b. (2011). principles and practice of structural equation modeling (5th edn.). the guilford press. koopman, r., & van dyk, g.a.j. (2012). 
peacekeeping operations and adjustment of soldiers in sudan: peace in the minds and hearts of soldiers? african journal on conflict resolution, 12(3), 53–76. krueckel, o., heidler, a., von luedinghausen, n., auschek, m., & soet, m. (2020). building resilience and hardiness in military leaders – robustness training programs of the german army. in u. khumar (ed.), the routledge international handbook of military psychology and mental health (pp. 151–163). routledge. maddi, s.r., harvey, r.h., khoshaba, d.m., fazel, m., & resurreccion, n. (2009). the personality construct of hardiness, iv. journal of humanistic psychology, 49(3), 292–305. https://doi.org/10.1177/0022167809331860 madrigal, l., hamill, s., & gill, d. (2013). mind over matter: the development of the mental toughness scale (mts). the sport psychologist, 27(1), 62–77. https://doi.org/10.1123/tsp.27.1.62 mcnair, d.m., lorr, m., & droppleman, l.f. (1971). manual for the profile of mood states. educational and industrial testing services. min, j.a., yu, j.j., lee, c.u., & chae, j.h. (2013). cognitive emotion regulation strategies contributing to resilience in patients with depression and/or anxiety disorders. comprehensive psychiatry, 54(8), 1190–1197. https://doi.org/10.1016/j.comppsych.2013.05.008 nicholson, l.r., lewis, r., thomas, k.g.f., & lipinska, g. (2021). influence of poor emotion regulation on disrupted sleep and subsequent psychiatric symptoms in university students. south african journal of psychology, 51(1), 6–20. https://doi.org/10.1177/0081246320978527 nindl, b.c., billing, d.c., drain, j.r., beckner, m.e., greeves, j., groeller, h., teien, h.k., marcora, s., moffitt, a., reilly, t., taylor, n.a.s., young, a.j., & friedl, k.e. (2018). perspectives on resilience for military readiness and preparedness: report of an international military physiology roundtable. journal of science and medicine in sport, 21(11), 1116–1124. 
https://doi.org/10.1016/j.jsams.2018.05.005 platania, s., castellano, s., petralia, m.c., digrandi, f., coco, m., pizzo, m., & di nuovo, s. (2020). the moderating effect of the dispositional resilience on the relationship between post-traumatic stress disorder and the professional quality of life of the military returning from the peacekeeping operations. mediterranean journal of clinical psychology, 8(3), 1–21. https://doi.org/10.6092/2282-1619/mjcp-2560 semmelink, d.s., matebula, t.t., & ngwenya, m. (2020). a post deployment investigation into the experiences of soldiers in the drc. unpublished research report. human factor combat readiness department, military psychological institute. shinga, d.n. (2016). factors involved in combat readiness in africa. in g. van dyk (ed.), military psychology for africa (pp. 261–287). african sun press. retrieved from http://www.iamps.org/papers/shinga_factors%20involved%20in%20cr%20in%20africa.pdf sinclair, r.r., oliver, c.m., ippolito, j., & ascalon, e. (2003). development and validation of a short measure of hardiness. portland state university. retrieved from http://www.dtic.mil/dtic/tr/fulltext/u2/a562799.pdf smith, b.w., dalen, j., wiggins, k., tooley, e., christopher, p., & bernard, j. (2008). the brief resilience scale: assessing the ability to bounce back. international journal of behavioral medicine, 15, 194–200. https://doi.org/10.1080/10705500802222972 stoppelbein, l., mcrae, e., & greening, l. (2017). a longitudinal study of hardiness as a buffer for posttraumatic stress symptoms in mothers of children with cancer. clinical practice in pediatric psychology, 5(2), 149–160. https://doi.org/10.1037/cpp0000168 terry, p.c., lane, a.m., lane, h.j., & keohane, l. (1999). development and validation of a mood measure for adolescents. journal of sports sciences, 17(11), 861–872. https://doi.org/10.1080/026404199365425 troy, a.s., & mauss, i. (2011).
resilience in the face of stress: emotion regulation as a protective factor. in s.m. southwick, b.t. litz, d. charney & m.j. friedman (eds.). resilience and mental health: challenges across the lifespan (pp. 30–34). cambridge university press. tugade, m.m., & fredrickson, b.l. (2007). regulation of positive emotions: emotion regulation strategies that promote resilience. journal of happiness studies: an interdisciplinary forum on subjective well-being, 8(3), 311–333. https://doi.org/10.1007/s10902-006-9015-4 van wijk, c.h. (2011). the brunel mood scale: a south african norm study. south african journal of psychiatry, 17(2), 44–54. https://doi.org/10.4102/sajpsychiatry.v17i2.265 van wijk, c.h., & martin, j.h. (2019). a brief sailor resilience scale for the south african navy. african journal of psychological assessment, 1(1), 1–8. https://doi.org/10.4102/ajopa.v1i0.12 van wijk, c.h., martin, j.h., & hans-arendse, c. (2013). clinical utility of the brunel mood scale in screening for post-traumatic stress risk in a military population. military medicine, 178(4), 372–376. https://doi.org/10.7205/milmed-d-12-00422 wooten, n. (2012). deployment cycle stressors and post-traumatic stress symptoms in army national guard women: the mediating effect of resilience. social work in health care, 51(9), 828–849. https://doi.org/10.1080/00981389.2012.692353 wu, g., feder, a., cohen, h., kim, j.j., calderon, s., charney, d.s., & mathé, a.a. (2013). understanding resilience. frontiers in behavioral neuroscience, 7, 10. https://doi.org/10.3389/fnbeh.2013.00010

about the author(s) chevon p.
haarhoff department of psychology, college of human sciences, university of south africa, pretoria, south africa christi gadd private, pretoria, south africa boshadi semenya department of psychology, college of human sciences, university of south africa, pretoria, south africa rené van eeden department of psychology, college of human sciences, university of south africa, pretoria, south africa citation haarhoff, c.p., gadd, c., semenya, b., & van eeden, r. (2020). standardising the single and double letter cancellation test for south african military personnel. african journal of psychological assessment, 2(0), a19. https://doi.org/10.4102/ajopa.v2i0.19 original research standardising the single and double letter cancellation test for south african military personnel chevon p. haarhoff, christi gadd, boshadi semenya, rené van eeden received: 18 oct. 2019; accepted: 27 mar. 2020; published: 08 june 2020 copyright: © 2020. the author(s). licensee: aosis. this is an open access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. abstract neuropsychological testing is widely used for specialised placements within the military. within the south african national defence force (sandf), there is concern about the representation of the normative information currently available for these tests. the letter cancellation test, a paper-and-pencil-based test used as a quick measure of attention, is subject to unstandardised administration and scoring procedures as well as broad cut-off scores. the aim of this study was to develop detailed administration and scoring procedures for the single and double letter cancellation test and to provide preliminary normative data on these versions of the test in the sandf. a non-probability sampling strategy resulted in a sample of 292 participants. 
normative data are provided for the total sample and classified into three performance categories: omissions, errors and time. between-group comparisons indicated gender and age-related differences (but no differences for rank) in terms of time, and normative data are therefore also provided for related subgroups. keywords: attention; concentration; distractor stimuli; target stimuli; military; letter cancellation test. introduction neuropsychological assessment is integral to clinical work (lucas, 2013) and also forms part of test batteries used in organisations such as mining, manufacturing, construction and the military. psychological testing within the military has become invaluable in the assessment and preparation of its personnel. nwafor and adesuwa (2014) described the use of psychological testing within the military context as a process that takes place on a continuum, starting from recruitment where an individual is assessed, to job utilisations for promotions and placements, to special missions and the diagnosis and treatment of disorders, and this continues until their retirement. within the south african military context, its personnel perform a wide array of functions and occupational duties each of which has its own specific requirements and criteria. attention was highlighted as a central neurocognitive skill that is necessary for highly specialised occupational duties as well as simple everyday functions in the military. kennedy and zillmer (eds. 2012) stated that soldiers are required to ‘maintain high levels of consistent attention and concentration in order to perform effectively and safely’ (p. 199). even when preparing for the start of a day, the command of ‘attention’ is often given by the commander in order to make all soldiers focus on their duties for the day. it is therefore a standard practice to include a measure of attention as part of an assessment battery. 
one of the major concerns in the field of psychological testing for this context is, however, ensuring that the normative data are representative in terms of military personnel. military personnel take up many different roles, such as piloting, weapon handling, medical work and deployment. it is therefore essential that there are tests available that can help evaluate attention and concentration in order to ensure that the individuals are competent enough to carry out their specific duties. currently, related tests are used in the military for specialised career placements. these tests are also used as part of the soldier’s rehabilitation processes. even slight impairments in attention and concentration, which can be a result of traumatic brain injury, can have substantial repercussions for a soldier’s effectiveness while on duty or in combat during the recovery period (hatta, yoshizaki, ito, mase, & kabasawa, 2012; kennedy & zillmer, 2012). attentional disorders (e.g. attentional deficit and hyperactivity disorder, perseveration and distractibility, confusional state and visual neglect), if undetected or not treated, would also impact on effective functioning in this context. for example, risk factors associated with attentional deficit and hyperactivity disorder include:

[s]lowed information processing; error proneness secondary to lapses of judgement, impulsivity, and poor problem-solving skills; limited capacity to multitask or perform when divided attention is required; difficulty with set shifting; problems with attention to detail; and difficulties with task organization. (eds. kennedy & moore, 2010, p. 208)

in a classic quote by psychologist william james, attention is defined as processing ‘one out of what seem several simultaneously possible objects or trains of thought … it implies withdrawal from some things in order to deal effectively with others’ (james, 1890, pp. 403–404).
a person’s capacity for paying attention to daily activities is crucial for the successful completion of everyday tasks (lezak, howieson, & loring, 2004). in order for individuals to function effectively, they need the ability to focus on the task at hand while simultaneously ignoring other distracting factors. this ability requires them to filter, select, focus, shift and track information (groth-marnat, 2009). stankov (1988) identified six components of attention:

attentional span refers to the size of an individual’s capacity to hold information in mind to allow processing.
concentration encompasses ‘the capacity to sustain attention on relevant stimuli and the capacity to ignore irrelevant competing stimuli’ (scott, 2011, p. 149). concentration requires sustained focus on a task over a period of time.
search speed refers to the time of target selection when visually searching through a series of items for an identified target, or to detect similarities or differences (cohen, 2013).
divided attention refers to the ability to respond to more than one stimulus at a time (baron, 2004). in everyday language, we may refer to this ability as multi-tasking.
selective attention refers to attending to certain stimuli while disregarding other irrelevant stimuli (glisky, 2007).
attention switching refers to the capacity to ‘consciously reallocate attentional resources from one activity to another’ (hebben & milberg, 2009, p. 108).

as attention consists of a variety of processes, a comprehensive test of attention would consequently measure a range of these processes. based on a review of existing literature, coetzer and balchin (2014), lezak, howieson, bigler and tranel (2012), mirsky, anthony, duncan, ahearn and kellam (1991) and stump (2002) recommended letter cancellation tests as comprehensive measures of attention. letter cancellation tests can also be considered as screening tests. hatta et al.
(2012) stated that cancellation tests are simple, yet effective measures of attention as they are cost-effective and applicable over a wide spectrum of age groups. cancellation tests are usually paper-and-pencil tests where an individual needs to identify and cancel target items (azouvi et al., 2006). most cancellation tests consist of target stimuli that are distributed amongst distractor stimuli. the target stimulus is the identified symbol or letter that the individual needs to identify and cancel, while the distractor stimuli aim to divert the individual’s attention from the target stimulus. performance is scored by recording the number of omissions, errors and time taken to complete the test (lezak et al., 2004). the use of cancellation tests has been extensively documented in neuropsychological literature as measures of:

visual selectivity and sustained attention (lezak et al., 2004; mitrushina, boone, razini, & d’elia, 2005)
processing speed, perceptual speed and visuomotor ability (mccrea & robinson, 2011)
‘visual selectivity at fast speed with a repetitive motor response’ (lezak et al., 2004, p. 378) when timed.

a range of cancellation tests are discussed in literature including line bisection tests, symbol cancellation tests and letter cancellation tests. each test differs in terms of the stimuli used and the method of administration and scoring. letter cancellation tests, the focus of this study, make use of a certain letter(s) as target stimuli that are distributed in columns and rows amongst other letters that serve as the distractor stimuli. examinees are then required to cancel the target letter(s) distributed amongst the distractor letters. there are various types of letter cancellation tests, for instance, the single letter cancellation test, double letter cancellation test (lezak et al., 2004) and six letter cancellation test (pradhan, 2013). this study focussed on the single and double letter cancellation tests.
studies on the standardisation of the letter cancellation test have been conducted in different contexts. amongst others, these include the original development for the 1946 birth cohort study in a british context (richards, kuhn, hardy, & wadsworth, 1999), the use of the letter cancellation test on american samples (uttl & pilkenton-taylor, 2001; warren, moore, & vogtle, 2008) and normative data for indian school-going children (pradhan & nagendra, 2008). these studies all differ in terms of the administration and scoring instructions utilised. the differing administration and scoring instructions pose a challenge to the reliable use of the letter cancellation test as administration with differing instructions alters the quality of the responses by the participants, thus compromising comparability of test results (groth-marnat, 2009). each of these studies also developed differing sets of normative data. in addition, no south african standardisations were found. the letter cancellation test has been used in research studies in south africa (e.g. jossub, cassimjee, & cramer, 2017); however, to date, there have been no studies focussing on the suitability of the test for south african populations. practitioners and academics in the south african context indicate that lezak et al.’s (2012) international guideline that ‘normal performance limits have been defined as 0-2 omissions in 120 seconds’ (p. 381) is used. this, however, provides a vague description of the scores, was developed on an international platform and does not allow for consideration of the impact of south african socio-demographic variables on test performance. according to nell (2000) and shuttleworth-edwards (2016) neuropsychological tests without relevant normative data place clinicians at risk of misdiagnosing their patients.
anderson (2001) argued that, ‘the injudicious use of imported normative data could result in an unacceptably high diagnostic rate of neuropsychological impairment in otherwise healthy south africans’ (p. 33). in particular, no normative data are available which provide for the context-specific demographics and skills profile of the south african military environment. the letter cancellation test is a paper-and-pencil test that may prove beneficial as a quick measure of attention (pradhan & nagendra, 2008) for military personnel. given that crucial decisions are made using test results, appropriate normative data are essential to ensure fairness. therefore, this study set out to standardise the letter cancellation test for military personnel in the south african national defence force (sandf), by:

constructing standardised administration and scoring procedures for testing
investigating the influence of demographic variables with a view to establishing subgroup normative data for military personnel
establishing preliminary normative data for a sample of military personnel.

method

participants

the target population comprised military personnel in the sandf. non-random (voluntary) sampling was used, resulting in an initial selection of 300 participants. the sample comprised people who were multilingual. demographic variables of interest were age (the majority of the military personnel are 18–49 years old), gender (approximately 30% of the population are female and 70% male) and rank (15% are officers and 85% non-commissioned officers) (defence web, 2011; martin, 2015). the latter refers to the level of seniority in terms of military rank and is regarded as relevant to assessment-related research conducted in the sandf. the majority of the participants were right-handed (93.8%) – handedness is a variable of importance when conducting neuropsychological tests.
level as well as quality of education has been shown to influence neuropsychological test performance (lucas, 2013), especially in the case of cognitive batteries with a higher level of complexity. in the present sample, 97% of the participants completed grade 12 and 38% obtained further qualifications. education was therefore not regarded as a challenge considering the nature of cancellation tasks (see brucki & nitrini, 2008). individuals with a history of attention or neurological disorders, and those with visual impairments were excluded to limit confounding variables that might impact on testing performance. participants were also screened for current use of chronic medication that might impact on their performance. the resulting sample comprised 292 participants. representation in terms of age, gender and rank is illustrated in table 1. table 1: sample frequencies: age, gender and rank (n = 292). instruments the aim of this study was to develop standardised administration and scoring procedures for the letter cancellation test before establishing normative data on the test. two trials of the letter cancellation test were constructed for the data collection of this study, namely the single (h) letter cancellation test and the double (ce) letter cancellation test. this was done to establish normative data for simple and double mental tracking. currently, the existing h letter cancellation test used to assess single mental tracking, which is presented in the work of lezak et al. (2004), is made up of two parts. existing scoring procedures present the two parts of the letter cancellation test, with an overall score and total number of errors and omissions (lezak et al., 2004; pradhan & nagendra, 2008; uttl & pilkenton-taylor, 2001). because tests of sustained attention require prolonged tasks, modifications were made to the number of parts when compiling both the single and double letter cancellation tests for use in collecting empirical data for this study. 
the length of both the h and ce letter cancellation tests was expanded from two to six parts to allow for the assessment of sustained attention. the formats of the parts are consistent: a group of letters arranged in the same number of lines. in order to ensure uniform administration of the letter cancellation test, the instructions were documented in text and the test administrators were required to read them out verbatim so that the testing instructions remained consistent. clear and detailed instructions were provided on how to complete the test. firstly, participants were instructed to scan the test from left to right, and then to go down one row at a time following the same scanning process, and to cancel targets by striking out the specified letter using a pencil. secondly, they were told that their performance on the test would be timed and that they were required to work as quickly as they could. they were also informed that there was no specific time limit imposed on how long they should take to complete the test. lastly, participants were informed that they would be completing two trials of the test. additionally, a scoring profile was created so that the scoring remained consistent, thus enhancing the integrity of the study. this document was constructed to record participants’ time and performance in each part of the test. test administrators were instructed to record, for each part, the time taken to complete the task (in seconds), the number of errors made (i.e. non-target items erroneously identified), the number of omitted letters (i.e. target items not identified) and any self-correcting attempts, in order to establish what is significant and what is not. a second scoring sheet was included for recording notable qualitative observations made during testing.
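the scoring profile described above can be pictured as a small record per part, summed per trial. the sketch below is illustrative only: the class and field names are hypothetical, the values are invented, and it is not the scoring sheet used in the study.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PartScore:
    """Scores for one part of a trial (hypothetical field names)."""
    time_seconds: float        # time taken to complete the part
    errors: int                # non-target letters erroneously cancelled
    omissions: int             # target letters that were not cancelled
    self_corrections: int = 0  # self-correcting attempts observed


@dataclass
class TrialProfile:
    """All parts of one trial, e.g. the H or the CE trial."""
    trial: str
    parts: List[PartScore] = field(default_factory=list)

    def totals(self) -> Dict[str, float]:
        """Sum each scoring category across the recorded parts."""
        return {
            "time_seconds": sum(p.time_seconds for p in self.parts),
            "errors": sum(p.errors for p in self.parts),
            "omissions": sum(p.omissions for p in self.parts),
            "self_corrections": sum(p.self_corrections for p in self.parts),
        }


if __name__ == "__main__":
    # Invented values for two of the six parts, for illustration only.
    profile = TrialProfile("H", [PartScore(30.0, 0, 1), PartScore(32.5, 1, 0, 1)])
    print(profile.totals())
```

keeping the per-part records, rather than only a grand total, mirrors the study's aim of reporting time, errors and omissions for each of the six parts.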
this study, therefore, provides test scores for each of the six parts of the h and ce letter cancellation tests in terms of time, error and a total score, a significant improvement on earlier scoring procedures. the proposed detailed scoring aims to provide clinicians with more comprehensive information on the letter cancellation test, and to further aid assessment and diagnostic practices.

procedures

all sandf members have their health status examined annually. appointments are made on a random basis, implying that at any given period, representation in terms of the specified stratification variables (age, gender, rank) could be expected amongst those being assessed. participants were recruited on a voluntary basis during an arbitrarily selected period of assessments. they were primarily from the gauteng assessment centre, with some participants selected from the western cape centre. (note that the former centre often also caters for members from other provinces.) all possible efforts were made to ensure that the testing environment was comfortable and reasonably quiet. a screening questionnaire was completed by all participants. socio-demographic information was obtained and participants had to answer questions regarding their suitability for the study. psychologists (clinical and counselling) and registered counsellors employed in the sandf administered the test on an individual basis. the administrators attended a training session and also met with the researcher before each session to prepare for the testing. the tests were administered in english. this is the main medium of communication in the sandf, and as such, proficiency in the language is a requirement and could be assumed in this study.

ethical consideration

ethical clearance was obtained from the university of south africa (unisa) ethics committee, reference number: sg (d psych)/r/104/10/5, for a study involving human participants.
in the case of the sandf, clearance was obtained from defence intelligence (reference number: di/ dds/r/202/3/7) and the military health service (reference number: amhf/r/104/10/05). in the case of the latter, the chain of command implied clearance by various structures, departments and units. permission was also granted for collecting and using the data for a master’s dissertation and for publishing the results in a journal. informed consent was obtained from all participants, and confidentiality was maintained by securing the data (a locked cupboard and password protection) and ensuring that no personal information was published. arrangements were made for appropriate referral should the test results indicate the need for further intervention in individual cases.

data analyses

descriptive statistics (i.e. means and standard deviations) were calculated for the two trials of the test (h and ce letter cancellation test), for each of the six parts, and for each score, that is, time, omissions and errors made. comparative analyses were conducted to determine if selected demographic variables had a significant impact on test performance (and thus warranted separate tables for comparison). analyses were only performed in cases where the cell size was at least n = 30. the sample size allowed for an independent samples t-test to be used to compare the performance of the gender groups and the different ranks, whereas the role of age was investigated by means of one-way analysis of variance (anova). in the case of the latter, significant results were further explored by means of post-hoc comparisons using tukey’s honest significant difference (hsd) test (pallant, 2016) to determine which specific group means differ from each other. visual representations of the data were considered to assess the normality of the distributions. in addition, the shapiro–wilk test and the kolmogorov–smirnov test were conducted.
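as an illustration of the two comparative statistics named above, the following sketch computes a pooled-variance independent samples t statistic and a one-way anova f statistic using only the python standard library. the function names and the tiny data set are hypothetical; the study itself only analysed cells with n of at least 30 and the software workflow is not reproduced here. p-values, tukey's hsd and the normality tests are left out because they need distribution functions that a statistics package would normally supply.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(a, b):
    """Pooled-variance (Student) t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


def one_way_anova_f(groups):
    """F statistic with between-group and within-group degrees of freedom."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


if __name__ == "__main__":
    # Invented completion times in seconds, far smaller than a real cell.
    group_a = [31.0, 33.5, 29.0, 35.0]
    group_b = [28.0, 30.5, 27.0, 29.5]
    print(independent_t(group_a, group_b))
    print(one_way_anova_f([group_a, group_b, [36.0, 38.5, 34.0, 37.0]]))
```

with exactly two groups the f statistic equals the square of the pooled t statistic, which is a convenient sanity check on either function.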
results

descriptive statistics

the means and standard deviations for each of the six parts of the h letter cancellation test and the ce letter cancellation test are provided in table 2 for the different scoring categories (i.e. omissions, errors and time). the number of errors made in both versions was small, with no errors recorded in some parts of the tests. in both versions, performance became progressively slower across the parts of the tests.

table 2: descriptive statistics for the h and the ce letter cancellation tests (n = 292).

only in the case of time taken to complete the tests did the distributions resemble normality (see pillay, 2017 for detail). however, all results could be regarded as right skewed, and this has implications for the interpretation of the typical performance of the target population.

demographic variables: gender, rank and age

independent samples t-tests showed no significant differences between males (n = 198) and females (n = 92) in terms of omissions and errors on both the h and ce letter cancellation tests. significant differences were, however, found for time scores on all parts of the tests, with females performing the tasks in less time than males (refer to tables 3 and 4). no significant differences were found between officers (n = 53) and non-commissioned officers (n = 238). the performance of four age categories (20–29 years, n = 100; 30–39 years, n = 101; 40–49 years, n = 72; and 50–59 years, n = 18) was compared by means of anova. no significant differences were found in terms of omissions and errors, but the groups did differ on the time scores (refer to tables 5 and 6). post-hoc comparisons using tukey’s hsd test indicated that these differences were between those younger than 40 and those older than 40, with the latter performing slower on the tasks. the descriptive statistics for time total for gender by age illustrate these trends (table 7).
table 3: independent samples t-test: comparison of time scores for males (n = 198) and females (n = 92) on the h letter cancellation test. table 4: independent samples t-test: comparison of time scores for males (n = 198) and females (n = 92) on the ce letter cancellation test. table 5: anova: comparison of time scores for age groups on the h letter cancellation test. table 6: anova: comparison of time scores for age groups on the ce letter cancellation test. table 7: descriptive statistics for total time for gender by age. discussion at present, comparative data for the ce letter cancellation test are limited to an overall score (for two parts) and the statement that ‘normal performance limits have been defined as 0–2 omissions in 120 seconds’ (diller, ben-yishay, & gerstman, 1974; lezak et al., 2012, p. 381). after standardising the administration and scoring procedures for the test, this study provides detailed data on three performance categories (i.e. omissions, errors and time) on the total score as well as the six parts of the h and ce letter cancellation tests respectively. for the target population in this study, normal performance limits on the h letter cancellation test are defined as 196.57 seconds, with 2.59 omissions (for six parts). for the ce letter cancellation test, the normal performance limits are defined as 316.03 seconds, with 12.73 omissions (for six parts). although diller et al. (1974) found that there were no significant differences in performance based on gender and age, the present study did find age and gender to impact on an individual’s time taken to complete the tests. in addition to the comparative data for the total sample provided in this manuscript, stratification in terms of age and gender was necessary. 
significant differences between males and females in terms of the time taken to complete the tasks are consistent with the findings of pradhan and nagendra (2008), upadhayay and guragain (2014) and uttl and pilkenton-taylor (2001). upadhayay and guragain (2014) also found that women performed faster than men in paper-and-pencil tests. these findings could be partly explained by the fact that different parts of men’s and women’s brains are activated during different tasks, thus demonstrating that the genders utilise different parts of their brains to solve problems (brizendine, 2009). significant differences were also found between those below and above 40 years of age with the latter taking more time to complete the tasks. age-related decline in speed for the letter cancellation test has been reported previously (pradhan & nagendra, 2008; uttl & pilkenton-taylor, 2001). this may be accounted for by age-related slowing and attentional deficits (erel & levy, 2016; fortinash & worret, 2014; kramer & madden, 2011). deficits have been noted in the ability to selectively attend to certain tasks (brink & mcdowd, 1999; glisky, 2007), for example, when requiring an individual to focus their attention on one stimulus among several other sets of information. madden et al. (2007) found that older adults performed slower and less accurately than younger adults in visual search tests. military personnel are recruited at a young age, based on their functioning at that point in time. continuous evaluation of fitness for duty would therefore imply the need to assess any decline, especially in attention associated with normal ageing in addition to those associated with injury. skewness could be attributed to the target population being a pre-selected group. according to kennedy and moore (eds. 
2010), some samples of the military population outperform the general population on neuropsychological tests, as only healthy and generally well-functioning individuals are considered to be fit for duty. nwafor and adesuwa (2014) further supported this by adding that the specialised skills required by soldiers for their operational and functional duties require them to function at a higher level than the general population. the raw scores can be converted to standard scores by means of z-score conversions using the typical performance presented in tables 2 and 7. a z-score represents the distance from the mean expressed in standard deviation units (i.e. z = (the raw score – the mean)/the standard deviation). it is important to note that the distribution of z-scores has the same form as the raw scores on which they are based. in this instance, the z-scores will therefore be right skewed and not normally distributed. although these scores do not have the statistical advantages of normally distributed scores, conversion will nevertheless allow for comparison within and between individuals in this population. in the case of age and gender stratification for total time, comparisons will be limited to each specified group (e.g. females, 20–29 years) (gadd & phipps, 2012).

conclusion

a major contribution of this study is the development of standardised administration and scoring procedures for a test of attention. additionally, the military context implies a need for appropriate normative data on this construct. however, larger sample sizes are required for adequate representation in terms of some of the demographic variables (i.e. individuals older than 50 years, females older than 40 years and left-handed individuals). this will also enable further exploration of the distribution of the performance. standard scores based on the present data set cannot be interpreted in terms of the properties of a normal distribution.
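the z-score conversion given in the discussion, z = (raw score – mean)/standard deviation, can be sketched as follows. the mean of 196.57 seconds is the total time reported in the text for the h letter cancellation test; the standard deviation is a placeholder, because the tables that contain it are not reproduced here, and the function name is hypothetical.

```python
def z_score(raw_score: float, mean: float, standard_deviation: float) -> float:
    """Distance of a raw score from the mean, in standard deviation units."""
    if standard_deviation <= 0:
        raise ValueError("standard deviation must be positive")
    return (raw_score - mean) / standard_deviation


if __name__ == "__main__":
    H_TOTAL_TIME_MEAN = 196.57  # seconds, as reported in the discussion
    PLACEHOLDER_SD = 40.0       # assumption for illustration, NOT a study value
    raw_time = 230.0            # invented raw score for one individual
    print(round(z_score(raw_time, H_TOTAL_TIME_MEAN, PLACEHOLDER_SD), 2))
```

because the conversion is linear, the resulting scores keep the right-skewed shape of the raw scores, so they should not be read against normal-curve percentiles.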
these recommendations would allow for a comprehensive standardisation and evaluation of the psychometric properties of the test in the military context. the study furthermore involved a highly specific subgroup of the general population and replication studies including additional subpopulations should be considered. the letter cancellation test is widely used despite being subject to unstandardised administration and scoring procedures and broad cut-off scores. this study provides a review of the letter cancellation test and puts forward improved administration procedures, detailed scoring methods and relevant normative data for adequate sample sizes. this was done to provide clinicians in the sandf with meaningful scores for interpretation and to guide future developments in the wider south african context. acknowledgements the authors would like to thank the sandf for allowing them to conduct the research and supporting the publication of the results. competing interests the authors declare that no competing interests exist and this is an original research. authors’ contributions this empirical work was conducted by c.p.h. with assistance from c.g. and guidance from b.s. and r.v.e. the authors co-wrote the manuscript. funding information this research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors. data availability statement data sharing is not applicable because of the sensitive nature thereof. disclaimer the views and opinions expressed in this article are those of the authors and do not necessarily reflect the official policy or position of any affiliated agency of the authors. references anderson, s.j. (2001). on the importance of collecting local neuropsychological normative data. south african journal of psychology, 31(3), 29–34. https://doi.org/10.1177/008124630103100304 azouvi, p., bartolomeo, p., beis, j.m., perennou, d., pradat-diehl, p., & rousseaux, m. (2006). 
a battery of tests for the quantitative assessment of unilateral neglect. restorative neurology and neuroscience, 24(4–6), 273–285. baron, i.s. (2004). neuropsychological evaluation of the child. new york, ny: oxford university press. brink, j.m., & mcdowd, j.m. (1999). aging and selective attention: an issue of complexity or multiple mechanisms? journal of gerontology, 54b(1), 30–33. https://doi.org/10.1093/geronb/54b.1.p30 brizendine, l. (2009). the female brain. london: transworld publishers. brucki, s.m.d., & nitrini, r. (2008). cancellation task in very low educated people. archives of clinical neuropsychology, 23(2), 139–147. https://doi.org/10.1016/j.acn.2007.11.003 coetzer, r., & balchin, r. (2014). working with brain injury: a primer for psychologists in under-resourced settings. london: psychology press. cohen, r.a. (2013). the neuropsychology of attention. new york, ny: springer. defence web. (2011). fact file: sandf regular force levels by race & gender: april 30, 2011. retrieved from www.defenceweb.co.za/index.php?option=com_content&view=article&id=16708:fact-fil-sandf-regular-force-levels-by-race-a-gender-april-30-2011-&catid=79:fact-files&itemid=159 diller, l., ben-yishay, y., & gerstman, l.j. (1974). studies in cognition and rehabilitation in hemiplegia. new york, ny: new york medical centre institute of rehabilitation medicine. erel, h., & levy, d.a. (2016). orienting of visual attention in aging. neuroscience and biobehavioural reviews, 69(1), 357–380. https://doi.org/10.1016/j.neubiorev.2016.08.010 fortinash, k.m., & worret, p.a.h. (2014). psychiatry mental health nursing. st louis: elsevier. gadd, c., & phipps, w.d. (2012). a preliminary standardisation of the wisconsin card sorting test for setswana-speaking university students. south african journal of psychology, 42(3), 389–398. https://doi.org/10.1177/008124631204200311 glisky, e.l. (2007). changes in cognitive function in human aging. in d.r. 
riddle (ed.), brain aging: models, methods, and mechanisms (pp. 4–17). boca raton, fl: crc press. groth-marnat, g. (2009). handbook of psychological assessment (5th edn.). hoboken, nj: john wiley & sons. hatta, t., yoshizaki, k., ito, y., mase, m., & kabasawa, h. (2012). reliability and validity of the digit cancellation test, a brief screen of attention. psychologia, 55(4), 246–256. https://doi.org/10.2117/psysoc.2012.246 hebben, n., & milberg, w. (2009). essentials of neuropsychological assessment. hoboken, nj: john wiley & sons. james, w. (1890). the principles of psychology. new york, ny: holt. jossub, n., cassimjee, n., & cramer, a. (2017). the relationship between neuropsychological performance and depression in patients with traumatic brain injury. south african journal of psychology, 47(2), 171–183. http://hdl.handle.net/10520/ejc-76e5a568a kennedy, c.h., & moore, j.l. (eds.). (2010). military neuropsychology. new york, ny: springer. kennedy, c.h., & zillmer, e.a. (eds.). (2012). military psychology: clinical and operational applications. new york, ny: guilford press. kramer, a.f., & madden, d.j. (2011). attention. in f.i.m. craik & t.a. salthouse (eds.), the handbook of aging and cognition (pp. 189–250). hoboken, nj: taylor & francis group. lezak, m.d., howieson, d.b., bigler, e.d., & tranel, d. (2012). neuropsychological assessment. new york, ny: oxford university press. lezak, m.d., howieson, d.b., & loring, d.w. (2004). neuropsychological assessment (4th edn.). new york, ny: oxford university press. lucas, m. (2013). neuropsychological assessment in south africa. in s. laher & k. cockcroft (eds.), psychological assessment in south africa: research and applications (pp. 186–200). johannesburg: wits university press. madden, d.j., spaniol, j., whiting, w.l., bucur, b., provenzale, j.m., cabeza, r., white, l.e., & huettel, s.a. (2007). adult age differences in the functional neuroanatomy of visual attention: a combined fmri and dti study.
neurobiological aging, 28(3), 459–476. https://doi.org/10.1016/j.neurobiolaging.2006.01.005 martin, g. (2015). sa army making effort to ensure equal demographic representation. retrieved from www.defenceweb.co.za/index.php?option=com_content&view=article&id=39556:sa-army-making-effort-to-ensure-equal-dempgraphic-representation&catid=111:sa%20defence&itemid=242 mccrea, s.m., & robinson, t.p. (2011). visual puzzles, figure weights, and cancellation: some preliminary hypotheses on the functional and neural substrates of these three new wais-iv subtests. international scholarly research network neurology, 19, article id: 123173. https://doi.org/10.5402/2011/123173 mirsky, a.f., anthony, b.j., duncan, c.c., ahearn, m.b., & kellam, s.g. (1991). analysis of the elements of attention: a neuropsychological approach. neuropsychology review, 2(1), 109–145. https://doi.org/10.1007/bf01109051 mitrushina, m., boone, k.b., razini, j., & d’elia, l.f. (2005). handbook of normative data for neuropsychological assessment (2nd edn.). oxford: oxford university press. nell, v. (2000). cross-cultural neuropsychological assessment: theory and practice. mahwah, nj: lawrence erlbaum associates. nwafor, c., & adesuwa, a. (2014). psychological testing in the military. practicum psychologia, 4, 1–10. pallant, j. (2016). spss survival manual (4th edn.). berkshire: open university press. pillay, c. (2017). a preliminary standardisation of the letter cancellation test for military personnel in the sandf. unpublished masters dissertation. pretoria: university of south africa. pradhan, b. (2013). effect of kapalabhati on performance of six-letter cancellation and digit cancellation task in adults. international journal of yoga, 6, 128–130. https://doi.org/10.4103/0973-6131.1134415 pradhan, b., & nagendra, h.r. (2008). normative data for the letter cancellation task in school children. international journal of yoga, 1(2), 72–75. 
https://doi.org/10.4103/0973-6131.43544 richards, m., kuhn, d., hardy, r., & wadsworth, m.e.h. (1999). lifetime cognitive function and timing of the natural menopause. neurology, 53(2), 308–314. https://doi.org/10.1212/wnl.53.2.308 scott, j.g. (2011). attention/concentration: the distractible patient. in m.r. schoendberg & j.g. scott (eds.), the little black book of neuropsychology: a syndrome-based approach (pp. 149–158). london: springer. shuttleworth-edwards, a.b. (2016). generally representative is representative of none: commentary on the pitfalls of iq test standardization in multicultural settings. the clinical neuropsychologist, 30(7), 975–998. https://doi.org/10.1080/13854046.2016.1204011 stankov, l. (1988). aging, attention, and intelligence. psychology and aging, 3(1), 59–74. https://doi.org/10.1037/0882-7974.3.1.59 stump, d.a. (2002). neuropsychological testing: methodology, interpretation and outcomes. seminars in cardiothoracic and vascular anaesthesia, 6(1), 27–33. https://doi.org/10.1177/108925320200600107 upadhayay, n., & guragain, s. (2014). comparison of cognitive functions between male and female medical students: a pilot study. journal of clinical diagnostic research, 8(6), 5–12. https://doi.org/10.7860/jcdr/2014/7490.4449 uttl, b., & pilkenton-taylor, c. (2001). letter cancellation performance across the adult life span. the clinical neuropsychologist, 15(4), 521–530. https://doi.org/10.1076/clin.15.4.521.1881 warren, m., moorre, j.m., & vogtle, l.k. (2008). search performance of healthy adults on cancellation tests. american journal of occupational therapy, 62(5), 588–594. https://doi.org/10.5014/ajot.62.5.588

about the author(s)

ingrid opperman, department of student development and support, higher education development and support, tshwane university of technology, pretoria, south africa

citation

opperman, i. (2020).
time limits and english proficiency tests: predicting academic performance. african journal of psychological assessment, 2(0), a20. https://doi.org/10.4102/ajopa.v2i0.20

original research

time limits and english proficiency tests: predicting academic performance

ingrid opperman

received: 30 oct. 2019; accepted: 02 june 2020; published: 25 june 2020

copyright: © 2020. the author(s). licensee: aosis. this is an open access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

abstract

english is the primary language of instruction in south african higher education, but students entering their first year are often not sufficiently proficient. therefore, a need is evident for proficiency testing to guide intervention initiatives. international proficiency tests are lengthy and expensive, but cloze procedure and vocabulary tests have been used as effective alternatives. however, time limits may affect observed reliability and predictive validity in the context of higher education. the present research assessed a cohort of first-year tourism management students using versions of the english literacy skills assessment (elsa) cloze procedure and vocabulary in context tests under three time-limit conditions: normal, double and no time limits. students in double and no time-limit conditions performed significantly better than the normal time-limit group. group scores were correlated with, and significant predictors of, academic subject first-test scores. better performance and more accurate prediction under extended time limits may be related to students attempting more questions. as the elsa vocabulary in context was the better predictor in this research, the importance of non-technical vocabulary, as opposed to semantic and contextual understandings in cloze procedure, is highlighted.
therefore, screening the english proficiency levels of students admitted to higher education institutions may be useful to flag likelihood of success and guide interventions. keywords: higher education; english proficiency; cloze procedure; vocabulary; time limits. introduction english has become the dominant language of business, public life and higher education (benzie, 2010; casale & posel, 2011; coleman, 2006; nunan, 2003). therefore, formal acquisition of english language skills has become essential for success in both higher education and business contexts to enhance economic opportunities in a multinational and international economy (bedenlier & zawacki-richter, 2015; prinsloo & heugh, 2013). higher education serves an essential role in enhancing the future career prospects in a competitive social and economic framework, making success integral for many young people (coleman, 2006; cross & carpentier, 2009; prinsloo & heugh, 2013). although higher academic success has become essential for entry into the 21st century economy (jackson, 2015), academic english language proficiency remains a challenge for the majority of south african students in a linguistically diverse society (andrade, 2006; cross & carpentier, 2009; murray, 2010; trenkic & warmington, 2018). academic english proficiency in higher education encompasses formal and functional control of the properties of english language, including vocabulary, grammar and contextual understanding (bridgeman, mcbride, & monaghan, 2004; masrai & milton, 2018; murray, 2010). limited english proficiency on entry may lead to academic vulnerability, characterised by unsuccessful adaptation to higher education demands, which could be detrimental to academic literacy, problem-solving techniques, constructive engagement in learning processes (murray, 2010; taylor & von fintel, 2016) and communications (benzie, 2010; murray, 2010; trenkic & warmington, 2018; webb, 2002). 
concomitantly, lack of capability in basic interpersonal communication skills (bics; expression of conversational fluency) alongside cognitive academic language proficiency (calp; decontextualised language proficiency) may synergistically impact the expression of general english language proficiency in multiple contexts (bruton, wisessuwan, & tubsree, 2018; cummins, 2000). this disadvantage is displayed where decontextualised language learning experiences in everyday learning and communications, linked to bics, impact the learning of academic concepts, and thereby result in less than optimal calp (abriam-yago, yoder, & kataoka-yahiro, 1999; tomasello, 2014). thus, students lacking english language skills sufficient for the tertiary academic environment are placed at a disadvantage, even if basic literacy skills are sufficient. apart from basic literacy, the context of higher education often requires content-specific skills (linked to calp; cummins, 2000), which are reliant on technical vocabulary alongside general contextual identification and understanding (dalton-puffer, 2011; fenton-smith, humphreys, & walkinshaw, 2018; millin & millin, 2018). global research has implied that basic skills are a necessary component for developing technical/academic language (birrell, 2006; coleman, 2006). consequently, students lacking english proficiency skills, or exhibiting competency gaps, may be at an academic disadvantage on entering english language institutions. internationally, english proficiency tests are frequently conducted pre-admission for selection purposes. although these tests could be utilised for admitting students in first year, they are often time-consuming, expensive and focused on overall proficiency rather than critical basic skills more relevant to the post-admissions phase (arrigoni & clark, 2015; feast, 2002; goto, maki, & kasai, 2010; murray, 2010).
these traditional gate-keeping tests include the international english language testing system (ielts) and the test of english as a foreign language (toefl). utilising these assessments post-admissions to identify competency gaps has limited viability and financial feasibility. post-admissions, other options, including the diagnostic english language test and diagnostic english language needs assessment, have been used globally for screening and diagnosis with good predictive and diagnostic validity (doe, 2014; read, 2008). similar to pre-admission tests, the foci include vocabulary, speed-reading, listening and interpretation of texts. in both cases, complex, rather than basic, skills are inherent to the tests. thus, other research has indicated that briefer, basic ability tests, including cloze procedure protocols and vocabulary assessments, are time- and cost-effective whilst retaining sufficient psychometric properties (goto et al., 2010; sun & henrichsen, 2010). cloze procedure protocols require the reader to insert missing words or phrases, illustrating semantic and contextual understanding linked to reading comprehension and writing skills (gellert & elbro, 2013; trace, brown, janssen, & kozhevnikova, 2017). such skills are considered essential in higher education and significantly vulnerable for second-language english speakers, perhaps because of inability to decode new information and translate key words within specific contexts (escamilla, 2009; huettig, 2015; staub, grant, astheimer, & cohen, 2015). decoding, recognition and translation to english (in the case of non-native speakers) have been closely related to cloze procedure protocol performance in children and adults (gellert & elbro, 2013; keenan, betjemann, & olson, 2008). these findings suggest that background and fundamental learning could play a role in developing essential skills which are transferable to higher education english language requirements.
similarly, vocabulary acquisition has been linked to success in the context of higher education. acquired vocabulary has often been used as a proxy for general proficiency, demonstrating predictive power (masrai & milton, 2018; trenkic & warmington, 2018). non-technical vocabulary levels have been further linked to academic writing, reading comprehension and general academic performance (harrington & roche, 2014; qian, 2002; schmitt, jiang, & grabe, 2011; snow, lawrence, & white, 2009; trenkic & warmington, 2018). these findings are supportive of the inclusion of vocabulary components in traditional gate-keeping tests, lending support for the use of these tests as a proxy for proficiency even post-admissions in first-year students. in both cases, the feasibility of reduction in time and cost is a significant benefit. although research has demonstrated that both cloze procedure protocols and contextually based vocabulary tests may be used as proxies to understand english proficiency, these assessments are often conducted under time constraints, potentially confounding content performance with response time (e.g. goto et al., 2010; harrington & roche, 2014; masrai & milton, 2018). administration under time-constrained conditions remains a common practice for a variety of reasons but may result in decreased validity and reliability values (van der linden, 2011). concomitantly, the test may then lack accuracy for its stated purpose, which is problematic for both selections and post-admission competency identification contexts. therefore, a balance between internal consistency, predictive validity, length of assessment and other administration factors is required to enhance identification of the status of english language skills. the question then arises as to whether a sufficient balance of time-effectiveness, practicality and predictive validity is present when time constraints are implemented. 
researchers have reported improvements in performance on various english language tests with additional time allocations (bridgeman et al., 2004; powers & fowles, 1997), suggesting a focus on performance in complex understandings may be more important for academic outcomes than time-constrained responses (daly & stahmann, 1968; harrington & roche, 2014; macintyre & gardner, 1994). the removal of time constraints may also mitigate other factors associated with poorer performance, including inadequate test-taking strategies, test anxiety and familiarity with testing contexts (anderson, 1991; fairbairn, 2007; solano-flores, 2008). similar findings are present in the context of higher education, for which increased predictive validity, reliability and construct validity of cloze procedure protocols and vocabulary tests have been reported when time constraints are removed (hajebi, taheri, & allami, 2018; snow et al., 2009; trace et al., 2017). researchers have hypothesised that changes in performance under different time constraints may be linked to the number of items attempted, changes to item structures or content functions operating differently (luke & christianson, 2016; talento-miller, guo, & han, 2013; van der linden, 2011). other research has suggested that increased time may allow for better translation and internal reconstructions of semantics and syntax, although this may only be true for lengthy fragments in cloze procedure protocols or when a wide range of possible responses is presented (hajebi et al., 2018; staub et al., 2015). although this research has considered cloze procedure protocols, vocabulary and other english proficiency tests without time constraints, limited published work (e.g. goto, maki, & kasai, 2010) has considered different predictive validity of short assessments under various time constraints. 
the present study assessed the relative influence of time limits on two english language proficiency tests, that is, a cloze procedure protocol and contextual vocabulary assessment, to understand differences in the predictive validity under each time limit in determining first-test academic outcomes. the importance of this study lies in differentiation between english proficiency itself and the impact of time constraints on the expression of that proficiency in predicting academic outcomes. thus, the study intends to contribute through further understanding english proficiency testing in terms of the potentially detrimental impact of time limitations on test outcomes. these findings are potentially useful in enhancing mass language post-admission screening to improve skills-targeted interventions which are time-efficient and effective. methods participants participants comprised commencing first-year students (n = 81) enrolled in an institute for a tourism management national diploma course with common first-year academic subjects and admission requirements. the restriction for course enrolment was intended to indirectly standardise minimum english language entry criteria. the majority of enrolled first-year students at the institute were aged between 18 and 20 years, with a vast majority being of black ethnicity equally split between males and females. research design the present research made use of a cross-sectional, quasi-experimental design to assess the impact of different time limits on performance of both cloze procedure protocol and contextual vocabulary assessment. instruments kaleidoprax (2014) developed english literacy skills assessment (elsa) as two modified tests for the institute conducting the study: the cloze procedure and the vocabulary in context tests. at present, no psychometric properties have been made available for the tests (kaleidoprax, 2014). 
the cloze procedure test requires the insertion of missing words within the context of a sentence. cloze procedure comprises 20 questions, each with four possible responses, of which one is correct (max = 20). the vocabulary in context test identifies words in the context of a full sentence to require extrapolation of meaning of definitions, synonyms, antonyms and usage. vocabulary in context comprises 30 questions, each with four possible responses, of which one is correct (max = 30). no penalty scoring is implemented for either test. in this study, academic performance was assessed using percentages for the first-test marks for first-year subjects of national diploma courses in the department of tourism management (min = 1%, max = 100%). all marks obtained were above 0%. procedure data on the elsa were generated as part of administration of a battery which took place after the english language portion. the battery was solicited by the academic departments of the institute as part of a post-admissions first-year student assessment. academic departments granted permission to modify the english language portion for research purposes, and all participants gave informed consent. no data were used for exclusionary, probationary or placement purposes. the full sample (n = 81) was broken down into three groups: normal time limit (n = 44), double time limit (n = 23) and no time limit (n = 15). separate test sessions took place for each group. participants had freedom to join the group of their choice. participation in the experimental group was voluntary, and verbal informed consent was obtained with a written signature. because of the voluntary nature of participation, a convenience sample was produced. as a result, control for grade-12 english performance and the size of groups was not possible. voluntariness of participation, however, was essential because of the testing (personal development) and deviation from the normal quasi-experimental protocol. 
thus, it was not possible to specifically split students into experimental and control groups whilst retaining the intent of the testing session and respecting participants’ autonomy. examples were administered, and the test methods were explained, including the use of the multiple-choice answer sheet, demands of the assessment and use of examples for familiarity and understanding. participants were informed about the relevant time limit and provided with a clock to monitor timings. completed answer sheets were collected and checked for clarity of response prior to optical scanning and passing through a software program. electronic data scores were collated with first-test subject performance marks from the institute’s management information systems. data were anonymised and stored appropriately and securely for analysis. data analyses data analyses were conducted on spss® version 25. comparisons of the three time-limit groups were conducted using a one-way analysis of variance and tukey’s honest significant difference (hsd) post hoc test of mean differences and significances. pearson’s r correlation coefficients and standard linear regression models (standardised beta weights because of range discrepancies) were used to assess the relationship between scores of tests and first-test marks. ethical consideration this study received ethical clearance from the tshwane university of technology research ethics committee (no. rec/2016/09/001). results the cloze procedure subtest yielded a maximum score of 20, whilst the vocabulary in context subtest score was out of a possible 30. first-test subject marks were expressed as a percentage value out of 100 possible points. table 1 shows the mean values (m) and standard deviations (sd) of variables. table 1: descriptive statistics for the english literacy skills assessment tests and first-test subject marks by time-limit group. table 1 shows similar levels of dispersion across different groups and subjects. 
performance on the elsa tests improved when time constraints were reduced but levels of dispersion remained stable despite differing sample sizes. no substantial differences in academic marks were present between the three time-limit groups. differences between the time-limit groups the one-way analysis of variance with tukey’s hsd post hoc revealed that the three time-limit groups differed significantly. the group without a time limit had higher scores on the cloze procedure subtest (m = 13.93, sd = 4.30) than the double time-limit group (m = 12.22, sd = 4.40) or the normal time-limit group (m = 6.80, sd = 4.08). the one-way analysis of variance demonstrated that the groups differed significantly (f = 22.156, p = 0.000) and the levene’s test of homogeneity of variance met the required assumption of equal variances (f = 0.100, p = 0.905). the significant differences were identified as involving the normal time-limit group, for which scores were significantly lower than that of the double time-limit group (mdifference = 5.442, p = 0.000) and the no time-limit group (mdifference = 7.138, p = 0.000). however, the no time-limit and double time-limit groups did not differ significantly, despite slightly better performance by the no time-limit group (mdifference = 1.716, p = 0.440). similar findings were observed for the vocabulary in context subtest. the no time-limit group performed best on the vocabulary in context subtest (m = 12.07, sd = 4.98), whilst the double time-limit group’s scores were slightly lower (m = 11.13, sd = 5.36) and the normal time-limit group’s scores were considerably lower (m = 6.48, sd = 4.61). the one-way analysis of variance revealed that the groups differed significantly (f = 10.902, p = 0.000) and the requirement of homogeneity of variance was satisfied (f = 0.666, p = 0.517). 
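As a rough check on the one-way analyses of variance reported above, the F statistic can be approximated from the group means, standard deviations and group sizes alone. The sketch below is illustrative only and is not the SPSS procedure used in the study; the function name `anova_from_summary` is hypothetical, the group sizes are taken as reported (44, 23 and 15), and small departures from the published F values are expected because the inputs are rounded.

```python
from typing import List, Tuple

# Each group is described by (n, mean, sd), copied from the reported results.
Summary = Tuple[int, float, float]


def anova_from_summary(groups: List[Summary]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA
    computed from per-group summary statistics."""
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n

    # Between-groups sum of squares: spread of group means around the grand mean.
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    # Within-groups sum of squares: recovered from each group's sample variance.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)

    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Normal, double and no time-limit groups, in that order.
cloze = [(44, 6.80, 4.08), (23, 12.22, 4.40), (15, 13.93, 4.30)]
vocab = [(44, 6.48, 4.61), (23, 11.13, 5.36), (15, 12.07, 4.98)]

f_cloze, df1, df2 = anova_from_summary(cloze)
f_vocab, _, _ = anova_from_summary(vocab)
print(f"cloze: F({df1}, {df2}) = {f_cloze:.2f}")
print(f"vocabulary in context: F({df1}, {df2}) = {f_vocab:.2f}")
```

Under these assumptions, the recomputed values fall close to the reported F = 22.156 (cloze procedure) and F = 10.902 (vocabulary in context).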
examination of tukey’s hsd post hoc showed that the statistically significant differences were present between the normal time-limit group and the double time-limit group (mdifference = 4.653, p = 0.001) as well as the no time-limit group (mdifference = 5.589, p = 0.001). the no time-limit and double time-limit groups did not differ significantly (mdifference = 0.936, p = 0.833). therefore, significant differences were observed between the three time-limit groups, suggesting that time limitations influenced measuring english language skills by these tests. as a result, the timed conditions may also have affected the predictive power of each test. prediction of first-test subject marks pearson’s r correlation coefficients were calculated to examine the association between performance on the elsa tests and performance in the first-test of each subject, followed by separate regression models for each group. table 2 shows the correlation coefficients between the three time-limit mean values and subject performance. table 2: pearson’s r correlations between the english literacy skills assessment tests and first-test subject marks by time-limit group. statistically significant positive correlations were present between the cloze procedure subtest and the subject of ‘communications’, which had a strong emphasis on english language. similar coefficients were observed for the normal time-limit group (r = 0.437, p = 0.003) and the double time-limit group (r = 0.473, p = 0.023). a stronger statistically significant correlation was observed between the no time-limit group and the scores of the subject of ‘communications’ (r = 0.706, p = 0.003). the no time-limit group scores were also significantly correlated with scores of the first-test of tourism development (r = 0.574, p = 0.025), whilst the normal time-limit group was less strongly, but more significantly, correlated (r = 0.373, p = 0.013). 
the same is true about correlations between travel and tourism practice and cloze procedure for the normal time-limit group (r = 0.450, p = 0.002) and the no time-limit group (r = 0.656, p = 0.008). no other statistically significant correlation coefficients were present. the correlational findings tentatively suggested that higher scores on the cloze procedure test were associated with better performance on the subjects of ‘communications’, ‘tourism development’ and ‘travel and tourism practice’. in most cases, the relationship between the scores and academic performance was strongest when no time limit was present, although the double time-limit coefficients were frequently similar. significant positive correlation coefficients were also observed between the vocabulary in context test scores and the first-test subject marks, particularly if no time limit was implemented. vocabulary in context was more strongly associated with academic performance than the cloze procedure. correlations between the subject of ‘communications’ scores and vocabulary in context scores were statistically significant for the normal time-limit (r = 0.313, p = 0.038), double time-limit (r = 0.600, p = 0.002) and no time-limit groups (r = 0.634, p = 0.011). the double time-limit group was also significantly correlated with ‘travel and tourism practice’ scores (r = 0.544, p = 0.007). however, only the no time-limit group was statistically significantly correlated with the first-test marks on ‘marketing for tourism’ (r = 0.648, p = 0.009), ‘tourism development’ (r = 0.708, p = 0.003) and ‘travel and tourism practice’ (r = 0.590, p = 0.021). for the cloze procedure subtest, no statistically significant correlations were present with the first-test marks on ‘travel and tourism management’. 
for both tests, the no time-limit group appeared to be the most strongly associated group with performance on the first-test of various subjects of tourism management, particularly the subject of ‘communications’. regression models were used to understand the relative predictive power of different time limit groups of each subject. table 3 shows the standardised beta weights, statistically significant levels of the cloze procedure subtest and coefficients of determination reporting the amount of variance explained. table 3: regression values for the cloze procedure test on first-test subject marks by time-limit group. when cloze procedure was used as a predictor of the first-test marks, the regression on the subject of ‘communications’ was strong, but the ‘marketing for tourism’ and ‘travel and tourism management’ scores were not well predicted. statistically significant increases in first-test scores (in sd units) were associated with a single sd increase in cloze procedure scores for the no time-limit group for the subjects of ‘communications’ (β = 0.706, p = 0.003), ‘tourism development’ (β = 0.574, p = 0.025) and ‘travel and tourism practice’ (β = 0.656, p = 0.008). however, a slight inverse predictive function was observed for ‘travel and tourism management’ (β = -0.265, p = 0.013). the first-test scores for the subject of ‘communications’ were also predicted by scores on the cloze procedure for the normal time-limit group (β = 0.437, p = 0.003) and the double time-limit condition (β = 0.473, p = 0.023). the same was true for the subject of ‘travel and tourism practice’ for the no time-limit (β = 0.656, p = 0.008), double time-limit (β = 0.407, p = 0.054) and normal time-limit groups (β = 0.450, p = 0.002). for the subject of ‘travel and tourism practice’, all three conditions had similar predictive power. for the cloze procedure, the absence of time limits resulted in stronger prediction than doubling the time limits or implementing the normal time limit. 
similar findings were present for the vocabulary in context test. the coefficients of determination, standardised regression values and probability values for vocabulary in context are shown in table 4. table 4: regression values for the vocabulary in context test on the first-test subject marks by time-limit group. the vocabulary in context scores had statistically significant regression values for the subject of ‘communications’ for the normal time-limit group (β = 0.313, p = 0.038), double time-limit group (β = 0.600, p = 0.002) and no time-limit group (β = 0.634, p = 0.011). standard deviation values of subjects were substantially increased with subtest increase for ‘travel and tourism practice’ for both the double time-limit (β = 0.544, p = 0.007) and the no time-limit groups (β = 0.590, p = 0.021). however, the no time-limit group proved to be the strongest predictor, also having statistically significant power for the subjects of ‘marketing for tourism’ (β = 0.648, p = 0.009) and ‘tourism development’ (β = 0.708, p = 0.003). like the cloze procedure test, the regression of vocabulary in context test on the subject of ‘travel and tourism management’ was poor and not statistically significant (p > 0.05). both elsa tests showed predictive power for the majority of the first-year subjects of tourism management based on statistically significant correlation coefficients and regression models. however, the no time-limit condition exhibited the strongest predictive power. variance between ~33% and ~50% in academic first-test subject performance was explicable by english language proficiency measured on each of the two elsa tests. in spite of not being significantly different from the no time-limit group, the double time-limit group did not show the same predictive relationship, potentially because of a truncated range of scores. 
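In a regression with a single predictor, the standardised beta weight equals Pearson's r, and the coefficient of determination is the square of that value, which is why the β values reported here coincide with the correlations reported earlier. The minimal sketch below illustrates this identity using made-up scores (not study data) and hypothetical helper names, and then squares two of the reported no time-limit coefficients to show where the approximate 33% to 50% range of explained variance comes from.

```python
from math import sqrt
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between x and y."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def z_scores(values: Sequence[float]) -> List[float]:
    """Standardise values to mean 0 and sample sd 1."""
    m = sum(values) / len(values)
    sd = sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))
    return [(v - m) / sd for v in values]


def standardised_beta(x: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope of z-scored y on z-scored x (one predictor)."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / sum(a * a for a in zx)


# Illustrative, invented data: a proficiency score and a first-test mark (%).
test_scores = [5, 8, 9, 12, 14, 15, 18]
marks = [48, 55, 50, 62, 70, 66, 75]

r = pearson_r(test_scores, marks)
beta = standardised_beta(test_scores, marks)
print(f"r = {r:.3f}, standardised beta = {beta:.3f}")

# Squaring two reported no time-limit coefficients gives the variance explained.
for reported in (0.574, 0.708):
    print(f"beta = {reported}: R^2 = {reported ** 2:.3f}")
```

Because the identity only holds for one predictor at a time, it would not carry over to a model that entered both ELSA tests simultaneously.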
the subject of ‘travel and tourism management’, however, was not sufficiently associated with scores on either of the elsa tests in terms of correlation or prediction. discussion the findings indicated that performance on both elsa tests improved as time limits were extended or removed. the statistics demonstrated that increased time limits resulted in a statistically significant improvement in performance, whilst the sds remained stable, suggesting that a consistent dispersion in scores was retained. therefore, the findings reflected improvements in test outcome predictive quality when time limits are removed, despite the inherent limitations of comparing groups of differing sizes (rosenthal & rosnow, 2008). nonetheless, similar improvements in english test outcomes were found by hajebi et al. (2018) and snow et al. (2009). in this regard, harrington and roche (2014) and van der linden (2011) also suggested that improvements in performance could be related to the more accurate assessment of constructs in the english language, rather than the ability to perform under time constraints. this disparity could be partially because of the long-held notion of the influence of time constraints on the number of item responses and internal reliability of english proficiency tests themselves (evans & reilly, 1972). similar studies have suggested that implementing time constraints could reduce the reliability and validity of psychometric and language tests for a wide variety of constructs (lu & sireci, 2007), resulting in the absence of equivalency across instruments (cronbach & warrington, 1951). additionally, a biased presentation of english language ability is present if response levels below certain thresholds occur, or without readjustment of item functions (e.g. van der linden, 2011). 
the present research findings of improved performance without time constraints cannot necessarily be equated to changes in reliability or validity per se because of the absence of measurement of item response functions, despite studies such as those performed by harrington and roche (2014) being focused on similar assessment types. nonetheless, talento-miller et al. (2013) also suggested that increasing the number of items attempted influenced the outcome of english language tests because of the varying difficulties and types of items rather than processing speed. the evidence suggests that inherent, internal test-structure issues under time-constrained conditions are influential, and the present findings concurred that working under time constraints could have negatively affected performance on both elsa tests for this cohort. although some other research has explored the inherent reliability issues surrounding time limits on english tests, the reviewed literature has not extensively explored the relative impact of differing time limits on predictive validity in the context of higher education. the regression analyses in the present research provided evidence of a predictive component for the two elsa tests utilised, which strengthened when time limitations were extended or nulled. the double time-limit and no time-limit groups’ academic performance was positively and significantly correlated with performance on both elsa tests, whilst the normal time-limit group demonstrated limited predictive power. interestingly, predictive performance was similar for both double time-limit and no time-limit groups in most of the cases. this finding suggested that item response thresholds, such as those discussed by van der linden (2011) and talento-miller et al. (2013), could be important for predictive power as well as for internal consistency and reliability of measurement. 
therefore, the present academic first-test performance could have been at least a partial function of english language ability, as measured by the elsa tests. several english language performance actions applied to the cloze procedure protocol were used as one of the elsa tests. however, in the present research, non-technical vocabulary levels measured in the context were found to be better predictors of academic performance. non-technical vocabulary levels have been successfully used as predictors in higher education institutions (heis) as well as a proxy for general english proficiency and cloze procedures (daller & wang, 2017; masrai & milton, 2018; qian, 2002; schmitt et al., 2011; snow et al., 2009; trenkic & warmington, 2018). the present study’s findings suggest that vocabulary levels were more important in accurately predicting academic success than the cloze procedure test, which required semantic manipulation and decision-making within the context of a passage. however, vocabulary ability could be subsumed into a variety of english functions present in the hei performance requirements, such as lecture participation and development of text understanding and technical vocabulary. vocabulary may be linked to other aspects of english language performance related to higher education, including deliberate performance and response selection (macalister, 2010), improved heuristic learning of phrases and lexical translation (koehn, och, & marcu, 2003), speed of translation and decoding within a finite memory capacity system (sakurai, 2015), and meta-cognitive focus on syntactical awareness beside reformulation between languages in an attempt at better understanding (jiménez et al., 2015). 
known to be influenced by time constraints, some of these factors directly relate to essential skills measured in vocabulary and cloze procedure tests, including semantic representations, understanding of words in context, reading speed and quality and the ability to manipulate syntactical arguments. reported findings that english proficiency tests encompassing vocabulary, grammar and contextual representation are affected by time limitations (e.g. bridgeman et al., 2004; murray, 2010) were confirmed by the present research using two elsa tests. such content-specific skill development for understanding may require measurement outside of what could generally be considered as the normal, time-constrained and psychometrically focused framework. furthermore, development of content/subject-specific technical language could also play a role in academic outcomes, particularly if basic levels have not been fully developed as a foundation (birrell, 2006). the findings of the present research also suggest that time limitations play an important role in performance and predictive validity, beside choice of test for predictive purposes. removal of time limitations resulted in more accurate prediction of academic success outcomes, and use of the vocabulary in context test resulted in the strongest predictive power. these findings suggested that appropriate english proficiency assessment could hinge more on the determination of specific academic weaknesses within english language whilst reducing the role of time limitation as an essential factor in predicting performance. in spite of the various findings suggesting that time constraints impact a variety of factors concerning english proficiency tests, from a practical perspective, it is unlikely that performing lengthy tests without time limits would be practical in the context of real world. nonetheless, studies suggest that time constraints could alter the psychometric properties of tests in a variety of ways. 
in spite of important findings in the present research, the study carried some limitations which created some uncertainty in the interpretation of the results. groups of unequal sizes, because of the voluntary nature of participation, may have resulted in misrepresentation of values because of the use of parametric statistics in such a case (rosenthal & rosnow, 2008). similarly, small groups and lack of randomisation could have affected the statistical outcomes. an example of this issue could be the negative correlations seen for the subject of ‘travel and tourism management’, although alternate explanations such as subject content could also account for this anomaly. nonetheless, inequality in ranges of scores between different variables still resulted in pearson’s r and a linear regression model being the most suitable choice, albeit imperfect. in addition, it was not possible to fully standardise the english language pre-entry (grade 12) performance in this case. therefore, this criterion was only passively standardised as a minimum level through the use of a specific qualification grouping of students. pre-entry english ability could have impacted the outcome on either of the english proficiency tests, thus introducing bias in the results or impacting the selection of groups in an attempt by the participants to maximise their performance. nonetheless, it is believed that present language ability, regardless of prior ability, is the most important factor in interpreting the findings, because the intention is to predict academic performance rather than investigate validity of the assessments in question. furthermore, the results appear to indicate that time limitations imposed on english proficiency tests are of importance in fully applying the concept of language proficiency to higher education outcomes. 
conclusion the present research findings demonstrate that performance and predictive power on the modified elsa versions of cloze procedure and vocabulary in context improve when time limits are increased or removed. the findings imply that factors such as item completion thresholds, reading speed, semantic understanding, and translation for decision-making requirements could contribute to negative changes in performance under time-constrained conditions. therefore, students may possess some of the english language skills associated with academic performance but are unable to demonstrate these skills within the imposed time constraints. although these findings are useful, they should be treated with caution as current internal reliability and predictive validity data are not available for full assessment and this pilot study was conducted on smaller, unequal sample groups. nonetheless, it is apparent that the english proficiency as measured by the elsa could be inaccurately reflected under time-constrained conditions, limiting the ability of the test to serve as a predictor of academic performance in tertiary education. these findings imply that further investigations are required to develop sufficiently competency gap-targeted english interventions, and future research should consider larger-scale studies to identify specific components within the tests which contribute to academic success in south african heis. acknowledgements competing interests the author has declared that no competing interest exists. author’s contributions i declare that i am the sole author of this research article. funding information the research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors. data availability statement data sharing is negotiable by request. sharing of data cannot be guaranteed and will depend on the nature of the request. 
disclaimer
the views and opinions expressed in this article are those of the author and do not necessarily reflect the official policy or position of any affiliated agency of the author.

about the author(s)
erica munnik, department of psychology, faculty of community and health sciences, university of the western cape, bellville, south africa
mario smith, department of psychology, faculty of community and health sciences, university of the western cape, bellville, south africa
leigh adams tucker, department of psychology, faculty of community and health sciences, university of the western cape, bellville, south africa
wilmien human, department of psychology, faculty of community and health sciences, university of the western cape, bellville, south africa

citation
munnik, e., smith, m., adams tucker, l., & human, w. (2021). covid-19 and psychological assessment teaching practices – reflections from a south african university. african journal of psychological assessment, 3(0), a40. https://doi.org/10.4102/ajopa.v3i0.40

original research
covid-19 and psychological assessment teaching practices – reflections from a south african university
erica munnik, mario smith, leigh adams tucker, wilmien human

received: 16 oct. 2020; accepted: 03 mar. 2021; published: 07 apr. 2021

copyright: © 2021. the author(s). licensee: aosis. this is an open access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
abstract
the coronavirus disease 2019 (covid-19) crisis posed new challenges in higher education, which compounded the existing challenges. the south african higher education sector responded with plans to secure the learning and teaching mandates and bolster support services for students. an emergency remote learning and teaching plan was launched to mitigate the impact of the pandemic in the 2020 academic year. this article reports on the reflections of lecturers who were teaching psychometric assessment and supervising student psychologists on the clinical master’s programme during the pandemic. the master’s clinical psychology programme at the university of the western cape was the case study, and the psychological assessment module was the unit of analysis. course documents and reflective notes, generated during the adaptation of psychological assessment training, were used as the data source. thematic analysis generated five themes, namely, (1) the importance of statutory guidelines for clinical training, (2) adapting content, (3) pedagogy and modalities, (4) management of test libraries and (5) lecturer experience. the management of changes to the module in response to the covid-19 crisis was challenging. lecturers had to balance competency training and assessment with revised work and adapted teaching conditions. emergency teaching interventions took place within the framework of ethical and professional requirements and the learning outcomes articulated within the scope of practice for clinical psychologists.

keywords: case study; covid-19; clinical psychology; learning and teaching; pedagogy; psychological assessment.

introduction
continuous changes in the academic landscape and student protests have been the realities of higher education in south africa over the past two decades.
the higher education sector is considered ‘high stress’ as functions are performed in a volatile, uncertain, complex and ambiguous environment (simons, munnik, frantz, & smith, 2019). the coronavirus disease 2019 (covid-19) crisis posed new challenges that compounded existing socio-structural barriers. president ramaphosa declared a state of national disaster (ramaphosa, 2020a), calling for a national lockdown from 23 march 2020 (ramaphosa, 2020b). this containment response comprised five increasing levels of restriction and prohibition to curb the spread of the virus (south african government, 2021). the minister of higher education, science and innovation in south africa announced an early recess for institutions of higher learning on 17 march 2020 (nzimande, 2020a). the initial 3-week lockdown at the highest restriction level (5) was extended by 2 weeks. the higher education sector appealed to the south african government to save the 2020 academic year, and institutions developed business continuity plans to this end. institutional differences related to the historical influences on institutional status and functioning were key in formulating a response to the crisis. thus, it was important to reflect on responses at an institutional level. the university of the western cape (uwc) launched an emergency remote learning and teaching plan to curtail the impact of the pandemic on the 2020 academic year (lawack, 2020). staff worked remotely using a range of digital platforms to continue learning and teaching. however, the information and communication technology (ict) readiness of the country proved a concern because of a lack of access to sufficient data, reliable internet connectivity and appropriate devices (department of health, 2020). governance and administration continued within the limitations of the business continuity plans, remote working conditions and infrastructure.
the phased return of students to university campuses and to work-integrated learning was contingent on the relaxation of containment measures (nzimande, 2020b).

professional training guidelines
during the lockdown, professional programmes prioritised the completion of theoretical components. with the advent of covid-19, the guidelines for teaching psychometrics at higher institutions of learning and the minimum standards for training of clinical psychology published by the board of psychology (hpcsa, 2019a, 2019b) still served as the principal documents to orientate training towards the requisite professional competencies in psychometric assessment. these competencies included, but were not limited to, the knowledge of psychometric theory; understanding of scientific, theoretical, empirical and contextual bases of assessment; and the skills and techniques to conduct assessments and assess outcomes. other important competencies include the ability to evaluate the multiple roles, contexts and relationships within which clients function, and the ability to understand the collaborative relationship in assessment. competence in assessment modalities entails a high level of integration and abstraction (hpcsa, 2019a, 2019b). the professional board for psychology at the health professions council of south africa (hpcsa) and the psychological society of south africa (psyssa) provided guidelines for the adoption and operationalisation of remote training in psychology. these guidelines encouraged institutions that engage in postgraduate training to provide trainees with reasonable clinical exposure as well as support during this time of national disaster (hpcsa, 2020). they also advocated for developing contingency plans to deliver high-quality postgraduate training that still met the minimum standards of professional education and training.
the professional board for psychology further stipulated that alternative assessment practices need to be considered, practical work needs to be rescheduled as far as possible and video conferencing should be considered as an alternative to traditional face-to-face teaching, assessment and intervention practices (hpcsa, 2020). the curriculum had to incorporate the revised guidelines for training.

programme pedagogy
competency-based training in psychometric assessment included theory, practice and integration. the modes of delivery included face-to-face lectures, work-integrated learning and supervision. under normal circumstances, changes to the modes of delivery were introduced systematically and gradually. similarly, learning modalities, activities and platforms were selected to enhance the traditional face-to-face (contact) learning and teaching. the pandemic necessitated an emergency online teaching approach that focused on case-based learning supported by paper-based supervision and integration. clinical training in the second and third terms was directly impacted by the lockdown and the lack of access to the physical campus (padmanabhanunni, 2020). the training collective, consisting of lecturers and clinical supervisors, reconceptualised the teaching of core competencies and learning outcomes of the programme as well as clinical practical requirements and exposure. this article reports on the operational and pedagogical challenges and solutions experienced and the changes effected in the psychometric assessment curriculum for student psychologists during the pandemic.

method
aim
the aim of this article is to identify the adaptations to the teaching of psychometric assessment during covid-19. it reflects on the rationale for and impact of the adaptation of the psychometric curriculum for master’s-level student psychologists at a south african university.
design
a case study design was used to describe and explore adaptation to psychological assessment teaching practices during covid-19. the case study method helped to attain an in-depth appreciation of the adaptation to teaching (the issue) in response to covid-19 (the phenomenon of interest) in its naturalistic, real-life context, that is, professional training at the identified university, as recommended by crowe et al. (2011) and creswell (2013). the case study design facilitated learning about this phenomenon in a micro-environment as described by yin (2014). it also facilitated examination of and reflection on the adaptation of the psychological assessment curriculum during the covid-19 pandemic.

case study site
the research setting or case study site was uwc. the master’s programme in clinical psychology is housed in the department of psychology in the faculty of community and health sciences. the master’s training programme is a full-time training programme accredited by the hpcsa and leads to registration as a clinical psychologist (hpcsa, 2018). the psychological assessment module is one of three core modules in the clinical programme. it is a year-long module with a minimum of 2 hours of formal teaching a week. the psychological assessment module incorporates, but is not limited to, training in test construction or psychometric theory, the administration, scoring and interpretation of psychological tests in adult and child practice, neuropsychology and intellectual disability. it has a strong focus on the attainment of clinical skills and competency as per hpcsa recommendations (hpcsa, 2020). the module also includes practical exposure to face-to-face assessments at a uwc-based community clinic and exposure to hospital-based placements from the second term onwards.

defining the case
the psychological assessment module was defined as the case.
successful completion of this module is a prerequisite for achieving the published professional competencies in psychological assessment (hpcsa, 2019a). guiding principles about the pedagogy and operationalising the teaching and assessment practices remain the responsibility of specific programme committees. training in psychological assessment is offered as a year-long module that includes work-integrated learning, lectures, observation of assessments, simulations of test situations, case studies and testing in real time (faculty of community and health sciences, 2019). the usual mode of delivery was face-to-face instruction. the training team and the students had ready access to a well-curated psychometric test library that supported teaching and learning. teaching of psychological assessment at uwc is a collective endeavour, where lecturers are engaged in the teaching and supervision of various components of psychological assessment such as psychometric theory, test administration and report writing. the training team in psychometry consisted of four lecturers with phd qualifications, who were appointed at the rank of senior lecturer (3) and associate professor (1). the lecturers had been registered as clinical psychologists for 17, 14, 11 and 10 years, respectively. their teaching experience in higher education spanned 23, 12, 9 and 7 years, respectively. they had 17, 12, 9 and 7 years, respectively, of focused experience in teaching psychometrics in professional programmes leading to registration as counsellors (bpsych) and psychologists (m.a. psychology). their experience of clinical supervision along with psychometric assessment comprised 17, 12, 11 and 6 years, acquired in professional programmes spanning eight institutions and five internships. thus, there was a breadth and depth of experience in this teaching cohort. the training team for psychometry reports to the broader collective at monthly meetings.
during the imposed early recess, the training team met more regularly to discuss adaptations and teaching strategies. minutes of these meetings as well as reflective notes by the staff were compiled in order to track decision-making and dynamic aspects including, but not limited to, personal experiences and collective experiences of the adaptation process. similarly, course outlines and guidelines for student participation and assessment were revised. all these course documents were compiled into an archive that is stored as evidence for accreditation, quality assurance and compliance purposes. the aim of the case study was to establish how covid-19 affected the teaching and assessment practices of this module at uwc.

data collection
the archive of course documents comprised the source of data for this study. the use of records and documents can be limiting in terms of fit for purpose, completeness and robustness (creswell, 2013). the course documents were specifically created to record changes in the module and the underlying pedagogical and logistic considerations, which enhanced their fitness for purpose. the process of generating the documents included reflexive notes generated by clinical teachers in response to stimulus prompts such as ‘did changes occur in teaching and assessment practices during covid-19?’, ‘if so, why?’ and ‘in what way did it change?’ the reflexive notes provided rich, thick descriptions and abstractions that in turn constituted robust raw data. the documents were generated in the process of adaptation and were used to inform decision-making and provide accountability mechanisms. the resultant documents had a high level of completeness, which made them appropriate for the study.
multiple sources of data such as revised course outlines that accommodated online teaching, revised assessment schedules, instructions to students about the proposed changes to the digital learning management system (ikamva) and lecturer reflections were considered to develop a holistic picture of the data pertaining to the case study. the use of multiple sources of data increased the trustworthiness of the study as recommended by crowe et al. (2011).

data analysis
thematic analysis was used for data analysis. two reviewers immersed themselves in the data through repeated reading and viewing of all the sources. data were organised and coded to allow key issues to emerge. key issues were organised into themes. the data analysis followed the basic steps outlined by creswell (2013) for thematic analysis. the analysis was conducted by two people, one of whom taught on the module in 2020. the resultant themes were shared with the remaining lecturers on the module in two rounds of reflexive discussions until consensus was reached between the lecturers and the independent analyst. this approximated respondent validation and served to enhance the authenticity and trustworthiness of the results through collective alignment of themes as recommended by harrison, birks, franklin and mills (2017).

ethical considerations
the data were produced as part of an educational quality assurance process. secondary research constitutes low-risk research and does not require ethics clearance. no students were identified and no work produced by students was used. the data were generated by the authors as the lecturers involved in the quality assurance process. the lecturers consented to the use of the documents for the purposes of producing this manuscript. these documents are available in selected shared digital spaces. the potential benefits of this article include a contribution to the community of practice in professional training during the covid-19 pandemic.
the draft manuscript was presented to the office of the deputy registrar at uwc to obtain permission for publishing an article in which the course documents were used and the institution is named. through this process, the risks associated with reputational harm to third parties and the institution were mitigated or averted. written permission for submission of the article was obtained as a letter of endorsement with reference number: uwcen161020ms on 16 october 2020. basic principles such as proper anonymisation of data were applied in the study. only information related to teaching and supervision experience in higher education, placement and qualification was disclosed to provide context. this information is available in the public domain and in the register of the hpcsa. the lecturers consented to the content and the nature of information shared in the article.

results
the transition to online learning, focused on digitised teaching and learning methods, posed various challenges to bridging theory and practice and achieving mastery of clinical competencies. five themes emerged, namely, (1) guidelines for clinical training, (2) adapting content, (3) pedagogy and modalities, (4) managing test libraries and (5) lecturer experience.

guidelines for clinical training
the training team reflected that the responsiveness demonstrated through the provision of revised guidelines was personally containing and operationally facilitative. the responsiveness of the professional bodies provided a framework that in turn promoted a common reference and sense of a community of practice. the teaching team could make decisions about adaptation to the curriculum within a clear framework that provided both security and direction during uncertain times.

adapting content
there was variation in the ease with which content could be adapted for emergency remote teaching. this variation related to the nature of test contents as well as the structure of the tests.
nature of content
adapting teaching modes was easier for principles of test construction such as validity and reliability. abstract constructs that underpin tests were more challenging. theoretical and operational definitions were more readily translated for remote teaching. in contrast, the more nuanced, dimensional understanding of constructs demanded more engagement than the digital platform could afford. such constructs were further illustrated using clinical examples that drew more heavily on the experience of the lecturer, as students were not exposed to clinical settings. the nature of the content intersected with the extent to which students had clinical references. the domains in cognitive assessment were challenging to teach. personality assessment was less sensitive to issues of practice effects and teaching during measurement, creating greater flexibility in how the resource could be demonstrated to students.

type of test
self-report measures, for example, structured personality assessments, use questionnaire test booklets with standardised instructions. test booklets are available in hard copy or digital form. the structured nature of these tests shifts the emphasis of training onto scoring and interpretation of client profiles, as administration is relatively uncomplicated and can be self-taught. such tests allowed students to work independently to become familiar with the test and its administration. this provided a basis for teaching technical aspects like scoring and higher order aspects such as interpretation. the additional time available to focus on interpretation of structured measures appeared to mitigate the impact of remote teaching and the limited contact. projective tests required actual teaching of test administration and scoring to ensure that students understood the highly specified administration procedures and scoring systems, in addition to interpretation and online exposure to the manifest content of stimulus cards.
in this instance, the online environment was secured to avoid the risk of violating the confidentiality of content through unauthorised auditing. thick descriptions of manifest content were provided rather than visual stimuli as part of presentation slides or screen share functions. teaching the administration and scoring of projective personality tests was done using anonymised protocols. students were given completed protocols to practise scoring. the uniformity of protocols allowed for comparison between students, and mastery was achieved through providing multiple cases to score. the number of completed scoring trials far exceeded the number that students would have achieved through actual administration. the teaching of cognitive measures in an online learning space was more challenging as it entailed mastery of complex administration and high-level interpretation. cognitive and developmental measures require the test administrator’s familiarity with various novel test components. students’ limited access to test material during lockdown substantially reduced the opportunity to familiarise themselves with the material. the challenges were both logistic, such as access to test materials, and conceptual, such as how to teach students to effectively administer, score and interpret tests without face-to-face assessments. ethical concerns such as copyright for material used (digitisation or copying) and the confidentiality and security of test content posed threats during online teaching.

experiential aspects
live assessment was not feasible, given the social distancing and travel restrictions. the clinic on campus remained closed, and physical access, for example, to science laboratories, was permitted only to a selected number of students within the allowed 30% during adjusted level 3. although online counselling services were available, no psychometric assessments were possible.
student psychologists (m1s) were not permitted to access the clinical platform and, therefore, could not visit hospital-based placements. viable alternatives had to be found to provide experiential learning. case studies effectively helped to develop clinical reasoning, selection of tests and compilation of batteries. protocols from the departmental archive provided appropriate training cases that students could use to master scoring and interpretation in an online space without the risk of direct client contact. the remedial learning on administration will take place during internships (hpcsa, 2020). the time allocated for supervision was used to focus on learning and integration from case studies and protocols. pedagogy and modalities providing epistemological access to the complex nature of psychometric assessment remains a key challenge when teaching. the requirement of high level of technical and interpretive skills is challenging to all students regardless of the prior learning experiences and individual differences in ability. it is also important to hone the skills of observing and describing behaviour during testing in the context of assessment. the ability to read emotional cues during the physical assessment is key for the trainee psychologist to master. this learning was deferred to internship because of continued restrictions that made direct client contact difficult. similarly, the teaching clinician has a reduced opportunity to use the emotional resonance in the classroom to assess and track students in terms of engagement, comprehension, emotive responses and reactivity to text content as well as the ability to learn and generalise acquired skills. during remote teaching, students typically disable the video function on their devices in order to save bandwidth and maintain privacy. this further reduces the ability to assess students learning in live, albeit digital, settings. 
this in turn detracts from the ability to understand where students may struggle, where to provide further elaboration or general encouragement. the virtual classroom blunts this effect and contributes towards feelings of isolation. the virtual classroom and online formats required additional consideration and actions in order to ensure a safe and ethically sound learning space. staff had to manage the boundaries of online classrooms by setting passwords that were valid for restricted periods and creating defined participant lists. in this way, the risk of accidental access to confidential content and test material was reduced. similarly, the end-to-end encryption on platforms maintained the confidentiality of the teaching space. the online learning environment included a number of modalities. the advantages and disadvantages of each are outlined below. narrated powerpoint slides lecturers compiled slides under the pressure of time during early recess. the quick turnaround period to revise or prepare the slides in the required format was challenging and largely dependent on the level of technical skill in the teaching complement. pre-recorded narrated presentations were supplemented with worksheet exercises and online question and answer sessions. completion of the narrated slides did not equate to the confidence that the work was effectively covered to achieve learning, mastery and integration. tracking students’ downloading of prescribed material and narrated slides did not sufficiently replace tracking engagement with the content. narrated powerpoint slides gave students flexibility to work through the material at their own time relative to the competing demands of studying from home. data vulnerability and connectivity were obstacles to students’ ability to access and engage the material. social media applications applications such as whatsapp provided a platform for contact between the staff and the students. 
student queries, quick announcements and ‘checking in’ were easily facilitated via this platform in a data-light manner. it proved to be useful to foster open communication and greater access and connection with students. in this way, it created a community of practice with the flexibility to respond even whilst otherwise occupied, for example attending a meeting. this format required explicit boundary setting and underscoring professional use of the platform to make it fit for the purpose when connecting via personal devices.

webinars

webinars presented as 90-min sessions were used as critical discussion forums after students worked through relevant content distributed through the secure online university learning management platform. these sessions facilitated ‘check-ins’ both to hear how students were doing and to identify the challenges in learning.

video recordings

video recordings of demonstrations of test administration were useful to offset the loss of contact sessions where demonstrations would have normally taken place. the recordings became resources that students could revisit if necessary. this contributed to deepened understanding and fostered learning. the management and regulation of access to such video materials were critical to ensure that the ethical obligation of keeping the content of test material confidential was met. thus, the instruction in professional and ethical conduct had to be explicit, and measures were introduced to formalise the ethical obligations for students, for example a binding confidentiality agreement. although these measures cannot guarantee that transgressions will not occur, they provide a clear account of ethical conduct in the teaching and instructional space, which confirms that the student was appropriately and properly informed. such documented processes can be subsequently used in disciplinary processes.

case studies

case studies provided an alternative to practical hands-on training during lockdown.
aspects like scoring, interpretation and report writing were facilitated through the use of case studies to integrate the various components of assessment. history taking, compilation of test batteries, observation of behaviour during testing and clinical reasoning required additional facilitation in order to develop mastery from case study methods. the former set of skills could be developed to a greater extent whilst the latter set of skills was achieved to a lesser extent than under normal circumstances.

during the pandemic, the transition to online learning was an emergency measure implemented under the pressure of time with the possibility of contact being uncertain and unconfirmed. the pedagogy, teaching practices and assumptions underpinning the psychological assessment curriculum were re-examined, evaluated and adapted with formats and modes of delivery not typically associated with professional training in general and psychometric assessment in particular. the challenge was to ensure that adaptations resulted in an inclusive learning experience. the resultant curriculum of necessity had to be dynamic, responsive to changing demands and sensitive to contextual realities of students in south africa. contextual challenges had to be considered, such as load shedding, inequitable access to data and devices, socio-economic challenges, the extent to which home environments were conducive to living and learning, increased family responsibilities during lockdown, and stereotypical gender roles and division of labour.

managing test libraries

professional training in psychometry requires access to a secure and well-managed test library. as mentioned before, the psychology department at uwc has a well-curated test library that ensures access to current test material, despite the resource constraints associated with a historically disadvantaged institution.1 during the stringent lockdown, this resource was not accessible.
during subsequent relaxation of the measures, the resource became accessible, but practical contact with clinical cases or referrals was still not possible because of the continuation of social distancing and the risk that close contact posed for infection. duplication of material was allowed for teaching purposes but posed greater challenges as the appropriate storage and management of such copies could not be ensured. thus, it was not considered feasible. the development of teaching videos became an extension of the test library subject to the same controls and regulation for storage and access. as mentioned before, this necessitated explicit instruction in and agreements about management of the said materials. thus, achieving the learning outcomes during the pandemic posed challenges to the conceptualisation and implementation of the rules for managing the confidentiality of and access to test materials. lecturer experience as described earlier, the breadth and depth of experience in the training collective was substantial. lecturers were able to draw on their teaching, supervisory and clinical experience to offset limitations in the current teaching context. the ability to provide real-life clinical references assisted students to develop a more nuanced understanding of key constructs in assessment. the depth in teaching experience across programmes and institutions provided a strong base from which to move into adaptation. thus, the confidence in the content and a clear understanding of the learning outcomes and competencies that had to be mastered meant that more attention could be focused on finding alternate modes of teaching and creating space for reflective practice. the depth of experience in supervision of psychometric assessment articulated into a good understanding of how to work developmentally and to facilitate the learning needs of students. 
in the adaptation process, this experience was important as it provided a framework and a reference for decision-making about which aspects to prioritise and which aspects to defer. the breadth and depth of experience was as important as the intentional reflexive practice in the programme, which meant that the training team was able to access the experience and had the disposition to want to access it. discussion the covid-19 pandemic necessitated emergency responses from the higher education sector in order to ensure business continuity and the completion of the 2020 academic year. in the context of professional programmes, such as clinical psychology, adaptations and accommodations had to make educational sense and satisfy professional competency and statutory requirements. the pandemic challenged the pedagogy and learning principles, underpinning professional training in psychometric assessment. course adaptations had to be contextually appropriate considering the physical, emotional, social and financial barriers students would potentially face during covid-19. the findings identified that the ability to be responsive and make meaningful changes was in part facilitated by clear frameworks from the professional bodies. the professional bodies provided an adapted statutory framework that guided curriculum changes in psychometric assessment. decisions about adaptations to curriculum were taken within this adapted framework. careful, albeit pressured, considerations of the learning outcomes and competency requirements assisted in the identification and implementation of realistic amendments to the curriculum within the broader competency framework provided by professional bodies. this was consistent with the literature on curriculum adaptation and re-curriculisation (barab & luehmann, 2003; brown, 2009). 
familiarity with the professional frameworks and minimum competency requirements for psychometric assessment practices was a facilitator of responsiveness and meaningful adaptation of the curriculum in the context of emergency response to the pandemic. the findings indicated that the conceptualisation of professional training as a process involving different phases and stakeholders promoted responsiveness. such a conceptualisation allowed the training team to identify the skills that could be deferred to internship. the training in the first year of the programme could then prioritise mastery of theoretical concepts and baseline clinical competencies that were deemed essential for the completion of first year clinical training in psychometry and progression to internship. in this way, the training could ensure that adapted learning outcomes and competency requirements were attained. this resonates with the hpcsa’s policy regarding intern psychologists, guidelines for universities, internships, training institutions and intern psychologists (2015) who underscored the importance and value of conceptualising clinical training as a cumulative and continuous process that spans theoretical training, internship and community service. the training offered in psychometric assessment in the theoretical component of the programme must be understood in the broader context of clinical training. the complementary relationship between course work, internship and supervised practice enabled decision-making about adaptations to and the timing of teaching clinical competencies. the practical value of the internship as an extension of training in assessment was underscored. the literature indicated that universities with the existing ict infrastructure and learning management systems were able to migrate faster to emergency remote teaching using digital platforms (almaiah, al-khasawneh, & althunibat, 2020; crawford et al., 2020; eltahir, 2019; shehzadi et al., 2020; uwc, 2020). 
the findings of the current study further illustrated this point. the existence of digital learning management systems at the institution facilitated an easier transition to online learning with a strong digital or online focus. this study demonstrated that the responsiveness of institutions was influenced by their governance structures. the identified institution was able to provide flexible governance and decisive leadership in academic planning and quality (uwc, 2021a, 2021b, 2021c). the readiness and responsiveness of quality assurance processes at the institution enabled the amendments to assessment schedules and curriculum changes. flexible governance reduced pressure on the system in all respects and was reflected in the lecturer experiences. the findings illustrated that adaptation to teaching practices and modes of learning was linked to the type of assessment measure and the nature of the content. theoretical concepts, psychometric properties and self-administered measures were easily taught. these required less focused instruction in and practice of administration, which in turn freed up time to focus on scoring, interpreting and report writing. projective personality tests, cognitive and developmental measures required explicit instruction in administration in addition to scoring, interpreting and report writing. immersion in real-life assessments was not possible, and the requisite skills were taught through case-based learning. the availability of a well-curated archive of developmentally appropriate cases including original protocols assisted the shift to case-based learning. similarly, the creative use of instructional videos, narrated slides, simulation and digital classrooms was useful to teach assessment. teaching in a collective also provided a community of practice and a sounding board that reduced the isolation of teaching in this context.
the depth in the experience of the teaching staff in psychometric assessment and clinical supervision contributed to the readiness and capacity to be adaptive. reflexive practice and team-based training positively influenced the process of curriculum adaptation. teaching psychometric assessment during the pandemic necessitated additional considerations in the ethical management of test materials. the findings illustrated the importance of regulating the use of test content and materials in slide presentations (hpcsa, 2019a, 2019b). in addition, it was necessary to secure the digital classroom to prevent unauthorised access and auditing. the recording of lectures and instructional videos became useful resources and required a binding agreement around the use and storage as it constituted a part of the test library. limitations of the study one of the key limitations of the study is that it focused on one of the three professional programmes in psychology offered at the institution, despite the rationale provided for the decision. the study does not include student reflections and thus is limited in terms of the insights into the subjective experiences of students of this revised programme. similarly, the article and study focused on the operational and pedagogical issues experienced by staff whilst not addressing the subjective experience of working through re-conceptualisation. the demographic information of the lecturers involved was not provided in order to protect identities, and it was not considered important for the collective-level data. in this way, some of the dynamic information about the relative position of the lecturers in the collective is lost to analysis. such an analysis, although useful, was outside the scope of what this manuscript set out to do. 
concluding remarks the present case study is reflective of the many challenges and considerations that covid-19 posed and necessitated in the training of clinical psychologists in psychometric assessment at higher education institutions. it demonstrates how learning and teaching were adapted and re-aligned within the framework of statutory and professional requirements to facilitate a digital online environment for students to still attain professional competency in psychological assessment. acknowledgements competing interests the authors declare that they have no financial or personal relationships that may have inappropriately influenced them in writing this article. authors’ contributions e.m. and m.s. participated in the conceptualisation, design, composition of the study and writing of the manuscript. all the authors participated in the reflections, preparation and critical revision of the manuscript. funding information this research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors. data availability the source of data was confidential course documents that are not available for public distribution. disclaimer this research has not been commissioned nor does it represent any affiliated agency of the authors. references almaiah, m.a., al-khasawneh, a., & althunibat, a. (2020). exploring the critical challenges and factors influencing the e-learning system usage during covid-19 pandemic. education and information technologies, 25, 5261–5280. https://doi.org/10.1007/s10639-020-10219-y barab, s.a., & luehmann, a.l. (2003). building sustainable science curriculum: acknowledging and accommodating local adaptation. science education, 87(4), 454–467. https://doi.org/10.1002/sce.10083 brown, m.w. (2009). the teacher-tool relationship. mathematics teachers at work: connecting curriculum materials and classroom instruction (pp. 17–36). new york, ny: taylor and francis. 
crawford, j., butler-henderson, k., rudolph, j., malkawi, b., glowatz, m., burton, r. … lam, s. (2020). covid-19: 20 countries’ higher education intra-period digital pedagogy responses. journal of applied learning and teaching, 3(1), 1. https://doi.org/10.37074/jalt.2020.3.1.7 creswell, j.w. (2013). qualitative inquiry and research design: choosing among five approaches. thousand oaks, ca: sage. crowe, s., cresswell, k., robertson, a., huby, g., avery, a., & sheikh, a. (2011). the case study approach. bmc medical research methodology, 11(1), 100. https://doi.org/10.1186/1471-2288-11-100 department of health. (2020). minister of higher education, science and innovation. statement on the measures to phase out the lockdown and phasing in of pset strategic functions. 30 april 2020. covid 19. online resource and news porthole. retrieved from https://sacoronavirus.co.za/2020/04/30/minister-of-higher-education-science-and-innovation-statement-on-the-measures-to-phaseout-the-lockdown-and-phasing-in-of-pset-strategic-functions eltahir, m.e. (2019). e-learning in developing countries: is it a panacea? a case study of sudan. ieee access, 7, 97784–97792. https://doi.org/10.1109/access.2019.2930411 faculty of community and health sciences. (2019). prospectus: department of psychology university of the western cape. retrieved from https://www.uwc.ac.za/faculties/chs/psychology/pages/academic-programmes.aspxon harrison, h., birks, m., franklin, r., & mills, j. (2017). case study research: foundations and methodological orientations. forum qualitative sozialforschung/forum: qualitative social research, 18, 1. health professions council of south africa (2015). policy regarding intern psychologists, guidelines for universities, internships training institutions and intern psychologists. (form 160). 
retrieved from https://www.hpcsa.co.za/uploads/psb_2019/policy%20and%20guidelines/form%20160%2028%20%20%20%20updated-%20october%202014%20final.pdf health professions council of south africa. (2018). professional board for psychology: list of accredited universities in south africa. retrieved from https://www.hpcsa.co.za/uploads/psb_2019/accredited%20universities%20in%20south%20africa%202018.pdf health professions council of south africa. (2019a). guidelines for the teaching of psychometrics at higher institutions of learning. the professional board for psychology. retrieved from https://www.hpcsa.co.za/uploads/psb_2019/guidelines%20for%20the%20teaching%20of%20psychometrics%20final.pdf health professions council of south africa. (2019b). minimum standards for the training of clinical psychology. the professional board for psychology. retrieved from https://www.hpcsa.co.za/uploads/psb_2019/policy%20and%20guidelines/sgb%20clin%20-%20revised%20october%202019.pdf health professions council of south africa. (2020). covid 19 guidelines. retrieved from https://www.psyssa.com/hpcsa-covid-19-guidelines/ lawack, v. (2020). no student will be left behind. 15 april 2020. university of the western cape. retrieved from https://www.uwc.ac.za/announcements/pages/no-student-will-be-left-behind.aspx nzimande, b. (2020a). address by minister of higher education, science and innovation, on measures to deal with the covid-19 threat in the post-school education and training sector (17/03/2020). the presidency, south african government. retrieved from https://www.polity.org.za/article/sa-blade-nzimande-address-by-minister-of-higher-education-science-and-innovation-on-measures-to-deal-with-the-covid-19-threat-in-the-post-school-education-and-training-sector-17032020-2020-03-20 nzimande, b. (2020b). minister of higher education, science and innovation: statement on progress in the implementation of measures by the post school education sector in response to covid-19 epidemic. 09 june 2020. 
retrieved from https://www.gov.za/speeches/minister-blade-nzimande-progress-implementation-coronavirus-covid-19-measures-post-school
padmanabhanunni, a. (2020). personal communication by course coordinator to the masters psychology collective. psychology department. cape town: university of the western cape.
ramaphosa, c. (2020a). measures to combat coronavirus covid-19 epidemic. the presidency, south african government. retrieved from https://www.gov.za/speeches/statement-president-cyril-ramaphosa-measures-combat-covid-19-epidemic-15-mar-2020-0000
ramaphosa, c. (2020b). escalation of measures to combat coronavirus covid-19 pandemic. the presidency, south african government. retrieved from https://www.gov.za/speeches/president-cyril-ramaphosa-escalation-measures-combat-coronavirus-covid-19-pandemic-23-mar
shehzadi, s., nisar, q.a., hussain, m.s., basheer, m.f., hameed, w.u., & chaudhry, n.i. (2020). the role of digital learning toward students’ satisfaction and university brand image at educational institutes of pakistan: a post-effect of covid-19. asian education and development studies. https://doi.org/10.1108/aeds-04-2020-0063
simons, a., munnik, e., frantz, j., & smith, m. (2019). the profile of occupational stress in a sample of health profession academics at a historically disadvantaged university in south africa. south african journal of higher education, 33(3), 132–154. https://doi.org/10.20853/33-3-3199
south african government. (2021). covid-19 alert systems. retrieved from https://www.gov.za/covid-19/about/about-alert-system
university of the western cape. (2021a). flexible learning and teaching provisioning policy draft. academic planning unit (apu). cape town: university of the western cape.
university of the western cape. (2021b). centre for innovative education and communication technologies (ciect). student orientation interventions. cape town: university of the western cape.
university of the western cape. (2021c). communication.
from the office of the executive director: human resources – update on covid-19 protocols. cape town: university of the western cape.
university of the western cape communication. (2020). workshop and training material now online and on ikamva. division for postgraduate studies. cape town: university of the western cape.
yin, r.k. (2014). case study research: design and methods. los angeles, ca: sage.

footnote

1. historically disadvantaged institutions refer to institutions in south africa created under apartheid to cater to africans and the mixed race.

about the author(s)

saleha mahomed-kola, department of psychology, faculty of humanities, university of the witwatersrand, johannesburg, south africa
aline ferreira-correia, department of psychology, faculty of humanities, university of the witwatersrand, johannesburg, south africa
casper j.j. van zyl, department of psychology, faculty of humanities, university of johannesburg, johannesburg, south africa

citation

mahomed-kola, s., ferreira-correia, a. & van zyl, c.j.j. (2022). preliminary normative data for the hooper visual organization test for a south african sample. african journal of psychological assessment, 4(0), a64. https://doi.org/10.4102/ajopa.v4i0.64

original research

preliminary normative data for the hooper visual organization test for a south african sample

saleha mahomed-kola, aline ferreira-correia, casper j.j. van zyl

received: 25 june 2021; accepted: 25 mar. 2022; published: 30 may 2022

copyright: © 2022. the author(s). licensee: aosis. this is an open access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
abstract the hooper visual organization test (hvot) was designed to measure an individual’s ability to organise visual stimuli and assess visual-spatial abilities and synthesis. the current investigation sought to explore the psychometric properties of the hvot and develop normative data for south africans who do not speak english as a first language and who received primary and secondary public education. the research design was cross-sectional and the hvot was administered to healthy adults (n = 111) and a clinical group (n = 17) whose ages ranged between 19 and 70 years and had an education of between 6 and 22 years. the clinical group was made up of huntington’s disease patients (hd/hdl2). reliability indicators (mcdonald’s omega and rasch person reliability index) were satisfactory. the hvot fit the rasch model well, although item locations deviated somewhat from the expected monotonic increase in item difficulties. statistically significant differences in total scores were observed across age, education and gender groups, forming the basis of the norms presented in this paper. a few items across these groups were flagged for potential differential item functioning. several statistically significant associations with the montreal cognitive assessment (moca) were observed. these were consistent with theoretical expectations and provided evidence of convergent validity. the clinical group performed worse than the control group when mean total hvot scores were compared. preliminary norms stratified by age, gender and years of education are presented. future studies should include larger sample sizes and additional research on the influence of gender on the total hvot score is needed. keywords: hvot; neuropsychological assessment; norms; psychometric properties; visuospatial ability; huntington’s disease; moca; rasch model. 
introduction the hooper visual organization test (hvot) (hooper, 1958) is a neuropsychology test of visual-spatial function and visual organisation. it was developed in 1958 and consists of 30 drawings of common objects and animals that are segmented into two or more pieces, which require mental rotation to identify and name each item (giannakou & kosmidis, 2006; hooper, 1958; lezak et al., 2012). it is scored by awarding one point for a correct response. some items allow for partially correct responses, which receive half a credit, and zero credit is given to incorrect responses (hooper, 1958). the standardised norms provided by hooper (1958) were used to formulate cut-off scores that reduced the number of misclassification of individuals in each normative age group (hooper, 1958) and cut-off scores of 20 to 25 were recommended (hooper, 1958). the hvot is easy to administer and sensitive to the detection of visuospatial deficits that link to a wide range of neuropathologies (booth & happé, 2018; boyd, 1981; eberson, 2014; ferreira-correia, anderson, cockcroft & krause, 2020; gasparini et al., 2008; mitolo et al., 2016; paxton et al., 2007; sanz cortés, olivares crespo & barcia albacar, 2011). it has been found to be valid and reliable in different populations (campagna & ferreira-correia, 2021; giannakou & kosmidis, 2006; greve, lindberg, bianchini & adams, 2000; lin, su, guo & wuang, 2012; lopez, lazar & oh, 2003), although differences in cultural item appropriateness and item ranking have been noted (merten & beal, 1999; su, lin, wu & wuang, 2013). moreover, achievement on the test seems to be influenced by age (devries, 2005; miller et al., 2015; su et al., 2013), level of education and gender (campagna & ferreira-correia, 2021; elias et al., 2011; giannakou & kosmidis, 2006; merten & beal, 1999). despite its clinical potential, the use of the hvot in south africa is limited by the lack of psychometric and normative data. 
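as a minimal sketch of the scoring rule described above (one point for a correct response, half a credit for a partially correct response on items that allow it, and zero for an incorrect response), the following python snippet totals a set of responses. the answer key, the item numbers and the function names are hypothetical and chosen purely for illustration; they are not taken from the hvot manual or from the present study.

```python
from typing import Dict, NamedTuple, Set


class ItemKey(NamedTuple):
    """Accepted answers for one item (illustrative structure, an assumption)."""
    full: Set[str]   # responses that earn one point
    half: Set[str]   # responses that earn half a credit (may be empty)


# Hypothetical three-item key for illustration only; NOT the actual HVOT key.
EXAMPLE_KEY: Dict[int, ItemKey] = {
    1: ItemKey(full={"apple"}, half={"fruit"}),
    2: ItemKey(full={"chair"}, half=set()),
    3: ItemKey(full={"dog"}, half={"animal"}),
}


def score_item(response: str, key: ItemKey) -> float:
    """Return 1.0, 0.5 or 0.0 for a single response."""
    answer = response.strip().lower()
    if answer in key.full:
        return 1.0
    if answer in key.half:
        return 0.5
    return 0.0


def total_score(responses: Dict[int, str], key: Dict[int, ItemKey]) -> float:
    """Sum item scores; a missing response is scored as incorrect."""
    return sum(score_item(responses.get(item, ""), item_key)
               for item, item_key in key.items())


example_responses = {1: "Apple", 2: "table", 3: "animal"}
print(total_score(example_responses, EXAMPLE_KEY))  # 1.0 + 0.0 + 0.5 = 1.5
```

with a complete 30-item key, the same function would return a total between 0 and 30 in half-point steps, which is the quantity that cut-off scores and norms are applied to.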
it is widely accepted that using foreign norms in south africa is not an adequate practice, but it is often the only option because local norms are unavailable. this issue is further compounded by the fact that country-wide norms are also not appropriate in south africa. this is because of the major socioeconomic inequalities and disparities of educational opportunities connected with ethnicity (watts & shuttleworth-edwards, 2016), as well as cultural and linguistic diversity (foxcroft, paterson, le roux & herbst, 2004). these in turn are linked to wide differences between the standardised test scores of different samples, especially when south africans are compared against foreign norms (lucas, 2013). this study intends to mitigate the socio-cultural biases in neurocognitive assessment by exploring the psychometric value of the hvot in a sample of south africans who do not speak english as a first language and who attended public school. specifically, the objectives of the study are as follows: (1) to examine the reliability, (2) to determine whether demographic variables (age, years of education and/or gender) are associated with better performance on the hvot, (3) to evaluate item difficulty through rasch analysis, (4) to explore the diagnostic and convergent validity of the hvot and (5) to provide normative data for the hvot for a homogenous south african sample. methods participants secondary data obtained from a study titled ‘the neurocognitive profile of huntington’s disease-like 2’ were used for this study (ferreira-correia, 2019). the data were collected in stages where participants for the control and clinical huntington’s disease / huntington’s disease-like 2 (hd/hdl2) groups were recruited. given the rarity of hd/hdl2 in the south african population within the stipulated demographics, only 18 patients participated in the study.
for the recruitment of the participants for the control sample, the researcher aimed to match the hd/hdl2 participants in terms of age, years of education, and language (english not first language). data collection occurred simultaneously for both the control and clinical groups. both groups were formed by means of purposive homogenous sampling (leedy & ormrod, 2015), so as to better match the demographics of the patients. the sample is described in table 1. table 1: demographic characteristics of the clinical (hd/hdl2) and control groups. to be included in the study, participants needed to be able to speak english. potential participants with comorbid neurological or metabolic diseases, history of traumatic brain injury with loss of consciousness, abuse of illegal drugs, and/or who did not give formal consent were not included. design this investigation is non-experimental as none of the variables have been manipulated. it involves a cross-sectional design because it investigates particular variables retrospectively, without directly interfering with them (field, 2018). instruments a demographic questionnaire (ferreira-correia, 2019) was used to collect data on age, level of education, occupation, language experience, gender and other medical variables relevant for the original study. the montreal cognitive assessment (moca) (nasreddine et al., 2005) was administered after the demographic questionnaire. this screening tool includes items measuring executive function and visuospatial ability, confrontational naming, short-term memory, attention and working memory, language, concentration, verbal abstraction and orientation (nasreddine et al., 2005). each functional area was scored separately and then points were added for a total maximum score of 30 points. hooper visual organization test (hooper, 1958) – the hvot consists of 30 line drawings that depict uncomplicated objects which have been cut into pieces and placed in a puzzle-like manner.
the hvot was the sixth test in a battery of 12 neuropsychological tests given to participants. during administration, the original version of the test was used, and the following protocol was adhered to (hooper, 1958): if participants were unable to respond in english, answers were accepted in other languages and subsequently translated into english by a research assistant who was proficient in several south african languages. the scoring rules of the manual were followed, although adjustments to accommodate for linguistic variances were made. for example, item 3, ‘bench’ was awarded a full point; item 4, ‘flying machine’ was awarded a full point; item 5, a full point was awarded if participants named any round ball, whilst ‘football’ and ‘rugby ball’ were awarded half a point; item 7, ‘sheep’ and ‘lamb’ were awarded a full point, and ‘animal’ was awarded half a point; item 8, ‘lorry’ was awarded a full point, ‘car’ or ‘vehicle’ was given half a point; item 9, ‘mug’ was awarded a full point, ‘jug’ was scored half a point; item 11, ‘peach, tomato, pumpkin, pear and the like’ were awarded a full point, and ‘fruit’ was awarded half a point; item 14, ‘hockey stick, walking stick and stick’ were awarded a full point; item 15, ‘boat and ship’ were awarded full points; item 16, ‘kettle and teapot’ were awarded full points; item 17, ‘couch’ was awarded a full point and ‘sofa’ was awarded half a point; item 19, ‘kettle’ was awarded a full point; item 20, ‘animal’ was awarded half a point; item 21, ‘pansy’ etc. were awarded full points; item 22, ‘rat, guinea pig’ etc. 
were awarded full points and ‘animal’ was awarded half a point; item 23, ‘bible and dictionary’ were awarded a full point; item 24, ‘animal’ was awarded half a point; item 25, ‘cube’ was awarded a full point; item 26, ‘house of the sea’ was awarded a full point, and ‘tower, castle, watch tower, church, tower or high place’ were awarded half points; item 27, ‘boot’ was awarded a full point; and item 29, ‘diamond ring’ was awarded a full point. procedure the human research ethics committee (medical) [redacted] granted ethical clearance for this study (clearance certificate number [redacted]). the original data for the study [redacted] were collected after obtaining clearance from the human research ethics committee (medical) (clearance certificate number: [redacted]). the helsinki declaration and the singapore statement on research integrity (resnik & shamoo, 2011) were honoured. participants received written information and a detailed briefing about the study. volunteers signed a written consent form before they participated in the assessment. the assessment was conducted in one session of approximately 2 h. the data were captured in research electronic data capture (redcap ®) (harris et al., 2009) and then analysed and reported on. for the purpose of the current project, the data obtained from the demographic questionnaire, the hvot and the moca were used. the hvot tests were scored by a registered neuropsychologist ([redacted]). quality control of the data was conducted by implementing two additional rounds of blinded scoring conducted by an independent clinical psychologist and by the first author ([redacted]). statistical analysis and report writing were the final steps of this project. data analyses data were analysed using spss statistics 26, winsteps (version 4.8.0.0; linacre, 2015) and the psych package (revelle, 2021) in r (r core team, 2021).
distribution of the data was explored through tests of normality which were run on the hvot total scores of the healthy participant group. the sample was described using frequency distributions, measures of central tendency and variability in order to better define demographic variables and ranges to use for the normative data. pearson’s correlations and independent samples t-tests were used to explore group differences on the total score of the hvot. mcdonald’s coefficient omega (ωt) was computed to evaluate the internal consistency reliability of the hvot. item response theory (specifically, rasch analysis) was used to examine the construct validity of the hvot. a partial credit model was computed to investigate fit to the rasch model. infit mean square values close to one were expected, with values > 0.60 and < 1.40 considered to fall within an acceptable range. item fit values outside this range were thought to misfit the model, with values < 0.60 and > 1.40 indicating overfit and underfit, respectively. in addition, we examined differential item functioning (dif) across the age, gender and education groups to ensure that the observed group differences were not a result of item bias (bond & fox, 2015). diagnostic validity was investigated by using an independent samples t-test to compare hvot total scores across diagnostic groups. convergent validity was examined by correlating the total score of the hvot with the selected scores on the moca. ethical considerations the human research ethics committee (medical) university of the witwatersrand granted ethical clearance for this study (clearance certificate number m200669). the original data for the study ‘the neurocognitive profile of huntington’s disease-like 2’ were collected after obtaining clearance from the human research ethics committee (medical) (clearance certificate number: m140872). the helsinki declaration and the singapore statement on research integrity (resnik & shamoo, 2011) were honoured.
participants received written information and a detailed briefing about the study. volunteers signed a written consent form before taking part in the assessment. the assessment was conducted in one session of approximately 2 h. results the data were symmetrical and normally distributed (skewness statistic = 0.162; kurtosis statistic = –0.490; kolmogorov smirnov = 0.079, df (111), sig = 0.089; and shapiro wilk = 0.981, df (111), sig = 0.108). reliability (mcdonald’s omega, ωt = 0.90; rasch person separation index = 0.87) for the 30 items of the hvot was excellent. pearson’s correlations revealed a significant (p < 0.01) moderate negative correlation between age and the total hvot score (r = –0.368), suggesting lowered scores in older participants. there was a weaker positive correlation between gender and hvot total score (r = 0.268), which suggests that women (n = 63, mean = 19.413, sd = 5.1223) performed better than men (n = 48, mean = 16.479, sd = 5.4762). there was a moderate positive correlation between years of education and hvot total score (r = 0.343), indicating that participants with a higher level of education (12–22 years) performed better than those with a lower level of education (2–11 years). we tested for group differences across age, gender and education. importantly, only two groups were created for age and years of education, as further splitting would have made the groups too small given the sample size. the independent t-tests are reported in table 2. statistically significant differences across the groups were observed with women scoring higher than men (p = 0.004); younger participants (aged 19–40 years) scoring higher (p = 0.004) than older participants (aged 41–70 years); and individuals with more education (12–22 years) scoring higher (p ≤ 0.001) than those with less education (2–11 years). to ensure that these observed mean score differences are meaningful and not a result of item bias, we also tested for dif.
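as a worked illustration of the gender comparison, the sketch below recomputes an independent samples t statistic and cohen's d from the summary statistics quoted above (women: n = 63, mean = 19.413, sd = 5.1223; men: n = 48, mean = 16.479, sd = 5.4762). it assumes a classical student's t-test with a pooled standard deviation; whether equal variances were assumed in the original analysis is not stated in the text, and the function names are hypothetical.

```python
import math


def pooled_sd(n1: int, sd1: float, n2: int, sd2: float) -> float:
    """Pooled standard deviation of two independent groups."""
    pooled_variance = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return math.sqrt(pooled_variance)


def students_t(n1: int, m1: float, sd1: float,
               n2: int, m2: float, sd2: float) -> float:
    """Independent samples t statistic, assuming equal variances."""
    standard_error = pooled_sd(n1, sd1, n2, sd2) * math.sqrt(1 / n1 + 1 / n2)
    return (m1 - m2) / standard_error


def cohens_d(n1: int, m1: float, sd1: float,
             n2: int, m2: float, sd2: float) -> float:
    """Standardised mean difference based on the pooled SD."""
    return (m1 - m2) / pooled_sd(n1, sd1, n2, sd2)


# Summary statistics for women (group 1) and men (group 2) from the text.
t_gender = students_t(63, 19.413, 5.1223, 48, 16.479, 5.4762)
d_gender = cohens_d(63, 19.413, 5.1223, 48, 16.479, 5.4762)
degrees_of_freedom = 63 + 48 - 2  # 109
```

under these assumptions the t statistic is about 2.90 on 109 degrees of freedom and the standardised difference is about 0.56.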
the dif results are reported below as part of the rasch analysis. table 2: independent samples t-test for age, gender and years of education and the hooper visual organization test total score. the clinical group (table 2) had a statistically significantly lower mean hvot total score (11.30 ± 6.56) compared to the control group (18.14 ± 5.45). these results provide good evidence for the diagnostic validity of the hvot. whilst the effect size is large (d = 1.8), the difference in sample size between the clinical and control group should still be noted, as it likely affected the statistical power of this test, increasing the chance of a type i error. table 3 presents results of the rasch analysis. the item measure column indicates item ‘difficulty’. it shows that the items do in general increase in difficulty from the beginning to the end of the measure, although there is substantial deviation from the expected monotonic progression. whilst items 25 (block), 29 (ring), 28 (key), 30 (broom) and 26 (lighthouse) appear to be the most difficult items, the easiest items included 1 (fish), 2 (saw), 3 (table), 11 (apple) and 7 (dog). the difficulty estimates for items 7, 11 and 25 are examples of surprising results, with item 25 being relatively more difficult than expected, whereas items 7 and 11 were relatively easier than expected. table 3: rasch fit and differential item functioning statistics. in general, however, the items of the hvot fit the expectations of the rasch model well. the infit mean square values are all reasonably close to the expected value of one, although items 12 and 27 had relatively larger values, leaning towards underfit. with regard to dif, slight variation was observed in the item location parameters across the groups of interest, with some items being relatively more difficult for one group whilst other items were somewhat more difficult for the other group. such variation is expected, and in general, has the effect of cancelling out.
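the infit criterion used in the rasch analysis (mean square values between 0.60 and 1.40 treated as acceptable, smaller values as overfit and larger values as underfit) can be written as a small classification rule. this is a sketch under stated assumptions: the function name and the example values are hypothetical, and treating values exactly equal to 0.60 or 1.40 as acceptable is a choice made here, because the text does not say how boundary values were handled.

```python
def classify_infit(mean_square: float,
                   lower: float = 0.60,
                   upper: float = 1.40) -> str:
    """Classify an infit mean square value against the acceptable range."""
    if mean_square < lower:
        return "overfit"    # responses more predictable than the model expects
    if mean_square > upper:
        return "underfit"   # responses noisier than the model expects
    return "acceptable"


# Hypothetical infit values, not taken from table 3.
example_infit = {"item a": 0.55, "item b": 1.02, "item c": 1.47}
example_labels = {item: classify_infit(value)
                  for item, value in example_infit.items()}
```

an item with a value near one, such as the hypothetical item b, fits the model as expected.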
a few items, however, were flagged for dif. slight dif was observed on items 5, 1 and 6 for age, gender and education respectively. moderate to large dif was flagged on items 3, 9, 23, and 38 for age; item 3 for gender; and items 3 and 11 for education. whilst concerning, in this preliminary research on the hvot, these findings should probably just be noted as such given the modest sample size, requiring further research with larger samples in future. should the same items again be flagged in subsequent work, there might be stronger reason to investigate possible causes for the observed dif, and if no substantive reason can be identified, one could consider amending these items, or excluding them from the measure entirely should the problem persist. at this stage, however, such actions would be premature. table 4 presents the frequency of full scores, half scores and zero scores for each of the hvot items. the items for which participants scored zero most frequently included 25 (block), 29 (ring), 28 (key), 30 (broom) and 26 (lighthouse), suggesting that these were the most difficult items. the easiest items, with the highest percentage of correct answers in the sample, were 1 (fish), 2 (saw), 3 (table), 11 (apple) and 7 (dog). table 4: frequencies of full credit, half credit and no credit for each hooper visual organization test item. convergent validity was explored by correlating the total scores of the hvot with different domains of the moca (table 5). there were statistically significant correlations between all the respective domains and the moca total score with the hvot total score, except for the moca orientation total. a moderate positive correlation was noted between the hvot and the moca language total (r = 0.564, p ≤ 0.001). this means that participants who achieve a high score on the hvot will likely achieve a high language total score on the moca.
there was a moderate, positive correlation between the hvot and the moca delayed recall total (r = 0.395, p ≤ 0.01). therefore, participants who obtain a high hvot score are likely to achieve a high score on the moca delayed recall subtest, and vice versa. a moderate positive correlation was also noted between the hvot and the moca naming total (r = 0.354, p ≤ 0.01). consequently, high hvot scores are likely to be accompanied by high moca naming total scores. similarly, a weak positive correlation was observed between the moca visuospatial executive total and the hvot total (r = 0.193, p ≤ 0.005). lastly, a moderate positive correlation was noted between the hvot and the moca total score (r = 0.548, p ≤ 0.001), indicating a tendency for high scores on the hvot to be accompanied by high total scores on the moca. table 5: pearson correlation coefficient showing the relationship between the hooper visual organization test total score and different subtests of the montreal cognitive assessment. preliminary norms for the hvot are presented in table 6, stratified by age, level of education and gender. it is important to note that the age group 19–40 years with an education of 2–11 years, for both men and women, was excluded because of small sample sizes. the groups which showed the highest performance were men and women between the ages of 19–40 years with an education of 12–22 years. lowest performance was seen in male participants in the 41–70 years-of-age category with an education of 2–11 years. table 6: hooper visual organization test normative performance of a south african sample stratified by age, education and gender (percentiles). discussion the primary focus of the present study was to develop stratified hvot norms for a south african sample of participants who do not speak english as a first language and who have attended public primary and secondary schools.
by selecting this specific sample, our study mitigated the effects of multilingualism or bilingualism and quality of education as sources of biases in cognitive tests (watts & shuttleworth-edwards, 2016). for this, the effects of sociodemographic variables (age, number of years of formal education and gender) on the hvot total score were investigated. mcdonald’s omega was used to determine the internal consistency reliability of the hvot. item response theory was used to further examine item functioning on the hvot. this study also provided evidence of diagnostic and convergent validity, which is necessary for the evaluation of the clinical utility of this test in south africa. in this study, the highest means were obtained by the youngest and most educated groups (female mean = 20.5/sd = 4.6 and male mean = 20.5/sd = 5.1), but these were lower than the cut-off point of 21 suggested by hooper (1958) and lower than the scores of demographically similar groups from greece (mean = 25.43/sd = 2.17) (giannakou & kosmidis, 2006) and from venezuela (50th percentile = 25) (campagna & ferreira-correia, 2021). our results indicate that people with more education performed better than those with less. the effects of age on the total score of the hvot are illustrated by the common use of this variable in the norms stratification (devries, 2005; hooper, 1983; tamkin & jacobsen, 1984). years of education are included less frequently in the hvot norms, despite the impact of this variable on cognitive performance, and more specifically on visuospatial function (roldán-tapia, cánovas, león & garcía-garcia, 2017). two exceptions are the hvot norms for the venezuelan and greek populations (campagna & ferreira-correia, 2021; giannakou & kosmidis, 2006). in our study, women also performed better than men.
although a gender bias in the visuospatial functions has been suggested (hatta et al., 2015; parsons et al., 2004), our study challenges this notion: some studies suggest that men outperform women (campagna & ferreira-correia, 2021; hatta et al., 2015), whereas other reports did not find any significant relationship between gender and hvot scores (giannakou & kosmidis, 2006). future studies in south africa should explore the potential contribution of gender towards the total score of the hvot whilst controlling for age and years of education. whilst the group differences described above are noteworthy, results from the dif analysis should be noted. the items flagged for dif across age, gender and education may have contributed somewhat, and to varying degrees, to these observed mean score differences. however, the influence of these items is likely to be minor, as relatively few items were affected in each case. these results should be investigated and confirmed in future research with larger samples to determine if the dif results observed in this study are indeed robust. the results from the current investigation suggest that the hvot has good reliability with satisfactory estimates observed for both mcdonald’s omega total and rasch person reliability. these were consistent with reliability estimates reported in other work (campagna & ferreira-correia, 2021). item response theory analysis supported the construct validity of the hvot with all items fitting the rasch model. when inspecting the item location parameters, there was a clear progression in item difficulty with a few unexpected results, with some items being easier than anticipated whilst others were more difficult. when comparing the item frequencies reported in table 4 to those of devries (2005), none of the items in this sample obtained 100% correct responses. item one (fish), however, was considered to be the easiest, with only 1% of the participants giving incorrect responses.
surprisingly, item 11 was amongst the easiest items, whilst item 25 (block) was one of the most difficult. hence, when clinicians administer the hvot, the order of administration should follow empirical data that is context specific (campagna & ferreira-correia, 2021). presenting the items in order of difficulty makes it possible to apply a discontinuation rule (campagna & ferreira-correia, 2021), which yields good discriminatory power (wetzel & murphy, 1991). it also decreases the time taken to administer the test as well as levels of fatigue for the patient (campagna & ferreira-correia, 2021). however, given the findings of the current study, no discontinuation rule should be applied when administering the hvot in the south african context. in this study, we presented the hvot norms for the south african population stratified by age, gender and years of education. all these variables were significantly correlated with the total hvot score. to our knowledge, no other norms for the hvot are available in this country. the mean hvot scores of adults in georgia, stratified by age (tamkin & jacobsen, 1984), were similar to those obtained for the specific south african population defined previously. also, a more recent normative study conducted on the venezuelan population demonstrated a significant association of hvot scores with age, gender and level of education (campagna & ferreira-correia, 2021), much like the current study. it is, however, important to consider that the generalisability of the current norms may be questionable. one of the reasons is that the use of quota sampling represented a limitation because some of the resulting subgroups were too small to be representative of a particular set of demographics (e.g. the age group of 19–40 years who had an education level of 2–11 years for both men and women).
a potential confounding association between naming abilities and performance on the hvot has been reported (greve et al., 2000), but it has been challenged (paolo, cluff & ryan, 1996). in our investigation, a significant, moderate correlation was evident between the hvot and the moca language total (r = 0.564, p < 0.001) and the moca naming total (r = 0.354, p < 0.001). this study therefore supports the claim that naming ability may have an impact on hvot performance. additional research should consider incorporating other psychometric measures apart from the moca in order to further validate the hvot’s association with naming ability, and south african clinicians should consider assessing naming ability in the language of assessment when using the hvot. furthermore, the significant correlation between the total moca and the majority of the sub-component scores and the hvot total score would support the argument that the hvot can act as a screening test. studies have shown that visuo-perceptive tests like the hvot can be multifactorial (campagna & ferreira-correia, 2021; devries, 2005) as they indirectly recruit several cognitive functions beyond the core one. this may support the value of the hvot as a screening tool, although the limitations of these tasks as diagnostic tools should be kept in mind (roebuck-spencer et al., 2017). furthermore, the fact that answers can be accepted in different languages allows for assessment of linguistically diverse patients. however, this needs to be further explored in studies that better control for language of response: although answers in other languages were accepted, answering in the participants’ home language was not overtly encouraged. therefore, the impact of these linguistic variables (naming capacity, english proficiency, and choosing to provide answers in different languages) on the psychometric properties of the hvot remains to be investigated.
in terms of the hvot’s ability to discriminate between the normal control and clinical group, it was apparent that the clinical group performed significantly worse than the healthy control group. this supports literature which states that patients with hd/hdl2 often present with visuo-constructive deficits (gómez-tortosa, del barrio, barroso & garcía ruiz, 1996), and that the hvot is known to be able to discriminate these cognitive dysfunctions in patients (azambuja et al., 2012). given that this study only included a small sample of the hd/hdl2 clinical population, the generalisability of the results to hd/hdl2 and other pathologies is limited. both the clinical and control groups were small. however, literature suggests that it is better to make use of well-matched, small homogenous groups (n > 5) rather than large, heterogenous groups (crawford & garthwaite, 2012). a well-defined and homogenous sample was selected for this project and is representative of a large proportion of the south african population. therefore, this study may have significant value for clinicians using the hvot in this context, despite the small sample size. implications and recommendations south african clinicians working with patients with demographic characteristics similar to our sample are encouraged to use the adapted version of the hvot and the stratified norms provided in order to reduce the biases caused by the use of non-representative norms. future studies should expand the current control sample to include participants with different demographic characteristics (e.g. english first language speakers and younger and older adults) and better explore the construct validity of this test.
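to show how the stratified norms might be applied in practice, the sketch below maps a person's age, years of education and gender onto the strata reported in this study (age 19–40 or 41–70 years; education 2–11 or 12–22 years) and returns no stratum when the person falls outside these bands or into the excluded cell (19–40 years with 2–11 years of education). the function and constant names are hypothetical, and no normative values are included; these would have to be read from table 6.

```python
from typing import Dict, Optional, Tuple

# Bands used for stratification in this study (inclusive bounds).
AGE_BANDS: Dict[str, Tuple[int, int]] = {"19-40": (19, 40), "41-70": (41, 70)}
EDUCATION_BANDS: Dict[str, Tuple[int, int]] = {"2-11": (2, 11), "12-22": (12, 22)}

# Cell excluded from the norms because of small sample sizes.
EXCLUDED_CELLS = {("19-40", "2-11")}


def _find_band(value: int, bands: Dict[str, Tuple[int, int]]) -> Optional[str]:
    """Return the label of the band containing value, or None."""
    for label, (low, high) in bands.items():
        if low <= value <= high:
            return label
    return None


def norm_stratum(age: int, education_years: int,
                 gender: str) -> Optional[Tuple[str, str, str]]:
    """Return (age band, education band, gender), or None if no norms apply."""
    if gender not in ("female", "male"):
        raise ValueError("gender must be 'female' or 'male'")
    age_band = _find_band(age, AGE_BANDS)
    education_band = _find_band(education_years, EDUCATION_BANDS)
    if age_band is None or education_band is None:
        return None
    if (age_band, education_band) in EXCLUDED_CELLS:
        return None
    return (age_band, education_band, gender)


example_stratum = norm_stratum(55, 9, "male")
```

returning no stratum, rather than the nearest one, keeps the lookup from silently applying norms to someone the sample does not represent.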
conclusion this study represents an important contribution to the literature on psychological assessment in south africa, as it demonstrates the psychometric properties and potential of the hvot and provides preliminary stratified normative data for south african polyglot adults who do not speak english as a first language and attended public schools. the test yielded good reliability and good convergent and diagnostic validity, although the item difficulty values did not follow the expected monotonic increase. the total hvot score correlated significantly with age, years of education and gender. acknowledgements we are grateful to all the participants who volunteered their time to be part of this study. the article is based on the first author’s thesis, submitted to the university of the witwatersrand for the degree of master of arts in social and psychological research. the data reported in this article were collected for the phd thesis of the second author (ferreira-correia, 2019). competing interests the authors declare that they have no conflict of interests, financial or personal, that may have inappropriately influenced them in writing this article. authors’ contributions s.m.-k. was responsible for the study concept and design, quality control of the data, data analysis and interpretation, and writing of the manuscript. a.f.-c. was responsible for the study concept and design, organisation, acquisition and capturing of data, supervision of the study, and critical revision of the manuscript for important intellectual content. c.j.j.v.z. was responsible for the psychometric data analysis and critical revision of the manuscript for important intellectual content. funding information the assessment of the hdl2 cases was partially financed by the medical research council’s self-initiated research grant, south africa entitled ‘the clinical and genetic profile of huntington disease-like 2 (hdl2) in south africa’.
data availability the anonymised data set that supports the findings of this study is available on request from the corresponding author. disclaimer the views expressed in the present article are those of the authors alone and not an official position of the institutions they are affiliated with. references ardila, a., ostrosky-solis, f., rosselli, m. & gomez, c. (2000). age-related cognitive decline during normal aging: the complex effect of education. archives of clinical neuropsychology, 15(6), 495–513. https://doi.org/10.1093/arclin/15.6.495 azambuja, m.j., radanovic, m., haddad, m.s., adda, c.c., barbosa, e.r. & mansur, l.l. (2012). language impairment in huntington’s disease. arquivos de neuro-psiquiatria, 70(6), 410–415. https://doi.org/10.1590/s0004-282x2012000600006 booth, r.d.l. & happé, f.g.e. (2018). evidence of reduced global processing in autism spectrum disorder. journal of autism and developmental disorders, 48(4), 1397–1408. https://doi.org/10.1007/s10803-016-2724-6 bond, t.g. & fox, c.m. (2015). applying the rasch model: fundamental measurement in the human sciences (3rd ed.). mahwah, nj: l. erlbaum. boyd, j.l. (1981). a validity study of the hooper visual organization test. journal of consulting and clinical psychology, 49(1), 15–19. https://doi.org/10.1037/0022-006x.49.1.15 campagna, l. & ferreira-correia, a. (2021). hooper visual organization test: psychometric properties and regression-based norms for the venezuelan population. retrieved from http://mc.manuscriptcentral.com/hapn crawford, j.r. & garthwaite, p.h. (2012). single-case research in neuropsychology: a comparison of five forms of t-test for comparing a case to controls. cortex, 48(8), 1009–1016. https://doi.org/10.1016/j.cortex.2011.06.021 devries, m.r. (2005). analysis of group differences and predictors of hooper visual organization test scores. western michigan university. retrieved from https://scholarworks.wmich.edu/dissertations/1027 eberson, s.c. (2000).
visuospatial impairments in alzheimer’s disease and huntington’s disease. masters thesis, california university. california, usa: california state university san marcos. https://scholarworks.calstate.edu/concern/theses/tm70mv67f elias, m.f., dore, g.a., goodell, a.l., davey, a., zilioli, m.k.c., brennan, s. & robbins, m.a. (2011). normative data for elderly adults: the maine-syracuse study. experimental aging research, 37(2), 142–178. https://doi.org/10.1080/0361073x.2011.554511 ferreira-correia, a. (2019). the neurocognitive profile of huntington disease-like 2: a comparison with huntington disease and healthy controls. phd thesis. johannesburg: university of the witwatersrand. ferreira correia, a., anderson, d.g., cockcroft, k. & krause, a. (2020). the neuropsychological deficits and dissociations in huntington disease-like 2: a series of case control studies. neuropsychologia 136, 107238. https://doi.org/10.1016/j.neuropsychologia.2019.107238 field, a. (2018). discovering statistics using ibm spss statistics (5th ed.). uk, london: sage. foxcroft, c., paterson, h., le roux, n. & herbst, d. (2004). psychological assessment in south africa: a needs analysis: the test use patterns and needs of psychological assessment practitioners: final report. pretoria: human science research council. gasparini, m., hufty, a.m., masciarelli, g., ottaviani, d., angeloni, u., lenzi, g.l. & bruno, g. (2008). contribution of right hemisphere to visual imagery: a visual working memory impairment? journal of the international neuropsychological society, 14(5), 902–911. https://doi.org/10.1017/s1355617708080995 giannakou, m. & kosmidis, m.h. (2006). cultural appropriateness of the hooper visual organization test? greek normative data. journal of clinical and experimental neuropsychology, 28(6), 1023–1029. https://doi.org/10.1080/13803390591004374 gómez-tortosa, e., del barrio, a., barroso, t. & garcía ruiz, p.j. (1996). 
visual processing disorders in patients with huntington’s disease and asymptomatic carriers. journal of neurology, 243(3), 286–292. https://doi.org/10.1007/bf00868528 greve, k.w., lindberg, r.f., bianchini, k.j. & adams, d. (2000). construct validity and predictive value of the hooper visual organization test in stroke rehabilitation. applied neuropsychology, 7(4), 215–222. harris, p.a., taylor, r., thielke, r., payne, j., gonzalez, n. & conde, j.g. (2009). research electronic data capture (redcap) – a metadata-driven methodology and workflow process for providing translational research informatics support. journal of biomedical informatics, 42(2), 377–381. https://doi.org/10.1016/j.jbi.2008.08.010 hatta, t., iwahara, a., hatta, t., ito, e., hatta, j., hotta, c., … hamajima, n. (2015). developmental trajectories of verbal and visuospatial abilities in healthy older adults: comparison of the hemisphere asymmetry reduction in older adults model and the right hemi-ageing model. laterality, 20(1), 69–81. https://doi.org/10.1080/1357650x.2014.917656 hooper, h.e. (1958). the hooper visual organization test manual. california, usa: western psychological services. leedy, p.d. & ormrod, j.e. (2015). practical research planning and design (11th ed.). new york, usa: pearson. lezak, m.d., howieson, d.b., bigler, e.d. & tranel, d. (2012). neuropsychological assessment. (5th ed.). usa, new york: oxford university press. linacre, j.m. (2015). winsteps rasch measurement computer program user’s guide. beaverton, or: winsteps.com. lin, y.-h., su, c.-y., guo, w.-y. & wuang, y.-p. (2012). psychometric validation and normative data of a second chinese version of the hooper visual organization test in children. research in developmental disabilities, 33(6), 1919–1927. https://doi.org/10.1016/j.ridd.2012.05.016 lopez, m.n., lazar, m.d. & oh, s. (2003). psychometric properties of the hooper visual organization test. assessment, 10(1), 66–70. 
https://doi.org/10.1177/1073191102250183 lucas, m.d. (2013). neuropsychological assessment in south africa. in s. laher & k. cockcroft (eds.), psychological assessment in south africa (pp. 186–200). johannesburg: wits university press. merten, t. & beal, c. (1999). an analysis of the hooper visual organization test with neurological patients. the clinical neuropsychologist, 13(4), 521–529. https://doi.org/10.1076/1385-4046(199911)13:04;1-y;ft521 miller, i.n., himali, j.j., beiser, a.s., murabito, j.m., seshadri, s., wolf, p.a. & au, r. (2015). normative data for the cognitively intact oldest-old: the framingham heart study. experimental aging research, 41(4), 386–409. https://doi.org/10.1080/0361073x.2015.1053755 mitolo, m., hamilton, j.m., landy, k.m., hansen, l.a., galasko, d., pazzaglia, f. & salmon, d.p. (2016). visual perceptual organization ability in autopsy-verified dementia with lewy bodies and alzheimer’s disease. journal of the international neuropsychological society, 22(6), 609–619. https://doi.org/10.1017/s1355617716000436 nasreddine, z.s., phillips, n.a., bédirian, v., charbonneau, s., whitehead, v., collin, i., cummings, j.l. & chertkow, h. (2005). the montreal cognitive assessment, moca: a brief screening tool for mild cognitive impairment. journal of the american geriatrics society, 53(4), 695–699. https://doi.org/10.1111/j.1532-5415.2005.53221.x parsons, t.d., larson, p., kratz, k., thiebaux, m., bluestein, b., buckwalter, j.g. & rizzo, a.a. (2004). sex differences in mental rotation and spatial rotation in a virtual environment. neuropsychologia, 42(4), 555–562. paolo, a.m., cluff, r.b. & ryan, j.j. (1996). influence of perceptual organization and naming on the hooper visual organization test. neuropsychiatry, neuropsychology and behavioral neurology, 9(4), 254–257. paxton, j.l., peavy, g.m., jenkins, c., rice, v.a., heindel, w.c. & salmon, d.p. (2007). deterioration of visual-perceptual organization ability in alzheimer’s sisease. 
cortex, 43(7), 967–975. https://doi.org/10.1016/s0010-9452(08)70694-4 resnik, d.b. & shamoo, a.e. (2011). the singapore statement on research integrity. accountability in research, 18(2), 71–75. doi: 10.1080/08989621.2011.557296 revelle, w. (2021). psych: procedures for psychological, psychometric, and personality research. evanston, il: northwestern university. r package version 2.1.3. retrieved from https://cran.r-project.org/package=psych r core team (2020). r: a language and environment for statistical computing. vienna, austria: r foundation for statistical computing. retrieved from https://www.r-project.org/ roebuck-spencer, t.m., glen, t., puente, a.e., denney, r.l., ruff, r.m., hostetter, g. & bianchini, k.j. (2017). cognitive screening tests versus comprehensive neuropsychological test batteries: a national academy of neuropsychology education paper. archives of clinical neuropsychology, 32(4), 491–498. https://doi.org/10.1093/arclin/acx021 roldán-tapia, m.d., cánovas, r., león, i. & garcía-garcia, j. (2017). cognitive vulnerability in aging may be modulated by education and reserve in healthy people. frontiers in aging neuroscience, 9, 340. https://doi.org/10.3389/fnagi.2017.00340 sanz cortés, a., olivares crespo, m.e. & barcia albacar, j.a. (2011). aspectos neuropsicológicos en pacientes diagnosticados de tumores cerebrales. clínica y salud, 22(2), 139–155. https://doi.org/10.5093/cl2011v22n2a4 su, c.-y., lin, y.-h., wu, y.-y. & wuang, y.-p. (2013). development of the chinese version of the hooper visual organization test: normative data. international journal of rehabilitation research, 36(1), 56–67. https://doi.org/10.1097/mrr.0b013e3283588b95 tamkin, a.s. & jacobsen, r. (1984). age-related norms for the hooper visual organization test. journal of clinical psychology, 40(6), 1459–1463. https://doi.org/10.1002/1097-4679(198411)40:6<1459::aid-jclp2270400633>3.0.co;2-3 watts, a.d. & shuttleworth-edwards, a.b. (2016). 
neuropsychology in south africa: confronting the challenges of specialist practice in a culturally diverse developing country. the clinical neuropsychologist, 30(8), 1305–1324. wetzel, l. & murphy, s.g. (1991). validity of the use of a discontinue rule and evaluation of discriminability of hooper visual organization test. neuropsychology, 5(2), 119–122. https://doi.org/10.1037/0894-4105.5.2.119
about the author(s) ibrahim a. musenze department of economics and management, faculty of management sciences, busitema university, tororo, uganda thomas s. mayende department of business management, faculty of business, ict university, iganga, uganda
citation musenze, i.a., & mayende, t.s. (2020). a psychometric evaluation of the 17-itemed utrecht work engagement scale in uganda. african journal of psychological assessment, 2(0), a8. https://doi.org/10.4102/ajopa.v2i0.8
original research a psychometric evaluation of the 17-itemed utrecht work engagement scale in uganda ibrahim a. musenze, thomas s. mayende received: 17 dec. 2018; accepted: 18 sept. 2019; published: 29 jan. 2020 copyright: © 2020. the author(s). licensee: aosis. this is an open access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
abstract
this study aimed to establish the psychometric properties of the 17-itemed utrecht work engagement scale (uwes-17), in particular its factorial structure. this was done by comparing the model fit of the tri-factor model with that of a one-factor model.
using a cross-sectional design, confirmatory factor analysis was used to evaluate the 17-item uni-dimensional and the 17-item tri-factor uwes respectively on a sample of 323 education assistants (professional teachers) in uganda. the study confirmed an 11-item tri-factor model in uganda’s primary school sample (uwes-ug) as a reliable and parsimonious factor structure within this cohort. the sample was restricted to teachers, which limits the generalisability of the findings. on account of these results, work engagement appears to be best represented as a tri-factor construct in the ugandan context. this study contributes to theory by confirming the three-factor structure of work engagement in developing countries using perceptual data from a ugandan sample. this is a pioneer empirical study that validates the uwes 17-itemed scale in uganda.
keywords: uwes-ug; uwes-17; psychometric evaluation; work engagement; uganda.
introduction
engaged teachers can be conceptualised as teachers ‘who feel energetic and dedicated, and are absorbed by their work’ (bakker, schaufeli, leiter, & taris, 2008, p. 188). this implies that such personnel work hard (have vigour), are immensely involved in teaching work (are dedicated) and feel happily engrossed (are absorbed) in their work (bakker et al., 2008). these teachers experience positive emotions comprising happiness, joy and enthusiasm; experience enhanced psychosomatic health; are capable of designing their own job and personal resources (like getting support from others) and transmit their engagement to others (bakker et al., 2008). the issue of engaged teachers is now a global matter. for instance, because of its importance, a scale has been developed to better visualise it (sasmoko, doringin, indrianti, goni, & ruliana, 2018).
annual attrition rates have been on the rise in the teaching profession, and more interestingly, many teachers report low levels of engagement (oecd, 2005). past studies suggest that highly engaged teachers are less likely to want to quit their jobs (klassen et al., 2013); yet, low attrition levels among teachers do not necessarily signify high levels of engagement. over the years, uganda has persistently faced acute and unequal distribution of its primary school teachers across regions and schools (moes, 2016). attrition levels among primary school teachers were severe following the introduction of universal primary education (upe) in 1997. the inception of upe resulted in high pupil enrolment, necessitating mass teacher recruitment, and meanwhile, salaries for the teachers have been gradually rising, though still low (moes, 2014b). between 1997 and 2010, enrolment shot up from 2.9 million to over 8.0 million (moes, 2014b), and since then, the number is increasing steadily. the upe programme has increased workload leading to a poor pupil–teacher ratio (ptr). according to the education management information system report (2014a), in 2001, the ptr in government-aided schools was 98:1, while in privately run schools, it was at 58:1. this has since improved to 54:1, though it is still above the national average target of 45:1. as a result, the government of uganda via the implementing body (the ministry of education and sports), adopted some policy interventions including the construction and rehabilitation of schools; buying of text books and co-curricular materials; implementation of teacher training and development policies; implementation of measures to deal with teacher absenteeism (hard-to-reach, hard-to-stay); and strategies for teacher retention and syllabi reforms among others (moes, 2014b). besides, unqualified teachers, hereafter referred to as licenced teachers (lts) were recruited for the delivery of primary education services. 
though lts proved to be useful over time, challenges related to competence persisted. it is worth pointing out that both lts and qualified teachers in uganda, remained vulnerable to high rates of attrition (moes, 2014a, 2014b, 2016). these policy interventions are yet to materialise into completely reasonable enrolment and retention rates as will be determined through regular staff head-counts. attrition levels are predominantly high in the countryside where, in addition to greater need, teachers grapple with an increased workload because of massive pupil enrolment, poor remuneration, hard-to-reach areas and lack of or poor accommodation facilities (kagolo, 2013). attrition denotes a reduction in the number of workers as a result of retirement, resignation or death and attrition rate refers to reduction rate in size or number of workers (india, 2019). primary school teacher attrition rates differ widely across diverse settings and agenda (kagolo, 2013; moes, 2014a), signifying an array of interacting factors such as engagement that determine primary school teachers’ decision to remain working in a particular school. research has repetitively confirmed that workers who are engaged in their work contribute considerably to quality service delivery, productivity and innovation (konermann, 2012; salanova, agut, & peiró, 2005). engaged workers exhibit extraordinary energy and enthusiasm at work. therefore, work engagement has significant effects for organisations. it does not only trigger exceptional performance, but also enhances organisational commitment and customer loyalty (halbesleben, 2010; salanova et al., 2005). according to vallières and mcauliffe (2015), carr et al. (2012), vallières, mcauliffe, hyland, galligan and ghee (2017), organisational psychology (op) is increasingly considered a significant field to help overcome the current challenges of human resources in organisations. 
organisational psychology has the unique ability to broaden our present perception of the issues that lower staff attrition. an appropriate grasp of the psychological issues that contribute to a durable teacher engagement in their workplaces is regarded important (wurie, samai, & witter, 2016). current research calls for greater and better evidence to lessen high attrition levels through the development and at some point, the testing of the level of engagement using a durable and reliable tool. in view of this, the utrecht work engagement scale (uwes), a 17-itemed variant, has been adopted and used among employees both in the highly-developed and mid-developed countries (ahmed, majid, & zin, 2016; shimazu et al., 2008; storm & rothmann, 2003). presently, available research examining the scales’ factorability, reliability and validity for individuals from low-developed countries, with the exception of vallières, et al.’s study of 2017 in sierra leone, is limited, and specifically invisible in uganda. this study is therefore a response to the calls for testing of the uwes in different multi-cultural settings (cf. balducci, fraccaroli, & schaufeli, 2010; petrović, vukelić, & čizmić, 2017; schaufeli & bakker, 2003; schaufeli, bakker, & salanova, 2006). also, though earlier studies have revealed acceptable reliability and validity under diverse contexts; for instance, in a multi-national setting involving some european, scandinavian and african countries (schaufeli et al., 2006), in brazil (vazquez, dos santos magnan, pacico, & hutz, 2015) and in hong kong (fong & ho, 2015), to mention but a few, there remain many unsettled issues surrounding the scale’s dimensionality, or whether its replication would provide similar results across continents and countries. moreover, debates on the uwes are yet to be reconciled and present several lacunae. 
for instance, some evidence suggests that a nine-item uni-dimensional scale, presents better and robust results over the three-factor 17-itemed scale (schaufeli et al., 2006; seppälä, mauno, hakanen, kinnunen, tolvanen, & schaufeli, 2009). further, it is still unclear if the three-dimensional, 17-item uwes (schaufeli & bakker, 2003) offers identical and reliable results along contrasting demographics and work situations (seppälä, et al., 2009). factorial frameworks meeting acceptable thresholds abound. some support has been provided for the uni-dimensional factor structure (alok, 2013; de bruin, hill, henn, & muller, 2013; fong & ho, 2015; sautier et al., 2015; shimazu et al., 2008; vallières et al., 2017), some for the bi-factor model (kulikowski, 2017) and some for the original tri-factor model (hadassah & balducci, 2013; lathabhavan, balasubramanian, & natarajan, 2017). therefore, the findings in regards to the uwes’ dimensionality are still inconclusive. for the case of uganda, a dearth of studies providing evidence relating to the uwes application exists. therefore, testing the psychometric properties of the uwes (schaufeli & bakker, 2003) specifically in uganda, and sub-saharan africa in general, might contribute to knowledge growth in terms of its validation, generalisation in developing countries and application in workplace situations. its properties need to be re-examined so that it can be applied in individual and organisational settings with more rigour. to fill the above gaps, we set to examine the psychometric properties of the uwes-17. the specific objectives were, (1) to evaluate the factorial validity by comparing the fit of the tri-factor model to that of the uni-factor model (which assumes that all items load on one single underlying dimension), (2) examine the scale’s reliability using cronbach’s alpha coefficient on a ugandan sample. 
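the reliability coefficient named in objective (2) can be illustrated with a short sketch. this is a minimal, standard-library implementation of cronbach's alpha, not the authors' procedure (the article does not say which software computed alpha). the function name `cronbach_alpha` and the five toy respondents are hypothetical and are not study data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Transpose so that each element holds one item's scores across respondents.
    items = list(zip(*scores))
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical 5-point Likert answers: 5 respondents x 3 items (illustration only).
toy_scores = [
    [4, 5, 4],
    [3, 3, 2],
    [5, 5, 5],
    [2, 3, 2],
    [4, 4, 3],
]

alpha = cronbach_alpha(toy_scores)
print(f"alpha = {alpha:.3f}; meets the 0.70 cut-off: {alpha >= 0.70}")
```

the 0.70 cut-off in the last line is the threshold the article adopts from nunnally and bernstein (1994).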
methods
participants
from a total population of 1700 education assistants (primary school teachers) in a district local government in uganda, as recorded in the updated staff list of the directorate of human resources on 30 january 2018, a sample of 323 respondents was selected to complete the uwes-17 questionnaire. however, only 225 questionnaires were retrieved and judged usable, which constituted a response rate of about 70%. participants were neither identified by name in the research process nor coerced into taking part in the study; they could leave at any stage of the research. the mean age was 38–48 years (sd = 10.00), with 54% being female. in terms of educational background, 45% of the sample had at least graduated from a higher educational institution with a diploma in education, while the majority (55%) had a basic certificate in education. to determine the sample size for this study, we relied on suggestions by yamane (1967) and krejcie and morgan (1970), which generated samples of 323 and 313 respectively. we used a sample size of 323 based on yamane’s guidelines because it gives exact values. we then used simple random sampling to draw a sample of 323 participants from the population of 1700 primary school teachers. we applied the following inclusion criterion: all participants had to be formally employed and duly appointed by the district service commission (a body charged with primary teachers’ recruitment in the district) as education assistants, senior education assistants, principal education assistants, deputy head teachers or head teachers.
measures
in order to evaluate work engagement, the 17-item version of the uwes (uwes-17; schaufeli & bakker, 2003) was adopted. this is a self-report scale that was scored on a 5-point likert rating scale: 1 (strongly disagree) to 5 (strongly agree). vigour was assessed using six items, dedication using five and absorption using six.
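the sample-size figures quoted under participants can be reproduced with the two published formulas. the sketch below assumes a 5% margin of error, a chi-square value of 3.841 and a population proportion of 0.5, which are the usual defaults for these formulas but are not stated in the article; the function names are illustrative.

```python
import math


def yamane(population, margin=0.05):
    """Yamane (1967): n = N / (1 + N * e^2)."""
    return population / (1 + population * margin ** 2)


def krejcie_morgan(population, chi_sq=3.841, p=0.5, margin=0.05):
    """Krejcie and Morgan (1970): s = X^2 N P (1 - P) / (d^2 (N - 1) + X^2 P (1 - P))."""
    numerator = chi_sq * population * p * (1 - p)
    denominator = margin ** 2 * (population - 1) + chi_sq * p * (1 - p)
    return numerator / denominator


POPULATION = 1700   # education assistants on the staff list
DISTRIBUTED = 323   # questionnaires handed out
RETRIEVED = 225     # questionnaires returned and judged usable

print(f"yamane: {yamane(POPULATION):.2f}")
print(f"krejcie and morgan: {krejcie_morgan(POPULATION):.2f}")
print(f"response rate: {RETRIEVED / DISTRIBUTED:.1%}")
```

under these assumptions the formulas give values just above 323 and 313, so the article's figures correspond to truncating rather than rounding up, and the response rate of "about 70%" is 225 out of 323.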
the choice of uwes-17 was dictated by its extensive usage, parsimony in terms of empirical validation and its capacity to evaluate staff’s work engagement regardless of their specialised and work-related focus (seppälä, 2013; sinval, pasian, & marôco, 2018). since uganda uses english as an official language, and considering that all the respondents were literate, there was no need for back and forth translations.
procedures
consistent with the work of hinkin (1998), in order to develop and test the adequacy of the uwes tool, we conducted a pilot test on 10 employees from private primary schools. the respondents filled in a self-report tool (uwes-17). using the district education officer and the district constituent inspectors as contact persons, we accessed the respondents and distributed the questionnaires for completion. participation was voluntary and respondents were not required to indicate their names on the questionnaire. out of 323 questionnaires that were physically distributed, 225 were retrieved, constituting a response rate of approximately 70%. an effort was made to explain the aim of the study to participants. the authors ensured that participants’ consent was given by means of signed consent forms that were completed before commencing the study.
statistical analysis
to validate the scale, confirmatory factor analysis (cfa) executed in amos 21.0 (arbuckle, 2012) was relied on. the psychometric validity of two uwes versions (i.e., the 17-itemed uni-dimensional scale and the 17-itemed three-factor scale) was assessed. cfa was conducted using the maximum likelihood estimation procedure to determine the appropriateness of both the uni-dimensional and tri-factor models.
the goodness of fit of the models was assessed based on the following conventional benchmarks: the goodness of fit index (gfi) ≥ 0.8, adjusted goodness of fit index (agfi) ≥ 0.8, tucker–lewis index (tli) ≥ 0.9, the comparative fit index (cfi) ≥ 0.9 and the root mean square error of approximation (rmsea) ≤ 0.06 (hu & bentler, 1998). to examine the reliability of the scale, cronbach’s alpha coefficients (α), which indicate internal consistency and homogeneity, were assessed. a cronbach’s alpha value of 0.70 and above was used as the cut-off threshold (amin, 2005; nunnally & bernstein, 1994). the above test posted values of 0.86 and 0.72 for the uni-dimensional and tri-factor models respectively.
fit indices
we used multiple fit indices (for instance, absolute and incremental) to evaluate model fit. the absolute model fit was examined with the chi-square (χ²) index and the fit of the alternate models was compared with the χ² difference test consistent with satorra and bentler’s (2001) guidelines. as a rule of thumb, a non-significant χ² statistic signifies robust model fit (kline, 2011). further, in the χ² difference test, a non-significant decrease in χ², relative to the change in the number of degrees of freedom (df), shows that the constrained model is satisfactory. the baseline model is more acceptable if there is a significant reduction in χ². the models’ fits were also assessed through other fit statistics. the rmsea (browne & cudeck, 1993) provides an estimate of the difference between the hypothesised model and the true population model. rmsea adjusts for errors of approximation in the population (bollen, 1989). values of 0.06 and below indicate good model fit (hu & bentler, 1998); values less than 0.08 but above 0.06 indicate reasonable model fit; while values above 0.08 indicate poor model fit (browne & cudeck, 1993).
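the rmsea bands above can be made concrete with a small sketch. it uses the common point-estimate formula sqrt(max(χ² − df, 0) / (df × (n − 1))), which is an assumption about how the values were obtained (some programs divide by n rather than n − 1). the χ² and df values are the ones the article reports in its results for n = 225; the function names are illustrative, and treating exactly 0.08 as "reasonable" is a choice made here because the text leaves that boundary open.

```python
import math


def rmsea(chi_sq, df, n):
    """Point estimate of RMSEA from a model chi-square, its df and sample size n."""
    return math.sqrt(max(chi_sq - df, 0.0) / (df * (n - 1)))


def describe(value):
    """Label an RMSEA value using the bands quoted in the text."""
    if value <= 0.06:
        return "good"
    if value <= 0.08:
        return "reasonable"
    return "poor"


N = 225  # usable questionnaires
MODELS = {
    "uni-dimensional": (399.412, 119),
    "tri-factor": (231.369, 116),
    "re-specified tri-factor": (46.870, 41),
}

for name, (chi_sq, df) in MODELS.items():
    value = rmsea(chi_sq, df, N)
    print(f"{name}: rmsea = {value:.3f} ({describe(value)})")
```

with the n − 1 denominator the three estimates round to the rmsea values the article reports for these models, which suggests the reported χ², df and rmsea figures are mutually consistent.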
the incremental fit of the models was assessed through the non-normed fit index (nnfi) and the cfi. the nnfi and cfi measure model improvement by comparing the hypothesised model’s fit statistics with those of an independence model. according to hu and bentler (1999), cfi and nnfi values of 0.95 and above indicate good model fit. we also adopted the goodness of fit index (gfi) and agfi. according to kim (2007), gfi and agfi values above 0.90 indicate acceptable fit.
common method variance
in order to minimise common method bias, and given that the data were collected from the same source, we undertook several safeguards based on the recommendations of podsakoff, mackenzie and podsakoff (2012) and williams and mcgonagle (2016). firstly, respondents were informed that their identities were to remain anonymous and that information gathered from them would remain confidential. none of them had to fill in their names in the survey instrument. secondly, instead of grouping questionnaire items under the construct to which they were associated, the items were randomly ordered. this technique helped to reduce the probability of priming effects produced by item embeddedness. thirdly, three survey sessions were conducted a week apart, which helped to suppress consistency themes. we additionally conducted harman’s one-factor test to detect any common method bias threat (podsakoff, mackenzie, & podsakoff, 2003). in this analysis, the first factor accounted for 30.1% of the variance, which is less than the 50% threshold. all factors together explained 68.3% of the total variance. this finding further suggests a tolerable level of common method bias.
ethical considerations
prior to carrying out this study, ethical clearance was obtained from the faculty of management sciences of busitema university under the ethical clearance number: fgsec no. 14/18/2.
results
descriptive statistics
the means, standard deviations and inter-correlations of the variables are reported in table 1. dedication is positively related to vigour (r = 0.450, p < 0.01), and absorption is positively related to vigour (r = 0.347, p < 0.01) and dedication (r = 0.520, p < 0.01).
table 1: means, standard deviations and inter-correlations among variables (n = 225).
factorial validity of the utrecht work engagement scale in uganda
the cfa results of the uni-dimensional and the tri-factor models of the uwes-17 in the ugandan context are shown in table 2 and figures 1 and 2. the uni-dimensional model of the uwes-17 fitted the data poorly, with an rmsea of 0.103, which is beyond the criteria mentioned above. the chi-square test (χ² = 399.412, df = 119) was significant (p = 0.000). other fit indices such as cfi (0.327), nfi (0.277), tli (0.251), gfi (0.811) and agfi (0.757) were below the prescribed criteria.
figure 1: uni-dimensional utrecht work engagement scale-17.
figure 2: tri-factor utrecht work engagement scale-17.
table 2: confirmatory factor analysis results.
for the tri-factor uwes-17 model, a slightly acceptable fit to the data was established. rmsea was 0.067, which met the threshold of below 0.08 (browne & cudeck, 1993). the chi-square was lower, and thus marginally better, than that of the uni-dimensional model (χ² = 231.369, df = 116), although the test remained significant (p = 0.004). other fit indices such as cfi (0.721), nfi (0.681), tli (0.675), gfi (0.896) and agfi (0.862), though below the prescribed criteria, were marginally acceptable in comparison to the uni-dimensional model. in view of the above, the tri-factor model of the uwes-17 moderately fitted the data. therefore, further analysis was based on the tri-factor model of the uwes-17.
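the comparison between the two models can be sketched as a χ² difference test on the values reported above. this is a plain, unscaled difference test, which is a simplification: the satorra and bentler (2001) procedure the article cites applies scaling corrections that cannot be reproduced from the reported figures. it also assumes that the one-factor model is nested in the three-factor model. the helper `chi2_sf` is an illustrative standard-library routine for integer degrees of freedom, not a library call.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variate with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        shape, q = 1.0, math.exp(-half)              # closed form for df = 2
    else:
        shape, q = 0.5, math.erfc(math.sqrt(half))   # closed form for df = 1
    # Recurrence Q(a + 1, z) = Q(a, z) + z^a * exp(-z) / Gamma(a + 1), with z = x / 2.
    while 2 * shape < df:
        q += half ** shape * math.exp(-half) / math.gamma(shape + 1)
        shape += 1.0
    return q


# Values reported for the uni-dimensional and tri-factor models.
chi_uni, df_uni = 399.412, 119
chi_tri, df_tri = 231.369, 116

delta_chi = chi_uni - chi_tri
delta_df = df_uni - df_tri
p_value = chi2_sf(delta_chi, delta_df)

print(f"delta chi-square = {delta_chi:.3f}, delta df = {delta_df}, p = {p_value:.3g}")
```

on these figures the drop in χ² is far larger than the 0.05 critical value for three degrees of freedom (about 7.815), which is consistent with the article's decision to continue with the tri-factor model.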
post hoc analyses
given the moderate, but not acceptable, fit of the tri-factor model of the uwes-17, attention moved from model testing to model development. in view of the high standardised residuals of six items (vigour: items 4, 5 and 6; dedication: items 4 and 5; absorption: item 5), a decision was taken to re-specify the model with these items deleted, one at a time. model re-specification was therefore based on further scrutiny of descriptive and reliability statistics, the modification indices and theoretical considerations (schaufeli, salanova, gonzalez-roma, & bakker, 2002). the tri-factor model was then re-specified with its parameters freely estimated. the re-specified tri-factor model showed better fit to the data (χ² = 46.870, df = 41) and was non-significant (p = 0.244). analysis revealed an rmsea of 0.025, which met the prescribed criterion of less than 0.06 (browne & cudeck, 1993). other fit indices such as cfi (0.969), nfi (0.952), tli (0.958), gfi (0.965) and agfi (0.964) showed that the re-specified model was robust, as it appropriately fitted the data. we therefore confirmed an 11-item tri-factor uwes-17 model in uganda’s primary school sample (uwes-ug). the fit statistics are presented in table 3, while the standardised factor loadings and descriptive statistics for the confirmed 11-item tri-factor uwes-17 model are shown in table 4 and figure 3. further, the critical ratio values used for determining the level of statistical significance of the estimated parameters for the scale items ranged from 34.087 to 86.487, well above the suggested minimum of ±1.96, and all the items were statistically significant at the 0.001 level.
figure 3: tri-factor utrecht work engagement scale-ug.
table 3: confirmatory factor analysis results.
table 4: standardised factor loadings, standard errors, and descriptive statistics for the uwes-17.
discussion the purpose of this study was to examine the psychometric properties of the uwes-17 in a ugandan sample of primary school teachers. we aimed to evaluate the factorial validity in particular through comparison of the fit of the three-factor model to that of the one-factor model, which postulates that all items load on one single underlying construct. this study was inspired by the need for determination of the most robust and parsimonious technique of scoring this popular and extensively-used measure in an exclusive cultural setting. substantial arguments exist in the extant literature as to whether the uwes-17, is a uni-dimensional psychological construct or a tri-factor construct. findings of the cfa, offered support for a tri-factor uwes-17 model within the staff category of primary school teachers. findings confirmed an 11-item tri-factor uwes-17 model in uganda’s primary school sample (uwes-ug). this is essentially in line with previous research that did not find evidence for a uni-dimensional construct of work engagement (lathabhavan et al., 2017; hadassah & balducci, 2013). this may suggest that among ugandan employees (particularly the primary school teachers studied), work engagement measured by the uwes-17 still denotes a three underlying factor structure (vigour, dedication and absorption) rather than one. the uni-dimensional uwes-17 model displayed poor item discrimination. the items were poorly correlated (the correlations ranged from 0.30 to 0.47). the high correlations between the three factors – vigour, dedication and absorption, ranging between 0.89 and 0.94, would point to a uni-dimensional structure, though the excellent fit of the data of the correlated tri-factor model provided support for the three different, although highly correlated factors. this finding is in line with the work of schaufeli et al. 
(2006), who argue for a uni-dimensional scale in multiple regression studies, because the three sub-scales of vigour, dedication and absorption could lead to problems of collinearity, and for tri-factor scales in work engagement studies that rely on structural equation modelling, like this one. further, inspection of the factor loadings for both the uni-dimensional and tri-factor uwes-17 models provided superior statistical evidence for the tri-factor model owing to its more robust fit indices. given the high correlations between the factors of the confirmed 11-item tri-factor work engagement model, the strong evidence of multi-dimensionality and the robust and acceptable model fit indices observed, we argue that the tri-factor model offers the best statistical representation of the uwes-17 in the ugandan sample. also, in accordance with the suggestions of nunnally and bernstein (1994), the internal consistency of the 11-item three-factor uwes-ug was adequate. the cronbach’s alpha coefficient (α) for all three factors (vigour, dedication and absorption) was substantially higher than 0.78. these findings indicate that the 11-item uwes-ug version is a dependable measure of work engagement in the ugandan milieu of primary school teachers. the demonstration that the uwes-17, developed in a particular cultural context, shows the same psychometric properties in another cultural context (uganda) supports its validity. the current findings are consistent with the past literature that suggests that the tri-factor uwes-17 is an encouraging instrument for carrying out cross-cultural research on work engagement (cf. balducci et al., 2010; schaufeli & bakker, 2003; schaufeli et al., 2006). furthermore, the current findings also suggest that the uwes-ug might be useful for measuring engagement levels in diverse organisational settings.
therefore, the tri-factor model of the uwes-ug offers a unique benefit of being the most parsimonious and fast scoring tool that could be adopted for usage by education managers. implications for theory and practice the current study presents significant implications for theory and practice. to begin with, the findings validate and extend the tri-factor structure of work engagement to developing countries by using data from a ugandan sample. therefore, an attempt has been taken towards appreciating the significance of the construct of work engagement within organisations (i.e., in uganda’s education sector). this research is important as work engagement studies in uganda can further develop with the availability of a validated and reliable research tool. this is in response to the calls by schaufeli and bakker (2003), schaufeli et al. (2006), balducci et al. (2010) and petrović et al. (2017), for testing of the uwes in different multi-cultural settings. therefore, examining the psychometric properties of the instrument might hasten work engagement studies in uganda. moreover, this study attempted to address a dearth of academic works on work engagement from low resourced countries (storm & rothmann, 2003; vallières et al., 2017). this finding provides evidence of the 11-item tri-factor model of work engagement across a spectrum of occupational settings. further, a revised and shorter measure of work engagement with only 11 items (uwes-ug) offers a parsimonious understanding of the work engagement construct. with the 11-item work engagement instrument, managers could gain from the advantage of applying a shorter work engagement tool in occupational settings, with the likelihood of obtaining a more comprehensive understanding of work engagement. also, from an organisational perspective, this study may be of help in the establishment of the extent to which work engagement represents the most appropriate scale. 
this might improve the usability of the instrument by managers and thus boost employee productivity and organisational competitiveness. limitations this study is not immune to limitations. firstly, the respondents were taken from only one sector, that is, primary education. accordingly, there is a risk that the particular features of this sector (such as leadership, remuneration, location and professional training) influenced the study outcomes. this calls for future research in this area with multiple samples. secondly, the cross-sectional research design adopted by this study curtails comprehensive observations on the instrument’s reliability and validity. future studies should consider a longitudinal approach to examine the validity of the tri-factor uwes-17 in the ugandan context so that better conclusions on the adequacy of the scale can be drawn. thirdly, this instrument validation study relied on self-reported data, which may have introduced common method bias. storm and rothmann (2003) point out that studies like this one, which rely on self-report measures, face this challenge. conclusion this study underscores the context-specific validity of the uwes in the social and economic milieu of uganda. the findings have demonstrated that the 11-item tri-factor uwes uganda version has excellent psychometric properties and a factorial structure in line with the theoretical model. accordingly, this confirms that the uwes-ug version is applicable in the ugandan context, both in empirical settings and for practical aims. on the basis of these findings, it can be inferred that in uganda work engagement is a tri-dimensional construct comprising vigour, dedication and absorption. acknowledgements we are grateful to the mayuge district local government leadership of uganda for having allowed us to collect data from their staff, which enabled this research project to be a success. 
in a similar vein, we are immensely grateful to our respective deans (busitema university and ict university), who gave us additional opportunities outside the traditional roles within the faculties to undertake this study. lastly, we are much indebted to the anonymous reviewers who provided extra insights that led to the improvement of this article. competing interests there were no competing interests in the process of developing this article. authors’ contributions t.s.m. was the project leader, and responsible for conceptualisation, and project design, data collection and analysis. i.a.m. was responsible for project design, data analyses and report writing. funding information this was a self-funded project. data availability statement data sharing is not applicable to this article as no new data were created or analysed in this study. disclaimer the views and opinions expressed in this article are those of the authors and do not necessarily reflect the official policy or position of any affiliated agency of the authors. references ahmed, u., majid, a., & zin, m. (2016). construct validation of 17-item utrecht university work engagement scale amongst the white collar employees of malaysian universities. international journal of academic research in business and social sciences, 6(5), 306–312. https://doi.org/10.6007/ijarbss/v6-i5/2144 alok, k. (2013). work engagement in india: a factorial validation study of uwes-9 scale. management and labour studies, 38 (1&2), 53–62. https://doi.org/10.1177/0258042x13491478 amin, e.m. (2005). social science research: conception, methodology and analysis. kampala: makerere university press. arbuckle, j. (2012). amos 18 user’s guide. chicago, il: amos development corporation. bakker, a., schaufeli, w., leiter, m., & taris, t. (2008). work engagement: an emerging concept in occupational health psychology. work & stress, 22(3), 187–200. https://doi.org/10.1080/02678370802393649 balducci, c., fraccaroli, f., & schaufeli, w.b. 
(2010). psychometric properties of the italian version of the utrecht work engagement scale (uwes-9), a cross-cultural analysis. european journal of psychological assessment, 26, 143–149. https://doi.org/10.1027/1015-5759/a000020 bollen, k.a. (1989). structural equations with latent variables. new york: wiley. browne, m.w., & cudeck, r. (1993). alternative ways of assessing model fit. in k. bollen & j. long (eds.), testing structural equation models (pp. 136–162). newbury park, ca: sage. carr, s., eltayeb, s., maclachlan, m., marai, l., mcauliffe, e., & mcwha, i. (2012). aiding international development: some fresh perspectives from organisational psychology. in j. olson-buchanan, l. bryan, & l. thompson (eds.), using i-o psychology for the greater good: helping those who help others. washington, dc: american psychological association. de bruin, g., hill, c., henn, c., & muller, k.-p. (2013). dimensionality of the uwes-17: an item response modelling analysis. sa journal of industrial psychology/sa tydskrif vir bedryfsielkunde, 39(2), 1–8. https://doi.org/10.4102/sajip.v39i2.1148 fong, t., & ho, r. (2015). dimensionality of the 9-item utrecht work engagement scale revisited: a bayesian structural equation modeling approach. journal of occupational health, 57(4), 353–358. https://doi.org/10.1539/joh.15-0057-oa hadassah, l.-o., & balducci, c. (2013). psychometric properties of the hebrew version of the utrecht work engagement scale (uwes-9). european journal of psychological assessment, 29(1), 58–63. https://doi.org/10.1027/1015-5759/a000121 halbesleben, j. (2010). a meta-analysis of work engagement: relationships with burnout, demands, resources, and consequences. in a. bakker, & m. leiter (eds.), work engagement: a handbook of essential theory and research (vol. 8, pp. 102–117), new york: psychology press. hinkin, t.r. (1998). a brief tutorial on the development of measures for use in survey questionnaires. organizational research methods, 2(1), 104–121. 
https://doi.org/10.1177/109442819800100106 hu, l., & bentler, p.m. (1998). fit indices in covariance structure modeling: sensitivity to underparameterized model misspecification. psychological methods, 3(4), 424–453. https://doi.org/10.1037/1082-989x.3.4.424 hu, l., & bentler, p.m. (1999). cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. structural equation modeling, 6, 1–55. https://doi.org/10.1080/10705519909540118 india, b. (2019, july 11). attrition in indian bpo industry. retrieved from http://www.bpoindia.org/research/attrition.shtml kagolo, f. (2013, october 2). over 10,000 teachers quit each year for greener pasture. new vision. retrieved from https://www.newvision.co.ug/new_vision/news/1333054/-teachers-quit-greener-pasture. kim, k. (2007). structural equation modeling. seoul: hannarae. klassen, r., wilson, e., siu, a.f.y., hannok, w., wong, m.w., wongsri, n., … jansem, a. (2013). preservice teachers’ work stress, self-efficacy, and occupational commitment in four countries. european journal of psychology of education, 28, 1289–1309. https://doi.org/10.1007/s10212-012-0166-x kline, r. (2011). principles and practice of structural equation modelling (3rd ed.). new york: guilford press. konermann, j. (2012). teachers’ work engagement: a deeper understanding of the role of job and personal resources in relationship to work engagement, its antecedents, and its outcomes. enschede: universiteit twente. https://doi.org/10.3990/1.9789036533027 krejcie, v.r., & morgan, d. (1970). determining sample size for research activities. educational and psychological measurement, 30, 688. https://doi.org/10.1177/001316447003000308 kulikowski, k. (2017). one, two or three dimensions of work engagement? testing the factorial validity of the utrecht work engagement scale on a sample of polish employees. international journal of occupational safety and ergonomics. 
https://doi.org/10.1080/10803548.2017.1371958 lathabhavan, r., balasubramanian, s., & natarajan, t. (2017). a psychometric analysis of the utrecht work engagement scale in indian banking sector. industrial and commercial training, 49(6), 296–302. https://doi.org/10.1108/ict-04-2017-0031 moes. (2014a). educational management information system. kampala: ministry of education and sports. moes. (2014b). teacher issues in uganda: a shared vision for an effective teachers policy. ministry of education and sports. kampala: unesco – iiep pôle de dakar. moes. (2016). educational abstract. kampala: education policy and planning department. nunnally, j.c., & bernstein, i.h. (1994). psychometric theory (3rd edn.). sydney: mcgraw hill. oecd. (2005). oecd annual report: 45th anniversary. paris: oecd publications. petrović, i.b., vukelić, m., & čizmić, s. (2017). work engagement in serbia: psychometric properties of the serbian version of the utrecht work engagement scale (uwes). frontiers in psychology, 8(1799), 1–11. https://doi.org/10.3389/fpsyg.2017.01799 podsakoff, p., mackenzie, s.b., & podsakoff, n. (2003). common method biases in behavioral research: a critical review of the literature and recommended remedies. journal of applied psychology, 88(5), 879–903. https://doi.org/10.1037/0021-9010.88.5.879 podsakoff, p., mackenzie, s.b., & podsakoff, n. (2012). sources of method bias in social science research and recommendations on how to control it. annual review of psychology, 63(1), 539–569. https://doi.org/10.1146/annurev-psych-120710-100452 salanova, m., agut, s., & peiró, j.m. (2005). linking organizational resources and work engagement to employee performance and customer loyalty: the mediation of service climate. journal of applied psychology, 90(6), 1217. https://doi.org/10.1037/0021-9010.90.6.1217 sasmoko, doringin f., indrianti y., goni a.m., & ruliana, p. (2018). indonesian teacher engagement index (itei): an emerging concept of teacher engagement in indonesia. 
iop conf ser mater sci eng, 306(1). https://doi.org/10.1088/1757-899x/306/1/012119 satorra, a., & bentler, p.m. (2001). a scaled difference chi-square test statistic for moment structure analysis. psychometrika, 66, 507–514. https://doi.org/10.1007/bf02296192 sautier, l., scherwath, a., weis, j., sarkar, s., bosbach, m., schendel, m., … mehnert, a. (2015). assessment of work engagement in inpatient and rehabilitative oncological settings: psychometric properties of the german version of the utrecht work engagement scale 9 (uwes-9). die rehabilitation, 54, 1–7. https://doi.org/10.1055/s-0035-1555912 schaufeli, w., & bakker, a.b. (2003). test manual for the utrecht work engagement scale. unpublished manuscript, utrecht university, the netherlands. retrieved from http://www.schaufeli.com. schaufeli, w., bakker, a.b., & salanova, m. (2006). the measurement of work engagement with a short questionnaire: a cross-national study. educational and psychological measurement, 66(4), 701–716. https://doi.org/10.1177/0013164405282471 schaufeli, w., salanova, m., gonzalez-roma, v., & bakker, a. (2002). the measurement of engagement and burn out: a confirmative analytic approach. journal of happiness studies, 3, 71–93. https://doi.org/10.1023/a:1015630930326 seppälä, p. (2013). work engagement: psychometrical, psychosocial, and psychophysiological approach. faculty of social sciences, university of jyväskylä. jyväskylä: university library of jyväskylä. seppälä, p., mauno, s.f., hakanen, j., kinnunen, u., tolvanen, a., & schaufeli, w. (2009). the construct validity of the utrecht work engagement scale: multisample and longitudinal evidence. journal of happiness studies, 10, 459–481. https://doi.org/10.1007/s10902-008-9100-y shimazu, a., schaufeli, w.b., kosugi, s., suzuki, a., nashiwa, h., kato, a., … kitaoka-higashiguchi, k. (2008). work engagement in japan: validation of the japanese version of the utrecht work engagement scale. 
applied psychology: an international review, 57(3), 510–523. https://doi.org/10.1111/j.1464-0597.2008.00333.x sinval, j., pasian, s.q., & marôco, j. (2018). brazil-portugal transcultural adaptation of the uwes-9: internal consistency, dimensionality, and measurement invariance. frontiers in psychology, 9(353), 1–18. https://doi.org/10.3389/fpsyg.2018.00353 storm, k., & rothmann, s. (2003). a psychometric analysis of the utrecht work engagement scale in the south african police service. sa journal of industrial psychology, 29(4), 62–70. https://doi.org/10.4102/sajip.v29i4.129 vallières, f., & mcauliffe, e. (2015). reaching mdgs 4 and 5: the application of organizational psychology to maternal and child health programme sustainability in sierra leone. in i. mcwha-herrmann, d.c. maynard, & m. o’neill barry (eds.), contribution of humanitarian work psychology to the sustainable development goals (pp. 15–27). london: routledge. vallières, f., mcauliffe, e., hyland, p., galligan, m., & ghee, a. (2017). measuring work engagement among community health workers in sierra leone: validating the utrecht work engagement scale. journal of work and organizational psychology, 33, 41–46. vazquez, a., dos santos magnan, e., pacico, j., & hutz, c. (2015). adaptation and validation of the brazilian version of the utrecht work engagement scale. psico-usf, bragança paulista, 20(2), 207–217. https://doi.org/10.1590/1413-82712015200202 williams, l., & mcgonagle, a. (2016). four research designs and a comprehensive analysis strategy for investigating common method variance with self-report measures using latent variables. journal of business and psychology, 31, 339–359. https://doi.org/10.1007/s10869-015-9422-9 wurie, h., samai, m., & witter, s. (2016). retention of health workers in rural sierra leone: findings from life histories. human resources for health, 14(3), 1–15. https://doi.org/10.1186/s12960-016-0099-6 yamane, t. (1967). statistics, an introductory analysis (2nd edn.). 
new york: harper and row. about the author(s) tshikani t. boshomane department of behavioural medicine, faculty of health sciences, university of kwazulu-natal, durban, south africa basil pillay department of behavioural medicine, faculty of health sciences, university of kwazulu-natal, durban, south africa anneke meyer department of psychology, faculty of humanities, university of limpopo, polokwane, south africa citation boshomane, t.t., pillay, b., & meyer, a. (2021). measures of executive functions predicting attention-deficit/hyperactivity disorder core symptoms. african journal of psychological assessment, 3(0), a48. https://doi.org/10.4102/ajopa.v3i0.48 original research measures of executive functions predicting attention-deficit/hyperactivity disorder core symptoms tshikani t. boshomane, basil pillay, anneke meyer received: 01 apr. 2021; accepted: 19 aug. 2021; published: 22 oct. 2021 copyright: © 2021. the author(s). licensee: aosis. this is an open access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. abstract attention-deficit/hyperactivity disorder (adhd) is a common childhood disorder, and in many children, adhd is thought to be aggravated by a deficit in executive functions (efs). this study tried to establish whether commonly used neuropsychological tests of ef also predicted the core symptoms of adhd, namely hyperactivity/impulsiveness (h/i) and inattention, as well as total adhd symptomatology, according to the diagnostic and statistical manual of mental disorders, 4th edition, text revision (dsm-iv-tr). 
the participants were children from the limpopo province, south africa, aged from 6 to 15 years (m = 11.7 years; sd = 1.7). one hundred and fifty-six children (51.3% girls) were assessed by neuropsychological tests of efs: the tower of london (tol), digits forward and digits backward, trails-a and trails-b and wisconsin card sorting test (wcst). forward stepwise regression analysis was employed to predict h/i and inattention, as well as total adhd symptomatology, based on dsm-iv-tr criteria. all the tests, except trails-a, were found to predict adhd symptomatology. the wcst (total errors) was the best predictor of all the adhd symptoms and also for h/i and inattention separately, followed by trails-b and digits backwards, which were found to predict more symptoms of inattention than h/i. perseverative errors on the wcst predicted more h/i symptomatology, whilst non-perseverating errors were more associated with inattention. the tol and digits forward predicted fewer adhd symptoms. the tol seemed more sensitive to inattention, whilst digits forward showed a stronger association with h/i. the wcst, digits backwards and trails-b may be used to measure ef to support the diagnosis of adhd in a clinical setting and to indicate cognitive impairment. keywords: adhd; executive functions; hyperactivity/impulsiveness; inattention; neuropsychological tests. introduction attention-deficit/hyperactivity disorder (adhd) is the most commonly diagnosed psychiatric disorder, affecting 5% – 7% of children and adolescents worldwide (polanczyk, de lima, horta, biederman, & rohde, 2007) and 5.5% in the limpopo province, south africa (meyer, eilertsen, sundet, tshifularo, & sagvolden, 2004). in about two-thirds of cases, adhd continues into adulthood (faraone, biederman, & mick, 2006). it is a neurodevelopmental disorder, characterised by the core symptoms of hyperactivity/impulsiveness (h/i), inattention or both (american psychiatric association, 2013). 
hyperactivity manifests as greater than usual levels of movement and activity and an inability to remain still for a long time (danielson et al., 2016), whilst impulsiveness is the tendency to act prematurely without anticipation or consideration of the consequences (dalley, everitt, & robbins, 2011). inattention can be described as the inability to focus, high levels of distractibility, forgetfulness and poor planning and organising abilities (elisa, balaguer-balleester, & paris, 2016). the diagnostic and statistical manual of mental disorders, 4th edition, text revision (dsm-iv-tr) (american psychiatric association, 2000) requires a child to show six or more h/i behaviours or six or more inattention behaviours for at least 6 months, with onset before the age of 7 years. the dsm-iv criteria are largely similar to those of dsm-5, except that the age of onset changed from 7 to 12 years. executive functions and attention-deficit/hyperactivity disorder ‘executive functions’ is an umbrella term that embraces a varied range of cognitive processes and abilities that facilitate goal-orientated behaviour and thought processes such as planning, insight, judgement, reasoning and cognitive flexibility (ogilvie, stewart, chan, & shum, 2011). the efs involve the cognitive abilities necessary for controlling attention, timed organisation of responses, goal-directed planning of complex tasks, abilities to access and manage information in long-term memory and the monitoring of current internal and external states (funahashi, 2001). ef measures are generally designed to assess performance in experimental settings; however, in real-life settings, the demands on ef capacities are complex, multifaceted and involve multiple sub-tasks (ogilvie et al., 2011). most research on efs focuses on mental flexibility, planning and problem-solving, inhibition and working memory. mental flexibility refers to the ability to switch rapidly between established task sets (van holstein et al., 2011). 
chiang and gau (2014) indicated that planning and problem-solving can be defined as the categorising and organising of the steps and elements required to carry out an intention, whilst inhibition refers to the ability to suppress irrelevant stimuli or behavioural impulses to enable goal-directed behaviour. working memory is the cognitive ability to store limited amounts of information for a short period so that it can be manipulated to direct behaviour and to navigate the social world effectively (diamond, 2013). attention-deficit/hyperactivity disorder is not only a behavioural disorder that is characterised by hyperactivity and inattention in children and excessive restlessness and impulsiveness in adults but also a cognitive disorder (ciuluvica, mitrofan, & grilli, 2013). children with adhd show deficits in efs (barkley, 1997; miyake et al., 2000; nigg, 2017; willcutt, doyle, nigg, & faraone, 2005). even children with adhd who do not present impairment in tasks in experimental settings may still face difficulties with everyday tasks that involve executive control (sonuga-barke, dalen, daley, & remington, 2002; sonuga-barke, dalen, & remington, 2003; thorell & wåhlstedt, 2006). all these processes and functions are complex and depend on multiple sub-processes and sub-functions (ogilvie et al., 2011). although children with adhd have often exhibited poor efs (thorell & wåhlstedt, 2006), these deficits are not present in all children with the disorder. researchers in the area have repeatedly emphasised the need to take the heterogeneity of efs into account when studying the symptomatology of adhd (sonuga-barke et al., 2002, 2003; thorell & wåhlstedt, 2006). the work of several authors suggests that adhd symptoms either result from a primary deficit in a specific ef domain (e.g. response inhibition or working memory) or arise from a more global difficulty with executive control (barkley, 1997; pennington & ozonoff, 1996; willcutt et al., 2005). 
because of the heterogeneity of efs, they are difficult to measure (miyake et al., 2000). miyake and friedman (2012) called this the task-impurity problem and maintain that any target ef must be embedded within a specific task context. therefore, any score obtained from an ef task includes systematic variance attributable to non-ef processes as well as measurement error (miyake & friedman, 2012). for this reason, multiple tasks that appear different on the surface but still capture the target ability are often selected. if these tasks share little systematic non-ef variance, it is possible to statistically extract what is common across those tasks and use that ‘pure’ variable as the measurement of ef (miyake & friedman, 2012). there are several theoretical explanations for the relationship between adhd and efs. firstly, barkley (1997) proposed that a deficit in behavioural inhibition is the core deficit of adhd, which, in turn, creates disturbances in five neuropsychological functions: working memory; internalisation of speech; self-regulation of affect, motivation and arousal; behaviour analysis and synthesis; and motor control, fluency and syntax. barkley (1997) also suggested that difficulties with inhibition of behaviour may underlie some of the psychological and social difficulties linked with the other four efs. according to barkley (1997), the configuration of deficits found in children with adhd suggests the involvement of efs, including working memory. therefore, efs have been found to correspond with the symptoms of adhd. secondly, the influential multi-component model of working memory of baddeley, logie, bressi, sala and spinnler (1986) includes three components, amongst them the phonological loop (specialised for the maintenance of speech-based phonological information) and the visuospatial sketchpad (specialised for visual and spatial information). 
the model also includes a central control structure called the central executive, which controls and regulates the cognitive processes (efs) and is frequently connected to frontal lobe functioning (miyake et al., 2000). miyake et al. (2000) suggested a model that identifies three separable but partially correlated constructs: inhibiting prepotent responses (inhibition), shifting between tasks or mental sets (shifting) and updating of working memory representations (updating). adhd-related working memory deficits were apparent across all three cognitive systems, including the central executive. children with adhd also tend to perform poorly in complex working memory tasks, as these rely heavily on the central executive. lastly, according to sonuga-barke’s (2002) dual pathway model, children with adhd display problems with set-shifting and working memory because adhd may pertain not only to dysregulation of the thought and action pathway but also to the motivational style pathway. the first of these pathways is manifested in a primary inhibitory dysfunction that is mediated by secondary cognitive and behavioural dysfunctions, which in turn lead to faulty task engagement (deficits of set-shifting and working memory) and to symptomatic behaviour (i.e. hyperactivity and inattentiveness). the second pathway, in contrast, is involved in reward mechanisms (sonuga-barke et al., 2003). according to the delay aversion concept, children with adhd experience higher sensitivity to delays than their peers. this leads to decisions that entail choosing smaller-sooner rewards over larger-later rewards on tasks designed to measure the relationship between impulsivity and delay aversion. delay aversion is expressed in behaviour that is theorised to be motivated by the desire to escape or avoid delay. children with adhd may act without thinking because they seek to avoid waiting. 
they may demonstrate elevated frustration when they feel annoyed owing to an unexpected delay during task performance and may show early detachment and inattention during long and tedious tasks. this leads to impulsive choices and perseverating responses. neuroanatomically, ef processes are primarily mediated through the frontal cortex, especially the prefrontal cortex (pfc), although it is not clear how specific frontal areas are involved (miyake & friedman, 2012; miyake et al., 2000). nevertheless, the integrity of the whole brain is necessary for the best performance of ef tasks (funahashi, 2001). koechlin (2016) indicated that damage to the pfc may result in impaired concentration, problem-solving ability, planning and judgement. studies amongst rural south african children (mokobane, pillay, & meyer, 2020; pila-nemutandani & meyer, 2016) indicate that children with adhd are significantly impaired on measures of planning behaviour and problem solving (as measured by the tower of london [tol]), with mainly the inattention component involved. this is consistent with the findings of saydam, ayvaşik and alyanak (2015) and oosterlaan, scheres and sergeant (2005). shikwambana (2006), also in a study amongst rural south african children, found that children with adhd were impaired in working memory as measured by the memory for digits (mfd), especially the digits backwards (db). gropper and tannock (2009) and kofler, rapport, bolden, sarver and raiker (2010) reported similar results, but the latter also found that children with adhd encountered difficulties with the digits forward (df) test. cockcroft (2011), in another south african study on working memory functioning in children with adhd, indicated that children with adhd often experience working memory difficulties, as measured by the db test. pennington and ozonoff (1996) found that part a of the trail making test (tmt) could not detect adhd symptoms, whilst kofler et al. 
(2010) found that children with adhd performed worse than controls on the trails-b, which measures cognitive flexibility, indicating the instrument’s sensitivity to adhd symptoms. in another study amongst rural south african children, mathivha (2005) showed that children with adhd made more perseverative errors (pe) and non-perseverative errors (npe), as measured by the wisconsin card sorting test (wcst), with especially the h/i component being affected. geurts, verté, oosterlaan, roeyers and sergeant (2005) found poor performance on the wcst amongst children with adhd, with both h/i and inattention affected. tsuchiya, oki, yahara and fujieda (2005) and saydam et al. (2015) indicated that all adhd presentations exhibited poor performance on the wcst as suggested by total errors (te) and pe. tsuchiya et al. (2005) also found that children with adhd exhibited poorer performance on the wcst, as indicated by npe. it was, therefore, hypothesised that instruments used to measure ef performance (planning, working memory and set-shifting) would predict the core symptoms of adhd, namely h/i and inattention as well as total adhd symptomatology. the purpose of the study was to examine whether commonly used neuropsychological ef tests, the tol, mfd (df and db), trails-a and trails-b and wcst could predict the core symptoms of adhd, namely h/i and inattention, as well as total adhd symptomatology, as measured by a questionnaire (appendix 1) based on the dsm-iv-tr criteria (american psychiatric association, 2000) in a south african population of primary school children. method participants one hundred and fifty-six children between 6 and 15 years of age (m = 11.7 years; sd = 1.7) were recruited through a screening process from public primary schools around tzaneen, in the limpopo province of south africa. the sample was obtained from grade 1 to grade 7 learners from six schools of a total 10 schools in the circuit; the learners were randomly selected. 
the home languages of the learners were sepedi and xitsonga. children were excluded, based on the information provided by parents on the demographic questionnaire (appendix 1) and school records, if they had academic problems at school (as reported by their teachers) or a history of head injury, epilepsy, cerebral palsy, cerebral malaria, autism spectrum disorder or severe psychiatric disorders, or if they did not return the consent forms. none of the recruited children were taking psychostimulant medication at the time of testing. instruments demographic questionnaire the parent or guardian of each participant was requested to complete a demographic questionnaire (appendix 1), which covered biographical and socio-economic information and developmental and medical history. the responses were recorded on an extensive database. disruptive behaviour rating scale the dependent variables comprised the total adhd score, as well as the scores of the h/i and inattention subscales, as measured on the disruptive behaviour rating scale (dbd; pelham, gnagy, greenslade, & milich, 1992; pillow, pelham, hoza, molina, & stultz, 1998). the dbd assesses the presence and the degree of adhd-related symptoms (h/i and inattention), oppositional defiant disorder and conduct disorder. in this study, only the 18 adhd items were used. both the parents and teachers of the participants were asked to rate each item on a four-point paper-and-pencil rating scale: ‘not at all’ (0); ‘just a little’ (1); ‘pretty much’ (2) and ‘very much’ (3). for each scale (h/i and inattention), the minimum score was 0 and the maximum 27. teachers’ and parents’ scores were averaged. cut-off points were established at ≥ 17 on the h/i scale and at ≥ 20 on the inattention scale, based on the epidemiological study by meyer et al. (2004). raw scores were recorded. the scale is standardised and normed for all languages and population groups in limpopo province, south africa (meyer et al., 2004). 
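the scoring arithmetic described above can be sketched in a few lines of python. this is only an illustration under stated assumptions, not the authors’ scoring procedure: it assumes nine items per subscale (18 adhd items, maximum 27 per scale at 3 points per item) and that the parent and teacher subscale totals are averaged; the function and variable names are hypothetical.

```python
from statistics import mean

# Assumption: 9 items per subscale, each rated 0-3 (so the maximum is 27).
ITEMS_PER_SCALE = 9
VALID_RATINGS = (0, 1, 2, 3)
# Cut-offs reported in the text (Meyer et al., 2004).
CUTOFFS = {"hi": 17, "inattention": 20}


def subscale_score(ratings):
    """Sum one rater's item ratings for a single subscale."""
    if len(ratings) != ITEMS_PER_SCALE:
        raise ValueError("expected 9 item ratings per subscale")
    if any(r not in VALID_RATINGS for r in ratings):
        raise ValueError("each rating must be 0, 1, 2 or 3")
    return sum(ratings)


def averaged_score(parent_ratings, teacher_ratings):
    """Average the parent and teacher totals for one subscale."""
    return mean([subscale_score(parent_ratings),
                 subscale_score(teacher_ratings)])


def meets_cutoff(score, scale):
    """Check an averaged score against the reported cut-off for that scale."""
    return score >= CUTOFFS[scale]


# Hypothetical ratings for one child on the H/I subscale.
parent_hi = [2] * 9   # total 18
teacher_hi = [3] * 9  # total 27
hi_score = averaged_score(parent_hi, teacher_hi)
print(hi_score, meets_cutoff(hi_score, "hi"))
```

because the study used raw scores as continuous dependent variables, the cut-off check is shown only to make the reported thresholds concrete.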
this locally normed dbd has been shown in other studies to be valid and reliable for the population (mokobane et al., 2020; pila-nemutandani & meyer, 2016). the cronbach α computed for the locally normed dbd was 0.90 for the h/i scale and 0.92 for the inattention scale (meyer et al., 2004). tower of london the tol is a widely used instrument for assessing planning ability and consists of two tower boards, each of which contains three pegs of different lengths and three balls, usually coloured red, blue and green (boccia et al., 2017). the test consists of 12 problems, of which the first two are practice problems and the remaining 10 are test problems. the participants are shown two identical tower boards, one for the participant and one for the examiner. the examiner places the participant’s balls in the start configuration and sets up the practice problem. in the practice problems, two steps are needed to reach a solution. the participants are asked to transform the start state into the goal state in a predetermined minimum number of moves whilst following three rules: (1) only one ball may be moved at a time; (2) a ball in the lower row cannot be moved when another ball is lying above it and (3) three balls may be placed on the tallest peg, two balls on the middle peg and one ball on the shortest peg. from the start position, the participants are required to use the fewest steps to move the balls to the end position. the minimum number of moves required is seven. the number of moves required to reach the goal position and the time taken to complete the test are recorded. good planning is indicated by a lower total number of moves. the total number of moves and the time taken were manually recorded on a scoring sheet and scored. the move score for each test problem is obtained by subtracting the minimum number of moves needed to solve that problem from the participant’s actual move count. raw scores were used. the time taken to complete the test was 10–15 min. 
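the move score described above is a simple difference, which the following python sketch makes explicit. it assumes, for illustration only, that the per-problem move scores are summed into a total across the test problems; the function names and example values are hypothetical and are not taken from the test manual.

```python
def move_score(actual_moves, minimum_moves):
    """Excess moves on one problem: actual count minus the minimum solution."""
    if actual_moves < minimum_moves:
        raise ValueError("a solution cannot use fewer than the minimum moves")
    return actual_moves - minimum_moves


def total_move_score(problems):
    """Sum excess moves over (actual, minimum) pairs.

    Assumption: the total is the plain sum over test problems;
    lower totals indicate better planning.
    """
    return sum(move_score(actual, minimum) for actual, minimum in problems)


# Hypothetical results for three test problems: (actual moves, minimum moves).
example = [(5, 4), (7, 7), (9, 6)]
print(total_move_score(example))
```

a participant who solves every problem in the minimum number of moves would obtain a total of 0 under this scheme.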
the split-half reliability coefficient was r = 0.72 and the internal consistency was cronbach α = 0.69 (kaller, unterrainer, & stahl, 2012). the cronbach α for the present study was 0.62.

memory for digits

memory for digits is a subtest of the senior south african individual scales-revised (ssais-r), an instrument used to measure general intelligence that was published in 1964 and revised in 1992 (cockcroft & blackburn, 2008; van eeden & visser, 1992). the test also assesses the participants’ working memory, auditory sequencing and auditory attention ability (van eeden & visser, 1992). participants must concentrate in order to encode and recall the digits. although this test was originally standardised for mixed race, indian and white children, it has been successfully used amongst black children by shikwambana (2006), who found that the instrument distinguished between children with and without adhd symptoms, the latter successfully repeating more digits, especially digits backward. the test consists of two subtests of strings of digits that are read at a steady rate to the participant, who repeats the digits to the researcher. in the first subtest, df, two series of eight sets of digits are read to the participant, who is required to repeat them. in the second subtest, db, two series of seven digits are read to the participant, who is required to repeat them backwards. each of the mfd tests (df and db) is discontinued after two consecutive items are incorrectly answered (van eeden & visser, 1992). two marks are awarded if the participant repeats the first series of an item correctly, one mark if the participant repeats only the second series of an item correctly and zero if both series are repeated incorrectly. the total maximum score is 16. the internal reliability of the test ranges from 0.83 to 0.90 and construct validity ranges from 0.1 to 0.5 (cockcroft, 2013). for the present study, the cronbach α was 0.78.
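the marking and discontinuation rules above can be sketched as follows. this is a minimal illustration, not the ssais-r manual's procedure: the function name is hypothetical, and an item is assumed to count as incorrectly answered only when both of its series are failed.

```python
def score_subtest(items):
    """Score one subtest (df or db).

    `items` is a list of (first_series_correct, second_series_correct) pairs,
    one pair per item, in the order of administration.
    """
    total = 0
    consecutive_failures = 0
    for first_correct, second_correct in items:
        if first_correct:
            marks = 2          # first series repeated correctly
        elif second_correct:
            marks = 1          # only the second series repeated correctly
        else:
            marks = 0          # both series repeated incorrectly
        total += marks
        # Assumption: only a zero-mark item counts towards discontinuation.
        consecutive_failures = consecutive_failures + 1 if marks == 0 else 0
        if consecutive_failures == 2:
            break              # discontinue after two consecutive failed items
    return total


# Hypothetical record: the last item is never administered.
record = [(True, False), (False, True), (False, False), (False, False), (True, True)]
print(score_subtest(record))
```

with eight items answered correctly on the first series, the function returns the maximum of 16 stated in the text.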
trail making test the tmt has been used as an indicator of visual scanning, graphomotor speed, ef, working memory and inhibition (lezak, howieson, loring, & fischer, 2012) and is also a test of visual search, attention, mental flexibility and motor function. the tmt is a timed task, consisting of two subtests: part a measures visual search, attention and mental tracking ability, whilst part b measures cognitive abilities such as flexibility and the capacity to deal with more than one stimulus at a time (kokubo et al., 2012). both parts of the tmt comprise 25 circles distributed over a sheet of paper. in part a, the circles are numbered 1–25, and the participant is expected to draw lines to connect the numbers in ascending order. in part b, the circles contain 13 numbers and 12 letters; participants need to connect circles, alternating both numerically and alphabetically, in increasing order. any errors made by the participants are recorded. in both parts, a participant’s performance (score) is the time taken to complete each trial correctly. the test-retest reliability for the tmt is between 0.60 and 0.90 (wagner, helmreich, dahmen, lieb, & tadić, 2011). cronbach α for the present sample was 0.67 and 0.72, for parts a and b, respectively. wisconsin card sorting test the wcst consists of 128 cards that present sets of geometric designs that vary according to colour, form and number. in the computerised version of the wcst (cv4-research edition), the stimulus cards remain at the top of the screen, and a single response card appears at the bottom of the screen. the participant is required, with the use of a computer mouse, to select a stimulus card that they believe to be correctly ‘matched’ to the response card. after each attempt, the computer provides positive or negative feedback by displaying the word ‘right’ or ‘wrong’ at the bottom of the screen (williams & jarrold, 2013). the purpose of the test is to measure mental flexibility. 
the classification rule changes after every 10 cards, which means that once the child has worked out the rule, they may begin to make a single mistake (or more) when the rule changes. in this study, the numbers of te, pe, perseverative responses (pr) and npe were used as the main scores to assess set shifting. the inter-rater reliability for the wcst is between 0.88 and 0.93 (mitrushina, boone, razani, & d’elia, 2005), with cronbach α of 0.90. cronbach α for the present study was 0.89. procedure the department of education and principals of the schools gave permission to assess the participants at their school. the dbd questionnaires were distributed to both educators and parents of 5480 children to screen for adhd symptoms. the final sample consisted of 78 children, who could be classified as adhd and 78 with not enough symptoms to meet the criteria for adhd, who were selected for further testing. the participants used were selected for other studies that required matched controls. they were matched according to gender, age and ethnicity with neurotypical controls. the assessment procedures and instructions were conducted by the researcher and trained assistants in the participants’ home language. the researcher and research assistants had a minimum of a bachelor’s degree in psychology and were fluent in sepedi and xitsonga. the assessments were conducted individually with each participant, in a quiet room, during the morning school hours. the tests were administered in the following sequence: tol, df and db, trails-a and trails-b and wcst. the assessment procedure for each child took ± 60 min. ethical considerations the ethics committee of the university of kwazulu-natal (reference number: hss/1452/015d) approved the study. permission to conduct the tests was obtained from both the department of education of limpopo province and the school principals of the identified schools. participation was voluntary. 
written, informed consent was obtained from the parents or legal guardians of the learners. the children themselves also had to agree to participate in the study. the completed consent forms were submitted to the school principals, in sealed envelopes and locked in a safe until the researcher collected them. the researchers read out the assent form to children in their home language and, after establishing that they understood the content, all participants assented to their participation in the study. the children’s identity was coded on all questionnaires and the database to guarantee anonymity. all data were then stored securely in the researcher’s office and entered onto the researcher’s computer with a security code. test protocols and answer sheets are securely stored in a locked cabinet for 5 years, after which they will be destroyed. confidentiality was explained and assured to the participants. no risks were involved when assessing children. the parents were informed that the participants will be referred to the closest psychological services for the final diagnosis and treatment when the need arises. data analysis during the evaluation of the participants, their scores were recorded on the score sheets by the researcher and research assistants and later transferred to a database for analysis. depending on the tests, they were either manually or electronically scored. a multiple regression analysis was carried out on the raw scores to determine the capacity of the various ef measurements to predict the diagnostic criteria for adhd, as well as for the core symptoms of h/i and inattention. the main goal was to establish whether the tol (moves and time), mfd (df and db), trails-a and trails-b and wcst (te, pe, pr, and npe) correctly predicted adhd symptoms. 
consequently, the raw scores of measures were introduced in the analysis as predictor variables, whilst the dbd scores on the h/i and inattention scales, as well as the total adhd score, were dependent variables. outliers were only noted for a few tests investigated and were not removed for analysis. the forward stepwise multiple regression programme from statistica-13 (statistica, 2015) was employed. results descriptive statistics for all predictors (tests of ef) and dependent variables (h/i, inattention and adhd total score) are presented in table 1. table 1: attention-deficit/hyperactivity disorder and executive function test results (n = 156). table 2 illustrates the pearson product-moment correlation coefficients between the measurement of ef and the dbd scores for h/i, inattention and total adhd. the correlation coefficients between the tests of ef and the dbd scores ranged from 0.11 to 0.69. alpha was adjusted for multiple comparisons with bonferroni corrections. the correlation coefficient for trails-a was not statistically significant and therefore did not form part of the regression analysis. table 2: correlation between attention-deficit/hyperactivity disorder symptom domains and executive function measures. a multiple regression analysis was conducted where the nine remaining tests of ef were entered into a forward stepwise regression analysis to predict h/i, inattention and total adhd criteria, as measured on the dbd scale (see table 3). table 3: relationship between scores of tests for executive function and attention-deficit/hyperactivity disorder domains (df = 2, 153). significant associations were found for all nine tests. the analysis revealed that te on the wcst was the strongest predictor for total adhd, which explained 48% of the variance. this was followed by db and trails-b, which each predicted 39% of the variance. the pe on the wcst explained 33% of the variance and npe 30%. 
these were followed by the wcst pr, which predicted 23% of the variance, and the tol, which predicted 17% of the variance for both moves and time taken. the df test was revealed as the poorest predictor for adhd symptomatology, as it explained only 6% of the total variance. total errors on the wcst were again found to be the strongest predictor of h/i symptoms, as they predicted 40% of the variance. this was followed by wcst pe, at 32% of the variance, db at 30% of the variance, trails-b at 28% of the variance and wcst npe and pr, both at 21% of the variance. the tol, both for moves and time taken, and the df test were the poorest predictors of h/i symptoms, as each predicted 12% of the variance. the strongest predictor for the inattention criteria was once again the total number of errors on the wcst, which explained 41% of the variance, followed closely by trails-b and db, at 40% and 37% of the variance, respectively. the npe on the wcst explained 31% of the variance and could also be regarded as a satisfactory predictor of inattention symptoms. the wcst pe and pr explained 25% and 19% of the variance, respectively. the tol moves and time taken were weak predictors of inattention criteria and predicted only 17% and 18% of the variance, respectively. the poorest predictor of inattention, however, was the df, which only explained 7% of the variance.

discussion

the results of the study support the hypothesis that commonly used clinical tests of ef predict the diagnostic criteria for adhd, namely h/i and inattention, as well as total adhd symptoms, according to the dsm-iv-tr criteria. all the tests investigated, except trails-a, predicted adhd symptomatology. of the ef measures analysed, the wcst (te) was the best predictor, as it accounted for the largest variance, contributing to total adhd symptoms and also to h/i and inattention separately.
trails-b and db followed closely, as they both accounted equally for the variance of total adhd symptoms, but they were found to predict more symptoms of inattention than h/i. the responses on the wcst indicated that pe predicted more h/i symptomatology, whilst npe were largely associated with inattention. although there was also an association between adhd symptoms and the tol and df, their predictive power was much lower. however, the tol seemed more sensitive to inattention symptoms, whilst the df test showed a slightly stronger association with h/i than with inattention. the significance of the wcst in predicting adhd symptoms (both h/i and inattention) did not come as a surprise. performance on the wcst measures not only cognitive flexibility (set-shifting) but also involves other efs such as working memory and inhibition. the instrument measures higher cognitive abilities and requires attention, perseverance, abstract thinking, planning, organised search and use of feedback, all frontal lobe functions that are often deficient in adhd candidates (toplak, west, & stanovich, 2013). the results of the analysis showed that pe and te of the wcst predicted more h/i symptoms than inattention symptoms. saydam et al. (2015) also indicated that the wcst, especially in terms of pe and te, showed that children with adhd lack strategic problem solving because of a more impulsive strategy rather than thinking through the planning of the problem. tsuchiya et al. (2005) also reported that the wcst is sensitive mainly to symptoms of impulsiveness. the npe of the wcst predicted more symptoms of inattention than those of h/i. ahmadi, mohammadi, araghi and zarafshan (2014) also reported that the npe of the wcst are associated with more inattention symptoms in children with adhd than pe. because of their distractibility, children with adhd fail to sustain attention and therefore display inefficient use of working memory strategies. 
moreover, these children struggle to pay attention and to maintain interest in a task; they frequently make careless errors and become distracted by external stimuli (tripp & wickens, 2009). trails-b and db were also strong predictors of adhd symptomatology. trails-b predicted inattention (40% of the variance) better than h/i (28%) and db predicted inattention (37% of the variance) slightly better than h/i (30%). trails-b measures mental flexibility, working memory and attention (sánchez-cubillo et al., 2009). the db test measures working memory (cockcroft, 2011, 2013). poor performance on the trails-b and db tests suggests that, because of their inattentiveness, children with adhd are slow to switch between stimuli or between sets of stimuli and therefore struggle to control and adapt their behaviour appropriately to changing situations. other research also indicated that adhd symptoms of inattention are associated with poor performance on the trails-b task (oades & christiansen, 2008; pennington & ozonoff, 1996; willcutt et al., 2005) and also on db (gropper & tannock, 2009; kofler et al., 2010; shikwambana, 2006). barkley (1997) and chhabildas, pennington and willcutt (2001) also indicated that inattention causes problems with executing working memory tasks. barkley (1997) explained that children with adhd show difficulties with working memory because they struggle to suppress competing stimuli, and their distractibility means they are less likely to retain information in mind. the tol, which measures behavioural planning, was not a strong predictor of adhd symptomatology, although it showed a slightly stronger association with inattention (18% of the variance in time taken and 17% in moves) than with h/i (12% for each). chhabildas et al. (2001) also indicated that the tol had a stronger association with inattention symptoms than with h/i. mokobane et al. (2020), pila-nemutandani and meyer (2016) and saydam et al.
(2015) also found that inattention was mainly involved, as the ability to plan strategies is negatively affected, especially in children with adhd-pi and adhd-c. cornoldi et al. (2001) found that children with adhd had difficulty with problem solving, as they tend to remember information that is less relevant or irrelevant. kofman, gidley larson and mostofsky (2008) also reported that children with adhd struggled with competence on tasks needing strategic planning. according to kaller et al. (2012), planning requires adequate control of impulses (the h/i component), as well as reasonably functioning memory (inattention). digits forward was found to be a poor predictor of adhd symptomatology. however, it showed a stronger association with h/i (12% of the variance), probably because of impulsive responses by the participants, than with inattention (7%), because the df only measures short-term auditory memory. rosenthal, riccio, gsanger and jarratt (2006) found that the df test predicted inattention only very slightly and did not predict ef involvement. the finding that trails-a, which measures visual scanning, simple attention and motor speed but not ef, did not tap into adhd symptoms is consistent with johnson et al. (2001). finally, the results of the current study indicated that most of the tests used to assess efs predicted the core symptoms of adhd: h/i and inattention. barkley (1997), miyake et al. (2000), nigg (2017) and willcutt et al. (2005) confirmed that efs are an integral part of adhd symptomatology. the detection of executive dysfunction will supply insight into cognitive difficulties that may contribute to scholastic and behavioural problems (nigg, 2017). the results of our study suggest, therefore, that measures for efs may detect adhd symptomatology effectively and will supply valuable additional information for a successful diagnosis.
implications

the results suggest that especially the wcst, trails-b and db tests could be effective complementary instruments to indicate cognitive impairment in children diagnosed with adhd. the combined use of adhd rating scales, parent interview and the abovementioned tests may provide valuable information on the functioning of children with adhd in academic and social settings.

limitations and future recommendations

the sample used in this study was fairly homogeneous, in that the participants all came from the same geographical area. the children were sepedi and xitsonga speaking. therefore, it is not possible to generalise the results to children in other regions of south africa. a further limitation is that the study did not test for comorbidities. comorbid disorders should be carefully examined as they play a significant role in ef performance and in day-to-day functioning. children with adhd may display more difficulties with efs if they have comorbid disorders such as oppositional defiant disorder, conduct disorder, depression, or reading disorder (willcutt, pennington, chhabildas, friedman, & alexander, 1999). another limitation is that the sample size was limited. in future studies, fmri could be used to indicate frontal lobe dysfunction associated with efs.

conclusion

the study showed that the tests of the efs investigated, except trails-a, predicted the core symptoms of adhd. the tests predicted adhd symptomatology to various degrees. the study showed that, whilst the wcst was the strongest predictor, both db and trails-b were also found to be strong predictors of adhd. the wcst, db and trails-b could be used in clinical settings to measure efs and so complement the diagnosis of adhd.

acknowledgements

the authors would like to acknowledge and thank the following field workers: xichavo hobyani, khuliso matidza, penny mafela and rotakala sadiki.
competing interests the authors declare that they have no financial or personal relationships that may have inappropriately influenced them in writing this article. authors’ contributions t.t.b. made an extensive contribution to the concept and design of the article, collected data and drafted the article and finalised the version to be published. b.p. assisted with overseeing, made substantial remarks on the prepared articles and approved the final version to be published. a.m. provided the data analysis and tables, revised the article and approved the version to be published. funding information this project was partially funded by the university of kwazulu-natal. data availability data will be available in the university of kwazulu-natal library, no data sharing. disclaimer the views expressed in the submitted article are those of the authors and not an official position of the institution or funder. references ahmadi, n., mohammadi, m.r., araghi, s.m., & zarafshan, h.j. (2014). neurocognitive profile of children with attention deficit hyperactivity disorders (adhd): a comparison between subtypes. iranian journal of psychiatry, 9(4), 197–202. american psychiatric association. (2000). diagnostic and statistical manual of mental disorders (4th ed., text rev.). washington, dc: american psychiatric association american psychiatric association. (2013). diagnostic and statistical manual of mental disorders (5th ed.). arlington, va: american psychiatric association. baddeley, a., logie, r., bressi, s., sala, s.d., & spinnler, h.j. (1986). dementia and working memory. quarterly journal of experimental psychology, 38(4), 603. https://doi.org/10.1080/14640748608401616 barkley, r. (1997). behavioural inhibition, sustained attention, and executive functions: constructing a unifying theory of adhd. psychological bulletin, 121(1), 65–91. https://doi.org/10.1037/0033-2909.121.1.65 boccia, m., marin, d., d’antuono, g., ciurli, p., incoccia, c., antonucci, g., & piccardi, l.j. 
(2017). the tower of london (tol) in italy: standardization of the tol test in an italian population. neurological sciences, 38(7), 1263–1270. https://doi.org/10.1007/s10072-017-2957-y chhabildas, n., pennington, b.f., & willcutt, e.g. (2001). a comparison of the neuropsychological profiles of the dsm-iv subtypes of adhd. journal of abnormal child psychology, 29(6), 529–540. https://doi.org/10.1023/a:1012281226028 chiang, h.-l., & gau, s.s. (2014). impact of executive functions on school and peer functions in youths with adhd. research in developmental disabilities, 35(5), 963–972. https://doi.org/10.1016/j.ridd.2014.02.010 ciuluvica, c., mitrofan, n., & grilli, a. (2013). aspects of emotion regulation difficulties and cognitive deficit in executive functions related of adhd symptomatology in children. procedia-social and behavioral sciences, 78, 390–394. https://doi.org/10.1016/j.sbspro.2013.04.317 cockcroft, k., & blackburn, m.j. (2008). the relationship between senior south african individual scale–revised (ssais-r) subtests and reading ability. south african journal of psychology, 38(2), 377–389. https://doi.org/10.1177/008124630803800209 cockcroft, k.j. (2011). working memory functioning in children with attention-deficit/hyperactivity disorder (adhd): a comparison between subtypes and normal controls. journal of child and adolescent mental health, 23(2), 107–118. https://doi.org/10.2989/17280583.2011.634545 cockcroft, k.j. (2013). the senior south african individual scales–revised: a review. in s. laher & k. cockcroft (eds.), psychological assessment in south africa: research and applications (pp. 48–59). wits university press. cornoldi, c., marzocchi, g.m., belotti, m., caroli, m.g., meo, t., & braga, c.j. (2001). working memory interference control deficit in children referred by teachers for adhd symptoms. journal of normal and abnormal development in childhood, 7(4), 230–240. https://doi.org/10.1076/chin.7.4.230.8735 dalley, j., everitt, b., & robbins, t. 
(2011). impulsivity, compulsivity, and top-down cognitive control. neuron, 69(4), 680–694. https://doi.org/10.1016/j.neuron.2011.01.020 danielson, m., bitsko, r., ghandour, r., holbrook, j., kogan, m., & blumberg, s. (2016). prevalence of parent-reported adhd diagnosis and associated treatment among us children and adolescents. journal of clinical child & adolescent psychology, 47, 199–212. https://doi.org/10.1080/15374416.2017.1417860 diamond, a. (2013). executive functions. annual review of psychology, 64, 135–168. elisa, r., balaguer-ballester, e., & paris, b. (2016). inattention, working memory and goal neglect in a community sample. frontiers in psychology, 7, 1428. https://doi.org/10.3389/fpsyg.2016.01428 faraone, s., biederman, j., & mick, a. (2006). the age-dependent decline of attention-deficit/hyperactivity disorder: a meta analysis of follow-up studies. psychological medicine, 36, 159–165. https://doi.org/10.1017/s003329170500471x funahashi, s. (2001). neuronal mechanisms of executive control by the prefrontal cortex. neuroscience research, 39, 147–165. https://doi.org/10.1016/s0168-0102(00)00224-8 geurts, h.m., verté, s., oosterlaan, j., roeyers, h., & sergeant, j.a. (2005). adhd subtypes: do they differ in their executive functioning profile? archives of clinical neuropsychology, 20(4), 457–477. https://doi.org/10.1016/j.acn.2004.11.001 gropper, r.j., & tannock, r.j. (2009). a pilot study of working memory and academic achievement in college students with adhd. journal of attentional disorders, 12(6), 574–581. https://doi.org/10.1177/1087054708320390 johnson, d.e., epstein, j.n., waid, l.r., latham, p.k., voronin, k.e., & anton, r.f. (2001). neuropsychological performance deficits in adults with attention deficit/hyperactivity disorder. archives of clinical neuropsychology, 16(6), 587–604. https://doi.org/10.1093/arclin/16.6.587 kaller, c.p., unterrainer, j.m., & stahl, c.j. (2012).
assessing planning ability with the tower of london task: psychometric properties of a structurally balanced problem set. psychological assessment, 24(1), 46–53. https://doi.org/10.1037/a0025174 koechlin, e.j. (2016). prefrontal executive function and adaptive behavior in complex environments. current opinion in neurobiology, 37, 1–6. https://doi.org/10.1016/j.conb.2015.11.004 kofler, m.j., rapport, m.d., bolden, j., sarver, d.e., & raiker, j.s. (2010). adhd and working memory: the impact of central executive deficits and exceeding storage/rehearsal capacity on observed inattentive behavior. journal of abnormal child psychology, 38(2), 149–161. https://doi.org/10.1007/s10802-009-9357-6 kofman, o., gidley larson, j., & mostofsky, s.h. (2008). a novel task for examining strategic planning: evidence for impairment in children with adhd. journal of clinical and experimental neuropsychology, 30(3), 261–271. https://doi.org/10.1080/13803390701380583 kokubo, n., inagaki, m., gunji, a., kobayashi, t., ohta, h., & kajimoto, o. (2012). developmental change of visuo-spatial working memory in children: quantitative evaluation through an advanced trail making test. brain development, 34(10), 799–805. https://doi.org/10.1016/j.braindev.2012.02.001 lezak, m.d., howieson, d.b., loring, d.w., & fischer, j.s. (2012). neuropsychological assessment (5th ed.). new york: oxford university press. mathivha, m. (2005). neuropsychological deficits in tshivenda-speaking children with attention-deficit/hypersensitivity disorder [master’s thesis, university of limpopo]. turfloop. retrieved from http://hdl.handle.net/10386/367 meyer, a., eilertsen, d.-e., sundet, j.m., tshifularo, j., & sagvolden, t. (2004). cross-cultural similarities in adhd-like behaviour amongst south african primary school children. south african journal of psychology, 34(1), 122–138. https://doi.org/10.1177/008124630403400108 mitrushina, m., boone, k.b., razani, j., & d’elia, l.f. (2005). 
handbook of normative data for neuropsychological assessment. new york: oxford university press. miyake, a., & friedman, n.p. (2012). the nature and organization of individual differences in executive functions: four general conclusions. current directions in psychological science, 21(1), 8–14. https://doi.org/10.1177/0963721411429458 miyake, a., friedman, n.p., emerson, m.j., witzki, a.h., howerter, a., & wager, t.d. (2000). the unity and diversity of executive functions and their contributions to complex “frontal lobe” tasks: a latent variable analysis. cognitive psychology, 41(1), 49–100. https://doi.org/10.1006/cogp.1999.0734 mokobane, m., pillay, b.j., & meyer, a. (2020). behaviour planning and inhibitory control in sepedi-speaking primary school children with attention-deficit/hyperactivity disorder. south african journal of psychology, 50(1), 11–23. https://doi.org/10.1177/0081246319838104 nigg, j. (2017). annual research review: on the relations of self-regulation, self-control, executive functioning, effortful control, cognitive control, impulsivity, risk taking, and inhibition for developmental psychopathology. journal of child psychology and psychiatry, 58(4), 361–383. https://doi.org/10.1111/jcpp.12675 oades, r.d., & christiansen, h.j. (2008). cognitive switching processes in young people with attention-deficit/hyperactivity disorder. archives of clinical neuropsychology, 23(1), 21–32. https://doi.org/10.1016/j.acn.2007.09.002 ogilvie, j.m., stewart, a.l., chan, r.c., & shum, d.h. (2011). neuropsychological measures of executive function and antisocial behavior: a meta-analysis. criminology, 49(4), 1063–1107. https://doi.org/10.1111/j.1745-9125.2011.00252.x oosterlaan, j., scheres, a., & sergeant, j.a. (2005). which executive functioning deficits are associated with ad/hd, odd/cd and comorbid ad/hd+ odd/cd? journal of abnormal child psychology, 33(1), 69–85. 
https://doi.org/10.1007/s10802-005-0935-y pelham, jr, w.e., gnagy, e.m., greenslade, k.e., & milich, r.j. (1992). teacher ratings of dsm-iii-r symptoms for the disruptive behavior disorders. journal of abnormal child and adolescent psychiatry, 31(2), 210–218. https://doi.org/10.1097/00004583-199203000-00006 pennington, b., & ozonoff, s. (1996). executive functions and developmental psychopathology. journal of child psychology and psychiatry, 37, 51–87. https://doi.org/10.1111/j.1469-7610.1996.tb01380.x pila-nemutandani, r.g., & meyer, a. (2016). behaviour planning and problem solving deficiencies in children with symptoms of attention deficit hyperactivity disorder from the balobedu culture, limpopo province, south africa. journal of child and adolescent mental health, 28(2), 109–121. https://doi.org/10.2989/17280583.2016.1200582 pillow, d.r., pelham, w.e., hoza, b., molina, b.s., & stultz, c.h. (1998). confirmatory factor analyses examining attention deficit hyperactivity disorder symptoms and other childhood disruptive behaviors. journal of abnormal child psychology, 26(4), 293–309. https://doi.org/10.1023/a:1022658618368 polanczyk, g., de lima, m., horta, b., biederman, j., & rohde, l. (2007). the worldwide prevalence of adhd: a systematic review and metaregression analysis. american journal of psychiatry, 164, 942–948. https://doi.org/10.1176/ajp.2007.164.6.942 rosenthal, e.n., riccio, c.a., gsanger, k.m., & jarratt, k.p. (2006). digit span components as predictors of attention problems and executive functioning in children. archives of clinical neuropsychology, 21(2), 131–139. https://doi.org/10.1016/j.acn.2005.08.004 sánchez-cubillo, i., periáñez, j., adrover-roig, d., rodríguez-sánchez, j., rios-lago, m., tirapu, j., & barcelo, f.j. (2009). construct validity of the trail making test: role of task-switching, working memory, inhibition/interference control, and visuomotor abilities. journal of the international neuropsychological society, 15(3), 438–450. 
https://doi.org/10.1017/s1355617709090626 saydam, r.b., ayvaşik, h.b., & alyanak, b.j. (2015). executive functioning in subtypes of attention deficit hyperactivity disorder. archives of neuropsychiatry, 52(4), 386–392. https://doi.org/10.5152/npa.2015.8712 shikwambana, b.t. (2006). neuropsychological and cognitive deficits in children with disruptive behaviour disorders [master’s thesis, university of limpopo]. turfloop. retrieved from http://hdl.handle.net/10386/915 sonuga-barke, e.j., dalen, l., daley, d., & remington, b.j. (2002). are planning, working memory, and inhibition associated with individual differences in preschool adhd symptoms? developmental neuroscience, 21(3), 255–272. https://doi.org/10.1207/s15326942dn2103_3 sonuga-barke, e.j., dalen, l., & remington, b.j. (2003). do executive deficits and delay aversion make independent contributions to preschool attention-deficit/hyperactivity disorder symptoms? neuroscience & behavioural review, 42(11), 1335–1342. https://doi.org/10.1097/01.chi.0000087564.34977.21 statistica (2015). dell statistica data analysis software system, version 13. round rock, tx: dell inc. thorell, l.b., & wåhlstedt, c.j. (2006). executive functioning deficits in relation to symptoms of adhd and/or odd in preschool children. infant and child development, 15(5), 503–518. https://doi.org/10.1002/icd.475 toplak, m.e., west, r.f., & stanovich, k.e. (2013). practitioner review: do performance-based measures and ratings of executive function assess the same construct? journal of child psychology and psychiatry, 54(2), 131–143. https://doi.org/10.1111/jcpp.12001 tripp, g., & wickens, j. r. (2009). neurobiology of adhd. neuropharmacology, 57(7–8), 579–589. https://doi.org/10.1016/j.neuropharm.2009.07.026 tsuchiya, e., oki, j., yahara, n., & fujieda, k.j. (2005). computerized version of the wisconsin card sorting test in children with high-functioning autistic disorder or attention-deficit/hyperactivity disorder. 
brain and development, 27(3), 233–236. https://doi.org/10.1016/j.braindev.2004.06.008 van eeden, r., & visser, d.j. (1992). the validity of the senior south african individual scale-revised (ssais-r) for different population groups. south african journal of psychology, 22(3), 163–171. https://doi.org/10.1177/008124639202200308 van holstein, m., aarts, e., van der schaaf, m.e., geurts, d.e., verkes, r.j., franke, b., & cools, r.j. (2011). human cognitive flexibility depends on dopamine d2 receptor signaling. psychopharmacology, 218(3), 567–578. https://doi.org/10.1007/s00213-011-2340-2 wagner, s., helmreich, i., dahmen, n., lieb, k., & tadić, a.j. (2011). reliability of three alternate forms of the trail making tests a and b. archives of clinical neuropsychology, 26(4), 314–321. https://doi.org/10.1093/arclin/acr024 willcutt, e., doyle, a., nigg, j., & faraone, s. (2005). validity of the executive function theory of attention-deficit/hyperactivity disorder: a meta-analytic review. biological psychiatry, 57(11), 1336–1346. https://doi.org/10.1016/j.biopsych.2005.02.006 willcutt, e.g., pennington, b.f., chhabildas, n.a., friedman, m.c., & alexander, j.j. (1999). psychiatric comorbidity associated with dsm-iv adhd in a nonreferred sample of twins. journal of the american academy of child and adolescent psychiatry, 38(11), 1355–1362. https://doi.org/10.1097/00004583-199911000-00009 williams, d., & jarrold, c.j. (2013). assessing planning and set-shifting abilities in autism: are experimenter-administered and computerised versions of tasks equivalent? autism research, 6(6), 461–467. https://doi.org/10.1002/aur.1311 appendix 1: biographical questionnaire child and family information number/code: birth date: age: child’s school: child’s grade: developmental and medical history pregnancy and delivery a. length of pregnancy (weeks) b. length of delivery (number of hours from initial labour pains to birth c. mother’s age when child was born d. child’s birth weight e. 
did any of the following conditions occur during pregnancy/delivery?
1. bleeding: no / yes
2. excessive weight gain: no / yes
3. toxaemia/preeclampsia: no / yes
4. rh factor incompatibility: no / yes
5. frequent nausea or vomiting: no / yes
6. serious illness or injury: no / yes
7. took prescription medications: no / yes
   a. if yes, name of medication
8. took illegal drugs: no / yes
9. used alcoholic beverage: no / yes
   a. if yes, approximate number of drinks per week
10. smoked cigarettes: no / yes
   a. if yes, approximate number of cigarettes per day (e.g., ½ pack)
11. used snuff: no / yes
   a. if yes, how many times per day?
12. was given medication to ease labour pains: no / yes
   a. if yes, name of medication
13. delivery was induced: no / yes
14. forceps were used during delivery: no / yes
15. had a breech delivery: no / yes
16. had a caesarean section delivery: no / yes
17. other problems – please describe: no / yes
f. did any of the following conditions affect your child, during delivery or within the first few days after birth?
1. injured during delivery: no / yes
2. cardiopulmonary distress during delivery: no / yes
3. delivery with cord around neck: no / yes
4. had trouble breathing following delivery: no / yes
5. needed oxygen: no / yes
6. was cyanotic, turned blue: no / yes
7. was jaundiced, eyes turned yellow: no / yes
8. had an infection: no / yes
9. had seizures: no / yes
10. was given medications: no / yes
11. born with a congenital defect: no / yes
12. was in hospital more than 7 days: no / yes
g. breast feeding
1. did you breastfeed your child? no / yes
2. if you breastfed your baby, for how long? no / yes
3. at what age did you introduce solid food? no / yes
4. at what age was your child completely weaned from the breast?
no yes
about the author(s) mario r. smith department of psychology, faculty of community and health sciences, university of the western cape, cape town, south africa nuraan adams department of psychology, faculty of community and health sciences, university of the western cape, cape town, south africa erica munnik department of psychology, faculty of community and health sciences, university of the western cape, cape town, south africa
citation smith, m.r., adams, n., & munnik, e. (2022). the development of the quality of translation and linguistic equivalence checklist. african journal of psychological assessment, 4(0), a108. https://doi.org/10.4102/ajopa.v4i0.108
original research the development of the quality of translation and linguistic equivalence checklist mario r. smith, nuraan adams, erica munnik received: 22 feb. 2022; accepted: 13 july 2022; published: 26 oct. 2022 copyright: © 2022. the author(s). licensee: aosis. this is an open access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
abstract the international test commission (itc) established guidelines for test adaptations. the itc encourages the adaptation of locally developed measures with proven validity.
a good quality translation process ensures that the same meaning is conveyed from the source to the target language. through test adaptation, researchers focus on cultural differences between the source and the target language to maintain linguistic equivalence. research involving adaptation has systematically failed to report on the rigour of the translation process and to make translation part of the empirical process. the itc guidelines are generally referred to; however, the assessment of the quality of translations and the process of establishing linguistic equivalence remain an important research focus. this study reports on the development of the quality of translation and linguistic equivalence checklist (qtlc). the construction of the qtlc was based on itc guidelines. the qtlc consists of two sections, translation and linguistic equivalence, and produces section scores with accompanying quality descriptions. the draft instrument was presented to three independent reviewers. once feedback was incorporated, the qtlc was piloted in an ongoing study on the translation of the emotional-social screening tool for school readiness (e3sr). two reviewers applied the checklist, and inter-rater reliability was established. the kappa statistic (0.78) was significant at a 0.00 alpha level, indicating substantial agreement between the raters on the quality of the translation process and equivalence. four items were identified as functioning differently and were subsequently revised. the qtlc appears to be a robust checklist for assessing the quality of translations and the process of establishing linguistic equivalence.
keywords: adaptation; linguistic equivalence; inter-rater reliability; translation; qtlc.
introduction
access to reliable and valid measures is critical to the provision of culture-fair quality assessment practices (sousa & rojjanasrirat, 2011).
in a multilingual country such as south africa, culture-fair assessment practices require reliable and valid assessment measures that are available in different languages (mohamed, 2013). south africa has 11 official languages, but services and instrumentation are largely available in english and afrikaans (munnik, wagener, & smith, 2021). hernández, hidalgo, hambleton and gomez-benito (2020) recommended that locally developed measures with proven validity should be considered for translation, as they already have a higher level of contextual relevance than measures developed in other countries or cultures. thus, the translation of measures, especially locally developed measures, is currently an important area of research.
hernández et al. (2020) contended that translation is only one part of a multifaceted process called test adaptation. test adaptation attempts to modify the content of an instrument to make it culturally appropriate and accurate (epstein, santo, & guillemin, 2015). this process includes translation, equivalence and validation (lakens, scheel, & isager, 2018). translation entails the written rendering of the meaning of a word from the source language to the target language. the translated version must be as close as possible to the format of the original and must consider possible linguistic challenges. the quality of the translation process ensures that the same meaning is conveyed from the source language to the target language (behr, 2017). linguistic equivalence ensures that there is similarity in meaning between two sets of words spoken or written in different languages (geisinger, 2003). in other words, the translation of an item from the source language to the target language will convey the same meaning. researchers must adjust the content to avoid culturally biased items and poor phrasing whilst ensuring that the content in the translated version is comparable to the original.
in this way, they are attempting to maintain linguistic equivalence (arnold & smith, 2013).
rawoot and florence (2017) stated that the process of adaptation usually requires well-defined and executed steps. similarly, lakens (2017) recommended that adaptation requires careful planning and must follow a rigorous and comprehensive empirical process. to this end, hambleton (2011) recommended the guidelines proposed by the international test commission (itc) (itc, 2016). the itc comprises various psychological associations, publishers, test commissions and other organisations committed to promoting effective testing and assessment policies in the construction and evaluation of instruments through the guidelines that they develop (itc, 2016). a detailed account of the guidelines is provided under the heading conceptual framework.
hambleton (2011) criticised test adaptation practices and underscored that researchers do not consistently and systematically report on the methodological rigour and coherence of the translation process and whether or how linguistic equivalence is achieved. researchers focus on data reduction techniques to demonstrate construct validity and cronbach’s alpha to demonstrate internal consistency of the translated measures (peters, 2014; tavakol & dennick, 2011). cross-cultural test constructors remain concerned about the lack of systematic methodology and quality assurance of test adaptation through translation (arafat, chowdhury, qusar, & hafez, 2016). the guidelines proposed by the itc are often cited or referred to without an account of how they were applied (hernández et al., 2020). similarly, there is no existing means to evaluate and report on translation processes against the itc guidelines. to address this identified gap in the literature, the authors developed and piloted a checklist based on the itc guidelines against which translation processes can be assessed.
this manuscript reports on the development of the quality of translation and linguistic equivalence checklist (qtlc).
conceptual framework
the second edition of the itc test translation and adaptation guidelines was adopted as the conceptual framework for the study (itc, 2016). the guidelines were developed as the adaptation processes in research did not always follow a rigorous process (itc, 2016). the itc guidelines consist of general guidelines that constitute a framework for good practice in test adaptation. the itc guidelines are structured into four divisions for ease of use, namely (1) precondition, (2) test development and confirmation, (3) administration and (4) documentation. each will be discussed in turn.
1. ‘precondition’ emphasises that decisions must be made before the adaptation process begins. an important consideration at this stage is whether the construct has an equivalent in the target language, that is, linguistic equivalence. the guidelines recommend that permission needs to be obtained from the test developer. the amount of overlap in the definition and content of the construct being measured must be estimated. in addition, the item content in the populations of interest must be assessed to determine if it is sufficient for the intended use(s) of the instrument.
2. ‘test development and adaptation’ focuses on the actual process of adaptation. adaptation is a more expansive term referring to the conversion of content in one language and culture into another language. adaptation refers to activities that include deciding whether or not a test or instrument in a second language and culture could measure the same construct as it was intended to measure in the first language. adaptation includes translation that entails converting the content of a test from one language to another to preserve the linguistic meaning and establishing whether the resultant translation is equivalent to the version in the source language (itc, 2016).
the guidelines recommend the use of appropriate translation designs and procedures to maximise the suitability of the test adaptation in the intended populations. translators must be selected carefully and should have demonstrable expertise in translation over and above fluency in the source and target languages (itc, 2016). expert knowledge of the subject matter is also recommended (graneheim & lundman, 2004). when conducting the translation, translators must complete their work without prior knowledge of the instrument and of each other (odero, 2017). two independent sets of translators must complete the forward and backward translations. a minimum of two translators is recommended per translation (itc, 2016). a design for evaluating the work of test translators must be selected. the guidelines recommend the use of forward and backward translations as an acceptable method for evaluating the quality of translations (chidlow, plakoyiannaki, & welch, 2014). it is also recommended that a team implements a formal evaluation process that is ideally audited externally (odero, 2017). the evaluation process must consider any necessary accommodations, including modification of the test format and revisions to the source format if these enhance the meaning (odero, 2017). an important focus in this section is to ensure that the translation and adaptation processes consider linguistic and cultural differences in the intended populations. this section further focuses on establishing linguistic equivalence between the test in the source and target languages and cultures. the guidelines recommend that evidence is provided to confirm that test instructions and item content have similar meaning for all intended populations. the item formats, rating scales, scoring categories, test conventions, modes of administration and other procedures must also be suitable for all intended populations (itc, 2016). various forms of evidence have been suggested.
geisinger (2003) asserted that linguistic equivalence can be achieved in two possible ways. firstly, equivalence can be established through high-quality back translation. back translation entails translating the target language back to the source language independently to ensure that the target language carries the same meaning as the source language (chen & boore, 2010). cash and snider (2014) recommended that both translators should be bilingual speakers and knowledgeable of the topic under study to ensure that equivalence is maintained. secondly, manifest and latent content analysis can be used to establish linguistic equivalence (chen & boore, 2010). graneheim and lundman (2004) described manifest content analysis as focusing on the content aspect and components within a text, whereas latent content analysis is concerned with the underlying meanings of interpretations. manifest content analysis is more objective in nature, whereas latent content analysis is more subjective. omar (2012) highlighted the importance of understanding the use of these concepts in context as it influences the grammatical, semantic, social and cultural meanings. small pilot studies are recommended. pilot studies provide data generated by the adapted instrument that can be subjected to techniques such as item analysis, reliability assessment and small-scale validity studies (hernández et al., 2020). the results of these studies can inform any necessary revisions to the adapted test. the guidelines recommend that the initial evidence is followed up with full-scale (larger) validity studies (itc, 2016). the analyses in such studies provide relevant statistical evidence about construct equivalence, method equivalence and item equivalence for all intended populations. this process also provides evidence supporting the norms, reliability and validity of the adapted version of the test in the intended populations.
this part of the guidelines is referred to as ‘confirmation’ and includes the gathering of empirical evidence to address the equivalence, reliability and validity of a test or instrument in multiple languages and cultures (hambleton, 2011).
3. administration relates to ‘administration’ and ‘score scales and interpretation’. this set of guidelines aims to provide direction as to how the assessment should be administered in the different languages. it underscores the importance of providing clear instructions so that the test or instrument is used as it is intended (itc, 2016). variation in instrumentation (how the test is used) can invalidate the resulting profile. thus, clear and explicit written instructions should be provided for each version.
4. documentation has been a particularly neglected topic in test adaptation (hambleton, 2011). the guidelines demand more when it comes to documentation of the test adaptation process (itc, 2016). the focus in this section is to prepare supplementary materials and instructions to minimise any culture- and language-related problems that are caused by administration procedures and response modes that can affect the validity of the inferences drawn from the scores. it also includes the specification of any testing conditions that should be followed closely in all populations of interest. the provision of technical documentation to note any changes, including an account of the evidence obtained to support equivalence, when a test is adapted for use in another population is emphasised (hambleton, 2011). this is important to enable good practice amongst test users using the adapted test with people in the context of the new population. documentation also extends to the publication of information in the form of manuals. dissemination through journal publications should specifically report on the methodological rigour and coherence of the adaptation process (hambleton, 2011).
similarly, funding instruments should request information about the planned documentation or dissemination protocol in order to enhance knowledge translation for test users and science communication to test constructors and test adaptors (hambleton, 2011).
the conceptual framework informed the overall aim of the study, which was to develop a checklist for assessing the quality of translation and equivalence processes. the itc guidelines formed the theoretical underpinning of the proposed checklist and also informed subsequent methodological decisions.
aim of the study
the aim of the study was to design and develop the qtlc that can evaluate the quality of processes used in test translation and in the establishment of linguistic equivalence.
methods
design
this construction study consisted of two phases. phase 1 entailed the construction of the qtlc. phase 2 entailed piloting of the checklist.
phase 1
the construction followed a five-step process. the first step entailed selecting a theoretical structure for the checklist based on the itc guidelines for test adaptation (itc, 2016). the second step entailed deciding on the format of the checklist and the quantification for scoring purposes. the third step entailed generating a pool of items and finalising the draft checklist. the fourth step entailed reviewing and refining the draft scale. the fifth step entailed developing the accompanying templates and instruction guide. the steps are elaborated as follows.
step 1: theoretical structure: as mentioned before, the conceptual framework formed the theoretical basis of the proposed measure. the itc guidelines for test adaptation formed the primary theoretical tenets that underpin the proposed measure (itc, 2016). thus, the itc guidelines for adaptation through translation were defined for measurement, that is, operationalised. the resultant measure is called the qtlc (appendix 1).
step 2: format of the instrument: the checklist format was deemed appropriate as it would allow using the itc guidelines as the basis for items. each item corresponds to criteria recommended by the itc for good practice in translation and establishing linguistic equivalence. the checklist was divided into two sections to address the processes for translation and linguistic equivalence respectively. section 1 deals with translation and contains two subsections. subsection 1 deals with the formal qualifications and cumulative experience of the translators. subsection 2 relates to the process of translation. section 2 deals with linguistic equivalence and has three subsections. subsection 1 addresses the comparison between the original (source document) and draft in the target language. subsection 2 assesses the comparison between the translated version and back translations. subsection 3 evaluates the comparison between the original version (source document) and back-translated drafts. the three subsections reflect the assumption that good practice would include forward and back translation, as well as comparisons between the different versions produced. a sliding scale was adopted for quantification and scoring purposes. each item is scored, where higher scores indicate a higher-quality response. it was decided that each subsection would generate a score that is the sum of the scores on items in that subsection. each section produces a section score, which is the sum of the subsection scores. this structure was based on the recommendations of mahmood and jacobo (2019), in which the scoring is best understood as a cumulative process, producing scores that can be interpreted independently for subsections and cumulatively for composite scores. an interpretation matrix was designed to assist with interpretation of the scores (appendix 2).
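The cumulative scoring described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' scoring tool: the function names are hypothetical, the subsection labels are shortened paraphrases, and the item scores are invented values chosen only to show how item scores roll up into subsection and section scores.

```python
def subsection_score(item_scores):
    """Return a subsection score: the sum of its item scores."""
    return sum(item_scores)


def section_scores(subsections):
    """Return (per-subsection scores, composite section score).

    `subsections` maps a subsection label to its list of item scores.
    """
    per_subsection = {
        label: subsection_score(items) for label, items in subsections.items()
    }
    return per_subsection, sum(per_subsection.values())


# Hypothetical item scores for the three subsections of section 2
# (labels and values are illustrative assumptions, not data from the study).
section_2 = {
    "source vs target draft": [2, 1, 2, 2, 1, 2, 1],
    "translation vs back translations": [2, 2, 1, 1, 2, 1, 1],
    "source vs back translations": [1, 2, 2, 2, 2, 1, 2],
}

per_subsection, composite = section_scores(section_2)
print(per_subsection)  # subsection scores, interpretable on their own
print(composite)       # composite section score
```

Keeping the subsection scores alongside the composite mirrors the idea that the scores can be read independently per subsection and cumulatively per section.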
each section was assigned a quality description that guides the interpretation of the composite (section) scores. quality descriptions for section 1 describe the quality of the translation. descriptions for section 2 describe the quality of the process for establishing equivalence. three quality descriptions were distilled, namely (1) poor, (2) good and (3) excellent. each quality description or category had corresponding actions that can assist researchers or test constructors to apply corrective actions. scores were expressed as a percentage to guide the quality descriptions. poor compliance was considered to be reflective of scores below the 50% threshold. good compliance was considered to be reflective of scores ranging between 50% and 79%. excellent compliance was considered to be reflective of scores equal to or exceeding 80%. as mentioned earlier, section 2 consisted of three subsections that evaluate independent aspects or processes. thus, it was decided to apply the quality descriptors to the subsections in section 2 as well. this enables the instrument to be used in a formative manner when assessing the process followed to establish linguistic equivalence.
step 3: item generation: for section 1, items were formulated that assessed the formal qualification and cumulative experience of the translators, the number of translators involved, the process of comparing different versions of the translations, whether back translation was conducted and how an integrated final version was produced. items were generated for each of the criteria stipulated in the itc guidelines. items were not generated for the precondition. obtaining permission to use an instrument for adaptation was considered an ethics principle. thus, this particular precondition could be assessed under ethics.
the guideline pertaining to the existence of an equivalent in the target language was considered an important aspect in the rationale for pursuing an adaptation study including translation and linguistic equivalence. as such, this can be assessed under the rationale for an adaptation study. the inclusion of items from guidelines about preconditions was considered necessary but not sufficient to evaluate the quality of translation and equivalence processes. thus, these can be addressed relatively easily as mentioned earlier, and their inclusion in a checklist was thought not to add much value to the assessment of quality. for section 2, the items were generated to assess whether the meaning of items was captured accurately. items across all three subsections aimed to evaluate the manifest and latent content of translated items in terms of clarity and lack of ambiguity. the draft checklist consisted of 37 items. section 1 included 16 items. section 2 included 21 items with seven items per subsection.
step 4: reviewing and refining the draft scale: the draft checklist was reviewed by two independent reviewers who were registered research psychologists (n = 2) with the health professions council of south africa (hpcsa). the reviewers had expertise in research methodology, psychometric test construction and psychological assessment, as evidenced by their qualifications, work experience and publications in the areas mentioned. the reviewers identified that the flow of items in section 2 was confusing, and items seemed to be repetitive. the items (n = 7) were revised to create a better progression and to remove the appearance of repetition. the reviewers also indicated that the composite scores for sections and subsections were higher than the maximum score indicated. this was revised accordingly, but it did not impact the number of items or the structure of the draft checklist.
each item is scored using a sliding scale, where higher scores indicate a higher quality response. each section and subsection generates a score that is summed across the items in that section or subsection. the scoring grid was finalised. section 1 produces a maximum composite section score of 32 that comprises the subsection 1 score (a maximum score of 18) and subsection 2 score (a maximum score of 14). section 2 produces a maximum composite section score of 39 that comprises the subsection 1 score (a maximum score of 13), subsection 2 score (a maximum score of 13) and subsection 3 score (a maximum score of 13). interpretation: each section has a quality description that guides the interpretation of the subsections and composite scores. three quality assurance descriptions were defined, namely (1) poor, (2) good and (3) excellent. quality descriptions for section 1 describe the quality of the translation. descriptions for section 2 describe the quality of the process for establishing linguistic equivalence. each quality description or category has corresponding corrective actions that can be undertaken by the researcher. step 5: developing accompanying templates and instruction guide: two accompanying documents were compiled and included in the qtlc. the first document is the qtlc template that the researcher(s) responsible for the translation completes (appendix 3). this template corresponds to the items and sections of the checklist. the researcher(s) captures the details of the translation and equivalence processes on this template. the template is used by reviewers as the source document for their evaluations. the motivation for this template was to create a higher level of consistency and uniformity in presenting the content on the adaptation process. it also reduces bias where researchers familiar with the checklist may be advantaged and can tailor the presentation of their information. the second document is the reviewer response form (appendix 4). 
the response form includes the items and scoring options. reviewers are the intended users of the response form. this response form facilitates ease of use, as provision is made for the reviewer to record his or her scores. an interpretation matrix was designed that included a guide to interpretation in tabular form. the tables contain the categorisation of composite scores and the corresponding quality description and corrective actions. for translation, scores below 50% (less than 16 out of a possible 32) indicate a low level of compliance with itc guidelines and are given a ‘poor’ rating. researchers are advised to redo the translation as per the recommended guidelines in such cases. scores between 50% and 79% (between 16 and 24 out of a possible 32) were given a rating of ‘good’. such a quality description indicates that there was basic compliance with the guidelines. researchers are advised to identify and revise items where concerns were raised. scores above 79% (25 or more out of a possible 32) were given a quality description of ‘excellent’. this indicates that there was a high level of compliance with the itc guidelines. for linguistic equivalence, scores below 50% (less than 19 out of a possible 39) indicate a low level of compliance with the guidelines. researchers are advised to redo the equivalence process in compliance with the recommended guidelines. scores between 50% and 79% (between 19 and 30 out of a possible 39) indicate a basic level of compliance. researchers must identify and revise items or subsections where concerns have been raised. scores equal to or above 80% (between 31 and 39) suggest a high level of compliance and were given a quality description of ‘excellent’. based on this evaluation, researchers can reasonably conclude that linguistic equivalence was achieved.
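The interpretation matrix for the composite scores can be expressed as a small lookup. The Python sketch below is an illustration under stated assumptions, not the authors' instrument: the names `BANDS`, `quality_description` and `percentage` are hypothetical, and the sketch encodes the raw-score bands exactly as quoted above. Those bands do not line up perfectly with the stated percentages (for example, 25 out of 32 is about 78.1% and 19 out of 39 is about 48.7%), so a rule based strictly on percentages would classify a few boundary scores differently.

```python
# Raw-score bands as quoted in the text:
# (lowest "good" score, lowest "excellent" score, maximum score).
BANDS = {
    "translation": (16, 25, 32),
    "linguistic equivalence": (19, 31, 39),
}


def quality_description(section, score):
    """Map a composite section score to 'poor', 'good' or 'excellent'."""
    good_from, excellent_from, maximum = BANDS[section]
    if not 0 <= score <= maximum:
        raise ValueError(f"score must lie between 0 and {maximum}")
    if score < good_from:
        return "poor"
    if score < excellent_from:
        return "good"
    return "excellent"


def percentage(section, score):
    """Express a composite score as a percentage of the section maximum."""
    return 100 * score / BANDS[section][2]


for section, score in [("translation", 25), ("linguistic equivalence", 19)]:
    label = quality_description(section, score)
    print(f"{section}: {score} -> {label} ({percentage(section, score):.1f}%)")
```

Encoding the bands as raw scores keeps the sketch faithful to the printed cut-offs; anyone applying the checklist would need to confirm with the interpretation matrix in appendix 2 which reading is intended for the boundary scores.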
given the nature of linguistic equivalence, the quality description was also applied to the subsections to assist researchers to identify areas where they can improve or enhance the process.
phase 2
piloting entailed an application of the instrument to the translation process of the emotional-social screening tool for school readiness (e3sr) from english into afrikaans. the e3sr is a locally developed screening instrument that assesses preschoolers’ emotional and social competencies before entry into mainstream education. it has six factors: emotional maturity, emotional management, sense of self, readiness to learn, social skills and communication. research on the e3sr established construct validity and reliability (munnik et al., 2021). more recently, research on the e3sr focused on its translation into afrikaans. an additional consideration was that the construct ‘emotional-social competence’ had an equivalent in the target language, afrikaans (bornman & potgieter, 2017). afrikaans has been well established as an academic language and has been used widely in education in south africa. therefore, equivalent constructs were readily available for most psychological constructs, including emotional-social competence. there was a clear indication from experts in development and education that the definition and content of the construct ‘emotional-social competence’ were well defined in the source language with established equivalents in the target language (munnik & smith, 2019). therefore, the afrikaans lexicon or vocabulary sufficiently covered the denotations and connotations of the content (constructs, domains and attributes) that required translation. as a result, translated items could be developed that were appropriate for use with the intended population (preschoolers) and afrikaans-speaking respondents who would complete the screening tool. these considerations were aligned well with the preconditions outlined in the itc guidelines.
the translation and adaptation of the e3sr from english to afrikaans was deemed appropriate for piloting the qtlc.
translation of the emotional-social screening tool for school readiness
the translation of the e3sr followed the operational steps proposed by sousa and rojjanasrirat (2011) as follows.
step 1: translation of the original emotional-social screening tool for school readiness into afrikaans
the e3sr was translated from english to afrikaans by two independent translators. the translators were fluent in english and afrikaans. translator 1 was a clinical psychologist with 45 years of focused experience in translation and editing. translator 2 was a research psychologist with expertise in test construction and psychometrics and possessed 40 years of experience in translation. this step generated two translations, independently labelled as tl-1 and tl-2.
step 2: comparison of the two translated versions (tl-1 and tl-2)
the two forward-translated versions of the instrument (tl-1 and tl-2) were compared for ambiguities and discrepancies by the two authors, both qualified psychologists with experience in test development and the content domain. differences were discussed and resolved by the research team. this step resulted in a final draft (tl-3). an external auditing process was conducted by the third author to distil a final translation.
step 3: back translation of the initial translated version
the translated version (tl-3) was translated back into english by three independent translators who produced three back-translated versions (b-tl1, b-tl2 and b-tl3). the second set of translators had no prior knowledge of the original draft and performed their translations blind. translator 1 was a clinical psychologist with expertise in clinical practice, language studies and 4 years of translation experience.
translator 2 was a research psychologist with expertise in research methodology, capacity building and qualifications in both editing and language studies, with 3 years of translation experience. translator 3 was a linguist with expertise in language, education, communication studies and translation and had 30 years of translation experience.
step 4: comparison of the back translated versions (b-tl1, b-tl2 and b-tl3)
the back translations were compared with the original e3sr for format, wording, grammatical structure and meaning by the two authors. ambiguities and discrepancies in cultural meaning, colloquialisms and idioms, at the level of words and sentences, between the back translations and the original e3sr were discussed and resolved between the researchers and the translators. an external auditing process was conducted by the third author to distil a final translation.
step 5: assessing the quality of the translation process and establishing linguistic equivalence
the qtlc was piloted during this step. two independent reviewers assessed the quality of the process using the qtlc. reviewer 1 (r1) was a research psychologist who had expertise in the field of statistical techniques and psychometric test construction. reviewer 2 (r2) was a research psychologist with expertise in capacity development and transferable skills training in research methodology. the reviewers submitted their reviews of the e3sr, and their scores were entered into a composite sheet for ease of comparison and the calculation of inter-rater reliability. the reviewers were also asked to provide qualitative feedback on the qtlc. the comments of the reviewers were tabulated and are presented as appendix 5. the table includes general comments on the qtlc and comments on specific items.
procedure and data analysis
the details of the translation and equivalence processes in the translation of the e3sr were recorded on the qtlc template.
this populated template was given to the reviewers as the source document for the evaluation. the reviewers used the reviewer response form to record their scoring of the items and to tally subsection and section scores.
inter-rater reliability
the kappa statistic was used to calculate inter-rater reliability. the kappa statistic uses cross-tabulations to assess inter-rater reliability (field, 2013). a threshold kappa statistic of 0.61 was established, which is described as substantial agreement by glen (2014). the inter-rater reliability provided evidence on the agreement between raters when using the qtlc to assess the quality of the processes followed in the translation of the e3sr.
ethical considerations
ethical clearance was obtained from the human and social science research ethics committee (reference number: hs21/9/2) at the university of the western cape. permission was given by dr munnik to use the translation of the e3sr for piloting of the qtlc. all personal data of translators and reviewers were de-identified and stored in line with the specified guidelines of the protection of personal information act no. 4 of 2013 (popia). translators and raters signed a binding agreement to maintain independence of their contributions. the agreement included an undertaking to uphold any copyright and intellectual property stipulations by the authors of the e3sr and the qtlc.
results
phase 1: construction
the draft qtlc was found to be clear and coherent by the reviewers. specific comments on the scoring of individual items were raised. these recommendations were applied, resulting in a simplified and more unified scoring grid. in particular, items asking about the experience of translators were revised to list translators separately and evaluate them separately. a weighted score was introduced for these items, which is described in the scoring section.
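the kappa statistic described under inter-rater reliability can be sketched as follows. this is a minimal illustration that assumes the statistic in question is cohen's kappa for two raters (the title of the glen, 2014 source); the function name `cohens_kappa` and the item ratings are hypothetical and are not the pilot data.

```python
from collections import Counter

# Illustrative sketch only: assumes Cohen's kappa for two raters.
# The ratings below are hypothetical and are NOT the scores from the pilot study.


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters corrected for chance."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must score the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: proportion of items given the same score by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over categories.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single, identical category throughout
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    rater_1 = [3, 3, 2, 3, 1, 2, 3, 3, 2, 1]  # hypothetical item scores
    rater_2 = [3, 3, 2, 2, 1, 2, 3, 2, 2, 1]  # hypothetical item scores
    kappa = cohens_kappa(rater_1, rater_2)
    print(f"kappa = {kappa:.2f}; meets 0.61 threshold: {kappa >= 0.61}")
```

in practice a statistical package would normally be used so that a significance test is reported alongside the coefficient; the sketch only shows how observed and chance agreement combine.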
the reviewers reported that the addition of the qtlc template was crucial and that the researchers had to take responsibility to complete this in a detailed manner, as it formed the source document for reviewing the adaptation processes. the alignment of the structure of the qtlc template and the reviewer report form to that of the qtlc was found to be very helpful. phase 2: piloting the results are presented per section of the qtlc for ease of presentation. section a: rating the translation processes the raters scored subsection 1 identically. the raters awarded a subsection score of eight out of nine, indicating that the translators involved had a high level of experience relevant to translation in the source and target languages. both raters awarded a score of 16 for subsection 2, which was the maximum score possible. the section score was the sum of the two subsection scores. a section score of 24 was attained on both ratings. the corresponding quality description indicated that a high level of compliance with the itc guidelines was achieved in this translation process. the recommended action was to proceed to establish linguistic equivalence. section b: rating linguistic equivalence the scoring in subsection 1 was identical for both raters. a subsection score of 15 was attained. this score rates the equivalence achieved between the original and the resultant afrikaans version (tl-3) as excellent, as evidenced by a score exceeding 11. the scores of the raters for subsection 2 differed by two points. rater 1 assigned 14 points, whereas rater 2 awarded 12 points. the difference was on the items that dealt with the resolution of differences. the text on the qtlc template indicated that the reviewers discussed the differences and reached a decision. rater 1 interpreted this to mean ‘consensus’ and scored three points. rater 2 interpreted this as resolution by ‘discussion’ and scored two points. 
the scores indicated that excellent adherence to the itc guidelines for equivalence was achieved between the afrikaans e3sr and back translations, as evidenced by a score exceeding 11. there was a difference of two points between the ratings awarded for subsection 3. as before, the difference occurred on two items dealing with how differences were resolved. the qtlc does not make the distinction between discussion and consensus clear, resulting in the different interpretations of the reviewers on these items. the scores, 14 and 12, respectively, indicated that excellent equivalence was achieved between the english e3sr and back translations. the section score was the sum of subsections 1, 2 and 3. the section score awarded by rater 1 (43) was four points higher than that awarded by rater 2 (39). both scores indicate that a high level of compliance with the itc guidelines for establishing equivalence between the english and the afrikaans versions of the e3sr was achieved. linguistic equivalence between the english and afrikaans draft has therefore been endorsed. the kappa statistic (0.78) was statistically significant (reported significance level of 0.00). there was a substantial agreement between the raters on the quality of the translation process and equivalence. high inter-rater reliability was achieved, despite the response options being interpreted differently by the reviewers. the scoring of the identified items and response options was addressed in a subsequent revision.
discussion
the itc guidelines for adaptation through translation and linguistic equivalence are established and widely accepted. however, the lack of a formal checklist hampered the systematic application in adaptation studies. similarly, systematic reporting was lacking. the construction of the qtlc addressed an important gap in the body of literature. the checklist format was easy to administer.
the construction of the qtlc followed a systematic process and demonstrated a high level of alignment with the itc guidelines that deal specifically with test adaptation. as mentioned before, the qtlc excluded guidelines related to the preconditions, as these are thought to be covered in general research processes and reporting conventions. the checklist is formative because it identifies areas where there may be concerns about the level of compliance with the itc guidelines. the interpretation of scores includes a useful recommendation for corrective action that can enhance the processes. the resultant checklist constituted an operationalisation of the itc guidelines for good practice in translation and establishing linguistic equivalence. the response options on two items in subsection 2 and two items in subsection 3 were interpreted differently by the reviewers. the lack of a clear distinction between the terms ‘discussion’ and ‘consensus’ as means of resolving differences resulted in raters scoring differently. this limitation was offset by follow-up discussions with the raters to understand the reasoning behind the difference in their scoring. the respective section scores still attained the same quality description. thus, the difference in scoring impacted the section score quantitatively but not the corresponding quality description. the revision of the affected items is a priority in further refinement. the qtlc was successfully used to evaluate the translation of the e3sr into afrikaans. the findings suggest a high level of compliance with the itc guidelines in the processes followed during translation and linguistic equivalence between the resultant afrikaans translation and the original english version of the e3sr. the excellent rating obtained provides a basis for concluding that the resultant translation was linguistically equivalent to the original english version of the e3sr.
the following limitations were observed: the qtlc was only piloted in one translation study. the findings, although encouraging, need to be replicated in more studies. the theoretical underpinnings of the instrument are closely aligned with the itc guidelines. thus, the interpretation of the qtlc must be performed in relation to the itc guidelines and it may not reflect other criteria contained in guidelines that were developed separately. the item assessing the experience of the translators was scored based on the cumulative experience of the translators. during the review period, this item was flagged as problematic, as it might not accurately reflect differences in experiences between examiners. in the template, the researcher is required to record the exact experience in years. the scoring grid was retained as per recommendations of reviewers. scoring for this item was amended to score translators separately. in addition, provision was made for assessing the translators for forward and backward translation separately resulting in two items. provision was made for additional translators to be included. increasing the number of translators above two could exceed the threshold criterion, and therefore the cumulative maximum score could increase correspondingly. this challenge was addressed by introducing a weighted score. the weighted score also made it possible for all translators to be included and evaluated separately. this avoided aggregated scores masking the differences between translators. the maximum score for each of these items was based on a threshold of two translators who both would have the highest level of experience. the maximum score for forward translation and back translation would be 6 (2 translators × 3 points), respectively. the maximum scores for these items that can be added to the subsection and section scores was set at 6. 
this was based on the threshold expectation of two translators with the maximum score awarded for experience (2 × 3 = 6). in other words, above-threshold practices would not result in an inflation of the maximum score and the overall section score. the subsection score increased from 9 to 18 and the section score from 23 to 32. quality descriptions are still based on the stated percentages, but the value of scale scores would increase for section one. the formula to calculate the weighted score is included on the qtlc and the reviewer form.
implications for future research, practice and theory
the qtlc attempts to distil the guidelines proposed by the itc for the processes for translation of instruments and establishing linguistic equivalence. this checklist creates a means for empirically evaluating the translation process from a theory-driven perspective that produces quantifiable outcomes. the checklist contributes to making the methodology underpinning translation explicit, which improves upon the tacit and implicit assumptions offered in the reporting of adaptation studies. the adoption of the qtlc through reuse provides an avenue for making the translation process part of the methodology of adaptation studies and for centralising translation and equivalence as a core aspect of the adaptation process.
conclusion
the qtlc is a robust checklist that is conceptually grounded in the globally accepted itc guidelines. this checklist provides a quantifiable methodology for assessing the quality of the processes followed in translation and the establishment of linguistic equivalence. the qtlc provides a method for making implicit processes explicit that in turn enhances the quality of reporting on adaptation through translation and equivalence.
acknowledgements
the reviewers involved in the piloting of the checklist are hereby acknowledged for their constructive feedback and contribution to the study.
competing interests
the authors confirm that there are no financial or personal relationships that may have improperly influenced them in writing this article.
authors’ contributions
m.r.s. developed the qtlc and conceptualised the article. m.r.s. contributed to the writing of the article. n.a. piloted the instrument as part of her research towards a postgraduate qualification. this author contributed to the writing of the article. e.m. contributed to the review of the checklist and contributed to the piloting and revision of the checklist. this author contributed to the writing of the article and acted as the corresponding author.
funding information
the national research foundation (nrf) provided financial assistance through the thuthuka instrument to the first author. opinions expressed and conclusions arrived at are those of the authors and are not necessarily to be attributed to the nrf.
data availability
the data that support the findings of this study can be made available by the corresponding author, e.m., upon reasonable request.
disclaimer
the views and opinions expressed in this article are those of the authors and do not necessarily reflect the official policy or position of any affiliated agency of the authors.
references
arafat, s.y., chowdhury, h.r., qusar, m.m.a.s., & hafez, m.a. (2016). cross cultural adaptation and psychometric validation of research instruments: a methodological review. journal of behavioural health, 5(3), 129–136. https://doi.org/10.5455/jbh.20160615121755
arnold, b.r., & smith, j.l. (2013). methodologies for test translation and cultural equivalence. in f. paniagua & a. yamada (eds.), handbook of multicultural mental health (pp. 243–262). san diego: elsevier, academic press.
behr, d. (2017). assessing the use of back translation: the shortcomings of back translation as a quality testing method. international journal of social research methodology, 20(6), 573–584.
https://doi.org/10.1080/13645579.2016.1252188
bornman, e., & potgieter, p.h. (2017). language choices and identity in higher education: afrikaans-speaking students at unisa. studies in higher education, 42(8), 1474–1487. https://doi.org/10.1080/03075079.2015.1104660
cash, p., & snider, c. (2014). investigating design: a comparison of manifest and latent approaches. design studies, 35(5), 441–472. https://doi.org/10.1016/j.destud.2014.02.005
chen, h.y., & boore, j.r. (2010). translation and back-translation in qualitative nursing research: methodological review. journal of clinical nursing, 19(1–2), 234–239. https://doi.org/10.1111/j.1365-2702.2009.02896.x
chidlow, a., plakoyiannaki, e., & welch, c. (2014). translation in cross-language international business research: beyond equivalence. journal of international business studies, 45(5), 562–582. https://doi.org/10.1057/jibs.2013.67
epstein, j., santo, r.m., & guillemin, f. (2015). a review of guidelines for cross-cultural adaptation of questionnaires could not bring consensus. journal of clinical epidemiology, 68(4), 435–441. https://doi.org/10.1016/j.jclinepi.2014.11.021
field, a. (2013). discovering statistics using ibm spss statistics. los angeles, london, new delhi: sage.
geisinger, k.f. (2003). testing and assessment in cross-cultural psychology. in j.r. graham & j.a. naglieri (eds.), handbook of psychology (2nd ed., pp. 95–117). washington: john wiley & sons.
glen, s. (2014). cohen’s kappa statistic. retrieved from statisticshowto.com: elementary statistics for the rest of us! https://www.statisticshowto.com/cohens-kappa-statistic/
graneheim, u.h., & lundman, b. (2004). qualitative content analysis in nursing research: concepts, procedures and measures to achieve trustworthiness. nurse education today, 24(2), 105–112. https://doi.org/10.1016/j.nedt.2003.10.001
hambleton, r.k. (2011). the next generation of the itc test translation and adaptation guidelines.
european journal of psychological assessment, 17(3), 164–172. https://doi.org/10.1027//1015-5759.17.3.164
hernández, a., hidalgo, m.d., hambleton, r.k., & gomez-benito, j. (2020). international test commission guidelines for test adaptation: a criterion checklist. psicothema, 32(2), 390–398.
international test commission (itc). (2016). the international test commission guidelines on the security of tests, examinations, and other assessments: international test commission (itc). international journal of testing, 16(3), 181–204. https://doi.org/10.1080/15305058.2015.1111221
lakens, d. (2017). equivalence tests: a practical primer for t tests, correlations, and meta-analyses. social psychological and personality science, 8(4), 355–362. https://doi.org/10.1177/1948550617697177
lakens, d., scheel, a.m., & isager, p.m. (2018). equivalence testing for psychological research: a tutorial. advances in methods and practices in psychological science, 1(2), 259–269. https://doi.org/10.1177/2515245918770963
mahmood, d., & jacobo, h. (2019). grading for growth: using sliding scale rubrics to motivate struggling learners. the interdisciplinary journal of problem-based learning, 13(2). https://doi.org/10.7771/1541-5015.1844
mohamed, s.a. (2013). the development of a school readiness screening instrument for grade 00 (pre-grade r) learners. doctoral dissertation. bloemfontein: university of the free state.
munnik, e., & smith, m.r. (2019). methodological rigour and coherence in the construction of instruments: the emotional social screening tool for school readiness. african journal of psychological assessment, 1, a2. https://doi.org/10.4102/ajopa.v1i0.2
munnik, e., wagener, e., & smith, m. (2021). validation of the emotional social screening tool for school readiness. african journal of psychological assessment, 3, a42. https://doi.org/10.4102/ajopa.v3i0.42
odero, e.o. (2017). problems of finding linguistic equivalence when translating & interpreting for special purposes.
international journal of academic research in business and social sciences, 7(7), 402–414. https://doi.org/10.6007/ijarbss/v7-i7/3110
omar, y.z. (2012). the challenges of denotative and connotative meaning for second-language learners. etc: a review of general semantics, 69(3), 324–351.
peters, g.j. (2014). the alpha and the omega of scale reliability and validity: why and how to abandon cronbach’s alpha and the route towards more comprehensive assessment of scale quality. european health psychologist, 16(2), 59–69.
rawoot, i., & florence, m.a. (2017). equivalence and bias in the south african substance use contextual risk instrument. psychological reports, 120(1), 158–178. https://doi.org/10.1177/0033294116685865
sousa, v.d., & rojjanasrirat, w. (2011). translation, adaptation and validation of instruments or scales for use in cross-cultural health care research: a clear user friendly guideline. journal of evaluation in clinical practice, 17(2), 268–274. https://doi.org/10.1111/j.1365-2753.2010.01434.x
tavakol, m., & dennick, r. (2011). making sense of cronbach’s alpha. international journal of medical education, 2, 53–55.
https://doi.org/10.5116/ijme.4dfb.8dfd
appendix 1: quality of translation and linguistic equivalence checklist (revised)
appendix 2: quality of translation and linguistic equivalence checklist: interpretation matrix
appendix 3: quality of translation and linguistic equivalence checklist template
appendix 4: quality of translation and linguistic equivalence checklist: reviewer response form
appendix 5: reviewer’s comments

about the author(s)
xander van lill
department of industrial psychology and people management, college of business and economics, university of johannesburg, johannesburg, south africa
product and research, jvr africa group, johannesburg, south africa
nicola taylor
department of industrial psychology and people management, college of business and economics, university of johannesburg, johannesburg, south africa
data enablement, jvr africa group, johannesburg, south africa
citation
van lill, x., & taylor, n. (2021). the manifestation of the 10 personality aspects amongst the facets of the basic traits inventory. african journal of psychological assessment, 3(0), a31. https://doi.org/10.4102/ajopa.v3i0.31
original research
the manifestation of the 10 personality aspects amongst the facets of the basic traits inventory
xander van lill, nicola taylor
received: 16 july 2020; accepted: 15 feb. 2021; published: 30 mar. 2021
copyright: © 2021. the author(s). licensee: aosis. this is an open access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
abstract
personality traits play an important role in the prediction of important work-related outcomes.
adapting the level at which personality constructs are measured can assist in predicting work-related outcomes at the corresponding level of specificity with greater accuracy. this study investigates whether eight hierarchical factors (also referred to as personality aspects) manifest amongst the facets of the basic traits inventory (bti). the study is based on an archival dataset of 1359 south african employees. orthogonal first-order, single-factor, higher-order, oblique lower-order and bifactor models were specified to investigate the hierarchical structure of eight of the 10 personality aspects. the evidence supports the notion that seven of the 10 personality aspects (as measured by the bti) could be more parsimoniously interpreted as total scores, but not necessarily hierarchical factors, amongst south african employees. it is, therefore, practically meaningful for practitioners to calculate such scores when the need arises for more detailed levels of prediction when selecting applicants or developing employees. keywords: 10 personality aspects; basic traits inventory; bifactor structure; hierarchical factor analysis, bandwidth-fidelity. introduction personality predicts several important criteria in the workplace, such as job performance, team effectiveness, leadership effectiveness and motivation (ones, dilchert, viswesvaran, & judge, 2007). a meta-analysis conducted by van aarde, meiring and wiernik (2017) in south africa, which included predictive studies based on the basic traits inventory (bti), reaffirmed the predictive validity of personality traits, especially conscientiousness, for technical, training, contextual and counterproductive performance. however, there is a continuing debate regarding the predictive validity of the five broad traits versus their constituent personality facets (see figure 1) for job performance (judge, rodell, klinger, simon, & crawford, 2013). 
some argue that the five broad traits are more robust predictors of job performance (barrick & mount, 2005; ones, viswesvaran, & dilchert, 2005), whereas others argue each broad trait’s facets enable researchers and practitioners to better exploit predictive validity at specific levels of job performance (anglim & grant, 2014; pletzer, oostrom, bentvelzen, & de vries, 2020; tett, steele, & beauregard, 2003). cronbach and gleser (1965) captured an important aspect of this debate by referring to it as the bandwidth-fidelity dilemma where ‘…there is some ideal compromise between a variety of information (bandwidth) and thoroughness of testing to obtain more certain information (fidelity)’ (p. 100). figure 1: a non-statistical representation of the hierarchical structure of the basic traits inventory (taylor & de bruin, 2017) based on deyoung et al.’s (2007) typology and judge et al.’s (2013) guidelines for the 10 personality aspects. on further analysis, facets in grey were removed due to low reliabilities and strength of inter-factor correlations reported. deyoung, quilty, & peterson (2007) found evidence to support a hierarchical level of personality between the broad five traits and their constituent facets, referred to as the 10 aspects of personality (see figure 1), based on the measures of abridged big five circumplex (hofstee, de raad, & goldberg, 1992) and neo personality inventory-revised (neo-pi-r; costa & mccrae, 1992). judge et al. (2013) provide evidence that the 10 personality aspects have a distinct advantage over broad personality traits in that they more coherently represent the unique correlations between personality facets. judge et al. (2013) further provide evidence that 10 personality aspects offer predictive gains over five broad traits when narrower aspects of job performance are measured. 
deyoung et al.’s (2007) findings on the 10 personality aspects had a considerable impact in the united states of america, boasting a total of 321 citations at the time of the search on the web of science. other countries citing deyoung et al.’s (2007) work at the time included canada (93), australia (73), germany (65), england (63), the netherlands (34), new zealand (20), peoples republic of china (16), scotland (14) and belgium (13). however, no replications of the 10 personality aspects have been investigated in less developed parts of the southern hemisphere. it is becoming increasingly important to provide evidence on the replicability of personality models in non-weird (western, educated, industrialised, rich and democratic) countries (laajaj et al., 2019). van de vliert and van lange (2019) emphasise the need for a discipline called cross-latitudinal psychology to investigate the replicability of findings from the northern hemisphere to less developed parts of the southern hemisphere. south africa is a middle-income country with 11 official languages with unique challenges in terms of its educational system (department of basic education, 2019), economic growth (south african reserve bank, 2019), distribution of wealth (statistics south africa, 2019), as well as public sector corruption in terms of accountability, transparency and state capture (transparency international, 2019). an investigation of the hierarchical structure of eight of the 10 personality aspects in south africa could provide practitioners in the region with a more parsimonious representation of facets whilst still allowing employers to make more nuanced personnel selection and development decisions. the intention of this article is not to argue against the five factors of personality in favour of the 10 personality aspects but to provide practitioners with alternative ways of interpreting the same results based on evidence (wiernik, yarkoni, giordano, & raghavan, 2020).
for example, based on figure 1, when it is important to predict either quantity or quality of tasks performed, measures of industriousness and orderliness respectively might provide unique information. figure 1 visually depicts a hierarchical structure of personality based on the three levels proposed for the bti in the present study, namely personality facets, aspects and traits. agreeableness (trait), as an example from figure 1, can be represented by two personality aspects, namely politeness and compassion. politeness, in turn, is represented by straightforwardness, compliance and modesty (facets) whereas compassion is represented by prosocial tendencies and tender-mindedness (facets). arrows leading from personality traits to aspects, such as the arrows leading from agreeableness to politeness and compassion, reflect a common factor at the trait level (agreeableness). similar to the findings of judge et al. (2013) on the neo-pi-r, some aspects, such as volatility (represented by facet affective instability in the bti) and intellect (represented by facet ideas in the bti), are represented by one indicator only and therefore, do not represent a hierarchical composite. research objective and hypotheses the current study aims to investigate the hierarchical structure of eight of deyoung et al.’s (2007) 10 personality aspects from data collected on the bti in south africa (taylor & de bruin, 2017). deyoung et al.’s (2007) 10 personality aspects could be viewed as a hierarchical level of personality between the big five traits and its constituent facets. according to deyoung et al. (2007), personality aspects might be a more parsimonious breakdown of the big five than the personality facets. furthermore, when compared to personality facets, aspects might better represent the phenotypical patterns of thought, affect and behaviour (deyoung et al., 2007; jang, livesley, angleitner, reimann, & vernon, 2002). deyoung et al. 
(2007) distinguish between two personality aspects on the trait extraversion, namely assertiveness and enthusiasm. assertiveness refers to an individual’s agency or dominance, whereas enthusiasm refers to outward friendliness. following the guidelines of judge et al. (2013), it is hypothesised that:
h1: the personality aspect assertiveness explains covariance between a set of items in extraversion independent of the covariance that facets ascendance, liveliness and excitement seeking explained in the same set of items.
h2: the personality aspect enthusiasm explains covariance between a set of items in extraversion independent of the covariance that facets excitement seeking, positive affectivity and gregariousness explained in the same set of items.
trait neuroticism, according to deyoung et al. (2007), can be divided between two personality aspects, namely volatility and withdrawal. volatility refers to the outward expression of negative affect, as with irritability and aggression, whereas withdrawal refers to the internalisation of negative affect. the bti had only one indicator for volatility, which made the computation of a hierarchical composite unfeasible. however, as per judge et al.’s (2013) guidelines on withdrawal, it could be hypothesised that:
h3: the personality aspect withdrawal explains covariance between a set of items in neuroticism independent of the covariance that facets depression, self-consciousness and anxiety explained in the same set of items.
deyoung et al. (2007) argue that trait conscientiousness can be represented by two personality aspects, namely industriousness and orderliness. whereas industriousness refers to the tendency to be reliable and hardworking, orderliness reflects a preference for perfectionism. following the guidelines of judge et al.
(2013), it is hypothesised that:

h4: the personality aspect industriousness explains covariance between a set of items in conscientiousness independent of the covariance that facets effort and self-discipline explained in the same set of items.
h5: the personality aspect orderliness explains covariance between a set of items in conscientiousness independent of the covariance that facets order, dutifulness and prudence explained in the same set of items.

trait openness to experience, according to deyoung et al. (2007), comprises two personality aspects, namely intellect and aesthetic openness. intellect refers to an inclination towards creative ingenuity, whereas aesthetic openness reflects an appreciation for beauty in the world. as in the study conducted by judge et al. (2013), intellect is represented by one indicator only, namely ideas, which makes the calculation of a hierarchical composite unfeasible. however, as per judge et al.’s (2013) guidelines on aesthetic openness, it could be hypothesised that:

h6: the personality aspect aesthetic openness explains covariance between a set of items in openness to experience independent of the covariance that facets aesthetics, actions, values and imagination explained in the same set of items.

deyoung et al. (2007) distinguish between two personality aspects of agreeableness, namely politeness and compassion. politeness refers to an individual’s tendency to be a pleasant person to be around, whereas compassion reflects a tendency for social awareness and goodwill. per the guidelines of judge et al. (2013), it is hypothesised that:

h7: the personality aspect politeness explains covariance between a set of items in agreeableness independent of the covariance that facets straightforwardness, compliance and modesty explained in the same set of items.
h8: the personality aspect compassion explains covariance between a set of items in agreeableness independent of the covariance that facets prosocial tendencies and tender-mindedness explained in the same set of items.

method

participants

the respondents were 1359 individuals of varying ages (mean = 28.33 years, standard deviation [sd] = 7.44 years) who completed the bti for selection (n = 1019, 75%) or development (n = 340, 25%) purposes at various south african organisations. most of the respondents were black african (n = 941, 69%), followed by white (n = 197, 14%), coloured (individuals of mixed ancestry; n = 92, 7%) and indian (n = 63, 5%). the sample comprised more men (n = 693, 51%) than women (n = 666, 49%). the most common first language was isizulu (n = 257, 19%), followed by english (n = 237, 17%), sepedi (n = 146, 11%), afrikaans (n = 145, 11%), setswana (n = 143, 11%), isixhosa (n = 123, 9%), sesotho (n = 109, 8%), xitsonga (n = 80, 6%), tshivenda (n = 61, 4%), siswati (n = 38, 3%) and isindebele (n = 15, 1%). most of the respondents’ highest qualification was grade 12 (n = 693, 51%), followed by a diploma (n = 340, 25%), bachelor’s degree (n = 139, 10%), less than matric (n = 73, 5%), honours degree (n = 50, 4%), master’s degree (n = 12, 1%) and doctoral degree (n = 3, 0.22%).

instruments

archival data on the bti were used to inspect the manifestation of the 10 personality aspects amongst south african employees. the bti is a measure of the five factors of personality and provides a further breakdown of 24 facets. a review of the technical manual on the bti indicated that most of the facets, apart from values (0.44) and modesty (0.56), display good internal consistency reliabilities (α ≥ 0.64). exploratory factor analysis supports the big five structure of the bti.
a calculation of congruence scores between the factor structures for south africans who self-identified as white or black african supported the measurement invariance of the assessment across ethnic groups (taylor & de bruin, 2017). the measure has 193 items and utilises a five-point likert scale.

procedure

the data were collected as part of several projects that have been conducted by the jvr africa group in different workplace settings. data were collected via paper-and-pen or online assessments. the study was low in risk, but precautions were taken to ensure that participation was voluntary and anonymous, no harm was caused, the questions were filled in truthfully and informed consent was given to use the results for research purposes.

data analysis

descriptive statistics

the internal consistency reliability of the scales in the respective measures was inspected by calculating cronbach’s alpha (cronbach, 1951) and mcdonald’s omega (mcdonald, 1999). cronbach’s alpha coefficient and mcdonald’s omega coefficient were calculated using version 0.4–14 of the semtools package in r (jorgensen, pornprasertmanit, schoemann, & rosseel, 2019) and are interpreted as estimates of internal consistency reliability (revelle & zinbarg, 2009).

confirmatory factor analysis

judge et al. (2013) conducted a higher-order confirmatory factor analysis (cfa) to inspect facet loadings on the second-order personality aspects. credé and harms (2015) recommend that five sequential models be tested before it can be argued that a hierarchical structure exists within a psychometric measure, namely (1) orthogonal first-order, (2) single-factor, (3) higher-order, (4) oblique lower-order, and (5) bifactor models. figure 2 provides an example, based on agreeableness, of how each of the mentioned factor models is specified. not all the items of agreeableness are visually displayed in figure 2 (the trait scale consists of 30 items).
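The two reliability coefficients named under descriptive statistics can be illustrated with a minimal Python sketch. This is not the semtools routine used in the study: the function names, the example responses and the example loadings are hypothetical, and the omega function assumes a single factor with uncorrelated residuals.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha from rows of item scores (one row per respondent)."""
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item and of the summed scale score.
    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def omega_total(loadings):
    """McDonald's omega for one factor, from standardised loadings.

    Assumes a unidimensional model with uncorrelated residuals, so each
    residual variance is taken as 1 - loading**2.
    """
    common = sum(loadings) ** 2
    residual = sum(1 - lam ** 2 for lam in loadings)
    return common / (common + residual)


# Hypothetical five-point Likert responses: 5 respondents x 4 items.
responses = [
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 3, 3, 2],
    [1, 2, 2, 1],
]
print(round(cronbach_alpha(responses), 2))
print(round(omega_total([0.7, 0.6, 0.8, 0.5]), 2))
```

Alpha is computed from observed item scores, whereas omega is computed from the loadings of a fitted factor model, which is why the two estimates can differ for the same scale.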
figure 2: factor structures of agreeableness on the bti based on credé and harms’s (2015) guidelines.

as portrayed in figure 2, both higher-order (3) and bifactor (5) models represent hierarchical factor models. with higher-order models, personality facets mediate the relationship between the manifest variables and the second-order personality aspects (beaujean, 2014). consequently, the second-order personality aspects do not explain unique variance in the manifest variables over and above the personality facets (beaujean, 2014; mcabee, oswald, & connelly, 2014). bifactor models, in contrast, account for the unique variance explained in the manifest variables by the orthogonal personality aspects, over and above the variance explained by the orthogonal personality facets (beaujean, 2014; mcabee et al., 2014), which justifies bifactor models as the test models in this study. as presented in figure 1, the aspects are specified to correlate in the hierarchical models because of the aspects’ common variance at the trait level. a cfa with weighted least squares mean- and variance-adjusted (wlsmv) estimation was performed to inspect the inter-factor correlations and hierarchical factor structures of the 10 personality aspects (beauducel & yorck herzberg, 2009; distefano, 2002; li, 2016). the wlsmv estimation was chosen based on the recommendation of li (2016), who indicated that wlsmv outperforms robust maximum likelihood (mlm) estimation when determining the parameter estimates and standard errors of factor loadings for items with scales consisting of five or more numerical categories. the multivariate skewness (1 374 251, p < 0.001) and kurtosis (212.41, p < 0.001) for the entire set of 180 items (excluding the social desirability scale) further justified the use of a robust estimator (distefano & morgan, 2014).
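The contrast drawn above between higher-order and bifactor models can be made concrete with a Schmid-Leiman decomposition. This technique is not used in the study and is shown only as an illustration; the loadings and names below are hypothetical. In a higher-order model, the effect of the aspect on an item is the product of the item's facet loading and the facet's loading on the aspect, so the ratio of general to group loading is forced to be constant within a facet. A bifactor model estimates both loadings freely.

```python
from math import sqrt


def schmid_leiman(item_loadings, facet_on_aspect):
    """Decompose higher-order loadings into general and group parts.

    item_loadings: {facet: [standardised item loadings on that facet]}
    facet_on_aspect: {facet: standardised loading of the facet on the aspect}
    Returns {facet: [(general_loading, group_loading), ...]}.
    """
    result = {}
    for facet, loadings in item_loadings.items():
        gamma = facet_on_aspect[facet]
        residual = sqrt(1 - gamma ** 2)  # part of the facet not due to the aspect
        result[facet] = [(lam * gamma, lam * residual) for lam in loadings]
    return result


# Hypothetical standardised loadings for two facets of one aspect.
items = {"facet_a": [0.8, 0.7, 0.6], "facet_b": [0.5, 0.9]}
aspect = {"facet_a": 0.6, "facet_b": 0.8}

for facet, pairs in schmid_leiman(items, aspect).items():
    ratios = [round(general / group, 3) for general, group in pairs]
    print(facet, ratios)  # the ratio is the same for every item in a facet
```

The constant ratio within each facet is the proportionality constraint that prevents a higher-order aspect from explaining variance in the items beyond what the facets already carry.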
the fit was considered suitable if the root mean square error of approximation (rmsea) and standardised root mean square residual (srmr) were ≤ 0.08 (brown, 2006; browne & cudeck, 1992) and the comparative fit index (cfi) and tucker-lewis index (tli) were > 0.95 (brown, 2006; hu & bentler, 1999). even if comparative fit indices display marginally good fit to the data (cfi and tli in the range of 0.90 to 0.95), models might still be considered to display acceptable fit if other indices (srmr and rmsea) in tandem are in the acceptable range (brown, 2006). because wlsmv estimation does not yield a log-likelihood value, the akaike information criterion (aic) (akaike, 1987) and schwarz’s bayesian information criterion (bic) (raftery, 1995) could not be calculated (finch & french, 2015). the chi-square statistic, including a comparison of the relative fit of different models (vandenberg & lance, 2000), was used to compare each of the alternatives to the hypothesised models (credé & harms, 2015).

ethical considerations

ethical approval to conduct the study was obtained from the research ethics committee (department of industrial psychology and people management) at the university of johannesburg on 30 june 2020 (reference no. ippm-2020-431).

results

descriptive statistics

table 1 provides the mean item score and standard deviation for each scale of the bti, along with the alpha and omega reliability estimates and standardised inter-factor correlations of the facets comprising the big five traits. the inter-factor correlations were obtained by conducting oblique lower-order confirmatory factor models.
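The fit criteria set out in the data analysis section can be read as a simple decision rule. The sketch below is one hedged reading of that rule; the function name and the three labels are illustrative assumptions, not terms taken from the cited sources. The example values are the fit indices this article reports for the oblique lower-order models of conscientiousness and extraversion.

```python
def classify_fit(cfi, tli, srmr, rmsea):
    """Classify model fit using the cut-offs described in the text.

    Returns "suitable", "acceptable" or "poor" (labels are illustrative).
    """
    # SRMR and RMSEA must both be at or below 0.08 in every case.
    if not (srmr <= 0.08 and rmsea <= 0.08):
        return "poor"
    # Strict criterion: CFI and TLI above 0.95.
    if cfi > 0.95 and tli > 0.95:
        return "suitable"
    # Marginal range: CFI and TLI between 0.90 and 0.95.
    if cfi >= 0.90 and tli >= 0.90:
        return "acceptable"
    return "poor"


# Fit indices reported in this article for two oblique lower-order models.
print(classify_fit(cfi=0.93, tli=0.93, srmr=0.05, rmsea=0.05))  # conscientiousness
print(classify_fit(cfi=0.81, tli=0.79, srmr=0.09, rmsea=0.08))  # extraversion
```

Under this reading, the conscientiousness model falls in the marginal but acceptable band, whereas the initial extraversion model does not.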
the indices of fit for extraversion (χ² [df] = 4280.58 [550]; cfi = 0.81; tli = 0.79; srmr = 0.09; rmsea = 0.08 [0.07; 0.08]) and agreeableness (χ² [df] = 4440.87 [550]; cfi = 0.84; tli = 0.83; srmr = 0.07; rmsea = 0.07 [0.07; 0.08]) were less desirable but improved when facets with low internal consistency reliabilities and inter-factor correlations were removed in later analyses, as is evident from the fit reported for the oblique lower-order models for extraversion and agreeableness in table 2. the fit statistics for an oblique lower-order confirmatory factor model for neuroticism (χ² [df] = 3550.79 [521]; cfi = 0.91; tli = 0.90; srmr = 0.06; rmsea = 0.07 [0.07; 0.07]), conscientiousness (χ² [df] = 3268.39 [769]; cfi = 0.93; tli = 0.93; srmr = 0.05; rmsea = 0.05 [0.05; 0.05]) and openness to experience (χ² [df] = 2424.85 [454]; cfi = 0.91; tli = 0.91; srmr = 0.06; rmsea = 0.06 [0.06; 0.06]) were satisfactory.

table 1a: inter-factor correlations of scales on basic traits inventory. extraversion.
table 1b: inter-factor correlations of scales on basic traits inventory. neuroticism.
table 1c: inter-factor correlations of scales on basic traits inventory. conscientiousness.
table 1d: inter-factor correlations of scales on basic traits inventory. openness to experience.
table 1e: inter-factor correlations of scales on basic traits inventory. agreeableness.
table 2: fit statistics of different factor models.

in general, evidence from table 1 suggests that the facets associated with each trait are highly correlated, except for the correlation of excitement seeking with other facets of extraversion. this finding corroborates the claim that personality facets can be empirically categorised under broad traits with the population under investigation. most of the facets yielded satisfactory internal consistency reliability coefficients (omega ≥ 0.66), apart from the scales for values (omega = 0.42) and modesty (omega = 0.44).
these lower reliability scores are in line with previous findings (taylor & de bruin, 2017). the difference in reliability between alpha and omega with modesty could have been caused by the violation of the condition for tau-equivalence, which made alpha a less conservative estimate of the true population reliability (dunn, baguley, & brunsden, 2014).

confirmatory factor analysis

to determine whether eight hierarchical composites could be derived from the facets of the bti, the fit of the different factor models proposed by credé and harms (2015) was investigated for each trait; the results are reported in table 2. each of the five models’ chi-square statistics is compared with the test model with two personality aspects. in openness to experience and agreeableness, the facets values and modesty, respectively, were dropped because of the low reliabilities reported. in extraversion, the excitement seeking facet was dropped from both assertiveness and enthusiasm because of the low inter-factor correlations reported. liveliness was dropped because of several negative factor variances reported, which could have been caused by model misspecification (brown, 2006; schumacker & lomax, 2010). the removal of the liveliness facet also improved the overall fit of the factor models specified. because of the removal of liveliness and excitement seeking, a hierarchical composite for assertiveness could not be specified, which made the assessment of h1 unfeasible. the first two items from the compliance facet of agreeableness and the sixth item of the gregariousness facet of extraversion were dropped because of negative factor variance reported, which could also have been caused by model specification error (brown, 2006; schumacker & lomax, 2010). as reflected in table 2, the bifactor structure of deyoung et al.’s (2007) two personality aspects yielded a better fit to the data than the orthogonal first-order, single-factor, higher-order and oblique lower-order structures in most of the traits.
the bifactor model with two personality aspects for neuroticism yielded a similar fit to the oblique lower-order model.

bifactor statistical indices

the superiority of bifactor models’ fit indices, relative to other confirmatory factor models, could be a symptom of overfitting (bonifay, lane, & reise, 2017). rodriguez, reise and haviland (2016) recommend that bifactor statistical indices, such as the explained common variance (ecv), coefficient omega hierarchical (ωh), construct replicability (h), factor determinacy (fd), percentage of uncontaminated correlations (puc) and absolute relative parameter bias (arpb), be calculated to determine the practical meaningfulness of group factors. group factors of each personality aspect were considered more plausible when ωh, h and fd² were > 0.50, 0.70, and 0.70, respectively (dueber, 2017; reise, bonifay, & haviland, 2013). explained common variance for the general factor (g) > 0.70 and puc > 0.80 were indicative of unidimensionality (reise et al., 2013). when puc is < 0.80, ecv of g is > 0.60 and ωh of g is > 0.70, the factor structure may still be interpreted as unidimensional (reise et al., 2013). absolute relative parameter bias of 10% to 15% was indicative of little difference in the factor loadings between a single-factor model and the general factor in a bifactor model (rodriguez et al., 2016). bifactor statistical indices were calculated from the standardised factor loadings in the bifactor models of the two personality aspects (dueber, 2020) in r (r core team, 2016). bifactor statistical indices were only calculated for the seven personality aspects for which the calculation of total scores might be meaningful. the bifactor statistical indices are reported in table 3.

table 3: bifactor statistical indices for personality aspects.
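Several of the indices described above follow directly from the standardised loadings of a bifactor model. The Python sketch below is an illustration only and is not the calculator used in the study: the function names and the example loadings are hypothetical, each item is assumed to load on the general factor and exactly one group factor, and factor determinacy and parameter bias are omitted. It computes ECV, omega hierarchical, construct replicability (H) and PUC for the general factor, and applies the unidimensionality rule quoted from Reise et al. (2013).

```python
from collections import defaultdict


def bifactor_indices(items):
    """Indices for the general factor of a bifactor model.

    items: list of (group_name, general_loading, group_loading) tuples with
    standardised loadings; each item belongs to exactly one group factor.
    """
    n = len(items)
    general = [g for _, g, _ in items]
    by_group = defaultdict(list)
    for name, _, s in items:
        by_group[name].append(s)

    # Explained common variance of the general factor.
    general_sq = sum(g ** 2 for g in general)
    group_sq = sum(s ** 2 for _, _, s in items)
    ecv = general_sq / (general_sq + group_sq)

    # Omega hierarchical: general-factor variance over total score variance.
    uniqueness = sum(1 - g ** 2 - s ** 2 for _, g, s in items)
    group_part = sum(sum(loads) ** 2 for loads in by_group.values())
    omega_h = sum(general) ** 2 / (sum(general) ** 2 + group_part + uniqueness)

    # Construct replicability (H) of the general factor.
    h = 1 / (1 + 1 / sum(g ** 2 / (1 - g ** 2) for g in general))

    # Percentage of uncontaminated correlations: share of item pairs that
    # come from different group factors.
    within = sum(len(loads) * (len(loads) - 1) / 2 for loads in by_group.values())
    puc = 1 - within / (n * (n - 1) / 2)

    return {"ecv": ecv, "omega_h": omega_h, "h": h, "puc": puc}


def looks_unidimensional(idx):
    """Apply the ECV, PUC and omega hierarchical thresholds from the text."""
    if idx["ecv"] > 0.70 and idx["puc"] > 0.80:
        return True
    return idx["puc"] < 0.80 and idx["ecv"] > 0.60 and idx["omega_h"] > 0.70


# Hypothetical aspect with two facets of three items each.
example = [("facet_a", 0.6, 0.3)] * 3 + [("facet_b", 0.6, 0.3)] * 3
indices = bifactor_indices(example)
print({name: round(value, 3) for name, value in indices.items()})
print(looks_unidimensional(indices))
```

In the hypothetical example the general factor accounts for most of the common variance, so the rule treats the aspect as essentially unidimensional even though PUC is below 0.80.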
the bifactor statistical indices in table 3 provide evidence for:

h2: a unidimensional model for enthusiasm and diminished biasing effect for the group factors of positive affectivity and gregariousness. an interpretation of enthusiasm as a total score, instead of a hierarchical factor with sub-scores for positive affectivity and gregariousness, might be a more appropriate representation of the data.
h3: a unidimensional model for withdrawal and diminished biasing effect for the group factors of depression, self-consciousness and anxiety. an interpretation of withdrawal as a total score, instead of a hierarchical factor with sub-scores for depression, self-consciousness and anxiety, might be a more appropriate representation of the data.
h4: a unidimensional model for industriousness and diminished biasing effect for the group factors of effort and self-discipline. an interpretation of industriousness as a total score, instead of a hierarchical factor with sub-scores for effort and self-discipline, might be a more appropriate representation of the data.
h5: a unidimensional model for orderliness and diminished biasing effect for the group factors of order, dutifulness and prudence. an interpretation of orderliness as a total score, instead of a hierarchical factor with sub-scores for order, dutifulness and prudence, might be a more appropriate representation of the data.
h6: a unidimensional model for aesthetic openness and diminished biasing effect for the group factors of aesthetics, actions and imagination. an interpretation of aesthetic openness as a total score, instead of a hierarchical factor with sub-scores for aesthetics, actions and imagination, might be a more appropriate representation of the data.
h7: a unidimensional model for politeness and diminished biasing effect for the group factors of straightforwardness and compliance. an interpretation of politeness as a total score, instead of a hierarchical factor with sub-scores for straightforwardness and compliance, might be a more appropriate representation of the data.
h8: a unidimensional model for compassion and diminished biasing effect for the group factors of prosocial tendencies and tender-mindedness. an interpretation of compassion as a total score, instead of a hierarchical factor with sub-scores for prosocial tendencies and tender-mindedness, might be a more appropriate representation of the data.

discussion

evidence based on the procedure proposed by credé and harms (2015) supports the hierarchical structure of seven of the 10 personality aspects in the bti. however, interpretations of sub-scores on facets, independent of the total scores for personality aspects, should be tempered by the evidence provided by bifactor statistical indices in table 3. interpretations of facet-level sub-scores might be relevant for development purposes when anomalous results exist for a candidate but should still be interpreted in light of the total aspect or trait score. for this study, we decided to drop the facet of excitement seeking from extraversion because of its low correlation with other facets of extraversion. previous research shows that excitement seeking is related to both openness to experience and extraversion (aluja, garcía, & garcía, 2003; hough, oswald, & ock, 2015; taylor & de bruin, 2017). other scholars argue that excitement seeking might be related to an alternative broader trait such as spontaneity, plasticity or impulsivity (deyoung, 2015; deyoung et al., 2007; hofstee et al., 1992). liveliness was dropped because of several negative factor variances reported, which could have been caused by model misspecification (brown, 2006; schumacker & lomax, 2010).
liveliness and excitement seeking might share communality and could be investigated in the future as an indication of spontaneity, plasticity or impulsivity (deyoung, 2015; deyoung et al., 2007; hofstee et al., 1992). the facet playfulness in the south african personality inventory (sapi) displays a strong loading on extraversion (morton, hill, meiring, & de beer, 2019) and appears invariant across ethnic groups in south africa (morton, hill, meiring, & van de vijver, 2019). an adaptation of the facet playfulness might be a useful alternative to liveliness and excitement seeking or a meaningful addition to trait extraversion on the bti. the number of facets available for assertiveness did not enable the inspection of h1. modesty was dropped from agreeableness and values was dropped from openness to experience because of the facets’ low reliability. the low reliabilities are similar to findings reported in prior studies conducted on the bti (taylor & de bruin, 2017). constructs such as interpersonal relatedness and broad-mindedness from the sapi (morton, hill, meiring, & de beer, 2019) might be more reliable additions than the facets modesty and values. a revised version of the bti might include additional facets to inspect the hierarchical structure of assertiveness, volatility and intellect. based on the above findings, we propose working definitions for the 10 personality aspects in table 4. these definitions are phrased in terms of their implications for the workplace.

table 4: potential work-related definitions for the 10 personality aspects.

this study has important implications for assessment in the workplace. firstly, the potential of personality aspects from the bti to predict aspects of performance at the corresponding level of specificity, such as task, contextual (motowidlo & van scotter, 1994), adaptive (pulakos, arad, donovan, & plamondon, 2000), counterproductive (spector et al., 2006) or leadership performance (yukl, 2012), can now be investigated.
secondly, personality aspects provide practitioners with another, perhaps more parsimonious, layer of interpretability to help their clients make more informed decisions about the selection of applicants (judge et al., 2013). some limitations should be considered when interpreting the results. hierarchical structures could not be inspected for three of the 10 personality aspects. the addition of facets, with deyoung et al.’s (2007) conceptualisation of the 10 personality aspects as the point of departure, might be a prospect for a revised version of the bti. an inspection of the bifactor statistical indices further reveals a more unidimensional rather than hierarchical structure amongst seven of the 10 personality aspects. this suggests that the personality aspects might be a more parsimonious, albeit not hierarchical, representation of the common variance amongst the facets in the traits. psychology is currently facing challenges related to the replicability of the discipline’s findings (efendic & van zyl, 2019). even though this study provides evidence for the existence of personality aspects, albeit not hierarchical aspects, amongst south african employees, the proposed factor structure must be replicated with larger samples than the one used in the current study before definitive statements are made about the manifestation of the 10 personality aspects amongst the facets of the bti. an investigation of the 10 personality aspects amongst alternative measures of the big five in south africa, such as the neo-pi-r (laher, 2013) or the south african personality inventory (fetvadjiev, meiring, van de vijver, nel, & hill, 2015), might serve as additional evidence for the manifestation of the 10 personality aspects amongst facets of personality measures. other measures might also yield evidence for the hierarchical structure of the 10 personality aspects.
in doing so, a more robust case can be built to argue the existence of deyoung et al.’s (2007) 10 personality aspects in south africa. a further limitation is that self-report data were the only point of reference. further studies on the prediction of specific facets of job performance from the personality aspects could bolster the scientific and practical usefulness of the 10 personality aspects in south africa (judge et al., 2013). finally, africa is in a unique position in that it has geographical areas stretching across the northern and southern hemispheres. it might be meaningful to determine if the findings on the 10 personality aspects are replicable in the northern hemisphere of africa as well, thereby heeding van de vliert and van lange’s (2019) call for cross-latitudinal research in psychology.

conclusion

the current study supports the notion that 10 personality aspects might provide a more parsimonious representation, but not necessarily a hierarchical representation, of the common variance amongst the facets of the bti. the findings hold promise for further research into the predictive validity of these personality aspects for specific levels of performance in the workplace. further replications are required before it can be conclusively shown that deyoung et al.’s (2007) 10 personality aspects provide a more parsimonious representation of personality facets in the south african context. whilst this model does not override the five-factor model of personality, it does allow for the prediction of more specific work-related outcomes based on parsimonious factors at a greater level of specificity than the big five.

acknowledgements

competing interests

the authors declare that they have no financial or personal relationships that may have inappropriately influenced them in writing this article.

authors’ contributions

x.v.l. and n.t. developed the conceptual framework and analysed the data.
funding information

this research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

data availability

coefficients based on the bifactor confirmatory factor analysis are available on request from the corresponding author, x.v.l.

disclaimer

the views and opinions expressed in this article are those of the authors and do not necessarily reflect the official policy or position of any affiliated agency of the authors.

references

akaike, h. (1987). factor analysis and aic. psychometrika, 52(3), 317–332. https://doi.org/10.1007/bf02294359 aluja, a., garcía, ó., & garcía, l.f. (2003). relationships among extraversion, openness to experience, and sensation seeking. personality and individual differences, 35(3), 671–680. https://doi.org/10.1016/s0191-8869(02)00244-1 anglim, j., & grant, s.l. (2014). incremental criterion prediction of personality facets over factors: obtaining unbiased estimates and confidence intervals. journal of research in personality, 53, 148–157. https://doi.org/10.1016/j.jrp.2014.10.005 barrick, m.r., & mount, m.k. (2005). yes, personality matters: moving on to more important matters. human performance, 18(4), 359–372. https://doi.org/10.1207/s15327043hup1804_3 beauducel, a., & yorck herzberg, p. (2009). on the performance of maximum likelihood versus means and variance adjusted weighted least squares estimation in cfa. structural equation modeling: a multidisciplinary journal, 13(2), 186–203. https://doi.org/10.1207/s15328007sem1302_2 beaujean, a.a. (2014). latent variable modeling using r: a step-by-step guide. new york, ny: routledge. bonifay, w., lane, s.p., & reise, s.p. (2017). three concerns with applying a bifactor model as a structure of psychopathology. clinical psychological science, 5(1), 184–186. https://doi.org/10.1177/2167702616657069 brown, t.a. (2006). confirmatory factor analysis for applied research. new york, ny: the guilford press. browne, m.w., & cudeck, r. (1992).
alternative ways of assessing model fit. sociological methods & research, 21(2), 230–258. https://doi.org/10.1177/0049124192021002005 costa, p.t., & mccrae, r.r. (1992). the neo-pi–r professional manual. odessa, fl: psychological assessment resources. credé, m., & harms, p.d. (2015). 25 years of higher-order confirmatory factor analysis in the organizational sciences: a critical review and development of reporting recommendations. journal of organizational behavior, 36(6), 845–872. https://doi.org/10.1002/job.2008 cronbach, l.j. (1951). coefficient alpha and the internal structure of tests. psychometrika, 16(3), 297–334. https://doi.org/10.1007/bf02310555 cronbach, l.j., & gleser, g.c. (1965). psychological tests and personnel decisions (2nd ed.). urbana, il: university of illinois press. department of basic education. (2019). annual performance plan | 2018/2019. retrieved from https://www.education.gov.za/portals/0/documents/reports/annual%20performance%20plan%20201819.pdf?ver=2018-03-14-121624-263 deyoung, c.g. (2015). cybernetic big five theory. journal of research in personality, 56, 33–58. https://doi.org/10.1016/j.jrp.2014.07.004 deyoung, c.g., quilty, l.c., & peterson, j.b. (2007). between facets and domains: 10 aspects of the big five. journal of personality and social psychology, 93(5), 880–896. https://doi.org/10.1037/0022-3514.93.5.880 distefano, c. (2002). the impact of categorization with confirmatory factor analysis. structural equation modeling: a multidisciplinary journal, 9(3), 327–346. https://doi.org/10.1207/s15328007sem0903_2 distefano, c., & morgan, g.b. (2014). a comparison of diagonal weighted least squares robust estimation techniques for ordinal data. structural equation modeling: a multidisciplinary journal, 21(3), 425–438. https://doi.org/10.1080/10705511.2014.915373 dueber, d.m. (2017). bifactor indices calculator: a microsoft excel-based tool to calculate various indices relevant to bifactor cfa models. 
lexington, ky: uknowledge, university of kentucky. https://doi.org/10.13023/edp.tool.01 dueber, d.m. (2020). bifactor indices calculator. retrieved from https://cran.r-project.org/src/contrib/archive/bifactorindicescalculator/ dunn, t.j., baguley, t., & brunsden, v. (2014). from alpha to omega: a practical solution to the pervasive problem of internal consistency estimation. british journal of psychology, 105(3), 399–412. https://doi.org/10.1111/bjop.12046 efendic, e., & van zyl, l.e. (2019). on reproducibility and replicability: arguing for open science practices and methodological improvements at the south african journal of industrial psychology. sa journal of industrial psychology, 45. https://doi.org/10.4102/sajip.v45i0.1607 fetvadjiev, v.h., meiring, d., van de vijver, f.j.r., nel, j.a., & hill, c. (2015). the south african personality inventory (sapi): a culture-informed instrument for the country’s main ethnocultural groups. psychological assessment, 27(3), 827–837. https://doi.org/10.1037/pas0000078 finch, w. h., & french, b. f. (2015). latent variable modeling with r. new york, ny: routledge. hofstee, w.k., de raad, b., & goldberg, l.r. (1992). integration of the big five and circumplex approaches to trait structure. journal of personality and social psychology, 63(1), 146–163. https://doi.org/10.1037/0022-3514.63.1.146 hough, l.m., oswald, f.l., & ock, j. (2015). beyond the big five: new directions for personality research and practice in organizations. annual review of organizational psychology and organizational behavior, 2(1), 183–209. https://doi.org/10.1146/annurev-orgpsych-032414-111441 hu, l., & bentler, p.m. (1999). cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. structural equation modeling: a multidisciplinary journal, 6(1), 1–55. https://doi.org/10.1080/10705519909540118 jang, k.l., livesley, w.j., angleitner, a., reimann, r., & vernon, p.a. (2002). 
genetic and environmental influences on the covariance of facets defining the domains of the five-factor model of personality. personality and individual differences, 33(1), 83–101. https://doi.org/10.1016/s0191-8869(01)00137-4 jorgensen, t.d., pornprasertmanit, s., schoemann, a., & rosseel, y. (2019). useful tools for structural equation modeling. retrieved from https://cran.r-project.org/web/packages/semtools/semtools.pdf judge, t.a., rodell, j.b., klinger, r.l., simon, l.s., & crawford, e.r. (2013). hierarchical representations of the five-factor model of personality in predicting job performance: integrating three organizing frameworks with two theoretical perspectives. journal of applied psychology, 98(6), 875–925. https://doi.org/10.1037/a0033901 laajaj, r., macours, k., hernandez, d.a.p., arias, o., gosling, s.d., potter, j., … vakis, r. (2019). challenges to capture the big five personality traits in non-weird populations. science advances, 5(7), eaaw5226. https://doi.org/10.1126/sciadv.aaw5226 laher, s. (2013). the neo-pi-r in south africa. in s. laher & k. cockcroft (eds.), psychological assessment in south africa: research and applications (pp. 257–269). johannesburg, south africa: wits university press. li, c.-h. (2016). confirmatory factor analysis with ordinal data: comparing robust maximum likelihood and diagonally weighted least squares. behavior research methods, 48(3), 936–949. https://doi.org/10.3758/s13428-015-0619-7 mcabee, t.s., oswald, l.f., & connelly, s.b. (2014). bifactor models of personality and college student performance: a broad vs narrow view. european journal of personality, 28(6), 604–619. https://doi.org/10.1002/per.1975 mcdonald, r.p. (1999). test theory: a unified treatment. mahwah, nj: erlbaum. morton, n., hill, c., meiring, d., & de beer, l.t. (2019). investigating the factor structure of the south african personality inventory – english version. sa journal of industrial psychology, 45(0), 1–13. 
https://doi.org/10.4102/sajip.v45i0.1556 morton, n., hill, c., meiring, d., & van de vijver, f.j.r. (2019). investigating measurement invariance in the south african personality inventory: english version. south african journal of psychology, 50(2), 274–289. https://doi.org/10.1177/0081246319877537 motowidlo, s.j., & van scotter, j.r. (1994). evidence that task performance should be distinguished from contextual performance. journal of applied psychology, 79(4), 475–480. https://doi.org/10.1037/0021-9010.79.4.475 ones, d.s., dilchert, s., viswesvaran, c., & judge, t.a. (2007). in support of personality assessment in organizational settings. personnel psychology, 60(4), 995–1027. https://doi.org/10.1111/j.1744-6570.2007.00099.x ones, d.s., viswesvaran, c., & dilchert, s. (2005). personality at work: raising awareness and correcting misconceptions. human performance, 18(4), 389–404. https://doi.org/10.1207/s15327043hup1804_5 pletzer, j.l., oostrom, j.k., bentvelzen, m., & de vries, r.e. (2020). comparing domain- and facet-level relations of the hexaco personality model with workplace deviance: a meta-analysis. personality and individual differences, 152, 1–11. https://doi.org/10.1016/j.paid.2019.109539 pulakos, e.d., arad, s., donovan, m.a., & plamondon, k.e. (2000). adaptability in the workplace: development of a taxonomy of adaptive performance. journal of applied psychology, 85(4), 612–624. https://doi.org/10.1037/0021-9010.85.4.612 r core team. (2016). r: a language and environment for statistical computing. retrieved from https://cran.r-project.org/doc/manuals/r-release/fullrefman.pdf raftery, a.e. (1995). bayesian model selection in social research. sociological methodology, 25, 111–163. https://doi.org/10.2307/271063 reise, s.p., bonifay, w.e., & haviland, m.g. (2013). scoring and modeling psychological measures in the presence of multidimensionality. journal of personality assessment, 95(2), 129–140.
https://doi.org/10.1080/00223891.2012.725437 revelle, w., & zinbarg, r.e. (2009). coefficients alpha, beta, omega, and the glb: comments on sijtsma. psychometrika, 74(1), 145–154. https://doi.org/10.1007/s11336-008-9102-z rodriguez, a., reise, s.p., & haviland, m.g. (2016). evaluating bifactor models: calculating and interpreting statistical indices. psychological methods, 21(2), 137–150. https://doi.org/10.1037/met0000045 schumacker, r.e., & lomax, r.g. (2010). a beginner’s guide to structural equation modeling. new york, ny: taylor and francis group. south african reserve bank. (2019). quarterly bulletin for march 2020. retrieved from https://www.resbank.co.za/content/dam/sarb/publications/quarterly-bulletins/quarterly-bulletin-publications/2020/9797/01full-quarterly-bulletin---march-2020.pdf spector, p.e., fox, s., penney, l.m., bruursema, k., goh, a., & kessler, s. (2006). the dimensionality of counterproductivity: are all counterproductive behaviors created equal? journal of vocational behavior, 68(3), 446–460. https://doi.org/10.1016/j.jvb.2005.10.005 statistics south africa. (2019). inequality trends in south africa: a multidimensional diagnostic of inequality. retrieved from http://www.statssa.gov.za/publications/report-03-10-19/report-03-10-192017.pdf taylor, n., & de bruin, g.p. (2017). basic traits inventory: technical manual. johannesburg, south africa: jvr psychometrics. tett, r.p., steele, j.r., & beauregard, r.s. (2003). broad and narrow measures on both sides of the personality-job performance relationship. journal of organizational behavior, 24(3), 335–356. https://doi.org/10.1002/job.191 transparency international. (2019). corruption perceptions index. retrieved from https://images.transparencycdn.org/images/2019_cpi_report_en_200331_141425.pdf van aarde, n., meiring, d., & wiernik, b.m. (2017). the validity of the big five personality traits for job performance: meta-analyses of south african studies.
international journal of selection and assessment, 25(3), 223–239. https://doi.org/10.1111/ijsa.12175 van de vliert, e., & van lange, p.a.m. (2019). latitudinal psychology: an ecological perspective on creativity, aggression, happiness, and beyond. perspectives on psychological science, 14(5), 860–884. https://doi.org/10.1177/1745691619858067 vandenberg, r.j., & lance, c.e. (2000). a review and synthesis of the measurement invariance literature: suggestions, practices, and recommendations for organizational research. organizational research methods, 3(1), 4–69. https://doi.org/10.1177/109442810031002 wiernik, b.m., yarkoni, t., giordano, c., & raghavan, m. (2020, april 22). two, five, six, eight (thousand): time to end the dimension reduction debate!. psyarxiv. https://doi.org/10.31234/osf.io/d7jye yukl, g. (2012). effective leadership behavior: what we know and what questions need more attention. academy of management perspectives, 26(4), 66–85. https://doi.org/10.5465/amp.2012.0088

review article: gamification in psychological assessment in south africa: a narrative review
author: yaseerah akoodie
affiliation: psychology department, faculty of humanities, university of the witwatersrand, johannesburg, south africa
citation: akoodie, y. (2020). gamification in psychological assessment in south africa: a narrative review. african journal of psychological assessment, 2(0), a24. https://doi.org/10.4102/ajopa.v2i0.24
received: 21 jan. 2020; accepted: 30 june 2020; published: 10 sept. 2020
copyright: © 2020. the author(s). licensee: aosis.
this is an open access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

abstract

gamification is defined as the implementation of game design elements in real-world contexts for non-gaming purposes. gamification is increasingly being used in psychological assessment as it is thought to make assessments more attractive and to increase the motivation and performance of test takers. the ease of use of gaming principles is also a strong enabling factor for gamification in assessment. however, not much is known about the field in the south african context. hence, this article uses the narrative review method to present the latest research on gamification in assessment. more specifically, the article discusses the benefits, costs, validity and scoring methods used with gamification in assessment. literature was located through electronic databases as well as the world wide web using google content analysis. the review showed that individuals performed similarly in traditional and gamified assessments. the use of gamification was also shown to decrease anxiety and stress and to increase motivation, loyalty and efficiency, especially in corporate environments. despite the benefits, critics point out that gamified assessments may be viewed as less serious because of the inclusion of game elements, so candidates may pay less attention to the assessment than required. further, gamification has the ability to manipulate individuals and may favour groups of individuals who are more accustomed to the use of technology than others. this raises ethical concerns, which are discussed in the article. the results also demonstrate a gap in research and practice in south african contexts, with few gamified assessments available in the market.
keywords: artificial intelligence; game-based assessment; gamification; gamified assessments; psychological assessment.

introduction

gaming is increasingly accepted as a form of entertainment, with individuals of all ages playing games regularly. technological developments have led to a new use of gaming called gamification. gamification involves the addition of game elements to different contexts, from engineering through to education and, more recently, psychological assessment. gamified assessment can be employed in different sectors such as clinical, industrial or educational systems (karagiorgas & niemann, 2017). this study adopts the form of a narrative review and presents the latest research on gamification in assessment, its benefits and costs, validity and scoring methods, using examples from the literature where applicable. ethics and the appropriateness of gamified assessment are also discussed, specifically as they pertain to the south african context. however, the concepts used in the field first need to be clarified. following this, the design and scoring of gamified assessments are presented, as well as current uses of artificial intelligence (ai) in gamified assessments.

core concepts used in gamification in assessment

gamification refers to the inclusion of game design elements in a non-gaming activity in different contexts, for example, the workplace or educational settings (georgiou, gouras, & nikolaou, 2019). games are known to have positive effects such as collaborative learning, increased levels of participation, continual interest and enjoyment (kocadere & caglar, 2015). gamification is used to increase the attractiveness and ease of use of an activity and, when applied, it increases the engagement and motivation of individuals (mekler, brühlmann, tuch, & opwis, 2017).
it enhances performance on tasks and, according to empirical evidence, is believed to do so by providing external motivators in the form of game elements such as points, leader boards, badges, levels and challenges. however, the effects on intrinsic motivation are unclear, as the implementation of isolated game elements does not seem to produce any observable changes in intrinsic motivation or applicants’ well-being (mekler et al., 2017). within assessment, the concepts of gamified assessment and game-based assessment need to be distinguished. these approaches differ in where they are implemented and for what purpose they are employed. in a gamified assessment, the psychometric properties of the traditional assessment remain unchanged, but game elements are applied to give the assessment the effects of a game, making it more playful and enjoyable (georgiou et al., 2019). fatehi, holmgard, snodgrass and harteveld (2018) gamified the thematic apperception test (tat), and participants were required to complete both the traditional version and the gamified version. participants reported that the gamified tat was more motivating and engaging, and its scores correlated with the validity and reliability scores of the traditional tat, which provides evidence of its appropriateness. because of the nature of gamified assessments, test publishers and companies employing them tend to create robust psychometrics that increase the validity of assessments (landers, armstrong, & collmus, 2017). this makes gamified assessment an attractive new development. haydt (2008, as cited in menezes & bortolli, 2016) mentioned that gamified assessments are used in schools to test whether students are reaching the objectives they are expected to reach, to assess student learning and to test the cognitive ability of individuals before and after instruction.
because of these purposes, gamified assessments can take the form of a diagnostic, summative or formative assessment. game-based assessments change the core of a traditional assessment model by harnessing the full scope of game thinking to capitalise on the inherent psychometric properties of games and to improve applicant reactions (georgiou et al., 2019). game-based assessments aim to rebuild an assessment as a game (landers et al., 2017). these types of assessments record a player’s choices as well as data about how the player arrived at a particular choice. this allows game-based assessments to analyse information that traditional assessments cannot capture, specifically internal thought processes over lengthy periods.

designing game-based assessments

in the design of a game-based assessment, three aspects need to be considered to create an efficient system, namely dynamics, mechanics and components. firstly, particular dynamics need to be established. the major dynamics proposed by werbach and hunter (2012) include constraints to challenge the test taker, emotions to attract and maintain interest in the gamified assessment, narratives for a storytelling effect, progression and the chance to build relationships or status. the behaviours and interactions that the player has towards these dynamics help assessors to analyse an individual’s cognition as well as their engagement in an organisation. for example, if players know they are being watched and scored they tend to be more competitive, and this can be monitored through the dynamic of progression (wiklund & wakerius, 2016). these dynamics are therefore added to game-based assessments to encourage participation. a gamified environment further consists of mechanics used to create player engagement. for example, a reward, which is a mechanic, may be found by the player and may stimulate happiness, a sense of achievement or curiosity (wiklund & wakerius, 2016).
other game mechanics include challenges, chance, competitions, cooperation, feedback, rewards, transactions and resource acquisition. these properties contribute to the motivation needed for an individual to engage with an assessment (werbach & hunter, 2012). in application, challenges require the player to expend effort in order to solve them. an example of a challenge may be a time restriction that creates a sense of pressure on the player. this could be used to assess an employee’s ability to work in such conditions. feedback may increase the chance that certain behaviours are repeated, allowing clearer observations to be made (wiklund & wakerius, 2016). direct feedback is further required to observe the progress an individual is making towards a specific goal, as different dynamic processes can influence the positive or negative feelings related to an assessment. the feedback evaluation can activate the mechanic of reward in order to formulate the scoring measures of the assessment and increase engagement and happiness (kocadere & caglar, 2015). when designing a game-based assessment, researchers also consider components, the smallest parts that directly affect the design of gamification. examples of components proposed by wiklund and wakerius (2016) are avatars, levels, leader boards, points, teams, virtual goods, content unlocking and badges. each component affects the gamification process differently. for instance, levels demonstrate a player’s position in the game and can act as a method of feedback, whereas content unlocking, which requires individuals to meet certain criteria to move forward, serves the mechanics of challenge, feedback and reward. badges can be used for setting goals, providing explanations for learned activities, identifying particular players, providing feedback and encouraging competition (wiklund & wakerius, 2016).
ultimately, the dynamics, mechanics and components of a game-based assessment are considered meticulously because they create the environment of the assessment, and that environment helps the assessment to fulfil its purpose and improves the gamified experience (werbach & hunter, 2012). an improved gamified experience results in lower anxiety, increased motivation and a feeling of flow, resulting in informed decision-making. gamified assessments are distinguished from traditional assessments by the emphasis placed on the complete experience of the assessment and on the promotion of user engagement, rather than on the final scores alone (lopes, pereira, magalhães, oliveira, & rosário, 2019).

scoring in game-based assessments

a potent mechanic used for scoring individuals is the point system. point systems inform players about the scores they acquire and provide insight into the progress an individual is making (werbach & hunter, 2012). points can also serve as a source of information for game designers, as they can be stored and tracked and can help developers to understand what occurs between the game and the participant (wiklund & wakerius, 2016). points can be used to encourage competition by showing scores to all players, or to convey a sense of progression by showing scores only to the specific player (odyssey, 2019). some talent management companies provide feedback through a point system. these point systems may include an overall score as well as learning potential scores. odyssey, a south african company, also indicates whether or not a candidate is recommended for employment in order to account for large candidate populations (odyssey, 2019; werbach & hunter, 2012). alternatively, individuals could be scored through badges. badges could serve as a goal-setting device that encourages players to progress towards a goal.
badges can also guide and educate the player whilst acting as a status symbol, as they communicate a player’s accomplishments. they can also serve as an identification marker for a specific group of individuals (werbach & hunter, 2012). furthermore, levels can be used to score an individual’s progress, as levels display a player’s position at any point throughout the game. in a study conducted by kocadere and caglar (2015), levels were used to provide feedback and were designed according to bloom’s revised taxonomy, which consists of six steps: remembering, understanding, applying, analysing, evaluating and creating. the gamified assessment covered these steps through levels based on each construct. in this manner, test takers’ level of cognitive ability could be assessed from how far they progressed. lastly, leader boards are also used for scoring as they allow simple comparisons. they provide players with a description of their performance in relation to others. this can be a motivator, as individuals can see how a few more points may move them up the leader board. leader boards can also be discouraging to individuals who are not performing as well as others taking the gamified assessment (werbach & hunter, 2012). alternatively, individuals can be scored on the speed of their responses, the number of correct responses or through item response theory (irt). in irt scoring, the probability that a test taker answers an item correctly is modelled from the test taker’s ability and from item characteristics, namely the difficulty of the item, its discriminability and the probability of guessing. these four quantities are integrated into a mathematical formula in order to calculate the psychometric score of a test taker (coetzee, 2018).
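the formula itself is not given in the text. as a minimal sketch, assuming it corresponds to the widely used three-parameter logistic (3pl) irt model, the probability of a correct response is c + (1 − c) / (1 + exp(−a(θ − b))), where θ is ability, a is discrimination, b is difficulty and c is the probability of guessing. the python below computes that probability and estimates ability by a simple grid search over the likelihood. the function names, the item parameters and the grid search are illustrative assumptions, not the scoring procedure of any assessment discussed in this review.

```python
import math


def p_correct(theta, a, b, c):
    """3pl probability that a test taker with ability theta answers correctly.

    a: item discrimination, b: item difficulty, c: probability of guessing.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def estimate_ability(responses, items):
    """grid-search maximum-likelihood estimate of ability (theta).

    responses: list of 1 (correct) or 0 (incorrect), one per item.
    items: list of (a, b, c) tuples in the same order as responses.
    """
    grid = [step / 10.0 for step in range(-40, 41)]  # theta from -4.0 to 4.0

    def log_likelihood(theta):
        total = 0.0
        for answer, (a, b, c) in zip(responses, items):
            p = p_correct(theta, a, b, c)
            total += math.log(p) if answer == 1 else math.log(1.0 - p)
        return total

    return max(grid, key=log_likelihood)


# hypothetical item bank: (discrimination, difficulty, guessing)
items = [(1.0, -1.0, 0.2), (1.2, 0.0, 0.2), (0.8, 1.0, 0.2)]
print(estimate_ability([1, 1, 0], items))
```

operational irt scoring would normally rely on calibrated item parameters and a dedicated estimation routine; the grid search is used here only because it keeps the link between the four quantities and the resulting score easy to follow.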
using artificial intelligence in game-based assessment

recent developments such as ai are making appearances in almost every sector of the economy. with specific regard to the field of psychometrics, ai has been implemented in many organisations in processes such as the screening of candidates and employee selection tests (geetha & reddy, 2018). for example, realistic chatbot-type conversations with candidates occur in situational judgement tests. these conversations give insight into candidates’ decision-making, as the chatbot can track cognitive progress and adapt to the individual in order to further analyse the ability they present (ai in assessment, 2019; geetha & reddy, 2018). chatbot-type conversations can also assist by screening individuals for specific job requirements on preferred channels, for example, whatsapp, wechat, facebook messenger and imessage. chatbot conversations can then organise candidates based on salary expectations, willingness to relocate, interests and more. candidates whose answers match the company’s requirements could be referred to suitable recruiters, who can test their suitability further through gamified assessments (ai in assessment, 2019). these advances also assist in hiring for quality and talent. they also provide an easy solution to the mapping of talents so that candidates can be placed in optimal positions (geetha & reddy, 2018). furthermore, ai enables automated scoring and computer-generated interpretive reports, thereby removing the need for developers to perform these tasks. artificial intelligence produces benefits in assessment. firstly, precision is improved through the use of ai, as these systems can analyse large amounts of data faster than any human could. this can reduce the time and cost of selection decisions. next, efficiency is improved when a system is automated, as this reduces the room for human error (verma & bandi, 2019).
artificial intelligence allows recruiters to conduct consistent and objective assessments of relevant data at an earlier stage than expected (ai in assessment, 2019). specifically, ai has simplified video interviews, as ai systems can transcribe and analyse data quickly and can help to analyse visual elements through emotion tracking and facial recognition. with these techniques, ai provides new insight and efficiency in scoring gamified assessments (ai in assessment, 2019). beyond screening and selection, ai has also proven to have benefits in the educational sector, where students can be assessed by a system that tracks the progress of individuals and forms an evaluation of the knowledge students have in a specific area of study (luckin, 2017). this can help teachers to understand their students better and make them more aware of the performance of individuals, enabling them to prepare and focus on necessary areas at a later stage in a semester or educational year. it also assists students by encouraging them to reflect on their learning and current grades (luckin, 2017). in contrast to these compelling developments in ai, bias presents a limitation. algorithms have been found to reflect the bias in the world, and the impact of an ai system functioning through a biased algorithm is massive. algorithms are trained on data documented in the world; thus the data collected should consider different cultures, environments, socio-economic profiles, preferences, lifestyles and genetic endowments and should reflect this rich diversity (panch, mattie, & atun, 2019). however, data is not uniformly available for all groups. an imbalance is therefore created when, for example, a certain group is not sampled as much as others are, or some groups are overlooked completely.
this creates insufficient data and inaccurate predictions for under-represented categories (panch et al., 2019). recent research conducted by amazon’s machine learning specialists provides evidence of this. amazon frequently makes use of automation, and researchers found that its ai-based hiring tool was not gender-neutral (dastin, 2018). in addition, researchers have found problems with facial recognition intelligence. a separate study found that demographic differentials exist in the majority of the contemporary facial recognition algorithms evaluated (grother et al., 2019). false positive and false negative results are common for many algorithms, where false positives are more commonly found and false negatives are algorithm-specific. ultimately, it was concluded that more accurate algorithms produce fewer errors, and it was suggested that smaller demographic differentials are favourable (grother et al., 2019). these efforts show that ai can be re-engineered to produce fairer results. however, they also accentuate the dependency of ai on human training and show how challenging and complex the problem of bias can be, specifically because adding to the data sets is believed to affect the balance of the system (knight, 2019). therefore, it is critical to develop and train ai systems with data that is unbiased and algorithms that can easily be explained. to date there is not much research available on the uses and limitations of gamification or on the ethics associated with this type of assessment. this study intends to present an overview of the uses and limitations of gamification in assessment and the ethics associated with gamification, with a specific focus on the south african context.

methods

this study used a narrative review method to explore gamification in assessment.
a narrative review approach was chosen as this study provides a broad perspective on the topic area and explores the general debates in the area (green, johnson, & adams, 2006). narrative reviews examine the literature in a topic area in order to summarise information and identify gaps for future research (grant & booth, 2009). literature searches were conducted on google scholar, sage research methods, national centre for biotechnology information (ncbi), semantic scholar and research gate using the following keywords: ‘gamification’, ‘gamified psychological assessments’, ‘scoring methods in assessment’ and ‘psychometrics and gamification’. grey literature was also located by searching the world wide web using google content analysis.

review findings

benefits and limitations of gamification in assessment

the first benefit of gamification in assessment is that individuals have been found to perform similarly in traditional and gamified assessments, which suggests that gamified assessments are on an equal psychometric footing and are an acceptable technique (ai in assessment, 2019). further, gamified assessments improve the users’ experience and, in organisations, they improve the employer’s brand perception too. for this reason, gamified assessment can be considered an appropriate tool to use in businesses. gamified assessments help to reduce stress and stereotype threat. they can test a participant’s decision-making skills, reactions, preferences and biases, which aids in making informed employee selection decisions and finding appropriate roles for candidates. in addition, unlike traditional assessments, a gamified assessment environment might distract candidates from the idea that they are under assessment. this can reduce test anxiety, which is beneficial as low levels of anxiety increase performance.
it can elicit behaviours that appear unconsciously rather than socially acceptable or desirable responses (fetzer, mcnamara, & geimer, 2017). gamification has also been found to boost motivation and loyalty, as it fosters confidence in one’s abilities and ensures that individuals know how valued they are by a company. it allows individuals to take pride in what they are good at and to improve on what they are not, even if the improvement comes not from extended hours of play but from approaching higher-ranked individuals for advice. in this manner gamified assessments provide a more accurate interpretation of the candidates under examination, increasing the reliability of these types of assessments and the quality of staff members employed (fetzer et al., 2017; guy, 2019; maltitz, 2014). further, game engagement and the use of contexts can aid in diagnosing how an individual handles a particular problem, and this may lead to more robust inferences about performance (fetzer et al., 2017). because of these advantages the use of game elements improves the criterion validity of assessments too (guy, 2019). gamification utilises three-dimensional graphics, sounds and avatars that give the assessment a realistic feel. this can enhance the transferability of the assessment, which increases its face and ecological validity (lumsden et al., 2015). according to test partnership, the company behind the mindmetriq gamified assessments, all the assessments that it distributes to companies such as htc, steinhoff uk retail ltd, unilever, barclays and many more have high validity and reliability. the tests are measured against alternate measures and have been found to have correlations that are statistically significant. this provides evidence of validity for the test partnership assessments.
further, the company tests for reliability using rasch item reliability and pearson’s reliability, producing high levels of reliability that ensure accuracy and precision and therefore permit the use of these assessments for sound selection and assessment (guy, 2019). fairness and objectivity are also increased, as is efficiency, because administration time is reduced significantly (guy, 2019). gamified assessments reduce the costs of administration as professionals are not required to spend the time and money it takes to conduct a telephonic interview. by making assessments available on online networks, recruiters can also attract a crowd that may not be interested in traditional assessments but may have the skill set the organisation is interested in (guy, 2019; krasulak, 2015). this too improves efficiency as it creates a skill-specific talent pool. alongside this, scoring helps to reduce the costs of recruitment as selections can be made by a simple pass-or-fail rule, for example, candidates are required to obtain a minimum score in order to progress in selection. this reduces the time administrators spend compared with traditional assessments, as they are not expected to work case by case. time to hire can also be reduced by combining tests so that different constructs, such as performance, personality and cognitive abilities, are measured simultaneously (nikolaou, georgiou, & kotsasarlidou, 2019). many companies are affected by open vacancies, as the longer a vacancy is left open the greater the cost to the organisation because of lost productivity. gamified assessment systems increase productivity in selection, as a single assessor can invite and oversee thousands of candidates whilst ensuring that they complete the assessment within a desired time frame, instead of contacting candidates individually to arrange a mutually agreed interview time and waiting for each candidate’s response (guy, 2019).
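the pass-or-fail rule and the simultaneous measurement of several constructs described above can be sketched in a few lines of python. this is only an illustration under stated assumptions: the weighted-mean composite, the construct names, the weights and the cut score of 60 are hypothetical and are not taken from any of the assessments or publishers cited in this review.

```python
def composite_score(construct_scores, weights):
    """weighted mean of construct scores (the weights are assumed, not sourced)."""
    total_weight = sum(weights.values())
    weighted_sum = sum(construct_scores[name] * w for name, w in weights.items())
    return weighted_sum / total_weight


def screen(candidates, weights, cut_score):
    """return {candidate: (composite, progresses)} under a pass-or-fail rule."""
    results = {}
    for candidate, construct_scores in candidates.items():
        score = composite_score(construct_scores, weights)
        # a candidate progresses only if the composite reaches the minimum score
        results[candidate] = (round(score, 2), score >= cut_score)
    return results


# hypothetical candidates scored 0-100 on two constructs
candidates = {
    "candidate_a": {"cognitive": 70, "personality": 40},
    "candidate_b": {"cognitive": 50, "personality": 60},
}
weights = {"cognitive": 2, "personality": 1}

for name, (score, progresses) in screen(candidates, weights, cut_score=60).items():
    print(name, score, "progress" if progresses else "do not progress")
```

a single cut score is what removes the case-by-case work for administrators, but the choice of weights and threshold is itself a selection decision and would need to be justified against validity evidence.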
in the health sector, gamified assessments are used to promote positive health behaviour and to eliminate the stigma surrounding certain health issues (hamari & koivisto, 2015). mobile devices are an effective medium for individuals to monitor their health. for example, a cognitive behavioural therapy (cbt) based application (app) called mycompass is a self-guided psychological treatment that is designed to monitor and reduce mild to moderate anxiety, stress and depression as well as to suggest techniques to improve social and work functioning (giota & kleftaras, 2014). this is done through a set of monitored interactive activities and assistance in restructuring the way an individual thinks or behaves under these conditions. as a result, this gamified app can help clients and potential clients to increase their health-seeking behaviours (giota & kleftaras, 2014). furthermore, rehabilitation centres assist individuals who have suffered from brain injuries, diseases or disorders by helping patients to relearn how to complete daily tasks independently (vourvopoulos, ponnam, faria, & badia, 2014). instead of relying on traditional questionnaires and scales, a gamified assessment known as rehabcity has been developed in the united states so that the cognitive deficits associated with an individual’s injury can be assessed (vourvopoulos et al., 2014). players are placed in real-life environments in a virtual world in order to familiarise themselves with the daily situations they may encounter, for example, visiting the grocery store or abiding by traffic laws. the rehabcity gamified assessment correlates strongly with the mini mental state examination used in clinical assessments of cognitive functioning and therefore provides potent assistance in the health sector. with gamified assessment, candidates may also complete the test at any location without incurring transport costs or potentially wasting the interviewer’s time (guy, 2019).
however, some concerns surrounding this benefit exist. firstly, standardising the environmental conditions of the test-takers would be a difficult task and could leave room for inaccurate representations of the participants (foxcroft & roodt, 2018). secondly, not all candidates or potential candidates have access to a stable internet connection (du plessis, 2014). this could affect the gamified assessment's accuracy, or it may require candidates to travel to areas that can provide a stable internet connection, which could be costly. alternatively, candidates may not have access to wifi, causing an increase in data expenses. nonetheless, cancellations and candidates who do not show up are no longer of concern when using online psychometric testing. with regard to providing feedback, an online gamified assessment may have a customisable automatic email that can be sent to candidates, containing a feedback report for both successful and unsuccessful participants. this removes the need for administrators to contact each candidate individually and to manage a candidate's negative emotions when faced with rejection (guy, 2019). critics argue further that gamification is a form of manipulation. awarding someone a badge may only motivate them for a short period of time and could come across as patronising. however, mandating skills as a prerequisite gives people a goal to work towards and assigns real value to the task and the respective badge earned (maltitz, 2014). a further consideration is the scepticism related to gamified assessment, as certain groups who may be tested may be less likely to play assessment games; for example, millennials are more drawn to assessment games, whereas older candidates may be less familiar with the activity (guy, 2019). this could affect recruitment for senior positions. in order to counteract this difficulty, organisations have considered using a combination of assessments (evalex, 2014).
another risk regarding the appropriateness of gamified assessment is that some individuals may take the assessment less seriously because it is a game and may pay less attention than they would in a more formal, traditional method of testing. the results may then not represent true abilities. by setting assessments that are goal-specific and motivational, companies can overcome this hurdle (evalex, 2014). a further limitation arises with reference to cultural differences. according to khaled (2014), because of the multicultural nature of the world it is increasingly important that cultural differences that may affect the utility of gamified assessments be taken into account. this may be a difficult task, specifically because some cultural convictions contradict and can negate each other. for example, in danish and other scandinavian cultures it is frowned upon to try to stand out, and if you do it is perceived as though you believe that you are superior to others. alongside this belief, there is a set of rules that encourage social equality, uniformity and social stability. in such a culture, where competition is looked down upon, gamified assessments may not flourish. on the other hand, in countries such as the united states it might be considered admirable for someone to pursue their personal objectives against all odds (khaled, 2014). cultural importance is attached to risk taking, competitiveness, achievement, self-assertion and success. these cultures hold opposing beliefs, and utilising one gamified assessment for both may result in misleading representations of the individuals' abilities. although gamification consultancy companies recognise the need to address cross-cultural differences, guidance on how to rectify these difficulties is lacking.
research notes that reform is necessary in the design phase; however, more detail is required and, for the present, designers working on international or national designs focus their attention on company culture or national culture rather than individual culture (guhl, 2017).

using gamification for assessment in south africa

currently, gamification for assessment purposes is primarily used in organisational settings in south africa. deloitte, a multinational professional accounting services network, utilises a gamified assessment called firefly freedom. this gamified assessment maps an individual's personality in order to gain insight into the mannerisms that candidates make use of in different situations (hanna & dettmer, 2004; whitelock, 2019). l'oreal, a multinational, beauty-focused organisation, uses a gamified assessment, ‘reveal’, in which candidates move through the reveal platform as avatars. candidates face various challenges from different departments, which gives applicants the chance to explore the available positions in the company as well as to demonstrate the skills they possess (l'oreal, 2019). another example of gamified assessment in south africa is pricewaterhousecoopers' (pwc) use of a game-based psychometric assessment called ascender. ascender is a web-based assessment set that takes the form of a novel-styled intergalactic journey in which candidates are required to apply their judgement and make decisions in different situations (pwc, 2019). ascender evaluates applicants' personal values by linking each decision to a specific personal value. with the use of a scoring algorithm, pwc recruiters are able to determine the degree of alignment of each candidate to a set of values and therefore align employees' personal values with corporate values to ensure efficient inter-team dynamics (pwc, 2019). evalex, a talent management company in south africa, promotes the use of odyssey, a gamified assessment.
odyssey was created to cater for the specific requirements found in developing economies where a minimum level of education and literacy cannot be assumed. odyssey aims to identify talent and potential regardless of any previous formal training individuals may have. further, recruiters often overlook entry-level candidates with huge potential simply because the traditional assessments that are administered require a much more advanced level of education than the job requires. odyssey uses a gamified assessment approach in order to measure the real skills that employees need in order to operate at the lower levels of work. these include problem-solving skills, instruction assimilation, trainability, english literacy, numerical literacy and productivity (evalex, 2014). du plessis (2014) discussed the challenges associated with using gamification in schools given the social and economic inequalities that exist in south africa. even though gamification offers an alternative for large-scale assessments in a climate of limited resources, difficulties arise regarding the appropriateness of its implementation. structural problems exist in terms of access to technology, electricity and data. many areas in south africa do not have the high-speed delivery systems that are required to make online technologies work optimally, and many individuals do not have the budget to own any form of technology or to have uninterrupted access to electricity and data. wifi is also not freely accessible everywhere in south africa, making it increasingly difficult to afford enough data to run gamified assessments remotely (du plessis, 2014; xala, 2018). thus, transportation costs to and from organisations or wifi hotspots have to be considered too. these factors affect access to gamified assessments and therefore affect the opportunities available to individuals in south africa. however, the south african government aims to provide broadband connectivity for all citizens through public wifi by 2030.
public wifi programmes are important as they can assist in addressing the issues of inequity for communities that are unable to afford the high costs of data (xala, 2018). beyond the difficulties of access to resources, it was found that teachers in educational systems struggle to utilise game-based assessments or learning techniques because of a lack of knowledge of or familiarity with technological devices and systems, and they tend to avoid using such advancements (connolly & boyle, 2016). in south africa this may pose a difficulty too because of the vast inequalities in the country, as not all individuals have access to technology and they may therefore lack familiarity with certain devices and the manner in which they function. as a result, it is critical that training programmes are made available in order to integrate technology into educational systems and other sectors (botha, herselman, & ford, 2014). further, gamification in assessment still has to deal with the challenges of traditional assessment, including language proficiency, quality of schooling, test-wiseness and multilingualism, amongst others (laher & cockcroft, 2013).

ethical considerations of gamified assessment

with regard to ethical considerations, it is important to follow the principles of fair testing as every applicant should have an equal chance of success. the assessment should appeal to all candidates regardless of their cultural background, age or physical ability. one way to ensure that the gamified assessment is applicable to all is to omit the demographic variables generally requested from the participant (psymetric company, 2019). hiring managers tend to favour candidates who are similar to themselves, whether in experience, university qualification, demographic or personality trait, and they tend to disfavour candidates who are similar to an employee with whom they had a bad experience, even though the new candidate could potentially be the correct choice.
similarly, younger candidates tend to suffer when hiring managers utilise traditional methods that require the aforementioned demographic details because of the lack of depth displayed on their resumes, and older candidates tend to be affected by irrelevant past experiences. in this manner, demographic or identifiable information can cause companies to neglect the skills that candidates can really offer (keijzer, 2018). therefore, by not collecting this information, individuals can be assessed or selected based on the ability they demonstrate in a game rather than their race, gender or other demographic details, making gamified assessments fairer. in relation to fair testing and applicability, an important concept to consider is cultural differences. it has been found that there is no single set of qualities, beliefs or values that individuals from every world culture accept as equally significant (khaled, 2014). because of this, it is reasonable to deduce that specific representations found in gamified assessments would not be interpreted in exactly the same way cross-culturally, which affects in-game behaviour. these behaviours may then be misinterpreted by the assessor evaluating the scoring system in place, or they may be evaluated in a manner that is positive in some scoring systems and negative in others. for example, participant a may respond in a manner that is culturally acceptable by his or her own standards; however, this action may not be viewed as positive by the system's scoring. alternatively, a candidate from the united states may approach the assessment with success, ambition and competitiveness in mind because of the cultural importance attached to these in the united states and may constantly try to better their score, whereas individuals in new zealand who view markers of achievement as needless and almost offensive may be less motivated to score high.
in this manner, participants may be overlooked because of cultural beliefs rather than skill, and this may amount to discrimination (khaled, 2014). in order to further follow through on applicability requirements, developers should also consider the country in which the gamified assessment will be implemented and its laws, as the laws of different countries may differ, resulting in differing belief systems among the inhabitants (wiklund & wakerius, 2016). beyond the beliefs of the people of a country, the laws and policies must be considered too, as there may be severe consequences for non-compliance with labour laws and data privacy laws. therefore, in order for gamified assessments to be appropriate they need to be adaptable to the contexts in which they will be used. this requires a flexible and configurable design that allows users to turn features on and off based on their geographical preferences (kumar & herger, 2013). in addition, data privacy is a necessary concern, as particular european laws disallow the collection, processing and use of individual data without consent from the participants. candidates should therefore be required to perform an action such as signing a document, clicking a button or checking a box to agree that their data are being collected and used (kumar & herger, 2013; wiklund & wakerius, 2016). in some countries the law gives workers' councils the power to approve assessments utilised in a company. such councils are generally concerned with the purpose of the data collection and the amount of data collected, as well as the justification behind it. further, they consider where data are stored, whether the purpose of the assessment can be reached with less data, the anonymity of the data and whether the data serve as a basis for performance review decisions such as salary increases, bonus calculations, promotions or dismissals.
in addition, the negative impacts of these considerations on employees must be addressed too so that employee well-being and fairness are maintained. these factors need to be considered and approved by a council before usage of the gamified assessment is allowed. in order to avoid the prohibition of the entire game design, the ability to turn off some features could help ensure that the game is flexible enough for worldwide use (kumar & herger, 2013). as yet there are no guidelines, legislative or otherwise, in south africa with specific regard to gamification. however, game-based assessments and gamified assessments are still considered tests, and when they are utilised in south africa they are required to comply with the employment equity act (eea) (no. 55 of 1998) and the hpcsa guidelines (tomu, 2013). this is especially important when testing for psychological constructs. the eea and the hpcsa work together in order to regulate the ethical conduct of psychologists (tomu, 2013). in accordance with the eea, when using gamification in assessments individuals need to ensure that the test is valid, reliable, applied fairly and not biased against any employee or group. the hpcsa complements the professional laws and codes in order to regulate test use and it assumes that the tests being used are compliant with it. furthermore, the protection of personal information act (popi) (no. 4 of 2013) should be considered. this act protects the privacy rights of all individuals; hence it affects all parties that collect, process, store and disseminate personal information and therefore directly affects gamified assessments or game-based assessments (de bruyn, 2014).
the popi requires that a responsible party utilising an individual's private data obtain consent for collecting and storing the data; that the purpose of the data collection be made known to the individual; that access be provided or removed if requested; and that the individual providing the data be made aware of who will have access to their information, how and where the data will be stored and, lastly, the measures that are put in place in order to safeguard their information (de bruyn, 2014). the act allows for exclusions with regard to information processed in one's personal capacity, information that has been de-identified, that is, by using an anonymisation technique, or information that has been collected on behalf of a public body that promotes national security and public safety (de bruyn, 2014). ultimately, it would be necessary for a default setting to be used in which individuals are made aware that data are being collected, assessed or used (wiklund & wakerius, 2016). the methods by which the assessments are administered should also perform equally well on all devices, as technological advancements may distort what is being assessed. it cannot be assumed that all measures of an assessment will be equivalent across different modes of delivery: non-cognitive measures in a test can more commonly be transferred equivalently, but cognitive measures do not always transfer correctly, for example, when moving from pc to mobile devices (ryan & derous, 2019). the reasons why different devices respond differently to the transfer of data have to be studied further, and this would call for reliability trial sessions spread across all the different types of devices in order to ensure that assessments fit their purpose (aon, 2018).
despite reliability trial sessions, technology itself poses a set of ethical dilemmas as it cannot be classified as fully reliable because of battery failures and internet connectivity issues, which have the potential to interfere with assessments or data recordings. furthermore, privacy is not guaranteed, as theft and hacking remain possible, in which case important, private information could be exposed and confidentiality agreements broken (giota & kleftaras, 2014). another ethical concern is the risk of individuals cheating the system. this means that the gamified experience needs to be built on the assumption that players will try to cheat the system, and cheat protection should therefore be added to the assessment, as any form of cheating could result in inconsistent or dishonest results (wiklund & wakerius, 2016). in order to reduce cheating, gamified assessments should decrease the perceived value of rewards; for example, intrinsic rewards should be employed and they should not be transferable to the real world, or rewards should have a large perceived value for the target audience but not for the rest of the population. notably, transferability of virtual rewards to the real world could further infringe on country laws and should be avoided. total transparency between developers and participants is required in order to maintain such a system too (kumar & herger, 2013). manipulation could also arise on the developers' side. companies that recruit through gamified assessment do not clearly disclose to test-takers the contents and aims of a gamification system. this edges into the exploitation of individuals' vulnerabilities and requires further research (kim, 2016).
with regard to manipulation, companies that employ gamification to increase employee skills and competitiveness tend to generate increased productivity in the workplace; however, the employees receive no physical or monetary reward for these improvements. rather, they are awarded virtual rewards, that is, badges or higher placement on a leaderboard (kim & werbach, 2016). in this manner, the employee may receive less satisfying rewards whereas the employer receives monetary rewards and recognition. for example, target utilises a gamification technique to assess the speed of cashiers through a game called ‘checkout’, in which cashiers are rated with a green light when they work fast enough or a red light when they are too slow. cashiers are then awarded badges for reaching higher levels of speed and can be promoted to new levels as their efficiency increases. checkout has been shown to increase the speed of checkout lines and cashiers report increased satisfaction in their job experiences; however, no monetary or real reward exists for the employees, only for employers (kim, 2016). this relates to the issue of deceit, which is considered unethical and should be addressed (kim & werbach, 2016). a gamified assessment can be considered meaningful and effective if it prioritises the needs of the participant rather than those of the assessor or organisation (bhavani et al., 2019). furthermore, kumar and herger (2013) stated that gamification designers play the role of a social architect to a certain extent, and with this title comes a responsibility. assessors should be aware of the influence they may have on participants, as gamification connects the virtual world to the real world and decisions that candidates make in a gamified system tend to affect their reality. misuse of this power amounts to manipulation, as individuals' decisions may be affected by the assessment in order to benefit the assessor rather than the participant.
for example, reminding players to save water and electricity, organise their workspace, create lists to organise their minds or recycle materials would be a positive influence, whereas describing players' rights in a confusing or incomprehensible manner or constantly demonstrating an unhealthy work environment would be a negative influence (kumar & herger, 2013). in addition, significant psychological harms can arise from gamified assessments and some scoring methods. video-screen leaderboard systems can generate anxiety, shame and embarrassment among employees. according to empirical research carried out in disneyland hotels, individuals found that seeing their performances listed and ranked against those of co-workers caused them to panic and express anxiety about losing their jobs (kim, 2016). in addition, individuals dislike scoring at the bottom of a leaderboard, and in an assessment context this causes them to disengage from the assessment (ferrell et al., 2015). in turn, this decreases their test-taking motivation and directly reduces their likelihood of completing the assessment. if these psychological impacts were to occur in a high-stakes assessment context, the organisation utilising the assessment might end up with a smaller talent pool. therefore, gamification developers should aim to avoid such harms and pay attention to the scoring method so that scores are correctly expressed. lastly, whilst gamified assessments are said to have eradicated the ethical issue of adaptability regarding language because of their visual characteristics, it has been argued that these types of assessments do not take into consideration individuals who may be colour blind or may have sight impairments. these candidates should not be disadvantaged, as this would be considered discriminatory.
furthermore, with regard to discrimination, it is possible that the instructions of some gamified assessments may not be understandable cross-culturally, and in countries such as south africa, with a multiplicity of cultures and languages, this may pose difficulties for assessors and developers when adapting a gamified assessment (aon, 2018).

conclusion

ultimately, the development of gamification has led to the creation of gamified assessments. this technological advancement has climbed to the top of the talent acquisition agenda globally. the uses of gamification have expanded significantly, with games being used to recruit candidates for jobs, assess skills before and after instruction and assess abilities for specific roles across fields. gamification in assessment is also being used internationally in educational and clinical settings. this review has highlighted the benefits and limits as well as the ethics of gamification in assessment with a specific focus on the south african context. from the review, it is evident that gamification has a lot to offer towards providing greater access to assessment for the south african population provided that ethical concerns are addressed.

acknowledgements

we wish to thank and express our appreciation towards professor sumaya laher, who assisted in the production of this review by overseeing all proceedings and guiding the author through the process.

competing interests

the author has declared that no competing interest exists.

author's contributions

i declare that i am the sole author of this article.

funding information

this research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.

data availability statement

data sharing is not applicable to this article as no new data were created or analysed in this study.
disclaimer

the views and opinions expressed in this article are those of the author and do not necessarily reflect the official policy or position of any affiliated agency of the author.

references

ai in assessment. (2019). artificial intelligence (ai) in assessment. aon. retrieved from https://www.cut-e.com/ai-in-assessment/
aon. (2018). gamification in assessment: upgrade your talent strategy [white paper]. aon company. retrieved from https://www.cut-e.com/assessment-solutions/
bhavani, s., mukherjee, d., dasgupta, j., verma, d., parameshwaran, d., divan, g., … patel, v. (2019). development, feasibility and acceptability of a gamified cognitive developmental assessment on an e-platform (deep) in rural indian pre-schoolers – a pilot study. global health action, 12(1), 1548005. https://doi.org/10.1080/16549716.2018.1548005
botha, a., herselman, m., & ford, m. (2014). gamification beyond badges. pretoria: ist-africa. retrieved from http://www.ist-africa.org/conference2014
coetzee, m. (2018). south african journal of industrial psychology: annual editorial overview. sa journal of industrial psychology, 44. https://doi.org/10.4102/sajip.v44i0.1591
connolly, t., & boyle, l. (2016). proceedings of the 10th european conference on game based learning. paisley: academic conferences and publishing international limited. retrieved from http://academic-bookshop.com
dastin, j. (2018). amazon scraps secret ai recruiting tool that showed bias against women. reuters. retrieved from https://www.reuters.com/article/us-amazon-com-jobs-automation-insight/amazon-scraps-secret-ai-recruiting-tool-that-showed-bias-against-women-iduskcn1mk08g
de bruyn, m. (2014). the protection of personal information (popi) act – impact on south africa. international business & economics research journal (iber), 13(6), 1315. https://doi.org/10.19030/iber.v13i6.8922
du plessis, p. (2014). problems and complexities in rural schools: challenges of education and social development.
mediterranean journal of social sciences, 5(20), 215–220. https://doi.org/10.5901/mjss.2014.v5n20p1109
evalex intellectual capital management. (2014). assessment for entry into the world of work [white paper]. evalex intellectual capital management. retrieved from https://www.evalex.com/wp-content/uploads/2016/10/odyssey-brochure.pdf
fatehi, b., holmgard, c., snodgrass, s., & harteveld, c. (2018). gamifying psychological testing: insights from gamifying the tat. boston, ma: northeastern university.
ferrell, j.z., carpenter, j.e., vaughn, e.d., dudley, n.m., & goodman, s.a. (2015). gamification of human resource processes. in d. davis & h. gangadharbatla (eds.), emerging research and trends in gamification (pp. 108–139). shaker, usa: igi global.
fetzer, m., mcnamara, j., & geimer, j.l. (2017). gamification, serious games and personnel selection. in h.w. goldstein, e.d. pulakos, j. passmore & c. semedo (eds.), the wiley blackwell handbook of the psychology of recruitment, selection and employee retention (pp. 293–309). john wiley & sons ltd.
foxcroft, c., & roodt, g. (2013). introduction to psychological assessment in south african context (4th edn.). cape town: oxford university press.
geetha, r., & reddy, d. (2018). recruitment through artificial intelligence: a conceptual study. international journal of mechanical engineering and technology, 9(7), 63–70.
georgiou, k., gouras, a., & nikolaou, i. (2019). gamification in employee selection: the development of a gamified assessment. international journal of selection and assessment, 27(2), 91–103. https://doi.org/10.1111/ijsa.12240
giota, k.g., & kleftaras, g. (2014). mental health apps: innovations, risks and ethical considerations. scientific research, 3, 19–23. https://doi.org/10.4236/etsn.2014.33003
grant, m.j., & booth, a. (2009). a typology of reviews: an analysis of 14 review types and associated methodologies. health information & libraries journal, 26, 91–108.
https://doi.org/10.1111/j.1471-1842.2009.00848.x
green, b.n., johnson, c.d., & adams, a. (2006). writing narrative literature reviews for peer-reviewed journals: secrets of the trade. journal of chiropractic medicine, 5(3), 101–117. https://doi.org/10.1016/s0899-3467(07)60142-6
grother, p., & ngan, m. (2019). face recognition vendor test (frvt) performance of face identification algorithms. gaithersburg, md: national institute of standards and technology.
guhl, a. (2017). gamification across borders: the impact of culture. master's degree of international business and trade. gothenburg, sweden: university of gothenburg.
guy, l. (2019). gamified assessments: a literature review [white paper]. mindmetriq. retrieved from https://www.testpartnership.com/fact-sheets/2019-gamification-literature-review.pdf
hamari, j., & koivisto, j. (2015). why do people use gamification services? international journal of information management, 35(4), 419–431. https://doi.org/10.1016/j.ijinfomgt.2015.04.006
hanna, g.s., & dettmer, p.a. (2004). assessment for effective teaching: using context-adaptive planning. boston: pearson a&b.
karagiorgas, d., & niemann, s. (2017). gamification and game-based learning. journal of educational technology systems, 45(4), 499–519. https://doi.org/10.1177/0047239516665105
keijzer, p. (2018). eliminate bias by using gamified assessments | hr trend institute. hr trend institute. retrieved from https://hrtrendinstitute.com/2018/10/01/eliminate-bias-by-using-gamified-assessments/
khaled, r. (2014). gamification and culture. in s. walz & s. deterding (eds.), the gameful world: approaches, issues, applications (1st edn.). cambridge, ma: massachusetts institute of technology.
kim, t. (2016). gamification of labor and the charge of exploitation. journal of business ethics, 152(1), 27–39. https://doi.org/10.1007/s10551-016-3304-6
kim, t.w., & werbach, k. (2016). more than just a game: ethical issues in gamification.
ethics and information technology, 18(2), 157–173. https://doi.org/10.1007/s10676-016-9401-5
knight, w. (2019). ai is biased. here's how scientists are trying to fix it. wired. retrieved from https://www.wired.com/story/ai-biased-how-scientists-trying-fix/
kocadere, s.a., & çağlar, s. (2015). the design and implementation of a gamified assessment. journal of e-learning and knowledge society, 11(3), 85–99.
krasulak, m. (2015). use of gamification in the process of selection of candidates for the position in the opinion of young adults in poland. jagiellonian journal of management, 1(3), 203–215. https://doi.org/10.4467/2450114xjjm.15.015.4472
kumar, j., & herger, m. (2013). gamification at work: designing engaging business software (1st edn., pp. 103–112). the interaction design foundation.
laher, s., & cockcroft, k. (2013). psychological assessment in south africa: research and applications. johannesburg sa: wits university press.
landers, r.n., armstrong, m., & collmus, a.b. (2017). how to use game elements to enhance learning: applications of the theory of gamified learning. in m. ma, a. oikonomou & l.c. jain (eds.), serious games and edutainment applications (vol. 2, pp. 457–483). surrey, uk: springer.
lopes, s., pereira, a., magalhães, p., oliveira, a., & rosário, p. (2019). gamification: focus on the strategies being implemented in interventions: a systematic review protocol. bmc research notes, 12(1), 1–8. https://doi.org/10.1186/s13104-019-4139-x
luckin, r. (2017). towards artificial intelligence-based assessment systems. nature human behaviour, 1(3), 1–3. https://doi.org/10.1038/s41562-016-0028
lumsden, j., edwards, e.a., lawrence, n.s., coyle, d., & munafò, m.r. (2016). gamification of cognitive assessment and cognitive training: a systematic review of applications and efficacy. jmir serious games, 4(2), e11. https://doi.org/10.2196/games.5888
maltitz, b. (2014).
the potential for gamification in south african contact centres [white paper]. 1stream. retrieved from https://1stream.co.za/potential-gamification-south-africa-contact-centres/ mekler, e., brühlmann, f., tuch, a., & opwis, k. (2017). towards understanding the effects of individual gamification elements on intrinsic motivation and performance. computers in human behavior, 71, 525–534. https://doi.org/10.1016/j.chb.2015.08.048 menezes, c.c.n., & de bortolli, r. (2016). potential of gamification as assessment tool. creative education, 7(4), 561–566. https://doi.org/10.4236/ce.2016.74058 nikolaou, i., georgiou, k., & kotsasarlidou, v. (2019). exploring the relationship of a gamified assessment with performance. the spanish journal of psychology, 22, e6. https://doi.org/10.1017/sjp.2019.5 odyssey | welcome to odyssey talent management. (2019). odyssey talent management. retrieved from http://odysseytalent.co.za/ panch, t., mattie, h., & atun, r. (2019). artificial intelligence and algorithmic bias: implications for health systems. journal of global health, 9(2), 020318. https://doi.org/10.7189/jogh.09.020318 psymetric company. (2019). psymetrics: using neuroscience and data science to revolutionize talent management. psymetric. retrieved from https://www.psymetrics.com/employers/ pwc. (2019). ascender: values-based assessment. retrieved from https://www.pwc.co.za/en/services/people-and-organisation/ascender-values-based-assessment.html reveal by l’oréal, top com d’or – l’oréal group | world leader in beauty | official website. (2019). loreal. retrieved from https://www.loreal.com/group/who-we-are/awards---recognitions/2011/%c2%ab-reveal-by-l%e2%80%99or%c3%a9al-%c2%bb--top-com-d%e2%80%99or ryan, a., & derous, e. (2019). the unrealized potential of technology in selection assessment. revista de psicología del trabajo y de las organizaciones, 35(2), 85–92. https://doi.org/10.5093/jwop2019a10 tomu, h. (2013). 
the role played by the health professions of south africa (hpcsa) ethical code of conduct and the employment equity act (eea) in regulating professional, legal and ethical conduct of psychologists in south africa. international journal of academic research in economics and management sciences, 2(1), 59. verma, r., & bandi, s. (2019). artificial intelligence & human resource management in indian it sector. ssrn electronic journal. https://doi.org/10.2139/ssrn.3319897 vourvopoulos, a., ponnam, k., faria, a.l., & badia, s.b. (2014). rehabcity: design and validation of a cognitive assessment and rehabilitation tool through gamified simulations of activities of daily living. conference paper. madeira, portugal: acm. werbach, k., & hunter, d. (2012). for the win: how game thinking can revolutionize your business. philadelphia: wharton digital press. whitelock, j. (2019). deloitte gamify the recruitment process. think in circles. retrieved from https://thinkincircles.com/deloitte-gamify-the-recruitment-process/ wiklund, e., & wakerius, v. (2016). the gamification process: a framework on gamification. masters thesis. jonkoping, sweden: jonkoping university. xala, n. (2018). the current state of free public wifi in south africa – htxt.africa. hypertext. retrieved from https://www.htxt.co.za/2018/09/11/the-current-state-of-free-public-wifi-in-south-africa/

about the author(s) gideon p. de bruin department of industrial psychology, faculty of economic and management sciences, stellenbosch university, stellenbosch, south africa nicola taylor jvr psychometrics, johannesburg, south africa department of industrial psychology and people management, faculty of management, university of johannesburg, johannesburg, south africa șerban a. 
zanfirescu department of psychology, faculty of psychology and educational sciences, university of bucharest, bucharest, romania

citation de bruin, g.p., taylor, n., & zanfirescu, ș.a. (2022). measuring the big five personality factors in south african adolescents: psychometric properties of the basic traits inventory. african journal of psychological assessment, 4(0), a85. https://doi.org/10.4102/ajopa.v4i0.85

original research

measuring the big five personality factors in south african adolescents: psychometric properties of the basic traits inventory

gideon p. de bruin, nicola taylor, șerban a. zanfirescu

received: 27 sept. 2021; accepted: 31 jan. 2022; published: 31 mar. 2022

copyright: © 2022. the author(s). licensee: aosis. this is an open access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

abstract

the present study examined the psychometric properties of the basic traits inventory (bti), a big five personality questionnaire that was developed for adults, amongst south african adolescents. the research focussed on (1) whether the factor structure of the inventory manifested similarly for younger and older adolescents and whether this structure matched that found for adults and (2) whether the scales of the bti yield scores with similar reliabilities for adolescents of different ages and whether these reliabilities match those found for adults. results demonstrate the replicability of the theoretical five-factor structure of the bti amongst younger and older adolescents and provide evidence that the scales yield scores with high reliability. overall, the results show that the bti holds promise as a measure of the personality traits of the big five model amongst adolescents in the south african context.

keywords: bti; adolescents; reliability; personality; factor structure.

whereas the measurement of personality traits in adults appears to be a productive area for research psychologists and practitioners, much less attention is focused on the measurement of personality traits amongst children and adolescents. there are a number of personality inventories available for use with adults, for example, the revised neo personality inventory (neo-pi-r) (costa & mccrae, 2008), the occupational personality questionnaire (shl, 2009) and the hogan personality inventory (hogan & hogan, 2007). by contrast, there are few instruments available for the evaluation of personality in adolescents and children and this may have contributed to the relative scarcity of research carried out in this area. moreover, the existing research was principally conducted in europe and northern america, which raises questions about how personality assessment in adolescents and children can best be performed in non-western contexts. there have been recent efforts to examine the utility of big five measures amongst adolescents in non-western contexts (e.g. john, xavier, waldmeier, meyer, & gaab, 2019; wu, lindsted, tsai, & lee, 2008), but such research has not been performed in the south african context. against this background, the present study examines the reliability and validity of the basic traits inventory (bti) (taylor & de bruin, 2006), which has been shown to yield reliable and valid measures of the big five in adults (ramsay et al., 2008; taylor & de bruin, 2006, 2013), amongst adolescents in the south african context. the big five model of personality is arguably the most widely accepted model of personality traits. there are many instruments that have been developed using this model as the underlying structure (cf. eds. de raad & perugini, 2002), for instance the neo-pi-r (costa & mccrae, 2008), the hogan personality inventory (hogan & hogan, 2007), the big five inventory (goldberg, 1993) and the big-five questionnaire (caprara et al., 1993). 
measures of the big five have also been developed for children and adolescents, for example, the hierarchical personality inventory for children (hipic; mervielde & de fruyt, 2002) and the five-factor personality inventory – children (ffpi-c; mcgheem, ehrier, & buckhalt, 2007). taylor and de bruin (2006) developed the bti as a measure of the big five traits amongst adults in the multicultural and multilingual south african context, where the vast majority of the population are african. similarly, fetvadjiev, meiring, van de vijver, nel and hill (2015) recently developed the south african personality inventory (sapi), which includes but is not restricted to the big five factors.

the structure and measurement of personality in adolescence

the dominant model of personality structure amongst adults specifies that individual differences in personality attributes can be optimally described in terms of five factors. whereas the labels and definitions of the traits vary somewhat across countries, instruments and authors, the five traits are commonly labelled as extraversion, neuroticism, conscientiousness, agreeableness and openness/intellect (de raad, 2000). the five factors present a satisfactory balance between bandwidth and fidelity, which means that the factors provide an economical description of personality, yet allow for meaningful prediction of important outcomes (ones & viswesvaran, 1996), such as health, education and work performance (e.g. cheng, weiss, & siegel, 2015; judge & zapata, 2015). the big five structure has been shown to be replicable across different cultures amongst adults (rolland, 2002). personality psychologists have also demonstrated that the big five factors are useful in the personality description of adolescents (de fruyt, mervielde, hoekstra, & rolland, 2000; mccrae, martin, & costa, 2005; mervielde & de fruyt, 2002; parker & stumpf, 1998; shiner & caspi, 2003; soto & tackett, 2015). 
indeed, the so-called little six model of personality in childhood and adolescence includes the big five traits with the addition of activity as a sixth trait (soto & john, 2014; soto & tackett, 2015). there has been some debate in the literature as to whether the big five model of personality is an adequate representation of the structure of personality in childhood and adolescence. soto, john, gosling and potter (2008) postulated that the factor structures of personality measures ‘should be recovered less clearly in the responses of children and adolescents than in those of adults’ (p. 720). this is related to an increasing awareness of identity and differentiation of self-concept with age, which should lead to more clearly defined structures amongst adults. in this respect it appears that self-report ratings indeed become more consistent, and factors are better differentiated, with an increase in age (soto et al., 2008; soto & john, 2014). a related issue is the consistency with which persons respond to evaluations of their own behaviour and attributes and how this relates to age. in this respect soto et al. (2008) suggested that older adolescents would likely respond more consistently to personality items than younger adolescents, which would manifest in better reliability coefficients of measures of the traits for older adolescents. they suggest that this is because older adolescents have more developed self-concepts and a better ability to evaluate issues of logical consistency when rating their own behaviour.

hierarchy and continuity

soto, john, gosling and potter (2011) highlighted two principles with respect to youth personality development. the first principle states that youth personality traits are organised hierarchically in a similar fashion to adult traits (soto & john, 2014), that is, with higher-order traits (e.g. agreeability) subsuming narrow, lower-order ones (e.g. modesty and generosity). 
the cumulative-continuity principle states that changes in personality traits occur during the transition from childhood to adulthood and that traits reach their highest levels of stability in adulthood (roberts & delvecchio, 2000; soto & tackett, 2015). basic traits inventory the bti (taylor & de bruin, 2006) was developed using a combined emic–etic approach to test development. from an etic perspective, the big five taxonomy was used to inform the five-factor structure of the inventory. from an emic perspective, the items were developed keeping the multilingual and multicultural south african context in mind. two versions are available, namely an english and an afrikaans version. the items are brief, require a low reading level and avoid cultural particularities. in other words, the development philosophy was that the items should be comprehensible for persons who complete it in a language other than their first language and that the items should include content that would be relevant to most adults in south africa. for a more detailed description of the development of the bti, please consult taylor (2004, 2008) or taylor and de bruin (2006). the bti measures five factors, namely extraversion, neuroticism, conscientiousness, openness to experience and agreeableness. each factor has a number of sub-factors (varying between four and five), called facets, which measure narrow aspects of the broader factors and provide potentially rich interpretive information (e.g. de vine & morgan, 2020). the five-factor structure of the bti has been replicated with adults across gender groups, language groups and cultural groups in numerous studies (ramsay et al., 2008; taylor, 2008; taylor & de bruin, 2006). the reliability of the five-factor scores is consistently higher than 0.85 and most facets consistently demonstrate reliability coefficients of 0.70 and above (taylor & de bruin, 2006, 2013). 
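Reliability coefficients of the kind quoted above are reported in this study's results as Cronbach's alpha. As a minimal sketch of how that statistic is computed from item scores (this is not the authors' code, which used R; the function name `cronbach_alpha` and the toy data are hypothetical), assuming complete data with one row per respondent:

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of the summed scale score.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical toy data: three items that rank four respondents identically.
toy_scores = [
    [1, 2, 3],
    [2, 3, 4],
    [3, 4, 5],
    [4, 5, 6],
]
alpha = cronbach_alpha(toy_scores)
```

Because the three toy items are perfectly consistent with one another, alpha reaches its upper bound here; real scale data would give lower values.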
against the background of the lack of personality inventories suitable for use with adolescents and emerging evidence that the big five structure is replicable and provides adequate descriptions of personality amongst adolescents (cheng et al., 2015; mccrae et al., 2005; wu et al., 2008), the present study examined the factor structure and reliability of the bti for south african adolescents. the bti was deemed appropriate given the steps that were taken in its development to keep the content brief and simple, which could facilitate its use with adolescents. it is necessary to demonstrate the structural validity and reliability of the bti with adolescents given the cautions that have been expressed regarding the differentiation and consistency of personality in childhood and adolescence. in particular, the present study examined (1) the similarity of the factor structure of the bti, and (2) the reliability of the big five traits and their facets across younger adolescents (12–15 years), older adolescents (16–18 years) and the normative adult sample (18–72 years).

method

participants

participants were 450 boys and 415 girls from various schools across south africa. there was an almost even split between black (n = 313; 36%) and white (n = 321; 37%) adolescent respondents. an additional 10% was made up of mixed-race respondents (n = 91), whereas asian or indian respondents made up 4% of the sample (n = 35). younger adolescents (15 years and younger, mage = 14.54, s.d. = 0.56, age range = 13–15 years) made up 44% of the sample (n = 381) and older adolescents (16–18 years, mage = 16.90, s.d. = 0.79, age range = 16–18 years) made up 56% of the sample (n = 484). all participants were high school learners. the normative adult sample consisted of 5352 participants (n = 3323 female respondents). 
the participants specified their ethnic group as black people (n = 3548; 66.3%), white people (n = 790; 14.8%), mixed-race (n = 180; 3.4%), asian (n = 139; 2.6%) or other (n = 31; 0.6%); 12.4% (n = 664) did not specify their ethnic group. the sample consisted of adults aged 18 to 72 years (mage = 24.81, s.d. = 5.67). the participants in the normative sample completed the bti for selection and personal development purposes.

procedure

data were collected over a period of 3 years across a number of different initiatives in different provinces. data collection in other provinces was performed as part of career information processes or other youth initiatives that were not necessarily large-scale school assessments. for all data collection initiatives, participation was voluntary and participants were provided with feedback on their results along with a personal development workshop or personal feedback session. parental consent was obtained where required, along with individual informed consent from each of the participants. assessments were administered in a supervised setting, either using paper and pencil format or online through the jvr online platform that hosts the bti scoring and reporting facility.

qualitative evaluation

a total of 27 second-language english high school learners (13 boys and 14 girls) were asked to evaluate the items of the english bti in terms of their relevance to them at their age (between 14 and 17 years). a total of 7 of the 193 items were flagged as potentially problematic. two items were flagged because of content related to working long hours, where some learners indicated that they did not have a job, so could not answer. three items were flagged with regard to contributing to charity or lending money, where some learners indicated that they did not earn money, so could not contribute. one item was flagged regarding the discussion of politics, where some learners indicated that they were not interested in political matters. 
one item regarding making changes in the house was flagged as the learners indicated that they did not own houses. the flagged items were retained in all statistical analyses but were earmarked for revision in a future version of the bti for adolescents.

results

we used the psych package (revelle, 2017) in r (r core team, 2016) to subject the correlations of the 24 bti facets of the pooled adolescent data set to an unrestricted unweighted least squares factor analysis. we decided the number of factors to retain with reference to velicer’s map test, the empirical bayesian information criterion (ebic), horn’s parallel analysis, cattell’s scree plot, the root mean square error of approximation (rmsea) and the standardised root mean squared residual (srmr). the minimum average partial (map) criterion and the ebic reached their respective minima with five factors. parallel analysis evidenced that only the first five roots had eigenvalues that exceeded those of random data, and the scree plot revealed a clear elbow in the plot of the eigenvalues after the fifth root. with five factors extracted, the rmsea = 0.05 and the srmr = 0.02 suggested satisfactory fit with the observed data. against this background, and theoretical expectation, we retained five factors. next, we obtained separate unweighted least squares five-factor solutions for adolescents of 15 years and younger (labelled the younger adolescents) on the one hand and adolescents 16 years and older (labelled the older adolescents) on the other hand. for each group the factor solution was rotated to a target structure based on the theoretical structure of the bti. the factors were allowed to freely correlate. 
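The residual-based fit criterion mentioned above (srmr) summarises how far the correlations reproduced by the factor model lie from the observed correlations. The sketch below illustrates one common off-diagonal definition, using only the Python standard library; it is not the authors' code, the psych package may differ in detail, and the function name `residual_rms` and the toy matrices are hypothetical.

```python
import math


def residual_rms(observed, implied):
    """Root mean square of the off-diagonal residual correlations.

    observed, implied: square correlation matrices (lists of lists) of equal size.
    Only the lower triangle below the diagonal is used, because the matrices
    are symmetric and the diagonal is not of interest here.
    """
    p = len(observed)
    if p < 2:
        raise ValueError("need at least two variables")
    squared = [
        (observed[i][j] - implied[i][j]) ** 2
        for i in range(p)
        for j in range(i)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical toy example with three variables.
observed = [
    [1.00, 0.50, 0.40],
    [0.50, 1.00, 0.30],
    [0.40, 0.30, 1.00],
]
implied = [
    [1.00, 0.48, 0.42],
    [0.48, 1.00, 0.30],
    [0.42, 0.30, 1.00],
]
fit = residual_rms(observed, implied)
```

Smaller values mean that the model reproduces the observed correlations more closely; a value of zero would indicate perfect reproduction of the off-diagonal correlations.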
in both groups the obtained factor pattern matrices corresponded closely with the theoretical structure of the bti.1 on the basis of the pattern of high and low loadings of the 24 facets the factors were labelled as follows: factor 1 = conscientiousness; factor 2 = neuroticism; factor 3 = openness to experience; factor 4 = agreeableness and factor 5 = extraversion. next, we examined the similarity of the empirically obtained factor pattern matrices of the two groups. congruence coefficients (tucker’s phi coefficient) of corresponding factors show that the factors manifested very similarly across the two groups of adolescents (conscientiousness, φ = 0.98; neuroticism, φ = 0.99; openness, φ = 0.97; agreeableness, φ = 0.98; and extraversion, φ = 0.96; table 1). these coefficients indicate that the corresponding factors of the younger and older adolescents can be considered similar (lorenzo-seva & ten berge, 2006). table 1: coefficients of congruence of basic traits inventory factors for younger and older adolescents. we also compared the empirical target rotated factor pattern matrices of the two adolescent groups with the rotated factor pattern matrix of the adult standardisation sample (as reported in table 8 of the bti manual [taylor & de bruin, 2006]). for both adolescent groups each facet’s primary factor loading corresponded with the pattern of loadings observed for adults. the coefficients of congruence of the corresponding factors of the younger adolescents and adults were as follows: conscientiousness, φ = 0.97; neuroticism, φ = 0.99; openness, φ = 0.97; agreeableness, φ = 0.97; and extraversion, φ = 0.96. in turn, the coefficients for the older adolescents and the adults were as follows: conscientiousness, φ = 0.97; neuroticism, φ = 0.98; openness, φ = 0.97; agreeableness, φ = 0.97; and extraversion, φ = 0.97. these coefficients indicate that the corresponding factors of the adolescents and the working adults can be considered similar. 
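Tucker's congruence coefficient, used above to compare factors across groups, is the sum of the products of two factors' loadings divided by the square root of the product of their sums of squared loadings. A minimal standard-library sketch follows; it is not the authors' code (the analysis was run in R), and the function names and toy loadings are hypothetical.

```python
import math


def tucker_phi(x, y):
    """Tucker's congruence coefficient between two loading vectors."""
    numerator = sum(a * b for a, b in zip(x, y))
    denominator = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    if denominator == 0:
        raise ValueError("a loading vector has zero length")
    return numerator / denominator


def factor_congruence(pattern_a, pattern_b):
    """Congruence of corresponding factors (columns) in two pattern matrices.

    pattern_a, pattern_b: lists of rows, one row per facet, one column per factor,
    with facets and factors in the same order in both matrices.
    """
    n_factors = len(pattern_a[0])
    return [
        tucker_phi(
            [row[j] for row in pattern_a],
            [row[j] for row in pattern_b],
        )
        for j in range(n_factors)
    ]


# Hypothetical loadings of four facets on two factors in two groups.
group_a = [[0.70, 0.05], [0.65, 0.10], [0.08, 0.72], [0.02, 0.60]]
group_b = [[0.68, 0.02], [0.70, 0.12], [0.10, 0.69], [0.05, 0.66]]
phis = factor_congruence(group_a, group_b)
```

Lorenzo-Seva and ten Berge (2006), the source cited above, suggest that values of roughly 0.95 and higher indicate that two factors can be treated as equal, which is the benchmark against which the reported coefficients are read.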
on the basis of the high levels of factor similarity across the younger and older adolescents we obtained a target rotated solution for the pooled adolescent data set. the factor pattern matrix is presented in table 2, which shows that each facet had a salient loading (> 0.30) on its target factor. two facets had cross-loadings that just exceeded the |0.30| criterion on non-target factors (i.e. ascendance on the conscientiousness factor [λ = 0.33] and excitement-seeking on the conscientiousness factor [λ = −0.33]). the factor correlations are given in table 3, which shows small to medium sized correlations between the five factors. somewhat lower correlations between the factors were observed for the adolescents compared with the adults.

table 2: oblique target rotated factor pattern matrix of the 24 basic traits inventory facets for the pooled adolescent group.

table 3: basic traits inventory factor correlations for the pooled adolescent group and normative working adult group.

the reliability coefficients (cronbach’s alpha) of the five factors and the facets across the two adolescent groups and the working adult group are given in table 4, which shows that the personality scales yielded scores with similar levels of measurement precision across the three groups. across the five traits the reliability coefficients of the three groups were within |0.01| of each other, with coefficients ranging from 0.86 to 0.95 across the five traits. overall, the reliability coefficients indicate a high level of measurement precision for each of the five scales.

table 4: reliability coefficients of the basic traits inventory scales and facets for adolescents and adults.

discussion

against the background of a dearth of suitable tools for the measurement of personality traits amongst adolescents, we examined the construct validity of the bti amongst south african adolescents. 
as a whole the results support the replicability of the factor structure of the bti (and therefore its construct validity) and indicate that the scales yield highly reliable scores for adolescents. in the paragraphs that follow we discuss these results in more detail. results show that the theoretical big five structure of the bti was replicated in the empirical correlations of the 24 facets amongst younger (13–15 years) and older (16–18 years) adolescents. these results underline the robustness of the big five factors (i.e. extraversion, neuroticism, conscientiousness, openness to experience and agreeableness) across age groups and support the construct validity of the bti scales for adolescents. the big five factors manifested almost identically amongst the younger and older adolescents. these factors were also almost identical with the factors of the normative working adult group, which suggests that the adolescents and adults attached similar meaning to the content of the bti items. mccrae et al. (2005) and wu et al. (2008) similarly demonstrated that the factor structures of the neo-pi-3 (mccrae et al., 2005) and neo-pi-r (wu et al., 2008), respectively, were replicable across adolescents and adults. the reliabilities of scores yielded by the big five scales were uniformly high across the two adolescent groups and similar to the reliabilities reported for adults by taylor and de bruin (2006). hence, adolescents and adults responded with similar consistency to the items. these results suggest that the structure and coherence of personality, as reflected in responses to the bti, might be established amongst adolescents as young as 13 years.

practical implications

the replicable factor structure and high reliabilities of the five scales suggest that the bti holds promise as a measure of the big five traits in adolescents, which opens possibilities for future personality research amongst this group in the south african context. 
qualitative analysis revealed a small number of items (about 3%) with content that does not directly apply to adolescents, for example, items related to work, owning a house and contributing towards charity. the psychometric analyses suggested that the inclusion of these items does not detract from the measurement quality (i.e. the factor structure or reliability) of the scales, but it is necessary to revise these items for future applications.

limitations

we adopted a top-down approach, where a measure that was developed for adults is examined with respect to its utility for measuring personality traits in adolescents (see de fruyt et al., 2000; mccrae et al., 2005 for studies where a similar approach was adopted). whereas the evidence in support of the replicability of the bti factor structure amongst adults supports the construct validity of these factors, it does not necessarily mean that the particular set of bti facets is optimal for the description of personality amongst adolescents. indeed, it is possible that a bottom-up approach may yield a different set of facets as indicators of the broader big five traits. in this respect it is perhaps useful to emphasise that there is no such thing as ‘the correct set of facets’ for the big five factors. ultimately, as long as the facets are proper indicators of the factors, it is the utility of the chosen facets that matters, and in this respect studies that examine the predictive validity of the bti scales with respect to educational, health and social outcomes represent a fruitful area of further research. a second limitation is that the study focussed on structural similarity and reliability of the bti for adolescents at the scale or trait level only. whereas these results were supportive of the construct validity of the scales, it is possible that some items may function less than optimally for adolescents. 
further research should examine the quality of individual items when used with adolescents and whether the items function equivalently for younger and older adolescents.

conclusion

the results indicate that the bti, which was developed for adults, holds promise as a measure of the big five personality traits amongst adolescents. the bti appears to be a potentially useful tool to track the development of personality in adolescence. in addition, practitioners who are interested in the role of personality traits in educational and career counselling with adolescents might fruitfully employ the bti in these contexts. as a whole, these results add to the growing body of evidence that supports the validity and usefulness of the big five personality traits in south africa and the validity of the bti as a measure of these traits.

acknowledgements

competing interests

the authors declare that they had no financial or personal relationships that may have inappropriately influenced them in writing this article.

authors’ contributions

g.p.d.b. contributed to the literature review, method, results and discussion. n.t. contributed to the literature review, method and discussion. she also contributed to the data collection process. a.s.z. contributed to the literature review, method, results and discussion.

ethical considerations

permission was granted to collect data in gauteng schools by the gauteng department of education (d2017/187g).

funding information

this research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.

data availability

data has been stored on osf online repository and can be accessed via the following link: https://osf.io/q4967/?view_only=ff272b303fbc4b169c53a762f2a5bfff.

disclaimer

the views and opinions expressed in this article are those of the authors and do not necessarily reflect the official policy or position of any affiliated agency of the authors. 
references

caprara, g.v., barbaranelli, c., borgogni, l., & perugini, m. (1993). the ‘big five questionnaire’: a new questionnaire to assess the five factor model. personality and individual differences, 15(3), 281–288. https://doi.org/10.1016/0191-8869(93)90218-r cheng, c.-h.e., weiss, j.w., & siegel, j.m. (2015). personality traits and health behaviors as predictors of subjective wellbeing among a multiethnic sample of university-attending emerging young adults. international journal of wellbeing, 5(3), 21–43. https://doi.org/10.5502/ijw.v5i3.2 costa, p.t., & mccrae, r.r. (2008). the revised neo personality inventory (neo-pi-r). in g.j. boyle, g. matthews, & d.h. saklofske (eds.), the sage handbook of personality theory and assessment: volume 2 – personality measurement and testing (pp. 179–198). thousand oaks: sage. https://doi.org/10.4135/9781849200479.n9 de fruyt, f., mervielde, i., hoekstra, h.a., & rolland, j.p. (2000). assessing adolescents’ personality with the neo pi-r. assessment, 7(4), 329–345. https://doi.org/10.1177/107319110000700403 de raad, b. (2000). the big five personality factors: the psycholexical approach to personality. göttingen: hogrefe & huber publishers. de raad, b., & perugini, m. (eds.). (2002). big five assessment. seattle: hogrefe & huber publishers. de vine, j.b., & morgan, b. (2020). the relationship between personality facets and burnout. sa journal of industrial psychology, 46, a1786. https://doi.org/10.4102/sajip.v46i0.1786 fetvadjiev, v.h., meiring, d., van de vijver, f.j.r., nel, j.a., & hill, c. (2015). the south african personality inventory (sapi): a culture-informed instrument for the country’s main ethnocultural groups. psychological assessment, 27(3), 827–837. https://doi.org/10.1037/pas0000078 goldberg, l.r. (1993). the structure of phenotypic personality traits. american psychologist, 48(1), 26–34. https://doi.org/10.1037/0003-066x.48.1.26 hogan, r., & hogan, j. (2007). hogan personality inventory (3rd ed.). 
hogan assessment system. retrieved from www.hoganpress.com john, r.k., xavier, b., waldmeier, a., meyer, a., & gaab, j. (2019). psychometric evaluation of the bfi-10 and the neo-ffi-3 in indian adolescents. frontiers in psychology, 10, 1057. https://doi.org/10.3389/fpsyg.2019.01057 judge, t.a., & zapata, c.p. (2015). the person–situation debate revisited: effect of situation strength and trait activation on the validity of the big five personality traits in predicting job performance. academy of management journal, 58(4), 1149–1179. https://doi.org/10.5465/amj.2010.0837 lorenzo-seva, u., & ten berge, j.m.f. (2006). tucker’s congruence coefficient as a meaningful index of factor similarity. methodology, 2(2), 57–64. https://doi.org/10.1027/1614-2241.2.2.57 mccrae, r.r., martin, t.a., & costa, p.t. (2005). age trends and age norms for the neo personality inventory-3 in adolescents and adults. assessment, 12(4), 363–373. https://doi.org/10.1177/1073191105279724 mcgheem, r.l., ehrier, d.j., & buckhalt, j.a. (2007). five-factor personality inventory – children (ffpi-c). austin, texas: pro-ed. mervielde, i., & de fruyt, f. (2002). assessing children’s traits with the hierarchical personality inventory for children. in b. de raad & m. perugini (eds.), big five assessment (pp. 129–142). göttingen: hogrefe & huber publishers. ones, d.s., & viswesvaran, c. (1996). bandwidth-fidelity dilemma in personality measurement for personnel selection. journal of organizational behavior, 17(6), 609–626. https://doi.org/10.1002/(sici)1099-1379(199611)17:6<609::aid-job1828>3.0.co;2-k parker, w.d., & stumpf, h. (1998). a validation of the five-factor model of personality in academically talented youth across observers and instruments. personality and individual differences, 25(6), 1005–1025. https://doi.org/10.1016/s0191-8869(98)00016-6 ramsay, l.j., taylor, n., de bruin, g.p., & meiring, d. (2008). the big five personality factors at work: a south african validation study. in j. 
deller (ed.), research contributions to personality at work (pp. 99–114). munich, germany: rainer hampp verlag. r core team. (2016). r: a language and environment for statistical computing. r foundation for statistical computing. retrieved from http://www.r-project.org/ revelle, w.r. (2017). psych: procedures for personality and psychological research. retrieved from https://cran.r-project.org/package=psych roberts, b.w., & delvecchio, w.f. (2000). the rank-order consistency of personality traits from childhood to old age: a quantitative review of longitudinal studies. psychological bulletin, 126(1), 3–25. https://doi.org/10.1037/0033-2909.126.1.3 rolland, j.p. (2002). the cross-cultural generalizability of the five-factor model of personality. in r.r. mccrae & j. allik (eds.), the five-factor model of personality across cultures (pp. 7–28). new york: springer us. https://doi.org/10.1007/978-1-4615-0763-5_2 shiner, r., & caspi, a. (2003). personality differences in childhood and adolescence: measurement, development, and consequences. journal of child psychology and psychiatry, 44(1), 2–32. https://doi.org/10.1111/1469-7610.00101 shl. (2009). opq32r user manual. washington: shl group. soto, c.j., & john, o.p. (2014). traits in transition: the structure of parent-reported personality traits from early childhood to early adulthood. journal of personality, 82(3), 182–199. https://doi.org/10.1111/jopy.12044 soto, c.j., john, o.p., gosling, s.d., & potter, j. (2008). the developmental psychometrics of big five self-reports: acquiescence, factor structure, coherence, and differentiation from ages 10 to 20. journal of personality and social psychology, 94(4), 718–737. https://doi.org/10.1037/0022-3514.94.4.718 soto, c.j., john, o.p., gosling, s.d., & potter, j. (2011). age differences in personality traits from 10 to 65: big five domains and facets in a large cross-sectional sample. journal of personality and social psychology, 100(2), 330–348. 
https://doi.org/10.1037/a0021717 soto, c.j., & tackett, j.l. (2015). personality traits in childhood and adolescence: structure, development, and outcomes. current directions in psychological science, 24(5), 358–362. https://doi.org/10.1177/0963721415589345 taylor, n. (2004). the construction of a south african five-factor personality questionnaire. unpublished master’s dissertation. johannesburg: rand afrikaans university. taylor, n. (2008). construct, item and response bias across cultures in personality measurement. unpublished doctoral dissertation. johannesburg: rand afrikaans university. taylor, n., & de bruin, g.p. (2006). basic traits inventory: technical manual. johannesburg: jopie van rooyen & partners. taylor, n., & de bruin, g.p. (2013). the basic traits inventory. in s. laher & k. cockroft (eds.), psychological assessment in south africa (pp. 232–243). johannesburg: wits university press. wu, k., lindsted, k.d., tsai, s.-y., & lee, j.w. (2008). chinese neo-pi-r in taiwanese adolescents. personality and individual differences, 44(3), 656–667. https://doi.org/10.1016/j.paid.2007.09.025

footnote 1. these factor pattern matrices can be obtained from the first author on request.

about the author(s)

martins c. nweke, department of physiotherapy, university of pretoria, pretoria, south africa
nalini govender, department of basic medical sciences, durban university of technology, durban, south africa
aderonke akinpelu, department of physiotherapy, university of ibadan, ibadan, nigeria
adesola ogunniyi, department of medicine, university of ibadan, ibadan, nigeria
nombeko mshunqane, department of physiotherapy, university of pretoria, pretoria, south africa

citation

nweke, m.c., govender, n., akinpelu, a., ogunniyi, a., & mshunqane, n. (2022).
reliability, minimum detectable change and sociodemographic biases of selected neuropsychological tests among people living with hiv in south-eastern nigeria. african journal of psychological assessment, 4(0), a84. https://doi.org/10.4102/ajopa.v4i0.84

original research

reliability, minimum detectable change and sociodemographic biases of selected neuropsychological tests among people living with hiv in south-eastern nigeria

martins c. nweke, nalini govender, aderonke akinpelu, adesola ogunniyi, nombeko mshunqane

received: 27 aug. 2021; accepted: 04 jan. 2022; published: 28 apr. 2022

copyright: © 2022. the author(s). licensee: aosis. this is an open access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

abstract

verification of the psychometric properties of neuropsychological (np) tests in each society of people living with hiv (plwhiv) will facilitate accurate classification of hiv-associated neurocognitive disorder. this study aimed to determine the reliability, minimum detectable change (mdc) and sociodemographic biases of selected np tests among plwhiv. the study took place at the hiv clinic of the university of nigeria teaching hospital, enugu. a total of 60 plwhiv were randomly recruited into two groups of 30 each. the first group was evaluated by two independent examiners (inter-rater) and the other by a single rater (intra-rater). the np tests utilised included the hopkins verbal learning test-revised (hvlt-r), controlled oral word association test (cowat), trail making test-a (tmt-a) and -b (tmt-b), and digit span test-forward (dst-f) and -backward (dst-b). we examined agreement using intra-class correlation (icc), standard error of measurement and mdc. we verified the influence of sociodemographic variables on test performance using the mann–whitney u-test and the kruskal–wallis test.
the hvlt-r delay recall (dr), tmt-a, tmt-b and cowat showed excellent inter-rater reliability with icc values of 0.83, 0.86, 0.78 and 0.89, respectively. the hvlt-r verbal learning (vl), dst-f and dst-b showed moderate inter-rater reliability with iccs of 0.4.99, 0.52 and 0.60, respectively. the hvlt-r-dr, tmt-a, dst-b and cowat showed excellent intra-rater reliability, with icc values of 0.76, 0.80, 0.84 and 0.97, respectively. the tmt-a, dst-f and dst-b were free from sociodemographic bias. the hvlt-r-dr, tmt-a, tmt-b, dst-f, dst-b and cowat are reliable candidate np tests for plwhiv in our setting.

keywords: hiv; neurocognitive disorder; neuropsychological assessment; reliability; nigeria.

introduction

human immunodeficiency virus (hiv)-associated neurocognitive disorder (hand) remains prevalent in the sub-saharan part of africa, especially in nigeria, which has the world’s second-highest hiv and aids incidence (awofala & ogundele, 2018; nweke, akinpelu, & ezema, 2019; yakasai et al., 2015). hiv targets the central nervous system (cns) and compromises the blood–brain barrier, causing hiv-infected microglia to infiltrate the cns and secrete neurotoxic viral proteins such as tat and proinflammatory cytokines, as well as disrupting neurogenesis, dysregulating cd4+ t cells and damaging synaptodendritic networks (cody & vance, 2016). these occurrences result in damage to specific brain structures and neural circuits, increasing the brain’s predisposition for acquiring subsequent neuropsychological (np) diseases, such as neurocognitive disorder (watkins & treisman, 2012). hand is a spectrum of disorders comprising hiv-associated dementia (had), mild neurocognitive disorder and asymptomatic neurocognitive impairment (antinori et al., 2007).
memory, learning, information processing speed, executive function, attention and/or concentration and verbal fluency are the most frequently impaired cognitive abilities in people living with hiv (plwhiv) (yakasai et al., 2015). the widespread use of antiretroviral therapy (art) has reduced the burden of had; nevertheless, the prevalence of mild but limiting phenotypes of hand remains high (yakasai et al., 2015). therefore, proper screening tools are necessary to ensure effective evaluation and prompt initiation of treatment for hand.

until now, in sub-saharan africa (ssa), brief measures such as the hiv dementia scale, the international hiv dementia scale, the montreal cognitive assessment test, neuroscreen and the cogstate brief battery have dominated neurocognitive screening among plwhiv (mwangala, newton, abas, & abubakar, 2019). their simplicity of administration is attractive, but they are characterised by diagnostic weaknesses, such as an inability to detect asymptomatic or mild cognitive impairment, that make them unacceptable when pursuing a definitive diagnosis of hand (singh et al., 2010). for the diagnosis of hand, comprehensive np battery tests are the tools of choice (robertson, liner, & heaton, 2009). they are the most useful instruments for identifying and classifying the impact of hiv or aids on the cns (robertson et al., 2009). in a resource-limited environment where sophisticated laboratory and neuroimaging techniques are not accessible, the application of np screening to characterise neurocognitive functioning among plwhiv is essential to successful diagnosis and treatment of hand (robertson et al., 2009). previously, the utility of the np battery tests in ssa was limited because they require instrumentation that is not common in the region. their administration also demands previous clinical or research experience among people living with hand (nweke, mshunqane, govender, & akinpelu, 2021; singh et al., 2010).
in nigeria, the use of np tests in the diagnosis of hand has gained momentum in recent times (jumar et al., 2017; robertson et al., 2016). the np tests are the gold standard for the diagnosis of hand but they are cumbersome and culture-specific (fernandez & marcopulos, 2008), and this constitutes one of the well-known challenges faced by neuro-hiv researchers and clinicians (yeatesh & taylor, 2011). they often require that two or more raters administer them to participants to reduce their burden (andrews, janzen, & saklofske, 2001). this is especially true of clinical trials or normative studies where many participants undergo screening following the diagnostic criteria stipulated by the american academy of neurology’s research nomenclature and diagnostic method (antinori et al., 2007). attrition is a common observation in clinical trials and results from premature discontinuation and missed or flawed assessments (sridhara, mandrekar, & dodd, 2013). the cumbersomeness of np testing may push the attrition rate above the acceptable attrition margin (20%) (nunan, aronson, & bankhead, 2018), thus undermining the feasibility of a clinical trial or observational study requiring large-scale testing.

in the research setting, where the same assessor or other assessors may follow up treatment outcomes, there is a need to examine the reliability of commonly used np tests (mchugh, 2012). the significance of rater reliability stems from the fact that it indicates the degree to which the data obtained are accurate representations of the variables being examined (mchugh, 2012). in clinical research, reliability is most commonly verified using intra-class correlation (icc) (lee et al., 2013). however, the icc is a measure of relative reliability only, which is a weakness when it is used alone to estimate reliability and underscores the need to also estimate the minimum detectable change (mdc) (lee et al., 2013).
the mdc is a precise measure of measurement error and consistency (lee et al., 2013). based on the data available to us, no study has examined the psychometric properties of np tests in the nigerian context. hence, this study aimed at examining the reliability and mdc of selected np tests and the effects of demographic characteristics.

materials and methods

participants

the study was a cross-sectional study conducted among hiv-positive adults aged between 18 and 64 years receiving art at the hiv clinic of the university of nigeria teaching hospital, ituku-ozalla, enugu, nigeria. the study took place between june and august 2020. the study constitutes one of the preliminary steps undertaken in pursuit of a clinical trial investigating the ‘effects of an aerobic exercise programme on neurocognitive disorder in nigeria’. owing to the elaborateness of the np evaluations, the study took place in two batches using different study samples. this saved patients’ waiting time, as many of the participants visited the clinic from distant locations, and reduced contact time with patients in the wake of coronavirus disease 2019 (covid-19). simple random sampling by balloting (3 ‘yes’:1 ‘no’) was used to select study participants. those who picked ‘yes’ from the ballot box were recruited into the study and consecutively assigned to raters who carried out the independent evaluation. the first batch involved a sample of 30 hiv-positive adults recruited to examine the inter-rater reliability between a neurological physiotherapist and a clinical psychologist, while the second batch, which involved a sample of 30 hiv-positive adults, examined the intra-rater reliability. the study population was plwhiv on art. in line with viechtbauer et al. (2015), we estimated the sample size for this pilot study using a pilot sample size calculator available at http://www.pilotsamplesize.com.
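the pilot sample size approach cited above rests on a simple relation: the number of participants needed to observe, at least once, a problem that occurs with probability π, with confidence γ, is n = ln(1 − γ) / ln(1 − π). the python sketch below illustrates that relation only. the function name and the example inputs (a 10% problem probability and 95% confidence) are assumptions chosen for illustration, and this is not the online calculator's own code.

```python
import math


def pilot_sample_size(problem_probability: float, confidence: float) -> float:
    """Unrounded number of participants needed to see, at least once,
    a problem occurring with `problem_probability`, with the given confidence.

    Uses n = ln(1 - confidence) / ln(1 - problem_probability).
    """
    if not 0.0 < problem_probability < 1.0:
        raise ValueError("problem_probability must lie strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    return math.log(1.0 - confidence) / math.log(1.0 - problem_probability)


# Illustrative inputs (assumed): 10% problem probability, 95% confidence.
n_unrounded = pilot_sample_size(0.10, 0.95)
n_rounded_up = math.ceil(n_unrounded)  # conservative choice: round up
print(f"unrounded n = {n_unrounded:.2f}, rounded up = {n_rounded_up}")
```

rounding conventions differ between tools, so a reported minimum may differ by one from the rounded-up value computed here.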
using a hand prevalence rate of 21.5% (yusuf et al., 2014) and an acceptable withdrawal or incomplete assessment rate of 10%, we required a minimum sample size of 28 to detect the problem with 95% confidence. the inclusion criteria for the study were being hiv positive, being an adult between 18 and 65 years, being on art for at least three months, having formal (primary six) education with an ability to use english and having the capacity for consent. for the inter-rater examination, we used consecutive assessment to assign participants to the raters. we excluded individuals above 65 years of age, smokers, alcohol-dependents, substance abusers and individuals with cardiorespiratory disease (heart attack, asthma or chronic obstructive pulmonary disease [copd]). we also excluded individuals with a history of focal neurological deficit, traumatic brain injury with loss of consciousness, stroke, psychiatric illness including depression, opportunistic infection such as tb, candidiasis or hepatitis, a recorded blood pressure over 140/90 mmhg, structured physical activity or use of cognitive-enhancing medication. exclusion criteria were determined through the use of self-reports obtained during recruitment. blood pressure, depression, alcohol abuse and substance abuse were measured using appropriate instruments.

instruments

neurocognitive evaluations were undertaken with the aid of np tests. the np tests constitute the gold standard instrument in screening and diagnosis of hand (yakasai et al., 2015). the frascati criteria stipulate that np evaluation for plwhiv should cover at least five ability domains commonly impaired among plwhiv, including verbal learning (vl), memory, working memory, attention, abstraction and/or executive function, information processing speed and verbal fluency (antinori et al., 2007). in this study, we examined four np tests covering seven ability domains.
they include the hopkins verbal learning test-revised (hvlt-r), with test-retest reliability indices of 0.537–0.818 (o’neil-pirozzi, goldstein, strangman, & glenn, 2012), the trail making test (tmt), with an icc of 0.7–0.98 (salthouse, 2011), the digit span test (dst), with a test-retest reliability index of 0.7–0.78 (groth-marnat & baker, 2003) and the controlled oral word association test (cowat; f-a-s letter fluency), with an excellent inter-rater reliability index ≥ 0.9 (ross et al., 2007). these tests have been used in african and nigerian settings (singh et al., 2010; yakasai et al., 2015) and in a recent clinical trial on hand (towe, puja patel, & meade, 2017). the selection of these tests was based mainly on their utility in african settings and their ease of administration.

procedure

a clinical psychologist (evaluator 1) was trained by the lead investigator (evaluator 2), who is familiar with np testing, and an acceptable degree of agreement was established. neuropsychological testing was carried out by the two independent evaluators in the first batch of the study, with each subject being assessed twice a day. only evaluator 1 completed np testing in the second batch, with each individual being assessed twice, one assessment after the other. data from the first evaluation by evaluator 1 and data from the second evaluation by evaluator 2, completed on the same day, were utilised for the inter-rater reliability evaluation (van lummel et al., 2016). data from the first and second measurements of evaluator 1 were used to assess intra-rater reliability. every effort was made to ensure that participants who were being tested stayed away from the consulting bench to avoid contamination.

data analysis

exploratory statistics showed that some data sets were not normally distributed and log transformation did not improve the distribution; hence we did not apply log transformation, as the non-normality could reflect sociodemographic biases. data were summarised using descriptive statistics.
we used the icc to verify the rater reliability. specifically, inter- and intra-rater reliability were examined using icc(2,1) and icc(1,1), respectively. both icc(2,1) and icc(1,1) were based on absolute agreement in a two-way (mixed-effects) repeated-measures analysis of variance model. according to the literature, the icc values were graded as follows: weak (icc < 0.40), moderate (icc between 0.40 and 0.75) and excellent (icc > 0.75) (sedrez et al., 2016). we calculated the standard error of measurement (sem) and estimated the mdc based on a 95% confidence interval (ci). an mdc percentage of < 30% was deemed acceptable and one of < 10% excellent (lee et al., 2013). proportional bias was examined following bland–altman procedures. the mann–whitney u test and the kruskal–wallis test were employed to test the putative impact of sociodemographic factors on np test performance. the statistical package for social sciences version 22 (ibm corp., 2012) was used, with an alpha set at 0.05.

ethical considerations

this review was approved by the research ethics committee of the faculty of health sciences, university of pretoria (ethics reference number: 152/2020), which complies with the international conference on harmonisation-good clinical practice (ich-gcp) guidelines.

results

a total of 60 plwhiv (30 each for the inter- and intra-rater reliability groups) with a mean age of 43.5 ± 8.2 years took part in the study. participants in both groups were comparable concerning the sampled clinical and sociodemographic characteristics. in both groups, there were twice as many females as males. seventy per cent of the participants had secondary to tertiary education (table 1).

table 1: clinical and sociodemographic variables.

evaluation of the inter-rater reliability showed moderate to excellent reliability, with limited to an acceptable level of random measurement error.
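the icc forms and grading bands named in the data analysis can be illustrated with a short python sketch. it assumes the conventional single-measure definitions of icc(1,1) and icc(2,1) computed from analysis-of-variance mean squares, which may differ in detail from the spss procedure the authors ran. the function names and the small ratings table are hypothetical and are not the study's data.

```python
def icc_single_measure(scores):
    """Return (icc_1_1, icc_2_1) for a table of ratings.

    `scores` is a list of rows, one row per participant, each row holding
    the k ratings that participant received (one column per rater/occasion).
    """
    n, k = len(scores), len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least 2 participants and 2 complete ratings each")

    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares for participants (rows), raters (columns) and residual.
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_within = ss_total - ss_rows
    ss_error = ss_within - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    icc_1_1 = (ms_rows - ms_within) / (ms_rows + (k - 1) * ms_within)
    icc_2_1 = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    return icc_1_1, icc_2_1


def grade_icc(icc):
    """Grading bands stated in the text: weak < 0.40, moderate 0.40-0.75, excellent > 0.75."""
    if icc < 0.40:
        return "weak"
    if icc <= 0.75:
        return "moderate"
    return "excellent"


# Hypothetical ratings: 6 participants (rows) scored by 4 raters (columns).
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc_one_way, icc_two_way = icc_single_measure(ratings)
print(f"ICC(1,1) = {icc_one_way:.3f} ({grade_icc(icc_one_way)})")
print(f"ICC(2,1) = {icc_two_way:.3f} ({grade_icc(icc_two_way)})")
```

applied to the coefficients reported in this article, `grade_icc` places 0.83 and 0.89 in the excellent band and 0.52 and 0.60 in the moderate band.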
specifically, the hvlt-r-dr domain, tmt-a, tmt-b and cowat showed excellent inter-rater reliability with icc values of 0.83, 0.86, 0.78 and 0.89, respectively. three tests, namely the hvlt-r-vl, dst-f and dst-b, showed moderate inter-rater reliability with iccs of 0.4.99, 0.52 and 0.60, respectively. the result shows that all the tests except hvlt-r possessed an mdc percentage within the predefined limit of acceptance (< 30%). there was no proportional bias throughout the inter-rater evaluations (p > 0.05) (table 2).

table 2: result for inter-rater reliability showing intra-class correlation.

assessment of intra-rater reliability showed that all the tests except hvlt-r-vl possessed moderate to excellent intra-rater reliability. four tests, namely hvlt-r-dr, tmt-a, dst-b and cowat, showed excellent intra-rater reliability, with icc values of 0.76, 0.80, 0.84 and 0.97, respectively. the result shows that all the tests except hvlt-r possessed an mdc percentage within the predefined limit of acceptance (< 30%). the dst-f, dst-b and cowat had an mdc percentage of less than 10%. three tests, namely the hvlt-r-vl, dst-f and dst-b, showed moderate intra-rater reliability with iccs of 0.4.99, 0.52 and 0.60, respectively. intra-rater proportional bias was observed in hvlt-r-vl (β = –0.475; p = 0.011) and hvlt-r-dr (β = –0.371; p = 0.047) (table 3).

table 3: result for intra-rater reliability showing intra-class correlation.

examination of the sociodemographic biases of the np tests revealed that three of the seven tests, namely the tmt-a, dst-f and dst-b, were free from any form of sociodemographic bias, while four tests, namely the hvlt-r-vl, hvlt-r-dr, tmt-b and cowat, possessed at least one form of sociodemographic bias. specifically, the inter-rater performance of the cowat was significantly related to age (rho = 0.36; p = 0.048).
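the proportional bias coefficients (β) reported above come from bland–altman procedures. a common way to obtain such a coefficient, assumed here because the article does not spell out the computation, is to regress the difference between two measurements on their pairwise mean: a slope near zero means the disagreement does not change across the measurement range. the python sketch below uses made-up scores and a hypothetical function name, and it leaves out the significance test of the slope.

```python
from statistics import mean, stdev


def bland_altman(first, second):
    """Bias, 95% limits of agreement and proportional-bias slope for paired scores.

    The slope is the least-squares slope of (first - second) regressed on the
    pairwise mean of the two measurements.
    """
    if len(first) != len(second) or len(first) < 3:
        raise ValueError("need two equally long series with at least 3 pairs")

    diffs = [a - b for a, b in zip(first, second)]
    means = [(a + b) / 2 for a, b in zip(first, second)]

    bias = mean(diffs)                  # systematic difference between measurements
    half_width = 1.96 * stdev(diffs)    # half-width of the 95% limits of agreement

    centre = mean(means)
    sxx = sum((m - centre) ** 2 for m in means)
    if sxx == 0:
        raise ValueError("pairwise means are all identical; slope is undefined")
    slope = sum((m - centre) * (d - bias) for m, d in zip(means, diffs)) / sxx

    return {
        "bias": bias,
        "lower_limit": bias - half_width,
        "upper_limit": bias + half_width,
        "slope": slope,
    }


# Made-up scores: a second measurement that is uniformly 1 point higher shows
# a fixed bias but no proportional bias; one that is 10% higher shows both.
first_scores = [20.0, 25.0, 30.0, 35.0, 40.0]
constant_shift = bland_altman(first_scores, [x + 1.0 for x in first_scores])
scaled = bland_altman(first_scores, [1.1 * x for x in first_scores])
print(constant_shift["slope"], scaled["slope"])
```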
the intra-rater performance of both the hvlt-r-vl (p = 0.001) and hvlt-r-dr (p = 0.010) was affected by sex, with females showing better performance than their male counterparts. the inter-rater performance of the hvlt-r-vl (p = 0.046) and the intra-rater performance of the tmt-b (p = 0.003) showed education bias, with individuals with tertiary and secondary education showing better performance (p = 0.007) than those with primary education (table 4).

table 4: influence of sociodemographic characteristics on test score of selected neuropsychological tests.

discussion

all the np tests showed moderate to excellent inter-rater reliability, with acceptable levels of measurement error as determined by the mdc percentage. to facilitate the right interpretation of the results obtained during clinical or research follow-up, it is essential to note the variability inherent in the measurement as defined by the sem. the sem shows that the measurement’s precision varied from 0.2 to 5.3. we can consider these values clinically acceptable, showing that np evaluation using the selected tests can be reliably conducted by two or more raters and yield similar results. by implication, np evaluations using these tests can be undertaken by two or more raters with similar experience or training during a clinical or research follow-up. there is a paucity of literature on the inter-rater reliability of the selected np tests among plwhiv, although they remain the gold standard instrument in the diagnosis of neurocognitive disorders among plwhiv (antinori et al., 2007). the finding of this study corroborates that of singh et al. (2010), in which an icc of 0.89 was obtained between a psychiatrist and a psychologist. similarly, fals-stewart (1992) reported excellent inter-rater reliability with the use of tmt-a and -b, although in an hiv-seronegative population.
using two or more evaluators in the assessment of neurocognitive performance is especially important in normative studies and clinical trials with many participants. the mdc percentage for the seven tests was within the predefined limit of acceptance, thus suggesting that the tests possess good sensitivity among plwhiv in nigeria. this provides clinicians and researchers with a baseline for the evaluation of the impact of a clinical intervention on np performance of plwhiv in nigeria. there was no inter-rater proportional bias, indicating that measurement approaches agree evenly over the measurement range. this means that the boundaries of the agreement are unaffected by the measurement itself (bland & altman, 1999).

regarding intra-rater reliability, all but the hvlt-r-vl showed good to excellent intra-rater reliability, with icc values between 0.65 and 0.97. the mdc percentage values showed limited measurement error except for the hvlt-r-vl, where the measurement error was greater than the predefined mark of 30%. the mdc values for the hvlt-r-vl, hvlt-r-dr, tmt-a, tmt-b, dst-f, dst-b and cowat were approximately eight words, one word, 18 s, 23 s, one digit, one digit and one word, respectively. the poor intra-rater reliability of the hvlt-r-vl reflects the considerable level of measurement error, which was the result of interruptions experienced during evaluation. in this study, we assessed participants who were waiting to collect their drugs in an art clinic. occasionally, a few participants’ attention was drawn away during np evaluation, and this may constitute a bias for the hvlt-r-vl. for these participants, we ensured, most of the time, that they completed the test at hand before responding to the call. we also emphasised the need to pay attention during the test. however, the poor intra-rater reliability of the hvlt-r-vl suggests that the hvlt-r-vl is sensitive to minute distractions.
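the mdc figures discussed above are derived from the sem. the formulas themselves are not shown in this text, so the python sketch below assumes the conventional definitions used in test-retest studies: sem = sd × √(1 − icc), mdc at 95% confidence = 1.96 × √2 × sem, and mdc percentage = mdc / mean × 100. the function names and the input values are illustrative and are not taken from the study's tables. the grading thresholds (< 10% excellent, < 30% acceptable) are those stated in the data analysis.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement, assuming SEM = SD * sqrt(1 - ICC)."""
    if sd < 0 or not 0.0 <= icc <= 1.0:
        raise ValueError("sd must be non-negative and icc must lie in [0, 1]")
    return sd * math.sqrt(1.0 - icc)


def mdc_95(sem):
    """Minimum detectable change at 95% confidence, assuming 1.96 * sqrt(2) * SEM."""
    return 1.96 * math.sqrt(2.0) * sem


def mdc_percent(mdc, mean_score):
    """MDC expressed as a percentage of the mean score."""
    if mean_score == 0:
        raise ValueError("mean_score must be non-zero")
    return mdc / mean_score * 100.0


def grade_mdc_percent(percent):
    """Thresholds stated in the text: < 10% excellent, < 30% acceptable."""
    if percent < 10.0:
        return "excellent"
    if percent < 30.0:
        return "acceptable"
    return "not acceptable"


# Illustrative values (assumed): SD of 5 points, ICC of 0.84, mean score of 50.
sem = sem_from_icc(sd=5.0, icc=0.84)
mdc = mdc_95(sem)
percent = mdc_percent(mdc, mean_score=50.0)
print(f"SEM = {sem:.2f}, MDC95 = {mdc:.2f}, MDC% = {percent:.1f} ({grade_mdc_percent(percent)})")
```

the sketch makes the point raised in the introduction concrete: a high icc can still coexist with a large mdc when scores vary widely between participants, which is why both are reported.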
regarding the effects of sociodemographic characteristics on the selected np tests, previous studies have been inconsistent in their support of age as a variable influencing verbal fluency performance; our finding supports the postulation that age affects verbal fluency performance. to nullify the potential age-related bias, verbal fluency, when used as a candidate np test, requires the use of age-matched hiv-negative controls. this agrees with the findings of barry, bates and labouvie (2008). however, contrary to barry and colleagues, our study showed that older adults performed better than younger adults. although verbal ability is a crystallised ability that does not deteriorate with age or minor brain dysfunction, phonemic fluency necessitates executive ability, specifically the ability to initiate and maintain effort, as well as the ability to organise information for retrieval, both of which are abilities that are sensitive to nuanced cerebral dysfunction and ageing (burke, crowder, hagan-burke, & zou, 2009; henry & beatty, 2006; plumet, gil, & gaonac’h, 2005). hence, the better cowat performance of the older adults relative to younger adults found in this study could be because of the small sample size, with a skewed age distribution: older adults made up only 10% of the participants. although education is a potential predictor of cowat performance, in our study, educational level and sex did not influence verbal fluency. this suggests that education- and sex-adjusted norms may not be necessary for this measure in our setting.

the finding that age and sex affected the performance of the hvlt-r-dr is consistent with the result obtained by vanderploeg et al. (2000), in which age and sex had a significant impact on the performance of the hvlt-r.
however, unlike vanderploeg and colleagues, our study showed that educational level did not influence participants’ performance on the hvlt-r, suggesting that its selection as a candidate np test for plwhiv in our setting may not require educational consideration. it is likely that, unlike in vanderploeg et al. (2000), our sample size was insufficient to identify an influence of education on the hvlt-r or that educational differences were not correctly reflected by years of education. although feng et al. (2014) found that being single was linked to a higher risk of cognitive impairment than being married, our study is the first to report a significant effect of marital status on hvlt-r performance among plwhiv. however, contrary to feng et al. (2014), this study revealed better performance among single and widowed participants than among married participants. the discrepancy could be because of the difference in the study population, with the former conducted amongst elderly chinese.

in this study, the performance of tmt-b was influenced by sex. this agrees with the findings of singh et al. (2010), in which gender was associated with tmt performance. it implies that the use of tmt-b in the diagnosis of hand among plwhiv must be based on sex-matched norms. the finding that education influenced the tmt-b in this study is consistent with mitrushina, boone, razani and d’elia (1999) and tombaugh (2004) but contrary to singh et al. (2010), a south african-based study among plwhiv. this discrepancy may point to variation in sample size and in how well years of education reflect experience across the societies. just like other np tests, the effect of sociodemographic characteristics on dst performance is controversial (choi et al., 2014).
some studies reported significantly higher dst scores in females than in males (ostrosky-solis & azucena, 2006; singh et al., 2010), while others showed a minor or non-existent gender effect, implying that no gender-related modifications to the normative data are required (anstey, matters, brown, & lord, 2000; pena-casanova et al., 2009). our study showed that dst performance was not subject to any sex, age or educational bias, thus indicating that sociodemographically matched norms may not be necessary when it is used as a candidate np test to aid hand diagnosis. overall, the imperfect agreement between the findings of this study and those of previous studies supports the postulation that the use of cross-cultural data for an np test will lead to errors in hand classification (fernandez & marcopulos, 2008).

the limitations of the study include the use of unequal sample categories, practice effect and the distractions encountered during np evaluation; the healthcare professional drew the attention of some participants during np assessment. the study draws its strength from being the first of its kind to report reliability and mdc for seven cognitive ability domains relevant to plwhiv in nigeria.

the hvlt-r-dr, tmt-a, tmt-b, dst-f, dst-b and cowat are reliable candidate np tests for plwhiv. in our setting, the tmt-a, dst-f and dst-b are free from demographic bias and hence may not require demographically adjusted norms, while the tmt-b, hvlt-r and cowat exhibit at least one form of sociodemographic bias. this may require that appropriate adjustments be made when they are selected as candidate np tests in pursuit of hand diagnosis. extra care should be taken to minimise distraction when administering the hvlt-r. further studies with larger sample sizes are recommended to establish normative scores for the selected np tests and to examine the effect of socioeconomic status on np test performance.
acknowledgements

the authors would like to express their profound gratitude to the clinical psychologist, mrs jane nwodo, and the research assistant, maryjane ukwuoma, who converted the citations and list of references to apa format.

competing interests

the authors declare that they have no financial or personal relationship that may have inappropriately influenced them in writing this article.

authors’ contributions

m.c.n. and n.m. conceived of the presented idea. m.c.n. carried out the research under the supervision of n.m., n.g., a.a. and a.o. m.c.n. carried out statistical analysis and wrote the first draft of the manuscript. m.c.n., n.m., n.g., a.a. and a.o. revised the manuscript for intellectual content.

funding information

the authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: this work was supported by the national student financial aid scheme (nsfas) via the university of pretoria [grant number: 1338].

data availability

raw data were generated at a nigerian tertiary health institution. derived data supporting the findings of this study are available upon special request from the corresponding author, m.c.n.

disclaimer

the views and opinions expressed in this article are those of the authors and do not necessarily reflect the official policy or position of any affiliated agency of the authors.

references

andrews, j., janzen, h.l., & saklofske, d.h. (2001). handbook of psychoeducational assessment ability, achievement, and behavior in children educational psychology (pp. 415–450). san diego, ca: academic press. anstey, k.j., matters, b., brown, a.k., & lord, s.r. (2000). normative data on neuropsychological tests for very old adults living in retirement villages and hostels. the clinical neuropsychologist, 14(3), 309–317. https://doi.org/10.1076/1385-4046(200008)14:3;1-p;ft309 antinori, a., arendt, g., becker, j.t., brew, b.j., byrd, d.a., cherner, m., clifford, d.b., … wojna, v.e.
(2007). updated research nosology for hiv-associated neurocognitive disorders. neurology, 69(18), 1789–1799. https://doi.org/10.1212/01.wnl.0000287431.88658.8b awofala, a.a., & ogundele, o.e. (2018). hiv epidemiology in nigeria. saudi journal of biological sciences, 25(4), 697–703. https://doi.org/10.1016/j.sjbs.2016.03.006 barry, d., bates, m.e., & labouvie, e. (2008). fas and cfl forms of verbal fluency differ in difficulty: a meta-analytic study. applied neuropsychology, 15(2), 97–106. https://doi.org/10.1080/09084280802083863 bland, j.m., & altman, d.g. (1999). measuring agreement in method comparison studies. statistical methods in medical research, 8(2), 135–160. https://doi.org/10.1191/096228099673819272 burke, m.d., crowder, w., hagan-burke, s., & zou, y. (2009). a comparison of two path models for predicting reading fluency. remedial and special education, 30(2), 84–95. https://doi.org/10.1177/0741932508315047 choi, h.j., lee, d.y., seo, e.h., jo, m.k., sohn, b.k., choe, y.m., byun, m.s., … woo, j.i. (2014). a normative study of the digit span in an educationally diverse elderly population. psychiatry investigation, 11(1), 39–43. cody, s.l., & vance, d.e. (2016). the neurobiology of hiv and its impact on cognitive reserve: a review of cognitive interventions for an aging population. neurobiology of disease, 92 (pt b), 144–156. https://doi.org/10.1016/j.nbd.2016.01.011 fals-stewart, w. (1992). an interrater reliability study of the trail making test (parts a and b). perceptual & motor skills, 74(1), 39–42. https://doi.org/10.2466/pms.1992.74.1.39 feng, l., ng, x., yap, p., li, j., lee, t., håkansson, k., & kua, e. (2014). marital status and cognitive impairment among community-dwelling chinese older adults: the role of gender and social engagement. dementia and geriatric cognitive disorders extra, 4(3), 375–384. https://doi.org/10.1159/000358584 fernandez, a.l., & marcopulos, b.a. (2008). 
a comparison of normative data for the trail making test from several countries: equivalence of norms and considerations for interpretation 1. scandinavian journal of psychology, 49(3), 239–246. https://doi.org/10.1111/j.1467-9450.2008.00637.x groth-marnat, g., & baker, s. (2003). digit span as a measure of everyday attention: a study of ecological validity. perceptual and motor skills, 97(3 pt 2), 1209–1218. https://doi.org/10.2466/pms.2003.97.3f.1209 henry, j.d., & beatty, w.w. (2006). verbal fluency deficits in multiple sclerosis. neuropsychologia, 44(7), 1166–1174. https://doi.org/10.1016/j.neuropsychologia.2005.10.006 ibm corp. (2012). ibm spss statistics for windows, version 21.0. armonk, ny: ibm corp. jumar, j., sunshine, s., ahmed, h., el-kamary, s.s., magder, l., hungerford, l., burdo, t., … royal 3rd, w. (2017). peripheral blood lymphocyte hiv dna levels correlate with hiv associated neurocognitive disorders in nigeria. journal of neurovirology, 23(3), 474–482. https://doi.org/10.1007/s13365-017-0520-5 lee, p., liu, c., fan, c., lu, c., lu, w., & hsieh, c. (2013). the test-retest reliability and the minimal detectable change of the purdue pegboard test in schizophrenia. journal of the formosan medical association, 112(6), 332–337. https://doi.org/10.1016/j.jfma.2012.02.023 mchugh, m.l. (2012). interrater reliability: the kappa statistic. biochemia medica (zagreb), 22(3), 276–282. https://doi.org/10.11613/bm.2012.031 mitrushina, m.n., boone, k.l., razani, j., & d’elia, l. (1999). handbook of normative data for neuropsychological assessment (2nd ed.). new york, ny: oxford university press. mwangala, p.n., newton, c.r., abas, m., & abubakar, a. (2019). screening tools for hiv-associated neurocognitive disorders among adults living with hiv in sub-saharan africa: a scoping review. aas open research, 1, 28. https://doi.org/10.12688/aasopenres.12921.2 nunan, d., aronson, j., & bankhead, c. (2018). catalogue of bias: attrition bias. 
bmj evidence-based medicine, 23(1), 21–22. https://doi.org/10.1136/ebmed-2017-110883 nweke, m.c., akinpelu, a.o., & ezema, c.i. (2019). variation in spatio-temporal gait parameters among patients with hiv-related neurocognitive impairment. indian journal of physiotherapy and occupational therapy, 13(4), 186–191. https://doi.org/10.5958/0973-5674.2019.00158.8 nweke, m.c., mshunqane, n., govender, n., & akinpelu, o.a. (2021). physiological effects of physical activity on neurocognitive function in people living with hiv: a systematic review of intervention and observational studies. african journal for physical activity and health sciences, 27(1), 101–122. https://doi.org/10.37597/ajphes.2021.27.1.8 o’neil-pirozzi, t.m., goldstein, r., strangman, g.e., & glenn, m.b. (2012). test-re-test reliability of the hopkins verbal learning test-revised in individuals with traumatic brain injury. brain injury, 26(12), 1425–1430. https://doi.org/10.3109/02699052.2012.694561 ostrosky-solís, f., & azucena, l. (2006). digit span: effect of education and culture. international journal of psychology, 41(5), 333–341. https://doi.org/10.1080/00207590500345724 pena-casanova, j., quinones-ubeda, s., quintana-aparicio, m., aguilar, m., badenes, d., molinuevo, j.l., torner, l., … blesa, r. (2009). spanish multicenter normative studies (neuronorma project): norms for verbal span, visuospatial span, letter and number sequencing, trail making test, and symbol digit modalities test. archives of clinical neuropsychology, 24(4), 321–341. https://doi.org/10.1093/arclin/acp038 plumet, j., gil, r., & gaonac’h, d. (2005). neuropsychological assessment of executive functions in women: effects of age and education. neuropsychology, 19(5), 566–577. https://psycnet.apa.org/doi/10.1037/0894-4105.19.5.566 robertson, k., jiang, h., evans, s.r., marra cm, berzins, b, hakim, j., sacktor, n., … walawander, a. (2016). 
international neurocognitive normative study: neurocognitive comparison data in diverse resource-limited settings: aids clinical trials group a5271. journal of neurovirology, 22(4), 472–478. https://doi.org/10.1007/s13365-015-0415-2 robertson, k., liner, j., & heaton, r. (2009). neuropsychological assessment of hiv-infected populations in international settings. neuropsychology review, 19, 232–249. https://doi.org/10.1007/s11065-009-9096-z ross, t.p., calhoun, e., cox, t., wenner, c., kono, w., & pleasant, m. (2007). the reliability and validity of qualitative scores for the controlled oral word association test. archives of clinical neuropsychology: the official journal of the national academy of neuropsychologists, 22(4), 475–488. https://doi.org/10.1016/j.acn.2007.01.026 salthouse, t. (2011). what cognitive abilities are involved in trail-making performance? intelligence, 39, 222–232. https://doi.org/10.1016/j.intell.2011.03.001 sedrez, j.a., candotti, c.t., rosa, m.i.z., medeiros, f.s., marques, m.t., & loss, j.f. (2016). test-retest, inter-and intra-rater reliability of the flexicurve for evaluation of the spine in children. brazilian journal of physical therapy, 20(2), 142–147. https://doi.org/10.1590/bjpt-rbf.2014.0139 singh, d., joska, j.a., goodkin, k., lopez, e., myer, l., paul, r.h., … sunpath, h. (2010). normative scores for a brief neuropsychological battery for the detection of hiv-associated neurocognitive disorder (hand) among south africans. bmc research notes, 3, 28. https://doi.org/10.1186/1756-0500-3-28 sridhara, r., mandrekar, s.j., & dodd, l.e. (2013). missing data and measurement variability in assessing progression-free survival endpoint in randomized clinical trials. clinical cancer research, 19(10), 2613–2620. https://doi.org/10.1158/1078-0432.ccr-12-2938 tombaugh, t.n. (2004). trail making test a and b: normative data stratified by age and education. archives of clinical neuropsychology, 19(2), 203–214. 
https://doi.org/10.1016/s0887-6177(03)00039-8 towe, s.l., puja patel, b.a., & meade, c.s. (2017). the acceptability and potential utility of cognitive training to improve working memory in persons living with hiv: a preliminary randomized trial. journal of the association of nurses in aids care, 28(4), 633–643. https://doi.org/10.1016/j.jana.2017.03.007 van lummel, r.c., walgaard, s., hobert, m.a., maetzler, w., van dieën, j.h., galindo-garre, f., & terwee, c. (2016). intra-rater, inter-rater and test-retest reliability of an instrumented timed up and go (itug) test in patients with parkinson’s disease. plos one, 11(3), e0151881. https://doi.org/10.1371/journal.pone.0151881 vanderploeg, r.d., schinka, j.a., jones, t., small, b.j., graves, a.b., & mortimer, j.a. (2000). elderly norms for the hopkins verbal learning test-revised. the clinical neuropsychologist, 14(3), 318–324. https://doi.org/10.1076/1385-4046(200008)14:3;1-p;ft318 viechtbauer, w., smits, l., kotz, d., budé, l., spigt, m., serroyen, j., & crutzen, r. (2015). a simple formula for the calculation of sample size in pilot studies. journal of clinical epidemiology, 68(11), 1375–1379. https://doi.org/10.1016/j.jclinepi.2015.04.014 watkins, c.c., & treisman, g.j. (2012). neuropsychiatric complications of aging with hiv. journal of neurovirology, 18(4), 277–290. https://doi.org/10.1007/s13365-012-0108-z yakasai, a.m., gudaji, m.i., muhammad, h., ibrahim, a., owolabi, l.f., ibrahim, d.a., babashani, m., … habib, a.g. (2015). prevalence and correlates of hiv-associated neurocognitive disorders (hand) in northwestern nigeria. neurology research international, 2015, 486960. https://doi.org/10.1155/2015/486960 yeatesh, k.w., & taylor, g. (2001). neuropsychological assessment of children. in handbook of psychoeducational assessment, ability, achievement, and behavior in children educational psychology (415–450).
retrieved from https://doi.org/10.1016/b978-012058570-0/50016-1 yusuf, a.j., hassan, a., mamman, a.i., muktar, h.m., suleiman, a.m., & baiyewu, o. (2017). prevalence of hiv-associated neurocognitive disorder (hand) among patients attending a tertiary health facility in northern nigeria. journal of the international association of providers of aids care, 16(1), 48–55. https://doi.org/10.1177/2325957414553839

original research establishing the content validity of an online depression screening tool for south africa tasneem hassem about the author(s): tasneem hassem, department of psychology, faculty of human and community development, university of the witwatersrand, johannesburg, south africa citation: hassem, t. (2021). establishing the content validity of an online depression screening tool for south africa. african journal of psychological assessment, 3(0), a62. https://doi.org/10.4102/ajopa.v3i0.62 received: 30 may 2021; accepted: 17 aug. 2021; published: 26 oct. 2021 copyright: © 2021. the author(s). licensee: aosis. this is an open access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

abstract depression is a global concern, with an estimated 300 million individuals worldwide experiencing depression. in south africa, the prevalence rate of depression is estimated at 9.7% of the population. with the increase in mobile internet usage in south africa, an online depression screening tool could provide opportunities for the screening of depression symptoms, aiding access to mental health interventions.
this project identified an open access tool for screening depression, the center for epidemiologic studies depression scale – revised (cesd-r), and adapted it for online use by the adult south african population. this study followed on from the adaptation phase on the cesd-r and aimed to determine the content validity of the adapted cesd-r for online use in south africa using the consensus-based standards for the selection of health measurement instruments (cosmin) methodology. the study followed a two-phased design. study one utilised a qualitative approach, where 50 experts commented on the content validity of the tool. the results were used to further adapt the tool which resulted in a 20-item depression screening tool. study two followed a quantitative design in order to establish the content validity in terms of determining the content validity ratios, item-content validity index as well as the kappa statistic of the 20 items. based on these statistics, 19 of the 20 items were retained. overall, the adapted online depression screening tool displays good content validity and holds potential as a screening tool where access to mental health may be limited. keywords: cesd-r; content validity; online depression screening tool; mental health; south africa. introduction and literature review according to the world health organization’s (who, 2017) global health estimates for 2017, an estimated 300 million individuals have depression globally, accounting for 4.4% of the world’s population (who, 2017). nine per cent of individuals on the african continent suffer from depression (who, 2017). in south africa, a prevalence rate of 9.7% was attributed to major depression (tomlinson et al., 2009). the association between mental health and non-communicable diseases (ncd) has been highlighted by the who world mental health surveys (leentjens, 2010). 
depression is one of the mental health illnesses that have been found to be comorbid with ncds such as cancer, diabetes, and respiratory and cardiovascular diseases; therefore, interventions for depression are vital in controlling non-communicable diseases (caruso et al., 2017; leentjens, 2010; stein et al., 2019). the treatment of depression in south africa is often met with many obstacles, such as stigmas, lack of mental health facilities, lack of depression screening, and under-resourced hospital settings. in 2017, mental health facilities in south africa were limited, with 4.33 beds per 100 000 population in general hospitals, and 16.56 beds per 100 000 in mental health hospitals (who, 2017). in 2019, south africa reported having 0.31 psychiatrists and 0.97 psychologists in the public sector per 100 000 population (docrat, besada, cleary, daviaud, & lund, 2019). as a result of the lack of or limited mental health resources in south africa, mental illnesses, such as depression, are often under-diagnosed and under-treated (nglazi et al., 2016). in an attempt to overcome the barriers to mental health access and care, researchers propose the use of digital technologies such as the internet, portable electronic devices and mobile applications (aguilera, 2015; cortelyou-ward, rotarius, & honrado, 2018; lal & adair, 2014; patel et al., 2018). online mental health screening provides easy and wide access; it is economical and time-efficient, and allows for early detection of people at risk of depression (austin, carlbring, richards, & andersson, 2006; buchanan, 2003; donker, van straten, & cuijpers, 2010; lal & adair, 2014; patel et al., 2018). in south africa, 58.7% of individuals have mobile internet access and 63.0% of households have at least one member who has access to the internet either at work, at home, at a place of study or through an internet café (statistics south africa, 2019).
the potential benefits of online mental health screening, together with this level of internet access, suggest that online mental health resources could provide access to mental health services where they are limited and often inaccessible. this study focussed on adapting a depression screening tool for online usage. as part of the study, a systematic review was conducted to identify any existing, psychometrically sound, online depression screening tools for the general public of south africa. results indicated that the beck depression inventory-ii (bdi-ii), the center for epidemiologic studies depression scale (cesd) and the patient health questionnaire (phq-9) were the most commonly utilised online depression screening tools, but only one depression screening tool was specifically designed for use by the general public. there were no screening tools specifically designed for the diverse groups in south africa (hassem & laher, 2019). based on the results of the systematic review, it was decided to adapt the cesd. the revised version of the cesd (cesd-r) is the most recent version of the cesd, which reflects the depression symptoms in accordance with the diagnostic and statistical manual of mental disorders fourth edition (dsm-iv) (eaton, smith, ybarra, muntaner, & tien, 2004). the cesd-r website (the center for epidemiologic studies depression scale revised, n.d.) confirms that the symptoms as assessed are in accordance with symptoms of major depressive episode (mde) in the dsm-5. in addition, the cesd-r is the most recently developed depression screening tool of the three commonly utilised depression screening tools (bdi-ii, cesd and phq-9), and is available as an open access resource (the center for epidemiologic studies depression scale revised, n.d.). the adaptation of the cesd-r was grounded in the biopsychosocial-spiritual (bpss) model.
the model recognises the biological, psychological, social and spiritual as distinct dimensions that cannot be separated from the whole because the components are intertwined, allowing for multifactorial understandings of mental illness aetiology (van rensburg, poggenpoel, myburgh, & szabo, 2015; sulmasy, 2002). the bpss is particularly salient to african contexts where mental health is said to be located in the relationship between the ancestors or spirits and human beings (meyer, moore, & viljoen, 2003; mufamadi & sodi, 2010). while spirituality and culture are relevant across the diverse south african population, screening tools for depression have not been adapted to account for these unique cultural and spiritual presentations of depression. the south african population is made up of individuals from various ethnic and religious groups, and the country is multilingual, with 11 official languages. defining or translating the term ‘depression’ in african cultures is complex because it is not recognised within traditional african practices (ellis, 2003; patel, 2001; stafford, pedersen, van staden, & jäger, 2008). for example, in the isizulu language, the following terms are used to approximate depression: dangala (worn out of body and mind or dejected), khathele/ukukhathala (sense of worry and also conveys peace) and ukukhathazeha (conveys grief, worry, hurt, sadness as well as heartache). ukukhathazeha is also used in the isixhosa language and has the same meaning as in the isizulu language (ellis, 2003). the distress idiom called ‘thinking too much’ is also often associated with depression or regarded as a symptom of depression (kaiser et al., 2015). in a study conducted in a small khwe community in kimberley, south africa, ‘thinking too much’ was associated with negative behavioural, emotional, social and somatic complaints (hertog, de jong, van der ham, hinton, & reis, 2016).
‘thinking too much’ is viewed by traditional healers as well as a sample of women living with human immunodeficiency virus (hiv) as a symptom of depression (andersen, kagee, o’cleirigh, safren, & joska, 2015; ellis, 2003). when reporting depression symptoms, studies have shown that individuals in african cultures tend to report more somatic than cognitive symptoms (andersen et al., 2015; mosotho, louw, calitz, & esterhuyse, 2008). this emphasis on somatic symptoms often compromises the diagnosis of depression in many primary care settings (mosotho et al., 2008). in studies conducted in south africa, women tended to have a higher prevalence rate of depression, but this could be attributed to gendered cultural beliefs. in the sesotho culture, for example, men are viewed as monna ke nlu ha alle, which implies that men should not display emotions of grief, sadness and depression (mosotho et al., 2008; tomlinson et al., 2009). in addition to cultural factors, english language proficiency needs to be considered. according to the general household survey (statistics south africa, 2018), english has been ranked as the sixth most common language spoken in south african homes (8.1%), and is ranked as the second most common language spoken outside south african homes (16.6%; spoken by 8.1% of individuals). given that most south africans are not english first language speakers, there is a need for a depression screening tool free of psychological jargon. hence, this was taken into account when adapting the cesd-r for south africa. the unique context of the online psychological assessment screening environment, where the typical face-to-face interaction is removed, requires a number of ethical issues to be considered. draft ethical guidelines recommended by hassem and laher (2018) were utilised to ensure that the online screening tool was adapted accordingly.
for example, if one is to include items of suicide ideation, designated resources should be available to contact individuals who are at risk of self-harm (hassem & laher, 2018). unfortunately, these resources are limited in south africa, and in order for these resources to be effective, various stakeholders would need to be involved in ensuring the effective use of such services. ethically, the risk is too high for an individual completing an online screening (not diagnostic) tool to include an item assessing suicidality. hence, this was excluded. the aim of this study was to determine the content validity of the adapted cesd-r for online depression screening for diverse groups in south africa. content validity refers to the extent to which items on a scale accurately measure a construct based on the conceptual definition of the construct (grahn & gard, 2008; lenz, 2010). based on the literature, establishing content validity is typically performed qualitatively with little consensus as to which method should be followed. the consensus-based standards for the selection of health measurement instruments (cosmin) were developed in part to create more rigorous, standardised guidelines for establishing content validity (terwee et al., 2018). the cosmin guidelines for content validation of a patient-reported outcome measure (prom) informed the design of this study (terwee et al., 2018). according to the cosmin guidelines, for a tool to display content validity, three broad domains need to be assessed. they are: (1) relevance (items are relevant for the measured construct as well as the context or specific population), (2) comprehensiveness (all aspects of the construct being measured are included) and (3) comprehensibility (all items can be easily understood by the target population) (terwee et al., 2018). 
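the abstract names three statistics used in study two to quantify content validity: the content validity ratio, the item-content validity index and the kappa statistic. the sketch below is a minimal illustration of how such indices are commonly computed from expert ratings. it assumes lawshe's formulation of the content validity ratio and a chance-adjusted (modified) kappa based on the item-content validity index; this section does not state the exact formulas used in the study, and all function names and example counts are hypothetical.

```python
from math import comb


def content_validity_ratio(n_essential: int, n_experts: int) -> float:
    """Content validity ratio, assuming Lawshe's form: (n_e - N/2) / (N/2).

    n_essential : number of experts rating the item as essential
    n_experts   : total number of experts who rated the item
    """
    if n_experts <= 0 or not 0 <= n_essential <= n_experts:
        raise ValueError("need 0 <= n_essential <= n_experts and n_experts > 0")
    half = n_experts / 2
    return (n_essential - half) / half


def item_cvi(n_relevant: int, n_experts: int) -> float:
    """Item-content validity index: proportion of experts rating the item relevant."""
    if n_experts <= 0 or not 0 <= n_relevant <= n_experts:
        raise ValueError("need 0 <= n_relevant <= n_experts and n_experts > 0")
    return n_relevant / n_experts


def modified_kappa(n_relevant: int, n_experts: int) -> float:
    """Item-content validity index adjusted for chance agreement.

    Assumes the probability of chance agreement is the binomial probability
    of exactly n_relevant 'relevant' ratings when each rating is a coin flip.
    """
    i_cvi = item_cvi(n_relevant, n_experts)
    p_chance = comb(n_experts, n_relevant) * 0.5 ** n_experts
    return (i_cvi - p_chance) / (1 - p_chance)
```

for a hypothetical item rated relevant by 8 of 10 experts, these functions give a content validity ratio of 0.6 and an item-content validity index of 0.8, with the modified kappa slightly below the index because some agreement is expected by chance.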
thus, this study aimed to: (1) determine if the adapted online depression screening tool measures depression; (2) determine if the items of the tool are easily understood and are culturally appropriate for south africans and (3) determine whether the instructions, response format and instant feedback provided are appropriate. all participants had the right to decline participation. participation was anonymous and completely confidential as no identifying information was requested. all participants had the right to stop participation at any point and also had the right not to answer any of the questions. there were no potential risks or benefits to participation in the study. methods two studies were carried out in order to determine the content validity of the online depression screening tool. study one followed a qualitative research design, whereas study two followed a quantitative research design. in the following sections, the two studies are discussed in detail. study one test adaptation the initial step of study one involved the adaptation of the cesd-r items. this step followed the first three stages of the test development guidelines proposed by foxcroft (2018) and the international test commission guidelines for translating and adapting tests (itc, 2017). the adaptation of the cesd-r items during studies one and two is shown in table 1. for study one, cesd-r items were either rephrased or removed from the tool. three items were removed from the cesd-r, including the two items that assessed suicide ideation. guided by the south african literature regarding depression (andersen et al., 2015; hertog et al., 2016; kaiser et al., 2015; mosotho et al., 2008), three items were added to the tool (see table 2). the online adapted cesd-r for study one consisted of 20 items, which were free of psychological jargon.
the response format (not at all, 1 to 2 days, 3 to 4 days, 5 to 7 days and nearly every day for 14 days) and the time period (2 weeks) over which symptoms are experienced were not changed from the cesd-r. table 1: item adaptation for study one and study two. table 2: demographic characteristics of study one and study two. participants individuals were invited to participate in the study based on their expertise in the field of depression and depression screening. a non-probability, purposive sample of 50 mental healthcare personnel participated in the study (patton, 1990). the participant demographics for study one are shown in table 2. the largest group in the sample was psychiatrists (n = 15), followed by psychologists (research, clinical and counselling psychologists) (n = 14) and psychology honours1 students enrolled for a psychological assessment module (n = 13). seven of the participants described their occupations as ‘other’, with nurses, religious leaders, a paediatrician and a campus coordinator being specified. the largest group of participants stated that they had been practising in their respective fields for more than 10 years (n = 19), followed by 14 participants practising for less than 10 years. women made up the majority of the participants (n = 43). half of the participants identified as white people (n = 25). the largest group of participants identified as christian (n = 24), followed by muslim and no religious affiliation (n = 13 and n = 8, respectively). thirty-six participants identified english as being their home language, while afrikaans was spoken by eight individuals and an african language (isizulu, ndebele, sepedi, setswana & swati) was spoken by six of the participants as a home language (see table 3). table 3: questions provided to participants and the frequencies of responses in study one.
just over half of the sample (n = 26) diagnosed depression2 in their capacity as a psychologist or psychiatrist, with the majority diagnosing depression at least weekly (n = 17). interesting to note was that, out of the 26 individuals who stated that they diagnose depression, only 16 of these individuals had utilised a depression screening tool (see table 2). the most cited screening tool utilised was the bdi-ii, followed by the phq and the hamilton-d (ham-d). instruments participants were required to complete a brief demographic questionnaire requesting information on occupation, number of years practising, gender, population group, religious affiliation, home language, frequency in diagnosing depression as well as previous experience using a depression screening tool. once the demographic questions were completed, participants were presented with a page detailing the content validation instructions. participants were required to only read the instructions, items, scoring as well as feedback of the adapted cesd-r, and not to complete the tool. this was followed by 10 (yes or no response format) questions which were informed by the cosmin evaluation of content validity of a prom. participants were given the choice to provide additional comments in relation to the 10 questions. given the unique nature of the tool where instant feedback would be provided to the individual, two specific questions assessing the appropriateness of the scoring and feedback of the tool were included. procedure once items were finalised, an email was circulated to various south african healthcare professionals listed above (see table 2) inviting them to participate in the study. experts were identified based on their clinical and psychometric experience in the field of mental health and depression. the email described the nature of participation and contained a web link to the actual tool and content validation questions. 
once participants clicked on the link (generated from surveymonkey), a participant information sheet appeared detailing the nature of the study. the survey was anonymous and, on average, took between 10 and 15 min to complete. data collection took place electronically between july and october 2019 through an online survey tool. data analysis data were downloaded from the online tool and coded for analysis. demographic variables and yes or no response options were analysed using frequencies in the statistical package for the social sciences (spss) version 25 (ibm corp, 2017), and qualitative data were analysed using thematic analysis as specified by braun and clarke (2006). results of study one results are discussed in terms of the three broad themes recommended in the cosmin guidelines, namely relevance and comprehensiveness, comprehensibility, and scoring and feedback. relevance and comprehensiveness because of the open-ended nature of the questions, various themes regarding the relevance of the tool emerged, namely, depression according to the dsm criteria, appropriateness of the items, time period, response format as well as the cultural applicability of the items. depression according to the dsm criteria the majority of the participants indicated that the tool does appear to measure depression (n = 46) (see table 3), which is echoed in the following comment made by participant 40: ‘it definitely does measure depression according to the dsm 5 framework’. two participants noted that the items provided on the tool measured depression according to the dsm-5 criteria for depression, while three participants stated that not all the dsm symptoms of depression were measured by the tool. one participant specifically commented that the dsm criteria do not include local idioms of distress or depression, while two participants indicated that suicide ideation had not been included in the tool.
one participant noted that the tool measured depression similarly to the k10 (kessler psychological distress scale) and cesd (original version of the online adapted cesd-r). one participant specifically cautioned against using the tool as a diagnostic measure. appropriateness of items as evident in table 3, the majority (n = 31) of the participants felt that the items were appropriate for the south african context. in order to accommodate the african view of the self, participants were asked if items need to be rephrased to ‘my family and friends’. results indicate that the majority (n = 24) of the participants suggested that items do not need to be rephrased (see table 3). item appropriateness can be divided into three sub-themes, namely, items to be added, items to be removed and items to be rephrased or re-considered (see table 3). items to be added: five participants noted that weight gain and increased appetite are symptoms of depression that are commonly overlooked. therefore, participants recommended that these two items should be added to the tool. participant 9 expressed this as follows: ‘some patients eat less and lose weight whilst others eat more and gain weight (this is not dependent on depression type). asking if there has been a change in weight can be very helpful in distinguishing depression severity.’ (participant 9, female, researcher) four participants felt that an item targeting indecisiveness should be added as it is a common cognitive symptom in depressed patients. in addition, five participants recommended including suicide ideation items, as they are symptom criteria for depression according to the dsm. items to be removed: two participants suggested removing the items: ‘i feel like i am moving too slow’ and ‘i have the need to play with my fingers or move around for no reason’ (psychomotor agitation) (see table 1), as this is less common in depressed adults.
one participant felt item 18 was unclear and another stated that item 17 was unclear, but they did not suggest removing these items. two participants suggested that there were too many sleep items and some should be removed or replaced with another item. item 20 (‘i feel bewitched almost all of the time’) was viewed as being the most inappropriate item (n = 11), implying that the item should be removed. items that need to be reconsidered or rephrased: participants felt that items needed to be rephrased so that the tenses would be consistent throughout the tool. a suggestion was made that the items need to be edited for grammatical accuracy. cultural applicability of the tool this theme is closely related to the theme of item appropriateness as the item, ‘i feel bewitched almost all of the time’, was the most commonly cited item that was inappropriate. this was because the term ‘bewitched’ is not culturally appropriate and the term was not clearly understood. participants suggested that the item questions the cultural aetiology of depression in a tool which needs to relate to the experience of depression. in addition, participants suggested that the term ‘bewitched’ can have both negative and positive connotations. participants suggested that if a culturally fair item is to be added, it should have follow-up questions. time period with regards to the time frame of ‘two weeks’, three participants noted that the time frame needed to be reconsidered as the symptoms experienced could be a result of life changes or a traumatic event, as indicated by participant 1: ‘two weeks is a problematic time period. a person could have undergone a life change or trauma and may not have major depressive disorder (mdd) but would meet the criteria based on the last two weeks. 
this could result in an implied misdiagnosis.’ (participant 1, female, psychologist) in addition to the change in the time frame of the symptoms experienced, the attribution of symptoms to life changes was also found in the narratives of four other participants. these participants highlighted that symptoms could be attributed to various medical conditions (chronic conditions or stomach bugs) as well as the lifestyle the individual leads (symptoms of sleep attributed to exhaustion). therefore, these suggestions showed the need for a statement ruling out medical conditions, lifestyle, as well as life changes causing the symptoms experienced. response format although no specific questions targeted the response format, six participants made comments which related to it. one participant specifically suggested that it might not be understood by the general public, while four participants suggested that the response format did not match well with the items, recommending that either the items be rephrased or the response format be changed for specific items. one participant felt that placing a time frame in the response format was inappropriate. lastly, two participants suggested that the responses can yield many false positives or negatives or be easily faked. the use of reverse scoring as well as a control question was suggested in order to prevent faking. comprehensibility comprehensibility of the tool is discussed in terms of appropriateness of the instructions, effectiveness of the language used, length of the tool and item order. appropriateness of instructions provided as evidenced in table 3, a majority of the participants (n = 47) felt that the instructions provided were appropriate. five participants recommended specific changes to the instructions, while one participant suggested that the last statement in the instructions (‘ask a friend or family member to assist you’) could deter individuals from taking the test and would need further consideration. 
two of the participants suggested grammatical changes be made to the instructions provided. one participant suggested that a sentence encouraging individuals to complete all items be added. effectiveness of the language used a majority of the participants (n = 49) felt that the tool will be easily understood by diverse groups in south africa (see table 3), as echoed by the following participant phrases ‘simple english’, ‘does not contain any difficult words’, ‘very simplistic words’ and ‘simply put and easy to understand’. three participants made reference to the general public of south africa not being english first language speakers and said it was imperative to ensure that all south africans understand the items. these three participants suggested that the tool be translated into african languages, with one participant highlighting caution when doing translations. length of the tool and item order all participants who commented on the length of the tool felt that it was appropriate for an individual who is depressed. this sentiment is represented by the following comment by participant 9: ‘the short length of the questionnaire is a strength of the tool as someone with moderate to severe depression will struggle to complete tasks’. only two participants mentioned that the order of the items needs to be reconsidered. participants suggested that items measuring similar symptoms do not appear together and to prioritise symptoms specific to the south african context in the ordering of items. scoring and feedback provided although scoring and feedback does not form part of the cosmin criteria for content validity, it is a vital point to discuss as the end-users of the tool will ultimately receive immediate feedback in a non-typical, face-to-face interaction. table 3 shows that it is evident that the majority of participants felt that the scoring and feedback provided are appropriate (n = 41 and n = 35, respectively). 
only five participants commented on the scoring of the tool, with three participants suggesting that the scoring was not transparent as they did not know the score that was attributed to each response and the cut-offs were not mentioned upfront. comments made regarding the feedback given to participants all emphasised the need to highlight the urgency of seeking help when individuals receive a high score on the tool. in addition, participants stated that the feedback was calmly stated so that it should not further perpetuate depression symptoms. participants indicated that individuals who do not meet the criteria for depression should also be offered contact details if they feel the need to seek help. lastly, participants felt that ‘consulting with your family practitioner’ should be the first point of contact that needs to be included. study two participants a non-probability, purposive sample of 21 mental healthcare personnel participated in the study (patton, 1990), with a majority of the sample represented by psychologists (n = 16). a greater part of the participants stated that they had been practising in their respective fields between 10 and 20 years (n = 10), with only one participant practising for under 10 years. a majority of the sample were females (n = 17). just over half of participants identified as belonging to the white population group (n = 11) and being christian (n = 11). english was noted as the dominant home language used by participants (n = 16) (see table 2). only eight participants had not previously diagnosed depression, with the majority of participants (n = 6) diagnosing depression weekly. lastly, a larger part of the participants (n = 15) had previously utilised a depression screening tool. the most commonly cited screening tool used was the bdi-ii followed by the ham-d. procedure the first step of this study involved item refinement based on the results of study one. 
weight, eating patterns and sleep items were represented by one item each which accounted for either side of the scale (weight gain or loss, increased or decreased appetite and less or more sleep) (see table 1). the item regarding bewitchment was removed from the scale based on the results of study one. items representing self-blame, loneliness and guilt were added. therefore, a total of 20 items were included in study two. given the occurrence of violence and traumatic events in south africa as well as comments made in study one regarding the time period, this was changed to a 2-month period of symptoms experienced as this will rule out symptoms experienced because of these events which could result in an overdiagnosis of depression. lastly, based on the feedback received regarding the response format in study one, the response format was changed to: ‘not at all’, ‘some of the time’, ‘most of the time’ and ‘all of the time’. data collection and instruments participants were required to complete the same brief demographic questionnaire as per study one. once the demographic questions were completed, participants were presented with a page detailing the content validation instructions. a 3-point likert scale was used to assess the relevance of each of the 20 items included in the screening tool (1 = item is not essential; 2 = item is useful but not necessary and 3 = item is essential for diagnosing depression). participants were also asked to rate the response format used as being ‘relevant’, ‘somewhat relevant’ or ‘not relevant’. lastly, participants were asked for any additional feedback or input on the tool. the questionnaire took, on average, 10 min to complete. data collection took place electronically between october and december 2019 through an online survey tool. data analysis data were downloaded from the online tool and coded for analysis. demographic variables were analysed using frequencies. 
there are various statistical calculations for content validity, which either provide a consensus estimate or a consistency estimate. most commonly used in the field of psychology is the content validation ratio (cvr) proposed by lawshe (1975). there is debate in the literature as to which cvr value to utilise; thus, at a 5% level of significance with 21 experts, a cvr value lower than 0.359 or 0.429 would exclude an item (ayre & scally, 2014; wilson, pan, & schumsky, 2012). polit et al. (2007) recommended the use of the item-content validation index (i-cvi), scale-content validation index (s-cvi) as well as the kappa statistic. therefore, this study used the cvr, i-cvi, s-cvi (average) as well as the kappa statistic to investigate content validity (polit et al., 2007). with regards to the i-cvi values, any values under 0.70 should be eliminated, values between 0.70 and 0.79 required some revision and any value of at least 0.78 is considered as appropriate. the kappa statistic values between 0.40 and 0.59 were considered fair, 0.60 and 0.74 were good and 0.75 and higher were considered excellent (polit et al., 2007). for the development of a new tool, the average s-cvi of 80% or above is considered acceptable (davis, 1992). in order to compute the i-cvi and kappa values, response options 2 (‘item is useful but not necessary’) and 3 (‘item is essential for diagnosing depression’) were combined and represented as a relevant item and the response option 1 (‘item is not essential’) represented a non-relevant item. results of study two when looking at items rated as essential by the experts, only three items (3, 8 and 9) were rated by all the 21 experts as being essential. fifteen items received an essential rating by the majority of the experts (n > 10) (see table 4). with regards to experts rating items as relevant, 12 items received a relevant rating by all experts. 
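the indices named in the data analysis above can be sketched in a few lines of python. this is a minimal illustration, not the author's actual procedure: the function names and the example ratings are hypothetical, the cvr follows lawshe's (1975) formula, and the kappa is assumed to be the chance-adjusted (modified) kappa described by polit et al. (2007). the article does not spell out the exact kappa computation, so values from this sketch may differ from those in table 4.

```python
from math import comb


def dichotomise(ratings):
    """Count 'essential' (3) and 'relevant' (2 or 3) ratings on the 3-point scale."""
    n_essential = sum(r == 3 for r in ratings)
    n_relevant = sum(r >= 2 for r in ratings)
    return n_essential, n_relevant


def cvr(n_essential, n_experts):
    """Lawshe's content validity ratio: (n_e - N/2) / (N/2)."""
    half = n_experts / 2
    return (n_essential - half) / half


def i_cvi(n_relevant, n_experts):
    """Item-level content validity index: share of experts rating the item relevant."""
    return n_relevant / n_experts


def modified_kappa(n_relevant, n_experts):
    """Chance-adjusted I-CVI (assumed form, after Polit et al., 2007)."""
    # Probability that exactly n_relevant of n_experts agree by chance (p = 0.5).
    p_chance = comb(n_experts, n_relevant) * 0.5 ** n_experts
    return (i_cvi(n_relevant, n_experts) - p_chance) / (1 - p_chance)


def s_cvi_ave(i_cvis):
    """Scale-level content validity index, averaging method: mean of the I-CVIs."""
    return sum(i_cvis) / len(i_cvis)


# Hypothetical ratings for a single item from a panel of 21 experts.
ratings = [3] * 5 + [2] * 8 + [1] * 8
n_essential, n_relevant = dichotomise(ratings)
print(round(cvr(n_essential, len(ratings)), 3))    # -0.524
print(round(i_cvi(n_relevant, len(ratings)), 2))   # 0.62
```

with 21 experts, the cvr is negative whenever fewer than half of the panel rate an item as essential, which is why an item can fail the cvr criterion while still meeting the i-cvi criterion once the 'useful but not necessary' ratings are counted as relevant.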
item 11 (‘i have been feeling happy’) received the lowest number of experts rating the item as relevant (n = 13). table 4: content validation statistics of study two. the cvr calculations had a very wide range from −0.524 to 1. both i-cvi and kappa ranged from 0.62 to 1. the lowest cvr value was obtained on item 6, while the lowest i-cvi and kappa values were obtained for item 11. it is evident that all 20 items included in the tool were considered as good items with the exception of item 11. item 11 did not meet two out of the three content validity criteria (cvr and i-cvi). the kappa value was 0.62, which indicated the item was good. using the guidelines in the literature (ayre & scally, 2014; polit et al., 2007; wilson et al., 2012), 10 items (3, 8, 9, 14, 19, 4, 17, 5, 20 and 21) met all the content validity criteria (cvr, i-cvi and kappa). items 3, 8, 9 and 14 obtained absolute scores on all three criteria (see table 4). ten items did not meet the cvr criteria for inclusion (16, 18, 21, 7, 15, 10, 2, 1, 11 and 6), but nine of these items met the i-cvi and kappa criteria for an excellent item. three (16, 18 and 21) out of the 10 items that did not meet the cvr criteria received absolute scores on the i-cvi and kappa scores. the average s-cvi score for the tool was 0.94. the majority of participants (n = 18) felt that the adapted time period and response format were relevant. discussion two studies were used to determine the content validity of an adapted online depression screening tool. for study one, 50 mental health experts unanimously agreed that the tool was valid in terms of the instructions, items, scoring and feedback provided. however, various suggestions were made to improve the quality of the tool. for study two, the recommendations suggested during study one were undertaken and the tool was revised. 
in terms of the cosmin criteria for content validity, the adaptation of the depression screening tool for online usage by diverse groups in south africa shows good relevance. a follow-up content validation was conducted with 21 experts in the field and three independent researchers. nineteen of the experts in study two (final tool) felt that all the items on the tool were relevant for screening of depression, with the exception of item 11 (‘i have been feeling happy’). only 13 experts felt item 11 was relevant. this could be attributed to the item being reverse scored. therefore, it was decided that this item would be excluded from the final tool. the average s-cvi further highlights the content validity of the overall tool by achieving an average s-cvi score of higher than 80% (davis, 1992). four south african idioms of distress and depression were added to the tool, namely, ‘i have been experiencing more body aches and pains (e.g. headaches, neck pain or back pain)’, ‘i have been thinking too much’, ‘i have been feeling alone’ and ‘i have not felt like myself’. three of these items did not meet the cvr criteria; however, both the i-cvi and kappa criteria were met. these items received a low cvr as a result of few experts rating the items as essential in depression screening; however, many experts indicated that the items were useful in depression screening, hence the items were retained. the nine items which were rephrased from the cesd-r and which represented the dsm 5 depression criteria received absolute i-cvi and kappa scores, and met the cvr criteria. items 7 and 15 did not appear on the original cesd-r, but were recommended by experts in study one to be included in the screening tool. these items did not meet the cvr criterion, but met the i-cvi and kappa criteria. it should be noted that these two items assessed the symptom criteria for depression as highlighted in the dsm 5 (focus and indecisiveness). 
this study demonstrated that a 19-item adapted online depression screening tool displays relevance in terms of the construct being measured and the appropriateness of the target population, and context for which it is intended. there are limitations, in that content validation is a subjective view of experts and therefore could result in bias when items are rated; however, the two studies with various experts and both qualitative and quantitative indicators reduced this risk. it is acknowledged that the pool of experts is small and does not represent the various language groups present in south africa. going forward, it would be necessary to establish the content validity in terms of comprehensiveness and comprehensibility ratings as described by cosmin. these ratings are dependent on the target population views and pilot testing of the tool and not the development stage of the tool. in addition, the construct and criterion validity as well as reliability of the tool would need to be assessed. conclusion this study highlighted that the adapted online depression screening tool designed for diverse groups in south africa shows good relevance in terms of content validity. in addition, the tool was phrased using simple language free from psychological jargon, which, it is hoped, would encourage better understanding by second language english speakers. screening individuals for depression allows for early detection of their depression risk and has much to offer for the under-resourced south african mental healthcare landscape. the online screening of depression allows for early detection and self-help options for depression. further, it empowers individuals to discuss their symptoms from a better knowledge base with a doctor or other healthcare professional allowing for intervention much sooner than would have been the case if the individual had no support or no means of checking their symptoms and of accessing information on depression. 
the tool, therefore, has the potential to be incorporated as a screening tool for depression across standard platforms in university counselling centres, primary health care intake forms or even on web platforms such as the south african depression and anxiety support group (sadag). acknowledgements competing interests the author declares that she has no financial or personal relationships that may have inappropriately influenced her in writing this article. author’s contributions t.h. is the sole author and was responsible for the conceptualisation, data collection and analysis as well as write up for the article. ethical considerations the study was approved by the human research ethics committee – medical (hrecm) of the university of the witwatersrand, reference number: m180402. funding information this work is based on the research supported wholly or in part by the national research foundation of south africa (grant number: 112948). data availability data sharing is not possible, due to the nature of the data and related ethical principles. disclaimer the views and opinions expressed in this article are those of the author and do not necessarily reflect the official policy or position of any affiliated agency of the author, and the publisher. references aguilera, a. (2015). digital technology and mental health interventions: opportunities and challenges. arbor, 191(771), a210–a210. https://doi.org/10.3989/arbor.2015.771n1012 andersen, l., kagee, a., o’cleirigh, c., safren, s., & joska, j. (2015). understanding the experience and manifestation of depression in people living with hiv/aids in south africa. aids care, 27(1), 59–62. https://doi.org/10.1080/09540121.2014.951306 austin, d.w., carlbring, p., richards, j.c., & andersson, g. (2006). internet administration of three commonly used questionnaires in panic research: equivalence to paper administration in australian and swedish samples of people with panic disorder. 
international journal of testing, 6(1), 25–39. https://doi.org/10.1207/s15327574ijt0601_2 ayre, c., & scally, a.j. (2014). critical values for lawshe’s content validity ratio: revisiting the original methods of calculation. measurement and evaluation in counseling and development, 47(1), 79–86. https://doi.org/10.1177/0748175613513808 braun, v., & clarke, v. (2006). using thematic analysis in psychology. qualitative research in psychology, 3(2), 77–101. https://doi.org/10.1191/1478088706qp063oa buchanan, t. (2003). internet-based questionnaire assessment: appropriate use in clinical contexts. cognitive behaviour therapy, 32(3), 100–109. https://doi.org/10.1080/16506070310000957 caruso, r., nanni, m.g., riba, m., sabato, s., mitchell, a.j., croce, e., & grassi, l. (2017). depressive spectrum disorders in cancer: prevalence, risk factors and screening for depression: a critical review. acta oncologica, 56(2), 146–155. https://doi.org/10.1080/0284186x.2016.1266090 cortelyou-ward, k., rotarius, t., & honrado, j.c. (2018). using technology to improve access to mental health services. the health care manager, 37(2), 101–108. https://doi.org/10.1097/hcm.0000000000000211 davis, l.l. (1992). instrument review: getting the most from a panel of experts. applied nursing research, 5(4), 194–197. https://doi.org/10.1016/s0897-1897(05)80008-4 docrat, s., besada, d., cleary, s., daviaud, e., & lund, c. (2019). mental health system costs, resources and constraints in south africa: a national survey. health policy and planning, 34(9), 706–719. https://doi.org/10.1093/heapol/czz085 donker, t., van straten, a., & cuijpers, p. (2010). internet-based mental health screening. in j. bennett-levy, d. richards, p. farrand, h. christensen, k. griffiths, d. kavanagh, … c. williams (eds.), oxford guide to low intensity cbt interventions (pp. 241–245). oxford university press. eaton, w.w., smith, c., ybarra, m., muntaner, c., & tien, a. (2004). 
center for epidemiologic studies depression scale: review and revision (cesd and cesd-r). in m.e. maruish (ed.), the use of psychological testing for treatment planning and outcomes assessment: instruments for adults (pp. 363–377). lawrence erlbaum associates publishers. ellis, c.g. (2003). cross-cultural aspects of depression in general practice. south african medical journal, 93(5), 342. foxcroft, c. (2018). developing a psychological measure. in c. foxcroft & g. roodt (eds.), introduction to psychological assessment in the south african context (5th ed.). cape town, south africa: oxford university press. grahn, b., & gard, g. (2008). content and concurrent validity of the motivation for change questionnaire. journal of occupational rehabilitation, 18(1), 68–78. https://doi.org/10.1007/s10926-008-9122-7 hassem, t., & laher, s. (2018, august 21–24). ethics of online screening for mental illnesses: a systematic review. paper presented at the world congress of psychiatry conference on psychiatry and mental health: global inspirations, locally relevant inspirations, lisbon. hassem, t., & laher, s. (2019). a systematic review of online depression screening tools for use in the south african context. south african journal of psychiatry, 25(1), 1–8. https://doi.org/10.4102/sajpsychiatry.v25i0.1373 hertog, t.n., de jong, m., van der ham, a.j., hinton, d., & reis, r. (2016). “thinking a lot” among the khwe of south africa: a key idiom of personal and interpersonal distress. culture, medicine, and psychiatry, 40(3), 383–403. https://doi.org/10.1007/s11013-015-9475-2 ibm corp. (2017). ibm spss statistics for windows, version 25.0. ibm corp. international test commission. (2017). the itc guidelines for translating and adapting tests (2nd ed.). retrieved from www.intestcom.org kaiser, b.n., haroz, e.e., kohrt, b.a., bolton, p.a., bass, j.k., & hinton, d.e. (2015). “thinking too much”: a systematic review of a common idiom of distress. social science & medicine, 147, 170–183. 
https://doi.org/10.1016/j.socscimed.2015.10.044 lal, s., & adair, c.e. (2014). e-mental health: a rapid review of the literature. psychiatric services, 65(1), 24–32. https://doi.org/10.1176/appi.ps.201300009 lawshe, c.h. (1975). a quantitative approach to content validity. personnel psychology, 28(4), 563–575. https://doi.org/10.1111/j.1744-6570.1975.tb01393.x leentjens, a.f. (2010). [review of the book: global perspectives on mental-physical comorbidity in the who world mental health surveys, edited by m. r. von korff, k. m. scott, & o. gureje]. cambridge university press. 2009. psychological medicine, 40(7), 1226–1227. https://doi.org/10.1017/s0033291710000632 lenz, e.r. (2010). visual analog scales. in c.f. waltz, o.l. strickland, & e.r. lenz (eds.), measurement in nursing and health research (pp. 319–325). springer. meyer, w., moore, c., & viljoen, h. (2003). personology: from individual to ecosystem (3rd ed.). sandton, south africa: heinemann publishers (pty) ltd. mosotho, n.l., louw, d.a., calitz, f.j., & esterhuyse, k.g. (2008). depression among sesotho speakers in mangaung, south africa. african journal of psychiatry, 11(1), 35–43. https://doi.org/10.4314/ajpsy.v11i1.30253 mufamadi, j., & sodi, t. (2010). notions of mental illness by vhavenda traditional healers in limpopo province, south africa. indilinga african journal of indigenous knowledge systems, 9(2), 253–264. nglazi, m.d., joubert, j.d., stein, d.j., lund, c., wiysonge, c.s., vos, t., … bradshaw, d. (2016). epidemiology of major depressive disorder in south africa (1997–2015): a systematic review protocol. bmj open, 6(7), e011749. https://doi.org/10.1136/bmjopen-2016-011749 patel, v. (2001). cultural factors and international epidemiology: depression and public health. british medical bulletin, 57(1), 33–45. https://doi.org/10.1093/bmb/57.1.33 patel, v., saxena, s., lund, c., thornicroft, g., baingana, f., bolton, p., … unützer, j. (2018). 
the lancet commission on global mental health and sustainable development. the lancet, 392(10157), 1553–1598. https://doi.org/10.1016/s0140-6736(18)31612-x patton, m. (1990). qualitative evaluation and research methods (2nd ed.). sage. polit, d.f., beck, c.t., & owen, s.v. (2007). is the cvi an acceptable indicator of content validity? appraisal and recommendations. research in nursing & health, 30(4), 459–467. https://doi.org/10.1002/nur.20199 sadag, see south african depression and anxiety group. retrieved from https://www.sadag.org/ stafford, g.i., pedersen, m.e., van staden, j., & jäger, a.k. (2008). review on plants with cns-effects used in traditional south african medicine against mental diseases. journal of ethnopharmacology, 119(3), 513–537. https://doi.org/10.1016/j.jep.2008.08.010 statistics south africa. (2018). general household survey. retrieved from http://www.statssa.gov.za/publications/p0318/p03182018.pdf statistics south africa. (2019). general household survey. retrieved from http://www.statssa.gov.za/publications/p0318/p03182019.pdf stein, d.j., benjet, c., gureje, o., lund, c., scott, k.m., poznyak, v., & van ommeren, m. (2019). integrating mental health with other non-communicable diseases. bmj, 364, 1295. https://doi.org/10.1136/bmj.l295 sulmasy, d.p. (2002). a biopsychosocial-spiritual model for the care of patients at the end of life. the gerontologist, 42(suppl_3), 24–33. https://doi.org/10.1093/geront/42.suppl_3.24 terwee, c.b., prinsen, c.a., chiarotto, a., westerman, m.j., patrick, d.l., alonso, j., … mokkink, l.b. (2018). cosmin methodology for evaluating the content validity of patient-reported outcome measures: a delphi study. quality of life research, 27(5), 1159–1170. https://doi.org/10.1007/s11136-018-1829-0 the center for epidemiologic studies depression scale revised. (n.d.). cesd-r website. retrieved from https://cesd-r.com/ tomlinson, m., grimsrud, a.t., stein, d.j., williams, d.r., & myer, l. (2009). 
the epidemiology of major depression in south africa: results from the south african stress and health study. south african medical journal, 99(5), 365–373. van rensburg, a.j., poggenpoel, m., myburgh, c.p.h., & szabo, c.p. (2015). defining and measuring spirituality in south african specialist psychiatry. journal of religion and health, 54(5), 1839–1855. https://doi.org/10.1007/s10943-014-9943-y wilson, f.r., pan, w., & schumsky, d.a. (2012). recalculation of the critical values for lawshe’s content validity ratio. measurement and evaluation in counseling and development, 45(3), 197–210. https://doi.org/10.1177/0748175612440286 world health organisation (who). (2017). depression and other common mental disorders: global health estimates. retrieved from http://apps.who.int/iris/bitstream/10665/254610/1/who-msd-mer-2017.2-eng.pdf footnotes 1. the honours degree in south africa is a postgraduate qualification that follows on from an undergraduate degree and precedes a master’s degree. 2. depression in south africa is diagnosed by clinical, counselling and educational psychologists, general practitioners as well as psychiatrists. about the author(s) itai propheta department of psychology, university of johannesburg, johannesburg, south africa casper j.j. van zyl department of psychology, university of johannesburg, johannesburg, south africa citation propheta, i. & van zyl, c.j.j. (2019). measuring cognitive emotion regulation in south africa using the cognitive emotion regulation questionnaire-short form. african journal of psychological assessment, 1(0), a9. https://doi.org/10.4102/ajopa.v1i0.9 original research measuring cognitive emotion regulation in south africa using the cognitive emotion regulation questionnaire-short form itai propheta, casper j.j. van zyl received: 25 jan. 2019; accepted: 20 mar. 2019; published: 18 apr. 2019 copyright: © 2019. the author(s). 
licensee: aosis. this is an open access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. abstract cognitive emotion regulation plays an important role in how people manage stressful life events. some strategies are adaptive, while others are maladaptive and linked to several forms of psychopathology. the cognitive emotion regulation questionnaire (cerq)-short form measures an individual’s proclivity to use different strategies in response to longer term stressors. the cerq-short was developed in the netherlands, and although it has been standardised in several countries, it is yet to be validated for use in south africa. the aim of this study was to evaluate the psychometric properties of the cerq-short within the south african context. the study was conducted at a large urban university in the gauteng province of south africa. the above was considered on the basis of a reliability analysis and an investigation into the confirmatory factor structure of the cerq-short using data from a group of urban south african university students (n = 1904). with some exceptions, results indicated acceptable reliability for the scales ranging between 0.58 and 0.82. confirmatory factor analysis found reasonable support for a basic nine-factor model. the measurement properties of the cerq-short were found to be weaker in south africa compared to that reported in its country of origin. but it was nonetheless found to hold promise for use in our multicultural and multilingual context. in particular, it may be useful for research studies where brevity is called for. keywords: cerq-short; cognitive emotion regulation; psychometric properties; reliability; validity. introduction the influence of emotions on our daily lives can hardly be overstated. 
they direct attention in our environment, facilitate decision-making, shape behavioural responses and impact memory formation to highlight a few functions (gross, 2014). such examples show that everyday functioning requires all individuals to engage in some minimum level of emotion regulation all the time (davidson, 1998). while wonderfully useful for adaptive functioning in general, emotions can also be the cause of substantial harm. indeed, the inability to regulate emotions can produce serious disruptions to adaptive psychological functioning (koole, 2009). failure to regulate emotions effectively has been implicated in the emergence and maintenance of several forms of psychopathology, including depression and anxiety (bebko, ochsner, franconeri, & chiao, 2014), schizophrenia (gross & jazaieri, 2014), social anxiety (klemanski, curtiss, mclaughlin, & nolen-hoeksema, 2017), eating disorders (mallorqui-bague et al., 2017), personality disorders (fitzpatrick, khoury, & kuo, 2018) and mood disorders (gruber, hay, & gross, 2014; joormann & siemer, 2004). efforts to better understand this aspect of our mental landscape are therefore not trivial. given the salience of emotion regulatory processes in psychological health, it has become a major field of research in recent years (gross, 2014). naturally, many measures seeking to measure individual differences in emotion regulation have been developed in the process. 
examples include the emotion control questionnaire (ecq) (roger & nesshoever, 1987), the emotion regulation questionnaire (erq) (gross & john, 2003), difficulties in emotion regulation scale (ders) (gratz & roemer, 2004), regulation of emotions questionnaire (req) (phillips & power, 2007), affective style questionnaire (asq) (hofmann & kashdan, 2010), the emotion regulation profile-revised (erp-r) (nelis, quoidbach, hansenne, & mikolajczak, 2011), the emotion regulation of others and self (eros) (niven, totterdell, stride, & holman, 2011) and the state difficulties in emotion regulation scale (s-ders) (lavender, tull, dilillo, messman-moore, & gratz, 2015). while this list is not exhaustive, it showcases a variety of measures that have been developed in an effort to investigate the antecedents and consequences of emotion regulation, or components thereof (john & eng, 2014). importantly, john and eng (2014) differentiated among three individual difference approaches to the measurement of emotion regulation. the first approach represents measures based on gross’s (1998) process model, the second on coping with stressors and the third is focused on emotional competences. the focus of the present study is on the cognitive emotion regulation questionnaire-short (cerq-short). the cerq is considered a measure of the second type, as its focus is on coping, or managing emotions in regards to stressors over a longer term (garnefski, kraaij, & spinhoven, 2001; john & eng, 2014). this approach is differentiated from the other two approaches, as it does not focus on changing immediate behavioural–expressive components of affect (approach 1), and is not focused on emotional competences aimed at appropriate socio-emotional behaviour (approach 3; john & eng, 2014). 
garnefski et al.’s (2001) model for the cerq categorises cognitive emotion regulation into nine different strategies, namely self-blame, other-blame, positive reappraisal, rumination, catastrophising, putting-into-perspective, positive refocusing, acceptance and refocus on planning (garnefski et al., 2001). these cognitive emotion regulation strategies have all been linked to the presence or absence of psychopathology. while some strategies are adaptive (putting-into-perspective, positive refocusing, positive reappraisal, acceptance and refocus on planning), others are considered maladaptive (self-blame, other-blame, rumination and catastrophising), and are associated with psychological distress and psychopathology (aldao, 2012; garnefski et al., 2002; garnefski & kraaij, 2007; jermann, van der linden, d’acremont, & zermatten, 2006; martin & dahlen, 2005). in contrast, the adaptive strategies have been positively associated with higher levels of optimism and self-confidence and correlated negatively with several psychopathologies (garnefski et al., 2002). the original cerq (garnefski et al., 2001) contains 36 items, although a shorter 18-item version was created some years later (garnefski & kraaij, 2006), with two items per subscale instead of four. the cerq-short is the focus of the present study. for the original version, support for construct validity came from a principal components analysis with nine factors extracted, consistent with the hypothesised model (garnefski & kraaij, 2007; garnefski et al., 2001). principal components analysis was similarly performed on the shortened version (garnefski & kraaij, 2006) to again assess construct validity. confusingly, the authors used oblimin and varimax rotations inconsistently across these studies with no clear rationale provided for either option, although one would theoretically expect the factors to be somewhat correlated.
cronbach’s alpha reliability ranged between 0.75 and 0.87 for the four-item scales of the full version and between 0.68 and 0.81 for the short version (garnefski & kraaij, 2006). slightly weaker alpha reliabilities were reported in an earlier study for the full version (garnefski et al., 2001). objectives of the study the objective of the present article is to investigate the psychometric properties of the cerq-short in south africa. there is a dearth of validation research on measures of cognitive emotion regulation in south africa. in fact, a search of the literature found no such research conducted in this context. while previous findings seemed promising in its country of origin, the extent to which the cerq-short functions adequately in south africa needs to be examined, given the extreme diversity of the population. as such, the psychometric properties of the cerq-short are investigated in this article, with specific focus on the reliability and factor structure (garnefski & kraaij, 2006) of the measure. method participants the study population comprised 1904 undergraduate students (mean age = 20 years, standard deviation [sd] = 2.5 years) studying psychology at a large urban university in the gauteng province of south africa, who were invited to participate voluntarily. the data were not stratified in any other way. most of the participants were women (n = 1447, 76%). from an ethnic perspective, the majority were black participants (76%, n = 1446), followed by white participants (11.7%, n = 222). home language representation was as follows: afrikaans (4.5%), english (22.7%), isindebele (11.3%), isixhosa (5.8%), isizulu (19.8%), sepedi (11.9%), sesotho (7.9%), setswana (10.5%), siswati (5%), tshivenda (3.3%), xitsonga (6.5%) and unspecified (0.8%). instruments the cerq-short (garnefski & kraaij, 2006) is an 18-item scale used to measure an individual’s cognitive emotion regulation strategies in relation to negative or unpleasant events that are experienced.
participants have to indicate how they respond and what they think about when they experience such an event. for example, items on the acceptance scale include ‘i think that i have to accept that this has happened’ and ‘i think that i have to accept the situation’. participants respond to questions on a five-point likert scale from 1 = almost never to 5 = almost always. the 18 items on the scale are divided into nine different subscales with each scale consisting of two items. the individual scale scores are calculated by adding the scores belonging to each subscale (ranging from 2 to 10). the higher the score on a subscale, the more it is used as a specific cognitive strategy. in comparison to the original cerq, the cerq-short demonstrated suitable reliability, with alpha scores ranging from 0.68 to 0.81, and principal components analysis has found support for the separation of items into the same original scales (garnefski & kraaij, 2006). data analysis reliability analysis given the likert-type responses of the cerq-short, reliability estimates were computed from a polychoric correlation matrix. previous research has shown that pearson correlations tend to underestimate relationships among ordered categorical variables and recommends using polychoric correlations for more precise reliability estimation (gadermann, guhn, & zumbo, 2012; zumbo, gadermann, & zeisser, 2007). cronbach’s alpha and revelle’s beta coefficients were computed. the latter can be considered an estimate akin to the worst possible split-half reliability, and also provides an indication of the amount of general factor variance in a test (revelle & condon, 2018). confirmatory factor analysis we investigated the internal construct validity of the measure with confirmatory factor analysis (cfa). two models were tested, a nine-factor model, based on the theoretical model (garnefski & kraaij, 2006), and a nine-factor higher order model.
a nine-factor model would suggest that cognitive emotion regulation is best considered a multidimensional construct, whereas a nine-factor higher order model would support the view that the subscales of the cerq-short are in fact components of a single unidimensional variable. to evaluate the models, we considered several goodness-of-fit indices, including the comparative fit index (cfi) (bentler, 1990), the tucker–lewis index (tli) (tucker & lewis, 1973), root mean square error of approximation (rmsea) (steiger & lind, 1980) and the standardised root mean square residual (srmr). satisfactory fit is typically reflected by cfi and tli values > 0.95 and by rmsea and srmr values < 0.08 (hu & bentler, 1999). we made use of weighted least squares mean and variance corrected estimation (wlsmv) given its superior performance on ordered categorical responses over maximum likelihood (ml) estimation (beauducel & herzberg, 2006). procedure the data analysed were collected previously as part of a large project investigating wellness in an urban african context. permission for data collection was granted by the ethics committee of the department of psychology and faculty of humanities at a large urban university in south africa. participants were informed about the nature of the study and provided informed consent, acknowledging that they could withdraw from the study at any point should they wish to do so, that all information would be kept confidential and that no identifying information would be made available. the participants received the information via email, along with a link that took them to the questionnaire containing demographic questions and the psychological measures. the results were only used for research purposes. ethical considerations permission for data collection was granted by the ethics committee of the department of psychology, and the faculty of humanities at a large urban university in south africa (ethical clearance number: rec01-056-2016).
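the fit criteria described under confirmatory factor analysis can be expressed as a simple check. the sketch below is illustrative only: the class and function names are hypothetical, the thresholds are the conventional cutoffs cited in the method section (cfi and tli > 0.95; rmsea and srmr < 0.08), and the example values are invented for demonstration and are not the values reported in table 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FitIndices:
    """Goodness-of-fit values for one CFA model (hypothetical container)."""
    cfi: float
    tli: float
    rmsea: float
    srmr: float


def check_cutoffs(fit, incremental_min=0.95, residual_max=0.08):
    """Return, per index, whether the conventional cutoff is met.

    Incremental indices (CFI, TLI) should exceed incremental_min;
    residual-based indices (RMSEA, SRMR) should fall below residual_max.
    """
    return {
        "cfi": fit.cfi > incremental_min,
        "tli": fit.tli > incremental_min,
        "rmsea": fit.rmsea < residual_max,
        "srmr": fit.srmr < residual_max,
    }


# Invented values for illustration; NOT taken from the study's table 3.
example = FitIndices(cfi=0.96, tli=0.94, rmsea=0.05, srmr=0.04)
result = check_cutoffs(example)
print(result)
```

a model can satisfy the residual-based criteria while falling short on the incremental indices, which is why the indices are best inspected individually rather than collapsed into a single verdict.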
results descriptive statistics and zero-order correlations among the subscales are reported in table 1. statistically significant correlations ranged between 0.08 (self-blame and positive reappraisal) and 0.51 (refocus on planning and positive reappraisal). it is worth noting that while there is variation with regard to the intercorrelations of this study compared to garnefski and kraaij’s (2006) findings, the pattern is quite similar. strong and weak associations were consistently observed among the same variables in both studies. for example, while the previous correlation observed between acceptance and positive reappraisal was 0.43, it is 0.46 in the present study, and whereas catastrophising and refocus on planning previously correlated 0.09, they correlated 0.11 in the present study. table 1: zero-order intercorrelations and descriptive statistics for the scales of the cognitive emotion regulation questionnaire-short. reliability analysis inspection of the results in table 2 shows acceptable to good reliability for most scales (kline, 2011), with the exception of rumination, refocusing on planning and putting-into-perspective, for which reliability was weaker than expected. the similarity in results across alpha and beta is likely because of the small number of items per scale (two). for comparison, pearson correlation-based estimates of cronbach’s alpha and revelle’s beta are also reported in the table, which are lower across the board, compared to the polychoric-based estimates. table 2: reliability estimates for the cognitive emotion regulation questionnaire-short scales. confirmatory factor analysis results for the two confirmatory factor analytic models that were tested are reported in table 3. we found reasonable support for the nine-factor model only. the nine-factor higher order model was not supported as evidenced by the weak goodness-of-fit values.
as such, we proceeded to examine the factor loadings of the basic nine-factor model, as reported in table 4. table 3: goodness-of-fit statistics for the nine-factor and nine-factor higher order models. table 4: standardised and unstandardised coefficients for the items of the cognitive emotion regulation questionnaire-short. all items were statistically significant and had good standardised loadings on their expected factors, ranging between 0.601 and 0.867 (tabachnick & fidell, 2007). inspection of the correlated residuals revealed five larger than 0.10 (kline, 2011). however, in each case there was no apparent content overlap on the item pairs to justify making modifications to the model. overall, the results suggest that the sub-facets of the cerq are well defined by their items. discussion the purpose of this study was to examine the reliability and internal factor structure of the cerq-short. while previous research has found fairly good support for the reliability and construct validity of the cerq-full and short versions in a different population (garnefski & kraaij, 2006, 2007; garnefski et al., 2001), no studies have been conducted to examine its utility in the south african context. this article sought to do this for the cerq-short. in terms of reliability, most scales yielded acceptable to good reliability; however, rumination (α = 0.58), refocusing on planning (α = 0.65) and putting-into-perspective (α = 0.64) had reliability coefficients that were weaker than expected. this is substantially lower than what was previously found by garnefski and kraaij (2006), who reported an alpha coefficient of 0.79 for rumination, positive refocusing and putting-into-perspective. the remaining scales had acceptable reliabilities, ranging between 0.71 and 0.81, which are consistent with previous findings (garnefski & kraaij, 2006). 
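to put the two-item alpha values discussed above in perspective, the following sketch relates alpha to the inter-item correlation and shows how a subscale score is formed. it assumes that alpha is computed in its standardised form from a correlation matrix (as is the case when a polychoric matrix is used); the function names are hypothetical and the code is not the analysis script used in this study.

```python
def subscale_score(item_a, item_b):
    """Sum two Likert responses (1-5 each) into a subscale score (2-10)."""
    for response in (item_a, item_b):
        if response not in (1, 2, 3, 4, 5):
            raise ValueError("responses must be integers from 1 to 5")
    return item_a + item_b


def standardized_alpha(n_items, mean_r):
    """Standardised Cronbach's alpha from the mean inter-item correlation."""
    if n_items < 2:
        raise ValueError("alpha requires at least two items")
    return n_items * mean_r / (1 + (n_items - 1) * mean_r)


def implied_mean_r(n_items, alpha):
    """Invert the formula: mean inter-item correlation implied by alpha."""
    return alpha / (n_items - (n_items - 1) * alpha)


# For a two-item scale, alpha reduces to 2r / (1 + r).
r_low = implied_mean_r(2, 0.58)    # lowest alpha discussed above
r_high = implied_mean_r(2, 0.82)   # upper end of the reported range
print(round(r_low, 2), round(r_high, 2))
```

under this assumption, an alpha of 0.58 on a two-item scale corresponds to an inter-item correlation of about 0.41, and an alpha of 0.82 to a correlation of about 0.69, which illustrates how strongly two-item reliability depends on a single correlation.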
regarding the factor structure of the cerq-short, results of the basic nine-factor model provided reasonable support for the separation of items into nine different scales. however, a nine-factor higher order model did not provide adequate fit for the data, and suggests that cognitive emotion regulation as measured by the cerq-short is best considered a multidimensional rather than a unidimensional construct. this is reflected by the weak goodness-of-fit values observed for the nine-factor higher order model which does not support the idea that cognitive emotion regulation can be considered a unidimensional latent construct. however, the present data suggest that it can be indexed using a multidimensional approach as represented by the nine-factor model. this model supports the cerq-short as a measure comprising meaningful constructs, with all items having strong loadings on their respective factors. importantly, this study examined the cerq-short using cfa. previous research mostly used principal components analysis, with one exception in garnefski and kraaij (2007), whose findings were insufficiently reported. principal components analysis represents formative modelling, whereas cfa comprises reflective modelling (fleuren, van amelsvoort, zijlstra, de grip, & kant, 2018; howell, breivik, & wilcox, 2007; jarvis, mackenzie, & podsakoff, 2003). the use of principal components analysis rather than cfa, arguably, represents theoretical misspecification. a clear a priori theoretical structure (garnefski & kraaij, 2006; garnefski et al., 2001) along with a consideration of typical criteria for reflective modelling (fleuren et al., 2018) suggested that cfa was a more appropriate method with which to investigate the factor structure of the cerq-short. 
although previous research conducted in a different country reported satisfactory measurement properties for the cerq-short (garnefski & kraaij, 2006), its functioning in south africa was found to be slightly weaker in general. while the reliability for six out of nine scales was acceptable, reliability for the three remaining scales was not satisfactory – especially problematic was the rumination scale. it is possible that a latent variable estimate of reliability, such as mcdonald’s (1999) omega, could perhaps paint a different picture than the cronbach’s alpha coefficients; however, with two items per scale, it was not possible to compute. this study has some limitations that should be considered when interpreting the results. the data collected were from an urban student population in one part of south africa. english was not the first language of all participants, which may have impacted how questions were understood and interpreted. the sample consisted of university students who may not represent the general south african population and thus may limit the generalisability of the results. while this study focused on the cerq-short, future research using similar analyses is required on the full version of the measure. as such, the findings of this study should not be extrapolated to the full version. future research should also explore if the measure functions equivalently across different gender, ethnic and language groups. this would subsequently allow investigation of group differences in cognitive emotion regulation. conclusion considered together, the results of this study showed reasonable support for the reliability and construct validity of the cerq-short, with noted exceptions. it should be borne in mind that these findings likely represent a best-case scenario given the use of polychoric correlations. in comparison to the polychoric-based results, the pearson-based reliability estimates reported in table 2 were much less promising. 
the goodness-of-fit statistics, in particular the incremental fit indices (cfi and tli), are also not above reproach. nevertheless, the cerq-short offers some promise for use in the south african context. without data on the full version, one can only speculate about how it might function in south africa. from previous work conducted elsewhere, it seems likely that the full version would provide more stable measurement than the short version. however, in the absence of empirical data, this remains a conjecture, and the short version is recommended for use in south africa until research that supports the full version is produced. acknowledgments competing interests the authors declare that they have no financial or personal relationships that may have inappropriately influenced them in writing this article. authors’ contributions both i.p. and c.j.j.v.z. worked on the entire manuscript. i.p. focused on the literature review and discussion and c.j.j.v.z. focused on the method and results. funding this research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors. disclaimer the views expressed in this article are the authors’ own and are not the official position of their institution. data availability statement the data for this study is not publicly available. references aldao, a. (2012). emotion regulation strategies as transdiagnostic processes: a closer look at the invariance of their form and function. spanish journal of clinical psychology, 17(3), 261–277. retrieved from http://e-spacio.uned.es/fez/eserv/bibliuned:psicopat-2012-17-3-6025/documento.pdf beauducel, a., & herzberg, p.y. (2006). on the performance of maximum likelihood versus means and variance adjusted weighted least squares estimation in cfa. structural equation modeling, 13(2), 186–203. https://doi.org/10.1207/s15328007sem1302_2 bebko, g.m., ochsner, k.n., franconeri, s.l., & chiao, j.y. (2014). 
attentional deployment is not necessary for successful emotion regulation via cognitive reappraisal or expressive suppression. emotion, 14(3), 504–512. https://doi.org/10.1037/a0035459 bentler, p.m. (1990). comparative fit indexes in structural models. psychological bulletin, 107(2), 238–246. https://doi.org/10.1037/0033-2909.107.2.238 davidson, r.j. (1998). affective style and affective disorders: perspectives from affective neuroscience. cognition and emotion, 12(3), 307–330. https://doi.org/10.1080/026999398379628 fleuren, b.p.i., van amelsvoort, l.g.p.m., zijlstra, f.r.h., de grip, a., & kant, i. (2018). handling the reflective-formative measurement conundrum: a practical illustration based on sustainable employability. journal of clinical epidemiology, 103, 71–81. https://doi.org/10.1016/j.jclinepi.2018.07.007 fitzpatrick, s., khoury, j.e., & kuo, j.r. (2018). examining the relationship between emotion regulation deficits and borderline personality disorder features: a daily diary study. counselling psychology quarterly, 31, 42–58. https://doi.org/10.1080/09515070.2016.1211509 gadermann, a.m., guhn, m., & zumbo, b.d. (2012). estimating ordinal reliability for likert-type and ordinal item response data: a conceptual, empirical, and practical guide. practical assessment, research & evaluation, 17(3). retrieved from https://www.pareonline.net/getvn.asp?v=17&n=3 garnefski, n., & kraaij, v. (2006). cognitive emotion regulation questionnaire – development of a short 18-item version (cerq-short). personality and individual differences, 41, 1045–1053. https://doi.org/10.1016/j.paid.2006.04.010 garnefski, n., & kraaij, v. (2007). the cognitive emotion regulation questionnaire: psychometric features and prospective relationships with depression and anxiety in adults. european journal of psychological assessment, 23(3), 141–149. https://doi.org/10.1027/1015-5759.23.3.141 garnefski, n., kraaij, v., & spinhoven, p. (2001).
negative life events, cognitive emotion regulation and emotional problems. personality and individual differences, 30, 1311–1327. https://doi.org/10.1016/s0191-8869(00)00113-6 garnefski, n., van den kommer, t., kraaij, v., teerds, j., legerstee, j., & onstein, e. (2002). the relationship between cognitive emotion regulation strategies and emotional problems. european journal of personality, 16, 403–420. gratz, k.l., & roemer, l. (2004). multidimensional assessment of emotion regulation and dysregulation: development, factor structure, and initial validation of the difficulties in emotion regulation scale. journal of psychopathology and behavioral assessment, 26(1), 41–54. https://doi.org/10.1023/b:joba.0000007455.08539.94 gross, j.j. (1998). the emerging field of emotion regulation: an integrative review. review of general psychology, 2(3), 271–299. https://doi.org/10.1037/1089-2680.2.3.271 gross, j.j. (2014). emotion regulation: conceptual and empirical foundations. in j.j. gross (ed.), handbook of emotion regulation (2nd edn., pp. 3–10). new york: guilford press. gross, j.j., & jazaieri, h. (2014). emotion, emotion regulation, and psychopathology: an affective science perspective. clinical psychological science, 2(4), 387–401. https://doi.org/10.1177/2167702614536164 gross, j.j., & john, o.p. (2003). individual differences in two emotion regulation processes: implications for affect, relationships, and well-being. journal of personality and social psychology, 85(2), 348–362. https://doi.org/10.1037/0022-3514.85.2.348 gruber, j., hay, a.c., & gross, j.j. (2014). rethinking emotion: cognitive reappraisal is an effective positive and negative emotion regulation strategy in bipolar disorder. emotion, 14(2), 388–396. https://doi.org/10.1037/a0035249 hofmann, s.g., & kashdan, t.b. (2010). the affective style questionnaire: development and psychometric properties. journal of psychopathology and behavioral assessment, 32, 255–263. 
https://doi.org/10.1007/s10862-009-9142-4 howell, r.d., breivik, e., & wilcox, j.b. (2007). reconsidering formative measurement. psychological methods, 12, 205–218. https://doi.org/10.1037/1082-989x.12.2.205 hu, l., & bentler, p.m. (1999). cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. structural equation modeling, 6(1), 1–55. https://doi.org/10.1080/10705519909540118 jarvis, c.b., mackenzie, s.b., & podsakoff, p.m. (2003). a critical review of construct indicators and measurement model misspecification in marketing and consumer research. journal of consumer research, 30, 199–218. https://doi.org/10.1086/376806 jermann, f., van der linden, m., d’acremont, m., & zermatten, a. (2006). cognitive emotion regulation questionnaire (cerq) confirmatory factor analysis and psychometric properties of the french translation. european journal of psychological assessment, 22(2), 126–131. https://doi.org/10.1027/1015-5759.22.2.126 john, o.p., & eng, j. (2014). three approaches to individual differences in affect regulation: conceptualizations, measures and findings. in j.j. gross (ed.), handbook of emotion regulation (2nd edn., pp. 321–345). new york: the guilford press. joormann, j., & siemer, m. (2004). memory accessibility, mood regulation, and dysphoria: difficulties in repairing sad mood with happy memories? journal of abnormal psychology, 113(2), 179–188. https://doi.org/10.1037/0021-843x.113.2.179 klemanski, d.h., curtiss, j., mclaughlin, k.a., & nolen-hoeksema, s. (2017). emotion regulation and the transdiagnostic role of repetitive negative thinking in adolescents with social anxiety and depression. cognitive therapy and research, 41, 206–219. https://doi.org/10.1007/s10608-016-9817-6 kline, r.b. (2011). principles and practice of structural equation modelling (2nd edn.). new york: guilford press. koole, s.l. (2009). the psychology of emotion regulation: an integrative review.
cognition and emotion, 23(1), 4–41. https://doi.org/10.1080/02699930802619031 lavender, j.m., tull, m.t., dilillo, d., messman-moore, t., & gratz, k.l. (2015). development and validation of a state-based measure of emotion dysregulation: the state difficulties in emotion regulation scale (s-ders). assessment, 24(2), 197–209. https://doi.org/10.1177/1073191115601218 mallorqui-bague, n., vintro-alcaraz, c., sanchez, i., riesco, n., aguera, z., granero, r., … fernandez-aranda, f. (2017). emotion regulation as a transdiagnostic feature among eating disorders: cross-sectional and longitudinal approach. european eating disorders review, 26(1), 53–61. https://doi.org/10.1002/erv.2570 martin, r.c., & dahlen, e.r. (2005). cognitive emotion regulation in the prediction of depression, anxiety, stress, and anger. personality and individual differences, 39, 1249–1260. https://doi.org/10.1016/j.paid.2005.06.004 mcdonald, r.p. (1999). test theory: a unified treatment. mahwah, nj: lawrence erlbaum associates. nelis, d., quoidbach, j., hansenne, m., & mikolajczak, m. (2011). measuring individual differences in emotion regulation: the emotion regulation profile-revised (erp-r). psychologica belgica, 51(1), 49–91. https://doi.org/10.5334/pb-51-1-49 niven, k., totterdell, p., stride, c.b., & holman, d. (2011). emotion regulation of others and self (eros): the development and validation of a new individual difference measure. current psychology, 30(1), 53–73. https://doi.org/10.1007/s12144-011-9099-9 phillips, k.f.v., & power, m.j. (2007). a new self-report measure of emotion regulation in adolescents: the regulation of emotions questionnaire. clinical psychology and psychotherapy, 14, 145–156. https://doi.org/10.1002/cpp.523 revelle, w., & condon, d.m. (2018, june 10). reliability from alpha to omega: a tutorial. https://doi.org/10.31234/osf.io/2y3w9 roger, d., & nesshoever, w. (1987). the construction and preliminary validation of a scale for measuring emotional control.
personality and individual differences, 8(4), 527–534. https://doi.org/10.1016/0191-8869(87)90215-7 steiger, j.h., & lind, j.m. (1980). statistically based tests for the number of factors. paper presented at the annual meeting of the psychometric society, iowa city, ia. tabachnick, b.g., & fidell, l.s. (2007). using multivariate statistics. boston, ma: allyn & bacon/pearson education. tucker, l.r., & lewis, c. (1973). a reliability coefficient for maximum likelihood factor analysis. psychometrika, 38(1), 1–10. https://doi.org/10.1007/bf02291170 zumbo, b.d., gadermann, a.m., & zeisser, c. (2007). ordinal versions of coefficients alpha and theta for likert rating scales. journal of modern applied statistical methods, 6(1), 21–29. https://doi.org/10.22237/jmasm/1177992180 about the author(s) kevin g.f. thomas department of psychology, faculty of humanities, university of cape town, cape town, south africa lauren baerecke department of psychology, faculty of humanities, university of cape town, cape town, south africa chen y. pan department of psychology, faculty of humanities, university of cape town, cape town, south africa helen l. ferrett department of psychiatry, faculty of health sciences, stellenbosch university, stellenbosch, south africa citation thomas, k.g.f., baerecke, l., pan, c.y., & ferrett, h.l. (2019). the boston naming test-south african short form, part i: psychometric properties in a group of healthy english-speaking university students. african journal of psychological assessment, 1(0), a15. https://doi.org/10.4102/ajopa.v1i0.15 original research the boston naming test-south african short form, part i: psychometric properties in a group of healthy english-speaking university students kevin g.f. thomas, lauren baerecke, chen y. pan, helen l. ferrett received: 14 june 2019; accepted: 18 sept. 2019; published: 22 nov. 2019 copyright: © 2019. the author(s).
licensee: aosis. this is an open access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. abstract the boston naming test (bnt) is a popular cognitive test designed to detect word-finding difficulties in neurologic disease. however, numerous studies have demonstrated the bnt’s inherent cultural bias and cautioned against uncritical administration outside of north america. there is little research on the bnt performance of south african samples and on ways to make the test culturally fair for use in this country. in this article, we describe the development and psychometric properties of the bnt-south african short form (bnt-sasf). this instrument includes 15 items drawn from the original test pool and judged by a panel of practising neuropsychologists and community members to be culturally appropriate for use in south africa. we administered the standard 60-item bnt and the bnt-sasf to a homogeneous (english-fluent, high socioeconomic status and highly educated) sample of young south african adults. this design allowed us to avoid potentially confounding sociodemographic influences in our evaluation of the instrument’s basic utility. we found that the bnt-sasf demonstrates fundamental psychometric properties that are the equivalent of short forms developed elsewhere. moreover, it appears to measure the same construct as the 60-item bnt while being less culturally biased. we conclude that the bnt-sasf has potential utility in south african assessment settings. it is quick and easy to administer, thus aiding in the rapid screening of patients. moreover, it is cost-effective because its items are drawn from the pool comprising the original test. future research will describe psychometric properties of afrikaans and isixhosa versions of the bnt-sasf and investigate diagnostic validity in dementia patients. 
keywords: boston naming test; cross-cultural neuropsychology; cultural bias; reliability; short form; validity. introduction the boston naming test (bnt; kaplan, goodglass, & weintraub, 1978, 1983, 2001) is a widely used cognitive test designed to detect the serious word-finding difficulties that characterise certain variants of aphasia and dementia. however, numerous studies have suggested that the bnt is culturally biased and cautioned against uncritical administration of the instrument (barker-collo, 2001; fernández & abe, 2018). to date, there is little published research on the bnt performance of south african samples and on ways to make the test culturally fair for use in this country. boston naming test: a brief introduction the bnt tests confrontation naming ability (i.e. the ability to pull out the correct word at will; lezak, howieson, & loring, 2004, p. 511). in its current form, it consists of 60 black-and-white line drawings presented in ascending order of difficulty. the first few items are commonly encountered objects (e.g. bed), whereas the last several are less frequently encountered objects (e.g. protractor). for each item, the examinee is given 20 seconds to produce a correct spontaneous response, after which a semantic cue is offered (e.g. it measures angles). failing the production of a correct response to this cue, a phonemic cue is offered (e.g. it starts with the sound ‘pro’). the most recent revision also features a multiple-choice section. after completing the standard presentation as described above, the examiner returns to each failed item and asks the examinee to select, from an array of four options, the word best describing the pictured object. the bnt is used primarily to assess confrontation naming ability in patients of all ages with neurological deficits stemming from cerebrovascular accidents, traumatic brain injuries and neurodegenerative disorders (kiran et al., 2018; strain et al., 2017; strauss, sherman, & spreen, 2006). 
it is particularly effective in detecting the naming deficits present in alzheimer’s disease (ad) and thus helps distinguish that neurodegenerative disorder from normal aging and from other forms of dementia (balthazar, cendes, & damasceno, 2008; golden et al., 2005). interpretation of bnt performance is complicated by the fact that non-organic factors may impact on scores. for instance, both age and education moderate bnt performance in healthy individuals. scores decline with increasing age, with especially significant deterioration in the oldest old (by conventional definition, those aged 80 years and older; lucas et al., 2005; tombaugh & hubley, 1997; zec, burkett, markwell, & larsen, 2007). scores are also lower in those with fewer years of education, with particularly strong effects at < 12 years (hawkins & bender, 2002; mitrushina, boone, razani, & d’elia, 2005; neils et al., 1995). cross-cultural adaptation and use of the boston naming test as is the case with many other popular neuropsychological tests, the bnt was developed for the assessment of monolingual english-speaking north american individuals and reflects the context in which it was developed. unsurprisingly, then, bnt performance of non-north american samples is markedly poorer than that of north american samples (see, e.g., cruice, worrall, & hickson, 2000; tallberg, 2005). perhaps more surprising is that this cross-cultural difference exists even when evaluating the performance of english speakers from new zealand or australia against north american normative data, or when comparing the performance of white americans to that of african-americans, bilingual spanish/english residents of the united states, or bilingual french/english residents of canada (barker-collo, 2001; fillenbaum, huber, & taussig, 1997; kohnert, hernandez, & bates, 1998; lichtenberg, ross, & christensen, 1994; roberts, garcia, desrochers, & hernandez, 2002). 
often, the source of these performance differences is the cultural relevance of items to test-takers. evidence supporting this statement emerges from studies showing that examinees with different ethnic or cultural backgrounds produce different patterns of errors (allegri et al., 1997; pedraza et al., 2009). moreover, particular items (e.g. beaver, pretzel) appear to be especially culturally loaded: in non-north american samples, error rates on those items are significantly higher than those on adjacent items (i.e. items that should have a similar level of difficulty; barker-collo, 2007; worrall, yiu, hickson, & barnett, 1995). hence, researchers and clinicians across the world have developed culturally modified versions of the test, replacing problematic items with ones more suited to their local contexts (see, e.g., fernández & fulbright, 2015; grima & franklin, 2016; kim & na, 1999; patricacou, psallida, pring, & dipper, 2007). the current study we describe the development of, and present preliminary psychometric data for, the boston naming test-south african short form (bnt-sasf). we chose to develop a short (15-item) form because such instruments aid in the rapid screening of patients. cognitive screening instruments are especially important in the resource-limited and patient-heavy clinics that characterise the south african healthcare system (katzef, henry, gouse, robbins, & thomas, 2019; robbins et al., 2013). moreover, reduced test time facilitates the assessment of patients with limited attention or motivation, and of those with severe neurological impairment who may become easily fatigued or frustrated (roebuck-spencer et al., 2017). there is an extensive precedent for creating a short form of the bnt (fastenau, denburg, & mauer, 1998; kang, kim, & na, 2000; saxton et al., 2000). 
certain 15-item and 30-item short forms appear to have clinical utility, showing high rates of agreement with the full 60-item test in distinguishing dementia patients from healthy older adults (graves, bezeau, fogarty, & blair, 2004; lansing, ivnik, cullum, & randolph, 1999; williams, mack, & henderson, 1989). additionally, age, education and culture moderate performance on these short forms in the same way they do on the full test (jefferson et al., 2007; kent & luszcz, 2002; leite, miotto, nitrini, & yassuda, 2017). we modelled procedures for our short form development on those described by mack, freed, williams and henderson (1992). they created four equivalent 15-item versions by dividing the 60 items of the original test into four 15-item groups, with each group reflecting the original’s full range of content. they reported that each short form successfully differentiated a sample of ad patients from healthy controls. their fourth version, the mack sf-4, is the most globally popular 15-item short form and it is included with the officially published bnt kit. the bnt-sasf comprises 15 items judged by a forum of practising neuropsychologists and community members as being more culturally appropriate for the south african population than those on the mack sf-4. this article is the first to provide a detailed psychometric report on a version of the bnt designed specifically for use in south africa. although mosdell, balchin, and ameen (2010) describe a south african-adapted 30-item form of the bnt, they do not (1) provide reliability or validity information, (2) compare performance on their short form to performance on the full version of the instrument or to performance on previously published short forms, or (3) present item-level analyses. moreover, their adapted instrument features entirely new items, not included on the original bnt, making it somewhat less accessible to clinicians than the bnt-sasf. 
using a relatively homogeneous sample to minimise the influence on bnt performance of potentially confounding factors such as age, education and language background, the current study addressed these specific questions: how does the bnt performance of english-fluent university undergraduate students compare with north american normative standards? do basic psychometric properties of the bnt, as established in its development literature, hold in this south african sample? what is the test–retest and internal consistency reliability of the bnt-sasf? do the items included in the bnt-sasf show the desired properties in terms of, for instance, relative difficulty? methods development of the boston naming test-south african short form the bnt-sasf comprises 15 items drawn from the bnt’s pool of 60 items (table 1). table 1: items comprising the boston naming test-south african short form. to decide which 15 items would constitute the instrument, we consulted via email with 15 fully trained and experienced south african neuropsychologists personally known to us (ten based in the western cape, three in gauteng, one in the eastern cape and one in kwazulu-natal). all were members of the south african clinical neuropsychological association (sacna), and all had used the bnt in their clinical practice for several years. we told them we had divided the pool of 60 items into 15 sets of four items of equivalent difficulty (e.g. items 1–4 formed a set, items 5–8 formed another set, and so on; this procedure ensured the items in the short form would be of increasing difficulty and in a sequence roughly equivalent to the original test). we instructed the neuropsychologists to rate each item in each of the 15 sets according to whether it was culturally appropriate for use in south africa, and to then select the most culturally appropriate of the four items in each set. for instance, the item beaver was one of the options in the eighth set. 
however, this animal is likely to be relatively unfamiliar to the average south african; rhinoceros (another option in the same set) is likely to be more culturally appropriate. after taking the consensus of views, we settled on the final version of the bnt-sasf. a team of linguists translated and back-translated this modified test from english into afrikaans and isixhosa, the other two languages most widely spoken in the western cape. to ensure that the isixhosa version was appropriate for use in that province, we consulted with a small forum of community members (five women, aged from the mid-20s to mid-60s, all first-language isixhosa speakers) from khayelitsha and gugulethu. we report in more detail on those versions of the bnt-sasf in forthcoming publications. participants we used convenience sampling to recruit and screen 104 undergraduate students. forty-five did not meet the eligibility criteria listed below. hence, the final sample consisted of 59 participants (24 men and 35 women). they received course credit in exchange for participation. participants were required to (1) be aged between 18 and 25 years; (2) speak english as a first language; (3) have matriculated from a south african quintile 4 or quintile 5 public high school (or the relative equivalent if schooled elsewhere)1 or from a private high school in south africa, and have gained entry into university; (4) have their home residence in a suburb with a median annual income of ≥ r76 801 (statistics south africa, 2011) and (5) make themselves available for one of the research slots listed on the online schedule distributed to them. we set inclusion criteria related to quality of education and socioeconomic status (ses) in place because, although there is not a large literature detailing their influence on bnt performance, numerous studies describe their general and significant relations to cognitive performance (see, e.g., crowe et al., 2012; lyu & burr, 2016).
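The item-selection procedure described under the development of the short form (60 items divided into 15 consecutive sets of four, with one item chosen per set) can be sketched as follows. This is a minimal illustration, not the authors' actual procedure: the function names are hypothetical, and the majority-vote rule with ties going to the lowest item number is an assumption, since the article says only that a consensus of views was taken.

```python
def partition_into_sets(n_items=60, set_size=4):
    """Split item numbers 1..n_items into consecutive sets of set_size.

    Consecutive sets keep the short form in roughly the same
    easy-to-hard order as the full test.
    """
    return [list(range(start, start + set_size))
            for start in range(1, n_items + 1, set_size)]


def pick_by_consensus(item_sets, votes):
    """Pick one item per set: the item most raters judged appropriate.

    `votes` maps item number -> number of raters preferring that item.
    ASSUMPTION: a simple plurality rule; items without votes count as 0,
    and ties resolve to the lowest item number in the set.
    """
    return [max(items, key=lambda item: votes.get(item, 0))
            for items in item_sets]


# Hypothetical usage: vote counts below are invented for illustration.
sets = partition_into_sets()
short_form = pick_by_consensus(sets, {29: 2, 31: 10})
```

Because every set contributes exactly one item, the resulting short form always has 15 items spread across the full difficulty range of the original test.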
we excluded individuals with a current prescription for psychotropic medication and/or a history of psychiatric diagnosis; a history of pre-natal or birth complications; a history of head injury that resulted in a loss of consciousness for more than 5 min; seizure disorders; substance-use disorders; a history of medical illness that resulted in loss of cognitive functioning; or language, speech or behavioural disorders. we also excluded those who had been administered psychometric tests in the 12 months prior to study enrolment. again, we set these exclusion criteria in place because these factors influence cognitive test performance (mitrushina et al., 2005; strauss et al., 2006). measures and procedure each participant was tested individually, across two sessions separated by exactly 2 weeks, in a quiet testing room within a psychology research laboratory. a psychology graduate student administered all study procedures. test occasion 1 (t1) upon entering the laboratory, the researcher ensured the participant read, understood and signed an informed consent document. before administering the psychological tests, the researcher ensured that the participant completed a study-specific sociodemographic questionnaire. this instrument gathered biographical, socioeconomic and medical information needed for screening purposes. those meeting the eligibility criteria were administered the 60-item bnt according to the standardised instructions that appear in the test manual (kaplan et al., 2001), with this exception: the test administrator presented all 60 items, in order from item 1 through item 60 (i.e. the usual starting point and discontinuation rules were not applied). we followed this procedure to ensure that performance on all 60 items could be examined statistically. the bnt-sasf was not administered as a separate measure to participants. instead, we derived a score for the instrument from the performance on relevant items within the full bnt administration. 
at the end of the test administration, the researcher scheduled an appointment for the second test session. test occasion 2 (t2) immediately after entering the laboratory, participants were reminded of their research rights and they were then administered the bnt (including, of course, the 15 items that constituted the bnt-sasf). ethical considerations the study protocol was approved by our institution’s review board. all procedures were conducted in compliance with the declaration of helsinki (world medical association, 2013). our consent document gave participants complete information about the study procedures, assured them of their rights to privacy and to confidentiality of their data and informed them that they could withdraw from the study at any point without penalty. the document also informed them about their course credit compensation and about the minimal risks they would face during participation. finally, participants were fully debriefed at the end of t2 and given the opportunity to ask any questions relating to their experience of the research. all study procedures were approved by the university of cape town’s department of psychology research ethics committee (clearance number psy2019-005). data management and statistical analyses we scored the 60-item bnt and the bnt-sasf using conventional methods (i.e. the total score for each instrument is the sum of the number of correct spontaneous responses and the number of correct responses following a stimulus cue). we entered those outcome variables, along with the score for each item (0 or 1), into a datasheet. we analysed the data using spss (version 25.0), with the threshold for statistical significance (α) set at 0.05. analyses of the bnt and bnt-sasf data proceeded across four discrete steps. 
first, two separate one-sample t-tests compared bnt performance of the current sample at t1 to average bnt performance of highly educated young adults from north america and new zealand; and three separate paired-sample t-tests compared the t1 performance of the current sample on the 15 items comprising the bnt-sasf to their t1 performance on 15 items comprising previously established short forms. second, spearman’s ρ estimated test–retest reliability for each instrument across the 2-week interval between t1 and t2. (we used this coefficient, rather than pearson’s product-moment correlation coefficient, because test scores were non-normally distributed.) third, cronbach’s α estimated internal consistency reliability for the t1 data. fourth, we investigated item-by-item performance on both instruments by creating a difficulty index for each item (i.e. calculating, for each item across the entire sample, the proportion of correct responses produced either spontaneously or following the presentation of a semantic cue). several previous bnt studies have calculated the difficulty index in this way (see, e.g., franzen, haut, rankin, & keefover, 1995; tombaugh & hubley, 1997). the desired trend is for the proportion of correct responses to decrease (i.e. for the items to become more difficult) as the test progresses. for the 60-item bnt, we compared the difficulty index for each item to similar data from previously published research to help identify items that may be particularly problematic in the south african context. results sample characteristics participants ranged in age from 18 to 24 years (m = 19.98 ± 1.68). they had completed between 12 and 17 years of education (m = 13.31 ± 1.12). the modal annual income bracket for participants’ suburb of residence was r153 601.00 – r307 200.00.
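The reliability and item-level steps of the analysis plan above (Spearman's ρ, Cronbach's α and the item difficulty index) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' SPSS workflow: the function names are hypothetical, `scores` is assumed to be a participants-by-items matrix of 0/1 item scores (1 = correct spontaneously or after a cue), and items with zero variance are dropped before computing α, as the article reports doing.

```python
from math import sqrt
from statistics import mean, variance


def difficulty_index(scores):
    """Proportion of participants answering each item correctly."""
    n = len(scores)
    return [sum(col) / n for col in zip(*scores)]


def cronbach_alpha(scores):
    """Cronbach's alpha after dropping zero-variance items.

    Needs at least two items with non-zero variance and a
    non-zero variance of the total scores.
    """
    columns = [col for col in zip(*scores) if variance(col) > 0]
    k = len(columns)
    totals = [sum(row) for row in zip(*columns)]
    item_variance_sum = sum(variance(col) for col in columns)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def _ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = average_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = _ranks(x), _ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    return cov / sqrt(sum((a - mx) ** 2 for a in rx)
                      * sum((b - my) ** 2 for b in ry))
```

Dropping zero-variance items changes the number of items k that enters the α formula, so the reported coefficient describes only the items on which at least one participant erred. Test–retest reliability would then be `spearman_rho(t1_totals, t2_totals)`, where the two lists hold each participant's total score at the two test occasions.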
60-item boston naming test: performance, psychometric properties and item analyses at t1, the sample’s mean score was 51.51 (median = 52; mode = 55; sd = 5.33; and range = 35–59). this performance was significantly worse than that of normative samples of young adults from north america but was not significantly different from that of a comparable sample of highly educated young adults from new zealand (table 2). these results must be interpreted with caution, because the current bnt scores at t1 were significantly non-normally distributed, shapiro–wilk test (59) = 0.92, p = 0.001, skewness = −0.93, kurtosis = 0.48. table 2: comparison of the current sample’s boston naming test 60-item performance to that of north american and new zealand normative samples. test–retest reliability was acceptable: t1 and t2 performance were significantly positively associated, spearman’s ρ = 0.41, p = 0.001. internal consistency reliability was better, however, with cronbach’s α at t1 = 0.85. component variables with zero variance (viz., items 1–12, 14–18, 20–25, 31, 43 and 45) were not included in this analysis. figure 1 presents an item difficulty index based on the performance of the current sample at t1. most of the easiest items (i.e. those to which 100% of participants responded correctly) are clustered at the beginning of the test. although there is a roughly linear trend towards more difficult items at the end of the test, it is notable that the line is jagged, with more difficult items (e.g. 28 and 47) interspersed among much easier ones. a comparison of this item difficulty index with that presented by tombaugh and hubley (1997) for their sample suggests that fully one-third of the 60 items might be regarded as culturally biased against south africans (table 3). figure 1: item difficulty index for the current administration, at the first test occasion, of the standard 60-item boston naming test.
data are proportion of correct responses made spontaneously or with stimulus cue for a sample of young english-speaking south african adults (n = 59). comparative data (n = 219 english-speaking canadian adults, age range 25–88, education range = 9–21 years) are from tombaugh and hubley (1997). table 3: boston naming test item difficulty index: current sample versus a north american sample. boston naming test-south african short form: performance, psychometric properties and item analyses at t1, the sample’s mean score was 13.97 (median = mode = 14; sd = 1.08; range = 11–15). this score was at least as good as the score they would have achieved on three other well-established 15-item short forms; in two of the three cases, it was significantly higher (table 4). these results must be interpreted with caution, however, because bnt-sasft1 scores were significantly non-normally distributed, shapiro–wilk test (59) = 0.82, p < 0.001, skewness = −1.03, kurtosis = 0.49. table 4: comparison of the current sample’s boston naming test-south african short form performance with that on other 15-item short forms (n = 59). analyses detected a significant positive association between bntt1 and bnt-sasft1 scores, spearman’s ρ = 0.66, p < 0.001. the estimate of test–retest reliability was confounded, however, because performance at t2 was better than that at t1 by at least one point in 77% of participants. hence, performance at t1 was significantly negatively associated with that at t2, spearman’s ρ = −0.39, p = 0.037. internal consistency reliability was poor, cronbach’s αt1 = 0.35. again, component variables with zero variances (viz., items 2, 7, 10, 15, 20, 22, 25, 31) were not included in this analysis. figure 2 presents an item difficulty index based on the performance of the current sample at t1. the trend for increasing errors as the test progresses is evident. 
whereas all participants responded correctly to the first 8 items, there were increasing numbers of errors from items 11 through 15 (with the exception of item 12 [funnel], which appeared to be much more familiar to this sample than the items adjacent to it). figure 2: item difficulty index for the current administration of the 15-item boston naming test-south african short form. discussion the boston naming test has, for decades, been one of the most widely used neuropsychological tests (rabin, barr, & burton, 2005; rabin, paolillo, & barr, 2016). despite its global reach and popularity, many of the test’s items are heavily culture-bound. hence, there is a high risk for misdiagnosis of naming deficits when the bnt is used to assess individuals outside of north america (cruice et al., 2000; tallberg, 2005). the current study describes the development of, and preliminary psychometric properties for, a south african-adapted version of the bnt. because local clinical conditions demand shorter and simpler forms of test administration, the bnt-sasf contains 15 items. these items were judged by a panel of practising neuropsychologists and community members to be culturally appropriate for local use. we administered the standard 60-item bnt, which incorporates the bnt-sasf, to a homogenous (english-fluent, high-ses, highly educated) sample of young adults. we reasoned that such a design, featuring the segment of the south african population that most closely matches north american normative samples, would allow us to avoid potentially confounding sociodemographic influences and to thus draw inferences about the basic utility of the bnt-sasf in this country. our analyses of bnt-sasf data suggested the instrument tests the same construct as other versions of the instrument. 
most participants scored 14/15 at the first administration, a high level of performance that is consistent with north american samples administered different 15-item short forms (fastenau et al., 1998; lansing et al., 1999; mack et al., 1992; tombaugh & hubley, 1997). moreover, the performance of our participants on the 15 items comprising the bnt-sasf was better than their performance on the 15 items comprising other well-known short forms that were developed outside of south africa and, therefore, without consideration of local cultural and contextual factors. boston naming test-south african short form scores were significantly positively associated with 60-item bnt scores, with the value of the correlation coefficient (ρ = 0.66) within the range reported in the literature on other 15-item short forms. that range spans values from 0.62 for the cerad short form (tombaugh & hubley, 1997), through 0.74 for the mack sf-4 (fastenau et al., 1998), and up to > 0.95 for all mack short forms (franzen et al., 1995). the current correlation would have been stronger had performance on the 60-item bnt been as good as that on the short form. as discussed below, many of the 60 items proved to be relatively problematic for our participants and so their scores were relatively poor on the full instrument. any discrepancy in favour of the bnt-sasf over the bnt might be interpreted as an indication of success in removing culturally biased items from the instrument. further evidence for the content validity of the bnt-sasf emerges from the item difficulty index created using the performance of the current sample. that index suggested that earlier items were relatively easy whereas later items were relatively difficult (with the last two items being the most difficult). this difficulty trend is what the bnt developers intended and the fact that performance on our 15-item version displays that trend is encouraging. 
although the internal consistency reliability of the bnt-sasf was quite low (cronbach’s α = 0.35), the value of this estimate is in the same range as what tombaugh and hubley (1997) report for the cerad short form and the mack sf-4 (α = 0.36 and 0.49, respectively). it is unsurprising that these values are relatively low, given that the internal consistency of a test is strongly related to its length (i.e. tests with more items are typically more internally consistent; cohen & swerdlik, 2018). this is one reason why some in this field prefer 30-item short forms over 15-item short forms (williams et al., 1989). a more prominent concern, however, is the relatively poor test–retest reliability (ρ = −0.39) of the bnt-sasf. as we note above, this value is influenced by the fact that most participants performed better at t2 than at t1 (perhaps as a result of carryover effects, specifically the administration of phonemic and multiple-choice cues at t1). such poor test–retest reliability is not a typical feature of 15-item bnt short forms. for instance, teng et al. (1989) reported a value of 0.90 over a 1-week interval for a sample of patients with ad. it is unclear, however, whether they followed standard administration procedures at both test occasions, as we did. our analyses of the current sample’s 60-item bnt data confirmed that the instrument’s inherent cultural biases make it unsuitable, in its original and unmodified form, for administration in south african clinical and research settings. we found, for instance, that the overall performance of our sample of english-fluent, high-ses, highly educated participants was significantly worse than that of comparable samples of young adults from north america and that the root of this performance difference was the difficulty our participants experienced on culturally bound items such as wreath, beaver and yoke.
this result replicates those of numerous previous studies reporting on cross-cultural administration of the bnt (see, e.g., barker-collo, 2001; worrall et al., 1995). regarding reliability of the 60-item bnt in the current sample, findings were mixed. whereas internal consistency reliability (α = 0.85) was within the range most commonly cited as an acceptable value for this statistic (cohen & swerdlik, 2018), and was comparable to the coefficient (α = 0.78) reported by tombaugh and hubley (1997), test–retest reliability (ρ = 0.41), although statistically significant, was relatively low compared to previous studies. for instance, flanagan and jackson (1997) reported a value of 0.90 over a 1–2-week interval for a sample of healthy older adults. other studies of neurologically intact older adults suggest that this excellent test–retest reliability is maintained over much longer intervals (mitrushina et al., 2005). unfortunately, previous bnt investigations of healthy young adult samples do not provide reliability data. one possible reason for the relatively poor test–retest reliability in this sample is that our participants were farther away from ceiling effects at t1 than those in other samples, and improved significantly at t2 (again, perhaps as a result of carryover effects). statistical comparison of t1 and t2 performance bears out this account, t = −1.47, p = 0.15, cohen’s d = 0.27. limitations and directions for future research the inferences we might draw from this study are limited by the size and nature of the sample. compared with other studies that collected original data in developing bnt short forms (e.g. fastenau et al., 1998; graves et al., 2004), our sample size was smaller. moreover, the sample was not representative of the national population, or even of the population of south african undergraduates (note that 45 of the 104 individuals we recruited did not meet our very strict eligibility criteria). 
however, the purpose of this study was not to collect nationally representative normative data, or to make generalised statements about the utility of the bnt-sasf. instead, we intentionally recruited a homogeneous group of participants so as to avoid the confounding effects of sociodemographic variables (e.g. age, education and home language) on performance, and then set out to show (as a first step in a meticulous process of psychometric investigation) that this new instrument is reliable and valid in a south african sample that is, broadly speaking, comparable to those used in most north american normative studies. a second limitation is that, for at least two reasons, we cannot make definitive statements about the construct validity of the bnt-sasf. first, the magnitude of the correlation between bnt and bnt-sasf scores might be spuriously high as a result of method variance. second, we did not administer independent tests of confrontation naming ability (e.g. the naming test of the neuropsychological assessment battery; yochim, kane, & mueller, 2009). we chose not to do so because all existing tests of that cognitive construct are of the same form (i.e. the participant views an image and is asked to identify the pictured object). hence, comparative analyses of performance on the bnt and any of those tests runs the risk of being confounded by common method variance. a third limitation is that, rather than collecting original cross-national data, we used historical data when comparing performance of the current sample to that of adults from other countries. such historical comparisons are vulnerable to cohort effects and it is possible that we observed a minor instance of such effects here. for example, whereas 100% of our participants identified unicorn correctly, only 90% of tombaugh and hubley’s (1997) sample and 83% of barker-collo’s (2001) sample did so. 
the relative easiness of this item in the 2019 group might be attributed to the relatively more prominent place unicorns have in contemporary popular culture (segran, 2017). one remedy for such circumstances is to engage in what fernández and abe (2018, p. 1) term ‘simultaneous test development across multiple cultures’. follow-up studies of the bnt-sasf are already underway. in future articles, we will describe the psychometric properties of afrikaans and isixhosa versions of the instrument, report on how performance is influenced by age, education and ses, and investigate diagnostic validity in samples of healthy older adults and dementia patients. we encourage independent research groups to develop versions of the instrument appropriate for their own linguistic contexts, and to collaborate in collecting nationally representative and appropriately stratified normative data. summary and conclusion neuropsychological tests developed, standardised and normed in high-income countries of the global north often deliver misleading results when used outside of their sociocultural and linguistic context of origin (howieson, 2019; nell, 2000). this is especially true when the tests are used without critical consideration of cultural bias and cultural fairness, when construct validity in the local context has not been verified, or when locally appropriate normative data are not used. the need for cognitive tests that are reliable, valid, and culturally fair for use in south african clinical and research settings is growing. increasing numbers of neuropsychology trainees are entering the field. increasing amounts of overseas grant money are being invested into south african-based neuroscience research but funded projects must use psychometrically sound instruments that are well known to international audiences. here, we described the development and psychometric assessment of a south african-adapted short form of the bnt. 
a key aspect of the bnt-sasf’s value is that its items are drawn from the pool of items comprising the original test. this makes it a time- and cost-effective option on many levels (e.g. we did not have to curate an entirely new set of items, and those who already own the standard bnt will be able to use this modified short form without purchasing any new materials). these are particularly important considerations when one is operating in a resource-limited setting such as south africa. another advantage of this short form is that, unlike many other short forms that are developed via odd–even or split–half methods, this one was developed on an item-by-item basis, which lends itself to evaluation by item response theory (pedraza et al., 2009). our data suggest that the bnt-sasf demonstrates basic psychometric properties that are the equivalent of short forms developed elsewhere in the world. moreover, it appears to measure the same construct as the full 60-item bnt while being less culturally biased. acknowledgements competing interests the authors have declared that no competing interests exist. authors’ contributions k.g.f.t. designed the research, supervised the data analysis and wrote the first draft of the manuscript. l.b. contributed to research design, collected and analysed the data, wrote parts of the manuscript and approved the final version. c.y.p. collected and analysed the data, wrote parts of the manuscript, contributed to manuscript preparation and approved the final version. h.l.f. led the research project and approved the final version to be published. funding information this research was supported by the university of cape town (university research scholarship, awarded to l.b.) and the stellenbosch university strengthening research initiative programme (junior research fellowship, awarded to h.l.f.). data availability statement data are available upon request.
disclaimer
the views and opinions expressed in this article are those of the authors and do not necessarily reflect the official policy or position of any affiliated agency of the authors.

references
allegri, r.f., villavicencio, a.f., taragano, f.e., rymberg, s., mangone, c.a., & baumann, d. (1997). spanish boston naming test norms. the clinical neuropsychologist, 11(4), 416–420. https://doi.org/10.1080/13854049708400471 balthazar, m.l.f., cendes, f., & damasceno, b.p. (2008). semantic error patterns on the boston naming test in normal aging, amnestic mild cognitive impairment, and mild alzheimer’s disease: is there semantic disruption? neuropsychology, 22(6), 703–709. https://doi.org/10.1037/a0012919 barker-collo, s.l. (2001). the 60-item boston naming test: cultural bias and possible adaptations for new zealand. aphasiology, 15(1), 85–92. https://doi.org/10.1080/02687040042000124 barker-collo, s.l. (2007). boston naming test performance of older new zealand adults. aphasiology, 21(12), 1171–1180. https://doi.org/10.1080/02687030600821600 cohen, r.j., & swerdlik, m.e. (2018). psychological testing and assessment: an introduction to tests and measurement (9th edn.). new york: mcgraw-hill. crowe, m., clay, o.j., martin, r.c., howard, v.j., wadley, v.g., sawyer, p., & allman, r.m. (2012). indicators of childhood quality of education in relation to cognitive function in older adulthood. the journals of gerontology: series a, 68(2), 198–204. https://doi.org/10.1093/gerona/gls122 cruice, m.n., worrall, l.e., & hickson, l.m.h. (2000). boston naming test results for healthy older australians: a longitudinal and cross-sectional study. aphasiology, 14(2), 143–155. https://doi.org/10.1080/026870300401522 farmer, a. (1990). performance of normal males on the boston naming test and the word test. aphasiology, 4(3), 293–296. https://doi.org/10.1080/02687039008249081 fastenau, p.s., denburg, n.l., & mauer, b.a. (1998).
parallel short forms for the boston naming test: psychometric properties and norms for older adults. journal of clinical and experimental neuropsychology, 20(6), 828–834. fernández, a.l., & abe, j. (2018). bias in cross-cultural neuropsychological testing: problems and possible solutions. culture and brain, 6(1), 1–35. https://doi.org/10.1007/s40167-017-0050-2 fernández, a.l., & fulbright, r.l. (2015). construct and concurrent validity of the spanish adaptation of the boston naming test. applied neuropsychology: adult, 22(5), 355–362. https://doi.org/10.1080/23279095.2014.939178 fillenbaum, g.g., huber, m., & taussig, i.m. (1997). performance of elderly white and african american community residents on the abbreviated cerad boston naming test. journal of clinical and experimental neuropsychology, 19(2), 204–210. https://doi.org/10.1080/01688639708403851 flanagan, j.l., & jackson, s.t. (1997). test-retest reliability of three aphasia tests: performance of non-brain-damaged older adults. journal of communication disorders, 30(1), 33–42. https://doi.org/10.1016/s0021-9924(96)00039-1 franzen, m.d., haut, m.w., rankin, e., & keefover, r. (1995). empirical comparison of alternate forms of the boston naming test. the clinical neuropsychologist, 9(3), 225–229. golden, z., bouvier, m., selden, j., mattis, k., todd, m., & golden, c. (2005). differential performance of alzheimer’s and vascular dementia patients on a brief battery of neuropsychological tests. international journal of neuroscience, 115(11), 1569–1577. https://doi.org/10.1080/00207450590957953 graves, r.e., bezeau, s.c., fogarty, j., & blair, r. (2004). boston naming test short forms: a comparison of previous forms with new item response theory based forms. journal of clinical and experimental neuropsychology, 26(7), 891–902. https://doi.org/10.1080/13803390490510716 grima, r., & franklin, s. (2016). a maltese adaptation of the boston naming test: a shortened version.
clinical linguistics & phonetics, 30(11), 871–887. https://doi.org/10.1080/02699206.2016.1181106 hawkins, k.a., & bender, s. (2002). norms and the relationship of boston naming test performance to vocabulary and education: a review. aphasiology, 16(12), 1143–1153. https://doi.org/10.1080/02687030244000031 howieson, d. (2019). current limitations of neuropsychological tests and assessment procedures. the clinical neuropsychologist, 33(2), 200–208. https://doi.org/10.1080/13854046.2018.1552762 jefferson, a.l., wong, s., gracer, t.s., ozonoff, a., green, r.c., & stern, r.a. (2007). geriatric performance on an abbreviated version of the boston naming test. applied neuropsychology, 14(3), 215–223. https://doi.org/10.1080/09084280701509166 kang, y., kim, h., & na, d.l. (2000). parallel short forms for the korean–boston naming test (k-bnt). journal of the korean neurological association, 18(2), 144–150. kaplan, e.f., goodglass, h., & weintraub, s. (1978). boston naming test: experimental edition. boston, ma: boston university. kaplan, e.f., goodglass, h., & weintraub, s. (1983). the boston naming test. philadelphia, pa: lea & febiger. kaplan, e.f., goodglass, h., & weintraub, s. (2001). the boston naming test (2nd edn.). philadelphia, pa: lippincott williams & wilkins. katzef, c., henry, m., gouse, h., robbins, r.n., & thomas, k.g.f. (2019). a culturally fair test of processing speed: construct validity, preliminary normative data, and effects of hiv infection on performance in south african adults. neuropsychology, 33(5), 685–700. https://doi.org/10.1037/neu0000539 kent, p.s., & luszcz, m.a. (2002). a review of the boston naming test and multiple-occasion normative data for older adults on 15-item versions. the clinical neuropsychologist, 16(4), 555–574. https://doi.org/10.1076/clin.16.4.555.13916 kim, h., & na, d.l. (1999). normative data on the korean version of the boston naming test. journal of clinical and experimental neuropsychology, 21(1), 127–133. 
https://doi.org/10.1076/jcen.21.1.127.942 kiran, s., cherney, l.r., kagan, a., haley, k.l., antonucci, s.m., schwartz, m., … simmons-mackie, n. (2018). aphasia assessments: a survey of clinical and research settings. aphasiology, 32(suppl 1), 47–49. https://doi.org/10.1080/02687038.2018.1487923 kohnert, k.j., hernandez, a.e., & bates, e. (1998). bilingual performance on the boston naming test: preliminary norms in spanish and english. brain and language, 65(3), 422–440. https://doi.org/10.1006/brln.1998.2001 lansing, a.e., ivnik, r.j., cullum, c.m., & randolph, c. (1999). an empirically derived short form of the boston naming test. archives of clinical neuropsychology, 14(6), 481–487. https://doi.org/10.1016/s0887-6177(98)00022-5 leite, k.s.b., miotto, e.c., nitrini, r., & yassuda, m.s. (2017). boston naming test (bnt) original, brazilian adapted version and short forms: normative data for illiterate and low-educated older adults. international psychogeriatrics, 29(5), 825–833. https://doi.org/10.1017/s1041610216001952 lezak, m.d., howieson, d., & loring, d. (2004). neuropsychological assessment (4th edn.). new york: oxford university press. lichtenberg, p.a., ross, t., & christensen, b. (1994). preliminary normative data on the boston naming test for an older urban population. clinical neuropsychologist, 8(1), 109–111. https://doi.org/10.1080/13854049408401548 lucas, j.a., ivnik, r.j., smith, g.e., ferman, t.j., willis, f.b., petersen, r.c., & graff-radford, n.r. (2005). mayo’s older african americans normative studies: norms for boston naming test, controlled oral word association, category fluency, animal naming, token test, wrat-3 reading, trail making test, stroop test, and judgment of line orientation. the clinical neuropsychologist, 19(2), 243–269. https://doi.org/10.1080/13854040590945337 lyu, j., & burr, j.a. (2016). 
socioeconomic status across the life course and cognitive function among older adults: an examination of the latency, pathways, and accumulation hypotheses. journal of aging and health, 28(1), 40–67. https://doi.org/10.1177/0898264315585504 mack, w.j., freed, d.m., williams, b.w., & henderson, v.w. (1992). boston naming test: shortened versions for use in alzheimer’s disease. journal of gerontology, 47(3), 154–158. https://doi.org/10.1093/geronj/47.3.p154 mitrushina, m., boone, k.b., razani, j., & d’elia, l.f. (2005). handbook of normative data for neuropsychological assessment (2nd edn.). new york: oxford university press. morris, j.c., heyman, a., mohs, r.c., hughes, j.p., van belle, g., fillenbaum, g., … the cerad investigators. (1989). the consortium to establish a registry for alzheimer’s disease (cerad). part i. clinical and neuropsychological assessment of alzheimer’s disease. neurology, 39(9), 1159–1165. https://doi.org/10.1212/wnl.39.9.1159 mosdell, j., balchin, r., & ameen, o. (2010). adaptation of aphasia tests for neurocognitive screening in south africa. south african journal of psychology, 40(3), 250–261. https://doi.org/10.1177/008124631004000304 neils, j., baris, j.m., carter, c., dell’aira, a.l., nordloh, s.j., weiler, e., & weisiger, b. (1995). effects of age, education, and living environment on boston naming test performance. journal of speech, language, and hearing research, 38(5), 1143–1149. https://doi.org/10.1044/jshr.3805.1143 nell, v. (ed.). (2000). cross-cultural neuropsychological assessment: theory and practice. mahwah, nj: erlbaum. patricacou, a., psallida, e., pring, t., & dipper, l. (2007). the boston naming test in greek: normative data and the effects of age and education on naming. aphasiology, 21(12), 1157–1170. https://doi.org/10.1080/02687030600670643 pedraza, o., graff-radford, n.r., smith, g.e., ivnik, r.j., willis, f.b., petersen, r.c., & lucas, j.a. (2009). 
differential item functioning of the boston naming test in cognitively normal african american and caucasian older adults. journal of the international neuropsychological society, 15(5), 758–768. https://doi.org/10.1017/s1355617709990361 rabin, l.a., barr, w.b., & burton, l.a. (2005). assessment practices of clinical neuropsychologists in the united states and canada: a survey of ins, nan, and apa division 40 members. archives of clinical neuropsychology, 20(1), 33–65. https://doi.org/10.1016/j.acn.2004.02.005 rabin, l.a., paolillo, e., & barr, w.b. (2016). stability in test-usage practices of clinical neuropsychologists in the united states and canada over a 10-year period: a follow-up survey of ins and nan members. archives of clinical neuropsychology, 31(3), 206–230. https://doi.org/10.1093/arclin/acw007 robbins, r.n., joska, j.a., thomas, k.g.f., stein, d.j., linda, t., mellins, c.a., & remien, r.h. (2013). exploring the utility of the montreal cognitive assessment to detect hiv-associated neurocognitive disorder: the challenge and need for culturally valid screening tests in south africa. the clinical neuropsychologist, 27(3), 437–454. https://doi.org/10.1080/13854046.2012.759627 roberts, p.m., garcia, l.j., desrochers, a., & hernandez, d. (2002). english performance of proficient bilingual adults on the boston naming test. aphasiology, 16(4–6), 635–645. https://doi.org/10.1080/02687030244000220 roebuck-spencer, t.m., glen, t., puente, a.e., denney, r.l., ruff, r.m., hostetter, g., & bianchini, k.j. (2017). cognitive screening tests versus comprehensive neuropsychological test batteries: a national academy of neuropsychology education paper. archives of clinical neuropsychology, 32(4), 491–498. https://doi.org/10.1093/arclin/acx021 saxton, j., ratcliff, g., munro, c.a., coffey, e.c., becker, j.t., fried, l., & kuller, l. (2000). normative data on the boston naming test and two equivalent 30-item short forms. the clinical neuropsychologist, 14(4), 526–534. 
https://doi.org/10.1076/clin.14.4.526.7204 segran, e. (2017). the unicorn craze, explained. retrieved from https://www.fastcompany.com/40421599/inside-the-unicorn-economy. statistics south africa. (2011). census 2011. pretoria: statistics south africa. strain, j.f., didehbani, n., spence, j., conover, h., bartz, e.k., mansinghani, s., … womack, k.b. (2017). white matter changes and confrontation naming in retired aging national football league athletes. journal of neurotrauma, 34(2), 372–379. https://doi.org/10.1089/neu.2016.4446 strauss, e., sherman, e.m.s., & spreen, o. (2006). a compendium of neuropsychological tests: administration, norms, and commentary (3rd edn.). new york: oxford university press. tallberg, i.-m. (2005). the boston naming test in swedish: normative data. brain and language, 94(1), 19–31. https://doi.org/10.1016/j.bandl.2004.11.004 teng, e.l., wimer, c., roberts, e., damasio, a.r., eslinger, p.j., folstein, m.f., … henderson, v.w. (1989). alzheimer’s dementia: performance on parallel forms of the dementia assessment battery. journal of clinical and experimental neuropsychology, 11(6), 899–912. https://doi.org/10.1080/01688638908400943 tombaugh, t.n., & hubley, a.m. (1997). the 60-item boston naming test: norms for cognitively intact adults aged 25 to 88 years. journal of clinical and experimental neuropsychology, 19(6), 922–932. https://doi.org/10.1080/01688639708403773 williams, b.w., mack, w.j., & henderson, v.w. (1989). boston naming test in alzheimer’s disease. neuropsychologia, 27(8), 1073–1079. https://doi.org/10.1016/0028-3932(89)90186-3 world medical association. (2013). world medical association declaration of helsinki; ethical principles for medical research involving human subjects. the journal of the american medical association, 310(20), 2191–2194. https://doi.org/10.1001/jama.2013.281053 worrall, l.e., yiu, e.m.l., hickson, l.m.h., & barnett, h.m. (1995). normative data for the boston naming test for australian elderly. 
aphasiology, 9(6), 541–551. https://doi.org/10.1080/02687039508248713 yochim, b.p., kane, k.d., & mueller, a.e. (2009). naming test of the neuropsychological assessment battery: convergent and discriminant validity. archives of clinical neuropsychology, 24(6), 575–583. https://doi.org/10.1093/arclin/acp053 zec, r.f., burkett, n.r., markwell, s.j., & larsen, d.l. (2007). a cross-sectional study of the effects of age, education, and gender on the boston naming test. the clinical neuropsychologist, 21(4), 587–616. https://doi.org/10.1080/13854040701220028

footnotes
1. section 35(1) of the south african schools act requires that each province’s executive council consults each year with the national minister of education to identify and publish the national quintile category within which each public school in the province will be placed. a school’s quintile is determined by the wealth of the surrounding community (i.e. the likely wealth of most students who will attend the school). quintile 1 schools are the poorest 20% of schools, quintile 2 schools are the next poorest 20%, and so on, with quintile 5 including the wealthiest 20% of schools. quintile 1 schools receive the highest per-student governmental allocation, and quintile 5 the lowest. quintiles 1–3 include no-fee schools (http://section27.org.za/wp-content/uploads/2017/02/chapter-7.pdf).

about the author(s)
tyrone b. pretorius
department of psychology, faculty of community and health sciences, university of the western cape, cape town, south africa

citation
pretorius, t.b. (2022). the applicability of the ucla loneliness scale in south africa: factor structure and dimensionality. african journal of psychological assessment, 4(0), a63. https://doi.org/10.4102/ajopa.v4i0.63

original research
the applicability of the ucla loneliness scale in south africa: factor structure and dimensionality
tyrone b.
pretorius received: 18 june 2021; accepted: 11 oct. 2021; published: 10 jan. 2022 copyright: © 2022. the author(s). licensee: aosis. this is an open access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. abstract this study examines the generalisability of the university of california los angeles loneliness scale version 3 (ucla-ls3) in a south african sample of young adults. in particular, it examined the normative data, reliability, and factor structure of this scale. the participants were young adults (n = 337) who were randomly sampled from a university population and they responded to the ucla loneliness scale. it was found that the sample had higher loneliness scores than those reported in the literature, potentially suggesting that loneliness may be a significant mental health concern amongst this group. women reported higher levels of loneliness than men. reliability analysis (cronbach’s alpha) and analysis of the influence of individual items on the mean, variance, and alpha demonstrated that ucla-ls3 had highly satisfactory internal consistency in the sample. confirmatory factor analysis (cfa) was used to test four conceptualisations of the factor structure of ucla-ls3: a one-factor model, a correlated three-factor model, a bifactor model with two subscales, and a bifactor model with three subscales. notably, cfa demonstrated that the two bifactor models are a better fit than the one-factor and correlated three-factor models and that the bifactor model with three subscales is marginally a better fit than the bifactor model with two subscales. ancillary bifactor analysis confirmed the dimensionality of the scale as sufficient variance was accounted for by the three subscales, after the variance attributable to the total scale was partitioned out. 
therefore, ucla-ls3 is best conceptualised as comprising three subscales (isolation, relational connectedness, collective connectedness), in addition to a total scale.

keywords: bifactor; covid-19; loneliness; ucla-ls3; reliability; south africa.

recently, the onset of the novel coronavirus disease 2019, referred to as covid-19, and its rapid outbreak led to a global public health crisis (luchetti et al., 2020). in the absence of a vaccine or cure, the world health organization (who) recommended social distancing, self-isolation, and quarantine to slow the spread of the virus and in this way reduce the burdens imposed on healthcare systems (who, 2020). loneliness has been identified as one of the most salient mental health consequences of the covid-19 pandemic and the prevention measures resulting from it, such as stay-at-home directives, limited in-person contact with family and friends, self-isolation and quarantine (killgore, cloonen, taylor, & dailey, 2020; rosenberg, luetke, hensel, kianersi, & herbenick, 2020). loneliness is defined as the discrepancy between one’s actual levels of interaction with others and the desired levels thereof (perlman & peplau, 1981). it is a subjective and distressing emotional state characterised by the perception of social isolation and a sense of feeling alone. even prior to covid-19, loneliness was a risk factor for adverse physical and mental health outcomes, including depression, cardiovascular disease, alzheimer’s disease, suicide, lower life satisfaction, reduced work performance and mortality (chang et al., 2017; chiao, chen, & yi, 2019; maguire, hanly, & maguire, 2019). hence, the assessment of loneliness has been identified as a significant public health issue (killgore et al., 2020). according to evolutionary theory, loneliness serves as a signal to reconnect with significant others (luchetti et al., 2020).
however, in the context of a pandemic characterised by self-quarantine and stay-at-home directives, such reconnection may not be possible. this is even more so in developing contexts where digital technologies that assist with social connectedness are not affordable or available to a large part of the population (padmanabhanunni & pretorius, 2021). in such circumstances, feelings of loneliness may become aggravated and evoke further distress (luchetti et al., 2020). the most widely used self-report measure for the assessment of loneliness amongst adults and adolescents is the ucla loneliness scale (ucla-ls: russell, 1996; russell, peplau, & ferguson, 1978; russell, peplau, & cutrona, 1980). the ucla-ls operationalises loneliness as an undifferentiated unitary state that varies in intensity and arises from perceived deficits in social relationships (russell, 1996). three versions exist for this scale. the original version (russell et al., 1978) contained 20 negatively worded items that assessed an individual’s perceived experience of loneliness. because all items were negatively worded, the scale produced systematic bias in responses. a revised version of the scale was then developed (russell et al., 1980) that contained 10 negatively worded and 10 positively worded items. however, studies involving exploratory factor analysis (efa) reported a variety of factor solutions including one-factor (pretorius, 1993), two-factor with positive and negative items (mahon et al., 1995), three-factor (dussault, fernet, austin, & leroux, 2009) as well as four- and five-factor structures (e.g. neto, 1992). these inconsistencies limited the utility of the scale as an assessment tool and prompted a revision (russell, 1996). the third version of the scale (ucla-ls3: russell, 1996) contains 11 negative and 9 positive items. the internal consistency, test-retest reliability and discriminant and convergent validity of the ucla-ls3 have been supported by several studies (e.g.
auné, abal, & attorresi, 2020; lópez-ramos, navarro-pardo, fernández muñoz, & da silva pocinho, 2017). further confirming its utility over the two previous versions, the ucla-ls3 has been adapted and validated in various cultural contexts and countries (e.g. france, germany, greece, japan, and russia: perlman & peplau, 1998; argentina: auné et al., 2020; saudi arabia: alnajjar & dodeen, 2017; spain: sancho, pinazo-hernandis, donio-bellegarde, & tomás, 2020; poland: kwiatkowska, rogoza, & kwiatkowska, 2018; iran: zarei, memari, moshayedi, & shayestehfar, 2016; portugal: lópez-ramos et al., 2017; palestine: nazzal, cruz, & neto, 2018). with reference to the scale’s dimensionality, russell (1996) maintained that the ucla-ls3 has a unidimensional structure. however, one-factor (e.g. lasgaard, 2007), two-factor (e.g. dodeen, 2015), three-factor (e.g. hawkley, browne, & cacioppo, 2012; shevlin, murphy, & murphy, 2014), and bifactor solutions (e.g. auné et al., 2020) have been reported thus suggesting a possible multidimensional factor structure. this multidimensionality implies that loneliness is a complex experience that involves various types of relationships and interactions. in accounting for these findings, russell (1996) maintained that the scale measures a unitary state which can be reached via deficits in different relationships and social networks. nevertheless, the model that has received the most support consists of three factors namely, isolation (i.e. the feelings of isolation that underlie loneliness), social connectedness which refers to the need for meaningful group connectedness and relational connectedness which refers to the need for social contact and close friendships (hawkley et al., 2012; shevlin et al., 2014). 
given that loneliness has been identified as a priority mental health problem during the covid-19 pandemic and the ucla-ls3 continues to be the most widely used instrument for assessing loneliness, it remains important to understand how the scale performs in different cultural contexts. responses to psychological scales such as the ucla-ls3 are influenced by the culture, age and socio-economic circumstances of the respondent (dodeen, 2014; durak & senol-durak, 2010). younger people and those from disadvantaged backgrounds may be more susceptible to wording effects (wouters et al., 2012). hence, it remains relevant to understand how the scale performs in different populations. this study aims to provide insights into the manifestation of loneliness amongst a sample of young adults during covid-19 and to investigate the psychometric properties of the ucla-ls in this population group. measuring instruments such as the ucla-ls3 that are developed in the fields of personality or social psychology typically have a well-defined exploratory factor analytic structure (boffo, mannarini, & munari, 2012). however, they are often not adequately supported by confirmatory factor analysis (cfa) and may not reach minimum standards of fit (boffo et al., 2012). research on loneliness in developing contexts like south africa requires suitable instruments with sound psychometric properties that have cross-national applicability. heppner, pretorius, wei, lee and wang (2002) argued that such cross-national applicability of instruments would allow building a more comprehensive knowledge base by searching for psychological universals (i.e. etic approach) and identifying culturally specific constructs, which are useful for identifying and explaining cultural differences (i.e. emic approach). overall, the research findings on the factor structure of the ucla-ls3 have been equivocal.
hence, this study aims to examine the normative data, reliability, and factor structure of the scale in the south african context.

method

participants
the study participants were young adults who were doing undergraduate studies at a university in the province of the western cape, south africa. the study design was cross-sectional in nature, and the sample of young adults (n = 337) was randomly drawn from the university student population. in terms of gender, 77.2% of the sample were female and 22.0% were male, whilst 0.8% self-identified as transgender or binary. the mean age of the sample was 21.95 years (sd = 4.7).

instruments
in addition to a demographic questionnaire, all the participants completed the ucla loneliness scale (ucla-ls3; russell, 1996). the ucla-ls3 consists of 20 items scored on a 4-point scale that ranges from i never feel this way (1) to i often feel this way (4). the ucla-ls3 is purportedly a measure of an individual’s subjective feelings of loneliness and feelings of social isolation. generally, good internal consistency reliability has been reported for the ucla-ls3 (α = 0.94 to 0.96: doğan, çötok, & tekin, 2011). the ucla-ls3 has also been previously used in one south african study, which reported a cronbach’s alpha of 0.77 (pretorius, 1993).

procedure
google forms were used to construct an electronic version of the ucla-ls3 and the link was distributed to a random sample of students via an email by the registrar’s office. the participants could access the link and complete the questionnaire between march and june 2021.

data analysis
in this study, cfa was used to examine four operationalisations of the structure of ucla-ls3. in cfa, the factors are regarded as latent variables which are represented by the items that are the observed measurements (bentler, 1995).
the four models of the structure of the ucla-ls3 that were assessed were a model representing only a total loneliness score (one-factor model), a model representing the structure of the ucla-ls3 as three correlated subscales (correlated three-factor model), and two models in which the structure of the scale is hypothesised to have both a total scale as well as subscales (bifactor models). the first bifactor model postulates that the ucla-ls3 consists of a total scale (general factor) with two uncorrelated subscales (specific factors) reflecting the variance amongst clusters of items (mansolf & reise, 2017), whereas the second bifactor model conceptualises ucla-ls3 as consisting of a total scale (general factor) with three uncorrelated subscales (specific factors). in cfa, the chi-square statistic (χ2) is used to determine whether the proposed model fits the observed data. however, it has been reported that the χ2 test is very sensitive to violations of distributional assumptions and is affected by sample size (jöreskog, olsson, & wallentin, 2016). it has, therefore, been recommended (kline, 2005) that the following fit indices should also be reported: the root-mean-square error of approximation (rmsea; ≤ 0.08 indicates good fit), comparative fit index (cfi; ≥ 0.90 indicates good fit), and standardised root-mean-square residual (srmr; ≤ 0.08 indicates good fit). other indices that are commonly reported include the goodness-of-fit index (gfi; ≥ 0.95 indicates good fit) and tucker–lewis index (tli; ≥ 0.90 indicates good fit; byrne, 1994; hu & bentler, 1999). when models are being compared, it is recommended that indices used to compare models, such as the akaike information criterion (aic), also be included (arbuckle, 2012). in general, the model with the lowest aic value is considered to have the best fit. the cfa analyses were conducted using ibm spss amos (version 26; ibm corp., armonk, ny, usa).
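the decision rules described above can be sketched in a few lines of python. this is only an illustration under stated assumptions, not the amos workflow used in the study: the function names are hypothetical, cfi is treated (like tli) as an index for which larger values indicate better fit, and the example values are the tli, cfi, rmsea and aic figures that the results section reports for the bifactor models. indices that are not supplied are simply not evaluated.

```python
# Cutoffs as described in the data analysis section. "max" means the value
# must not exceed the cutoff; "min" means it must reach at least the cutoff.
CUTOFFS = {
    "rmsea": ("max", 0.08),
    "srmr": ("max", 0.08),
    "cfi": ("min", 0.90),
    "tli": ("min", 0.90),
    "gfi": ("min", 0.95),
}


def check_fit(indices):
    """Return {index name: True/False} for each supplied index with a known cutoff."""
    result = {}
    for name, value in indices.items():
        if name not in CUTOFFS:
            continue  # hypothetical helper: unknown indices are ignored
        direction, cutoff = CUTOFFS[name]
        result[name] = value <= cutoff if direction == "max" else value >= cutoff
    return result


def best_by_aic(aic_by_model):
    """Return the name of the model with the lowest AIC."""
    return min(aic_by_model, key=aic_by_model.get)


# Values reported in the results for the two bifactor models
# (TLI, CFI and RMSEA were identical for both models).
bifactor_fit = {"tli": 0.92, "cfi": 0.94, "rmsea": 0.07}
aic = {"bifactor_two_subscales": 506.37, "bifactor_three_subscales": 487.13}

print(check_fit(bifactor_fit))
print(best_by_aic(aic))
```

keeping the direction of each cutoff next to its value makes it harder to confuse indices where smaller is better (rmsea, srmr) with those where larger is better (cfi, tli, gfi).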
it has been pointed out that fit indices do not necessarily address the dimensionality of a scale and may lead to incorrect conclusions about the structure of a scale (pretorius, 2021). we used the bifactor indices calculator (dueber, 2017) to calculate additional bifactor measures to address the dimensionality of the ucla-ls3. these measures include (1) explained common variance (ecv), which refers to the proportion of common variance accounted for by a given factor (general or specific); (2) mcdonald’s omega, which is regarded as an alternative to coefficient alpha as an estimate of reliability (omegas for subscales); and (3) omega hierarchical (omegah), which reflects the percentage of variance in total scores that is the result of individual differences on the general factor. an omegah > 0.80 indicates that, irrespective of good fit indices, the scale is largely unidimensional. with respect to specific factors (subscales), omegahs reflects the percentage of systematic variance of the specific factors, after the variance accounted for by the general factor is excluded (rodriguez, reise, & haviland, 2016).

ethical considerations
ethical approval for this study was obtained from the humanities and social sciences research committee of the university of the western cape (ethics reference number: hs20/5/1). the participants completed the survey anonymously. the first item in the electronic survey gave participants the opportunity to provide informed consent. at the end of the survey, the contact details for the south african depression and anxiety group and the centre for student counselling were provided, and participants were urged to reach out to those services if they experienced any distress during completion of the questionnaire.

results

descriptive statistics
the mean loneliness score (m = 49.1; sd = 11.6) was much higher than that previously found in a similar sample from south africa (pretorius, 1993; m = 38.8, sd = 7.8), as well as those reported in the literature (m = 34 to 38: e.g.
auné et al., 2020; hartshorne, 1993; shevlin et al., 2015). the score was also higher than that found in a covid-19 study (killgore et al., 2020; m = 43.8, sd = 13.5). reliability the reliability of the ucla-ls3 (alpha = 0.923, omega = 0.924) was satisfactory, and compared favourably with the reliabilities documented in previous studies (e.g. tull et al., 2020). table 1 shows the impact of each item, if it were to be deleted, on the alpha coefficient, variance and mean, as well as the item-total correlation. table 1: the influence of items on the mean, variance, and alpha and item-total correlation. the correlations between individual items and the total score ranged between 0.26 and 0.73, whereas the item-deleted alphas were all 0.92, except for item 17, which had an alpha of 0.93. except for one related to items 8 and 15, the interitem correlations ranged between 0.11 and 0.87 and were all significant. the mean interitem coefficient was 0.37. overall, the evidence for internal consistency was highly satisfactory. factor structure in this study, we compared a one-factor model (ucla-ls3 total score), a correlated three-factor model, a bifactor model with two subscales, and a bifactor model with three subscales. notably, the two or three subscales conceptualisations of the bifactor models were based on previous factor analysis studies (hawkley et al., 2012; shevlin et al., 2014; wilson, cutts, lees, mapungwana, & maunganidze, 1992), which identified a two-factor solution (social others = so; intimate others = io) as well as a three-factor solution (collective connectedness = cc; isolation = i; relational connectedness = rc). table 2 shows the item groupings and factor labelling for the one-factor model, the three-factor model, and the two bifactor models based on these studies. table 2: item groupings for the one-factor and two bifactor models. the four representations of the factor structure of ucla-ls3 and the results of the cfa are shown in figures 1–4. 
figure 1: one-factor model of ucla-ls3. figure 2: correlated three-factor model. figure 3: bifactor model of ucla-ls3: two subscales. figure 4: bifactor model of ucla-ls3: three subscales. in the one-factor model, it is assumed that a single factor (total loneliness score) best explains the variance amongst the items, whereas the correlated three-factor model presumes that three related factors account for the variance. the bifactor model, in contrast, presumes that a general factor (loneliness) explains a certain proportion of the variance, whereas two/three specific factors (subscales) account for the remaining variance. table 3 reflects the fit indices for the three models. table 3: fit indices for three models of the structure of ucla-ls3. as detailed in table 3, the one-factor model failed to meet any of the criteria indicating a good fit. the correlated three-factor model had much better indices than those of the one-factor model, but they were marginally lower than those of the bifactor models, with only marginal differences in the fit indices of the two bifactor models. in both the two-subscale and the three-subscale bifactor models, tli, cfi, and rmsea were identical (tli = 0.92, cfi = 0.94, and rmsea = 0.07), indicating a reasonable fit. however, in terms of gfi and srmr, the three-subscale bifactor model showed a better fit. in addition, the model comparison index (i.e. aic) was lower for the three-subscale bifactor model than for the two-subscale bifactor model (487.13 in comparison to 506.37), indicating a slightly better fit. despite the evidence provided by the cfa in relation to the superiority of the bifactor structure for ucla-ls3, the cfa did not address the dimensionality of the scale. more specifically, the cfa did not clarify the relative proportion of variance accounted for by the total scale and subscales. in this regard, rodriguez et al. 
(2016), for example, urged researchers to use bifactor indices, over and above fit indices to examine the dimensionality of instruments. table 4 reflects the bifactor indices for the ucla-ls3. table 4: dimensionality indices for ucla-ls3. the ecv is the percentage of all common variance for all items explained by a factor. table 4 indicates that the general factor (loneliness) explained 59% and 57%, respectively, of the common variance in the two-subscale and three-subscale bifactor models. therefore, specific factors (io and so in the case of the two-subscale bifactor model and i, rc, and cc in the case of the three-subscale bifactor model) explained 41% and 43% of the variance, respectively. this result confirms the multidimensionality of ucla-ls3, as the specific factors accounted for sufficient variance, after the variance accounted for by the general factor was taken into consideration. in addition, omegah, which reflects the percentage of variance in total scores accounted for by the general factor, was below the cut-off point suggested in the literature (omegah = 0.68 and 0.69). reise, bonifay and haviland (2013) proposed that when omegah is greater than 0.80, the scale can be considered essentially unidimensional. lastly, the omegas coefficient, which is a model-based estimate of reliability, further confirmed that the various subscales in the two-subscale and three-subscale bifactor models (io: omegas = 0.92; so: omegas = 0.90; i: omegas = 0.90; rc: omegas = 0.89; cc = 0.79) demonstrated sufficient reliability. both the cfa and bifactor indices provide support for the use of ucla-ls3, as consisting of a total scale as well as either two or three subscales, although the three-subscale bifactor model fit the data marginally better than the two-subscale bifactor model. discussion the aim of the present study was to extend current research on ucla-ls3 by investigating its applicability to a sample of young adults in south africa. 
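as a hedged illustration of how the dimensionality indices reported in table 4 are defined, the python sketch below computes ecv, omega and omegah from standardised bifactor loadings, following the formulas described by rodriguez et al. (2016). the function name `bifactor_indices` and the toy loadings are assumptions made purely for illustration; they are not the ucla-ls3 estimates, and the study itself used the excel-based bifactor indices calculator (dueber, 2017).

```python
def bifactor_indices(general, specific):
    """Return ECV, omega and omegaH for an orthogonal bifactor model.

    general  -- standardised general-factor loading of each item
    specific -- one list per specific factor, holding that factor's
                loading for every item (0.0 where the item does not load)
    """
    n_items = len(general)
    # Common variance attributable to the general and specific factors.
    common_general = sum(l ** 2 for l in general)
    common_specific = sum(l ** 2 for factor in specific for l in factor)
    ecv = common_general / (common_general + common_specific)

    # Unique (error) variance per item under standardised loadings.
    unique = [
        1.0 - general[i] ** 2 - sum(factor[i] ** 2 for factor in specific)
        for i in range(n_items)
    ]

    # Variance in unit-weighted total scores due to each source.
    var_general = sum(general) ** 2
    var_specific = sum(sum(factor) ** 2 for factor in specific)
    total = var_general + var_specific + sum(unique)

    return {
        "ecv": ecv,
        "omega": (var_general + var_specific) / total,
        "omega_h": var_general / total,
    }


# Hypothetical six-item scale with two specific factors (toy values only).
toy_general = [0.6] * 6
toy_specific = [
    [0.4, 0.4, 0.4, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.4, 0.4, 0.4],
]
indices = bifactor_indices(toy_general, toy_specific)
```

with these toy loadings the general factor accounts for roughly 69% of the common variance, and omegah falls below the 0.80 guideline of reise et al. (2013), which is the pattern that would argue against treating a scale as essentially unidimensional.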
the current study, hence, focused on the psychometric properties and normative data of the ucla-ls3. there were several important findings. firstly, we found that the participants in the current study reported higher levels of loneliness than those found in prior studies (e.g. killgore et al., 2020), thus suggesting a potential mental-health crisis. such unprecedented levels of loneliness can be attributed to covid-19 preventive measures, such as prolonged prohibition of in-person social contact and social distancing directives. they could also be the result of disparities in the access to digital technologies that can be used to circumvent these restrictions and maintain social connectedness. women reported higher mean loneliness scores than men, a finding that confirms previous research on gender differences in loneliness (e.g. wang & zhao, 2020). such higher levels of loneliness amongst women may be because of the impact of gender role socialisation, which contributes to women prioritising affiliations with friends and family. disruptions in the access to these networks as a result of stay-at-home directives may also contribute to increased psychological distress for women (li & wang, 2020). secondly, reliability analysis (cronbach’s alpha) and analysis of the influence of individual items on the mean, variance, and alpha demonstrated that ucla-ls3 exhibited highly satisfactory internal consistency in the sample. the obtained alpha coefficient was comparable to those reported in the literature (e.g. ausín, muñoz, martín, pérez-santos, & castellanos, 2019; pikea, parpa, tsilika, galanos, & mystakidou, 2016). the results, therefore, support the use of ucla-ls3 in culturally diverse contexts, and the findings suggest that meaningful comparisons can be drawn across countries. thirdly, the cfa demonstrated that the two bifactor models were a better fit than the one-factor model.
the bifactor indices also demonstrated that the two and three subscales of the bifactor models account for sufficient variance after the variance accounted for by the general factor was considered. the bifactor model with three subscales had marginally better fit indices than those of the bifactor model with two specific subscales. these results, therefore, support the findings of hawkley et al. (2012) and shevlin et al. (2014). thus, it can be concluded that ucla-ls3 is best conceptualised as comprising three subscales (isolation, relational connectedness, collective connectedness), in addition to a total scale. the study confirms that loneliness, as measured by the ucla-ls3, is a multidimensional concept. it also provides support for the future use of the scale in the south african context, especially in relation to the assessment of loneliness during the covid-19 pandemic. limitations this study has some limitations. the first limitation is the small sample size. however, the sample size is still within the rule of thumb of 10 cases per variable (wolf, harrington, clark, & miller, 2013). moreover, the unequal gender sample, whilst consistent with the demographics in the university, prevented gender comparisons of models. conclusion this study provides support for the generalisability of ucla-ls3 in a south african sample of young adults and paves the way for its further use in south african samples. the scale demonstrated sound reliability, and the bifactor analysis confirmed its multidimensionality. acknowledgements competing interests the author declares that he has no financial or personal relationships that may have inappropriately influenced him in writing this research article. author’s contributions t.b.p. is the sole author of this article. funding information this research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.
data availability the data will be available at https://uwc.figshare.com/. disclaimer the views and opinions expressed in this article are those of the author and do not necessarily reflect the official policy or position of any affiliated agency of the author. references alnajjar, a., & dodeen, h.a.m.z.e.h. (2017). factor structure of the arabic version of the ucla loneliness scale. international journal of research in humanities, arts and literature, 5(9), 171–184. https://doi.org/10.1080/03601277.2015.1065688 arbuckle, j.l. (2012), amos 21.0 user’s guide. mount pleasant: amos development corporation. auné, s.e., abal, f.j.p., & attorresi, h.f. (2020). modeling of the ucla loneliness scale according to the multidimensional item response theory. current psychology, 2, 1–8. https://doi.org/10.1007/s12144-020-00646-y ausín, b., muñoz, m., martín, t., pérez-santos, e., & castellanos, m.á. (2019). confirmatory factor analysis of the revised ucla loneliness scale (ucla ls-r) in individuals over 65. aging & mental health, 23(3), 345–351. https://doi.org/10.1080/13607863.2017.1423036 bentler, p.m. (1995). eqs: structural equations program manual. multivariate software. los angeles: university of california. boffo, m., mannarini, s., & munari, c. (2012). exploratory structure equation modeling of the ucla loneliness scale: a contribution to the italian adaptation. tpm: testing, psychometrics, methodology in applied psychology, 19(4), 345–363. https://doi.org/10.4473/tpm19.4.7 byrne, b.m. (1994). testing for the factorial validity, replication, and invariance of a measuring instrument: a paradigmatic application based on the maslach burnout inventory. multivariate behavioral research, 29, 289–311. https://doi.org/10.1207/s15327906mbr2903_5 chang, e.c., wan, l., li, p., guo, y., he, j., gu, y., … batterbee, c.n.h. (2017). loneliness and suicidal risk in young adults: does believing in a changeable future help minimize suicidal risk amongst the lonely? 
the journal of psychology, 151(5), 453–463. https://doi.org/10.1080/00223980.2017.1314928 chiao, c., chen, y.h., & yi, c.c. (2019). loneliness in young adulthood: its intersecting forms and its association with psychological well-being and family characteristics in northern taiwan. plos one, 14(5), e0217777. https://doi.org/10.1371/journal.pone.0217777 dodeen, h. (2015). the effects of positively and negatively worded items on the factor structure of the ucla loneliness scale. journal of psychoeducational assessment, 33(3), 259–267. https://doi.org/10.1177%2f0734282914548325 doğan, t., çötok, n.a., tekin, e.g. (2011). reliability and validity of the turkish version of the ucla loneliness scale (uls-8) amongst university students. procedia-social and behavioral sciences, 15, 2058–2062. https://doi.org/10.1016/j.sbspro.2011.04.053 dueber, d.m. (2017). bifactor indices calculator: a microsoft excel-based tool to calculate various indices relevant to bifactor cfa models. retrieved from http://sites.education.uky.edu/apslab/resources/ durak, m., & senol-durak, e. (2010). psychometric qualities of the ucla loneliness scale-version 3 as applied in a turkish culture. educational gerontology, 36(10–11), 988–1007. https://doi.org/10.1080/03601271003756628 dussault, m., fernet, c., austin, s. & leroux, m. (2009). revisiting the factorial validity of the revised ucla loneliness scale: a test of competing models in a sample of teachers. psychological reports, 105(3), 849–856. https://doi.org/10.2466%2fpr0.105.3.849-856 hartshorne, t.s. (1993). psychometric properties and confirmatory factor analysis of the ucla loneliness scale. journal of personality assessment, 61(1), 182–195. https://doi.org/10.1207/s15327752jpa6101_14 hawkley, l.c., browne, m.w., & cacioppo, j.t. (2005). how can i connect with thee? let me count the ways. psychological science, 16(10), 798–804. https://doi.org/10.1111/j.1467–9280.2005.01617.x heppner, p.p., pretorius, t.b., wei, m., lee, d.g., & wang, y.w. 
(2002). examining the generalizability of problem-solving appraisal in black south africans. journal of counseling psychology, 49(4), 484. https://doi.org/10.1037/0022-0167.49.4.484 hu, l.t., & bentler, p.m. (1999). cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. structural equation modeling: a multidisciplinary journal, 6, 1–55. https://doi.org/10.1080/10705519909540118 jöreskog k.g., olsson u.h., wallentin f.y. (2016) confirmatory factor analysis (cfa). in: multivariate analysis with lisrel. springer series in statistics. springer, cham. https://doi.org/10.1007/978-3-319-33153-9_7 killgore, w.d., cloonen, s.a., taylor, e.c., & dailey, n.s. (2020). loneliness: a signature mental health concern in the era of covid-19. psychiatry research, 290, 113–117. https://doi.org/10.1016/j.psychres.2020.113117 kline, r. b. (2005). principles and practice of structural equation modeling (2nd ed.). guilford: new york, ny. kwiatkowska, m.m., rogoza, r., & kwiatkowska, k. (2018, july 25). analysis of the psychometric properties of the revised ucla loneliness scale in the polish adolescent sample. current issues in personality psychology, 6(2), 164–170. https://doi.org/10.5114/cipp.2017.69681 lasgaard, m. (2007). reliability and validity of the danish version of the ucla loneliness scale. personality and individual differences, 42(7), 1359–1366. https://doi.org/10.1016/j.psychres.2020.113117 li, l.z., & wang, s. (2020). prevalence and predictors of general psychiatric disorders and loneliness during covid-19 in the united kingdom. psychiatry research, 291, 113267. https://doi.org/10.1016/j.psychres.2020.113267 lópez-ramos, y., navarro-pardo, e., fernández muñoz, j.j., & da silva pocinho, r.f. (2017). psychometric properties and factor structure of the satisfaction with life scale in an elderly portuguese students sample. anales de psicología/annals of psychology, 34(1), 146–152. 
https://doi.org/10.6018/analesps.34.1.267381 luchetti, m., lee, j.h., aschwanden, d., sesker, a., strickhouser, j.e., terracciano, a., & sutin, a.r. (2020). the trajectory of loneliness in response to covid-19. american psychologist, 75(7), 897–908. https://doi.org/10.1037/amp0000690 maguire, r., hanly, p., & maguire, p. (2019). living well with chronic illness: how social support, loneliness and psychological appraisals relate to well-being in a population-based european sample. journal of health psychology, 26(10), 1494–1507. https://doi.org/10.1177/1359105319883923 mahon, n. e., yarcheski, t. j., & yarcheski, a. (1995). validation of the revised ucla loneliness scale for adolescents. research in nursing & health, 18(3), 263–270. https://doi.org/10.1002/nur.4770180309 mansolf, m., & reise, s.p. (2017). when and why the second-order and bifactor models are distinguishable. intelligence, 61, 120–129. https://doi.org/10.1016/j.intell.2017.01.012 nazzal, f.i., cruz, o., & neto, f. (2018). psychometric analysis of the short-form ucla loneliness scale (uls-6) amongst palestinian university students. retrieved from https://www.psycharchives.org/bitstream/20.500.12034/1766/1/ijpr.v11i2.269.pdf neto, f. (1992). loneliness among portuguese adolescents. social behavior and personality: an international journal, 20(1), 15–21. https://doi.org/10.2224/sbp.1992.20.1.15 padmanabhanunni, a., & pretorius, t. b. (2021). the unbearable loneliness of covid-19: covid-19-related correlates of loneliness in south africa in young adults. psychiatry research, 296, 113658. https://doi.org/10.1016/j.psychres.2020.113658 perlman, d., & peplau, l.a. (1981). toward a social psychology of loneliness. personal relationships, 3, 31–56. retrieved from http://www.academia.edu/download/40307428/perlman___peplau_81.pdf perlman, d., & peplau, l. a. (1998). loneliness. in h. s. friedman (ed.) encyclopedia of mental health, vol 2 (pp. 571–581). academic press: santiago, ca. 
pikea, p., parpa, e., tsilika, e., galanos, a., & mystakidou, k. (2016). psychometric properties of the greek-university of california, los angeles loneliness scale-version 3 in a sample of people with human immunodeficiency virus. world journal of aids, 6(4), 157–168. https://doi.org/10.4236/wja.2016.64018 pretorius, t.b., (1993). the metric equivalence of the ucla loneliness scale for a sample of south african students. educational and psychological measurement, 53(1), 233–239. https://doi.org/10.1177/0013164493053001026 pretorius, t.b. (2021). over reliance on model fit indices in confirmatory factor analyses may lead to incorrect inferences about bifactor models: a cautionary note. african journal of psychological assessment, 3, 4. https://doi.org/10.4102/ajopa.v3i0.35 reise, s.p., bonifay, w.e., & haviland, m.g. (2013). scoring and modeling psychological measures in the presence of multidimensionality. journal of personality assessment, 95(2), 129–140. https://doi.org/10.1080/00223891.2012.725437 rodriguez, a., reise, s.p., & haviland, m.g. (2016). evaluating bifactor models: calculating and interpreting statistical indices. psychological methods, 21(2), 137. https://doi.org/10.1037/met0000045 rosenberg, m., luetke, m., hensel, d., kianersi, s., herbenick, d. (2020). depression and loneliness during covid-19 restrictions in the united states, and their associations with frequency of social and sexual connections. medrxiv. https://doi.org/10.1101/2020.05.18.20101840 russell, d., peplau, l.a., & cutrona, c.e., (1980). the revised ucla loneliness scale: concurrent and discriminant validity evidence. journal of personality and social psychology, 39(3), 472–480. https://doi.org/10.1037/0022-3514.39.3.472 russell, d., peplau, l.a., & ferguson, m.l. (1978). developing a measure of loneliness. journal of personality assessment, 42(3), 290–294. https://doi.org/10.1207/s15327752jpa4203_11 russell, d.w. (1996). 
ucla loneliness scale (version 3): reliability, validity, and factor structure. journal of personality assessment, 66(1), 20–40. https://doi.org/10.1207/s15327752jpa6601_2 sancho, p., pinazo-hernandis, s., donio-bellegarde, m., & tomás, j.m. (2020). validation of the university of california, los angeles loneliness scale (version 3) in spanish older population: an application of exploratory structural equation modeling. australian psychologist, 55(3), 283–292. https://doi.org/10.1111/ap.12428 shevlin, m., murphy, s., & murphy, j. (2014). the latent structure of loneliness testing competing factor models of the ucla loneliness scale in a large adolescent sample. assessment, 22(2), 208–215. https://doi.org/10.1177/1073191114542596 tull, m.t., edmonds, k.a., scamaldo, k., richmond, j.r., rose, j.p., & gratz, k.l., 2020, psychological outcomes associated with stay-at-home orders and the perceived impact of covid-19 on daily life. psychiatry research, 289, 113098. http://doi.org/10.1016/j.psychres.2020.113098 wang, c., & zhao, h. (2020). the impact of covid-19 on anxiety in chinese university students. frontiers in psychology, 11, 1168. https://doi.org/10.3389/fpsyg.2020.01168 wilson, d., cutts, j., lees, i., mapungwana, s., & maunganidze, l. (1992). psychometric properties of the revised ucla loneliness scale and two short-form measures of loneliness in zimbabwe. journal of personality assessment, 59, 72–81. https://doi.org/10.1207/s15327752jpa5901_7 world health organisation. (2020). coronavirus disease (covid-2019) situation reports. retrieved from https://www.who.int/emergencies/diseases/novelcoronavirus-2019/situation-reports wolf, e.j., harrington, k.m., clark, s.l., & miller, m.w. (2013). sample size requirements for structural equation models: an evaluation of power, bias, and solution propriety. educational and psychological measurement, 73(6), 913–934. https://doi.org/10.1177%2f0013164413495237 wouters, e., booysen, f. l. r., ponnet, k., & baron van loon, f. 
(2012). wording effects and the factor structure of the hospital anxiety & depression scale in hiv/aids patients on antiretroviral treatment in south africa. plos one, 7(4), e34881. https://doi.org/10.1371/journal.pone.0034881 zarei, s., memari, a.h., moshayedi, p., & shayestehfar, m. (2016). validity and reliability of the ucla loneliness scale version 3 in farsi. educational gerontology, 42(1), 49–57. https://doi.org/10.1080/03601277.2015.1065688 about the author(s) danille e. arendse research consultancy department, military psychological institute, pretoria, south africa citation arendse, d.e. (2020). the impact of different time limits and test versions on reliability in south africa. african journal of psychological assessment, 2(0), a14. https://doi.org/10.4102/ajopa.v2i0.14 original research the impact of different time limits and test versions on reliability in south africa danille e. arendse received: 18 mar. 2019; accepted: 10 jan. 2020; published: 03 mar. 2020 copyright: © 2020. the author(s). licensee: aosis. this is an open access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. abstract the empirically developed english comprehension test (ect) was created for organisational and educational purposes to assess verbal reasoning. the initial version of the ect had an associated time limit of 45 min, which required individuals to complete it within the specified time, while the later version of the ect had no time limit. the ect’s two test versions – a timed and an untimed version – were piloted as part of the development and validation of the ect. the purpose of this article was to explore the internal consistency of the two test versions and compare the reliability of the timed and untimed versions of the ect.
this study was conducted to establish whether reliability was affected by the different time limit-related requirements. the sample size for ect version 1.2 was 597 and ect version 1.3 comprised 882 individuals. the methods used for comparison in this article involved a graphical display of performance relating to both test versions and an exploration of the times recorded for the untimed test version. a reliability analysis was performed to evaluate the internal consistency of the two test versions. the performance of individuals in the untimed and timed versions of the ect was similar based on the average minimum and maximum scores. the cronbach’s alpha indicated that verbal reasoning was measured consistently for the two test versions. this result suggested that time did not negatively affect the reliability of the test. keywords: psychometrics; reading comprehension; reliability; cronbach’s alpha; test performance; timed assessment; time limit. introduction reading comprehension comprises cognitive and linguistic components that support an individual in generating meaning from a text. the core processes that assist in producing meaning from texts are decoding and comprehension (kendeou, van den broek, helder, & karlsson, 2014; pretorius, 2002). these processes of decoding and comprehension are inter-related and associated with reading and literacy (kanniainen, killi, tolvanen, aro, & leppanen, 2019). literacy is an important aspect of learning and is required in various facets of one’s life. it begins with early schooling and continues until necessary as part of executing one’s job or for studying further. because literacy involves reading and making meaning, it was of concern that south african research about grades 3 and 4 learners indicated that they were struggling to comprehend and derive meaning from texts at school (howie et al., 2017; spaull, 2016). 
although these studies refer to the literacy levels of grades 3 and 4 learners, the reality of a literacy crisis in south africa strikes when learners’ foundations in literacy and reading are not in place, which is also one of the reasons for the low retention of learners until they complete grade 12 (spaull, 2013). the significance of literacy skills is that it affects reading and the performance of learners in reading comprehension and verbal assessments (kanniainen et al., 2019). moreover, reading comprehension is a component of learning english (bahardoost & ahmadi, 2018). the literacy and comprehension levels of english additional language learners (this refers to learners who are non-native english speakers) can, however, be affected by the availability of resources and the socio-economic status (ses) of these learners, factors that negatively affect the quality of the education they receive (cockcroft, bloch, & moolla, 2016; spaull, 2016). the educators’ level of competence in english, which causes them to use code switching (this refers to the mixture of english and other native languages in south africa), is also a factor contributing to low english literacy amongst non-native english speakers (krugal & fourie, 2014; kuwornu, 2017). in a context other than that of south africa, the australian context has, for example, identified similar issues related to english language comprehension and literacy faced by individuals from an aboriginal background (dingwall, gray, mccarthy, belima, & bowden, 2017; dingwall, lindeman, & cairney, 2014). a consideration of the factors that affect literacy and comprehension is important when evaluating an english test in south africa. the english comprehension test (ect) is theorised to measure verbal reasoning. furthermore, the ect is a south african test initiative that addresses the need to develop local tests that provide for the multicultural context in which tests in south africa are used (bekwa, 2016). 
the development and refinement of the test, which are still underway, have led to two test versions (ect version 1.2 and version 1.3) so far. the reasoning behind the two test versions was that the latter (ect 1.3) would be an improved version of the former (ect 1.2). the removal of a time limit is one of the changes that was made with respect to both test versions and is the specific focus of this article. therefore, the compromise between speed and ability is a significant factor when evaluating the reliability of the ect (goldhammer, 2015; streiner, 2003). the adjustment of test conditions of assessments, such as extending the time allocated for the assessment, can be likened to a process of accommodation. accommodation in relation to the ect can be viewed as a form of support that allows test-takers to show their understanding of the assessment (kuwornu, 2017). timed assessments, particularly in the case of power tests, may affect reliability because they are focused on the items completed rather than on the responses to items (goldhammer, 2015; lee & chen, 2011; streiner, 2003). it is thus imperative to explore the impact of time on the reliability of the ect and to investigate the actual time required by the slowest person to complete the test. although the slowest person could be an outlier, the focus was on allowing all the participants to complete the test, thereby fully accommodating individuals in the assessment. thus, the need to extend time limits in a multicultural context such as south africa is an imperative consideration for item completion. there are within-learner factors such as reading speed and coding or de-coding processes while reading, which are also worth noting (goldhammer, 2015; kendeou et al., 2014; pretorius, 2002). this also relates to the intention of the ect, which is primarily focused on eliciting the ability to decode texts and not on the ability under time-related pressure. 
the influence of time limits may also cause the working memory of the individual to be measured instead of the intended construct (keith & reynolds, 2010; oberauer & lewandowsky, 2013). the importance of being able to read and infer from text as well as to create meaning from text is implied in the ect (arendse & maree, 2019). in a study on the factor structure of the ect (arendse & maree, 2019), it was also indicated that the ect has a definite cognitive component. furthermore, the factors emerging from the ect, namely, reasoning, deduction and vocabulary, are directly related to reading and comprehension. the factors were labelled based on the content of their loadings, and the outcome suggested that the ect was possibly a measure of cognitive (verbal) ability. this commonality of factors and cognitive (verbal) ability was found across the two test versions. it was also argued that ect version 1.3 had a theoretically stronger factor structure, thereby suggesting that ect 1.3 was an improved test version (arendse & maree, 2019). the results of the study indicated that there was a definite dominant factor that emerged from both test versions. the evidence of the dominant factor was, however, not sufficient to claim that the test was unidimensional (arendse & maree, 2019). this is an important consideration for exploring the reliability of the ect, as the cronbach’s alpha is sensitive to multidimensional scales (abedi, 2002; osburn, 2000; streiner, 2003; taber, 2018; tavakol & dennick, 2011). the cognitively influenced factors (i.e. reasoning, deduction and vocabulary) of the ect are also affected by the reliability of the ect, as processes associated with reading comprehension and reasoning may be hampered. thus, time limits may have an influence on individuals’ reasoning ability and reading processes (reading speed and decoding). 
considering the findings related to literacy and comprehension levels in south africa, individuals who are not english first-language speakers may have some difficulty in completing english assessments (van de vijver & rothmann, 2004). thus, the addition of an imposed time limit could affect individuals’ true reflection of ability in assessments (angelidis, solis, lautenbach, van der does, & putman, 2019). the aspects related to timed assessments are important to acknowledge when considering the two versions of the ect. the rationale for this study was to establish whether any differences were observed in the reliability of the two versions of the ect when different time limits are applicable, one being the lack of a time limit. although there were substantial differences between the two test versions, the majority of the items remained the same. these changes across test versions are discussed in the ‘instrument’ section of this article. because time may play a role in performance in tests, it is essential to explore the reliability of the two test versions. these findings will provide important insights regarding the effects of time limits on reliability. the objectives of the study were as follows: objective 1: to assess how long individuals were taking to complete the ect by exploring the recorded times of the ect 1.3 objective 2: to explore the internal reliability of the two test versions of the ect using cronbach’s alpha. methods this study was quantitative in nature as the aims of this article were aligned with quantitative data analysis. the assessment of recorded times was done by physically recording the time that the last person completed the ect, which was thereafter captured using microsoft excel. in addition, the average of the recorded times was also interpreted. 
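as a minimal sketch of the statistic named in objective 2, cronbach’s alpha can be computed from a respondents-by-items score matrix as alpha = k / (k − 1) × (1 − sum of item variances / variance of total scores), where k is the number of items (cronbach, 1951). the function name `cronbach_alpha` and the toy dichotomous scores below are hypothetical and are not ect data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for rows of respondents and columns of items."""
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_variances = [
        variance([row[i] for row in scores]) for i in range(n_items)
    ]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical dichotomous (0/1) responses: five respondents, four items.
toy_scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(toy_scores)
```

because the formula needs only a single administration of the test, the same function could be applied separately to the item scores of each test version and the two coefficients compared.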
the assessment of the reliability of the two test versions involved the use of cronbach’s alpha, which was calculated using the statistical package for the social sciences (spss) 23 package. as a reliability coefficient, cronbach’s alpha is commonly used in psychology to assess internal consistency and was therefore used in this study (cronbach, 1951; osburn, 2000). because each test version of the ect was administered once, cronbach’s alpha was suitable for measuring reliability (tavakol & dennick, 2011). cronbach’s alpha was also used because other reliability statistics, such as guttman’s lambda 4 (guttman, 1945), are calculated using a split-half method rather than internal correlations, thus analysing the covariance between the two halves. it was also considered problematic that guttman’s lambda 4 (guttman, 1945) imposes more stringent requirements regarding the sample size and length of test; thus, cronbach’s alpha was deemed more suitable (abedi, 2002; cronbach, 1951; erguven, 2014; osburn, 2000; streiner, 2003; taber, 2018; tavakol & dennick, 2011). participants the study sample comprised 597 individuals for test version 1.2 and 881 individuals for test version 1.3. for ect version 1.2, respondents’ ages ranged from 18 to 52 years (mean age = 22 years) and for ect version 1.3, respondents’ ages ranged from 18 to 42 years (mean age = 21 years). the individuals completing the ect 1.2 were, however, predominantly under the age of 35 years (n = 572, 96%), with only 24 (4%) individuals over the age of 35. the individuals completing the ect 1.3 were mainly under the age of 35 (n = 786, 89%), while 6 (1%) individuals were 36 years old and older. all nine provinces and 11 languages in south africa were represented in the sample for both test versions (arendse, 2018; arendse & maree, 2019). table 1 presents participants’ demographics for the two test versions.
in the table, the percentage of gender representation is indicated, which was predominantly male in both test versions. the distribution of home languages is also indicated in terms of the following categories: english, afrikaans and african languages. this language distribution is significant for the interpretation of the reliability of the ect across the two test versions. moreover, the majority of individuals in both samples were non-native english speakers (afrikaans and african languages). table 1: differences between the two test versions of the english comprehension test. data collection instruments the ect is an individual test that is theorised to assess an individual’s verbal reasoning ability (arendse, 2018; arendse & maree, 2019). the ect contains a comprehension section that is made up of multiple-choice questions. the language section contains multiple-choice questions that have four answer options, with only one option indicating the correct answer, and a written answer section (sentence construction items). the scoring for both the comprehension and language sections was dichotomous. in table 2, an example of a comprehension question from ect versions 1.2 and 1.3 is presented. table 2: example of a test question (comprehension) in the english comprehension test versions 1.2 and 1.3. although the test is still under development, it was administered to individuals from different linguistic and cultural backgrounds in south africa. the ect has only been used for research purposes and thus the initial test version, ect 1.2, was essentially a pilot research study. a preliminary item analysis of ect version 1.2 indicated that there were some problematic items and for this reason one problematic item was edited and two items were removed. although there are substantial differences between the administrations of the two test versions, the majority of the items across the two test versions remained the same.
english comprehension test version 1.3 included five new items (plurals), such as foot or feet, to assess other language aspects not previously covered in ect version 1.2. table 3 presents an example of a test question from ect version 1.3. table 3: example of a test question (plurals) in the english comprehension test version 1.3. the age groups for the pilot study of the ect were between 18 and 52 years. this broad age group was tested following the convenience sampling method and a maximum sample was retained. the range of the age group should, however, be viewed with caution as the majority of the individuals participating across both test versions were actually under 25 years of age. english comprehension test version 1.3, predominately based on the content of ect 1.2, with some changes (indicated in table 4), was used for research purposes. english comprehension test version 1.2 has 39 items and a time limit of 45 min was imposed. english comprehension test version 1.3 has 42 items and no time limit was imposed (arendse, 2018; arendse & maree, 2019). in table 4, the changes made across the test versions are indicated. these changes across test versions are worth noting as possible factors that may have had an impact on the reliability of the ect. table 4: differences between the two versions of the english comprehension test. procedure the sampling method used to obtain the data was convenience sampling, as the individuals in the study were all attending selections, thus making them accessible for the ect pilot. the individuals were attending selection sessions for possible employment and were available after the selection process had been completed owing to the transport arrangements that had been made for them. 
the ethics of exploring the test for further validation after a high-stakes process had completed was considered and the individuals were informed that the test was for research purposes and thus were not compelled to participate in the research process. the ect is intended for screening purposes and not high-stakes testing. furthermore, the accessibility of the sample allowed for the piloting of the ect. the reasoning behind piloting the ect after lunch was that the research should not affect the performance of individuals in the selection process relevant to the employment for which they were applying. the performance of the participants in the research after the selection may have been affected by some measure of stress caused by the selection, which could either have inhibited or enhanced their performance. all the candidates were either grade 12 learners or had already completed grade 12. before the selection began, the participants completed the informed consent document. the participants were informed of the ect and asked whether they would consent to taking the test for research purposes. after the selection process had been concluded, the participants were given a lunch break and thereafter they completed the ect. the time of day that the research testing took place was early afternoon, which could imply that several factors may have had an impact on their performance in the ect. these factors include stress, fatigue, attitude, motivation and the energy levels of the participants when completing an assessment (angelidis et al., 2019; bunyi et al., 2015; dingwall et al., 2017; dodeen, abdelfattah, & alshumrani, 2014; kiwan, ahmed, & pollitt, 2000; kuwornu, 2017). although some individuals may employ stress as a motivator, others may experience stress as inhibiting their performance and causing anxiety (bunyi et al., 2015). 
these factors need to be acknowledged as they may have had an influence on the individuals completing both the untimed and timed versions of the ect. the administration of these pilot sessions involved test orientation and assisting individuals with completing the biographical section of the answer sheet (arendse, 2018; arendse & maree, 2019). the test times for each session were recorded manually, while only the starting time of the test and the time that the last person completed the test were recorded. the time recorded in the latter instance was therefore based on the maximum time required for the slowest person to complete the test. the reason for doing this was to assess the maximum time taken by an individual to complete the ect. a serious limitation with this method of recording time was that an average completion time could not be calculated. the ethical considerations were appropriately applied in this study. the confidentiality and privacy of participants were respected, with a view to keep any identifying information private and confidential. the participants, as said before, signed the informed consent document, which is a standard practice and allowed the individuals to be informed of what the research entails and that they were not forced to participate. the safeguarding of information is important and all data have been put into safekeeping. the data may only be accessed by registered professionals. ethical clearance for this study was obtained from the university of pretoria (arendse, 2018; arendse & maree, 2019). data analysis the description of the data includes the observation of skewness and kurtosis statistics to assess the normality of the data. this was done using spss employing descriptive statistics. the performance of individuals across the two test versions was indicated by means of scatter plots, which were generated using microsoft excel. 
the performance in the test involved a representation of the number of correct and incorrect answers to the questions in the test. this representation was done for both versions of the ect. it should be noted that the missing data for ect 1.3 were not captured because of the scanning process that automatically scored the test. for this reason, the incorrect and missing data would have been captured similarly (as a 0) for ect 1.3. because the missing data could not be compared across test versions, it was not included in the scatter plot. moreover, the incorrect data are potentially inflated because of possible missing data, which therefore presents a limitation to this study. although one would not expect missing data when no time limit is imposed, individuals completing the test were not forced to complete all test items. the reliability coefficient, cronbach’s alpha, was calculated using spss and used to assess how consistent the items of the test were as a whole (cronbach, 1951; hedge, powell, & sumner, 2018; liao, 2004; santos, 1999; streiner, 2003; taber, 2018; tavakol & dennick, 2011). the cronbach’s alpha contains information about how correlated the items of the test are to one another, which is referred to as the internal consistency of the measure. the cronbach’s alpha associated with the reliability in examining the internal consistency of the scale ranges from 0 to 1; thus, the closer this value is to 1, the more reliable the test will be in measuring the construct (mushquash & bova, 2007; streiner, 2003; taber, 2018; tavakol & dennick, 2011). ethical consideration ethical clearance to conduct the study was obtained from the university of pretoria (gw20150407hs). results the descriptive statistics in ect version 1.2 indicated a range of scores from 8 to 38, with an average score of 23. the descriptive statistics in ect version 1.3 indicated a range from 8 to 39, with an average score of 26. 
it is important to inspect the symmetry of the data to justify the use of parametric analyses across the two test versions. the skewness of -0.125 and the kurtosis of -0.284 for ect version 1.2 indicate that the data are fairly symmetrical and have a slightly flat distribution (field, 2009). these values are within the commonly accepted range of -1.000 to +1.000. the kolmogorov–smirnov and shapiro–wilk tests for ect version 1.2 were, respectively, d(597) = 0.055, p < 0.05 and d(597) = 0.994, p < 0.001, indicating that the data are significantly non-normal (field, 2009). the skewness of -0.256 and the kurtosis of -0.082 for ect version 1.3 indicate that the data are slightly negatively skewed and have a slightly flat distribution (field, 2009). this suggests that the majority of responses fell towards or above the mean value. the kolmogorov–smirnov and shapiro–wilk tests for ect version 1.3 were, respectively, d(881) = 0.063, p < 0.001 and d(881) = 0.987, p < 0.001, which indicates that the data are significantly non-normal (field, 2009). according to the skewness and kurtosis values for the two test versions, the data fall well within the commonly accepted ranges, which made the data suitable for further analysis. the kolmogorov–smirnov and shapiro–wilk tests of normality for the two test versions, however, indicated significantly non-normal distributions of data. because the deviation from normality was not severe, the entire complement of data was used. the sample size of 881 for ect version 1.3 might have improved the accuracy of the cronbach’s alpha as the data were not normally distributed (sheng & sheng, 2012). graphical display of performance in items of the test the scatter plot in figure 1 displays the responses of the individuals who completed ect version 1.2.
it can be observed that most of the individuals answered the items correctly (60% of the responses to all the items were correct), while a smaller percentage provided incorrect responses (40% of the responses were incorrect). there was, however, a clearly significant increase in incorrect responses between items 19 and 24, and between items 36 and 39. this could be attributed to individuals choosing to answer certain items or not having sufficient time to correctly answer certain items in the test. the items’ difficulty levels can only be confirmed by conducting an item analysis, however, and this was not done. figure 1: the performance of individuals in english comprehension test version 1.2. figure 2 shows the responses in ect version 1.3, and as the data on the answer sheet were scanned automatically, it meant that the missing responses were not captured. a similar trend was observed with respect to the correct responses, while the incorrect responses increased in items 23, 24, 26, 27, 39, 40, 41 and 42. the range of incorrect and correct responses for ect 1.3 indicated that 62% of the responses to the items were correct, while 38% of the responses were incorrect. the pattern of incorrect and correct responses across the two test versions would suggest that perhaps the individuals completing the tests had intentionally skipped certain items in the test and did not necessarily need more time to complete the test. figure 2: the performance of individuals on english comprehension test version 1.3. recorded test times for english comprehension test 1.3 the time that it took the last person in the different groups to complete ect version 1.3 was recorded. table 5 shows the different times recorded for the pilot run and the length of time it took the last person in each group to complete the ect. table 5: test times for the pilot run of english comprehension test version 1.3. 
from the times captured in table 5, it is apparent that the candidates completed the test at different times in the 29 pilot tests that had been conducted, with an average of 74 min as completion time. the shortest time recorded was 55 min and the longest time recorded was 113 min. the fact that the last person in each group did not complete the test within 45 min is worth noting and it suggests that the set time limit of 45 min in ect version 1.2 might be an unsuitable time limit. reliability results the reliability coefficients for the two test versions are presented in tables 6 and 9. the average scores as well as the total items are indicated to place the reliability in tables 6 and 9 in context. the reliability of the full test items is similar across the two test versions (see tables 6 and 9), which may be regarded as acceptable reliability values for research purposes (nunnally & bernstein, 1994). to assess the best reliability coefficient for the data, the item total statistics were reviewed. these statistics highlighted the items that decreased the reliability coefficient value. the aforementioned items are indicated in tables 7 and 8 for ect 1.2 and in tables 10 and 11 for ect 1.3. for the best coefficient to be obtained, the items that decreased the reliability coefficient were deleted and the reliability analysis was rerun. this process was repeated until the reliability coefficient was at its highest value, which is depicted in table 6 for ect 1.2 and table 9 for ect 1.3. table 6: reliability statistics for english comprehension test 1.2. table 7: items removed from english comprehension test 1.2. table 8: items lowering the cronbach’s alpha for english comprehension test version 1.2. table 9: reliability statistics for english comprehension test 1.3. table 10: the items removed for english comprehension test 1.3. table 11: items lowering the cronbach’s alpha for english comprehension test version 1.3. 
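the item-deletion procedure described above can be illustrated with a short python sketch. this is only an illustration under stated assumptions and not the analysis reported in this article: the study used spss and reported alpha on standardised items, whereas the sketch computes the raw (covariance-based) cronbach’s alpha with numpy, removes one item per pass, and runs on synthetic dichotomous data. the function names `cronbach_alpha` and `prune_items` and the simulated data are hypothetical.

```python
import numpy as np


def cronbach_alpha(scores):
    """Raw Cronbach's alpha for a (respondents x items) score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)


def prune_items(scores, min_items=2):
    """Greedy 'alpha if item deleted' loop (an assumed, simplified procedure).

    On each pass, drop the single item whose removal raises alpha the most;
    stop when no removal raises alpha any further.
    """
    scores = np.asarray(scores, dtype=float)
    kept = list(range(scores.shape[1]))
    removed = []
    alpha = cronbach_alpha(scores[:, kept])
    while len(kept) > min_items:
        # Alpha obtained after deleting each remaining item in turn.
        candidates = [
            (cronbach_alpha(scores[:, [j for j in kept if j != i]]), i)
            for i in kept
        ]
        best_alpha, worst_item = max(candidates)
        if best_alpha <= alpha:
            break                                # no deletion improves alpha
        kept.remove(worst_item)
        removed.append(worst_item)
        alpha = best_alpha
    return kept, removed, alpha


# Synthetic data only: 8 items driven by a common ability, 2 pure-noise items.
rng = np.random.default_rng(0)
ability = rng.normal(size=500)
good = (ability[:, None] + rng.normal(size=(500, 8)) > 0).astype(int)
noise = rng.integers(0, 2, size=(500, 2))
demo = np.hstack([good, noise])

kept, removed, alpha = prune_items(demo)
print("alpha, all items:", round(cronbach_alpha(demo), 3))
print("items removed:", removed, "alpha after pruning:", round(alpha, 3))
```

because items are removed only when alpha increases, the pruned alpha can never be lower than the full-scale alpha. the result is, however, tuned to the sample at hand, so it is best read as an upper-leaning estimate until it is checked on new data.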
in ect version 1.2 (tables 6 and 7), items 6, 10, 11, 12, 15, 16, 17, 18 and 19 were deleted to improve the reliability coefficient. the reliability coefficient on standardised items in the remaining 30 items was 0.820, indicating an acceptable reliability (nunnally & bernstein, 1994). this is, however, still insufficiently reliable for selection purposes or high-stakes testing (foxcroft & roodt, 2009). the mean of these 30 items was 17, which suggests that, on average, individuals answered 44% of the test correctly. in table 8, the contents of the items lowering the cronbach’s alpha are indicated. the varied contents of the items indicated in table 8 allow one to infer that these items were possibly affected by both ability and speed. for ect version 1.3 (tables 9 and 10), items 6, 7, 8, 9, 10, 11, 18, 23 and 25 were deleted to improve the reliability statistic. the 33 remaining items produced a reliability statistic of 0.816 on standardised items, indicating an acceptable reliability (nunnally & bernstein, 1994). it is, however, inadequate for selection purposes or high-stakes testing (foxcroft & roodt, 2009). the mean of these 33 items was 21, which indicates that, on average, individuals correctly answered 50% of the test questions. in table 11, the contents of the items lowering the cronbach’s alpha are indicated. the contents of the items are varied in table 11 and were possibly affected by the ability as speed was not a factor affecting ect version 1.3. the items of the ect 1.2 and ect 1.3 (tables 7, 8, 10 and 11) that lowered the cronbach’s alpha were negatively affecting the intercorrelations of the test and lowering the internal consistency of the test (streiner, 2003). moreover, the deletion of items that lowered the cronbach’s alpha was necessary as these items were malfunctioning despite the fact that individuals had a longer time within which to complete them. 
this raises the dilemma between speed and ability for the ect 1.2, while ect 1.3 could have been predominantly affected by ability. discussion the biographical details of the sample were taken into consideration as they informed the context of the results. the sample was dominated by men, particularly under the age of 25 years, who spoke an african language. this suggests that women and all language groups were not equally represented, which is a limitation in convenience sampling. the implication of this specific sample is that the overwhelming majority were non-native english speakers. this is crucial when considering that the ect is in english and thus language is an inherent variable that could contribute to measurement error in the calculation of the cronbach’s alpha of the two ect versions (dingwall et al., 2014, 2017; kanniainen et al., 2019; nel, 2018; spaull, 2016; van de vijver & rothmann, 2004). the substantial differences in the test administration, test structure and instructions of the two test versions may also have had an impact on the reliability of the ect. the minimum scores of individuals in the ect correlate with the reading comprehension and literacy concerns raised by researchers (dingwall et al., 2014, 2017; kanniainen et al., 2019; nel, 2018; spaull, 2013). these minimum scores could also be influenced by the manner in which items were phrased or the level of complexity of the items (dingwall, et al., 2014, 2017). when comparing the minimum and maximum scores of the timed (8 and 38) and untimed (8 and 39) versions of the ect, it would appear that these scores were not adversely affected by the time limit (angelidis et al., 2019; bunyi et al., 2015; keith & reynolds, 2010; oberauer & lewandowsky, 2013). although time limits are occasionally required to assess ability, the absence of a time limit may sometimes overestimate ability (keith & reynolds, 2010; oberauer & lewandowsky, 2013). 
the time limit imposed for ect version 1.2 may affect reliability because of the compromise between speed and ability (goldhammer, 2015). because the intention of the ect is to act as a screening tool, it does not require a time limit as the aim was to establish a baseline of ability, specifically verbal reasoning. although factors such as literacy and reading comprehension are worth considering when measuring verbal reasoning, there are other factors such as working memory, coding or decoding and reasoning skills that are equally important to consider (asgari & schutze, 2017; keith & reynolds, 2010; lohman & lakin, 2009; oberauer & lewandowsky, 2013). although individuals require literacy skills when reading and comprehending texts, they also use their working memory, coding or decoding processes and reasoning skills to reach valid conclusions (asgari & schutze, 2017; lohman & lakin, 2009). these cognitive processes and skills can be impacted by speed and may affect the reliability of the ect. the context in which the ect can be used, either educational or organisational, does not necessarily require timed screening. for this reason, the measurement of ability supersedes the use of speed. the recorded times were based on the time that the last person completed the test, with the average time taken being 74 min, which was 29 min longer than the time limit of 45 min that was imposed for ect version 1.2. the removal of the time limit and recording the time the last person finished can be regarded as a form of accommodation of participants to support the extraction of ability for the slowest persons (kuwornu, 2017). owing to the awareness that the ect sample comprised predominantly non-native english speakers, mechanisms such as accommodation were required as the time limit might have placed the focus on the items completed instead of the measurement of the construct (keith & reynolds, 2010; oberauer & lewandowsky, 2013). 
the observation of incorrect and correct responses throughout the two test versions would suggest that individuals preferred to answer certain items and were therefore less affected by the time limit. this therefore emphasises the compromise between speed and ability (goldhammer, 2015). in the administration of ect version 1.3, it was qualitatively observed that most candidates completing the test would spend the majority of their time on the last section of the test. the last section of the test contained sentence construction items and these items could therefore have been the reason for the long time spent on the test. there is, however, no quantifiable evidence to support this qualitative observation that was noted during the testing. the cronbach’s alpha for the full item scale of both test versions was appropriate for research purposes but insufficient for high-stakes selection purposes (nunnally & bernstein, 1994). when some items that reduced the cronbach’s alpha values were removed, the cronbach’s alpha for the revised test versions was sufficient for measuring ability across the two test versions (nunnally & bernstein, 1994). from the examination of the items that lowered reliability across the two test versions (tables 8 and 11), it is clear that there are some identical items. the identical items across the two test versions are the following: one ‘false’ and five ‘opinion and fact’ items. these items either lowered reliability because the items in relation to the comprehension section to which it refers may not be clear or the distractors for these items created inconsistency in answering patterns. 
because these items are based on comprehension and the comprehension section was not removed from either test version, one might ponder whether the outcome could be because of poor reading comprehension skills and literacy (abedi, 2002; bahardoost & ahmadi, 2018; dingwall et al., 2014, 2017; howie et al., 2017; kanniainen et al., 2019; nel, 2018; spaull, 2016; streiner, 2003). moreover, comprehension-based items may affect reliability as the items are dependent on the individual understanding the comprehension piece (streiner, 2003). these items in the ect that are based on the comprehension piece are not dependent on each other; however, each item assesses different inferences regarding the comprehension piece. it is nevertheless worth considering that the participants’ understanding of the comprehension had a direct impact on their ability to respond to items that depend on inferences in the comprehension (bahardoost & ahmadi, 2018; dingwall et al., 2014, 2017; kanniainen et al., 2019; nel, 2018; streiner, 2003). because the majority of the sample were non-native english speakers, the content of the test and items could have been more challenging in terms of the language used and questions posed (abedi, 2002; dingwall et al., 2014, 2017; van de vijver & rothmann, 2004). the responses to the comprehension-based items could also have been affected by external factors such as ses and the quality of education received (cockcroft et al., 2016; spaull, 2016). these external factors may also include personal contexts, urban and rural living circumstances, family history and traditional understanding, which may have influenced how individuals responded to these comprehension items. the remaining items were different ‘tense’ items across the two test versions that were identified as lowering the cronbach’s alpha (see tables 8 and 11). these ‘tense’ items were not related to the comprehension piece as they were separate, grammar-related, language questions. 
these ‘tense’ items could therefore have been too challenging, as they required more formal english knowledge that african- and afrikaans-language individuals might not have, depending on their school background (abedi, 2002; cockcroft et al., 2016; dingwall et al., 2014, 2017; krugal & fourie, 2014; kuwornu, 2017; pretorius & klapwijk, 2016; spaull, 2016; van de vijver & rothmann, 2004). the handling of ‘tense’ could also be an issue affected by low literacy levels, differing uses of tense across languages, errors in forward or back translation processes and decoding errors on specific tense terms (asgari & schutze, 2017; bahardoost & ahmadi, 2018; dingwall et al., 2014, 2017; howie et al., 2017; kanniainen et al., 2019; nel, 2018; spaull, 2016; streiner, 2003). one synonym item that was identified in ect version 1.3 was also a language item but it did not relate to the comprehension piece. this item could be affected by participants’ vocabulary and literacy knowledge (abedi, 2002; cockcroft et al., 2016; krugal & fourie, 2014; kuwornu, 2017; pretorius & klapwijk, 2016). moreover, language generally affects the performance of non-native english speakers in english assessments (abedi, 2002; dingwall et al., 2014, 2017; kuwornu, 2017; van de vijver & rothmann, 2004). it is nevertheless recommended that the content of these items be explored in greater depth and in accordance with the principles of linguistic literature to establish language-related issues that may affect non-native english speakers. the reliability of the ect was nevertheless not negatively influenced by whether the test was timed or untimed. moreover, the internal consistency of the two test versions appears to be acceptable, particularly the revised test versions. this acceptable internal consistency indicates that most of the items across the test versions appear to measure the same construct consistently.
cognisant of this, the current study suggests that the internal consistency of the ect across the test versions was not negatively affected by time but this does not mean that performance in the two test versions was not affected. the removal of items that lowered the cronbach’s alpha was necessary if one considered that these items were possibly not related to the construct being measured, and were affecting the unidimensionality of the test. furthermore, these items have lower inter-relations with other items in the test and thus lowered the internal consistency of the test (streiner, 2003). it is, however, possible that cronbach’s alpha was underestimated in both the initial and revised reliability analysis, as the true reliability could be much higher (abedi, 2002; osburn, 2000; streiner, 2003; taber, 2018; tavakol & dennick, 2011). this argument would suggest that the two test versions appear to be sufficiently reliable for research (nunnally & bernstein, 1994) and, when revised, it may be able to measure verbal reasoning consistently (nunnally & bernstein, 1994). the performance of candidates across the two test versions was not assessed and may provide valuable insights into the ect in future research. the two test versions had different numbers of items, which is an important consideration in the light of the reliability results. it should nevertheless be cautioned that although the results for the two test versions were consistent, this does not imply that the performance across the test versions was equal. it is crucial to obtain these results for further developing and refining the ect. it thus opens up more avenues for research relating to the ect. there are a few important limitations concerning this study that should be noted. the samples for both test versions were conveniently selected. therefore, the results cannot be generalised and are specific to the population that was utilised. 
the absence of missing-data counts in the analysis of incorrect and correct items on the scatter plot is a limitation in assessing the accurate number of incorrect items across the test versions, and thus the incorrect data are regarded as possibly being inflated. another limitation of this study was that an average time for completing the untimed test version (ect 1.3) could not be calculated because alternative times, such as the time when the first person completed the test, were not recorded. external factors such as stress, fatigue, motivation, anxiety, attitude and energy levels of participants, and internal test factors such as systematic errors, may have affected the reliability of both versions of the ect. it is recommended that in future piloting of the ect the time should be recorded for the first and last persons to complete the ect in order to establish a more accurate range of the time taken by individuals to complete the test. the recording of the first and last persons completing the test would allow for an average time to be calculated, which is a more accurate calculation of the time needed to complete the test. it is also recommended that the performance in the two ect versions should be assessed to establish whether there was a difference in performance. moreover, the performance of speakers of the nine african languages and afrikaans, who are non-native english speakers, should be compared to that of english first-language speakers across test versions. another recommendation is that the items identified as lowering the cronbach’s alpha should be explored in more detail in terms of the appropriate linguistic literature and statistical analysis. this may show whether speakers of english, the nine african languages or afrikaans perform differently in such items, and may suggest a possible reason why they would perform differently or similarly for these items.
conclusion this study set out to assess the reliability of the ect in its timed version (ect 1.2) and its untimed version (ect 1.3). the administration differences (including test structure and instructions) could have affected the reliability of the ect. the recorded times indicated that the last person to complete the test was unable to complete it within 45 min, which was the time limit of the timed test version (ect 1.2). the performance of individuals in the untimed and timed versions of the ect appears to be similar according to the average, minimum and maximum scores. this performance could be attributed to the answering pattern of individuals, who might deliberately have chosen to answer certain items and therefore might not have needed more time for answering test items. more importantly, this suggests the unsuitability of a time limit for the ect, as the compromise between speed and ability affects the reliability of the test. the reliability results indicated that both tests were appropriate for research purposes and once the items that lowered the cronbach’s alpha had been removed, both test versions were able to measure the verbal reasoning aspect of the ect consistently. the revised reliability results across the two test versions suggested that the internal consistency was acceptable. removal of the items lowering the cronbach’s alpha across test versions was important as they negatively affected the reliability and internal consistency of the test. this study provides important information on the psychometric properties of the ect and is imperative for further development of the ect. acknowledgements the author acknowledges that this article is related to some of the findings from her thesis published in 2018, which is entitled: ‘exploring the construct validity and reliability of the english comprehension test’ (university of pretoria). competing interests the author has no competing interests.
she also declares that she has no financial or personal relationship that may have inappropriately influenced her in writing this article. authors’ contributions i declare that i am the sole author of this research work. funding information there was no funding received for the publishing of this article. data availability statement data sharing is not applicable to this article as no new data were created or analysed in this study. disclaimer the views and opinions expressed in this article are the author’s own and are not the official position of any institution.
about the author(s) itumeleng p. khumalo department of psychology, faculty of humanities, university of the free state, bloemfontein, south africa tharina guse department of psychology, faculty of humanities, university of pretoria, pretoria, south africa citation khumalo, i.p., & guse, t. (2022). factor structure of the dispositional hope scale amongst south africans: an exploratory structural equation modelling study. african journal of psychological assessment, 4(0), a66.
https://doi.org/10.4102/ajopa.v4i0.66 research project registration: project number: opt-2014-012 original research factor structure of the dispositional hope scale amongst south africans: an exploratory structural equation modelling study itumeleng p. khumalo, tharina guse received: 05 july 2021; accepted: 30 sept. 2021; published: 31 jan. 2022 copyright: © 2022. the author(s). licensee: aosis. this is an open access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. abstract snyder’s model of hope conceptualises and operationalises hope as a cognitive, trait-like bi-dimensional future-oriented construct consisting of pathways thinking and agency thinking for goal achievement. the present study implemented exploratory structural equation modelling (esem) and confirmatory factor analysis (cfa) on the dispositional hope scale, using data from two south african student samples (n = 383, 48% female, 21.70 years average age and n = 251, 68% female, 20.55 years average age), with the aim to examine its factor structure in an african context. the results showed that a six item unidimensional solution of hope fit the data best. this model characterises hope as the ability to make plans, informed by past experiences and to spontaneously manoeuvre around obstacles as any situation may call for it. this finding has implications for the measurement of hope and development of emic operational models in an african context. keywords: hope; dispositional hope scale; measurement; structural equation modelling; factor analysis. introduction the study of hope, as a goal-directed and future-oriented concept, features prominently as a focus of scientific inquiry in positive psychology (seligman & csikszentmihalyi, 2000; snyder, 2004). 
although it shares conceptual similarity with a number of other future-oriented constructs (krafft, martin-krumm, & fenouillet, 2017), hope is defined as a cognitive and dispositional process that involves agency and pathways for reaching one’s goals (snyder, 1995, 2002, 2004; snyder, cheavens, & sympson, 1997; snyder et al., 1991). other goal-directed constructs include optimism (alarcon, bowling, & khazon, 2013; scheier & carver, 1985), self-efficacy (bandura, 1977, 1997), meaning in life (schnell, 2009; steger, 2012), curiosity (kashdan et al., 2009) and motivation (ed. ryan, 2012). hope differs from these concepts because it foregrounds all the three goal-pursuit elements, namely agency, pathways and goals, equally (marques, lopez, rose, & robinson, 2014). whilst pathways thinking refers to the perceived ability to generate routes that lead to the desired goals, agentic thinking represents the cognitive willpower or energy, which propels people to move towards their goals (snyder, 1995). thus, high hope ‘reflects an elevated sense of mental energy and pathways for goals’ (snyder, 1995, p. 355). initially thought to be too vague to study and measure (see snyder, 1995), hope in addition to snyder’s model (snyder, 1995, 2002; snyder et al., 1991), is now conceptualised and operationalised through a plethora of approaches (e.g. bernardo, 2010; krafft et al., 2017; maree & maree, 2005; maree, maree, & collins, 2008; scioli, ricci, nyugen, & scioli, 2011). perceived hope refers to the conception of hope as perceived and experienced by ordinary people and encompassing spiritual, religious and altruistic dimensions (krafft et al., 2017). locus of hope is described as ‘whether the components of trait hope involve internal or external agents and internally or externally generated pathways’ (bernardo, 2010, p. 945). conceptualising hope expressed as an emotion, scioli et al. 
(2011) proposed integrated hope as a future-directed, four-channel emotion network made up of mastery, attachment, survival and spiritual systems. maree (maree & maree, 2005; maree et al., 2008) measured hope as a multidimensional construct comprising goal achievement resources, ineffectuality, future vision, despondency and agency, which subsequently enables perseverance, anchoring and direction. even before the cognitive-motivational model of snyder, two older models attempting to explain hope, with their own variations, were developed by stotland (1969) and averil, catlin and chon (1990). nevertheless, there is consensus across all of these models that hope represents a positive expectation towards future outcomes (krafft et al., 2017). hope is important because people are intrinsically goal oriented and tend to think about their future (emmons, 2003; ed. ryan, 2012; snyder, 1995). higher levels of hope are positively associated with life satisfaction (o’sullivan, 2011), positive affect, and positive and rational problem-solving styles (chang & banks, 2007). previous studies amongst university students found that high hope was associated with health benefits, such as diet regulation and exercise (berg, ritschel, swan, an, & ahluwalia, 2011), academic success (snyder et al., 2002), adjustment to university (liu, kia-keating, & modir, 2017) and well-being (demirli, türkmen, & arik, 2015; guse & shaw, 2018). conversely, hope was negatively associated with anxiety and depression (arnau, rosen, finch, rhudy, & fortunato, 2007; snyder et al., 1991), negative affect, negative problem orientation and an impulsive problem-solving style (chang & banks, 2007). hope may serve as a protective factor in the mental health outcomes of youth (griggs, 2017) and can have therapeutic value as it may enhance well-being and preserve health (krafft et al., 2017).
having hope also makes young people more likely to invest in their future, for example, through attaining education and avoiding risky health behaviours that may be detrimental to their future (graham & pozuelo, 2018). in counselling psychology settings (snyder, 1995) and cross-culturally (chang & banks, 2007), the measurement of targeted outcome variables such as hope is crucial for accurate diagnostic information. the present study is specifically concerned with how hope is measured using the dispositional hope scale (dhs) (snyder et al., 1991) in a group of students in south africa. recognising that confirmatory factor analysis (cfa) has generally been used to examine the construct validity of multi-item instruments comprising hypothesised factors (pretorius, 2021), our study extends this work by applying exploratory structural equation modelling (esem) to the dhs in south africa. the importance of knowing and working with the confidence that a measuring instrument has good psychometric properties in the specifically applied context cannot be overstated (pretorius, 2021). a number of international studies have investigated and reported a variety of findings on the measurement quality and dimensionality of the dhs (e.g. abdel-khalek & snyder, 2007; demirli, türkmen, & arik, 2015; gana, daigre, & ledrich, 2013; kemer & atik, 2012; roesch & vaughn, 2006; sun, ng, & wang, 2012; venning, eliott, kettler, & wilson, 2009). most have relied on multivariate analysis and/or cfa. however, the limitations of cfa are increasingly demonstrated in empirical studies (perry, nicholls, clough, & crust, 2015). using a large multi-ethnic sample in the united states of america (us), roesch and vaughn (2006) found support for the two-factor structure of the dhs. sun et al. (2012) similarly found a two-factor model amongst three chinese samples.
the two-factor structure was confirmed in several other studies implemented amongst french (gana et al., 2013), australian (venning et al., 2009) and arabic-speaking (abdel-khalek & snyder, 2007) samples. however, a few other studies did not find the theoretically intended two-factor structure. arnau et al. (2007) arrived at a view that agency and pathways components did not make unique contributions to the construct of hope. following bifactor analysis, brouwer et al. (2008) recommended that the dhs should be implemented as a unidimensional scale, as the items measure the same construct, with very little unique variance being explained by two separate dimensions. similarly, using cfa, espinoza et al. (2017) reported that a unidimensional model best fit the data when examining hope in general and clinical populations. choi, lee and lee (2008) reported a unidimensional structure amongst korean undergraduate students, whilst park and kim (2017) found support for the two-factor structure amongst korean stroke survivors. similar studies in (south) africa include those of guse, de bruin and kok (2016) and savahl, casas and adams (2016) who reported the psychometric properties of the children’s hope scale (snyder et al., 1997b) and nel and boshoff (2014) who reported the psychometric properties of the adult state hope scale (snyder et al., 1996). confirmatory factor analysis has been extensively used to investigate the factor structure of many measuring instruments in psychology and related fields (benitez-borrego, guardia-olmos, & urzua-morales, 2014). the limitations of cfa include its restrictive tendency of requiring zero cross-loadings and overestimating inter-factor correlations (perry et al., 2015). according to asparouhov and muthen (2009), exploratory factor analysis (efa) would be an alternative solution when the overly restrictive cfa does not fit the data well.
exploratory structural equation modelling (asparouhov & muthen, 2009), conducted in mplus, was our preferred approach for adequately and comprehensively exploring the factor structure of the dhs amongst south african students. the advantages of esem, over cfa, are that it allows for the modelling of (theoretically plausible) cross-loadings, and it has less reliance on model modification indexes. in this way, it provides greater modelling flexibility in being a less restrictive way of estimating measurement models, thus allowing for broader and richer a priori model alternatives (asparouhov & muthen, 2009). esem has been helpful in clarifying the factor structure of (psychological) well-being measures in parts of the world such as iran (e.g. joshanloo, 2016a, 2016b). at the time of conducting this study, we were not aware of studies using esem to explore the factor structure of snyder’s conceptual model of hope. given the contradictory results concerning the factor structure of the dhs, the absence of evidence for the applicability of the dhs in the african context and the limitations of cfa, we further investigated its factor structure using esem. the value of this exercise is underscored by cross-cultural transportation and adaptation of measuring instruments (chen, 2008; joshanloo, 2016a; wissing et al., 2010). although hope may be considered a globally recognised concept (snyder, 2004; sun et al., 2012), cross-cultural transportation and adaptation of measurement cannot ignore group differences in personhood and ways of being. whilst some studies have supported the assumption of universality (e.g. roesch & vaughn, 2006), others have not (e.g. brouwer et al., 2008; choi et al., 2008; galiana, oliver, sancho, & tomás, 2015). to date, evidence of the applicability of the two-factor model of hope and its context-informed nature is lacking in african samples. this is notwithstanding the observation by krafft et al. (2017, p.
3) that the ‘central questions in the design of hope studies have been the dimensionality and complexity (unidimensional or multidimensional) of the concept’. given the possible differences in socio-cultural characteristics of (south) africans such as time perspective, uncertainty avoidance, individualism-collectivism and cultural tightness-looseness (khumalo, wilson, & brouwers, 2020), the implementation of esem could yield a unique/emic dimensional solution of hope. no studies applying esem (asparouhov & muthen, 2009) to the dhs could be located. the present study is therefore concerned with investigating the factor structure of hope, as measured by the dhs, a widely used measure of hope in positive psychology (ackerman, warren, & donaldson, 2018). it is in this context that the exploration of the factor structure of the dhs using a more flexible statistical analysis approach was necessary. method participants and setting data were collected from two samples of students from two different institutions of higher learning situated in the most urbanised province of south africa. the first sample comprised 383 students (48% female, mean age = 21.70 years [sd = 2.36]) from a university of technology and the second sample comprised 251 students (68% female, mean age = 20.55 years [sd = 1.95]) from a comprehensive university. measuring instrument dispositional hope scale the dhs (snyder et al., 1991) is designed as a two-factor solution measure, consisting of 12 items meant to provide scores on agency-thinking (4 items) and pathways-thinking (4 items), and a combined summative score of hope. the four additional items are distractor items. it uses an 8-point likert-type scale ranging from 1 (definitely false) to 8 (definitely true). the agency-thinking item content is exemplified by ‘i energetically pursue my goals’, and pathways-thinking by ‘there are lots of ways around any problem’. the dhs has been found to be reliable, with snyder et al.
(1991) reporting cronbach’s alpha coefficients of between 0.74 and 0.84 amongst students, as well as outpatients and inpatients in psychological treatment. in the same study, the agency subscale attained a cronbach’s alpha coefficient ranging between 0.71 and 0.76 and the pathways subscale between 0.63 and 0.80. later studies reported similar reliability coefficients (e.g. galiana et al., 2015; roesch & vaughn, 2006). research examining the dhs in different cultural contexts further supported its validity (e.g. demirli et al., 2015; galiana et al., 2015; kemer & atik, 2012; roesch & vaughn, 2006; sun et al., 2012). in the present study, based on the intended factor structure, the pathways subscale obtained an omega reliability index of 0.725 for sample 1 and 0.810 for sample 2, whilst the agency subscale scored 0.644 for sample 1 and 0.823 for sample 2. the reliability for the total scale as a unidimensional measure was 0.810 for sample 1 and 0.881 for sample 2. data analysis the present study investigated the model fit of the dhs measurement models in two independent samples using cfa and esem in mplus (muthén & muthén, 1998–2017). before commencing with factor analytic investigation, we computed the item-level descriptive statistics of the hope scale. as recommended by marsh, morin, parker and kaur (2014), we used maximum likelihood estimation, with oblique geomin rotation. the following measurement models were fitted and evaluated for fit in both samples separately: one-factor solution, two-factor cfa solution and two-factor esem solution. the following model fit indices and criteria were used to judge the adequacy of the models: chi-square (χ2), root mean square error of approximation (rmsea), standardised root mean square residual (srmr); comparative fit index (cfi), tucker–lewis index (tli); akaike information criterion (aic) and bayesian information criterion (bic) (geiser, 2013). 
for good fit, the following criteria were used: smaller and non-significant χ2, rmsea and srmr of less than 0.06; cfi of more than 0.95; tli of more than 0.95; smaller aic and smaller bic (byrne, 2012; geiser, 2013; hu & bentler, 1999; wang & wang, 2012). ethical considerations all participants were informed volunteers, recruited via their lecturers by research assistants who were postgraduate students. the participants from the vaal university of technology completed the battery of questionnaires after signing an informed consent letter and returned the completed questionnaires after a week. at the university of johannesburg (the comprehensive university), the students completed the questionnaire online after providing informed consent. participants did not receive incentives. ethical clearance was obtained from north-west university (reference number: nwu-00138-14-a8), vaal university of technology (reference number: 20140425-1ms) and the university of johannesburg (reference number: rec 01 -009 -2015). results sample 1 model fit indices are reported in table 1. the two cfa models of the dhs had good but not excellent fit. the unidimensional model (χ2 [20] = 84.68, p < 0.000, cfi = 0.917) was relatively comparable with the theoretically intended bidimensional one (χ2 [19] = 84.129, p < 0.000, cfi = 0.916). reliability indices and the item-level descriptive statistics, as seen in table 2, for the agency and pathways subscales were computed based on the two-factor model. mean scores ranged between 5.68 and 6.82 and all the skewness and kurtosis values attested to normal distribution of data. the unidimensional and bidimensional cfa models for sample 1 are displayed in figure 1. figure 1: the confirmatory factor analysis (cfa) models for sample 1 (n = 383). (a) the unidimensional solution, and, (b) the two-factor solution. table 1: fit indices for unidimensional, cfa bidimensional, and esem bidimensional models for sample 1 and sample 2.
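To make the comparison of the two sample 1 CFA models concrete, the following is a minimal sketch of a chi-square difference test and an RMSEA point estimate using only the values reported in the text. It is an illustration and not part of the authors' analysis: the helper names are hypothetical, the RMSEA formula assumes the (n - 1) convention, and software such as Mplus may report slightly different values.

```python
import math


def chi2_sf_df1(x):
    """Upper-tail p-value of a chi-square variate with 1 degree of freedom."""
    # For df = 1, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))


def rmsea(chi2, df, n):
    """RMSEA point estimate using the (n - 1) convention (an assumption here)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


# Sample 1 CFA values as reported in the text (n = 383).
n = 383
chi2_one, df_one = 84.68, 20      # unidimensional CFA
chi2_two, df_two = 84.129, 19     # two-factor CFA

delta_chi2 = chi2_one - chi2_two  # difference between the nested models
delta_df = df_one - df_two        # equals 1, so the df = 1 formula applies
p_diff = chi2_sf_df1(delta_chi2)

print(f"delta chi2 = {delta_chi2:.3f}, delta df = {delta_df}, p = {p_diff:.3f}")
print(f"RMSEA, unidimensional: {rmsea(chi2_one, df_one, n):.3f}")
print(f"RMSEA, two-factor:     {rmsea(chi2_two, df_two, n):.3f}")
```

A non-significant difference would be in line with the description of the two CFA models as relatively comparable in sample 1, although a naive difference test of this kind is only approximate when the constraint lies on a boundary of the parameter space.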
table 2: descriptive statistics: sample 1 (n = 383), based on confirmatory factor analysis two-factor model. the esem model demonstrated a significant improvement (χ2 [13] = 25.417, p = 0.0203, cfi = 0.984). the latent variables, agency and pathways, had a high correlation of 0.974 in the cfa bidimensional model. as seen in figure 2, the resultant two factors in the esem model had a moderate correlation of 0.476. figure 2: the exploratory structural equation modelling two-factor solution for sample 1 (n = 383). as seen in table 3, all the items obtained significant factor loadings on the latent variables in both the cfa and esem models. the esem produced a two-dimensional solution with agency consisting of six items and pathways having two items. all the theoretically intended pathways items loaded significantly on the agency factor. this left two agency items to make up the pathways subscale. no cross-loading items were observed. table 3: standardised factor loadings of the two-factor models. sample 2 similarly, in sample 2, the esem model proved to be superior (see table 1). the cfa models from sample 2 are displayed in figure 3. both of them had acceptable, but not excellent model fit (hu & bentler, 1999), with the unidimensional one characterised by χ2 (20) = 97.555, p < 0.000, cfi = 0.912 and the bidimensional one by χ2 (19) = 65.526, p < 0.000, cfi = 0.947. reliability indices and the item-level descriptive statistics, as seen in table 4, for the agency and pathways subscales yielded mean scores ranging between 5.61 and 6.69 and the skewness and kurtosis values showed a normal distribution. notably, the esem model demonstrated a significant improvement (χ2 [13] = 28.060, p = 0.0089, cfi = 0.983). the correlation coefficient between agency and pathways was 0.849 in the cfa bidimensional model and 0.671 in the esem bidimensional model. figure 3: the confirmatory factor analysis (cfa) models for sample 2 (n = 251).
(a) the unidimensional solution, and, (b) the two-factor solution. table 4: descriptive statistics: sample 2 (n = 251), based on confirmatory factor analysis two-factor model. all the items obtained significant factor loadings on the latent variables in both the cfa and esem models, as also found in sample 1 (see table 3). as in sample 1, esem produced a two-dimensional solution, with agency consisting of six items and pathways having two items. all the theoretically intended pathways items loaded significantly on the agency factor and two agency items made up the pathways subscale. the one notable difference between the samples was that item 2 cross-loaded in sample 2 (see figure 4). figure 4: the exploratory structural equation modelling two-factor solution for sample 2 (n = 251). discussion this study sought to provide further evidence of the dimensionality of hope, as operationalised through the dhs of snyder et al. (1991), using a more flexible factor analysis approach, esem (asparouhov & muthen, 2009), amongst a south african sample. not only was the necessity for this study driven by the statistical analytical approach and the african socio-cultural context, but we wanted to contribute to the current body of knowledge where there seems to be a lack of consensus about the measurement of hope. the importance of measurement in a context such as south africa is also underscored by the legal and ethical obligation for culturally appropriate and psychometrically sound measures (adams, van de vijver, & de bruin, 2012). we found that a six item unidimensional solution of hope was best fitting. this was in contrast to the one-factor and the theoretically intended two-factor cfa models, which did not have excellent fit in our data. our data did not support a unidimensional solution consisting of eight items, as this factor structure yielded sub-adequate model fit indices in both samples.
this finding contradicts previous research, which suggested that the eight items of the dhs explained hope as one factor. in one such study, brouwer et al. (2008) found that the two separate dimensions explained very little unique variance. in other studies, the unidimensional solution was used from the onset. examples include guse and vermaak (2011) who used the total score for the children’s hope scale in south africa, and hirschi (2014) who used the dhs total score with student and working adult samples in germany. although the two-factor cfa model seemed promising in the present data, it was marginally inadequate in both samples. as expected, these theoretically intended cfa models showed very high inter latent variable correlations of 0.974 (sample 1) and 0.849 (sample 2). at face value, this result may suggest that hope may possibly be better operationalised as a unidimensional solution. it is also known that the overestimation of inter-factor correlation is a result of the over restrictive cfa, which constrains items to load only on the one intended factor (marsh, liem, martin, morin, & nagengast, 2011; perry et al., 2015). as a result of the inherent flexibility of esem (joshanloo 2016a; marsh et al., 2011), we expected that the two factor esem model would demonstrate superior fit and moderate inter-factor correlations. indeed, the application of esem yielded a more nuanced and possibly emic distribution of the dhs items. the use of esem resulted in a model with one major factor consisting of six items and one minor factor consisting of two items. in both samples, lower inter-factor correlations were observed, with 0.476 for sample 1, and 0.671 for sample 2, thus demonstrating inter-factor independence. as seen in table 4, all four items theoretically intended to indicate pathways thinking (e.g. 
‘there are lots of ways around a problem’) loaded on the agency factor together with two agentic thinking items, namely ‘i energetically pursue my goals’, and ‘my past experiences have prepared me well for my future’. the two agentic thinking items that loaded on the separate minor factor were ‘i have been pretty successful in life’, and ‘i meet the goals that i set for myself’. a closer look at the item content of these two separately loading items tells us of the participants’ non-endorsement of the assumption of past success and meeting self-determined goals. the esem factorial solution may be best interpreted as indicating an emic six item unidimensional measure of hope characterised as the ability to make plans, informed by past experiences and to spontaneously manoeuvre around obstacles as any situation may call for it. we therefore see an intertwined hybrid of practical wisdom and the will to achieve success. practical wisdom is broadly described by furey (2017) as the use of subjective and contextual resources in response to real life situations and not relying only on contemplative knowledge. this disposition, also known as phronesis, equips a person with an ability to view situations from multiple perspectives and to ‘navigate a variety of contextually complex situations’ (furey, 2017, p. 473). ryff and singer (1998, p. 6) have thought of practical wisdom, which they described as ‘excellence of thought that guides good action’ as a moral virtue in an african sociocultural context. the assumption of past success and the expectation of meeting self-set goals do not form part of this picture. it is not uncommon that when one’s past is uncertain, it cannot be used as a source of hope for the future (see bryant & elland, 2015). as observed by cherrington (2018), hope is not only drawn from personal narratives of individualised success but is contextually nurtured in a collective sphere. 
instead, the foregrounded characteristic of this dimension is confidence in practical problem solving informed by past experiences and executed with vitality. this dimension is reminiscent of the factor labelled by maree et al. (2008) as agency. in the development and validation of their multidimensional hope measure, maree et al. (2008) conceived of this factor as the ‘ability to focus and act’ (p. 172), the belief that goals can be achieved by doing something, and a situational ability to energise oneself. the finding holds implications not only for how dispositional hope becomes measurable but also for how it is conceptualised in this context. in a region of the world characterised by uncertainty, where there is also a cultural orientation of being comfortable with unstructured situations, having tolerance for ambiguity (i.e. low uncertainty avoidance; allik & mccrae, 2004; hofstede, 2011) and being present and past time-oriented (mbiti, 1990, 1991), more collectivistic (wissing & temane, 2008) and culturally tight (khumalo et al., 2020), it may not be the individual but the greater social ecology that sets the demarcations of the goal-oriented journey to be embarked on. according to bishop and willis (2014, p. 782), ‘hopeful thinking is necessary for the construction of a positive self-identity and positive sense of self-worth’, but what if it only serves as a pragmatic mechanism towards everyday goals? in their study amongst marginalised youth in australia, bryant and elland (2015) found that many participants could not articulate a future beyond their present circumstances and concluded that their uncertainties shaped their future thinking. these findings underscore the importance of context in measuring and understanding hope. limitations the findings of this study are to be interpreted in the context of its limitations.
notwithstanding the strength of utilising two independent datasets, data were still obtained from university students through convenience sampling. future studies should consider more systematic and inclusive forms of sampling applied to larger, more diverse and more representative population samples. there is also a need for bottom-up informed conceptual and operational models of hope in an african context (e.g. bishop & willis, 2014; cherrington, 2018). in addition to esem, as used here, quantitative data on hope need to be subjected to more sophisticated statistical analysis approaches such as rasch analysis/item response theory, multi-dimensional scaling, bifactor analysis and latent class analysis. through such approaches in different and representative samples, a better understanding of item functioning and dimensionality of hope scales in an african context can be gained. conclusion and recommendations hope is a psychological strength that holds relevance for a number of life domains and intervention avenues. it is a determinant of students’ success (feldman, davidson, & margalit, 2014), as much as it is relevant for health psychology (snyder, 1995) and for counselling psychology settings (cheavens, feldman, woodward, & snyder, 2006). as hope is malleable (weis & speridakos, 2011) and associated with several positive mental health outcomes for students (berg et al., 2011; griggs, 2017), the dhs could be used in evaluating the effect of interventions to enhance hope amongst south african students. the fact that our esem results did not support snyder’s model of hope, which has previously received theoretical and empirical support, possibly illustrates measurement instability (as seen in khumalo, ejoke, asante, & rugira, 2021), poor contextually embedded support (perhaps based on cultural interpretation) for a cross-culturally transported theoretical construct (see cherrington, 2018), or both.
recommendations for further research include further investigation of the psychometric properties of the dhs by examining measurement invariance across gender and ethnicity, which may play a role in participants’ responses to some of the items. further studies could implement the dhs to investigate the dynamics of hope and well-being amongst south african university students. this is particularly important given the current context of higher education in south africa, as access to university remains limited to a small percentage of school leavers, many of whom do not complete their studies (habib, 2016). there also is a need to evaluate interventions to enhance hope amongst african samples and youth in particular. in conclusion, our study extended research on hope theory by providing support for psychometric properties of the dhs in an african student sample. it paves the way for further research on hope and well-being in the african context, thereby expanding knowledge of human flourishing beyond existing western understanding. acknowledgements competing interests the authors have declared that no competing interest exists. authors’ contributions i.p.k. and t.g. both contributed equally to the realisation of the article. this contribution includes conceptualisation, contribution of data, methodology and data analysis and writing. funding information this research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors. data availability the authors confirm that the data supporting the findings of this study are available within the article. raw data that support the findings of this study are available from the corresponding author i.p.k., upon reasonable request. disclaimer the views and opinions expressed in this article are those of the authors and do not necessarily reflect the official policy or position of any affiliated agency of the authors, and the publisher. references abdel-khalek, a., & snyder, c.r. 
(2007). correlates and predictors of an arabic translation of the snyder hope scale. journal of positive psychology, 2(4), 228–235. https://doi.org/10.1080/17439760701552337 ackerman, c.e., warren, m.a., & donaldson, s.i. (2018). scaling the heights of positive psychology : a systematic review of measurement scales. international journal of wellbeing, 8(2), 1–21. https://doi.org/10.5502/ijw.v8i2.734 adams, b.g., van de vijver, f.j.r., & de bruin, g.p. (2012). identity in south africa: examining self-descriptions across ethnic groups. international journal of intercultural relations, 36(3), 377–388. https://doi.org/10.1016/j.ijintrel.2011.11.008 alarcon, g.m., bowling, n.a., & khazon, s. (2013). great expectations: a meta-analytic examination of optimism and hope. personality and individual differences, 54(7), 821–827. https://doi.org/10.1016/j.paid.2012.12.004 allik, j., & mccrae, r.r. (2004). towards a geography of personality traits: patterns of profiles across 36 cultures. journal of cross-cultural psychology, 35(1), 13–27. https://doi.org/10.1177%2f0022022103260382 arnau, r.c., rosen, d.h., finch, j.f., rhudy, j.l., & fortunato, v.j. (2007). longitudinal effects of hope on depression and anxiety: a latent variable analysis. journal of personality, 75(1), 43–64. https://doi.org/10.1111/j.1467-6494.2006.00432.x asparouhov, t., & muthen, b. (2009). exploratory structural equation modelling. structural equation modelling: a multidisciplinary journal, 16(3), 397–438. https://doi.org/10.1080/10705510903008204 averil, j.r., catlin, g., & chon, k.k. (1990). rules of hope. new york, ny: springer. bandura, a. (1977). self-efficacy: toward a unifying theory of behavior change. psychological review, 84(2), 191–215. https://doi.org/10.1037/0033-295x.84.2.191 bandura, a. (1997). self-efficacy: the exercise of control. new york, ny: freeman. benitez-borrego, s., guardia-olmos, j., & urzua-morales, a. (2014). 
factorial structural analysis of the spanish version of whoqol-bref: an exploratory structural equation model study. quality of life research, 23, 2205–2212. https://doi.org/10.1007/s11136-014-0663-2 berg, c.j., ritschel, l.a., swan, d.w., an, l.c., & ahluwalia, j.s. (2011). the role of hope in engaging in healthy behaviours among college students. american journal of health behaviour, 35(4), 402–415. https://doi.org/10.5993/ajhb.35.4.3 bernardo, a.b. (2010). extending hope theory: internal and external locus of trait hope. personality and individual differences, 49(8), 944–949. https://doi.org/10.1016/j.paid.2010.07.036 bishop, e.c., & willis, k. (2014). ‘without hope everything would be doom and gloom’: young people talk about the importance of hope in their lives. journal of youth studies, 17(6), 778–793. https://doi.org/10.1080/13676261.2013.878788 brouwer, d., meijer, r.r., weekers, a.m., & baneke, j.j. (2008). on the dimensionality of the dispositional hope scale. psychological assessment, 20(3), 310. https://doi.org/10.1037/1040-3590.20.3.310 bryant, j., & elland, j. (2015). hope as a form of agency in the future thinking of disenfranchised young people. journal of youth studies, 18(4), 485–499. https://doi.org/10.1080/13676261.2014.992310 byrne, b.m. (2012). structural equation modeling with mplus: basic concepts, applications, and programming. new york, ny: routledge. chang, e.d., & banks, k.h. (2007). the color and texture of hope: some preliminary finding and implications for hope theory and counselling among diverse racial/ethnic groups. cultural diversity and ethnic minority psychology, 13(2), 94–103. https://doi.org/10.1037/1099-9809.13.2.94 cheavens, j.s., feldman, d.b., woodward, j.t., & snyder, c.r. (2006). hope in cognitive psychotherapies: on working with client strengths. journal of cognitive psychotherapy, 20(2), 135. chen, f.f. (2008). what happens if we compare chopsticks with forks? 
the impact of making inappropriate comparisons in cross-cultural research. journal of personality and social psychology, 95(5), 1005–1018. https://doi.org/10.1037/a0013193 cherrington, a.m. (2018). a framework of afrocentric hope: rural south african children’s conceptualizations of hope. journal of community psychology, 46(4), 1–13. https://doi.org/10.1002/jcop.21956 choi, y.h., lee, h.k., & lee, d. (2008). validation of the korean version of snyder’s dispositional hope scale. korean journal of social per psychology, 22(2), 1–16. demirli, a., türkmen, m., & arik, r.s. (2015). investigation of dispositional and state hope levels’ relations with student subjective well-being. social indicators research, 120, 601–613. emmons, r.a. (2003). the psychology of ultimate concerns: motivation and spirituality in personality. new york, ny: the guilford press. espinoza, m., molinari, g., etchemendy, e., herrero, r., botella, c., & rivera, r.m.b. (2017). understanding dispositional hope in general and clinical populations. applied research in quality of life, 12(2), 439–450. https://doi.org/10.1007/s11482-016-9469-4 feldman, d.b., davidson, o.b., & margalit, m. (2014). personal resources, hope, and achievement among college students: the conservation of resources perspective. journal of happiness studies, 16, 543–560. https://doi.org/10.1007/s10902-014-9508-5 furey, h. (2017). aristotle and autism: reconsidering a radical shift to virtue ethics in engineering. science and engineering ethics, 23, 469–488. https://doi-org.ufs.idm.oclc.org/10.1007/s11948-016-9787-9 galiana, l., oliver, a., sancho, p., & tomás, j.m. (2015). dimensionality and validation of the dispositional hope scale in a spanish sample. social indicators research, 120(1), 297–308. https://doi.org/10.1007/s11205-014-0582-1 gana, k., daigre, s., & ledrich, j. (2013). psychometric properties of the french version of the adult dispositional hope scale. assessment, 20(1), 114–118. 
https://doi.org/10.1177%2f1073191112468315 geiser, c. (2013). data analysis with mplus. new york, ny: guilford press. graham, c., & ruiz pozuelo, j. (2018). does hope lead to better futures? evidence from a survey of the life choices of young adults in peru. global economy and development. working papers 2018-038. human capital and economic opportunity working group. retrieved from http://dx.doi.org/10.13140/rg.2.2.12816.71686 griggs, s. (2017). hope and mental health in young adult college students: an integrative review. journal of psychosocial nursing and mental health services, 55(2), 28–35. https://doi.org/10.3928/02793695-20170210-04 guse, t., de bruin, g.p., & kok, m. (2016). validation of the children’s hope scale in a sample of south african adolescents. child indicators research, 9(3), 757–770. https://doi.org/10.1007/s12187-015-9345-z guse, t., & shaw, m. (2018). hope, meaning in life and well-being among a group of young adults. in a. krafft, p. perrig-chiello, & a. walker (eds.). hope for a good life: results of the hope barometer international research program (pp. 63–77). dordrecht: springer. guse, t., & vermaak, y. (2011). hope, psychosocial well-being and socioeconomic status among a group of south african adolescents. journal of psychology in africa, 21(4), 527–534. https://doi.org/10.1080/14330237.2011.10820493 habib, a. (2016) transcending the past and reimagining the future of the south african university. journal of southern african studies, 42(1), 35–48. https://doi.org/10.1080/03057070.2016.1121716 hirschi, a. (2014). hope as a resource for self-directed career management: investigating mediating effects on positive career behaviors and life and job satisfaction. journal of happiness studies, 15, 1495–1512. https://doi.org/10.1007/s10902-013-9488-x hofstede, g. (2011). dimensionalizing cultures: the hofstede model in context. in online readings in psychology and culture, unit 2. 
retrieved from http://scholarworks.gvsu.edu/orpc/vol2/iss1/8 hu, l.t., & bentler, p.m. (1999). cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. structural equation modeling, 6(1), 1–55. https://doi.org/10.1080/10705519909540118 joshanloo, m. (2016a). a new look at the factor structure of the mhc-fs in iran and the united states using exploratory structural equation modeling. journal of clinical psychology, 72(7), 701–713. https://doi.org/10.1002/jclp.22287 joshanloo, m. (2016b). factor structure of subjective well-being in iran. journal of personality assessment, 98(4), 435–443. https://doi.org/10.1080/00223891.2015.1117473 kashdan, t.b., gallagher, m.w., silvia, p.j., winterstein, b.p., breen, w.e., terhar, d., & steger, m.f. (2009). the curiosity and exploration inventory-ii: development, factor structure, and psychometrics. journal of research in personality, 43(6), 987–998. https://doi.org/10.1016/j.jrp.2009.04.011 kemer, g., & atik, g. (2012). hope and social support in high school students from urban and rural areas of ankara, turkey. journal of happiness studies, 13(5), 901–911. https://doi.org/10.1007/s10902-011-9297-z khumalo, i.p., ejoke, u.p., asante, k.o., & rugira, j. (2021). measuring social well-being in africa: an exploratory structural equation modelling study. african journal of psychological assessment, 3(0), a37. https://doi.org/10.4102/ajopa.v3i0.37 khumalo, i.p., wilson, a., & brouwers, s.a. (2020). well-being orientations and time perspective across cultural tightness-looseness latent classes in africa. journal of happiness studies, 21, 1681–1703. https://doi.org/10.1007/s10902-019-00151-5 krafft, a.m., martin-krumm, c., & fenouillet, f. (2017). adaptation, further elaboration, and validation of a scale to measure hope as perceived by people: discriminant value and predictive utility vis-à-vis dispositional hope. assessment, 26(8), 1594–1609.
https://doi.org/10.1177/1073191117700724 liu, s.r., kia-keating, m., & modir, s. (2017). hope and adjustment to college in the context of collective trauma. journal of american college health, 65(5), 323–330. https://doi.org/10.1080/07448481.2017.1312412 maree, d.j.f., & maree, m. (2005). assessment of hope and the process of constructing a gender-sensitive scale for hope within a south african context. paper presented at the hope: probing the boundaries conference, prague, czech republic, 8–10 august, 2005. maree, d.j.f., maree, m., & collins, c. (2008). constructing a south african hope measure. journal of psychology in africa, 18(1), 167–178. https://doi.org/10.1080/14330237.2008.10820183 marsh, h.w., liem, g.a.d., martin, a.j., morin, a.j.s., & nagengast, b. (2011). methodological measurement fruitfulness of exploratory structural equation modelling (esem): new approaches to key substantive issues in motivation and engagement. journal of psychoeducational assessment, 29(4), 322–346. https://doi.org/10.1177%2f0734282911406657 marsh, h.w., morin, a.j.s., parker, p.d., & kaur, g. (2014). exploratory structural equation modelling: an integration of the best features of exploratory and confirmatory factor analysis. annual review of clinical psychology, 10, 85–110. https://doi.org/10.1146/annurev-clinpsy-032813-153700 marques, s.c., lopez, s.j., rose, s., & robinson, c. (2014). measuring and promoting hope in schoolchildren. in m.j. furlong, r. gilman & e.s. huebner (eds.), handbook of positive psychology in schools (2nd ed.). new york, ny: routledge. mbiti, j.s. (1990). african religions and philosophy. london: clays ltd. mbiti, j.s. (1991). introduction to african religion (2nd ed.). oxford: heinemann. muthén, l.k., & muthén, b.o. (1998–2017). mplus statistical analysis with latent variables: users’ guide (8th ed.). los angeles, ca: muthén & muthén. nel, p., & boshoff, a. (2014). factorial invariance of the adult state hope scale.
south african journal of industrial psychology, 40(1), 01–08. https://doi.org/10.4102/sajip.v40i1.1177 o’sullivan, g. (2011). the relationship between hope, stress, self-efficacy, and life satisfaction among undergraduates. social indicators research, 101(1), 155–172. https://doi-org.ufs.idm.oclc.org/10.1007/s11205-010-9662-z park, e., & kim, j. (2017). the factor structure of the dispositional hope scale in hemiplegic stroke patients. journal of mental health, 0(0), 1–6. https://doi.org/10.1080/09638237.2017.1385735 perry, j.l., nicholls, a.r., clough, p.j., & crust, l. (2015). assessing model fit: caveats and recommendations for confirmatory factor analysis and exploratory structural equation modeling. measurement in physical education and exercise science, 19(1), 12–21. https://doi.org/10.1080/1091367x.2014.952370 pretorius, t.b. (2021). over reliance on model fit indices in confirmatory factor analyses may lead to incorrect inferences about bifactor models: a cautionary note. african journal of psychological assessment, 3(0), a35. https://doi.org/10.4102/ajopa.v3i0.35 roesch, s.c., & vaughn, a.a. (2006). evidence for the factorial validity of the dispositional hope scale: cross-ethnic and cross-gender measurement equivalence. european journal of psychological assessment, 22(2), 78–84. https://doi.org/10.1027/1015-5759.22.2.78 ryan, r.m. (ed.). (2012). the oxford handbook of human motivation. oxford: oxford university press. ryff, c.d., & singer, b. (1998). the contours of positive human health. psychological inquiry, 9(1), 1–28. https://doi.org/10.1207/s15327965pli0901_1 savahl, s., casas, f., & adams, s. (2016). validation of the children’s hope scale amongst a sample of adolescents in the western cape region of south africa. child indicators research, 9(3), 701–713. https://doi.org/10.1007/s12187-015-9334-2 scheier, m.f., & carver, c.s. (1985). optimism, coping, and health: assessment and implications of generalized outcome expectancies. 
health psychology, 4(3), 219–247. https://psycnet.apa.org/doi/10.1037/0278-6133.4.3.219 schnell, t. (2009). the sources of meaning and meaning in life questionnaire (some): relations to demographics and well-being. the journal of positive psychology, 4(6), 483–499. https://doi.org/10.1080/17439760903271074 scioli, a., ricci, m., nyugen, t., & scioli, e. (2011). hope: its nature and measurement. psychology of religion and spirituality, 3(2), 78–97. https://doi.org/10.1037/a0020903 seligman, m.e.p., & csikszentmihalyi, m. (2000). positive psychology: an introduction. american psychologist, 55(1), 5–14. https://doi.org/10.1037/0003-066x.55.1.5 snyder, c.r. (1995). conceptualizing, measuring, and nurturing hope. journal of counselling and development, 73(3), 355–360. https://doi.org/10.1002/j.1556-6676.1995.tb01764.x snyder, c.r. (2002). hope theory: rainbows in the mind. psychological inquiry, 13(4), 249–275. https://doi.org/10.1207/s15327965pli1304_01 snyder, c.r. (2004). hope and the other strengths: lessons from animal farm. journal of social and clinical psychology, 23(5), 624–627. https://doi.org/10.1521/jscp.23.5.624.50751 snyder, c.r., cheavens, j., & sympson, s.c. (1997). hope: an individual motive for social commerce. group dynamics: theory, research, and practice, 1(2), 107–118. https://doi.org/10.1037/1089-2699.1.2.107 snyder, c.r., harris, c., anderson, j.r., holleran, s.a., irving, l.m., sigmon, s.t., … harney, p. (1991). the will and the ways: development and validation of an individual-differences measure of hope. journal of personality and social psychology, 60(4), 570–585. https://doi.org/10.1037/0022-3514.60.4.570 snyder, c.r., hoza, b., pelham, w.e., rapoff, m., ware, l., danovsky, m., … stahl, k.j. (1997). the development and validation of the children’s hope scale. journal of pediatric psychology, 22(3), 399–421. https://doi.org/10.1093/jpepsy/22.3.399 snyder, c.r., shorey, h.s., cheavens, j., pulvers, k.m., adams iii, v.h., & wiklund, c.
(2002). hope and academic success in college. journal of educational psychology, 94(4), 820–826. https://doi.org/10.1037/0022-0663.94.4.820 snyder, c.r., sympson, s.c., michael, s.t., & cheavens, j. (2000). the optimism and hope constructs: variants on a positive expectancy theme. in e.c. chang (ed.), optimism and pessimism (pp. 103–124). washington, dc: american psychological association. snyder, c.r., sympson, s.c., ybasco, f.c., borders, t.f., babyak, m.a., & higgins, r.l. (1996). development and validation of the state hope scale. journal of personality and social psychology, 70(2), 321–335. https://doi.org/10.1037/0022-3514.70.2.321 steger, m.f. (2012). experiencing meaning in life: optimal functioning at the nexus of well-being, psychopathology, and spirituality. in p.t. wong (ed.) the human quest for meaning (2nd ed., pp. 165–184). new york: routledge. stotland, e. (1969). the psychology of hope. san francisco, ca: jossey-bass. sun, q., ng, k.m., & wang, c. (2012). a validation study on a new chinese version of the dispositional hope scale. measurement and evaluation in counseling and development, 45(2), 133–148. https://doi.org/10.1177%2f0748175611429011 venning, a.j., eliott, j., kettler, l., & wilson, a. (2009). normative data for the hope scale using australian adolescents. australian journal of psychology, 61(2), 100–106. https://doi.org/10.1080/00049530802054360 wang, j., & wang, x. (2012). structural equation modeling: applications using mplus. west sussex: wiley. weis, r., & speridakos, e.c. (2011). a meta-analysis of hope enhancement strategies in clinical and community settings. psychology of well-being: theory, research and practice, 1(1), 5. https://doi.org/10.1186/2211-1522-1-5 wissing, m.p., & temane, q.m. (2008). the structure of psychological well-being in cultural context: towards a hierarchical model of psychological health. journal of psychology in africa, 18(1), 45–56. 
https://doi.org/10.1080/14330237.2008.10820170 wissing, m.p., thekiso, s.m., stableberg, r., van quickelberg, l., choabi, p., moroeng, c., … vorster, h.h. (2010). validation of three setswana measures for psychological wellbeing. south african journal of industrial psychology, 36(2), art. #860, 8 pages. https://doi.org/10.4102/sajip.v36i2.860 about the author(s) charles h. van wijk department of global health, faculty of medicine and health sciences, stellenbosch university, cape town, south africa institute for maritime medicine, simon’s town, south africa citation van wijk, c.h., (2023). exploring the emotional dysregulation scale-short form in isolated, confined, and extreme environments. african journal of psychological assessment, 5(0), a119. https://doi.org/10.4102/ajopa.v5i0.119 original research exploring the emotional dysregulation scale-short form in isolated, confined, and extreme environments charles h. van wijk received: 03 sept. 2022; accepted: 02 jan. 2023; published: 22 mar. 2023 copyright: © 2023. the author(s). licensee: aosis. this is an open access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. abstract the emotional dysregulation scale-short form (eds-s) may have potential for assessing emotional dysregulation (ed) both in general clinical mental health environments and in specialised work settings. before it can be used fairly and appropriately, evidence of its validity in the local south african (sa) context is required. this study thus explored its psychometric characteristics among local working adult samples by pursuing three specific objectives, namely, to investigate its structural validity, its construct validity, and issues around practical use (e.g.
priming bias and ability to predict performance). data were collected across four samples that comprised general workers and specialised naval personnel (total n = 1374), who also completed measures of clinical mental health and other adjustment difficulties. statistical analysis included examination of socio-demographic effects, internal consistencies, confirmatory factor analysis, measurement invariance, and associations with measures of mental health and adjustment difficulties (including binomial logistic regressions and receiver operating/operator characteristics curve analyses). this study reported evidence of structural and criterion validity, with significant associations to measures of mental health and adjustment difficulties, for the 12-item eds-s in non-clinical samples of sa workers. the study further provided preliminary support for its predictive utility in specialised work environments. preliminary evidence of validity of the eds-s in sa worker samples with sufficient english proficiency was demonstrated. contribution: there is some support for the use of the eds-s in clinical research and applied practice. however, caution must be observed for possible effects of language proficiency and further research into the role of language is required. keywords: eds-s; emotional dysregulation; isolated, confined and extreme environments; mental health; south africa; validity. introduction overview isolated, confined and extreme (ice) environments pose special challenges to psychological performance, and optimal adaptation in such environments is required to ensure well-being. successful adaptation is contingent on, among other things, appropriate emotional regulation. various mechanisms exist to measure emotional regulation, and this article investigates validity aspects of one such tool, namely, the emotional dysregulation scale (eds).
isolated, confined and extreme environments isolated, confined and extreme environments refer to settings characterised by hostile external conditions, and an exposure to a range of context-specific physical, mental and social stressors, and often require engineering technology to maintain human survival. isolated, confined and extreme environments are, for instance, underwater habitats, spacecraft, remote weather stations, polar outposts, and in certain circumstances, ships at sea (suedfeld & steel, 2000; van wijk & martin, 2021). such ice environments may present considerable and often unique configurations of psychological challenges to individuals and groups working in such settings. challenges to survival may include a hostile climate and the mastery of specialised equipment for life support, as well as demands of constant vigilance – where neither critical nor routine tasks can be avoided or postponed, and where mistakes may have severe consequences. social challenges include restricted communication with the outside world, cramped living spaces, enforced intimacy with individuals not of one’s choosing (sandal, 2000, p. a37), navigating evolving group dynamics and emotional isolation. reports describe how persons in ice environments are exposed to exceptionally high levels of stress, resulting in higher-than-average rates of somatic symptoms, anxiety, depression, hostility, and mild cognitive impairment. these symptoms of stress appear to manifest themselves as health problems, reduced emotional well-being, decreased performance, and interpersonal tension (basner et al., 2014; kanas et al., 2009; palinkas & suedfeld, 2008; rohrer, 1961; sandal, 2000; shea et al., 2009). liu et al. (2016) further demonstrated that isolation and confinement result in a decreased ability to regulate emotions, as well as an increased vulnerability to negative emotions. 
appropriate psychological adaptation to these challenges would be critical to achieve and maintain both optimal performance and optimal well-being in such settings. broadly stated, psychological adaptation refers to an individual’s ability to adjust to changes in their environment to optimise personal functioning. in ice environments, successful psychological adaptation is traditionally operationalised in terms of gunderson’s antarctic triarchy (gunderson, 1973; palinkas et al., 2000; suedfeld & steel, 2000), which reflects three domains, namely: (1) task ability (referring to the quality of work output); (2) sociability (referring to the quality of interpersonal interaction; sometimes referred to as ‘social compatibility’); and (3) emotional stability (referring to the quality of internal self-regulation). psychological adaptation to ice environments is of increasing interest to southern africa. for example, the south african (sa) government maintains a polar icebreaker and three research/weather stations in antarctica and islands as part of the south african national antarctic programme (https://www.sanap.ac.za/). the south african navy (san) operates long-range patrol vessels (e.g. frigates) and submarines (https://en.wikipedia.org/wiki/south_african_navy), and a number of private companies in the oil and gas industry operate offshore drilling platforms from the angolan to the mozambican coasts. all of these examples may qualify as isolated and confined environments, and while not all are necessarily extreme, they are certainly unusual for those accustomed to living on terra firma. emotional dysregulation the emotional regulation (and dysregulation) aspect of adaptation is of particular interest, as it underpins personal performance across many facets of daily life, including family, work and sport (gross & thompson, 2007).
in ice environments, individuals with more adaptive emotional regulation would be expected to more effectively manage their personal performance across work output, social interactions, and affective states, especially under the psychologically rigorous demands of ice environments (palinkas & suedfeld, 2008). in contrast, individuals with less adaptive emotional regulation might be expected to have greater difficulty managing their personal performance across the same three domains. emotional regulation thus acts across domains to influence the maintenance of quality work output, social relations and emotional well-being. it may therefore be useful to know of problematic emotional regulation in individuals, as this can prime programme managers to either better prepare individuals for the rigours of ice environments or to advise against such exposures. emotional regulation can be defined as the ability of an individual to correctly identify, monitor, express and modulate the intensity and duration of an emotion or set of emotions (american psychological association [apa], 2022a; cole et al., 1994; raimondi et al., 2022). emotional dysregulation (ed) refers to the difficulty or inability to carry out this process, and in particular refers to extreme or inappropriate emotional response to a situation (apa, 2022b). as such, ed reflects deficits in awareness and acceptance of emotions, as well as in regulation strategies to manage intense, negative and shifting emotional states (gross & thompson, 2007; powers et al., 2015). emotional dysregulation is currently understood as a trans-diagnostic construct that has an impact on many psychological conditions, ranging from, among others, mood and anxiety disorders, substance use and personality disorders to autism spectrum disorder, psychological trauma and brain injury (cf. apa, 2022b; powers et al., 2015; raimondi et al., 2022, for summaries).
developmental research suggests that these self-regulatory deficits emerge from an interaction of intrinsic temperamental and biological factors, as well as extrinsic intrusions, such as exposure to traumatic experiences, particularly in early life (bradley et al., 2011; powers et al., 2015). emotional dysregulation is not the same as negative affect. generally speaking, negative affect reflects the types of emotions people have (e.g. anger, fear and sadness), while emotion regulation reflects the ability to adaptively manage the intensity and duration of emotions (including negative ones) as they arise (powers et al., 2015, p. 86). this distinction has an important practical application, in that patients can be taught strategies for how to manage intense, negative emotions as they occur (powers et al., 2015). components of ed include a tendency for emotions to spiral out of control, change rapidly, get expressed in intense and unmodified forms, and/or overwhelm both coping capacity and reasoning (bradley et al., 2011). measuring emotional dysregulation a number of self-report instruments are available to measure ed, including the 36-item difficulties in emotion regulation scale (ders; gratz & roemer, 2004), which measures six dimensions of difficulties in emotion regulation, and the 10-item emotional regulation questionnaire (erq; gross & john, 2003), which measures two emotional regulation strategies. in spite of their widespread use, they also have serious limitations (cf. powers et al., 2015; raimondi et al., 2022, for critique). bradley et al. (2011) developed the 24-item eds. items are scored on a seven-point likert scale and assess domains of emotional experiencing, cognition and behaviour. the scale demonstrated high internal consistency, replicated across samples (chanana & sharma, 2019). 
emotional dysregulation scale-24 scores were significantly correlated to childhood trauma and negative affect, as well as significant predictors of post-traumatic stress symptoms, history of alcohol and drug abuse problems, depressive symptoms and lower global adaptive functioning (bradley et al., 2011). while eds-24 scores were significantly associated with all the subscales of the ders, it also demonstrated incremental validity over the ders in predicting different psychopathological conditions. recent studies have supported the use of the eds in a variety of clinical populations, for example, with patients suffering from mood disorders or post-traumatic stress disorder (ptsd; christ et al., 2019; mekawi et al., 2020; pencea et al., 2020). in response to the criticisms of the ders, erq and length of the eds-24, powers et al. (2015) developed the eds-short form (eds-s), based on an exploratory factor analysis (efa) of the original eds 24-item scale, which yielded one factor. the 12 items with the highest loadings were then chosen for the eds-s. the bivariate correlation between the 24-item and 12-item eds scales was extremely high (r = 0.98, p < 0.001). the eds-s retained the seven-point likert scale, with items assessing the domains of emotional experiencing (‘emotions overwhelm me’), cognition (‘when i’m upset, everything feels like a disaster’), and behaviour (‘when my emotions are strong, i often make bad decisions’) and higher scores indicating higher ed. high internal consistency for the eds-s has been reported (mandavia et al., 2016; michopoulos et al., 2015; powers et al., 2015; raimondi et al., 2022). 
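the scoring of the eds-s described above can be sketched as follows. this is a minimal illustration only: it assumes that the total score is the simple sum of the 12 item ratings on the seven-point scale (which is consistent with a minimum possible score of 12), and the function name `score_eds_s` is hypothetical rather than part of any published scoring tool.

```python
from typing import Sequence

N_ITEMS = 12                     # number of EDS-S items reported in the text
MIN_RATING, MAX_RATING = 1, 7    # seven-point Likert scale


def score_eds_s(responses: Sequence[int]) -> int:
    """Return an EDS-S total score.

    Assumption (not confirmed by the source): the total is the plain sum
    of the 12 item ratings, so that higher totals indicate higher
    emotional dysregulation.
    """
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    for rating in responses:
        if not MIN_RATING <= rating <= MAX_RATING:
            raise ValueError(f"rating {rating} is outside {MIN_RATING}-{MAX_RATING}")
    return sum(responses)


if __name__ == "__main__":
    # Hypothetical respondent who rates every item at the lowest anchor.
    print(score_eds_s([1] * N_ITEMS))
```

under this assumption the possible totals run from 12 (all items rated 1) to 84 (all items rated 7).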
the eds-s demonstrated a significant correlation with ders, and appeared predictive of depressive symptoms, ptsd symptoms, alcohol abuse, borderline personality disorder, general psychopathology, suicidality and psychiatric hospitalisation, and was negatively associated with positive affect and resilient coping (mandavia et al., 2016; michopoulos et al., 2015; powers et al., 2015; raimondi et al., 2022). confirmatory factor analysis (cfa) of an italian version suggested a unidimensional structure (raimondi et al., 2022). table 1 provides a summary of published data on the eds-24 and eds-s. table 1: summary of published data on the emotional dysregulation scale. aims and overview of studies the eds may have potential for assessing ed in both specific ice environments and clinical mental health settings. for example, responses to the eds could be used to guide decisions around inclusion/exclusion of individuals applying for missions in ice environments, or to guide appropriate preparation or advance intervention for such persons. it could also be used for research within clinical settings to better understand the role of emotional regulation in the development of, or protection against, mental disorders. however, neither its fair and unbiased use (employment equity act, 1998) nor its clinical or practical validity (i.e. accuracy in identifying risk) have been established in the sa context. validation is a constant process, involving a continuum of evidentiary support, including evidence of internal structures and effects of context and sample characteristics (schaap & kekana, 2016). therefore, before it can be used with confidence, a better understanding of the instrument in the indigenous sa context is required. this study thus set out to explore the psychometric characteristics of the scale among local population samples. it used data collected across four studies to pursue three specific objectives: firstly, it investigated the structural validity of the eds. 
secondly, it investigated the construct validity of the eds, by exploring its associations with measures of common mental disorders and indicators of adjustment difficulties, as well as describing the eds profile in a group that has demonstrated good adaptation in an ice environment. thirdly, it investigated two issues around practical use, namely the eds’ susceptibility to priming bias and the eds’ ability to predict self-rated performance in ice contexts. the studies were set up as follows: in general terms, validity refers to the extent to which a scale measures what it claims to measure. study 1 thus investigated, firstly, the structural validity of the eds by describing its psychometric characteristics in a general sa workplace sample (including internal consistency and test–retest reliability, socio-demographic effects and dimensionality), and secondly, the construct validity of the eds by exploring its association with measures of common mental disorders and other indicators of mental (ill)health history, work adjustment and experience of stress overload. priming is the phenomenon according to which the recent experience of a stimulus facilitates or inhibits later processing of the same or a similar stimulus (apa, 2022c). in other words, it describes how the introduction of one stimulus influences how people respond to a subsequent stimulus (cherry, 2021). one example is repetition priming, in which the presentation of a particular stimulus increases the likelihood that participants will identify the same or a similar stimulus later in a test. in semantic priming, presentation of a word or symbol influences the way in which participants interpret a subsequent word or symbol (apa, 2022c). priming works by activating an association or representation in memory, and can work with stimuli that are perceptually, linguistically or conceptually related (cherry, 2021). this phenomenon is thought to generally occur outside of conscious awareness. 
in research, measures of ed are often used in combination with measures of other psychological constructs, raising the question of how such combinations may influence responses on the eds, depending on where – in a sequence of measures – the eds is administered. study 2 thus aimed to investigate the eds’ susceptibility to priming bias by exploring the effects of mood state responses on the priming of eds responses. successful psychological adaptation in ice environments has traditionally been operationalised in terms of gunderson’s antarctic triarchy, namely performance in the domains of task ability (work-output), sociability (interpersonal relations) and emotional states. by monitoring and modulating (apa, 2022a) a person’s inner state, adaptive emotional regulation may act across these three domains to optimise personal performance (referring here to work and social behaviour, well-being, etc.). study 3 thus aimed to investigate the eds’ ability to predict performance in ice contexts by exploring eds-s total score associations with, firstly, self-rated performance assessment on the antarctic triarchy, and secondly, mood state as measured by the brunel mood scale (brums), at the end of a 3-month ice mission. submarines constitute a very specific example of an ice environment, and submariners have traditionally been considered as particularly good adaptors (kimhi, 2011; van wijk, 2017, 2022; weybrew & noddin, 1979). good emotional regulation – and low eds-s scores – would be expected from this group. study 4 thus aimed to describe the eds-s profile of a population that has demonstrated good adaptation in an ice environment, namely a small sample of san submariners. all four studies used a cross-sectional survey design. it needs to be noted that the eds and other measuring scales employed in this study were designed for mental health screening and not for clinical diagnosis purposes, and their use for diagnostic ends is not recommended.
methods study 1 participants and procedure the sample was drawn from non-clinical and skilled worker populations who volunteered to complete the scales and questionnaires during employer-sponsored occupational health surveillance initiatives. prior to giving their consent and providing any information, volunteers were briefed that the completion of the eds-24 would not influence their health screening or any subsequent health support. the mean age of the 1006 participants was 33 years (±8, range: 20–60 years), and 33.4% of the sample were women. english as the first language was spoken by 19.1% of the sample, with the rest reporting the other 10 sa official languages as their mother tongues. all participants self-reported as proficient in english; however, actual english proficiency was not objectively established. the detailed distribution across language and occupational fields is presented in table 2. table 2: sample distribution across home language and occupational field. measures emotional dysregulation scale: the full 24-item eds (bradley et al., 2011) was administered in its standard format, in english. respondents were asked to rate each item on a seven-point likert scale (1 = ‘not true’, 7 = ‘very true’), with higher scores reflecting greater ed. a subsample (n = 131) completed the same version of the eds 35 days after the first administration. this was purely a convenience sample (i.e. those who could be contacted and were available at the time), used to investigate test–retest reliability. self-report questionnaire: participants also completed a self-report questionnaire with four sections: mental health history, consisting of three items with yes/no answers, enquired about previous admission to hospital or clinic for mental health concerns, previous psychological or psychiatric out-patient treatment, and previous treatment for alcohol or substance abuse or addiction.
adjustment at work, consisting of two items with yes/no answers, enquired about concerns regarding interpersonal relations in their workgroup (conflict with co-workers, supervisors), and disciplinary issues at work during the past 2 years. domestic discord enquired about difficulties in relationships with partner/immediate family. finally, two items reflected the very brief screen for adult attention deficit/hyperactivity disorder (adhd) and scores ≥ 1 were considered suggestive of adhd (van wijk & firfirey, 2020; zimmerman et al., 2017). indicators of common mental disorders: current clinical syndromes were identified using locally validated (cf. van wijk et al., 2021) scales for common mental disorders: the patient health questionnaire for depression (phq-9; gilbody et al., 2007) was used to screen for depression, with a score of ≥ 10 used for identifying cases (α = 0.79 for this study). the generalised anxiety disorder scale (gad-7; löwe et al., 2008) was used to screen for generalised anxiety disorder, with a score of ≥ 10 identifying cases (α = 0.82 for this study). the primary care screen for ptsd using dsm-5 criteria (pc-ptsd-5; bovin et al., 2021) was used to screen for ptsd, with a score of ≥ 3 identifying cases (α = 0.73 for this study), and the cage (cut, annoyed, guilty, eye-opener) questionnaire (dhalla & kopec, 2007) was used to screen for problematic alcohol use, with a score of ≥ 2 identifying cases (α = 0.57 for this study). stress overload: current stress overload, in a subsample of 224 participants, was measured with the 10-item stress overload scale-short form (sos-s; amirkhan, 2018). this was to identify participants who were experiencing the demands of life as overwhelming their available resources. previous sa research suggested that scores > 20 were associated with significant mental health difficulties (van wijk, 2021). 
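the screening cut-offs listed above can be collected in one place, as in the following sketch. the thresholds are those stated in the text (phq-9 ≥ 10, gad-7 ≥ 10, pc-ptsd-5 ≥ 3, cage ≥ 2, sos-s > 20); the dictionary keys and the function name `flag_cases` are hypothetical, chosen for illustration only. as noted earlier, these scales screen for possible cases and do not diagnose.

```python
from typing import Dict, Tuple

# (threshold, inclusive) pairs taken from the cut-offs stated in the text.
# inclusive=True means "score >= threshold"; False means "score > threshold".
CUTOFFS: Dict[str, Tuple[int, bool]] = {
    "phq9": (10, True),      # depression screen
    "gad7": (10, True),      # generalised anxiety screen
    "pc_ptsd5": (3, True),   # PTSD screen
    "cage": (2, True),       # problematic alcohol use screen
    "sos_s": (20, False),    # stress overload: scores above 20
}


def flag_cases(scores: Dict[str, int]) -> Dict[str, bool]:
    """Return a screen-positive flag for each scale present in `scores`."""
    flags = {}
    for scale, score in scores.items():
        if scale not in CUTOFFS:
            raise KeyError(f"no cut-off defined for scale '{scale}'")
        threshold, inclusive = CUTOFFS[scale]
        flags[scale] = score >= threshold if inclusive else score > threshold
    return flags


if __name__ == "__main__":
    # Hypothetical respondent, for illustration only.
    example = {"phq9": 10, "gad7": 9, "pc_ptsd5": 3, "cage": 1, "sos_s": 20}
    print(flag_cases(example))
```

keeping the inclusive and strict comparisons explicit avoids the easy mistake of treating the sos-s threshold (> 20) like the others (≥).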
data analysis all statistical analyses were conducted by means of statistical package for social sciences (spss) (ibm spss for windows, version 27) and analysis of moment structures (amos). means, standard deviations (s.d.), and range (for the full scale and the short form) were calculated. to confirm the 12-item eds-s, the current study carefully replicated the original process undertaken by powers et al. (2015), which included an efa and retaining items with the highest loadings. the rest of the analysis is based on the 12-item eds-s. the effects of socio-demographic variables were explored using pearson’s correlation coefficients and analysis of variance (anova) for age, as well as t-tests for independent samples for gender and language effects. for this analysis, language was coded into two groups, namely english first language (19.1%) and non-english first language (80.8%), and age was coded into four groups (20–29, 30–39, 40–49 and 50–60). internal consistencies were examined with cronbach’s α, inter-item correlations and corrected item-total correlations. test–retest reliability was examined by comparing the two administrations of the eds, 35 days apart (n = 131), using a paired sample t-test. the earlier efa with the 24-item version suggested a single factor, and a cfa previously found a unidimensional structure in an italian version of the eds-s (raimondi et al., 2022). a cfa was thus conducted to test a model with a unidimensional structure. confirmatory factor analysis is used to test whether the data fit a hypothesised measurement model (marker, 2002). the maximum likelihood estimator was used to explore a one-factor model fit. 
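for readers unfamiliar with the internal consistency statistics named above, the following sketch shows how cronbach’s α and corrected item-total correlations are conventionally computed. it is not the spss procedure used in the study; the function names and the tiny data set are hypothetical and serve only to illustrate the formulas.

```python
from math import sqrt
from statistics import mean, variance
from typing import List, Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items table of ratings."""
    k = len(rows[0])                               # number of items
    items = list(zip(*rows))                       # one tuple per item
    sum_item_var = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])   # variance of total scores
    return (k / (k - 1)) * (1 - sum_item_var / total_var)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def corrected_item_total(rows: Sequence[Sequence[float]]) -> List[float]:
    """Correlate each item with the sum of the *other* items."""
    k = len(rows[0])
    out = []
    for i in range(k):
        item = [r[i] for r in rows]
        rest = [sum(r) - r[i] for r in rows]       # total excluding item i
        out.append(pearson_r(item, rest))
    return out


if __name__ == "__main__":
    # Hypothetical ratings: 4 respondents x 3 items.
    data = [[1, 2, 2], [2, 3, 3], [4, 4, 5], [6, 6, 7]]
    print(round(cronbach_alpha(data), 3))
    print([round(r, 3) for r in corrected_item_total(data)])
```

the correction in the item-total correlation (removing the item from the total before correlating) prevents each item from inflating its own correlation.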
for a cfa, the global fit χ2 would ideally be small and not significant, but it is rarely achieved, and the following indices with cut points were also taken into consideration: the root mean square error of approximation (rmsea) should be < 0.06 to < 0.08 for continuous data, while both the comparative fit index (cfi) and the tucker-lewis index (tli) should be > 0.95 (schreiber et al., 2006). bartlett’s test of sphericity and the kaiser–meyer–olkin (kmo) test were performed to assess whether the data were suitable for factor analysis. adequacy of the correlation matrix would be indicated by a significant bartlett’s test (p < 0.05) and a kmo index > 0.70. measurement invariance refers to the generalisability element of construct validity (putnick & bornstein, 2016), and is assessed when scores need to be compared across groups (e.g. gender and language). scales need to be invariant with respect to the way the latent constructs are formed (configural invariance), and the indicators or items should load similarly on latent factors across the groups (metric invariance). the requirement for invariance is that the difference in global χ2 between hierarchical models is not significant. the measurement invariance for the eds-s was evaluated for gender (men and women) and language (english first language speakers and non-english first language speakers). construct validity was explored by examining associations between the eds-s and indicators of common mental disorders (phq-9, which was also coded for the presence of major depressive disorder; gad-7, also coded for the presence of generalised anxiety disorder; pc-ptsd-5, also coded for the likelihood of ptsd; cage questionnaire, also coded for the likelihood of alcohol use disorder), as well as the other self-reported indicators of adjustment difficulties (as described earlier). pearson’s correlations were calculated for scaled markers, while t-tests for independent samples were conducted for categorical markers (i.e. 
indicators with yes/no answers). because ed has been associated with varying types of psychopathologies, divergence across psychiatric symptoms was not expected, and it was predicted that ed would show positive associations with mood, anxiety and alcohol use disorder symptoms, as well as psychiatric hospitalisations and lower global adaptive functioning (powers et al., 2015, p. 86). positive findings of associations were explored further to determine the extent of each indicator’s contribution to variance on the eds-s. a series of binomial logistic regressions was conducted for 12 indicators of common mental disorders and adjustment difficulties. receiver operating characteristic (roc) curve analyses were also conducted for these indicators. study 2 overview of the study the sample completed two instruments, in booklet form, in a cross-over design. one version of the booklet (‘condition 1’) presented the questionnaires in the format of brums first, then 10 affectively neutral biographical items and then the eds-24. a second version of the booklet (‘condition 2’) presented the questionnaires in the format of eds-24 first, then 10 affectively neutral biographical items and then the brums. participants and procedure the sample consisted of naval administrative personnel who volunteered to complete the questionnaires during their biennial occupational health screen. the study booklet containing the scales was additional to their screening and was completed anonymously. prior to the questionnaire administration, they were briefed that completion of the booklet would be considered as implied consent. consequently, no consent forms were completed, and the researchers could not know who had completed the booklet and who had not. the sample of 168 had a mean age of 31.5 years (s.d. = 5.6, range: 21–50, with 65% concentrated in the 26–35-year age band), and included 25 (14.9%) women.
all participants had at least a grade 12 education, with 88% also in possession of higher vocational training certificates. all self-identified as proficient in english. the two subgroups were well matched, with no significant differences in gender composition (χ2 < 0.001, p = 0.996) or mean age (t = 0.799, p = 0.426). the full sample completed the questionnaire booklet in a single session. a total of 200 questionnaires were prepared (100 of each version) and were handed out randomly, resulting in the unequal subgroup sizes. of the 174 booklets returned, six cases were excluded because of missing data points. measures emotional dysregulation scale: the 24-item eds (bradley et al., 2011) was administered, with full sample cronbach α = 0.91. brunel mood scale: the brums is a 24-item self-report inventory that measures transient affective mood states (terry et al., 1999, 2003a), using a five-point likert scale (0 = not at all, 4 = extremely). it has been used extensively, and a substantial body of literature exists on its use in many domains – from sports performance (lane et al., 2005) to academic achievement (thelwell et al., 2007), as well as a marker of mental health (brandt et al., 2016). good concurrent and criterion validity has been reported internationally (terry et al., 1999, 2003a) and in sa (terry et al., 2003b). a cronbach α of 0.79 was calculated for this study. a total mood distress score – where higher scores represent greater distress – can be calculated and was used in this study. data analysis the scales were administered in their standard format, and the respective total scores were calculated according to standard procedures. only total scale scores are reported in this study. scale associations were analysed using pearson’s correlation coefficients. this was done for the total sample, as well as the two conditions. priming effects were further explored with t-tests for independent samples (for both eds-24 and eds-s). 
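the comparison of two independent groups (here, the two booklet conditions) and the accompanying effect size can be sketched as follows. this is an illustration under stated assumptions, not the study’s spss output: it assumes the equal-variance (pooled) form of the t-test, which the text does not specify, and the function names and scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def pooled_sd(a: Sequence[float], b: Sequence[float]) -> float:
    """Pooled standard deviation of two independent samples."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return sqrt(pooled_var)


def student_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Independent-samples t statistic, assuming equal variances."""
    se = pooled_sd(a, b) * sqrt(1 / len(a) + 1 / len(b))
    return (mean(a) - mean(b)) / se


def cohens_d(a: Sequence[float], b: Sequence[float]) -> float:
    """Cohen's d: mean difference in pooled standard deviation units."""
    return (mean(a) - mean(b)) / pooled_sd(a, b)


if __name__ == "__main__":
    # Hypothetical total scores under two administration orders.
    condition_1 = [30, 34, 29, 41, 36]
    condition_2 = [28, 31, 27, 35, 33]
    print(round(student_t(condition_1, condition_2), 3))
    print(round(cohens_d(condition_1, condition_2), 3))
```

reporting d alongside t is useful because t grows with sample size, whereas d expresses the size of the group difference independently of it.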
cohen’s d was employed to consider effect sizes. all statistical analyses were conducted using spss-27. study 3 overview of the study a sample of san sailors preparing for a long-range maritime patrol (3-month duration) completed the eds-s 1 week prior to departure. at the end of the mission, they completed a self-assessment of their performance relating to work-output, social relations and emotional stability during the patrol, and also completed the brums. participants and procedure the sample comprised 152 naval volunteers who consented to complete the scales and questionnaires immediately prior to, and at completion of, a ship-based operational patrol of 3 months. the sample had a mean age of 31.6 (±5.6, range: 21–50 years), and comprised 21 women (13.8%) and 131 men (86.2%). of the total group, 76 (50%) worked in navy-specific fields, 49 (32.2%) in technical and engineering fields and 27 (17.8%) in support fields. measures emotional dysregulation scale-short form: the 12-item eds-s (powers et al., 2015) was administered, in english, in the week prior to departure. a mean total score of 15 (±5; range: 12–48) and cronbach α = 0.89 were calculated for this sample. brunel mood scale: the 24-item brums (terry et al., 2003a) was administered, in english. this was done at week 12, at the end of the patrol. cronbach α for this sample was 0.82. self-report assessment of performance: participants were invited to rate their performance on the triarchy using a three-item, 10-point scale (1 = ‘poor’, 10 = ‘very good’), with the instruction set referring to ‘during the past six weeks’. the items referred to: (1) ‘the quality of your work output’, (2) ‘the quality of your interpersonal interactions (e.g. how you got along with others)’ and (3) ‘the quality of your emotional state (e.g. how you were mostly feeling)’. this was done at week 12, at the same time as the brums.
data analysis pearson’s correlation coefficients were calculated, and linear regression analysis (with eds-s as regressor) was used to predict both self-reported performance across the triarchy and mood state. all statistical analyses were conducted using spss-27. study 4 participants successful san submariners were invited to complete the eds-s anonymously and briefed that completion of the scale would be considered as implied consent. submariners were considered successful (i.e. good adaptors) based on a number of criteria, including (1) having completed at least 2 years of operational experience after qualification, (2) having no organisational record of poor psychological adaptation on submarines and (3) having received positive supervisors’ reports, including a recommendation for continued use on-board submarines (personal correspondence, institute for maritime medicine, 19 august 2022). the sample of 48 participants had a mean age of 40.0 years (±6.9) and comprised 18 (37.5%) women and 30 (62.5%) men, with 18 (37.5%) reporting english as first language and 30 (62.5%) reporting other sa languages as their first language. english is the language spoken onboard the submarines. all participants were highly skilled and in possession of post-school tertiary academic training or advanced technical qualifications. measures and data analysis the eds-s was administered, in english. descriptive statistics were calculated, as were differences between the sample’s mean score and that of the general worker sample reported in study 1, using a t-test for single samples. ethical considerations this project has been approved by the health research ethics committee of stellenbosch university (reference number: n20/07/078). results study 1: descriptive scale scores the eds-24 had a mean total score of 36.6 (±16.6) and a range of 24–152. cronbach α = 0.93, and no deletion of items improved it.
to confirm the 12-item eds-s, the current study carefully replicated the original process undertaken by powers et al. (2015). an efa, using a scree-test, identified one factor (explaining 41.2% of variance) on which all items loaded. after the item-loadings were examined, the same 12 items were retained. as with the original study, a strong bivariate correlation was found between the 24-item and 12-item scales (r = 0.96, p < 0.001). the eds-s had a mean total score of 17.6 (±8.7) and a range of 12–76. no floor or ceiling effects were detected. study 1: evidence for structural validity socio-demographic effects: age showed a small but significant correlation with eds-s scores (r = −0.171, p < 0.001). however, this was not a linear distribution, and an anova (f3,1002 = 11.915, p < 0.001) indicated that higher scores were clustered in the age bracket 20–29 years. there were no significant differences in the mean total scores of women and men (t = −0.931, p = 0.352, cohen’s d = 0.063), or of english first language and non-english first language speakers (t = 0.831, p = 0.4.7, cohen’s d = 0.068). internal consistency and test–retest reliability: the eds-s cronbach α was 0.91, and no deletion of items improved it. inter-item correlations ranged from 0.323 to 0.600, while corrected item-total correlations ranged from 0.557 to 0.742. the eds-s showed good temporal stability over 35 days (t = 1.1914, p = 0.07; mean difference = 0.6; r = 0.950, p < 0.001). dimensionality: the correlation matrix was adequate for factor analysis (bartlett’s test = 5540.852; degrees of freedom [df] = 66; p < 0.001; kmo = 0.942), and the 12-item eds-s was subjected to cfa. although the one-factor model did not obtain a non-significant χ2 (χ2 = 379.118, df = 54), the value was not excessively high. further, while not an absolute fit, the rmsea (0.067; 90% ci: 0.061–0.076) was adequately small (< 0.08), and the cfi (0.941) and tli (0.928) also supported an adequate fit.
standardised loadings were relatively uniform, ranging from 0.59 to 0.79. thus, the unidimensional model appeared to have an acceptable fit to the data. measurement invariance: the eds-s for women and men showed acceptable configural invariance but did not reach metric invariance (δχ2 = 79.68, δdf = 11, p < 0.001). the eds-s for english first language speakers and non-english first language speakers also showed acceptable configural invariance but again did not reach metric invariance (δχ2 = 37.40, δdf = 11, p < 0.001). study 1: evidence for construct validity correlations between the eds-s and screeners for common mental disorders were all significant (p < 0.001). emotional dysregulation correlated significantly and positively with clinical measures of depression (phq-9, r = 0.540) and general anxiety (gad-7, r = 0.540), with large effect sizes. significant positive correlations were also observed for ptsd (pc-ptsd-5, r = 0.372) and stress overload (sos-s, r = 0.496), with moderate effect sizes. the positive correlation with the measure of problematic alcohol use was significant (cage, r = 0.264) but with small effect size. emotional dysregulation scale-short form total scores further differentiated significantly between individuals with positive indicators on all the mental health and adjustment difficulty questions, and those without (p < 0.001), as presented in table 3. large effect sizes (cohen’s d ≥ 0.8) were observed for all 12 indicators. table 3: t-test for independent samples for emotional dysregulation scale-short form and selected indicators of common mental disorders and other adjustment difficulties. the results of the binomial logistic regressions, as well as the results of the roc curve analysis, are presented in table 4. the binomial logistic regressions for all 12 indicators of common mental disorders and adjustment difficulties were statistically significant (p < 0.01).
the model for each indicator of common mental disorders explained 19% – 29% of variance. the model for each indicator of life difficulties explained 5% – 20% of variance. the logistic regression further correctly classified 87.5% – 99.1% of cases. clinically useful (> 80%) areas under the curve were reported for mental disorders, and acceptable areas under the curve were reported for other indicators of more general adjustment difficulties (66% – 77%). table 4: binomial regression predicting selected indicators of common mental disorders and other adjustment difficulties using emotional dysregulation scale-short form scores. study 2: priming effect the brums total mood distress score for the full sample was -7.75 (±6.6, range: -16 to 16), the mean total score for the eds-24 was 32.42 (±10.9; range: 24–88) and 15.5 (±5.3; range: 12–48) for the eds-s. the full sample scale totals for the brums and eds-24 correlated significantly and positively (r = 0.502, p < 0.001). stronger correlations were found for the condition 1 sample (r = 0.610, p < 0.001) than for the condition 2 sample (r = 0.438, p < 0.001). the scale total score outcomes of the t-tests for independent samples are reported in table 5. three individual items of the eds-24 represented the largest (0.4–0.6) mean differences. table 5: the outcome of the t-tests for independent samples for brunel mood scale and emotional dysregulation scale. study 3: prediction of performance in isolated, confined and extreme contexts the correlations between eds-s scores and self-report performance and mood state are presented in table 6. baseline ed correlated significantly to both self-rated performance and self-report mood state, with modest effect sizes. table 7 presents outcomes of a linear regression analysis, where the eds-s significantly predicted self-rated performance and mood state during an ice environment exposure, again with modest effect sizes. 
table 6: correlations between baseline emotional dysregulation scale-short form scores and self-rated performance at end of deployment (week 12). table 7: linear regression analysis with emotional dysregulation scale-short form. study 4: emotional dysregulation scale-short form profile of san submariners the submariners had a mean eds-s score of 12.9 (±1.2, range: 12–16). there were no significant differences between the mean scores of english first language and non-english first language speakers (t = -1.848, p = 0.07), and the sample mean was significantly lower than that of the general worker sample (t = -25.534, p < 0.001, cohen’s d = 3.69). discussion the mean eds-s total score of the current (non-clinical) workplace sample was substantially lower than that reported in earlier studies that focussed on vulnerable individuals (e.g. with a history of psychological trauma, psychiatric disorders, etc.). the degree of difference could in part be attributed to the fact that the current sample consisted of generally healthy and employed individuals who had access to employer-sponsored health and well-being services. cronbach’s α was similar to published studies. evidence of validity the first aim was to explore evidence of structural validity. the lack of significant gender effects was expected (powers et al., 2015), and age effects were consistent with previous reports that suggested that, as people get older, they learn to cope with stressors and avoid emotionally triggering situations (raimondi et al., 2022, p. 424). evidence for structural validity could be found in the acceptable unidimensional model fit, similar to the italian version (raimondi et al., 2022) in this non-clinical population. good internal reliability and temporal stability (at least over the short term) were also demonstrated. however, the eds-s only achieved configural measurement invariance, but not metric invariance, for both gender and language.
in the absence of significant differences in mean total scores between gender and language groups, this finding would require further exploration. thus, evidence of structural validity was found, although the limited measurement invariance suggests the need for some caution in practical application. the second aim was to explore evidence of construct validity. in this regard, significant associations with measures of psychopathology and general adjustment were demonstrated. emotional dysregulation scale-short form total mean scores were associated with indicators of mood, anxiety, problematic alcohol use, ptsd, adhd, and history of psychiatric hospitalisations and mental health treatment. emotional dysregulation scale-short form total mean scores were also associated with indicators of general adjustment difficulties, including problematic interactions in the workplace and at home, and could differentiate between individuals with positive indicators on all the markers of mental health and adjustment difficulties, and those without. the eds-s appeared particularly useful in predicting depression, general anxiety and stress overload. previous reports established the association of ed with varying types of psychopathologies, and thus divergence across psychiatric symptoms was not expected in this study. indeed, in support of earlier findings (christ et al., 2019; mandavia et al., 2016; mekawi et al., 2020; michopoulos et al., 2015; pencea et al., 2020; powers et al., 2015; raimondi et al., 2022), the eds-s predicted symptoms and indicators associated with mood and anxiety disorders, ptsd, history of psychiatric hospitalisation and problematic substance use, as well as non-clinical indicators of adjustment difficulties. this finding supports the understanding of ed as a trans-diagnostic process that impacts many psychological conditions and experiences, both clinical disorders and non-clinical indicators of adjustment. 
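the metric invariance results discussed under the first aim rest on chi-square difference tests (δχ2 = 79.68 and δχ2 = 37.40, each with δdf = 11). as a minimal sketch, the p-values for those reported statistics can be recomputed with the python standard library. the helper name `chi2_sf` is an assumption introduced here, and the sketch only re-derives p-values from the reported figures; it does not reproduce the model fitting itself.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable
    with a positive integer number of degrees of freedom."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)              # Q(1, h) = exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))   # Q(1/2, h) = erfc(sqrt(h))
    # recurrence for the regularised upper incomplete gamma function:
    # Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)
    while a < df / 2.0:
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return q


# reported chi-square differences for the metric invariance comparisons
p_gender = chi2_sf(79.68, 11)
p_language = chi2_sf(37.40, 11)
print(p_gender < 0.001, p_language < 0.001)  # True True
```

both values fall below 0.001, which is consistent with the conclusion in the text that constraining the loadings to be equal across groups significantly worsened model fit.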
further, low and homogenous eds-s scores were found in an ice sample expected to have homogenously low scores, namely navy submarine crews, a group of proven good adaptors in their ice context. their low scores, consistently observed throughout the sample, suggest low ed and good adaptation. thus, evidence of construct validity was demonstrated in these non-clinical samples of general workers and san specialists. consideration for practical use priming: the two conditions in study 2 were equal in age, gender and brums scores, but significantly different in eds scores, depending on the order of administration. when the brums was completed first, there were significantly higher ed scores than when the eds was completed first (mean difference 5.8, representing half a sd from the full sample mean). three eds-24 items had particularly large mean differences between conditions (> 0.4), and might be particularly susceptible to priming. in this regard, it is noteworthy that these three items have already been removed in the eds-s, and that the mean difference between eds-s total scores across conditions represents less than half a sd from the full sample mean, suggesting that the short form might be more resilient to priming. prior completion of the eds did not appear to bias responses to the brums. the eds asks questions in a ‘general’ sense, which may make it more susceptible to priming, whereas the brums asks about specific current timeframes, thus possibly offering less opportunity for priming. in summary, the scale was found to be potentially vulnerable to priming bias, which may need to be considered when it is included in battery format administration. it may be particularly susceptible to the effects of measures with very specific instruction frames when sequenced prior to eds administration. the short form appeared more resilient to priming effects, suggesting its preferential use (as opposed to the 24-item version) in battery administration.
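the 'half a sd' description of the priming effect can be checked against the study 2 descriptives reported in the results (eds-24 full sample sd of 10.9; eds-s full sample sd of 5.3). the sketch below is simple arithmetic on those reported figures; the variable and function names are assumptions introduced here for illustration.

```python
def standardised_difference(mean_difference: float, sd: float) -> float:
    """Express a raw mean difference in units of the sample standard deviation."""
    return mean_difference / sd


# figures as reported for study 2
eds24_mean_difference = 5.8   # difference in eds-24 totals between conditions
eds24_full_sample_sd = 10.9   # sd of eds-24 totals, full sample
eds_s_full_sample_sd = 5.3    # sd of eds-s totals, full sample

eds24_shift = standardised_difference(eds24_mean_difference, eds24_full_sample_sd)
print(round(eds24_shift, 2))  # 0.53

# 'less than half a sd' for the short form implies a raw difference below:
eds_s_half_sd = 0.5 * eds_s_full_sample_sd
print(eds_s_half_sd)  # 2.65
```

the eds-s mean difference itself is reported in table 5 and is not restated in the text, so only the half-sd bound is computed here.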
prediction of performance: the eds-s predicted self-rated performance in this sample deployed into an ice context. however, the effect sizes were very modest, which could limit its value for practical application in this context at this time. the smaller effect sizes may in part be because of the practice that all navy sailors undergo an annual mental health screening, and only those without debilitating mental health concerns would then be eligible for deployment. this was visible in the current sample, in the limited range of scores on the eds-s, and in that no ed was noted in participants’ responses. it could also be hypothesised that the relatively short time frame of 3 months may not be enough to elicit more severe expression of ed. sailors might be able to cope over short periods, whereas maintaining good emotional regulation may become more difficult over longer time frames. the fact that both ed and performance were self-reported was a limitation of this study, and more research may be required to confirm the practical utility of the eds-s in ice environments. practical application this study’s findings have immediate practical application to ice workplaces: across varying iterations, ice environments place greater demands on individuals’ and groups’ adaptive functioning capacities than is typically found in more conventional environments (palinkas & suedfeld, 2008; sandal, 2000; shea et al., 2009). isolation and confinement also decrease an individual’s ability to regulate emotions (liu et al., 2016), making people in ice settings vulnerable to health problems, reduced emotional well-being, decreased performance and interpersonal tension (basner et al., 2014; palinkas & suedfeld, 2008; sandal, 2000; shea et al., 2009).
when included as part of a comprehensive psychological assessment, the eds-s could become a useful tool to assess risk for poor emotional regulation, serving three purposes: (1) assessment of ed risk could guide the selecting-out of individuals with high-risk profiles; (2) knowing risk profiles could allow for increased support through closer monitoring of high-risk individuals, either by remote programme directors, or local expedition medical staff and (3) the eds-s could be used to better prepare individuals – prior to ice missions – through greater awareness of their own ed risks and the development of coping strategies to enhance appropriate emotional self-regulation. such initiatives could be employed by the san on their ships and submarines or in other military missions (e.g. current protracted peacekeeping operations across africa), as well as the south african national antarctic programme. this may also be useful for private companies in the offshore oil and gas industry for the selection, preparation and placement of staff. the association of ed with adverse childhood experiences (ace) has previously been demonstrated by the eds (bradley et al., 2011; christ et al., 2019; mandavia et al., 2016). south africa has many young adults with history of ace (manyema & richter, 2019), and possibly even more children currently experiencing aces, which may warn of the risk of a major mental health epidemic in the near future. with the current evidence of validity, the eds-s can now be used with some confidence in local studies of similar populations, and particularly with investigations into the association of ace, ed as adults, and associated poor mental health and adjustment outcomes. the eds-s can further be used in mental health service settings to guide targeted treatment for persons with depressive or anxiety symptoms (fehlinger et al., 2013; mennin, 2006). 
limitations and future directions the study used non-clinical samples of workplace populations who were generally well educated and in good health, with good self-reported english proficiency. results cannot necessarily be generalised to the wider sa population, and additional samples from diverse sectors of society would be helpful to confirm the results. further, assessment tools like the eds-s rely on respondents’ literacy with regard to the semantic descriptions of emotional distress. individuals without the english proficiency of the current samples might be challenged to express their experience of emotional regulation in english. future research would be invaluable to validate the eds-s in samples with different levels of language proficiency. future studies also need to test this instrument in clinical samples and other groups vulnerable to poor mental health outcomes. further exploration of measurement invariance, across gender and language, would provide further confidence in the eds-s. finally, future studies need to test the application of the eds-s across different ice contexts (e.g. ships at sea vs weather stations on isolated islands), across different time frames (shorter vs longer missions) and to use more objective ratings of performance across the triarchy (e.g. supervisor or peer rating of quality of work and interpersonal relations, and extended measures for emotional well-being). conclusion this study reported evidence of validity for the 12-item eds-s. it made a novel contribution in that it replicated previous investigations in a sa context: evidence of structural and construct validity was demonstrated, in non-clinical samples of sa workers, and significant associations with measures of mental health and adjustment difficulties were reported. this study further provided preliminary support for the eds-s to predict self-rated performance in ice environments. there is some support for the use of the scale in clinical research (e.g. 
exploring associations between ed and ace) and applied practice (e.g. assessment of psychological performance in ice environments). however, caution must be observed for possible effects of language proficiency, and further research into the role of language is required. acknowledgements competing interests the author declares that he has no financial or personal relationships that may have inappropriately influenced him in writing this article. author’s contributions c.h.v.w. is the sole author of this article. funding information this research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors. data availability the data from study 1 are available from the author upon reasonable request. data from studies 2 to 4 are not publicly available. disclaimer the views and opinions expressed in this article are those of the author and do not necessarily reflect the official policy or position of any affiliated agency of the author, and the publisher. references american psychological association (apa) (2022a). emotion regulation. apa dictionary. retrieved january 17, 2022, from https://dictionary.apa.org/emotion-regulation american psychological association (apa) (2022b). emotional dysregulation. apa dictionary. retrieved january 17, 2022, from https://dictionary.apa.org/dysregulation american psychological association (apa) (2022c). priming. apa dictionary. retrieved march 08, 2022, from https://dictionary.apa.org/priming amirkhan, j.h. (2018). a brief stress diagnostic tool: the short stress overload scale. assessment, 25(8), 1001–1013. https://doi.org/10.1177/1073191116673173 basner, m., dinges, d.f., mollicone, d.j., savelev, i., ecker, a.j., di antonio, a., jones, c.w., hyder, e.c., kan, k., morukov, b.v., & sutton, j.p. (2014). psychological and behavioral changes during confinement in a 520-day simulated interplanetary mission to mars. plos one, 9(3), e93298.
https://doi.org/10.1371/journal.pone.0093298 bovin, m.j., kimerling, r., weathers, f.w., prins, a., marx, b.p., post, e.p., & schnurr, p.p. (2021). diagnostic accuracy and acceptability of the primary care posttraumatic stress disorder screen for the diagnostic and statistical manual of mental disorders (fifth edition) among us veterans. jama network open, 4(2), e2036733. https://doi.org/10.1001/jamanetworkopen.2020.36733 bradley, b., defife, j.a., guarnaccia, c., phifer, j., fani, n., ressler, k.j., & westen, d. (2011). emotion dysregulation and negative affect: association with psychiatric symptoms. the journal of clinical psychiatry, 72(5), 685–691. https://doi.org/10.4088/jcp.10m06409blu brandt, r., herrero, d., massetti, t., crocetta, t.b., guarnieri, r., de mello monteiro, c.b., da silveira viana, m., bevilacqua, g.g., de abreu, l.c., & andrade, a. (2016). the brunel mood scale rating in mental health for physically active and apparently healthy populations. health, 8(2), 125–132. https://doi.org/10.4236/health.2016.82015 chanana, s., & sharma, a. (2019). the effectiveness of self-perceived body image on emotional dysregulation among adolescents and young adults. international journal of education and psychological research, 8(1), 30–34. cherry, k. (2021). priming and the psychology of memory. retrieved march 08, 2022, from https://www.verywellmind.com/priming-and-the-psychology-of-memory-4173092 christ, c., de waal, m.m., dekker, j., van kuijk, i., van schaik, d., kikkert, m.j., goudriaan, a.e., beekman, a., & messman-moore, t.l. (2019). linking childhood emotional abuse and depressive symptoms: the role of emotion dysregulation and interpersonal problems. plos one, 14(2), e0211882. https://doi.org/10.1371/journal.pone.0211882 cole, p.m., michel, m.k., & teti, l.o. (1994). the development of emotion regulation and dysregulation: a clinical perspective. monographs of the society for research in child development, 59(2–3), 73–100. 
https://doi.org/10.1111/j.1540-5834.1994.tb01278.x dhalla, s., & kopec, j.a. (2007). the cage questionnaire for alcohol misuse: a review of reliability and validity studies. clinical & investigative medicine, 30(1), 33–41. https://doi.org/10.25011/cim.v30i1.447 employment equity act. (1998). employment equity act 55 of 1998. retrieved april 18, 2022, from https://www.gov.za/documents/employment-equity-act fehlinger, t., stumpenhorst, m., stenzel, n., & rief, w. (2013). emotion regulation is the essential skill for improving depressive symptoms. journal of affective disorders, 144(1–2), 116–122. https://doi.org/10.1016/j.jad.2012.06.015 gilbody, s., richards, d., & barkham, m. (2007). diagnosing depression in primary care using self-completed instruments: uk validation of phq–9 and core–om. british journal of general practice, 57, 650–652. gratz, k.l., & roemer, l. (2004). multidimensional assessment of emotion regulation and dysregulation: development, factor structure, and initial validation of the difficulties in emotion regulation scale. journal of psychopathology and behavioral assessment, 26(1), 41–54. https://doi.org/10.1023/b:joba.0000007455.08539.94 gross, j.j., & john, o.p. (2003). individual differences in two emotion regulation processes: implications for affect, relationships, and well-being. journal of personality and social psychology, 85(2), 348–362. https://doi.org/10.1037/0022-3514.85.2.348 gross, j.j., & thompson, r.a. (2007). emotion regulation: conceptual foundations. in j.j. gross (ed.), handbook of emotion regulation (pp. 229–248). guilford press. gunderson, e.k.e. (1973). individual behavior in confined or isolated groups. in j.e. rasmussen (ed.), man in isolation and confinement (pp. 145–164). aldine. kanas, n., sandal, g.m., boyd, j.e., gushin, v.i., manzey, d., north, r., leon, g.r., suedfeld, p., bishop, s.l., fiedler, e.r. & inoue, n. (2009). psychology and culture during long-duration space missions. acta astronautica, 64(7–8), 659–677. 
https://doi.org/10.1016/j.actaastro.2008.12.005 kimhi, s. (2011). understanding good coping: a submarine crew coping with extreme environmental conditions. psychology, 2(9), 961–967. https://doi.org/10.4236/psych.2011.29145 lane, a.m., jackson, a., & terry, p.c. (2005). preferred modality influences on exercise induced mood changes. journal of sports science and medicine, 4(2), 195–200. liu, q., zhou, r.l., zhao, x., chen, x.p., & chen, s.g. (2016). acclimation during space flight: effects on human emotion. military medical research, 3(1), 15. https://doi.org/10.1186/s40779-016-0084-3 löwe, b., decker, o., müller, s., brähler, e., schellberg, d., herzog, w., & yorck-herzberg, p. (2008). validation and standardization of the generalized anxiety disorder screener (gad-7) in the general population. medical care, 46(3), 266–274. https://doi.org/10.1097/mlr.0b013e318160d093 mandavia, a., robinson, g.g., bradley, b., ressler, k.j., & powers, a. (2016). exposure to childhood abuse and later substance use: indirect effects of emotion dysregulation and exposure to trauma. journal of traumatic stress, 29(5), 422–429. https://doi.org/10.1002/jts.22131 manyema, m., & richter, l.m. (2019). adverse childhood experiences: prevalence and associated factors among south african young adults. heliyon, 5(12), e03003. https://doi.org/10.1016/j.heliyon.2019.e03003 marker, d. (2002). model theory: an introduction. springer-verlag. mekawi, y., watson-singleton, n.n., kuzyk, e., dixon, h.d., carter, s., bradley-davino, b., fani, n., michopoulos, v., & powers, a. (2020). racial discrimination and posttraumatic stress: examining emotion dysregulation as a mediator in an african american community sample. european journal of psychotraumatology, 11(1), 1824398. https://doi.org/10.1080/20008198.2020.1824398 mennin, d.s. (2006). emotion regulation therapy: an integrative approach to treatment-resistant anxiety disorders. journal of contemporary psychotherapy, 36(2), 95–105. 
https://doi.org/10.1007/s10879-006-9012-2 michopoulos, v., powers, a., moore, c., villarreal, s., ressler, k.j., & bradley, b. (2015). the mediating role of emotion dysregulation and depression on the relationship between childhood trauma exposure and emotional eating. appetite, 91, 129–136. https://doi.org/10.1016/j.appet.2015.03.036 palinkas, l.a., gunderson, e.k.e., holland, a.w., miller, c., & johnson, j.c. (2000). predictors of behavior and performance in extreme environments: the antarctic space analogue program. aviation, space and environmental medicine, 71(6), 619–625. palinkas, l.a., & suedfeld, p. (2008). psychological effects of polar expeditions. the lancet, 371(9607), 153–163. https://doi.org/10.1016/s0140-6736(07)61056-3 pencea, i., munoz, a.p., maples-keller, j.l., fiorillo, d., schultebraucks, k., galatzer-levy, i., rothbaum, b.o., ressler, k.j., stevens, j.s., michopoulos, v., & powers, a. (2020). emotion dysregulation is associated with increased prospective risk for chronic ptsd development. journal of psychiatric research, 121, 222–228. https://doi.org/10.1016/j.jpsychires.2019.12.008 powers, a., stevens, j., fani, n., & bradley, b. (2015). construct validity of a short, self report instrument assessing emotional dysregulation. psychiatry research, 225(1–2), 85–92. https://doi.org/10.1016/j.psychres.2014.10.020 putnick, d.l., & bornstein, m.h. (2016). measurement invariance conventions and reporting: the state of the art and future directions for psychological research. developmental review, 41, 71–90. https://doi.org/10.1016/j.dr.2016.06.004 raimondi, g., imperatori, c., fabbricatore, m., lester, d., balsamo, m., & innamorati, m. (2022). evaluating the factor structure of the emotion dysregulation scale-short (eds-s): a preliminary study. international journal of environmental research and public health, 19(1), 418. https://doi.org/10.3390/ijerph19010418 rohrer, j.h. (1961). interpersonal relationships in isolated small groups. 
columbia university press. sandal, g.m. (2000). coping in antarctica: is it possible to generalize results across settings? aviation, space and environmental medicine, 71(9 suppl.), a37–a43. schaap, p., & kekana, e. (2016). the structural validity of the experience of work and life circumstances questionnaire (wlq). south african journal of industrial psychology, 42(1), a1349. https://doi.org/10.4102/sajip.v42i1.1349 schreiber, j.b., nora, a., stage, f.k., barlow, e.a., & king, j. (2006). reporting structural equation modeling and confirmatory factor analysis results: a review. the journal of educational research, 99(6), 323–338. https://doi.org/10.3200/joer.99.6.323-338 shea, c., slack, k.j., keeton, k.e., palinkas, l.a., & leveton, l.b. (2009). antarctica meta-analysis: psychosocial factors related to long-duration isolation and confinement. final report submitted to the nasa behavioral health and performance element. nasa. retrieved september 02, 2022, from https://ntrs.nasa.gov/api/citations/20090007551/downloads/20090007551.pdf suedfeld, p., & steel, g.d. (2000). the environmental psychology of capsule habitats. annual review of psychology, 51(1), 227–253. https://doi.org/10.1146/annurev.psych.51.1.227 terry, p.c., lane, a.m., & fogarty, g.j. (2003a). construct validity of the poms-a for use with adults. psychology of sport and exercise, 4(2), 125–139. https://doi.org/10.1016/s1469-0292(01)00035-8 terry, p.c., lane, a.m., lane, h.j., & keohane, l. (1999). development and validation of a mood measure for adolescents. journal of sports sciences, 17(11), 861–872. https://doi.org/10.1080/026404199365425 terry, p.c., potgieter, j.r., & fogarty, g.j. (2003b). the stellenbosch mood scale: a dual-language measure of mood. international journal of sport and exercise psychology, 1(3), 231–245. https://doi.org/10.1080/1612197x.2003.9671716 thelwell, r.c., lane, a.m., & weston, n.j.v. (2007). 
mood states, self-set goals, self-efficacy and performance in academic examinations. personality and individual differences, 42(3), 573–583. https://doi.org/10.1016/j.paid.2006.07.024 van wijk, c.h. (2017). coping in context: dispositional and situational coping of navy divers and submariners. journal of human performance in extreme environments, 13(1), 7. https://doi.org/10.7771/2327-2937.1091 van wijk, c.h. (2021). usefulness of the english version stress overload scale in a sample of employed south africans. african journal of psychological assessment, 3, a41. https://doi.org/10.4102/ajopa.v3i0.41 van wijk, c. (2022). psychological profiles of resilience in extreme environments: correlating measures of personality and coping and resilience. scientia militaria – south african journal of military studies, 50(1), 1–18. https://doi.org/10.5787/50-1-1256 van wijk, c., & firfirey, n. (2020). a brief screen for attention deficit/hyperactivity disorder in the south african workplace. south african journal of psychiatry, 26(1), 1500. https://doi.org/10.4102/sajpsychiatry.v26i0.1500 van wijk, c.h., & martin, j.h. (2021). promoting psychological adaptation among navy sailors. scientia militaria – south african journal of military studies, 49(1), 23–34. https://doi.org/10.5787/49-1-1260 van wijk, c.h., martin, j.h., & maree, d.j.f. (2021). clinical validation of brief mental health scales for use in south african occupational healthcare. south african journal of industrial psychology, 47(1), a1895. https://doi.org/10.4102/sajip.v47i0.1895 weybrew, b.b., & noddin, e.m. (1979). the mental health of nuclear submariners in the united states navy. military medicine, 144(3), 188–191. zimmerman, m., gorlin, e., dalrymple, k., & chelminiski, i. (2017). a clinically useful screen for attention-deficit/hyperactivity disorder in adult psychiatric outpatients. annals of clinical psychology, 29(3), 160–166. 
about the author(s) amanda cromhout africa unit for transdisciplinary health research (auther), faculty of health sciences, north-west university, potchefstroom, south africa lusilda schutte africa unit for transdisciplinary health research (auther), faculty of health sciences, north-west university, potchefstroom, south africa marié p. wissing africa unit for transdisciplinary health research (auther), faculty of health sciences, north-west university, potchefstroom, south africa angelina wilson fadiji africa unit for transdisciplinary health research (auther), faculty of health sciences, north-west university, potchefstroom, south africa department of educational psychology, faculty of education, university of pretoria, pretoria, south africa tharina guse department of psychology, faculty of humanities, university of pretoria, pretoria, south africa sonia mbowa centre for social development in africa, faculty of humanities, university of johannesburg, johannesburg, south africa citation cromhout, a., schutte, l., wissing, m.p., wilson fadiji, a., guse, t., & mbowa, s. (2023). psychometric properties of the harmony in life scale in south african and ghanaian samples. african journal of psychological assessment, 5(0), a122. https://doi.org/10.4102/ajopa.v5i0.122 original research psychometric properties of the harmony in life scale in south african and ghanaian samples amanda cromhout, lusilda schutte, marié p. wissing, angelina wilson fadiji, tharina guse, sonia mbowa received: 06 oct. 2022; accepted: 13 jan. 2023; published: 28 feb. 2023 copyright: © 2023. the author(s). licensee: aosis. this is an open access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
abstract harmony is regarded as important for well-being in many cultures. however, (cultural) differences in the meanings and manifestations of harmony may impact the equivalence of measures of harmony in life, as well as the associations between harmony and other well-being constructs. this study aimed to investigate the factorial, convergent and divergent validity, and measurement invariance of the harmony in life scale (hils) in south african and ghanaian samples. confirmatory factor analysis was applied to data from three south african samples (two multicultural samples completed the hils in english; and a setswana-speaking sample completed the hils in setswana) and one ghanaian sample (completed the hils in english). sample sizes ranged between n = 400 and n = 427. good fit indices were obtained for all samples, except for the setswana-speaking sample from south africa. in all instances the hils showed good internal consistency reliability and convergent and divergent validity. full scalar invariance was supported for the two multicultural south african samples, but only partial scalar invariance when data from the ghanaian sample were added to the analysis. the hils shows potential for future use in all samples, except the setswana-speaking sample. findings emphasise the importance of considering cultural and/or contextual and linguistic differences and how these may influence the measurement of psychological constructs. future research should qualitatively explore the meanings and manifestations of harmony in various african and other global contexts in local languages. contribution: this study is the first to investigate the psychometric properties of the original english version of the hils in south african and ghanaian samples, as well as a setswana translation of the scale. 
the study contributes to the understanding of harmony in life and the measurement thereof in diverse contexts, in this case specifically focused on african samples, and may, in turn, inform interventions and evaluation of interventions. keywords: harmony in life; south africa; ghana; validity; reliability; measurement invariance. introduction harmony in life is associated with well-being and quality of life and seems to be valued across cultures. in a multi-country study by delle-fave et al. (2011), participants were asked what happiness is to them and, in the domain of psychological definitions of happiness (other domains included family, work, health, etc.), 25.4% of the responses included reference to harmony or balance, which was higher than for any other subcategory in the psychological definitions domain. delle-fave et al. (2016) found that almost 30% of the definitions of happiness in the psychological definitions domain referred to inner harmony, while 29.11% of the responses referred to balance. other research supports the notion that harmony or balance is associated with well-being (e.g. delle-fave et al., 2022; di fabio & tsuda, 2018; lomas, 2021; lomas et al., 2022; schutte et al., 2022; sirgy, 2019). although most cultures seem to value harmony as an important aspect of life and well-being, the construct may have different nuances and manifestations in different cultural groups. factors such as differences in language and culture can impact the equivalence of measures of harmony. furthermore, depending on cultural meanings of harmony, the construct may have varying associations with other well-being constructs across cultures. therefore, in order to promote well-being cross-culturally, it is important to understand how the meanings, manifestations and measurement of harmony overlap and differ across groups, especially in under-researched african contexts. 
conceptualisation of harmony harmony has been described as ‘a global and overall assessment of whether one’s life involves balance, mindful non-judgmental acceptance, fitting in and being attuned with one’s life’ (garcia et al., 2014, p. 5). according to the merriam-webster dictionary, harmony refers to ‘a pleasing arrangement of parts’ or congruence, as well as agreement and accord, and internal calm or tranquillity (https://www.merriam-webster.com/dictionary/harmony). in eastern philosophy, harmony suggests a favourable relation among different things that exist (li, 2006), while harmony is understood in terms of social relationships in african contexts (see metz, 2017 for a discussion). describing harmony from a psychological well-being perspective, kjell et al. (2016) proposed that harmony ‘encourages a holistic world view that incorporates a balanced and flexible approach to personal well-being that takes into account social and environmental contexts’ (p. 894). applying latent semantic analysis, kjell et al. (2016) found that participants associated harmony with balance, accord, agreement, concord and tranquillity, and that these linked to concepts that tap into selflessness, interconnectedness and interdependence. considering these links with selflessness, interconnectedness and interdependence, one may expect that harmony will be especially important in cultures with an interdependent self-construal (e.g. african, asian, latin-american and southern european cultures), where individuals are viewed as more connected to, rather than differentiated from, others, compared to cultures with an independent self-construal (e.g. north american and many western european cultures), where individuals view themselves as autonomous and independent from others (kitayama et al., 2020; markus & kitayama, 1991). in a study investigating cross-cultural variations and similarities of happiness and subjective well-being, uchida et al.
(2004) argued that, although happiness is likely to be universal, the experience thereof is embedded in socio-cultural contexts. happiness is therefore encompassed in rich, associative networks that vary cross-culturally. this same argument can apply to the conceptualisation and experience of harmony and other facets of well-being across cultures. considering that culture consists of distinct sets of values, attitudes and behaviours that form value schemas or value orientations (connor & becker, 2003, 2006; rokeach, 1973, 1979), it can be expected that these differences can influence how constructs are interpreted (thus influencing the meaning attached to the construct) and manifest across cultures which, in turn, influence the measurement of the construct. it is therefore important that the validity of measuring instruments assessing harmony in life is investigated for different cultural groups. measuring harmony in life: the harmony in life scale the harmony in life scale (hils, kjell et al., 2016) is a 5-item measure of overall harmony in life. kjell et al. (2016) proposed that harmony in life is complementary to satisfaction with life in explaining the cognitive component of subjective well-being as measured by the satisfaction with life scale (swls, diener, 1984; diener et al., 1985; kjell et al., 2016). they argued that the cognitive aspects of psychological functioning relevant when evaluating harmony in life, stand in contrast with the evaluations relevant to satisfaction with life (see kjell et al., 2016, for a discussion of this aspect). more specifically, evaluations in the hils involve psychological balance and flexibility in life, whereas evaluations in the swls involve comparing actual life circumstances to expected life circumstances (kjell et al., 2016). kjell et al. (2016) reported a cronbach’s alpha value of 0.90 for the hils, sufficient test-retest reliability (r = 0.77), and support for convergent and discriminant validity. 
in a validation study of the turkish translation of the hils, satici and tekin (2017) found support for a unidimensional structure, as well as for convergent and discriminant validity of the scale. cronbach’s alpha values ranged from 0.77 to 0.79, composite reliability scores from 0.78 to 0.80, and test-retest reliability was supported. singh et al. (2016) also found support for a unidimensional structure of the hindi translation of the hils and reported a cronbach’s alpha value of 0.88. harmony and well-being if the meaning of harmony may differ cross-culturally, the underlying motives for harmony, as well as the predictors of harmony, may also vary across cultures (cf. uchida et al., 2004). there may therefore be variation in how harmony associates with other indicators of well-being depending on the cultural context. studies that investigated associations between harmony in life and other well-being indicators include the original validation study by kjell et al. (2016) where the hils showed positive correlations with satisfaction with life (r = 0.74), subjective happiness (r = 0.71), facets of psychological well-being such as environmental mastery (r = 0.64), personal growth (r = 0.25), positive relations (r = 0.43), and self-acceptance (r = 0.65) but did not correlate with autonomy (r = 0.03), and purpose in life (r = 0.08). the scale was correlated negatively with measures of depression (r = -0.39), anxiety (r = -0.13) and stress (r = -0.26; kjell et al., 2016) in a mainly american sample (n = 476, united states [us] = 406, india = 37, and other countries = 33). in a validation study of the turkish translation of the hils, satici and tekin (2017) found that the hils correlated positively with life satisfaction (r = 0.44), positive affect (r = 0.35), subjective happiness (r = 0.43), and subjective well-being (r = 0.51), and negatively with negative affect (r = -0.31). 
harmony in life was positively predicted by flourishing (β = 0.55) and negatively predicted by depression (β = -0.50), anxiety (β = -0.40), and stress (β = -0.37) in a sample of turkish university students (n = 253). the present study despite the importance of harmony for people’s well-being (cf. delle-fave et al., 2011), research on harmony is sparse, including in african contexts. considering that the meanings, manifestations and the measurement of harmony in life seem to be informed by cultural values and judgements (e.g. satici & tekin, 2017), this study aims to investigate the factorial, convergent and divergent validity and measurement invariance of the hils (kjell et al., 2016) in south african and ghanaian samples. the associations between the hils and selected measures of well-being and ill-being were also examined. some of the scales used in the current study measure similar constructs to the scales used in the validation studies by kjell et al. (2016) and satici and tekin (2017), which enabled us to see how findings in african contexts compare to the findings of other studies where samples from other cultural groups were used. south africa and ghana were selected because research teams who conduct research in these two countries have established collaboration relationships and administered the hils in their studies. while this selection is not representative of the wider african population, the countries present geographical diversity, with south africa being in southern africa and ghana in west africa. validation studies on the hils are still limited, and include the studies of kjell et al. (2016, english version), kjell and diener (2021, english version), satici and tekin (2017, turkish translation), and singh et al. (2016, hindi translation).
as far as we could establish when searching in the literature, this study is the first to investigate the psychometric properties of the original english version of the hils in south african and ghanaian samples, as well as a setswana translation of the scale. the study contributes to the understanding of harmony in life and the measurement thereof in diverse contexts, in this case specifically focused on african samples, and may, in turn, inform interventions and evaluation of interventions. methods participants data from four nonprobability samples gathered in different studies were used: samples 1 and 2 were multicultural adult south african samples, sample 3 consisted of setswana-speaking adults from south africa, and sample 4 was an african adult sample from ghana. see table 1 for a description of the respective samples. samples 1, 2 and 4 completed the research battery in english, and sample 3 in setswana. for all samples, participants had to be 18 years or older to participate in the study. in addition, for samples 1 and 4 participants also had to have at least a grade 12 level of education and be proficient in english because participants had to complete the research battery in english. a grade 12 level of education was assumed to indicate sufficient english proficiency to complete the research battery. for sample 2, participants had to be able to read and understand english and had to have access to the online survey through a computer or mobile device. for sample 1, the data formed part of the fort 3 research project (fort = fortology project [forté = strength]), with the fort 3 subproject applicable to this study named ‘the prevalence of levels of psychosocial health: dynamics and relationships with biomarkers of (ill)health in south african social contexts’ (wissing, 2008/2012); and for sample 2, the data formed part of the international hope barometer programme (krafft et al., 2018).
for sample 3, data formed part of the mental health leg of the 2017–2019 round of data gathering in the longitudinal, multidisciplinary prospective urban and rural epidemiology – south africa (pure-sa) study (teo et al., 2009), north west province, which involved an overlap between pure-sa and the fort 3 research project. for sample 4, data formed part of the ghana leg of the eudaimonic and hedonic happiness investigation (ehhi; delle-fave et al., 2011, 2016; wilson, 2017), although additional items outside the ehhi were added. table 1: description of samples 1 to 4. measures different scales were included in the research battery used for each of the samples. the following selected scales are relevant for this study. socio-demographic questionnaire socio-demographic data on variables such as gender, age, home language, population group, level of education and standard of living were collected across the samples. harmony in life scale the harmony in life scale (hils, kjell et al., 2016) comprises a single scale (no subscales distinguished) with five items, and measures individuals’ subjective perception of the overall harmony in their life on a 7-point likert-type scale that ranges from 1 (strongly disagree) to 7 (strongly agree). detail on previous findings pertaining to the scale’s psychometric properties was presented in the ‘introduction’ section. the satisfaction with life scale the satisfaction with life scale (swls, diener et al., 1985) measures the global judgement of satisfaction with one’s life as a whole through five items on a 7-point likert-type scale ranging from 1 (strongly disagree) to 7 (strongly agree). diener et al. (1985) reported sufficient internal consistency reliability (α = 0.87) and a test-retest reliability score of 0.82. 
wissing and van eeden (2002) found support for the unidimensional structure of the swls in south african samples, and reported cronbach’s alpha values of 0.70, 0.83 and 0.85 for young adults (ages 18–35), middle adults (ages 36–64) and older adults (ages 65 and older), respectively. appiah et al. (2020) reported a unidimensional structure with the residuals of items 4 and 5 correlated, and sufficient internal consistency reliability with ɷ = 0.87 for the twi-translation of the scale in a rural adult ghanaian sample. the mental health continuum – short form the 14-item mental health continuum – short form (mhc-sf; keyes et al., 2008) comprises three subscales, namely emotional well-being (mhc_ewb), social well-being (mhc_swb) and psychological well-being (mhc_pwb). the scale measures positive mental health on a 6-point likert-type scale that ranges from 0 (never) to 5 (every day). lamers et al. (2011) reported a three-factor solution with cronbach’s alpha values of 0.89 (mhc-sf total), 0.83 (mhc_ewb), 0.74 (mhc_swb) and 0.83 (mhc_pwb) in a dutch sample between the ages of 18 and 87 years. in south africa, keyes et al. (2008) found marginal support for a three-factor solution and reported cronbach’s alpha values of 0.74 (total mhc-sf), 0.73 (mhc_ewb), 0.59 (mhc_swb) and 0.67 (mhc_pwb) in a setswana-speaking sample. in another south african study, schutte and wissing (2017) found support for a three-factor bifactor exploratory structural equation modelling model (with item 5 removed), and reported sufficient model-based omega coefficients of composite reliability for the global positive mental health factor in english (ω = 0.88), afrikaans (ω = 0.90) and setswana (ω = 0.86) student samples. appiah et al. (2022) found that a bifactor exploratory structural equation modelling model best fitted the data in a rural adult ghanaian sample who completed the twi version of the mhc-sf. 
omega reliability coefficients were high (ω = 0.97) for the general positive mental health factor, marginal for mhc_ewb (ω = 0.51) and mhc_swb (ω = 0.57), and low for mhc_pwb (ω = 0.41). the meaning in life questionnaire the meaning in life questionnaire (mlq; steger et al., 2006) comprises 10 items that measure meaning in life by means of two 5-item subscales, namely presence of meaning (mlq_p; measuring the subjective experience of the meaningfulness of one’s life) and search for meaning (mlq_s; measuring the motivation to find meaning or to better understand the meaning of one’s life) on a likert-type scale ranging from 1 (absolutely untrue) to 7 (absolutely true). steger et al. (2006) reported a 2-factor structure with sufficient cronbach’s alpha reliability scores for the mlq_p (between 0.82 and 0.86) and the mlq_s (between 0.86 and 0.87) in american student samples. in south africa, temane et al. (2014) found support for a two-factor structure when the english version was completed by a multicultural student group. they also reported sufficient reliability with cronbach’s alpha values of 0.85 (mlq_p) and 0.84 (mlq_s). the affectometer-2 the affectometer-2 (afm-2; kammann & flett, 1983) is the abbreviated version of the afm 1 and comprises 20 items that measure general happiness or sense of well-being on a scale with five response options (1 = not at all; 2 = occasionally; 3 = some of the time; 4 = often; 5 = all of the time). the scale has two subscales, namely positive affect (afm2_pa) and negative affect (afm2_na). kammann and flett (1983) reported cronbach’s alpha reliability scores of 0.88 (afm2_pa) and 0.93 (afm2_na). in south africa, wissing et al. (2008) reported cronbach’s alpha scores of 0.64 (afm2_pa) and 0.79 (afm2_na) for the english version of the scale in a setswana-speaking sample. appiah et al. 
(2020) found support for a two-factor bifactor exploratory structural equation modelling model and reported reliability scores of ɷ = 0.88 (afm2_total), ɷ = 0.43 (afm2_pa) and ɷ = 0.72 (afm2_na) for the twi-translation of the scale in a rural adult ghanaian sample. the positive affect and negative affect schedule the positive affect and negative affect schedule (panas; watson et al., 1988) comprises 20 items, and measures positive affect (panas_pa, 10 items) and negative affect (panas_na, 10 items), respectively. respondents must indicate the extent to which they experienced different positive and negative emotions over a certain period of time on a scale with five response options (1 = very slightly or not at all, 2 = a little, 3 = moderately, 4 = quite a bit, 5 = extremely). watson et al. (1988) reported a 2-factor solution with cronbach’s alpha values above 0.80. only the panas_na is relevant to this study. the scale of positive and negative experiences the scale of positive and negative experiences (spane; diener et al., 2010) comprises 12 items and measures general positive experiences (spane_pe, 6 items) and negative experiences (spane_ne, 6 items), based on how often the respondent experienced the feelings during a 4-week period. the scale is in likert-format, and ranges from 1 (very rarely or never) to 5 (very often or always). separate scores are calculated for positive and negative experiences, respectively, and/or a balance score by subtracting the score for negative experiences from the score for positive experiences. diener et al. (2010) reported cronbach’s alpha values of 0.87 (spane_pe), 0.81 (spane_ne) and 0.89 (spane_balanced) and good convergent validity with measures of emotion, life satisfaction, well-being and happiness.
in south africa, du plessis and guse (2017) reported cronbach’s alpha values of 0.84 (spane_pe), 0.79 (spane_ne) and 0.85 (spane_balanced) in a multicultural student sample, with the spane_pe correlating positively with well-being and life satisfaction, and the spane_ne correlating negatively with well-being and life satisfaction. the subjective happiness scale the subjective happiness scale (shs; lyubomirsky & lepper, 1997) comprises four items that measure global subjective happiness. respondents indicate on a 7-point scale the extent to which each of the four items describes them. lyubomirsky and lepper (1997) reported cronbach’s alpha values between 0.79 and 0.94 across 14 american and russian samples consisting of high school and college students, as well as adult and retired community samples. support for convergent and discriminant validity was also reported. in africa, agbo (2021) found that a one-factor model yielded adequate fit and sufficient reliability scores for nigerian samples consisting of students and working populations. the scale had positive correlations with measures of well-being (e.g. satisfaction with life) and negative correlations with measures of ill-being (e.g. depression). the patient health questionnaire-9 the patient health questionnaire-9 (phq-9; kroenke et al., 2001) comprises nine items, and is a diagnostic tool used to assess the symptoms of depressive disorders on a scale with four response options, namely 0 (not at all), 1 (several days), 2 (more than half of the days) and 3 (nearly every day). kroenke et al. (2001) reported support for criterion, construct and external validity with sufficient reliability scores in a patient sample in primary care (α = 0.86) and an obstetrics-gynaecology patient sample (α = 0.86), as well as test-retest reliability. 
botha (2011) reported a one-factor confirmatory factor analysis (cfa) model with sufficient criterion-related validity and reliability (cronbach’s α = 0.86) for the english version of the phq-9 in a multicultural south african sample. appiah et al. (2020) found support for a 2-factor exploratory structural equation modelling model, with an ɷ-value of 0.76 for the twi-translation of the scale in a rural adult ghanaian sample. the patient health questionnaire-4 the patient health questionnaire-4 (phq-4; kroenke et al., 2009) is a very brief screening tool for symptoms of anxiety and depression (measured by two items each). participants must indicate to what extent they have been bothered by these problems over the past 2 weeks. response options vary from 0 (not at all) to 3 (nearly every day). kroenke et al. (2009) reported the phq-4 to be a valid tool to screen for anxiety and depression. the phq-4 has further been validated in a german sample (löwe et al., 2010) and used in several non-western samples (e.g. lenz & li, 2022; materu et al., 2020). ethical considerations data were collected between 2014 and 2015 for sample 1, in 2018 for sample 2, between 2017 and 2019 for sample 3, and in 2017 for sample 4. all participants gave written informed consent and participated voluntarily in the study. samples 1, 2 and 4 did not receive incentives for participation in the study, while sample 3 received a small token of appreciation. all data were handled confidentially. a research committee approach, using standard forward and back translation procedures, was employed to translate the original english version of the hils into setswana. setswana is one of the 11 official languages that are spoken in south africa, and is the main language spoken in the areas of south africa where the data for sample 3 were collected. 
firstly, the scale was translated to setswana by a bilingual speaker; secondly, the scale was back-translated into english by an independent translator; and then a research committee (consisting of academics with setswana as first language and who were fluent in english, a professional translator, subject experts and members from the target communities) compared the back-translated english version with the original english version (brislin, 1980; van de vijver & hambleton, 1996; van de vijver & leung, 1997). for samples 1 and 3, ethics approval was obtained from the health research ethics committee of the north-west university, south africa, with ethics approval numbers: nwu 00002-07-a2 (sample 1) and nwu-00016-10-a1 (sample 3). in addition, ethics approval was obtained from the department of health of the north west province for sample 3. for sample 2, ethics approval was obtained from the faculty of humanities research ethics committee of the university of johannesburg, ethics approval number: rec01-092-2017. for sample 4, ethics approval was obtained from the university of ghana ethics committee for human research, ethics approval number: ech 086 16–17. data analysis the data were analysed in five stages. stage 1: descriptive statistics of individual scale items we used ibm® statistical package for the social sciences (spss) 27 to calculate the mean, standard deviation, skewness and kurtosis for each item of the hils across the four samples. stage 2: factorial validity the factor structure of the hils was determined by applying confirmatory factor analysis (cfa) in mplus 8.3 (muthén & muthén, 1998–2019). full information maximum likelihood estimation was used to handle missing data and the robust maximum likelihood estimator (mlr) was applied.
the χ2-statistic, comparative fit index (cfi, bentler, 1990), tucker-lewis index (tli, tucker & lewis, 1973), root mean square error of approximation (rmsea, steiger & lind, 1980) and the standardised root mean square residual (srmr) were used to evaluate model fit. models are deemed to display a good fit when the χ2-statistic has nonsignificant p-values (byrne, 2012); the cfi and tli values are larger than 0.95 (byrne, 2012; hu & bentler, 1999); and rmsea and srmr values are less than 0.05 (values less than 0.08 indicate reasonable model fit; byrne, 2012). the value of the χ2-statistic is sensitive to sample size (byrne, 2012); therefore, the cfi, tli, rmsea and srmr were primarily used to determine model fit. stage 3: internal consistency reliability the formula used by sánchez-oliva et al. (2017) was applied to calculate model-based omega coefficients of composite reliability. stage 4: convergent and divergent validity pearson’s correlation coefficients were calculated in ibm® spss statistics 27 to determine convergent and divergent validity of the hils. we also calculated the attenuation-corrected correlation coefficients by dividing the pearson’s correlation coefficient by the square root of the product of the two (sub-)scales’ omega hierarchical coefficients of reliability to compensate for the lack of reliability of the scales (borneman, 2010). although the pearson’s r-values are also reported, our interpretation of the correlations between the hils and the criterion scales was based upon the attenuation-corrected r-values. this is because the relationships between constructs are attenuated by random measurement error (borneman, 2010). when this attenuation is corrected, the relationships between the scales are estimated as if they were free from random error, thus estimating the true relationship between the hils and the criterion scales (see borneman, 2010).
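the calculations in stages 3 and 4 can be illustrated with a minimal python sketch. this is not the code used in the study: the function names are hypothetical, the loadings and correlation are made-up values rather than results from any of the samples, and the omega formula shown (squared sum of standardised loadings divided by that quantity plus the summed residual variances) is assumed to be the common single-factor form of the model-based composite reliability coefficient.

```python
import math


def composite_omega(loadings, residual_variances):
    """Model-based omega for a single-factor model (assumed form):
    (sum of |loadings|)^2 / ((sum of |loadings|)^2 + sum of residual variances)."""
    if len(loadings) != len(residual_variances):
        raise ValueError("one residual variance is needed per loading")
    common = sum(abs(l) for l in loadings) ** 2
    return common / (common + sum(residual_variances))


def attenuation_corrected_r(r, reliability_x, reliability_y):
    """Divide the observed Pearson r by the square root of the product
    of the two scales' reliability coefficients."""
    if not (0 < reliability_x <= 1 and 0 < reliability_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r / math.sqrt(reliability_x * reliability_y)


# Hypothetical standardised loadings for a 5-item scale; with standardised
# items each residual variance equals 1 - loading^2.
example_loadings = [0.8, 0.8, 0.8, 0.6, 0.6]
example_residuals = [1 - l ** 2 for l in example_loadings]

omega = composite_omega(example_loadings, example_residuals)
corrected = attenuation_corrected_r(0.5, 0.8, 0.8)

print(f"omega = {omega:.3f}")
print(f"attenuation-corrected r = {corrected:.3f}")
```

because the divisor is at most 1, a corrected coefficient is never smaller in absolute value than the observed r, and the correction grows as the reliability of either scale decreases.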
stage 5: measurement invariance invariance of the hils across the different samples was investigated in a hierarchical series of steps using mplus 8.3 (muthén & muthén, 1998–2019). for configural invariance, the number of factors, as well as the structure of fixed and freely estimated parameters were assumed to be similar across the groups, but no equality constraints were applied (byrne, 2012). for metric invariance, the factor loadings were constrained to be equal for all the groups, and for scalar invariance equality constraints were applied to factor loadings and intercepts. if configural invariance had not been supported, the measures were deemed noninvariant. however, if configural invariance had been supported but either metric or scalar invariance not, nonequivalent factor loadings or intercepts, respectively, were freed one at a time to look for partial metric or partial scalar invariance (putnick & bornstein, 2017). high modification indices (mis) and expected parameter changes (epcs) were used to determine which parameters had to be freely estimated to yield partial metric or partial scalar invariance (byrne, 2012). differences smaller than 0.01 in the cfi values and smaller than 0.015 in the rmsea values of the nested models indicate measurement invariance. we report the likelihood ratio test but did not use it for decision-making, because of its sensitivity to sample size (chen, 2007; cheung & rensvold, 2002). results stage 1: descriptive statistics of the individual scale items of the harmony in life scale for samples 1–4 the means, standard deviations, skewness and kurtosis values for the individual scale items of the hils for all samples are presented in table 1-a1.
mean values were between 4.73 (standard deviation [sd] = 1.629; item 3) and 5.36 (sd = 1.404, item 4) for sample 1; between 4.98 (sd = 1.530, item 2) and 5.44 (sd = 1.310, item 5) for sample 2; between 5.43 (sd = 1.449, item 2) and 6.03 (sd = 1.371, item 3) for sample 3; and between 5.03 (sd = 1.449, item 4) and 5.51 (sd = 1.335, item 1) for sample 4. skewness values were between -1.310 (item 4) and -0.690 (item 3) for sample 1; between -1.239 (item 4) and -0.775 (item 2) for sample 2; between -1.833 (item 3) and -1.106 (item 2) for sample 3; and between -0.966 (item 1) and -0.652 (item 2) for sample 4. kurtosis values were between -0.362 (item 3) and 1.553 (item 4) for sample 1; between -0.186 (item 2) and 1.432 (item 4) for sample 2; between 0.804 (item 2) and 3.243 (item 3) for sample 3; and between 0.256 (item 3) and 1.035 (item 1) for sample 4. apart from some values in sample 3, there were no significant deviations from normality as indicated by skewness and kurtosis values that were smaller than two in absolute value (bandalos & finney, 2010). stage 2: factorial validity of the harmony in life scale confirmatory factor analysis was used to determine the factor structure of the hils. the fit indices for the hils in the various samples are presented in table 2. except for sample 3, all cfi values exceeded 0.95, tli values were close to 0.95, rmsea values were smaller than 0.08 in most instances, while the srmr values were smaller than 0.05. the hils therefore showed acceptable fit for all samples, except for sample 3. table 2: global fit indices for the harmony in life scale in the various samples. apart from the global fit indices, item-level parameters, namely the standardised factor loadings, the items’ residual variances and the items’ r2-values, were also considered (see table 3). the standardised factor loadings ranged between 0.545 and 0.939 in the various samples, thereby supporting the factorial validity of the hils.
although global model fit was insufficient for sample 3, the factor loadings ranged between 0.701 and 0.819. for the two multicultural south african samples (samples 1 and 2), the factor loadings and r2-values of items 1, 2 and 3 were notably higher than the factor loadings and r2-values of items 4 and 5. this aligns with the findings by kjell et al. (2016) where the first three items of the hils showed the highest item-total correlations. the first three items also showed the highest factor loadings on the harmony factor when a two-factor model was applied to combined hils and swls data (kjell et al., 2016; kjell & diener, 2021). when abbreviating the hils, kjell and diener (2021) proposed that the first three items of the hils (containing the words ‘harmony’ and ‘balance’) are most suitable for an abbreviated scale as these items refer to the most central aspects of harmony in life, compared to the last two items that refer to ‘accept’ (item 4) and ‘fitting in’ (item 5). the same pattern was not observed for samples 3 and 4. factor loadings were relatively close to each other for samples 3 and 4, except for item 4 that showed a substantially smaller factor loading for sample 4 compared to the other items. the residual variances for samples 1 and 2 were also smaller than for samples 3 and 4. because sample 3 did not produce a good fitting baseline model, convergent and divergent validity as well as measurement invariance were only investigated for samples 1, 2 and 4. table 3: standardised factor loadings, residual variances, r2-values and omega coefficients of the harmony in life scale for the various samples. stage 3: internal consistency reliability the hils showed sufficient reliability scores for all samples. the model-based omega coefficient of composite reliability ranged between 0.84 and 0.90 (see table 3). 
stage 4: convergent and divergent validity different scales were included in the research battery used for each of the samples, hence the criterion scales differed across samples. all criterion scales were screened for factorial validity (table 4) and reliability (see table 5). only (sub)scales that showed sufficient model fit and reliability scores were included in the analysis. note that item 9 of the mlq_p was removed for the ghanaian sample (sample 4) as this negatively worded item posed problems with model fit. table 4: global fit indices for the criterion-scales used in the various samples. table 5: correlations between the harmony in life scale and other measures of well-being and ill-being for samples 1, 2 and 4. the model-based omega coefficients of composite reliability of each criterion scale, pearson’s r-values, and the attenuation-corrected r-values are presented in table 5. the hils showed strong positive correlations with measures of well-being (e.g. positive mental health, meaning, positive affect/experiences and happiness), medium to strong negative correlations with measures of negative affect and ill-being (e.g. negative affect/experiences and depression), and a weak negative correlation with search for meaning. these findings point towards the convergent and divergent validity of the hils. stage 5: measurement invariance we wanted to test whether the english version of the hils was invariant for samples 1, 2 and 4. however, because we expected that cultural differences could influence the results, we first tested for invariance between the multicultural south african samples (samples 1 and 2), whereafter we also included the ghanaian sample (sample 4) in the analysis. the results for the measurement invariance of the hils are presented in table 6. table 6: measurement invariance of the harmony in life scale for samples 1 and 2, and for samples 1, 2 and 4. 
full scalar invariance was supported for the two multicultural south african samples (samples 1 and 2) as indicated by |∆cfi| values smaller than 0.01 and |∆rmsea| values smaller than 0.015. only partial scalar invariance was supported when the ghanaian sample (sample 4) was also included in the analysis. specifically, the configural model (invariance model 1) yielded adequate fit for the data. a |∆cfi| value larger than 0.01 indicated insufficient metric invariance (invariance model 2a). the factor loading of item 5 had a high modification index (mi = 14.545) for the ghanaian group and the factor loading was allowed free estimation for all groups. this model (invariance model 2b) yielded a |∆cfi| value smaller than 0.01 and a |∆rmsea| value smaller than 0.015, indicating support for partial metric invariance. partial scalar invariance (invariance model 3c) was supported, as indicated by a |∆cfi| value smaller than 0.01 and a |∆rmsea| value smaller than 0.015 after the intercepts of items 4 (mi = 24.843 for the ghanaian sample; invariance model 3a) and 5 (mi = 24.172 for the ghanaian sample, invariance model 3b) were allowed free estimation, one at a time, for all groups. discussion the aim of this study was to evaluate the psychometric properties and measurement invariance of the hils for different south african and ghanaian samples. a single-factor solution fitted all samples, except for sample 3 (the south african sample who completed the scale in setswana). the hils showed sufficient reliability with ɷ-values larger than 0.80. convergent and divergent validity were supported for samples 1 (a multicultural south african sample completing the scales in english), 2 (a multicultural south african sample completing the scales in english) and 4 (a ghanaian sample completing the scales in english). full scalar invariance was supported for samples 1 and 2, but only partial scalar invariance when sample 4 was added to the analysis. the main findings are discussed below.
good model fit across the different african samples the hils displayed good fit across different samples from africa who completed the questionnaire in english. because the samples in this study may be very different in terms of cultural orientation compared to the samples for which the validity of the hils was originally investigated by kjell et al. (2016), this is very significant and suggests that the scale may be useful across different and diverse groups. it may also confirm the importance and prominence of harmony as a facet of psychosocial well-being across different groups. findings for the setswana-speaking south african sample the hils did not fit well for the setswana-speaking sample (sample 3). it may be that the exact meaning of some scale items was altered when the hils was translated from english to setswana. specifically, in the setswana language there is not a separate word for ‘harmony’. harmony, which appears in items 1 and 3 of the hils, has been translated with the setswana word ‘kagiso’ which means ‘peace’, and can refer to being neighbourly or getting along with others, rather than inner harmony which may be the connotation with the english word in the context of the scale items’ phrasing (see metz, 2017). a word with similar connotation does not exist in setswana. this difference in nuance may have affected the way in which participants interpreted items 1 and 3 of the setswana version of the scale, implying that this translation may measure a related but different construct. another possible explanation for the finding may be that the hils does not capture harmony as it is understood in the batswana cultural context. as indicated earlier, harmony is understood in terms of social relationships in african contexts (see metz, 2017, for a discussion), and the scale items of the hils relate more to inner harmony (items 1 to 3) and the external environment (items 4 and 5). 
therefore, completely different scale items may need to be developed to capture the cultural understanding and meaning of harmony. convergent and divergent validity of the harmony in life scale the hils showed acceptable convergent and divergent validity for samples 1, 2 and 4. specifically, we found strong positive correlations between the hils and measures of well-being (e.g. positive mental health, meaning, positive affect, positive experiences, and happiness), and medium to strong negative correlations between the hils and measures of ill-being (e.g. negative affect, negative experiences, and depression) and a weak negative relationship with search for meaning in life. these findings are in line with the findings of kjell et al. (2016) and satici and tekin (2017) discussed above, where the hils correlated positively with measures of well-being and negatively with measures of mental ill-being. measurement invariance full scalar invariance was supported for samples 1 and 2, while partial scalar invariance was supported for samples 1, 2 and 4. the different groups can therefore be compared in terms of the estimated latent mean scores obtained for the hils. interestingly, the factor loading of item 4 (that refers to accepting one’s life conditions) and the intercepts of items 4 and 5 (that refers to fitting in with one’s surroundings) were noninvariant for the ghanaian group. these two items seem to refer more to external conditions, while the first three items (item 1 [referring to a harmonious lifestyle]; item 2 [referring to overall balance in life]; item 3 [referring to being in harmony]) seem to refer more to what is within a person’s personal sphere of influence. the first three items, that can be considered to be more central to harmony in life (kjell & diener, 2021), may be more invariant than items 4 and 5 which are more externally focused. limitations and recommendations despite the contribution of this study, there are also limitations. 
firstly, while the current study is the first to evaluate performance of the hils in african countries, only two countries were selected, which do not represent the full african population. furthermore, the samples within the countries were not representative of the countries’ populations. in terms of ethical principles such as fair selection and scientific validity, we acknowledge the limitations of the study, and do not suggest that the findings are generalisable to the populations within the respective countries, or to the wider african population. it will be worthwhile to study harmony as a construct and the measurement thereof in more african countries in future, using representative samples. secondly, although the english version of the hils shows potential for use in the multicultural south african and ghanaian samples, this study only provides preliminary support for its use and future research may explore whether these results replicate in other samples. specifically, future research can explore the psychometric properties of the hils from a cultural perspective and, should the scale be translated into different african and global languages, specifically attend to the semantical and cultural meaning of constructs. lastly, although our sample sizes were adequate according to minimum sample size guidelines for performing factor analysis, future research may use larger sample sizes to account for challenges associated with smaller sample sizes, such as biased standard errors and questionable quality of the fit statistics (see kyriazos, 2018 in this regard). conclusion the hils shows potential for use in the current samples, except for the setswana-speaking south african sample. 
one should also take cognisance thereof that the hils may measure a different, but related construct to harmony in the setswana speaking sample (who completed the setswana translation of the hils); alternatively, that the hils does not capture the cultural understanding and meaning of harmony in a batswana cultural context. the findings emphasise the importance of language, and how different notions may be expressed in different languages, considering that words with exactly the same meaning may not exist in different languages. cultural meanings and understandings are expressed in language, and nuances may differ in different languages. this stresses the importance of cultural and/or contextual and linguistic differences and how these impact the measurement of psychological constructs. in this regard, future research should qualitatively explore the nuances and manifestations of harmony in various african and other global contexts. new measures, that capture these meanings in the local languages, can then be developed. acknowledgements the authors would also like to thank lanthé kruger, who was the principal investigator of the pure-sa project, north west sites, at the time of data gathering, for her leadership in the larger project. the authors also wish to thank all fieldworkers who assisted with the gathering of the data. competing interests the authors declare that they have no financial or personal relationships that may have inappropriately influenced them in writing this article. authors’ contributions a.c., l.s., m.p.w., a.w.f., t.g. and s.m. contributed to the design and planning of the study. l.s., m.p.w., a.w.f. and t.g. were responsible for the gathering and capturing of the data. a.c. and l.s. attended to the statistical analyses and the interpretation of the results. a.c. drafted the manuscript, incorporated the suggestions from the co-authors and prepared the final manuscript for submission. l.s., m.p.w., a.w.f., t.g. and s.m. 
provided continuous and critical feedback regarding the intellectual content of the document. the final manuscript was read and approved by all authors. for samples 1 and 3, ethics approval was obtained from the health research ethics committee of the north-west university, south africa, with ethics approval numbers: nwu 00002-07-a2 (sample 1) and nwu-00016-10-a1 (sample 3). in addition, ethics approval was obtained from the department of health of the north west province for sample 3. for sample 2, ethics approval was obtained from the faculty of humanities research ethics committee of the university of johannesburg, ethics approval number: rec-01-092-2017. for sample 4, ethics approval was obtained from the university of ghana ethics committee for human research, ethics approval number: ech 086 16–17. funding information this work is based on the research supported in part by the national research foundation of south africa (grant numbers: 106050, 121948) and the south african medical research council (samrc) in the division of research capacity development under the early investigators programme from funding received from the south african national treasury (project code: 57035). the grant holders acknowledge that opinions, findings and conclusions or recommendations expressed in any publication are solely those of the authors, and that the nrf and samrc accept no liability whatsoever in this regard. data availability the datasets generated and/or analysed during the current study are available from the third author (marie.wissing@nwu.ac.za) for sample 1, from the fifth author (tharina.guse@up.ac.za) for sample 2, from the second author (lusilda.schutte@nwu.ac.za) for sample 3 and the fourth author (angelina.wilsonfadiji@up.ac.za) for sample 4. all data will be available on reasonable request, subject to ethics approval. 
disclaimer the views and opinions expressed in this article are those of the authors and do not necessarily reflect the official policy or position of any affiliated agency of the authors. references agbo, a.a. (2021). assessment of the psychometric properties of the subjective happiness scale among emerging, young, and middle-aged adult populations in nigeria. current psychology. https://doi.org/10.1007/s12144-021-01884-4 appiah, r., schutte, l., wilson fadiji, a., wissing, m.p., & cromhout, a. (2020). factorial validity of the twi versions of five measures of mental health and well-being in ghana. plos one, 15(8), e0236707. https://doi.org/10.1371/journal.pone.0236707 appiah, r., wissing, m.p., wilson fadiji, a., & schutte, l. (2022). factorial validity of the twi version of the mental health continuum-short form and prevalence of mental health in a rural ghanaian sample. in l. schutte, t. guse, & m.p. wissing (eds.), embracing well-being in diverse african contexts: research perspectives (vol. 16, pp. 73–98). cross-cultural advancements in positive psychology. springer. bandalos, d.l., & finney, s.j. (2010). factor analysis: exploratory and confirmatory. in g.r. hancock & r.o. mueller (eds.), the reviewer’s guide to quantitative methods in the social sciences (pp. 93–114). routledge. bentler, p.m. (1990). comparative fit indexes in structural models. psychological bulletin, 107(2), 238–246. https://doi.org/10.1037/0033-2909.107.2.238 borneman, m.j. (2010). correction for attenuation. in n.j. salkind (ed.), encyclopaedia of research design (pp. 260–264). sage. botha, m.n. (2011). validation of the patient health questionnaire (phq-9) in an african context. master’s thesis. north-west university. retrieved from http://dspace.nwu.ac.za/handle/10394/4647 brislin, r.w. (1980). translation and content analysis of oral and written material. in h.c. triandis & j.w. berry (eds.), handbook of cross-cultural psychology (vol. 1, pp. 389–444). allyn & bacon. byrne, b.m. 
(2012). structural equation modeling with mplus: basic concepts, applications, and programming. routledge. chen, f.f. (2007). sensitivity of goodness of fit indexes to lack of measurement invariance. structural equation modeling: a multidisciplinary journal, 14(3), 464–504. https://doi.org/10.1080/10705510701301834 cheung, g.w., & rensvold, r.b. (2002). evaluating goodness-of-fit indexes for testing measurement invariance. structural equation modeling: a multidisciplinary journal, 9(2), 233–255. https://doi.org/10.1207/s15328007sem0902_5 connor, p.e., & becker, b.w. (2003). personal value systems and decision-making styles of public managers. public personnel management, 32(1), 155–180. https://doi.org/10.1177/009102600303200109 connor, p.e., & becker, b.w. (2006). public sector managerial values: united states, canada and japan. international journal of organizational theory and behavior, 9(2), 147–173. https://doi.org/10.1108/ijotb-09-02-2006-b001 delle fave, a., brdar, i., freire, t., vella-brodrick, d., & wissing, m.p. (2011). the eudaimonic and hedonic components of happiness: qualitative and quantitative findings. social indicators research, 100(2), 185–207. https://doi.org/10.1007/s11205-010-9632-5 delle fave, a., brdar, i., wissing, m.p., araujo, u., castro solano, a., freire, t., hernández-pozo, m.d.r., jose, p., martos, t., nafstad, h.e., nakamura, j., singh, k., & soosai-nathan, l. (2016). lay definitions of happiness across nations: the primacy of inner harmony and relational connectedness. frontiers in psychology, 7, 30. https://doi.org/10.3389/fpsyg.2016.00030 delle fave, a., wissing, m.p., & brdar, i. (2022). the investigation of harmony in psychological research. in c. li & d. düring (eds.), the virtue of harmony (pp. 253–276). oxford university press. diener, e. (1984). subjective well-being. psychological bulletin, 95(3), 542–575. https://doi.org/10.1037/0033-2909.95.3.542 diener, e., emmons, r.a., larsen, r.j., & griffin, s. (1985).
the satisfaction with life scale. journal of personality assessment, 49(1), 71–75. https://doi.org/10.1207/s15327752jpa4901_13 diener, e., wirtz, d., tov, w., kim-prieto, c., choi, d.-w., oishi, s., & biswas-diener, r. (2010). new well-being measures: short scales to assess flourishing and positive and negative feelings. social indicators research, 97(2), 143–156. https://doi.org/10.1007/s11205-009-9493-y di fabio, a., & tsuda, a. (2018). the psychology of harmony and harmonization: advancing the perspectives for the psychology of sustainability and sustainable development. sustainability, 10(12), 4726. https://doi.org/10.3390/su10124726 du plessis, g.a., & guse, t. (2017). validation of the scale of positive and negative experiences in a south african student sample. south african journal of psychology, 47(2), 184–197. https://doi.org/10.1177/0081246316654328 garcia, d., nima, a.a., & kjell, o.n.e. (2014). the affective profiles, psychological well-being, and harmony: environmental mastery and self-acceptance predict the sense of a harmonious life. peerj, 2, e259. https://doi.org/10.7717/peerj.259 hu, l., & bentler, p.m. (1999). cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. structural equation modeling, 6(1), 1–55. https://doi.org/10.1080/10705519909540118 kammann, n.r., & flett, r. (1983). affectometer 2: a scale to measure current level of general happiness. australian journal of psychology, 35(2), 259–265. https://doi.org/10.1080/00049538308255070 keyes, c.l.m., wissing, m., potgieter, j.p., temane, m., kruger, a., & van rooy, s. (2008). evaluation of the mental health continuum – short form (mhc-sf) in setswana-speaking south africa. clinical psychology and psychotherapy, 15(3), 181–192. https://doi.org/10.1002/cpp.572 kitayama, s., berg, k., & chopik, w.k. (2020). culture and well-being in late adulthood: theory and evidence. american psychologist, 75(4), 567–576.
https://doi.org/10.1037/amp0000614 kjell, o.n.e., daukantaitė, k., hefferon, k., & sikström, s. (2016). the harmony in life scale complements the satisfaction with life scale: expanding the conceptualization of the cognitive component of subjective well-being. social indicators research, 126(2), 893–919. https://doi.org/10.1007/s11205-015-0903-z kjell, o.n.e., & diener, e. (2021). abbreviated three-item versions of the satisfaction with life scale and the harmony in life scale yield as strong psychometric properties as the original scales. journal of personality assessment, 103(2), 183–194. https://doi.org/10.1080/00223891.2020.1737093 krafft, a.m., perrig-chiello, p., & walker, a.m. (eds.). (2018). hope for a good life: results of the hope-barometer international research program (vol. 72). springer. kroenke, k., spitzer, r.l., & williams, j.b.w. (2001). the phq-9: validity of a brief depression severity measure. jgim: journal of general internal medicine, 16(9), 606–613. https://doi.org/10.1046/j.1525-1497.2001.016009606.x kroenke, k., spitzer, r.l., williams, j.b.w., & löwe, b. (2009). an ultra-brief screening scale for anxiety and depression: the phq-4. psychosomatics, 50(6), 613–621. https://doi.org/10.1016/s0033-3182(09)70864-3 kyriazos, t.a. (2018). applied psychometrics: sample size and sample power considerations in factor analysis (efa, cfa) and sem in general. psychology, 9(8), 2207–2230. https://doi.org/10.4236/psych.2018.98126 lamers, s.m.a., westerhof, g.j., bohlmeijer, e.t., ten klooster, p.m., & keyes, c.l.m. (2011). evaluating the psychometric properties of the mental health continuum-short form (mhc-sf). journal of clinical psychology, 67(1), 99–110. https://doi.org/10.1002/jclp.20741 lenz, a.s., & li, c. (2022). evidence for measurement invariance and psychometric reliability for scores on the phq-4 from a rural and predominately hispanic community. measurement and evaluation in counseling and development, 55(1), 17–29.
https://doi.org/10.1080/07481756.2021.1906157 li, c. (2006). the confucian ideal of harmony. philosophy east and west, 56(4), 583–603. https://doi.org/10.1353/pew.2006.0055 lomas, t. (2021). life balance and harmony: well-being’s golden thread. international journal of well-being, 11(1), 50–68. https://doi.org/10.5502/ijw.v11i1.1477 lomas, t., lai, a.y., shiba, k., diego-rosell, p., yukiko, u., & vanderweele, t.j. (2022). insights from the first global survey of balance and harmony. in f. halliwell, r. layard, j.d. sachs, j.-e. de neve, l.b. aknin, & s. wang (eds.), world happiness report 2022 (pp. 127–154), sustainable development solutions network. löwe, b., wahl, i., rose, m., spitzer, c., glaesmer, h., wingenfeld, k., schneider, a., & brähler, e. (2010). a 4-item measure of depression and anxiety: validation and standardization of the patient health questionnaire-4 (phq-4) in the general population. journal of affective disorders, 122(1), 86–95. https://doi.org/10.1016/j.jad.2009.06.019 lyubomirsky, s., & lepper, h.s. (1997). a measure of subjective happiness: preliminary reliability and construct validation. social indicators research, 46(2), 137–155. retrieved from https://www-jstor-org.nwulib.nwu.ac.za/stable/27522363 markus, h.r., & kitayama, s. (1991). culture and the self: implications for cognition, emotion, and motivation. psychological review, 98(2), 224–253. https://doi.org/10.1037/0033-295x.98.2.224 materu, j., kuringe, e., nyato, d., galishi, a., mwanamsangu, a., katebalila, m., shao, a., changalucha, j., nnko, s., & wambura, m. (2020). the psychometric properties of phq-4 anxiety and depression screening scale among out of school adolescent girls and young women in tanzania: a cross-sectional study. bmc psychiatry, 20(1), 1–8. https://doi.org/10.1186/s12888-020-02735-5 metz, t. (2017). values in china as compared to africa: two conceptions of harmony. philosophy east and west, 67(1), 441–465. 
https://doi.org/10.1353/pew.2017.0034 muthén, l.k., & muthén, b.o. (1998–2019). mplus (version 8.3) [computer software]. putnick, d.l., & bornstein, m.h. (2017). measurement invariance conventions and reporting: the state of the art and future direction for psychological research. developmental review, 41(5), 71–90. https://doi.org/10.1016/j.dr.2016.06.004 rokeach, m. (1973). the nature of human values. free press. rokeach, m. (1979). understanding human values: individual and societal. free press. sánchez-oliva, d., morin, a.j.s., teixeira, p.j., carracxa, e.v., palmeira, a.l., & silva, m.n. (2017). a bifactor exploratory structural equation modeling representation of the structure of the basic psychological needs at work scale. journal of vocational behavior, 98, 173–187. https://doi.org/10.1016/j.jvb.2016.12.001 satici, s.a., & tekin, e.g. (2017). harmony in life scale – turkish version: studies of validity and reliability. psicologia: reflexão e crítica [psychology: research and review], 30, 18. https://doi.org/10.1186/s41155-017-0073-9 schutte, l., & wissing, m.p. (2017). clarifying the factor structure of the mental health continuum short form in three languages: a bifactor exploratory structural equation modelling approach. society and mental health, 7(3), 142–158. https://doi.org/10.1177/2156869317707793 schutte, l., wissing, m.p., wilson fadiji, a., mbowa, s., shoko, p.m., & schutte, w.d. (2022). exploration of harmony as a quality of happiness: findings from south africa and ghana. in l. schutte, t. guse & m.p. wissing (eds.). embracing well-being in diverse african contexts: research perspectives (vol. 16, pp. 319–344). cross-cultural advancements in positive psychology. springer. singh, k., mitra, s., & khanna, p. (2016). psychometric properties of hindi version of peace of mind, harmony in life and sat-chit-ananda scales. indian journal of clinical psychology, 43(1), 58–64. sirgy, m.j. (2019). 
positive balance: a hierarchical perspective of positive mental health. quality of life research, 28(7), 1921–1930. https://doi.org/10.1007/s11136-019-02145-5 steger, m.f., frazier, p., oishi, s., & kaler, m. (2006). the meaning in life questionnaire: assessing the presence of and the search for meaning in life. journal of counseling psychology, 53(1), 80–93. http://doi.org/10.1037/0022-0167.53.1.80 steiger, j.h., & lind, j.c. (1980, june). statistically based tests for the number of common factors. paper presented at psychometric society annual meeting. temane, l., khumalo, i.p., & wissing, m.p. (2014). validation of the meaning in life questionnaire in a south african context. journal of psychology in africa, 24(1), 51–60. https://doi.org/10.1080/14330237.2014.904088 teo, k., chow, c.k., vaz, m., rangarajan, s., & yusuf, s. (2009). the prospective urban rural epidemiology (pure) study: examining the impact of societal influences on chronic noncommunicable diseases in low-, middle-, and high-income countries. american heart journal, 158(1), 1–7. http://doi.org/10.1016/j.ahj.2009.04.019 tucker, l.r., & lewis, c. (1973). a reliability coefficient for maximum likelihood factor analysis. psychometrika, 38(1), 1–10. https://doi.org/10.1007/bf02291170 uchida, y., norasakkunkit, v., & kitayama, s. (2004). cultural constructions of happiness: theory and empirical evidence. journal of happiness studies, 5(3), 223–229. https://doi.org/10.1007/s10902-004-8785-9 van de vijver, f., & hambleton, r.j. (1996). translating tests: some practical guidelines. european psychologist, 1(2), 89–99. https://doi.org/10.1027/1016-9040.1.2.89 van de vijver, f.j.r., & leung, k. (1997). methods and data analysis for cross-cultural research. sage. watson, d., clark, l.a., & tellegen, a. (1988). development and validation of brief measures of positive and negative affect: the panas scales. journal of personality and social psychology, 54(6), 1063–1070. 
https://doi.org/10.1037/0022-3514.54.6.1063 wilson, a. (2017). eudaimonic hedonic happiness inventory. unpublished research protocol, approved by the health research ethics committee of the university of ghana (number ech 086 16–17). wissing, j.a.b., wissing, m.p., du toit, m.m., & temane, q.m. (2008). psychometric properties of various scales measuring psychological well-being in a south african context: the fort 1 project. journal of psychology in africa, 18(4), 511–520. https://doi.org/10.1080/14330237.2008.10820230 wissing, m.p. (2008/2012). the prevalence of levels of psychosocial health: dynamics and relationships with biomarkers of (ill)health in south african social contexts (fort3). unpublished research protocol, approved by the health research ethics committee of the north-west university, south africa (number nwu 00002-07-a2). wissing, m.p., & van eeden, c. (2002). empirical clarification of the nature of psychological well-being. south african journal of psychology, 32(1), 32–44. https://doi.org/10.10520/ejc98170

appendix 1 table 1-a1: descriptive statistics of the individual scale items of the hils for samples 1–4.

about the author(s) leon t. de beer workwell research unit, north-west university, potchefstroom, south africa wilmar b. schaufeli department of psychology, utrecht university, utrecht, the netherlands research unit for occupational and organizational psychology and professional learning, katholieke universiteit leuven, leuven, belgium arnold b. bakker center of excellence for positive organizational psychology, erasmus university rotterdam, rotterdam, the netherlands

citation de beer, l.t., schaufeli, w.b., & bakker, a.b. (2022). investigating the validity of the short form burnout assessment tool: a job demands-resources approach. african journal of psychological assessment, 4(0), a95.
https://doi.org/10.4102/ajopa.v4i0.95 original research investigating the validity of the short form burnout assessment tool: a job demands-resources approach leon t. de beer, wilmar b. schaufeli, arnold b. bakker received: 13 dec. 2021; accepted: 28 apr. 2022; published: 09 june 2022 copyright: © 2022. the author(s). licensee: aosis. this is an open access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. abstract the purpose of this study was to investigate the psychometric properties of the short form burnout assessment tool (bat-12). as a result of the pandemic, job stress has been compounded and the use of conceptually grounded and accurate measures is needed to identify burnout risks within specific organisations and the overall workforce. the study sample comprised 660 employees from various occupational settings who filled out an online survey. latent variable methods with ordinal categorical data were implemented to model the data and to test the hypotheses for the study. results showed that the proposed second-order factor model of the bat-12 showed a good fit to the data and was invariant across gender and ethnicity. in addition, burnout – as operationalised with the bat-12 – played the hypothesised mediating role in the job demands-resources model. the bat-12 also showed convergent validity with the maslach burnout inventory. the authors conclude that bat-12 is a robust instrument with adequate psychometric properties to measure burnout risk and present a freely available online application for employees to estimate their risk of burnout. keywords: burnout; burnout assessment tool; work engagement; job demands-resources model; measurement invariance. 
introduction even though there has been an overemphasis on the psychometric properties of burnout scales that has impeded needed theory development (bakker & de vries, 2021), the south african context demands, ethically and by law, that instruments present robust evidence of unbiasedness and fair measurement because of the potential of misuse of psychological scales (barnard, 2021). furthermore, efficiencies in business have become increasingly important, and the implementation of surveys by researchers and practitioners is not exempt from this, as survey fatigue has been identified as a concern (e.g. de koning et al., 2021). gatekeepers to participating employees in organisations therefore consider it important to use accurate short scales where possible. in this way, information that can be extracted from participants is maximised, whilst respecting participants’ and the organisation’s time. as a result of the coronavirus disease 2019 (covid-19) pandemic and the associated national lockdowns, organisations did not only struggle to cope (katare, marshall, & valdivia, 2021), but the situation accelerated – by necessity – remote and digital work transformation strategies. this, in turn, has also rekindled a focus on the well-being of employees and public health in general (juchnowicz & kinowska, 2021; kniffin et al., 2021). consequently, it should come as no surprise that the term ‘burnout’ has been used frequently in the media (e.g. bernard, 2021) and its dynamics remain a focus of academic research (e.g. chirico et al., 2021). burnout has been included in the 11th revision of the international classification of diseases (icd-11) – effective from 2022 – by the world health organization (2019) and is classified as an ‘occupational phenomenon’ defined as ‘a syndrome conceptualised as resulting from chronic workplace stress that has not been successfully managed’. 
the consequences of burnout have become apparent over almost half a century of research: decreased performance (roczniewska & bakker, 2021; taris, 2006), impaired job satisfaction and affective commitment (park, nam, & yang, 2011; salvagioni et al., 2017), increased turnover intention, negative perceptions of quality and safety (garcia et al., 2019; salyers et al., 2017) and more physical and psychological distress symptoms (salvagioni et al., 2017). not only has research in south africa shown that burnout risk is associated with increased reporting of receiving treatment for conditions such as depression, diabetes, irritable bowel syndrome and hypertension by employees (de beer, pienaar, & rothmann, 2016) but also that the medical aid provider expenditure by private insurers on employees categorised into a high burnout risk group is approximately double the amount compared with a low burnout risk group (de beer, pienaar, & rothmann, 2013). therefore, burnout does not only affect employees – its ripple effect on the surrounding ecosystem of the organisation and society cannot be discounted. in fact, the cost of burnout to economies has been estimated at between $125 and $190 billion in the united states of america (garton, 2017). this was well before the pandemic and it is not far-fetched to posit that this estimate might currently be significantly higher. consequently, high burnout levels are not only an individual risk but also have implications for organisations and the society at large (public health). this means that burnout should be measured (identifiable) and managed to lessen its harmful impact (salvagioni et al., 2017). therefore, an accurate short version of a burnout measure becomes increasingly important. however, the measurement of burnout has been plagued by inconsistencies and criticisms over time. 
specifically, the most popular measure of burnout, the maslach burnout inventory (mbi; maslach & jackson, 1981; maslach, leiter, & jackson, 2017), was initially never designed or intended as a diagnostic instrument or screening device for any disease, but primarily as a research tool after interviews with individuals in human services work. this has led to critique that as burnout was defined by the mbi there is a conflation of the terminology and instrument that impedes innovation (schaufeli, desart, & de witte, 2020). in addition, peer-reviewed research has shown that the factor structure of mbi-assessed burnout has been partly inconsistent with its generally accepted presentation as a syndrome as modelling has shown not only its proposed three-factor structure but also two-factor, four-factor, five-factor, second-order and bifactor solutions (see de beer et al., 2020). to some this may not necessarily be problematic, but the reality is that if burnout is considered a syndrome, a total risk score indicated by the individual components should also be feasible. for example, the mbi does not allow for a total score to be established – instructing that its components (emotional exhaustion, cynicism or professional efficacy) should be considered separately (maslach & leiter, 2021). however, an overall score, based on the components, is ideal as one then has evidence for a syndrome with a cluster of components that is presented in line with the who description of the phenomenon. this then indicates the potential of a unidimensional, second-order (higher-order or hierarchical) and potential bifactor solutions as the possible options available to model burnout as a total score. moreover, in the absence of evidence-based diagnostic criteria for burnout, erroneous cut-off scores and prevalence estimates of burnout have been presented (brisson & bianchi, 2017). 
maslach and leiter (2021) have, however, decried ‘misuses’ of the mbi to diagnose any disease or to present prevalence estimates, stating that they ‘… never designed the mbi as a tool to diagnose an individual health problem’ (p. 4). furthermore, research evidence supports the notion that the purported third component of burnout, professional efficacy, should not be considered a core aspect of burnout (de beer & bianchi, 2019; kim & ji, 2009). researchers have also argued that the professional efficacy component is problematic because it is measured with positively framed items, implying wording effects (e.g. lheureux, truchot, borteyrou, & rascle, 2017). indeed, research has shown that changing the valence of professional efficacy to professional inefficacy with negatively framed items yielded more accurate results (schaufeli & salanova, 2007). other research has proposed that the efficacy component may act as either an outcome or a precursor of burnout (schaufeli & taris, 2005). clearly, maintaining the status quo would likely only perpetuate the present situation. this is without even considering the debates in the literature regarding the overlap of burnout with depression (see bianchi et al., 2021) or reducing the definition of burnout to only exhaustion (see canu et al., 2021). however, burnout is about both inability and unwillingness (schaufeli, 2021). consequently, the burnout assessment tool (bat) was developed based on the conceptual framework of schaufeli and taris (2005), which considers both the aforementioned aspects, to address some of the problems of the mbi by using both an inductive and a deductive approach. for the inductive approach, as burnout has been recognised in the netherlands as an occupational disease for over two decades, and in flanders as an occupation-related disease, there are various health professionals and occupational physicians who have worked with employees categorised as burned-out.
specifically, 49 dutch and flemish professionals were interviewed who are involved at various stages of the burnout process, asked to ‘describe a patient with prototypical burnout symptoms and to focus on specific symptoms, causes, and the way burnout unfolds …’ and ‘… describe burnout in their own words, and to prioritise the burnout symptoms they mentioned in terms of their relevance for diagnosing burnout’ (schaufeli, desart, & de witte, 2020, p. 3). then, in terms of the deductive development process of the bat, more than 357 items (representing 66 dimensions) were analysed using factor analytic methods (see schaufeli et al., 2020, for a complete overview). based on these approaches, the bat defines burnout as ‘a work-related state of exhaustion that occurs amongst employees, which is characterised by extreme tiredness, reduced ability to regulate cognitive and emotional processes and mental distancing’ (schaufeli et al., 2020). noticeably, professional (in)efficacy is not present as one of the components of the bat, but there is an addition of two components with exhaustion and mental distance, that is, cognitive impairment (reduced ability to regulate cognitions) and emotional impairment (reduced ability to regulate emotions) (schaufeli, de witte, & desart, 2020). recent results showed that bat-23 functions well as a second-order factor in factor analyses of data collected in italy, romania, ecuador, poland and korea. in addition, the instrument showed measurement invariance across european countries and japan (de beer et al., 2020). the job demands-resources approach to burnout arguably, over the last two decades significant advancements in the field of occupational health psychology have occurred. 
one of the first was the development of the job demands-resources (jd-r) model of burnout at the turn of the millennium (see demerouti, bakker, nachreiner, & schaufeli, 2001) – explaining how exhaustion and disengagement may develop as a result of working conditions, that is, job demands and job resources. the next was the publication of the utrecht work engagement scale (uwes) that measures work engagement, which is described as a positive work-related state characterised by vigour, dedication and absorption (schaufeli & bakker, 2004; schaufeli, salanova, gonzález-romá, & bakker, 2002). work engagement has also been positioned as the positive antipode of burnout (bakker & oerlemans, 2011; schaufeli & bakker, 2004). subsequently, the jd-r model was adapted to include work engagement (schaufeli & bakker, 2004) and it formally describes dual processes: (1) the health impairment process in which burnout is the result of inordinate job demands (and a lack of job resources) and burnout, in turn, leads to undesired outcomes and (2) the motivational process in which work engagement is the result of job resources and this, in turn, leads to desired organisational outcomes (bakker & demerouti, 2007, 2017; bakker, demerouti & sanz-vergel, 2014). as a causal chain of three variables is implied in each process, the possibility of indirect effects, that is, burnout and work engagement acting as potential mediators, will also be revisited as part of this validation. subsequently, the following hypotheses are presented for this study: h1: burnout, assessed with the bat-12, can be operationalised as a second-order factor, which is an overall latent score indicated by four latent components. h2: burnout, assessed with the bat-12, shows convergent validity with burnout as assessed with the mbi. h3: burnout, assessed with the bat-12, shows acceptable measurement invariance based on (a) gender and (b) ethnicity.
h4: burnout is a mediator in the relationship between job demands (work overload) and turnover intention in the health impairment process of the specified jd-r model. h5: burnout is a mediator in the relationship between job resources and turnover intention in the jd-r model. therefore, the general objective of this study was to investigate the construct validity of the bat-12, to test measurement invariance and to gauge bat-assessed burnout’s performance within a mediation model based on the dual process of jd-r theory. methods study design and participants the data for this study formed part of the bat project and were collected at one point in time, indicating a cross-sectional design. cross-sectional designs are suitable for studies that seek to establish the psychometric properties and correlational relationships between variables. the data were collected using a purposive sampling strategy, that is, participants had to be south african employees at least 18 years of age. participants participants were recruited via social media and participated voluntarily. the sample comprised 660 employees working in south africa. the minority of the participants were men (n = 277; 42%) and the average age of the participants was 38 years, with a standard deviation of 10.60 years. regarding ethnicity, the largest group of participants were african employees (39%), followed by white employees (29.70%), coloured1 employees (12.90%) and indian employees (4.42%). measuring instruments burnout was measured with the short form bat-12 (schaufeli, de witte, & desart, 2020). the scale comprises 12 items measuring the four components of bat-defined burnout with three items for each of the components (exhaustion, mental distance, cognitive impairment and emotional impairment). the items of the bat-12 are provided in table 1.
maslach burnout inventory-assessed burnout was measured using the 16-item version of the maslach burnout inventory-general survey (mbi-gs), which comprises emotional exhaustion, cynicism and professional efficacy (schaufeli, leiter, maslach, & jackson, 1996). the job demands and job resources used in this study were measured with scales from the job demands-resources scale (jdrs) that was validated by rothmann, mostert and strydom (2006). specifically, the following dimensions were used and rated on a four-point likert scale, ranging from never to always: work overload (six items; e.g. ‘i have too much work to do’), autonomy (three items; e.g. ‘do you have influence in planning your work activities?’), colleague support (three items; e.g. ‘can you count on your colleagues when you come across difficulties in your work?’), supervisor support (three items; e.g. ‘can you count on your direct supervisor when you come across difficulties in your work?’) and role clarity (four items; e.g. ‘do you know exactly what other people expect of you in your work?’). work engagement was measured with the three-item ultra-short version of the uwes, the uwes-3 (e.g. ‘at work, i feel bursting with energy’) (schaufeli, shimazu, hakanen, salanova, & de witte, 2019). lastly, turnover intention was measured with a three-item scale (e.g. ‘i am actively looking for other jobs’) (sjöberg & sverke, 2000). table 1: factor loadings from the confirmatory factor analysis model. table 2: descriptive statistics, omega reliability, average variance extracted and correlation matrix with shared variances. data analysis the software program mplus 8.6 (muthén & muthén, 2021) was used to model the data. it is important to note that the items were considered to be ordered categorical in nature and not purely continuous. therefore, the mean- and variance-adjusted weighted least squares (wlsmv) estimation method was used, as this estimator is also robust against non-normality of data (li, 2016).
specifically, confirmatory factor analysis (cfa) was implemented by specifying a second-order model – in line with the assumption that the bat should also be able to model burnout as a syndrome indicated by its four first-order components. in terms of fit statistics, the comparative fit index (cfi) and tucker–lewis index (tli) were considered and these values need to be above 0.90 (van de schoot, lugtig, & hox, 2012). in addition, the root mean square error of approximation (rmsea) and standardised root mean square residual (srmr) were also considered, and these values should ideally be below 0.08. however, recent research has shown that srmr performs better compared with rmsea when data are estimated as ordered categorical in nature (see shi, maydeu-olivares, & rosseel, 2020). factor loadings were considered acceptable at approximately 0.50 and effect sizes for correlation coefficients were small (0.10+), medium (0.30+) and large (0.50+). for support of discriminant validity, correlation coefficients had to be below the guideline of 0.85 in all correlational relationships between the variables (brown, 2015). to test the equivalence of the bat-12 across gender and ethnicity, measurement invariance analyses were implemented with wlsmv and theta parameterisation (millsap & yun-tein, 2004). as the data were specified as categorical (considering category thresholds and not only intercepts) and the bat-12 was modelled as a second-order factor, the analyses were also somewhat more complex when compared with normal measurement invariance with maximum likelihood and continuous data. a series of models had to be tested for both gender and ethnicity in line with the approach taken in de beer et al. (2020) when the bat-23 was tested for invariance in a sample of six european countries and japan (see table 3 for the model descriptions).
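the guideline values described above for fit indices, correlation effect sizes and discriminant validity can be expressed as simple checks. the following is a minimal sketch only: the function names and the example values are hypothetical illustrations and are not results or code from this study, which was analysed in mplus.

```python
# Minimal sketch of the guideline checks quoted in the text.
# All function names and example values are hypothetical illustrations.

def check_fit(cfi, tli, rmsea, srmr):
    """Compare fit indices with the guideline values quoted in the text."""
    return {
        "cfi": cfi > 0.90,      # CFI should be above 0.90
        "tli": tli > 0.90,      # TLI should be above 0.90
        "rmsea": rmsea < 0.08,  # RMSEA should ideally be below 0.08
        "srmr": srmr < 0.08,    # SRMR should ideally be below 0.08
    }

def correlation_effect_size(r):
    """Label a correlation as small (0.10+), medium (0.30+) or large (0.50+)."""
    size = abs(r)
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "small"
    return "negligible"

def discriminant_validity_ok(r, limit=0.85):
    """Correlations between factors should stay below the 0.85 guideline."""
    return abs(r) < limit

if __name__ == "__main__":
    # Hypothetical fit values, chosen only to show a mixed outcome.
    print(check_fit(cfi=0.94, tli=0.92, rmsea=0.09, srmr=0.05))
    print(correlation_effect_size(-0.45))
    print(discriminant_validity_ok(0.63))
```

with ordered categorical data, a failed rmsea check would be read alongside the srmr rather than on its own, in line with shi et al. (2020).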
as there is no agreement in the literature as to whether loading or threshold invariance should be tested first, this step was combined (see de beer et al., 2020, for a complete overview). moreover, as guidelines for delta (δ) changes in cfi and rmsea for second-order models with categorical data have not been formally established, we used a change in cfi of no larger than 0.008 and rmsea of 0.060 for the first-order models to not be significantly worse-fitting (cf. de beer et al., 2020). but we used the conventional criteria of changes no larger than 0.010 for cfi and 0.015 for rmsea between the second-order models as these included intercept parameters (rudnev, lytkina, davidov, schmidt, & zick, 2018). if these aforementioned criteria were met between the models, the bat-12 could be considered invariant across gender and ethnicity in the sample, allowing for fair comparison between groups if required. table 3: results of the second-order measurement invariance testing. to test the criterion validity of the bat-12, a classical dual process model based on jd-r theory was specified as a mediation model – see figure 1. in this mediation model, the focus was on the significance, size and direction of the standardised beta coefficients. the bootstrapping option was also enabled to resample 50 000 times from the data to obtain 95% confidence intervals (cis) for the indirect effects in the model. for a meaningful indirect effect to exist, the guideline is that the 95% ci for that parameter should not include the value zero, that is, the parameter should not change sign from negative to positive or vice versa. figure 1: the job demands-resources model for the research study. ethical considerations ethical clearance to conduct the study was obtained from the economic and management sciences research ethics committee of the north-west university (nwu-00558-17-a4). 
the participants followed a process of informed consent that explained the purpose of the study and that all data would be handled in a confidential manner. every person had to agree to participate in the study before they could continue with answering any of the questions in the survey. as the project was advertised online, repercussions for any person who did not wish to participate in the study are highly unlikely as these persons cannot be identified. results modelling the burnout assessment tool-12 as a second-order model the cfa modelling of the bat as a second-order factor indicated by four first-order factors (exhaustion, mental distance, cognitive impairment and emotional impairment) resulted in a good fit to the data: χ2 = 541.33; df = 50; cfi = 0.95; tli = 0.93; rmsea = 0.12; and srmr = 0.06. all the fit statistics except the rmsea were satisfactory, but as mentioned it has been shown that rmsea is biased when ordered categorical data are used in estimation procedures. we therefore deferred to the srmr which has been shown to be more accurate under these conditions (shi et al., 2020). this second-order model was compared to a strictly unidimensional (one-factor) model that was clearly shown to be inferior: χ2 = 1330.31; df = 54; cfi = 0.86; tli = 0.82; rmsea = 0.19; and srmr = 0.08. table 1 presents factor loading values, standard errors and the associated statistical significance values for the second-order model. as shown in table 1, all factor loadings were significant; p < 0.001 for all items. the values of the loadings were all above the given guideline of 0.50 (hair, black, babin, & anderson, 2010), with a majority above 0.70, except for mental distance item 2 which had a loading of 0.47.
however, given that this was only a 0.03 difference from the guideline, we decided to keep the item, as it was well above the conventional 0.30 criterion, and a factor with two items would not be identified and disqualify the measure from cross-country comparison in future studies. as shown in table 2, all components of the bat-12 showed acceptable omega reliability estimates (ω > 0.70). furthermore, the ave as indicator of convergent validity was also satisfied, except for mental distance which was just below the 0.50 cut-off. however, one must be pragmatic in considering cut-off values, and mental distance still showed discriminant validity in all of its correlations with the other bat variables. indeed, discriminant validity was evident for all correlational relationships as the aves for all factors were greater than the shared variances (squared correlations) between them – indicating that the components of the bat-12 can be distinguished from one another in line with a ‘syndrome’ (i.e. a set of underlying symptoms that refer to an underlying common condition). furthermore, all correlations between the bat components showed a large effect size (r ≥ 0.63). therefore, based on the given evidence, h1 was supported because bat-12-assessed burnout can be modelled as a second-order model indicated by four first-order factors, namely, exhaustion, mental distance, cognitive impairment and emotional impairment. convergent validity of burnout measured by the burnout assessment tool-12 and the maslach burnout inventory to test h2, a cfa model was specified with the bat as a second-order burnout factor (indicated by exhaustion, mental distance, cognitive impairment and emotional impairment) and the maslach burnout inventory (mbi) as a two-factor model: burnout (indicated by its exhaustion and mental distance items as the core) and professional efficacy as a separate factor. 
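the ave and shared-variance comparison described above (commonly called the fornell–larcker criterion) can be sketched as follows. this is an illustration under stated assumptions: the loadings are hypothetical standardised values, since table 1 is not reproduced here; only the 0.47 loading and the smallest reported correlation of 0.63 come from the text. the omega formula assumes standardised loadings and uncorrelated residuals.

```python
# Sketch of AVE, omega and the AVE-versus-shared-variance comparison.
# Loadings below are hypothetical; only 0.47 and r = 0.63 appear in the text.

def average_variance_extracted(loadings):
    """AVE: mean of the squared standardised loadings of one factor."""
    return sum(l ** 2 for l in loadings) / len(loadings)

def omega(loadings):
    """Omega for standardised loadings, assuming uncorrelated residuals."""
    common = sum(loadings) ** 2
    error = sum(1 - l ** 2 for l in loadings)
    return common / (common + error)

def ave_exceeds_shared_variance(ave_a, ave_b, r):
    """Both AVEs must exceed the squared correlation between the factors."""
    return min(ave_a, ave_b) > r ** 2

if __name__ == "__main__":
    mental_distance = [0.80, 0.47, 0.78]  # hypothetical, includes the 0.47 item
    exhaustion = [0.80, 0.75, 0.85]       # hypothetical

    ave_md = average_variance_extracted(mental_distance)
    ave_ex = average_variance_extracted(exhaustion)
    print(round(ave_md, 3), round(ave_ex, 3))
    print(round(omega(mental_distance), 3))
    print(ave_exceeds_shared_variance(ave_md, ave_ex, r=0.63))
```

with these illustrative loadings, one weak item pulls the ave just below 0.50 while omega stays above 0.70 and the ave still exceeds the shared variance, which is the pattern described for mental distance.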
this model showed an acceptable fit to the data: χ2 = 2668.09; df = 343; cfi = 0.92; tli = 0.91; rmsea = 0.101; and srmr = 0.06. the correlation between bat-assessed burnout and the mbi-assessed burnout was 0.92, which indicates strong evidence for convergent validity (83.72% shared variance) – supporting h2. the correlations with professional efficacy showed medium effect sizes with both the mbi (r = –0.49) and the bat (r = –0.45) assessed burnout. second-order measurement invariance of the burnout assessment tool-12 based on gender and ethnicity the results of the measurement invariance models showed that the bat-12 was invariant across gender and ethnicity – see table 3. specifically, the changes (δ) in cfi, rmsea and srmr all met the criteria followed as described. hypotheses 3a and 3b were therefore supported; the bat-12 measures fairly across these groups and levels of burnout can be directly compared if required. criterion-related validity of the burnout assessment tool-12 in the job demands-resources model the model specified for the criterion-related validity of burnout as assessed by the bat-12 in the context of jd-r theory also showed a good fit to the data: χ2 = 2262.03; df = 617; cfi = 0.93; tli = 0.92; rmsea = 0.06; and srmr = 0.06. the omega reliabilities and the correlations for the model are provided in table 4. as can be seen, all factors were reliable and the correlations were all in the expected directions. table 4: correlation matrix for the job demands–resources model. as can be seen from figure 1, the path coefficients (structural relationships) in the model were as expected. work overload showed a positive path to burnout (β = 0.49), and burnout was positively related to turnover intention (β = 0.31). job resources had a negative path to burnout (β = –0.58) and a positive path to work engagement (β = 0.63). work engagement, in turn, had a negative path to turnover intention (β = –0.14).
lastly, work overload did not have a significant path to turnover intention (p = 0.15), but job resources showed a negative direct path to turnover intention (β = –0.28). bootstrapping revealed that there was a meaningful indirect effect from work overload to turnover intention through burnout (β = 0.15, 95% ci [0.05, 0.26]). similarly, job resources had a negative indirect effect on turnover intention through burnout (β = –0.18, 95% ci [–0.30, –0.07]). these results supported h4 and h5. as an additional analysis, work engagement was also tested as a mediator in the relationship between job resources and turnover intention. the 95% ci crossed through zero by the closest possible margin (β = –0.09, 95% ci [–0.18, 0.001]). but given that the upper bound exceeds zero by only one thousandth and considering the 90% cis, there is tentative evidence for an effect (β = –0.09, 90% ci [–0.17, –0.01]) in line with the literature. discussion the purpose of this study was to investigate the validity and measurement invariance of the short form bat-12 in the context of jd-r theory. the results of cfa showed that bat-12 can be modelled as a second-order factor indicated by four facets: exhaustion, mental distance, cognitive impairment and emotional impairment. these findings are consistent with theoretical expectations for the instrument based on schaufeli, desart, & de witte (2020). furthermore, there were no discriminant validity concerns between the components of the bat-12. all in all, the bat-12 showed robust construct validity, supporting h1. the bat-12 also showed convergent validity with the mbi – modelled with its core items of emotional exhaustion and cynicism – indicating a similar concept being measured, supporting h2.
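the indirect effects reported in the results can be approximately reproduced as the product of the two standardised paths in each chain, and the confidence-interval guideline reduces to a sign check on the interval bounds. the sketch below uses only the rounded coefficients and intervals quoted in the text; the function names are hypothetical, and the published estimates come from the bootstrapped mplus model rather than from this arithmetic.

```python
# Sketch: product-of-coefficients indirect effects from the rounded
# standardised paths reported in the text, plus the CI-excludes-zero rule.

def indirect_effect(a_path, b_path):
    """Indirect effect of a simple mediation chain: predictor -> mediator -> outcome."""
    return a_path * b_path

def ci_excludes_zero(lower, upper):
    """An interval excludes zero when both bounds share the same sign."""
    return lower > 0 or upper < 0

if __name__ == "__main__":
    # work overload -> burnout -> turnover intention
    print(round(indirect_effect(0.49, 0.31), 2))    # 0.15
    # job resources -> burnout -> turnover intention
    print(round(indirect_effect(-0.58, 0.31), 2))   # -0.18
    # job resources -> work engagement -> turnover intention
    print(round(indirect_effect(0.63, -0.14), 2))   # -0.09

    print(ci_excludes_zero(0.05, 0.26))     # True  (95% ci, h4)
    print(ci_excludes_zero(-0.30, -0.07))   # True  (95% ci, h5)
    print(ci_excludes_zero(-0.18, 0.001))   # False (95% ci, work engagement)
    print(ci_excludes_zero(-0.17, -0.01))   # True  (90% ci, work engagement)
```

the products agree with the reported indirect effects to two decimals, and the sign check shows why the work engagement pathway is treated as tentative: only the 90% interval excludes zero.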
although this overlap is substantial, it is important to consider that the bat was developed not only inductively but also deductively and explicitly includes components of executive functioning that may be impaired, namely cognitive impairment (see deligkaris, panagopoulou, montgomery, & masoura, 2014; demerouti, bakker, peeters, & breevaart, 2021) and emotional impairment, which are not present within the mbi. furthermore, h3a and h3b were also supported as the bat-12 was found to be invariant for both gender and ethnicity. this result is in line with the measurement invariance tests conducted for the bat-23 within the south african context that showed strong measurement invariance for gender and ethnicity (de beer et al., 2022) and other invariance tests that have shown the cross-cultural validity of the bat (e.g. de beer et al., 2020). consequently, the bat-12 can be used to compare scores fairly between groups or persons if such comparisons are needed. finally, considering h4–h5, the proposed jd-r model showed a good fit to the data and the indirect effects were generally as expected. there were indirect effects from job demands (work overload) and job resources through bat-12 burnout to turnover intention. specifically, job demands had a positive effect and job resources had a negative effect – indicating the importance of optimal resources. therefore, strong evidence for the health impairment process was present. in contrast, even though the direct effects in the proposed regression chain were significant for the motivational process, the indirect effect from job resources to turnover intention through the ultra-short work engagement construct was only marginally meaningful. considering the literature, the 90% cis and the very small violation of the guideline (one thousandth), this is considered an artefact in this sample and future studies will likely find a different result.
in general, the results are in line with previous studies on the jd-r model in south africa (e.g. de beer, rothmann, & pienaar, 2012). in summary, bat-12 was shown to have robust psychometric properties and the instrument can be used in a valid way to measure employees’ burnout levels. limitations and recommendations for future research this study is not without limitations. firstly, this study used a cross-sectional design – hence it was not possible to investigate the test–retest reliability of the bat-12, even though adequate omega reliability coefficients were presented. future studies should therefore consider a longitudinal design to investigate test–retest reliability and establish causal ordering in the nomological network. secondly, the sample was a non-probability sample and therefore cannot be completely representative of the south african working population. therefore, caution is advised when generalising the results, even though they are in line with the available literature in other contexts. lastly, this study did not include a measure of depression. the debate about the overlap of burnout and depression is current in the literature (e.g. bianchi et al., 2021; meier & kim, 2021) and the bat should also be investigated in this context. future studies should therefore consider including measures of depression and using techniques such as latent profile analyses and bifactor exploratory structural equation modelling analyses to attempt to disentangle the overlap of the bat with depression scales (see morin, arens, & marsh, 2016). another avenue is to identify a group with serious burnout problems and use receiver operating characteristic (roc) analysis to establish cut-off values that can be used for screening to identify (potential) burnout cases. for other future research directions, we refer readers to demerouti et al. (2021).
conclusion the results of this study indicate that the bat-12 is a robust tool to measure the burnout risk of employees within an organisational context. more specifically, the bat-12 can be used to measure individual levels of burnout, as well as group-level burnout within a company as part of a psychosocial risk analysis. the bat-assessed burnout also performs well within a jd-r framework to explain the process of health impairment in employees. however, it must be emphasised that at present the bat does not categorise someone as burned out or not burned out and only assesses burnout risk (level); if that level is problematic, the employee should be referred to the necessary employee assistance programme or relevant health professional for a clinical interview. therefore, prevalence estimates are discouraged. an online application for south african employees to screen their personal burnout risk level against the current norms of the bat project data set is freely accessible at https://theburnout.app/?mod=bat12sa. we are optimistic that this validation study and the online application will assist south african organisations and their employees to prevent burnout and facilitate occupational well-being. acknowledgements the authors would like to acknowledge maksim rudnev’s work on second-order measurement invariance in general, as well as his assistance with the adaptation of the online application of the bat-23 to be able to accommodate the 12-item version of the bat for south africa. competing interests the authors declare that they have no financial or personal relationships that may have inappropriately influenced them in writing this article. authors’ contributions l.d.b. was responsible for conceptualisation, methodology, statistical analyses and writing of the original draft. w.b.s. and a.b.b. were involved in the writing, review and editing of the article. all the authors have read and approved the final version of the manuscript to be published.
funding information the data for this study formed part of the bat project in south africa that was supported by the national research foundation (south africa) with reference number csrp170523232041 (grant no: 112106). the views and opinions expressed are those of the researchers and do not reflect the opinions or views of the national research foundation. data availability the data set used and analysed during this study is available from the corresponding author upon reasonable request. disclaimer the views and opinions expressed in this article are those of the authors and do not reflect the opinions or views of the national research foundation or any other agency of the authors. references bakker, a.b., & demerouti, e. (2007). the job demands-resources model: state of the art. journal of managerial psychology, 22(3), 309–328. https://doi.org/10.1108/02683940710733115 bakker, a.b., & demerouti, e. (2017). job demands–resources theory: taking stock and looking forward. journal of occupational health psychology, 22(3), 273–285. https://doi.org/10.1037/ocp0000056 bakker, a.b., demerouti, e., & sanz-vergel, a.i. (2014). burnout and work engagement: the jd–r approach. annual review of organizational psychology and organizational behavior, 1(1), 389–411. https://doi.org/10.1146/annurev-orgpsych-031413-091235 bakker, a.b., & de vries, j.d. (2021). job demands–resources theory and self-regulation: new explanations and remedies for job burnout. anxiety, stress, & coping, 34(1), 1–21. https://doi.org/10.1080/10615806.2020.1797695 bakker, a.b., & oerlemans, w. (2011). subjective well-being in organizations. in k.s. cameron & g.m. spreitzer (eds.), the oxford handbook of positive organizational scholarship (pp. 178–189). new york, ny, usa: oxford university press. barnard, a. (2021). psychological assessment: predictors of human behaviour. in m. coetzee, e. botha & l. de beer (eds.), personnel psychology: an applied perspective (3rd ed., pp. 159–200).
cape town: oxford university press. bernard, a. (2021, october). covid-19 pandemic leading to higher levels of employee burnout. techrepublic. retrieved from https://www.techrepublic.com/article/covid-19-pandemic-leading-to-higher-levels-of-employee-burnout/ bianchi, r., verkuilen, j., schonfeld, i.s., hakanen, j.j., jansson-fröjmark, m., manzano-garcía, g., … meier, l.l. (2021). is burnout a depressive condition? a 14-sample meta-analytic and bifactor analytic study. clinical psychological science, 9(14), 579–597. https://doi.org/10.1177/2167702620979597 brisson, r., & bianchi, r. (2017). stranger things: on the upside down world of burnout research. academic psychiatry, 41(2), 200–201. https://doi.org/10.1007/s40596-016-0619-7 brown, t.a. (2015). confirmatory factor analysis for applied research (2nd ed.). new york, ny: guilford press. canu, i.g., marca, s.c., dell’oro, f., balázs, á., bergamaschi, e., besse, c., … wahlen, a. (2021). harmonized definition of occupational burnout: a systematic review, semantic analysis, and delphi consensus in 29 countries. scandinavian journal of work, environment & health, 47(2), 95–107. https://doi.org/10.5271/sjweh.3935 chirico, f., ferrari, g., nucera, g., szarpak, l., crescenzo, p., & ilesanmi, o. (2021). prevalence of anxiety, depression, burnout syndrome, and mental health disorders among healthcare workers during the covid-19 pandemic: a rapid umbrella review of systematic reviews. journal of health and social sciences, 6(2), 209–220. https://doi.org/10.19204/2021/prvl7 de beer, l.t. (2021). is there utility in specifying professional efficacy as an outcome of burnout in the employee health impairment process. international journal of environmental research and public health, 18(12), 6255. https://doi.org/10.3390/ijerph18126255 de beer, l.t., & bianchi, r. (2019). confirmatory factor analysis of the maslach burnout inventory: a bayesian structural equation modeling approach. 
european journal of psychological assessment, 35(2), 217–224. https://doi.org/10.1027/1015-5759/a000392 de beer, l.t., pienaar, j., & rothmann, s., jr. (2013). linking employee burnout with medical aid provider expenditure. south african medical journal, 103(2), 89–93. https://doi.org/10.7196/samj.6060 de beer, l.t., pienaar, j., & rothmann, s., jr. (2016). job burnout, work engagement and self-reported treatment for health conditions in south africa. stress and health, 32(1), 36–46. https://doi.org/10.1002/smi.2576 de beer, l., rothmann, s., jr., & pienaar, j. (2012). a confirmatory investigation of a job demands-resources model using a categorical estimator. psychological reports, 111(2), 528–544. https://doi.org/10.2466/01.03.10.pr0.111.5.528-544 de beer, l.t., schaufeli, w.b., & de witte, h. (2022). the psychometric properties and measurement invariance of the burnout assessment tool (bat-23) in south africa (unpublished manuscript). north-west university, potchefstroom campus. de beer, l.t., schaufeli, w.b., de witte, h., hakanen, j.j., shimazu, a., glaser, j., … rudnev, m. (2020). measurement invariance of the burnout assessment tool (bat) across seven cross-national representative samples. international journal of environmental research and public health, 17(15), 5604. https://doi.org/10.3390/ijerph17155604 de koning, r., egiz, a., kotecha, j., ciuculete, a.c., ooi, s.z.y., bankole, n.d.a., … & kanmounye, u.s. (2021). survey fatigue during the covid-19 pandemic: an analysis of neurosurgery survey response rates. frontiers in surgery, 8, 690680. https://doi.org/10.3389/fsurg.2021.690680 deligkaris, p., panagopoulou, e., montgomery, a.j., & masoura, e. (2014). job burnout and cognitive functioning: a systematic review. work and stress, 28, 107–123. https://doi.org/10.1080/02678373.2014.909545 demerouti, e., bakker, a.b., nachreiner, f., & schaufeli, w.b. (2001). the job demands-resources model of burnout. journal of applied psychology, 86(3), 499–512. 
https://doi.org/10.1037/0021-9010.86.3.499 demerouti, e., bakker, a.b., peeters, m.c., & breevaart, k. (2021). new directions in burnout research. european journal of work and organizational psychology, xx, 1–6. https://doi.org/10.1080/1359432x.2021.1979962 garcia, c.d.l., abreu, l.c.d., ramos, j.l.s., castro, c.f.d.d., smiderle, f.r.n., … bezerra, i.m.p. (2019). influence of burnout on patient safety: systematic review and meta-analysis. medicina, 55(9), 553. https://doi.org/10.3390/medicina55090553 garton, e. (2017, april). employee burnout is a problem with the company, not the person. harvard business review. retrieved from https://hbr.org/2017/04/employee-burnout-is-a-problem-with-the-company-not-the-person hair, j.f., black, w.c., babin, b.j., & anderson, r.e. (2010). multivariate data analysis: a global perspective (7th ed.). upper saddle river, nj, london: pearson education. juchnowicz, m., & kinowska, h. (2021). employee well-being and digital work during the covid-19 pandemic. information, 12(8), 293. https://doi.org/10.3390/info12080293 katare, b., marshall, m.i., & valdivia, c.b. (2021). bend or break? small business survival and strategies during the covid-19 shock. international journal of disaster risk reduction, 61, 102332. https://doi.org/10.1016/j.ijdrr.2021.102332 kim, h., & ji, j. (2009). factor structure and longitudinal invariance of the maslach burnout inventory. research on social work practice, 19(3), 325–339. https://doi.org/10.1177/1049731508318550 kniffin, k.m., narayanan, j., anseel, f., antonakis, j., ashford, s.p., bakker, a.b., … van vugt, m. (2021). covid-19 and the workplace: implications, issues, and insights for future research and action. american psychologist, 76(1), 63–77. https://psycnet.apa.org/doi/10.1037/amp0000716 lheureux, f., truchot, d., borteyrou, x., & rascle, n. (2017). 
the maslach burnout inventory–human services survey (mbi-hss): factor structure, wording effect and psychometric qualities of known problematic items. le travail humain, 80(2), 161–186. https://doi.org/10.3917/th.802.0161 li, c.h. (2016). the performance of ml, dwls, and uls estimation with robust corrections in structural equation models with ordinal variables. psychological methods, 21(3), 369–387. https://doi.org/10.1037/met0000093 maslach, c., & jackson, s.e. (1981). the measurement of experienced burnout. journal of organizational behavior, 2(2), 99–113. https://doi.org/10.1002/job.4030020205 maslach, c., & leiter, m.p. (2021, march 19). how to measure burnout accurately and ethically. harvard business review. retrieved from https://hbr.org/2021/03/how-to-measure-burnout-accurately-and-ethically maslach, c., leiter, m.p., & jackson, s.e. (2017). maslach burnout inventory manual (4th ed.). palo alto, ca, usa: mind garden. meier, s.t., & kim, s. (2021). meta-regression analyses of relationships between burnout and depression with sampling and measurement methodological moderators. journal of occupational health psychology. retrieved from https://psycnet.apa.org/doi/10.1037/ocp0000273 millsap, r.e., & yun-tein, j. (2004). assessing factorial invariance in ordered-categorical measures. multivariate behavioral research, 39(3), 479–515. https://doi.org/10.1207/s15327906mbr3903_4 morin, a.j., arens, a.k., & marsh, h.w. (2016). a bifactor exploratory structural equation modeling framework for the identification of distinct sources of construct-relevant psychometric multidimensionality. structural equation modeling: a multidisciplinary journal, 23(1), 116–139. https://doi.org/10.1080/10705511.2014.961800 muthén, l.k., & muthén, b.o. (2021). mplus user’s guide (8 ed.). los angeles, ca: muthén & muthén. park, h., nam, s., & yang, e. (2011). relationships of burnout with job attitudes and turnover intention among koreans: a meta-analysis. 
korean journal of industrial and organizational psychology, 24(3), 457–491. https://doi.org/10.24230/kjiop.v24i3.457-491 roczniewska, m., & bakker, a.b. (2021). burnout and self-regulation failure: a diary study of self-undermining and job crafting among nurses. journal of advanced nursing, 77(8), 3424–3435. https://doi.org/10.1111/jan.14872 rothmann, s., mostert, k., & strydom, m. (2006). a psychometric evaluation of the job demands-resources scale in south africa. sa journal of industrial psychology, 32(4), 76–86. retrieved from https://hdl.handle.net/10520/ejc89107 rudnev, m., lytkina, e., davidov, e., schmidt, p., & zick, a. (2018). testing measurement invariance for a second-order factor: a cross-national test of the alienation scale. methods, data, analyses: a journal for quantitative methods and survey methodology, 12(1), 47–76. https://doi.org/10.12758/mda.2017.11 salvagioni, d.a.j., melanda, f.n., mesas, a.e., gonzález, a.d., gabani, f.l., & de andrade, s.m. (2017). physical, psychological and occupational consequences of job burnout: a systematic review of prospective studies. plos one, 12(10), e0185781. https://doi.org/10.1371/journal.pone.0185781 salyers, m.p., bonfils, k.a., luther, l., firmin, r.l., white, d.a., adams, e.l., & rollins, a.l. (2017). the relationship between professional burnout and quality and safety in healthcare: a meta-analysis. journal of general internal medicine, 32(4), 475–482. https://doi.org/10.1007/s11606-016-3886-9 schaufeli, w. (2021). the burnout enigma solved? scandinavian journal of work, environment & health, 47(3), 169–170. https://doi.org/10.5271/sjweh.3950 schaufeli, w.b., & bakker, a.b. (2004). job demands, job resources, and their relationship with burnout and engagement: a multi-sample study. journal of organizational behavior, 25(3), 293–315. https://doi.org/10.1002/job.248 schaufeli, w.b., desart, s., & de witte, h. (2020). burnout assessment tool (bat) – development, validity, and reliability. 
international journal of environmental research and public health, 17(24), 9495. https://doi.org/10.3390/ijerph17249495 schaufeli, w.b., de witte, h., & desart, s. (2020). manual burnout assessment tool (bat) – version 2.0. ku leuven: unpublished internal report. schaufeli, w.b., leiter, m.p., maslach, c., & jackson, s.e. (1996). maslach burnout inventory-general survey. in c. maslach, s.e. jackson, & m.p. leiter (eds.), the maslach burnout inventory-test manual (3rd ed.). palo alto, ca, usa: consulting psychologists press. schaufeli, w.b., & salanova, m. (2007). efficacy or inefficacy, that’s the question: burnout and work engagement, and their relationships with efficacy beliefs. anxiety, stress & coping, 20(2), 177–196. https://doi.org/10.1080/10615800701217878 schaufeli, w.b., salanova, m., gonzález-romá, v., & bakker, a.b. (2002). the measurement of engagement and burnout: a two sample confirmatory factor analytic approach. journal of happiness studies, 3, 71–92. https://doi.org/10.1023/a:1015630930326 schaufeli, w.b., shimazu, a., hakanen, j., salanova, m., & de witte, h. (2019). an ultra-short measure for work engagement: the uwes-3 validation across five countries. european journal of psychological assessment, 35(4), 577–591. https://doi.org/10.1027/1015-5759/a000430 schaufeli, w.b., & taris, t.w. (2005). the conceptualization and measurement of burnout: common ground and worlds apart. work & stress, 19(3), 256–262. https://doi.org/10.1080/02678370500385913 sjöberg, a., & sverke, m. (2000). the interactive effect of job involvement and organizational commitment on job turnover revisited: a note on the mediating role of turnover intention. scandinavian journal of psychology, 41(3), 247–252. https://doi.org/10.1111/1467-9450.00194 shi, d., maydeu-olivares, a., & rosseel, y. (2020). assessing fit in ordinal factor analysis models: srmr vs. rmsea. structural equation modeling: a multidisciplinary journal, 27(1), 1–15. 
https://doi.org/10.1080/10705511.2019.1611434 taris, t.w. (2006). is there a relationship between burnout and objective performance? a critical review of 16 studies. work & stress, 20(4), 316–334. https://doi.org/10.1080/02678370601065893 van de schoot, r., lugtig, p., & hox, j. (2012). a checklist for testing measurement invariance. european journal of developmental psychology, 9(4), 486–492. https://doi.org/10.1080/17405629.2012.686740 world health organization. (2019, may). burn-out an “occupational phenomenon”: international classification of diseases. retrieved from https://www.who.int/news/item/28-05-2019-burn-out-an-occupational-phenomenon-international-classification-of-diseases footnote 1. all descriptions in this section are used in line with the terminology of the employment equity act, 55 of 1998 for designated and non-designated groups. ‘coloured’ is an official term in south africa and indicates citizens of mixed ethnic origins. no offense is intended. about the author(s) jacques s. pienaar department of industrial psychology, faculty of economic and management sciences, stellenbosch university, stellenbosch, south africa carl c. theron department of industrial psychology, faculty of economic and management sciences, stellenbosch university, stellenbosch, south africa citation pienaar, j.s., & theron, c.c. (2021). the development and validation of a graduate leader competency questionnaire: arguing the need for a graduate leader performance measure. african journal of psychological assessment, 3(0), a61.
https://doi.org/10.4102/ajopa.v3i0.61 original research the development and validation of a graduate leader competency questionnaire: arguing the need for a graduate leader performance measure jacques s. pienaar, carl c. theron received: 09 may 2021; accepted: 19 july 2021; published: 17 sept. 2021 copyright: © 2021. the author(s). licensee: aosis. this is an open access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. abstract this article deals with the need for the conceptualisation and operationalisation of a modern graduate leader performance construct and the development and psychometric evaluation of a (graduate) leader competency questionnaire. the need for an investigation into the graduate leader performance construct is motivated against the backdrop of the availability of a new generation of leaders given the impending retirement of the world’s most senior management talent. generation y is singled out as a critical resource pool whose leadership potential needs to be tapped to enhance organisational performance and improve the economic fortunes of our country. however, it is pointed out that our understanding of this generation, as well as the psychological mechanism that determines how leaders influence various aspects of an organisation, work group or team to bring about optimal performance at a collective level, is fragmented and incomplete. accordingly, we make suggestions for expanding contemporary conceptualisations of competency models so as to merge an expanded form of a competency model with the notion of a nomological network in providing a comprehensive explanation for the psychological mechanism that regulates graduate leader performance within organisational settings. 
the explication of such a competency model logically needs to start with the conceptualisation of the graduate leader performance construct. the validation of such a competency model will in future also necessitate, amongst other things, a measure of the competencies comprising the graduate leader performance construct. keywords: competencies; competency modelling; gen y; graduate; leadership; nomological network; performance; talent. introduction the imminent retirement of most of industry’s senior and most influential management talent (lacey & groves, 2014) from the baby boomer generation (wong, lang, gardiner, & coulon, 2008) (individuals born between 1945 and 1964) is creating a leadership vacuum in organisations around the world (miner, 2019; silvestri, 2013; squyres, 2020). it is estimated that between 60 and 80 million baby boomers will be exiting the workforce over the next 7 to 10 years, which translates into roughly 10 000 of this generation’s employees retiring daily across the globe, each of whom has between 30 and 40 years of work experience (miner, 2019). lacey and groves (2014) liken the retirement of the baby boomers to a catastrophe – referring to it as the 5/50 crisis as the expectation at the time was that industry would lose up to 50% of its management talent within 5 years. this massive exodus of leadership (or managerial) talent, knowledge and work experience (hagemann & stroope, 2013) challenges human resource (hr) departments with creating strategies for the preservation or transfer of institutional knowledge, filling critical (functional) skills gaps and lastly but most importantly, developing leadership (or management) succession pipelines for the future (seemiller & grace, 2019).
however, whilst the substitution of retiring managers (or leaders) with high potentials from the newer generations represents the obvious solution to this latter leadership pipeline dilemma, this evolutionary process, which has been a natural occurrence in intergenerational takeovers up until now, might not work as well this time around. this is because gen x (the second newest generational cohort to enter the workforce, comprising individuals born between 1965 and 1981; wong et al., 2008) is made up of significantly fewer people than the older baby boomer generation that it is expected to replace. as gen x is believed to have approximately 25 million fewer people (and thus a considerably smaller talent pool) than the baby boomers (miner, 2019), there are simply not enough (capable) replacements amongst this generation of employees to meet industry’s declining manpower and leadership needs. the rationale for singling out gen y employees (a cohort of individuals born between 1982 and 2000; wong et al., 2008) in the current discussion thus rests on their ever-increasing representation in the workforce (they will make up 75% of the workforce by 2025; culiberg & mihelic, 2016) and the smaller number of gen x employees coupled with the imminent retirement of baby boomers, with gen z (born after 2000¹; mccrindle research, 2006) still waiting in the wings. as was the case in the past with the baby boomers, where organisations adjusted their structure, strategies, compensation and management styles to fit this cohort’s specific mindset (risher, 2008), gen y also brings forward unique characteristics that are remarkably different from those of previous generations (naim & lenka, 2018). these characteristics have ‘significant implications for the design of organisations and work groups in order to meet the needs of these younger workers’ (yrle, hartman, & payne, 2005, p.
198), elicit the best performances from them (cook, 2016) and develop leadership bench strength amongst them in the workplace. these characteristics, therefore, also need to be acknowledged and reflected in an explanatory graduate leader competency model. the new workforce – gen y in explaining the origins of generational differences, generational cohort theory holds that different generations develop unique psycho-graphical attributes because of shared events they experience during their formative years, leading to similar value systems, perceptions and attitudes (d’amato & herzfeldt, 2008; gentry, griggs, deal, mondore, & cox, 2011; kupperschmidt, 2000) that ultimately manifest in the form of new behavioural trends in the workplace. for example, generation y individuals are reported to be emotionally needy and to constantly seek approval and praise (bencsik, horváth-csikó, & tímea, 2016; crumpacker & crumpacker, 2007). this need for constant feedback and recognition (hurst & good, 2009) has been reinforced by several authors in the literature (e.g. martin, 2005; smith & galbraith, 2012), is understood to be characteristic of this high maintenance (graen & schiemann, 2013; martin, 2005) generation and is probably a consequence of the comparatively liberal parenting style that became popular during their youth and that they experienced in their childhood (glass, 2007). for similar reasons, gen y employees may also prefer teamwork (gilbert, 2011; hills, ryan, smith, & warren-forward, 2012; olšovská, mura, & švec, 2015; van der wal, 2017) and environments where there is collaborative decision-making (glass, 2007; vanmeter, grisaffe, douglas, chonko, & roberts, 2013) and where they have the freedom and flexibility to get the task done at their own pace (martin, 2005).
furthermore, gen y employees appear to have superior ambition and a desire to keep learning and move quickly upwards through an organisation (rheeder, 2015) into positions or assignments that will improve their curricula vitae (cvs) (hira, 2007) and portfolios of marketable skills (connor & shaw, 2008). their desire for rapid career growth is mirrored in higher salary expectations, with some gen y employees even expecting pay raises after only 6 months on the job (erickson, alsop, nicholson, & miller, 2009). their parents’ continued financial and emotional support has, once again, likely contributed to this sense of entitlement (erickson, 2008), and although some view this positively as a form of optimism, this generation still expects to progress in their careers at a rate considered unrealistic by their (senior) colleagues (karefalk, pettersen, & zhu, 2007). in a similar vein, hanson and gulish (2016) and sharma (2012) attribute this form of self-entitlement of gen y to their being the most educated generation ever and to the fact that they have grown up in relative opulence compared to other generations. accordingly, this generational cohort is described as fickle in terms of where they want to work, with respect to employer brand (pihlak, 2018), industry sector (pwc, 2011) and remuneration (they will not accept a low salary with the promise of raises to come later; martin & tulgan, 2011) despite many of them being perpetually unemployed (pauw, bhorat, goga, ncube, oosthuizen, & van der westhuizen, 2006). these fickle preferences, of course, are relevant to their choice of work and assignments as well, in that it is generally accepted that gen y individuals desire challenging and meaningful assignments (baruch, 2004; olšovská et al., 2015) and are simply not satisfied with menial or mediocre jobs (laundrum, 2016).
for millennials, ‘it’s not a question of whether or not they are right for the job, it’s a question of is the job right for them’ (caraher, 2015, p. 27). unfortunately, however, competition and work requirements² have risen substantially from the time of the baby boomers, thus making it difficult for them to find meaningful (entry-level) work (hanson & gulish, 2016) at all. perhaps one of the most defining characteristics of gen y is their kinship with the digital world (rheeder, 2015). prensky (2001) refers to them as ‘digital natives’ as they have grown up with broadband, email (mangelsdorf, 2015), social media and a wide range of other online applications and services, making them extremely tech-savvy (zang, lu, & murat, 2017) with an intuitive grasp of technology (combes, 2009) and demanding of instant access to information (rheeder, 2015) and gratification (erickson, 2008). based on these experiences, hershatter and epstein (2010) argue that they have every reason to assume that all necessary information can be obtained (and work and learning be done, and relationships maintained) with the touch of a button, and on a 24/7/365 basis, which further exacerbates their sense of entitlement as well as their demands for instant gratification. regardless, these above-mentioned characteristics of millennials have one aspect in common, namely a desire to express their individuality (ferf, 2016) in all aspects of their lives: in short, they (also) want a customised work environment – and personalised careers… none of this should be too surprising… we live in a world that expects mass customisation… customers demand goods and services that meet their individual needs… it’s not hard to see why millennials growing up in this environment expect no less from their jobs. (p.
5) in summary, from the above, it should be clear that gen y possesses more bargaining power in the labour market than ever before and that they bring unique needs, values and characteristics that appear markedly different (rheeder, 2015) from those traditionally held in the workplace. in conjunction with the rise of the protean career that emphasises career success (park & rothwell, 2009) and freedom (chin & rasdi, 2014) from the side of employees in crafting their own career trajectories in ways that might not align with the organisation’s leadership pipeline needs, gen y’s unique needs, values and characteristics might therefore also derail leadership development initiatives targeted at this generation. for example, the reported transactional, medium- and short-term orientation of gen y in terms of the psychological contracts they now enter with employers (beddingfield, 2005) and their documented demand for work-life balance (clarke, 2015) make one wonder whether leadership development programs that rely on many sacrifices from the side of trainees, including the time and effort associated with intensive and extra-curricular training and a longer-term commitment to stay on in one organisation, will gain traction with this generation at all. despite this continuing shift in workforce dynamics, however, there has been no significant change in human resource (hr) management practices in recent years (naim & lenka, 2018; sylvester, 2015) to acknowledge the changing nature of the workforce.
the reluctance (or negligence) from the side of hr professionals to properly address this matter is a cause for concern as it should be the continuing goal of behavioural scientists to study, understand and positively influence employee job performance, interpreted to be constituted by a structurally interrelated network of latent competencies and latent outcome variables and to be determined by a complex interrelated nomological network (cronbach & meehl, 1955) of latent variables characterising the employee and the organisational context. without a valid understanding of the nomological net of latent variables constituting and determining employee performance, the hr profession is relatively helpless in its attempts to enhance employee performance via a range of interventions. whilst vast strides have been made in the past in many areas of the hr body of knowledge, the failure to bring hr theory and the application thereof up to date and in alignment with the shifting realities of the workplace is threatening the credibility of the profession amidst growing calls from executive boards for hr to demonstrate return on investment. the need for effective leaders from the introductory discussion, it should be clear that the most influential group of employees that currently requires prioritisation is the gen y resource pool that serves as the main feeder source for entry level jobs. perhaps hr’s most critical responsibility within the context of the impending 5/50 crisis, however, is to create leadership bench strength for the future (lacey & groves, 2014; ulrich, smallwood, & sweetman, 2008). in this regard, the gen y resource pool has a dual purpose (a second role) in that it serves as a feeder pool for industry’s fast-track, or ‘high-flyer’ leadership development programmes as well.
the importance of the development and supply of effective leaders is elevated in this discussion by the fact that leadership transcends individual performance contributions by way of potential multiplicative or synergistic (hackman & wageman, 2005) effects on groups or teams. thus, whilst the individual performance contributions of individual gen y employees (the entry role) remain important from the perspective of critical manpower and (functional) skills shortages as the baby boomer workforce starts to retire, our focus on the gen y leadership (the second) role and its nomological network is motivated by the collective performance advantages that could be unlocked by this valuable resource given the impending loss of the core of our managerial talent. here we draw on the evolutionary utility of leadership as a phenomenon fundamental to societal growth (toor & ofori, 2008) and highlight the criticality of leadership within the context of universal societal survival needs such as adaptation (van vugt, 2006), the achievement of collective goals (toor & ofori, 2008), conflict resolution, teaching and the promotion of social cohesion (van vugt & ronay, 2013). the reasoning behind this argument is thus simple – effective leaders can mean the difference between outstanding and poor organisational performance (kragt & guenter, 2018; peterson, smith, martorana, & owens, 2003). effective leaders steer organisations to success, inspire and motivate followers, spearhead change and innovation, develop capability, resolve conflicts and provide a moral compass for employees from which direction is set. poor leaders, on the other hand, can inflict a considerable amount of damage on organisations, demoralise staff and destroy value.
one does not need to search far to find examples of how poor (and unethical) leadership in south african society has left destruction in its wake, from the looting scandal resulting in vbs mutual bank’s collapse and the electricity supply commission’s (eskom) poor management that has led to neglected maintenance of south africa’s power infrastructure and ultimately the load shedding debacle, to the passenger rail agency of south africa’s (prasa) train acquisition blunder and most recently the steinhoff saga. in addition to an unethical leadership culture, south african business leaders must also address several further challenges that are unique to this country, against the backdrop of an already ailing economy, as evidenced by the world economic forum (2019), which ranks the country at number 60 out of 141 countries on economic performance. the south african leadership improvement challenge ‘began in 1994 with the demise of apartheid that placed unprecedented demands upon leaders of organisations in all sectors of society’ (nkomo & kriek, 2011, p. 453). private organisations found themselves thrust into a new world economy and having to compete with global powerhouse firms. the lack of capability to stay one step ahead of global competitors (i.e. external scanning) and to seize upon export-led industrial growth opportunities (i.e. business strategy) was glaringly obvious, and these capabilities are perhaps still lacking amongst many senior south african organisational leadership teams today. also, the end of apartheid sparked significant social identity transformation (mayer & louw, 2011) amongst south african citizens undergirded by a significant change in power and relations between races (nkomo & kriek, 2011), a process which has as yet not entirely run its course. consequently, many south african senior leadership teams remain challenged (i.e. valuing diversity, inclusivity, etc.)
in their workplaces by significant tensions in employee–employer relationships (eustace & martins, 2014), cultural conflict and identity issues (mayer & louw, 2011). furthermore, the country is plagued by skills shortages, with the general state of its human capital described as low on productivity, motivation and work ethics (kleynhans, 2006; rasool & botha, 2011). this tasks the senior leadership of organisations to become the architects and drivers of basic workforce capability as well. all of these challenges require strong leadership and high-quality relations between leaders and employees so that they can work together to find the appropriate solutions (eustace & martins, 2014): it is essential to improve leadership… (it is) necessary for improved productivity, market share growth and profitability. this is important, given south africa’s unique position of being an emerging market economy with a diverse workforce, … and an open economy that gives its workforce little protection. (pp. 1–2) leadership theory is still evolving the level at which organisational leadership performs is not the outcome of a random event nor a static condition. it is rather systematically determined by a complex nomological network of latent variables characterising the (graduate) leader and characterising the environment in which the (graduate) leader must operate. effective organisational leadership results from a persistent, purposeful and holistic hr strategy, provided it is rooted in a valid understanding of this nomological network. valid performance theory must guide hr in this leadership development strategy and inform the various hr interventions through which it is implemented to attract, select, engage, develop and retain the services of gen y leadership talent. 
moreover, such a performance theory will add value to the extent that it can firstly identify the competencies required of future south african leaders (to be used as a competency benchmark or tool for the identification and development of future leaders³), and secondly, empirically link these competencies with a set of generic strategic outcomes that are required of future leaders in organisational settings (to be added to the competency tool in measuring leadership performance and providing formative developmental feedback to burgeoning leaders). nevertheless, to inform selection methodologies for more accurate gen y talent selection decisions, to create an employer brand that is attractive and aspirational to gen y, to create leadership development simulations and content that resonate with gen y and to employ engagement and retention strategies that are effective in motivating and retaining the services of this generation, the complex nomological net comprising the inter-related person and contextual variables that influence graduate leader performance must be explicated first. mccracken, currie and harrison (2016) also argue persuasively for the explication of the ‘modern’ graduate nomological network as follows: [g]raduates are often seen as an enigma because their potential is offset by specific challenges such as poor work readiness and unrealistic expectations about the world of work. recent graduates also fall into the generation y category which has different characteristics from other workforce generations… this means those tasked with designing and implementing the right talent management strategy for graduates need to understand the specific nature of the graduate talent pool. (p.
2731) yet, such a network has not been explicated and it is uncertain whether (all of) the gen y trends discerned in the developed world apply to south africa as our society is unique in that it has been socially divided and fragmented, with not all our population groups equally affected by historical events (jonck, van der walt, & sobayeni, 2017). as a result, we are currently still no closer to a point where we understand what aspects of our leadership development policies and practices should evolve to be effective with this generation in this country, and how gen y employees will respond to such reforms. in the end, given gen y’s potential to affect the wider society, the economy and the political order as they increasingly start taking on influential roles in these domains (holmes, 2013), the paucity of valid scientific knowledge and understanding of this performance relationship between the characteristics and needs of gen y and our leadership development systems is perhaps one of the most important problems perplexing the hr profession in south africa today. nevertheless, it must be noted that many failings of the leadership development systems of today can also be traced back to the fact that most of the research on leadership performance traditionally has been context free (gordon & yukl, 2004; liden & antonakis, 2009; osborn, uhl-bien, & milosevic, 2014; zaccaro & klimoski, 2001). studies investigating the leadership phenomenon as isolated, role-based actions on the part of individuals that ‘exogenously’ impact organisations (lichtenstein et al., 2006, p. 2) in a vacuum (porter & mclaughlin, 2006) explain only a part of the leadership puzzle (gordon & yukl, 2004) and the critical question of how leaders can ‘build and maintain a group (or organisation) that performs well relative to its competition’ (hogan & kaiser, 2005, p. 172) accordingly remains largely unanswered.
however, despite many calls for researchers to adopt a more sophisticated and practical perspective by studying leadership in organisational settings and by conceptualising the organisation (or work unit) as an open system entailing complex interactions within larger systems… within which the organisation (or work unit) is embedded, and within which leaders operate as the critical boundary spanners (cross, ernst, & pasmore, 2013), ‘such research is rare’ (carter et al., 2020, p. 1). conversely, research that focuses exclusively on the interpersonal processes that take place between leaders and followers, that simply attempts to distil the traits required of effective leaders, or that investigates the most effective leadership styles in relation to different contingencies contributes to fragmentation in the field and fails to describe for industry and organisational leadership development practitioners the richness of the construct in a way that really matters to the bottom line. we are of the belief that this fragmented, context-free approach to leadership research represents the major reason why industry has never really excelled in producing effective leaders (moldoveanu & narayandas, 2019), as industry looks to academia for guidance on such matters and our response to this hitherto has been at best limited (kragt & guenter, 2018) and incomplete. below we turn to a discussion on how leadership development is routinely implemented in industry, draw attention to some of the current shortcomings of this methodology and offer some suggestions for improvement that ultimately culminate in our suggested approach and framework for the development of a modern graduate leader performance construct.
leadership competency models are defective and incomplete competency models are the most frequently used method for informing leadership development (barrett & beeson, 2002; conger & ready, 2004; croft & seemiller, 2017), and in combination with the rise in the popularity of 360-degree feedback, which is built entirely around competencies, these two tools provide the development architecture for most if not all executive fast-track programmes today. simply put, practitioners frequently use the term competency model to refer to a set of competencies (rather broadly defined as attributes, knowledge, skills and abilities) used to align individual behaviour with organisational goals, create clear expectations and guide (by way of 360-degree feedback) development as leaders (in training) progress along the organisational (croft & seemiller, 2017; spencer & spencer, 1993) hierarchy. the general presumption is that the demonstration of the competencies included in the ‘model’, and at the required level, will lead to performance in the job or role for which the model was created, which in effect equates to a (rather frail it must be said) job performance theory. conger and ready (2004) explain that competency models provide at least three other critical benefits, namely clarity (clear expectations of the behaviours for those in leadership roles), consistency (a common framework for communicating and implementing leadership development) and connectivity (foundational metrics for informing many of the other hr interventions such as remuneration, succession, etc.). one major problem with the contemporary use of competency models for leadership development, however, is the fact that there is a discrepancy between the leadership competencies that organisations need, and those that executive development programmes often target to enhance or develop (fernandez-aroaoz, graysberg, & nohria, 2011; narayandas & moldoveanu, 2019). 
this dilemma is fuelled by two inefficient practices of contemporary competency modelling methodology. firstly, there are multitudes of different leadership competency models in circulation, and whilst some might be well-researched and of a high standard, many unfortunately are not. much of the blame for the diluted proliferation of competency models can be placed on the training providers, digital start-ups and a host of other newcomers to the leadership development industry who offer quick-fix, customisable solutions that lack depth and substance. disintermediation has occurred (narayandas & moldoveanu, 2019), whereby universities, business schools and management consultancies that served as able intermediaries (or gatekeepers) of research on leadership competencies in the past are now bypassed altogether. secondly, executive education often also targets the development of the incorrect (i.e. criterion deficient or criterion contaminated) competencies because competency libraries are used as input to ‘develop’ leadership competency sets. these are universal lists of competencies typically created by consulting houses that are assumed to be related, in some way or another, to all conceivable jobs and organisational roles, and practitioners frequently use these to select the competencies that they deem relevant when ‘developing’ a competency model (campion et al., 2011) for a job or leadership role. however, it is highly unlikely that the human mind can project, process, comprehend and integrate all the relevant factors that impact leadership performance based on this haphazard approach in such a way as to distil from these libraries the competency models that are optimal for impacting (leadership) performance on the job. 
conversely, it is also highly likely that the (definitions of the) competencies included in competency libraries are too broad and, as such, fail to effectively capture the intricacies of the performance domain of a leadership role. within this context, competency models are thus essentially used as lexicons or semantic frameworks, and certainly do not constitute validated psychometric measures. from a pure (performance) measurement and prediction perspective, many contemporary competency models, therefore, lack both validity and reliability in the work environment and there is a substantial and questionable gap between the many claims and actual measurement and prediction benefits delivered by such (limited) models. excluded here of course are the competency model variants that have been developed for specific work contexts and that manage to combine the various elements of competency frameworks into more meaningful, persuasive job performance hypotheses (bartram, 2005). regardless, the development of competency models in general is difficult and time-consuming, and the derivation of leadership competencies particularly challenging, because the focus here is not on functional competencies, but rather on meta-competencies (competencies that underpin or allow for the development of other competencies; van der merwe & verwey, 2007), affective and perceptual competencies (e.g. regulating affective states and moods in response to the context, content and constraints of the situation; boyatzis, goleman and rhee, cited in bar-on and parker [2000]) and self-regulation or self-command competencies (i.e. do-this-now, do-this-first or do-that-not-this, stuss, 2011). maybe more importantly, a second problem with the use of competency models in general, irrespective of the quality of research and explication methodologies underlying it, relates to structural conjectural shortcomings. 
competency models in their simplest form are often (too narrowly) described as ‘a simple list or catalogue, specifying desirable competencies’ (markus, cooper-thomas, & allpress, 2005, p. 117). the underlying goal is then for this ‘dictionary’ of competencies or competency model to be used as the foundation for hr departments to plan and guide leadership development interventions. it is, however, highly doubtful that a simple list of competencies that are assumed to all be equally significant in describing success in a job provides a true reflection of the leadership performance domain (and all others for that matter). with reference to the nomological net of employee performance mentioned earlier, modelled as a complex (abstract) network comprising malleable and non-malleable variables characterising employees and malleable (and possibly non-malleable) variables characterising the organisational context that are richly interconnected, we remain unconvinced that a simple list of competencies that are assumed to all be equally significant in describing leadership performance provides a penetrating, valid insight into the nature of the psychological mechanism that regulates success in a leadership role. as a prediction about how complex human and organisational behaviour will interact to affect leadership performance, the assumption of a simple, linear, bivariate and one-way relationship between competencies and leadership success simply does not hold ground. related to this is the problem of deficiency with regard to the performance theory underlying competency models’ performance (outcomes) criterion. despite some support for the notion that there might be a ‘general’ factor in performance that corresponds to the ‘g’ factor in cognitive intelligence (arvey & murphy, 1998; serpico, 2018), here we point to the progress made with regard to the taxonomic structure of job performance growing the awareness that job performance is multidimensional in nature (e.g. 
borman & brush, 1993; campbell, mccloy, oppler, & sager, 1993; fay & sonnentag, 2010), assert that such an understanding of job performance is beneficial for the evaluation and deeper understanding of leadership performance as well and accordingly plead for such differentiated performance outcomes to be accommodated in the structure of (leadership) competency models. the 18-factor structure of borman and brush (1993), for example, referencing training, coaching, developing subordinates and so on, is particularly attractive when considering how to differentiate the outcome component of leadership performance in terms of various (qualitatively distinctive) outcomes and is, moreover, aligned with contemporary conceptualisations of the role of leaders in organisations (e.g. servant leadership, sendjaya, sarros, & santora, 2008; the human capital developer role of the leadership code, ulrich et al., 2008, etc.) as well. a further possibility within the context of organisational performance, specifically, would be to explicate the various outcomes that leaders should be made responsible for in a business and to then map the competency hypotheses (or model) in relation to each outcome. such a framework would accordingly link a leadership competency model (or explanatory performance model) with an ‘in-series’ work unit (organisation, team or group) competency model, thereby articulating leadership performance from the perspective of what the (graduate) leader does and consequently achieves within the organisational system to constitute the enabling physical and psychological conditions that augment the performance of the work unit as a whole. regardless, the goal of leadership development cannot be fully attained if feedback on the development of competencies is not directed at specific leadership outcomes and vice versa, and competency models that treat leadership performance as an undifferentiated criterion embed this debility in industry. 
to accurately model this interaction and in line with extant research or thoughts on job performance theory that acknowledges both an outcome and behavioural (or process) component (borman & motowidlo, cited in borman & schmitt, 1993; campbell et al., cited in borman & schmitt, 1993; roe, cited in cooper & robertson, 1999) of job performance, we therefore suggest that an additional domain should be formally added to (leadership) competency models, namely a differentiated (competency) outcomes domain. a further, perhaps lesser, scourge of competency models that are negatively impacting on our leadership development efforts relates to the conceptualisation of competencies (see table 1). regardless of whether authors refer to competencies as skills, knowledge, abilities, values or behavioural repertoires, however, a specific disagreement exists here in one key matter, namely whether competencies refer to non-malleable factors such as traits or attributes, to malleable or learnable behaviour or both. yet as we have discussed earlier, many hr practitioners and consultants involved with leadership development do not shy away from a ‘mixing and matching’ approach by co-opting various competencies from various off-the-shelf competency libraries, and it is, therefore, very likely that one will encounter many competency models in use today that include a combination of both competencies framed as ‘innate’ performance constructs and competencies framed as ‘learned constructs’. in our opinion, the inclusion of non-malleable factors (i.e. innate traits or attributes) in a competency model (narrowly defined) confounds its use and poses an ethical dilemma, particularly within the context of leadership development. 
for one, if leadership development depends on feedback on the demonstration of competencies, and competencies are defined in terms of non-malleable constructs, then the feedback mechanism becomes moot as innate traits are quasi-impossible to learn or teach (buckingham & vosburgh, 2001; wortman, lucas, & donnellan, 2012). trait theory at least has demonstrated that personality traits (e.g. conscientiousness or extraversion), as one example of how the competency construct can be misconstrued, remain relatively stable over time (terracciano, mccrae, & costa, 2008) and we, therefore, question the effectiveness of competency models (narrowly defined) that articulate leadership competencies in terms of innate cognitive ability (tansley, harris, stewart, & turner, 2006), attitude or character (michaels, handfield-jones, & axelrod, 2001) on this basis. table 1: influential competency definitions. moreover, if antecedent variables such as personality traits are included under the banner of competencies for the purpose of leadership development, the entire endeavour could be seen to promote indoctrination, as the purpose then shifts to the conditioning of trainees to (in a sense) operate in a way that might be completely internally self-conflicting. for example, if a trainee naturally has a very humble disposition and leadership development training emphasises the demonstration of an ‘assertiveness’ competency, tacitly this implies behavioural conditioning (especially if rewards and salary increases are dependent on this) in that it prompts him or her to behave in ways that are contrary to or incompatible with who he or she really is. this naturally has ethical and moral implications for leadership development, which ultimately will affect the success of leadership development programmes as well. 
having said this, we acknowledge that many authors propose that knowledge, skills, attitudes, motives, beliefs, traits and other underlying characteristics should also (or should rather) be considered as competencies. however, our belief is that these factors should not be explicitly included under the definition of competencies, but rather be modelled as antecedents to competencies and as a qualitatively distinct category of latent variables forming part of a (broader and overarching) competency model. a too-encompassing definition of competencies precludes the possibility of utilising the distinction between latent variables characterising what the leader does and (antecedent) latent variables characterising who the leader is, for the purpose of explanation and measurement. instead, these factors should be regarded as an individual’s potential to perform certain behaviours (or master behavioural repertoires) well – and are thus argued to logically fall within a different, ‘up-stream’ domain of a leadership competency model, a domain we refer to as the competency potential domain. in following this line of reasoning, we further contend that competencies should thus be defined along the lines proposed by bartram (2005), namely that competencies ‘are sets of behaviours (that are influenced by competency potential variables and) that are instrumental in the delivery of the desired results or outcomes’ (p. 1187). this line of reasoning accordingly also implies a competency model constituted by a three-domain ‘in-series’ chain of variables that logically flows from competency potential to competencies, and finally to competency results or outcomes – that all combine into a more meaningful, persuasive job performance hypothesis. 
the way forward: the development of a comprehensive graduate leader competency model whilst there is a plethora of research available that rather disjointedly explains or describes the leadership phenomenon in terms of who the leader is, what the leader does or from the perspective of the processes by which leaders shape or influence followers, we target a unique perspective on the development of a ‘functional approach’ to leadership that unpacks all of the possible variables relevant to a leader’s enabling roles as a (senior) business manager in generic organisational settings, thus answering the call from a number of authors who have bemoaned the lack of depth in research in this particular area (carter et al., 2020; hogg & van knippenberg, 2003; howell & shamir, 2005; rosenbach, taylor, & yound, 2018). at the same time, the intention is to evolve contemporary practices of competency modelling by unpacking all the performance variables that are relevant to a (graduate) leader’s performance at work and carefully arranging these into a conceptually broader and shrewder (job performance hypothesis) framework reflecting our blueprint for a more advanced ‘competency model’, thereby marrying the traditional (narrowly defined) concept of competency models with the idea of a nomological network of performance constructs that can be simultaneously tested, an analysis for which structural equation modelling (sem) is ideally suited. as opposed to univariate and bivariate statistical techniques, which are limited in examining relationships between different constructs because they leave some interactions unexplained (crowley & fan, 1997), sem allows researchers to answer complex research questions and test multivariate models (weston & gore, 2006) by analysing different independent and dependent variables and their effects in a network simultaneously (nunkoo & ramkissoon, 2011). 
a new structure for (leadership) competency models: the nomological network in terms of a theoretical performance theory framework, we aim to utilise a progressive interpretation of competency modelling based on an expansion of bartram’s (2005) interpretation of a competency model to map the net of performance requirements for effective business leaders (see figure 1) onto the behaviours and outcomes constituting performance in a multi-domain job performance hypothesis (or competency or explanatory structural model). accordingly, leadership performance is conceptualised in terms of a structurally interrelated set of competencies, and outcome variables where the level of competence achieved is determined by a structurally interrelated network of competency potential variables. as depicted in figure 1, each set or domain in the competency model is, moreover, interpreted as representing a qualitatively distinct network of cause-and-effect variables in itself. figure 1: a graphical representation of a chain of cause-and-effect relationships between variables mapped in a three-domain competency model. according to this interpretation then, competency potential variables (referring to rather inflexible dispositions such as intelligence or different aspects of personality, and more malleable attainments such as knowledge or attitudes) are hypothesised to structurally affect competency variables (referring to more malleable behavioural patterns), which in turn, are hypothesised to affect competency results variables (referring to the actual outcomes of leadership behaviour within organisational contexts such as increased follower motivation or follower cohesion). 
however, this three-domain competency model still fails to acknowledge all of the relevant factors that impact leadership performance as employees do not act in a vacuum but operate within the broader work environment system that is characterised by certain ‘facilitators’ that will assist them in their efforts or indeed also ‘obstacles’ that might make it more difficult for them to behave or perform optimally. in this regard, bartram, robertson and callinan (cited in robertson & callinan, 2002) make reference to competency requirements as well as contextual and situational factors, with the former referring to some of the demands made upon employees to behave in certain ways or to avoid specific behaviours (e.g. the line manager setting goals for an employee) and the latter to other factors in the work environment that shape and direct an employee’s efforts and that ultimately affect his or her ability to demonstrate or produce the desired sets of behaviour (e.g. organisational structure, job characteristics, remuneration systems, etc.). consequently, it can firstly be argued that competency requirements (as influenced by an organisation’s strategy) can exert a main effect on the success with which competencies are displayed at work. it is secondly proposed that different latent variables that define the work environment can exert a main effect on the success with which competencies are displayed at work and also further moderate the impact of competency potential latent variables on the level at which competencies are displayed at work. similarly, it is argued that latent variables that define the work environment can exert a main effect on the outcome (i.e. competency results) latent variables as well as moderate the impact of competencies on outcomes. this line of reasoning is depicted in figure 2. figure 2: a five-domain representation of a competency model. 
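the main and moderating effects described above can be written out as a pair of structural equations. this is only a sketch under stated assumptions, not the authors' specification: the symbols are introduced here for illustration, each domain is collapsed into a single latent variable (whereas the text envisages a network of variables within every domain), and the moderating effects are represented as simple product terms.

```latex
% Assumed notation (not from the original article):
%   \xi_P  competency potential      \xi_R  competency requirements
%   \xi_S  situational characteristics of the work environment
%   \eta_C competencies              \eta_O competency results (outcomes)
\begin{align}
\eta_C &= \gamma_1 \xi_P + \gamma_2 \xi_R + \gamma_3 \xi_S
          + \gamma_4 (\xi_P \times \xi_S) + \zeta_C \\
\eta_O &= \beta_1 \eta_C + \gamma_5 \xi_S
          + \gamma_6 (\eta_C \times \xi_S) + \zeta_O
\end{align}
```

in this reading, \gamma_2 and \gamma_3 carry the main effects of competency requirements and the work environment on competencies, \gamma_4 carries the moderation of the competency potential effect, \gamma_5 the main effect of the work environment on outcomes and \gamma_6 the moderation of the effect of competencies on outcomes; \zeta_C and \zeta_O are structural error terms.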
the argument thus far assumes an essentially uni-directional, albeit complex, causal flow in which competency potential latent variables and situational characteristics affect the level of competence that is achieved on competencies, which in turn, affect the standards that are achieved on the outcome latent variables. it, however, seems unlikely that employees (and even possibly the nature of the organisational environment) will remain psychologically unaffected by the success or failures achieved on the outcome latent variables. for example, porter and lawler’s (1968) interpretation of the expectancy theory on motivation suggests that the psychological state of job satisfaction flows from job performance but at the same time also determines performance through its feedback effect on the expectancies and valences associated with performance and with performance outcomes. similarly, it is doubtful whether the psychological state of empowerment, which is defined as an active motivational orientation with regard to an individual’s work role emanating, at least in great part, from an individual’s feeling of being in control at work (boudrias, morin, & lajoie, 2014) can be achieved in the absence of acceptable (or above average to superior) work performance. the position that psychological states and other malleable competency potential variables (as well as malleable situational variables, like a high-performance culture) may in part develop through performance therefore needs to be captured through feedback loops from the competency and outcome domains to the competency potential domain. this line of reasoning is depicted in figure 3. figure 3: a five-domain representation of a competency model. 
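the difference between the uni-directional flow and the model with feedback loops can be illustrated by treating the domains as nodes in a directed graph. the sketch below is a hypothetical illustration only: the domain labels, the path lists and the function name are assumptions made here, the paths paraphrase the text rather than reproduce the figures, and moderating effects are left out because they are not simple domain-to-domain paths.

```python
from collections import defaultdict

# Hypothetical main-effect paths between domains, paraphrased from the text.
UNIDIRECTIONAL_PATHS = [
    ("competency_requirements", "competencies"),
    ("situational_characteristics", "competencies"),
    ("competency_potential", "competencies"),
    ("situational_characteristics", "outcomes"),
    ("competencies", "outcomes"),
]

# Hypothetical feedback loops back to the competency potential domain.
FEEDBACK_PATHS = [
    ("competencies", "competency_potential"),
    ("outcomes", "competency_potential"),
]


def has_feedback_loop(paths):
    """Return True if the directed paths contain at least one cycle."""
    graph = defaultdict(list)
    for source, target in paths:
        graph[source].append(target)

    unvisited, in_progress, done = 0, 1, 2
    state = defaultdict(int)  # every node starts as unvisited (0)

    def visit(node):
        state[node] = in_progress
        for successor in graph[node]:
            if state[successor] == in_progress:
                return True  # reached a node already on the current path
            if state[successor] == unvisited and visit(successor):
                return True
        state[node] = done
        return False

    # Snapshot the keys first, because lookups may add empty entries.
    return any(state[n] == unvisited and visit(n) for n in list(graph))


print(has_feedback_loop(UNIDIRECTIONAL_PATHS))
print(has_feedback_loop(UNIDIRECTIONAL_PATHS + FEEDBACK_PATHS))
```

in sem terms, a path structure without cycles corresponds to a recursive model, whereas one with feedback loops corresponds to a non-recursive model, which typically places heavier demands on model identification and estimation.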
the integration of theory on leadership performance the research framework as depicted in figure 3 represents a comprehensive blueprint for the nomological network constituting the (graduate) leader competency model and defines broad causal pathways between various psychological domains – to be populated by richly connected and multiple antecedent, mediator and outcome variables – that will all draw from a large number of scattered, fragmented theories on leadership performance (e.g. contingency theory, trait theory, transformational theory, etc.) as well as job performance, to articulate a system of intertwined laws (cronbach & meehl, 1955), or an overarching theory that generates testable predictions about (graduate) leader performance. the idea is to set a standard and encourage its universal application and provide scaffolding to elicit future theory-building, thus establishing an evolving knowledge base from which manipulable factors can be identified in various domains (e.g. competency potential, competency and contextual variables) to enhance (graduate) leader performance. at the same time, the new theory will also investigate the leadership phenomenon within the context of work unit (organisation, function or team) performance in a way that matters to the organisation’s bottom line, thereby bridging across the different theories on leadership effectiveness to provide context to the leader’s role within organisational settings, a research agenda which to date has been rare (carter et al., 2020). in fact, despite the wealth of leadership performance literature at our disposal, to our knowledge, such a quantitative synthesis of the (graduate) leadership performance nomological network has not yet been conducted at all, at least not in modelling how leadership performance is interlinked with and embedded in the same nomological system as a work unit. 
in this regard, we intend to conceptually link the leadership competency model with an ‘in-series’ work unit competency model ‘down-stream’, where the competency outcome domains of the leadership competency model double as the competency potential domain of a work unit competency model, that is, the outcomes achieved by the graduate leader (i.e. competency results) simultaneously constitute the levels of malleable work unit competency potential (i.e. collective attitudes, psychological states, cohesion, communication flow, etc.) and the malleable work unit environment characteristics (via competency requirements and situational characteristics) so as to synergistically amplify the collective outcomes (i.e. competency results) eventually achieved by the work unit as a whole. such a model would ultimately constitute an a priori specification of sets of relations amongst competency potential, competency, competency outcome and contextual variables as antecedents, mediators and consequents, allowing for subsequent simultaneous tests of the leadership performance network via confirmatory factor analyses of the structural paths and measurement hypotheses implied by its structure by way of sem.4 finally, the explanatory model is to be tested and validated on gen y (leader) trainees in ascertaining precisely how this generation’s psycho-graphic attributes articulate with the psychological mechanism that regulates (graduate) leader performance in furthering our understanding of how we can ensure the availability of future leaders. for example, gen y’s reported need for rapid career growth (rheeder, 2015) and exposure to challenging and meaningful assignments (baruch, 2004) appear to be naturally compatible with the spirit of graduate acceleration programmes, which should work to the advantage of leadership development interventions that aim to accelerate fresh graduates’ transition into leadership positions in shorter periods of time. 
if this is true, then some of the traditional contextual variables (e.g. accelerating learning curve, incremental promotions, etc.) under which leadership development takes place would organically capitalise on this type of gen y (competency potential and competency) profile. yet, the reported transactional, medium- and short-term orientation of the modern graduate employee in terms of the psychological contracts they now enter with employers (beddingfield, 2005; kelley-patterson & george, 2002) as well as their demand for greater work-life balance (dwyer & azevedo, 2016) rather strikingly opposes such congruity and makes one wonder whether leadership development programmes that rely on many sacrifices from the side of trainees, including the time and effort associated with intensive and extra-curricular training and a longer-term commitment to stay on in one organisation, will gain traction with this generation at all. if the demand for greater work-life balance, for instance, contributes to gen y not being taken with the idea of becoming part of leadership development programmes in the first place, effort must be put into an investigation into whether other contextual variables (e.g. pay incentives, online self-learning, etc.) can be altered or introduced to counteract or compensate for this gen y (competency potential) preference. ultimately, the systematic exploration of this entire cause and effect system in terms of its compatibility with gen y and its implications for managing gen y is vital in leveraging this critical leadership resource for the country’s future. 
future aims we are arguing in favour of a comprehensive conceptualisation of the (graduate) leader performance construct that encompasses a structurally interrelated competency domain structurally interlinked with a structurally interrelated latent outcome domain as part of a larger explanatory structural model that will provide a valid description of the psychological mechanism that regulates differences in performance across (graduate) leaders. the purpose of the model is to inform proactive and reactive attempts to influence the performance levels of (graduate) leaders. such proactive and reactive interventions must focus on the competencies, the competency potential, the outcomes and the situational variables simultaneously. however, the development and testing of such a comprehensive (five domain) graduate leader competency model is a massive and ambitious undertaking and implies the development of several structural domain models, several to-be-tested hypotheses on how the variables in these different domains relate to each other, as well as the development of construct valid and unbiased measures of the behavioural competencies, competency potential, the outcomes of (graduate) leaders and the contextual variables that impact leadership performance at work, respectively. consequently, and in order to prioritise a comprehensive explication of one of these domains as the starting point for a future larger set of studies, a decision was made to focus on the (1) derivation of a structural model depicting the competency domain (behaviour) of graduate leader performance, (2) development of an instrument (the pienaar graduate leader competency questionnaire – or pglcq) that can be used to measure graduate leaders’ standing on these graduate leader competencies and (3) examining the psychometric properties of the pglcq. 
the competency domain of this broader competency model is the logical starting point as the explication of the domain positioned in the middle of the broader nomological network would yield important information to in future explicate the other two domains lying ‘up-stream’ (i.e. competency potential) and ‘down-stream’ (i.e. competency outcomes) of it, and inform hypotheses about the contextual variables that could facilitate or hinder performance on these competency, competency potential and competency outcome variables. however, despite the focus on the competency domain and the validation of the pglcq, it will nonetheless be necessary to explicate the partial competency model that maps the competencies on the competency results (outcomes) of graduate leader performance too, that is, one cannot hypothesise on the behaviours (i.e. the independent variables) required of leaders if one does not have clarity on what generic outcomes leaders should be responsible for in organisational settings (i.e. the dependent variables) and setting the benchmark for this first. consequently, the enabling physical and psychological conditions that would facilitate superior performance in collective groups (or organisations) and how we can merge this in coming to a meeting point with the literature on leadership (process or behavioural) performance will need to be factored into the new leadership performance theory as well. the research initiating questions associated with the explication of the graduate leader performance construct can consequently be framed as follows: what is the connotative meaning of the graduate (leader) performance construct interpreted behaviourally? what is the denotative meaning of the graduate (leader) performance construct interpreted behaviourally? does the pglcq5 utilising these denotations as stimuli provide a reliable and construct valid measure of the to-be-measured construct as constitutively defined? 
conclusion with the imminent retirement of the baby boomers and thus the loss of the world’s most senior management talent, a global leadership crisis is unfolding. human resource practitioners and researchers have a vital role to play in assisting industry to improve the attraction, selection, development and engagement of the newest generation to enter the workplace so as to maximise the potential of this critical leadership talent pool. although a mature field, the leadership discourse nonetheless remains fragmented and is limited in respect to valid theories on how leaders, as boundary spanners within organisational systems, can positively impact the fortunes of collective groups or teams. in south africa, the absence of such research is particularly disconcerting given the unique challenges our economy is facing and the wealth of untapped potential that can be unlocked by effective leadership. the lack of a coherent theory on leadership performance in organisational settings is also preventing us from gaining a more accurate picture of how our leadership development technology should be adapted to resonate with and capitalise on the competency potential and competency profiles of prospective gen y leaders in the country. consequently, in this article, we outlined a proposal for the development of a modern graduate leader performance construct as one hr solution for dealing with these challenges. we outlined a broad structure for research on all of the variables that impact on leadership performance by suggesting the use of an expanded form of a competency model that is merged with the concept of a nomological network to comprehensively model the psychological mechanism that regulates (graduate) leader performance at work. 
at the same time, the intention is to contextualise the leadership phenomenon by conceptualising the construct ‘in-series’ with a work unit competency model, thereby factoring in an explanation for the manner in which leaders can optimise and synergise team or group functioning, or performance at a collective (work unit) level. such a broader study that accurately maps the competency potential, competency, competency results and contextual variables, and the manner in which they simultaneously impact on leadership performance, is, however, ambitious and will be time-consuming. we therefore suggest that the explication of this entire graduate leader performance space be approached in phases, commencing with the explication of the competency domain. insight into this domain will not only provide, in the interim, a scaffolding with which to explicate the other domains in the comprehensive model but will also provide clarity on the competency set (measurements) that needs to be targeted for leadership development interventions, which is the main short-term gain to be derived from such a study. moreover, clarity on the competency set (measurements) underlying successful leadership will serve to inform the identification and utilisation of specific contexts and environments (i.e. fidelity; meyer, wong, timson, & prefect, 2012) that will make the transfer of training and learning more likely and effective.
acknowledgements
the content of this article is drawn directly from the phd research dissertation of dr jacques pienaar.
competing interests
the authors declare that they have no financial or personal relationships that may have inappropriately influenced them in writing this article.
authors’ contributions
j.s.p. and c.c.t. were responsible for conceptualisation, methodology, formal analysis, investigation, visualisation, project management, writing of the original draft, reviewing and editing thereof. c.c.t.
was the supervisor for this research spanning from 2016 to 2020 and assisted with resources.
ethical considerations
approval to conduct the study was obtained from stellenbosch university’s research ethics committee. no ethical certificate was required. this article followed all ethical standards for research without direct contact with human or animal subjects.
funding information
this research received no specific grants from any funding agency in the public, commercial or not-for-profit sectors.
data availability
data are available on secure servers at stellenbosch university. the purpose of this article is to argue the need for the development of a competency measure of graduate leader performance. the actual results (data) of the psychometric evaluation of the instrument will be provided in a future article (not necessarily to be published in ajopa).
disclaimer
the views and opinions expressed in this article are those of the authors and do not necessarily reflect the official policy or position of any affiliated agency of the authors.
references
arvey, r.d., & murphy, k.r. (1998). performance evaluations in work settings. annual review of psychology, 49, 141–168. https://doi.org/10.1146/annurev.psych.49.1.141 athey, t.r., & orth, m.s. (1999). emerging competency models for the future. human resource management, 38(3), 215–226. https://doi.org/10.1002/(sici)1099-050x(199923)38:3<215::aid-hrm4>3.0.co;2-w barrett, a., & beeson, j. (2002). developing business leaders for 2010. new york, ny: the conference board. bartram, d. (2004). assessment in organisations. applied psychology: an international review, 53(2), 237–259. https://doi.org/10.1111/j.1464-0597.2004.00170.x bartram, d. (2005). the great eight competencies: a criterion-centric approach to validation. journal of applied psychology, 90(6), 1185–1203. https://doi.org/10.1037/0021-9010.90.6.1185 bartram, d., robertson, i.t., & callinan, m. (2002). a framework for examining organisational effectiveness.
in i.t. robertson, m. callinan, & d. bartram (eds.), organisational effectiveness: the role of psychology (pp. 1–10). chichester: john wiley and sons. baruch, y. (2004). transforming careers: from linear to multidirectional career paths: organisational and individual perspectives. career development international, 9(1), 58–73. https://doi.org/10.1108/13620430410518147 beddingfield, c. (2005). transforming the roi of your graduate scheme. industrial and commercial training, 37(4), 199–203. bencsik, a., horváth-csikós, g., & tímea, j. (2016). y and z generations at workplaces. journal of competitiveness, 8(3), 99–106. https://doi.org/10.7441/joc.2016.03.06 borman, w.c., & brush, d.h. (1993). more progress toward a taxonomy of managerial performance requirements. human performance, 6(1), 1–21. https://doi.org/10.1207/s15327043hup0601_1 borman, w.c., & motowidlo, s.j. (1993). expanding the criterion domain to include elements of contextual performance. in w.c. borman & n. schmitt (eds.), personnel selection in organizations (pp. 71–98). san francisco, ca: jossey-bass publishers. boudrias, j., morin, a.j.s., & lajoie, d. (2014). directionality of the associations between psychological empowerment and behavioural involvement: a longitudinal autoregressive cross-lagged analysis. journal of occupational and organisational psychology, 87(3), 437–462. https://doi.org/10.1111/joop.12056 boyatzis, r.e. (1982). the competent manager: a model for effective performance. new york, ny: wiley-interscience. boyatzis, r.e., goleman, d., & rhee, k.s. (2000). clustering competence in emotional intelligence: insights from the emotional competence inventory. in r. bar-on & j.d.a. parker (eds.), the handbook of emotional intelligence: theory, development, assessment, and application at home, school, and in the workplace (pp. 343–362). new york, ny: jossey-bass. buckingham, m., & vosburgh, r.m. (2001). the 21st century human resources function: it’s the talent, stupid! 
human resource planning, 24(4), 17–23. campbell, j.p., mccloy, r.a., oppler, s.h., & sager, c.e. (1993). a theory of performance. in w.c. borman & n. schmitt (eds.), personnel selection in organisations (pp. 35–70). san francisco, ca: jossey-bass. campion, m.a., carr, l., fink, a.a., ruggerberg, phillips, g.m., & odman, r.b. (2011). doing competencies well: best practices in competency modelling. personnel psychology, 64(1), 225–262. https://doi.org/10.1111/j.1744-6570.2010.01207.x caraher, l. (2015). millennials & management: the essential guide to making it work at work. new york, ny: routledge. carter, d.r., cullen, k.l., jones, j.m., gerbasi, a., chrobot-mason, d., & nae, e.y. (2020). functional leadership in interteam contexts: understanding ‘what’ in the context of why? where? when? and who? the leadership quarterly, 31(1), 101378. https://doi.org/10.1108/cdi-10-2014-0143 chen, h.c., & naquin, s.s. (2006). an integrative model of competency development, training design, assessment center, and multi-rater assessment. advances in developing human resources, 8(2), 265–282. https://doi.org/10.1177/1523422305286156 chin, w.s., & rasdi, r.m. (2014). protean career development: exploring the individuals, organizational and job-related factors. asian social sciences, 10(21), 203. https://doi.org/10.5539/ass.v10n21p203 clarke, m. (2015). dual careers: the new norm for gen y professionals? career development international, 20(6), 562–582. https://doi.org/10.1108/cdi-10-2014-0143 combes, b. (2009). generation y: are they really digital natives or more like digital refugees? synergy, 7(1), 401–408. conger, j.a., & ready, d.a. (2004). rethinking leadership competencies. leader to leader, 2004(32), 41–47. https://doi.org/10.1002/ltl.75 connor, h., & shaw, s. (2008). graduate training and development: current trends and issues. education & training, 50(5), 357–365. https://doi.org/10.1108/00400910810889048 cook, v.s. (2016). engaging generation z students. 
retrieved from https://sites.google.com/a/uis.edu/colrs_cook/home/engaging-generation-z-students croft, l., & seemiller, c. (2017). developing leadership competencies. new directions for student leadership, 2017(156), 7–18. https://doi.org/10.1002/yd.20267 cronbach, l.j., & meehl, p.e. (1955). construct validity in psychological tests. psychological bulletin, 52(4), 281–302. https://doi.org/10.1037/h0040957 cross, r., ernst, c., & pasmore, b. (2013). a bridge too far? how boundary spanning networks drive organisational change and effectiveness. organisational dynamics, 42(2), 81–91. https://doi.org/10.1016/j.orgdyn.2013.03.001 crowley, s.l., & fan, x. (1997). structural equation modelling: basic concepts and applications in personality assessment research. journal of personality assessment, 68(3), 508–531. https://doi.org/10.1207/s15327752jpa6803_4 crumpacker, m., & crumpacker, j.m. (2007). succession planning and generational stereotypes: should hr consider age-based values and attitudes a relevant factor or passing fad? public personnel management, 3(4), 349–369. culiberg, b., & mihelič, k.k. (2016). three ethical frames of reference: insights into millennials’ ethical judgements and intentions in the workplace. business ethics: a european review, 25(1), 94–111. https://doi.org/10.1111/beer.12106 d’amato, a., & herzfeldt, r. (2008). learning orientation, organizational commitment and talent retention across generations: a study of european managers. journal of managerial psychology, 23(8), 929–953. https://doi.org/10.1108/02683940810904402 dwyer, r.j., & azevedo, a. (2016). preparing leaders for the multigenerational workforce. journal of enterprising communities: people and places in the global economy, 10(3), 281–305. https://doi.org/10.1108/jec-08-2013-0025 erickson, t. (2008). plugged in: the generation y guide to thriving at work. boston: harvard business press. erikson, t., alsop, r., nicholson, p., & miller, j. (2009). gen y in the workforce. 
harvard business review, 87(2), 43–49. eustace, a., & martins, n. (2014). the role of leadership in shaping organisational climate: an example from the fast-moving consumer goods industry. south african journal of industrial psychology, 40(1), 1–14. https://doi.org/10.4102/sajip.v40i1.1112 fay, d., & sonnentag, s. (2010). a look back to move ahead: new directions for research on proactive performance and other discretionary work behaviours. applied psychology, 59(1), 1–20. https://doi.org/10.1111/j.1464-0597.2009.00413.x fernandez-araoz, c., graysberg, b., & nohria, n. (2011). how to hang on to your high potentials. harvard business review, 81(2), 76–83. financial executives research foundation (ferf). (2016). creating a leadership pipeline: developing the millennial generation into finance leaders. morristown: financial executives research foundation robert half. francis, t., & hoefel, f. (2018). ‘true gen’: generation z and its implications for companies. mckinsey&company. retrieved august 1, 2020 from https://www.mckinsey.com/industries/consumer-packaged-goods/our-insights/true-gen-generation-z-and-its-implications-for-companies gilbert, j. (2011). the millennials: a new generation of employees, a new set of engagement policies. ivey business journal, 75(5), 26–28. glass, a. (2007). understanding generational differences for competitive success. industrial and commercial training, 39(2), 98–103. https://doi.org/10.1108/00197850710732424 gordon, a., & yukl, g. (2004). the future of leadership research: challenges and opportunities. german journal of human resource research, 18(3), 359–365. graen, g.b., & schiemann, w.a. (2013). leadership-motivated excellence theory: an extension of lmx. journal of managerial psychology, 28(5), 452–469. https://doi.org/10.1108/jmp-11-2012-0351 hackman, j.r., & wageman, r. (2005). when and how team leaders matter. research in organisational behaviour, 26, 37–74. 
https://doi.org/10.1016/s0191-3085(04)26002-6 hagemann, b., & stroope, s. (2013). developing the next generation of leaders. industrial and commercial training, 45(2), 123–126. hanson, a.r., & gulish, a. (2016). from college to career: making sense of the post-millennial job market. the george town public policy review, 21(1), 1–22. hershatter, a., & epstein, m. (2010). millennials and the world of work: an organisation and management perspective. journal of business and psychology, 25, 211–223. https://doi.org/10.1007/s10869-010-9160-y hills, c., ryan, s., smith, d.r., & warren-forward, h. (2012). the impact of ‘generation y’ occupational therapy students on practice education. australia occupational therapy journal, 63(6), 391–398. hira, n. (2007). you raised them now manage them. fortune, 155(9), 38–48. hogan, r., & kaiser, r.b. (2005). what we know about leadership. review of general psychology, 9(2), 169–180. https://doi.org/10.1037/1089-2680.9.2.169 hogg, m.a., & van knippenberg, d. (2003). social identity and leadership processes in groups. in m.p. zanna (ed.), advances in experimental social psychology (vol. 35, pp. 1–52). san diego: elsevier academic press. holmes, a., & miller, s. (2000). a case for advanced skills and employability in higher education. journal of vocational education and training, 52, 653–664. https://doi.org/10.1080/13636820000200145 holmes, l. (2013). competing perspectives on graduate employability: possession, position, or process? studies in higher education, 38(4), 538–554. howell, j.m., & shamir, b. (2005). the role of followers in the charismatic leadership process: relationships and their consequences. academy of management review, 30(1), 96–112. https://doi.org/10.5465/amr.2005.15281435 hurst, j.l., & good, l.g. (2009). generation y and career choice: the impact of retail career perceptions, expectations and entitlement perceptions. career development international, 14(6), 570–593. jonck, p., van der walt, f., & ntomzodwa, c.s. 
(2017). a generational perspective on work values in a south african sample. sa journal of industrial psychology, 43(1), 1–9. https://doi.org/10.4102/sajip.v43.1393 karefalk, a., petterson, m., & zhu, y. (2007). how to motivate generation y with different cultural backgrounds. unpublished doctoral dissertation. kristianstad: kristianstad university. kelley-patterson, d., & george, c. (2002). mapping the contract: an exploration of the comparative expectations of graduate employees and human resource managers within the hospitality, leisure and tourism industries in the united kingdom. journal of services research, 2(1), 55–74. kleynhans, e.p.j. (2006). the role of human capital in the competitive platform of south african industries. sa journal of human resource management, 4(3), 55–62. kragt, d., & guenter, h. (2018). why and when leadership training predicts effectiveness: the role of leader identity and leadership experience. leadership & organisational development journal, 39(3), 406–418. kupperschmidt, b.r. (2000). multigeneration employees: strategies for effective management. the health care manager, 19(1), 65–76. https://doi.org/10.1097/00126450-200019010-00011 lacey, m.y., & groves, k. (2014). talent management collides with corporate social responsibility: creation of inadvertent hypocrisy. journal of management development, 33(4), 399–409. landrum, s. (2016). how millennials are changing how we view success. retrieved from http://www.forbes.com/sites/sarahlandrum/2016/12/30/how-millennials-are-changing-how-we-view-success/ lichtenstein, b.b., uhl-bien, m., marion, r., seers, a., orton, j.d., & schreiber, c.j. (2006). complexity leadership theory: an interactive perspective on leading in complex adaptive systems. emergence, complexity, and organisations, 8(4), 2–12. liden, r.c., & antonakis, j. (2009). considering context in psychological leadership research. human relations, 62(11), 1587–1605. https://doi.org/10.1177/0018726709346374 mangelsdorf, m.
(2015). von babyboomer bis generation z: der richtige umgang mit unterschiedlichen generationen im unternehmen. offenbach: gabal. martin, c.a., & tulgan, b. (2011). managing generation y: global citizens born in the late seventies and early eighties. amherst: hrd press. markus, l.h., cooper-thomas, h.d., & allpress, k.n. (2005). confounded by competencies? an evaluation of the evolution and use of competency models. new zealand journal of psychology, 34(2), 117–126. martin, c.a. (2005). from high maintenance to high productivity: what managers need to know about generation y. industrial and commercial training, 37(1), 39–44. mayer, c.h., & louw, l. (2011). managerial challenges in south africa. european business review, 23(6), 572–591. https://doi.org/10.1108/09555341111175417 mccracken, m., currie, d., & harrison, j. (2016). understanding graduate recruitment, development and retention for the enhancement of talent management: sharpening ‘the edge’ of graduate talent. international journal of human resource management, 27(22), 2727–2752. https://doi.org/10.1080/09585192.2015.1102159 mccrindle, m. (2006). new generations at work: attracting, recruiting, retraining, and training generation y. new south wales: mccrindle research. meyer, g.f., wong, l.t., timson, e., & prefect, p. (2012). objective fidelity evaluation in multisensory virtual environments: auditory cue fidelity in flight simulation. plos one, 7(9), 1–14. https://doi.org/10.1371/journal.pone.0044381 michaels, e., handfield-jones, h., & axelrod, b. (2001). the war for talent. brighton, ma: harvard business press. miner, n. (2019). as baby boomers near retirement, companies risk a leadership shortage. retrieved from http://www.forbes.com/sites/forbescoachescouncil/2019/10/15/as-baby-boomers-near-retirement-companies-risk-a-leadership-shortage/?sh=502efbc51f9a moldoveanu, m., & narayandas, d. (2019). the future of leadership development. harvard business review, 97(4), 40–48. naim, m.f., & lenka, u. 
(2018). how does mentoring contribute to gen y employees’ intention to stay? an indian perspective. europe’s journal of psychology, 13(2), 314–335. https://doi.org/10.5964/ejop.v13i2.1304 nkomo, s.m., & kriek, d. (2011). leading organisational change in the ‘new’ south africa. journal of occupational and organisational psychology, 84(3), 453–470. https://doi.org/10.1111/j.2044-8325.2011.02020.x nunkoo, r., & ramkissoon, h. (2011). residents’ satisfaction with community attributes and support for tourism. journal of hospitality & tourism research, 35(2), 171–190. olšovská, a., mura, l., & švec, m. (2015). the most recent legislative changes and their impact on interest by enterprises in agency employment: what is next in human resource management? problems and perspectives in management, 13(3), 47–54. osborn, r., uhl-bien, m., & milosevic, i. (2014). the context and leadership. in d. day (ed.), the oxford handbook of leadership and organisations (pp. 589–612). oxford: oxford university press. park, y., & rothwell, w.j. (2009). the effects of organizational learning climate, career-enhancing strategy, and work orientation on the protean career. human resource development international, 12(4), 387–405. https://doi.org/10.1080/13678860903135771 parry, s.b. (1998). just what is a competency? (and why should you care?). training, 35, 58–64. pauw, k., bhorat, h., goga, s., ncube, l., oosthuizen, m., & van der westhuizen, c. (2006). graduate unemployment in the context of skills shortages in education and training: findings from a firm survey. retrieved from http://open.uct.ac.za/bitstream/handle/11427/7345/dpru_wp06-115.pdf?sequence=1 peterson, r.s., smith, d.b., martorana, p.v., & owens, p.d. (2003). the impact of chief executive officer personality on top management team dynamics: one mechanism by which leadership affects organizational performance. journal of applied psychology, 88(5), 795–808. https://doi.org/10.1037/0021-9010.88.5.795 pihlak, k. (2018).
the relationship between employer brand and intention to apply. published honours dissertation. manchester: university of manchester. porter, l.w., & lawler, e.e. (1968). managerial attitudes and performance. homewood, il: dorsey. porter, l.w., & mclaughlin, g.b. (2006). leadership and the organisational context: like the weather? leadership quarterly, 17(6), 559–576. https://doi.org/10.1016/j.leaqua.2006.10.002 prensky, m. (2001). digital natives, digital immigrants. on the horizon, 9(5), 1–6. https://doi.org/10.1108/10748120110424816 pwc. (2011). millennials at work: reshaping the workplace report. retrieved august 1, 2020 from https://www.pwc.com/co/es/publicaciones/assets/millennials-at-work.pdf rasool, f., & botha, c.j. (2011). the nature, extent and effect of skills shortages on skills migration in south africa. sa journal of human resource management, 9(1), 1–12. rheeder, k. (2015). leadership it is a changing! [paper presentation]. international conference on economic, finance and management outlooks, 27–28 july 2015. bangkok, thailand. risher, h. (2008). planning and managing tomorrow’s pay programs: demographic shifts may trigger deep changes in traditional pay plans. compensation and benefits review, 40(4), 30–36. https://doi.org/10.1177/0886368708321524 roe, r.a. (1999). work performance: a multiple regulation perspective. in c.l. cooper, & i.t. robertson (eds.), international review of industrial and organisational psychology (vol. 14, pp. 231–335). chichester: wiley. rosenbach, w.e., taylor, r.l., & youndt, m.a. (2018). contemporary issues in leadership (7th ed.). new york: routledge. schwab, k., & world economic forum. (2019). global competitiveness report 2019. retrieved august 1, 2020 from http://www3.weforum.org/docs/wef_theglobalcompetitivenessreport2019.pdf seemiller, c., & grace, m. (2019). generation z: a century in the making. oxon: routledge. sendjaya, s., sarros, j.c., & santora, j.c. (2008).
defining and measuring servant leadership behaviour in organisations. journal of management studies, 45(2), 402–424. serpico, d. (2018). what kind of kind is intelligence? philosophical psychology, 31(2), 232–252. https://doi.org/10.1080/09515089.2017.1401706 sharma, l. (2012). generation y at workplace. human resource management, v, 74–78. silvestri, r.f. (2013). building leaders through planned executive development. leader to leader, 2013(68), 19–26. https://doi.org/10.1002/ltl.20070 smith, s.d., & galbraith, q. (2012). motivating millennials: improving practices in recruiting, retaining, and motivating younger library staff. the journal of academic librarianship, 38(3), 135–144. spencer, l.m., & spencer, s.m. (1993). competence at work: models for superior performance. new york, ny: john wiley & sons. squyres, d. (2020). prioritising the forgotten generation: why organisations should make boomers a key part of their talent acquisition strategy in 2020. strategic hr review, 19(3), 99–102. stuss, d.t. (2011). functions of the frontal lobes: relation to executive functions. journal of the international neuropsychological society, 17, 1–7. https://doi.org/10.1017/s1355617711000695 sylvester, j. (2015). creating and maintaining an engaged generation y workforce – why it matters and what to do. strategic hr review, 14(4), 151. tansley, c., harris, l., stewart, j., & turner, p. (2006). talent management: understanding the dimensions. change agenda. london: chartered institution of personnel and development (cipd). terracciano, a., mccrae, r.r., & costa, p.t. (2008). personality traits: stability and change with age. geriatrics and aging, 11(8), 474–478. toor, s.r., & ofori, g. (2008). leadership vs. management: how they are different and why! journal of leadership and management in engineering, 8(2), 61–71. https://doi.org/10.1061/(asce)1532-6748(2008)8:2(61) ulrich, d., smallwood, n., & sweetman, k. (2008). the leadership code: five rules to lead by. 
boston, ma: harvard business press. van der merwe, l., & verwey, a. (2007). leadership meta-competencies for the future world of work. sa journal of human resource management, 5(2), 33–41. https://doi.org/10.4102/sajhrm.v5i2.117 van der wal, z. (2017). the 21st century public manager: the public management and leadership series. london: palgrave. vanmeter, r.a., douglas, g.b., chonko, j.w., & roberts, j.a. (2013). generation y’s ethical ideology and its potential workplace implications. journal of business ethics, 117(1), 93–109. https://doi.org/10.1007/s10551-012-1505-1 van vugt, m. (2006). evolutionary origins of leadership and followership. personality and social psychology review, 10(4), 354–371. van vugt, m., & ronay, r. (2013). the evolutionary psychology of leadership: theory, review, and roadmap. organisational psychology review, 4(1), 74–95. https://doi.org/10.1177/2041386613493635 weston, r., & gore, p.a. jr. (2006). a brief guide to structural equation modelling. the counselling psychologist, 34(5), 719–751. https://doi.org/10.1177/0011000006286345 wong, m., lang, w., gardiner, e., & coulon, l. (2008). generational differences in personality and motivation: do they exist and what are the implications for the workplace? journal of management psychology, 23(8), 878–890. https://doi.org/10.1108/02683940810904376 wortman, j., lucas, r.e., & donnellan, m.b. (2012). stability and change in the big five personality domains: evidence from a longitudinal study of australians. psychology and aging, 27(4), 867–874. https://doi.org/10.1037/a0029322 yrle, a.c., hartman, s.j., & payne, d.m. (2005). generation x: acceptance of others and teamwork implications. team performance management, 11(516), 188–199. https://doi.org/10.1108/13527590510617765 zaccaro, s., & klimoski, r. (2001). the nature of organizational leadership: an introduction. san francisco, ca: jossey-bass. zang, t., lu, c., & murat, k. (2017). engaging generation y to co-create through mobile technology. 
international journal of electronic commerce, 21(4), 489–516. https://doi.org/10.1080/10864415.2016.1355639
footnotes
1. authors differ on the exact generational timelines, and many hold contrasting opinions. for example, in their research, francis and hoefel (2018) label the generation born between 1995 and 2010 as gen z. for this article, however, such precise detail and differences in opinion are of secondary relevance.
2. apart from academic capabilities and degrees, employers these days expect graduates to display ability on competencies that are not directly related to functional (or vocational) task competencies and that will facilitate a prompt and successful transition from higher education (holmes & miller, 2000). this expectation follows from the new organisational and technological work models that have evolved (e.g. lean production, the internally flexible organisation, the learning organisation), which impose fundamental shifts in the working competencies required by the traditional organisation. many young graduates lack competence on these more generic competencies (i.e. the graduate employability dilemma).
3. we first target the development of a competency questionnaire for future graduate leaders. as will be explained in more detail later in the article, the development of such a questionnaire is necessary first in order to collect data with which we can validate the internal nomological validity of our new proposed competency model. once the hypothesised internal structure has been validated, this will open the possibility of generalising the findings to a competency tool that will have numerous applications in the leadership development space.
4. leaders are responsible not only for the performance of their work unit as a collective but also for the performance of their individual followers.
a similar sequential linkage exists between the proposed graduate leader competency model and an individual employee competency model, where the malleable individual employee competency potential latent variables are simultaneously latent outcome variables in the graduate leader competency model.
5. the pglcq, which measures the level of competence that graduates achieve on the graduate competencies that constitute success, will form the first subscale of an eventual two-scale graduate leader performance battery (glpb). the second subscale of the glpb will be the graduate leader outcome questionnaire (gloq), which will measure the graduate (leader) outcomes achieved at work. this second scale will be developed as part of a future study.