The Wechsler Scales

Published in 1939, the Wechsler-Bellevue Adult Intelligence Scale was developed and standardised by David Wechsler as an alternative to the Stanford-Binet, with the clear purpose of measuring verbal and non-verbal intellectual ability at the same time. The first major revision of the Wechsler-Bellevue was published in 1955 as the Wechsler Adult Intelligence Scale (WAIS). Further revisions resulted in the publication of the WAIS-R in 1988 and the Wechsler Adult Intelligence Scales, Third edition (WAIS-III) in 1997. During the latter and most recent revision, the test materials, item content and administration procedures were updated and three new subtests (Matrix Reasoning, Letter-Number Sequencing and Symbol Search) were added to the 11 subtests retained from the WAIS-R (Nell, 1999; Wechsler, 1997).

While Full Scale, Verbal and Performance IQ scores are still computed, the third edition of the test also provides for grouping the scores of the subtests into more precise domains of cognitive functioning, namely: Verbal Comprehension Index, Perceptual Organisation Index, Working Memory Index, and Processing Speed Index. The introduction of the option of index scores aligns the WAIS-III with advances in neuropsychology and cognitive psychology and reduces testing time, as only those subtests necessary to evaluate a particular domain need to be administered (Nell, 1999; Wechsler, 1997).

Although dialogue continues about the atheoretical nature of all editions of the Wechsler Intelligence Scales, they continue to be the most widely accepted and administered intelligence scales internationally, and there is little indication that their popularity will diminish significantly in the foreseeable future (Sparrow & Davies, 2000).
The Wechsler Intelligence Scales in South Africa

In South Africa, attention was initially focused on the adaptation of the Wechsler-Bellevue for the needs of South Africans, even though the test materials were not available in this country and data had to be obtained from Wechsler’s manual, verbal descriptions and unscaled drawings (Huysamen, 1996; Nell, 1994). The adaptation began in 1947 and was initially undertaken by the Bureau of Personnel Research, which became the National Institute of Personnel Research (NIPR) in 1948. Normative sampling across English and Afrikaans language groups began in the 1950s. When the first major revision of the Wechsler-Bellevue was published as the Wechsler Adult Intelligence Scale in the United States of America in 1955, the NIPR found itself halfway through an expensive adaptation and norming exercise of the original measure. Instead of abandoning its adaptation of the Wechsler-Bellevue and concentrating on standardising the newly published WAIS, the NIPR decided to continue with the standardisation of the Wechsler-Bellevue. The wisdom of this decision has been seriously questioned (Nell, 1994).

The NIPR finally completed the adaptation and standardisation of the outdated Wechsler-Bellevue Scales for South Africa in 1969 and the adapted test was published as the South African Wechsler Adult Intelligence Scale (SAWAIS). Despite the fact that most of the items were nearly three decades old, the South African version was named after the extensively revised version, the WAIS (Nell, 1994; Pieters & Louw, 1987). Although the finished product diverged considerably from the original Wechsler-Bellevue on which it was based, it was not the WAIS, and did not incorporate the comprehensive modifications made to the Wechsler-Bellevue in 1955.
CRITICALLY EXAMINING LANGUAGE BIAS IN THE SOUTH AFRICAN ADAPTATION OF THE WAIS-III

CHERYL D FOXCROFT
SUSAN ASTON
cheryl.foxcroft@nmmu.ac.za
Higher Education Access & Development Services
Nelson Mandela Metropolitan University

SA Journal of Industrial Psychology, 2006, 32 (4), 97-102
SA Tydskrif vir Bedryfsielkunde, 2006, 32 (4), 97-102

ABSTRACT
In response to the growing demand for a test of cognitive ability for South African adults, the Human Sciences Research Council (HSRC) adapted the Wechsler Adult Intelligence Scales, third edition (WAIS-III) for English-speaking South Africans. The standardisation sample included both first and second language English speakers who were educated largely in either English or Afrikaans. The purpose of this article is to critically examine the adaptation process undertaken by the HSRC when standardising the WAIS-III for English-speaking South Africans by deliberating whether sufficient attention was paid to establishing if the measure was equivalent for various groups of English first and second language test-takers. In performing this critical examination, international test adaptation guidelines and standards, psychometric conventions, and national and international research findings were considered. The general conclusion reached was that the equivalence of the WAIS-III across diverse language groups has not been unequivocally established and there are indications that some bias may exist for English second language test-takers, especially if they are black or Afrikaans-speaking. Based on these conclusions, recommendations are made regarding the way forward.

Key words
Wechsler Adult Intelligence Scale, WAIS-III, language, test adaptation, test bias

The unintentional misnaming of the measure by the NIPR led to a belief among practising psychologists that the South African adapted Wechsler-Bellevue was, in fact, the WAIS, and this reduced the pressure to rapidly undertake another revision. In 1987, Pieters and Louw publicly criticised the South African version of the Wechsler-Bellevue and exhorted the scientific, research and education communities to urgently consider replacing or re-standardising the test. Their critique cast aspersions on the quality of professional instruction in the field of psychology at local universities, and on the efficacy of the Professional Board for Psychology and the then Test Commission of the Republic of South Africa, who were responsible for ensuring that standards in psychometric testing were maintained at international levels (Nell, 1994). In addition, Pieters and Louw (1987) drew attention to the obligation of the test publisher (the NIPR, which had subsequently been incorporated into the Human Sciences Research Council or HSRC) to clearly communicate to potential SAWAIS users and buyers that the measure was based on the Wechsler-Bellevue and not on the WAIS, and that the statistical properties of the SAWAIS were largely unknown. At that stage the norms of the SAWAIS were so outdated that the scores were no longer valid predictors of deficits or normality (Nell, 1994).

As the SAWAIS became increasingly dated, criticism of its continued use escalated (Claassen, Krynauw, Holtzhausen, & Mathe, 2001a). Questions were raised and extensive evidence was collected, the result being a clear indication that the ongoing administration of the South African version of the Wechsler-Bellevue was not in the best interests of the public (Nell, 1994).
In some cases practising psychologists resorted to using the USA revision of the WAIS-R, which had no South African norms, or the revised Senior South African Individual Scale (SSAIS-R), which had only been normed for school-going children (Claassen et al., 2001a). During the academic boycott period in the late 1980s and early 1990s, the Psychological Corporation rejected attempts by the HSRC to initiate standardisation of the WAIS-R for South Africans. However, once sanctions and boycotts were lifted after the demise of Apartheid and the formation of a democratic South Africa in 1994, the HSRC signed a contract with the Psychological Corporation in December 1997 to adapt and standardise the WAIS-III for English-speaking South Africans (Claassen, Krynauw & Holtzhausen, 2000).

The adaptation of the WAIS-III for English-speaking South Africans

The primary objective of the process of adapting the WAIS-III was to add or adapt items that were considered to be more relevant to the South African context and to develop norms for English-speaking South Africans from the four main cultural groups, namely, blacks, coloureds, Indians, and whites (Claassen et al., 2001a). In view of the trend towards the use of English by urbanised African-language speakers, the advisory committee suggested that each of the four cultural groups should constitute 25% of the standardisation sample. Allocating 25% to each group also facilitated investigation into differential item functioning (DIF) (Claassen et al., 2001a). Furthermore, given that both first and second language English speakers were included in the standardisation sample, it was important to establish the English proficiency levels of the participants. Consequently, all participants were required to write a short English test, which made it possible to compare the levels of English proficiency between the groups. Both quantitative and qualitative methods were used to identify which items needed to be adapted or replaced.
The original items were administered to a multicultural sample of black (N=165), coloured (N=230), Indian (N=191) and white (N=203) adults. DIF analyses were performed to guide the evaluation of each item. Items that were found to be biased against non-white South Africans were replaced. From a qualitative perspective, items in the Information and Comprehension subtests that appeared at face value to be foreign to South African cultures were replaced with more relevant items (Claassen et al., 2001a). In addition, the opinions of 11 knowledgeable experts from all cultural groups were sought regarding the suitability of each item in the Vocabulary, Information and Comprehension tests for English-speaking South Africans (Claassen et al., 2001a). Details of the comments of the expert review group have not been reported in the technical report, nor are any indications provided as to how comments were incorporated into the adaptation of the items. This omission casts doubt on the significance accorded to qualitative data by the test adaptors. In addition, no indication is provided whether the qualitative and quantitative data were used in combination to determine whether an item should have been adapted. The lack of reporting by the project team on the qualitative data suggests that the process of standardising the WAIS-III relied heavily on quantitative information. According to the task group assigned by the HSRC to adapt the measure to the South African context, very little item bias was detected, resulting in minimal changes to the original test. Modifications were made to the Vocabulary, Information, Arithmetic and Comprehension subtests. Readers are referred to Claassen et al., (2001a) and Claassen, Krynauw, Paterson and Mathe (2001b) for a detailed discussion of the specific changes made. 
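The technical report's DIF method is not specified in this article, so the following is only an illustrative sketch of one widely used approach, the Mantel-Haenszel common odds ratio, and not a reconstruction of what the task group did. Test-takers are stratified by total score, and within each stratum the odds of a correct response are compared between a reference and a focal group. The counts below are hypothetical, as are the function names.

```python
import math

def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio across ability strata.

    Each stratum is a tuple of counts:
    (reference correct, reference incorrect, focal correct, focal incorrect).
    A value near 1 suggests no DIF; a value above 1 means the item
    favours the reference group after matching on ability.
    """
    numerator = 0.0
    denominator = 0.0
    for ref_right, ref_wrong, foc_right, foc_wrong in strata:
        n = ref_right + ref_wrong + foc_right + foc_wrong
        if n == 0:
            continue  # skip empty strata
        numerator += ref_right * foc_wrong / n
        denominator += ref_wrong * foc_right / n
    return numerator / denominator

def mh_delta(odds_ratio):
    """Rescale the odds ratio to the ETS delta metric (negative favours the reference group)."""
    return -2.35 * math.log(odds_ratio)

# Hypothetical counts for one item in three total-score bands (low, middle, high).
hypothetical_strata = [
    (10, 20, 5, 25),
    (20, 15, 12, 23),
    (30, 5, 24, 11),
]

alpha = mantel_haenszel_odds_ratio(hypothetical_strata)
print(f"MH odds ratio = {alpha:.3f}, MH delta = {mh_delta(alpha):.2f}")
```

An item flagged in this way would still need qualitative review before a decision to adapt or replace it, which is the combination of evidence the article argues was not reported.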
Unfortunately, after adapting certain items that proved to be biased, the new items were not re-piloted in order to determine the possible existence of continuing cultural or language bias. This requirement is stipulated in the guidelines for test adaptation published by the International Test Commission (ITC) (International Test Adaptation Guidelines [On-line], 2000). Guideline D.4, Test Development and Adaptation, states that “Test developers/publishers should provide evidence that item content and stimulus materials are familiar to all intended populations.” Although the measure was standardised ostensibly only for English-speaking South Africans, the test adaptation team needed to be conscious of the fact that culturally diverse backgrounds as well as differing levels of English proficiency among test-takers would have a marked impact on the understanding of and familiarity with English terms and on test performance. As will be argued in the next section, there is some doubt whether sufficient attention was paid to this matter during the adaptation process and whether it can be confidently asserted that the South African adaptation of the WAIS-III is not biased against second language English speakers.

Impact of language on test performance

Language as a mediator of test performance

Language is one of the parameters along which cultures vary. In South Africa, 11 official languages are recognised: nine African languages, Afrikaans and English. English-speaking learners are educated through the medium of English, while in most cases African language learners are educated in their home language until they reach Grade 4 and thereafter mainly in English. Most Afrikaans-speaking learners are educated in Afrikaans, and study English as a subject at school.
Consequently, there is a widely held view in South Africa that as the language of learning from Grade 4 onwards is either English or Afrikaans, test performance will not be negatively affected if test-takers are assessed in their language of teaching and learning (Claassen et al., 2001b). Furthermore, the dominant language in business and industry is English. Consequently, when administering an individual intelligence scale like the WAIS-III, psychologists often argue that it is justifiable to administer the measure in English, irrespective of whether English is the first or second language of the test-taker, as it is important that test-takers can demonstrate their ability to perform test tasks in the language that will be used in the workplace (Koch, 2005). However, Koch (2005) is critical of this approach for two reasons. First, it assumes that the scores on the measure are comparable across language groups. Second, it ignores the fact that language can be a ‘nuisance factor’ that impacts on the test performance of English second language speakers.

Language may be the most important mediator of test performance, especially when the language in which the measure is administered is not the home language of the test-taker. Concepts may be denied or, alternatively, made available to test-takers depending on whether they are native or non-native speakers of the language of administration (Nell, 1994). The use of colloquial or archaic language in test items can lead to misunderstanding and miscommunication by test-takers, which may ultimately influence scores negatively (Nell, 1999). Herbst and Huysamen (2000) found that environmentally disadvantaged children assessed in a language other than that spoken at home performed at a significantly lower level than those who were assessed in their mother tongue. Items involving verbal comprehension were found to be biased against test-takers who spoke an African language at home, even though they had been exposed to English on a daily basis.

Language usage and reading ability can significantly impact on test scores when measures are administered in languages or with cultures other than those for which the test has been standardised. Individuals who do not read test items accurately and those who fail to understand the content of a test item are more likely to respond incorrectly (Hinkle, 1994; Shuttleworth-Edwards, Kemp, Rust, Muirhead, Hartman & Radloff, 2004). Additionally, it has been found that language is one of the primary influences on intelligence test performance, and can have a significant negative impact when a test is administered in the test-taker’s second or third language (Nell, 1999).

Language and performance on the South African adaptation of the WAIS-III

The task team responsible for the adaptation of the WAIS-III for English-speaking South Africans stated that “it is unlikely that performance in some of the performance subtests will be adversely affected by the language used by the tester, even if the subject has only a limited command of that language, providing he/she has clarity on what is expected of him/her” (Claassen et al., 2001a, p. 8). This comment is contrary to the views and findings of Koch (2005), Herbst and Huysamen (2000), Hinkle (1994), Nell (1994, 1999) and Shuttleworth-Edwards et al. (2004) expressed above. While test-takers whose first language is not English may understand the wording of items, the interpretation of meaning varies significantly across cultures and between first and second language English speakers, and may well impact negatively on test scores.
Aston (2006) obtained the views of psychology professionals in the Eastern and Western Cape on each of the WAIS-III subtests regarding whether items were potentially problematic (biased) for English, Afrikaans and Xhosa speakers. Her results indicated that test-takers experienced problems with the language of certain of the verbal and performance subtests. When it came to the performance subtests, the participants reported that Xhosa- and Afrikaans-speaking test-takers were confused by the wording used in the instructions of the Picture Completion, Digit Symbol-Coding, and Block Design subtests. The test users themselves experienced difficulty with the instructions of the Matrix Reasoning subtest and recommended that these be simplified as well as translated into Afrikaans and Xhosa in order to facilitate administration of the subtest. Aston’s (2006) findings suggest that the test adaptation team should have spent more time qualitatively and quantitatively evaluating the performance subtests and their instructions and possibly introducing some adaptations. Hambleton and De Jong (2003) concur with this view in that they argue that producing a test which is linguistically appropriate for use in more than one language involves adaptation of both verbal and non-verbal tasks.

What information did the test adaptation team provide regarding whether second language English speakers are disadvantaged by taking the measure in English? To their credit, the team performed various investigations and analyses to explore the factor structure and compare the performance of various cultural and language groups. Factor structures were derived using principal factor analysis with varimax and oblique rotation for blacks, coloureds, Indians and whites. In addition, factor structures were derived for blacks with an African language as a mother tongue, for Afrikaans-speakers who work in an English environment, and for Afrikaans-speakers who speak Afrikaans at work.
Claassen et al. (2001a and b) reported that the solutions derived supported the four index scores for English-speaking coloureds, Indians and whites. However, the solutions derived for black English speakers and African language speakers as well as for the two Afrikaans-speaking groups were more mixed and less satisfactory. In particular, Claassen et al. (2001a and b) concluded that there was weak support for the structure of the index scores for the black samples. The findings of the factor analysis raise questions regarding the equivalence of the WAIS-III across cultural and language groups and for test-takers whose home language is not English but who take the test in English. It is a pity that Claassen et al. (2001a and b) did not statistically compare the similarity of the factor structures for the various groups by, for example, computing coefficients of congruence. This would have provided empirical information regarding the equivalence (similarity) of the factor structures and could have indicated whether there was greater similarity for some of the index scores than for others.

With the wisdom of hindsight, it should also be questioned whether merely deriving factor solutions for various samples was sufficient to reach a conclusion regarding the equivalence of the WAIS-III across various language and cultural groups. It is increasingly being argued in the literature that multiple methods, rather than a single method, should be used to evaluate structural equivalence, and that matched groups should ideally be used (Koch, 2005; Sireci & Khaliq, 2002). Researchers suggest that a combination of methods such as principal components analysis, weighted multidimensional scaling, and structural equation modelling, together with various methods for exploring differential item functioning (DIF), should be used to comprehensively evaluate the structural equivalence of a measure across language versions or language groups (Koch, 2005; Sireci & Khaliq, 2002).
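To make the suggestion about coefficients of congruence concrete, the sketch below computes Tucker's congruence coefficient, the sum of cross-products of two loading vectors divided by the square root of the product of their sums of squares. The loadings are hypothetical values invented for illustration; they are not taken from Claassen et al. (2001a and b), and the function name is likewise an assumption.

```python
import math

def congruence_coefficient(loadings_a, loadings_b):
    """Tucker's coefficient of congruence between two factor loading vectors.

    Both vectors must list loadings for the same subtests in the same order.
    Values close to 1 indicate that the factor is defined in a similar way
    in the two groups.
    """
    if len(loadings_a) != len(loadings_b):
        raise ValueError("loading vectors must have the same length")
    cross = sum(a * b for a, b in zip(loadings_a, loadings_b))
    scale = math.sqrt(sum(a * a for a in loadings_a) * sum(b * b for b in loadings_b))
    return cross / scale

# Hypothetical loadings of four subtests on one factor in two groups.
group_1 = [0.82, 0.79, 0.75, 0.31]
group_2 = [0.70, 0.66, 0.48, 0.52]

print(f"congruence = {congruence_coefficient(group_1, group_2):.3f}")
```

Computing one coefficient per factor and per pair of groups would show whether some index scores are replicated more closely across groups than others, which is the information the article notes is missing.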
Only one method (principal factor analysis) was used to investigate the equivalence of the WAIS-III for various cultural and language groups and matched samples were not used. Users of the WAIS-III should thus be aware that there is not yet sufficient and unequivocal evidence to conclude that the measure is structurally equivalent across cultural and language groups.

In addition to the factor structure of the WAIS-III across various groups, the performance of different groups formed on the basis of language and culture was also explored. It should be noted that Claassen et al. (2001a and b) only reported means and standard deviations for different groups and did not provide any inferential statistics, which would have aided the interpretation of whether differences among the various groups were statistically significant or not. Consequently, the discussion on the performance of the various groups that will be highlighted here remains speculative, although the present authors used the means, standard deviations and sample sizes provided by Claassen et al. (2001b) to test whether the means differed significantly. When testing whether two means differ significantly using the STATISTICA package, only p values are provided. Hence, reference will only be made to p values when commenting on whether two means differed significantly or not.

The following mean scores were obtained on the English proficiency test: 112.62 (SD = 9.75) for English-speaking whites, 106.52 (SD = 11.19) for English-speaking coloureds, 102.97 (SD = 1.50) for Afrikaans-speaking whites, 99.72 (SD = 13.56) for English-speaking blacks, 95.88 (SD = 14.75) for Afrikaans-speaking coloureds, and 90.37 (SD = 17.47) for blacks who spoke English at work but for whom English was largely a second or third language.
When the mean differences were statistically compared it was found that the mean score for English-speaking whites was significantly higher than that of Afrikaans-speaking whites (p<.001), English-speaking coloureds (p=.0012), Afrikaans-speaking coloureds (p<.001), English-speaking blacks (p<.001) and blacks who spoke English at work although this was not their first language (p<.001). The groups thus differed significantly in terms of their English proficiency levels, with groups for whom English was a second (or third) language scoring 0.5 to almost two standard deviations below the white English first language sample.

Table 1 provides information on the index scores obtained by various language and cultural groups. This table was compiled from statistical information in Tables 9.8 and 9.10 in Claassen et al. (2001b, pp. 63 and 66). Furthermore, Table 2 contains the p-values obtained when testing whether the means of the white English-speaking sample differed significantly from those of the other language and cultural samples.
TABLE 1
MEANS AND STANDARD DEVIATIONS FOR INDEX SCORES FOR VARIOUS LANGUAGE AND CULTURAL GROUPS

Group                               N     Verbal            Perceptual        Working           Processing
                                          Comprehension     Organisation      Memory            Speed
English-speaking whites             70    109,40 (15,49)    111,51 (15,42)    104,96 (14,73)    107,13 (15,90)
Afrikaans-speaking whites           97    102,88 (14,10)    110,72 (16,37)    101,07 (12,87)    108,32 (15,15)
English-speaking coloureds          72    103,93 (13,70)    102,04 (11,65)     98,67 (14,33)    101,64 (13,87)
Afrikaans-speaking coloureds        96     92,36 (13,44)     93,89 (14,49)     94,22 (13,03)     97,17 (13,74)
English-speaking blacks             35    101,71 (15,02)     97,63 (16,84)     96,23 (11,90)     96,97 (13,99)
Blacks who speak English at work    196    91,47 (14,69)     88,17 (14,49)     91,42 (14,10)     88,60 (9,80)

(SD provided in parentheses)

TABLE 2
SIGNIFICANCE LEVEL (P) WHEN COMPARING THE MEANS OF THE WHITE ENGLISH-SPEAKING SAMPLE TO THE MEANS OF THE OTHER LANGUAGE AND CULTURAL GROUPS

English-speaking whites             Verbal            Perceptual        Working           Processing
compared with:                      Comprehension     Organisation      Memory            Speed
Afrikaans-speaking whites           0,001             0,753             0,072             0,624
English-speaking coloureds          0,027             <0,001            0,011             0,030
Afrikaans-speaking coloureds        <0,001            <0,001            <0,001            <0,001
English-speaking blacks             0,017             <0,001            0,003             0,002
Blacks who speak English at work    <0,001            <0,001            <0,001            <0,001

When comparing the index scores of the differing combinations of language and cultural groups provided in Table 1, it is clear that the English-speaking white sample obtained higher mean scores than the other groups on almost every index. From Table 2 it can be seen that the mean for the white English-speaking group was significantly (p<.05) higher than that of all the other groups for Verbal Comprehension. In addition, the mean Perceptual Organisation, Working Memory and Processing Speed index scores of the white English-speaking sample were significantly higher than the means for all the groups, except for the Afrikaans-speaking white group.
The differences in means between white English speakers and the other groups ranged between 6 and 17 points and in the majority of cases the mean differences were statistically significant. This raises questions about the potential differential impact of English proficiency on WAIS-III performance.

Some may argue that the differences in test performance between the language and cultural group combinations could be attributable more to differences between cultural groups than to language proficiency. While there might be some validity to this argument, the comparison of the mean performance of English- and Afrikaans-speaking whites and coloureds provided in Table 1 suggests otherwise. Not only did English-speaking coloureds consistently obtain significantly higher mean scores than Afrikaans-speaking coloureds (p-values ranged between .038 and <.001), but in the case of the Verbal Comprehension and Perceptual Organisation scores the mean difference was close to 10 points, which is a considerable difference that was found to be statistically significant (p<.001) in both instances. Although English-speaking whites generally obtained higher mean scores than Afrikaans-speaking whites, the difference was small. However, among whites the Verbal Comprehension mean for second language English speakers was almost 7 points lower than that of first language English speakers and this difference was found to be statistically significant (p=.005). Thus, on the index score where language plays a significant role in terms of the nature of the test tasks and the constructs being tapped, it is worrying that the scores of second language English speakers were significantly depressed. Comparison of the performance of different language groups within the same cultural group thus leads one to the conclusion that the impact of language on test performance cannot be ignored and cannot solely be explained in terms of cultural differences in cognitive test performance.
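The comparisons above rest on testing two means using only the summary statistics in Table 1. The article does not say which variant of the test STATISTICA applied, so the sketch below assumes a pooled-variance independent-samples t test with a two-sided p value; the function names are illustrative. It uses only the standard library and integrates the t density numerically, so it is an approximation rather than a replacement for a statistics package.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=20000):
    """Two-sided p value via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * total * h / 3  # probability that T lies in (-|t|, |t|)
    return max(0.0, 1.0 - central_area)

def pooled_t_test(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t test from summary statistics (assumes equal variances)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / se
    return t, df, two_sided_p(t, df)

# English-speaking whites (N = 70) versus Afrikaans-speaking whites (N = 97), from Table 1.
t_po, df_po, p_po = pooled_t_test(111.51, 15.42, 70, 110.72, 16.37, 97)  # Perceptual Organisation
t_wm, df_wm, p_wm = pooled_t_test(104.96, 14.73, 70, 101.07, 12.87, 97)  # Working Memory

print(f"Perceptual Organisation: t({df_po}) = {t_po:.2f}, p = {p_po:.3f}")
print(f"Working Memory:          t({df_wm}) = {t_wm:.2f}, p = {p_wm:.3f}")
```

Under the pooled-variance assumption, the two p values should land close to the 0,753 and 0,072 listed for these comparisons in Table 2. The same function can be applied to any other pair of rows in Table 1.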
As education has been highlighted as being a critical moderator of test performance (Claassen et al., 2001a and b; Shuttleworth-Edwards et al., 2004), another argument that could be put forward is that the differences between the various cultural and language groups could best be explained in terms of differences in the levels of education and the quality of the education received rather than in terms of differing levels of English proficiency. Claassen et al. (2001b) did not report on analyses in which the main and interaction effects of educational level, language and culture on WAIS-III performance were explored. However, when Shuttleworth-Edwards et al. (2004) explored the impact of quality and level of education on WAIS-III test performance, they also incidentally provided information on the performance of educationally comparable samples of English and African first language speakers. English first language graduates obtained mean Verbal, Performance and Full Scale IQ scores of 124.93, 116.14 and 123 respectively, while African first language graduates obtained mean scores of 116.10, 107.80, and 113.40 respectively. While the results of the groups were not statistically compared by Shuttleworth-Edwards et al. (2004), the present authors tested the mean differences for significance. While there was no significant difference between the Performance IQ scores for the two groups (p=.07), the Verbal and Full Scale IQ scores of the English first language graduates were significantly higher than those of the African first language graduates (p=.01 in each instance). The significantly depressed scores of African first language speakers who were tested in English are a source of concern. Furthermore, these results once more suggest that the impact of language on test performance is a real issue that has to be specifically addressed.
One way in which the differential impact of language on test performance could have been addressed would have been to explore whether separate norms should have been developed for different language and cultural group combinations. Claassen et al. (2001b) contemplated this possibility but decided to weight the cultural groups in terms of quality of education and not to attempt to also develop norms for the different language subgroups. They argued that if the language of learning was taken into account, this could justify testing blacks who have an African home language in English. However, as Afrikaans speakers educated in Afrikaans performed poorly on the English version of the WAIS-III, Claassen et al. (2001b) suggested that the measure should be administered in Afrikaans to this group of test-takers. To this end they developed an Afrikaans translation of the verbal subtests but cautioned that the “translated version has as yet not been standardised and should be used with this knowledge in mind” (p. 73).

The ITC’s test adaptation guidelines (International Test Adaptation Guidelines [On-line], 2000) as well as the Standards for Educational and Psychological Testing (American Educational Research Association, American Psychological Association & National Council on Measurement in Education, 1999) clearly indicate that when a measure is translated and/or adapted, evidence needs to be provided regarding the equivalence of the translated/adapted versions. It is thus unacceptable that Claassen et al. (2001b) provided an Afrikaans translation of the verbal subtests without also providing information regarding the equivalence of the translation. Until such information is available, psychologists who follow good assessment practices should refrain from using the Afrikaans version of the WAIS-III. To amplify this suggestion, the findings of two recent studies are pertinent.
Grieve (2005) identified certain problematic items in the Afrikaans translation and was critical of the fact that no scoring criteria were provided for the Afrikaans translation. These findings and sentiments were echoed by Aston (2006). For example, in Aston’s (2006) study, participants, who were all psychology professionals, commented qualitatively on the Vocabulary subtest that “The meaning of the items used in the Afrikaans translation differs from the meanings of the original English items. Afrikaans people generally perform less well on these items” (p. 96) and that “Because the meanings of these words differ significantly from the original English version and are less commonly used, Afrikaans test-takers are likely to perform less well on this subtest. The questions and the scoring criteria need to be translated into Afrikaans in order to obtain parity and equivalence” (p. 96). There are thus sufficient indications of problems with the Afrikaans translation of the WAIS-III, of which psychologists should take heed. It should also be noted that no norms have been developed for Afrikaans speakers who are assessed using the Afrikaans version of the verbal subtests of the WAIS-III, which is also problematic.

DISCUSSION

When judged against international guidelines for test adaptation and for using measures across linguistically diverse groups, certain problems have been identified in the South African adaptation of the WAIS-III. The adapted version seems to be appropriate for white English-speaking South Africans. However, the results of the factor analyses as well as the comparisons of performance among the groups cast doubt on the equivalence of the WAIS-III across various language groups and suggest that the measure may be biased against English second language black test-takers and coloured and white Afrikaans-speaking test-takers.
The Way Forward
It would be inappropriate to merely be critical of the adaptation of the WAIS-III for first and second language English speakers without offering some suggestions regarding how language issues can be addressed in the WAIS-III. Consequently, in the concluding section of this article suggestions will be offered related to the responsibility of the test distributor, the psychologists using the WAIS-III, researchers, and the Psychometrics Committee of the Professional Board for Psychology with respect to addressing this matter.
Responsibility of the test distributor
The HSRC’s WAIS-III test adaptation team was disbanded when the project ended and the onus now rests on the test distributor in South Africa to address limitations in the South African adaptation. The test distributor should specifically focus on:
1. Undertaking a further, more thorough, qualitative bias review of all the subtests of the WAIS-III, including the performance subtests, to detect items and instructions that require adaptation or replacement. The review team should consist of psychologists, measurement experts, linguists and anthropologists. The team should furthermore be representative of English first and second language speakers from the four major cultural groups. The recommendations of Aston (2006) regarding items that might be biased against second language English speakers and problematic instructions could be used to facilitate the review process.
2. Establishing the equivalence of the measure across the diverse language groups to which it is intended to be administered. As suggested by Koch (2005) and Sireci and Khaliq (2002), a combination of statistical methods should be used when equivalence is investigated.
3. Re-examining whether separate norms for various combined language and cultural groups might allow the measure to be used more fairly, with English second language test-takers in particular.
4. Refining the Afrikaans translation of the verbal subtests and using judgemental and empirical methods to establish the equivalence of the English and Afrikaans versions. In addition, norms should be developed for Afrikaans-speaking South Africans.
5. Exploring the possibility of adapting/translating the measure into various African languages. Once an equivalent Afrikaans translation is available, it will be difficult to justify why only an Afrikaans translation is provided. As Hambleton and De Jong (2003, p. 130) observe, “Growing recognition of multiculturalism has raised awareness of the need to provide for multiple language versions of tests and instruments intended for use within a single national context”. Having the WAIS-III available in multiple languages will allow psychologists to assess test-takers in the language in which they are most proficient.
Responsibility of psychologists using the WAIS-III
Psychologists who use the WAIS-III should critically study the technical report (Claassen et al., 2001b) to reach their own conclusions regarding the limitations of the adapted South African version. In addition, they should seek out reviews of the measure and South African research studies. This should provide them with sufficient information regarding the test-takers for whom it is appropriate or inappropriate to administer the measure. The acid test will be whether psychologists will follow good assessment practice guidelines and not use the WAIS-III in instances where there is insufficient information to support the possibility that the results will be valid and that language factors did not negatively impact on test performance.
Responsibility of researchers
It is essential that South African researchers critically examine the adaptation of the WAIS-III and explore the psychometric properties of the WAIS-III for diverse groups.
From their findings, refinements to the measure and enhancements to the use and interpretation of WAIS-III test performance can be suggested. By way of example, three South African studies on the WAIS-III were reported in this article, namely those by Aston (2006), Grieve (2005) and Shuttleworth-Edwards et al. (2004). While there are also other South African studies that have researched the WAIS-III, more are needed so that a substantial body of knowledge can be developed.
Responsibility of the Psychometrics Committee
The Psychometrics Committee of the Professional Board for Psychology is tasked with advising the Board on matters pertaining to psychological tests and testing and with classifying psychological tests. When a measure is submitted for classification, two independent reviewers evaluate it to determine whether use of the measure constitutes a psychological act, whether application of the measure and/or its results could have harmful consequences, how appropriate the measure is for the multicultural, multilingual South African context, and whether the psychometric properties of the measure have been comprehensively evaluated. As the WAIS-III is on the list of tests classified as being a psychological test by the Psychometrics Committee, it must be assumed that the measure was evaluated for classification purposes. Nonetheless, this process failed to identify critical issues regarding the appropriateness of the WAIS-III for diverse language groups, the lack of sufficient information regarding the equivalence of the measure across various groups, the appropriateness of the norm groups, and the availability of an unsubstantiated Afrikaans version of the verbal subtests with no norms for Afrikaans speakers. One explanation for why this happened might be that the classification and review criteria are not explicit and detailed enough and need to be expanded.
It is encouraging to note that the test classification methodology and criteria used are currently being reconsidered by the Psychometrics Committee with a view to possible refinement. Psychologists depend on this committee to ensure that the psychological tests that they use comply with quality and psychometric standards. It is thus further imperative that the Psychometrics Committee subject its refined test classification and review process and criteria to international benchmarking.
Concluding remark
The test adaptation process is fraught with difficulties and it is through the combined efforts of test users and researchers that limitations in the adapted version of a measure can be brought to the attention of test developers and distributors. It is thus hoped that this article has made a constructive contribution to the further refinement and adaptation of the WAIS-III as regards minimizing the impact of language on test performance.
REFERENCES
American Educational Research Association, American Psychological Association & National Council on Measurement in Education (1999). Standards for educational and psychological testing. Washington, D.C.: American Educational Research Association.
Aston, S. (2006). A qualitative bias review of the adaptation of the WAIS-III for English-speaking South Africans. Unpublished Master’s dissertation, Nelson Mandela Metropolitan University, Port Elizabeth.
Claassen, N.C.W., Krynauw, A.H. & Holtzhausen, H. (2000). Standardising the Wechsler Adult Intelligence Scale-Third edition (WAIS-III) for South Africa. Pretoria: Human Sciences Research Council.
Claassen, N.C.W., Krynauw, A.H., Holtzhausen, H. & Mathe, M. (2001a). Wechsler Adult Intelligence Scale-Third edition: performance of South African reference groups. Pretoria: Human Sciences Research Council.
Claassen, N.C.W., Krynauw, A.H., Paterson, H. & Mathe, M. (2001b). A standardization of the WAIS-III for English-speaking South Africans. Pretoria: Human Sciences Research Council.
Grieve, K. (2005). Use of the WAIS-III for Afrikaans-speaking South Africans. Paper delivered at the 11th annual congress of the Psychological Society of South Africa, Cape Town, 20-23 September 2005.
Hambleton, R.K. & De Jong, J.H.A.L. (2003). Advances in translating and adapting educational and psychological tests. Language Testing, 20 (2), 127-134.
Herbst, I. & Huysamen, G.K. (2000). The construction and validation of a developmental scale for environmentally disadvantaged preschool children. South African Journal of Psychology, 30 (3), 19-24.
Hinkle, J.S. (1994). Practitioners and cross cultural assessment: a practical guide to information and training. Measurement and Evaluation in Counselling and Development, 27 (2), 748-756.
Huysamen, G.K. (1996). Psychological measurement: an introduction with South African examples (3rd ed.). Pretoria: J.L. Van Schaik Publishers.
International Test Commission (2000). International test adaptation guidelines [On-line]. Available: http://www.intestcom.org. Accessed June, 2006.
Koch, E. (2005). Evaluating the equivalence, across language groups, of a reading comprehension test used for admissions purposes. Unpublished doctoral thesis, Nelson Mandela Metropolitan University, Port Elizabeth, South Africa.
Nell, V. (1994). Interpretation and misinterpretation of the South African Wechsler-Bellevue Adult Intelligence Scale: a history and a prospectus. South African Journal of Psychology, 24 (2), 100-108.
Nell, V. (1999). Standardising the WAIS-III and WMS-III for South Africa: legislative, psychometric, and policy issues. South African Journal of Psychology, 29, 128-137.
Pieters, H.C. & Louw, D.A. (1987). Die Suid-Afrikaanse Wechsler-Intelligensieskaal vir volwassenes: ’n kritiese perspektief. South African Journal of Psychology, 17, 145-149.
Shuttleworth-Edwards, A.B., Kemp, R.D., Rust, A.L., Muirhead, J.G.L., Hartman, N.P. & Radloff, S.E. (2004). Cross-cultural effects on IQ test performance: A review and preliminary normative indications on WAIS-III test performance. Journal of Clinical and Experimental Neuropsychology, 26 (7), 903-920.
Sireci, S.G. & Khaliq, S.N. (2002). Comparing the psychometric properties of monolingual and dual language test forms. (Center for Educational Assessment Research No. 458). Amherst, MA: School of Education, University of Massachusetts, Amherst.
Sparrow, S.S. & Davies, S.M. (2000). Recent advances in the assessment of intelligence and cognition. Journal of Child Psychology & Psychiatry, 41 (1), 117-131.
Wechsler, D. (1997). WAIS-III administration and scoring manual. San Antonio, Texas: Psychological Corporation.
REVIEW PANEL
Dr Kate Cockroft, University of the Witwatersrand
Prof Marié de Beer, University of South Africa
Prof Deon de Bruin, University of Johannesburg
Dr Karina de Bruin, University of Johannesburg
Prof Robert Doktor, University of Hawaii at Manoa
Prof Cheryl Foxcroft, Nelson Mandela Metropolitan University
Dr Dirk Geldenhuys, University of South Africa
Prof Kate Grieve, University of South Africa
Prof Gert Huysamen, Gordon Institute of Business Science & University of the Free State
Dr Martin Jooste, University of Johannesburg
Prof Wilhelm Jordaan, University of Pretoria
Dr Charlene Lew, Gordon Institute of Business Science
Mr Deon Meiring, South African Police Services
Prof Ian Rothmann, North-West University
Dr Hilton Rudnick, Private Practice
Prof Pieter Schaap, University of Pretoria
Prof Faans Steyn, North-West University
Prof Callie Theron, Stellenbosch University
Prof Esmé van Rensburg, North-West University
Prof Delene Visser, University of Johannesburg