AN INVESTIGATION OF CHINESE MIDDLE SCHOOL IN-SERVICE ENGLISH TEACHERS' ASSESSMENT LITERACY

Indonesian EFL Journal, Vol. 1(1), January 2015, ISSN 2252-7427

Lin Dunlai
School of Foreign Languages and Literature; National Innovation Center for Assessment of Basic Education Quality, Beijing Normal University
E-mail: lindunlai@bnu.edu.cn

Su You
School of Humanities, Beijing University of Posts and Telecommunications
E-mail: suyou@bupt.edu.cn

APA Citation: Lin, D., & Su, Y. (2015). An investigation of Chinese middle school in-service English teachers' assessment literacy. Indonesian EFL Journal, 1(1), 1-10.

Received: 04-08-2014    Accepted: 10-10-2014    Published: 01-01-2015

Abstract: This paper reports an investigation into the status quo of the assessment literacy of Chinese middle school in-service English teachers. Using tasks designed by Coombe et al. (2007), the study finds that Chinese secondary English teachers have low levels of assessment literacy. They are not aware of principles such as authenticity, sensitivity to test content, and self-assessment. In particular, they are extremely weak at interpreting the statistics used in item analysis and distractor efficiency analysis. No significant difference in assessment literacy was detected in terms of teaching experience or whether teachers had taken assessment training in any form. The authors call for a study of the language assessment courses offered to secondary English teachers, and for due attention to the relevance of such courses to classroom assessment.

Keywords: assessment literacy, middle school in-service English teachers, China.

INTRODUCTION
Teachers spend up to one third or even half of their working lives dealing with assessment (Stiggins, 1991a; 1999). There is no doubt that teachers play a pivotal part in classroom assessment.
Moreover, research has found that assessment, applied appropriately, boosts student learning (Black & Wiliam, 1998). The current belief in the prominent role of assessment in fostering learning, referred to as "assessment for learning" (Gipps, 1994; Broadfoot & Black, 2004), calls for the knowledge required to conduct assessment activities, known as "assessment literacy".
The term "assessment literacy" was first coined by Rick Stiggins (1991b). He first described what assessment illiterates cannot do (Stiggins, 1991b) and was more straightforward in delineating what assessment-literate teachers can do (Stiggins, 1995). Popham (2011) defines assessment literacy as "an individual's understandings of the fundamental assessment concepts and procedures deemed likely to influence educational decisions" (italics in original). The language testing field did not take up the term until 2009, when Taylor (2009) called for language testing knowledge, skills and understanding to be shared in wider circles. She argues that "training for assessment literacy entails an appropriate balance of technical know-how, and understanding of principles, but all firmly contextualized within a sound understanding of the role and function of assessment within education and society" (ibid.: 27). Inbar-Lourie (2008) likewise perceived language assessment literacy as encompassing layers of general assessment literacy and language-specific elements.
In this article, the authors adopt a classroom-oriented layer of assessment literacy, which suggests that "in order to become literate in language assessment, one needs to attain knowledge in formative and summative testing and assessment methods, in interpreting student scores, in understanding the complexities of validity and reliability including current tensions which question the application of traditional psychometric measures to teacher-based assessment" (Inbar-Lourie, 2013; Teasdale & Leung, 2000).
As a matter of fact, a knowledge base for assessment has been set out in official documents. The American Federation of Teachers (AFT), the National Council on Measurement in Education (NCME) and the National Education Association (NEA) identified seven components that form the base required for performing assessment tasks (AFT, NCME, & NEA, 1990). Teachers should be skilled in:
(1) choosing assessment methods appropriate for instructional decisions;
(2) developing assessment methods appropriate for instructional decisions;
(3) administering, scoring, and interpreting the results of both externally produced and teacher-produced assessment methods;
(4) using assessment results when making decisions about individual students, planning teaching, developing curriculum, and school improvement;
(5) developing valid pupil grading procedures that use pupil assessments;
(6) communicating assessment results to students, parents, other lay audiences, and other educators;
(7) recognizing unethical, illegal, and otherwise inappropriate assessment methods and uses of assessment information.
The real situation, however, seems to lag far behind these standards.
Many researchers have consistently found that teachers lack assessment literacy (Arter, 2001; Mertler, 2004; Mertler & Campbell, 2005; Popham, 2006; Wang, Wang & Huang, 2008; Lin, 2014), which makes it simply impossible to build up an assessment culture. The method adopted in the existing literature is mostly the survey, often using the instrument developed by Plake, Impara and Fager (1993), a 35-item multiple-choice test built from the seven standards (AFT, NCME, & NEA, 1990) mentioned above. Plake and Impara (1997) carried out a national survey of 555 American teachers and found "woefully" low levels of assessment competence. Campbell, Murphy and Holt (2002) applied the instrument to pre-service teachers and found incompetence in assessment. Mertler (2004) investigated 61 in-service and 101 pre-service secondary teachers in the US and found incompetence in grading and interpreting test results. Zhang and Burry-Stock (1997) used their assessment practices inventory to identify seven factors of assessment literacy. They compared teachers with different lengths of teaching experience and different assessment training, and found significant differences in perceived assessment literacy on both dimensions.
Empirical research into language assessment literacy is still rare (Lin & Wu, 2014). Existing research focuses on the knowledge base of language assessment literacy (e.g. Bailey & Brown, 1996; Brown & Bailey, 2008; Tsagari, 2011; Fulcher, 2012; Jeong, 2013). In the Chinese context, Jin (2010) investigated 86 language testing courses in teacher preparation programs at universities across China and found a heavy focus on the testing and measurement perspective rather than the assessment perspective. As for secondary English teachers' assessment literacy, Lin (2014) found low levels of assessment literacy among teachers, using both a quantitative and a qualitative design.
In this study, the authors report a test-based investigation of secondary English teachers' assessment literacy. The research questions are: (1) What is the status quo of assessment literacy among Chinese middle school in-service English teachers? (2) Does teaching experience make a difference? (3) Does assessment training make a difference?

METHOD
The sample
This study used convenience sampling due to the lack of resources. Thirty-nine middle school in-service English teachers (N=39) took the test. These teachers were attending a language assessment course as part of their Master of Education program. Thirty-three were female and 6 were male. They came from 14 provinces or municipalities (there are 34 provincial administrative zones in China). In terms of age, 22 were below 30 years old, 11 were aged 31-35, and 6 were above 36. As regards teaching experience, 18 had been teaching for 2-5 years, 12 for 6-10 years, and 9 for more than 11 years. Thirty-five had graduated from a teacher training program. As for language proficiency, 26 had passed the Test for English Majors Band 8 (TEM 8), which indicates very high English language proficiency. Fourteen were junior high school teachers and 25 taught senior high. As to class size, 6 taught classes with fewer than 30 students, 7 with 30-40 students, 10 with 41-50 students, and 16 with more than 50 students. As for professional development in assessment, 20 had never taken a language assessment course in any form, 12 had taken a complete language assessment course, and 7 had encountered language assessment through lectures or a language teaching methods course. As for their perception of the importance of knowledge about language testing, 34 considered such knowledge very important for an English teacher.
The instrument
The instrument for this study was adapted from a book by Coombe, Folse and Hubley (2007). Altogether it includes ten tasks. The first task tests the participants' knowledge about language assessment; participants were required to select the one best answer. An example is as follows:

It's the beginning of the semester, and you have a mixed-level class. You want to get an idea of the class's strengths and weaknesses before you plan your lessons. Which kind of test would give you the information you need?
A. Placement.  B. Diagnostic.  C. Proficiency.  D. Aptitude.  (Key: B)

The second and third tasks present two scenarios of an English teacher conducting language testing over a semester. The participants were required to underline any inappropriate language testing practices and briefly explain why.
The fourth task concerns techniques for writing multiple-choice questions. The participants were to name the defects in each multiple-choice item. An example is as follows:

An architect is a person who does not _________.
a. design automobiles  b. design buildings  c. design houses  d. design offices
(Suggested answer: 1. "design" should be moved into the stem to avoid repetition. 2. We normally do not define something by what it is not; this is an authenticity issue.)

The fifth to eighth tasks provide short reading, writing, listening and speaking tests respectively. Participants were required to point out where testing techniques were violated.
The ninth task is a scenario about an English teacher's test preparation practice for students. Participants were required to underline inappropriate practices and give comments.
The tenth task asks participants to read an item analysis and distractor analysis report, interpret the results, and suggest improvements. Table 1 shows the tasks and their contents.
Table 1. Ten tasks, their contents and question types

Task     1          2         3         4     5        6        7          8         9                 10
Content  knowledge  scenario  scenario  MC    reading  writing  listening  speaking  test preparation  statistics
Type     MC         open      open      open  open     open     open       open      open              open

The data collection process
The test was administered at the very beginning of the above-mentioned language assessment course to avoid any learning effect; in other words, the participants had to rely on their prior knowledge. The tasks were printed as booklets to avoid misplaced pages. Ample space was provided so that participants could develop their ideas and write freely. Participants were notified that the test was for research purposes only, so they should not hesitate to write down their responses exactly as they thought about the issues. The test lasted two hours, with a ten-minute break to avoid fatigue. All test papers were collected, coded and kept securely by the researchers.

The data processing method
The scoring of the test was done by the authors collaboratively. The marking scheme was developed from Coombe et al.'s (2007) reference answers and the participants' actual performance. The authors worked out indicators to serve as the basis for marking; the number of indicators for each task is shown in Table 2. Altogether 123 indicators were worked out. Each indicator was marked 0 or 1, except in Task 2 and Task 4, where partial credit was necessary and indicators were marked 0, 0.5 or 1. As Tasks 2 and 4 involve more judgment, the researchers used double marking. The Pearson correlations between the two markers were 0.945 and 0.943 respectively, suggesting very high agreement. The two researchers' scores were averaged to reach the final marks for Tasks 2 and 4.
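The double-marking check described above can be sketched in a few lines. This is an illustrative sketch, not the authors' code; the two raters' score lists below are hypothetical.

```python
# Illustrative sketch of the double-marking check: Pearson correlation
# between two raters' task scores, then averaging into final marks.
# The score lists are hypothetical, not the study's data.
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

rater1 = [7.5, 10.0, 4.5, 12.0, 8.0]   # hypothetical Task 2 scores, rater A
rater2 = [8.0, 9.5, 5.0, 11.5, 8.5]    # hypothetical Task 2 scores, rater B

r = pearson(rater1, rater2)            # a value near 1.0 means high agreement
final = [(a + b) / 2 for a, b in zip(rater1, rater2)]  # averaged final marks
print(round(r, 3), final)
```

In the study, correlations of 0.945 and 0.943 for Tasks 2 and 4 were taken as high enough agreement to justify averaging the two markers' scores.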
Table 2. Ten tasks and the number of indicators for each task

Task        1   2   3   4   5   6   7   8   9   10
Indicators  10  16  15  13  20  10  15  10  8   6

FINDINGS AND DISCUSSION
Research Question 1: The status quo of assessment literacy
The reliability check of the test shows a Cronbach's α of 0.828, indicating a very high level of internal consistency. As indicated in Figure 1, out of a maximum of 123, the highest score is 65.25, the lowest is 15.5, the average score is 33.22, and the standard deviation is 9.397. The facility index (IF) of the test is 0.27, showing that the test is very difficult for these secondary teachers. Table 3 shows the IF of each task.

Table 3. Ten tasks and the IF of each task

Task  1     2     3     4     5     6     7     8     9     10
IF    0.36  0.29  0.56  0.24  0.17  0.26  0.12  0.36  0.35  0.06

As we can see in Table 3, six tasks (IF < 0.3) are extremely difficult for the participants, with the final task, interpreting statistics, the most difficult of all. The techniques for writing multiple-choice questions have not been mastered by the teachers, despite the ubiquity of MC questions in almost every kind of test. Compared with testing productive skills (writing and speaking), testing receptive skills (reading and listening) is more difficult for the teachers. The teachers are effectively statistically illiterate with respect to item analysis.
To answer research questions 2 and 3, a comparison between teachers with different lengths of teaching experience was carried out using ANOVA. The result shows an F-value of 0.262 (p=0.771), suggesting no significant difference between teachers with different lengths of teaching experience. Another ANOVA comparison, between teachers with different levels of assessment training, shows an F-value of 0.865 (p=0.468), suggesting no significant difference between teachers with different levels of assessment training.
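The two test-level statistics reported above, the facility index and Cronbach's alpha, can be computed as in the following sketch. The score matrix is hypothetical and the code is illustrative; it is not the instrument or data used in the study.

```python
# Illustrative sketch (hypothetical data): facility index per task and
# Cronbach's alpha for a participants-by-tasks score matrix.

def facility_index(scores, max_scores):
    """Mean proportion of the available marks earned, per task."""
    n = len(scores)
    k = len(max_scores)
    return [sum(row[j] for row in scores) / (n * max_scores[j]) for j in range(k)]

def cronbach_alpha(scores):
    """Cronbach's alpha: (k/(k-1)) * (1 - sum of item variances / total variance)."""
    n = len(scores)
    k = len(scores[0])

    def variance(xs):  # population variance, as commonly used for alpha
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical scores for 4 participants on 3 tasks (maximum marks 10, 16, 15)
scores = [
    [4, 6, 9],
    [2, 3, 5],
    [7, 10, 12],
    [3, 5, 8],
]
print(facility_index(scores, [10, 16, 15]))
print(cronbach_alpha(scores))
```

A task IF below 0.3, as for six of the ten tasks in Table 3, would mean participants earned under 30% of the available marks.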
[Figure 1. Total score distribution of the test]

Low levels of assessment literacy
Using mostly open-ended questions, this study resonates with other existing research (e.g. Arter, 2001; Mertler, 2004; Mertler & Campbell, 2005; Popham, 2006; Wang, Wang & Huang, 2008; Lin, 2014) in finding that secondary English teachers lack a desirable level of assessment literacy. Pill and Harding (2013) conceptualized different levels of assessment literacy: illiteracy, nominal literacy, functional literacy, procedural and conceptual literacy, and multidimensional literacy. Table 5 shows the meaning of the five levels.

Table 5. Five levels of assessment literacy (adapted from Pill & Harding, 2013: 383)

Illiteracy: ignorance of language assessment concepts
Nominal literacy: understanding that a specific term relates to assessment, but possibly with a misconception
Functional literacy: sound understanding of basic terms and concepts
Procedural and conceptual literacy: understanding central concepts of the field and using knowledge in practice
Multidimensional literacy: knowledge extending beyond ordinary concepts, including philosophical, historical and social dimensions of assessment

Judging from the teachers' responses to the ten tasks and the above scale, the assessment literacy of Chinese middle school English teachers can be rated as somewhere between illiteracy and nominal literacy. The latest National English Curriculum Standards for compulsory education (Ministry of Education, or MoE, 2012) stipulate that teachers should make use of various kinds of assessment, including formative and summative assessment, to evaluate students' development. The Standards include exemplary assessment tasks and comments for teachers' reference, covering as many as 45 pages. Obviously, teachers' lack of assessment literacy will hinder the sound implementation of the Standards.
As Alderson (2011) argues, "testing is too important to be left to testers". Here the authors want to point out some prominent testing issues that the participants were not aware of. The first issue is authenticity in language assessment. Authenticity is defined as "the degree of correspondence of the characteristics of a given language test task to the features of a TLU (Target Language Use) task" (Bachman & Palmer, 1996: 23). In Task 2, the scenario describes a novice teacher asking students to write an essay about the use of modal verbs: "Write a 300-word essay on the meanings of modal verbs and their stylistic uses. Give examples and be specific." In Task 6, students are asked to describe a petrol pump. In Task 7, the listening material is an introduction to a made-up place. All these tasks violate the principle of authenticity, yet none of the participants in this study commented on it. The second issue is fairness review. McNamara and Roever (2006: 129) point out that to avoid or reduce differential item functioning (DIF), test makers use a "sensitivity review" at the early stages of test creation. This is also relevant in the classroom context. In Task 5, the reading passage is about alligators attacking and claiming people's lives. Only one participant pointed out that such a text may arouse negative feelings in students and should be avoided in a classroom test. A further issue is that teachers do not attach importance to students' self-assessment. In Task 9, a teacher says "I usually don't like students to mark their own papers"; only 3 teachers thought that students should be given opportunities to self-assess their performance and make appropriate adjustments to their study.
Anxiety about statistics
In this study, we found the statistics of item analysis extremely difficult for these in-service secondary English teachers. Seven of the participants wrote no answer at all in this part, while 16 indicated clearly that they did not know the answer. The other participants resorted to wild guesses; typical answers were "The IF is 0.77. It's too hard" and "The discrimination index is 0.61. It's too low and needs to be dropped". We know from Ebel and Frisbie (1991: 232) that, as regards the index of discrimination, "0.40 and up suggests very good items; 0.30-0.39 suggests reasonably good but possibly subject to improvement; 0.20-0.29 suggests marginal items, usually needing and being subject to improvement; and below 0.19 suggests poor items to be rejected or improved by revision". The participants were not able to pinpoint the distractive power of the different options or make suggestions for improvement.
As so many participants (23 out of 39) claimed ignorance of statistics, we find it urgent to look into statistics anxiety, defined by Brown (2013: 353) as "a complex of behaviors, including uneasiness, trepidation, nervousness, and even debilitating fear, that may occur in some students when they are confronted with studying or using statistics". Statistics anxiety notwithstanding, Brown (2012) believes that although Classical Test Theory has its disadvantages, it is still "sufficiently accurate, easy-to-learn, and practical to continue in use for years to come in real (especially local) testing situations" (p. 334). He further points out that classical item analysis and distractor efficiency analysis will continue to provide useful feedback to item writers and test developers about their items and their test specifications. So it is both crucial and practical to equip English teachers with Classical Test Theory.
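The Ebel and Frisbie (1991) guidelines quoted above reduce to a simple rule, sketched below. The function itself is hypothetical, written only to show how the typical answer about a discrimination index of 0.61 misreads the scale.

```python
# A sketch of Ebel & Frisbie's (1991) discrimination-index bands, as quoted
# in the text. The function is illustrative, not from the source.

def rate_discrimination(d):
    """Classify an item by its discrimination index D (the difference in
    proportion correct between high- and low-scoring groups)."""
    if d >= 0.40:
        return "very good item"
    if d >= 0.30:
        return "reasonably good, possibly subject to improvement"
    if d >= 0.20:
        return "marginal item, usually needing improvement"
    return "poor item: reject or revise"

# The answer "0.61 ... too low and needs to be dropped" gets the scale
# backwards: 0.61 is well above the 0.40 threshold for a very good item.
print(rate_discrimination(0.61))  # very good item
```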
In this regard, Brown's (2013) need-to-know approach is appropriate for developing teachers: teacher trainers should analyze what classroom teachers actually need to acquire, and make statistical instruction accessible and manageable for secondary teachers.

Teaching experience and assessment training
In this study, no significant difference in teachers' assessment literacy was detected as regards teaching experience or having taken any form of assessment training course. This contradicts the study carried out by Zhang and Burry-Stock (1997). But Zhang and Burry-Stock's research used teachers' self-perception rather than a test, as the current study does, and self-perception may be inflated by teachers' over-confidence in their own assessment literacy (cf. Wise & Lukin, 1993).
Here it is important to examine the nature of assessment literacy. Wang et al. (2008) take assessment literacy as part of the package of pedagogical content knowledge (PCK), which has been introduced as an element of the knowledge base for teaching (Shulman, 1986). Building on Cochran, DeRuiter and King (1993), who argued that Shulman's concept of PCK did not account for teachers' initiative in developing their own PCK and put forward pedagogical content knowing (PCKg) instead, Lin (2014) conceptualizes assessment literacy as part of PCKg. This view emphasizes both pre-service and in-service development in language assessment. More importantly, it calls for teachers to reflect in situ and to take the local socio-cultural environment into consideration. Assessment literacy should be acquired by doing assessment.
To make sure teachers develop desirable assessment competence, teacher preparation programs and teacher certifying institutions should pay more attention to developing teachers' assessment literacy.
Early studies showed that assessment training was largely neglected in teacher preparation programs (Noll, 1955; Schafer & Lissitz, 1987; Gullickson, 1984; Stiggins & Conklin, 1988, 1989; Wise & Lukin, 1993). These studies were carried out in the American context some time ago, but the situation does not seem to have improved much in China, judging from the authors' preliminary research on pre-service language assessment courses. There is no published research in the Chinese context except Jin (2010), and Jin's research concerned teacher preparation programs for university English teachers. A study of the language assessment courses in teacher preparation programs for secondary teachers is urgently needed.
Another issue that deserves attention is the relevance of language assessment courses for secondary teachers. Jin's (2010) study found a heavy focus on testing rather than classroom assessment. As early as 1991, Stiggins (1991a) called attention to the mismatch between assessment training and classroom uses of assessment. He put forward a 30-contact-hour assessment training framework. As it is still relevant today, we quote it here with some amendments to suit language assessment. Session 1 makes teachers aware of what quality assessment means and why it is so critical to students' well-being. Session 2 shows the importance of designing assessments with a clear vision of the achievement targets. Session 3 offers instruction in the design and use of paper-and-pencil assessment instruments. Session 4 addresses the assessment of the four different skills. Session 5 illustrates the use of observation and professional judgment as classroom assessment. Session 6 takes the writing assessment example and shows how it can be expanded into a methodology applicable to the observation and judgment of any achievement-related behavior or product. Session 7 deals with the assessment of affect. Session 8 develops sound grading practices.
Session 9 addresses norm-referenced standardized achievement tests. Session 10 returns to the quality of assessment and common pitfalls.

CONCLUSION
This study shows that, in the Chinese context, middle school in-service English teachers lack the language assessment literacy implied by the National English Curriculum Standards (MoE, 2012). There is no significant difference in the level of assessment literacy between teachers with different lengths of teaching experience or between teachers with different levels of assessment training, which shows that language assessment literacy does not grow simply with more teaching, nor with merely taking a short assessment training course. A combination of pre-service and in-service teacher development in assessment literacy is needed. Assessment training materials and methods should be aligned with teachers' classroom practice to yield real benefits.

References
Alderson, J. C. (2011). Testing is too important to be left to testers. In J. C. Alderson (Ed.), A lifetime of language testing (pp. 219-239). Shanghai: Shanghai Foreign Language Education Press.
American Federation of Teachers, National Council on Measurement in Education & National Education Association (AFT, NCME, & NEA). (1990). Standards for teacher competence in educational assessment of students. Washington, DC: Buros Institute.
Arter, J. (2001). Learning teams for classroom assessment literacy. NASSP Bulletin, 85(621), 53-65.
Bachman, L. F., & Palmer, A. S. (1996). Language testing in practice. Oxford: Oxford University Press.
Bailey, K. M., & Brown, J. D. (1996). Language testing courses: What are they? In A. Cumming & R. Berwick (Eds.), Validation in language testing (pp. 236-256). London, UK: Multilingual Matters.
Black, P., & Wiliam, D. (1998). Assessment and classroom learning.
Assessment in Education: Principles, Policy & Practice, 5(1), 7-74.
Broadfoot, P., & Black, P. (2004). Redefining assessment? The first ten years of Assessment in Education. Assessment in Education, 11(1), 7-27.
Brown, J. D. (2012). Classical test theory. In G. Fulcher & F. Davidson (Eds.), The Routledge handbook of language testing (pp. 323-335). London and New York: Routledge.
Brown, J. D. (2013). Teaching statistics in language testing courses. Language Assessment Quarterly, 10(3), 351-369.
Brown, J. D., & Bailey, K. M. (2008). Language testing courses: What are they in 2007? Language Testing, 25(3), 349-383.
Campbell, C., Murphy, J. A., & Holt, J. K. (2002). Psychometric analysis of an assessment literacy instrument: Applicability to pre-service teachers. Paper presented at the Mid-Western Educational Research Association, Columbus, OH.
Cochran, K. F., DeRuiter, J. A., & King, R. A. (1993). Pedagogical content knowing: An integrative model for teacher preparation. Journal of Teacher Education, 44(4), 263-272.
Coombe, C., Folse, K. S., & Hubley, N. (2007). A practical guide to assessing English language learners. Ann Arbor: University of Michigan Press.
Ebel, R. L., & Frisbie, D. A. (1991). Essentials of educational measurement (5th ed.). Englewood Cliffs, NJ: Prentice Hall.
Fulcher, G. (2012). Assessment literacy for the language classroom. Language Assessment Quarterly, 9(2), 113-132.
Gipps, C. (1994). Beyond testing: Towards a theory of educational assessment. London, England: Falmer Press.
Gullickson, A. R. (1984). Teacher perspectives of their instructional use of tests. Journal of Educational Research, 77(4), 244-248.
Inbar-Lourie, O. (2008). Constructing a language assessment knowledge base: A focus on language assessment courses. Language Testing, 25(3), 385-402.
Inbar-Lourie, O. (2013). Language assessment literacy. In C. A. Chapelle (Ed.), The encyclopedia of applied linguistics (pp. 2923-2931). Wiley-Blackwell.
Jeong, H. (2013). Defining assessment literacy: Is it different for language testers and non-language testers? Language Testing, 30(3), 345-362.
Jin, Y. (2010). The place of language testing and assessment in the professional preparation of foreign language teachers in China. Language Testing, 27(4), 555-584.
Lin, D. (2014). A study on Chinese middle school English teachers' assessment literacy [Unpublished doctoral dissertation]. Beijing: Beijing Normal University.
Lin, D., & Wu, Z. (2014). New development of research on language assessment literacy. Modern Foreign Languages, 37(5), 711-720.
McNamara, T., & Roever, C. (2006). Language testing: The social dimension. Oxford, England: Blackwell.
Mertler, C. A. (2004). Secondary teachers' assessment literacy: Does classroom experience make a difference? American Secondary Education, 33(1), 49-64.
Mertler, C. A., & Campbell, C. (2005). Measuring teachers' knowledge and application of classroom assessment concepts: Development of the assessment literacy inventory. Paper presented at the annual meeting of the American Educational Research Association, Quebec, Canada.
Ministry of Education. (2012). National English curriculum standards (2011 ed.). Beijing: Beijing Normal University Press.
Noll, V. H. (1955). Requirements in educational measurement for prospective teachers. School and Society, 80, 88-91.
Pill, J., & Harding, L. (2013). Defining the language assessment literacy gap: Evidence from a parliamentary inquiry. Language Testing, 30(3), 381-402.
Plake, B. S., & Impara, J. C. (1997). Teacher assessment literacy: What do teachers know about assessment? In G. D. Phye (Ed.), Handbook of classroom assessment: Learning, achievement, and adjustment (pp. 53-68). London: Academic Press.
Plake, B. S., Impara, J. C., & Fager, J. J. (1993). Assessment competencies of teachers: A national survey. Educational Measurement: Issues and Practice, 12(4), 10-12.
Popham, W. J. (2006). Needed: A dose of assessment literacy. Educational Leadership, 63(6), 84-85.
Popham, W. J. (2011). Assessment literacy overlooked: A teacher educator's confession. The Teacher Educator, 46(4), 265-273.
Schafer, W. D., & Lissitz, R. W. (1987). Measurement training for school personnel: Recommendations and reality. Journal of Teacher Education, 38, 57-63.
Shulman, L. S. (1986). Those who understand: Knowledge growth in teaching. Educational Researcher, 15(2), 4-14.
Stiggins, R. J. (1991a). Relevant classroom assessment training for teachers. Educational Measurement: Issues and Practice, 10(1), 7-12.
Stiggins, R. J. (1991b). Assessment literacy. The Phi Delta Kappan, 72(7), 534-539.
Stiggins, R. J. (1995). Assessment literacy for the 21st century. The Phi Delta Kappan, 77(3), 238-245.
Stiggins, R. J. (1999). Evaluating classroom assessment training in teacher education programs. Educational Measurement: Issues and Practice, 18(1), 23-27.
Stiggins, R. J., & Conklin, N. F. (1988). Teacher training in assessment. Portland, OR: Northwest Regional Educational Laboratory.
Stiggins, R. J., & Conklin, N. F. (1989). Teacher training in assessment. Paper presented at the annual meeting of the National Council on Measurement in Education, San Francisco.
Taylor, L. (2009). Developing assessment literacy. Annual Review of Applied Linguistics, 29, 21-36.
Teasdale, A., & Leung, C. (2000). Teacher assessment and psychometric theory: A case of paradigm crossing? Language Testing, 17(2), 163-184.
Tsagari, D. (2011). Investigating the 'assessment literacy' of EFL state school teachers. In D. Tsagari & I. Csepes (Eds.), Classroom-based language assessment (pp. 169-190). Frankfurt am Main: Peter Lang.
Wang, T. H., Wang, K. H., & Huang, S. C. (2008). Designing a web-based assessment environment for improving pre-service teacher assessment literacy. Computers & Education, 51, 448-462.
Wise, S. L., & Lukin, L. E. (1993). Measurement training in Nebraska teacher education programs. In S. L. Wise (Ed.), Teacher training in measurement and assessment skills (pp. 187-202). Lincoln, NE: Buros Institute of Mental Measurements.
Zhang, Z., & Burry-Stock, J. (1997). Assessment practices inventory: A multivariate analysis of teachers' perceived assessment competence. Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago, IL.