IJEE (Indonesian Journal of English Education), 6 (1), 2019, 48-64
P-ISSN: 2356-1777, E-ISSN: 2443-0390 | DOI: http://doi.org/10.15408/ijee.v6i1.11888
This is an open access article under CC-BY-SA license (https://creativecommons.org/licenses/by-sa/4.0/)
Available online at IJEE (Indonesian Journal of English Education)
Website: http://journal.uinjkt.ac.id/index.php/ijee

ANALYSIS OF A RESEARCH INSTRUMENT TO MAP ENGLISH TEACHERS' PROFICIENCY

Siti Mina Tamah, Anita Lie

Received: 28th March 2019; Revised: 27th May 2019; Accepted: 28th June 2019

ABSTRACT

Teachers' English proficiency can be measured by designing a research instrument in the form of a test, and the test devised must fulfill the requirements of a good test. This article discusses item analysis centering on the multiple choice questions used to measure the proficiency of Indonesian high school teachers involved in English instruction. The first, syllabus-oriented test set was tried out on 20 subjects, and the second, general-English-oriented set on 28 subjects. The analysis indicates that the item difficulty indices range from .20 to 1.00 for the first set and from .07 to .89 for the second. With regard to item discrimination, the D values range from -.33 to 1.00 for the first set and from -.11 to .78 for the second. The whole test is found to have an 'average' level of difficulty and to be 'good' at discriminating between high and low achieving test takers; before use in the actual research, the test was revised to eliminate the 'bad' items.

Key Words: item analysis; test; difficulty level; discrimination power; English proficiency; teacher

How to Cite: Tamah, S. M., & Lie, A. (2019). Analysis of a Research Instrument to Map English Teachers' Proficiency. IJEE (Indonesian Journal of English Education), 6(1), 48-64. doi:10.15408/ijee.v6i1.11888

INTRODUCTION

Teachers' subject matter mastery and teaching competence affect the attainment of instructional objectives.
Their skills and knowledge have been highlighted as a key component associated with clear objectives for student learning and accomplished teaching (OECD, 2005, cited in Caena, 2011). Teacher quality is in fact the key to enhancing students' achievement (Barber & Mourshed, 2007; Chetty et al., 2011; Rasmussen & Holm, 2012; Harjanto et al., 2018). It is therefore crucial that research on teacher competence be conducted.

With the increasing importance of English as a language of global communication, the quality of English instruction in schools has drawn research interest, particularly in countries where English is not the lingua franca. A number of studies on teachers' English proficiency have been conducted. Author (20xx) urged that, to set advanced competencies in the English curriculum, Indonesian teachers' English proficiency first had to be improved. Tsang (2011) investigated to what extent 20 primary school English teachers in Hong Kong were aware of English metalanguage and found the need for regular and systematic use of metalanguage among school teachers. Sharif (2013) was concerned that teachers' limited English proficiency distorted students' understanding of the content taught. Othman and Nordin (2013) studied the correlation between the Malaysian University English Test (MUET) and the academic performance of English teacher education students. Earlier, Lee (2004) criticized the use of the high-stakes MUET as a driver to improve English proficiency and suspected that the very traditional approach to teaching reading, with its focus on discrete skills, may have resulted from teachers' preoccupation with getting their students to pass the MUET. More recently, Nair and Arshad (2018) examined the discursive construction of Malaysian English language teachers in relation to the Malaysian Education Blueprint action plan from 2013 to 2015 and argued for ways to help teachers achieve the desired proficiency and make changes to existing classroom practices aligned with the government agenda.

The competence of Indonesian teachers of English has also been the focus of a number of studies. A study examining the English proficiency of teachers in West Java (Lengkanawati, 2005) used a TOEFL-equivalent test and found that the majority of the teachers did not demonstrate a satisfactory proficiency level. Aniroh (2009) discussed the need for ESP teachers to have a set of qualities, one of which is proficiency in English, but she did not further elaborate on the proficiency issue. Anugerahwati and Saukah (2010) studied the professional competence of English teachers in Indonesia and presented a profile of exemplary teachers based on qualitative descriptions of four research subjects. They argued that satisfactory competence in English "may seem to be taken for granted by many people other than the English teachers themselves. They tend to put a lot of pressure on themselves to excel in the subject matter. Actually this competence is already guaranteed by the requirement that a teacher has to have an S1 or D-IV degree qualification, and as such, it is understandable that other people view subject matter competence as something given by their formal education"
(p. 55). The guarantee of subject matter competence through teachers' formal education is still very much debatable, as graduate competence standards have yet to be established and enforced in English teacher education. Assessing English teachers' competence remains a salient issue. Soepriyatna (2012) investigated and assessed the competence of high school teachers of English in Indonesia and set three dimensions of the English language competence domain (language skills, linguistic, and sociocultural), two dimensions of the content knowledge domain (text types and grammar points), and seven dimensions of the teaching skills domain (objectives, material development, learning management, teaching techniques, learning styles, learning strategies, and qualities of an engaging teacher). He developed performance tasks to assess the twelve competence dimensions. The language proficiency covered in the first two domains is addressed in performance indicator statements such as "uses vocabulary correctly and appropriately" and "maintains grammatical accuracy." Soepriyatna did not address how those indicators can be determined reliably.

A test specifically constructed to assess the English proficiency of high school teachers is yet to be developed in Indonesia. The Ministry of Education has been administering an annual Teacher Competency Test for all teachers as part of the certification process. The online test comprises subject area and pedagogy items; it therefore does not specifically address language proficiency. Furthermore, there have been concerns that the test was not adequately constructed (Prasetyo, 2017; Putra, 2017). In line with these concerns, it is reported that of the eight national education standards, three (the teacher standard, the learning resources and facilities standard, and the graduate competence standard) are the weakest. Toni Toharudin, chair of the National School Accreditation Council, urges the government to play a more concrete role in enhancing teacher competence and distributing high-quality teachers equally across the regions (Eln, 2018).

An essential requirement for a test employed to convey teachers' proficiency is that it be a good research instrument: the test devised ought to be valid and reliable. One extensively used step toward fulfilling this requirement is analyzing the test items; Gronlund (1982: 101) simply puts it as "studying the students' responses to each item". Plakans and Gebril (2015) assert that item analysis is a checking procedure to see that test questions are at the right level of difficulty; it is also a procedure to check that test questions distinguish test takers appropriately.

Test item analysis based on classical measurement theory functions as an analysis tool to measure the item difficulty index, the item discrimination index, and distractor effectiveness (Hughes, 1989). Classical test theory places less demand on the number of test takers whose answers are analyzed. The theory is consequently more practical, since no formal training is needed before undertaking the analysis. The item analysis is easily performed manually, with, for instance, a calculator-assisted analysis, or by using a simple program in a computer.
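Such a program can indeed be quite small. Although the present study delimits its analysis to difficulty and discrimination, the sketch below illustrates the third check classical theory offers, a distractor tally; the data layout (one string of chosen option letters per test taker) and all names here are illustrative assumptions, not part of the study's instrument.

```python
from collections import Counter

def distractor_tally(responses, answer_key):
    """For each item, count how many test takers chose each option.
    A distractor chosen by nobody, or chosen more often than the key,
    is a candidate for revision."""
    report = []
    for i, key in enumerate(answer_key):
        counts = Counter(r[i] for r in responses)
        report.append({"item": i + 1, "key": key, "counts": dict(counts)})
    return report

# Hypothetical data: four test takers, two items, options A-D.
responses = ["BD", "BA", "CD", "BD"]
for row in distractor_tally(responses, answer_key="BD"):
    print(row)
# {'item': 1, 'key': 'B', 'counts': {'B': 3, 'C': 1}}
# {'item': 2, 'key': 'D', 'counts': {'D': 3, 'A': 1}}
```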
The weakness of this theory is the interdependency between test takers and item difficulty level. Item response theory appears as a response to this weakness of classical measurement theory. Under item response theory, also called "Rasch analysis" (Hughes, 1989: 163), item difficulty is ideally constant, taking no notice of whichever group is being tested. This theory performs item analysis by calculating the difficulty index only (commonly termed a one-parameter logistic model), the item difficulty index and the item discrimination index (prevalently termed a two-parameter logistic model), or the difficulty index, discriminating power, and a guessing element (labelled a three-parameter logistic model). The more elements to be analysed, the more test takers are needed for the analysis.
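For reference, the standard textbook forms of these three logistic models (not given in the source) can be written as follows, where theta is the test taker's ability, b_i the item difficulty, a_i the item discrimination, and c_i the guessing parameter:

```latex
% One-parameter (Rasch) model: item difficulty b_i only
\[ P_i(\theta) = \frac{1}{1 + e^{-(\theta - b_i)}} \]
% Two-parameter model: adds item discrimination a_i
\[ P_i(\theta) = \frac{1}{1 + e^{-a_i(\theta - b_i)}} \]
% Three-parameter model: adds a guessing (lower-asymptote) element c_i
\[ P_i(\theta) = c_i + \frac{1 - c_i}{1 + e^{-a_i(\theta - b_i)}} \]
```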
In conclusion, classical test theory is more practical than item response theory: it does not require a large number of test takers and can be applied more effortlessly by teachers or researchers.

This article presents the results of a test item analysis. The analysis is delimited to item difficulty and item discrimination, and it is carried out to contribute to revealing the reliability of an instrument for measuring high school teachers' English proficiency.

Difficulty level is most often paired with other terms having the same meaning, such as difficulty index, index of item difficulty, or facility value as used by Hughes (1989), Brown (2004), and Brown and Abeywickrama (2010), or item facility as used by Brown (1996). They all refer to the same construct. The difficulty index is a score indicating whether a test item is difficult or easy. The level of item difficulty can be explained by the percentage of test takers who answer a test item correctly: Gronlund (1982) points out that it is the percentage answering the item correctly, and Brown (1996: 64-65) similarly asserts that it is "a statistical index used to examine the percentage of students who correctly answer a given item." The difficulty index, symbolized as the P value, is therefore obtained by measuring how many test takers are able to answer the item correctly. It functions as an indicator for test makers to know the quality of their test by determining whether the test is difficult or easy, and difficulty analysis reveals the test takers' ability on the item being analyzed. With regard to a good P value, the majority of test analysts would argue for a level of 'sufficient' or 'medium' (a P value of 0.50). Meanwhile, Hughes (1989: 162) claims, "There can be no strict rule about what range of facility values are to be regarded as satisfactory. It depends on what the purpose of the test is … The best advice … is to consider the level of difficulty of the complete test."

Discriminating power likewise goes by several terms, such as discrimination index, item discrimination, level of discriminating, and index of discriminating; they all refer to the same construct. Some literature labels the index of item discriminating power with the letter 'D', while other literature uses the two letters 'DI'. This D (or DI) value reveals the discrimination power of a test item. To be more specific, it indicates "the degree to which an item separates the students who performed well from those who performed poorly" (Brown, 1996: 68); it therefore allows the test developer to contrast the performance of high achievers and low achievers. An item discrimination index of 1.00 is considered very good, as it indicates the maximum contrast between the upper and lower groups of students: all the high-scoring students answered the item correctly and all the low-scoring students answered it incorrectly (Brown, 1996: 68). For instance, if .90 of the upper group but only .30 of the lower group answer an item correctly, then D = .90 - .30 = .60.

In light of the need for better quality English instruction in Indonesia, our research team identified the research gap of mapping the content knowledge competence of English language teachers in Indonesian high schools and assessing their English proficiency. This study is part of a bigger research project funded in 2018 by the Indonesian Ministry of Research, Technology and Higher Education to conduct a mapping of high school teachers of English. This article presents the construction of a test to assess their English proficiency as a preliminary step before assessing their English language teaching competences.

METHOD

As previously mentioned in the background, the test constructed by the research team will be used as a research instrument to map the English proficiency of high school teachers in Indonesia.

Design

This study, which centers on item analysis, is quantitative in nature. The statistical formulas employed are the prevalently used difficulty and discriminating power values. In order for the test to be an accurate measure of what it is supposed to measure, more importantly in order that the test does not produce "a harmful backwash effect" (Hughes, 1989: 22-23), and to provide an effective strategy for determining the content of multiple choice questions (Plakans & Gebril, 2015), a test specification was prepared. A test specification provides "the construct framework for operationalizing the test design through subsequent item development" (Kopriva, 2008: 65). Despite the counterargument that multiple choice questions do not adequately simulate how language is used in real life, multiple choice questions occasionally provide better coverage of content than today's performance-based assessment (Plakans & Gebril, 2015). Furthermore, in spite of its drawbacks, the multiple choice format offers efficiency of administration, particularly when a large number of test takers is involved. These reasons led the research team to include the multiple choice type.

Subjects

There were 20 and 28 subjects involved in the first and second tests respectively. Some subjects were pre-service teachers or fresh graduates of the English Department of a Teacher Training Faculty who had not yet entered the teaching field. Other subjects were completing their last semester at the English Department of a Teacher Training Faculty, finishing their thesis writing.
The try-out subjects excluded the teachers who would be engaged in the subsequent research.

Instrument

The test was developed to cover three main categories: syllabus-oriented items, general English (grammar and reading comprehension), and an essay. Three test types were utilized: multiple choice, cloze test, and writing. Altogether, 65 items were developed. This paper presents only the analysis of the 50 multiple choice items (the other test types, a 15-item cloze test and a writing test, are not analysed). Among the seven multiple choice formats (Haladyna, Downing, & Rodriguez, 2002), the one used in this study was the conventional MC.

The first test set consists of 30 items and is presented in Table 1. The test specification guiding the construction of these 30 items is taken from the 2013 English Curriculum currently used for high school in Indonesia. The second test set, general English, consists of 20 items covering 10 grammar items and 10 reading comprehension items, as presented in Table 2 and Table 3 respectively.

Table 1. Test Specification (Syllabus-Oriented)

Basic Competence 1: Implement social function, text structure, and language feature … involving giving and asking personal (family and relative) information based on the appropriate context (focus on pronouns: subjective, objective, possessive).
  1. My mother's brother-in-law is my … (aunt / uncle / cousin / grandfather)

Basic Competence 2: Implement social function, text structure, and language feature … involving giving and asking information related to future intention based on the appropriate context (focus on "be going to", "would like to").
  2. Shinta … married next year. (is going to get / would like to get / got / are getting)
  3. Doni … a new job. (getting / would like to get / have got / are getting)

Basic Competence 3: Distinguish social function, text structure, and language feature … involving giving and asking information related to famous historical buildings based on the appropriate context (focus on e.g. adverbs "quite", "very").
  4. Borobudur Temple is … beautiful. (quite / quiet / quitely / quietly)

Basic Competence 4: Implement social function, text structure, and language feature … involving giving and asking information related to past events based on the appropriate context (focus on e.g. simple past tense vs present perfect tense).
  5. He … his leg in a car accident last year. (is breaking / broke / has broken / breaks)
  6. I left home at 7 a.m. and I … here at 1 p.m. (am getting / got / has gotten / get)
  7. I cannot go out because I … my work yet. (am not finishing / didn't finish / haven't finished / don't finish)

Basic Competence 5: Distinguish social function, text structure, and language feature … involving recount texts based on the appropriate context (focus on e.g. transitional words like "first", "then", "after that", "before", "when", "at last").
  8. … the movie ends, we head out for a late night snack. (Before / Then / After that / When)

Basic Competence 6: Distinguish social function, text structure, and language feature … involving narrative texts based on the appropriate context (focus on e.g. simple past tense, past continuous).
  9. Once upon a time, there was a little boy, who was poor, dirty, and smelly, … into a little village. (comes / is coming / coming / was coming)
  10. Kancil … quick-witted, so that every time his life was threatened, he managed to escape. (was / were / is / be)

Basic Competence 7: Implement social function, text structure, and language feature … involving giving and asking information related to suggestion and offering based on the appropriate context (focus on e.g. modal auxiliaries "should" and "can").
  11. Giving suggestion: (Can I help you? / I can walk that far. / I should go. / You should study harder.)
  12. Offering something: (Should I go to your house tonight? / Can I help you? / You can do it. / He should go to the doctor today.)

Basic Competence 8: Implement social function, text structure, and language feature … involving giving and asking information related to giving opinion based on the appropriate context (focus on e.g. "I think", "I suppose").
  13. Giving opinion: (In my opinion, she's pretty. / Can you give me your opinion? / He is thinking about her everyday. / He should go.)

Basic Competence 9: Distinguish social function, text structure, and language feature … involving actual issues based on the appropriate context (focus on transitional words like "therefore", "consequently").
  14. Madeline is rich, …, her cousin is poor. (however / otherwise / so / therefore)
  15. The students didn't study. …, they failed the course. (however / otherwise / so / therefore)

Basic Competence 10: Implement social function, text structure, and language feature … involving giving and asking information related to events or activities with the focus not on the doers based on the appropriate context (focus on e.g. passive voice).
  16. What is the passive voice of this sentence: Somebody stole my pen. (My pen has been stolen. / My pen was stolen. / My pen had stolen by somebody. / My pen is stolen.)
  17. What is the passive voice of this sentence: Have you finished the report? (Has the report been finished? / Has the report finished? / Has the report finished by you? / Has the report been finish?)
  18. This experience will never … by me. (forget / forgot / be forgot / be forgotten)
  19. The girl … by the boy. (was tease / tease / was teased / teases)
  20. Choose the correct sentence: (Her duty done by her. / Was her duty done by her? / Did she done her duty? / She was done her duty.)

Basic Competence 11: Implement social function, text structure, and language feature … involving giving and asking information related to cause-effect based on the appropriate context (focus on e.g. "because of", "due to").
  21. His defeat was … the lottery issue. (due to / because / since / thanked to)
  22. The crash occurred … the erratic nature of the other driver. (due / because / because of / thanked to)

Basic Competence 12: Distinguish social function, text structure, and language feature … involving nature or social issues based on the appropriate context (focus on transitional words like "if-then", "so", "as a consequence", "since", and passive voice).
  23. The snowfall came … the effects of El Nino. (as a consequence / due / since / because of)
  24. Serious threats … by genetic engineering. (is posed / will be posed / can be posed / pose)
  25. Deforestation … some rainforest ecosystems. (has been destroyed / have been destroyed / has destroyed / have destroyed)
Basic Competence 13: Distinguish social function, text structure, and language feature … involving news based on the appropriate context (focus on tenses like past tense, present perfect tense, future tense, passive voice, direct-indirect speech, prepositions).
  26. President Joko Widodo … to depart for Surakarta, Central Java, on Tuesday evening to pay his last respects to his in-law, Didit Supriyadi, who passed away in the morning. (set / sets / is set / are set)
  27. He asked her … him a cup of water. (give / giving / to give / gave)
  28. She told the boys … on the grass. (not to play / don't play / not play / doesn't play)
  29. Who are you waiting …? (by / in / for / at)
  30. Where's Martin? Is he … work today? (for / on / in / at)

Table 2. Test Specification (General English: Grammar), by grammar category

1. Verb; Tense (Past Tense): Your niece used to help you quite often, … ? (didn't she / wouldn't she / doesn't she / hadn't she)
2. Verb; Tense (Future Tense): If Anton . . . with us, he would have had a good time. (would join / had joined / would have join / joined)
3. Verb; Subjunctive: Honestly, I'd rather you … anything about it for the time being. (do / don't / didn't do / didn't)
4. Verb; Modal Auxiliary: Since he isn't answering his telephone, he . . . (must have left / need have left / should have left / can have left)
5. Verb; Tense (Perfect Tense): We were hurrying because we thought that the taxi . . . (had already came / had already come / has already came / have already coming)
6. Pronoun (Object Pronoun): Let you and … agree to straighten out our own problems. (I / me / myself / my)
7. Pronoun (Relative Pronoun): If you had told us earlier … he was, we could have introduced him at the meeting. (who / whom / which / whoever)
8. Pronoun (Relative Pronoun): The notebooks … Ben had lost at the bus station were returned to him. (what / which / who / whose)
9. Pronoun (as object of a sentence): They didn't seem to mind … TV while they were trying to study. (my watching / me watching / that I watch / me to watch)
10. Verb; Tense (Passive Voice): My pictures … until next week. (won't develop / don't develop / aren't developing / won't be developed)

Table 3. Test Specification (General English: Reading Comprehension), by Barrett Taxonomy level

1. Reorganization: Which of the following is the best title for this passage? (What the Eye Can See in the Sky / Bernard's Star / Planetary Movement / The Ever-moving Stars)
2. Inferential Comprehension: The expression "naked eye" in line 1 most probably refers to … (a telescope / a scientific method of observing stars / unassisted vision / a camera with a powerful lens)
3. Literal Comprehension: According to the passage, the distances between the stars and Earth are … (barely perceptible / huge / fixed / moderate)
4. Inferential Comprehension: The word "perceptible" in line 5 is closest in meaning to which of the following? (Noticeable / Persuasive / Conceivable / Astonishing)
5. Inferential Comprehension: In line 6, a "misconception" is closest in meaning to a(n) … (idea / proven fact / erroneous belief / theory)
6. Literal Comprehension: The passage states that in 200 years Bernard's star can move … (around Earth's moon / next to Earth's moon / a distance equal to the distance from Earth to the Moon / a distance seemingly equal to the diameter of the Moon)
7. Inferential Comprehension: The passage implies that from Earth it appears that the planets … (are fixed in the sky / move more slowly than the stars / show approximately the same amount of movement as the stars / travel through the sky considerably more rapidly than the stars)
8. Inferential Comprehension: The word "negligible" in line 8 could most easily be replaced by … (negative / insignificant / rapid / distant)
9. Inferential Comprehension: Which of the following is NOT true according to the passage? (Stars do not appear to the eye to move. / The large distances between stars and the earth tend to magnify movement to the eye. / Bernard's star moves quickly in comparison with other stars. / Although stars move, they seem to be fixed.)
10. Inferential Comprehension: The paragraph following the passage most probably discusses … (the movement of the planets / Bernard's star / the distance from Earth to the Moon / why stars are always moving)

Data Collection

The test was tried out using two modes of administration: an online version (making use of Google Forms) and an offline version, commonly known as a paper-based test. Subjects who did the timed online version were given a one-week administration window. A 60-minute classroom session at a university in Nusa Tenggara Timur province in eastern Indonesia was administered offline because of the poor internet connection.

Data Analysis Procedure

The collected try-out results were analysed quantitatively using two statistical formulas. The first, prevalently employed formula, for the difficulty level, is taken from Gronlund (1982):

P = R / T

where
P = the proportion of test takers who answered the item correctly (reported here as a decimal),
R = the number who answered correctly, and
T = the total number who tried the item.

The second formula, for the index of discriminating power, is taken from Brown (1996):

D = IF_upper - IF_lower

where
D = the item discrimination power for an individual item,
IF_upper = the item facility (P value) for the upper group on the whole test, and
IF_lower = the item facility (P value) for the lower group on the whole test.
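The two formulas above are straightforward to compute. The following minimal sketch implements them for scored 0/1 responses; it is an illustration, not the authors' actual procedure, and the half-split into upper and lower groups is an assumption, since the source does not state its grouping method.

```python
def item_analysis(scores):
    """scores: one list of 0/1 item scores per test taker.
    Returns (P, D): difficulty indices and discrimination powers."""
    n, n_items = len(scores), len(scores[0])
    # Difficulty index P = R / T (Gronlund, 1982)
    P = [sum(s[i] for s in scores) / n for i in range(n_items)]
    # Rank test takers by total score, then split into upper and lower groups.
    ranked = sorted(scores, key=sum, reverse=True)
    k = n // 2  # half-split is an assumption; upper/lower thirds are also common
    upper, lower = ranked[:k], ranked[-k:]
    # Discrimination D = IF_upper - IF_lower (Brown, 1996)
    D = [sum(s[i] for s in upper) / k - sum(s[i] for s in lower) / k
         for i in range(n_items)]
    return P, D

# Hypothetical try-out: 4 test takers, 3 items.
scores = [[1, 1, 1], [1, 0, 1], [0, 1, 0], [0, 0, 0]]
P, D = item_analysis(scores)
print(P, D)  # P = [0.5, 0.5, 0.5], D = [1.0, 0.0, 1.0]
```

Here items 1 and 3 separate the groups perfectly (D = 1.0), while item 2 does not discriminate at all (D = 0.0) even though all three items share the same difficulty.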
FINDINGS AND DISCUSSION

The analysis of the first test set indicates that the item difficulty indices (P values) range from .75 to 1.00 for easy items, which amount to 33.3%; from .35 to .70 for average items, amounting to 56.7%; and from .20 to .25 for difficult items, reaching only 10%, the smallest percentage (see Figure 1). The average items thus occupy the highest percentage rank. Calculating the average difficulty level of the syllabus-oriented test (the first test set), the researchers find it to be .64, revealing an average level of difficulty.

Figure 1. Item Difficulty of Syllabus-Oriented Items

Meanwhile, as displayed in Figure 2, the indices of discriminating power range from -.33 to 1.00. With D values of .83 to 1.00, seven items (23.3%) are 'very good' at discriminating between the high achieving test takers and the low ones. With D values of .50 to .67, nine items (30%) are 'good' at discriminating between the two groups. Five items (16.7%) have a D value of .33, indicating they are 'sufficient' discriminators. Nine items (30%) are 'bad' ones: they cannot distinguish between the two groups well, and one of them has a negative value (-.33). The average index of discriminating power for the syllabus-oriented test is .43, indicating 'good' discriminating power.

Figure 2. Discriminating Power of Syllabus-Oriented Items

The analysis of the second test set, as seen in Figure 3, indicates that the item difficulty indices (P values) range from .79 to .89 for easy items, which amount to 15%, and from .32 to .68 for average items, amounting to 75%. The indices range from .07 to .29 for difficult items, reaching 10%, the smallest percentage of the total. Again, the average items occupy the highest percentage rank. Calculating the average difficulty level of the general-English-oriented test (the second test set), the researchers find it to be .55, revealing an average level of difficulty.

Figure 3. Item Difficulty of General-English-Oriented Items

Meanwhile, as seen in Figure 4, the indices of discriminating power range from -.11 to .78. With a D value of .78, only one item (5%) is 'very good' at discriminating between the high achieving test takers and the low ones. With D values of .44 to .67, ten items (50%) are 'good' discriminators. Five items (25%) have D values of .22 to .33, indicating they are 'sufficient' discriminators. Four items (20%) are found to be 'bad' ones: they cannot distinguish between the two groups well, and one of them has a negative value (-.11). The average index of discriminating power for the general-English-oriented test is .39, indicating 'sufficient' discriminating power.

Figure 4. Discriminating Power of General English Items

When all 50 items are combined and analysed for their P and D values, it is found (see Figure 5) that 13 items (26%) belong to the easy category (ranging from .75 to 1.00), 32 items (64%) to the average category (ranging from .32 to .70), and 5 items (10%) to the difficult category (ranging from .07 to .29).

Figure 5. Item Difficulty of All Items
It is also found (see Figure 6) that 8 items (16%) are 'very good' at discriminating test takers (D values from .83 to 1.00), 19 items (38%) are 'good' (D values from .44 to .67), 10 items (20%) are 'sufficient' (D values from .22 to .33), and 13 items (26%) are 'bad' (D values from -.33 to 0). Two of these 13 items have negative values (-.33 and -.11).

Figure 6. Discriminating Power of All Items

Combining the detailed calculations for the two test sets (syllabus-oriented and general English), the researchers find that the average P value equals .60 and the average D value equals .41. This makes it evident that the devised test has reached the category of average item difficulty and the classification of good at discriminating between high and low achieving test takers. This finding is congruent with Sim and Rasiah (2006), who state that MCQ items demonstrating a good discrimination index tend to be average items in terms of difficulty. They further claim that items in the moderately difficult to very difficult range are more likely to show negative discrimination.

Nevertheless, since nine and four 'bad' items appear in the first and second test sets respectively, the test devised for inclusion in the actual research should be reassessed. The bad items can simply be eliminated or improved by developing some more items. Following Boopathiraj and Chellamani's (2013) suggestion, the items kept for the actual research instrument should be arranged in such a way that items of higher, moderate, and lower indices of difficulty are organized in a balanced composition, as the sketch below illustrates.
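One hedged illustration of such a balanced arrangement is to band items by P value and interleave the bands; the banding thresholds and the round-robin ordering below are assumptions for illustration, not values from Boopathiraj and Chellamani (2013).

```python
def balance_by_difficulty(items, p_values):
    """Band items as easy/average/difficult by P value, then interleave
    the bands so that no difficulty level clusters together."""
    bands = {"difficult": [], "average": [], "easy": []}
    for item, p in zip(items, p_values):
        if p < 0.30:
            bands["difficult"].append(item)
        elif p <= 0.70:
            bands["average"].append(item)
        else:
            bands["easy"].append(item)
    ordered = []
    queues = [bands["easy"], bands["average"], bands["difficult"]]
    while any(queues):
        for q in queues:  # round-robin across the three bands
            if q:
                ordered.append(q.pop(0))
    return ordered

# Hypothetical five items with illustrative P values.
print(balance_by_difficulty(["i1", "i2", "i3", "i4", "i5"],
                            [0.9, 0.5, 0.2, 0.6, 0.8]))
# -> ['i1', 'i2', 'i3', 'i5', 'i4']
```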
CONCLUSION AND SUGGESTIONS

This article reports a test item analysis centering on the multiple choice questions used to measure the proficiency of Indonesian high school teachers involved in English instruction. Restricted to the analyses of item difficulty and item discrimination, the study has found that, for the whole test (covering syllabus-oriented and general-English-oriented items), the average P value equals .60 and the average D value equals .41. It is evident that the devised test has reached the category of average item difficulty and the classification of good at discriminating between high and low achieving test takers. The complete test should, however, be improved for the actual research, since some items (slightly above a quarter) are indicated as 'bad' at discriminating test takers.

The results of the item analysis of the devised test can hopefully become a section in a good item bank for decision makers dealing with teacher professional development. Another suggestion might be for test developers to consider the needs of test takers by developing a test that further explores the possibility of co-certification, as exemplified by Newbold (2012).

Acknowledgements

The authors disclosed receipt of financial support for the research, authorship, and/or publication of this article from the Directorate of Research and Community Service, Indonesian Ministry of Research, Technology and Higher Education.

REFERENCES

Aniroh, K. (2009). From English as a general school subject onto English as a medium for learning specific subjects: The need to shift in the teaching orientation. TEFLIN Journal, 20(2), 169-179.

Anugerahwati, M., & Saukah, A. (2010). Professional competence of English teachers in Indonesia: A profile of exemplary teachers. Indonesian Journal of English Language Teaching, 6(2), 47-59.

Barber, M., & Mourshed, M. (2007). How the world's best performing schools come out on top. London: McKinsey and Company.

Boopathiraj, C., & Chellamani, K. (2013). Analysis of test items on difficulty level and discrimination index in the test for research in education. International Journal of Social Science & Interdisciplinary Research, 2(2), 189-193.

Brown, D. H. (2004). Language assessment: Principles and classroom practices. New York: Longman.

Brown, D. H., & Abeywickrama, P. (2010). Language assessment: Principles and classroom practices (2nd ed.). New York: Pearson Longman.

Brown, J. D. (1996). Testing in language programs. New Jersey: Prentice Hall Regents.

Caena, F. (2011). Teachers' core competences: Requirements and development. European Commission. http://ec.europa.eu/dgs/education_culture/repository/education/policy/strategic-framework/doc/teacher-competences_en.pdf

Chetty, R., Friedman, J. N., & Rockoff, J. E. (2011). The long-term impacts of teachers: Teacher value-added and student outcomes in adulthood (Working Paper 17699). National Bureau of Economic Research. http://www.nber.org/papers/w17699

Eln. (2018, August 10). Fokus pada perbaikan standar mutu guru [Focus on improvement of teacher quality standards]. Kompas, p. 12.

Gronlund, N. E. (1982). Constructing achievement tests (3rd ed.). New York: Prentice Hall.

Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A review of multiple-choice item-writing guidelines for classroom assessment. Applied Measurement in Education, 15(3), 309-334.

Harjanto, I., Lie, A., Wihardini, D., Pryor, L., & Wilson, M. (2018). Community-based teacher professional development in remote areas in Indonesia. Journal of Education for Teaching, 44(2), 212-231. https://doi.org/10.1080/02607476.2017.1415515

Hughes, A. (1989). Testing for language teachers. Cambridge: Cambridge University Press.

Kopriva, R. J. (2008). Improving testing for English language learners. New York: Routledge.

Lee, K. S. (2004). Exploring the connection between the testing of reading and literacy: The case of the MUET. GEMA Online Journal of Language Studies, 4(1).

Lengkanawati, N. (2005). EFL teachers' competence in the context of English curriculum 2004: Implications for EFL teacher education. TEFLIN Journal, 26(1), 79-92.
Nair, R., & Arshad, R. (2018). The discursive construction of teachers and implications for continuing professional development. Indonesian Journal of Applied Linguistics, 8(1), 131-138. doi:10.17509/ijal.v8i1.11472

Newbold, D. (2012). Local institution, global examination: Working together for a 'co-certification'. In D. Tsagari & I. Csépes (Eds.), Collaboration in language testing and assessment (pp. 127-142). Frankfurt: Peter Lang.

Othman, J., & Nordin, A. (2013). MUET as a predictor of academic achievement in ESL teacher education. GEMA Online Journal of Language Studies, 13(1), 99-111.

Plakans, L., & Gebril, A. (2015). Assessment myths. University of Michigan Press.

Prasetyo, S. (2017, July 2). Uji kompetensi guru, tes sesuaikan kompetensi guru [Teacher competency test: Tests should match teacher competency]. Jawa Pos. Retrieved 9 August 2018 from https://www.jawapos.com/pendidikan/02/07/2017/uji-kompetensi-guru-tes-sesuaikan-kompetensi-guru

Putra, I. (2017). Pretest UKG (Ujian Kompetensi Guru) ini ujian apa ya? [This teacher competency test pretest, what kind of test is it?]. Kompasiana. Retrieved 9 August 2018 from https://www.kompasiana.com/indra-yahdi/59afb95aa32cdd1bae7721d3/pretest-ukg-ujian-kompetensi-guru-ini-ujian-apa-yaaa

Rasmussen, J., & Holm, C. (2012). In pursuit of good teacher education: How can research inform policy? Reflecting Education, 8(2), 62-71. http://reflectingeducation.net

Sharif, A. (2013). Limited proficiency English teachers' language use in science classrooms. GEMA Online Journal of Language Studies, 13(2), 65-80.

Sim, D. S., & Rasiah, R. I. (2006). Relationship between item difficulty and discrimination indices in true/false-type multiple choice questions of a para-clinical multidisciplinary paper. Annals Academy of Medicine, 35(2), 67-71.

Soepriyatna. (2012). Investigating and assessing competence of high school teachers of English in Indonesia. Malaysian Journal of ELT Research, 8(2), 38-49.

Tsang, W. L. (2011). English metalanguage awareness among primary school teachers in Hong Kong. GEMA Online Journal of Language Studies, 11(1), 1-16.