THE QUALITY OF ENGLISH LANGUAGE TESTING IMPLEMENTED IN KBRI SCHOOL, SEKOLAH INDONESIA KUALA LUMPUR, MALAYSIA

Nur Kholilah
Email: nur_kholilah94@gmail.com
Universitas Islam Negeri Sunan Ampel Surabaya

Abstract. This study aims to determine the quality of the English language test given to second-grade senior high school students at Sekolah Indonesia Kuala Lumpur, Malaysia, in terms of content validity, index of difficulty, index of discrimination, and the effectiveness of distracters. The study uses a descriptive qualitative design, supported by descriptive quantitative analysis to calculate and compute the data, to substantiate the qualitative data, and to draw conclusions. The object of the research is the second grade of senior high school at Sekolah Indonesia Kuala Lumpur, Malaysia, and the analysis focuses only on multiple-choice test items. The sample consists of all students in the second-grade Science class and the second-grade Social class. The results show that the English test has good content validity and that its index of difficulty is acceptable. In addition, the index of discrimination of the test is satisfactory, and the test has good distracters.

Key Words: Content Validity, Item Analysis, Index of Difficulty, Index of Discrimination, the Effectiveness of Distracters

INTRODUCTION

English learning in junior high school is oriented toward the functional level, which means that students should be able to communicate orally and in writing in their daily activities. English learning in senior high school, by contrast, is expected to reach the informational level, because students are being prepared to continue their studies at university (Depdiknas: 2004). English is therefore an important subject in senior high school, because students are being prepared for their later university studies.
Senior high school students are expected to master English well before they go on to university.

Kholilah 150 IJET | Volume 5, Issue 1. July 2016

A test provides information not only about individual students' performance but also about the effectiveness of teaching and learning activities. A test is one type of measurement, used to measure student behavior against the goals of instruction. For teachers, a test is used to measure the effectiveness of teaching and learning activities (Mursyidah: 2009).

According to Norris (2000), language teachers are often faced with the responsibility of selecting or developing language tests for their classrooms and programs. However, deciding which testing alternatives are the most appropriate for a particular language education context can be daunting, especially given the increasing variety of instruments, procedures, and practices available for language testing. Such alternatives include not only test types with long traditions of use (such as multiple choice, matching, true-false, and fill-in-the-blank tests; cloze and dictation procedures; essay exams; and oral interviews) but also tests differing in scope and structure from these well-known options. For example, technological developments have led to a number of new language testing formats, including computer-based and computer-adaptive tests (Brown 1997; Dunkel 1999; Yao and Ning 1998), audiotape-based oral proficiency interviews (Norris 1997; Stansfield and Kenyon 1992), and web-based testing (Roever 1998).

Testing has an important role in teaching and learning activities. Teaching without testing is of little use, because testing shows whether the objectives of education have been achieved. The result of a test shows whether the teaching and learning process has been successful. Testing and teaching are so closely interrelated that it is virtually impossible to work in either field without being constantly concerned with the other (Heaton: 1988).
It is clear that the relation between testing and teaching cannot be ignored. Teachers, students, and schools want to know whether their efforts to achieve the educational objectives are successful. They will be satisfied if their efforts succeed; if their efforts do not succeed, they will change their approach (Utami: 2013).

Chittenden states that the purposes of testing are "keeping track, checking-up, finding-out, and summing up". Keeping track is collecting data about students' progress in the learning process at school. Checking-up is checking the students' skills in the learning process and identifying their weaknesses. Finding-out is searching for, finding, and detecting the students' weaknesses and mistakes in the learning process. Summing up is drawing conclusions about the students' learning progress against the standard competency of the school (Arifin: 2012).

Thus, as one tool of evaluation, testing needs to be employed in teaching activities. Moreover, it has many benefits that support the success of the teaching and learning process, such as:
(1) To measure language proficiency.
(2) To diagnose students' strengths and weaknesses, to identify what they know and what they do not know.
(3) To discover how successful students have been in achieving the objectives of a course of study.
(4) To assist the placement of students by identifying the stage or part of a teaching program most appropriate to their ability (Hughes: 2003).

Given the above, it is very important to have tests of one kind or another that are valid, well designed, and well formulated. Hughes states that a test is valid if it measures accurately what it is intended to measure. Nurkanca and Sumartana also point out that a good test should be reliable and valid and should have an appropriate index of difficulty and discriminating power (Nurkanca and Sumartana: 1986).
Language testers are sometimes asked to say what is 'the best test' or 'the best testing technique'. Such a question reveals a misunderstanding of what is involved in the practice of language testing. A test that proves ideal for one purpose may be quite useless for another; a technique that may work very well in one situation can be entirely inappropriate in another. Equally, two teaching institutions may require different tests, depending on the objectives of their courses, the purpose of the tests, and the resources available (Hughes: 2003). It follows that the teacher must recognize which test is appropriate for measuring the students' skills, and must also create a test that suits the students' ability.

In this research, the researcher focuses on the language testing techniques that teachers use in schools. The researcher wants to identify how the teacher constructs the test for the students: which technique the teacher uses, how the test measures the students' skills, and whether the test is suitable for the students.

Indonesia has an embassy, the Embassy of the Republic of Indonesia or KBRI (Kedutaan Besar Republik Indonesia), in every country with which it has relations. KBRI builds schools in countries such as Singapore, Malaysia, and Thailand, and these schools remain under KBRI control. One example is Sekolah Indonesia Kuala Lumpur, a KBRI school located at Lorong (street) Tun Ismail no. 1, 50480 Kuala Lumpur, Malaysia. Because the school is under the supervision of the Indonesian embassy, its curriculum and rules are based on the Indonesian curriculum. Like schools in Indonesia, Sekolah Indonesia Kuala Lumpur bases its curriculum on the Education National Standard or BSNP (Badan Standar Nasional Pendidikan) in Indonesia.
In terms of method, the teacher uses CTL (Contextual Teaching and Learning) for the English subject. This method aims to help students know and use the target language in real situations. For textbooks in senior high school English, the teacher uses Indonesian books from Dinas Pendidikan Indonesia (the Indonesian Education Agency) as well as Singaporean books. The use of CTL indicates that the school follows KTSP, the School-Based Curriculum. The competency standards of the school are Standar Kompetensi 2006, the same as in Indonesia.

The reason for conducting this research is that the researcher wants to identify the validity of the test used in the school. A further question is whether Malaysian culture influences the way the teachers teach the subject and construct the tests for the students. From these points, the researcher wants to know how the teachers test the students in the school and which techniques they use to do so.

Based on the above background, this research attempts to answer questions about the quality of the English language testing used in the KBRI school, Sekolah Indonesia, in Malaysia, and to describe it.

TESTING AND TEACHING

A test is a set of techniques, procedures, and items that constitute an instrument of some sort that requires performance or activity on the part of the test taker (and sometimes on the part of the tester as well) (Douglas: 2001). A test is a procedure designed to elicit certain behavior from which one can make inferences about certain characteristics of an individual (Bachman: 1990). In line with that, a test, as quoted from Webster's Collegiate by Daryanto, is any series of questions or exercises or other means of measuring the skill, knowledge, intelligence, capacities, or aptitudes of an individual or group (Daryanto: 1999).
In other words, as Kubizyn and Borich (2003) state, tests are just tools that can contribute importantly to the process of evaluating pupils, the curriculum, and the teaching method.

The effect of testing on teaching and learning is known as backwash, and it can be harmful or beneficial. If a test is regarded as important, if the stakes are high, preparation for it can come to dominate all teaching and learning activities. And if the test content and testing techniques are at variance with the objectives of the course, there is likely to be harmful backwash. An instance of this would be where students are following an English course that is meant to train them in the language skills (including writing) necessary for university study in an English-speaking country, but where the language test that they have to take in order to be admitted to a university does not test those skills directly. If the skill of writing, for example, is tested by multiple choice items, then there is great pressure to practice such items rather than to practice the skill of writing itself. This is clearly undesirable (Hughes: 2003).

Standards in testing

One area of increasing concern in language testing has been that of standards. The word 'standards' has various meanings in the literature, as the Task Force on Language Testing Standards set up by ILTA discovered. One common meaning used by respondents to the ILTA survey was that of procedures for ensuring quality, standards to be upheld or adhered to, as in codes of practice. A second meaning was that of levels of proficiency ('what standard have you reached?'). A related, third meaning was that contained in the phrase 'standardized test', which typically means a test whose difficulty level is known, which has been adequately piloted and analyzed, and the results of which can be compared with those of a norming population: standardized tests are typically norm-referenced tests.
In the latter context 'standards' is equivalent to 'norms'.

In recent years, language testing has sought to establish standards in the first sense (codes of practice) and to investigate whether tests are developed following appropriate professional procedures. Groot argues that the standardization of procedures for test construction and validation is crucial to the comparability and exchangeability of test results across different education settings. Alderson and Buck and Alderson et al. describe widely accepted procedures for test development and report on a survey of the practice of British EFL examining boards. The results showed that current (in the early 1990s) practice was wanting. Practice and procedures among boards varied greatly, yet (unpublished) information was available which could have attested to the quality of examinations. Exam boards appeared not to feel obliged to follow or indeed to understand accepted procedures, nor did they appear to be accountable to the public for the quality of the tests they produced. Fulcher and Bamford (1996) argue that testing bodies in the USA conduct and report reliability and validity studies partly because of a legal requirement to ensure that all tests meet technical standards. They conclude that British examination boards should be subject to similar pressures of litigation on the grounds that their tests are unreliable, invalid or biased. In the German context, Kieweg (1999) makes a plea for common standards in examining EFL, claiming that within schools there is little or no discussion of appropriate methods of testing or of procedures for ensuring the quality of language tests (Alderson and Banerjee: 2001).

The purpose of test

A test is used to measure students' mastery of the subject taught. Some experts mention other purposes as well. According to Nurkanca and Sumartana (1986), a test has many purposes.
The first is to find out how far a program that has been applied has reached its goal. The second is to see whether the materials should be re-taught. The third is to obtain information about the students' weaknesses and difficulties in learning the given materials. The fourth is to determine the students' achievement and to allow them to proceed to the next grade. The fifth is to select and group students based on their achievement.

David (1959) proposed six objectives of language testing:
1. To determine readiness for instructional programs.
2. To classify or place individuals in appropriate language classes.
3. To diagnose the individual's specific strengths and weaknesses.
4. To measure aptitude for learning.
5. To measure the extent of student achievement of the instructional goals.
6. To evaluate the effectiveness of instruction.

Characteristic of a Good Test

A test is an important instrument in the teaching and learning process for measuring students' mastery of the materials. The effectiveness of a test is judged against a set of criteria. According to Arikunto, the criteria of a good test are validity, reliability, objectivity, practicality, and economy (Brown: 2004).

Validity

A test is classified as valid if it measures accurately what it is intended to measure. According to Heaton, the validity of a test is the extent to which it measures what it is supposed to measure and nothing else. There are four types of validity: face validity, content validity, construct validity, and empirical validity (Heaton: 1988).

Reliability

One of the necessary characteristics of a good test is reliability. A test is said to be reliable if it is consistent in its measurements. This means that a student should receive the same mark when the test is marked by two or more examiners. Moreover, a number of factors may contribute to the unreliability of a test.
According to Heaton (1988), the factors affecting reliability are:
1) The extent of the material selected for testing. Reliability is related to the size of the test, which should be neither too long nor too short.
2) The administration of the test. The students or test takers must have the same conditions and time limit.
3) The instructions. The clarity of the instructions affects how well the students understand what they must do to answer the test.
4) Personal factors, such as motivation and illness.
5) Scoring the test. An objective test is more reliable than a subjective test.

There are several methods of estimating reliability, such as the test-retest method, the split-half method, the equivalent method, and the internal consistency method. Here, the researcher uses the split-half method, because the test was administered only once. The correlation between the two halves of the test is computed with the formula:

$$r_{\frac{1}{2}\frac{1}{2}} = \frac{N\sum X_1 Y_1 - \left(\sum X_1\right)\left(\sum Y_1\right)}{\sqrt{\left\{N\sum X_1^2 - \left(\sum X_1\right)^2\right\}\left\{N\sum Y_1^2 - \left(\sum Y_1\right)^2\right\}}}$$

Here $N$ is the number of students, and $X_1$ and $Y_1$ are each student's scores on the two halves of the test. The result is then corrected with the Spearman-Brown formula:

$$r_{11} = \frac{2 \times r_{\frac{1}{2}\frac{1}{2}}}{1 + r_{\frac{1}{2}\frac{1}{2}}}$$

The criteria of reliability are:

Coefficient | Criteria
--- | ---
0.00-0.20 | Not reliable
0.20-0.40 | Less reliable
0.40-0.60 | Reliable enough
0.60-0.80 | Reliable
0.80-1.00 | Very reliable

Objectivity

According to Arikunto (1993), a test is called objective if it is free from subjective factors that influence the test. The objectivity of a test can be increased by using more objective types of test items and by scoring the answers according to the model answers provided. Arikunto adds that two factors influence the objectivity of a test: the form of the test and the test scorer.

Practicality

A test is called practical if it is easy to administer, does not require much equipment, allows the students to do the easier parts first, is easy to score, and comes with clear instructions.
Arikunto (1993) states that the practicality of a test concerns the level of difficulty in administering the test itself.

Item Analysis

The purpose of item analysis is to identify whether each test item is good or not. To answer this, every item should be examined in terms of its index of difficulty and its index of discrimination.

METHODOLOGY

Based on the problem statements of this study, the goal of the research is to describe the language testing technique used by Sekolah Indonesia Kuala Lumpur and to explain how valid the test is for the students. The research therefore uses a qualitative descriptive design. A descriptive study is designed to obtain information concerning particular issues and then describe them. Arikunto (1993) states that descriptive research is not meant to test a certain hypothesis; it only describes the phenomena, situation, and condition that occur during the study. Best further divides descriptive research into four types: document or content analysis study, case study, ethnographic study, and explanatory observation study. A document or content analysis study is concerned with explaining the status of a phenomenon at a particular time. A case study is a way of organizing social data for the purpose of viewing social reality. An ethnographic study is the process of collecting data on many variables over an extended period of time in a naturalistic setting. An explanatory observation study seeks to find answers to questions through the analysis of relationships between variables (Best: 1981).
From the statements above, it can be concluded that this study is categorized as a document or content analysis study, since it concerns the language testing technique used in Sekolah Indonesia Kuala Lumpur. The research also follows a concurrent embedded method, because it combines a qualitative approach and a quantitative approach to obtain and analyze the data.

Data Collection Technique and Instrument

In this research, the researcher uses the document study technique to answer the research questions. The teacher-made English test, the answer key, and the Standard of Graduates Competence for the academic year 2012-2013 are used to examine the validity of the test. The students' answer sheets and the students' scores on the teacher-made English test are used to examine the reliability, the index of difficulty, the index of discrimination, and the distracters of the English test. Together, these instruments provide the evidence for all the research questions.

Data Analysis Procedure

In this study, the researcher uses the interview technique because, from the researcher's point of view, it is appropriate for collecting the data and is the easiest way to answer the research questions. The researcher analyzes the data in the following steps:
1. Describing the document (the English final examination) for the second grade of Sekolah Indonesia Kuala Lumpur.
2. Analyzing the test on the basis of language testing technique and determining whether the test is valid.

Analyzing The Face Validity

Face validity will be high if the students or test takers encounter some or all of the characteristics of good face validity, as follows:

Step | Aspect of test | Questions
--- | --- | ---
1 | Test appearance | How is the cover of the test? How is the lettering used in the test? How is the test layout? How is the size of the test paper used?
2 | The direction | How is the general instruction of the test? How is the specific instruction of the test? How is the instruction for going on to the next section on the following page?
3 | Test item types | How many types of test have been chosen? How are the tests presented?

Analyzing the Content Validity

In analyzing the content validity, the researcher wants to know how accurately the English test matches the indicators of the curriculum, or Standard Competencies. The researcher collects the data through the following steps:
1. Making a list of the standard competencies, basic competencies, indicators, and learning experiences for second-grade students of senior high school, together with the indicators of the basic competencies given.
2. Placing each test item against the appropriate standard competency and basic competency, to identify whether the standard competencies and basic competencies are covered by the final test.
3. Counting the percentage of test items for every language aspect.
4. Concluding the result of the analysis.

ITEM ANALYSIS

Index of Difficulty

The index of difficulty of an item simply shows how easy or difficult the particular item proved in the test (Heaton: 1988). To analyze the index of difficulty of the test items, the researcher takes the following steps:
1. Arranging the students' scores from the highest to the lowest.
2. Finding the top and the bottom of the students' scores, as upper and lower groups. The scripts are divided in rank order of total score into two groups of equal size, the top half as the upper group and the bottom half as the lower group.
3. Computing the item difficulty using the formula of Heaton (1988) below:

$$FV = \frac{R}{N}$$

Where:
FV = index of difficulty
R = the number of students who answer correctly
N = the number of students taking the test

The result is classified based on the criteria of Arikunto (1993), as follows:
1. Test items with 0.00 – 0.30: Difficult
2. Test items with 0.31 – 0.70: Moderate
3. Test items with 0.71 – 1.00: Easy

Analyzing the Index of Discrimination

The index of discrimination indicates the extent to which the item discriminates between the testees, separating the more able testees from the less able (Heaton: 1988). The analysis of the index of discrimination uses the same steps as the analysis of the index of difficulty:
1. Arranging the students' scores from the highest to the lowest.
2. Finding the top and the bottom of the students' scores, as upper and lower groups. The scripts are divided in rank order of total score into two groups of equal size, the top half as the upper group and the bottom half as the lower group.
3. Calculating the index of discrimination with the formula below:

$$D = \frac{\text{Correct U} - \text{Correct L}}{n}$$

Where:
D = the index of discrimination
Correct U = the number of students in the upper group who answered the item correctly
Correct L = the number of students in the lower group who answered the item correctly
n = the number of students taking the test in one group

The result is classified based on the criteria of Arikunto (1993), as follows:
1. Test items with 0.00 – 0.20: Poor
2. Test items with 0.21 – 0.40: Satisfactory
3. Test items with 0.41 – 0.70: Good
4. Test items with 0.71 – 1.00: Excellent

Analyzing the Effectiveness of Distracter

Besides calculating the index of difficulty and the index of discrimination, it is also important to analyze the items in detail, especially those that do not perform as expected. Distracter analysis aims not only to find out which items do not work properly but also to check why particular test takers failed to answer certain items correctly. Distracters have functioned well if they are chosen mostly by students from the lower group. According to Arikunto (1993), the
distracter that is chosen by at least 5% of the students taking the test is called a good distracter. In addition, to evaluate the effectiveness of the distracters, the researcher should determine the number of students from the upper and lower groups who chose each option in each item. The researcher also determines the number of students who did not choose any option at all (omit). To ease the analysis, the researcher used the table below, which shows an example of analyzing the effectiveness of distracters (Arikunto: 1993).

Item Number | Options | Upper Group | Lower Group | Comment
--- | --- | --- | --- | ---
1 | A | 1 | 8 | Good
 | B* | 22 | 11 | Good
 | C | 1 | 2 | Good
 | D | 1 | 4 | Good
 | O | 0 | 0 | NF

FINDINGS

After classifying the students into the upper and the lower groups, the next step is analysing the validity and the items. The researcher examines two kinds of validity: face validity and content validity. Item analysis includes the index of difficulty, the index of discrimination, and the distracters.

Face validity

To show the face validity of the English test for the second grade in Sekolah Indonesia Kuala Lumpur, the researcher took two steps. The first step was classifying the material of the test. The second step was analysing the test based on the criteria in the table below.

First, the test is printed on A4 paper and consists of seven pages. The first page is the cover, which carries the logo of the school, the subject of the test, and the date and time of the test. The second to sixth pages contain the reading section, which consists of forty items. The last page contains the essay test, but in this research the researcher analyzed only the multiple-choice test. Second, the researcher analyzed the test based on the table below.

The table below shows the result of the analysis of the face validity of the test (Utami: 2013).
Step | Aspect of test and questions | Explanation
--- | --- | ---
1 | Test appearance: How is the cover of the test? | The test cover used black colours and a suitable font and can be read easily.
 | How is the lettering used in the test? | The size of the letters in the test is 12 and can be read easily.
 | How is the test layout? | The test had a good layout. The pictures used in the test are understandable.
 | How is the size of the test paper used? | The test used A4 paper.
2 | The direction: How is the general instruction of the test? | The general instructions of this test are understandable.
 | How is the specific instruction of the test? | This test had no specific instruction.
 | How is the instruction for going on to the next section on the following page? | This test had no instruction for going on to the next section or for the ending.
3 | Test item types: How many types of test have been chosen? | This test had 2 types of test: a multiple-choice test and an essay test.
 | How are the tests presented? | This test is quite well presented in its layout and arrangement.

Content Validity

To show the result of the content validity analysis of the English test for the second grade of high school at Sekolah Indonesia Kuala Lumpur, Malaysia, the researcher uses the Standard of Graduates competencies of 2012 to determine how the test connects with the standard competencies. The content validity analysis used a table of specification with seven columns. The first column contains the standard competence, the second column contains the basic competencies, the third column contains the indicators, the fourth column contains the learning experience, the fifth column contains the test items that match the basic competencies, the sixth column contains the number of test items, and the last column contains the percentage of the total number of items that represent the related basic competence.
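The last two columns of the table of specification (number of items and percentage) are simple tallies. A minimal sketch of that arithmetic is given here, assuming each item has already been assigned to one content category by hand. The function name `coverage_by_category`, the category labels, and the ten-item mapping are hypothetical illustrations, not the study's data.

```python
from collections import Counter


def coverage_by_category(item_categories):
    """Map each category to (item count, percentage of all items)."""
    total = len(item_categories)
    counts = Counter(item_categories.values())
    return {cat: (n, 100.0 * n / total) for cat, n in counts.items()}


# Hypothetical 10-item test: item number -> content category.
toy_items = {
    1: "reading", 2: "reading", 3: "reading", 4: "reading", 5: "reading",
    6: "linguistics", 7: "linguistics", 8: "linguistics",
    9: "not in syllabus", 10: "not in syllabus",
}

coverage = coverage_by_category(toy_items)
for category, (count, percent) in coverage.items():
    print(f"{category}: {count} items ({percent:.0f}%)")
```

The judgement of which competency an item belongs to remains a manual, qualitative step; the code only counts the outcome of that judgement.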
According to J. B. Heaton, a test can be said to have good content validity if it covers all the contents stated in the curriculum. Based on the content validity analysis, the test covers only two aspects of the learning content. The percentage for each aspect is as follows:
1. 45% or 18 items are reading items, which focus on narrative, hortatory exposition, and spoof.
2. 40% or 16 items are linguistics items, which focus on simple past, past tense, and adverbs.
3. 15% or 6 items are unsuitable, because they focus on descriptive text and present future tense.

Based on the result above, we can conclude that the English test for the second grade of high school at Sekolah Indonesia Kuala Lumpur, Malaysia, is good, since 85% of the test items represent the materials. This is more than 50%; according to Bloom, if the agreement of the test is 50% or more, it can be concluded that the test has high content validity (Bloom: 1981). However, 6 items or 15% of the test did not cover the materials, namely items number 7, 8, 9, 10, 30, and 35. Those items do not match the indicators of the standard and basic competencies and were not taught in this semester.

Analysing index of difficulty

To obtain the data for the index of difficulty, the researcher divided the class into 2 groups. The first was the upper group, the students who obtained good scores. The second was the lower group, the students who obtained poor scores. After obtaining the data, the researcher did the analysis using the following formula:

$$FV = \frac{R}{N}$$

Note:
FV : index of difficulty
R : the number of correct answers
N : the number of students taking the test

There are eight columns in the table. The first column contains the number of the English test item. The second column contains the number of students in the upper group who answered each item correctly.
The third column contains the number of students in the lower group who answered each item correctly. The fourth column contains the total number of students in the upper and lower groups who answered each item correctly. The fifth column contains the value of the index of difficulty. The sixth column contains the number of correct answers in the upper group minus the number in the lower group for each item. The seventh column contains the value of the index of discrimination. The eighth column contains a comment on the index of difficulty and the index of discrimination of each item.

The researcher analysed the English test for the second grade of high school at Sekolah Indonesia Kuala Lumpur. The classes used to collect the data were the second-grade Science class and the second-grade Social class, with a total of thirty-six students. All students of the two classes were taken as the sample of this research. The researcher used both classes because the second grade of the school currently consists of only two classes, one for each major, and each class contains fewer than 20 students. The students were divided into two groups: an upper group of eighteen students and a lower group of eighteen students.

After analysing the index of difficulty, the next step is matching the result with the criteria of the index of difficulty according to Arikunto. The analysis is organized in the following table.

Criteria of index difficulty

Index of difficulty | Criteria | Item number | Total of item
--- | --- | --- | ---
0.00 – 0.30 | Difficult | 34, 35, 40 | 3
0.31 – 0.70 | Moderate | 3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 24, 25, 26, 27, 28, 29, 31, 32, 36, 37, 38, 39 | 31
0.71 – 1.00 | Easy | 1, 2, 6, 23, 30, 33 | 6

The table above shows that 31 items are at the moderate level and 3 items are at the difficult level.
Six items are at the easy level. Most of the test items are moderate, which means that those items are suitable to be given to the students. The English test items for the second graders of Sekolah Indonesia Kuala Lumpur senior high school have an acceptable index of difficulty.

Index discrimination

The index of discrimination is a tool to differentiate between students in the upper group (who achieved well) and students in the lower group (who did not achieve well). To analyse the index of discrimination, the researcher arranged the students into the upper group and the lower group, as in the analysis of the index of difficulty. After arranging the two groups, the researcher computed the index of discrimination.

There are eight columns in the table of analysis of index of difficulty and index of discrimination. The first column contains the item number. The second column contains the number of students in the upper group who answered each item correctly. The third column contains the number of students in the lower group who answered each item correctly. The fourth column contains the total number of students in the upper and lower groups who answered each item correctly. The fifth column contains the value of the index of difficulty. The sixth column contains the number of students in the upper group minus the number of students in the lower group who answered each item correctly. The seventh column contains the value of the index of discrimination. The eighth column contains a comment on the index of difficulty and index of discrimination of each item.
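The grouping and tallying steps described above can be illustrated with a minimal sketch. It assumes that students are ranked by total score and split into two equal halves, as with the eighteen-student groups in this study. The function names, student labels, scores, and answers are hypothetical and are not taken from the research data.

```python
from typing import Dict, List, Tuple


def split_upper_lower(total_scores: Dict[str, int]) -> Tuple[List[str], List[str]]:
    """Rank students by total score (highest first) and return (upper, lower).

    Assumption: with an odd number of students, the middle student is left out.
    """
    ranked = sorted(total_scores, key=lambda name: total_scores[name], reverse=True)
    half = len(ranked) // 2
    return ranked[:half], ranked[len(ranked) - half:]


def tally_item(
    responses: Dict[str, Dict[int, str]],
    answer_key: Dict[int, str],
    upper: List[str],
    lower: List[str],
    item: int,
) -> Tuple[int, int, int, int]:
    """Return table columns 2, 3, 4 and 6 for one item.

    Columns: correct in upper group, correct in lower group,
    total correct, and upper minus lower.
    """
    correct_upper = sum(1 for s in upper if responses[s][item] == answer_key[item])
    correct_lower = sum(1 for s in lower if responses[s][item] == answer_key[item])
    return (
        correct_upper,
        correct_lower,
        correct_upper + correct_lower,
        correct_upper - correct_lower,
    )


# Hypothetical data: four students and two items.
total_scores = {"A": 38, "B": 30, "C": 22, "D": 15}
answer_key = {1: "b", 2: "d"}
responses = {
    "A": {1: "b", 2: "d"},
    "B": {1: "b", 2: "a"},
    "C": {1: "b", 2: "a"},
    "D": {1: "c", 2: "a"},
}

upper, lower = split_upper_lower(total_scores)
for item in answer_key:
    print(item, tally_item(responses, answer_key, upper, lower, item))
```

The counts returned by the hypothetical `tally_item` helper correspond to the second, third, fourth, and sixth columns of the analysis table.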
To calculate the index of discrimination for each item, the following formula was used:

$$D = \frac{\text{Correct U} - \text{Correct L}}{n}$$

D : index of discrimination
Correct U : the number of students in the upper group who answered the item correctly
Correct L : the number of students in the lower group who answered the item correctly
n : the number of candidates in one group

Criteria of index of discrimination

Index of discrimination | Criteria | Item number | Total of items
0.00 – 0.20 | Poor | 1, 2, 4, 5, 6, 7, 8, 11, 12, 14, 16, 18, 22, 23, 24, 26, 27, 30, 31, 32, 33, 34, 35, 39, 40 | 25
0.20 – 0.40 | Satisfactory | 3, 9, 10, 13, 15, 17, 19, 20, 21, 25, 28, 29, 36, 37 | 13
0.40 – 0.70 | Good | 38 | 1
0.70 – 1.00 | Excellent | - | -
Negative | Wrong | - | -

Based on the table, 25 items have a poor index of discrimination, 13 items have a satisfactory index of discrimination, and 1 item has a good index of discrimination. Most of the items therefore have a poor index of discrimination, which means that the English test must be revised.

Analyzing the Effectiveness of Distracters

Item distracters are the incorrect options in a multiple choice item which distract the testee from the correct answer. A good distracter attracts more students from the lower group than from the upper group. Thus, if more of the able students choose a distracter, the item does not function as expected and must be revised. This English test contains forty items, and each item has five answer options. The appendix contains four columns for each item. The first column is the item number. The second column is the total of correct answers from the upper group. The third column is the total of correct answers from the lower group. The last column is the comment on the distracter. According to Arikunto (1993), a distracter is good if it is chosen by at least 5% of the students who take the test.
(For example, 5% of 60 testees = 5% x 60 students = 3 students.) In this case, 5% of the total number of students is 2 students (5% x 36 students = 1.8, rounded up). The distracter analysis shows that most of the distracters meet the criteria of a good distracter, because they were chosen by more students in the lower group than in the upper group. So the English test had good distracters.

DISCUSSION

The result of face validity above shows that the test meets the criteria of a good test. The test had a clear font and colour, and a letter size that is easy to read. In addition, the test used an acceptable paper size, A4. The general instructions were simple and clearly understandable. The first instruction contains the date of the test, the time allowed, and how the test must be done. However, the instructions for the individual sections were unclear: each section gave only its item numbers without any explanation of the section. The last criterion concerns the kind of test. The test contains 40 multiple choice items and 10 essay items. From the explanation above, the researcher concluded that the English test for the second grade of Sekolah Indonesia Kuala Lumpur senior high school, Malaysia, had acceptable quality: not actually good, but acceptable to the students.

Based on the result of content validity above, 85% of the test items covered the indicators of the Standard of Competencies of 2012, and 15% of the items did not cover the indicators of the Standard of Graduates Competencies. Reading skill accounts for 45% of the test, with narrative, hortatory exposition, and spoof texts. However, the test also contains one descriptive text that was not taught in this semester. Linguistics accounts for 40% of the test, with simple past, past tense, and conjunction. However, one of the test items uses the present future tense, which was not taught in this semester.
According to Bloom, if the test agreement is 75% or more, the test can be said to have high content validity. On the other hand, if the agreement is less than 50%, the test is considered to have low content validity (Bloom: 1981). From the explanation above, the researcher concluded that the English test used for the second grade of Sekolah Indonesia Kuala Lumpur senior high school, Malaysia, had good content validity, with 85% of the items covering the indicators of the Standard Competencies.

Index of difficulty

Based on the table of index of difficulty, 3 of the 40 items are classified as difficult, with values from 0.00 to 0.30. These items must be revised because they are too difficult for the students. Six of the 40 items are classified as easy, with values from 0.71 to 1.00. These items are too easy for the students and must also be revised. Most of the items, 31 out of 40, are moderate, with values from 0.31 to 0.70. These items were acceptable for the students and do not need to be revised.

Index of discrimination

The result of the index of discrimination shows that 1 of the 40 items falls in the good category, with a discrimination value from 0.40 to 0.70. This item does not need to be revised, but more items of this kind are needed to make a good test. Thirteen of the 40 items have a satisfactory discrimination value. These items should still be revised, although they are fewer than the items in the poor category. The researcher concluded that the test as a whole falls in the poor category, because 25 of the 40 items are poor, with discrimination values from 0.00 to 0.19. These items must be revised because poor items make up the majority of the test items.
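As a minimal sketch, the two indices discussed above and their categories can be computed as follows. The thresholds follow the criteria tables in this paper. Two points are assumptions of the sketch: a discrimination value of exactly 0.20, 0.40, or 0.70 is placed in the higher category, and a difficulty value between 0.30 and 0.31 is treated as moderate. The function names and the example counts are hypothetical.

```python
def difficulty_index(correct_total: int, n_students: int) -> float:
    """FV = R / N: the proportion of all students who answered the item correctly."""
    return correct_total / n_students


def discrimination_index(correct_upper: int, correct_lower: int, group_size: int) -> float:
    """D = (Correct U - Correct L) / n, where n is the size of one group."""
    return (correct_upper - correct_lower) / group_size


def classify_difficulty(fv: float) -> str:
    """Map FV to the difficulty criteria used in this paper."""
    if not 0.0 <= fv <= 1.0:
        raise ValueError("FV must lie between 0 and 1")
    if fv <= 0.30:
        return "difficult"
    if fv <= 0.70:  # assumption: values between 0.30 and 0.31 count as moderate
        return "moderate"
    return "easy"


def classify_discrimination(d: float) -> str:
    """Map D to the discrimination criteria used in this paper.

    Assumption: a boundary value belongs to the higher category.
    """
    if d < 0.0:
        return "wrong"
    if d < 0.20:
        return "poor"
    if d < 0.40:
        return "satisfactory"
    if d < 0.70:
        return "good"
    return "excellent"


# Hypothetical item: 15 of 18 upper-group students and 9 of 18 lower-group
# students answered correctly, so R = 24 out of N = 36.
fv = difficulty_index(15 + 9, 36)
d = discrimination_index(15, 9, 18)
print(round(fv, 2), classify_difficulty(fv))
print(round(d, 2), classify_discrimination(d))
```

In this hypothetical case the item would be rated moderate in difficulty and satisfactory in discrimination.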
Analyzing of effective distracter

Item distracters are the incorrect options in a multiple choice item which can distract the students who take the test from the actual answer. A good distracter attracts more students from the lower group than from the upper group (Utami: 2013). The English test for the second grade of Sekolah Indonesia Kuala Lumpur Malaysia had forty multiple choice items, numbered one to forty, and each item contained five options, four of which are distracters. So the test had 160 item distracters. From the result above, 7 out of 160 were bad distracters because they were chosen by less than 5% of the total students who took the test. These distracters must be revised. The other 153 out of 160 were good distracters because they were chosen by 5% or more of the total students who took the test.

In addition, according to Nurgiyantoro, the data for analyzing the effectiveness of distracters in appendix 5 showed that 4 out of 160 distracters were non-functioning, since no student from either the upper group or the lower group chose them. Besides, 8 out of 160 distracters were categorized as adequate, because they were chosen by the same number of students from the upper and the lower group. Moreover, 4 out of 160 distracters were malfunctioning, since they attracted more students in the upper group than in the lower group, whereas a good distracter must be chosen by more students in the lower group than in the upper group. These distracters must be revised. However, 144 distracters were good, since they worked properly for the students. It is concluded that the test had good distracters and does not need to be revised in this respect.
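The two distracter checks discussed above can be sketched as follows: the 5% rule attributed to Arikunto, and the four categories (non-functioning, adequate, malfunctioning, good) described after Nurgiyantoro. This is only an illustration under those stated rules. The function names and the option counts for the example item are hypothetical and are not taken from appendix 5.

```python
from typing import Dict, Tuple


def meets_five_percent(times_chosen: int, n_students: int, percent: int = 5) -> bool:
    """True if the distracter was chosen by at least `percent`% of all testees.

    Integer arithmetic avoids floating-point rounding at the threshold.
    """
    return 100 * times_chosen >= percent * n_students


def classify_distracter(upper_count: int, lower_count: int) -> str:
    """Categorise a distracter by how many upper and lower students chose it."""
    if upper_count == 0 and lower_count == 0:
        return "non-functioning"  # chosen by nobody
    if upper_count == lower_count:
        return "adequate"  # same number of choosers in both groups
    if upper_count > lower_count:
        return "malfunctioning"  # attracts more able students
    return "good"  # attracts more lower-group students


def analyse_item(
    upper_counts: Dict[str, int],
    lower_counts: Dict[str, int],
    key: str,
    n_students: int,
) -> Dict[str, Tuple[str, bool]]:
    """Return {option: (category, meets 5% rule)} for every distracter of one item."""
    result = {}
    for option in upper_counts:
        if option == key:
            continue  # the correct answer is not a distracter
        u, l = upper_counts[option], lower_counts[option]
        result[option] = (classify_distracter(u, l), meets_five_percent(u + l, n_students))
    return result


# Hypothetical item with key "b", answered by 18 upper and 18 lower students.
upper_counts = {"a": 1, "b": 14, "c": 0, "d": 3, "e": 0}
lower_counts = {"a": 4, "b": 8, "c": 0, "d": 3, "e": 3}
report = analyse_item(upper_counts, lower_counts, key="b", n_students=36)
for option, outcome in report.items():
    print(option, outcome)
```

With 36 testees, the 5% rule is met once a distracter is chosen by at least 2 students, which matches the threshold used in this study.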
CONCLUSION

From the data analysis and discussion in chapter IV, the researcher concluded that the English test was constructed by the English teacher of Sekolah Indonesia Kuala Lumpur senior high school, Malaysia. The English final test was also administered by the English teacher. The test was an objective test consisting of forty multiple choice questions, each with five answer options. In terms of face validity, the English test of the school had acceptable quality: not actually good, but acceptable to the students. In terms of content validity, the English test of the school had good content validity, with 85% of the items covering the indicators of the Standard Competencies. In the item analysis, the index of difficulty shows that most of the items, 31 out of 40, have a moderate value. These items were acceptable for the students. The index of discrimination places the test in the poor category, with 25 out of 40 items rated poor. These items must be revised because poor items make up the majority of the test items. The distracter analysis shows that 144 out of 160 distracters are good, since they worked properly for the students. It is concluded that the test had good distracters and does not need to be revised in this respect.

REFERENCES

Alderson, J. C. & Banerjee, J. (2001). Language testing and assessment (Part I). United Kingdom: Cambridge University.
Arifin, Z. (2012). Evaluasi Pembelajaran. Bandung: PT Remaja Rosdakarya.
Arikunto, S. (1993). Dasar-dasar Evaluasi Pendidikan. Jakarta: Bumi Aksara.
Bachman, L. F. (1990). Fundamental Considerations in Language Testing. USA: Oxford University Press.
Brown, H. D. (2001). Teaching by Principles: An Interactive Approach to Language Pedagogy, Second Edition. San Francisco: Longman Inc.
Daryanto. (1999). Evaluasi Pendidikan. Jakarta: Rineka Cipta.
Depdiknas. (2004). Standard Kompetensi Mata Pelajaran Bahasa Inggris SMP dan Madrasah Tsanawiyah. Jakarta: Depdiknas.
Fulcher, G. (2007). Language testing and assessment. New York: Routledge.
Heaton, J. B. (1988). Writing English Language Tests. New York: Longman.
Hughes, A. (2003). Testing for Language Teachers. Cambridge: Cambridge University Press.
James, Ayodele, & Oluwatayo. (2012). Validity and Reliability Issues in Educational Research (Vol 2). Nigeria: Institute of Education, Ekiti State University.
Khoiro, A. (2012). An analyzed teacher made English try out for national exam for the third graders of MAN Sidoarjo. Thesis S1. Surabaya: Perpustakaan IAIN.
Kubiszyn, T. & Borich, G. (2003). Educational Testing and Measurement. Singapore: John Wiley & Sons, Inc.
Mayangsari, I. M. (2009). An Analysis of UAS English Test of Second Semester 2008/2009 by Teacher-made English Test in SMA 2 Muhammadiyah Sidoarjo. Surabaya: Perpustakaan IAIN Sunan Ampel Surabaya.
Norris, J. M. (2012). Purposeful Language Assessment: Selecting the Right Alternative Test. English Teaching Forum, 38(1), pp. 41 – 45. Retrieved from: http://files.eric.ed.gov/fulltext/EJ997530.pdf
Nurkanca, W. & Sumartana. (1986). Evaluasi Pendidikan. Surabaya: Usaha Nasional.
Sudijono, A. (1996). Pengantar Evaluasi Pendidikan. Jakarta: PT. Raja Grafindo Persada.
Tambini, R. F. (1999). Aligning learning activities and assessment strategies in the ESL classroom. The Internet TESL Journal, V(9). Retrieved from: http://iteslj.org/Articles/Tambini-Aligning.html
Utami, S. (2014). An analysis teacher made English UKK test for academic years 2012 – 2013 for seventh graders of Muhammadiyah 9 Surabaya. Thesis S1 (Unpublished).