203 | JISAE. Volume 6 Number 2 September 2020. https://doi.org/10.21009/JISAE
JISAE (Journal of Indonesian Student Assessment and Evaluation)
ISSN: P-ISSN: 2442-4919 │ E-ISSN: 2597-8934
Vol 6 No 2 (2020)
Website: http://journal.unj.ac.id/unj/index.php/jisae

APPLYING RASCH MODEL TO MEASURE STUDENTS' READING COMPREHENSION

Dinar Pratama 1, Ihda Husnayaini 2
1,2 Fakultas Tarbiyah, IAIN Syaikh Abdurrahman Siddik Bangka Belitung

ABSTRACT
There have been studies suggesting that students' reading comprehension in English is influenced by how accurately teaching strategies are used. Appropriate teaching strategies are required to accommodate the diversity of students' abilities. Therefore, measurements are needed to provide actual information about students' abilities. This study aims to examine the levels of students' reading comprehension by using the Rasch model. The subjects of this study were 8th grade students, totaling 200 responses on the teacher's reading comprehension test with five alternative responses. The data were analyzed with the one-parameter Rasch model, covering person reliability, the item-person distribution map, and item-person suitability. The findings indicated that the average reading comprehension ability of students fell in the high category, with a mean person value (Meanperson) of 1.29 logits, above the mean difficulty of the test items (Meanitem) of 0. Further research is needed to examine whether adding test items affects the value of person reliability.

Keywords: Person Reliability, Fit Item, Rasch Model.

Address for Correspondence: dinarpratama24@gmail.com

INTRODUCTION
Understanding texts written in foreign languages, especially English, has its difficulties. These difficulties are experienced by those who want to learn English, especially high school students.
Generally, teaching reading comprehension is focused on mastering vocabulary only (Burns, Hodgson, Parker, & Fremont, 2011), whereas the results of some studies showed that teaching reading comprehension can be effective if it is supported by a variety of teaching strategies. A study conducted by Hagaman & Reid (2008) showed that the application of paraphrase strategies can minimize failure in understanding reading texts, while the implementation of metacognitive strategies can improve students' ability to comprehend reading texts (Çubukcu, 2008; Li, 2010; Ahmadi, Ismail, & Abdullah, 2013; Meniado, 2016). The application of the Question-Generation strategy for reading influences reading comprehension as well (Khansir & Dashti, 2014). This strategy emphasizes students' analysis of reading texts using their prior knowledge, after which they are required to ask and answer questions. If students cannot answer the questions correctly, it means that they do not comprehend the reading text well.

Reading comprehension can be understood as "the search for, or an establishment of, meaning from printed text is inadequate" (Tennent, 2015). Guthrie, Wigfield, & Perencevich (1997) describe reading comprehension as "the process of simultaneously extracting and constructing meaning through interaction and involvement with written language". Knowing students' reading comprehension is important because it is related to choosing the right strategy. Each strategy has different characteristics that are influenced by various situations and conditions. Knowing the students' reading skill can at least guide the teacher in choosing the right strategy. An individual's ability to master a particular concept can be determined in several ways, one of which is giving a test.
In preparing the test material to measure this ability, there are many aspects to be considered. A good test should at least pass the stages of quantitative and qualitative analysis. The Center for Research and Development of Balitbang of the Ministry of National Education (2007), in Wardhani & Putra (2016), stated that qualitative analysis can be done by examining aspects of writing techniques, language use, and compatibility of the material, while quantitative analysis looks at the internal characteristics of the test obtained from the results of empirical measurements of test participants (Surapranata, 2009, in Primary, 2019). In addition, qualitative test analysis can be done by examining how accurately the test measures the skill that needs to be measured. For quantitative analysis, the test should be validated before being used (Azwar, 2009, in Primary, 2019).

In this regard, two approaches are often used to analyze test quality, namely classical test theory and item response theory. In recent studies, the analysis of test quality through the classical theory approach has been gradually abandoned because it has several weaknesses. Classical test theory has at least three weaknesses, namely: 1) measurement results depend on the characteristics of the tests used, 2) item parameters depend on the ability of the test takers, and 3) measurement error can only be identified for groups, not individuals (Mardapi, 2012). Furthermore, classical theory is weak in displaying the true abilities of test takers, because the ability of the test takers is known only from the total score, without considering the relationship between the test taker's abilities and the item characteristics (Wardhani & Putra, 2016).

Item Response Theory (IRT) assumes that the probability of a test taker answering each item correctly is based on the test taker's ability.
Therefore, test takers with high ability have a greater chance of answering correctly than test takers with low ability (Retnawati, 2014).

There are at least three assumptions on which IRT is based, namely unidimensionality, local independence, and parameter invariance (Hambleton & Swaminathan, 1985; Hambleton, Swaminathan, & Rogers, 1991, in Retnawati, 2014). Unidimensionality means that each test item measures only one ability, for example reading comprehension ability only. This confirms that the test can only be used to determine the ability of test participants in the aspect of reading comprehension alone, not other abilities. In some conditions, this assumption is difficult to satisfy due to several factors such as cognitive, personality, environment, and even anxiety. The assumption of local independence states that there is no relationship between test takers' responses to different items (Hambleton et al., 1991, in Sarea & Ruslan, 2019). Parameter invariance means that item characteristics do not depend on the distribution of the test takers' ability parameters, and that the parameters characterizing the test takers do not depend on item characteristics (Retnawati, 2014).

The advantages of IRT are: 1) the score truly reflects the test taker's ability and is not influenced by the test's difficulty, 2) the relationship between the item and the test taker's ability can be found out, and 3) parallel tests are not needed to determine the reliability coefficient (Hambleton & Jones, 1993, in Andayani & Ramalis, 2019). According to Mardapi (2012), the one-parameter Rasch model (1-PL) is most commonly used to develop a test set.
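As a brief illustration of the one-parameter model mentioned above, its standard form can be written as follows. The symbols are assumptions introduced here and do not appear in the original text: θ_n denotes the ability of test taker n and b_i the difficulty of item i, both expressed in logits.

```latex
P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)}
```

Under this form, a test taker whose ability equals the item difficulty (θ_n = b_i) has a probability of 0.5 of answering correctly, and the probability rises as ability exceeds difficulty, which is consistent with the statement that high-ability test takers have a greater chance of answering correctly.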
Another advantage of the Rasch model is that it can meet the main principles of measurement, namely: 1) it produces linear measurements with equal intervals, 2) missing data do not affect the analysis, 3) it gives more precise estimates, 4) it can identify inaccuracies in the model, and 5) it provides measurements that are independent of the parameters studied (Sumintono & Widhiarso, 2014, in Purnomo, 2016).

Studies analyzing student abilities through the Rasch model have been carried out in various fields, as can be seen in the studies conducted by Camminatiello, Gallo, & Menini (2010), Osman, Naam, Jaafar, Badaruzzaman, & Rahmat (2012), Runnels (2012), and Chan, Ismail, & Sumintono (2014). Specifically in relation to reading comprehension, Baghaei & Carstensen (2013) used Rasch model analysis to identify students' reading types through comprehensive reading tests. Aryadoust & Zhang (2016) conducted a study using the Rasch model to determine students' reading skills, which were divided into two class groups, while Santos et al. (2016) only presented an analysis of test quality based on the psychometric characteristics of reading comprehension tests.

This research was conducted to find out students' abilities in reading comprehension in relation to the difficulty level of the test items. In addition to finding out students' reading abilities, this study was also conducted to determine the quality of the reading comprehension test made by the teacher. The results of this study are expected to contribute to the improvement of English language learning, especially in reading comprehension learning material.

METHOD
This is a descriptive quantitative study to get a picture of students' reading abilities through the Rasch one-parameter (1-PL) model.
The subjects in this study were 8th grade students, totaling 200 responses on the teacher's reading comprehension test with five alternative answers. The teacher-made test sets were taken from the results of formative tests through documentation techniques. Quantitative data analysis was carried out through the Rasch IRT approach with the help of the QUEST program.

RESULTS AND DISCUSSION
The reading comprehension test instrument has 40 items with five answer choices. Respondents' answer patterns were analyzed using the Rasch model through the QUEST software.

Person Reliability
The value of person reliability in the QUEST output is given by the reliability of estimate. According to Prime (2018), the reliability values of the Rasch model can be categorized as follows: <0.67 is weak, 0.67-0.80 is enough, 0.81-0.90 is good, 0.91-0.94 is very good, and >0.94 is perfect. Person reliability in this study was 0.40, which is classified as weak. The low value of person reliability also indicates that the test cannot distinguish test takers' abilities (Chan et al., 2014), as shown in the item-person map in Figure 1. The low value of person reliability can be influenced by item difficulty levels that do not vary (Dwinata, 2019). Figure 1 shows that the ability of test takers was above the average level of difficulty of the questions. However, for the test items classified as very difficult, with a logit value <3.0, none of the test takers has an ability equivalent to the level of difficulty of the items. This pattern also occurs in easy items. Thus the distributions of students' abilities and test items are not in line with the normal curve, which moves from the lowest to the highest value. This is what causes the low value of person reliability, which means that there are inconsistencies in the test takers' responses (Ardiyanti, 2017).
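The reliability categories quoted above can be sketched as a small lookup. This is only an illustration, not part of the QUEST output: the function name is hypothetical, and because the printed bands leave small gaps (for example between 0.80 and 0.81), the sketch assumes that each band's upper bound is inclusive and that values falling in a gap are assigned to the next band up.

```python
def classify_person_reliability(value: float) -> str:
    """Map a reliability estimate to the category labels quoted in the text.

    Assumption (not stated in the source): upper bounds are inclusive, so a
    value in a gap between printed bands (e.g. 0.805) goes to the next band up.
    """
    if value < 0.67:
        return "weak"
    if value <= 0.80:
        return "enough"
    if value <= 0.90:
        return "good"
    if value <= 0.94:
        return "very good"
    return "perfect"


# The person reliability reported in this study.
print(classify_person_reliability(0.40))
```

For the reported value of 0.40, the function returns "weak", matching the classification given in the text.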
Person-Item Distribution Map
The distribution of test takers' abilities against the difficulty levels of the items (person-item distribution map) in the QUEST program can be seen in the output of Item Estimates (Thresholds), in which both are placed on the same logit scale. Through the person-item distribution map, we can compare the test items with the ability of each test taker. The results of the analysis showed that the average ability of test takers (Meanperson) was 1.29, above the average level of difficulty of the test items (Meanitem) of 0. In Figure 1, it can be seen that students' abilities are above the average of the test items, but there are still 14 test items in the 'easy' category answered incorrectly by students. In addition, there are 4 difficult test items that none of the students answered correctly. Such difficulty level patterns are not sufficient to provide any information related to students' reading comprehension.

Figure 1. Person Item Distribution Map

Item-Person Compliance
Based on the QUEST program output, students with the highest ability have a logit value of +2.67. There are two students with the highest ability, namely students with codes 017 and 037. Students with the lowest ability have a logit value of -0.16; these are students with codes 058 and 081. There is a test item that no test taker answered correctly (item no. 27) and a test item that all test takers answered correctly (item no. 18). These items are suggested not to be used because they do not provide information about students' abilities. Even though the reading comprehension of test takers is above the average level of difficulty of the test items, this does not automatically indicate the true abilities of students.
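To give the reported logit values a more concrete reading, the sketch below converts them into the probability of answering an item of average difficulty (0 logits) correctly under the standard one-parameter Rasch formula. It is an illustration only: the function name is hypothetical, and the calculation is not taken from the QUEST output.

```python
import math


def p_correct(ability: float, difficulty: float = 0.0) -> float:
    """Rasch (1-PL) probability of a correct response; both inputs in logits."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Ability values reported in the text; the item difficulty of 0 is Meanitem.
reported_abilities = {
    "highest (codes 017, 037)": 2.67,
    "mean (Meanperson)": 1.29,
    "lowest (codes 058, 081)": -0.16,
}

for label, theta in reported_abilities.items():
    print(f"{label}: ability {theta:+.2f} logits, P(correct) = {p_correct(theta):.2f}")
```

Under these assumptions, a student at the mean ability of 1.29 logits has a probability of about 0.78 of answering an average item correctly, while even the lowest-ability students (-0.16 logits) have a probability a little below 0.5.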
To ascertain the students' true reading comprehension, a suitability test can be carried out between the level of difficulty of the items and the students' abilities (Sumintono, 2016). The criteria used to determine the suitability of the item refer to the Outfit Mean Square (MNSQ) value of 0.5