102 Mathematics Education Journals Vol. 4 No. 2 August 2020 ISSN : 2579-5724 ISSN : 2579-5260 (Online) http://ejournal.umm.ac.id/index.php/MEJ

Test Instrument Validation in Plane Geometry using Rasch Model

Eyus Sudihartinih, Sufyani Prabawanto
Department of Mathematics Education, Universitas Pendidikan Indonesia
Email: eyuss84@upi.edu

Abstract
The purpose of this study was to describe the quality of a test of students' initial mathematical ability in plane geometry, as analyzed with the Rasch model. The benefit of this research is to provide knowledge about the validation of geometry instruments with the Rasch model. This research is descriptive quantitative research. The study population was all students who attended two classes of analytic geometry at a university in Bandung. The sample was selected by purposive sampling, resulting in one class of 44 students studying analytic geometry (30 women and 14 men). The research instrument was an essay test consisting of four problems on plane geometry concepts. The results show that all items meet the standard criteria for a measuring tool, so these problems can be used as instruments in further research.

Keywords: instrument validation, plane geometry, descriptive quantitative, Rasch model.

INTRODUCTION
Research on instrument testing plays an important role in data collection. The main indicators of the quality of research instruments are reliability and validity (Kimberlin & Winterstein, 2008). A valid instrument is one that measures what it is supposed to measure, so validity is an absolute requirement for producing valid research (Hidayati, 2012). Researchers in geometry therefore need valid instruments, which calls for the design and analysis of instruments on geometry concepts, especially plane geometry. This is because geometry plays an important role.
Volderman stated that geometry plays a role in our lives (Kambilombilo & Sakala, 2015); it also plays a role in the concepts of astronomy, chemistry, biology, algebra, statistics, and calculus (Luneta, 2014).

In Indonesia, research on instrument validation has been carried out, including with the Rasch model (Khumaeroh, Susongko, & M. Shaefur Rokhman, 2017; Nisa, Susongko, & Wikan Budi Utami, 2017; Purnomo, 2016; Susdelina, Perdana, & Febrian, 2018). Analysis using the Rasch model, which belongs to Item Response Theory, has been developed since the 1960s by Georg Rasch (Rasch, 1968). This mathematical model was later popularized by Benjamin Wright and Geoff Masters (Wright & Masters, 1982). Data analysis with the Rasch model can be carried out with the Winsteps software developed by Linacre (Linacre, 2006).

The advantage of the Rasch model is that it can determine the reliability and validity of research instruments (Bond & Fox, 2007; Razali & Shahbodin, 2016). The Rasch model can produce more precise measurement instruments (Sumintono & Widhiarso, 2014). Rasch modeling can provide a linear scale with equal intervals, predict missing data, provide more precise estimates, detect inaccurate models, and produce replicable measurements (Sumintono & Widhiarso, 2014). Thus, this article explains the validation of an instrument of geometry problems using the Rasch model.

Research on the Rasch model for instrument validation has been carried out by several researchers (Khumaeroh et al., 2017; Maseko, Luneta, & Long, 2019; Nisa et al., 2017; Purnomo, 2016; Susdelina et al., 2018).
Meanwhile, several other researchers have used the Rasch model for ability analysis (Folastri, Rangka, & Ifdil, 2017; Sari, Sekarwana, Hinduan, & Sumintono, 2016; Sudihartinih, Purniati, & Rohayati, 2019; Sudihartinih & Wahyudin, 2019a, 2019b; Widhiarso & Sumintono, 2016). Among these studies, none was found that validates an instrument on plane geometry.

RESEARCH METHOD
This research is descriptive quantitative research. The study population was all students who attended two classes of analytic geometry at a university in Bandung, Indonesia. The sample was selected by purposive sampling, resulting in one class of 44 students studying analytic geometry (30 women and 14 men). The instrument in this study was four problems on plane geometry concepts. The problems were reviewed by three experts before being tested on students. The test items are as follows.
1. A segment AB with $A(3, -6)$ and $B(-5, -8)$ is known. Determine the distance from the midpoint of the segment to the line $3x - 4y = -8$.
2. Find the point $P$ that lies on the line through $P_1(2, -5)$ and $P_2(-3, 10)$ such that $|\overrightarrow{P_1P}| = 3|\overrightarrow{PP_2}|$.
3. Find the equation of the line through $(7, -3)$ that has equal intercepts on the coordinate axes.
4. Find the equation of the circle whose center lies on the line $3x - 5y = 8$ and that is tangent to the coordinate axes.

The test was given during the students' midterm examination, with two hours allowed. Each answer was then given a score from 0 to 4. The scoring is as follows.

Table 1. Score
Information                      Score
No answer                        0
Can define the graph             1
Can write the first equation     1
Can solve the first equation     1
Can write the second equation    1

The data were then analyzed using the Rasch model with Winsteps version 4.4.6. The analysis covered unidimensionality, the person-item map, item analysis, analysis of students' ability, and instrument analysis, with the following criteria.
1. Unidimensionality of measurement is supported if the raw variance explained by measures is at least 20% (the general criteria for interpretation are: sufficient if 20-40%, good if 40-60%, and excellent if above 60%) and if the unexplained variance in each of the 1st to 5th contrasts of residuals is below 15% (Sumintono & Widhiarso, 2014).
2. On the person-item map, a mean person measure above the mean item measure (0.00 logits) means that the participants' average ability is above the average item difficulty.
3. For item analysis, the criteria for checking item fit and detecting misfitting items (outliers or misfits) (Boone, Staver, & Yale, 2014) are as follows: (1) the OUTFIT MNSQ value is greater than 0.5 and smaller than 1.5, and the closer to 1 the better; (2) the OUTFIT ZSTD value is greater than -2.0 and smaller than +2.0, and the closer to 0 the better; and (3) the PT MEASURE CORR value is more than 0.4 and less than 0.85. An item can be considered fit if it meets at least one of the three criteria.
4. Analysis of students' ability groups the students into high, medium, and low ability categories.
5. Instrument analysis examines the mean, standard deviation, separation, reliability, and Cronbach Alpha values.

RESULTS AND DISCUSSION
1. Unidimensionality
Unidimensionality analysis is needed to identify whether the instrument developed measures what it should measure.

Figure 1. Unidimensionality

The results of data analysis in Figure 1 show that the raw variance explained by measures was 57.2%, which falls in the good category (Sumintono & Widhiarso, 2014). The unexplained variance in the 1st to 5th contrasts of residuals was 16.6%, 14.3%, 11.8%, 0.2%, and 0.0%, respectively. The first contrast, at 16.6%, is slightly above the 15% criterion, while the other four contrasts are below it.

2. Analysis of Wright Map (Person-Item Map)
The Wright map analysis can be seen in Figure 2.

Figure 2. Variable map

Based on Figure 2, the map spreads over a range of -1 to 2 logits, while students' abilities lie between -3 and 3 logits. Items N2 and N4 are located between -SD and +SD, so they are within the students' ability, while item N3 is above +SD, so it is above the students' ability. Item N1 is below -SD, so its difficulty level is below the students' ability.

3. Item Analysis
The item analysis covers the level of difficulty (item measure), the level of item fit, and the detection of biased items.

3.1 Item Difficulty Level
The item difficulty levels can be examined in Figure 3 (Item Measure).

Figure 3. Item measure

From Figure 3, the SD value is 0.67. Combining this SD with the mean item logit of 0.00, item difficulty can be grouped into four categories: very difficult (greater than +1 SD), hard (0.00 logit to +1 SD), easy (-1 SD to less than 0.00 logit), and very easy (less than -1 SD). Thus, the very difficult category covers values above 0.67, the hard category 0.00 to 0.67, the easy category -0.67 to less than 0.00, and the very easy category values below -0.67. Ordering the items in Figure 3 from the most difficult to the easiest, one item is categorized as very difficult, namely N3; one item is hard, N4; one item is easy, N2; and one item is very easy, N1.
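The grouping rule above can be sketched in a few lines of Python. This is a minimal sketch and not the procedure run in the Rasch software: the function name `classify_item_difficulty` and the item measures in the usage loop are hypothetical, because the actual logit values appear only in Figure 3. Only the mean of 0.00 and the SD of 0.67 are taken from the text.

```python
def classify_item_difficulty(measure, mean=0.00, sd=0.67):
    """Assign a difficulty category to an item measure given in logits.

    Boundaries follow the text: very difficult above mean + SD,
    hard from mean up to mean + SD, easy from mean - SD to below mean,
    and very easy below mean - SD.
    """
    if measure > mean + sd:
        return "very difficult"
    if measure >= mean:
        return "hard"
    if measure >= mean - sd:
        return "easy"
    return "very easy"


# Hypothetical measures for illustration only; these are NOT the values in Figure 3.
example_measures = {"item_a": 0.90, "item_b": 0.30, "item_c": -0.30, "item_d": -0.90}

for name, logit in example_measures.items():
    print(name, classify_item_difficulty(logit))
```

Passing the mean and SD as parameters keeps the same rule reusable for other instruments whose item measures have a different spread.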
3.2 Item Fit Level
Items that function normally for measurement can be identified from the data in Figure 4 (Item Fit Order), namely the columns OUTFIT mean square (MNSQ), OUTFIT Z-standard (ZSTD), and point-measure correlation (PT MEASURE CORR).

Figure 4. Misfit order

Based on criteria 1, 2, and 3, there are no misfit items (Boone, Staver, & Yale, 2014). Thus, all items of the students' initial mathematical ability test are declared fit, in the sense that they function normally, can be understood correctly by students, and measure what must be measured, in this case initial mathematical ability.

3.3 Rating Scale Diagnostic
This diagnostic is carried out to determine whether participants' answers are differentiated across the scores 0, 1, 2, 3, and 4.

Figure 5. Diagnostic

Respondents' answers are considered differentiated if the observed average and Andrich threshold values in Figure 5 are consistent and increase together across the answer categories 0, 1, 2, 3, and 4. On this basis, it can be stated that students' answers are differentiated across the scores 0, 1, 2, 3, and 4.

3.4 Detection of Biased Items
An item is said to contain bias if its probability value, as listed in Figure 6, is below 0.05 (Sumintono & Widhiarso, 2014). In the context of this study, bias can only be examined from the perspective of gender.

Figure 6. Bias Item

The analysis of bias based on gender shows that no item is biased. An overall picture of the logit position of each item by gender can be seen in the following figure.

Figure 7. DIF measure

From the figure, it appears that each item can be worked out by both male and female students.

4. Analysis of Student Ability
This analysis covers two things, namely the level of individual ability (person measure) and the level of individual fit (person fit).

4.1 Analysis of Individual Ability
Data on individual students' ability can be found in Figure 8 (Person Measure). From this figure, the SD value is 1.19. Combining this SD with the mean logit of 0.47, individual students' abilities can be grouped into a high ability category (greater than 0.47 + 1.19 = 1.66), a medium ability category (between 0.47 - 1.19 = -0.72 and 0.47 + 1.19 = 1.66), and a low ability category (less than 0.47 - 1.19 = -0.72). Thus, the high ability category covers logit values above 1.66, the medium ability category values from -0.72 to 1.66, and the low ability category values below -0.72.

Figure 8. Person measure

Ordering the students in Figure 8 by ability level, six students are in the high ability category, 32 are in the medium ability category, and six are in the low ability category.

4.2 Level of Individual Fit
The fit of individual responses to ability can be examined from the data in Figure 9, namely the columns OUTFIT mean square (MNSQ), OUTFIT Z-standard (ZSTD), and point-measure correlation (PT MEASURE CORR).

Figure 9. Person Fit Order

Based on the criteria of Boone, Staver, and Yale (2014), all students are declared fit, in the sense that they gave answers consistent with their level of ability.

5. Instrument Analysis
Instrument analysis uses the information presented in Figure 10 (Summary Statistics).

Figure 10. Summary Statistic

Based on Figure 10, the following information is known.

Table 2. Summary Statistic
          Mean   SD     Separation   Reliability   Cronbach Alpha
Person    0.41   1.12   1.36         0.65          0.67
Item      0.00   0.67   3.54         0.93

Based on Table 2, the mean person measure of 0.41 logits represents the average measure of all participants on the items of the initial ability test. A mean person measure greater than the mean item measure (0.00 logits) shows that the participants' ability is generally greater than the difficulty of the items in the instrument.

The Cronbach Alpha value, which represents the interaction between persons and items as a whole, is 0.67, which falls in the sufficient category. The person reliability value of 0.65, an indicator of the consistency of the respondents' answers, also falls in the sufficient category. The item reliability of 0.93, an indicator of the quality of the items in the instrument, belongs to the very good category (Sumintono & Widhiarso, 2014).

Other data that can be used are the INFIT MNSQ and OUTFIT MNSQ values in both the person table and the item table. Based on the person table, the average values of INFIT MNSQ and OUTFIT MNSQ are 0.98 and 0.98, respectively. Based on the item table, the average values of INFIT MNSQ and OUTFIT MNSQ are 0.99 and 0.98, respectively. The closer these values are to 1 the better, because the ideal value is 1 (Sumintono & Widhiarso, 2014). Thus, the average person and item values approach the ideal criterion.
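As a quick consistency check on Table 2, separation and reliability in Rasch summary statistics are linked by the standard relation R = G^2 / (1 + G^2), where G is the separation index. The sketch below applies that relation to the reported separation values. The function name is hypothetical, and the snippet is only an illustration, not part of the authors' analysis.

```python
def reliability_from_separation(separation):
    """Convert a Rasch separation index G into reliability R = G^2 / (1 + G^2)."""
    g_squared = separation ** 2
    return g_squared / (1.0 + g_squared)


# Separation values as reported in Table 2.
person_separation = 1.36
item_separation = 3.54

print(round(reliability_from_separation(person_separation), 2))  # 0.65
print(round(reliability_from_separation(item_separation), 2))    # 0.93
```

Both results agree with the reliabilities reported in Table 2 (0.65 for persons and 0.93 for items).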
Regarding INFIT ZSTD and OUTFIT ZSTD, the average values for persons are -0.04 and 0.01, respectively, while the average values for items are -0.05 and -0.05. The ideal value of ZSTD is 0, and the closer to 0 the better (Sumintono & Widhiarso, 2014). Thus it can be said that the quality of the persons and items is good.

The last indicator of person and item quality is separation, that is, the grouping of persons and items. Person separation shows how well the set of items in the instrument spreads along the range of logit abilities. The greater the person separation, the better the instrument is constructed, because its items can reach individuals from high to low levels of ability. In contrast, item separation shows how widely the sample being measured is spread along a linear interval scale. The higher the item separation, the better the measurement. This index is also useful for defining the significance of the construct being measured.

In Figure 10, the person separation is 1.36 and the item separation is 3.54. The greater the separation value, the better the overall quality of the persons and the instrument. Separation can be calculated more precisely as the number of strata through the formula:

$$H = \frac{(4 \times \text{separation}) + 1}{3}$$

For persons, $H = (4 \times 1.36 + 1)/3 = 6.44/3 \approx 2.15$, rounded to 2, while for items, $H = (4 \times 3.54 + 1)/3 = 15.16/3 \approx 5.05$, rounded to 5. This implies that the study participants have a variety of abilities that can be categorized into two groups. Meanwhile, the item difficulty levels are spread over five groups, from the easiest to the most difficult.

Based on the measurement results, the picture shown in the following figure is obtained.

Figure 11. Measure on latent variable

The figure indicates that the items are most likely to produce a large amount of information for individuals with moderate levels of ability.

Based on the findings, the instrument developed is valid, so it can measure initial mathematical ability in two-dimensional geometry concepts. Although Rasch analysis is highly quantitative, it is also rich in qualitative information (Boone, Townsend, & Staver, 2011). The analysis shows that the test instrument has a good conceptual basis and is well targeted to the group, with a variety of items, so that students with lower ability can answer some of the questions relatively easily, while students with high ability will still find several items challenging (Maseko et al., 2019). The research can be continued with ability analysis (Folastri et al., 2017; Sari et al., 2016; Widhiarso & Sumintono, 2016).

CONCLUSION
All items meet the standard criteria for a measuring tool. The Cronbach Alpha value, which represents the interaction between persons and items as a whole, is in the sufficient category. The person reliability value, an indicator of the consistency of respondents' answers, is in the sufficient category. In contrast, item reliability, an indicator of the quality of the items in the instrument, is classified as very good. The items are most likely to produce high levels of information about individuals of moderate ability. All students are declared fit, in the sense that they gave answers according to their level of ability, meaning that students were serious in giving answers. In terms of difficulty level, one item is categorized as very difficult, namely item N3.
One item, N4, is in the hard category; one item, N2, is in the easy category; and one item, N1, is in the very easy category.

ACKNOWLEDGMENTS
We would like to thank all the students who helped in the completion of this research.

REFERENCES
Bond, T. G., & Fox, C. M. (2007). Applying The Rasch Model: Fundamental Measurement in the Human Sciences. Mahwah, N.J.: Lawrence Erlbaum Associates Publishers.
Boone, W. J., Staver, J. R., & Yale, M. S. (2014). Rasch Analysis in the Human Sciences. Springer Dordrecht Heidelberg New York London.
Boone, W. J., Townsend, J. S., & Staver, J. (2011). Using Rasch Theory to Guide the Practice of Survey Development and Survey Data Analysis in Science Education and to Inform Science Reform Efforts: An Exemplar Utilizing STEBI Self-Efficacy Data. Science Education, 95(2).
Folastri, S., Rangka, I. B., & Ifdil. (2017). Student's Self-concept Profile Based on Gender: a Rasch Analysis. Advances in Social Science, Education and Humanities Research, Volume 118. 9th International Conference for Science Educators and Teachers (ICSET).
Hidayati, K. (2012). Validasi Instrumen Non Tes dalam Penelitian Pendidikan Matematika. Prosiding Jurusan Pendidikan Matematika.
Kambilombilo, D., & Sakala, W. (2015). An Investigation into the Challenges In-Service Student Teachers Encounter in Transformational Geometry, "Reflection and Rotation". The Case of Mufulira College of Education. Journal of Education and Practice, 6(2), 139–149.
Khumaeroh, S. U., Susongko, P., & M. Shaefur Rokhman. (2017). Penyusunan Skala Sikap Peserta Didik Terhadap Matematika Dengan Penerapan Model Rasch. Jurnal Pendidikan MIPA Pancasakti, 1(1), 35–42.
Kimberlin, C. L., & Winterstein, A. G. (2008). Validity and reliability of measurement instruments used in research. American Journal of Health-System Pharmacy, 65(23), 2276–2284.
Linacre, J. M. (2006). A User's Guide to Winsteps Ministep Rasch-Model Computer Programme. Retrieved from www.winstep.com.
Luneta, K. (2014). Errors Displayed By Learners In The Learning Of Grade 11 Geometry. 12, 26–44.
Maseko, J., Luneta, K., & Long, C. (2019). Towards validation of a rational number instrument: An application of Rasch measurement theory. Pythagoras, 40(1), a441.
Nisa, K., Susongko, P., & Wikan Budi Utami. (2017). Penyusunan Skala Minat Belajar Matematika Dengan Penerapan Model Rasch. Jurnal Pendidikan MIPA Pancasakti, 1(1), 35–42.
Purnomo, S. (2016). Pengembangan Soal Matematika Model PISA Konten Space dan Shape untuk Mengetahui Level Kemampuan Berpikir Tingkat Tinggi Berdasarkan Model Rasch.
Rasch, G. (1968). A Mathematical Theory Of Objectivity And Its Consequences For Model Construction. European Meeting on Statistics, Econometrics and Management Science, Amsterdam, 2-7 September 1968.
Razali, S. N., & Shahbodin, F. (2016). Questionnaire on perception of online collaborative learning: measuring validity and reliability using Rasch model. Proceedings of the 4th International Conference on User Science and Engineering. Melaka, Malaysia, 1.
Sari, D. R., Sekarwana, N., Hinduan, Z. R., & Sumintono, B. (2016). Analisis Tingkat Kepuasan Masyarakat terhadap Dimensi Kualitas Pelayanan Tenaga Pelaksana Eliminasi Menggunakan Pemodelan Rasch. JSK, 2(1).
Sudihartinih, E., Purniati, T., & Rohayati, A. (2019). Analisis Sikap Mahasiswa Calon Guru Matematika Dalam Perkuliahan Geometri Analitik Dengan Model Rasch. KNPM 8 – IKIP SILIWANGI – Cimahi, 1 Agustus 2019, 182–191.
Sudihartinih, E., & Wahyudin. (2019a). Analysis of students' self efficacy reviewed by geometric thinking levels and gender using rasch model. Journal of Engineering Science and Technology, 14(1), 509–519.
Sudihartinih, E., & Wahyudin, W. (2019b). Pembelajaran Berbasis Digital: Studi Penggunaan Geogebra Berbantuan E-Learning Untuk Meningkatkan Hasil Belajar Matematika. Jurnal Tatsqif, 17(1), 87–103. https://doi.org/10.20414/jtq.v17i1.944
Sumintono, B., & Widhiarso, W. (2014). Aplikasi Model Rasch untuk Penelitian Ilmu-ilmu Sosial. Cimahi, Indonesia: Trim Komunikata Publishing House.
Susdelina, Perdana, S. A., & Febrian. (2018). Analisis Kualitas Instrumen Pengukuran Pemahaman Konsep Persamaan Kuadrat Melalui Teori Tes Klasik Dan Rasch Model. Jurnal Kiprah, 6(1), 41–48. https://doi.org/10.31629/kiprah.v6i1.574
Widhiarso, W., & Sumintono, B. (2016). Examining response aberrance as a cause of outliers in statistical analysis. Personality and Individual Differences, 98, 11–15. https://doi.org/10.1016/j.paid.2016.03.099
Wright, B. D., & Masters, G. N. (1982). Rating Scale Analysis. Chicago: MESA Press.