THE FUNCTIONALITY OF THE MIDDLE VALUE OF THE INDONESIAN VERSION OF EMOTIONAL LEARNING INSTRUMENT

Erwin Sulaeman
Universitas Negeri Jakarta
erwinsulaiman_pep17s2@mahasiswa.unj.ac.id

Wardani Rahayu
Universitas Negeri Jakarta
Wardani.rahayu@unj.ac.id

Erdawaty Kamaruddin
Universitas Negeri Jakarta
erda_kamaruddin@yahoo.com

Winona Amanda Tiara Widodo
Universitas Indonesia
winona.amanda@gmail.com

ABSTRACT
This article discusses the psychometric validity of the Indonesian version of the emotional learning instrument with five- and four-category response scales. The purpose of this study is to produce an Indonesian version of the emotional learning instrument with a response category scale that works effectively for Indonesian respondents. The instrument is a modification of a Learning Environment Research scale, the Questionnaire on Classroom Emotional Climate. This study is a survey of 1494 responses from 7th and 8th grade junior high school students. Samples were selected by random sampling, taking into account schools implementing the 2013 Curriculum. The modified instrument, consisting of 43 items, was tested for validity based on item difficulty estimates and psychometric criteria with Rasch modeling. The results indicate that the Andrich thresholds meet the monotonicity requirement and the Standardized Residual Correlation is higher, so the five-category response scale is more effective than the four-category response scale for measuring the Indonesian version of the emotional learning instrument.

Keywords: Functionality of middle value, ELVI, Rasch Modeling

INTRODUCTION
Emotions in the learning environment are formed from experiences and physical feelings. For students to develop, the learning environment must take into account their cognitive interests, aspirations, and emotional lives (Woodhouse, 2017).
The learning environment influences student achievement and attitudes. Previous studies (Ghosh, 2015; Koul, Fraser, Maynard, & Tade, 2018; Marchesi & Cook, 2012) reported that in the schools of the Appalachian state of West Virginia, nearly 51,000 students dropped out of high school because of attendance below 85-90%, serious discipline violations, and stress in learning. The learning environment in classrooms embodies relationships between teachers, students, and student attitudes (López et al., 2018).

JISAE. Volume 6 Number 1 February 2020.

The subjective perceptions of teachers or students are associated with important outcomes in achievement (Jones et al., 2017) and in emotional and social aspects (Taylor, Oberle, Durlak, & Weissberg, 2017). Progress in school practice can be designed around emotional ability (Jones et al., 2017; Taylor et al., 2017; Yaeger, 2017); this is the basis for developing the Indonesian version of the emotional learning instrument (ELVI). Emotional learning has already been implemented in developed countries, for example in Central Indiana and in schools across the United States (Melnick, Cook-Harvey, & Darling-Hammond, 2017). In Indonesia, emotional learning is still introduced only at a theoretical level, within character education (Suriyanti, 2015). Research on emotions in relation to the classroom environment has mostly concentrated on student anxiety (Watt, Carmichael, & Callingham, 2017). Because emotional learning influences how behavior is carried out, it leads to a learning environment or behavioral responses that appear on different time scales (Lowe, 2014).
To obtain information about emotional learning, an instrument suited to use in Indonesia is needed. The Learning Environment Research (LER) measurement scale was chosen as the basis for modifying the ELVI instrument, following the observation by Koul et al. (2018) on LER in Asia that there is room for Asian researchers to modify learning environment studies. Rasch modeling is used to measure the level of the latent trait related to emotional learning ability. It can predict missing data based on systematic response patterns, and it produces standard errors of measurement and calibration in three respects: the measurement scale, the respondents, and the items (Jae Jeong, 2016; Perera, Sumintono, & Jiang, 2018). An instrument regarded as valid must have a scaled concept (Perera et al., 2018). The optimal number of response categories remains an unresolved problem, as seen from response patterns and information retrieval (Jae Jeong, 2016). A scale with more than two or three response categories can provide maximum information retrieval (Green, 2010). Scales with odd and even numbers of categories differ in the functioning of the middle value. An odd number of response categories is generally preferred, because the middle value is interpreted as a neutral point, giving respondents an opportunity to express their emotions neutrally and discriminatively. Omitting the middle value forces respondents to choose more deliberately, resulting in a more precise ranking (Andrich, 2016; Green, 2010). The ELVI instrument was designed with five- and four-category frequency-type response scales.
This study follows up the research of Adelson and McCoach (2010), which compared five-point and four-point Likert-type scales. Previous research has not yet investigated how the number of response categories affects the stability of student responses; the present study helps answer whether a five-category response scale with a functioning middle value psychometrically outperforms a four-category response scale. The effectiveness of the scale used can be known through the validity of the Andrich threshold. The purpose of this study is to determine the differences in Andrich threshold validity between the five- and four-category response scales of the ELVI instrument, based on Rasch modeling. Emotional learning is an inseparable component of the cognitive process; it concerns how emotions experienced during learning affect metacognitive progress at the level of students' abilities (Chao, Dede, & Star, 2016). Cognitive processing is influenced by emotional states (Lizzio, Wilson, & Simons, 2010). Emotional learning is defined as an ability that helps students recognize, express, and regulate their own emotions, build relationships with peers and adults, empathize with other people's perspectives, maintain and focus attention (cognitive regulation), and understand the emotional perspectives of others, recognizing how situations differ and dealing with feelings in a prosocial way (Jones et al., 2017; Marchesi & Cook, 2012). Swartz (2017) distinguished two emotional areas, namely personal competence and empathy. Personal competence includes self-awareness, self-management, and social awareness. Empathy is an awareness of the need to give attention or care to others and to maintain social relationships. A rating scale with more than two response categories is a popular response format for measurement in education. A response scale is closely related to establishing validity (Salzberger, 2014).
Revilla, Saris, and Krosnick (2014) showed that a small number of response categories tends to produce lower validity. Green (2010) and Neumann, Neumann, and Nehm (2011) explained that an odd number of response categories is generally preferred over an even number because the middle category is interpreted as a neutral point, which tends to strengthen the preference for a five-category scale. Wakita, Ueshima, and Noguchi (2012) explained that a scale without a neutral midpoint is preferred because respondents are forced to make definite choices. Sumintono (2015) explained that rating scale validity analysis is conducted to verify whether the response options used confuse respondents. Rasch model analysis verifies the assumed ordering of the ratings by examining the observed average (Obsvd Avrge). The Andrich threshold tests whether the polytomous values used are correct. Distefano, Greer, Kamphaus, and Brown (2015) and DiStefano and Morgan (2010) describe the threshold as the transition point from one category to an adjacent category on the rating scale. The number of thresholds is one less than the number of scale categories (k - 1). According to Lundgren-Nilsson, Dencker, Jakobsson, Taft, and Tennant (2014), a threshold is the point between two categories at which either response is equally likely. When thresholds are disordered, items can be salvaged by reducing the number of categories. Huang (2016) states that the higher the estimated threshold parameters, the greater the defect measured. If the defect is not too severe, the item category with some or a little difficulty can dominate. Gonza, Zabalegui-ya, Lo, and Siso (2014) explained that the number of responses in each category and the thresholds for each item were used to assess the effectiveness of the rating scale.
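The threshold concept described above can be illustrated with a minimal Python sketch of the Andrich rating scale model. It is an illustration under stated assumptions, not the estimation procedure used in this study: the function names, the person measure `theta`, the item difficulty `delta`, and the threshold values are all hypothetical. The sketch computes the probability of each response category and checks whether a set of thresholds is ordered.

```python
import math


def category_probabilities(theta, delta, thresholds):
    """Return P(X = k) for k = 0..m under the Andrich rating scale model."""
    # Category k has exponent sum over j <= k of (theta - delta - tau_j);
    # category 0 has an empty sum, so its exponent is 0.
    exponents = [0.0]
    for tau in thresholds:
        exponents.append(exponents[-1] + (theta - delta - tau))
    peak = max(exponents)  # subtract the maximum for numerical stability
    weights = [math.exp(e - peak) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def thresholds_are_ordered(thresholds):
    """True when every threshold is strictly larger than the one before it."""
    return all(a < b for a, b in zip(thresholds, thresholds[1:]))


# Hypothetical thresholds for a four-category scale (k - 1 = 3 thresholds).
example_thresholds = [-1.5, 0.0, 1.5]
probs = category_probabilities(theta=0.0, delta=0.0, thresholds=example_thresholds)
print([round(p, 3) for p in probs])
print(thresholds_are_ordered(example_thresholds))
print(thresholds_are_ordered([-1.5, 0.8, 0.2]))  # a disordered, hypothetical case
```

In this example the person measure coincides with the second threshold, so the two categories on either side of that threshold receive the same probability, which matches the definition of a threshold as the point where adjacent responses are equally likely.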
METHOD
This research is a survey within the post-positivist paradigm, using a questionnaire method. A total of 1494 student responses on the five- and four-category response scales were sampled at random in the province of Jakarta.

Rasch Modeling
Kean, Bisson, Brodke, Biber, and Gross (2018) and Kutlay, Küçükdeveci, Gönül, and Tennant (2018) describe the Rasch model as concerning person ability, item difficulty, and item fit, and as a means of examining the psychometric properties of an instrument. Andrich (2016) explains that Rasch modeling was first put forward by Georg Rasch of Denmark in the 1950s. According to Kutlay et al. (2018), Rasch modeling relates to item response theory (IRT) as a modern measurement theory, in contrast to classical measurement theory. According to DiStefano and Morgan (2010), the Rasch model requires its assumptions to hold for accurate estimates, including (1) unidimensionality, (2) monotonic scales, and (3) item fit. The ELVI instrument grid is shown in Table 1 below:

Table 1 ELVI Instrument Grid

Dimension                    Indicator    Item Numbers Before Modification  Total  Item Numbers After Modification  Total
Self-Awareness               Captivate    33,34,35,36,37,38,39,40           8      33,34,35,36,37,38,39,40          8
Self-Management              Control      9,10,11,12,13,14,15,16            8      10,11,12,13,14,16,16a            7
Social-Awareness             Care         1,2,3,4,5,6,7,8                   8      1,2,3,4,4a,5,6,7,8               9
                             Confer       41,42,43,44,45,46,47,48           8      41,43,47,48                      4
Relationship Skills          Challenge    25,26,27,28,29,30,31,32           8      26,27,29                         3
Responsible Decision Making  Clarify      17,18,19,20,21,22,23,24           8      17,19,20,21,22,23,24             7
                             Consolidate  49,50,51,52,53                    5      49,50,51,52,53                   5
Total                                                                       53                                      43

RESULT
The basic requirement of construct validity is that an instrument must be designed to measure one latent construct. Unidimensionality in Rasch modeling refers to invariant measurement (Kaliski et al., 2013). Unidimensionality is essential for determining parameter estimates (Sinnema, Meyer, & Aitken, 2016).
Determining unidimensionality is also important as evidence of internal consistency (Huberty et al., 2013). The unidimensionality results for the five- and four-category response scales are shown in Table 2 below:

Table 2 Unidimensionality for the five- and four-category response scales
[Panels: unidimensionality for the five-category response scale; unidimensionality for the four-category response scale]

The unidimensionality criterion is read from the "raw variance explained by measures" value. The results in Table 2 are 39.3% for the five-category scale and 43.5% for the four-category scale. Both values exceed 20%, so the instrument meets the requirement for unidimensionality (Shih, Chen, Sheu, Lang, & Hsieh, 2013). Further evidence on dimensionality comes from the Eigenvalue units column (Huberty et al., 2013; Kaliski et al., 2013). For the five-category scale, the eigenvalues are 2.6, 2.3, 2.2, 2.0, and 1.7, with unexplained variances of 4.0%, 3.6%, 3.5%, 3.1%, and 2.7%. For the four-category scale, the eigenvalues are 2.7, 2.4, 1.9, 1.8, and 1.6, with unexplained variances of 4.1%, 3.6%, 2.9%, 2.7%, and 2.5%. The unexplained variance on both scales is less than 15% (Sinnema et al., 2016). Variance values in the range of 3-5% fall in the very strong category (Seol, 2016). Thus, empirically, the ELVI instrument with five- and four-category response scales is unidimensional and establishes construct validity. The monotonic nature of the ELVI instrument, modified from the LER scale Questionnaire on Classroom Emotional Climate, was then examined.
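The unidimensionality criteria reported above can be expressed as a small check. The following Python sketch is only illustrative: the function name and its default cut-offs are hypothetical conveniences that encode the 20% and 15% criteria cited in the text, and the input values are the percentages reported for Table 2.

```python
def meets_unidimensionality(raw_variance_pct, unexplained_pcts,
                            min_explained=20.0, max_unexplained=15.0):
    """Check the two criteria cited in the text.

    raw_variance_pct: raw variance explained by measures, in percent.
    unexplained_pcts: unexplained variance of each contrast, in percent.
    """
    explained_ok = raw_variance_pct > min_explained
    contrasts_ok = all(v < max_unexplained for v in unexplained_pcts)
    return explained_ok and contrasts_ok


# Values as reported in the text for the two response scales.
five_category_ok = meets_unidimensionality(39.3, [4.0, 3.6, 3.5, 3.1, 2.7])
four_category_ok = meets_unidimensionality(43.5, [4.1, 3.6, 2.9, 2.7, 2.5])
print(five_category_ok, four_category_ok)
```

Both scales pass this check, which agrees with the conclusion stated in the text; the sketch does not reproduce the Rasch residual analysis that generated the percentages.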
The frequency scales with five and four response categories are shown in Table 3:

Table 3 Rating expressions in each scale

Scale  Response Category
5      Never  Rarely  Occasional  Often  Always
4      Never  Rarely  Often  Always

In Table 3, the five-category response scale prioritizes the functioning of the middle value, placing "occasional" as choice three (3) (Naga, 2012). The four-category response scale omits the middle value, so that students respond more deliberately and produce more precise rankings (Green, 2010).

Table 4 Andrich thresholds in the five- and four-category response scales

Scale Category  Obsvd Avrge (5)  Andrich Threshold (5)  Obsvd Avrge (4)  Andrich Threshold (4)
1               -0.83            NONE                   -0.83            NONE
2               -0.15            -2.17                  0.17             -2.48
3               0.33             -0.38                  1.21             0.33
4               0.94             0.59                   2.43             2.14
5               1.54             1.97

Andrich (2011) explained that when the distances between successive thresholds are positive, that is, when the thresholds are ordered, the response categories can be interpreted as an ordinal scale. Table 4 shows that the values increase on both scales, as seen in the Observed Average column, which moves from negative to positive. Logit scores on the five-category response scale start at -0.83 for category 1 (never), followed by -0.15 for category 2 (rarely), 0.33 for category 3 (occasional), 0.94 for category 4 (often), and 1.54 for category 5 (always). Logit scores on the four-category response scale start at -0.83 for category 1 (never), followed by 0.17 for category 2 (rarely), 1.21 for category 3 (often), and 2.43 for category 4 (always). On the five-category response scale, the Andrich thresholds rise monotonically from NONE through the negative logit range (-2.17) to the positive logit range (1.97).
On the four-category response scale, the Andrich thresholds rise monotonically from NONE through the negative logit range (-2.48) to the positive logit range (2.14). Thus the monotonic increase in logit scores indicates that students can distinguish between the response categories, and it verifies the level of student agreement on both scales. This monotonic movement shows that the items are consistent with the students' choice of response categories for measurement.

Item fit in Rasch modeling shows how well an item conforms to the model, that is, whether the statement item functions normally in taking measurements. The misfit indices examined are the Outfit Mean Square (MNSQ), the Outfit Z Standard (ZSTD), and the Point Measure Correlation (DiStefano & Morgan, 2010; Perera et al., 2018). MNSQ, based on squared standardized residuals, aims to detect misfit in the actual data, showing the match between items and student responses in unstandardized form. For an item to be declared fit, its MNSQ value must lie between 0.5 and 1.5 (Abd-el-fattah, 2015; Elisabet, Benito, & Miguel, 2012; Harachi, 2012; Seol, 2016). A ZSTD value between -1.96 and +1.96 indicates that an estimate is accepted (Elisabet et al., 2012; Seol, 2016). The Point Measure Correlation identifies the internal consistency between items and student responses. Items with a negative Point Measure Correlation (-) are misfit items. Estimation in the PT-MEASURE CORR column with acceptance criteria is 0.32