ISSN |2355-3669| E-ISSN |2503-2518| Volume 7| Number 2|Dec 2020| 99 Available online at http://jurnal.radenfatah.ac.id/index.php/edukasi Teachers' Perspectives towards Validity of Teacher-Made Test Hasnan Yasin, Septia Tri Gunawan, Nida Husna, Didin Nuruddin Hidayat UIN Syarif Hidayatullah Jakarta Abstract Some studies that had been conducted showed that teacher-made tests were good and satisfactory. However, the majority of teachers did not validate their tests before administering them to the students. This study was conducted to investigate the perspective of teachers towards their-own- made (teacher-made) tests they had made. The purpose of this study was to know to what extent their agreement regarding their attitudes, quality, and use of the tests. The method used was qualitative descriptive analysis. Five English teachers from Greater Jakarta (Jabodetabek region) participated in this research. The data were gathered through a questionnaire. Their view on the test they have made was analyzed and it was then described. The results showed that (1) the teachers agreed about the appropriateness of the test they administered; (2) the teachers believed that the data quality obtained during research was useful and meaningful, and (3) the teachers used the test to identify and to evaluate their learning objectives, students’ learning needs, students’ learning difficulties, and school evaluation. Keywords: English, teachers’ perspective, teachers’ made test, validity Manuscript submitted: September 24, 2020 Manuscript revised: November 4, 2020 Accepted for publication: December 1, 2020 Introduction In Indonesia, anincredibly diverse and multicultural country, English is regarded as one of the most popular foreign. As an assessment approach, standardized tests played a crucial position both in EFL and ESL curriculum evaluation and student evaluation. Brown and Abeywickrama (2010) claimed that an exceptional standardized test was an outcome of practical research and improvement beyond merely acknowledging particular standards or benchmarks. This type of test also entailed systematic procedures for administration and scoring. Most schools around Indonesia employed standardized tests to evaluate students at each level of their educational proficiency. In some cases, particular entities, such as the Board or Ministry of Education and Culture, developed and administered standardized tests. Meanwhile, in other parts with different policies, the tests were administered by the departments within the schools (Akiyama, 2004). A valid instrument was determined the quality of the test when it was adequately conducted to measure what was deemed to be measured (Muijs, 2011). When an instrument accurately measured any destined variable, it was considered a valid instrument for that particular variable. Jackson (2003) mentioned at least four types of validity: face validity, criterion validity, content validity, or construct validity, as one of the important parts in determining good instruments. Face validity focused on the concept of whether the test seemed valid or not on its facade (Jackson, 2003). Criterion validity was ISSN |2355-3669| E-ISSN |2503-2518| Volume 7| Number 2|Dec 2020| 100 Available online at http://jurnal.radenfatah.ac.id/index.php/edukasi a notion that would be displayed in the actual study to build it required a good knowledge of theory associating the idea and measuring the relationship between the measurement and the factors related to it. Meanwhile, content validity addressed the content of items, whether it computed the concept being gauged in the study or not. Lastly was the construct validity, which covered the extent of an instrument, so it would precisely measure the theoretical construct composed to figure out the score's amount. The validity concept could be formulated as to how significantly a test measures what it was meant to be measuring. Valid evaluations produced data that could be used to provide input to educational agreements at various rates, from school advancement to teacher evaluation for individual student earnings and fulfillment. Nevertheless, Caffrey (2009) contended that validity was not an attribute of the test on its own; instead, validity referred to the extent to which specified conclusions drawn from the test results could be perceived as meeting a purpose or situation and important requirements. The validation included collecting facts to justify the use and interpretation of test results based on the principles that the test was intended to assess, defined as buildings. Suppose a test did not measure all the capacity within a concept. In that case, the judgments described from its test results might accurately reflect on the student's knowledge and thus be in place of a validity fulmination. When the test as an evaluation tool had been proven to have a clear description of the expertise and abilities and aimed to evaluate, an evaluation was considered accurate. It should be part of the validity process that when monitoring for a wide variety of learners, it should also be both compatible with the norm in determining the students' skill and calculable over test settings and scorers (Darling-Hammond, Herman & Pellegrino, 2013). Furthermore, types of data for validity assessment might include: (1) evidence of alignments, as in a statement from a functionally reliable unbiased adjustment study substantiating coordination between the evaluation and its test design, and the criteria of the authority; (2) justification of the validity of using test results for their main objectives, such as a consideration of validity in an authoritative declaration that affirms the aims of the tests, the intended explanations and the use of results; and (3) justification that scores are associated with possible external factors as anticipated, acting as summaries of investigations showing positive relationships with a) external assessments gaging similar constructs, b) student readiness teacher verdicts, or c) test-taker academic attributes. Evaluation is a daily-based work in the classroom and employed to be a guidance of the teaching-learning process. Evaluation was considered as an instrument or a process used to understand or quantify something in certain situations using certain rules (Arikunto, 2005). Additionally, an evaluation or test might be worth assessing a person's particular aptitudes and skills (Hopkins et al., 1990). Consequently, both evaluation and tests were used as a part of a particular process that educators and examiners’ effort in attempting and quantifying students’ ability by demonstrating some of the criteria as the sign of the skills being tested (Hedge, 2008). There were several types of tests that the teachers usually used to see whether or not the learning objectives were achieved. One of them is the achievement test. Achievement test could be categorized into two different types, the standardized evaluation and non-standardized test. A standardized test was when administering the test was prescribed and properly defined (Turkstra et al., 2005). Meanwhile, a non-standardized test was where the process served as an assortment of purposes, such as determining elements in domains where there was no standardized evaluation, describing performance from the context of real-world settings and cognitive requirements and supposed supports (Turkstra et al., 2005). ISSN |2355-3669| E-ISSN |2503-2518| Volume 7| Number 2|Dec 2020| 101 Available online at http://jurnal.radenfatah.ac.id/index.php/edukasi One that was included as a non-standardized evaluation was the teacher-made evaluation (also referred to as the classroom evaluation). Teacher-made evaluations were developed by topic teachers in schools or universities to rate pupils' achievement in regions covered in education. Usually, it would be limited to be based on a specific topic or group of pupils. While the standardized evaluation was valid, dependable, and contained a table of criteria, the teacher- generated test did not necessarily go through complete sorts of standardization (Okpala et al., 1993). The standardized evaluation was usually made to be used on a far bigger scale compared to teacher- made tests. Therefore, it was exposed to a string of standardization procedures until they were administered on pupils. The teacher-made test used here was included the daily tests, midterm test, and final term test, which had been administered to the students on a smaller scale. There was plenty of research that had been conducted on the validity of teacher-made tests. Nurhalimah et al. (2019) researched the quality of English teacher-made tests. Her findings showed that most items (80%) in teacher-made tests were in the rate of excellent, good, and satisfactory. This was one of the evidence that teacher-made test quality should not be not taken for granted. In her research, she also gave some comparison to standardized tests. It had been proved that 50% of standardized test items were irrelevant, while teacher-made tests were more superficial. It could be one reason why high scores in schools obtained lower scores in national examinations (Razali & Jannah, 2015). However, most teachers did not validate their test items before administering (Ugwu & Mkpuma, 2019). Despite the urgency and impact of the validity in testing (Friberg, 2010), some teachers still did not consider their test validity. Since no previous studies above explored teachers’ notion on a test, the present study attempted to examine the teachers' perspective towards their- own-made (teacher-made) test by formulating several research questions: (a) what are teachers’ attitudes towards the appropriateness of the test? (b) what are teachers’ perceptions of accuracy in a test? (c) what are the uses of a test for teachers? This inquiry was expected to provide the teachers’ agreement regarding their attitudes, quality, and use of the tests. Methods Research design This study used a qualitative method with a descriptive analysis to analyze teachers’ perspectives on teacher-made test validity. It was used to describe teachers’ perspectives about the teacher-made test and its effect on their teaching. This study's focus was on the perspective or view that was owned by the teachers of the teacher-made test. It was meant to demonstrate the approaches used to determine the validity of the test used by the teachers. This research participants were five English teachers who had been teaching for one to ten years, and they were from Jakarta, Bogor, Depok, Tangerang, and Bekasi (Jabodetabek) area. The participants were chosen by applying a random sampling technique based on availability, and they had a habit of making their tests. Data collection and analysis An adapted questionnaire from Kyriakides (2004) was deployed to see the extent to which the teachers view the validity of the teacher-made test. It was important to remember that the data used was a teacher-made test implemented in the final test. After administering the test, the teacher was asked to fill out a questionnaire. The questionnaire covering the following issues, such as (a) the teacher's attitude towards the suitability of test; (b) teacher attitudes towards the quality of data obtained from tests; and (c) the use of the test by teacher, was adapted from Kyriakides (2004). The questionnaire reliability results were assessed by measuring Cronbach Alpha values relative to the ISSN |2355-3669| E-ISSN |2503-2518| Volume 7| Number 2|Dec 2020| 102 Available online at http://jurnal.radenfatah.ac.id/index.php/edukasi scale used to assess the teacher's perspective on teacher-made assessments. To measure the teacher responses, Cronbach Alpha is valued for the five scales used in the questionnaire (Cronbach, 1990). Findings Five teachers participated in this research by answering the questionnaire related to teacher attitudes regarding the test's appropriateness, teacher attitudes regarding the quality of the data obtained from the test and the test used by teacher. Teacher attitudes toward the appropriateness of the test Table 1. Responses of teacher attitudes towards the appropriateness of the test means and standard deviations Descriptive Statistics N Mean Std. Deviation the usefulness of data 5 4.40 .548 the evaluative criteria 5 4.40 .548 the scoring 5 4.20 .837 Valid N (listwise) 5 Data from Table 1 was based on the teacher's responses in answering the questionnaires that were related to the suitability of each test activity. The teacher's assessment of the suitability of the testing activity might be based on the suitability of the information collected, the suitability of the topic being assessed and evaluative criteria, and the assessment guidelines' openness. Thus, the teachers were to rate items on a 1 (absolutely disagree) scale to 5 (absolutely agree). The result showed that the average value of items in line one was high (4.40), where the maximum score was 5, and the standard deviation was relatively low. This showed that, on average, most respondents agreed about using the tests they had made as providing information about their students' literacy skills. Second, the average value of items in line two shows the same value as the previous one (4.40), which shows that evaluative criteria are appropriate. Finally, the average score is lower than the previous two (4.20); therefore, it is still of high value and shows that the teacher considers the assessment guidelines for the tests to be beneficial. Teacher attitudes towards the quality of the data obtained from the test Table 2. Percentages of respondents concerning perceptions of accuracy of test result and factors that influenced students’ test result, their means, and standard deviations Descriptive Statistics N Agree* Disagree** Mean Std. Deviation Test scores give me some idea of students’ literacy 5 60% 20% 4.00 .707 ISSN |2355-3669| E-ISSN |2503-2518| Volume 7| Number 2|Dec 2020| 103 Available online at http://jurnal.radenfatah.ac.id/index.php/edukasi ability The test is indiscrimin ate and too simple 5 20% 80% 1.80 .447 Test scores only rank students compared to each other 5 20% 80% 3.00 1.414 Test scores in each activity reveal the learning needs of each student in specific aspects of literacy assessed by the test 5 100% 0% 4.40 .548 Student scores are affected by the fact that students are not interested in demonstrati ng their skills 5 60% 40% 3.80 1.095 Teacher’s knowledge about the individual student is critical to the interpretati on or meaning 5 100% 0% 4.40 .548 ISSN |2355-3669| E-ISSN |2503-2518| Volume 7| Number 2|Dec 2020| 104 Available online at http://jurnal.radenfatah.ac.id/index.php/edukasi given to student’s responses to test activities Student scores are affected by the context of each test activity, which is familiar only to some groups of students (e.g., middle- class rather than working- class students) 5 40% 40% 2.60 .894 Student scores are affected by the fact that students are not familiar with the form of activities included in the test 5 40% 60% 2.80 1.643 Student scores are affected by anxiety 5 40% 60% 3.00 1.414 Teacher- made test doesn’t portray minority language to 5 20% 60% 2.40 1.140 ISSN |2355-3669| E-ISSN |2503-2518| Volume 7| Number 2|Dec 2020| 105 Available online at http://jurnal.radenfatah.ac.id/index.php/edukasi students accurately Valid N (listwise) 5 *The respondents either agree or absolutely agree **The respondents disagree or absolutely disagree Most respondents (80%) did not think that the test was indiscriminate and too simple. Additionally, they argued that test results were not only useful compared to each other in ranking students but were also useful in giving teachers ideas about their student literacy skills. All respondents (100%) supported the idea that student scores from tests expressed students’ learning needs in particular literacy aspects. This assumed that the teacher found the test result was useful for both the formative and summative evaluation functions. However, more than 40 percent of respondents accepted that grades were affected by the fact that students did not want to show their skills. Furthermore, all respondents believed the knowledge of individuals influenced that test results. About half of the respondents also believed that their familiarity with some students' tests affected the score results. 40% of respondents also supported the idea that student scores were influenced by anxiety. Finally, more than half of respondents believed that this test accurately described minority languages. The uses of test by teachers This described how the teachers used the test further. The data of the respondents were shown in the following table. Table 3. The use of test by the teacher and its means and standard deviations Descriptive Statistics N Mean Std. Deviation The test helps teachers to identify whether the objectives were achieved 5 4.60 .548 The test helps teachers to identify the learning need of students 5 4.60 .548 The test helps teachers to identify students learning difficulties 5 4.00 1.225 The test is used for summative reasons 5 3.80 1.095 The test is used as sources for evaluating the effectiveness of the school 5 4.60 .548 Valid N (listwise) 5 Table 3 showed that the average test helped the teacher identify whether goals were achieved, students' learning needs, and evaluation of high school effectiveness (4.60) and low standard deviations. This showed that the test or test results were used to do all three things. It also helped teachers to identify student difficulties and was used for summative reasons. This was based on the average value of 4.00 and 3.80, respectively. ISSN |2355-3669| E-ISSN |2503-2518| Volume 7| Number 2|Dec 2020| 106 Available online at http://jurnal.radenfatah.ac.id/index.php/edukasi Discussion The information mentioned above could be explained in terms of its impacts on the improvement of teacher-made tests and, in particular, to increase the usefulness of information resulting from tests in decision making and provide appropriate explanations for basic assessment. Moreover, a more general question emerged concerning the significance of examining the instructor's definition as a way of determining the validity test. First, teachers agreed that this exam offered a depth of information on their students' literacy skills. This idea agrees with Brown (2003), who stated that the test's function is to measure a person’s ability, knowledge, and performance. They also often considered evaluative criteria to measure student responses to each related test operation. Furthermore, criteria for evaluation of almost all tasks were found very helpful. This indicated that the teacher seemed to assume that the teacher-made test had to evaluate students for its validity. In addition to that, some research findings showed that most of the teacher-made tests that had been administered to students indicated valid (e.g., Irhamsyah, 2020; Sugianto, 2017). On the other hand, another research found that the teacher- made assessments' validity was low (Minda, 2018). One among other reasons for these varieties is the teachers. As mentioned in a research, the experienced teachers who have gone through training on test development and analysis tended to design tests with higher validity and reliability than their counterparts without such training (Odimo, 2014). Second, the teacher claimed that the teacher-made test provided information on students’ literacy that was to be evaluated. The findings were seen as offering more facts about the validity of the test. The teacher also believed that several factors affect student test scores or results, including student interest, teacher knowledge about individuals, the context of the test, students' familiarity with several groups of students, and anxiety. It is in line with what had been found in several research studies indicating that test scores are affected by many factors (El-Omari, 2016; Farooq et al., 2011; Jurkovic, 2010; Khamkhien, 2010; Shvidko et al., 2015). Third, it could be claimed that teachers used tests or test scores to identify or evaluate whether goals were achieved, students’ learning needs, students' and learning difficulties. According to other research, test scores could be used in evaluating students and teachers themselves as part of a broader form of teachers’ evaluation (Baker, 2010; Corcoran, 2010) or even principals of the school (Grissom et al., 2015). This test could also be viewed as a summative reason and a school evaluation for its effectiveness. Conclusion To conclude, the gathered data showed that (1) the teachers agreed with the appropriateness of the test they administered; (2) the teachers also believed the quality of data obtained during research was useful and meaningful; and (3) the teachers used the test to identify and evaluate learning objectives, students’ learning needs, students’ learning difficulties, and school evaluation. Therefore, it was reasonable to assume that validity was a crucial aspect of constructing a test. It is strongly suggested that teachers do some validity test at least randomly if regularly is difficult to conduct. However, one thing that may become a problem, many teachers use the same test several times, and they are barely changing into new ones. Whether or not the validity of their tests is in line with those tests' reliability, it may become a further area to be researched. ISSN |2355-3669| E-ISSN |2503-2518| Volume 7| Number 2|Dec 2020| 107 Available online at http://jurnal.radenfatah.ac.id/index.php/edukasi References Akiyama, T. (2004). Introducing EFL speaking tests into a Japanese senior high school entrance examination. Melbourne: University of Melbourne. Arikunto, S. (2005). Dasar-dasar evaluasi pendidikan. Indonesia: Bumi Aksara. Baker, E. L. (2010). Problems with the use of student test scores to evaluate teachers. Retrieved from: http://epi.3cdn.net/b9667271ee6c154195_t9m6iij8k.pdf Brown, D., & Abeywickrama, P. (2010). Language assessment principles and classroom practices (2nd ed.). White Plain,NY: Pearson Education. Brown, H. D. (2003). Language assessment principles and classroom practices. London:Longman. Caffrey, E. (2009). Assessment in elementary and secondary education: A Primer. English: Congressional Research Service). Retrieved from: https://fas.org/sgp/crs/misc/R40514.pdf. Corcoran, S. P. (2010). Can Teachers Be Evaluated By Their Students’ Test Scores? Should They Be? The Use Of Value-Added Measures Of Teacher Effectiveness In Policy And Practice. Texas:Education Policy for Action Series. Cronbach, L. J. (1990). Essentials of psychological testing. New York,NY: Harper & Row. Darling-Hammond, L. Herman, J., & Pellegrino, J. (2013). Criteria for high-quality assessment. Stanford,CA: Stanford Center for Opportunity Policy in Education. El-Omari, A. H. (2016). Factors affectingsStudents’ achievement in English language learning. Journal of Educational and Social Research, 6(2), 9–18. Farooq, M. S., Chaudhry, A. H., Shafiq, M., & Berhanu, G. (2011). Factors affecting students’ quality of academic performance: A Case of Secondary School Level. Journal of Quality and Technology Management, 7(2), 1–14. Friberg, J. C. (2010). Considerations for test selection: How do validity and reliability impact diagnostic decisions?. Child Language Teaching and Therapy, 26(1), 77–92. Grissom, J. A., Kalogrides, D., & Loeb, S. (2015). Using student test scores to measure principal performance. Educational Evaluation and Policy Analysis, 37(1), 3–28. Hedge, T. (2008). Teaching and learning in the language classroom. Oxford ,UK: Oxford University Press. Hopkins, K. D., Stanley, J. C., & Hopkins, B. R. (1990). Educational and Psychological Measurement and Evaluation. Boston: Prentice Hall. Irhamsyah, L. H. (2020). The analysis of the teacher-made test for senior high school at State Senior High School 1 Kutacane, Aceh Tenggara. Jurnal Ilmiah DIDAKTIKA, 21(1), 10–20. Jackson, S. L. (2003). Research methods and statistics: A critical thinking approach. Wadsworth : Thomson Wadsworth. Jurkovic, V. (2010). Language learner strategies and linguistic competence as factors affecting achievement test scores in English for Specific Purposes. TESOL Journal, 1(4), 449–469. Khamkhien, A. (2010). Factors affecting language learning strategy. Electronic Journal of Foreign Language Teaching, 7(1), 66–85. Kyriakides, L. (2004). Investigating validity from teachers’ perspectives through their engagement in large-scale assessment: The Emergent Literacy Baseline Assessment project. Assessment in Education: Principles, Policy & Practice, 11(2), 143–165. Minda, M. H. (2018). Content Validity of EFL teacher-made assessment: The case of Communicative English Skills Course at Ambo University. East African Journal of Social Sciences and Humanities, 3(1), 41–62. Muijs, D. (2011). Doing Quantitative Research in Education with SPSS. London,UK: SAGE Publications. Nurhalimah, N., Fahriany, F., & Dadan, D. (2019). Determining the quality of English teacher-made ISSN |2355-3669| E-ISSN |2503-2518| Volume 7| Number 2|Dec 2020| 108 Available online at http://jurnal.radenfatah.ac.id/index.php/edukasi test: How excellent is excellent? Indonesia. Indonesiann EFL Journal: Journal of ELT, Linguistics, and Literature,5(1), 24-38. Odimo, L. (2014). Validity and reliability of teacher-made tests: Case study of year 11 physics in nyahururu district of Kenya. African Educational Research Journal, 2(2), 61–71. Okpala, P. N., Onocha, C. O., & Oyedeji, O. A. (1993). Measurement and evaluation in education. Stirling: Horden Publishers. Razali, K., & Jannah, M. (2015). The comparison between National Final Examination test items and English teacher made-test items of 2010 and 2011. Al-Talim Journal, 22(1), 10–22. Shvidko, E., Evans, N. W., & Hartshorn, K. J. (2015). Factors affecting language use outside the ESL classroom: Student perspectives. SYSTEM, 51, 11–27. Sugianto, A. (2017). Validity and Reliability of English Summative Test for Senior High School. Indonesian EFL Journal: Journal of ELT, Linguistics, and Literature, 3(2), 22–38. Turkstra, L. S., Coelho, C., & Ylvisaker, M. (2005). The use of standardized tests for individuals with Cognitive-Communication Disorders. Seminars in Speech and Language, 26(4), 215–222. Ugwu, N. G., & Mkpuma, S. O. (2019). Ensuring quality in education: validity of teacher-made language tests in secondary schools in Ebonyi State. American Journal of Educational Research, 7(7), 518–523.