140 | JISAE. Volume 6 Number 2 September 2020.  

 
https://doi.org/10.21009/JISAE 

JISAE (Journal of Indonesian Student Assessment and Evaluation) 

ISSN : P-ISSN: 2442-4919│E-ISSN: 2597-8934 
Vol 6 No 2 (2020) 

Website  : http://journal.unj.ac.id/unj/index.php/jisae
 

THE APPLICATION-BASED ANALYSIS OF QUESTIONS ITEM QUALITY IN JUNIOR HIGH SCHOOL
 

Rabiatul Adawiah

 
Civics Education Department, Faculty of Teacher Training and Education, Lambung Mangkurat University

 
ABSTRACT

Analyzing test items is one of every teacher's obligations in the effort to improve the quality of questions. However, Civics Education teachers have never carried out such an analysis, especially for the questions prepared by the Subject Teachers' Consultation and used in the end-semester assessment. The purpose of this study was to determine the quality of the questions based on their distinguishing feature, level of difficulty, and effectiveness of distractors. This evaluation study examined the 50 multiple-choice items of the end-semester Civics Education test used in Banjarmasin in the 2019/2020 academic year. The data collected were: (1) the final exam question sheets, (2) the answer key sheets, and (3) the students' answer sheets, all obtained through documentation techniques. The data were analyzed with the AnBuso version 8.0 application. The criteria for determining item quality are: (a) good, if the distinguishing feature is good/good enough, the level of difficulty is medium, and all alternative answers are effective; (b) revise alternative answers, if the distinguishing feature is good/good enough and the level of difficulty is medium, but some alternative answers are ineffective; (c) good enough, if the distinguishing feature is good/good enough but the level of difficulty is easy/difficult; and (d) not good, if the distinguishing feature is not good. The results indicate that 50% of the questions used in the end-semester test at junior high schools in Banjarmasin are of poor quality.

 
Keywords: Distinguishing Feature, Effectiveness of Distractor, Level of Difficulty.

 
Address for Correspondence: rabiatuladawiah@ulm.ac.id
 

INTRODUCTION 

In the era of globalization, almost all countries strive to improve the quality of education. The Government has made various efforts to improve the quality of education, among them improving the quality of learning and the quality of the assessment system. These two are interrelated: a good learning system produces good-quality learning, and the quality of learning can be seen from the results of assessment.

Assessment is defined as the activity of interpreting measurement data according to certain criteria or rules (Widoyoko, 2014: 30; Arifin, 2013: 4). Assessment includes all the ways used to evaluate individual performance (Mardapi, 2008: 5). Another view holds that assessment is making a decision about something by referring to certain measures, such as good or bad, smart or not smart, high or low (Supardi, 2015: 11).

From the definitions above, it can be said that assessment emphasizes the effort made by the teacher or student to obtain information about the learning that has taken place. The information obtained can be used as feedback for a better learning process. Assessment aims to plan and implement learning, maintain the classroom atmosphere, provide feedback and appreciation, place students, diagnose students' learning problems, and assess the level of academic progress (Russell & Airasian, 2012: 5-8). Another view is that assessment aims to determine students' level of progress and development over a certain period (Popham & Baker, 2008: 151). The functions of assessment are (1) diagnostic, to identify student performance; (2) formative, to help student learning; (3) summative, for review, transfer, and certification; and (4) evaluative, to examine teacher and institutional performance (Weeden, Winter and Broadfoot, 2002: 19). Assessment is therefore a very important part of learning (Russell & Airasian, 2012: 2; Mansyur, Harun Rasyid and Suratno, 2015: 22) and has a strong influence on improving the learning process (Raymond, et al., 2012: 1-6; Bers, 2008: 31-39). Accordingly, teachers are required to have sufficient ability to conduct assessments, because good assessment can motivate educators to teach better and encourage students to learn better (Mardapi, 2012: 4).

One of the tools commonly used to conduct assessment in the teaching and learning process is a test. A test is a tool or procedure used to find out or measure something in a situation by predetermined means or rules (Arikunto, 2013: 67). Another definition states that a test is a series of questions that have right or wrong answers, or questions that require answers or responses, used to measure a person's ability level in certain aspects (Wening, 2012: 4).

Because tests are used to measure learning achievement, it is very important to maintain the quality of the questions. Assessment produces accurate information only if the tool or instrument used for measurement meets several criteria, such as validity, reliability, and objectivity (Anderson, 2003: 10; Kubiszyn & Borich, 2013: 326). Item analysis is an important part of guaranteeing item validity (Nunnally & Bernstein, 1994: 304). Item analysis is an attempt to test the quality of questions in order to determine which items should be kept, discarded, or revised. It provides information about the quality of the questions in terms of their level of difficulty, distinguishing feature, and effectiveness of distractors (Muhson et al., 2015: 200). The purpose of item analysis is to identify good, fair, and poor questions (Daryanto, 2012: 179; Sudjana, 2011: 135).
Analyzing test items can be regarded as one of the 'obligations of every teacher', because every teacher must be able to inform both the institution and the students about the extent of students' mastery of the material, or of certain skills related to the material that has been taught. In practice, the scores students obtain on final exams or final school grades are often still low. These low results are not only caused by students' limited ability to answer the questions, but may also be caused by the low quality of the questions.

 
METHODS 

This research is an evaluation study. The evaluation was carried out on the Civics Education items of the end-of-odd-semester assessment at state junior high schools (SMPN) in Banjarmasin City, academic year 2019/2020. The test analyzed was the 8th-grade test, consisting of 50 multiple-choice items, and the research subjects were 200 students from nine schools. The data collected were: (1) final exam question sheets, (2) answer key sheets, and (3) student answer sheets, all obtained through documentation techniques.


The data were analyzed with the AnBuso version 8.0 program. An item is considered valid if its correlation coefficient is at least 0.2. For the distinguishing feature, an item is good if its coefficient exceeds 0.3; a coefficient between 0.2 and 0.3 is considered good enough, and a coefficient below 0.2 is considered poor.
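The validity and distinguishing-feature criteria can be illustrated with a short Python sketch. It is not AnBuso's implementation, whose formulas the text does not give. It assumes dichotomous scoring (1 = correct, 0 = wrong), takes the validity coefficient to be the Pearson correlation between an item and the rest score (total score minus that item), and computes the distinguishing feature with the classical upper/lower group index using the top and bottom 27% of test takers. All function names and the toy data are hypothetical.

```python
from math import sqrt

# Hypothetical helpers for illustration; AnBuso's exact formulas may differ.

def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either list is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def item_rest_correlation(scores, item):
    """Correlate one 0/1 item with each taker's total score excluding that item."""
    x = [row[item] for row in scores]
    rest = [sum(row) - row[item] for row in scores]
    return pearson(x, rest)

def is_valid(r, minimum=0.2):
    """Validity criterion from the text: correlation of at least 0.2."""
    return r >= minimum

def discrimination_index(scores, item, fraction=0.27):
    """Upper/lower group index D = p_upper - p_lower (27% groups assumed)."""
    ranked = sorted(scores, key=sum, reverse=True)
    n = max(1, round(len(ranked) * fraction))
    upper, lower = ranked[:n], ranked[-n:]
    p_upper = sum(row[item] for row in upper) / n
    p_lower = sum(row[item] for row in lower) / n
    return p_upper - p_lower

def classify_discrimination(d):
    """Thresholds from the text: >0.3 good, 0.2-0.3 good enough, <0.2 not good."""
    if d > 0.3:
        return "good"
    if d >= 0.2:
        return "good enough"
    return "not good"

# Toy data: six test takers (rows) answering four items (columns).
scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]

for item in range(4):
    d = discrimination_index(scores, item)
    print(item + 1, round(d, 2), classify_discrimination(d))
```

With only six takers the 27% groups hold two people each; on a real class of 200 students the groups are large enough for the index to be stable.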

For the level of difficulty, an index below 0.3 falls in the difficult category, 0.3-0.7 in the moderate category, and above 0.7 in the easy category. A good level of difficulty lies between 0.3 and 0.7.
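A minimal sketch of the difficulty criteria, assuming the conventional proportion-correct index P = B/N (B = number of test takers who answered the item correctly, N = number of test takers). The text gives only the category bounds, so the formula and the function names here are assumptions for illustration.

```python
def difficulty_index(item_scores):
    """Proportion of test takers answering correctly: P = B / N, for 0/1 scores."""
    return sum(item_scores) / len(item_scores)

def classify_difficulty(p):
    """Bounds from the text: <0.3 difficult, 0.3-0.7 moderate, >0.7 easy."""
    if p < 0.3:
        return "difficult"
    if p <= 0.7:
        return "moderate"
    return "easy"

# Example: 15 of 20 students answer an item correctly.
item = [1] * 15 + [0] * 5
p = difficulty_index(item)
print(p, classify_difficulty(p))  # 0.75 easy
```

The same function applied to the index of 0.020 reported later for item 25 returns "difficult".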

A distractor (alternative answer) of a good item is considered effective if it is chosen by at least 5% of the test takers. The effectiveness of an item's distractors is interpreted as follows: (a) if all the distractors in the item function, the question is very good and can be stored in the question bank; (b) if one distractor in the item does not function, the question is good and can be stored in the question bank on condition that the non-functional option is revised; (c) if two distractors in the item do not function, the question is bad and cannot be stored in the question bank, and it must be revised until it meets the criteria of a good question; and (d) if three or more distractors in the item do not function, the question is very bad and cannot be stored in the question bank; it must be revised until it meets the criteria of a good question, or discarded and replaced with a new one.
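The 5% rule and the four-level interpretation can be sketched as follows. The sketch assumes four options labelled A-D and measures each distractor's share against all test takers; the option labels, the example responses, and the function names are illustrative assumptions, not taken from the study.

```python
from collections import Counter

def nonfunctional_distractors(responses, key, options="ABCD", threshold=0.05):
    """Return the distractors chosen by fewer than `threshold` of all test takers."""
    counts = Counter(responses)  # options nobody chose count as 0
    n = len(responses)
    return [opt for opt in options if opt != key and counts[opt] / n < threshold]

def rate_item_distractors(n_nonfunctional):
    """Map the number of non-functional distractors to criteria (a)-(d)."""
    if n_nonfunctional == 0:
        return "very good: store in question bank"
    if n_nonfunctional == 1:
        return "good: store after revising the non-functional option"
    if n_nonfunctional == 2:
        return "bad: revise before storing"
    return "very bad: revise, or discard and replace"

# Example: 20 answers to an item whose key is A; nobody chooses D.
responses = ["A"] * 12 + ["B"] * 5 + ["C"] * 3
weak = nonfunctional_distractors(responses, key="A")
print(weak, rate_item_distractors(len(weak)))
```

A distractor chosen by exactly 5% of takers counts as functional here, matching the "at least 5%" wording.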

The criteria for determining item quality are: (a) good, if the distinguishing feature is good/good enough, the level of difficulty is moderate, and all alternative answers are effective; (b) revise alternative answers, if the distinguishing feature is good/good enough and the level of difficulty is moderate, but there are ineffective alternative answers; (c) good enough, if the distinguishing feature is good/good enough but the level of difficulty is easy/difficult; and (d) not good, if the distinguishing feature is not good (Muhson, 2015).
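The four quality criteria amount to a small decision rule. A minimal sketch, assuming the inputs are the category labels defined above ("good", "good enough", "not good" for the distinguishing feature; "easy", "moderate", "difficult" for the level of difficulty) plus a flag for whether every distractor is effective; the function name is hypothetical.

```python
def item_quality(discrimination, difficulty, all_distractors_effective):
    """Apply criteria (a)-(d): discrimination first, then difficulty, then distractors."""
    if discrimination not in ("good", "good enough"):
        return "not good"                      # (d)
    if difficulty != "moderate":
        return "good enough"                   # (c) easy or difficult
    if all_distractors_effective:
        return "good"                          # (a)
    return "revise alternative answers"        # (b)

print(item_quality("good", "moderate", True))          # good
print(item_quality("good enough", "moderate", False))  # revise alternative answers
print(item_quality("good", "easy", True))              # good enough
print(item_quality("not good", "moderate", True))      # not good
```

Checking the distinguishing feature first mirrors criterion (d), which overrides the other two analyses.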

The evaluation of item quality was designed as follows.

[Figure: flowchart. A multiple-choice item is analyzed for distinguishing feature, level of difficulty, and effectiveness of distractor; the analysis results lead to one of: good question (enter the question bank), revised, the question is good enough, or problem is not good (discarded/replaced).]

RESULTS AND DISCUSSION

The results of the analysis of the distinguishing feature index of the Civics Education items at junior high schools in Banjarmasin are shown in the following table:

Table 1. Percentage of Distinguishing Feature of Items

Category | Item Number | Number of Items | %
Good (>0.3) | 1, 2, 5, 7, 8, 9, 10, 13, 18, 19, 21, 28, 29, 34, 42, 47 | 16 | 32
Good Enough (0.2-0.3) | 6, 12, 14, 24, 32, 35, 40, 48, 49 | 9 | 18
Not Good (<0.2) | 3, 4, 11, 15, 16, 17, 20, 22, 23, 25, 26, 27, 30, 31, 33, 36, 37, 38, 39, 41, 43, 44, 45, 46, 50 | 25 | 50

The table shows that only 32% of the questions have a distinguishing feature categorized as good, while 50% have a distinguishing feature categorized as not good. The follow-up for the items after the distinguishing feature analysis is as follows: (1) items with good discrimination are stored in the question bank and can be reused in the next learning achievement test; (2) items with low discrimination have two possible follow-ups: (a) they are traced, revised, and re-administered in a future learning achievement test to determine whether their distinguishing feature has increased, or (b) they are discarded/replaced; and (3) items with a negative distinguishing index should be discarded because of their low quality (Sudijono, 2012).
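The follow-up decisions can be expressed as a function of the distinguishing index. This is a sketch under stated assumptions: it reads "good discrimination" as an index above 0.3, treats good-enough items (0.2-0.3) as kept but in need of improvement, and treats anything from 0 up to 0.2 as low discrimination. These thresholds combine the Methods criteria with the follow-up rules above; the function name is hypothetical.

```python
def follow_up(d):
    """Suggested follow-up for an item with distinguishing index d (assumed thresholds)."""
    if d < 0:
        return "discard"                                  # (3) negative index
    if d > 0.3:
        return "store in question bank"                   # (1) good discrimination
    if d >= 0.2:
        return "keep, but improve"                        # good enough
    return "revise and re-test, or discard/replace"       # (2) low discrimination

for d in (-0.1, 0.0, 0.25, 0.45):
    print(d, follow_up(d))
```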

The next analysis concerns the difficulty level of the questions. The difficulty level of an item is a very useful item parameter because it provides information about how hard the item is. A question is difficult if its difficulty index is close to 0 and easy if the index approaches 1 (Muhson et al., 2015). The results of the difficulty index analysis for the end-semester test are shown in the following table:

Table 2. Percentage of Item Difficulty Levels

Category | Item Number | Number of Items | %
Easy (>0.7) | 8, 18, 24, 47, 49, 50 | 6 | 12
Medium (0.3-0.7) | 1, 2, 5, 6, 7, 9, 10, 12, 14, 16, 19, 21, 22, 26, 27, 28, 29, 30, 32, 34, 35, 36, 37, 39, 42, 44, 48 | 27 | 54
Difficult (<0.3) | 3, 4, 11, 13, 15, 17, 20, 23, 25, 31, 33, 38, 40, 41, 43, 45, 46 | 17 | 34

  
The table shows that 12% of the questions are in the easy category, 54% in the medium category, and 34% in the difficult category. For the effectiveness of distractors, the results indicate that all questions have good distractors, meaning that every distractor in each item was chosen by more than 5% of the students. The quality distribution of the questions, based on the analysis of the distinguishing feature, difficulty level, and effectiveness of distractors, is shown in the following table:

Table 3. Percentage of the Quality of the Questions

Category | Item Number | Number of Items | %
Good | 1, 2, 5, 6, 7, 9, 10, 12, 14, 19, 21, 28, 29, 32, 34, 35, 42, 48 | 18 | 36
Revised Alternative Answers | - | 0 | 0
Good Enough | 8, 13, 18, 24, 40, 47, 49 | 7 | 14
Not Good | 3, 4, 11, 15, 16, 17, 20, 22, 23, 25, 26, 27, 30, 31, 33, 36, 37, 38, 39, 41, 43, 44, 45, 46, 50 | 25 | 50

 
From the table above, it can be concluded that 50% of the end-semester Civics Education test items for the 8th grade of junior high schools in Banjarmasin, academic year 2019/2020, are of poor quality.
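Because the quality rule is deterministic, Table 3 can be cross-checked against Tables 1 and 2. The Python sketch below copies the item numbers from those tables, assumes (as reported) that all distractors were effective, so that no item falls into "revised alternative answers", and recomputes the counts. The function names are illustrative.

```python
from collections import Counter

# Item numbers copied from Table 1 (distinguishing feature) and Table 2 (difficulty).
DISC_GOOD = {1, 2, 5, 7, 8, 9, 10, 13, 18, 19, 21, 28, 29, 34, 42, 47}
DISC_GOOD_ENOUGH = {6, 12, 14, 24, 32, 35, 40, 48, 49}
MEDIUM = {1, 2, 5, 6, 7, 9, 10, 12, 14, 16, 19, 21, 22, 26, 27, 28, 29, 30,
          32, 34, 35, 36, 37, 39, 42, 44, 48}

def quality(item):
    """Quality category of one item; all distractors were reported effective."""
    if item not in DISC_GOOD | DISC_GOOD_ENOUGH:
        return "not good"
    if item not in MEDIUM:
        return "good enough"
    return "good"

def tally(n_items=50):
    """Count items per category and convert to percentages of n_items."""
    counts = Counter(quality(i) for i in range(1, n_items + 1))
    return {label: (counts[label], 100 * counts[label] / n_items)
            for label in ("good", "good enough", "not good")}

if __name__ == "__main__":
    for label, (count, pct) in tally().items():
        print(f"{label}: {count} items ({pct:.0f}%)")
```

Running it reproduces the 18/7/25 split (36%, 14%, 50%) reported in Table 3.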


DISCUSSION 

The results of this study indicate that 50% of the end-semester Civics Education test items used at state junior high schools in Banjarmasin in the 2019/2020 academic year are poor. In the analysis of the distinguishing feature, only 32% of the items were categorized as good, 18% as good enough (moderate), and 50% as poor. The distinguishing feature of an item is its ability to distinguish between high-ability and low-ability test takers (Muhson et al., 2015: 200; Azwar, 2010: 137; Mansyur, Harun Rasyid and Suratno, 2015: 189; Sukiman, 2012: 215). Thus, 50% of the 50 items used cannot distinguish students with high ability from students with low ability. Knowing the distinguishing feature of items is important because one of the basic assumptions in preparing learning-outcome test items is that the ability of one test taker differs from that of another, and test items must produce results that reflect these differences in ability among the test takers (Sudijono, 2012: 386).

The analysis shows that items number 3, 4, 11, 15, 16, 17, 20, 22, 23, 25, 26, 27, 30, 31, 33, 36, 37, 38, 39, 41, 43, 44, 45, 46, and 50 have a poor distinguishing feature, with coefficients <0.2. This means that these items cannot distinguish between students who can answer (high ability) and students who cannot (low ability). Therefore, these items should no longer be used and must be discarded. Examples of items with a poor distinguishing feature can be seen in the following figure.

 
Fig. 1. Examples of problems with poor distinguishing feature. 

 
Items number 6, 12, 14, 24, 32, 35, 40, 48, and 49 have a good enough (moderate) distinguishing feature, with coefficients between 0.2 and 0.3. These items can differentiate students' ability to some degree and can still be kept, although they are not yet satisfactory and need to be improved. Examples are shown in the following figure.

 
Fig. 2. Examples of problems with moderate distinguishing feature. 

 
Items number 1, 2, 5, 7, 8, 9, 10, 13, 18, 19, 21, 28, 29, 34, 42, and 47 have a good (high) distinguishing feature, with coefficients >0.3. Examples are shown in the following figure.

 
Fig. 3. Examples of problems with good distinguishing feature.  

Based on these data, only 50% of the 50 Civics Education questions used in the end-semester test were able to distinguish between high-ability and low-ability students. In other words, only 50% of the questions yield high scores when given to able students and low scores when given to weak students.

Whether an item is good or bad can be seen from how well it distinguishes more able from less able students (Shermis & Di Vesta, 2010: 282). Some questions, namely items number 9, 10, 11, 17, 19, 34, 40, 46, 47, and 49, show a negative distinguishing feature index. A negative index is bad because it indicates that the item does not discriminate in the intended direction (Chatterji, 2003: 385-386).

The analysis of difficulty level showed that 12% of the items are easy, 54% are medium, and 34% are difficult. Examples of items in the easy, medium, and difficult categories can be seen in the figures below.

 
Fig. 4. Example of easy category question

 
Fig. 5. Example of medium category question 

 
Fig. 6. Example of difficult category question 

 
An item is categorized as good if it is neither too difficult nor too easy (Sudijono, 2011: 370). Questions that are too easy do not stimulate students to try to solve them, and questions that are too difficult discourage students and leave them with no enthusiasm to try, because the questions are beyond their reach (Arikunto, 2013: 222). Besides meeting validity and reliability requirements, good-quality questions are assumed to be balanced in difficulty, that is, to include easy, medium, and difficult questions in proportion (Mansyur, Harun Rasyid and Suratno, 2015: 180). A test is therefore expected to have a proportional distribution of easy, medium, and difficult questions (Rasyid & Mansyur, 2008: 239).

The results of the analysis show that items number 8, 18, 24, 47, 49, and 50 are in the easy category, with coefficients >0.7; items number 1, 2, 5, 6, 7, 9, 10, 12, 14, 16, 19, 21, 22, 26, 27, 28, 29, 30, 32, 34, 35, 36, 37, 39, 42, 44, and 48 are in the medium category, with coefficients between 0.3 and 0.7; and items number 3, 4, 11, 13, 15, 17, 20, 23, 25, 31, 33, 38, 40, 41, 43, 45, and 46 are in the difficult category, with coefficients <0.3. These data show that the numbers of easy, medium, and difficult questions do not meet the proportionality requirement: easy questions make up only 12% (6 questions), medium questions 54% (27 questions), and difficult questions 34% (17 questions). To meet the requirement of a proportional difficulty level, the considerations in determining the numbers of easy, medium, and difficult questions are: (1) a balanced number of questions across the three categories; (2) proportions based on the normal curve, in which most of the questions are in the medium category and the easy and difficult items are in proportion (Mansyur, Harun Rasyid and Suratno, 2015: 181).
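The proportionality check is simple arithmetic on the counts from Table 2. The sketch compares the observed percentages with a target split; the 25/50/25 target below is a hypothetical normal-curve-style proportion chosen for illustration, not a figure from this study or its sources.

```python
def percentages(counts):
    """Convert category counts to percentages of the total."""
    total = sum(counts.values())
    return {k: 100 * v / total for k, v in counts.items()}

observed = percentages({"easy": 6, "medium": 27, "difficult": 17})  # Table 2
target = {"easy": 25, "medium": 50, "difficult": 25}  # hypothetical target split

gaps = {k: observed[k] - target[k] for k in target}
print(observed)  # {'easy': 12.0, 'medium': 54.0, 'difficult': 34.0}
print(gaps)      # {'easy': -13.0, 'medium': 4.0, 'difficult': 9.0}
```

Under this assumed target the test has too few easy items and too many difficult ones, consistent with the finding that the distribution is not proportional.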

Of the 50 items analyzed, item number 25 was the most difficult, with a difficulty index of 0.020. An item's difficulty index can be low (that is, the item is very hard) if the learning target is poorly presented, for example, if the writing format is unclear, the wording of the item is confusing or ambiguous, or students do not yet understand the competencies or content addressed in the item (Chatterji, 2003: 385). Items that are too difficult indicate a failure in item construction (Shermis & Di Vesta, 2011: 299). For the effectiveness of distractors, the analysis shows that all items have good distractors; in other words, every distractor was chosen by more than 5% of the test takers.

Distractor effectiveness refers to how well the distractors attract less able students to choose them as answers. Multiple-choice questions must have effective distractors, meaning that the correct answer should not be too easy to identify and the responses should reveal students' real ability: who has the knowledge, who lacks it, and who is confused about the material (Chatterji, 2003: 386). A distractor functions well if it strongly attracts test takers who do not understand the concept or have not mastered the material (Arikunto, 2015: 234). A distractor can be said to function well if it is selected by at least 5% of the test takers, with more of those choices coming from the lower group (Daryanto, 2010: 193; Sudijono, 2005: 411). Another view holds that in good items the distractors are chosen roughly equally by the test takers, whereas in poor items they are chosen unequally (Arifin, 2013: 279).

Based on the item analysis with the AnBuso version 8.0 program of the distinguishing feature, the difficulty level, and the effectiveness of distractors, the quality of the questions is shown in the figure below:

[Chart: Good 36%, Good Enough 14%, Not Good 50%; one further slice labeled 0%]

Fig. 7. Percentage of Question Quality

 
The figure shows that 36% of the items are categorized as good, namely items number 1, 2, 5, 6, 7, 9, 10, 12, 14, 19, 21, 28, 29, 32, 34, 35, 42, and 48. Items number 8, 13, 18, 24, 40, 47, and 49 are good enough. The rest, items number 3, 4, 11, 15, 16, 17, 20, 22, 23, 25, 26, 27, 30, 31, 33, 36, 37, 38, 39, 41, 43, 44, 45, 46, and 50, are poor. Poor-quality questions must be discarded or replaced with new questions.

 
CONCLUSION 

The results of this study indicate that, for the distinguishing feature index, 32% of the items are in the good category, 18% in the moderate category, and 50% in the poor category. For the level of difficulty, 12% are easy, 54% medium, and 34% difficult. For the effectiveness of distractors, the analysis shows that all items have good distractors; in other words, every distractor was chosen by more than 5% of the test takers. Combining the distinguishing feature, the level of difficulty, and the effectiveness of distractors, 36% of the items are good, 14% are good enough, and 50% are poor.

 
REFERENCES

Anderson, L.W. 2003. Classroom Assessment: Enhancing the Quality of Teacher Decision Making. New Jersey: Lawrence Erlbaum Associates, Inc.

Arifin, Zainal. 2013. Evaluasi Pembelajaran. Bandung: Remaja Rosdakarya.

Arikunto, Suharsimi. 2013. Dasar-Dasar Evaluasi Pendidikan (2nd ed.). Jakarta: Bumi Aksara.

Azwar, Syaifuddin. 2003. Tes Prestasi. Yogyakarta: Pustaka Pelajar.

Bers, T.H. 2008. "The Role of Institutional Assessment in Assessing Student Learning Outcomes". New Directions for Higher Education, 141, 31-39.

Chatterji, Madhabi. 2003. Designing and Using Tools for Educational Assessment. USA: Pearson Education, Inc.

Daryanto. 2012. Evaluasi Pendidikan. Jakarta: Rineka Cipta.


Kubiszyn, T. & Borich, G.D. 2013. Educational Testing and Measurement: Classroom Application and Practice (10th ed.). Hoboken, NJ: John Wiley & Sons, Inc.

Mardapi, Djemari. 2012. Pengukuran, Penilaian dan Evaluasi Pendidikan. Yogyakarta: Nuha Medika.

________. 2008. Teknik Penyusunan Instrumen Tes dan Nontes. Yogyakarta: Mitra Cendikia Press.

Muhson, Ali, Berkah Lestari, Supriyanto & Kiromin Baroroh. 2015. "Kelayakan AnBuso sebagai Software Analisis Butir Soal Bagi Guru". Jurnal Kependidikan, Fakultas Ekonomi Universitas Negeri Yogyakarta, Vol. 45, No. 2, pp. 198-210.

Muhson, Ali. 2015. Panduan Penggunaan AnBuso. Yogyakarta: Universitas Negeri Yogyakarta.

Nunnally, J.C. & Bernstein, I.H. 1994. Psychometric Theory (3rd ed.). New York: McGraw-Hill, Inc.

Popham, W. James & Baker, Eva L. 2008. Teknik Mengajar Secara Sistematis (translated by Amirul Hadi et al.). Jakarta: Rineka Cipta.

Purnomo, A. 2007. "Kemampuan Guru dalam Merancang Tes Berbentuk Pilihan Ganda pada Mata Pelajaran IPS untuk Ujian Akhir Semester (UAS)". Lembaran Ilmu Kependidikan, Vol. 36, No. 1, June 2007.

Rasyid, Harun & Mansyur. 2008. Penilaian Hasil Belajar. Bandung: CV. Wacana Prima.

Raymond, J.E., Homer, C.S.E., Smith, R., & Gray, J.E. 2012. "Learning through Authentic Assessment: An Evaluation of a New Development in the Undergraduate Midwifery Curriculum". Nurse Education in Practice, 30, 1-6.

Russell, M.K., & Airasian, P.W. 2012. Classroom Assessment: Concepts and Applications (7th ed.). New York: McGraw-Hill.

Shermis, Mark D. & Di Vesta, Francis J. 2011. Classroom Assessment in Action. USA: Rowman & Littlefield Publishers, Inc.

Sudijono, Anas. 2012. Pengantar Evaluasi Pendidikan. Jakarta: RajaGrafindo Persada.

________. 2011. Pengantar Evaluasi Pendidikan. Jakarta: RajaGrafindo Persada.

Sudjana, Nana. 2011. Penilaian Hasil Proses Belajar Mengajar. Bandung: PT. Remaja Rosdakarya.

Supardi. 2015. Penilaian Autentik: Pembelajaran Afektif, Kognitif, dan Psikomotor (Konsep dan Aplikasi). Jakarta: PT. RajaGrafindo Persada.

Sukiman. 2012. Pengembangan Sistem Evaluasi. Yogyakarta: Insan Madani.

Weeden, P., Winter, J., & Broadfoot, P. 2002. Assessment: What's in It for Schools? London and New York: RoutledgeFalmer.

Wening, Sri. 2012. Materi Evaluasi Pembelajaran. Yogyakarta: Fakultas Teknik Universitas Negeri Yogyakarta.

Widoyoko, E.P. 2014. Evaluasi Program Pembelajaran Panduan Praktis bagi Pendidik dan Calon Pendidik. Yogyakarta: Pustaka Pelajar.