Copyright © 2021, REiD (Research and Evaluation in Education), 7(2), 2021 
ISSN: 2460-6995 (Online) 

REID (Research and Evaluation in Education), 7(2), 2021, 145-155 

Available online at: http://journal.uny.ac.id/index.php/reid 
 

Developing assessment instruments of debate practice in Indonesian 
Language learning 

 
Septiana Farida*; Farida Agus Setiawati 
Universitas Negeri Yogyakarta, Indonesia 
*Corresponding Author. E-mail: sfarida2590@gmail.com 

 
INTRODUCTION 

Speaking is the second language skill that humans acquire, before reading and writing, and it is the practical oral communication that every individual carries out in the social environment (Simarmata & Sulastri, 2018). So far, speaking skills as part of communication have often been ignored and not taken seriously, so many students are unable or afraid to speak (Isnaniar, 2013; Morelent, 2012). Yet speaking skills play a significant role in producing a generation that is intelligent, critical, creative, and cultured (Isnaniar, 2013). In practice, speaking skills involve more complex aspects (Sari et al., 2016) and support other language skills (Simarmata & Sulastri, 2018).

In rhetoric, the art of speaking takes the form of either dialogue or monologue. Dialogue is a speaking activity in which two or more people take part in a conversation (Midun, 2017, p. 14); its forms are debate, discussion, question and answer, negotiation, and conversation. Monologue involves only one person speaking, as in speeches, lectures, declamations, and remarks. Each form of speaking practice needs to be studied carefully, and its components considered, whenever speaking is evaluated in Indonesian language learning in schools.

ARTICLE INFO

Article History
Submitted: 22 August 2021
Revised: 19 November 2021
Accepted: 8 December 2021

Keywords
assessment; instrument; debate; practice; Indonesian language

ABSTRACT

This study aims to develop an instrument for assessing debate practice in Indonesian language learning for Class X of senior high school (Sekolah Menengah Atas or SMA/Madrasah Aliyah or MA). The theoretical construct of the instrument was derived from a review of several theories, including theories of speaking skills that apply to debate practice, especially those based on the Australian Debating Federation. The instrument was developed following the Mardapi model for non-cognitive (non-test) instruments. Ten material experts (two lecturers and eight Class X Indonesian language teachers from SMA/MA in Yogyakarta Special Region) reviewed the draft instrument, and their ratings were analysed with the Aiken formula to establish content validity. Two raters then used the draft instrument to assess debate practice, and the results of this trial were used to calculate inter-rater reliability with Cohen's Kappa. The instrument was declared reliable, with an inter-rater Kappa value of 0.678. After exploratory factor analysis, the final instrument contains 33 items, with adjustments to how the statement items are distributed across the dimensions.

This is an open access article under the CC-BY-SA license.  

How to cite: 
Farida, S., & Setiawati, F. (2021). Developing assessment instruments of debate practice in Indonesian Language learning. REID (Research and Evaluation in Education), 7(2), 145-155. doi:https://doi.org/10.21831/reid.v7i2.43338

https://creativecommons.org/licenses/by-sa/4.0/

Speaking is not merely uttering words without meaning; it requires technique, clear thoughts, and content (Midun, 2017, p. 14). The technical components in question are breathing, voice building, reading, and storytelling techniques. Clear thoughts and content, in turn, determine the weight of the substance conveyed when speaking, namely whether it shows strong creativity and imagination or knowledge and objective evidence (Midun, 2017, p. 2014). Therefore, learning speaking skills occupies an essential place (Isnaniar, 2013), and at the end of the learning a form of practical evaluation is required to observe all these components.

In principle, the different language skills are evaluated differently in schools. Reading and writing skills are used in non-face-to-face communication, and they are evaluated in writing through cognitive evaluation of the learning. Educators carry out and prioritise cognitive evaluation over affective or psychomotor evaluation. Cognitive evaluation is used as a benchmark for assessment and holds the principal place (Poerwanti et al., 2008, p. 23). This emphasis is also evident from the national exam grid, which focuses on evaluating cognitive aspects at the elementary, junior high, and high school levels. The Indonesian national exam indicators describe only the evaluation of literary and non-literary reading and writing competencies and of spelling editing (Badan Standar Nasional Pendidikan, 2018).

Listening and speaking skills are evaluated through practice while teaching is in progress (Isnaniar, 2013; Nurgiyantoro, 2001, p. 7). In reality, however, these skills are often forced into cognitive assessment through theoretical questions, and when they are evaluated through practice, this is done without specific instrument guidelines. Speaking, as a basic form of oral communication, is often considered an easy competency both to perform and to assess (Isnaniar, 2013; Morelent, 2012). In fact, evaluating speaking skills is not easy (Sari et al., 2016, p. 2). It relies on non-test instruments, such as observation sheets, questionnaires, or assessment rubrics, and it requires accuracy in the evaluation process.

Based on observations and unstructured interviews with Indonesian language teachers at MAN 3 Sleman, MAN 2 Kulonprogo, SMA N 6 Yogyakarta, and SMA N 9 Yogyakarta, students' speaking skills were assessed at a glance, without special instruments. "At a glance" here means that the assessment was guided by general aspects, namely intonation, expression, gesture, and mastery of the material, without elaborating the indicators of each aspect. These general aspects were applied in every assessment of speaking practice, whether speech, sermon, negotiation, declamation, debate, or drama. Yet each student has different strengths in each component and deserves different scores or weights. In addition, Brown (2015, p. 51) reported that fifteen of the sixteen students in the study commented that the use of debate in the classroom could improve collaborative skills or critical thinking skills during learning.

Debate is one of the arts of speaking in the form of dialogue learned in class X SMA/MA, and it does not yet have a structured assessment instrument. Assessment instruments are therefore needed that are made specifically for each form of speaking practice. Debate is a very complex speaking competency. In addition to involving a whole team, the flow of a debate requires adequate competence in speaking strategies. Speakers must not only master the motion given so that they can speak frankly and politely, but must also be able to convince, to be critical, and at the same time to refute the opponent's opinion (Salim, 2015, p. 100). Debate also requires an attitude that is reflective, neutral, and critical in examining the arguments or evidence used. The debater must assess the problem with solid analysis, not just rely on the opponent's interpretation (O’Connor et al., 2018, pp. 90–91). Thus, speaking skills in the practice of debate include various components, and these components become the basis for assessing the competence of each student when practising debate. Accordingly, it is necessary to develop an instrument for assessing debating practice in the form of a non-test instrument, namely an observation sheet (Ghorbani et al., 2018).


The research by Viswesh et al. (2018) aimed to evaluate students' ability to make evidence-based decisions and presentations through debate activities. The results of that study indicate the readiness of team performance and students' skills as perceived through debate. The process observed in that study is similar to the procedure followed in this development research, namely the various components and indicators that become the points of assessment in the debate, including the preparation of the materials and the methods of arguing used to succeed in debating.

In addition, based on relevant theory and research results, it can be assumed that the instrument for debating practice ability consists of three factors: Matter, Method, and Manner. The problem is how to develop an instrument that can be used to assess the practice of debate in a valid and reliable manner. The instrument is an observation sheet to be filled out by the teacher when evaluating students' debating practice in class, and it is developed based on the theory of speaking that applies to the course of a debate. It is intended for assessing KD 4.12 of Indonesian language learning in the even semester of class X SMA/MA, which reads: constructing problems/issues, points of view and arguments of several parties, and conclusions from debates orally to show the essence of the debate.

The result of this study is an instrument for assessing the practice of debate that has good content validity and construct validity and good inter-rater (Cohen's Kappa) reliability. The result of the exploratory factor analysis (EFA) shows that this debate instrument consists of three factors: Matter, Method, and Manner. The Matter factor consists of items 1 to 12, the Method factor of items 13 to 24, and the Manner factor of items 25 to 38. These three factors can explain the variance of debate practice by 100%.

METHOD 

This study is development research on a debate practice instrument, using Djemari Mardapi's non-cognitive instrument development model (Mardapi, 2017). A review of various sources shows that the assessment of debate practice in debate contests around the world refers to three dimensions: matter, method, and manner (D’Cruz, 2003; Latif, n.d.; Quinn, 2005). The three dimensions are judged by the adjudicators, or debate experts (jurors). Each dimension has components that relate to general speaking skills. This speaking skill material is used as the assessment indicators, and the instrument statements are formulated from the components contained in each dimension of the debate.

The compiled instrument was tested twice, in a limited trial and a field trial. The trials were preceded by an expert judgment validation process involving ten material experts in Indonesian language learning. After the validity values were obtained, the product was revised and put through the limited trial (Murti, 2011, p. 20). The limited trial was conducted on 24 students from four schools in Yogyakarta Special Region and involved two assessors, namely the Indonesian language teacher from each school and the researcher. The limited trial yielded four inter-rater reliability values calculated with the Kappa formula. The average of the four values was then taken as the final inter-rater reliability value.

Furthermore, the same instrument was used in a field trial on 246 class X students from four schools in Yogyakarta Special Region. The field trial results were analysed with exploratory factor analysis (EFA) to obtain construct validity. The product then underwent a second revision based on the outcomes of the factor analysis and suggestions for modification, resulting in the final instrument. The test subjects were class X students: 24 in the limited trial and 246 in the field trial. They were selected by purposive sampling, namely class X students who practised debating. Subject selection was assisted by Indonesian language teachers from four schools: MAN 3 Sleman, SMA N 6 Yogyakarta, SMA N 9 Yogyakarta, and MAN 2 Kulonprogo. The field trial subjects were 66 students of SMA N 6 Yogyakarta, 54 students of SMA N 9 Yogyakarta, 72 students of MAN 3 Sleman, and 54 students of MAN 2 Kulonprogo.


FINDINGS AND DISCUSSION 

The product of this development research is an instrument sheet for assessing the practice of debate in Indonesian language learning for class X SMA/MA. It was developed for Indonesian language teachers to use when evaluating students' debating practice. The instrument sheet is a checklist observation sheet containing statements about the points to be observed when students debate.

In the initial development, the instrument specifications were drawn up, which included writing the statement items and entering them into the instrument grid. There were 44 statement items in total. The grid was developed from three dimensions of debating practice assessment: matter, method, and manner. These dimensions were drawn from various literature on the evaluation of debate practice, including that of international debate associations. Each dimension was then divided into components, which yield more specific indicators that are written as operational statement items.

The matter (material) dimension consists of three components: motions, arguments, and facts supporting the statements. The method dimension is also divided into three components: delivering ideas, submitting objections, and providing responses. The manner (attitude) dimension is divided into the components of expression, appearance, and vocals. Each component is further divided into more specific indicators, which are written as statement items. The naming of the components in each dimension and the indicators for each component may still change depending on the results of the exploratory factor analysis of the field trial data.

The indicators derived from each component are written in more detail as statement items. The statements are listed on the instrument sheet in the order of the components (Chai et al., 2019). The product is a checklist observation sheet with dichotomous scoring. The statement items are observed in students as they carry out debate practice. If a statement is found or observed, a check is placed in the "Yes" column of the instrument sheet.

If the statement is not found or observed, a check is placed in the "No" column. The checks also determine the student's score, because each statement rated "Yes" scores 1 and each rated "No" scores 0. The total score over the observed items is then mapped to a category of debating practice ability, which gives the predicate for the student's level of proficiency in debating practice.

Content Validity 

Expert judgment validation is carried out before the product is used for testing. The validators were two Indonesian Language and Literature Education lecturers and eight class X Indonesian language teachers. Expert validation aims to obtain content validity values, calculated using the Aiken's V index formula.

In the first-stage revision, six items were dropped. These items did not pass the validity test because, according to Aiken's table for the given number of raters, the item validity coefficient must exceed 0.73. The value 0.73 is the reference value for a 4-point scale instrument with 10 raters at an error rate of 5%. The six items are detailed as follows.

Table 1. Details of Dropped Items in the First Revision

Item Statement Number | Aiken's Validity Value | Indicator
4 | 0.667 | The substance of the motion
13 | 0.700 | Substance Facts
17 | 0.333 | Argument Statement
21 | 0.600 | Parry against Opponents
29 | 0.700 | Eye contact
37 | 0.633 | Costume

Information: Safe to drop; other items represent the indicators.
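
As a rough illustration of the screening described above, the following Python sketch computes Aiken's V for a single item from ten expert ratings and compares it with the 0.73 criterion. The function name `aikens_v`, the assumption that the 4-point scale runs from 1 to 4, and both rating lists are hypothetical; they are not the study's data.

```python
def aikens_v(ratings, lowest=1, highest=4):
    """Aiken's V for one item: sum(r - lowest) / (n * (highest - lowest))."""
    n = len(ratings)
    s = sum(r - lowest for r in ratings)
    return s / (n * (highest - lowest))


# Criterion taken from the text: 4-point scale, 10 raters, 5% error rate.
CRITERION = 0.73

# Hypothetical ratings from ten experts (assumed 1-4 scale).
weak_item = [3, 3, 3, 3, 3, 3, 3, 3, 3, 3]
strong_item = [4, 4, 4, 4, 4, 4, 3, 3, 3, 3]

for name, ratings in [("weak item", weak_item), ("strong item", strong_item)]:
    v = aikens_v(ratings)
    verdict = "keep" if v > CRITERION else "drop"
    print(f"{name}: V = {v:.3f} -> {verdict}")
```

With ten raters and a 4-point scale, V can only take values in steps of 1/30, which is consistent with the coefficients listed in Table 1 (for example, 0.667 = 20/30).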


Construct Validity 

The developed instrument, which had been declared content valid and revised, was first used in the limited trial, where Kappa's inter-rater reliability calculations showed it to be satisfactory. The instrument was then tested on 246 students from four different schools. In this trial, each student debated in their role as a member of the pro or con group and was assessed directly with the revised assessment instrument. The students' per-item scores, in dichotomous 1-0 form, were recapitulated and analysed using the SPSS program.

Results of Factor Analysis Items 1 to 12 

Items 1 to 12 represent the matter dimension, which consists of the components of motion, argument, and facts of the argument. The items are arranged in order by component. The motion component is divided into two indicators: the formulation of the motion, with two statements, and the substance of the motion, with one statement. The argument component is divided into the essence of the argument, with two item statements, and each speaker's argument, with three item statements. The facts component also has two indicators: the identity of the facts, with three statements, and the substance of the facts, with one statement. The results of the KMO and Bartlett's Test and the total variance explained are shown in Figure 1 and Figure 2.

 
Figure 1. KMO for Items 1-12  

 
Figure 2. Total Variance Explained for Items 1-12 

 
Figure 3. Scree Plot Display of the Matter Dimension


Table 2. Naming of the EFA Result Factors for the Matter Dimension

Component | Statement Item Numbers | Percentage of Variance | Factor Naming
1 | 9, 10, 11, 12 | 33.879 | Fact of Argument
2 | 4, 5 | 19.435 | Argument Statement
3 | 3, 6, 7 | 17.586 | Contents of Speaker's Argument
4 | 2 | 10.970 | Introduction to Arguments

 
From Figure 3, it is known that four factors have an eigenvalue > 1. Items were then grouped using the values of the rotated component matrix: a loading with a magnitude > 0.5 indicates that the item tends to belong to that component. The four factors formed group the statement items and lead to the factor names presented in Table 2.
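
The eigenvalue > 1 rule used above can be sketched in Python with NumPy. This is only an illustration under stated assumptions: the study ran its analysis in SPSS, the function name `kaiser_summary` is hypothetical, the data are synthetic 0/1 scores generated here (not the field trial data), and Pearson correlations of the dichotomous scores are assumed as the input matrix.

```python
import numpy as np


def kaiser_summary(scores):
    """Eigenvalues of the item correlation matrix (descending), the
    percentage of variance each explains, and the number with eigenvalue > 1."""
    corr = np.corrcoef(scores, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # eigvalsh returns ascending order
    percent_variance = 100 * eigenvalues / eigenvalues.sum()
    n_factors = int((eigenvalues > 1).sum())
    return eigenvalues, percent_variance, n_factors


# Synthetic data: 246 hypothetical students, 6 dichotomous items.
# Items 1-3 share one latent trait and items 4-6 share another.
rng = np.random.default_rng(0)
latent = rng.normal(size=(246, 2))
signal = np.repeat(latent, 3, axis=1)  # columns: trait0 x3, then trait1 x3
noise = rng.normal(size=(246, 6))
scores = (signal + noise > 0).astype(int)

eigenvalues, percent_variance, n_factors = kaiser_summary(scores)
print("eigenvalues:", np.round(eigenvalues, 3))
print("% variance: ", np.round(percent_variance, 3))
print("factors with eigenvalue > 1:", n_factors)
```

Because the eigenvalues of a correlation matrix sum to the number of items, the percentage of variance for a factor is simply its eigenvalue divided by the item count, which is how a "Percentage of Variance" column like the one in Table 2 is typically derived before rotation.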

The analysis results show that the components developed in the instrument largely correspond to the reality in the field, although some improvements are needed. Item 1 can be dropped because, apart from being represented by item 2, the "Motion Formulation" in debate practice has generally been formulated before the debate. Item 1 therefore does not need to be observed during debate practice because it is automatically present. Item 8 can be dropped because all speakers refer to the same concept of argument, so their arguments cannot be separated, and each speaker strengthens the other speakers' arguments. Therefore, arguments need not be explicitly restricted to individual speakers.

Results of Factor Analysis Items 13-24 

Items 13-24 represent, in order, the method dimension, which consists of the components of how to convey arguments, how to submit rebuttals, and how to submit responses. The component of conveying arguments has two indicators: the argument statement, with four item statements, and the speaker's opinion, with two item statements. The component of submitting rebuttals is divided into two item statements on parrying the opponent and two item statements on identifying reasons. The component of delivering responses is divided into objective responses, with two item statements, and response structure, with two item statements.

In the factor analysis of the 12 items of the method dimension, two statements had the smallest anti-image correlation values, namely item 16 at 0.314 and item 13 at 0.473. These two items were dropped, and the other 10 items were re-analysed in the same way.
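
The anti-image correlation values mentioned above correspond to what is usually called the per-item measure of sampling adequacy (MSA), which SPSS reports on the diagonal of its anti-image correlation matrix. The sketch below shows one way to compute the overall KMO and the per-item MSA with NumPy. The function name `kmo_and_msa` and the synthetic data are assumptions for illustration, and the 0.5 cutoff in the final line is a common convention rather than something the text states.

```python
import numpy as np


def kmo_and_msa(scores):
    """Overall KMO and per-item MSA from the correlation matrix and the
    partial correlations derived from its inverse."""
    corr = np.corrcoef(scores, rowvar=False)
    inv = np.linalg.inv(corr)
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale  # partial correlations (off-diagonal entries)
    r2 = corr ** 2
    p2 = partial ** 2
    np.fill_diagonal(r2, 0.0)  # only off-diagonal terms enter the sums
    np.fill_diagonal(p2, 0.0)
    msa = r2.sum(axis=0) / (r2.sum(axis=0) + p2.sum(axis=0))
    kmo = r2.sum() / (r2.sum() + p2.sum())
    return kmo, msa


# Synthetic data: 246 hypothetical students, 6 dichotomous items.
# Items 1-5 share a latent trait; item 6 is pure noise.
rng = np.random.default_rng(1)
trait = rng.normal(size=(246, 1))
noise = rng.normal(size=(246, 6))
weights = np.array([1.0, 1.0, 1.0, 1.0, 1.0, 0.0])
scores = (trait * weights + noise > 0).astype(int)

kmo, msa = kmo_and_msa(scores)
print("overall KMO:", round(float(kmo), 3))
print("per-item MSA:", np.round(msa, 3))
print("items below 0.5 (assumed cutoff):", [i + 1 for i, m in enumerate(msa) if m < 0.5])
```

An item whose correlations with the others are weak relative to its partial correlations gets a low MSA, which is the situation that leads to dropping an item before re-running the analysis.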

 
Figure 4. KMO for Items 13-24 

 
Figure 5. Total Variance Explained for Items 13-24 


 
Figure 6. Scree Plot Display of the Method Dimension

Table 3. Naming of the EFA Result Factors for the Method Dimension

Component | Statement Item Numbers | Percentage of Variance | Factor Naming
1 | 18, 19, 20, 21, 22, 23, 24 | 49.769 | How to Respond to Arguments
2 | 14, 15 | 12.412 | How to Present an Argument
3 | 17 | 10.974 | Arguing Rules

 
The KMO value generated after the second stage of analysis was 0.841. The results of the KMO and Bartlett's Test and the total variance explained are shown in Figure 4 and Figure 5. The scree plot in Figure 6 shows that three factors are formed, as indicated by an eigenvalue > 1. Items were grouped using the rotated component matrix values: a loading with a magnitude > 0.5 places the item in that component. The three factors formed group the statement items and lead to the factor names shown in Table 3.
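
The grouping rule described above (assign an item to the component on which its rotated loading exceeds 0.5 in magnitude) can be sketched as follows. The loading values are hypothetical numbers chosen for illustration, not the study's rotated component matrix, and the helper name `group_items` is likewise an assumption.

```python
def group_items(loadings, item_numbers, cutoff=0.5):
    """Assign each item to the component with its largest absolute loading,
    provided that loading exceeds the cutoff; other items stay unassigned."""
    groups = {}
    for item, row in zip(item_numbers, loadings):
        best = max(range(len(row)), key=lambda k: abs(row[k]))
        if abs(row[best]) > cutoff:
            groups.setdefault(best + 1, []).append(item)  # components numbered from 1
    return groups


# Hypothetical rotated loadings for four items on three components.
item_numbers = [14, 15, 17, 18]
loadings = [
    [0.12, 0.81, 0.05],
    [0.20, 0.77, 0.10],
    [0.08, 0.15, 0.88],
    [0.74, 0.11, 0.02],
]

groups = group_items(loadings, item_numbers)
for component in sorted(groups):
    print(f"component {component}: items {groups[component]}")
```

An item whose loadings all fall at or below the cutoff is left out of every group, which mirrors the case where an item is a candidate for removal.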

The grouping of indicators under each component of the method dimension, as originally developed, follows the factors formed in the factor analysis. The dominant factor of this dimension is how to convey responses, which in the development concept was divided into two indicators, namely responses and rebuttals. In the factor analysis results, however, these two indicators group on one factor. Since a rebuttal is also a form of response, the factor is named "How to Respond to Arguments". The remaining items, 14, 15, and 17, occupy the same components as in the initial grid.

Results of Factor Analysis Items 25-38 

Items 25-38 contain, in order, the statements of the manner dimension, which is composed of the expression, appearance, and vocal components. The expression component is described by the indicators of eye contact, with one item statement, gestures, with two item statements, and facial expressions, with two item statements. The appearance component is detailed by one statement each for the indicators of standing and costume. The vocal component consists of the voice and speed indicators, with two item statements, and the pitch and pronunciation clarity indicators, which are also detailed in two item statements, respectively.

 
Figure 7. KMO for Items 25-38 


 
Figure 8. Total Variance Explained for Items 25-38 

 
Figure 9. Scree Plot Display of Manner Dimensions 

Table 4. Naming of the EFA Result Factors for the Manner Dimension

Component | Statement Item Numbers | Percentage of Variance | Factor Naming
1 | 32, 33, 34, 35, 36, 37 | 21.875 | Vocal
2 | 25, 26, 27 | 18.320 | Appearance
3 | 28, 29, 30 | 11.513 | Expression
4 | 31 | 8.999 | Costume

 
In the factor analysis of the 14 items of the Manner dimension (items 25-38), one statement had a rotated component matrix value < 0.3, namely item 38 at 0.091. This item was therefore dropped, and the other 13 items were re-analysed with the same steps. The repeated factor analysis of items 25 to 37 resulted in a KMO value of 0.719. The results of the KMO and Bartlett's Test and the total variance explained are shown in Figure 7 and Figure 8.

From Figure 9, it is known that four factors are formed, namely the four points with an eigenvalue > 1. The grouping of items into the four components can be seen in the classification and naming of factors in Table 4.

Items 32-37 were originally assigned to separate indicators within the vocal component, but the factor analysis showed that all of these items form a unity. Therefore, items 32-37 form a single component, namely vocal. Items 25 and 26-27 were different indicators belonging to the same component, and after the factor analysis they can be categorised into the appearance component. Furthermore, item 30 in the initial development belonged to a different component from items 28-29. However, the factor analysis tends to group the three items, so they can be categorised as the expression component. Item 31 occupies the costume factor on its own because it is the statement that describes the costume.

Reliability 

The instrument reliability value was obtained by averaging the Kappa inter-rater reliability values calculated on 24 subjects from four different schools, namely MAN 3 Sleman, SMA N 9 Yogyakarta, SMA N 6 Yogyakarta, and MAN 2 Kulonprogo. The assessment results of the two assessors at each of the four schools were first categorised and then analysed for their Kappa values with the help of the SPSS program. The scores were categorised into five groups following the categorisation guide of Azwar (2012, p. 140), namely by first calculating the minimum value, maximum value, range, mean, and standard deviation.

The instrument is an observation checklist with 38 statements, each scored on whether or not the behaviour is observed, so the minimum value is 0 and the maximum value is 38. The range between the maximum and minimum values is 38. The mean is the sum of the maximum and minimum values divided by two, which is 19. The standard deviation is the range divided by six, which is 6.3. Based on these benchmarks, the categorisation shown in Table 5 is obtained.

Table 5. Categorization of Students' Debate Practice Ability

Predicate | Value Range | Nominal Score
Very Low | X ≤ 9.55 | 1
Low | 9.55 < X ≤ 15.85 | 2
Medium | 15.85 < X ≤ 22.15 | 3
High | 22.15 < X ≤ 28.45 | 4
Very High | X > 28.45 | 5
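
The boundaries in Table 5 can be reproduced with a short Python sketch. The text gives the mean (19) and the standard deviation (6.3) but does not state the multipliers; the offsets of 0.5 and 1.5 standard deviations used here are inferred from the table's cut points, and the function names are hypothetical.

```python
PREDICATES = ["Very Low", "Low", "Medium", "High", "Very High"]


def cut_points(n_items=38):
    """Category boundaries from the hypothetical mean and standard deviation."""
    minimum, maximum = 0, n_items
    mean = (maximum + minimum) / 2           # 19 for 38 items
    sd = round((maximum - minimum) / 6, 1)   # 6.3, rounded as in the text
    # Offsets of 1.5 and 0.5 SD are inferred from Table 5, not stated in the text.
    return [mean - 1.5 * sd, mean - 0.5 * sd, mean + 0.5 * sd, mean + 1.5 * sd]


def categorize(total, n_items=38):
    """Map a total checklist score to its predicate and nominal score (1-5)."""
    for nominal, cut in enumerate(cut_points(n_items), start=1):
        if total <= cut:
            return PREDICATES[nominal - 1], nominal
    return PREDICATES[-1], 5


print([round(c, 2) for c in cut_points()])
for total in (9, 16, 23, 30):
    print(total, categorize(total))
```

Because totals are whole numbers between 0 and 38, a student needs at least 23 "Yes" checks to reach the High category and at least 29 to reach Very High.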

 
The 24 subjects from the four schools obtained nominal category scores of 4 or 5. The Kappa reliability values obtained from the data from MAN 3 Sleman, SMA N 6 Yogyakarta, and SMA N 9 Yogyakarta were each 0.571. In contrast, the Kappa value obtained from the MAN 2 Kulonprogo data is perfect, namely 1. The Kappa reliability value of the developed assessment instrument is the average of these four values, which is 0.678. Based on the categorisation of Kappa reliability values by Fleiss and Cohen (1973), the debate practice assessment instrument falls in the sufficient category because its value lies in the range of 0.61 to 0.75. This is corroborated by Garson (2016, p. 65), who states that a Kappa inter-rater reliability value between 0.6 and 0.79 falls in the substantial (sturdy) category. Therefore, it can be said that the instrument developed is reliable.
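
To make the reliability calculation concrete, the sketch below implements Cohen's Kappa for two raters in plain Python and then averages the four school-level values reported in the text. The study used SPSS; the function name `cohens_kappa` and the two rating lists are hypothetical and serve only to show the formula at work.

```python
from collections import Counter


def cohens_kappa(rater1, rater2):
    """Cohen's Kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    count1, count2 = Counter(rater1), Counter(rater2)
    categories = set(rater1) | set(rater2)
    expected = sum(count1[c] * count2[c] for c in categories) / n ** 2
    if expected == 1:
        # Both raters used one identical category; Kappa is formally undefined.
        # Returning 1.0 is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical nominal category scores (4 = High, 5 = Very High) from two raters.
teacher = [4, 4, 4, 5, 5, 5]
researcher = [4, 4, 5, 5, 5, 5]
print("example Kappa:", round(cohens_kappa(teacher, researcher), 3))

# The four school-level Kappa values reported in the text.
school_kappas = [0.571, 0.571, 0.571, 1.0]
average_kappa = sum(school_kappas) / len(school_kappas)
print("average Kappa:", round(average_kappa, 3))
```

In the hypothetical example the raters agree on five of six students (observed agreement 0.833) against a chance agreement of 0.5, giving a Kappa of about 0.667. Averaging the four reported values reproduces the 0.678 stated in the text.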

CONCLUSION 

The research and development carried out have produced an instrument for assessing the practice of debate in Indonesian language learning for students of class X SMA/MA that has been tested for validity and reliability. The theoretical construct of the instrument was derived by examining several theories of speaking skills in debate practice. The non-test instrument was then developed following the Mardapi model for non-cognitive instruments. The draft instrument was assessed by 10 experts: two lecturers in dialectical speaking skills at the Indonesian Language and Literature Education Study Program and eight class X Indonesian language teachers from SMA/MA in Yogyakarta Special Region. Their assessments were analysed using the Aiken formula to establish the content validity of the instrument. The initial instrument consisted of 44 items. After the field trials and the factor analysis, a final instrument of 33 items was produced.



The research results are as follows. First, the debate practice assessment instrument has been tested and has adequate content validity. In the content validity test with the Aiken's V index formula, items were classified as valid against the criterion value of 0.73. The first product revision was carried out after this calculation: six statements were dropped, so the instrument was reduced from 44 to 38 items. Second, the product's reliability value, 0.678, falls in the reliable category. It was calculated from inter-rater reliability using the Kappa coefficient, as the average of four inter-rater reliability values obtained from a limited trial of class X students from four different schools. Third, after the construct validity test with exploratory factor analysis (EFA), the debate practice assessment instrument has 33 statement items from three assessment dimensions: the Matter dimension with four components and 10 statement items, the Method dimension with three components and 10 statement items, and the Manner dimension with four components and 13 statement items.

ACKNOWLEDGMENT 

The researchers would like to thank the validators, both the Indonesian language lecturers for speaking sub-skills and the Indonesian language teachers, who were willing to review the developed instrument items. Thanks also go to the class X students of the 2018/2019 academic year at MAN 3 Sleman, MAN 2 Kulon Progo, SMA N 6 Yogyakarta, and SMA N 9 Yogyakarta, who served as respondents/research samples for the debate practices carried out.

REFERENCES 

Azwar, S. (2012). Reliabilitas dan validitas (4th ed.). Pustaka Pelajar. 

Badan Standar Nasional Pendidikan. (2018). Kisi-kisi USBN dan UN. Badan Standar Nasional 
Pendidikan. https://bsnp-indonesia.org/2018/11/bsnp-rilis-kisi-kisi-usbn-dan-un-2019/ 

Brown, Z. W. (2015). The use of in-class debates as a teaching strategy in increasing students’ 
critical thinking and collaborative learning skills in higher education. Educationalfutures 
[Online], 7(1). https://educationstudies.org.uk/?p=3685 

Chai, C. S., Hwee Ling Koh, J., & Teo, Y. H. (2019). Enhancing and modeling teachers’ design 
beliefs and efficacy of technological pedagogical content knowledge for 21st century quality 
learning. Journal of Educational Computing Research, 57(2), 360–384. 
https://doi.org/10.1177/0735633117752453 

D'Cruz, R. (2003). The Australia-Asia debating guide (2nd ed.). The Australian Debating Federation. 
https://www.dav.com.au/resources/aadg.php 

Fleiss, J. L., & Cohen, J. (1973). The equivalence of weighted kappa and the intraclass correlation 
coefficient as measures of reliability. Educational and Psychological Measurement, 33(3), 613–619. 
https://doi.org/10.1177/001316447303300309 

Garson, G. D. (2016). Partial least squares: Regression & structural equation models. Statistical 
Publishing Associates. 

Ghorbani, S., Mirshah Jafari, S. E., & Sharifian, F. (2018). Learning to be: Teachers’ competences 
and practical solutions: A step towards sustainable development. Journal of Teacher Education 
for Sustainability, 20(1), 20–45. https://doi.org/10.2478/jtes-2018-0002 

Isnaniar. (2013). Peningkatan kemampuan berbicara siswa kelas XI SMA Negeri 4 Kota Bengkulu tahun 
ajaran 2012-2013 dengan pendekatan komunikatif [Universitas Bengkulu]. 
http://repository.unib.ac.id/id/eprint/8515 


Latif, M. A. (n.d.). A comprehensive guide to debate adjudication. International Islamic University 
Malaysia. http://phased-
uph.weebly.com/uploads/3/2/1/6/32162939/comprehensive_adjudication_guide.pdf 

Mardapi, D. (2017). Pengukuran, penilaian, dan evaluasi pendidikan (2nd ed.). Parama Publishing. 

Midun, H. (2017). Membangun budaya mutu dan unggul di sekolah. Jurnal Pendidikan Dan 
Kebudayaan Missio, 9(1), 50–59. 
http://unikastpaulus.ac.id/jurnal/index.php/jpkm/article/view/117 

Morelent, Y. (2012). Peningkatan kemampuan berbicara siswa melalui kegiatan bercerita berbasis karakter di 
Sekolah Menengah Atas: Studi kuasi eksperimen pada siswa kelas X SMA Banuhampu Kabupaten 
Agam [Sekolah Pascasarjana Universitas Pendidikan Indonesia]. 
http://repository.upi.edu/7716/ 

Murti, B. (2011). Validitas dan reliabilitas pengukuran. In Matrikulasi Program Studi Doktoral, 
Fakultas Kedokteran, UNS, 1-19. https://dokumen.tips/documents/validitas-reliabilitas-
pengukuran-prof-bhisma-murti-55cd8744673e9.html?page=19 

Nurgiyantoro, B. (2001). Penilaian dalam pengajaran bahasa dan sastra. BPFE-UGM. 

O’Connor, A., Carpenter, B., & Coughlan, B. (2018). An exploration of key issues in the debate 
between classic and constructivist grounded theory. Grounded Theory Review, 17(1). 
http://groundedtheoryreview.com/2018/12/27/an-exploration-of-key-issues-in-the-
debate-between-classic-and-constructivist-grounded-theory/ 

Poerwanti, E., Widodo, E., Masduki, Pantiwati, Y., & Departemen Pendidikan Nasional. (2008). 
Asesmen pembelajaran SD. Direktorat Jenderal Pendidikan Tinggi Departemen Pendidikan 
Nasional. 

Quinn, S. (2005). Debating. Simon Quinn. 
https://debate.uvm.edu/dcpdf/quinn_DEBATING.pdf 

Salim, A. (2015). Debate as a learning-teaching method: A survey of literature. TARBIYA: Journal 
of Education in Muslim Society, 2(1), 97–104. https://doi.org/10.15408/tjems.v2i1.1665 

Sari, K. D. I., Wendra, I. W., & Wisudariani, N. M. R. (2016). Pelaksanaan evaluasi pembelajaran 
keterampilan berbicara (bercerita) dengan materi cerpen pada siswa kelas IX D SMP Negeri 
3 Singaraja. Jurnal Pendidikan Bahasa Dan Sastra Indonesia Undiksha, 5(3). 
https://ejournal.undiksha.ac.id/index.php/JJPBS/article/view/8688 

Simarmata, M. Y., & Sulastri, S. (2018). Pengaruh keterampilan berbicara menggunakan metode 
debat dalam mata kuliah Berbicara Dialektik pada mahasiswa IKIP PGRI Pontianak. Jurnal 
Pendidikan Bahasa, 7(1), 49–62. 
https://journal.ikippgriptk.ac.id/index.php/bahasa/article/view/826 

Viswesh, V., Yang, H., & Gupta, V. (2018). Evaluation of a modified debate exercise adapted to 
the pedagogy of team-based learning. American Journal of Pharmaceutical Education, 82(4), 345–
353. https://doi.org/10.5688/ajpe6278