This is an open access article under the CC-BY-SA license. 

 
REiD (Research and Evaluation in Education), 5(1), 2019, 61-74 

Available online at: http://journal.uny.ac.id/index.php/reid 

 
An analysis of Javanese language test characteristic using the 
Rasch model in R program 

 
*1Muchlisin; 2Djemari Mardapi; 3Farida Agus Setiawati 
1,2,3Department of Educational Research and Evaluation, 

Graduate School of Universitas Negeri Yogyakarta 
Jl. Colombo No. 1, Karangmalang, Depok, Sleman, Yogyakarta 55281, Indonesia 

*Corresponding Author. E-mail: muchlisinjanuary@gmail.com 

 
Submitted: 28 February 2019 | Revised: 02 May 2019 | Accepted: 07 May 2019 

 
Abstract 
One skill required to solve problems in the 21st century is communication. Two international languages
that are important in communication and taught at school are English and German. However,
besides international languages, local languages, such as the Javanese language, are also essential and need
to be maintained. The purpose of this study is to analyze the characteristics of a Javanese language test. This
study was explorative research with secondary data collected by documentation of 220 students' responses
to the 50 multiple-choice items of a Javanese language test in the 11th grade of a vocational high school. Data
were analyzed using the Rasch model with the R program. The Rasch model fit the data with 42 items after
three rounds of calibration. Based on difficulty level, ICC, and item reliability, 28 of the 42 items
(66.67%) were good. This study finds that, in general, the Javanese language test is in the moderate
category of difficulty. Hence, evaluating the Javanese language test is crucial for building a better test that
gives more accurate information about examinees' ability. The evaluation of the Javanese
language test can be used to plan subsequent instruction and improve Javanese language learning.

Keywords: Javanese language test, Rasch model, R program 
 
Permalink/DOI: https://doi.org/10.21831/reid.v5i1.23773 

 
Introduction  

In the 21st century, several skills are required. One of these skills is
communication (Dede, 2010, pp. 7–8; Trilling & Fadel, 2009, p. 54; Zubaidah, 2017, p. 1).
We need language to communicate. Some international languages, such as English,
German, and Chinese, are important, taught in schools, and widely used around the
world. Besides international languages, local languages, such as the Javanese
language, are important and need to be maintained.

Central Java and the Special Region of Yogyakarta, two provinces in Indonesia, are
very rich in Javanese tradition and culture. One of these traditions is the Javanese
language, which people use to speak to each other in daily life. This is why Javanese
language lessons are still held at school, especially in Java. At the end of every
semester, a test is conducted to assess students' ability in the Javanese language.

The Javanese language test can be assessed by analyzing its characteristics,
beginning with collecting information about previous test results (Sumintono &
Widhiarso, 2015, p. 12). Besides giving a score to the students, the students'
responses can also be used to predict or explain the students' ability and the
item characteristics by analyzing the test characteristics based on Item Response
Theory (IRT).

An analysis of Javanese language test characteristic...
Muchlisin, Djemari Mardapi, & Farida Agus Setiawati

62 - Copyright © 2019, REiD (Research and Evaluation in Education), 5(1), 2019
ISSN 2460-6995

A test is very important for both teachers and students. A test can be used to
classify weaknesses in terms of verbal skills, mechanical skills, etc. (Allen & Yen,
1979, p. 1). Besides, a test is a powerful method of data collection with an
impressive array of tools for gathering numerical rather than verbal data
(Cohen, Manion, & Morrison, 2007, p. 414). A test is defined as a standardized
procedure for sampling behavior and describing it with categories or scores
(Gruijter & van der Kamp, 2008, p. 2). The essential features of a test are a
standardized procedure, a focused behavioral sample, and a description in terms of
scores or category mapping (Gruijter & van der Kamp, 2008, p. 2). The test results
(scores) can be used to predict or explain item and test performance (Lord &
Novick, 2008, p. 358). Thus, the characteristics of the Javanese language test have
to be analyzed to obtain a better test in the future, one that reaches the test
goal and gives more accurate information about the examinee's ability.

A test has several uses. Five uses of a test are classification, diagnosis and
treatment planning, self-knowledge, program evaluation, and research (Gregory, 2015,
p. 29). A test can be a useful tool, but it can also be dangerous if misused (Allen &
Yen, 1979, p. 5), depending on our professionalism in ensuring that the test is used
as accurately and fairly as possible. Many extraneous factors can influence the test
(Gregory, 2015, p. 31). Sources that may influence the test include the manner of
administration, the test characteristics, the testing context, the examinee's
motivation and experience, and the scoring method (Gregory, 2015, p. 31).

Preparing a test requires planning, including identifying the purposes and the test
specifications, selecting the contents, considering the form, writing the test,
arranging the layout and the timing, and planning the scoring of the test (Cohen et
al., 2007, p. 418). We can make a good Javanese language test by paying attention to
this planning and to the influencing factors. Besides, test results that are
accurate, rich, and beneficial for evaluation will be obtained by analyzing the
characteristics of the items or the test of the Javanese language using Item
Response Theory (IRT).

There are alternative ways to analyze test characteristics, including classical test
theory (CTT) and item response theory (IRT). In CTT, analyzing a test requires a
large amount of calculation to get useful information (Baker, 2001, p. 1). Besides,
CTT has some weaknesses: the measurement result depends on the characteristics of
the test used, the item parameters depend on the examinees' ability, and the
measurement error provided is limited to the group rather than the individual
(Mardapi, 2017, p. 187). In CTT, if the test is 'hard', the examinee's ability will
appear low; if it is 'easy', the examinee's ability will appear higher (Hambleton,
Swaminathan, & Rogers, 1991, p. 2). Therefore, CTT is considered not effective for
analyzing the Javanese language test.

The weaknesses of CTT can be addressed by IRT. IRT is one of the modern
psychometric theories that provide useful tools for ability testing (Harrison,
Collins, & Müllensiefen, 2017, p. 1). IRT is a powerful tool used to solve a major
problem of CTT (Downing, 2003, p. 739). Item response theory (IRT) models, including
the Rasch model, describe the relationship between the test participants' level on
a latent trait (e.g., Javanese language skills) and the probability of mastering
the given items (answering the items correctly) in the form of logistic models
(Finch & French, 2015, p. 181). IRT has three assumptions: monotonicity,
unidimensionality, and local independence (Finch & French, 2015, p. 181; Mardapi,
2017, p. 187).

CTT has served test development well over several decades, but IRT has rapidly
become the mainstream theoretical basis for measurement (Embretson & Reise, 2000,
p. 3). The defining feature of IRT is the specification of a mathematical function
relating the probability of an examinee's response on a test item to an underlying
ability (Embretson & Reise, 2000, p. 8; Finch & French, 2015, p. 177; Gruijter &
van der Kamp, 2008, p. 133; Hambleton & Swaminathan, 1985, p. 9; Ostini & Nering,
2006, p. 2; Reckase, 2009, p. 68; van der Linden & Hambleton, 1996, p. iii). In
other words, the function describes, in probabilistic terms, how persons with low
and high ability respond differently (Ostini & Nering, 2006, p. 2). IRT addresses
the problem of relating ability (the examinee's mental traits) to the response
(performance) on an item (Lord & Novick, 2008, p. 397). IRT is used in many fields,
not only in social science; even in medical education, it has some potential
benefits (Downing, 2003, p. 739). With IRT, information about the test
characteristics can be obtained accurately, so analyzing the Javanese language test
using IRT needs to be conducted.

One of the models in IRT is the Rasch model. The Rasch model was developed by
Georg Rasch, a Danish mathematician, in 1960 (Hailaya, Alagumalai, & Ben, 2014, p.
301; Jambulingam, Schellhorn, & Sharma, 2016, p. 50; Mallinson, 2007, p. 1; Young,
Levy, Martin, & Hay, 2009, p. 545). There are several points of view about the
Rasch model. The Rasch model is a special case of the one-parameter logistic (1 PL)
model in which the item discrimination value is set equal to 1 (Finch & French,
2015, p. 181). Discrimination shows the ability of an item to differentiate among
examinees of different ability (Finch & French, 2015, p. 181). The Rasch model can
be expressed as:

 
$$P(x_j = 1 \mid \theta, b_j) = \frac{e^{\theta - b_j}}{1 + e^{\theta - b_j}} \tag{1}$$

In equation (1), $x_j$ is the response to item j, with 1 being correct in the
context of an achievement test; $\theta$ represents an individual's ability, and
$b_j$ is the difficulty level of item j.
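
The roles of ability and difficulty become easier to see when the Rasch model is rewritten in log-odds form. The short derivation below is a sketch under the usual Rasch assumptions (logistic form, discrimination fixed at 1); the shorthand $P_j$ for the probability of a correct response to item j is introduced here only for illustration, with $\theta$ the ability and $b_j$ the item difficulty.

```latex
\begin{align*}
P_j &= \frac{e^{\theta - b_j}}{1 + e^{\theta - b_j}}, &
1 - P_j &= \frac{1}{1 + e^{\theta - b_j}}, \\[4pt]
\frac{P_j}{1 - P_j} &= e^{\theta - b_j}, &
\ln\frac{P_j}{1 - P_j} &= \theta - b_j .
\end{align*}
```

Under this form, an examinee whose ability equals the item difficulty ($\theta = b_j$) has log-odds of 0, that is, a probability of 0.5 of answering correctly, and the probability rises as $\theta - b_j$ grows.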

Analyzing the Javanese language test using the Rasch model has practical benefits.
We can check whether the model fits the data. The Rasch model defines the
probability of a specified response in relation to the examinee's ability and the
item difficulty of the Javanese language test (Hailaya et al., 2014, p. 301;
Jambulingam et al., 2016, p. 50). With the Rasch model, there is no need to weight
items differentially to produce a total score that gives the maximum possible
amount of information about the latent trait; the number-right score is the best
possible total score to use (Allen & Yen, 1979, p. 260). The Rasch model produces
latent-trait (Javanese language ability) and item difficulty scales that have
desirable properties. The analysis of the Javanese language test using the Rasch
model can be carried out with the R program.

The characteristics of the Javanese language test used in the school have to be
analyzed using the Rasch model in IRT with the R program to obtain some
information. This information can be obtained from the Item Characteristic Curves
(ICC). The ICC gives the probability that examinees at a given ability level answer
each item correctly (Hambleton & Swaminathan, 1985, p. 13). Besides the ICC, other
important information about the items or the test can be obtained by using the
Rasch model in IRT.

There are many studies of IRT applications. They compared the use of IRT and CTT
or applied IRT to analyze test characteristics. A study conducted by Downing (2003)
contrasts IRT with CTT and explores the benefits of applying IRT in typical medical
education settings. Downing only compares these models and explores the benefits of
IRT theoretically; he does not go further into the application of IRT in an
analysis. In this study, IRT was used to analyze the test with the Rasch model in
the R program. Essen, Idaka, and Metibemu (2017) analyze model-data fit in IRT
using the Bilog and IRTPRO programs. They used two programs to analyze model-data
fit, whereas in this study, one model in one program was used to analyze the
model-data fit, the item fit, the difficulty level of the items, the item
characteristic curve (ICC), the item information curve (IIC), the test information
curve (TIC), the information given by each item, and the Javanese language ability
distribution. More complex information would be revealed in this study.



The study of Purnama (2017) was conducted to understand the characteristics of
Accounting Vocational Theory test items by IRT using the BILOG program. This study
analyzes the characteristics of the Javanese language test using the Rasch model in
the R program. Purnama's study analyzes the test using the 2 PL model, whereas this
study employs the Rasch model, which is a special case of the 1 PL model. Purnama's
study did not use the ICC to analyze the item characteristics, while in this study,
the ICC is used. Other studies conducted by Setiawati, Izzaty, and Hidayat (2018a,
2018b) that used IRT to analyze a test employ the Bilog program, while this study
employs the R program. A study by Iskandar and Rizal (2018) has some relevance to
this study. Both studies use a program to conduct the analysis. In their study,
they analyze the validity, reliability, difficulty level, and other aspects, but
not the item and test characteristic curves, the information functions, the average
ability of examinees, etc. Their study used CTT, while this study uses IRT. It is
hoped that this study presents findings that contribute to analyzing the
characteristics of the Javanese language test, so that the Javanese language test
can be evaluated and improved.

The Javanese language test will be analyzed by IRT. The analysis will be more
accurate and can be used to estimate the relationship between the examinee's
ability and the examinee's responses to the items of the Javanese language test.
Analyzing the Javanese language test using IRT produces results not just for the
overall test, but also for the characteristics of individual items. The item and
test characteristics (ICC and TCC), the accuracy of the information that the
Javanese language test gives (IIC and TIC), and other characteristics can be
estimated. Based on these explanations, the researchers decided to analyze the
Javanese language test characteristics based on item response theory using the
Rasch model in the R program.

Method 

This study is explorative research, that is, research which aims at finding the
facts and characteristics of the Javanese language test systematically and
accurately (Arikunto, 2010, p. 14). The characteristics of the Javanese language
test were analyzed using the Rasch model in the R program. This research was
conducted in Yogyakarta from May to June 2018.

The data analyzed in this study are secondary data. The data were collected by the
documentation method, that is, by collecting the answer sheets with 220 students'
responses to the Javanese language test at Depok 1 Vocational High School,
Yogyakarta. The Javanese language test consists of 50 multiple-choice items.

The instrument, the Javanese language test, was made by the Javanese language
teacher. The researchers then summarized the responses in a dichotomous data table.
Wrong responses are denoted by 0, and correct responses are denoted by 1. Item
number 1 was labeled B1, item number 2 was B2, item number 3 was B3, and so on. The
data of the Javanese language test were analyzed based on IRT using the Rasch model
in the R program.

After the data were collected and analyzed using the Rasch model in the R program,
some findings were obtained. They describe how the characteristics of the Javanese
language test relate the probability of an examinee's response on a test item to an
underlying ability (Javanese language ability). The researchers analyzed the fit of
the model to the overall data, the difficulty level and the fit of the items, the
ICC, TCC, IIC, TIC, the item information, the Javanese language ability
distribution, and the descriptive statistics for Javanese language ability.

First, the fit of the model to the overall data was examined. The goodness-of-fit
test was conducted to test whether the Rasch model fits the overall data, whereas
the item-fit test was done to test whether the model fits the individual items as
well. Both are considered to fit if the p-value is more than 0.05. If the
goodness-of-fit test did not meet the fit criterion, the item-fit test would be
conducted, and the items that did not fit would be removed. Then, the goodness of
fit of the remaining items would be reanalyzed until the criterion was met, after
which the next analysis could proceed.
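
The calibration procedure described above can be sketched as a simple loop. The study itself ran this analysis in R; the Python sketch below only illustrates the control flow. The function name `calibrate` and the two callables `overall_fit_p` and `item_fit_p` are hypothetical stand-ins for whatever routines return the overall goodness-of-fit p-value and the per-item fit p-values, and the toy p-values at the end are made up for illustration, not results from this study.

```python
from typing import Callable, Dict, List, Tuple

ALPHA = 0.05  # fit criterion from the text: a p-value above 0.05 counts as fit


def calibrate(
    items: List[int],
    overall_fit_p: Callable[[List[int]], float],
    item_fit_p: Callable[[List[int]], Dict[int, float]],
    max_rounds: int = 10,
) -> Tuple[List[int], int]:
    """Test overall fit; if it fails, drop misfitting items and test again.

    Returns the retained items and the number of overall-fit tests performed.
    """
    retained = list(items)
    for round_no in range(1, max_rounds + 1):
        if overall_fit_p(retained) > ALPHA:
            return retained, round_no  # the model fits the overall data
        p_values = item_fit_p(retained)
        misfits = [i for i in retained if p_values[i] <= ALPHA]
        if not misfits:
            break  # nothing left to remove, so stop instead of looping forever
        retained = [i for i in retained if i not in misfits]
    raise RuntimeError("overall fit criterion was not reached")


# Toy stand-ins with made-up p-values (NOT the study's results).
def toy_overall(retained: List[int]) -> float:
    return 0.01 if 3 in retained else 0.20


def toy_item(retained: List[int]) -> Dict[int, float]:
    return {i: (0.01 if i == 3 else 0.60) for i in retained}


kept, rounds = calibrate([1, 2, 3, 4], toy_overall, toy_item)
print(kept, rounds)
```

Note that the loop stops as soon as the overall test passes, so an individual item may still show a low item-fit p-value in the final model.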

In practice, researchers set the categories; e.g., a difficulty level is said to be
good if its value ranges from -2.0 to 2.0 (Hambleton & Swaminathan, 1985, p. 107).
In this study, an item is considered good if its difficulty level lies between -3.0
and 3.0. The ICC shows the relationship between examinee ability and the
probability of a correct response, whereas the TCC shows the relationship between
examinee ability and the true score (the sum of the correct-response
probabilities). The IIC and TIC show the information that the item or test provides
at a given examinee ability. The item information is useful for item selection. An
item is considered reliable if its item information value is more than 0.5. The
Javanese language ability distribution and descriptive statistics describe examinee
ability in this test. All of this information is used to explore the Javanese
language test characteristics in this study.

Findings and Discussion 

After the data were collected and analyzed, some results were obtained. They
describe how the characteristics of the Javanese language test relate the
probability of an examinee's response on a test item to an underlying ability
(Javanese language ability). This can be seen from the model-data fit, the
difficulty level, the item fit, the ICC, TCC, IIC, TIC, the distribution of
Javanese language ability, etc.

The first step in analyzing the characteristics of the Javanese language test is
assessing the fit of the Rasch model. We have to make sure that the Rasch model
fits overall. The model is said to fit the data if the observed and model-predicted
frequencies of individuals for each response pattern are close to one another
(Finch & French, 2015, p. 189). To analyze whether the model fits the overall data,
we used the bootstrap chi-square procedure in the R program. The bootstrap
chi-square test of overall model fit for the Rasch model was conducted with the
command GoF.rasch(model.rasch, B=1000). First, the researchers analyzed the model
fit for all 50 items. The result shows a p-value of 0.006. A p-value of less than
0.05 means that the model does not fit the data; thus, the model did not fit the
data for all items. Then, item fit was analyzed (whether the model fits the
individual items as well) with the command item.fit(model.rasch,
simulate.p.value = TRUE). Three items did not fit the model: items number 27, 32,
and 35. The data for these three items were removed, and the researchers analyzed
the model fit again.

The second analysis of model fit gave a p-value of 0.017, still less than 0.05,
which means that the Rasch model did not fit the data. The researchers then
analyzed the item fit for these 47 items and found that items number 3, 11, 13, 36,
and 48 did not fit the model. The data for these items were then removed, and the
researchers reanalyzed the model fit with the 42 remaining items. The third
analysis showed that the model fits the data, as seen from the p-value of 0.053
(more than 0.05). Finally, after three rounds of calibration, the Rasch model fits
the data without items number 3, 11, 13, 27, 32, 35, 36, and 48 (42 items remain to
be analyzed). In other words, the researchers had obtained overall model fit for
the Rasch model and could continue with the other analyses.
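
The item bookkeeping across the calibration rounds can be checked with a few lines of set arithmetic. This is only an illustrative Python sketch (the analysis itself was run in R); the variable names are chosen here for readability.

```python
all_items = set(range(1, 51))           # items 1..50 in the original test
removed_round_1 = {27, 32, 35}          # misfitting items after the first fit test
removed_round_2 = {3, 11, 13, 36, 48}   # misfitting items after the second fit test

after_round_1 = all_items - removed_round_1
retained = after_round_1 - removed_round_2

print(len(after_round_1))  # number of items entering the second fit test
print(len(retained))       # number of items entering the third fit test
```

The counts come out as 47 and 42, matching the 47 items analyzed in the second round and the 42 items retained for the remaining analyses.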

The researchers analyzed the difficulty level of the items and the item fit. The
summary of the analysis is presented in Table 1.

Item difficulty is centered at 0; negative values represent relatively easy items,
and positive values indicate relatively more difficult items (Finch & French, 2015,
p. 184). Accordingly, the more negative the difficulty value, the easier the item,
and the more positive the difficulty value, the more difficult the item. From the
Rasch analysis of the difficulty level of the items, the easiest item is item
number 20 (with difficulty level -15.7892) and the hardest item is item number 23
(with difficulty level 0.9702).

In theory, the difficulty levels range from minus infinity to infinity. Some items
fall in the good category based on their difficulty level: there are 28 good items,
and the remaining 14 items are not good based on the difficulty level. The items
that are not good based on difficulty level are items number 5, 6, 7, 12, 14, 16,
17, 18, 19, 20, 25, 29, 38, and 46. Thus, 66.67% of the 42 items are good in terms
of difficulty level. Hence, the test is in the moderate category based on the
difficulty level.

Table 1. Difficulty level of items and the items fit of the model

Item No.   Difficulty level   Item fit (p-value)
    1           -0.8355            0.0792
    2           -1.0570            0.6634
    4           -0.3796            0.4554
    5           -4.6802*           0.7165
    6           -4.6802*           0.5149
    7           -3.5262*           0.3861
    8           -1.0317            0.1683
    9           -2.8874            0.2574
   10           -1.5902            0.3366
   12           -5.0950*           0.6832
   14           -5.7976*           0.6436
   15           -2.6885            0.9208
   16           -4.1508*           0.3465
   17           -3.7959*           0.0891
   18           -3.9593*           0.9208
   19           -5.7976*           0.1584
   20          -16.0705*           0.1881
   21           -0.2267            0.0396#
   22           -0.5127            0.9802
   23            0.9695            0.3960
   24           -1.8959            0.8713
   25           -4.3832*           0.8614
   26           -1.3516            0.7426
   28           -1.7202            0.9604
   29           -3.1221*           0.0693
   30           -1.5902            0.2970
   31           -0.4016            0.4356
   33           -2.6282            0.4059
   34           -1.7202            0.4653
   37           -1.9713            0.9406
   38           -3.5263*           0.3168
   39           -2.0908            0.1287
   40           -2.0102            0.1386
   41           -1.1084            0.1386
   42           -1.5589            0.2277
   43           -1.4678            0.2277
   44           -2.0908            0.8119
   45           -2.9610            0.3762
   46           -3.3073*           0.6436
   47           -2.1756            0.4158
   49           -1.3235            0.9505
   50           -1.6541            0.3366

Notes:
* item is not good based on the difficulty level
# item misfit with the Rasch model
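
The difficulty criterion used in this study (a good item has a difficulty level between -3.0 and 3.0) can be applied to the Table 1 estimates with a short script. The Python sketch below is illustrative only; the dictionary simply copies the difficulty column of Table 1, and the variable names are chosen here.

```python
# Difficulty estimates copied from Table 1 (item number -> difficulty level).
difficulty = {
    1: -0.8355, 2: -1.0570, 4: -0.3796, 5: -4.6802, 6: -4.6802, 7: -3.5262,
    8: -1.0317, 9: -2.8874, 10: -1.5902, 12: -5.0950, 14: -5.7976, 15: -2.6885,
    16: -4.1508, 17: -3.7959, 18: -3.9593, 19: -5.7976, 20: -16.0705, 21: -0.2267,
    22: -0.5127, 23: 0.9695, 24: -1.8959, 25: -4.3832, 26: -1.3516, 28: -1.7202,
    29: -3.1221, 30: -1.5902, 31: -0.4016, 33: -2.6282, 34: -1.7202, 37: -1.9713,
    38: -3.5263, 39: -2.0908, 40: -2.0102, 41: -1.1084, 42: -1.5589, 43: -1.4678,
    44: -2.0908, 45: -2.9610, 46: -3.3073, 47: -2.1756, 49: -1.3235, 50: -1.6541,
}

LOW, HIGH = -3.0, 3.0  # range for a good item in this study

good = [item for item, b in difficulty.items() if LOW <= b <= HIGH]
not_good = [item for item, b in difficulty.items() if not LOW <= b <= HIGH]
share_good = 100 * len(good) / len(difficulty)

print(len(good), len(not_good), round(share_good, 2))
print(not_good)
```

With these values, 28 of the 42 items fall inside the range (66.67%), in line with the figure reported in the abstract, and every item outside the range lies below -3.0, that is, on the too-easy side.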

 

The teacher should pay attention to the items in the not-good category. All of the
items that are not good based on the difficulty level are categorized as too easy.
These items are not good because they are too easy for every examinee, as indicated
by their difficulty indexes, all of which are smaller than -3.0.

The Rasch model fit the data overall, but one item, item number 21, did not fit the
Rasch model. No decision could be made about this item because it did not fit the
model. This means that the characteristics of item number 21 estimated from the
Rasch model were not adequately accurate.

The item characteristics are displayed as curves for all items in Figure 1. The
item characteristic curve (ICC) places the test participant's location on the
measured latent trait on the x-axis and the probability of mastering an item on the
y-axis (Finch & French, 2015, p. 184). The latent trait refers to Javanese language
ability, and mastering an item refers to the probability that the examinee responds
correctly to the item. From the ICC, the probability of a correct answer from
someone with a certain ability on an item can be read. The command to get the ICCs
for all 42 items together is plot(model.rasch,type=c('ICC')). It gives the ICCs of
all items in the test.

Figure 1 shows the ICCs of the 42 items. It is difficult to interpret the curves
when all ICCs are plotted together. The ICC of item number 23 is located at the
rightmost position on the x-axis (Finch & French, 2015, p. 185), which means that
item number 23 is the most difficult item. The easiest item could not be identified
from the plot because the curves are too crowded. However, based on the difficulty
level, item number 20 is clearly the easiest item. If the curves of these items are
plotted separately, they can be seen more clearly. Thus, the ICCs for items number
20 and 23 and two other items can be compared; they are presented in Figure 2.

 
Figure 1. The ICC of Javanese language test 

From Figure 1, some ICCs are not good because the probability of a correct response
for examinees with low ability is high. These are items number 5, 6, 7, 12, 14, 16,
17, 18, 19, 20, 25, 29, 38, and 46 (14 items in total). All of these items fit the
model.

 
Figure 2. The ICC for items number 20, 23, 
24, and 29 

However, the difficulty levels of these 
items are not good. Thus, these items (see 
Figure 3) are not good based on the ICC and 
difficulty level. 

 
Figure 3. Items with not good ICC 

To look at the ICCs of specific items, for example items number 20, 23, 24, and 29,
the command plot(model.rasch,type=c("ICC"),items=c(17,20,21,25)) was used. It
differs slightly from the command for all ICCs in that the positions of the
specific items among the 42 retained items are given (items number 20, 23, 24, and
29 are the 17th, 20th, 21st, and 25th retained items). It draws the ICCs of the
selected items in one graph so that they can be compared easily.
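
Because eight items were removed during calibration, the position of an item in the fitted data no longer equals its item number. The Python sketch below (illustrative only, since the analysis itself was run in R) recovers the positions, assuming the retained items keep their original order in the data; the helper name `positions` is hypothetical.

```python
removed = {3, 11, 13, 27, 32, 35, 36, 48}                 # items dropped during calibration
retained = [i for i in range(1, 51) if i not in removed]  # 42 items, original order assumed


def positions(item_numbers):
    """1-based positions of the given item numbers among the retained items."""
    return [retained.index(i) + 1 for i in item_numbers]


print(positions([20, 23, 24, 29]))
```

Under that assumption, items number 20, 23, 24, and 29 sit at positions 17, 20, 21, and 25, which are the values passed to the items argument of the plotting command.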

Figure 2 presents some item characteristic information. For item number 20,
regardless of the student's ability, the probability of answering correctly is the
same for all examinees, namely 1.0 (always correct). This indicates that item
number 20 is too easy for every examinee: an examinee with any Javanese language
ability (ability values from -4 through 4) will be able to respond to the item
correctly. For the hardest item (item number 23), an examinee with ability 1 has a
probability of approximately 0.5 of answering the item correctly. To reach a high
probability of about 0.9 or more, the examinee should have a Javanese language
ability of almost 4. Higher Javanese language ability is needed to increase the
chance of answering this item correctly.
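
The readings from Figure 2 can be cross-checked against the Rasch formula. The Python sketch below is an illustration rather than the study's R workflow: it assumes discrimination fixed at 1 and uses the difficulty estimates for items 20 and 23 from Table 1; the function name `rasch_prob` is chosen here.

```python
import math


def rasch_prob(theta: float, b: float) -> float:
    """Probability of a correct response under the Rasch model (discrimination = 1)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# Difficulty estimates taken from Table 1.
b_item20 = -16.0705  # easiest item
b_item23 = 0.9695    # hardest item

for theta in (-4, -2, 0, 1, 2, 4):
    p20 = rasch_prob(theta, b_item20)
    p23 = rasch_prob(theta, b_item23)
    print(f"ability {theta:>2}: item 20 -> {p20:.3f}, item 23 -> {p23:.3f}")
```

Under these assumptions the curve for item 20 stays essentially at 1.0 over the whole range from -4 to 4, while the curve for item 23 passes through 0.5 where ability is close to its difficulty (about 1) and exceeds 0.9 by ability 4.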

The test characteristic relating ability to the true score is given by the TCC
(Test Characteristic Curve). The true score is the sum of the correct-answer
probabilities. The TCC of the Javanese language test is shown in Figure 4.

 
Figure 4. The TCC of the test 

From Figure 4, it is known that the test is in the easy category. An examinee with
low ability (-3) will have a true score of approximately 19, and an examinee with
average ability (0) will have a true score of approximately 35 (near the maximum
true score of 42).
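
Since the true score is the sum of the correct-answer probabilities, a point on the TCC can be approximated directly from the item difficulties. The Python sketch below is illustrative only: it assumes the Rasch form with discrimination fixed at 1 and uses the rounded difficulty estimates from Table 1, so it will not reproduce the R plot exactly; the names `DIFFICULTIES` and `true_score` are chosen here.

```python
import math

# Difficulty estimates of the 42 retained items, copied from Table 1 in row order.
DIFFICULTIES = [
    -0.8355, -1.0570, -0.3796, -4.6802, -4.6802, -3.5262, -1.0317, -2.8874,
    -1.5902, -5.0950, -5.7976, -2.6885, -4.1508, -3.7959, -3.9593, -5.7976,
    -16.0705, -0.2267, -0.5127, 0.9695, -1.8959, -4.3832, -1.3516, -1.7202,
    -3.1221, -1.5902, -0.4016, -2.6282, -1.7202, -1.9713, -3.5263, -2.0908,
    -2.0102, -1.1084, -1.5589, -1.4678, -2.0908, -2.9610, -3.3073, -2.1756,
    -1.3235, -1.6541,
]


def true_score(theta: float, difficulties=DIFFICULTIES) -> float:
    """Expected number-correct score: sum of Rasch success probabilities."""
    return sum(1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties)


for theta in (-3, -2, -1, 0, 1, 2, 3):
    print(theta, round(true_score(theta), 1))
```

At ability 0 the sketch yields a true score in the mid-thirties out of 42, the same neighbourhood as the value read from Figure 4, which supports the reading that the test is easy for an examinee of average ability.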

An examinee with ability value 0 (average ability) has a different probability for
each item: a probability of 0.2 for item number 23, approximately 0.8 or more for
items number 24 and 29, and 1.0 (correct response) for item number 20. Figure 2
shows that item number 20 is easier than items number 24 and 29, and items number
24 and 29 are easier than item number 23. Figure 1 shows that some ICCs are not
good since the probability of a correct response for examinees with low ability is
high. These are items number 5, 6, 7, 12, 14, 16, 17, 18, 19, 20, 25, 29, 38, and
46 (14 items). Those items fit the model. The item characteristics of every item
can be described in the same way as for items number 20, 23, 24, and 29, by
separating its ICC from the others so that it can be seen clearly.

In addition to the ICC, we used the R program to plot the item information curve
(IIC). The IIC describes the information function of an item. It refers to the
degree to which an item reduces the uncertainty in the estimation of the Javanese
language ability (the latent trait) value for an individual (Finch & French, 2015,
p. 185). A high information value over a specific range of the ability distribution
indicates that the item provides relatively more information about the latent trait
(Javanese language ability) in that region than in other regions of the
distribution (Finch & French, 2015, p. 186). Based on the IIC, we can see how
reliable an item is in giving information. All the IICs are shown in Figure 5.
There are 42 IICs, each showing the degree of information given by one item. The
command to get the IICs for all items in the test is
plot(model.rasch,type=c('IIC')). The command for specific IICs is
plot(model.rasch,type=c('IIC'),items=c(17,20,24,40)), which produces the IICs for
items number 20, 23, 28, and 47 (the 17th, 20th, 24th, and 40th of the 42 retained
items). The IICs for the 42 items are shown in Figure 5, and the IICs for items
number 20, 23, 28, and 47 are shown in Figure 7.

The 42 IICs describe how reliable each item is in giving information about the
Javanese language ability value of an individual. There are only 42 IICs, one for
each of the 42 items for which the Rasch model fits the data. From Figure 5, we can
identify the most accurate and the most inaccurate items in giving information
about the examinee's ability in the Javanese language. These are items number 20
and 23. The IICs for these items are shown separately from the others in Figure 7,
together with items number 28 and 47.

 
Figure 5. The IIC of 42 items 

Some IICs give maximum information for examinees with low ability (Figure 6). These
are items number 5, 6, 7, 12, 14, 16, 17, 18, 19, 20, 25, 29, 38, and 46 (14
items). These items give low information for examinees with medium or high ability.
They are not good because they give maximum or high information only for
low-ability examinees, and they are also not good based on the ICC and the
difficulty levels. Therefore, we can conclude that these items are not good based
on the ICC, IIC, and difficulty level.

 
Figure 6. Items whose IIC is not good

 
Figure 7. The IICs for item numbers 20, 23, 28,
and 47

Figure 7 shows that item number 20 is
the most inaccurate in giving information
about the examinee’s Javanese language
ability. This item cannot give accurate
information because its information value is 0
for examinees of any ability, so it cannot
differentiate examinees by ability. No
information about the examinee's Javanese
language ability is obtained if we use this item
to measure it. The IIC for item number 23
shows that an ability of approximately 1 is
needed to get information of about 0.25; in
other words, item 23 provides maximum
information for estimating θ (Javanese language
ability) around values of 1. Item numbers 28
and 47 give maximum information about an
examinee whose ability is about -2. The
IIC is different for every item, but this study
shows the more specific item information curves
only for item numbers 20, 23, 28, and 47. If we
want to look at the IIC of another item,
we can plot it separately from the others.
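
The peak value of about 0.25 follows from the form of the Rasch information function. As a brief sketch, assuming the discrimination is fixed at 1 and writing b_i for the difficulty of item i (notation assumed here for illustration only):

```latex
P_i(\theta) = \frac{\exp(\theta - b_i)}{1 + \exp(\theta - b_i)}, \qquad
I_i(\theta) = P_i(\theta)\,\bigl[1 - P_i(\theta)\bigr], \qquad
\max_{\theta} I_i(\theta) = I_i(b_i) = 0.5 \times 0.5 = 0.25 .
```

Under this assumption, every item reaches the same maximum, and the items differ only in the ability value at which the maximum occurs. An item whose difficulty lies far outside the plotted ability range would therefore appear almost flat and close to 0 within that range.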

Item information curves show the
information function for every item in the test.
The total information is obtained from the
test information function. The test information
function has several features: it is defined for
a set of test items at each point on the ability
scale, and the amount of information is
influenced by the quality and the number of test
items. One of its most important features is
that the contribution of each item to the total
information is additive
(Hambleton & Swaminathan, 1985, p. 104).
The test information curve, which shows the
total information function, is presented in
Figure 8. The command to get the test
information curve is
plot(model.rasch,type=c("IIC"), items=c(0)).

 
Figure 8. Test information curve 


Figure 8 shows the estimated test
information function as a curve. The test
information curve (TIC) presents how reliable
the Javanese language test is. The TIC is
interpreted in the same way as the IIC. The
test provides maximum information for
estimating θ around values of -2. Thus, the test
is well suited to examinees with low Javanese
language ability. The test is less accurate in
giving information on examinees with a Javanese
language ability of 0 (average ability) or higher.
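
The additive property means that a test information curve is simply the sum of the item information curves. The sketch below illustrates this in base R, assuming a Rasch model with discrimination fixed at 1. The functions rasch_info() and test_info() and the difficulties in b_demo are hypothetical and do not reproduce the estimates of this study.

```r
# Rasch item information, assuming discrimination = 1.
rasch_info <- function(theta, b) {
  p <- plogis(theta - b)
  p * (1 - p)
}

# Test information: item informations summed at every ability value.
test_info <- function(theta, b) {
  # Rows are ability values, columns are items.
  item_info <- outer(theta, b, rasch_info)
  rowSums(item_info)
}

# Hypothetical difficulties, mostly easy items.
b_demo <- c(-3, -2.5, -2, -2, -1.5, 0, 1)
theta  <- seq(-4, 4, by = 0.1)

tic        <- test_info(theta, b_demo)
theta_peak <- theta[which.max(tic)]   # ability where the test is most informative
print(theta_peak)
```

With mostly easy items, the sum peaks in the low-ability region, which is the pattern described for Figure 8.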

The information function (IIC or TIC)
has several applications: test construction,
item selection, assessment of measurement
precision, test comparison, determination of
scoring weights, and comparison of scoring
methods (Hambleton & Swaminathan, 1985,
p. 101). In item selection, we can select the
items that provide accurate information on the
examinee’s ability. An item whose IIC provides
no information at any theta (ability), such as
item number 20, should not be used in the test.

Table 2. Information given by each item for theta from -3.0 to 3.0

Item No.   Information   Percentage

1          0.88          87.60%
2          0.86          85.78%
4          0.90          89.93%
5          0.16          15.74%
6          0.16          15.66%
7          0.37          36.94%
8          0.86          86.01%
9          0.53          52.58%
10         0.79          79.35%
12         0.11          11.01%
14         0.06          5.82%
15         0.57          57.43%
16         0.24          24.03%
17         0.31          31.04%
18         0.28          27.67%
19         0.06          5.82%
20         0.00          0.09%
21         0.90          90.31%
22         0.89          89.44%
23         0.87          86.55%
24         0.74          74.38%
25         0.20          20.06%
26         0.83          82.61%
28         0.77          77.38%
29         0.47          46.78%
30         0.79          79.39%
31         0.90          89.86%
33         0.59          58.87%
34         0.77          77.38%
37         0.73          73.00%
38         0.37          37.05%
39         0.71          70.70%
40         0.72          72.27%
41         0.85          85.29%
42         0.80          79.81%
43         0.81          81.09%
44         0.71          70.70%
45         0.51          50.76%
46         0.42          42.14%
47         0.69          68.98%
49         0.83          82.95%
50         0.78          78.42%

 
The total information of the test
over a range of Javanese language ability
(latent trait) values can be obtained with the
command information(model.rasch, c(-10,10)). The
argument c(-10, 10) specifies the range of
theta (ability) for which the information is
requested. The total information provided by
the test for abilities ranging from -10 to 10
is 41.93, or 100%. It means that practically
all of the information the test can give is
obtained for examinees with abilities between
-10 and 10. If we request the information for
ability values from 0 to 10 with the command
information(model.rasch, c(0,10)), the result is 5.9,
or 14.08% of the total information provided
by the Javanese language test. In the standard
normal distribution, the range of -3 to 3
covers about 99.7% of the total area. The total
information given by the test over the
ability range of -3 to 3 is
24.98, or 59.58% of the total information.
Thus, the instrument still provides a moderate
amount of information when measuring
examinees with abilities in this range.
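
The information within an ability interval has a simple closed form under the Rasch model. As a sketch, assuming the discrimination is fixed at 1 and writing b_i for the difficulty of item i and θ_L, θ_U for the interval limits (notation assumed here for illustration), the result follows because the derivative of P_i(θ) is P_i(θ)[1 - P_i(θ)]:

```latex
\int_{\theta_L}^{\theta_U} I_i(\theta)\, d\theta
  = \int_{\theta_L}^{\theta_U} P_i(\theta)\,\bigl[1 - P_i(\theta)\bigr]\, d\theta
  = P_i(\theta_U) - P_i(\theta_L),
\qquad
\frac{24.98}{41.93} \approx 0.5958 .
```

For example, an item with b_i = 0 would give P_i(3) - P_i(-3), or about 0.905, in the range of -3 to 3, which is in line with the largest values reported in Table 2. The second expression simply restates the share of the test information that falls in the range of -3 to 3.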

Besides the ICC, TIC, and the total
information, we can get the information given
by each item within a certain range of ability
(theta). In this study, the information given
by each item in the ability range of -3 to 3
is listed in Table 2, together with the
percentage of each item's total information
that falls in this range.

Based on Table 2, we can see the
information given by each item for theta from
-3.0 to 3.0. This information can be used for
item selection. How reliable an item is depends
on the percentage of its information obtained
in this range of theta. We can set the
criterion for a reliable item as needed. For
example, when composing a test, we should not
use item number 20 because it gives very
little information. If we set the criterion for
a reliable item at more than 50% of its
information, 28 of the 42 items (66.67%) are
reliable and can be used. The remaining 14
unreliable items are not good. These unreliable
items are also categorized as not good based
on the ICC, IIC, and difficulty level.
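
The selection rule described above can be applied directly to the percentages in Table 2. The following base R sketch copies those values by hand; the variable names and the threshold variable are chosen here for illustration, and the 50% criterion is the one used in the text.

```r
# Item numbers and percentages copied from Table 2
# (share of each item's information for theta from -3.0 to 3.0).
item_no <- c(1, 2, 4, 5, 6, 7, 8, 9, 10, 12, 14, 15, 16, 17, 18, 19, 20,
             21, 22, 23, 24, 25, 26, 28, 29, 30, 31, 33, 34, 37, 38, 39,
             40, 41, 42, 43, 44, 45, 46, 47, 49, 50)
pct <- c(87.60, 85.78, 89.93, 15.74, 15.66, 36.94, 86.01, 52.58, 79.35,
         11.01, 5.82, 57.43, 24.03, 31.04, 27.67, 5.82, 0.09,
         90.31, 89.44, 86.55, 74.38, 20.06, 82.61, 77.38, 46.78, 79.39,
         89.86, 58.87, 77.38, 73.00, 37.05, 70.70,
         72.27, 85.29, 79.81, 81.09, 70.70, 50.76, 42.14, 68.98, 82.95, 78.42)
stopifnot(length(item_no) == length(pct))

threshold <- 50                 # criterion in percent; can be changed as needed
reliable  <- pct > threshold

kept    <- item_no[reliable]    # items that meet the criterion
dropped <- item_no[!reliable]   # items that do not meet the criterion

cat("Reliable items:", length(kept), "of", length(item_no), "\n")
cat("Share reliable:", round(100 * mean(reliable), 2), "%\n")
cat("Dropped items :", dropped, "\n")
```

Changing the threshold shows how sensitive the number of usable items is to the chosen criterion.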

To obtain the latent trait (Javanese language
ability) estimates for the Rasch model in the R
program, we used the command
theta.rasch<-factor.scores.rasch(model.rasch)
to save the θ estimates from the Rasch model.
Then, we used the command
summary(theta.rasch$score.dat$z1)
to get basic descriptive statistics of ability (θ).
The output of this command is shown in
Table 3.

Table 3. The latent trait estimates 

Min. Median Mean Max 

-2.0780 -0.1534 -0.1138 1.6538 

 
We can see that the mean Javanese
language ability for the sample is -0.1138, with
a minimum of -2.0780 and a maximum of
1.6538. The standard deviation of Javanese
language ability was obtained with the command
sqrt(var(theta.rasch$score.dat$z1)); the result
is 0.750783. The plot of the latent trait
(Javanese language ability) was obtained with
the command plot(theta.rasch) and is shown in
Figure 9.
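
The descriptive statistics above are computed on the z1 column of score.dat, which holds one ability estimate per observed response pattern together with its frequency (Obs). The sketch below uses a small hypothetical data frame, score_dat, as a stand-in for that object; the values are invented for illustration only. It shows that sd() gives the same result as sqrt(var()), and that a summary per pattern and a summary per examinee can differ when several examinees share a response pattern.

```r
# Hypothetical stand-in for theta.rasch$score.dat: one row per observed
# response pattern, with its frequency (Obs) and ability estimate (z1).
score_dat <- data.frame(
  Obs = c(3, 1, 2, 1, 1),
  z1  = c(-1.20, -0.40, 0.10, 0.60, 1.50)
)

# Descriptive statistics with one value per response pattern.
print(summary(score_dat$z1))
sd_patterns <- sd(score_dat$z1)          # identical to sqrt(var(...))
mean_patterns <- mean(score_dat$z1)

# Repeating each estimate Obs times gives one value per examinee.
theta_examinees <- rep(score_dat$z1, times = score_dat$Obs)
mean_examinees  <- mean(theta_examinees)

cat("Mean per pattern :", mean_patterns, "\n")
cat("Mean per examinee:", mean_examinees, "\n")
```

When every examinee has a unique response pattern, the two summaries coincide.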

 
Figure 9. Plot of theta 

Figure 9 shows that the distribution of
Javanese language ability is almost centered at
0. The center of the ability plot shows the mean
ability, which is -0.1367; this is why the
distribution is almost centered at a Javanese
language ability value of 0. The highest density
of Javanese language ability is located at the
mean ability value. The distribution of theta
(Javanese language ability) based on the
analysis using the Rasch model in the R program
follows the normal distribution
curve. The right side and the left side of the 
distribution curve are almost balanced. 

Figure 8 shows that maximum information
is obtained when the Javanese language
ability value is about -2. However, the mean
ability of the examinees is -0.1367, meaning
that, in general, the Javanese language test did
not give maximum information on the
examinees' Javanese language ability. It can be
said that the test is less accurate for this
group. Thus, an evaluation of the Javanese
language test is needed.

The evaluation of the Javanese language
test will improve the test so that it gives
teachers more accurate information and more
precise measurement. If teachers know the
examinees' general ability, they will have
further steps or ideas to apply in the next
Javanese language lessons to increase the
examinees’ Javanese language ability. It is
hoped that, as their Javanese language ability
increases, the students will practice it in
their daily lives. In this way, they retain the
culture and character of the Javanese language,
which carries many positive values in learning,
culture, character, and social interaction in
Java.

This study analyzed the Javanese language
test based on the Rasch model in the R
program. Future studies could use other models
to analyze the Javanese language test, following
the procedure for each model. It is hoped that
there will be more test analyses, for example of
mathematics tests, other language tests, and
especially Javanese language tests. Such
analyses will help teachers construct better
tests that give accurate information about the
examinees' ability and measure it more
accurately. It is better to use item response
theory to analyze a test because it offers
several benefits, such as information on the
characteristics and the information function of
each item.

Conclusion 

Based on the results of the analysis of the
Javanese language test using the Rasch model
in the R program, the interpretation, and the
discussion, the researchers can draw some
conclusions about the characteristics of the
Javanese language test. The model was
calibrated three times to obtain a model that
fits the data, which retained 42 items. Analysis
of the difficulty level shows that 28 of the 42
items (66.67%) are in the good category.
Therefore, the Javanese language test is in the
moderate category based on the difficulty level.

We can see the characteristic of each
item in predicting the probability of a correct
response for an examinee with a certain ability
from the ICC, and the test characteristic from
the TCC. Based on the ICC and IIC, there are 28
good items (66.67%). Based on the information
obtained from each item (item information) for
theta from -3.0 to 3.0, 28 of the 42 items
(66.67%) give more than 50% of their
information and can be used (moderate category
based on the information in this range of
theta). From the descriptive statistics, it can
be said that the ability of the examinees is in
the moderate category because the mean ability
is -0.1138 (close to 0.00, the average ability).
In general, the Javanese language test is in the
moderate category. It will be better if the
Javanese language test is evaluated to make a
better test that gives more accurate
information on the examinees’ ability. The
evaluation of the Javanese language test can be
used by Javanese language teachers to plan the
next lessons in their classes to improve
Javanese language learning.

Acknowledgment 

The researchers thank Depok 1 Vocational
High School, which permitted the
researchers to collect the data. Gratitude is
also extended to the contributors, Ali, Desy,
and Laras, for their help during the data
collection and with the tabulation of the
dichotomous data.

References 

Allen, M. J., & Yen, W. M. (1979). Introduction
to measurement theory. Monterey, CA: Cole
Publishing.

Arikunto, S. (2010). Prosedur penelitian: Suatu 
pendekatan praktik (Revised ed). Jakarta: 
Rineka Cipta. Retrieved from
https://doi.org/10.1017/CBO9781107415324.004


Baker, F. B. (2001). The basics of item response 
theory (2nd ed.). College Park, MD: 
ERIC Clearinghouse on Assessment 
and Evaluation. 

Cohen, L., Manion, L., & Morrison, K. 
(2007). Research methods in education (6th 
ed.). London and New York, NY: 
Routledge Falmer. 

Dede, C. (2010). Comparing frameworks for
21st century skills. In J. Bellanca & R.
Brandt (Eds.), 21st century skills:
Rethinking how students learn (pp. 51–76).
Bloomington, IN: Solution Tree Press.

Downing, S. M. (2003). Item response theory: 
Applications of modern test theory in 
medical education. Medical Education, 
37(8), 739–745.
https://doi.org/10.1046/j.1365-2923.2003.01587.x

Embretson, S. E., & Reise, S. P. (2000). Item 
response theory for psychologists: Multivariate 
applications book series. London: Lawrence 
Erlbaum Associates. 

Essen, C. B., Idaka, I. E., & Metibemu, M. A. 
(2017). Item level diagnostics and 
model-data fit in item response theory
(IRT) using BILOG-MG v3.0 and
IRTPRO v3.0 programmes. Global
Journal of Educational Research, 16(2), 87–94.
https://doi.org/10.4314/gjedr.v16i2.2

Finch, W. H., & French, B. F. (2015). Latent 
variable modeling with R. New York, NY: 
Taylor & Francis. 

Gregory, R. J. (2015). Psychological testing: 
History, principles, and applications (7th 
ed.). New York, NY: Pearson 
Education. 

Gruijter, D. N. M., & van der Kamp, L. J. T. 
(2008). Statistical test theory for the 
behavioral sciences. New York, NY: Taylor 
& Francis Group. 

Hailaya, W., Alagumalai, S., & Ben, F. (2014). 
Examining the utility of Assessment 
Literacy Inventory and its portability to 
education systems in the Asia Pacific 
region. Australian Journal of Education,
58(3), 297–317.
https://doi.org/10.1177/0004944114542984

Hambleton, R. K., & Swaminathan, H. (1985).
Item response theory: Principles and
applications. Boston, MA: Kluwer-Nijhoff.

Hambleton, R. K., Swaminathan, H., &
Rogers, H. J. (1991). Fundamentals of item
response theory. Newbury Park, CA: Sage
Publications.

Harrison, P. M. C., Collins, T., & 
Müllensiefen, D. (2017). Applying 
modern psychometric techniques to 
melodic discrimination testing: Item 
response theory, computerised adaptive 
testing, and automatic item generation. 
Scientific Reports, 7(1), 1–19.
https://doi.org/10.1038/s41598-017-03586-z

Iskandar, A., & Rizal, M. (2018). Analisis 
kualitas soal di perguruan tinggi 
berbasis aplikasi TAP. Jurnal Penelitian 
Dan Evaluasi Pendidikan, 22(1), 12–23. 
https://doi.org/10.21831/pep.v22i1.15609

Jambulingam, T., Schellhorn, C., & Sharma, 
R. (2016). Using a Rasch model to rank 
big pharmaceutical firms by financial 
performance. Journal of Commercial 
Biotechnology, 22(1), 49–60. 
https://doi.org/10.5912/jcb734 

Lord, F. M., & Novick, M. R. (2008). Statistical 
theories of mental test scores. (F. Mosteller, 
Ed.). Reading, MA: Addison-Wesley. 

Mallinson. (2007). Rehabilitation institute of 
Chicago in rehabilitation research 
provides new insights. Atlanta, pp. 1–3. 

Mardapi, D. (2017). Pengukuran, penilaian, dan 
evaluasi pendidikan (2nd ed.). Yogyakarta: 
Parama Publishing. 

Ostini, R., & Nering, M. L. (2006). Polytomous 
item response theory models. Thousand 
Oaks, CA: SAGE Publications. 


Purnama, D. N. (2017). Characteristics and 
equation of accounting vocational 
theory trial test items for vocational 
high schools by subject-matter teachers’ 
forum. REiD (Research and Evaluation in 
Education), 3(2), 152–162.
https://doi.org/10.21831/reid.v3i2.18121

Reckase, M. D. (2009). Multidimensional item 
response theory (Statistics for social and 
behavioral sciences). New York, NY: 
Springer. 

Setiawati, F. A., Izzaty, R. E., & Hidayat, V. 
(2018a). Analisis respons butir pada tes 
bakat skolastik. Jurnal Psikologi, 17(1), 1–17.
https://doi.org/10.14710/jp.17.1.1-17

Setiawati, F. A., Izzaty, R. E., & Hidayat, V. 
(2018b). Items parameters of the space-
relations subtest using item response 
theory. Data in Brief, 19, 1785–1793. 
https://doi.org/10.1016/j.dib.2018.06.061

Sumintono, B., & Widhiarso, W. (2015). 
Aplikasi pemodelan Rasch pada asesmen 
pendidikan. Bandung: Trim Komunikata. 
Retrieved from
https://umexpert.um.edu.my/file/publication/00013268_127390.pdf

Trilling, B., & Fadel, C. (2009). 21st century 
skills: Learning for life in our times. San 
Francisco, CA: Jossey-Bass. 

van der Linden, W. J., & Hambleton, R. K. 
(1996). Handbook of modern item response 
theory. New York, NY: Springer 
Science+Business Media.
https://doi.org/10.1007/978-1-4757-2691-6

Young, D. J., Levy, F., Martin, N. C., & Hay, 
D. A. (2009). Attention deficit 
hyperactivity disorder: A Rasch analysis 
of the SWAN rating scale. Child 
Psychiatry and Human Development, 40(4),
543–559. https://doi.org/10.1007/s10578-009-0143-z

Zubaidah, S. (2017). Keterampilan abad ke-
21: Keterampilan yang diajarkan melalui 
pembelajaran. In Isu-Isu Strategis 
Pembelajaran MIPA Abad 21. Sintang, 
West Kalimantan: Program Studi 
Pendidikan Biologi STKIP Persada 
Khatulistiwa Sintang. Retrieved from
https://www.researchgate.net/publication/318013627_keterampilan_abad_ke-21_keterampilan_yang_diajarkan_melalui_pembelajaran