This is an open access article under the CC-BY-SA license.  

 
REiD (Research and Evaluation in Education), 6(1), 2020, 32-40 

Available online at: http://journal.uny.ac.id/index.php/reid 

 
Alternative item selection strategies for improving test security in 
computerized adaptive testing of the algorithm 

 
*Iwan Suhardi 
Faculty of Engineering, Universitas Negeri Makassar 

Jl. Daeng Tata Raya, Parang Tambung, Mannuruki, Tamalate, Kota Makassar, Sulawesi Selatan 
90224, Indonesia 

*Corresponding Author. E-mail: iwan.suhardi@unm.ac.id 

 
Submitted: 5 March 2020 | Revised: 21 April 2020 | Accepted: 29 April 2020 

 
Abstract 
One of the ability estimation methods that is widely applied to the Computerized Adaptive Testing (CAT) 
algorithm is the maximum likelihood estimation (MLE). However, the maximum likelihood method has 
the disadvantage of being unable to find a solution to the ability estimation of test-takers when the test 
takers’ scores do not have a pattern. If there are test takers who get either score of 0 or perfect score, then 
the abilities of test-takers are usually estimated using the step-size model. However, the step-size model 
often results in item exposure where certain items will appear more often than other items. This surely 
threatens the security of the test because items that often appear will be easier to recognize. This study 
tries to provide an alternative strategy by modifying the step-size model and randomizing the calculation 
results of the information function obtained. Based on the results of the study, it is found that alternative 
strategies for item selection can make more varied items appear to improve the security of tests on the 
CAT.   

Keywords: item selection strategy, item exposure, step-size, adaptive testing 

 
How to cite: Suhardi, I. (2020). Alternative item selection strategies for improving test 
security in computerized adaptive testing of the algorithm. REiD (Research and Evaluation in 
Education), 6(1), 32-40. doi:https://doi.org/10.21831/reid.v6i1.30508. 

 
Introduction  

The development of item response 
theory (IRT) and computer technology that is 
faster and in a large capacity allows the devel-
opment of computerized adaptive testing 
(CAT) (Haryanto, 2013, pp. 49–50). It is 
called “computerized” testing because the 
testing process no longer uses paper and pen-
cil, but rather uses a computer device. It is 
called “adaptive” testing because the items 
that appear are chosen in such a way and 
adjusted to the ability of the test takers in-
dependently. CAT is a test conducted for test-
takers where the items are determined based 
on the answers of the test takers (Winarno, 
2013, p. 577). The efficiency of CAT com-

pared to conventional testing models has 
been supported by several studies. The results 
of research by Eignor concluded that at the 
same level of measurement precision, adap-
tive tests only required a test length that was 
less than half of the computer-based test 
(CBT) device (Eignor, Stocking, Way, & 
Steffen, 1993; Grist, 1989, p. 2; Rudner, 1998, 
p. 2). McBride and Martin concluded that to 
achieve the same level of reliability, conven-
tional testing required 2.57 times more items 
than adaptive testing (McBride & Martin, 
1983). 

The method widely used to estimate the 
ability of test-takers is the maximum likeli-
hood estimation (MLE). The application of 

https://creativecommons.org/licenses/by-sa/4.0/deed.id
https://doi.org/10.21831/reid.v6i1.30508


https://doi.org/10.21831/reid.v6i1.30508 
Iwan Suhardi 

Copyright © 2020, REiD (Research and Evaluation in Education), 6(1), 2020 - 33 
ISSN: 2460-6995 (Online) 

the maximum likelihood method has the dis-
advantage of being unable to find a solution 
when there are test takers who get extreme 
scores where all answers are always incorrect 
or always correct. To overcome this problem, 
the step-size method is generally employed. 
However, the application of the MLE and 
step-size model often leads to item exposure, 
which is the frequent appearance of certain 
items given to test takers. Although CAT is 
more efficient and reliable, the security of this 
testing is not guaranteed because certain items 
appear repeatedly. The items are easily recog-
nized because they appear frequently, espe-
cially at the beginning of the item sequence. 
Therefore, modifications are needed to the 
conventional CAT algorithm to minimize the 
appearance of these easily noticeable items. 
The procedures that are commonly used in 
developing conventional CAT algorithms are 
elaborated as follows (Thissen, 1990). 

Starting CAT 

CAT generally starts with the selection 
of items with the difficulty level of moderate 
(Mills, 1999, p. 123; Santoso, 2010, p. 70; 
Vispoel, 1999). A test taker who answers in-
correctly will then be given items with the dif-
ficulty level of easy. Conversely, if test taker 
answers correctly, they will be given items 
with the difficulty level of hard. 

Estimating the Ability of the Test-Takers 

The method commonly used to esti-
mate the ability of test-takers is MLE (Baker, 
1992; Birnbaum, 1968). The estimation of the 
ability of test-takers using the maximum likeli-
hood method is calculated using the Newton-
Raphson iterative procedure (Hambleton & 
Swaminathan, 1985, p. 83). The Newton-
Raphson iterative procedure is performed first 
by subtracting the ratio of the first derivative 
to the second derivative from the initial   val-
ue so that it results in new . This procedure 
is repeated by using the new  and calculating 
the value of the new derivative ratio. The 
estimated value of  at (m + 1) iteration can 
be expressed using the iterative relation as 
presented in Formula (1). Meanwhile, the er-
ror value is a correction factor that is formu-
lated as seen in Formula (2), where u equals 1 

if student’s answer is correct and u equals 0 if 
student’s answer is incorrect. Besides, P is 
probability of participants answering the items 
correctly, which is obtained by Formula (3). 

 
 ……............. (1) 

 … (2) 

…. (3) 

The iteration process is stopped when 

the error value , with ε as limiting number 
whose value is very small. In this study, the ε 
value of 0.0001 was used. 

One problem with the application of 
the MLE method in adaptive testing is the 
inability of the MLE method to find solutions 
when there are test takers who get an extreme 
score, which is either a score of 0 or a perfect 
score. To overcome the problem of the in-
ability of the MLE method to estimate the 
ability of test-takers when their responses did 
not have a pattern, the step size method can 
be used (Dodd, 1990). Based on the step size 
method, the test taker's ability level is up-
graded or degraded by a certain constant as 
long as the test taker’s responses do not have 
a pattern, for example, by using a step size of 
0.5. 

Selection of the Next Item 

After the test taker’s ability is success-
fully estimated, the CAT algorithm will then 
select the next item. Lord recommended the 
use of the maximum item information proce-
dure to select the next item (Lord, 1977). This 
method guarantees a highly accurate estima-
tion of the ability of test-takers (Eignor et al., 
1993). Items that have the greatest informa-
tion function value on the ability of certain 
test takers are selected to be presented to 
them. The item information function is cal-
culated at each ability level with the equation 
in Formula (4) (Hambleton, Swaminathan, & 
Rogers, 1991, p. 107). 

 
…. (4) 

https://doi.org/10.21831/reid.v6i1.30508


https://doi.org/10.21831/reid.v6i1.30508 
Iwan Suhardi 

34 - Copyright © 2020, REiD (Research and Evaluation in Education), 6(1), 2020 
ISSN: 2460-6995 (Online) 

Formula (4) shows that the information 
value only depends on the characteristic value 
of item parameters (for example the values of 
b, a, and c for the 3PL model) and the level of 
ability (θ). Thus, for each ability level (θ), the 
information function contribution for each 
item in the question bank can be calculated. 

The test information function is the 
sum of the information functions of the test 
item and is written as in Formula (5). Mean-
while, the test information function illustrates 
the accuracy of the test set in estimating dif-
ferent levels of ability. The greater the infor-
mation at the given ability level, the more ac-
curate the ability is estimated from the test kit. 
The standard error of measurement (SEM) is 
expressed by the equation in Formula (6) 
(Hambleton & Swaminathan, 1985, p. 95). 

 
 .............................. (5) 

 ............................ (6) 

Termination of CAT 

CAT termination uses criteria of equal 
measurement precision and a fixed number of 
items. Equal measurement precision criteria 
aim to produce test scores with the same 
measurement error level for each test taker. 
The standard error of measurement is limited 
to 0.30, which is equivalent to reliability of 
91% on conventional tests (Thissen, 1990). 
By using the criteria, the number of items the 
test takers must work on can vary (where the 
number of items is not the same). However, 
to avoid the test process that may not be 
converging, the criterion of a fixed number of 
items is also used in the CAT termination 
rules by limiting the maximum items that ap-
pear, for example, as many as 20 items. 

Giving Score to the Ability of the Test-Takers 

The score of the ability estimation of 
the test-taker derives from the conversion of 
the value θ that is obtained by Formula (7). 

 
 ………………….. (7) 

In this study, the CATs assessment results, 
which were the conventional CAT model (by 

taking the information value of items or the 
largest I (θ)) and the alternative CAT model 
(by taking some of the largest I (θ) values, 
then taken randomly to determine the value 
of I (θ) that would be used), were compared. 
After that, the alternative CAT model was 
treated using the step-size method with an 
additional variable of response time when the 
test takers’ responses did not have a pattern. 

The assumption underlying the re-
sponse time variable is those test-takers who 
have a high level of ability will be able to 
answer the items correctly in a shorter time 
than those who have a lower level of ability. 
Lidia Martinez compared groups of test-takers 
who took a test using CBT and found that the 
groups that spent the shortest average time 
responding to the initial test item obtained a 
higher average score (Martinez, 2009). Phil 
Higgins’ research results showed that in CBT, 
if the item difficulty index was higher, then 
test-takers would need more time to answer 
and review the items (Higgins, 2009). This 
showed that the test taker’s response time in 
working on the items correctly correlated with 
the estimation of the test taker’s ability level. 

Method 

This study used a Research and Devel-
opment (R&D) approach. The study began 
with the development of a question bank to 
obtain 265 items based on the 1-parameter 
logistic item response theory (1PL IRT) mod-
el. Characteristics of items in the form of pa-
rameters of the difficulty level of 265 items 
were obtained from the validation of proces-
sed results using the BILOG-MG software, 
obtained from the response test using CBT. 
The total number of items before validation 
was originally 290 items. A summary of the 
question bank validation statistics developed 
and used in this study is presented in Table 1. 

Table 1. Summary of Item Statistics on 
Question Bank 

General 
Information 

Based on 1PL IRT 
Number of items = 265 items 

Criteria of Item 
Difficulty Index (b) 

Hard category   = 40 items 
Moderate category = 128 
items 
Easy category = 97 items 

https://doi.org/10.21831/reid.v6i1.30508


https://doi.org/10.21831/reid.v6i1.30508 
Iwan Suhardi 

Copyright © 2020, REiD (Research and Evaluation in Education), 6(1), 2020 - 35 
ISSN: 2460-6995 (Online) 

In the 1PL IRT model, the probability 
of a person with a certain ability (θ) answering 
the items correctly depends only on the 
difficulty level of the items (b). In this study, 
the estimation methods of the ability of test-
takers are the MLE and step-size methods. 

Next, two adaptive test designs devel-
oped were the conventional and the alter-
native CAT model. In this study, the develop-
ment of CAT software referred to the incre-
mental model (Pressman, 2001, pp. 35–36). 

In the conventional CAT model, the 
first item selection method employs a diffi-
culty level of moderate, starting with a range 
of b values from -0.5 to 0.5 chosen randomly. 
The ability level estimation is calculated using 
the MLE method. However, when the test-
takers’ responses have not had a pattern, their 
ability is estimated using the step size method 
with a value of 0.5. The next item that is se-
lected is the item that has the greatest infor-
mation function value on a particular ability. 

The alternative CAT model has the 
same principles as the conventional CAT 
model. The difference is in the selection of 
the second and subsequent items, which uses 
the principle of randomizing the value of the 
information function in the 5-4-3-2-1 pattern. 
The pattern rule of 5-4-3-2-1 used was that 
the second item was selected from one item 
randomly from the five items that had the 
largest information function, the third item 
was selected from one item randomly from 
the four items that had the largest infor-
mation function, the fourth item was selected 
from one item randomly from three items 
that had the largest information function, the 
fourth item was selected from one item ran-

domly from three items that had the largest 
information function, and the fifth item was 
selected from one item randomly from two 
items that had the largest information func-
tion. Meanwhile, for the sixth and subsequent 
items, the item selection criteria revert to the 
maximum information function criteria or re-
vert to the conventional CAT model. 

To estimate the ability of test-takers on 
the alternative CAT model when their re-
sponses have not had a pattern, a step-size 
method is used with the addition of the re-
sponse time variable. The test-takers’ esti-
mated initial ability level is selected at the abil-
ity level of θ0. Moreover, the step-size inter-
val changes constantly by k (where in this 
study, the value of k=0.5). If the test taker re-
sponds by answering incorrectly, the test-
taker’s estimated ability level becomes θ0 – 
k or equal to 0-0.5 = -0.5. Meanwhile, if the 
test taker answers correctly, the estimated 
ability level becomes θ0+x k or 0.5 . x, where 
x is a positive constant multiplier and the 
value depends on the category of students’ 
response time when their answer is correct. 

Table 2 shows a simulation procedure 
to estimate the test taker’s ability level with a 
step-size interval added to the response time 
factor. Test takers were given 300 seconds to 
respond to each item. If for more than 300 
seconds there is no response from test taker, 
the response is declared incorrect and easier-
level items will be displayed. In this study, the 
criterion for test termination is that the test is 
terminated if the SEM value has reached 0.30. 
An SEM value of 0.30 is equivalent to the 
reliability of 0.91 in conventional tests such as 
paper and pencil tests (Thissen, 1990). 

Table 2. Estimation of Ability of Test-Taker in the Response-Time-Based Step-Size Method 

Annotation: 

θ0    = Initial ability = 0 

k     = step size  =  0.5 

x     = constant multiplier 

θke-i = θi-1 + xk (for correct response) 

θke-i = θi-1 – k (for incorrect response) 

Responding with Correct 

Answer in Consecutive Times 

Responding with Incorrect 

Answer in Consecutive Times 

Item 1 Item 2 Item 3 Item 1 Item 2 Item 3 

θ1 θ2 θ3 θ1 θ2 θ3 

Very fast: x = 1.4 (≤  30 seconds) 0.7 1.4 2.1 -0.5 -1.0 -1.5 

Fast : x = 1.3 (31 to 60 seconds) 0.65 1.3 1.95 -0.5 -1.0 -1.5 

Moderate: x = 1.2 (61 to 90 seconds) 0.6 1.2 1.8 -0.5 -1.0 -1.5 

Slow : x = 1.1 (91 to 120 seconds) 0.55 1.1 1.65 -0.5 -1.0 -1.5 

Very slow : x = 1 (≥ 121 seconds) 0.5 1.0 1.5 -0.5 -1.0 -1.5 

https://doi.org/10.21831/reid.v6i1.30508


https://doi.org/10.21831/reid.v6i1.30508 
Iwan Suhardi 

36 - Copyright © 2020, REiD (Research and Evaluation in Education), 6(1), 2020 
ISSN: 2460-6995 (Online) 

Table 3. Testing Results of Conventional CAT Model when Responses of Answers Have Not 
Had Pattern Yet 

Item 1 Item 1 was taken randomly with the difficulty level of moderate (-0.5 ≤ b ≤ 0.5) 

Item 
Test Takers’ Responses are Always Correct Test Takers’ Responses are Always Incorrect 

Value of θ List Number of Item Value of θ List Number of Item 

Item 2 - 0.5 209 0.5 275 
Item 3 - 1 164 1 081 
Item 4 - 1.5 113 1.5 002 
Item 5 - 2 044 2 091 
Item 6 -  2.5 237 2.5 115 

 
Findings and Discussion 

Before the answers have a pattern, the 
conventional CAT model will use the step-
size method with an interval of 0.5. This 
means that if the test taker always responds 
with the correct answer, then the second and 
subsequent items that will appear are items 
that have the largest information function 
value at the ability level (θ) of 0.5, 1, 1.5, 2, 
2.5, and 3 respectively. Meanwhile, for test-
takers who always respond with the incorrect 
answer, the second and subsequent items that 
will appear are items that have the largest 
information function value at θ of -0.5, -1, -
1.5, -2, -2.5, and -3 respectively. The results 
that were obtained in the conventional CAT 
model are summarized in Table 3. 

From the results of the study, it was 
found that items with list numbers of 209, 
164, 113, 044, 237, 275, 081, 002, 091, and 
115 were items that appeared more often than 
other items. The items that often appear will 
make the security of the test in the conven-
tional CAT model degrade because they may 
be items that have been recognized by the test 
takers. 

From the results of conventional CAT 
model testing, it was found that the number 
of items with difficulty index of moderate, 
which was indicated by the difficulty index 
value (b) ranging from -0.5 to +0.5, was 128 
items. This meant that the probability of the 
first item having a chance to appear was 128 
items chosen randomly. This was indeed in 
accordance with the criteria applied to the 
conventional CAT model design algorithm, 
that the initially selected items were items 
with difficulty index of moderate (-0.5 to 
+0.5). 

After the first item displayed and was 
responded by the test taker, the second item 

was presented by using the step-size method. 
This meant that if students responded to the 
item with the correct answer, then the second 
item displayed was the item with maximum 
information for θ = 0.5. However, if students 
responded to items with incorrect answers, 
then the second item that was displayed was 
an item with maximum information for θ = -
0.5. Thus, it was certain that in the conven-
tional CAT, the second item only consisted of 
the possibility of 1 of 2 items only. In this 
study, the second item presented was question 
item number 275 (if the answer was correct) 
and question item number 209 (if the answer 
was incorrect). The frequent appearance of 
item number 275 and item number 209 made 
the security of CAT threatened due to the 
familiarity with the question. 

Another case that also often arises is 
that there has not been a pattern in students’ 
answers so that the step-size method is used. 
For example, if students answered questions 
correctly, the items that would appear were 
questions that had a maximum information 
value for θ = 0.5, 1.0, 1.5, 2.0, and 2.5, which 
were the second item whose item number was 
275, the third item whose item number was 
081, the fourth item whose item number was 
002, the fifth item whose item number was 
091, and the sixth item whose item number 
was 115. 

However, if students always answered 
the question incorrectly, then the item that 
appeared was questions that had a maximum 
information value for θ = -0.5, -1.0, -1.5, -2.0, 
and -2.5, i.e., the second item with item 
number 209, third item with item number 
164, fourth item with item number 113, fifth 
item with item number 044, and sixth item 
with item number 237. In the conventional 
CAT model, if the responses of the test takers 

https://doi.org/10.21831/reid.v6i1.30508


https://doi.org/10.21831/reid.v6i1.30508 
Iwan Suhardi 

Copyright © 2020, REiD (Research and Evaluation in Education), 6(1), 2020 - 37 
ISSN: 2460-6995 (Online) 

have the same pattern, then the items that 
appear will also be the same. This is what 
makes the security level of the conventional 
CAT model suboptimal. 

If students’ responses already had pat-
terns (where the responses already consisted 
of correct and incorrect answers), then the 
items that appeared next had been quite 
varied because the first item that appeared 
already had a relatively large variety of items 
(128 items). However, by using the maximum 
information function value model to search 
for items that corresponded to the estimated 
level of test-takers’ abilities, it was very pos-
sible that many items could not be presented 
because they never obtained the maximum 
function value for each level of ability. 

The alternative solution proposed was 
to use the step-size method based on the stu-
dent’s response time in answering correctly. 
Student responses were grouped into groups 
based on the time spent by students in an-
swering the questions correctly. In the step-
size method based on response time, the step-
size value formula was given an additional 
constant multiplier based on the response 
time. The faster the students answered cor-
rectly, the greater the constant multiplier 
became. 

An additional solution proposed was to 
randomize the maximum information func-
tion value. If the conventional CAT model 
determined the items that appeared based on 
the value of the (single) maximum informa-

tion function, then the alternative CAT model 
determined the items that appeared by ran-
domizing the maximum information function 
values based on groups of 5–4–3–1–1. For 
example, one of the results of testing the 
alternative CAT model is presented in Table 
4. 

From Table 4, the calculation procedure 
for the alternative CAT model can be ob-
served. From the table, it can be seen that the 
items that appear in the alternative CAT 
model are more varied compared to those in 
the conventional CAT model. The algorithm-
ic procedure in the alternative CAT model 
can be explained as follows. 

The First Item that Appeared was Item 
Number 239 with b = -0.416 

The first item appeared in accordance 
with the criteria that items were taken ran-
domly with a difficulty index of moderate 
whose b value ranged from -0.5 to 0.5. Item 
number 239 fulfilled the criteria. Because 
students’ answers did not have a pattern, the 
method of estimating the ability level was the 
step-size of 0.5. Students’ answers were de-
clared correct (value 1). The time that was 
spent to work on the first item was 34 sec-
onds, so it was included in the fast category 
(between 31 and 60 seconds) with a multiplier 
factor = 1.3. Thus, the value of θ was 0.5 x 
1.3 = 0.64. 

Table 4. Results of Alternative CAT Model Testing 

No. Item b Response Time (second) θ IIF TIF SEM 

1 239 -0.416 1 34 0.65 0.7224 0.7224 1.18 
2 182 0.662 1 40 1.3 0.7223 1.4447 0.83 
3 192 1.32 0 8 1.1809 0.7225 2.1672 0.68 
4 042 1.181 0 49 0.8579 0.7225 2.8897 0.59 
5 132 0.861 1 20 1.3204 0.7225 3.6122 0.53 
6 192 1.32 0 26 1.1161 0.7225 4.3347 0.48 
7 152 1.119 1 10 1.5224 0.7225 5.0572 0.44 
8 002 1.524 0 14 1.3846 0.7224 5.7796 0.42 
9 161 1.396 0 7 1.2399 0.7224 6.502 0.39 
10 013 1.251 1 9 1.5831 0.7225 7.2245 0.37 
11 127 1.579 1 15 1.9486 0.7217 7.9462 0.35 
12 060 1.987 0 12 1.8848 0.7223 8.6685 0.34 
13 062 1.867 0 17 1.8118 0.7222 9.3907 0.33 
14 163 1.787 0 19 1.7339 0.7214 10.1121 0.31 
15 124 1.687 1 14 2.0656 0.7214 10.8335 0.3 

 
https://doi.org/10.21831/reid.v6i1.30508


https://doi.org/10.21831/reid.v6i1.30508 
Iwan Suhardi 

38 - Copyright © 2020, REiD (Research and Evaluation in Education), 6(1), 2020 
ISSN: 2460-6995 (Online) 

The Second Item that Appeared was Item 
Number 182 with b = 0.662 

The second item appeared because it 
had the five largest information function val-
ues at the value of θ = 0.65, according to the 
use of randomization with the principle of 5–
4–3–2–1. From the five alternative values of 
the largest information function (see Table 5), 
the item with number 182 was selected ran-
domly. The second item was answered cor-
rectly (then the response value was 1). Be-
cause students' answers did not have a pat-
tern, the method of determining the estimated 
ability level was the step-size of 0.5. The item 
was done in 40 seconds and included in the 
fast category (between 31 to 60 seconds) with 
a multiplier factor of 1.3. Thus, the value of θ 
= 0.65 + (0.5 x 1.3) = 1.3. 

Table 5. The Five Alternative Values of the 
Largest Information Function 

Rank Information Function Item b 

1 0.722495 153 0.647 

2 0.722492 274 0.654 
3 0.722474 202 0.643 
4 0.722425 182 0.662 
5 0.721861 003 0.685 

The Third Item that Appeared was Item 
Number 192 with b = 1.32 

The third item appeared because it had 
the four largest information function values at 
the value θ = 1.3 according to the use of ran-
domization with the principle of 5–4–3 –2–1. 
Of the four alternative values for the largest 
information function (see Table 6), the item 
with number 192 was randomly selected. The 
third item was responded with an incorrect 
answer (so the response value was 0). Because 
students’ answers did not have a pattern, the 
method for estimating the level of ability was 
MLE. The value of θ obtained was = 1.1809. 

Table 6. The Four Alternative Values for the 
Largest Information Function 

Rank Information Function Item b 

1 0.722349 053 1.317 

2 0.722291 192 1.32 

3 0.72227 179 1.321 

4 0.722091 145 1.272 

The Fourth Item that Appeared was Item 
Number 042 with b = 1.181 

The fourth item appeared because it 
had the three largest information function val-
ues at the value θ = 1.1809 according to the 
use of randomization with the principle of 5–
4–3 –2–1. Of the three alternative values for 
the largest information function (see Table 7), 
item with number 042 was randomly selected. 
The fourth item was responded with an in-
correct answer (so the response value was 0). 
Because students’ answers did not have a pat-
tern, the method for estimating the level of 
ability was MLE. The value of θ obtained was 
= 0.8579. 

Table 7. The Three Alternative Values for the 
Largest Information Function 

Rank Information Function Item b 

1 0.7225 042 1.181 
2 0.7225 057 1.181 
3 0.722449 021 1.171 

The Fifth Item that Appeared was Item 
Number 132 with b = 0.861 

The fifth item appeared because it had 
the two largest information function values at 
the value θ = 0.8579 according to the use of 
randomization with the principle of 5–4–3 –
2–1. Of the two alternative values for the 
largest information function (see Table 8), the 
item with number 132 was randomly selected. 
The fifth item was responded with the correct 
answer (so the response value was 1). Because 
students’ answers did not have a pattern, the 
method for estimating the level of ability was 
MLE. The value of θ obtained was = 1.3204. 

Table 8. The Two Alternative Values for the 
Largest Information Function 

Rank Information Function Item b 

1 0.722495 132 0.861 

2 0.722474 242 0.865 

The Sixth Item that Appeared was Item 
Number 192 with b = 1.32 

This sixth item appeared because it had 
one largest information function value at the 
value θ = 1.32 according to the use of ran-

https://doi.org/10.21831/reid.v6i1.30508


https://doi.org/10.21831/reid.v6i1.30508 
Iwan Suhardi 

Copyright © 2020, REiD (Research and Evaluation in Education), 6(1), 2020 - 39 
ISSN: 2460-6995 (Online) 

domization with the principle of 5–4–3 –2–1. 
Of the one alternative value for the largest 
information function (see Table 9), the item 
with number 192 was randomly selected. The 
sixth item was responded with an incorrect 
answer (so the response value was 0). Because 
students’ answers were patterned, the method 
for estimating the level of ability was MLE. 
The value of θ obtained was = 1.1161. 

Table 9. The One Largest Information 
Function Value 

Rank Information Function Item b 

1 0.7225 192 1.32 

 
The subsequent items (i.e. the seventh 

to fifteenth items) used the same method to 
determine the item that had the largest infor-
mation function at its value of θ. The fifteenth 
item became the last item because the crite-
rion for termination rule had been met (SEM 
= 0.3). It was converted to a numerical value 
of 85. 

This alternative CAT model has been 
proven to be able to overcome a fundamental 
shortcoming in the conventional CAT model, 
which was the frequent appearance of certain 
items. From Table 3, it can be seen that in the 
conventional CAT model, several similar i-
tems would appear, especially in the initial 
patterns of CAT execution. Meanwhile, in 
Table 4, there were many variations on the 
possible items that appeared on the alternative 
CAT model, even though the patterns of stu-
dents’ answers were the same. The many vari-
ations of items that appear in the alternative 
CAT model can reduce the level of item ex-
posure on CAT so that it will make the CAT 
more secure. The item variations that appear-
ed in the alternative CAT model actually had 
item difficulty index that was not much differ-
ent from those that appeared in the conven-
tional CAT model, so it did not increase the 
test length or reduce the efficiency of the esti-
mation of the ability of the test takers. 

Conclusion 

From the results of this study, it can be 
concluded that the alternative CAT model 
was able to decrease the level of item expo-

sure on the CAT, thereby increasing the secu-
rity of the CAT without increasing the test 
length or reducing the efficiency of the CAT. 
The strategy adopted by the alternative CAT 
model was to select items using the step-size 
method based on response time and random-
ization of the maximum information function 
value with the criteria of 5–4–3–1–1 by ap-
plying the maximum likelihood estimation 
(MLE) to estimate the ability level of the test 
takers. The strategy has been proven to be 
able to present items with more variations, 
but still with item difficulty index which was 
not much different in the response patterns of 
the same test takers. 

References 

Baker, F. B. (1992). Item response theory: 
Parameter estimation techniques. New York, 
NY: Marcel Dekker. 

Birnbaum, A. (1968). Some latent trait models 
and their uses in inferring an examinee’s 
ability. In F. M. Lord & M. R. Novick 
(Eds.), Statistical theories of mental rest scores 
(pp. 397–479). Reading, MA: Addison-
Wesley. 

Dodd, B. G. (1990). The effect of item 
selection procedure and stepsize on 
computerized adaptive attitude 
measurement using the rating scale 
model. Applied Psychological Measurement, 
14(4), 355–366. https://doi.org/ 
10.1177/014662169001400403 

Eignor, D. R., Stocking, M. L., Way, W. D., & 
Steffen, M. (1993). Case studies in 
computer adaptive test design through 
simulation. https://doi.org/10.1002/ 
j.2333-8504.1993.tb01567.x 

Grist, S. (1989). Computerized adaptive tests. 
In ERIC Digest No. 107. Retrieved from 
https://files.eric.ed.gov/fulltext/ED31
5425.pdf 

Hambleton, R. K., & Swaminathan, H. (1985). 
Item response theory: Principles and 
applications. Boston, MA: Kluwer 
Nijhoff. 

Hambleton, R. K., Swaminathan, H., & 
Rogers, H. J. (1991). Fundamentals of item 

https://doi.org/10.21831/reid.v6i1.30508


https://doi.org/10.21831/reid.v6i1.30508 
Iwan Suhardi 

40 - Copyright © 2020, REiD (Research and Evaluation in Education), 6(1), 2020 
ISSN: 2460-6995 (Online) 

response theory. Newbury Park, CA: Sage 
Publications. 

Haryanto, H. (2013). Pengembangan 
computerized adaptive testing (CAT) 
dengan algoritma logika Fuzzy. Jurnal 
Penelitian Dan Evaluasi Pendidikan, 15(1), 
47–70. https://doi.org/10.21831/ 
pep.v15i1.1087 

Higgins, P. (2009). Candidate measured ability 
and use of time. Retrieved from 
https://www.rasch.org/mra/mra-10-
09.htm 

Lord, Frederic M. (1977). A broad-range 
tailored test of verbal ability. Applied 
Psychological Measurement, 1(1), 95–100. 
https://doi.org/10.1177/01466216770
0100115 

Martinez, L. (2009). Time usage and candidate 
performance. Retrieved from http://www. 
rasch.org/mra/mra-06-09.htm 

McBride, J. R., & Martin, J. T. (1983). 
Reliability and validity of adaptive 
ability tests in a military setting. In D. J. 
Weiss (Ed.), New horizons in testing: Latent 
trait test theory and computerized adaptive 
testing (pp. 224–236). New York, NY: 
Academic Press. 

Mills, C. N. (1999). Development and 
introduction of a computer adaptive 
Graduate Record Examinations 
General Test. In F. Drasgow & J. B. 
Olson-Buchanan (Eds.), Innovations in 
computerized assessment (pp. 117–135). 
Mahwah, NJ: Lawrence Erlbaum 
Associates. 

Pressman, R. S. (2001). Software engineering: A 
practitioner’s approach (5th ed.). New 
York, NY: McGraw-Hill Higher 
Education. 

Rudner, L. M. (1998). An on-line, interactive, 
computer adaptive testing tutorial. Retrieved 
from http://edres.org/scripts/cat 

Santoso, A. (2010). Pengembangan 
computerized adaptive testing untuk 
mengukur hasil belajar mahasiswa 
Universitas Terbuka. Jurnal Penelitian 
Dan Evaluasi Pendidikan, 14(1), 62–83. 
https://doi.org/10.21831/pep.v14i1.19
76 

Thissen, D. (1990). Reliability and 
measurement precision. In H. Wainer, 
N. J. Dorans, R. Flaugher, B. F. Green, 
R. J. Mislevy, L. Steinberg, & D. 
Thissen (Eds.), Computerized adaptive 
testing: A primer (2nd ed., pp. 161–186). 
Hillsdale, NJ: Erlbaum. 

Vispoel, W. P. (1999). Creating computerized 
adaptive tests of music aptitude: 
Problems, solutions, and future 
directions. In F. Drasgow & J. B. 
Olson-Buchanan (Eds.), Innovations in 
computerized assessment (pp. 151–176). 
Mahwah, NJ: Lawrence Erlbaum 
Associates. 

Winarno, W. (2013). Pengembangan 
computerized adaptive testing (CAT) 
menggunakan metode pohon segitiga 
keputusan. Jurnal Penelitian Dan Evaluasi 
Pendidikan, 16(2), 574–592. https://doi. 
org/10.21831/pep.v16i2.1132 

 
https://doi.org/10.21831/reid.v6i1.30508