A comparative analysis of pre-equating and post-equating in a large-scale assessment, high stakes examination

Dibu Ojerinde, Dibu65ojerinde@yahoo.com, Joint Admissions and Matriculation Board (JAMB), Abuja, Nigeria
Omokunmi Popoola, kunmipopoola@yahoo.com, Joint Admissions and Matriculation Board (JAMB), Abuja, Nigeria
Patrick Onyeneho, patrickonyeneho@yahoo.co.uk, Joint Admissions and Matriculation Board (JAMB), Abuja, Nigeria
Aminat Egberongbe, amiegberongbe@yahoo.com, Joint Admissions and Matriculation Board (JAMB), Abuja, Nigeria

DOI: http://dx.doi.org/10.18820/2519593X/pie.v34i4.6
ISSN 0258-2236, e-ISSN 2519-593X
Perspectives in Education 2016 34(4): 79-98
© UV/UFS

Abstract
The statistical procedure used to adjust for differences in difficulty across test forms is known as "equating". Equating makes it possible for various test forms to be used interchangeably. In terms of where the equating fits in the assessment cycle, there are pre-equating and post-equating methods. The major benefits of pre-equating are that it facilitates the operational processes of examination bodies in terms of rapid score reporting, quality control and flexibility in the assessment process. The purpose of this study is to ascertain whether pre- and post-equating results are comparable. Data for this study, which adopted an equivalent-groups design, were taken from the 2012 Unified Tertiary Matriculation Examination (UTME) pre-test and the 2013 UTME post-test in the Use of English (UOE) subject. A pre-equating model using the 3-parameter logistic (3PL) Item Response Theory (IRT) model was used, and IRT software was used for the item calibration. Pre- and post-equating were carried out using 100 items per test form in the UOE test. The results indicate that the raw-score and ability estimates from the pre-equated model and the post-equated model were comparable.

Keywords: pre-test, post-test, equating, ability estimates, equivalent group design

1. Introduction
Developments in the education, psychology and statistics communities have greatly assisted researchers in assessment through the rapidly growing statistical and psychometric methodologies used in test equating. In large-scale examinations such as the Unified Tertiary Matriculation Examination (UTME), where candidates' scores are used for high-stakes decisions, testing programmes require new versions of tests to be produced continually. The expectation is that the tests produced should be equivalent in difficulty as well as in functionality over time. The UTME is a computer-based test (CBT) conducted by the Joint Admissions and Matriculation Board (JAMB) for the purpose of selecting qualified candidates for admission into Nigerian tertiary institutions. The examination, which comprises 23 subjects including the UOE, is conducted at different times within a specified period of 14 days for over 1.5 million candidates. The UTME is compulsory for any candidate seeking admission into any tertiary institution in Nigeria. It is therefore a high-stakes test, since results obtained from this examination are used in making important decisions about the candidates. Since the examination is conducted at different times and on different days, using several test forms in 23 subject areas, equating of the test forms is necessary.
Equating is therefore a statistical procedure used to adjust scores on two or more test forms so that the resulting forms can be compared. In support of this assertion, Livingston (2004) defined equating as a statistical procedure that adjusts test scores for the difficulty of the items. Equating as a statistical process refers to the derivation of transformations that place scores from different forms of a test onto a common scale such that, after transformation, the scores on the resulting forms are comparable. This definition is close to that of Kolen and Brennan (2004), who regard equating as a process used to adjust scores on two or more test forms such that the scores can be used interchangeably. Equating is an important component of any testing programme that produces more than one form of a test. It places scores from different forms onto a single scale and, once scores are placed on a single scale, the scores are interchangeable (Kolen & Brennan, 2004; Holland & Dorans, 2006). This permits standardisation of scores across test forms such that what is applied to one test form is also applied to the other forms, enabling consistency and accuracy in classification decisions across forms. It is for this reason that equating has become essential to testing programmes that use test scores for the measurement of students' growth as well as for high-stakes decisions. In the UTME, pre-equating is used to establish a conversion table prior to the operational testing. Kirkpatrick and Way (2008) affirmed that a series of advantages arise from the use of pre-equating over post-equating; top of the list are a more flexible assessment process and a better quality-control check for the tests. Generally, what equating does is adjust for score differences that are due to differences in form difficulty. Ideally, the same group of test takers would take the new form as well as the reference form at the same time; the difference in average performance on the two forms would then indicate the difference in form difficulty, and scores on the new form could be statistically adjusted to make the average performances on both forms equivalent. In practice, however, test takers cannot be compelled to take two different tests at the same time; it is more convenient to have two different groups of test takers take the two forms of the test, either at the same time or on two different occasions. Because these two groups could have different average abilities, Xuan and Rochelle (2011) are of the opinion that the difference in average performance on the two forms could reflect both group ability differences and form difficulty differences. Equating may be classified as pre-equating or post-equating depending on when the equating is conducted. Pre-equating, according to Tong, Wu and Xu (2008), is equating conducted prior to the operational testing, while post-equating involves conducting equating after the operational testing. In their paper, they stated that both pre-equating and post-equating are used in K-12 large-scale assessment programmes. In many large-scale, high-stakes examinations such as the UTME, where immediate reporting of scores is required, pre-equating is often preferred to post-equating, since the equating transformation must be produced within a rather short period of time.
Every prospective UTME candidate is expected to register for four UTME subjects, including the UOE. The subjects are selected based on faculty and course requirements. Normalised scores, based on Z-score and T-score transformations of the raw scores, are reported for the four subjects for each candidate. No other form of equating is carried out, since the equating has been done prior to test administration. The UTME results are used solely by the Joint Admissions and Matriculation Board (JAMB) and the tertiary institutions in Nigeria as an entrance examination for selecting eligible candidates into the various programmes/courses offered by the institutions. The computer-based testing administered by JAMB takes place at different times and on different dates, and so several forms of the same test are required in each session in order to forestall over-exposure of the items in the item bank. This is a strategy for curbing incidences of examination security breaches. Since immediate score reporting is needed, all test forms for all the subjects are pre-equated in order to make them equivalent. This is to ensure that no candidate is in any way placed at a disadvantage because of the particular form administered. When embarking on equating, care must be exercised in order to avoid equating errors. If equating errors exceed some tolerable limit as a result of applying pre-equating, multidimensionality is a likely cause. The probable cause of pre-equating error is the presence of bias in the item parameter estimates caused by violation of the assumption of local item independence (Kolen & Brennan, 2004). Guarding against serious equating errors by ensuring that model assumptions are complied with to a reasonable extent adds value to the final equating results.

2. Statement of the problem
In many large-scale, high-stakes assessment enterprises such as the UTME, stakeholders need assessment evidence as quickly as possible to enable them to make informed decisions relating to admissions or other policy issues. The nature of the UTME makes it pertinent to release candidates' results as quickly as possible in order to meet score-reporting deadlines. To facilitate this, test items are often calibrated prior to the operational administration, with the raw score to scale score conversion tables prepared well ahead of the test administration to ease problems that impede quick reporting. The use of different forms of the same test for assessment often raises the issue of the comparability of test scores across forms. In order to use the scores from different forms of a test interchangeably, they must be put on a common scale. The problem is how to make the several test forms, which consist of different test items drawn from the same content areas of the syllabus, psychometrically equivalent, so that whichever form is given to a candidate, s/he will not in any way be disadvantaged.

3. Purpose of the study
Measurement equivalence is said to exist when candidates with the same standing on the latent trait have the same expected raw or true score at the item level. Raju, Laffitte and Byrne (2002: 517) inferred that without measurement equivalence, it is difficult to interpret observed mean score differences meaningfully.
The purpose of this study, therefore, is to compare pre-equating and post-equating scores of candidates in the UTME high-stakes examination in order to ascertain whether the tests function the same way in a field-test administration as in an operational test administration.

4. Literature review
While some researchers hold varied views regarding the efficacy of pre-equating in a high-stakes examination, several studies suggest that pre-equating can achieve satisfactory results. For instance, a study by Livingston (2004), which adopted a method similar to regression, demonstrated that pre-equating was highly accurate in three of the four New Jersey College Basic Skills Placement tests. Studies have also shown that there is a dearth of literature on post-equating. Kirkpatrick and Way (2008), however, noted that in post-equating, new operational data can be obtained for items selected from the calibrated item pool. They explained that item parameters are estimated from the operational data, and operational items are post-equated using the pool (old) and current (new) item parameters together with a scale transformation procedure. If new field-test items were administered with the operational items, this transformation can be applied to their calibration results as well. Furthermore, two of the most recent studies, by Domaleski (2006) and Tong et al. (2008), supported the use of pre-equating, reporting similar pre- and post-equated scoring tables and similar accuracy in classifying students into different performance levels. Apart from these differing research findings about pre-equating, a literature review indicates that little research has been conducted on whether pre-equating agrees with post-equating for a testlet-based, computer-administered testing programme. What is more, given the controversial views on the use of pre-equating and the appealing features that pre-equating can offer, more research is clearly needed in this area. To this end, this study, which employed empirical data, investigates whether the pre-equating results agree with the equating results based on operational data (post-equating). The study examined the degree to which the IRT pre-equating results agreed with those from IRT post-equating and the degree to which the two equating designs agree with each other. Since pre-equating establishes a conversion table prior to the operational testing, a series of advantages often arise from the use of pre-equating over post-equating (Kolen & Brennan, 2004; Kirkpatrick & Way, 2008). These advantages include a more flexible assessment process, a better quality-control check for the tests and the ability to report scores immediately after the test administration.

5. Equating designs and equating method
This research is based on the equivalent-groups equating design. The UTME is a high-stakes standardised test in which each subject paper is made up of 100 items. Twenty-three subjects are tested, but candidates are only allowed to choose four subjects according to faculty and departmental requirements. The Use of English (UOE) subject is compulsory for all candidates, and all the tests are administered via a computer-based testing mode using the linear-on-the-fly testing (LOFT) method.
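To make the delivery design described above concrete, the sketch below illustrates, in a highly simplified way, how blueprint-constrained parallel forms might be assembled from a calibrated item bank and spiralled across candidates so that the groups taking each form are approximately randomly equivalent. The bank contents, blueprint weights and function names are hypothetical illustrations and do not represent JAMB's actual delivery system.

```python
import random

# Hypothetical calibrated item bank, keyed by syllabus content area.
item_bank = {
    "comprehension": [f"COMP_{i:03d}" for i in range(1, 201)],
    "lexis_structure": [f"LEX_{i:03d}" for i in range(1, 201)],
    "oral_forms": [f"ORAL_{i:03d}" for i in range(1, 201)],
}

# Hypothetical blueprint: items drawn per content area for a 100-item form.
blueprint = {"comprehension": 40, "lexis_structure": 40, "oral_forms": 20}

def assemble_form(bank, blueprint, seed):
    """Assemble one linear form by sampling the blueprint quota from each
    content area of the bank, then shuffling the item order."""
    rng = random.Random(seed)
    form = []
    for area, n_items in blueprint.items():
        form.extend(rng.sample(bank[area], n_items))
    rng.shuffle(form)
    return form

# Build several parallel forms and spiral them across candidates so that the
# group taking each form is approximately randomly equivalent.
forms = {name: assemble_form(item_bank, blueprint, seed)
         for seed, name in enumerate(["C1", "C2", "C3", "C4"])}
candidates = [f"cand_{i:04d}" for i in range(12)]
form_names = list(forms)
assignment = {cand: form_names[i % len(form_names)] for i, cand in enumerate(candidates)}

print({name: len(items) for name, items in forms.items()})  # 100 items per form
print(assignment)  # candidates spiralled over the four forms
```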
In the UOE test, test forms C1, C2, C3 and C4 were created, each taking into account the sub-sections of the syllabus and the weights stated in the UTME syllabus. In so doing, more than one parallel form was created. These items were trial-tested in 2012 and used in creating the tests administered in the subsequent operational examination. The UTME therefore contains many versions of the same test (test forms) created from the same rational content domain as stored in the JAMB item banks. The test forms were built to be equivalent in terms of content and psychometric properties. For example, test form C1 in UOE from the trial-test was taken as the reference form, while forms C2, C3 and C4 were made equivalent and taken as the focal forms for the pre-equating. Data in these test forms were organised such that the item difficulties (b) had a distribution with a mean of 0 and the discrimination parameters (a) varied between 1 and 2. Test scores on the different forms of the 2013 post-operational examination were also equated using a common reference form, D1, by adjusting the test score difficulties of the other three test forms D2, D3 and D4 respectively. The 3-parameter logistic IRT model was used for the item analysis of the eight UOE test forms C1, C2, C3, C4, D1, D2, D3 and D4.

6. Data
Data for the study were extracted from the UTME master file after the post-test administration as well as from the trial-test. The trial-test data consist of responses from a representative sample of Senior Secondary Class III students in the Use of English subject and, indeed, in all other 22 UTME subjects. The students were administered the various test forms in a classroom setting at a period when they were psychologically ready for their senior secondary examination. The tests were administered to students in a scrambled form so that the groups of students taking each form were randomly equivalent. A pre-equating model, which employed the 3-parameter logistic IRT model, was used. The Xcalibre 4.0.0 software was used because of the necessity to have scoring tables prior to test administration. In this study, the item parameter estimates and the raw score to theta relationship (i.e., the scoring table) for the pre-equating model were calibrated and developed on the field-test data. To enable a comparison of the equating results from pre- and post-equating, post-administration data for the four UOE test forms of the 2012 field test and post-administration data for the four UOE test forms of the 2013 CBT were used. Each test form was taken by a sample of approximately 650 candidates; in all, the data comprise 5,166 responses.

7. IRT pre-equating
Tong et al. (2008) defined pre-equating as conducting equating prior to the operational testing. The equating design used in pre-equating the UOE items was the IRT equivalent-groups equating procedure. In order to pre-equate the test forms in the 2012 UOE, the response data collected during the 2012 field test were first calibrated. Then, one of the test forms comprising response data from the trial-test was calibrated using a priori information from previous operational data. Thereafter, the pre-test items were put on the same scale as the form calibrated using information from the operational items, through the mean/sigma method.
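The mean/sigma method referred to above can be pictured with a minimal sketch. The difficulty and discrimination values below are hypothetical; the procedure simply derives a slope from the ratio of the standard deviations of the b-parameter estimates and an intercept from their means, and is shown only as an illustration of the general technique, not as the exact calibration workflow used operationally.

```python
import numpy as np

def mean_sigma_constants(b_ref, b_new):
    """Mean/sigma scaling constants that place parameter estimates from a
    new calibration onto the reference theta scale."""
    b_ref, b_new = np.asarray(b_ref), np.asarray(b_new)
    A = np.std(b_ref, ddof=1) / np.std(b_new, ddof=1)   # slope
    B = np.mean(b_ref) - A * np.mean(b_new)             # intercept
    return A, B

def rescale_3pl(a, b, c, A, B):
    """Linear scale transformation of 3PL item parameters:
    b* = A*b + B, a* = a/A, c is unchanged."""
    a, b, c = map(np.asarray, (a, b, c))
    return a / A, A * b + B, c

# Hypothetical difficulty estimates for the same item set expressed on the
# reference scale and on the new form's (separately calibrated) scale.
b_ref = np.array([-1.2, -0.4, 0.1, 0.8, 1.5])
b_new = np.array([-0.9, -0.1, 0.4, 1.1, 1.9])
A, B = mean_sigma_constants(b_ref, b_new)

# Rescale the new form's full 3PL parameter set onto the reference scale.
a_new = np.array([1.1, 0.9, 1.3, 1.0, 1.2])
c_new = np.array([0.20, 0.22, 0.18, 0.25, 0.21])
a_star, b_star, c_star = rescale_3pl(a_new, b_new, c_new, A, B)

print(f"A = {A:.3f}, B = {B:.3f}")
print("rescaled b:", np.round(b_star, 3))
```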
The item parameter estimates from the scaling step above were then used to create the raw-to-scale conversion table linking each form to the reference form using IRT pre-equating. The pre-equating process was carried out by applying the following procedures:
a. Estimates of item parameters were produced using the three-parameter IRT model on the 2012 trial-test data.
b. The item parameters were placed onto the reference scale using the equivalent-groups equating design.
c. Some items were selected from the item bank and used, along with some pre-test items, to build new, parallel test forms.
d. A raw score to theta relationship for these new test forms was developed using the trial-test pre-equated item parameters.
Despite the advantage of using pre-equating as a cushion where immediate score reporting is necessary and as a guard against examination security breaches, this equating method can be vulnerable to equating errors and bias in a test.

8. IRT post-equating
In carrying out post-equating, the post-administration item parameters and scoring table were produced using the operational data. During post-equating, all the rules used in pre-equating were replicated, such as applying the mean/sigma equating method to place the item parameter estimates and scoring tables on the same scale. The following steps, as suggested by Kolen and Brennan (2004), were applied during post-equating:
1. Calibrate all items on the operational test form, centring the post-operational item difficulties at a mean value of zero, and obtain the raw score to theta scoring table.
2. Obtain the mean test score difficulty using the post-administration item parameters from the previous stage.
3. Obtain the scaling constant for post-equating by subtracting the mean item difficulty from stage 2 from the mean item difficulty from pre-equating.
4. Adjust all the post-administration item parameters by adding the scaling constant obtained from stage 3.

9. Test calibration and analysis
A number of procedures can be used to achieve item calibration and item linking, such as separate calibration with linking, concurrent calibration or fixed parameter calibration. In this study, separate calibrations were carried out on all the test forms using the three-parameter logistic IRT model (3PL). The 3PL is an IRT model that specifies the probability of a correct response to a dichotomously scored multiple-choice item as a logistic function that introduces a guessing parameter in addition to the discrimination and difficulty parameters. Estimation of candidates' ability was done using the Maximum Likelihood Estimation (MLE) method. In statistics, MLE is a method of estimating the parameters of a statistical model, given observations, by finding the parameter values that maximise the likelihood (or probability) of making those observations. Thereafter, the mean/standard deviation (mean/sigma) method, as described by Livingston (2004), was used to place the item parameters on the same scale.

10. Assessment criteria
In assessing the pre-equating and post-equating results, one major area of concern is the item parameter estimates. In order to compare the item parameters of two or more test forms from post-equating, they must first be placed onto a common operational scale. Statistical methods such as correlation analysis can then be used to compare the differences between the item parameter estimates obtained from the two.
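As an illustration of such a comparison, the sketch below computes the two agreement indices used in this kind of pre/post contrast, the Pearson correlation and the average absolute difference between corresponding item parameter estimates, on hypothetical values already placed on a common scale.

```python
import numpy as np

def compare_item_parameters(pre, post):
    """Agreement indices for pre- versus post-equated item parameter
    estimates: Pearson correlation and average absolute difference."""
    pre, post = np.asarray(pre), np.asarray(post)
    r = np.corrcoef(pre, post)[0, 1]
    mad = np.mean(np.abs(pre - post))
    return r, mad

# Hypothetical pre-equated and post-equated difficulty estimates for one form.
pre_b  = np.array([-1.30, -0.52, 0.05, 0.61, 1.42, 2.10])
post_b = np.array([-1.25, -0.48, 0.11, 0.57, 1.50, 2.02])

r, mad = compare_item_parameters(pre_b, post_b)
print(f"correlation = {r:.3f}, average absolute difference = {mad:.3f}")
```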
Correlation coefficients are expected to be close to .90 and the average absolute differences between estimates are expected to be below 0.20. The same criteria may be applied when comparing pre- and post-equating results. It is also important and interesting to observe how different the raw-score-to-theta scoring tables are in the pre-post contrast. In the large-scale assessment context, classification decisions are also important, so in this study the percentages of students in each of the performance levels are also contrasted between pre- and post-equating. Another reliability index examined is classification accuracy, which is meant to establish what percentage of students was accurately classified. The classification method adopted by Gao, He and Ruan (2012) was applied to compute the classification accuracy index for the pre- and post-equating results. To calculate the classification reliability index for a given ability score θ, the observed score $\hat{\theta}$ is expected to be normally distributed with a mean of θ and a standard deviation of SE(θ), the standard error of measurement associated with the given θ. The expected proportion of examinees with true scores in any particular level, that is, the high/low or pass/fail classification rates given by the different equating methods, was also reported. Each test has two cut scores, the C and D cuts. Classification rates for the C and D cuts are reported for the UOE test in this study.

While there is no consensus on the best measures of equating effectiveness (Kolen & Brennan, 2004), three commonly employed measures in equating studies are the Root Mean Square Error (RMSE), the Standard Error of Equating (SEE) and the BIAS of the equated raw scores (Pomplun, Omar & Custer, 2004). These measures represent total equating error, random equating error and systematic equating error, respectively. All three indices are weighted by the frequency of the number-correct raw score at each score level. Total equating error and systematic error were calculated with the formulas below:

$$\mathrm{BIAS} = \frac{\sum_i f_i \,(x_i' - x_i)}{\sum_i f_i} \qquad (1)$$

$$\mathrm{RMSE} = \sqrt{\frac{\sum_i f_i \,(x_i' - x_i)^2}{\sum_i f_i}} \qquad (2)$$

where $f_i$ is the frequency of number-correct raw score level $i$, $x_i'$ is the equated score at number-correct raw score level $i$ and $x_i$ is the equated score from IRT pre-equating at number-correct raw score level $i$. The standard error of equating is a measure of random equating error and can be estimated from the RMSE and BIAS. The standard error of equating at each possible raw score was estimated with:

$$\mathrm{SEE}(f_i) = \sqrt{\mathrm{RMSE}(f_i)^2 - \mathrm{BIAS}(f_i)^2} \qquad (3)$$

where $f_i$ is the frequency of number-correct raw score level $i$.

11. Results
Table 1 shows the disparity in item parameter estimates between the pre- and post-equating results for test forms C1 and D1, representing the base test form for pre-equating and one test form from the post-equating. Columns 1 and 2 of table 1 show the p-values of test forms C1 and D1. Overall, the p-values appear to be higher for the post-equated form than for the pre-equated one. The reason may perhaps be attributed to the prevailing situation during the conduct of the pre-test, as most students do not take trial-tests as seriously as other high-stakes examinations.
However, because of the mean/sigma equating, the item parameter values from the pre-equating were not found to differ from the post-equating item parameter estimates: the means of the item parameter estimates were equated to be the same for pre- and post-equating.

Table 1: Comparisons between pre-equated and post-administration item parameter estimates of Use of English
(Columns: Item; pre-equated item mean (p-value); post-equated item mean (p-value); pre-equated item parameter (a); pre-equated item parameter (b); post-equated item parameter (a); post-equated item parameter (b); pre-post difference)
1  0.9983  0.9984  4.6574  -2.974  6  -2.0877  -0.8863
2  0.8831  0.2709  0.8005  -1.526  6  2.0134  -3.5394
3  0.2638  0.6256  1.3985  2.6616  2.6891  1.4339  1.2277
4  0.0885  0.092  1.2125  4  1.8418  2.7751  1.2249
5  0.0634  0.087  1.1432  3.7925  1.6143  3.0446  0.7479
6  0.0568  0.6273  1.1169  4  0.8547  0.2142  3.7858
7  0.0751  0.1084  1.111  4  1.4384  2.8714  1.1286
8  0.7446  0.0969  0.4233  -0.9661  1.5148  3.0391  -4.0052
9  0.0568  0.1051  1.175  3.5508  1.4855  2.9939  0.5569
10  0.0952  0.1002  1.0557  3.7965  1.4307  2.8522  0.9443
11  0.0351  0.0854  1.1231  3.9127  1.4269  2.7354  1.1773
12  0.0701  0.0542  1.1081  3.8988  1.5332  3.057  0.8418
13  0.0534  0.1117  1.1645  3.595  1.4537  2.8764  0.7186
14  0.2354  0.197  0.907  2.9634  1.3474  2.6269  0.3365
15  0.0501  0.1018  1.1076  3.9601  1.446  2.9046  1.0555
16  0.0501  0.4811  1.0984  3.9751  2.4416  0.3931  3.582
17  0.4474  0.1264  0.743  1.0776  1.488  2.9561  -1.8785
18  0.0751  0.0575  1.1289  3.5074  1.5058  2.9411  0.5663
19  0.4073  0.3251  0.6921  2.2887  1.0861  1.5251  0.7636
20  0.0835  0.087  1.1354  3.5565  1.4684  2.8734  0.6831
21  0.0684  0.5386  1.2627  3.2729  1.5105  0.3335  2.9394
22  0.7563  0.2545  0.6847  -0.7316  1.5121  3.0295  -3.7611
23  0.1619  0.2397  1.2698  3.0003  1.1595  1.8202  1.1801
24  0.1703  0.1166  1.1809  3.265  1.4871  2.9837  0.2813
25  0.606  0.1888  0.6372  0.117  1.5196  3.044  -2.927
26  0.3122  0.1987  1.1833  2.9771  1.5164  3.0494  -0.0723
27  0.5476  0.2562  0.8543  0.3928  1.4251  2.8371  -2.4443
28  0.1336  0.289  1.1425  2.9324  1.1929  1.4307  1.5017
29  0.5042  0.2841  0.624  0.7679  1.3913  2.7849  -2.017
30  0.2404  0.1478  1.0675  3.0722  1.493  2.9872  0.085
31  0.1386  0.3465  1.1119  2.8014  1.0235  1.2874  1.514
32  0.3706  0.1757  0.9611  2.5122  1.5055  2.9795  -0.4673
33  0.4558  0.1724  1.0753  0.8105  1.4874  2.9871  -2.1766
34  0.2237  0.1429  1.0618  2.5839  1.4775  2.9762  -0.3923
35  0.1135  0.2053  1.2798  2.8837  1.2783  2.5516  0.3321
36  0.0618  0.0788  1.1115  3.9954  1.4763  2.9489  1.0465
37  0.1018  0.1199  1.3219  2.8851  1.4842  2.9842  -0.0991
38  0.0534  0.4483  1.1268  3.7298  2.2712  0.5067  3.2231
39  0.172  0.4351  1.2424  2.9521  1.9371  0.5428  2.4093
40  0.0952  0.1232  1.2425  3.2605  1.4904  3.0011  0.2594
41  0.0501  0.4943  1.3752  3.0139  2.9113  0.3305  2.6834
42  0.828  0.3268  0.7107  -1.2719  1.5245  3.0592  -4.3311
43  0.8731  0.3415  1.0836  -1.2877  1.5193  3.05  -4.3377
44  0.8514  0.3333  0.9867  -1.2055  1.5167  3.0517  -4.2572
45  0.0935  0.4631  1.2949  3.1726  2.0128  0.5022  2.6704
46  0.0918  0.1051  1.2881  3.171  1.5082  3.0269  0.1441
47  0.1085  0.2085  1.2021  3.4037  1.4827  2.9819  0.4218
48  0.0701  0.1067  1.3129  3.1915  1.4777  2.9619  0.2296
49  0.7462  0.1297  1.0897  -0.5341  1.5085  3.032  -3.5661
50  0.6761  0.243  0.6311  -0.3727  1.4968  3.0168  -3.3895
51  0.1369  0.0952  1.0308  3.8366  1.5069  3.0255  0.8111
52  0.0568  0.1051  1.2974  3.1161  1.4843  2.979  0.1371
53  0.0985  0.2135  1.2716  3.1313  1.364  2.5987  0.5326
54  0.0584  0.0706  1.1358  3.7712  1.4739  2.8864  0.8848
55  0.1736  0.4959  1.2232  3.1266  1.0059  0.6887  2.4379
56  0.1152  0.1248  1.0961  3.972  1.5216  3.0533  0.9187
57  0.0518  0.1248  1.3067  3.1408  1.4027  2.793  0.3478
58  0.7813  0.2463  1.0157  -0.7803  1.5232  3.0553  -3.8356
59  0.0801  0.0837  1.3068  3.1834  1.5074  3.0035  0.1799
60  0.7179  0.197  1.1117  -0.4228  1.5052  3.0171  -3.4399
61  0.8197  0.2677  1.2243  -0.8248  1.5119  3.0391  -3.8639
62  0.0785  0.1182  1.311  3.1662  1.4906  2.9987  0.1675
63  0.1619  0.1527  1.1741  2.9733  1.5076  3.0316  -0.0583
64  0.0668  0.0788  1.1265  3.746  1.5092  3.0204  0.7256
65  0.1135  0.1264  1.2463  3.1242  1.5077  3.0346  0.0896
66  0.7295  0.2135  1.1906  -0.3924  1.4969  3.0122  -3.4046
67  0.6678  0.1363  1.2154  -0.1831  1.5131  3.0425  -3.2256
68  0.1219  0.1166  1.2231  3.3843  1.5007  3.0219  0.3624
69  0.0634  0.4269  1.3126  3.1709  2.1383  0.5417  2.6292
70  0.172  0.2003  1.1629  3.3349  1.4372  2.8649  0.47
71  0.0551  0.0772  1.18  3.7017  1.4915  3.0104  0.6913
72  0.621  0.2874  0.806  -0.015  1.4709  2.927  -2.942
73  0.8397  0.4089  0.8289  -1.229  1.5122  3.0493  -4.2783
74  0.7346  0.1757  1.1683  -0.4817  1.5076  3.0023  -3.484
75  0.0902  0.0542  1.1948  3.285  1.5017  3.0048  0.2802
76  0.1085  0.468  1.286  3.129  2.1233  0.5758  2.5532
77  0.1035  0.1856  1.2776  3.129  1.4768  2.9757  0.1533
78  0.202  0.1494  1.2011  3.0715  1.488  3.0004  0.0711
79  0.1386  0.0887  1.2572  3.152  1.5039  3.0131  0.1389
80  0.0668  0.3924  1.2062  3.3302  2.127  0.6414  2.6888
81  0.1135  0.2299  1.241  3.0987  1.1827  2.1068  0.9919
82  0.7646  0.2693  0.9913  -0.7055  1.5065  3.0265  -3.732
83  0.0735  0.3777  1.2802  3.0588  1.9903  0.7027  2.3561
84  0.0818  0.0788  1.1916  3.3853  1.483  2.9097  0.4756
85  0.1803  0.1839  1.1311  3.3307  1.4995  2.9886  0.3421
86  0.0935  0.1166  1.1268  3.5994  1.4732  2.9424  0.657
87  0.0868  0.1675  1.1469  3.6049  1.3443  2.7093  0.8956
88  0.1753  0.1297  1.0478  3.8787  1.4722  2.9693  0.9094
89  0.1336  0.1658  1.0891  3.5808  1.4459  2.891  0.6898
90  0.1068  0.1297  1.0719  3.8811  1.4595  2.8438  1.0373
91  0.222  0.1741  1.0369  3.8145  1.4599  2.9198  0.8947
92  0.0451  0.0427  1.221  3.4338  1.5177  3.0287  0.4051
93  0.8347  0.353  0.7053  -1.3243  1.5203  3.0551  -4.3794
94  0.1536  0.1248  1.2109  2.6945  1.5037  3.0246  -0.3301
95  0.0851  0.0558  1.115  3.7291  1.5172  2.9931  0.736
96  0.0835  0.0887  1.1955  3.419  1.494  2.9979  0.4211
97  0.7646  0.3103  0.849  -0.689  1.5052  3.0101  -3.6991
98  0.8047  0.3235  0.7359  -1.0372  1.5098  3.0369  -4.0741
99  0.0952  0.1494  1.2025  3.4407  1.3969  2.7885  0.6522
100  0.0863  0.1084  1.1259  3.6137  1.4341  2.8826  0.7311

The average absolute differences between the item parameter estimates were computed as .000342 for C1 and D1, .00491 for C2 and D2, .00572 for C3 and D3 and .00557 for C4 and D4; all were found to be less than the benchmark of .20. Table 2 also shows the correlations between the pairs of pre-equating and post-equating item parameter estimates C1D1, C2D2, C3D3 and C4D4. The results revealed correlation coefficients of .995**, .954**, .995** and .996**.

Table 2: Correlation of pre-equating and post-equating item parameters
                                  C1_Pre   C2_Pre   C3_Pre   C4_Pre
D1_Post  Pearson Correlation      .995**   0.036    0.062    0.082
         Sig. (2-tailed)          0        0.722    0.54     0.42
         N                        100      100      100      100
D2_Post  Pearson Correlation      0.026    .994**   0.071    -0.1
         Sig. (2-tailed)          0.795    0        0.484    0.323
         N                        100      100      100      100
D3_Post  Pearson Correlation      0.06     0.076    .995**   0.191
         Sig. (2-tailed)          0.555    0.455    0        0.056
         N                        100      100      100      100
D4_Post  Pearson Correlation      0.076    -0.111   .212*    .996**
         Sig. (2-tailed)          0.454    0.274    0.035    0
         N                        100      100      100      100
**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).

Applying the criteria stated earlier under the assessment criteria (i.e., a correlation of about 0.90 and an average absolute difference of less than 0.20), the item parameter estimates from the two equating models can be regarded as equivalent. Figures 1, 2, 3 and 4 also show the scatter plots of the relationship between the pre-equating and post-equating test forms.

Fig. 1: Scatter plot of relationship between pre-equating and post-equating of C1 and D1
Fig. 2: Scatter plot of relationship between pre-equating and post-equating of C2 and D2 forms
Fig. 3: Scatter plot of relationship between pre-equating and post-equating of C3 and D3 test forms
Fig. 4: Scatter plot of relationship between pre-equating and post-equating of C4 and D4 test forms

All the items constituting the two different forms were aligned to the linear straight line, showing a highly close relationship. In the same way, figures 5, 6, 7 and 8 depict the raw score-to-theta scoring tables based on the two equating models mentioned above. While the horizontal axis represents the ability estimates, the vertical axis represents raw scores. From the figures, it is clear that the raw score-to-theta scoring tables for the pre-equating and post-equating models overlap each other.
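The raw score-to-theta scoring tables depicted in figures 5 to 8 are, in effect, test characteristic curves (TCCs): under the 3PL model, the expected raw score at a given ability is the sum of the item response probabilities. The sketch below, using hypothetical item parameters and the conventional 1.7 scaling constant, shows how such a table can be tabulated; it is an illustration of the general idea rather than the Xcalibre output used in this study.

```python
import numpy as np

def p_3pl(theta, a, b, c, D=1.7):
    """3PL probability of a correct response to one item at ability theta."""
    return c + (1.0 - c) / (1.0 + np.exp(-D * a * (theta - b)))

def expected_raw_score(theta, a, b, c):
    """Test characteristic curve: expected number-correct score at theta."""
    return float(sum(p_3pl(theta, ai, bi, ci) for ai, bi, ci in zip(a, b, c)))

# Hypothetical 3PL item parameters for a short illustrative form.
a = [1.1, 0.9, 1.4, 1.2, 0.8]
b = [-1.5, -0.5, 0.0, 0.7, 1.6]
c = [0.20, 0.25, 0.18, 0.22, 0.20]

# A raw score-to-theta scoring table tabulated over a grid of abilities.
for theta in np.linspace(-4, 4, 9):
    print(f"theta = {theta:+.1f}   expected raw score = {expected_raw_score(theta, a, b, c):5.2f}")
```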
Figure 5: TCC of test forms C1 and D1
Figure 6: TCC of test forms C2 and D2
Figure 7: TCC of test forms C3 and D3
Figure 8: TCC of test forms C4 and D4

Table 3 shows that, for the classification rate, the IRT post-equating tended to pass more examinees than the pre-equating method in total. The table shows that the IRT pre-equating method tended to pass fewer examinees than the IRT post-equating method at the C cut and in total.
However, the reverse is the case for the D cut, where the pre-equating method passed more candidates in test forms C1, C2 and C4.

Table 3: Classification frequency for aggregate pass rate, C-pass and D-pass rates for the UTME UOE
(Columns: Test form; Equating method; Total high (N); Total high (%); C-high (N); C-high (%); D-high (N); D-high (%). "No." is the total number of candidates for each form pair.)
C1  Pre   559  45.48  418  34     147  11.96   (No. = 1229)
D1  Post  670  54.51  565  45.97  105  8.54
C2  Pre   563  46.87  345  28.72  218  18.15   (No. = 1201)
D2  Post  638  53.12  452  37.63  186  15.48
C3  Pre   534  45.44  438  37.27  96   8.17    (No. = 1175)
D3  Post  641  54.55  543  46.21  98   8.34
C4  Pre   694  44.37  487  31.13  207  13.23   (No. = 1564)
D4  Post  870  55.62  671  42.9   199  12.72

The means and standard deviations of the equated scores from the different equating methods are shown in table 4. From the table, it can be seen that the test forms from the pre-equating and post-equating consistently yielded almost the same values, except for test form C1, representing pre-equating, and the corresponding post-equated form, which have slightly higher means and SDs.

Table 4: Means and standard deviations of the equated scores from different equating methods
(IRT pre-equating / IRT post-equating)
C1: mean 65.045, SD 14.757   |   D1: mean 65.034, SD 14.818
C2: mean 64.991, SD 14.921   |   D2: mean 64.989, SD 14.95
C3: mean 64.82,  SD 14.503   |   D3: mean 64.806, SD 14.438
C4: mean 64.87,  SD 14.784   |   D4: mean 64.917, SD 14.774

Finally, table 5 presents the results of the three indices used to evaluate the equating results, with the IRT pre-equating results as the baseline. All three indices indicate that the IRT post-equating yielded results close to those of the IRT pre-equating method, with small RMSE, BIAS and SEE values in all four of the test forms.

Table 5: Indices used to evaluate the equating results with IRT pre-equating as the baseline
(IRT post-equated forms: RMSE; BIAS; SEE)
D1: 0.01857; 0.000345; 0.018567
D2: 0.07042; 0.004959; 0.070245
D3: 0.07612; 0.005794; 0.075899
D4: 0.07503; -0.00563; 0.074818

12. Discussion of results
The higher p-values observed under the post-equating method can probably be explained. During the field trials, the items constituting the UOE were administered in paper-and-pencil mode, while the same items used in the subsequent operational examination were administered in a computer-based testing environment. The difference in the modes of examination could be a direct contributor to the observed difference between the pre-equating and post-operational methods. The design of the UTME delivery system made it possible to include innovations such as the use of the four arrow keys on the keyboard as an alternative to the mouse, review of items to reveal unanswered items prior to submission, and the inclusion of a timer, among other things. These features added value to the test delivery system, distinguishing it from the paper-and-pencil mode of testing. The seriousness or stake attached to the two examinations may also have contributed to the difference in the p-values observed. Since the trial-test does not attract motivational gains, students often do not take it as seriously as the UTME high-stakes examination. This could account for the difference in the overall performance of the candidates. The level of preparedness of the students can also constitute its own problem, which likewise affects performance.
Direct examination of the p-values of, for instance, test forms C1 and D1 offers further insight into the differences between the pre-equating and post-operational methods. Test form C1 represents the pre-equating and D1 the post-equating method. Of the 100 items tested, 56 were found to be harder in the pre-equating than in the post-equating test form. Experience has shown that in a trial-testing situation candidates are often less serious about the examination, possibly because of a lack of motivation arising from the perceived absence of consequences. Wolf and Smith (1995) presented a research study showing that students tested under a consequential condition outperformed students in a non-consequential condition by an effect size of .26. They concluded that consequences influence motivation and motivation influences performance. Motivation is therefore a likely contributor to the performance differences found in this study between students who took the field test and candidates who took the UTME high-stakes assessment. Indeed, it appears reasonable to say, following Domaleski (2006), that students taking the field test would not exert as much effort, since no stakes were associated with this test event and, in fact, no student-level results were ever reported. This lack of seriousness regarding trial-tests often accounts for the high rates of omitted and unreached items seen in many field tests, and possibly explains why the trial-test items were found to be harder, owing to the relatively large amount of missing or incomplete data.

The equality argument for fairness in assessment advocates assessing all students in a standardised manner, using identical assessment methods and content and the same administration, scoring and interpretation procedures. With this approach to assuring fairness, if different groups of test takers differ on some irrelevant knowledge or skills that can affect assessment performance, bias will exist. This situation is guarded against by ensuring that pre-equating is carried out prior to the real test administration. The analysis carried out in this study has shown that the pre-equating and post-equating methods provide comparable results. This should mitigate the fears of stakeholders who are apprehensive about whether pre-equating is actually doing what it is supposed to do, and it provides validity evidence for the equivalency of the test forms used in the UTME UOE.

13. Conclusion/Recommendation
The results of this study have shown that all three major indices, RMSE, BIAS and SEE, which represent total error, systematic error and standard (random) equating error respectively, indicated that the IRT post-equating yielded results close to those of the IRT pre-equating method; the two are therefore comparable. However, carrying out equating using IRT is complex, both conceptually and procedurally. Another point in favour of the post-equating method is that it passed more candidates than the pre-equating, especially in total and at the C cut. This suggests that the field-test items predict the performance of candidates in the UTME operational examination. These results indicate that the item parameters obtained during the trial-test were remarkably similar to those obtained during the operational assessment of the UTME in the UOE.
All the other 22 UTME subjects were also subjected to pre-equating prior to the operational test administration, and similar results were achieved. The extent to which the inferences drawn from test scores are appropriate for different groups of test takers is an important aspect of fairness. The practice of using the pre-equating method to build score tables prior to an operational assessment should be sustained, since the method yielded results comparable to the post-equating method. This holds as long as the probable causes of pre-equating error, such as bias in the item parameter estimates arising from violation of the assumption of local item independence, are removed (Kolen & Brennan, 2004). Pre-equating test forms prior to their administration in the actual examination is a good way of assuring equity and fairness in assessment. When the tests given to the students are unbiased and function the same way for different groups of test takers, fairness is said to have been built into the test.

References
Domaleski, C.S. 2006. Exploring the efficacy of pre-equating a large scale criterion-referenced assessment with respect to measurement equivalence. Published PhD thesis. Ann Arbor, MI: ProQuest Information and Learning Company.
Gao, R., He, W. & Ruan, C. 2012. Does pre-equating work? An investigation into a pre-equated testlet-based college placement exam using post-administration data. ETS Research Report Series, 2012, i-18. https://doi.org/10.1002/j.2333-8504.2012.tb02294.x
Holland, P.W. & Dorans, N.J. 2006. Linking and equating. In R.L. Brennan (Ed.). Educational measurement, 4th ed. Westport, CT: American Council on Education and Praeger Publishers. pp. 187-220.
Kolen, M.J. & Brennan, R.L. 2004. Test equating, scaling and linking: Methods and practices, 2nd ed. New York: Springer-Verlag. https://doi.org/10.1007/978-1-4757-4310-4
Kirkpatrick, R.K. 2005. The effects of item format in common item equating. Unpublished doctoral dissertation. Iowa City, IA: University of Iowa.
Kirkpatrick, R. & Way, W.D. 2008. Field testing and equating design for state educational assessment. Paper presented at the annual meeting of the American Educational Research Association, New York.
Livingston, S.A. 2004. Equating test scores (without IRT). Princeton, NJ: Educational Testing Service.
Pomplun, M., Omar, H. & Custer, M. 2004. A comparison of Winsteps and Bilog-MG for vertical scaling with the Rasch model. Educational and Psychological Measurement, 64, 600-616. https://doi.org/10.1177/0013164403261761
Raju, N.S., Laffitte, L.J. & Byrne, B.M. 2002. Measurement equivalence: A comparison of methods based on confirmatory factor analysis and item response theory. Journal of Applied Psychology, 87(3), 517-529. https://doi.org/10.1037/0021-9010.87.3.517
Tong, Y., Wu, S-S. & Xu, M. 2008. A comparison of pre-equating and post-equating using large-scale assessment data. Paper presented at the American Educational Research Association annual meeting, New York City.
Wolf, L.F. & Smith, J.K. 1995. The consequence of consequence: Motivation, anxiety, and test performance. Applied Measurement in Education, 8(3), 227-242. https://doi.org/10.1207/s15324818ame0803_3
Xuan, T. & Rochelle, M. 2011. Why do standardized testing programs report scaled scores? Why not just report the raw or percent-correct scores? R&D Connections, 16, 1-6.