Departments of 1Paediatrics, 2Family & Community Medicine and 3Physiology, Arabian Gulf University, Manama, Bahrain
*Corresponding Author’s e-mail: fuadama@agu.edu.bh

 حتليل مفردات أسئلة االختيار من متعدد يف قسم األطفال جبامعة
اخلليج العريب، املنامة، البحرين

دينا خيامي، اأحمد جردات، طارق ال�سيباين، فوؤاد عبداهلل علي

abstract: Objectives: The current study aimed to carry out a post-validation item analysis of multiple choice 
questions (MCQs) in medical examinations in order to evaluate correlations between item difficulty, item dis-
crimination and distraction effectiveness so as to determine whether questions should be included, modified or 
discarded. In addition, the optimal number of options per MCQ was analysed. Methods: This cross-sectional 
study was performed in the Department of Paediatrics, Arabian Gulf University, Manama, Bahrain. A total of 
800 MCQs and 4,000 distractors were analysed between November 2013 and June 2016. Results: The mean diff- 
iculty index ranged from 36.70–73.14%. The mean discrimination index ranged from 0.20–0.34. The mean distractor 
efficiency ranged from 66.50–90.00%. Of the items, 48.4%, 35.3%, 11.4%, 3.9% and 1.1% had zero, one, two, three and 
four nonfunctional distractors (NFDs), respectively. Using three or four rather than five options in each MCQ resulted 
in 95% or 83.6% of items having zero NFDs, respectively. The distractor efficiency was 91.87%, 85.83% and 
64.13% for difficult, acceptable and easy items, respectively (P <0.005). Distractor efficiency was 83.33%, 83.24% and 
77.56% for items with excellent, acceptable and poor discrimination, respectively (P <0.005). The average Kuder-
Richardson formula 20 reliability coefficient was 0.76. Conclusion: A considerable number of the MCQ items were within 
acceptable ranges. However, some items needed to be discarded or revised. Using three or four rather than five options in 
MCQs is recommended to reduce the number of NFDs and improve the overall quality of the examination.

Keywords: Medical Education; Educational Measurement; Academic Performance; Psychometrics; Examination 
Questions; Discriminant Analysis; Bahrain.

ال�سعوبة،  موؤ�رص  بني  العالقة  ودرا�سة  متعدد  من  االختيار  اأ�سئلة  مفردات  حتليل  اإجراء  اإىل  احلالية  الدرا�سة  هدفت  الهدف:  امللخ�ص: 
موؤ�رصالتمييز وفعالية امل�ستتات من اأجل االإحتفاظ باالأ�سئلة اأو تعديل اأو اإلغاء كل �سوؤال. باالإ�سافة اإىل ذلك مت حتليل اأف�سل عدد من البدائل 
يف كل �سوؤال. الطريقة: هذه درا�سة م�ستعر�سة اأجريت يف ق�سم طب االأطفال بجامعة اخلليج العربي، املنامة، البحرين. مت حتليل 800 �سوؤال 
اإختيار من متعدد و عدد 4,000 من امل�ستتات يف الفرتة من نوفمرب 2013 اىل يونيو 2016. النتائج: تراوح متو�سط موؤ�رص ال�سعوبة بني 
%73.14-36.70 و تراوح موؤ�رص التمييز من 0.34-0.20 و تراوحت فعالية امل�ستتات من %90.00-66.50 على التوايل. بلغت ن�سبة الفقرات 
التي حتتوي على �سفر، واحد، اثنان، ثالثة واأربعة م�ستتات غري فاعلة: %48.4، %35.3، %11.4، %3.9 و %1.1 على التوايل. با�ستخدام 
ثالثة اأو اأربعة بدال من خم�سة بدائل وبالتايل اإزالة واحد اأو اثنني من امل�ستتات الغري فاعلة من �ساأنه اأن تكون %95 اأو %83.6 من اال�سئلة 
حتوي على �سفر من امل�ستتات الغري فاعلة على التوايل. بلغت فعالية امل�ستتات %91.87، %85.83 و %64.13 بالن�سبة لالأ�سئلة ذات موؤ�رص 
ال�سعوبة يف املعدالت ال�سعبة واجليدة وال�سهلة على التوايل )P >0.005(. وبلغت فعالية امل�ستتات %83.33، %83.24 و %77.56 بالن�سبة 
لالأ�سئلة ذات موؤ�رص التمييز يف املعدالت املمتازة واجليدة وال�سعيفة على التوايل )P >0.005(. كان متو�سط اختبار املوثوقية 20 ب كودر 
ريت�سار�سون هو 0.76. اخلال�صة: كانت ن�سبة جيدة من مفردات اأ�سئلة االختيار من متعدد �سمن املعدالت املقبولة. ومع ذلك، يلزم اإ�ستبعاد 
بع�ض الفقرات اأو تنقيحها. نو�سي باإ�ستخدام ثالثة اأو اأربعة بدال من خم�سة بدائل للحد من امل�ستتات الغري فاعلة ولتح�سني جودة اأ�سئلة 

االختيار من متعدد.
الكلمات املفتاحية: التعليم الطبي؛ القيا�ض الرتبوي؛ االأداء االأكادميي؛ ال�سيكومرتية؛ اأ�سئلة االأمتحانات؛ حتليل التمييز؛ البحرين.

Item Analysis of Multiple Choice Questions at the 
Department of Paediatrics, 

Arabian Gulf University, Manama, Bahrain
Deena Kheyami,1 Ahmed Jaradat,2 Tareq Al-Shibani,3 *Fuad A. Ali1

clinical & basic research

Sultan Qaboos University Med J, February 2018, Vol. 18, Iss. 1, pp. e68–74, Epub. 4 Apr 18
Submitted 30 Aug 17
Revisions Req. 22 Oct & 12 Dec 17; Revisions Recd. 12 Nov & 25 Dec 17
Accepted 7 Jan 18 doi: 10.18295/squmj.2018.18.01.011

Advances in Knowledge
- Designing adequate multiple choice questions (MCQs) is essential to assess learning among medical students. Item analysis is an important 

scientific tool that provides information about the reliability and validity of MCQ items. However, item analysis studies are limited, particularly in 
medical schools in Arabian Gulf countries.

- The findings of the current study will hopefully increase awareness of this measurement tool among medical education providers in the region.

Application to Patient Care
- Designing appropriate MCQs improves the assessment and learning output of medical students. High-quality medical education in the Arabian 

Gulf region will encourage the provision of enhanced healthcare services to local populations.


Deena Kheyami, Ahmed Jaradat, Tareq Al-Shibani and Fuad A. Ali

Clinical and Basic Research | e69

While assessment is an essential part of student learning, assessment tools need to be valid, reliable and objective and reflect 
various achievement levels. Multiple choice questions 
(MCQs) should not only aim to assess knowledge 
recollection, but also measure other teaching object-
ives within Bloom’s taxonomy of learning, such as 
comprehension, application, analysis, synthesis and 
evaluation.1 Constructing a high-quality MCQ exam-
ination can be difficult and time-consuming; however, 
this approach is usually preferential to other types of 
assessment tools because it is objective and leaves little 
room for human bias, as answers to MCQ questions can 
be easily and reliably scored.2,3 In recent years, the most 
common type of MCQs employed in examinations are 
type A MCQs, which consist of a stem followed by four 
or five options or distractors.4,5 

An item analysis assesses the reliability and 
validity of an examination by examining student perf-
ormance with regards to each MCQ and applying 
statistical analyses to determine whether the item 
should be kept, reviewed or discarded from the test. 
Common item analysis parameters include the diffi-
culty index (DIFI), which reflects the percentage of 
correct answers to total responses; the discrimination 
index (DI), also known as the point biserial correlation, 
which identifies discrimination between students with 
different levels of achievement; and distractor efficiency 
(DE), which indicates whether the distractors in the 
item are well-chosen or have failed to distract students 
from selecting the correct answer. An ideal item 
should have a DIFI of between 30–70%, a DI of >0.2 
and a DE of 100%.6,7

At the end of their 10-week clinical rotation in 
the Department of Paediatrics of the Arabian Gulf 
University (AGU), Manama, Bahrain, paediatric clerk-
ship students undergo MCQ examinations in addition 
to objective standard clinical examinations, short-
answer question tests and continuous performance 
assessments. Each MCQ consists of a stem followed 
by five distractors. Students do not receive negative 
marks for wrong answers and the tests are criterion-
referenced, with passing standards expressed in absolute 
terms and a passing score of 60%. For each examination, 
approximately 50% of the MCQs are newly constructed 
while the remaining questions are taken from a question 
bank after revising and modifying question items 
according to item analysis outcomes. However, the 
examinations are not assessed for equivalent difficulty 
across the years.

The current study aimed to carry out a post-
validation item analysis of MCQs used in end-of-rota- 
tion examinations between 2013–2016 at the AGU 
Department of Paediatrics. Based on the item analysis 

outcomes, recommendations were made as to whether 
the questions should be retained, modified or discarded 
from the AGU question bank. In addition, correlations 
between the difficulty, item discrimination and dist-
raction effectiveness of each item were calculated and 
the optimal number of options in each MCQ was 
determined.

Methods

This cross-sectional study was performed in the 
Department of Paediatrics at AGU and included all 
MCQ items of paediatric clerkship summative exam-
ination papers from November 2013 to June 2016. 
There were 50 MCQs per paper and four examinations 
per year, resulting in a total of 800 MCQs and 4,000 
distractors. In total, 608 students had taken the exam-

Table 1: Mean difficulty index, discrimination index and 
distractor efficiency of end-of-rotation paediatric examin-
ations at the Arabian Gulf University, Manama, Bahrain 
(N = 16)

Year Exam. Mean ± SD 

DIFI % DI DE %

2013 1 65.81 ± 24.00 0.34 ± 0.22 70.00 ± 28.12

2 73.14 ± 20.39 0.30 ± 0.16 68.00 ± 29.47

3 70.70 ± 19.34 0.30 ± 0.20 66.50 ± 28.84

4 58.72 ± 23.19 0.23 ± 0.19 79.00 ± 21.64

Total 67.09 ± 22.34 0.29 ± 0.20 70.88 ± 27.43

2014 5 56.06 ± 23.38 0.28 ± 0.17 83.50 ± 19.31

6 52.40 ± 18.74 0.28 ± 0.27 75.00 ± 23.15

7 51.70 ± 24.21 0.20 ± 0.27 83.00 ± 17.81

8 52.88 ± 21.20 0.29 ± 0.20 85.00 ± 16.75

Total 53.26 ± 21.88 0.26 ± 0.23 81.62 ± 19.65

2015 9 39.54 ± 21.25 0.28 ± 0.19 90.00 ± 15.15

10 44.78 ± 23.63 0.23 ± 0.18 88.50 ± 20.34

11 43.59 ± 22.46 0.27 ± 0.19 86.50 ± 16.91

12 44.73 ± 25.25 0.23 ± 0.18 81.00 ± 21.17

Total 43.16 ± 23.12 0.25 ± 0.18 86.50 ± 18.73

2016 13 41.84 ± 22.07 0.27 ± 0.15 85.50 ± 18.96

14 36.70 ± 23.26 0.22 ± 0.18 89.00 ± 16.87

15 54.68 ± 18.94 0.31 ± 0.14 84.00 ± 20.05

16 47.14 ± 22.84 0.24 ± 0.15 89.00 ± 15.29

Total 45.09 ± 22.68 0.26 ± 0.18 86.88 ± 17.89

Average 52.15 ± 24.37 0.27 ± 0.20 81.47 ± 22.19

Exam. = examination; SD = standard deviation; DIFI = difficulty index; 
DI = discrimination index; DE = distractor efficiency.


Item Analysis of Multiple Choice Questions at the Department of Paediatrics, Arabian Gulf University, Manama, Bahrain

e70 | SQU Medical Journal, February 2018, Volume 18, Issue 1

Table 2: Non-functioning distractors per individual multiple choice question items in the end-of-rotation paediatric 
examinations at the Arabian Gulf University, Manama, Bahrain (N = 800)

Year Parameter Number of NFDs per item Total

0 1 2 3 4 

2013 n (%) 64 (32) 73 (36.5) 37 (18.5) 18 (9) 8 (4) 200 (100)

Mean DIFI % 51.62 66.60 75.62 91.96 100.00 67.09

Mean DI 0.33 0.32 0.29 0.18 0.00 0.29

2014 n (%) 88 (44) 84 (42) 21 (10.5) 7 (3.5) 0 (0) 200 (100)

Mean DIFI % 45.54 56.27 65.01 78.81 - 53.26

Mean DI 0.26 0.28 0.28 0.07 - 0.26

2015 n (%) 116 (58) 66 (33) 13 (6.5) 4 (2) 1 (0.5) 200 (100)

Mean DIFI % 39.19 43.29 60.24 86.38 100.00 43.16

Mean DI 0.24 0.27 0.27 0.19 0.00 0.25

2016 n (%) 119 (59.5) 59 (29.5) 20 (10) 2 (1) 0 (0) 200 (100)

Mean DIFI % 38.98 48.89 65.02 97.37 - 45.09

Mean DI 0.27 0.25 0.27 0.16 - 0.26

Total n (%) 387 (48.4) 282 (35.3) 91 (11.4) 31 (3.9) 9 (1.1) 800 (100)

Mean DIFI % 42.62 54.36 68.65 88.62 100.00 52.15

Mean DI 0.27 0.28 0.28 0.15 0.00 0.27

NFDs = non-functioning distractors; DI = discrimination index.

inations during the study period, with an average of 38 
students sitting each examination.

Items were only used for summative assessment 
and were not reviewed with the students at any time. The 
content and construct validity of the examinations were 
verified by the Paediatric Examination Committee, 
which consisted of five content experts and paediatric 
consultants. Each examination was designed according 
to a predetermined examination blueprint, ensuring 
that all essential knowledge and skills were covered 
based on learning objectives. The post-validation item 
analysis was performed using the Oracle Database, 
Version 10g (Oracle Corp., Redwood City, California, 
USA). The committee discarded existing MCQs based 
on the item analysis results, flaws in MCQ construction 
and how frequently an item was used in previous years. 
The question bank was secured in the assessment office 
to which only authorised individuals were allowed 
access via a digital security system. Examinees entered 
their answers in pencil on a Scantron® optical answer 
sheet (Scantron Corp., Tustin, California, USA).

The item analysis parameters used in the current 
study included the DIFI, DI and DE. The DIFI ranged 
from 0% (i.e. none of the students answered the item 
correctly) to 100% (i.e. all of the students answered the 
item correctly). In general, items with a DIFI of <30% 
were considered difficult, those between 30–70% were 
considered acceptable and those >70% were considered 

easy. Kelley’s method was used to calculate the DI 
based on the difference between the scores of high-
achievers, classified as the top 27% of test-takers, and 
low-achievers, classified as the bottom 27% of test 
takers.8 The larger the difference between the high- and 
low-achieving groups, the higher the DI of an item. The 
DIs of items ranged from -1 (all and only low achievers 
answered correctly) to +1 (all and only high achievers 
answered correctly). Items with a DI of ≥0.35 were 
considered excellent, those between 0.2–0.34 were 
considered acceptable and those <0.2 were considered 
poor. 

The DE was calculated based on the number of 
nonfunctional distractors (NFDs) per item. An NFD 
was defined as an incorrect MCQ option selected 
by less than 5% of students.7 The DE was deemed to 
be either 0%, 25%, 50%, 75% or 100% if an item had 
four, three, two, one or zero NFDs, respectively.9 The 
reliability of the examination was measured using 
the Kuder-Richardson formula 20 coefficient (KR20); 
this value usually ranges from 0–1, with higher KR20 
values (i.e. closer to 1) indicating greater reliability. 
A KR20 value of <0.3 is considered poor and a value 
of ≥0.7 is considered acceptable. Items with DIFIs of 
>70% or <30% usually yield a low KR20 value, as do 
items with a DI of <0.2.2,10,11

Data analysis was performed using the Statistical 
Package for the Social Sciences (SPSS), Version 23.0 


Deena Kheyami, Ahmed Jaradat, Tareq Al-Shibani and Fuad A. Ali

Clinical and Basic Research | e71

(IBM Corp., Armonk, New York, USA). Variables 
were presented as means ± standard deviations. The 
linear relationship between DIFI and DI was measured 
using Pearson’s correlation test. A two-way analysis 
of variance was used to examine the differences in DE 
(dependent variable), DIFI (independent variable one) 
and DI (independent variable two). A P value of <0.050 
was considered statistically significant. 

The Vice Dean for Academic Affairs at AGU 
approved this study and allowed access to the exam-
ination data. The identities of the students taking the 
examination were kept anonymous and confidential. 
No human participants were involved in this study.

Results

The mean DIFI of the examinations ranged from 
36.70% in 2016 to 73.14% in 2013, with the overall 
mean DIFI considered acceptable (52.15%). The over-
all mean DI and DE ranged between 0.20–0.34 and 
66.50–90.00%, respectively [Table 1]. Of the total 
number of items, 48.4%, 35.3%, 11.4%, 3.9% and 1.1% 
had zero, one, two, three and four NFDs, respectively 
[Table 2]. It was calculated that using four rather than 
five options in each MCQ by removing one NFD 
would result in 83.6% of items having zero NFDs. Using 
three options and removing two NFDs resulted in 95% 
of items having zero NFDs.

The overall DIFI increased as the number of 
NFDs increased, with DIFIs of 42.62%, 54.36%, 68.65%, 
88.62% and 100% for items with zero, one, two, three 
and four NFDs, respectively. This finding was observed 
in the mean DIFI for each year as well. The overall 
mean DIFI was almost the same for items with zero, 
one and two NFDs (0.27%, 0.28% and 0.28%), while 
they were 0.15% and 0% for items with three and four 
NFDs, respectively. Similar results were observed for 
the mean DI of each year as well. More than 40% of the 
MCQs had an acceptable DIFI throughout the study 
period. The lowest percentage of difficult MCQs was 
observed in 2013 (7.5%). The highest percentage of 

difficult MCQs was 31.5%, noted in 2015. There were 
more easy MCQs in 2013 than in 2015 [Figure 1A]. Item 
DIs were relatively constant across the study period 
[Figure 1B]. 

Approximately half of the items had an acceptable 
DIFI (53.4%), while the other half were either difficult 
(20.8%) or easy (25.9%). The DE was directly related 
to the DIFI, with DEs of 91.87%, 85.83% and 64.13% 
for difficult, acceptable and easy items, respectively 
(P <0.005). Items were nearly equally distributed 
between poor, acceptable and excellent DIs. The DE 
was 83.24% and 83.33% for items with excellent and 
acceptable DIs, respectively, compared to 77.56% for 
items with poor discrimination (P <0.005) [Table 3]. 
There was a significant dome-shaped correlation 
between DIFI and DI (r = 0.162; P = 0.010), with the 
highest DIs occurring in the acceptable DIFI range and 
decreasing for DIFIs in the difficult range [Figure 2]. The 
average KR

20 coefficient value was 0.76.

Figure 1: Distribution according to (A) difficulty index and (B) discrimination index of multiple choice questions in end-
of-rotation paediatric examinations at the Arabian Gulf University, Manama, Bahrain (N = 800).

Table 3: Correlation between difficulty index and discrim-
ination index with distractor efficiency and action propo-
sed of multiple choice questions in end-of-rotation paed- 
iatric examinations at the Arabian Gulf University, 
Manama, Bahrain (N = 800)

Index n (%) DE % P value Proposed 
action

DIFI

Difficult 166 (20.8) 91.87 <0.005* Review

Acceptable 427 (53.4) 85.83 Store and 
review

Easy 207 (25.9) 64.13 Discard

DI

Poor 254 (31.8) 77.56 <0.005† Discard

Acceptable 267 (33.4) 83.33 Store and 
review

Excellent 279 (34.9) 83.24 Store

DE = distractor efficiency; DIFI = difficulty index; DI = discrimination 
index.
*The DE was significantly different for difficult, acceptable and easy items. 
†Poor DIs had a significantly lower DE than both acceptable and excellent 
DIs.


Item Analysis of Multiple Choice Questions at the Department of Paediatrics, Arabian Gulf University, Manama, Bahrain

e72 | SQU Medical Journal, February 2018, Volume 18, Issue 1

Discussion

In the current study, out of 16 summative examinations 
and 800 items, the mean DIFI of individual tests was 
acceptable. Items with a high DIFI mostly occurred in 
examination papers from 2015 and 2016, while items 
with a low DIFI mostly occurred in 2013 examination 
papers. It is likely that this finding reflects recent 
improvements in MCQ construction by the AGU 
Examination Committee. The DIFI results of the 
current study were comparable to those of other 
institutions, although relative incentives and test 
conditions are unlikely to have been the same. Mitra 
et al. reported mean DIFIs ranging from 64–89% 
among 12 summative assessments in their foundation 
programme conducted between 2003 and 2006.12 Other 
studies have reported mean DIFIs of 39.4 ± 21.4% 
and 63.06 ± 18.95%, respectively.9,13 Keralia et al. rep-
orted mean DIFIs between 47.17–58.08% in MCQ 
items from 10 summative papers.14 Sharif et al. reported 
a mean DIFI of 49 ± 31% in 2,445 MCQs.15 In the basic 
medical sciences component of a nursing licensure 
examination, Lin et al. found the DIFI to range from 
10–93%, with a mean of 48%.16

Karelia et al. reported that 61 ± 8.43%, 24 ± 4.08% 
and 15 ± 7.07% of items in pharmacology summative 
tests were acceptable, very easy and very difficult, 
respectively.14 In the current study, 53.4%, 25.9% and 
20.8% of items fell within these same categories. The 
authors recommend selecting MCQs with lower DIFIs 
for fundamental topics that students will probably know; 
moreover, starting the examination with such questions 
will raise the students’ confidence. Similarly, MCQs 
with a high DIFI should be located nearer the end of 
the paper in order to discriminate between high- and 
low-achievers. With regards to DI, a nearly equivalent 
percentage of items in the current study were in the 
poor, acceptable and excellent ranges (31.8%, 33.4% and 
34.9%, respectively). Lin et al. reported that 28.8% of 
MCQ items in the basic medical sciences section had a 

DI of <0.2.16 Other studies have reported mean DIs of 
0.14 ± 0.19, 0.356 ± 0.17, 0.19 ± 0.30 and 0.33 ± 0.18.6,9,13,15 
Items with poor DIs usually result in low scores due to the 
use of incorrect answer keys, confusing stems or areas of 
controversy.17,18 Such items should be removed from the 
question bank as they fail to discriminate between strong 
and weak academic performances. 

Constructing plausible distractors and decreasing 
NFDs is essential to improve the quality of MCQs.19 
Therefore, items may need to be modified if students 
constantly avoid choosing certain distractors. In the 
current study, most questions had less than two NFDs, 
with a mean DE of 66.5–90.00%. Other studies have 
reported a mean DE of 88.6 ± 18.6% and 63.97 ± 
33.56%.9,13 Over the study period, there was a gradual 
improvement in mean DE from 2013 (70.88%) to 
2016 (86.88%); this was likely due to the continuous 
improvement activities of the AGU Examination 
Committee. This improvement is also reflected in the 
annual number of NFDs. Items with zero NFDs increased 
from 32% in 2013 to 44%, 58% and 59.5% in 2014, 2015 
and 2016, respectively, while items with three NFDs 
decreased from 9% in 2013 to 3.5%, 2% and 1% in 2014, 
2015 and 2016, respectively. Items with four NFDs 
decreased from 4% in 2013 to 0%, 0.5% and 0% in 2014, 
2015 and 2016, respectively.

Items with high NFDs reduce both the DE and DI, 
but increase the DIFI; thus, the item will be easy for the 
students and act as a poor discriminator of academic 
performance. In the current study, the DE was signif-
icantly higher among difficult items compared to accep- 
table and easy items as well as significantly higher 
among items with excellent and acceptable DIs over 
poor ones. Difficult items with excellent DE values need 
to be reviewed for possible language confusion, suffic- 
ient subject coverage or inappropriately chosen material 
according to the student’s level of learning. In contrast, 
easy items with low DE values should be discarded, 
while items with acceptable DIFI and DE values can 
be stored and reviewed for improvement. It is often 
necessary to revise items in which the distractor is 
selected more often than the correct answer.20 The 
number of NFDs also affects DI, in that items with 
lower NFDs are associated with acceptable or excellent 
DIs. The current study found that items with excellent 
and acceptable DIs had a significantly higher DE 
than items with a poor DI. The authors recommend 
discarding items with poor DIs and low DEs, while 
retaining items with acceptable or excellent DIs and 
high DEs.

In the current study, items with NFDs of zero, 
one and two had acceptable DIFIs and DIs, while 
items with NFDs of three and four had higher DIFIs 
and poorer DIs. Mukherjee et al. reported a similar 

Figure 2: Scatter plot showing the relationship between 
difficulty index and discrimination index among 
multiple choice question items in end-of-rotation 
paediatric examinations at the Arabian Gulf University, 
Manama, Bahrain (N = 800). 


Deena Kheyami, Ahmed Jaradat, Tareq Al-Shibani and Fuad A. Ali

Clinical and Basic Research | e73

association, with DIFIs of 32.5%, 51.36%, 71.11% and 
87.08% for items with zero, one, two and three NFDs, 
respectively, in a community medicine assessment; 
only items with NFDs of one and two had accept- 
able DIs (0.396 and 0.404, respectively), while items 
with NFDs of zero and three had poor DIs (0.023 and 
0.195, respectively).21 Items which reflect fundamental 
knowledge should be retained each year to determine 
whether all students continue to answer them correctly. 
While some may argue that the inclusion of more 
options in an MCQ reduces the ‘guessing effect’, others 
have demonstrated that additional options beyond three 
do not make much difference; in fact, reducing the 
list of available responses to three options can actually 
improve psychometric features.22,23

Furthermore, it is easier to develop three rather 
than four or five MCQ options and more effective to 
have fewer options with a greater number of functional 
distractors in comparison to increased options and 
more NFDs. Tarrent et al. suggested including three 
instead of four options, as such questions require 
less time to be constructed and the performance for 
both is equal.24 A meta-analysis of 80 years of research 
concluded that three options are optimal for MCQ 
items, resulting in a reduction in the amount of time 
required to prepare each MCQ and allowing more 
questions to be set per examination.19 In addition, 
this will increase subject exposure and improve the 
reliability and validity of the test due to the inclusion 
of more high-quality items. According to the dome-
shaped correlation between DIFI and DI in the current 
study, items with DIFIs falling in the difficult or easy 
categories had significantly poorer DIs. Sim et al. 
similarly reported that maximum DI values were seen 
with DIFIs between 40–74%.25 The reliability coefficient 
in the current study was 0.76, which is less than excellent 
but still within the desirable range.2,10,11

Constructing high-quality MCQs is essential to 
accurately assess student performance. Overall, for 
students who know the material covered by the exami-
nation, NFDs add little to the performance of a test 
item; in contrast, increasing the number of distractors 
decreases the likelihood of students accidentally 
choosing the correct answer by guesswork. An item 
analysis of questions is recommended for all exami-
nations in order to continuously update the question 
bank by keeping items with acceptable indices and 
revising or discarding others. In the authors’ experience, 
it is usually better to construct an examination with 
the input of an examination committee in order to 
improve the quality of the questions. Special training 
programmes or workshops should be offered to the 
members of such committees in order to hone their 
skills in preparing effective MCQs. Further research at 

AGU is recommended to determine any future improv-
ements in MCQ preparation. Conducting similar 
studies for examinations in other disciplines at AGU 
would also be useful.

Conclusion

Item analyses can be valuable to strengthen an MCQ 
bank in order to ensure the items have an acceptable 
DIFI, acceptable or excellent DI and high DE. The item 
analysis of paediatric end-of-rotation examinations at 
AGU indicated that a considerable percentage of test 
items had acceptable mean DIFIs and DIs. However, 
some items needed to be discarded or revised. Using 
three or four rather than five options in an MCQ is 
recommended. 

a c k n o w l e d g e m e n t s

The researchers would like to thank the Assessment 
Office at AGU for providing the MCQ database and 
helping with the item analysis.

c o n f l i c t o f i n t e r e s t
The authors declare no conflicts of interest. 

f u n d i n g

No funding was received for this study.

References
1. Case SM, Swanson DB. Extended-matching items: A practical 

alternative to free-response questions. Tech Learn Med 1993; 
5:107–15. doi: 10.1080/10401339309539601.

2. Cronbach LJ, Shavelson RJ. My current thoughts on coefficient 
alpha and successor procedures. Educ Psycol Meas 2004; 
64:391–418. doi: 10.1177/0013164404266386.

3. Bloom BS, Hastings JT, Madaus GF. Handbook on Formative 
and Summative Evaluation of Student Learning. New York, 
USA: McGraw-Hill, 1971. P. 103.

4. Skakun EN, Nanson EM, Kling S, Taylor WC. A preliminary 
investigation of three types of multiple choice questions. Med 
Educ 1979; 13:91–6. doi: 10.1111/j.1365-2923.1979.tb00928.x.

5. Skakun EN, Nanson EM, Taylor WC, Kling S. An investigation 
of three types of multiple choice questions. Annu Conf Res 
Med Educ 1977; 16:111–16.

6. Hingorjo MR, Jaleel F. Analysis of one-best MCQs: The 
difficulty index, discrimination index and distractor efficiency. J 
Pak Med Assoc 2012; 62:142–7.

7. Tarrant M, Ware J, Mohammed AM. An assessment of 
functioning and non-functioning distractors in multiple-choice 
questions: A descriptive analysis. BMC Med Educ 2009; 9:40. 
doi: 10.1186/1472-6920-9-40.

8. Kelley TL. The selection of upper and lower groups for valid-
ation of test items. J Educ Psychol 1939; 30:17–24. doi: 10.1037/
h0057123.

9. Mehta G, Mokhasi V. Item analysis of multiple choice quest-
ions: An assessment of the assessment tool. Int J Health Sci 
Res 2014; 4:197–202. 

https://doi.org/10.1080/10401339309539601
https://doi.org/10.1177/0013164404266386
https://doi.org/10.1111/j.1365-2923.1979.tb00928.x
https://doi.org/10.1186/1472-6920-9-40
https://doi.org/10.1037/h0057123
https://doi.org/10.1037/h0057123


Item Analysis of Multiple Choice Questions at the Department of Paediatrics, Arabian Gulf University, Manama, Bahrain

e74 | SQU Medical Journal, February 2018, Volume 18, Issue 1

10. Bland JM, Altman DG. Cronbach’s alpha. BMJ 1997; 314:572. 
doi: 10.1136/bmj.314.7080.572.

11. Nunnally JC, Bernstein IH. Psychometric Theory, 3rd ed. New 
York, USA: McGraw-Hill, 1994. 

12. Mitra NK, Nagaraja HS, Ponnudurai G, Judson JP. The levels 
of difficulty and discrimination indices in type A multiple 
choice questions of pre-clinical semester 1 multidisciplinary 
summative tests. Int EJ Sci Med Educ 2009; 3:2–7.

13. Gajjar S, Sharma R, Kumar P, Rana M. Item and test analysis 
to identify quality multiple choice questions (MCQs) from 
an assessment of medical students of Ahemdabad, Gujarat. 
Indian J Community Med 2014; 39:17–20. doi: 10.4103/0970-
0218.126347.

14. Karelia BN, Pillai A, Vegada BN. The levels of difficulty and 
discrimination indices and relationship between them in four-
response type multiple choice questions of pharmacology 
summative tests of year II M.B.B.S students. Int EJ Sci Med 
Educ 2013; 7:41–6. 

15. Sharif M, Rahimi SM, Rajabi M, Sayyah M. Computer software 
application in item analysis of exams in a college of medicine. 
ARPN J Sci Tech 2014; 4:565–9.

16. Lin LC, Tseng HM, Wu SC. Item analysis of the registered nurse 
license exam by nursing candidates from vocational nursing 
high schools in Taiwan. Proc Natl Sci Counc Repub China D 
1999; 9:24–31.

17. Kaur M, Singla S, Mahajan R. Item analysis of in use multiple 
choice questions in pharmacology. Int J Appl Basic Med Res 
2016; 6:170–3. doi: 10.4103/2229-516X.186965.

18. Mackenzie J. Vague and ambiguous questions on multiple-
choice exercises: The case for. Educ Philos Theory 1994; 
26:23–33. doi: 10.1111/j.1469-5812.1994.tb00198.x.

19. Rodriguez MC. Three options are optimal for multiple-choice 
items: A meta-analysis of 80 years of research. Educ Meas Issues 
Pract 2005; 24:3–13. doi: 10.1111/j.1745-3992.2005.00006.x.

20. Tomak L, Bek Y. Item analysis and evaluation in the examin-
ations in the Faculty of Medicine at Ondokuz Mayis University. 
Niger J Clin Pract 2015; 18:387–94. doi: 10.4103/1119-3077. 
151720.

21. Mukherjee P, Lahiri SK. Analysis of multiple choice questions 
(MCQs): Item and test statistics from an assessment in a 
medical college of Kolkata, West Bengal. IOSR J Dent Med Sci 
2015; 14:47–52. doi: 10.9790/0853-141264752.

22. Nwadinigwe PI, Naibi L. The number of options in a multiple-
choice test item and the psychometric characteristics. J Educ 
Pract 2013; 4:189–96.

23. Vegada B, Shukla A, Khilnani A, Charan J, Desai C. Comparison 
between three option, four option and five option multiple 
choice question tests for quality parameters: A randomized 
study. Indian J Pharmacol 2016; 48:571–5. doi: 10.4103/0253-
7613.190757.

24. Tarrant M, Ware J. A comparison of the psychometric prop-
erties of three- and four-option multiple-choice questions 
in nursing assessments. Nurse Educ Today 2010; 30:539–43. 
doi: 10.1016/j.nedt.2009.11.002.

25. Sim SM, Rasiah RI. Relationship between item difficulty and 
discrimination indices in true/false-type multiple choice 
questions of a para-clinical multidisciplinary paper. Ann Acad 
Med Singapore 2006; 35:67–71.

https://doi.org/10.1136/bmj.314.7080.572
https://doi.org/10.4103/0970-0218.126347
https://doi.org/10.4103/0970-0218.126347
https://doi.org/10.4103/2229-516X.186965
https://doi.org/10.1111/j.1469-5812.1994.tb00198.x
https://doi.org/10.1111/j.1745-3992.2005.00006.x
https://doi.org/10.4103/1119-3077.151720
https://doi.org/10.4103/1119-3077.151720
https://doi.org/10.9790/0853-141264752
https://doi.org/10.4103/0253-7613.190757
https://doi.org/10.4103/0253-7613.190757
https://doi.org/10.1016/j.nedt.2009.11.002