Analysis of Item Writing Flaws in a Communications Skills Test in a Ghanaian University

Ato Kwamina Arhin,1 Jonathan Essuman,2 & Ekua Arhin3
1,2 Faculty of Education and Communication Sciences, AAMUSTED, Kumasi, Ghana
3 Department of Education, Ola College of Education, Cape Coast, Ghana

Abstract

Adhering to the rules governing the writing of multiple-choice test items will ensure quality and validity. However, realizing this ideal can be challenging for non-native English language teachers and students. This is especially so for non-native English language teachers, because developing test items in a language that neither they nor their students use as their mother tongue raises a multitude of issues related to quality and validity. A descriptive study of this problem was conducted at a Technical University in Ghana, focusing on item writing flaws in a communication skills test. The use of multiple-choice tests in Ghanaian universities has increased over the last decade due to increasing student intake. A 20-item multiple-choice test in communication skills was administered to 110 students. The test items were analyzed using a framework informed by standard item-writing principles based on the revised taxonomy of multiple-choice item-writing guides by Haladyna, Downing and Rodriguez (2002). The facility and discrimination index (DI) were calculated for all the items. In total, 60% of the items were flawed according to standard item-writing principles. The most violated guideline was wording stems negatively. Pearson correlation analysis indicated a weak relationship between the difficulty and discrimination indices. The discrimination indices of the flawed items showed that 84.6% of them fell below the optimal level of 0.40 and above. The lowest DI was recorded by an item that was worded negatively. The mean facility of the test was 45%. It was observed that the flawed items were more difficult than the non-flawed items. The study suggests that test items must be properly reviewed before they are used to assess students' knowledge.

Keywords: Discrimination index, Facility, Flawed item, Multiple-choice item

ISSN 1916-7822. A Journal of Spread Corporation. Volume 10, No. 2, 2021, pages 121-143. https://journal.lib.uoguelph.ca/index.php/ajote/index

Background

Multiple-choice items (MCIs) are one of the most commonly used item types for classroom assessment (Haladyna & Rodriguez, 2013). Because of their widespread use in the classroom, multiple-choice items are indispensable for testing students at all levels of education; there is hardly any subject in which MCIs cannot be used. Test results are often used to make decisions that determine the future of students and teachers. It is, therefore, imperative that MCIs are properly handled during construction, administration, scoring and the analysis of test scores. Moreover, with increasing enrolment in Ghanaian tertiary institutions, multiple-choice items have become the preferred mode of assessing students because of the greater ease and speed of grading multiple-choice questions compared with other testing formats. They also cover a wide scheme of work or syllabus adequately. When assessing a large population of students, it is very difficult to ignore multiple-choice items. In 2015, the gross enrolment ratio in tertiary education for Ghana was 16.2%.
The gross enrolment ratio in tertiary education in Ghana increased from 0.7% in 1972 to 16.2% in 2015, growing at an average annual rate of 28.47%. This indicates substantial growth in student intake at the tertiary level in Ghana. Due to increasing student intake, many faculty members have resorted to the use of multiple-choice items to meet students' assessment needs. Previously, MCIs were rarely used in our tertiary schools. Restricted-response test items dominated mid- and end-of-semester examinations because student enrolment was not as high as it is today, and the essay was the preferred means of assessing students' knowledge. The challenge now is how to construct good-quality MCIs, with minimal flaws, that elicit the knowledge possessed by the students. Essay test items are relatively easy to construct compared to MCIs. A multiple-choice item is made up of a stem, which poses the problem to be resolved, and a set of options consisting of the key (the correct answer) and the distractors (the incorrect alternatives presented to the test taker). McKeachie (1999) notes that the multiple-choice test is a staple of higher education because it provides an efficient and effective measure of student learning. The acceptance of multiple-choice tests has increased over the years, partly due to improvements in technology for scoring multiple-choice items quickly and easily. The multiple-choice test is also highly reliable across scorers, unlike essay tests. For these reasons and others (Frederiksen, 1984), many educators consider the multiple-choice format an optimal method of testing. However, multiple-choice tests have spawned substantial controversy, mainly because questions of this type are limited to measuring recall of knowledge. Despite all the weaknesses associated with them, multiple-choice (MC) tests are preferred in educational settings in Ghana.

The teacher's aim in crafting MCIs is not to confuse students, but to yield scores that accurately reflect the extent to which students have obtained an acceptable working knowledge of the content. It is worth noting that students who pass a poorly designed test may not necessarily possess adequate knowledge of the topic, and this may constitute a real threat to their future academic progression. Well-constructed multiple-choice items represent a versatile assessment tool with the potential to assess students for sufficient evidence of knowledge of the tested content (Rush, Rankin and White, 2016). A required characteristic of a multiple-choice item is its power to discriminate between test takers who have learnt the material they are being tested on and those who have not. The discrimination index can differentiate between students of different ability levels. Poorly constructed MCIs also contain cues that allow students to guess the correct answer without prerequisite knowledge (Downing, 2002). It is time-consuming and energy-sapping to construct an item that is good enough to discriminate among testees. According to Rush, Rankin and White (2016), it takes about 20 to 60 minutes to couch a quality multiple-choice item free from errors. Despite the importance of classroom assessment, studies suggest some deficiencies in teacher-made tests (Mehrens and Lehmann, 1991). According to Lane et al.
(2016), most teachers craft flawed items that measure only the ability to recall basic facts and concepts. Some effects of item-writing flaws on students are that items may be easier or more difficult than intended, clues may allow unprepared students to guess the correct answer, and unnecessarily complex or esoteric items may prevent prepared students from demonstrating their knowledge (Case and Swanson, 2002; Downing, 2005). A poorly constructed item can inflate or deflate a student's score on a test, giving a false picture of the student's performance. These flaws are also capable of clouding the results obtained from the test: the clouding changes the interpretation of the results and contributes to unwanted evidence getting into the test data.

There are many factors to consider when evaluating the quality of MC items. Firstly, one can examine the extent to which items conform to widely accepted item-writing guidelines, such as avoiding negatively worded items and avoiding the use of longer options as the answers. Writing MCIs without following the guidelines can lower the quality of individual items and of the test as a whole (Downing, 2005; Tarrant & Ware, 2008). Specific research-based principles guide the development of effective MCIs (Downing & Haladyna, 1997; Haladyna, 2004), and the use of these research-based principles makes item writing a science. In a review, Haladyna, Downing and Rodriguez (2002) identified a taxonomy of 31 item-writing principles based on an analysis of 27 current educational measurement textbooks and 27 empirical research papers. Deviation from established item-writing principles may result in a decrease in validity evidence for tests (Downing, 2002). Items that violate one or more of the standard item-writing principles (flawed items) tend to produce construct-irrelevant easiness, which refers to a contaminating influence on test scores that tends to systematically increase scores for a specific examinee or group of examinees; construct-irrelevant difficulty does the opposite, systematically decreasing scores for a specific examinee or group of examinees (Haladyna & Downing, 2004). These effects are called construct-irrelevant variance (CIV). Multiple-choice test items tend to have high grading reliability; however, creating valid MC items that perform reliably is difficult and requires skill (Pellegrino, Chudowsky & Glaser, 2001). Many teachers have little to no formal training in appropriate assessment practices. For example, most pre-service teachers in Ghana take a three-hour course in assessment in schools, which is woefully inadequate to prepare them for the enormous task ahead of them. In addition to this lack of training, creating MC items can be difficult because there are numerous ways to lessen an examination's validity through its design. Although multiple-choice items are commonly used in tertiary institutions and at other levels of language instruction and in other subject areas in Ghana, there is not enough evidence about the item analysis of multiple-choice tests in the area of communication skills. It is important to note that "the quality of a test largely depends on the quality of the individual items" (Oluseyi & Olufemi, 2012, p. 240).
Therefore, this study attempts to fill this gap by answering the following research questions: (1) What is the difficulty level (item facility) of each item on the communication skills test? (2) What is the discrimination index (item discrimination) of each item on the communication skills test? (3) What is the relationship between the facility and the discrimination index of the items on the communication skills test?

The present study was undertaken in the first semester of the 2019/20 academic year to assess some item writing flaws observed in a Communication Skills test at a Technical University in Ghana. The observed flaws were "longer sentences as answers among the options", "use of negative words", "starting a statement with a blank" and "options not arranged in alphabetical order". These flaws can introduce testwiseness into students' answering of the test. Testwiseness is defined as a student's capacity to utilize the characteristics and formats of a test and/or the test-taking situation to receive a high score (Millman, Bishop and Ebel, 1965). Flawed test items can provide test-wise cues to the items, thereby distorting the true performance of the student. Given the widespread use of MCIs in tertiary educational settings, it is practically important to look carefully at the quality of the MC items on classroom tests, and this was the specific purpose of this study. As noted above, one way of evaluating the quality of MC items is to examine the extent to which they conform to widely accepted item-writing guidelines. A second approach, and the one used in the research presented here, is to analyze the responses from testees. Specifically, we analyzed a teacher-made Communication Skills test administered to first-year IT students at a Technical University in Ghana and focused on how students' scores on the flawed items were affected by two major characteristics of MC items: facility and discrimination index.

Methodology

Participants

All 110 respondents were first-year undergraduate students pursuing a degree in Information Technology Education. These students were purposively selected because the instructor agreed to allow us to use his test items for the study. The test was conducted at a Technical University in Ghana. To ensure fairness, students were informed ahead of time to prepare for the test.

Instrument

The test consisted of 20 multiple-choice test items. The test items were used to assess students' communication, paragraphing and writing skills. The topics covered in the test constituted what had been taught in that semester. The items had four options, one being the correct answer and the other three being distractors. One of the items had only two options because it was a true/false item. In scoring the test, no penalty was applied for guessing and each correct answer was awarded a mark of 1. Thus, the maximum possible score on the test was 20 and the minimum 0. A copy of the test is included as an appendix.
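As an aside on the scoring rule just described: the authors scored and analysed the responses in Microsoft Excel. Purely as an illustration of the same dichotomous rule, and not of the authors' actual workflow, the short Python sketch below assumes a hypothetical answer key and hypothetical student option choices, and returns 0/1 item scores plus totals out of the number of items.

```python
# Illustrative only: hypothetical key and responses, not the study's data.
from typing import List

def score_test(key: List[str], responses: List[List[str]]) -> List[List[int]]:
    """Dichotomous scoring: 1 for a correct option, 0 otherwise (no guessing penalty)."""
    return [[1 if choice == k else 0 for choice, k in zip(student, key)]
            for student in responses]

key = ["C", "D", "A"]                           # hypothetical 3-item answer key
responses = [["C", "D", "B"], ["A", "D", "A"]]  # two hypothetical students
item_scores = score_test(key, responses)
totals = [sum(row) for row in item_scores]      # each student's total score
print(item_scores, totals)                      # [[1, 1, 0], [0, 1, 1]] [2, 2]
```

Scoring in this form also yields the student-by-item matrix that the analyses described in the next section operate on.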
Time Period and Procedure

Data was collected during the first semester of the 2019/20 academic year. The test was conducted under examination conditions. It was administered by the English language instructors of the university, and the students were given 25 minutes to answer the questions.

Data Analysis

Students' responses to the MCIs were analyzed using Microsoft Excel. The MCIs were analyzed to obtain the facility (p-value), the discrimination index (DI), and a distractor analysis for all incorrect options. The Kuder-Richardson formula (KR-20) was used to assess the internal reliability of the test scores. Data was analyzed based on the three research questions. Research questions #1 and #2 were answered using a Microsoft Excel template for the facility and discrimination indices. Research question #3 was answered using the Pearson product-moment correlation.

Item Evaluation Procedure

There are several methods available for evaluating multiple-choice items. For this study, the aim was to determine which items exhibited the best quality in terms of option performance. The evaluation consisted mainly of inspecting the facility and discrimination indices for each test item, calculated from the examinees' performance on the test. The facility is the proportion of correct responses to a test item. It is calculated using the formula p = R/T, where p is the facility, R is the number of correct responses, and T is the total number of responses (both correct and incorrect). According to Hotiu (2006), the p (proportion) value ranges from 0 to 1. When multiplied by 100, the p-value converts to a percentage: the percentage of students who got the item correct. The higher the p-value, the easier the item is understood to be. It needs to be conceptualized that a p-value is a behavioural measure. Instead of explaining the facility in terms of some intrinsic characteristic of the item, the facility is defined in terms of the relative frequency with which those taking the test choose the correct response (Thorndike, Cunningham, Thorndike, & Hagen, 1991).

The item DI is the point-biserial correlation between the item score and the corrected total score, that is, the total score on all other items. This was computed using a Microsoft Excel sheet. The advantage of this procedure is that it provides a more accurate assessment of the discriminating power of items, because it takes into account the responses of all students rather than just the high- and low-scoring groups. Because the discrimination index reflects the degree to which an item and the test as a whole measure a unitary ability, values of the coefficient tend to be lower for tests measuring a wide range of content areas than for more homogeneous tests. Item discrimination indices must always be interpreted in the context of the type of test being analyzed. The higher the DI, the better the test item discriminates between students with higher test scores and those with lower test scores.
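The facility and discrimination index defined above can be computed directly from a student-by-item matrix of 0/1 scores. The authors used a Microsoft Excel template; the following Python sketch is only an illustrative restatement of the same definitions (facility p = R/T, and DI as the point-biserial correlation between the item score and the corrected total score), applied to a small hypothetical score matrix rather than the study's data.

```python
import statistics

def facility(item_scores):
    """Facility p = R/T: proportion of correct responses to an item."""
    return sum(item_scores) / len(item_scores)

def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def discrimination(score_matrix, item):
    """Point-biserial correlation between an item (0/1) and the corrected total score."""
    item_col = [row[item] for row in score_matrix]
    corrected = [sum(row) - row[item] for row in score_matrix]  # total minus this item
    return pearson(item_col, corrected)

# Hypothetical 5-student, 3-item score matrix (rows = students, columns = items).
scores = [[1, 1, 0],
          [1, 0, 0],
          [1, 1, 1],
          [0, 0, 0],
          [1, 1, 1]]
for i in range(3):
    print(i + 1,
          round(facility([row[i] for row in scores]), 2),
          round(discrimination(scores, i), 2))
```

The same pearson helper, applied to the lists of facilities and discrimination indices across items, gives the product-moment correlation used to answer research question 3.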
Guidelines for evaluating MC items based on classical test theory, adapted from Haladyna and Rodriguez (2013), are provided in Table 1.

Table 1: Guidelines for evaluating test items (adapted from Haladyna & Rodriguez, 2013, p. 350)

Type  Difficulty   Discrimination  Comment
1     .60 to .90   Above .15       Ideal item; moderate difficulty and high discrimination
2     .60 to .90   Below .15       Poor discrimination
3     Above .90    Disregard       High-performance item; usually not very discriminating
4     Below .60    Above .15       Difficult but very discriminating
5     Below .60    Below .15       Difficult and non-discriminating
6     Below .60    Below .15       Identical to type 5, except that one of the distractors has a pattern like type 1, which signifies a key error

Results

In this study, the flawed items were more difficult than non-flawed items measuring the same content. The mean test score was 9; the lowest score was 3 and the highest was 15. A quick synopsis of the test results showed that 12 of the 20 items were flawed when assessed in the light of standard item forms. This represents 60% of the items. This observation was based on the revised taxonomy of multiple-choice item-writing guides by Haladyna, Downing and Rodriguez (2002). Teacher-made tests are known to contain many flaws. Four kinds of flaws were observed, namely: longer option as the answer, negatively worded item, options not arranged in alphabetical order, and starting with a blank. The most frequently violated rule was 'negatively worded item', which accounted for 8 of the 12 flawed test items. The reliability estimate for the test, measured by KR-20, was 0.41. According to Rudner and Schafer (2002), a teacher-made assessment needs to demonstrate a reliability coefficient of approximately 0.50 or 0.60.

The facilities for the test items ranged from .17 to .86. The mean facility for the test was 45%, that is, p = 0.45. The optimal facility for a classroom teacher-made test is .63. Comparing the test's mean facility with this optimal value indicates that the test was difficult. A possible reason for the difficult nature of the test could be that 60% of the items were flawed. Items 1 and 2 were very easy compared to the optimal facility for classroom achievement tests. Odukoya et al. (2018) observed that the majority of the items used in a private university in Nigeria (about 60 of the 70 items fielded) did not meet psychometric standards (of appropriate difficulty and distractive index) and consequently needed moderation or deletion; approximately 86% of those items failed to meet suitable psychometric properties. The current study shows that 12 of 20 items (60%) were flawed when assessed against standard item forms. This, therefore, corroborates the study by Odukoya et al. (2018) and suggests that teachers need to improve their item-writing skills. One danger associated with flawed items is that they introduce errors into the student's test score, thereby widening the difference between the observed score and the true score. It is these same error-laden results that will be used to award certificates to students. This calls on all involved in crafting test items, especially MCIs, to be abreast of current suggestions for writing multiple-choice test items.
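The reliability estimate reported above (KR-20 = 0.41) can likewise be reproduced from a scored 0/1 matrix. The sketch below is a minimal illustration of the Kuder-Richardson formula 20 for dichotomous items, again using a small hypothetical matrix rather than the study's responses.

```python
import statistics

def kr20(score_matrix):
    """Kuder-Richardson formula 20 for a matrix of dichotomous (0/1) item scores."""
    k = len(score_matrix[0])                      # number of items
    n = len(score_matrix)                         # number of examinees
    totals = [sum(row) for row in score_matrix]   # each examinee's total score
    p = [sum(row[i] for row in score_matrix) / n for i in range(k)]
    pq = sum(pi * (1 - pi) for pi in p)           # sum of item variances p * q
    var_total = statistics.pvariance(totals)      # variance of total scores
    return (k / (k - 1)) * (1 - pq / var_total)

scores = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0], [1, 1, 1]]  # hypothetical matrix
print(round(kr20(scores), 2))
```

Because KR-20 rises with the covariance between items and the total score, a test dominated by weakly discriminating items will generally return a low coefficient, which is consistent with the modest value observed here.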
Table 2: Item Properties

Item  Type of flaw                                Facility  Discrimination index
Q1    Longer option as the answer                 .86       .09
Q2    Negatively worded                           .70       .17
Q4    Negatively worded                           .66       .10
Q4    Options not arranged in alphabetical order  .66       .10
Q5    Starting with a blank                       .58       .40
Q6    Negatively worded                           .46       .11
Q7    Negatively worded                           .33       .11
Q9    Negatively worded                           .45       .15
Q10   Starting with a blank                       .49       .17
Q15   Negatively worded                           .46       -.02
Q16   Negatively worded                           .39       .08
Q18   Negatively worded                           .33       .22

Discussions

Longer option as the answer

Responses should be similar in length, and the shorter the better. If one option is much longer than the others, students will assume it is either the correct answer or blatantly the wrong answer, which gives them better odds at guessing. Responses differ in length because the teacher would like to add qualifying phrases to make sure the keyed option is correct. Many novice and experienced test constructors make this mistake in an effort to make the keyed response free from dispute (Haladyna & Downing, 1989). Test-wise students look for the "longest responses" to choose as answers when they are unsure of the correct option. Making the responses almost the same length reduces the bias of such items and improves the validity of the measurement. The item analysis showed that this item was very easy: 86% of the students answered it correctly, giving a facility of .86. The distractors for this item were not good enough to discriminate among the students. An example of an item with the longer option as the answer from the test is:

Q.1 Communication is a universal activity because it………………
A. is a credible source of data collection
B. create the right atmosphere of dialogue
C. enables people to give out or receive information
D. is therapeutic
Response key is C

According to Haladyna and Downing (1989), 8 of 9 studies suggest that using long correct options makes items easier, while one shows no difference. This study shows that the item with a longer option as the answer was the easiest of the 20 items used for the test. This particular item was the first on the test, and it can be argued that the first few items on a test should be easier to avoid students losing confidence. This agrees with the notion, regarding the arrangement of test items, that a test should start with easier items, place the difficult items in the middle and conclude with easy items. But however much a test developer wishes to fulfil this condition, they should not compromise on item-writing rules to justify the flaw. The item recorded a p-value of .86, which supports the rule that long options are often selected as the answer. This finding, therefore, corroborates Haladyna and Downing (1989). The item was also less discriminating between knowledgeable and non-knowledgeable students (DI = .09). Board and Whitney (1972), as cited in Haladyna and Downing (1989), posit that low-achieving students were inclined to take advantage of the option-length clue, whereas higher achievers were not. Higher achievers are disadvantaged when such items are prevalent in a test. It is, therefore, important that tests are rid of these items.
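The distractor behaviour described above, such as low scorers being drawn to the long option while high scorers are not, can be examined with a simple option-frequency count split by overall performance. The sketch below is illustrative only: it assumes hypothetical raw option choices and a hypothetical key (not the study's responses) and tallies how often each option was chosen by the upper and lower halves of the class.

```python
from collections import Counter

def distractor_analysis(choices, key, item):
    """Count option choices for one item, split by upper and lower halves on total score."""
    totals = [sum(1 for i, c in enumerate(row) if c == key[i]) for row in choices]
    ranked = sorted(range(len(choices)), key=lambda s: totals[s], reverse=True)
    half = len(ranked) // 2
    upper = Counter(choices[s][item] for s in ranked[:half])   # best-scoring half
    lower = Counter(choices[s][item] for s in ranked[-half:])  # worst-scoring half
    return upper, lower

key = ["C", "D"]                                            # hypothetical 2-item key
choices = [["C", "D"], ["A", "D"], ["C", "B"], ["A", "A"]]  # four hypothetical students
print(distractor_analysis(choices, key, 0))                 # option counts on item 1
```

A distractor that attracts almost no responses in either group, like option C of item 5 discussed in the next subsection, is a candidate for removal or revision.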
Starting a question with a blank

The stem can be written in two forms: as a question or as a partial sentence that requires completion. Research comparing these two formats has not demonstrated any significant difference in test performance (Violato, 1991; Haladyna, 1999; Masters et al., 2001). To facilitate understanding of the question being asked, it is recommended that if a partial sentence is used, a stem with parts missing at the beginning or in the middle should be avoided (Haladyna, 1999); the blank should instead come towards the end of the stem or sentence. It is natural that when conversing with someone you do not start with a blank for the other person to fill in; this does not make communication effective. Therefore, starting a stem or sentence with a blank distorts the meaning of the question and makes it difficult to answer. On this test, questions 5 and 10 started with a blank and recorded facilities of .58 and .49, respectively. From Table 1 it is obvious that their facilities are not within the range of ideal items. An example of an item that starts with a blank is:

Q.5 …………………………...are to move the reader to make a particular choice or to take a particular course of action.
A. Expository paragraphs
B. Mainstream paragraphs
C. Narrative paragraphs
D. Persuasive paragraphs
The key is D

The distractor analysis of item 5 indicates no selection for option C. In the entire test, item 5 was the only item that recorded a zero selection for an option. This particular option C was rendered implausible and unattractive even to the lower achievers because the item began with a blank. Three options would have worked well for this particular item instead of padding it with options that did not work. Options A and B together attracted less than 40% of responses, so clearly the options were not plausible at all. On the contrary, item 5, which started with a blank, recorded a discrimination index of 0.40, which is considered the optimal level of DI for multiple-choice test items. Even though item 5 is flawed based on the guidelines for writing MCIs, its item indices were excellent (p = .58, DI = .40). This finding does not conform to outcomes from other studies, which indicate that items beginning with a blank discriminate poorly between high and low achievers and tend to be more difficult. However, for all its excellent item characteristics, item 5 clearly violates the 'cover options test'. An important indicator of a well-written MCI, according to many writing guidelines, is that the question should allow testees to formulate a correct response without needing to first look at the available options, a criterion commonly referred to as the 'cover options test' (Case and Swanson, 2002). This guideline is violated by beginning a stem with a blank, since the respondent needs to try the options one after the other before the correct option can be selected.

Negatively worded items

In this study, 8 of the 20 items were constructed in the negative sense. Cautiously, these items can be deemed difficult based on the established range of p-values considered excellent, that is, between 40% and 60%. Item 15, which was constructed negatively, recorded a discrimination index of -.02. A negative value indicates an inverse relationship between the item and test performance. A total of 55 students got the item wrong.
This makes the item difficult. This finding is not consistent with Harasym, Price, Brant, Violato and Lorscheider (1992), who posited that negatively couched item stems were less difficult. The eight negatively worded items in this study recorded low discrimination indices. These items were confusing for the higher achievers but rather favoured the lower achievers. A survey of authors' guidelines on multiple-choice items showed that 63% were in favour of wording items positively, 19% supported using negatively worded items and 19% did not discuss the issue (Haladyna, Downing and Rodriguez, 2002). Dudycha and Carpenter (1973) speculated that a negative orientation in the stem makes the item more difficult because it requires a negative-to-positive shift in mental orientation to answer the question. The purpose of MCIs is to measure the achievement of learning objectives. To achieve instructional validity, a teacher must test what has been taught; since most learning objectives are not stated negatively, writing items in the negative will not help the teacher test students' knowledge of what has been taught. Negatively worded stems should only be used when a student must know what to avoid or what is not the case. Most often, students learn what to do and what is the case, and thus item stems should be positively worded (Harasym, Price, Brant, Violato & Lorscheider, 1992). A student who reads very fast may miss the 'not' keyword and, consequently, the entire meaning of the question. However, negatively couched items are useful when necessary to measure an appropriate objective (e.g., what is not correct), provided the negative words (least, except, not) are underlined, highlighted, boldfaced, italicized or CAPITALIZED to caution the individual taking the test. It is also important to word each option positively to avoid forming a double negative with the stem. There are some legitimate uses of negative terms, such as the case of medications or procedures that are contraindicated; this use may be legitimate in that "contraindication" is a straightforward concept in health care domains.

Q.15 Which of the following is NOT a feature of a good note?
A. Descriptive
B. Readable
C. Reflects the source
D. Understandable
The key is A

Options arranged in order (alphabetical or sequential)

One of the most ignored multiple-choice item-writing guidelines is to arrange options in alphabetical, chronological, or conceptual order. The purpose of this guideline is to ease reading and to make the options appear attractive to the test taker. Options can be arranged in either ascending or descending order. It is difficult to observe this guideline, especially for novice teachers, when the options are sentences. It is also not easy to obey, because observing it could create a discernible pattern for the correct options on a test. In analysing these two items, it was valuable to look at the discrimination index (DI) and the distractor analysis. The DI indicates the relationship between performance on an individual item and performance on the overall test. The discrimination index for item Q4 was .01, which indicates that Q4 falls below the acceptable range and should be rejected or improved.
Arranging options in an order is not considered important by most class teachers when constructing items, but the indices obtained in this study show that it must be taken seriously. The p-value for Q4 was .66. An example of an item with options not arranged in alphabetical order is:

Q4. The concluding paragraph has the following functions EXCEPT to……..
A. introduce new research idea
B. refer to cause or effect of issues
C. summarize main ideas
D. suggest solutions to issues
The key is A

The first two options were correctly ordered, but option (d) should have come before option (c).

Correlation between the difficulty index and discrimination index

A Pearson product-moment correlation coefficient was computed to assess the relationship between the facility and the discrimination index. There was a weak positive correlation between the two variables, r = 0.162, n = 20, p = 0.496. The maximal discrimination index was D = 0.40. A scatter plot (Figure 1) represents the relationship between the difficulty index (P) and the discrimination index (D) of the 20 MC items. The plot is not linear but very scattered in shape, which indicates a weak relationship between the difficulty index and the discrimination index: increasing difficulty indices do not correlate with increases in the discrimination index. It is seen from the scatter plot that only three items (15%) recorded negative discrimination.

Table 3: Correlation between facility and discrimination index

                                            Discrimination index   Facility
Discrimination index  Pearson correlation   1                      .162
                      Sig. (2-tailed)                              .496
                      N                     20                     20
Facility              Pearson correlation   .162                   1
                      Sig. (2-tailed)       .496
                      N                     20                     20

Figure 1: Scatter plot of the discrimination index against the facility (difficulty index) for the 20 items.

Recommendations for Improvement

There are many innovative and easy-to-implement strategies that can help teachers improve their knowledge and skills in constructing MCIs. For example, team item writing, which leverages the expertise of colleagues and senior faculty members, can help produce well-composed test items. New faculty must be oriented toward constructing MCIs and be assigned to experienced hands. Another strategy is 'nudging' and 'shoving', where distractors are easily manipulated to alter an item's facility. According to Quaigrain and Arhin (2017), the quality of an MCI rests on the presence of quality distractors. This assertion supports the suggestion of moving from the traditional four or five options to three options, because doing so eases the challenge of producing more than two plausible distractors without affecting students' performance. On the other hand, reducing the number of options tends to increase students' chances of guessing correctly, although this is still better than padding the options with non-functional distractors. Teachers must understand that crafting multiple-choice items is both a science and an art, and both must be employed fully to get efficient items that will produce valid results. Test items must be written solely on the basis of learning objectives so that the teacher knows exactly what each item measures.
This will in part reduce the flaws in the items and help ensure instructional validity. Jozefowicz, Koeppen, Case, Galbraith, Swanson and Glew (2002) posit that teachers spend substantial time planning their lectures and course materials for students but allow insufficient time for test preparation and review before administration. Consequently, many tests are administered to students without adequate pretesting to check the quality of the items. Before a test is administered, a review by an examinations review board whose members have adequate knowledge of item writing can eliminate flawed items.

Conclusion

The purpose of the study was to analyze the responses obtained from a communication skills test in terms of facility and discrimination indices. The overall analysis shows that most of the items were ideal: they had acceptable facility and discrimination indices based on the guidelines for evaluating test items (Haladyna & Rodriguez, 2013, p. 350). On the other hand, some items were not ideal and were found to need revision to improve their discriminating power and the quality of the examination. It is worth noting that firmly following the guidelines for writing multiple-choice test items can reduce the number of flawed items on a test. Some teachers who write multiple-choice items are either unaware of the guidelines or find them too laborious to use; as a result, flawed items find their way onto tests. The guidelines serve as a compass that takes the item writer to the destination without going astray. The ultimate aim of all item writers is to have good items that will produce valid test scores. After all, students who pass a poorly designed examination, although they do not possess adequate knowledge of its content, may constitute a real threat to themselves and society at large.

Studies show that tests with item-writing flaws tend to disadvantage high-achieving students and lower their test scores (Tarrant & Ware, 2008). In contrast, tests with item-writing flaws can improve the grades of weaker students who are not familiar with the content of the test (Nedeau-Cayo et al., 2013; Tarrant & Ware, 2008). We are, therefore, of the view that, in contrast to these undisputed views of authorities, using flawed MCIs can play a valuable role in the development (and not simply the measurement of academic performance) of students' critiquing abilities. Students thereby become active observers of the learned materials and the objectives of the lessons taught. This will help in the development of multiple-choice items, making them stand the test of time and become robust, but it will come at a cost to both teachers and students. Finally, the findings of this study have significance for practising teachers and test developers: particular care should be taken when selecting or crafting new items to achieve an accurate measurement of students' behaviour. Also, item analyses should be utilized to improve existing test items.

Assessment Implications

On the strength of the findings of this study, the use of MCIs should begin with the preparation of a test blueprint that carefully adheres to the rules for writing multiple-choice items. Thereafter, all items should be pre-tested, analysed, and subjected to item moderation to augment the overall content and construct validities. These processes will require the input of subject and psychometric specialists.
To ensure that faculty use quality test items in our tertiary institutions, these processes should be established as statutory quality-assurance procedures. In sum, any time a teacher is deciding on a test, the following must be carefully considered to ensure maximum gains from the test:

(a) The test's specific purpose. The traditional assumption is that tests are used to determine whether students have learned what they were expected to learn, or the level or degree to which they have learned the material. Beyond this, the test's purpose might be that the teacher intentionally plans to improve students' performance, or to help students estimate what they are capable of doing outside the classroom. Hence, pre-testing students to determine their previous knowledge before introducing a new topic is an important teaching strategy. Teachers are aware that continuously assessing students enables them to adjust their instruction appropriately.

(b) What kind of information is required from the test results. Test results provide vital information to both the testee and the teacher. The teacher must consider what type of information the test scores are to provide to the student.

(c) The impact of test results on students. Test results must have a positive impact on students. They should provide feedback that will motivate students to learn and improve their learning.

Author Contribution

• Ato Kwamina Arhin: introduction, literature review, data analysis, discussion, and conclusion.
• Jonathan Essuman: literature review, methodology and conclusion.
• Ekua Arhin: data analysis, discussion and conclusion.

References

Alderson, J. C. (2000). Assessing reading. Cambridge: Cambridge University Press.

Case, S. M., & Swanson, D. B. (2002). Constructing written test questions for the basic and clinical sciences (3rd ed., pp. 31-66). Philadelphia: National Board of Medical Examiners. http://www.nbme.org/publications/item-writing-manual.html

Dudycha, A. L., & Carpenter, J. B. (1973). Effects of item format on item discrimination and difficulty. Journal of Applied Psychology, 58, 116-121.

Downing, S. M. (2002). Construct-irrelevant variance and flawed test questions: Do multiple-choice item writing principles make any difference? Academic Medicine, 77, 103-104.

Downing, S. M. (2005). The effects of violating standard item writing principles on tests and students: The consequences of using flawed test items on achievement examinations in medical education. Advances in Health Sciences Education: Theory and Practice, 10(2), 133-143.

Downing, S. M. (2006). Twelve steps for effective test development. In S. M. Downing & T. M. Haladyna (Eds.), Handbook of test development (pp. 3-25). Mahwah, NJ: Lawrence Erlbaum Associates.

Downing, S. M., & Haladyna, T. M. (1997). Test item development: Validity evidence from quality assurance procedures. Applied Measurement in Education, 10, 61-82. http://dx.doi.org/10.1207/s15324818ame1001_4

Ebel, R. L., & Frisbie, D. A. (1991). Essentials of educational measurement (5th ed.). Englewood Cliffs, NJ: Prentice-Hall.

Frederiksen, N. (1984). The real test bias: Influences of testing on teaching and learning. American Psychologist, 39, 193-202.

Haladyna, T. M. (1999).
When should we use a multiple-choice format? Paper presented at the annual meeting of the American Educational Research Association, Montreal, Canada.

Haladyna, T. M., & Downing, S. M. (1989). Validity of a taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 51-78. http://dx.doi.org/10.1207/s15324818ame2014_4

Haladyna, T. M. (2004). Developing and validating multiple-choice test items (3rd ed.). Mahwah, NJ: Lawrence Erlbaum.

Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A review of multiple-choice item-writing guidelines for classroom assessment. Applied Measurement in Education, 15, 309-344. http://dx.doi.org/10.1207/S15324818AME1503_5

Haladyna, T. M., & Downing, S. M. (2004). Construct-irrelevant variance in high-stakes testing. Educational Measurement: Issues and Practice, 23(1), 17-27.

Haladyna, T. M., & Rodriguez, M. C. (2013). Developing and validating test items. New York, NY: Routledge.

Harasym, P. H., Doran, M. L., Brant, R., & Lorscheider, F. L. (1992). Negation in stems of single-response multiple-choice items. Evaluation and the Health Professions, 16(3), 342-357. https://doi.org/10.1177/016327879201500205

Jozefowicz, R. F., Koeppen, B. M., Case, S., Galbraith, R., Swanson, D., & Glew, R. H. (2002). The quality of in-house medical examinations. Academic Medicine, 77, 156-161. http://dx.doi.org/10.1097/00001888-200202000-00016

Lane, S., Raymond, M. R., Haladyna, T. M., & Downing, S. M. (2016). Test development process. In S. Lane, M. R. Raymond, & T. M. Haladyna (Eds.), Handbook of test development (pp. 3-18). New York: Routledge.

McKeachie, W. J. (1999). Teaching tips: Strategies, research, and theory for college and university teachers (10th ed.). Boston: Houghton Mifflin.

Masters, J. C., Hulsmeyer, B. S., Pike, M. E., Leichty, K., Miller, M. T., & Verst, A. L. (2001). Assessment of multiple-choice questions in selected test banks accompanying textbooks used in nursing education. Journal of Nursing Education, 40, 25-32.

Mehrens, W. A., & Lehmann, I. J. (1991). Measurement and evaluation in education and psychology. New York: Harcourt Brace College Publishers.

Millman, J., Bishop, C. H., & Ebel, R. (1965). An analysis of testwiseness. Educational and Psychological Measurement, 25, 707-726.

Nedeau-Cayo, R., Laughlin, D., Rus, L., & Hall, J. (2013). Assessment of item-writing flaws in multiple-choice questions. Journal for Nurses in Professional Development, 29, 52-57.

Odukoya, J. A., Adekeye, O., Igbinoba, A. O., & Afolabi, A. (2018). Item analysis of university-wide multiple choice objective examinations: The experience of a Nigerian private university. Quality & Quantity, 52, 983-997. https://doi.org/10.1007/s11135-017-0499-2

Oluseyi, A. E., & Olufemi, A. T. (2012). The analysis of multiple-choice item of the test of an introductory course in chemistry in a Nigerian university. International Journal of Learning, 18(4), 237-246. doi:10.18848/1447-9494/CGP/v18i04/47579

Pellegrino, J., Chudowsky, N., & Glaser, R. (2001). Knowing what students know: The science and design of educational assessment. Washington, D.C.: National Academy Press.
Quaigrain, K., & Arhin, A. K. (2017). Using reliability and item analysis to evaluate a teacher-developed test in educational measurement and evaluation. Cogent Education, 4(1), 1301013. https://doi.org/10.1080/2331186X.2017.1301013

Rudner, L. M., & Schafer, W. D. (2002). What teachers need to know about assessment. Washington, DC: National Education Association. Retrieved from http://echo.edres.org:8080/nea/teachers.pdf

Rush, B. R., Rankin, D. C., & White, B. J. (2016). The impact of item-writing flaws and item complexity on examination item difficulty and discrimination value. BMC Medical Education, 16, 250. https://doi.org/10.1186/s12909-016-0773-3

Tarrant, M., & Ware, J. (2008). Impact of item-writing flaws in multiple-choice questions on student achievement in high-stakes nursing assessments. Medical Education, 42, 198-206. http://dx.doi.org/10.1111/j.1365-2923.2007.02957x

Thorndike, R. M., Cunningham, G. K., Thorndike, R. L., & Hagen, E. P. (1991). Measurement and evaluation in psychology and education (5th ed.). New York, NY: MacMillan.

Violato, C. (1991). Item difficulty and discrimination as a function of stem completeness. Psychological Reports, 69(3), 739-743.

Knoema. (2021, March). Ghana - gross enrolment ratio in tertiary education. https://knoema.com/atlas/Ghana/topics/Education/Tertiary-Education/Gross-enrolment-ratio-in-tertiary-education

Appendix

DEPARTMENT OF LANGUAGES EDUCATION
MID-SEMESTER EXAMINATION
PAPER TITLE: COMMUNICATION SKILLS
DURATION: 25 MINS
PAPER CODE: GPD 111
INDEX NUMBER: ……………………………
CLASS: …………………………………...

Answer all the questions on the question paper.

1. Communication is a universal activity because _________________ .
a. it is a credible source of data collection
b. it creates the right atmosphere of dialogue
c. it enables people to give out or receive information
d. it is therapeutic

2. Which one of the following does NOT constitute one of the reasons why we communicate?
a. To establish relations
b. To persuade
c. To share information
d. To test the efficacy of words

3. Non-verbal communication largely involves the use of _________________ .
a. cues
b. posters
c. symbols
d. vision

4. The concluding paragraph has the following functions EXCEPT _______________ .
a. to introduce new research idea
b. to refer to cause or effect of issues
c. to summarize main ideas
d. to suggest solutions

5. _______________________ are to move the reader to make a particular choice or to take a particular course of action.
a. Expository paragraphs
b. Mainstream paragraphs
c. Narrative paragraphs
d. Persuasive paragraphs

6. Paragraphs can be distinguished according to the following EXCEPT ____________ .
a. Function
b. Length
c. Position
d. Unity

7. Which one of the following is not a major drawback in effective communication?
a. Distortion
b. Faking attention
c. Noise
d. Semantic distraction
8. New perspectives are discovered when the author _______________ the work.
a. edits
b. proofreads
c. simmers
d. writes

9. A group of students were given a topic to write for an assignment. The students have to go through the following stages EXCEPT __________________ .
a. drafting
b. prewriting
c. revision
d. submission

10. ________________ is a reading technique that aims at understanding and obtaining of a story or text.
a. Close reading
b. Scanning
c. Skimming
d. Studying

11. The final step in the pre-writing stage of the writing process is _______________ .
a. brainstorming
b. clustering
c. editing
d. outlining

12. The stage in the communication process where the recipient seeks the correct meaning of the message is called _____________ .
a. channelling
b. decoding
c. feedback
d. interpretation

13. The use of siren by the police, fire service or ambulance to suggest urgency of the situation is an example of
a. Haptics
b. Kinesics
c. Objectics
d. Oculesics

14. Converting an idea into written or spoken form of language is called __________ .
a. Decoding
b. Encoding
c. Ideation
d. Interpretation

15. Which of the following is NOT a feature of a good note?
a. Descriptive
b. Readable
c. Reflects the source
d. Understandable

16. All the following are ways of writing notes EXCEPT ________________ .
a. Detailing
b. Headline
c. Paraphrasing
d. Spidergram

17. Which of the following reading techniques is employed at the Survey Stage of the SQ4Rs Method?
a. Browsing
b. Drifting
c. Scanning
d. Skimming

18. Which of the following is NOT a negative reading habit?
a. Fixation
b. Regression
c. Stress
d. Vocalization

19. The type of reading that is undertaken for academic and professional purposes is ___ .
a. Diving
b. Extensive reading
c. Faster reading
d. Intensive reading

20. Any paragraph that lacks Coherence would lack Unity.
a. TRUE
b. FALSE