ENDOUROLOGY AND STONE DISEASE

Inter-observer Agreement between Urologists and Radiologists in Interpreting the Computed 
Tomography Images of Emergency Patients with Renal Colic

Jun Young Hong1, Dong Hoon Lee2*, In Ho Chang3, Sung Bin Park4, Chan Woong Kim5, Byung Hoon Chi6

Purpose: Low-dose non-enhanced computed tomography (LDCT) has been shown to provide low radiation expo-
sure with proper diagnostic accuracy compared to standard dose non-enhanced computed tomography (SDCT) in 
patients with renal colic. The goal of our study is to estimate the accuracy of LDCT and SDCT interpretation by 
emergency medicine residents who primarily treated patients with renal colic.

Materials and Methods: Thirty sample images each of LDCT and SDCT from renal colic patients were extracted 
from January 2013 to December 2015 in a tertiary teaching hospital. Five emergency medicine residents interpreted 
60 image samples over a time span of 3 weeks. The presence of a ureteric stone, the stone’s size and location, 
and signs of obstruction were recorded in the reports. A total of 300 reports were compared with formal readings 
by a radiologist. The inter-observer agreement and kappa value were calculated for comparative analysis. 

Results: Identification of ureteric stones showed almost perfect inter-observer agreement on SDCT (kappa value: 
0.93), and the percentage of agreement was 96.7%. However, on LDCT, the inter-observer agreement was substan-
tial (kappa value: 0.73), and the percentage of agreement was 88.0%. 

Conclusion: Using SDCT, emergency medicine residents had almost perfect inter-observer agreement in interpret-
ing the CT images of patients with renal colic compared to a radiologist. However, when using LDCT, they had a 
lower inter-observer agreement. 

Keywords: emergency department; non-enhanced computed tomography; radiation dose; renal colic; urolithiasis. 

INTRODUCTION

Approximately 12 percent of males and 6 percent of females will experience urolithiasis during 
their lifetime, and up to 50 percent of these individuals 
will experience a recurrence of urolithiasis within 10 
years (1-3). Renal colic is a common symptom seen in the 
emergency department (ED). In the United States, more 
than a million patients are treated for urolithiasis in an 
emergency department over the span of a year(4).
In the past, intravenous urography (IVU) was the imag-
ing method of choice for diagnosing urolithiasis. How-
ever, unenhanced helical computed tomography (CT) 
has become the standard for diagnosing acute flank 
pain, and has replaced IVU as the best initial diagnostic 
imaging modality in patients with renal colic (5). Furthermore,
CT examination is often repeated to assess
the progress of the condition. In 2007, Broder et al. reported
that approximately half of the patients who had
been diagnosed with urolithiasis in the ED received two
more CT scans over the course of their condition and
that approximately 30 percent of these patients underwent
more than three scans (6). The risk of cancer is increased
at a rate greater than 1/1000 per abdominal CT
scan, and the risk is higher in young patients (7,8). Therefore,
a means to reduce radiation exposure is needed, 
and low-dose CT (LDCT) was studied as a diagnostic 
modality.

1Department of Emergency Medicine, College of Medicine, Chung-Ang University, Seoul, Republic of Korea. 
2Department of Emergency Medicine, College of Medicine, Chung-Ang University, Seoul, Republic of Korea.
3Department of Urology, College of Medicine, Chung-Ang University, Seoul, Republic of Korea.
4Department of Radiology, College of Medicine, Chung-Ang University, Seoul, Republic of Korea.
5Department of Emergency Medicine, College of Medicine, Chung-Ang University, Seoul, Republic of Korea.
6Department of Urology, College of Medicine, Chung-Ang University, Seoul, Republic of Korea.
*Correspondence: Department of Emergency Medicine, College of Medicine, Chung-Ang University, Seoul, 
Republic of Korea.
Tel: 82-2-6299-3109. E-mail: emdhlee@cau.ac.kr.
Received March 2017 & Accepted November 2017

The correct interpretation of urolithiasis by an emer-
gency physician via CT images could be advantageous 
for the early diagnosis and treatment of renal colic pa-
tients. SDCT interpreted by emergency physicians has 
an appropriate percentage of inter-observer agreement 
compared with formal reporting by a radiologist(9). 
However, there has not been a study that evaluated the 
accuracy of the LDCT interpretation of urolithiasis by 
emergency physicians. In this study, we compared the 
accuracy of LDCT interpretation by emergency medicine
residents with that of a radiologist.

METHODS
This study was approved by the institutional review 
board of the Chung-Ang University Hospital (IRB No. 
C2016023). Written informed consent was obtained 
from each participant. Our hospital has a 4-year
emergency medicine residency program. Five emergency
medicine (EM) residents (two junior and three
senior residents) of Chung-Ang University Hospital 
were included to compare the accuracy of interpretation 
of LDCT. 
Study design
This study retrospectively reviewed CT images of renal 
colic patients obtained in the emergency department. Five 
emergency medicine residents each interpreted 60 patient 
CT scans over a time span of 3 weeks, producing a total
of 300 reports. A simple reporting method was provided 
to the EM residents. Each interpretation was recorded 
on the reporting form, which included brief clinical in-
formation. The case report form included the presence 
of ureteric stones, their size and location, and signs of 
obstruction. Other clinical findings that were unrelated 
to ureteric stones were recorded to create a descriptive 
clinical picture. The participants’ reports were com-
pared for inter-observer agreement with reports by a 
professional radiologist.
Sampling Images
A total of 974 patient image samples were available from
unenhanced abdominopelvic CT performed in the emergency
department from January 2013 to December 
2015. During this period, another study was conducted
to compare the diagnostic efficacy of LDCT with 
SDCT (Title: Diagnostic Trial of Low-Dose CT for the 
Detection of Urolithiasis; IRB No. C2013234(1194)). 
Thirty LDCT and thirty SDCT image samples were randomly
extracted from the 974 patient image samples,
anonymized, and randomized. A total of 60 patient CT 
images were used for the interpretation.
CT protocol 
All of the unenhanced CT studies were performed using 
a 256-MDCT scanner (Brilliance iCT, Philips Health-
care, Cleveland, OH, USA). All patients underwent 
a scan using the standard- or low-dose protocol from 
the proximal aspect of the T12 vertebra to the distal 
aspect of the symphysis pubis in the supine position. 
The standard-dose and low-dose protocols were 
acquired at manually set peak tube voltages of 120 
kVp and 100 kVp, respectively, with automated Z-axis dose
modulation based on the scout image (DoseRight, Philips Healthcare, 
Cleveland, OH, USA), and the tube current was limited 
to 150 mAs and 100 mAs, respectively. The remaining 
scanning parameters were as follows: detector config-
uration, 128x0.625; pitch, 0.915; beam collimation, 
80 mm; rotation time, 0.4 sec; and helical acquisition. 

Image noise in the acquired scan images was reduced by 
iterative reconstruction, which allowed the radiation 
dose to be reduced from 5.77 mSv to 1.34 mSv.
Sample size and statistical analysis
Inter-observer agreement was used to compare the diagnostic
performance of EM residents with that of a radiologist
on LDCT and SDCT. The kappa coefficient 
was calculated using the R statistical computing program
(R Foundation for Statistical Computing, Vienna, 
Austria. http://www.R-project.org/). We considered a 
kappa value of ≤ 0.19 as poor, 0.20-0.39 as fair,
0.40-0.59 as moderate, 0.60-0.79 as substantial,
and ≥ 0.80 as almost perfect (10). Assuming an expected
lower boundary of 0.5 for the one-sided 95% confidence
interval (CI) of kappa, and an anticipated kappa 
value and prevalence of 0.73 and 0.5, respectively, 
based on a previous study, a minimum of 146 subjects 
was required for this study of inter-observer agreement 
between 2 raters. We estimated the sample size using the
kappaSize package in R (R Core 
Team [2012]. R: A language and environment for statistical
computing; R Foundation for Statistical Computing,
Vienna, Austria. http://www.R-project.org/).
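
The agreement statistic described above can be illustrated with a minimal Python sketch. It computes Cohen's kappa for two raters making a binary stone/no-stone call and maps the value to the agreement categories used in this study. This is only a sketch under stated assumptions: the study itself used R, the function names are hypothetical, and the example counts are invented for illustration and are not study data.

```python
def cohen_kappa_2x2(both_pos, rater_only_pos, reference_only_pos, both_neg):
    """Cohen's kappa for two raters making a binary call on the same cases."""
    n = both_pos + rater_only_pos + reference_only_pos + both_neg
    # Observed agreement: proportion of cases where both raters gave the same call.
    observed = (both_pos + both_neg) / n
    # Marginal proportion of positive calls for each rater.
    rater_pos = (both_pos + rater_only_pos) / n
    reference_pos = (both_pos + reference_only_pos) / n
    # Agreement expected by chance from the marginals alone.
    expected = rater_pos * reference_pos + (1 - rater_pos) * (1 - reference_pos)
    return (observed - expected) / (1 - expected)


def agreement_category(kappa):
    """Map kappa to the categories stated in the Methods of this article."""
    if kappa >= 0.80:
        return "almost perfect"
    if kappa >= 0.60:
        return "substantial"
    if kappa >= 0.40:
        return "moderate"
    if kappa >= 0.20:
        return "fair"
    return "poor"


# Hypothetical counts for illustration only (not taken from the study).
kappa = cohen_kappa_2x2(both_pos=40, rater_only_pos=5,
                        reference_only_pos=5, both_neg=50)
category = agreement_category(kappa)
print(f"kappa = {kappa:.3f} ({category})")
```

Kappa corrects the raw percentage of agreement for the agreement expected by chance, which is why it is reported alongside the percentage of agreement.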

RESULTS
This study included 44 men and 16 women. The mean 
age was 47.5 years (inter-quartile range: 34.25 to 
59.75). Overall, 55% (n = 33) of the CT images were 
positive for urolithiasis, and 45% (n = 27) were nega-
tive for urolithiasis. All five EM residents who partici-
pated in this study had experience with more than 1000 
scans for SDCT and fewer than 100 scans for LDCT.  
When identifying ureteric stones on SDCT, the percentage
of agreement between the residents and the radiologist 
was 96.7%, and the inter-observer agreement was almost 
perfect (kappa value: 0.93). On LDCT scans, however,
the percentage of agreement was 88.0%, 
and the inter-observer agreement was substantial
(kappa value: 0.73). LDCT interpretation
by the EM residents had a negative predictive 
value of 75.5% relative to the radiologist’s interpretation,
considerably lower than the 97.8% negative
predictive value on SDCT scans (Table 1).
The interpretations of the size and location of 
ureteric stones were almost perfect in terms of inter-observer
agreement (kappa values: 0.85 and 0.93, respectively) on SDCT and 

Table 1. Diagnostic performance of identifying urolithiasis

          Agreement (95% CI)  Kappa  Sensitivity (%)   Specificity (%)   PPV† (%)          NPV‡ (%)
Total CT  92.3 (89.3-95.4)    0.85   90.9 (86.5-95.3)  94.1 (90.0-98.1)  94.9 (91.5-98.4)  89.4 (84.3-94.6)
SDCT      96.7 (93.8-99.6)    0.93   96.7 (92.0-100)   96.7 (92.9-100)   95.1 (89.5-100)   97.8 (94.6-100)
LDCT      88.0 (82.7-93.3)    0.73   87.6 (81.2-94.0)  88.9 (79.3-98.4)  94.8 (90.4-99.3)  75.5 (63.5-87.4)

†PPV, positive predictive value; ‡NPV, negative predictive value
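
To show how the figures in Table 1 relate to one another, the following Python sketch derives the percentage of agreement, kappa, sensitivity, specificity, PPV, and NPV from a 2x2 table of resident reports against the radiologist's reading. The article does not give the underlying counts. The counts used here are an assumption: they were back-calculated so that, with 150 reports per protocol, they reproduce the rounded percentages in Table 1. The function name is hypothetical.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Agreement and diagnostic metrics from a 2x2 table (reference = radiologist)."""
    n = tp + fn + fp + tn
    agreement = (tp + tn) / n
    # Chance agreement from the marginal totals of both readers.
    expected = ((tp + fn) / n) * ((tp + fp) / n) + ((fp + tn) / n) * ((fn + tn) / n)
    return {
        "agreement_pct": 100 * agreement,
        "kappa": (agreement - expected) / (1 - expected),
        "sensitivity_pct": 100 * tp / (tp + fn),
        "specificity_pct": 100 * tn / (tn + fp),
        "ppv_pct": 100 * tp / (tp + fp),
        "npv_pct": 100 * tn / (tn + fn),
    }


# ASSUMED counts, back-calculated from the percentages in Table 1
# (150 resident reports per protocol); they are not stated in the article.
sdct = diagnostic_metrics(tp=58, fn=2, fp=3, tn=87)
ldct = diagnostic_metrics(tp=92, fn=13, fp=5, tn=40)

for name, metrics in (("SDCT", sdct), ("LDCT", ldct)):
    summary = ", ".join(f"{key}={value:.2f}" for key, value in metrics.items())
    print(f"{name}: {summary}")
```

Under these assumed counts, the lower NPV on LDCT comes from the larger number of reports in which the residents called a scan negative while the radiologist identified a stone.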

Table 2. Diagnostic performance of sign of obstruction, stone size and location

          Sign of urinary obstruction  Stone size (5 mm)            Stone location
          Agreement (95% CI)  Kappa    Agreement (95% CI)  Kappa    Agreement (95% CI)  Kappa
Total CT  76.3 (71.5-81.2)    0.52     85.0 (80.9-89.1)    0.76     89.3 (85.8-92.8)    0.84
SDCT      86.0 (80.4-91.6)    0.71     91.3 (86.8-95.9)    0.85     91.3 (86.8-95.9)    0.93
LDCT      66.7 (59.0-74.3)    0.34     78.7 (72.0-85.3)    0.66     78.7 (72.0-85.3)    0.76

Interpretation of low-dose CT in ED-Hong et al.

Vol 15 No 02   March-April 2018  7



were substantial in terms of inter-observer agreement (kappa 
values: 0.66 and 0.76, respectively) on LDCT (Table 2). For signs
of obstruction, the kappa value was 0.71 on SDCT and 
0.34 on LDCT (Table 2).

DISCUSSION
Rafi et al. evaluated the accuracy of emergency physicians’
interpretation of conventional CT scans for patients
with renal colic and reported a sensitivity of 
92%, a specificity of 99%, and a kappa value of 0.89 (9). 
These results indicate that emergency physicians can 
interpret SDCT images almost perfectly for patients
with renal colic in the ED. Accordingly, emergency 
physicians in many EDs use non-enhanced CT to evaluate
patients who present with renal colic. Recently, 
low-dose non-enhanced helical CT has been studied in
patients with renal colic as a way to reduce the radiation
risk of SDCT. Several reports have found that 
LDCT has high sensitivity and specificity for the
diagnosis of urolithiasis when interpreted by radiologists 
and urologists (11,12).
In our study, the kappa value for SDCT was 0.93 (Table 1), which 
was similar to that found in Rafi’s previous study. 
Kwon et al. noted that, in recent studies, LDCT in patients
with renal colic demonstrated sensitivity 
and specificity similar to those of conventional
standard-dose CT (SDCT) (13-15). However, there have not 
been studies on the accuracy of LDCT interpretation 
performed by emergency physicians. In this study, we 
evaluated the agreement of interpretation on LDCT; the 
kappa value was 0.73, which is lower than that 
of SDCT. Thus, when LDCT was used in the ED and 
the result was read by an emergency medicine resident, 
some patients could have been misdiagnosed, although 
the final confirmation of interpretation was made by a 
radiologist.
Yang et al. reported that the diagnostic performance of 
low-dose appendiceal CT was influenced by the amount 
of a physician’s experience with both low- and stand-
ard-dose CT interpretation(12). Urologists with an appro-
priate amount of experience seem to frequently be in 
agreement with radiologists on LDCT scans. However, 
our participants (emergency medicine residents) had 
each worked with more than 1000 SDCT scans over a year; 
therefore, they were familiar with images of SDCT. 
According to this study, emergency medicine residents 
could find urolithiasis in the images of SDCT as well as 
a radiologist could and could interpret the exact loca-
tion and size. Therefore, there was minimal difficulty in 
making a clinical decision with SDCT. In contrast, the 
images from LDCT were coarser than those of SDCT 
because of the lower radiation dose. The emergency medicine
residents had little experience with interpreting LDCT
images prior to this study. Each resident had 
worked with fewer than 100 LDCT scans, and they 
were not trained in the interpretation of LDCT during 
the study period. Therefore, they were not familiar with 
the coarse, lower-quality LDCT images. In this study, 
emergency medicine residents simply read the images 
of LDCT based on previous knowledge and compe-
tence with SDCT. To improve the accuracy of interpret-
ing LDCT images, emergency medicine residents may 
be required to have sufficient experience and training.
In this study, signs of obstruction and the size and location of 
ureteric stones had substantial to almost perfect inter-observer
agreement (kappa values: 0.71, 0.85, and 0.93, respectively) 
compared with formal readings on SDCT (Table 2). In 
contrast, fair to substantial inter-observer agreement 
(kappa values: 0.34, 0.66, and 0.76, respectively) was observed on LDCT 
scans. The signs of obstruction and the size and location
of ureteric stones are important for determining 
the prognosis and first-line treatment for renal colic 
patients (16). Therefore, emergency medicine residents 
should be trained in the interpretation of LDCT.
When emergency medicine residents could find stones 
in LDCT or SDCT, they had little difficulty with in-
terpreting the characteristics of urolithiasis. When they 
had not been trained to interpret the low-quality imag-
es of LDCT, it was difficult to deduce the presence of 
a stone. Therefore, if they have more experience with 
LDCT images and receive training on the interpretation 
of these images, LDCT might be as useful in the assess-
ment of urolithiasis as SDCT.
LDCT has been reported to have adequate diagnostic 
performance while reducing the risk of cancer from 
radiation as compared with SDCT in a variety of dis-
eases(17-19). Accordingly, LDCT was used for the exam-
ination of several diseases in some EDs. If emergency 
physicians can properly interpret LDCT without wait-
ing for a formal reading, they can potentially determine 
the appropriate treatment course and prognosis in the 
ED more expediently. As our study showed, however, the
interpretation of LDCT by emergency medicine residents 
had lower inter-observer agreement with the formal
reading than the interpretation of SDCT. For proper
interpretation of LDCT scans in renal colic patients,
additional experience and education may be required.
Limitations
The participants enrolled in this study were all from one 
tertiary medical center. Therefore, the sample may not 
represent the accuracy of LDCT interpretation by 
emergency physicians in general. However, our participants had a 
similar accuracy of interpretation on SDCT compared 
with a previous study that included emergency physi-
cians. We used a lower greyscale monitor compared 
to radiologists, who use a higher greyscale monitor for 
formal reading. This could have affected the diagnostic 
accuracy of our participants due to the lower imaging 
quality. However, emergency physicians do not use a 
high-resolution monitor for readings in the ED setting. 
Further, in a real ED setting, emergency physicians take 
detailed histories and conduct physical examinations of 
patients before reading the CT results. Our study in-
cluded only brief patient information prior to reading.

CONCLUSIONS
When SDCT was performed in the ED for patients with 
renal colic, emergency medicine residents had a high 
level of agreement of interpretation with a
radiologist. However, on low-dose unenhanced CT, 
emergency medicine residents had a relatively lower level
of agreement with a radiologist than they did
on SDCT.

CONFLICTS OF INTEREST
The authors declare that they have no conflict of inter-
est.

REFERENCES
 1. Bartoletti R, Cai T, Mondaini N, et al. 
Epidemiology and risk factors in urolithiasis. 
Urol Int. 2007;79 Suppl 1:3-7.

 2. Curhan GC. Epidemiology of stone disease. 
Urol Clin North Am. 2007;34:287-93.

 3. Sierakowski R, Finlayson B, Landes RR, 
Finlayson CD, Sierakowski N. The frequency 
of urolithiasis in hospital discharge diagnoses 
in the United States. Invest Urol. 1978;15:438-
41.

 4. Brown J. Diagnostic and treatment patterns for 
renal colic in US emergency departments. Int 
Urol Nephrol. 2006;38:87-92.

 5. Türk C, Petřík A, Sarica K, et al. EAU 
guidelines on diagnosis and conservative 
management of urolithiasis. European 
urology. 2016;69:468-74.

 6. Broder J, Bowen J, Lohr J, Babcock A, Yoon 
J. Cumulative CT exposures in emergency 
department patients evaluated for suspected 
renal colic. The Journal of emergency 
medicine. 2007;33:161-8.

 7. [No authors listed]. Radiation and your patient: 
a guide for medical practitioners. Ann ICRP. 
2001;31:5-31.

 8. Brenner D, Elliston C, Hall E, Berdon 
W. Estimated risks of radiation-induced 
fatal cancer from pediatric CT. AJR Am J 
Roentgenol. 2001;176:289-96.

 9. Rafi M, Shetty A, Gunja N. Accuracy of 
computed tomography of the kidneys, ureters 
and bladder interpretation by emergency 
physicians. Emergency Medicine Australasia. 
2013;25:422-6.

 10. Landis JR, Koch GG. The measurement of 
observer agreement for categorical data. 
Biometrics. 1977;33:159-74.

 11. Kwon JK, Chang IH, Moon YT, Lee JB, 
Park HJ, Park SB. Usefulness of Low-dose 
Nonenhanced Computed Tomography With 
Iterative Reconstruction for Evaluation of 
Urolithiasis: Diagnostic Performance and 
Agreement between the Urologist and the 
Radiologist. Urology. 2015;85:531-8.

 12. Yang HK, Ko Y, Lee MH, et al. Initial 
Performance of Radiologists and Radiology 
Residents in Interpreting Low-Dose (2-mSv) 
Appendiceal CT. AJR Am J Roentgenol. 
2015;205:W594-611.

 13. Poletti P-A, Platon A, Rutschmann OT, 
Schmidlin FR, Iselin CE, Becker CD. Low-
dose versus standard-dose CT protocol 
in patients with clinically suspected renal 
colic. American Journal of Roentgenology. 
2007;188:927-33.

 14. Niemann T, Kollmann T, Bongartz G. 
Diagnostic performance of low-dose CT for 
the detection of urolithiasis: a meta-analysis. 
AJR Am J Roentgenol. 2008;191:396-401.

 15. Kulkarni NM, Uppot RN, Eisner BH, 
Sahani DV. Radiation dose reduction at 
multidetector CT with adaptive statistical 
iterative reconstruction for evaluation of 
urolithiasis: how low can we go? Radiology. 
2012;265:158-66.

 16. Preminger GM, Tiselius HG, Assimos DG, 
et al. 2007 Guideline for the Management 
of Ureteral Calculi. American Urological 
Association. 2007.

 17. Berrington de González A, Mahesh M, 
Kim K-P, et al. Projected cancer risks from 
computed tomographic scans performed in the 
United States in 2007. Archives Of Internal 
Medicine. 2009;169:2071-7.

 18. Keyzer C, Tack D, de Maertelaer V, Bohy 
P, Gevenois PA, Van Gansbeke D. Acute 
appendicitis: comparison of low-dose and 
standard-dose unenhanced multi-detector row 
CT. Radiology. 2004;232:164-72.

 19. National Lung Screening Trial Research 
Team. Reduced lung-cancer mortality 
with low-dose computed tomographic 
screening. The New England journal of 
medicine. 2011;365:395.
