J Contemp Med Sci | Vol. 5, No. 3, May–June 2019: 165–169

Original

Decision support detection system for lung nodule abnormalities based 
on machine learning algorithms
Muna Alsallal,a* Mhd Saeed Sharif,b Bydaa Hadi,a and Ruwaida Albadrya

aSamawa Technical Institute, ALfurat Alawsat Technical University, Iraq.
bSchool of Architecture, Computing and Engineering, University of East London (UEL) London, UK
*Correspondence to Muna Alsallal (E-mail: mona27793@gmail.com)
(Submitted: 14 October 2018 – Revised version received: 28 November 2018 – Accepted: 19 January 2019 – Published online: 26 June 2019)

Introduction
Abnormal growth of lung nodules, the so-called “abnormal pulmonary nodules”, represents an important factor for lung cancer.1,2 Lung cancer is considered the foremost cause of cancer deaths worldwide in both genders.1 Pulmonary nodules appear as oval-shaped growths in the affected lung and are sometimes called lung spots. Patients with lung cancer are mostly diagnosed at an advanced stage because detection at an early phase is difficult.1 The most important reason for late detection is the inefficiency of X-ray imaging in detecting such cases.1 The only efficient method is the CT scan. Another reason is the extensive expertise that radiologists need in order to categorize lung nodules as normal or abnormal. From the radiologists' point of view, the size of normal lung nodules ranges from 2 to 5 mm in thickness and from 1.5 to 3 cm in diameter. If the growth in thickness or in area is larger, the case needs to be considered critical. Radiologists' time is also limited compared with the number of patients they have to see on a daily basis. Designing a technique that can recognize abnormal nodules at an early stage is therefore important and can be regarded as a proactive step to prevent the case from worsening.

Machine learning is now a well-established means of building robust automatic techniques for analyzing a wide range of biomedical data.3 In his 2006 paper, Sajda reviewed several state-of-the-art machine learning techniques that have demonstrated their effectiveness in tasks such as disease detection, diagnosis and treatment monitoring. The review also described developments in machine learning, concentrating on the two main types of learning, supervised and unsupervised. It also covered linear and non-linear approaches and showed the advantages and disadvantages of each. Such systems are called computer-aided diagnosis (CAD) systems; they are widely used in healthcare research and are mostly based on machine-learning algorithms. Standard detection techniques rely on reference datasets that serve as a foundation for developing robust systems. The Zero-Change dataset was chosen to measure the system's performance on nodule growth. This dataset was used by Krishnan et al.2 in their open-source system for measuring changes in lesion size. They evaluated their system on this proven clinical dataset. It comprises 12 pairs of images; in each pair, one image shows the whole lung and the other shows just the region around the nodule.

It is also worth noting the factors that determine the efficiency of classification in such systems. This efficiency is influenced by two factors: the chosen techniques and the set of features. Recent studies have highlighted that machine-learning techniques can enhance the performance of detection systems.3 Bengio et al.4 stated that learning procedures can be significantly enhanced by combining several types of algorithms, such as linear and non-linear approaches, in order to obtain feature representations of the data. Xu et al.5 highlighted the usefulness of non-linear machine-learning approaches for such systems and emphasized the value of the feature extraction phase.

Objective To investigate the possibility of early detection in cases of lung infection. Most cases of lung cancer are detected at advanced stages because this type of cancer is hard to detect in its early phases. The Zero-Change dataset was chosen to measure the system's performance on nodule growth. The chosen dataset is regarded as a proven clinical dataset and has been used by several researchers in their proposed systems. The designed detection technique is intended to be used as a decision support tool. It is based on two machine learning algorithms for classification.
Methods Machine learning techniques were applied to detect interesting patterns and to manipulate the dataset images in order to enhance the classification task. Pre-processing procedures were also applied using different MATLAB functions. In addition, two well-known techniques related to support vector machines (SVMs), the radial basis function kernel-based SVM and the polynomial kernel-based SVM, were applied using a MATLAB© package named PRTools.
Results The performance of the proposed technique was evaluated using several measures for both chosen techniques. The procedure was implemented using leave-one-out cross-validation in order to generate unbiased outcomes. The results of the cross-validation procedure are averaged and presented as the classifier outcome. The misclassification error, sensitivity, specificity and accuracy are calculated to give a clear picture of the two classifiers.
Conclusion The experimental results show that the proposed system achieved high accuracy with the polynomial kernel SVM. A set of distinguishable, representative features is correlated by a statistical association. The designed system can also be considered a benchmark for developing detection systems for signs of abnormality in other tissues.
Keywords machine learning, lung cancer, lung nodule, detection system, classification, image processing, SVM

ISSN 2413-0516



This research proposes an automatic detection technique that relies on non-linear machine-learning algorithms to classify whether the nodule size is normal or abnormal, as a second opinion to support the radiologist's decision. The system is based on four main stages: dataset acquisition, pre-processing, image feature extraction and, finally, classification with a machine-learning technique. The proposed system is trained on nodule sizes after the required features have been extracted. The next section explores related work on detecting nodule abnormalities using machine-learning approaches. The following section describes the experimental design, including the system phases. The results are then presented and discussed. Finally, the conclusion section summarizes the content, method and outcomes of this paper.

Background
Many studies have used machine-learning approaches to detect lung nodule abnormalities. Bellotti et al.6 used a contour-based model to detect nodule abnormalities and reported a detection accuracy of 88.5%. Riccardi et al.7 proposed a system using 3D radial transforms to detect nodules, with an overall detection accuracy of 71%. Camarlinghi et al.8 used three different algorithms to identify pulmonary nodules in CT scans. Abdulla and Shaharum9 applied an advanced technique based on feed-forward neural networks to X-ray images. The feature set proposed in their study included area, perimeter and shape. Kuruvilla and Gunavathi10 proposed six distinct features, three of which were mentioned in Abdulla and Shaharum.9 They added skewness, in addition to the time needed to extract the features from segmented slices containing both lungs. A non-linear machine-learning algorithm was trained on these separate features and reported good outcomes.

On the other side of the problem, support vector machine (SVM) algorithms and their non-linear derivatives have been widely used to detect abnormalities in medical images. SVMs were used as a classifier to detect colon abnormality11,12 with encouraging outcomes. SVMs were also applied to ovarian abnormalities, with good results.13 Furey et al.13 applied SVMs to several kinds of abnormality data related to different parts of the human body, such as blood and colon. The SVM achieved high accuracy in most classification cases. Segal et al.14,15 applied SVMs to separate two types of cells with different characteristics. They reported high classification accuracy and provided new indicators for researchers to note. Statnikov et al.16 assessed the classification performance of several machine learning algorithms on a wide range of gene expression data for detecting abnormality, and recommended SVMs as a promising approach in this field. SVMs are thus reported to have two vital characteristics that make them superior to their peers.

Support Vector Machine
Support vector machines are a group of related supervised learning algorithms commonly used for classification and regression. They are considered among the most recent, sophisticated and high-performance algorithms in artificial intelligence. They aim to separate high-dimensional data in “hyperspace” (a “space” with a dimensionality equal to the number of features derived from the training set) using a hyperplane.17 An SVM can be defined as a linear model that looks for a hyperplane separating one class from another.18 SVMs are extended by combining them with so-called kernels. A kernel replaces the dot product in the linear formulation with a non-linear function. To illustrate, Vapnik18 stated that the SVM is based on dot products in a finite-dimensional space, which can be defined as Equation (1):

$$f(y) = x^{T} y \tag{1}$$

where y represents the outcome of applying a non-linear transformation to the input data, for example, yi = φ(xi), and classification is performed by taking the sign of f(y). The idea of the non-linear transformation is to map the data into a high-dimensional space in which the transformed data are linearly separable and can be divided by a hyperplane. To make this separation efficient, the transformation of the data is carried out by the kernel trick.19 Kernels enhance the separation process by replacing the dot products with non-linear kernel functions. The three main types of SVM kernels are briefly described in Table 1.
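The three kernels can be illustrated with a minimal Python sketch. It is not taken from the paper or from PRTools: the function names, the default values of `degree`, `coef0` and `gamma`, and the toy support vectors in the usage note are assumptions chosen only for illustration. The `decision` function shows the usual kernelised form, in which the class is the sign of a weighted sum of kernel evaluations.

```python
import math

def linear_kernel(u, v):
    """Plain dot product of two feature vectors."""
    return sum(a * b for a, b in zip(u, v))

def polynomial_kernel(u, v, degree=2, coef0=1.0):
    """(u . v + coef0) ** degree; degree and coef0 are illustrative defaults."""
    return (linear_kernel(u, v) + coef0) ** degree

def rbf_kernel(u, v, gamma=0.5):
    """exp(-gamma * ||u - v||^2); gamma is an illustrative default."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)

def decision(x, support_vectors, coeffs, bias, kernel):
    """Return +1 or -1 from the sign of sum_i coeffs[i] * K(sv_i, x) + bias."""
    f = sum(c * kernel(sv, x) for sv, c in zip(support_vectors, coeffs)) + bias
    return 1 if f >= 0 else -1
```

For example, with the hypothetical support vectors `[[1.0, 0.0], [-1.0, 0.0]]` and coefficients `[1.0, -1.0]`, `decision([2.0, 0.0], ..., 0.0, linear_kernel)` falls on the positive side. Swapping in `polynomial_kernel` or `rbf_kernel` changes the shape of the decision boundary without changing the rest of the code.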

Experimental Design
This paper proposes an automatic detection system consisting of four main components: dataset acquisition, pre-processing, image feature extraction (feature engineering) and classification. The four components are shown in Fig. 1.


Table 1. Briefly reviewed three types of kernels

| Kernel type | Kernel visualisation | Kernel description |
|---|---|---|
| Linear kernel | (image) | There is just a “normal” dot-product, thus in 2D decision boundary is always line. Separation of most of points correctly, but due to the “stiffness” of the hypothesis not all points can be captured.19 |
| Polynomial kernel | (image) | The polynomial kernel induces space of polynomial combinations of the features, up to certain degree. Consequently we can work with slightly “out of straight line” decision boundaries, such as parabolas.18 |
| RBF kernel | (image) | RBF Kernel induced space is Gaussian distributions space ... each point becomes a probability density function (up to scaling) of a normal distribution. In such space, dot-products are integrals and consequently the flexibility is very high.18 |



Two SVM algorithms were used to perform the classification: the radial basis function (RBF) kernel and the polynomial kernel, respectively. Both are trained on the zero-change dataset to find an optimal model for assigning images to their corresponding classes. Then, during the classification phase, each image is classified as either normal or abnormal. The overall process for detecting the progress of dime-sized lesions is described in detail in the following sub-sections.

For the proposed system, lung image datasets were taken from a publicly available database. The dataset comprises twelve scan pairs from the CRPF_Database. The first six scan pairs share the same slice thickness of 1.25 mm, while the second six pairs have a different slice thickness of 2.5 mm. In each pair, one scanned image shows the entire lung and the second concentrates on the nodule region at the same scan resolution as the first. Note that the person being scanned did not change position between the two scans.

For researchers' convenience, the 12 scan pairs are publicly available in a single compressed file of DICOM images. It can be downloaded from the https://veet.via.cornell.edu/cgi-bin/datac website with all image accessories. The images are listed on the CRPF_Database homepage under the title “Repeat Single Session”. The scan images can also be downloaded separately in several versions using the database's direct image download function.

To prepare the images for the later phases of the proposed system, a pre-processing procedure is applied. We first applied shade correction, because uneven illumination of the scan image needs to be corrected if a specific object is to be properly spotted. The second step is a morphological opening, used as a proactive step to estimate the background. This step was applied using the MATLAB© function “imopen”, with a 12-pixel structuring element. After the morphological opening is completed by applying the MATLAB© function “imtophat”, the image contrast is increased using the MATLAB© function “imadjust”, as shown in Figure 2. Then another MATLAB© function, “bwareaopen”, is applied to eliminate background noise. The final output displays only the affected region of the scan image.
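The pre-processing steps can be sketched in plain Python on a small grey-scale image stored as a list of rows. This is only an illustration of the ideas behind the MATLAB© functions named above, not a reimplementation of them. The square structuring element, the simple min-max contrast stretch, the fixed threshold, the 4-connected component search and all default parameter values are assumptions made for this sketch.

```python
def _window(img, r, c, radius):
    """Pixel values in the square window centred on (r, c), clipped at the border."""
    rows, cols = len(img), len(img[0])
    return [img[i][j]
            for i in range(max(0, r - radius), min(rows, r + radius + 1))
            for j in range(max(0, c - radius), min(cols, c + radius + 1))]

def _filter(img, radius, reducer):
    """Apply reducer (min or max) over the window around every pixel."""
    return [[reducer(_window(img, r, c, radius)) for c in range(len(img[0]))]
            for r in range(len(img))]

def opening(img, radius):
    """Grey-scale opening: erosion (min filter) followed by dilation (max filter)."""
    return _filter(_filter(img, radius, min), radius, max)

def top_hat(img, radius):
    """Image minus its opening, which suppresses the slowly varying background."""
    opened = opening(img, radius)
    return [[p - q for p, q in zip(row, orow)] for row, orow in zip(img, opened)]

def stretch_contrast(img, out_max=255):
    """Linearly rescale intensities to the range 0..out_max."""
    flat = [p for row in img for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0 for _ in row] for row in img]
    return [[(p - lo) * out_max // (hi - lo) for p in row] for row in img]

def remove_small_objects(mask, min_size):
    """Drop 4-connected foreground components with fewer than min_size pixels."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                stack, comp = [(r, c)], []
                seen[r][c] = True
                while stack:
                    i, j = stack.pop()
                    comp.append((i, j))
                    for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                        if (0 <= ni < rows and 0 <= nj < cols
                                and mask[ni][nj] and not seen[ni][nj]):
                            seen[ni][nj] = True
                            stack.append((ni, nj))
                if len(comp) >= min_size:
                    for i, j in comp:
                        out[i][j] = 1
    return out

def preprocess(img, radius=1, threshold=127, min_size=2):
    """Top-hat, contrast stretch, threshold, then small-object removal."""
    enhanced = stretch_contrast(top_hat(img, radius))
    mask = [[1 if p > threshold else 0 for p in row] for row in enhanced]
    return remove_small_objects(mask, min_size)
```

On a toy 7 x 7 image with a uniform background, a bright 2 x 2 blob and one isolated bright pixel, `preprocess` keeps the blob and discards the isolated pixel. The small default `radius` suits such toy images only; the paper reports a 12-pixel structuring element for the real scans.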

Fig. 1 Block diagram of the proposed detection system for lung nodule abnormalities.

Fig. 2 The lung scan image after the image contrast has been increased.

Fig. 3 The marginal length around the region feature, which is the perimeter of the entity.

For the purpose of this paper, seven types of features were extracted from each image to train the classifier. The chosen features are as follows: the definite number of pixels in the nodule region (DNP); the marginal length around the region (MLAR), i.e. the perimeter of the entity, shown in Fig. 3 as the element outlined in red; the maximum length of the major axis in the region (MaLR); the minimum length of the minor axis in the region (MiLR); the feature ratio, equal to MaLR divided by MiLR; and finally the region roundness (RR), calculated as (4 * π * DNP)/(MLAR^2).
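As a worked illustration of the derived features, the following Python sketch computes the axis ratio and the region roundness from the four basic measurements. The function name and the dictionary keys are hypothetical, and the sketch assumes that DNP, MLAR, MaLR and MiLR have already been measured in pixel units.

```python
import math

def region_features(dnp, mlar, malr, milr):
    """Collect the basic measurements and the two features derived from them.

    dnp: number of pixels in the nodule region
    mlar: marginal length (perimeter) around the region
    malr, milr: lengths of the major and minor axes
    """
    return {
        "DNP": dnp,
        "MLAR": mlar,
        "MaLR": malr,
        "MiLR": milr,
        "ratio": malr / milr,                       # feature ratio MaLR / MiLR
        "RR": (4 * math.pi * dnp) / (mlar ** 2),    # region roundness
    }
```

With this formula an ideal circle has a roundness of 1, and less compact shapes score lower. A 10 x 10 square, for instance, gives π/4, about 0.785.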

The classification procedure is implemented with a MATLAB© package named PRTools.20 The proposed system is based on two well-known SVM techniques: the RBF kernel-based SVM and the polynomial kernel-based SVM.

Results
The performance of the proposed technique was evaluated using several measures for both chosen techniques. The procedure was implemented using leave-one-out cross-validation in order to generate unbiased outcomes. The results of the cross-validation procedure are averaged and presented as the classifier outcome, as shown in Table 2. The misclassification error, sensitivity, specificity and accuracy are calculated to give a clear picture of the two classifiers. The experimental results show that both classifiers are able to identify both classes; however, the polynomial kernel-based SVM outperformed the RBF kernel-based SVM.
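The leave-one-out procedure can be sketched as follows. The sketch is independent of PRTools, and the classifier it uses is a simple nearest-mean rule that stands in for the SVMs only to keep the example self-contained; it is not the classifier used in this paper. The function names and the toy data in the usage note are assumptions.

```python
def nearest_mean_fit_predict(train_x, train_y, test_x):
    """Stand-in classifier (NOT an SVM): predict the class with the closest mean."""
    means = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        means[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return min(sorted(means), key=lambda label: sq_dist(means[label], test_x))

def leave_one_out(xs, ys, fit_predict):
    """Hold out each sample once, train on the rest, and collect the predictions."""
    predictions = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        predictions.append(fit_predict(train_x, train_y, xs[i]))
    return predictions
```

Calling `leave_one_out(xs, ys, nearest_mean_fit_predict)` on a list of feature vectors `xs` with labels `ys` returns one prediction per sample, each made by a model that never saw that sample. Any other classifier with the same three-argument interface can be passed in its place.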

In addition, the system's performance was assessed against radiologists' opinions, as shown in Table 3. The nodule features characterised by the radiologists and by the proposed system were determined and then compared. Nodules of 1.5 to 3 cm in diameter with a thickness of 2 to 5 mm represent normal lung nodules.

However, if those measures are larger, the case should be considered abnormal and needs more attention and investigation. To test the system's performance against the radiologists' opinions, a statistical test was applied. A paired t-test was used to measure the difference in means between the marked images and the system output. The results are given in Table 3; the significance value (Sig., two-tailed) obtained under a 95% confidence interval was 0.211. This value demonstrates that the means of the radiologists' opinions and the system output do not differ significantly.

Table 2. Averaged results of the two classifiers

| Measure | Polynomial kernel SVM | RBF kernel SVM |
|---|---|---|
| Misclassification error | 0.0919 | 0.1826 |
| Accuracy | 0.9191 | 0.7273 |
| Specificity | 1 | 0.3224 |
| Sensitivity | 0.881 | 0.763 |

Table 3. The analysis of the t-test between the proposed system performance and radiologists' opinions

| | Mean | N | Std. deviation |
|---|---|---|---|
| Pair 1: Marked | 1.71 | 12 | 0.419 |
| Pair 1: Testing | 1.90 | 12 | 0.331 |

Correlations of paired samples

| | N | Correlation | Sig. |
|---|---|---|---|
| Pair 1: Marked and testing | 12 | −0.019 | 0.901 |

Paired differences

| | Mean | Std. deviation | Std. error mean | 95% confidence interval of the difference (lower) | 95% confidence interval of the difference (upper) | t | Sig. (two-tailed) |
|---|---|---|---|---|---|---|---|
| Pair 1: Marked and testing | −0.101 | 0.525 | 0.081 | −0.265 | 0.069 | −1.131 | 0.211 |

N = The number of nodule images analysed by the radiologists and the proposed system
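The paired comparison can be illustrated with a short Python sketch that computes the t statistic for two paired sets of labels. The label coding (1 for normal, 2 for abnormal) and the twelve example values are hypothetical and are not the study data. The critical value 2.201 is the standard two-tailed 5% threshold of the t distribution with 11 degrees of freedom.

```python
import math
from statistics import mean, stdev

def paired_t(a, b):
    """Return (t statistic, degrees of freedom) for paired samples a and b.

    Assumes the paired differences are not all identical, otherwise the
    standard error is zero and the statistic is undefined.
    """
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    std_error = stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return mean(diffs) / std_error, n - 1

# Hypothetical labels for 12 images: 1 = normal, 2 = abnormal.
marked = [1, 2, 2, 1, 2, 2, 1, 2, 2, 2, 1, 2]   # radiologists
system = [2, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 2]   # system output

t_stat, dof = paired_t(marked, system)
T_CRITICAL_DF11 = 2.201  # two-tailed, alpha = 0.05, 11 degrees of freedom
differs = abs(t_stat) > T_CRITICAL_DF11
```

When the absolute value of the statistic stays below the critical value, as it does for these example labels, the difference in means is not significant at the 5% level.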

Discussion
The proposed system is based on integrating several techniques, from choosing the dataset, through feature engineering that transforms the original image into a set of proposed features, to implementing the classifier. The pre-processing mechanism is an important proactive step for optimising classifier performance. Detecting such cases is a challenging task for most automated detection systems. The challenge relates to the particular characteristics of nodules with regard to their size, thickness and location. The performance of a machine-learning-based detection system for lung nodule abnormalities is measured by calculating sensitivity, specificity and accuracy. After both proposed classifiers were applied to the zero-change dataset, the confusion matrix was generated using leave-one-out cross-validation. Table 2 presents the values of sensitivity, specificity and accuracy. Accuracy refers to the proportion of correct predictions made by each classifier. As shown, the polynomial kernel SVM outperformed the other algorithm. Sensitivity represents the ability of the system to detect nodule abnormalities, which is what matters most. Specificity represents the system's ability to identify normal nodules. In addition, the significance value from the t-test in Table 3 shows that the difference between the marked images (radiologists' opinions) and the system outcomes is not significant.
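The measures discussed above follow directly from the confusion matrix, as the following Python sketch shows. The function names, the choice of 1 as the label of the abnormal (positive) class and the example labels are assumptions for illustration; the sketch also assumes that both classes occur in the true labels.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives; positive marks the abnormal class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, tn, fn

def summarize(tp, fp, tn, fn):
    """Accuracy, misclassification error, sensitivity and specificity."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    return {
        "accuracy": accuracy,
        "misclassification_error": 1 - accuracy,
        "sensitivity": tp / (tp + fn),   # abnormal nodules correctly detected
        "specificity": tn / (tn + fp),   # normal nodules correctly identified
    }
```

For example, `summarize(*confusion_counts([1, 1, 1, 0, 0, 0, 1, 0], [1, 1, 0, 0, 0, 1, 1, 0]))` gives an accuracy, sensitivity and specificity of 0.75 each, because one abnormal and one normal case are misclassified out of four in each class.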



This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License which allows users to read, copy, distribute and make derivative 
works for non-commercial purposes from the material, as long as the author of the original work is cited properly.

Conclusion
This paper produced a detection system for categorising lung nodules as either normal or abnormal. The system integrates several techniques, namely image acquisition and feature engineering, to generate an accurate depiction of nodule images. The zero-change dataset was used for the proposed system. The experimental results show that the proposed system achieved high accuracy with the polynomial kernel SVM. A set of distinguishable, representative features is correlated by a statistical association. The designed system can also be considered a benchmark for developing detection systems for signs of abnormality in other tissues. As future development of the proposed system, the authors intend to integrate more intelligent feature extraction techniques to improve its detection capabilities. Furthermore, deep learning techniques will be included in order to compare the performance of two generations of machine learning algorithms.

Conflicts of interest
None. 

References
 1. Kumar D, Wong A, Clausi DA. Lung nodule classification using deep features in CT images. In: 2015 12th Conference on Computer and Robot Vision (CRV), IEEE, Halifax, NS, Canada, 2015, pp. 133–138.

 2. Krishnan K, Ibanez L, Turner WD, Jomier J, Avila RS. An open-source 
toolkit for the volumetric measurement of CT lung lesions. Opt Express. 
2010;18:15256–15266.

 3. Sajda P. Machine learning for detection and diagnosis of disease. Annu Rev Biomed Eng. 2006;8:537–565.

 4. Bengio Y, Courville A, Vincent P. Representation learning: a review and new perspectives. IEEE Trans Pattern Anal Mach Intell. 2013;35:1798–1828.

 5. Xu Y, Mo T, Feng Q, Zhong P, Lai M, Chang EI. Deep learning of feature 
representation with multiple instance learning for medical image analysis. 
In: 2014 IEEE International Conference on Acoustics, Speech and Signal 
Processing (ICASSP), IEEE, Florence, Italy, 2014, pp. 1626–1630.

 6. Bellotti R, De Carlo F, Gargano G, Tangaro S, Cascio D, Catanzariti E, et al. A CAD system for nodule detection in low-dose lung CTs based on region growing and a new active contour model. Med Phys. 2007;34:4901–4910.

 7. Riccardi A, Petkov TS, Ferri G, Masotti M, Campanini R. Computer-aided 
detection of lung nodules via 3D fast radial transform, scale space 
representation, and Zernike MIP classification. Med Phys. 2011;38:1962–1971.

 8. Camarlinghi N, Gori I, Retico A, Bellotti R, Bosco P, Cerello P, et al. Combination of computer-aided detection algorithms for automatic lung nodule identification. Int J Comput Assist Radiol Surg. 2012;7:455–464.

 9. Abdulla AA, Shaharum SM. Lung cancer cell classification method using 
artificial neural network. Inform Eng Lett. 2012;2:49–59.

10. Kuruvilla J, Gunavathi K. Lung cancer classification using neural networks for 
CT images. Comput Methods Programs Biomed. 2014;113:202–209.

11. Moler EJ, Chow ML, Mian IS. Analysis of molecular profile data using 
generative and discriminative methods. Physiol Genomics. 2000;4:109–126.

12. Liu Y. Active learning with support vector machine applied to gene 
expression data for cancer classification. J Chem Inf Comput Sci. 
2004;44:1936–1941.

13. Furey TS, Cristianini N, Duffy N, Bednarski DW, Schummer M, Haussler D. 
Support vector machine classification and validation of cancer tissue samples 
using microarray expression data. Bioinformatics. 2000;16:906–914.

14. Segal NH, Pavlidis P, Antonescu CR, Maki RG, Noble WS, DeSantis D,  
et al. Classification and subtype prediction of adult soft tissue sarcoma by 
functional genomics. Am J Pathol. 2003;163:691–700.

15. Segal NH, Pavlidis P, Noble WS, Antonescu CR, Viale A, Wesley UV, et al. 
Classification of clear-cell sarcoma as a subtype of melanoma by genomic 
profiling. J Clin Oncol. 2003;21:1775–1781.

16. Statnikov A, Aliferis CF, Tsamardinos I, Hardin D, Levy S. A comprehensive 
evaluation of multicategory classification methods for microarray gene 
expression cancer diagnosis. Bioinformatics. 2005;21:631–643.

17. Alsallal M. A Machine Learning Approach for Plagiarism Detection.  
PhD diss., Coventry University, 2016.

18. Vapnik V. The Nature of Statistical Learning Theory. Springer Science & 
Business Media, New York, 2013.

19. Aizerman MA, Braverman EM, Rozonoer LI. Theoretical foundations of the 
potential function method in pattern recognition learning. Automation 
Remote Contr. 1964;25:821–837.

20. Duin RPW, Juszczak P, Paclik P, Pekalska E, de Ridder D, Tax DMJ, et al. 
PRTools4.1. A Matlab Toolbox for Pattern Recognition, Delft University of 
Technology. 

dx.doi.org/10.22317/jcms.06201909