Kurdistan Journal of Applied Research (KJAR)
Print-ISSN: 2411-7684 | Electronic-ISSN: 2411-7706
Website: Kjar.spu.edu.iq | Email: kjar@spu.edu.iq

Kurdistan Journal of Applied Research | Volume 7 – Issue 1 – June 2022 | 86

Computer Aided Diagnostic System for Blood Cells in Smear Images Using Texture Features and Supervised Machine Learning

Shakhawan Hares Wady
1 Applied Computer Science, College of Medicals and Applied Sciences, Charmo University
2 Department of Information Technology, University College of Goizha
Sulaimani, Iraq
Shakhawan.hares@charmouniversity.org

Article Info
Volume 7 - Issue 1 - June 2022
DOI: 10.24017/Science.2022.1.8
Article history: Received: 22/4/2022; Accepted: 14/6/2022

ABSTRACT
Leukemia is a type of blood cancer that affects White Blood Cells (WBCs) and causes bone marrow destruction. A Complete Blood Count (CBC) and bone marrow aspiration are the most frequent tests used to detect Acute Lymphoblastic Leukemia (ALL). If not identified early enough, the condition can be fatal. In this context, an intelligent framework is designed to detect hematological disorders such as leukemia (blood cancer). Feature extraction was performed using Center Symmetric Local Binary Pattern (CSLBP), Gabor Wavelet Transform (GWT), and Local Gradient Increasing Pattern (LGIP). The framework combined the extracted features and then fed them into machine learning classifiers, including Decision Tree (DT), Ensemble, K-Nearest Neighbor (KNN), Naïve Bayes (NB), and Random Forest (RF). The ALL-IDB2 database, a balanced set of 260 blood smear images, was used to train and evaluate the framework. A recommended model was established by evaluating numerous individual and combined feature extraction methods to find the optimum feature set. The experimental results demonstrate that the developed feature fusion strategy surpassed previous techniques, with 97.49 ± 1.02% accuracy using the Ensemble classifier.
Keywords: Leukemia diagnosis, blood smear, feature extraction, machine learning.

1. INTRODUCTION
Hematology is the study of blood and blood-forming structures, covering the identification, therapy, and cure of blood infections and of disorders of myeloid tissue cells, cellular immunity, and the fibrinolytic and nutritive subsystems. Medical experts perform and evaluate various diagnostic procedures to support disease diagnostic and therapeutic specialists. They deal with blood and myeloid tissue to provide comprehensive diagnostic care to patients [1]. The quantity and type of blood cells produced at any given time are determined by the needs of the body. Estimating how myeloid tissue cells respond to a medical situation may be more significant in some circumstances than establishing the patient's hematological condition. Medical professionals frequently analyze blood smear samples for abnormalities; if a disease is suspected, they perform a myeloid tissue biopsy and deliver a diagnosis within a brief time. As shown in Figure 1(a), the fundamental function of myeloid tissue is to create red blood cells (RBCs), platelets, and white blood cells (WBCs) [2]. RBCs are typically the most abundantly produced cells and make up the highest proportion (Figure 1(b)).

The Complete Blood Count (CBC) is a hematological diagnostic procedure that produces data that can be used to identify a disorder. The CBC assesses the formation of all cell components, measures the patient's oxygen-carrying capacity via RBC counts, and checks the immune system via differential WBC counts. This examination facilitates the identification of anemia, various cancers, infections, and a variety of other disorders, along with the tracking of the health effects of medication [3].

Figure 1: Blood smear components. (a) growth of blood cells in the myeloid tissue, (b) normal blood, and (c) blood during leukemia [4].
A variety of diseases can affect the approximate quantity of WBCs and their appearance on a blood film. This is evident in viral infections, which increase WBCs, whereas the much more severe cases (as depicted in Figure 1(c)) are most likely leukemias [1]. Leukemia is a malignancy of the blood or myeloid tissue in which the body generates malignant WBCs. Due to this irregular blood cell division, the blood, lymphatics, and myeloid tissue are all affected, putting the immune system at risk [5]. The malignant cells can also impede RBC and platelet formation in the stem cells. Additionally, malignant WBCs can enter the bloodstream and damage specific organs, including the brain, kidney, spleen, and liver, leading to the growth of further severe malignancies.

Leukemia is classified as acute or chronic depending on how quickly it grows or worsens. Acute leukemia appears rapidly and spreads immediately. Depending on the type of affected cell, acute leukemia can be categorized as acute lymphoblastic or acute myelogenous leukemia [6]–[8]. This research concentrates on ALL, which is expected to have a greater chance of survival than other varieties. Leukemia can only be diagnosed with a comprehensive examination of stained blood smears. Manual examination is affected by anomalies in slide preparation and by the complicated structure of WBCs, leading to non-standardized, unreliable, and inconsistent evaluations. As a consequence, a cost-effective and reliable computerized framework is required to satisfy the need for accurate analysis and identification that is not influenced by the knowledge, weariness, or operational exhaustion of medical experts. For this reason, numerous Computer Aided Diagnosis (CAD) techniques for assessing blast cells in blood images have been developed.
This work provides a methodology for leukemia classification and examines the effect of various feature extraction methods on the classification process. The key contribution of this study is to fuse the features extracted from blood smear images in order to improve the overall accuracy and thereby reduce the misclassification error rate. The rest of the article is structured as follows. Section 2 presents a review of related research. Section 3 comprises an overview of the CAD system architecture, image pre-processing, feature extraction, feature fusion and classification, and performance measurements. Section 4 describes the ALL-IDB database and summarizes the experimental results, comparing the feature extraction scenarios with current methods. Lastly, Section 5 presents the conclusion of the study.

2. LITERATURE REVIEW
In medical image computing and processing, particularly in the area of ALL, machine learning (ML) and image processing approaches have delivered exceptional achievements [9]. Various procedures are widely used to analyze microscopic smears for the diagnosis of ALL-leukemia cells. The mechanisms presented comprise CNNs, supervised learning, feature extraction, and feature selection [10]. A brief review of some significant previous studies is provided below.

The study [11] intended to build an improved classification algorithm based on peripheral blood smear images that could categorize the subtypes of ALL-leukemia cells. Cytoplasmic vacuoles and the uniformity of the nuclear envelope of ALL-leukemia cells were the only geometrical features used in this study. Support Vector Machine (SVM), KNN, and Artificial Neural Network (ANN) classifiers with various measurement functions were considered and fine-tuned using the ALL-IDB2 database.
Using a pre-trained AlexNet with fine-tuning, the authors in [5] developed a deep CNN learning algorithm for the classification of ALL-leukemia cells and their subtypes based on the ALL-IDB database supplemented with 50 private images. The paper [12] investigated and proposed a micro-pattern descriptor for distinguishing cancerous from non-cancerous cells. The framework combined two feature extraction approaches, Local Directional Number Pattern (LDNP) and the Multi-scale Weber Local Descriptor (MWLD), and passed the result to ML classifiers (DT, Ensemble, KNN, NB, and RF). The experimental results demonstrated that the designed feature methodology achieved an acceptable performance when compared to other current studies.

The study [13] was concerned with building an effective automatic approach for identifying ALL-leukemia cells. The presented scheme comprised two phases. The initial phase separated the WBCs. In the second phase, relevant features including shape, statistical, geometry, and Discrete Cosine Transform (DCT) features were derived from the segmented regions. To classify the segmented cells as healthy or abnormal, numerous classification approaches were applied to the derived features.

The authors of [14] recommended that blood smear images be used to identify WBCs using a YOLOv2-Nucleus-Cytoplasm model. Particle Swarm Optimization (PSO) was used to improve the Bag-of-Features (BoF) generated from WBC images of blood smears. The Leukocyte-Images for Segmentation Classification (LISC) and ALL-IDB databases were used to determine the identification outcomes. On both the ALL-IDB1 and ALL-IDB2 databases, Optimized Naïve Bayes (ONB) exceeded the Optimized Discriminant Analysis (O-DA) method. ONB also outperformed O-DA on the LISC database.
The article [15] used deep CNNs to handle the ALL identification task. To provide a superior classification of ALL-leukemia cells, a weighted ensemble of deep CNNs was investigated. The weights were calculated from measurements of the ensemble candidates, including the Area Under the Curve (AUC), F1-score, and kappa coefficients. To improve the generalization of the system, numerous data augmentation and pre-processing steps were applied, and the C-NMC-2019 ALL database was used to train and test the developed framework.

The authors of [16] presented a classification scheme for ALL-leukemia cells and their subtypes. Firstly, a thresholding procedure was used to separate the ROI of lymphoblasts from bone marrow aspirations. A CNN, AlexNet, was employed for categorization. A database of 330 images was used for the evaluation, and the classification accuracy obtained was 97.78%. The study [17] employed NB and KNN classifiers to categorize cancerous and benign cells based on geometric, color, statistical, and textural features. The accuracy of the categorization reached 92.8% on 60 image smears.

The study [18] proposed a technique for classifying Acute Myeloid Leukemia (AML) and its subtypes M4, M5, and M7. First, a color k-means method was applied for cell segmentation. Classification was then performed with a multi-class SVM classifier on six statistical features. This resulted in a segmentation accuracy of 87.00% and a classification accuracy of 92.90%. The study [19] used microscopic images of WBCs to design a computer-based system for identifying and categorizing chronic lymphocytic leukemia based on Enhanced Virtual Neural Network (EVNN) classification. The suggested scheme had the maximum accuracy in identifying and categorizing ALL-leukemia cells using WBC images.
In terms of accuracy, specificity, sensitivity, and error rate, the proposed technique achieved 76.60%, 89.90%, 97.80%, and 2.20%, respectively.

The study [20] presented a refined DL technique for accurate segmentation and categorization of WBCs. Preprocessing-based identification and segmentation were the two primary steps in the proposed approach. During preprocessing, synthetic images were generated with a Generative Adversarial Network (GAN) and standardized with color conversion. Pretrained deep architectures, including ShuffleNet and DarkNet-53, were used to extract deep features from each blood smear image. Principal Component Analysis (PCA) was employed to choose the more relevant features, which were then combined serially for the identification task.

To diagnose ALL, the authors of [21] investigated a leukemia classification algorithm that employed two texture features extracted from the nucleus image, LBP and GLCM. The ALL-IDB2 database, which contains 260 (130 normal and 130 blast) blood smear images, was used to train a two-class classification algorithm. With classification accuracies of 93.84% and 87.30%, respectively, LBP texture features outscored GLCM texture features.

The research paper [22] focused on a methodology for identifying WBCs in complex blood smear images using the Watershed Transform (WST) and circle fitting. The strategy incorporated segmentation and edge map extraction in the preprocessing phases, followed by parameterized circle estimation that detected both isolated and overlapping WBCs. The system was evaluated on a database of 384 WBC images with considerable overlap from the ALL-IDB and ASH image collections.
All of the earlier approaches outlined above have the limitation of classifying ALL cells from microscopic images according to subtype. This research developed an intelligent framework for classifying blood smear images into healthy and leukemic blood cells. For feature extraction, the framework used the CSLBP, GWT, and LGIP descriptors. A balanced set of 260 blood smear images from the ALL-IDB2 database was used for training and testing. Additionally, to tackle the most challenging aspects of identifying ALL-leukemia cells in microscopic blood images, a recommended model was constructed using various individual and combined feature extraction techniques.

3. METHODS AND MATERIALS

3.1 System Architecture
The presented scheme takes blood smear images as input to categorize ALL-leukemia cells. To start, the methodology converted RGB color images to grayscale images and removed irrelevant regions to identify the areas of interest of normal and malignant cells. The approach then evaluated three feature extractors: CSLBP, GWT, and LGIP. The CSLBP approach was applied to extract a feature set from the blood smear images of the ALL-IDB2 database. Subsequently, the GWT and LGIP algorithms were applied to the same images to extract two additional feature vectors. Numerous individual and combined feature sets were generated as training data and passed to the different classifiers. Lastly, five well-known classifiers were applied to the combined features to categorize blood smear images into normal and abnormal blood cells. Figure 2 depicts the critical phases of the suggested design methodology.
Figure 2: Workflow of proposed system

3.2 Image Pre-Processing
In the pre-processing step, after transforming the original microscopic blood images from RGB to grayscale, the region of interest was identified, cropped, and resized to 256 × 256 pixels. To retain the relevant information, the region of interest was chosen so that the WBC region occupies a substantial fraction of it. Next, to enhance the images and improve classification performance, contrast-limited adaptive histogram equalization and a median filter were applied. Lastly, image adjustment was used to improve the quality of the blood smear images before they were fed into the feature extraction stage.

3.3 Feature Extraction
This stage entails extracting relevant features from input data to be used in identification tasks [23]. Three groups of features, CSLBP, GWT, and LGIP, were designed in this work to differentiate blood smear images into normal and abnormal blood cells. Prior to the fusion process, the extracted feature sets were normalized, which is the most popular strategy for limiting the range of numerical data.

3.3.1 Center Symmetric Local Binary Pattern (CSLBP)
Ojala et al. [24] presented the basic LBP generator, which is a robust feature descriptor that uses both shape and texture data to describe facial features. The LBP generator assigns a label to each pixel in the image by thresholding the intensity values in each pixel's 3 × 3 neighborhood against the intensity of the central pixel, then transforming the outcome to a binary representation applying Equation (1).
$$LBP(x_c, y_c) = \sum_{i=0}^{7} s(g_i - g_c)\, 2^{i} \tag{1}$$

$$s(x) = \begin{cases} 1, & x \geq 0 \\ 0, & x < 0 \end{cases} \tag{2}$$

where $g_c$ symbolizes the pixel intensity of the central pixel $(x_c, y_c)$, $g_i$ symbolizes the gray value of the eight nearby pixels, and $s(x)$ symbolizes the threshold function. Reading the thresholded values clockwise, beginning with the top-left neighbor [25], yields the binary result. Figure 3 depicts the basic LBP generator.

Figure 3: The Basic LBP generator

The LBP generator is difficult to use as a region descriptor because it produces very long histograms. To address this, the strategy for comparing pixels in the neighborhood was modified in an extension of the original LBP generator: center-symmetric pairs of pixels are compared with each other instead of comparing each pixel with the central pixel [26]. Heikkila et al. introduced CSLBP, which is based on the center-symmetric pixel pairs of a local neighborhood [27]. The central pixel's value is omitted in CSLBP, and the image can be described using only 16 bins. The CSLBP generator is defined as follows:

$$CSLBP_{R,N,T}(x, y) = \sum_{i=0}^{(N/2)-1} s\left(g_i - g_{i+(N/2)}\right) 2^{i} \tag{3}$$

$$s(x) = \begin{cases} 1, & x > T \\ 0, & \text{otherwise} \end{cases} \tag{4}$$

where $g_i$ and $g_{i+(N/2)}$ represent the gray values of pixel pairs on a circle of radius $R$ that are center-symmetric with reference to the central pixel at $(x, y)$, and $s(x)$ is the threshold function. CS-LBP thus works in a manner similar to certain gradient operators, which evaluate gray-level differences between pairs of opposite pixels in a neighborhood. In this work, a CS-LBP value was generated for every pixel of the region, and a collection of 16 features for every image was derived from the ALL-IDB2 image database.

3.3.2 Gabor Wavelet Transform (GWT)
The local characteristics of an image, including spatial localization, spatial frequency, and directional selectivity, are captured by Gabor wavelets.
Gabor wavelets are therefore used in a variety of disciplines, including texture analysis and image segmentation [28]. The spatial (2-D) Gabor filter is a Gaussian kernel function modulated in the spatial domain by a complex sinusoidal plane wave, expressed by

$$g(x, y; \lambda, \theta, \psi, \sigma, \gamma) = \exp\left(-\frac{\hat{x}^{2} + \gamma^{2}\hat{y}^{2}}{2\sigma^{2}}\right) \exp\left(i\left(2\pi\frac{\hat{x}}{\lambda} + \psi\right)\right) \tag{5}$$

$$\hat{x} = x\cos\theta + y\sin\theta \tag{6}$$

$$\hat{y} = -x\sin\theta + y\cos\theta \tag{7}$$

where $\hat{x}$ and $\hat{y}$ specify the location of a light impulse in the visual field, $\lambda$ is the wavelength of the sinusoidal factor, $\psi$ is its phase offset, $\theta$ is the orientation of the normal to the parallel stripes of the Gabor function, $\sigma$ is the Gaussian standard deviation, and $\gamma$ is the spatial aspect ratio that specifies the ellipticity of the support of the Gabor function. Using Gabor wavelet filters at 4 distinct scales and 6 directions, a collection of 48 features for every image was derived from the ALL-IDB2 image database.

3.3.3 Local Gradient Increasing Pattern (LGIP)
LGIP is a pixel-based binary image descriptor that is robust to fluctuations in lighting and to white noise. It can therefore be used to generate binary vectors for the horizontal and vertical directions obtained by partitioning the image [29]. LGIP represents the magnitude and direction of an increasing trend in local intensity. It first computes the gradient response at each pixel in 8 directions by applying the Sobel masks shown in Figure 4. Each mask's gradient value is then encoded into a single bit according to its sign: the bit is set to 1 if the mask response at the pixel is positive, and to 0 otherwise. As a consequence, every pixel in the cropped blood smear image is assigned an 8-bit code, with each bit corresponding to the output of a certain mask. To speed up calculation, the eight bits can also be computed via intensity comparisons between the central pixel and its neighbors, as with the LBP generator.
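The sign-encoding step described above can be illustrated with a short sketch. It is only an approximation under stated assumptions: the eight masks are generated by rotating the outer ring of the standard 3 × 3 Sobel kernel in 45-degree steps, which may differ from the exact masks of Figure 4, and the names `compass_masks` and `sign_codes` are hypothetical. The step that maps the 8-bit codes to the 37 features used in this study is not reproduced.

```python
# Ring positions of a 3x3 mask, clockwise from the top-left corner.
RING = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
# Ring values of the Sobel mask that responds to intensity increasing
# from left to right (assumed base mask; centre weight is zero).
BASE = [-1, 0, 1, 2, 1, 0, -1, -2]


def compass_masks():
    """Eight 3x3 masks obtained by rotating the Sobel ring in 45-degree steps."""
    masks = []
    for k in range(8):
        mask = [[0] * 3 for _ in range(3)]
        for idx, (r, c) in enumerate(RING):
            mask[r][c] = BASE[(idx - k) % 8]
        masks.append(mask)
    return masks


def sign_codes(image):
    """8-bit code per interior pixel: bit k is 1 when mask k responds positively."""
    masks = compass_masks()
    height, width = len(image), len(image[0])
    codes = []
    for y in range(1, height - 1):
        row = []
        for x in range(1, width - 1):
            code = 0
            for k, mask in enumerate(masks):
                # Correlate the mask with the 3x3 neighbourhood around (y, x).
                response = sum(
                    mask[r][c] * image[y - 1 + r][x - 1 + c]
                    for r in range(3)
                    for c in range(3)
                )
                if response > 0:
                    code |= 1 << k
            row.append(code)
        codes.append(row)
    return codes
```

Here `image` is a grayscale image given as a list of rows of integers. Border pixels are skipped, so a 256 × 256 input yields a 254 × 254 grid of codes; a perfectly flat region produces code 0 because no mask responds positively.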
The Sobel gradient generator was used in this study to improve stability in the presence of non-uniform light variations and random noise, and a collection of 37 features for each blood smear image was derived from the ALL-IDB2 database.

Figure 4: Sobel gradient masks in eight orientations [29].

3.4 Feature Fusion and Classification
Data combination has been applied in a broad variety of ML and computer vision disciplines. With feature fusion, an extracted feature vector is concatenated with another set of features produced by the system. Multi-feature fusion can significantly boost the model's predictive power [25]. A fusion of feature vectors was proposed in this paper, combining the CSLBP (1 × 16), GWT (1 × 48), and LGIP (1 × 37) methods. Equations (8), (9), and (10) denote the features derived by CSLBP, GWT, and LGIP, respectively. Equation (11) describes how the derived feature vectors were combined through concatenation.

$$F_{CSLBP} = \{c_1, c_2, \ldots, c_{16}\} \tag{8}$$

$$F_{GWT} = \{w_1, w_2, \ldots, w_{48}\} \tag{9}$$

$$F_{LGIP} = \{l_1, l_2, \ldots, l_{37}\} \tag{10}$$

$$F_{fused} = \{c_1, \ldots, c_{16}, w_1, \ldots, w_{48}, l_1, \ldots, l_{37}\} \tag{11}$$

The CSLBP, GWT, and LGIP features were thus combined into a vector of 101 features. This fused vector was provided to the classifiers to evaluate the recommended scheme and classify blast cells in blood smear images, and it served as the final input for both training and testing on the ALL-IDB2 database. ML algorithms were used to detect patients with leukemia in the proposed workflow. To distinguish ALL-leukemia cells from normal, healthy cells, five supervised ML classifiers (DT, Ensemble, KNN, NB, and RF) were applied to make the final predictions.

3.5 Performance Metrics
Performance metrics are employed to compare parameter settings and feature extraction outcomes across multiple models.
To evaluate the proposed model's performance in classifying ALL-leukemia cells, the confusion matrix was used to calculate seven popular performance measures: accuracy, precision, sensitivity, specificity, F1-score, Matthews Correlation Coefficient (MCC), and misclassification rate. Four confusion-matrix counts, True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN), were used to generate the metrics defined by Equations (12)–(18).

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \tag{12}$$

$$Precision = \frac{TP}{TP + FP} \tag{13}$$

$$Sensitivity = \frac{TP}{TP + FN} \tag{14}$$

$$Specificity = \frac{TN}{TN + FP} \tag{15}$$

$$F1\text{-}score = \frac{2 \times Precision \times Sensitivity}{Precision + Sensitivity} \tag{16}$$

$$MCC = \frac{(TP \times TN) - (FP \times FN)}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{17}$$

$$Misclassification\ rate = \frac{FP + FN}{TP + TN + FP + FN} \tag{18}$$

4. EXPERIMENTAL RESULTS
The primary objective of the recommended method is to differentiate between normal cells and ALL-leukemia cells. In this section, extensive experiments were carried out to evaluate the recommended approach in terms of confusion matrix metrics, including accuracy, precision, sensitivity, specificity, F1-score, MCC, and misclassification error rate. Furthermore, the method was compared to the most recent existing methods.

4.1 Database Description
The investigation was conducted using the ALL-IDB database, which comprises two image sets, ALL-IDB1 and ALL-IDB2 [30]. The identification system was assessed using images of leukemic smears and images of non-leukemic smears from the ALL-IDB2 database. The ALL-IDB2 database is a set of cropped regions of interest generated from the ALL-IDB1 database, and its images have gray-level characteristics similar to those of ALL-IDB1. There are 260 images in this database, half of which show healthy cells and half of which show blast cells. The images are named ImIN_Y.jpg and comprise the regions of interest of the normal and blast cells. In this naming scheme, IN denotes a three-digit number, whereas Y denotes a Boolean digit. When Y is 0, the sample contains no blast cells and comes from a healthy individual. When Y is 1, conversely, the sample contains blast cells and comes from a patient diagnosed with ALL. Figure 5 shows example images from the ALL-IDB2 database: healthy lymphocytes and probable blast cells.

Figure 5: Example images contained in the ALL-IDB2: healthy lymphocyte (first row) and lymphoblast cells (second row).

4.2 Results and Discussion
The performance of the recommended system was measured using features extracted with CSLBP, GWT, and LGIP to recognize and classify ALL-leukemia cells. Seven scenarios were considered: each of the three feature sets on its own, as well as the various combinations of CSLBP, GWT, and LGIP features. To see which scenario achieves acceptable results, the features of each scenario were classified using five supervised classification models (DT, Ensemble, KNN, NB, and RF). The complete database was split into two groups using the holdout cross-validation procedure: 80% for training the model and 20% for examining the performance of the classifier. For the DT (Table 1), NB (Table 2), Ensemble (Table 3), KNN (Table 4), and RF (Table 5) classifiers, the class-wise assessment of each scenario was measured by means of per-class accuracy and overall accuracy, reported as mean ± SD.

According to the results shown in Table 1, combining the GWT and LGIP features (scenario 6) attained the maximum overall accuracy of 93.07 ± 4.27% with the DT classifier, whereas the features extracted with the CSLBP technique (scenario 1) gave the lowest overall accuracy, 79.80 ± 6.85%.
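The evaluation protocol described above (fused features, an 80/20 holdout split, results reported as mean ± SD, and the measures of Equations (12)–(18)) can be sketched in a few lines. This is a hedged illustration, not the implementation used in the study: the number of repetitions (`repeats`), the stratified split, and the 1-nearest-neighbor stand-in classifier are assumptions, and all function names are hypothetical.

```python
import math
import random
import statistics


def fuse(*feature_vectors):
    """Concatenate per-image feature vectors, as in Equation (11)."""
    fused = []
    for vector in feature_vectors:
        fused.extend(vector)
    return fused


def stratified_holdout(labels, test_fraction, rng):
    """Split sample indices into train/test parts, preserving class balance."""
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    train, test = [], []
    for indices in by_class.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        n_test = round(len(shuffled) * test_fraction)
        test.extend(shuffled[:n_test])
        train.extend(shuffled[n_test:])
    return train, test


def predict_1nn(train_x, train_y, sample):
    """Stand-in classifier (assumption): label of the nearest training vector."""
    def dist(i):
        return sum((a - b) ** 2 for a, b in zip(train_x[i], sample))
    return train_y[min(range(len(train_x)), key=dist)]


def confusion_metrics(tp, tn, fp, fn):
    """Equations (12)-(18); assumes no denominator is zero."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        "mcc": (tp * tn - fp * fn) / denom,
        "misclassification": (fp + fn) / total,
    }


def repeated_holdout(features, labels, repeats=10, seed=0):
    """Mean and SD of test accuracy (%) over repeated 80/20 holdout splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        train, test = stratified_holdout(labels, 0.2, rng)
        train_x = [features[i] for i in train]
        train_y = [labels[i] for i in train]
        hits = sum(
            predict_1nn(train_x, train_y, features[i]) == labels[i] for i in test
        )
        scores.append(100.0 * hits / len(test))
    return statistics.mean(scores), statistics.stdev(scores)
```

With the 260 balanced images of ALL-IDB2, a stratified 80/20 split of this kind yields 208 training and 52 test images, 26 per class in the test part. Passing `fuse(cslbp, gwt, lgip)` for each image gives the 101-dimensional vector of scenario 7, while passing a single feature set or a pair reproduces the other scenarios.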
With the NB classifier (Table 2), the maximum overall accuracy of 90.19 ± 4.48% was attained by combining the features extracted with the CSLBP (16 features) and LGIP (37 features) approaches, whereas the GWT technique gave the lowest performance, 80.76 ± 4.79%, compared to the other scenarios.

Table 1: The quantitative classification accuracies of various modeling approaches based on the DT classifier. The bold values indicate the highest classification accuracy.

| Methods | Extracted Features | Per Class Accuracy (%): Regular cells | Per Class Accuracy (%): ALL affected | Overall Accuracy (%) |
|---|---|---|---|---|
| CSLBP | 16 | 83.46 ± 8.70 | 76.15 ± 8.65 | 79.80 ± 6.85 |
| GWT | 48 | 90.38 ± 4.15 | 95.38 ± 2.37 | 92.88 ± 3.51 |
| LGIP | 37 | 87.69 ± 7.43 | 88.84 ± 5.64 | 88.26 ± 5.76 |
| CSLBP + GWT | 64 | 88.46 ± 5.12 | 92.30 ± 4.44 | 90.38 ± 3.51 |
| CSLBP + LGIP | 53 | 89.61 ± 6.02 | 90.38 ± 5.51 | 89.99 ± 4.42 |
| GWT + LGIP | 85 | 92.69 ± 3.82 | 93.46 ± 3.02 | 93.07 ± 4.27 |
| CSLBP + GWT + LGIP | 101 | 91.15 ± 4.07 | 89.99 ± 4.51 | 90.57 ± 3.19 |

Table 2: The quantitative classification accuracies of various modeling approaches based on the NB classifier. The bold values indicate the highest classification accuracy.

| Methods | Extracted Features | Per Class Accuracy (%): Regular cells | Per Class Accuracy (%): ALL affected | Overall Accuracy (%) |
|---|---|---|---|---|
| CSLBP | 16 | 89.61 ± 3.16 | 85.76 ± 6.29 | 87.69 ± 3.64 |
| GWT | 48 | 83.07 ± 8.14 | 78.46 ± 5.79 | 80.76 ± 4.79 |
| LGIP | 37 | 84.99 ± 5.87 | 83.84 ± 5.67 | 84.42 ± 8.38 |
| CSLBP + GWT | 64 | 90.76 ± 4.86 | 80.76 ± 5.43 | 85.76 ± 4.64 |
| CSLBP + LGIP | 53 | 91.53 ± 4.36 | 88.84 ± 5.86 | 90.19 ± 4.48 |
| GWT + LGIP | 85 | 93.07 ± 2.37 | 82.69 ± 9.46 | 87.88 ± 5.87 |
| CSLBP + GWT + LGIP | 101 | 91.92 ± 4.23 | 84.99 ± 6.24 | 88.46 ± 6.72 |

According to the results in Tables 3, 4, and 5, fusing the CSLBP, GWT, and LGIP features (scenario 7) achieved the maximum overall accuracy of 97.49 ± 1.02%, 94.99 ± 2.74%, and 94.80 ± 2.40% using the Ensemble, KNN, and RF classifiers, respectively.
According to the results in Tables 1–5, fusing the CSLBP, GWT, and LGIP features (scenario 7) recorded the highest overall accuracy of 97.49 ± 1.02% with the Ensemble classifier, followed by the combination of the CSLBP and LGIP features (scenario 5) with 97.11 ± 0.86%, also with the Ensemble classifier. The features extracted with the CSLBP approach alone gave the lowest overall accuracy (79.80 ± 6.85%, with the DT classifier).

Table 3: The quantitative classification accuracies of various modeling approaches based on the Ensemble classifier. The bold values indicate the highest classification accuracy.

| Methods | Extracted Features | Per Class Accuracy (%): Regular cells | Per Class Accuracy (%): ALL affected | Overall Accuracy (%) |
|---|---|---|---|---|
| CSLBP | 16 | 91.15 ± 6.29 | 92.69 ± 3.13 | 91.92 ± 5.42 |
| GWT | 48 | 94.61 ± 2.86 | 94.61 ± 2.49 | 94.61 ± 2.03 |
| LGIP | 37 | 94.61 ± 2.13 | 95.76 ± 1.36 | 95.19 ± 2.75 |
| CSLBP + GWT | 64 | 94.99 ± 3.07 | 97.30 ± 0.59 | 96.15 ± 2.86 |
| CSLBP + LGIP | 53 | 95.76 ± 3.36 | 98.46 ± 0.98 | 97.11 ± 0.86 |
| GWT + LGIP | 85 | 96.15 ± 1.62 | 97.30 ± 1.16 | 96.73 ± 1.22 |
| CSLBP + GWT + LGIP | 101 | 96.15 ± 2.56 | 98.84 ± 0.59 | 97.49 ± 1.02 |

Table 4: The quantitative classification accuracies of various modeling approaches based on the KNN classifier. The bold values indicate the highest classification accuracy.

| Methods | Extracted Features | Per Class Accuracy (%): Regular cells | Per Class Accuracy (%): ALL affected | Overall Accuracy (%) |
|---|---|---|---|---|
| CSLBP | 16 | 92.69 ± 3.12 | 92.30 ± 3.14 | 92.49 ± 3.56 |
| GWT | 48 | 94.23 ± 2.15 | 93.46 ± 3.29 | 93.84 ± 4.32 |
| LGIP | 37 | 91.15 ± 5.14 | 80.00 ± 5.67 | 85.57 ± 3.42 |
| CSLBP + GWT | 64 | 94.99 ± 2.59 | 93.84 ± 3.13 | 94.42 ± 2.11 |
| CSLBP + LGIP | 53 | 94.23 ± 3.53 | 92.69 ± 4.60 | 93.46 ± 3.64 |
| GWT + LGIP | 85 | 94.23 ± 2.51 | 93.84 ± 2.86 | 94.03 ± 3.99 |
| CSLBP + GWT + LGIP | 101 | 94.61 ± 3.13 | 95.38 ± 2.53 | 94.99 ± 2.74 |

Table 5: The quantitative classification accuracies of various modeling approaches based on the RF classifier. The bold values indicate the highest classification accuracy.
| Methods | Extracted Features | Per Class Accuracy (%): Regular cells | Per Class Accuracy (%): ALL affected | Overall Accuracy (%) |
|---|---|---|---|---|
| CSLBP | 16 | 93.84 ± 3.71 | 83.84 ± 8.26 | 88.84 ± 3.93 |
| GWT | 48 | 91.15 ± 5.45 | 93.84 ± 3.71 | 92.49 ± 3.44 |
| LGIP | 37 | 93.84 ± 2.79 | 86.15 ± 6.58 | 89.99 ± 4.13 |
| CSLBP + GWT | 64 | 94.61 ± 2.86 | 88.84 ± 4.60 | 91.73 ± 3.39 |
| CSLBP + LGIP | 53 | 94.99 ± 3.64 | 87.30 ± 4.45 | 91.15 ± 1.85 |
| GWT + LGIP | 85 | 94.99 ± 2.07 | 91.92 ± 3.57 | 93.46 ± 3.64 |
| CSLBP + GWT + LGIP | 101 | 94.61 ± 3.71 | 94.99 ± 3.64 | 94.80 ± 2.40 |

Considering the empirical outcomes for all seven scenarios provided in Figure 6, the various combinations of features generated with the CSLBP, GWT, and LGIP techniques have a beneficial influence on the overall effectiveness and outperform the other scenarios across all classifiers. The fusion of the CSLBP, GWT, and LGIP approaches reached the highest overall accuracy of 97.49 ± 1.02%, 94.99 ± 2.74%, and 94.80 ± 2.40% using the Ensemble, KNN, and RF classifiers, respectively. Furthermore, the fusion of GWT and LGIP (scenario 6) and the fusion of CSLBP and LGIP (scenario 5) achieved the peak overall accuracy of 93.07 ± 4.27% and 90.19 ± 4.48% using the DT and NB classifiers, respectively. Accordingly, the combination of the CSLBP, GWT, and LGIP approaches with the Ensemble classifier resulted in the highest overall accuracy of 97.49 ± 1.02% among all classifiers and scenarios.

Figure 6: Overall accuracies performance comparison of the presented scheme scenarios utilizing various classifiers.

The same pattern was observed when the other performance metrics (precision, sensitivity, specificity, F1-score, and MCC) were used to evaluate the suggested workflow. The fusion of the CSLBP, GWT, and LGIP approaches reached the maximum precision of 98.87%, 95.44%, and 95.1% with the Ensemble, KNN, and RF classifiers, respectively.
Moreover, the features extracted with the GWT method (scenario 2) recorded the highest precision rate of 95.38 % using the DT classifier; however, the lowest precision rate of 78.03 % was produced by the CSLBP technique with the DT classifier. Consequently, across all scenarios, the experimental results confirmed a maximum precision of 98.87 % with the Ensemble classifier. Figure 7 compares the precision percentages of the seven scenarios across the classifiers.

Figure 7: Precision performance comparison of the presented scheme scenarios utilizing various classifiers.

Additionally, the combination of features from the CSLBP, GWT, and LGIP approaches achieved the highest sensitivity, 96.15 %, with the Ensemble classifier (see Figure 8), followed by the fusion of CSLBP and GWT with a sensitivity of 93.97 ± 3.67 % using the KNN classifier. Conversely, the lowest sensitivity of 83.07 % was observed when the GWT technique was used with the NB classifier. Based on the results for all five classifiers shown in Figure 8, the sensitivity of the fused features with the Ensemble classifier was higher than that of the other features.

With regard to specificity, the results shown in Figure 9 demonstrate the advantage of the features derived from the fusion of the CSLBP, GWT, and LGIP techniques (scenario 7), which produced scores that clearly outperformed the other scenarios. The best result was achieved with the Ensemble classifier on the fused CSLBP, GWT, and LGIP features, ahead of the KNN and RF classifiers, which reached specificities of 95.38 % and 95.00 %, respectively.
The specificity of the CSLBP approach with the DT classifier, on the other hand, was the lowest (76.15 %).

Figure 8: Sensitivity performance comparison of the presented scheme scenarios utilizing various classifiers.

Figure 9: Specificity performance comparison of the presented scheme scenarios utilizing various classifiers.

In terms of F1-score, the results presented in Figure 10 demonstrate the effectiveness of combining the CSLBP, GWT, and LGIP approaches, which clearly outperformed the other scenarios. The combination of the CSLBP, GWT, and LGIP features with the Ensemble classifier yielded the best result (F1-score of 97.46 %), followed by an F1-score of 94.96 % with the KNN classifier, while the features extracted with the GWT and LGIP methods yielded the lowest F1-score (76.48 %) with the NB classifier. The MCC results were also very satisfactory. According to the results in Figure 11, combining the CSLBP, GWT, and LGIP features attained the maximum MCC of 95.08 % with the Ensemble classifier, followed by 90.11 % with the KNN classifier, whereas the features derived from the CSLBP technique had the lowest MCC (60.12 %) with the DT classifier.

Figure 10: F1-score performance comparison of the presented scheme scenarios utilizing various classifiers.

Figure 11: MCC performance comparison of the presented scheme scenarios utilizing various classifiers.

The results in Figure 12 also clearly show that the features extracted by combining the CSLBP, GWT, and LGIP approaches surpassed the alternative scenarios and achieved the best performance with the Ensemble classifier.
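For reference, the five metrics discussed above can all be derived from the counts of a binary confusion matrix. The sketch below uses the standard textbook definitions; the function name `binary_metrics` and the example counts are hypothetical (chosen to resemble a balanced 130/130 split) and are not results from this study. The metrics are returned as fractions, so multiplying by 100 gives percentages in the form reported here.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary-classification metrics from confusion-matrix counts.

    Assumes non-degenerate counts (at least one predicted positive,
    one actual positive, and one actual negative).
    """
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)  # recall, true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only (130 positives, 130 negatives).
    for name, value in binary_metrics(tp=125, fp=2, tn=128, fn=5).items():
        print(f"{name}: {100 * value:.2f} %")
```

Unlike the other four metrics, MCC uses all four cells of the confusion matrix, which is why it is usually the lowest of the reported scores for a given model.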
Considering the recorded results, the highest precision, sensitivity, specificity, F1-score, and MCC values for the features derived from the combined CSLBP, GWT, and LGIP approaches were 98.87 %, 96.15 %, 98.84 %, 97.46 %, and 95.08 %, respectively, and were obtained using 101 features. In contrast, the lowest precision, sensitivity, specificity, F1-score, and MCC scores were obtained with the CSLBP method (78.03 %) with the DT classifier, the GWT method (83.07 %) with the NB classifier, the CSLBP method (76.15 %) with the DT classifier, the GWT and LGIP methods (76.48 %) with the DT classifier, and the CSLBP method (60.12 %) with the DT classifier, respectively.

Figure 12: System performance comparison for the various scenarios using Ensemble classifier.

The suggested scenarios were also evaluated, on the same database and in the same computational setting, by measuring the misclassification error rate. The misclassification error rates of the developed models are shown in Figure 13. The results showed that integrating the CSLBP, GWT, and LGIP features with the Ensemble classifier leads to the lowest misclassification error of 2.51 %, indicating that the recommended scenario performed considerably better than the other scenarios. As a result, this scenario was adopted as the proposed technique for classifying ALL-leukemia cells.

Figure 13: Misclassification error rate comparison for various scenarios using Ensemble classifier.

Finally, the performance of the recommended fusion system was compared with several recent techniques, as can be seen in Table 6. The suggested system achieves the highest overall accuracy among the compared approaches. This is owing to the integration of the CSLBP, GWT, and LGIP approaches, which combines their respective strengths.
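The feature counts reported in Tables 3–5 (16 for CSLBP, 48 for GWT, 37 for LGIP, and 101 for the three combined) are consistent with fusion by simple concatenation of the per-descriptor vectors. The sketch below illustrates that under this assumption; the helper `fuse` and the zero-filled placeholder vectors are hypothetical, and the descriptors themselves are not implemented here.

```python
from typing import Dict, List, Sequence

# Descriptor lengths as reported in Tables 3-5.
FEATURE_LENGTHS: Dict[str, int] = {"CSLBP": 16, "GWT": 48, "LGIP": 37}


def fuse(features: Dict[str, Sequence[float]], order: Sequence[str]) -> List[float]:
    """Concatenate the named descriptor vectors in the given order.

    Assumption: fusion is plain concatenation, which matches the
    feature counts in the tables but is not spelled out here.
    """
    fused: List[float] = []
    for name in order:
        fused.extend(features[name])
    return fused


if __name__ == "__main__":
    # Placeholder vectors standing in for real descriptors of one image.
    example = {name: [0.0] * n for name, n in FEATURE_LENGTHS.items()}
    for combo in (["CSLBP", "GWT"], ["CSLBP", "LGIP"],
                  ["GWT", "LGIP"], ["CSLBP", "GWT", "LGIP"]):
        print(" + ".join(combo), "->", len(fuse(example, combo)), "features")
```

Under this assumption the fused vector for scenario 7 has 16 + 48 + 37 = 101 entries, and the pairwise scenarios have 64, 53, and 85 entries, matching the "Extracted Features" column.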
Furthermore, the other studies used a large number of features, whereas the current proposal uses only 101 features and produced the best results.

Table 6: Performance comparison with the literature for ALL-leukemia diagnostic techniques on the ALL-IDB2 database.

| Previous study | Year | Classifier | Accuracy (%) |
|---|---|---|---|
| Das et al. [31] | 2020 | GLRLM + SVM (RBF) | 96.00 |
| S. Praveena et al. [32] | 2020 | GreyJOA + DCNN | 93.50 |
| Mondal et al. [15] | 2021 | Xception | 93.90 |
| Pradeep Kumar Das et al. [33] | 2021 | MobilenetV2 + ResNet18 | 97.18 |
| Proposed work | 2022 | CSLBP + GWT + LGIP + Ensemble | 97.49 |

The extensive experiments above indicate that the designed scheme can accurately discriminate normal cells from blasts in blood smear images, which might help clinicians reach a clear, conclusive decision based on both the diagnostic experts' opinions and the developed tool.

5. CONCLUSION

Early diagnosis of ALL in white blood cells is essential to reduce disease risk. The key objective of the proposed work is to classify ALL-leukemia cells from raw blood smear images using feature fusion together with an ML method. Each trained framework was validated using standard performance measures in seven scenarios. The recommended architecture was validated on microscopic thin blood smear images from the ALL-IDB2 database. Compared with features extracted by the individual methods (CSLBP, GWT, and LGIP), the presented feature fusion workflow achieved a superior overall classification accuracy of 97.49 ± 1.02 %. Furthermore, based on the experimental results, the presented framework proved more accurate than previous studies for classifying ALL-leukemia cells.

REFERENCE

[1] C. Di Ruberto, A. Loddo, and G. Puglisi, “Blob detection and deep learning for leukemic blood image analysis,” Appl. Sci., vol. 10, no.
3, 2020, doi: 10.3390/app10031176.
[2] G. Drałus, D. Mazur, and A. Czmil, “Automatic detection and counting of blood cells in smear images using retinanet,” Entropy, vol. 23, no. 11, 2021, doi: 10.3390/e23111522.
[3] B. George-Gay and K. Parker, “Understanding the complete blood count with differential,” J. Perianesthesia Nurs., vol. 18, no. 2, pp. 96–117, 2003, doi: 10.1053/jpan.2003.50013.
[4] G. Soni and K. S. Yadav, “Applications of nanoparticles in treatment and diagnosis of leukemia,” Mater. Sci. Eng. C, vol. 47, no. April, pp. 156–164, 2018, doi: 10.1016/j.msec.2014.10.043.
[5] S. Shafique and S. Tehsin, “Acute lymphoblastic leukemia detection and classification of its subtypes using pretrained deep convolutional neural networks,” Technol. Cancer Res. Treat., vol. 17, pp. 1–7, 2018, doi: 10.1177/1533033818802789.
[6] D. A. Arber et al., “WHO Classification 2016 - Myeloid neoplasms and acute leukemia,” Blood, vol. 127, no. 20, pp. 2391–2405, 2016, doi: 10.1182/blood-2016-03-643544.
[7] F. Huang, P. Guang, F. Li, X. Liu, W. Zhang, and W. Huang, “AML, ALL, and CML classification and diagnosis based on bone marrow cell morphology combined with convolutional neural network: A STARD compliant diagnosis research,” Medicine (Baltimore)., vol. 99, no. 45, p. e23154, 2020, doi: 10.1097/MD.0000000000023154.
[8] Y. Dong et al., “Leukemia incidence trends at the global, regional, and national level between 1990 and 2017,” Exp. Hematol. Oncol., vol. 9, no. 1, pp. 1–11, 2020, doi: 10.1186/s40164-020-00170-6.
[9] M. Ghaderzadeh, F. Asadi, A. Hosseini, D. Bashash, H. Abolghasemi, and A. Roshanpour, “Machine Learning in Detection and Classification of Leukemia Using Smear Blood Images: A Systematic Review,” Sci. Program., vol. 2021, 2021, doi: 10.1155/2021/9933481.
[10] M. Kim, K. Chae, S. Lee, H. J. Jang, and S. Kim, “Automated classification of online sources for infectious disease occurrences using machine-learning-based natural language processing approaches,” Int. J. Environ. Res. Public Health, vol. 17, no. 24, pp. 1–13, 2020, doi: 10.3390/ijerph17249467.
[11] F. E. Al-Tahhan, M. E. Fares, A. A. Sakr, and D. A. Aladle, “Accurate automatic detection of acute lymphatic leukemia using a refined simple classification,” Microsc. Res. Tech., vol. 83, no. 10, pp. 1178–1189, 2020, doi: 10.1002/jemt.23509.
[12] S. H. Wady, “Classification of Acute Lymphoblastic Leukemia through the Fusion of Local Descriptors,” UHD J. Sci. Technol., vol. 6, no. 1, pp. 21–33, Feb. 2022, doi: 10.21928/UHDJST.V6N1Y2022.PP21-33.
[13] Z. F. Mohammed and A. A. Abdulla, “An efficient CAD system for ALL cell identification from microscopic blood images,” Multimed. Tools Appl., vol. 80, no. 4, pp. 6355–6368, Oct. 2020, doi: 10.1007/S11042-020-10066-6.
[14] M. Sharif et al., “Recognition of different types of leukocytes using YOLoV2 and optimized bag-of-features,” IEEE Access, vol. 8, pp. 167448–167459, 2020, doi: 10.1109/ACCESS.2020.3021660.
[15] C. Mondal et al., “Ensemble of Convolutional Neural Networks to diagnose Acute Lymphoblastic Leukemia from microscopic images,” Informatics Med. Unlocked, vol. 27, p. 100794, Jan. 2021, doi: 10.1016/J.IMU.2021.100794.
[16] A. Rehman, N. Abbas, T. Saba, S. I. ur Rahman, Z. Mehmood, and H. Kolivand, “Classification of acute lymphoblastic leukemia using deep learning,” Microsc. Res. Tech., vol. 81, no. 11, pp. 1310–1317, Nov. 2018, doi: 10.1002/JEMT.23139.
[17] S. Kumar, S. Mishra, P. Asthana, and Pragya, “Automated Detection of Acute Leukemia Using K-mean Clustering Algorithm,” Adv. Intell. Syst. Comput., vol. 554, pp. 655–670, 2018, doi: 10.1007/978-981-10-3773-3_64.
[18] A. Setiawan, A. Harjoko, T. Ratnaningsih, E. Suryani, Wiharto, and S. Palgunadi, “Classification of cell types in Acute Myeloid Leukemia (AML) of M4, M5 and M7 subtypes with support vector machine classifier,” 2018 Int. Conf. Inf. Commun. Technol. ICOIACT 2018, vol. 2018-January, pp. 45–49, Apr. 2018, doi: 10.1109/ICOIACT.2018.8350822.
[19] K. Muthumayil, S. Manikandan, S. Srinivasan, J. Escorcia-Gutierrez, M. Gamarra, and R. F. Mansour, “Diagnosis of leukemia disease based on enhanced virtual neural network,” Comput. Mater. Contin., vol. 69, no. 2, pp. 2031–2044, 2021, doi: 10.32604/cmc.2021.017116.
[20] S. Saleem, J. Amin, M. Sharif, M. A. Anjum, M. Iqbal, and S.-H. Wang, “A deep network designed for segmentation and classification of leukemia using fusion of the transfer learning models,” Complex Intell. Syst., pp. 1–16, Jul. 2021, doi: 10.1007/S40747-021-00473-Z.
[21] V. Singhal and P. Singh, “Texture Features for the Detection of Acute Lymphoblastic Leukemia,” 2016, pp. 535–543.
[22] K. N. Sukhia, M. M. Riaz, A. Ghafoor, and N. Iltaf, “Overlapping white blood cells detection based on watershed transform and circle fitting,” Radioengineering, vol. 26, no. 4, pp. 1177–1181, 2017, doi: 10.13164/re.2017.1177.
[23] F. H. Ahmad and S. H. Wady, “COVID‑19 Infection Detection from Chest X‑Ray Images Using Feature Fusion and Machine Learning,” Sci. J. Cihan Univ. – Sulaimaniya, vol. 5, no. 2, pp. 10–30, 2021.
[24] T. Ojala, M. Pietikäinen, and T. Mäenpää, “Multiresolution gray-scale and rotation invariant texture classification with local binary patterns,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 7, pp. 971–987, 2002, doi: 10.1109/TPAMI.2002.1017623.
[25] S. H. Wady and H. O. Ahmed, “Ethnicity Identification based on Fusion Strategy of Local and Global Features Extraction,” Int. J. Multidiscip. Curr. Res., vol. 4, no. April, pp. 200–205, 2016.
[26] M. Heikkilä, M. Pietikäinen, and C. Schmid, “Description of interest regions with local binary patterns,” Pattern Recognit., vol. 42, no. 3, pp. 425–436, 2009, doi: 10.1016/j.patcog.2008.08.014.
[27] R. Hatibaruah, V. K. Nath, and D. Hazarika, “An effective texture descriptor for retrieval of biomedical and face images based on co-occurrence of similar center-symmetric local binary edges,” Int. J. Comput. Appl., vol. 43, no. 6, pp. 589–600, 2021, doi: 10.1080/1206212X.2019.1590953.
[28] S. Lahmiri and M. Boukadoum, “Hybrid discrete wavelet transform and Gabor filter banks processing for mammogram features extraction,” 2011 IEEE 9th Int. New Circuits Syst. Conf. NEWCAS 2011, vol. 2013, pp. 53–56, 2011, doi: 10.1109/NEWCAS.2011.5981217.
[29] L. Zhou and H. Wang, “Local gradient increasing pattern for facial expression recognition,” in Proceedings - International Conference on Image Processing, ICIP, 2012, pp. 2601–2604, doi: 10.1109/ICIP.2012.6467431.
[30] R. Donida Labati, V. Piuri, and F. Scotti, “ALL-IDB: The Acute Lymphoblastic Leukemia Image Database for Image Processing,” IEEE Int. Conf. Image Process., pp. 2089–2092, 2011.
[31] P. K. Das, P. Jadoun, and S. Meher, “Detection and Classification of Acute Lymphocytic Leukemia,” Proc. 2020 IEEE-HYDCON Int. Conf. Eng. 4th Ind. Revolution, HYDCON 2020, Sep. 2020, doi: 10.1109/HYDCON48903.2020.9242745.
[32] S. Praveena and S. P. Singh, “Sparse-FCM and Deep Convolutional Neural Network for the segmentation and classification of acute lymphoblastic leukaemia,” Biomed. Tech., vol. 65, no. 6, pp. 759–773, 2020, doi: 10.1515/bmt-2018-0213.
[33] P. K. Das, S. Meher, R. Panda, and A. Abraham, “An Efficient Blood-Cell Segmentation for the Detection of Hematological Disorders,” IEEE Trans. Cybern., vol. PP, 2021, doi: 10.1109/TCYB.2021.3062152.