IHJPAS. 36(1)2023
This work is licensed under a Creative Commons Attribution 4.0 International License.

Studying the Classification of Texture Images by K-Means of Co-Occurrence Matrix and Confusion Matrix

Abstract
In this research, a group of gray texture images from the Brodatz database was studied by building a feature database of the images using the gray-level co-occurrence matrix (GLCM), with a distance of one pixel between the paired pixels and four angles (0, 45, 90, 135). The K-means classifier was used to classify the images into a number of classes, from two to eight, for each angle used in the co-occurrence matrix. The distribution of the images over the classes was compared between every two methods (the projection of one class onto another); the distribution of images was uneven, with one class being the dominant one. The classification results were studied for all cases using the confusion matrix between every two cases (two different angles with the same number of classes), and the agreement percentage between the classification results of the various methods was calculated.
Keywords: K-Means, Feature Extraction, Confusion Matrix, Agreement Percent, Class Projection.

1. Introduction
Image processing is one of the primary areas employed in various applications. It may be characterized as a method through which primitive images are improved. The sources of these images are cameras, remote sensors on satellites, or images used for medical diagnostics. In recent years, advancements in image processing have been made using a variety of methods, and their influence on the capability to improve images, whether for military reconnaissance missions, space probes, or other applications, has been significant. Image processing comprises several distinct methods that can be distinguished from one another, such as feature extraction and classification.
These features carry important and unique information about digital images, and together they help their respective classifiers obtain the best possible results [1-2].

doi.org/10.30526/36.1.2894
Article history: Received 19 June 2022, Accepted 7 August 2022, Published in January 2023.
Ibn Al-Haitham Journal for Pure and Applied Sciences. Journal homepage: jih.uobaghdad.edu.iq

Haider S. Kaduhm, Department of Physics, College of Education for Pure Sciences, Ibn Al-Haitham, University of Baghdad, Baghdad, Iraq. haidar.sadek1204a@ihcoedu.uobaghdad.edu.iq
Hameed M. Abduljabbar, Department of Physics, College of Education for Pure Sciences, Ibn Al-Haitham, University of Baghdad, Baghdad, Iraq. hameed.m.aj@ihcoedu.uobaghdad.edu.iq

The Brodatz texture database is one of the well-known global databases. It was built from the Brodatz album, so it is considered an actual benchmark for evaluating algorithms used in the segmentation and classification of texture images, because it contains homogeneous and heterogeneous materials in addition to large-scale patterns [3-4]. Texture is a feature that allows images to be extracted and organized for use in many applications, and it provides spatial information about an image's hue or intensity. The texture of isolated points cannot be explained; it is determined by the spatial organization of gray-level values in a neighborhood. Texture features describe how the image behaves. To derive these features, feature extraction techniques that help classify and identify images must be applied. Dimensionality reduction is a subset of feature extraction; its primary objective is to collect the essential features of the raw data and to represent them in a space with fewer dimensions.
When the original data is too large to be handled, the raw data in this approach is converted to a reduced description of the features, and dealing with the actual data (big data carrying more information than is needed) becomes unnecessary [5]. Texels are what a texture is composed of. A distinct definition can be given to texels: the basic units used to describe the homogeneity of images, as they appear with a certain extent and regularity [6-7]. They are the information provided by the intensity coordination in the image, or the spatial arrangement of colors [5]. Textures can describe surfaces and the properties of aerial or satellite images, biomedical images, and other types of images. With these essential properties, texture can be utilized in many applications, such as assessing product quality in industrial monitoring, finding land resources by remote sensing, and medical diagnosis using computed tomography. Thus, the texture of an image can be defined as the spatial contrast function of pixel intensities (gray values) [8-9]. Image texture describes the spatial arrangement of colors in an image; spatial variation in pixel intensities (gray values) is also used to define it. Image texture has various uses and has been the topic of much investigation by many academics, because regular images usually do not have complex backgrounds and include less information about textures. Therefore, one evident use of image texture is to select areas based on their textural qualities [9-10]. Image classification techniques are categorized into supervised and unsupervised methods [11]. Unsupervised classification aims to break the extensive data in the image into smaller units with similar characteristics [12]; it is an aggregation principle used to reveal the clustering structure of a data set [13-14].
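As a concrete illustration of such unsupervised clustering, the K-means procedure used later in this work can be sketched as follows. This is a minimal sketch, not the authors' implementation: the function names and the farthest-point initialization are our own choices, made to keep the example deterministic.

```python
import numpy as np

def init_centroids(X, k):
    # Farthest-point initialization (an assumption of this sketch):
    # start from sample 0, then repeatedly take the sample farthest
    # from the centroids chosen so far.
    idx = [0]
    for _ in range(k - 1):
        d = np.min(np.linalg.norm(X[:, None] - X[idx][None], axis=2), axis=1)
        idx.append(int(d.argmax()))
    return X[idx].astype(float).copy()

def kmeans(X, k, n_iter=100):
    """Minimal K-means: assign each sample to its nearest centroid,
    recompute centroids as class means, repeat until they settle."""
    centroids = init_centroids(X, k)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Distance of every sample to every centroid, shape (n, k).
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new_centroids = centroids.copy()
        for c in range(k):
            members = labels == c
            if members.any():
                new_centroids[c] = X[members].mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```

In this research each sample would be the feature vector of one Brodatz image, and k would range from two to eight.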
The Isodata algorithm and the K-means algorithm are considered the most popular methods used in unsupervised classification; they are also widely used in satellite data analysis [15]. In supervised classification methods, a group called the training sample, also known as the input of the analyst, plays a prominent role in the accuracy of classification, as these samples are the main factor and of high importance in supervised classification [16-18]. In statistical methods, the adopted idea is that the image's texture is determined using statistics computed over selected features from among a large group of local image characteristics. The human visual system differentiates one texture from another based on statistical features, which include first-order statistics, second-order statistics such as the gray-level co-occurrence matrix, and higher-order statistics such as the autocorrelation function [19-20]. One of the most important statistical methods is the gray-level co-occurrence matrix (GLCM), considered one of the oldest methods for extracting texture features [21]. The co-occurrence matrix Co(i, j) is used to calculate these features: it counts the number of pixel pairs in which the gray levels i and j co-occur within a certain distance [21-23]. The result is a matrix for a given distance d and direction, over every pair of intensity levels observed in the image. Classifying fine textures requires small values of the distance d, while coarse textures require a considerable distance [24]. After creating the GLCM, statistical characteristics are calculated; for example, six statistical properties can be inferred from the GLCM (contrast, correlation, energy, homogeneity, entropy, and maximum probability) [21].
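As a concrete illustration of how such a co-occurrence matrix is built, the following sketch (our own minimal implementation, not the authors' code) counts pixel pairs separated by a given distance along one of the four standard angles:

```python
import numpy as np

def glcm(img, levels, angle_deg, distance=1, symmetric=True):
    """Gray-level co-occurrence matrix: count pixel pairs (i, j)
    separated by `distance` along `angle_deg` (0, 45, 90 or 135)."""
    # Pixel offset (row, col) for the four standard angles.
    offsets = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}
    dr, dc = offsets[angle_deg]
    dr, dc = dr * distance, dc * distance
    co = np.zeros((levels, levels), dtype=np.int64)
    rows, cols = img.shape
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                co[img[r, c], img[r2, c2]] += 1
    if symmetric:
        co = co + co.T  # count each pair in both directions
    return co
```

For a real 8-bit Brodatz image, `levels` would be 256 and the loop would typically be replaced by a vectorized or library routine; the nested loops here are kept for clarity.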
This research studies the features of texture images taken from the Brodatz database for different angles, and the impact of the angle on the classification results in terms of the distribution of images over the various classes and the percentage of agreement between the methods. The whole image was used in extracting the features and in the classification process. Texture analysis and classification have been studied thoroughly because of their importance; the following is some of the research about them. The local binary pattern (LBP) texture analysis approach and the gray-level co-occurrence matrix (GLCM) method were tested and contrasted against one another. They were applied, with new integration schemes, to texture analysis of a wide range of collected industrial samples. The obtained findings were satisfactory and demonstrated effectiveness in identifying complicated industrial products with varying color and pattern distributions [24]. The spatial gray-level method was used for texture analysis. The author proposed a minor modification replacing the usual co-occurrence matrices with sum and difference histograms, since the sum and difference of two random variables over the same data are related to each other and determine the principal axes of their associated joint probability function. He presented two texture classifiers. In the first, the sum and difference histograms were treated as feature vector components, giving a fast execution that avoids the explicit evaluation of the feature vector. The other was a classifier based on general measures extracted from the histograms. He showed that the advantages of the proposed method over traditional spatial gray-level dependence analysis were the reduction of computation time as well as of memory storage [25].
A mixture was suggested based on both the co-occurrence matrix (GLCM) and the random threshold vector (RTV) matrix, due to the need for a higher texture-classification accuracy than the random threshold vector (RTV) alone provides. The methods were applied to different data sets, such as Brodatz and Outex. First, the first dimension of the feature vector was calculated; then the inverse was estimated from the co-occurrence matrices by applying the RTV method. The resulting vectors had two dimensions: one was the entropy of the proposed co-occurrence matrix, and the other was the threshold dimension. Their proposed approach showed a high-accuracy classification of textures and contained the significant discriminatory information required for successful analysis [26].

2. Methodology
The following steps were adopted in this research:
- Using the Brodatz image database as a sample to test the algorithm.
- Building a database for each angle (0, 45, 90, and 135) of the gray-level co-occurrence matrix (GLCM) method, using a distance equal to one pixel. The features adopted in describing the images were extracted using the following equations [27]:

Contrast = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} (i - j)^2 \, p(i, j)   (1)

Correlation = \frac{1}{\sigma_x \sigma_y} \left[ \sum_{i=1}^{N} \sum_{j=1}^{N} (i \, j) \, p(i, j) - \mu_x \mu_y \right]   (2)

Entropy = -\sum_{i=1}^{N} \sum_{j=1}^{N} p(i, j) \log(p(i, j))   (3)

Homogeneity = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} \frac{p(i, j)}{1 + |i - j|}   (4)

Standard Deviation \sigma = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} |Co_i - \mu|^2}   (5)

where \mu is the mean of the co-occurrence matrix, \mu = \frac{1}{N} \sum_{i=1}^{N} Co_i.

Third Momentum (M_3) = \left( \frac{1}{N-1} \sum_{i=1}^{N} |Co_i - \mu|^2 \right)^3   (6)

Normalized \sigma = 1 - \frac{1}{1 + (S_{Co}/255)^2}   (7)

Uniformity (U_f) = \sum_i \sum_j \left( \frac{Co_{ij}}{\sum_i \sum_j Co_{ij}} \right)^2   (8)

where p(i, j) is the probability of the co-occurrence matrix element value Co_{ij}.

- Classifying the Brodatz database using the K-means classifier for a number of classes.
- Calculating the confusion matrix between each two classification methods for all angles 0, 45, 90, and 135.
- Calculating the agreement percent between the classification results of each two methods (two different angles).
- Studying the distribution of the classified images between methods (different angles) by calculating the projection of one method onto the other, as shown in Figure 1.

Figure 1. The confusion matrix between two methods: its rows give the projection of the first method onto the second, and its columns give the projection of the second method onto the first.

The confusion matrix was used to calculate the agreement percent of the classification results between each two methods, using Equation 9:

Agreement Percent (AP) = \frac{\sum_{i}^{N} Conf_{ii}}{\sum_{ij}^{N} Conf_{ij}}   (9)

3. Results and Discussion
Brodatz images were used as samples to study the GLCM method in distinguishing texture images, where 112 gray images were used. Figure 2 shows a sample of these images.
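A few of the feature statistics from the methodology, and the agreement percent of Equation 9, can be sketched as follows. This is a minimal illustration: the function names are ours, and the agreement computation assumes the class labels of the two methods have already been matched (K-means assigns arbitrary label numbers, so the confusion matrix must be label-aligned before reading its diagonal).

```python
import numpy as np

def glcm_features(co):
    """A few of the GLCM statistics listed above, computed from a
    co-occurrence matrix normalized to a joint probability p(i, j)."""
    p = co / co.sum()
    i, j = np.indices(p.shape)
    contrast = np.sum((i - j) ** 2 * p)              # Eq. 1
    entropy = -np.sum(p[p > 0] * np.log(p[p > 0]))   # Eq. 3
    homogeneity = np.sum(p / (1 + np.abs(i - j)))    # Eq. 4
    uniformity = np.sum(p ** 2)                      # Eq. 8
    return contrast, entropy, homogeneity, uniformity

def agreement_percent(labels_a, labels_b, k):
    """Eq. 9: sum of the confusion-matrix diagonal over its total,
    i.e. the fraction of images given the same class by both methods
    (assuming the two label sets are already aligned)."""
    conf = np.zeros((k, k), dtype=int)
    for a, b in zip(labels_a, labels_b):
        conf[a, b] += 1
    return np.trace(conf) / conf.sum()
```

The off-diagonal rows and columns of `conf` are exactly the projections of one method onto the other discussed below.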
Figure 2. A set of samples taken from the Brodatz database

The GLCM was calculated for the angles (0, 45, 90, and 135) with a one-pixel distance to study the effect of the angle change on the ability of the GLCM to distinguish texture images. A database was created for each image and each angle used in the gray-level co-occurrence matrix, using the eight features illustrated in the methodology as a pattern describing the texture of the images. The K-means classifier was used to classify the images into two to eight classes. The confusion matrix (Figure 1) was used to analyze the results, intersecting the results of each method with the other methods.

Figure 3 shows the relationship between the number of classes and the percentage of agreement for every pair of angles. The general behavior for all selected angle pairs is a rate that decreases with the number of classes. Still, the behavior of the (135-45) and (0-90) pairs differs from the others, because in each pair the angles are perpendicular to each other. For the (45-135) and (0-90) pairs, the percentage of agreement remained around 90%, for up to seven classes between the two angles (45-135) and up to five classes between the two angles (0-90). For the rest of the angle pairs, the percentage of agreement started decreasing from the third class onward. When the number of classes was small, the percentage of agreement between the methods was very high, but this did not persist: with the increase in the number of classes, the percentage of agreement began to decrease, and the speed of the reduction depended on whether the angle formed between the two angles used in the classification is a multiple of 90.

Figure 3.
The distribution of the number of classes and the agreement percentage for each pair of angles

The confusion matrix was used to study the distribution of the images of each method relative to the second method (its projection) for the various adopted classes, as shown in Figure 1. The projection of the first method onto the second and of the second method onto the first were calculated for all classes and angles adopted in the research, as shown in Figure 4.

Figure 4. The projection of each method onto the other, using different classes (left: the projection of the first method onto the second; right: the projection of the second method onto the first)

It can be seen from the confusion matrices that there is a problem with the classification results: in most cases, one class contains the highest number of images. This method distributes the images over the classes such that one class, usually the one in the middle, contains most of the images regardless of their arrangement. This uneven distribution is one of the defects of the method, as one of the classes is dominant and takes the largest number of images. This behavior is identical across all the projection results for the different classes and angles.

4. Conclusions
Depending on the obtained results, we can reach the following conclusions:
- The percentage of agreement between the classification results when using the GLCM method is high when the number of classes is small, and it decreases when the number of classes is increased.
- When the two angles used for the GLCM form a right angle, the percentage of agreement decreases less than in the rest of the cases.
- When using the GLCM, the classification results in one of the classes being the dominant one in terms of the number of images, and it is usually the one arranged in the middle.

References
1. Mayada, J. K.; Emad, K.
J.; Decision Tree for Image Classification; Iraqi Commission for Computers and Informatics: Baghdad, 2013.
2. Latef, A. A. A.; Image Retrieval Based on Coefficient Correlation Index. Ibn AL-Haitham J. Pure Appl. Sci. 2017, 25.
3. Farhan, A. H.; Kamil, M. Y.; Texture Analysis of Breast Cancer Using Mammogram; Mustansiriyah University, College of Science: Baghdad, 2020.
4. Zhang, X.; Cui, J.; Wang, W.; Lin, C.; A Study for Texture Feature Extraction of High-Resolution Satellite Images Based on a Direction Measure and Gray Level Co-Occurrence Matrix Fusion Algorithm. Sensors 2017, 17, 1474.
5. Wirth, M. A.; Texture Analysis. Univ. Guelph: Guelph, ON, Canada, 2004.
6. Chang, T.; Kuo, C. C.; Texture Analysis and Classification with Tree-Structured Wavelet Transform. IEEE Trans. Image Process. 1993, 2, 429–441.
7. AL-Bassam, H. F. A.; A Texture Analysis System Based on Spatial Frequency and Attributes for Image Classification; University of Baghdad, College of Science, Department of Physics: Baghdad, 2019.
8. Naghashi, V.; Co-Occurrence of Adjacent Sparse Local Ternary Patterns: A Feature Descriptor for Texture and Face Image Retrieval. Optik (Stuttg.) 2018, 157, 877–889.
9. Warner, T. A.; Foody, G. M.; Nellis, M. D.; The SAGE Handbook of Remote Sensing; Sage Publications, 2009. ISBN 1412936160.
10. Mohammed, M. A.; Naji, T. A. H.; Abduljabbar, H. M.; The Effect of the Activation Functions on the Classification Accuracy of Satellite Image by Artificial Neural Network. Energy Procedia 2019, 157, 164–170.
11. Akey Sungheetha, D. J.; An Efficient Clustering-Classification Method in an Information Gain NRGA-KNN Algorithm for Feature Selection of Micro Array Data. Life Sci. J. 2013, 10.
12. Sharma, A. R.; Beaula, R.; Marikkannu, P.; Sungheetha, A.; Sahana, C.; Comparative Study of Distinctive Image Classification Techniques. In Proceedings of the 2016 10th International Conference on Intelligent Systems and Control (ISCO); IEEE, 2016, 1–8.
13. Jain, M.; Tomar, P. S.; Review of Image Classification Methods and Techniques. Int. J. Eng. Res. Technol. 2013, 2, 852–858.
14. Abduljabbar, H. M.; Hatem, A. J.; Al-Jasim, A. A.; Desertification Monitoring in the South-West of Iraq Using Fuzzy Inference System. NeuroQuantology 2020, 18, 1.
15. Abburu, S.; Golla, S. B.; Satellite Image Classification Methods and Techniques: A Review. Int. J. Comput. Appl. 2015, 119.
16. Mohammed, M. A.; Hatem, A. J.; Change Detection of the Land Cover for Three Decades Using Remote Sensing Data and Geographic Information System. In Proceedings of the AIP Conference Proceedings; AIP Publishing LLC, 2020, 2307, 20029.
17. Zhang, J.; Tan, T.; Brief Review of Invariant Texture Analysis Methods. Pattern Recognit. 2002, 35, 735–747.
18. Julesz, B.; Caelli, T.; On the Limits of Fourier Decompositions in Visual Texture Perception. Perception 1979, 8, 69–73.
19. Haralick, R. M.; Statistical and Structural Approaches to Texture. Proc. IEEE 1979, 67, 786–804.
20. Abaas Hussain, L. H.; Correction of Non-Uniform Illumination for Biological Images Using Morphological Operation Assessing with Statistical Features Quality. Ibn AL-Haitham J. Pure Appl. Sci. 2017, 29, 81–90.
21. Hussein, M. A.; Abbas, A. H.; Comparison of Features Extraction Algorithms Used in the Diagnosis of Plant Diseases. Ibn AL-Haitham J. Pure Appl. Sci. 2018, 523–538.
22. Materka, A.; Strzelecki, M.; Texture Analysis Methods: A Review. Tech. Univ. Lodz, Inst. Electron., COST B11 report, Brussels, 1998, 10, 4968.
23. Suresh, A.; Shunmuganathan, K. L.; Image Texture Classification Using Gray Level Co-Occurrence Matrix Based Statistical Features. Eur. J. Sci. Res. 2012, 75, 591–597.
24. Akhloufi, M. A.; Maldague, X.; Larbi, W. Ben; A New Color-Texture Approach for Industrial Products Inspection. J. Multimed. 2008, 3.
25. Unser, M.; Sum and Difference Histograms for Texture Classification. IEEE Trans. Pattern Anal. Mach. Intell. 1986, 118–125.
26. Rezaei, M.; Saberi, M.; Ershad, S. F.; Texture Classification Approach Based on Combination of Random Threshold Vector Technique and Co-Occurrence Matrixes. In Proceedings of the 2011 International Conference on Computer Science and Network Technology; IEEE, 2011, 4, 2303–2306.
27. Ali, A. H.; Abdulsalam, S. I.; Nema, I. S.; Detection and Segmentation of Ischemic Stroke Using Textural Analysis on Brain CT Images. Int. J. Sci. Eng. Res. 2015, 6, 396–400.