LONTAR KOMPUTER VOL. 12, NO. 2 AUGUST 2021
p-ISSN 2088-1541, e-ISSN 2541-5832
DOI : 10.24843/LKJITI.2021.v12.i02.p03
Accredited Sinta 2 by RISTEKDIKTI Decree No. 30/E/KPT/2018

The Classification of Acute Respiratory Infection (ARI) Bacteria Based on K-Nearest Neighbor

Zilvanhisna Emka Fitri (a,1), Lalitya Nindita Sahenda (a,2), Pramuditha Shinta Dewi Puspitasari (a,3), Prawidya Destarianto (a,4), Dyah Laksito Rukmi (b,5), Arizal Mujibtamala Nanda Imron (c,6)

(a) Department of Information Technology, Politeknik Negeri Jember
(b) Department of Animal Science, Politeknik Negeri Jember
Jl. Mastrip PO.BOX 164 Jember, 68121, Indonesia
(1) zilvanhisnaef@polije.ac.id (Corresponding author)
(2) lalitya.ns@polije.ac.id
(3) pramuditha@polije.ac.id
(4) prawidya@polije.ac.id
(5) dyah.laksito@polije.ac.id
(c) Department of Electrical Engineering, Universitas Jember
Jl. Kalimantan No. 37, Kampus Tegalboto, Jember, 68121, Indonesia
(6) arizal.tamala@unej.ac.id

Abstract

Acute Respiratory Infection (ARI) is an infectious disease. One of the performance indicators of infectious disease control and handling programs is disease discovery. However, discovery is often slowed by the limited number of medical analysts, the large number of patients, and differences in the analysts' experience in identifying bacteria, so the examination takes relatively long. To address these problems, an automatic and accurate classification system for the bacteria that cause ARI was created. The research process consists of image preprocessing (color conversion and contrast stretching), segmentation, feature extraction, and KNN classification. The features used are bacterial count, area, perimeter, and shape factor. The best ratio of training data to test data is 90%:10% of 480 data. The KNN method classifies the bacteria well: the highest accuracy is 91.67%, with a precision of 92.4% and a recall of 91.7%, obtained at three values of K, namely K = 3, K = 5, and K = 7.
Keywords: Bacteria, Acute Respiratory Infection, Image Processing, KNN

1. Introduction

Acute Respiratory Infections (ARI) are among the top ten infectious diseases whose incidence (disease prevalence) and mortality (a measure of the number of deaths in a population) are quite high worldwide [1]. ARI is divided into upper respiratory tract infections (URTIs) and lower respiratory tract infections (LRTIs). The upper respiratory tract consists of the ears, nose, and throat, while the lower respiratory tract consists of the trachea, bronchi, bronchioles, and lungs [2]. Examples of ARI diseases caused by bacteria are pneumonia, tuberculosis (TB), diphtheria, and pharyngitis [3].

Pneumonia is an infectious disease in which an infection causes the lungs to become inflamed. The causative pathogens (bacteria) are Streptococcus pneumoniae, Staphylococcus aureus, Haemophilus influenzae, Mycoplasma pneumoniae, Chlamydophila pneumoniae, and Legionella pneumophila [4]. Tuberculosis (TB) is one of the serious health problems in Indonesia. TB is an infection of the lower respiratory tract caused by Mycobacterium tuberculosis. Diphtheria is an acute infectious disease caused by Corynebacterium diphtheriae, which attacks the upper respiratory tract [2]. In East Java, the reported number of diphtheria sufferers has increased from year to year, reaching 358 cases in 2019 [5]. In addition, Neisseria gonorrhoeae is a bacterial pathogen that causes pharyngitis [4], which usually occurs in sexually transmitted diseases (STD) without symptoms (asymptomatic) [3].

The performance of infectious disease control and handling programs is measured by three indicators, namely discovery, treatment, and success of treatment [5].
Generally, discovery is carried out by taking specimens or sputum from the patient and examining them under a microscope. However, the problems that often occur are the limited number of medical analysts, the large number of patients, differences in the perceptions and experience of medical analysts in identifying bacteria in sputum/throat sputum samples, and the relatively long time required for the examination. Based on these problems, the researchers created an automatic and accurate bacterial classification system for the early detection of acute respiratory infections (ARI).

Several previous studies on identifying the bacteria that cause pneumonia and tuberculosis serve as references. In 2016, a Streptococcus pneumoniae detection system was built from digital microscope images with an accuracy of 80% [6]. Bacterial segmentation was later developed using the Channel Area Thresholding (CAT) segmentation method, which allowed the system to identify bacilli with an accuracy of 97.58% on a sputum image dataset [8]. Mycobacterium tuberculosis was also identified in 2015 using image segmentation and the K-Means clustering method [7]. A further study compared two classification methods, backpropagation and K-Nearest Neighbor (KNN), and obtained an accuracy of 93.22% for backpropagation and 94.92% for KNN [9].

Based on these references, this research uses the K-Nearest Neighbor (KNN) method. KNN is a general and straightforward classification method. Because this research is an early stage of work on ARI bacterial classification, it focuses on selecting the right features to classify ARI bacteria. The difference from previous research is the type of bacteria studied.
In this research, the researchers added Staphylococcus aureus and Streptococcus pneumoniae as bacteria for pneumonia, Corynebacterium diphtheriae as the bacterium for diphtheria, and Neisseria gonorrhoeae as the pathogen for pharyngitis.

2. Research Methods

This study uses the researchers' own data, namely a bacterial image dataset from throat sputum. The research consists of several stages, namely bacterial images, image preprocessing, image segmentation, feature extraction, and bacterial classification using the KNN method, as shown in Figure 1.

Figure 1. Block Diagram of The Bacterial Classification System Proposed

2.1. Bacteria Images

Generally, bacteria are 0.4 to 2 μm in size and take three general forms, namely cocci, bacilli, and spirochetes [4]. These forms have more specific variants: Staphylococcus aureus belongs to the cocci in clusters group, Streptococcus pneumoniae to the cocci in chains group, Corynebacterium diphtheriae to the club-shaped and pleomorphic rods group, and Neisseria gonorrhoeae to the diplococci group [10]. Meanwhile, Mycobacterium tuberculosis belongs to the aerobic acid-fast rods group [11]. These morphologies, taken from the literature, are shown in Figure 2, and the research image data are shown in Table 1.

1. Gram-positive cocci in grapelike clusters (Staphylococci)
2. Gram-positive cocci in chains (Streptococci)
3. Gram-positive cocci with capsules (Pneumococci)
4. Gram-positive, club-shaped, pleomorphic rods (Corynebacteria)
5. Gram-negative rods with pointed ends (Fusobacteria)
6. Gram-negative curved rods (here comma-shaped vibrios)
7. Gram-negative diplococci, adjacent sides flattened (Neisseria)
8. Gram-negative straight rods with rounded ends (coli bacteria)

Figure 2. Bacterial Morphology [10].

Table 1. Variation of Acute Respiratory Infection Bacterial Image

| Bacterial Name | Disease | Bacterial Images |
|---|---|---|
| Staphylococcus aureus | Pneumonia | (image) |
| Streptococcus pneumoniae | Pneumonia | (image) |
| Corynebacterium diphtheriae | Diphtheria | (image) |
| Neisseria gonorrhoeae | Pharyngitis | (image) |
| Mycobacterium tuberculosis | TB | (image) |

Table 1 shows that the research data consist of 5 classes, namely Staphylococcus aureus and Streptococcus pneumoniae as pneumonia bacteria, Corynebacterium diphtheriae as the diphtheria bacterium, Neisseria gonorrhoeae as the asymptomatic pharyngitis bacterium, and Mycobacterium tuberculosis as the tuberculosis (TB) bacterium.

2.2. Preprocessing Images

Data normalization is carried out at this stage, making the image size and the color space uniform before the image segmentation process. The original bacterial images were 1920x1080 pixels, which is very large, so they were cropped to 151x151 pixels, as shown in Figure 3. The cropped region is part of the data normalization and represents the shape of the ARI bacteria. In addition, cropping reduces the computational load [12].

Figure 3. Image Size (a) 1920x1080 pixels to (b) 151x151 pixels

The cropped image is an RGB color space image, which consists of 3 color components, namely the red, green, and blue components. An RGB image is large and not easy to segment, so it needs to be converted to another color space [13], for example, the HSV color space. The HSV color space also consists of 3 components, namely the Hue, Saturation, and Value components.
The conversion from the RGB color space to the HSV color space uses the following equations [14]:

$$\text{hue} = \tan\left(\frac{3 \times (G - B)}{(R - G) + (R - B)}\right) \tag{1}$$

$$\text{saturation} = 1 - \frac{\min(R, G, B)}{V} \tag{2}$$

$$\text{Value} = \frac{R + G + B}{3} \tag{3}$$

The next step is contrast stretching. Its function is to even out the distribution of light and dark intensity over the entire intensity scale so that the image has a high contrast value.

2.3. Segmentation

This stage separates the research object from the background. It uses a thresholding process, for which a threshold value must be found, with the following equation [15]:

$$\text{segmentation}(x, y) = \begin{cases} 1, & \text{if } \text{grayscale}(x, y) \le T \\ 0, & \text{if } \text{grayscale}(x, y) > T \end{cases} \tag{4}$$

To find the threshold value (T), the histogram of the grayscale image is inspected to determine the gray-level values of the research object and of the background. In addition to thresholding, the segmentation process also uses the chain-code technique. This method labels each binary object and then evaluates the proximity of pixel values in the direction of the 4 or 8 surrounding neighbors, as shown in Figure 4.

Figure 4. (a) 4-connected and (b) 8-connected

2.4. Feature Extraction

This stage looks for characteristic values that can distinguish one class from the other classes. The features extracted in this research are morphological or shape features, namely bacterial count, area, perimeter, and shape factor.
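The thresholding rule of equation (4) and the 4- or 8-neighbor labeling of Figure 4 can be sketched as follows. This is a minimal illustration, not the authors' implementation: it labels objects with a breadth-first flood fill instead of the chain-code tracing described in the text, and the function names `threshold` and `label_objects` and the small test image are hypothetical. Counting the labels gives a bacterial count, and the pixel total per label gives an area.

```python
from collections import deque


def threshold(gray, t):
    """Equation (4): 1 where the gray level is <= t, otherwise 0."""
    return [[1 if v <= t else 0 for v in row] for row in gray]


def label_objects(binary, connectivity=8):
    """Label connected foreground pixels and return (labels, areas).

    Assumption: a flood fill stands in for the paper's chain-code labeling.
    """
    if connectivity == 8:
        steps = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                 (0, 1), (1, -1), (1, 0), (1, 1)]
    else:
        steps = [(-1, 0), (0, -1), (0, 1), (1, 0)]
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    areas = {}
    next_label = 0
    for y in range(h):
        for x in range(w):
            if binary[y][x] != 1 or labels[y][x] != 0:
                continue
            # Start a new object and grow it over its neighbors.
            next_label += 1
            labels[y][x] = next_label
            queue = deque([(y, x)])
            area = 0
            while queue:
                cy, cx = queue.popleft()
                area += 1
                for dy, dx in steps:
                    ny, nx = cy + dy, cx + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and binary[ny][nx] == 1 and labels[ny][nx] == 0):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
            areas[next_label] = area
    return labels, areas


# Hypothetical 4x5 channel image with values in 0..1 (dark pixels are objects).
gray = [
    [0.9, 0.2, 0.9, 0.9, 0.9],
    [0.9, 0.9, 0.3, 0.9, 0.1],
    [0.9, 0.9, 0.9, 0.9, 0.1],
    [0.9, 0.9, 0.9, 0.9, 0.9],
]
binary = threshold(gray, 0.5)
_, areas8 = label_objects(binary, connectivity=8)
_, areas4 = label_objects(binary, connectivity=4)
print(len(areas8), areas8)  # object count and areas with 8 neighbors
print(len(areas4), areas4)  # object count and areas with 4 neighbors
```

In this toy image the two diagonally touching pixels form one object under 8-connectivity but two objects under 4-connectivity, which is why the choice of neighborhood changes the bacterial count.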
The area and perimeter are determined using a chain code, where the area (A) represents the area of the bacteria, the perimeter or circumference (P) represents the edge length, and the shape factor (S) represents the shape of the bacteria. The three parameters are expressed by the following equations [16]:

$$A = (\text{number of pixels in row } 1) + (\text{number of pixels in row } 2) + \cdots + (\text{number of pixels in row } 8) \tag{5}$$

$$P = \sum \text{even code} + \sqrt{2} \times \sum \text{odd code} \tag{6}$$

$$S = \frac{P^2}{A} \tag{7}$$

2.5. K-Nearest Neighbor Classification

K-Nearest Neighbor (KNN) is a supervised learning classification method. In supervised learning, the classification target is known. The KNN method classifies data by the closest distance to the object and is often known as a lazy learning method. The basic principle of KNN is to choose the value of K, the number of closest data that determine the classification result, and to calculate the closest distance using the Euclidean distance (ED) with the following equation [16]–[18]:

$$ED(x_i, x_j) = \sqrt{\sum_{r=1}^{n} (x_{ir} - x_{ij})^2} \tag{8}$$

where $x_{ir}$ is the testing data and $x_{ij}$ is the training data.

The total number of data is 480 images, consisting of 94 images of Corynebacterium diphtheriae, 91 images of Mycobacterium tuberculosis, 95 images of Neisseria gonorrhoeae, 92 images of Staphylococcus aureus, and 108 images of Streptococcus pneumoniae. The classification experiments look for the ratio of training data to testing data that gives the KNN method its highest accuracy. The ratios tested are 50%:50%, 60%:40%, 70%:30%, 80%:20%, and 90%:10%.

3. Result and Discussion

The bacterial images, originally in the RGB color space, were converted into the HSV color space using equations (1), (2), and (3). Figure 5 shows the resulting channels, from which the channel that best represents the shape of each bacterium was selected. The figure shows that the Hue component image best represents the shape of Streptococcus pneumoniae, Corynebacterium diphtheriae, and Mycobacterium tuberculosis. Meanwhile, Staphylococcus aureus and Neisseria gonorrhoeae are represented well in the Saturation component image. To clarify the shape of the bacteria, the next process is contrast stretching, which gives the image a high contrast value and therefore also affects the image histogram. In addition, the image itself changes before and after the contrast stretching process, as shown in Figure 6.

Figure 5. Image of (a) RGB, (b) HSV, (c) Hue, (d) Saturation and (e) Value on various types of bacteria (rows: Staphylococcus aureus, Streptococcus pneumoniae, Corynebacterium diphtheriae, Neisseria gonorrhoeae, Mycobacterium tuberculosis)

Figure 6 shows a difference between the Hue image histogram before and after the contrast stretching process. The range of gray values of the HSV image is 0 - 1. This differs from the range of gray values of a grayscale image, which is 0 - 255.
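Because the channel values lie between 0 and 1, contrast stretching can be illustrated on that scale directly. The text does not state which stretching function was used, so the sketch below assumes a plain linear min-max stretch; the function name `contrast_stretch` and the 2x2 Hue patch are hypothetical.

```python
def contrast_stretch(channel, out_min=0.0, out_max=1.0):
    """Linearly map the channel's own [min, max] onto [out_min, out_max].

    Assumption: a min-max stretch; the paper does not specify the function.
    """
    flat = [v for row in channel for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A constant image has no contrast to stretch.
        return [[out_min for _ in row] for row in channel]
    scale = (out_max - out_min) / (hi - lo)
    return [[out_min + (v - lo) * scale for v in row] for row in channel]


# Hypothetical Hue patch whose values occupy only a narrow band of 0..1.
hue = [[0.58, 0.60],
       [0.70, 0.78]]
stretched = contrast_stretch(hue)
for row in stretched:
    print([round(v, 3) for v in row])
```

After stretching, the darkest pixel sits at the bottom of the scale and the brightest at the top, so the gap between object and background gray levels widens before thresholding.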
In the image before contrast stretching, the histogram has two peaks, at 0.58 and 0.78. After contrast stretching, the histogram again has two peaks, but the light and dark intensities are distributed throughout the intensity scale, so the histogram looks wider than before. In addition to the changes in the histogram, Figure 6 also shows the change in the Hue image before and after contrast stretching. Contrast stretching helps segmentation: in image (a) the gray levels of the object and the background are similar, while in image (b) there is a significant difference between the object and the background, which eases segmentation using a threshold.

Figure 6. (a) Hue image and histogram before contrast stretching and (b) Hue image and histogram after contrast stretching on a Mycobacterium tuberculosis bacteria image

After contrast stretching, segmentation is carried out based on the threshold value with equation (4). Because this study used Hue images and Saturation images, the threshold value ranges from 0.4 to 0.7, depending on whether the contrast-stretched image is dark or light. The result of thresholding is a binary image, an image with two values, namely 0 (black) and 1 (white), as shown in Figure 7.

Figure 7. Thresholding image of (a) Staphylococcus aureus, (b) Streptococcus pneumoniae, (c) Corynebacterium diphtheriae, (d) Neisseria gonorrhoeae and (e) Mycobacterium tuberculosis

Figure 7 shows that the threshold image can represent most forms of bacteria, but some bacteria, such as Neisseria gonorrhoeae and Mycobacterium tuberculosis, need to be re-segmented.
This is because noise remains in the image segmented by threshold value. Noise here means objects that are not part of the bacterial body, such as stain residue and other objects like Polymorphonuclear (PMN) cells. PMN cells are white blood cells that appear when there is an infection in the body. In the image of Neisseria gonorrhoeae, the PMN cells are also segmented, so segmentation based on area is needed. To perform it, the objects are labeled and their area values are found using a chain code with 8-neighbor connectivity. This process is known as the Channel Area Thresholding (CAT) segmentation technique [19].

In this research, all bacterial images were segmented based on the Channel Area Threshold (CAT) value, but the range of threshold values differs depending on the area of each bacterium. The threshold value for the CAT segmentation technique is denoted [S.Area], and two threshold values bound it. For Neisseria gonorrhoeae, [S.Area] ≥ 5 and [S.Area] ≤ 100 were used to remove other objects such as PMN. For Streptococcus pneumoniae, Corynebacterium diphtheriae, and Mycobacterium tuberculosis, the two threshold values were [S.Area] ≥ 50 and [S.Area] ≤ 7000. For Staphylococcus aureus, the upper threshold value differs, [S.Area] ≤ 10000. The results of the CAT segmentation are shown in Figure 8.

Figure 8. CAT segmentation images (a) Staphylococcus aureus, (b) Streptococcus pneumoniae, (c) Corynebacterium diphtheriae, (d) Neisseria gonorrhoeae and (e) Mycobacterium tuberculosis

Figure 8 shows segmentation results that contain only bacterial objects, without noise such as staining or other cells (PMN) in the bacterial image (d) Neisseria gonorrhoeae.
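The area-based filtering step can be sketched as below, using the [S.Area] bounds quoted in the text. This is an illustration under stated assumptions: the lower bound of 50 for Staphylococcus aureus is inferred from the statement that only its upper bound differs, and the names `AREA_BOUNDS` and `filter_by_area` and the sample areas are hypothetical.

```python
# [S.Area] bounds quoted in the text, as (lower, upper) in pixels.
# Assumption: the lower bound for Staphylococcus aureus is taken to be 50,
# since the text only states that its upper bound differs.
AREA_BOUNDS = {
    "Neisseria gonorrhoeae": (5, 100),
    "Streptococcus pneumoniae": (50, 7000),
    "Corynebacterium diphtheriae": (50, 7000),
    "Mycobacterium tuberculosis": (50, 7000),
    "Staphylococcus aureus": (50, 10000),
}


def filter_by_area(areas, bacterium):
    """Keep only labeled objects whose area lies inside the class bounds."""
    lower, upper = AREA_BOUNDS[bacterium]
    return {label: area for label, area in areas.items()
            if lower <= area <= upper}


# Hypothetical label -> area map from a thresholded Neisseria gonorrhoeae image:
# label 1 is a speck of stain, label 3 is a large PMN cell.
areas = {1: 3, 2: 42, 3: 850, 4: 77}
kept = filter_by_area(areas, "Neisseria gonorrhoeae")
print(kept)
```

Objects smaller than the lower bound are treated as stain residue and objects larger than the upper bound as other cells, so only the labels within the band remain for feature extraction.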
The next process is feature extraction based on the shape (morphology) of the bacteria. Morphological features are used to classify the bacteria that cause ARI because the bacteria differ in shape, as Figure 2 shows, so the features used are bacterial count, area, perimeter, and shape factor. The results of bacterial feature extraction are shown in Table 2.

Table 2. Feature extraction on each type of bacteria

| Feature | Statistic | Staphylococcus aureus | Streptococcus pneumoniae | Corynebacterium diphtheriae | Neisseria gonorrhoeae | Mycobacterium tuberculosis |
|---|---|---|---|---|---|---|
| Bacterial count | Minimum | 1 | 1 | 1 | 3 | 1 |
| Bacterial count | Maximum | 8 | 12 | 5 | 29 | 1 |
| Bacterial count | Average | 3 | 6 | 2 | 12 | 1 |
| Area | Minimum | 1555 | 832 | 208 | 414 | 613 |
| Area | Maximum | 9984 | 4679 | 1580 | 3949 | 2465 |
| Area | Average | 4639 | 2438 | 728 | 1502 | 1332 |
| Perimeter | Minimum | 586 | 384 | 101 | 263 | 257 |
| Perimeter | Maximum | 4009 | 1993 | 696 | 2495 | 877 |
| Perimeter | Average | 1774 | 1088 | 292 | 1021 | 487 |
| Shape factor | Minimum | 220.833 | 154.793 | 25.584 | 167.075 | 81.061 |
| Shape factor | Maximum | 1697.013 | 958.275 | 311.889 | 1750.571 | 351.729 |
| Shape factor | Average | 687.847 | 491.619 | 123.410 | 700.310 | 181.752 |

Table 2 shows that Staphylococcus aureus has the largest area and perimeter, while Corynebacterium diphtheriae has the smallest area, perimeter, and shape factor. The largest shape factor belongs to Neisseria gonorrhoeae (average 700.310), slightly above Staphylococcus aureus (average 687.847). The highest number of bacteria is found for Neisseria gonorrhoeae, with as many as 29 bacteria in one image, while the lowest is found for Mycobacterium tuberculosis, with one bacterium per image.

These features are the input of the K-Nearest Neighbor (KNN) classification method. The basic principle of KNN is to choose the value of K, the number of closest data that determine the classification result, and to calculate the closest distance using the Euclidean distance in equation (8).
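The KNN principle just described can be sketched in a few lines. This is a minimal illustration rather than the authors' code: the distance sums squared differences over the four features in the usual way, the class is chosen by majority vote among the K nearest training samples, and the training vectors, the query, and the names `euclidean` and `knn_predict` are hypothetical (the vectors are merely of the same order of magnitude as the averages in Table 2).

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def knn_predict(train, query, k):
    """Return the majority class among the k training samples nearest to query.

    Ties are resolved in favor of the class met first among the nearest samples.
    """
    nearest = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical samples: (bacterial count, area, perimeter, shape factor).
train = [
    ((3, 4600, 1770, 690.0), "Staphylococcus aureus"),
    ((4, 5000, 1900, 720.0), "Staphylococcus aureus"),
    ((6, 2400, 1090, 490.0), "Streptococcus pneumoniae"),
    ((5, 2600, 1150, 510.0), "Streptococcus pneumoniae"),
    ((2, 730, 290, 120.0), "Corynebacterium diphtheriae"),
    ((2, 800, 310, 125.0), "Corynebacterium diphtheriae"),
]
query = (5, 2500, 1100, 500.0)
print(knn_predict(train, query, k=3))
```

Note that the features are used unscaled here, so the area, which has the largest numeric range, dominates the distance. The text does not say whether the features were normalized before classification, so this is an open assumption of the sketch.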
The learning process of the KNN method is supervised learning, where the target is known beforehand. When given test data (with an unknown class label), the KNN algorithm looks for the training data closest to the test data. The test data are then classified according to the class of the training data with the closest Euclidean distance. This study uses 480 data, divided into training data and testing data in the ratios 50%:50%, 60%:40%, 70%:30%, 80%:20%, and 90%:10%, with variations in the value of K. The resulting accuracy, precision, and recall are shown in Table 3.

Table 3. The results of accuracy, precision, and recall on variations in the value of K

| Training Data : Testing Data | K value | Accuracy (%) | Precision (%) | Recall (%) |
|---|---|---|---|---|
| 50% : 50% | 1 | 87.5 | 87.9 | 87.5 |
| 50% : 50% | 3 | 85 | 85.6 | 85 |
| 50% : 50% | 5 | 87.5 | 87.8 | 87.5 |
| 50% : 50% | 7 | 86.67 | 87 | 86.7 |
| 50% : 50% | 9 | 86.25 | 86.5 | 86.3 |
| 50% : 50% | 11 | 85.42 | 85.7 | 85.4 |
| 60% : 40% | 1 | 86.46 | 87.1 | 86.5 |
| 60% : 40% | 3 | 88.02 | 88.6 | 88 |
| 60% : 40% | 5 | 88.54 | 88.8 | 88.5 |
| 60% : 40% | 7 | 85.94 | 86.3 | 85.9 |
| 60% : 40% | 9 | 85.94 | 86.1 | 85.9 |
| 60% : 40% | 11 | 85.94 | 86.3 | 85.9 |
| 70% : 30% | 1 | 86.81 | 87.5 | 86.8 |
| 70% : 30% | 3 | 88.19 | 88.7 | 88.2 |
| 70% : 30% | 5 | 90.28 | 90.5 | 90.3 |
| 70% : 30% | 7 | 88.19 | 88.6 | 88.2 |
| 70% : 30% | 9 | 86.81 | 86.9 | 86.7 |
| 70% : 30% | 11 | 86.81 | 86.9 | 86.8 |
| 80% : 20% | 1 | 84.38 | 85.7 | 84.4 |
| 80% : 20% | 3 | 85.42 | 86.2 | 85.4 |
| 80% : 20% | 5 | 88.54 | 89.4 | 88.5 |
| 80% : 20% | 7 | 88.54 | 89 | 88.5 |
| 80% : 20% | 9 | 90.63 | 91.6 | 90.6 |
| 80% : 20% | 11 | 89.58 | 90.3 | 89.6 |
| 90% : 10% | 1 | 87.5 | 89.8 | 87.5 |
| 90% : 10% | 3 | 91.67 | 92.4 | 91.7 |
| 90% : 10% | 5 | 91.67 | 92.4 | 91.7 |
| 90% : 10% | 7 | 91.67 | 92.4 | 91.7 |
| 90% : 10% | 9 | 89.58 | 90.1 | 89.6 |
| 90% : 10% | 11 | 89.58 | 90.1 | 89.6 |

Table 3 compares the ratios of training data to test data across the values of K to find the best accuracy, precision, and recall. For the 50%:50% ratio, the best accuracy is 87.5% at K = 1 (the same accuracy is reached at K = 5). For the 60%:40% ratio, the best accuracy is 88.54% at K = 5. For the 70%:30% ratio, the best accuracy is 90.28% at K = 5. For the 80%:20% ratio, the best accuracy is 90.63% at K = 9.
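The text does not define how precision and recall are averaged over the five classes. One reading that is consistent with the 90%:10%, K = 7 row of Table 3 is support-weighted averaging, checked here against the confusion matrix that the paper reports for that setting in Table 4 (rows are target classes, columns are output classes). The function name `weighted_metrics` is hypothetical, and the weighting scheme is an assumption that happens to reproduce the reported figures, not a statement of how the authors computed them.

```python
def weighted_metrics(matrix):
    """Accuracy plus support-weighted precision and recall.

    matrix[i][j] = number of samples of target class i assigned to class j.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    accuracy = sum(matrix[i][i] for i in range(n)) / total
    precision = 0.0
    recall = 0.0
    for i in range(n):
        support = sum(matrix[i])                         # samples of class i
        predicted = sum(matrix[r][i] for r in range(n))  # samples assigned to i
        true_positive = matrix[i][i]
        weight = support / total
        if predicted:
            precision += weight * true_positive / predicted
        if support:
            recall += weight * true_positive / support
    return accuracy, precision, recall


# Confusion matrix from Table 4 (90%:10% ratio, K = 7); class order a..e.
TABLE_4 = [
    [10, 0, 0, 0, 0],   # a = Corynebacterium diphtheriae
    [1, 8, 0, 0, 0],    # b = Mycobacterium tuberculosis
    [0, 0, 4, 0, 1],    # c = Neisseria gonorrhoeae
    [0, 1, 0, 12, 0],   # d = Staphylococcus aureus
    [1, 0, 0, 0, 10],   # e = Streptococcus pneumoniae
]
acc, prec, rec = weighted_metrics(TABLE_4)
print(round(acc * 100, 2), round(prec * 100, 1), round(rec * 100, 1))
# prints 91.67 92.4 91.7
```

With 44 of the 48 test images on the diagonal, the accuracy is 44/48, and the weighted recall necessarily equals the accuracy, which matches the near-identical accuracy and recall columns throughout Table 3.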
The 90%:10% ratio gives a different result: the best accuracy is 91.67%, with a precision of 92.4% and a recall of 91.7%, obtained at three values of K, namely K = 3, K = 5, and K = 7. To examine the results of the KNN classification, a confusion matrix was made, as shown in Table 4.

Table 4. Confusion Matrix with a data ratio of 90%: 10% at the value of K = 7

| Target \ Output | a | b | c | d | e |
|---|---|---|---|---|---|
| a = Corynebacterium diphtheriae | 10 | 0 | 0 | 0 | 0 |
| b = Mycobacterium tuberculosis | 1 | 8 | 0 | 0 | 0 |
| c = Neisseria gonorrhoeae | 0 | 0 | 4 | 0 | 1 |
| d = Staphylococcus aureus | 0 | 1 | 0 | 12 | 0 |
| e = Streptococcus pneumoniae | 1 | 0 | 0 | 0 | 10 |

Table 4 shows that 10 data were correctly classified as Corynebacterium diphtheriae. For Mycobacterium tuberculosis, 8 data were correctly classified and 1 was misclassified as Corynebacterium diphtheriae. For Neisseria gonorrhoeae, 4 data were correctly classified and 1 was misclassified as Streptococcus pneumoniae. For Staphylococcus aureus, 12 data were correctly classified and 1 was misclassified as Mycobacterium tuberculosis. For Streptococcus pneumoniae, 10 data were correctly classified and 1 was misclassified as Corynebacterium diphtheriae. These results can occur because the values of the KNN input parameters (number of bacteria, area, perimeter, and shape factor) are close to each other across bacteria, as shown in Table 2. An example for the perimeter feature is shown in Table 5 below.

Table 5. Perimeter feature values for each type of bacteria

| Feature | Statistic | Staphylococcus aureus | Streptococcus pneumoniae | Corynebacterium diphtheriae | Neisseria gonorrhoeae | Mycobacterium tuberculosis |
|---|---|---|---|---|---|---|
| Perimeter | Minimum | 586 | 384 | 101 | 263 | 257 |
| Perimeter | Maximum | 4009 | 1993 | 696 | 2495 | 877 |
| Perimeter | Average | 1774 | 1088 | 292 | 1021 | 487 |

Table 5 shows that the average perimeter values of Staphylococcus aureus, Streptococcus pneumoniae, and Neisseria gonorrhoeae are close to each other, namely 1774, 1088, and 1021. This proximity affects the KNN classification results and causes the misclassifications between bacteria seen in the confusion matrix in Table 4. Compared with previous research, in which the accuracy of KNN was 94.92% [9], the accuracy of KNN in this research is lower, at 91.67%. This difference occurs because the previous research classified only one bacterium, namely Mycobacterium tuberculosis, whereas this research added four other bacteria, namely Staphylococcus aureus, Streptococcus pneumoniae, Corynebacterium diphtheriae, and Neisseria gonorrhoeae.

4. Conclusion

This research is a computer vision study that aims to classify acute respiratory infection (ARI) bacteria using the K-Nearest Neighbor (KNN) method. The parameters used are shape parameters, namely bacterial count, area, perimeter, and shape factor. The data used are 480 data, and the best ratio of training data to test data is 90%:10%. The KNN classification method can classify these bacteria with a highest accuracy of 91.67%, a precision of 92.4%, and a recall of 91.7%, obtained at 3 values of K, namely K = 3, K = 5, and K = 7. Future work needs to add other features and compare KNN with other classification methods to find the best method for classifying the bacteria that cause acute respiratory infections (ARI).

References

[1] E. Setyowati and S. Mariani, "Penerapan Jaringan Syaraf Tiruan dengan Metode Learning Vector Quantization (LVQ) untuk Klasifikasi Penyakit Infeksi Saluran Pernapasan Akut (ISPA)," vol. 4, p. 10, 2021.
[2] S. J. Pitt, Clinical Microbiology for Diagnostic Laboratory Scientists. Chichester, UK: John Wiley & Sons, Ltd, 2017. doi: 10.1002/9781118745847.
[3] K. Struthers, Clinical Microbiology, 2nd ed. New York: CRC Press, 2017.
[4] C. R. Mahon and D. C. Lehman, Textbook of Diagnostic Microbiology, 6th ed. St. Louis, Missouri: Elsevier, 2019. doi: 10.1309/u0mb-0p7r-rrwf-4bth.
[5] Dinas Kesehatan Provinsi Jawa Timur, Profil Kesehatan Provinsi Jawa Timur 2019. Surabaya: Dinas Kesehatan Provinsi Jawa Timur, 2020.
[6] R. Yuliwardana, "Deteksi Bakteri Streptococcus pneumoniae Berbasis Jaringan Syaraf Tiruan dari Citra Mikroskop Digital," Universitas Airlangga, Surabaya, 2016.
[7] R. Rulaningtyas, A. B. Suksmono, T. Mengko, and P. Saptawati, "Multi patch approach in K-means clustering method for color image segmentation in pulmonary tuberculosis identification," in 2015 4th International Conference on Instrumentation, Communications, Information Technology, and Biomedical Engineering (ICICI-BME), Bandung, Indonesia, Nov. 2015, pp. 75–78. doi: 10.1109/ICICI-BME.2015.7401338.
[8] K. S. Mithra and W. R. S. Emmanuel, "Segmentation of Mycobacterium Tuberculosis Bacterium From ZN Stained Microscopic Sputum Images," in 2018 International Conference on Smart Systems and Inventive Technology (ICSSIT), Tirunelveli, India, Dec. 2018, pp. 150–154. doi: 10.1109/ICSSIT.2018.8748294.
[9] L. N. Sahenda, M. H. Purnomo, I. K. E. Purnama, and I. D. G. H. Wisana, "Comparison of Tuberculosis Bacteria Classification from Digital Image of Sputum Smears," in 2018 International Conference on Computer Engineering, Network and Intelligent Multimedia (CENIM), Surabaya, Indonesia, Nov. 2018, pp. 20–24. doi: 10.1109/CENIM.2018.8711386.
[10] F. H. Kayser, Ed., Medical Microbiology. Stuttgart; New York, NY: Georg Thieme Verlag, 2005.
[11] P. R. Murray, Basic Medical Microbiology. Philadelphia: Elsevier, 2018.
[12] Z. E. Fitri, A. Baskara, M. Silvia, A. Madjid, and A. M. N. Imron, "Application of backpropagation method for quality sorting classification system on white dragon fruit (Hylocereus undatus)," IOP Conference Series: Earth and Environmental Science, vol. 672, no. 1, p. 012085, Mar. 2021, doi: 10.1088/1755-1315/672/1/012085.
[13] A. M. Nanda Imron and Z. E. Fitri, "A Classification of Platelets in Peripheral Blood Smear Image as an Early Detection of Myeloproliferative Syndrome Using Gray Level Co-Occurrence Matrix," Journal of Physics: Conference Series, vol. 1201, p. 012049, May 2019, doi: 10.1088/1742-6596/1201/1/012049.
[14] Z. E. Fitri, U. Nuhanatika, A. Madjid, and A. M. N. Imron, "Penentuan Tingkat Kematangan Cabe Rawit (Capsicum frutescens L.) Berdasarkan Gray Level Co-Occurrence Matrix," Jurnal Teknologi Informasi Dan Terapan (JTIT), vol. 7, no. 1, pp. 1–5, Jun. 2020, doi: 10.25047/jtit.v7i1.121.
[15] Z. E. Fitri, R. Rizkiyah, A. Madjid, and A. M. N. Imron, "Penerapan Neural Network untuk Klasifikasi Kerusakan Mutu Tomat," Jurnal Rekayasa Elektrika, vol. 16, no. 1, May 2020, doi: 10.17529/jre.v16i1.15535.
[16] Z. E. Fitri, L. N. Y. Syahputri, and M. N. Imron, "Classification of White Blood Cell Abnormalities for Early Detection of Myeloproliferative Neoplasms Syndrome Based on K-Nearest Neighbor," Scientific Journal of Informatics, vol. 7, no. 1, p. 7, 2020.
[17] I. M. A. S. Widiatmika, I. N. Piarsa, and A. F. Syafiandini, "Recognition of The Baby Footprint Characteristics Using Wavelet Method and K-Nearest Neighbor (K-NN)," Lontar Komputer Jurnal Ilmiah Teknologi Informasi, vol. 12, no. 1, p. 41, Mar. 2021, doi: 10.24843/LKJITI.2021.v12.i01.p05.
[18] R. J. Al Kautsar, F. Utaminingrum, and A. S. Budi, "Helmet Monitoring System using Hough Circle and HOG based on KNN," Lontar Komputer Jurnal Ilmiah Teknologi Informasi, vol. 12, no. 1, p. 13, Mar. 2021, doi: 10.24843/LKJITI.2021.v12.i01.p02.