Science and Technology Indonesia
e-ISSN: 2580-4391, p-ISSN: 2580-4405
Vol. 5, No. 4, October 2020

Research Paper

Performance of Cans Classification System for Different Conveyor Belt Speed using Naïve Bayes

Yulia Resti1*, Firmansyah Burlian2, Irsyadi Yani2

1Department of Mathematics, Faculty of Mathematics and Natural Science, Universitas Sriwijaya, Sumatera Selatan, Indonesia
2Department of Mechanical Engineering, Faculty of Engineering, Universitas Sriwijaya, Sumatera Selatan, Indonesia
*Corresponding author: yulia_resti@mipa.unsri.ac.id

Abstract

The classification system used for sorting in the can recycling industry can be built on digital images by using the basic color pixel values of the images, R, G, and B, as input variables. In real time, cans are classified while they move on a conveyor belt at a certain speed. This paper discusses the performance of can classification systems using the Naïve Bayes method. This method can handle all types of variables, including the case where all variables are continuous. Two types of conveyor belts are designed to obtain different speeds, and images of all cans are captured on both conveyor belts. Two Naïve Bayes models are built on different distribution assumptions: the original model (all input variables Gaussian distributed) and the model based on the best distribution. The performance of the classification system is evaluated by dividing the data into learning data and testing data with a composition of 50:50, where each is arranged into 50 groups with different percentages of each can type using sampling without replacement. The results show, first, that the speed of the conveyor belt when capturing an image affects the pixel values of red, green, and blue and ultimately affects the classification of the cans. Second, not all input variables are Gaussian distributed.
The classification system built on the assumption of the best distribution model for each input variable has a better average accuracy than the model that assumes all input variables are Gaussian distributed, and classification at the first conveyor belt speed, obtained with a gear ratio of 12:30 and a diameter of 35 mm, is more accurate than at the other speed, for both the original model and the model based on the best distribution. However, more statistical distribution models need to be tested to obtain significant results.

Keywords
Classification System, Conveyor Belt Speed, Naïve Bayes

Received: 28 August 2020, Accepted: 4 October 2020
https://doi.org/10.26554/sti.2020.5.4.111-116

1. INTRODUCTION

Automation technology for industrial systems that use intelligent computing has developed rapidly in recent years (Kamboj et al., 2019; Nikhil et al., 2017; Oladapo et al., 2016; Bargal et al., 2016; Fluke, 2015; Rosenblat et al., 2014), including the automation of sorting systems in the can recycling industry that use object classification techniques based on digital images (Resti et al., 2018; Resti et al., 2017b). Classification of cans based on digital images of cans placed on a static conveyor belt is described in Resti et al. (2019), Resti et al. (2017a), Resti (2015), Yani et al., and Yani et al. (2009). In real time, cans in a sorting system are classified while they move on a conveyor belt at a certain speed. Obtaining a higher level of accuracy is important in a classification system (Sin & Wang, 2019; Aronoff et al., 1982). Naïve Bayes is a method widely used in classification models (Harzevili and Alizadeh, 2018; Agarwal et al., 2015); its use in digital object classification models can be seen in Mansour (2018), Pérez-Díaz et al. (2017), Nikhil et al. (2017), Salinas-Gutiérrez et al. (2010), and Jayech and Mahjoub (2010).
This method can handle various types of input variables. When the input variables are continuous, the method is generally built by assuming that all input variables are Gaussian distributed or follow a combination of distributions. The concept of conditional probability in Bayes' theorem, together with the naive assumption of this method, simplifies the calculation of posterior probabilities (Han et al., 2011; Mitchell, 1997). For large datasets, this method often has a higher level of accuracy than other methods (Adetunji et al., 2018; Kini et al., 2015; Loan, 2006), while for small datasets it also performs as a powerful classifier (Mansour, 2018).

This article discusses the performance of a can classification system based on digital images of cans using the Naïve Bayes method. To reflect real-time operation, the images were captured from cans placed on conveyor belts. Two types of conveyor belts are designed to obtain different speeds using different gear sizes and gear ratios, and images of all cans are captured on both. We also propose two Naïve Bayes models: the original model and the model based on the best distribution. In the first model, all input variables are assumed to be Gaussian distributed, while in the second model, each input variable is assumed to be Gamma or Gaussian distributed according to statistical tests (Wang and Liu, 2006; De Wet, 1980; Stephens, 1974; Chakravarty et al., 1967). The performance of the classification system is evaluated by dividing the data into learning data and testing data with a composition of 50:50, where each is arranged into 50 groups with different percentages of each can type using sampling without replacement. The highest level of accuracy is sought among the combinations of these two speeds and two models.

2. EXPERIMENTAL SECTION

2.1 Methods

The stages of this research are as follows:

1. Designing two types of conveyor belts: the first uses a gear with a ratio of 12:30 and a diameter of 35 mm, and the second uses a gear with a ratio of 14:30 and a diameter of 42 mm. These designs produced speeds of 0.181 m/s (the first conveyor belt) and 0.086 m/s (the second conveyor belt), respectively.

2. Capturing images of the cans placed on the first conveyor belt. The images were captured using a web camera connected to a computer, with illumination from a light-emitting diode (LED) lamp set at an angle of 30°, as shown in Figure 1. The cans were then placed on the second conveyor belt and the image capturing process was repeated in the same way. The can image data were then processed using the RGB color model with a color depth of 8 bits, where the region of interest in each image was obtained by cropping. A summary of the pixel values of R, G, and B for the two data sets is presented in Table 1.

3. Dividing the data into learning data and testing data with a composition of 50:50, where each is arranged into 50 groups with different percentages of each can type using sampling without replacement. The percentage of cans of each type in the learning data and the testing data is presented in Table 2.

Figure 1. The cans image capturing system

Table 1. Data Summary of the Pixel Values of R, G, and B

Statistics      The 1st speed            The 2nd speed
                R       G       B        R       G       B
Minimum         141.3   143.0   137.7    135.4   134.6   131.8
1st Quartile    149.8   153.0   148.9    145.4   147.8   144.0
Median          153.3   156.2   152.2    148.1   151.1   147.5
Mean            155.4   156.4   152.7    150.4   151.5   148.3
3rd Quartile    159.6   159.6   156.1    153.3   154.5   150.7
Maximum         204.8   186.2   194.8    207.6   182.5   193.4

Table 2. Design of the learning and the testing data (percentage of cans in each type)

        Learning, can type          Testing, can type
Group   1st     2nd     3rd         1st     2nd     3rd
1       33.6    35.2    31.2        25.6    31.2    43.2
2       32.8    32.0    35.2        34.4    32.8    32.8
3       30.4    33.6    36.0        28.8    32.8    38.4
4       28.0    30.4    41.6        31.2    36.0    32.8
5       38.4    22.4    39.2        20.8    44.0    35.2
⋮       ⋮       ⋮       ⋮           ⋮       ⋮       ⋮
50      30.4    32.0    37.6        28.8    34.4    36.8

4. Applying the Naïve Bayes method (Han et al., 2011; Mitchell, 1997) to model the can classification system. Let the pixel values of the red, green, and blue colors for the k-th conveyor belt type be the input variables, denoted $R_k$, $G_k$, $B_k$, and let the three types of cans (food cans, beverage cans, and non-food and beverage cans) be the output variable, denoted $T_{jk}$. A can is classified as the j-th can type if its posterior probability, written in (1), is greatest for that type.

$$P(T_{jk} \mid R_k, G_k, B_k) = \frac{P(R_k, G_k, B_k \mid T_{jk})\, P(T_{jk})}{P(R_k, G_k, B_k)} \quad (1)$$

where $P(R_k, G_k, B_k \mid T_{jk})$ is the likelihood function of the input variables given the output variable, and $P(R_k, G_k, B_k)$ is the joint probability density function.

Modeling using the Naïve Bayes method is carried out under two assumptions. First, all input variables are assumed to be Gaussian distributed with a probability density function as in (2) for input variable $R_k$ with parameters $\mu_{rk}$ and $\sigma_{rk}$. Second, each input variable is assumed to follow the better of the Gaussian distribution and the Gamma distribution, the latter with a probability density function as in (3) for input variable $R_k$ with parameters $\alpha_{rk}$ and $\beta_{rk}$, based on five goodness-of-fit tests: Kolmogorov-Smirnov (Chakravarty et al., 1967), Cramér-von Mises (De Wet, 1980), Anderson-Darling (Stephens, 1974), and the Akaike Information Criterion and Bayesian Information Criterion (Wang and Liu, 2006). The first assumption is called the original model (OM), while the second is called the best model (BM).
$$P(R_k; \mu_{rk}, \sigma_{rk}) = \frac{1}{\sigma_{rk}\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{r_k - \mu_{rk}}{\sigma_{rk}}\right)^2\right] \quad (2)$$

$$P(R_k; \alpha_{rk}, \beta_{rk}) = \frac{1}{\beta_{rk}^{\alpha_{rk}}\,\Gamma(\alpha_{rk})}\, r_k^{\alpha_{rk}-1} \exp\left[-\left(\frac{r_k}{\beta_{rk}}\right)\right] \quad (3)$$

5. Measuring the performance of can classification for each conveyor belt data set (first speed and second speed) and both model assumptions: the original model (OM) and the model based on the best distribution (BM). OM assumes all input variables are Gaussian distributed, while BM is based on the best distribution of each input variable. The accuracy performance is calculated as the mean of the accuracy levels.

3. RESULTS AND DISCUSSION

3.1 The best distribution model of input variables

All variables from the two data sets are tested with five goodness-of-fit tests to determine how well each variable fits the Gaussian and Gamma distribution models. The results of the five goodness-of-fit tests for the 1st speed data are given in Table 3, while the parameters of the best distribution models are given in Table 4. The distribution model with the smaller goodness-of-fit value is the better model. Each input variable has 2 to 5 tests that support its best model. On the 1st can type, the best distribution models of the input variables R, G, and B are all Gamma distributions; on the 2nd can type they are all Gaussian distributions; and on the 3rd can type they are Gamma, Gaussian, and Gamma distributions, respectively. The results of the goodness-of-fit tests of all input variables for each can type at the second conveyor belt speed and the parameters of the best distribution models are given in Table 5 and Table 6, respectively.

Table 5 shows that on the 1st can type, the best distribution models of the input variables R, G, and B are all Gamma distributions; on the 2nd can type they are Gamma, Gaussian, and Gamma distributions; and on the 3rd can type they are all Gamma distributions. At the 2nd speed, each input variable has at least three tests that support its best model, and on average four tests support it.
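As a rough illustration of the comparison described in Section 3.1, the sketch below fits a Gaussian and a Gamma model to one sample by maximum likelihood and compares their AIC and BIC values, where the smaller value indicates the better model. It covers only two of the five criteria; the Kolmogorov-Smirnov, Cramér-von Mises, and Anderson-Darling statistics are omitted. The function names (`digamma`, `fit_gaussian`, `fit_gamma`, `compare_models`), the maximum-likelihood fitting, and the synthetic sample are assumptions made for illustration, since the paper does not state which estimator or software was used. The Gamma density uses the shape and scale form of Eq. (3).

```python
import math
import random


def digamma(x):
    """Digamma function via recurrence and an asymptotic series (x > 0)."""
    result = 0.0
    while x < 6.0:  # shift x upward so the series is accurate
        result -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - f * (1.0 / 12 - f * (1.0 / 120 - f / 252))


def fit_gaussian(xs):
    """Maximum-likelihood mean and standard deviation."""
    n = len(xs)
    mu = sum(xs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)
    return mu, sigma


def fit_gamma(xs):
    """Maximum-likelihood shape and scale for strictly positive data.

    Solves ln(a) - digamma(a) = ln(mean) - mean(ln x) for the shape a by
    bisection, then sets scale = mean / a.
    """
    n = len(xs)
    mean = sum(xs) / n
    s = math.log(mean) - sum(math.log(x) for x in xs) / n
    lo, hi = 1e-6, 1e9
    for _ in range(200):
        mid = math.sqrt(lo * hi)  # bisect on a log scale
        if math.log(mid) - digamma(mid) > s:
            lo = mid  # the left-hand side decreases as the shape grows
        else:
            hi = mid
    shape = math.sqrt(lo * hi)
    return shape, mean / shape


def loglik_gaussian(xs, mu, sigma):
    """Log-likelihood under the Gaussian density of Eq. (2)."""
    return sum(-math.log(sigma * math.sqrt(2 * math.pi))
               - 0.5 * ((x - mu) / sigma) ** 2 for x in xs)


def loglik_gamma(xs, shape, scale):
    """Log-likelihood under the Gamma density of Eq. (3)."""
    return sum(-shape * math.log(scale) - math.lgamma(shape)
               + (shape - 1) * math.log(x) - x / scale for x in xs)


def compare_models(xs):
    """Return {name: (params, AIC, BIC)} and the name with the smaller AIC."""
    n = len(xs)
    fits = {
        "gaussian": (fit_gaussian(xs), loglik_gaussian),
        "gamma": (fit_gamma(xs), loglik_gamma),
    }
    table = {}
    for name, (params, loglik) in fits.items():
        ll = loglik(xs, *params)
        aic = 2 * 2 - 2 * ll            # two parameters per model
        bic = 2 * math.log(n) - 2 * ll
        table[name] = (params, aic, bic)
    best = min(table, key=lambda name: table[name][1])
    return table, best


random.seed(0)
# Synthetic, right-skewed stand-in for one color channel (not the paper's data).
sample = [random.gammavariate(4.0, 2.0) for _ in range(500)]
table, best = compare_models(sample)
for name, (params, aic, bic) in table.items():
    print(f"{name:8s} params={params[0]:.2f}, {params[1]:.2f}  AIC={aic:.1f}  BIC={bic:.1f}")
print("smaller AIC:", best)
```

Because both candidate models have two parameters, AIC and BIC rank them identically here; the two criteria would only diverge if distributions with different numbers of parameters were added to the comparison.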
3.2 Performance of Classification

Table 7 shows the accuracy level for each conveyor belt type under both the original model (OM) and the best distribution model (BM). Each group has a different accuracy level for each conveyor belt type and model, so classification performance is measured as the mean of the accuracy levels of the 50 groups. The variance, bias, and confidence interval of the mean of the 50 groups are also presented in Table 7.

The mean classification accuracy of the 50 groups shows that BM is more accurate than OM, both on the 1st speed data (the 1st conveyor belt type), with a difference of 0.4%, and on the 2nd speed data (the 2nd conveyor belt type), with a difference of 0.1%. The variance, bias, and confidence interval of the mean in both data sets also show that BM performs better than OM. The small difference in these four statistics may arise because only two distribution models, Gaussian and Gamma, were fitted to each input variable. Fitting more distribution models to the input variables could yield a more appropriate distribution model and therefore a higher level of accuracy.

Comparing the 1st and 2nd speeds for both OM and BM gives a difference in accuracy of around 6-7%, a difference in bias of around 5-8%, and a difference in the confidence interval of more than 7%. These measurements show that can classification performs better at the 1st speed than at the 2nd speed under both OM and BM.

4. CONCLUSIONS

This paper evaluated the performance of a can classification system based on digital images, built using two types of conveyor belts and two types of Naïve Bayes models, to obtain the highest level of accuracy.
The classification accuracy is evaluated by dividing the data into learning data and testing data with a composition of 50:50, where each is arranged into 50 groups with different percentages of each can type using sampling without replacement. The results show that the classification system built on the assumption of the best distribution model for each input variable is more accurate than the model that assumes all input variables are Gaussian distributed, and that accuracy at the first speed is better than at the second speed, for both the original model (OM) and the model based on the best distribution (BM). Overall, the best classification performance is achieved by the Naïve Bayes method that assumes the best distribution model for each input variable, with image data obtained from the capturing system at a conveyor belt speed of 0.181 m/s. Two important notes follow from this study. First, the conveyor belt speed when capturing images affects the pixel values of red, green, and blue and ultimately affects the classification of the cans. Second, not all input variables are Gaussian distributed. Using the best statistical distribution model in the Naïve Bayes method can influence the classification results, but more statistical distribution models need to be tested to obtain significant results.

5. ACKNOWLEDGEMENT

This research was supported by DIPA, University of Sriwijaya, No. SP DIPA-042.01.2.400953/2019, for the Competitive Research, No. 0015 /UN9/SK.LP2M.PT/2019.

Table 3. Goodness-of-fit test for the 1st speed

Input     Goodness   The 1st cans type     The 2nd cans type     The 3rd cans type
variable  of fit     Gaussian   Gamma      Gaussian   Gamma      Gaussian   Gamma
R1        KS         0.12       0.12       0.07       0.06       0.06       0.06
          CVM        0.20       0.17       0.06       0.07       0.04       0.04
          AD         1.47       1.28       0.55       0.58       0.27       0.26
          AIC        599.42     595.44     404.34     405.02     619.17     618.61
          BIC        604.03     600.05     409.18     409.86     624.24     623.67
G1        KS         0.11       0.11       0.09       0.09       0.06       0.07
          CVM        0.13       0.12       0.16       0.16       0.05       0.06
          AD         1.02       0.95       1.33       1.40       0.34       0.39
          AIC        555.60     553.72     399.01     400.00     548.47     548.81
          BIC        560.21     558.33     403.84     404.84     553.54     553.87
B1        KS         0.12       0.11       0.12       0.13       0.06       0.06
          CVM        0.32       0.27       0.16       0.16       0.06       0.06
          AD         2.20       1.87       1.06       1.13       0.40       0.40
          AIC        573.56     568.51     426.78     428.04     576.68     575.79
          BIC        578.17     573.12     431.62     432.88     581.74     580.85

Table 4. Parameters of the best distribution model for the 1st speed
(for each input variable, the first row is $\mu$ or $\alpha$ and the second row is $\sigma$ or $\beta$, according to the best model)

Input      The 1st cans type     The 2nd cans type     The 3rd cans type
variable   Subscript  Value      Subscript  Value      Subscript  Value
R1         r11        144.89     r21        150.87     r31        565.11
           r11        0.91       r21        12.49      r31        3.61
G1         g11        249.84     g21        154.02     g31        158.03
           g11        1.59       g21        2.63       g31        4.54
B1         b11        193.40     b21        150.84     b31        866.93
           b11        1.27       b21        3.11       b31        5.62

Table 5. Goodness-of-fit test for the 2nd speed

Input     Goodness   The 1st cans type     The 2nd cans type     The 3rd cans type
variable  of fit     Gaussian   Gamma      Gaussian   Gamma      Gaussian   Gamma
R2        KS         0.13       0.12       0.07       0.07       0.11       0.11
          CVM        0.32       0.26       0.09       0.09       0.25       0.22
          AD         1.79       1.45       0.62       0.60       1.28       1.13
          AIC        603.33     598.39     460.68     460.54     627.62     625.43
          BIC        607.94     603.00     465.52     465.38     632.68     630.49
G2        KS         0.08       0.08       0.08       0.08       0.06       0.05
          CVM        0.11       0.08       0.09       0.10       0.05       0.04
          AD         0.70       0.54       0.61       0.66       0.31       0.27
          AIC        552.90     550.84     441.79     442.09     597.78     596.59
          BIC        557.51     555.45     446.62     446.93     602.85     601.65
B2        KS         0.15       0.14       0.07       0.07       0.08       0.08
          CVM        0.41       0.34       0.04       0.04       0.17       0.15
          AD         2.41       1.96       0.31       0.32       1.31       1.09
          AIC        576.11     570.59     445.38     445.40     608.33     604.17
          BIC        580.71     575.20     450.22     450.22     613.39     609.23

Table 6. Parameters of the best distribution model for the 2nd speed
(for each input variable, the first row is $\mu$ or $\alpha$ and the second row is $\sigma$ or $\beta$, according to the best model)

Input      The 1st cans type     The 2nd cans type     The 3rd cans type
variable   Subscript  Value      Subscript  Value      Subscript  Value
R2         r21        132.13     r22        1518.32    r23        477.43
           r21        0.85       r22        10.29      r23        3.19
G2         g21        246.83     g22        150.63     g23        663.66
           g21        1.61       g22        3.38       g23        4.40
B2         b21        180.19     b22        147.95     b23        585.61
           b21        1.20       b22        3.46       b23        3.97

Table 7. Performance of Naïve Bayes: accuracy level of classification (%)

                              1st speed                  2nd speed
Group                         OM           BM            OM           BM
1                             73.6         72.8          64.8         66.4
2                             75.2         76.0          72.0         71.2
3                             78.4         77.6          69.6         69.6
4                             71.2         72.8          67.2         69.6
5                             80.0         78.4          79.2         76.8
⋮                             ⋮            ⋮             ⋮            ⋮
50                            81.6         81.6          68.8         69.6
Mean                          76.6         77.0          69.2         69.3
Variance                      11.5         9.7           17.5         16.9
Bias of mean                  12.2         9.5           17.2         17.0
Confidence interval of mean   76.1–77.2    76.5–77.6     68.6–69.8    68.7–69.9

REFERENCES

Adetunji, A., J. Oguntoye, O. Fenwa, and N. Akande (2018). Web Document Classification Using Naïve Bayes. Journal of Advances in Mathematics and Computer Science; 1–11

Agarwal, S., N. Jain, and S. Dholay (2015). Adaptive testing and performance analysis using naive bayes classifier. Procedia Computer Science, 45; 70–75

Aronoff, S. et al. (1982). Classification accuracy: a user approach. Photogrammetric Engineering and Remote Sensing, 48(8); 1299–1307

Bargal, N., A. Deshpande, R. Kulkarni, and R. Moghe (2016). PLC based object sorting automation. International Research Journal of Engineering and Technology, IRJET, 3(07)

Chakravarty, I. M., J. Roy, and R. G. Laha (1967). Handbook of methods of applied statistics

De Wet, T. (1980). Cramér-von Mises tests for independence. Journal of Multivariate Analysis, 10(1); 38–50

Fluke, J. (2015). Implementing an Automated Sorting System

Han, J., J. Pei, and M. Kamber (2011). Data mining: concepts and techniques. Elsevier

Harzevili, N. S. and S. H. Alizadeh (2018). Mixture of latent multinomial naive Bayes classifier. Applied Soft Computing, 69; 516–527

Jayech, K. and M. A. Mahjoub (2010). New approach using Bayesian Network to improve content based image classification systems. International Journal of Computer Science Issues (IJCSI), 7(6); 53

Kamboj, D., A. Diwan, et al. (2019). Development of Automatic Sorting Conveyor Belt Using PLC. International Journal of Mechanical Engineering and Technology, 10(8)

Kini, M., D. Devi, and N. Chiplunkar (2015). Text mining Approach to Classify Technical Research Document using Naïve Bayes. International Journal of Advanced Research in Computer and Communication Engineering, 4(7); 386–391

Loan, P. (2006). An approach of the Naive Bayes classifier for the document classification. General Mathematics, 14(4); 135–138

Mansour, A. M. (2018). Texture classification using Naïve Bayes classifier. IJCSNS Int. J. Comput. Sci. Netw. Secur, 18(1); 112–121

Mitchell, T. M. (1997). Machine Learning. New York: McGraw-Hill

Nikhil, B., S. Pramod, G. G. W. Patil, and G. S. and (2017). Automatic sorting machine. International Journal of Advance Research and Innovative Ideas in Education, 3(3); 2254–2262

Oladapo, B. I., V. Balogun, A. Adeoye, C. Ijagbemi, A. S. Oluwole, I. Daniyan, A. E. Aghor, and A. P. Simeon (2016). Model design and simulation of automatic sorting machine using proximity sensor. Engineering Science and Technology, an International Journal, 19(3); 1452–1456

Pérez-Díaz, Á. P., R. Salinas-Gutiérrez, A. Hernández-Quintero, and O. D. Cedeño (2017). Supervised Classification Based on Copula Functions. Res. Comput. Sci., 133; 9–18

Resti, Y. (2015). Dependence in Classification of Aluminium Waste. Journal of Physics: Conference Series, 622; 012052

Resti, Y., A. Mohruni, T. Rodiana, and D. Zayanti (2019). Study in Development of Cans Waste Classification System Based on Statistical Approaches. Journal of Physics: Conference Series, 1198(9); 092004

Resti, Y., A. S. Mohruni, F. Burlian, I. Yani, and A. Amran (2017a). A probability approach in cans identification. MATEC Web of Conferences, 101; 03012

Resti, Y., A. S. Mohruni, F. Burlian, I. Yani, and A. Amran (2018). Design of mechanical arm for an automatic sorting system of recyclable cans. Journal of Physics: Conference Series, 1007; 012066

Resti, Y., S. M. Mohruni, F. Burlian, I. Yani, and A. Amran (2017b). Automation of a cans waste sorting system using the ejector system. Modern Applied Science, 11(3); 48–52

Rosenblat, A., T. Kneese, and D. Boyd (2014). Understanding intelligent systems. Data and Society Working Paper, October 8. Data and Society Research Institute

Salinas-Gutiérrez, R., A. Hernández-Aguirre, M. J. Rivera-Meraz, and E. R. Villa-Diharce (2010). Supervised probabilistic classification based on Gaussian copulas. In Mexican International Conference on Artificial Intelligence. Springer, pages 104–115

Stephens, M. A. (1974). EDF statistics for goodness of fit and some comparisons. Journal of the American Statistical Association, 69(347); 730–737

Wang, Y. and Q. Liu (2006). Comparison of Akaike information criterion (AIC) and Bayesian information criterion (BIC) in selection of stock–recruitment relationships. Fisheries Research, 77(2); 220–225

Yani, I., M. Hannan, H. Basri, E. Scavino, and N. E. bin Ahmad Basri (2009). Detecting Object Using Combination of Sharpening and Edge Detection Method. European Journal of Scientific Research, 32(1); 121–127

Yani, I., E. Scavino, M. Hannan, D. Wahab, and H. Basri (). An Automatic Sorting System for Recycling Beverage Cans using the Eigenface Algorithm. In Proceedings of the Third International Conference on Soft Computing Technology in Civil, Structural and Environmental Engineering. Civil-Comp Press

© 2020 The Authors.