Science and Technology Indonesia
e-ISSN: 2580-4391, p-ISSN: 2580-4405
Vol. 6, No. 4, October 2021

Research Paper

Prediction of Plastic-Type for Sorting System using Fisher Discriminant Analysis

Irsyadi Yani1, Yulia Resti2*, Firmansyah Burlian1, Ansyori3

1Department of Mechanical Engineering, Faculty of Engineering, Sriwijaya University, Palembang, 30662, Indonesia
2Department of Mathematics, Faculty of Mathematics and Natural Science, Sriwijaya University, Palembang, 30662, Indonesia
3Department of Electrical Engineering, Faculty of Engineering, Sriwijaya University, Palembang, 30662, Indonesia
*Corresponding author: yulia_resti@mipa.unsri.ac.id

Abstract
Recycling is a more environmentally friendly method of managing and reducing plastic waste that can significantly reduce land degradation, pollution, and greenhouse gas emissions. An essential first step in the recycling process is sorting plastic waste according to its composition. However, inadequate sorting of plastic types can result in cross-contamination and increased industrial operating costs. A low-cost automated plastic sorting system can be developed by using digital image data in the red, green, and blue (RGB) color space as the dataset and predicting the plastic type with a model built from learning data. The purpose of this paper is to demonstrate how to use Fisher Discriminant Analysis (FDA) to predict the plastic type from a digital image in the RGB model and then to evaluate the performance using cross-validation. This work has four main steps: collecting plastic digital image data, conducting statistical tests, predicting plastic types, and evaluating prediction performance. FDA is quite effective for predicting the type of plastic. The performance measures are an accuracy of 87.11 %, a recall-micro of 91.67 %, a recall-macro of 80.97 %, a specificity-micro of 90.33 %, and a specificity-macro of 90.38 %. The micro measure is determined by the number of decisions made for each object.
In comparison, the macro measure is calculated as the average of the decisions made for each class.

Keywords
Fisher Discriminant Analysis, Plastic-Type, Prediction

Received: 11 July 2021, Accepted: 4 October 2021
https://doi.org/10.26554/sti.2021.6.4.313-318

1. INTRODUCTION

Although plastic is the most widely used inorganic material globally, particularly in countries experiencing rapid economic growth (Srigul et al., 2016), it can be harmful to the environment because it takes hundreds of years to decompose (Shuai et al., 2020). Recycling is a viable alternative to landfills and incineration for managing and reducing plastic waste (Chow et al., 2016). It can significantly reduce land degradation, pollution, and greenhouse gas emissions while also saving up to 95 % of the energy used in the plastic manufacturing process (Siddique et al., 2008). Sorting plastic waste according to its material composition is the initial step in the recycling process. This stage is critical because the improper classification of plastic types can result in cross-contamination, which increases industrial operating costs (Pivnenko et al., 2016). In addition, it is frequently difficult to differentiate between the different types of plastic (Ruj et al., 2015). The plastic types Polyethylene Terephthalate (PET/PETE), High-Density Polyethylene (HDPE), and Polypropylene (PP) are widely used in the community and have the potential to become waste.

Because the manual method is ineffective and inefficient, automatic plastic sorting is a viable solution to this problem. A low-cost automatic plastic sorting system can be developed by utilizing machine learning with digital images in the RGB color model as the dataset. The plastic types predicted by machine learning are then used to direct the sorting process.
The artificial neural network backpropagation (ANNB) method has also been implemented to predict the plastic type based on digital images (Khona’ah et al., 2015; Yani et al., 2020). The ANNB algorithm is a widely used and popular prediction/classification algorithm. However, the minimum acceptable accuracy of a classification method is 85 % (Aronoff, 1985). Additionally, the performance of the method was assessed solely by its accuracy, whereas several metrics should be used to evaluate the effectiveness of a method (Gorunescu, 2011).

One of the prediction methods in machine learning is discriminant analysis. This method is a powerful tool for developing a statistical prediction algorithm (Raudys and Young, 2004). It has proven very successful in a variety of tasks, including recognition, risk assessment, identification, diagnosis, and classification (Vranckx et al., 2021; Chumachenko et al., 2021; Bari and Fattah, 2020; Wang et al., 2018). This method has several models, such as Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), and Fisher Discriminant Analysis (FDA). The first two models require a multivariate Gaussian assumption. Only LDA and FDA assume that the covariance matrix is homogeneous. When the covariance matrix is not homogeneous, the more appropriate model is QDA. LDA is more appropriate than QDA for small sample sizes in the learning data, and vice versa for very large sample sizes (James et al., 2013). LDA can also be more appropriate than QDA when the data dimension is small (Wahl and Kronmal, 1977).
This article proposes using FDA to predict the three plastic types used in sorting systems, with five metrics of method performance: accuracy, the micro and macro proportions of plastic types correctly predicted (recall-micro and recall-macro), and the micro and macro proportions of other plastic types correctly predicted as such (specificity-micro and specificity-macro) (Dinesh and Dash, 2016; Sokolova and Lapalme, 2009).

2. EXPERIMENTAL SECTION

2.1 Materials

Table 1 summarizes the statistics of the five normalized predictor variables obtained from the collected image data.

Table 1. Summary Statistic of Variable

| Statistic | Red (X1) | Green (X2) | Blue (X3) | Entropy (X4) | Variance (X5) |
|---|---|---|---|---|---|
| Minimum | 0.33 | 0.35 | 0.33 | 0.00 | 0.00 |
| 1st Quartile | 0.61 | 0.63 | 0.67 | 0.01 | 0.01 |
| Median | 0.70 | 0.75 | 0.78 | 0.02 | 0.02 |
| Mean | 0.76 | 0.79 | 0.80 | 0.01 | 0.05 |
| 3rd Quartile | 0.98 | 0.99 | 0.98 | 0.02 | 0.12 |
| Maximum | 1.00 | 1.00 | 1.00 | 0.03 | 0.13 |

2.2 Methods

Figure 1 presents the main stages of this research. Each stage consists of at least one step. The first step, obtaining images of the plastic, requires building an acquisition system. This system has two key components: a web camera that takes digital images and a computer that processes the images into the RGB format.

A total of 450 plastic data were collected by capturing the images in three different random poses. The plastic waste comes from three types: PET, HDPE, and PP. The obtained images are processed into the RGB color format, where each color component has 8 bits, so that each color component has a scale of 2^8 = 256 levels, or a pixel value range of 0 to 255. The resolution of the image stored in the database is 560 × 420 pixels. The image is cropped to 34 × 34 pixels with cropping coordinates [280 180 33 33]. Figure 2 presents the three types of cropped plastic waste digital images.

Figure 1. Research Methodology

Figure 2. Digital Image of Plastic-Type

The second step is to check the discriminant analysis assumptions.
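As an illustration of the image-acquisition step described above (a 560 × 420 RGB image cropped to 34 × 34 pixels), the following minimal Python sketch crops a patch and computes mean color values scaled to [0, 1]. Several points are assumptions rather than details given in the text: the cropping coordinates [280 180 33 33] are read as [x_min, y_min, width, height] with inclusive end points and zero-based indices, the normalization is taken to be division by 255, and the helper names `crop` and `mean_rgb_normalized` are hypothetical. The entropy and variance predictors (X4, X5) are not defined in the text and are therefore not sketched.

```python
def crop(image, rect):
    """Crop a row-major image given as a list of rows of (R, G, B) tuples.

    rect is read as (x_min, y_min, width, height) with inclusive end points,
    so (280, 180, 33, 33) yields a 34 x 34 patch (assumed interpretation).
    """
    x_min, y_min, width, height = rect
    rows = image[y_min:y_min + height + 1]
    return [row[x_min:x_min + width + 1] for row in rows]


def mean_rgb_normalized(patch):
    """Return the mean R, G, B values of a patch, scaled to [0, 1].

    Dividing by 255 is an assumed normalization for 8-bit channels.
    """
    n_pixels = sum(len(row) for row in patch)
    sums = [0, 0, 0]
    for row in patch:
        for pixel in row:
            for channel in range(3):
                sums[channel] += pixel[channel]
    return tuple(s / (n_pixels * 255) for s in sums)


# Synthetic 560 x 420 image used only to exercise the functions.
WIDTH, HEIGHT = 560, 420
image = [[(x % 256, y % 256, 128) for x in range(WIDTH)] for y in range(HEIGHT)]

patch = crop(image, (280, 180, 33, 33))
features = mean_rgb_normalized(patch)
```

With a real camera frame, `image` would be replaced by the pixel data read from the stored file; the sketch deliberately avoids any image library so that it stays self-contained.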
This work proposes discriminant analysis as the method for predicting plastic types. The Doornik-Hansen (Adesoye et al., 2016), Fligner-Killeen (Stevens, 2012), and Pillai Trace (Carey, 1998) tests are used to check the multivariate Gaussian distribution, covariance matrix homogeneity, and mean vector equality assumptions of the prediction method, respectively. The test statistics are written in (1)–(3),

$$ DH = Z\left(\sqrt{\beta_1}\right)^2 + z_2^2 \tag{1} $$

$$ FK = \frac{\sum_{j=1}^{k} n_j \left(\bar{x}_{zj} - \bar{x}_z\right)^2}{S^2} \tag{2} $$

$$ PT = \operatorname{trace}\left(B(B + W)^{-1}\right) \tag{3} $$

For the Doornik-Hansen test (Adesoye et al., 2016), $Z(\sqrt{\beta_1})$ and $z_2$ are defined respectively as,

$$ Z\left(\sqrt{\beta_1}\right) = \frac{\ln\left(G/c + \sqrt{(G/c)^2 + 1}\right)}{\sqrt{\ln \omega}} \tag{4} $$

$$ z_2 = \left(\left(\frac{b}{2\varphi}\right)^{1/3} - 1 + \frac{1}{9\varphi}\right)(9\varphi)^{1/2} \tag{5} $$

where $G$, $c$, $\omega^2$, $b$, and $\varphi$ are written successively as,

$$ G = \sqrt{\beta_1}\,\sqrt{\frac{(n + 1)(n + 3)}{6(n - 2)}} \tag{6} $$

$$ c = \sqrt{\frac{2}{\omega^2 - 1}} \tag{7} $$

$$ \omega^2 = -1 + \sqrt{2(\beta_2 - 1)} \tag{8} $$

$$ b = (b_2 - 1 - b_1)\,2k \tag{9} $$

$$ \varphi = \frac{(n + 5)(n + 7)\left((n - 2)(n^2 + 27n - 70) + b_1 (n - 7)(n^2 + 2n - 5)\right)}{6(n - 3)(n + 1)(n^2 + 15n - 4)} \tag{10} $$

Here $b_1 = \beta_1$ and $b_2$ is the sample kurtosis. With $m_2$ and $m_3$ the second and third central moments, respectively,

$$ \beta_1 = \frac{m_3^2}{m_2^3} \tag{11} $$

$$ \beta_2 = \frac{3(n^2 + 27n - 70)(n + 1)(n + 3)}{(n - 2)(n + 5)(n + 7)(n + 9)} \tag{12} $$

$$ k = \frac{(n + 5)(n + 7)(n^3 + 37n^2 + 11n - 313)}{12(n - 3)(n + 1)(n^2 + 15n - 4)} \tag{13} $$

For the Fligner-Killeen test, $S$ is defined (Stevens, 2012) as,

$$ S = \frac{1}{\sum_{j=1}^{k} n_j} \sum_{j=1}^{k} n_j S_j \tag{14} $$

For the Pillai Trace test, $B$ and $W$ are formulated (Carey, 1998) as,

$$ B = \sum_{j=1}^{k} n_j (\bar{X}_j - \bar{X})(\bar{X}_j - \bar{X})^T \tag{15} $$

$$ W = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{X}_j)(x_{ij} - \bar{X}_j)^T \tag{16} $$

The third step is to implement the discriminant analysis to build learning models and predict the plastic types. The stages in this step are splitting the data randomly, developing the learning model, and predicting the plastic type (PET, HDPE, or PP) of the testing data. The data were randomized into five folds: four folds to build the learning model and the remaining fold as testing data (Lantz, 2019; Alpaydin, 2016).
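The five-fold split described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the function name `five_fold_indices` and the fixed seed are hypothetical, and the split is a plain random partition (the text does not say whether the folds were stratified by plastic type).

```python
import random


def five_fold_indices(n_samples, n_folds=5, seed=0):
    """Randomly partition sample indices into folds.

    Returns a list of (train, test) index lists, one pair per fold: each fold
    serves once as testing data while the other four form the learning data.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    folds = [indices[i::n_folds] for i in range(n_folds)]
    splits = []
    for k in range(n_folds):
        test = sorted(folds[k])
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        splits.append((train, test))
    return splits


# 450 images, as in the dataset described in Section 2.2.
splits = five_fold_indices(450)
```

With 450 images, each learning set holds 360 images and each testing set holds 90, and every image appears in exactly one testing set.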
The discriminant analysis model implemented is one of three: Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), or Fisher Discriminant Analysis (FDA). The model is selected based on the results of the statistical tests of the assumptions. LDA or QDA can be implemented when the Gaussian assumption is fulfilled. LDA considers that all groups have the same covariance matrix, whereas QDA is calculated based on the covariance matrix of each group (Hastie et al., 2009). The sample size is critical when deciding whether to use LDA or QDA (Wahl and Kronmal, 1977). Generally, LDA is more appropriate than QDA for small sample sizes in the learning data, and vice versa for very large sample sizes (James et al., 2013). However, if the Gaussian assumption is not met, it is more appropriate to implement FDA.

The plastic image with $X = (X_1, X_2, X_3, X_4, X_5)^T$ is classified as the $j$-th plastic type if the discriminant function $\hat{\delta}_j(x)$ is the largest. The discriminant functions for LDA and QDA are, respectively (James et al., 2013),

$$ \hat{\delta}_j(x) = \ln \pi_j + X^T \Sigma^{-1} \mu_j - \frac{1}{2} \mu_j^T \Sigma^{-1} \mu_j \tag{17} $$

$$ \hat{\delta}_j(x) = \ln \pi_j - \frac{1}{2} \ln|\Sigma_j| - \frac{1}{2} X^T \Sigma_j^{-1} X + X^T \Sigma_j^{-1} \mu_j - \frac{1}{2} \mu_j^T \Sigma_j^{-1} \mu_j \tag{18} $$

with covariance matrices $\Sigma$ and $\Sigma_j, \forall j$, respectively.

In FDA, $X$ is classified as the $j$-th plastic type if the linear combination $Y_j = V^T X$ is maximum, where,

$$ V = S_W^{-1} (\mu_1 - \mu_2) \tag{19} $$

$$ S_W = \sum_{j=1}^{2} S_j \tag{20} $$

$$ S_j = \sum_{x_i \in j\text{-th group}} (X_i - \mu_j)(X_i - \mu_j)^T \tag{21} $$

$$ \mu_j = \frac{1}{n_j} \sum_{x_i \in j\text{-th group}} X_i \tag{22} $$

The final step is to evaluate the performance of the discriminant analysis. Classification performance is represented by scalar values of several metrics: accuracy, recall-micro ($\mu$), recall-macro ($M$), specificity-micro ($\mu$), and specificity-macro ($M$). The $TP_j$, $FP_j$, $TN_j$, and $FN_j$ values are determined for each plastic type, $j = 1, 2, 3$.
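The Fisher projection in (19)–(22) can be sketched for two groups as follows. This is a minimal pure-Python illustration, not the authors' implementation: the helper names, the toy two-dimensional data, and the midpoint threshold used to turn the projection into a decision are all assumptions. How the two-group form is extended to the three plastic types is not spelled out in the text, so the sketch covers only the two-group case.

```python
def mean_vector(rows):
    """Group mean, as in (22)."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def scatter_matrix(rows, mu):
    """Within-group scatter matrix S_j, as in (21)."""
    d = len(mu)
    s = [[0.0] * d for _ in range(d)]
    for x in rows:
        diff = [xi - mi for xi, mi in zip(x, mu)]
        for a in range(d):
            for b in range(d):
                s[a][b] += diff[a] * diff[b]
    return s


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting.

    Assumes the matrix a is nonsingular.
    """
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fisher_direction(group1, group2):
    """Return V = S_W^{-1} (mu_1 - mu_2) and both group means, as in (19)-(20)."""
    mu1, mu2 = mean_vector(group1), mean_vector(group2)
    s1, s2 = scatter_matrix(group1, mu1), scatter_matrix(group2, mu2)
    sw = [[p + q for p, q in zip(r1, r2)] for r1, r2 in zip(s1, s2)]
    v = solve(sw, [p - q for p, q in zip(mu1, mu2)])
    return v, mu1, mu2


def project(v, x):
    """Linear combination Y = V^T X."""
    return sum(vi * xi for vi, xi in zip(v, x))


def predict(v, mu1, mu2, x):
    """Assign x to group 1 or 2 using the midpoint of the projected means
    (an assumed decision rule; group 1 always projects higher under (19))."""
    threshold = 0.5 * (project(v, mu1) + project(v, mu2))
    return 1 if project(v, x) > threshold else 2


# Toy two-dimensional data, invented purely for illustration.
group1 = [[0.9, 0.8], [1.0, 0.9], [0.8, 0.9], [0.9, 1.0]]
group2 = [[0.4, 0.5], [0.5, 0.4], [0.3, 0.4], [0.4, 0.3]]
v, mu1, mu2 = fisher_direction(group1, group2)
```

For the five predictors in Table 1 the same functions apply unchanged with five-element rows, provided the pooled scatter matrix is nonsingular.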
The micro proportion is calculated based on the number of decisions per object, while the macro proportion is calculated based on the average decision per class. The performance measurements refer to Table 2 for the first plastic type. The performance measures for the other plastic types are determined similarly (Dinesh and Dash, 2016; Sokolova and Lapalme, 2009).

Table 2. Confusion Matrix for Plastic-Type, j = 1

| Prediction | Actual 1 | Actual 2 | Actual 3 |
|---|---|---|---|
| 1 | True-Positive (TP) | False-Negative (FN) | False-Negative (FN) |
| 2 | False-Positive (FP) | True-Negative (TN) | True-Negative (TN) |
| 3 | False-Positive (FP) | True-Negative (TN) | True-Negative (TN) |

$$ \text{Accuracy} = \frac{1}{3} \sum_{j=1}^{3} \frac{TP_j + TN_j}{TP_j + FP_j + FN_j + TN_j} \tag{23} $$

$$ \text{Recall}_{\mu} = \frac{\sum_{j=1}^{3} TP_j}{\sum_{j=1}^{3} (TP_j + FN_j)} \tag{24} $$

$$ \text{Recall}_{M} = \frac{1}{3} \sum_{j=1}^{3} \frac{TP_j}{TP_j + FN_j} \tag{25} $$

$$ \text{Specificity}_{\mu} = \frac{\sum_{j=1}^{3} TN_j}{\sum_{j=1}^{3} (FP_j + TN_j)} \tag{26} $$

$$ \text{Specificity}_{M} = \frac{1}{3} \sum_{j=1}^{3} \frac{TN_j}{FP_j + TN_j} \tag{27} $$

3. RESULTS AND DISCUSSION

Tables 3–4 summarize the results of the assumption tests for discriminant analysis for all of the learning data. This work used the Doornik-Hansen and the Fligner-Killeen tests to assess the multivariate Gaussian distribution of the explanatory variables and the homogeneity of the covariance matrices between types of plastic waste, respectively.

Table 3 demonstrates that not all plastic types in all learning data have a multivariate Gaussian distribution at the 5 % significance level. Only in the first, second, and fifth fold datasets, and even then only for the HDPE plastic type, do the data have a multivariate Gaussian distribution. The assumption of a multivariate Gaussian distribution is required for the majority of multivariate analyses.

Table 3. Multivariate Gaussian Test (Doornik-Hansen Test)

| Type of plastic | | Learning Data 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| PET | statistic | 140.96 | 143.88 | 123.75 | 104.97 | 117.42 |
| PET | p-value | 0 | 0 | 0 | 0 | 0 |
| HDPE | statistic | 397.25 | 456.53 | 351.74 | 369.93 | 300.34 |
| HDPE | p-value | 0 | 0 | 0 | 0 | 0.09 |
| PP | statistic | 29.05 | 27.37 | 25.77 | 36.07 | 37.22 |
| PP | p-value | 0 | 0 | 0 | 0 | 0 |
However, it is challenging to find real-world data that have a multivariate Gaussian distribution in all groups (Hallin and Paindaveine, 2009).

Table 4. Homogeneity of Covariance Matrices Test (Fligner-Killeen Test)

| | Learning Data 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Chi-sq | 4.35 | 0.58 | 0.95 | 1.35 | 1.70 |
| p-value | 0.11 | 0.74 | 0.62 | 0.51 | 0.43 |

The next assumption test in discriminant analysis is the homogeneity of the covariance matrix. This test on the independent variables is carried out when the multivariate Gaussian assumption is not met. At this stage, the assumption of an equal mean vector is not necessary. The homogeneity test results in Table 4 show that all learning data have a homogeneous covariance matrix at a significance level of 5 %. Based on the findings of the tests of the multivariate Gaussian and covariance matrix homogeneity assumptions, FDA is used to make the predictions.

Table 5. Performance of Plastic Waste Classification using FDA

| Performance Measurement | Testing Data 1 | 2 | 3 | 4 | 5 | Average | Variance |
|---|---|---|---|---|---|---|---|
| Accuracy | 87.41 | 85.19 | 85.93 | 88.89 | 88.15 | 87.11 | 2.37 |
| Recall μ | 81.11 | 77.78 | 78.89 | 79.07 | 83.22 | 91.67 | 5.29 |
| Recall M | 81.03 | 78.86 | 79.07 | 83.22 | 82.68 | 80.97 | 4.04 |
| Specificity μ | 90.56 | 88.89 | 89.44 | 91.67 | 91.11 | 90.33 | 1.32 |
| Specificity M | 90.74 | 88.87 | 89.53 | 91.49 | 91.26 | 90.38 | 1.30 |

This work has an accuracy of 87.11 %, recall-micro ($\mu$) and recall-macro ($M$) of 91.67 % and 80.97 %, respectively, and specificity-micro ($\mu$) and specificity-macro ($M$) of 90.33 % and 90.38 %, respectively. These results show that the FDA method is quite good at predicting the plastic type since, according to Aronoff (1985), the minimum accuracy of a classification method is 85 %. Beyond that, recall, which measures how correctly the statistical learning model predicts a given plastic type, has the highest standard deviation (about 2 %), whereas specificity, which measures how correctly objects of all other plastic types are predicted as not being the selected type, has the lowest standard deviation (about 1 %), consistent with the variances in Table 5.
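The five measures reported in Table 5 follow (23)–(27). A minimal sketch of how they can be computed from a three-class confusion matrix is given below. The confusion matrix is hypothetical and invented for illustration only, the function names are assumptions, and the convention used here (rows are actual types, columns are predicted types) is a choice made for the sketch.

```python
def per_class_counts(cm):
    """Return (TP, FP, TN, FN) for each class.

    cm[a][p] is the number of objects of actual class a predicted as class p
    (assumed convention: rows are actual, columns are predicted).
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    counts = []
    for j in range(k):
        tp = cm[j][j]
        fn = sum(cm[j]) - tp                      # class j objects predicted as another class
        fp = sum(cm[a][j] for a in range(k)) - tp  # other objects predicted as class j
        tn = total - tp - fn - fp
        counts.append((tp, fp, tn, fn))
    return counts


def performance(cm):
    """Accuracy and micro/macro recall and specificity, as in (23)-(27)."""
    counts = per_class_counts(cm)
    k = len(counts)
    return {
        "accuracy": sum((tp + tn) / (tp + fp + fn + tn)
                        for tp, fp, tn, fn in counts) / k,
        "recall_micro": sum(tp for tp, _, _, _ in counts)
                        / sum(tp + fn for tp, _, _, fn in counts),
        "recall_macro": sum(tp / (tp + fn) for tp, _, _, fn in counts) / k,
        "specificity_micro": sum(tn for _, _, tn, _ in counts)
                             / sum(fp + tn for _, fp, tn, _ in counts),
        "specificity_macro": sum(tn / (fp + tn) for _, fp, tn, _ in counts) / k,
    }


# Hypothetical confusion matrix for 30 objects, 10 per type (not from the paper).
cm = [[8, 1, 1],
      [2, 7, 1],
      [0, 2, 8]]
result = performance(cm)
```

When every object receives exactly one predicted type, each misclassified object contributes one false negative and one false positive. With E errors among N objects and three types, (24) then reduces to 1 − E/N, (26) to 1 − E/(2N), and (23) to 1 − 2E/(3N), which can serve as a quick consistency check on per-fold values.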
The result of this work is thus slightly better than that of Khona’ah et al. (2015), who implemented the ANNB algorithm to predict the plastic types with an accuracy of 86.67 %. Although the difference in prediction accuracy does not reach 1 %, this work uses a different validation technique and more performance measures than Khona’ah et al. (2015) to show that the prediction results have low variance. Nevertheless, better prediction performance than that of the proposed method may be obtained by implementing classification methods that do not require the assumptions of a multivariate Gaussian distribution and homogeneity of the covariance matrix. These methods include k-NN, decision tree, and Support Vector Machine.

4. CONCLUSIONS

Plastic recycling is a more environmentally friendly method of managing and reducing plastic waste that can significantly reduce land degradation, pollution, and greenhouse gas emissions. Sorting is a crucial stage because inaccurate sorting of plastic types can cause cross-contamination and increase industrial operating costs. This paper evaluates the performance of the Fisher Discriminant Analysis model in predicting the plastic type from digital images. The model successfully predicts the plastic type. The accuracy is 87.11 %, and the micro and macro proportions of plastic types correctly predicted (recall) are 91.67 % and 80.97 %, respectively. The micro and macro proportions of objects of other plastic types correctly predicted as such (specificity) are 90.33 % and 90.38 %, respectively. However, superior prediction performance for plastic types may be obtained using classification methods that do not require the assumptions of a multivariate Gaussian distribution and homogeneity of the covariance matrix, for example k-NN, decision tree, or Support Vector Machine.

5. ACKNOWLEDGEMENT

The research and publication of this article were funded by the DIPA of the Public Service Agency of Sriwijaya University 2021, SP DIPA-023.17.2.677515/2021, dated November 23, 2020, in accordance with the Rector’s Decree Number 0010/UN9/SK.LP2M.PT/2021, dated April 28, 2021.

REFERENCES

Adesoye, J., B. Golam Kibria, and F. George (2016). Performances of several univariate tests of normality: An empirical study. J. Biom. Biostat, 7; 1–8
Alpaydin, E. (2016). Machine learning: the new AI. MIT press
Aronoff, S. (1985). The minimum accuracy value as an index of classification accuracy. Photogrammetric Engineering and Remote Sensing, 51(1); 99–111
Bari, M. F. and S. A. Fattah (2020). Epileptic seizure detection in EEG signals using normalized IMFs in CEEMDAN domain and quadratic discriminant classifier. Biomedical Signal Processing and Control, 58; 101833
Carey, G. (1998). Multivariate analysis of variance (MANOVA) II: practical guide to ANOVA and MANOVA for SAS. Retrieved September, 1; 2009
Chow, C.-f., W.-M. W. So, and T.-Y. Cheung (2016). Research and development of a new waste collection bin to facilitate education in plastic recycling. Applied Environmental Education & Communication, 15(1); 45–57
Chumachenko, K., J. Raitoharju, A. Iosifidis, and M. Gabbouj (2021). Speed-up and multi-view extensions to subclass discriminant analysis. Pattern Recognition, 111; 107660
Dinesh, S. and T. Dash (2016). Reliable evaluation of neural network for multiclass classification of real-world data. arXiv preprint arXiv:1612.00671
Gorunescu, F. (2011). Data Mining: Concepts, models and techniques, volume 12. Springer Science & Business Media
Hallin, M. and D. Paindaveine (2009). Optimal tests for homogeneity of covariance, scale, and shape. Journal of Multivariate Analysis, 100(3); 422–444
Hastie, T., R. Tibshirani, and J. Friedman (2009). The elements of statistical learning. Springer
James, G., D. Witten, T. Hastie, and R. Tibshirani (2013). An introduction to statistical learning, volume 112. Springer
Khona’ah, B., D. Rosiliani, and I. Yani (2015). Identification and Classification of Plastic Color Images Based on The RGB Method. Journal of Multidisciplinary Engineering Science and Technology, 6(6); 10170–10174
Lantz, B. (2019). Machine learning with R: expert techniques for predictive modeling. Packt publishing ltd
Pivnenko, K., M. Eriksen, J. Martín-Fernández, E. Eriksson, and T. Astrup (2016). Recycling of plastic waste: Presence of phthalates in plastics from households and industry. Waste Management, 54; 44–52
Raudys, Š. and D. M. Young (2004). Results in statistical discriminant analysis: A review of the former Soviet Union literature. Journal of Multivariate Analysis, 89(1); 1–35
Ruj, B., V. Pandey, P. Jash, and V. Srivastava (2015). Sorting of plastic waste for effective recycling. International Journal of Applied Science and Engineering Research, 4(4); 564–571
Shuai, C., Y. Cheng, W. Yang, P. Feng, Y. Yang, C. He, F. Qi, and S. Peng (2020). Magnetically actuated bone scaffold: Microstructure, cell response and osteogenesis. Composites Part B: Engineering, 192; 107986
Siddique, R., J. Khatib, and I. Kaur (2008). Use of recycled plastic in concrete: A review. Waste Management, 28(10); 1835–1852
Sokolova, M. and G. Lapalme (2009). A systematic analysis of performance measures for classification tasks. Information Processing & Management, 45(4); 427–437
Srigul, W., P. Inrawong, and M. Kupimai (2016). Plastic classification base on correlation of RGB color. In 2016 13th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON). IEEE, pages 1–5
Stevens, J. P. (2012). Applied multivariate statistics for the social sciences. Routledge
Vranckx, I., J. Raymaekers, B. De Ketelaere, P. J. Rousseeuw, and M. Hubert (2021). Real-time discriminant analysis in the presence of label and measurement noise. Chemometrics and Intelligent Laboratory Systems, 208; 104197
Wahl, P. W. and R. A. Kronmal (1977). Discriminant functions when covariances are unequal and sample sizes are moderate. Biometrics; 479–484
Wang, X., X. Li, R. Ma, Y. Li, W. Wang, H. Huang, C. Xu, and Y. An (2018). Quadratic discriminant analysis model for assessing the risk of cadmium pollution for paddy fields in a county in China. Environmental Pollution, 236; 366–372
Yani, I., D. Rosiliani, B. Khona’ah, and F. Almahdini (2020). Identification and plastic type and classification of PET, HDPE, and PP using RGB method. In IOP Conference Series: Materials Science and Engineering, volume 857. IOP Publishing, page 012015

© 2021 The Authors. Science and Technology Indonesia, 6 (2021) 313-318