Journal of Applied Engineering and Technological Science Vol 4(1) 2022 : 139-148

CLASSIFICATION OF EDELWEISS FLOWERS USING DATA AUGMENTATION AND LINEAR DISCRIMINANT ANALYSIS METHODS

Fransiscus Rolanda Malau1*, Dadang Iskandar Mulyana2
Sekolah Tinggi Ilmu Komputer Cipta Karya Informatika, DKI Jakarta, Indonesia1,2
fransiscus.rolanda.malau@gmail.com

Received : 24 August 2022, Revised: 10 September 2022, Accepted : 10 September 2022
*Corresponding Author

ABSTRACT
Edelweiss is a plant that grows at high altitude and is known as the eternal flower because its beautiful petals do not wilt easily. Although edelweiss in Indonesia is still in the same family as Leontopodium Alpinum, the type of edelweiss found in the mountains of Indonesia is different from the edelweiss found abroad. Therefore, in this study, an image processing system was developed that classifies the types of edelweiss flowers based on their images using Linear Discriminant Analysis, which classifies data into several classes based on a boundary line (straight line) obtained from linear equations. The types of edelweiss flowers used in this study were Anaphalis Javanica and Leontopodium Alpinum; the two types were distinguished based on their color characteristics using hue and saturation values. The images used are 1500 training images and 450 test images, with a stated training-to-test ratio of 70:30, and the accuracy obtained in the testing process with the Linear Discriminant Analysis method is 99.77%.

Keywords : Edelweiss Flowers; Data Augmentation; Linear Discriminant Analysis

1. Introduction
The edelweiss flower, or eternal flower, is actually the Leontopodium Alpinum flower found in the highlands of the Alps. In Indonesia, edelweiss flowers were first discovered in 1819 by a naturalist from Germany named Georg Carl Reinwardt on the slopes of Mount Gede.
However, the name edelweiss comes from German and consists of the words edel (noble) and weiss (white). Translated into Indonesian, it means "high white flower" (Kiswantoro and Susanto, 2021; Martin & Susandi, 2022). The splendor of this white flower has made it a symbol of eternal love. Not only that, the beauty of this eternal flower also makes it a magnet for climbers, who use it as a backdrop for selfies. Although edelweiss flowers in Indonesia are still in the same family as Leontopodium Alpinum, the types of edelweiss found in the mountains of Indonesia are different from the types found abroad.

Types of Edelweiss Flowers
Anaphalis javanica is one type of edelweiss flower that is often found by Indonesian mountain climbers (Kiswantoro and Susanto, 2021; Hanafi et al., 2019). The crown of the Javanese edelweiss consists of hundreds of small, round white flower buds that are not pointed. In the center is a yellow "flower head". Edelweiss colors such as brown, blue, and pink are the result of artificial dyes. In contrast to Anaphalis javanica, European edelweiss is easily found in the Alps. This plant is widespread in Alpine countries such as Austria, Germany, Italy, France, and Switzerland. This eternal flower has a different shape from the Javanese edelweiss. One flower of Leontopodium Alpinum has 500 to 1,000 flower buds with 2 to 10 "flower heads" surrounded by spiky, velvety white leaves (Kiswantoro and Susanto, 2021).

In this study, color feature extraction is based on hue and saturation values. Color feature extraction with HSV is used to obtain color information from the image to facilitate the identification process. HSV (Hue, Saturation, Value) is a perceptual color space with cylindrical coordinates, consisting of three color channels: Hue, Saturation, and Value.
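As an illustration of the hue and saturation features described above, the following minimal Python sketch averages the hue and saturation of a set of RGB pixels using the standard-library `colorsys` module. The function name `hue_saturation_features`, the use of a plain arithmetic mean, and the sample pixel values are assumptions made for illustration; the study does not specify how the hue and saturation values are aggregated.

```python
import colorsys


def hue_saturation_features(pixels):
    """Return (mean hue, mean saturation) for an iterable of 8-bit RGB pixels.

    Hue and saturation are in [0, 1], as returned by colorsys.rgb_to_hsv.
    Assumption: a plain arithmetic mean is used. Hue is circular, so this is
    only a rough summary when hues straddle the red wrap-around.
    """
    hues, sats = [], []
    for r, g, b in pixels:
        h, s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        hues.append(h)
        sats.append(s)
    if not hues:
        raise ValueError("at least one pixel is required")
    return sum(hues) / len(hues), sum(sats) / len(sats)


# Hypothetical pixels: a white pixel and a pure red pixel.
white_features = hue_saturation_features([(255, 255, 255)])
red_features = hue_saturation_features([(255, 0, 0)])
```

In this sketch a white pixel has saturation 0 while a pure red pixel has saturation 1, which illustrates how saturation can separate near-white petals from strongly colored regions.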
In addition, LDA obtains an optimal projection into a lower-dimensional space and looks for separable patterns so that the data can be grouped according to the boundary line obtained from linear equations (Kiswantoro and Susanto, 2021).

Classification is the process of finding a model that can divide data by class. It is divided into two phases: a training (learning) phase, in which the model is built from data whose categories are known, and a testing phase. The testing phase evaluates the model produced by the training phase using new data as test data. The result of this phase is the level of accuracy, or performance, of the model when predicting data whose class is unknown, in particular the test data (Antoko et al., 2021; Hanafi et al., 2019; Kartika Wisnudhanti, 2020).

Digital Image Processing
A digital image is a two-dimensional matrix obtained by converting a continuous two-dimensional analog image into a discrete image through a sampling process. An image can be represented as a two-dimensional matrix (with two variables x and y), where x and y are spatial coordinates and f(x, y) is the image value at those coordinates. The smallest unit of the matrix is called a pixel (Hashari et al., 2018).

Machine learning is a computational algorithm or computer process that works from historical data to improve its performance in making predictions. In machine learning, there are three learning methods, namely unsupervised learning, supervised learning, and reinforcement learning (Pradika et al., 2020). In unsupervised learning, the training data do not yet have classes, so the data are grouped based on shared characteristics. Supervised learning is a learning method for training data that already have classes. Reinforcement learning, in turn, looks for the right steps in order to obtain correct predictions in accordance with the existing conditions (Pratama et al., 2018).
Fig 1. Linear Discriminant Analysis (Source: https://media.geeksforgeeks.org)

Linear Discriminant Analysis (LDA) was first applied to facial recognition by Etemad and Chellappa. Linear Discriminant Analysis works on the basis of scatter matrix analysis, which aims to find an optimal projection that maximizes the spread between classes and minimizes the spread within the classes of face data. The LDA algorithm has almost the same matrix calculation characteristics as PCA (Prasetiyanto et al., 2022; Putra, 2017). The basic difference is that LDA seeks to minimize the differences between images within a class. The difference between classes is represented by the Sb matrix (scatter between class) and the difference within a class is represented by the Sw matrix (scatter within class). The covariance matrix is obtained from the two matrices. To maximize the distance between classes and minimize the distance within a class, a discriminant power is used (Ramdhani, 2015).

Data augmentation is the process of enriching training data, which aims to avoid overfitting. The data augmentation process consists of several stages, namely horizontal flip, shear range, and zoom range. The shear range and zoom range are both set to 0.2. The horizontal flip stage increases the amount of training data by mirroring the image horizontally (left to right). The shear range stage applies a shear transformation, which adds variation by slanting the image by a certain angle, and the zoom range stage enlarges the image to a certain scale relative to the original image (Naufal and Kusuma, 2021; Fadillah et al., 2021; Harahap & Muslim, 2018; Nana et al., 2022).

Based on the background described above, this research builds a classification system for the types of edelweiss flowers. The system can distinguish the types of edelweiss flowers using a digital image processing approach.
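The flip and zoom stages of the data augmentation process described above can be sketched as follows. This is a minimal, assumed illustration in plain Python, not the library actually used in the study: an image is represented as a list of pixel rows, the function names `horizontal_flip`, `zoom`, and `augment` are hypothetical, and drawing the zoom factor uniformly from [1 - 0.2, 1 + 0.2] is an assumed interpretation of a zoom range of 0.2. The shear stage is omitted for brevity.

```python
import random


def horizontal_flip(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [row[::-1] for row in image]


def zoom(image, factor):
    """Zoom about the image centre using nearest-neighbour sampling.

    The output has the same size as the input. factor > 1 enlarges the
    content, factor < 1 shrinks it (edge pixels are repeated).
    """
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Map each output pixel back to a source pixel, clamped to the image.
            sy = min(max(int(round(cy + (y - cy) / factor)), 0), h - 1)
            sx = min(max(int(round(cx + (x - cx) / factor)), 0), w - 1)
            row.append(image[sy][sx])
        out.append(row)
    return out


def augment(image, zoom_range=0.2, rng=random):
    """Return one randomly augmented copy of the image (assumed policy)."""
    if rng.random() < 0.5:
        image = horizontal_flip(image)
    factor = rng.uniform(1 - zoom_range, 1 + zoom_range)
    return zoom(image, factor)


# Hypothetical 2x3 single-channel image.
sample = [[1, 2, 3], [4, 5, 6]]
flipped = horizontal_flip(sample)
augmented = augment(sample, rng=random.Random(0))
```

Calling `augment` repeatedly on each source image yields additional training samples of the same size as the original.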
Color feature extraction is carried out based on the hue and saturation values, and the extraction results are then classified using Linear Discriminant Analysis (LDA) (Nurhalimah et al., 2020).

2. Research Methods
Research Stages
Fig 2. Research Stages

Systematic and structured work must be carried out throughout the research stages so that the research is on target and in accordance with the research objectives. This research consists of several stages, which can be seen in Figure 2.

The initial stage is to collect datasets of the two types of edelweiss flowers, which will be used as training and test data. This phase is very important because dataset availability is an important factor for image processing performance. The quality and number of records affect the classification results, so preparation is needed at the time of collection. Data augmentation is then carried out on the collected dataset to significantly increase its diversity without losing the essence of the data (Sanjaya & Ayub, 2020). The dataset used in this study consists of 1950 images and is divided into training and testing sets at a stated ratio of 70:30 to determine the structure of the model. The data used are therefore 1500 training images and 450 test images.

After the dataset has been collected and augmented, the next step is to perform an image transformation using the L*a*b* color space, which is intended to identify color content digitally. The image is first converted from the RGB color space to XYZ, and the resulting XYZ values are then used to calculate the L*a*b* values. After the image transformation, the next step is image segmentation, which serves to separate one object from another.
Separation is carried out based on regional boundaries that have the same shape or layout. The output of this process is a binary image with a value of 1 (white) for the desired object and a value of 0 (black) for the background. Image segmentation in this paper uses a thresholding technique, which transforms an image into binary format so that the feature extraction process can be carried out easily.

The next step is feature extraction using HSV color features based on hue and saturation values (Sentosa et al., 2022). Color feature extraction with HSV is used to obtain color information from an image to facilitate the identification process. After HSV feature extraction, the color feature information is passed to the LDA algorithm for identification. LDA preserves as much of the class-discriminating information as possible. The classes are separated so as to increase the distance between classes and reduce the distance between data within a class. The number of features produced by LDA depends on the number of classes and the number of poses performed.

After all of these stages have been carried out, the last stage is a test to see how well the built model works. In this phase, the validity of the developed algorithm or model is tested (Sinulingga et al., 2016).

LDA Methods
This edelweiss flower classification system uses the LDA method. The purpose of Linear Discriminant Analysis is to classify objects into one of two or more groups based on various features that describe the class or group (Husein & Harahap, 2017). The edelweiss flower classification process consists of a training process and a classification process.
To carry out the training process, the within-class covariance (scatter) matrix $S_W$ and the between-class covariance (scatter) matrix $S_B$ are first computed, defined as follows:

$$S_W = \sum_{i=1}^{C} \sum_{X_k \in X_i} (X_k - \mu_i)(X_k - \mu_i)^T$$

$$S_B = \sum_{i=1}^{C} N_i (\mu_i - \mu)(\mu_i - \mu)^T$$

Where: $X_k$ : image k, $N_i$ : number of samples in class $X_i$, $C$ : number of classes, $\mu$ : average image of all classes, and $\mu_i$ : average image of class i.

Next, the eigenvectors of the product of $S_B$ and the inverse of $S_W$ are found by solving:

$$S_W^{-1} S_B W = \lambda W$$

The eigenvectors are selected based on the largest eigenvalues. The eigenvectors are then used to project each training datum (Bimantoro, 2020):

$$F_{PT} = W^T X_i$$

Where: $F_{PT}$ : data projection value, $X_i$ : input data, $W$ : eigenvectors selected based on the largest eigenvalues.

The results of the training process, in the form of the projection of each training datum, are then stored for comparison with the test data. The classification process is carried out by projecting the test data (Bimantoro, 2020). The test data projection is obtained by multiplying the test data by the eigenvectors used in the training process. The classification stage compares the projected data from training with the projected test data. To find the class of the test data, the distance between the projected training data and the projected test data is calculated:

$$d_{ij} = \sqrt{\sum_{k} (x_{ik} - x_{jk})^2}$$

Where: $d_{ij}$ : distance between projected training data and test data, $x_{ik}$ : training data projection, $x_{jk}$ : test data projection.

The distances between the projected training data and the test data are then sorted from the largest to the smallest. The result of the largest distance search is the classification result of the test data. These results are then verified by comparing the actual class with the classification result. The software used to analyze the edelweiss flower image data is Matlab.
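The training and classification steps described in this section can be illustrated with a minimal two-class sketch in plain Python. It is not the Matlab implementation used in the study. It assumes two-dimensional (hue, saturation) feature vectors, and it relies on the fact that for two classes the leading eigenvector of the product of the inverse within-class scatter matrix and the between-class scatter matrix is proportional to the inverse within-class scatter matrix applied to the difference of the class means. The function names, class labels, and sample values are hypothetical. The sketch assigns the label of the nearest projected training sample, which is the usual nearest-neighbour reading of a distance-based rule and is an assumption here.

```python
import math


def mean_vector(samples):
    """Mean of a list of 2-D feature vectors."""
    n = len(samples)
    return [sum(s[d] for s in samples) / n for d in range(2)]


def within_class_scatter(classes):
    """S_W: sum over classes of (x - mu_i)(x - mu_i)^T for 2-D features."""
    sw = [[0.0, 0.0], [0.0, 0.0]]
    for samples in classes:
        mu = mean_vector(samples)
        for x in samples:
            d = [x[0] - mu[0], x[1] - mu[1]]
            for r in range(2):
                for c in range(2):
                    sw[r][c] += d[r] * d[c]
    return sw


def lda_direction(class_a, class_b):
    """Unit projection vector proportional to S_W^{-1} (mu_a - mu_b)."""
    sw = within_class_scatter([class_a, class_b])
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if abs(det) < 1e-12:
        raise ValueError("within-class scatter matrix is singular")
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    mu_a, mu_b = mean_vector(class_a), mean_vector(class_b)
    diff = [mu_a[0] - mu_b[0], mu_a[1] - mu_b[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    norm = math.hypot(w[0], w[1])
    if norm == 0.0:
        raise ValueError("class means coincide")
    return [w[0] / norm, w[1] / norm]


def project(w, x):
    """Project a 2-D feature vector onto the direction w."""
    return w[0] * x[0] + w[1] * x[1]


def classify(w, train, x):
    """Label of the training sample whose projection is nearest to that of x.

    train is a list of (feature_vector, label) pairs.
    """
    px = project(w, x)
    return min(train, key=lambda item: abs(project(w, item[0]) - px))[1]


# Hypothetical (hue, saturation) features, for illustration only.
class_a = [(0.10, 0.20), (0.12, 0.25), (0.08, 0.22)]
class_b = [(0.60, 0.70), (0.65, 0.72), (0.62, 0.78)]
w = lda_direction(class_a, class_b)
train = [(x, "A") for x in class_a] + [(x, "B") for x in class_b]
label = classify(w, train, (0.11, 0.21))
```

With only two classes, a single projection direction is sufficient, so each image is reduced to one scalar before the distance comparison.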
Data Collection Process
In this study, 5 images of each flower type were obtained from Google image searches, originating from various sources.

Datasets Creation
Most of the image data in this study were downloaded from the internet and therefore have different sizes. Each image is resized to 300x300 pixels, and the background of each image is then changed to red in order to facilitate the feature extraction process, because the edelweiss flower itself is white. Data augmentation is then carried out on these images to increase the diversity of data available for training the model. Data augmentation in this study uses a Python library (Solihin et al., 2022).

Fig 3. Data Augmentation Process

Augmentation techniques such as cropping, padding, and horizontal flipping are commonly used to train large neural networks. This study uses 2 types of edelweiss flowers, with 1,500 images for training data and 450 images for test data, distributed across the two types as follows:

Table 1 - Research Datasets
Class                  Training Data   Test Data
Anaphalis Javanica     750             225
Leontopodium Alpinum   750             225
Total                  1500            450

The images are resized to fit the input required for training or testing. Feature extraction is an object recognition technique that examines certain features of an object in order to perform the calculations and comparisons needed to classify an image. Two separate data sets were used in this study: one containing training data and one containing test data.

Test Design
Fig 4. Test Design

Figure 4 shows the test design. The first stage takes as input for training an image that has been cropped and resized outside the system and whose background has been changed manually. Next is the preprocessing stage.
In this phase, the image is converted from the RGB color space to L*a*b* and image segmentation is carried out using the thresholding method. The extracted features are then used to train the LDA. The results of the training are stored and used in the classification process.

3. Results and Discussions
Fig 5. Examples of Flower Images in Each Class

The image data used in this study cover 2 types of edelweiss flower, with 1500 training images and 450 test images. Training on these images yielded a very high accuracy of 100%.

Fig 6. Training Data Distribution Graph

The data distribution in Figure 6 shows a clear difference between the edelweiss flowers of Anaphalis Javanica and Leontopodium Alpinum, which leads to the high level of accuracy.

Fig 7. Test Data Distribution Graph

Figure 7 likewise shows the distribution of the test data in each class along with the resulting boundary line. The resulting accuracy value indicates that, of the 450 test images, only 1 was misclassified.

Fig 8. GUI Classification of Flower Types Leontopodium Alpinum

Similarly, the test results from the GUI program in Figure 8 show that the process, from image input through classification, gives very good results.

Fig 9. Value Accuracy

From Figure 9 it can be seen that the testing accuracy for the 450 edelweiss flower images is 99.77%. This result can be categorized as very good. It is influenced by several factors: classification can be optimal when feature extraction provides the best information, and color-based features are easy to recognize when the tested images have different colors.
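The reported test accuracy can be checked against the counts given above. The following is a minimal sketch, assuming that accuracy is the number of correctly classified test images divided by the total number of test images; the function and variable names are hypothetical. With 1 error in 450 images the ratio is 449/450, which is 99.77% when truncated to two decimals and 99.78% when rounded.

```python
import math


def accuracy_percent(n_correct, n_total):
    """Accuracy as a percentage of correctly classified samples."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * n_correct / n_total


n_test = 450   # number of test images reported in the study
n_wrong = 1    # number of misclassified test images reported in the study
acc = accuracy_percent(n_test - n_wrong, n_test)

truncated = math.floor(acc * 100) / 100  # 99.77
rounded = round(acc, 2)                  # 99.78
```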
Based on the authors' tests, several factors can cause misclassification, including: (1) The amount of training and test data is too small and must be increased, because the more data the model is trained on, the better the resulting model. (2) If the picture is not clear, it will be difficult for the model to classify it, so classification errors still occur.

4. Conclusion
This study uses the Linear Discriminant Analysis (LDA) method to classify the types of edelweiss flower images based on their color characteristics. The system was implemented in Matlab, a matrix-based programming platform. Based on the results of the tests that have been carried out, it can be concluded that the classification system for edelweiss flower images using the Linear Discriminant Analysis method was successfully built, with a resulting accuracy of 99.77% in determining the flower type. Color feature extraction with HSV helps to obtain color information from the image to assist the recognition process. LDA then obtains an optimal projection into a lower-dimensional space by looking for separable patterns, so that the data can be grouped according to the boundary obtained from the linear equation. LDA preserves as much of the class-discriminating information as possible. The classes are separated so as to increase the distance between classes while reducing the distance between data within a class. The number of features produced by LDA depends on the number of classes and the number of poses performed. There are two classes in this study, namely Anaphalis Javanica and Leontopodium Alpinum. Further research could add other classes or use more complex categories.
In addition, improvements are needed to maximize feature extraction and identification and to increase the number of datasets used for both training and testing. Furthermore, to achieve better feature extraction and identification, deep learning algorithms should be used, with feature extraction based not only on color but also on texture and shape.

References
Antoko, T. D., Ridani, M. A., & Minarno, A. E. (2021). Klasifikasi Buah Zaitun Menggunakan Convolution Neural Network. Komputika: Jurnal Sistem Komputer, 10(2), 119-126.
Fadillah, R. Z., Irawan, A., & Susanty, M. (2021). Data Augmentasi Untuk Mengatasi Keterbatasan Data Pada Model Penerjemah Bahasa Isyarat Indonesia (BISINDO). Jurnal Informatika, 8(2), 208-214.
Hanafi, M. H., Fadillah, N., & Insan, A. (2019). Optimasi Algoritma K-Nearest Neighbor untuk Klasifikasi Tingkat Kematangan Buah Alpukat Berdasarkan Warna. IT Journal Research and Development, 4(1), 10-18.
Harahap, R. N., & Muslim, K. (2018). Peningkatan Akurasi Pada Prediksi Kepribadian MBTI Pengguna Twitter Menggunakan Augmentasi Data. Teknologi Informasi dan Ilmu Komputer, 815-822.
Hashari, I., Hidayat, B., & Arif, J. (2018). Identifikasi Fosil Gigi Geraham Manusia Berbasis Pengolahan Citra Digital Dengan Metode Gabor Wavelet Dan Klasifikasi Linier Dicriminant Analysis (LDA). eProceedings of Engineering, 5(2).
Husein, A. M., & Harahap, M. (2017). Penerapan Metode Distance Transform Pada Kernel Discriminant Analysis Untuk Pengenalan Pola Tulisan Tangan Angka Berbasis Principal Component Analysis. Sinkron: jurnal dan penelitian teknik informatika, 2(2), 31-36.
Kartika Wisnudhanti, F. C. (2020). Metode Convolutional Neural Network Dalam Klasifikasi Citra Tiga Tokoh Wayang Pandawa. 7(2018), 1-5.
Kiswantoro, A., & Susanto, D. R. (2021). Strategi Pengembangan Desa Wonokriti Sebagai Desa Wisata Edelweis Di Kawasan Taman Nasional Bromo Tengger Semeru. Journal of Tourism and Economic, 4(2), 119-134.
Martin, K., & Susandi, D. (2022). Perancangan dan Implementasi Sistem Irigasi Kabut Otomatis Tanaman Edelweis Menggunakan Mikrokontroler Arduino Uno. ikraith-informatika, 6(1), 57-66.
Nana, N., Mulyana, D. I., Akbar, A., & Zikri, M. (2022). Optimasi Klasifikasi Buah Anggur Menggunakan Data Augmentasi dan Convolutional Neural Network. Smart Comp: Jurnalnya Orang Pintar Komputer, 11(2), 148-161.
Naufal, M. F., & Kusuma, S. F. (2021). Pendeteksi Citra Masker Wajah Menggunakan CNN dan Transfer Learning. Jurnal Teknologi Informasi dan Ilmu Komputer (JTIIK), 8(6), 1293-1300.
Nurhalimah, N., Wijaya, I. G. P. S., & Bimantoro, F. (2020). Klasifikasi Kain Songket Lombok Berdasarkan Fitur GLCM dan Moment Invariant Dengan Teknik Pengklasifikasian Linear Discriminant Analysis (LDA). Jurnal Teknologi Informasi, Komputer, dan Aplikasinya (JTIKA), 2(2), 173-183.
Pradika, S. I., Nugroho, B., & Puspaningrum, E. Y. (2020, November). Pengenalan Tulisan Tangan Huruf Hijaiyah Menggunakan Convolution Neural Network Dengan Augmentasi Data. In Prosiding Seminar Nasional Informatika Bela Negara (Vol. 1, pp. 129-136).
Prasetiyanto, A. E., Kusrini, K., & Hartanto, A. D. (2022, February). Analisis Review Siswa Selama Pembelajaran pada Masa Pandemi Menggunakan Metode Topic Modelling LDA. In STAINS (SEMINAR NASIONAL TEKNOLOGI & SAINS) (Vol. 1, No. 1, pp. 241-246).
Pratama, A. S. S., Wibawa, A. P., & Handayani, A. N. (2022). Convolutional Neural Network (Cnn) Untuk Menentukan Gagrak Wayang KULIT. Mnemonic: Jurnal Teknik Informatika, 5(2), 98-102.
Putra, I. M. K. B. (2017). Analisis topik informasi publik media sosial di surabaya menggunakan pemodelan latent dirichlet allocation (LDA) (Doctoral dissertation, Institut Teknologi Sepuluh Nopember).
Ramdhani, Y. (2015). Komparasi Algoritma LDA Dan Naïve Bayes Dengan Optimasi Fitur Untuk Klasifikasi Citra Tunggal Pap Smear. Jurnal Informatika, 2(2).
Sanjaya, J., & Ayub, M. (2020). Augmentasi Data Pengenalan Citra Mobil Menggunakan Pendekatan Random Crop, Rotate, dan Mixup. Jurnal Teknik Informatika dan Sistem Informasi, 6(2).
Sentosa, E., Mulyana, D. I., Cahyana, A. F., & Pramuditasari, N. G. (2022). Implementasi Image Classification pada Batik Motif Bali dengan Data Augmentation dan Convolutional Neural Network. Jurnal Pendidikan Tambusai, 6(1), 1451-1463.
Sinulingga, S., Fatichah, C., & Yuniarti, A. (2016). Pengenalan Wajah Menggunakan Two Dimensional Linear Discriminant Analysis Berbasis Optimasi Feature Fusion Strategy. JATISI (Jurnal Teknik Informatika dan Sistem Informasi), 3(1), 1-11.
Solihin, A., Mulyana, D. I., & Yel, M. B. (2022). Klasifikasi Jenis Alat Musik Tradisional Papua menggunakan Metode Transfer Learning dan Data Augmentasi. Jurnal SISKOM-KB (Sistem Komputer dan Kecerdasan Buatan), 5(2), 36-44.