LONTAR KOMPUTER VOL. 9, NO. 1, APRIL 2018
p-ISSN 2088-1541, e-ISSN 2541-5832
DOI: 10.24843/LKJITI.2018.v09.i01.p03

Poisonous Shrimp Detection System for Litopenaeus Vannamei using k-Nearest Neighbor Method

Abdullah Husin 1, Othman Mahmod 2, Lisa Afrinanda 3
1,3 Department of Information System, Universitas Islam Indragiri, Indonesia
2 Department of Fundamental and Applied Sciences, Universiti Teknologi PETRONAS, Malaysia
1 abdialam@yahoo.com, 2 mahmod.othman@utp.edu.my, 3 Lisaafrinanda@gmail.com

Abstract

Shrimp is one of the most important seafoods in the human diet. Although shrimp contains proteins that the human body needs, it sometimes contains toxins, owing to environmental factors or to catching methods that use toxins. The community should therefore take precautions when consuming shrimp. White shrimp (Litopenaeus vannamei) is a type of shrimp that is preferred for its delicious taste. The purpose of this research is to develop a computerized system for detecting poisonous white shrimp. White shrimps are divided into two categories: fresh white shrimps caught in a natural way (class A), and poisonous white shrimps caught using toxin (class B). The features used are RGB colors (red, green, and blue) and texture (energy, contrast, correlation, and homogeneity). Similarity-based classification is performed by the k-Nearest Neighbor (k-NN) algorithm. The experiment was conducted on a dataset of 90 white shrimp images. The holdout validation method was used to evaluate the system: the original dataset was divided into two parts, with 60 images used as training samples and 30 images used as test images. The evaluation results show a classification accuracy of 73.33%. The developed system is intended to help the community select good and safe white shrimps.
Keywords: White shrimp, Classification, k-Nearest Neighbor, Holdout

1. Introduction

Indonesia is one of the largest shrimp-producing countries in the world. About 77% of global shrimp production comes from Asian countries, including Indonesia. According to 2013 data from the Indonesian Ministry of Marine Affairs and Fisheries, Indonesia's fishery exports amounted to approximately 802,000 tons, valued at US$2.6 billion. Shrimp commodities accounted for the largest share of this value, at US$997 million [1]. White shrimp, or Litopenaeus vannamei (see Figure 1), is one of the best-selling shrimps; it is in great demand for its taste and is often served as a main dish in restaurants. White shrimp production is growing fast in Indonesia, and the species has advantages over other types of shrimp, such as a fast growth cycle. Shrimps are usually caught in one of two ways: (1) by natural means such as nets and non-toxic baits; or (2) with toxins such as decis and tuba.

Figure 1. White Shrimp (Litopenaeus Vannamei)

Consumers generally detect toxic shrimp by visual inspection alone. Poisonous shrimp detection tools are rarely used by ordinary people, and most tools have never been applied in the marketplace. Visual inspection is imprecise and inconsistent because of the limitations of the human senses and human negligence. As a result, consumers often suffer adverse effects such as dry throat, stomach aches, and itching [2]. The rapid development of computer hardware and software, supported by pattern recognition and image processing, has brought technological advances in the detection of objects through images.
Therefore, it is expected that the classification of white shrimps can be performed with the help of computers. Such a system is expected to benefit the community by helping to distinguish poisonous shrimps from natural shrimps.

2. Research Methodology

This study aims to develop a poisonous shrimp detection system for white shrimp. To achieve this objective, the study followed the steps shown in Figure 2: data collection, system development, and system evaluation.

Figure 2. Research Methodology

2.1. Data Collection

Images of 90 (ninety) randomly selected shrimp samples were acquired using a Xiaomi Note 1 digital camera (13 MP camera resolution). A tripod was used to keep the capture distance fixed at 40 cm. The samples fall into 2 (two) categories, namely poisonous white shrimps and natural white shrimps, with 45 samples in each category. All images were converted to the bitmap file type and resized to a resolution of 640 x 480 pixels. The images were then preprocessed using image processing techniques.

2.2. System Development

A classification system was constructed consisting of two subsystems: a class builder subsystem, used to build a knowledge database, and a classification subsystem, used to predict the category of unknown shrimps. The attributes or features used are color (red, green, and blue) and texture (energy, contrast, correlation, and homogeneity). These features are significant for image classification [3]. Building the database began with feature extraction. After features had been extracted from the sample images, the feature vector of each sample image was stored in the knowledge database. Classification was performed using the k-Nearest Neighbor method [4]. The feature vector of an unknown image was compared with the feature vectors of the sample images stored in the knowledge database. Similarity was then calculated from the distance between two feature vectors: the smaller the distance, the more similar the unknown image is to the corresponding sample image.

2.3. System Evaluation

System evaluation was performed to estimate the classification performance by using the holdout method [5]. In the holdout method, the original dataset with known class labels is partitioned into two parts: training data, used to build the knowledge database, and test data, used to test the performance of the system. The ratio of training data to test data was 2 to 1: of the 90 labeled images, 60 were stored in the knowledge database and the remaining 30 were used as test data. The classification results were summarized in a confusion matrix [6], which records how many images were categorized correctly and how many incorrectly. The confusion matrix was then used to measure the performance of the classification system. An example of a confusion matrix with two categories or classes is shown in Table 1.

Table 1. Confusion Matrix for Classification of Two Categories

| fij | Predicted category (j) = 1 | Predicted category (j) = 2 |
|---|---|---|
| Real category (i) = 1 | f11 | f12 |
| Real category (i) = 2 | f21 | f22 |

Each cell fij contains the number of objects of category i that were categorized as j. The total number of objects correctly categorized is f11 + f22 and the total number of objects incorrectly categorized is f12 + f21.
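The bookkeeping behind Table 1 can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names `confusion_matrix` and `summarize` and the toy label lists are assumptions introduced here, and the last value returned anticipates the accuracy measure that the paper defines next (correct predictions divided by all predictions).

```python
from collections import Counter

def confusion_matrix(actual, predicted, categories=(1, 2)):
    """Tally f[i][j]: number of objects of real category i predicted as category j."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(i, j)] for j in categories] for i in categories]

def summarize(f):
    """Return (correct, incorrect, accuracy) for a 2x2 confusion matrix f."""
    correct = f[0][0] + f[1][1]      # f11 + f22
    incorrect = f[0][1] + f[1][0]    # f12 + f21
    total = correct + incorrect
    accuracy = correct / total if total else 0.0
    return correct, incorrect, accuracy

# Hypothetical toy labels (not the paper's data); 1 and 2 are the two categories.
actual    = [1, 1, 1, 2, 2, 2]
predicted = [1, 2, 1, 2, 2, 1]

f = confusion_matrix(actual, predicted)
correct, incorrect, accuracy = summarize(f)
print(f, correct, incorrect, round(accuracy, 4))
```

With these toy labels, four of the six predictions fall on the diagonal of the matrix and two fall off it.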
Generally, the accuracy of the system is the ratio of the number of objects correctly categorized to the total number of predictions, which can be written as:

$$\text{Accuracy} = \frac{f_{11} + f_{22}}{f_{11} + f_{12} + f_{21} + f_{22}} \qquad (1)$$

3. Literature Review

This section reviews the concept of classification and briefly surveys research related to the k-NN algorithm and its applications.

3.1. Classification

Classification is the process of assigning objects or patterns to previously defined class labels, based on their characteristics or attributes [7]. The task of classification is to predict a categorical (discrete) target variable. Pattern classification is an important area of machine learning and artificial intelligence, and it has become an integral part of most intelligent systems and automated machines built for decision making. The input of a classification system is the pattern of an unknown object and the output is the category of that object, as shown in Figure 3. Pattern classification has been used for prediction and decision making [8].

Figure 3. The Block Diagram of Classification System (Pattern, Classification System, Category)

A classifier is a function that maps a pattern or object, represented as a feature vector, to one of the class labels. In other words, a classifier is an algorithm used to perform classification tasks. There are several approaches to classification: (1) similarity-based; (2) probabilistic; (3) constructing decision boundaries; and (4) combining classifiers.

3.2. K-Nearest Neighbor

K-Nearest Neighbor (k-NN) is a classification algorithm based on the similarity approach. It is effective and widely used in classification problems.
The advantages of this method are that it is reasonably simple, popular, effective, and efficient. It is often applied and gives good results [9]. Similar objects are classified into the same category. Similarity is determined by the distance between the sample data and the object: the closer they are, the more similar. An object is assigned to the class held by the majority of its nearest neighbors, where the parameter k is the number of nearest neighbors considered.

Figure 4 illustrates the k-NN method. The task in the figure is to determine the category of the green circle: blue square or red triangle. If k = 3, the green circle is categorized as a red triangle, because there are 2 red triangles and only 1 blue square inside the inner circle. If k = 5, the green circle is categorized as a blue square, because there are 3 blue squares versus 2 red triangles inside the outer circle.

The k-NN algorithm consists of two main steps: (1) find the k objects in the sample that are closest to the unknown object according to a distance metric on the feature vectors; and (2) take a vote among these k closest objects to determine the class of the unknown object. The accuracy of k-NN depends on the distance metric and the value of k. The distance metric generally used is the Euclidean distance [10], shown in Equation (2). Given two vectors x = [x1, x2, x3, ..., xn] and y = [y1, y2, y3, ..., yn], the distance between them is:

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \qquad (2)$$

4. Results and Discussions

The system consists of two subsystems: a class builder subsystem, which forms the knowledge database, and a classification subsystem, which classifies unknown shrimps.

4.1. Class Builder Subsystem

The class builder subsystem provides several buttons that the developer can use to build a knowledge database. Sample images were used to build a database consisting of feature vectors and classes.
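A database of feature vectors and classes, combined with the two k-NN steps described in Section 3.2, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the system's actual code: the names `euclidean` and `knn_classify`, the 2-D coordinates, and the labels "square" and "triangle" are hypothetical, chosen only so that the neighbor counts mimic the k = 3 and k = 5 cases of the k-NN illustration (the paper's real feature vectors have seven color and texture components).

```python
import math
from collections import Counter

def euclidean(x, y):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def knn_classify(database, query, k):
    """Classify `query` by majority vote among its k nearest database entries.

    `database` is a list of (feature_vector, class_label) pairs.
    """
    # Step 1: find the k stored samples closest to the query.
    nearest = sorted(database, key=lambda item: euclidean(item[0], query))[:k]
    # Step 2: vote among the class labels of those k samples.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical knowledge database: (feature vector, class) pairs.
database = [
    ((0.9, 0.0), "square"),
    ((0.0, 1.0), "triangle"),
    ((-1.1, 0.0), "triangle"),
    ((0.0, -2.0), "square"),
    ((2.1, 0.0), "square"),
]
query = (0.0, 0.0)  # plays the role of the green circle

print(knn_classify(database, query, k=3))
print(knn_classify(database, query, k=5))
```

With two classes, an odd k (as in the paper's choices of 1, 3, 5, and 7) guarantees that the vote cannot end in a tie.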
The interface of the class builder subsystem can be seen in Figure 5.

Figure 4. k-NN Illustration

Figure 5. Class Builder Subsystem Interface

4.2. Classification Subsystem

The classification subsystem classifies, or detects, images of white shrimps. It provides several buttons that the user can use to perform the classification of shrimps. The interface of the classification subsystem can be seen in Figure 6.

Figure 6. Classification Subsystem Interface

4.3. Results and Discussion

A system evaluation was carried out to measure the performance of the detection system. The evaluation was performed for several parameter values: k = 1, k = 3, k = 5, and k = 7. Validation used the holdout method, with 60 images as training data and 30 images as test data.

Table 2. Confusion Matrix for k = 1

| fij | Predicted Class A | Predicted Class B |
|---|---|---|
| Original Class A | 10 | 5 |
| Original Class B | 3 | 12 |

The confusion matrix can be used to measure classification performance. The confusion matrix for k = 1 is shown in Table 2. Of the 30 test images, 22 were correctly classified and the remaining 8 were misclassified, giving an accuracy of 73.33%.

Table 3. Confusion Matrix for k = 3

| fij | Predicted Class A | Predicted Class B |
|---|---|---|
| Original Class A | 10 | 5 |
| Original Class B | 4 | 11 |

Based on Table 3, for k = 3 the system correctly classified 21 of the 30 test images and misclassified the remaining 9, giving an accuracy of 70%.

Table 4. Confusion Matrix for k = 5

| fij | Predicted Class A | Predicted Class B |
|---|---|---|
| Original Class A | 10 | 5 |
| Original Class B | 4 | 11 |

Based on Table 4, for k = 5 the system correctly classified 21 of the 30 test images and misclassified the remaining 9, again giving an accuracy of 70%.

Table 5. Confusion Matrix for k = 7

| fij | Predicted Class A | Predicted Class B |
|---|---|---|
| Original Class A | 9 | 6 |
| Original Class B | 6 | 9 |

Based on Table 5, for k = 7 the system correctly classified 18 of the 30 test images and misclassified the remaining 12, giving an accuracy of 60%. Some examples of white shrimps correctly classified by the system can be seen in Figure 7.

Category: Class A (Natural); Distance: 2.83; Similarity: 96.98%
Category: Class B (Poisonous); Distance: 2.45; Similarity: 95.84%

Figure 7. White shrimps that are correctly classified

The system reports the detected class, which is one of two possibilities: natural white shrimp or poisonous white shrimp. The distance value is calculated using the Euclidean distance metric. The similarity value indicates the percentage of similarity between the target white shrimp and the sample images in the knowledge database. If the similarity of the target object is below the 50% threshold, the object is rejected automatically. A test of the system's ability to reject other objects is shown in Figure 8.

Category: Rejected; Distance: 93.01; Similarity: 25%

Figure 8. Object that is rejected to be classified

As Figure 8 shows, when the percentage of similarity between the test image and the training images is less than 50%, the classification system rejects the object. The developed system is therefore able to reject foreign objects.

5. Conclusion

The poisonous white shrimp detection system has been successfully developed using the k-Nearest Neighbor method. The features used were RGB colors (red, green, and blue) and texture (energy, contrast, correlation, and homogeneity). The level of similarity was measured through the feature vector distance, using the Euclidean distance metric. The system makes a prediction if the percentage of similarity reaches the 50% threshold and rejects the object otherwise. System performance was evaluated using the holdout validation method, with 60 images used to build the knowledge database and 30 images used for testing. The experiment was performed for several parameter values: k = 1, k = 3, k = 5, and k = 7. Based on the performance evaluation using the confusion matrix, the best accuracy was 73.33%, obtained with k = 1. Nevertheless, the accuracy still needs to be improved. Higher accuracy could not be achieved because of several factors: (1) the shrimp samples were not all collected at the same time; (2) the quality of the camera was relatively low; and (3) the image segmentation process was imperfect. Future work should address these factors to improve the performance of the system.

References

[1] D. Novita, T. R. Ferasyi, and Z. A. Muchlisin, “Intensitas dan Prevalensi Ektoparasit Pada Udang Pisang (Penaeus sp.) Yang Berasal dari Tambak Budidaya di Pantai Barat Aceh,” Jurnal Ilmiah Mahasiswa Kelautan dan Perikanan Unsyiah, vol. 1, no. 3, pp. 268–279, 2016.
[2] M. Prashanth and C. Indranil, “Food Poisoning: Illness Ranges from Relatively Mild Through To Life Threatening,” Journal of Medical and Health Sciences, vol. 5, no. 4, pp. 1–19, 2016.
[3] Abdullah, Usman, and M. Efendi, “Sistem Klasifikasi Kualitas Kopra berdasarkan Warna dan Tekstur Menggunakan Metode Nearest Classifier (NMC),” Jurnal Teknologi Informasi dan Ilmu Komputer (JTIIK), vol. 4, no. 4, pp. 297–303, 2017.
[4] S. Zhang, X. Li, M. Zong, X. Zhu, and R. Wang, “Efficient kNN Classification With Different Numbers of Nearest Neighbors,” IEEE Transactions on Neural Networks and Learning Systems, pp. 1–12, 2017.
[5] P. Galdi and R. Tagliaferri, “Data Mining: Accuracy and Error Measures for Classification and Prediction,” in Reference Module in Life Sciences, no. January, Elsevier, 2018, pp. 1–14.
[6] J. M. Kirimi and C. A. Moturi, “Application of Data Mining Classification in Employee Performance Prediction,” International Journal of Computer Applications, vol. 146, no. 7, pp. 28–35, 2016.
[7] N. C. S. Reddy, K. S. Prasad, and A. Mounika, “Classification Algorithms on Datamining: A Study,” International Journal of Computer Intelligence Research, vol. 13, no. 8, pp. 2135–2142, 2017.
[8] P. Sagar, Prinima, and Indu, “Analysis of Prediction Techniques based on Classification and Regression,” International Journal of Computer Applications, vol. 163, no. 7, pp. 47–51, 2017.
[9] M. Kibanov, M. Becker, J. Mueller, M. Atzmueller, A. Hotho, and G. Stumme, “Adaptive kNN using Expected Accuracy for Classification of Geo-Spatial Data,” in Proceedings of Symposium on Applied Computing (SAC), 2017, pp. 1–9.
[10] E. López-Iñesta, F. Grimaldo, and M. Arevalillo-Herráez, “Classification similarity learning using feature-based and distance-based representations: A comparative study,” Applied Artificial Intelligence, vol. 29, no. 5, pp. 445–458, 2015.