International Journal of Interactive Mobile Technologies (iJIM) – eISSN: 1865-7923 – Vol. 16, No. 04, 2022 Short Paper—Shared Nearest Neighbour in Text Mining for Classification Material in Online Learning… Shared Nearest Neighbour in Text Mining for Classification Material in Online Learning Using Mobile Application https://doi.org/10.3991/ijim.v16i04.28991 Irawan Dwi Wahyono1(), Djoko Saryono1, Hari Putranto1, Khoirudin Asfani1, Harits Ar Rosyid1, Sunarti1, Mohd Murtadha Mohamad2, Mohd Nihra Haruzuan Bin Mohamad Said2, Gwo Jiun Horng3, Jia-Shing Shih3 1Universitas Negeri Malang, East Java, Indonesia 2Universiti Teknologi Malaysia, Johor Bahru, Malaysias 3Southern Taiwan University of Science and Technology, Tainan, Taiwan irawan.dwi.ft@um.ac.id Abstract—There are many resources for media learning in online learning that all of the teachers made many media which it made a problem if there have the same subject and material. This problem made online learning having a big database and many materials made useless because the material has the same purpose. The big problem in overload database is that online learning can’t be accessed by everyone. This research to fix this problem developed an algorithm in Artificial Intelligence for the classification of material in online learning with the same subject and purpose so that teachers can use already media. This algo- rithm is text mining and Shared Nearest Neighbour (SSN) that is embedded in the mobile application to display the classification and the location of searching media in database online learning. The testing in this research applied in 142 media with 130 data training and 12 data testing is the result of testing is 94.7% of the accuracy of the algorithm and The average of validation is 73.33%. Keywords—text mining, classification, mobile application 1 Introduction The effect of the pandemic era is that all learning uses online learning in web-based applications. This reason is that all of the face-to-face learning move to online learn- ing to avoid the coronavirus in the class. The problem is that all teachers must make a media learning such as a video, text note, and animation and upload in online learning so it makes overload in database online learning [1–3]. If online learning is an overload in the database, it makes a big problem that it can’t be accessed by all people. The solving this problem is that if there is the same material with the same subject or topic, another teacher doesn’t need to upload it. Another problem is that how to know if the material already exists in online learning [2–3]. It can fix by another application to classify the material in online learning and the position of material in online learning so iJIM ‒ Vol. 16, No. 04, 2022 159 https://doi.org/10.3991/ijim.v16i04.28991 mailto:irawan.dwi.ft@um.ac.id Short Paper—Shared Nearest Neighbour in Text Mining for Classification Material in Online Learning… the teacher can use the material for their teaching in online learning [1–3]. Clearly, the problem will be fixed by making an application to classify all of the material in online learning based on topic and subject with a specific category. Another hand, There is much material in online learning with the same topic and same subject. It makes it useless and loading in the database. For instance, video of introduction of a network computer already exists in 10–11 file that is made database that is an overload and difficult to find the specific material with the result of time out. The problem will be worst if many people find and access with the same time that the result condition of online learning is down and can’t be accessed [4–5]. Meanwhile, many research uses artificial intelligence (AI) to solve this problem but it needs more resources for online learning. Online learning has a limited resource, so this problem will be fixed by using AI that is a little resource such as machine learning. Machine learning had used in much online learning for assessing a student and grading the student. Machine learning can be integrated into online learning with the web-based application but now, all people use the mobile application in online learning [6–7]. Obviously, now, online learning needs machine learning in the mobile application to solve the many problems in a database online learning such as making classification of material online learning. However, now online learning uses machine learning that is used for optimization or effective of usage online learning. This reason is that online learning has limited resources and is accessed by many peoples. For instance, all students use online learn- ing in the morning in the pandemic era, so the database will overload because of the time and the total of accessing it [8–10]. Online learning needs more space or elimi- nated material in there if the material is the same as another material in the same subject or topic and make classification of the material. Meanwhile, there are many machine learning algorithm that is integrated into online learning but just for assessment user or grading the user in online learning [10–11]. To illustrate, a text-mining algorithm is used for assessment, or naïve Bayes is used to classifying the ability of users or students [11–13]. Two algorithms can use for classification material in the database in online learning by processing the title of the file. Needless to say, after the processing of classification, if the material already exists in there, the teacher doesn’t make or upload another material in there. The problem is fixed by using a machine learning algorithm that is mobile-based. The purpose of this research made a mobile application to classify the material online learning based on each category of subject. The application uses a machine-learning algorithm to make classification by using the title of the file in the material in database online learning. The algorithm is text mining and Shared Nearest Neighbour (SNN) Algorithm. The length of the title is processed by a text-mining algorithm and after that gives a weighting for each word in the title. Every title of file with weighting has a value that is calculated by SNN to get near the cluster for each category. The utilization of this is a mobile application to know the accuracy of the algorithm and the result of classification. At the end of this research will be tested in a real database of online learning to get a real validation of the result of the application. 160 http://www.i-jim.org Short Paper—Shared Nearest Neighbour in Text Mining for Classification Material in Online Learning… 2 Method This study uses 2 methods, namely the Text Mining algorithm and the SNN Algo- rithm. Text mining performs the process of retrieval of training data, data testing, tokenizing, filtering, steaming, and cleaning, and then finally weighting the value of the IDF TF algorithm by grouping SNN based on the closeness of similarity values to classify each type of document contained in online learning. The processing in this research is shown in Figure 1. Fig. 1. The processing of this research 2.1 Text mining The stages of text mining include the cleaning, tokenizing and filtering processes. In the cleaning process, words are truncated in file titles that exceed 12 words. So from the initial data in the database, if a title is found that has more than 12 words, the 13th word to the last word is omitted. The tokenizing process in this study begins by taking the practical work title data in the database, from the practical work title data then the tokenizing process is carried out. The results of the tokenizing process are stored back in the database. In this study, the filtering process was carried out using a stop list model or eliminating words that were not important. First, the words that are considered unimportant are stored in the database, namely in, to, from, and, to, at, or. Once stored in the database, unimportant words will be called to match the words in each title. If one of the stop lists is found in the file title, the word will be deleted by the system. The results of this filtering process are then stored in a database. Then the weighting is based on the title match using the TF-IDF equation as in equation 1 [13–14] to produce several categories. IDF d df = log (1) The description of equation 1 is that IDF is the value of Frequency Document Invers, df is the total of frequency document and d is the total of the document. The sample of TF-IDF is shown in Table 1 that F is the Name of File and item is component text in the title of the file. This sample of calculation in this research uses 10 titles of files in the database of material online learning. iJIM ‒ Vol. 16, No. 04, 2022 161 Short Paper—Shared Nearest Neighbour in Text Mining for Classification Material in Online Learning… Table 1. The result of TF-IDF on sample Text in the Title of the File TF IDF F1 F2 F3 …. F10 Log Network 0 0 1 …. 0 1.5522 Security 1 1 1 … 1 1.4273 Technology 0 1 1 … 0 0.5980 ……. …. …. …. …. …. ….. RPL 1 0 1 …. 1 2.0293 After all of the documents have value based on weighting using TF-IDF and have many categories of the file, the application will a clustering based on near of value each of file of media in online learning using SNN algorithm. 2.2 Shared nearest neighbour (SNN) algorithm The Shared Nearest Neighbour (SNN) algorithm is a grouping process on high-dimensional data that has been developed [15–17]. The SNN algorithm requires 3 input parameters, namely, k which is the number of nearest neighbors, e which is the shared neighbor threshold value, and mint which is the minimum amount of data for each group. Shared nearest neighbor algorithm (SNN) steps in this research is [15–17] 1. Calculating the similarity value from the existing data 2. Form a list of the k-nearest neighbors of each data point for k data 3. Forming a neighboring graph from a list of k nearest neighbors 4. Find the density for each data 5. Finding representative points 6. Form a group of these representative points Meanwhile, to calculate the similarity distance between titles, the Euclidean equa- tion is used. Euclidean equality is the determination of the square root of the difference between the coordinates of a pair of objects. The distance vectors x and y (x, y) is shown in equation 2 [15],[17]. sim x y d x yi ii n ( , ) ( )� � � �� 2 1 (2) Where x and y are n-dimensional vectors. For example, after calculating TF-IDF, 10 of the title of the file is processed in the SNN algorithm and the result of the example is shown in Table 2. 162 http://www.i-jim.org Short Paper—Shared Nearest Neighbour in Text Mining for Classification Material in Online Learning… Table 2. The result of SNN in data set Text in the Title of the File d F1 F2 F3 ….. F10 Network 0.011 0.2211 2.4095 ….. 0 Security 0.3576 0.3576 0 …. 2.0372 Technology 4.1183 0 0.5980 ….. 0.5980 Network 0 0.5980 0 ….. 0.5108 …. …. …. …. …. …. Total 9.1302 11.6856 22.4470 ….. 2.2154 Distance vectors 3.0216 3.4184 1.4884 …… 4.1349 Most of the styles are intuitive. However, we invite you to read carefully the brief description below. 3 Result and discussion This section is about implementation and testing. The implementation uses a mobile application and it has a validation submission. The testing in this research makes testing for algorithm and validation. After testing, the analyses data will check the accuracy in algorithm and application. 3.1 Implementation Fig. 2. The processing of this research iJIM ‒ Vol. 16, No. 04, 2022 163 Short Paper—Shared Nearest Neighbour in Text Mining for Classification Material in Online Learning… Implementation made in a mobile platform that is shown in Figure 2. The use of this application is 1. Users must log in as a validator or as a teacher or lecture. 2. Select one or more choosing the subject of material. This application has 3 option to choose: Teknik Elektro, Pendidikan Teknik Elektro, or Teknik Informatika material. This step makes auto-select the database based on the subject. 3. Load database that had done to initial and pre-processing step. 4. Afterload the database, the application shows the result of classification based on the category of subject. 5. Users can use an option detail in each category to show the species of material online based on a specific subject. 6. After the user shows the detail of the result of each category, the user can give validation for each category or all categories. 3.2 Testing The testing in the application has 2 sections for checking accuracy. First is the testing of the algorithm for knowing the effective and valid algorithm. 1) Testing for algorithm The application is embedded in online learning with the database of material and then it is tested in all of the systems to get accuracy by using a classification algorithm. The technical of testing is K-Fold Cross-validation to get the performance of this algo- rithm. In K-Fold Cross Validation, the data set of training divide into all of the multiple random values (k) without replacement where a multiply equal with the sum of k-1 as model training. Besides that one of the rest from multiple is used for testing. This step was repeated by all of k so the kind of model and the calculation of performance was the same repeated by all of k. The total data set in this research for testing is 200 data with 10 data for each cate- gory. Selecting of sample is used by testing of data with the random method for each of category that is done for spreading of data rated in all of the categories. Data set is got from the labeling of all material in online learning in a specific subject that is electrical engineering subject. The result of testing to get the performance of the application showed in data of qualitative that is presented by the implementation of the algorithm. The data of the result of performance is got from 10 times of testing using k-Fold Cross-Validation. The total sample is 200 of material in online learning in a specific subject and the result is showed in Table 3. 164 http://www.i-jim.org Short Paper—Shared Nearest Neighbour in Text Mining for Classification Material in Online Learning… Table 3. The result of testing in the algorithm Testing Accuracy Precision Recall 1 94.29 82.50 71.74 2 94.76 85.09 73.25 3 92.80 79.60 64.18 4 93.29 76.34 66.96 5 94.92 83.65 72.53 6 94.34 83.21 70.61 7 93.50 75.05 66.90 8 94.92 84.01 73.24 9 94.23 79.52 71.29 10 93.16 76.92 66.22 Average 94.7 80.6 70.69 Based on Table 3, the testing for performance using Text mining and SNN algorithm with k-Fold Cross-validation is got the result that for the average of accuracy is 94.7%, the average of precision is 80,6% and lastly, the average of recall is 70.69%. 2) Testing for validation The processing of validation is the same with testing in the algorithm that it has taken 10 times to test the validation. This testing took 3 validators to check the result of the classification of material in online learning. The validator use application that is showed in Figure 2. The validator is a teacher that teaches an electrical engineering sub- ject. The teachers were checked all of the material that had been classified and they sent feedback by application. The format of feedback is valid or no. The result of validation is shown in Table 4. Table 4. The result of the validation of the application Testing Validator 1 Validator 2 Validator 3 1 Valid Valid Valid 2 Valid Valid Valid 3 Valid No Valid 4 Valid No No 5 Valid Valid Valid 6 Valid No Valid 7 No No No 8 Valid Valid No 9 Valid No No 10 Valid Valid No Average 90 60 60 iJIM ‒ Vol. 16, No. 04, 2022 165 Short Paper—Shared Nearest Neighbour in Text Mining for Classification Material in Online Learning… Based on Table 4, the analysis is 1. Validator 1 was given a 9 valid status in 10 times of testing. 2. Validator 2 was given a 5 valid status in 10 times of testing. The validator gave no valid in testing 3, 4, 6, 7 and 9. 3. Validator 3 was given a 5 valid status in 10 times of testing. The validator gave no valid status in testing 5,7, 8, 9, and 10. However, Validator 2 and Validator 3 given same the sum of valid status but they had a different number in testing given no valid status. The result of validation is that the average is 73.33%. The result of Table 1 and Table 2 has a relationship about the result of validation and the result of the recall. If the value of recall is high in Table 1, all validators in Table 2 are given valid status in their feedbacks. All of the testing given a significant average both testing in algorithm and testing invalidation. The average rate for all testing is 83, 5% that this research success to classify the material on online learning based on a specific subject. 4 Conclusion This research made a mobile application to classify the material online learning based on each category of subject. The application uses a machine-learning algorithm to make classification by using the title of the file in the material in database online learning. The algorithm is text mining and Shared Nearest Neighbour (SNN) Algo- rithm. The length of the title is processed by a text-mining algorithm and after that gives a weighting for each word in the title. Every title of file with weighting has a value that is calculated by SNN to get near the cluster for each category. The end of processing is that there are many categories of the subject with each of specific material online learn- ing. Clearly, this application helps teachers or students to find material online learning based on specific subjects and topics in online learning material 5 Acknowledgment This research funded by PNBP Universitas Negeri Malang, Indonesia in 2021. 6 References [1] Martin, F., Sun, T., & Westine, C. D. (2020). A systematic review of research on online teaching and learning from 2009 to 2018. Computers & Education, 159, 104009. https://doi. org/10.1016/j.compedu.2020.104009 [2] Rasheed, R. A., Kamsin, A., & Abdullah, N. A. (2020). Challenges in the online component of blended learning: A systematic review. Computers & Education, 144, 103701. https://doi. org/10.1016/j.compedu.2019.103701 166 http://www.i-jim.org https://doi.org/10.1016/j.compedu.2020.104009 https://doi.org/10.1016/j.compedu.2020.104009 https://doi.org/10.1016/j.compedu.2019.103701 https://doi.org/10.1016/j.compedu.2019.103701 Short Paper—Shared Nearest Neighbour in Text Mining for Classification Material in Online Learning… [3] Wahyono, I., Saryono, D., Asfani, K., Ashar, M., & Sunarti, S. (2020). Smart online courses using computational intelligence. International Journal of Interactive Mobile Technologies (iJIM), 14(12), 29–40. https://doi.org/10.3991/ijim.v14i12.1560 [4] Marcus, V. B., Atan, N. A., Yusof, S. M., & Tahir, L. (2020). A systematic review of e-service learning in higher education. International Journal of Interactive Mobile Technol- ogies, 14(6), 4–14. https://doi.org/10.3991/ijim.v14i06.13395 [5] Hoi, S. C., Sahoo, D., Lu, J., & Zhao, P. (2021). Online learning: a comprehensive survey. Neurocomputing, 459, 249–289. https://doi.org/10.1016/j.neucom.2021.04.112 [6] Joy, J., & Pillai, R. V. G. (2021). Review and classification of content recommenders in e-learning environment. Journal of King Saud University-Computer and Information Sci- ences. https://doi.org/10.1016/j.jksuci.2021.06.009 [7] Wahyono, I. D., Fadlika, I., Asfani, K., Putranto, H., & Hammad, J. (2019, October). New Adaptive Intelligence Method for Personalized Adaptive Laboratories. In 2019 International Conference on Electrical, Electronics and Information Engineering (ICEEIE) (Vol. 6, pp. 196–200). IEEE. https://doi.org/10.1109/ICEEIE47180.2019.8981477 [8] Zahour, O., Benlahmar, E. H., Eddaouim, A., & Hourrane, O. (2020). A comparative study of machine learning methods for automatic classification of academic and vocational guid- ance questions. International Journal of Interactive Mobile Technologies, 14(8), 43–60. https://doi.org/10.3991/ijim.v14i08.13005 [9] Luo, X. (2021). Efficient English text classification using selected machine learning tech- niques. Alexandria Engineering Journal, 60(3), 3401–3409. https://doi.org/10.1016/j. aej.2021.02.009 [10] Wahyono, I. D., Putranto, H., Asfani, K., & Afandi, A. N. (2019, September). VLC-UM: A Novel Virtual Laboratory using Machine Learning and Artificial Intelligence. In 2019 International Seminar on Application for Technology of Information and Communication (iSemantic) (pp. 360–365). IEEE. https://doi.org/10.1109/ISEMANTIC.2019.8884288 [11] Cheng, M. Y., Kusoemo, D., & Gosno, R. A. (2020). Text mining-based construction site accident classification using hybrid supervised machine learning. Automation in Construc- tion, 118, 103265. https://doi.org/10.1016/j.autcon.2020.103265 [12] Baharudin, N. A., & Jantan, H. (2019). Mobile-based word matching detection using intel- ligent predictive algorithm. International Journal of Interactive Mobile Technologies, 13(9), 140–151. https://doi.org/10.3991/ijim.v13i09.10848 [13] Wahyono, I. D., Saryono, D., Ashar, M., & Asfani, K. (2019, September). Face Emotional Detection Using Computational Intelligence Based Ubiquitous Computing. In 2019 Inter- national Seminar on Application for Technology of Information and Communication (iSemantic) (pp. 389–393). IEEE. https://doi.org/10.1109/ISEMANTIC.2019.8884320 [14] Kumar, S., Kar, A. K., & Ilavarasan, P. V. (2021). Applications of text mining in services management: A systematic literature review. International Journal of Information Manage- ment Data Insights, 1(1), 100008. https://doi.org/10.1016/j.jjimei.2021.100008 [15] Xie, X., Fu, Y., Jin, H., Zhao, Y., & Cao, W. (2020). A novel text mining approach for scholar information extraction from web content in Chinese. Future Generation Computer Systems, 111, 859–872. https://doi.org/10.1016/j.future.2019.08.033 [16] Liu, R., Wang, H., & Yu, X. (2018). Shared-nearest-neighbor-based clustering by fast search and find of density peaks. Information Sciences, 450, 200–226. https://doi.org/10.1016/j. ins.2018.03.031 [17] Wahyono, I. D., Ashar, M., Fadlika, I., Asfani, K., & Saryono, D. (2019, October). A New Computational Intelligence for Face Emotional Detection in Ubiquitous. In 2019 Interna- tional Conference on Electrical, Electronics and Information Engineering (ICEEIE) (Vol. 6, pp. 148–153). IEEE. https://doi.org/10.1109/ICEEIE47180.2019.8981420 iJIM ‒ Vol. 16, No. 04, 2022 167 https://doi.org/10.3991/ijim.v14i12.1560 https://doi.org/10.3991/ijim.v14i06.13395 https://doi.org/10.1016/j.neucom.2021.04.112 https://doi.org/10.1016/j.jksuci.2021.06.009 https://doi.org/10.1109/ICEEIE47180.2019.8981477 https://doi.org/10.3991/ijim.v14i08.13005 https://doi.org/10.1016/j.aej.2021.02.009 https://doi.org/10.1016/j.aej.2021.02.009 https://doi.org/10.1109/ISEMANTIC.2019.8884288 https://doi.org/10.1016/j.autcon.2020.103265 https://doi.org/10.3991/ijim.v13i09.10848 https://doi.org/10.1109/ISEMANTIC.2019.8884320 https://doi.org/10.1016/j.jjimei.2021.100008 https://doi.org/10.1016/j.future.2019.08.033 https://doi.org/10.1016/j.ins.2018.03.031 https://doi.org/10.1016/j.ins.2018.03.031 https://doi.org/10.1109/ICEEIE47180.2019.8981420 Short Paper—Shared Nearest Neighbour in Text Mining for Classification Material in Online Learning… 7 Authors Irawan Dwi Wahyono is a lecture on Department of Engineering in Universitas Negeri Malang, Indonesia (Email: irawan.dwi.ft@um.ac.id). Djoko Saryono is a Professor on Department of Literature in Universitas Negeri Malang, Indonesia (Email: djoko.saryono.fs@um.ac.id). Hari Putranto is a lecture on Department of Engineering in Universitas Negeri Malang, Indonesia (Email: Hari.putranto.ft@um.ac.id). Khoirudin Asfani is a lecture on Department of Engineering in Universitas Negeri Malang, Indonesia (Email: khoirudin.asfani.ft@um.ac.id). Harits Ar Rosyid is a lecture on Department of Engineering in Universitas Negeri Malang, Indonesia (Email: harits.ar.ft@um.ac.id). Sunarti is a lecture on Department of Literature in Universitas Negeri Malang, Indonesia (Email: sunarti.fs@um.ac.id). Mohd Murtadha Mohamad is a lecture on School of Computing in Universiti Teknologi Malaysia, Malaysia (Email: murtadha@utm.my). Mohd Nihra Haruzuan Bin Mohamad Said is a lecture on Department of Educa- tional Sciences, Mathematics and Creative Multimedia Universiti Teknologi Malaysia, Malaysia (Email: nihra@utm.my). Gwo Jiun Horng is a lecture on Department of Computer Science and Informa- tion Engineering in Southern Taiwan University of Science and Technology, Taiwan (Email: grojium@stust.edu.tw). Jia-Shing Shih is a lecture on Department of Electrical Engineering in Southern Taiwan University of Science and Technology, Taiwan (Email: jasonshih@stust.edu. tw). Article submitted 2021-12-21. Resubmitted 2022-01-24. Final acceptance 2022-01-25. Final version published as submitted by the authors. 168 http://www.i-jim.org mailto:irawan.dwi.ft@um.ac.id mailto:djoko.saryono.fs@um.ac.id mailto:Hari.putranto.ft@um.ac.id mailto:khoirudin.asfani.ft@um.ac.id mailto:harits.ar.ft@um.ac.id mailto:sunarti.fs@um.ac.id mailto:murtadha@utm.my mailto:nihra@utm.my mailto:grojium@stust.edu.tw mailto:jasonshih@stust.edu.tw mailto:jasonshih@stust.edu.tw