International Journal of Interactive Mobile Technologies (iJIM) – eISSN: 1865-7923 – Vol. 16, No. 04, 2022


Short Paper—Shared Nearest Neighbour in Text Mining for Classification Material in Online Learning…

Shared Nearest Neighbour in Text Mining for Classification Material in Online Learning Using Mobile Application

https://doi.org/10.3991/ijim.v16i04.28991

Irawan Dwi Wahyono1(), Djoko Saryono1, Hari Putranto1, Khoirudin Asfani1, 
Harits Ar Rosyid1, Sunarti1, Mohd Murtadha Mohamad2,  

Mohd Nihra Haruzuan Bin Mohamad Said2, Gwo Jiun Horng3, Jia-Shing Shih3
1Universitas Negeri Malang, East Java, Indonesia

2Universiti Teknologi Malaysia, Johor Bahru, Malaysias
3Southern Taiwan University of Science and Technology, Tainan, Taiwan

irawan.dwi.ft@um.ac.id

Abstract—Online learning platforms hold many learning media because every teacher creates their own, which becomes a problem when several media cover the same subject and material. The database grows large, and much of the material is redundant because it serves the same purpose. The main consequence of an overloaded database is that online learning cannot be accessed by everyone. To address this problem, this research developed an Artificial Intelligence algorithm that classifies online learning material with the same subject and purpose so that teachers can reuse existing media. The algorithm combines text mining and Shared Nearest Neighbour (SNN) and is embedded in a mobile application that displays the classification and the location of the searched media in the online learning database. The algorithm was tested on 142 media, with 130 used as training data and 12 as testing data. The test results show an algorithm accuracy of 94.7% and an average validation of 73.33%.

Keywords—text mining, classification, mobile application

1 Introduction

The pandemic moved face-to-face learning to web-based online learning in order to avoid spreading the coronavirus in class. As a result, every teacher must create learning media such as videos, text notes, and animations and upload them to online learning, which overloads the online learning database [1–3]. An overloaded database is a serious problem because online learning can then no longer be accessed by everyone. One solution is that, when material on the same subject or topic already exists, other teachers do not need to upload it again. The remaining problem is how to know whether the material already exists in online learning [2–3]. This can be solved with a separate application that classifies the material in online learning and reports its location, so that teachers can reuse the material in their own teaching [1–3]. The approach taken here is therefore an application that classifies all online learning material by topic and subject into specific categories.

On the other hand, online learning contains much material with the same topic and subject. This duplicate material is redundant and loads the database. For instance, a video introducing computer networks may already exist in 10–11 files, which overloads the database and makes specific material difficult to find, so that searches end in a timeout. The problem becomes worse when many people search and access online learning at the same time, because online learning then goes down and cannot be accessed [4–5]. Many studies use artificial intelligence (AI) to solve this problem, but these approaches need more resources. Because online learning has limited resources, the problem is better addressed with AI that needs few resources, such as machine learning. Machine learning has been used in many online learning systems to assess and grade students. It can be integrated into web-based online learning, but users now access online learning through mobile applications [6–7]. Online learning therefore needs machine learning in a mobile application to solve database problems such as classifying online learning material.

Currently, online learning uses machine learning mainly to optimize or improve the effectiveness of its usage, because online learning has limited resources and is accessed by many people. For instance, during the pandemic all students use online learning in the morning, so the database is overloaded by the timing and the total number of accesses [8–10]. Online learning needs either more space or the elimination of material that duplicates other material on the same subject or topic, together with a classification of the material. Meanwhile, many machine learning algorithms are integrated into online learning, but only for assessing or grading users [10–11]. To illustrate, a text-mining algorithm is used for assessment, and naïve Bayes is used to classify the ability of users or students [11–13]. Two algorithms can be used to classify material in the online learning database by processing the titles of the files. Once the classification shows that material already exists, the teacher does not need to make or upload it again.

This research addresses the problem with a mobile-based machine learning algorithm. Its purpose is to build a mobile application that classifies online learning material by subject category. The application uses a machine-learning algorithm that classifies material by the titles of the files in the online learning database. The algorithm combines text mining and the Shared Nearest Neighbour (SNN) algorithm. The text-mining algorithm processes the length of each title and then assigns a weight to each word in the title. The weighted title of every file has a value that SNN uses to find the nearest cluster for each category. The mobile application is used to measure the accuracy of the algorithm and to show the classification result. Finally, the application is tested on a real online learning database to obtain a real validation of its results.


2 Method

This study uses two methods, namely the text mining algorithm and the SNN algorithm. Text mining retrieves the training and testing data and then performs tokenizing, filtering, stemming, and cleaning. Finally, each word is weighted with the TF-IDF algorithm, and SNN groups the files by the closeness of their similarity values to classify each type of document contained in online learning. The processing in this research is shown in Figure 1.

Fig. 1. The processing of this research

2.1 Text mining

The stages of text mining include the cleaning, tokenizing, and filtering processes. In the cleaning process, file titles that exceed 12 words are truncated: if a title in the database has more than 12 words, the 13th word through the last word are omitted.
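The cleaning rule can be sketched in a few lines. This is only an illustration of the 12-word limit described above; the function name `clean_title` and the placeholder title are assumptions, not part of the application.

```python
MAX_TITLE_WORDS = 12  # limit stated in the text


def clean_title(title: str, max_words: int = MAX_TITLE_WORDS) -> str:
    """Keep the first `max_words` words of a title and drop the rest."""
    words = title.split()
    return " ".join(words[:max_words])


# Hypothetical 15-word title made of placeholder words w1 ... w15.
long_title = " ".join(f"w{i}" for i in range(1, 16))
print(clean_title(long_title))  # only w1 ... w12 remain
```

Titles with 12 words or fewer pass through unchanged.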

The tokenizing process begins by retrieving the title data from the database and then splitting each title into tokens. The results of the tokenizing process are stored back in the database. The filtering process uses a stop list model, which eliminates words that are not important. First, the words considered unimportant are stored in the database, namely in, to, from, and, at, or. Each stored word is then matched against the words in each title. If a word from the stop list is found in a file title, the system deletes it. The results of the filtering process are then stored in the database. The weighting is then based on the title match using the TF-IDF equation, as in equation 1 [13–14], to produce several categories.

 
IDF

d
df

= log
 

(1)

In equation 1, IDF is the inverse document frequency, df is the document frequency (the number of documents that contain the term), and d is the total number of documents. A sample of the TF-IDF calculation is shown in Table 1, where F is the name of a file and each item is a text component of the file title. This sample calculation uses 10 file titles from the database of online learning material.
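The tokenizing, filtering, and weighting steps can be sketched as follows. This is a minimal sketch under stated assumptions: the three titles are hypothetical, the function names are illustrative, and the natural logarithm is assumed because the text does not state the base of the log in equation 1.

```python
import math

# Stop list named in the text; a real system would load it from the database.
STOP_LIST = {"in", "to", "from", "and", "at", "or"}


def tokenize(title):
    """Split a title into lowercase word tokens."""
    return title.lower().split()


def filter_tokens(tokens):
    """Delete every token that appears in the stop list."""
    return [t for t in tokens if t not in STOP_LIST]


def idf(term, docs):
    """Equation 1: log(d / df). The natural log is an assumption."""
    df = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / df)


def tf_idf(docs):
    """Return the vocabulary and one TF-IDF weight vector per document."""
    vocab = sorted({t for doc in docs for t in doc})
    weights = [[doc.count(t) * idf(t, docs) for t in vocab] for doc in docs]
    return vocab, weights


# Hypothetical file titles, not taken from the research data.
titles = [
    "Introduction to Network Security",
    "Network Technology",
    "Security and Technology in RPL",
]
docs = [filter_tokens(tokenize(t)) for t in titles]
vocab, weights = tf_idf(docs)
print(vocab)
print(weights[0])
```

A term that appears in every title gets an IDF of zero, so it contributes nothing to the distance between titles.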


Table 1. The result of TF-IDF on sample

Text in the Title    TF                              IDF
of the File          F1    F2    F3    ….    F10     Log

Network              0     0     1     ….    0       1.5522
Security             1     1     1     …     1       1.4273
Technology           0     1     1     …     0       0.5980
…….                  ….    ….    ….    ….    ….      …..
RPL                  1     0     1     ….    1       2.0293

After every document has a value based on TF-IDF weighting and the files fall into several categories, the application clusters the media files in online learning by the closeness of their values using the SNN algorithm.

2.2 Shared nearest neighbour (SNN) algorithm

The Shared Nearest Neighbour (SNN) algorithm is a grouping process developed for high-dimensional data [15–17]. The SNN algorithm requires three input parameters: k, the number of nearest neighbours; e, the shared-neighbour threshold value; and mint, the minimum amount of data for each group.

The steps of the Shared Nearest Neighbour (SNN) algorithm in this research are [15–17]:

1. Calculate the similarity values of the existing data.
2. Form a list of the k nearest neighbours of each data point.
3. Form a neighbour graph from the lists of k nearest neighbours.
4. Find the density of each data point.
5. Find the representative points.
6. Form groups from these representative points.
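The six steps above can be sketched as follows. This is one common reading of shared-nearest-neighbour clustering, not the exact implementation used in the application: the parameter names `eps` and `min_pts` stand in for e and mint, linking only mutual neighbours is an assumption, and the sample points are hypothetical.

```python
import math


def knn_lists(points, k):
    """Steps 1-2: Euclidean distances, then the k nearest neighbours of each point."""
    n = len(points)
    lists = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        lists.append(set(others[:k]))
    return lists


def snn_similarity(knn):
    """Step 3: neighbour graph; mutual neighbours are linked by their shared-neighbour count."""
    n = len(knn)
    sim = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in knn[i]:
            if i in knn[j]:
                sim[i][j] = len(knn[i] & knn[j])
    return sim


def snn_cluster(points, k, eps, min_pts):
    """Return one cluster label per point; -1 marks points left unassigned."""
    n = len(points)
    sim = snn_similarity(knn_lists(points, k))
    # Step 4: density = number of links with at least eps shared neighbours.
    density = [sum(1 for j in range(n) if sim[i][j] >= eps) for i in range(n)]
    # Step 5: representative points are the sufficiently dense ones.
    reps = [i for i in range(n) if density[i] >= min_pts]
    # Step 6: group representative points connected by strong links.
    labels = [-1] * n
    cluster = 0
    for start in reps:
        if labels[start] != -1:
            continue
        labels[start] = cluster
        stack = [start]
        while stack:
            i = stack.pop()
            for j in reps:
                if labels[j] == -1 and sim[i][j] >= eps:
                    labels[j] = cluster
                    stack.append(j)
        cluster += 1
    # Other points join the group of their most similar representative, if any.
    for i in range(n):
        if labels[i] == -1 and reps:
            best = max(reps, key=lambda r: sim[i][r])
            if sim[i][best] >= eps:
                labels[i] = labels[best]
    return labels


# Hypothetical 2-D points forming two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
labels = snn_cluster(points, k=3, eps=2, min_pts=2)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1]
```

In the application, each point would be the TF-IDF weight vector of one file title rather than a 2-D coordinate.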

Meanwhile, the Euclidean equation is used to calculate the similarity distance between titles. The Euclidean distance is the square root of the sum of the squared differences between the coordinates of a pair of objects. The distance between vectors x and y is shown in equation 2 [15],[17].

 
sim x y d x yi ii

n
( , ) ( )� � �

��
2

1  
(2)

where x and y are n-dimensional vectors. For example, after TF-IDF is calculated, 10 file titles are processed by the SNN algorithm, and the result of this example is shown in Table 2.
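Equation 2 can be checked numerically on two short vectors. The vectors below are hypothetical and are chosen only so that the result is easy to verify by hand; they are not weights from Table 1.

```python
import math


def euclidean(x, y):
    """Equation 2: square root of the sum of squared coordinate differences."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


a = [0.0, 3.0, 0.0]
b = [4.0, 0.0, 0.0]
print(euclidean(a, b))  # 5.0, since sqrt(16 + 9) = 5
```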


Table 2. The result of SNN in data set

Text in the Title    d
of the File          F1        F2         F3         …..    F10

Network              0.011     0.2211     2.4095     …..    0
Security             0.3576    0.3576     0          ….     2.0372
Technology           4.1183    0          0.5980     …..    0.5980
Network              0         0.5980     0          …..    0.5108
….                   ….        ….         ….         ….     ….
Total                9.1302    11.6856    22.4470    …..    2.2154
Distance vectors     3.0216    3.4184     1.4884     ……     4.1349


3 Result and discussion

This section covers implementation and testing. The implementation is a mobile application that includes a validation submission. The testing covers both the algorithm and the validation. After testing, the data are analyzed to check the accuracy of the algorithm and the application.

3.1 Implementation

Fig. 2. The processing of this research


The implementation runs on a mobile platform, as shown in Figure 2. The application is used as follows:

1. The user logs in as a validator or as a teacher or lecturer.
2. The user selects one or more subjects of material. The application offers three options: Teknik Elektro (Electrical Engineering), Pendidikan Teknik Elektro (Electrical Engineering Education), or Teknik Informatika (Informatics Engineering) material. This step automatically selects the database based on the subject.
3. The application loads the database, which has already passed the initial and pre-processing steps.
4. After the database is loaded, the application shows the classification result by subject category.
5. The user can use the detail option in each category to show the types of online material for a specific subject.
6. After viewing the details of each category, the user can give a validation for each category or for all categories.

3.2 Testing

Testing of the application has two parts for checking accuracy. The first is testing of the algorithm, to determine how effective and valid the algorithm is. The second is testing for validation.

1) Testing for algorithm

The application is embedded in online learning together with the database of material and is then tested across the whole system to obtain the accuracy of the classification algorithm. The testing technique is k-fold cross-validation, which measures the performance of the algorithm. In k-fold cross-validation, the training data set is randomly divided, without replacement, into k folds; k−1 folds are used for model training and the remaining fold is used for testing. This step is repeated k times so that each fold is used once for testing, and the performance is calculated in the same way in every repetition.
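The fold splitting described above can be sketched as follows. The sizes match the 200 items and 10 repetitions reported in this section, while the function name `k_fold_indices` and the fixed `seed` are illustrative assumptions.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # random split without replacement
    folds = [indices[i::k] for i in range(k)]
    for test_fold in range(k):
        test = folds[test_fold]
        train = [idx for f, fold in enumerate(folds)
                 if f != test_fold for idx in fold]
        yield train, test


splits = list(k_fold_indices(200, 10))
for train, test in splits:
    print(len(train), len(test))  # 180 20 in every repetition
```

Each item appears in exactly one test fold, so every item is used for testing once and for training k−1 times.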

The total data set for testing in this research is 200 items, with 10 items for each category. Samples are selected randomly within each category so that the data are spread evenly across all categories. The data set was obtained by labelling all material in online learning for a specific subject, namely the electrical engineering subject.

The performance of the application is presented as measurements obtained from running the implemented algorithm. The performance results come from 10 rounds of testing using k-fold cross-validation. The total sample is 200 items of material in online learning for a specific subject, and the result is shown in Table 3.


Table 3. The result of testing in the algorithm

Testing    Accuracy    Precision    Recall

1          94.29       82.50        71.74
2          94.76       85.09        73.25
3          92.80       79.60        64.18
4          93.29       76.34        66.96
5          94.92       83.65        72.53
6          94.34       83.21        70.61
7          93.50       75.05        66.90
8          94.92       84.01        73.24
9          94.23       79.52        71.29
10         93.16       76.92        66.22
Average    94.7        80.6         70.69

Based on Table 3, performance testing of the text mining and SNN algorithm with k-fold cross-validation gives an average accuracy of 94.7%, an average precision of 80.6%, and an average recall of 70.69%.
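The three measures in Table 3 are commonly computed from counts of correct and incorrect decisions. The sketch below uses the standard definitions for a single category; the counts are hypothetical, and the way the scores are aggregated across categories in this research is not stated, so this is an assumption rather than the procedure behind Table 3.

```python
def metrics(tp, fp, fn, tn):
    """Accuracy, precision, and recall in percent from confusion counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": 100 * (tp + tn) / total,
        "precision": 100 * tp / (tp + fp),
        "recall": 100 * tp / (tp + fn),
    }


# Hypothetical counts for one category out of 200 test items.
result = metrics(tp=8, fp=2, fn=2, tn=188)
print(result)  # accuracy 98.0, precision 80.0, recall 80.0
```

Accuracy can stay high even when recall is modest, because the many correctly rejected items of other categories dominate the total.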

2) Testing for validation

The validation process follows the same procedure as the algorithm testing: it was run 10 times. Three validators checked the classification result of the material in online learning. The validators used the application shown in Figure 2. Each validator is a teacher of an electrical engineering subject. The teachers checked all of the classified material and sent feedback through the application. The feedback is either valid or no (not valid). The result of the validation is shown in Table 4.

Table 4. The result of the validation of the application

Testing    Validator 1    Validator 2    Validator 3

1          Valid          Valid          Valid
2          Valid          Valid          Valid
3          Valid          No             Valid
4          Valid          No             No
5          Valid          Valid          Valid
6          Valid          No             Valid
7          No             No             No
8          Valid          Valid          No
9          Valid          No             No
10         Valid          Valid          No
Average    90             60             60


Based on Table 4, the analysis is as follows:

1. Validator 1 gave a valid status in 9 of the 10 tests.
2. Validator 2 gave a valid status in 5 of the 10 tests and a not-valid status in tests 3, 4, 6, 7, and 9.
3. Validator 3 gave a valid status in 5 of the 10 tests and a not-valid status in tests 4, 7, 8, 9, and 10.

Validator 2 and Validator 3 gave the same number of valid statuses, but they gave the not-valid status in different tests. The average validation result is 73.33%.

The results in Table 3 and Table 4 show a relationship between the recall and the validation result. When the recall in Table 3 is high, all validators in Table 4 give a valid status in their feedback. Both the algorithm testing and the validation testing give a substantial average. The average rate over all testing is 83.5%, which indicates that this research succeeds in classifying online learning material by specific subject.

4 Conclusion

This research built a mobile application that classifies online learning material by subject category. The application uses a machine-learning algorithm that classifies material by the titles of the files in the online learning database. The algorithm combines text mining and the Shared Nearest Neighbour (SNN) algorithm. The text-mining algorithm processes the length of each title and then assigns a weight to each word in the title. The weighted title of every file has a value that SNN uses to find the nearest cluster for each category. The end result of the processing is a set of subject categories, each with its specific online learning material. This application helps teachers and students find online learning material by specific subject and topic.

5 Acknowledgment

This research was funded by PNBP Universitas Negeri Malang, Indonesia in 2021.

6 References

[1] Martin, F., Sun, T., & Westine, C. D. (2020). A systematic review of research on online teaching and learning from 2009 to 2018. Computers & Education, 159, 104009. https://doi.org/10.1016/j.compedu.2020.104009

[2] Rasheed, R. A., Kamsin, A., & Abdullah, N. A. (2020). Challenges in the online component of blended learning: A systematic review. Computers & Education, 144, 103701. https://doi.org/10.1016/j.compedu.2019.103701


[3] Wahyono, I., Saryono, D., Asfani, K., Ashar, M., & Sunarti, S. (2020). Smart online courses using computational intelligence. International Journal of Interactive Mobile Technologies (iJIM), 14(12), 29–40. https://doi.org/10.3991/ijim.v14i12.1560

[4] Marcus, V. B., Atan, N. A., Yusof, S. M., & Tahir, L. (2020). A systematic review of e-service learning in higher education. International Journal of Interactive Mobile Technologies, 14(6), 4–14. https://doi.org/10.3991/ijim.v14i06.13395

[5] Hoi, S. C., Sahoo, D., Lu, J., & Zhao, P. (2021). Online learning: a comprehensive survey. Neurocomputing, 459, 249–289. https://doi.org/10.1016/j.neucom.2021.04.112

[6] Joy, J., & Pillai, R. V. G. (2021). Review and classification of content recommenders in e-learning environment. Journal of King Saud University-Computer and Information Sciences. https://doi.org/10.1016/j.jksuci.2021.06.009

[7] Wahyono, I. D., Fadlika, I., Asfani, K., Putranto, H., & Hammad, J. (2019, October). New Adaptive Intelligence Method for Personalized Adaptive Laboratories. In 2019 International Conference on Electrical, Electronics and Information Engineering (ICEEIE) (Vol. 6, pp. 196–200). IEEE. https://doi.org/10.1109/ICEEIE47180.2019.8981477

[8] Zahour, O., Benlahmar, E. H., Eddaouim, A., & Hourrane, O. (2020). A comparative study of machine learning methods for automatic classification of academic and vocational guidance questions. International Journal of Interactive Mobile Technologies, 14(8), 43–60. https://doi.org/10.3991/ijim.v14i08.13005

[9] Luo, X. (2021). Efficient English text classification using selected machine learning techniques. Alexandria Engineering Journal, 60(3), 3401–3409. https://doi.org/10.1016/j.aej.2021.02.009

[10] Wahyono, I. D., Putranto, H., Asfani, K., & Afandi, A. N. (2019, September). VLC-UM: A Novel Virtual Laboratory using Machine Learning and Artificial Intelligence. In 2019 International Seminar on Application for Technology of Information and Communication (iSemantic) (pp. 360–365). IEEE. https://doi.org/10.1109/ISEMANTIC.2019.8884288

[11] Cheng, M. Y., Kusoemo, D., & Gosno, R. A. (2020). Text mining-based construction site accident classification using hybrid supervised machine learning. Automation in Construction, 118, 103265. https://doi.org/10.1016/j.autcon.2020.103265

[12] Baharudin, N. A., & Jantan, H. (2019). Mobile-based word matching detection using intelligent predictive algorithm. International Journal of Interactive Mobile Technologies, 13(9), 140–151. https://doi.org/10.3991/ijim.v13i09.10848

[13] Wahyono, I. D., Saryono, D., Ashar, M., & Asfani, K. (2019, September). Face Emotional Detection Using Computational Intelligence Based Ubiquitous Computing. In 2019 International Seminar on Application for Technology of Information and Communication (iSemantic) (pp. 389–393). IEEE. https://doi.org/10.1109/ISEMANTIC.2019.8884320

[14] Kumar, S., Kar, A. K., & Ilavarasan, P. V. (2021). Applications of text mining in services management: A systematic literature review. International Journal of Information Management Data Insights, 1(1), 100008. https://doi.org/10.1016/j.jjimei.2021.100008

[15] Xie, X., Fu, Y., Jin, H., Zhao, Y., & Cao, W. (2020). A novel text mining approach for scholar information extraction from web content in Chinese. Future Generation Computer Systems, 111, 859–872. https://doi.org/10.1016/j.future.2019.08.033

[16] Liu, R., Wang, H., & Yu, X. (2018). Shared-nearest-neighbor-based clustering by fast search and find of density peaks. Information Sciences, 450, 200–226. https://doi.org/10.1016/j.ins.2018.03.031

[17] Wahyono, I. D., Ashar, M., Fadlika, I., Asfani, K., & Saryono, D. (2019, October). A New Computational Intelligence for Face Emotional Detection in Ubiquitous. In 2019 International Conference on Electrical, Electronics and Information Engineering (ICEEIE) (Vol. 6, pp. 148–153). IEEE. https://doi.org/10.1109/ICEEIE47180.2019.8981420


7 Authors

Irawan Dwi Wahyono is a lecturer in the Department of Engineering at Universitas Negeri Malang, Indonesia (Email: irawan.dwi.ft@um.ac.id).

Djoko Saryono is a Professor in the Department of Literature at Universitas Negeri Malang, Indonesia (Email: djoko.saryono.fs@um.ac.id).

Hari Putranto is a lecturer in the Department of Engineering at Universitas Negeri Malang, Indonesia (Email: Hari.putranto.ft@um.ac.id).

Khoirudin Asfani is a lecturer in the Department of Engineering at Universitas Negeri Malang, Indonesia (Email: khoirudin.asfani.ft@um.ac.id).

Harits Ar Rosyid is a lecturer in the Department of Engineering at Universitas Negeri Malang, Indonesia (Email: harits.ar.ft@um.ac.id).

Sunarti is a lecturer in the Department of Literature at Universitas Negeri Malang, Indonesia (Email: sunarti.fs@um.ac.id).

Mohd Murtadha Mohamad is a lecturer in the School of Computing at Universiti Teknologi Malaysia, Malaysia (Email: murtadha@utm.my).

Mohd Nihra Haruzuan Bin Mohamad Said is a lecturer in the Department of Educational Sciences, Mathematics and Creative Multimedia at Universiti Teknologi Malaysia, Malaysia (Email: nihra@utm.my).

Gwo Jiun Horng is a lecturer in the Department of Computer Science and Information Engineering at Southern Taiwan University of Science and Technology, Taiwan (Email: grojium@stust.edu.tw).

Jia-Shing Shih is a lecturer in the Department of Electrical Engineering at Southern Taiwan University of Science and Technology, Taiwan (Email: jasonshih@stust.edu.tw).

Article submitted 2021-12-21. Resubmitted 2022-01-24. Final acceptance 2022-01-25. Final version 
published as submitted by the authors.
