Knowledge Engineering and Data Science (KEDS), Vol. 3, No. 2, December 2020, pp. 106–111
https://doi.org/10.17977/um018v3i22020p106-111

Generating Javanese Stopwords List using K-means Clustering Algorithm

Aji Prasetya Wibawa a, 1, *, Hidayah Kariima Fithri a, 2, Ilham Ari Elbaith Zaeni a, 3, Andrew Nafalski b, 4

a Electrical Engineering Department, Universitas Negeri Malang, Jl Semarang 5, Malang, East Java 65145, Indonesia
b UniSA Education Futures, School of Engineering, University of South Australia, SCT2-39 Mawson Lakes Campus, Adelaide, South Australia 5095, Australia
1 aji.prasetya.ft@um.ac.id *; 2 hidayah9a20@gmail.com; 3 Ilham.ari.ft@um.ac.id; 4 andrew.nafalski@unisa.edu.au
* corresponding author

ARTICLE INFO
Article history: Received 1 December 2020; Revised 15 December 2020; Accepted 29 December 2020; Published online 30 December 2020

ABSTRACT
Stopword removal is necessary in Information Retrieval. It removes frequently appearing, general words to reduce memory storage. The algorithm eliminates every word that exactly matches a word in the stopword list. However, generating such a list can be time-consuming: the words of a specific language and domain must be collected and validated by specialists. This research develops a new way to generate a stopword list using the K-means clustering method. The proposed approach groups words based on their frequency. A confusion matrix measures the difference between the findings and a reference stopword list created by a Javanese linguist. The accuracy of the proposed method is 78.28% (K=7). The result shows that generating a Javanese stopword list with a clustering method is reliable.

Keywords: Stopwords; Javanese language; Clustering; K-means
I. Introduction

Text processing in Information Retrieval (IR) relies on text documents as its primary data source. However, not all words in a document are useful. Words that appear frequently yet carry little meaning are called stopwords [1]; they are stored in a stopword list, also known as a stopword database or corpus [2][3]. The stopword removal approach depends on this corpus to remove unnecessary words from the text [4], and the word list must be in the same language as the text [1][5]. Stopword lists have been developed for widely used languages such as English, Chinese [6], Sanskrit [7], Arabic [8], Gujarati [9], and Indonesian [10]. However, a stopword list for a low-resource language such as Javanese is not yet available.

Javanese is one of the traditional languages of Indonesia [11]. It has levels of politeness, known as unggah-ungguh, namely Ngoko, Madya, and Krama [12][13]. Many historical documents, news articles, and stories are written in Javanese. Since the language is becoming less widely used, retrieving information from Javanese text can be difficult. Stopword removal may ease the IR process on Javanese text. Despite its benefit, generating the list is quite complicated: in general, linguists must manually label a substantial corpus and then store and distribute the result. An alternative approach to stopword list generation is therefore badly needed.

This paper explores the use of a clustering approach for creating a Javanese stopword list. The listed words are excluded from the bag of words to speed up the text classification process [14]. The clustering method used is K-means, one of the fastest algorithms for processing large datasets; it partitions a given dataset into a chosen number of K clusters [15]. Words are assigned to the stopword list by grouping them according to their frequencies, and clustering eases the selection of the frequency threshold that separates stopwords from other words.

II. Materials and Methods

The goal of this study is to generate a stopword list from a Javanese corpus. The selected Javanese level of politeness is Ngoko, due to its usage and vocabulary [11][12]. Figure 1 shows the four stages of this research.

The first stage is data collection. The dataset was taken from the website Ki-demang.com, in the Javanese Short Stories category. It consists of 106 stories, excluding page numbers and titles. The collected stories are combined into a single text document that serves as the stopword generation dataset.

The second stage is data preprocessing: case folding, punctuation removal, tokenizing, and filtering. Case folding converts uppercase letters into lowercase. Punctuation removal deletes punctuation characters and numbers from the dataset. The tokenizing step then splits the dataset into single words, producing 17,763 word types and their frequencies. Finally, the filtering step removes typographical errors, words without meaning, names, and non-Ngoko words, leaving 14,384 word types. This deletion is based on a Javanese-Indonesian and Indonesian-Javanese translation dictionary. Table 1 shows examples of deleted words.

The dataset of 14,384 distinct words is then submitted to Javanese linguists, who group it into two classes: stopwords and non-stopwords. General words (such as conjunctions) are considered stopwords, totalling 3,224 words. The non-stopwords consist of 11,160 specific words: nouns, verbs, and adjectives. Table 2 shows examples of the two categories.

The third stage is clustering the 14,384 unique words by their frequency. Figure 2 shows the pseudocode of the k-means clustering method [16]. The first step determines the k value, the number of clusters; in this study, k = 3, 5, 7, 9, 11, 13, and 15 [17]. The next step calculates the distance between each data point and the centroids using the Euclidean distance [18]. The resulting clusters are then mapped to two classes, stopwords and non-stopwords. Under the first assumption, all words in cluster 0 are labeled as non-stopwords, while all words in the other clusters are labeled as stopwords. For example, when k=7, every word in clusters 1 to 6 is a stopword, while the rest (cluster 0) are non-stopwords. This assumption is based on the observation that high-frequency words [19] fall outside cluster 0. Table 3 illustrates the frequency distribution when k=7: in this case, 680 words are labeled as stopwords and 13,704 words as non-stopwords.
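To make the pipeline concrete, the following minimal sketch (written in Python with scikit-learn, not the authors' implementation) walks through the steps described above: case folding, punctuation and number removal, tokenizing, frequency counting, and k-means clustering of the per-word frequencies with the first labeling assumption. The function names and the input file stories.txt are illustrative only.

import re
from collections import Counter

import numpy as np
from sklearn.cluster import KMeans

def preprocess(text):
    """Case folding, punctuation/number removal, and tokenizing."""
    text = re.sub(r"[\W\d_]+", " ", text.lower())   # keep letters only
    return text.split()

def cluster_words(text, k=7, seed=0):
    """Cluster unique words by frequency and label them as (non-)stopwords."""
    freq = Counter(preprocess(text))                 # word -> frequency
    words = list(freq)
    X = np.array([[freq[w]] for w in words], dtype=float)

    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)

    # First assumption: the cluster holding the lowest-frequency words
    # (cluster 0 in the paper) is non-stopwords; every other cluster is stopwords.
    low = int(np.argmin(km.cluster_centers_.ravel()))
    return {w: ("non-stopword" if c == low else "stopword")
            for w, c in zip(words, km.labels_)}

# Example usage (file name is hypothetical):
# labels = cluster_words(open("stories.txt", encoding="utf-8").read(), k=7)
# stopword_list = sorted(w for w, lab in labels.items() if lab == "stopword")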
Fig. 1. Research stages: data collection (106 Javanese stories); data preprocessing (case folding, removal of punctuation, tokenizing, filtering); data clustering (k-means); stopword evaluation (confusion matrix)

Table 1. Examples of deleted words

Typographical errors   Words without meaning   Names       Non-Ngoko words
lungaa                 lha                     Ezza        wontening
rilaaaaa               we                      Sukartiah   inbox
ómongan                lur                     Yono        meresahkan
ewosemono              aaaaaaaa                Inah        mengganggu
senaosa                loh                     Sumantri    pusaraning
banjarpetambakan       dhuk                    Laras       out
sesambhungane          ugh                     Yani        awalnya
ampuunn                sttt                    Irvan       berbincang

Table 2. Examples of linguists' classified words

Stopwords   Non-stopwords
aku         artane
ana         birahine
apa         cungkup
dadi        dhialog
iki         endhog
ing         garwamu
kang        jaitan
sing        karak
wae         langgananku
yen         macak

Table 3. Stopwords and non-stopwords when K=7

Cluster   Frequency range   Number of stopwords   Number of non-stopwords
0         1-25              0                     13,704
1         2000-3000         3
2         650-1050          13
3         290-600           28
4         1100-1600         5
5         26-100            531
6         105-290           100
Total                       680                   13,704

Input:   D = {d1, d2, ..., dn}   Data to be clustered
         k                       Desired number of clusters
Output:  A set of k clusters
Step 1:  Randomly select k items from D as the initial centroids (centers of the initial clusters)
Step 2:  Assign each item to the cluster with the closest centroid, then recalculate the mean of each cluster as its new centroid
Step 3:  Repeat Step 2 until the centroids no longer change or the maximum number of iterations is reached

Fig. 2. K-means clustering algorithm

The fourth stage is evaluation, which tests the performance of the proposed method, using the linguists' classification as the reference. A confusion matrix is applied to calculate accuracy and precision [20]. At this stage, all cases are tested to decide the best stopword set produced by the k-means clustering technique. Accuracy is obtained by dividing the number of correctly classified words by the total number of words [21]; a word is counted as correct when the clustering result assigns it the same class as the reference. Precision is the ratio of true positives (TP) to the total of true positives and false positives (FP) [21]. A TP occurs when the clustering labels a word as a stopword and the reference agrees; an FP occurs when the predicted label is stopword while the reference says non-stopword.
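A minimal sketch of this evaluation step, assuming the clustering output and the linguists' reference are both available as word-to-label dictionaries (the variable names predicted and reference are hypothetical, not the authors' code):

def evaluate(predicted, reference):
    """Accuracy and precision of the generated list, with 'stopword' as the positive class."""
    tp = fp = tn = fn = 0
    for word, ref_label in reference.items():
        pred_label = predicted.get(word, "non-stopword")
        if pred_label == "stopword" and ref_label == "stopword":
            tp += 1                                   # true positive
        elif pred_label == "stopword":
            fp += 1                                   # false positive
        elif ref_label == "non-stopword":
            tn += 1                                   # true negative
        else:
            fn += 1                                   # false negative
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, precision

# accuracy, precision = evaluate(labels, reference)   # 'labels' from the clustering sketch above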
III. Results and Discussions

Table 4 shows the performance of the stopword list generated with the k-means algorithm. The accuracy and precision represent the method's performance when its result is compared with the Javanese linguists' manual classification.

In Table 4, the highest accuracy is 78.2%, with 57.3% precision, obtained at k = 7. This result consists of 680 stopwords and 13,704 non-stopwords, while the experts identified 3,224 and 11,160 words in the same categories. The clustering correctly classifies 11,030 of the 14,384 words, dominated by the non-stopword category. Figure 3 shows the word distribution under the first assumption, in which the first cluster holds the non-stopwords. As seen in Figure 3, the experts recognize most words as non-stopwords, and k-means wrongly assigns part of these non-stopwords to the stopword category (the area within the grey line). The precision of 57.3% covers the orange and grey areas, which means that most stopwords are categorized as non-stopwords. The lowest performance occurs at k = 5, with 21.9% accuracy and 21.7% precision; only 3,089 words are true stopwords and 65 words are true non-stopwords.

The second assumption is then applied for comparison. Table 5 shows the result when the first cluster is assumed to contain the stopwords and the remaining clusters the non-stopwords.

Table 4. Stopword generator performance with the first assumption

k    Stopwords   Non-stopwords   Accuracy   Precision
3    49          14,335          77.9%      100.0%
5    14,184      200             21.9%      21.7%
7    680         13,704          78.2%      57.3%
9    13,281      1,103           25.0%      21.5%
11   1,500       12,884          75.6%      40.6%
13   2,145       12,239          73.4%      36.1%
15   1,750       12,634          74.9%      39.2%

Table 5. Stopword generator performance with the second assumption

k    Stopwords   Non-stopwords   Accuracy   Precision
3    14,335      49              22.07%     22.1%
5    200         14,184          78.07%     67.5%
7    13,704      680             21.7%      20.68%
9    1,103       13,281          74.9%      32.6%
11   12,884      1,500           24.3%      20.2%
13   12,239      2,145           26.5%      20.0%
15   12,634      1,750           25.02%     20.08%

Fig. 3. Word distribution based on the first assumption

The best performance in Table 5 is reached at k = 5, with an accuracy of 78.07% and a precision of 67.5%. This case yields 135 true stopwords and 11,095 true non-stopwords; the precision of 67.5% corresponds to 135 of the 200 predicted stopwords.

The best accuracies of the two scenarios (Table 4 and Table 5) are similar. However, the precision of the best scenario in Table 5 (k = 5, 67.5%) is higher than that of the best scenario in Table 4 (k = 7, 57.3%). This means that the second assumption is more promising than the first in recognizing stopwords: k-means locates the stopwords in the first cluster, while the non-stopwords lie in the other clusters.
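As a quick arithmetic check (added for illustration, not part of the original experiments), the reported k = 5 figures under the second assumption follow directly from the counts above:

tp = 135                      # words both k-means and the linguists label as stopwords
fp = 200 - tp                 # remaining predicted stopwords rejected by the linguists
tn = 11_095                   # words both label as non-stopwords
total = 14_384                # all unique words in the dataset

precision = tp / (tp + fp)    # 135 / 200     = 0.675  -> 67.5%
accuracy = (tp + tn) / total  # 11230 / 14384 ≈ 0.7807 -> 78.07%
print(f"precision = {precision:.1%}, accuracy = {accuracy:.2%}")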
IV. Conclusion

K-means is applicable to Javanese stopword list generation. The algorithm indicates that the stopwords are located in the first cluster of the word list. However, the current promising result can still be improved. Further research should consider balancing the frequency distribution and applying word stemming in the preprocessing. Using more training data may balance the frequencies, while stemming may merge inflected forms of the same word and combine their occurrences.

Declarations

Author contribution
All authors contributed equally as the main contributors of this paper. All authors read and approved the final paper.

Funding statement
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Conflict of interest
The authors declare no conflict of interest.

Additional information
No additional information is available for this paper.

References

[1] R. T. Lo, B. He, and I. Ounis, "Automatically Building a Stopword List for an Information Retrieval System," J. Digit. Inf. Manag., vol. 3, no. 1, pp. 3–8, 2005.
[2] J. Kaur, "A Systematic Review on Stopword Removal Algorithms," Int. J. Future Revolut. Comput. Sci. Commun. Eng., vol. 4, no. 4, pp. 207–210, 2018.
[3] J. Kaur and P. K. Buttar, "Stopwords Removal and its Algorithms Based on Different Methods," Int. J. Adv. Res. Comput. Sci., vol. 9, no. 5, pp. 81–88, 2018.
[4] S. Vijayarani, M. J. Ilamathi, and M. Nithya, "Preprocessing Techniques for Text Mining - An Overview," Int. J. Comput. Sci. Commun. Netw., vol. 5, no. 1, pp. 7–16, 2015.
[5] L. Dolamic and J. Savoy, "When Stopword Lists Make the Difference," J. Am. Soc. Inf. Sci. Technol., vol. 61, no. 1, pp. 200–203, 2010.
[6] F. Zou, F. L. Wang, X. Deng, S. Han, and L. S. Wang, "Automatic Construction of Chinese Stop Word List," Proc. 5th WSEAS Int. Conf. Appl. Comput. Sci., pp. 1010–1015, 2006.
[7] J. K. Raulji, "Stop-Word Removal Algorithm and its Implementation for Sanskrit Language," Int. J. Comput. Appl., vol. 150, no. 2, pp. 15–17, 2016.
[8] R. M. Duwairi, "Arabic Sentiment Analysis using Supervised Classification," Int. Conf. Futur. Internet Things Cloud, pp. 579–583, 2014.
[9] R. M. Rakholia and J. R. Saini, "A Rule-Based Approach to Identify Stop Words for Gujarati Language," Proc. 5th Int. Conf. Front. Intell. Comput. Theory Appl., Adv. Intell. Syst. Comput., p. 515, 2017.
[10] M. C. Kirana, N. P. Perkasa, M. Z. Lubis, and M. Fani, "Visualisasi Kualitas Penyebaran Informasi Gempa Bumi di Indonesia Menggunakan Twitter," Journal of Applied Informatics and Computing, vol. 3, no. 1, pp. 23–32, 2019.
[11] A. P. Wibawa, A. Nafalski, J. Tweedale, N. Murray, and A. E. Kadarisman, "Hybrid Machine Translation for Javanese Speech Levels," Proc. 5th Int. Conf. Knowl. Smart Technol., pp. 64–69, 2013.
[12] S. Poedjosoedarmo, "Javanese Speech Levels," Indonesia, vol. 6, no. 6, pp. 54–81, 1968.
[13] A. P. Wibawa, A. Nafalski, A. E. Kadarisman, and W. F. Mahmudy, "Indonesian-to-Javanese Machine Translation," Int. J. Innov. Manag. Technol., vol. 4, no. 4, pp. 451–454, 2013.
[14] S. V. S. Gunasekara and P. S. Haddela, "Context aware stopwords for Sinhala Text classification," 2018 Natl. Inf. Technol. Conf., pp. 1–6, 2018.
[15] T. M. Kodinariya, "Review on determining number of Cluster in K-Means Clustering," Int. J. Adv. Res. Comput. Sci. Manag. Stud., vol. 1, no. 6, pp. 90–95, 2013.
[16] K. A. A. Nazeer and M. P. Sebastian, "Improving the Accuracy and Efficiency of the k-means Clustering Algorithm," Proc. World Congr. Eng. 2009, vol. I, pp. 1–5, 2009.
[17] D. T. Pham, S. S. Dimov, and C. D. Nguyen, "Selection of K in K-means clustering," Proc. Inst. Mech. Eng. Part C: J. Mech. Eng. Sci., vol. 219, no. 1, pp. 103–119, 2005.
[18] F. Leisch, "A toolbox for K-centroids cluster analysis," Computational Statistics & Data Analysis, vol. 51, no. 2, pp. 526–544, 2006.
[19] N. Grozavu, Y. Bennani, and M. Lebbah, "From variable weighting to cluster characterization in topographic unsupervised learning," Proc. Int. Jt. Conf. Neural Networks, pp. 1005–1010, 2009.
[20] V. M. Patro and M. R. Patra, "Augmenting Weighted Average with Confusion Matrix to Enhance Classification Accuracy," Transactions on Machine Learning and Artificial Intelligence, vol. 2, no. 4, pp. 77–91, 2014.
[21] A. Mishra and S. Vishwakarma, "Analysis of TF-IDF Model and its Variant for Document Retrieval," Int. Conf. Comput. Intell. Commun. Networks, pp. 772–776, 2015.