LONTAR KOMPUTER VOL. 9, NO. 2, AUGUST 2018                p-ISSN 2088-1541   
DOI : 10.24843/LKJITI.2018.v09.i02.p05  e-ISSN 2541-5832 
Accredited B by RISTEKDIKTI Decree No. 51/E/KPT/2017 
 

104 
 

Implementation of Equal-Width Interval Discretization in 
Naive Bayes Method  for Increasing Accuracy of 

Students' Majors Prediction 
 

Alfa Saleh
a1

, Fina Nasari
a2 

 
a
Faculty of Computer Science and Engineering, Potensi Utama University 

Jl. K.L Yos Sudarso KM 6.5 Tanjung Mulia Medan, Indonesia 
1
alfa@potensi-utama.ac.id 

2
fina@potensi-utama.ac.id  

 
Abstract 

 
The Selection of majors for students is a positive step that is done to focus students in 
accordance with their potential, it is considered important because with the majors, students are 
expected to develop academic ability according to the field of interest. In previous research, 
Naive Bayes method has been tested to classify the student’s department based on the criteria 
that support the case study on Private Madrasah Aliyah PAB 6 Helvetia students and the 
accuracy of the test from 100 student data is 90%. in this study, the researcher developed a 
previously used method by applying an equal-width interval discretization that would transform 
numerical or continuous criteria into a categorical criteria with a predetermined k value, different 
k values would be tested to find the best accuracy value. from the 120-student data that have 
been tested, it is proved that the result of the classification of the application of equal-width 
interval discretization on the Naive Bayes method with the value of k = 8 is better and increased 
the accuracy value 91.7% to 93.3%. 

   
Keywords: Data Mining, Naive Bayes, Equal-Width Interval Discretization, Students’ Majors 
  
 
1. INTRODUCTION 
The role of education is very important in supporting the development of technology that almost 
has penetrated into all areas. It also affects the determination of majors for high school / 
equivalent students, where the determination of the student's department is a process to focus 
students in a particular area of the interested field, this is done so that each student can learn 
more in the subjects that are in accordance with the concentration which has been specified for 
the student. The problem is the ongoing system of private school Madrasah Aliyah PAB 2 
Helvetia Medan, the place where researchers conduct research is not entirely effective because 
students are given a questionnaire to determine which majors they are interested in regardless 
of other criteria that may have a stake in determining eligibility students in terms of choosing 
majors. Through the process of determining the majors for students is an important step in 
preparing students to concentrate on the field that students are interested in when it should 
continue to the next education level. In the previous research, researchers also have done the 
process of mining to dig information about the determination of student majors using Naive 
Bayes method, the results of the research were tested 100 student data based on several 
criteria include the average score of natural science subjects, the average value of science 
social, classroom teacher recommendation and the questionnaire value filled by the students 
concerned. from the 100 data tested using the Naive Bayes method, it is obtained the accuracy 
value of determining student majors by 90% with an error of 10% [1] . The Naive Bayes method 
was chosen because it was widely implemented in various fields of science, as in the Xingxing 
Zhou research (2016), the Naive Bayes method was used to classify images to improve the 
accuracy of brain diagnosis using NMR imagery, where 94.5% sensitivity classification was 
obtained, 91.70% and the overall accuracy of 92.60 [2]. Naive Bayes is one of the top ten (10) 
data mining algorithms for simplicity and efficiency, as evidenced by the performance of Naive 
Bayes in classifying text [3], [4]. In addition, Naive Bayes is widely recognized as a simple and 
effective probabilistic classification method [5]–[7], and its performance is proportional to or 
higher than the decision tree [8] and artificial neural networks [9]. 



LONTAR KOMPUTER VOL. 9, NO. 2, AUGUST 2018                p-ISSN 2088-1541   
DOI : 10.24843/LKJITI.2018.v09.i02.p05  e-ISSN 2541-5832 
Accredited B by RISTEKDIKTI Decree No. 51/E/KPT/2017 
 

105 
 

However, researchers wanted to expand their previous research by applying Unsupervised 
Discretization [10] to improve the performance of the Naive Bayes method so that the 
percentage of predicted accuracy results could increase compared to the previous one. Where 
Unsupervised Discretization techniques in transforming numerical criteria / attributes are 
excellent [11]. 
 
2. Research Methods 

2.1. Naïve Bayes 

Naive Bayes is a model-based classification method and offers competitive classification 
performance compared with other data-driven classification methods [12]–[15], such as neural 
network, support vector machine (SVM), logistic regression, and k-nearest neighbors. The naive 
Bayes applies the Bayes’ theorem with the “naive” assumption that any pair of features is 
independent for a given class. The classification decision is made based upon the maximum-a-
posteriori (MAP) rule. Usually, three distribution models, including Bernoulli model, multinomial 
model and Poisson model, have commonly been incorporated into the Bayesian framework and 
have resulted in classifiers of Bernoulli  naive Bayes (BNB), multinomial naive Bayes (MNB) and 
Poisson naive Bayes (PNB), respectively[4]. The formula of Bayes's theorem is [16]: 
 

        
           

    
                                                              

 
Where variable X represents Data with unknown class, H represents The data hypothesis is a 
specific class, P (H|X) represents The probability of hypothesis H is based on condition X 
(posterior probability), P (H) represents Hypothesis probability H (prior probability), while P (X|H) 
represents The probability of X is based on the conditions in hypothesis H and P (X) represents 
Probability X. Therefore, the method of Naive Bayes above is adjusted as follows: 
 

           
              

        
                                  

 
Where Variable C represents the class, while the F1 ... Fn represents the characteristics of the 
user for the classification process. Therefore, the above formula can also be written simply as 
follows: 
 

           
                  

        
                                       

 
2.2. Unsupervised Discretization 

Discretization is the process of converting a continuous attribute value into a limited number of 
intervals and associated with each interval with a discrete numerical value. Discretization 
process is carried out before the learning process [17]. Among the methods of Unsupervised 
Discretization, there are several simple methods. (Equal-width Interval Discretization and equal-
frequency Interval Discretization) and more sophisticated, based on clustering analysis, such as 
k-means discretization. The Continuous range is divided into subranges by user-specified width 
or Frequency[18]. But in this study, researchers used Equal-width interval Discretization 
technique, which is the simplest discretization method that divides the observed range of values 
in each feature / attribute. The process involves sorting the observed values of the continuous 
feature / attribute and finding the minimum (Vmin) and maximum (Vmax) values. The interval 
can be calculated by dividing the observed range of values for the variables into k of the same 
size using the following formula [18]. 

 

         
          

 
                                                    

                                                           
 



LONTAR KOMPUTER VOL. 9, NO. 2, AUGUST 2018                p-ISSN 2088-1541   
DOI : 10.24843/LKJITI.2018.v09.i02.p05  e-ISSN 2541-5832 
Accredited B by RISTEKDIKTI Decree No. 51/E/KPT/2017 
 

106 
 

Then the limits can be constructed for i = 1 ... k-1 using the above equation. This type of 
discretization does not depend on multi-relational data structures. However, this discretization 
method is sensitive to outliers that can drastically reduce the range. The limitations of this 
method are given by the uneven distribution of data points: some intervals may contain more 
data points than others. 

2.3. Research Stages 

In the Naïve Bayes method, the constant (categorical) String data is distinguished from 
continuous numerical data, this difference will be seen when determining the probability value of 
each criterion whether it is a criterion with a string data value or a criterion with a numeric data 
value. The stages of applying the method of Naive Bayes in this study can be seen in Figure 1 
below. 

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

 
Figure 1. Research Stages of Equal-Width Interval Discretization on Naive Bayes 

2.3.1. Data Collection 

The data that will be used as training data is the academic data of the students as respondents, 
where the sample of student data is taken as much as 120 data, they consist of The students’ 
academic data such as the score of Mathematics, Physics, Chemistry, Biology, Economics, 
Geography, History and Sociology ,the questionnaire that is filled by students and 
recommendation from the homeroom. 

2.3.2. Data Cleaning 

In the process of data cleaning, the data that eventually used in this research is the exact value 
of subjects, non-exact subjects, a recommendation from the homeroom, and questionnaires 
filled by students.  

2.3.3. Determining the Criteria 

The criteria that used based on data that has been collected is as in table 1 below: 

 

 



LONTAR KOMPUTER VOL. 9, NO. 2, AUGUST 2018                p-ISSN 2088-1541   
DOI : 10.24843/LKJITI.2018.v09.i02.p05  e-ISSN 2541-5832 
Accredited B by RISTEKDIKTI Decree No. 51/E/KPT/2017 
 

107 
 

         Table 1. Criteria 

NO Criterion Type of Criterion Value 

1 The average score of exact 
subjects 

Numerical/Continuous 0 - 100 

2 The average score of non-exact 
subjects 

Numerical/Continuous 0 - 100 

3 
Recommendation Categorical 

Science, 
Social Studies 

4 Questionnaire  
Categorical 

Science, 
Social Studies 

 
There are four (4) criteria used in this research, namely the average score of exact subjects, the 
average value of non-exact subjects, recommendation and lift. Two (2) of them are numerical / 
continuous criteria and two (2) categorical criteria. To improve the accuracy of the Naive Bayes 
method, discretization is performed using unsupervised discretization techniques on numerical / 
continuous criteria, the goal is to transform numerical/continuous criteria into categorical criteria 
using formulas 4 and 5. The following table 2 discriminates numerical criteria / continuous. 

 

 

 

 

 

 

 

 

 

 

 

 
 
In table 2 above, you can see the results of the discretization process using the Unsupervised 
Discretization technique. Where the criteria / attributes of The average values of exact and non-
exact subjects with numerical or continuous type are transformed into categorical criteria with 8 
categories. The first category is the average value of exact sciences that are below 71.9125, the 
second category is the average value of exact subjects which are between 71.9125-73.825, the 
third category is the average value of exact subjects which are between 73.825- 75.7375, the 
fourth category is the average value of exact subjects that are between 75.7375-77.65, the fifth 
category is the average value of exact subjects that are between 77.65-79.5625, the sixth 
category is the average value of exact subjects that are between 79.5625-81.475, the seventh 
category is the average value of exact subjects which are between 81.475-83.3875, and the 
eighth category is the average value of exact sciences that are above 83.3875. 

Furthermore, the results of the discretization of the criteria for the average value of non-exact 
subjects are also divided into 8 categories, where the first category is the average value of non-
exact subjects under 71,875, the second category is the average value of non-exact subjects -
acts that are between 71,875-73,75, the third category is the average value of non-exact 
subjects that are between 73.75-75.625, the fourth category is the average value of non-exact 
subjects that are between 75.625-77.5, the fifth category is the average value of non-exact 
subjects that are between 77.5-79.375, the sixth category is the average value of non-exact 
subjects that are between 79.375-81.25, the seventh category is the average value of non-exact 
subjects between 81.25-83.125, and the eighth category are the average values of non-exact 
subjects above 83.125. 

Numerical/Continuous Criteria 

The average score of 
exact subjects 

The average score of 
non-exact subjects 

<71.9125 <71.875 

71.9125 – 73.825 71.875 – 73.75 

73.825 – 75.7375 73.75 – 75.625 

75.7375 – 77.65 75.625 – 77.5 

77.65 – 79.5625 77.5 – 79.375 

79.5625 – 81.475 79.375 – 81.25 

81.475 – 83.3875 81.25 – 83.125 

83.3875> 83.125> 

Table 2. The results of Discretization with k=8 

 

 



LONTAR KOMPUTER VOL. 9, NO. 2, AUGUST 2018                p-ISSN 2088-1541   
DOI : 10.24843/LKJITI.2018.v09.i02.p05  e-ISSN 2541-5832 
Accredited B by RISTEKDIKTI Decree No. 51/E/KPT/2017 
 

108 
 

2.3.4. The Probability of Each Criterion 

Several criteria have been set as a reference in classifying students' majors using Unsupervised 
Discretization techniques on the Naive Bayes method. The next step, determining the 
probability value of each criterion, for example, the probability value of the average scores of the 
exact scores of subjects to be shown is the probability value with the value k = 8. 

Here the value of probability criteria of the average value of the exact sciences can be seen in 
table 3. 
 

Table 3. The Probability of The average score of exact subjects with k=8 

 

 

 

 

 

 

 

 

 

 

from table 3 above, there were 60 students placed in the science studies major and 60 students 
were placed in the social studies major . Based on these data, there were 4 students with the 
average value of exact subjects below 71.9125 placed in the science studies major and the 
probability value of 0.067, 3 student with an average value of exact subjects between 71.9125-
73.825 placed in the science studies major and the probability value of 0.05 , 12 students with 
the average value of exact subjects between 73.825-75.7375 are placed in the science studies 
major  and the probability value is 0.2, 1 student with an average value of exact subjects 
between 75.7375-77.65 is placed in the science studies major and the probability value is 
0.017, 2 students with the average value of exact subjects between 77.65-79.5625 are placed in 
the science studies major and the probability value is 0.033, 13 students with the the average 
value of exact subjects between 79.5625-81.475 are placed in the science studies major and 
the probability value is 0.217, 8 students with the average value of exact subjects between 
81,475-83.3875 is placed in the science studies major and the probability value is 0.133, 17 
students with the average value of exact subjects above 83.3875 are placed in the science 
studies major and the probability value is 0.283. Meanwhile, there were 17 students with the 
average value of exact subjects below 71.9125 placed at the social studies major  and the 
probability value was 0.283, 8 students with the average value of exact subjects between 
71.9125-73.825 were placed in the social studies major  and the probability value was 0.133, 12 
students with the average value of exact subjects between 73.825-75.7375 were placed in the 
social studies major  and the probability value was 0.2, 3 students with the average value of 
exact subjects between 75.7375-77.65 were placed in the social studies major and the 
probability value was 0.05, 2 students the average value of exact subjects between 77.65-
79.5625 are placed in the social studies major  and the probability value is 0.033, 9 students 
with the average value of exact subjects between 79.5625-81.475 are placed in the social 
studies major and the probability value is 0.15, 6 students with the average value of exact 
subjects is between 81,475-8 3.3875 is placed at the social studies major and the probability 
value is 0.1, 3 students with an average value of exact subjects above 83.3875 are placed at 
the social studies major and the probability value is 0.05. 

The probability value of the average score of non-exact subjects with a value of k = 8, be shown 
in table 4 as follows. 
 
 

The Average Score of 
Exact Subject 

Probability 

Science Social Studies 

<71.9125 0.067 0.283 

71.9125 – 73.825 0.05 0.133 

73.825 – 75.7375 0.2 0.2 

75.7375 – 77.65 0.017 0.05 

77.65 – 79.5625 0.033 0.033 

79.5625 – 81.475 0.217 0.15 

81.475 – 83.3875 0.133 0.1 

83.3875> 0.283 0.050 



LONTAR KOMPUTER VOL. 9, NO. 2, AUGUST 2018                p-ISSN 2088-1541   
DOI : 10.24843/LKJITI.2018.v09.i02.p05  e-ISSN 2541-5832 
Accredited B by RISTEKDIKTI Decree No. 51/E/KPT/2017 
 

109 
 

 Table 4. The Probability of The average score of non-exact subjects with k=8 

 

 

 

 

 

 

 

 

 

 

 
from table 4 above, there were 60 students placed in the science studies major and 60 students 
were placed in the social studies major. Based on these data, there were 18 students with the 
average value of non-exact subjects below 71.9125 placed in the science studies major and the 
probability value of 0.3, 10 student with an average value of non-exact subjects between 
71.9125-73.825 placed in the science studies major and the probability value of 0.167, 9 
students with the average value of non-exact subjects between 73.825-75.7375 are placed in 
the science studies major and the probability value is 0.15, 2 student with an average value of 
non-exact subjects between 75.7375-77.65 is placed in the science studies major and the 
probability value is 0.033, there is no student with the average value of non-exact subjects 
between 77.65-79.5625 are placed in the science studies major and the probability value is 0, 
10 students with the the average value of non-exact subjects between 79.5625-81.475 are 
placed in the science studies major and the probability value is 0.167, 8 students with the 
average value of non-exact subjects between 81,475-83.3875 is placed in the science studies 
major and the probability value is 0.133, 3 students with the average value of non-exact 
subjects above 83.3875 are placed in the science studies major and the probability value is 
0.05. Meanwhile, there were 3 students with the average value of non-exact subjects below 
71.9125 placed at the social studies major  and the probability value was 0.05, 6 students with 
the average value of non-exact subjects between 71.9125-73.825 were placed in the social 
studies major  and the probability value was 0.1 , 15 students with the average value of non-
exact subjects between 73.825-75.7375 were placed in the social studies major  and the 
probability value was 0.25, 1 students with the average value of non-exact subjects between 
75.7375-77.65 were placed in the social studies major and the probability value was 0.033, 4 
students the average value of non-exact subjects between 77.65-79.5625 are placed in the 
social studies major  and the probability value is 0.067, 11 students with the average value of 
non-exact subjects between 79.5625-81.475 are placed in the social studies major and the 
probability value is 0.183, 10 students with the average value of non-exact subjects is between 
81,475-8 3.3875 is placed at the social studies major and the probability value is 0.167, 10 
students with an average value of non-exact subjects above 83.3875 are placed at the social 
studies major and the probability value is 0.167. 

The probability value for the recommendation criteria can be seen in table 5. 
 
                        Table 5. The Probability of the recommendation criteria with k=8 

 

 

 

 

 

The average score 
of non-exact 
subjects 

Probability 

Science Social Studies 

<71.875 0.3 0.05 

71.875 – 73.75 0.167 0.1 

73.75 – 75.625 0.15 0.25 

75.625 – 77.5 0.033 0.017 

77.5 – 79.375 0 0.067 

79.375 – 81.25 0.167 0.183 

81.25 – 83.125 0.133 0.167 

83.125> 0.05 0.167 

  

Recommendation 
Probability 

Science Social Studies 

Science 0.967 0.15 

Social Studies 0.033 0.85 



LONTAR KOMPUTER VOL. 9, NO. 2, AUGUST 2018                p-ISSN 2088-1541   
DOI : 10.24843/LKJITI.2018.v09.i02.p05  e-ISSN 2541-5832 
Accredited B by RISTEKDIKTI Decree No. 51/E/KPT/2017 
 

110 
 

The number of students used was 120 students who had been recommended by the previous 
homeroom teacher, there were 60 students were placed in the science studies major and 60 
students were placed in the social studies major. Based on these data there were 59 students 
who were recommended to enter the science studies major and placed in the science studies 
major, while there was 1 student who was recommended to enter the social studies major but 
was placed in the science studies major. Furthermore, there were 9 students who were 
recommended to enter the science studies major but were placed at the social studies major 
while there were 51 students who were recommended to enter the social studies major and 
placed at the social studies major. Thus, the probability of students who are recommended to 
enter the science studies major and be placed in the science studies major is 0.967 while the 
probability of students who are recommended to enter the social studies major but is placed at 
the science studies major is 0.033. While the probability of students who were recommended to 
enter the science studies major but placed in the social studies major was 0.15. then, the 
probability of students being recommended to enter the social studies major and placed in the 
social studies major was 0.85. The probability value for the questionnaire criteria can be seen in 
table 6. 

The probability value for the Questionnaire criteria can be seen in table 6. 
 
                       Table 6. The Probability of the Questionnaire criteria with k=8 
 
 
 
 
 

 
 
The number of students used was 120 students who had been given questionnaires, it was 
recorded as many as 60 students were placed in the science studies majors and 60 more 
students were placed in the social studies major. Based on these data there were 50 students 
who chose the science studies major and were placed in the science studies majors, while there 
were 10 students who chose the social studies major but were placed in the science studies 
major. Then there were 9 students who chose the science studies major but were placed in the 
social studies majors while there were 51 students who chose the social studies major and were 
placed in the social studies major. Thus the probability of students who choose the science 
studies major can be calculated and placed at the science studies major of 0.833, the 
probability of students who choose the social studies major but placed in the science studies 
majors is 0.167. Whereas, the probability of students who choose the science studies major but 
placed at the social studies major is 0.15 while the probability of students who choose the social 
studies major and placed at the social studies major is 0.85. 
 
3. Result and Discussion 

To see the consistency of the use of equal-width interval discretization in the Naive Bayes 
method, it was tested for some data, The following test of the implementation of unsupervised 
discretization on The Naive Bayes method by using sample 60 data can be seen in table 7. 
 
                  Table 7. Testing Results with 60 data 

 

 

 

 

 

 

From the test results using 60 sample data, the application of equal-width interval discretization 
technique on the Naive Bayes method with the value of k = 4 successfully classify the data with 

Questionnaire 
Probability 

Science Social Studies 

Science 0.833 0.15 

Social Studies 0.167 0.85 

Amount of ‘K’ 
value 

Weighted Average 

TP Rate FP Rate Precision Recall F-Measure 

4 0.917 0.082 0.917 0.917 0.917 
6 0.917 0.082 0.917 0.917 0.917 
8 0.933 0.067 0.933 0.933 0.933 
10 0.967 0.033 0.967 0.967 0.967 



LONTAR KOMPUTER VOL. 9, NO. 2, AUGUST 2018                p-ISSN 2088-1541   
DOI : 10.24843/LKJITI.2018.v09.i02.p05  e-ISSN 2541-5832 
Accredited B by RISTEKDIKTI Decree No. 51/E/KPT/2017 
 

111 
 

the accuracy of 91.7%, while for the value k = 6, obtained a level of accuracy of 91.7%, then for 
value k = 8, the obtained accuracy of 93.3% and for the value k = 10, the accuracy rate 
obtained is 0.967%. meanwhile, testing is also done with 90 data, the test result can be seen in 
table 8 below. 
 
                    Table 8. Testing Results with 90 data 

 

 

 

 

 

 
From the test result using 90 sample data, the application of equal-width interval discretization 
technique on Naive Bayes method with k = 4 value succeeded in classifying the data with 90% 
accuracy, while for k = 6, the accuracy level was 92.5%, then the value k = 8, the accuracy of 
93.3% and k = 10, the accuracy of 9.25%. meanwhile, testing is also done with 120 data, the 
test result can be seen in table 9 below. 

                  Table 9. Testing Results with 120 data 

 

 

 

 

 
 
The test result using 120 sample data, the application of equal-width interval discretization 
technique on Naive Bayes method with value k = 4 succeeded in classifying data with 90% 
accuracy, while for k = 6, the accuracy level was 92.2%, then for the value k = 8,  
the accuracy of 93.3% and k = 10, the accuracy of 88.9%.  
 
The graph of the test results with some previous data can be seen in Figure 2 below: 
 
 
 
 
 
 
 
 
 
 

 
 
 
 
 
 
 
 
 
 
 
 

Figure 2. The test results of Unsupervised Discretization Implementation on 
 the Naive Bayes method 

Amount of ‘K’ 
value 

Weighted Average 

TP Rate FP Rate Precision Recall F-Measure 

4 0.9 0.1 0.9 0.9 0.9 
6 0.922 0.078 0.922 0.922 0.922 
8 0.933 0.067 0.933 0.933 0.933 
10 0.889 0.111 0.889 0.889 0.889 

Amount of ‘K’ 
value 

Weighted Average 

TP Rate FP Rate Precision Recall F-Measure 

4 0.9 0.1 0.9 0.9 0.9 
6 0.925 0.075 0.925 0.925 0.925 
8 0.933 0.067 0.933 0.933 0.933 
10 0.925 0.075 0.925 0.925 0.925 



LONTAR KOMPUTER VOL. 9, NO. 2, AUGUST 2018                p-ISSN 2088-1541   
DOI : 10.24843/LKJITI.2018.v09.i02.p05  e-ISSN 2541-5832 
Accredited B by RISTEKDIKTI Decree No. 51/E/KPT/2017 
 

112 
 

from the figure 2 above can be seen the results of testing the application of equal-width interval 
discretization on the Naive Bayes method in predicting the suitability of students' majors. In the 
test with 60 sample data, the accuracy value of k = 10 was the best result with 58 successfully 
classified data correctly. Furthermore, in the test with 90 sample data, the best classification 
result is owned by the value of k = 8 with 84 data successfully classified correctly, and the last 
test with 120 sample data, got the best result at value k = 8 where there are 112 data 
successfully classified with correct. 
  
4. Conclusion 

The conclusion that can be summarized in this study is the application of Unsupervised 
Discretization on the Naive Bayes method has quite an impact on the test results, where the 
criteria used for this test are: data on the average value of exact courses, data on the average 
value of non-exact courses, recommendation data and student questionnaire data. And the 
application of Unsupervised Discretization especially equal-width discretization to Naive Bayes 
method in predicting the suitability of the student majors increased from the result of accuracy in 
the previous study by 90% to 93.3%. 
 
5. Acknowledgments 

Researchers would like to thank the Ministry of Research and Technology Higher Education 
Republic of Indonesia (KEMENRISTEKDIKTI) which has helped this research morally and 
financially. 
 
References 
 
[1] A. Saleh, “KLASIFIKASI METODE NAIVE BAYES DALAM DATA MINING UNTUK 

MENENTUKAN KONSENTRASI SISWA ( STUDI KASUS DI MAS PAB 2 MEDAN),” in 
Konferensi Nasional Pengembangan Teknologi Informasi dan Komunikasi (KeTIK) 2014, 
2014, pp. 200–207. 

[2] X. Zhou, S. Wang, W. Xu, G. Ji, P. Phillips, P. Sun, and Y. Zhang, “Detection of 
Pathological Brain in MRI Scanning Based on Wavelet-Entropy and Naive Bayes 
Classifier,” Springer, Cham, 2015, pp. 201–209. 

[3] L. Jiang, C. Li, S. Wang, and L. Zhang, “Deep feature weighting for naive Bayes and its 
application to text classification,” Engineering Application of  Artificial Inteligence, vol. 52, 
pp. 26–39, Jun. 2016. 

[4] B. Tang, S. Kay, and H. He, “Toward Optimal Feature Selection in Naive Bayes for Text 
Categorization,” Feb. 2016. 

[5] A. M. P. and D. S. R., “A sequential naïve Bayes classifier for DNA barcodes,” Stat. Appl. 
Genet. Mol. Biol., vol. 13, no. 4, pp. 1–12, 2014. 

[6] J. Wu, S. Pan, X. Zhu, Z. Cai, P. Zhang, and C. Zhang, “Self-adaptive attribute weighting 
for Naive Bayes classification,” Expert Systems With Application, vol. 42, no. 3, pp. 1487–
1502, Feb. 2015. 

[7] N. Mohamad, N. Jusoh, Z. Htike, and S. Win, “Bacteria identification from microscopic 
morphology using naive bayes,” International Journal of Computer Science, Engineering 
and Information Technology (IJCSEIT ), vol. 4, no. 2, pp. 1–9, 2014. 

[8] Y. Zhang, S. Wang, P. Phillips, and G. Ji, “Binary PSO with mutation operator for feature 
selection using decision tree applied to spam detection,” Knowledge-Based Systems, vol. 
64, pp. 22–31, Jul. 2014. 

[9] S. Kotsiantis, “Integrating Global and Local Application of Naive Bayes Classifier.,” Inter-
national Arab Journal of Information Technology, vol. 11, no. 3, pp. 300–307, 2014. 

[10] S. Palaniappan and T. Kim Hong, “Discretization of continuous valued dimensions in OLAP 
data cubes,” International Journal of Computer Science and Network Security, vol. 8, no. 
11, pp. 116–126, 2008. 

[11] I. Kareem and M. Duaimi, “Improved accuracy for decision tree algorithm based on 
unsupervised discretization,” International Journal of Computer Science and Mobile 
Computing, vol. 3, no. 6, pp. 176–183, 2014. 

[12] G. Forman, “An Extensive Empirical Study of Feature Selection Metrics for Text 
Classification,” The Journal of Machine Learning Research, vol. 3, no. Mar, pp. 1289–1305, 



LONTAR KOMPUTER VOL. 9, NO. 2, AUGUST 2018                p-ISSN 2088-1541   
DOI : 10.24843/LKJITI.2018.v09.i02.p05  e-ISSN 2541-5832 
Accredited B by RISTEKDIKTI Decree No. 51/E/KPT/2017 
 

113 
 

2003. 
[13] Y. Yang and J. Pedersen, “A comparative study on feature selection in text categorization,” 

in 14th International Conference on Machine Learning, 1997, pp. 412–420. 
[14] A. Genkin, D. D. Lewis, and D. Madigan, “Large-Scale Bayesian Logistic Regression for 

Text Categorization,” Technometrics, vol. 49, no. 3, pp. 291–304, Aug. 2007. 
[15] B. Tang and H. He, “ENN: Extended Nearest Neighbor Method for Pattern Recognition 

[Research Frontier],” IEEE Computational Intelligence Magazine, vol. 10, no. 3, pp. 52–60, 
Aug. 2015. 

[16] A. Saleh, “Implementasi Metode Klasifikasi Naïve Bayes Dalam Memprediksi Besarnya 
Penggunaan Listrik Rumah Tangga,” Creat. Inf. Technol. J., vol. 2, no. 3, pp. 207–217, 
2015. 

[17] A. Al-Ibrahim, “Discretization of Continuous Attributes in Supervised Learning algorithms,” 
Res. Bull. Jordan  ACM, vol. 2, no. 4, pp. 158–166, 2011. 

[18] D. Joiţa, “Unsupervised static discretization methods in data mining,” Titu Maiorescu 
University, 2010.