PREDICTING AND ANALYZING THE STUDENTS' LENGTH OF STUDY-TIME USING SUPPORT VECTOR MACHINE

Teny Handhayani1; Lely Hiryanto2

1,2 Computer Science Department, Faculty of Information Technology, Tarumanagara University, Jln. Letjen S. Parman No. 1, DKI Jakarta 11440, Indonesia
1 tenyh@fti.untar.ac.id; 2 lelyh@fti.untar.ac.id

Received: 24th February 2017 / Revised: 17th March 2017 / Accepted: 24th March 2017

Abstract - The length of study-time is an important issue in higher education. The goal of this research was to predict and analyze, at an early stage, the length of study-time of Computer Science students at X University. The research proposed Mutual Information (MI) as the feature selection method and Support Vector Machine (SVM) as the classification method. The experiments were divided into two sections. The first experiment used two class targets, an 'on time group' and a 'late group'; the proposed method produced an accuracy of around 85%. The second experiment used three class targets, an 'on time group', a 'late group', and a 'very late group'; the proposed method produced an accuracy of around 80%. Mutual Information (MI) not only raises the accuracy but also uncovers the relationships between the subjects and the class targets.

Keywords: Support Vector Machine, Mutual Information, length of study-time

I. INTRODUCTION

Students' grades are among the most important pieces of information in academia, and every university stores them in a database. A dataset of students' grades holds useful information: it does not only list the students' transcripts but also contains patterns that can be mined for further analysis. A collection of students' grades can therefore be used to build a system that predicts the students' length of study-time and performance. Predicting students' performance is useful for academic staff and institutions to improve the learning and teaching process (Shahiri et al., 2015). Likewise, predicting the students' length of study-time helps academic staff and institutions assist students in arranging their study plans.

The length of study-time is an important issue in the Indonesian higher education system. It is the duration of study spent by a student from the first semester up to the maximum allowed academic year. According to the government of the Republic of Indonesia (Dirjen Belmawa, 2016), the permitted length of study-time varies by program. Full-time bachelor degree students, for example, need around 3.5 to 7 years to finish their degree: 3.5 years (7 semesters) is the minimum academic duration and 7 years (14 semesters) is the maximum, where a semester lasts around 5 months. Bachelor degree students who fail to finish their study within 7 years are expelled from the university and labeled as dropouts. The Indonesian higher education system usually starts its academic semesters in September and February every year. The length of study-time is not the only criterion for receiving a bachelor degree; there are academic and non-academic requirements that must be fulfilled to graduate. Nevertheless, the length of study-time plays an important role for students and their institutions, and it is one of the criteria used by the government to evaluate the performance of higher education institutions.
Research on the relationship between study-time behavior and academic achievement has been conducted by Ukpong and George (2013). Further research in educational data mining has been done with various methods. Ogunde and Ajibade (2014) predicted the graduation grades of university students using the ID3 decision tree algorithm; they used student data such as sex, entry grade, entrance examination score, and the grade obtained at graduation, and their ID3 algorithm with IF-THEN rules produced an accuracy of 79.56%. Shahiri et al. (2015) reviewed the performance of decision trees with IF-THEN rules, Neural Networks, Naïve Bayes, K-Nearest Neighbor, and Support Vector Machine for predicting students' performance from several features: Cumulative Grade Point Average (CGPA), internal and external assessments, extra-curricular activities, demographics, high school background, social interaction, psychometric factors, and scholarship. They concluded that Neural Networks and decision trees produced higher accuracy than the other methods. Taruna and Pandey (2014) compared the performance of decision trees, Naïve Bayes, Naïve Bayes Tree, K-Nearest Neighbor, and Bayesian Networks for predicting engineering students' grades across four classes. Mouri et al. (2016) used Bayesian Networks to predict students' final grades from e-book log data. Bo et al. (2015) implemented deep learning for predicting the performance of junior high school students, while Liu and Cheng (2016) proposed Machine Learning Feature Selection (MLFS) and Support Vector Machine (SVM) to analyze the academic achievement of elementary school students. Educational data mining for predicting the employability of IT graduates has been studied by Piad et al. (2016), who identified IT core, IT professional, and gender as the variables with significant influence on IT employability; their logistic regression produced an accuracy of 78.4%. There is also work on unsupervised methods: Harwati et al. (2014) applied K-Means to map students' performance using a dataset of gender, national origin, parental job, Grade Point Average (GPA), optimization grade, and the grade of production planning and control, mapping the students into three clusters of low-performing, average, and smart students.

The aim of this research is to develop a computer system that predicts the length of study-time and analyzes the data for a decision support system. The system is expected to predict the length of study-time once the students have finished their fourth semester. The researchers use a dataset from X University; the name of the university is withheld to protect private information. This research focuses on predicting and analyzing bachelor degree students majoring in Computer Science at X University. The Computer Science department was chosen because, according to information from the faculty, some Computer Science students have difficulties in the first and second year, and some of them leave or change their major at this early stage.
Based on this condition, the researchers use the students' grades from the first to the fourth semester. In X University, the length of study-time for a bachelor degree is 3.5 to 7 years, and the university recognizes two groups. The first group consists of students whose length of study-time is about 3.5 to 4 years; it is called the 'on time group'. Students who finish their degree in 5 to 7 years form the 'late group'. Students who need more than 7 years, or who leave their study without completing the requirements, are grouped as 'drop out'.

The research analyzes the list of subjects that have an important effect on the students' outcomes, and proposes Support Vector Machine (SVM) and Mutual Information (MI). SVM is a powerful classifier (Cristianini & Taylor, 2000). It implements the kernel method, which can handle data that are not linearly separable, and it has been successfully applied to predict the performance of faculty members (Deepak et al., 2016). Mutual Information (MI) measures the relationship between two variables and works without being affected by the data distribution (Smith, 2015). Studies on MI for feature selection can be found in Alzubaidi et al. (2016), Gad and Rady (2015), and Li et al. (2015).

This research differs from the related works above. It predicts the length of study-time for Computer Science students based on their grades from the first to the fourth semester. Whereas related works generally estimate students' performance from grades over such a period, this research predicts the graduation time of each student. The result is useful for students and academic staff, especially for academic planning. A further contribution is that this research reveals the list of subjects that contribute most to the length of study-time, as well as the relationships between subjects and their contribution to students' length of study-time.

II. METHODS

This research consists of several main phases. The first phase is feature selection: the researchers apply Mutual Information (MI) to select the appropriate features. After feature selection, the data are divided randomly into training data and testing data. The second phase is predicting the class of length of study-time using Support Vector Machine (SVM): the model is built in the training phase using the training data, and the classification is evaluated on the testing data. The SVM module used is from Scikit-learn (2016). Figure 1 shows the flow chart of this research.

The research uses a dataset from the X University database, covering Computer Science students from 2008 to 2012. The data consist of 240 alumni and 25 subjects. Table 1 lists the 25 subjects from the first to the fourth semester; they are mandatory subjects for Computer Science students, selected on the recommendation of the Head of the Computer Science department. The subjects are the features, and the length of study-time is the class target. Table 2 shows a sample of the data; the values are the weights of students' grades, which range from 0 to 4. Table 3 describes the weights, grades, and annotations of the grades.
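Putting the two phases together, the pipeline described above can be sketched end-to-end with scikit-learn. This is a minimal illustration, not the authors' exact code; the file name, column name, and random seeds are assumptions.

```python
# Minimal sketch of the two-phase pipeline (MI feature selection, then a
# linear-kernel SVM). File and column names are illustrative assumptions.
import pandas as pd
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

data = pd.read_csv("grades.csv")          # 240 alumni x 25 subject grades + class
X = data.drop(columns=["class"]).values   # weights of grades, 0..4
y = data["class"].values                  # length-of-study-time class target

# Phase 1: keep subjects whose MI with the class target is at least
# the average MI score over all 25 subjects.
mi = mutual_info_classif(X, y, random_state=0)
selected = mi >= mi.mean()

# Phase 2: random 70/30 split, then train and test a linear-kernel SVM.
X_tr, X_te, y_tr, y_te = train_test_split(
    X[:, selected], y, test_size=0.3, stratify=y, random_state=0)
clf = SVC(kernel="linear").fit(X_tr, y_tr)
print("accuracy:", clf.score(X_te, y_te))
```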
Figure 1 Flow Chart of the Research

Table 1 List of Subjects

No | Code | Subject | Semester
1 | N1 | Basic Algorithm | 1
2 | N2 | Calculus I | 1
3 | N3 | Discrete Mathematics | 1
4 | N4 | Management and Computer Organization | 1
5 | N5 | Introduction to Computer | 1
6 | N6 | Logic Information | 1
7 | N7 | Advanced Algorithm | 2
8 | N8 | Information Systems | 2
9 | N9 | Linear Algebra | 2
10 | N10 | Statistics 1 | 2
11 | N11 | Digital System | 2
12 | N12 | Operating System | 2
13 | N13 | Human Computer Interaction | 2
14 | N14 | Algorithm Analysis | 3
15 | N15 | Statistics 2 | 3
16 | N16 | Physics Mechanics | 3
17 | N17 | Database | 3
18 | N18 | Graph Theory | 3
19 | N19 | Introduction to Artificial Intelligence | 3
20 | N20 | Object Oriented Programming and Java 1 | 3
21 | N21 | Differential Equations | 4
22 | N22 | Visual Programming using Visual Basic .Net | 4
23 | N23 | Data Structure | 4
24 | N24 | Computer Network 1 | 4
25 | N25 | Physics Electric Wave | 4

Table 2 Sample Data

Student ID | S1 | S2 | S3 | S4 | S5 | Class
ID001 | 2.45 | 2.04 | 2.71 | 2.21 | 2.99 | 1
ID002 | 2.15 | 2.32 | 1.73 | 2.04 | 2.53 | 2
ID003 | 2.24 | 2.10 | 2.50 | 3.40 | 4.00 | 3
ID004 | 1.71 | 1.68 | 2.25 | 2.25 | 3.20 | 3
ID005 | 2.73 | 2.07 | 3.38 | 2.43 | 2.53 | 2
ID006 | 2.50 | 3.04 | 3.13 | 1.50 | 3.00 | 2
ID007 | 2.85 | 2.20 | 3.08 | 2.50 | 2.62 | 2
ID008 | 3.32 | 3.04 | 4.00 | 3.05 | 4.00 | 1
ID009 | 2.94 | 2.40 | 4.00 | 2.51 | 3.53 | 1
ID010 | 3.28 | 2.23 | 4.00 | 2.84 | 3.78 | 1

Table 3 Students' Grade Annotation

No | Score | Grade | Annotation
1 | W = 4 | A | Excellent
2 | 3 ≤ W < 4 | B | Good
3 | 2 ≤ W < 3 | C | Satisfactory
4 | 1 ≤ W < 2 | D | Fair
5 | 0 ≤ W < 1 | E | Failed
*W is the weight of the grade

This research uses two different sets of class targets based on the duration of study: two classes and three classes. The class targets are detailed in Table 4, with the length of study-time measured in years. The two-class grouping follows the rule used in X University to determine the 'on time group' and the 'late group'. The three-class grouping is a suggestion from the researchers: because the late group spans a long range of durations, it is reasonable to define an additional group. The three classes represent the 'on time group', the 'late group', and the 'very late group'.

Table 4 Class Target Criteria

Two Classes | Three Classes
Class 1: 3.5 ≤ Duration ≤ 4 | Class 1: 3.5 ≤ Duration ≤ 4
Class 2: 4.5 ≤ Duration ≤ 7 | Class 2: 4.5 ≤ Duration ≤ 5
 | Class 3: 5.5 ≤ Duration ≤ 7

Mutual Information (MI) measures the relationship between two variables. A high MI score indicates that the two variables have a close relationship, while a low score indicates a weak relationship. Mutual information is computed using equation (1) (Zhang et al., 2012):

$I(X; Y) = \sum_{x \in X} \sum_{y \in Y} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)}$ (1)

MI is used to select the features that have a close relationship with the length of study-time; a feature with a high MI score has a close relationship with the class target. MI is measured for each pair of a feature and the class target, and the average MI score over all features is used as a threshold: features whose MI score with the class target is at least the average are kept, while the others are removed. Figure 2 shows the algorithm of feature selection based on Mutual Information.

Figure 2 Feature Selection Using Mutual Information
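As a concrete illustration of equation (1), the sketch below estimates MI from empirical probabilities after discretizing the grade weights, here crudely into the letter-grade bins of Table 3. The discretization is an assumption for illustration; the paper does not state which MI estimator was used.

```python
# Empirical MI per equation (1), assuming the grade weights are first
# discretized (here into the 0..4 letter-grade bins of Table 3).
import numpy as np
from collections import Counter

def mutual_information(x, y):
    """MI between two equally long sequences of discrete values."""
    n = len(x)
    p_x, p_y, p_xy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(c / n * np.log2((c / n) / ((p_x[a] / n) * (p_y[b] / n)))
               for (a, b), c in p_xy.items())

grades = [2.45, 3.32, 1.71, 2.94, 3.28, 2.15]   # weights of one subject
labels = [1, 1, 2, 1, 1, 2]                     # class targets
bins = [int(w) for w in grades]                 # letter-grade binning
print(mutual_information(bins, labels))
```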
Support Vector Machine (SVM) is an algorithm introduced by Vapnik (Cristianini & Taylor, 2000). It can be used for classification and regression. SVM handles nonlinearly separable data through the kernel method, which maps the data into a high-dimensional space, and it computes the optimal hyperplane that separates the dataset with minimum error (Cristianini & Taylor, 2000).

The original SVM classifies data into two classes, +1 and -1. For instance, let the training data be $\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$, where $\mathbf{x}_i$ is the feature vector and $y_i \in \{+1, -1\}$ is the class label of $\mathbf{x}_i$. A hyperplane can be described by equation (2) (Liu & Zheng, 2005). If the training data are linearly separable, SVM creates the optimal hyperplane separating the two classes subject to equation (3) (Suykens et al., 2002). If the data are not linearly separable, slack variables $\xi_i$ and a mapping $\phi$ into a high-dimensional feature space are introduced, as in equation (4):

$\mathbf{w} \cdot \mathbf{x} + b = 0$ (2)

$y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \geq 1, \quad i = 1, \ldots, N$ (3)

$y_i(\mathbf{w} \cdot \phi(\mathbf{x}_i) + b) \geq 1 - \xi_i, \quad \xi_i \geq 0$ (4)

Figure 3 illustrates the optimal hyperplane of SVM. In Figure 3(a), the data are perfectly separated by a linear hyperplane; Figure 3(b) illustrates the kernel method separating nonlinearly separable data.

Figure 3 Linear and Non-Linear Hyperplane on SVM

Although SVM was originally designed for two class targets, it has been extended to handle more than two class targets (multi-class classification). The common algorithms for multi-class SVM are one-against-all, one-against-one, and Directed Acyclic Graph SVM (Hsu & Lin, 2002). One-against-one SVM creates k(k-1)/2 binary classifiers, where k is the number of classes; each hyperplane is constructed from a pair of classes chosen from the k classes. For instance, with 3 classes, one-against-one SVM creates 3 classifiers. Table 5 shows the one-against-one SVM classifier pairs (Liu, Wang, & Zheng, 2007).

Table 5 One-against-One SVM

Class A | Class B
Class A | Class C
Class B | Class C

In one-against-all SVM, given N data points $\{(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N)\}$, where $\mathbf{x}_i$ is the feature vector and $y_i \in \{1, 2, \ldots, M\}$ is the (multi-class) label of $\mathbf{x}_i$, M binary SVM classifiers are created, each of which separates one class from all the other classes. The i-th SVM is trained with all training data of the i-th class given a positive label, and all other data given a negative label (Liu & Zheng, 2005). Table 6 shows the one-against-all SVM classifiers.

Table 6 SVM One-Against-All

Class A | Non-Class A
Class B | Non-Class B
Class C | Non-Class C
... | ...
Class M | Non-Class M
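Both multi-class strategies are available as scikit-learn wrappers. The following is a sketch under the assumption that the split X_tr, X_te, y_tr, y_te from the earlier pipeline sketch is reused, with y holding the three class targets of Table 4; SVC with a linear kernel could equally serve as the base estimator.

```python
# Sketch of the two multi-class strategies around a linear SVM.
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import LinearSVC

strategies = {
    "one-vs-one": OneVsOneClassifier(LinearSVC()),    # k(k-1)/2 classifiers
    "one-vs-rest": OneVsRestClassifier(LinearSVC()),  # one classifier per class
}
for name, clf in strategies.items():
    clf.fit(X_tr, y_tr)                               # three-class targets
    print(name, "accuracy:", clf.score(X_te, y_te))
```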
III. RESULTS AND DISCUSSIONS

In the feature selection step, the researchers use MI to select the features with a strong relationship to the length of study-time. The MI score is computed between each feature and the length of study-time; a high MI score indicates a strong relationship. After computing the MI scores for all 25 features, the average MI score is 0.24, so the feature selection keeps the features with an MI score ≥ 0.24. The outcome of the feature selection phase is the 12 subjects shown in Table 7. Discrete Mathematics has the highest MI score, indicating the strongest relationship with the length of study-time.

Table 7 Feature Selection Result

No | Code | Subject | MI Score
1 | N3 | Discrete Mathematics | 0.30
2 | N11 | Digital System | 0.29
3 | N7 | Advanced Algorithm | 0.28
4 | N16 | Physics Mechanics | 0.28
5 | N1 | Basic Algorithm | 0.27
6 | N10 | Statistics 1 | 0.27
7 | N24 | Computer Network 1 | 0.27
8 | N12 | Operating System | 0.27
9 | N19 | Introduction to Artificial Intelligence | 0.26
10 | N20 | Object Oriented Programming and Java 1 | 0.26
11 | N23 | Data Structure | 0.25
12 | N21 | Differential Equations | 0.24

The researchers conduct two experiments: the first uses two class targets, and the second uses three class targets, as defined in Table 4 based on the length of study-time. The system is developed using the scikit-learn module (Scikit-learn, 2016). The dataset consists of 240 instances and 25 subjects (features). Table 8 shows the data distribution over the lengths of study-time: 69.17% of the students fall in the 'on time group' and 30.83% in the 'late group'. The experiments are repeated 50 times, each time randomly selecting 70% of the data for training and 30% for testing while preserving the distribution of each class, for fairness. Within each repetition, the training and testing data are chosen once, so the runs before and after feature selection use the same training and testing data; this protocol is applied to all algorithms.

Table 8 Data Distribution

No | Length of Study-Time (years) | Number of Instances
1 | 3.5 | 45
2 | 4.0 | 121
3 | 4.5 | 44
4 | 5.0 | 11
5 | 5.5 | 10
6 | 6.0 | 7
7 | 6.5 | 0
8 | 7.0 | 2

The results of the first experiment are shown in Table 9. A decision tree and Gaussian Naïve Bayes are used as baselines for the SVM. The results show that feature selection using MI improves the accuracy only slightly; this holds for SVM, the decision tree, and Gaussian Naïve Bayes. The largest accuracy increase, around 2%, is reached by SVM, which also shows the best accuracy of the three methods.

Table 9 Experiment Result of Two Classes

No | Method | Before FS: Avg. Acc. | Before FS: Std. | After FS: Avg. Acc. | After FS: Std.
1 | SVM Linear Kernel | 83.64% | 0.04 | 85.72% | 0.04
2 | Decision Tree | 79.39% | 0.04 | 80.97% | 0.04
3 | Gaussian Naïve Bayes | 84.33% | 0.04 | 85.03% | 0.04

The second experiment uses the three class targets defined by the researchers. The SVM multi-class classification is taken from scikit-learn, using two methods, one-vs-one SVM and one-vs-rest SVM, both with a linear kernel. The results again show a slight rise in accuracy after feature selection: the accuracy of SVM increases by about 3%, while the accuracies of the decision tree and Gaussian Naïve Bayes rise by only 0.33%. Both experiments produce a small standard deviation of the accuracy, showing that the accuracy remains stable across repetitions. Table 10 shows the results of the second experiment.

Table 10 Experiment Result of Three Classes

No | Method | Before FS: Avg. Acc. | Before FS: Std. | After FS: Avg. Acc. | After FS: Std.
1 | SVM One-vs-One | 77.2% | 0.05 | 80.58% | 0.03
2 | SVM One-vs-Rest | 77.2% | 0.04 | 80.82% | 0.04
3 | Decision Tree | 76.41% | 0.05 | 76.74% | 0.04
4 | Gaussian Naïve Bayes | 78.88% | 0.05 | 79.21% | 0.05

In both experiments, the researchers prefer the linear kernel for SVM because it is the simplest kernel method and requires no kernel parameter tuning, which would need further research. In both experiments, SVM reaches the best accuracy among the compared methods. This might be because the dataset is not linearly separable in the input space: SVM maps the dataset into feature vectors in a high-dimensional space, so data that are impossible to separate in the input space can be classified properly there.
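The evaluation protocol above, 50 repetitions of a stratified 70/30 split with each split reused before and after feature selection, might be sketched as follows; variable names (X, y, selected) continue those of the earlier pipeline sketch.

```python
# Sketch of the evaluation protocol: 50 stratified 70/30 splits, with
# each split reused before and after feature selection for fairness.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

all_feats = np.ones(X.shape[1], dtype=bool)      # "before feature selection"
accs = {"before": [], "after": []}
for seed in range(50):
    tr, te, y_tr, y_te = train_test_split(
        np.arange(len(y)), y, test_size=0.3, stratify=y, random_state=seed)
    for key, mask in [("before", all_feats), ("after", selected)]:
        clf = SVC(kernel="linear").fit(X[np.ix_(tr, mask)], y_tr)
        accs[key].append(clf.score(X[np.ix_(te, mask)], y_te))

for key, a in accs.items():
    print("%s FS: mean acc %.2f%%, std %.2f" % (key, 100 * np.mean(a), np.std(a)))
```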
To analyze the relationships among the subjects themselves, the researchers compute MI scores between each pair of subjects. The average pairwise MI score is around 0.7, showing that some subjects have a strong relationship with others. A network is built containing the subject pairs with an MI score ≥ 0.8, and the interesting subjects are those with degree ≥ 3 in this network. Figure 4 shows that Advanced Algorithm has the highest degree in the network: it influences subjects such as Data Structure, Database, Physics Mechanics, Physics Electric Wave, Introduction to Artificial Intelligence, Differential Equations, and Object Oriented Programming and Java 1, and it is in turn affected by Basic Algorithm, Management and Computer Organization, and Introduction to Computer. Introduction to Artificial Intelligence has the second highest degree, with close relationships to Management and Computer Organization, Introduction to Computer, Advanced Algorithm, and Object Oriented Programming and Java 1.

The highest MI score, 0.85, is between Advanced Algorithm and Object Oriented Programming and Java 1, while the MI score between Basic Algorithm and Advanced Algorithm is 0.83. The network thus shows a direct relationship between Basic Algorithm and Advanced Algorithm, and between Advanced Algorithm and Object Oriented Programming and Java 1, as well as an indirect relationship between Basic Algorithm and Object Oriented Programming and Java 1. This might be explained by the rules of the Computer Science department in X University: students are allowed to enroll in the Advanced Algorithm class only after successfully passing Basic Algorithm, and they must get a minimum grade of C in Advanced Algorithm to enroll in Object Oriented Programming and Java 1.

Figure 4 Mutual Information Network among Selected Subjects
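A sketch of how such a network can be derived from the grade matrix follows. It assumes the X matrix from the earlier sketches, uses mutual_info_regression as one possible MI estimator for continuous grade pairs (the paper does not name its estimator), and labels nodes with the subject codes of Table 1.

```python
# Sketch of the subject-relationship network: pairwise MI between subject
# grades, keeping edges with MI >= 0.8 and listing subjects of degree >= 3.
from itertools import combinations
from collections import defaultdict
from sklearn.feature_selection import mutual_info_regression

subjects = ["N%d" % (k + 1) for k in range(X.shape[1])]  # codes as in Table 1
edges, degree = [], defaultdict(int)
for i, j in combinations(range(X.shape[1]), 2):
    score = mutual_info_regression(X[:, [i]], X[:, j], random_state=0)[0]
    if score >= 0.8:                       # keep only strong relationships
        edges.append((subjects[i], subjects[j], score))
        degree[subjects[i]] += 1
        degree[subjects[j]] += 1

hubs = sorted(degree, key=degree.get, reverse=True)
print([(s, degree[s]) for s in hubs if degree[s] >= 3])  # high-degree subjects
```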
Figure 5 shows the scatter plot of the selected subjects after feature selection, relating the weight of the grade to the length of study-time. C1, C2, and C3 are the groups of Class 1, Class 2, and Class 3 explained in Table 4. The data are taken from the first outcome of the academic report, so some students have a grade weight below 2.0, which is the standard for passing a subject. Students whose grade weight is below 2.0 have to re-enroll in the subject in a later semester, and re-enrolling in the same subject usually lengthens the study-time. In Figure 5, the students who graduate on time (in at most 4 years) mostly have high grade weights, while among the students who graduate in more than 4 years, some have grade weights below 2.0, meaning they need a longer time to finish their study.

Figure 5 Scatter Plot of Selected Features

In addition, the researchers analyze the grades per subject to find more information. For Statistics 1, Physics Mechanics, and Advanced Algorithm, the percentage of low grades exceeds 20%, while Differential Equations, Introduction to Artificial Intelligence, and Object Oriented Programming and Java 1 have around 10% to 16% of grade weights ≤ 2.0. By reaching the minimum required grade in those subjects in the early semesters, students improve their chances of graduating on time. The low grades in particular subjects might, however, be caused by the content of the courses, the instructors' performance, and the background of the students. Table 11 lists the subjects with their percentages of grade weights ≤ 2.0.

Table 11 List of Subjects with Weight of Grade ≤ 2.0

Group 1 | Group 2 | Subject
8.33% | 17.08% | Statistics 1
6.25% | 17.50% | Physics Mechanics
3.75% | 16.67% | Advanced Algorithm
5.83% | 10.83% | Differential Equations
3.33% | 8.75% | Introduction to Artificial Intelligence
1.67% | 10.00% | Object Oriented Programming and Java 1
2.08% | 6.25% | Basic Algorithm
0.83% | 6.67% | Digital System
0.83% | 5.42% | Discrete Mathematics
1.25% | 3.33% | Data Structure
0.00% | 2.92% | Computer Network 1
0.42% | 0.42% | Operating System

Moreover, feature selection based on MI increases the accuracy only slightly. In Table 7, the MI score between each feature and the length of study-time is less than 0.5, which means the features do not have a strong relationship with the length of study-time. In fact, the length of study-time is affected not only by the subjects from the first to the fourth semester but also by the subjects in later semesters. There are also non-academic factors that contribute to the length of study-time, such as personal identity, students' background, demographics, and psychology. Those are excluded from this research, but they are important information about the students.

IV. CONCLUSIONS

This research predicts and analyzes the length of study-time of the Computer Science students in X University. The researchers use a dataset of the grade weights of particular subjects and the length of study-time (in years). Mutual Information (MI) is implemented to select the subjects that contribute most to the length of study-time, and Support Vector Machine (SVM) to predict the length of study-time. The outcome of the feature selection process is 12 subjects. The experiments are done in two sections. In the first experiment, with two class targets, SVM produces an accuracy of 83.64%; after feature selection, the accuracy of the proposed method reaches 85.72%. In the second experiment, with the proposed three class targets, the accuracy of SVM is around 77% before and 80% after feature selection. The performance of the proposed method is higher than that of the decision tree and Gaussian Naïve Bayes. Feature selection using MI successfully selects the subjects that have a close relationship with the class target, and it can also be used to detect the list of subjects that contribute most to the length of study-time. Future research should include the non-academic factors that might determine the length of study-time, and further study is needed to analyze the main causes of the low grades in particular subjects.

REFERENCES

Alzubaidi, A., Cosma, G., Brown, D., & Pockley, A. G. (2016). Breast cancer diagnosis using a hybrid genetic algorithm for feature selection based on mutual information. In 2016 International Conference on Interactive Technologies and Games (ITAG) (pp. 70-76). IEEE.

Bo, G., Rui, Z., Guang, X., Chuangming, S., & Li, Y. (2015). Predicting students performance in educational data mining. In 2015 International Symposium on Educational Technology (ISET) (pp. 125-128). IEEE.
Cristianini, N., & Taylor, J. (2000). An introduction to Support Vector Machines and other kernel-based learning methods. New York: Cambridge University Press.

Deepak, E., Pooja, G. S., Jyothi, R. N., Kumar, S. V., & Kishore, K. V. (2016). SVM kernel based predictive analytics on faculty performance evaluation. In 2016 International Conference on Inventive Computation Technologies (ICICT) (pp. 1-4). IEEE.

Dirjen Belmawa. (2016). Direktorat Jenderal Pembelajaran dan Kemahasiswaan, Kemristekdikti. Retrieved February 22nd, 2017, from http://belmawa.ristekdikti.go.id/2016/03/04/kemristekdikti-sosialisasikan-permen-nomor-44-tahun-2015-tentang-sn-dikti/

Gad, W., & Rady, S. (2015). Email filtering based on supervised learning and mutual information feature selection. In 2015 Tenth International Conference on Computer Engineering & Systems (ICCES) (pp. 147-152). IEEE.

Harwati, Alfiani, A. P., & Wulandari, F. A. (2014). Mapping student's performance based on data mining approach. In The 2014 International Conference on Agro-industry (ICoA): Competitive and Sustainable Agroindustry for Human Welfare (pp. 173-177). Elsevier.

Hsu, C. W., & Lin, C. J. (2002). A comparison of methods for multiclass support vector machines. IEEE Transactions on Neural Networks, 13(2), 415-425.

Li, Y., Ma, X., & Yang, M. (2015). Improved feature selection based on normalized mutual information. In 2015 14th International Symposium on Distributed Computing and Applications for Business Engineering and Science (DCABES) (pp. 518-522). IEEE.

Liu, W. X., & Cheng, C. H. (2016). A hybrid method based on MLFS approach to analyze students' academic achievement. In 12th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD) (pp. 1625-1630). IEEE.

Liu, Y., & Zheng, Y. F. (2005). One-against-all multi-class SVM classification using reliability measures. In International Joint Conference on Neural Networks (pp. 849-854). Montreal: IEEE.

Liu, Y., Wang, R., & Zheng, Y. S. (2007). An improvement of one-against-one method for multi-class Support Vector Machine. In Sixth International Conference on Machine Learning and Cybernetics (pp. 2915-2920). Hong Kong: IEEE.

Mouri, K., Okubo, F., Shimada, A., & Ogata, H. (2016). Bayesian network for predicting students' final grade using e-book logs in university education. In 2016 IEEE 16th International Conference on Advanced Learning Technologies (ICALT) (pp. 85-89). IEEE.

Ogunde, A. O., & Ajibade, D. A. (2014). A data mining system for predicting university students' graduation grades using ID3 decision tree algorithm. Journal of Computer Science and Information Technology, 2(1), 21-46.

Piad, K. C., Dumlao, M., Ballera, M. A., & Ambat, S. C. (2016). Predicting IT employability using data mining techniques. In 2016 Third International Conference on Digital Information Processing, Data Mining, and Wireless Communications (DIPDMWC) (pp. 26-30). IEEE.

Scikit-learn. (2016). Scikit-learn. Retrieved December 10th, 2016, from http://scikit-learn.org/stable/

Shahiri, A. M., Husain, W., & Rashid, N. A. (2015). A review on predicting student's performance using data mining techniques. In The Third Information Systems International Conference (pp. 414-422). Procedia Computer Science.

Smith, R. (2015). A mutual information approach to calculating nonlinearity. Stat, 4(1), 291-303.

Suykens, J. A., Gestel, T. V., Brabanter, J. D., Moor, B. D., & Vandewalle, J. (2002). Least Squares Support Vector Machines. London: World Scientific.

Taruna, S., & Pandey, M. (2014). An empirical analysis of classification techniques for predicting academic performance. In 2014 IEEE International Advance Computing Conference (IACC) (pp. 523-528). IEEE.

Ukpong, D. E., & George, I. N. (2013). Length of study-time behaviour and academic achievement of social studies education students in the University of Uyo. International Education Studies, 6(3), 172.

Zhang, X., Zhao, X. M., He, K., Lu, L., Cao, Y., Liu, J., ... & Chen, L. (2012). Inferring gene regulatory networks from gene expression data by path consistency algorithm based on conditional mutual information. Bioinformatics, 28(1), 98-104.