Knowledge Engineering and Data Science (KEDS) pISSN 2597-4602 Vol 6, No 1, April 2023, pp. 24–40 eISSN 2597-4637 https://doi.org/10.17977/um018v6i12023p24-40 ©2023 Knowledge Engineering and Data Science | W : http://journal2.um.ac.id/index.php/keds | E : keds.journal@um.ac.id This is an open access article under the CC BY-SA license (https://creativecommons.org/licenses/by-sa/4.0/)

Exploring the Impact of Students' Demographic Attributes on Performance Prediction through Binary Classification in the KDP Model

Issah Iddrisu a,1,*, Peter Appiahene a,2, Obed Appiah a,3, Inusah Fuseini b,4
a University of Energy and Natural Resources, Post Office Box 214, Sunyani, Ghana
b University for Development Studies, Unnamed Road, Tamale, Ghana
1 issah.iddrisu.stu@uenr.edu.gh*; 2 peter.appiahene@uenr.edu.gh; 3 obed.appiah@uenr.edu.gh; 4 obed.appiah@uenr.edu.gh
* corresponding author

ARTICLE INFO
Article history: Received 9 March 2023; Revised 20 March 2023; Accepted 21 April 2023; Published online 30 April 2023
Keywords: Student Demographic; Performance Prediction; Classification; KDP Model

ABSTRACT
This research used binary classification and the Knowledge Discovery Process (KDP), supported by the experimental and analytical capabilities of RapidMiner's 9.10.010 environment, with five different classifiers. The analysis covered 2334 records, 17 attributes, and one class variable containing the students' average score for the semester. Twenty experiments were carried out, using 10-fold cross-validation and ratio-split validation together with bootstrap sampling. Five methods were evaluated: Random Forest (RF), Rule Induction (RI), Naive Bayes (NB), Logistic Regression (LR), and Deep Learning (DL). RF outperformed the other four methods on all six selection measures, with an accuracy of 93.96%. According to the RF classifier model, the education level of a child's parents is a major factor in that child's academic performance before entering higher education.

I. Introduction
Learner assessment is central to determining students' progress in every educational establishment. Evaluating students' performance, however, has become a daunting task, as more factors are now involved in determining student achievement due to the paradigm shift taking place in the educational sector: the use of learning management systems (LMS), student information systems (SIS), and educational management information systems (EMIS). The data produced by these systems tend to overwhelm educational decision-makers due to the diversity and massive volume of data housed by these sources. However, recent research improvements have made powerful computational prediction methods and techniques, such as machine learning, a realistic alternative for various applications, including Educational Decision Support Systems (EDSS).

Machine learning (ML) is one way to help decipher the intricate relationship between students' data and their performance. When implemented correctly in learning environments, machine learning will improve our knowledge of fundamental processes by simplifying the identification, extraction, and evaluation of the underlying factors affecting student learning and achievement levels. Much progress has been made in applying machine learning in other fields such as medicine, commerce, the transport industry, bioinformatics, road traffic detection and control, and diverse other areas where decision-making is crucial [1]. ML involves searching through many possible hypotheses to ascertain the most appropriate and relevant one, and then comparing it with existing data generated by the learner. The idea of machine learning is derived from various disciplines, such as probability and statistics, computational complexity, information theory, neurology, evolutionary theories, and models [2].

The ML design approach rests on several criteria: identifying the training experience to learn from, the exact function to learn, a representation for that function, and the optimal algorithm for learning it from the training examples. Commonly used ML algorithms include decision trees (DT), Support Vector Machines (SVM), Artificial Neural Networks (ANN), Logistic Regression (LR), Naïve Bayes (NB), and rule induction (RI) algorithms.

Similar to the other fields where ML has been successfully employed, its application to educational data is a promising research area known as Educational Data Mining (EDM). It involves creating processes to extract patterns embedded in datasets within educational settings [3]. This concept has been implemented to improve and assess educational activities and decision-making. Prediction, which encompasses the subcategories of classification, regression, and density estimation, is one paradigm in EDM [4]. Clustering is another, while relationship mining covers association mining, correlation mining, sequential pattern mining, and causative data mining [5]. In addition, EDM incorporates data distillation to aid human judgment and model discovery. EDM has proven to be a primary source of solid and dependable data analysis for educational decision-making at a country's educational institutions [6][7]. It carefully identifies education challenges to determine appropriate solutions that address them. The inclusion of an Expert System in managing primary education due to EDM has been enumerated in [8] and [6]. Educational Data Mining has been used to track the academic welfare of students and the general administrative procedures of educational institutions worldwide [9][10].

It is essential to be aware of the factors (also known as the predictor variables) that influence students' academic performance in order to comprehend and enhance the current state of the educational system [11]. Therefore, determining the characteristics associated with students' academic accomplishment has always aroused the interest of academics working in EDM. Many earlier studies dissected this phenomenon by isolating one variable at a time, attempting to investigate the relationship between a single element and its impact on academic accomplishment by collecting data, the majority of which was obtained using survey-type instruments.
Previous research has sought to determine the primary elements or characteristics that influence learner achievement, including the algorithms that produce the best prediction results. Students' apparent poor performance in numerous educational establishments has been attributed to various predictors [12][13], including personal characteristics, intellectual ability, gender and aptitude tests, academic achievement, previous college accomplishments, and demographic characteristics [14].

One study modeled students' academic performance based on their cognitive and non-cognitive characteristics [11]. Seven heterogeneous ML classifiers were employed: DT, K-Nearest Neighbors (KNN), ANN, LR, RF, AdaBoost, and SVM, with 10-fold and leave-one-out cross-validation used to evaluate the selected classifiers' predictive performance. The student's absent days (SAD) were the dominant feature for predicting students' academic success, and RF, LR, and ANN were found viable for predicting students' performance.

Another implementation of ML determined students' academic achievement from internal assessment data by constructing an ANN-based prediction model [15]. The best classification accuracy attained by the model was 95.34% through the ANN. Furthermore, the precision, recall, F-score, accuracy, and Kappa statistics were derived as rule-based decision specifications to discover the most practical classification methods. However, the study presented inconsistent observations on which specific machine learning model is most accurate in predicting students' performance.

A further study investigated the factors affecting students' performance at the postgraduate level, using an ANN to construct the model [16]. The study presented a deep-learning model for performance prediction based on the records of 395 postgraduate students, each with 30 attributes, within the R data mining environment. A comparison of the accuracy of LR, the RF technique, and the ANN revealed that LR performed with 12.339% accuracy, RF gave an accuracy of 28.101%, and the ANN had an accuracy of 97.429% on the given dataset. With this prediction accuracy, it was concluded that ANN is more reliable and demonstrates improved classification results compared with other traditional classifiers. The dataset used in the study was based on attributes from institutions of higher learning; it would be interesting to apply the same model to datasets of pre-tertiary institutions to validate the model's generalization.

Another work investigated the prediction of students' learning outputs and explored the likelihood of recognizing the critical features in the data to be used in creating the prediction model, using visualization and clustering algorithm techniques [17]. The outcome demonstrated the capability of the clustering algorithm in classifying significant indicators within the datasets. In addition, the study showed the efficiency of SVM and Linear Discriminant Analysis (LDA) algorithms in training educational datasets while giving satisfactory classification accuracy and reliability test rates. However, the small dataset cannot be generalized to prove the model's efficacy on all educational datasets. Three different ML techniques were also used to forecast student performance [18]; the DT, NB, and LR classifiers were employed there.
Feature engineering criteria and the modification and selection of dataset characteristics were applied to enhance the predictions made by the ML algorithms. The dataset used was split into two separate categories. The research findings suggest that using ML to anticipate student performance may be helpful: the most successful method on the first dataset was NB classification, with 98% accuracy, while DT did better on the second batch of data, with 78% accuracy. The study could not identify the specific attributes and techniques capable of determining future learning outcomes, presenting a conceptual vacuum that warrants further investigation.

Studies on the relationships between the instructional strategies employed by instructors and educators and their impact on students' academic performance have recently attracted more attention. Most research focuses on achievement due to the use of assessment techniques such as class tests, homework, class exercises, project work, and semester examinations [19]. When predicting a student's future academic success, past grades from an academic institution are seen to carry appropriate weight, as enumerated by [20], mainly when those grades come from continuous assessment, which shows a student's early mastery of a topic and progress in the study. One study explored the efficacy of assessments using examination techniques, class tests, assignments, and mid-semester quizzes, including the influence of lecturer response on students' performance [21]; its outcome revealed a correlation between the assessments students took and, eventually, the students' final grades. Another investigation, exploring the relevance of formative assessment for improving the prediction of learner grades in examinations, suggested the possibility of identifying students who may perform poorly in their final examinations and of forecasting, with a degree of accuracy, how a student will perform in an end-of-course examination [22]. Giving students timely assessment feedback often results in a small improvement in final grades [23].

Another study assessed the validity of previous achievements in determining students' performance in higher education [24]. The high school Scholastic Assessment Test (SAT) scores and the early years' university grades were considered possible predictors of future performance, and the impact of subjects on students' advanced placements was also investigated. The findings clearly connected these three characteristics with students' university accomplishments.

Among the factors that influence students' performance are school effects, socio-economic background, and personal traits [12]. Student background characteristics such as education levels, the profession of parents/guardians, and place of residence all play an essential part in defining students' success (Tinto, 1975) [24]. This is further corroborated by work referring to students' academic success as "a one-hundred-factor problem," as many researchers focused on different aspects of students' performance in different periods and came to diverse conclusions [25]. A study examining the impact of socio-economic background on the upbringing of students and the final results of their education found that students from privileged backgrounds attained higher grades or had skills that proved valuable within the academic setting [26].
This suggests that the level of poverty and even the area students come from can affect a student's academic output; a student's home environment is thus a contributory factor to their performance. In Serbia, demographic features including gender, ethnicity, and the students' school background were investigated to determine which among them had more influence on students' academic performance in Mathematics and the Serbian language [27]. The results indicated that student affluence contributed the most to poor mathematics performance, whereas Serbian language grades were less affected. Gender had a relatively minimal effect on the grades, suggesting that gender has less effect on students' performance.

Integrating demographic data alongside school results is recommended because learner achievement is otherwise judged almost entirely on students' past exam results, mostly without consideration of the setting in which those results were accomplished [28]. Again, research on student achievement and its associations with context-specific background variables and attainment in broader terms has been largely limited [12][13]; hence the need to delve into the correlation between students' performance and their demographic variables. Moreover, the literature in this regard has failed to provide remedies or intervention strategies based on traits identifiable early in a student's program of study. As a result, the goal of this research is to apply ML to students' demographic characteristics to track their achievements, and to design a classification model capable of mapping student features to performance, in order to effectively implement the Ministry of Education's (MOE) flagship early intervention scheme for improving underperforming students' academic achievements in schools. The paper aims to identify and apply ML algorithms to uncover the key demographic factors that influence newly admitted students' academic achievement, and to identify students who should receive appropriate academic intervention so that overall school performance can be scaled up in the West African Senior School Certificate Examination (WASSCE). The research aims to examine and address the following questions:

1. Which machine learning classification algorithms are more viable in predicting students' academic attainment based on their demographic attributes?
2. What primary demographic attributes influence students' academic performance at Ghana's Senior High School (SHS) level?

II. Methods
This study employed the experimental research approach using binary classification techniques based on the six-step KDP model. The classification technique was used to sort the students into those in need of intensive intervention and those in need of low intervention. We employed secondary data from two sources. From the placement forms of students in the Computerized School Selection and Placement System (CSSPS), the demographic data, Basic Education Certificate Examination (BECE) average score, and previous school data were extracted, while the semester average score and the grades in English Language, Mathematics, and Integrated Science for Senior High School (SHS) performance were extracted from the Students' Information System (SIS).
Also, on the suggestion of the domain expert (the ICT coordinator of Tamale Islamic Science Senior High School (TISSEC)), the following student attributes were considered helpful for the task at hand: mother's education level, father's education level, sponsor of the student's education, the birth position of the student in the family, and parental status of students. This study used 1854 records and 17 common attributes (including the class attribute) for training and evaluating the various models. The description of students' features used in the study is summarized in Table 1.

Table 1. Attributes extracted from the database
No. | Attribute Name | Data Value | Data Type
1 | Gender | [1]: Male = M, [2]: Female = F | Nominal
2 | Student position in the family | [1]: 1st born, [2]: last born, [3]: others, [4]: only child | Numeric
3 | Parents' marital status | [1]: married, [2]: single, [3]: widowed | Nominal
4 | Father's Edu. | [1]: Primary school, [2]: Junior High School (JHS), [3]: Secondary school (SHS), [4]: Tertiary, [5]: None | Nominal
5 | Mother's Edu. | [1]: Primary school, [2]: Junior High School (JHS), [3]: Secondary school (SHS), [4]: Tertiary, [5]: None | Nominal
6 | Father's Occ. | [1]: Retired, [2]: Government, [3]: private sector employee, [4]: self-employment, [5] | Nominal
7 | Mother's Occ. | [1]: Retired, [2]: Government, [3]: private sector employee, [4]: self-employment, [5] | Nominal
8 | Sponsor | [1]: Self, [2]: Parent, [3]: scholarship, [4]: others | Nominal
9 | Residential Status | [1]: Boarding, [2]: Day | Nominal
10 | Type of JHS attended | [1]: Private, [2]: Public | Nominal
11 | BECE Aggregate | [1]: 6-8, [2]: 9-11, [3]: 12-15, [4]: 16-19, [5]: 20-24, [6]: 25-30, [7]: above 30 | Numeric
12 | BECE Accumulated raw score | [1]: 50-100, [2]: 101-200, [3]: 201-300, [4]: 301-400 | Numeric
13 | First Semester AVG score | [1]: 00-45, [2]: 46-50, [3]: 51-55, [4]: 56-60, [5]: 61-65, [6]: 66-70, [7]:, [8]: 71-79, [9]: 80 and above | Numeric
14 | Region of Residence | [1]: N/R, [2]: A/R, [3]: G/R, [4]: C/R, [5]: U/W, [6]: U/E, [7]: S/R, [8]: NE/R, [9]: W/R, [10]: E/R, [11]: V/R, [12]: O/R, [13]: Ahafo/R, [14]: Bono East, [15]: Bono, [16]: WN/R | Nominal
15 | Integrated Science | [1]: A1, [2]: B2, [3]: B3, [4]: C4, [5]: C5, [6]: C6, [7]: D7, [8]: E8, [9]: F9 | Nominal
16 | English Language | [1]: A1, [2]: B2, [3]: B3, [4]: C4, [5]: C5, [6]: C6, [7]: D7, [8]: E8, [9]: F9 | Nominal
17 | Mathematics | [1]: A1, [2]: B2, [3]: B3, [4]: C4, [5]: C5, [6]: C6, [7]: D7, [8]: E8, [9]: F9 | Nominal

A. Dataset Optimization and Feature Extraction
Primary and real-world data will invariably contain class-imbalance challenges [29]. Whenever the number of instances of one class (the minority class) is significantly lower than that of the other classes (the majority class), the classifier tends to neglect the minority class, which carries the highest error cost in learning [30]. The Synthetic Minority Oversampling Technique (SMOTE) with default settings was used as a sampling technique to upscale the minority class and manage the class imbalance within the dataset. The upsampling synthetically increased the number of minority-class records by 79% within the local repository of RapidMiner after the SMOTE application.

Since not all attributes have equal significance for prediction within a defined dataset, feature extraction and ordering are critical. Given this, the attributes were sorted by information-gain weight, as seen in Table 2; a minimal code sketch of the upsampling and weighting steps is shown below.
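The following is a hedged Python sketch of this preprocessing, standing in for RapidMiner's SMOTE Upsampling and "Weight by Information Gain" operators: the file and column names are hypothetical, imbalanced-learn's SMOTE is used at default settings as in the study, and scikit-learn's mutual information approximates the information-gain weighting.

```python
# Hedged sketch: SMOTE upsampling followed by attribute weighting.
# "students_encoded.csv" and the "intervention" label column are assumed
# names, not the actual CSSPS/SIS export used in the study.
import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.feature_selection import mutual_info_classif

df = pd.read_csv("students_encoded.csv")            # integer-encoded attributes
X, y = df.drop(columns=["intervention"]), df["intervention"]

# Upscale the minority class with SMOTE at default settings.
X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)

# Rank the attributes against the class label; mutual information plays the
# role of RapidMiner's information-gain weights (cf. Table 2).
gains = mutual_info_classif(X_res, y_res, discrete_features=True, random_state=42)
ranking = pd.Series(gains, index=X.columns).sort_values(ascending=False)
print(ranking)
```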
The operator "Weight by Information Gain", was used in RapidMiner to determine the order of the attributes. Figure 1 depicts the descending order of information gained from common attributes to class attributes. Table 2. Attributes weights by information gain No. Attribute Information Gain No. Attribute Information Gain 1 Mother Education 0.157 9 Region of Residence 0.016 2 Father Education 0.139 10 Age 0.013 3 BECE Raw Score 0.131 11 JHS Location 0.012 4 BECE Aggregate 0.123 12 Sponsor 0.003 5 Father Occupation 0.083 13 Student Birth Position 0.001 6 Mother Occupation 0.030 14 Parent Marital Status 0.000 7 JHS Type 0.026 15 Student Maturity 0.000 8 Residential Status 0.016 16 Gender 0.000 B. Modeling Technique and Model Building Experiments were conducted in this study to build models by incorporating specified classifiers for predicting the performance of pre-tertiary students based on demographic information. Five classification approaches were used for model construction to meet the study's aims. RapidMiner Studio was used to conduct the analysis. The RF algorithms from DT, RI algorithms from rule-based classifiers, the NB algorithm from Bayesian Networks, the LR algorithm from Regression, and DL algorithms from NN were chosen for the experiments among the various classification algorithms available in RapidMiner. The grounds for selecting the algorithms are their capacity to handle polynomial attributes effectively, the ease of understanding and interpretation of the model's outcomes for the investigations, and their popularity in recent years in education-related classification problems. I. Iddrisu et al. / Knowledge Engineering and Data Science 2023, 6 (1): 24–40 29 Fig. 1. A line graph of information gain of attributes C. Description of the selected algorithms First, a classification method is used to construct a decision tree (DT). The classification processes are described in this instance via a hierarchical array of decisions on feature variables that manifest in the shape of a tree [31]. DT are made of nodes joined to constitute a rooted tree; therefore, it is a directed graph comprising nodes known as roots without incoming edges (Figure 1). The other nodes that determine the class of objects are known as the leaves or terminal nodes [32]. Every leaf is attributed to a class representing the most appropriate target value [33]. Nodes with a blend of diverse classes are to be split further. A stopping criterion determines when the decision tree algorithm should terminate. When an entire training sets in the terminal/leaf node fit within a particular class, then the stopping criterion is said to be reached [34]. Figure 2 illustrates a typical DT structure [35]. Fig. 2. Concept of a decision tree 30 I. Iddrisu et al. / Knowledge Engineering and Data Science 2023, 6 (1): 24–40 Every node matches a characteristic, while the branches link with an array of values. All nodes are labeled with the attributes they test, and every branch has its corresponding values [36]. The range of values is mutually exclusive and complete. The properties of a tree being disjoint or complete are vital as they ensure every instance maps to one case (Figure 2). Averaging ensemble approaches include the Random Forest (RF) algorithm. RF represent huge feature areas and are more resilient than DT. RF is a bagged classifier that connects a group of DT classifiers to form a forest of trees [37]. A diverse collection of classifiers is formed by integrating randomization into the classifier-building process. 
The Random Forest (RF) algorithm is an averaging ensemble approach. RFs can represent huge feature spaces and are more resilient than individual DTs. RF is a bagged classifier that connects a group of DT classifiers to form a forest of trees [37]. A diverse collection of classifiers is formed by integrating randomization into the classifier-building process, and the ensemble prediction is presented as the average prediction of the individual classifiers [2]. In RF, every tree in the ensemble is created using a unique bootstrap sample, drawn as a random selection of instances with replacement from the entire training dataset [38]. Random feature selection is also used in an RF [39]: m features are chosen randomly from the M available features at every node of a DT t, and the optimal split is taken from those m. Therefore, the split determined when splitting a node during tree formation is no longer the best among all features but the best among a randomly picked subset of them. As a result, the forest's bias often grows relative to the bias of a single non-random tree [40]; however, averaging generally compensates for the overall model's increase in bias. Table 3 describes the RF parameters optimized within RapidMiner, with their values and data types; an approximation of this configuration appears after the table.

Table 3. Some random forest algorithm parameters with their values in RapidMiner
Parameter | Value | Description | Type
Criterion | Information Gain | Determines the criterion on which the attributes are split; the value is optimized for the selected parameter. | Nominal
Apply Pruning | True | After development, the random forest's random trees can be trimmed: depending on the confidence variable, some branches are replaced by leaves. | Boolean
Random Split | False | Rather than balanced splits, this setting divides numerical attributes randomly. | Boolean
Number of Trees | 20 | How many random trees will be produced. | Numeric
Maximal Depth | 10 | A tree's depth varies according to the size and properties of the supplied example set. | Numeric
Confidence | 0.1 | The confidence level used in pruning's pessimistic error computation. | Numeric
Voting Strategy | Majority Vote | The prediction plan when the trees' predictions disagree. | Nominal
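The Table 3 configuration can be approximated outside RapidMiner; the sketch below maps the tree count, depth, split criterion, random feature subsets, bootstrap sampling, and majority voting onto scikit-learn's RandomForestClassifier. RapidMiner's pruning and confidence parameters have no direct scikit-learn equivalent, and the data here is synthetic.

```python
# Hedged approximation of the Table 3 random forest settings.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(
    n_estimators=20,      # Number of Trees = 20
    max_depth=10,         # Maximal Depth = 10
    criterion="entropy",  # Criterion = Information Gain
    bootstrap=True,       # each tree grown on a bootstrap sample
    max_features="sqrt",  # m features drawn at random from M per split
    random_state=42,
)

# Synthetic stand-in for the 16 encoded predictors and binary label.
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 5, size=(200, 16))
y_demo = rng.integers(0, 2, size=200)
rf.fit(X_demo, y_demo)
print(rf.predict(X_demo[:5]))  # majority vote across the 20 trees
```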
The second is Bayesian classification. Bayesian classifiers, also called Naïve Bayes (NB), are statistical classifiers derived from Bayes' theorem [41]. The accuracy and speed of Bayesian classifiers have been proven to be of high magnitude on large databases [14]. Bayesian classification offers a pictorial view of the underlying associations on which learning is performed, and trained Bayesian networks can be helpful in classification [42]. A Bayesian classification graphical model is indicated in Figure 3.

Fig. 3. Bayesian graphical model

Let X denote a data tuple described by measurements on n attributes, and let H denote a hypothesis. Then P(H|X) denotes the probability that H is true given X, i.e., the posterior probability of H conditioned on X. P(H) denotes the prior probability of hypothesis H. Correspondingly, P(X|H) denotes the posterior probability of X conditioned on H, while P(X) is the prior probability of X. Bayes' theorem offers a criterion for computing the posterior probability P(H|X) from P(H), P(X|H), and P(X), as denoted in (1).

P(H|X) = P(X|H) P(H) / P(X)   (1)

For classification problems, X represents an observed data tuple, and H is taken as the hypothesis that X belongs to class C. Equation (1) then establishes the probability P(H|X) that tuple X belongs to class C, given the attribute description of X [43].

The NB algorithm makes learning simple by assuming that the variables are independent given the class, while offering a probabilistic interpretation of classification [11]. Though independence is generally an unrealistic assumption, the NB classifier frequently outperforms more advanced classifiers in practice. For example, while employing NB to analyze university and primary school students' performance, [44] found that the NB algorithm had superior accuracy in predicting the performance of primary school students.

Third is rule-based classifiers. A typical rule is described as: IF a condition holds, THEN a result follows [32]. The antecedent condition is on the rule's left side and consists of logical operators (such as >, <, =, AND, OR) applied mainly to feature variables. The consequent, which yields the class variable, is on the rule's right side. A rule is presented as Qi → C, with Qi being the antecedent, C the class variable, and the symbol → representing "THEN"; Qi denotes a condition applied to the feature set [43]. A rule is of the form: IF (attribute 1, value 1) AND (attribute 2, value 2) AND ... AND (attribute n, value n) THEN (decision, value). Rule induction, experimented with in this study, is a widely applied rule-based classification technique. As stated in [33], rules are good for denoting information and aspects of information. RI generates rules by dividing and conquering the training set, extracting all instances covered by a rule; it uses the divide-and-conquer and separate-and-conquer rule-learning approaches. Rule algorithms generate a decision list, an ordered set of rules. As in J48, rule induction can discover rules based on partial decision trees: it develops a partial C4.5 decision tree and translates the "best" leaf into a rule [41]. Typically, an if-then rule has the form: IF mother education = primary AND mother occupation = Government AND JHS location = Urban THEN Status = Low Intervention.

Fourth is Support Vector Machines (SVM). SVM is a learning algorithm for studying and understanding classification and regression rules; SVMs can, for example, be used to train radial basis function (RBF), polynomial, and multilayer perceptron (MLP) classifiers [14]. SVMs are derived from statistical learning theory, which aims to solve related, though more complex, problems as a transitional step [45]. The SVM belongs to the supervised learning family of algorithms capable of generating learning rules from a given training dataset. The SVM has a comprehensive theoretical basis and requires comparatively few data samples for training; investigations indicate that SVM is not sensitive to sample dimensions [46].
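Although SVM was described here for completeness rather than benchmarked among the study's five classifiers, a minimal sketch of an RBF-kernel SVM on synthetic data may clarify the idea:

```python
# Minimal SVM sketch (synthetic data; not part of the study's experiments).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # separable-ish toy labels

clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)
print(clf.score(X, y))   # training accuracy of the fitted decision boundary
```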
Fifth is neural networks (NN), which simulate the human nervous system. A NN comprises an interrelated cluster of artificial neurons processing information based on a connectionist approach to computation [3]. The NN framework is made up of nodes interconnected through directional links. Every node is a processing unit, and each link depicts a causal association between nodes. The nodes are adaptive: their outputs depend on modifiable parameters associated with them [46]. When an artificial neural network (ANN) is first constructed, every node in the input layer matches a predictor. The input nodes are then each connected to nodes within the hidden layer, and hidden-layer nodes are linked to further hidden layers or directly to an output layer. One or several response variables constitute the output layer [32].

Beyond the input layer, each node takes in its inputs, multiplies each by a connection weight Wxy (the weight on the link from node x to node y; e.g., the link from node 1 to node 3 is written W13), sums them, applies a function (known as an activation or squashing function), and transfers the result to the next layer. For instance, with inputs at nodes 1-3 and hidden nodes 4-6, the value at node 4 is the activation function applied to ([W14 * value of node 1] + [W24 * value of node 2] + ...). Figure 4 depicts an NN structure.

Fig. 4. A neural network with one hidden layer

The most basic deep networks are feed-forward deep networks, commonly known as multilayer perceptrons (MLPs) [46]. The MLP is the most widely implemented NN architecture in predictive data mining. It is a feed-forward deep network with possibly many hidden layers between the connected input and output layers [46]. A feed-forward neural network has no interconnections between nodes within a given layer; instead, outputs from one layer serve as inputs to nodes in subsequent layers. This ensures modularity within the network, i.e., nodes in a layer are coherent in functionality or provide an equivalent level of abstraction over input vectors [33].
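The forward pass just described can be written out directly; the following NumPy sketch uses made-up weights for a network with three input nodes (1-3), three hidden nodes (4-6), and one output, so that, e.g., node 4 computes the activation of W14*x1 + W24*x2 + W34*x3.

```python
# Minimal forward pass for a one-hidden-layer network as in Figure 4.
# All weight values are illustrative.
import numpy as np

def sigmoid(z):                       # activation ("squashing") function
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.6, 0.9, 0.2])         # values of input nodes 1-3
W1 = np.array([[0.4, -0.7,  0.2],     # W14, W15, W16 (links from node 1)
               [0.1,  0.5, -0.3],     # W24, W25, W26 (links from node 2)
               [-0.6, 0.8,  0.9]])    # W34, W35, W36 (links from node 3)
hidden = sigmoid(x @ W1)              # node 4 = f(W14*x1 + W24*x2 + W34*x3), etc.

W2 = np.array([0.3, -0.2, 0.7])       # hidden-to-output weights
output = sigmoid(hidden @ W2)         # single response variable
print(hidden, output)
```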
The last is regression, commonly employed in predictive model building and analytical processes in data mining. Regression predictions are primarily centered on historical data, using functions and formulas [47]; it is mainly a statistical approach to data mining. Regression is implemented to derive a model between dependent and independent variables [47] and to analyze existing datasets to forecast trends, using linear or logistic regression (LR) techniques derived from statistical methods in which functions are fitted to an existing dataset. The derived data is subsequently mapped to the functions to assist in prediction [48]. The LR algorithm builds a regression model for categorical dependent variables. LR falls into three categories: (1) binary, for binary response variables; (2) multinomial, for more than two unordered categories; and (3) ordinal, for ordered categories [33]. Researchers and data analysts generally use LR to analyze and classify proportional and binary response data [49]. LR can effortlessly handle probability and multi-class classification issues.

D. Research Design and Evaluation Metrics
This study is based on experimental research that employs binary classification techniques. The data comprised numerical attributes (e.g., age and test scores) and nominal attributes (e.g., gender, residential status, and former school). Experimental study concepts were chosen because they are the basic approach to studying cause-and-effect connections and the relations between two variables [33]. Experimental research is also used by researchers to compare two or more groups on one or more metrics. The research further employed a hybrid data mining model development approach based on the KDP model; this gives the researcher a deeper understanding of the problem than deploying only one approach. This design methodology was employed to obtain a broader, research-oriented explanation of the phases; it represents a full data mining process rather than just a modeling step and has numerous novel, clear, and specific feedback loops [33]. Figure 5 shows the adopted six-step KDP model, comprising understanding the research problem, understanding the data, preparing the data, mining the data, analyzing the knowledge base, and using the discovered knowledge [50].

Fig. 5. The six-step KDP model

Evaluation of model performance is essential for rating a model's effectiveness, improving parameters during the iterative learning process, and choosing an acceptable model from an assortment of models [51]. The following six widely known performance metrics were used to compare and select algorithms for the classification task and to construct a robust model: accuracy, precision, sensitivity, specificity, AUC, and F-measure. The most prevalent metric for measuring the feasibility of a model is its accuracy: the proportion of predictions that match the actual values, as in (2). Precision for a class is the count of true positives (instances rightly considered positive) divided by the total count of instances labeled as positive (the sum of true positives and false positives), as in (3). Recall is the ratio of true positives to the overall count of instances belonging to the positive class (the sum of true positives and false negatives, the latter being instances that belong to the positive class but were not assigned to it). Recall carries the same value as sensitivity, as in (4).

Accuracy = (TP + TN) / (TP + TN + FP + FN)   (2)
Precision = TP / (TP + FP)   (3)
Recall = TP / (TP + FN)   (4)

Similarly, precision and recall can be defined for the negative class: precision is the proportion of instances categorized as negative that are actually negative, while recall for the negative class is the ratio of true negatives to the total number of actual negative instances. The F-measure is a metric for evaluating the performance of classifiers using confusion matrices; it is defined as the harmonic mean of precision and recall, as in (5), and it is essential for determining whether a model's precision and recall are well balanced [52]. Specificity, also called the "true negative rate", gives the percentage of actual negative instances that a model correctly predicted as negative, as in (6); it measures the proportion of real negatives that are identified among all negatives.

F-Measure = 2 * Precision * Recall / (Precision + Recall)   (5)
Specificity = TN / (TN + FP)   (6)

The area under the ROC curve (AUC) is the area under the ROC curve from (0,0) to (1,1) in two dimensions. The AUC gives an overall assessment of performance across all potential classification thresholds and may be seen as the likelihood that a random positive instance will be ranked higher than a random negative instance by a given model.
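With TP, TN, FP, and FN counts from a confusion matrix, metrics (2)-(6) reduce to a few lines of code; the counts below are illustrative, and AUC, which needs ranked scores rather than counts, is shown with scikit-learn on made-up labels.

```python
# Computing the section's metrics from illustrative confusion-matrix counts.
from sklearn.metrics import roc_auc_score

TP, TN, FP, FN = 90, 85, 10, 15       # hypothetical counts

accuracy    = (TP + TN) / (TP + TN + FP + FN)                 # eq. (2)
precision   = TP / (TP + FP)                                  # eq. (3)
recall      = TP / (TP + FN)                                  # eq. (4), = sensitivity
f_measure   = 2 * precision * recall / (precision + recall)   # eq. (5)
specificity = TN / (TN + FP)                                  # eq. (6)
print(accuracy, precision, recall, f_measure, specificity)

# AUC works on scores, not counts: true labels vs. predicted probabilities.
y_true  = [0, 0, 1, 1, 1]
y_score = [0.2, 0.4, 0.35, 0.8, 0.9]
print(roc_auc_score(y_true, y_score))
```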
E. Experimental Settings and Experimentation with Selected Algorithms
The models were developed and simulated in the design view of RapidMiner's modeling environment on a Fujitsu laptop running the Windows 10 Pro (version 21H2) 64-bit operating system, with an x64-based Intel(R) Core(TM) i7-4702MQ CPU @ 2.20 GHz and 8 GB of RAM. K-fold cross-validation and split validation were employed in each experiment as evaluation techniques. For split validation, the default relative ratios of 0.7 for training and 0.3 for testing were adopted. In 10-fold cross-validation, the data is randomly subdivided into ten mutually exclusive, equal subgroups; training and testing are repeated ten times, with one subgroup held out as the test set in each iteration and the remaining nine used for training. An exploratory method was used to identify the most suitable algorithm during experimentation. Four different experiments were conducted for each of the five algorithms used in the study (random forest, rule induction, Naïve Bayes, regression, and deep learning), as follows (a code sketch of these validation setups appears after Table 4):

Experiment 1: Experimenting with the algorithm in split (ratio split) validation test mode.
Experiment 2: Experimenting with the algorithm employing Bootstrap resampling with a split (ratio split) validation test mode.
Experiment 3: Experimenting with the algorithm in 10-fold cross-validation test mode.
Experiment 4: Experimenting with the algorithm employing Bootstrap resampling with 10-fold cross-validation test mode.

A pictorial representation of the study method is illustrated in Figure 6.

III. Result and Discussion
This section presents the results of the random forest model on the dataset, discovering the student demographic variables influencing their performance.

A. Determination and Evaluation of the Best Classification Model for Predicting Students' Achievements
RQ1: Which machine learning classification algorithms are more viable in predicting students' academic attainment based on their demographic attributes?

One of the primary goals of this study is to identify a suitable ML classifier capable of predicting students' academic success based on demographic characteristics. Five algorithms were explored to implement the classification modeling: RF, RI, NB, LR, and DL. The results of the experiments are presented in Table 4.

Fig. 6. A pictorial depiction of the study framework

Table 4. Summary of best-performing models from the five algorithms
Algorithm | Test mode | Accuracy | Precision | Sensitivity | Specificity | F-Measure | AUC
RF | Pruning, using Bootstrap resampling with 10-fold cross-validation | 93.96% | 93.19% | 94.97% | 92.94% | 94.04% | 0.980
RI | With ratio split validation | 83.00% | 83.48% | 82.29% | 83.71% | 82.88% | 0.879
NB | Using split validation | 79.43% | 78.77% | 80.57% | 78.29% | 79.66% | 0.879
LR | Using split validation | 81.57% | 80.44% | 83.43% | 79.71% | 81.91% | 0.892
DL | Using Bootstrap resampling with 10-fold cross-validation | 84.45% | 82.15% | 88.49% | 80.35% | 85.11% | 0.924

Comparing the six performance metrics in Table 4, RF (pruned) with Bootstrap resampling and 10-fold cross-validation had the most outstanding performance among the five classifiers for predicting the student characteristics influencing academic performance. The RF had an accuracy of 93.96%, a precision of 93.19%, a sensitivity of 94.97%, a specificity of 92.94%, an F-measure of 94.04%, and an AUC of 0.980.
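As a rough stand-in for the RapidMiner setups above, the sketch below runs a 0.7/0.3 split with bootstrap-resampled training data (Experiments 1-2) and 10-fold cross-validation (Experiment 3) over approximate scikit-learn counterparts of the five classifiers. RI has no core scikit-learn implementation, so a shallow decision tree stands in, and the data is synthetic.

```python
# Hedged sketch of the validation modes; synthetic data, stand-in models.
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.utils import resample
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(300, 16)).astype(float)
y = rng.integers(0, 2, size=300)

models = {
    "RF": RandomForestClassifier(n_estimators=20, max_depth=10),
    "RI (tree stand-in)": DecisionTreeClassifier(max_depth=5),
    "NB": GaussianNB(),
    "LR": LogisticRegression(max_iter=1000),
    "DL": MLPClassifier(hidden_layer_sizes=(32,), max_iter=500),
}

# Ratio-split validation (0.7 train / 0.3 test) with a bootstrap resample
# of the training partition.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.7, random_state=1)
X_bs, y_bs = resample(X_tr, y_tr, replace=True, random_state=1)

for name, model in models.items():
    split_acc = model.fit(X_bs, y_bs).score(X_te, y_te)
    cv_acc = cross_val_score(model, X, y, cv=10).mean()   # 10-fold CV mode
    print(f"{name}: split={split_acc:.3f}, 10-fold CV={cv_acc:.3f}")
```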
As a result, RF with 10-fold cross-validation and bootstrap resampling was selected as the proposed model for the study.

B. Analysis of Attributes of Importance in the Random Forest Classifier Model
RQ2: What primary demographic attributes influence students' academic performance at the SHS level in Ghana?

The weights of the respective attributes by information gain were determined using the model simulator operator to identify the attributes with a significant impact on the decisions made by the RF classifier. These weights were ordered in descending order, and the list's top two attributes were considered the most relevant in the model choice process. According to the RF classifier model simulator, the mother's and father's education levels (with the highest weights of 0.358 and 0.168, respectively) are the two demographic factors discovered to most strongly support the classification model in this study. Figure 7 depicts the attributes supporting the prediction, arranged according to their weights.

Fig. 7. Order of attributes according to weights of importance

The two most contributing demographic attributes, based on the weight of their contributions to the model's decisions, are the mother's and father's education levels. The BECE attributes belong to academic features and were hence excluded. This section explains the evaluation technique for the model developed to evaluate the demographic factors impacting student performance in pre-tertiary institutions. The study included twenty specific tests with various classifiers. The following evaluation parameters were used to construct a robust model: the confusion matrix, the number of trees in the forest, and a comparison of the ROC of the random forest with the ROCs of the rule induction, NB, LR, and DL classifiers. Table 5 displays the confusion matrix of the chosen model, created using the RF algorithm and the Bootstrap resampling approach with 10-fold cross-validation.

Table 5. Confusion matrix evaluation for random forest model
Actual class | Predicted positive | Predicted negative | Classified as
Positive | TP = 1108 | FN = 65 | A = Low Intervention
Negative | FP = 91 | TN = 1070 | B = Intensive Intervention

Much may be learned by meticulously scrutinizing the errors generated by any classification model. The errors show discrepancies between the model's predictions and the tangible outcomes in the actual business situation. When an appropriate model is discovered, the next step is determining why classification inaccuracies happened in the testing data. For instance, when predicting an attribute for a certain class label, the predicted and actual results may differ; because comparable features reside within the same class limit, the classifier assigns the data to a particular class. Table 5 displays the confusion matrix of the final model for the study. It indicates that 1108 of the 2,334 instances were accurately labeled as low intervention, whereas 1070 instances were correctly labeled as intensive intervention. The classifier identified 91 instances as low intervention when they should have been classified as intensive intervention, and 65 cases were wrongly labeled as intensive intervention when they should have been classified as low intervention. The misclassification between the two groups might be because, if low-intervention status occurs, there is also a potential for intensive-intervention status to occur, and vice versa.

ROC curves with averaged thresholds for all five classifiers were generated, and their Areas Under the Curve (AUCs) were evaluated using 10-fold cross-validation. Finally, the ROC graph was constructed, as shown in Figure 8.

Fig. 8. ROC curves to compare the performance of random forest and the other classifiers in the study
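A minimal sketch of how such an ROC comparison can be produced: fit each classifier, score the held-out data, and plot one curve per model with its AUC. The data and the two models shown are placeholders for the study's five RapidMiner classifiers.

```python
# Illustrative ROC/AUC comparison in the spirit of Figure 8 (synthetic data).
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, auc

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 8))
y = (X[:, :2].sum(axis=1) + rng.normal(scale=0.5, size=500) > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.7, random_state=2)

for name, model in {"RF": RandomForestClassifier(n_estimators=20),
                    "LR": LogisticRegression()}.items():
    scores = model.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    fpr, tpr, _ = roc_curve(y_te, scores)
    plt.plot(fpr, tpr, label=f"{name} (AUC = {auc(fpr, tpr):.3f})")

plt.plot([0, 1], [0, 1], linestyle="--")   # chance diagonal
plt.xlabel("False positive rate"); plt.ylabel("True positive rate")
plt.legend(); plt.show()
```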
From the ROC graph, it can be deduced that the random forest achieves superior classification metrics compared to the other four classifiers (i.e., RI, NB, LR, and DL). The thick red line represents the curve for the random forest, with an AUC of 0.980.

C. Determining Students' Intervention Type
Finally, Figure 9 illustrates the classifier's conclusive results after considering all features. The study's primary purpose was to discover the demographic determinants of learners' academic success; these determinants help educational administrators classify learners as needing intensive or low intervention. Per the confusion matrix class predictions of the random forest model in Table 5, out of the 2334 upscaled samples under study, 1173 (50.26%) were labeled as needing low intervention, while 1161 (49.74%) of the second-year students whose data was used were classified as needing intensive academic intervention to enhance their performance. It can therefore be concluded that the model effectively categorized the 2334 students according to the type of intervention they needed to boost their performance. The students' classification by intervention type is illustrated in Figure 9.

Fig. 9. Number of students classified as in need of low or intensive intervention

Overall, the RF classifier emerged from this study as the best classification technique for the task. The RF classifier correctly classified 2193 (93.96%) instances, while 141 (6.04%) instances were incorrectly classified. According to [53], in "Estimates of highly accurate models", the RF model is highly viable for predicting performance determinants since its accuracy extends beyond the 75% lower-bound benchmark. Again, the mother's and father's education levels (with weights of 0.358 and 0.168, respectively) are the demographic factors recognized in this study as significantly influencing pre-tertiary students' academic achievement. This finding is confirmed by [54], whose study found that well-educated parents prioritize a text-rich home environment, enhancing their children's academic achievement.

IV. Conclusion
The proposed demographic-based predictive model offers an innovative approach to predicting learner performance accurately and recommending appropriate intervention schemes. By leveraging demographic information, educational institutions can provide targeted support to students, ultimately enhancing their educational experience and improving academic outcomes. This study has significantly reduced the gap in practical knowledge observed in the literature by introducing, in its prediction procedure, an intervention scheme for students requiring intensive or minimal academic intervention.

Declarations
Author contribution
All authors contributed equally as the main contributors of this paper. All authors read and approved the final paper.
Funding statement
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Conflict of interest
The authors declare no known conflicts of financial interest or personal relationships that could have appeared to influence the work reported in this paper.

Additional information
Reprints and permission information are available at http://journal2.um.ac.id/index.php/keds.
Publisher's Note: Department of Electrical Engineering and Informatics - Universitas Negeri Malang remains neutral with regard to jurisdictional claims and institutional affiliations.

References
[1] R. Hasan, S. Palaniappan, S. Mahmood, K. U. Sarker, and A. Abbas, "Modelling and predicting student's academic performance using classification data mining techniques," Int. J. Bus. Inf. Syst., vol. 34, no. 3, pp. 403–422, 2020.
[2] M. N. Yakubu and A. M. Abubakar, "Applying machine learning approach to predict students' performance in higher educational institutions," Kybernetes, vol. 51, no. 2, pp. 916–934, 2022.
[3] M. Arashpour et al., "Predicting individual learning performance using machine-learning hybridized with the teaching-learning-based optimization," Comput. Appl. Eng. Educ., vol. 31, no. 1, pp. 83–99, 2023.
[4] D. M. Ahmed, A. M. Abdulazeez, D. Q. Zeebaree, and F. Y. H. Ahmed, "Predicting University's Students Performance Based on Machine Learning Techniques," 2021 IEEE Int. Conf. Autom. Control Intell. Syst. (I2CACIS 2021) - Proc., pp. 276–281, 2021.
[5] F. de Galiza Barbosa et al., "Genitourinary imaging," Clinical PET/MRI, pp. 289–312, 2022.
[6] F. Inusah, Y. M. Missah, N. Ussiph, and F. Twum, "Expert System in Enhancing Efficiency in Basic Educational Management using Data Mining Techniques," Int. J. Adv. Comput. Sci. Appl., vol. 12, no. 11, pp. 427–434, 2021.
[7] F. Inusah, Y. M. Missah, U. Najim, and F. Twum, "Data Mining and Visualisation of Basic Educational Resources for Quality Education," Int. J. Eng. Trends Technol., vol. 70, no. 12, pp. 296–307, Dec. 2022.
[8] F. Inusah, Y. M. Missah, U. Najim, and F. Twum, "Integrating expert system in managing basic education: A survey in Ghana," Int. J. Inf. Manag. Data Insights, vol. 3, no. 1, p. 100166, 2023.
[9] F. Inusah, Y. M. Missah, U. Najim, and F. Twum, "Agile neural expert system for managing basic education," Intell. Syst. with Appl., vol. 17, p. 200178, 2023.
[10] H. Drachsler and W. Greller, "Privacy and analytics - it's a DELICATE issue: a checklist for trusted learning analytics," ACM Int. Conf. Proceeding Ser., pp. 89–98, 2016.
[11] B. Owusu-Boadu, I. K. Nti, O. Nyarko-Boateng, J. Aning, and V. Boafo, "Academic Performance Modelling with Machine Learning Based on Cognitive and Non-Cognitive Features," Appl. Comput. Syst., vol. 26, no. 2, pp. 122–131, 2021.
[12] I. Issah, O. Appiah, P. Appiahene, and F. Inusah, "A systematic review of the literature on machine learning application of determining the attributes influencing academic performance," Decis. Anal. J., vol. 7, p. 100204, 2023.
[13] M. Tadese, A. Yeshaneh, and G. B. Mulu, "Determinants of good academic performance among university students in Ethiopia: a cross-sectional study," BMC Med. Educ., vol. 22, no. 1, pp. 1–9, 2022.
[14] F. Ouatik, M. Erritali, F. Ouatik, and M. Jourhmane, "Predicting Student Success Using Big Data and Machine Learning Algorithms," Int. J. Emerg. Technol. Learn., vol. 17, no. 12, pp. 236–251, 2022.
[15] S. Hussain and M. Q. Khan, "Student-Performulator: Predicting Students' Academic Performance at Secondary and Intermediate Level Using Machine Learning," Ann. Data Sci., 2021.
[16] V. K. Pal and V. K. K. Bhatt, "Performance prediction for post graduate students using artificial neural network," Int. J. Innov. Technol. Explor. Eng., vol. 8, no. 7, pp. 446–454, 2019.
[17] L. M. Abu Zohair, "Prediction of Student's performance by modelling small dataset size," Int. J. Educ. Technol. High. Educ., vol. 16, no. 1, 2019.
[18] M. Pojon, "Using Machine Learning to Predict Student Performance," Univ. Tampere, pp. 1–28, 2017.
[19] B. Sekeroglu, K. Dimililer, and K. Tuncal, "Student Performance Prediction and Classification Using Machine Learning Algorithms," in Proc. 2019 8th Int. Conf. on Educational and Information Technology, Mar. 2019, pp. 7–11.
[20] M. N. Yakubu and A. M. Abubakar, "Applying machine learning approach to predict students' performance in higher educational institutions," Kybernetes, 2021.
[21] F. Aman, A. Rauf, R. Ali, F. Iqbal, and A. M. Khattak, "A Predictive Model for Predicting Students Academic Performance," in 2019 10th Int. Conf. on Information, Intelligence, Systems and Applications (IISA), Jul. 2019, pp. 1–4.
[22] J. López-Zambrano, J. A. L. Torralbo, and C. Romero, "Early prediction of student learning performance through data mining: A systematic review," Psicothema, vol. 33, no. 3, pp. 456–465, 2021.
[23] A. I. Adekitan and E. Noma-Osaghae, "Data mining approach to predicting the performance of first year student in a university using the admission requirements," Educ. Inf. Technol., vol. 24, no. 2, pp. 1527–1543, 2019.
[24] Y. Altujjar, W. Altamimi, I. Al-turaiki, and M. Al-razgan, "Predicting Critical Courses Affecting Students Performance: A Case Study," Procedia Comput. Sci., vol. 82, pp. 65–71, 2016.
[25] A. Ahadi, R. Lister, H. Haapala, and A. Vihavainen, "Exploring machine learning methods to automatically identify students in need of assistance," ICER 2015 - Proc. 2015 ACM Conf. Int. Comput. Educ. Res., pp. 121–130, 2015.
[26] D. T. Ha, C. N. Giap, P. T. T. Loan, and T. L. H. Huong, "An Empirical Study for Student Academic Performance Prediction Using Machine Learning Techniques," Int. J. Comput. Sci. Inf. Secur., vol. 18, no. 3, pp. 21–28, 2020.
[27] J. David and G. Anastasija, "Predicting Academic Performance Based on Students' Family Environment: Evidence for Colombia Using Classification Trees," vol. 11, no. 3, pp. 299–311, 2019.
[28] M. I. Al-Twijri and A. Y. Noaman, "A New Data Mining Model Adopted for Higher Institutions," Procedia Comput. Sci., vol. 65, pp. 836–844, 2015.
[29] A. Fernández, S. García, F. Herrera, and N. V. Chawla, "SMOTE for Learning from Imbalanced Data: Progress and Challenges, Marking the 15-year Anniversary," J. Artif. Intell. Res., vol. 61, pp. 863–905, 2018.
[30] P. D. Jenssen, T. Krogstad, and K. Halvorsen, "Community wastewater infiltration at 69° northern latitude – 25 years of experience," Soil Sci. Soc. Am. Onsite Wastewater Conf., Albuquerque NM, 7-8 April 2014, pp. 7–8, 2014.
[31] C. Anuradha and T. Velmurugan, "Fast Boost Decision Tree Algorithm: A novel classifier for the assessment of student performance in Educational data," vol. 31, pp. 254–0223, 2016.
[32] P. Bhatia, "Introduction to Data Mining," Data Min. Data Warehous., pp. 17–27, 2019.
[33] S. Samson, "Use of Data Mining For Determining Higher Education Students' Performance," St. Mary's University, 2019.
[34] J. Feng, "Predicting Students' Academic Performance with Decision Tree and Neural Network," University of Central Florida, 2019.
[35] K. David Kolo, S. A. Adepoju, and J. Kolo Alhassan, "A Decision Tree Approach for Predicting Students Academic Performance," Int. J. Educ. Manag. Eng., vol. 5, no. 5, pp. 12–19, 2015.
[36] Y. Liu, S. Fan, S. Xu, A. Sajjanhar, S. Yeom, and Y. Wei, "Predicting Student Performance Using Clickstream Data and Machine Learning," Educ. Sci., vol. 13, no. 1, 2023.
https://doi.org/10.1613/jair.1.11192 https://doi.org/10.1613/jair.1.11192 https://www.soils.org/files/meetings/specialized/full-conference-proceedings.pdf https://www.soils.org/files/meetings/specialized/full-conference-proceedings.pdf https://www.soils.org/files/meetings/specialized/full-conference-proceedings.pdf https://www.researchgate.net/publication/311347685_Fast_Boost_Decision_Tree_Algorithm_A_novel_classifier_for_the_assessment_of_student_performance_in_Educational_data https://www.researchgate.net/publication/311347685_Fast_Boost_Decision_Tree_Algorithm_A_novel_classifier_for_the_assessment_of_student_performance_in_Educational_data https://www.cambridge.org/core/books/abs/data-mining-and-data-warehousing/introduction-to-data-mining/4A5DA2EB1116347161117A2F4EB2A4B1 http://repository.smuc.edu.et/handle/123456789/5274 http://repository.smuc.edu.et/handle/123456789/5274 https://stars.library.ucf.edu/etd/6301/ https://stars.library.ucf.edu/etd/6301/ http://dx.doi.org/10.5815/ijeme.2015.05.02 http://dx.doi.org/10.5815/ijeme.2015.05.02 https://doi.org/10.3390/educsci13010017 https://doi.org/10.3390/educsci13010017 40 I. Iddrisu et al. / Knowledge Engineering and Data Science 2023, 6 (1): 24–40 [37] M. M. Z. Eddin, N. A. Khodeir, and H. A. Elnemr, “A Comparative Study of Educational Data Mining Techniques for skill-based Predicting Student Performance,” Int. J. Comput. Sci. Inf. Secur., vol. 16, no. 3, pp. 56–62, 2018. [38] P. Sokkhey and T. Okazaki, “Hybrid machine learning algorithms for predicting academic performance,” Int. J. Adv. Comput. Sci. Appl., vol. 11, no. 1, pp. 32–41, 2020. [39] M. N. Yakubu, “Applying machine learning approach to predict students ’ performance in higher educational institutions,” no. June, 2021. [40] Y. Denny, H. Leslie, H. Spits, and W. Budiharto, “Systematic Literature Review on Abstractive Text Summarization,” no. November, 2021. [41] K. Blackmore and T. R. J. Bossomaier, “Comparison of See5 and J48.PART Algorithms for Missing Persons Profiling,” no. December, 2016. [42] F. Ofori, E. Maina, and R. Gitonga, “Using Machine Learning Algorithms to Predict Students’ Performance and Improve Learning Outcome: A Literature Based Review,” J. Inf. Technol., vol. 4, no. 1, pp. 2616–3573, 2020. [43] S. Agrawal, S. K., and A. K., “Using Data Mining Classifier for Predicting Student’s Performance in UG Level,” Int. J. Comput. Appl., vol. 172, no. 8, pp. 39–44, 2017. [44] D. Gašević, V. Kovanović, and S. Joksimović, “Piecing the learning analytics puzzle: a consolidated model of a field of research and practice,” Learn. Res. Pract., vol. 3, no. 1, pp. 63–78, 2017. [45] P. G. Sameer and S. R. Barahate, “Educational Data Mining – A New Approach to the Education Systems,” pp. 18– 20, 2016. [46] A. S. Hashim, W. A. Awadh, and A. K. Hamoud, “Student Performance Prediction Model based on Supervised Machine Learning Algorithms,” IOP Conf. Ser. Mater. Sci. Eng., vol. 928, no. 3, 2020. [47] C. A. Palacios, J. A. Reyes-Suárez, L. A. Bearzotti, V. Leiva, and C. Marchant, “Knowledge discovery for higher education student retention based on data mining: Machine learning algorithms and case study in chile,” Entropy, vol. 23, no. 4, pp. 1–23, 2021. [48] D. T. Larose and C. D. Larose, “Data Mining and Predictive Analytics (Wiley Series on Methods and Applications in Data Mining): 9781118116197: Computer Science Books @ Amazon.com,” Wiley Ser., p. 794, 2015. [49] M. Maalouf, “Logistic regression in data analysis: an overview,” Int. J. Data Anal. Tech. Strateg., vol. 3, no. 3, p. 
281, 2011. [50] K. J. Cios, W. Pedrycz, R. W. Swiniarski, and L. A. Kurgan, Data mining: A knowledge discovery approach. 2007 . [51] Y. Chen et al., “Evaluation efficiency of hybrid deep learning algorithms with neural network decision tree and boosting methods for predicting groundwater potential,” Geocarto Int., vol. 37, no. 19, pp. 5564–5584, 2022. [52] D. M. W. Powers, “Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation,” pp. 37–63, 2020. [53] I. H. Witten, E. Frank, M. A. Hall, and C. J. Pal, Data Mining: Practical Machine Learning Tools and Techniques. 2016. [54] M. Idris, S. Hussain, and N. Ahmad, “Relationship between Parents’ Education and their children’s Academic Achievement,” J. Arts Soc. Sci., vol. 7, no. 2, pp. 82–92, Dec. 2020. https://www.researchgate.net/publication/330502931_A_Comparative_Study_of_Educational_Data_Mining_Techniques_for_skill-based_Predicting_Student_Performance https://www.researchgate.net/publication/330502931_A_Comparative_Study_of_Educational_Data_Mining_Techniques_for_skill-based_Predicting_Student_Performance http://dx.doi.org/10.14569/IJACSA.2020.0110104 http://dx.doi.org/10.14569/IJACSA.2020.0110104 https://doi.org/10.1108/K-12-2020-0865 https://doi.org/10.1108/K-12-2020-0865 http://dx.doi.org/10.24507/icicelb.12.11.XXX http://dx.doi.org/10.24507/icicelb.12.11.XXX https://www.researchgate.net/publication/290312652_Comparison_of_See5_and_J48PART_Algorithms_for_Missing_Persons_Profiling https://www.researchgate.net/publication/290312652_Comparison_of_See5_and_J48PART_Algorithms_for_Missing_Persons_Profiling https://www.researchgate.net/publication/340209478_Using_Machine_Learning_Algorithms_to_Predict_Students'_Performance_and_Improve_Learning_Outcome_A_Literature_Based_Review https://www.researchgate.net/publication/340209478_Using_Machine_Learning_Algorithms_to_Predict_Students'_Performance_and_Improve_Learning_Outcome_A_Literature_Based_Review http://dx.doi.org/10.5120/ijca2017915201 http://dx.doi.org/10.5120/ijca2017915201 https://doi.org/10.1080/23735082.2017.1286142 https://doi.org/10.1080/23735082.2017.1286142 https://scholar.google.com/scholar?q=Educational%20Data%20Mining%20%20A%20New%20Approach%20to%20the%20Education%20Systems https://scholar.google.com/scholar?q=Educational%20Data%20Mining%20%20A%20New%20Approach%20to%20the%20Education%20Systems https://iopscience.iop.org/article/10.1088/1757-899X/928/3/032019 https://iopscience.iop.org/article/10.1088/1757-899X/928/3/032019 https://doi.org/10.3390/e23040485 https://doi.org/10.3390/e23040485 https://doi.org/10.3390/e23040485 http://repo.darmajaya.ac.id/4011/1/Data%20Mining%20and%20Predictive%20Analytics.pdf http://repo.darmajaya.ac.id/4011/1/Data%20Mining%20and%20Predictive%20Analytics.pdf http://dx.doi.org/10.1504/IJDATS.2011.041335 http://dx.doi.org/10.1504/IJDATS.2011.041335 https://doi.org/10.1007/978-0-387-36795-8 https://doi.org/10.1080/10106049.2021.1920635 https://doi.org/10.1080/10106049.2021.1920635 https://www.researchgate.net/publication/228529307_Evaluation_From_Precision_Recall_and_F-Factor_to_ROC_Informedness_Markedness_Correlation https://www.researchgate.net/publication/228529307_Evaluation_From_Precision_Recall_and_F-Factor_to_ROC_Informedness_Markedness_Correlation https://www.sciencedirect.com/book/9780123748560/data-mining-practical-machine-learning-tools-and-techniques#book-description https://www.sciencedirect.com/book/9780123748560/data-mining-practical-machine-learning-tools-and-techniques#book-description 