International Journal of Applied Sciences and Smart Technologies



Volume 1, Issue 1, pages 65–82 

ISSN 2655-8564 

  

 
Factors Influencing the Difficulty Level of the Subject: Machine Learning Technique Approaches
 

Hari Suparwito 

 
Department of Informatics, Faculty of Science and Technology, 

Sanata Dharma University, Yogyakarta, Indonesia 

Corresponding Author: shirsj@jesuits.net 

 
(Received 07-05-2019; Revised 21-05-2019; Accepted 21-05-2019) 

 
Abstract 

The difficulty level of a subject needs to be known both to understand how
students accept the subject and to gauge the highest level of achievement
they can reach in it. Several factors are considered: the kind of
instruction, the readiness of instructors and students for teaching and
learning, the evaluation and monitoring systems, and student expectations.
Many factors are involved, and educators should be aware of them. It is
better still if they can discern which factors are primary and which are
secondary. The purpose of the study is to find the determinant factors in
establishing the difficulty level of the subject from the students',
teachers' and infrastructure points of view using three machine learning
techniques. The mean squared error (MSE) and variable importance measures
were used to assess the relationship between independent variables such as
Attendance and Instructors and the dependent variable, the difficulty level
of the subject. The results showed that the Gradient Boosting Machine
obtained MSE values of 1.14 and 1.30 on the training and validation
datasets, respectively. The model identified five most important
independent variables, i.e. Attendance, Instructor, The course can give a new
perspective to students, The quizzes, assignments, projects and exams



 
contributed to helping the learning, and The Instructor was committed to the
course and was understandable. The Gradient Boosting Machine is superior to
the other methods, with the lowest MSE and MAE values. Two methods, Gradient
Boosting Machine and Deep Learning, produced the same five main factors
influencing the difficulty of the subject. This means these factors are
significant and should receive attention from stakeholders.

Keywords: machine learning, regression, deep learning, random forest, 

gradient boosting machine, data mining, education. 

 
1 Introduction 

Education provides people with knowledge about life and the world. It helps build
character and leads to illumination. Given the importance of education, researchers ask
what factors influence the process of teaching and the attitude of students so that the
students can understand the subjects, and what factors help to measure the difficulty
level of subjects. The difficulty level of a subject is needed both to understand the
students' acceptance of the subject and to ascertain the highest level of student
achievement in it [1].

Dunlosky et al. [2] examined several aspects of effective learning techniques and
reviewed them with respect to learning conditions, student characteristics, materials
and criterion tasks. Another group of researchers [3] found that the social context
influences effective teaching and learning. The factors mentioned include direct
instruction, frequent monitoring, a sense of community, and student expectations.
Many factors are involved here.

Research on education using data mining has been increasing and promising in recent
years, focusing mostly on students' performance, the effectiveness of learning, and
students' and teachers' perceptions of learning [4]. Romero and Ventura [4] stated
that the objective of using data mining in education is to improve learning itself;
the actors are the students and teachers, and the subjects of learning and the way
they are delivered are the medium that relates them. Vanthienen and De Witte [5]
reported that their study



 
showed that machine learning methods are advantageous, especially when facing a
nonlinear interaction function such as the role of a school principal in accommodating
district size policies. Other research in the education field using machine learning
was undertaken by Liao et al. [6], who stated that machine learning techniques can
identify students who are at risk of performing poorly in a course. The machine
learning approach has also been used for evaluating and predicting a student's level
of proficiency [7]. To predict the quality of this type of educational process, the
authors used a machine learning technique and claimed that it could be used
effectively in educational management when an online teaching strategy has to be
selected based on students' goals, individual features, needs and preferences.
Finally, Cope and Kalantzis [8] argued that machine learning and big data analysis
should be used in research on education because these emerging sources of evidence of
learning have significant implications for the relationships between assessment and
instruction. Moreover, for educational researchers these datasets differ in some
respects from conventional evidentiary sources, which calls for a new approach and
gives a different point of view on traditional research in education.

The objective of this research is to find the determinant factors that affect
students' acceptance of a subject, focusing on how difficult students find the subject
to understand. Instead of a statistical approach, the present study applies three
machine learning techniques, i.e. Deep Learning, Random Forest, and Gradient Boosting
Machine. Another purpose of this research is to introduce and compare the results of
three machine learning methods in education. The data set comes from the student
evaluation at Gazi University, Ankara [9] and was taken from the UCI repository. This
data set is examined with the three machine learning techniques on the H2O platform.

This paper is organised as follows. Section 2 describes the research methodology,
which follows the data mining process. The results calculated with the H2O data mining
tools are presented and discussed in section 3. Section 4 provides the conclusion and
directions for subsequent work.

 

 
2 Research Methodology 

In general, the steps in this study follow the model of data mining techniques [10].

 
2.1  Objective determination 

The first step was to discover the real-world problem. This study attempts to answer
the educational question of how to understand and measure the difficulty level of a
subject from the students', teachers' and infrastructure points of view. To be more
precise, the following research question was raised: What are the determinant factors
that make students judge a subject to be difficult or easy?

A hypothesis was created to test which attributes in the data set contribute
significantly to the research question: Students think that the level of subject
difficulty is more likely to be influenced by the subject syllabus, the activities and
interactions between students and instructors, and the readiness of students and
teachers to engage in the learning process.

Analysing and testing this hypothesis should reveal the determinant factors that
answer the question of why students think that the subject is difficult to understand,
and what teachers should do so that students can accept and understand the subject
materials more easily.

 
2.2  The proposed work 

To examine the three machine learning models, we selected the Turkiye Student
Evaluation data set from the UCI machine learning repository [9]. The dimensionality
of the dataset was then reduced using Principal Component Analysis (PCA), followed by
data normalisation using z-normalisation. The dataset was then randomly split in a
ratio of 80% : 20% into training and validation datasets.
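The normalisation and split described above can be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic Likert-style values rather than the questionnaire itself, the helper names `z_normalise` and `random_split` are hypothetical, and the source does not say whether the normalisation statistics were computed before or after the split.

```python
import numpy as np

def z_normalise(X):
    """Column-wise z-score: subtract the mean, divide by the standard deviation."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # guard against constant columns
    return (X - mean) / std

def random_split(n_rows, train_fraction=0.8, seed=0):
    """Return shuffled row indices for the training and validation parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_rows)
    cut = int(round(train_fraction * n_rows))
    return order[:cut], order[cut:]

# Synthetic stand-in (assumption): 5820 rows, 15 features, values 1..5.
rng = np.random.default_rng(42)
X = rng.integers(1, 6, size=(5820, 15)).astype(float)

Z = z_normalise(X)
train_idx, valid_idx = random_split(len(Z))
print(len(train_idx), len(valid_idx))  # 4656 1164
```

With 5820 instances, an 80% : 20% split gives 4656 training rows and 1164 validation rows.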

The three machine learning techniques were then applied to the training dataset to
obtain the regression model, the MSE and MAE values, and the variable importance for
each method. Using the model, we evaluated the validation dataset to find the MSE and



 
MAE values and the variable importance for the testing dataset. The whole process is
shown in Figure 1.

Figure 1. The proposed work. The process starts with selecting the dataset and ends
with the MSE and MAE values and the variable importance ranking.
 

2.3  Data pre-processing 

The research used the results of the student questionnaire at Gazi University,
Ankara, Turkey [9]. The dataset was obtained from the UCI machine learning repository
(https://archive.ics.uci.edu/ml/index.php). There are 5820 instances in the data set
with 33 attributes, of which 28 are on a Likert-type scale with values from 1 to 5,
where 1 means strongly disagree and 5 means strongly agree. The five other attributes
are questions whose answers are natural numbers. The questions can be grouped into



 
three substantial groups based on the students', teachers' and infrastructure points
of view.

Next, we carried out a PCA for feature reduction. The correlation matrix from the PCA
gave the eigenvalue associated with each component. New variables (principal
components) were formed from the eigenvalues larger than one, which yielded five
principal components. We analysed them and found that the five principal components
can be grouped into Attendance, Instructor, subject preparation, quizzes or exams, and
the relationship between students and instructors.

 
Table 1. Table of principal components

Component   Standard Deviation   Proportion of Variance   Cumulative Proportion of Variance
PC1         6.140                0.588                    0.588
PC2         3.686                0.212                    0.800
PC3         1.701                0.045                    0.845
PC4         1.411                0.031                    0.876
PC5         1.059                0.017                    0.894
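The figures in Table 1 can be checked against the eigenvalue-greater-than-one rule with a few lines of arithmetic. The sketch below relies on the standard identity that the eigenvalue of a principal component equals its variance, i.e. the squared standard deviation. The variable names are illustrative, and only the five tabulated components are available, so nothing can be said here about the components that were dropped.

```python
# Standard deviations and proportions of variance copied from Table 1.
std_dev = [6.140, 3.686, 1.701, 1.411, 1.059]
proportion = [0.588, 0.212, 0.045, 0.031, 0.017]

# Eigenvalue of a principal component = its variance = squared standard deviation.
eigenvalues = [s ** 2 for s in std_dev]

# Keep the components whose eigenvalue exceeds one.
retained = [f"PC{i + 1}" for i, ev in enumerate(eigenvalues) if ev > 1.0]

# Running total of the tabulated proportions of variance.
cumulative = []
total = 0.0
for p in proportion:
    total += p
    cumulative.append(round(total, 3))

print([round(ev, 2) for ev in eigenvalues])
print(retained)
print(cumulative)
```

All five tabulated components have eigenvalues above one, the smallest being about 1.12 for PC5. Summing the rounded proportions gives 0.893 for the last entry rather than the tabulated 0.894; a difference of this size is what one would expect if the table was computed from unrounded proportions.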

 
Figure 2. The cumulative proportion of variance versus principal component.

 
From the five principal components, we selected the features that rank highly
according to their eigenvector values. Finally, we found 15 features that can be used



 
in this study. The number of features was therefore reduced from 33 to 15. The
reduced feature set is shown in the following table.

 
Table 2. PCA analysis results

Features     Name of features
Difficulty   (target label)
Attendance
Instructors
Q1           The semester course content, teaching method and evaluation system were provided at the start.
Q4           The course was taught according to the syllabus announced on the first day of class.
Q5           The class discussions, homework assignments, applications and studies were satisfactory.
Q7           The course allowed fieldwork, applications, laboratory, discussion and other studies.
Q8           The quizzes, assignments, projects and exams contributed to help the learning.
Q12          The course helped me look at life and the world with a new perspective.
Q16          The Instructor was committed to the course and was understandable.
Q21          The Instructor demonstrated a positive approach to students.
Q22          The Instructor was open and respectful of the views of students about the course.
Q24          The Instructor gave relevant homework assignments/projects and helped/guided students.
Q25          The Instructor responded to questions about the course inside and outside of the course.
Q27          The Instructor provided solutions to exams and discussed them with students.
Q28          The Instructor treated all students in a right and objective manner.

 
2.4  Data mining   

The next step after data pre-processing was to decide which kind of analysis to apply
to the data set. A regression task was chosen because the data set is already
organised into attributes and the questionnaire answers are given on a Likert-type
scale from 1 to 5, so they are already categorised as well. Another reason is that the
goal of this study is to discover which attributes are the determinant factors of the
difficulty level of the subject.

Three machine learning techniques, namely Deep Learning (DL), Random Forest (RF) and
Gradient Boosting Machine (GBM), were used to examine the data set, focusing on the
regression analysis between the 15 attributes as independent variables and the
difficulty level of the subject as the target or dependent variable.

 
2.4.1.  Deep Learning 

Introduced by Hinton et al., DL has become more and more popular as a method for
solving problems in machine learning [11]. Deep learning is a family of machine
learning techniques that aim to imitate the working of the human brain using an



 
artificial neural network. Unlike other machine learning methods, a deep learning
algorithm is built as a complex model with a high capability to learn, process and
classify data.

In general, DL consists of three main kinds of layer: input, hidden and output. The
input layer holds the raw input data. The hidden layers observe, learn from and
classify the data based on the references; in DL there are usually more than three
hidden layers. The output layer presents the results.
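The input-hidden-output structure can be illustrated with a forward pass through a small fully connected network. This is a sketch under assumptions, not the H2O implementation: the hidden layer sizes 200, 200, 100, 50 and the Rectifier activation are taken from the parameter tables in this paper, the 15 inputs correspond to the 15 features, and the weights are random and untrained. The function names are hypothetical.

```python
import numpy as np

def rectifier(z):
    """Rectifier (ReLU) activation: negative values become zero."""
    return np.maximum(z, 0.0)

def init_layers(sizes, seed=0):
    """Create random (untrained) weights and zero biases for consecutive layers."""
    rng = np.random.default_rng(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        W = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))
        b = np.zeros(n_out)
        layers.append((W, b))
    return layers

def forward(X, layers):
    """Input layer -> hidden layers with rectifier -> single linear output."""
    a = X
    for W, b in layers[:-1]:
        a = rectifier(a @ W + b)
    W_out, b_out = layers[-1]
    return (a @ W_out + b_out).ravel()  # linear output, suitable for regression

# 15 inputs, four hidden layers, one output (predicted difficulty).
sizes = [15, 200, 200, 100, 50, 1]
layers = init_layers(sizes)

X = np.random.default_rng(1).normal(size=(4, 15))  # four synthetic respondents
pred = forward(X, layers)
print(pred.shape)  # (4,)
```

Training would adjust the weights to reduce the prediction error; that step is left out here.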

 
Figure 3. Deep Learning diagram (the picture was taken from 

https://www.kdnuggets.com/2017/05/deep-learning-big-deal.html). 

 
2.4.2. Random Forest 

Random Forest is an ensemble learning technique for classification and regression
[12]. RF works by constructing a collection of decision trees at training time and
returning the class that is the mode of the classes predicted by the individual trees
or, for regression, the mean of their predictions. Like DL, the RF algorithm has a
significant advantage when analysing large datasets. It can handle high-dimensional
data, learns well from a large amount of data, and can perform regression and
classification on nonlinear sample data.
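The way a forest combines its trees can be shown without building any trees at all. In the sketch below the per-tree predictions are made-up numbers used only for illustration, and the function names are hypothetical; the point is the aggregation step, which takes the mode of the tree outputs for classification and their mean for regression.

```python
from collections import Counter
from statistics import mean

def aggregate_classification(tree_votes):
    """Return the most common class among the individual tree predictions."""
    return Counter(tree_votes).most_common(1)[0][0]

def aggregate_regression(tree_outputs):
    """Return the average of the individual tree predictions."""
    return mean(tree_outputs)

# Made-up predictions of five trees for one respondent (illustration only).
votes = [3, 4, 3, 2, 3]
outputs = [3.0, 4.0, 3.5, 2.5, 3.0]

print(aggregate_classification(votes))  # 3
print(aggregate_regression(outputs))    # 3.2
```

Since this study treats difficulty as a regression target, the averaging variant is the relevant one here.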



 
Figure 4. Random Forest architecture for classification and regression analysis (picture taken
from https://www.researchgate.net/figure/Architecture-of-the-random-forest-model_fig1_301638643).

 
2.4.3. Gradient Boosting Machine 

Gradient boosting is a form of boosting in machine learning. In gradient boosting,
the target outcomes for each case are set based on the gradient of the error with
respect to the prediction. The idea behind GBM is to set the target outcomes for the
next model so as to minimise the error: each new model takes a step in the direction
that reduces the prediction error [13]. Although RF and GBM are both ensemble learning
methods, they differ in the order in which the trees are built and in the way the
results are combined. GBM adds new trees that complement the ones already built, which
usually gives better accuracy with fewer trees. GBM can therefore perform better than
RF if its parameters are tuned carefully [14].
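The idea of fitting each new tree to the remaining error can be sketched with one-split trees (stumps) and squared-error loss, for which the negative gradient is simply the residual. Everything below is an assumption made for illustration: the single synthetic feature, the learning rate of 0.1 and the function names are not from the source, and H2O's GBM uses deeper trees and many more options. Only the choice of 50 trees echoes the best GBM setting reported in this paper.

```python
import numpy as np

def fit_stump(x, residual):
    """Least-squares stump on one feature: a threshold and two leaf means."""
    best = None
    for threshold in np.unique(x)[:-1]:  # candidate split points
        left = residual[x <= threshold]
        right = residual[x > threshold]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, threshold, left.mean(), right.mean())
    return best[1:]  # (threshold, left value, right value)

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)

def fit_gbm(x, y, n_trees=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = float(y.mean())
    prediction = np.full(len(y), base)
    stumps = []
    for _ in range(n_trees):
        residual = y - prediction  # negative gradient of 0.5 * (y - F)^2
        stump = fit_stump(x, residual)
        prediction = prediction + learning_rate * predict_stump(stump, x)
        stumps.append(stump)
    return base, learning_rate, stumps

def predict_gbm(model, x):
    base, learning_rate, stumps = model
    prediction = np.full(len(x), base)
    for stump in stumps:
        prediction = prediction + learning_rate * predict_stump(stump, x)
    return prediction

# Synthetic step-shaped target (assumption, for illustration only).
x = np.linspace(0.0, 1.0, 50)
y = np.where(x > 0.5, 4.0, 2.0)

model = fit_gbm(x, y)
mse_start = np.mean((y - y.mean()) ** 2)
mse_end = np.mean((y - predict_gbm(model, x)) ** 2)
print(mse_start, mse_end)
```

Each added stump corrects part of what the previous ones left unexplained, so the training error shrinks as trees are added, in contrast to RF, where the trees are built independently and averaged.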

 
2.4.4  Cross-Validation  

The goal of cross-validation is to test the model's ability to predict new data and
to give an insight into how the model will generalise to an independent dataset.
K-fold cross-validation (CV) was carried out for each machine learning model. The
K-fold CV method was selected as the data sampling method because every data instance
should be evaluated in both the training and the testing role. The number of instances
is quite large, so K-fold CV can sample the data into training and testing sets quite
well. This



 
experiment was repeated many times; the number of repetitions is given by the value of
K. Although some scientists argue that K = 10 is the best value, in this research the
best K value was selected by repeating the experiment with various K values [15]. In
this study, K = 10 was applied.
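How K-fold CV rotates the held-out part can be sketched as below. The generator name `k_fold_indices` is hypothetical, and the row count of 4656 is an assumption derived from taking 80% of the 5820 instances as training data; the source does not spell out how H2O assigns rows to folds.

```python
import numpy as np

def k_fold_indices(n_rows, k=10, seed=0):
    """Yield (training, held_out) index arrays; each row is held out exactly once."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_rows)
    folds = np.array_split(order, k)
    for i in range(k):
        held_out = folds[i]
        training = np.concatenate([folds[j] for j in range(k) if j != i])
        yield training, held_out

# Assumed training set size: 80% of 5820 rows.
splits = list(k_fold_indices(4656, k=10))
print(len(splits))                      # 10
print(sorted({len(h) for _, h in splits}))  # fold sizes
```

With K = 10 the model is fitted ten times, each time on nine folds and evaluated on the remaining one, and the ten error estimates are then combined.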

Machine learning methods depend on a number of parameters, and each method has
specific parameters to adjust in order to find the best result. We used grid search to
find the parameters that give the optimum results. The following table shows the grid
search parameters applied for each model.

 
Table 3. Grid parameter values for each model

Model   Grid parameter values
DL      Function – Rectifier; Tanh
        Hidden layers – 200, 200, 100, 50; 100, 100, 50; 50, 100, 100, 50
        Epochs – 50; 100; 200
        CV – 5; 10
RF      nTrees – 50; 100; 200
        Epochs – 50; 100; 200
        CV – 5; 10
GBM     nTrees – 50; 100; 200
        Epochs – 50; 100; 200
        CV – 5; 10

 
The best performance of each model was obtained with the following parameters.

 
Table 4. Best parameter values for each model

Model   Parameter values
DL      Function – Rectifier
        Hidden layers – 200, 200, 100, 50
        Epochs – 200
        CV – 10
        Input dropout – 0.2
RF      nTrees – 200
        Epochs – 100
        CV – 10
GBM     nTrees – 50
        Epochs – 50
        CV – 10

 
Table 4 shows the best parameters given by the grid search analysis.
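A grid search over the DL settings listed in Table 3 amounts to enumerating every combination of the candidate values. The sketch below assumes an exhaustive (Cartesian) search, which the source does not state explicitly, and the dictionary keys are illustrative names rather than H2O parameter names. The input dropout reported in Table 4 is not part of the grid in Table 3, so it is left out.

```python
from itertools import product

# Candidate values for the DL model, copied from Table 3.
dl_grid = {
    "activation": ["Rectifier", "Tanh"],
    "hidden": [(200, 200, 100, 50), (100, 100, 50), (50, 100, 100, 50)],
    "epochs": [50, 100, 200],
    "cv_folds": [5, 10],
}

def expand_grid(grid):
    """Return one dict per combination of the candidate values."""
    names = list(grid)
    return [dict(zip(names, values)) for values in product(*(grid[n] for n in names))]

candidates = expand_grid(dl_grid)
print(len(candidates))  # 36

# The best DL setting reported in Table 4 is one of the enumerated candidates.
best_dl = {"activation": "Rectifier", "hidden": (200, 200, 100, 50),
           "epochs": 200, "cv_folds": 10}
print(best_dl in candidates)  # True
```

Each candidate would be trained and scored, and the setting with the lowest error kept; for the DL grid this means 2 × 3 × 3 × 2 = 36 fits under the exhaustive-search assumption.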



 
3 Results and Discussions 

Three machine learning methods were used to examine the dataset. The results obtained
were the MSE and MAE values and the variable importance of each method. The mean
squared error (MSE) measures the difference between the estimator and what is
estimated. The MSE is obtained by applying the following formula:

 
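$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2 \qquad (1)$$

where $\hat{Y}$ is the vector of $n$ predictions and $Y$ is the vector of observed
values corresponding to the inputs to the function that generated the predictions.
$Y_i$ is the $i$-th value of the vector $Y$.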

In this study, 80% of the data population was used as the training dataset and the
remaining 20% as the testing dataset. The H2O machine learning tools were run on the
training and testing datasets, and the resulting MSE and MAE values are presented in
the following table.

 
Table 5. MSE and MAE values of three machine learning models

Models   Training data set     Validation data set
         MSE      MAE          MSE      MAE
DL       1.25     0.89         1.33     0.92
RF       1.31     0.92         1.38     0.91
GBM      1.14     0.84         1.30     0.90

 
The lowest MSE value is the best result because it indicates the closest agreement
between the real values and the predicted values. In other words, the lower the MSE,
the higher the accuracy of prediction, as the actual and predicted values match more
closely. In this study, the lowest MSE value was obtained by the GBM model.

Like the MSE, the mean absolute error (MAE) is obtained by the formula

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - x_i\right| \qquad (2)$$

where $x_i$ and $y_i$ are the observed and predicted values, respectively. A lower MAE
value also indicates better model performance.
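The two error measures can be computed directly from their definitions. The sketch below uses four made-up observed and predicted values purely to show the arithmetic; the function names are illustrative and the numbers are not taken from the study.

```python
def mean_squared_error(observed, predicted):
    """Average of the squared differences between observed and predicted values."""
    n = len(observed)
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n

def mean_absolute_error(observed, predicted):
    """Average of the absolute differences between observed and predicted values."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n

# Made-up example: the differences are -1, 0, 2 and 0.
observed = [1, 3, 5, 4]
predicted = [2, 3, 3, 4]

print(mean_squared_error(observed, predicted))   # 1.25
print(mean_absolute_error(observed, predicted))  # 0.75
```

Because the MSE squares each difference, the single error of 2 contributes four times as much as the error of 1, whereas the MAE weights them in proportion to their size.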



 
The best model for prediction can also be identified from the deviance on the
training and testing datasets [16]. Deviance measures how well the model predicts. It
is a generalisation of the idea of using the sum of squares of residuals in ordinary
least squares to cases where model fitting is achieved by maximum likelihood. The
following figure shows the deviance score for each number of trees in GBM.

 
Figure 5. GBM deviance score for each number of trees. We show the GBM model result only 

because GBM method obtained the best result 

 
3.1.  Variable importance 

Wei et al. [17] stated that it is essential to know which factor or variable is more
significant in a regression or prediction analysis. Grömping [18] argued that a
predictive analysis is more convincing when the most influential predictor variables
are identified, although finding variable importance is challenging and some
regression models are not directly designed to provide it. Therefore, another method
is needed to find the variable importance. Some machine learning techniques can be
used as an alternative way to find it, especially when dealing with high-dimensional
input data and categorical output.

Which variables are more significant in predicting the difficulty of the subject?
Three ML methods were applied in this study. The percentage of Mean Square Error (MSE)



 
and Mean Absolute Error (MAE) was measured, which indicates which variables have a
more significant influence than others in predicting the difficulty of the subject.
Table 6 shows the ranking of variable importance for each model, and Fig. 6 shows, as
an example, the graph of variable importance from the GBM result.
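One common way to relate variable importance to a change in MSE is permutation importance: shuffle one variable at a time and record how much the prediction error grows. The source does not describe how H2O computed the rankings, so the sketch below is a stand-in technique rather than the method used in the study, and the data, the prediction function and the names are all synthetic assumptions.

```python
import numpy as np

def permutation_importance(predict, X, y, seed=0):
    """Increase in MSE when each column is shuffled, one column at a time."""
    rng = np.random.default_rng(seed)
    baseline = np.mean((y - predict(X)) ** 2)
    increases = []
    for j in range(X.shape[1]):
        X_perm = X.copy()
        X_perm[:, j] = rng.permutation(X_perm[:, j])  # break the link to y
        mse = np.mean((y - predict(X_perm)) ** 2)
        increases.append(mse - baseline)
    return np.array(increases)

# Synthetic data: the target depends strongly on column 0, weakly on column 1
# and not at all on column 2.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1]

def predict(data):
    """Stand-in for a fitted model (assumption): it knows the true relationship."""
    return 2.0 * data[:, 0] + 0.5 * data[:, 1]

importance = permutation_importance(predict, X, y)
ranking = np.argsort(-importance)  # most important column first
print(ranking)
```

A variable whose shuffling leaves the error unchanged carries no information the model uses, while a large increase marks an influential predictor; ordering the variables by this increase gives a ranking of the same kind as the one reported for each model.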

 
Table 6. Variable importance results of each model

Model   Rank   Variable
DL      1      Attendance
        2      Instructor
        3      Q12 - The course helped me look at life and the world with a new perspective.
        4      Q16 - The Instructor was committed to the course and was understandable.
        5      Q8 - The quizzes, assignments, projects and exams contributed to help the learning.
RF      1      Attendance
        2      Q22 - The Instructor was open and respectful of the views of students about the course.
        3      Q25 - The Instructor responded to questions about the course inside and outside of the course.
        4      Q21 - The Instructor demonstrated a positive approach to students.
        5      Instructor
GBM     1      Attendance
        2      Instructor
        3      Q12 - The course helped me look at life and the world with a new perspective.
        4      Q8 - The quizzes, assignments, projects and exams contributed to help the learning.
        5      Q16 - The Instructor was committed to the course and was understandable.

 
The DL and GBM models share the same five most important variables, although Q8 and
Q16 swap ranks between the two models. For all three machine learning models, two
factors, Attendance and Instructor, have a significant influence in determining the
difficulty level of the subject. This means these two factors are the most important
predictors of the difficulty of the subject.

A previous study also revealed that students' performance depends not only on their
academic effort but also on other aspects that have a similar influence [19].



 
Figure 6. GBM variable importance 

 
To answer the main question posed in the first section, we can now look at the ranking
of variable importance, especially from the DL and GBM results. Looking at the
features with a significant influence, we can draw the following points:

a) Attendance has the most significant impact. The respondents thought that
attendance, whether by students or by instructors, plays an important role and
shapes their presumptions about the subject. Attendance means participation and
involvement between students and instructors.

b) Instructors and their attitudes or approach to the students are related to the
subjects. The students are convinced that the instructors have a significant
impact on whether the subject as delivered is easy or difficult to understand.
This aspect is also related to the instructors' attitude, such as how committed
the instructor is to the course, how they respond when students ask about the
subject inside or outside class, and how they encourage students to do their best
in the selected subjects. A previous study by Martin et al. [20] stated that
instructors are an essential factor in making subjects appear easy or difficult
to their students.

c) The course can give a new perspective to students. A new perspective can drive
the students, so that they focus on learning the subject, which in turn makes the
subject easier to learn. In other words, giving a new



 
perspective on life becomes a stimulus for the students to learn and love the
subjects.

d) The quizzes, assignments, projects and exams contributed to helping the learning.
Students need a way to express their ability to understand the subjects. The
students felt that reading theory alone was not enough; they needed exercises,
and by doing the exercises they could understand the subject better. These
aspects were also mentioned by Henderson and Harper [21], who found that
correction, assessment and teacher feedback on students' quizzes can help
students prepare better for their exams.

4 Conclusions 

Three machine learning algorithms, i.e. Deep Learning, Random Forest, and Gradient
Boosting Machine, with K-fold CV as the data sampling method, were applied to analyse
the difficulty level of the subject from the students', teachers' and infrastructure
points of view. The data set was collected from the results of the student
questionnaire at Gazi University, Ankara. The results revealed five determinant
factors, i.e. Attendance, Instructors, the course helped me look at life and the world
with a new perspective, the quizzes, assignments, projects and exams contributed to
helping the learning, and the Instructor was committed to the course and was
understandable. These five determinant factors can affect students' and instructors'
perspectives on the difficulty level of the subject. The two main factors are
Attendance and Instructors. This study also demonstrated that data mining methods can
be employed in the education field. However, the ability to understand data and how to
work with them is crucial. The data mining process is important; in particular, the
step-by-step stage model of data mining can be used as guidance on how to apply data
mining to real-world problems.

In subsequent studies, these techniques could be compared with other algorithms in
classification and regression tasks. Another possibility is to compare other tools,
such as Orange and RapidMiner, which also provide machine learning algorithms, on the
same problem.



 
Acknowledgements 

This research was supported by the Department of Informatics Engineering, Sanata
Dharma University. We would also like to thank the anonymous reviewers, whose
comments greatly improved the manuscript.

References 

[1] M. T. Tillery and A. Fishbach, “How to measure motivation: a guide for 

experimental social psychologist,” Social and Personality Psychology Compass, 8 

(7), 328−341, 2014. 

[2] J. Dunlosky, K. A. Rawson, E. J. Marsh, M. J. Nathan, and D. T. Willingham, 

“Improving students' learning with effective learning techniques: promising 

directions from cognitive and educational psychology,” Psychological Science in 

the Public Interest, 14 (1), 4−5, 2013. 

[3]  P. Hallinger and J. F. Murphy, “The social context of effective schools,” American 

Journal of Education, 94 (3), 328–355. 

[4] C. Romero and S. Ventura, “Data mining in education,” Wiley Interdisciplinary 

Reviews: Data Mining and Knowledge Discovery, 3 (1), 12−27, 2013. 

[5]  J. Vanthienen and K. D. Witte, “Data analytics applications in education,” CRC 

Press Taylor & Francis Group, 2017. 

[6]  S. N. Liao et al., “A robust machine learning technique to predict low-performing
students,” ACM Transactions on Computing Education (TOCE), 19 (3), 18, 2019.

[7]  N. Kushik, N. Yevtushenko, and T. Evtushenko, “Novel machine learning 

technique for predicting teaching strategy effectiveness,” International Journal of 

Information Management, (2016). https://doi.org/10.1016/j.ijinfomgt.2016.02.006 

[8]  B. Cope and M. Kalantzis, “Big data comes to school: implications for learning, 

assessment, and research,” AERA Open, 2 (2), 1–19, 2016. 

[9]  G. Gunduza and E. Fokoue, Turkiye student evaluation I, University of California, 

School of Information and Computer Sciences, 2013. 



 
[10] P. Cabena, P. Hadjinian, R. Stadler, J. Verhees, and A. Zanasi, “Discovering data 

mining: from concept to implementation,” Englewood Cliffs, N. J. Prentice Hall, 

1998. 

[11]  Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, 521 (7553), 

436−444, 2015. 

[12] A. Liaw and M. Wiener, “Classification and regression by randomForest,” R News,
2 (3), 18−22, 2002.

[13] J. H. Friedman, “Greedy function approximation: a gradient boosting machine,”   

Annals of statistics, 29 (5), 1189−1232, 2001. 

[14] R. E. Schapire, “The boosting approach to machine learning: an overview, in 

nonlinear estimation and classification,” Springer, 149−171, 2003. 

[15] R. Kohavi and F. Provost, “Confusion matrix,” Machine Learning, 30 (2-3),

271−274, 1998. 

[16] G. Ritschard, “Computing and using the deviance with classification trees,” 

COMPSTAT 2006-Proceedings in Computational Statistics, 55−66, August 2006. 

[17] P. Wei, Z. Lu, and J. Song, “Variable importance analysis: a comprehensive 

review,” Reliability Engineering & System Safety, 142, 399−432, 2015. 

[18] U. Grömping, “Variable importance in regression models,” Wiley Interdisciplinary 

Reviews: Computational Statistics, 7 (2), 137−152, 2015. 

[19] A. A. Saa, “Educational data mining & students’ performance prediction,” 

International Journal of Advanced Computer Science and Applications, 7 (5), 

212−220, 2016. 

[20] F. Martin, C. Wang, and A. Sadaf, “Student perception of helpfulness of 

facilitation strategies that enhance instructor presence, connectedness, engagement 

and learning in online courses,” The Internet and Higher Education, 37, 52−65, 

2018. 

[21] C. Henderson and K. A. Harper, “Quiz corrections: improving learning by 

encouraging students to reflect on their mistakes,” The Physics Teacher, 47 (9), 

581−586, 2009. 

 