

LONTAR KOMPUTER VOL. 11, NO. 1 APRIL 2020   p-ISSN 2088-1541 
DOI : 10.24843/LKJITI.2020.v11.i01.p05    e-ISSN 2541-5832 
Accredited B by RISTEKDIKTI Decree No. 51/E/KPT/2017 

 

Forecasting New Student Candidates Using the 
Random Forest Method    

 
Rahmat Robi Waliyansyah^a1, Nugroho Dwi Saputro^a2

^a Informatics, Universitas PGRI Semarang
Jl. Sidodadi Timur No.24, Dr. Cipto, Semarang
^1 rahmat.robi.waliyansyah@upgris.ac.id
^2 nugputra1@gmail.com

 
Abstract 
 

Higher education institutions regularly admit new students, and the number of new students can
rise or fall from year to year. This study examines new student admissions at Universitas PGRI
Semarang (UPGRIS) for the academic years 2014/2015 through 2018/2019, which involve several
admission selection paths. To maintain the minimum required ratio between the number of
students and the available human resources, facilities, and infrastructure, it is necessary to
predict how much the number of students changes each year. Building such a forecasting system
requires a sound forecasting method and sufficiently precise calculations to predict the number
of prospective students who register. This study uses the Random Forest method. The forecasting
models are evaluated with Random Sampling and Cross-validation, using Mean Absolute Error
(MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and the Coefficient of
Determination (R²) as evaluation measures. The results identify the five study programs with
the highest and the five with the lowest new student admissions. UPGRIS will therefore devise a
new strategy for the five lowest study programs so that the desired number of new students is
achieved.

  
Keywords: Random Forest, Forecasting, Admission of new students, Promotion Strategy 
  
 
1. Introduction 

Forecasting is the estimation of something that has not yet happened. In the social sciences,
outcomes are uncertain and difficult to estimate precisely, which is why forecasting is needed.
Forecasting is based on past data that are analyzed using certain methods. The quality of a
study's results is determined by the accuracy of the predictions made [1].

Higher education institutions routinely admit new students, and the number of new students can
increase or decrease, even when the existing historical data show a continuing increase [2].

The development of a university is influenced by the interest of the community, especially of
prospective students, in studying on its campus. Growing interest among prospective students
needs to be matched by the development of human resources, facilities, and infrastructure. To
maintain the minimum required ratio between the number of students and these resources, it is
necessary to predict how much the number of students changes each year. The Random Forest
method can be used effectively to obtain a predictive model for the increase in the number of
new students [3].

Universitas PGRI Semarang (UPGRIS) was established in 2014 through the merger of IKIP PGRI
Semarang with the Semarang Academy of Technology (ATS). From the 2014/2015 to the 2018/2019
academic year, UPGRIS admitted new students through several admission paths: interest-based
selection, achievement, regular, Recognition of Past Learning (RPL), and BIDIKMISI (government
tuition assistance for graduates of senior high school (SMA) or the equivalent who have good
academic potential but limited economic means). The number of applicants through these paths
may continue to increase in the coming years, because the quota of each department or faculty
is fixed while the population level differs.

Several studies have addressed the prediction of the number of prospective students. One used
an artificial neural network with the backpropagation method. Its results indicate that
backpropagation predicts the number of new students with good accuracy, using a 5-1 neuron
structure with one hidden layer, a learning rate (lr) of 0.1, and an MSE of 0.001 [4]. Another
study predicted the number of prospective new students using a time-invariant fuzzy time
series. It compared three interval settings: 6 intervals gave a prediction error (MAE) of 0.54,
9 intervals an MAE of 0.32, and 12 intervals an MAE of 0.29 [5]. The results of these studies
are good, but the present study takes a different approach and uses Random Forest, because the
method can handle incomplete attributes and can be applied to large samples.

Several related studies use the Random Forest method. One study assessed the relationship
between environmental factors and genetically differentiated populations of Mytilus marine
mussels; the machine learning approach was able to show the relationship between environmental
factors and populations with different genetics [6]. Random Forest has been used to classify
medical data, producing good predictions for 10 diseases [7]. In the analysis of genetic data,
Random Forest proved suitable not only for analysis but also for prediction and classification,
variable selection, pathway analysis, genetic association and epistasis detection, and
unsupervised learning [8]. Malonylation sites have been predicted with LEMP, which combines
LSTM with Random Forest; overall, LEMP is very good at identifying malonylation sites [9].
Random Forest and stochastic gradient tree boosting have been used to predict airfoil
self-noise levels; the models were built using cross-validation repeated ten times on the
dataset and were more accurate than the previous model [10]. In air pollution prediction, data
from the Central Pollution Control Board for two cities (Delhi and Patna) were used with seven
parameters (C6H6, NO2, O3, SO2, CO, PM2.5, and PM10), and the predictions were far better than
previous results [11]. Random Forest has been used to predict protein structure, with good
results when compared on the AMIDE dataset [12]. In the detection of DNS DDoS attacks with the
Random Forest algorithm, detection accuracy reached 99.2% [13]. Random Forest has also been
investigated for software effort estimation; the evaluation used Random Sampling with 70%
training data on the ISBSG R8, Tukutuku, and COCOMO datasets, and Random Forest outperformed
Regression Trees on all criteria [14]. In predicting Alzheimer's disease on the ADNI dataset
(AD / HC), prediction sensitivity increased from 79.5% / 75% to 83.3% / 81.3% [15]. Finally,
Random Forest has been used to predict rainfall: accuracy with 10-fold cross-validation was
71.09%, whereas using all data for both training and testing gave 99.45%. The latter figure is
a resubstitution estimate, which often looks very good and is useful mainly for diagnostic
purposes [16].

Building a forecasting system for the number of prospective new students requires a sound
forecasting method and sufficiently precise calculations to predict the number of prospective
students who register. This study uses the Random Forest method.

 
2. Research Methods 

The prediction of prospective new students at Universitas PGRI Semarang is carried out in five
stages: (1) problem analysis; (2) data collection; (3) data processing; (4) Random Forest
implementation; and (5) analysis. The research method is shown in Figure 1.
 

 

Figure 1. Research Method Flowchart 

2.1. Problem analysis 

The problem analysis serves as a reference for the system to be built, namely a system that
forecasts the number of prospective students who register. At present, UPGRIS does not have
such a system, which leads to the problems described in the Introduction. To forecast the
number of prospective new students who register in the following year, a forecasting
application is designed using the Random Forest method.

2.2. Data Collection 

The data used are the numbers of new student registrants at UPGRIS for the academic years
2014/2015 through 2018/2019. UPGRIS has eight faculties and 23 study programs. Not all
registrants complete re-registration, for various reasons: for example, being accepted at a
state university, lacking funds, or joining the police or the army. The data used in this study
are shown in Table 1.
    

Table 1. Data on the Number of New Students at the University of PGRI Semarang 

Year Study Program Registrant New students 

2018 BK 259 144 
2018 PGSD 735 323 
2018 PAUD 46 162 
2018 PPKn 66 34 
2018 MTK 241 133 
2018 Biologi 110 54 
2018 FIS 43 24 
2018 PBSI 264 158 
2018 PBI 259 147 
2018 PBJ 44 28 
2018 MP 98 74 
2018 PTI 57 29 
2018 Ekonomi 93 50 
2018 PB 24 11 
2018 PJKR 499 315 
2018 T-Sipil 122 69 
2018 T-Mesin 190 116 
2018 T-Elektro 31 17 
2018 Informatika 132 95 
2018 T-Pangan 59 32 
2018 Arsitektur 60 40 
2018 Hukum 99 49 
2018 Manajemen 314 195 
2017 BK 250 158 
2017 PGSD 780 407 
2017 PAUD 77 55 
2017 PPKn 87 48 
2017 MTK 259 175 
2017 Biologi 164 109 
2017 FIS 49 32 
2017 PBSI 249 183 
2017 PBI 272 178 
2017 PBJ 49 19 



2017 MP 169 116 
2017 PTI 73 42 
2017 Ekonomi 121 93 
2017 PB 45 21 
2017 PJKR 488 308 
2017 T-Sipil 96 68 
2017 T-Mesin 167 125 
2017 T-Elektro 34 14 
2017 Informatika 111 71 
2017 T-Pangan 47 26 
2017 Arsitektur 50 21 
2017 Hukum 66 40 
2017 Manajemen 276 177 
2016 BK 289 135 
2016 PGSD 1076 491 
2016 PAUD 109 63 
2016 PPKn 68 36 
2016 MTK 385 194 
2016 Biologi 191 97 
2016 FIS 71 33 
2016 PBSI 305 179 
2016 PBI 359 179 
2016 PBJ 48 27 
2016 MP 191 130 
2016 PTI 77 48 
2016 Ekonomi 204 101 
2016 PB 56 20 
2016 PJKR 557 320 
2016 T-Sipil 154 77 
2016 T-Mesin 188 112 
2016 T-Elektro 47 13 
2016 Informatika 181 99 
2016 T-Pangan 62 24 
2016 Arsitektur 66 25 
2016 Hukum 62 27 
2016 Manajemen 203 91 
2015 BK 292 157 
2015 PGSD 1499 497 
2015 PAUD 106 72 
2015 PPKn 83 61 
2015 MTK 350 195 
2015 Biologi 230 135 
2015 FIS 90 53 
2015 PBSI 439 275 
2015 PBI 364 197 
2015 PBJ 41 25 
2015 MP 56 0 
2015 PTI 70 44 
2015 Ekonomi 302 166 
2015 PB 13 0 
2015 PJKR 554 308 
2015 T-Sipil 147 69 
2015 T-Mesin 192 122 
2015 T-Elektro 41 20 
2015 Informatika 131 69 
2015 T-Pangan 52 28 
2015 Arsitektur 70 29 



2015 Hukum 0 0 
2015 Manajemen 0 0 
2014 BK 492 173 
2014 PGSD 2572 501 
2014 PAUD 161 77 
2014 PPKn 137 59 
2014 MTK 622 226 
2014 Biologi 330 132 
2014 FIS 202 77 
2014 PBSI 559 213 
2014 PBI 515 169 
2014 PBJ 50 7 
2014 MP 0 0 
2014 PTI 106 44 
2014 Ekonomi 368 131 
2014 PB 0 0 
2014 PJKR 648 230 
2014 T-Sipil 143 41 
2014 T-Mesin 251 90 
2014 T-Elektro 83 21 
2014 Informatika 133 43 
2014 T-Pangan 66 22 
2014 Arsitektur 45 14 
2014 Hukum 0 0 
2014 Manajemen 0 0 

2.3. Data processing 

The data were obtained from the UPGRIS Information and Technology Development Agency in May
2019. They recapitulate the number of applicants and new students at UPGRIS from 2014 to 2018.
As shown in Figure 2, the amount of data used is 37,648, organized as 115 rows with three
attributes (Study Program, Registrant, and Year) and one target (New Students). Figure 3 shows
the training sample, which is 70% of the 115 rows in the dataset.
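The layout of the dataset can be sketched as follows. This is only an illustration: the `Row` type and its field names are hypothetical and were chosen for readability, and only two rows of Table 1 are reproduced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """One record of the dataset; field names are illustrative assumptions."""
    year: int            # attribute: year of application
    study_program: str   # attribute: study program code
    registrants: int     # attribute: number of registrants
    new_students: int    # target: number of new students


# Two rows copied from Table 1.
sample = [
    Row(2018, "BK", 259, 144),
    Row(2018, "PGSD", 735, 323),
]

# 23 study programs observed over 5 academic years give the 115 rows.
n_programs, n_years = 23, 5
n_rows = n_programs * n_years
print(n_rows)  # 115
```

The three attributes are the model inputs and `new_students` is the value to be predicted.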
 

Figure 2. Data Type used 


 

Figure 3. Sample Data 

2.4. Random Forest Implementation 

Random Forest is a method used for classification and regression. It is an ensemble learning
method that builds and combines many decision trees as base learners [17]. The Random Forest
method has three important aspects: (1) bootstrap sampling is used to build the prediction
trees; (2) each decision tree makes its predictions using randomly selected predictors; and (3)
the random forest predicts by combining the results of all decision trees, using a majority
vote for classification or the average for regression.
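The three aspects can be illustrated with a deliberately small sketch. It is not the implementation used in this study: to stay short it replaces full decision trees with one-split regression trees (stumps), gives each tree a single randomly chosen feature, and all function names are hypothetical. The sample rows are taken from Table 1.

```python
import random
from statistics import mean


def fit_stump(rows, targets, feature):
    """Fit a one-split regression tree (stump) on a single feature."""
    best = None
    values = sorted({r[feature] for r in rows})
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [t for r, t in zip(rows, targets) if r[feature] <= threshold]
        right = [t for r, t in zip(rows, targets) if r[feature] > threshold]
        left_mean, right_mean = mean(left), mean(right)
        sse = (sum((t - left_mean) ** 2 for t in left)
               + sum((t - right_mean) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # the sampled feature is constant: predict the mean
        m = mean(targets)
        return feature, float("inf"), m, m
    return feature, best[1], best[2], best[3]


def fit_forest(rows, targets, n_trees=100, seed=0):
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        # (1) bootstrap sample: draw n rows with replacement
        idx = [rng.randrange(n) for _ in range(n)]
        # (2) random predictor: each tree sees one randomly chosen feature
        feature = rng.randrange(n_features)
        forest.append(fit_stump([rows[i] for i in idx],
                                [targets[i] for i in idx], feature))
    return forest


def predict(forest, row):
    # (3) regression: average the predictions of all trees
    return mean(left if row[f] <= thr else right
                for f, thr, left, right in forest)


# (year, registrants) -> new students, rows taken from Table 1.
rows = [(2018, 259), (2018, 735), (2018, 66), (2018, 241),
        (2017, 250), (2017, 780), (2017, 87), (2017, 259)]
targets = [144, 323, 34, 133, 158, 407, 48, 175]

forest = fit_forest(rows, targets)
prediction = predict(forest, (2018, 499))
print(round(prediction, 1))
```

Because every tree predicts a mean of observed targets, the averaged forecast always stays within the range of the training targets.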

Combining the estimates of many trees is similar to what is done in the bagging method. Note,
however, that each time a tree is grown, the candidate explanatory variables considered for a
split are not all the variables involved but only a randomly selected subset. This process
produces single trees of different sizes and shapes. The expected result is a collection of
single trees with small correlation between them. This small correlation yields a small
variance of the Random Forest estimates [18], smaller than the variance of bagging estimates
[19].

Furthermore, [19] explains that Breiman [20] proved an upper bound on the prediction error of
Random Forest:

$$PE^{*} \le \frac{\bar{\rho}\,(1 - s^{2})}{s^{2}} \tag{1}$$

where $\bar{\rho}$ is the average correlation between the predictions of pairs of single trees
and $s$ is the average strength, a measure of the accuracy of the single trees. A larger $s$
indicates better prediction accuracy. A good Random Forest therefore needs many single trees
with small $\bar{\rho}$ and large $s$.
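A small numerical sketch shows how the bound reacts to the two quantities, assuming it takes Breiman's form: average correlation times (1 - s^2) divided by s^2. The function name and the input values are illustrative only; they are not measurements from this study.

```python
def breiman_bound(mean_correlation, strength):
    """Upper bound on prediction error: rho_bar * (1 - s**2) / s**2."""
    if not 0 < strength <= 1:
        raise ValueError("strength s must lie in (0, 1]")
    return mean_correlation * (1 - strength ** 2) / strength ** 2


# Made-up values, chosen only to show the direction of the effect.
print(breiman_bound(0.5, 0.5))  # 1.5
print(breiman_bound(0.2, 0.5))  # lower correlation gives a smaller bound
print(breiman_bound(0.5, 0.8))  # higher strength gives a smaller bound
```

Lowering the correlation between trees or raising their strength both tighten the bound, which matches the requirement stated above.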

Figure 4 shows the steps for implementing the Random Forest algorithm to predict the number of
new students. The first step is to input the transformed data, which consist of explanatory
attributes and a target attribute. The data are then divided into training data and testing
data, with proportions of 70% and 30%. In addition, a split that uses 95% of the data for
training was also carried out. The results of the two ways of determining the training and
testing data are compared later. The Random Forest algorithm in this study uses 100 randomly
generated decision trees. Training data are the input to the Random Forest algorithm, while
testing data are used to evaluate the model that the algorithm produces.
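The 70%/30% division can be sketched as below. This is a minimal illustration, not the procedure of the tool used in the study: the function name, the seed, and the rounding down of the cut point are assumptions.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of the rows and cut it into training and testing parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)  # rounds down
    return shuffled[:cut], shuffled[cut:]


rows = list(range(115))        # stand-ins for the 115 dataset rows
train, test = train_test_split(rows)
print(len(train), len(test))   # 80 35
```

With 115 rows, a 70% share corresponds to 80 training rows and 35 testing rows under this rounding rule.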
 

Figure 4. Random Forest Implementation 
 
The performance of Random Forest is evaluated with several measures: Mean Squared Error (MSE),
Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and the Coefficient of Determination
(R²). Accuracy, the percentage of correct predictions, is the most common and simplest measure
for evaluating predictive algorithms in classification; for a numeric target such as the number
of new students, error-based measures are used instead. MAE shows how far predictions deviate
from the truth on average. RMSE measures the same deviations on the scale of the data while
giving more weight to large errors. MSE gives a good overview of how consistent the model is.
R² is useful for seeing how much the given variables jointly explain the target. The Random
Forest performance evaluation is shown in Figure 5.
 

Figure 5. Random Forest Performance Evaluation 
 
The forecasting models are then validated using several indicators (MSE, RMSE, MAE, and R²).

Mean Absolute Error (MAE) is a measure of the difference between two continuous variables.
Assume X and Y are paired observations that express the same phenomenon. Mathematically, MAE is
defined as follows:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{y}_i - y_i \right| \tag{2}$$

where $\hat{y}_i$ is the forecast value, $y_i$ is the true value, and $n$ is the number of data
points. Based on formula 2, MAE intuitively calculates the average error by giving equal weight
to all data ($i = 1, \dots, n$).
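As a worked example, MAE can be computed for the first five 2018 rows of Table 2. The function name is illustrative, and the five rows are only a small excerpt, so the result is not the MAE of the whole model.

```python
def mean_absolute_error(actual, forecast):
    """Average of |forecast - actual| with equal weight for every pair."""
    if len(actual) != len(forecast):
        raise ValueError("inputs must have the same length")
    return sum(abs(f - a) for a, f in zip(actual, forecast)) / len(actual)


# First five 2018 rows of Table 2: actual new students and forecasts.
actual = [144, 323, 162, 34, 133]
forecast = [148.295, 355.117, 24.960, 28.554, 139.277]

mae = mean_absolute_error(actual, forecast)
print(round(mae, 3))  # 37.035
```

Each of the five absolute errors contributes one fifth of the result, including the large error of the PAUD row (162 against 24.960).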

Mean Squared Error (MSE) is another measure for evaluating forecasting methods. Each error, or
residual, is squared; the squared errors are then summed and divided by the number of
observations. This approach penalizes large forecasting errors because they are squared. As a
result, a model with several moderate errors is preferred over one with mostly small errors but
an occasional very large one. Mathematically, MSE is defined as follows:

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^{2} \tag{3}$$

where $y_i$ is the observed value, $\hat{y}_i$ is the predicted value, and $n$ is the number of
data points. Based on formula 3, MSE weights errors more heavily than MAE because it uses the
squared error. As a consequence, errors smaller than 1 become smaller when squared, while large
errors become much larger.

Root Mean Squared Error (RMSE) is an alternative measure for evaluating forecasting techniques
and is used to measure the accuracy of a model's forecasts. RMSE is the square root of the mean
of the squared errors. It expresses the size of the error produced by a forecasting model. A low
RMSE indicates that the values produced by the model are close to the observed values.
Mathematically, RMSE is defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^{2}} \tag{4}$$

Based on formula 4, $y_i$ is the observed value, $\hat{y}_i$ is the predicted value, $i$ is the
index of the data in the database, and $n$ is the number of data points.
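MSE and RMSE can be computed side by side for the first five 2018 rows of Table 2. The function names are illustrative, and the excerpt is too small to reproduce the model-wide scores reported later.

```python
import math


def mean_squared_error(actual, predicted):
    """Mean of the squared differences between observed and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the MSE, on the same scale as the data."""
    return math.sqrt(mean_squared_error(actual, predicted))


# First five 2018 rows of Table 2: actual new students and forecasts.
actual = [144, 323, 162, 34, 133]
predicted = [148.295, 355.117, 24.960, 28.554, 139.277]

mse = mean_squared_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
print(round(mse, 1), round(rmse, 1))  # 3979.8 63.1
```

The single large error of the PAUD row (162 against 24.960) accounts for most of the MSE, which illustrates how squaring amplifies large errors.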

The coefficient of determination (R²) is often interpreted as the ability of all independent
variables together to explain the variance of the dependent variable. In general, R² for
cross-sectional data is relatively low because of the large variation between observations,
while time series data usually have a higher coefficient of determination. In simple terms, the
coefficient of determination corresponds to the square of the correlation coefficient (R).
Mathematically, R² is commonly defined as follows:

$$R^{2} = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^{2}}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^{2}} \tag{5}$$

where $y_i$ is the observed value, $\hat{y}_i$ is the predicted value, and $\bar{y}$ is the mean
of the observed values. The coefficient of determination $R^{2}$ is the proportion of
variability in the data that is accounted for by a statistical model. Another interpretation is
that $R^{2}$ is the proportion of variation in the response explained by the regressors
(independent variables, X) in the model. Thus, $R^{2} = 1$ means that the model explains all the
variability in the Y variable, while $R^{2} = 0$ means that there is no relationship between the
regressors (X) and the Y variable.
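The two boundary cases can be checked numerically. The sketch assumes the common definition of R² as one minus the ratio of the residual sum of squares to the total sum of squares; the function name is illustrative, and the five target values come from the first 2018 rows of Table 1.

```python
def r_squared(actual, predicted):
    """R^2 = 1 - SS_res / SS_tot (assumed definition, see lead-in)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1 - ss_res / ss_tot


actual = [144, 323, 162, 34, 133]

# A model that reproduces every value explains all the variability.
perfect = list(actual)
# A model that always predicts the mean explains none of it.
baseline = [sum(actual) / len(actual)] * len(actual)

print(r_squared(actual, perfect))   # 1.0
print(r_squared(actual, baseline))  # 0.0
```

A perfect model yields 1, while a model that ignores the regressors and always predicts the mean yields 0.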
 

2.5 Analysis 

In the analysis phase, the resulting model is analyzed in relation to the case study of
predicting the number of new students applying to UPGRIS. In addition, the test results for
each evaluation measure are analyzed to determine the quality of the model.
  
3. Result and Discussion 

Figure 6 presents the evaluation of the output of the Random Forest algorithm when the data are
divided by random sampling with 70% training data, repeated 100 times.
  

Figure 6. Evaluation of Random Forest Performance on Model Results from Random Sampling 70%


 

Figure 7. Evaluation of Random Forest Performance on Model Results from Cross-Validation 
 

Figure 7 shows the evaluation of the model produced by the Random Forest algorithm with
Cross-Validation.

Based on the evaluation results, the model built with random sampling (70% training data)
obtained MSE = 1424.913, RMSE = 37.748, MAE = 23.482, and R² = 0.871. The model evaluated with
cross-validation obtained MSE = 874.127, RMSE = 29.566, MAE = 18.985, and R² = 0.921.
Cross-validation thus gave lower errors and a higher R² than random sampling.
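Since RMSE is the square root of MSE, the reported pairs can be checked for consistency. The sketch below only re-uses the numbers stated above; the dictionary keys are descriptive labels, not names from the evaluation tool.

```python
import math

# (MSE, RMSE) pairs as reported in the text above.
reported = {
    "random sampling 70%": (1424.913, 37.748),
    "cross-validation": (874.127, 29.566),
}

for name, (mse, rmse) in reported.items():
    # The square root of each MSE, rounded to three decimals,
    # should reproduce the reported RMSE.
    print(name, round(math.sqrt(mse), 3), rmse)
```

Both reported RMSE values agree with the square roots of the corresponding MSE values to three decimal places.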

Forecasting results using the Random Forest Method are shown in Table 2. 
 

Table 2. The Results of Forecasting the Number of New Students with Random Forest 

New Students Random Forest Year Study Program Registrant 

144 148.295 2018 BK 259 
323 355.117 2018 PGSD 735 
162 24.960 2018 PAUD 46 
34 28.554 2018 PPKn 66 
133 139.277 2018 MTK 241 
54 59.359 2018 Biologi 110 
24 24.378 2018 FIS 43 
158 160.948 2018 PBSI 264 
147 155.691 2018 PBI 259 
28 25.474 2018 PBJ 44 
74 56.391 2018 MP 98 
29 33.309 2018 PTI 57 
50 54.821 2018 Ekonomi 93 
11 15.457 2018 PB 24 
315 303.180 2018 PJKR 499 
69 74.829 2018 T-Sipil 122 
116 117.175 2018 T-Mesin 190 
17 15.105 2018 T-Elektro 31 
95 81.079 2018 Informatika 132 
32 26.223 2018 T-Pangan 59 
40 26.476 2018 Arsitektur 60 
49 56.947 2018 Hukum 99 
195 182.278 2018 Manajemen 314 
158 142.247 2017 BK 250 
407 417.175 2017 PGSD 780 
55 49.823 2017 PAUD 77 
48 56.533 2017 PPKn 87 



175 168.138 2017 MTK 259 
109 100.942 2017 Biologi 164 
32 25.181 2017 FIS 49 
183 163.855 2017 PBSI 249 
178 167.764 2017 PBI 272 
19 25.749 2017 PBJ 49 
116 113.646 2017 MP 169 
42 41.914 2017 PTI 73 
93 82.053 2017 Ekonomi 121 
21 22.338 2017 PB 45 
308 295.671 2017 PJKR 488 
68 62.258 2017 T-Sipil 96 
125 110.830 2017 T-Mesin 167 
14 15.376 2017 T-Elektro 34 
71 70.335 2017 Informatika 111 
26 25.163 2017 T-Pangan 47 
21 25.370 2017 Arsitektur 50 
40 53.600 2017 Hukum 66 
177 167.875 2017 Manajemen 276 
135 153.314 2016 BK 289 
491 457.772 2016 PGSD 1076 
63 64.438 2016 PAUD 109 
36 29.092 2016 PPKn 68 
194 185.185 2016 MTK 385 
97 115.228 2016 Biologi 191 
33 33.354 2016 FIS 71 
179 180.793 2016 PBSI 305 
179 185.774 2016 PBI 359 
27 25.565 2016 PBJ 48 
130 119.624 2016 MP 191 
48 51.128 2016 PTI 77 
101 108.596 2016 Ekonomi 204 
20 23.304 2016 PB 56 
320 310.339 2016 PJKR 557 
77 77.728 2016 T-Sipil 154 
112 115.691 2016 T-Mesin 188 
13 18.272 2016 T-Elektro 47 
99 107.120 2016 Informatika 181 
24 26.083 2016 T-Pangan 62 
25 27.995 2016 Arsitektur 66 
27 54.022 2016 Hukum 62 
91 111.642 2016 Manajemen 203 
157 154.161 2015 BK 292 
497 463.477 2015 PGSD 1499 
72 63.870 2015 PAUD 106 
61 56.687 2015 PPKn 83 
195 188.799 2015 MTK 350 
135 132.376 2015 Biologi 230 
53 56.373 2015 FIS 90 
275 262.299 2015 PBSI 439 
197 185.603 2015 PBI 364 
25 24.110 2015 PBJ 41 
0 25.237 2015 MP 56 
44 39.433 2015 PTI 70 
166 164.016 2015 Ekonomi 302 
0 9.322 2015 PB 13 

308 307.548 2015 PJKR 554 



69 75.773 2015 T-Sipil 147 
122 116.110 2015 T-Mesin 192 
20 17.651 2015 T-Elektro 41 
69 75.619 2015 Informatika 131 
28 26.218 2015 T-Pangan 52 
29 33.116 2015 Arsitektur 70 
83 59.507 2015 Hukum 0 
0 18.088 2015 Manajemen 0 

173 208.222 2014 BK 492 
501 463.046 2014 PGSD 2572 
77 82.459 2014 PAUD 161 
59 75.657 2014 PPKn 137 
226 238.067 2014 MTK 622 
132 168.270 2014 Biologi 330 
77 92.868 2014 FIS 202 
213 242.909 2014 PBSI 559 
169 209.330 2014 PBI 515 
7 15.701 2014 PBJ 50 
0 12.772 2014 MP 0 
44 62.214 2014 PTI 106 
131 154.416 2014 Ekonomi 368 
0 12.654 2014 PB 0 

230 256.458 2014 PJKR 648 
41 75.027 2014 T-Sipil 143 
90 115.037 2014 T-Mesin 251 
21 55.027 2014 T-Elektro 83 
43 77.268 2014 Informatika 133 
22 19.932 2014 T-Pangan 66 
14 15.618 2014 Arsitektur 45 
89 58.161 2014 Hukum 0 
0 14.088 2014 Manajemen 0 

 
Testing with Random Forest identified five study programs with a significant increase in the
number of new students and five study programs with the lowest number of new students. The
study programs with an increase are: Management (75%), PBSI / Indonesian Language and
Literature (52%), Mathematics Education (50%), Economic Education (46%), and MP / Masters in
Education Management (43%). The five study programs with the lowest number of new students are:
Master of Education and Indonesian Language (2.6%), Law (2.7%), Early Childhood Education
(PAUD) (3.4%), Food Technology (3.7%), and Javanese Language and Literature Education / PBJ
(4.5%). UPGRIS will therefore focus on the five lowest study programs and devise a more
effective and efficient promotion strategy for them, so that the number of new students is
expected to reach the target set.

Forecasting is the estimation of something that has not yet happened. Forecasts are generally
based on past data that are analyzed using certain methods. Forecasting aims to minimize the
influence of uncertainty; in other words, it aims to obtain a forecast that minimizes forecast
errors, which are usually measured by MAE, MSE, RMSE, and R². Forecasting is a very important
tool in effective and efficient planning.

Demand forecasting has general characteristics that must be considered when assessing the
results of a forecasting process and the forecasting method used. First, the causal factors
that applied in the past are assumed to remain valid in the future. Second, forecasting is
never perfect: actual demand always differs from forecast demand. Different forecasting models
yield different forecast values and different degrees of forecast error. The art of forecasting
is to choose the best forecasting model, one that can identify and respond to historical
activity patterns in the data. For the evaluation of forecasting models, MAE is more intuitive
because it gives the average error over all data, whereas MSE is very sensitive to outliers:
because errors are squared, an outlier error receives a very large weight and inflates the MSE.
MSE gives a good overview of how consistent the model is. Minimizing MSE means minimizing the
model variance. Models with small variance give relatively more consistent results across all
input data than models with large variance. RMSE is a more intuitive alternative to MSE because
it is on the same measurement scale as the data being evaluated. For example, an RMSE that is
twice as large means that the model's error is twice as large, whereas an MSE that is twice as
large does not. If MSE is analogous to variance, then RMSE is analogous to standard deviation.
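The difference in scale can be seen by doubling every error in a small set of made-up forecast errors. The values below are hypothetical and serve only to illustrate the arithmetic.

```python
import math


def mse(errors):
    """Mean of the squared errors."""
    return sum(e * e for e in errors) / len(errors)


errors = [4.0, -6.0, 10.0]          # made-up forecast errors
doubled = [2 * e for e in errors]   # the same errors, twice as large

mse_ratio = mse(doubled) / mse(errors)
rmse_ratio = math.sqrt(mse(doubled)) / math.sqrt(mse(errors))
print(mse_ratio)   # 4.0
print(rmse_ratio)  # 2.0
```

Doubling the errors doubles RMSE but quadruples MSE, which is why RMSE can be read on the scale of the data and MSE cannot.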

The value of R² ranges between 0 and 1. The smaller the value of R², the weaker the effect of
the independent variable (x) on the dependent variable (y). Conversely, the closer R² is to 1,
the stronger the effect.

 
4. Conclusion 

For the evaluation of forecasting models, MAE is more intuitive because it gives the average
error over all data, whereas MSE is very sensitive to outliers: because errors are squared, an
outlier error receives a very large weight and inflates the MSE. RMSE is a more intuitive
alternative to MSE because it is on the same measurement scale as the data being evaluated. The
fundamental weakness of R² is its bias toward the number of independent variables: each added
independent variable increases R², or at least never decreases it, regardless of whether that
variable affects the dependent variable. It is therefore recommended to use the "adjusted R²"
value when evaluating the model.
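The adjustment can be sketched with the standard formula, adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1). The inputs are assumptions made for illustration: R² = 0.921 is the cross-validation result, n = 115 is the number of rows, and p = 3 is the number of attributes; the study itself does not report an adjusted value.

```python
def adjusted_r_squared(r2, n_samples, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n_samples - n_predictors - 1 <= 0:
        raise ValueError("need n_samples > n_predictors + 1")
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


# Assumed inputs: R^2 = 0.921, n = 115 rows, p = 3 attributes.
adjusted = adjusted_r_squared(0.921, 115, 3)
print(round(adjusted, 3))  # 0.919
```

With few predictors and 115 rows the penalty is small, but it grows as more independent variables are added.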

The Random Forest forecasts of new students identified the five study programs with the highest
and the five with the lowest new student admissions. UPGRIS will therefore devise a new
strategy for the five lowest study programs so that the desired number of new students is
achieved.
 
References 

[1] A. Purba, “Perancangan Aplikasi Peramalan Jumlah Calon Mahasiswa Baru yang 
mendaftar menggunakan Metode Single Exponential Smoothing (Studi Kasus: Fakultas 
Agama Islam UISU),” Jurnal Riset Komputer, vol. 2, no. 6, pp. 8–12, 2015. 

[2] M. Irfan, L. P. Ayuningtias, and J. Jumadi, “Analisa Perbandingan Logic Fuzzy Metode 
Tsukamoto, Sugeno, Dan Mamdani (Studi Kasus : Prediksi Jumlah Pendaftar Mahasiswa 
Baru Fakultas Sains Dan Teknologi Uin Sunan Gunung Djati Bandung),” Jurnal Teknik 
Informatika, vol. 10, no. 1, pp. 9–16, 2018. 

[3] A. S. Ritonga and S. Atmojo, “Pengembangan Model Jaringan Syaraf Tiruan untuk 
Memprediksi Jumlah Mahasiswa Baru di PTS Surabaya (Studi Kasus Universitas Wijaya 
Putra),” Jurnal Ilmiah Teknologi Informasi Asia, vol. 12, no. 1, p. 15, 2018. 

[4] L. Nurhani, A. Gunaryati, S. Andryana, and I. Fitri, “Jaringan Syaraf Tiruan Dengan Metode
Backpropagation,” in Seminar Nasional Teknologi Informasi dan Multimedia, 2018, pp. 25–30.

[5] S. Karmita, A. Bramanto, O. Gaffar, and A. S. Wiguna, “Prediksi Jumlah Calon Mahasiswa 
Baru Menggunakan Fuzzy Time Series-Time Invariant,” in Prosiding Seminar Ilmu 
Komputer dan Teknologi Informasi, 2018, vol. 3, no. 1, pp. 208–214. 

[6] T. Kijewski et al., “Random forest assessment of correlation between environmental factors 
and genetic differentiation of populations: Case of marine mussels Mytilus,” Oceanologia, 
vol. 61, no. 1, pp. 131–142, 2019. 

[7] M. Z. Alam, M. S. Rahman, and M. S. Rahman, “A Random Forest based predictor for 
medical data classification using feature ranking,” Informatics in Medicine Unlocked, vol. 
15, no. January, pp. 1–12, 2019. 

[8] X. Chen and H. Ishwaran, “Random forests for genomic data analysis,” Genomics, vol. 99, 
no. 6, pp. 323–329, 2012. 

[9] Z. Chen, N. He, Y. Huang, W. T. Qin, X. Liu, and L. Li, “Integration of A Deep Learning
Classifier with A Random Forest Approach for Predicting Malonylation Sites,” Genomics,
Proteomics & Bioinformatics, vol. 16, no. 6, pp. 451–459, 2018.


 

[10] A. Patri and Y. Patnaik, “Random forest and stochastic gradient tree boosting based 
approach for the prediction of airfoil self-noise,” in International Conference on Information 
and Communication Technologies (ICICT 2014), 2015, vol. 46, pp. 109–121. 

[11] Rubal and D. Kumar, “Evolving Differential evolution method with random forest for 
prediction of Air Pollution,” in International Conference on Computational Intelligence and 
Data Science (ICCIDS 2018), 2018, vol. 132, pp. 824–833. 

[12] C. Kathuria, D. Mehrotra, and N. K. Misra, “Predicting the protein structure using random 
forest approach,” in International Conference on Computational Intelligence and Data 
Science (ICCIDS 2018), 2018, vol. 132, pp. 1654–1662. 

[13] L. Chen, Y. Zhang, Q. Zhao, G. Geng, and Z. Yan, “Detection of DNS DDoS Attacks with 
Random Forest Algorithm on Spark,” in The 2nd International Workshop on Big Data and 
Networks Technologies (BDNT 2018), 2018, vol. 134, pp. 310–315. 

[14] Z. Abdelali, H. Mustapha, and N. Abdelwahed, “Investigating the use of random forest in 
software effort estimation,” International Conference on Intelligent Computing in Data 
Science, vol. 148, no. 2, pp. 343–352, 2018. 

[15] A. V. Lebedev et al., “Random Forest ensembles for detection and prediction of 
Alzheimer’s disease with a good between-cohort robustness,” NeuroImage: Clinical, vol. 6, 
pp. 115–125, 2014. 

[16] A. Primajaya et al., “Random Forest Algorithm for Prediction of Precipitation,” Indonesian 
Journal of Artificial Intelligence and Data Mining, vol. 1, no. 1, pp. 27–31, 2018. 

[17] V. Y. Kulkarni and P. K. Sinha, “Effective Learning and Classification using Random Forest 
Algorithm,” International Journal of Engineering and Innovative Technology, vol. 3, no. 11, 
pp. 267–273, 2014. 

[18] K. Hastuti, “Analisis Komparasi Algoritma Klasifikasi Data Mining Untuk Prediksi 
Mahasiswa Non Aktif,” Seminar Nasional Teknologi Informasi & Komunikasi Terapan 2012, 
vol. 14, no. 1, pp. 241–249, 2012. 

[19] M. Zhu, “Kernels and Ensembles,” Journal The American Statistician, vol. 62, no. 2, pp. 
97–109, 2008. 

[20] L. Breiman, Random Forest, Second Edition, California: Statistics Department University of 
California Berkeley, 2001.