International Journal of Interactive Mobile Technologies (iJIM) – eISSN: 1865-7923 – Vol. 15, No. 08, 2021


Short Paper—Predictive Analytic on Human Resource Department Data Based on Uncertain Numeric… 

Predictive Analytic on Human Resource Department 

Data Based on Uncertain Numeric Features Classification 

https://doi.org/10.3991/ijim.v15i08.20907 

Asrul Huda (), Noper Ardi 
Universitas Negeri Padang, Padang, Indonesia 

asrulhuda@gmail.com 

Abstract—Business Intelligence is very popular and useful for a better un-

derstanding of business progress these days, and there are many different meth-

ods or tools being used in Business Intelligence. It uses combination of artificial 

intelligence, data mining, math, and statistic to gain better understanding and 

insight on the business process performance. As employees have an important 

role in business process, the desire to have a tool for classifying and predicting 

their wages are desirable. In this research, we tried to analyzed dataset from 

Human Resource Department, and this dataset can be used to analyst the data in 

order to draw a conclusion about whether any employees would prematurely 

leave the company, and then, a preventive action based on those parameters can 

be proposed. This is a kind of predictive analytic system which bases on Naïve 

Bayes, and it can predict whether an employee would leave or stay according to 

his or her characteristics. But the Naïve Bayes itself does not enough. So we 

develop a way to solve the problem using uncertain Numeric features classifica-

tion on it. The accuracy of the result is depended on the amount and effective-

ness of the training sets. 

Keywords—Predictive Analytic; Naïve Bayes; Business Intelligence; Human 

Resource 

1 Introduction 

Human analytical is an application of math, statistics and data modeling related to 

employees in the company to view and predict future employee performance based on 

the data. This analysis is commonly known as HR (Human Resource) Analytic. Im-

proving business performance is the main goal of HR strategy, and HR analytic is 

used to make a better decision about it. 

Employees play a substantial role in the business process. The longer employees 

work in a company, the greater their value for the company. The value means in this 

context is such as their familiarity of the company, work experience, tool used in the 

company, even a project that they work in it before. So, keeping them comfortably 

staying in the company is essential[1][2]. Occasionally, they leave the company with-

out any notification. This can be happening due to many factors such as salary, over 

workload, promotion, or even better opportunity from other company. If this situation 

172 http://www.i-jim.org

https://doi.org/10.3991/ijim.v15i08.20907
https://doi.org/10.3991/ijim.v15i08.20907
mailto:asrulhuda@gmail.


Short Paper—Predictive Analytic on Human Resource Department Data Based on Uncertain Numeric… 

is not taken seriously, it will be detrimental to the company. When they walk out the 

door, their substantial value also gone with them. Not surprisingly then, the tool for 

predicting this kind of situation is substantial. In an organization, the employee’s data 

stored in HR which can be used to make a predictive modelling as a preventive action. 

The ultimate goal of the model is to predict and significantly reduce employee turno-

ver[3]. 

There are several conditions that can be used as a reference of Human Resource 

Department (HRD) to predict their employees will be leave of the company. From 

these predictions, the HRD can do preventive action for the most valuable company's 

assets are not out of the company. 

2 Descriptive Analytic 

Usually, the first activity that can be used for analytics is descriptive analytics. This 

is the starting point of further more complex analytics like predictive and prescriptive. 

The preliminary stage of data processing is descriptive analytic. it generates a sum-

mary of historical data to produce useful information. 

Descriptive analytics is data simplification[4]. Descriptive analytic is a field of sta-

tistics focusing on gathering and summarizing raw data to be easily interpreted. It can 

be used to analyze the past event based on the available data and get insight about 

how to deal with the future.  

In contrast to predictive models which focus on predicting a single customer be-

havior, descriptive models can identify many different relationships between custom-

ers or products. Essentially, in descriptive analytic, seeking answer about what hap-

pened can be done easily without performing complex analysis like in diagnostic and 

predictive models.[5] Descriptive analytics can be used to analyzed the reasons behind 

past failure or success by mining its historical data and studies its performance. There 

are many management reporting data available, such as, marketing, operation, finance, 

and sales that uses this type of analysis. If these available data used efficiently, it will 

improve business performance. Classifying or prospecting costumer into groups can 

also be done by quantifying relationship in data using descriptive models.[6]  

The using of descriptive models is very broad these days, for example, to classify 

costumers by their specific references, such as, life stage, marital, or product refer-

ences. For further development, descriptive modeling tools can be used to make pre-

diction and simulate large number of individualized agents.[3] 

3 Predictive Analytic 

Predictive analytics uses historical data, statistical algorithm and Machine learning 

to predict future events. Predictive analytics turns data into actionable and valuable 

information. Predictive model is build based on historical data using mathematical 

model to capture its significant trends. Predictive analytics uses this trends to set on 

the probable future result of a prospect or an event of a currently happening 

situation.[7] 

iJIM ‒ Vol. 15, No. 08, 2021 173



Short Paper—Predictive Analytic on Human Resource Department Data Based on Uncertain Numeric… 

3.1 Naïve bayes 

Naive Bayes is a collection of classification algorithms which are based on Bayes 

Theorem[8]. Probabilistic classifier is used in Naïve Bayes classifier. Statistical clas-

sifiers that used in Bayesian classifier can be used to predict the probability of a class 

membership, such as, probability that a given sample is a property of a specific class. 

Naive Bayes is also known for another name, Simple Bayes or independence Bayes. 
In machine learning, a simple probabilistic classifier is used for Naive Bayes clas-

sifiers. Supervised learning method as well as a statistical method for classification is 

represented by Naive Bayes Classification. The model is a probabilistic model that 

permit to capture uncertainty about the model in a contentious way by determining 

probabilities[9]. 
Bayesian classification provides useful combination of observed data, past 

knowledge and learning algorithms[10]. The classification also provides a useful 

perspective for better understanding and also evaluating many learning algorithms. 

For hypothesis, this will helps to determine exact probabilities and also it is robust to 

noise in input data[11]. 

 𝜌(𝐴|𝐵) =
𝜌(𝐵|𝐴).𝜌(𝐴)

𝜌(𝐵)
 (1) 

𝜌(𝐴|𝐵) is conditional probability of an event A occurred given the event B is true, 

𝜌(𝐵|𝐴) is probability of the event B occurred given the event A is true and 𝜌(𝐴) and 
𝜌(𝐵) is probabilities of event A and B happened respectively.[12] 

4 Experimental Design 

There are several stage will be conducted in this research. The Stage are mainly 

consisting of two main stages, data preparation stage and testing data stage. The rest 

of the stage can be seen in Figure 1. 

174 http://www.i-jim.org



Short Paper—Predictive Analytic on Human Resource Department Data Based on Uncertain Numeric… 

 

Fig. 1. Research Flowchart 

The initial stage starts from data collection phase. The data used in this research are 

collected from open data at Kaggle.com. The raw data acquired will be processed in 

advance at data preparation stage. In This stage, the raw data format will be adjusted 

to a readable format for java programming. The results of this stage will be used as 

input data in the model. 

In the training and testing stage, the random data will be choosing for Testing and 

the rest of the data is used for training. Training data is used to form the initial classi-

fication model which will then be tested with test data[13][14]. 

5 Experimental Analyst 

5.1 Dataset 

On this project, we use “Human Resource Analytic” dataset from 

https://www.kaggle.com. This dataset is simulated by a company who studied about 

“Why their best and most experienced employees are leaving prematurely”. With this 

database, we can decide what the most valuable parameters are, and then we will be 

able to predict which valuable employees will possibly leave the company next. 

There are ten parameters in this dataset: 

iJIM ‒ Vol. 15, No. 08, 2021 175

https://www.kaggle.com/ludobenistant/hr-analytics


Short Paper—Predictive Analytic on Human Resource Department Data Based on Uncertain Numeric… 

1. Satisfaction level, indicates the satisfaction level of the employee. 

2. Last evaluation, indicates the data about achievement from the last evaluation. 

3. Number of projects, indicates the number of projects taken by an employee that 

assign by company. 

4. Average monthly hours, indicates the average monthly hours of work provided 

by the company. 

5. Time spent at the company, the total time the worker works at the company. 

6. Work accident, indicates whether the employee has an accident. The value for 

this parameter is binary (1 = yes, 0 = no). 

7. Promotion last 5 years, indicates whether the employee has a promotion in last 

5 years. The value for this parameter is binary (1 = yes, 0 = no). 

8. Departments (column sales), indicates the department where the employee 

work. The value for this parameter is categorical which is consist of sales, tech-

nical, marketing, support, management, accounting, product manager, account-

ing, and HR. 

9. Salary, indicates The Employee salary level based on their works. The value for 

this parameter is categorical which is consist of high, medium, and low. 

10. Left, indicates whether the employee has left the company. The value for this 

parameter is binary (1 = yes, 0 = no) 

 

Fig. 2. The original dataset 

5.2 Prediction and implementation 

The main goal is to predict whether an employee would leave or stay. From the da-

taset and the main goal, it clearly shows that there are two classes which are “leave” 

and “stay”, and each class has nine features which are the other parameters in the 

dataset. 
To implement the system, we created a simple project using Java programing lan-

guage with Jetbrain IntellijIDEA IDE. And there is no specific structure of the project 

packages. However, in order to make it easy to manage and understand, we created 

two different packages which one of them is for the training set or data, and another is 

for the classes of Java.  
We set up the training sets by creating a file named “TrainingSets.txt” which is ba-

sically the dataset in text format, and inside, there are many lines of dataset in which 

each line consists nine features and the defined class “leave” or “stay” at the end of 

176 http://www.i-jim.org



Short Paper—Predictive Analytic on Human Resource Department Data Based on Uncertain Numeric… 

the line like in the fig. 2. below. Be noted that the column of G (left) in fig. 1. will be 

shifted to the last order and all values of 0 are changed to “stay” and 1 to “leave”.  

 

Fig. 3. The training set of the dataset 

Naive Bayes classifier assumes that all the features are unrelated to each other. The 

presence or absence of a feature does not influence the presence or absence of any 

other feature, and this is the reason why Naive Bayes works well with any type of 

training set with specific classes.[12] However, it still could lead to inaccuracy with 

any feature which have uncertain value representing as various numbers, or in the 

other words, the uncertain numeric features need to be classified and grouped in order 

to provide high accuracy. 
There are five uncertain numeric features, and in order to fix the issue above, we 

find the minimum and maximum of each of numeric feature, and then classify into 

four different levels or quarters of its rank. Those five uncertain numeric features are 

classified as satisfaction_level, last_evaluation, number_profect, average_monthly, 

time_spent. The complete data for those five uncertain numeric features is shown in 

the table 1 below. 

Table 1.  Uncertain numeric features classification 

Feature Min Max Q1 Q2 Q3 Q4 

Satisfication_level 0 1 0-0.24 0.25-0.49 0.5-0.74 0.75-1 

Last_evaluation 0 1 0-0.24 0.25-0.49 0.5-0.74 0.75-1 

Number_projecr 2 7 0-1 2-4 5-7 8-10 

Average_monthly 96 310 75-149 150-224 225-299 300-375 

Time_spent 2 10 0-1 2-4 5-7 8-10 

 

Last but not least, our training set (Fig. 2) does not contain any note or mark to tell 

which feature each value belongs to; therefore, the system needs to differentiate the 

values between features by indexing the orders of the values of each line for example, 

“0.91 0.68 3 218 3 1 0 accounting medium” will be marked as “0-q4 1-q3 2-q2 3-q2 

4-q2 4-y 6-n accounting medium”. 

Note:  

iJIM ‒ Vol. 15, No. 08, 2021 177



Short Paper—Predictive Analytic on Human Resource Department Data Based on Uncertain Numeric… 

1. For features “Whether they have had a work accident” and “Whether they have 

had a promotion in the last 5 years”, their values are marked as “y” if they 

equal to “1”, and as “n” if they equal to “0”. 

2. If any value is unknown or not is in any group of the classify, it will be marked 

as “u”. 

6 Result and Discussion 

The system designed in this research based of java programming. The implemented 

system does not have any user interface. It requires command line Windows to test 

the data. Input data test can be put as the argument after the jar file, and the result of 

prediction will be show after.  

1. Test 1: Test whether the program is working well. Input: “0.91 0.68 3 218 3 1 0 

accounting medium” from Class “stay” line 3615. The data will be processed in 

the model and the result is correct as shown as below. 

 

Fig. 4. Test 1 

2. Test 2: Test an input which does not exist in the training set but closely similar 

to line 3178 from class “stay”. Input: “0.55 0.85 6 210 4 0 0 support medium” 

The result is expected to be “STAY”, and it does come as expected. The result 

is shown below. 

178 http://www.i-jim.org



Short Paper—Predictive Analytic on Human Resource Department Data Based on Uncertain Numeric… 

 

Fig. 5. Test 2 

3. Test 3: Test an incorrect or incomplete input. Input: “0.85 6 210 4 0 0 support 

medium”. The system will check the input and give some advices for the input. 

The result is shown below. 

 

Fig. 6. Test 3 with incorrect input 

The system asked to put add “-1” for any unknown features. So the input should be 

corrected as “-1 0.85 6 210 4 0 0 support medium”. And the result is shown as in the 

fig. 6. 

iJIM ‒ Vol. 15, No. 08, 2021 179



Short Paper—Predictive Analytic on Human Resource Department Data Based on Uncertain Numeric… 

 

Fig. 7. Test 3 after corrected the input 

7 Conclusion 

The accuracy of the result is depended on the amount and effectiveness of the 

training sets. Naive Bayes classifier assumes that all the features are unrelated to each 

other. The presence or absence of a feature does not affect the other features, and this 

is the reason why Naive Bayes works very well with any type of training set with 

specific classes. However, it still could lead to inaccuracy with any features which 

have uncertain value representing as various numeric numbers, or in the other words, 

the uncertain numeric features need to be classified and grouped in order to provide 

high accuracy.  

8 References 

[1] D. P. R.Shiva, “Customer behavior analysis using Naive Bayes with bagging homogene-

ous feature selection approach,” J. Ambient Intell. Humaniz. Comput., no. Published on 

Springer, 2020. https://doi.org/10.1007/s12652-020-01961-9 

[2] M. V. Sebt, E. Komijani, and S. S. Ghasemi, “Implementing a Data Mining Solution Ap-

proach to Identify the Valuable Customers for Facilitating Electronic Banking,”. Interna-

tional Journal of Interactive Mobile Technologies (iJIM). vol. 14, no. 15, 2020, pp. 157–

174. https://doi.org/10.3991/ijim.v14i15.16127 

[3] Z. Jaffar, “Predictive Human Resource Analytics Using Data mining Classification Tech-

niques,” Int. J. Comput., vol. 32, no. 1, pp. 9–20, 2019.  

[4] S. Loeb, S. Dynarski, D. McFarland, P. Morris, S. Reardon, and S. Reber, “Descriptive 

analysis in education: A guide for researchers,” U.S. Dep. Educ. Inst. Educ. Sci. Natl. 

Cent. Educ. Eval. Reg. Assist., no. March, pp. 1–40, 2017. 

180 http://www.i-jim.org

https://doi.org/10.1007/s12652-020-01961-9
https://doi.org/10.3991/ijim.v14i15.16127


Short Paper—Predictive Analytic on Human Resource Department Data Based on Uncertain Numeric… 

[5] N. Ardi and Isnayanti, “Structural Equation Modelling-Partial Least Square to Determine 

the Correlation of Factors Affecting Poverty in Indonesian Provinces,” IOP Conf. Ser. Ma-

ter. Sci. Eng., vol. 846, no. 1, 2020. https://doi.org/10.1088/1757-899x/846/1/012054 

[6] J. Strickland, Predictive Analytics using R. Lulu, 2015. 

[7] A. Fuentes, Hands-On Predictive Analytics with Python: Master the complete predictive 

analytics process, from problem definition to model deployment. Packt, 2018. 

[8] A. M. Rahat, A. Kahir, and A. K. M. Masum, “Comparison of Naive Bayes and SVM Al-

gorithm based on Sentiment Analysis Using Review Dataset,” pp. 266–270, 2020. 

https://doi.org/10.1109/smart46866.2019.9117512 

[9] S. P. Huma Parveen, “Sentiment Analysis on Twitter Data-set using Naive Bayes Algo-

rithm,” 2nd Int. Conf. Appl. Theor. Comput. Commun. Technol., 2017. 

[10] I. M. Obeidat, N. Hamadneh, M. Alkasassbeh, M. Almseidin, and M. I. Alzubi, “Intensive 

Pre-Processing of KDD Cup 99 for Network Intrusion Classification Using Machine 

Learning Techniques,”. International Journal of Interactive Mobile Technologies (iJIM). 

Vol 13. No.1, 2019. https://doi.org/10.3991/ijim.v13i01.9679 

[11] S. P. Sujata Butte, “Big Data and Predictive Analytics Methods for Modeling and Analysis 

of Semiconductor Manufacturing Processes,” Microelectron. Electron Devices, no. IEEE 

Workshop on, 2016. https://doi.org/10.1109/wmed.2016.7458273 

[12] S. Kaliya Meiyyar, “The Comparative Study for Diagnosing Heart Disease Using KKN 

and Naïve Bayes,” Int. J. Adv. Res. Comput. Sci. Manag. Stud., vol. 3, no. 8, 2015.  

[13] N. Ardi, N.A Seatiawan and T.B Adji “Analytical Incremental Learning for Power Trans-

former Incipient Fault Diagnosis Based on Dissolved Gas Analysis,” pp. 3–6, 2020. 

https://doi.org/10.1109/icst47872.2019.9166441 

[14] A. Huda, N. Azhar, K. Anshari, and S. Hartanto, “Practicality and Effectiveness Test of 

Graphic Design Learning Media Based on Android,”. International Journal of Interactive 

Mobile Technologies (iJIM). Vol.14. No.4, 2020 pp. 192–203. 

https://doi.org/10.3991/ijim.v14i04.12737 

9 Authors 

Asrul Huda works in Universitas Negeri Padang in Indonesia. Email: asrulhu-

da@gmail.com 

Noper Ardi works for Universitas Negeri Padang in Indonesia. Email: nop-

er.ardi@gmail.com 

Article submitted 2021-01-03. Resubmitted 2021-02-26. Final acceptance 2021-02-26. Final version 
published as submitted by the authors.  

iJIM ‒ Vol. 15, No. 08, 2021 181

https://doi.org/10.1088/1757-899x/846/1/012054
https://doi.org/10.1109/smart46866.2019.9117512
https://doi.org/10.3991/ijim.v13i01.9679
https://doi.org/10.1109/wmed.2016.7458273
https://doi.org/10.1109/icst47872.2019.9166441
https://doi.org/10.3991/ijim.v14i04.12737
mailto:asrulhuda@gmail.com
mailto:asrulhuda@gmail.com
file:///D:/I%20A%20O%20E%202021/R%20View/iJIM/iJIM%2008/Divya/noper.ardi@gmail.com
file:///D:/I%20A%20O%20E%202021/R%20View/iJIM/iJIM%2008/Divya/noper.ardi@gmail.com