Paper Title (use style: paper title)


P-ISSN : 2715-2448 | E-ISSN : 2715-7199   

Vol.1 No.2, 10 July 2020 

Buana Information Tchnology and Computer Sciences (BIT and CS) 

 
27 | Vol.1 No.2, 10 July 2020 
Buana Information Tchnology and Computer Sciences (BIT and CS) 

Implementation of K-Nearest Neighbor Algorithm for Customer Satisfaction 

 
Sutan Faisal 1  
Study Program 

Technical Information 
Faculty of Engineering Computer 

Science, University Buana Perjuangan 
Karawang 

sutan.faisal@ubpkarawang.ac.id

 
 ‹β› 

Nurhayati 2 
Study Program 

Technical Information 
Faculty of Engineering Computer 

Science, University Buana Perjuangan 
Karawang 

nurhayati@ubpkarawang.ac.id 

Abstract—Customer satisfaction is the company's goal in 

providing services to its customers. Sewa Camera Cikarang is 

committed to customer satisfaction. By using the K Nearest 

Neighbor (KNN) algorithm of this study to analyze customer 

satisfaction of camera tenants. In this study price, facilities, 

services and loyalty are input attributes of customer satisfaction. 

Satisfied and dissatisfied is the result of the output. Increasing 

customer satisfaction and increasing profits on Cikarang Camera 

Rentals is the aim of this research. This study using the KNN 

algorithm obtained accuracy = 98%, recall classification = 

86.67%, classification accuracy = 100% and AUC = 0.750. It is 

expected that the results of this study can be used as a reference 

for building applications that can facilitate companies in 

obtaining information about customer satisfaction. 

Keywords—Datamining, classification, KNN algorithm, 
customer satisfaction. 

 
Abstrak—Customer Kepuasan pelanggan merupakan tujuan 

perusahaan dalam memberikan layanan kepada pelanggannya. 

Sewa Kamera Cikarang  berkomitmen untuk kepuasan 

pelanggan. Dengan menggunakan algoritma K-Nearest Neighbor 

(KNN) penelitian ini untuk menganalisa kepuasan pelanggan 

penyewa kamera. Dalam penelitian ini harga, fasilitas, layanan 

dan loyalitas merupakan atribut masukan kepuasan pelanngan . 

Puas dan tidak puas merupakan hasil outputnya. Meningkatkan 

kepuasan pelanggan dan meningkatkan laba pada Sewa Kamera 

Cikarang adalah tujuan penelitiian ini. Penelitian ini  dengan 

menggunakan algoritma KNN mendapatkan akurasi = 98%, 

klasifikasi recall = 86,67%, ketepatan klasifikasi = 100% dan AUC 

= 0,750. Diharapkan hasil penelitian ini dapat dijadikan acuan 

untuk membangun aplikasi yang dapat memudahkan perusahaan 

dalam memperoleh informasi tentang kepuasan pelanggan. 

 
Kata kunci— Pengumpulan data, klasifikasi, algoritma KNN, 

kepuasan pelanggan. 
 

I. INTRODUCTION  

A.  Introduction 

Along with the high level of human activity to meet the 
needs and needs of daily life, humans need to release their 

fatigue with a vacation. Then it needs to be supported with a 

camera to capture the moment of his vacation. But not 

everyone has a camera that is good enough to capture the 

holidays. 

 
Public awareness of the elements of service that can be 
provided by companies is increasing due to advances in 

education and a more prosperous economy, as well as the 

development of science and technology. The importance of 

service quality provided by service companies and in the 

form of goods is increasingly being realized by consumers. 

Each consumer's assessment of the quality of services / 

services varies depending on how consumers expect the 

quality of the service / service based on experience [1]. 

 
Achieving success in a service business, customer 

satisfaction must be the basis of management decisions, so 
management must make increasing customer satisfaction a 

fundamental goal. In order to provide quality services, the 

company must continually improve the quality of its human 

resources and the equipment it leases. This step is important 

to improve services from time to time. 

 
People who judge whether or not the quality of service is 

called a consumer. By comparing the services they receive 

with the services they expect consumers can judge the 

service. Consumers who are satisfied with the services 

provided by a company will make these consumers come 

back again to use the company's services again. Companies 
that have loyal customers because the company can satisfy 

their customers. Word of mouth promotion without coercion 

regarding the services it has received will be carried out by 

loyal consumers [4]. 

 
Tight competition must be faced by companies in the 

increasingly rapid development of the business world. The 

customers he has by the company are expected to be 

maintained forever. To realize this, it is not something that is 

easily climatic, as business competition is very tight at the 

moment considering that there are rapid changes that can 
occur at any time such as changes in customers, competitors 

or changes in broad conditions that are always dynamic. This 

requires policy makers to develop a strategy that is able to 

achieve sales growth targets, increase the company's market 

share, and achieve capabilities as the basis for sustainable 

growth. [1]. 

 
The tight competition must be faced by the company in 

the rapid development of the business world. In general, there 

are many ways to maintain customers forever, in a very tight 

mailto:sutan.faisal@ubpkarawang.ac.id
mailto:sutan.faisal@ubpkarawang


28 | Vol.1 No.2, 10 July 2020 
Buana Information Tchnology and Computer Sciences (BIT and CS) 

business competition it is very difficult to realize it given the 

many changes that can occur at any time. Such as changes in 

customers. Competitors and changes in broad conditions that 

always change dynamically. This makes policy makers to 

continue to develop a strategy that can achieve the goals of 

rental growth, increase market share, and the achievement of 
capabilities as a basis for sustainable growth [16]. 

 
B. Definition of Data Minning 

Data mining is data mining that has long been taken from 

several series of activities when viewed from the point of 

view, according to [5]. Data mining is an integrated data 
analysis process that consists of a series of actions based on 

the definition of the objectives to be analyzed, with data 

analysis and interpretation of the results. 

In recent years data mining has attracted the attention of the 

public and the world of information systems, because useful 

information in the form of knowledge generated from large 

data is needed. Applications ranging from market analysis, 

fraud detection, and customer retention, to production control 

and exploration science are generated from information and 

knowledge. [7].  

According to [5], Data mining has the following stages of 

the process: 

1.  Defining goals for analysis 

The clearest statement of the problem and the achieved 

goals are the most important in the correct formulation of 
the analysis. Determining the method to be used is one of 

the most difficult parts of the process. There must be no 

room for doubt or uncertainty and clear goals must. 
2.  Selection, organization, and preliminary treatment data 

The collection or selection of data needed to be analyzed 

is done after the objectives are analyzed and identified. 

The ideal source of data is theata's backup company, a 

"storage room" of historical data that is no longer used. If 

there is no data storage, the data market can be created by 

matching different corporate data sources. 

3.  Exploration of data analysis and transforming it  

At this stage involves an initial exploration analysis of 

data, which is very similar to thetechnique Online 

Analytical Process (OLAP). Transformation of the 
original variables to better understand the phenomena or 

statistical methods used are carried out at this stage. To 

highlight anomalous data, different data from other data 

is used in the analysis of exploration. 

4.  Specifications of statistical methods  

Statistical methods can be used, as well as many available 

algorithms, so it is possible to classify already available 

methods. The choice of method used to prepare the 

analysis depends on the problem being studied or the type 

of data available. Different methods are edited into two 

main classes according to different stages of data analysis, 
in particular: 

a. Descriptive Method  
To describe groups of data in a concise manner is the 

main goal of the method. There is no descriptive 

hypothesis between the available variables. Included 

in this group are the association method, log-

linearmodel, graphical model). 

b. Prediction Method 

The purpose of this class method is to describe one or 

more variables that are performed by finding 

classification or prediction rules based on the data. 

These rules help to predict or classify one or more 

answers or future variables of the target variables in 

relation to what is happening with the explanatory or 
input variable. Included in this method are neural 

networks, decision trees, and linear and logistic 

regression models. 

5. Data analysis based on the method chosen, which will then 
be applied to the statistical method to be used then 

translating into the appropriate algorithm to get the 

required results based on available data.  

6.  Evaluation of the methods used and Comparison for the 

analysis of the final model selection 

7. Commentary on the selected model and its use in the 

decision-making process. 

 
C. Clasification and Prediction 

Classification and prediction is a method that can make 

smart decisions. Researchers have now proposed a number of 

classifications and forecasting methods for machine learning, 
pattern recognition, statistical research. In this study, we focus 

on classifying methods in data mining as part of the machine 

learning process. The form of data analysis that can be used 

to extract models to predict future trends in data to be 

predicted is the classification and prediction of data mining. 

The classification process is divided into two stages, first the 

learning process in which the classification algorithm is used 

to analyze training data. is, the results of the presentation of 

the learning model or classifier in the form of classification 

rules, the two phases of the classification process, estimating 

the accuracy of the classification model or classifier from the 
test data. If the accuracy is accepted, the model is applied to 

find out the predicted results of new data. Bayesian methods, 

Bayesian networks, algorithm-based rules, neural networks, 

vector machine support, mining rules associations, k-nearest 

neighbors, case-based reasoning, genetic algorithms, rough 

sets and fuzzy logic are the classification techniques used. 

Focusing the Nearest Neighbor (KNN) K algorithm in this 

study. 

D. Data Minning Methods 

The idea of people already having knowledge in the 
process of classifying management has already been widely 

used. But talking about taxonomy (Tassein = classify + nomos 

= science, law) its use as a science of grouping living 

organisms (alpha taxonomy) at first ,has since become a 
general science group, including the principle of 

classification (taxonomic schemes). Thus, classification 

(taxonomy) processes the placement of an object (concept) 

based on a number of categories, each object (concept) based 

on ownership. [6].  

Four basic components for the classification process:  

1. Class: The dependent variable of the model is the 

categorical variable to represent the 'label' that uses the 

object after its classification. Examples of lessons are: 

heart attack, customer loyalty, stellar lesson (galaxy), 

earthquake lesson (storm), etc. 


29 | Vol.1 No.2, 10 July 2020 
Buana Information Tchnology and Computer Sciences (BIT and CS) 

2. Predictors: Classification of data and based on the 

classification made from the model represented by the 

characteristics (attributes) which are independent 

variables. Examples of such predictors are: smoking, drug 

consumption, blood sugar, sales frequency, sex status, 

satellite images, geological record, and wind speed 
direction, season, etc. 

3. Training dataset The data used for the 'training' model to 
recognize according to class, based on predictions 

available from the two two component data values before. 
4. Testing the dataset: contains new data classifications based 

on the Model built on, and classifications that are accurate 

(model performance) so they can be evaluated [6].  

a. There are no other attributions in the separate post 

b. There are no records inbranch an empty  

 
E,  K Neighrest Neighbor 

K-Nearest Neighbor (kNN) is included in the instance-

based learning group. This algorithm is also one of 

thetechniques lazy learning. KNN searches the k group of 

objects in the training data that is closest (similar) to the 

object in new data or testing data (similar) to the object in 

new data ordata testing [15]. Case in point, for example it is desirable to find 

a solution to the problem of a new patient by using a solution 

from an old patient. To find solutions from new patients, 

closeness to old patient cases is used, solutions from old cases 

that have closeness to new cases are used as a solution. There 
were new patients and 4 old patients, namely P, Q, R, and S 

(Figure 2). . When there is a new patient, the solution is taken 

from the case of the elderly patient who has the greatest 

kinship. 

 
          Fig. 1 Ilustrasi KNN 

For example, D1 distance between new patients and 

patient P, D2 distance between new patients and sick Q, D3 

distance between new patients and sick R, D4 distance 

between new patients and patient S. The picture shows that 
D2 is closest to the new case. Thus, the patient Q solution will 

be used as a solution for the new patient. (Henny Leidiyana, 

2013) Euclidean distance and manhattan distance (city block 
distance) are ways to measure the proximity between new 

data and old data (training data), the most commonly used is 

euclidean distance. [2], namely:  

 
Where  a = a1, a2, ..., an,  and b = b1, b2, ..., bn  represents 

the n attribute values of the two records.  

For attributes with category values, measurements with 

euclidean distance do not match. Instead, the following 

functions are used [10]:  

Different (a, b) {0 if ai = bi  

                                    = 1 besides  

 
where ai and bi are the category values. If the attribute value 
between the two records being compared is the same, the 

distance value is 0, the meaning is similar, on the contrary, if 

it is different then the value of proximity is 1, it means it is 

not similar at all. For example the color attribute with red and 

red values, the value of proximity is 0, if red and blue then 

the value of proximity 1. Normalization is done if measuring 

the distance from attributes that have large values, such as 

income attributes. Normalization can be done with min-max 

normalization or Z-score standardization [10]. If thedata 

training consists of a mixture of numerical and category 

attributes, the use of min-max normalization is preferred [10]. 

To calculate the similarity of cases, a formula is used [9]:  
 

Note:  

P = New cases  

q = Cases in storage  

n = Number of attributes in each case 

i = Individual attributes between 1 to n 

f = Function similarity attribute i between cases p and case q 

w = Weight given to i attribute  

 
E. Evaluation and Validation of Data Mining Prediction 
Methods 

In this study Cross Validation, Confusion Matrix, and 

ROC (curvesReceiver Operating Haracteristic) curve 

methods are used for evaluation and validation.  
1. Cross Validation  

To predict the error rate standard testing is done. In 

getting the overall error rate, the training data is 

randomly divided into several parts with the same 

comparison then the error rate is calculated section by 

section, then calculate the average for all error rates 
 

2. Confusion Matrix  

Table 2.1 is the method used, one class is considered 

positive and the other negative, if the dataset consists of 

only two classes. Percentage of accuracy of data records 

that are classified correctly after testing the classification 
results is the result of evaluation with a confusion matrix 

that has accuracy, precison, and recall.Accuracy values 

[7]. The proportion of positive predicted cases that are 

also true positive on the actual data is called precision or 

confidence. The proportion of true positive cases that is 

correctly predicted correctly is called recall or sensitivity. 

[12]. 

 
Table 1 Model Conflusion Matrix 

 
Correct 

Classification 

Classified as 

+ - 

+ True positives False negatives 

- False positives True negatives  

 
30 | Vol.1 No.2, 10 July 2020 
Buana Information Tchnology and Computer Sciences (BIT and CS) 

True Positive is the number of positive records that are 

classified positively, false positive is the number of negative 

records that are classified positively, false negative is the 

number of positive records classified as negative, true 

negative is the number of negative records classified as 

negative, then enter the test data. To get the amount of 
sensitivity (recall), Specifity, precision, and accuracy enter 

the value of the test data into the confusion matrix. Sensitivity 

is used to compare the number of t_pos to the number of 

positive records, while the comparison of the number of t_neg 

to the number of negative records is used precision. The 

equation below is used to calculate it 7]:  

 
Sensitifity = 𝑡_𝑝𝑜𝑠
𝑝𝑜𝑠

 (3.0) 

         Specifity = 𝑡_𝑛𝑒𝑔
𝑛𝑒𝑔

  
         Precision = 𝑡_𝑝𝑜𝑠
𝑡_𝑝𝑜𝑠+𝑓_𝑝𝑜𝑠

 
         Accuracy  = Sensitivity  pos               + 

                                                (pos+neg) 

                              Specifity    neg 

                                                (pos+neg) 

 
Remarks: 

t_pos = Number of true positives 
t_neg = Number of true negative  

p = Number of record positives  

n = Number of tuples negatives  

f_pos = Number of false positives  

 
3. ROC Curve 

Accuracy and visually comparing classifications can be 

demonstrated by the ROC Curve. Confusion matrix 

specified by the ROC. Two-dimensional graphics with 

horizontal lines as false positives and vertical lines as true 

positive are called ROC (Vercellis, 2009). To measure the 

difference in performance the method used is generated 
from the calculation of the area under curve (AUC). The 

formula used by AUC θr =
1

mn
∑ ∑ ψmi=1

n
j=1 (xt

r, xjr)

  
Where : 𝟁(X,Y) = {

1   𝑌 < 𝑋
1

2
  𝑌 = 𝑋

0  𝑌 > 𝑋

       
      Description:  

      X = positive output  

      Y = negative output  

   
II. METHOD 

In this study using rapidminer studio 9.0 testing tools, 

using the following methodology:  

 
Fig. 2 Methodology Used 

 
A.  Dataset  

Is a collection of data, a database table represented by a 

dataset, or it could be a data matrix where each particular 

variable is represented by a column, the amount of data is 

represented by a row. 

The Retrieve operator loads the RapidMiner object into 

the process used in this research. ExampleSet, but can also be 

a Collection or a Model. Data is retrieved this way as well as 

meta data from the RapidMiner Object. 

B.  Validation  

The operator used to perform simple validation randomly 

divides ExampleSet into a training set to set the test and 

evaluate the model. Split validation to estimate the 

performance of the learning operator (usually in an invisible 

data set) is performed by this operator. In practice, it will be 
shown how accurate a model estimate (learned by certain 

learning operators). 

C. KNN Algorithm  

In this study an experiment was carried out using the 

classification method of decamination tree datamining KNN 

algorithm on customer satisfaction questionnaire data on 
Cikarang Camera Rental. Data will be processed using the 

KNN algorithm and produce a model, then the resulting 

model will be tested Cross Validation which produces 

accuracy, precision, recall and AUC.  

D. Apply Model  

Learning algorithm which is the first model trained on 
ExampleSet by other Operators. After that, this model can be 

applied to another ExampleSet called Apply Model. To get 

predictions on data that are not visible or to transform data by 

applying the preprocessing model is the goal of applying the 

model. 

The model attribute must be compatible with ExampleSet 

where the model is applied. ExampleSet Apply The model 

must have the same number, sequence, type, and role 

attributes as ExampleSet used to generate the model. 

 
E. Performance The   

The operator is used to evaluate the statistics of a 

binomial classification task, ie a classification task whose 


31 | Vol.1 No.2, 10 July 2020 
Buana Information Tchnology and Computer Sciences (BIT and CS) 

label attribute has a binomial type. This operator provides a 

list of performance criterion values from the binomial 

classification task. 

Measure the results of this study using a confusion matrix 

(accuracy, recall classification, classification accuracy) and 

the ROC curve. 
 

III. RESULTS AND DISCUSSION 
 

A. Data Set Analysis Results The data  

Set used is the customer satisfaction questionnaire data set 

for Cikarang Camera Rental, this data set contains data - 

information about customer satisfaction questionnaires 

regarding prices, facilities, services and loyalty. The total data 

in this data set is 100 records, each of which has 10 attributes 

including: 

1. No (integers, roles: id)  

2. Name (polynominals)  
3. Price X1 (integers)  

4. Facility X2 (integers)  

5. Services X3 (integers)  

6. Loyalty X4 (integers)  

7. Results (binomials: satisfied & dissatisfied)  

From the attribute data set above (1 to 7) the training & 

test process will be carried out using the 10 Fold Cross 

Validation method, while the 7th attribute will be the target of 

the results of the classification process. and here we will try 

to analyze the difference between accuracy and error obtained 

by comparing the predicted results and results.  

 
Fig. 3 10 Fold Cross Validation 

 
The questionnaire can be illustrated below: 
 

                          Fig. 4. Questionnaire Form 

 
Data that has been processed using MS Excel 

 
            Fig. 5. Data Questionnaire that has been processed 

 
B. Experiment and Evaluation Results  

In this experiment there are 7 attributes which will be 

trained and 2 values that indicate the target (classification) on 

the 7th attribute, which means the KNN algorithm is 

initialized, 6 input attributes and 1 output attribute.  

The results of this study are:  

1. Confusion Matrix Table  

       
number of True PUAS (TP) is 124 records classified as 

True Positive 124 records and False Negative (FN) of 0 

records . Next 26 records for True Dissatisfaction (TTP) 

are classified as True Positive 23 records and False 

Negative as many as 3 records.  

2.  Pervormance Vector  

No Nama Harga X1 Fasilitas X2 Pelayanan X3 Loyalitas X4 Hasil

1 Anwar 6 4.33 4.00 3.50 PUAS

2 Maulana 6 4.33 4.00 3.50 PUAS

3 Budiman 5 4.33 3.75 3.50 PUAS

4 Geofany 5 4.33 3.75 3.00 PUAS

5 Fiki Ananda 5 4.33 3.75 3.00 PUAS

6 Haryanto 2 3.00 3.00 2.50 TIDAK PUAS

7 Rizky Narezka 5 4.33 4.25 3.75 PUAS

8 Aditia 5 4.00 4.50 3.75 PUAS

9 Agil 5 4.33 3.75 4.00 PUAS

10 Fadilah 3 3.00 3.75 2.75 TIDAK PUAS

11 Purwati 5 4.33 3.50 3.75 PUAS

12 Nurhajjah 5 3.67 3.50 4.25 PUAS

13 Umay 5 4.67 4.00 3.50 PUAS

14 Jesica 3 3.67 3.25 2.75 TIDAK PUAS

15 Krismonga 5 4.00 3.50 3.50 PUAS

16 Marzuki 3 3.00 2.75 2.75 TIDAK PUAS

17 Akbar 3 3.67 2.50 2.50 TIDAK PUAS


32 | Vol.1 No.2, 10 July 2020 
Buana Information Tchnology and Computer Sciences (BIT and CS) 

 
3. ROC Curve 

 
IV. CONCLUSION 

The classification method using the KNN algorithm is 

very good for determining the correctness of classification in 

data mining. Evidenced by the results of accuracy = 98%, 

classification recall = 86.67%, Classification precision = 

100% and AUC = 0.750.  

 
REFERENCES 

[1]  Abdul Rohman, Model Algoritma K Nearest 

Neighbour (KNN) Untuk Prediksi kelulusan 
Mahasiswa, Universitas Pandanaran Semarang, 

2015. 

[2] Bramer, Max. Principles of data mining. Vol. 180. 

London: Springer, 2007. 

[3] Basuki, Achmad dan Syarif, Iwan. 2003. Modul Ajar 

Decision Tree. Surabaya : PENS-ITS. 

[4] Deddy Setyawan, “Analisis Kepuasan Pengguna 

Jasa Transportasi Taksi Untuk Meningkatkan 

Loyalitas,” Universitas Diponegoro, 2010. 

[5] Giudici, Paolo, and Silvia Figini. Front Matter. John 

Wiley & Sons, Ltd, 2009.Applied Data Mining for 
Business and Industry. 

[6] Gorunescu, Florin. Data Mining: Concepts, models 

and techniques. Vol. 12. Springer Science & 

Business Media, 2011. 

[7] Han J, Kamber M. 2001. Data Mining : Concepts 

and Techniques. Simon Fraser University, Morgan 

Kaufmann Publishers. 

[8] Henny Leidiyana, 2013. Penerapan Algoritma K 

Nearest Neighbor untuk Penentuan Resiko Kredit 

Kepemilikan Kendaraan Bermotor. Jurnal Penelitian 

Ilmu Komputer, System Embedded & Logic. 

[9] Kusrini&Luthfi,E.T. 2009. Algoritma Data Mining. 
Yogyakarta : Andi Publishing. 

[10] Larose, D.T, 2006. Discovering Knowledge in Data: 

An Introduction to Data mining. John Willey &Sons, 

Inc. 

[11] M Rizki Ilham, Purwanto. 2016. Implementasi 

Datamining Menggunakan Algoritma  C 4.5 Untuk 

Prediksi Kepuasan Pelanggan. UDINUS Semarang. 
[12] Powers, David Martin. "Evaluation: from precision, 

recall and F-measure to ROC, informedness, 

markedness and correlation." (2011). 

[13] Rapid-I GmbH. (2008).Rapidminer-4.2-tutorial. 

Germany: Rapid-I. 

[14] Resty Mardiana, “Faktor – Faktor Yang 

Memperngaruhi Kepuasan Pengguna Jasa Taksi 

Blue Bird,” Jakarta, Universitas Gunadarma, 2010. 

[15] Sachdeva, M., Zhu, S., Wu, F., Wu, H., Walia, V., 

Kumar, S., ... & Mo, Y. Y. (2009). p53 represses c-

Myc through induction of the tumor suppressor miR-

145. Proceedings of the National Academy of 
Sciences, 106(9), 3207-3212. 

[16] Tan S, Kumar P, Steinbach M. 2005. Introduction To 

Data Mining. Addison Wesley.