

Knowledge Engineering and Data Science (KEDS) pISSN 2597-4602 

Vol 4, No 2, December 2021, pp. 128–137 eISSN 2597-4637 

 

 

 

 

https://doi.org/10.17977/um018v4i22021p128-137  

©2021 Knowledge Engineering and Data Science | W : http://journal2.um.ac.id/index.php/keds | E : keds.journal@um.ac.id  

This is an open access article under the CC BY-SA license (https://creativecommons.org/licenses/by-sa/4.0/) 

KEDS is Sinta 2 Journal (https://sinta.kemdikbud.go.id/journals/detail?id=6662) accredited by Indonesian Ministry of Education, Culture, 

Research, and Technology 

A Comparative Study of Machine Learning-based Approach  

for Network Traffic Classification 

Kien Trang a, b, 1, An Hoang Nguyen a, b, 2, *

a School of Electrical Engineering, International University  

Quarter 6, Linh Trung Ward, Thu Duc City, Ho Chi Minh City 700000, Vietnam 

b Vietnam National University, Ho Chi Minh City 

Linh Trung Ward, Thu Duc City, Ho Chi Minh City 700000, Vietnam 

1 tkien@hcmiu.edu.vn; 2 nhan@hcmiu.edu.vn* 

* corresponding author 

 

 

I. Introduction 

The accelerated development of the Internet has led humanity into a new era over the last decades. Nowadays, Internet applications are widely used in different fields, including education and the working environment. Over a million learners were affected and needed to switch to distance learning due to the outbreak of COVID-19 [1]. According to the survey in [2], approximately 37% of US residents worked remotely full-time in the first quarter of 2020, which led Internet data usage to reach a new record height. The emergence of the Internet of Things (IoT) has brought about a major shift in the growing number and variety of connected devices and the different applications supported by network service providers. Thus, network traffic classification can help solve complex network management problems for Internet Service Providers (ISPs).

The goal of network traffic classification is to identify the various types of network protocols and applications existing in a network to facilitate network management. The packets are classified to determine the appropriate service policy for the routers. QoS, network planning, monitoring, traffic trend analysis, and firewall configuration all benefit from traffic classification. Moreover, Internet traffic classification may be an important component of automated intrusion detection systems, automatically identifying denial of service attacks so that network resources can be allocated to priority

ARTICLE INFO

Article history:
Submitted 7 December 2021
Revised 24 December 2021
Accepted 29 December 2021
Published online 31 December 2021

ABSTRACT

 

Internet usage has increased rapidly and become an essential part of human life, corresponding to the rapid development of network infrastructure in recent years. Thus, protecting users' confidential information when joining the global network has become one of the most significant considerations. Even though multiple encryption algorithms and techniques have been applied by different parties, including internet providers and web hosting companies, this situation also allows hackers to attack the network system anonymously. Therefore, classifying network data streams to improve network system quality and security is attracting increasing research interest. This work introduces a machine learning-based approach to find the most suitable training model for network traffic classification tasks. Data pre-processing is first applied to normalize each feature type in the dataset. Different machine learning techniques, including k-Nearest Neighbors (KNN), Artificial Neural Network (ANN), and Random Forest (RF), are then applied to the normalized features in the classification phase. The open-access dataset ISCXVPN2016 is used for this research; it includes two encryption types (VPN and Non-VPN) and seven traffic categories. Experimental results on this dataset show that the proposed models reach a high classification rate, over 85% in some cases, with the RF model obtaining the best results among the three techniques.

This is an open access article under the CC BY-SA license 

(https://creativecommons.org/licenses/by-sa/4.0/). 

Keywords: 

Artificial neural network 

K-nearest neighbors 

Machine learning 

Network traffic classification 

Random forest  




 

customers [3]. The ISPs can also increase the quality of services by accelerating the incident 
management process based on the Internet traffic classification. 

In network traffic classification, traditional methods have certain limitations. Firstly, packet 
marking is suggested to distinguish traffic based on its QoS class. Some common fields are used, such 
as Type of Service (ToS), Differentiated Services Code Point (DSCP), and Explicit Congestion 
Notification (ECN). Then, several protocols have been proposed for traffic classification, including 
Differentiated Services (DiffServ), Integrated Services (IntServ), and Multi-Protocol Label Switching 
(MPLS). Due to the system compatibility problem, these protocols are not widely deployed and 
applied in reality. Besides, port-based and payload-based methods are the commonly applied traditional techniques. In the port-based method, each application is identified by a port number registered with the Internet Assigned Numbers Authority (IANA), and classification is performed based on that registered port number. For instance, port 25 (SMTP) and port 110 (POP3) are used to send and receive mail, respectively. However, with the growth of Internet applications, dynamic port numbers and tunneling are used to hide the port number, leading to limitations of this method [4]. In the payload-based method, the data packet's content is examined against the characteristics of network applications in Internet traffic. This technique is especially recommended for Peer-to-Peer (P2P) applications. However, it also has certain limitations due to the high hardware demand for detecting features in data packets and the incapacity to handle encrypted traffic packets [5][6]. In general, these traditional approaches have drawbacks in terms of classification accuracy and resources.

Over the last few years, in artificial intelligence (AI) research, machine learning (ML) has achieved remarkable success, allowing automatic identification and classification without human intervention in some cases. Some recent research is gradually switching towards machine learning applications in network traffic classification. Yuan et al. [7] introduced an advanced version of the decision tree, called Hadoop C4.5, to classify network traffic. The applied dataset contains eight classes with 248 properties each. The results show an improvement in classification speed and accuracy compared to the original method, reaching over 80%. The study in [8] used the Netmate tool to select 23 core features before training for classification. Different algorithms are applied for comparison, including C4.5, Support Vector Machine (SVM), BayesNet, and Naive Bayes. Among these experiments, C4.5 gives the highest accuracy, 78.9%, while the lowest, 68.1%, belongs to BayesNet. Similarly, Y. Ma et al. also applied the C4.5 decision tree to classify Internet traffic, reaching 88% average accuracy. SVM and K-means are employed on realistic Internet traces in the research of Z. Fan et al. [9], where feature selection is applied before the training stage. Different training and test set ratios are evaluated, and the overall results are about 98% for both classifiers. According to a study of these classification outcomes, classification models based on supervised learning algorithms achieve greater precision than those based on unsupervised learning methods. Four distinct feature selection methods are also discussed as a pre-processing step in [10] to improve computational efficiency and limit classification error. Experiments are also conducted on different classifiers, including k-Nearest Neighbors (KNN), Random Forest (RF), and Gradient Boosting; the accuracy across feature selection methods and classifiers is approximately 85% in general. The Naive Bayes classifier is also applied in [11], [12], and [13], reaching over 90%, 93%, and around 55%, respectively.

Supported by modern hardware, deep learning has become one of the most helpful tools for classification tasks. The Convolutional Neural Network (CNN) is a powerful method for complicated image-based classification on huge datasets. The end-to-end architecture of a CNN can take the input data directly, without separate feature extraction or pre-processing, and output predicted probabilities or classes. Although many proposed models were established for graphical classification, inspired by previous studies, many researchers try to adjust these models to fit network traffic classification. F. Zhang et al. [14] proposed an improved version of the Capsule Neural Network (CapsNet) to identify network traffic. A conversion step and normalization are conducted to turn the features into a two-dimensional array before feeding them into the networks. Three versions of CNN are compared in the experiments, with an average accuracy of over 95%. Besides, the study [15] introduced using the pre-trained model ResNet and a self-developed CNN. ResNet outperforms the self-developed CNN, reaching nearly 97% versus 95.5%. The authors explain that ResNet has pre-trained weights and a more complex architecture. However, deep learning is not a universal method that applies to every case; indeed, dependence on the dataset is one of its big challenges.

Three distinct methods are evaluated in the study [16]: Random Forest (RF), Linear Discriminant Analysis (LDA), and Deep Neural Network (DNN). Regarding accuracy, the two traditional machine learning methods, RF and LDA, achieve higher results than DNN in scenario A, while DNN improves over RF in scenario B. L. Zhipeng et al. [17] discussed using two famous pre-trained CNN models, ResNet50 and GoogleNet. Since these two models are designed for images, one-hot encoding first transforms the symbolic features into binary features stored as vectors. Afterward, the binary vectors are converted to grayscale images. The results are about 81% for both models. To deal with a limited-sample dataset, the work in [18] proposed using a Deep Convolutional Generative Adversarial Network (DCGAN) to generate more samples before the training process.

This network can perform semi-supervised learning with the existing samples and create new data to enrich the dataset. With this method, classification has more data for training and testing, which can improve generalization and prevent overfitting. Using a baseline CNN, this study achieves 89% and 78% on the self-collected and ISCX datasets, respectively. The research in [19] proposed feature extraction based on a convolutional recurrent autoencoder neural network. The proposed approach is built on the autoencoder architecture, consisting of an encoder, a latent space, and a decoder. Different DNNs are applied to verify the performance, including CNN, Sparse Autoencoder (SAE), and Long Short-Term Memory (LSTM). Ultimately, the Stacked-CNN-LSTM architecture reaches the highest performance in almost all metrics. Table 1 summarizes the related studies in network traffic classification.

Table 1. Comparative studies

Research | Method                        | Result                         | Number of Classes
[7]      | C4.5 Decision tree            | Over 80%                       | 8
[8]      | C4.5 Decision tree            | 78.9%                          | 5
         | SVM                           | 74%                            |
         | BayesNet                      | 68.1%                          |
         | NaiveBayes                    | 71.8%                          |
[9]      | SVM, K-means                  | ~98% for both methods          | 6
[10]     | KNN, Random Forest,           | ~85% for all cases             | Not mentioned
         | Gradient Boosting             |                                |
[11]     | NaiveBayes                    | Over 90%                       | 7
[12]     | NaiveBayes                    | 93%                            | Not mentioned
[13]     | NaiveBayes                    | 54~55% for all cases           | 3
[14]     | Improved CapsNet              | Over 95%                       | 12
[15]     | ResNet                        | ~97%                           | 8
         | Self-developed CNN            | ~95.5%                         |
[16]     | Random Forest                 | 95%, 42% for Scenario A, B     | 3
         | LDA                           | 98%, 76% for Scenario A, B     |
         | DNN                           | 69%, 74% for Scenario A, B     |
[17]     | ResNet50                      | 81.5%                          | 5
         | GoogleNet                     | 81.8%                          |
[18]     | DCGAN + baseline CNN          | 89% for self-collected dataset | Not mentioned
         |                               | 78% for ISCX dataset           |
[19]     | CNN-SAE-CNN                   | >95% for all cases             | 4
         | LSTM-SAE-NN                   |                                |
         | CNN-LSTM-SAE-NN               |                                |
         | Stacked-CNN-LSTM-SAE-NN       |                                |

 




 

Although various parties, including internet service providers and web hosting companies, have used different encryption methods and approaches, this circumstance allows hackers to attack the network system anonymously. As a result, the importance of classifying network data streams to improve the quality and security of network systems is drawing an increasing amount of research interest. This work introduces a machine learning-based approach for determining the most appropriate training model for network traffic classification tasks, described in detail in the following section.

II. Approach 

Figure 1 depicts the processing chart of the proposed approach. The dataset applied in this work is taken from [20]. Before feeding into machine learning models, pre-processing is first applied to meet some basic requirements, including normalization and data transformation. Then, the dataset is divided into two subsets: a training set and a test set. Finally, different traditional machine learning models are applied to test different scenarios. From the comparative studies in the previous section, the traditional models often give better performance than the advanced models; this can be explained by the fact that datasets vary in size and latent properties, so deep learning techniques may not perform well on datasets of limited size. Thus, K-Nearest Neighbors (KNN), Artificial Neural Network (ANN), and Random Forest (RF) are chosen for this study.

A. Data Pre-processing 

Since the given dataset contains different types of features with various ranges, a pre-processing step is applied before classification. Normalization is necessary to convert the numerical values to a similar scale without affecting the relative differences within each value range. Scale normalization is applied by (1).

d_i' = \frac{d_i - \min(d)}{\max(d) - \min(d)} \quad (1)

where d is the feature vector, d_i is each element in the feature vector, and d_i' is the corresponding normalized element. After this process, each feature lies in the range of 0 to 1. Besides, each class's label name, such as VPN-Mail and VPN-VOIP, needs to be converted into numeric values. Missing values in the data, caused by data corruption or a failure to record data, also influence classification performance, since some machine learning algorithms cannot work with missing values. To deal with this phenomenon, the corresponding data elements are removed so that they do not affect the training process.
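As a minimal sketch of this pre-processing step (assuming NumPy; the feature values and label names below are hypothetical toy data, not taken from the dataset), the normalization in (1) and the label-to-integer conversion can be written as:

```python
import numpy as np

def min_max_normalize(column):
    """Scale a numeric feature column to the [0, 1] range, as in (1)."""
    d_min, d_max = column.min(), column.max()
    return (column - d_min) / (d_max - d_min)

# Toy feature matrix: rows are flows, columns are numeric features.
X = np.array([[10.0, 200.0],
              [30.0, 400.0],
              [50.0, 600.0]])

# Normalize each feature column independently.
X_norm = np.column_stack([min_max_normalize(X[:, j]) for j in range(X.shape[1])])

# Class label names are mapped to integer values before training.
labels = ["VPN-Mail", "VPN-VOIP", "VPN-Mail"]
name_to_id = {name: i for i, name in enumerate(sorted(set(labels)))}
y = np.array([name_to_id[name] for name in labels])
```

Rows containing missing values would simply be dropped before this step, matching the removal strategy described above.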

 

Fig. 1. The processing chart of the proposed algorithms 




B. Machine Learning Models 

In general, machine learning is the process of seeking and describing structural patterns in a given data set. The output of a machine learning model is a description of the learned knowledge, which can be used for classification or regression.

1) K-Nearest Neighbors (KNN) 

KNN is one of the most fundamental and simplest supervised machine learning algorithms; it operates by grouping samples with similar characteristics in the dataset [21]. Instead of learning from the training data, KNN simply memorizes all of it, and all the computation is conducted in the test phase. Every time a test sample is input for classification, the algorithm computes the distance between the testing data point and the training points, and the predicted label depends on the labels of the nearest data points, those with the minimum distance [22]. In addition, a voting process is conducted when the nearest data points carry different labels.

Let X = {x_1, x_2, ..., x_n} be a sample, where x_1, x_2, ..., x_n are the features of the sample. The majority rule specifies the classification procedure based on the k nearest reference vectors to the projection of the sample X. All samples in the dataset are assumed to correspond to points in an n-dimensional space ℝ^n. Distance metrics define the distance between points in this space. The formula for calculating the distance between samples X_i and X_j is defined in (2).

d(X_i, X_j) = \left( \sum_{f=1}^{n} \left| x_f^i - x_f^j \right|^p \right)^{1/p} \quad (2)

where x_f^i and x_f^j are the values of feature f of the data samples X_i and X_j, respectively. Next, the algorithm selects the k samples in the training set with the closest distance to the input sample. The label of sample X is then classified based on the classes of these k samples according to the rule of majority voting.
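The procedure above can be sketched compactly (a minimal illustration assuming NumPy; the toy data points and the choices k = 3 and p = 2, i.e. Euclidean distance, are hypothetical, not values from the paper):

```python
import numpy as np
from collections import Counter

def minkowski(a, b, p=2):
    """Distance between two feature vectors, as in (2)."""
    return float((np.abs(a - b) ** p).sum() ** (1.0 / p))

def knn_predict(X_train, y_train, x, k=3, p=2):
    """Classify x by majority vote among its k nearest training samples."""
    dists = [minkowski(xi, x, p) for xi in X_train]
    nearest = np.argsort(dists)[:k]          # indices of the k closest points
    votes = Counter(int(y_train[i]) for i in nearest)
    return votes.most_common(1)[0][0]        # majority-voted label

# Hypothetical 2-feature toy data: class 0 near the origin, class 1 far away.
X_train = np.array([[0.0, 0.0], [0.1, 0.1], [0.9, 0.9], [1.0, 1.0]])
y_train = np.array([0, 0, 1, 1])
label = knn_predict(X_train, y_train, np.array([0.05, 0.05]))  # → 0
```

Note that all work happens at prediction time, reflecting the "memorize, then compute at test time" behavior described above.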

2) Artificial Neural Network (ANN) 

ANN is a machine learning algorithm that simulates the biological neural activity of humans. This method consists of three main layers: input, hidden, and output. Each layer consists of many neurons connected to process information. Each neuron receives and processes input data to produce an output, and a neuron's output can in turn be used as an input for other neurons. Independent values in the input are passed through the network's nodes to produce dependent values in the output, with the precondition that those output values must correspond to the input data group of independent variables.

Each input value x_i is attached to a corresponding weight w_i and bias b_i, representing the importance of that input value at the neuron node compared to the other input values. The computation takes the summation of all input data values with the weights and biases for each neuron. The weights are set randomly at initialization; during training, updated weights are computed through the optimization process. Then, an activation function is applied to map the input values of a neuron node to the output. The mathematical representation is defined in (3).

m = \sum_{i=1}^{K} (w_i x_i + b_i) \quad (3)

where K is the number of input values passing through a neuron. After applying the activation function, (3) is adjusted to (4).

y = f(m) = f\left( \sum_{i=1}^{K} (w_i x_i + b_i) \right) \quad (4)




 

3) Random Forest (RF) 

Random Forest, developed in the study [23], is a combination of multiple decision trees referred to as the bagging method. A typical decision tree model classifies the data samples in the training dataset based on their features. The training process starts from the root with the complete dataset, splitting it into smaller samples at intermediate or terminal nodes based on the values of specific metrics, such as Entropy or the Gini index, of one respective feature. The Entropy is a parameter indicating the randomness of the analyzed feature, which decides how the model splits the data into subsets based on that feature. Then, based on the Entropy values, the model calculates the Information Gain, determining how well the data was split. The decision tree tries to maximize the Information Gain while keeping the Entropy value at a minimum. The formulas for calculating the Entropy and Information Gain are illustrated in (5) and (6).

E = -\sum_{i=1}^{c} p_i \log_2(p_i) \quad (5)

IG = E(t) - \sum_{j=1}^{k} E(j, t) \quad (6)

where c is the number of classes, p_i is the proportion of samples belonging to class i, and k is the number of subsets after the split. The random forest utilizes different inputs with different features for each decision tree when making predictions. Multiple prediction outcomes are produced to classify the data samples, and the final classification of the random forest model is made based on the majority rule over the outcomes of those decision trees. Therefore, increasing the number of decision trees during RF model creation helps increase the accuracy of classification decisions while avoiding the heavy computational load of the hyperparameter tuning process.
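The Entropy and Information Gain computations in (5) and (6) can be sketched as below (a minimal illustration using only the standard library; the toy split is hypothetical, and the child entropies are weighted by subset size, a common convention the paper's formula does not spell out):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, as in (5)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, subsets):
    """Entropy reduction from splitting parent into subsets, following (6),
    with each child entropy weighted by its subset's share of the samples."""
    n = len(parent)
    return entropy(parent) - sum(len(s) / n * entropy(s) for s in subsets)

# Hypothetical split of a binary-labeled node into two pure children:
# the split removes all randomness, so the gain equals the parent entropy.
parent = [0, 0, 1, 1]
left, right = [0, 0], [1, 1]
gain = information_gain(parent, [left, right])  # → 1.0
```

A decision tree would evaluate this gain for each candidate feature and threshold, choosing the split that maximizes it, exactly the criterion described above.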

III. Results Analysis 

This section presents the dataset and scenario descriptions, the evaluation metrics, and the discussion of the results obtained from the three machine learning models, namely RF, KNN, and ANN.

A. Dataset Description 

In the scope of this research, the VPN – Non-VPN dataset (ISCXVPN2016) [20] is used for the training and testing phases. It was created during an experiment at the University of New Brunswick, Canada, in which the dataset generators created two user accounts to participate in different Internet services such as Facebook, uTorrent, and Skype.

Each class inside the dataset is divided into two categories: Non-VPN and VPN encrypted traffic. Therefore, the total number of labels for classification can be considered up to 14 classes. Matching the nature of the dataset, the training and testing scheme is divided into two steps. The first step is to classify the two general classes, Non-VPN and VPN encrypted traffic flows. Afterward, within each class, seven distinct traffic flows are classified. The detailed classification process is described in Figure 2. Besides the classification based on Internet traffic type, the data also includes a time-based division; for each step of the classification process, the data is divided into four categories: 15, 30, 60, and 120s.

B. Evaluation Metrics 

In this study, the experiments are conducted in the Colab Pro environment with 26 GB of RAM and an NVIDIA Tesla P100 GPU. The applied dataset is separated into two subsets, a training set and a test set, following an 80/20 ratio. Besides, cross-validation is not applied in this case due to the large dataset size. In machine learning and artificial intelligence, one of the most common evaluation means is the confusion matrix, often used to evaluate the performance of a supervised learning model and the level of confusion between classes. The confusion matrix consists of four main parameters: true positive (TP), false positive (FP), true negative (TN), and false negative (FN).

Fig. 2. Dataset classification scenarios

Calculations from these four numbers can be used to examine the learning models through the frequently used evaluation metrics: Accuracy, Precision, Recall, and F1-score. Among the four metrics, the most common indicator is the accuracy, the proportion of samples classified correctly into their respective labels within the whole dataset. The formula for the accuracy value is demonstrated in (7).

A = \frac{TP + TN}{TP + TN + FP + FN} \quad (7)

Even though the accuracy level is frequently used to obtain a basic understanding of the learning models, the number of samples falsely classified into incorrect labels should not be neglected. Therefore, to perform a complete assessment of the learning models, a combination of other metrics is necessary. Precision is the proportion of samples classified into a class that truly belong to that class, while Recall is the proportion of samples of a class that are correctly classified into it, over the total number of samples of that class. Finally, the F1-score is the combined metric of the Precision and Recall values; it only reaches a high value when both metrics are high. Through the analysis of the F1-score, the assessment process acquires a thorough evaluation of the efficiency of the learning model. The formulas for these values are described in (8), (9), and (10), respectively.

Precision = \frac{TP}{TP + FP} \quad (8)

Recall = \frac{TP}{TP + FN} \quad (9)

F1\text{-}score = \frac{2 \times Precision \times Recall}{Precision + Recall} \quad (10)
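Formulas (7) through (10) translate directly into code (a minimal sketch; the confusion-matrix counts below are hypothetical values for a binary VPN / Non-VPN decision, not results from the paper):

```python
def metrics_from_counts(tp, tn, fp, fn):
    """Accuracy, Precision, Recall, and F1-score from confusion-matrix
    counts, following (7)-(10)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)        # (7)
    precision = tp / (tp + fp)                        # (8)
    recall = tp / (tp + fn)                           # (9)
    f1 = 2 * precision * recall / (precision + recall)  # (10)
    return accuracy, precision, recall, f1

# Hypothetical counts for a binary VPN / Non-VPN classifier.
acc, prec, rec, f1 = metrics_from_counts(tp=80, tn=90, fp=10, fn=20)
```

Note how a model can show reasonable accuracy while Precision and Recall diverge, which is exactly why the F1-score is reported alongside accuracy in the scenarios below.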

1) Scenario A1

In scenario A1, the primary purpose of classification is to distinguish the Internet traffic flow into two categories: Non-VPN and VPN encrypted traffic. The dataset is divided into four subsets of data samples with different recording durations: 15, 30, 60, and 120s. The evaluation metrics of the three machine learning models on all four subsets are recorded in Figure 3.

Fig. 3. Recorded evaluation metrics for KNN, ANN, and RF models – Scenario A1

At a glance, the results recorded from the RF model are the highest, followed by KNN, with the lowest results from the ANN model. Across the evaluation metrics Accuracy, Precision, Recall, and F1-score, the RF model consistently produces numbers in the 88-94% range; the only exception is the RF Recall value on the 60s dataset, at 85%, which is still higher than that of the other models. The ANN model is the least effective in classification, with its averages staying approximately at 77%. However, compared to the other two models, the ANN is the most balanced, since all four metrics are almost the same throughout the different time-based datasets; in other words, the time feature does not affect the performance of the ANN model. On the other hand, the KNN model provides relatively high results, with most metrics being approximately 80-86%. The 60s subset is the worst time-based sample set for this model, with 82%, 82.1%, 76.3%, and 79.13% recorded for Accuracy, Precision, Recall, and F1-score, respectively.

2) Scenario A2 – Non-VPN 

In contrast to the total domination of the RF model in scenario A1, the highest evaluation metrics on the Non-VPN subsets are divided between the ANN and RF models. To be more specific, the RF model scores the highest mostly in the Accuracy and Precision aspects, whereas the Recall and F1-score are greater in the ANN model than in the other two, as indicated in Figure 4. In the accuracy metric, all of the time-based subsets produce results above 90% for the RF model, with the 120s dataset as the only exception, where KNN and RF share the same 92.8% value. For the Precision aspect, the upper range recorded for the RF model is around 86-88%. Another point to note is that, even though the ANN is not always the greatest model, its results are consistently greater than 80%.

On the other hand, the Recall and F1-score metrics mark a big drop in the performance of the RF and KNN models. All KNN values fall below 70%, with the lowest being the Recall on the 60s dataset, at only 61%. The drop in evaluation metrics also appears in the RF model as the duration of the time-based subsets increases: the highest values are on the 15s dataset, with a Recall of 81.5% and an F1-score of 84.8%, falling to 66.1% and 73.5%, respectively, on the 120s dataset. In contrast, the ANN model shows the most stable values, mostly greater than 80%; the only exception is the 60s dataset, where its values are around 2% lower than those of the RF model.

Fig. 4. Recorded evaluation metrics for KNN, ANN, and RF models – Scenario A2 – Non-VPN

3) Scenario A2 – VPN

In the case of the VPN encrypted subset, the performance of the learning models shows a significant drop in Precision, Recall, and F1-score, except for the ANN model. The results are illustrated in Figure 5. The RF model still provides three out of four of the highest values in the Accuracy metric, all of which are larger than 86%; only on the 120s dataset does the peak value belong to the KNN model, with 87.8% classification accuracy. On the contrary, the ANN model dominates the three remaining metrics. Most of its values on the 15, 30, and 60s datasets are recorded in the range of approximately 80.5-84%. The trend only decreases on the 120s dataset, with Precision, Recall, and F1-score of 78.4%, 74.4%, and 76.3%, respectively.

IV. Conclusions 

In this research, different machine learning models are applied to classify the multiple Internet traffic flows included in the open-access VPN – Non-VPN (ISCXVPN2016) dataset. The learning models include Random Forest, K-Nearest Neighbors, and Artificial Neural Networks. The models are trained and then perform the classification task in two steps: first the Non-VPN versus VPN classification in scenario A1, and subsequently the classification of each subset into seven different Internet traffic classes. Based on the obtained results, Random Forest is the most suitable training model for this dataset, even though the results indicate that it is less accurate on long-duration data samples, such as in the 120s subset.

In future research, different datasets with more complex Internet traffic classification schemes and more effective yet suitable training models, such as reinforcement learning models, could be considered for further analysis. The ISCXVPN2016 dataset is well established with different categories and sub-scenarios. However, new Internet and communication protocols and applications emerge daily, corresponding to the rapidly increasing Internet usage rate all over the world, and encryption protocols continue to be developed to protect users' personal information and secure Internet connections. Therefore, discovering appropriate training models that fit the purpose of Internet flow classification and are suitable for practical application and development will be the main target for research in this field.

Declarations  

Author contribution  

All authors contributed equally as the main contributor of this paper. All authors read and approved the final paper. 

Funding statement  

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.  

Conflict of interest  

The authors declare no known conflict of financial interest or personal relationships that could have appeared to influence 
the work reported in this paper.  

 

Fig. 5. Recorded evaluation metrics for KNN, ANN, and RF models – Scenario A2 – VPN




Additional information  

Reprints and permission information are available at http://journal2.um.ac.id/index.php/keds. 

Publisher’s Note: Department of Electrical Engineering - Universitas Negeri Malang remains neutral with regard to 
jurisdictional claims and institutional affiliations. 

References 

[1] G. R. El Said, "How Did the COVID-19 Pandemic Affect Higher Education Learning Experience? An Empirical 
Investigation of Learners' Academic Performance at a University in a Developing Country," Advances in Human-
Computer Interaction, vol. 2021, pp. 1–10, Feb. 2021, doi: 10.1155/2021/6649524. 

[2] L. Yang, D. Holtz, S. Jaffe, S. Suri, S. Sinha, J. Weston, C. Joyce, N. Shah, K. Sherman, B. Hecht, and J. Teevan, "The 
effects of remote work on collaboration among information workers," Nature Human Behaviour, Sep. 2021, doi: 
10.1038/s41562-021-01196-4. 

[3] L. Stewart, G. Armitage, P. Branch, and S. Zander, "An Architecture for Automated Network Control of QoS over 
Consumer Broadband Links," TENCON 2005 - 2005 IEEE Region 10 Conference, pp. 1–6, Nov. 2005, doi: 
10.1109/tencon.2005.301139. 

[4] T. Karagiannis, A. Broido, M. Faloutsos, and K. claffy, "Transport layer identification of P2P traffic," Proceedings of 
the 4th ACM SIGCOMM Conference on Internet Measurement (IMC '04), New York, pp. 121–134, Sep. 2004, doi: 
10.1145/1028788.1028804. 

[5] P. B. Park, Y. Won, J. Chung, M. Kim, and J. W.-K. Hong, "Fine-grained traffic classification based on functional 
separation," International Journal of Network Management, vol. 23, no. 5, pp. 350–381, Aug. 2013, doi: 
10.1002/nem.1837. 

[6] G. Aceto, A. Dainotti, W. de Donato, and A. Pescape, "PortLoad: Taking the Best of Two Worlds in Traffic 
Classification," 2010 INFOCOM IEEE Conference on Computer Communications Workshops, pp. 1–5, Mar. 2010, doi: 
10.1109/infcomw.2010.5466645. 

[7] Z. Yuan and C. Wang, "An improved network traffic classification algorithm based on Hadoop decision tree," 2016 
IEEE International Conference of Online Analysis and Computing Science (ICOACS), pp. 53–56, May 2016, doi: 
10.1109/icoacs.2016.7563047. 

[8] M. Shafiq, X. Yu, A. A. Laghari, L. Yao, N. K. Karn, and F. Abdessamia, "Network Traffic Classification techniques 
and comparative analysis using Machine Learning algorithms," 2016 2nd IEEE International Conference on Computer 
and Communications (ICCC), pp. 2451–2455, Oct. 2016, doi: 10.1109/compcomm.2016.7925139. 

[9] Z. Fan and R. Liu, "Investigation of machine learning based network traffic classification," 2017 International 
Symposium on Wireless Communication Systems (ISWCS), pp. 1–6, Aug. 2017, doi: 10.1109/iswcs.2017.8108090. 

[10] A. Pasyuk, E. Semenov, and D. Tyuhtyaev, "Feature Selection in the Classification of Network Traffic Flows," 2019 
International Multi-Conference on Industrial Engineering and Modern Technologies (FarEastCon), pp. 1–5, Oct. 2019, 
doi: 10.1109/fareastcon.2019.8934169. 

[11] Y. Wang, Y. Xiang, and S. Yu, "Internet Traffic Classification Using Machine Learning: A Token-based Approach," 
2011 14th IEEE International Conference on Computational Science and Engineering, pp. 285–289, Aug. 2011, doi: 
10.1109/cse.2011.58. 

[12] S. Dong and R. Jain, "Flow online identification method for the encrypted Skype," Journal of Network and Computer 
Applications, vol. 132, pp. 75–85, 2019, doi: 10.1016/j.jnca.2019.01.007. 

[13] M. Dixit, R. Sharma, S. Shaikh, and K. Muley, "Internet Traffic Detection using Naïve Bayes and K-Nearest Neighbors 
(KNN) algorithm," 2019 International Conference on Intelligent Computing and Control Systems (ICCS), pp. 1153–
1157, May 2019, doi: 10.1109/iccs45141.2019.9065655. 

[14] F. Zhang, Y. Wang, and M. Ye, "Network Traffic Classification Method Based on Improved Capsule Neural Network," 
2018 14th International Conference on Computational Intelligence and Security (CIS), pp. 174–178, Nov. 2018, doi: 
10.1109/cis2018.2018.00045. 

[15] H. Lim, J. Kim, J. Heo, K. Kim, Y. Hong, and Y. Han, "Packet-based Network Traffic Classification Using Deep 
Learning," 2019 International Conference on Artificial Intelligence in Information and Communication (ICAIIC), pp. 
046–05, Feb. 2019, doi: 10.1109/icaiic.2019.8669045. 

[16] J. Kwon, D. Jung, and H. Park, "Traffic Data Classification using Machine Learning Algorithms in SDN Networks," 
2020 International Conference on Information and Communication Technology Convergence (ICTC), pp. 1031–1033, 
Oct. 2020, doi: 10.1109/ictc49870.2020.9289174. 

[17] Z. Li, Z. Qin, K. Huang, X. Yang, and S. Ye, "Intrusion Detection Using Convolutional Neural Networks for 
Representation Learning," Lecture Notes in Computer Science, pp. 858–866, 2017, doi: 10.1007/978-3-319-70139-4_87. 

[18] A. S. Iliyasu and H. Deng, "Semi-Supervised Encrypted Traffic Classification with Deep Convolutional Generative 
Adversarial Networks," IEEE Access, vol. 8, pp. 118–126, 2020, doi: 10.1109/access.2019.2962106. 

[19] G. D'Angelo and F. Palmieri, "Network traffic classification using deep convolutional recurrent autoencoder neural 
networks for spatial–temporal features extraction," Journal of Network and Computer Applications, vol. 173, p. 102890, 
2021, doi: 10.1016/j.jnca.2020.102890. 

[20] G. Draper-Gil, A. H. Lashkari, M. S. I. Mamun, and A. A. Ghorbani, "Characterization of Encrypted and VPN Traffic 
using Time-related Features," Proceedings of the 2nd International Conference on Information Systems Security and 
Privacy (ICISSP 2016), pp. 407–414, Feb. 2016, doi: 10.5220/0005740704070414. 

[21] H. A. H. Ibrahim, O. R. Aqeel Al Zuobi, M. A. Al-Namari, G. Mohamed Ali, and A. A. A. Abdalla, "Internet traffic 
classification using machine learning approach: Datasets validation issues," 2016 Conference of Basic Sciences and 
Engineering Studies (SGCAC), pp. 158–166, Feb. 2016, doi: 10.1109/sgcac.2016.7458022. 

[22] A. Moldagulova and R. B. Sulaiman, "Using KNN algorithm for classification of textual documents," 2017 8th 
International Conference on Information Technology (ICIT), pp. 665–671, May 2017, doi: 10.1109/icitech.2017.8079924. 

[23] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81–106, Mar. 1986, doi: 
10.1007/bf00116251. 
