INTERNATIONAL JOURNAL OF COMPUTERS COMMUNICATIONS & CONTROL
ISSN 1841-9836, 9(6):672-685, December, 2014.

Auto Adaptive Identification Algorithm Based on Network
Traffic Flow

S. Dong, X. Zhang, D. Zhou

Shi Dong*
1. School of Computer Science and Technology, Zhoukou Normal University
Zhoukou, 466001, China
2. School of Computer Science & Technology, Huazhong University of Science and Technology
Wuhan, 430074, China
*Corresponding author: njbsok@gmail.com

Xingang Zhang
School of Computer and Information Technology, Nanyang Normal University
Nanyang, 473061, China
zxg@nynu.edu.cn

Dingding Zhou
Department of Laboratory and Equipment Management, Zhoukou Normal University
Zhoukou, 466001, China
zdd@zknu.edu.cn

Abstract: Traffic identification is a key task for any Internet Service Provider
(ISP) or network administrator. Machine learning is an important approach to
traffic identification. This paper analyzes the impact of asymmetric routing
on traffic identification and proposes an effective method to reduce that impact.
Experimental results show that the auto adaptive algorithm improves traffic
identification.
Keywords: Traffic identification, Internet Service Provider (ISP), Auto Adaptive
algorithm (AA), asymmetric routing.

1 Introduction

Traffic identification plays an important role in many fundamental network operation and
maintenance activities, such as detecting intrusions and malicious attacks, blocking forbidden
applications, billing by traffic content, and ensuring quality of service. It has become one of
the most interesting topics in network science and technology, especially in recent years.
Current network traffic identification methods fall roughly into five categories: (1) port-based
methods; (2) methods based on deep packet inspection (DPI); (3) methods based on network flow
characteristics; (4) methods based on host behavior [1]; (5) methods based on machine learning.

Machine learning methods are divided into supervised and unsupervised learning. These are the
more classic identification methods; individual quality of service (QoS) features have also
been used for identification [2]. Many share a naive assumption about the Internet that traffic
on a given link is approximately symmetric, meaning that both directions of a
conversation flow across the same physical link. Many developers even embed this assumption
in their traffic classification tools [3, 4]. In fact, except at network edges, Internet traffic is often
routed asymmetrically [5], which will impair or invalidate the results of tools and models that
assume otherwise. An important cause of this asymmetry is "hot-potato routing" [6], the busi-
ness practice of configuring traffic crossing one’s network to exit as soon as possible, minimizing
resource consumption, and thus cost, of one’s own infrastructure. Particularly common in com-
mercial settlement-free peering agreements, hot-potato routing implies that the network on the

Copyright © 2006-2014 by CCC Publications



receiving side of a packet will bear higher cost per received packet. The underlying assumption
is that if both networks in a settlement-free peering agreement follow this practice, it will even
out, and both sides will share evenly in carrying traffic exchanged by their customers. Another
cause of asymmetric traffic is link redundancy, or alternative paths within networks. Since rout-
ing decisions occur independently for each packet, load-balancing algorithms may cause packets
destined to the same endpoint to follow different paths. Other traffic engineering techniques,
e.g., policy-based SPF (Shortest Path First), may also induce asymmetry in the internal routing
state of large provider networks. By studying asymmetric routing, we found that it has an
impact on traffic identification, and we propose an auto adaptive (AA) method to improve traffic
identification. Experimental results show that the AA method achieves better accuracy than the
other methods.

The paper is structured as follows: Section 2 introduces related work on traffic identification;
Section 3 proposes the AA algorithm and the evaluation method; Section 4 lists the results
obtained by our identification algorithm and analyzes the impact of ε on traffic identification;
Section 5 concludes the paper.

2 Related work

The application identification problem has been changing due to the efforts of two groups that
are in continuous competition. On the one hand, there are the applications, especially those
that do not want to be detected (e.g., P2P applications) so that they can use network resources
without control. On the other hand, there is a group of network operators, researchers and even
ISPs who need to know the traffic characteristics of their networks to manage resources or even
charge users depending on their consumption.

2.1 Research on traffic identification

Traffic identification has become a hot research topic among domestic and foreign experts, with
applications in service differentiation, QoS, intrusion detection, traffic monitoring, billing
and management. Early studies used the port-based method, which marks and identifies the traffic
type by the fixed ports assigned by the IANA. Another method, aimed at P2P and certain other
protocols, is based on deep packet inspection, but it cannot handle encrypted information and
cannot recognize new service types. As a number of new applications and services have appeared,
machine learning has been applied to traffic identification. Flow-based identification research
is roughly divided into three directions: the first is feature selection algorithms [7, 8], the
second is identification algorithms [1, 2, 9], and the third concerns different types of data
sets; for example, all packets can be divided into flows [10–14], or the flows can come from
sampled NetFlow [15]. Complementary information about related work in the field of traffic
identification can be found in the survey of traffic identification techniques using machine
learning in [16], in the comparison of contemporary classification methods in [13], the survey
on Internet traffic identification in [17] and the research review on traffic identification in
[18]. A critical but constructive analysis of the field of Internet traffic identification is
proposed in [19], focusing on major obstacles to progress and suggestions for overcoming them.
Although identification algorithms have been studied in a number of articles, some problems
remain to be solved; for example, neural network identification algorithms are worthy of further
study. All previous research studies in traffic identification either use insufficient network
data, usually non-public, or use very few/meaningless metrics for evaluation, making it
impossible to compare results shown in



different papers [17]. In addition to flow-based feature selection, the impact of packet size
on traffic identification is always a concern. Therefore, in this article we propose the AA
method, and we analyze how different feature metric sets (bidirectional or unidirectional
features) lead to different identification results.

2.2 Asymmetry routing

For a pair of hosts A and B, if the path from A to B (forward direction) is different from
the path from B to A (reverse direction), we say that the pair of paths between A and B exhibit
routing asymmetry. This scenario can be very common in the Internet core, where asymmetric
routing is a usual practice [20, 21]. This asymmetry can appear on both AS-level and
router-level paths. In fact, the path followed by packets exchanged between end points along
one direction can be different from the one followed by packets going in the opposite direction.
Recent reports suggest that asymmetrical routing might be moving closer to the edge of the
internet than one might expect. For example, the analysis presented in [22] argues that this
practice is nowadays quite common even in ISPs directly serving campus-wide networks.

2.3 Flow metric

Definition 1. A flow metric is composed of traffic statistical features such as flow length and
flow duration. These features are highly correlated with the application type, so they can be
used as a flow metric to classify traffic by machine learning. There are currently two kinds of
flow metric: the unidirectional flow metric and the bidirectional flow metric.

Unidirectional flow metric

A uniflow (unidirectional, or one-way, flow) within a network is most likely the result of an
incorrect configuration, but may also be symptomatic of a larger problem related to the overall
routing architecture. Since network communications are bidirectional in nature, unidirectional
traffic patterns on a network mean that the traffic flow in one direction is not following the
same path as the other. By design, the least-cost route to a destination should also be the
desired return path. A uniclassifier (unidirectional classifier) is a classifier that uses the
unidirectional flow metric for its training set. The unidirectional flow metric adopted in this
paper is shown in Table 1.

Bidirectional flow metric

Biflow (bidirectional flow): A biflow is a Flow as defined in the IPFIX Protocol document
[RFC5101], composed of packets sent in both directions between two endpoints. A biflow is
composed from two uniflows such that:
1. the value of each Non-directional Key Field of each uniflow is identical to its counterpart
in the other, and
2. the value of each Directional Key Field of each uniflow is identical to its reverse direction
counterpart in the other.
A biclassifier (bidirectional classifier) is a classifier that uses the bidirectional flow
metric for its training set. The bidirectional flow metric adopted in this paper is shown in
Table 2.
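
The two pairing conditions above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Uniflow` record, its field names and the `pair_uniflows` helper are hypothetical, the transport protocol is assumed to be the only non-directional key field, and addresses and ports are assumed to be the directional key fields.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Uniflow:
    """Hypothetical uniflow record; field names are assumptions."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: str      # assumed non-directional key field
    packets: int
    n_bytes: int


FlowKey = Tuple[str, int, str, int, str]


def forward_key(f: Uniflow) -> FlowKey:
    # Directional key fields in the order they appear in this uniflow.
    return (f.src_ip, f.src_port, f.dst_ip, f.dst_port, f.proto)


def reverse_key(f: Uniflow) -> FlowKey:
    # Same fields with source and destination swapped.
    return (f.dst_ip, f.dst_port, f.src_ip, f.src_port, f.proto)


def pair_uniflows(
    flows: List[Uniflow],
) -> Tuple[List[Tuple[Uniflow, Uniflow]], List[Uniflow]]:
    """Pair uniflows whose directional keys are mutual reverses.

    Assumes each uniflow key occurs at most once in the input.
    Returns (biflows, unpaired uniflows).
    """
    pending: Dict[FlowKey, Uniflow] = {}
    biflows: List[Tuple[Uniflow, Uniflow]] = []
    for f in flows:
        match = pending.pop(reverse_key(f), None)
        if match is not None:
            biflows.append((match, f))
        else:
            pending[forward_key(f)] = f
    return biflows, list(pending.values())
```

On an asymmetrically routed link, the reverse direction of many conversations is never observed, so those uniflows stay in the unpaired list.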



Table 1: Unidirectional flow features

Feature                     Description
lport                       Low port number
hport                       High port number
duration                    Flow duration
Transproto                  Transport protocol used by the stream (TCP / UDP)
TCPflags                    TCP header flags; if the transport layer protocol is UDP, the feature is 0
pps                         packets/duration
bps                         bytes/duration
Mean packets arrived time   duration/packets
tos                         TOS from NETFLOW
Mean packet length          bytes/packets

Table 2: Bidirectional flow features

Feature                             Description
lport                               Low port number
hport                               High port number
duration                            Flow duration
Transprotocol                       Transport protocol used by the stream (TCP / UDP)
TCPflags1                           TCP header flags; if the transport layer protocol is UDP, the feature is 0
TCPflags2                           TCP header flags; if the transport layer protocol is UDP, the feature is 0
pps                                 packets/duration
bps                                 bytes/duration
Mean packets arrived time           duration/packets
Bidirectional Packets ratio         Forward packets / backward packets
Bidirectional Bytes ratio           Forward bytes / backward bytes
Bidirectional Packet length ratio   Bidirectional packets length ratio
Bidirectional packets               Forward packets + backward packets
Bidirectional bytes                 Forward bytes + backward bytes
tos                                 Bidirectional TOS OR from NETFLOW
Mean packet length                  Bidirectional bytes / Bidirectional packets
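
As an illustration of how the derived entries of Table 2 follow from per-direction counters, a minimal Python sketch is given below. The function name, the dictionary keys and the choice of returning 0.0 when a denominator is zero are assumptions of this sketch, not part of the original method.

```python
from typing import Dict


def bidirectional_features(
    fwd_packets: int,
    fwd_bytes: int,
    bwd_packets: int,
    bwd_bytes: int,
    duration: float,
) -> Dict[str, float]:
    """Compute the derived bidirectional features listed in Table 2."""

    def ratio(num: float, den: float) -> float:
        # Assumption: undefined ratios (zero denominator) are reported as 0.0.
        return num / den if den else 0.0

    packets = fwd_packets + bwd_packets   # Bidirectional packets
    n_bytes = fwd_bytes + bwd_bytes       # Bidirectional bytes
    return {
        "bidirectional_packets": packets,
        "bidirectional_bytes": n_bytes,
        "pps": ratio(packets, duration),                  # packets/duration
        "bps": ratio(n_bytes, duration),                  # bytes/duration
        "mean_packet_arrival_time": ratio(duration, packets),
        "mean_packet_length": ratio(n_bytes, packets),
        "packets_ratio": ratio(fwd_packets, bwd_packets),
        "bytes_ratio": ratio(fwd_bytes, bwd_bytes),
    }
```

For a uniflow the backward counters are zero, so the ratio features carry no information; this is one way to see why a classifier trained on bidirectional features may suit asymmetrically routed traffic poorly.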



3 Methodology

3.1 Auto Adaptive algorithm (AA)

In this paper, we propose an algorithm that automatically adjusts the flow metric to suit the
traffic being identified. The algorithm is called the auto adaptive (AA) algorithm. Its core
idea is that different traffic can select a different classifier with a different flow metric
(unidirectional flow or bidirectional flow).
Suppose there are n flow samples and each sample has p features; we then construct the n × p
flow matrix as follows:

\[
A = \begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1p} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{np}
\end{pmatrix} \tag{1}
\]

When the number of features p is very large, the dimensionality of the samples grows.
Theoretically, having more features should result in more discriminating power. However,
practical experience with machine learning algorithms has shown that this is not always the
case. Many learning algorithms can be viewed as making a (biased) probability estimate of a set
of features with the class label. This is a complex, high-dimensional distribution. The
existence of asymmetric routing also affects traffic identification, so we adopt an auto
adaptive method to deal with it. To describe the method, we introduce H, the proportion of
bidirectional flows, which is compared with a threshold:

\[
H = \frac{\text{Bidirection\_flow\_number}}{\text{total\_flow\_number}} \tag{2}
\]

Definition 2. Optimal threshold: the threshold used to evaluate the traffic identification
accuracy. It is the minimum threshold at which the accuracy is maximum; the value of H at that
point is the optimal threshold ε.

H is treated as a random variable, and the value of H that yields the best identification
results is selected as the optimal threshold ε. When H < ε, the algorithm chooses the
unidirectional flow metric and generates the unidirectional classifier; otherwise, it chooses
the bidirectional flow metric and generates the bidirectional classifier.
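
The selection rule built on formula (2) can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the string labels returned for the two classifiers are hypothetical, and the flow counts in the usage lines are placeholders.

```python
def bidirectional_share(bidirectional_flow_number: int, total_flow_number: int) -> float:
    """H from formula (2): the proportion of bidirectional flows."""
    if total_flow_number <= 0:
        raise ValueError("total_flow_number must be positive")
    return bidirectional_flow_number / total_flow_number


def select_classifier(h: float, epsilon: float) -> str:
    """Choose the classifier: unidirectional when H < epsilon, else bidirectional."""
    return "uniclassifier" if h < epsilon else "biclassifier"


# Placeholder counts: 40 bidirectional flows out of 100 observed flows.
h = bidirectional_share(40, 100)
choice = select_classifier(h, epsilon=0.5)
```

With H = 0.4 and ε = 0.5 the sketch selects the unidirectional classifier; a trace with H at or above ε would select the bidirectional one.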

Algorithm AA uses the two kinds of flow metric. The sequence of steps is shown in Figure 1.
The procedure builds two data sets, one for training and one for testing, and the AA algorithm
is trained and tested on these data. The process of machine learning identification is shown in
Figure 2:



1. Collecting traffic (Input): network data are collected from network traffic.
2. Selecting traffic features and training data for building the traffic classification model
(Data Processing): the known traffic features are optimally selected through traffic feature
selection algorithms. In this paper we only adopt two kinds of feature metric (unidirectional
metrics and bidirectional metrics), so no extra feature selection method is added. The traffic
classification model is built from the training data.
3. Classifying the traffic by a machine learning algorithm (Output): the machine learning
identification algorithm is used to classify the network traffic data and generate labeled
flows.

Figure 1: Traffic identification process of AA method

Figure 2: Process of machine learning traffic identification

3.2 Algorithm Evaluation

In this paper, we use the standard evaluation criteria to verify the effectiveness of our
identification algorithm. The effectiveness of a flow identification algorithm is currently
evaluated with three criteria, which rely on the following concepts:
- TP (true positive): The flows of application A are classified as A correctly, which is a
correct result for the identification;
- FP (false positive): The flows not in A are misclassified as A. For example, a non-P2P flow is
misclassified as a P2P flow. FP will produce false warnings for the identification system;
- FN (false negative): The flows in A are misclassified as some other category. For example, a
true P2P flow is not identified as P2P. FN will result in identification accuracy loss.

Table 3: NOC_SET dataset

AppID  Application  Protocol                             Flow number  Proportion (%)
1      WWW          HTTP                                 4943         64.6
2      Bulk         FTP                                  39           0.5
3      Mail         IMAP, POP3, SMTP                     91           1.19
4      P2P          BitTorrent, eDonkey, Gnutella, XunLei 1414        18.5
5      Service      DNS, NTP                             433          5.7
6      Interactive  SSH, CVS, pcAnywhere                 6            0.08
7      Multimedia   RTSP, Real                           20           0.3
8      Voice        SIP, Skype                           276          3.6
9      Others       games, attacks                       431          5.6

The calculating methods are as follows:
1. Precision: The percentage of samples classified as A that are really in class A

\[
\text{Precision} = \frac{TP}{TP + FP} \tag{3}
\]

2. Recall: The percentage of samples in class A that are correctly classified as A

\[
\text{Recall} = \frac{TP}{TP + FN} \tag{4}
\]

3. Overall accuracy: The percentage of samples that are correctly classified

\[
\text{Overall accuracy} = \frac{\sum_{i=1}^{n} TP_i}{\sum_{i=1}^{n} (TP_i + FP_i)} \tag{5}
\]
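
Formulas (3) to (5) can be computed directly from lists of true and predicted labels. The following Python sketch is an illustration only; the `evaluate` function name and the convention of returning 0.0 for a class with an empty denominator are assumptions.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence, Tuple


def evaluate(
    actual: Sequence[Hashable], predicted: Sequence[Hashable]
) -> Tuple[Dict[Hashable, float], Dict[Hashable, float], float]:
    """Return per-class precision, per-class recall and overall accuracy."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp: Counter = Counter()
    fp: Counter = Counter()
    fn: Counter = Counter()
    for a, p in zip(actual, predicted):
        if a == p:
            tp[a] += 1      # flow of class a classified as a
        else:
            fp[p] += 1      # flow not in class p classified as p
            fn[a] += 1      # flow of class a classified as something else
    classes = set(actual) | set(predicted)
    precision = {
        c: tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0 for c in classes
    }
    recall = {
        c: tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0 for c in classes
    }
    # Every sample is predicted as exactly one class, so the denominator of
    # formula (5) equals the total number of samples.
    denom = sum(tp[c] + fp[c] for c in classes)
    overall = sum(tp.values()) / denom if denom else 0.0
    return precision, recall, overall
```

Because each flow receives exactly one predicted label, the denominator of formula (5) is simply the number of classified flows, so overall accuracy is the fraction of flows labeled correctly.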

4 Experiment

4.1 Dataset

NOC_SET dataset

In order to validate the method and analyze the impact factor, we adopt NOC_SET as a dataset,
as shown in Table 3. We collected the data at Southeast University; the collection site is a
10G backbone channel on the Jiangsu Province border of CERNET. We adopt the DPI method to mark
flows, using our own l7_filter_modify software, which is developed based on L7filter [23], to
label the flows and generate the NOC_SET dataset.



LBNL_SET dataset

Table 4: LBNL_SET dataset

AppID  Category  Flow number  Proportion
1      80        15000        47.69%
2      110       1400         4.45%
3      25        1350         4.29%
4      139       3300         10.49%
5      993       400          1.27%
6      443       10000        31.8%

This LBNL_SET data is randomly sampled in several different periods from one node on the
internet. The LBNL traffic traces are collected at the Lawrence Berkeley National Laboratory
under the enterprise tracing project [24]. The packet traces are obtained at the two central
routers of the LBNL network and they contain more than one hundred hours of traffic generated
from several thousand internal hosts. The traffic traces are public, but they are completely
anonymized, so ascertaining the "ground truth" on the application behind each recorded flow
is not possible. Therefore, for this set, we built protocol sets according to the TCP destination
port number of each flow, an accepted practice in these cases [25]. We use the traffic traces
captured on January 6 and 7, 2005 to obtain the training and the optimization sets. Once again
we perform the training by using the most frequently used port numbers in the dataset. Details
of the LBNL_SET dataset are shown in Table 4.
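
The port-based labeling described above, restricted to the most frequently used ports, can be sketched as follows. This is a hypothetical illustration: the `label_by_port` helper, its `top_k` parameter and the use of `None` for flows outside the retained ports are assumptions, and the port list in the usage lines is made up.

```python
from collections import Counter
from typing import List, Optional, Sequence


def label_by_port(dst_ports: Sequence[int], top_k: int) -> List[Optional[int]]:
    """Label each flow with its TCP destination port if that port is among
    the top_k most frequent ports; otherwise leave it unlabeled (None)."""
    counts = Counter(dst_ports)
    kept = {port for port, _ in counts.most_common(top_k)}
    return [port if port in kept else None for port in dst_ports]


# Made-up destination ports of seven flows.
labels = label_by_port([80, 80, 80, 443, 443, 25, 12345], top_k=2)
```

Flows on rarely used ports are left unlabeled and would be excluded from the protocol sets in this sketch.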

CAIDA dataset

We built this data set starting from three hour-long traces obtained from the Cooperative
Association for Internet Data Analysis (CAIDA) [26], collected at the AMES Internet Exchange
(AIX) along an OC48 link on Mar 24, 2011. We use flows extracted from the first hour
(corresponding to the interval 16:15-17:00 UTC) to build the training set and the optimization
set, and flows from the third hour (18:00-18:10 UTC) to build the evaluation set. As for the
previous set, these traces are also anonymized, so port numbers are used as indicators of each
protocol. The selection of flows composing the training, optimization and evaluation sets is
shown in Table 5.

Table 5: CAIDA_SET dataset

AppID  Category  Flow number  Flow (%)  Packets (%)  Bytes (%)
1      80        328091       84.69     81.74        81.58
2      110       11539        0.6       0.24         0.25
3      21        28567        3.32      0.03         0.09
4      25        2648         4.57      2.47         2.72
5      4662      2099         0.79      1.34         1.35



Figure 3: Comparison of FSEs for traffic. Panels: (a) Flows, (b) Bytes. Horizontal axis:
traffic (NJUCernet, JSUCernet, Caida-chicago, Caida-sanjose); vertical axis: FSE (%).

Table 6: Overall identification accuracy of AA, Biclassifier and Uniclassifier

Identification  Overall accuracy
AA              99.6742%
Biclassifier    88.2%
Uniclassifier   89.2%

4.2 Impact of asymmetric routing on traffic identification

In this paper, we use experimental data from the NOC_SET and CAIDA data sets, and use MATLAB,
WEKA and the corresponding algorithms to identify network traffic data [27]. The NOC_SET data
are first divided into two parts, containing 20% and 80% of the data, and we compare our AA
method with the biclassifier and the uniclassifier. To evaluate the effectiveness of the AA
method, we study the distribution of the identification results. To analyze asymmetric routing,
we first remove from the traces any traffic that is inherently asymmetric, such as UDP and
ICMP, which do not always expect packet recipients to reply and which would mislead symmetry
comparisons if they appeared in different magnitudes across networks. TCP background radiation,
such as network scanning and probing, can also be a substantial fraction of total inherently
asymmetric flows on some links, although it is usually a much lower proportion of bits. We
adopt the Flow-based Symmetry Estimator (FSE) [28], a simple method that estimates the level of
routing symmetry from passively measured flow data, to evaluate the degree of impact on
traffic. From Figure 3 and Figure 4 we can see that different traffic has different FSE values,
and that the FSE of CAIDA traffic is lower. This indicates that asymmetric routing is more
pronounced in CAIDA traffic than in NOC_SET.
From Table 6 we can see that the overall accuracy of the AA method is better than that of the
biclassifier and the uniclassifier. We adopt the AA method to classify traffic based on the
NOC_SET data and select the parameter ε=0.5 (a detailed analysis is given in Section 4.6). The
data are divided into 9 categories: WWW, Mail, Bulk, Service, P2P, Interactive, Voice,
Multimedia and Others.

Table 6 indicates that the AA algorithm achieves a better result than the biclassifier and
uniclassifier methods. Moreover, it can be seen from Table 7 that the precision and recall of
P2P and Voice are greatly improved. The reason for the high accuracy is that P2P and Voice
account for a relatively small proportion of the total set, and the impact on the
identification results of the imbalance in the sample proportions is reduced to a minimum. This
paper also builds the NOC_SET dataset from bidirectional flow characteristics.

Table 7: Identification performance for NOC_SET (Precision and Recall)

            biclassifier        uniclassifier       AA
Category    Precision  Recall   Precision  Recall   Precision  Recall
WWW         98%        100%     99%        100%     98.5%      99.2%
P2P         58%        100%     75%        100%     93.7%      91.2%
Mail        83%        91.3%    90%        99%      100%       100%
Service     58.90%     100%     70%        99%      90%        90.4%
Inter       84.5%      100%     87%        100%     80%        100%
Multimedia  100%       75%      90%        80%      60%        100%
Voice       35%        50%      45%        55%      37%        50%
Others      44%        46%      48%        77%      45%        60%

4.3 Comparison of identification algorithm with NOC-SET dataset

The experimental data are the NOC_SET data set (Table 3). The analysis data are actual measured
IP traces [29], in which about 40% of the flows are biflows. The NOC_SET dataset is composed of
biflow features. Biflows carry more information for traffic identification; if the biclassifier
is used to classify the traffic, the identification result will be improved. In this section,
we compare the AA algorithm with the biclassifier and the uniclassifier. The traffic
identification results are shown in Table 7. They indicate that AA achieves better accuracy
than the biclassifier and the uniclassifier. For the Inter and Service categories, however, the
identification accuracy of AA is lower than that of the other methods. From the Service to the
Inter category, the precision of the biclassifier and uniclassifier methods is reduced, while
that of AA increases, so the biclassifier and uniclassifier methods are easily affected by the
number of training samples, while AA is not vulnerable to the composition of the training
sample set. Among the three identification algorithms, AA, biclassifier and uniclassifier, the
overall accuracy of the AA algorithm is the highest.

4.4 Comparison of identification algorithm with CAIDA_SET dataset

The experimental data are the CAIDA_SET data set (Table 5). The analysis data are actual
measured IP traces [29]. The two core links are part of an OC192 Tier1 backbone operated by a
commercial ISP in the U.S. The first link connects Chicago and Seattle, monitored at an Equinix
data center in Chicago. The other one connects San Jose and Los Angeles, monitored at a data
center in San Jose. On those links, TCP is responsible for about 50% of flows, which is 85% of
packets and 93% of bytes on average. UDP carries about 45% of flows (13% of packets and 6% of
bytes). We adopt the port-based method to mark flows and generate the CAIDA_SET dataset, in
which about 10% of the flows are biflows. The CAIDA_SET dataset is composed of uniflow
features. Biflows carry more information for traffic identification; if the biclassifier is
used to classify the traffic, the identification result will be improved. In this section, we
compare the AA algorithm with the biclassifier and the uniclassifier. The traffic
identification results are shown in Table 8.



Table 8: Identification performance for CAIDA_SET (Precision and Recall)

                  biclassifier        uniclassifier       AA
Category          Precision  Recall   Precision  Recall   Precision  Recall
80                92%        98%      98%        97%      96.5%      98.2%
110               63%        97%      83%        99%      95.7%      92.2%
21                82%        88.3%    92%        98%      99%        99%
25                60.80%     99%      72%        98%      92%        92.4%
4662              82.4%      99%      89%        98%      82.9%      99.2%
Overall Accuracy  65.72%              94.1342%            95.8921%

Table 9: Identification performance for LBNL_SET (Precision and Recall)

                  biclassifier        uniclassifier       AA
Category          Precision  Recall   Precision  Recall   Precision  Recall
80                96%        98%      97%        93%      96.5%      98.2%
110               78%        90%      85%        90%      92.5%      83.2%
25                88%        82.7%    89%        87%      97%        99%
139               59.80%     98%      78%        92%      93%        91.6%
993               86.5%      99%      79%        99%      87%        99%
443               88.5%      99%      89%        99%      84%        99%
Overall Accuracy  68.83%              93.237%             95.861%

As shown in Table 8, the identification results indicate that AA achieves better accuracy than
the biclassifier and the uniclassifier. According to the analysis of the traffic results in
Section 4.4, CAIDA exhibits the same phenomenon of unbalanced sample data. Thus the
biclassifier and uniclassifier methods are easily affected by the number of training samples,
while AA is not vulnerable to the composition of the training sample set. Among the three
identification algorithms, AA, biclassifier and uniclassifier, the overall accuracy of the AA
algorithm is the highest.

4.5 Comparison of identification algorithm with LBNL_SET dataset

We obtained the LBNL data from the Lawrence Berkeley National Laboratory and constructed the
bidirectional and unidirectional flow metrics. We trained on the two metrics separately to
generate the biclassifier and the uniclassifier. We compute the value of H with formula (2) in
Section 3 and adopt the AA method to select the classifier, either uniclassifier or
biclassifier. The experimental results are shown in Table 9. From the results we can see that
the biclassifier and uniclassifier methods are affected by unbalanced sample data, while the AA
method can overcome the problem and improve the traffic identification results.



4.6 Impact of ε on traffic identification

In this paper we propose the AA method to adaptively select the classifier (biclassifier or
uniclassifier), and the threshold ε is a parameter of the AA method. ε decides which classifier
is selected, so it is very important for traffic identification. In this section, we analyze
the impact of ε on traffic identification. The experiment applies the proposed AA method while
varying ε over [0.1, 1] on three datasets (NOC_SET, CAIDA, LBNL_SET). From Figure 4 we can see
that the overall accuracy of CAIDA and NOC_SET changes the most when ε varies from 0.1 to 1.
The overall accuracy of CAIDA shows an increasing tendency, while that of NOC_SET is
decreasing. A possible reason is that the CERNET network contains more symmetric routing and
less asymmetric routing, whereas the collection point of the CAIDA data has more asymmetric
routing. Thus, when the threshold ε is very small, the biclassifier is more likely to be
selected. As mentioned, the collection point of NOC_SET is the CERNET network, which contains
more symmetric routing and therefore has more bidirectional flow metrics, so NOC_SET shows a
decreasing tendency, and its overall accuracy is maximum when ε=0. When ε=0.5, the overall
accuracy of CAIDA and NOC_SET is equal. LBNL has no obvious asymmetric routing, so its overall
accuracy changes only gently.

Figure 4: The identification results with ε. Horizontal axis: ε (0.1 to 1); vertical axis:
Overall Accuracy (0.5 to 1); series: NOC-SET, CAIDA, LBNL.
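
Given overall accuracy measured at several values of ε, the optimal threshold of Definition 2 (the minimum threshold at which accuracy is maximum) can be picked as in the Python sketch below. The `optimal_threshold` helper and its tolerance parameter are hypothetical, and the accuracy values in the usage lines are invented for illustration; they are not results from this paper.

```python
from typing import Dict


def optimal_threshold(accuracy_by_epsilon: Dict[float, float], tol: float = 1e-12) -> float:
    """Return the smallest epsilon whose accuracy equals the maximum accuracy.

    tol is an assumed tolerance for treating two accuracies as equal.
    """
    if not accuracy_by_epsilon:
        raise ValueError("no measurements given")
    best = max(accuracy_by_epsilon.values())
    return min(e for e, acc in accuracy_by_epsilon.items() if best - acc <= tol)


# Invented accuracy curve, for illustration only.
curve = {0.1: 0.90, 0.3: 0.95, 0.5: 0.97, 0.7: 0.97, 0.9: 0.93}
eps = optimal_threshold(curve)
```

When several thresholds tie for the best accuracy, as 0.5 and 0.7 do in the invented curve, the smaller one is returned, which matches the "minimum threshold" wording of Definition 2.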

5 Conclusion

In this paper we propose the auto adaptive (AA) algorithm and, on this basis, introduce the
biclassifier and the uniclassifier. We adopt the AA method to classify traffic with MOORE_SET
as the data set and compare it with two other methods, the biclassifier and the uniclassifier.
The results show that the AA method greatly improves identification accuracy. To further prove
that the AA method is effective, this paper collects data at the Jiangsu provincial network
border and organizes the traces into flow records as the NOC_SET data set. The experimental
results show that the AA method has high identification accuracy. We also analyze the impact of
ε on traffic identification and find that ε=0.5 can be used as a fixed value that yields better
identification results.



Acknowledgments

This paper is supported by Education Department of Henan Province Science and Technol-
ogy Key Project Funding (14A520065) and Research Innovation of Zhoukou Normal University
(zknuA201408).

Bibliography

[1] T. Karagiannis, K. Papagiannaki, M. Faloutsos (2005); Blinc: multilevel traffic classification
in the dark, in: ACM SIGCOMM Computer Communication Review, ACM, 35: 229–240,
DOI:10.1145/1080091.1080119.

[2] A. Moore, K. Papagiannaki (2005); Toward the accurate identification of network applica-
tions, PAM’05 Proceedings of the 6th international conference on Passive and Active Network
Measurement, 41–54.

[3] A. Moore, D. Zuev (2005); Internet traffic classification using bayesian analysis tech-
niques, in: ACM SIGMETRICS Performance Evaluation Review, ACM, 33:50–60,
DOI:10.1145/1064212.1064220.

[4] L. Bernaille, R. Teixeira, K. Salamatian (2006), Early application identification, in: Proceed-
ings of the 2006 ACM CoNEXT conference, ACM, DOI:10.1145/1368436.1368445.

[5] Wolfgang John, Sven Tafvelin (2007); Differences between in- and outbound internet back-
bone traffic, in: Proceedings of Terena Networking Conference, TERENA, 1-14.

[6] Hot-potato routing, http://en.wikipedia.org/wiki/Hot-potato_routing.

[7] N. Williams, S. Zander, G. Armitage, Evaluating machine learning algorithms for automated
network application identification, Center for Advanced Internet Architectures, CAIA, Tech-
nical Report 060410B, DOI:10.1.1.84.7170.

[8] N. Williams, S. Zander, G. Armitage (2006), A preliminary performance comparison of five
machine learning algorithms for practical ip traffic flow classification, ACM SIGCOMM Com-
puter Communication Review 36(5):5–16, DOI: 10.1145/1163593.1163596.

[9] Z. Li, R. Yuan, X. Guan (2007), Accurate classification of the internet traffic based on the
svm method, in: Communications, 2007. ICC’07. IEEE International Conference on, IEEE,
1373–1378, DOI: 10.1109/ICC.2007.231.

[10] P. Teufl, U. Payer, M. Amling, M. Godec, S. Ruff, G. Scheikl, G. Walzl (2008), Infect-network
traffic classification, in:Networking, 2008. ICN 2008. Seventh International Conference on,
IEEE, 439–444, DOI: 10.1109/ICN.2008.42.

[11] T. Kiziloren, E. Germen (2007), Network traffic classification with self organizing maps,
in: Computer and information sciences, 2007. iscis 2007. 22nd international symposium on,
IEEE, 1–5, DOI: 10.1109/ISCIS.2007.4456852.

[12] Y. Lim, H. Kim, J. Jeong, C. Kim, T. Kwon, Y. Choi (2010), Internet traffic classification
demystified: on the sources of the discriminative power, in: Proceedings of the 6th Interna-
tional COnference, ACM, DOI: 10.1145/1921168.1921180.



[13] H. Kim, K. Claffy, M. Fomenkov, D. Barman, M. Faloutsos, K. Lee (2008); Internet traffic
classification demystified: myths, caveats, and the best practices, in:Proceedings of the 2008
ACM CoNEXT conference, ACM, DOI: 10.1145/1544012.1544023.

[14] J. Erman, M. Arlitt, A. Mahanti (2006), Traffic classification using clustering algorithms,
in: Proceedings of the 2006 SIGCOMM workshop on Mining network data, ACM, 281–286,
DOI: 10.1145/1162678.1162679.

[15] V. Carela-Espanol, P. Barlet-Ros, J. Solé-Pareta (2009), Traffic classification with sampled
netflow, DOI:10.1.1.390.5780.

[16] T. Nguyen, G. Armitage (2008), A survey of techniques for internet traffic classification
using machine learning, Communications Surveys & Tutorials, IEEE, 10(4):56–76.

[17] A. Callado, C. Kamienski, G. Szabó, B. Gero, J. Kelner, S. Fernandes, D. Sadok (2009),
A survey on internet traffic identification, Communications Surveys & Tutorials, IEEE,
11(3):37–52.

[18] M. Zhang, W. John, K. Claffy, N. Brownlee (2009), State of the art in traffic classifica-
tion: A research review, in:PAM ’09: 10th International Conference on Passive and Active
Measurement, Student Workshop, Seoul, Korea.

[19] A. Dainotti, A. Pescape, K. Claffy (2012), Issues and future directions in traffic classification,
Network, IEEE, 26(1):35–40.

[20] Z. Mao, L. Qiu, J. Wang, Y. Zhang (2005), On as-level path inference, in: ACM SIGMET-
RICS Performance Evaluation Review, ACM, 33:339–349.

[21] Y. He, M. Faloutsos, S. Krishnamurthy (2004), Quantifying routing asymmetry in the in-
ternet at the as level, in: Global Telecommunications Conference, GLOBECOM’04. IEEE, 3:
1474–1479.

[22] W. John (2008), On measurement and analysis of internet backbone traffic, Thesis for the
degree of Licentiate of Engineering, a Swedish degree between M.Sc. and Ph.D., Chalmers
University of Technology.

[23] J. Levandoski, E. Sommer, M. Strait, et al.(2008), Application layer packet classifier for
linux, http://l7-filter.sourceforge.net/.

[24] *** Lbnl/icsi enterprise tracing project, http://www.icir.org/enterprisetracing.

[25] T. Karagiannis, A. Broido, M. Faloutsos, et al. (2004), Transport layer identification of
p2p traffic, in: Proceedings of the 4th ACM SIGCOMM conference on Internet measurement,
ACM, 121–134, DOI: 10.1145/1028788.1028804.

[26] *** The cooperative association for internet data analysis(caida), http://www.caida.org.

[27] T. Nguyen, G. Armitage (2006), Training on multiple sub-flows to optimise the use of ma-
chine learning classifiers in real-world ip networks, in: Local Computer Networks, Proceedings
2006 31st IEEE Conference on, IEEE, 369–376, DOI: 10.1109/LCN.2006.322122.

[28] W. John, M. Dusi, K. Claffy (2010), Estimating routing symmetry on single links by passive
flow measurements, in: Proceedings of the 6th International Wireless Communications and
Mobile Computing Conference, ACM, 473–478, DOI: 10.1145/1815396.1815506.

[29] *** IP Trace Distribution System, http://iptas.edu.cn/src/system.php.