International Journal of Interactive Mobile Technologies (iJIM) – eISSN: 1865-7923 – Vol  17 No  04 (2023)


Paper—Divergence Based Feature Selection for Pattern Recognizing of the Performance of Intrusion… 

Divergence Based Feature Selection for Pattern Recognizing of the
Performance of Intrusion Detection in Mobile Communications Merged
with the Computer Communication Networks

https://doi.org/10.3991/ijim.v17i04.37733  

N. Chitra, Safinaz S., K. Bhanu Rekha
Department of Electronics and Communication Engineering, Presidency University, Bangalore, India
chitravadde24@gmail.com 

Abstract: Feature selection is the procedure in which a dataset is simplified
by removing irrelevant features. The divergence method is one such strategy: it
measures the relation between the attributes and the class to quantify how much
each attribute contributes to performance. Mobile networks are now integrated
with other networks, such as computer networks, and carry all kinds of data, so
the attacks known as intrusions in computer networks apply equally to mobile
communications. This paper therefore considers intrusion detection for a mobile
interaction network that is merged with another mode of communication. The
proposed algorithm performs feature selection using the divergence evaluation
method to reduce the feature set. The 10% KDDCUP99 data set was used to
evaluate the proposed algorithm, and the performance metrics were computed
using the C4.5 classifier. The metrics TPR, FPR and consistency were compared
with those of the mutual information based methods DMIFSA, RMIFSA and MMIFSA.
The proposed method, implemented in Python 3.8, achieved a high accuracy of
99.94% compared with the other methods while also reducing the redundant
features. Its accuracy remains almost constant from 4 features to 10 features,
unlike the other methods, which indicates that the system is stable.

Keywords: divergence, feature selection, C4.5 classifier, mobile
communications, TPR (True Positive Rate), FPR (False Positive Rate),
accuracy, stability

1 Introduction  

Modern communication merges different transmission platforms into one through
compatibility methods. Different forms of data, such as voice, video, remote
sensor information and sensor security data, are transmitted to their targets
over vast infrastructures. There are situations, however, in which unauthorized
persons access or steal the data for personal benefit. Access to the data by
strangers is termed illegal access; it can take control of the network, raise
alarms and steal personal data, which harms society. To detect anomalous
behavior in the network, the characteristics that lead to disruption of the
communication are identified; these are known as attack features. They are
identified using various methods such as mutual information, entropy and gain
ratio.

The methods of implementing feature selection fall into filter, embedded and
wrapper methods. In the filter method, each feature is evaluated individually
for its relation with the class and the better one [1] is selected, which
requires less evaluation time. The filter method does not depend on a predefined
machine learning algorithm: no learning algorithm is used apart from the
computation, evaluation and comparison of the metrics. It is therefore more
efficient, and the metrics are computed quickly [2]. In the embedded method,
feature selection is applied within the model, and system performance is
improved by selecting a good subset of features [3]. Sequential forward
selection and genetic algorithms [1] are common wrapper-based feature selection
techniques. This type of technique, however, carries a greater risk of
overfitting and is computationally very costly, necessitating a lengthy training
period. The features selected are directly affected by the sequence in which
subsets are entered into wrapper methods. The wrapper uses the learning method
to choose the features to incorporate into the model [4].

Wrapper methods generate subsets in three ways. Forward selection: the set is
initially empty, and features are added repeatedly according to the minimal
estimation value until the set of chosen features meets the evaluation criterion
[5]. Backward elimination: the set initially contains all of the features, and
during the selection process the features with the greatest evaluation value are
eliminated in steps as unnecessary. Two-way elimination: this method is similar
to forward selection, except that before adding a new feature it checks the
features that have already been selected; if one of them has become
insignificant, it is removed via the backward elimination process.

The performance of the MI based mRMR algorithm [6] degrades when there are many
attributes and highly correlated features are selected. Supervised feature
selection algorithms assess the significance of the features in relation to the
class data, whereas unsupervised algorithms assess the value of features without
labels and may take advantage of data variance or data distribution.
Semi-supervised feature selection algorithms [3] use a modest quantity of
labelled data to enhance the performance of unsupervised learning methods.

The detailed process of feature selection, applied to network attacks to
identify the culprit, is shown in Figure 1. Subset generation: Subset generation
is the process of finding well-performing subsets of features and sending them
as input to the scoring process. The feature selection process begins with
subset generation, which involves three different methods. Forward selection:
The subset starts empty, and candidate features are added to it one by one as
they are selected. Backward selection: The subset starts as the entire feature
set, and features are removed one by one as they are found to be irrelevant.
Random selection: A subset of features is randomly generated from the data set,
and features are systematically removed or included one after the other.
Evaluation method: This method uses a stopping or evaluation function to find
the association between features and classes. The evaluated value is compared
with an optimal value set before the evaluation. Validation procedure: The
validation procedure applies a classifier to the selected features. The data is
divided into a training part and a testing part; typically 70% of the data is
used for training and 30% for testing. The classification results obtained with
the selected features are compared with those obtained on the original dataset.
The better the evaluation function, the better the validation result.
Validation is not itself part of the feature selection process, but it is
important for demonstrating the effectiveness of the selected features.

 
Fig. 1. Feature selection process 

2 Literature review  

To quantify data relevance, Hwawen et al. related the information content of
features and classes to each other; this quantity, called mutual information, is
the basis of the dynamic mutual information feature selection algorithm (DMIFS)
[7]. Data relevance reflects the correlation between data and classes. The first
subset of features is chosen from the mutual information between classes and
features. The highest values are taken repeatedly, and the final features are
evaluated using an evaluation function that serves as the stopping condition.
Relationships between features are important when selecting the best features,
and the weight of feature-class interactions is very important. Mutual
information relates variables and indicates the uncertainty of their occurrence.

Let X and Y be two random variables associated with the probabilities of
discrete or continuous occurrences,

where $X = (x_1, x_2, x_3, x_4, \ldots, x_N)$ and $Y = (y_1, y_2, y_3, y_4, \ldots, y_N)$. The entropy
of each quantity, which gives its average information, is given by equations
(1) and (2).

$$H(X) = -\sum_{i=1}^{N} P(x_i) \log P(x_i) \tag{1}$$

$$H(Y) = -\sum_{i=1}^{N} P(y_i) \log P(y_i) \tag{2}$$

Where $P(x_i)$ = number of instances of $x_i$ / total number of instances (N)
Where $P(y_i)$ = number of instances of $y_i$ / total number of instances (N).

$$H(X|Y) = -\sum_{x \in X} \sum_{y \in Y} P(x, y) \log P(x|y) \tag{3}$$

Where $P(x, y)$ is the joint distribution function of x and y,
$H(X|Y)$ is the conditional entropy of X given Y, and
$P(x|y)$ is the conditional probability of x when y is given.
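As a small worked illustration of equations (1) to (3), the following sketch estimates the probabilities by relative frequency and derives the mutual information as I(X;Y) = H(X) - H(X|Y). The logarithm base (2) and the toy feature and label values are assumptions made for the example; the paper does not state them.

```python
import math
from collections import Counter


def entropy(values):
    """H(X) with P(x) estimated as count(x) / N, in bits (base-2 log assumed)."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def conditional_entropy(xs, ys):
    """H(X|Y) = -sum over (x, y) of P(x, y) * log P(x|y)."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    y_counts = Counter(ys)
    # P(x, y) = c / n and P(x|y) = c / count(y)
    return -sum((c / n) * math.log2(c / y_counts[y]) for (_, y), c in joint.items())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) - H(X|Y)."""
    return entropy(xs) - conditional_entropy(xs, ys)


# Hypothetical toy data: a protocol feature that fully determines the label.
feature = ["tcp", "tcp", "udp", "udp"]
label = ["attack", "attack", "normal", "normal"]
print(entropy(feature))                    # 1.0
print(mutual_information(feature, label))  # 1.0
```

When the feature and the label are independent, the same function returns 0, which is why mutual information is used as a relevance measure.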
In Modified Mutual Information feature selection (MMIFSA) [8], the second set of
features is selected by the product of the mutual information between class and
features and the mutual information between input features and selected
features. MMIFSA is further modified so that the second set of features is
chosen by normalising the MI between candidate features and selected features by
the entropy of the features of the dataset. Too many or too few attributes in
the dataset make evaluation difficult and degrade system performance. The
problem [9] that gain ratio rises above mutual information and lowers system
performance is addressed by mRMR (minimal redundancy and maximum relevance). A
better way to quantify relevance and redundancy when choosing mRMR features has
also been proposed: the best features are selected using the Pearson correlation
coefficient to measure redundancy and the 'R' value to measure relevance. In
this scheme the Pearson coefficient and the 'R' value are computed in the role
that mutual information plays in mRMR. Evaluating MI requires categorical
values, and discretising continuous data for this purpose loses information. The
'R' value does not require discretisation and is therefore more beneficial than
mutual information.

Maximum relevance and minimum redundancy feature selection methods have also
been applied to a marketing machine learning platform, where a large number of
features is available for products sold and marketed online. Feature selection
is applied to remove [2] the features that belong to the unwanted group. The FCD
(F-test correlation difference) and FCQ (F-test correlation quotient) schemes
are used to find the correlation and the redundancy factor. With the randomised
dependence coefficient (RDC), the correlation measure is replaced and the
relevance factor is greatly improved. In [10, 11] the features are selected
using MIFS-ND (mutual information feature selection, non-dominated), in which
mutual information is used for the minimal redundancy criterion. The authors in
[12] selected the optimal genes by first grouping or clustering genes of the
same type and then selecting the highly correlated genes. Different evaluation
measures have been used to recognize the best-performing attributes, such as
mutual information, gain ratio and distance-based methods [13], and expected
divergence has also been used to evaluate the best feature. Jin Xu et al.
proposed RRPC (mRMR), an incremental search method for optimal selection of
features, in which the Pearson correlation coefficient is used to find the
best-performing features. Three datasets, Corral, Corral-4r and Corral-46, were
used, and the method was compared with mRMR and Fisher methods, with a
semi-supervised FS method such as Select, and with an unsupervised method such
as the Laplacian method. In Corral-47, 12 of the 47 features contributed.

Hanchuan Peng et al. [14] derived the equations for the minimal redundancy and
maximum relevance method, implemented a two-stage feature selection algorithm,
and used Naive Bayes, SVM and LDA classifiers. Minimum redundancy is evaluated
by the average mutual information between features. Maximum relevance is
evaluated by the mutual information between a feature and the target class.

Minimum redundancy: $\min R(S)$,

$$R = \frac{1}{|S|^2} \sum_{x_i, x_j \in S} I(x_i, x_j) \tag{4}$$

$\max \varphi(D, R)$, $\varphi = D - R$

$$\max_{x_j \in X - S_{m-1}} \left[ I(x_j; Y_L) - \frac{1}{m-1} \sum_{x_i \in S_{m-1}} I(x_j; x_i) \right] \tag{5}$$

Where Fs is the selected feature and Fi is the feature instance. Fs is the subset
containing the selected features.

In mRMR we have (k-1) features in the subset $F_{K-1}$, and the next feature
$F_k$ is obtained with φ(D,R) as:

$$F_k = \arg\min_{F_j \in F - F_{K-1}} \left[ \sum_{j=1}^{n} D_{KL}(F_j, F_s) - \frac{1}{K-1} \sum_{F_i \in F_{K-1}} D_{KL}(F_j, F_i) \right] \tag{6}$$

Where $F_s$ denotes the selected features with minimum divergence from the class.
The contributions of the paper are:

1. A feature selection method is proposed in which the relation between features
is evaluated using the divergence between them.

2. A divergence parameter is proposed that can be used to find the best
parameters.

3. The suggested method is compared with other methods in terms of TPR, FPR and
accuracy.

4. Performance metrics are evaluated for all the methods and presented
graphically.


3 Proposed method 

3.1 Formulation of method 

The proposed method uses divergence, which depicts the relation between the
feature attributes in terms of a divergence value. In the correlation method,
features are closely related if the correlation between them is at its maximum
[15], and less related if the correlation is lower. In the divergence approach,
features are considered closely related if there is no divergence between them,
and least related if the divergence is at its maximum.

Let P and Q be the variables; the relation between them is given in terms of the
Kullback-Leibler divergence as:

$$D_{KL}(P, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)} \tag{7}$$

Under supervised learning, the training data contains the labelled variables P
and Q, which hold the features $F_i$ and the target $Y_L$ as the elements of the
vectors. So

$$D(P, Q) = \frac{1}{|F|} \sum_{F_i \in F} D_{KL}(F_i, Y_L) \tag{8}$$

So the most related features are given by the minimum value of

$$D_{KL}(F_i, Y_L) \tag{9}$$
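The Kullback-Leibler divergence between two discrete distributions can be computed as in the sketch below. It assumes both distributions are given as probability lists over the same support, uses the natural logarithm, and floors Q(x) at a small `eps` to avoid division by zero; the function name, `eps` and the example distributions are illustrative choices, not taken from the paper.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """D_KL(P, Q) = sum over x of P(x) * log(P(x) / Q(x)).

    `p` and `q` are probability lists over the same support. Terms with
    P(x) = 0 contribute nothing; Q(x) is floored at `eps` (an assumption
    of this sketch) so the ratio is always defined.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, p))  # 0.0
print(kl_divergence(p, q))
print(kl_divergence(q, p))
```

The divergence of a distribution from itself is zero, which matches the statement that features with no divergence are the most closely related. The last two calls give different values because the measure is not symmetric.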

Redundant features are repetitive features that are unnecessary and should be 
deleted because they won't enhance the functionality of the system. So the redundant 
features are evaluated as: 

$$D_1(f_1, f_n) = D_{11}(f_1, f_1) + D_{12}(f_1, f_2) + D_{13}(f_1, f_3) + D_{14}(f_1, f_4) + \cdots + D_{1n}(f_1, f_n)$$

$$D_2(f_2, f_n) = D_{21}(f_2, f_1) + D_{22}(f_2, f_2) + D_{23}(f_2, f_3) + D_{24}(f_2, f_4) + \cdots + D_{2n}(f_2, f_n)$$

$$D_3(f_3, f_n) = D_{31}(f_3, f_1) + D_{32}(f_3, f_2) + D_{33}(f_3, f_3) + D_{34}(f_3, f_4) + \cdots + D_{3n}(f_3, f_n)$$

$$\vdots$$

$$D_n(f_n, f_n) = D_{n1}(f_n, f_1) + D_{n2}(f_n, f_2) + D_{n3}(f_n, f_3) + D_{n4}(f_n, f_4) + \cdots + D_{nn}(f_n, f_n)$$

$$D(f_i, f_n) = \sum_{f_j \in F} D(f_i, f_j) \tag{10}$$

The features f1, f2, f3 are the attributes of the given dataset.
The least information-bearing redundant features can be evaluated as given below.
The maximum relation between features, which gives the relevance between them,
is evaluated by the divergence between the selected features and the given
feature.

$$D_M(f_1, F_s) = D_M(f_1, F_{s1}) + D_M(f_1, F_{s2}) + \cdots + D_M(f_1, F_{sk})$$

$$D_M(f_2, F_s) = D_M(f_2, F_{s1}) + D_M(f_2, F_{s2}) + \cdots + D_M(f_2, F_{sk})$$

$$D_M(f_3, F_s) = D_M(f_3, F_{s1}) + D_M(f_3, F_{s2}) + \cdots + D_M(f_3, F_{sk})$$

$$\vdots$$

 
The divergence ratio is the extent to which a feature deviates from the most
related feature; it measures the degree of closeness to the best features and is
denoted $D_{ratio}$.

$$A = \min\Big( D(f_i, F_s) + \sum_{f \in F_s} \sum_{f_i \in F} D(F_s, f_i) \Big) \tag{12}$$

$$D_{ratio}(f_i) = \frac{\min D(f_i, F_S) - D(f_i, f)}{A} \tag{13}$$

Proposed algorithm: KL divergence based feature selection

1) Input: The given dataset F has n features and m instances, and Y holds the
   target class C for the m instances. The feature set is denoted F.
   Output: selected features FS
2) Evaluate the divergence of each feature with the class.
3) Take the features having the least divergence as the subset Fs; the remaining
   available subset is Fb = F - Fs.
4) Select the feature that satisfies

$$F_k = \arg\min_{F_j \in F - F_{K-1}} \left[ \sum_{j=1}^{n} D_M(F_j, F_s) - D_{ratio} \cdot \frac{1}{K-1} \sum_{F_i \in F_{K-1}} D(F_j, F_i) \right]$$

5) Fs = Fs ∪ FK and Fb = F - Fs
6) Output: the final FS, that is, the selected features.
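The steps of the algorithm can be sketched as follows. This is a simplified reading, not the authors' code: each column is assumed to be non-negative and is normalised into a probability vector over the instances, a constant `ratio` stands in for the $D_{ratio}$ term, and the names `normalize`, `kl_feature_selection`, `n_seed`, `k` and the toy columns are hypothetical.

```python
import math


def normalize(column, eps=1e-12):
    """Turn a non-negative column into a probability vector (assumption:
    the column is treated as a distribution over the instances)."""
    shifted = [v + eps for v in column]  # eps keeps every entry positive
    total = sum(shifted)
    return [v / total for v in shifted]


def kl(p, q):
    """D_KL(P, Q) for two strictly positive probability vectors."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def kl_feature_selection(columns, target, n_seed, k, ratio=0.5):
    """Greedy KL-based selection returning `k` feature names.

    `ratio` is a constant placeholder for the paper's D_ratio term.
    """
    dists = {name: normalize(col) for name, col in columns.items()}
    y = normalize(target)
    # Step 2: divergence of each feature with the class.
    class_div = {name: kl(d, y) for name, d in dists.items()}
    ranked = sorted(class_div, key=class_div.get)
    # Step 3: least-divergent features form the seed subset Fs.
    seed = ranked[:n_seed]
    selected = list(seed)
    remaining = ranked[n_seed:]
    # Steps 4-5: add the candidate minimising the criterion, then update.
    while remaining and len(selected) < k:
        def criterion(name):
            to_seed = sum(kl(dists[name], dists[s]) for s in seed)
            mean_sel = sum(kl(dists[name], dists[s]) for s in selected) / len(selected)
            return to_seed - ratio * mean_sel
        best = min(remaining, key=criterion)
        selected.append(best)
        remaining.remove(best)
    return selected  # Step 6


# Hypothetical toy data: four instances, class 1 = attack, 0 = normal.
target = [1, 1, 0, 0]
columns = {
    "f1": [1, 1, 0, 0],
    "f2": [3, 1, 0, 0],
    "f3": [0, 0, 1, 1],
    "f4": [1, 2, 1, 0],
}
print(kl_feature_selection(columns, target, n_seed=1, k=3))  # ['f1', 'f2', 'f4']
```

In this toy run `f1` has zero divergence from the class and seeds the subset, while `f3`, whose distribution is concentrated on the normal instances, is never selected.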

3.2 Experimental set up 

The input data is the KDDCUP99 dataset, collected from the website
http://kdd.ics.uci.edu. The simulation is done in Python 3.8, and the code uses
the Pandas, NumPy and matplotlib libraries. The 10% KDDCUP99 training dataset
contains 22 types of attack as well as normal connections; each connection has
41 features and a class label giving the attack type or normal. The connections
are recordings of two weeks of network traffic comprising two million
connections, and each connection is a sequence of TCP packets with a defined
start and end. This data was prepared by the 1998 DARPA Intrusion Detection
Evaluation Program with the objective of surveying the causes of intrusions or
attacks in networks.

In the simulation, the algorithm uses the Kullback-Leibler divergence, which is
also called relative entropy.


4 Methods 

The divergence between the class and each feature is evaluated as
$D(P,Q) = \sum_i P(x_i) \log(P(x_i)/Q(x_i))$, where $Q(x_i)$ is the feature,
$P(x_i)$ is the class and i = 1, 2, 3, ..., N. A lower divergence indicates a
stronger relation, so the features with the minimum divergence are chosen as the
selected features. To obtain the final set of attributes, the divergence between
the selected features and each of the other features is evaluated, and the
features with the minimum value are added to form the final set of features.

4.1 Performance metrics 

True Positive (TP): An instance whose given class is attack and that is
correctly predicted as harmful.

False Positive (FP): An instance predicted as harmful whereas its given class is
normal.

False Negative (FN): An instance predicted as normal whereas its given class is
attack.

True Negative (TN): An instance whose given class is normal and that is
correctly predicted as normal.

Detection Rate (TPR): The proportion of instances whose target class is attack
that are correctly predicted as attack.

 TPR = TP/(TP+FN)

False Positive Rate (FPR): The proportion of cases in which the normal class is
wrongly identified as attack. This criterion indicates the effectiveness of the
system and is expected to be low for a system with good performance.

 FPR = FP/(TN+FP)

Accuracy: The ratio of correct predictions to the total number of predictions,
correct and wrong.

 Accuracy = (TP+TN)/(TP+TN+FP+FN)
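The three formulas can be checked against the confusion matrix reported for the 9-feature subset in Table 1. The sketch below assumes, since the paper does not state the layout, that the first matrix row is the attack class (correct, wrong) and the second row is the normal class (correct, wrong); the function name is illustrative.

```python
def detection_metrics(tp, fn, tn, fp):
    """Return (TPR, FPR, accuracy) from confusion-matrix counts."""
    tpr = tp / (tp + fn)
    fpr = fp / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return tpr, fpr, accuracy


# Counts from the 9-feature row of Table 1, read under the ASSUMPTION that
# row 1 is the attack class (TP, FN) and row 2 is the normal class (TN, FP).
tpr, fpr, accuracy = detection_metrics(tp=118954, fn=53, tn=29170, fp=30)
print(f"accuracy = {accuracy:.2%}")  # accuracy = 99.94%
```

Under that reading the computed accuracy agrees with the 99.94% reported for the 9-feature subset.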

5 Results and discussions 

To find how closely features are related, the divergence between them is
evaluated; the lower the divergence, the more related they are considered. The
divergence of the KDDCUP99 features is plotted in Figure 2. Features f2, f15,
f23 and f24 have the lowest divergence, but the relation of each feature with
the class must also be considered when selecting the first set of features. We
therefore select f2, f15 and f23 as the first set of features, against which the
second set of features is evaluated.


 
Fig. 2. Divergence of all features 

The models are simulated after the feature set is simplified, and the resulting
accuracy is shown in Figure 3. Although DMIFSA reaches the same accuracy as KL
based FS, 99.94%, it falls short in TPR, which reflects the correctness of the
model. It also retains many redundant features that carry no extra information
to help train the model better. KL based FS, unlike DMIFSA, does not contain
such redundant features. The other models, RPFMI and MMIFS, are less accurate
than KL based FS.


 
Fig. 3. Comparison of Accuracy of methods 

From Table 1, it is observed that even with 9 features the accuracy of KL based
FS is higher than that of RPFMI. With 12 features, the accuracy of MMIFS is
lower than that of KL based FS, and the simulation time is longer for DMIFSA,
which indicates the presence of many irrelevant attributes. If feature 5 is
still used for simulation with the simplified feature set of 11, the accuracy
does not change, so it carries no extra information. The accuracy remains
consistent because the selected set has no redundant features.

Table 1.  Simulated results of KL based Feature selection algorithm

No. of Features   Accuracy (%)   Feature Numbers              Confusion Matrix
10                99.934         2,4,23,15,32,24,29,34,3,5    [118952  55; 29153  47]
9                 99.94          4,23,15,32,24,29,34,3,5      [118954  53; 29170  30]
8                 99.44          23,15,32,24,29,34,3,5        [118954  53; 29168  32]
7                 99.93          15,32,24,29,34,3,5           [118952  55; 29155  45]
6                 99.93          32,24,29,34,3,5              [118952  55; 29155  45]
5                 99.92          24,29,34,3,5                 [118950  57; 29148  52]

[Figure 3 chart data: Accuracy (%) by method: KL based FS 99.94; RPFMI 98.94; MMIFS 99.77; DMIFSA 99.94. Vertical axis: Accuracy, 98.4 to 100.2.]


Figure 4 shows the simulation of the algorithms. The accuracy of KL based FS is
almost constant, since its selected features are more informative than those of
the other algorithms. This indicates that the algorithm handles the presence of
irrelevant features in the given dataset.

 
Fig. 4. Modelling of the methods 

The divergence values of all the features are evaluated and the minimum among
them is observed. Figure 5 shows the divergence of the final selected features;
of these, feature f5 has the lowest value and contributes most to the
performance of the system. Feature f36 also contributes, and in the same way
features f2, f3, f4, f15, f23, f24, f29, f32, f33 and f36 contribute to building
the performance of the system.

 
Fig. 5. Minimum value of feature to feature KL based divergence

[Figure 5 plot: series "Arg Min of KL based diversion"; horizontal axis: feature index 1 to 33; vertical axis: -10 to 50.]


Since the attacks are treated as anomalies, Figure 6 shows that the TPR stays
consistent from 5 features to 10 features, which reflects consistent
performance, and that the TPR is almost at its maximum, indicating that the
proposed model performs better than the other methods. The FPR stays very low
from 5 features to 10 features, indicating that normal traffic is rarely flagged
as an attack. The consistency of the TPR and FPR values indicates the stability
of the system, which performs well with a very low FPR.

 
Fig. 6. TPR and FPR of Anomaly data

6 Conclusions  

The features that contribute to increased system performance are identified
using the Kullback-Leibler divergence method. The accuracy of the methods was
evaluated and compared with mutual information based methods using the KDDCUP99
data set and the C4.5 classifier. KL based feature selection achieved an
accuracy of 99.94%, and the simulation time was also minimised. The mutual
information based methods DMIFSA, MMIFS and RPFMI were observed to be inferior
in performance to the KL based feature selection method. Mobile networks face
the problem of intrusion, which is detected here using the divergence based
method to minimise attacks; the method can be further extended to rank the
features so as to reduce the simulation time.

7 Acknowledgements  

All the authors are thankful to the reviewers for their timely suggestions,
which were very helpful for improving the manuscript.


8 References 

[1] Jianuo Li, Hongyan Zhang, Jianjun Zhao, Xiaoyi Guo, Guorong Deng, "Embedded
Feature Selection and Machine Learning Methods for Flash Flood
Susceptibility-Mapping in the Mainstream Songhua River Basin, China", Quantifying
Geographical Processes Using Remote Sensing Techniques, Nov 2022.

[2] Zhenyu Zhao, Radhika Anand, Mallory Wang, "Maximum Relevance and Minimum
Redundancy Feature Selection Methods for a Marketing Machine Learning Platform",
arXiv:1908.05376 [stat.ML], 2019. https://doi.org/10.1109/DSAA.2019.00059

[3] Insik Jo, Sangbum Lee, Sejong Oh, "Improved Measures of Redundancy and Relevance
for mRMR Feature Selection", Computers 2019, MDPI, pp. 1-14.

[4] S. Visalakshi, V. Radha, "A Literature Review of Feature Selection Techniques and
Applications, Review of feature selection in data mining", 2014 IEEE International
Conference on Computational Intelligence and Computing Research.
https://doi.org/10.1109/ICCIC.2014.7238499

[5] W. Liu, J. G. Suna, L. Liu, H. J. Zhang, "Feature selection with dynamic mutual
information," Pattern Recognition, 42(2009):1330-1339.
https://doi.org/10.1016/j.patcog.2008.10.028

[6] Ding C, Peng H. Minimum redundancy feature selection from microarray gene expression
data. J Bioinforma Comput Biol. 2005;03(02):185-205.
https://doi.org/10.1142/S0219720005001004

[7] W. Liu, J. G. Suna, L. Liu, H. J. Zhang, "Feature selection with dynamic mutual
information," Pattern Recognition, 42(2009):1330-1339.
https://doi.org/10.1016/j.patcog.2008.10.028

[8] Jingping Song, Zhiliang Zhu, Peter Scully and Chris Price, "Modified Mutual
Information-based Feature Selection for Intrusion Detection Systems in Decision Tree
Learning", Journal of Computers, 2014, 9(7): 1542-1546.
https://doi.org/10.4304/jcp.9.7.1542-1546

[9] Hussain, A. Chowdary and Dhruba Bhattacharyya, "mRMR+: An effective feature
selection algorithm for classification", Springer International Publishing AG 2017, B.U.
Shankar et al. (Eds.): PReMI 2017, LNCS 10597, pp. 424-430, 2017.
https://doi.org/10.1007/978-3-319-69900-4_54

[10] Hoque N, Bhattacharyya DK, Kalita JK. Mifs-nd: A mutual information-based feature
selection method. Expert Syst Appl. 2014; 41(14):6371-385.
https://doi.org/10.1016/j.eswa.2014.04.019

[11] Deb K, Agrawal S, Pratap A, Meyarivan T. In: Schoenauer M, Deb K, Rudolph G, Yao X,
Lutton E, Merelo JJ, Schwefel H-P, editors. A Fast Elitist Non-dominated Sorting Genetic
Algorithm for Multi-objective Optimization: NSGA-II. Berlin, Heidelberg: Springer; 2000,
pp. 849-58. https://doi.org/10.1007/3-540-45356-3_83

[12] Ghalwash MF, Cao XH, Stojkovic I, Obradovic Z. Structured feature selection using
coordinate descent optimization. BMC Bioinformatics. 2016; 17(1):1-14.
https://doi.org/10.1186/s12859-016-0954-4

[13] Parth Gupta, Paolo Rosso, "Expected Divergence based Feature Selection for Learning to
Rank", Proceedings of COLING 2012: Posters, COLING 2012, Mumbai, December 2012,
pp. 431-440.

[14] Peng H., Long F., Ding C., 2005, "Feature selection based on mutual information criteria
of max-dependency, max-relevance, and min-redundancy", IEEE Transactions on Pattern
Analysis and Machine Intelligence, Vol. 27, No. 8, pp. 1226-1238.
https://doi.org/10.1109/TPAMI.2005.159


[15] M. A. Hall, “Correlation-based feature selection for machine learning,” Ph.D. thesis, 
Department of Computer Science, University of Waikato, Hillcrest, New Zealand, 1999. 

9 Authors  

N. Chitra is a research scholar at Presidency University with an interest in
wireless communications based on mobile networks. She holds a bachelor's degree
in electronics and communication engineering and a master's degree in applied
electronics from the electronics and communication engineering department, and
is presently pursuing a Ph.D. in wireless communication using machine learning
in the same department. Her main interests are mobile technology, its network
communications and the associated traffic problems (email:
chitravadde24@gmail.com).

Dr. Safinaz S. is an Assistant Professor at Presidency University. She received
her Ph.D. from Visvesvaraya Technological University (VTU), Belagavi, in 2019,
completed her M.Tech in Electronics at Sir MVIT in 2008 and her B.E. in
Telecommunication at Sir MVIT in 2004. She began her teaching career at Sir MVIT
in 2004 and has a total of 17 years of teaching experience. She has published in
peer-reviewed journals and conferences of national and international repute and
has guided many award-winning projects for undergraduate and postgraduate
students. She is a domain expert in image and video processing and FPGA
implementation using various toolkits (email:
safinazs@presidencyuniversity.in).

Dr. K. Bhanu Rekha is an Assistant Professor at Presidency University. She
completed her B.E. in Telecommunication at RVCE in 2002 and her M.Tech in
Electronics in 2009, both in Bangalore, and her Ph.D. in Image and Video
Processing in 2019. She started her career as a Network Engineer at Accenture
and switched to teaching in 2005. She has a total of 17 years of teaching
experience and 9 publications in international journals and in international and
national conferences of good repute. Her research interests include signal
processing, image and video processing, Verilog coding and FPGA implementation
(email: bhanurekha@presidencyuniversity.in).

Article submitted 2022-10-29. Resubmitted 2022-12-06. Final acceptance 2022-12-12. Final version 
published as submitted by the authors. 

88 http://www.i-jim.org