International Journal of Interactive Mobile Technologies (iJIM), eISSN: 1865-7923, Vol. 17 No. 04 (2023), http://www.i-jim.org

Divergence Based Feature Selection for Pattern Recognizing of the Performance of Intrusion Detection in Mobile Communications Merged with the Computer Communication Networks

https://doi.org/10.3991/ijim.v17i04.37733

N. Chitra, Safinaz S., K. Bhanu Rekha
Department of Electronics and Communication Engineering, Presidency University, Bangalore, India
chitravadde24@gmail.com

Abstract: Feature selection simplifies data by removing irrelevant features. The divergence method is a feature selection strategy that measures the relation between the attributes and the class to assess their contribution to performance. Mobile networks are now integrated with other networks, such as computer networks, and carry all kinds of data, so the attacks known as intrusions in computer networks apply equally to mobile communications. This paper therefore considers intrusion detection in a mobile interaction network that is merged with other modes of communication. The proposed algorithm performs feature selection using a divergence evaluation to reduce the feature set. The 10% KDDCUP99 data set was used to evaluate the proposed algorithm, and the performance metrics were computed using the C4.5 classifier. TPR, FPR and consistency were compared with the mutual-information-based DMIFSA, RMIFSA and MMIFSA methods. The proposed method, implemented in Python 3.8, achieved an accuracy of 99.94%, which is high compared with the other methods, while also removing redundant features. Its accuracy stays almost constant from 4 to 10 selected features, unlike the other methods, which indicates that the system is stable.
Keywords: divergence, feature selection, C4.5 classifier, mobile communications, TPR (True Positive Rate), FPR (False Positive Rate), accuracy, stability

1 Introduction

Communication today merges different transmission platforms into one by means of compatibility methods. Voice, data, video, remote sensor information and sensor security data are transmitted to their targets over vast infrastructures. In some situations, however, unauthorised persons access or steal the data for personal benefit. Access to data by strangers is termed illegal access. It can take control of the network, raise alarms and steal personal data, which harms society. To detect anomalous behaviour in the network, the characteristics that lead to disruption of communication are identified; these are known as attack features. They are identified using methods such as mutual information, entropy and gain ratio.

Feature selection methods fall into three groups: filter, embedded and wrapper methods. In a filter method, each feature is scored individually for its relation with the class and the better ones are selected [1], which requires less evaluation time. A filter method does not depend on a predefined machine learning algorithm: it only computes the evaluation metrics and compares them. It is therefore efficient, and the metrics are computed quickly [2].
In an embedded method, feature selection is applied within the model, and system performance improves by selecting a good subset of features [3]. Sequential forward selection and genetic algorithms [1] are common wrapper-based feature selection techniques. Wrapper techniques, however, carry a greater risk of overfitting and are computationally costly, requiring a lengthy training period. The features selected are directly affected by the sequence in which subsets are entered into the wrapper. The wrapper chooses the features to incorporate into the model using the learning method itself [4].

Forward selection: the set is initially empty, and features are added repeatedly according to their evaluation value until the chosen set meets the required evaluation criterion [5].

Backward elimination: the set initially contains all of the features, and during selection the features with the greatest evaluation value are eliminated step by step as unnecessary.

Two-way elimination: this is similar to forward selection, except that before adding a new feature it re-examines the features that have already been selected. If one of them has become insignificant, it is removed by backward elimination.

The performance of the MI-based mRMR algorithm [6] degrades when there are many attributes and highly correlated features are selected. Supervised feature selection algorithms assess the significance of features in relation to the class labels, whereas unsupervised algorithms assess the value of features from the data variance or data distribution, without labels. Semi-supervised feature selection algorithms [3] use a modest quantity of labelled data to enhance the performance of unsupervised learning methods.
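The forward-selection wrapper described above can be sketched in a few lines. This is a minimal illustration, not the procedure of any cited work: the `score` callback, the `min_gain` stopping threshold and the toy scorer are all hypothetical, and "best" is taken here to mean the highest score.

```python
from typing import Callable, FrozenSet, List, Sequence


def forward_selection(
    features: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    min_gain: float = 1e-6,
) -> List[str]:
    """Greedy forward selection: start empty, add the best feature each round."""
    selected: List[str] = []
    remaining = list(features)
    best = score(frozenset())
    while remaining:
        # Score every one-feature extension of the current subset.
        trials = [(score(frozenset(selected + [f])), f) for f in remaining]
        top_score, top_feature = max(trials)
        if top_score - best < min_gain:
            break  # stopping criterion: no candidate improves the score enough
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
    return selected


def toy_score(subset: FrozenSet[str]) -> float:
    """Hypothetical scorer: only 'a' and 'c' help; every feature costs 0.01."""
    useful = {"a": 0.5, "c": 0.3}
    return sum(useful.get(f, 0.0) for f in subset) - 0.01 * len(subset)


chosen = forward_selection(["a", "b", "c", "d"], toy_score)
print(chosen)
```

In a real wrapper the scorer would train and validate a classifier on each candidate subset, which is why the text calls the approach computationally costly.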
The feature selection process used to identify the source of network attacks is shown in Figure 1.

Subset generation: Subset generation is the process of finding well-performing subsets of features and passing them as input to the scoring process. The feature selection process begins with subset generation, which can follow three different strategies.

Forward selection: The subset starts empty, and candidate features are added one by one as they are selected.

Backward selection: The subset starts as the entire feature set, and irrelevant features are removed one by one as they are found.

Random selection: A subset of features is generated randomly from the data set, and features are then systematically removed or included one after the other.

Evaluation method: An evaluation function, which also acts as the stopping criterion, measures the association between features and classes. The evaluated value is compared with an optimal value set before the evaluation.

Validation procedure: The selected features are validated by classification. The data are split into a training part and a testing part, typically 70% and 30% of the data. The results predicted with the selected features are compared with the classification results for the original data set. The better the evaluation function, the better the evaluation result. Validation is not itself part of the selection process, but it is important for demonstrating the effectiveness of the selected features.

Fig. 1. Feature selection process

2 Literature review

To measure the extent of data relevance, Huawen Liu et al. related the information content of features and classes to each other through mutual information, in the dynamic mutual information feature selection algorithm (DMIFS) [7]. Data relevance reflects the correlation between data and classes. The first subset of features is chosen from the mutual information between classes and features. The highest values are taken repeatedly, and the final features are evaluated using the evaluation function, which is the stopping condition. Relationships between features are of high importance when the best features are selected, and the weight of feature-class interactions is very important. Mutual information relates variables and indicates the uncertainty of their occurrence.

Let X and Y be two random variables associated with the probabilities of discrete or continuous events, with samples X = (x_1, x_2, ..., x_N) and Y = (y_1, y_2, ..., y_N). The entropies of the two quantities, which give their average information, are given by equations (1) and (2):

\[ H(X) = -\sum_{i=1}^{N} P(x_i) \log P(x_i) \tag{1} \]

\[ H(Y) = -\sum_{i=1}^{N} P(y_i) \log P(y_i) \tag{2} \]

where P(x_i) = (number of instances of x_i) / (total number of instances N) and P(y_i) = (number of instances of y_i) / (total number of instances N). The conditional entropy is

\[ H(X \mid Y) = -\sum_{x \in X} \sum_{y \in Y} P(x, y) \log P(x \mid y) \tag{3} \]

where P(x, y) is the joint distribution of X and Y, H(X|Y) is the conditional entropy of X given Y, and P(x|y) is the conditional probability of x given y.
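Equations (1) to (3) can be checked numerically with a short sketch. It assumes discrete samples, empirical frequencies as probabilities, and base-2 logarithms (the text does not fix the base); the identity I(X;Y) = H(X) − H(X|Y) used for the mutual information is the standard one, and the sample values are purely illustrative.

```python
import math
from collections import Counter


def entropy(xs):
    """H(X) = -sum_i P(x_i) log2 P(x_i), with P estimated from counts."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def conditional_entropy(xs, ys):
    """H(X|Y) = -sum_{x,y} P(x,y) log2 P(x|y), with P(x|y) = count(x,y)/count(y)."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    y_counts = Counter(ys)
    return -sum((c / n) * math.log2(c / y_counts[y]) for (_, y), c in joint.items())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) - H(X|Y)."""
    return entropy(xs) - conditional_entropy(xs, ys)


# Illustrative samples: the first feature determines the label, the second does not.
label = ["attack", "attack", "normal", "normal"]
informative = ["tcp", "tcp", "udp", "udp"]
uninformative = ["tcp", "udp", "tcp", "udp"]

print(mutual_information(informative, label))
print(mutual_information(uninformative, label))
```

A feature that fully determines the class leaves no residual uncertainty, so its mutual information with the class equals its own entropy; a feature that is independent of the class has mutual information zero.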
In the Modified Mutual Information Feature Selection Algorithm (MMIFSA) [8], the subsequent features are selected using the product of the mutual information between class and features and the mutual information between input features and selected features. MMIFSA is further modified so that the subsequent features are selected by normalising the MI between candidate features and selected features by the entropy of the features of the data set. Too many or too few attributes in the data set make evaluation difficult and degrade system performance. The problem [9] that gain ratio rises above mutual information and lowers system performance is addressed by mRMR (minimal redundancy and maximum relevance).

Improved measures of relevance and redundancy have also been used for choosing mRMR features: the best features were selected using the Pearson correlation coefficient to measure redundancy and the 'R' value to measure relevance. The Pearson coefficient and the 'R' value take the role that mutual information plays in the calculation. Evaluating MI requires categorical values, and discretising continuous data loses information. The 'R' value requires no discretisation and is therefore more beneficial than mutual information.

In "Maximum Relevance and Minimum Redundancy Feature Selection Methods for a Marketing Machine Learning Platform" [2], a large number of features are available to machine learning techniques for marketing products online, and feature selection is applied to remove the features that belong to the unwanted group. The FCD (F-test correlation difference) and FCQ (F-test correlation quotient) schemes are used to find the correlation and the redundancy factor. In a further variant the correlation measure is replaced by the randomised dependence coefficient (RDC), and the relevance factor is greatly improved.
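As a small illustration of the Pearson-based redundancy measure mentioned above, the sketch below computes the coefficient directly on continuous values, with no discretisation step. Averaging the absolute correlation over the selected features is an assumption made for this example, and the column names and values are toy data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric columns.

    Raises ZeroDivisionError if either column is constant.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def redundancy(candidate, selected_columns):
    """Mean absolute correlation of a candidate with the selected columns
    (an assumed aggregation, chosen for illustration)."""
    return sum(abs(pearson(candidate, c)) for c in selected_columns) / len(selected_columns)


# Toy columns (hypothetical values).
duration = [1.0, 2.0, 3.0, 4.0]
scaled_copy = [2.0, 4.0, 6.0, 8.0]    # exact multiple of duration: fully redundant
unrelated = [1.0, -1.0, -1.0, 1.0]    # zero linear correlation with duration

print(redundancy(scaled_copy, [duration]))
print(redundancy(unrelated, [duration]))
```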
In [10, 11], features are selected using MIFS-ND (mutual information feature selection, non-dominated), in which mutual information is used to minimise redundancy. The authors of [12] selected optimal genes by first grouping or clustering genes of the same type and then selecting the highly correlated genes. Earlier evaluation methods for recognising the best-performing attributes include mutual information, gain ratio and distance-based methods [13], and expectation has also been used to evaluate the best feature. Jin Xu et al. proposed RRPC, an mRMR-style incremental search method for optimal feature selection, in which the Pearson correlation coefficient is used to find the best-performing features. It was tested on three data sets, Corral, Corral-47 and Corral-46, and compared with mRMR, the Fisher method, a semi-supervised FS method (Select) and an unsupervised method (the Laplacian method). In Corral-47, 12 of the 47 features contributed.

Hanchuan Peng et al. [14] derived the equations for the minimal-redundancy maximal-relevance method, implemented a two-stage feature selection algorithm, and used Naive Bayes, SVM and LDA classifiers. Minimum redundancy is evaluated by the average mutual information between features, and maximum relevance by the mutual information between a feature and the target class:

\[ \min R(S), \quad R = \frac{1}{|S|^2} \sum_{x_i, x_j \in S} I(x_i; x_j) \tag{4} \]

The two are combined as max φ(D, R) with φ = D − R, where D denotes the relevance and R the redundancy, which gives the incremental criterion

\[ \max_{x_j \in X - S_{m-1}} \left[ I(x_j; Y_L) - \frac{1}{m-1} \sum_{x_i \in S_{m-1}} I(x_j; x_i) \right] \tag{5} \]

In mRMR, given the subset F_{K-1} of K−1 selected features, the next feature F_K is obtained with φ(D, R). In terms of divergence this is

\[ F_K = \arg\min_{F_j \in F - F_{K-1}} \left[ \sum_{j=1}^{n} D_{KL}(F_j, F_s) - \frac{1}{K-1} \sum_{F_i \in F_{K-1}} D_{KL}(F_j, F_i) \right] \tag{6} \]

where F_s is the selected feature with minimum divergence from the class, F_i is a feature in F_{K-1}, and F_{K-1} is the subset containing the selected features.
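The incremental criterion of equation (5) can be sketched as a greedy loop. This is an illustrative reading under stated assumptions, not the implementation of [14]: columns are discrete, probabilities are empirical frequencies, logarithms are base 2, ties go to the first candidate, and the toy columns `f1`, `f2`, `f3` are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """I(X;Y) = sum_{x,y} P(x,y) log2( P(x,y) / (P(x) P(y)) ) from counts."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def mrmr(columns, target, k):
    """Greedy mRMR: maximise relevance minus mean redundancy, as in Eq. (5)."""
    selected = []
    candidates = list(columns)
    while candidates and len(selected) < k:
        def criterion(name):
            relevance = mutual_information(columns[name], target)
            if not selected:
                return relevance  # first pick: relevance only
            redundancy = sum(
                mutual_information(columns[name], columns[s]) for s in selected
            ) / len(selected)
            return relevance - redundancy

        best = max(candidates, key=criterion)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data: the target is f1 + f3; f2 duplicates f1 and adds nothing new.
columns = {
    "f1": [0, 0, 0, 0, 1, 1, 1, 1],
    "f2": [0, 0, 0, 0, 1, 1, 1, 1],
    "f3": [0, 0, 1, 1, 0, 0, 1, 1],
}
target = [0, 0, 1, 1, 1, 1, 2, 2]

print(mrmr(columns, target, k=2))
```

All three columns are equally relevant on their own, but once `f1` is chosen the duplicate `f2` is penalised by its redundancy, so the second pick is `f3`.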
The contributions of the paper are:

1. A feature selection method is proposed in which the relation between features is evaluated using the divergence between them.
2. A divergence parameter is proposed that can be used to find the best parameters.
3. The proposed method is compared with other methods in terms of TPR, FPR and accuracy.
4. The performance metrics are evaluated and plotted for all the methods.

3 Proposed method

3.1 Formulation of method

The proposed method uses divergence, which expresses the relation between feature attributes as a divergence value. In the correlation method, features are closely related if the correlation between them is maximal [15], and less related if it is low. In the divergence approach, features are considered closely related if there is no divergence between them, and least related if the divergence is maximal. Let P and Q be two variables. The relation between them in terms of the Kullback-Leibler divergence is

\[ D_{KL}(P, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)} \tag{7} \]

In supervised learning, the training data contain the labelled variables P and Q, which consist of the features F_i and the target Y_L as elements of a vector. So

\[ D(P, Q) = \frac{1}{|F|} \sum_{F_i \in F} D_{KL}(F_i, Y_L) \tag{8} \]

and the most relevant feature is the one with the minimum divergence from the class:

\[ \min_{F_i} D_{KL}(F_i, Y_L) \tag{9} \]

Redundant features are repetitive features that are unnecessary and should be deleted, because they do not enhance the functionality of the system.
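Equations (7) and (9) can be illustrated with a short sketch. It assumes that each feature and the class have already been reduced to probability vectors over a common set of bins (the text does not say how this is done), uses natural logarithms, and floors zero entries of Q with a small `eps`; the feature names and distributions are hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """D_KL(P, Q) = sum_x P(x) log(P(x)/Q(x)); terms with P(x) = 0 contribute 0.

    eps floors Q(x) to avoid division by zero (an assumption of this sketch).
    """
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def most_relevant(feature_dists, class_dist):
    """Eq. (9): the feature whose distribution diverges least from the class."""
    return min(feature_dists, key=lambda name: kl_divergence(feature_dists[name], class_dist))


# Hypothetical distributions over two shared bins.
class_dist = [0.5, 0.5]
feature_dists = {
    "f_close": [0.55, 0.45],
    "f_far": [0.9, 0.1],
}

print(most_relevant(feature_dists, class_dist))
print(kl_divergence(class_dist, feature_dists["f_far"]),
      kl_divergence(feature_dists["f_far"], class_dist))
```

The second print shows that the divergence is not symmetric, so the order of the two arguments matters when comparing a feature with the class.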
Redundancy is evaluated by summing, for each feature, its divergence from every feature in the data set:

\[
\begin{aligned}
D_1(f_1, F) &= D_{11}(f_1, f_1) + D_{12}(f_1, f_2) + D_{13}(f_1, f_3) + \cdots + D_{1n}(f_1, f_n) \\
D_2(f_2, F) &= D_{21}(f_2, f_1) + D_{22}(f_2, f_2) + D_{23}(f_2, f_3) + \cdots + D_{2n}(f_2, f_n) \\
D_3(f_3, F) &= D_{31}(f_3, f_1) + D_{32}(f_3, f_2) + D_{33}(f_3, f_3) + \cdots + D_{3n}(f_3, f_n) \\
&\;\;\vdots \\
D_n(f_n, F) &= D_{n1}(f_n, f_1) + D_{n2}(f_n, f_2) + D_{n3}(f_n, f_3) + \cdots + D_{nn}(f_n, f_n)
\end{aligned}
\]

\[ D(f_i, F) = \sum_{f_j \in F} D(f_i, f_j) \tag{10} \]

The features f_1, f_2, f_3, ... are the attributes of the given data set. The least-information-bearing redundant features can be evaluated as given below. The relevance of a feature is evaluated by the divergence between the given feature and the selected features:

\[
\begin{aligned}
D_M(f_1, F_s) &= D_M(f_1, F_{s1}) + D_M(f_1, F_{s2}) + \cdots + D_M(f_1, F_{sk}) \\
D_M(f_2, F_s) &= D_M(f_2, F_{s1}) + D_M(f_2, F_{s2}) + \cdots + D_M(f_2, F_{sk}) \\
D_M(f_3, F_s) &= D_M(f_3, F_{s1}) + D_M(f_3, F_{s2}) + \cdots + D_M(f_3, F_{sk}) \\
&\;\;\vdots
\end{aligned}
\tag{11}
\]

The divergence ratio, D_ratio, is the extent to which a feature deviates from the most related feature. It measures the degree of closeness to the best features.

\[ A = \min \left( D(f_i, F_s) + \sum_{f \in F_s} \sum_{f_i \in F} D(F_s, f_i) \right) \tag{12} \]

\[ D_{ratio}(f_i) = \frac{\min D(f_i, F_S) - D(f_i, f)}{A} \tag{13} \]

Proposed algorithm: KL divergence based feature selection

1) Input: the data set F with n features and m instances, and the target Y with class C and m instances. The feature set is F. Output: the selected features F_S.
2) Evaluate the divergence of each feature with the class.
3) Take the features with the least divergence as the subset F_s. The remaining available subset is F_b = F − F_s.
4) Select the feature F_K that satisfies
\[ F_K = \arg\min_{F_j \in F - F_{K-1}} \left[ \sum_{j=1}^{n} D_M(F_j, F_s) - D_{ratio} \cdot \frac{1}{K-1} \sum_{F_i \in F_{K-1}} D(F_j, F_i) \right] \]
5) Update F_s = F_s ∪ {F_K} and F_b = F − F_s.
6) Output: the final set F_S of selected features.

3.2 Experimental set up

The KDDCUP99 input data set is available from the website http://kdd.ics.uci.edu. The simulation is done in Python 3.8, and the code uses the Pandas, NumPy and Matplotlib libraries. The 10% KDDCUP99 training data set contains 22 types of attack as well as normal connections. Each connection has 41 features and a class label that gives the attack type or normal. The full data are recordings of two weeks of network traffic comprising two million connections, where each connection is a sequence of TCP packets with a start and an end. The data were prepared by the 1998 DARPA Intrusion Detection Evaluation Program with the objective of surveying the causes of intrusions or attacks in networks. In the simulation, the algorithm uses the Kullback-Leibler divergence, which is also called relative entropy.

4 Methods

The divergence between the class and each feature is evaluated as

\[ D(P, Q) = \sum_{i=1}^{N} P(x_i) \log \frac{P(x_i)}{Q(x_i)} \]

where Q(x_i) is the feature distribution, P(x_i) is the class distribution, and i = 1, 2, 3, ..., N. The minimum divergence corresponds to the strongest correlation, so the features with the minimum divergence are chosen as the selected features. To obtain the final set of attributes, the divergence between the selected features and each remaining feature is evaluated, and the feature with the minimum value is added to the final set.
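The selection loop described in Sections 3.1 and 4 can be sketched as follows. This is a simplified reading, not the authors' implementation: the D_ratio term is replaced by a fixed hypothetical `weight`, the first selected feature is simply the one with the least divergence from the class, features and class are assumed to be coded over a shared set of discrete values, and add-one smoothing is used so that no probability is zero.

```python
import math
from collections import Counter


def smoothed_dist(values, support):
    """Empirical distribution over `support` with add-one smoothing (assumption)."""
    counts = Counter(values)
    total = len(values) + len(support)
    return [(counts[v] + 1) / total for v in support]


def kl(p, q):
    """Kullback-Leibler divergence between two strictly positive distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def kl_feature_selection(columns, target, k, weight=1.0):
    """Greedy sketch: prefer low divergence from the class (relevance) and
    high divergence from already selected features (low redundancy)."""
    support = sorted(set(target).union(*map(set, columns.values())))
    dist = {name: smoothed_dist(col, support) for name, col in columns.items()}
    class_dist = smoothed_dist(target, support)
    relevance = {name: kl(class_dist, d) for name, d in dist.items()}

    selected = [min(relevance, key=relevance.get)]  # least divergence with class
    while len(selected) < min(k, len(columns)):
        def criterion(name):
            spread = sum(kl(dist[name], dist[s]) for s in selected) / len(selected)
            return relevance[name] - weight * spread

        rest = [name for name in columns if name not in selected]
        selected.append(min(rest, key=criterion))
    return selected


# Toy binary-coded columns (hypothetical).
columns = {
    "f_a": [0, 0, 0, 1],
    "f_b": [0, 0, 1, 1],
    "f_c": [1, 1, 1, 1],
}
target = [0, 0, 0, 1]

print(kl_feature_selection(columns, target, k=2))
```

On real KDDCUP99 data the 41 features would first have to be discretised onto a common support, a step the text does not describe and that this sketch leaves out.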
4.1 Performance metrics

True Positive (TP): instances correctly predicted as harmful when the given class is attack.

False Positive (FP): instances predicted as harmful when the given class is normal.

False Negative (FN): instances predicted as normal when the given class is attack.

True Negative (TN): instances correctly predicted as normal when the given class is normal.

Detection rate or True Positive Rate (TPR): the proportion of attack instances that are correctly predicted as attacks.

TPR = TP/(TP+FN)

False Positive Rate (FPR): the proportion of normal instances that are wrongly identified as attacks. This criterion indicates the effectiveness of the system and is expected to be low for a well-performing system.

FPR = FP/(TN+FP)

Accuracy: the ratio of correct predictions to all predictions, correct and wrong.

Accuracy = (TP+TN)/(TP+TN+FP+FN)

5 Results and discussions

To find closely related features, the divergence between them is evaluated; the lower the divergence, the more related the features are considered to be. The divergence of the KDDCUP99 features is plotted in Figure 2. Features f2, f15, f23 and f24 have the lowest divergence, but when the first set of features is selected, the relation of each feature with the class must also be considered. We therefore select f2, f15 and f23 as the first set of features, against which the second set of features is evaluated.

Fig. 2. Divergence of all features

The models are simulated after the feature set is simplified, and the resulting accuracy is shown in Figure 3. Although DMIFSA reaches the same accuracy as KL-based FS, 99.94%, it falls short in TPR, which reflects the correctness of the model. It also retains many redundant features that carry no extra information to help the model train better. KL-based FS, unlike DMIFSA, does not contain redundant features.
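The metric definitions of Section 4.1 reduce to simple arithmetic on the four confusion-matrix counts. The sketch below uses hypothetical counts, not values from Table 1, because the cell layout of the confusion matrices reported there is not stated.

```python
def detection_metrics(tp, fn, fp, tn):
    """Return (TPR, FPR, accuracy) from confusion-matrix counts."""
    tpr = tp / (tp + fn)                        # attacks correctly flagged
    fpr = fp / (fp + tn)                        # normal traffic wrongly flagged
    accuracy = (tp + tn) / (tp + tn + fp + fn)  # all correct predictions
    return tpr, fpr, accuracy


# Hypothetical counts: 100 attack and 100 normal connections.
tpr, fpr, accuracy = detection_metrics(tp=90, fn=10, fp=5, tn=95)
print(tpr, fpr, accuracy)
```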
Other models such as RPFMI and MMIFS are less accurate than KL-based FS.

Fig. 3. Comparison of accuracy of methods (bar chart, accuracy in %: KL based FS 99.94, RPFMI 98.94, MMIFS 99.77, DMIFSA 99.94)

Table 1 shows that even with 9 features the accuracy of KL-based FS is higher than that of RPFMI. With 12 features MMIFS is less accurate than KL-based FS, and DMIFSA takes longer to simulate, which indicates the presence of many irrelevant attributes. If feature 5 is still used in the simplified set of 11 features, the accuracy does not change, so it carries no extra information. The accuracy remains consistent because the selected set has no redundant features.

Table 1. Simulated results of KL based feature selection algorithm

No. of features   Accuracy (%)   Feature numbers               Confusion matrix
10                99.934         2,4,23,15,32,24,29,34,3,5     118952  55  29153  47
9                 99.94          4,23,15,32,24,29,34,3,5       118954  53  29170  30
8                 99.44          23,15,32,24,29,34,3,5         118954  53  29168  32
7                 99.93          15,32,24,29,34,3,5            118952  55  29155  45
6                 99.93          32,24,29,34,3,5               118952  55  29155  45
5                 99.92          24,29,34,3,5                  118950  57  29148  52

Figure 4 shows the simulation of the algorithms. The accuracy of KL-based FS is almost constant, since its features are more informative than those chosen by the other algorithms. This indicates that the algorithm handles the presence of irrelevant features in the given data set.

Fig. 4. Modelling of the methods

The divergence values of all the features are evaluated and their minimum is observed. Figure 5 plots the divergence of the final selected features. Among them, feature f5 has the lowest value and contributes most to the performance of the system. Feature f36 also contributes, and in the same way features f2, f3, f4, f15, f23, f24, f29, f32, f33 and f36 contribute to the performance of the system.

Fig. 5. Minimum value of feature-to-feature KL based divergence (plot of the arg min of KL-based divergence against feature index)

Since the attacks are treated as anomalies, Figure 6 shows that the TPR remains consistent from 5 to 10 features. The TPR is also close to its maximum, indicating that the proposed model performs better than the other methods. The FPR stays very low from 5 to 10 features, indicating that predicted attacks are rarely wrong. The consistency of the TPR and FPR values indicates the stability of the system, which performs well with a very low FPR.

Fig. 6. TPR and FPR of anomaly data

6 Conclusions

The features that contribute to improved system performance are identified using the Kullback-Leibler divergence method. The accuracy of the methods was evaluated and compared with mutual information methods using the KDDCUP99 data set and the C4.5 classifier. KL-based feature selection achieved an accuracy of 99.94%, and the simulation time was also reduced. The mutual-information-based methods DMIFSA, MMIFS and RPFMI were found to be inferior to the KL-based feature selection method.
Mobile networks face the problem of intrusion, which is detected here using the divergence-based method in order to minimise attacks. The method can be extended to rank the features so as to reduce the simulation time.

7 Acknowledgements

All the authors are thankful to the reviewers for their timely suggestions, which were very helpful in improving the manuscript.

8 References

[1] Jianuo Li, Hongyan Zhang, Jianjun Zhao, Xiaoyi Guo, Guorong Deng, "Embedded Feature Selection and Machine Learning Methods for Flash Flood Susceptibility-Mapping in the Mainstream Songhua River Basin, China", Quantifying Geographical Processes Using Remote Sensing Techniques, Nov 2022.
[2] Zhenyu Zhao, Radhika Anand, Mallory Wang, "Maximum Relevance and Minimum Redundancy Feature Selection Methods for a Marketing Machine Learning Platform", arXiv:1908.05376 [stat.ML], 2019. https://doi.org/10.1109/DSAA.2019.00059
[3] Insik Jo, Sangbum Lee, Sejong Oh, "Improved Measures of Redundancy and Relevance for mRMR Feature Selection", Computers, MDPI, 2019, pp. 1-14.
[4] S. Visalakshi, V. Radha, "A Literature Review of Feature Selection Techniques and Applications: Review of feature selection in data mining", 2014 IEEE International Conference on Computational Intelligence and Computing Research. https://doi.org/10.1109/ICCIC.2014.7238499
[5] H. Liu, J. Sun, L. Liu, H. Zhang, "Feature selection with dynamic mutual information", Pattern Recognition, 42 (2009), pp. 1330-1339. https://doi.org/10.1016/j.patcog.2008.10.028
[6] C. Ding, H. Peng, "Minimum redundancy feature selection from microarray gene expression data", J Bioinforma Comput Biol, 2005, 03(02), pp. 185-205. https://doi.org/10.1142/S0219720005001004
[7] H. Liu, J. Sun, L. Liu, H. Zhang, "Feature selection with dynamic mutual information", Pattern Recognition, 42 (2009), pp. 1330-1339. https://doi.org/10.1016/j.patcog.2008.10.028
[8] Jingping Song, Zhiliang Zhu, Peter Scully, Chris Price, "Modified Mutual Information-based Feature Selection for Intrusion Detection Systems in Decision Tree Learning", Journal of Computers, 2014, 9(7), pp. 1542-1546. https://doi.org/10.4304/jcp.9.7.1542-1546
[9] Hussain, A. Chowdary, Dhruba Bhattacharyya, "mRMR+: An effective feature selection algorithm for classification", in B.U. Shankar et al. (Eds.): PReMI 2017, LNCS 10597, Springer International Publishing AG, 2017, pp. 424-430. https://doi.org/10.1007/978-3-319-69900-4_54
[10] N. Hoque, D. K. Bhattacharyya, J. K. Kalita, "MIFS-ND: A mutual information-based feature selection method", Expert Syst Appl, 2014, 41(14), pp. 6371-6385. https://doi.org/10.1016/j.eswa.2014.04.019
[11] K. Deb, S. Agrawal, A. Pratap, T. Meyarivan, "A Fast Elitist Non-dominated Sorting Genetic Algorithm for Multi-objective Optimization: NSGA-II", in M. Schoenauer, K. Deb, G. Rudolph, X. Yao, E. Lutton, J. J. Merelo, H.-P. Schwefel (Eds.), Berlin, Heidelberg: Springer, 2000, pp. 849-858. https://doi.org/10.1007/3-540-45356-3_83
[12] M. F. Ghalwash, X. H. Cao, I. Stojkovic, Z. Obradovic, "Structured feature selection using coordinate descent optimization", BMC Bioinformatics, 2016, 17(1), pp. 1-14. https://doi.org/10.1186/s12859-016-0954-4
[13] Parth Gupta, Paolo Rosso, "Expected Divergence based Feature Selection for Learning to Rank", Proceedings of COLING 2012: Posters, Mumbai, December 2012, pp. 431-440.
[14] H. Peng, F. Long, C. Ding, "Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy", IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005, Vol. 27, No. 8, pp. 1226-1238. https://doi.org/10.1109/TPAMI.2005.159
[15] M. A. Hall, "Correlation-based feature selection for machine learning", Ph.D. thesis, Department of Computer Science, University of Waikato, Hillcrest, New Zealand, 1999.

9 Authors

N. Chitra is a research scholar at Presidency University with an interest in wireless communications based on mobile networks. She holds a bachelor's degree in electronics and communication engineering and a master's degree in applied electronics from the electronics and communication engineering department, and is presently pursuing a Ph.D. in wireless communication using machine learning in the same department. Her main interests are mobile technology, its network communications and the associated traffic problems (email: chitravadde24@gmail.com).

Dr. Safinaz S. is an Assistant Professor at Presidency University. She received her Ph.D. from Visvesvaraya Technological University (VTU), Belagavi, in 2019, completed her M.Tech in Electronics at Sir MVIT in 2008 and her B.E. in Telecommunication at Sir MVIT in 2004.
She started her teaching career in 2004 at Sir MVIT and has a total of 17 years of teaching experience. She has contributed to peer-reviewed journals and conferences of national and international repute and has guided many award-winning projects for undergraduate and postgraduate students. She is a domain expert in image and video processing and in FPGA implementation using various toolkits (email: safinazs@presidencyuniversity.in).

Dr. K. Bhanu Rekha is an Assistant Professor at Presidency University. She completed her B.E. in Telecommunication at RVCE in 2002 and her M.Tech in Electronics in 2009, both in Bangalore, and her Ph.D. in Image and Video Processing in 2019. She started her career as a Network Engineer at Accenture and switched to teaching in 2005. She has a total of 17 years of teaching experience and 9 publications in international journals and in international and national conferences of good repute. Her research interests include signal processing, image and video processing, Verilog coding and FPGA implementation (email: bhanurekha@presidencyuniversity.in).

Article submitted 2022-10-29. Resubmitted 2022-12-06. Final acceptance 2022-12-12. Final version published as submitted by the authors.