FACTA UNIVERSITATIS 
Series: Electronics and Energetics Vol. 32, No 2, June 2019, pp. 315-330

https://doi.org/10.2298/FUEE1902315O 

FEATURE SELECTION FOR INTRUSION DETECTION SYSTEM IN A CLUSTER-BASED HETEROGENEOUS WIRELESS SENSOR NETWORK

Opeyemi Osanaiye¹, Olayinka Ogundile², Folayo Aina³, Ayodele Periola⁴

¹ Department of Telecommunication Engineering, Federal University of Technology, Minna, Niger State, Nigeria
² Department of Physics and Telecommunications, Tai Solarin University of Education, Ogun State, Nigeria
³ Department of Telecommunication Science, University of Ilorin, Ilorin, Kwara State, Nigeria
⁴ Electrical Electronics and Computer Engineering, Bells University of Technology, Ota, Nigeria

Abstract. Wireless sensor networks (WSNs) have become one of the most promising networking solutions, with exciting new applications for the near future. Notwithstanding their resource constraints, WSNs have continued to enjoy widespread deployment. Security in WSNs, however, remains an ongoing research topic, as the deployed sensor nodes (SNs) are susceptible to various security challenges due to their architecture, hostile deployment environment and insecure routing protocols. In this work, we propose a feature selection method that combines three filter methods, gain ratio, chi-squared and ReliefF (triple-filter), in a cluster-based heterogeneous WSN prior to classification. This increases the classification accuracy and reduces system complexity by extracting 14 important features from the 41 original features in the dataset. An intrusion detection benchmark dataset, NSL-KDD, is used for performance evaluation by considering the detection rate, accuracy and false alarm rate. The results obtained show that our proposed method can effectively reduce the number of features while achieving a high classification accuracy and detection rate in comparison with other filter methods. In addition, the proposed feature selection method tends to reduce the total energy consumed by SNs during intrusion detection as compared with other filter selection methods, thereby extending the network lifetime and functionality for a reasonable period.

Key words: Chi-squared, cluster, Gain ratio, intrusion detection, NSL-KDD, ReliefF, WSNs 

                                                           
Received January 16, 2019; received in revised form March 9, 2019 
Corresponding author: Opeyemi Osanaiye

Department of Telecommunication Engineering, Federal University of Technology, Minna, Niger State, Nigeria 

(E-mail: opyosa001@myuct.ac.za) 


316 O. OSANAIYE, O. OGUNDILE, F. AINA, A. PERIOLA 

1. INTRODUCTION 

Wireless sensor networks (WSNs) are formed by sets of distributed autonomous devices with the capability to sense, process, transmit and receive observed or measured conditions. The sensor nodes (SNs) used in WSNs are characterized by their light weight, limited processing power, limited energy, low storage capacity, short communication range and low bandwidth [1]. The sensor component of the SN measures the observed condition of a particular situation or physical surroundings, while the microprocessor ensures that the obtained information is intelligently computed [2]. The wireless radio of the node, on the other hand, ensures communication between neighbouring nodes.

WSNs are often deployed in remote, harsh and unattended environments over a certain period of time. These locations are usually inaccessible; it is therefore impractical to carry out maintenance on the nodes after installation. Common applications include environmental monitoring, aircraft control, disaster control, medical health monitoring, surveillance and military applications, among many others [3].

Although WSNs have been used in numerous applications, the requirements of these applications have put many constraints on their design and deployment. Security has been identified in the literature as one of the main constraints in the deployment of WSNs. This is evident as WSNs are subject to the vulnerabilities associated with wireless communication. Additionally, in unprotected and hostile outdoor environments, WSNs are prone to different types of attack that compromise the confidentiality, integrity, authentication and availability of the data traffic and the battery life of the SNs [4,5]. Many of these attacks have been identified, analysed and discussed in the literature, with authors proffering different defence and prevention techniques. One such attack is the denial of service attack, which can also be referred to as a packet drop attack or sinkhole attack [6]. The blackhole attack in WSNs is also a type of denial of service attack, in which the attacker advertises itself as either the destination node or the shortest route to the destination. Upon receiving the packets that other nodes send in response to this false advertisement, the attacker discards all of them. Selective forwarding is a derivative of the blackhole attack in which the adversary node does not discard all received packets; instead, it randomly selects the packets to be discarded [7]. The adversary can use this to evade detection.

In order to protect WSNs from intrusion by an adversary, various intrusion detection systems (IDSs) have been proposed by researchers. These IDS defence solutions are categorized into signature-based and anomaly-based. The former relies on signatures of known attack patterns, while the latter profiles a statistical usage model over a certain amount of time to classify data packets as either normal or anomalous using techniques such as data mining, machine learning and statistical modelling. Signature-based detection has the major flaw of not being able to detect unknown attacks, while anomaly-based detection suffers from a high false positive rate [8]. This has necessitated the emergence of hybrid solutions that use the complementary features of both techniques to achieve a higher detection rate. The main challenges facing most of these security solutions in WSNs are the limited storage capacity, computational resources and battery power of the SNs. Therefore, traditional security solutions are inappropriate for WSNs.

Due to the resource limitation in WSN environment, proposed IDS designs are often 

lightweight and highly specialized by type of attack to reduce false alarms. Computational 

Intelligence IDS improves its performance by providing features such as learning, 

reasoning, perception, evolution and adaptation [5]. These features can be explored to 


 Feature Selection for Intrusion Detection System in a Cluster-Based Heterogeneous Wireless Sensor Network 317 

 
develop a more robust IDS that is adaptive to different application scenarios, to handle 

unknown attacks. 

In this work, we introduce a pre-processing phase in the form of feature selection by combining three filter feature selection methods: gain ratio, chi-squared and ReliefF, herein called triple-filter, to select a one-third split (the 14 most important features) from the original dataset before classifying with a decision tree algorithm. The motivation behind feature selection is the resource constraint of SNs: eliminating redundant features before applying machine learning reduces the complexity of the proposed system. The intrusion detection benchmark dataset NSL-KDD, which consists of 41 features [9], was used to evaluate the performance of the IDS by considering the detection rate, classification accuracy and false alarm rate in the Waikato Environment for Knowledge Analysis (Weka). Furthermore, we compared our results with the work proposed in [10]. The results obtained show that our proposed method can effectively reduce the number of features while achieving a high classification accuracy and detection rate and a low false alarm rate as compared with [10].

The contribution and relevance of this paper are as follows. We introduce a pre-processing phase in the form of feature selection, similar to our approach in [11]. However, here we combine three filter feature selection methods, herein called triple-filter. This is used to select the 14 most important features in NSL-KDD for intrusion detection in WSNs, which reduces the complexity of the IDS by presenting a lightweight technique. Reduced IDS complexity implies that the SNs in a WSN will consume less energy while maintaining high availability. Since the SNs are battery powered, prolonging the network lifetime and functionality for a reasonable time is paramount. Thus, our proposed IDS defence solution is suitable for use in a real-time WSN, as it helps to efficiently extend the network lifetime and functionality.

The rest of the paper is structured as follows. Section 2 describes related work on IDS defence solutions for WSNs. In Section 3, the WSN architecture and the proposed IDS are discussed. The section also explains the three filter feature selection methods, gain ratio, chi-squared and ReliefF, in detail. The feature selection and execution process is highlighted in Section 4, while Section 5 presents the experimental results. Section 6 highlights the performance measures with respect to the classification accuracy, detection rate and false alarm rate, while we discuss the results in Section 7. Finally, Section 8 concludes the work and suggests possible research directions.

2. RELATED WORK 

In defending against malicious attacks in WSN, various intrusion detection approaches 

have been proposed in the literature. An intelligent intrusion and prevention system was 

proposed in [1] by introducing a specialized dataset for WSN. This improves the detection and 

classification of four types of denial of service (DoS) attacks: Blackhole, Grayhole, Flooding, 

and Scheduling attacks. Artificial Neural Network (ANN) was used to train the dataset to 

detect and classify the different DoS attacks. Results from the work show that the dataset, 

WSN-DS, enhanced the IDS ability to achieve a higher classification accuracy rate. An IDS 

based on evidence theory was proposed in [12] for cluster-based WSN. In this work, each 

cluster head collects the behavioural pattern of its cluster members before constructing an 

input evidence according to the deviation from the normal pattern. A weight value is further 



applied to represent the importance of each behaviour characteristic and revise the evidence before its synthesis. A hybrid IDS that enhances security in cluster-based WSNs has been proposed in [13]. In this work, the proposed IDS is deployed on the cluster head and consists of both an anomaly and a misuse module. The outputs of the anomaly and misuse modules are integrated by a decision-making module to identify the presence of an attack before subsequently classifying it into different attack types. In [14], a distributed two-layer and three-layer IDS scheme was proposed for WSNs to detect intrusion using 10% of the data to learn during the training phase. A complexity reduction process was introduced to select the features to minimize the energy consumption.
A specification-based intrusion detection system was proposed in [15]. This system uses a rule-based technique to map behaviours to either normal or anomalous. The rule-based technique optimizes the local information obtained by watchdogs into global information for decision making by the cluster heads. This compensates for the communication pattern in the network. In [16], a decentralized IDS was proposed for WSNs. The proposed algorithm is divided into three phases: data acquisition, rule application and intrusion detection. In the data acquisition phase, messages are obtained in promiscuous mode, and the relevant information is filtered and subsequently stored for analysis. The rule application phase, on the other hand, processes the information and applies the rules to the stored data. If a message fails the test during analysis, a failure is raised. Lastly, in the intrusion detection phase, the number of raised failures is compared with the expected number of occasional failures in the network. An intrusion alarm is raised if the former is higher than the latter.

In [10], an integrated intrusion detection system (IIDS) was proposed for cluster-based WSNs. The IIDS was based on an earlier work in [17] and consists of three individual IDSs, namely: the intelligent hybrid intrusion detection system (IHIDS), the hybrid intrusion detection system (HIDS) and the misuse IDS. These IDSs are designed for the base station (BS), cluster heads and cluster members, respectively, based on their capacity and the type of attack they are vulnerable to. For example, the IHIDS, which has a learning capability, is deployed in the BS. The IHIDS combines anomaly and misuse detection by first filtering out a large number of normal packets. The remaining packets are then forwarded to the misuse detection module to identify the type of attack. This is done to achieve a high detection rate with a low false alarm rate. The cluster heads, on the other hand, house the HIDS, which is similar to the IHIDS but without a learning ability. The HIDS functions to optimally detect attacks; however, it retrains on the behaviour of new attacks previously detected and classified by the IHIDS. Lastly, due to the resource constraints of SNs, the misuse IDS is proposed for the cluster members. The misuse IDS uses a predetermined attack model to match packets in order to detect attacks. Experimental results for the misuse detection, using a back propagation network and the KDDCup'99 dataset, show that a detection rate of 90.96% was achieved with an accuracy of 99.75% and a false positive rate of 2.06%.

In the discussion above, different techniques have been considered for feature selection in WSNs. The overall aim of these techniques is to enhance the ability of SNs to differentiate attacks in WSNs. The performance of a security mechanism designed in this manner can be influenced by the number of features of the dataset, and different kinds of feature selection methods achieve varying results. Because of the resource limitation characterizing WSNs, a combination of different feature selection methods that considers these resource constraints is required. A strategy that uses multiple algorithms and harnesses their features will be advantageous in classifying the type of attack in WSNs.



 
Considering the resource limitation that characterizes WSNs, this work proposes a feature selection method that combines the trio of gain ratio, chi-squared and ReliefF (triple-filter) to select a one-third split (14 features) from the initial 41 features of the dataset. This will significantly reduce the complexity of the IDS and minimize the energy consumed during intrusion detection. Moreover, this filter feature selection method offers a high detection rate with good classification accuracy and a low false alarm rate, as shown in Table 4.

3. WSN ARCHITECTURE AND PROPOSED IDS 

A WSN deployment is often made up of tens to hundreds of thousands of autonomous SNs that function via member node communication. This is necessary because a single sensor node covers only a small area and can therefore provide only limited information. This limitation of single-node deployment has led to the introduction of networks of SNs that are self-organising and collaborative, to achieve wider coverage over a large environment. The SNs monitor, sense, compute and transmit the observed and measured condition of the environment to relay the information to the intended user through the base station. A typical sensor node consists of sensor components, microprocessor components and a wireless radio. The sensor component measures the condition of the observed environment of interest, while the microprocessor component embedded in the node is used to intelligently compute the obtained information [1]. The wireless radio component of the sensor node is used to initiate communication between neighbouring sensor nodes in the WSN. A significant benefit of sensor network deployment is its ability to extend its coverage area to environments that are nearly impossible for human beings to access.

WSNs can be categorized by the environment in which the sensor nodes are deployed. The work in [18] described five types of WSN, namely: underground WSN, terrestrial WSN, underwater WSN, multi-media WSN and mobile WSN.

In underground WSN deployment, sensor nodes are buried under the surface of the ground to monitor and sense its condition. These sensor nodes transmit the sensed information to the sink node, which is placed above the ground, to relay it to the base station. Terrestrial WSNs, on the other hand, consist of several cheap sensor nodes deployed on a specific area of interest on the surface of the earth in a pre-planned or ad hoc way. Pre-planned deployment involves the optimal placement of sensor nodes, such as grid placement and the 3-D placement model [19], while in ad hoc deployment, sensor nodes are randomly deployed. In underwater WSN deployment, the sensor nodes are deployed under a water body to sense, explore and gather information about a subject matter and transmit this information using acoustic waves [20]. Underwater WSNs present a sparse sensor node deployment as compared to the dense deployment of terrestrial WSNs. Multi-media WSNs consist of sensor nodes equipped with cameras and microphones to ensure the efficient monitoring and tracking of multi-media events, such as imaging, audio and video [21]. Here, the sensor nodes interconnect over a wireless medium to retrieve, process, compress and convey sensed data in a pre-planned arrangement to ensure coverage. One major obstacle to the deployment of multi-media WSNs is the resource challenge of sensor nodes, due to the excessive energy consumption during compression and decompression when transmitting multi-media events. Finally, mobile WSNs are sets of deployed sensor nodes that move and interact with the physical environment. Just as with static WSN,



mobile nodes can sense, compute, transmit and receive observed and measured events. The 

sensor nodes have the potential to reorganise and reposition themselves after deployment to 

obtain information.  The obtained information can be distributed among other mobile nodes 

within their communication range using dynamic routing protocol. 
WSNs can be further classified according to the structure and uniformity of the deployed sensor nodes. Some deployments consist of uniform nodes with equal capacity, while others consist of nodes of different sizes and capacities, depending on the architecture. In WSNs, the network structure (topology) can be categorized into two types, namely: flat-based and hierarchical [22]. The flat-based topology consists of sensor nodes with equal capacity playing similar roles, such as monitoring and sensing events, computing the sensed information and transmitting it directly or via multi-hop routing towards the BS [23]. On the other hand, hierarchical WSNs are designed to distribute the sensing and monitoring functions of the SNs into different levels. Cluster-based WSNs are a typical example of hierarchical WSNs. In this paper, we limit our scope to cluster-based WSNs.

Arranging SNs into clusters has been widely employed by researchers to efficiently sense and monitor a particular environment. The clustering technique is widely used in WSNs because it offers advantages such as reduced energy consumption, fault tolerance, scalability, efficient data aggregation, latency reduction and robustness [3,24]. A clustered WSN comprises two sets of nodes, namely: the member nodes, known as the non-cluster head nodes, and the coordinating nodes, often referred to as the cluster heads. Fig. 1 shows a typical example of a cluster-based WSN, where c represents a cluster. As shown in Fig. 1, the non-cluster head nodes forward the sensed messages to their respective cluster heads in a process known as intra-cluster communication. The cluster heads organise the messages from their respective members before transmitting them to the BS. Thus, a clustered WSN can be regarded as a two-layer hierarchical WSN, where the cluster heads work in the upper layer and the non-cluster head nodes operate in the lower layer. The coordinating nodes in most cases perform more functions than the lower layer nodes. Therefore, the cluster head nodes are sometimes equipped with a better processing subsystem, sensing unit, radio subsystem and power supply unit than the lower layer nodes. If the components of all the sensor nodes in the network are the same, the network is usually referred to as a homogeneous clustering WSN. Otherwise, it is referred to as a heterogeneous clustering WSN.

In this work, we assume that the cluster heads are equipped with a better processing subsystem, sensing unit, radio subsystem and power supply unit. Accordingly, our proposed IDS is deployed on the cluster heads for intrusion detection. The cluster heads will monitor the SNs to detect attacks. Furthermore, the cluster heads will filter out abnormal data and forward all the reliable sensed information to the BS, either directly or via one or more relay nodes. From the literature, a relay node can be either a cluster head node or a non-cluster head node [3]. Since our proposed IDS is installed only on the cluster head nodes, we assume that the relay nodes towards the BS can only be cluster head nodes in order to maintain high availability. Moreover, the IDS is deployed only on the cluster head nodes to conserve the battery energy of the non-cluster head nodes, which in turn prolongs the network lifetime and functionality.
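The role of the cluster head described above can be illustrated with a minimal sketch. The class names, the `is_anomalous` hook and the `packet_rate` threshold rule are hypothetical stand-ins chosen for illustration; in the proposed system the decision would come from the trained IDS classifier rather than a fixed rule.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Reading = Dict[str, float]  # one sensed message, as feature name -> value


@dataclass
class BaseStation:
    """Collects the readings that cluster heads consider reliable."""
    received: List[Reading] = field(default_factory=list)

    def receive(self, reading: Reading) -> None:
        self.received.append(reading)


@dataclass
class ClusterHead:
    """Hosts the IDS: drops anomalous readings, forwards the rest."""
    is_anomalous: Callable[[Reading], bool]  # hypothetical hook for the classifier
    dropped: int = 0

    def handle(self, reading: Reading, base_station: BaseStation) -> None:
        if self.is_anomalous(reading):
            self.dropped += 1          # abnormal data is filtered out
        else:
            base_station.receive(reading)  # reliable data goes to the BS


def toy_detector(reading: Reading) -> bool:
    # Assumed toy rule, NOT the paper's classifier: flag very high packet rates.
    return reading["packet_rate"] > 100.0


bs = BaseStation()
ch = ClusterHead(is_anomalous=toy_detector)
for r in [{"packet_rate": 12.0}, {"packet_rate": 450.0}, {"packet_rate": 30.0}]:
    ch.handle(r, bs)
```

Keeping the detector behind a single callable mirrors the design choice in the text: only the cluster head carries the detection logic, so member nodes spend no energy on classification.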

Finally, the BS integrates all the collected information and transmits the final result to 
the end user. This proposed IDS defence solution can be deployed with relevant energy-
efficient and energy-balanced clustering routing protocols such as [25, 26, 27, 28, 29]. 
However, in this paper, we verify our proposed IDS solution with the routing algorithm 
proposed in [26]. 



 
Fig. 1 Typical example of a cluster-based WSN 

In this section, we present a detailed explanation of our proposed IDS. Current feature selection methods can be categorised into filter, wrapper and embedded methods. While wrapper and embedded methods are time consuming and require specific classification techniques to determine the importance of a feature subset, filter methods rely on the general attributes of the dataset to carry out data pre-processing, a step which is independent of the induction algorithm [11]. Furthermore, filter methods can be classified into univariate and multivariate techniques. Univariate techniques, such as information gain, are efficient and scalable; however, they tend to disregard feature dependencies. Multivariate filter techniques, on the other hand, incorporate feature dependencies, which makes them more complex. Systems that use multivariate techniques are less scalable and have a longer computational time than systems incorporating univariate techniques.

In this work, we combine three filter selection methods, gain ratio, chi-squared and ReliefF, herein referred to as the triple-filter method. These filter methods were chosen for their ranking and space searching algorithms. Furthermore, research has shown that combining feature selection methods can improve the performance of classifiers by identifying features that are weak individually but strong as a group [31]. Our proposed triple-filter method relies on the combined strength of the trio to determine the features that are strong in determining the output class. Here, we select the 14 most important features.
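One possible way to combine three rankings is sketched below. The combination rule is an assumption made for illustration, since this section does not state it: each filter contributes its top-k features, and a feature is kept if at least two of the three filters select it. The scores are invented toy values; only the feature names are borrowed from NSL-KDD.

```python
from collections import Counter
from typing import Dict, List


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k highest-scoring features (ties broken by name)."""
    return sorted(scores, key=lambda f: (-scores[f], f))[:k]


def triple_filter(score_tables: List[Dict[str, float]], k: int,
                  min_votes: int = 2) -> List[str]:
    """Keep features chosen by at least `min_votes` filters (assumed rule)."""
    votes: Counter = Counter()
    for scores in score_tables:
        votes.update(top_k(scores, k))
    return sorted(f for f, v in votes.items() if v >= min_votes)


# Toy scores for illustration only; not results from the paper.
gain_ratio_scores = {"src_bytes": 0.9, "flag": 0.7, "duration": 0.2, "urgent": 0.1}
chi_squared_scores = {"src_bytes": 120.0, "flag": 30.0, "duration": 80.0, "urgent": 1.0}
relieff_scores = {"src_bytes": 0.30, "flag": 0.25, "duration": 0.05, "urgent": 0.01}

selected = triple_filter(
    [gain_ratio_scores, chi_squared_scores, relieff_scores], k=2
)
```

With the 41 NSL-KDD features, `k` would be set to 14 to match the one-third split used in this work.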

A. Gain Ratio 

In filter feature selection, the value of gain ratio is large when data are evenly spread, and small when all data belong to only one branch of an attribute. Gain ratio is an improvement on information gain that remedies the latter's bias towards features with a large diversity of values. It takes the number and size of branches into account when choosing an attribute, and corrects information gain by using intrinsic information [30]. Intrinsic information is the entropy of the distribution of instance values for a given feature. Gain ratio can be calculated [30] for a given feature $x$ and a feature value $y$ using equation (1) below:

$$\text{Gain Ratio}(x, y) = \frac{\text{Information Gain}(x, y)}{\text{Intrinsic Value}(x)} \qquad (1)$$

where

$$\text{Intrinsic Value}(x) = -\sum_{i} \frac{|S_i|}{|S|} \log_2 \frac{|S_i|}{|S|}$$

$|S|$ is the number of possible values feature $x$ can take, while $|S_i|$ is the number of actual values of feature $x$. In our work, we select the 14 features from the NSL-KDD dataset that are ranked highest by gain ratio.
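The gain ratio computation can be sketched for a single discrete feature as follows. This is a minimal illustration under a stated assumption: the intrinsic value is computed as the entropy of the feature's own value distribution, with each subset taken as the instances that share one feature value. The function names and toy data are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(values).values())


def gain_ratio(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Information gain of `feature` about `labels`, divided by intrinsic value."""
    total = len(labels)
    # Group the class labels by the value the feature takes.
    parts = {}
    for value, label in zip(feature, labels):
        parts.setdefault(value, []).append(label)
    conditional = sum(len(p) / total * entropy(p) for p in parts.values())
    info_gain = entropy(labels) - conditional
    intrinsic = entropy(feature)  # entropy of the feature's value distribution
    # A feature with a single value carries no information; avoid dividing by 0.
    return info_gain / intrinsic if intrinsic > 0 else 0.0


labels = ["attack", "attack", "normal", "normal"]
separating = gain_ratio(["tcp", "tcp", "udp", "udp"], labels)
uninformative = gain_ratio(["tcp", "udp", "tcp", "udp"], labels)
constant = gain_ratio(["tcp", "tcp", "tcp", "tcp"], labels)
```

Ranking all features by this score and keeping the top 14 corresponds to the gain ratio part of the selection.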

B. Chi-Squared 

Chi-squared ($\chi^2$), in mathematical statistics, is a feature selection method that is often used to determine the worth of an attribute with respect to a particular class. Chi-squared can be used to test the independence of two variables with an initial hypothesis, $H_0$, which assumes that the two features are not related [30, 31]. This can be tested using the chi-squared formula:

$$\chi^2 = \sum_{i} \sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \qquad (2)$$

where $O_{ij}$ is the actual value and $E_{ij}$ is the predicted value declared by the hypothesis $H_0$. The higher the value of chi-squared, the stronger the evidence against the null hypothesis.
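The chi-squared statistic can be illustrated on a contingency table of feature value against class. In this minimal sketch the expected counts are derived from the row and column totals under independence, which is the usual construction and is assumed here; the tables are toy values, and every row and column total is assumed to be non-zero.

```python
from typing import List


def chi_squared(observed: List[List[float]]) -> float:
    """Chi-squared statistic for a contingency table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o_ij in enumerate(row):
            # Expected count if feature value and class were independent.
            e_ij = row_totals[i] * col_totals[j] / grand_total
            statistic += (o_ij - e_ij) ** 2 / e_ij
    return statistic


# Rows: two values of a feature; columns: attack / normal (toy counts).
dependent = chi_squared([[30, 10], [10, 30]])
independent = chi_squared([[20, 20], [20, 20]])
```

A feature whose table looks like the first one would rank above a feature whose table looks like the second.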

C. ReliefF 

ReliefF is an extension of the earlier Relief algorithm, which randomly samples an instance from the dataset and locates its nearest neighbours from both the same and the opposite class [32]. The attribute values of the nearest neighbours are compared with those of the sampled instance and used to update the relevance score of each attribute. The idea behind this is that a significant attribute should have different values for instances that belong to different classes and the same value for instances belonging to the same class [32]. Key among the advantages of the ReliefF filter method are its ability to deal with multiclass problems and its robustness to noisy and incomplete data [33]. ReliefF can be applied in virtually all situations because of its low bias.
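The scoring idea can be sketched with the simpler Relief variant rather than full ReliefF. The simplifications are deliberate and assumed for illustration: two classes only, one nearest hit and one nearest miss per instance, numeric features already scaled to [0, 1], and a pass over every instance instead of random sampling so that the result is deterministic. ReliefF itself averages over k neighbours and weights the misses by class prior.

```python
from typing import List, Sequence


def relief_scores(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Simplified Relief relevance scores (each class needs >= 2 instances)."""
    n, d = len(X), len(X[0])
    weights = [0.0] * d

    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum(abs(p - q) for p, q in zip(a, b))  # Manhattan distance

    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        hit = min(hits, key=lambda j: dist(X[i], X[j]))     # nearest same-class
        miss = min(misses, key=lambda j: dist(X[i], X[j]))  # nearest other-class
        for f in range(d):
            # Reward features that differ on the miss, penalise those that
            # differ on the hit.
            weights[f] += (abs(X[i][f] - X[miss][f])
                           - abs(X[i][f] - X[hit][f])) / n
    return weights


# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.1], [0.1, 0.9], [0.9, 0.2], [1.0, 0.8]]
y = [0, 0, 1, 1]
scores = relief_scores(X, y)
```

On this toy data the separating feature receives a positive score and the noisy one a negative score, which is the behaviour the ranking relies on.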

4. FEATURE SELECTION AND EXECUTION PROCESS 

As depicted in Fig. 2, we divide our proposed IDS defence solution for cluster-based heterogeneous WSNs into three phases. The first phase in implementing a lightweight IDS is an initial pre-processing stage for the dataset prior to training. To achieve this, we use our proposed triple-filter method for ranking. Through ranking, the features that are strong in determining the output class of the dataset are obtained, and a one-third split of the ranked features (that is, 14 features) is selected. The one-third split was arrived at by ranking the features and eliminating redundant ones up to the point before the performance of the classifier starts to decline. The selected features represent the most significant features among all the filter methods. In the second phase, the training phase, the features selected after pre-processing the NSL-KDD dataset are used to train the IDS to detect possible attacks in the network. This is deployed on the cluster



 
head to monitor data from the sensor nodes to the base station. The final phase, the 

classification phase, is a process whereby a labelled training dataset is used to learn, 

before subsequently classifying a test data into one of the class labels [34].  

Anomaly detection techniques that use classification-based algorithms can be divided into two stages: the training stage and the testing stage. In the training stage, labelled data are used to train a particular classifier. Subsequently, this classifier can be used in the testing stage to classify a test instance as either normal or anomalous. In this work, we use a decision tree classification algorithm to detect the occurrence of a DoS attack.

Decision trees are a data mining approach, often called classifier trees or hierarchical classifiers, that is used for prediction. The method is popular because of its simple structure, its ease of understanding and the short time required to interpret it [35]. During the classification process, the degree of adjustment of the model to the training set is essential. A tight stopping criterion often creates a small, under-fitted decision tree, while a loose stopping criterion, on the other hand, gives a larger decision tree that tends to over-fit the training dataset.

Decision trees have been embraced for classification and data analysis in fields such as agriculture, environmental science and health. Decision trees are recursive partition models that use a single variable to divide the dataset at each level. Initially, all cases are defined to belong to the same class before a variable is selected, using a split criterion, to determine the attribute to insert in a node and branch. A decision tree consists of a set of rules in which each tree node is labelled with an attribute variable that creates a branch for each of its values. It is represented by a tree-like structure, with the leaf nodes labelled with a class label [36].

C4.5 and C5.0 have been developed as advanced versions of the original ID3 (Iterative Dichotomiser 3) algorithm [35]. Over the years, the C4.5 algorithm has been used in the literature as the standard model for supervised learning. During a classification process, a training dataset is used to train the decision tree algorithm while a test dataset is used to validate the model. Given a new sample from a test dataset, a prediction can be made on the state of the class variable by following the path of the tree from the root to a leaf node according to the tree structure and the sample values.

For example, let us consider a set $S$ and select a case at random belonging to class $C_i$. The probability $P_i$ that the random sample belongs to the class $C_i$ is found using the equation [37]:

$$P_i = \frac{\mathrm{freq}(C_i, S)}{|S|} \qquad (3)$$

where $|S|$ denotes the number of samples contained in the set $S$. The information conveyed can therefore be represented by $-\log_2 P_i$ bits. Let $P = \{P_1, P_2, \ldots, P_n\}$, where $P$ is the probability distribution. The entropy of $P$, which is the information conveyed by the distribution, can be expressed as follows:

$$\mathrm{Info}(P) = -\sum_{i=1}^{n} P_i \log_2 P_i \qquad (4)$$

where $n$ is the number of probabilities in $P$. When a set $T$ of samples is segmented by using a non-categorical attribute $X$, we have a set $\{T_1, T_2, \ldots, T_m\}$, where $m$ is the number of subsets. The weighted average is the information used in determining the class of an element of $T$ and can be determined using the formula:



      (   )   ∑
    

      (   

 
   ) (5) 

Therefore, the information gain can be computed as follows: 

      (   )       ( )      (   ) (6) 

The Eqn. 6 above expresses the difference between the information required to 

determine the value of an element of    and the information required to determine    
having obtained the value of the attribute  . This is therefore referred as the information 
gain due to attribute X. In this work, we use J48 decision tree classification algorithm, a 

version of the C4.5 for classification. 
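As a rough illustration of Eqns. 4 to 6, the following Python sketch computes the entropy of a label list, the weighted entropy after partitioning on one attribute, and the resulting information gain. The function names and the four-sample toy data are assumptions made for this example; they are not part of Weka or of the dataset used in this work.

```python
from collections import Counter
from math import log2


def info(labels):
    """Entropy of the class distribution in a list of labels (Eqn. 4)."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def info_x(values, labels):
    """Weighted average entropy after partitioning on an attribute (Eqn. 5)."""
    total = len(labels)
    subsets = {}
    for v, y in zip(values, labels):
        subsets.setdefault(v, []).append(y)
    return sum(len(s) / total * info(s) for s in subsets.values())


def gain(values, labels):
    """Information gain due to the attribute (Eqn. 6)."""
    return info(labels) - info_x(values, labels)


# Made-up toy data: four samples and two candidate attributes.
labels = ["attack", "attack", "normal", "normal"]
perfect = ["a", "a", "b", "b"]  # separates the two classes completely
useless = ["a", "b", "a", "b"]  # carries no information about the class
print(gain(perfect, labels), gain(useless, labels))
```

An attribute that separates the classes completely recovers the full entropy of the label set as gain, while an attribute that is independent of the class yields a gain of zero.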

 
Fig. 2 Proposed Intrusion Detection Model 

5. EXPERIMENTAL RESULTS 

In this work, we use a combination of three filter methods during the pre-processing stage to select features from the labelled NSL-KDD dataset. The features most relevant to determining the output class are ranked and selected for use by the machine learning algorithm, which classifies traffic packets as either normal or anomalous. Weka [38], a machine learning tool that provides a collection of machine learning algorithms, is used for our experimental analysis. During classification, the parameters of Weka are left at their default values.
During evaluation, we determine the performance of our proposed triple-filter method using the NSL-KDD dataset. We chose NSL-KDD because it is open source and readily available online. Furthermore, NSL-KDD can be modified to suit different experimental attack scenarios in WSN. NSL-KDD is a labelled benchmark dataset derived from the original KDDCUP’99 dataset, which had some shortcomings. NSL-KDD consists of 41 features and 2 classes, labelled as either attack or normal. The features in the dataset are categorized into four groups, namely basic features, content features, time-based traffic features and connection-based traffic features [9]. The attacks in the dataset are grouped into DoS, Probe, R2L and U2R, and are divided between a training set and a test set. The training set contains 21 attack types, while the test set contains an additional 17 attack types that do not appear in the training set [9]. In this work, we have modified the dataset and extracted the DoS attack trace. DoS is one of the most prevalent attacks on resource-constrained sensor nodes in WSN; it depletes their energy and causes a denial of service. DoS attacks on systems are often carried out using similar methods; however, their impact on different hosts varies.

The feature selection process determines the one-third split (the 14 highest-ranked of the 41 features) of the NSL-KDD dataset for each filter method used in our proposed triple-filter method, as shown in Table 1. This experiment is performed on an HP machine running 64-bit Windows 10, with an Intel(R) Core(TM) i7-4700MQ CPU and 8 GB of RAM. We use 10-fold cross-validation to estimate the performance of our proposed classifier. In 10-fold cross-validation, the data are split into 10 folds of equal size; training and validation are then repeated 10 times, each time holding out a different fold for validation and training on the remaining nine.
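The fold structure of 10-fold cross-validation can be sketched as follows. This is a minimal plain-Python illustration, not Weka's implementation: the helper name `k_fold_indices`, the shuffling seed and the sample count of 105 are assumptions, and Weka's own fold assignment may differ.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# Hypothetical dataset of 105 samples split into 10 folds.
splits = list(k_fold_indices(n_samples=105, k=10))
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))
```

Every sample is used for validation exactly once and for training in the other nine iterations, and the reported performance is typically the average over the ten iterations.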

Table 1 Feature Selection using Filter Method

Filter method   Features selected
Gain Ratio      12, 26, 4, 25, 39, 6, 30, 38, 5, 29, 3, 37, 34, 33
Chi-squared     5, 3, 6, 4, 29, 30, 33, 34, 35, 12, 23, 38, 25, 39
ReliefF         3, 29, 4, 32, 38, 33, 39, 12, 36, 23, 26, 34, 40, 31

From Table 1, it can be seen that each filter method has ranked the features of the dataset according to their strength in determining the class. We attach a weight to each ranking position and cumulatively sum the weights of each feature across the three filter methods; the features with the highest totals are the strongest. Table 2 presents the output of our triple-filter method, that is, the fourteen most important features. These fourteen features are used as the input for training the decision tree classifier, J48, in Weka.

Table 2 Triple-Filter Feature Selection Method

Filter method   Features selected
Triple-filter   3, 4, 29, 33, 34, 39, 12, 5, 30, 38, 26, 25, 23, 6
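The exact weights attached to each ranking position are not stated here, so the following Python sketch uses a simpler stand-in: each filter method casts one vote for every feature in its top-14 list, and a feature is kept if at least two of the three methods vote for it. This voting rule is an assumption made for illustration, not the weighting scheme itself. With the lists in Table 1 it yields the same set of 14 features as Table 2, although not in the same order.

```python
from collections import Counter

# Top-14 feature lists copied from Table 1.
gain_ratio = [12, 26, 4, 25, 39, 6, 30, 38, 5, 29, 3, 37, 34, 33]
chi_squared = [5, 3, 6, 4, 29, 30, 33, 34, 35, 12, 23, 38, 25, 39]
relieff = [3, 29, 4, 32, 38, 33, 39, 12, 36, 23, 26, 34, 40, 31]

# Assumption: one vote per list a feature appears in (not the paper's weights).
votes = Counter(gain_ratio + chi_squared + relieff)
selected = sorted(f for f, v in votes.items() if v >= 2)

# Triple-filter output copied from Table 2, for comparison.
table_2 = [3, 4, 29, 33, 34, 39, 12, 5, 30, 38, 26, 25, 23, 6]
print(selected)
print(sorted(table_2) == selected)
```

Eight of the selected features appear in all three lists and six appear in exactly two, which accounts for the fourteen features retained.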

6. PERFORMANCE MEASURE 

During the evaluation of a classifier, different metrics such as classification accuracy, detection rate and false alarm rate can be used. These metrics depend on the counts of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN). TP are instances where attack packets are correctly classified as attacks, while FP occur when normal packets are misclassified as attacks (false alarms). TN are instances where normal packets are correctly classified as normal, whereas FN are instances where packets are classified as normal when they are in fact attacks. Recently developed IDS for detecting attacks in WSN require a relatively high detection rate with a low false alarm rate. In this work, we consider the classification accuracy, detection rate and false alarm rate of our triple-filter method. We compare these metrics with those obtained using the full dataset containing all the features and using each of the individual filter methods, with the J48 classifier in every case. The metrics used for comparison are defined as follows.

1. Classification accuracy: This is defined as the ratio of the correctly classified data to the entire dataset, expressed as a percentage. The accuracy of a proposed technique can be derived using the formula:

$$CA = \frac{TP + TN}{TP + TN + FP + FN} \times 100\% \qquad (7)$$

2. Detection rate: This is the proportion of attack data that is correctly detected. It is based on the confusion matrix and can be determined using the formula:

$$DR = \frac{TP}{TP + FN} \times 100\% \qquad (8)$$

3. False alarm rate: This is the proportion of normal data that is misclassified as attack during detection. The false alarm rate can be determined using the formula:

$$FAR = \frac{FP}{FP + TN} \times 100\% \qquad (9)$$

Table 3 presents the performance measure of our proposed IDS defence solution with 

respect to the classification accuracy, detection rate, and false alarm rate. 
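The three metrics can be computed directly from the confusion-matrix counts, as in the Python sketch below. It uses the standard definitions of accuracy, detection rate and false alarm rate that correspond to Eqns. 7 to 9. The function name `ids_metrics` and the counts are made up for illustration; they are not the counts behind Table 3.

```python
def ids_metrics(tp, fp, tn, fn):
    """Return (accuracy, detection rate, false alarm rate) as percentages."""
    ca = (tp + tn) / (tp + tn + fp + fn) * 100   # Eqn. 7
    dr = tp / (tp + fn) * 100                    # Eqn. 8
    far = fp / (fp + tn) * 100                   # Eqn. 9
    return ca, dr, far


# Made-up counts: 1000 attack packets and 1000 normal packets.
ca, dr, far = ids_metrics(tp=990, fp=5, tn=995, fn=10)
print(f"CA={ca:.2f}% DR={dr:.2f}% FAR={far:.2f}%")
```

Note that the detection rate depends only on the attack packets (TP and FN) and the false alarm rate only on the normal packets (FP and TN), so a classifier can improve one without improving the other.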

7. DISCUSSION 

Intrusion detection in WSN during an attack can further increase the complexity and resource consumption of the SNs. We therefore use filter methods for feature selection, which are fast and easy to interpret when compared to wrapper methods. However, previous research has shown that filter methods cannot identify features that are strong as a group but weak individually [39]. We have chosen to deploy our proposed IDS on the cluster heads because we assume that the cluster heads have better battery life and higher software and hardware capability than the other nodes. Fig. 3 shows the classification accuracy across the different filter feature selection methods and our triple-filter method.

 
Fig. 3 Classification accuracy for different filter methods (accuracy axis from 98.60% to 99.80%; series: Full set, Gain Ratio, Chi-squared, ReliefF, Triple-filter)

 
As shown in Fig. 3 and Table 3, our proposed method exhibits the best accuracy. It presents a slight improvement of 0.01 percentage points over the chi-squared filter method, which gives the second-best accuracy. Fig. 4 presents the detection rate across the different filter methods and our proposed triple-filter method. The result shows that our proposed filter method, with 14 selected features, offers the best detection rate in comparison with the other filter methods. As shown in Table 3 and Fig. 4, the triple-filter method offers a slight increase in detection rate of 0.02 percentage points over the next best filter feature selection method.

 
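Fig. 4 Detection rate for different filter methods (detection rate axis from 98.60% to 100.00%; series: Full set, Gain Ratio, Chi-squared, ReliefF, Triple-filter)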

For the false alarm rate, ReliefF presents the worst result of 0.87%, while the full dataset (with all the features) gives the best performance, 0.38%. Our proposed method presents a false alarm rate of 0.42%, as shown in Fig. 5. Although our proposed triple-filter method does not offer the best false alarm rate, it is still suitable for real-time WSNs because it offers good classification accuracy and detection rate at reduced complexity. Note that a lightweight IDS is an important requirement in order to prolong the lifetime and functionality of sensor networks.

 
Fig. 5 False alarm rate for different filter methods (false alarm rate axis from 0.00% to 1.00%; series: Full set, Gain Ratio, Chi-squared, ReliefF, Triple-filter)

Table 3 Performance comparison of the triple-filter with full dataset, gain ratio, chi-squared and ReliefF

Filter method   No. of features   Accuracy   Detection rate   False alarm rate
Full set        41                99.56%     99.49%           0.38%
Gain Ratio      14                99.60%     99.68%           0.47%
Chi-squared     14                99.66%     99.74%           0.41%
ReliefF         14                99.08%     99.02%           0.87%
Triple-filter   14                99.67%     99.76%           0.42%

Finally, we compare the triple-filter method with the similar work in [10]; Table 4 presents the performance comparison.

Table 4 Performance comparison of the triple-filter with the work in [10]

Filter method   Classifier   No. of features   Accuracy   Detection rate   False alarm rate
SVM-RFE [32]    BPN          24                99.75%     95.13%           2.06%
Triple-filter   J48          14                99.67%     99.76%           0.42%

As presented in Table 4, the triple-filter feature selection, with 14 features, presents an improvement in the detection rate and the false alarm rate as compared with the work in [10] using the NSL-KDD dataset. This shows the efficiency of our proposed triple-filter feature selection method in improving the detection rate of the decision tree classifier with minimal false alarms while conserving the limited resources of the sensor network.

8. CONCLUSION AND FUTURE WORK 

In this paper, we have proposed a combination of three filter feature selection methods, gain ratio, chi-squared and ReliefF, called triple-filter, to pre-process the dataset prior to attack classification. The proposed feature selection method is deployed in a heterogeneous cluster-based WSN, where the IDS is implemented on the cluster head nodes. The proposed IDS reduces the complexity of the system by selecting the important features in the dataset, reducing the number of features from 41 to 14 before classification with a decision tree algorithm, J48. The experimental results show improved accuracy and detection rate with the reduced feature set. Also, our proposed triple-filter feature selection method achieved higher accuracy and detection rate than the individual filter methods using the J48 classifier. In the future, we seek to extend our work to study the effect of our solution on homogeneous WSNs and to evaluate our proposed triple-filter feature selection with other classification algorithms.

REFERENCES 

 [1] I. Almomani, B. Al-Kasasbeh, M. AL-Akhras, “WSN-DS: A Dataset for Intrusion Detection Systems in 
Wireless Sensor Networks”, Journal of Sensors, pp. 1–16, 2016. 

 [2] C. O'Reilly, A. Gluhak, M. A. Imran, S. Rajasegarar, “Anomaly detection in wireless sensor networks in 

a non-stationary environment”, IEEE Communications Surveys & Tutorials, vol. 16, pp. 1413–1432, 
2014. 



 
 [3] O.O. Ogundile, A. S. Alfa, “A Survey on an Energy-Efficient and Energy-Balanced Routing Protocol for Wireless Sensor Networks”, Sensors, vol. 17, 1084, pp. 1–51, 2017.
 [4] O. Osanaiye, A. Alfa, “Denial of Service Defence for Resource Availability in Wireless Sensor 

Networks”, IEEE Access, vol. 6, pp. 6975–7004, 2018. 

 [5] H.M. Salmon, et al, “Intrusion detection system for wireless sensor networks using danger theory 
immune-inspired techniques”, International journal of wireless information networks, vol. 20, pp. 39–66, 

2013. 

 [6] V. F. Taylor, D. T. Fokum, “Mitigating black hole attacks in wireless sensor networks using node-
resident expert systems”, In Proceedings of the IEEE Wireless Telecommunications Symposium (WTS), 

pp. 1–7, 2014. 

 [7] S. Athmani, D.E. Boubiche, A. Bilami, “Hierarchical energy efficient intrusion detection system for black 
hole attacks in WSNs”, In Proceedings of the IEEE World Congress Computer and Information Technology 

(WCCIT), pp. 1–5, 2013. 

 [8] O. Osanaiye, R. Choo, M. Dlodlo, “Distributed Denial of Service (DDoS) Resilience in Cloud: Review 
and Conceptual Cloud DDoS Mitigation Framework”, Journal of Network and Computer Applications, 

vol. 69, pp. 1447–1465, 2016. 

 [9] M. Tavallaee, E. Bagheri, W. Lu, A. Ghorbani, “A detailed analysis of the KDD CUP 99 dataset”, In 
Proceedings of the Second IEEE Symposium on Computational Intelligence for Security and Defence 

Applications CISDA, pp. 1–6. 

 [10] S.-S. Wang, K.-Q. Yan, S.-C. Wang, C.-W. Liu, “An integrated intrusion detection system for cluster-based wireless sensor networks”, Expert Systems with Applications, vol. 38, pp. 15234–15243, 2011.

 [11] O. Osanaiye, H. Cai, K.K.R. Choo, A. Dehghantanha, Z. Xu and M. Dlodlo, “Ensemble-based multi-

filter feature selection method for DDoS detection in cloud computing” EURASIP Journal on Wireless 
Communications and Networking,  vol. 130, pp. 1–10, 2016. 

 [12] X. Deng, “An intrusion detection system for cluster based wireless sensor networks”, In Proceedings of the 16th IEEE International Symposium on Wireless Personal Multimedia Communications (WPMC), 2013, pp. 1–5.

 [13] K. Q. Yan, S.C. Wang, S.S. Wang, C.W. Liu, “Hybrid Intrusion Detection System for enhancing the 

security of a cluster-based Wireless Sensor Network”, In Proceedings of the 3rd IEEE International 
Conference on Computer Science and Information Technology (ICCSIT), vol. 1, 2010, pp. 114–118. 

 [14] K. Medhat, R.A. Ramadan, I. Talkhan, “Distributed Intrusion Detection System for Wireless Sensor 

Networks”, In Proceedings of the 9th IEEE International Conference on Next Generation Mobile 
Applications, Services and Technologies, 2015, pp. 234–239. 

 [15] M. Tiwar, K.V. Arya, R. Choudhari, K. S. Choudhary, “Designing intrusion detection to detect black 

hole and selective forwarding attack in WSN based on local information”, In Proceedings of the 4th 
IEEE International Conference on Computer Sciences and Convergence Information Technology 

ICCIT'09, 2009, pp. 824–828. 

 [16] A.P.R. Da Silva, et al, “Decentralized intrusion detection in wireless sensor networks” In Proceedings of 
the 1st ACM international workshop on Quality of service & security in wireless and mobile networks, 

2005, pp. 16–23. 
 [17] K. Jong, E. Marchiori, M. Sebag, A. van der Vaart, “Feature Selection in Proteomic Pattern Data with Support Vector Machines”, Symposium on Computational Intelligence in Bioinformatics and Computational Biology, pp. 41–48, 2004.
 [18] J. Yick, B. Mukherjee, D. Ghosal, “Wireless sensor network survey”, Computer networks, vol. 52, pp. 

2292–2330, 2008. 

 [19] I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, E. Cayirci, “A survey on sensor networks”, IEEE 
Communications magazine, vol. 40, pp. 102–114, 2002 

 [20] J. Heidemann, et al, “Research challenges and applications for underwater sensor networking” In 

Proceedings of the IEEE Wireless Communications and Networking Conference, WCNC, 2006, pp. 228–

235. 

 [21] I. F. Akyildiz, T. Melodia, K. R. Chowdhury, “A survey on wireless multimedia sensor networks”, 

Computer networks, vol. 51, pp. 921–960, 2007. 
 [22] A. Abduvaliyev, A.-S. K. Pathan, J. Zhou, R. Roman, W.-C. Wong, “On the vital areas of intrusion 

detection systems in wireless sensor networks”, IEEE Communications Surveys & Tutorials, vol. 15, pp. 

1223–1237, 2013. 
 [23] Y. Yu, K. Li, W. Zhou, P. Li, “Trust mechanisms in wireless sensor networks: Attack analysis and 

countermeasures”, Journal of Network and computer Applications, vol. 35, pp. 867–880, 2012. 



 [24] C.C. Su, K.M. Chang, Y.H. Kuo, M.F. Horng, “The new intrusion prevention and detection approaches 

for clustering-based sensor networks”, In Proceedings of the IEEE Wireless Communications and 
Networking Conference, vol. 4, 2015, pp. 1927–1932. 

 [25] M. H. Anisi, A. H. Abdullah, S. A. Razak, “Energy-efficient and reliable data delivery in wireless sensor 

networks”, Wireless Networks, vol. 19, pp. 495–505. 
 [26] P. Kuila, P.K. Jana, “Energy Efficient Load- Balanced Clustering Algorithm for Wireless Sensor 

Networks”, Procedia Technology, vol. 6, pp. 771–777, 2012. 

 [27] P. Kuila, S. K. Gupta, P.K. Jana, “A novel evolutionary approach for load balanced clustering problem 
for wireless sensor networks”, Swarm and Evolutionary Computation, vol. 12, pp. 48–56, 2013. 

 [28] P. Kuila, P.K. Jana, “Approximation schemes for load balanced clustering in wireless sensor networks”, 

Journal of Supercomputing, vol. 68, pp. 87–105, 2014. 
 [29] R. Xie, X. Jia, “Transmission-Efficient Clustering Method for Wireless Sensor Networks Using 

Compressive Sensing”, IEEE Trans. Parallel Distrib. Syst., vol. 25, pp. 806–815, 2014. 

 [30] O. A. Osanaiye, DDoS defence for service availability in cloud computing. Doctoral dissertation, 
University of Cape Town, 2016. 

 [31] V. Bolon-Canedo, N. Sanchez-Marono, A. Alonso-Betanzos, “A review of feature selection methods on 

synthetic data”, Knowledge and information systems, vol. 34, no. 3, pp. 483–519, 2013. 
 [32] C. J. Mantas, J. Abellan, “Credal-C4.5 Decision tree based on imprecise probabilities to classify noisy data”, Expert Systems with Applications, vol. 41, pp. 4625–4637, 2014.

 [33] H.F. Eid, A.E. Hassanien, T.H. Kim, S. Banerjee, “Linear correlation-based feature selection for network 
intrusion detection model”, In Advances in Security of Information and Communication Networks, pp. 

240–248, 2013. 

 [34] M.B. Yassein, Y. Khamayseh, M. AbuJazoh, “Feature Selection for Black Hole Attacks”, Journal of 
Universal Computer Science, vol. 22, no. 4, pp. 521–536, 2016. 

 [35] J. Gehrke, V. Ganti, R. Ramakrishnan, W.Y. Loh, “BOAT-optimistic decision tree construction” In ACM 

SIGMOD Record, vol. 28, pp. 169–180, 1999. 
 [36] N. Sanchez-Marono, A. Alonso-Betanzos, M. Tombilla-Sanroman, “Filter methods for feature selection - 

a comparative study”, Intelligent Data Engineering and Automated Learning-IDEAL, pp. 178-187, 2007. 

 [37] N. Sengupta, J. Sen, J. Sil, M. Saha, “Designing of on line intrusion detection system using rough set 
theory and Q-learning algorithm”, Neurocomputing, vol. 111, pp. 161-168, 2013. 

 [38] http://www.cs.waikato.ac.nz/ml/weka/, [Online] access 2nd August 2017. 

 [39] O. Osanaiye, R. Choo, M. Dlodlo, “Analysing feature selection and classification techniques for DDoS detection in cloud”, In Proceedings of the Southern Africa Telecommunication, pp. 198–203, 2016.