INTERNATIONAL JOURNAL OF COMPUTERS COMMUNICATIONS & CONTROL
Online ISSN 1841-9844, ISSN-L 1841-9836, Volume: 18, Issue: 1, Month: February, Year: 2023
Article Number: 4756, https://doi.org/10.15837/ijccc.2023.1.4756
CCC Publications

A Deep Learning Approach for Efficient Anomaly Detection in WSNs

Arul Jothi S, Venkatesan R

Arul Jothi S*
Department of Computer Science and Engineering
PSG College of Technology, Coimbatore, 641004, India
*Corresponding author: saj.cse@psgtech.ac.in

Venkatesan R
Department of Computer Science and Engineering
PSG College of Technology, Coimbatore, India
rve.cse@psgtech.ac.in

Abstract

Data reliability in Wireless Sensor Networks (WSNs) has a substantial influence on their smooth functioning and resource limitations. In a WSN, the data aggregated from clustered sensor nodes are forwarded to the base station for analysis. Anomaly Detection (AD) focuses on detecting outlier data to ensure consistency during data aggregation. Since WSNs operate under critical resource limitations concerning energy consumption and sensor node lifetime, AD must provide data integrity with minimum energy consumption, which has been an active research problem. Hence, researchers are striving for methods that improve the accuracy of the data handled while respecting the constraints of WSNs. This paper introduces a Feed-forward Autoencoder Neural Network (FANN) model to detect abnormal instances with improved accuracy and reduced energy consumption. The proposed model also acts as a False Positive Reducer, intending to reduce false alarms. It has been compared with other dominant unsupervised algorithms on robustness and other significant metrics using real-time datasets. Relative to these, our proposed model yields improved accuracy with fewer false alarms, thereby supporting a sustainable WSN.

Keywords: Anomaly Detection, Autoencoder Neural Network, Data Aggregation, False Positive, Unsupervised Algorithms, Wireless Sensor Networks.
1 Introduction

Sensor networks make use of devices that detect or measure a physical property and can record, indicate, or respond to it. A WSN supports the observation of physical environments from inaccessible locations with high accuracy, and is suitable for fields such as environmental monitoring and military surveillance. Sensor nodes in a WSN have acute energy and cost limitations, as they must constantly gather and transmit data from the environment. WSNs therefore operate under critical resource constraints that must be respected for sustainable network deployment. In a WSN, sensor nodes are deployed in a distributed manner so that data gathering continues even if some of them die due to power depletion or other events.

Figure 1: Distributed deployment scenario of WSN with anomaly detection in CHs

For continuous observation of the environment, data aggregation can be employed by forming clusters of neighboring sensor nodes. Figure 1 depicts the implementation setup of a typical WSN that performs data aggregation in Cluster Heads (CHs). Clusters play a major role in increasing the lifetime of a WSN through energy conservation, as the data gathered by each sensor node are not blindly transmitted. The CH in each cluster is responsible for data aggregation and forwarding. The data aggregation process must ensure integrity by detecting and removing anomalous instances received from sensor nodes in a cluster [1]. After anomaly detection, the aggregated data are forwarded to the base station for further investigation and decision-making.

1.1 Data aggregation in WSNs

Data aggregation is the method of scrutinizing the data gathered from several sensors, estimating a quantified response about the sensed environment, and providing fused information to the base station [2].
Data aggregation requires a scheme for converting the sensed data into high-quality information [3]. Its major functions are: avoiding data redundancy, reducing data transmission, and improving data accuracy [4, 5]. These can be achieved through AD during data aggregation. Data instances that deviate markedly from the rest of the dataset during aggregation are defined as anomalies or outliers [6]. Accordingly, anomalies in sensor data are quantified as significant deviations of the sensed data from normal values [7]. Paying attention to anomalies and detecting abnormalities in data increases data accuracy in WSNs [8]. Minimizing the energy utilized for AD is a challenging task in a WSN due to its limitations on node lifetime. Deep learning provides solutions that overcome these issues with minimized resource utilization, extending the life expectancy of the network [9]. Our proposed work aims to perform AD during data aggregation in the CH and to reduce communication costs with reduced energy consumption.

1.2 Anomaly Detection in WSN

Sensing data are collected as data streams comprising huge volumes of real observations from the environment and are stored in a database. Modern WSNs aim to collect multivariate data from the field instantaneously. Nowadays, AD models should be capable of dealing with data without any prior knowledge. This can be approached with unsupervised techniques that try to extract useful features from the dataset. Figure 2 depicts the implementation of the AD model in a single cluster, which can be considered for all CHs in the network.

2 Literature Background

Researchers have focused on classical methods of unsupervised anomaly detection, which can be coarsely classified as clustering-based algorithms, classification algorithms, and deep learning methods.
Figure 2: Implementation of AD model in a single cluster

Authors in [10] have implemented a prototype-based agglomerative hierarchical clustering method to identify features of uncertain data based on the probability distribution to be clustered, achieving average accuracy on all datasets. Ensemble clustering along with a Gaussian Mixture Model and One-Class Support Vector Machine (OCSVM) [11] extracts features based on appropriate choice rather than manually, by clustering anomalies into a specific type. Also, in [12] a hybrid combination of subspace clustering with OCSVM distinguishes between abnormal and normal instances by mapping the features in the space as clusters. The classification-based approach used in [13] identifies outliers based on spatio-temporal correlation with Support Vector Data Description (SVDD). Moreover, in [14], fixing the neighborhood makes the algorithm converge faster and provides robustness under prior assumptions on the number of anomalies. The authors of [15] adapted OCSVM to suit both univariate and multivariate datasets by training the model using histogram-based labeling. The subspace method Robust Principal Component Analysis (rPCA) used in [16] performed low-rank and sparse regularization of the spatio-temporal data distribution for detecting anomalies; however, rPCA does not consider the detailed information of the spatio-temporal data distribution. Many authors have used deep learning approaches [17], as they are well suited to unsupervised anomaly detection. In [18], a deep variational autoencoder with a feed-forward neural network model reduces inefficient learning in a semi-supervised context (for training), restraining the learning model from taking widespread values from the weight space.
Authors in [19] discuss various deep learning techniques such as the Convolution Neural Network, Long Short-Term Memory, and Autoencoder, which learn commonalities within the data to facilitate outlier detection. Hybrid models using deep learning techniques extract robust features within hidden layers. A distributed anomaly detection model using an autoencoder neural network [20] adapts to unforeseeable and new changes in a non-stationary environment; this model introduced a priority scheme to lower the false-positive rate. In [21], fast outlier detection was introduced by using Deep Belief Networks (DBN) to extract invariant features for complex and high-dimensional datasets. Anomaly detection using a One-Class Neural Network in [22] discerns anomalies in complex datasets when the decision boundary between normal and anomalous is highly non-linear. The unsupervised anomaly detection algorithms in [23] provide a comparative evaluation of nearest neighbor-based algorithms, clustering-based algorithms, and statistical approaches dealing with both local and global anomalies. In recent years, Q-learning algorithms [24, 25] were implemented for handling reward noise and inverting gradient procedures to reduce loss. In addition, it has been observed that deep reinforcement learning reduces false-positive rates by involving humans and learns a meta-policy to optimize anomaly detection. A combination of Gaussian process and graph-based detection [26] with the Hampel identifier overcomes the limitations of datasets with dense and sparse dimensions; this model identifies outliers in data across time. The fuzzy logical model in [27] uses neighborhood information with a fuzzy decision process to detect anomalies, enabling a high detection rate and a low false-positive alarm rate.
In [28], the authors have suggested recurrent neural networks with long short-term memory units (RNN with LSTM) to efficiently identify anomalies in health log data compiled from various devices. The trials used only a small amount of data. In both instances, the data also needed to go through a quick preparation step before analysis, in which the data were organized to match each method's input. The model suggested in [29] operates in two phases and is effective in detecting network anomalies. Improved grey wolf optimization (ImGWO) is used for feature selection in the first phase to achieve the best possible trade-off between two goals, namely a decreased error rate and feature-set minimization. In the second phase, the model uses an improved Convolution Neural Network (ImCNN) for network anomaly categorization. The model suffers from high computation because of its intricate design and the parameter optimization during feature selection. To recognize outliers with a resourceful focus on the production of false alarms and incorrect conclusions, [30] created a method in which each cluster head applied a denoising autoencoder with a Gaussian kernel for outlier detection after performing a basic clustering algorithm based on the residual energies of the sensor nodes.

The literature review of related works indicates that the following research gaps (RG) exist in anomaly detection, which are worthwhile to ponder upon:

RG (i). The works in [14, 18, 23] indicate the need for tuning learning parameters to improve the model's detection. Additional activation functions and alternative structures are required to gauge the method's success [28].
RG (ii). The experiments in [19, 21, 22] show low detection accuracy with a low Area Under Curve (AUC) score.
RG (iii). In [11, 13, 14, 20, 22, 24, 29, 30] the models require high computational time, which consumes energy when dealing with large datasets.
RG (iv).
Subsequently, the methods in [12, 15, 19] suffer from a high False Positive Rate (FPR).
RG (v). The methods in [23, 25, 27] perform poorly for high-dimensional data.

The proposed Feed-forward Autoencoder Neural Network (FANN) model is built to detect abnormal instances by employing multiple layers in the encoder and decoder, addressing several of the aforementioned research issues. In particular, the focus is on lowering false positives, increasing AD accuracy, and conserving the energy of the WSN without affecting the computational time.

3 Proposed Feed-Forward Autoencoder Neural Network for Anomaly Detection

3.1 Overview

Autoencoders are typical neural networks that take input data, learn its significant characteristics, and produce output analogous to the input. In doing so, the autoencoder learns to estimate a characteristics function to construct an output value. The block diagram of the proposed model is shown in Figure 3. Initially, normal instances are considered for training; in the encoding phase these are archived as regular data patterns through a series of hidden layers. The decoder in the network tries to reconstruct data identical to the input normal pattern. When abnormal instances are fed in, the model gathers the relevant regular instances from memory for reconstruction and generates an output. For abstract explanation, the archive is implemented as a single memory entry [31]. An additional component is included in the autoencoder model to compute an Anomaly Score (AS) to reduce false alarms. To ensure the durability of the model, hyper-parameters such as the number of encoder layers, the number of decoder layers, the activation functions, and the loss functions are tuned manually to obtain the appropriate threshold value for classification. In the proposed work, the autoencoder model tries to detect anomalies with fewer false positives, as explained in the following section.
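The detection logic of the overview reduces to comparing a per-instance reconstruction error against a tuned threshold. A minimal sketch, assuming a mean-squared-error anomaly score and a placeholder threshold (the paper's actual threshold is tuned, not fixed):

```python
def anomaly_score(x, reconstruction):
    # Anomaly Score (AS): mean squared reconstruction error of one instance
    return sum((a - b) ** 2 for a, b in zip(x, reconstruction)) / len(x)

def is_anomalous(x, reconstruction, threshold=0.015):
    # Instances whose score exceeds the tuned threshold are flagged anomalous;
    # 0.015 is an illustrative placeholder, not the model's tuned value.
    return anomaly_score(x, reconstruction) > threshold
```

A perfectly reconstructed instance scores zero and passes as normal, while a poorly reconstructed one is flagged; the quality of the threshold choice is exactly what the later false-positive-reduction machinery addresses.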
3.2 Schema of FANN model

The proposed Feed-forward Autoencoder Neural Network (FANN) is composed of multiple hidden layers for encoding and decoding. Figure 4 shows its schema, with an input layer and 6 dense layers comprising 3 layers for encoding and 3 layers for decoding.

Figure 3: Block diagram of proposed Feed-forward Autoencoder Neural Network

3.3 Model Functionality

As portrayed in Figure 4, there are six dense layers indicated as f1, f2, f3, f4, f5, and f6, respectively. The fusion of all these tasks is represented as

f(x) = f6(f5(f4(f3(f2(f1(x))))))

Generally, this process describes the structure of a deep learning neural network. The procedure for unsupervised anomaly detection is discussed below:

1. In the preparation phase, the encoder in the proposed FANN model learns the given unlabeled data Xa by compressing the data representation with the function f′ and determines the attribute vector ha from Xa: ha = f′(Xa)
2. The attribute vector ha is considered for archiving the normal behavior of data instances. This memory archive is used during the testing phase for evaluating the reconstruction error of the model.
3. The decoder tries to reproduce Xa from the attribute vector ha by a mapping function g′: r = g′(ha) = g′(f′(Xa))
4. Typically, training data instances cannot be used directly to estimate the characteristics of hidden layers, but they encompass a hidden correlation with them. The learning phase of the model evolves a procedure to assess the characteristics of hidden layers to obtain the best possible outcomes at the output layer. Moreover, this learning feature of the model must trace the correlation between the hidden layers and the training data instances.
5. In the testing phase, the learning algorithm tries to interpret the knowledge gained during training with the same procedure of encoding and decoding to reproduce data instances.
The reconstruction error of the model is generated using the loss function (discussed in section 3.4) to analyze the model's detection accuracy.
6. Each dense layer's weights are initialized randomly using Xavier initialization [32] such that the variance across each layer is the same. Equation 1 shows the random initialization applied to all weights, and equation 2 gives the variance of the weights. Xavier initialization is considered efficient for activation functions since it prevents gradients from vanishing or exploding. For each layer l, the weights are drawn as

W^[l]_(m,n) = N(0, 1/n^[l−1])   (1)

and the variance is calculated as

Var(W) = 1/n^[l−1]   (2)

where n^[l−1] is the number of units in the preceding layer.
7. Moreover, the model needs to be employed with suitable activation and loss functions for improving detection accuracy and reducing false positives. A comparative analysis of various activation functions and loss functions is presented in the following sections.

3.4 Activation function analysis for the model

The activation function is one of the hyper-parameters that plays a significant role in a deep learning model. Choosing the best activation function for the hidden layers and the output layer enhances the model's capability to perform the desired task (in the context of the proposed work, detecting anomalous data). In [33, 34] the authors have discussed various activation functions suitable for deep learning techniques. Based on the study of existing works, the activation functions listed with their respective equations in Table 1 are considered for performance evaluation. The performance analysis is presented in section 4.3.
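The Xavier-initialized feed-forward composition described in steps 1–7 can be sketched in NumPy. The layer widths and the tanh activation below are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def xavier_init(n_in, n_out):
    # Equation 1: each weight ~ N(0, 1/n^[l-1]), so Var(W) = 1/n_in (equation 2)
    return rng.normal(0.0, np.sqrt(1.0 / n_in), size=(n_in, n_out))

# Hypothetical layer widths for the six dense layers f1..f6
sizes = [20, 16, 8, 4, 8, 16, 20]
weights = [xavier_init(m, n) for m, n in zip(sizes[:-1], sizes[1:])]

def fann_forward(x):
    # f(x) = f6(f5(f4(f3(f2(f1(x)))))), here with tanh activations
    h = x
    for w in weights:
        h = np.tanh(h @ w)
    return h

x = rng.normal(size=(5, 20))          # 5 instances, 20 features
recon = fann_forward(x)               # decoder output r = g'(f'(x))
error = np.mean((x - recon) ** 2)     # reconstruction error
```

The first three matrices compress the input down to the 4-unit attribute vector ha; the last three reconstruct it, matching the encoder/decoder split of the schema.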
Table 1: Existing activation functions with their estimation formulas

Reference | Activation Function | Equation
[34] | Hyperbolic Tangent | f(x) = tanh(x) = (e^x − e^−x) / (e^x + e^−x)
[34] | Rectified Linear Unit | f(x) = max(0, x) = x if x ≥ 0; 0 if x < 0
[34] | Exponential Linear Unit | f(x) = α(exp(x) − 1) if x ≤ 0; x if x > 0
[34] | Leaky ReLU | f(x) = x if x ≥ 0; ax if x < 0
[34] | Softmax | f(x_i) = exp(x_i) / Σ_j exp(x_j)
[34] | Swish | f(x) = x · sigmoid(x) = x / (1 + e^−x)
[35] | Gaussian Error Linear Unit | f(x) = x · P(X ≤ x) = x · Φ(x) = (x/2)[1 + erf(x/√2)]

3.5 Loss function analysis for the model

For training the proposed FANN with improved accuracy, the reconstruction error must be reduced. Thus, two different loss functions are considered and their effectiveness is analyzed for model suitability. This section discusses the significance of the Arctan Mean Square Error (AMSE) [6] and Huber Loss (HL) [36] functions considered for analysis.

3.5.1 Arctan Mean Square Error (AMSE)

The Mean Square Error (MSE) function is an essential function that reveals the robustness of the model during the training and testing phases, identifying data overfit when the validation error exceeds the training error. The MSE of a model on test data is estimated as the mean of the squared prediction errors over all data in the test set. Thus, the MSE loss function in equation 3 calculates the dissimilarity between the true value and the predicted value for each data instance:

MSE = (1/n) Σ_{i=1}^{n} (y_i − λ(x_i))^2   (3)

where y_i is the actual output value for test data instance x_i, λ(x_i) is the reconstructed output value for test data instance x_i, and n is the number of test data instances. MSE is sensitive toward outliers, but minimizing MSE responds better in identifying outliers effectively. Moreover, decreasing the squared error improves the accuracy of a model when the number of training data instances is known.
Thus, a modified mean squared error for training the model, called Arctan MSE (AMSE), is considered in equation 4. AMSE, as discussed in [6], gradually drops the rate of loss for independent training data:

AMSE = (1/N) Σ_{p=1}^{P} Σ_{i=1}^{N} (tan^{−1}(X_pi − g(y)_pi))^2   (4)

where X_pi is the predicted value for data point i, g(y)_pi is the actual value for data point i, and N is the total number of data points.

3.5.2 Huber Loss (HL)

Huber loss is more robust to outliers in data than the MSE loss and is differentiable at 0. Tuning the hyper-parameter δ (delta) has a direct impact on reducing the error: as δ → ∞ Huber loss approaches MSE, and as δ → 0 it approaches MAE. It is given as:

L_δ(y, f(x)) = (1/2)(y − f(x))^2, if |y − f(x)| ≤ δ; δ(|y − f(x)| − (1/2)δ), otherwise   (5)

where (y − f(x)) is the difference between the observed value and the predicted value, and δ is the hyper-parameter delta, tuned to obtain minimum error (the delta value chosen to converge to the best solution is 0.1). Near the minimum the loss decreases quadratically as with MSE, while for large residuals it grows only linearly as with MAE; the HL function in equation 5 thus merges the optimum features of both MSE and MAE. For the proposed model, loss estimation is carried out using HL and AMSE for 100 epochs. The analysis is presented in section 4.4.2.

3.6 False Positive rate reduction

Generally, anomaly detection may produce false positives, which could affect the model's efficiency. To mitigate this drawback, the proposed FANN model is trained to perform precise prediction of actual anomalies by eliminating false positives with high recall and high precision. During the training phase, an acceptable threshold level is maintained for detecting abnormal data and false-positive rates. This threshold is also applied to the test set.
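Equations 4 and 5 translate directly into code. A single-series sketch (δ = 0.1 as in the text; the residual convention and aggregation over one series are simplifying assumptions):

```python
import math

def amse(actual, predicted):
    # Equation 4 (single series): mean of squared arctan of the residuals
    n = len(actual)
    return sum(math.atan(p - a) ** 2 for a, p in zip(actual, predicted)) / n

def huber(actual, predicted, delta=0.1):
    # Equation 5: quadratic for residuals within delta, linear beyond it
    total = 0.0
    for a, p in zip(actual, predicted):
        r = abs(a - p)
        total += 0.5 * r * r if r <= delta else delta * (r - 0.5 * delta)
    return total / len(actual)
```

The arctan squashing in AMSE and the linear tail in HL serve the same purpose: a single extreme residual cannot dominate the loss the way it would under plain MSE.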
An additional component (Thresholding Filter) is included in the model that removes false positives, similar to [37]. To filter a false abnormal detection in the model, the results of the following were considered:

• Receiver Operating Characteristic (ROC) – the ROC is capable of analyzing the model's performance at various thresholds. The reason for choosing the ROC approach for false-positive reduction is discussed in section 4.2.
• Loss function – the reconstruction error from the loss functions, viz. AMSE and HL, is taken as the threshold for filtering false positives in the model.
• Conformance threshold – as proposed in [38], the conformance threshold is evaluated based on the mean distribution of input instances.

In this research, an assumption is made that the anomalies detected by the model contain excessive false positives if they exceed a threshold of 1%. For estimating the threshold value, the Central Limit Theorem (CLT) from probability theory is used; the CLT relates sampling of the given dataset to a normal distribution. The threshold is calculated as:

Threshold (T) = µ(# of anomalies) − σ(# of anomalies)   (6)

where µ is the mean of the anomalies and σ is the standard deviation of the anomalies.

Figure 5: Sensors for weather observations in an AWS

The conformance threshold stated in equation 6 helps in maintaining a stable F1 score. This conformance threshold factor also impacts the performance of the model in terms of increasing the detection accuracy. The major intention of this research is to come up with a consistent model capable of categorizing abnormal instances.

4 Experimental Results

4.1 Dataset exploration

The performance of the proposed model is evaluated on climate data [39]. Weather prediction is considered one of the most important real-time tasks.
Many sensor stations have been built around the world by various research labs to monitor climatic changes such as temperature, humidity, pressure, and precipitation. The integrity of these data must be ensured for analyses such as weather prediction, global warming rates, and environment-related disaster alerts. Weather Underground is a service provider of real-time weather information retrieved with the help of over 250,000 Automatic Weather Stations (AWS) around the world. An AWS is a set of weather-measuring sensors that can be installed either in homes or in business buildings. Figure 5 shows the sensor setup of a Weather Station (WS) utilized for weather observations. The Indira Gandhi International Airport (IGIA) station acts as the WS and provides real-time weather observations for Delhi weather forecasts. Airport WSs routinely issue reports used by pilots, air traffic controllers, meteorologists, climatologists, and other researchers. In the Delhi region, 12 Automatic Weather Stations within a 50 km radius have been set up, providing round-the-clock weather measurements for IGI Airport. Here, 'g' aggregator nodes are considered in the WS network. A total of 20 features related to temperature, humidity, pressure, dew point, and wind speed, with statistical readings, are observed along with precipitation at each time interval. Nearly 20 years of climatic information have been collected for the AD investigation. These weather data are prone to errors, outages, and other defects. The following section discusses the implementation setup of AD using the FANN model.

4.2 Data processing and visualization

The dataset is treated as unlabeled pre-processed data, with temporary labels assigned to each column. Normal and abnormal instances are categorized to generate the training set. The test set is generated with both normal and abnormal instances. The distribution of the data is visualized in Figure 6.
Box plot representation provides a great benefit for understanding the outlier rate in the dataset. It is also observed that the dataset is highly imbalanced.

Figure 6: Data distribution analysis using Box plot
Figure 7: Accuracy of activation functions for 100 epochs

4.3 Analysis of activation functions with a loss function

The processed data are fed into the proposed FANN for training. As discussed in section 3.4, various activation functions have been experimented with on the given dataset. The result shown in Figure 7 depicts the line plot of the accuracy of the activation functions trained for 100 epochs. From the graph of activation function accuracy in Figure 7, the Hyperbolic Tangent activation maintains a stable accuracy when the model is executed for 100 epochs. In many research works, Leaky ReLU is considered a suitable activation function, but it affects the computation and may overfit the model, losing linearity. Based on the proposed FANN model's focus, the Hyperbolic Tangent (Tanh) activation function is identified as appropriate for the model to detect anomalies with high accuracy and minimum error.

4.4 Methods for mitigating False alarm rate

4.4.1 AUC - ROC Curve

To rate the performance of the model, the classified outcomes are repeatedly measured using the Receiver Operating Characteristic (ROC) curve. The Area Under the Curve (AUC), the integral of the ROC, is the probability that the classifier ranks a randomly chosen positive example above a randomly chosen negative one, differentiating between the positive and negative classes. This property makes the AUC assessment method well suited for false-positive reduction and detection accuracy. For empirical illustration in unsupervised AD, AUC-based assessment is rationally considered the de facto standard for estimation.
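The probabilistic reading of AUC given above can be computed directly as a rank statistic, without tracing the full curve. A small sketch with hypothetical reconstruction-error scores (not the paper's data):

```python
def auc_score(labels, scores):
    # AUC as a rank statistic: the probability that a randomly chosen anomaly
    # (label 1) receives a higher score than a randomly chosen normal instance
    # (label 0); ties count half.
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical scores: every anomaly scores above every normal instance,
# so the AUC is 1.0
labels = [0, 0, 0, 1, 0, 1]
scores = [0.010, 0.020, 0.015, 0.120, 0.030, 0.080]
```

This pairwise formulation is exactly why AUC is threshold-free: it depends only on the ordering of scores, which is what makes it a natural target when tuning the false-positive filter.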
The AUC-ROC curve of the proposed FANN model in Figure 8 provides the relationship between the true positive rate and the false positive rate for assessing detection accuracy. It is observed that the AUC score obtained for the FANN model is 0.98 using HL, compared to 0.96 using AMSE.

Figure 8: AUC-ROC curve of proposed FANN model
Figure 9: Reconstruction error comparison of the proposed model: (a) using Arctan MSE, (b) using Huber Loss

4.4.2 Loss Functions: AMSE and HL

The proposed FANN model experiments with two loss functions, AMSE and HL, during the training and test phases. The model's training loss and test loss are accumulated for examining the loss functions' average reconstruction error. From the result of the test loss, the threshold is varied to obtain outliers accurately and is estimated to lie in the range 0.01–0.02. Figures 9(a) and 9(b) display the reconstruction error obtained by the model after varying the threshold for 100 epochs. From these graphs, it is evident that the FANN model with the HL function reacts more robustly to outliers for training and test data than with the AMSE loss function. This resulted in a False Alarm Rate of 32% using AMSE and 16% using HL, respectively.

4.4.3 Conformance threshold

The conformance threshold value is calculated from Table 2, which presents the mean and standard deviation of the model's reconstruction error.

5 Results Comparison and Discussions

The proposed FANN model's performance is evaluated in terms of success and competence. It has been compared with existing unsupervised algorithms as discussed in section 2.
Table 2: Thresholding comparisons using mean and standard deviation of the anomalies detected

Reconstruction error | AMSE | Huber Loss
Mean (µ) | 0.10211 | 0.09656
Standard Deviation (σ) | 0.09061 | 0.08506
min | 0.000021 | 0.000003
25% | 0.000521 | 0.000116
50% | 0.000822 | 0.000291
75% | 0.001372 | 0.000699
max | 0.296708 | 0.240890
# of False Positives | 504 | 220

Figure 10: Categorization of unsupervised algorithms for comparative analysis

Figure 10 shows the categorization of the various algorithms: the clustering techniques U-AHC [10], DEP-SSEC [11], and SSC-OCSVM [12]; the classification techniques N-STASVDD [13] and OCSVM [6, 15]; the subspace method rPCA [16]; and the deep learning models AE-(DADA-S) [20], OCNN [22], LSTM [28], the hybrid learning model [29], and the denoising AE [30]. Figure 11 shows the AUC score standardized for the range 10 <= k <= 100, where 'k' indicates the batch size of data instances. In our evaluation, many different 'k' values are considered and averaged by computing the AUC, as reported in Table 3.

Environment: Experiments were run on a 4-core processor in Windows OS with an Intel Core i7 8th Gen @ 1.8 GHz and 8 GB RAM.

Table 3: Analyses of AUC score of proposed FANN with unsupervised algorithms

Techniques | Method | AUC Score
Clustering | U-AHC | 0.84
Clustering | SSC-OCSVM | 0.88
Clustering | DEP-SSEC | 0.77
Classification | N-STASVDD | 0.92
Classification | OCSVM | 0.94
Subspace method | rPCA | 0.95
Deep Learning | AE – (DADA-S) | 0.92
Deep Learning | OCNN | 0.89
Proposed FANN Model | (i) Arctan loss | 0.96
Proposed FANN Model | (ii) Huber loss | 0.98

Figure 11: Generated AUC curves with different k values, 10 <= k <= 100

From Figure 11 it is observed that the proposed FANN achieves better reconstruction performance with a higher AUC score for AD, which attempts to reduce RG (ii).
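Applying equation 6 to the reconstruction-error statistics of Table 2 gives the conformance thresholds directly:

```python
def conformance_threshold(mu, sigma):
    # Equation 6: T = mean - standard deviation of the reconstruction errors
    return mu - sigma

# Statistics taken from Table 2
t_amse = conformance_threshold(0.10211, 0.09061)
t_huber = conformance_threshold(0.09656, 0.08506)
# both come to 0.0115, inside the 0.01-0.02 threshold range of section 4.4.2
```

Note that both loss functions yield the same threshold value here; the difference between them shows up instead in the number of false positives surviving the filter (504 versus 220 in Table 2).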
5.1 Anomaly Detection performance

The performance metrics selected for evaluation are Accuracy, Precision, Recall (or Sensitivity), Specificity, F1 Score, False Alarm Rate (FAR), and Matthews Correlation Coefficient (MCC), based on a confusion matrix with the numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). The following equations are utilized for calculating the metrics:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
Recall (or Sensitivity) = TP / (TP + FN)
Specificity = TN / (TN + FP)
F1 Score = 2·TP / (2·TP + FP + FN)
False Alarm Rate (FAR) = FP / (FP + TN)
Matthews Correlation Coefficient = (TP·TN − FP·FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN))

For comparing the accuracy of our proposed model, experiments with different test sets have been carried out. The training data and test data are in the ratio 3:2. To measure the accuracy of the models, various proportions of abnormality ranging from 1% to 15% are used in the 40% of test data. Table 4 shows the average accuracy calculated for each technique with various test window sizes. Based on the experimentation, the proposed FANN model produces an improved accuracy of over 98%, making it suitable for AD in the cluster head (CH) compared to the other existing techniques. Table 5 shows the evaluation results for precision, recall, specificity, F1 score, and False Alarm Rate (FAR) of the proposed FANN against the other unsupervised algorithms discussed in the related work. Figure 12 depicts the overall evaluation results of the models for AD. The results show that our proposed FANN yields higher detection performance. This addresses RG (i).
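The metrics above follow mechanically from the four confusion-matrix counts. A compact sketch (the example counts are hypothetical, not from the paper's experiments):

```python
import math

def metrics(tp, tn, fp, fn):
    # All seven metrics computed from the confusion-matrix counts
    acc = (tp + tn) / (tp + tn + fp + fn)
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)                      # sensitivity
    spec = tn / (tn + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)
    far = fp / (fp + tn)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"accuracy": acc, "precision": prec, "recall": rec,
            "specificity": spec, "f1": f1, "far": far, "mcc": mcc}

# Hypothetical counts for illustration
m = metrics(tp=90, tn=85, fp=15, fn=10)
```

Because MCC uses all four counts symmetrically, it stays informative on imbalanced data where accuracy alone looks deceptively high, which is precisely why it is reported alongside accuracy here.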
The motive of our proposed model is to design a false positive reducer capable of removing unnecessary false positives and avoiding data loss. This can be noticed in Figure 12(a), where the proposed FANN using HL produces lower false positives compared with the existing algorithms discussed in section 2, thereby addressing RG (iv). Finally, the Matthews Correlation Coefficient (MCC) indicates that the model produces good results only if it classifies both positive and negative elements well. In Figure 12(b), the algorithms that produce lower MCC amount to a random-guess model, whereas the proposed FANN has produced an MCC of 0.8, showing upright performance in unsupervised anomaly detection. This is a positive move towards reducing RG (v).

Table 4: Average accuracy evaluation for AD with different episodes (training set and test set)

Techniques | Method | Average Accuracy (%)
Clustering | U-AHC | 0.84
Clustering | SSC-OCSVM | 0.88
Clustering | DEP-SSEC | 0.77
Classification | N-STASVDD | 0.92
Classification | OCSVM | 0.94
Subspace method | rPCA | 0.95
Deep Learning | LSTM | 91.0
Deep Learning | Hybrid learning | 92.4
Deep Learning | Denoising AutoEncoder | 95.4
Deep Learning | AE – (DADA-S) | 0.92
Deep Learning | OCNN | 0.89
Proposed FANN Model | (i) Arctan loss | 0.96
Proposed FANN Model | (ii) Huber loss | 0.98

Table 5: Result of proposed models' efficiency with Precision, Sensitivity, Specificity, F1 Score, FAR and Matthews Correlation Coefficient (MCC)

Methods | Precision | Sensitivity | Specificity | F1 Score | FAR | MCC
U-AHC | 92.6 | 90.8 | 11.5 | 91.7 | 88.5 | 0.02
SSC-OCSVM | 90.7 | 92 | 9 | 91.3 | 90.7 | 0.01
DEP-SSEC | 95.5 | 91.5 | 17 | 93.5 | 83.3 | 0.06
N-STASVDD | 94.8 | 94.4 | 20 | 94.6 | 80 | 0.14
OCSVM | 96.9 | 96.4 | 30 | 96.7 | 70 | 0.25
rPCA | 98.7 | 98.7 | 40 | 98.7 | 60 | 0.38
LSTM | 95.8 | 96.2 | 64 | 92.7 | 51 | 0.54
Hybrid learning | 96.9 | 89.7 | 72 | 97.4 | 33 | 0.62
Denoising Autoencoder | 97.5 | 95.4 | 80 | 96 | 39.2 | 0.49
AE – (DADA-S) | 99 | 98.9 | 55.8 | 98.9 | 44 | 0.58
OCNN | 98.1 | 98.5 | 30.7 | 98.3 | 69.2 | 0.32
FANN Model (i) Arctan loss | 99.5 | 67.1 | 99.5 | 99.5 | 32.8 | 0.67
FANN Model (ii) Huber loss | 99.7 | 83.6 | 99.8 | 99.8 | 16.3 | 0.8

For an imbalanced dataset, analyzing the
detection accuracy alone is not a reliable measure, as it gives an over-optimistic estimate of the model's efficiency. To cope with this uneven classification problem, the Matthews correlation coefficient (MCC) is estimated to assess the proposed model's effectiveness. An AD model must concentrate on the accurate detection of anomalies, but the compared algorithms fall short:

• Clustering-based algorithms fail to improve their accuracy because of their sensitivity to outliers;
• Classification and subspace methods lag in identifying true negatives and false positives on large high-dimensional datasets, with increased processing time;
• The compared deep learning algorithms yield a lower MCC, produce more false alarms, and require high computation time.

Therefore, the proposed FANN model can be regarded as a robust model achieving the expected detection performance for AD.

Figure 12: Comparison plot of significant performance metrics (F1 score, FAR, and MCC): (a) F1 score and FAR analysis; (b) MCC score analysis

5.2 Energy Performance

In this section, a complexity analysis is performed to determine the energy utilized during anomaly detection. Table 6 summarizes the complexity of similar approaches [40]. The computational complexity of FANN is O(rng), based on 'r', the number of inputs evaluated at a time, 'n', the total dimension of the instances, and 'g', the number of aggregators; the communication overhead is O((r − a)n), where 'a' is the number of abnormal instances removed by the deep learning model.
Table 6: Complexity Analysis of Anomaly Detection Techniques

Method           Computational Complexity  Communication Complexity
Classification   O(rn^2)                   O(rvn^2)
Clustering       O(rn^6 g)                 O(rn^3)
Subspace method  O(r^3 g)                  O(r^3 n)
Deep Learning    O(rng)                    O((r − a)n)

The energy performance of the sensor nodes in our work is evaluated based on the following factors [41]:

• Computing energy: the energy consumed by a CH for data processing and analysis
• Communicating energy: the energy consumed by a CH for receiving and transmitting data, with and without AD

A decentralized algorithm was discussed in [42] that computes energy in a dynamic network by developing a network-coding subgraph that satisfies user-defined quality and security constraints. The Efficient Data Collection Aware of Spatio-Temporal Correlation (EAST) algorithm for energy-aware data forwarding in WSNs [43] takes full advantage of both spatial and temporal correlation mechanisms to save energy while maintaining real-time, accurate data reports toward the sink node. To minimize the overall energy consumption, a dynamic optimization model [44] developed multi-hop topologies to mitigate data transmission costs.

For the energy consumption of AD, the following premises are considered. Let Etot be the initial energy unit of each aggregator node, Etrn the energy unit for transmitting a data packet, Erec the energy unit for receiving a data packet, and Ead the energy utilized by the 'g' aggregator nodes for anomaly detection [45].

Figure 13: Energy Model for sensor nodes

The residual energy Erd of a sensor node after reception can then be expressed as

Erd = Etot − Erec (7)

and the energy consumed by anomaly detection can be calculated as

Ead = CON · Tpc (8)

where CON is the power consumed to start the processor and Tpc = (#ins)/fq is the time taken by the number of executed instructions at the microcontroller operating frequency fq.
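Equations (7) and (8) can be sketched in a few lines. The numeric values below are assumptions chosen for illustration (a 0.5 W startup draw and an 8 MHz microcontroller are not figures from the paper):

```python
def residual_energy(e_tot, e_rec):
    """Equation (7): E_rd = E_tot - E_rec, energy left after reception."""
    return e_tot - e_rec

def ad_energy(con, num_instructions, fq):
    """Equation (8): E_ad = CON * T_pc, with T_pc = (#ins) / fq."""
    t_pc = num_instructions / fq      # execution time in seconds
    return con * t_pc

# Illustrative (assumed) values: 500 J initial energy, 1 J spent receiving,
# 0.5 W processor draw, 4e6 instructions at an 8 MHz operating frequency.
e_rd = residual_energy(500.0, 1.0)    # 499.0 J left after reception
e_ad = ad_energy(0.5, 4e6, 8e6)       # 0.5 W over 0.5 s of processing
```

The design point here is that Ead scales with instruction count, which is why the lightweight FANN inference keeps the detection cost small relative to radio costs.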
Now, the remaining energy Etxrd available for transmission is expressed as

Etxrd = Erd − Ead (9)

Equation (9) states that energy is consumed by Ead before the aggregated data are transmitted. To show that the energy consumed by Ead is small, the following assumptions are made. The total energy consumed by an aggregator (sensor) node Etot is measured with Erec = (n·X) J/s, Etrn = (m·Y) J/s, and Ead = a J/s as the assumed energy consumption units, where 'n' is the number of data instances sent to the aggregator node and 'm' the number of data instances transmitted from it; 'm' may equal 'n' or be smaller, depending on the number of anomalies detected. The energy consumption of an aggregator node is compared as follows:

• Without AD,

E1tot = PstTst + Erec + Etrn = PstTst + (n·X + m·Y) J/s (10)

Here Etrn = Erec, since the number of data instances transmitted equals the number received.

• With AD,

E2tot = PstTst + Erec + Etrn + Ead = PstTst + (n·X + (m − o)·Y + a) J/s (11)

Here Etrn = (m − o)·Y, i.e., the transmission energy depends on the number of data instances remaining after AD removes the 'o' anomalous instances; PstTst is the startup power and startup time of the aggregator node.

An energy model is built for equation (11) and simulated using MATLAB, as shown in Figure 14. This energy model predicts the lifetime of node energy with and without AD; it can be observed on the X axis that the lifetime of the sensor nodes increases with AD.

Illustration: let 0.004 J/s, 0.01 J/s, and 0.01 J/s be the assumed energy consumption units for processing, data receiving, and data transmission, respectively.
Suppose the number of data instances 'n' received by aggregator node A1 is 100; then the energy consumed by Erec and Ead is

Erec = 100 · 0.01 = 1 J/s
Ead = 100 · 0.004 = 0.4 J/s

Let the number of anomalies detected 'o' for aggregator node A1 be 10; then the energy consumed by Etrn is

Etrn = (100 − 10) · 0.01 = 0.9 J/s

Then E2tot, with PstTst = 0.01 J/s, is

E2tot = PstTst + Erec + Etrn + Ead = 0.01 + 1 + 0.4 + 0.9 = 2.31 J/s

The illustration shows the reduction in transmission energy due to the removal of anomalous instances, as depicted in Figure 15. Based on the premises of the energy model in Figure 14, it can be inferred that FANN spends proportionately less energy on AD than on transmission and reception. A comparative analysis of the energy consumed by AD for various models is given in Figure 17; it shows that the proposed deep learning model consumes less energy than the existing deep learning and energy models for AD.

The average energy consumption at a CH is also analyzed. The following parameters are used in our experiment: (i) the number of data packets ranges from 150 to 750, and (ii) the data corruption rate ranges from 1.5% to 15%. The initial energy was set to 500 J and gradually decreased as the data packets were processed with AD. The inference is drawn by comparing the residual energy with AD against the residual energy without AD. The energy consumption of various state-of-the-art techniques versus the proposed model is depicted in Figure 16.

Figure 14: Energy consumption of aggregator node with AD using Energy Model
Figure 15: Energy consumption of transmission after AD using Energy Model
Figure 16: Comparison of average energy consumption of proposed AD model
Figure 17: Energy consumption analysis of various deep learning models
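Equations (10) and (11) and the worked illustration above can be reproduced in a few lines. This is a sketch under the illustration's assumptions (m = n, and the per-instance costs X, Y, and the processing cost are the assumed units, not measured values); the function names are local to this example.

```python
def energy_without_ad(pst_tst, n, x, y):
    """Equation (10): every received instance is also transmitted (m = n)."""
    return pst_tst + n * x + n * y

def energy_with_ad(pst_tst, n, x, y, o, e_ad):
    """Equation (11): 'o' anomalous instances are removed before
    transmission; e_ad is the energy spent on anomaly detection."""
    return pst_tst + n * x + (n - o) * y + e_ad

# Reproducing the illustration: n = 100 instances received, o = 10 anomalies,
# X = Y = 0.01 J/s, E_ad = 100 * 0.004 = 0.4 J/s, PstTst = 0.01 J/s.
e2 = energy_with_ad(0.01, 100, 0.01, 0.01, 10, 100 * 0.004)
e1 = energy_without_ad(0.01, 100, 0.01, 0.01)
```

With these numbers the transmission term drops from 1.0 J/s to 0.9 J/s once the 10 anomalies are removed, which is the saving carried forward to every downstream hop.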
The figure shows that the residual energy after AD with the proposed model is marginally greater than the residual energy without AD, whereas the other existing models retain less residual energy after AD than without it. This supports the proposed model for sustainable deployment in WSNs. Consequently, the proposed FANN also requires less computation time for AD than the other models, as shown in Figure 18, which correlates with its lower energy consumption and addresses RG (iii).

Figure 18: Computation time analysis of different unsupervised algorithms

6 Conclusion

AD using deep learning is a significant research area that has attracted the attention of researchers, and specific techniques can be adopted depending on the application domain. When applied to WSNs, the detection process must achieve high accuracy while consuming little energy. The FANN model presented in this paper is promising for AD on the unlabeled data gathered in a WSN environment. The selection of hyperparameters for the model provides a unique perspective for improving its accuracy in eliminating outliers. Moreover, the proposed model is capable of handling high-volume multidimensional datasets with less computation time and reduced energy utilization. Its performance is evaluated using the AUC-ROC score and metrics suitable for anomaly detection: Accuracy, Precision, Recall, Specificity, F1 score, and MCC. The experimental results show that our model outperforms the other algorithms discussed in the literature with respect to detection accuracy. Additionally, the proposed FANN proves to be a False Positive Reducer achieving a lower FAR, a major concern for limiting data loss. The energy consumption of the proposed AD model at the aggregator node is also observed to be low.
The above discussion shows that applying AD at the aggregator node ensures sustainable improvement in WSNs. Promising directions for extended research include ensemble learning and auto-tuning of hyperparameters to further improve AD accuracy. Researchers can also work towards recovering data lost after an attack is detected, to strengthen data consistency in WSNs.

Author contributions
The authors contributed equally to this work.

Conflict of interest
The authors declare no conflict of interest.

References

[1] Miao Xie; Song Han; Biming Tian; Sazia Parvin (2011). Anomaly Detection in Wireless Sensor Networks: A survey, Journal of Network and Computer Applications, vol. 34, no. 4, pp. 1302-1325, 2011.
[2] Deqing Wang; Ru Xu; Xiaoyi Hu; Wei Su (2016). Energy-Efficient Distributed Compressed Sensing Data Aggregation for Cluster-Based Underwater Acoustic Sensor Networks, International Journal of Distributed Sensor Networks, pp. 1-14, 2016.
[3] Barakkath Nisha U; Uma Maheswari N; Venkatesh R; Yasir Abdullah R (2015). Improving Data Accuracy Using Proactive Correlated Fuzzy System in Wireless Sensor Networks, KSII Transactions on Internet and Information Systems, vol. 9, 2015.
[4] Robert Mitchell; Ing-Ray Chen (2014). A survey of intrusion detection in wireless network applications, Computer Communications, vol. 42, pp. 1-23, 2014.
[5] Bo Sun; Xuemei Shan; Kui Wu; Yang Xiao (2013). Anomaly Detection Based Secure In-Network Aggregation for Wireless Sensor Networks, IEEE Systems Journal, vol. 7, no. 1, pp. 13-25, 2013.
[6] Sapna Singh; Daya Shankar Singh; Shobhit Kumar (2014). Modified Mean Square Algorithm with reduced cost of training and Simulation time for Character Recognition in Backpropagation Neural Network, Advances in Intelligent Systems and Computing, Springer International Publishing, 2014.
[7] Xinqian Liu; Jiadong Ren; Haitao He; Qian Wang; Shengting Sun (2020).
A Novel Network Anomaly Detection Method based on Data Balancing and Recursive Feature Addition, KSII Transactions on Internet and Information Systems, vol. 14, no. 7, 2020.
[8] Sankardas Roy; Mauro Conti; Sanjeev Setia; Sushil Jajodia (2012). Secure data aggregation in wireless sensor networks, IEEE Transactions on Information Forensics and Security, vol. 7, no. 3, pp. 1040-1052, 2012.
[9] Mohammed Abu Alsheikh; Shaowei Lin; Dusit Niyato; Hwee Pink Tan (2014). Machine Learning in Wireless Sensor Networks: Algorithms, Strategies, and Applications, IEEE Communications Surveys & Tutorials, vol. 16, 2014.
[10] Francesco Gullo; Giovanni Ponti; Andrea Tagarelli; Sergio Greco (2017). An information-theoretic approach to hierarchical clustering of uncertain data, Information Sciences, 402, 199-215, 2017.
[11] G. Yuan; B. Li; Y. Yao; S. Zhang (2017). A deep learning-enabled subspace spectral ensemble clustering approach for web anomaly detection, International Joint Conference on Neural Networks (IJCNN), pp. 3896-3903, 2017.
[12] Guo Pu; Wang L (2021). A hybrid unsupervised clustering-based anomaly detection method, Tsinghua Science and Technology, vol. 26, no. 2, pp. 146-153, 2021.
[13] Chen, Y.; Li, S. (2019). A Lightweight Anomaly Detection Method Based on SVDD for Wireless Sensor Networks, Wireless Personal Communications, 105, 1235-1256, 2019.
[14] Ruff, L.; Vandermeulen, R.; Goernitz, N.; Deecke, L.; Siddiqui, S. A.; Binder, A.; Müller, E.; Kloft, M. (2018). Deep One-Class Classification, Proceedings of the 35th International Conference on Machine Learning, Proceedings of Machine Learning Research, 80:4393-4402, 2018.
[15] Nurfazrina Mohd Zamry; Anazida Zainal; Murad A. Rassam (2018). Unsupervised anomaly detection for unlabelled Wireless Sensor Networks data, International Journal of Advanced Soft Computing Applications, vol. 10, no. 2, 2018.
[16] Xuehui Wang; Yong Zhang; Hao Liu; Yang Wang; Lichun Wang; Baocai Yin (2018).
An Improved Robust Principal Component Analysis Model for Anomalies Detection of Subway Passenger Flow, Journal of Advanced Transportation, vol. 2018, 12 pages, 2018.
[17] Aaron Tuor; Samuel Kaplan; Brian Hutchinson; Nicole Nichols; Sean Robinson (2017). Deep Learning for Unsupervised Insider Threat Detection in Structured Cybersecurity Data Streams, Proceedings of AI for Cyber Security Workshop at AAAI, 2017.
[18] Marwan Ali Albahar; Muhammad Binsawad (2020). Deep Autoencoders and Feedforward Networks based on a New Regularization for Anomaly detection, Security and Communication Networks, Hindawi, 2020.
[19] Raghavendra Chalapathy; Sanjay Chawla (2019). Deep Learning for Anomaly Detection: A Survey, arXiv.org, 2019.
[20] T. Luo; S. G. Nagarajan (2018). Distributed Anomaly Detection Using Autoencoder Neural Networks in WSN for IoT, IEEE International Conference on Communications (ICC), pp. 1-6, 2018.
[21] Yan Qiao; Xinhong Cui (2020). Fast outlier detection for high-dimensional data of WSN, International Journal of Distributed Sensor Networks, vol. 16(10), 2020.
[22] Raghavendra Chalapathy; Aditya Krishna Menon; Sanjay Chawla (2019). Anomaly Detection using One-Class Neural Networks, arXiv.org, 2019.
[23] Markus Goldstein; Seiichi Uchida (2016). A Comparative Evaluation of Unsupervised Anomaly Detection Algorithms for Multivariate Data, PLOS ONE, doi:10.1371/journal.pone.0152173, 2016.
[24] Daochen Zha; Kwei-Herng Lai; Mingyang Wan; Xia Hu (2020). Meta-AAD: Active Anomaly Detection with Deep Reinforcement Learning, arXiv.org, 2020.
[25] F. de La Bourdonnaye; C. Teulière; T. Chateau; J. Triesch (2018). Learning of binocular fixations using anomaly detection with deep reinforcement learning, International Joint Conference on Neural Networks (IJCNN), pp. 760-767, 2018.
[26] Hoc Thai Nguyen; Nguyen Huu Thai (2019). Temporal and Spatial outlier detection in wireless sensor networks, ETRI Journal, 41(4):437-451, 2019.
[27] Sahar Kamal; Rabie A. Ramadan; Fawzy El-Refai (2016). Smart Outlier Detection of WSN, Facta Universitatis, Series: Electronics and Energetics, vol. 29, no. 3, pp. 383-393, 2016.
[28] Jakovljevic, Mihajlo; Elbasani, Ermal; Kim, Jeong-Dong (2021). LLAD: Life-Log Anomaly Detection Based on Recurrent Neural Network LSTM, Journal of Healthcare Engineering, Hindawi, 2021.
[29] S. Garg; K. Kaur; N. Kumar; G. Kaddoum; A. Y. Zomaya; R. Ranjan (2019). A Hybrid Deep Learning-Based Model for Anomaly Detection in Cloud Datacenter Networks, IEEE Transactions on Network and Service Management, vol. 16, no. 3, pp. 924-935, 2019. doi: 10.1109/TNSM.2019.2927886.
[30] Chander, B.; Kumaravelan (2021). Outlier Detection in Wireless Sensor Networks with Denoising Auto-Encoder, Advances in Intelligent Systems and Computing, 1382, 2021.
[31] Chongxuan Li; Jun Zhu; Bo Zhang (2016). Learning to generate with memory, International Conference on Machine Learning (ICML), pp. 1177-1186, 2016.
[32] Xavier Glorot; Yoshua Bengio (2010). Understanding the difficulty of training deep feedforward neural networks, International Conference on Artificial Intelligence and Statistics (AISTATS), 2010.
[33] Szandała, T. (2021). Review and Comparison of Commonly Used Activation Functions for Deep Neural Networks, In: Bio-inspired Neurocomputing, Studies in Computational Intelligence, vol. 903, 2021.
[34] Nwankpa, C. E.; Ijomah, W.; Gachagan, A.; Marshall, S. (2021). Activation functions: comparison of trends in practice and research for deep learning, 2nd International Conference on Computational Sciences and Technology, pp. 124-133, 2021.
[35] Hendrycks, D.; Gimpel, K. (2016). Gaussian Error Linear Units (GELUs), arXiv e-prints, 2016.
[36] Kaan Gokcesu; Hakan Gokcesu (2021). Generalized Huber loss for robust learning and its efficient minimization for a robust statistics, arXiv preprint arXiv:2108.12627, 2021.
[37] Vallez, N.; Velasco-Mata, A.; Deniz, O. (2021).
Deep autoencoder for false positive reduction in handgun detection, Neural Computing & Applications, 33, 5885-5895, 2021.
[38] Bezerra, F.; Wainer, J. (2012). A Dynamic Threshold Algorithm for Anomaly Detection in Logs of Process Aware Systems, Journal of Information and Data Management, vol. 3, pp. 316-331, 2012.
[39] https://www.wunderground.com/history/monthly/in/new-delhi/VIDP
[40] Barakkath Nisha Usman; Uma Maheswari Natarajan; Venkatesh Ramalingam; Yasir Abdullah Rabi (2016). Fuzzy based Flat Anomaly Diagnosis and Relief Measures in Distributed Wireless Sensor Network, International Journal of Fuzzy Systems, vol. 19, 2016.
[41] John, T. Ogbiti; Ukwuoma Henry; Danjuma Salome; Ibrahim Mohammed (2016). Energy Consumption in Wireless Sensor Network, The International Institute for Science, Technology, and Education (IISTE), 2016.
[42] Mohajer, A.; Mazoochi, M.; Niasar, F. A.; Ghadikolayi, A. A.; Nabipour, M. (2013). Network Coding-Based QoS and Security for Dynamic Interference-Limited Networks, Communications in Computer and Information Science, 370, 2013.
[43] Leandro A. Villas; Azzedine Boukerche; Daniel L. Guidoni; Horacio A.B.F. de Oliveira; Regina Borges de Araujo; Antonio A.F. Loureiro (2013). An energy-aware Spatio-temporal correlation mechanism to perform efficient data collection in wireless sensor networks, Computer Communications, vol. 36, no. 9, 2013.
[44] A. Mohajer; F. Sorouri; A. Mirzaei; A. Ziaeddini; K. J. Rad; M. Bavaghar (2022). Energy-Aware Hierarchical Resource Management and Backhaul Traffic Optimization in Heterogeneous Cellular Networks, IEEE Systems Journal, pp. 1-12, 2022. doi: 10.1109/JSYST.2022.3154162.
[45] Mohamed Elshrkawey; Samiha M. Elsherif; M. Elsayed Wahed (2018). An Enhancement Approach for Reducing the Energy Consumption in Wireless Sensor Networks, Journal of King Saud University - Computer and Information Sciences, vol. 30, pp. 259-267, 2018.

Copyright ©2023 by the authors.
Licensee Agora University, Oradea, Romania. This is an open access article distributed under the terms and conditions of the Creative Commons Attribution-NonCommercial 4.0 International License.

Journal's webpage: http://univagora.ro/jour/index.php/ijccc/

This journal is a member of, and subscribes to the principles of, the Committee on Publication Ethics (COPE). https://publicationethics.org/members/international-journal-computers-communications-and-control

Cite this paper as:
Arul Jothi, S.; Venkatesan, R. (2023). A Deep Learning Approach for Efficient Anomaly Detection in WSNs, International Journal of Computers Communications & Control, 18(1), 4756, 2023. https://doi.org/10.15837/ijccc.2023.1.4756