Engineering, Technology & Applied Science Research, Vol. 10, No. 5, 2020, pp. 6270-6275
www.etasr.com

Enhanced-PCA based Dimensionality Reduction and Feature Selection for Real-Time Network Threat Detection

Pournima More
Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Andhra Pradesh, India
pournima.more1@gmail.com

Pragnyaban Mishra
Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Andhra Pradesh, India
pragnyaban@kluniversity.in

Abstract: With the rise in the amount of data being collected and exchanged over networks, the threat of cyber-attacks has also increased significantly. Timely and accurate detection of any intrusion activity in networks has become a crucial task. While manual moderation and programmed logic have been used for this purpose, machine learning algorithms are desired for their superior pattern mapping. The system logs in a network tend to include many parameters, and not all of them provide indications of an impending network threat. The selection of the right features is thus important for achieving better results. There is a need for accurate mapping of high-dimensional features to low-dimensional intermediate representations while retaining crucial information. In this paper, an approach for feature reduction and selection for network threat detection is proposed. This approach modifies the traditional Principal Component Analysis (PCA) algorithm by addressing its shortcomings and minimizing false detection rates. Specifically, work has been done on the calculation of symmetric uncertainty and the subsequent sorting of features. The performance of the proposed approach is evaluated on four standard-sized datasets collected using the Microsoft SYSMON real-time log collection tool.
The proposed method is found to be better than the standard PCA and FAST methods for data reduction. The proposed approach makes a strong case as a dimensionality reduction and feature selection technique for reducing time complexity and minimizing false detection rates when operating on real-time data.

Keywords: principal component analysis; fast clustering; dimensionality reduction; machine learning; network security

I. INTRODUCTION AND RELATED WORK

Network security is of utmost importance, especially for companies and institutions. Any security breach is characterized by a pattern in the network logs preceding the attack, and these patterns, if detected accurately, can help avert major mishaps [1]. Previous approaches have relied on Security Information and Event Management (SIEM) systems coupled with manual moderation for scanning network threats. SIEMs report these threats to an associated Security Operations Center (SOC) [2, 3]. They conform to the laws on risk and regulation criteria and make use of predefined rules for catching any network breach or Incident Response (IR). However, similar to manual moderation, there are certain limitations in the use of SIEMs. They operate on a static set of vulnerability rules and hence cannot detect the trends of a novel attack. Further, they carry an operational overhead and hence come up short when working with real-time data. The performance of SOCs has worsened over the years and they are no longer sufficient for large-scale networks. Machine learning algorithms help in deriving rules and making accurate predictions even on previously unseen data. The behavioral features of the system logs can be used for attack detection. However, the system logs consist of many parameters and not all of them contribute to the detection of network threats.
Previously, methods like Principal Component Analysis (PCA) and FAST clustering have been used for eliminating redundant features from high-dimensional data [3, 4]. While PCA focuses on the derivation of a set of orthogonal eigenvectors, FAST clustering focuses on an efficient but less effective grouping of features. In this paper, an approach for dimensionality reduction and feature selection is proposed which ameliorates the shortcomings of the PCA algorithm and improves the performance of the system. Specifically, the FAST and PCA approaches were combined in a novel way, and machine learning algorithms were subsequently used for detecting anomalies that deviate from normal configurations, hence predicting possible threats to the system. The main contributions of this paper are:

• The formulation of a new approach for dimensionality reduction and feature selection that improves upon standard conventional approaches.
• The combination of the FAST and PCA algorithms in a unique manner for efficient, yet effective dimension mapping.
• A network threat detection approach that performs better than previous approaches on a real-time data set.

Corresponding author: Pournima More

There has been ample work in the past on anomaly detection on unseen data [5-9]. Most of this work can be segregated into three types: statistical, distance-based, and density-based techniques. Each of these three method families has shortcomings. Statistical methods often assume a univariate distribution, which is most often not true of real-world data. Distance-based methods struggle with overlapping data, while density-based methods may be useful but are computationally very expensive.
Authors in [10] proposed a primitive regression method to detect unauthorized users masquerading as registered users by comparing their activity to the previous actions from those accounts. Various unsupervised learning algorithms have been applied, however most of them have huge memory requirements [2, 11]. Clustering has been used [12-14], but incorrect grouping leads to a higher risk of false negatives. As a result, dimensionality reduction becomes a crucial part of the process, and the majority of approaches have focused on the use of PCA for this task [3, 15]. PCA aims to derive new variables, called Principal Components (PCs), as linear combinations of the input variables, so that a few of the new variables reflect the overall variation among the input variables. Authors in [19, 20] provide a network intrusion prevention approach that tackles the problem of handling high computer network traffic under the time pressure of security threats. They use multivariate analytics techniques, including clustering and PCA, to identify classes in the observed data. The use of PCA is appealing due to its statistical consistency, faster inference, and effective computation [11]. Yet there has been valid criticism of its use for network intrusion detection for several reasons, such as its struggle with rare class labels and its exponential complexity [4, 7]. Examples of this are [18, 19], where a combination of recorded variation and the scree-plot process was used for selecting the key components, which may be risky as an anomaly among lesser-known components may be ignored. The existence of unusual data in regular operations is seen in [20], where the criteria for choosing components for PCA remain unanswered. A large number of variables could be reported to explain network traffic behavior [21]. For example, authors in [22] assume a certain expectation for the collection methods of the variables.
Typically, every input variable has a non-zero factor on all PCs. However, in practice, the majority of the component loadings are commonly close to zero [23] and of little practical significance. Recently there have been proposals of some hybrid approaches [24-26], but they have higher training and inference costs, with increased complexity. Wormhole attacks have been tackled using separate routing [27], but the method does not work on structured log data. While synthetic oversampling and undersampling approaches may work well on text data [28], they are often detrimental to structured network log data. Hence, there is a need to address the shortcomings of the previous methods and come up with a more accurate and efficient solution.

II. PROPOSED METHODOLOGY

In this paper, we work upon an enhanced feature selection and reduction technique that ameliorates the shortcomings of the previous approaches. The diagrammatic representation of the proposed algorithm is shown in Figure 1. The phases of the algorithm are defined in the following subsections.

Fig. 1. Diagrammatic representation of the proposed algorithm.

A. Log Files and Feature Extraction

The Microsoft SYSMON log collection tool was utilized for extracting the logs from the given system. These logs are collected in real time and have a certain number of input attributes associated with them. These attributes are described in detail below. The data features are stored together in a matrix format for subsequent processing.

B. Calculation of Symmetric Uncertainty

Symmetric Uncertainty (SU) is used to guide the selection of features by calculating the fitness between a feature and the target class. Symmetric uncertainty is defined as:

SU(X, Y) = 2 × Gain(X|Y) / (H(X) + H(Y))    (1)

where Gain(X|Y) is the amount by which the entropy of the variable Y decreases, and H(X) is the entropy of a discrete random variable X.
If the prior probability of each element x of X is p(x), then H(X) can be calculated as:

H(X) = −∑_{x∈X} p(x) log p(x)    (2)

The entropy value takes care of any associated bias among features with large values and also normalizes them to a range of [0, 1]. An SU value of 1 indicates that the information value of one variable is fully represented by the other, whereas a value of SU(X, Y) = 0 indicates that X and Y are independent variables. Such a mathematical representation also helps normalize continuous features to a discrete form. These SU values are used for the calculation of the eigenvalues and the eigenvectors.

C. Eigenvalues and Eigenvectors

Eigenvalues are used to calculate the dependency of features and their correlation with the class values. The eigenvector with the highest uniqueness is considered the most characteristic feature, implying the highest variance [4]. Similarly, the second most unique vector is considered the second characteristic principal variable with information retention. In this way, the top N eigenvectors are calculated and represented using a co-occurrence matrix, where N indicates the higher contribution rate among all the eigenvectors. If M is an n×n matrix, then v is considered an eigenvector of M if:

M × v = λ × v    (3)

where λ is the eigenvalue associated with v. For the given eigenvector v of M and a scalar a:

M × (a × v) = a × λ × v    (4)

The n eigenvectors can always be chosen such that they collectively account for a unit length:

v1² + v2² + … + vn² = 1    (5)

The n eigenvectors are always orthogonal to each other. Thus, they can be used as the basis for the formulation of a new n-dimensional vector space. It is crucial to analyze the security of information from different unknown sources or attacks [10].
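As an illustration, the entropy and SU measures in (1) and (2) can be sketched in Python. This is a minimal sketch, not the authors' implementation: the function names are ours, base-2 logarithms are assumed, and Gain(X|Y) is computed as the information gain H(X) − H(X|Y).

```python
import numpy as np
from collections import Counter

def entropy(values):
    """Shannon entropy H(X) of a discrete variable, per (2)."""
    p = np.array(list(Counter(values).values()), dtype=float)
    p /= p.sum()
    return -np.sum(p * np.log2(p))

def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * Gain(X|Y) / (H(X) + H(Y)), per (1),
    with Gain(X|Y) = H(X) - H(X|Y) (information gain)."""
    hx, hy = entropy(x), entropy(y)
    # Conditional entropy H(X|Y): entropy of x within each group of y
    hx_given_y = 0.0
    for yv, cnt in Counter(y).items():
        subset = [xi for xi, yi in zip(x, y) if yi == yv]
        hx_given_y += (cnt / len(y)) * entropy(subset)
    gain = hx - hx_given_y
    return 2.0 * gain / (hx + hy) if (hx + hy) > 0 else 0.0
```

As described above, SU evaluates to 1 when one variable fully determines the other and to 0 when the two variables are independent.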
A fixed policy can never detect new and different threats as they are created. This examination can be done on different network trees. Once the FAST calculations of SU are done, we represent the method using PCA. A multidimensional hyperspace is hard to visualize. The key objectives of unsupervised learning approaches are to minimize dimensionality, rank all identifications based on a composite index, and cluster related identifications based on multivariate attributes. Since it is difficult to imagine a multidimensional space, PCA is used to scale down the dimensionality of multivariate parameters into three dimensions. In this paper, PCA is used for multivariate analysis. The data set can be represented as a matrix X with n rows (the samples) and m columns (the attributes). Some multivariate methods presume a structure, while others separate the cases into classes trying to figure out the structure of the data. The former is a case of supervised learning while the latter indicates unsupervised learning. In PCA, the interrelated multivariate parameters are mapped to a set of uncorrelated components, each expressing a distinct linear combination of the main variables. The uncorrelated components extracted are the PCs and are derived from the loadings of the main variables on the covariance matrix. PCA aims to obtain parsimony and minimum dimensionality by extracting the smallest number of PCs that account for most of the variation in the main multivariate information, summarizing the data with minimum information loss. The variance of an attribute A with observations x1, …, xn and mean x̄ is defined as:

var(A) = ∑_{i=1}^{n} (xi − x̄)² / (n − 1)    (6)

In this method, the last principal component score needs to be zero. All the given variables are scattered on a hyperplane. If there is any update in the interrelations, there may be an extension of the information outside the hyperplane.
This is reflected in updates of the principal component scores that were previously zero. These null scores are the most sensitive to updates in the interrelations. The main component scores are powerful in observing any updates to the information. PCA captures the normal data distribution by selecting a suitable orthogonal coordinate system. The range of the main rectangular factor scores is more suitable for expressing the normal data distribution than those of the sensed and actuated initial orthogonal coordinates.

D. Threshold Calculations

After the eigenvectors are calculated, they are sorted in descending order and the feature class mean value is calculated. This value is used for deciding the threshold value for obtaining the features. The following algorithm indicates the entire process followed by the enhanced PCA algorithm.

Input: attribute matrix X = {x1, x2, …, xn}, output Y, threshold t
Output: reduced representation S
1. For each xi in X do
2.   Calculate H(xi)
3.   Calculate SU(xi, Y)
4.   Calculate Var(xi)
5. End for
6. Sort all variances in descending order
7. Calculate the eigenvalues λ1, λ2, …, λn
8. Calculate the corresponding eigenvectors V = {v1, v2, …, vn}
9. For each vi in V do
10.   If vi > t then
11.     S.append(vi)
12.   End if
13. End for
14. Return S

III. DATASET DESCRIPTION

For evaluation on a real-time dataset, data were collected from logged data using the Microsoft Sysmon data collector tool [30]. It provides information on certain parameters present in the network, considered as the input variables for the network intrusion detection task. For a more generalized evaluation and the elimination of bias, we collected four different sets of data of varying sizes:

• Dataset 1: 1000 samples
• Dataset 2: 1200 samples
• Dataset 3: 800 samples
• Dataset 4: 361 samples

The total dataset size thus accounts for 3361 samples. Each of the datasets is split into a training set and a testing set in a 3:2 ratio.
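The 14-step selection procedure of Section II.D can be sketched in Python as below. This is an illustrative reading under stated assumptions, not the authors' implementation: the SU ranking of steps 2-3 is omitted for brevity, the threshold t is interpreted as a cumulative explained-variance cutoff (the paper reports a 66% threshold), and all names are ours.

```python
import numpy as np

def enhanced_pca_select(X, threshold=0.66):
    """Illustrative sketch: rank features by variance, eigendecompose
    the covariance matrix, and keep the leading components whose
    cumulative contribution rate reaches the threshold."""
    Xc = X - X.mean(axis=0)                  # centre the attribute matrix
    variances = Xc.var(axis=0, ddof=1)       # steps 1-5: per-feature variance, per (6)
    order = np.argsort(variances)[::-1]      # step 6: sort in descending order
    cov = np.cov(Xc[:, order], rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # steps 7-8: eigenvalues and eigenvectors
    idx = np.argsort(eigvals)[::-1]          # descending eigenvalue order
    eigvals, eigvecs = eigvals[idx], eigvecs[:, idx]
    ratio = np.cumsum(eigvals) / eigvals.sum()      # cumulative contribution rate
    k = int(np.searchsorted(ratio, threshold)) + 1  # steps 9-13: components kept
    return Xc[:, order] @ eigvecs[:, :k]     # step 14: reduced representation S
```

With threshold=0.66 this mirrors the 66% cutoff reported in Section IV, where the 22 Sysmon attributes were reduced to 15.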
For every sample in the dataset, there are 22 variables associated with it indicating the corresponding configuration. These variables are described in Table I.

TABLE I. DESCRIPTION OF THE INPUT ATTRIBUTES

Parameter          | Description
Process name       | 1. Name of the process
Login time         | Login time in 8 segments of three hours each: 2. 12pm-3pm, 3. 3pm-6pm, 4. 6pm-9pm, …, 9. 9am-12pm
Browsing history   | 10. URLs being browsed (if any)
Network transfer   | 11. Uploading, 12. Downloading
Days of the week   | 13. Current day
Frequency          | 14. No. of executed processes
Spam mail keywords | 15. Credit card, 16. Loan, 17. Offer, 18. Monsoon, 19. Sale, 20. Winning, 21. Money, 22. Prize

IV. RESULTS AND DISCUSSION

The obtained results are presented on two different levels: the output of the dimensionality reduction and the performance obtained on task-specific metrics. While the output variables produced by the dimensionality reduction algorithm can vary, we judge their effectiveness by comparing the results with those obtained by models that use other dimensionality reduction methods. Table II indicates the output variables determined by the proposed approach to be most important and relevant to the task at hand. The threshold was set to 66% and thus the total number of variables was reduced to 15.

TABLE II. OUTPUT PARAMETERS DERIVED BY THE PROPOSED ALGORITHM

Parameter No. | Output parameter name
1  | Name of the process
9  | 9am-12pm
3  | 3pm-6pm
11 | Uploading
7  | 3am-6am
15 | Credit card
5  | 9pm-12pm
17 | Offer
2  | 12pm-3pm
10 | Browsing history
6  | 12am-3am
12 | Downloading
21 | Money
19 | Sale
16 | Loan

The performance of the proposed approach is evaluated with two different metrics: Accuracy and Inference time.
These are defined in (7) and (8):

Accuracy = (No. of correctly predicted samples) / (Total no. of samples)    (7)

Inference time = T_output − T_input    (8)

Inference or prediction time is defined as the amount of time required for the system to output the prediction for a given input set. It is found by subtracting the time at which the input was given from the time at which the output was predicted. The results of the proposed approach for each of the four datasets are compared for the three methods (FAST, PCA, and enhanced PCA). The accuracy obtained for the three approaches is shown in Table III. It can be seen that the proposed approach outperforms the standard approaches and shows a healthy improvement over the normal PCA approach. Table IV shows the evaluation of the tested methods in terms of inference time. While the proposed approach is not the best regarding inference time, it still performs better than the PCA algorithm. While the FAST algorithm has a faster inference time, the proposed approach is better in terms of accuracy, which is more important in the case of network intrusion detection systems. The graphical comparison of the three approaches in terms of accuracy and inference time is shown in Figures 2-3. Thus, our proposed approach achieves the right mix of effectiveness and efficiency.

TABLE III. ACCURACY (%) OF THE TESTED ALGORITHMS

Dataset   | FAST | PCA | Enhanced PCA (proposed)
Dataset 1 | 90   | 87  | 92
Dataset 2 | 90   | 88  | 92
Dataset 3 | 88   | 88  | 89
Dataset 4 | 94   | 90  | 96

TABLE IV. INFERENCE TIME (ms)

Dataset   | FAST | PCA | Enhanced PCA (proposed)
Dataset 1 | 350  | 420 | 370
Dataset 2 | 360  | 530 | 380
Dataset 3 | 350  | 440 | 430
Dataset 4 | 270  | 315 | 260

Fig. 2. Accuracy comparison.

V. CONCLUSION

In this paper, a new approach was presented for dimensionality reduction and the subsequent prediction of intrusion threats to a network system. We were able to minimize time complexity and false detection rates, especially on a real-time data set.
Fig. 3. Inference time comparison.

The proposed approach derived inspiration from the traditional PCA algorithm but ameliorated its shortcomings and further improved its performance. Specifically, we made use of symmetric uncertainty, entropy, and associated factors to boost the working of the feature selection PCA algorithm. The Sysmon tool was used for the collection of data. The performance of the proposed approach was evaluated on four different datasets and was compared with the standard PCA and FAST approaches. The proposed approach was found to be more accurate while possessing a satisfactory inference time. An improvement of over 2% was observed in terms of accuracy compared to the standard approach. The reduction time was faster than the regular PCA approach by more than 15% in all cases. The proposed approach thus proved preferable to the previous approaches. Future work in this domain includes coupling other machine learning algorithms with the dimensionality reduction method for enhanced results. Other preprocessing methods can also be used so that the initial variables are made more model-friendly. Genetic algorithms can be considered as an alternative for an enhanced optimization strategy. The proposed approach can be considered a small contribution toward the creation of timely and accurate network intrusion detection systems.

REFERENCES

[1] S. Staniford-Chen and L. T. Heberlein, “Holding intruders accountable on the Internet,” in IEEE Symposium on Security and Privacy, Oakland, CA, USA, May 1995, pp. 39–49, doi: 10.1109/SECPRI.1995.398921.
[2] S.-J. Horng et al., “A novel intrusion detection system based on hierarchical clustering and support vector machines,” Expert Systems with Applications, vol. 38, no. 1, pp.
306–313, Jan. 2011, doi: 10.1016/j.eswa.2010.06.066.
[3] M. L. Shyu, S. C. Chen, K. Sarinnapakorn, and L. W. Chang, “A Novel Anomaly Detection Scheme Based on Principal Component Classifier,” 2003, pp. 172–179.
[4] H. Ringberg, A. Soule, J. Rexford, and C. Diot, “Sensitivity of PCA for traffic anomaly detection,” in ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, New York, NY, USA, Jun. 2007, pp. 109–120, doi: 10.1145/1254882.1254895.
[5] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: A survey,” ACM Computing Surveys, vol. 41, no. 3, pp. 1–58, Jul. 2009, doi: 10.1145/1541880.1541882.
[6] H.-P. Kriegel, M. Schubert, and A. Zimek, “Angle-based outlier detection in high-dimensional data,” in 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, Aug. 2008, pp. 444–452, doi: 10.1145/1401890.1401946.
[7] X. Song, M. Wu, C. Jermaine, and S. Ranka, “Conditional Anomaly Detection,” IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 5, pp. 631–645, May 2007, doi: 10.1109/TKDE.2007.1009.
[8] M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander, “LOF: identifying density-based local outliers,” in ACM SIGMOD International Conference on Management of Data, New York, NY, USA, May 2000, pp. 93–104, doi: 10.1145/342009.335388.
[9] A. T. Siahmarzkooh, S. Tabarsa, Z. H. Nasab, and F. Sedighi, “An Optimized Genetic Algorithm with Classification Approach used for Intrusion Detection,” 2015. [Online]. Available: /paper/An-Optimized-Genetic-Algorithm-with-Classification-Siahmarzkooh-Tabarsa/b0e239298e7c6d8aa0e813a12fe55a2d12673e29 (accessed Sep. 12, 2020).
[10] W. Dumouchel and M. Schonlau, “A Comparison of Test Statistics for Computer Intrusion Detection Based on Principal Components Regression of Transition Probabilities,” in Proceedings of the 30th Symposium on the Interface: Computing Science and Statistics, 1998, pp. 404–413.
[11] Z. Muda, W. Yassin, M. N. Sulaiman, and N. I.
Udzir, “Intrusion detection based on K-Means clustering and Naïve Bayes classification,” in 7th International Conference on Information Technology in Asia, Kuching, Sarawak, Malaysia, Jul. 2011, pp. 1–6, doi: 10.1109/CITA.2011.5999520.
[12] A. T. Siahmarzkooh, J. Karimpour, and S. Lotfi, “A Cluster-based Approach Towards Detecting and Modeling Network Dictionary Attacks,” Engineering, Technology & Applied Science Research, vol. 6, no. 6, pp. 1227–1234, Dec. 2016.
[13] J. Karimpour, S. Lotfi, and A. T. Siahmarzkooh, “Intrusion detection in network flows based on an optimized clustering criterion,” Turkish Journal of Electrical Engineering & Computer Sciences, vol. 25, no. 3, pp. 1963–1975, May 2017.
[14] A. T. Siahmarzkooh, “A GWO-based Attack Detection System Using K-means Clustering Algorithm” (No. TRKU-11-08-2020-10987), Technology Reports of Kansai University, in press.
[15] A. Lakhina, M. Crovella, and C. Diot, “Characterization of network-wide anomalies in traffic flows,” in 4th ACM SIGCOMM Conference on Internet Measurement, New York, NY, USA, Oct. 2004, pp. 201–206, doi: 10.1145/1028788.1028813.
[16] C. Taylor and J. Alves-Foss, “NATE: Network Analysis of Anomalous Traffic Events, a low-cost approach,” in Proceedings of the 2001 Workshop on New Security Paradigms, New York, NY, USA, Sep. 2001, pp. 89–96, doi: 10.1145/508171.508186.
[17] C. Taylor and J. Alves-Foss, “An empirical analysis of NATE: Network Analysis of Anomalous Traffic Events,” in Proceedings of the 2002 Workshop on New Security Paradigms, New York, NY, USA, Sep. 2002, pp. 18–26, doi: 10.1145/844102.844106.
[18] W. Wang and R. Battiti, “Identifying intrusions in computer networks with principal component analysis,” in First International Conference on Availability, Reliability and Security, Vienna, Austria, Apr. 2006, pp. 1–8, doi: 10.1109/ARES.2006.73.
[19] C. Callegari, L. Gazzarrini, S. Giordano, M. Pagano, and T.
Pepe, “When randomness improves the anomaly detection performance,” in 3rd International Symposium on Applied Sciences in Biomedical and Communication Technologies, Rome, Italy, Nov. 2010, pp. 1–5, doi: 10.1109/ISABEL.2010.5702782.
[20] R. Kwitt and U. Hofmann, “Unsupervised Anomaly Detection in Network Traffic by Means of Robust PCA,” in International Multi-Conference on Computing in the Global Information Technology, Guadeloupe City, Guadeloupe, Mar. 2007, pp. 37–37, doi: 10.1109/ICCGI.2007.62.
[21] W. Lee and S. J. Stolfo, “A framework for constructing features and models for intrusion detection systems,” ACM Transactions on Information and System Security, vol. 3, no. 4, pp. 227–261, Nov. 2000, doi: 10.1145/382912.382914.
[22] M. Koeman, J. Engel, J. Jansen, and L. Buydens, “Critical comparison of methods for fault diagnosis in metabolomics data,” Scientific Reports, vol. 9, no. 1, Feb. 2019, doi: 10.1038/s41598-018-37494-7, Art. no. 1123.
[23] H. Zou, T. Hastie, and R. Tibshirani, “Sparse Principal Component Analysis,” Journal of Computational and Graphical Statistics, vol. 15, no. 2, pp. 265–286, Jun. 2006, doi: 10.1198/106186006X113430.
[24] N. T. Pham, E. Foo, S. Suriadi, H. Jeffrey, and H. F. M. Lahza, “Improving performance of intrusion detection system using ensemble methods and feature selection,” in Proceedings of the Australasian Computer Science Week Multiconference, New York, NY, USA, Jan. 2018, pp. 1–6, doi: 10.1145/3167918.3167951.
[25] A. J. Malik, W. Shahzad, and F. A. Khan, “Network intrusion detection using hybrid binary PSO and random forests algorithm,” Security and Communication Networks, vol. 8, no. 16, pp. 2646–2660, 2015, doi: 10.1002/sec.508.
[26] Y.
Zhong et al., “HELAD: A novel network anomaly detection model based on heterogeneous ensemble learning,” Computer Networks, vol. 169, Mar. 2020, doi: 10.1016/j.comnet.2019.107049, Art. no. 107049. [27] F. Rezaei and A. Zahedi, “Dealing with Wormhole Attacks in Wireless Sensor Networks Through Discovering Separate Routes Between Nodes,” Engineering, Technology & Applied Science Research, vol. 7, no. 4, pp. 1771–1774, Aug. 2017. [28] P. Ratadiya and R. Moorthy, “Spam filtering on forums: A synthetic oversampling based approach for imbalanced data classification,” arXiv:1909.04826 [cs, stat], Sep. 2019, Accessed: Sep. 12, 2020. [Online]. Available: http://arxiv.org/abs/1909.04826. [29] P. More and M. P. Mishra, “Machine Learning for Cyber Threat Detection,” International Journal of Advanced Trends in Computer Science and Engineering, vol. 9, no. 1.1, pp. 41–46, 2020, doi: 10.30534/ijatcse/2020/0891.12020. [30] M. Russinovich and T. Garnier, “Sysmon v11.11,” Jul. 15, 2020. https://docs.microsoft.com/en-us/sysinternals/downloads/sysmon (accessed Sep. 12, 2020).