INTERNATIONAL JOURNAL OF COMPUTERS COMMUNICATIONS & CONTROL
Online ISSN 1841-9844, ISSN-L 1841-9836, Volume: 18, Issue: 4, Month: August, Year: 2023
Article Number: 4644, https://doi.org/10.15837/ijccc.2023.4.4644

CCC Publications 

Facial emotion recognition using geometrical features based deep
learning techniques

J.L. Mazher Iqbal, M. Senthil Kumar, Geetishree Mishra
Asha G.R., Saritha A.N., A Karthik, BonthuKotaiah N.

J.L Mazher Iqbal
Department of ECE, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology
Chennai, India
Corresponding author: mazheriq@gmail.com

M. Senthil Kumar
Department of ECE, Sree Dattha Group of Institutions
Sheriguda, Ibrahimpatnam, Hyderabad, 501510, India

Geetishree Mishra
Department of Electronics, B.M.S College of Engineering
Bull Temple Road Bangalore Karnataka-560019, India

Asha G R
Computer Science & Engineering, B.M.S. College of Engineering
Bangalore, Bull Temple Rd, Basavanagudi, Bengaluru, Karnataka 560019, India

Saritha A N
Computer Science & Engineering, B.M.S. College of Engineering
Bangalore, Bull Temple Rd, Basavanagudi, Bengaluru, Karnataka 560019, India

A Karthik
Institute of Aeronautical Engineering, Hyderabad, India

BonthuKotaiah N
Department of Computer Science, Central Tribal University of Andhra Pradesh
Vizianagaram, India

Abstract
In recent years, intelligent emotion recognition has been an active research area in computer vision,
aimed at understanding the dynamic communication between machines and humans. Automatic emotion
recognition allows a machine to assess the human emotional state and predict intent based on
facial expressions. Researchers mainly focus on speech features and body motions;
identifying affect from facial expressions remains a less explored topic. Hence, this paper proposes
a novel approach for intelligent facial emotion recognition using optimal geometrical features from
facial landmarks with VGG-19s (FCNN). Here, we utilize Haar cascades to detect the subject’s face
and determine the distance and angle measurements. The overall process classifies facial
expressions by extracting relevant features from the normalized angle and distance measures.
The experimental analysis shows high accuracy: 94.22% on the MUG dataset and 86.45% on the
GEMEP dataset.

Keywords: VGG-19s, Emotion Recognition, Facial Analysis, Facial Landmarks, Feature Extraction,
Geometrical Features, Hyperparameters



1 Introduction
The field of Artificial Intelligence faces a long-term challenge in developing more effective systems for
detecting human emotions. Facial emotion recognition (FER) is an essential visual-based technique for
creating expert systems capable of recognizing human emotions. The current methods in FER are based on
Action Units (AU), perceptual traits, and geometrical features. AU-based methods must distinguish among
over 7,000 potential AU combinations to differentiate between emotions, which can be computationally
expensive and increase processing time. Universalizing physical characteristics is another challenging
topic [1]. Facial expressions are one of the most reliable markers of a person’s mental state. Hence, FER
has been used in a variety of fields, such as security, rehabilitation, marketing, and sales. Feature
extraction and selection are two primary areas of focus [2] when designing efficient FER systems. The
computer vision community as a whole regards emotion identification via facial expression as difficult
because of issues including individual diversity in facial structure, the difficulty of distinguishing
dynamic facial traits, and low-quality digital photographs [3]. As technology has progressed, so too has
the use of human face recognition in a variety of contexts. More research into the user’s facial gestures
and emotions is necessary for certain HCI initiatives, such as those using camera-equipped chatbots or
companion robots [4].

Facial emotion recognition (FER) is crucial for human-machine interaction and communication. For a
computer to understand the user’s emotional state and react appropriately, it studies their facial
expressions. Researchers have worked for many years to create FER systems [5]. As people rely mostly on
spoken signals and facial images to comprehend the emotional condition of others, it makes sense to
utilise both together. The differing characteristics of speech and visual data make their integration a
significant obstacle in emotion-recognition studies [6]. Facial expressions provide the most consistent
cues for reading human emotions across contexts. Research into emotion recognition has developed rapidly
in recent years, and facial expression recognition has emerged as a promising, active field of study for
recognising a broad spectrum of basic emotions. One of the most basic emotions with many applications is
joy, and studies have shown that facial expressions are more accurate than other modalities (such as
audio/speech, text, and physiological sensing) for gauging emotional states. Although most recent methods
were developed for recognising a wide range of emotions, improving their detection performance for a
particular emotion (e.g., happiness) is a significant challenge. Few methods are designed to pick out
happiness alone in unconstrained videos, and those that exist have insufficient accuracy because they do
not account for large changes in head pose [7]. Emotion recognition, the study of how well computers can
interpret human feelings, is a burgeoning area with several practical applications. The challenge with
most approaches based on visual signals is that individuals may hide their emotions by making subtle
changes to their expressions. As human emotions may be hard to pin down using traditional machine
learning and deep learning methods, EEG signals are increasingly being used for this purpose. Yet most of
the proposed algorithms have subpar performance. Two convolutional neural network (CNN) approaches have
been proposed for effective recognition [8]: model M1 is a fully parameterized CNN, whereas model M2 is a
less parameterized CNN. Speech emotion recognition (SER) refers to studies in signal processing that
attempt to identify feelings based on recordings of people’s voices. Because of its widespread
application in settings as diverse as mental diagnosis and human-computer interaction, SER necessitates a
robust framework for accurate classification. With this goal in mind, a hybrid deep feature selection
(HDFS) method has been proposed for emotion recognition from human conversations, which combines machine
learning with autonomous keyword engineering to achieve state-of-the-art performance in both precision
and computational efficiency. The dimensionality reduction process in this pipeline begins with a fuzzy
entropy and similarity-based functionality classification algorithm and ends with the widely used Whale
Optimisation Algorithm.

These features are extracted from mel-spectrograms of the raw audio signal using a trained
Wide-ResNet-50-2 supervised neural model. A k-nearest neighbour classifier is applied to the best
collection of features to determine the emotion category [9]. The ability to recognise human emotions is
a huge boon to the field of computer vision. One application is an automated service for identifying the
feelings of autistic children, which is important for their safety in the case of a meltdown.
Contemporary approaches to the meltdown problem are proactive as opposed to reactive. Meltdown symptoms
are sometimes shown by abnormal facial expressions [10] brought on by a mixture of different emotions.
Automated emotion recognition is gaining prominence due to the increasing usage of human-computer
interface software. Emotion identification may make use of a wide variety of data sources, including but
not limited to speech, facial movements, body language, and physiological signals. Physiological
indicators, which are practically impossible to manipulate, provide the most trustworthy basis for
emotional interaction with machines. Thus, there has been a lot of research on the challenge of
automatically recognising emotions from EEG data. Emotions arise from a broad variety of cognitive
processes that involve various regions of the brain, making it necessary to extract a large number of
features from the whole brain in several frequency bands in order to identify them correctly from EEG.
When investigating how the brain regulates emotions, it is crucial to account for the ways in which
different regions work together [11]. Facial expression recognition (FER) is a challenging yet
fascinating area of computer vision (CV). Although many researchers have devoted significant resources to
studying FER in recent years, existing methods perform very well on high-resolution pictures but struggle
to discern specific emotional states in the wild [12].

Facial emotion recognition from pictures is challenging since human facial expressions are notoriously
hard to predict. Emotion classification using deep learning (DL)-based techniques has not been as
successful as the current study. Incorrect layer choice inside the CNN model causes these models to
underperform [13]. The ability to express and understand one another’s feelings is fundamental to
effective communication. Improvements in emotion recognition have a significant impact on both
human-computer interaction and computer-based voice emotion identification. Speech emotion recognition
(SER) is the practice of determining the emotional state of a speaker from their speech. Hybrid systems
have been proven to outperform the more common single classifiers used in SER [14], but they are not
without their faults. If a player’s mood can be read while they are engaged in an interactive game, it
may be possible to provide a more rewarding experience. Intrusive monitoring of the player’s
physiological signals is a common practice in current methodologies for evaluating their emotional
state [15]. This study proposes a geometric feature-based emotion identification system using a VGG-19s
model to identify and classify common emotions such as happiness, sadness, neutrality, anger, fear,
disgust, and surprise. The subject’s face is first identified using a Haar cascade classifier, and
geometric facial feature points are then extracted from different regions of the face (such as the
eyelids, cheeks, chin, and lips). We capture the deformations (facial modifications) induced by various
expressions by using the coordinates of the most significant facial landmarks to compute the angles and
distances between the chosen facial regions. The resulting feature vector integrates the two types of
measurement. The system uses a VGG-19 architecture and tunes the DNN’s hyperparameters to improve
recognition accuracy. The model learns to use these characteristics together, and its predictions are
used to evaluate the robustness of the system. The proposed technique is validated using the Geneva
Multimodal Emotion Portrayals (GEMEP) [13] bodily gesture dataset as well as the Multimedia Understanding
Group (MUG) [12] facial expression database. High levels of recognition accuracy were achieved on both
datasets through the application of VGG-19.

This paper is organized as follows: Section 2 provides a summary of the existing literature on facial
expression recognition. Section 3 lays out the proposed methodological framework for geometric feature
extraction. Section 4 presents the experimental setup, performance measures, and results on the MUG and
GEMEP databases. The conclusions and suggestions are briefly outlined in Section 5.

2 Related Work
A novel approach to emotion identification using geometrical fuzzy logic has been proposed. The
four corner features of the mouth and eyes can be determined even without a reference face.
The recovered features specify a quadrilateral shape that does not map to regular geometric forms. Fuzzy
membership functions are used in the proposed Mixed Quadratic Shape Model (MQSM) to quantify the degree
to which the quadrilateral is irregular. Fuzzy features (12 in total) are extracted from attribute values
and used to verify the classifier. To test the performance of the MQSM, the CK, JAFFE, and ISED databases
are used in the experiment. Using just 12 fuzzy attributes, the proposed method outperformed
state-of-the-art techniques that often depend on reference pictures [16]. Researchers have traditionally
used static feature selection methods in their studies. Despite their usefulness, these methods have
significant limitations, particularly in dealing with spontaneous discourse. This is mostly because each
person’s facial characteristics result in somewhat different emotional expressions. To address this
problem, a dynamic feature selection approach based on facial characteristics has been proposed, which
draws on two types of geometric features: linear features and eccentricity features. Combined, they shed
light on why and how facial expressions look the way they do. The method also considers the subject’s
head position, muscles, and facial expressions.

The technique outperforms state-of-the-art methodologies, as shown by experiments performed on the CK+
and DISFA datasets, and maintains its advantage even when evaluated against different datasets. On the
CK+ dataset, the method obtains 97.72% accuracy in facial expression identification, while on the DISFA
dataset it reaches 91.26% accuracy [17]. A convolutional neural network (CNN) has also been presented
that determines an individual’s emotional state from a single image of their face. For emotion
recognition, the proposed FS-CNN analyses facial landmarks to predict emotions. The FS-CNN is a hybrid of
patch cropping and a convolutional neural network. In the first stage, faces are located in
high-resolution images and cropped as needed. To predict facial expressions using landmark analytics, a
CNN is then applied to pyramid images to evaluate scale invariance. The FS-CNN was evaluated and improved
using the UMD Faces database. On average, almost 95% accuracy was attained [18]. The Emotion, Age, and
Gender Recognition (EAGR) system employs face recognition algorithms to infer a user’s emotional state,
age, and gender. The EAGR system uses a CNN with three training models to identify seven emotions (six
basic ones and neutral), four age groups, and two genders; normalized facial crop rotation (NFC) is
applied as an auxiliary preprocessing step to the training data, and augmented data is then used. NFC
crops away hair so that facial features can be extracted more easily for emotion analysis. Yet NFC can
also remove features relevant to age and gender, which are preserved when the hair remains intact. The
experiments used the three-model training scheme for recognizing emotion, age, and gender. By
compensating for subjects’ tilted heads in the validation dataset, the proposed binocular line angle
correction (BLAC) yielded the best mean average accuracy of 82.4%, 74.95%, and 96.65% for real-time
detection of seven emotions, four age groups, and two genders, respectively. Moreover, NFC preprocessing
has the potential to significantly reduce training time. The EAGR approach is therefore considered
cost-effective for determining the gender, age range, or emotional state of a human subject. Other social
uses of the EAGR approach in HCI services may enable more accurate feedback based on a wide range of face
classifications [19].

To advance the state of the art, another study proposes a novel FER framework. The global search
capability of an improved Black Hole (BH) algorithm and the good generalization ability of the Extreme
Learning Machine (ELM) are combined to classify faces into categories. The approach uses Linear
Discriminant Analysis (LDA) and Principal Component Analysis (PCA) to reduce the dimensionality of facial
images while preserving their essential distinguishing features. The technique has shown encouraging
results on three datasets: the Japanese Female Facial Expression (JAFFE), the Karolinska Directed
Emotional Faces (KDEF), and the extended Cohn-Kanade (CK+). The authors also employed their own
customized face dataset to further analyse the system, and found that the LDA-BH-ELM approach had an
accuracy of 77% on the CK+ dataset and 80% on the KDEF database. Results showed that the technique is
superior to traditional methods and may provide remarkable performance [20]. A method for emotion
identification that uses vocal and visual cues in tandem has also been presented. Three distinct
VGG-19-based networks are developed. One of the networks is trained to identify emotions from a
collection of pictures. Facial landmarks are fed into a second network that can represent facial motions.
The audio is converted to its acoustic properties and used as input to the remaining network, which is
responsible for visual synchronization. Using a novel integration technique, the three networks are
merged to improve the precision of emotion recognition. As evidence of effectiveness, a comparison with
another technique is provided; the results suggest that the approach performs considerably better than
previous ones [21]. The Happy Emotion Recognition system (HappyER-DDF) improves upon previous work by
combining 3D hybrid depth and closeness features with conventional 2D deep features. As a first step
towards recovering spatiotemporal information from an image sequence, a hybrid 3D neural network with
long short-term memory (LSTM) is used. Second, as a smile (or laugh) unfolds, the distances between
various facial features and a reference point (such as the nose tip) are quantified. Three publicly
available video datasets are used to assess feature-level and decision-level fusion techniques. The
HappyER-DDF approach outperforms state-of-the-art facial expression methods in terms of accuracy [22].
As the most widely used EEG reference dataset, DEAP and its valence and arousal labels are used for
binary classification in a further study. Frequency-domain feature extraction using the Fast Fourier
Transform (FFT) supplements convolutional feature extraction. Compared with other models, the M1 and M2
CNN designs achieve average accuracies of 99.89% and 99.22%, respectively. The M2 model consistently
classifies polarity with over 96% accuracy using just 125 ms of EEG data, and reaches 99.22% accuracy
using only 2 seconds of EEG data. The M2 model achieves a valence accuracy of over 96.8% with just 10% of
the training dataset. The source code used to carry out each experiment is publicly available with its
description [23].

Using a 5-fold cross-validation approach, the pipeline is evaluated against a wide range of
state-of-the-art, recently published studies and is found to greatly outperform them, demonstrating its
quality and reliability [24]. Another study carefully constructed geometric, spatiotemporal, and deep
features of autistic children’s facial expressions. The performance of various combinations of these
features in Complex Emotion Recognition (CER) was compared to determine which features are most useful
in distinguishing a Complex Emotion (CE) in autistic children during a meltdown crisis from a normal
state. The "Meltdown crisis" dataset, which covers real-world meltdown and normal scenarios involving
autistic children, was used to verify the assumptions. A Random Forest classifier using custom-built
features gives promising results (91.27%). Classifiers trained on feature representations from
InceptionResNetV2 using supervised learning approaches provide state-of-the-art results (97.5%
accuracy) [25]. In a further study, 736 features are extracted from spectral power and phase-locking
data. To address the challenge of learning key characteristics for emotion recognition, swarm-intelligence
(SI) algorithms are used. The feature sets selected by these algorithms were applied to the problem of
distinguishing between happy and sad facial expressions.

In addition, frequently selected features are gathered into an updated feature set. Using the random
forest classification technique, an accuracy between 56.27% and 60.29% was obtained. The improvement was
possible owing to an 87.17% reduction in the number of features, from 736 features to an average of 94.40
features. The importance of focusing on the best electrode locations for reading emotions is also
highlighted. The classification findings are promising, and 11 channels are found to predominate [26].
To solve this issue, three novel neural network techniques were introduced. First, multi-scale kernel
architectures were incorporated into asymmetric pyramidal networks (APNet). Second, the square kernel was
replaced with a set of convolution layers in the x, y, and z directions. This approach may enhance CNNs’
descriptive power by making it simpler for them to integrate multi-scale information across their many
layers. Third, using stochastic gradient descent with gradient centralization (SGDGC) during CNN training
led to significant improvements. To verify the effectiveness of APNet with SGDGC, experiments were
performed on the widely used FER-2013, CK+, and JAFFE emotion datasets. The experimental results and
comparisons with other state-of-the-art models [27] demonstrate that the method is superior to using a
single model and is on par with the results achieved using a model fusion strategy. Another study
presents a DL solution that classifies facial expressions according to their emotional content using a
CNN structure. It analyses the output of the Viola-Jones (VJ) face detector in a novel way by using a
more advanced network architecture. After extensive testing to determine the optimal arrangement, the
first layer of the proposed model was developed. Many subjective and objective indicators of performance
informed the final design. The results show not only that people feel different emotions, but also that
those emotions may vary in intensity. Experiments on the FER-2013, CK+, and KDEF datasets test the
proposed model and compare the results with those obtained using state-of-the-art methods. Research like
this has the potential to aid authorities in smart-city applications [28].

Another study examined a hybrid of Long Short-Term Memory (LSTM) networks and a Transformer encoder to
classify speakers’ emotions from their voice signals. Speech features derived from Mel Frequency Cepstral
Coefficients (MFCC) are used in the suggested fusion method for learning long-term dependencies. The
LSTM-Transformer model’s performance was evaluated through rigorous testing. The results show a
significant improvement over previously reported models. Recognition accuracy for the hybrid model was
75.6% on the RAVDESS dataset, 85.5% on the Emo-DB dataset, and 72.4% on a vernacular dataset [29]. Heart
rate (HR) and facial expressions (FE) can be used to infer a player’s emotional state, and a mechanism
for doing so has been proposed. Because of the fleeting nature of people’s emotional reactions, that
study uses Kinect 2.0 video for HR and FE detection in real time. A convolutional neural network (CNN) is
used to learn the FE features, while a bidirectional long short-term memory (Bi-LSTM) network is used to
learn the HR features. Accurate emotion identification is achieved by a SOM-BP network that combines the
HR and FE features. Experimental results demonstrate that the model accurately predicts players’ levels
of joy, anger, sadness, and calmness in a large set of games with low computational requirements. The HR
value may therefore serve as a measure of emotional intensity [30].

From the above, it is clear that methods have been developed to determine emotional states from visual
data and body-language signals derived from dynamic features. Although the problem is less severe with
still images of faces, there are still some emotions that can be differentiated. To identify an emotion,
it is sufficient for at least some of the facial traits to be present in a given image. By focusing on a
few important features, we can often determine the underlying emotional expression. We therefore refined
our framework to focus on 66 variables representing more complex face regions by including geometric
measures. Choosing appropriate hyperparameters is the most difficult part of constructing a deep learning
model. We tested the model’s prediction ability under a variety of parameter combinations over narrow
ranges.

3 Proposed Approach
The proposed framework is shown in Figure 1. It consists of three stages: (i) facial localization and
landmark point detection, (ii) facial geometric feature extraction using VGG-19, and (iii) emotion
classification using FCNN. For this work, we analyse facial databases such as MUG and GEMEP, in which a
single face appears in each video/image frame. Initially, the subject’s face is extracted from the input
image sequence and 68 landmark points are identified in the face region. Following that, geometrical
facial features are extracted using distance and angle measures. Finally, the extracted features are
given to the FCNN for training and testing on the facial emotions from the MUG and GEMEP datasets.



Figure 1: The framework of the proposed system

3.1 Facial Localization and Landmark Detection
This section first extracts the facial region of interest (ROI) from the input image/video. For better
recognition of facial emotion, we used OpenCV’s built-in Haar cascades to locate the face in the image.
This is based on the Viola-Jones (VJ) algorithm for face detection [14], which offers a good performance
trade-off and detects objects with a cascade function. Face detection is achieved by training the
classifier on a set of positive and negative frames with the AdaBoost algorithm, which adjusts the
weights of the extracted features. Facial landmark identification is vital in many face analysis tasks to
characterize specific facial behaviours based on facial muscle movements. Figure 2 shows the cropped face
ROI, which is selected to analyse the temporal information by identifying the 68 landmark points using
the Dlib frontal face detector toolkit [34]. Facial expression recognition begins with accurate
extraction of significant facial landmark features from the eyes, nose, eyebrows, and lips.

Figure 2: Face Localization and Landmark Detection

3.2 Facial Geometrical Feature Extraction
By following the motion of prominent points on the head, eyes, eyebrows, nose, tongue, lips, and jaw, a
geometrical feature-based technique can capture minute changes in facial shape and movement. We therefore
use a feature representation based on the distances and angles of facial feature points to differentiate
between emotions. FACS defines features through angles and Euclidean distances between pairs of markers
in a frame. In total, 66-dimensional features were derived from the facial landmark points, representing
47 Euclidean distances and 19 angle measures between the coordinates of facial landmarks. Only the most
critical facial features are included, as other distances have only a minor effect on a classifier’s
decision-making. Figure 3 (a) and (b) show the geometrical landmark points for the distance and angle
measures.
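
The distance and angle measures described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: landmarks are given as a list of (x, y) tuples, and the index pairs and triples passed to the hypothetical helper `feature_vector` are placeholders, since the actual 47 pairs and 19 triples are those listed in Table 1 and are not reproduced here.

```python
import math


def distance(p, q):
    """Euclidean distance between two (x, y) landmarks."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def angle_deg(a, b, c):
    """Angle at vertex b between rays b->a and b->c, in degrees (0 to 180)."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    # atan2 stays numerically stable for nearly collinear points.
    return math.degrees(math.atan2(abs(cross), dot))


def feature_vector(landmarks, pairs, triples):
    """Concatenate distance features (index pairs) and angle features (index triples)."""
    dists = [distance(landmarks[i], landmarks[j]) for i, j in pairs]
    angles = [angle_deg(landmarks[i], landmarks[j], landmarks[k])
              for i, j, k in triples]
    return dists + angles


# Toy example with three landmarks; the indices are placeholders.
points = [(0.0, 0.0), (3.0, 4.0), (3.0, 0.0)]
features = feature_vector(points, pairs=[(0, 1), (0, 2)], triples=[(0, 2, 1)])
```

With 47 pairs and 19 triples, the same function would yield the 66-dimensional vector used in this work; any normalization of the measures would be applied to its output.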



Figure 3: Facial Geometrical Points (a) Forty-seven landmark points selected for the distance
measure (b) Nineteen landmark points selected for the angle measure

Using Pythagoras’ theorem, we can compute the distance between two landmark points on a face, and from
these distances the angles between landmarks. The positions of the various facial features are denoted by
coordinate pairs (xi, yi). The 66-dimensional feature points are chosen because of their high sensitivity
to subtleties of facial expression. Pairwise measurements of two landmarks may be used to extract
features for FER. The 47 distance features and the 19 angle features, each with its own ID, are listed in
Table 1. We first calculate the distance between each pair of landmark coordinates in a frame, and then
estimate the angles between sets of three points. Second, to train VGG-19s, feature fusion combines the
Cartesian angles and distances between points into a 66-dimensional vector used to distinguish facial
expressions ranging from "neutral" to "extreme". The robot and the detected face are separated by a
distance along the x and y axes. Facial emotion recognition relies on four values (x, y, w, and h) in the
frame: x and y denote the position of the bounding box in the coordinate system, and w and h represent
its width and height; the final values of w and h are used to determine kdX and kdY. The x and y
coordinates are calculated as follows:

$$kdX = \frac{StartX + EndX}{2}$$

$$kdY = \frac{StartY + EndY}{2}$$

where kdX is the x coordinate, kdY is the y coordinate, StartX is the start of the bounding box along the
x-axis, StartY is the start of the bounding box along the y-axis, EndX is the end of the bounding box
along the x-axis, and EndY is the end of the bounding box along the y-axis. Moreover, the distance of the
robot’s built-in camera from the detected face is determined by the formulas:

$$\text{focal length} = \frac{w \times d}{W}$$

$$\text{distance} = \frac{W \times f}{w}$$
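
A minimal sketch of the bounding-box centre and camera-distance calculations follows. The meaning of the symbols is an assumption, since the text does not define them: w is taken as the face width in pixels, W as the real-world face width, d as a known calibration distance, and f as the resulting focal length in pixels (the usual triangle-similarity setup). Under that reading the distance is computed as W × f / w. The function names and the numbers in the example are illustrative only.

```python
def box_center(start_x, start_y, end_x, end_y):
    """Centre (kdX, kdY) of a bounding box given its start and end coordinates."""
    return ((start_x + end_x) / 2, (start_y + end_y) / 2)


def focal_length(pixel_width, known_distance, real_width):
    """Calibration step: f = (w * d) / W, from one frame at a known distance."""
    return pixel_width * known_distance / real_width


def distance_to_face(real_width, focal, pixel_width):
    """Distance estimate: (W * f) / w, for a face that appears pixel_width wide."""
    return real_width * focal / pixel_width


# Calibrate with a 16 cm wide face seen 200 px wide at 50 cm (made-up numbers).
f = focal_length(pixel_width=200, known_distance=50, real_width=16)
# The same face later appears 100 px wide, so it is twice as far away.
estimated = distance_to_face(real_width=16, focal=f, pixel_width=100)
```

The distance estimate is only as good as the calibration frame and the assumed real face width, so it is best treated as approximate.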

3.3 Statistical Feature Analysis
The box plot shows the variance of individual class instances for facial emotion recognition. Figures 4
and 5 show the feature distributions of the distance and angle measures as statistical box plots for the
MUG and GEMEP datasets. Each box in the plot represents one emotion class; the central red line in the
box marks the median of the sample data. The whiskers at the top and bottom of the box extend to the most
extreme data points that are not outliers, and plus symbols beyond the whiskers mark the outliers. These
plots show that the features vary across classes and can discriminate the emotions in the MUG and GEMEP
datasets.
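
The quantities drawn in such a box plot can be computed directly, as in the sketch below. It assumes the common 1.5 × IQR rule for separating whisker ends from outliers and the "inclusive" quartile method of Python's `statistics` module. The paper does not state which convention its plots use, so both are assumptions, and `box_plot_stats` is a hypothetical helper name.

```python
import statistics


def box_plot_stats(values, whisker=1.5):
    """Median, quartiles, whisker ends, and outliers for one class's feature values."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    # Whiskers end at the most extreme points that are still inside the fences.
    inliers = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whisker_low": inliers[0],
        "whisker_high": inliers[-1],
        "outliers": outliers,
    }


# Toy feature values for a single emotion class.
stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
```

Calling the function once per emotion class and per feature gives the numbers behind each box; well-separated medians across classes indicate a discriminative feature.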



Table 1: Facial geometric points for distance and angle measures for feature extraction.

Figure 4: Distance+Angle features distribution on MUG dataset



Figure 5: Distance+Angle features distribution on GEMEP dataset

3.4 VGG-19s and FCNN
A. VGG

VGG-19 is a convolutional neural network (CNN) trained on more than a million images from the ImageNet database. The network has 19 layers and can classify images into a thousand distinct categories, such as "computer," "mouse," "pencil," and "animal." As a result, the network has learned rich feature representations for many kinds of images. Although the VGG net was primarily designed to compete in the ILSVRC competition, it has since been used in a wide variety of contexts:
• As a solid classification architecture for many other datasets; since the authors made the models publicly available, they can be used as is or modified for similar tasks.
• Transfer learning: the pre-trained network can also be applied to problems involving face recognition.
• Pre-trained weights are readily available in frameworks such as Keras, so they can be adapted and used for any purpose the user desires.
• Content and style loss computation using the VGG-19 network.
VGG-19 Architecture
• A fixed-size 224×224 RGB image was used as input to these networks, so the input tensor has shape (224, 224, 3).
• The only preprocessing step was to subtract the mean RGB value of the training set from each pixel.
• Kernels of 3×3 pixels with a stride of one pixel were used to cover the entire image.
• Spatial padding was used to preserve the spatial resolution of the image.
• Max pooling was performed over a 2×2-pixel window with stride 2.
• The Rectified Linear Unit (ReLU) was used to introduce non-linearity, which improves classification performance and shortens computation time. It proved far better than previous models that relied on tanh or sigmoid functions.
• Three fully connected layers follow: the first two have 4096 channels each, the third has 1000 channels for 1000-way ILSVRC classification, and the final layer is a softmax function.
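The architectural rules listed above determine the size of every feature map. The following Python sketch traces the shapes through the standard VGG-19 layer configuration (16 convolutional layers in five blocks, each block closed by max pooling) without building an actual network. The helper names are hypothetical; the sketch only checks the arithmetic and is not the authors' code.

```python
# Standard VGG-19 convolutional configuration: numbers are output channels
# of 3x3 convolutions, "M" marks a 2x2 max-pooling layer with stride 2.
VGG19_CONFIG = [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
                512, 512, 512, 512, "M", 512, 512, 512, 512, "M"]


def conv_out(size, kernel=3, stride=1, padding=1):
    """Output size of a convolution; 3x3 with padding 1 keeps the size."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, window=2, stride=2):
    """Output size of max pooling; a 2x2 window with stride 2 halves the size."""
    return (size - window) // stride + 1


def trace_shapes(height=224, width=224, channels=3):
    """Return the (height, width, channels) shape after every layer."""
    shapes = [(height, width, channels)]
    for layer in VGG19_CONFIG:
        if layer == "M":
            height, width = pool_out(height), pool_out(width)
        else:
            height, width, channels = conv_out(height), conv_out(width), layer
        shapes.append((height, width, channels))
    return shapes


shapes = trace_shapes()
final_height, final_width, final_channels = shapes[-1]
flattened = final_height * final_width * final_channels
conv_layers = sum(1 for layer in VGG19_CONFIG if layer != "M")
weight_layers = conv_layers + 3  # plus the three fully connected layers
```

The trace ends with a 7×7×512 feature map, which is flattened before the first 4096-unit fully connected layer, and the count of weight layers matches the 19 in the network's name.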
B. CNN
The Convolutional Neural Network (CNN) is a specific kind of deep neural network used in the analysis of visual information. In contrast to traditional image processing, a CNN employs learned filters rather than hand-designed edge, histogram, or texture filters. This removes the need to design filters by trial and error.
• A CNN begins with a feature-learning phase, which is followed by a classification layer (also known as the fully connected layer). The two most fundamental building blocks of the feature-learning phase are the convolution layer and the pooling layer.
• Convolution layer: it comprises the learnable filters that act as the feature extractors discussed above.
• Pooling layer: it provides approximate invariance to small changes in position and also performs some spatial compression. Even if it is turned slightly, a car will still be recognizable as a car.
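The invariance and compression provided by pooling can be seen in a toy example. The Python sketch below applies 2×2 max pooling with stride 2 to two hypothetical 4×4 images that differ only by a one-pixel shift of a single bright pixel; the images and the function name are illustrative assumptions.

```python
def max_pool_2x2(image):
    """2x2 max pooling with stride 2 on a list-of-lists image."""
    rows, cols = len(image), len(image[0])
    return [
        [max(image[r][c], image[r][c + 1], image[r + 1][c], image[r + 1][c + 1])
         for c in range(0, cols - 1, 2)]
        for r in range(0, rows - 1, 2)
    ]


# A bright pixel at row 1, column 1.
original = [[0, 0, 0, 0],
            [0, 9, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0]]

# The same pixel shifted diagonally by one position.
shifted = [[9, 0, 0, 0],
           [0, 0, 0, 0],
           [0, 0, 0, 0],
           [0, 0, 0, 0]]

pooled_original = max_pool_2x2(original)
pooled_shifted = max_pool_2x2(shifted)
```

Both inputs pool to the same 2×2 map, so the 16 input values are compressed to 4 and the small shift disappears. The invariance is only approximate: a shift that moves the pixel into a neighbouring pooling window would change the output.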
C. VGG-19 & FCNN
VGG-19 is a state-of-the-art CNN with pre-trained layers and a deep understanding of the properties of shapes, colours, and structures. VGG-19 is a deep neural network that was trained on millions of images of varying types and complexity.

Figure 6: Hybrid Architecture

To complete our classification goal of differentiating between photos with and without trees, we did not further train VGG-19; instead, we froze its layers and added a shallow 2-layer network on top of it. Here f is the focal length, w is the width in pixels, d is the distance in cm, and W is the width in cm; the distance is measured to determine the location of an object in addition to face and emotion identification. When comparing two faces, it is crucial to know their exact x, y, and distance from one another. The challenge with training neural networks is that it requires a lot of data. Because of its large size, the noisy dataset is used for adaptive optimisation, even though our goal is to improve performance only on the curated dataset. We minimise the following cross-entropy error during training:

$$L(S, L) = -\sum_{i \in N} L_i \log(s_i)$$
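The loss L(S, L) = −Σ L_i log(s_i) above has the form of the cross-entropy between a label vector L and a predicted score vector S. The Python sketch below evaluates it for one sample, assuming that the scores s_i come from a softmax over raw network outputs and that the label is one-hot; both assumptions, the function names, and the numeric values are illustrative.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to one."""
    shift = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(labels, scores, eps=1e-12):
    """L(S, L) = -sum_i L_i * log(s_i); eps guards against log(0)."""
    return -sum(l * math.log(max(s, eps)) for l, s in zip(labels, scores))


# Hypothetical three-class example: the true class is the second one.
labels = [0.0, 1.0, 0.0]
scores = softmax([1.0, 3.0, 0.5])
loss = cross_entropy(labels, scores)
```

With a one-hot label only the term of the true class survives, so the loss reduces to the negative log of the probability assigned to that class and approaches zero as that probability approaches one.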

The proposed CNN-based attention model comprises 5 convolutional layers, of which the first is responsible for extracting low-level features from the input images. The other layers concentrate on the region of interest provided by the FCN-perinet model. Semantic regions are detected using the FCN and classified into three classes, namely eyebrow, background, and eye, from which the region of interest is constructed. In the formulation of the attention layer, f(p, q) denotes the feature map of the CNN at the coordinates of the ROI, f(p, q)∗ denotes the treated feature map at the end of the current layer, and β denotes the control coefficient that controls the adjustment intensity. More precisely, consider the vectorized FCNN responses of the last convolutional layer (e.g. 'conv5_4' of VGG19), where m is the total number of convolutional kernels in the final layer (e.g. m = 512 in VGG19) and d, which grows linearly with the size of the input picture, is the number of nodes of each convolutional kernel.

4 Experimental setup
The experiments were run under the following setup: a Windows 10 machine with an Intel i7 processor @ 2.10 GHz and 16 GB of RAM. The VGG-19 and FCNN model requires Python 3.7.6, the Keras 2.4.3 framework, and the TensorFlow 2.3.1 libraries. We experimented with a number of different model configurations before settling on one that solved our problem. The unsuccessful configurations attempted to train the network to recognise novel picture properties of the illustration domain. We experimented with retraining the weights of the network's lower layers to pick up the low-level features of the new domain, freezing the network's upper layers to preserve the refined object representation learned during previous training, expanding the network's depth, and tuning parameters such as dropout, learning rate, and momentum. One of the most challenging aspects of building VGG-19 models is choosing the combination of hyperparameters, using a looping strategy, to increase the accuracy and efficiency of the model. A grid search helps select suitable parameters for obtaining the best prediction model while avoiding overfitting. Our VGG-19 consists of a five-layered architecture for training and testing. To assess the performance of our model, we trained our deep architecture with up to 25 layers, but there was a significant decline in accuracy. As a result, we chose the architecture with minimal layers, which provides significantly faster inference and is more appropriate for real-time applications. We demonstrate that VGG19's deep network produces much worse accuracy on our illustration database than on natural pictures. The primary difference between our datasets and the original photos is in their statistical makeup. Creating and training a brand-new convolutional network is one strategy for enhancing performance on our data. However, we do not have enough data to train VGG19 from scratch, and erasing the model's previous training would mean losing everything it has learned, so this is not a wise choice.
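The looping strategy for hyperparameter selection mentioned above can be sketched as an exhaustive grid search. In the Python sketch below, the candidate values are hypothetical and `validation_score` is a stand-in that only mimics a validation result; in a real experiment it would train the network with the given configuration and return its validation accuracy.

```python
from itertools import product

# Hypothetical candidate values for the parameters named in the text.
search_space = {
    "dropout": [0.2, 0.5],
    "learning_rate": [0.01, 0.001],
    "momentum": [0.8, 0.9],
}


def validation_score(config):
    """Placeholder score: NOT a real training run, just a toy function
    that peaks at dropout=0.5, learning_rate=0.001, momentum=0.9."""
    return -(abs(config["dropout"] - 0.5)
             + abs(config["learning_rate"] - 0.001)
             + abs(config["momentum"] - 0.9))


def grid_search(space, score_fn):
    """Evaluate every combination and keep the best-scoring one."""
    names = list(space)
    best_config, best_score = None, float("-inf")
    for values in product(*(space[name] for name in names)):
        config = dict(zip(names, values))
        score = score_fn(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


best_config, best_score = grid_search(search_space, validation_score)
```

The number of evaluations grows as the product of the candidate counts (here 2 × 2 × 2 = 8), which is why the grid is usually kept coarse when each evaluation is a full training run.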

5 Dataset
In this study, we analyze facial expression datasets to extract geometrical features, focusing on cases where a single face appears in each picture or video frame. The data come from two sources: (a) MUG and (b) GEMEP. Table 3 provides a concise summary of the characteristics of these two datasets.

Table 3. Facial expression datasets
Dataset                                              Total samples  Subjects  Classes  Temporality  Resolution
Multimedia Understanding Group (MUG) [12]            11758          86        7        static       896×896
Geneva Multimodal Emotion Portrayals (GEMEP) [13]    1823           10        17       dynamic      720×576

Multimedia Understanding Group (MUG)
The Multimedia Understanding Group facial expression dataset covers the seven basic emotion prototypes specified in the FACS manual. The dataset consists of 1462 face sequences, including various action units, from 86 people (35 women and 51 men) between the ages of 20 and 35. The preprocessed sequences yielded a total of 11758 images.

Figure 7: Sample frames of seven emotions from the MUG dataset.

Figure 7 portrays seven different facial emotional class instances: angry, disgust, fear, happiness, neutral, sad, and surprise. The frames are recorded at 19 frames per second. Furthermore, this dataset was chosen because of its authentic expressions, which overcome the limitations of other similar FER datasets, such as illumination factors, and because it offers numerous takes per subject without occlusion. Table 4 lists the number of instances per facial expression.



Table 4. The number of instances taken per facial expression.
Facial Expression Instances
Angry 1587
Disgust 1606
Fear 1638
Happy 1868
Sad 1802
Neutral 1389
Surprise 1868
Total 11758

5.1 GEMEP Dataset
The Geneva Multimodal Emotion Portrayals (GEMEP) [37] is a multimodal framework created by Klaus Scherer and Tanja Bänziger. It consists of facial, audio, and body-gesture video instances performed by ten actors with different modalities. It encompasses a wide range of emotions: admiration, amusement, anger, anxiety, contempt, despair, disgust, interest, irritation, joy, panic fear, pleasure, pride, relief, sadness, surprise, and tenderness. Initially, the video portrayals are converted to frames; the face ROI is then cropped with a Haar cascade and used for our work. Figure 8 shows sample frames of the face extraction process from the GEMEP dataset. For our experiments, 1823 instances are considered. Table 5 shows the number of instances per facial expression.

Table 5. The number of instances taken per facial expression

Facial Expression Instances
Admiration 127
Angry 105
Contempt 61
Despair 126
Disgust 56
Interest 138
Irritation 134
Joy 113
Panic fear 115
Pleasure 179
Pride 93
Relief 116
Sad 125
Surprise 53
Tenderness 69
Total 1823

5.2 Performance Measures
In this part, we describe the metrics used to evaluate the trained deep network model on the test set. They also help us understand how quickly the model converges. The extracted geometrical features are split into 70% for training and 30% for testing. Accuracy (A), Precision (P), Recall (R), F1-score, and ROC (Receiver Operating Characteristic) are the statistical measures used to evaluate the model's efficacy. Each emotional class in the MUG and GEMEP datasets was evaluated using the metrics presented in Table 6. The effectiveness of the facial expression recognition system is measured through this evaluation. The proposed system's performance is evaluated in terms of its accuracy, which is the rate at which it correctly recognises faces and expressions in real time. The accuracy is given by Equation (1):

$$\text{accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \tag{1}$$

where TP denotes true positives, TN true negatives, FP false positives, and FN false negatives. This equation is used to assess the quality of both face recognition and emotion recognition. The accuracy value of a classification reveals its effectiveness on a class-by-class basis.



Table 6. Performance measures
Metric      Description
Accuracy    The proportion of samples that were correctly labelled out of the total sample count.
Precision   The proportion of samples predicted as positive that are actually positive.
Recall      The proportion of actual instances of an emotion that are correctly identified.
F-score     A single performance index that combines precision and recall by taking their harmonic mean.

Each prediction falls into one of four outcomes, denoted TP, TN, FP, and FN.
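For a multi-class problem, the four outcomes are counted per class in a one-vs-rest fashion from the confusion matrix. The Python sketch below derives accuracy, precision, recall, and F1-score for each class; the 3×3 confusion matrix is hypothetical and the function name is an illustrative choice, not part of the authors' code.

```python
def per_class_metrics(confusion):
    """confusion[i][j] = number of samples of true class i predicted as class j."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    results = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                         # class k missed
        fp = sum(confusion[i][k] for i in range(n)) - tp    # others labelled k
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results.append({
            "accuracy": (tp + tn) / (tp + fp + fn + tn),
            "precision": precision,
            "recall": recall,
            "f1": f1,
        })
    return results


# Hypothetical confusion matrix for three emotion classes (30 samples).
confusion = [[8, 2, 0],
             [1, 7, 2],
             [0, 1, 9]]

metrics = per_class_metrics(confusion)
overall_accuracy = sum(confusion[k][k] for k in range(3)) / 30
```

The F1-score is the harmonic mean of precision and recall, so it is high only when both are high.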

5.3 Experimental Results on MUG Dataset
The network was trained on the distance and angle features extracted from the MUG dataset using the Adamax optimizer for 100 epochs at a learning rate of 0.001. For optimal network performance, we use Rectified Linear Unit (ReLU) activation functions for all units in the hidden layers and 'sigmoid' activation functions for the output layer.
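To make the role of the optimizer concrete, the following Python sketch implements the Adamax update rule on a toy quadratic objective. It uses the learning rate of 0.001 quoted above; the decay rates β1 = 0.9 and β2 = 0.999, the small `eps` safeguard, and the objective itself are assumptions for illustration, and the loop counts parameter updates rather than training epochs.

```python
def adamax_minimise(grad_fn, theta, learning_rate=0.001,
                    beta1=0.9, beta2=0.999, steps=100, eps=1e-8):
    """Run `steps` Adamax updates on the parameter list `theta` (in place)."""
    m = [0.0] * len(theta)  # exponential moving average of the gradient
    u = [0.0] * len(theta)  # exponentially weighted infinity norm
    for t in range(1, steps + 1):
        grads = grad_fn(theta)
        for i, g in enumerate(grads):
            m[i] = beta1 * m[i] + (1 - beta1) * g
            u[i] = max(beta2 * u[i], abs(g))
            # The bias correction applies to m only; eps avoids division by zero.
            theta[i] -= (learning_rate / (1 - beta1 ** t)) * m[i] / (u[i] + eps)
    return theta


def loss(theta):
    """Toy objective: sum of squares, minimised at the origin."""
    return sum(x * x for x in theta)


def grad(theta):
    """Gradient of the toy objective."""
    return [2 * x for x in theta]


start = [1.0, -2.0]
end = adamax_minimise(grad, list(start), steps=100)
```

Because the update divides by the running maximum of the gradient magnitude, each step moves a parameter by roughly the learning rate, so 100 steps at 0.001 shift each coordinate by about 0.1 towards the minimum.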

Figure 8: Confusion matrix: 66 features employed from the MUG dataset

The confusion matrix of the proposed distance and angle features on the MUG dataset shows that ‘angry’, ‘sad’, ‘happy’ and ‘disgust’ are classified with the best accuracy, whereas ‘surprise’ has the lowest accuracy because some emotions, such as fear, are misclassified as surprise, as shown in Figure 9.

Table 7 reports the accuracy, precision, recall, and F1-score values for the DNN model on the MUG dataset. The accuracy and loss curves for the training and test sets show consistently strong results. As shown in Figure 10 (a) and (b), facial emotions are recognised with 66 landmark features over 100 epochs. The validation and training curves began to converge at the 60th epoch and remained stable thereafter, and the system classifies emotions with an accuracy of 94.2% on the MUG dataset.

6 Experimental Results on GEMEP Dataset
The distance-and-angle fused features from the GEMEP frames are used to train and evaluate a deep artificial neural network for 100 epochs. To achieve optimal performance, we employ Rectified Linear Units (ReLU) in the hidden layers and 'sigmoid' in the output layer. With a small dataset like GEMEP, dropout and regularisation help train the best possible model for the validation set. The confusion matrix for the proposed features on the GEMEP dataset shows that the emotions "pleasure," "sadness," "irritation," "disgust," "contempt," and "amusement" are accurately categorised. Figure 12 shows that the category "anxiety" has the highest accuracy, whereas the category "pride" has the lowest accuracy.



Table 7. The performance measure for the MUG dataset
Emotion    Accuracy  Precision  Recall  F1-score
Angry      0.97      0.92       0.97    0.94
Disgust    0.95      0.99       0.95    0.97
Fear       0.93      0.95       0.93    0.94
Happy      0.95      0.93       0.95    0.94
Sad        0.96      0.89       0.96    0.92
Neutral    0.93      0.98       0.93    0.95
Surprise   0.92      0.95       0.92    0.93


Table 8. The performance measure for the GEMEP dataset
Emotion      Accuracy  Precision  Recall  F1-score
Admiration   0.89      0.79       0.89    0.84
Amusement    0.89      0.90       0.89    0.89
Anger        0.82      0.93       0.82    0.88
Anxiety      0.77      0.87       0.77    0.82
Contempt     0.90      0.87       0.90    0.89
Despair      0.83      0.76       0.83    0.79
Disgust      0.88      0.84       0.88    0.86
Interest     0.83      0.93       0.83    0.88
Irritation   0.95      0.89       0.95    0.92
Joy          0.86      0.82       0.86    0.84
Panic fear   0.77      0.79       0.77    0.78
Pleasure     0.94      0.93       0.94    0.94
Pride        0.71      0.89       0.71    0.79
Relief       0.85      0.79       0.85    0.82
Sad          0.94      0.89       0.94    0.92
Surprise     0.84      0.83       0.94    0.88
Tenderness   0.87      0.90       0.87    0.88

Table 8 reports the per-class performance measures derived from the confusion matrix. According to this table, the proposed model achieves high accuracy, precision, recall, and F1 scores while requiring less computing time for the specified input size.



Figure 9: Confusion matrix obtained for the GEMEP dataset

Comparative Study
Table 9 compares the results obtained here with those of state-of-the-art methods, showing that the proposed method for extracting geometrical features is both efficient and effective.

Table 9. State-of-the-art results
Method                          Dataset  Accuracy
ResNet 50                       GEMEP    85.6
CNN                             GEMEP    87.65
MobileNet                       GEMEP    89.2
Distance Manifolds using SVM    GEMEP    92.76
Geometric features using DNN    GEMEP    94.2
ResNet 50                       MUG      84.6
CNN                             MUG      88.65
MobileNet                       MUG      88.2
Distance Manifolds using SVM    MUG      91.76
Geometric features using DNN    MUG      95.2

7 Conclusion and Future Work
In this paper, we proposed a new framework for recognizing facial emotion using VGG-19s. We believe that focusing on specific action units in the facial regions helps detect facial expressions in depth. Additionally, we extracted the face feature information using Euclidean distance and angle measures over 66 distinct facial action units to highlight the most crucial parts for detecting facial emotions. A VGG-19-based network is constructed to train the classifier from scratch and directly predict the output from the input features. Experiments to recognise individual emotions were conducted on both the MUG and GEMEP databases. The overall recognition accuracy was 94.22% for the MUG dataset and 86.45% for the GEMEP dataset. Quantitative measures including accuracy, precision, recall, F1-score, and ROC are used to verify the model's performance on the datasets. Lastly, other techniques developed in the area of facial expression recognition were compared with the method presented in this study. The results indicated that the system could not reliably distinguish between shock and satisfaction. For a given input size, the proposed approach requires less computing time and produces better performance metrics. In future work, we plan to combine facial features with visual body motions on real-time datasets to identify micro-expressions of emotion.

Declarations:
Ethics Approval and Consent to Participate:
This study was carried out without the involvement of human participants.
Human and Animal Rights:
No animal or human rights were violated in any way.
Funding:
No funding is associated with this work.
Competing Interests:
There is no potential conflict of interest in this work.
Authors' Contributions:
There is no evidence of authorship.
Acknowledgement:
No acknowledgement is due for this work.

References
[1] Pérez, Ramses Fuentes, et al. “Prototype of MANET Network with Ring Topology for Mobile Devices.” (2021).

[2] Rajathi L.V. and RubaSoundar K. -. “A Survey on Various MANET Protocols.” International Journal of
Innovative Research in Engineering & Multidisciplinary Physical Sciences (2022): n. pag.

[3] Malik, Kamal, and AnshuBhasin. “A survey of mitigation techniques of packet drop attacks in MANET.”
SSRN Electronic Journal (2022):

[4] Korir, FridahChepkemoi, and Wilson Cheruiyot. “A survey on security challenges in the current MANET
routing protocols.” Global Journal of Engineering and Technology Advances (2022):

[5] Mohamed, Hossam El-Din, et al. “Using MANET in IoT Healthcare Applications: A Survey.” (2021).

[6] Al-Shakarchi, Sanaa J. H. and RaaidAlubady. “A Survey of Selfish Nodes Detection in MANET: Solutions
and Opportunities of Research.” 2021 1st Babylon International Conference on Information Technology
and Science (BICITS) (2021): 178-184.

[7] Alam, Tanweer and Mohamed Benaida. “The Role of Cloud-MANET Framework in the Internet of Things
(IoT).” Computational Materials Science eJournal (2018): n. pag.

[8] Lakshmi, G. Vidhya and P. Vaishnavi. “An Efficient Security Framework for Trusted and Secure Routing
in MANET: A Comprehensive Solution.” Wireless Personal Communications 124 (2022): 333 - 348.

[9] Jabbar, Waheb A. et al. “MEQSA-OLSRv2: A Multicriteria-Based Hybrid Multipath Protocol for Energy-
Efficient and QoS-Aware Data Routing in MANET-WSN Convergence Scenarios of IoT.” IEEE Access 6
(2018): 76546-76572.

[10] Serhani, Abdellatif, et al. “AQ-Routing: mobility-, a stability-aware adaptive routing protocol for data
routing in MANET–IoT systems.” Cluster Computing 23 (2019): 13-27.

[11] Gomathi, K. and B. Parvathavarthini. “A Secure Clustering in MANET through Direct Trust Evaluation
Technique.” 2015 International Conference on Cloud Computing (ICCC) (2015): 1-6.

[12] Piyalikar et al. “Forecast Weighted Clustering in MANET.” Procedia Computer Science 89 (2016): 253-260.

[13] Suma, R., and Suma. “Integration of particle swarm optimization with an adaptive K-Nearest Neighbor
for energy-efficient clustering in MANET.” (2020).

[14] “A HYBRID APPROACH FOR NODE CO-OPERATION BASED CLUSTERING IN MANET.” (2018).
Krishnan, Rahul. “A Survey on Game Theory Approaches for Improving Security in MANET.” (2018).

[15] Gupta, Alok and Nikhil Ranjan. “A Survey of Attacker Identification and Security Schemes in MANET.”
(2020).

[16] Muruganandam, S. et al. “A Survey: Comparative study of security methods and trust manage solutions
in MANET.” 2019 Fifth International Conference on Science Technology Engineering and Mathematics
(ICONSTEM) 1 (2019): 125-131.

[17] Usha, G. et al. “Survey of Single and Cross Layer Security in MANET.” Indian journal of science and
technology 9 (2016):

[18] Sumra, Dr.Irshad Ahmed, et al. “Security issues and Challenges in MANET-VANET-FANET: A Survey.”
EAI Endorsed Transactions on Energy Web 5 (2018): 155884.



[19] Rao, A. Arjuna et al. “Survey of Routing Protocols and Routing Attacks in MANET with Different Security
Technique in Cryptography for Network Security.” (2018).

[20] Poongodi, M. et al. “5G based Blockchain network for authentic and ethical keyword search engine.” IET
Commun. 16 (2021): 442-448.

[21] Zhang, Lei et al. “How Much Communication Resource is Needed to Run a Wireless Blockchain Network?”
IEEE Network 36 (2021): 128-135

[22] Nikhade, Jitendra R. and Vilas M. Thakare. “BlockChain Based Security Enhancement in MANET with
the Improvisation of QoS Elicited from Network Integrity and Reliance Management.” Ad Hoc Sens. Wirel.
Networks 52 (2022): 123-171.

[23] Suma, R. and Suma.. “Integration of particle swarm optimization with an adaptive K-Nearest Neighbor
for energy-efficient clustering in MANET.” (2020).

[24] J, Martin Sahayaraj et al. “IEEHR: Improved Energy Efficient Honeycomb based Routing in MANET for
Improving Network Performance and Longevity.” International Journal on Recent and Innovation Trends
in Computing and Communication (2022): n. pag.

[25] Alappatt, Valanto and Joe Prathap P. M.. “Trust-Based Energy Efficient Secure Multipath Routing in
MANET Using LF-SSO and SH2E.” International Journal of Computer Networks and Applications (2021):
n. pag.

[26] DrishyaS., R. et al. “A Stable Clustering Scheme with Node Prediction in MANET.” Int. J. Commun.
Networks Inf. Secur. 13 (2021): n. pag.

[27] Ponguwala, Maitreyi and Sreenivasa Rao. “E2-SR: a novel energy-efficient secure routing scheme to protect
MANET-IoT.” IET Commun. 13 (2019): 3207-3216.

[28] Kondaiah, Ramireddy and BachalaSathyanarayana. “Trust Factor and Fuzzy Firefly Integrated Particle
Swarm Optimization Based Intrusion Detection and Prevention System for Secure Routing of MANET.”
International Journal of Computer Networks & Communications 10 (2018): 13-33.

[29] Khan, Md. Sameeruddin and Md. Yusuf Mulge. “Efficient and Secure Data Transmission in MANETs
against Malicious AttackUsing AODV Routing and PSO Clustering with AES Cryptography.” (2017).

[30] Nagendranth, M. V. S. S. et al. “Type II fuzzy-based clustering with improved ant colony optimization-
based routing (t2fcatr) protocol for secured data transmission in manet.” The Journal of Supercomputing
78 (2022): 9102 - 9120.

[31] A.Karthik, Dr. J.L Mazher Iqbal. (2020). Performance estimation based recurrent-convolutional encoder
decoder for speech enhancement. International Journal of Advanced Science and Technology, 29(05), 772 -
777. Retrieved from http://sersc.org/journals/index.php/IJAST/article/view/9611

[32] Sakkarapani, Krishnaveni and C. Chandra Prabha. “Secure Multi-Path Routing Using Splitting and Merg-
ing Based Clustering for Reducing Power Usage in MANET.” (2021).

[33] Mohindra, AnubhutiRoda and Charu Gandhi. “A Secure Cryptography Based Clustering Mechanism for
Improving the Data Transmission in MANET.” (2021).

[34] Veeraiah, N. and B. Tirumala Krishna. “An approach for optimal-secure multi-path routing and intrusion
detection in MANET.” Evolutionary Intelligence 15 (2020): 1313-1327.

[35] Rani, Simpel. “A Hybrid and Secure Clustering Technique for Isolation of Black hole Attack in MANET.”
(2018).

[36] B, Revathi and Arulanandam K. “Design And Development of Robust And Secure Cluster Routing Al-
gorithm For Manet Based IOT.” International Journal of Computer Trends and Technology (2021): n.
pag.

[37] Sajyth, RB and G. Sujatha. “Design of Data Confidential and Reliable Bee Clustering Routing Protocol in
MANET.” 2018 International Conference on Computer Communication and Informatics (ICCCI) (2018):
1-7.



[38] Rajaram, A., & Baskar, A. (2023).Hybrid Optimization-Based Multi-Path Routing for Dynamic Cluster-
Based MANET. CYBERNETICS AND SYSTEMS.

[39] Karthik, A., MazherIqbal, J.L. Efficient Speech Enhancement Using Recurrent Convolution Encoder and
Decoder. Wireless Pers Commun 119, 1959–1973 (2021). https://doi.org/10.1007/s11277-021-08313-6

[40] Anand, R. P., & Rajaram, A. (2020, December). Effective timer count scheduling with spectator routing
using stifle restriction algorithm in manet. In IOP Conference series: materials science and engineering
(Vol. 994, No. 1, p. 012031).IOP Publishing.

[41] Rathish, C. R., & Rajaram, A. (2018). Sweeping inclusive connectivity based routing in wireless sensor
networks. ARPN Journal of Engineering and Applied Sciences, 3(5), 1752-1760.

[42] M. S. Gharajeh, “FSB-System: A Detection System for Fire, Suffocation, and Burn Based on Fuzzy Deci-
sion Making, MCDM, and RGB Model in Wireless Sensor Networks,” Wireless Personal Communications,
vol. 105, no. 4, pp. 1171–1213, Mar. 2019.

[43] M. S. Gharajeh, “A Neural-MCDM-Based Routing Protocol for Packet Transmission in Mobile Ad Hoc
Networks,” International Journal of Communication Networks and Distributed Systems, vol. 21, no. 4, pp.
496–527, Sept. 2018.

Copyright ©2023 by the authors. Licensee Agora University, Oradea, Romania.
This is an open-access article distributed under the terms and conditions of the Creative Commons
Attribution-NonCommercial 4.0 International License.
Journal’s webpage: http://univagora.ro/jour/index.php/ijccc/

This journal is a member of and subscribes to the principles of
the Committee on Publication Ethics (COPE).

https://publicationethics.org/members/international-journal-computers-communications-and-control

Cite this paper as:

Mazher Iqbal, J.L.; Senthil Kumar, M.; Mishra Geetishree ; Asha, G.R.; Saritha, Karthik, A; J.V.N.;
BonthuKotaiah, N. (2023). Facial emotion recognition using geometrical features based deep learning techniques,
International Journal of Computers Communications & Control, 18(4), 4644, 2023.

https://doi.org/10.15837/ijccc.2023.4.4644

