

Engineering, Technology & Applied Science Research Vol. 11, No. 3, 2021, 7172-7176 7172 
 

www.etasr.com Hassan et al.: An Effective Combination of Textures and Wavelet Features for Facial Expression … 

 
An Effective Combination of Textures and Wavelet Features for Facial Expression Recognition
 

Syed Muhammad Hassan 

Department of AI and Mathematical Sciences 
SMI University 
Karachi, Pakistan 

m.hassan@smiu.edu.pk 

Abdullah Alghamdi 

College of Computer Science and Information Systems 
Najran University 
Najran, Saudi Arabia 

abdulresearch@hotmail.com 

Abdul Hafeez 

Department of Software Engineering 

SMI University 
Karachi, Pakistan 

ahkhan@smiu.edu.pk 

Mohammad Hamdi 

College of Computer Science and Information Systems 

Najran University 
Najran, Saudi Arabia 

mahamdi@nu.edu.sa 

Imtiaz Hussain 

Department of AI and Mathematical Sciences 

SMI University 

Karachi, Pakistan 
imtiaz@smiu.edu.pk 

Mesfer Alrizq 

College of Computer Science and Information Systems 

Najran University 

Najran, Saudi Arabia 
msalrizq@nu.edu.sa 

 
Abstract-Accurate facial expression recognition requires a proper
combination of adequate feature extraction and classification. If
inadequate features are used, even the best classifier may fail to
achieve accurate recognition. In this paper, a new fusion technique is
used to recognize human facial expressions. A combination of Discrete
Wavelet Transform (DWT), Local Binary Pattern (LBP), and Histogram of
Oriented Gradients (HOG) feature extraction techniques was used to
investigate six human emotions. K-Nearest Neighbors (KNN), Decision
Tree (DT), Multi-Layer Perceptron (MLP), and Random Forest (RF) were
chosen for classification. These algorithms were implemented and
tested on the Static Facial Expressions in the Wild (SFEW) dataset,
which consists of facial expressions captured in conditions close to
the real world. The proposed algorithm exhibited 87% accuracy, which
is higher than the accuracy of the individual algorithms.

Keywords-ANN; FER; DWT; LBP; HOG; K-Nearest Neighbors 

I. INTRODUCTION  

Facial expressions are a means of expressing sentiment and a form of
non-verbal communication. Various systems deal with the recognition of
human attitudes and points of view. Facial Expression Recognition
(FER) is one of the most discussed scientific areas nowadays, and it
is also highly significant in Human-Computer Interaction (HCI) [1, 2].
FER is used to describe the mental state of human beings [3]. However,
the appearance of photos can be altered by disturbances in the pixels,
and illumination problems may occur in both indoor and outdoor photos.
This study addresses those issues and proposes a strategy that
combines different available features to overcome them [4].

II. LITERATURE SURVEY AND THEORETICAL FRAMEWORK 

Identifying human facial regions is very difficult. To handle this
efficiently, a technique should be implemented to recognize facial
indicators. One vital aspect is the changing viewing angle in moving
video [5]. The lower and upper face method extends the spatial pyramid
histogram of edges, which gives 3-dimensional facial recognition. In
this method, features are essentially examined for happy and sad
indicators [6]. LBP and Improved Local Binary Pattern (ILBP) have been
applied alongside the Coordinated Clusters Representation (CCR) [7].
Face recognition using an optimized algorithmic chain for both 2D and
3D images gives an accuracy of about 96% with an SVM classifier using
LBP and PCA. Further testing on 2D and 3D images using LBP and PCA
with a Feed Forward Back Propagation Neural Network (FFBPNN) was less
effective and efficient than the SVM classifier [8]. Locality
Preserving Projections (LPPs) have been used for manifold systems
originating from Local Binary Pattern (LBP) subjects [9]. In [10], a
pyramid transform is first used to divide the test images into
different regions, so that the target images are separated. The ELBP
is then applied to the small images to compute the ELBP pyramid, and
locally determined properties of the small images are used to form the
AWM, which assesses the importance of the information they contain.

Corresponding author: Syed Muhammad Hassan


Finally, the AWELBPP feature is assembled from the combination of the
small ELBP pyramid and the AWM [10]. Support Vector Machine (SVM) has
been applied to handle noisy images for feature extraction [11, 12].

Background subtraction [13] showed good results when applied to
real-time feeds. In that work, a Gaussian mixture model was used to
describe the image pixels, and the model parameters were estimated
with the Expectation-Maximization (EM) algorithm. Shadows were also
detected effectively. Background subtraction was also very effective
and met the requirements of drowning detection. Authors in [14]
reviewed earlier approaches and tried to address the issue of
recognizing actions and behaviors and the problem of dealing with
moderately crowded situations using a good modeling technique. The
conventional techniques were mixed and a Gaussian distribution was
used to model the temporal changes of the background pixels in [15].
This has been proven to be insufficient for extremely non-stationary
environments. However, the thresholding method with hysteresis dealt
with the issue of choosing thresholds in the background subtraction
context. Stationary cameras have also been used to find drowning
persons in swimming pools [6, 17, 18]. In contrast to previous works
based on geometrical and 3D Mahalanobis distance features, the method
presented in [18] captured the temporal and spatial correlation of the
swimmers along with color information using the Markov Random Field
(MRF) context to give better performance. Promising outcomes for
drowning detection were achieved using an exclusive functional link
net which optimally fused the descriptors of the extracted swimmers.
An improved descriptor fusion technique associated with the
hierarchical technique was proposed in [19]. The current drowning
detection techniques can be broadly classified into vision-based
schemes and systems based on wearable sensors [20-22]. On the other
hand, the combination of aerial and underwater cameras to monitor the
postures of FER was utilized in [23], whereas the CNN model achieved
99.78% accuracy [24]. An even higher accuracy level was achieved in
learning similarities and dissimilarities among the faces of a dataset
using FDREnet in [25].

III. SYSTEM METHODOLOGY 

Cameras are usually present in most areas for security purposes. The
already installed cameras can be utilized for monitoring and
expression detection, so a few critical frames are extracted from the
video. A facial expression video is divided into frames to be
processed. The image frames extracted from the video are used for
feature extraction, and then classification is carried out.

A. Feature Extraction 

The input dataset is too large to be handled and processed directly.
It is assumed to be redundant (much data, but not much information),
so the input dataset is converted into a reduced representation set of
features. This set is called the feature vector (Fv), and the process
is known as feature extraction. Extracting the discriminative features
from the images reduces the dimension of the Fv by removing the
redundancy in the images and compressing the relevant data into a much
smaller Fv.

 
Fig. 1.  The flowchart of the proposed method. 

B. Feature Extraction via Discrete Wavelet Transform 

Discrete Wavelet Transform (DWT) is used to extract features from an
image at various levels of low-pass (g) and high-pass (h) filtering.
The DWT of a signal x is calculated by passing it through a series of
filters, first through the low-pass filter (g) and then through the
high-pass filter (h). Filtering with g is the convolution:

$$y[n] = (x * g)[n] = \sum_{k=-\infty}^{\infty} x[k]\, g[n-k] \tag{1}$$

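The outputs of the low-pass filter (g) and the high-pass filter (h),
each covering half of the frequency band and downsampled by 2, are
described by:

$$y_{\text{low}}[n] = \sum_{k=-\infty}^{\infty} x[k]\, g[2n-k] \tag{2}$$

$$y_{\text{high}}[n] = \sum_{k=-\infty}^{\infty} x[k]\, h[2n-k] \tag{3}$$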
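As a minimal sketch of the filtering and downsampling in (2) and (3), the following Python code applies one decomposition level to a short 1-D signal. The Haar filter pair, the function name `filter_downsample`, and the one-sample `offset` that aligns each output with the pair (x[2n], x[2n+1]) are assumptions made for illustration and are not taken from the paper.

```python
import math

S = 1 / math.sqrt(2)
G = [S, S]    # assumed Haar low-pass filter g
H = [-S, S]   # assumed Haar high-pass filter h


def filter_downsample(x, f, offset=1):
    """Compute y[n] = sum_k x[k] * f[2n + offset - k] for n = 0 .. len(x)//2 - 1.

    With offset=0 this is exactly the form of (2) and (3); offset=1 is an
    indexing convention so that each output uses x[2n] and x[2n+1].
    """
    out = []
    for n in range(len(x) // 2):
        m = 2 * n + offset
        acc = 0.0
        for k, xk in enumerate(x):
            j = m - k
            if 0 <= j < len(f):   # the filter is zero outside its support
                acc += xk * f[j]
        out.append(acc)
    return out


x = [4, 6, 10, 12]                  # toy signal, not data from the paper
y_low = filter_downsample(x, G)     # approximation coefficients
y_high = filter_downsample(x, H)    # detail coefficients
print(y_low, y_high)
```

Because the Haar pair is orthonormal, the energy of `y_low` plus the energy of `y_high` equals the energy of `x`, which is a convenient sanity check for any implementation.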

The wavelet coefficients are the successive continuation of the
approximation and the detail coefficients. The basic process of
feature extraction consists of:


• Decomposing the image with an N-level DWT, using filtering and
decimation, to obtain the approximation and detail coefficients.

• Extracting features from the DWT coefficients.

• Using the features extracted from the DWT coefficients of the images
as the input to the classifiers, because of their effective
representation.

The algorithmic steps for feature extraction from the dataset are:

• Step 1: The image data are decomposed into 4 sub-bands (one
approximation and three detail sub-bands) by DWT.

• Step 2: The approximation coefficients are further decomposed by DWT
to obtain localized data from the detail sub-bands (horizontal,
vertical, and diagonal).

• Step 3: For processing and analysis, the detail coefficients of all
4 levels are calculated.

• Step 4: Finally, the features are analyzed and tabulated to be used
as the input of the classifier.
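The steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the implementation used in the study: the Haar wavelet, the choice of sub-band energy as the feature, and the names `haar_step`, `dwt2`, and `dwt_features` are all hypothetical.

```python
def haar_step(v):
    """One 1-D Haar level: scaled pairwise sums and differences.
    A trailing sample of an odd-length input is dropped."""
    s = 2 ** 0.5
    lo = [(v[i] + v[i + 1]) / s for i in range(0, len(v) - 1, 2)]
    hi = [(v[i] - v[i + 1]) / s for i in range(0, len(v) - 1, 2)]
    return lo, hi


def transpose(m):
    return [list(col) for col in zip(*m)]


def dwt2(img):
    """Single-level 2-D Haar DWT returning four sub-bands (Step 1)."""
    lo_rows, hi_rows = zip(*(haar_step(row) for row in img))

    def filter_columns(m):
        lo_cols, hi_cols = zip(*(haar_step(col) for col in transpose(m)))
        return transpose(lo_cols), transpose(hi_cols)

    approx, horizontal = filter_columns(lo_rows)   # low-pass rows, then low/high-pass columns
    vertical, diagonal = filter_columns(hi_rows)   # high-pass rows, then low/high-pass columns
    return approx, horizontal, vertical, diagonal


def energy(band):
    return sum(v * v for row in band for v in row)


def dwt_features(img, levels=4):
    """Decompose the approximation repeatedly (Step 2) and collect the
    energy of every detail sub-band (Step 3) as a feature list (Step 4)."""
    feats = []
    approx = img
    for _ in range(levels):
        if len(approx) < 2 or len(approx[0]) < 2:
            break                                  # nothing left to decompose
        approx, horizontal, vertical, diagonal = dwt2(approx)
        feats += [energy(horizontal), energy(vertical), energy(diagonal)]
    feats.append(energy(approx))                   # energy of the final approximation
    return feats


# Toy 16x16 "image" (synthetic values, not from SFEW).
image = [[(3 * r + 5 * c) % 11 for c in range(16)] for r in range(16)]
print(dwt_features(image, levels=4))
```

For a 16×16 input and 4 levels the list holds 13 values, three detail energies per level plus the final approximation energy. Other statistics of the coefficients (mean, standard deviation, entropy) could be substituted for energy without changing the structure.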

C. Feature Extraction via Histogram of Gradients 

The Histogram of Oriented Gradients (HOG) is a type of feature
descriptor. The purpose of a feature descriptor is to generalize an
object in such a way that the same object (in this case a person)
produces, as closely as possible, the same feature descriptor when
viewed under different conditions. This makes the classification task
simpler. Static Facial Expressions in the Wild (SFEW) was created by
selecting frames from AFEW. Regarding the block normalization for HOG,
let v be the non-normalized vector containing all histograms in a
given block, $\|v\|_k$ be its k-norm for k = 1, 2, and e be some small
constant. The normalization factor is defined as:

$$f = \frac{v}{\sqrt{\|v\|_2^2 + e^2}} \tag{4}$$
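A short sketch of the block normalization in (4) is given below. The function name `l2_block_normalize` and the default value of the constant e are assumptions chosen for illustration.

```python
import math


def l2_block_normalize(v, e=1e-5):
    """Return f = v / sqrt(||v||_2^2 + e^2) for a block histogram vector v.

    The small constant e (assumed value) keeps the division defined
    when the block contains no gradient energy at all.
    """
    denom = math.sqrt(sum(x * x for x in v) + e * e)
    return [x / denom for x in v]


block = [3.0, 4.0, 0.0, 0.0]          # toy concatenated cell histograms
print(l2_block_normalize(block))
print(l2_block_normalize([0.0] * 4))  # an all-zero block stays all-zero
```

After normalization the block vector has a Euclidean length very close to 1, which reduces the descriptor's sensitivity to local changes in illumination and contrast.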

The dataset covers unconstrained facial expressions, numerous head
poses, a wide age range, occlusions, and close to real-world
illumination. Frames were extracted from AFEW sequences and labeled
based on the label of the sequence. SFEW includes 700 images that have
been classified into six basic expressions (anger, disgust, fear,
happiness, sadness, and surprise) and were labeled by independent
labelers.

D. Feature Extraction via Local Binary Pattern (LBP) 

The LBP method is applied to facial images in order to extract
features that can be used to measure similarity. First, the images are
divided into several blocks. Then, the LBP histogram is calculated for
each block. The LBP code of a pixel $(x_c, y_c)$ is given by:

$$LBP_{P,R}(x_c, y_c) = \sum_{p=0}^{P-1} s(g_p - g_c)\, 2^p \tag{5}$$

where $g_c$ is the gray value of the center pixel, $g_p$ are the gray
values of its P neighbors, and $s(x) = 1$ if $x \geq 0$ and
$s(x) = 0$ otherwise. The notation $LBP_{P,R}^{u2}$ is used for the
LBP operator, where (P, R) represents a neighborhood of P sampling
points on a circle of radius R, and u2 stands for using only uniform
patterns and labeling all remaining patterns with a single label.
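To make the LBP code concrete, the sketch below evaluates it for the basic 3×3 neighborhood (8 neighbors at radius 1) and checks whether a code is a uniform pattern, that is, whether its circular bit string contains at most two 0/1 transitions. The neighbor ordering (clockwise from the top-left pixel) and the function names are assumptions; other orderings give different but equally valid codes.

```python
# Offsets of the 8 neighbors, clockwise from top-left (assumed ordering).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, r, c):
    """LBP code of the pixel at row r, column c (must not lie on the border)."""
    g_c = img[r][c]
    code = 0
    for p, (dr, dc) in enumerate(OFFSETS):
        g_p = img[r + dr][c + dc]
        if g_p - g_c >= 0:        # s(g_p - g_c) = 1
            code += 2 ** p
    return code


def is_uniform(code, bits=8):
    """True if the circular bit pattern has at most two 0/1 transitions."""
    transitions = 0
    for p in range(bits):
        a = (code >> p) & 1
        b = (code >> ((p + 1) % bits)) & 1
        transitions += a != b
    return transitions <= 2


patch = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]               # toy gray values
code = lbp_code(patch, 1, 1)
print(code, is_uniform(code))     # prints: 241 True
```

In the u2 variant, every uniform code keeps its own histogram bin while all non-uniform codes share a single bin, which shortens the histogram considerably.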

The histogram of the labeled image $f_l(x, y)$ is defined as:

$$H_i = \sum_{x,y} I\{f_l(x, y) = i\}, \quad i = 0, \ldots, n-1 \tag{6}$$

where n is the number of different labels produced by the LBP
operator, and I{A} is 1 if A is true and 0 if it is false. Further,
the image patches whose histograms are to be compared must be
normalized in order to get a coherent description:

$$N_i = \frac{H_i}{\sum_{j=0}^{n-1} H_j} \tag{7}$$

Then, the block LBP histograms were concatenated into a single vector.
The histograms were then compared using space similarity [16]. Each
bin of a histogram holds the number of occurrences of its pattern
within the region. Lastly, the feature vector is constructed from the
useful data by concatenating the local histograms into one large
histogram.
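The histogram of (6), the normalization of (7), and the concatenation of the block histograms can be sketched as follows. The block size, the toy label image, and the function names are hypothetical and serve only to illustrate the procedure.

```python
def histogram(labels, n):
    """H_i = number of pixels whose label equals i, for i = 0 .. n-1, as in (6)."""
    h = [0] * n
    for row in labels:
        for v in row:
            h[v] += 1
    return h


def normalize(h):
    """N_i = H_i / sum_j H_j, as in (7); an empty histogram maps to zeros."""
    total = sum(h)
    return [x / total for x in h] if total else [0.0] * len(h)


def block_feature_vector(labels, block, n):
    """Split the label image into block x block tiles and concatenate
    the normalized histogram of every tile into one vector."""
    feats = []
    for r0 in range(0, len(labels) - block + 1, block):
        for c0 in range(0, len(labels[0]) - block + 1, block):
            tile = [row[c0:c0 + block] for row in labels[r0:r0 + block]]
            feats.extend(normalize(histogram(tile, n)))
    return feats


# Toy 4x4 label image with n = 4 possible labels (not real LBP output).
labels = [[0, 1, 2, 2],
          [1, 1, 2, 3],
          [0, 0, 3, 3],
          [0, 0, 3, 3]]
print(block_feature_vector(labels, block=2, n=4))
```

Keeping one histogram per block, instead of a single histogram for the whole face, preserves coarse spatial information about where each pattern occurs.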

IV. RESULTS AND DISCUSSION 

In this study, the SFEW dataset was used for testing, which is close
to a real-world environment. It comprises 300 color images in 6
emotion categories of 50 images each, with dimensions of 143×181
pixels. The classes are Surprise, Fear, Anger, Sadness, Disgust, and
Happiness, represented by SU, F, A, S, D, and H respectively. The
results were evaluated with assessment metrics including the confusion
matrix, precision, recall, and F1 score. To compute the overall
precision, we used micro-averages to combine the results across the 6
categories. We divided our dataset into 80% training and 20% testing
subsets. The sets were fed to the learning system, which utilized the
K-Nearest Neighbor (KNN), Decision Tree (DT), Multilayer Perceptron
(MLP), and Random Forest (RF) algorithms. Our experimental model was
divided into four parts. The mentioned machine learning algorithms
were applied directly to the first part of the dataset. Table I shows
the accuracies on the original dataset.
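The evaluation metrics mentioned above can be computed from a list of true and predicted labels as in the following sketch. The label sequences are invented for illustration and are not results from this study; the function names are likewise assumptions.

```python
CLASSES = ["SU", "F", "A", "S", "D", "H"]


def confusion_matrix(y_true, y_pred, classes):
    """m[i][j] counts samples of true class i predicted as class j."""
    index = {c: i for i, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        m[index[t]][index[p]] += 1
    return m


def per_class_scores(m, k):
    """Precision, recall, and F1 score of class k."""
    tp = m[k][k]
    fp = sum(m[i][k] for i in range(len(m))) - tp   # predicted k, truly another class
    fn = sum(m[k]) - tp                             # truly k, predicted another class
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def micro_precision(m):
    """Pool true and false positives over all classes before dividing."""
    tp = sum(m[i][i] for i in range(len(m)))
    fp = sum(m[i][j] for i in range(len(m)) for j in range(len(m)) if i != j)
    return tp / (tp + fp) if tp + fp else 0.0


# Made-up labels for eight test images.
y_true = ["SU", "SU", "F", "A", "S", "D", "H", "H"]
y_pred = ["SU", "F", "F", "A", "S", "H", "H", "H"]
cm = confusion_matrix(y_true, y_pred, CLASSES)
print(micro_precision(cm))        # prints: 0.75
print(per_class_scores(cm, CLASSES.index("H")))
```

In single-label multi-class classification, every sample receives exactly one prediction, so the micro-averaged precision coincides with the overall accuracy.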

TABLE I.  ORIGINAL IMAGES WITHOUT FEATURE EXTRACTION

Algorithm   KNN    DT     MLP    RF
Accuracy    27%    14%    22%    32%

TABLE II.  LBP FEATURE EXTRACTION

Algorithm   KNN    DT     MLP    RF
Accuracy    50%    96%    22%    95%


On the original images (Table I), the maximum accuracy was achieved by
RF and was only 32%. Then, the accuracies of all algorithms were
computed using DWT, LBP, and HOG for the other parts of the dataset
(Tables II-IV). Finally, the features were combined, achieving a
maximum accuracy of 87% with MLP and a minimum accuracy of 29% with
KNN (Table V), which are respectively shown in the confusion matrices
of Figures 2 and 3.

TABLE III.  DWT FEATURE EXTRACTION

Algorithm   KNN    DT      MLP    RF
Accuracy    12%    17.5%   36%    21%

TABLE IV.  HOG FEATURE EXTRACTION

Algorithm   KNN    DT     MLP    RF
Accuracy    12%    12%    37%    14%

TABLE V.  COMBINATION OF LBP, DWT, AND HOG

Algorithm   KNN    DT     MLP    RF
Accuracy    29%    80%    87%    79%
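The text does not specify how the three descriptors are combined. One common option, assumed here purely for illustration, is feature-level fusion: each descriptor is scaled to unit length so that none dominates because of its numeric range, and the results are concatenated into a single vector for the classifier. The function names and the toy vectors are hypothetical.

```python
import math


def unit_length(v, eps=1e-12):
    """Scale v to (approximately) unit Euclidean length; eps avoids division by zero."""
    norm = math.sqrt(sum(x * x for x in v)) + eps
    return [x / norm for x in v]


def fuse(*descriptors):
    """Concatenate the individually normalized descriptors into one feature vector."""
    fused = []
    for d in descriptors:
        fused.extend(unit_length(d))
    return fused


# Toy stand-ins for the DWT, LBP, and HOG descriptors of one image.
dwt_vec = [3.0, 4.0]
lbp_vec = [0.0, 5.0, 0.0]
hog_vec = [2.0]
print(fuse(dwt_vec, lbp_vec, hog_vec))
```

The fused vector has a length equal to the sum of the individual descriptor lengths, so a dimensionality reduction step may be worthwhile when the descriptors are large.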

 
Fig. 2.  MLP confusion matrix. 

 
Fig. 3.  KNN confusion matrix. 

Further, we also computed edge maps of the face generated by DWT, LBP,
and HOG. The original image is reconstructed using the Haar DWT.

 
Fig. 4.  Retained energy is 99.40%. 

 
Fig. 5.  HOG edges. 

 
Fig. 6.  LBP visualized surprised face. 

V. CONCLUSION AND FUTURE WORK 

The proposed model combines DWT, HOG, and LBP features in a feature
extraction technique with machine learning algorithms, enhancing the
accuracy of facial expression recognition. Six facial expressions from
the SFEW database were used for training and validation. The results
indicated that the accuracy of the combined methods is 87%, which is
higher than the individual accuracies of the combined algorithms.
However, the proposed combination has an issue of generalization,
which may be addressed in our future work.


FER is one of the most well-known areas in image processing and has
been given increasing attention recently. The proposed technique gives
a good overview of facial recognition methods. Feature extraction is
vital, as it reduces a very large amount of data to only the required
set. Thus, it reduces the processing time of the machine and the
results are more accurate. In future work, the accuracy may be
increased by using more learning algorithms. A similar approach using
a Convolutional Neural Network can be combined with the existing
support vector classifier.

ACKNOWLEDGEMENT 

The authors would like to thank Dr. Abhinav Dhall, 
Australian National University for the provision of the SFEW 
dataset. 

REFERENCES 

[1] W.-L. Chao, J.-J. Ding, and J.-Z. Liu, "Facial expression recognition based on improved local binary pattern and class-regularized locality preserving projection," Signal Processing, vol. 117, pp. 1–10, Dec. 2015, https://doi.org/10.1016/j.sigpro.2015.04.007.

[2] F. Long and M. S. Bartlett, "Video-based facial expression recognition using learned spatiotemporal pyramid sparse coding features," Neurocomputing, vol. 173, pp. 2049–2054, Jan. 2016, https://doi.org/10.1016/j.neucom.2015.09.049.

[3] H. Fang et al., "Facial expression recognition in dynamic sequences: An integrated approach," Pattern Recognition, vol. 47, no. 3, pp. 1271–1281, Mar. 2014, https://doi.org/10.1016/j.patcog.2013.09.023.

[4] J. Hussain Shah, M. Sharif, M. Raza, M. Murtaza, and S. Ur-Rehman, "Robust Face Recognition Technique under Varying Illumination," Journal of applied research and technology, vol. 13, no. 1, pp. 97–105, 2015.

[5] S. Arya, N. Pratap, and K. Bhatia, "Future of Face Recognition: A Review," Procedia Computer Science, vol. 58, pp. 578–585, Jan. 2015, https://doi.org/10.1016/j.procs.2015.08.076.

[6] M.-Y. Chen and C.-C. Chen, "The contribution of the upper and lower face in happy and sad facial expression classification," Vision Research, vol. 50, no. 18, pp. 1814–1823, Aug. 2010, https://doi.org/10.1016/j.visres.2010.06.002.

[7] A. Fernandez, O. Ghita, E. Gonzalez, F. Bianconi, and P. F. Whelan, "Evaluation of robustness against rotation of LBP, CCR and ILBP features in granite texture classification," Machine Vision and Applications, vol. 22, no. 6, pp. 913–926, Nov. 2011, https://doi.org/10.1007/s00138-010-0253-4.

[8] S. Shankar and V. R. Udupi, "Recognition of Faces – An Optimized Algorithmic Chain," Procedia Computer Science, vol. 89, pp. 597–606, Jan. 2016, https://doi.org/10.1016/j.procs.2016.06.020.

[9] R. K. Nagar, R. Manazhy, and P. Sankaran, "Sparse Manifold Discriminant Embedding for Face Recognition," Procedia Computer Science, vol. 89, pp. 743–748, Jan. 2016, https://doi.org/10.1016/j.procs.2016.06.050.

[10] T. Gao, X. L. Feng, H. Lu, and J. H. Zhai, "A novel face feature descriptor using adaptively weighted extended LBP pyramid," Optik, vol. 124, no. 23, pp. 6286–6291, Dec. 2013, https://doi.org/10.1016/j.ijleo.2013.05.007.

[11] K. Yu, Z. Wang, L. Zhuo, J. Wang, Z. Chi, and D. Feng, "Learning realistic facial expressions from web images," Pattern Recognition, vol. 46, no. 8, pp. 2144–2155, Aug. 2013, https://doi.org/10.1016/j.patcog.2013.01.032.

[12] R. A. Khan, A. Meyer, H. Konik, and S. Bouakaz, "Framework for reliable, real-time facial expression recognition for low resolution images," Pattern Recognition Letters, vol. 34, no. 10, pp. 1159–1168, Jul. 2013, https://doi.org/10.1016/j.patrec.2013.03.022.

[13] S. Ali Khan, A. Hussain, and M. Usman, "Facial expression recognition on real world face images using intelligent techniques: A survey," Optik, vol. 127, no. 15, pp. 6195–6203, Aug. 2016, https://doi.org/10.1016/j.ijleo.2016.04.015.

[14] S. Ali Khan, A. Hussain, A. Basit, and S. Akram, "Kruskal-Wallis-Based Computationally Efficient Feature Selection for Face Recognition," The Scientific World Journal, vol. 2014, May 2014, Art. no. e672630, https://doi.org/10.1155/2014/672630.

[15] C. Shan, S. Gong, and P. W. McOwan, "Facial expression recognition based on Local Binary Patterns: A comprehensive study," Image and Vision Computing, vol. 27, no. 6, pp. 803–816, May 2009, https://doi.org/10.1016/j.imavis.2008.08.005.

[16] W.-H. Chen, P.-C. Cho, P.-L. Fan, and Y.-W. Yang, "A framework for vision-based swimmer tracking," in International Conference on Uncertainty Reasoning and Knowledge Engineering, Bali, Indonesia, Aug. 2011, vol. 1, pp. 44–47, https://doi.org/10.1109/URKE.2011.6007835.

[17] D. Zecha, T. Greif, and R. Lienhart, "Swimmer detection and pose estimation for continuous stroke-rate determination," in Multimedia on Mobile Devices 2012; and Multimedia Content Access: Algorithms and Systems VI, California, United States, Jan. 2012, vol. 8304, Art. no. 830410, https://doi.org/10.1117/12.908309.

[18] S. S. Intille, J. W. Davis, and A. F. Bobick, "Real-time closed-world tracking," in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Juan, USA, Jun. 1997, pp. 697–703, https://doi.org/10.1109/CVPR.1997.609402.

[19] K.-A. Toh, W.-Y. Yau, and X. Jiang, "A reduced multivariate polynomial model for multimodal biometrics and classifiers fusion," IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, no. 2, pp. 224–233, Feb. 2004, https://doi.org/10.1109/TCSVT.2003.821974.

[20] M. Kharrat, Y. Wakuda, N. Koshizuka, and K. Sakamura, "Automatic waist airbag drowning prevention system based on underwater time-lapse and motion information measured by smartphone’s pressure sensor and accelerometer," in IEEE International Conference on Consumer Electronics, Las Vegas, NE, USA, Jan. 2013, pp. 270–273, https://doi.org/10.1109/ICCE.2013.6486891.

[21] M. Kharrat, Y. Wakuda, S. Kobayashi, N. Koshizuka, and K. Sakamura, "Near drowning detection system based on swimmer’s physiological information analysis," presented at the World Conference on Drowning Prevention (WCDP), May 2011.

[22] E. McAdams et al., "Wearable sensor systems: The challenges," in Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Boston, MA, USA, Sep. 2011, pp. 3648–3651, https://doi.org/10.1109/IEMBS.2011.6090614.

[23] A. Ben-Hur, D. Horn, H. T. Siegelmann, and V. Vapnik, "Support Vector Clustering," Journal of Machine Learning Research, vol. 2, pp. 125–137, 2001.

[24] Y. Said, M. Barr, and H. E. Ahmed, "Design of a Face Recognition System based on Convolutional Neural Network (CNN)," Engineering, Technology & Applied Science Research, vol. 10, no. 3, pp. 5608–5612, Jun. 2020, https://doi.org/10.48084/etasr.3490.

[25] D. Virmani, P. Girdhar, P. Jain, and P. Bamdev, "FDREnet: Face Detection and Recognition Pipeline," Engineering, Technology & Applied Science Research, vol. 9, no. 2, pp. 3933–3938, Apr. 2019, https://doi.org/10.48084/etasr.2492.