


Advances in Technology Innovation, vol. 7, no. 4, 2022, pp. 279-294 

 
Learning Representations for Face Recognition:  

A Review from Holistic to Deep Learning  

Fabian Barreto
1*

, Jignesh Sarvaiya
2
, Suprava Patnaik

3
 

1
Department of Electronics and Telecommunication, Xavier Institute of Engineering, Mumbai, India 

2
Department of Electronics, Sardar Vallabhbhai National Institute of Technology, Surat, India 

3
School of Electronics, Kalinga Institute of Industrial Technology, Bhubaneswar, India 

Received 18 August 2021; received in revised form 07 January 2022; accepted 08 January 2022 

DOI: https://doi.org/10.46604/aiti.2022.8308 

Abstract  

For decades, researchers have investigated how to recognize facial images. This study reviews the development 

of different face recognition (FR) methods, namely, holistic learning, handcrafted local feature learning, shallow 

learning, and deep learning (DL). With the development of methods, the accuracy of recognizing faces in the labeled 

faces in the wild (LFW) database has been increased. The accuracy of holistic learning is 60%, that of handcrafted 

local feature learning increases to 70%, and that of shallow learning is 86%. Finally, DL achieves human-level 

performance (97% accuracy). This enhanced accuracy is caused by large datasets and graphics processing units 

(GPUs) with massively parallel processing capabilities. Furthermore, FR challenges and current research studies are 

discussed to understand future research directions. The results of this study show that presently the database of 

labeled faces in the wild has reached 99.85% accuracy.  

 
Keywords: learning representations, deep learning, autoencoders, variational autoencoders 

 
1. Introduction  

In the modern world, automatic face recognition (AFR) is embedded into smart e-commerce applets for better 

personalization and marketing of commodities, such as hair styling and digital makeup. Consumer-based photography has 

become a new trend in selecting a range of products that suit consumers’ needs, with social media platforms providing facial 

recognition services to attract diverse users. Conventional facial recognition (FR) requirements are limited to basic security 

and access control applications, and are implemented in more advanced ways. Examples include accessing historical data and 

using cloud-based database identification and closed-circuit television (CCTV) video-supported tracking, leading to better 

enforcement of the law. Facial identification has become essential for forensics, surveillance, border control, lie detection, and 

access ID verification.  

FR, in its various dimensions, is currently a research area in computer vision, and is the process of detecting and locating 

faces from a background, normalizing face images, and performing face verification (FV) or face identification (FI). There are 

two separate tasks for face matching while conducting FR, namely: FV and FI. In FV, one determines whether a given test 

image is from the same person being verified, while the FI aims to recognize the facial images of persons already enrolled in 

the database [1]. To verify genuineness, the output of FR is either “yes” or “no,” which may be a result of the class number 

corresponding to the input image. In the FV, the input image is assumed to be a sample from a known possible class of inputs. 

                                                           
*
 
Corresponding author. E-mail address: frfabiansj@xavier.ac.in 

 
Tel.: +919833916407 

 
Advances in Technology Innovation, vol. 7, no. 4, 2022, pp. 279-294 

 
Regarding face detection (FD), in 2001, Viola and Jones [2] used Haar-like features to detect human faces. A 24 × 24 pixel can 

have over 160,000 Haar-like features. The framework used the concept of integral images to perform intensive computation 

and the adaptive boost (AdaBoost) algorithm to select the best features from different subsets.  

Wang et al. [3] categorized FD and recognition development into four broad representation learning types: holistic, 

handcrafted, shallow, and deep learning. Traditionally, FR techniques have been divided into two major categories: geometric 

and photometric techniques. Here, geometric techniques find distinct features and spatial positioning to form a template that is 

used to compare and eliminate variances in face images. Photometric approaches are distilled out and use hidden statistical 

properties that account for the entire input of facial images. Popular photometric approaches include principal component 

analysis (PCA) using the eigenface algorithm and linear discrimination analysis (LDA) using the Fisherface algorithm. The 

holistic approach uses low-dimensional representations in the form of a manifold or a linear subspace. However, this approach 

is limited by variations such as face appearances that introduce different statistical distributions, which are difficult to manage.  

The early twentieth century saw a transition to handcrafted local feature-based methods. Inherent face changes are 

managed through local descriptors, such as local binary patterns (LBPs), Gabor filters, and histograms of oriented gradients 

(HOGs). These local features help remove redundant and meaningless information from raw representation; thus, they provide 

greater robustness than existing methods and are greatly invariant to transformation. The limitations of these approaches are 

that they suffer from a lack of compactness, distinctiveness over a large sample space, and acceptance for real-time 

applications, as well as being slow and susceptible to poor generalization. 

Shallow representation learning, with a one- or two-layer representation, improved the distinctness of the codebook. 

Noticeable shallow approaches included the learning-based (LE) approach, discriminant face descriptor (DFD), feature vector, 

and PCANet. However, these approaches were not robust to the complex non-linear nature of the face. 

Deep learning (DL) is a revolutionary approach that has changed the facial recognition landscape. In 2012, AlexNet 

achieved state-of-the-art (SOTA) recognition accuracy and propelled research toward DL for computer vision. Researchers 

have used a convolutional neural network (CNN) that exhibited strong invariance to face pose, lighting, expression, and other 

variations to achieve high accuracy. Thus, this research addresses recognition accuracy and investigates the complexity of 

learning a large number of features, dependency on datasets, protocols addressing application scenarios, and model 

interpretability. This research also addresses variations encountered owing to cross-posed, aging, and other adversarial 

conditions. 

Since the 1990s, remarkable advances have been made in FD and recognition. This study aims to review the development 

of learning representations for FR in the past three decades and has resulted in an accuracy increase of 39.85% for labeled faces 

in the wild (LFW) database from the earlier methods used three decades ago. The remainder of this study is organized as 

follows. Section 2 describes the initial holistic representation of the learning stage for FR, and section 3 describes the transition 

to a handcrafted stage. Section 4 presents the shallow learning phase, and section 5 deals with the DL phase and some 

challenges and current research studies. Finally, section 6 provides the conclusions of this study.    

2. Review of Holistic Learning 

The earliest holistic stage begins by using eigenfaces, motivated by Sirovich and Kirby [4], to efficiently represent face 

images using PCA. They then transition to Fisherface algorithms and LDA and later to independent component analysis (ICA), 

leading to sparse representation-based classification (SRC), a particular case of collaborative representation-based 

classification (CRC). Later, researchers used distance metric learning with improved class separability, meaning that the 

holistic stage can assume certain distributions (linear, manifold, and sparse) from which it arrives at a low-dimensional 

representation. However, these assumptions do not hold firm ground on the variations in facial features. 

280 


Advances in Technology Innovation, vol. 7, no. 4, 2022, pp. 279-294 

 
2.1.   Principal component analysis (PCA) 

Ballantyne et al. [5] mentioned the pioneering work of Woody Bledsoe and his AFR team. They manually classified face 

images with landmarks (e.g., eye centers and mouth) and saved the metrics in a database. Goldstein et al. [6] enhanced the 

accuracy by using 21 specific subjective markers on the face. The work of Turk and Pentland [7] in 1991 gave a new direction 

to using eigenfaces (PCA) to develop the first AFR system. Varying the illumination and pose conditions is a challenging task 

for this method. It is essential to understand that a particular eigenfeature may not be related to recognition, but to the direction 

of illumination. Hence, an increase in eigenfeatures does not necessarily lead to better accuracy. PCA can only set apart the 

linear dependencies in the pixel pair of a facial image.  

PCA is a method for expressing data vectors in their principal components (PCs), where the largest variances in the data 

indicated the direction of the PCs (Fig. 1). PCs capture the most significant data information and correspond to the 

eigenvectors given by the largest eigenvalues of the autocorrelation matrix of the data vectors. PCA computes the most 

representational basis for looking at the dataset and generally works as follows. First, it calculates the covariance matrix of the 

given data points and calculates the eigenvectors and corresponding eigenvalues sorted in decreasing order. Then, the first k 

eigenvectors are chosen from the n eigenvectors (k < n), yielding the novel k dimensions. Thus, the original n higher 

dimensions were transformed into k fewer dimensions. 

 
Fig. 1 Original space (X1, X2) and PCA reduced space (PC1, PC2) 

2.2.   Linear discrimination analysis (LDA) 

LDA constructs a subspace that differentiates between different face images, while Fisher discriminant analysis classifies 

face images into groups based on their facial features. Zhao et al. [8] used LDA for FR because it encodes discriminatory 

information. They used PCA to project the face image to a subspace and used an LDA to obtain a linear classifier in the 

subspace. The pure LDA approach, however, does lead to an overfitting problem and does not perform well for samples from 

different classes and samples with diverse backgrounds. 

2.3.   Independent component analysis (ICA) 

ICA describes a subspace method that transforms data from high to low dimensions. It finds a linear transformation that 

leads to the minimization of the statistical dependence between its components. However, unlike PCA, it provides an improved 

probabilistic model, a greater response to high-order statistics, and better reconstruction in noisy environments [9]. A set of 

statistically independent basis images for a set of face images is found by separating the independent components of the facial 

images (Fig. 2). Here, let S be a set of statistically independent source images, which is unknown, with X as the source of the 

face images and A as an unknown combination matrix. WI is a matrix of learned filters which in turn produces outputs U that 

are statistically independent. ICA outputs in rows that are WIX = U. 

281 


Advances in Technology Innovation, vol. 7, no. 4, 2022, pp. 279-294 

 
Fig. 2 Image synthesis model 

2.4.   Hidden Markov model (HMM) 

In a hidden Markov model (HMM), patterns are characterized as parametric random processes. These parameters can be 

estimated precisely and logically. Samaria et al. [10] used the HMM model to represent the statistics of facial images. They 

converted a two-dimensional face image to a one-dimensional sequence. As shown in Fig. 3, the face is split into regions (e.g., 

the forehead, eyes, nose, mouth, and chin). After determining the hidden states (five in the given figure), the HMM is trained to 

learn the state transitional probability. After training on the output probability, the class was determined. Although HMM has a 

better detection rate, it also has a higher false-alarm rate. 

 
Fig. 3 Five-state HMM 

2.5.   Bayesian model 

Schneiderman et al. [11] derived a probabilistic model for FR using local regions, such as the eyes, nose, and mouth. Their 

statistical model captured the more unique patterns of the human face, such as the intensity patterns around the eye, to represent 

the local features more uniquely. They also modelled the joint probability of local features and positions, as human faces are 

easily recognized because of their proper spatial arrangement. They used the Bayesian decision rule, also known as maximum 

a posteriori (MAP), and calculated a larger probability for a given input image x, namely, P(face | x) or P(not face | x), 

indicating whether a face was selected. Yang et al. [12] presented two advantages of using a naive Bayes classifier; that is, it 

provided a better estimation of the subregion conditional density functions and provided an MAP to understand the joint 

statistics of a local feature and its position.  

2.6.   Locality preserving projection (LPP) 

He et al. [13] proposed an appearance-based Laplacian method for facial recognition by using locality preserving 

projections (LPPs) to map facial images into a subspace. Eigenfaces (PCA) preserve the global surface of the face image, 

whereas the Fisherface algorithm (LDA) preserves discriminating information. The advantage of LPP over PCA and LDA is 

282 


Advances in Technology Innovation, vol. 7, no. 4, 2022, pp. 279-294 

 
that it preserves local features and detects the essential face manifold surface, where the nearest-neighbor graph models this 

surface. The face images in the lower-dimensional subspaces are called Laplacian faces. Facial recognition was performed in 

three steps. Laplacian faces were calculated from the given training face image samples, and the test image is then projected 

onto the Laplacian face subspace. Finally, the nearest-neighbor classifier identifies a new face. As this method considers the 

face manifold, it considers varying illumination conditions. 

2.7.   Sparse representation-based classification (SRC) and collaborative representation-based classification (CRC) 

SRC and CRC belong to sparse representation-based classifiers. The test input image was a linear connection between the 

recorded images. The test image can be recognized as the combination coefficients for the target faces, which are larger than 

the others. In SRC/CRC, the test face images are coded over others with sparsity constraints, such as L1 minimization. 

SRC/CRC uses the reconstruction error to determine the face image. In the work of Wright et al. [14], the discriminative 

property of an SRC model for classification was used, while in the work of Zhang et al. [15] and Zhang et al. [16], it was shown 

that the good performance of SRC is primarily due to the collaborative representation of the test face image with training 

samples across different classes. 

2.8.   Distance metric learning 

In distance metric learning, one learns a distance metric for the input space of face images from a given set of 

similar/dissimilar points in the training face images. Yang et al. [17] categorized the algorithms for distance metric learning 

into supervised and unsupervised methods. Supervised training face images are placed into pairwise constraints: pairs of 

same-class data points in the equivalence constraints and those that belong to different classes in equivalence constraints. 

Supervised learning can be global or local, where global satisfies pairwise constraints simultaneously and local only meets 

local pairwise constraints. Supervised learning includes supervised global learning, local adaptive supervised learning, 

neighborhood component analysis, and relevant component analysis (RCA), while unsupervised learning includes linear-like 

PCA and multidimensional scaling. They also include nonlinear embedding methods such as isometric mapping, linear 

embedding, and Laplacian eigenmaps.  

Jin et al. [18] presented a regularized distance metric learning algorithm that is robust for high-dimensional data. Here, the 

generalization error of regularized distance metric learning is independent of dimensionality. The algorithm was tested with the 

baselines of the Euclidean distance metric, Mahalanobis distance metric, large margin nearest neighbor classifier, 

information-theoretic metric learning, and RCA and was comparable to SOTA approaches for distance learning. 

3. Review of Handcrafted Local Feature Learning 

To enhance the holistic method, researchers started using handcrafted local features. They used Gabor wavelets, elastic 

bunch graph matching (EBGM), local binary patterns (LBP), and high dimensional local binary patterns (HD-LBP). These 

methods did achieve robust performance. However, as the features increased, there was a problem of distinctiveness, and the 

large size created the problem of non-compactness. 

3.1.   Gabor wavelet (filter) 

Gabor introduced the Gabor wavelet (or Gabor filter) in 1946 as a band-pass filter and has an impulse response given by a 

Gaussian function, multiplied by a harmonic function. Its resolution is optimal in both the domains of space and frequency. 

Daugman [19] generalized the 1-D Gabor filters to two-dimensional Gabor filters. Liu et al. [20] described a facial recognition 

Gabor feature classifier where Gabor wavelets first transform the face images to obtain the augmented Gabor FV and then pass 

through an enhanced Fisher discrimination model. Their results showed that the classifier can discriminate Gabor features with 

283 


Advances in Technology Innovation, vol. 7, no. 4, 2022, pp. 279-294 

 
low dimensionality and increased discrimination. Barbu [21] proposed a 2-D Gabor filter for human FR. He used 2-D Gabor 

filter banks, which help extract different orientation and scale features from the input face image, resulting in 3-D face feature 

vectors. One disadvantage is that Gabor features have high dimensionality and result in redundancy [22]. A hybrid method uses 

Gabor filters and another technique such as PCA to reduce redundancy. Principal Gabor filters that help reduce redundancy are 

described in the work of Štruc et al. [23]. Here, they used orthonormal linear combinations and derived a Gabor face 

representation. However, the tradeoff is that the filters are not optimally localized in the space and frequency domains. 

3.2.   Local binary pattern (LBP) 

The human face can be viewed as consisting of micro-patterns and hence can use an LBP as a face descriptor [24-25]. 

LBP was first proposed for texture description [26], where it was observed that certain LBP are key properties of texture and 

sometimes represent over 90% of all 3 × 3 patterns present in the textures. After thresholding, a histogram that functions as a 

texture descriptor can be created (Fig. 4). These patterns have uniform circular structures with few spatial transitions and were 

used as templates. The LBP operator is only a 3 × 3 neighborhood; therefore, it is difficult to capture the features that are 

dominant for large-scale structures, with later models using neighborhoods of different sizes to correct this issue. LBP 

efficiently summarizes the local structures of facial images, where each pixel was compared with its neighboring pixels.  

An example is shown in Fig. 5. Here, each pixel is compared with its eight neighbors by subtracting the center pixel value. 

The encoding process is done in the following steps. Encode a 0 for negative; otherwise, encode a 1. Concatenate all binary 

values in a clockwise direction. Begin from the top-left neighbor and move clockwise. Convert the binary to a decimal value, 

the label (LBP codes) for the given pixel. LBP is a non-parametric method that converts the face image into an array of integer 

labels. Huang et al. [27] surveyed LBP and its variants that offer better performance and improved the robustness of the 

original LBP. Isnanto et al. [28] used LBP and Haar cascade classifier on low-resolution images for multi-object FR. 

 
Fig. 4 LBP histogram 

 
Fig. 5 LBP operator 

3.3.   Elastic bunch graph matching (EBGM) 

Bolme [29] described the EBGM FR algorithm. It recognizes new facial images by localizing landmark features and 

then finds the similarity measure. Facial landmark points were selected manually from a set of model face images with 

variations. Gabor jets are the names given to the Gabor wavelets extracted from the landmark point and the jets from the 

model form a face bunch graph. Each node contains a stack of N jets (N = model image). Here, the edge is the distance 

284 


Advances in Technology Innovation, vol. 7, no. 4, 2022, pp. 279-294 

 
between landmark points (Fig. 6). The limitation of the EBGM is that one needs to rely on the model’s manual ground truth 

for landmark selection at the initial recognition stage. Lahasan et al. [30] proposed a method to overcome this shortcoming 

by posing the EBGM as an optimization problem by using harmony search (HS) to find the optimal facial landmarks using 

the manual method. 

 
Fig. 6 EBGM process 

3.4.   Scale-invariant feature transform (SIFT) 

Scale-invariant feature transform SIFT was proposed by Lowe [32-33]. It creates descriptors that are scale-, rotation-, and 

translation-invariant and possesses high dimensionality. FR tasks use SIFT features [33-34] to reliably match images. This 

process includes extracting SIFT keypoints from the face image. How can one find the test image? By finding the matching 

features. The Euclidean distance was used as the measure; however, a challenge is the reliable extraction of consistent SIFT 

descriptors. As shown in Fig. 7, the SIFT algorithm has four stages: keypoint detection, keypoint localization, orientation 

assignment, and keypoint descriptor generation. Keypoint detection uses the difference of the Gaussian (DOG) function to 

detect feature points, and each keypoint is assigned one or more orientations during the orientation assignment stage. In the last 

stage, each keypoint is assigned to a vector descriptor. Given that the algorithm is computationally intensive, the actions are 

performed only at positions that go through the first test. Fig. 8 shows the SIFT features of a 64 × 64 image, its noisy version, 

and matching features. 

 
Fig. 7 Stages of the SIFT algorithm 
 

(a) SIFT keypoints of the original  

image (64 × 64) 

(b) With added noise (c) SIFT keypoint matching 

Fig. 8 Implementation of SIFT 

285 


Advances in Technology Innovation, vol. 7, no. 4, 2022, pp. 279-294 

 
3.5.   Histogram of oriented gradient (HOG) 

Dalal et al. [35] developed grids of HOG descriptors, which have the advantage of capturing the gradient (edge) structure, 

a characteristic of the local shape. The grids count the occurrence of edge orientations in the local neighborhood of the face 

image. Facial images were split into small and linked regions (cells), and a histogram of the edge orientations was computed 

for each cell. The histograms were normalized to account for the illumination and combined to form the HOG descriptor. The 

HOG is invariant to 2D rotation and scaling. Using locally normalized HOG features with an overlapping dense grid yielded 

better results. Déniz et al. [36] proposed a method for building a robust HOG descriptor by using a regular grid, combining 

HOG descriptors at different scales, and applying a reduction in linear dimensions. 

4. Review of Shallow Learning 

The shallow learning-based (LE) local descriptor phase uses local filters to learn distinctiveness and a codebook to 

achieve compactness. As this was a shallow representation, a one- or two-layer representation, it is not robust to the complex 

nonlinearity of face images. The method also improves one characteristic, such as pose, light, or expression, but does not 

address unconstrained changes in the face image in general. 

4.1.   Learning-based (LE) 

Cao et al. [37] proposed a new LE descriptor that was compact, discriminative, and easy to extract. They list the 

disadvantages of existing handcrafted methods, as it is challenging to obtain an optimal encoding and unevenly distributed. 

Their process consisted of extracting face landmarks that aligned nine different parts of the face separately, which were fed into 

the DOG filter to remove low- and high-frequency illumination variations. Each pixel has a low-level FV encoded by an LE 

encoder. PCA-reduced histograms were concatenated and then normalized to obtain the LE descriptor, and the similarity of the 

LE descriptors of the face pair was measured using the L2 distance norm. The nine component similarity scores were then fed 

into a pose-adaptive classifier, which resulted in FV. 

4.2.   Discriminant face descriptor (DFD) 

Lei et al. [38] described a technique for acquiring a DFD. Discriminant local features learn by minimizing the feature 

differences between the same face images and maximizing those between different face images. The discriminative capability 

is performed in three steps: learning discriminant image filters, determining the optimal neighborhood sampling, and 

constructing the dominant patterns. They also used coupled DFDs to view heterogeneous facial data. 

4.3.   Feature vector  

Sánchez et al. [39] described the feature vector method for image classification based on the principle of Gaussian mixture 

distribution. They proposed using the Fisher kernel framework and described their blocks by deviation from a Gaussian 

mixture distribution with diagonal covariance. Visual vocabulary is a gradient vector for the model parameters. Their method 

encoded the (probabilistic) count of occurrences and higher-order statistics. The authors listed the advantages of their method 

as having better results than efficient linear classifiers and compression with a very low loss of accuracy. 

4.4.   PCANet 

Chan et al. [40] proposed a baseline model for image classification called PCANet, a precursor to DL models. PCANet 

consists of cascaded PCA to learn from multistage filter banks, binary hashing, and blockwise histograms and has two 

variations: RandNet and LDANet. In RandNet, they replaced PCA filters with random filters of the same size at each layer, 

whereas in LDANet, the supervision of a classification problem was improved by using supervised training. LDA is used to 

286 


Advances in Technology Innovation, vol. 7, no. 4, 2022, pp. 279-294 

 
learn the filters. PCANet eliminated image variability and provided effective accuracy with well-preprocessed images in the 

datasets. However, PCANet may not sufficiently account for the variability of challenging face images. However, the PCANet 

is a valuable baseline for studying DL architectures. 

5. Review of Deep Learning 

The FR landscape saw a fundamental shift with the introduction of AlexNet, which uses DL. DeepFace [41], DeepID 

[42-43], FaceNet [44], ArcFace [45], and AdaptiveFace [46] have paved the way for an evolution of network architectures, 

algorithms, and datasets to answer the multi-faceted FR problem. The accuracy results for the LFW database [47] explain the 

FR development stages. For the holistic stage, the accuracy was 60%, while for handcrafted, it increased to 70%, shallow to 

86%, and finally, for DL, especially for DeepFace, it approached human-level performance of 97% for the unconstrained FR. 

In the early days of the AFR, the focus was more on developing FD algorithms and less on developing face image datasets. 

There has been organic growth in the datasets over the past two decades because it has come from the research community in 

terms of the need for a large number of face images with varying conditions and diversity. Another development has been the 

challenge to go beyond recognizing faces from laboratory-controlled to unconstrained face images. AFR research has 

progressed enormously, with some simple datasets achieving 99% accuracy, which has resulted in the development of more 

complex datasets that can facilitate new directions for FR research.  

The number of face images in the datasets and their variations has increased over the years. The past decade with FR research 

moving toward DL approaches has resulted in the growth of large training datasets required to implement DL algorithms 

effectively. Taskiran et al. [48] classified face image datasets as image-based or video-based. They may also be 3D or 

hyperspectral/infrared datasets. Some of the datasets were private, whereas others were public. These datasets are essential for 

benchmarking new AFR algorithms. A database’s choice depends on the given problem that one intends to solve or a property that 

one wants to test and also depends on the size of the training set required to test the algorithm. Some databases, such as Facebook, 

Google, CelebFaces+, and VGGFace, were used for training, and others, such as LFW, YTF, and IJB-C, were used for testing. 

5.1.   Artificial intelligence (AI), machine learning (ML), and deep learning (DL) 

John McCarthy, the father of artificial intelligence (AI), coined the term AI in his 1955 proposal for the Dartmouth 

Conference, USA, in 1956. On a broader scale, AI explores theories and applications to broaden human intelligence and 

envisions the creation of a future where intelligent machines have human-like perception and cognition. Researchers have 

made significant progress in understanding and improving learning algorithms; however, the challenge of AI remains [49]. As 

shown in Fig. 9, DL is a subfield of machine learning (ML), and ML is a subset of the broader field of AI. Some examples of 

ML problems include classification, clustering, and prediction. Traditional ML techniques are constrained to process data in a 

basic form and domain experts are required to carefully perform feature extraction [50]. DL is a subset of the ML and learns 

multiple representations and abstraction levels to understand the data. The raw input was transformed to a higher and more 

abstract level (Fig. 10). These transformations can help learn complex and intricate functions. 

 
Fig. 9 Relationship of AI, ML, and DL 
 

287 


Advances in Technology Innovation, vol. 7, no. 4, 2022, pp. 279-294 

 
(a) ML (b) DL 

Fig. 10 ML and DL approaches 

5.2.   Artificial neural network (ANN) 

The unique human brain, especially how neurons interact, has inspired scientists. Artificial neural networks (ANNs) are 

hardware and software implementations of neural structures in the human brain. The history of neural computing originated 

with the work of McCulloch and Pitts in 1943. The Warren McCulloch and Walter Pitts model (MCP model, known as the 

linear threshold gate model) is a binary classifier [51], where the weights were manually adjusted by a human. In the 1950s, 

Rosenblatt published the Perceptron algorithm, which automatically learns weights without human involvement [52]. This was 

an enhanced version of the MCP model. The perceptron model adds extra information representing the bias and variable 

weight values. The 1969 publication by Minsky and Papert [53] weakened neural network research for nearly a decade 

(1969-1986). They believed that using Perceptrons in practical applications was futile without an adequate basic theory.  

In 1979, Fukushima developed a neural network with multiple pooling and convolutional layers called neocognitron, 

which used a hierarchical and multilayered design that learned how to recognize visual patterns [54]. Rumelhart revived neural 

network research in 1986 using a backpropagation (BP) algorithm. The neural network iteratively learns weights that are then 

used to predict class labels. Given sufficient hidden units and sufficient training data multilayers, feedforward networks can 

closely approximate any function. In 1989, Yann LeCun demonstrated BP at the Bell Labs. He combined CNNs with BP to 

read handwritten digits. In 1997, long short-term memory for recurrent neural networks (RNNs) was developed by Hochreiter 

and Schmidhuber, with a gating mechanism to regulate the information to be kept or discarded at each time step. 

5.3.   The deep learning phase 

In 2009, Fei-Fei Li launched the challenging benchmark dataset, ImageNet [55]. Between 2011 and 2012, Krizhevsky 

created AlexNet, a CNN. As shown in Fig. 11, AlexNet has five convolutional layers, followed by max-pooling layers and 

three fully connected layers. Instead of using tanh and sigmoid activation functions, he used rectified linear units (ReLUs), 

which increased the speed and dropout. AlexNet showed that a greater depth resulted in high performance and, despite being 

computationally expensive, is feasible because of graphics processing units (GPUs). In 2014, DeepFace used neural networks 

to identify faces from the LFW dataset with 97.35% accuracy, an improvement of 27% over previous efforts [41]. In 2015, the 

Facenet model, using GoogLeNet-24, achieved 99.63% accuracy for the Google dataset [44]. In 2018, Ring loss model using 

ResNet-64 achieved 99.5% accuracy for the MS-Celeb dataset [56] and Arcface model using ResNet-100 achieved 99.83% 

accuracy for the MS-Celeb dataset [45]. In the work of Yan et al. [57], the use of VarGFaceNet resulted in an accuracy of 

99.85% for the LFW database. 

 
Fig. 11 AlexNet architecture 
 

288 


Advances in Technology Innovation, vol. 7, no. 4, 2022, pp. 279-294 

 
Fig. 12 Autoencoder model 

 
Fig. 13 Variational autoencoder 

The evolution of DL is described in detail by Schmidhuber [58]. He explains the hierarchical representation learning for 

different supervised/reinforcement learning and the various advancements in both feedforward (acyclic) neural networks 

(FNNs) and recurrent (cyclic) neural networks (RNNs). He also described the evolution of restricted Boltzmann machines 

(RBMs), as well as the constituents of multilayer learning architectures, such as the deep belief networks (DBNs). 

Advances in DL meant working with high dimensional data, which could be reduced to codes of lower dimensionality. In 

2006, Hinton and Salakhutdinov [59] trained an “autoencoder” network. Autoencoders [60] are used for dimensionality 

reduction, denoising, and outlier detection and are made up of three sections, as shown in Fig. 12. The encoder encodes the data 

to the hidden layer (code) which results in an output h = f(x). The decoder then outputs r = g(h). The training minimizes a mean 

squared error loss function. Deep autoencoders use numerous internal intermediate representations, and these deep layers help 

learn more intricate and complex data patterns. Convolutional autoencoder (CAEs) [60] helps integrate the convolutional 

advantage of a CNN. The encoder is thus made up of convolutional layers and the decoder of deconvolutional layers. Thus, 

CAEs extract features and gives a feature map containing the image’s significant points.   

One limitation of an autoencoder is that it has a deterministic latent-space representation. Although the autoencoder learns 

the input data, it may lack relevant information, which may be due to random encoding in the latent space or empty space. To 

overcome this, Kingma et al. [61] suggested a variational autoencoder (VAE), as shown in Fig. 13, which uses a probability 

distribution for latent space code representation. An inference model q(z | x) for VAE is described in [62]. Here,  denotes the 

variational parameters, optimized for q(z | x)  p(x | z). Here, q(z | x) approximates the posterior p(z | x) of the generative 

model and is optimized using the evidence lower bound (ELBO) [63]. 

In 2014, Goodfellow et al. [64] introduced generative adversarial networks (GANs) as well as an adversarial network 

framework. A generative model is matched against a competitor, which they call a discriminative model, and the latter learns to 

determine whether the query face image is from the model distribution or given data distribution [64]. Both thrive on 

competition to improve their methods till one cannot be distinguished from the other. 

5.4.   Some current research in DL for FR 

Developing different deep FR methods and their deployment in real-world applications requires a systematic performance 

evaluation. Iandola et al. [65] provided an evaluation framework for different datasets and SOTA methods. They used the 

following criteria: data augmentation, network architecture, loss function, training strategy, and model compression. The 

varied sizes of the datasets, such as CASIA-WebFace, VGG-Face, MS-Celeb-1M, and MegaFace for training and LFW and 

YTF for testing the models, make comparisons difficult. Here, both the datasets and architectures vary. A critical part of the 

evaluation is the loss function, which imposes stricter requirements for FR, as it has to discriminate and separate the features 

from the embedding space. The training strategy also plays an important role in terms of the learning rate and batch size. With 

289 


Advances in Technology Innovation, vol. 7, no. 4, 2022, pp. 279-294 

 
the modern trend of using FR in mobile and embedded devices, they also evaluated SqueezeNet [66] and MobileNet [67], 

which use compressed models and give better performance. They concluded that the deep ResNet series has advantages over 

other architectures, and the batch and feature normalization optimizes performance. 

Deployment of FR models, especially unconstrained faces on embedded or mobile devices, needs to meet the challenge of 

recognizing low-resolution faces at a low computational cost. This problem is addressed in the work of Ge et al. [68] by using 

the selective knowledge distillation approach and calling it the teacher-student model. They used a two-stream CNN, one with 

high resolution (HR), which collected the essential facial features used to tune the other LR network using regression and 

classification. Li et al. [69] also take on the challenging task of working with low-resolution unconstrained face images. They 

explore good-performing models using the SCface [70] and UCCSface [71] datasets. To visually learn the network, they 

pre-train it with DCGAN [72]. New trends for unconstrained, very low-resolution FR were explored in [73]. They present a 

classification of very low-resolution FR approaches, characterizing them as heterogeneous or homogeneous based on their 

belongingness to different or same domains, respectively. The heterogeneous approach can be classified into projection 

(coupled mapping) and synthesis (super-resolution (SR)) methods. In a homogeneous approach, they discussed lightweight 

CCNs. They listed the challenges for very low-resolution FR as the availability of datasets for real-world applications, the 

dearth of discriminative features, discrepancies in the domain, and the efficiency of existing solutions. 

One of the challenges in FR is the development of a pipeline that can simultaneously perform FD, alignment, and 

recognition. Other parameters, such as pose and gender, may also be required in some instances. A CNN pipeline for the 

different processes is described by Ranjan et al. [74]. They use a deep pyramid single-shot face detector (DPSSD) and a new 

loss function called crystal loss. They evaluated their end-to-end system on the IARPA Janus Benchmarks IJB-A [75], IJB-B 

[76], IJB-C [77], and IARPA Janus Challenge Set 5 (CS5) datasets to obtain SOTA performance. They also mentioned that 

some of the challenges facing current FR systems are dataset bias and domain adaptation.  

In mid-March 2020, the World Health Organization (WHO) declared the coronavirus disease 2019 (COVID-19) be a 

pandemic [78]. DL has been extensively used in the analysis of the COVID-19 pandemic, as elaborated in the work of 

Heidariet al. [79], for disease prediction, disease monitoring, drug testing, and vaccine development. WHO issued guidelines 

for wearing a mask to prevent the transmission of the disease. Abboah-Offei et al. [80] provided a detailed analysis of 

facemasks to control the transmission of respiratory viral infections, and the French government tested AI-based CCTV 

software to detect whether travelers wore masks or not [81]. The FR research community is engaged in developing systems to 

monitor the facemasks worn by people. Fig. 14 depicts a block diagram of face mask detection using ML or DL.  

Mbunge et al. [82] and Nowrin et al. [83] provide a comprehensive review of ML- and DL-based facemask detection 

techniques. Most of the facemask detection algorithms are CNN-based. A few are hybrid as they use DL and ML approaches 

like support vector machine (SVM) and decision tree (DT). CNN-based models include MobileNetV2 [84], ResNet [85], and 

VGG-16 CNN [86]. MobileNet and ResNet perform better than VGG-16 CNN. MobileNetV2 exhibits better performance 

because it is a lightweight classifier. SRCNet [87] uses an SR network and a classification network to perform three-class 

classification with an accuracy of 98.7%. Facemasknet [88], a three-class classifier, has an accuracy of 98.6%. 

RetinaFacemask [89], which uses both ResNet and MobileNet, incorporates transfer learning to achieve SOTA results. Some 

challenges for face mask detection are elaborated in the work of Nowrin et al. [83]. These include the availability of 

benchmarked datasets, variation in mask designs, processing speed for real-time applications, and variations in image 

resolution and masked face reconstruction. 

 
Fig. 14 Face mask detection block diagram 

290 


Advances in Technology Innovation, vol. 7, no. 4, 2022, pp. 279-294 

 
6. Conclusions 

This study reviewed the vast literature on the development of different approaches for AFR. Over time, a transition from 

shallow to modern SOTA methodologies for DL has been observed. Early FR methods used limited images and a 

laboratory-controlled environment. However, with the advent of DL models, the LFW database achieved 99.85% accuracy. 

This was possible because of GPUs’ massively parallel processing capabilities and large training and testing datasets. The 

challenges faced by DL models were also examined. As networks deepen, the complexity of the deep convolutional neural 

network (DCNN) model increases. A deep autoencoder or VAE that preserves some interclass discrimination information and 

intraclass similarity can feed a DCNN with a lower complexity to reduce the overall DCNN complexity. The performance 

decreases when the images have low resolution, variations in illumination, and blurry quality. Hence, DL methods must be 

made more robust under adverse conditions. The advent of new mobile communication technologies presents the challenge of 

integrating personalized FR applications that can be accessed by mobile users over different clouds and networks.   

Conflicts of Interest 

The authors declare no conflicts of interest. 

References 

[1] G. Guo, et al., “A Survey on Deep Learning Based Face Recognition,” Computer Vision and Image Understanding, vol. 

189, Article no. 102805, December 2019.  

[2] P. Viola, et al., “Rapid Object Detection Using a Boosted Cascade of Simple Features,” IEEE Computer Society 

Conference on Computer Vision and Pattern Recognition, pp. 1-9, December 2001.  

[3] M. Wang, et al., “Deep Face Recognition: A Survey,” https://arxiv.org/pdf/1804.06655v2.pdf, April 2018.   

[4] L. Sirovich, et al., “Low-Dimensional Procedure for the Characterization of Human Faces,” Journal of the Optical Society 

of America A, vol. 4, pp. 519-524, March 1987.  

[5] M. Ballantyne, et al., “Woody Bledsoe: His Life and Legacy,” AI Magazine, vol. 17, no. 1, pp. 7-20, 1996.   

[6] A. J. Goldstein, et al., “Man-Machine Interaction in Human-Face Identification,” Bell System Technical Journal, vol. 51, 

no. 2, pp. 399-427, 1972.  

[7] M. A. Turk, et al., “Face Recognition Using Eigenfaces,” IEEE Computer Society Conference on Computer Vision and 

Pattern Recognition, pp. 586-587, January 1991.  

[8] W. Zhao, et al., “Subspace Linear Discriminant Analysis for Face Recognition,” 

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.7.6280&rep=rep1&type=pdf, April 1999. 

[9] M. S. Bartlett, et al., “Face Recognition by Independent Component Analysis,” IEEE Transactions on Neural Networks, 

vol. 13, no. 6, pp. 1450-1464, December 2002. 

[10] F. Samaria, et al., “HMM-Based Architecture for Face Identification,” Image and Vision Computing, vol. 12, no. 8, pp. 

537-543, October 1994.  

[11] H. Schneiderman, et al., “Probabilistic Modeling of Local Appearance and Spatial Relationships for Object Recognition,” 

IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 45-51, June 1998.  

[12] M. H. Yang, et al., “Detecting Faces in Images: A Survey,” IEEE Transactions on Pattern Analysis and Machine 

Intelligence, vol. 24, no. 1, pp. 34-58, August 2002.  

[13] X. He, et al., “Face Recognition Using Laplacianfaces,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 

vol. 27, no. 3, pp. 328-340, January 2005.  

[14] J. Wright, et al., “Robust Face Recognition via Sparse Representation,” IEEE Transactions on Pattern Analysis and 

Machine Intelligence, vol. 31, no. 2, pp. 210-227, April 2008.  

[15] L. Zhang, et al., “Sparse Representation or Collaborative Representation: Which Helps Face Recognition?” International 

Conference on Computer Vision, pp. 471-478, November 2011.  

[16] L. Zhang, et al., “Collaborative Representation Based Classification for Face Recognition,” 

https://arxiv.org/vc/arxiv/papers/1204/1204.2358v1.pdf, April 2012.  

[17] L. Yang, et al., “Distance Metric Learning: A Comprehensive Survey,” 

https://www.cs.cmu.edu/~liuy/frame_survey_v2.pdf, May 2006.  

291 


Advances in Technology Innovation, vol. 7, no. 4, 2022, pp. 279-294 

 
[18] R. Jin, et al., “Regularized Distance Metric Learning: Theory and Algorithm,” Advances in Neural Information Processing 

Systems, vol. 22, pp. 862-870, December 2009.  

[19] J. G. Daugman, “Two-Dimensional Spectral Analysis of Cortical Receptive Field Profiles,” Vision Research, vol. 20, no. 

10, pp. 847-856, January 1980. 

[20] C. Liu, et al., “A Gabor Feature Classifier for Face Recognition,” 8th IEEE International Conference on Computer Vision, 

pp. 270-275, July 2001.  

[21] T. Barbu, “Gabor Filter-Based Face Recognition Technique,” Proceedings of the Romanian Academy, vol. 11, no. 3, pp. 

277-283, March 2010.  

[22] M. Yang, et al., “Gabor Feature Based Sparse Representation for Face Recognition with Gabor Occlusion Dictionary,” 

European Conference on Computer Vision, pp. 448-461, September 2010.  

[23] V. Štruc, et al., “Principal Gabor Filters for Face Recognition,” 3rd International Conference on Biometrics: Theory, 

Applications, and Systems, pp. 1-6, September 2009.  

[24] T. Ahonen, et al., “Face Recognition with Local Binary Patterns,” European Conference on Computer Vision, pp. 469-481, 

May 2004.  

[25] T. Ahonen, et al., “Face Description with Local Binary Patterns: Application to Face Recognition,” IEEE Transactions on 

Pattern Analysis and Machine Intelligence, vol. 28, no. 12, pp. 2037-2041, October 2006.  

[26] T. Ojala, et al., “A Comparative Study of Texture Measures with Classification Based on Featured Distributions,” Pattern 

Recognition, vol. 29, no. 1, pp. 51-59, January 1996.  

[27] D. Huang, et al., “Local Binary Patterns and Its Application to Facial Image Analysis: A Survey,” IEEE Transactions on 

Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 41, no. 6, pp. 765-781, March 2011. 

[28] R. R. Isnanto, et al., “Multi-Object Face Recognition Using Local Binary Pattern Histogram and Haar Cascade Classifier 

on Low-Resolution Images,” International Journal of Engineering and Technology Innovation, vol. 11, no. 1, pp. 45-58, 

January 2021. 

[29] D. S. Bolme, “Elastic Bunch Graph Matching,” Master thesis, Department of Computer Science, Colorado State 

University, CO, 2003. 

[30] B. M. Lahasan, et al., “Recognizing Faces Prone to Occlusions and Common Variations Using Optimal Face Subgraphs,” 

Applied Mathematics and Computation, vol. 283, pp. 316-332, June 2016.  

[31] D. G. Lowe, “Object Recognition from Local Scale-Invariant Features,” 7th IEEE International Conference on Computer 

Vision, pp. 1150-1157, September 1999.  

[32] D. G. Lowe, “Distinctive Image Features from Scale-Invariant Keypoints,” International Journal of Computer Vision, vol. 

60, no. 2, pp. 91-110, November 2004.  

[33] J. Luo, et al., “Person-Specific SIFT Features for Face Recognition,” IEEE International Conference on Acoustics, Speech, 

and Signal Processing, pp. 593-596, April 2007.  

[34] C. Geng, et al., “Face Recognition Using SIFT Features,” 16th IEEE International Conference on Image Processing, pp. 

3313-3316, November 2009.  

[35] N. Dalal, et al., “Histograms of Oriented Gradients for Human Detection,” IEEE Computer Society Conference on 

Computer Vision and Pattern Recognition, pp. 886-893, June 2005.  

[36] O. Déniz, et al., “Face Recognition Using Histograms of Oriented Gradients,” Pattern Recognition Letters, vol. 32, no. 12, 

pp. 1598-1603, September 2011.  

[37] Z. Cao, et al., “Face Recognition with Learning-Based Descriptor,” IEEE Computer Society Conference on Computer 

Vision and Pattern Recognition, pp. 2707-2714, June 2010.  

[38] Z. Lei, et al., “Learning Discriminant Face Descriptor,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 

vol. 36, no. 2, pp. 289-302, June 2013.  

[39] J. Sánchez, et al., “Image Classification with the Fisher Vector: Theory and Practice,” International Journal of Computer 

Vision, vol. 105, no. 3, pp. 222-245, December 2013.  

[40] T. H. Chan, et al., “PCANet: A Simple Deep Learning Baseline for Image Classification?” IEEE Transactions on Image 

Processing, vol. 24, no. 12, pp. 5017-5032, September 2015.  

[41] Y. Taigman, et al., “Deepface: Closing the Gap to Human-Level Performance in Face Verification,” Proceedings of the 

IEEE Conference on Computer Vision and Pattern Recognition, pp. 1701-1708, June 2014.  

[42] Y. Sun, et al., “Deep Learning Face Representation by Joint Identification-Verification,” Advances in Neural Information 

Processing Systems, pp. 1988-1996, December 2014.  

[43] Y. Sun, et al., “Deepid3: Face Recognition with Very Deep Neural Networks,” https://arxiv.org/pdf/1502.00873.pdf, 

February 2015.   

292 


Advances in Technology Innovation, vol. 7, no. 4, 2022, pp. 279-294 

 
[44] F. Schroff, et al., “Facenet: A Unified Embedding for Face Recognition and Clustering,” Proceedings of the IEEE 

Conference on Computer Vision and Pattern Recognition, pp. 815-823, June 2015.  

[45] J. Deng, et al., “Arcface: Additive Angular Margin Loss for Deep Face Recognition,” Proceedings of the IEEE/CVF 

Conference on Computer Vision and Pattern Recognition, pp. 4690-4699, June 2019.  

[46] H. Liu, et al., “Adaptiveface: Adaptive Margin and Sampling for Face Recognition,” Proceedings of the IEEE/CVF 

Conference on Computer Vision and Pattern Recognition, pp. 11947-11956, June 2019.  

[47] G. B. Huang, et al., “Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained 

Environments,” Workshop on Faces in Real-Life Images: Detection, Alignment, and Recognition, pp. 1-11, October 2008.  

[48] M. Taskiran, et al., “Face Recognition: Past, Present and Future (A Review),” Digital Signal Processing, vol. 106, Article 

no. 102809, November 2020.  

[49] Y. Bengio, Learning Deep Architectures for AI, Hanover: Now Publishers, 2009.  

[50] Y. Bengio, et al., “Representation Learning: A Review and New Perspectives,” IEEE Transactions on Pattern Analysis and 

Machine Intelligence, vol. 35, no. 8, pp. 1798-1828, March 2013.  

[51] S. Hayman, “The McCulloch-Pitts Model,” International Joint Conference on Neural Networks, pp. 4438-4439, July 1999.  

[52] F. Rosenblatt, “The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain,” 

Psychological Review, vol. 65, no. 6, pp. 386-408, November 1958.  

[53] M. Minsky, et al., Perceptrons, Cambridge: MIT Press, 1969.  

[54] K. Fukushima, “Neocognitron: A Self-Organizing Neural Network Model for a Mechanism of Pattern Recognition,” 

Proceedings of the U.S.-Japan Joint Seminar, pp. 267-285, February 1982.  

[55] A. Krizhevsky, et al., “Imagenet Classification with Deep Convolutional Neural Networks,” Advances in Neural 

Information Processing Systems, vol. 25, pp. 1097-1105, December 2012.  

[56] Y. Zheng, et al., “Ring Loss: Convex Feature Normalization for Face Recognition,” Proceedings of the IEEE Conference 

on Computer Vision and Pattern Recognition, pp. 5089-5097, June 2018.  

[57] M. Yan, et al., “Vargfacenet: An Efficient Variable Group Convolutional Neural Network for Lightweight Face 

Recognition,” Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, pp. 2647-2654, 

October 2019. 

[58] J. Schmidhuber, “Deep Learning in Neural Networks: An Overview,” Neural Networks, vol. 61, pp. 85-117, January 2015.  

[59] G. E. Hinton, et al., “Reducing the Dimensionality of Data with Neural Networks,” Science, vol. 313, no. 5786, pp. 

504-507, July 2006.  

[60] I. Goodfellow, et al., Deep Learning, Cambridge: MIT Press, 2016.  

[61] D. P. Kingma, et al., “Auto-Encoding Variational Bayes,” https://arxiv.org/pdf/1312.6114v4.pdf, December 2013.  

[62] D. P. Kingma, et al., “An Introduction to Variational Autoencoders,” https://arxiv.org/pdf/1906.02691v1.pdf, June 2019.  

[63] C. Doersch, “Tutorial on Variational Autoencoders,” https://arxiv.org/pdf/1606.05908v1.pdf, June 2016.  

[64] I. Goodfellow, et al., “Generative Adversarial Nets,” Proceedings of the 27th International Conference on Neural 

Information Processing Systems, pp. 2672-2680, December 2014. 

[65] M. You, et al., “Systematic Evaluation of Deep Face Recognition Methods,” Neurocomputing, vol. 388, pp. 144-156, May 

2020. 

[66] F. N. Iandola, et al., “SqueezeNet: AlexNet-Level Accuracy with 50x Fewer Parameters and <0.5 MB Model Size,” 

https://arxiv.org/pdf/1602.07360.pdf, November 2016. 

[67] A. G. Howard, et al., “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision,” 

https://arxiv.org/pdf/1704.04861.pdf, April 2017. 

[68] S. Ge, et al., “Low-Resolution Face Recognition in the Wild via Selective Knowledge Distillation,” IEEE Transactions on 

Image Processing, vol. 28, no. 4, pp. 2051-2062, November 2018. 

[69] P. Li, et al., “On Low-Resolution Face Recognition in the Wild: Comparisons and New Techniques,” IEEE Transactions 

on Information Forensics and Security, vol. 14, no. 8, pp. 2000-2012, August 2019. 

[70] M. Grgic, et al., “SCface-Surveillance Cameras Face Database,” Multimedia Tools Applications, vol. 51, no. 3, pp. 

863-879, February 2011. 

[71] A. Sapkota, et al., “Large Scale Unconstrained Open Set Face Database,” IEEE 6th International Conference on 

Biometrics: Theory, Applications, and Systems, pp. 1-8, September 2013. 

[72] A. Radford, et al., “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks,” 

https://arxiv.org/pdf/1511.06434v1.pdf, November 2015. 

[73] L. S. Luevano, et al., “A Study on the Performance of Unconstrained Very Low Resolution Face Recognition: Analyzing 

Current Trends and New Research Directions,” IEEE Access, vol. 9, pp. 75470-75493, May 2021. 

293 


Advances in Technology Innovation, vol. 7, no. 4, 2022, pp. 279-294 

 
[74] R. Ranjan, et al., “A Fast and Accurate System for Face Detection, Identification, and Verification,” IEEE Transactions on 

Biometrics, Behavior, and Identity Science, vol. 1, no. 2, pp. 82-96, April 2019.  

[75] B. F. Klare, et al., “Pushing the Frontiers of Unconstrained Face Detection and Recognition: IARPA Janus Benchmark A,” 

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1931-1939, June 2015.  

[76] C. Whitelam, et al., “IARPA Janus Benchmark-B Face Dataset,” Proceedings of the IEEE Conference on Computer 

Vision and Pattern Recognition Workshops, pp. 90-98, July 2017.  

[77] B. Maze, et al., “IARPA Janus Benchmark-C: Face Dataset and Protocol,” International Conference on Biometrics, pp. 

158-165, February 2018. 

[78] “WHO Director-General’s Opening Remarks at the Media Briefing on COVID-19–11 March 2020,” 

https://www.who.int/director-general/speeches/detail/who-director-general-s-opening-remarks-at-the-media-briefing-on-

covid-19---11-march-2020, March 11, 2020. 

[79] A. Heidari, et al., “The COVID-19 Epidemic Analysis and Diagnosis Using Deep Learning: A Systematic Literature 

Review and Future Directions,” Computers in Biology and Medicine, vol. 141, Article no. 105141, February 2022. 

[80] M. Abboah-Offei, et al., “A Rapid Review of the Use of Face Mask in Preventing the Spread of COVID-19,” International 

Journal of Nursing Studies Advances, vol. 3, Article no. 100013, November 2021. 

[81] H. Fouquet, “Paris Tests Face-Mask Recognition Software on Metro Riders,” 

https://www.bloombergquint.com/politics/paris-tests-face-mask-recognition-software-on-metro-riders, May 07, 2020. 

[82] E. Mbunge, et al., “Application of Deep Learning and Machine Learning Models to Detect COVID-19 Face Masks–A 

Review,” Sustainable Operations and Computers, vol. 2, pp. 235-245, January 2021. 

[83] A. Nowrin, et al., “Comprehensive Review on Facemask Detection Techniques in the Context of Covid-19,” IEEE Access, 

vol. 9, pp. 106839-106864, July 2021. 

[84] P. Khandelwal, et al., “Using Computer Vision to Enhance Safety of Workforce in Manufacturing in a Post COVID 

World,” https://arxiv.org/ftp/arxiv/papers/2005/2005.05287.pdf, May 2020. 

[85] M. Loey, et al., “Fighting against COVID-19: A Novel Deep Learning Model Based on YOLO-v2 with ResNet-50 for 

Medical Face Mask Detection,” Sustainable Cities and Society, vol. 65, Article no. 102600, February 2021.  

[86] S. V. Militante, et al., “Real-Time Facemask Recognition with Alarm System Using Deep Learning,” 11th IEEE Control 

and System Graduate Research Colloquium, pp. 106-110, August 2020. 

[87] B. Qin, et al., “Identifying Facemask-Wearing Condition Using Image Super-Resolution with Classification Network to 

Prevent COVID-19,” Sensors, vol. 20, no. 18, Article no. 5236, September 2020. 

[88] M. Inamdar, et al., “Real-Time Face Mask Identification Using Facemasknet Deep Learning Network,” 

https://ssrn.com/abstract=3663305, July 2020. 

[89] M. Jiang, et al., “Retinamask: A Face Mask Detector,” https://arxiv.org/pdf/2005.03950v1.pdf, May 2020. 

 
Copyright©  by the authors. Licensee TAETI, Taiwan. This article is an open access article distributed 

under the terms and conditions of the Creative Commons Attribution (CC BY-NC) license  

 (https://creativecommons.org/licenses/by-nc/4.0/). 

294