Knowledge Engineering and Data Science (KEDS)  pISSN 2597-4602 

Vol 5, No 2, December 2022, pp. 129–136  eISSN 2597-4637 

 
https://doi.org/10.17977/um018v5i22022p129-136 

©2022 Knowledge Engineering and Data Science | W : http://journal2.um.ac.id/index.php/keds | E : keds.journal@um.ac.id 

This is an open access article under the CC BY-SA license (https://creativecommons.org/licenses/by-sa/4.0/) 

An Accurate Real-Time Method for Face Mask Detection using CNN and SVM

Shili Hechmi * 

University of Tabuk, Tabuk 741, Saudi Arabia 
asuhaili@ut.edu.sa * 

* corresponding author 

 
I. Introduction  

Respiratory infections have been a leading cause of mortality worldwide for many years. Deaths
from pneumonia occur in all countries: mainly among the elderly in high-income nations and mainly
among children in low-income ones, while most middle-income nations record fatalities in both
population groups. Since the end of 2019, a novel beta-coronavirus has caused several outbreaks of
viral pneumonia in the Wuhan region of China [1][2][3] before spreading around the world, resulting
in the worst contagious epidemic since the 1918 Spanish flu. This coronavirus, SARS-CoV-2 (Severe
Acute Respiratory Syndrome CoronaVirus-2), is responsible for a clinical picture that the WHO named
COVID-19 (for COronaVIrus Disease 2019), which involves damage to various organs, most notably the
upper and lower airways. The COVID-19 epidemic has claimed the lives of almost 5 million people
worldwide since the first outbreak [4]. Before the COVID-19 pandemic, over 2.5 million adults and
children died from pneumonia every year; no other infection caused as many fatalities. After the
emergence of COVID-19, matters became more complicated, and respiratory diseases became a
significant challenge to global health, especially given their rapid spread and the danger they
pose to all segments of society.

Studies on influenza, influenza-like illnesses, and human coronaviruses show that a medical mask
helps reduce the spread of contagious droplets from an infected individual and the possible
contamination of the environment with these droplets [5]. These droplets are expelled when a
COVID-19 patient talks, sneezes, or coughs. People nearby may take in these droplets through their
mouths and noses and inhale them into the lungs. The WHO strongly recommends wearing masks to avoid
infection [4]. Many nations have mandated that individuals wear medical masks in public spaces such
as squares to keep the disease from spreading.

However, checking whether people are wearing a medical mask is still mostly done manually, which
requires considerable human effort and leads to many errors in identifying people who do not wear
a mask, especially in crowded places. In response to this need, tools for detecting the wearing of
masks, without using facial recognition, are becoming increasingly common. Many recent studies
have addressed the identification of people without a medical mask [6][7][8][9][10], and the
challenge remains to achieve the highest recognition rate.

ARTICLE INFO

Article history:
Received 3 March 2023
Revised 29 March 2023
Accepted 30 March 2023
Published online 30 December 2022

ABSTRACT

Infectious respiratory diseases, including COVID-19, pose a significant challenge to
humanity and a potential threat to life due to their severity and rapid spread. Wearing a
surgical mask is among the most important safety precautions that can help keep this
sort of pandemic from spreading, but manually monitoring large crowds in public
places for face masks is impractical. In this research, we propose a real-time approach
for face mask detection. First, we use a multi-scale deep neural network to extract
features that are better suited for training the detection system. We then employ SVM
post-processing in the classification stage to make the face mask detection method
more robust. According to the experimental findings, our strategy considerably
decreased the percentage of false positives and undetected cases.


Keywords: 

Face Mask Detection 

COVID-19 

CNN 

SVM 

Detecting individuals not using medical masks has posed a significant challenge to researchers
since a person can wear a mask incorrectly and not completely cover the nose and mouth. Below we
review the most important research conducted in this field. Preeti Nagrath et al. [6] introduced
SSDMNV2, a deep learning technique for recognizing face masks built with TensorFlow, Keras, and
OpenCV. In this approach, the Single Shot Multibox Detector detects faces, and the MobileNetV2
architecture serves as the framework of the classifier. S. Sanjaya and S. Rakhmawan [7] provide a
machine-learning method for recognizing face masks in Indonesia, intended to help authorities plan
COVID-19 mitigation, evaluation, prevention, and response. The suggested model may be used with a
security camera to help halt the COVID-19 epidemic by recognizing individuals not using medical
masks. The authors applied a preprocessing step before training and testing on the data. Their
regional results report the cities with the highest and lowest percentages of people wearing
masks. Gui Ling Wu [8] developed a masked face identification strategy based on an attention
mechanism to improve the recognition of covered face images. First, the mask region and the face
region of the image are separated using a locally constrained dictionary learning approach.
Dilated convolution is then used to compensate for the resolution loss during subsampling.
Finally, an attention-mechanism neural network exploits the relevant feature information in the
face image to reduce information loss during subsampling and improve the face recognition rate.

W. Hariri proposed an efficient method for masked face recognition [2]. The author employs three
pre-trained deep convolutional neural networks (CNNs), VGG-16, AlexNet, and ResNet-50, to extract
deep features from the resulting regions. The bag-of-features (BoF) paradigm is used to process
the deep features, and a multilayer perceptron (MLP) performs the classification. In [9], G. Yang
et al. suggested a deep learning system based on YOLOV5 for detecting face masks, replacing manual
inspection. The system is divided into four parts: facial mask image enhancement, facial mask image
segmentation, facial mask image identification, and interface interaction. GIoU loss and center
loss are combined to identify whether a face mask is worn. Anirudh et al. [11] proposed a face
mask detection system using image processing. The system has three phases: (1) image
preprocessing, (2) face recognition and cropping, and (3) face mask classification. It can
identify faces with and without masks and may be used with webcams. According to the authors, the
method can promote the use of face masks, help identify safety infractions, and support a safe
working environment. In [10], the authors proposed a new hybrid method to automatically detect
whether a person is wearing a protective mask. It combines visual features extracted by a
convolutional neural network (CNN) with an image histogram that conveys information about pixel
intensity. The authors evaluate several pre-trained CNN models and several kinds of image
histograms for building the feature extraction system.

In [12], the authors presented a system that uses convolutional neural networks to detect face
masks, as a COVID-19 precaution, in images and videos. A complete experiment on the dataset and an
assessment of the effectiveness of the suggested method are presented. In addition, the authors
report preserving the inter- and intra-class variability of facial mask detection using a symbolic
approach, and they explored different classifiers, including support vector machines and symbolic
classifiers. The work is being prototyped to monitor temperature readings and detect masks on
individuals: one component uses a temperature sensor that measures body temperature and
immediately sprays disinfectant, and the other is intended to provide people with safety systems
to avoid COVID-19. The authors proposed using deep learning concepts to monitor people's
conditions continuously. Mingjie Jiang and Xinqi Fan introduced RetinaFaceMask [13], a
single-stage detector that employs a feature pyramid network to fuse high-level semantic
information, together with a novel context attention module that focuses on face mask detection.

Furthermore, the authors present a novel cross-class object removal method that rejects
predictions with low confidence and a high intersection over union. We have reviewed several works
related to masked and unmasked face identification. This review shows a gradual and consistent
increase in the accuracy of these systems. Nevertheless, most studies use face databases rather
than real-world images, which prevents them from being deployed directly in surveillance systems to
identify, in real time, individuals not using medical masks in public areas. Our method, detailed
in the next section, tries to solve this problem.



This research introduces a novel medical mask detection model that uses a CNN to produce
multi-scale deep features. The deep network automatically extracts multi-layer features from the
original image. Such a network usually extracts many features that perform better than standard
hand-crafted ones. We therefore used an image dataset to train a CNN-based feature extraction
model. In this work, we refer to these learned features as CNN features.

The following is a summary of our paper's primary contributions:

a. We propose a novel method for identifying medical masks that employs multi-scale CNN
features collected from nested windows using a deep neural network with several fully connected
layers. An SVM classifier is trained using the detection score from every detection window.
b. We present a complete medical mask detection method that is simple to use, efficient, and reliable.
c. We identify people not wearing medical masks in real-world images rather than in well-framed
face images. Our automatic mask detection method can be easily integrated into an existing video
surveillance system.

The remainder of this article is organized as follows: Section II gives a complete overview of
our proposed technique. Section III presents and discusses the experimental results. Section IV
concludes this study and suggests possible future steps.

II. Proposed Method 

Deep learning uses many basic neurons, each receiving the output of lower-level neurons. Through
the nonlinear relation between inputs and outputs, low-level features are merged into higher-level
abstract concepts that describe the distributed properties of the observed data. A multi-layer
abstract representation is built through bottom-up learning. Multi-layer feature learning is a
completely automated technique that does not require any human participation. Based on the learned
network structure, the inputs are mapped to several feature levels, and the output of the top layer
is then classified or identified using a matching algorithm or classifier.

We propose a CNN-based technique for detecting persons not wearing medical masks, which determines
whether someone is using a face mask by analyzing surveillance camera images. First, the human
body positioning module receives real-world images containing multiple candidate human body
regions. The facial positioning module is then used to identify several candidate face regions.
The face mask detection module uses this data to identify several candidate face mask regions.
Finally, using an SVM classifier for post-processing, we determine the final face mask
identification result. Figure 1 presents a structural representation of the system.

 
Fig. 1. Face mask identification method using a convolution neural network (CNN) 
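
Read as a cascade, the modules in Fig. 1 could be chained roughly as follows. This is a minimal sketch under stated assumptions, not the author's implementation: the function `detect_mask_status` and the four callables passed to it are hypothetical placeholders for the body, face, and mask CNN modules and for the SVM post-processing step, and the stub values are invented for illustration.

```python
def detect_mask_status(image, find_bodies, find_faces, find_masks, classify):
    """Run a body -> face -> mask cascade and return one decision per face.

    All four callables are hypothetical stand-ins for the modules in Fig. 1.
    """
    results = []
    for body in find_bodies(image):
        for face in find_faces(image, body):
            masks = find_masks(image, face)
            # The final decision is delegated to the post-processing step.
            results.append((face, classify(body, face, masks)))
    return results


# Stub callables with invented boxes and scores, for illustration only.
bodies = lambda img: [(0, 0, 100, 200)]
faces = lambda img, body: [(30, 10, 70, 60)]
masks = lambda img, face: [((35, 35, 65, 58), 0.9)]
decide = lambda body, face, m: bool(m) and m[0][1] >= 0.5

print(detect_mask_status(None, bodies, faces, masks, decide))
```

Passing the modules in as callables keeps the cascade independent of any particular detector, so each stage can be swapped without touching the control flow.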



 
A. Dataset Description 

Several databases exist for evaluating facial mask detection methods. To provide a relevant
evaluation of our proposed method, we wanted to use more than one database in the experimental
tests, but we observed that each dataset had a distinct annotation format and that some data did
not fit our needs. To solve this problem, we built a database from images collected from available
databases, with a new annotation, and added images from the internet.

The constructed dataset contains 10229 images distributed as follows: 3250 images from
WIDER Face [14], 4108 images from MAFA [15], 1521 images from RMFRD [16], and 1350 images
obtained from the internet. We use 7412 images for training, 772 for validation, and 2045 for
testing.
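
As a quick consistency check on these figures, the short sketch below sums the per-source counts and derives the share of each subset. The counts are those reported in the text; the variable names are only illustrative.

```python
# Image counts per source, as reported in the text.
sources = {"WIDER Face": 3250, "MAFA": 4108, "RMFRD": 1521, "Internet": 1350}
# Subset sizes, as reported in the text.
split = {"train": 7412, "validation": 772, "test": 2045}

total = sum(sources.values())
# Both breakdowns should describe the same 10229 images.
assert total == sum(split.values()) == 10229

# Share of each subset in percent, rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in split.items()}
print(total, shares)
```

The split therefore corresponds to roughly 72.5% training, 7.5% validation, and 20% test images.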

B. CNN Feature Extraction 

Consider an n-layer system S = (S1,…,Sn) with input I and output O, written as
I → S1 → S2 → … → Sn → O. If the output O equals the input I, then O carries the same information
as the initial input, indicating that no information was lost in any layer Si. In other words, the
output of each layer is an alternative representation of the original information I. The
underlying idea of deep learning is that the input and output should be equal for each layer of
an n-layer neural network, so that every layer yields another representation of the input.
Ideally, no human assistance is required during the learning process.

A CNN is a multi-layered neural network. Every layer consists of numerous two-dimensional
planes, each with its own set of neurons. The network comprises simple neurons (S-neurons) and
complex neurons (C-neurons). S-neurons form S-planes, and S-planes form the S-layer, which we
denote Us. C-neurons, C-planes, and the C-layer (Uc) are defined analogously. Each intermediate
level of the network consists of an S-layer connected to a C-layer. The input level, on the other
hand, comprises just one layer and directly receives the two-dimensional visual pattern. The
feature extraction procedures are embedded in the connection structure of the CNN model.

In a CNN, the input connections of the S-neurons are modifiable, whereas the other connections are
fixed. The output of an S-neuron in the $k_l$-th S-plane of level $l$ is denoted $u_{sl}(k_l, n)$,
and the output of a C-neuron in the $k_l$-th C-plane is denoted $u_{cl}(k_l, n)$. Here $n$ is a
two-dimensional coordinate giving the position of the neuron's receptive field in the input layer.
The receptive field is small at the first level and grows with the level $l$. The output of the
S-neurons is given by (1) and (2).

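$$u_{sl}(k, n) = r_l(k)\,\Phi\left\{\frac{1 + \sum_{k_{l-1}} \sum_{v \in A_l} a_l(v, k_{l-1}, k)\, u_{c,l-1}(k_{l-1}, n + v)}{1 + \dfrac{r_l(k)}{r_l(k) + 1}\, b_l(k)\, u_{vl}(n)} - 1\right\} \qquad (1)$$

$$\Phi(x) = \begin{cases} x, & x \ge 0 \\ 0, & x < 0 \end{cases} \qquad (2)$$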

Here $a_l(v, k_{l-1}, k)$ and $b_l(k)$ are the connection coefficients of the excitatory and
inhibitory inputs, respectively. $r_l(k)$ is a constant that regulates the selectivity of the
feature extraction: a larger value indicates a lower tolerance for noise and feature variation.
The function $\Phi(x)$ is nonlinear. $v$ is a vector that indicates the relative position of a
preceding neuron within the receptive field of $n$. $A_l$ determines the size of that receptive
field, that is, the extent of the S-neuron's feature extraction. The sum over $v$ therefore covers
all neurons in this region, and the sum over $k_{l-1}$ covers all sub-planes of the previous
level. The sum in the numerator is frequently referred to as the excitatory term: it is the sum of
the outputs of the neurons in the receptive field, each multiplied by its weight. $u_{vl}(n)$ is
the output of an inhibitory neuron V, associated with the S-plane, that models the network's
inhibitory effect. The output of the V-neurons is given by (3).

$$u_{vl}(n) = \left(\sum_{k_{l-1}} \sum_{v \in A_l} c_l(v)\, \bigl(u_{c,l-1}(k_{l-1}, n + v)\bigr)^{2}\right)^{1/2} \qquad (3)$$

where $c_l(v)$ represents the weights of the V-neurons. The output of the C-neurons is given by (4) and (5).



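$$u_{cl}(k_l, n) = \varphi\left[\frac{1 + \sum_{k_{l-1}=1}^{k_l} j_l(k_l, k_{l-1}) \sum_{v \in D_l} d_l(v)\, u_{sl}(k_l, n + v)}{1 + V_{sl}(n)} - 1\right] \qquad (4)$$

$$\varphi(x) = \begin{cases} \dfrac{x}{\beta + x}, & x \ge 0 \\ 0, & x < 0 \end{cases} \qquad (5)$$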

where $\beta$ is a fixed value and $k_l$ indicates how many S sub-planes are present at level $l$.
$D_l$ is the receptive field of the C-neuron and is therefore related to the size of the feature.
$d_l(v)$ is the weight of the fixed connection mentioned above; it is a monotonically decreasing
function of $|v|$. If the $k_l$-th S sub-plane receives signals from the $k_{l-1}$-th sub-plane,
$j_l(k_l, k_{l-1})$ equals 1; otherwise it equals 0.
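
To make (1) to (5) concrete, the sketch below evaluates them for a single neuron at one position. It is an illustration under stated assumptions rather than the network used in this work: the sums over $k_{l-1}$ and $v$ are flattened into plain lists, $j_l$ is taken as 1 for every connected plane, the inhibitory term $V_{sl}(n)$ in (4) is passed in as a number, and all weights and inputs are arbitrary toy values.

```python
import math


def phi_s(x):
    """Eq. (2): pass non-negative values, clip negative ones to zero."""
    return x if x >= 0 else 0.0


def phi_c(x, beta):
    """Eq. (5): saturating nonlinearity for non-negative inputs."""
    return x / (beta + x) if x >= 0 else 0.0


def v_output(c, u_prev):
    """Eq. (3): square root of the weighted sum of squared inputs."""
    return math.sqrt(sum(cw * u * u for cw, u in zip(c, u_prev)))


def s_output(a, b, r, c, u_prev):
    """Eq. (1) for one S-neuron; a, c, u_prev are flattened over (k_{l-1}, v)."""
    excite = sum(aw * u for aw, u in zip(a, u_prev))
    inhibit = r / (r + 1.0) * b * v_output(c, u_prev)
    return r * phi_s((1.0 + excite) / (1.0 + inhibit) - 1.0)


def c_output(d, u_s, v_s, beta):
    """Eq. (4) for one C-neuron, assuming j_l = 1 for all connected S-planes."""
    pooled = sum(dw * u for dw, u in zip(d, u_s))
    return phi_c((1.0 + pooled) / (1.0 + v_s) - 1.0, beta)


# Toy input pattern and weights (arbitrary values, for illustration only).
u_prev = [1.0, 0.0, 1.0, 0.0]
c = [0.25, 0.25, 0.25, 0.25]
matched = s_output([0.5, 0.0, 0.5, 0.0], b=1.0, r=2.0, c=c, u_prev=u_prev)
mismatched = s_output([0.0, 0.5, 0.0, 0.5], b=1.0, r=2.0, c=c, u_prev=u_prev)
print(matched, mismatched)
```

With these toy values the S-neuron whose excitatory weights match the input pattern responds positively, while the mismatched one is suppressed to zero by the inhibitory term.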

C. Multi-scale Detecting Method 

The choice of a suitable observation scale is essential for identifying and comprehending targets 
because the characteristics of an object vary according to the scale. Since images contain objects of 
various sizes, choosing an ideal scale for picture analysis in advance is not feasible. As a result, the 
image's content must be considered at several scales. 

We created a multi-scale feature extraction model with three CNNs. Each CNN has eight layers:
five convolution layers and three fully connected layers. Features are extracted automatically
from each image through three nested, increasingly large rectangular windows (the mask region, the
face region, and the human body region). The CNNs extract three features, which are then passed to
two fully connected layers, with the output of the second fully connected layer delivered to the
output layer. Finally, a linear SVM classifier is used to categorize all of the sub-blocks.

A CNN is used to extract features from each image. We begin by selecting candidates for the mask
region, where the mask information can be observed directly. Using the CNN model, we extract
features for the mask region; this vector is called Feature A. If we identified the mask region
using only these features, we would obtain numerous false detections. To improve the accuracy, a
second feature vector is extracted from a rectangular neighborhood that comprises the mask region
and its immediately adjoining regions, namely the face region; this vector is called Feature B. We
then locate the body region from the detected face and mask regions. Feature C refers to the
features extracted by the CNN from the human body region.

We employ Feature C to train the body identification model. B and C are combined to produce W,
a new feature used to train the face detection model. W is defined as:

𝑊 = 𝑣𝐵 + 𝜇𝐶            (6) 

where $v$ and $\mu$ denote the confidences of B and C.

Human observation suggests that different regions attract different amounts of attention. For
instance, to recognize an object in an image, we take the object region as the target region, and
the weight of a position decreases the further it is from the target region. We therefore set
$v = 0.7$ and $\mu = 0.3$. We employ a combination of A, B, and C (Feature S) to train the mask
identification model. S is defined as:

𝑆 = 𝛼𝐴 + 𝛽𝐵 + 𝛾𝐶          (7) 

where 𝛼 = 0.6, 𝛽 = 0.3, 𝛾 = 0.1 denote A, B, and C's confidences. 
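
The weighted combinations in (6) and (7) amount to element-wise weighted sums of feature vectors of equal length. The sketch below illustrates this with toy three-dimensional vectors; the helper `fuse` and the example values are assumptions made for illustration, and real CNN features would be much longer.

```python
def fuse(features, weights):
    """Weighted element-wise sum of equally sized feature vectors."""
    length = len(features[0])
    if any(len(f) != length for f in features):
        raise ValueError("feature vectors must have the same length")
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(length)]


# Toy stand-ins for the CNN feature vectors A, B, and C (invented values).
A = [1.0, 0.0, 2.0]
B = [0.0, 1.0, 2.0]
C = [1.0, 1.0, 0.0]

W = fuse([B, C], [0.7, 0.3])           # Eq. (6) with v = 0.7, mu = 0.3
S = fuse([A, B, C], [0.6, 0.3, 0.1])   # Eq. (7) with alpha, beta, gamma
print(W, S)
```

In both cases the weights sum to 1, so the fused feature stays on the same scale as its inputs while emphasizing the region closest to the target.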

D. Post-processing using SVM  

We discussed the CNN approach in the previous section for finding the coarse locations of the 
human body, face, and mask region. To create the feature vector, we compute their detection scores. 
Finally, we employ the SVM technique for post-processing to delete the inaccurate areas.  

In this research, we apply three CNN detectors (D1, D2, and D3): D1 detects the human body, D2
detects the face, and D3 detects the medical mask. Each detection (B; s) ∈ Di is a
five-dimensional feature vector with B = (x1, y1; x2, y2), where (x1, y1) denotes the position of
the top-left corner of the detection box, (x2, y2) denotes the position of the bottom-right
corner, and s is the detection score of the corresponding part. The coordinates B are normalized
by the dimensions of the candidate image so that x1, y1, x2, y2 ∈ [0, 1]. For the mask region, we
create a 15-dimensional feature vector (D1, D2, D3). For the face region, we generate a
10-dimensional feature vector (D1, D2). A linear SVM is then used to classify the feature vectors
of the mask, face, and human body regions. The training data are the labeled values (D1, D2, D3)
produced by the CNN detectors.
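
The construction of these feature vectors can be sketched as follows. The function names, the image size, and the boxes and scores are hypothetical and chosen only to illustrate the normalization and concatenation described above.

```python
def detection_vector(box, score, width, height):
    """Return (x1, y1, x2, y2, s) with the coordinates normalized to [0, 1]."""
    x1, y1, x2, y2 = box
    return [x1 / width, y1 / height, x2 / width, y2 / height, score]


def region_vector(detections, width, height):
    """Concatenate the 5-D vectors of several detections into one vector."""
    vec = []
    for box, score in detections:
        vec.extend(detection_vector(box, score, width, height))
    return vec


# Hypothetical detections in a 640x480 image: body (D1), face (D2), mask (D3).
d1 = ((100, 40, 400, 470), 0.95)
d2 = ((200, 60, 300, 180), 0.90)
d3 = ((215, 120, 285, 175), 0.80)

mask_vec = region_vector([d1, d2, d3], 640, 480)  # 15 dimensions
face_vec = region_vector([d1, d2], 640, 480)      # 10 dimensions
print(len(mask_vec), len(face_vec))
```

Vectors of this form, paired with labels, could then be passed to any linear SVM implementation, for example `LinearSVC` from scikit-learn; the specific library is an assumption, as the text does not name one.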

E. Performance Evaluation Metrics 

The proposed model is evaluated using accuracy, precision, and recall, defined in (8) to (10).

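$$\text{Accuracy} = \frac{T_p + T_n}{T_p + T_n + F_p + F_n} \qquad (8)$$

$$\text{Precision} = \frac{T_p}{T_p + F_p} \qquad (9)$$

$$\text{Recall} = \frac{T_p}{T_p + F_n} \qquad (10)$$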

Tp, Tn, Fp, and Fn denote the numbers of true positives, true negatives, false positives, and
false negatives, respectively. True positives are images correctly classified as positive, whereas
false positives are images mistakenly classified as positive. True negatives are images correctly
classified as negative, while false negatives are images mistakenly classified as negative.
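
The three metrics in (8) to (10) follow directly from the four counts. The sketch below computes them for a set of hypothetical confusion counts that are not taken from the experiments in this paper.

```python
def metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall) as defined in Eqs. (8)-(10)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical confusion counts, for illustration only.
acc, prec, rec = metrics(tp=90, tn=85, fp=10, fn=15)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
```

Precision penalizes false positives and recall penalizes undetected cases, which is why both are reported alongside accuracy.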

III. Results and Discussion 

In this part, we run a thorough benchmark of our technique and five face mask detectors on five
well-known datasets. The main objective of this benchmark is to ascertain how these detectors
behave when identifying masked faces and in which cases they are likely to succeed or fail. We
then discuss, based on the experimental findings, how to build face detectors capable of handling
faces obscured by various kinds of masks.

All experimental tests were conducted on a laptop running Windows 10 with an AMD Ryzen 7 5700X
processor and 32 GB of RAM. The experiments were implemented and run in PyCharm with Python
3.9.12, using a variety of libraries, including OpenCV 3.0 and Darknet [17].

The suggested method was tested against various pre-existing models on the same datasets, and 
the findings are presented in this study. Five contemporary approaches [18][19][20][13][21] were 
chosen for this purpose. Table 1 compares several models with the suggested one using the accuracy 
metric.  

As shown in Table 1, the first four techniques (YOLO v4, R-CNN, ResNet50, and RetinaFaceMask) were
published in 2020 and 2021 and achieved accuracies between 84% and 89%. The fifth technique,
CenterFace, was published in 2022 and achieved 91%, the highest accuracy among the existing
methods. The last row is our proposed technique, which combines CNN and SVM and attains an
accuracy of 94%, higher than all the other recent methods. The last column of the table gives the
gain of our method over each technique: 3 percentage points over [21], 5 over [19], 6 over [13],
8 over [18], and 10 over [20].

Table 1. Accuracy evaluation of various techniques

Technique              Year   Accuracy   Improvement of ours
YOLO v4 [18]           2020   0.86       +0.08
R-CNN [19]             2021   0.89       +0.05
ResNet50 [20]          2021   0.84       +0.10
RetinaFaceMask [13]    2021   0.88       +0.06
CenterFace [21]        2022   0.91       +0.03
CNN and SVM (ours)     2022   0.94       -
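
The last column of Table 1 can be reproduced by subtracting each technique's accuracy from that of the proposed method. The sketch below does this with the accuracies listed in the table; the dictionary layout is only an illustrative choice.

```python
# Accuracies as listed in Table 1.
accuracy = {
    "YOLO v4 [18]": 0.86,
    "R-CNN [19]": 0.89,
    "ResNet50 [20]": 0.84,
    "RetinaFaceMask [13]": 0.88,
    "CenterFace [21]": 0.91,
}
ours = 0.94

# Absolute gain of the proposed method over each technique, rounded to
# two decimals to avoid floating-point noise.
gain = {name: round(ours - acc, 2) for name, acc in accuracy.items()}
for name, g in gain.items():
    print(f"{name}: +{g:.2f} ({g * 100:.0f} percentage points)")
```

The differences are absolute gains in accuracy, so a value of 0.08 corresponds to 8 percentage points.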

 

Table 2 compares several models with the suggested one using the precision metric. The precision
scores range from 87% to 93%, with our CNN and SVM technique achieving a precision of 92%. The
last column of the table shows the difference in precision between our method and each technique.
The results show that our model outperforms [18][19][20][13] and is one percentage point below
CenterFace [21], which attains the highest precision.

Table 3 compares several models with the suggested one using the recall metric. Results indicate 
that our strategy outperforms other recent approaches in terms of recall. Notably, the proposed model 
achieved a higher recall of 93%. The recall is improved by 1% [21], 2% [13], 3% [19], 5% [20], and 
6% [18]. 

Overall, Table 1, Table 2, and Table 3 compare several face mask detection techniques (YOLO v4,
R-CNN, ResNet50, RetinaFaceMask, and CenterFace) with the proposed combination of CNN and SVM.
Our proposed technique achieves the highest accuracy and recall among the techniques evaluated in
this study, and the second-highest precision.

IV. Conclusion 

Automatic identification of people not wearing face masks is a significant research problem. Using
CNN and SVM, we propose an accurate real-time technique for detecting face masks. The CNN extracts
features better suited for training the detection model, whereas the SVM is used for
classification. The findings show that our approach outperforms the other recent techniques
considered in this study in accuracy and recall. In the future, we will address the issue of
incorrectly worn face masks, making the identification system more intelligent.

 
Declarations  

Author contribution  

All authors contributed equally as the main contributor of this paper. All authors read and approved the final paper. 

Funding statement  

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.  

Conflict of interest  

The authors declare no known conflict of financial interest or personal relationships that could have appeared to influence 
the work reported in this paper.  

Additional information  

Reprints and permission information are available at http://journal2.um.ac.id/index.php/keds. 

Publisher’s Note: Department of Electrical Engineering - Universitas Negeri Malang remains neutral with regard to 
jurisdictional claims and institutional affiliations. 

Table 2. Precision evaluation of various techniques

Technique              Year   Precision   Improvement of ours
YOLO v4 [18]           2020   0.88        +0.04
R-CNN [19]             2021   0.91        +0.01
ResNet50 [20]          2021   0.87        +0.05
RetinaFaceMask [13]    2021   0.91        +0.01
CenterFace [21]        2022   0.93        -0.01
CNN and SVM (ours)     2022   0.92        -

 
Table 3. Recall evaluation of various techniques

Technique              Year   Recall   Improvement of ours
YOLO v4 [18]           2020   0.87     +0.06
R-CNN [19]             2021   0.90     +0.03
ResNet50 [20]          2021   0.88     +0.05
RetinaFaceMask [13]    2021   0.91     +0.02
CenterFace [21]        2022   0.92     +0.01
CNN and SVM (ours)     2022   0.93     -

 

 
References 

[1] C. Huang et al., “Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China,” Lancet, vol. 
395, no. 10223, pp. 497–506, Feb. 2020. 

[2] W. Hariri, “Efficient masked face recognition method during the COVID-19 pandemic,” Signal, Image Video 
Process., vol. 16, no. 3, pp. 605–612, Apr. 2022. 

[3] N. Zhu et al., “A Novel Coronavirus from Patients with Pneumonia in China, 2019,” N. Engl. J. Med., vol. 382, no. 
8, pp. 727–733, Feb. 2020. 

[4] WHO, “Infection prevention and control during health care when coronavirus disease (COVID-19) is suspected or 
confirmed,” WHO, 2021. (Access on 29 July 2022) 

[5] WHO, “Infection prevention and control of epidemic-and pandemic prone acute respiratory infections in health care,” 
WHO, 2014. (Access on 29 July 2022) 

[6] P. Nagrath, R. Jain, A. Madan, R. Arora, P. Kataria, and J. Hemanth, “SSDMNV2: A real time DNN-based face mask 
detection system using single shot multibox detector and MobileNetV2,” Sustain. Cities Soc., vol. 66, p. 102692, Mar. 
2021. 

[7] S. A. Sanjaya and S. Adi Rakhmawan, “Face Mask Detection Using MobileNetV2 in The Era of COVID-19
Pandemic,” in 2020 International Conference on Data Analytics for Business and Industry: Way Towards a
Sustainable Economy (ICDABI), Oct. 2020, pp. 1–5.

[8] G. Wu, “Masked Face Recognition Algorithm for a Contactless Distribution Cabinet,” Math. Probl. Eng., vol. 2021,
pp. 1–11, May 2021.

[9] G. Yang et al., “Face Mask Recognition System with YOLOV5 Based on Image Recognition,” in 2020 IEEE 6th 
International Conference on Computer and Communications (ICCC), Dec. 2020, pp. 1398–1404. 

[10] E. Ryumina, D. Ryumin, D. Ivanko, and A. Karpov, “A Novel Method for Protective Face Mask Detection using
Convolutional Neural Networks and Image Histogram,” Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci., vol.
XLIV-2/W1-, pp. 177–182, Apr. 2021.

[11] K. Anirudh, A. Ravi, V. S. Charan, and V. Chaurasiya, “Face Mask Detection Using Machine Learning,” in 2022
IEEE International Students’ Conference on Electrical, Electronics and Computer Science (SCEECS), Feb. 2022, pp.
1–5.

[12] G. K. J. Hussain, R. Priya, S. Rajarajeswari, P. Prasanth, and N. Niyazuddeen, “The Face Mask Detection Technology 
for Image Analysis in the Covid-19 Surveillance System,” J. Phys. Conf. Ser., vol. 1916, no. 1, p. 012084, May 2021. 

[13] X. Fan and M. Jiang, “RetinaFaceMask: A Single Stage Face Mask Detector for Assisting Control of the COVID-19 
Pandemic,” Conf. Proc. - IEEE Int. Conf. Syst. Man Cybern., pp. 832–837, 2021. 

[14] S. Yang, P. Luo, C. C. Loy, and X. Tang, “Wider Face: A Face Detection Benchmark,” in 2016 IEEE Conference on 
Computer Vision and Pattern Recognition (CVPR), Jun. 2016, pp. 5525–5533. 

[15] S. Ge, J. Li, Q. Ye, and Z. Luo, “Detecting Masked Faces in the Wild with LLE-CNNs,” in Proceedings of the IEEE 
Conference on Computer Vision and Pattern Recognition, 2017, pp. 2682–2690. 

[16] B. Huang et al., “Masked Face Recognition Datasets and Validation,” in 2021 IEEE/CVF International Conference 
on Computer Vision Workshops (ICCVW), Oct. 2021, pp. 1487–1491. 

[17] A. Farhadi, “Darknet: Open Source Neural Networks in C.” (Access on 29 July 2022)

[18] K. Bhambani, T. Jain, and K. A. Sultanpure, “Real-time Face Mask and Social Distancing Violation Detection System
using YOLO,” in 2020 IEEE Bangalore Humanitarian Technology Conference (B-HTC), Oct. 2020, pp. 1–6.

[19] J. Zhang, F. Han, Y. Chun, and W. Chen, “A Novel Detection Framework About Conditions of Wearing Face Mask 
for Helping Control the Spread of COVID-19,” IEEE Access, vol. 9, pp. 42975–42984, 2021. 

[20] S. Sethi, M. Kathuria, and T. Kaushik, “A Real-Time Integrated Face Mask Detector to Curtail Spread of 
Coronavirus,” Comput. Model. Eng. Sci., vol. 127, no. 2, pp. 389–409, 2021. 

[21] C. W. Yang, T. H. Phung, H. H. Shuai, and W. H. Cheng, “Mask or Non-Mask? Robust Face Mask Detector via
Triplet-Consistency Representation Learning,” ACM Trans. Multimed. Comput. Commun. Appl., vol. 18, no. 1s, pp.
1–19, 2022.

[1] https://doi.org/10.1016/s0140-6736(20)30183-5
[2] https://doi.org/10.1007/s11760-021-02050-w
[3] https://www.nejm.org/doi/10.1056/nejmoa2001017
[4] https://www.who.int/publications/i/item/WHO-2019-nCoV-IPC-2021.1
[5] https://www.who.int/publications-detail-redirect/infection-prevention-and-control-of-epidemic-and-pandemic-prone-acute-respiratory-infections-in-health-care
[6] https://doi.org/10.1016/j.scs.2020.102692
[7] https://doi.org/10.1109/ICDABI51230.2020.9325631
[8] https://doi.org/10.1155/2021/5591020
[9] https://doi.org/10.1109/ICCC51575.2020.9345042
[10] https://pdfs.semanticscholar.org/8d12/2012acbda5af6c3f88fce0b087c83070d619.pdf
[11] https://doi.org/10.1109/SCEECS54111.2022.9740913
[12] https://iopscience.iop.org/article/10.1088/1742-6596/1916/1/012084/meta
[13] https://doi.org/10.1109/SMC52423.2021.9659271
[14] https://doi.org/10.1109/CVPR.2016.596
[15] https://doi.org/10.1109/CVPR.2017.53
[16] https://doi.org/10.1109/ICCVW54120.2021.00172
[17] https://pjreddie.com/darknet/
[18] https://doi.org/10.1109/B-HTC50970.2020.9297902
[19] https://doi.org/10.1109/ACCESS.2021.3066538
[20] https://doi.org/10.32604/cmes.2021.014478
[21] https://doi.org/10.1145/3472623