

Engineering, Technology & Applied Science Research Vol. 13, No. 2, 2023, 10529-10534 10529  
 

www.etasr.com Gupta & Agarwal: Recognition of Suspicious Human Activity in Video Surveillance: A Review 

 
Recognition of Suspicious Human Activity in 
Video Surveillance: A Review 

 
Neha Gupta 

Computer Science & Engineering Department, IFTM University, India | Computer Science & 
Engineering Department, Moradabad Institute of Technology, India 
discoverneha@gmail.com 
(corresponding author)  
 
Bharat Bhushan Agarwal 

Computer Science & Engineering Department, School of Computer Science and Applications, IFTM 
University, India 
bharat_agarwal@iftmuniversity.ac.in 
 

Received: 11 February 2023 | Revised: 21 February 2023 | Accepted: 25 February 2023 

 
ABSTRACT 

Over the past few years, there has been a noticeable growth in the use of video surveillance systems,
which frequently function as integrated systems that remotely monitor key locations. In order to prevent
terrorism, theft, accidents, illegal parking, vandalism, fighting, chain snatching, and other crimes, human
activities can be observed through visual surveillance in sensitive and public places such as buses, trains,
airports, banks, shopping centers, schools, and colleges. This paper reviews the state of the art, showing
how the identification of suspicious behavior from surveillance recordings has developed over the past few
years. We give a brief overview of the issues and difficulties associated with recognizing suspicious human
activity. The purpose of this publication is to provide researchers in this field with a literature review of
several suspicious activity recognition systems along with their general structure.

Keywords: suspicious activity; video surveillance; human activity; deep learning

I. INTRODUCTION  

Suspicious Human Activity (SHA) recognition through video
surveillance is a well-known research topic in the fields of
image processing and computer vision, and involves classifying
human activity as normal or abnormal. Unusual or suspicious
behaviors, which people rarely display in public settings, can
be classed as abnormal. Today, more and more people use video
surveillance to monitor human activity and stop questionable
behavior. In recent years, human activity recognition has been
adopted in a wide variety of applications in the military,
intelligence, mass-transit agencies, and research and academia
as a measure to counter crime and terrorism, to monitor public
health, and to detect violent public protests and attacks [1-3].
Owing to technological improvements, the means of surveillance
are readily available, yet the means of continuously and
effectively processing, analyzing, and detecting events in the
footage are not. Intelligent systems that can detect and
classify suspicious human activities have become a crucial
means of triggering the necessary countermeasures, controlling
the situation, and/or supporting post-event analysis [4].

In general, SHA recognition involves identifying human
activities and classifying them as normal or abnormal. Normal
activities are the typical human behaviors that occur in public
spaces, such as hand-waving, applauding, jogging, boxing, and
walking. Abnormal activities include behaviors such as leaving
bags behind, dodging crowds, robbing, fighting and attacking,
vandalism, crossing borders, etc. [5]. The focus is to employ
efficient techniques/systems to identify abnormalities in human
activities, which can then be used to predict/decide the
appropriate course of action. Abnormalities can be of several
kinds and can be mixed with normal activities. On the one hand,
an abnormal act may occur within a normal one: while running, a
person might drop a bag with explosives, or a walking person
might point a weapon at another person. On the other hand,
normal activities can be incorrectly classified as abnormal,
e.g. a crowd running in a marathon should not be identified as
a negative behavior.

There is a wide range of activities to be identified in SHA
recognition [5], as shown in Figure 1. Explosives left by
terrorists in unattended objects in crowded or secluded places
involve a complex range of tasks that must be examined before
appropriate decisions can be made. Illegal parking on roads that
causes traffic jams or accidents requires quick decisions in
real-life applications. Violent activities such as street
fighting and vandalism also require quick action to minimize
loss of life and damage to property. Manual monitoring and
intervention fail to deal with such situations. Thus, it is
important to develop and install fully automatic, intelligent
surveillance systems that are efficient enough to handle
real-time scenarios. This survey details the different aspects
of developing such systems, discusses existing works, and
finally outlines future prospects.

 
Fig. 1.  Activity recognition. 

II. SOLUTION APPROACHES AND DISCUSSION 

The existing solutions for SHA recognition follow these
approaches:

• Object tracking and detection by background subtraction:
This is the most common initial phase in existing SHA
recognition solutions. In this phase, objects are identified
and tracked using the changes between successive frames, and
the foreground objects are then extracted. Tracking-based or
non-tracking-based methods are used to identify objects in
video frames [6].

• Feature extraction: Features of the objects, such as motion
and shape, are extracted using various algorithms in order to
identify them [7].

• Object classification: In this step, a method is used to
distinguish between the various entities in the video, such
as people, guns, cars, etc. Various methods are used,
including Support Vector Machines (SVM), Bayesian
classifiers, Haar cascade classifiers, k-Nearest Neighbors
(KNN), face recognition, and skin color detection.

• Suspicious activity detection: The final stage, after the
objects in the video stream have been classified, compares
the results against category-specific threshold values to
check for aberrant behavior.
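The detection and threshold-comparison stages listed above can be illustrated with a minimal sketch. It assumes grayscale frames stored as nested lists of intensities and a running-average background model; the learning rate `alpha`, the pixel threshold `diff_threshold`, and the frame-level rule `frame_is_flagged` with its `min_fraction` parameter are hypothetical placeholders chosen for illustration, not parameters of any reviewed system.

```python
from typing import List

Frame = List[List[float]]  # grayscale frame: rows of pixel intensities (0-255)
Mask = List[List[int]]     # binary mask: 1 = foreground, 0 = background


def update_background(background: Frame, frame: Frame, alpha: float = 0.05) -> Frame:
    """Running-average background model: B <- (1 - alpha) * B + alpha * F."""
    return [
        [(1.0 - alpha) * b + alpha * f for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


def foreground_mask(background: Frame, frame: Frame, diff_threshold: float = 30.0) -> Mask:
    """Background subtraction: a pixel is foreground if it differs enough from the model."""
    return [
        [1 if abs(f - b) > diff_threshold else 0 for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


def frame_is_flagged(mask: Mask, min_fraction: float = 0.2) -> bool:
    """Toy threshold stage (hypothetical): flag frames whose foreground area is large."""
    total = sum(len(row) for row in mask)
    return total > 0 and sum(map(sum, mask)) / total > min_fraction


# Usage: a 4x4 scene with a uniform background and a bright 2x2 object.
background = [[10.0] * 4 for _ in range(4)]
frame = [row[:] for row in background]
for y, x in [(1, 1), (1, 2), (2, 1), (2, 2)]:
    frame[y][x] = 200.0

mask = foreground_mask(background, frame)
flagged = frame_is_flagged(mask)  # 4 of 16 pixels are foreground
background = update_background(background, frame)
```

In a real system, feature extraction and object classification would sit between `foreground_mask` and the final test; the single area threshold here only stands in for the category-specific comparisons described above.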

What can constitute a suspicious event is discussed below.

• Loitering individuals: Loitering occurs when a person stays
at a particular location for longer than required. The
corresponding mathematical formulation depends on the type
of application [8].

• Unattended/missing objects: Suspicious objects have to be
detected in the video frames. However, identifying
abandoned/unattended objects in video frames of crowded
environments is very difficult. Different techniques have
been proposed to tackle this problem, such as frame
differencing, optical flow, and background subtraction [9,
10].

• Intruder detection: Intruder detection approaches use
multiple cameras for SHA analysis in cases such as bank
robberies, chain snatching, etc. Another proposed technique,
based on ontology, is also used in airports and banks to
recognize SHA. The optical flow technique is employed to
detect snatch theft in pedestrian video footage of crowd
movement.

• Abnormal activity/behavior: This includes fall detection,
accident detection, crowd detection, fire detection, etc.
Fall detection is predominantly used in health monitoring
systems, where static cameras are used for patient care in
hospitals and for monitoring elderly people who are
unattended at home [11].
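Loitering and unattended-object rules are often expressed as dwell-time thresholds, which is one way to make the application-dependent formulations mentioned above concrete. The sketch assumes that an upstream tracker (not shown) already provides each track as a list of `(timestamp_s, x, y)` observations; the rectangular region and the parameters `max_dwell_s`, `max_static_s`, `max_drift_px`, and `owner_radius_px` are hypothetical and would have to be tuned per application.

```python
import math
from typing import List, Sequence, Tuple

Observation = Tuple[float, float, float]    # (timestamp in seconds, x, y)
Region = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def longest_dwell(track: Sequence[Observation], region: Region) -> float:
    """Longest continuous time span during which the track stays inside the region."""
    x_min, y_min, x_max, y_max = region
    longest, entered_at = 0.0, None
    for t, x, y in track:
        if x_min <= x <= x_max and y_min <= y <= y_max:
            if entered_at is None:
                entered_at = t
            longest = max(longest, t - entered_at)
        else:
            entered_at = None
    return longest


def is_loitering(track: Sequence[Observation], region: Region, max_dwell_s: float = 60.0) -> bool:
    return longest_dwell(track, region) > max_dwell_s


def stationary_duration(track: Sequence[Observation], max_drift_px: float = 5.0) -> float:
    """Time the object has stayed within max_drift_px of its latest position."""
    if not track:
        return 0.0
    t_last, x_last, y_last = track[-1]
    t_start = t_last
    for t, x, y in reversed(track):
        if math.hypot(x - x_last, y - y_last) > max_drift_px:
            break
        t_start = t
    return t_last - t_start


def is_unattended(object_track: Sequence[Observation],
                  people: List[Tuple[float, float]],
                  max_static_s: float = 30.0,
                  owner_radius_px: float = 50.0) -> bool:
    """Static for too long and no person currently within owner_radius_px of it."""
    if stationary_duration(object_track) <= max_static_s:
        return False
    _, ox, oy = object_track[-1]
    return all(math.hypot(px - ox, py - oy) > owner_radius_px for px, py in people)


# Usage with made-up tracks.
person = [(float(t), 100.0, 100.0) for t in range(0, 91, 10)]  # stands still for 90 s
bag = [(0.0, 10.0, 10.0), (5.0, 300.0, 300.0), (10.0, 300.0, 301.0), (50.0, 300.0, 300.0)]
loitering = is_loitering(person, (50.0, 50.0, 150.0, 150.0))
```

The same "stayed too long" idea serves both rules: a person lingering inside a region, and an object that stops moving while no one remains near it.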

III. LITERATURE REVIEW 

In this section we discuss the relevant works regarding 
SHA recognition. 

The author in [12] considered camera surveillance systems that
simultaneously capture dynamic images of several areas using
multiple web cameras. These images have to be checked against an
enormous number of other dynamic images, which makes the process
very tedious for the observer, and if the observer misses some of
the images, the system fails. Authors in [13] proposed a
lightweight Deep Convolutional Neural Network (DCNN) called
DIAT-RadHARNet for radar-based SHA classification. The scheme is
built on the following concepts: depthwise separable
convolutions, importance-based channel weighting (CHW), multiple
filter sizes in the depthwise section, and kernels of various
sizes applied to the same input tensor. Authors in [14] proposed
an object detection algorithm for video surveillance and
real-time security camera systems. To this end, a modified
version of the Kanade-Lucas-Tomasi (KLT) feature tracker was
presented for object tracking. The proposed scheme detects and
analyzes a small number of features instead of a large number of
objects, and noise is handled with a Kalman filter. Authors in
[15] proposed a real-time SHA recognition scheme, useful in a
wide range of surveillance areas, based on a Convolutional Neural
Network (CNN) and 2D pose estimation. Skeletal images of humans
are extracted from the input video frames through 2D pose
estimation and are then fed to a pre-trained CNN, which
categorizes them into different human activities, such as fall
or no fall and trespassing or no trespassing. Finally, based on
the classification, appropriate action can be taken, such as
sending messages to mobile phones or triggering alarms. Authors
in [16] proposed an SHA detection technique based on YOLOv3 to
handle the complex problem of human detection. The video is
first converted into frames, and each frame is then analyzed to
detect any suspicious activity.
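Authors in [14] handle measurement noise with a Kalman filter. As an illustration, a generic constant-velocity Kalman filter for one coordinate of a tracked feature point is sketched below; it is the textbook formulation, not the implementation of [14], and the frame interval `dt`, the noise variances `process_var` and `meas_var`, and the initial covariance are hypothetical tuning values.

```python
from typing import List


class ConstantVelocityKalman1D:
    """Kalman filter for one coordinate with state [position, velocity]."""

    def __init__(self, initial_pos: float, dt: float = 1.0,
                 process_var: float = 1e-2, meas_var: float = 4.0) -> None:
        self.dt = dt
        self.x = [initial_pos, 0.0]              # state estimate
        self.P = [[meas_var, 0.0], [0.0, 1.0]]   # state covariance
        q = process_var                          # white-acceleration noise model
        self.Q = [[q * dt**4 / 4, q * dt**3 / 2],
                  [q * dt**3 / 2, q * dt**2]]
        self.R = meas_var                        # measurement noise variance

    def predict(self) -> float:
        """x <- F x and P <- F P F^T + Q, with F = [[1, dt], [0, 1]]."""
        dt = self.dt
        pos, vel = self.x
        self.x = [pos + dt * vel, vel]
        (a, b), (c, d) = self.P
        fpf = [[a + dt * (b + c) + dt * dt * d, b + dt * d],
               [c + dt * d, d]]
        self.P = [[fpf[i][j] + self.Q[i][j] for j in range(2)] for i in range(2)]
        return self.x[0]

    def update(self, z: float) -> float:
        """Correct the prediction with a position measurement z (H = [1, 0])."""
        residual = z - self.x[0]
        s = self.P[0][0] + self.R                     # innovation variance
        k0, k1 = self.P[0][0] / s, self.P[1][0] / s   # Kalman gain
        self.x = [self.x[0] + k0 * residual, self.x[1] + k1 * residual]
        (a, b), (c, d) = self.P
        self.P = [[(1 - k0) * a, (1 - k0) * b],       # P <- (I - K H) P
                  [c - k1 * a, d - k1 * b]]
        return self.x[0]


# Usage: follow a point moving 2 pixels per frame from noise-free detections.
kf = ConstantVelocityKalman1D(initial_pos=0.0)
estimates: List[float] = []
for t in range(1, 51):
    kf.predict()
    estimates.append(kf.update(2.0 * t))
```

A tracker would run one such filter per coordinate (or a single 4-state filter); with noisy detections, the ratio of `process_var` to `meas_var` sets how strongly the estimates are smoothed.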


In [17], the authors handled the complexity of associating
skeleton inputs with the recognition of both abnormal and daily
human activities by reducing the number of dataset features
through an RNN-based approach. Moreover, an LSTM model was used
to classify suspicious activities in medical applications. In
[18], a hierarchical approach was used to detect different SHAs,
such as fainting, loitering, and unauthorized entry. First, all
SHAs are defined using a semantic approach. Next, background
subtraction is employed to detect the objects, and finally a
correlation technique is used to track them. Authors in [19]
based their method on two features: shape moments and the
Histogram of Normalized Distances (HND) between the center of
gravity of the object shape and its contour points. A Naive
Bayes classifier and a multi-class Support Vector Machine were
used to categorize human activities. Authors in [20] focused on
detecting gun-based crimes and abandoned luggage using a deep
neural network model capable of detecting handguns in images. In
[21], the authors presented an SHA detection and tracking system
based on artificial neural networks, which uses the silhouette
pattern of the human blob obtained by segmenting the
camera-captured video. In [22], the authors combined a Bi-LSTM
network with Skeleton Activity Forecasting (SAF) to create a deep
learning-based SHA recognition system. Pose estimation in this
case uses the human skeletal joints, treated as points, and
skeleton tracking and points of interest are estimated on
streaming video from a networked IP camera.
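The HND feature of [19] can be sketched as follows. This is one plausible reading under stated assumptions: the center of gravity is approximated by the mean of the contour points, distances are normalized by the largest distance, and the bin count is a hypothetical choice, since the exact definition in [19] is not reproduced here. Shape moments, the other feature of [19], are not covered.

```python
import math
from typing import List, Sequence, Tuple


def normalized_distance_histogram(contour: Sequence[Tuple[float, float]],
                                  bins: int = 8) -> List[float]:
    """Histogram of contour-point distances to the shape centroid, scaled by the largest distance."""
    if not contour:
        raise ValueError("contour must contain at least one point")
    # Assumption: the center of gravity is approximated by the mean of the contour points.
    cx = sum(x for x, _ in contour) / len(contour)
    cy = sum(y for _, y in contour) / len(contour)
    distances = [math.hypot(x - cx, y - cy) for x, y in contour]
    d_max = max(distances)
    hist = [0.0] * bins
    for d in distances:
        # Normalizing by d_max makes the feature independent of object size.
        ratio = d / d_max if d_max > 0 else 0.0
        hist[min(int(ratio * bins), bins - 1)] += 1.0
    return [count / len(distances) for count in hist]


# Usage: eight points on the outline of a 2x2 square, and the same square scaled by 10.
square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
hnd = normalized_distance_histogram(square)
hnd_scaled = normalized_distance_histogram([(10 * x, 10 * y) for x, y in square])
```

The resulting fixed-length vector is the kind of input a Naive Bayes or multi-class SVM classifier can consume directly, and the scaled square yields the same histogram as the original.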

In [23], a deep learning-based approach is used for SHA
recognition: features are first computed from the video frames,
and a feature classifier then predicts whether the activity is
suspicious. In [24], the videos were first broken down into
frames, which were analyzed using background subtraction to
detect humans. After a CNN was used to extract features from the
frames, a Discriminative Deep Belief Network (DDBN) was applied.
The DDBN is also given labeled videos of various SHAs, from which
features are also extracted. The two sets of features are then
compared to determine whether the behavior is normal or
suspicious. In [25], a 63-layer deep CNN model called
L4-Branched-ActionNet is proposed. The framework is first
trained on the CIFAR-100 image classification dataset using the
SoftMax function. The SHA recognition dataset is then fed to the
trained model to obtain a set of features, which are encoded
with an entropy-based scheme and optimized using an ant colony
system. To produce the final results, the optimized features are
fed to a variety of SVM- and KNN-based classifiers. Authors in
[26] proposed an algorithm for tracking a moving object in video
samples captured during its movement. The method consists of
multiple subparts, including video sample creation, an
experimental setup for capturing the data, and object tracking
with the mean shift algorithm. The system can be used to
calculate vehicle speed, the number of passing vehicles, etc.,
and it has been tested at multiple frame rates. The results were
not compared with existing works.
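The mean shift tracking used in [26] rests on a simple iteration: move a search window to the weighted centroid of the pixels it covers until it stops moving. The sketch below runs that iteration on a 2D weight map, for example a per-pixel likelihood that a pixel belongs to the target; the flat square window, `half_size`, `max_iter`, and `tol` are assumptions made for illustration, not details taken from [26].

```python
import math
from typing import List, Tuple


def mean_shift(weights: List[List[float]], start: Tuple[float, float],
               half_size: int = 2, max_iter: int = 20,
               tol: float = 0.5) -> Tuple[float, float]:
    """Shift a square window to the weighted centroid of the weights it covers until it settles."""
    rows, cols = len(weights), len(weights[0])
    cy, cx = start
    for _ in range(max_iter):
        # Window of (2 * half_size + 1) pixels per side, clipped to the image.
        y0, y1 = max(0, round(cy) - half_size), min(rows - 1, round(cy) + half_size)
        x0, x1 = max(0, round(cx) - half_size), min(cols - 1, round(cx) + half_size)
        total = sum_y = sum_x = 0.0
        for y in range(y0, y1 + 1):
            for x in range(x0, x1 + 1):
                w = weights[y][x]
                total += w
                sum_y += w * y
                sum_x += w * x
        if total == 0.0:
            break  # no target evidence under the window
        new_cy, new_cx = sum_y / total, sum_x / total
        shift = math.hypot(new_cy - cy, new_cx - cx)
        cy, cx = new_cy, new_cx
        if shift < tol:
            break
    return cy, cx


# Usage: a 10x10 likelihood map with a uniform 3x3 target centered at (7, 7).
likelihood = [[0.0] * 10 for _ in range(10)]
for y in range(6, 9):
    for x in range(6, 9):
        likelihood[y][x] = 1.0
center = mean_shift(likelihood, start=(4.0, 4.0))
```

In a tracker, the converged position from one frame becomes the `start` for the next, so only a small neighborhood is searched per frame.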

Suspicious activity detection could be a major breakthrough in
video surveillance for behavior identification, action
recognition, activity classification, etc. Authors in [27]
demonstrated the usability of automated surveillance systems.
Surveillance plays an important role in maintaining law and
order and in detecting possible threats. This process is usually
performed manually and requires significant manpower, which can
be reduced through automation. Authors in [28] explored
Situation Awareness (SAW), which is used in many crucial
applications. The main challenge in SAW arises from rapid
changes while objects are being identified, and it becomes
harder when large volumes of video frames must be processed. The
developed system is query-based and selects content according to
the user's interest; such interest-based attributes can be more
intricate than facial features in some situations. The
L4-Branched-ActionNet of [25] modifies AlexNet with four branched
sub-structures and is pre-trained on CIFAR-100; its features are
entropy-coded and optimized with ant colony optimization before
classification with different SVM and KNN variants, among which
Cubic SVM achieved the best result (0.99). Several machine
learning approaches, such as those described in [29-37], can be
used in a similar way. Authors in [38] proposed a classroom
activity detection approach based on video surveillance, tested
in a real environment using a Siamese neural network to classify
classroom recordings. Its drawbacks are that it is difficult to
differentiate between students and the teacher and to develop a
generalized model for all scenarios; moreover, real surveillance
data can be noisy and of low quality. Authors in [39] also worked
to improve classroom learning outcomes. A brief comparative
summary of the relevant works is shown in Table I. Similar work
has been done in [40-42].
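Several of the reviewed works, including [25], feed extracted feature vectors to KNN classifiers. A minimal k-nearest-neighbor classifier with Euclidean distance and majority voting is sketched below; the two-dimensional feature vectors and the "normal"/"suspicious" labels in the usage example are made up for illustration and do not come from any reviewed dataset.

```python
import math
from collections import Counter
from typing import Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, class label)


def knn_predict(train: Sequence[Sample], query: Sequence[float], k: int = 3) -> str:
    """Majority vote among the k training samples nearest to the query (Euclidean distance)."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training samples")
    nearest = sorted(train, key=lambda sample: math.dist(sample[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    # most_common() keeps first-encountered order among equal counts, so a tie
    # goes to the label that appears first in the distance-sorted neighbors.
    return votes.most_common(1)[0][0]


# Usage with made-up two-dimensional feature vectors.
train = [
    ((0.0, 0.0), "normal"), ((0.0, 1.0), "normal"), ((1.0, 0.0), "normal"),
    ((5.0, 5.0), "suspicious"), ((5.0, 6.0), "suspicious"), ((6.0, 5.0), "suspicious"),
]
label = knn_predict(train, (5.5, 5.2))
```

KNN needs no training phase, but every prediction scans the whole training set, which matters for real-time surveillance on large feature sets.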

IV. MOTIVATION AND CHALLENGES 

The main motivation for adopting smart and intelligent video
surveillance systems is to identify human activities that are
suspicious in nature. In various highly sensitive places, these
systems can help prevent thefts, terrorist acts, hooliganism,
fights, attacks, and fires. Intelligent video surveillance
protects the following areas against suspicious behavior [7]:

• Universities and other academic institutions use video
surveillance to monitor student activity and to protect
property from theft and vandalism. It also helps prevent
student fights and inappropriate behavior, and it can be
used during exams.

• Video surveillance helps protect the population and public
infrastructure, including borders, laboratories, jails,
military bases, temples, etc., by preventing theft,
vandalism, fighting, and explosive attacks, and by
identifying disease and growing crowds.

• The use of video surveillance to identify SHA is expanding
in the retail sector, both for internal security in places
such as warehouses and stores and for external security in
places such as parking lots. Even small businesses use
cameras to monitor people and record video evidence in case
of theft or an incident.

• In every nation, the safety of travelers, runways, and
aircraft is paramount at airports, which are highly
sensitive security zones. A real-time system that detects
SHA in video monitoring provides such places with a high
level of protection.

• In the banking industry, video monitoring is crucial for
ensuring security. The presence of cameras deters the use of
weapons during armed robberies and assaults. Automated
banking devices are frequently targeted by criminals; a
security camera can, for example, help identify fraud by
capturing the installation of a device that reads the
magnetic information on bank cards.

• Recognizing suspicious behavior from video surveillance
can help uncover fraud, heists, and other crimes. Because
monitoring a casino requires observing human movement in a
congested setting, intelligent video surveillance is a
valuable technique for assisting security officers.

• In hospitals, video surveillance is also used to monitor
patients. At home, it can be used to look after elderly
people or children.

TABLE I.  SHA LITERATURE REVIEW SUMMARY 

Ref | Object detection method | Noise removal method | Remarks
[12] | Background subtraction | Smoothing filter | Finds the detection point of SHA and evaluates the corresponding degree of risk
[13] | Lightweight deep CNN | Spatial filter | Efficient classification with high accuracy
[14] | KLT and Kalman filter | - | Efficient real-time tracking
[15] | 2D pose estimation and CNN | - | Effective response system
[16] | YOLOv3 | - | Quick processing, accurate detection
[17] | Multilayer LSTM network | - | Less training data, better cross-view and cross-subject evaluation
[18] | Background subtraction | A thresholding technique | Less complexity, high accuracy
[19] | Naïve Bayes classifier | - | Effective and accurate detection and prediction
[20] | Deep neural network | - | Gun-crime and abandoned luggage detection
[21] | ANN | - | High accuracy and robustness
[22] | Bi-LSTM network | Adaptive thresholding | Skeleton tracking
[23] | CNN and RNN | - | Apriori detection, simple, yet powerful
[24] | CNN and DDBN | - | High accuracy
[25] | Entropy-coded ant colony system | - | High accuracy
[26] | Mean shift algorithm | - | Efficient object tracking; box and image sequence segmentation
[27] | Faster region-based CNN Inception V2 framework | - | SHA detection in public places
[28] | I-ViSE and deep neural networks | Smoothing filter | Emphasis on ensuring that the analyzed images have significant content and good quality
[38] | Siamese neural framework | Textual windows across segments | Superior prediction performance for online and offline classrooms
[39] | Deep fully connected convolutional and recurrent neural network | - | Precise estimates of the total time spent in classes or activities

To develop an intelligent video surveillance system that can
automatically detect SHA, several issues and challenges must be
overcome [7]:

• Accurately recognizing moving objects is challenging because
of dynamic fluctuations in natural environments, such as
slow illumination changes caused by the transition from day
to night and sudden illumination changes caused by weather.

• A shadow alters the appearance of an object, making it
difficult to track and identify the specific object in a
video. Shape, movement, and background characteristics
become less reliable when a shadow is present.

• Noise, such as that caused by swaying tree branches, can
make it difficult to identify an object in a video.

• Finding objects in a busy location is difficult, and
discovering abandoned objects, theft, or violence in such
circumstances is quite challenging.

• Objects in a video are occasionally fully or partially
occluded, which makes identification difficult.

• Identifying foreground objects in low-resolution videos can
be quite difficult, since object boundaries become extremely
hard to determine.

• Building a real-time intelligent monitoring system is an
even more difficult task, as processing time increases when
foreground objects are extracted and tracked in videos with
complex backgrounds.

• Since background subtraction in abandoned object detection
only recognizes moving objects as foreground, static object
detection is a difficult task.

V. MEASURES TO DETECT SHA 

In general, the measures used to detect SHA follow a
hierarchical structure, as shown in Figure 2. First, the video
is split into a sequence of frames. Background subtraction is
then used to detect changes across the frames and to extract
foreground objects. Next, the system attempts to extract the
different types of objects. Obstructions such as noise and
shadows are then removed from the frame containing the object.
Finally, the recognizable object is obtained from the
surveillance footage. Figure 3 details the general steps in
entity (object) detection for SHA recognition, the first of
which is identifying the different kinds of events to be
detected. To stop terrorist bomb attacks, suspicious activity
detection also includes the detection of abandoned objects.
Background subtraction approaches in video surveillance treat
stationary items as background and moving items as foreground.
Therefore, a newly arrived object is absorbed into the
background once it becomes static.
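The absorption effect described above can also be exploited for abandoned-object detection. One common workaround, not one of the reviewed methods, keeps two running-average backgrounds: a fast-adapting short-term model and a slow-adapting long-term model. A pixel that is foreground with respect to the long-term model but already background with respect to the short-term model belongs to an object that has recently become static, such as a left bag. The sketch assumes grayscale frames as nested lists, and the learning rates and threshold are hypothetical.

```python
from typing import List

Frame = List[List[float]]
Mask = List[List[int]]


def blend(model: Frame, frame: Frame, alpha: float) -> Frame:
    """Running-average update: M <- (1 - alpha) * M + alpha * F."""
    return [[(1.0 - alpha) * m + alpha * f for m, f in zip(m_row, f_row)]
            for m_row, f_row in zip(model, frame)]


class DualBackground:
    """Short-term and long-term background models (hypothetical learning rates)."""

    def __init__(self, first_frame: Frame, alpha_short: float = 0.5,
                 alpha_long: float = 0.01, threshold: float = 30.0) -> None:
        self.short = [row[:] for row in first_frame]  # absorbs static objects within a few frames
        self.long = [row[:] for row in first_frame]   # absorbs them only after hundreds of frames
        self.alpha_short = alpha_short
        self.alpha_long = alpha_long
        self.threshold = threshold

    def update(self, frame: Frame) -> Mask:
        """Update both models and return a mask of recently static foreground pixels."""
        self.short = blend(self.short, frame, self.alpha_short)
        self.long = blend(self.long, frame, self.alpha_long)
        th = self.threshold
        return [
            [1 if abs(f - lg) > th and abs(f - sh) <= th else 0
             for f, sh, lg in zip(f_row, s_row, l_row)]
            for f_row, s_row, l_row in zip(frame, self.short, self.long)
        ]


# Usage: a bag (bright pixel) appears in an empty 3x3 scene and stays there.
empty = [[10.0] * 3 for _ in range(3)]
with_bag = [row[:] for row in empty]
with_bag[0][0] = 200.0
model = DualBackground(empty)
for _ in range(10):
    static_mask = model.update(with_bag)
```

An object that is present for only one frame is not flagged, because the short-term model has not yet absorbed it; eventually the long-term model absorbs the bag as well, which bounds how long the alarm condition lasts.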

 
Fig. 2.  General steps in entity detection. 

 
Fig. 3.  Specifications regarding the general steps in entity (object) 
detection in SHA recognition. 

Finding foreground objects that are free from noise,
illumination effects, or shadows is incredibly challenging in
computer vision. Noise makes it difficult to identify an object,
illumination changes produce false detections, and shadows alter
the object's appearance, making object tracking particularly
challenging. The right features must be chosen for video
surveillance to detect anomalous activity automatically. The
main goal of feature extraction is to find the most valuable
information in the captured video. After determining whether a
frame contains moving or still foreground objects, the object
categorization stage determines whether the behavior is normal
or abnormal. For example, this stage must distinguish a static
human from a static abandoned object, a fight from a boxing
match, a face from another skin-colored object, and fire from a
flashlight, the sun, or any artificial light source.
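For the shadow problem mentioned above, one widely used heuristic compares each foreground pixel with the corresponding background pixel in HSV space: a cast shadow darkens a pixel (lower value V) while leaving its hue and saturation roughly unchanged. The sketch follows that idea using Python's `colorsys` module; it is not a method taken from the reviewed works, and the bounds `v_low`, `v_high`, `s_tol`, and `h_tol` are hypothetical values that would need tuning.

```python
import colorsys
from typing import Tuple

RGB = Tuple[int, int, int]  # 8-bit red, green, blue


def is_shadow_pixel(pixel: RGB, background: RGB, v_low: float = 0.4, v_high: float = 0.9,
                    s_tol: float = 0.15, h_tol: float = 0.1) -> bool:
    """True if the pixel looks like a darkened version of the background pixel."""
    h_p, s_p, v_p = colorsys.rgb_to_hsv(*(c / 255.0 for c in pixel))
    h_b, s_b, v_b = colorsys.rgb_to_hsv(*(c / 255.0 for c in background))
    if v_b == 0.0:
        return False  # a black background pixel cannot be darkened further
    ratio = v_p / v_b
    hue_diff = abs(h_p - h_b)
    hue_diff = min(hue_diff, 1.0 - hue_diff)  # hue is circular in [0, 1)
    return v_low <= ratio <= v_high and abs(s_p - s_b) <= s_tol and hue_diff <= h_tol


# Usage: a shadow keeps the background's color but is darker; an object usually is not.
background_px = (200, 100, 50)
shadow = is_shadow_pixel((120, 60, 30), background_px)      # same hue, 60% brightness
blue_object = is_shadow_pixel((50, 100, 200), background_px)
```

Foreground pixels classified as shadow would be removed from the mask before object classification, so that the object's shape is not distorted.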

VI. CONCLUSION AND FUTURE WORK 

The current paper reviews various intelligent and automatic
SHA recognition surveillance techniques. The reviewed papers
cover a wide range of applications associated with real-time
implementation, ranging from health-care systems to perimeter
monitoring on battlefields. The techniques and datasets used are
the most common ones in the SHA domain. Recognizing human
behavior is a complex process, and monitoring and analyzing
human activities is difficult. Despite its complexity, the
problem must be addressed in day-to-day life to improve safety
in modern society. On the other hand, recognizing suspicious
activity involving non-living things is relatively less complex
but requires a great deal of expertise to deduce correct
results. We discussed related works and mentioned their
shortcomings. Addressing these weaknesses offers significant
room for improvement and paves the way for a wide range of open
research.

REFERENCES

[1] J. Candamo, M. Shreve, D. B. Goldgof, D. B. Sapper, and R. Kasturi, 
"Understanding Transit Scenes: A Survey on Human Behavior-
Recognition Algorithms," IEEE Transactions on Intelligent 
Transportation Systems, vol. 11, no. 1, pp. 206–224, Mar. 2010, 
https://doi.org/10.1109/TITS.2009.2030963. 

[2] S. A. Shah and F. Fioranelli, "RF Sensing Technologies for Assisted 
Daily Living in Healthcare: A Comprehensive Review," IEEE 
Aerospace and Electronic Systems Magazine, vol. 34, no. 11, pp. 26–44, 
Aug. 2019, https://doi.org/10.1109/MAES.2019.2933971. 

[3] O. D. Lara and M. A. Labrador, "A Survey on Human Activity
Recognition using Wearable Sensors," IEEE Communications Surveys &
Tutorials, vol. 15, no. 3, pp. 1192–1209, 2013,
https://doi.org/10.1109/SURV.2012.110112.00192.

[4] W. Huang, L. Zhang, W. Gao, F. Min, and J. He, "Shallow
Convolutional Neural Networks for Human Activity Recognition Using
Wearable Sensors," IEEE Transactions on Instrumentation and
Measurement, vol. 70, pp. 1–11, 2021,
https://doi.org/10.1109/TIM.2021.3091990.

[5] R. K. Tripathi, A. S. Jalal, and S. C. Agrawal, "Suspicious human
activity recognition: a review," Artificial Intelligence Review, vol. 50,
no. 2, pp. 283–339, Aug. 2018,
https://doi.org/10.1007/s10462-017-9545-7.

[6] J. M. McHugh, J. Konrad, V. Saligrama, and P.-M. Jodoin, 
"Foreground-Adaptive Background Subtraction," IEEE Signal 
Processing Letters, vol. 16, no. 5, pp. 390–393, Feb. 2009, 
https://doi.org/10.1109/LSP.2009.2016447. 

[7] A. Yilmaz, O. Javed, and M. Shah, "Object tracking: A survey," ACM 
Computing Surveys, vol. 38, no. 4, Sep. 2006, Art. no. 13, 
https://doi.org/10.1145/1177352.1177355. 

[8] S. Patil and K. Talele, "Suspicious movement detection and tracking 
based on color histogram," in International Conference on 
Communication, Information & Computing Technology, Mumbai, India, 
Jan. 2015, pp. 1–6, https://doi.org/10.1109/ICCICT.2015.7045698. 

[9] T. Y. Lai, J. Y. Kuo, C.-H. Liu, Y. W. Wu, Y.-Y. Fanjiang, and S.-P. 
Ma, "Intelligent Detection of Missing and Unattended Objects in 
Complex Scene of Surveillance Videos," in International Symposium on 
Computer, Consumer and Control, Taichung, Taiwan, Jun. 2012, pp. 
662–665, https://doi.org/10.1109/IS3C.2012.172. 

[10] T. T. Zin, P. Tin, H. Hama, and T. Toriu, "Unattended object intelligent 
analyzer for consumer video surveillance," IEEE Transactions on 
Consumer Electronics, vol. 57, no. 2, pp. 549–557, Feb. 2011, 
https://doi.org/10.1109/TCE.2011.5955191. 

[11] K. K. Verma, B. M. Singh, and A. Dixit, "A review of supervised and 
unsupervised machine learning techniques for suspicious behavior 
recognition in intelligent surveillance system," International Journal of 
Information Technology, vol. 14, no. 1, pp. 397–410, Feb. 2022, 
https://doi.org/10.1007/s41870-019-00364-0. 

[12] M. Takai, "Detection of suspicious activity and estimate of risk from
human behavior shot by surveillance camera," in Second World
Congress on Nature and Biologically Inspired Computing, Kitakyushu,
Japan, Dec. 2010, pp. 298–304,
https://doi.org/10.1109/NABIC.2010.5716350.


[13] M. Chakraborty, H. C. Kumawat, S. V. Dhavale, and A. B. Raj A., 
"DIAT-RadHARNet: A Lightweight DCNN for Radar Based 
Classification of Human Suspicious Activities," IEEE Transactions on 
Instrumentation and Measurement, vol. 71, pp. 1–10, 2022, 
https://doi.org/10.1109/TIM.2022.3154832. 

[14] S. Nandyal and S. Angadi, "Recognition of Suspicious Human Activities
using KLT and Kalman Filter for ATM Surveillance System," in
International Conference on Innovative Practices in Technology and
Management, Noida, India, Feb. 2021, pp. 174–179,
https://doi.org/10.1109/ICIPTM52218.2021.9388322.

[15] A. S. Dileep, S. S. Nabilah, S. Sreeju, K. Farhana, and S. Surumy,
"Suspicious Human Activity Recognition using 2D Pose Estimation and
Convolutional Neural Network," in International Conference on
Wireless Communications Signal Processing and Networking, Chennai,
India, Mar. 2022, pp. 19–23,
https://doi.org/10.1109/WiSPNET54241.2022.9767152.

[16] N. Bordoloi, A. K. Talukdar, and K. K. Sarma, "Suspicious Activity 
Detection from Videos using YOLOv3," in 17th India Council 
International Conference, New Delhi, India, Dec. 2020, pp. 1–5, 
https://doi.org/10.1109/INDICON49873.2020.9342230. 

[17] R. Nale, M. Sawarbandhe, N. Chegogoju, and V. Satpute, "Suspicious 
Human Activity Detection Using Pose Estimation and LSTM," in 
International Symposium of Asian Control Association on Intelligent 
Robotics and Industrial Automation, Goa, India, Sep. 2021, pp. 197–
202, https://doi.org/10.1109/IRIA53009.2021.9588719. 

[18] U. M. Kamthe and C. G. Patil, "Suspicious Activity Recognition in 
Video Surveillance System," in Fourth International Conference on 
Computing Communication Control and Automation, Pune, India, Aug. 
2018, pp. 1–6, https://doi.org/10.1109/ICCUBEA.2018.8697408. 

[19] H. Samir, H. E. Abd El Munim, and G. Aly, "Suspicious Human 
Activity Recognition using Statistical Features," in 13th International 
Conference on Computer Engineering and Systems, Cairo, Egypt, Dec. 
2018, pp. 589–594, https://doi.org/10.1109/ICCES.2018.8639457. 

[20] S. Loganathan, G. Kariyawasam, and P. Sumathipala, "Suspicious 
Activity Detection in Surveillance Footage," in International Conference 
on Electrical and Computing Technologies and Applications, Ras Al 
Khaimah, United Arab Emirates, Nov. 2019, pp. 1–4, 
https://doi.org/10.1109/ICECTA48151.2019.8959600. 

[21] M. K. Fiaz and B. Ijaz, "Vision based human activity tracking using 
artificial neural networks," in International Conference on Intelligent 
and Advanced Systems, Kuala Lumpur, Malaysia, Jun. 2010, pp. 1–5, 
https://doi.org/10.1109/ICIAS.2010.5716186. 

[22] D. Kumar and S. R. Sailaja, "Abnormal Activity Recognition using Deep
Learning in Streaming Video for Indoor Application," in ITU
Kaleidoscope: Connecting Physical and Virtual Worlds, Geneva,
Switzerland, Dec. 2021, pp. 1–7,
https://doi.org/10.23919/ITUK53220.2021.9662095.

[23] C. V. Amrutha, C. Jyotsna, and J. Amudha, "Deep Learning Approach
for Suspicious Activity Detection from Surveillance Video," in 2nd
International Conference on Innovative Mechanisms for Industry
Applications, Bangalore, India, Mar. 2020, pp. 335–339,
https://doi.org/10.1109/ICIMIA48430.2020.9074920.

[24] B. A. Alavudeen, P. Parthasarathy, and S. Vivekanandan, "Detection of 
Suspicious Human Activity based on CNN-DBNN Algorithm for Video 
Surveillance Applications," in Innovations in Power and Advanced 
Computing Technologies, Vellore, India, Mar. 2019, vol. 1, pp. 1–7, 
https://doi.org/10.1109/i-PACT44901.2019.8960085. 

[25] T. Saba, A. Rehman, R. Latif, S. M. Fati, M. Raza, and M. Sharif, 
"Suspicious Activity Recognition Using Proposed Deep L4-Branched-
Actionnet With Entropy Coded Ant Colony System Optimization," IEEE 
Access, vol. 9, pp. 89181–89197, 2021,
https://doi.org/10.1109/ACCESS.2021.3091081.

[26] G. Mathur, D. Somwanshi, and M. M. Bundele, "Intelligent Video 
Surveillance based on Object Tracking," in 3rd International Conference 
and Workshops on Recent Advances and Innovations in Engineering, 
Jaipur, India, Nov. 2018, pp. 1–6,
https://doi.org/10.1109/ICRAIE.2018.8710421.

[27] R. Srinath, J. Vrindavanam, V. P. Vasudev, S. Supreeth, H. Raj, and A. 
Kesarwani, "A Machine Learning Approach for Localization of
Suspicious Objects using Multiple Cameras," in IEEE International
Conference for Innovation in Technology, Bengaluru, India, Nov. 2020,
pp. 1–6, https://doi.org/10.1109/INOCON50539.2020.9298364. 

[28] S. Yahya Nikouei, Y. Chen, A. Aved, and E. Blasch, "I-ViSE: 
Interactive Video Surveillance as an Edge Service using Unsupervised 
Feature Queries," Mar. 2020, https://doi.org/10.48550/arXiv.2003.04169.

[29] K. R. Kodepogu et al., "A Novel Deep Convolutional Neural Network 
for Diagnosis of Skin Disease," Traitement du Signal, vol. 39, no. 5, pp. 
1873–1877, Nov. 2022, https://doi.org/10.18280/ts.390548. 

[30] N. Kumar and D. Aggarwal, "LEARNING-based Focused WEB 
Crawler," IETE Journal of Research, pp. 1–9, Feb. 2021, 
https://doi.org/10.1080/03772063.2021.1885312. 

[31] M. Kaur, V. Kumar, V. Yadav, D. Singh, N. Kumar, and N. N. Das, 
"Metaheuristic-based Deep COVID-19 Screening Model from Chest X-
Ray Images," Journal of Healthcare Engineering, vol. 2021, Mar. 2021, 
Art. no. e8829829, https://doi.org/10.1155/2021/8829829. 

[32] N. Kumar, N. Narayan Das, D. Gupta, K. Gupta, and J. Bindra, 
"Efficient Automated Disease Diagnosis Using Machine Learning 
Models," Journal of Healthcare Engineering, vol. 2021, May 2021, Art. 
no. e9983652, https://doi.org/10.1155/2021/9983652. 

[33] N. Kumar, M. Gupta, D. Gupta, and S. Tiwari, "Novel deep transfer 
learning model for COVID-19 patient detection using X-ray chest 
images," Journal of Ambient Intelligence and Humanized Computing, 
vol. 14, no. 1, pp. 469–478, Jan. 2023,
https://doi.org/10.1007/s12652-021-03306-6.

[34] M. Gupta, N. Kumar, B. K. Singh, and N. Gupta, "NSGA-III-Based 
Deep-Learning Model for Biomedical Search Engines," Mathematical 
Problems in Engineering, vol. 2021, May 2021, Art. no. e9935862, 
https://doi.org/10.1155/2021/9935862. 

[35] N. Kumar, M. Gupta, D. Sharma, and I. Ofori, "Technical Job 
Recommendation System Using APIs and Web Crawling," 
Computational Intelligence and Neuroscience, vol. 2022, Jun. 2022, Art. 
no. e7797548, https://doi.org/10.1155/2022/7797548. 

[36] M. Gupta, N. Kumar, N. Gupta, and A. Zaguia, "Fusion of multi-
modality biomedical images using deep neural networks," Soft 
Computing, vol. 26, no. 16, pp. 8025–8036, Aug. 2022, 
https://doi.org/10.1007/s00500-022-07047-2. 

[37] A. Hashmi et al., "Contrast Enhancement in Mammograms Using 
Convolution Neural Networks for Edge Computing Systems," Scientific 
Programming, vol. 2022, Apr. 2022, Art. no. e1882464, 
https://doi.org/10.1155/2022/1882464. 

[38] H. Li, Z. Wang, J. Tang, W. Ding, and Z. Liu, "Siamese Neural 
Networks for Class Activity Detection," in 21st International 
Conference on Artificial Intelligence in Education, Ifrane, Morocco, Jul. 
2020, pp. 162–167, https://doi.org/10.1007/978-3-030-52240-7_30. 

[39] E. Slyman, C. Daw, M. Skrabut, A. Usenko, and B. Hutchinson, "Fine-Grained
Classroom Activity Detection from Audio with Neural Networks," arXiv,
Nov. 09, 2021, https://doi.org/10.48550/arXiv.2107.14369.

[40] Y. Said, M. Barr, and H. E. Ahmed, "Design of a Face Recognition 
System based on Convolutional Neural Network (CNN)," Engineering, 
Technology & Applied Science Research, vol. 10, no. 3, pp. 5608–5612, 
Jun. 2020, https://doi.org/10.48084/etasr.3490. 

[41] B. A. Mossaad, S. Elkosantini, and M. Abid, "An Automated 
Surveillance System Based on Multi-Processor and GPU Architecture," 
Engineering, Technology & Applied Science Research, vol. 7, pp. 2319–2323,
Dec. 2017, https://doi.org/10.48084/etasr.1645.

[42] N. Kumar, A. Hashmi, M. Gupta, and A. Kundu, "Automatic Diagnosis 
of Covid-19 Related Pneumonia from CXR and CT-Scan Images," 
Engineering, Technology & Applied Science Research, vol. 12, no. 1, 
pp. 7993–7997, Feb. 2022, https://doi.org/10.48084/etasr.4613.