Engineering, Technology & Applied Science Research, Vol. 10, No. 2, 2020, 5412-5418, www.etasr.com
Ullah & Altamimi: Panic Detection in Crowded Scenes

Panic Detection in Crowded Scenes

Ahmed B. Altamimi, College of Computer Science and Engineering, University of Hail, Hail, Saudi Arabia, altamimi.a@uoh.edu.sa
Habib Ullah, College of Computer Science and Engineering, University of Hail, Hail, Saudi Arabia, h.ullah@uoh.edu.sa

Abstract—A crowd is a gathering of a huge number of individuals in a confined area. Early identification and detection of unusual behaviors in terms of panic occurring in crowded scenes are very important. Panic detection consists of formulating normal scene behaviors and detecting and identifying non-matching behaviors. However, panic detection and recognition is a very difficult problem, especially when considering diverse scenes. Many methods proposed to cope with these problems have limited robustness as the density of the crowd varies. In order to handle this challenge, this paper proposes the integration of different features into a unified model. Discriminant binary patterns and neighborhood information are used to model complex and unique motion patterns in order to characterize different levels of features for diverse types of crowd scenes, focusing in particular on the detection of panic and non-pedestrian entities. The proposed method was evaluated considering two benchmark datasets and outperformed five existing methods.

Keywords—crowd analysis; anomaly detection; congestion; bottleneck identification

I. INTRODUCTION

Crowd scene analysis is a major problem in computer vision. The challenges of crowd scene analysis are found in gatherings which attract many people, such as sports, festivals, social, political and religious gatherings. Important problems and security concerns are raised in huge crowds.
Therefore, dealing with the crowd scenes is a very challenging problem in the subfield of video surveillance, due to the complex behaviors of individuals. A method developed for this purpose could be deployed for detecting critical crowd levels, people counting, and anomalies in such environments. Detecting panic situations in such scenes becomes extremely important for crowd control and safety. Furthermore, in public gatherings, it is important to know the ongoing situation and the behavior of people attending an event, as it can provide useful information for future event planning and public space design. This study aims at detecting panic and non-pedestrian entities representing abnormal behaviors in crowd scenes, integrating robust features into a unified model. The structure of the proposed method consists of the Discriminant Binary Pattern (DBP) [1] and the neighborhood orientation information [2]. These features are combined and classified using a multi-class support vector machine [3]. In the crowd scene context, different feature integration could represent individuals, and patterns that reflect the collective behavior of crowd arise from the interaction of individuals. This kind of modeling deals with the challenges of crowded scenes in terms of various distributions, sparseness, consistency and inconsistency of the scene. Many computer vision based methods have been proposed to detect panic and other anomalies in crowded scenes. However, most of them are modeled to address a specific scene, while different representations of motion and appearance are analyzed with different techniques. The proposed method explores the properties of abnormal situations, considering both the anomalies in terms of panic and non-pedestrian entities. This method computes dense motion trajectories from input videos. Considering the computed trajectories, different features are combined into a unified hybrid model. 
Furthermore, the underlying complex motion information is explored, which can formulate complex events in the crowd. This information consists of mid-level characteristics that bridge the gap between low- and high-level features for representing anomalous situations. The proposed hybrid features do not limit the type of features or scenes, enabling the extension of the technique to broader research fields. For experimental evaluation, the method is applied to two widely used benchmark datasets. Additionally, the proposed technique is compared with five existing methods, namely the Spatio-Temporal Texture Model (STTM) [4], the Abnormal Crowd Behavior Model (ACBM) [5], and the spatio-temporal volume based methods: Mixture of Dynamic Texture (MDT) [6], Data Mining for Anomaly (DMA) [7], and Optical Flow and Texture for Anomaly (OFTA) [8].

Corresponding author: Habib Ullah

II. LITERATURE REVIEW

Recently, deep learning methods have shown promising results in different applications in the field of computer vision. Therefore, different computer vision and deep learning based methods have been successfully employed to solve problems related to crowded scenes in real time [9-11]. Crowded scenes can be categorized by gathering location, e.g., congested urban areas, historical events organized in museums, music and fun concerts, arrival and departure zones in airports, and different sports in stadiums. The proposed methods can generally help safety and security staff and administrators in identifying security measures. Different crowd analysis methods consider the input data in different ways. For example, some methods process videos from stored archives. The methods introduced in [12, 13] extract frames from CCTV video cameras.
Some researchers introduced unified frameworks for crowd congestion analysis, individual counting, crowd flow analysis on pedestrian pathways, crowd fight analysis, panic and bottleneck detection during exit and entrance [14-16]. In [17], a hybrid method was used to analyze crowd from different angles, such as crowd moving in different directions, computing statistics related to a static crowd, calculation of areas covered by potential crowd, and crowd’s social behavior analysis in the form of groups or as a whole. This technique could be deployed to take into account different aspects in public locations, considering a unified framework. Moreover, the techniques introduced in [18-20] could be used to direct crowd flow to avoid stampede or other adverse events. In addition, crowd analysis approaches driven by tracking techniques [21-23] can be exploited to track suspicious individuals in a congested area, in order to ensure security. The tracking driven technique in [24] could be used to understand how individuals are behaving and moving in crowded scenes. These techniques can be used to divert crowd flows in public places to avoid congestion and stampede. In general, crowd analysis methods can be divided into two categories. In the first, each entity in the crowd is considered individually and they are combined to model crowd behavior. However, these methods can only be considered when the scene is not occupied by a very dense crowd. In the second category, methods treat the whole crowd as a single entity. These techniques perform very well in dense crowded scenes. The techniques in [25-27] fall in the first category, and they can be utilized to track individual people in the crowd, calculate the waiting time of a static crowd, compute the length of individuals in a queue, count the number of people in crowded scenes, and analyze people’s engagement in crowd scenes. 
The techniques presented in [28-30] fall in the second category, and they can help security staff and administrators to gain insight into a crowd scene and get the essence of the behavior as a whole. Different methods have been proposed for dealing with the problems of crowd analysis, considering different factors such as individual tracking, abnormal situation detection, flow analysis, and situation awareness. The method proposed in [31] combines data from different sensors to get more accurate localization of the target in the crowded scene, introducing a modified ensemble of Kalman filters to deal with a high degree of nonlinearity. The method in [6] detects and recognizes anomalies in crowded scenes using features related to the appearance and the dynamics of the crowd, considering both temporal and spatial information to compute mixtures of dynamic textures. The model presented in [32] uses a multi-scale deep convolutional neural network to count the number of individuals in a single image with arbitrary crowd density and perspective. Authors in [33] presented an aggregation of ensembles considering pre-trained ConvNets and a pool of classifiers in order to detect anomalies in crowded scenes. Their approach was inspired by the concept that a set of different fine-tuned CNNs represents various levels of semantic characterization, and therefore encodes a very rich set of robust features. The method in [34] was an agent-based crowd simulation model that exploited a path planning strategy. Different elements, including traveling distance and turning angle, can be used efficiently for path planning in crowded scenes. The concept of momentum from physics was exploited in [35], integrating foreground information with an object’s motion, based on background subtraction, feature computation, behavior identification, and anomaly detection.
Authors in [36] discussed various approaches for the analysis of both high and low density crowds to facilitate personal mobility, safety, and security, enabling assistive robotics in crowded scenes. This study demonstrated the main challenges and solutions for the analysis of unified behaviors, in order to explore interpersonal relations and social interaction of people in crowd scenes. Authors in [37] investigated a confined set of socio-cognitive crowd behaviors, discovering the interrelated connection between the movements of individuals to analyze different crowd behaviors. This was a layered approach segmenting visual analysis and semantic crowd behaviors.

TABLE I. METHODS, FEATURE MODELS AND DATASETS USED
Ref.   Publication date   Features/Model               Dataset
[9]    2017               Statistical                  UCD
[11]   2018               Nested motion                PETS2009, UMN
[12]   2019               Spatio-temporal              Collected data
[13]   2019               Intra-frame classification   WWW crowd
[19]   2019               Social model                 CUHK crowd
[20]   2019               Depth information            UCF
[23]   2019               Texture features             PNNL parking
[24]   2019               Spatio-temporal              UCF crowd
[26]   2016               Simulation features          Mall dataset
[27]   2019               Entropy features             Crowd dataset
[29]   2019               Probabilistic model          UMN crowd
[6]    2010               Dynamic texture              UCSD
[33]   2020               Aggregation of ensembles     Collected data
[35]   2019               Histogram features           UCSD, PETS
[7]    2018               Visual features              WWW crowd
[8]    2019               Optical flow features        CUHK crowd
[38]   2016               Holistic features            PNNL parking
[39]   2011               Low-level features           Collected data
[40]   2011               Adaptive features            UMN
[4]    2019               Spatio-temporal              CUHK crowd

A method based on artificial bacteria colony was investigated in [7], where the optical flow of frames was exploited to get the foreground segments with entity motions as layers. Surveillance problems using image processing and machine learning techniques were studied in [8], exploring algorithms of identifying abnormal crowd behaviors, proposing a unified method for anomaly detection, considering both crowd motion and texture-based analysis.
A deep learning method was proposed in [41], calculating crowd density from individual images representing different crowd densities, exploiting both deep and shallow fully convolutional networks to estimate the density map of a crowd image. This method is good at encoding both high-level semantic information and low-level features. Authors in [42] analyzed stationary crowds by computing the duration of a static foreground pixel. For this purpose, they used dynamic constraints, spatial and temporal features, and mixed partials. Authors in [38] used holistic features for crowd anomaly detection, fusing together different features including crowd collectiveness, conflict, density and mean motion speed. Authors in [43] mapped a given crowd scene to its density, avoiding people occlusion in dense crowds, foreground and background similarities, and huge changes in camera viewpoints. In [44] a combined framework was presented to deal with the different elements of a crowd scene, including people counting, abnormal behavior detection, and different crowd density. A contextual pyramid technique was presented in [45] to produce a robust map of crowd density and people count, by using both global and local contextual features. Recently, some advanced deep learning methods [46, 47] were presented, which could improve the performance of crowd analysis techniques. The modeling of complex crowd scenes using a deep Elman neural network architecture in [46] could better capture the complex structure of a crowd and emulate such dynamic systems.

III. PROPOSED METHOD

Panic detection and non-pedestrian entity localization is a challenging task due to its associated complexities. For this purpose, discriminative features are explored [1].
There is a lack of research that efficiently assesses the underlying structure of crowd scenes and investigates the most discriminant pattern representation of abnormal situations. This study’s purpose is to exploit the connection between motion feature extraction and discriminability, proposing an integrated model characterized by the discriminative power of combining different features. This framework provides insight into feature extraction for anomaly detection. This study avoids the modeling of traditional features that use a single-type representation and of features unable to encode the motion and crowd structure information. Moreover, this study exploits local neighborhood features. A refined feature [2] combines more information of the intrinsic structure and is effective for crowd scene analysis. The method’s flow diagram is shown in Figure 1.

Fig. 1. Proposed method's flow diagram

The effectiveness of the selected features is not limited to abnormal situation detection; they could also be used in many other applications in the field of computer vision and machine learning. The proposed method computes repetitions of different orientations in localized areas of a crowd video. This framework gets some inspiration from edge orientation histograms [48], scale-invariant feature transform descriptors [49], and shape contexts. A significant difference lies in the unification of completely different features, as the model is formulated on a dense grid of all pixels in each frame. The intuition behind the feature unification was that localized crowd pattern appearance and layout within a video frame could be formulated by the distribution of intensity and orientation information. Previous studies [50-52] extracted different types of orientation features.
However, this study investigated the discriminability of feature unification for generalized purposes, exploring a model to characterize the discriminative power of various types of orientations, so that more discriminative features can be fused together. The proposed method is effective and compact for abnormal situation detection. It captures complex and unique motion patterns, and it can be considered a mid-level hierarchy that bridges the gap between low and high-level information in order to capture complex crowd events. Features represent motion information in the neighborhood of the region under observation. The most significant ability of this method is the unification of features in the local neighborhood. Crowd scene layout (i.e. appearance structure of the scene) captures reliable information in a reasonable neighborhood, discovering important hints to recognize various events of interest in crowds. Conventional techniques extract gradient locations and orientation information for feature modeling. The proposed method differs from these as it considers the difference between each pixel and the central one, within local windows of its eight neighbors, to produce different orientation features, whereas conventional techniques are confined to the horizontal and vertical directions only. Conventional modeling of this kind therefore does not distinguish the significance and influence of different orientations, which limits its effectiveness. The proposed refined features improve the local scene description ability, enhancing the intrinsic structure of the crowd scenes. In this method, each frame is extracted from the crowd video. It is worth noting that each class label for each event in the scene is considered. Abnormal situations in each crowd video consist of multiple abnormal events. Inspired by [1], the abnormal situation is extracted through the Discriminant Binary Pattern (DBP).
DBP can model the scene changes and implicitly formulate the multiple dominant features, as:

$$\zeta_{\theta}(x, y) = \exp\left(-\pi\left(\frac{x^{2}}{\epsilon^{2}} + \frac{y^{2}}{\rho^{2}}\right)\right)\cos\big(2\pi\mu(x\cos\theta + y\sin\theta)\big) \tag{1}$$

where $\zeta$ represents the function to extract direction information. The potential feature will maximize this response, and it can be considered as the prominent direction feature of the abnormal situation. In (1), $\epsilon$ and $\rho$ represent the standard deviations of the data surrounding the position x and y. The convolution between $\zeta$ and a video frame I is represented by:

$$\xi_{\theta}(x, y) = \zeta_{\theta} * \big(255 - I(x, y)\big) \tag{2}$$

where * represents the convolution operation, and functions $\zeta$ and $\xi$ obtain convolved results for each pixel of the video frame. In order to identify individual pixels related to abnormal situations in the crowded scene, the following modeling is performed:

$$\theta\big(I(x, y)\big) = \arg\max_{\theta}\, \xi_{\theta}(x, y) \tag{3}$$

where $\theta$ represents the orientation of the pixel under observation. Equation (3) characterizes the connection between the direction feature extraction and the discriminability of the abnormal situation. To enhance the method’s encoding ability, it is consolidated as [2]:

$$\beta = \{\alpha_{1} - \alpha_{c},\ \alpha_{2} - \alpha_{c},\ \alpha_{3} - \alpha_{c},\ \alpha_{4} - \alpha_{c},\ \ldots,\ \alpha_{8} - \alpha_{c}\} = \{\gamma_{1}, \gamma_{2}, \gamma_{3}, \gamma_{4}, \gamma_{5}, \gamma_{6}, \gamma_{7}, \gamma_{8}\} \tag{4}$$

where $\alpha_{c}$ represents the central pixel and $\alpha_{1}, \alpha_{2}, \alpha_{3}, \alpha_{4}, \ldots, \alpha_{8}$ represent the neighboring pixels. The intensities of non-unit neighboring pixels are obtained by arithmetic operations, using their adjacent pixel values. Next, the intensity difference between each neighboring pixel and the central pixel is computed. Differences are concatenated to a difference vector that represents the scene. In fact, for each frame, this information is extracted densely for all 9×9 neighborhoods.
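The per-pixel steps behind (1)-(4) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: it assumes that (1) is a Gabor-type filter with a Gaussian envelope of widths ε and ρ, and the kernel size, the number of orientations, the values of μ, ε and ρ, the border clamping, the 3×3 window used for the eight neighbors, and all function names are assumptions introduced here.

```python
import math

def direction_filter(theta, mu=0.1, eps=4.0, rho=4.0, half=3):
    """Sample an Eq. (1)-style kernel on a (2*half+1)^2 grid (all defaults are assumed)."""
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            envelope = math.exp(-math.pi * (x * x / eps ** 2 + y * y / rho ** 2))
            carrier = math.cos(2 * math.pi * mu * (x * math.cos(theta) + y * math.sin(theta)))
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel

def response(image, kernel, px, py):
    """Eq. (2)-style response at (px, py): kernel convolved with the inverted frame 255 - I."""
    half = len(kernel) // 2
    height, width = len(image), len(image[0])
    total = 0.0
    for ky in range(-half, half + 1):
        for kx in range(-half, half + 1):
            # Clamp coordinates at the frame border (a boundary-handling assumption).
            yy = min(max(py - ky, 0), height - 1)
            xx = min(max(px - kx, 0), width - 1)
            total += kernel[ky + half][kx + half] * (255 - image[yy][xx])
    return total

def dominant_direction(image, px, py, n_dirs=8):
    """Eq. (3)-style choice: index and angle of the orientation with the largest response."""
    thetas = [j * math.pi / n_dirs for j in range(n_dirs)]
    responses = [response(image, direction_filter(t), px, py) for t in thetas]
    best = max(range(n_dirs), key=lambda j: responses[j])
    return best, thetas[best]

def neighbor_differences(image, px, py):
    """Eq. (4)-style vector: each of the eight neighbors minus the central pixel."""
    offsets = [(-1, -1), (0, -1), (1, -1), (1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0)]
    center = image[py][px]
    return [image[py + dy][px + dx] - center for dx, dy in offsets]

if __name__ == "__main__":
    frame = [[(17 * x + 31 * y) % 256 for x in range(9)] for y in range(9)]
    print(dominant_direction(frame, 4, 4))
    print(neighbor_differences(frame, 4, 4))
```

Evaluating both functions at every pixel of a frame would give the dense orientation map and difference vectors that the text describes; the pure-Python loops are chosen for readability rather than speed.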
Features in (3) and (4) are combined into a unified framework as shown in:

$$\Lambda = \sum \sum \exp(-\theta - \beta) \tag{5}$$

This equation considers all the features of the scene and gathers the statistics of local regions to embed more local information and strengthen the robustness of representing complicated crowd motion patterns for abnormal situation detection. The classification of a crowded scene into normal and abnormal situations is performed with a multi-class support vector machine [3]. Two benchmark datasets were used, and the training parts of these datasets cover panic situations and non-pedestrian entities. Therefore, a function that shows at most $\varepsilon$ deviation from the actual labels for the training data of both datasets is required. This function is intended to be as flat as possible with respect to the training data of both datasets. In other words, errors are not considered as long as they are less than $\varepsilon$. This matters because the error should lie within an acceptable margin during the classification of different abnormal events. To start the classification procedure according to the multi-class support vector machine, a linear function is considered as:

$$f(x) = \langle w, x \rangle + b \quad \text{with } w \in \chi,\ b \in \mathbb{R} \tag{6}$$

where $\langle \cdot, \cdot \rangle$ represents the dot product. Flatness in (6) means seeking a small w, which is achieved by minimizing its norm. This leads to the convex optimization problem:

$$\text{minimize } \frac{1}{2}\|w\|^{2} \quad \text{subject to } \begin{cases} y_{i} - \langle w, x_{i} \rangle - b \le \varepsilon \\ \langle w, x_{i} \rangle + b - y_{i} \le \varepsilon \end{cases} \tag{7}$$

According to (7), convex optimization is possible if enough training data are available. For this reason, two benchmark datasets were used to train the multi-class support vector machine, taking full advantage of the convex optimization process.
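To make the role of the tolerance in (6) and (7) concrete, the following Python sketch evaluates the linear function and checks whether a candidate (w, b) satisfies both constraints. It is a hedged illustration only: it assumes the norm in (7) is minimized in the usual ½‖w‖² form, it does not solve the optimization problem, and the function names and sample values are hypothetical.

```python
def linear_output(w, x, b):
    """f(x) = <w, x> + b, as in Eq. (6)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def flatness_objective(w):
    """Half the squared norm of w (assumed form of the objective in Eq. (7))."""
    return 0.5 * sum(wi * wi for wi in w)

def within_epsilon_tube(w, b, samples, labels, eps):
    """True when every training pair satisfies both constraints of Eq. (7).

    The pair y - f(x) <= eps and f(x) - y <= eps is equivalent to |y - f(x)| <= eps.
    """
    return all(abs(y - linear_output(w, x, b)) <= eps
               for x, y in zip(samples, labels))

if __name__ == "__main__":
    # Hypothetical two-dimensional feature vectors and target values.
    samples = [[1.0, 2.0], [2.0, 0.0]]
    labels = [1.6, 2.4]
    w, b = [1.0, 0.0], 0.5
    print(flatness_objective(w))                              # 0.5
    print(within_epsilon_tube(w, b, samples, labels, 0.2))    # True
    print(within_epsilon_tube(w, b, samples, labels, 0.05))   # False
```

Among all (w, b) that pass the tube check, the formulation in (7) prefers the one with the smallest objective value, which is what "flat" means here.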
Standalone features without integration are not very effective, since the physical structure of a crowd scene changes over a temporal window. In addition, a single type of feature is very sensitive to background and illumination changes, scale variations, and crowd flow direction. To handle these problems, the integration of different robust and reliable features should be explored. For instance, entities scattered across a scene with a still background constitute a very challenging environment in terms of modeling different events. Therefore, a method based on a single type of feature suffers if the uncertainty in crowd flow increases over a large scale, such as a uniform crowd flow versus a random one. The outcome of traditional, single-type features would be unreliable in such situations. Moreover, if the flow of the crowd is in one direction and it changes randomly due to the occurrence of either a panic situation or a non-pedestrian entity, the same features would behave differently. Therefore a unified model for different features is needed, modeling the random flow of the crowd and being robust to different variations. The proposed method can effectively cope with these issues.

IV. RESULTS

The experimental analysis used the widely used benchmark datasets UCSD [53] and UMN [54]. These datasets are properly annotated benchmarks for the analysis of abnormality detection and localization in crowded scenes. In UCSD, anomalies are non-pedestrian entities in the scenes. The videos of this dataset were captured with a CCTV camera fixed at an elevation, at a resolution of 238×158 at 10fps, overlooking people in pedestrian pathways. Non-person objects in pathways and anomalous pedestrian motion patterns are treated as abnormal situations. In this dataset, the abnormal entities are bikers, skaters, small carts, and people walking across a pathway or in the park.
The video clips in the dataset are divided into 2 subsets, Ped1 and Ped2, each associated with a different crowd scene. Videos recorded from each scene were split into clips of about 200 frames each. Ped1 consists of 34 training and 36 testing videos. Ped2 consists of 16 training and 14 testing videos. In each video, the ground truth annotation includes a binary flag per frame, showing whether an anomalous entity is present. Moreover, there is a subset consisting of 10 videos with manually produced pixel-level binary masks, which mark the regions containing abnormal events with clear boundaries. This was generated for performance analysis with respect to the ability of anomaly localization and segmentation. Sample images from the UCSD dataset are depicted in Figure 2. In the top row, non-pedestrian entities are shown, which are not allowed in these pedestrian pathways.

Fig. 2. Pedestrians walking with other entities present. Images from UCSD © SVCL

Qualitative and quantitative experimental analyses were performed. The qualitative results are shown in Figure 3, where the top row shows sample frames taken from four video sequences. The bottom row shows the anomalous entities annotated and highlighted in light blue. The bicycle riders in the bottom row are detected as anomalous entities, since they are not allowed to use the pedestrian pathways. Similarly, a vehicle is detected in the bottom left frame.

Fig. 3. The top row shows sample frames taken from four video sequences. The bottom row shows anomalous entities annotated and highlighted in different colors. Images from UCSD © SVCL

For the quantitative analysis, Table II presents the area under the ROC curve (AUC) of the tested methods, in which a larger AUC score shows improved classification results.
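As a reminder of what the AUC measures, the sketch below computes it for frame-level anomaly scores using the rank interpretation: the probability that a randomly chosen abnormal frame scores higher than a randomly chosen normal one. The function name and the toy scores are hypothetical and are not taken from the paper's experiments.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of abnormal and normal frames.

    labels use 1 for abnormal frames and 0 for normal frames; ties count as half a win.
    """
    positives = [s for s, l in zip(scores, labels) if l == 1]
    negatives = [s for s, l in zip(scores, labels) if l == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

if __name__ == "__main__":
    # Toy anomaly scores for four frames (illustrative values only).
    print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

A score of 1.0 means every abnormal frame is ranked above every normal frame, while 0.5 corresponds to random ranking.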
As can be noted, the proposed method achieves the highest AUC (0.81) among the compared methods, against 0.77 for the best reference methods.

TABLE II. UCSD: AUC PERFORMANCE COMPARISON
Method   STTM [4]   ACBM [5]   MDT [6]   DMA [7]   OFTA [8]   Proposed
AUC      0.77       0.74       0.77      0.73      0.69       0.81

Moreover, experimental analysis in the form of Equal Error Rate (EER) was performed. Figure 4 shows the results in graphs for the five references and the proposed method. Blue and red graphs represent the results for the Ped1 and Ped2 subsets respectively. The proposed method has a smaller EER than the five reference methods.

Fig. 4. UCSD Dataset: Frame level equal error rate comparison

The UMN dataset presents both normal and abnormal crowd video sequences. This dataset consists of three indoor and outdoor scenes, showing 11 different scenarios of panic events. The dataset consists of 7739 frames in total, with a resolution of 320x240 pixels. Each video starts with normal human behaviors, such as walking or standing. Figure 5 depicts sample images from the UMN dataset.

Fig. 5. People walking and standing. Images from [54], © University of Minnesota

Qualitative results for the UMN dataset are depicted in Figure 6, where the top row shows sample frames taken from four video sequences. The bottom row shows anomalous entities annotated and highlighted in light blue. Panic is detected and highlighted when pedestrians start running in different directions. Table III shows the area under the ROC curve (AUC) of the five references and the proposed method. Again, the proposed method has the largest AUC score (0.93, against 0.91 for the best reference method).

Fig. 6. UMN Dataset: Normal frames and panic detection on them. Images from [54], © University of Minnesota

TABLE III. UMN: AUC PERFORMANCE COMPARISON
Method   STTM   ACBM   MDT    DMA    OFTA   Proposed
AUC      0.91   0.75   0.84   0.87   0.79   0.93

Experimental analysis in the form of EER was performed, considering the UMN dataset.
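The frame-level equal error rate used in these comparisons can be sketched as follows: the detection threshold is swept over the observed scores, and the EER is read off where the false positive rate and the false negative rate are closest. This is a simple approximation on a finite set of thresholds, with a hypothetical function name and toy scores, not the evaluation code behind the reported figures.

```python
def equal_error_rate(scores, labels):
    """Approximate frame-level EER from anomaly scores (1 = abnormal, 0 = normal).

    A frame is flagged abnormal when its score is >= the threshold. The EER is taken
    at the threshold where the false positive and false negative rates are closest.
    """
    n_pos = sum(1 for l in labels if l == 1)
    n_neg = sum(1 for l in labels if l == 0)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    best_gap, eer = float("inf"), None
    for threshold in sorted(set(scores)):
        false_pos = sum(1 for s, l in zip(scores, labels) if l == 0 and s >= threshold)
        false_neg = sum(1 for s, l in zip(scores, labels) if l == 1 and s < threshold)
        fpr = false_pos / n_neg
        fnr = false_neg / n_pos
        gap = abs(fpr - fnr)
        if gap < best_gap:
            best_gap, eer = gap, (fpr + fnr) / 2
    return eer

if __name__ == "__main__":
    # Toy anomaly scores for four frames (illustrative values only).
    print(equal_error_rate([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.5
```

A lower EER indicates that false alarms and missed detections can both be kept small with a single threshold.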
Figure 7 shows the results provided for the five references and the proposed method. Blue and red graphs represent the results for the first two and the following two video sequences of the UMN dataset respectively. As can be noted, the proposed method has a smaller EER than the five reference methods.

Fig. 7. UMN Dataset: Frame level equal error rate comparison

V. CONCLUSION

Many diverse approaches have been proposed to solve the problem of panic detection and anomaly identification. However, most of them are designed to work on specific scenes, where different representations of motion and appearance are analyzed with different models. In this study, the properties of abnormal situations were considered specifically as anomalies in terms of panic, and more generally, as anomalies in terms of non-pedestrian entities. Abnormal entities are rare things with unexpected appearance or motion patterns. For anomalous situation detection including panic and non-pedestrian entities, a novel technique was proposed, where dense motion trajectories were computed from the crowd input videos. Considering the computed trajectories, a set of robust features was designed considering HOG, HOF, and MBH. Motion atoms were explored for compact encoding of motion patterns in crowded scenes, representing distinguished motion patterns of crowds. In fact, motion atoms are mid-level characteristics that bridge the gap between low and high-level features for capturing anomalous situations. Since an anomalous situation is described from the view of a feature set, the proposed method can be utilized in different surveillance scenes. Moreover, hybrid features and motion atoms do not limit the type of features or the type of scenes, which helps in extending the proposed technique to broader research fields.
The experimental results demonstrated that the proposed approach is effective for real crowd videos containing various types of normal and abnormal activities, in terms of panic situation and the existence of non-pedestrian entities. The proposed method is independent of crowd flow variability and density over temporal windows. Furthermore, the method is not sensitive to crowd concentration in different locations of different scenes. Experimental evaluation was performed considering two benchmark datasets and the proposed method outperformed five known methods both quantitatively and qualitatively.

ACKNOWLEDGMENT

This research has been supported by the Research deanship at the University of Hail under the grant number 160778.

REFERENCES

[1] L. Fei, B. Zhang, Y. Xu, D. Huang, W. Jia, J. Wen, “Local discriminant direction binary pattern for palmprint representation and recognition”, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 30, No. 2, pp. 468-481, 2020
[2] W. Zhang, W. Zhang, K. Liu, J. Gu, “A feature descriptor based on local normalized difference for real-world texture classification”, IEEE Transactions on Multimedia, Vol. 20, No. 4, pp. 880-888, 2017
[3] J. T. Zhou, I. W. Tsang, S. S. Ho, K. R. Muller, “N-ary decomposition for multi-class classification”, Machine Learning, Vol. 108, No. 5, pp. 809-830, 2019
[4] Y. Hao, Z. J. Xu, Y. Liu, J. Wang, J. L. Fan, “Effective crowd anomaly detection through spatio-temporal texture analysis”, International Journal of Automation and Computing, Vol. 16, No. 1, pp. 27-39, 2019
[5] Y. Liu, K. Hao, X. Tang, T. Wang, “Abnormal crowd behavior detection based on predictive neural network”, 2019 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA), Dalian, China, October 17, 2019
[6] V. Mahadevan, W. Li, V. Bhalodia, N. 
Vasconcelos, “Anomaly detection in crowded scenes”, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, USA, August 5, 2010
[7] J. Ramos, N. Nedjah, L. de Macedo Mourelle, B. B. Gupta, “Visual data mining for crowd anomaly detection using artificial bacteria colony”, Multimedia Tools and Applications, Vol. 77, No. 14, pp. 17755-17777, 2018
[8] P. Ingole, V. Vyas, “Anomaly detection in crowd using optical flow and textural feature”, in: Soft Computing and Signal Processing, Advances in Intelligent Systems and Computing, Vol. 900, pp. 723-732, Springer, 2019
[9] S. D. Khan, M. Tayyab, M. K. Amin, A. Nour, A. Basalamah, S. Basalamah, S. A. Khan, “Towards a crowd analytic framework for crowd management in Majid-al-Haram”, 17th Scientific Meeting on Hajj & Umrah Research, 2017
[10] K. Ahmad, N. Conci, F. G. De Natale, “A saliency-based approach to event recognition”, Signal Processing: Image Communication, Vol. 60, pp. 42-51, 2018
[11] H. Ullah, A. B. Altamimi, M. Uzair, M. Ullah, “Anomalous entities detection and localization in pedestrian flows”, Neurocomputing, Vol. 290, pp. 74-86, 2018
[12] R. Nawaratne, D. Alahakoon, D. De Silva, X. Yu, “Spatiotemporal anomaly detection using deep learning for real-time video surveillance”, IEEE Transactions on Industrial Informatics, Vol. 16, No. 1, pp. 393-402, 2019
[13] K. Xu, T. Sun, X. Jiang, “Video anomaly detection and localization based on an adaptive intra-frame classification network”, IEEE Transactions on Multimedia, Vol. 22, No. 2, pp. 394-406, 2019
[14] J. Wan, A. Chan, “Adaptive density map generation for crowd counting”, IEEE International Conference on Computer Vision, Seoul, South Korea, October 27 – November 2, 2019
[15] X. Jiang, Z. Xiao, B. Zhang, X. Zhen, X. Cao, D. Doermann, L. Shao, “Crowd counting and density estimation by trellis encoder-decoder networks”, IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, USA, June 15-20, 2019
[16] H. 
Yu, G. Pan, L. Zhang, Z. Li, M. Pan, “Translation domain segmentation model based on improved cosine similarity for crowd motion segmentation”, Journal of Electronic Imaging, Vol. 28, No. 2 2019 [17] N. Bisagno, B. Zhang, N. Conci, “Group LSTM: Group grajectory prediction in crowded scenarios”, in: Proceedings of the European conference on computer vision (ECCV), pp. 213-225, Springer, 2018 [18] R. Trabelsi, I. Jabri, F. Melgani, F. Smach, N. Conci, A. Bouallegue, “Complex-valued representation for RGB-D object recognition”, in: Pacific-Rim Symposium on Image and Video Technology, pp. 17-27. Springer, Cham, 2017 [19] L. Pan, H. Zhou, Y. Liu, M. Wang, “Global event influence model: integrating crowd motion and social psychology for global anomaly Engineering, Technology & Applied Science Research Vol. 10, No. 2, 2020, 5412-5418 5418 www.etasr.com Ullah & Altamimi: Panic Detection in Crowded Scenes detection in dense crowds”, Journal of Electronic Imaging, Vol. 28, No. 2, 2019 [20] M. Xu, Z. Ge, X. Jiang, G. Cui, P. Lv, B. Zhou, C. Xu, “Depth information guided crowd counting for complex crowd scenes”, Pattern Recognition Letters, Vol. 125, pp. 563-569, 2019 [21] X. Alameda-Pineda, E. Ricci, N. Sebe, “Multimodal behavior analysis in the wild: an introduction”, in: Multimodal Behavior Analysis in the Wild, pp. 1-8. Academic Press, 2019 [22] M. Ullah, H. Ullah, N. Conci, F. G. B. De Natale, “Crowd behavior identification”, 2016 IEEE International Conference on Image Processing, Phoenix, USA, September 25-28, 2016 [23] H. Kim, J. Han, S. Han, “Analysis of evacuation simulation considering crowd density and the effect of a fallen person”, Journal of Ambient Intelligence and Humanized Computing, Vol. 10, No. 12, pp. 4869- 4879, 2019 [24] Y. Hao, Z. Xu, Y. Liu, J. Wang, J. Lun Fan, “Effective crowd anomaly detection through spatio-temporal texture analysis”, International Journal of Automation and Computing, Vol. 16, No. 1, pp. 27-39, 2019 [25] J. Li, L. Wei, F. Zhang, T. 
Yang, Z. Lu, “Joint deep and depth for object-level segmentation and stereo tracking in crowds”, IEEE Transactions on Multimedia, Vol. 21, No. 10, pp. 2531-2544, 2019 [26] K. Shimura, S. D. Khan, S. Bandini, K. Nishinari, “Simulation and evaluation of spiral movement of pedestrians: towards the tawaf simulator”, Journal of Cellular Automata, Vol. 11, No. 4, 2016 [27] X. Zhang, X. Shu, Z. He, “Crowd panic state detection using entropy of the distribution of enthalpy”, Physica A: Statistical Mechanics and its Applications, Vol. 525, pp. 935-945, 2019 [28] D. Kang, Z. Ma, A. B. Chan, “Beyond counting: comparisons of density maps for crowd analysis tasks—counting, detection, and tracking”, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 29, No. 5, pp. 1408-1422, 2018 [29] M. Dimitrievski, P. Veelaert, W. Philips, “Behavioral pedestrian tracking using a camera and lidar sensors on a moving vehicle”, Sensors, Vol. 19, No. 2, 2019 [30] T. Figueiredo, R. Castro, “Passengers perceptions of airport branding strategies: the case of Tom Jobim International Airport–RIOgaleoo, Brazil”, Journal of Air Transport Management, Vol. 74, pp. 13-19, 2019 [31] Z. Zhang, K. Fu, X. Sun, W. Ren, “Multiple target tracking based on multiple hypotheses tracking and modified ensemble Kalman filter in multi-sensor fusion”, Sensors, Vol. 19, No. 14, 2019 [32] D. Ji, H. Lu, T. Zhang, “End to end multi-scale convolutional neural network for crowd counting”, in: Eleventh international conference on machine vision (ICMV 2018), Vol. 11041, International Society for Optics and Photonics, 2019 [33] K. Singh, S. Rajora, D. K. Vishwakarma, G. Tripathi, S. Kumar, G. S. Walia, “Crowd anomaly detection using aggregation of ensembles of fine-tuned ConvNets”, Neurocomputing, Vol. 371, pp. 188-198, 2020 [34] S. K. Tan, N. Hu, W. Cai, “A data-driven path planning model for crowd capacity analysis”, Journal of Computational Science, Vol. 34, pp. 66-79, 2019 [35] S. D. Bansod, A. V. 
Nandedkar, “Crowd anomaly detection and localization using histogram of magnitude and momentum”, The Visual Computer, Vol. 36, pp. 609-620, 2020 [36] N. Conci, N. Bisagno, A. Cavallaro, “On modeling and analyzing crowds from videos”, in: Computer Vision for Assistive Healthcare, pp. 319-336, Academic Press, 2018 [37] M. S. Zitouni, A. Sluzek, H. Bhaskar, “Towards understanding socio- cognitive behaviors of crowds from visual surveillance data”, Multimedia Tools and Applications, 2019 [38] M. Marsden, K. Mc Guinness, S. Little, N. E. O’ Connor, “Holistic features for real-time crowd behaviour anomaly detection”, IEEE International Conference on Image Processing, Phoenix, USA, September 25-28, 2016 [39] A. B. Chan, N. Vasconcelos, “Counting people with low-level features and bayesian regression”, IEEE Transactions on Image Processing, Vol. 21, No. 4, pp. 2160-2177, 2011 [40] M. D. Zeiler, G. W. Taylor, R. Fergus, “Adaptive deconvolutional networks for mid and high level feature learning”, 2011 International Conference on Computer Vision, Barcelona, Spain, November 6-13, 2011 [41] L. Boominathan, S. S. S. Kruthiventi, R. V. Babu, “Crowdnet: a deep convolutional network for dense crowd counting”, in: Proceedings of the 24th ACM international conference on ,ultimedia, pp. 640-644, ACM, 2016 [42] S. Yi, X. Wang, C. Lu, J. Jia, H. Li, “L0 regularized stationary-time estimation for crowd analysis”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 39, No. 5, pp. 981-994, 2017 [43] D. B. Sam, S. Surya, R. V. Babu, “Switching convolutional neural network for crowd counting”, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, USA, July 21-26, 2017 [44] M. Marsden, K. Mc Guinness, S. Little, N. E. 
O’ Connor, “Resnetcrowd: a residual deep learning architecture for crowd counting, violent behaviour detection and crowd density level classification”, 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Lecce, Italy, August 29–September 1, 2017 [45] V. A. Sandagi, V. M. Patel, “Generating high-quality crowd density maps using contextual pyramid CNNs”, 2017 IEEE International Conference on Computer Vision, Venice, Italy, October 22-29, 2017 [46] L. B. Sallah, F. Fourati, “Systems modeling using deep Elman Neural Network”, Engineering, Technology & Applied Science Research, Vol. 9, No. 2, pp. 3881-3886, 2019 [47] S. R. Basha, J. K. Rani, “A comparative approach of dimensionality reduction techniques in text classification”, Engineering, Technology & Applied Science Research, Vol. 9, No. 6, pp. 4974-4979, 2019 [48] B. Alefs, G. Eschemann, H. Ramoser, C. Beleznai, “Road sign detection from edge orientation histograms”, 2007 IEEE Intelligent Vehicles Symposium, Istanbul, Turkey, June 13-15, 2007 [49] J. Clemons, “SIFT: scale invariant feature transform”, available at: https://pdfs.semanticscholar.org/19d1/c9a4546d840269ef534f6c1c8e379 8ce81ac.pdf [50] P. H. Gosselin, N. Murray, H. Jegou, F. Perronnin, “Revisiting the fisher vector for fine-grained classification”, Pattern Recognition Letters, Vol. 49, pp. 92-98, 2014 [51] S. Dasgupta, “Experiments with random projection”, in: Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence, pp. 143-151, ACM, 2013 [52] G. V. de Lima, P. T. Saito, F. M. Lopes, P. H. Bugatti, “Classification of texture based on bag-of-visual-words through complex networks”, Expert Systems with Applications, Vol. 133, pp. 
215-224, 2019 [53] Statistical Visual Computing Lab–UC San Diego, “UCSD anomaly detection dataset”, available at: http://www.svcl.ucsd.edu/projects/ anomaly/dataset.htm [54] University of Minnesota, “Monitoring Human Activity”, available at: http://mha.cs.umn.edu/