Engineering, Technology & Applied Science Research, Vol. 11, No. 1, 2021, 6724-6729, www.etasr.com
Logo Detection Using Deep Learning with Pretrained CNN Models
Salma Sahel, College of Computer and Information Systems, Umm Al-Qura University, Makkah, Saudi Arabia, salma.a.sahel@gmail.com
Mashael Alsahafi, College of Computer and Information Systems, Umm Al-Qura University, Makkah, Saudi Arabia, m.e.alsahafi@hotmail.com
Manal Alghamdi, College of Computer and Information Systems, Umm Al-Qura University, Makkah, Saudi Arabia, maalghamdi@uqu.edu.sa
Tahani Alsubait, College of Computer and Information Systems, Umm Al-Qura University, Makkah, Saudi Arabia, tmsubait@uqu.edu.sa
Abstract—Logo detection in images and videos is a key task for various applications, such as vehicle logo detection for traffic-monitoring systems, copyright infringement detection, and contextual content placement. The main contribution of this work is the application of emerging deep learning techniques to brand and logo recognition tasks through the use of multiple modern convolutional neural network models. In this work, pre-trained object detection models are utilized in order to enhance the performance of logo detection when only a limited number of labeled training images taken in realistic contexts is available, avoiding extensive manual labeling costs. Superior logo detection results were obtained. In this study, the FlickrLogos-32 dataset was used, which is a common public dataset for logo detection and brand recognition from real-world product images. The models were evaluated in terms of both the cost of building them and their accuracy.
Keywords—logo detection; deep learning; convolutional neural networks; FlickrLogos-32
I. 
INTRODUCTION
Logo Detection (LD) is important for several real-world applications [1] because it enhances the ability of users and systems to recognize the identities of items by their brand logos. As a result, LD is an area that many companies aim to explore to captivate their clients and make informed decisions regarding brand development [2]. LD is a sub-field of object detection, which is the task of identifying where a specific object is located in an image. We are interested not only in identifying the existence of an object but also in locating it within the image. There are many classical approaches to object detection, such as key point-based detectors and local feature-based recognition. More recently, there has been a growing interest in using Convolutional Neural Networks (CNNs) to perform object detection tasks. This approach usually starts by capturing images with camera devices, possibly at different resolutions, and then processing the images in order to classify the objects contained within them [3]. More specifically, traditional object detection and classification approaches use three steps: informative region selection, feature extraction, and classification [4]. In region selection, the whole image can be scanned with a multi-scale sliding window, as objects might appear in different positions with various sizes and aspect ratios.
LD differs from Logo Recognition (LR): LR is used to determine a logo's identity in an image, whereas LD is used to determine a logo's position [5]. Both LD and LR are based on classification methods. For example, [6] proposed the use of CNNs for a recognition pipeline comprising a recall-oriented logo region proposal. In general, CNNs are widely used in both LD and LR.
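The multi-scale sliding-window region selection mentioned in the introduction can be illustrated with a minimal sketch. It is not taken from any of the cited works: the function name `sliding_windows`, the base window size, the scale and aspect-ratio sets, and the stride fraction are all assumptions chosen for illustration.

```python
from typing import Iterator, Sequence, Tuple

# Boxes are (x_min, y_min, x_max, y_max) in pixels; this layout is an assumption.
Box = Tuple[int, int, int, int]


def sliding_windows(
    image_w: int,
    image_h: int,
    base_size: int = 64,                               # hypothetical base window side
    scales: Sequence[float] = (1.0, 2.0),              # hypothetical scale set
    aspect_ratios: Sequence[float] = (0.5, 1.0, 2.0),  # width / height
    stride_frac: float = 0.5,                          # step as a fraction of window size
) -> Iterator[Box]:
    """Yield candidate regions covering the image at several scales and ratios."""
    for scale in scales:
        for ratio in aspect_ratios:
            # Keep the window area close to (base_size * scale) ** 2.
            w = int(round(base_size * scale * ratio ** 0.5))
            h = int(round(base_size * scale / ratio ** 0.5))
            if w < 1 or h < 1 or w > image_w or h > image_h:
                continue  # skip windows that cannot fit inside the image
            step_x = max(1, int(w * stride_frac))
            step_y = max(1, int(h * stride_frac))
            for y in range(0, image_h - h + 1, step_y):
                for x in range(0, image_w - w + 1, step_x):
                    yield (x, y, x + w, y + h)
```

Each yielded box would then be passed to feature extraction and classification. The number of windows grows quickly with the number of scales and ratios, which is one reason region proposal methods are attractive.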
Research in LD has proceeded in several directions, in which general object detection with deep learning methods has been very successful. Building a deep object detection model typically requires a large amount of labeled training data collected through extensive manual annotation. In this paper, R-CNN, Faster R-CNN (FR-CNN), and RetinaNet were used for logo detection. This paper also presents the current LD methods and sheds light on common datasets used in previous research.
II. LITERATURE REVIEW
One of the aims of this research is to investigate the area of LD and examine the state-of-the-art approaches. Because LD is a very fast-growing area and we wanted to explore its new trends, we focused on papers published during the last five years or papers that have shown a considerable impact in terms of number of citations. Table I summarizes the studied and referenced articles.
Corresponding author: Tahani Alsubait
TABLE I. 
SUMMARY OF THE REVIEWED ARTICLES
Reference | Year | Approach | Dataset | Performance
[7] | 2019 | YOLOv3 | VLD-3 | 76.6-98.8
[8] | 2017 | Faster R-CNN | FlickrLogos | 0.52-0.67
[1] | 2019 | OSLD - SDML | BLAC and Flickr-32 | 84.5-90.9
[9] | 2018 | RetinaNet | INbreast - GURO | 86-1.00
[10] | 2017 | Faster R-CNN | FlickrLogos | mAP 51-66%
[11] | 2020 | Faster R-CNN and YOLOv2 | WebLogo-2M | mAP 36.8-46.9
[12] | 2017 | DenseNet16, ResNet101, VGG16 | FlickrLogos-32, SportsLogos | 0.37-0.46
[13] | 2017 | Scalable Logo Self Training (SLST) | WebLogo-2M | mAP 34.37%
[14] | 2007 | ANN and Support Vector Machine (SVM), Fisher classifier | Tobacco-800 | 39.3-84.2%
[5] | 2015 | RCNN, FRCN, SPPnet | Logos-18 test set, Logos-160 test set | 81.3-95.2%
[15] | 2019 | Faster R-CNN, SSD, YOLOv3 | FlickrLogos-32, PL2K | 0.565 mAP
[6] | 2017 | CNN | FlickrLogos-32, Logos-32plus | 0.1-95.8%
[2] | 2016 | Fast Region-based Convolutional Networks (FRCN) | ILSVRC, FlickrLogos-32 | mAP 54.5-73.74%
[16] | 2018 | Retina U-Net, Mask R-CNN, Faster R-CNN+, U-Faster R-CNN+, DetU-Net | LIDC-IDRI | mAP 29-50.5%
[17] | 2017 | Faster R-CNN | FlickrLogos-32, TopLogo-10 | mAP 20.5-81.1%
[18] | 2020 | DenseNet-CNN | FlickrLogos-32 | 92.8%
A. Logo Detection Methods
Several frameworks have been proposed in the LD area, and experimental evaluations show different results. For example, in [2], CNNs were shown to be a reliable technique for LD. A CNN contains several layers that are similar to the simple and complex cells in the primary visual cortex. In particular, it adopts a hierarchical structure that enables the recognition of visual patterns directly from image pixels. The convolutional layers alternate with subsampling layers of varying sizes, which helps to build clearer image representations and improves LD. Other researchers [11, 13] indicated that scalable logo self-co-learning can self-discover informative training images from noisy web data. In particular, this can help to enhance model capacity in a cross-model co-learning approach [13].
The scalability of the approach is of concern because it determines the level of categorization possible and the efficiency of the LD approach. Such a process enables the collection of many unconstrained images, which need not undergo any fine-grained instance-level labeling. However, in some cases the use of proxies for scalable LR enables the classification and identification of the logos [15]. This is because there is no clear definition of a logo, and logos vary as much as brands do. The proxies, however, facilitate re-training to integrate all variations. Although it may seem impractical, the utilization of proxies in scalable LR involves the use of a universal logo detector and a few-shot logo recognizer. As a result, the use of proxies can become a reliable tool because it enables users to assess an image better. Additionally, companies can utilize proxies in scalable LR to achieve high precision and make informed decisions. Authors in [16] used Retina U-Net, which exploits semantic segmentation supervision, for detecting objects in medical images. Retina U-Net can also be a useful and reliable tool for logo detection, classification, and image markup. Companies can rely on this technology to enhance the visibility of their brands, making them more attractive and more easily valued by customers. In this regard, Retina U-Net can become a suitable tool through which companies can attract customers. Authors in [14] proposed an LD method that uses a boosting approach across multiple image scales for LD and extraction in document images. It uses a trained Fisher classifier to perform an initial classification based on features from the document context and connected components.
In particular, each logo candidate region is classified by a cascade of simple classifiers at successively finer scales, which allows the detected regions to be refined and false alarms to be discarded. Early research reported weak performance of Faster R-CNN on small objects [10]. Therefore, the researchers in [8, 10] introduced an improved approach based on deep neural networks, in which the convolutional layers extract gradually more abstract feature representations by applying previously learned convolutions followed by a non-linear activation function to the image. More recently, authors in [18] proposed a transfer-learning-based method aided by the use of Densely Connected Convolutional Networks (DenseNet) for logo recognition. They applied their proposed method to the FlickrLogos-32 dataset. The Computer Aided Diagnosis (CAD) system in [9] utilizes a mass detection model based on RetinaNet. RetinaNet is a deep CNN designed as a one-stage object detector that is fast and efficient while achieving improved performance. However, new research in this area [11] argues that current LD approaches typically consider a small number of logo classes, with limited images per class, and assume fine-grained annotation of each object. This limits their scalability to dynamic real-world applications. The approach proposed in [11] tries to overcome these challenges by avoiding manual labeling and directly learning from web data. In particular, it proposes an incremental learning approach named Scalable Logo Self-Co-Learning (SL²). This method can automatically self-discover informative training images from noisy web data to progressively improve model capability in a cross-model co-learning manner.
B. Datasets Used
The datasets that have been used in recent studies come from various sources. Authors in [14] utilized the Tobacco-800 dataset, which is drawn from a collection of 42 million document pages, whereas authors in [8, 10] used four publicly available datasets. Authors in [11] introduced a very large logo dataset named "WebLogo-2M", which contains 2,190,757 images of 194 logo classes, by designing an automatic approach for data collection and processing that samples web logo images from social media sources (Twitter). In other studies, such as [2, 6, 15], the public dataset FlickrLogos-32 was used. This dataset is one of the most commonly used datasets in the field of logos. It contains 8,248 pictures covering 32 brands. Authors in [15] introduced a new logo dataset called PL2K, containing 2000 logos and 295K images collected from Amazon.
The dataset used in the current paper is FlickrLogos-32 [19]. This dataset has 32 logos, each with many examples, and it also has well-defined annotations. It is the most prevalent logo detection dataset, and it has been used in this research to allow comparisons with existing approaches in the area. Sample data from this dataset can be seen in [6, 19].
C. Evaluation Stage
The considered methods have been evaluated in a variety of ways. The framework presented in [10] was evaluated on the dataset with the aim of improving the detection performance on small object instances. The evaluation of the approach presented in [8] showed an improvement of the RPN performance from 0.52 to 0.71 (mAP) and of the detection performance from 0.52 to 0.67 (mAP). A description of the evaluation metrics will be presented in the following sections.
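Because the mAP figures quoted in this section depend on how a predicted box is matched against the ground truth, the following minimal sketch shows the two ingredients: the IoU of two boxes and the mean over per-class AP values. It is an illustration under stated assumptions, not the evaluation code of any cited work. The `(x_min, y_min, x_max, y_max)` box layout, the function names, and the strict `>` comparison at the threshold are assumptions.

```python
from typing import Sequence, Tuple

# Boxes are (x_min, y_min, x_max, y_max); this layout is an assumption for illustration.
Box = Tuple[float, float, float, float]


def iou(pred: Box, truth: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(pred[2], truth[2]) - max(pred[0], truth[0]))
    inter_h = max(0.0, min(pred[3], truth[3]) - max(pred[1], truth[1]))
    intersection = inter_w * inter_h
    area_pred = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_truth = (truth[2] - truth[0]) * (truth[3] - truth[1])
    union = area_pred + area_truth - intersection
    return intersection / union if union > 0 else 0.0


def is_true_positive(pred: Box, truth: Box, threshold: float = 0.5) -> bool:
    """A detection counts as correct when its IoU exceeds the threshold."""
    return iou(pred, truth) > threshold


def mean_average_precision(ap_per_class: Sequence[float]) -> float:
    """mAP as the plain mean of already-computed per-class AP values."""
    if not ap_per_class:
        raise ValueError("at least one per-class AP value is required")
    return sum(ap_per_class) / len(ap_per_class)
```

For example, two 2×2 boxes offset by one pixel in each direction overlap in a 1×1 square, so their IoU is 1/7 and the detection would not count as correct at a threshold of 0.5.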
Other researchers [2] measured LD performance with the Average Precision (AP) for every logo class and the mean AP (mAP) over all classes, where a detection is considered correct when the Intersection over Union (IoU) exceeds 50%. In addition, two evaluations per detector type (single shot and five shot) were used in [15]. The Faster R-CNN model worked best with a mAP of 0.56558, where the mAP was determined from region proposals with a class detection threshold of 0.5. The proposed models were trained on PL2K and evaluated on FlickrLogos-32, achieving a new state-of-the-art performance of 56.55% mAP. In the evaluation stage in [15], the researchers used an evaluation metric based on labeled ground truth to measure the quality of LD. In [11], extensive comparative evaluations demonstrated the superiority of SL² over state-of-the-art web data learning methods and over strongly and weakly supervised detection models. The experimental evaluation in [9] shows that the considered model extracts inconsistent mass features from the single dataset as well as from the combined dataset, whose mammograms are collected from different sources, which suggests that the model can be applied to different groups. The model proposed in [9] was also evaluated in a setup that uses weights pre-trained on GURO, with training and testing on INbreast. This shows that using the pre-trained weights produces the same performance as directly using the datasets in the training stage.
III. METHODOLOGY
The studied dataset has been examined with the use of several pre-trained deep learning models for object detection based on CNNs.
A. CNN Models
1) RetinaNet
RetinaNet [20] consists of a backbone network and two sub-networks that operate on the feature maps produced by the backbone.
One classification subnet classifies the instances in the image, and one regression subnet regresses the bounding box. The model workflow is:
• Loading and preparing training data.
• Training a deep neural network using RetinaNet.
• Evaluating the model.
• Using the model for inference.
2) Faster R-CNN
Faster R-CNN [22] integrates the region proposal algorithm into the CNN model. The Faster R-CNN model is composed of a Region Proposal Network (RPN) and a Fast R-CNN with shared convolutional feature layers. The model workflow is:
• Start from a CNN pre-trained for image classification.
• Fine-tune the RPN for the region proposal task, initialized from the pre-trained image classification model, where positive samples have IoU > 0.7 and negative samples have IoU < 0.3.
  o Slide a small n×n spatial window over the convolutional feature map of the entire image.
  o At the center of each sliding window, predict multiple regions of various scales and ratios simultaneously. An anchor is a combination of sliding window center, scale, and ratio. For instance, 3 scales and 3 ratios give k=9 anchors at each sliding position.
• Train the Fast R-CNN detection model using the proposals generated by the current RPN.
• Use the Fast R-CNN network to initialize the RPN training. While keeping the shared convolutional layers fixed, fine-tune only the RPN-specific layers. At this stage, the RPN and the detection network share convolutional layers.
3) R-CNN
R-CNN [21] is one of the most common deep learning frameworks used to detect objects on a large scale. It combines CNNs with region proposals. The model workflow is as follows:
• Train the CNN for image classification.
• Propose category-independent regions of interest by selective search.
• Warp the region candidates to the fixed size required by the CNN.
• Continue fine-tuning the CNN on the warped proposal regions for K + 1 classes. The extra class refers to the background (no object of interest).
• For every image region, one forward propagation through the CNN generates a feature vector.
• Reduce the localization errors.
B. Evaluation Measures
1) Localization and Intersection over Union
As can be seen in [22], the quality of object detection on a dataset can be estimated by the IoU, which is commonly used in the evaluation stage [23]. How strictly a predicted object location is judged is determined by the evaluation of the localization function of the model. Localization is usually accomplished by drawing a bounding box around the desired object [22]. The localization is evaluated by the IoU:
$$\mathrm{IoU} = \frac{\mathrm{area}(B_{p} \cap B_{gt})}{\mathrm{area}(B_{p} \cup B_{gt})} > 0.5 \qquad (1)$$
where $B_{p}$ is the predicted bounding box and $B_{gt}$ is the ground-truth bounding box.
Each object is normally associated with one bounding box, but in some cases more than one bounding box is predicted. When more than one bounding box refers to the same object, one box is counted as a True Positive (TP) and the other as a False Positive (FP). An object is counted as a False Negative (FN) when there is no predicted bounding box for it.
2) Mean Average Precision
mAP is one of the most popular metrics for measuring the accuracy of object detectors, e.g. R-CNN. It is based on the precision and recall of the detected bounding boxes. Precision and recall measure how well the network retrieves the relevant objects and how well it rejects invalid detections. mAP condenses the information produced by precision-recall into a single score. The higher the mAP score, the more accurate the prediction.
$$\mathrm{mAP} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{AP}_{i} \qquad (2)$$
where N is the number of classes and $\mathrm{AP}_{i}$ is the AP of class i. The AP is measured over various thresholds, i.e., it is the average of the precision values at various recall levels [24]. The current study aligns with the approach in [24]. In particular, in the current study, the AP curves for each logo class are based on the pre-trained models (R-CNN, Faster R-CNN, RetinaNet).
IV. IMPLEMENTATION
Initially, we cleaned the dataset and checked that the annotation files were correct. Then the dataset was fed into each of the training models. Since building a model takes a long time, each model needs to be stored and then reused. After that, we chose sample pictures from the dataset to train each model in order to produce its weights. The remaining images were used for testing the models in order to find each one's testing accuracy. For the implementation, a Colab notebook was used to execute code on Google's cloud servers. The hardware specifications were: CPU: AMD Ryzen 7, GPU: 2080 Ti, RAM: 16GB. The GPU capacity needed for the build was 2.3GB for R-CNN, 1.5GB for FR-CNN, and 3.1GB for RetinaNet.
V. RESULTS AND DISCUSSION
It was found that R-CNN takes more time and GPU space than FR-CNN but obtains higher accuracy. Hence, the increased accuracy comes at a cost. This is because FR-CNN first applies the CNN to the whole image and then derives the regions, whereas R-CNN generates the regions first and then applies the CNN to each of them. When comparing R-CNN and FR-CNN to RetinaNet, it was found that RetinaNet takes less test time than R-CNN, demands more training time than FR-CNN, and takes more space than both R-CNN and FR-CNN because it examines more layers, since the quality of the image matters less to RetinaNet.
Fig. 1. R-CNN performance.
As shown in Figure 1, there is a considerable increase in the testing accuracy and a decrease in the training loss during the first 1000 iterations.
Similarly, as shown in Figure 2, as the iterations increase, the training loss decreases linearly and the test accuracy increases significantly. For the first 600 iterations the training loss is close to 0.3. As shown in Figure 3, as the training iterations increase, the loss decreases linearly and the test accuracy increases significantly for the first 3000 iterations, after which the training loss is close to 0. Table II compares the results of the current and previous research. Our Faster R-CNN model outperformed the Faster R-CNN results of previous research by 12% in mAP. We achieved high accuracy by training the R-CNN model on the dataset, reaching 99.8%, a difference of 8.8% from [6]. Our RetinaNet model with threshold 0.5 also achieved a high accuracy of 95.2%, a difference of 0.6% from the model presented in [24] at the same threshold.
Fig. 2. Faster R-CNN performance.
Fig. 3. RetinaNet performance.
TABLE II. RESULT COMPARISON FOR METHODS THAT USED THE FLICKRLOGOS-32 DATASET
Method | Precision | Recall | Accuracy | mAP
[25] | 0.909 | 0.845 | 0.884 | -
[26] | - | - | 0.896 | -
[2] | 0.955 | 0.908 | - | -
[6] | 0.976 | 0.676 | 0.910 | -
[18] | - | - | 0.928 | -
[12] | - | - | - | 0.464
[15] | - | 0.798 | - | 0.565
[22] RetinaNet: threshold 0.5 | 0.948 | 0.829 | 0.946 | 0.620
Ours: RetinaNet: threshold 0.5 | - | - | 0.952 | 0.40
Ours: R-CNN | - | - | 0.998 | 0.65
Ours: Faster R-CNN | - | - | 0.936 | 0.74
VI. CONCLUSION
CNNs are commonly applied to LD. Many network architectures have been applied, resulting in varying accuracy. The task involves many challenges, as logos may appear at any scale and position and under different perspectives in an image. The traditional techniques for LR include key point-based detectors and local feature-based recognition.
However, by using different CNN models such as R-CNN and FR-CNN, better results can be achieved, as discussed in this study. As newer CNN models become available, more experiments should be conducted, taking into account the trade-offs between accuracy, training cost, and development cost.
REFERENCES
[1] M. Bastan, H.-Y. Wu, T. Cao, B. Kota, and M. Tek, “Large Scale Open-Set Deep Logo Detection,” Nov. 2019, Accessed: Jan. 08, 2021. [Online]. Available: http://arxiv.org/abs/1911.07440.
[2] G. Oliveira, X. Frazao, A. Pimentel, and B. Ribeiro, “Automatic graphic logo detection via Fast Region-based Convolutional Networks,” in International Joint Conference on Neural Networks, Vancouver, Canada, Jul. 2016, pp. 985–991, https://doi.org/10.1109/IJCNN.2016.7727305.
[3] P. Chakraborty and C. Tharini, “Pneumonia and Eye Disease Detection using Convolutional Neural Networks,” Engineering, Technology & Applied Science Research, vol. 10, no. 3, pp. 5769–5774, Jun. 2020, https://doi.org/10.48084/etasr.3503.
[4] M. Salemdeeb and S. Erturk, “Multi-national and Multi-language License Plate Detection using Convolutional Neural Networks,” Engineering, Technology & Applied Science Research, vol. 10, no. 4, pp. 5979–5985, Aug. 2020, https://doi.org/10.48084/etasr.3573.
[5] S. C. H. Hoi et al., “LOGO-Net: Large-scale Deep Logo Detection and Brand Recognition with Deep Region-based Convolutional Networks,” Nov. 2015, Accessed: Jan. 08, 2021. [Online]. Available: http://arxiv.org/abs/1511.02462.
[6] S. Bianco, M. Buzzelli, D. Mazzini, and R. Schettini, “Deep learning for logo recognition,” Neurocomputing, vol. 245, pp. 23–30, Jul. 2017, https://doi.org/10.1016/j.neucom.2017.03.051.
[7] S. Yang, J. Zhang, C. Bo, M. Wang, and L. Chen, “Fast vehicle logo detection in complex scenes,” Optics & Laser Technology, vol. 110, pp. 196–201, Feb. 2019, https://doi.org/10.1016/j.optlastec.2018.08.007.
[8] C. Eggert, D. Zecha, S. Brehm, and R. Lienhart, “Improving small object proposals for company logo detection,” in ACM on International Conference on Multimedia Retrieval, New York, USA, Jun. 2017, pp. 167–174.
[9] H. Jung et al., “Detection of masses in mammograms using a one-stage object detector based on a deep convolutional neural network,” PLoS ONE, vol. 13, no. 9, Sep. 2018, Art. no. e0203355.
[10] C. Eggert, S. Brehm, A. Winschel, D. Zecha, and R. Lienhart, “A closer look: Small object detection in faster R-CNN,” in IEEE International Conference on Multimedia and Expo, Hong Kong, China, Jul. 2017, pp. 421–426, https://doi.org/10.1109/ICME.2017.8019550.
[11] H. Su, S. Gong, and X. Zhu, “Scalable logo detection by self co-learning,” Pattern Recognition, vol. 97, Jan. 2020, Art. no. 107006, https://doi.org/10.1016/j.patcog.2019.107003.
[12] A. Tuzko, C. Herrmann, D. Manger, and J. Beyerer, “Open Set Logo Detection and Retrieval,” Oct. 2017, Accessed: Jan. 08, 2021. [Online]. Available: http://arxiv.org/abs/1710.10891.
[13] H. Su, S. Gong, and X. Zhu, “WebLogo-2M: Scalable Logo Detection by Deep Learning from the Web,” in IEEE International Conference on Computer Vision Workshops, Venice, Italy, Oct. 2017, pp. 270–279, https://doi.org/10.1109/ICCVW.2017.41.
[14] G. Zhu and D. Doermann, “Automatic Document Logo Detection,” in Ninth International Conference on Document Analysis and Recognition, Parana, Brazil, Sep. 2007, vol. 2, pp. 864–868, https://doi.org/10.1109/ICDAR.2007.4377038.
[15] I. Fehervari and S. Appalaraju, “Scalable Logo Recognition Using Proxies,” in IEEE Winter Conference on Applications of Computer Vision, Waikoloa Village, USA, Jan. 2019, pp. 715–725, https://doi.org/10.1109/WACV.2019.00081.
[16] P. F. Jaeger et al., “Retina U-Net: Embarrassingly Simple Exploitation of Segmentation Supervision for Medical Object Detection,” vol. 1811, Nov. 2018, Accessed: Jan. 08, 2021. [Online]. Available: http://adsabs.harvard.edu/abs/2018arXiv181108661J.
[17] H. Su, X. Zhu, and S. Gong, “Deep Learning Logo Detection with Data Expansion by Synthesising Context,” in IEEE Winter Conference on Applications of Computer Vision, Santa Rosa, USA, Mar. 2017, pp. 530–539, https://doi.org/10.1109/WACV.2017.65.
[18] A. Alsheikhy, Y. Said, and M. Barr, “Logo Recognition with the Use of Deep Convolutional Neural Networks,” Engineering, Technology & Applied Science Research, vol. 10, no. 5, pp. 6191–6194, Oct. 2020, https://doi.org/10.48084/etasr.3734.
[19] S. Romberg, L. G. Pueyo, R. Lienhart, and R. van Zwol, “Scalable logo recognition in real-world images,” in Proceedings of the 1st ACM International Conference on Multimedia Retrieval, Apr. 2011, Art. no. 25, https://doi.org/10.1145/1991996.1992021.
[20] T. Lin, P. Goyal, R. Girshick, K. He, and P. Dollar, “Focal Loss for Dense Object Detection,” in IEEE International Conference on Computer Vision, Venice, Italy, Oct. 2017, pp. 2999–3007, https://doi.org/10.1109/ICCV.2017.324.
[21] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” Oct. 2014, Accessed: Jan. 08, 2021. [Online]. Available: http://arxiv.org/abs/1311.2524.
[22] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, Jun. 2017, https://doi.org/10.1109/TPAMI.2016.2577031.
[23] F. Al-Azzo, A. M. Taqi, and M. Milanova, “Human Related-Health Actions Detection using Android Camera based on TensorFlow Object Detection API,” International Journal of Advanced Computer Science and Applications, vol. 9, no. 10, pp. 9–23, 2018.
[24] F. S. Herrera and J. M. Saavedra, “DLDENet: Deep Local Directional Embeddings with Increased Foreground Focal Loss for object detection,” in 38th International Conference of the Chilean Computer Science Society, Concepcion, Chile, Nov. 2019, pp. 1–8, https://doi.org/10.1109/SCCC49216.2019.8966436.
[25] S. Bianco, M. Buzzelli, D. Mazzini, and R. Schettini, “Logo Recognition Using CNN Features,” in Image Analysis and Processing — ICIAP 2015, V. Murino and E. Puppo, Eds. New York, USA: Springer, 2015, pp. 438–448.
[26] F. N. Iandola, A. Shen, P. Gao, and K. Keutzer, “DeepLogo: Hitting Logo Recognition with the Deep Neural Network Hammer,” Oct. 2015, Accessed: Jan. 08, 2021. [Online]. Available: http://arxiv.org/abs/1510.02131.