A Robust Approach of Facial Orientation Recognition from Facial Features

Kishor Datta Gupta
Department of Computer Science, Lamar University, 4400 S M L King Jr Pkwy, Beaumont, TX 77705, USA
Phone: +1 409-880-7011
kgupta@lamar.edu

Md Manjurul Ahsan
Department of Industrial Engineering, Lamar University, 4400 S M L King Jr Pkwy, Beaumont, TX 77705, USA
Phone: +1 409-880-7011
mahsan2@lamar.edu

Stefan Andrei
Department of Computer Science, Lamar University, 4400 S M L King Jr Pkwy, Beaumont, TX 77705, USA
Phone: +1 409-880-7011
stefan.andrei@lamar.edu

Kazi Md. Rokibul Alam
Department of Computer Science and Engineering, Khulna University of Engineering and Technology, Khulna 9203, Bangladesh
Phone: +880 41-769468
rokib@cse.kuet.ac.bd

Abstract

Face orientation recognition is an important topic in computer vision and pattern recognition. Because faces are non-rigid, achieving good accuracy and robustness in face orientation recognition is difficult and computationally expensive. In this paper, we propose an image mapping technique for face analysis in smart camera networks, based on feature extraction and on data derived from the facial features. We estimate the face orientation angles in all camera views from the matched image data. Our objective is to obtain a set of facial structures that can serve as landmarks for tracking and for recognizing facial expressions.

Keywords: face detection, recognition, computer vision, human-computer interaction, image

1. Introduction

Most face recognition and tracking techniques employed in surveillance and human-computer interaction (HCI) systems rely on the assumption of a frontal view of the human face. In alternative approaches, knowledge of the orientation angle of the face in captured images can improve the performance of techniques based on non-frontal face views.
Our approach broadly consists of three parts: first, the face is detected by a Haar-based face detection method; then the face is tracked robustly using four extracted facial features; and finally, the orientation of the face is estimated by using the tracking results obtained independently from the three trackers (Viola and Jones; 2001). First, we use the Haar detection method to identify the face area in a picture. This method detects faces reliably, with an accuracy higher than 90%. We then develop a new algorithm for face orientation recognition. The algorithm is based on the combination of four individual tracking-based face orientation estimators that rely on seven properties of the face in question: the variation of face regions, the deformation of face texture patterns, the trajectory of face motion, eye position, mouth position, lip position, and nose position (Chihaoui et al.; 2016) (Chutorian et al.; 2007). The combination is achieved by rendering the data as graph-type images and comparing these images with image matching techniques. The algorithm is reliable and estimates face orientation efficiently.

BRAIN: Broad Research in Artificial Intelligence and Neuroscience, Volume 8, Issue 3, September 2017, ISSN 2067-3957 (online), ISSN 2068-0473 (print)

2. Problem statement

Despite the success of existing approaches and systems for extracting a face's characteristics using computer vision technologies, current efforts in this area focus on using only a single visual cue, such as eyelid movement, line of sight, or head orientation, to characterize a person's state of alertness. A system relying on a single visual cue may encounter difficulty when the required visual features cannot be acquired accurately or reliably (Schiele and Sagerer; 2001). For example, faces with glasses could pose a serious problem to techniques based on detecting eye characteristics.
Glasses can cause glare and may be totally opaque to light, making it impossible for the camera to monitor eye movement. Furthermore, the degree of eye openness may vary from person to person. Another potential problem with the use of a single visual cue is that the obtained visual feature may not always be indicative of one's mental condition. For example, irregular head movement or line of sight (such as briefly looking back or at the mirror) may yield false alarms for such a system. All those visual cues, however imperfect they are individually, can provide an accurate characterization of a person's level of vigilance if combined systematically. It is our belief that simultaneous extraction and use of multiple visual cues can reduce the uncertainty and resolve the ambiguity present in the information from a single source. The system we propose can simultaneously, non-intrusively, and in real time monitor several visual behaviors that usually characterize a person's level of alertness while driving (Tian et al.; 1999). These visual cues include eyelid movement, pupil movement, and face orientation. The fatigue parameters computed from these visual cues are subsequently combined probabilistically to form a composite fatigue index that can robustly, accurately, and consistently characterize one's vigilance level (Zhu et al.; 2004).

3. Methodology

The methodology is composed of two main phases: face feature extraction with creation of a graph image, and matching of the graph image against stored images.

3.1. Face Feature Extraction

This phase consists of four sequential steps: face detection, feature extraction, getting the feature data, and creating an image from these data.

3.1.1. Face detection

The core basis for Haar classifier object detection is identifying the Haar-like features. First, Haar classifier cascades are trained for detecting human facial features, such as the mouth, eyes, and nose (Wilson and Fernandez; 2006) (Chihaoui et al.; 2016).
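To make the notion of a Haar-like feature concrete, the following is a minimal sketch of a two-rectangle feature evaluated through an integral image, the construction popularized by Viola and Jones (2001). It is an illustration only, not the code used in this work; the function names `integral_image`, `rect_sum`, and `two_rect_horizontal` are hypothetical.

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of pixel intensities


def integral_image(img: Image) -> Image:
    """Summed-area table padded with one leading zero row and zero column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: Image, x: int, y: int, w: int, h: int) -> int:
    """Sum of the pixels in the w-by-h rectangle whose top-left corner is (x, y)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_horizontal(ii: Image, x: int, y: int, w: int, h: int) -> int:
    """Two-rectangle Haar-like feature: left half minus right half (w must be even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


# Toy image: a dark left half next to a bright right half (a vertical edge).
img = [[1, 1, 9, 9],
       [1, 1, 9, 9]]
ii = integral_image(img)
feature_value = two_rect_horizontal(ii, 0, 0, 4, 2)
```

Because every rectangle sum costs four table lookups regardless of its size, a cascade can evaluate many such features per window cheaply; a large absolute `feature_value` signals a strong edge at that location.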
To train the classifiers, the gentle AdaBoost algorithm and the Haar feature algorithms must be implemented (Zhu et al.; 2004). Intel developed an open source library devoted to easing the implementation of computer vision related programs, called the Open Computer Vision Library (OpenCV). OpenCV is designed to be used in conjunction with applications that pertain to the fields of HCI, robotics, biometrics, image processing, and other areas where visualization is important, and it includes an implementation of Haar classifier detection and training (Wanjale et al.; 2013).

To train the classifiers, two sets of images are needed. One set contains images or scenes that do not contain the object to be detected, in this case a facial feature. This set of images is referred to as the negative images. The other set of images, the positive images, contains one or more instances of the object. The location of the objects within the positive images is specified by the image name, the upper left pixel, and the height and width of the object. For training the facial feature classifiers, 5,000 negative images with a resolution of at least one megapixel were used (Niese et al.; 2006). These images consisted of everyday objects, like paper clips, and of natural scenery, such as photographs of forests and mountains.

To produce the most robust facial feature detection possible, the original positive set of images needs to be representative of the variance between different people, including race, gender, and age. A good source for these images is the National Institute of Standards and Technology's (NIST) Facial Recognition Technology (FERET) database. This database contains over 10,000 images of over 1,000 people under different lighting conditions, poses, and angles (Wilson and Fernandez; 2006). In training each facial feature, 1,500 images were used.
These images were taken at angles ranging from zero to forty-five degrees from a frontal view. This provides the variance required to allow detection if the head is turned slightly. Three separate classifiers were trained: one for the eyes, one for the nose, and one for the mouth. Once the classifiers were trained, they were used to detect the facial features within another set of images from the FERET database. The accuracy of each classifier was then computed, as shown in Table 1. Except for the mouth classifier, the classifiers have a high rate of detection. However, the false positive rate is also quite high.

Table 1. Accuracy of Classifiers

| Facial Feature | Positive Hit Rate | Negative Hit Rate |
|----------------|-------------------|-------------------|
| Eyes           | 93%               | 23%               |
| Nose           | 100%              | 29%               |
| Mouth          | 67%               | 28%               |

3.1.2. Feature extraction

The first step in facial feature detection is detecting the face. This requires analyzing the entire image. The second step is using the isolated face(s) to detect each feature. The result is shown in Figure 1. Since each portion of the image used to detect a feature is much smaller than the whole image, detection of all three facial features takes less time on average than detecting the face itself. Using a 1.2 GHz AMD processor to analyze a 320 by 240 image, a frame rate of 3 frames per second was achieved. Since a frame rate of 5 frames per second was achieved in facial detection only by using a much faster processor, regionalization provides a tremendous increase in efficiency in facial feature detection.

Figure 1. Face area detection

3.1.3. Get features data

Because of facial symmetry, the horizontal pose can be estimated from the positions of both eyes relative to the face. Our technique depends on four features of a face: right eye, left eye, mouth, and nose. It uses the relative distances and sizes of these features to find the orientation of the face. One of the most important steps is obtaining the position, size, height, width, and angle of these features relative to the face.
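The feature data described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each detection is taken to be an axis-aligned bounding box, measurements are normalized by the face box so they do not depend on image scale, and the names `Box`, `relative_geometry`, and `horizontal_offset` are hypothetical. Treating the angle as the direction from the face center to the feature center is also an assumption.

```python
import math
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned detection box: top-left corner (x, y), width w, height h."""
    x: float
    y: float
    w: float
    h: float

    @property
    def center(self) -> Tuple[float, float]:
        return (self.x + self.w / 2.0, self.y + self.h / 2.0)


def relative_geometry(face: Box, feature: Box) -> Dict[str, float]:
    """Position, size, and angle of a feature, normalized by the face box."""
    fcx, fcy = face.center
    cx, cy = feature.center
    dx = (cx - fcx) / face.w  # horizontal offset from the face center
    dy = (cy - fcy) / face.h  # vertical offset from the face center
    return {
        "dx": dx,
        "dy": dy,
        "width": feature.w / face.w,
        "height": feature.h / face.h,
        # Assumed definition: direction from face center to feature center.
        "angle_deg": math.degrees(math.atan2(dy, dx)),
    }


def horizontal_offset(face: Box, left_eye: Box, right_eye: Box) -> float:
    """Offset of the eye midpoint from the face center, as a fraction of face width.

    Close to 0 for a symmetric (frontal) view; the sign shows which side the
    eyes have shifted toward when the head turns.
    """
    mid_x = (left_eye.center[0] + right_eye.center[0]) / 2.0
    return (mid_x - face.center[0]) / face.w


face = Box(0, 0, 100, 100)
frontal = horizontal_offset(face, Box(20, 30, 20, 10), Box(60, 30, 20, 10))
turned = horizontal_offset(face, Box(30, 30, 20, 10), Box(70, 30, 20, 10))
left_eye_geometry = relative_geometry(face, Box(20, 30, 20, 10))
```

Normalizing by the face box is one design choice among several; it keeps the values comparable between a face close to the camera and one farther away.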
3.1.4. Create image from data

Using these data as the starting point for drawing, the method creates a new picture. The result is a model of the shape of the face without color or facial expression (Gourier et al.; 2004), as in Figure 2.

Figure 2. A sample image from extracted features

3.2. Matching Graph image

The method compares two images over all their pixels and calculates the distance between the pixels of the two images. After summing the distances, it uses a threshold to determine whether the images match. If the image is close enough to a stored image, it is accepted as recognized. If image A1 has a black point (Xa, Ya) and the reference image R1 has a black point (Xr, Yr), then the calculated distance is:

$$ d = \sqrt{(X_a - X_r)^2 + (Y_a - Y_r)^2} $$

3.2.1. Orientation Recognition

After matching the image against the reference images, the method returns the nearest orientation match, which can be one of: Front left, Front right, Down left, Down right, Up left, Up right, Front straight, Up straight, Down straight. Some constraints must first be satisfied for a successful and correct match. The facial regions containing the eye and nose points have the following characteristic: if at most one point is missing for regions of the same type, the comparison is performed. There must also be the same number of feature points for both eyes and for the nose separately. If this condition is satisfied, a new comparison is performed.

4. Experimental Analysis

Our experimental methodology is described in Figure 3.

Figure 3. The methodology flowchart
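The matching rule of Section 3.2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' C# implementation: the black points of an image are given as a list in a fixed feature order so that they can be paired by index, the point distance is taken to be Euclidean, and the reference coordinates, the threshold value, and the names `total_distance` and `nearest_orientation` are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


def total_distance(points: List[Point], reference: List[Point]) -> float:
    """Sum of Euclidean distances between index-paired black points."""
    if len(points) != len(reference):
        # Mirrors the constraint that both images need the same number of points.
        raise ValueError("both images need the same number of feature points")
    return sum(
        math.hypot(xa - xr, ya - yr)
        for (xa, ya), (xr, yr) in zip(points, reference)
    )


def nearest_orientation(
    points: List[Point],
    references: Dict[str, List[Point]],
    threshold: float,
) -> Optional[str]:
    """Label of the closest reference image, or None if none is within threshold."""
    best_label, best_dist = None, math.inf
    for label, ref in references.items():
        d = total_distance(points, ref)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label if best_dist <= threshold else None


# Hypothetical reference points: left eye, right eye, nose, mouth.
references = {
    "Front straight": [(30, 35), (70, 35), (50, 55), (50, 75)],
    "Front left": [(20, 35), (60, 35), (40, 55), (40, 75)],
}
probe = [(22, 35), (61, 36), (41, 55), (40, 74)]

match = nearest_orientation(probe, references, threshold=10.0)
no_match = nearest_orientation(probe, references, threshold=1.0)
```

With a generous threshold the probe is assigned to its closest reference, while a strict threshold rejects it entirely; in practice the threshold would have to be tuned on labeled images.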
We wrote our code in C# and built an application on the .NET Framework that collects facial images using a webcam or other video-grabbing tools. The application then applies Haar detection to extract the facial features and draws the image pattern used for matching the images.

5. Results

We list our results below (Figures 4 and 5), supported by Table 2.

Figure 4. Sample image data 1

For the frontal pose we obtain Figure 5:

Figure 5. Sample image data 2

After image matching, we obtained a positive result 93% of the time for 1000 random sample images tested on the nine orientation classes.

Table 2. Statistics of our matching result

| Face orientation | Total | Missed | Positive | Success Rate |
|------------------|-------|--------|----------|--------------|
| Front Straight   | 150   | 4      | 144      | 99%          |
| Up Straight      | 110   | 20     | 90       | 92%          |
| Down Straight    | 100   | 23     | 77       | 77%          |
| Front Left       | 100   | 12     | 88       | 88%          |
| Front Right      | 120   | 18     | 102      | 82%          |
| Up Left          | 120   | 21     | 89       | 91%          |
| Up Right         | 115   | 22     | 93       | 95%          |
| Down Right       | 100   | 38     | 62       | 62%          |
| Down Left        | 95    | 44     | 51       | 54%          |

5.1. Comparisons with other Methods

We compare our method with three other approaches: multimodal head estimation, collaborative face orientation detection, and estimation of face orientation from robust detection of salient facial structures. Table 3 shows the percentage of poses that each method fails to recognize.

Table 3.

If we consider only the down-facing results (the poses that negatively affect our features), the failure rate of our method is higher than that of the other methods (Table 4).

Table 4.

However, if we exclude the down-facing results, the failure rate of our method is lower than that of the other methods, as shown in Table 5.

Table 5.
Our experimental results therefore show that our method performs considerably better when downward poses are not considered.

6. Conclusion

Template matching is a fundamental task in the field of image processing because, owing to its simplicity of implementation, it is applicable to numerous tasks such as object detection and categorization. Some authors instead provide algorithms for simultaneous face detection and landmark localization using deep convolutional neural networks (CNN), which can capture global and local information in faces (Ranjan et al.; 2016). Other authors suggest normalizing the salient areas in order to align them (Liu et al.; 2017). In this paper, we propose an image matching technique with an increased success rate of obtaining the proper pose estimate for human faces. It could be used in artificial intelligence and game controller development, as well as in traffic control and robot development. Our method is also faster and needs less stored data. Currently, several for-profit and non-profit organizations are working on face recognition from video. NIST (National Institute of Standards and Technology) is also evaluating the accuracy and speed of face recognition algorithms applied to the identification of persons appearing in video (Grother et al.; 2017).

References

Chihaoui, M., Elkefi, A., Bellil, W., & Amar, C. B. (2016). A Survey of 2D Face Recognition Techniques. Computers, vol. 5, no. 21, 28 pages.

Chutorian, E., Doshi, A., & Trivedi, M. (2007). Head pose estimation for driver assistance systems: A robust algorithm and experimental evaluation. In IEEE Intelligent Transportation Systems Conference, pages 709-714.

Gourier, N., Hall, D., & Crowley, J. L. (2004). Facial features detection robust to pose, illumination and identity. In IEEE International Conference on Systems, Man and Cybernetics, vol. 1, pages 617-622.

Grother, P. J., Ngan, M. L., & Quinn, G. W. (2017). Face In Video Evaluation (FIVE): Face Recognition of Non-Cooperative Subjects. NIST Interagency/Internal Report (NISTIR) 8173.

Liu, Y., Li, Y., Ma, X., & Song, R. (2017). Facial Expression Recognition with Fusion Features Extracted from Salient Facial Areas. Sensors, vol. 17, no. 4, 712.

Niese, R., Al-Hamadi, A., & Michaelis, B. (2006). A stereo and color-based method for face pose estimation and facial feature extraction. In IEEE International Conference on Pattern Recognition, pages 299-302.

Ranjan, R., Patel, V. M., & Chellappa, R. (2016). Hyperface: A deep multi-task learning framework for face detection, landmark localization, pose estimation, and gender recognition. arXiv preprint arXiv:1603.01249.

Schiele, B. & Sagerer, G. (2001). Computer Vision Systems, Second International Workshop, ICVS 2001, Vancouver, Canada, July 7-8, 2001, Proceedings. Vol. 2. Springer Science & Business Media.

Tian, Y.-L., Kanade, T., & Cohn, J. F. (1999, December). Recognizing Action Unit for Facial Expression Analysis. Technical Report CMU-RI-TR-99-40, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213. Retrieved from http://www.cs.cmu.edu/~face/Papers/CMU-RI-TR-99-40.pdf

Viola, P. & Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. In Conference on Computer Vision and Pattern Recognition, pages 511–8.

Wanjale, K. H., Bhoomkar, A., Kulkarni, A., & Gosavi, S. (2013, April). Use of Haar Cascade Classifier for Face Tracking System in Real Time Video. International Journal of Engineering Research and Technology, vol. 2, no. 4.

Wilson, P. I. & Fernandez, J. (2006). Facial feature detection using Haar classifiers. Journal of Computing Sciences in Colleges, vol. 21, no. 4, pages 127-133.

Zhu, Z., Ji, Q., & Lan, P. (2004). Real time non-intrusive monitoring and prediction of driver fatigue. IEEE Transactions on Vehicular Technology, vol. 53, no. 4, pages 1052-1068.