Knowledge Engineering and Data Science (KEDS), Vol 4, No 1, July 2021, pp. 49–54
pISSN 2597-4602, eISSN 2597-4637
https://doi.org/10.17977/um018v4i12021p49-54

Face Images Classification using VGG-CNN

I Nyoman Gede Arya Astawa a,1,*, Made Leo Radhitya b,2, I Wayan Raka Ardana a,3, Felix Andika Dwiyanto c,4

a Electrical Engineering Department, Politeknik Negeri Bali, Kampus Jimbaran, Badung, Bali, 80361 Indonesia
b Department of Informatics, STMIK STIKOM Indonesia, Tukad Pakerisan 97, Denpasar, Bali, 80225 Indonesia
c Association for Scientific Computing Electronics and Engineering (ASCEE), Jl. Janti, Karangjambe 130B, Banguntapan, Bantul, Yogyakarta, Indonesia
1 arya_kmg@pnb.ac.id*; 2 leo.radhitya@stiki-indonesia.ac.id; 3 rakawyn@pnb.ac.id; 4 felix@ascee.org
* corresponding author

ARTICLE INFO
Article history: Received 4 March 2021; Revised 29 March 2021; Accepted 4 April 2021; Published online 17 August 2021
Keywords: Classification; CNN; Face Image; VGG
This is an open access article under the CC BY-SA license (https://creativecommons.org/licenses/by-sa/4.0/).

ABSTRACT
Image classification is a fundamental problem in computer vision. In facial recognition, image classification can speed up the training process and significantly improve accuracy. Deep learning methods are now commonly used in facial recognition; one of them is the Convolutional Neural Network (CNN), which achieves high accuracy. This study combines CNN for facial recognition with VGG for the classification process. The process begins by inputting a face image. A feature-extractor preprocessor is then used for transfer learning, with the pre-trained VGG-face model as the transfer learning architecture. The features extracted from an image are numeric vectors, which the model uses to describe specific characteristics of the image. The face images are split into 17% test data and 83% training data.
The results show that the accuracy, validation accuracy (val_accuracy), loss, and validation loss (val_loss) values are excellent. The best training results were obtained on images from a digital camera with the modified classification layers: val_accuracy is very high (99.84%) and not far from the accuracy value (94.69%). This small gap indicates a good model, since too large a gap would indicate underfitting, while an accuracy value higher than the validation accuracy would indicate overfitting. Likewise, the val_loss (0.69%) and loss (10.41%) values are both low.

I. Introduction

Facial recognition is one of the most widely studied fields in biometrics because of its high level of difficulty [1][2]. Image classification, in particular, is part of the facial recognition process and remains an open problem in computer vision [3]. Classification helps accelerate training because the data have already been grouped before the training process, and the choice of classification method also determines the accuracy achieved during training [4]. Popular classifiers in facial recognition include Euclidean distance, KNN, SVM, PCA, and CNN [5][6].

Studies that apply deep learning methods currently give better results in facial recognition [7], and the most compelling image recognition method is the Convolutional Neural Network (CNN) [8]. Recent research shows that transfer learning is the basis for image classification [7][9][10] and that CNN provides significant results. Moreover, for binary image classification, the combination of the ReLU activation function and a Sigmoid classifier gives the best classification accuracy [11]. Other studies report that the activation function strongly influences a system's accuracy in identifying and recognizing mushroom images [12].

This study aims to classify facial images using the CNN method. The pre-trained model used is the VGG-face model [8], whose significant results were obtained using 16–19 weight layers. The classification modeling in this study changes the last layers of the CNN.

II. Method

This study involves several processes to achieve the expected result: data collection, feature-map extraction, classification modeling, and validation testing of the results. It uses the KomNet dataset [13], consisting of 36,600 face images of 224×224 pixels. In computer vision, transfer learning is commonly realized through a pre-trained model, typically imported from an existing library. The next step is to create a new convolutional neural network (CNN) model for multiclass image classification. This classification model is generated through a transfer learning approach based on a pre-trained CNN model [14]. CNNs have generally proved superior in a variety of computer vision tasks [15], and Convolutional Networks (ConvNets) have shown excellent performance in handwritten digit classification and face detection [16].

Figure 1 outlines the CNN processes in the system. The process begins by inputting the face image. The method used for transfer learning is a feature-extractor preprocessor. This study uses the VGG-face model as the pre-trained architecture for transfer learning. The features extracted from an image are numeric vectors, which the model uses to describe specific characteristics of the image. The VGG-face model was selected because it is well suited to extracting facial features [17]; the feature extractor uses the 16-layer VGG-face architecture. After the VGG-face model is applied, its last layers are modified to achieve the best result. Figure 2 presents the VGG-face architecture, in which the last three layers form the classification part to be modified. The first-layer features are general and the last-layer features are specific, so there must be a transition from general to specific somewhere in the network [18].
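For illustration, the feature-extraction step described above can be sketched as follows. This is a minimal sketch assuming TensorFlow/Keras; the stock VGG16 ImageNet backbone is used here only as a stand-in for the VGG-face weights used in the study, and the file path is hypothetical.

# Minimal feature-extraction sketch (assumption: TensorFlow/Keras; VGG16
# ImageNet weights stand in for the VGG-face weights used in the paper).
import numpy as np
import tensorflow as tf

# Convolutional base only (include_top=False); global average pooling turns
# the last feature map into a single numeric vector per image.
feature_extractor = tf.keras.applications.VGG16(
    weights="imagenet", include_top=False,
    input_shape=(224, 224, 3), pooling="avg")

def extract_features(image_path):
    """Load a face image, resize it to 224x224, and return its feature vector."""
    img = tf.keras.preprocessing.image.load_img(image_path, target_size=(224, 224))
    arr = tf.keras.preprocessing.image.img_to_array(img)        # shape (224, 224, 3)
    arr = tf.keras.applications.vgg16.preprocess_input(arr)     # channel-wise preprocessing
    return feature_extractor.predict(arr[np.newaxis, ...])[0]   # 1-D numeric feature vector

x_example = extract_features("train/person_01/face_0001.jpg")   # hypothetical path
print(x_example.shape)

With average pooling the extractor returns one feature vector per image; in the study, the corresponding VGG-face vectors are collected into the training and test arrays described below.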
The pre-trained strategy leaves the initial layers untrained and trains only the final layers to avoid overfitting [19]. The initial layers perform convolution, or feature extraction, while the last layers perform classification; in the VGG-face architecture the last three layers are fully connected + ReLU.

Fig. 1. CNN process
Fig. 2. VGG-face architecture

The last three layers are modified to provide better performance. The following code defines the modified last layers.

#LAST LAYER
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, BatchNormalization, Activation, Dropout

classifier_model = Sequential()
# x_train holds the feature vectors produced by the VGG-face extractor.
classifier_model.add(Dense(units=100, input_dim=x_train.shape[1],
                           kernel_initializer='glorot_uniform'))
classifier_model.add(BatchNormalization())
classifier_model.add(Activation('tanh'))
classifier_model.add(Dropout(0.3))
classifier_model.add(Dense(units=10, kernel_initializer='glorot_uniform'))
classifier_model.add(BatchNormalization())
classifier_model.add(Activation('tanh'))
classifier_model.add(Dropout(0.2))
classifier_model.add(Dense(units=24, kernel_initializer='he_uniform'))  # 24 output classes
classifier_model.add(Activation('softmax'))
classifier_model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(),
                         optimizer='nadam',
                         metrics=['accuracy'])

The last layers of the VGG-face model are the fully connected layers before the output layer. These layers provide a complex set of features describing an input image and therefore give useful input when training a new image classification model.

After the pre-trained model up to the last VGG-face layer is loaded, the next step is to create the training and test data. This consists of five stages. The first stage reads each face image in the train or test folder at a target size of 224 (224×224 pixels). The second stage converts the image into an array. The third stage feeds the array into the truncated VGG-face model, and the fourth stage stores the resulting feature vector in the training data array or the test data array. The last stage repeats these steps until all face images in the train and test folders have been read.

After the model is built, the training process is run for 100 epochs. This process produces the weight values, which are stored in a file in h5 format. Tests are then performed to validate the results of training on facial images from several devices, and the test results for each device are displayed in graphical form.
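The paper does not show the training call itself; a minimal sketch of this step, assuming the classifier head defined above, feature arrays x_train/x_test, and integer labels y_train/y_test, could look like the following. The epoch count and the h5 weight file follow the text; using the 17% test split as validation data, the batch size, and the file name are assumptions.

# Minimal training sketch (assumptions noted in the comments).
history = classifier_model.fit(
    x_train, y_train,
    validation_data=(x_test, y_test),  # assumption: the 17% test split is used for validation
    epochs=100,                        # 100 epochs, as described in the text
    batch_size=32)                     # batch size not reported; 32 is a placeholder

# Store the learned weights in h5 format, as described in the text.
classifier_model.save_weights("face_classifier.weights.h5")  # hypothetical file name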
III. Results and Discussions

A large amount of data is needed to produce an ideal result. The pre-trained model is a model provided through TensorFlow or Keras and can be used directly from the VGG-face Keras library. After the model is built, the next step is the training process with 100 epochs; limiting training to 100 epochs bounds the number of iterations over the large amount of data, which would otherwise take a long time in one training session. Deep learning nevertheless has the weakness of a long training process on a server computer, which can be overcome using Graphical Processing Unit (GPU) technology [9][20]. This study used the GPU provided by Google Colab for the training process. The result of the training process is the set of weight values, which is stored in an h5 file. The training results at epoch 100 for the three sources of face images are presented in Table 1.

Table 1. Training results of three image sources at epoch 100
Image source     Train images  Test images  Accuracy (%)  Val_accuracy (%)  Loss (%)  Val_loss (%)
Mobile phone        11,000        2,200        94.05          98.69           12.32       6.29
Digital camera      11,000        2,200        94.69          99.84           10.41       0.69
Social media        11,000        2,200        93.02          92.75           20.07      38.20

Table 1 shows the results of training facial images from three devices at epoch 100. In the training process, the facial images are divided into two sets: 17% as test data and 83% as training data. The accuracy, val_accuracy, loss, and val_loss values are impressive. The best training result comes from the digital camera images with the modified classification layers: the val_accuracy is very high (99.84%) and not far from the accuracy value (94.69%). This small difference indicates a good model, because too large a difference would indicate underfitting, while an accuracy value higher than the validation accuracy would indicate overfitting. Moreover, the val_loss is very low (0.69%) and the loss is 10.41%; this is the smallest loss of the three sources, which means the model is ideal and suitable for prediction.

The training results from start to finish are presented graphically. Figure 3 shows the facial image training results at epoch 100 with the modified classification. The figure shows that the model (with the last three layers modified) is good and ideal, since the difference between accuracy and val_accuracy is insignificant; likewise, the difference between loss and val_loss is relatively small and the values are close.

Fig. 3. Training results at epoch 100 with the modified classification for facial images sourced from (a) mobile phones, (b) digital cameras, and (c) social media
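The paper presents these curves only as figures. A minimal sketch of how such accuracy and loss curves can be plotted from the Keras training history is shown below, assuming the history object returned by the training sketch above and matplotlib; neither the plotting code nor the library choice is specified in the paper.

# Minimal sketch for plotting curves like those in Fig. 3 (assumption:
# `history` is the object returned by classifier_model.fit(...) above).
import matplotlib.pyplot as plt

fig, (ax_acc, ax_loss) = plt.subplots(1, 2, figsize=(10, 4))

# Accuracy vs. validation accuracy per epoch.
ax_acc.plot(history.history["accuracy"], label="accuracy")
ax_acc.plot(history.history["val_accuracy"], label="val_accuracy")
ax_acc.set_xlabel("epoch")
ax_acc.legend()

# Loss vs. validation loss per epoch.
ax_loss.plot(history.history["loss"], label="loss")
ax_loss.plot(history.history["val_loss"], label="val_loss")
ax_loss.set_xlabel("epoch")
ax_loss.legend()

plt.tight_layout()
plt.show()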
IV. Conclusion

This study applied a pre-trained model with the VGG-face architecture and modified its last three layers, the classification section. The model gives very high accuracy and a very low loss, which indicates that it is a good model, suitable for prediction. The training image data were obtained from three sources; of these, the best source is the digital camera, with accuracy = 94.69% and loss = 10.41%. Further research should therefore focus on the quality of the camera image source to optimally improve classification performance.

Acknowledgment

Politeknik Negeri Bali and STIKI Indonesia supported this research. We thank everyone who contributed to the completion of this paper. We hope this research contributes significantly to knowledge development, especially in face image classification.

Declarations

Author contribution
All authors contributed equally as the main contributors of this paper. All authors read and approved the final paper.

Funding statement
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Conflict of interest
The authors declare no known conflicts of financial interest or personal relationships that could have appeared to influence the work reported in this paper.

Additional information
Reprints and permission information is available at http://journal2.um.ac.id/index.php/keds.

Publisher's Note: Department of Electrical Engineering - Universitas Negeri Malang remains neutral with regard to jurisdictional claims and institutional affiliations.

References

[1] M. Andrejevic and N. Selwyn, "Facial recognition technology in schools: critical questions and concerns," Learn. Media Technol., vol. 45, no. 2, pp. 115–128, Apr. 2020, doi: 10.1080/17439884.2020.1686014.
[2] C. M. Cook, J. J. Howard, Y. B. Sirotin, J. L. Tipton, and A. R. Vemury, "Demographic Effects in Facial Recognition and Their Dependence on Image Acquisition: An Evaluation of Eleven Commercial Systems," IEEE Trans. Biometrics, Behav. Identity Sci., vol. 1, no. 1, pp. 32–41, Jan. 2019, doi: 10.1109/TBIOM.2019.2897801.
[3] Y. Lin and H. Xie, "Face Gender Recognition based on Face Recognition Feature Vectors," in 2020 IEEE 3rd International Conference on Information Systems and Computer Aided Education (ICISCAE), Sep. 2020, pp. 162–166, doi: 10.1109/ICISCAE51034.2020.9236905.
[4] M. Imani and H. Ghassemian, "Fast feature selection methods for classification of hyperspectral images," in 7th International Symposium on Telecommunications (IST'2014), Sep. 2014, pp. 78–83, doi: 10.1109/ISTEL.2014.7000673.
[5] Y. Zhu, C. Zhu, and X. Li, "Improved principal component analysis and linear regression classification for face recognition," Signal Processing, vol. 145, pp. 175–182, Apr. 2018, doi: 10.1016/j.sigpro.2017.11.018.
[6] A. Raikwar and J. Agrawal, "A Review of Face Recognition Using Feature Optimization and Classification Techniques," in Information Management and Machine Intelligence. ICIMMI 2019. Algorithms for Intelligent Systems, D. Goyal, V. E. Bălaş, A. Mukherjee, C. de A. V. Hugo, and A. K. Gupta, Eds. Singapore: Springer, 2021, pp. 595–604.
[7] A. Bilgic, O. C. Kurban, and T. Yildirim, "Face recognition classifier based on dimension reduction in deep learning properties," in 2017 25th Signal Processing and Communications Applications Conference (SIU), May 2017, pp. 1–4, doi: 10.1109/SIU.2017.7960368.
[8] T. Purwaningsih, I. A. Anjani, and P. B. Utami, "Convolutional Neural Networks Implementation for Chili Classification," in 2018 International Symposium on Advanced Intelligent Informatics (SAIN), Aug. 2018, pp. 190–194, doi: 10.1109/SAIN.2018.8673373.
[9] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Commun. ACM, vol. 60, no. 6, pp. 84–90, May 2017, doi: 10.1145/3065386.
[10] K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition," arXiv preprint arXiv:1409.1556, Sep. 2014.
[11] K. Chauhan and S. Ram, "Image classification with deep learning and comparison between different convolutional neural network structures using tensorflow and keras," Int. J. Adv. Eng. Res. Dev., vol. 5, no. 2, pp. 533–538, 2018.
[12] A. Fadlil, R. Umar, and S. Gustina, "Mushroom Images Identification Using Orde 1 Statistics Feature Extraction with Artificial Neural Network Classification Technique," Journal of Physics: Conference Series, vol. 1373, p. 012037, Nov. 2019, doi: 10.1088/1742-6596/1373/1/012037.
[13] I. N. G. A. Astawa, I. K. G. D. Putra, M. Sudarma, and R. S. Hartati, "KomNET: Face Image Dataset from Various Media for Face Recognition," Data Br., vol. 31, p. 105677, Aug. 2020, doi: 10.1016/j.dib.2020.105677.
[14] A. Voulodimos, N. Doulamis, A. Doulamis, and E. Protopapadakis, "Deep Learning for Computer Vision: A Brief Review," Comput. Intell. Neurosci., vol. 2018, pp. 1–13, 2018, doi: 10.1155/2018/7068349.
[15] Y. Bengio, "Learning Deep Architectures for AI," Found. Trends Mach. Learn., vol. 2, no. 1, pp. 1–127, 2009, doi: 10.1561/2200000006.
[16] M. D. Zeiler and R. Fergus, "Visualizing and Understanding Convolutional Networks," in Computer Vision – ECCV 2014, D. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars, Eds. Springer, 2014, pp. 818–833, doi: 10.1007/978-3-319-10590-1_53.
[17] Q. Cao, L. Shen, W. Xie, O. M. Parkhi, and A. Zisserman, "VGGFace2: A Dataset for Recognising Faces across Pose and Age," in 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), May 2018, pp. 67–74, doi: 10.1109/FG.2018.00020.
[18] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, "How transferable are features in deep neural networks?," arXiv preprint arXiv:1411.1792, Nov. 2014.
[19] P. Marcelino, "Transfer learning from pre-trained models," Towards Data Science, 2018. [Online]. Available: https://towardsdatascience.com/transfer-learning-from-pre-trained-models-f2393f124751.
[20] Y. E. Wang, G.-Y. Wei, and D. Brooks, "Benchmarking TPU, GPU, and CPU Platforms for Deep Learning," arXiv preprint arXiv:1907.10701, Jul. 2019.