Ibn Al-Haitham J. for Pure & Appl. Sci. Vol. 30 (3) 2017
https://doi.org/10.30526/30.3.1628

A Smartphone-Based Model for Human Activity Recognition

Ali Al-Taei
Dept. of IT Technical/ College of Management/ Middle Technical/ University of Baghdad
alitaei@mtu.edu.iq
Received in: 9/August/2017, Accepted in: 1/November/2017

Abstract

Activity recognition (AR) is a new, interesting, and challenging research area with many applications (e.g. healthcare, security, and event detection). Activity recognition (e.g. identifying a user's physical activity) can be treated as a classification problem. In this paper, 7 classification methods are applied to accelerometer data collected via smartphones and compared for best performance. The dataset was collected from 59 individuals who performed 6 different activities (i.e. walk, jog, sit, stand, upstairs, and downstairs). The dataset contains 5418 instances with 46 labeled features. The results show that the proposed boosting-based ensemble classifier outperforms the other classifiers examined in this paper.

Keywords: Activity Recognition, Machine Learning, Sensor Mining, Smartphone Computing, Accelerometers, Multiple Classifier System.

Introduction

Recently, the need for monitoring and recognizing human activities has been increasing. This task can be accomplished by employing machine learning techniques [1], [2].
Human activity recognition (AR) plays a part in many applications, such as context-aware behavior, smart environments, health care, and security [2-5]. The main aims of this research are to (i) employ and evaluate the performance of various standalone machine learning techniques for the AR task, and (ii) suggest an AR classification model that is more robust and accurate. To achieve these goals, the results of related works are discussed, and the performance of the proposed method is presented and compared with previous results using established accuracy metrics.

This paper is organized as follows. Related work is discussed in section 2, and the dataset is described in subsection 2.2. In section 3, a number of classifiers are employed, tested, and compared, and the results of our model are presented and discussed. Finally, conclusions and recommendations for future work are presented in section 4.

Related works and dataset

Related Works

The recognition of physical human activity has been studied previously in research based on either accelerometer data collected from smart mobile devices, as in [6-8], or other wearable sensors [2, 9, 10]. The most recent related works are reviewed in this section.

Kwapisz et al. [11], in the Wireless Sensor Data Mining (WISDM) project, collected a mobile phone-based dataset from 29 individuals who carried Android smartphones in their pockets while performing activities of daily living (ADL) such as sitting, walking, climbing stairs, jogging, and standing. The resulting dataset contained 4526 instances with 46 features. It was used to train 4 different classifiers (i.e. Decision Tree (DT), Logistic Regression (LR), Multilayer Perceptron (MLP), and Straw Man (SM)) for human activity recognition.
Results showed that the MLP classifier was the best method, with an accuracy of 90%, while SM had the lowest accuracy at 37.2%. The other two methods, DT and LR, achieved 85.1% and 78.1%, respectively.

Trabelsi et al. [12] used wearable inertial sensors to collect acceleration data, then used that data to train an unsupervised model for the AR task. The proposed model used a Hidden Markov Model (HMM) to segment the data and the Expectation Maximization (EM) method for learning. In other words, they proposed a Multiple Hidden Markov Model Regression (MHMMR) approach. Results showed that the proposed model reached 91.4%, which is better than k-means (60.2%) and the standard HMM (84.1%). In addition, they evaluated some well-known supervised learning methods (Naïve Bayes, MLP, Support Vector Machine, k-Nearest Neighbor, and Random Forest), which achieved 80.6%, 83.1%, 88.1%, 95.8%, and 93.5%, respectively. The best supervised methods (k-Nearest Neighbor and Random Forest) thus outperformed the unsupervised methods. However, the dataset used is small (6 individuals), so additional work is needed on larger and more varied data.

Micucci et al. [3] therefore proposed a rich and sufficient dataset, in both subjective and objective terms. This dataset (7,013 instances) was collected from 30 individuals (6 men and 24 women) with a wide range of ages (18 to 60 years old). To benchmark the dataset, two classifiers were used, kNN and SVM, each evaluated with 5-fold and 30-fold cross-validation. Results showed an accuracy of 86.89% for the kNN (k=1) classifier and 86.47% for the SVM classifier.

Bayat et al. [4] proposed an AR system with a low-pass filter that separates the gravity component from the raw accelerometer data. Five classifiers were then evaluated and compared with the suggested model, which combines several classifiers in one tier using the average of probabilities. Results showed that a combination of MLP, LR, and SVM classifiers performed best, with 91.15% accuracy.

Gupta and Dallas [7] proposed a feature-selection-based AR system to classify ADLs and falls. Two feature selection methods were employed: Sequential Forward Floating Search (SFFS) and Relief-F. In addition, two classification methods were employed: NB and kNN. Results showed 98% accuracy for both methods. Although this work presents promising results that outperform filter-based systems in accuracy, it tends to be computationally more expensive and to generalize poorly to other machine learning methods. This approach also needs to be investigated on richer data.

Catal et al. [13] proposed a multiple classifier system that combines MLP, DT, and LR using the average-of-probabilities rule. The result was higher than that of the MLP classifier alone.

Al-Taei [14] employed 5 classification methods (i.e. MLP, NB, DT, SMO, and BN) on the WISDM (WIreless Sensor Data Mining) dataset of 6 different activities (i.e. walking, jogging, going upstairs and downstairs, sitting, and standing). Results showed that MLP outperformed the other classifiers, with an overall accuracy of 92.65%.

Lockhart and Weiss [15] analyzed and compared the performance of different AR models: universal, personal, and hybrid. Several classification algorithms were used (i.e. LR, RF, NN, IBk, NB, J48, and JRip). Results showed that personal models outperformed the other models, and that the best classification methods were RF followed by NN.
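Two of the systems reviewed above, [4] and [13], combine their base classifiers with the average-of-probabilities rule. A minimal Python sketch of that rule is given below; it is an illustration, not the implementation used in those works, and the function name and the probability values are hypothetical.

```python
# Illustrative sketch of the average-of-probabilities combination rule.
# The function name and the example numbers are assumptions made for
# illustration; they are not taken from the cited works.

ACTIVITIES = ["walk", "jog", "upstairs", "downstairs", "sit", "stand"]


def average_of_probabilities(per_classifier_probs):
    """Average the class-probability vectors of several classifiers.

    per_classifier_probs: one list of class probabilities per classifier,
    all in the same class order.
    Returns (index of the winning class, averaged probability vector).
    """
    n_classifiers = len(per_classifier_probs)
    n_classes = len(per_classifier_probs[0])
    averaged = [
        sum(probs[c] for probs in per_classifier_probs) / n_classifiers
        for c in range(n_classes)
    ]
    # The ensemble predicts the class with the highest averaged probability.
    predicted = max(range(n_classes), key=lambda c: averaged[c])
    return predicted, averaged


# Hypothetical outputs of three base classifiers for one data instance.
example_probs = [
    [0.10, 0.05, 0.50, 0.30, 0.03, 0.02],
    [0.05, 0.05, 0.35, 0.50, 0.03, 0.02],
    [0.10, 0.10, 0.45, 0.30, 0.03, 0.02],
]

predicted_index, averaged_probs = average_of_probabilities(example_probs)
print(ACTIVITIES[predicted_index], [round(p, 3) for p in averaged_probs])
```

In this made-up instance the second classifier alone would choose downstairs, but the averaged probabilities favor upstairs, which is the kind of disagreement the rule is meant to smooth out.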
Dataset Characterization

In this research, we use the well-known activity recognition dataset collected and used in [11], which has also been employed in other studies such as [13], [14], and [16]. The dataset contains 5418 instances with 46 attributes and covers 6 different activities: walking 38.4% (2081), jogging 30% (1625), upstairs 11.7% (632), downstairs 9.7% (528), sitting 5.6% (306), and standing 4.5% (246). The complete dataset is available in [17].

Results and Discussion

Experimental Results

In this section, experiments with 5 machine learning classifiers (RF, NB, kNN, JRip, and CvR) are presented and discussed, in addition to our proposed ensemble multi-classifier method. The proposed method boosts classification performance by applying a specific learning algorithm repeatedly and combining the learned hypotheses by voting [18], [19]. Cross-validation with 10 folds is used in all experiments, and results are compared in terms of accuracy, F-measure, and root mean square error. The confusion matrices of the 7 classifiers are summarized below:

1- Confusion matrix of Random Forest: Table 1 shows that 5033 instances were classified correctly.
2- Confusion matrix of Instance Based (kNN, k=3): Table 2 shows that 4656 instances were classified correctly.
3- Confusion matrix of Rule Induction (JRip): Table 3 shows that 4658 instances were classified correctly.
4- Confusion matrix of Naïve Bayes: Table 4 shows that 4334 instances were classified correctly.
5- Confusion matrix of classification via regression (CvR): Table 5 shows that 4877 instances were classified correctly.
6- Confusion matrix of AdaBoost (J48 as classifier)*: Table 6 shows that 5106 instances were classified correctly.
7- Confusion matrix of AdaBoost (Random Forest as classifier)*: Table 7 shows that 5110 instances were classified correctly.

To examine the accuracy of the 7 classifiers in detail, the overall and per-class accuracies are given in Table 8. Table 8 shows that combining AdaBoost and RF in one classifier model gives the highest accuracy, 94.31%. AdaBoost with a DT base classifier (specifically J48) also performed very well, with 94.24% accuracy. The Random Forest classifier comes third with 92.89% accuracy. The CvR classifier comes fourth with an accuracy of 90%, ahead of the JRip and kNN classifiers with 85.97% and 85.93%, respectively. The Naïve Bayes classifier performs worst in this experiment, with 79.99% accuracy.

Per class, jogging is classified correctly most often (97.54%), followed by walking with 96.43% overall class accuracy. In contrast, going downstairs and upstairs have the lowest class accuracies, 56.84% and 64.59%, respectively.

The root mean square error (RMSE) of each classifier was also calculated and compared with the RMSE values of the other classifiers. Figure 1 shows the RMSE values of the 7 classifiers. The highest error value was that of the NB classifier, 0.2345. The lowest error values were those of the AdaJ48 (0.1323) and AdaRF (0.1338) classifiers.
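The correct-instance counts and accuracies reported above follow directly from the confusion matrices. As a minimal sketch (the function and variable names are illustrative and are not part of the original experiments), the following Python code recomputes the overall and per-class accuracy of the AdaBoost+RF classifier from the values in Table 7, with rows taken as actual activities and columns as predicted activities.

```python
# Confusion matrix of AdaBoost+RF, values taken from Table 7.
# Rows are actual activities and columns are predicted activities,
# both in the order given by LABELS.
LABELS = ["walk", "jog", "upstairs", "downstairs", "sit", "stand"]
ADA_RF = [
    [2051, 2, 16, 12, 0, 0],
    [9, 1597, 11, 7, 0, 1],
    [14, 5, 528, 83, 2, 0],
    [20, 0, 107, 397, 3, 1],
    [0, 0, 2, 0, 301, 3],
    [1, 2, 3, 3, 1, 236],
]


def overall_accuracy(matrix):
    """Correctly classified instances (the diagonal) over all instances."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_accuracy(matrix):
    """For each actual class, the fraction of its instances predicted correctly."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


print(f"overall accuracy: {overall_accuracy(ADA_RF):.1%}")  # about 94.3%
for label, acc in zip(LABELS, per_class_accuracy(ADA_RF)):
    print(f"{label}: {acc:.1%}")
```

Here the overall accuracy is 5110/5418, about 94.3%, in line with the value listed for AdaBoost+RF in Table 8. The same two functions can be applied to the matrices in Tables 1 to 6.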
The other metric used to evaluate the classifiers is the F-measure, as shown in Figure 2. Figure 2 shows that the highest average F-measure values were those of the AdaBoost classifiers: 0.943 for AdaRF and 0.942 for AdaJ48. The lowest average F-measure value among the classifiers was that of NB, 0.781.

Discussion

The results show that AdaBoost with an RF base classifier gives the highest classification accuracy and average F-measure, with an RMSE second only to that of AdaBoost with J48. Additionally, this result outperforms the results obtained in [12], [4], [13], and [14]. Yet, the main limitations facing the AR task may be the position and orientation of the mobile devices, the features of the devices/sensors, and the nature of the sensed data [9], [20, 21].

Conclusions and Future Work

In this research, several classification methods were employed and compared in terms of accuracy and error. The results show that the random forest classifier performed better than the other standalone classifiers. Furthermore, the proposed ensemble multi-classifier system (multi-kernel) improved performance and reduced classification errors in the activity recognition task. Specifically, employing the random forest classifier with boosting gave the best classification results. Nevertheless, a challenge may arise in real-life applications (e.g. the phone-context problem): the mobile phone may be in an inappropriate position or orientation for the activity being sensed. Future work might focus on the preprocessing phase, in order to obtain enhanced data with important, effective features.
Future work could also examine the performance of the suggested model for human activity recognition on other mobile data. In addition, the suggested model and techniques need to be adapted to, and improved for, the online activity recognition task.

References

1. R. San-Segundo, J. Lorenzo-Trueba, B. Martínez-González, and J. M. Pardo, "Segmenting human activities based on HMMs using smartphone inertial sensors," Pervasive and Mobile Computing, 30, 84-96, (2016).
2. O. D. Lara and M. A. Labrador, "A survey on human activity recognition using wearable sensors," IEEE Communications Surveys and Tutorials, 15, 1192-1209, (2013).
3. D. Micucci, M. Mobilio, and P. Napoletano, "UniMiB SHAR: a new dataset for human activity recognition using acceleration data from smartphones," arXiv preprint arXiv:1611.07688, (2016).
4. A. Bayat, M. Pomplun, and D. A. Tran, "A study on human activity recognition using accelerometer data from smartphones," Procedia Computer Science, 34, 450-457, (2014).
5. J. W. Lockhart, T. Pulickal, and G. M. Weiss, "Applications of mobile activity recognition," in Proceedings of the 2012 ACM Conference on Ubiquitous Computing, 1054-1058, (2012).
6. A. M. Khan, A. Tufail, A. M. Khattak, and T. H. Laine, "Activity recognition on smartphones via sensor-fusion and kda-based svms," International Journal of Distributed Sensor Networks, (2014).
7. P. Gupta and T. Dallas, "Feature selection and activity recognition system using a single triaxial accelerometer," IEEE Transactions on Biomedical Engineering, 61, 1780-1786, (2014).
8. G. Vavoulas, C. Chatzaki, T. Malliotakis, M. Pediaditis, and M. Tsiknakis, "The mobiact dataset: Recognition of activities of daily living using smartphones," in Proceedings of the International Conference on Information and Communication Technologies for Ageing Well and e-Health, 143-151, (2016).
9. A. R. Raut and S. Khandait, "Review on data mining techniques in wireless sensor networks," in Electronics and Communication Systems (ICECS), 2015 2nd International Conference on, 390-394, (2015).
10. C. C. Y. Poon, B. P. L. Lo, M. R. Yuce, A. Alomainy, and Y. Hao, "Body sensor networks: In the era of big data and beyond," IEEE Reviews in Biomedical Engineering, 8, 4-16, (2015).
11. J. R. Kwapisz, G. M. Weiss, and S. A. Moore, "Activity recognition using cell phone accelerometers," ACM SIGKDD Explorations Newsletter, 12, 74-82, (2011).
12. D. Trabelsi, S. Mohammed, F. Chamroukhi, L. Oukhellou, and Y. Amirat, "An unsupervised approach for automatic activity recognition based on hidden Markov model regression," IEEE Transactions on Automation Science and Engineering, 10, 829-835, (2013).
13. C. Catal, S. Tufekci, E. Pirmit, and G. Kocabag, "On the use of ensemble of classifiers for accelerometer-based activity recognition," Applied Soft Computing, 37, 1018-1022, (2015).
14. A. Al-Taei, "A Mobile Based Activity Recognition Model," Journal of The College of Basic Education - Pure Science, 23, 75-88, 1 June (2017).
15. J. W. Lockhart and G. M. Weiss, "The benefits of personalized smartphone-based activity recognition models," in Proceedings of the 2014 SIAM International Conference on Data Mining, 614-622, (2014).
16. Z. S. Abdallah, M. M. Gaber, B. Srinivasan, and S. Krishnaswamy, "Adaptive mobile activity recognition system with evolving data streams," Neurocomputing, vol. 150, Part A, 304-317, (2015).
17. http://www.cis.fordham.edu/wisdm/dataset.php
18. N. Ravi, N. Dandekar, P. Mysore, and M. L. Littman, "Activity recognition from accelerometer data," in AAAI, 1541-1546, (2005).
19. R. Meir and G. Rätsch, "An introduction to boosting and leveraging," in Advanced Lectures on Machine Learning, Springer, 118-183, (2003).
20. J. W. Lockhart and G. M. Weiss, "Limitations with activity recognition methodology & data sets," in Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct Publication, 747-756, (2014).
21. O. D. Incel, M. Kose, and C. Ersoy, "A review and taxonomy of activity recognition on mobile phones," BioNanoScience, 3, 145-171, (2013).

In Tables 1 to 7, rows are the actual activities and columns are the predicted activities.

Table 1: Confusion matrix of RF classifier

              Walk    Jog   Upstairs   Downstairs   Sit   Stand
Walk          2042      5         18           14     0       2
Jog             10   1601          6            6     1       1
Upstairs        27     19        501           80     4       1
Downstairs      36      9        119          360     2       2
Sit              1      0          2            1   299       3
Stand            1      2          5            2     6     230

Table 2: Confusion matrix of IB

              Walk    Jog   Upstairs   Downstairs   Sit   Stand
Walk          2010      0         34           37     0       0
Jog             13   1589         15            6     1       1
Upstairs        64      7        407          152     0       2
Downstairs      82      0        133          313     0       0
Sit             12      1         20           21   243       9
Stand           49      1         65           34     3      94

Table 3: Confusion matrix of JRip

              Walk    Jog   Upstairs   Downstairs   Sit   Stand
Walk          1956     16         60           44     2       3
Jog             26   1570         21            7     1       0
Upstairs       155     20        336          115     4       2
Downstairs     147     14         86          281     0       0
Sit              1      0          2            8   286       9
Stand            2      0          3            3     9     229

Table 4: Confusion matrix of NB

              Walk    Jog   Upstairs   Downstairs   Sit   Stand
Walk          1919     30         76           31    11      14
Jog             38   1536         21           12     4      14
Upstairs       229     38        221           96    11      37
Downstairs     249     10         86          144     9      30
Sit              0      0          0            0   292      14
Stand            0      0          5            1    18     222

Table 5: Confusion matrix of CvR

              Walk    Jog   Upstairs   Downstairs   Sit   Stand
Walk          2009     11         33           28     0       0
Jog             15   1590         12            6     1       1
Upstairs        81     20        432           95     2       2
Downstairs      86     10        102          326     1       3
Sit              0      0          3            1   288      14
Stand            2      0          2            5     5     232

Table 6: Confusion matrix of AdaBoost+J48

              Walk    Jog   Upstairs   Downstairs   Sit   Stand
Walk          2044      6         14           16     0       1
Jog              7   1603          7            8     0       0
Upstairs        20     10        515           87     0       0
Downstairs      18      7         88          414     0       1
Sit              2      0          3            0   297       4
Stand            3      1          3            4     2     233

Table 7: Confusion matrix of AdaBoost+RF

              Walk    Jog   Upstairs   Downstairs   Sit   Stand
Walk          2051      2         16           12     0       0
Jog              9   1597         11            7     0       1
Upstairs        14      5        528           83     2       0
Downstairs      20      0        107          397     3       1
Sit              0      0          2            0   301       3
Stand            1      2          3            3     1     236

Table 8: Comparison of different classifiers' accuracies (correctly classified instances and per-class accuracy)

Classifier               Walk            Jog             Upstairs       Downstairs     Sit            Stand          Overall accuracy
RF                       2042 (98.12%)   1601 (98.52%)   501 (79.27%)   360 (68.18%)   299 (97.71%)   230 (93.50%)   92.89%
kNN                      2010 (96.58%)   1589 (97.78%)   407 (64.40%)   313 (59.28%)   243 (79.41%)    94 (38.21%)   85.93%
JRip                     1956 (93.99%)   1570 (96.62%)   336 (53.16%)   281 (53.22%)   286 (93.46%)   229 (93.09%)   85.97%
NB                       1919 (92.21%)   1536 (94.52%)   221 (34.97%)   144 (27.27%)   292 (95.42%)   222 (90.24%)   79.99%
CvR                      2009 (96.54%)   1590 (97.85%)   432 (68.35%)   326 (61.74%)   288 (94.12%)   232 (94.31%)   90.01%
Boost(M1+DT)*            2044 (98.22%)   1603 (98.65%)   515 (81.49%)   414 (78.41%)   297 (97.06%)   233 (94.72%)   94.24%
Boost(RF)*               2051 (98.55%)   1597 (98.28%)   528 (83.54%)   397 (75.19%)   301 (98.37%)   236 (95.93%)   94.31%
Overall class accuracy   96.43%          97.54%          64.59%         56.84%         94.26%         87.94%

Figure 1: Comparison of classifiers' error values (bar chart; y-axis: RMSE, 0 to 0.25; x-axis: RF, kNN, JRip, NB, CvR, Ada+J48, Ada+RF)

Figure 2: Comparison of classifiers' average F-measure values (bar chart; y-axis: average F-measure, 0 to 1; x-axis: RF, kNN, JRip, NB, CvR, Ada+J48, Ada+RF)