How to cite: M. Iqbal, C. Wulandari, W. Yunanto, and G. Sari, “Mining Non-Zero-Rare Sequential Patterns On Activity Recognition”, mantik, vol. 5, no. 1, pp. 1-9, May 2019.

Mining Non-Zero-Rare Sequential Patterns on Activity Recognition

Mohammad Iqbal1, Chandrawati Putri Wulandari2,a, Wawan Yunanto3, and Ghaluh Indah Permata Sari2,b
Institut Teknologi Sepuluh Nopember, iqbal@matematika.its.ac.id1
National Taiwan University of Science and Technology, d10301809@mail.ntust.edu.tw2,a, ghaluhips@gmail.com2,b
Politeknik Caltex Riau, wawan@pcr.ac.id3

doi: https://doi.org/10.15642/mantik.2019.5.1.1-9

Abstrak (English translation): Discovering rare human activity patterns from triggered motion sensors can provide unusual information that alerts a person to a dangerous situation. This study aims to recognize rare human activities using a non-zero-rare sequential pattern mining technique. Such a pattern must occur in the triggered sensor sequences, and its number of occurrences must not exceed a pre-defined occurrence threshold. This study proposes an algorithm to mine non-zero-rare human activity patterns called Mining Multi-class Non-Zero-Rare Sequential Patterns (MMRSP). The experimental results show that non-zero-rare human activity patterns are able to capture unusual activities. Furthermore, MMRSP works well according to the precision values of the rare activities.

Kata kunci (keywords): Sequential Patterns, Rare Patterns, Activity Recognition, Multi-class

Abstract: Discovering rare human activity patterns from triggered motion sensors delivers peculiar information that can notify people about hazardous situations. This study aims to recognize rare human activities using a non-zero-rare sequential pattern mining technique.
In particular, this study mines the triggered motion sensor sequences to obtain non-zero-rare human activity patterns, i.e., patterns that must occur in the motion sensor sequences and whose occurrence numbers are less than the pre-defined occurrence threshold. This study proposes an algorithm to mine non-zero-rare patterns on human activity recognition called Mining Multi-class Non-Zero-Rare Sequential Patterns (MMRSP). The experimental results showed that non-zero-rare human activity patterns succeed in capturing unusual activities. Furthermore, the MMRSP performed well according to the precision value of rare activities.

Keywords: Sequential Patterns, Rare Patterns, Activity Recognition, Multi-class.

Jurnal Matematika MANTIK Vol. 5, No. 1, May 2019, pp. 1-9 ISSN: 2527-3159 (print) 2527-3167 (online) http://u.lipi.go.id/1458103791

1. Introduction

Discovering human activity patterns can improve daily life. We aim to build a pleasant and safe living place by installing motion sensors in a house or apartment. Specifically, we capture the resident's daily motions when the sensors are triggered. By collecting the triggered sensor sequences, we attempt to generate triggered sensor patterns that can be used to understand the resident's activities. However, understanding human activities from motion sensors is difficult, and methods for finding useful patterns are still being developed. Machine learning and data mining are the preferred techniques to uncover these useful patterns. Van Kasteren et al. [1] used a Hidden Markov Model (HMM) to classify the sensor sequences into several resident activities. Other machine learning methods, such as naïve Bayes, Conditional Random Fields (CRF), and Support Vector Machines (SVM), were utilized in Cook et al. [2].
However, they did not present a generated model that clearly describes what is happening at the triggered sensors, like the one we can interpret directly in this paper. Therefore, this study employs sequential pattern mining to offer useful information about the patterns in the form $\mathbf{s} \rightarrow a$, where $\mathbf{s}$ and $a$ denote a sensor sequence and a human activity, respectively. Iqbal and Pao [3] and Mukhlash et al. [4] studied several types of human activities that can be considered typical patterns. This study focuses on one type of rare pattern, the so-called non-zero-rare pattern [5,6]. The goal of this study is to provide rare human activity patterns that can inform the resident of a hazardous condition. We argue that the generated patterns may not represent all activities if they only consider the occurrence of a pattern over the whole set of activity sequences, since the dataset may have several activities as labels. Thus, we present a non-zero-rare human activity pattern as a subsequence that must occur in the sequences of one activity with an occurrence number less than a pre-defined occurrence threshold. We propose an algorithm to mine non-zero-rare human activity patterns called Mining Multi-class Non-Zero-Rare Sequential Patterns (MMRSP). Based on the MMRSP, we obtain a non-zero-rare pattern for each activity, which differs from the previous works [5,6]. As far as we are aware, only a limited number of studies discuss rare patterns in human activity recognition.

This paper is organized as follows. Section 2 reviews sequential pattern mining techniques, especially for human activity recognition. Section 3 presents the non-zero-rare human activity pattern and the mining technique of the proposed method. Section 4 discusses the experimental results, and Section 5 concludes the discussion.

2.
Related Works

As mentioned above, there are two previous works on human activity recognition using sequential pattern mining. In [3], a distinguishing subsequence for multi-class classification was proposed to recognize a distinguishing sensor subsequence that is frequent in the sequences of one activity yet rare in the other sequences, based on two support thresholds. They extended the idea in [7] using a one-vs-all strategy. Mukhlash et al. [4] introduced a periodic human activity pattern, which reveals regular activities in a certain time interval, using FP-Growth Prefix-Span and fuzzy theory to discretize the time interval.

There are also several other studies on human activity recognition using sequential pattern mining. A sensor pattern that significantly distinguishes one activity from the others was studied in [8]. Furthermore, frequent pattern mining based on multiple-order temporal information was performed in [9], and weighted frequent pattern mining was proposed in [10] to adapt to the classification task. We could say that those studies still focus on typical activity patterns. Concerning environmental safety, we also need to take rare patterns into consideration in order to deliver a quick alert about events that may put the resident in an unpredicted situation.

This study brings the non-zero-rare pattern of [5,6] into human activity recognition. Since there are more than two activities, we extend the definition of the non-zero-rare pattern to the multi-class case. We also propose an algorithm to mine the non-zero-rare pattern in the multi-class case based on [11].

3. Mining Multi-class Non-Zero-Rare Sequential Patterns on Human Activity Recognition

In this section, we define the non-zero-rare pattern on human activity recognition and describe an algorithm to mine the patterns.
3.1 Multi-class Non-Zero-Rare Sequential Patterns

First, we define a non-zero-rare human activity pattern. Assume that we have a set of pairs of motion sensor sequences and human activities $D = \{(\mathbf{s}_i, a_i) \mid \mathbf{s}_i \in S, a_i \in A, 1 \le i \le |D|\}$. A set of motion sensor sequences $S = \{\mathbf{s}_j \mid \mathbf{s}_j = (s_{j_1}, s_{j_2}, \cdots, s_{j_n}), s_{j_n} \in R \times M, 1 \le j \le n \le |S|\}$ is collected when the sensors are triggered during the $t$-th time interval, where each item of a sequence is an element of the Cartesian product between a set of sensor locations $R = \{r_1, r_2, \cdots, r_{|R|}\}$ and a set of motion sensor types $M = \{m_1, m_2, \cdots, m_{|M|}\}$. For each time interval, a triggered sensor sequence belongs to a certain human activity label $a_k \in A$, where $A = \{a_1, a_2, \cdots, a_{|A|}\}$ is a set of human activity labels.

Let $\mathbf{s}_m = (s_{m_1}, s_{m_2}, \cdots, s_{m_\ell})$ be a subsequence of $\mathbf{s}_j$ ($\mathbf{s}_m \subseteq \mathbf{s}_j$, $1 \le m_\ell \le j \le |\mathbf{s}_j|$) and let

$\mathit{supp}(\mathbf{s}_m, D_{a_k}) = \frac{|\{\mathbf{s}_j \mid \mathbf{s}_m \preceq \mathbf{s}_j, \mathbf{s}_j \in D_{a_k}\}|}{|S_{a_k}|}$

be the relative support value of the subsequence $\mathbf{s}_m$ in $D_{a_k}$, i.e., the fraction of the sequences labeled $a_k$ that contain $\mathbf{s}_m$. This value can be used to extract rare patterns w.r.t. a certain activity label. Thus, we define $\mathbf{s}_m$ as a non-zero-rare sequential pattern for each activity below.

Definition 3.1. (A non-zero-rare human activity pattern) Given a pre-defined support threshold $\gamma$ and a set $D$ of pairs of sensor sequences and human activity labels, a subsequence $\mathbf{s}_m$ is a non-zero-rare pattern on human activity label $a_k$, written $(\mathbf{s}_m \rightarrow a_k)$, if and only if $\mathbf{s}_m$ satisfies $\mathit{supp}(\mathbf{s}_m, D_{a_k}) > 0$ and $\mathit{supp}(\mathbf{s}_m, D_{a_k}) < \gamma$.

In this study, we can express a pattern $\mathbf{s}_m$ w.r.t. an activity label $a_k$ as a rule $(\mathbf{s}_m \rightarrow a_k)$. The rule form can be employed in the classification task. Furthermore, this study faces a multi-class classification problem since $|A| > 2$. A traditional approach is to adapt binary classification strategies to the multi-class problem. There are two general binary classification strategies for the multi-class problem: one-vs-all (OVA) and one-vs-one (OVO).
Both strategies may achieve better accuracy in some cases, but they are computationally expensive since both of them need to compare each positive class with the negative classes. Consequently, we extend Definition 3.1 by checking whether the maximum support value of $\mathbf{s}_m$ over the activity labels is less than a pre-defined support threshold $\gamma$ and greater than 0.

Definition 3.2. (A multi-class non-zero-rare human activity pattern) Given a pre-defined support threshold $\gamma$ and a set $D$ of pairs of sensor sequences and human activity labels, a subsequence $\mathbf{s}_m$ is a non-zero-rare pattern on human activity label $a_k$ if and only if $\mathbf{s}_m$ satisfies

$0 < \max_{k} \{\mathit{supp}(\mathbf{s}_m, S_{a_k}) \mid \forall a_k \in A\} = \mathit{supp}(\mathbf{s}_m, D_{a_k}) < \gamma$.

Table 1. An example of motion sensor sequences and their activity labels.

Event    Motion Sensor Sequence      Activity label
$t_1$    $m_4, m_1, m_2, m_3$        $a_1$
$t_2$    $m_1, m_3, m_2, m_3$        $a_1$
$t_3$    $m_1, m_2, m_3$             $a_1$
$t_4$    $m_1, m_3, m_2$             $a_2$
$t_5$    $m_1, m_3$                  $a_2$

Example 3.1. Assume we have a dataset $D$ as shown in Table 1 and a pre-defined support threshold $\gamma = \frac{4}{5}$. A subsequence $\mathbf{s}_m = (m_1, m_2, m_3)$ is a non-zero-rare pattern for activity $a_1$ since $0 < \max(\mathit{supp}(\mathbf{s}_m, S_{a_k}), \forall a_k \in \{a_1, a_2\}) = \mathit{supp}(\mathbf{s}_m, S_{a_1}) = \frac{2}{3} < \gamma$.

According to Definition 3.2, we do not need the two-stage procedure of first generating the patterns and then building a classifier from them, which is the basic procedure of sequential pattern mining techniques for classification. Therefore, we obtain an efficient algorithm because we directly build the rules (not only the patterns). In the next section, we explain how to mine multi-class non-zero-rare human activity patterns.

3.2 Mining Non-Zero-Rare Sequential Patterns Algorithm

This study presents an algorithm to mine multi-class non-zero-rare patterns called Mining Multi-class Non-Zero-Rare Sequential Patterns (MMRSP). The MMRSP consists of two main stages: (1) Rare Sequential Patterns Builder (RSPB) and (2) Classification Unseen Sequence (CUS).
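As a quick numerical check of Definition 3.2 and Example 3.1, the following minimal Python sketch recomputes the support values on Table 1. It is an illustration under stated assumptions, not the authors' implementation: the function names (`contains`, `supp`, `non_zero_rare_label`) are hypothetical, and a pattern is assumed to match only as a contiguous run of items, since that reading reproduces the value $\frac{2}{3}$ in Example 3.1.

```python
from fractions import Fraction

# Table 1, grouped by activity label (sensor items written as plain strings).
TABLE_1 = {
    "a1": [["m4", "m1", "m2", "m3"], ["m1", "m3", "m2", "m3"], ["m1", "m2", "m3"]],
    "a2": [["m1", "m3", "m2"], ["m1", "m3"]],
}

def contains(sequence, pattern):
    """True if `pattern` occurs in `sequence` as a contiguous run (assumed matching rule)."""
    n = len(pattern)
    return any(sequence[i:i + n] == pattern for i in range(len(sequence) - n + 1))

def supp(pattern, sequences):
    """Relative support: fraction of the given sequences that contain the pattern."""
    hits = sum(contains(s, pattern) for s in sequences)
    return Fraction(hits, len(sequences))

def non_zero_rare_label(pattern, data, gamma):
    """Label with the maximum support if 0 < max support < gamma, otherwise None."""
    supports = {label: supp(pattern, seqs) for label, seqs in data.items()}
    best = max(supports, key=supports.get)
    return best if 0 < supports[best] < gamma else None

pattern = ["m1", "m2", "m3"]
print(supp(pattern, TABLE_1["a1"]))                            # 2/3
print(non_zero_rare_label(pattern, TABLE_1, Fraction(4, 5)))   # a1
```

Exact fractions are used so that the comparison against $\gamma = \frac{4}{5}$ is free of floating-point rounding.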
Algorithm 1 presents the detailed procedure of RSPB to build non-zero-rare patterns based on a training set $L$, where $L \subseteq D$. Following the frequent subsequence mining techniques used in [11,12], we first generate a tree of motion sensor type subsequences (lines 5, 11). Then, the support value of each pattern in the sequences of each activity is calculated (line 7). The maximum support value is checked to decide whether the pattern is a non-zero-rare pattern (lines 8-9). If the subsequence is a non-zero-rare pattern, its activity label becomes the class, i.e., it is placed in the consequent part of the rule. Additionally, we do not extend such leaf nodes, according to the max-pruning strategy in [7]. Otherwise, a single sensor type $m$ from the set $M$ is appended to the leaf node until all the possible candidate subsequences no longer satisfy the frequent conditions [12].

Algorithm 1. (Rare Sequential Patterns Builder)
Input: a training set ($L$), a pre-defined support threshold ($\gamma$), a set of activity labels ($A$), and a sequence ($\mathbf{c}$)
Procedure RSPB($L$, $A$, $\gamma$, $\mathbf{c}$)
1. $S_m = \emptyset$;
2. for each $a_k \in A$ do
3.   $D_{a_k} = \{(\mathbf{s}_j \rightarrow a_k) \mid \mathbf{s}_j \in S\}$;
4.   for each $m \in R \times M$ do
5.     if $(\mathbf{c} \circ m) \not\sqsupseteq S_m$ then
6.       $\mathbf{mc} = \mathbf{c} \circ m$;
7.       count $\mathit{supp}(\mathbf{mc}, D_{a_k})$;
8.       if $0 < \mathit{supp}(\mathbf{mc}, D_{a_k}) < \gamma$ then
9.         $S_m = S_m \cup (\mathbf{mc} \rightarrow a_k)$;
10.      else if $\mathit{supp}(\mathbf{mc}, D_{a_k}) \ge \gamma$
11.        RSPB($L$, $A$, $\gamma$, $\mathbf{mc}$);
12.      end if
13.    end if
14.    end if
15.  end for
16. end for
Output: $S_m = \{(\mathbf{s}_m \rightarrow a_k)\}$ is a non-zero-rare activity rule set

After we obtain the non-zero-rare patterns, the next stage performs the CUS in Algorithm 2. In this stage, the activity label of each unseen motion sensor sequence is predicted. A simple way to predict is to calculate the similarity between the generated non-zero-rare patterns and the unseen sequence using cosine similarity.
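To make the cosine similarity step concrete, here is a minimal Python sketch. The paper does not state how sequences are turned into vectors, so the sketch assumes a bag-of-sensor-types representation (a count per sensor type). The function name `cosine`, the two rules, and the unseen sequence are hypothetical illustrations and are not taken from the experiments.

```python
import math
from collections import Counter

def cosine(seq_a, seq_b):
    """Cosine similarity between the sensor-count vectors of two sequences (assumed encoding)."""
    a, b = Counter(seq_a), Counter(seq_b)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical rules (pattern -> activity label), for illustration only.
rules = [(["m1", "m2"], "a1"), (["m4"], "a2")]
unseen = ["m1", "m2", "m2"]

# Score the unseen sequence against every rule and keep the most similar one.
scores = [(cosine(unseen, pattern), label) for pattern, label in rules]
best_score, best_label = max(scores)
print(best_label, round(best_score, 4))
```

With this encoding, a similarity of 0 means the unseen sequence shares no sensor type with the pattern, regardless of the order of the items.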
The activity label of the rule with the highest cosine similarity value is taken as the prediction result. In particular, we note that several cosine similarity values may be tied, or all the cosine similarity values may be 0. The latter condition indicates that the sequences in $P$ share no sensor types with the generated non-zero-rare patterns. Hence, we handle the two conditions as follows: (i) an activity label is selected randomly if the cosine similarity values are tied, and (ii) a default activity label, built in the early stage as the most frequent activity label in $S_m$, provides the activity prediction result when the cosine value is 0. The detailed procedure of CUS is presented in Algorithm 2.

Since this study focuses on rare pattern performance in classification, we employ a precision formula to evaluate our MMRSP performance. The precision formula is denoted by:

$\mathit{precision}(\%) = \frac{tp}{tp + fp} \cdot 100\%$    (1)

where $tp$ is the number of true positives and $fp$ is the number of false positives.

Algorithm 2. (Classification Unseen Sequences)
Input: a testing set ($P$), and a set of non-zero-rare activity rules ($S_m$)
Procedure CUS($S_m$, $P$)
1. $a_d = \mathrm{defaultclass}(S_m)$;
2. for each $\mathbf{p} \in P$ do
3.   $a_{p_i} = \arg\max_{i \in S_m} \{\cos(\angle(\mathbf{p}, (\mathbf{s}_m \rightarrow a_m)))\}$;
4.   if $|a_{p_i}| > 1$ do
5.     $a_{p_m} = \mathrm{rand}(a_{p_i})$;
6.   else if $|a_{p_i}| = 0$ do
7.     $a_{p_m} = a_d$;
8.   else
9.     count++;
10.  end if
11.  end if
12. end for
Output: $A_h = \{a_{p_m} \mid 1 \le m \le |P|\}$ is a set of activity label predictions

In the next section, we describe the dataset and the performance evaluation of our proposed algorithm for mining non-zero-rare patterns. Furthermore, we analyze the generated non-zero-rare human activity patterns.

4. Experimental Results

We now discuss the experimental results, starting with the dataset description.

4.1 Dataset descriptions

To achieve our goals, we apply our proposed algorithm to the dataset from [1].
The dataset contains sensor recordings from three apartments, called house A, B, and C. In this study, the dataset from house A is used. The 'house A' dataset comprises 14 sensors installed in three rooms. As the sensors were triggered, 10 activities, annotated via Bluetooth over 25 days, were recorded.

Figure 1. The transformation phase on human activity sequences

Specifically, we discretize the sensor data with a time interval $\Delta t = 60$ s. As a result, we have around 42000 human activity events. To fit our algorithms, the dataset needs to be transformed into the form of sequences. This is done by taking one discretization event as one sequence, with the activity label as the last item of the sequence. We provide an example of the transformation process in Figure 1.

Figure 2. A human activity sequences form

Furthermore, the training set consists of 200 sequences with a maximum length of 3752, and there are 15 distinct motion sensor labels in the set of sequences $S$ and 8 activity labels: brush teeth, get drink, go to bed, leave house, prepare breakfast, prepare dinner, take a shower, and use toilet. The testing set consists of around 200 sequences with a maximum length of 3752, which we treat as the unseen sequences. We give an example of the human activity sequence form in Figure 2.

4.2 Evaluation

This study ran simulations over a range of $\gamma$ thresholds in $[0.01, 1]$, since we found that the scale of the dataset is quite small relative to the number of distinct motion sensor labels in $S$. Based on $\gamma = 0.05$, we extracted 10 non-zero-rare patterns, only for use toilet, such as: Freezer → use toilet (the motion sensor at the Freezer is triggered and recognized as use toilet), Groceries Cupboard → use toilet (the motion sensor at the Groceries Cupboard is triggered when the resident uses the toilet), etc. (see Figure 3).
These patterns are categorized as unusual resident activities that occur while the resident is using the toilet. We also obtain 13 non-zero-rare human activity patterns, only for go to bed. The patterns are Dishwasher → go to bed, Plates cupboard → go to bed, etc.

Figure 3. Non-zero-rare human activity pattern when $\gamma = 0.01$

Interestingly, we found that the number of generated non-zero-rare patterns differs for each support threshold value $\gamma$ (as depicted in Figure 4). In this case, each $\gamma$ value yields a different particular activity with a different number of non-zero-rare patterns. Additionally, a support threshold $\gamma$ can be interpreted as representing a particular activity event. From another viewpoint, the resident is notified that there is an unusual activity during the event and at the place(s) where the sensors are triggered.

Figure 4. The number of generated non-zero-rare human activity patterns vs the support threshold values.

In addition, we test the generated non-zero-rare human activity patterns by predicting the unseen sequences and evaluating the precision values. The overall precision result is 87.5%.

5. Conclusion and Future Works

As the precision result indicates, the generated non-zero-rare human activity patterns can discover unusual events while the residents carry out their activities. This can be used as an alert for the residents. Even though the MMRSP performs well, we still need to investigate the phenomenon that each support threshold gives us different rare events only for a particular activity. Thus, in the future we will derive properties that explain the relation between the support threshold and activity events.

References

[1] T. L. M. van Kasteren, G. Englebienne, and B. J. A. Kröse, “Human activity recognition from wireless sensor network data: Benchmark and software,” in Activity Recognition in Pervasive Intelligent Environments, 2010, pp. 165-185.
[2] D. J. Cook, C. Krishnan, and P.
Rashidi, “Activity Discovery and Activity Recognition: A New Partnership,” IEEE Trans. Cybernetics, 2013, Vol. 43, pp. 820-828.
[3] M. Iqbal and H.-K. Pao, “Activity Recognition from minimal distinguishing subsequence mining,” in International Conference on Mathematics: Pure, Applied and Computation, 2017, pp. 020046-1-020046-6.
[4] I. Mukhlash, D. Yuanda, and M. Iqbal, “Mining fuzzy time interval periodic patterns in smart home data,” International Journal of Electrical and Computer Engineering, 2018, Vol. 8(5), pp. 3374-3385.
[5] W. Ouyang, “Mining Rare Sequential Patterns in Large Transaction Database,” in International Conference on Computer Science and Electronic Technology, 2016, pp. 159-162.
[6] A. Samet, T. Guyet, and G. Negrevergne, “Mining rare sequential patterns with ASP,” in International Conference on Inductive Logic Programming, 2017.
[7] X. Ji, J. Bailey, and G. Dong, “Mining minimal distinguishing subsequence patterns with gap constraints,” in International Conference on Data Mining, 2005, pp. 194-201.
[8] T. Gu, Z. Wu, X. Tao, H. K. Pung, and J. Lu, “epSICAR: An emerging patterns based approach to sequential, interleaved and Concurrent Activity Recognition,” in International Conference on Pervasive Computing and Communications, 2009.
[9] J. Yin, G. Tian, Z. Feng, and J. Li, “Human activity recognition based on multiple-order temporal information,” Computers and Electrical Engineering, 2014, Vol. 40, pp. 1538-1551.
[10] J. Wen, M. Zhong, and Z. Wang, “Activity recognition with weighted frequent patterns mining in a smart environment,” Expert Systems with Applications, 2015, Vol. 42, pp. 6423-6432.
[11] V. S. Tseng and C.-H. Lee, “Effective temporal data classification by integrating sequential pattern mining and probabilistic induction,” Expert Systems with Applications, 2009, Vol. 36(5), pp. 9524-9532.
[12] R.
Agrawal and R. Srikant, “Mining sequential patterns,” in International Conference on Data Engineering, 1995, pp. 3-14.