Microsoft Word - cet-01.docx CHEMICAL ENGINEERING TRANSACTIONS VOL. 46, 2015 A publication of The Italian Association of Chemical Engineering Online at www.aidic.it/cet Guest Editors: Peiyu Ren, Yancang Li, Huiping Song Copyright © 2015, AIDIC Servizi S.r.l., ISBN 978-88-95608-37-2; ISSN 2283-9216 A Method for Determining Decision Tree Complexities Based on Multiobjective Optimization for Online Learning Community Liang Jia , Nigang Sun * School of Information Science & Engineering, Chang Zhou University, China. ngsun@cczu.edu.cn This paper focuses on finding minimal complexities of decision trees representing rules defined for features and functionalities of online learning community. Because simultaneously minimizing all complexities is not always possible and this matches pattern of multiobjective optimization, Pareto optimal point is introduced for describing complexity distributions for selecting decision tree most suitable for system requirements. Unlike typical multiobjective optimization whose feasible space can be explicitly represented, feasible space for complexities of decision trees can only be implicitly reflected by relationships of subtables and associated objective functions due to the great number of possible trees. This paper provides a means of finding Pareto optimal points in implicit feasible space determined by objective functions defined for describing decision tree complexities by using graph and associated algorithm. The experimental results show that proposed algorithm yields valid results for finding Pareto optimal points. 1. Introduction As smart phones, tablets and various types of digital mobile devices prevail in daily life, the internet expands through Wi-Fi, 4G and other mobile telecommunications standards compatible with mobile devices. Our daily lives are dramatically changed in several aspects such as learning as mobile devices flourished in past few years. With the help of new technologies, learning evolves from traditional blackboard and chalks to interactive online activities through the platforms like online learning communities. As far as mobile web can stretch, learning can occur at anytime, anywhere and with anybody by using online learning communities. Online learning communities now provide a convenient means for common people to interact with professional experts, teachers, technicians and other people of advanced knowledge and skills of some expertise. For learning itself, online learning communities promote not only knowledge learning but also idea sharing. However, an efficient solution of online learning community for providing multifarious functionalities is difficult to be implemented. For example. massive information presented in online learning communities involved with a great number of users is quite costly and difficult to classify under the considerations of what topics of users like and other features shared by certain users in online learning communities. As an efficient and viable means for implementing automatic decision making under the sophisticated cases, decision trees are widely adapted in multifarious fields as digital image processing (Elsalamony, 2014; Grana et al., 2012), biomedicine (Czajkowski et al. 2014), firm power capacity scheduling (Moutis and Hatziargyriou, 2014), stock analysis (Bast et al, 2015), earnings management (Chen et al., 2015), etc. These successful applications imply decision tree is a highly-efficient solution for implementing functionalities of online learning communities, e.g., information categorization. The focuses are how to construct decision trees, and more important which decision tree is most feasible for online learning communities. The most properties of decision trees need to be taken into account are time and spacial complexities of trees which determine efficiencies of functionalities depending on trees. Because minimization of both time and spacial complexities is always not attainable, this introduces concept of Pareto set which actually enumerates all Pareto optimal solutions for some objective constrains (Erfania and Utyuzhnikov, 2011; Ghosh and Chakraborty, 2014). Although there are multiple algorithms developed for finding Pareto set theoretically, a few targets decision trees (Chikalov, 2013; Hussain, 2014). This paper presents an algorithm for determining Pareto set of time and spacial complexities of decision trees derived from decision table bearing the features of online learning communities. The time and spacial DOI: 10.3303/CET1546033 Please cite this article as: Jia L., Sun N.G., 2015, A method for determining decision tree complexities based on multiobjective optimization for online learning community, Chemical Engineering Transactions, 46, 193-198 DOI:10.3303/CET1546033 193 complexities of decision tree are defined and employed as objective functions of multiobjective optimization whose desired solutions form Pareto set. The details of algorithm are visually described by UML activity diagrams (Rumbaugh et al., 2010). The algorithm essentially depends on adapting data structure of graph for preserving decision table splitting whose splitting information is employed for generating Pareto set in turn. Some experimental data is adapted for illustrating visual result of Pareto optimal points found in objective space. However, algorithm avoids constructions of decision trees corresponding to Pareto points. The rest of paper is organized as following. Section 2 provides a brief review about Pareto set and decision tree. Section 3 defines decision tree complexities and mathematical model for generating Pareto optimal points for a given decision table. Section 4 describes algorithms implementing models discussed in Section 3 in details. Section 5 introduces three experimental results for showing validity of proposed algorithm and Section 6 draws the conclusion. 2. Related works Constructions of decision trees based on given decision tables are constantly evolving. There are plenty of means for constructing decision trees with respect to terminal vertices and height of the tree. Different strategies are employed for decision tree constructions of different kinds of decision tables, e.g., single-valued decision tables (Amin et al., 2013;Chikalov, 2013; Hussain, 2014; Luo et al, 2015) and multi-valued tables (Azad et al., 2013; Azad and Moshkov, 2014). Construction algorithms roughly match some design pattern like dynamic programming (Azad and Moshkov, 2014), incremental algorithm (Luo et al, 2015) and greedy algorithm (Amin et al., 2013; Azad et al., 2013), etc. Normally, almost all of proposed algorithms attempt to construct decision trees of minimal height and minimal left number which are referred as time and spacial complexities. Commonly, such attempts end with compromise of optimizing only one complexity because minimization of two is not always possible. This essentially resembles the case of multiobjective optimization in which several objective functions need to be minimized, but the ideal case of minimizing all objective functions simultaneously is not possible. Hence, suboptimal solutions which minimize objective functions in some degree are derived as Pareto set. There are greatly number and types of Pareto set generation algorithms depending on actual problem patterns, e.g., utility function method (Zeng et al., 2012), global criterion method (Rao, 2009), nonsmooth convex objective function method (Attouch et al., 2015), lexicographic method (Pulido et al., 2014), goal programming method (Rao, 2009), goal attainment method (Rao, 2009), etc. Formally, k constrains of multiobjective optimization are formalized as functions g ,g ,…,g , there are n objective functions F ,F ,…,F and their values form objective space ⊆ . Vectors compatible with F where i 1,2,…,n form decision space ⊆ and the set of vectors in satisfying g where j 1,2,…,k is called feasible space and denoted by ∗ ⊆ . Hence, there is a map : → from ∗ to a subset ∗ ⊆ , i.e., image of ∗ under . For a given vector ∈ ∗, the definition of multiobjective optimization is given as min ∈ ∗ min ∈ ∗ F ,F ,…,F where ∗ ∈ ∗ is Pareto optimal for min ∈ ∗ iff ∄ such that F F ∗ for i 1,2,…,n and ∃j, 1 j n, F F ∗ . Minimum of F in is denoted by F∗, a common assumption of min ∈ ∗ is that there is no ∈ ∗ satisfying F ,F ,…,F F∗,F∗,…,F∗, . To understand how Pareto set is related with decision trees, relative definitions of decision tree have to be introduced. A decision tree is a concise representation of decision table. A decision table T is a two dimensional table of condition attributes f , f ,…, f E T which graphically are column headers of T . Row of index i in T contains an array of values as c ,c ,…,c corresponding to E T and these values lead to a decision value d . Values of n rows in T form a set c ,c ,…,c , c ,c ,…,c … c ,c ,…,c C T and the matched n decisions form a set d ,d ,…,d D T (|D T | n when d d , i j; |D T | n, otherwise). Generally, a decision table T has the following structure. T f … f d c … c d ⋮ ⋱ ⋮ ⋮ c ⋯ c d (1) For any d ∈ D T , k 1,2…n, there is a c ,c …c r ∈ C T . For two r ,r ∈ C T and k k , inequalities c c ,c c …c c hold for r ,r . For a given subset of E T , i.e., f , f ,…,f and their values a ,a ,…,a , a subtable Θ can be constructed from T. Θ has same condition attributes as T, but its rows is selectively chosen from T , i.e., E Θ E T and for any r c ,c …c ∈ C Θ where j 1,2…|C Θ |, c a ,c a …c a hold. Θ is also denoted by T f ,a f ,a … f ,a . Based on mentioned definitions, time and spacial complexities can be defined and mathematical formulae for modeling generation of Pareto optimal point can finally be given in following sections. 194 3. Decision tree complexities and generating pareto set This section comprises two subsections: the first focuses on definitions of time and spacial complexities of decision trees; the second illustrates how Pareto optimal points are derived with respect to objective functions involving decision tree complexities. 3.1 Time and Spatial Complexities of Decision Trees For any vertex v of a decision tree Γ generated based on a given decision table T, v denotes one condition attribute from E T , i.e., v ∈ E T . The edge connecting just two vertices is marked by condition values from C T . Notice, any f ∈ E T where i 1,…,|E T | can be root vertex of Γ under the assumption c ∈ r ∈ C T where j 1,…,|D T | is processed despite of order of f , f ,…, f| | , and ending vertex of Γ can only be decision values from D T . A path of t steps from root vertex f to arbitrary v ∈ E T ∪ D T begins from f , passing vertices f , f ,…, f through their connected edges marked by values a ,a ,…,a and ends at v. According the vertices and values associated with t -step path, a subtable T v can be constructed as following. T v T v f T f ,a f ,a … f ,a v f (2) For an arbitrary row r ∈ C T , k 1,2…|C T |, r is determined by some decision value d ∈ D T . In a decision tree Γ built on T d , there is a path presents r , the length of r , ℓ r , is defined by the sum of values marked on edges of path from root of T d to d . If r c ,c …c , then ℓ r ∑ c . For a decision tree Γ built on T v , the total length of T v is defined as Γ ∑ ℓ r . The time complexity of Γ is defined as followings. Γ Γ T v where T v denotes the number of rows in T represented in T v . Spacial complexity of Γ is defined as terminal vertices of Γ , Γ . 3.2 Pareto set generation All condition values associated with f ∈ E T in C T is denoted by C T,f , i.e., all values in column f of T. All decision trees generated based on T form a set T , and β min ∈ Γ , α min ∈ Γ , R max ∑ c | | , then , β,β 1…|C T | ⋅ R , , α,α 1…|C T | respectively denote ranges of and . Map : → and : → are defined as followings. l min T,Γ ∈ , |Γ ∈ T and T,Γ l t min T,Γ ∈ , |Γ ∈ T and T,Γ t Where l ∈ , and t ∈ , . Let data structure graph be employed to represent relationships between T and its subtables, then the nodes of graph are subtables T v T , edges emitting from a node associated with f ∈ E T are marked by pairs of values f ,a where a ∈ C T,f and pointing to nodes T f ,a . Hence, edges in the graph are directed and connect two tables of parent and child relationship. The graph is thus directed acyclic graph (DAG). For a node T ∈ DAG, it has two possible types determined by number of decision values. When |D T | 1 for all rows r ∈ C T , r share a common decision value and T is leaf node of DAG of 1 0. When |D T | 1, r ∈ C T can be categories based on their different decision values. Starting from node T , for any f ∈ E T and a ,a ,…,a| , | ∈ C T ,f , there are |C T , f | edges marked by f ,a , f ,a , …, f ,a| , | and leading to nodes T f ,a , T f ,a ,…, T f ,a| , | . For each T f ,a , there is a corresponding , , . For f ∈ E T , an ordered set of all possible compositions of Γ , where j 1,…,|C T , f | is defined as followings. , , ∗ , ∗ ,…, | | ∗ Where 1 … 1 , ∗ denotes inner product and ∗ ∗ where k i, i 1,…,|C T |. is defined as followings. min ∈ , Γ , min ∈ , Γ , ⋮ min ∈ , , Γ , , and Γ , Γ , ⋮ Γ , , (3) For , i 1 j |C T | , Γ ∈ T f ,a , Γ ∈ T f ,a , … , Γ ∈ T f ,a , is defined as follwoings. For T f ,a where i 1,…,|E T |, value of , can be derived. For ∗ ∈ , , , objective function associated with complexities of decision trees generated around f ∈ E T is defined as followings. 195 , , min ∗ ∈ , , , |C | 4 Where j denotes the jth component of vector . According to , objective function , of E T for describing complexities of subtable T is given as followings where , , , , , , ,…, , , . , min , , ∈ , , , 5 Generally, if T is terminal node of graph, then 1 0 and Pareto optimal point is 1, 1 ; if T is not a terminal node, we first compute , , for each f ∈ E T , then choose the minimal value as , and Pareto optimal point is , , , where , is objective function for retrieving ∗ ∈ , , which yields . For multiobjective optimization of decision tree complexities, objective space is a two- dimensional space whose objective functions are , and , which graphically are two axis of space. 4. Algorithms for generating pareto points This section describes the procedure of generating Pareto optimal points in details. The general scheme consists of two subroutines, i.e., Subroutine 1: generate a directed acyclic graph starting from and proceeding downward until degenerate subtables, and Subroutine 2: compute starting from degenerate subtables in graph and proceeding upward until . Subsections 4.2 to 4.3 respectively describe the two phases of generating Pareto optimal points. 4.1 Generating DAG based on decision table This section describes the procedure of generating Pareto optimal points in details. The general scheme is sketched in subsection 4.1. Subsections 4.2 to 4.3 respectively describes the two phases of generating Pareto optimal points. Figure 1: General scheme of proposed algorithm. 4.2 Generating DAG based on decision table As depicted in Figure 2 left, Subroutine 1 attempts to split a given decision table T in a recursive manner. After adding T to DAG, it tries to find an table T not split yet from DAG and insert subtables of T f ,a where f ∈ E T and a ∈ C T,f as nodes to DAG. Node T and all T f ,a are connected by directed edges starting from T and ending at T f ,a . Edges are marked by corresponding condition values f ,a . Node insertion only occurs when subtable is not found in DAG and if there is an edge connecting T and existing T f ,a , the mark of edge will be added by f ,a rather than adding a new edge. This node insertion and edge modification or addition continues until there is no tables found in DAG which are not split. 4.3 Subroutine 2 generating pareto points Subroutine 2 provides core functionality for generating Pareto points based on mathematical descriptions made in Section 3. The strategy is starting generating from terminal nodes of DAG whose Pareto points are of form 1,0 , then moving upward to subtables T containing only terminal nodes providing values for and compute by adding the smallest with |C T |. Before T is reached, it continues to find the next subtables T whose subtables have completed their Pareto point computation. For T, each Pareto point attached with f ∈ E T is considered as an optimal Pareto point as output of proposed algorithm. The details are illustrated in Figure 2 right. 196 Subroutine 1 Try to find an unsplit table from DAG [un unsplit table is available ] [else] DAG decision table T Mark T as unsplit, initialize a directed acyclic graph(DAG) and add T as a vertex to DAG Set the available table as T ’ [else] [an unvisited  fi is available] Try to find an unvisited condition attribute fi∈E(T ’) Mark T ’ as split [an unvisited δ is available ] [else] Try to find an unvisited value δ From C(T ’, fi) [no identical vertex is found] [else] Check whether there is a vertex in DAG whose subtable is identical with T ’(fi, δ) Check whether vertex T ’ and T ’(fi, δ) is connected by a directed edge [no edge is found] [else] Add T ’(fi, δ) to DAG as a vertex Add a directed edge starting at T ’ and pointing to T ’(fi, δ) to DAG, and mark the edge by (fi, δ) Add (fi, δ) to the mark of the found edge Subroutine 2 DAG Initialize |D(T)| vertex collections S1, S2, …, S|D(T)| Categorize each vertex T ’∈DAG based on |D(T ’)| and add T ’ to S|D(T ’)| Notice S|D(T)| contains only T Attach (R = 1,F(R) = 0) to each vertex in S1 and set S1 to Si Find Si+1 after Si based on the order S1, S2, …, S|D(T)| [Si+1 is available ] [else] [T ’ is not available ] [else] Try to find an unvisited vertex T ’∈Si+1 Try to find an unvisited condition attribute fi∈E(T ’) [fi is available] [else] Follow the edges emitting from T ’ and marked by (fi, δj) where δj∈C(T ’, fi) to retrieve (R(fi, δj),F(fi, δj)) from ending vertex in Si Compute Rfi = ΣjR(fi, δj) and Ffi = ΣjF(fi, δj)+|C(T ’)| Check whether T ’ is T [T ’ ≠ T] [else] Find the smallest Ffi, if multiple Ffi share the minimum , then find the smallest Rfi,and set Ffi to FT ’, set Rfi to RT ’, finally attach (RT ’, FT ’) to vertex T ’ Attach all (Rfi, Ffi) for fi∈E(T ) to vertex T Pareto points return all (Rfi, Ffi) attached to vertex T for fi∈E(T ) as Pareto points Figure 2: Subroutine 1 and Subroutine 2. 5. Experimental results This section introduces experimental results of adapting proposed algorithm for decision tables designed for a virtual online learning community. These decision tables define some functionalities of online learning community as rules of categorizing posts based on topics, rules of identifying user of high influence, etc. A typical table has about 6 condition attributes and 50 rows. Pareto points are rendered as dots in a two dimensional space of axes leaf vertex number and . Three typical results are show in Figure 3 6. Conclusions This paper proposes mathematical model and corresponding algorithms for generating Pareto optimal points of decision tree complexities for given decision tables. Unlike traditional multiobjective optimization whose feasible space is explicitly described in some means as mathematical formulae, the feasible space of decision tree complexities is implicitly related to underlying decision tables. The implicit feasible space is revealed by employing graph to preserve relationships of subtables. Relationships are represented as nodes and directed edges in graph which lead to minimization of objective functions defined based on concepts of complexities. The validity of proposed algorithm is shown by generating Pareto optimal points of decision tables made for a virtual online learning community. Figure 3: Pareto Set for Table 1 of Virtual Online Learning Community. 197 References Amin T., Chikalov I., Moshkov M., Zielosko B., 2013, Dynamic programming approach to optimization of approximate decision rules, Inform. Sci., 221, 403-418. Attouch H., Garrigos G., Goudou X., 2015, A dynamic gradient approach to Pareto optimization with nonsmooth convex objective functions, J. of Math. Anal. and Applicat., 422, 741-771. Azad M., Moshkov M., 2014, Minimization of decision tree average depth for decision tables with many-valued decisions, Procedia Comput. Sci., 35, 368-377. Azad M., Zielosko B., Moshkov M., Chikalov I., 2013, Decision rules, trees and tests for tables with many- valued decisions–comparative study, Procedia Comput. Sci., 22, 87-94. Bastı E., Kuzey C., Delen D., 2015, Analyzing initial public offerings' short-term performance using decision trees and SVMs, Decision Support Syst., 73, 15-27. Chen F., Chi D., Wang Y., 2015, Detecting biotechnology industry's earnings management using Bayesian network, principal component analysis, back propagation neural network, and decision tree, Econ. Modelling, 46, 1-10. Chikalov I., Hussain S., Moshkov M., 2013, Totally optimal decision trees for monotone boolean functions with at most five variables, Procedia Comput. Sci., 22, 359-365. Czajkowski M., Grześ M., Kretowski M., , 2014, Multi-test decision tree and its application to microarray data classification, Artificial Intell. in Medicine, 61, 35-44. Elsalamony H. A., 2014, Detecting distorted and benign blood cells using the Hough transform based on neural networks and decision trees, in Emerging Trends in Image Process., Comput. Vision and Pattern Recognition, 1st ed. USA: Morgan Kaufmann. Erfania T., Utyuzhnikov V. S., 2011, Directed Search Domain: A method for even generation of the Pareto frontier in multiobjective optimization, Eng. Optimaztion, 43, 467-484. Ghosh D., Chakraborty D., 2014, A new Pareto set generating method for multiobjective optimization problems, Operations Research Lett., 42. 514-521. Grana C., Montangero M., Borghesani D., 2012, Optimal decision trees for local image processing algorithms, Pattern Recognition Letters, 33, 2302-2310. Hussain S., 2014, Total path length and number of terminal nodes for decision trees, Procedia Comput. Sci., 35, 514-521. Luo C., Li T., Chen H., Lu L., 2015, Fast algorithms for computing rough approximations in set-valued decision systems while updating criteria values, Inform. Sci., 299, 221-242. Moutis P., Hatziargyriou D. N., 2014, Decision trees aided scheduling for firm power capacity provision by virtual power plants, Int. J. of Elect. Power & Energy Syst., 63, 730-739. Pulido J. F., Mandow L., Cruz P. L. J., 2014, Multiobjective shortest path problems with lexicographic goal- based preferences, European Journal of Operational Research, 239, 89-101. Rao S. S., 2009, Practical aspects of optimization, in Engineering Optimization Theory and Practice, 4th ed., USA: John Wikey & Sons. Rumbaugh J., Jacobson I., and Booch G., 2010, Unified Modeling Language Reference Manual, 2nd ed. MA, USA: Addison-Wesley. Zeng X., Wong W., Leung Y. S., 2012, An operator allocation optimization model for balancing control of the hybrid assembly lines using Pareto utility discrete differential evolution algorithm, Comput. & Operations Research, 39, 1145-1159. 198