CHEMICAL ENGINEERING TRANSACTIONS VOL. 46, 2015
A publication of The Italian Association of Chemical Engineering, online at www.aidic.it/cet
Guest Editors: Peiyu Ren, Yancang Li, Huiping Song
Copyright © 2015, AIDIC Servizi S.r.l., ISBN 978-88-95608-37-2; ISSN 2283-9216

Decision Tree Algorithm Based on Granular Computing and Important Degree of Attribute Value

Liu Ping(a), Wu Zhengang(a), Ge Licheng(b), Wang Haochong(c), Yang Junping(d,*)
(a) School of Information Engineering, Nanchang University, Nanchang 330031
(b) Institute of Advanced Study, Nanchang University, Nanchang 330031
(c) Institute of Electrical and Mechanical Engineering, Nanchang University, Nanchang 330031
(d) Affiliated Hospital of Jiangxi College of Traditional Chinese Medicine, Nanchang 330006, China
yyfyjp0507@126.com

Conventional decision tree algorithms employ information gain or information gain ratio to select the splitting attribute and ignore the significance of individual attribute values. Analysis of a single-attribute decision-making problem shows that different values of the same condition attribute influence the decision result differently. Based on this observation, a proportion matrix and the Euclidean norm are introduced to quantitatively describe the important degree of an attribute value, and a decision tree algorithm based on granular computing is proposed. Experimental results show that, compared with the ID3 algorithm, the proposed algorithm achieves higher accuracy when applied to classification problems with many attribute values.

1. Introduction

The decision tree is an effective nonparametric learning method that can be used for classification and regression. As early as 1979, Quinlan proposed the famous ID3 algorithm, a decision tree classifier based on information entropy (Quinlan, 1986). After the concept was introduced, many scholars took an interest in ID3 and improved it in response to its weaknesses. One research direction works on threshold setting and pruning strategy (Chen et al., 2011); because data sets differ widely and decision tree induction is essentially a hill-climbing search, this line of work remains under active discussion. Another direction modifies the splitting criterion itself, for example by weighting the information entropy of ID3 to obtain a new information gain (Chen et al., 2011; Yu, 2012; Ding et al., 2010; Zhou et al., 2010; Chenxia, 2014). Other directions apply rough sets (Qiu et al., 2009; Ding et al., 2010; Ran et al., 2014), fuzzy mathematics (Eyke, 2008; Chenxia, 2014; Srinivasan, 2013), or modern artificial intelligence algorithms (Feng et al., 2009).

The traditional ID3 algorithm and its improved variants all concentrate on selecting the best splitting attribute, without considering the influence that different values of the same attribute have on the result. Through the analysis of a single-attribute decision-making problem, this paper identifies the direct influence of condition attribute values on decision results and uses a proportion matrix and the Euclidean norm to quantify that influence, so as to determine new classifications.
2. Relevant concepts of granules in the information system

Granular computing (Han et al., 2010; Qiu et al., 2009; Zhang et al., 2011) takes a micro perspective: by computing granules coarsely or in detail, people can observe a granule from different aspects and thereby analyse and solve the problem.

An information system can be formalized as S = ⟨U, A, V, f⟩. U = {u1, u2, …, uN} is a non-empty finite set of objects, called the universe. A = C ∪ D is a set of attributes, A = {a1, a2, …, aM}, where C is the condition attribute set and D is the decision attribute set. V = {Va1, Va2, …, VaM} is the set of attribute values, and Vai = {vai,1, vai,2, …, vai,j} is the domain of attribute ai. f: U × A → V is a mapping such that ∀u ∈ U, ai ∈ A, f(u, ai) ∈ Vai.

Definition 1 (Elementary granule). For ai ∈ A, the function h(ai, vj) gives the set of objects whose value of ai is vj (vj ∈ Vai). In the information system S = ⟨U, A, V, f⟩, Gr = ((ai, vj), h(ai, vj)), ai ∈ A, is named an elementary granule of the information system. The size of an information granule is defined as the cardinality of its extension and is written |h(ai, vj)| (i = 1, …, |A|; j = 1, …, |Vai|).

Definition 2 (Elementary condition granule and elementary decision granule). For the information system S = ⟨U, A, V, f⟩, the attribute set is divided into two parts, A = C ∪ D, where C = {c1, c2, …, cS} is the condition attribute set and D = {d1, d2, …, dT} is the decision attribute set. An elementary condition granule is Grc = ((ci, vj), h(ci, vj)), ci ∈ C, vj ∈ Vci (i = 1, …, S; j = 1, …, |Vci|). An elementary decision granule is Grd = ((di, vj), h(di, vj)), di ∈ D, vj ∈ Vdi (i = 1, …, T; j = 1, …, |Vdi|).

Definition 3 (Combination granule). Suppose ω is a logical combination of several elementary granules using the logical connectives ∧, ∨, →, ∼, ←, and h(ω) is the set of objects that satisfy the logical combination ω. Then Gr = ((ω), h(ω)) is named a combination granule. The size of a combination granule (Qiu et al., 2009) is defined as the cardinality of its extension, L(Gr) = |h(ω)|.

Definition 4 (Intersection). For two combination granules Gr1 = ((ω1), h(ω1)) and Gr2 = ((ω2), h(ω2)), Gr1 ⊗ Gr2 = ((ω1 ∧ ω2), h(ω1 ∧ ω2)).

Definition 5 (Condition granular set). A junior condition granular set is CSi = {Grci,1, Grci,2, …, Grci,j}, where Grci,j = ((ci, vj), h(ci, vj)), ci ∈ C, vj ∈ Vci (i = 1, …, |C|; j = 1, …, |Vci|); it is the set of all elementary condition granules of one condition attribute. The combination granule set obtained after several intersections of junior condition sets is named a condition granular set, and every combination granule in it is one combination of the condition granular set. If two combinations belong to the same condition granular set, we call them a consanguinity association; if they do not, we call them a non-consanguinity association.
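To make Definitions 1-5 concrete, the following is a minimal Python sketch, assuming the decision table is stored as a dict mapping object ids to attribute-value dicts; the class and function names are illustrative and are not part of the paper.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Granule:
        description: tuple       # logical description omega, e.g. (('A', 'R'),)
        extension: frozenset     # h(omega): ids of the objects that satisfy omega

        def size(self):          # L(Gr) = |h(omega)|  (Definition 3)
            return len(self.extension)

    def elementary_granule(table, attribute, value):
        # Definition 1: Gr = ((a_i, v_j), h(a_i, v_j))
        ext = frozenset(obj for obj, row in table.items() if row[attribute] == value)
        return Granule(((attribute, value),), ext)

    def intersect(g1, g2):
        # Definition 4: conjunction of the descriptions, intersection of the extensions
        return Granule(g1.description + g2.description, g1.extension & g2.extension)

    # Example: a toy table with one condition attribute A and the decision attribute Class.
    toy = {1: {'A': 'R', 'Class': 'Y'},
           2: {'A': 'S', 'Class': 'N'},
           3: {'A': 'R', 'Class': 'N'}}
    g = intersect(elementary_granule(toy, 'A', 'R'), elementary_granule(toy, 'Class', 'N'))
    print(g.extension, g.size())   # frozenset({3}) 1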
Definition 6 (Intersection of condition granular sets). For any two junior condition sets CSm = {Grcm,1, Grcm,2, …, Grcm,s} and CSn = {Grcn,1, Grcn,2, …, Grcn,t},

CS_m \otimes CS_n = \{\, Gr_{c_m,i} \otimes Gr_{c_n,j} \mid i = 1, \dots, s;\ j = 1, \dots, t \,\}

Definition 7 (Decision granular set). A decision granular set is DK = {Grdk,1, Grdk,2, …, Grdk,t}; it is the set of all elementary decision granules of one decision attribute.

Definition 8 (Intersection of a condition granular set and a decision granular set). Without loss of generality, assume there is only one decision attribute. For CSm = {Grωm,1, Grωm,2, …, Grωm,s} and DK = {Grdk,1, Grdk,2, …, Grdk,t},

CS_m \otimes D_K = \{\, Gr_{\omega_m,i} \otimes Gr_{d_k,j} \mid i = 1, \dots, s;\ j = 1, \dots, t \,\}

Definition 9 (Proportion matrix). For a decision attribute dk ∈ D, Vdk = {vdk,1, vdk,2, …, vdk,t} is the range of the decision attribute, and cs_i (i = 1, 2, …, s) are the combinations of a condition granular set CS. The matrix of ratios between the decision granules intersected with the condition combination granules and the condition combination granules themselves is named the proportion matrix P:

P = \begin{bmatrix}
L(Gr_{\omega_m,1} \otimes Gr_{d_k,1})/L(Gr_{\omega_m,1}) & L(Gr_{\omega_m,1} \otimes Gr_{d_k,2})/L(Gr_{\omega_m,1}) & \cdots & L(Gr_{\omega_m,1} \otimes Gr_{d_k,t})/L(Gr_{\omega_m,1}) \\
L(Gr_{\omega_m,2} \otimes Gr_{d_k,1})/L(Gr_{\omega_m,2}) & L(Gr_{\omega_m,2} \otimes Gr_{d_k,2})/L(Gr_{\omega_m,2}) & \cdots & L(Gr_{\omega_m,2} \otimes Gr_{d_k,t})/L(Gr_{\omega_m,2}) \\
\vdots & \vdots & \ddots & \vdots \\
L(Gr_{\omega_m,s} \otimes Gr_{d_k,1})/L(Gr_{\omega_m,s}) & L(Gr_{\omega_m,s} \otimes Gr_{d_k,2})/L(Gr_{\omega_m,s}) & \cdots & L(Gr_{\omega_m,s} \otimes Gr_{d_k,t})/L(Gr_{\omega_m,s})
\end{bmatrix}

In particular, if CS is a junior condition set, the proportion matrix is named a junior proportion matrix. The j-th row of the proportion matrix P is named the ratio vector of the condition combination granule cs_j.

3. Decision tree algorithm based on granular computing and important degree of attribute value

Definition 10 (Important degree of attribute value, i.e. the correlation degree between a condition granule combination and the decision granules). We take the correlation degree between a combination of condition granules cs_j and the decision attribute D as the important degree of the attribute value. Suppose the ratio vector of cs_j is P_j = [p_{1j}, p_{2j}, \dots, p_{mj}]^T.

(1) Normalization: P_j' = P_j / \mathrm{sum}(P_j) = [p_{1j}', p_{2j}', \dots, p_{mj}']^T

(2) Euclidean norm: R = \sqrt{p_{1j}'^2 + p_{2j}'^2 + \cdots + p_{mj}'^2}

R expresses the important degree of the attribute value.

Decision tree algorithm description:

Algorithm 1: Produce the elementary granule space
Input: decision information system S = (U, A, V, f), A = C ∪ D, objects u ∈ U, attributes a ∈ A.
Output: elementary granule space (elementary condition granule sets CS and elementary decision granule sets D).
    CS = ∅; D = ∅;
    for each a_i ∈ A (1 ≤ i ≤ |A|) do
        for each v_j ∈ Va_i (1 ≤ j ≤ |Va_i|) do
            for each u_k ∈ U (1 ≤ k ≤ |U|) do
                if f(u_k, a_i) = v_j then
                    φ = (a_i, v_j); h(φ) = h(φ) ∪ {u_k};
                end if
            end for
            if a_i ∈ C then
                Grc = ((c_i, v_j), h(c_i, v_j)); CS_i = CS_i ∪ {Grc}
            else
                Grd = ((d_i, v_j), h(d_i, v_j)); D_i = D_i ∪ {Grd}
            end if
        end for
    end for

Algorithm 2: Construct the decision tree
Repeat, for each combination and each non-consanguinity association at the right of the current node:
    Step 1: According to Definitions 6-10, calculate the important degree of attribute value: compute CSm ⊗ CSn and CSm ⊗ DK according to Definitions 6 and 8 (CSm and CSn being a non-consanguinity association), obtain the proportion matrix according to Definition 9, and calculate the important degree of attribute value according to Definition 10.
    Step 2: Rank the important degrees of attribute values from maximum to minimum.
    Step 3: Store the decision result in the node.
Until there is no non-consanguinity association at the right of this node.
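As a reading aid, Algorithm 1 can be written as a short Python sketch, assuming the same table layout and the Granule/elementary_granule helpers from the sketch in Section 2; the function name and argument names are illustrative, not from the paper.

    def granule_space(table, condition_attrs, decision_attrs):
        # Algorithm 1: build the junior condition granular sets CS_i and the
        # decision granular sets D_i from the decision table.
        CS, D = {}, {}
        for attr in list(condition_attrs) + list(decision_attrs):
            values = {row[attr] for row in table.values()}          # domain V_a of attribute a
            granules = [elementary_granule(table, attr, v) for v in sorted(values)]
            if attr in condition_attrs:
                CS[attr] = granules                                 # one junior set per condition attribute
            else:
                D[attr] = granules                                  # elementary decision granules
        return CS, D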
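The quantities in Definitions 9 and 10 likewise reduce to a few lines. The sketch below follows Definition 9 for the ratio vector; note that, because of the normalization step in Definition 10, dividing by the granule size L(Grωm,i) as in Definition 9 or by |U| as in the worked example of Section 4 yields the same importance degree R. It builds on the Granule/intersect helpers of the Section 2 sketch, and the function names are illustrative.

    import math

    def proportion_row(cond_granule, decision_granules):
        # Definition 9: one row of P, the ratio vector of a condition combination granule.
        denom = cond_granule.size()                                   # L(Gr_omega m,i)
        return [intersect(cond_granule, d).size() / denom for d in decision_granules]

    def importance_degree(row):
        # Definition 10: normalise the ratio vector, then take its Euclidean norm.
        total = sum(row)
        if total == 0:
            return 0.0                                                # empty combination granule
        return math.sqrt(sum((p / total) ** 2 for p in row))

For example, a condition granule whose objects split 6 : 2 over two decision classes has normalized vector [0.75, 0.25] and R = sqrt(0.75^2 + 0.25^2) ≈ 0.7906, the value obtained for (A, R) in the case study of Section 4.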
4. Case study

Table 1: An example data set

No.  A  B  C  Class
1    R  1  0  Y
2    R  2  0  Y
3    S  1  1  N
4    R  2  1  Y
5    R  2  1  Y
6    S  2  1  N
7    S  1  0  Y
8    S  1  1  Y
9    R  2  0  N
10   R  2  0  N
11   S  1  1  N
12   R  1  1  Y
13   R  1  1  Y

Step 1: According to Algorithm 1, set up the elementary granule space. The elementary condition granule sets CS are shown in Figure 1.

Figure 1: Example of the granule set

Step 2: According to Algorithm 2, construct the decision tree.

Step 2.1: According to Definitions 6-10, calculate the important degree of attribute value. Compute CSm ⊗ CSn and CSm ⊗ DK according to Definitions 6 and 8:

⊗        (Class, Y)              (Class, N)
(A, R)   [1, 2, 4, 5, 12, 13]    [9, 10]
(A, S)   [7, 8]                  [3, 6, 11]

⊗        (Class, Y)              (Class, N)
(B, 1)   [1, 7, 8, 12, 13]       [3, 11]
(B, 2)   [2, 4, 5]               [6, 9, 10]

⊗        (Class, Y)              (Class, N)
(C, 0)   [1, 2, 7]               [9, 10]
(C, 1)   [4, 5, 8, 12, 13]       [3, 6, 11]

Figure 2: Calculation process of the combination granules

The proportion matrices and important degrees of the attribute values are:

A: [6/13 2/13; 2/13 3/13],  P_A = [0.7906, 0.7211]
B: [5/13 2/13; 3/13 3/13],  P_B = [0.7693, 0.7071]
C: [3/13 2/13; 5/13 3/13],  P_C = [0.7606, 0.7289]

Figure 3: Calculation process of the proportion matrix and important degree of attribute value

Step 2.2: Rank the important degrees of attribute values from maximum to minimum.

Figure 4: The first layer of the decision tree (nodes ordered by important degree of attribute value: A.R, C.0, B.1, C.1, A.S, B.2)

Step 3: The method is the same as Step 2. For example, to construct the next layer under A.R:

⊗           (Class, Y)        (Class, N)
A.R ⊗ B.1   [1, 12, 13]       [ ]
A.R ⊗ B.2   [2, 4, 5]         [9, 10]

⊗           (Class, Y)        (Class, N)
A.R ⊗ C.0   [1, 2]            [9, 10]
A.R ⊗ C.1   [4, 5, 12, 13]    [3, 6, 11]

P1 = [3/8 0; 3/8 2/8],  giving [1, 0.7211]
P2 = [2/11 2/11; 4/11 3/11],  giving [0.7071, 0.7143]

Figure 5: Calculation process for the second layer of the decision tree
Figure 6: The second layer of the decision tree
Figure 7: An example of the decision tree

5. Data test

Following the test plan in Tao (2012), this paper compares the accuracy of the proposed algorithm with that of ID3. Test plan: divide the "Nursery" data set randomly into 8 shares; one share is used as the training set and the remaining seven as test sets. Divide the "Car Evaluation" data set randomly into 4 shares; one share is used as the training set and the remaining three as test sets. A code sketch of this share-based evaluation is given at the end of this section.

Table 2: The two test data sets

Data set        Instances  Training set  Test set  Condition attributes  Decision classes
Nursery         12960      1620          11340     8                     5
Car Evaluation  1728       432           1296      6                     4

Experimental results are shown in Table 3.

Table 3: Accuracy on the test sets

Algorithm            Nursery  Car Evaluation
ID3                  78.90%   76.62%
The proposed method  88.83%   74.42%

The average test-set accuracy is taken as the test result and compared with the ID3 algorithm. For the "Nursery" data set, the classification accuracy of the proposed algorithm is higher than that of ID3; for the "Car Evaluation" data set, it is lower. The explanation is that the "Nursery" data set has more condition attributes and junior combinations, and the proposed algorithm depends heavily on rich condition attributes and junior combinations, so the "Nursery" data set achieves the better classification result. For the "Car Evaluation" data set, the sample is small and dividing it into more shares would be meaningless, so the proposed algorithm is not suitable for small data sets.
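For reference, here is a small Python sketch of the share-based test plan described above, assuming the data set is a list of rows whose last element is the class label and that each share is used once as the training set in turn (the rotation is one reading of the plan, not stated explicitly in the paper); train and predict are placeholders for any decision tree learner, such as ID3 or the proposed algorithm.

    import random

    def share_evaluation(rows, n_shares, train, predict, seed=0):
        # Randomly divide the data into n_shares shares.
        rows = list(rows)
        random.Random(seed).shuffle(rows)
        shares = [rows[i::n_shares] for i in range(n_shares)]
        accuracies = []
        for i, training_share in enumerate(shares):
            model = train(training_share)                  # one share used as the training set
            for j, test_share in enumerate(shares):
                if j == i:
                    continue                               # the remaining shares are the test sets
                hits = sum(predict(model, row[:-1]) == row[-1] for row in test_share)
                accuracies.append(hits / len(test_share))
        return sum(accuracies) / len(accuracies)           # average test-set accuracy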
6. Conclusion

The proposed algorithm performs classification from the granular computing perspective and is easy to apply because it relies on simple mathematical operations. It is suited to multi-attribute decision-making problems and to problems with many condition values. Still, there is much room for improvement.

Acknowledgments

This research is supported by the National Natural Science Foundation of China under grants No. 81260578 and No. 81460769.

References

Chen J.J., Su S.B., Xu H.L., 2011, Decision tree optimization algorithm based on multiscale rough set, Journal of Computer Applications, 31(12), 3243-3246, DOI: 10.3724/SP.J.1087.2011.03243.
Ding C.R., Li L.S., 2010, Application of the variable precision rough set model in building decision trees, Computer Engineering & Science, 32(7), 88-125, DOI: 10.3969/j.issn.1007-130X.2010.07.024.
Feng S.R., Xiao W.J., 2009, Improved decision tree algorithm based on samples selection, Journal of Southwest Jiaotong University, 44(5), 643-647, DOI: 10.3969/j.issn.0258-2724.2009.05.003.
Han J.C., Lin T.Y., 2010, Granular computing: models and applications, International Journal of Intelligent Systems, 25, 111-117, DOI: 10.1002/int.20390.
Hullermeier E., 2011, Fuzzy sets in machine learning and data mining, Applied Soft Computing Journal, 11(2), 1493-1505, DOI: 10.1016/j.asoc.2008.01.004.
Jin C.X., Li F.C., Li Y., 2014, A generalized fuzzy ID3 algorithm using generalized information entropy, Knowledge-Based Systems, 64, 13-21, DOI: 10.1016/j.knosys.2014.03.014.
Jin R., Kou C.H., Liu R.J., 2014, Biclustering algorithm of differential co-expression for gene data, Review of Computer Engineer Studies, 1(1), 7-14.
Qiu T.R., Liu Q., Huang H.K., 2009, A granular computing approach to knowledge discovery in relational databases, Acta Automatica Sinica, 35(8), 1071-1079, DOI: 10.3724/SP.J.1004.2009.01071.
Quinlan J.R., 1986, Induction of decision trees, Machine Learning, 1(1), 81-106, DOI: 10.1023/A:1022643204877.
Srinivasan V., Rajenderan V., Kuzhali J.V., Aruna M., 2013, Fuzzy fast classification algorithm with hybrid of ID3 and SVM, Journal of Intelligent & Fuzzy Systems, 24, 555-561, DOI: 10.3233/IFS-2012-0574.
Tao D.Q., Ma L.L., Peng C., 2012, Decision tree algorithm based on classification matrix, Computer Engineering and Design, 33(6), 2309-2313.
Yu J.P., Huang X.M., Li K.S., 2012, Improved ID3 algorithm based on new attributes selection criterion, Computer Engineering and Design, 29(8), 2895-2908, DOI: 10.3969/j.issn.1001-3695.2012.08.024.
Zhou L., Yan L., 2010, Decision tree algorithm using archetype abstraction and attribute classification value, Application Research of Computers, 27(8), 2899-2901, DOI: 10.3969/j.issn.1001-3695.2010.08.023.
Zhang Q.H., Xing Y.K., Zhou Y.L., 2011, The incremental knowledge acquisition algorithm based on granular computing, Journal of Electronics & Information Technology, 33(2), 435-441, DOI: 10.3724/SP.J.1146.2010.00217.