INTERNATIONAL JOURNAL OF COMPUTERS COMMUNICATIONS & CONTROL
ISSN 1841-9836, 13(3), 383-390, June 2018.

Reduction of Conditional Factors in Causal Analysis

H. Liu, I. Dzitac, S. Guo

Haitao Liu*, Sicong Guo
1. Institute of Intelligence Engineering and Mathematics, Liaoning Technical University, Fuxin 123000, China
2. College of Science, Liaoning Technical University, Fuxin 123000, China
*Corresponding author: liuhaitao@lntu.edu.cn

Ioan Dzitac
1. Aurel Vlaicu University of Arad, 310330 Arad, Elena Dragoi, 2, Romania, ioan.dzitac@uav.ro
2. Agora University of Oradea, 410526 Oradea, P-ta Tineretului 8, Romania, idzitac@univagora.ro

Abstract: Faced with the great number of conditional factors in big-data causal analysis, the reduction algorithm put forward in this paper can reasonably reduce their number. Compared with previous reduction methods, we take into consideration not only the influence of the conditional factors on the result factor, but also the relationships among the conditional factors themselves. The basic idea of the proposed algorithm is to build the matrix of mutual deterministic degrees between conditional factors. If a conditional factor f has a high deterministic degree with respect to another conditional factor h, we delete h; if, instead, h has the higher deterministic degree with respect to f, we delete f. With this reduction, we ensure that the conditional factors participating in causal analysis are as irrelevant to each other as possible, which is a reasonable requirement for causal analysis.
Keywords: factor space, causal analysis, reduction of factors, fuzzy logic.

1 Introduction

Causal analysis in factor space [18] was proposed in [22]; it extracts causal rules from the background distribution between a group of factors. This is the original methodology provided by factor space theory for machine learning, classification, decision-making, and so on.
The paper [13] applies those causal rules to causal reasoning, and [17] improves the inductive algorithm introduced in [22]. The paper [1] puts forward the slip-differential algorithm, improving the precision of causal reasoning. The paper [15] gives rule extraction with respect to multi-result factors, which connects to multi-label learning theory [5]. The paper [2] presents a reasonable statement on logistic regression based on fuzzy sets and factor space theory. The paper [14] introduces the historical background of factor space and its relationship with formal concept analysis [6]. Many theoretical papers about factor space can be found in the references [3, 4, 7-12, 16, 19-21, 23, 24]. All this lays a complete foundation for the unified depiction of causal induction and reasoning in artificial intelligence. However, in the face of big data, the number of factors to be processed by causal analysis is surprisingly large. In this paper we discuss how to simplify and merge this large number of conditional factors.

The idea of [22] is that the factor with the strongest influence on the result factor is used first. Using it, we obtain a causal rule and delete some data. Repeating the process, when all the data have been deleted, all unused conditional factors are reduced. This reduction method is determined solely by the deterministic degrees of the conditional factors with respect to the result factor. This paper supplements that idea of reduction: not only do we consider the influence of the conditional factors on the result factor, but the relationships between the conditional factors themselves are also taken into consideration. The deterministic degrees of a conditional factor with respect to the other conditional factors should be considered.
The conditional factors are reduced or merged according to their degrees of mutual determination, and the result is a set of conditional factors that are as unrelated to each other as possible, which is the best condition for causal analysis.

The structure of this paper is as follows: Section 2 introduces the mutual relationship between conditional factors, and Section 3 introduces the reduction algorithm for conditional factors. Section 4 is a short conclusion. This paper presents a mathematical method without real-world applications.

2 Mutual relationship between conditional factors

A factor is a quality root: each factor commands a string of attributes. For example, color is a factor, which commands red, orange, yellow, green, cyan, blue, purple, and so on. Mathematically, a factor is defined as a mapping f : U → X(f). If f is color and U is a group of cars, then X(f) = {red, orange, yellow, green, cyan, blue, purple}, which draws our attention from the group of cars to their colors. X(f) is called the state space of the factor f. When the states are described by natural-language words, they are called qualitative states; of course, a factor can also have a quantitative state space, in which case it reduces to a variable. The factor is thus a generalization of the variable. A factor f is regular if there are at least two objects u and v such that f(u) ≠ f(v).

Considering a set of basic factors F* = {f1, ..., fn}, we can define a synthetic factor from any subset {f(1), ..., f(k)} of F* with the state space X = X(f(1)) × ... × X(f(k)) (× stands for Cartesian product). Denote the synthetic factor as f = f(1) ∪ ... ∪ f(k). It is easy to prove that P(F*) = (P(F*), ∪, ∩, c) forms a factorial Boolean algebra, where the operations ∪ and ∩ are called the synthesis and separation of factors, respectively. Denote XF* = {X(f)} (f ∈ P(F*)); then φ = (U, XF*) is called the factor space defined on U.

A factor f defines an equivalence relation ∼ on the domain U: for any u, v ∈ U, u ∼ v if and only if f(u) = f(v).
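As a concrete illustration of these definitions, the following Python sketch (all names here are ours, not from the paper) models factors as plain functions on a small universe of cars; the synthetic factor takes its states in the Cartesian product of the component state spaces, and each factor induces the equivalence relation u ∼ v iff f(u) = f(v):

```python
# Illustrative sketch (names are ours): a factor is a mapping f: U -> X(f).

def state_space(U, f):
    """X(f) restricted to U: the set of states the factor actually takes."""
    return {f(u) for u in U}

def synthesize(f, h):
    """Synthetic factor f ∪ h: its states live in X(f) × X(h)."""
    return lambda u: (f(u), h(u))

def equivalence_classes(U, f):
    """Classes of the relation u ~ v iff f(u) = f(v), in order of first appearance."""
    classes = {}
    for u in U:
        classes.setdefault(f(u), []).append(u)
    return list(classes.values())

# A small universe of cars described by "color-size" strings.
U = ["red-big", "red-small", "blue-big"]
color = lambda u: u.split("-")[0]
size = lambda u: u.split("-")[1]

print(state_space(U, color))                    # a two-element set of colors
print(equivalence_classes(U, color))            # [['red-big', 'red-small'], ['blue-big']]
print(len(state_space(U, synthesize(color, size))))  # 3 combined states
```

Note that the synthetic factor separates the two red cars that color alone cannot distinguish, which is exactly why synthesis yields a more specific division.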
Denote the subclass of U containing u as [u]f = {v ∈ U | f(v) = f(u)}. We call H(f, U) = {[u]f | u ∈ U} the division of U by f. We say that f is more specific than h, denoted H(f, U) > H(h, U), if for any u there is a v in U such that [v]f ⊆ [u]h, and for any u there is a v in U such that [u]f ⊆ [v]h. It is obvious that H(f, U) > H(h, U) if and only if H(f ∪ h, U) = H(f, U); in this case, for any a ∈ X(h) there are a1, ..., at ∈ X(f) such that [a]h = [a1]f + ... + [at]f. We say that f and h are equivalent if H(f, U) > H(h, U) and H(h, U) > H(f, U).

Suppose that the numbers of subclasses in the divisions of f and h are s and t respectively, with s, t > 1. If H(f ∪ h, U) is the roughest common division more specific than both H(f, U) and H(h, U), then we say that f and h are independent in division. In this case every class of f meets every class of h: for any b ∈ X(h), with a1, ..., as ∈ X(f) the s states of f, [b]h = ([b]h ∩ [a1]f) + ... + ([b]h ∩ [as]f) with all terms non-empty; and for any a ∈ X(f), with b1, ..., bt ∈ X(h) the t states of h, [a]f = ([a]f ∩ [b1]h) + ... + ([a]f ∩ [bt]h) with all terms non-empty.

Given a factor space φ = (U, XF*), select f1, ..., fk and g from XF*, called a set of conditional factors and a result factor respectively, and extract m objects from U to form a sample domain U′; we then obtain the combined state data of these objects with respect to the k + 1 factors. Causal analysis aims to extract causal rules from the conditions to the result based on the sample distribution over U′. One of the key concepts is the deterministic degree of a factor fi with respect to g.

Definition 1. (Wang 2015) If there is an object u ∈ U′ such that [u]fi ⊆ [u]g, then we say that [u]fi is a deterministic class of fi with respect to g. The ratio d of the number of objects in all deterministic classes of fi with respect to g to the number of objects in U′ is called the determination degree of fi with respect to g. When fi(u) = fi(v), we have [u]fi = [v]fi.
To avoid repetition, denote [u]fi = [v]fi = [a]fi; then we have

d(fi, g) = Σ { |[a]fi| : [a]fi is a deterministic class of fi with respect to g } / m    (1)

where |A| stands for the number of elements in A.

In this section, we consider the deterministic degree d(f, h) of a conditional factor f with respect to another conditional factor h. The whole theory is applied on a sample U′ ⊆ U.

Theorem 2. Let f, h be two conditional factors on the sample U′. Factor f is more specific than h on U′ if and only if d(f, h) = 1.

Proof: Suppose that f is more specific than h on U′. For any u ∈ U′ there is v ∈ U′ such that [v]f ⊆ [u]h, which means that [v]f is a deterministic class of f with respect to h. Therefore all elements of U′ are covered by deterministic classes of f with respect to h, and hence d(f, h) = 1. Conversely, suppose that d(f, h) = 1. For any a ∈ X(h), let [a] be the subclass that has state a under h; there must be an element u ∈ U′ such that h(u) = a, and then [a] = [u]h. Since d(f, h) = 1, we have [u]f ⊆ [u]h, i.e., [u]f ⊆ [a]. For any a ∈ X(f), let [a] be the subclass that has state a under f; there must be an element u ∈ U′ such that f(u) = a. Since d(f, h) = 1, we have [u]f ⊆ [u]h, i.e., [a] ⊆ [u]h. Therefore f is more specific than h. □

Theorem 3. If f is more specific than h on U′, and the two factors h and f are not equivalent, then d(h, f) < 1.

Proof: Suppose that d(h, f) = 1. According to Theorem 2, h is more specific than f, and then f and h are equivalent on U′. This is a contradiction. □

Theorem 4. If f is more specific than h on U′, then d(f, g) ≥ d(h, g).

Proof: Suppose that [a]h is a deterministic class of h with respect to g (a ∈ X(h)). Since f is more specific than h on U′, we have H(f ∪ h, U′) = H(f, U′), and then [a]h = [a1]f + ... + [at]f, where [a1]f, ..., [at]f are all deterministic classes of f with respect to g on U′. It is then obvious that d(f, g) ≥ d(h, g). □

Theorem 5.
If f and h are two regular factors mutually independent in division on U′, then d(f, h) = d(h, f) = 0.

Proof: Since f and h are mutually independent in division on U′, every class of f meets every class of h: for any b ∈ X(h), [b]h = ([b]h ∩ [a1]f) + ... + ([b]h ∩ [as]f) with all terms non-empty, and for any a ∈ X(f), [a]f = ([a]f ∩ [b1]h) + ... + ([a]f ∩ [bt]h) with all terms non-empty. Because s, t > 1, this ensures that [a]f \ [b]h ≠ ∅ and [b]h \ [a]f ≠ ∅ for any a ∈ X(f) and b ∈ X(h), so no class of either factor is contained in a class of the other. Hence d(f, h) = d(h, f) = 0. □

There are three kinds of relationship between two conditional factors:

1. d(f, h) is rather large and d(h, f) is rather small;
2. d(f, h) and d(h, f) are both rather large;
3. d(f, h) and d(h, f) are both rather small.

According to the statements above: in case 1, if d(f, g) is large, then we do not need the factor h once f takes part with respect to the result g; in case 2, the factors f and h are closely related to each other, so they are not suitable as conditional factors simultaneously and a reduction is needed; in case 3, the factors f and h are rather independent, so neither needs to be deleted provided they have an important influence on the result factor.
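The determination degree of Definition 1, applied here between two conditional factors, can be computed directly from Eq. (1). The following minimal Python sketch (function names are ours) also exhibits the behavior claimed in Theorems 2 and 3 on a toy sample where f refines h:

```python
# Sketch of Eq. (1) (names are ours): d(f, h) is the fraction of the sample
# covered by classes of f that lie inside some class of h.

def partition(U, f):
    """H(f, U): classes [a]_f keyed by the state a = f(u)."""
    part = {}
    for u in U:
        part.setdefault(f(u), set()).add(u)
    return part

def det_degree(U, f, h):
    """Determination degree d(f, h) on the sample U, per Eq. (1)."""
    h_classes = partition(U, h).values()
    covered = sum(len(cls) for cls in partition(U, f).values()
                  if any(cls <= hc for hc in h_classes))  # [a]_f deterministic w.r.t. h
    return covered / len(U)

# f is more specific than h below, so d(f, h) = 1 (Theorem 2)
# while d(h, f) < 1 (Theorem 3).
U = [1, 2, 3, 4]
f = lambda u: u        # four singleton classes
h = lambda u: u % 2    # two classes {1, 3} and {2, 4}
assert det_degree(U, f, h) == 1.0
assert det_degree(U, h, f) == 0.0
```

Evaluating det_degree on every ordered pair of conditional factors yields exactly the matrix of mutual deterministic degrees used in Step 2 of the algorithm in the next section.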
Table 1: Conditional factors

factor  name       state space
f1      Age        X(f1) = {Old, Middle, Young}
f2      Income     X(f2) = {High, Average, Low}
f3      Student    X(f3) = {Y, N}
f4      Credit     X(f4) = {Very-good, Good, Un-recorded}
f5      Education  X(f5) = {Primary, Junior, University, Graduated}
f6      Civil      X(f6) = {Civil, Private}
f7      Housing    X(f7) = {Rent, Narrow, Mansion}
f8      Car        X(f8) = {Car, Bike}
f9      Health     X(f9) = {Healthy, Sickness}
f10     Residence  X(f10) = {Town, Rural}

Table 2: Causal data

     u1 u2 u3 u4 u5 u6 u7 u8 u9 u10 u11 u12 u13 u14
f1   O  O  M  O  O  M  M  M  M  Y   Y   Y   Y   Y
f2   L  A  H  A  H  L  A  H  L  A   H   A   L   L
f3   N  N  N  N  N  N  N  N  N  Y   N   Y   Y   Y
f4   U  U  V  G  V  U  G  V  G  V   V   V   G   U
f5   P  J  G  U  U  J  U  J  P  G   U   U   U   G
f6   C  C  C  C  C  P  C  P  P  P   C   P   P   P
f7   N  N  M  M  M  N  N  M  N  N   M   R   R   R
f8   B  B  C  C  B  B  C  C  B  C   C   B   B   B
f9   H  S  H  H  S  H  S  S  H  H   H   S   H   S
f10  R  T  T  T  T  R  T  R  R  T   T   T   T   T
g    0  1  2  1  2  0  1  2  0  2   2   2   1   0

3 Reduction of conditional factors

Causal analysis aims to extract rules from the conditional factors to the result factor; the more independent the conditional factors, the better. The reduction of conditional factors obeys the following principle: for a pair of factors that both have high deterministic degrees with respect to the result factor g, delete one of them unless their mutual deterministic degrees are both small (i.e., case 3). This principle aims to take into causal analysis conditional factors that are as independent as possible.

Algorithm
Step 1. Rank the conditional factors according to their deterministic degrees with respect to the result factor g, from high to low.
Step 2. Write the matrix of deterministic degrees between conditional factors.
Step 3. For any i and j, if (d(fi, fj) > 0.5 or d(fj, fi) > 0.5) and d(fi, fj) > d(fj, fi), then delete fj.

The remaining factor sequence is the conditional sequence required by the reduction. If causal analysis is performed according to this sequence, it terminates when the causal tree is formed, and all unused conditional factors are deleted.

Example.
In customer analysis, the goal is to open the market. The utility factor is the purchasing power of the customer, and the form factors are the pieces of information about the customers. We take the form factors as the conditional factors and the utility factor as the result factor of the causal analysis. The conditional factors considered are listed in Table 1. Selecting 14 customers to form a sample universe U′ = {u1, u2, u3, u4, u5, u6, u7, u8, u9, u10, u11, u12, u13, u14}, Table 2 presents the causal data.

Table 3: Frequencies of the factors (deterministic degrees with respect to g, × 14)

f1  f2  f3  f4  f5  f6  f7  f8  f9  f10
0   4   0   6   2   0   0   0   0   0

Table 4: Matrix of mutual deterministic degrees between conditional factors (× 14; "-" marks the diagonal)

      f4  f2  f5  f1  f3  f6  f7  f8  f9  f10
f4    -   0   0   0   0   0   0   4   0   0
f2    4   -   0   0   5   0   4   9   0   5
f5    2   2   -   0   2   0   2   4   4   8
f1    0   0   0   -   9   4   0   0   0   5
f3    0   2   0   2   -   4   0   0   0   4
f6    0   0   0   0   0   -   0   0   0   0
f7    0   0   0   3   3   3   -   3   0   0
f8    0   0   0   0   0   0   0   -   0   0
f9    0   0   0   0   0   0   0   0   -   0
f10   0   0   0   0   4   0   0   0   0   -

The steps of the reduction of conditional factors are as follows:

Step 1. Reorder the conditional factors according to their deterministic degrees with respect to g. Remember that m = 14; for simplicity, we list all frequencies multiplied by 14. The results are given in Table 3. The new order is: f4, f2, f5, f1, f3, f6, f7, f8, f9, f10.

Step 2. The matrix of mutual deterministic degrees between conditional factors is listed in Table 4.

Step 3. For i = 2, j = 8: d(f2, f8) = 9/14 > 0.5 and d(f2, f8) > d(f8, f2) = 0, so delete f8. For i = 5, j = 10: d(f5, f10) = 8/14 > 0.5 and d(f5, f10) > d(f10, f5) = 0, so delete f10. For i = 1, j = 3: d(f1, f3) = 9/14 > 0.5 and d(f1, f3) > d(f3, f1) = 2/14, so delete f3.

After deleting these three conditional factors, the new causal analysis data are presented in Table 5. According to the causal analysis of [22], rule extraction by f4 gives:

Rule 1: If Credit is very good, then the purchasing power is #2.

Taking the customers with very good credit out of U′, Table 6 presents the reduced causal data.
Rule extraction by f4 and f2 gives:

Rule 2: If Credit is unrecorded and Income is low, then the purchasing power is #0.
Rule 3: If Credit is unrecorded and Income is average, then the purchasing power is #1.
Rule 4: If Credit is good and Income is average, then the purchasing power is #1.

Table 5: New causal data

     u1 u2 u3 u4 u5 u6 u7 u8 u9 u10 u11 u12 u13 u14
f4   U  U  V  G  V  U  G  V  G  V   V   V   G   U
f2   L  A  H  A  H  L  A  H  L  A   H   A   L   L
f5   P  J  G  U  U  J  U  J  P  G   U   U   U   G
f1   O  O  M  O  O  M  M  M  M  Y   Y   Y   Y   Y
f6   C  C  C  C  C  P  C  P  P  P   C   P   P   P
f7   N  N  M  M  M  N  N  M  N  N   M   R   R   R
f9   H  S  H  H  S  H  S  S  H  H   H   S   H   S
g    0  1  2  1  2  0  1  2  0  2   2   2   1   0

Table 6: New causal data (8 customers)

     u1 u2 u4 u6 u7 u9 u13 u14
f4   U  U  G  U  G  G  G   U
f2   L  A  A  L  A  L  L   L
f5   P  J  U  J  U  P  U   G
f1   O  O  O  M  M  M  Y   Y
f6   C  C  C  P  C  P  P   P
f7   N  N  M  N  N  N  R   R
f9   H  S  H  H  S  H  H   S
g    0  1  1  0  1  0  1   0

Taking the customers that have already contributed to rule extraction out of U′, the remaining causal data are given in Table 7.

Table 7: New causal data (2 customers)

     u9 u13
f4   G  G
f2   L  L
f5   P  U
f1   M  Y
f6   P  P
f7   N  R
f9   H  H
g    0  1

Rule extraction by f4, f2, and f5 gives:

Rule 5: If Credit is good, Income is low, and Education is University, then the purchasing power is #1.
Rule 6: If Credit is good, Income is low, and Education is Primary, then the purchasing power is #0.

Now the universe U′ is empty and the rule extraction has finished. Only three factors were used; all the others have been deleted. What is the relationship among these three factors?

d(f4, f2) = 0 < 0.5, d(f2, f4) = 4/14 < 0.5;
d(f4, f5) = 0 < 0.5, d(f5, f4) = 2/14 < 0.5;
d(f2, f5) = 0 < 0.5, d(f5, f2) = 2/14 < 0.5.

All the mutual deterministic degrees between them are small, which satisfies the requirement of causal analysis.

4 Conclusion

In the face of the challenge of big data, the number of conditional factors in causal analysis is very large, so the reduction of conditional factors is an important task.
The proposed reduction algorithm can reasonably reduce the number of conditional factors. Compared with previous reduction methods, we take into consideration not only the influence of the conditional factors on the result factor but also the relationships among the conditional factors themselves, by means of the mutual deterministic degrees between conditional factors. Such a reduction ensures that the conditional factors are selected as independently as possible; causal analysis requires such a selection, and this improvement is of great importance in practice.

Acknowledgement

The authors specially thank Professor P. Z. Wang for his guidance and modifications. This study was partially supported by grants (Nos. 61350003, 11401284, 70621001, 70531040) from the Natural Science Foundation of China, and by grant No. L2014133 from the Department of Education of Liaoning Province.

Bibliography

[1] Bao, Y.K.; Ru, H.Y.; Jin, S.J. (2014); A new algorithm of knowledge mining in factor space, Journal of Liaoning Technical University (Natural Science), 33(8), 1141-1144, 2014.
[2] Cheng, Q.F.; Wang, T.T.; Guo, S.C.; Zhang, D.Y.; Jing, K.; Feng, L.; Wang, P.Z. (2017); The Logistic Regression from the Viewpoint of the Factor Space Theory, International Journal of Computers Communications & Control, 12(4), 492-502, 2017.
[3] Dzitac, I. (2015); The Fuzzification of Classical Structures: A General View, International Journal of Computers Communications & Control, 10(6), 772-788, 2015.
[4] Dzitac, I.; Filip, F.G.; Manolescu, M.J. (2017); Fuzzy Logic Is Not Fuzzy: World-renowned Computer Scientist Lotfi A. Zadeh, International Journal of Computers Communications & Control, 12(6), 748-789, 2017.
[5] Furnkranz, J.; Hullermeier, E.; Mencia, E.L.; Brinker, K. (2008); Multilabel classification via calibrated label ranking, Machine Learning, 73(2), 133-153, 2008.
[6] Ganter, B.; Wille, R.
(1996); Formal concept analysis, Wissenschaftliche Zeitschrift - Technische Universität Dresden, 45, 8-13, 1996.
[7] Kandel, A.; Peng, X.T.; Cao, Z.Q.; Wang, P.Z. (1990); Representation of concepts by factor spaces, Cybernetics and Systems: An International Journal, 21(1), 43-57, 1990.
[8] Li, H.X.; Wang, P.Z.; Yen, V.C. (1998); Factor spaces theory and its applications to fuzzy information processing (I): The basics of factor spaces, Fuzzy Sets and Systems, 95(2), 147-160, 1998.
[9] Li, H.X.; Yen, V.C.; Lee, E.S. (2000); Factor space theory in fuzzy information processing: Composition of states of factors and multifactorial decision making, Computers & Mathematics with Applications, 39(1), 245-265, 2000.
[10] Li, H.X.; Yen, V.C.; Lee, E.S. (2000); Models of neurons based on factor space, Computers & Mathematics with Applications, 39(12), 91-100, 2000.
[11] Li, H.X.; Chen, C.P.; Yen, V.C.; Lee, E.S. (2000); Factor spaces theory and its applications to fuzzy information processing: Two kinds of factor space canes, Computers & Mathematics with Applications, 40(6-7), 835-843, 2000.
[12] Li, H.X.; Chen, C.P.; Lee, E.S. (2000); Factor space theory and fuzzy information processing: Fuzzy decision making based on the concepts of feedback extension, Computers & Mathematics with Applications, 40(6-7), 845-864, 2000.
[13] Liu, H.T.; Guo, S.C. (2015); Inference model of causality analysis, Journal of Liaoning Technical University (Natural Science), 34(1), 124-128, 2015.
[14] Liu, H.T.; Dzitac, I.; Guo, S.C. (2018); Reduction of conditional factors in causal analysis, International Journal of Computers Communications & Control, 13(1), 83-98, 2018.
[15] Qu, W.H.; Liu, H.T.; Guo, S.Z. (2017); Multi-target causality analysis in factor space, Fuzzy Systems & Mathematics, 31(6), 74-81, 2017.
[16] Vesselenyi, T.; Dzitac, I.; Dzitac, S.; Vaida, V.
(2008); Surface roughness image analysis using quasi-fractal characteristics and fuzzy clustering methods, International Journal of Computers Communications & Control, 3(3), 304-316, 2008.
[17] Wang, H.D.; Wang, P.Z.; Shi, Y.; Liu, H.T. (2014); Improved factorial analysis algorithm in factor spaces, International Conference on Informatics, 201-204, 2014.
[18] Wang, P.Z.; Sugeno, M. (1982); The factor fields and background structure for fuzzy subsets, Fuzzy Mathematics, 2(2), 45-54, 1982.
[19] Wang, P.Z. (1990); A factor spaces approach to knowledge representation, Fuzzy Sets and Systems, 36(1), 113-124, 1990.
[20] Wang, P.Z.; Zhang, X.H.; Lui, H.C.; Zhang, H.M.; Xu, W. (1995); Mathematical theory of truth-valued flow inference, Fuzzy Sets and Systems, 72(2), 221-238, 1995.
[21] Wang, P.Z.; Jiang, A. (2002); Rules detecting and rules-data mutual enhancement based on factors space theory, International Journal of Information Technology & Decision Making, 1(1), 73-90, 2002.
[22] Wang, P.Z.; Guo, S.C.; Bao, Y.K.; Liu, H.T. (2014); Causality analysis in factor spaces, Journal of Liaoning Technical University (Natural Science), 33(7), 1-6, 2014.
[23] Yuan, X.H.; Wang, P.Z.; Lee, E.S. (1992); Factor space and its algebraic representation theory, Journal of Mathematical Analysis and Applications, 171(1), 256-276, 1992.
[24] Yuan, X.H.; Wang, P.Z.; Lee, E.S. (1994); Factor Rattans, Category FR(Y), and Factor Space, Journal of Mathematical Analysis and Applications, 186(1), 254-264, 1994.