INTERNATIONAL JOURNAL OF COMPUTERS COMMUNICATIONS & CONTROL
ISSN 1841-9836, 12(4), 492-502, August 2017.

The Logistic Regression from the Viewpoint of the Factor Space Theory

Q.F. Cheng, T.T. Wang, S.C. Guo, D.Y. Zhang, K. Jing, L. Feng, Z.F. Zhao, P.Z. Wang

Qifeng Cheng: Research and Development Center, China Academy of Launch Vehicle Technology, Fengtai District, Beijing, 100076, P. R. China
Tiantian Wang: School of Science, Liaoning Technical University, Fuxin, 123000, P. R. China
Sicong Guo, Peizhuang Wang: Institute of Intelligent Engineering and Mathematics, Liaoning Technical University, Fuxin, 123000, P. R. China
Dayi Zhang*, Kai Jing, Liang Feng, Zhifeng Zhao: General Hospital of Coal Mining Industry Group Fuxin, Fuxin, 123000, P. R. China
*Corresponding author: 18242875460@163.com

Abstract: Logistic regression plays an important role in machine learning. People use it enthusiastically in concept matching, yet some of its details remain to be understood. This paper aims to present a reasoned account of logistic regression based on fuzzy sets and the factor space theory. An example on breast cancer diagnosis is displayed to show how the factor space theory can be incorporated into the understanding and use of logistic regression.
Keywords: logistic regression, factor space theory, fuzzy sets, logistic membership function

1 Introduction

In 1965, Zadeh put forward the concept of a fuzzy subset, which generalizes the ordinary subset by relaxing the range of membership grades from the binary set {0,1} to the continuous interval [0,1] [23]. Today, as we face the tide of big data, the essential meaning of fuzzy sets still shines through this original definition. Along the way, the factor space theory has been established to provide a deeper mathematical foundation for artificial intelligence [13-15].
The factor space theory builds a bridge connecting certainty and uncertainty: the two sides can be transformed into each other by changing the dimension of the related factor space [16]. The falling shadow theory [17], built on factor space, was developed to compare fuzziness with randomness. Both randomness and fuzziness are caused by a lack of factors. Randomness is a kind of uncertainty caused by the lack of conditional factors for prediction. Randomness breaks the law of causality, while probability theory replaces it with a generalized causality law: even though insufficient conditions cannot determine whether an event will occur, they determine the occurrence of the event with a certain probability. Fuzziness is another kind of uncertainty, caused by the lack of identifying factors for recognition. Fuzziness breaks the law of the excluded middle, while fuzzy theory replaces it with a generalized law of the excluded middle: even though insufficient identifying factors cannot determine whether an object conforms to a concept, they identify the object with the concept to a certain membership degree. An important relationship between fuzziness and randomness emerges from this similarity: a fuzziness phenomenon on the ground, the universe U, can be described as a randomness phenomenon in the sky, the power set of U [22]. Subjective reliability is a non-additive measure, since it is the fall of probability from the sky to the ground; this is the core idea of the falling shadow theory.

Logistic regression analysis (LRA) is a simple but effective method widely used in classification; it extends the techniques of multiple regression analysis to research situations in which the outcome variable is categorical [3]. In stratification research, demographic research and social medicine, the use of logistic regression is routine [8].

Copyright © 2006-2017 by CCC Publications
Some related work has been done on logistic regression. A forward stepwise logistic regression analysis showed that peer ratings of collaboration can predict whether a student belongs to the Learning Problem group or the Not Learning Problem group [10]. Split-sample, cross-validation and bootstrapping methods for estimating the internal validity of a logistic regression model have been compared, ending with a recommendation for bootstrapping [12]. The number of events per variable (EPV) in LRA influences the validity of the model, and a low EPV turns out to play a major role [9]. Yet these works focus only on the strategic level of logistic regression; the deeper understanding hidden behind this common model remains to be mined. In artificial intelligence, regressed curves can be regarded as membership functions, and the choice of the type of regressed curve partially decides the quality of the curve fitting. Logistic regression is fitted by the logistic membership function. In this article, we expand the discussion of logistic regression based on the factor space theory and fuzzy sets, which presents it in a relatively different light.

The paper proceeds as follows. We introduce the types of membership functions in Section 2. The logistic membership function is put forward in Section 3, and logistic regression in the risk attribute factor space is discussed further in Section 3.1, where an algorithm is proposed accordingly. In Section 4, an example is used to verify the effectiveness of the proposed algorithm. Finally, a brief conclusion is drawn in Section 5.

2 Types of membership functions

Definition 1. A fuzzy subset A defined on a universe of discourse U is a mapping µ_A: U → [0,1]; µ_A(u) is called the membership degree of u with respect to A [23].
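Definition 1 can be illustrated with a minimal sketch. The concept "tall" and its interim endpoints (150 cm and 190 cm) are assumptions chosen purely for illustration, not taken from the paper.

```python
# A fuzzy subset per Definition 1: a mapping mu_A from the universe U
# (here, heights in cm) into [0, 1]. The concept "tall" and the
# thresholds 150 and 190 are illustrative assumptions.
def mu_tall(height_cm: float) -> float:
    """Membership degree of a height in the fuzzy subset 'tall'."""
    if height_cm <= 150.0:
        return 0.0          # clearly not tall
    if height_cm >= 190.0:
        return 1.0          # clearly tall
    # linear transition between the two thresholds
    return (height_cm - 150.0) / 40.0

print(mu_tall(170.0))  # 0.5: halfway along the transition
```

Unlike an ordinary (crisp) subset, which would answer only 0 or 1, the mapping grades intermediate heights continuously.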
From Definition 1, two fundamental meanings are revealed. Firstly, a fuzzy subset stands for the extension of a fuzzy concept; this is a milestone of intelligence mathematics. Secondly, a fuzzy subset builds a bridge over the gap between quantitative and qualitative phenomena, which was the main bottleneck of the information revolution. The falling shadow theory provides a deeper mathematical foundation for this revolutionary change [5,6,18]. Definitions 2, 3 and 4 show how the falling shadow of a random subset is generated.

Definition 2. Let (U, B) be a measurable space. (P(U), B̄) is called a super-measurable space defined on (U, B) if (P(U), B̄) is a measurable space.

Definition 3. Given a probability field (Ω, F, p) and a super-measurable space (P(U), B̄) on (U, B), a mapping ξ: Ω → P(U) is called a random set if ξ⁻¹(A) = {ω ∈ Ω | ξ(ω) ∈ A} ∈ F whenever A ∈ B̄.

A probability distribution can be induced on B̄ from p on F through the mapping ξ: P(A) = p(ξ⁻¹(A)), A ∈ B̄, which makes (P(U), B̄) a super probability field [19].

Definition 4. Given a super probability field (P(U), B̄) and a random set ξ: Ω → P(U), we call a fuzzy subset A_ξ on U the falling shadow of ξ if µ_{A_ξ}(u) = P({ω | u ∈ ξ(ω)}) for all u ∈ U.

Definition 4 reveals that µ_{A_ξ} can be viewed as the covering function of the random set ξ on U. The random set is called the clouds, and the fuzzy set is called the falling shadow of the clouds: the thicker the cloud, the darker the shadow it casts [19]. Meanwhile, according to Definition 1, µ_{A_ξ} is also the membership function of the fuzzy subset A_ξ.

Definition 5. A membership function µ_{A_ξ} is also called the possibility distribution of the concept A_ξ on U. Possibility differs from probability.
According to Definition 5, possibility stands for the covering chance of ξ over u and does not hold exclusiveness, while probability holds exclusiveness and stands for the chance of monopolization [20].

Theorem 1. Let U = (−∞, +∞) be the one-dimensional state space. Given a random interval r with falling shadow µ_A, let ζ be the left extreme point of the random interval r. Then ζ is a random variable defined on (U, B), and the possibility distribution of the concept A is the same as the distribution function F(u) of ζ:

µ_A(u) = F(u) = P(ζ ≤ u).

Proof: Given a probability field (Ω, F, p), a measurable space (U, B) and a super-measurable space (P(U), B̄). Since r is a random set, r⁻¹(A) = {ω ∈ Ω | r(ω) ∈ A} ∈ F whenever A ∈ B̄. Because ζ is the left extreme point of r, for each ω ∈ Ω there exists β ∈ B such that r(ω) ∈ A ⟹ ζ(ω) ∈ β. Since ζ(ω) is a real number on the interval (−∞, +∞), according to the definition of a random variable [21], ζ is a random variable, and

µ_A(u) = P({ω | u ∈ r(ω)}) = P({ω | ζ(ω) ≤ u}) = P(ζ ≤ u). □

From Theorem 1 we can distinguish the possibility distribution from the probability distribution, and combine the two terminologies by means of the density function and the distribution function respectively. The membership function µ_A determines the extent to which u belongs to the concept A. For one concept, different types of membership functions exhibit different membership degrees at a given point u, so it is necessary to clarify which type of membership function should be chosen.

Definition 6. Let A be a fuzzy subset defined on a universe U with membership function µ_A. If µ_A(x) > min{µ_A(a), µ_A(b)} for any a < x < b, then A is a convex fuzzy subset [4].

A convex fuzzy set divides the real line into five parts with four points l−, l+, u−, u+ (l− ≤ l+ ≤ u− ≤ u+): k = (−∞, l−], l = (l−, l+], t = (l+, u−], u = (u−, u+], v = (u+, +∞).
µ_A ≡ 0 when x ∈ k or x ∈ v, and µ_A ≡ 1 when x ∈ t. (l−, µ_A(l−)) and (u+, µ_A(u+)) are the inflection points.

Definition 7. We call |l| and |u| the lower and upper interim lengths respectively.

Even though the shape of a membership function can vary in countless ways, the essential variation is concentrated on the two interim periods. The length of an interim reflects the degree of fuzziness of a membership function: the narrower the interim, the more precise the representation of the concept. For simplicity, we only discuss the membership function on the left fuzzy interval l, formed by the distribution of the left extreme point ζ on that interval. There are three common types of probability density functions for ζ: the uniform type, the cosine type and the normal type. Due to space limitations, only the uniform distribution of ζ is displayed here to show how the membership function is generated. If ζ is uniformly distributed on the fuzzy segment (l−, l+], its probability density function is

f₁(x) = 1/(l+ − l−), x ∈ (l−, l+].

According to Theorem 1, the membership curve of the concept on this interval is

µ₁(x) = P₁(ζ ≤ x) = (x − l−)/(l+ − l−).   (1)

It is a straight line rising from 0 to 1 on the fuzzy segment (l−, l+].

Curve fitting is the foundation of optimization, and the quality of the fit depends on whether the chosen curve is appropriate. In this article we introduce another type of membership function, one frequently used in classification but lacking a deeper understanding, and we expand the discussion of it from the viewpoint of factor space.

3 Logistic membership function

Let γ be a random variable defined on U indicating the attribute x, and let y be the indicator variable of a concept α with extension A defined on U, which takes the value 1 when u ∈ A and 0 otherwise. Denote P_x = P{y = 1 | γ = x}, which is the possibility distribution of the concept A with respect to the variable x. To estimate the possibility, we use the maximum likelihood principle.
Consider a series of sampling points (x₁, y₁), ..., (x_m, y_m). The likelihood function is

L = ∏_{i=1}^{m} P_{x_i}^{y_i} (1 − P_{x_i})^{1−y_i}.   (2)

Since it is not easy to differentiate this product, take the logarithm to obtain the log-likelihood function:

L = ln ∏_{i=1}^{m} P_{x_i}^{y_i} (1 − P_{x_i})^{1−y_i}
  = Σ_{i=1}^{m} ( y_i ln P_{x_i} + (1 − y_i) ln(1 − P_{x_i}) )
  = Σ_{i=1}^{m} y_i ( ln P_{x_i} − ln(1 − P_{x_i}) ) + Σ_{i=1}^{m} ln(1 − P_{x_i})
  = Σ_{i=1}^{m} y_i ln( P_{x_i}/(1 − P_{x_i}) ) + Σ_{i=1}^{m} ln(1 − P_{x_i}).   (3)

Since ln(P_x/(1 − P_x)) varies on the interval (−∞, +∞) and is related to the attribute x, we set ln(P_x/(1 − P_x)) = ax + b, so that P_x = 1/(1 + e^{−(ax+b)}). Equation (3) then becomes

L = Σ_{i=1}^{m} y_i (a x_i + b) − Σ_{i=1}^{m} ln(1 + e^{a x_i + b}).   (4)

Maximizing L yields a and b. This is an optimization problem, and strategies for solving this type of problem, such as gradient descent [?], can be applied.

Definition 8. The logistic regression function is defined as

φ(x; a, b) = 1/(1 + e^{−(aᵀx + b)}), x, a ∈ Rⁿ, aᵀx = a₁x₁ + ... + a_n x_n.

When n = 1, φ(x; a, b) = 1/(1 + e^{−(ax+b)}) and φ(x; 1, 0) = 1/(1 + e^{−x}).

Clearly this method of estimating a and b is somewhat complex, while simplicity is what we ultimately pursue. Another method, which transforms the problem into a linear one, will be introduced in the next section.

Since P_A(x) = 1/(1 + e^{−(ax+b)}) is centrally symmetric about the point (−b/a, 1/2), we get

P_{¬A}(x) = P_A(−2b/a − x) = 1/(1 + e^{−(a(−2b/a−x)+b)}) = 1/(1 + e^{ax+b}),

and it is easy to see that

P_A(x) + P_{¬A}(x) = 1/(1 + e^{−(ax+b)}) + 1/(1 + e^{ax+b}) = 1.

With this property, the logistic regression function P(x) = 1/(1 + e^{−(ax+b)}) can be seen as the logistic membership function. The same property holds for the other three types of membership functions referred to in Section 2; the details are omitted here.

Definition 9. We call P(x) = 1/(1 + e^{−a(x−x̄)}), where x̄ is the center of the interim, the logistic interim function.
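The maximization of Equation (4) can be sketched with a plain gradient-ascent loop, since the gradient of L is ∂L/∂a = Σ x_i(y_i − P_{x_i}) and ∂L/∂b = Σ (y_i − P_{x_i}). The toy one-dimensional data, learning rate and step count below are assumptions made for illustration.

```python
import math

# Gradient-ascent sketch for Equation (4):
# L(a, b) = sum_i y_i*(a*x_i + b) - sum_i ln(1 + exp(a*x_i + b)).
def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.05, steps=10000):
    a, b = 0.0, 0.0
    for _ in range(steps):
        # dL/da and dL/db of the log-likelihood (4)
        grad_a = sum(x * (y - sigmoid(a * x + b)) for x, y in zip(xs, ys))
        grad_b = sum(y - sigmoid(a * x + b) for x, y in zip(xs, ys))
        a += lr * grad_a
        b += lr * grad_b
    return a, b

# Overlapping toy labels (symmetric about x = 2) so the maximum is finite.
xs = [0.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0]
ys = [0,   0,   1,   0,   1,   0,   1,   1]
a, b = fit_logistic(xs, ys)
print(a > 0)                    # True: risk increases with x
print(sigmoid(a * 2.0 + b))     # close to 0.5 at the symmetry point
```

Because the log-likelihood (4) is concave in (a, b), this simple first-order scheme converges to the unique maximum for a sufficiently small learning rate.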
3.1 Logistic regression in the risk attribute factor space

The factor space theory paves the way for logistic regression, since several of its concepts find footholds in the factor space, which contributes to a better understanding of logistic regression.

Definition 10. A factor space defined on a universe of discourse U is a family of sets Ψ = ({X(f)}_{f∈F}; U) satisfying:
(1) F = (F, ∨, ∧, ᶜ, 1, 0) is a complete Boolean algebra;
(2) X(0) = {∅};
(3) for any T ⊆ F, if the factors {f | f ∈ T} are irreducible (i.e., s ≠ t ⟹ s ∧ t = 0 for s, t ∈ T), then X({f | f ∈ T}) = ∏_{f∈T} X(f) (∏ stands for the Cartesian product);
(4) for every f ∈ F there is a mapping with the same symbol f: U → X(f).

F is called the set of factors, f ∈ F is called a factor on U, and X(f) is called the state space of the factor f [22]. Denote I_j = X(f_j) and I = I₁ × ... × I_n.

Definition 11. Given an attribute space O = {o_{i1···in} | i_j ∈ I_j, j = 1, ..., n} in a factor space Ψ = ({X(f)}_{f∈F={f₁,...,f_n}}; U), each o_{i1···in} is called a granule of X(F) [5]. Denote q_{i1···in} = P{u | F(u) = o_{i1···in}}; then P = {q_{i1···in} | i_j ∈ I_j, j = 1, ..., n} is called the probability distribution of attributes.

Assumption 1. X(f₁), X(f₂), ..., X(f_n) are all partially ordered sets; that is, for a factor f_j whose range of values is {i_{j1}, i_{j2}, ..., i_{jr}}, i_{j1} ≤ i_{j2} ≤ ... ≤ i_{jr} always holds. It should be clear that if two granules o_{1···1···1} and o_{1···3···1} exist, then the granule o_{1···2···1} is certain to appear as well. We therefore naturally suppose that P is convex. A convex probability distribution of attributes P is called the background distribution with respect to the universe U [7].

Definition 12. A factor space Ψ̃ = ({X(f)}_{f∈F̃}; U) is called a risk attribute factor space if F̃ = {f₁, ..., f_n; f_{n+1}}, where f₁, ..., f_n stand for attribute factors and f_{n+1} stands for a risk factor with the binary attribute space X(f_{n+1}) = {1, 0}.
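Property (3) of Definition 10 says that the joint state space of irreducible factors is the Cartesian product of their state spaces, so the granules of Definition 11 are simply the points of I₁ × ... × I_n. The state spaces below match the breast-cancer example of Section 4.

```python
from itertools import product

# Granules as points of the Cartesian product I1 x I2 x I3 x I4
# (Definitions 10-11); the binary risk factor of Definition 12 adds
# one more axis. State spaces taken from the example in Section 4.
I1, I2, I3, I4 = [0, 1, 2], [0, 1, 2, 3], [0, 1], [0, 1, 2]

granules = list(product(I1, I2, I3, I4))
print(len(granules))  # 3 * 4 * 2 * 3 = 72 granules, as stated in Section 4

# Risk attribute factor space: the same granules crossed with {1, 0}
risk_space = list(product(I1, I2, I3, I4, [1, 0]))
print(len(risk_space))  # 144
```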
Given a group of sampling points on I × {1, 0}: S = {(x_{1i}, ..., x_{ni}; y_i)}_{i=1,...,m}. For (i₁, ..., i_n; i_{n+1}) ∈ I × {1, 0}, denote

q_{i1···in,in+1} = |{t | x_{1t} = i₁, ..., x_{nt} = i_n; y_t = i_{n+1}}| / m.   (5)

Obviously Σ_{i₁} ··· Σ_{i_n} Σ_{i_{n+1}} q_{i1···in,in+1} = 1.

Definition 13. Q̃ = {(i₁, ..., i_n; i_{n+1}) ∈ I × {1, 0} | q_{i1···in,in+1} > 0} is the support set of Q = {q_{i1···in,in+1}}_{(i₁,...,i_n;i_{n+1})∈I×{1,0}}.

In the risk attribute factor space Ψ̃, logistic regression is mainly based on the support set Q̃. The case q_{i1···in,in+1} = 0 still exists, and the transformation used to handle it will be discussed in the final part of this section. The definition of the logistic membership function fitted by logistic regression is also updated in Ψ̃.

Definition 14. The membership function of an illness α is

P(X) = 1/(1 + e^{−(θ₀+θ₁x₁+···+θ_nx_n)}) = 1/(1 + e^{−θᵀX}),

where θ = (θ₀, θ₁, ..., θ_n)ᵀ and X = (1, x₁, ..., x_n)ᵀ. The parameters θ₁, ..., θ_n are called the risk attribute coefficients of the factors. The bigger the coefficient θ_j, the more important the factor f_j is for increasing the risk.

Definition 15. Let α be a concept with extension A ⊆ U and intension O ⊆ X(F). We call the conditional probability P{u ∈ A | F(u) = o_{i1···in}} the possibility of the concept α under o_{i1···in}.

Remark 2. The difference between possibility and probability is clear again. Since the possibility equals the membership degree of the factorial configuration with respect to the concept α, the logistic membership function of the illness α is

p_{i1···in} = P{u ∈ A | F(u) = o_{i1···in}} = |{t | x_{1t} = i₁, ..., x_{nt} = i_n; y_t = 1}| / |{t | x_{1t} = i₁, ..., x_{nt} = i_n}|.   (6)

Also, the probability corresponding to the possibility p_{i1···in} is

q_{i1···in,in+1=1} = P{u ∈ A | F(u) = o_{i1···in}} P{F(u) = o_{i1···in}} = |{t | x_{1t} = i₁, ..., x_{nt} = i_n; y_t = 1}| / m.   (7)

It is obvious that

Σ p_{i1···in} ≠ Σ q_{i1···in,in+1=1}.   (8)
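The empirical counts behind Equations (5)-(7) can be sketched directly: the joint frequency q is a relative frequency over all samples, while the possibility p is the conditional frequency of y = 1 within each granule. The tiny two-factor sample below is an assumption for illustration.

```python
from collections import Counter

# Each sample is (granule, label): granule = (i1, i2), label y in {0, 1}.
samples = [((0, 0), 0), ((0, 0), 1), ((0, 1), 1),
           ((0, 1), 1), ((1, 0), 0), ((1, 1), 1)]
m = len(samples)

q = Counter()              # joint frequencies q_{i1...in, in+1} (Eq. 5)
granule_total = Counter()  # |{t : x_t = granule}|
granule_pos = Counter()    # |{t : x_t = granule, y_t = 1}|
for x, y in samples:
    q[(x, y)] += 1 / m
    granule_total[x] += 1
    granule_pos[x] += y

# Possibility (Eq. 6): conditional frequency of y = 1 inside each granule.
p = {x: granule_pos[x] / granule_total[x] for x in granule_total}

print(round(sum(q.values()), 10))  # 1.0 -- the frequencies sum to one
print(p[(0, 1)])                   # 1.0 -- both (0, 1) samples have y = 1
print(p[(0, 0)])                   # 0.5 -- one of two (0, 0) samples has y = 1
```

Summing p over granules gives a number that need not equal the sum of the joint frequencies with y = 1, which is exactly the inequality (8).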
The approach to estimating θ through logistic regression is developed as Algorithm 1, and the membership degree for unknown samples can then be calculated.

Algorithm 1 Logistic Regression Algorithm
1: Given sample points S = {(x_{1i}, ..., x_{ni}; y_i)} (i = 1, ..., m).
2: For each granule with |{t | x_{1t} = i₁, ..., x_{nt} = i_n}| > 0 (i₁ ∈ I₁, ..., i_n ∈ I_n):
3:   if the granule contains both samples with y_t = 1 and samples with y_t = 0, then
4:     calculate P_{i1···in};
5:     let y_{i1···in} = ln( P_{i1···in} / (1 − P_{i1···in}) )
6:   else
7:     y_{i1···in} = ln( (|{t | y_t = 1}| + 0.5) / (|{t | y_t = 0}| + 0.5) ) [1]
8:   end if
9: Set y_{i1···in} = θ₀ + θ₁i₁ + ... + θ_n i_n and do linear regression to get the coefficients θ₀, θ₁, ..., θ_n.

4 Example

To show how the factor space theory is embedded into logistic regression, this section considers an example that calculates the membership degree of breast cancer using Algorithm 1. The data, extracted from [2], are shown in Tables 1 and 2. There are 1896 samples in total, which form the universe of discourse U. Four attribute factors f₁, f₂, f₃, f₄ are considered; each factor is a categorical variable whose sets of values are I₁ = {0, 1, 2}, I₂ = {0, 1, 2, 3}, I₃ = {0, 1} and I₄ = {0, 1, 2} respectively. The meanings of these coded numbers are shown in Table 4. The factor space Ψ = ({X(f)}_{f∈F={f₁,f₂,f₃,f₄}}; U), involving 72 different granules o_{i1···i4} of X(F), can then be established. The risk attribute factor space Ψ̃ = ({X(f)}_{f∈F̃={f₁,f₂,f₃,f₄;f₅}}; U) is also determined, where X(f₅) = {1, 0}.

The data must be processed before using Algorithm 1. In the background distribution P there are granules with q_{i1···i4} = 0, meaning no samples fall in f₁ = i₁, ..., f₄ = i₄; those granules are deleted. For some granules the number of samples is very small and there is little point in considering them, so granules with fewer than 5 samples in total are also deleted. Hence 33 granules remain and the sample count decreases to 1837.
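Algorithm 1 can be sketched in a few lines for a single factor: compute the empirical possibility inside each granule, take the logit (with the 0.5-smoothed logit of step 7 for one-sided granules), then fit the logits by ordinary least squares. The granule counts below are illustrative assumptions, not the paper's data.

```python
import math

# level -> (positives, total) inside each single-factor granule (assumed data)
granules = {0: (3, 10), 1: (5, 10), 2: (8, 10), 3: (10, 10)}

xs, ys = [], []
for level, (pos, total) in granules.items():
    neg = total - pos
    if pos > 0 and neg > 0:
        p = pos / total
        logit = math.log(p / (1 - p))          # steps 4-5 of Algorithm 1
    else:
        # step 7: smoothed logit for granules where all labels agree
        logit = math.log((pos + 0.5) / (neg + 0.5))
    xs.append(level)
    ys.append(logit)

# Step 9: ordinary least squares for y = theta0 + theta1 * x
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
theta1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
theta0 = my - theta1 * mx
print(theta1 > 0)  # True: higher factor level -> higher fitted risk
```

For several factors, step 9 becomes a multiple linear regression on the granule coordinates (i₁, ..., i_n), but the logit transform in steps 3-8 is unchanged.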
Algorithm 1 is applied to the 1837 samples, and the membership function of breast cancer is obtained as

P(X) = 1/(1 + e^{−(0.19x₁+0.24x₂+0.36x₃+0.80x₄−0.75)}).   (9)

The logit of Equation (9) represents a hyperplane in five-dimensional space. Although the interim still exists, we ignore it here on account of the complexity. The estimated membership degree for each granule can be calculated. Table 5 lists the real possibility P_{i1···i4}, the calculated possibility P(X_i) and the probability q_{i1···i4,i5=1}. Equation (8) can then be verified easily:

Σ P_{i1···i4} = 18.5227, Σ q_{i1···i4,i5=1} = 0.4899.

In Table 5, the rows where the difference between P_{i1···i4} and P(X_i) exceeds 0.2 are marked in red; only the four granules 13, 16, 21 and 32 are included. Since the membership degree is estimated through linear regression, it is necessary to compare y_{i1···i4} and ln(P(X_i)/(1 − P(X_i))); Figure 1 shows the differences between their values. The circles represent the real differences between y_{i1···i4} and ln(P(X_i)/(1 − P(X_i))). Since least-squares estimation is applied, a parameter test is necessary. The vertical line segments represent the 95% confidence intervals of the random residuals. The confidence intervals of the 16th, 21st and 32nd cases do not cover 0, yet they are not far from 0. These errors, viewed from different angles, indicate that logistic regression is applicable in this example.

From Equation (9) we know that θ_i (i = 1, 2, 3, 4) is 0.19, 0.24, 0.36 and 0.80 respectively. All θ_i are positive, which means the risk of having breast cancer grows as the coded number of factor i gets larger.
This is in accordance with existing knowledge: the risk of breast cancer is larger for women who had their first period at a younger age; nulliparous women, and women who give first birth at an older age, have higher risks of breast cancer; having previous breast biopsies¹ means one is suspected of having breast cancer, and the more biopsies, the greater that suspicion; and the risk grows as the number of near relations with breast cancer goes up. This conclusion verifies the effectiveness of the logistic regression from another perspective.

¹A biopsy is a medical test, commonly performed by a surgeon, an interventional radiologist or an interventional cardiologist, involving the extraction of sample cells or tissues for examination to determine the presence or extent of a disease.

5 Conclusions

The possibility in logistic regression is equal to the membership degree in factor space, and the connection between the two sides is established through the membership function. This paper shows that the factor space theory gives logistic regression a relatively different standing. Meanwhile, logistic regression forms another foothold of the factor space theory in the big data era. This is of great significance, since it gives us the reason and motivation to explore the factor space theory deeply.

Acknowledgements

This work is supported in part by the Natural Science Foundation of Liaoning Province, China [2015020570] and the National Natural Science Foundation (NNSF) of China under Grant [61304090].

Bibliography

[1] Bao Y.K. (2011); Data Analysis, Tsinghua University Press, 2011.
[2] Bruzzi P., Green S.B., Byar D.P., Brinton L.A., Schairer C. (1985); Estimating the Population Attributable Risk for Multiple Risk Factors Using Case-Control Data, American Journal of Epidemiology, 122(5), 904-914, 1985.
[3] Dayton C.M.
(1992); Logistic Regression Analysis, University of Maryland, 1992.
[4] Guo S.C., Chen G. (2001); Method of Soft Computing in Information Science, Northeastern University Press, 2001.
[5] Li H.X., Wang P.Z. (1994); A Mathematical Theory on Knowledge Representation, Tianjin Scientific and Technical Press, 1994.
[6] Liu Z.L., Liu Y.C. (1992); Theory of Factorial Neural Networks, Beijing Normal University Press, 1992.
[7] Liu H.T., Wang P.Z.; Background distribution and fuzzy background relationship (to be published).
[8] Mood C. (2010); Logistic Regression: Why We Cannot Do What We Think We Can Do, and What We Can Do About It, European Sociological Review, 26(1), 67-82, 2010.
[9] Peduzzi P., Concato J., Kemper K. (1996); A simulation study of the number of events per variable in logistic regression analysis, Journal of Clinical Epidemiology, 49(12), 1373-1379, 1996.
[10] Prette Z.A.P.D., Prette A.D., Oliveira L.A.D., Gresham F.M., Vance M.J. (2012); Role of social performance in predicting learning problems: Prediction of risk using logistic regression analysis, School Psychology International, 33(6), 615-630, 2012.
[11] Russell S.J., Norvig P. (2011); Artificial Intelligence: A Modern Approach (Third Edition), Tsinghua University Press, 2011.
[12] Steyerberg E.W., Harrell F.E. Jr, Borsboom G.J.J.M. (2001); Internal validation of predictive models: Efficiency of some procedures for logistic regression analysis, Journal of Clinical Epidemiology, 54, 774-781, 2001.
[13] Wang P.Z. (2013); Factor space and factor data bases, Journal of Liaoning Engineering and Technology University, 32(10), 1-8, 2013.
[14] Wang P.Z., Guo S.C., Bao Y.K., Liu H.T. (2014); Causality analysis in factor spaces, Journal of Liaoning Engineering and Technology University, 33(7), 1-6, 2014.
[15] Wang P.Z. (2015); Factor space and data science, Journal of Liaoning Engineering and Technology University, 34(2), 273-280, 2015.
[16] Wang P.Z. (1981); in: Hao B.L. et al. (eds), Advance of Statistical Physics, Science Press, Beijing, 1981.
[17] Wang P.Z. (1985); Fuzzy Sets and Falling Shadow of Random Sets, Beijing Normal University Press, 1985.
[18] Wang P.Z., Sugeno M. (1982); The factors field and background structure for fuzzy subsets, Fuzzy Mathematics, (2), 45-54, 1982.
[19] Wang P.Z., Zhang H.M., Ma X.W., Xu W. (1991); Fuzzy set-operations represented by falling shadow theory, in: Fuzzy Engineering toward Human Friendly Systems, Proceedings of the International Fuzzy Engineering Symposium '91, Yokohama, Japan, 1, 82-90, 1991.
[20] Wang P.Z., Liu H.C., Zhang X.H. (1993); Win-win strategy for probability and fuzzy mathematics, The Journal of Fuzzy Mathematics, 1(1), 223-231, 1993.
[21] Wang P.Z. (2013); Fuzzy Mathematics and Optimization, Beijing Normal University Press, 2013.
[22] Wang P.Z., Liu Z.L., Shi Y., Guo S.C. (2014); Factor Space, the Theoretical Base of Data Science, Annals of Data Science, 1(2), 233-251, 2014.
[23] Zadeh L.A. (1965); Fuzzy Sets, Information and Control, 8, 338-353, 1965.

Appendix

Table 1: Detailed distribution of cases and their matched controls in all strata defined by cross-classifying the four risk factors.
Columns: age at menarche | age at first live birth | previous breast biopsies | mothers plus sisters with breast cancer | no. of cases | no. of controls.

0 0 0 0  14  27
0 0 0 1   3   1
0 0 0 2   0   0
0 0 1 0   3   1
0 0 1 1   1   0
0 0 1 2   0   0
0 1 0 0  54  85
0 1 0 1  20  12
0 1 0 2   1   1
0 1 1 0   5   5
0 1 1 1   2   0
0 1 1 2   0   0
0 2 0 0  81 100
0 2 0 1  18  20
0 2 0 2   3   0
0 2 1 0   7  12
0 2 1 1   4   2
0 2 1 2   0   0
0 3 0 0  27  14
0 3 0 1  12   7
0 3 0 2   1   0
0 3 1 0   0   2
0 3 1 1   1   0
0 3 1 2   1   0
1 0 0 0  27  56
1 0 0 1   8   7
1 0 0 2   1   0
1 0 1 0   1   4
1 0 1 1   0   0
1 0 1 2   0   0
1 1 0 0 112 173
1 1 0 1  27  12

Table 2: Table 1 - Continued (same columns).

1 1 0 2   4   0
1 1 1 0  14   4
1 1 1 1   1   2
1 1 1 2   0   0
1 2 0 0 187 174
1 2 0 1  41  20
1 2 0 2  10   1
1 2 1 0  11  10
1 2 1 1   5   0
1 2 1 2   0   1
1 3 0 0  41  47
1 3 0 1  15   5
1 3 0 2   4   0
1 3 1 0   4   5
1 3 1 1   1   0
1 3 1 2   1   0
2 0 0 0   9  15
2 0 0 1   3   2
2 0 0 2   2   0
2 0 1 0   1   1
2 0 1 1   0   0
2 0 1 2   0   0
2 1 0 0  43  44
2 1 0 1  14   5
2 1 0 2   1   0
2 1 1 0   3   2
2 1 1 1   2   0
2 1 1 2   0   0
2 2 0 0  53  52
2 2 0 1   9   8
2 2 0 2   2   0

Table 3: Table 2 - Continued (same columns).

2 2 1 0   3   1
2 2 1 1   2   1
2 2 1 2   0   0
2 3 0 0  17   4
2 3 0 1   4   3
2 3 0 2   1   0
2 3 1 0   3   0
2 3 1 1   3   0
2 3 1 2   0   0

Table 4: Levels of the risk factors (range → coding).

Age at menarche: < 12 → 2; 12-13 → 1; ≥ 14 → 0
Age at first live birth: < 20 → 0; 20-24 → 1; 25-29 (or nulliparous) → 2; ≥ 30 → 3
No. of previous breast biopsies: 0 → 0; ≥ 1 → 1
No. of mothers plus sisters with breast cancer: 0 → 0; 1 → 1; ≥ 2 → 2

Table 5: P_{i1···i4}, P(X_i) and q_{i1···i4,i5=1} for the 33 granules.
Columns: case number | age at menarche | age at first live birth | previous breast biopsies | mothers plus sisters with breast cancer | P_{i1···i4} | P(X_i) | q_{i1···i4,i5=1}.

 1  0 0 0 0  0.3415  0.3207  0.0076
 2  0 1 0 0  0.3885  0.3739  0.0294
 3  0 1 0 1  0.6250  0.5711  0.0109
 4  0 1 1 0  0.5000  0.4624  0.0027
 5  0 2 0 0  0.4475  0.4304  0.0441
 6  0 2 0 1  0.4737  0.6275  0.0098
 7  0 2 1 0  0.3684  0.5211  0.0038
 8  0 2 1 1  0.6667  0.7081  0.0022
 9  0 3 0 0  0.6585  0.4887  0.0147
10  0 3 0 1  0.6316  0.6806  0.0065
11  1 0 0 0  0.3253  0.3640  0.0147
12  1 0 0 1  0.5333  0.5606  0.0044
13  1 0 1 0  0.2000  0.4519  0.0005
14  1 1 0 0  0.3930  0.4200  0.0610
15  1 1 0 1  0.6923  0.6175  0.0147
16  1 1 1 0  0.7778  0.5105  0.0076
17  1 2 0 0  0.5180  0.4781  0.1018
18  1 2 0 1  0.6721  0.6713  0.0223
19  1 2 0 2  0.9091  0.8199  0.0054
20  1 2 1 0  0.5238  0.5689  0.0060
21  1 2 1 1  1.0000  0.7463  0.0027
22  1 3 0 0  0.4659  0.5368  0.0223
23  1 3 0 1  0.7500  0.7210  0.0082
24  1 3 1 0  0.4444  0.6254  0.0022
25  2 0 0 0  0.3750  0.4097  0.0049
26  2 0 0 1  0.6000  0.6074  0.0016
27  2 1 0 0  0.4943  0.4675  0.0234
28  2 1 0 1  0.7368  0.6619  0.0076
29  2 1 1 0  0.6000  0.5585  0.0016
30  2 2 0 0  0.5048  0.5263  0.0289
31  2 2 0 1  0.5294  0.7124  0.0049
32  2 3 0 0  0.8095  0.5843  0.0093
33  2 3 0 1  0.5714  0.7580  0.0022

Figure 1: The differences between y_{i1···i4} and ln(P(X_i)/(1 − P(X_i))) (residual case order plot; vertical axis: residuals, horizontal axis: case number).