Applications of Discriminant Analysis in Medical Diagnosis

Hazim M. Gorgess, Anas Kh. Mohammed
Dept. of Mathematics / College of Education for Pure Science (Ibn AL-Haitham) / University of Baghdad
Received: 19 June 2013, Accepted: 4 December 2013
Ibn Al-Haitham Jour. for Pure & Appl. Sci. Vol. 27 (1) 2014

Abstract
In this paper, discriminant analysis is used to classify the most widespread heart disease, known as coronary heart disease, into two groups (patient, not patient) based on the discriminating changes of ten predictor variables that we believe cause the disease. A random sample for each group is employed, and stepwise procedures are performed in order to delete those variables that are not important for separating the groups. Tests of significance of the discriminant analysis are carried out and the misclassification rates are estimated.

Keywords: discriminant analysis, classification, stepwise procedures, misclassification rates.

Introduction
Discriminant analysis is a technique for the multivariate study of group differences. It is particularly appropriate when one wishes to describe, summarize, and understand the differences between or among groups, and it is convenient for determining which of a set of variables best captures or characterizes group differences. The most frequent applications of discriminant analysis are for predictive purposes, that is, for situations in which it is necessary or desirable to classify subjects into groups or categories [1].

Theoretical Part
1. The Discriminant Function for Two Groups
The derived discriminant functions may be used to classify new cases into groups. Prior probabilities of belonging to each group may be entered or derived from the observed data.
For the case of two groups, we assume that the two populations to be compared have the same covariance matrix, \Sigma_1 = \Sigma_2 = \Sigma, but distinct mean vectors \mu_1 and \mu_2. We work with samples y_{11}, y_{12}, ..., y_{1n_1} and y_{21}, y_{22}, ..., y_{2n_2} from the two populations. As usual, each vector y_{ij} consists of measurements on p variables. The discriminant function is the linear combination of these p variables that maximizes the distance between the two transformed group mean vectors. A linear combination z = a'y transforms each observation vector to a scalar:

z_{1i} = a'y_{1i} = a_1 y_{1i1} + a_2 y_{1i2} + ... + a_p y_{1ip},  i = 1, 2, ..., n_1
z_{2i} = a'y_{2i} = a_1 y_{2i1} + a_2 y_{2i2} + ... + a_p y_{2ip},  i = 1, 2, ..., n_2

Hence the n_1 + n_2 observation vectors in the two samples, y_{11}, ..., y_{1n_1} and y_{21}, ..., y_{2n_2}, are transformed to the scalars z_{11}, ..., z_{1n_1} and z_{21}, ..., z_{2n_2}. We find the means

\bar{z}_1 = (1/n_1) \sum_{i=1}^{n_1} z_{1i} = a'\bar{y}_1,   \bar{z}_2 = (1/n_2) \sum_{i=1}^{n_2} z_{2i} = a'\bar{y}_2

where \bar{y}_1 = (1/n_1) \sum_{i=1}^{n_1} y_{1i} and \bar{y}_2 = (1/n_2) \sum_{i=1}^{n_2} y_{2i}.

We wish to find the vector a that maximizes the ratio (\bar{z}_1 - \bar{z}_2)^2 / s_z^2, which can be expressed as [1]:

Q = (\bar{z}_1 - \bar{z}_2)^2 / s_z^2 = [a'(\bar{y}_1 - \bar{y}_2)]^2 / (a'S_p a)   ...(2.1)

The numerator of this ratio is the square of the difference between the means of z for the two groups, and the denominator is the pooled within-group variance of z, S_p being the pooled sample covariance matrix. Putting d = \bar{y}_1 - \bar{y}_2, D = a'd, and w = a'S_p a, and substituting in equation (2.1), we get

Q = D^2 / w   ...(2.2)

Differentiating Q with respect to a and setting the derivative equal to zero [2], we obtain

\partial Q / \partial a = (2wDd - 2D^2 S_p a) / w^2 = 0

This yields wDd = D^2 S_p a; dividing by D^2 we obtain (w/D)d = S_p a, and hence a = (w/D) S_p^{-1} d. Since w/D is an arbitrary nonzero constant, we may set w/D = 1, so the maximum of (2.1) occurs when [1]

a = S_p^{-1} d = S_p^{-1}(\bar{y}_1 - \bar{y}_2)   ...(2.3)
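As a concrete illustration, equation (2.3) can be computed directly with NumPy. The sketch below uses synthetic data for two hypothetical groups (the group means and sizes are assumptions for illustration only, not the paper's data):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical two-group data: n1 and n2 observations on p = 3 variables.
y1 = rng.normal(loc=[0.0, 0.0, 0.0], size=(30, 3))
y2 = rng.normal(loc=[1.0, 0.5, -0.5], size=(25, 3))

n1, n2 = len(y1), len(y2)
ybar1, ybar2 = y1.mean(axis=0), y2.mean(axis=0)

# Pooled covariance matrix: Sp = ((n1-1)S1 + (n2-1)S2) / (n1+n2-2)
S1 = np.cov(y1, rowvar=False)
S2 = np.cov(y2, rowvar=False)
Sp = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)

# Discriminant coefficient vector, equation (2.3): a = Sp^{-1}(ybar1 - ybar2)
a = np.linalg.solve(Sp, ybar1 - ybar2)
```

By construction, this `a` maximizes the ratio Q of (2.1): evaluating Q at `a` will never be smaller than at any other coefficient vector.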
Equivalently, the maximum is attained when a is any multiple of S_p^{-1}(\bar{y}_1 - \bar{y}_2). Thus the maximizing vector a is not unique; however, its direction is unique, that is, the relative values or ratios of a_1, a_2, ..., a_p are unique.

2. Discriminant Analysis for Several Groups
In discriminant analysis for several groups, we are concerned with finding linear combinations of variables that best separate the k groups of multivariate observations. For k groups (samples) with n_i observations in the ith group, we transform each observation vector y_{ij} to obtain

z_{ij} = a'y_{ij},  i = 1, 2, ..., k;  j = 1, 2, ..., n_i

and find the means \bar{z}_i = a'\bar{y}_i, where \bar{y}_i = (1/n_i) \sum_{j=1}^{n_i} y_{ij}. As in the two-group case, we seek the vector a that maximally separates \bar{z}_1, \bar{z}_2, ..., \bar{z}_k. To express separation among \bar{z}_1, ..., \bar{z}_k, we extend the separation criterion to the k-group case. Since a'(\bar{y}_1 - \bar{y}_2) = (\bar{y}_1 - \bar{y}_2)'a, we can write [1]:

(\bar{z}_1 - \bar{z}_2)^2 / s_z^2 = [a'(\bar{y}_1 - \bar{y}_2)]^2 / (a'S_p a) = a'(\bar{y}_1 - \bar{y}_2)(\bar{y}_1 - \bar{y}_2)'a / (a'S_p a)   ...(2.4)

To extend (2.4) to k groups, we use the H matrix in place of (\bar{y}_1 - \bar{y}_2)(\bar{y}_1 - \bar{y}_2)' and E in place of S_p to obtain

\lambda = a'Ha / a'Ea   ...(2.5)

where

H = \sum_{i=1}^{k} n_i (\bar{y}_{i.} - \bar{y}_{..})(\bar{y}_{i.} - \bar{y}_{..})',   E = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i.})(y_{ij} - \bar{y}_{i.})'

with \bar{y}_{i.} = (1/n_i) \sum_{j=1}^{n_i} y_{ij} and \bar{y}_{..} = (1/N) \sum_{i=1}^{k} \sum_{j=1}^{n_i} y_{ij}, where N is the total number of observations.

The p x p matrix H has a between-groups sum of squares on the diagonal for each of the p variables; the off-diagonal elements are the analogous sums of products for each pair of variables. The p x p error matrix E has a within-groups sum of squares for each variable on the diagonal, with the analogous sums of products off the diagonal. Thus H has the form

H = | SSH_11  SPH_12  ...  SPH_1p |
    | SPH_12  SSH_22  ...  SPH_2p |
    |   ...     ...   ...    ...  |
    | SPH_1p  SPH_2p  ...  SSH_pp |   ...(2.6)

The matrix E can be expressed in a form similar to (2.6):

E = | SSE_11  SPE_12  ...  SPE_1p |
    | SPE_12  SSE_22  ...  SPE_2p |
    |   ...     ...   ...    ...  |
    | SPE_1p  SPE_2p  ...  SSE_pp |   ...(2.7)
We can rewrite the ratio in (2.5) as

a'Ha = \lambda a'Ea,   a'(Ha - \lambda Ea) = 0   ...(2.8)

We examine the values of \lambda and a that are solutions of (2.8) in a search for the value of a that yields the maximum \lambda. The solution a' = 0' is not permissible because it gives \lambda = 0/0 in (2.5). Other solutions are found from

Ha - \lambda Ea = 0   ...(2.9)

which can be written in the form

(E^{-1}H - \lambda I)a = 0   ...(2.10)

The solutions of (2.10) are the eigenvalues \lambda_1, \lambda_2, ..., \lambda_s and associated eigenvectors a_1, a_2, ..., a_s of E^{-1}H. The eigenvalues are taken to be ranked \lambda_1 > \lambda_2 > ... > \lambda_s. The number of nonzero eigenvalues, s, is the rank of H, which is the smaller of k - 1 and p. Thus the largest eigenvalue \lambda_1 is the maximum value of \lambda = a'Ha / a'Ea in (2.5), and the coefficient vector that produces the maximum is the corresponding eigenvector a_1.

Eq. (2.10) can be verified by calculus as follows. Differentiating \lambda = a'Ha / a'Ea with respect to a and setting the derivative equal to zero, we obtain [5]:

\partial\lambda / \partial a = [2(a'Ea)Ha - 2(a'Ha)Ea] / (a'Ea)^2 = 0

This yields (a'Ea)Ha - (a'Ha)Ea = 0; dividing by a'Ea, we obtain Ha - \lambda Ea = 0, or (H - \lambda E)a = 0, which can be written as (E^{-1}H - \lambda I)a = 0. Hence the discriminant function that maximally separates the means is z_1 = a_1'y; that is, it represents the dimension that maximally separates the means. From the s eigenvectors a_1, a_2, ..., a_s of E^{-1}H corresponding to \lambda_1, \lambda_2, ..., \lambda_s, we obtain s discriminant functions z_1 = a_1'y, z_2 = a_2'y, ..., z_s = a_s'y.
The relative importance of each discriminant function z_i, i = 1, 2, ..., s, can be assessed by considering its eigenvalue as a proportion of the total [1]:

\lambda_i / \sum_{j=1}^{s} \lambda_j   ...(2.11)

By this criterion, two or three discriminant functions will often suffice to describe the group differences; discriminant functions associated with small eigenvalues can be neglected.

3. Tests of Significance of Discriminant Analysis
For the case of two groups, we wish to test H_0: \mu_1 = \mu_2 versus H_1: \mu_1 \ne \mu_2. The discriminant function coefficient vector a is significantly different from 0 if T^2 is significant, where [5]:

T^2 = [n_1 n_2 / (n_1 + n_2)] (\bar{y}_1 - \bar{y}_2)' S_p^{-1} (\bar{y}_1 - \bar{y}_2)   ...(2.12)

which is distributed as T^2_{p, n_1+n_2-2} when H_0: \mu_1 = \mu_2 is true. We reject H_0 if T^2 > T^2_{\alpha, p, n_1+n_2-2}. We can also use an F approximation test, where [5]:

F = [(n_1 + n_2 - p - 1) / ((n_1 + n_2 - 2)p)] T^2   ...(2.13)

which is distributed as F_{p, n_1+n_2-p-1} when H_0 is true. We reject H_0 if F > F_{\alpha, p, n_1+n_2-p-1}.

For several groups, to test H_0: \mu_1 = \mu_2 = ... = \mu_k, we use the Wilks' lambda statistic defined as [6]:

\Lambda = |E| / |E + H|   ...(2.14)

We reject H_0 if \Lambda \le \Lambda_{\alpha, p, v_H, v_E}. The parameters of the Wilks' \Lambda distribution are p = the number of variables, v_H = k - 1 degrees of freedom for the hypothesis, and v_E = N - k degrees of freedom for error, with N = \sum_{i=1}^{k} n_i. Wilks' \Lambda in (2.14) can be expressed in terms of the eigenvalues \lambda_1, \lambda_2, ..., \lambda_s of E^{-1}H as follows:

\Lambda_1 = \prod_{i=1}^{s} 1/(1 + \lambda_i)   ...(2.15)

The number of nonzero eigenvalues of E^{-1}H is s = min(p, v_H), which is the rank of H. The range of \Lambda is 0 \le \Lambda \le 1, and the test based on Wilks' \Lambda is an inverse test in the sense that we reject H_0 for small values of \Lambda. Since \Lambda_1 is small if one or more of the \lambda_i's are large, Wilks' \Lambda tests for the significance of the eigenvalues and thereby for the discriminant functions.
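The equivalence of the two forms of Wilks' \Lambda in (2.14) and (2.15) can be checked numerically. The sketch below builds H and E from synthetic data for three hypothetical groups (the group means and sizes are illustrative assumptions, not the paper's data):

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical k = 3 groups, p = 2 variables, 20 observations each.
groups = [rng.normal(loc=m, size=(20, 2)) for m in ([0, 0], [1, 0], [0, 1])]

grand = np.vstack(groups).mean(axis=0)
p = 2
H = np.zeros((p, p))  # between-groups SSCP matrix
E = np.zeros((p, p))  # within-groups (error) SSCP matrix
for y in groups:
    ni, ybar = len(y), y.mean(axis=0)
    H += ni * np.outer(ybar - grand, ybar - grand)
    r = y - ybar
    E += r.T @ r

# Eigenvalues of E^{-1}H give the discriminant dimensions, equation (2.10)
eigvals = np.sort(np.linalg.eigvals(np.linalg.solve(E, H)).real)[::-1]

# Wilks' lambda two ways: (2.14) as a determinant ratio, (2.15) via eigenvalues
lam_det = np.linalg.det(E) / np.linalg.det(E + H)
lam_eig = np.prod(1.0 / (1.0 + eigvals))
```

The two values agree because |E + H| / |E| = |I + E^{-1}H| = \prod_i (1 + \lambda_i).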
The s eigenvalues represent s dimensions of separation of the mean vectors \bar{y}_1, \bar{y}_2, ..., \bar{y}_k, and we are interested in which, if any, of these dimensions is significant. In addition to the Wilks' \Lambda test, we can use a \chi^2 approximation for \Lambda_1:

V_1 = -[v_E - (p - v_H + 1)/2] \ln \Lambda_1   ...(2.16)
    = -[N - 1 - (p + k)/2] \ln \prod_{i=1}^{s} 1/(1 + \lambda_i)
    = [N - 1 - (p + k)/2] \sum_{i=1}^{s} \ln(1 + \lambda_i)

which is approximately \chi^2 with p(k - 1) degrees of freedom. The test statistic \Lambda_1 and its approximation (2.16) test the significance of all of \lambda_1, \lambda_2, ..., \lambda_s. If the test leads to rejection of H_0, we conclude that at least one of the \lambda's is significantly different from zero, and therefore that there is at least one dimension of separation of the mean vectors. Since \lambda_1 is the largest, we are sure only of its significance, along with that of z_1 = a_1'y. To test the significance of \lambda_2, ..., \lambda_s, we delete \lambda_1 from Wilks' \Lambda and the associated \chi^2 approximation to obtain

\Lambda_2 = \prod_{i=2}^{s} 1/(1 + \lambda_i),
V_2 = -[N - 1 - (p + k)/2] \ln \Lambda_2 = [N - 1 - (p + k)/2] \sum_{i=2}^{s} \ln(1 + \lambda_i)

which is approximately \chi^2 with (p - 1)(k - 2) degrees of freedom. If this test leads to rejection of H_0, we conclude that at least \lambda_2 is significant, along with the associated discriminant function z_2 = a_2'y. We can continue in this fashion, testing each \lambda_i in turn until a test fails to reject H_0. The test statistic at the mth step is

\Lambda_m = \prod_{i=m}^{s} 1/(1 + \lambda_i)   ...(2.17)

which is distributed as \Lambda_{p-m+1, k-m, N-k-m+1}. The statistic

V_m = -[N - 1 - (p + k)/2] \ln \Lambda_m = [N - 1 - (p + k)/2] \sum_{i=m}^{s} \ln(1 + \lambda_i)   ...(2.18)

has an approximate \chi^2 distribution with (p - m + 1)(k - m) degrees of freedom. We can also use an F approximation for each \Lambda_i.
For \Lambda_1 = \prod_{i=1}^{s} 1/(1 + \lambda_i) we use

F = [(1 - \Lambda_1^{1/t}) / \Lambda_1^{1/t}] (df_2 / df_1)   ...(2.19)

where

t = \sqrt{ [p^2(k - 1)^2 - 4] / [p^2 + (k - 1)^2 - 5] }

Putting w = N - 1 - (p + k)/2, we have df_1 = p(k - 1) and df_2 = wt - [p(k - 1) - 2]/2.

For \Lambda_m = \prod_{i=m}^{s} 1/(1 + \lambda_i), m = 2, 3, ..., s, we use

F = [(1 - \Lambda_m^{1/t}) / \Lambda_m^{1/t}] (df_2 / df_1)

with p - m + 1 and k - m in place of p and k - 1:

t = \sqrt{ [(p - m + 1)^2(k - m)^2 - 4] / [(p - m + 1)^2 + (k - m)^2 - 5] },
w = N - 1 - (p + k)/2,   df_1 = (p - m + 1)(k - m),   df_2 = wt - [(p - m + 1)(k - m) - 2]/2

4. Tests of Equality of Covariance Matrices [1]
For k multivariate populations, the hypothesis of equality of covariance matrices is H_0: \Sigma_1 = \Sigma_2 = ... = \Sigma_k. Calculate

c_1 = [ \sum_{i=1}^{k} 1/v_i - 1/\sum_{i=1}^{k} v_i ] [ (2p^2 + 3p - 1) / (6(p + 1)(k - 1)) ]   ...(2.20)

where v_i = n_i - 1. Then

u = -2(1 - c_1) \ln M   ...(2.21)

is approximately \chi^2 with (1/2)(k - 1)p(p + 1) degrees of freedom, where M is

M = \prod_{i=1}^{k} |S_i|^{v_i/2} / |S_p|^{\sum_i v_i / 2}   ...(2.22)

and

\ln M = (1/2) \sum_{i=1}^{k} v_i \ln|S_i| - (1/2) (\sum_{i=1}^{k} v_i) \ln|S_p|   ...(2.23)

We reject H_0 if u > \chi^2_\alpha   ...(2.24)

5. Stepwise Selection of Variables [6]
The stepwise method for selecting variables in discriminant analysis is much like stepwise regression and is especially useful in similar circumstances, namely when we have a rather long list of possible classification variables and it is unlikely that all of them will make a useful contribution to a set of discriminant functions. We would like to find the best subset, or at least something close to it. The single variable that gives the most significant classification into our groups is chosen first; then we look at the remaining variables and add the one that gives the biggest improvement. We then check the two variables to make sure that each makes a significant contribution in the presence of the other.
At each step we see whether another variable can be added that will make a significant improvement, and whether any previously entered variable can be removed. The process stops when no more variables can be added or removed at the level of significance we are using.

6. Classification Procedures
6.1. Classification Using the Discriminant Function
A simple procedure for classification can be based on the discriminant function

z = a'y_0 = (\bar{y}_1 - \bar{y}_2)' S_p^{-1} y_0   ...(2.25)

where y_0 is the vector of measurements on a new sampling unit that we wish to classify into one of the two groups (populations). Denote the two groups by G_1 and G_2. Fisher's (1936) linear classification procedure assigns y_0 to G_1 if z_0 = a'y_0 is closer to \bar{z}_1 than to \bar{z}_2, and assigns y_0 to G_2 if z_0 is closer to \bar{z}_2 than to \bar{z}_1. Here z_0 is closer to \bar{z}_1 if z_0 > (\bar{z}_1 + \bar{z}_2)/2, where

\bar{z}_1 = (1/n_1) \sum_{i=1}^{n_1} z_{1i} = a'\bar{y}_1 = (\bar{y}_1 - \bar{y}_2)' S_p^{-1} \bar{y}_1

To express the classification rule in terms of y, we first write (\bar{z}_1 + \bar{z}_2)/2 in the form

(\bar{z}_1 + \bar{z}_2)/2 = (1/2) a'(\bar{y}_1 + \bar{y}_2) = (1/2)(\bar{y}_1 - \bar{y}_2)' S_p^{-1} (\bar{y}_1 + \bar{y}_2)   ...(2.26)

Then the classification rule becomes: assign y_0 to G_1 if [1]

a'y_0 = (\bar{y}_1 - \bar{y}_2)' S_p^{-1} y_0 > (1/2)(\bar{y}_1 - \bar{y}_2)' S_p^{-1} (\bar{y}_1 + \bar{y}_2)   ...(2.27)

and assign y_0 to G_2 if

a'y_0 = (\bar{y}_1 - \bar{y}_2)' S_p^{-1} y_0 < (1/2)(\bar{y}_1 - \bar{y}_2)' S_p^{-1} (\bar{y}_1 + \bar{y}_2)   ...(2.28)

6.2. Classification Using the Simple Classification Function [6]
Fisher (1936) proposed a simple classification function for each group based on a linear combination of the discriminating variables. For the case of k groups and p discriminating variables, the simple classification function has the form

z_g = b_{g0} + b_{g1} X_1 + b_{g2} X_2 + ... + b_{gp} X_p,   g = 1, 2, ..., k   ...(2.29)

The coefficient b_{gi} associated with variable i in group g is given as:
b_{gi} = (N - k) \sum_{j=1}^{p} w_{ij} \bar{y}_{jg}

where w_{ij} is the ijth element of the inverse of the within-groups matrix of sums of squares and cross products, \bar{y}_{jg} is the mean of variable j in group g, and N is the total number of observations. The constant b_{g0} is given as

b_{g0} = -0.5 \sum_{j=1}^{p} b_{gj} \bar{y}_{jg}   ...(2.30)

The classification rule is simply to assign the new observation to the group that yields the maximum value of the classification function after substituting all discriminating variables into the classification functions.

7. Estimating Misclassification Rates
A simple estimate of the error rate can be obtained by trying out the classification procedure on the same data set that was used to compute the classification function. This method is referred to as resubstitution: each observation vector y_{ij} is substituted into the classification functions and assigned to a group [1]. We then count the number of correct classifications and the number of misclassifications. The proportion of misclassifications resulting from resubstitution is called the apparent error rate (APER). The results can be conveniently displayed in a classification table as shown below:

Table: Classification table for two groups

Actual group | Number of observations | Predicted group 1 | Predicted group 2
     1       |          n1            |        n11        |        n12
     2       |          n2            |        n21        |        n22

Let us denote the first and second groups by G_1 and G_2, respectively. Among the n_1 observations in G_1, n_11 are correctly classified into G_1 and n_12 are misclassified into G_2, where n_1 = n_11 + n_12. Similarly, of the n_2 observations in G_2, n_21 are misclassified into G_1 and n_22 are correctly classified into G_2, where n_2 = n_21 + n_22. Thus [6]:

APER = (n_12 + n_21) / (n_1 + n_2)   ...(2.31)

Similarly, we can define the apparent correct classification rate (APCR) as

APCR = (n_11 + n_22) / (n_1 + n_2)   ...(2.32)

The method of resubstitution can be readily extended to the case of several groups.
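Equations (2.31) and (2.32) can be sketched in a few lines. The counts below are those of the paper's Table 2 (54 healthy persons, 51 patients):

```python
# Two-group classification table counts, as in the paper's Table 2.
n11, n12 = 51, 3   # healthy: correctly classified / misclassified
n21, n22 = 2, 49   # patients: misclassified / correctly classified

n = n11 + n12 + n21 + n22
aper = (n12 + n21) / n   # apparent error rate, equation (2.31)
apcr = (n11 + n22) / n   # apparent correct classification rate, equation (2.32)
print(round(100 * aper, 1), round(100 * apcr, 1))  # 4.8 95.2
```

Note that APER and APCR are complementary: they always sum to 1.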
Particular Application
The real data were collected from the records of 51 patients suffering from coronary heart disease (CHD) at Ibn Al-Nafees Hospital; moreover, the same information was obtained for 54 healthy persons. The discriminant analysis was then performed with two groups (patient, not patient) and ten predictor variables that we believe cause the disease. The variables for each group are:

A. The dependent variable, which takes the value 0 for "not patient" and 1 for "patient".
B. Ten independent variables, described below:
1. Age (X1)
2. Serum cholesterol (X2)
3. Triglyceride (X3)
4. LDL (low-density lipoprotein cholesterol) (X4)
5. HDL (high-density lipoprotein cholesterol) (X5)
6. Diabetes mellitus (blood sugar) (X6)
7. Hypertension (systolic blood pressure) (X7)
8. Sex (0 for male, 1 for female) (X8)
9. Smoking (0 for non-smoker, 1 for ex-smoker, 2 for smoker) (X9)
10. Family history (heredity factor): 0 for no hereditary factor, 1 for hereditary factor (X10)

The mean for each group and the total mean are presented in Table 1. Applying the rules of the stepwise method for discriminant analysis stated earlier, we found that only four predictor variables, namely X4, X5, X10, and X2, give a significant classification into our groups.

1. The Discriminant Function
We find the linear discriminant function coefficients by using equation (2.3). The linear discriminant function is:

Z = 0.1300 X4 + 0.3385 X5 - 3.6060 X10 - 0.1958 X2

2. Test of Significance of Discriminant Analysis
To test the significance of the discriminant function, we calculate the statistic T^2 of equation (2.12); it was found to be T^2 = 381.1326. Since T^2 > T^2_{0.01, 4, 103} = 14.511, the discriminant function is significant.
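The F approximation of equation (2.13) can be checked numerically from the reported T^2 and the sample sizes; the small difference from the F value reported below via (2.19) is presumably due to rounding in the published statistics:

```python
# F approximation of Hotelling's T^2, equation (2.13), using the paper's values.
T2 = 381.1326            # T^2 reported in the text
n1, n2, p = 54, 51, 4    # group sizes and number of retained variables

F = (n1 + n2 - p - 1) / ((n1 + n2 - 2) * p) * T2
# F comes out near 92.5, comfortably above the critical value 3.51
```

Since F greatly exceeds F_{0.01, 4, 100} = 3.51, the same conclusion of significance follows.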
Another test of significance can be performed by using eq. (2.16): the value of V was found to be 156.212; comparing this value with \chi^2_{0.01, 4} = 13.2767, we conclude that the discriminant function is significant. Also, we can use the F approximation for \Lambda = 0.213 by using equation (2.19), which gives F = 92.4; comparing with F_{0.01, 4, 100} = 3.51, we again conclude that the discriminant function is significant.

3. Tests of Equality of Covariance Matrices
We use the \chi^2 approximation test. The value of u calculated from equation (2.21) was found to be 28.01, while the critical value is \chi^2_{0.001, 10} = 29.59. Thus we accept H_0, since u < \chi^2_{0.001, 10}.

4. Classification Procedures
4.1. Classification Using the Discriminant Function
We must find the mean of the discriminant function for the two groups:

\bar{z} = 0.1300 \bar{X}_4 + 0.3385 \bar{X}_5 - 3.6060 \bar{X}_{10} - 0.1958 \bar{X}_2

The mean discriminant function of group 1 is

\bar{z}^{(1)} = 0.1300(111.69) + 0.3385(64.43) - 3.6060(0.13) - 0.1958(199.37) = -3.176171

and the mean discriminant function of group 2 is

\bar{z}^{(2)} = 0.1300(184.94) + 0.3385(34.10) - 3.6060(0.63) - 0.1958(260.57) = -17.706336

The cut point is (-3.176171 - 17.706336)/2 = -10.4412535. We use equations (2.27) and (2.28) to classify new observations. For example, to classify a new observation from group 1 (not patient) with LDL = 80, HDL = 71, serum cholesterol = 171, and no hereditary factor, we find

z = 0.1300(80) + 0.3385(71) - 3.6060(0) - 0.1958(171) = 0.9517

By equation (2.27), 0.9517 > -10.4412535, so the observation is correctly classified into group 1.
4.2. Classification Using the Simple Classification Function
After finding the inverse of the within-groups matrix of sums of squares and cross products, we obtain the classification functions of group 1 and group 2 from equations (2.29) and (2.30):

Z^(1) = -60.037 + 0.337 X4 + 0.902 X5 - 2.598 X10 + 0.117 X2
Z^(2) = -70.480 + 0.207 X4 + 0.564 X5 + 0.997 X10 + 0.313 X2

We use the two functions to classify new observations. For example, to classify a new observation from group 2 (patient) with LDL = 183, HDL = 28, serum cholesterol = 251, and no hereditary factor, we find

z^(1) = -60.037 + 0.337(183) + 0.902(28) - 2.598(0) + 0.117(251) = 56.257
z^(2) = -70.480 + 0.207(183) + 0.564(28) + 0.997(0) + 0.313(251) = 61.756

Since z^(1) < z^(2), the observation is classified into group 2.

5. Estimating Misclassification Rates
We calculate the apparent error rate from equation (2.31):

APER = (3 + 2)/105 = 4.8%

and the apparent correct classification rate from equation (2.32):

APCR = (51 + 49)/105 = 95.2%

The proportion correctly classified into group 1 is 51/54 = 94.4%, and the proportion of group 1 misclassified is 3/54 = 5.6%. The proportion correctly classified into group 2 is 49/51 = 96.1%, and the proportion of group 2 misclassified is 2/51 = 3.9%. The classification results are presented in Table 2.

Conclusions
From the theoretical and practical study, we believe the following points are noteworthy:
1. Using the stepwise method, we conclude that the predictor variable with the largest significant effect for discriminating between the two groups is low-density lipoprotein cholesterol X4, followed by high-density lipoprotein cholesterol X5, then the heredity factor X10, and finally serum cholesterol X2; thus our discriminant function was constructed on the basis of these variables.
2. According to the tests of significance we performed, namely the Wilks' \Lambda test and the \chi^2 approximation test at the level of significance \alpha = 0.01, the constructed discriminant function significantly separates the groups.
3. Using the resubstitution method, the resulting classification table revealed that about 5% of the cases were wrongly classified, while about 95% of the cases were correctly classified.

References
1. Rencher, A. C. (2012), Methods of Multivariate Analysis, Third Edition, Wiley, New York.
2. Kendall, M. G. (1955), The Advanced Theory of Statistics, Vol. II, Third Edition, Charles Griffin, London.
3. Wilks, S. (1963), Mathematical Statistics, John Wiley, New York/London.
4. Klecka, W. R. (1984), Discriminant Analysis, Beverly Hills/London.
5. Salih, A. H. (2008), "Using Discriminant Analysis to Diagnose Some Eye Diseases" (in Arabic), Journal of Administration and Economics, No. 67, pp. 264-286.
6. Al-Sulaimani, M. S. A. (1998), Using the Discriminant Function to Diagnose Cases of Enteritis in Infants (in Arabic), M.Sc. thesis in Statistics, College of Administration and Economics, University of Baghdad.

Table (1): The mean for each group and the total mean

Group        |  X̄1   |   X̄2   |   X̄3   |   X̄4   |  X̄5   |   X̄6   |   X̄7   |  X̄8  |  X̄9  | X̄10
Not patients | 45.85 | 199.37 | 116.80 | 111.69 | 64.43 | 137.61 | 125.74 | 0.43 | 0.63 | 0.13
Patients     | 63.02 | 260.57 | 202.75 | 184.94 | 34.10 | 201.31 | 155.20 | 0.31 | 1.14 | 0.63
Total        | 54.19 | 229.10 | 158.54 | 147.27 | 49.70 | 168.55 | 140.05 | 0.37 | 0.88 | 0.37

Table (2): Classification results for the discriminant function

Actual group | Number of observations | Predicted Healthy (1) | Predicted Disease (2)
Healthy (1)  |           54           |          51           |           3
Disease (2)  |           51           |           2           |          49
Healthy (1) %|          100           |         94.4          |          5.6
Disease (2) %|          100           |          3.9          |         96.1
[Arabic-language abstract, translated:]

Applications of Discriminant Analysis in Medical Diagnosis
Hazim Mansour Gorgess, Anas Khaleel Mohammed
Dept. of Mathematics, College of Education for Pure Science (Ibn Al-Haitham), University of Baghdad
Received: 19 June 2013, Accepted: 4 December 2013

Abstract
In this research, discriminant analysis was used to classify the most widespread heart diseases, known as coronary heart diseases (arterial occlusion), into two groups (patient, not patient) on the basis of the discriminating changes of ten predictor variables believed to cause the disease. A random sample was used for each group, and the stepwise procedure was applied to delete the variables that are unimportant for separating the groups. The significance test of the discriminant function and the misclassification rate were carried out.

Keywords: discriminant analysis, classification, stepwise procedure, misclassification rate.