Applications of Discriminant Analysis in Medical Diagnosis

Hazim M. Gorgess, Anas Kh. Mohammed
Dept. of Mathematics / College of Education for Pure Science (Ibn AL-Haitham) / University of Baghdad
Received: 19 June 2013, Accepted: 4 December 2013
Ibn Al-Haitham Jour. for Pure & Appl. Sci. Vol. 27 (1) 2014

Abstract
In this paper, discriminant analysis is used to classify the most widespread heart disease, known as coronary heart disease, into two groups (patient, not patient) based on the discriminating changes of ten predictor variables that we believe cause the disease. A random sample for each group is employed, and stepwise procedures are performed in order to delete those variables that are not important for separating the groups. Tests of significance of the discriminant analysis are carried out and the misclassification rates are estimated.

Keywords: discriminant analysis, classification, stepwise procedures, misclassification rates.

Introduction
Discriminant analysis is a technique for the multivariate study of group differences. It is particularly appropriate when one wishes to describe, summarize, and understand the differences between or among groups, and it is convenient for determining which of a set of variables best captures or characterizes group differences. The most frequent applications of discriminant analysis are for predictive purposes, that is, for situations in which it is necessary or desirable to classify subjects into groups or categories [1].

Theoretical Part
1. The Discriminant Function for Two Groups
The derived discriminant functions may be used to classify new cases into groups. Prior probabilities of belonging to each group may be entered or derived from the observed data.
For the case of two groups, we assume that the two populations to be compared have the same covariance matrix, \Sigma_1 = \Sigma_2 = \Sigma, but distinct mean vectors \mu_1 and \mu_2. We work with samples y_{11}, y_{12}, ..., y_{1n_1} and y_{21}, y_{22}, ..., y_{2n_2} from the two populations. As usual, each vector y_{ij} consists of measurements on p variables. The discriminant function is the linear combination of these p variables that maximizes the distance between the two transformed group mean vectors. A linear combination z = a'y transforms each observation vector to a scalar:

z_{1i} = a'y_{1i} = a_1 y_{1i1} + a_2 y_{1i2} + ... + a_p y_{1ip},  i = 1, 2, ..., n_1
z_{2i} = a'y_{2i} = a_1 y_{2i1} + a_2 y_{2i2} + ... + a_p y_{2ip},  i = 1, 2, ..., n_2

Hence the n_1 + n_2 observation vectors in the two samples, y_{11}, ..., y_{1n_1} and y_{21}, ..., y_{2n_2}, are transformed to the scalars z_{11}, ..., z_{1n_1} and z_{21}, ..., z_{2n_2}. We find the means

\bar{z}_1 = (1/n_1) \sum_{i=1}^{n_1} z_{1i} = a'\bar{y}_1,   \bar{z}_2 = (1/n_2) \sum_{i=1}^{n_2} z_{2i} = a'\bar{y}_2

where \bar{y}_1 = (1/n_1) \sum_{i=1}^{n_1} y_{1i} and \bar{y}_2 = (1/n_2) \sum_{i=1}^{n_2} y_{2i}.

We wish to find the vector a that maximizes the ratio (\bar{z}_1 - \bar{z}_2)^2 / s_z^2, which can be expressed as [1]:

Q = (\bar{z}_1 - \bar{z}_2)^2 / s_z^2 = [a'(\bar{y}_1 - \bar{y}_2)]^2 / (a'S_p a)   ...(2.1)

The numerator of this ratio is the square of the difference between the means of z for the two groups, and the denominator is the pooled within-group variance of z, S_p being the pooled sample covariance matrix. Putting d = \bar{y}_1 - \bar{y}_2, D = a'd, and w = a'S_p a, and substituting in equation (2.1), we get

Q = D^2 / w   ...(2.2)

Differentiating Q with respect to a and setting the derivative equal to zero [2], we obtain

\partial Q / \partial a = (2wDd - 2D^2 S_p a) / w^2 = 0

This yields wDd = D^2 S_p a; dividing by D^2 we obtain (w/D)d = S_p a, and hence a = (w/D) S_p^{-1} d. Since w/D is an arbitrary nonzero constant, we may set w/D = 1, so the maximum of (2.1) occurs when [1]

a = S_p^{-1} d = S_p^{-1}(\bar{y}_1 - \bar{y}_2)   ...(2.3)
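As a concrete illustration, equation (2.3) can be computed directly with NumPy. The sketch below uses synthetic data for two hypothetical groups (the group means and sizes are assumptions for illustration only, not the paper's data):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical two-group data: n1 and n2 observations on p = 3 variables.
y1 = rng.normal(loc=[0.0, 0.0, 0.0], size=(30, 3))
y2 = rng.normal(loc=[1.0, 0.5, -0.5], size=(25, 3))

n1, n2 = len(y1), len(y2)
ybar1, ybar2 = y1.mean(axis=0), y2.mean(axis=0)

# Pooled covariance matrix: Sp = ((n1-1)S1 + (n2-1)S2) / (n1+n2-2)
S1 = np.cov(y1, rowvar=False)
S2 = np.cov(y2, rowvar=False)
Sp = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)

# Discriminant coefficient vector, equation (2.3): a = Sp^{-1}(ybar1 - ybar2)
a = np.linalg.solve(Sp, ybar1 - ybar2)
```

By construction, this `a` maximizes the ratio Q of (2.1): evaluating Q at `a` will never be smaller than at any other coefficient vector.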
Equivalently, the maximum is attained when a is any multiple of S_p^{-1}(\bar{y}_1 - \bar{y}_2). Thus the maximizing vector a is not unique; however, its direction is unique, that is, the relative values or ratios of a_1, a_2, ..., a_p are unique.

2. Discriminant Analysis for Several Groups
In discriminant analysis for several groups, we are concerned with finding linear combinations of variables that best separate the k groups of multivariate observations. For k groups (samples) with n_i observations in the ith group, we transform each observation vector y_{ij} to obtain

z_{ij} = a'y_{ij},  i = 1, 2, ..., k;  j = 1, 2, ..., n_i

and find the means \bar{z}_i = a'\bar{y}_i, where \bar{y}_i = (1/n_i) \sum_{j=1}^{n_i} y_{ij}. As in the two-group case, we seek the vector a that maximally separates \bar{z}_1, \bar{z}_2, ..., \bar{z}_k. To express separation among \bar{z}_1, ..., \bar{z}_k, we extend the separation criterion to the k-group case. Since a'(\bar{y}_1 - \bar{y}_2) = (\bar{y}_1 - \bar{y}_2)'a, we can write [1]:

(\bar{z}_1 - \bar{z}_2)^2 / s_z^2 = [a'(\bar{y}_1 - \bar{y}_2)]^2 / (a'S_p a) = a'(\bar{y}_1 - \bar{y}_2)(\bar{y}_1 - \bar{y}_2)'a / (a'S_p a)   ...(2.4)

To extend (2.4) to k groups, we use the H matrix in place of (\bar{y}_1 - \bar{y}_2)(\bar{y}_1 - \bar{y}_2)' and E in place of S_p to obtain

\lambda = a'Ha / a'Ea   ...(2.5)

where

H = \sum_{i=1}^{k} n_i (\bar{y}_{i.} - \bar{y}_{..})(\bar{y}_{i.} - \bar{y}_{..})',   E = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i.})(y_{ij} - \bar{y}_{i.})'

with \bar{y}_{i.} = (1/n_i) \sum_{j=1}^{n_i} y_{ij} and \bar{y}_{..} = (1/N) \sum_{i=1}^{k} \sum_{j=1}^{n_i} y_{ij}, where N is the total number of observations.

The p x p matrix H has a between-groups sum of squares on the diagonal for each of the p variables; the off-diagonal elements are the analogous sums of products for each pair of variables. The p x p error matrix E has a within-groups sum of squares for each variable on the diagonal, with the analogous sums of products off the diagonal. Thus H has the form

H = | SSH_11  SPH_12  ...  SPH_1p |
    | SPH_12  SSH_22  ...  SPH_2p |
    |   ...     ...   ...    ...  |
    | SPH_1p  SPH_2p  ...  SSH_pp |   ...(2.6)

The matrix E can be expressed in a form similar to (2.6):

E = | SSE_11  SPE_12  ...  SPE_1p |
    | SPE_12  SSE_22  ...  SPE_2p |
    |   ...     ...   ...    ...  |
    | SPE_1p  SPE_2p  ...  SSE_pp |   ...(2.7)
We can rewrite the ratio in (2.5) as

a'Ha = \lambda a'Ea,   a'(Ha - \lambda Ea) = 0   ...(2.8)

We examine the values of \lambda and a that are solutions of (2.8) in a search for the value of a that yields the maximum \lambda. The solution a' = 0' is not permissible because it gives \lambda = 0/0 in (2.5). Other solutions are found from

Ha - \lambda Ea = 0   ...(2.9)

which can be written in the form

(E^{-1}H - \lambda I)a = 0   ...(2.10)

The solutions of (2.10) are the eigenvalues \lambda_1, \lambda_2, ..., \lambda_s and associated eigenvectors a_1, a_2, ..., a_s of E^{-1}H. The eigenvalues are taken to be ranked \lambda_1 > \lambda_2 > ... > \lambda_s. The number of nonzero eigenvalues, s, is the rank of H, which is the smaller of k - 1 and p. Thus the largest eigenvalue \lambda_1 is the maximum value of \lambda = a'Ha / a'Ea in (2.5), and the coefficient vector that produces the maximum is the corresponding eigenvector a_1.

Eq. (2.10) can be verified by calculus as follows. Differentiating \lambda = a'Ha / a'Ea with respect to a and setting the derivative equal to zero, we obtain [5]:

\partial\lambda / \partial a = [2(a'Ea)Ha - 2(a'Ha)Ea] / (a'Ea)^2 = 0

This yields (a'Ea)Ha - (a'Ha)Ea = 0; dividing by a'Ea, we obtain Ha - \lambda Ea = 0, or (H - \lambda E)a = 0, which can be written as (E^{-1}H - \lambda I)a = 0. Hence the discriminant function that maximally separates the means is z_1 = a_1'y; that is, it represents the dimension that maximally separates the means. From the s eigenvectors a_1, a_2, ..., a_s of E^{-1}H corresponding to \lambda_1, \lambda_2, ..., \lambda_s, we obtain s discriminant functions z_1 = a_1'y, z_2 = a_2'y, ..., z_s = a_s'y.
The relative importance of each discriminant function z_i, i = 1, 2, ..., s, can be assessed by considering its eigenvalue as a proportion of the total [1]:

\lambda_i / \sum_{j=1}^{s} \lambda_j   ...(2.11)

By this criterion, two or three discriminant functions will often suffice to describe the group differences; discriminant functions associated with small eigenvalues can be neglected.

3. Tests of Significance of Discriminant Analysis
For the case of two groups, we wish to test H_0: \mu_1 = \mu_2 versus H_1: \mu_1 \ne \mu_2. The discriminant function coefficient vector a is significantly different from 0 if T^2 is significant, where [5]:

T^2 = [n_1 n_2 / (n_1 + n_2)] (\bar{y}_1 - \bar{y}_2)' S_p^{-1} (\bar{y}_1 - \bar{y}_2)   ...(2.12)

which is distributed as T^2_{p, n_1+n_2-2} when H_0: \mu_1 = \mu_2 is true. We reject H_0 if T^2 > T^2_{\alpha, p, n_1+n_2-2}. We can also use an F approximation test, where [5]:

F = [(n_1 + n_2 - p - 1) / ((n_1 + n_2 - 2)p)] T^2   ...(2.13)

which is distributed as F_{p, n_1+n_2-p-1} when H_0 is true. We reject H_0 if F > F_{\alpha, p, n_1+n_2-p-1}.

For several groups, to test H_0: \mu_1 = \mu_2 = ... = \mu_k, we use the Wilks' lambda statistic defined as [6]:

\Lambda = |E| / |E + H|   ...(2.14)

We reject H_0 if \Lambda \le \Lambda_{\alpha, p, v_H, v_E}. The parameters of the Wilks' \Lambda distribution are p = the number of variables, v_H = k - 1 degrees of freedom for the hypothesis, and v_E = N - k degrees of freedom for error, with N = \sum_{i=1}^{k} n_i. Wilks' \Lambda in (2.14) can be expressed in terms of the eigenvalues \lambda_1, \lambda_2, ..., \lambda_s of E^{-1}H as follows:

\Lambda_1 = \prod_{i=1}^{s} 1/(1 + \lambda_i)   ...(2.15)

The number of nonzero eigenvalues of E^{-1}H is s = min(p, v_H), which is the rank of H. The range of \Lambda is 0 \le \Lambda \le 1, and the test based on Wilks' \Lambda is an inverse test in the sense that we reject H_0 for small values of \Lambda. Since \Lambda_1 is small if one or more of the \lambda_i's are large, Wilks' \Lambda tests for the significance of the eigenvalues and thereby for the discriminant functions.
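The equivalence of the two forms of Wilks' \Lambda in (2.14) and (2.15) can be checked numerically. The sketch below builds H and E from synthetic data for three hypothetical groups (the group means and sizes are illustrative assumptions, not the paper's data):

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical k = 3 groups, p = 2 variables, 20 observations each.
groups = [rng.normal(loc=m, size=(20, 2)) for m in ([0, 0], [1, 0], [0, 1])]

grand = np.vstack(groups).mean(axis=0)
p = 2
H = np.zeros((p, p))  # between-groups SSCP matrix
E = np.zeros((p, p))  # within-groups (error) SSCP matrix
for y in groups:
    ni, ybar = len(y), y.mean(axis=0)
    H += ni * np.outer(ybar - grand, ybar - grand)
    r = y - ybar
    E += r.T @ r

# Eigenvalues of E^{-1}H give the discriminant dimensions, equation (2.10)
eigvals = np.sort(np.linalg.eigvals(np.linalg.solve(E, H)).real)[::-1]

# Wilks' lambda two ways: (2.14) as a determinant ratio, (2.15) via eigenvalues
lam_det = np.linalg.det(E) / np.linalg.det(E + H)
lam_eig = np.prod(1.0 / (1.0 + eigvals))
```

The two values agree because |E + H| / |E| = |I + E^{-1}H| = \prod_i (1 + \lambda_i).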
The s eigenvalues represent s dimensions of separation of the mean vectors \bar{y}_1, \bar{y}_2, ..., \bar{y}_k, and we are interested in which, if any, of these dimensions is significant. In addition to the Wilks' \Lambda test, we can use a \chi^2 approximation for \Lambda_1:

V_1 = -[v_E - (p - v_H + 1)/2] \ln \Lambda_1   ...(2.16)
    = -[N - 1 - (p + k)/2] \ln \prod_{i=1}^{s} 1/(1 + \lambda_i)
    = [N - 1 - (p + k)/2] \sum_{i=1}^{s} \ln(1 + \lambda_i)

which is approximately \chi^2 with p(k - 1) degrees of freedom. The test statistic \Lambda_1 and its approximation (2.16) test the significance of all of \lambda_1, \lambda_2, ..., \lambda_s. If the test leads to rejection of H_0, we conclude that at least one of the \lambda's is significantly different from zero, and therefore that there is at least one dimension of separation of the mean vectors. Since \lambda_1 is the largest, we are sure only of its significance, along with that of z_1 = a_1'y. To test the significance of \lambda_2, ..., \lambda_s, we delete \lambda_1 from Wilks' \Lambda and the associated \chi^2 approximation to obtain

\Lambda_2 = \prod_{i=2}^{s} 1/(1 + \lambda_i),
V_2 = -[N - 1 - (p + k)/2] \ln \Lambda_2 = [N - 1 - (p + k)/2] \sum_{i=2}^{s} \ln(1 + \lambda_i)

which is approximately \chi^2 with (p - 1)(k - 2) degrees of freedom. If this test leads to rejection of H_0, we conclude that at least \lambda_2 is significant, along with the associated discriminant function z_2 = a_2'y. We can continue in this fashion, testing each \lambda_i in turn until a test fails to reject H_0. The test statistic at the mth step is

\Lambda_m = \prod_{i=m}^{s} 1/(1 + \lambda_i)   ...(2.17)

which is distributed as \Lambda_{p-m+1, k-m, N-k-m+1}. The statistic

V_m = -[N - 1 - (p + k)/2] \ln \Lambda_m = [N - 1 - (p + k)/2] \sum_{i=m}^{s} \ln(1 + \lambda_i)   ...(2.18)

has an approximate \chi^2 distribution with (p - m + 1)(k - m) degrees of freedom. We can also use an F approximation for each \Lambda_i.
For \Lambda_1 = \prod_{i=1}^{s} 1/(1 + \lambda_i) we use

F = [(1 - \Lambda_1^{1/t}) / \Lambda_1^{1/t}] (df_2 / df_1)   ...(2.19)

where

t = \sqrt{ [p^2(k - 1)^2 - 4] / [p^2 + (k - 1)^2 - 5] }

Putting w = N - 1 - (p + k)/2, we have df_1 = p(k - 1) and df_2 = wt - [p(k - 1) - 2]/2.

For \Lambda_m = \prod_{i=m}^{s} 1/(1 + \lambda_i), m = 2, 3, ..., s, we use

F = [(1 - \Lambda_m^{1/t}) / \Lambda_m^{1/t}] (df_2 / df_1)

with p - m + 1 and k - m in place of p and k - 1:

t = \sqrt{ [(p - m + 1)^2(k - m)^2 - 4] / [(p - m + 1)^2 + (k - m)^2 - 5] },
w = N - 1 - (p + k)/2,   df_1 = (p - m + 1)(k - m),   df_2 = wt - [(p - m + 1)(k - m) - 2]/2

4. Tests of Equality of Covariance Matrices [1]
For k multivariate populations, the hypothesis of equality of covariance matrices is H_0: \Sigma_1 = \Sigma_2 = ... = \Sigma_k. Calculate

c_1 = [ \sum_{i=1}^{k} 1/v_i - 1/\sum_{i=1}^{k} v_i ] [ (2p^2 + 3p - 1) / (6(p + 1)(k - 1)) ]   ...(2.20)

where v_i = n_i - 1. Then

u = -2(1 - c_1) \ln M   ...(2.21)

is approximately \chi^2 with (1/2)(k - 1)p(p + 1) degrees of freedom, where M is

M = \prod_{i=1}^{k} |S_i|^{v_i/2} / |S_p|^{\sum_i v_i / 2}   ...(2.22)

and

\ln M = (1/2) \sum_{i=1}^{k} v_i \ln|S_i| - (1/2) (\sum_{i=1}^{k} v_i) \ln|S_p|   ...(2.23)

We reject H_0 if u > \chi^2_\alpha   ...(2.24)

5. Stepwise Selection of Variables [6]
The stepwise method for selecting variables in discriminant analysis is much like stepwise regression and is especially useful in similar circumstances, namely when we have a rather long list of possible classification variables and it is unlikely that all of them will make a useful contribution to a set of discriminant functions. We would like to find the best subset, or at least something close to it. The single variable that gives the most significant classification into our groups is chosen first; then we look at the remaining variables and add the one that gives the biggest improvement. We then check the two variables to make sure that each makes a significant contribution in the presence of the other.
At each step we see whether another variable can be added that will make a significant improvement, and whether any previously entered variable can be removed. The process stops when no more variables can be added or removed at the level of significance we are using.

6. Classification Procedures
6.1. Classification Using the Discriminant Function
A simple procedure for classification can be based on the discriminant function

z = a'y_0 = (\bar{y}_1 - \bar{y}_2)' S_p^{-1} y_0   ...(2.25)

where y_0 is the vector of measurements on a new sampling unit that we wish to classify into one of the two groups (populations). Denote the two groups by G_1 and G_2. Fisher's (1936) linear classification procedure assigns y_0 to G_1 if z_0 = a'y_0 is closer to \bar{z}_1 than to \bar{z}_2, and assigns y_0 to G_2 if z_0 is closer to \bar{z}_2 than to \bar{z}_1. Here z_0 is closer to \bar{z}_1 if z_0 > (\bar{z}_1 + \bar{z}_2)/2, where

\bar{z}_1 = (1/n_1) \sum_{i=1}^{n_1} z_{1i} = a'\bar{y}_1 = (\bar{y}_1 - \bar{y}_2)' S_p^{-1} \bar{y}_1

To express the classification rule in terms of y, we first write (\bar{z}_1 + \bar{z}_2)/2 in the form

(\bar{z}_1 + \bar{z}_2)/2 = (1/2) a'(\bar{y}_1 + \bar{y}_2) = (1/2)(\bar{y}_1 - \bar{y}_2)' S_p^{-1} (\bar{y}_1 + \bar{y}_2)   ...(2.26)

Then the classification rule becomes: assign y_0 to G_1 if [1]

a'y_0 = (\bar{y}_1 - \bar{y}_2)' S_p^{-1} y_0 > (1/2)(\bar{y}_1 - \bar{y}_2)' S_p^{-1} (\bar{y}_1 + \bar{y}_2)   ...(2.27)

and assign y_0 to G_2 if

a'y_0 = (\bar{y}_1 - \bar{y}_2)' S_p^{-1} y_0 < (1/2)(\bar{y}_1 - \bar{y}_2)' S_p^{-1} (\bar{y}_1 + \bar{y}_2)   ...(2.28)

6.2. Classification Using the Simple Classification Function [6]
Fisher (1936) proposed a simple classification function for each group based on a linear combination of the discriminating variables. For the case of k groups and p discriminating variables, the simple classification function has the form

z_g = b_{g0} + b_{g1} X_1 + b_{g2} X_2 + ... + b_{gp} X_p,   g = 1, 2, ..., k   ...(2.29)

The coefficient b_{gi} associated with variable i in group g is given as:
b_{gi} = (N - k) \sum_{j=1}^{p} w_{ij} \bar{y}_{jg}

where w_{ij} is the ijth element of the inverse of the within-groups matrix of sums of squares and cross products, \bar{y}_{jg} is the mean of variable j in group g, and N is the total number of observations. The constant b_{g0} is given as

b_{g0} = -0.5 \sum_{j=1}^{p} b_{gj} \bar{y}_{jg}   ...(2.30)

The classification rule is simply to assign the new observation to the group that yields the maximum value of the classification function after substituting all discriminating variables into the classification functions.

7. Estimating Misclassification Rates
A simple estimate of the error rate can be obtained by trying out the classification procedure on the same data set that was used to compute the classification function. This method is referred to as resubstitution: each observation vector y_{ij} is substituted into the classification functions and assigned to a group [1]. We then count the number of correct classifications and the number of misclassifications. The proportion of misclassifications resulting from resubstitution is called the apparent error rate (APER). The results can be conveniently displayed in a classification table as shown below:

Table: Classification table for two groups

Actual group | Number of observations | Predicted group 1 | Predicted group 2
     1       |          n1            |        n11        |        n12
     2       |          n2            |        n21        |        n22

Let us denote the first and second groups by G_1 and G_2, respectively. Among the n_1 observations in G_1, n_11 are correctly classified into G_1 and n_12 are misclassified into G_2, where n_1 = n_11 + n_12. Similarly, of the n_2 observations in G_2, n_21 are misclassified into G_1 and n_22 are correctly classified into G_2, where n_2 = n_21 + n_22. Thus [6]:

APER = (n_12 + n_21) / (n_1 + n_2)   ...(2.31)

Similarly, we can define the apparent correct classification rate (APCR) as

APCR = (n_11 + n_22) / (n_1 + n_2)   ...(2.32)

The method of resubstitution can be readily extended to the case of several groups.
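Equations (2.31) and (2.32) can be sketched in a few lines. The counts below are those of the paper's Table 2 (54 healthy persons, 51 patients):

```python
# Two-group classification table counts, as in the paper's Table 2.
n11, n12 = 51, 3   # healthy: correctly classified / misclassified
n21, n22 = 2, 49   # patients: misclassified / correctly classified

n = n11 + n12 + n21 + n22
aper = (n12 + n21) / n   # apparent error rate, equation (2.31)
apcr = (n11 + n22) / n   # apparent correct classification rate, equation (2.32)
print(round(100 * aper, 1), round(100 * apcr, 1))  # 4.8 95.2
```

Note that APER and APCR are complementary: they always sum to 1.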
Particular Application
The real data were collected from the records of 51 patients suffering from coronary heart disease (CHD) at Ibn Al-Nafees Hospital; moreover, the same information was obtained for 54 healthy persons. The discriminant analysis was then performed with two groups (patient, not patient) and ten predictor variables that we believe cause the disease. The variables for each group are:

A. The dependent variable, which takes the value 0 for "not patient" and 1 for "patient".
B. Ten independent variables, described below:
1. Age (X1)
2. Serum cholesterol (X2)
3. Triglyceride (X3)
4. LDL (low-density lipoprotein cholesterol) (X4)
5. HDL (high-density lipoprotein cholesterol) (X5)
6. Diabetes mellitus (blood sugar) (X6)
7. Hypertension (systolic blood pressure) (X7)
8. Sex (0 for male, 1 for female) (X8)
9. Smoking (0 for non-smoker, 1 for ex-smoker, 2 for smoker) (X9)
10. Family history (heredity factor): 0 for no hereditary factor, 1 for hereditary factor (X10)

The mean for each group and the total mean are presented in Table 1. Applying the rules of the stepwise method for discriminant analysis stated earlier, we found that only four predictor variables, namely X4, X5, X10, and X2, give a significant classification into our groups.

1. The Discriminant Function
We find the linear discriminant function coefficients by using equation (2.3). The linear discriminant function is:

Z = 0.1300 X4 + 0.3385 X5 - 3.6060 X10 - 0.1958 X2

2. Test of Significance of Discriminant Analysis
To test the significance of the discriminant function, we calculate the statistic T^2 of equation (2.12); it was found to be T^2 = 381.1326. Since T^2 > T^2_{0.01, 4, 103} = 14.511, the discriminant function is significant.
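The F approximation of equation (2.13) can be checked numerically from the reported T^2 and the sample sizes; the small difference from the F value reported below via (2.19) is presumably due to rounding in the published statistics:

```python
# F approximation of Hotelling's T^2, equation (2.13), using the paper's values.
T2 = 381.1326            # T^2 reported in the text
n1, n2, p = 54, 51, 4    # group sizes and number of retained variables

F = (n1 + n2 - p - 1) / ((n1 + n2 - 2) * p) * T2
# F comes out near 92.5, comfortably above the critical value 3.51
```

Since F greatly exceeds F_{0.01, 4, 100} = 3.51, the same conclusion of significance follows.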
Another test of significance can be performed by using eq. (2.16): the value of V was found to be 156.212; comparing this value with \chi^2_{0.01, 4} = 13.2767, we conclude that the discriminant function is significant. Also, we can use the F approximation for \Lambda = 0.213 by using equation (2.19), which gives F = 92.4; comparing with F_{0.01, 4, 100} = 3.51, we again conclude that the discriminant function is significant.

3. Tests of Equality of Covariance Matrices
We use the \chi^2 approximation test. The value of u calculated from equation (2.21) was found to be 28.01, while the critical value is \chi^2_{0.001, 10} = 29.59. Thus we accept H_0, since u < \chi^2_{0.001, 10}.

4. Classification Procedures
4.1. Classification Using the Discriminant Function
We must find the mean of the discriminant function for the two groups:

\bar{z} = 0.1300 \bar{X}_4 + 0.3385 \bar{X}_5 - 3.6060 \bar{X}_{10} - 0.1958 \bar{X}_2

The mean discriminant function of group 1 is

\bar{z}^{(1)} = 0.1300(111.69) + 0.3385(64.43) - 3.6060(0.13) - 0.1958(199.37) = -3.176171

and the mean discriminant function of group 2 is

\bar{z}^{(2)} = 0.1300(184.94) + 0.3385(34.10) - 3.6060(0.63) - 0.1958(260.57) = -17.706336

The cut point is (-3.176171 - 17.706336)/2 = -10.4412535. We use equations (2.27) and (2.28) to classify new observations. For example, to classify a new observation from group 1 (not patient) with LDL = 80, HDL = 71, serum cholesterol = 171, and no hereditary factor, we find

z = 0.1300(80) + 0.3385(71) - 3.6060(0) - 0.1958(171) = 0.9517

By equation (2.27), 0.9517 > -10.4412535, so the observation is correctly classified into group 1.
4.2. Classification Using the Simple Classification Function
After finding the inverse of the within-groups matrix of sums of squares and cross products, we obtain the classification functions of group 1 and group 2 from equations (2.29) and (2.30):

Z^(1) = -60.037 + 0.337 X4 + 0.902 X5 - 2.598 X10 + 0.117 X2
Z^(2) = -70.480 + 0.207 X4 + 0.564 X5 + 0.997 X10 + 0.313 X2

We use the two functions to classify new observations. For example, to classify a new observation from group 2 (patient) with LDL = 183, HDL = 28, serum cholesterol = 251, and no hereditary factor, we find

z^(1) = -60.037 + 0.337(183) + 0.902(28) - 2.598(0) + 0.117(251) = 56.257
z^(2) = -70.480 + 0.207(183) + 0.564(28) + 0.997(0) + 0.313(251) = 61.756

Since z^(1) < z^(2), the observation is classified into group 2.

5. Estimating Misclassification Rates
We calculate the apparent error rate from equation (2.31):

APER = (3 + 2)/105 = 4.8%

and the apparent correct classification rate from equation (2.32):

APCR = (51 + 49)/105 = 95.2%

The proportion correctly classified into group 1 is 51/54 = 94.4%, and the proportion of group 1 misclassified is 3/54 = 5.6%. The proportion correctly classified into group 2 is 49/51 = 96.1%, and the proportion of group 2 misclassified is 2/51 = 3.9%. The classification results are presented in Table 2.

Conclusions
From the theoretical and practical study, we believe the following points are noteworthy:
1. Using the stepwise method, we conclude that the predictor variable with the largest significant effect for discriminating between the two groups is low-density lipoprotein cholesterol X4, followed by high-density lipoprotein cholesterol X5, then the heredity factor X10, and finally serum cholesterol X2; thus our discriminant function was constructed on the basis of these variables.
2. According to the tests of significance we performed, namely the Wilks' \Lambda test and the \chi^2 approximation test at the level of significance \alpha = 0.01, the constructed discriminant function significantly separates the groups.
3. Using the resubstitution method, the resulting classification table revealed that about 5% of the cases were wrongly classified, while about 95% of the cases were correctly classified.

References
1. Rencher, A. C. (2012), Methods of Multivariate Analysis, Third Edition, Wiley, New York.
2. Kendall, M. G. (1955), The Advanced Theory of Statistics, Vol. II, Third Edition, Charles Griffin, London.
3. Wilks, S. (1963), Mathematical Statistics, John Wiley, New York/London.
4. Klecka, W. R. (1984), Discriminant Analysis, Beverly Hills/London.
5. Salih, A. H. (2008), "Using Discriminant Analysis to Diagnose Some Eye Diseases" (in Arabic), Journal of Administration and Economics, No. 67, pp. 264-286.
6. Al-Sulaimani, M. S. A. (1998), Using the Discriminant Function to Diagnose Cases of Enteritis in Infants (in Arabic), M.Sc. thesis in Statistics, College of Administration and Economics, University of Baghdad.

Table (1): The mean for each group and the total mean

Group        |  X̄1   |   X̄2   |   X̄3   |   X̄4   |  X̄5   |   X̄6   |   X̄7   |  X̄8  |  X̄9  | X̄10
Not patients | 45.85 | 199.37 | 116.80 | 111.69 | 64.43 | 137.61 | 125.74 | 0.43 | 0.63 | 0.13
Patients     | 63.02 | 260.57 | 202.75 | 184.94 | 34.10 | 201.31 | 155.20 | 0.31 | 1.14 | 0.63
Total        | 54.19 | 229.10 | 158.54 | 147.27 | 49.70 | 168.55 | 140.05 | 0.37 | 0.88 | 0.37

Table (2): Classification results for the discriminant function

Actual group | Number of observations | Predicted Healthy (1) | Predicted Disease (2)
Healthy (1)  |           54           |          51           |           3
Disease (2)  |           51           |           2           |          49
Healthy (1) %|          100           |         94.4          |          5.6
Disease (2) %|          100           |          3.9          |         96.1
[Arabic-language abstract, translated:]

Applications of Discriminant Analysis in Medical Diagnosis
Hazim Mansour Gorgess, Anas Khaleel Mohammed
Dept. of Mathematics, College of Education for Pure Science (Ibn Al-Haitham), University of Baghdad
Received: 19 June 2013, Accepted: 4 December 2013

Abstract
In this research, discriminant analysis was used to classify the most widespread heart diseases, known as coronary heart diseases (arterial occlusion), into two groups (patient, not patient) on the basis of the discriminating changes of ten predictor variables believed to cause the disease. A random sample was used for each group, and the stepwise procedure was applied to delete the variables that are unimportant for separating the groups. The significance test of the discriminant function and the misclassification rate were carried out.

Keywords: discriminant analysis, classification, stepwise procedure, misclassification rate.