Ibn Al-Haitham Jour. for Pure & Appl. Sci. Vol. 26 (1) 2013

Employing Ridge Regression Procedure to Remedy the Multicollinearity Problem

Hazim M. Gorgees, Bushra A. Ali
Dept. of Mathematics / College of Education for Pure Science (Ibn Al-Haitham) / University of Baghdad

Received: 11 September 2012. Accepted: 15 October 2012

Abstract
In this paper we introduce several ridge regression methods for treating the multicollinearity problem in the linear regression model. These methods comprise two types of ordinary ridge regression, (ORR1) and (ORR2), distinguished by the choice of ridge parameter, as well as generalized ridge regression (GRR). The methods were applied to a dataset suffering from a high degree of multicollinearity; according to the criteria of mean square error (MSE) and coefficient of determination (R2), the (GRR) method was found to perform better than the other two.

Keywords: Ordinary ridge regression, Generalized ridge regression, Shrinkage estimators, Singular value decomposition, Coefficient of determination.

Introduction
In this paper we deal with the classical linear regression model
y = Xβ + ε .....(1)
where y is the (n×1) vector of the response variable, X is the (n×p) matrix (n > p) of explanatory variables, β is the (p×1) vector of unknown parameters and ε is the (n×1) vector of unobservable random errors, with E(ε) = 0 and var(ε) = σ²I.
Considerable attention is currently being focused on biased estimation of the parameters of a linear regression model. This attention is due to the inability of classical least squares to provide reasonable point estimates when the matrix of regression variables is ill-conditioned. Despite possessing the very desirable property of being minimum variance in the class of linear unbiased estimators under the usual conditions imposed on the model, the least squares estimators can nevertheless have extremely large variances when the data are multicollinear, which is one form of ill-conditioning. Much research is therefore being conducted on obtaining biased estimators with better overall performance than the least squares estimator. This paper discusses ridge regression estimators for use with multicollinear data. In contrast to least squares, these estimators allow a small amount of bias in order to achieve a major reduction in variance. A numerical example is included to illustrate the theoretical relationships.

The Case of Multicollinearity
The problem of multicollinearity occurs when there exists a linear relationship, or an approximate linear relationship, among two or more explanatory variables. Two types of multicollinearity may be faced in regression analysis: perfect and near multicollinearity. As an example of perfect multicollinearity, suppose the three components of a mixture are studied by including their percentages of the total, p1, p2, p3; obviously these variables will satisfy the exact linear relationship p1 + p2 + p3 = 100. During regression calculations, an exact linear relationship causes a division by zero, which in turn causes the calculations to be aborted. When the relationship is not exact, the division by zero does not occur and the calculations are not aborted; however, division by a very small quantity still distorts the results.
Hence, one of the first steps in a regression analysis is to determine whether multicollinearity is a problem. Multicollinearity can be thought of as a situation where two or more explanatory variables in the data set move together. As a consequence, it is impossible to use such a data set to decide which of the explanatory variables is producing the observed change in the response variable. Moreover, multicollinearity can create inaccurate estimates of the regression coefficients. To deal with multicollinearity we must be able to identify its source, since the source affects the analysis, the corrections and the interpretation of the linear model. The sources of multicollinearity may be summarized as follows: [1]
1- Data collection. In this case the data have been collected from a narrow subspace of the explanatory variables. The multicollinearity has been created by the sampling methodology and does not exist in the population; obtaining more data over an expanded range would cure this multicollinearity problem.
2- Physical constraints on the linear model or population. This source will exist no matter what sampling technique is used. Many manufacturing or service processes have constraints on explanatory variables (as to their range), whether physical, political or legal, which will create multicollinearity. Moreover, extreme values or outliers in the X space can cause multicollinearity.
Some multicollinearity is nearly always present, but the important point is whether the multicollinearity is serious enough to cause appreciable damage to the regression analysis. Indicators of multicollinearity include: a low determinant of the information matrix X′X; a very high correlation among two or more explanatory variables; a very high correlation among two or more estimated coefficients; and very small (near zero) eigenvalues of the correlation matrix of the explanatory variables.
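These indicators can be checked numerically. The sketch below is a minimal illustration using NumPy and synthetic data (not the paper's dataset, which appears later): it computes the determinant of the correlation matrix, the largest pairwise correlation and the eigenvalues of the correlation matrix.

```python
import numpy as np

def collinearity_indicators(X):
    """Simple indicators listed above: determinant of the correlation
    matrix, largest pairwise correlation, and its eigenvalues."""
    R = np.corrcoef(X, rowvar=False)           # correlation matrix of the columns
    off = R - np.eye(R.shape[0])               # zero out the unit diagonal
    return {
        "det": np.linalg.det(R),               # near zero signals multicollinearity
        "max_abs_corr": np.abs(off).max(),     # near one signals multicollinearity
        "eigenvalues": np.linalg.eigvalsh(R),  # near-zero eigenvalues likewise
    }

# Two nearly collinear columns plus one independent column.
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
X = np.column_stack([x1, x1 + 1e-3 * rng.normal(size=50), rng.normal(size=50)])
ind = collinearity_indicators(X)
```

On data like this, the determinant and smallest eigenvalue are both close to zero, while the largest off-diagonal correlation is close to one.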
Moreover, the Farrar-Glauber test based on the Chi-square statistic may be used to detect multicollinearity. Accordingly, the null hypothesis to be tested is:
H0: the Xj are orthogonal, j = 1,2,...,p
against the alternative H1: the Xj are not orthogonal. The test statistic is
χ² = −[(n−1) − (1/6)(2p+5)] ln|D| .....(2)
where n is the number of observations, p is the number of explanatory variables and |D| is the determinant of the correlation matrix. Comparing the calculated value of χ² with the theoretical value at p(p−1)/2 degrees of freedom and a specified level of significance, we reject H0 if the calculated value exceeds the theoretical value, which means that the dataset suffers from a multicollinearity problem; otherwise the null hypothesis H0 cannot be rejected.

The Shrinkage Estimators
Applying the singular value decomposition, we can decompose an (n×p) matrix into three matrices as follows:
X = H D^(1/2) G′ .....(3)
where H is an (n×p) semi-orthogonal matrix satisfying H′H = Ip, D^(1/2) is a (p×p) diagonal matrix of the ordered singular values of X, d1^(1/2) ≥ d2^(1/2) ≥ ... ≥ dp^(1/2) > 0, and G is a (p×p) orthogonal matrix whose columns are the eigenvectors of X′X. Accordingly, the ordinary least squares estimator of the regression parameter vector β can be written as:
bOLS = (X′X)⁻¹X′Y = GC
where C = D^(−1/2)H′Y is a (p×1) vector containing the uncorrelated components of bOLS [2]. The generalized shrinkage estimator, denoted by bSH, may be defined as: [1]
bSH = G∆C = Σ (j=1 to p) δj cj gj .....(4)
where gj is the j-th column of the matrix G, δj is the j-th diagonal element of the diagonal matrix of shrinkage factors ∆, 0 ≤ δj ≤ 1, j = 1,2,...,p, and cj is the j-th element of the uncorrelated component vector C.
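As a sketch of how the test statistic in equation (2) could be computed, the following NumPy illustration uses synthetic collinear data rather than the paper's dataset; the hard-coded critical value is the tabulated χ² 0.95 quantile for this example's 3 degrees of freedom.

```python
import numpy as np

def farrar_glauber(X):
    """Farrar-Glauber statistic of equation (2):
    chi2 = -[(n-1) - (1/6)(2p+5)] * ln|D|, with p(p-1)/2 d.f."""
    n, p = X.shape
    D = np.corrcoef(X, rowvar=False)  # correlation matrix of the regressors
    stat = -((n - 1) - (2 * p + 5) / 6.0) * np.log(np.linalg.det(D))
    df = p * (p - 1) // 2
    return stat, df

# Synthetic data with one nearly redundant column.
rng = np.random.default_rng(1)
x1 = rng.normal(size=40)
X = np.column_stack([x1, 0.99 * x1 + 0.05 * rng.normal(size=40),
                     rng.normal(size=40)])
stat, df = farrar_glauber(X)
critical = 7.815              # tabulated chi-square 0.95 quantile at df = 3
reject_h0 = stat > critical   # True => multicollinearity is present
```

With a nearly redundant column, |D| is close to zero, ln|D| is strongly negative, and the statistic far exceeds the critical value, so H0 is rejected.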
Ordinary Ridge Regression Estimators
One of several methods that have been proposed to remedy the multicollinearity problem, by modifying the method of least squares to allow biased estimators of the regression coefficients, is the ridge regression method. The ridge estimator depends crucially upon an exogenous parameter k, called the ridge parameter or the biasing parameter of the estimator. For any k ≥ 0, the corresponding ridge estimator, denoted by bRR, is defined as:
bRR = (X′X + kI)⁻¹X′Y .....(5)
where k ≥ 0 is a constant chosen by the statistician on the basis of some intuitively plausible criteria put forward by Hoerl and Kennard [3]. It can be shown that the ridge regression estimator given in (5) is a member of the class of shrinkage estimators as follows [2]. Using matrix algebra and the singular value decomposition of the matrix X we get:
bRR = (X′X + kI)⁻¹X′Y
= [G(D + kI)G′]⁻¹GD^(1/2)H′Y
= G(D + kI)⁻¹G′GD^(1/2)H′Y
= G(D + kI)⁻¹D^(1/2)H′Y
= G[(D + kI)⁻¹D]D^(−1/2)H′Y
= G∆C .....(6)
where ∆ = (D + kI)⁻¹D. Equivalently, the shrinkage factors δj of the ridge estimator have the form
δj = dj/(dj + k), j = 1,2,...,p .....(7)
where dj is the j-th diagonal element (eigenvalue) of the diagonal matrix D and k is the ridge parameter.

The Generalized Ridge Regression (GRR)
In this section we suggest using the singular value decomposition technique in order to derive the generalized ridge regression estimator for the first time (as far as we know). Let G be a (p×p) orthogonal matrix whose columns are the eigenvectors (g1, g2, ..., gp) of X′X. Hence G′(X′X)G = G′(GDG′)G = D = diag(d1, d2, ..., dp). Then we can rewrite the linear model as:
Y = Xβ + ε = (HD^(1/2))G′β + ε = X*α + ε .....
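The equivalence between equation (5) and the shrinkage form (6)-(7) can be verified numerically. The following sketch (NumPy, illustrative random data) computes bRR both ways; the two results agree to machine precision, and at k = 0 the shrinkage form reduces to OLS.

```python
import numpy as np

def ridge_direct(X, y, k):
    """Equation (5): bRR = (X'X + kI)^(-1) X'Y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

def ridge_svd(X, y, k):
    """Equations (6)-(7): bRR = G * delta * C via the SVD X = H D^(1/2) G'."""
    H, s, Gt = np.linalg.svd(X, full_matrices=False)  # s holds d_j^(1/2)
    d = s ** 2                      # eigenvalues d_j of X'X
    c = (H.T @ y) / s               # uncorrelated components c_j
    delta = d / (d + k)             # shrinkage factors of equation (7)
    return Gt.T @ (delta * c)

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 4))
y = rng.normal(size=30)
b1 = ridge_direct(X, y, 0.5)
b2 = ridge_svd(X, y, 0.5)
```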
(8) where X* = HD^(1/2) and α = G′β. This model is called the canonical linear model or the uncorrelated components model. The OLS estimator of α is given by:
αOLS = (X*′X*)⁻¹X*′Y = (D^(1/2)H′HD^(1/2))⁻¹X*′Y = D⁻¹X*′Y .....(9)
and var(αOLS) = σ²(X*′X*)⁻¹ = σ²D⁻¹, which is diagonal. This shows the important property of this parameterization: the elements of αOLS, namely (α1, α2, ..., αp)OLS, are uncorrelated. The ridge estimator of α is given by:
αRR = (X*′X* + K)⁻¹X*′Y = (D + K)⁻¹X*′Y .....(10)
= (D + K)⁻¹X*′X* αOLS = (I + KD⁻¹)⁻¹αOLS = wK αOLS = diag(di/(di + ki)) αOLS
where K is a diagonal matrix with entries (k1, k2, ..., kp). This estimator is known as the generalized ridge estimator. The mean square error of αRR is given by:
MSE(αRR) = tr[var(αRR)] + (bias αRR)′(bias αRR)
= σ² tr(wK D⁻¹ w′K) + α′OLS(wK − I)′(wK − I)αOLS
= σ² Σi di/(di + ki)² + Σi ki² α²(OLS)i/(di + ki)² .....(11)
To obtain the value of ki that minimizes MSE(αRR), we differentiate equation (11) with respect to ki and equate the resulting derivative to zero:
∂MSE(αRR)/∂ki = −2σ² di/(di + ki)³ + 2ki di α²(OLS)i/(di + ki)³ = 0
Solving for ki we obtain:
ki = σ²/α²(OLS)i
Since the value of σ² is usually unknown, we use the estimated value σ̂². Accordingly, when the matrix K satisfies
k̂i = σ̂²/α²(OLS)i, that is, K̂ = diag(σ̂²/α²(OLS)1, σ̂²/α²(OLS)2, ..., σ̂²/α²(OLS)p),
the mean square error of the generalized ridge regression estimator αRR attains its minimum value. The original form of the ridge regression estimator can be recovered from the canonical form by:
b(GRR) = G α(RR) .....(12)
All the basic results concerning the ordinary ridge regression estimator can be shown to hold for this more general formulation.
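A minimal sketch of the GRR recipe above, under stated assumptions: σ² is replaced by the usual residual-based estimate, the per-component ki follow the σ̂²/α²(OLS)i rule, and the data are synthetic (NumPy), not the refinery dataset.

```python
import numpy as np

def grr(X, y):
    """Generalized ridge: alpha_OLS in the canonical model (9),
    k_i = sigma_hat^2 / alpha_OLS_i^2, componentwise shrinkage as in (10),
    then b_GRR = G alpha_RR as in (12)."""
    n, p = X.shape
    H, s, Gt = np.linalg.svd(X, full_matrices=False)
    d = s ** 2
    alpha_ols = (H.T @ y) / s             # OLS in the canonical coordinates
    resid = y - X @ (Gt.T @ alpha_ols)
    sigma2 = resid @ resid / (n - p)      # residual-based estimate of sigma^2
    k = sigma2 / alpha_ols ** 2           # one ridge parameter per component
    alpha_rr = d / (d + k) * alpha_ols    # componentwise shrinkage
    return Gt.T @ alpha_rr                # back-transform to b_GRR

rng = np.random.default_rng(3)
x1 = rng.normal(size=40)
X = np.column_stack([x1, x1 + 0.01 * rng.normal(size=40), rng.normal(size=40)])
y = X @ np.array([1.0, 1.0, 2.0]) + rng.normal(size=40)
b_grr = grr(X, y)
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
```

Since every canonical component is shrunk by a factor in (0, 1) and G is orthogonal, the GRR coefficient vector always has smaller norm than the OLS one.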
Choice of Ridge Parameter
The ridge regression estimator does not provide a unique solution to the problem of multicollinearity but rather a family of solutions, depending on the value of k (the ridge biasing parameter). No explicit optimum value can be found for k; yet several stochastic choices have been proposed for this shrinkage parameter. Some of these choices may be summarized as follows: [4]
Hoerl and Kennard (1970) suggested a graphical method called the ridge trace to select the value of the ridge parameter k. This plot shows the ridge regression coefficients as a function of k. When viewing the ridge trace, the analyst picks the value of k for which the regression coefficients have stabilized. Often the regression coefficients vary widely for small values of k and then stabilize. We choose the smallest possible value of k (which introduces the smallest bias) after which the regression coefficients seem to remain constant.
Hoerl, Kennard and Baldwin (1975) proposed another method to select a single value of k, given as:
k̂(HKB) = pS² / (b′OLS bOLS) .....(13)
where p is the number of predictor variables, S² is the OLS estimator of σ², and bOLS is the OLS estimator of the vector of regression coefficients.
Lawless and Wang (1976) proposed selecting the value of k by the formula:
k̂(LW) = pS² / (b′OLS X′X bOLS) .....(14)
Hoerl and Kennard (1970) suggested an iterative method to estimate the value of k based on the formula:
kj+1 = pS² / ([bRR(kj)]′[bRR(kj)]) .....(15)
The first value of k is assumed to be zero, and hence [bRR(k0)]′[bRR(k0)] = b′OLS bOLS. Substituting k0 in the right-hand side of (15) we obtain the first adjusted value k1, which in turn is substituted in the right-hand side of equation (15) to obtain the second adjusted value k2; the iterations then continue until the following inequality is satisfied: [5]
(kj+1 − kj)/kj ≤ ε .....(16)
where ε is a small positive number (close to zero).
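The three stochastic choices, equations (13)-(15), can be sketched as follows (NumPy, synthetic collinear data; the helper `_ols_and_s2` is illustrative, not from the paper). The iterative rule uses |kj+1 − kj|/kj as the stopping test, a slightly more robust form of inequality (16).

```python
import numpy as np

def _ols_and_s2(X, y):
    """OLS coefficients and the residual variance estimate s^2."""
    n, p = X.shape
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    s2 = np.sum((y - X @ b) ** 2) / (n - p)
    return b, s2

def k_hkb(X, y):
    """Hoerl-Kennard-Baldwin choice, equation (13)."""
    b, s2 = _ols_and_s2(X, y)
    return X.shape[1] * s2 / (b @ b)

def k_lw(X, y):
    """Lawless-Wang choice, equation (14)."""
    b, s2 = _ols_and_s2(X, y)
    return X.shape[1] * s2 / (b @ X.T @ X @ b)

def k_iter(X, y, eps=1e-8, max_iter=200):
    """Hoerl-Kennard iteration, equations (15)-(16), starting from k0 = 0."""
    p = X.shape[1]
    _, s2 = _ols_and_s2(X, y)
    k = 0.0
    for _ in range(max_iter):
        b_rr = np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)
        k_new = p * s2 / (b_rr @ b_rr)
        if k > 0 and abs(k_new - k) / k <= eps:  # stopping rule (16)
            return k_new
        k = k_new
    return k

rng = np.random.default_rng(4)
x1 = rng.normal(size=40)
X = np.column_stack([x1, x1 + 0.01 * rng.normal(size=40), rng.normal(size=40)])
y = X @ np.array([1.0, 1.0, 2.0]) + rng.normal(size=40)
```

Since ||bRR(k)|| decreases as k grows, the iterates kj form an increasing sequence, so the iterative value is never below the one-shot HKB value.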
Hoerl and Kennard proposed the value of ε to be [5]
ε = 20 [trace((X′X)⁻¹)/p]^(−1.3) .....(17)
In the case of generalized ridge regression, Hoerl and Kennard proposed selecting ki as follows:
ki = S² / α²(OLS)i .....(18)

Numerical Example
In this section we apply the procedures discussed earlier, employing data obtained from the Midland Refineries Company to determine the effect of six factors (explanatory variables X1, X2, ..., X6) on the productivity of labor (response variable Y). The data are given in Table (1). Applying the Farrar-Glauber test given in equation (2), the calculated χ² is 137.456, while the theoretical value at 15 degrees of freedom and the 0.05 level of significance is 24.996. Obviously the calculated value is greater than the theoretical value of χ², which implies that the data suffer from a high degree of multicollinearity. Let us assume that ORR1 represents the ordinary ridge regression estimator with the ridge parameter obtained by Hoerl, Kennard and Baldwin, k̂(HKB); ORR2 represents the ordinary ridge regression estimator with the ridge parameter obtained by Lawless and Wang, k̂(LW); and GRR represents the generalized ridge regression estimator. Applying the formulas in equations (13), (14) and (18) we obtain:
k̂(HKB) = 0.0075410, k̂(LW) = 0.027179
k̂(GRR)i = [0.121641, 0.0117446, 0.0051056, 0.053459, 0.002102, 0.083658]
The computed variance inflation factors (VIF) for each explanatory variable, the mean square error (MSE) and the coefficient of determination R² for each method are presented in Tables (2), (3) and (4) respectively.
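The VIF values reported in Table (2) follow the standard definition VIFj = 1/(1 − Rj²), where Rj² comes from regressing Xj on the remaining explanatory variables. A minimal sketch (NumPy, synthetic data rather than the refinery dataset):

```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j
    on the other columns (plus an intercept)."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        xj = X[:, j]
        coef = np.linalg.lstsq(Z, xj, rcond=None)[0]
        rss = np.sum((xj - Z @ coef) ** 2)        # residual sum of squares
        tss = np.sum((xj - xj.mean()) ** 2)       # total sum of squares
        r2 = 1.0 - rss / tss
        out[j] = 1.0 / (1.0 - r2)
    return out

rng = np.random.default_rng(5)
x1 = rng.normal(size=50)
X = np.column_stack([x1, x1 + 0.02 * rng.normal(size=50), rng.normal(size=50)])
v = vif(X)
```

The two nearly collinear columns produce very large VIFs, while the independent column stays near 1; a common rule of thumb flags VIF values above 10.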
Conclusion
In addition to the Farrar-Glauber test, the large values of VIF in Table (2) are another indicator that our dataset suffers from a high degree of multicollinearity [6]. Since the GRR estimator has a smaller MSE and a larger R2 than the other two estimators (ORR1, ORR2), as shown in Tables (3) and (4), we conclude that GRR is better than ORR1 and ORR2 for remedying the multicollinearity problem in our dataset.

References
1. Obenchain, R.L. (1975), "Ridge analysis following a preliminary test of the shrunken hypothesis", Technometrics, 17, 431-445.
2. Gorgees, H.M. (2009), "Using singular value decomposition method for estimating the ridge parameter", Journal of Economics and Administrative Science, 53, 1-10.
3. Hoerl, A.E. and Kennard, R.W. (1970), "Ridge Regression: Biased Estimation for Nonorthogonal Problems", Technometrics, 12, 55-67.
4. El-Dereny, M. and Rashwan, N.I. (2011), "Solving Multicollinearity Problem Using Ridge Regression Models", Int. J. Contemp. Math. Sciences, 6, 585-600.
5. Draper, N.R. and Smith, H. (1981), "Applied Regression Analysis", Second Edition, John Wiley and Sons, New York.
6. Pasha, G.R. and Ali Shah, M.A. (2004), "Application of ridge regression to multicollinear data", Journal of Research (Science), Bahauddin Zakariya University, Multan, Pakistan, 15, 97-106.
Table (1): Effect of six explanatory variables X1, X2, ..., X6 on response variable Y

    Y      X1    X2     X3     X4     X5     X6
  3193      7    79    305    230   1580    337
  3506      8    80    390    266   1590    358
  5203      8    81    415    280   1610    416
  3118      9    84    425    330   1640    454
 10565      9    85    434    368   1642    465
 28245      7   137    692    416   1535    470
 34701     35   200    759    440   1894    574
 33660      3   833   2475   1222    353    533
 45240      4  1153   2480   1285    345    733
 51157      4  1285   2745   1141    311    873
 65085      4  1353   2854   1087    350    878
 62893      4  1331   2895   1082    382    908

Table (2): VIF for all variables

  Predictor   VIF
  X1           25.279
  X2          366.798
  X3          220.081
  X4           89.442
  X5          884.559
  X6           92.407

Table (3): MSE for each method

  Method   MSE
  ORR1     0.074426
  ORR2     0.093867
  GRR      0.069632

Table (4): R2 for each method

  Method   R-Square
  ORR1     95.94
  ORR2     94.88
  GRR      96.2018