IBN AL-HAITHAM J. FOR PURE & APPL. SCI. VOL. 23 (3) 2010

Bayesian Analysis of Ridge Regression Problems

H. M. Gorgees
Department of Mathematics, College of Education Ibn Al-Haitham, University of Baghdad

Abstract:
A Bayesian formulation of the ridge regression problem is considered, derived from a direct specification of prior information about the parameters of the general linear regression model when the data suffer from a high degree of multicollinearity. A new approach for deriving the conventional estimator of the ridge parameter proposed by Hoerl and Kennard (1970), as well as a Bayesian estimator, is presented. A numerical example is studied in order to compare the performance of these estimators.

Introduction:
The problem of multicollinearity exists when there is a linear relationship, or an approximate linear relationship, among two or more explanatory variables. Multicollinearity can be thought of as a situation where two or more explanatory variables in the data set move together. As a consequence, it is impossible to use such a data set to decide which of the explanatory variables is producing the observed change in the response variable. No treatment of the data or transformation of the model will cure this deficiency. Consequently, the best way to deal with multicollinearity may be to find a different data set, or additional data, to break the association between the related variables. However, some multicollinearity is nearly always present; the important point is whether it is serious enough to cause appreciable damage to the regression. Indicators of multicollinearity include a low determinant of the information matrix, very high correlations among two or more explanatory variables, very high correlations among two or more estimated coefficients, and a significant regression of one explanatory variable on one or more of the other explanatory variables.
Key words: linear regression model, multicollinearity, ridge regression, generalized shrinkage estimators, Bayesian estimator, singular value decomposition.

This paper deals with multicollinearity in the classical linear regression model

y = Xβ + u                                                  .....(1)

where y is an (n×1) vector of observations on the response variable, X = (x₁, x₂, ..., x_p) is an (n×p) matrix of full column rank, β is a (p×1) parameter vector (the vector of unknown regression coefficients), and u is an (n×1) vector of random disturbances with E(u) = 0 and var(u) = σ²I; both β and σ² are unknown. The least squares estimator of β is (see [1]):

β̂ = (X'X)⁻¹X'y                                             .....(2)

where β̂ denotes the least squares estimator of β. The two key properties of β̂ are that it is unbiased, E(β̂) = β, and that it has minimum variance among all linear unbiased estimators. The total mean square error of β̂ is (see [2]):

MSE(β̂) = σ² Σᵢ₌₁ᵖ (1/λᵢ)                                    .....(3)

where the λᵢ's are the eigenvalues of X'X and λ₁ ≥ λ₂ ≥ ... ≥ λ_p > 0. If the smallest eigenvalue of X'X is very much smaller than 1, a seriously ill-conditioned (multicollinearity) problem arises. Thus, for ill-conditioned data, the least squares solution yields coefficients whose absolute values are too large and whose signs may actually reverse with negligible changes in the data. That is, in the case of multicollinearity the least squares estimator can be poor in terms of various mean squared error criteria. Consequently, a great deal of work has been done to construct alternatives to the least squares estimator when multicollinearity is present. In the seventies, Hoerl and Kennard introduced a class of biased estimators for the parameters of the general linear regression model, labeled ridge estimators, as a rival to the least squares estimator when sample data are affected by a high degree of multicollinearity.
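The ill-conditioning described above is easy to reproduce. The following is a minimal numerical sketch (NumPy, synthetic data; all variable names are illustrative, not from the paper): it computes the least squares estimator (2) via the normal equations and shows that near-collinear columns make the smallest eigenvalue of X'X, and hence the spread λ₁/λ_p, extreme.

```python
# Sketch: OLS via the normal equations on nearly collinear data,
# and the eigenvalues of X'X that diagnose multicollinearity.
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 1e-4 * rng.normal(size=n)      # x2 nearly collinear with x1
X = np.column_stack([x1, x2])
y = X @ np.array([1.0, 2.0]) + 0.1 * rng.normal(size=n)

XtX = X.T @ X
beta_ols = np.linalg.solve(XtX, X.T @ y)         # (X'X)^{-1} X'y
eigvals = np.linalg.eigvalsh(XtX)                # ascending eigenvalues of X'X
condition_ratio = eigvals.max() / eigvals.min()  # huge when columns are collinear
```

With near-collinear columns the smallest eigenvalue is orders of magnitude below the largest, so the inverse in (2) inflates the coefficient variances, exactly the situation ridge regression addresses.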
The authors show that in any given problem there is at least one member of this class which has total mean square error smaller than the total variance of the corresponding least squares estimator. The ridge estimator depends crucially upon an exogenous parameter, say k. For any k the corresponding ridge estimator, denoted by β̂(k), is defined to be (see [1]):

β̂(k) = (X'X + kI)⁻¹X'y                                     .....(4)

We argue that prior information about the parameters is typically available, and that this may be exploited to find improved estimators. In this paper attention is focused upon a Bayesian formulation of generalized shrinkage estimators and the ordinary ridge regression estimator; moreover, a Bayes estimator as well as a conventional estimator of the ridge parameter is derived.

Generalized Shrinkage Estimators:
Given an (n×p) matrix of regressors X and an (n×1) vector of corresponding responses y, assume that sample means have been removed from the data (so that 1'X = 0' and 1'y = 0, where 1 is an n-vector of ones) and write the standard linear regression model as E(y|X) = Xβ and var(y|X) = σ²(I − 11'/n), where β is a vector of unknown regression coefficients and σ² is the unknown error variance. The singular value decomposition of X will be denoted by (see [3]):

X = HΛ^(1/2)G'                                              .....(5)

where H is an (n×p) semi-orthogonal matrix satisfying H'H = I_p, Λ^(1/2) is a (p×p) diagonal matrix of the ordered singular values of X, λ₁^(1/2) ≥ λ₂^(1/2) ≥ ... ≥ λ_p^(1/2) > 0, and G is a (p×p) orthogonal matrix whose columns represent the eigenvectors of the information matrix X'X. Assume that λ_p > 0, so that β is estimable; then, as in Obenchain (1978) (see [4]), we get β̂ = GC, where C = Λ^(−1/2)H'y contains the uncorrelated components of β̂, with E(C) = E(G'β̂) = G'β = γ, say, and:

var(C) = var(G'β̂) = G'var(β̂)G = σ²G'(X'X)⁻¹G = σ²Λ⁻¹

Notice that the elements of C are uncorrelated, since their variance matrix is diagonal.
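The decomposition (5) and the uncorrelated components C = Λ^(−1/2)H'y can be verified numerically. A minimal sketch (NumPy, synthetic data; names illustrative), using the fact that numpy.linalg.svd returns the factors of X = HΛ^(1/2)G' directly:

```python
# Sketch: SVD of a mean-centered X, the uncorrelated components C,
# and the identity beta_hat = G C for the least squares estimator.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4))
X = X - X.mean(axis=0)          # remove sample means so 1'X = 0'
y = rng.normal(size=30)
y = y - y.mean()                # 1'y = 0

H, s, Gt = np.linalg.svd(X, full_matrices=False)  # X = H diag(s) Gt, s = singular values
C = (H.T @ y) / s               # uncorrelated components: Lambda^{-1/2} H'y
beta_via_svd = Gt.T @ C         # G C
beta_ls = np.linalg.lstsq(X, y, rcond=None)[0]    # ordinary least squares
```

The two estimates agree, and H'H = I confirms the semi-orthogonality assumed in (5).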
The vector of generalized shrinkage estimators (or generalized ridge regression estimators) will be denoted here by β̂* and has the general form (see [4]):

β̂* = G∆C = Σⱼ₌₁ᵖ δⱼcⱼgⱼ                                     .....(6)

where gⱼ is the j-th column of the matrix G, δⱼ is the j-th diagonal element of the shrinkage factors matrix ∆ (we will usually restrict the range of the shrinkage factors to 0 ≤ δⱼ ≤ 1, j = 1, 2, ..., p), and cⱼ is the j-th element of the uncorrelated components vector C. In the remainder of this section we discuss Bayesian methods for defining the form of shrinkage of sample estimates towards a subjective prior distribution. Lindley and Smith (1972) describe a Bayesian formalism for hierarchical (multi-stage) analysis of linear models using conjugate multivariate normal prior distributions (see [5]). This formalism expresses the unknown parameters at each stage of an analysis in terms of a linear model at the previous, lower stage. But, although the dispersion matrices at each stage can be arbitrary, they must be known; and, at the final stage, both the mean vector and the dispersion matrix must be known. The fundamental lemma of Lindley and Smith (1972) states:

Lemma: If the sampling distribution of the response y is y|θ₁ ~ N(A₁θ₁, C₁), where θ₁ is a (p₁×1) parameter vector, and the prior distribution is θ₁ ~ N(A₂θ₂, C₂), where θ₂ is a (p₂×1) parameter vector, then the marginal (unconditional) distribution of y is:

y ~ N(A₁A₂θ₂, C₁ + A₁C₂A₁')                                 .....(7)

and the posterior (conditional) distribution of θ₁ given y is:

θ₁|y ~ N(Bb, B)                                             .....(8)

where B⁻¹ = A₁'C₁⁻¹A₁ + C₂⁻¹ and b = A₁'C₁⁻¹y + C₂⁻¹A₂θ₂.

To apply this lemma and demonstrate that a simple 2-stage Bayesian formalism produces generalized shrinkage estimators, we first make the identification A₁θ₁ = Xβ and C₁ = σ²I, so that B⁻¹ = X'X/σ² + C₂⁻¹.
Next, we set the prior mean value for β to zero by taking A₂θ₂ = 0, and assume that C₂ (and C₂⁻¹) are simultaneously diagonalizable with X'X by restricting attention to prior variance-covariance matrices of the general form C₂ = σ²GK⁻¹G', where K is a diagonal p×p matrix and G is defined as in (5). Now the Bayes estimate is the mean Bb of the posterior distribution of β given y, and this mean vector has the general form:

E(β|y) = Bb = (X'X/σ² + C₂⁻¹)⁻¹ X'y/σ²
            = (X'X + GKG')⁻¹X'y
            = G(Λ + K)⁻¹G'X'y
            = G(Λ + K)⁻¹Λ^(1/2)H'y
            = G(Λ + K)⁻¹Λ Λ^(−1/2)H'y
            = G∆C = β̂*                                      .....(9)

where ∆ = (Λ + K)⁻¹Λ is the diagonal matrix of generalized shrinkage factors and C is the vector of uncorrelated components of the least squares estimator.

The Ordinary Ridge Regression Estimator:
In the previous section we demonstrated that all generalized shrinkage estimators are 2-stage Bayes estimators. This includes, of course, the ordinary ridge regression estimator proposed by Hoerl and Kennard (1970). To demonstrate this fact, let us assume that an orthogonal matrix P is given such that P'X'XP = D = diag(λ₁, ..., λ_p), λ₁ ≥ ... ≥ λ_p > 0. Moreover, suppose that Z = P'β̂ = (z₁, ..., z_p)' and W = P'β = (w₁, ..., w_p)'; then Z ~ N(W, σ²D⁻¹). Let us assume that W has a prior distribution given by W ~ N(0, λI) for some positive constant λ. Thus, according to the lemma stated in section 2, the posterior distribution of W given Z is:

W|Z ~ N[λ(λI + σ²D⁻¹)⁻¹Z, λI − λ²(λI + σ²D⁻¹)⁻¹]            .....(10)

Then the Bayes estimator of W is Ŵ = λ(λI + σ²D⁻¹)⁻¹Z = (D + kI)⁻¹DZ for Z = P'β̂ and k = σ²/λ. Consequently, the Bayes estimator of β is:

PŴ = P(D + kI)⁻¹DP'β̂ = (X'X + kI)⁻¹X'y = β̂(k)               .....(11)

It is obvious that the estimator in (11) coincides with the ordinary ridge regression estimator given in (4).

Estimating the Ridge Parameter: Bayesian Approach:
There are many different methods for selecting the value of the ridge parameter k. The method we try here is based upon the lemma stated in section 2. Accordingly, the marginal distribution of Z is:

Z ~ N(0, σ²D⁻¹ + λI)
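The coincidence of the Bayes estimator (11) with the ordinary ridge estimator (4) can be checked numerically. A minimal sketch (NumPy, synthetic data; names illustrative): it forms (X'X + kI)⁻¹X'y directly, and separately shrinks each component of Z = P'β̂ by λᵢ/(λᵢ + k) and rotates back with P.

```python
# Sketch: ordinary ridge estimator vs. its Bayes / diagonalized form
# P (D + kI)^{-1} D P' beta_hat, with P'X'XP = D.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(40, 3))
y = rng.normal(size=40)
k = 0.5                                   # illustrative ridge parameter

XtX = X.T @ X
beta_ridge = np.linalg.solve(XtX + k * np.eye(3), X.T @ y)   # (X'X + kI)^{-1} X'y

D, P = np.linalg.eigh(XtX)                # eigenvalues D, orthogonal P
beta_hat = np.linalg.solve(XtX, X.T @ y)  # least squares estimator
Z = P.T @ beta_hat
W_bayes = (D / (D + k)) * Z               # shrink component i by lambda_i/(lambda_i + k)
beta_bayes = P @ W_bayes
```

The two vectors coincide, which is exactly the identity asserted in (11).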
Hence

E(Z'DZ) = tr[D var(Z)] = tr[D(σ²D⁻¹ + λI)] = pσ² + λ tr(D)  .....(12)

where "tr" denotes the trace of the matrix. To find an unbiased estimator for λ, the following result is necessary (see [6]):

E(1/σ̂²) = (n − p − 1)/[(n − p − 3)σ²]                       .....(13)

where σ̂² is the usual unbiased estimator of σ². The result in (13) can easily be proved by setting (n − p − 1)σ̂²/σ² = r; then r ~ χ²(n − p − 1), and

E(1/σ̂²) = [(n − p − 1)/σ²] E(1/r) = (n − p − 1)/[(n − p − 3)σ²]   (see [7], page 176)

From (12) and (13) an estimator for λ can be obtained as:

λ̂ = σ̂² [((n − p − 3)/(n − p − 1))(Z'DZ/σ̂²) − p] / tr(D)

or equivalently, since Z'DZ = β̂'X'Xβ̂ and tr(D) = tr(X'X):

λ̂ = σ̂² [((n − p − 3)/(n − p − 1))(β̂'X'Xβ̂/σ̂²) − p] / tr(X'X)   .....(14)

The Bayesian estimate of the ridge parameter is then k̂ = σ̂²/λ̂.

Estimating the Ridge Parameter: New Approach:
Substituting ∆ = (Λ + kI)⁻¹Λ in formula (6), it can easily be shown that the ordinary ridge regression estimator is a member of the class of generalized shrinkage estimators; in this case the shrinkage factors of the ordinary ridge regression estimator have the form:

δᵢ = λᵢ/(λᵢ + k),  0 ≤ δᵢ ≤ 1                               .....(15)

In (1970) Hoerl and Kennard proposed an estimator for the ridge parameter given as (see [8]):

k̂ = σ̂²/γ̂²ₘₐₓ                                               .....(16)

In this section, a new approach is used to derive the estimator given in (16). This approach is based on minimizing MSE(β̂*), as follows:

MSE(β̂*) = MSE(G∆C) = G MSE(∆C) G'

where

MSE(∆C) = σ²∆Λ⁻¹∆ + (I − ∆)γγ'(I − ∆)

is the mean squared error matrix of ∆C, with i-th diagonal element given as (see [3]):

MSE(δᵢcᵢ) = δᵢ²σ²/λᵢ + (1 − δᵢ)²γᵢ²                         .....(17)

Differentiating MSE(δᵢcᵢ) with respect to δᵢ, we obtain:

∂MSE(δᵢcᵢ)/∂δᵢ = 2δᵢσ²/λᵢ − 2(1 − δᵢ)γᵢ²                    .....(18)

Since the second partial derivative is a nonnegative constant, equating the derivative in (18) to zero yields the minimum value of MSE(δᵢcᵢ). This optimal amount of shrinkage for the i-th uncorrelated component is:

δᵢ* = γᵢ²/(σ²/λᵢ + γᵢ²) = λᵢ/(λᵢ + σ²/γᵢ²)                  .....(19)

We derive the formula in (19) as follows: setting 2δᵢσ²/λᵢ − 2(1 − δᵢ)γᵢ² = 0 gives δᵢσ²/λᵢ = γᵢ² − δᵢγᵢ², hence δᵢσ²/λᵢ + δᵢγᵢ² = γᵢ², and solving for δᵢ we obtain the required result. Comparing the shrinkage factor δᵢ of the ridge regression estimator given in equation (15) with δᵢ* given in equation (19), we conclude that the value of the ridge parameter k must be equal to σ²/γᵢ².
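The closed form (19) can be confirmed against a brute-force minimization of (17), and checked against the ridge form (15) once k = σ²/γᵢ². A minimal sketch (NumPy; the numerical values of σ², λᵢ and γᵢ² are illustrative):

```python
# Sketch: MSE(delta) = delta^2 sigma^2/lambda + (1 - delta)^2 gamma^2
# is minimized at delta* = gamma^2/(sigma^2/lambda + gamma^2),
# which equals lambda/(lambda + k) with k = sigma^2/gamma^2.
import numpy as np

sigma2, lam, gamma2 = 2.0, 0.5, 3.0     # illustrative values

def mse(delta):
    return delta**2 * sigma2 / lam + (1 - delta)**2 * gamma2

delta_opt = gamma2 / (sigma2 / lam + gamma2)   # closed form (19)
k = sigma2 / gamma2
delta_ridge = lam / (lam + k)                  # ridge form (15) at k = sigma^2/gamma^2

grid = np.linspace(0.0, 1.0, 100001)           # brute-force check over [0, 1]
delta_grid = grid[np.argmin(mse(grid))]
```

The grid minimizer agrees with the closed form, and both shrinkage expressions coincide.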
Since each of σ² and γᵢ² is unknown, we use their estimated values; thus:

k̂ = σ̂²/γ̂²ₘₐₓ                                               .....(20)

where σ̂² is the residual mean square in the analysis of variance table obtained from the standard least squares fit.

Numerical Example:
In this section a data set suffering from a high degree of multicollinearity is used. The data are from Al-Jibouri (2004) (see [11] for details). The study was designed to measure the effect of five explanatory variables x₁, x₂, ..., x₅ on the response variable y, where the explanatory variables represent the numbers of managers, technicians, skilled workers, unskilled workers and service workers respectively, while the response variable y represents the productivity of the industrial sector in Iraq, measured by the surplus value method, for the period of 21 years from 1970 to 1990. Our purpose is only to compare the performance of the Bayesian and conventional estimators of the ridge parameter. The original data are presented in table (1). For these data we found that the estimated values of the ridge parameter were k̂ = 0.125260 and k̂ = 0.494107, obtained by applying the formulas in (14) and (20) respectively. In order to carry out the ridge regression analysis we used the NCSS statistical system, and the results are given in tables (2) through (6). Pearson correlations for all variables are given in table (2). These correlation coefficients show which explanatory variables are highly correlated with the response variable and with each other. Explanatory variables that are highly correlated with one another may cause a multicollinearity problem. Table (3) gives an eigenvalue analysis of the explanatory variables after they have been centered and scaled. Notice that the incremental percent is the percent each eigenvalue contributes to the total. Percents near zero indicate a multicollinearity problem.
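The diagnostics used in this example, eigenvalues and condition numbers of the correlation matrix, together with the variance inflation factor VIFⱼ = 1/(1 − Rⱼ²) used in the conclusions, can be computed as follows. A minimal sketch (NumPy, synthetic data; names illustrative), using the standard identity that VIFⱼ is the j-th diagonal element of the inverse correlation matrix of the standardized regressors:

```python
# Sketch: condition numbers of the correlation matrix and VIFs
# as multicollinearity diagnostics (thresholds 100 and 10 as in the text).
import numpy as np

rng = np.random.default_rng(3)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)   # strongly related to x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

Xs = (X - X.mean(axis=0)) / X.std(axis=0)   # center and scale
R = (Xs.T @ Xs) / n                         # correlation matrix
eig = np.linalg.eigvalsh(R)
cond = eig.max() / eig                      # condition number for each eigenvalue

vif = np.diag(np.linalg.inv(R))             # VIF_j = [R^{-1}]_jj = 1/(1 - R_j^2)
```

Here the collinear pair x1, x2 produces condition numbers above 100 and VIFs above 10, while the unrelated x3 has a VIF near 1.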
The condition number is the largest eigenvalue divided by each corresponding eigenvalue. Condition numbers greater than 100 indicate a multicollinearity problem (see [9]).

Conclusions:
1. A new approach for estimating the ridge parameter was introduced by using the singular value decomposition technique.
2. In their development of ridge regression, Hoerl and Kennard focus attention on the eigenvalues of the information matrix X'X. A seriously non-orthogonal problem is characterized by the fact that the smallest eigenvalue is very much smaller than unity. In our problem the smallest eigenvalue is 0.0093; this indicates that our data set suffers from a high degree of multicollinearity.
3. It should also be noted that the variance inflation factor (VIF) is an additional measure of multicollinearity. It is the reciprocal of (1 − Rⱼ²), where Rⱼ² is the squared multiple correlation coefficient between the j-th explanatory variable and the other explanatory variables. A VIF of 10 or more indicates a multicollinearity problem (see [10]). In our problem the largest VIF value is 65.16, which is an additional indication that our data set suffers from a high degree of multicollinearity.
4. Since one of the objectives of ridge regression is to reduce the standard errors of the regression coefficients, it is of interest to see how much reduction has taken place. For our problem it should be noted from tables (5) and (6) that the standard errors of the ridge regression coefficients are less than the corresponding standard errors of the least squares coefficients. We also note that the standard errors of the ridge regression coefficients obtained by using the conventional method for estimating the ridge parameter are less than the corresponding standard errors of the ridge regression coefficients obtained by using the Bayesian method.
From this comparison we conclude that the conventional method performed better than the Bayesian method for this data set.

References:
1. Draper, N.R. and Smith, H. (1981), "Applied Regression Analysis," second edition, John Wiley and Sons, New York.
2. Goldstein, M. and Smith, A.F.M. (1974), "Ridge-type estimators for regression analysis," Journal of the Royal Statistical Society, B36, 284-291.
3. Rao, C.R. (1973), "Linear Statistical Inference and Its Applications," second edition, John Wiley and Sons, New York.
4. Obenchain, R.L. (1978), "Good and optimal ridge estimators," Ann. Statist., 6, 1111-1121.
5. Lindley, D.V. and Smith, A.F.M. (1972), "Bayes estimates for the linear model," Journal of the Royal Statistical Society, B34, 1-41.
6. Srivastava, M.S. (2002), "Methods of Multivariate Statistics," Wiley, New York.
7. Hogg, R.V. and Craig, A.T. (1978), "Introduction to Mathematical Statistics," fourth edition, Macmillan Pub. Co., New York.
8. Hoerl, A.E. and Kennard, R.W. (1970), "Ridge regression: Biased estimation for nonorthogonal problems," Technometrics, 12, 55-67.
9. Chatterjee, S. and Price, B. (1977), "Regression Analysis by Example," John Wiley and Sons, New York.
10. Marquardt, D.W. (1970), "Generalized inverses, ridge regression, biased linear estimation, and nonlinear estimation," Technometrics, 12, 591-612.
11. Al-Jibouri, Khalid Ibrahim Salman (2004), "Human Resources Planning and Its Role in the Development of the Industrial Sector in Iraq for the Period 1970-1990," M.Sc. thesis submitted to the council of the High Institute for Political and International Studies, Al-Mustansiriya University.
Table (1): Effects of the five explanatory variables x1, x2, ..., x5 on the response variable y

  y      X1     X2      X3      X4      X5
 1271   1516    473    24524   50096   14216
 1440   1514   1923    26012   57133   16467
 1355   2037   2490    30360   66795   18697
 1509   2019   2734    29385   66263   18716
 1254   2210   3322    36619   61718   20696
 1220   2234   3563    37393   61028   23474
 1546   2525   4205    40034   69344   25283
 1916   2381   4293    10310   66181   26172
 2381   2944   5278    49163   62039   28108
 2585   3528   8529    57299   69339   30707
 2810   4308   9661    60965   69671   32561
 1440   4444   9840    56600   69047   33609
 2493   4588  10830    55789   65441   32789
 3285   4075  12968    44125   44175   52340
 3062   4563  12875    46492   52476   47633
 3403   4031  12163    46599   48322   63398
 2875   4479  12712    49070   49672   62205
 2861   4343  12610    49035   47330   47530
 2596   4299  12615    48013   46160   46260
 1710   4345  12630    40860   45123    4510
 1777   4865  12955    49095   44925   44915

Table (2): Correlation matrix

        X1         X2         X3         X4         X5         y
X1    1.000000   0.967806   0.780634  -0.380475   0.821688   0.749380
X2    0.967806   1.000000   0.688120  -0.541955   0.912351   0.790267
X3    0.780634   0.688120   1.000000  -0.028945   0.513239   0.631133
X4   -0.380475  -0.541955  -0.028945   1.000000  -0.645230  -0.295671
X5    0.821688   0.912351   0.513239  -0.645230   1.000000   0.806230
y     0.749380   0.790267   0.631133  -0.295671   0.806230   1.000000

Table (3): Eigenvalues of correlations

No.   Eigenvalue   Incremental percent   Cumulative percent   Condition number
1     3.623598     72.47                 72.47                1.00
2     1.051811     21.04                 93.51                3.45
3     0.211248     4.22                  97.73                17.15
4     0.104043     2.08                  99.81                34.83
5     0.009300     0.19                  100.00               389.65

Table (4): Eigenvectors of correlations

No.   Eigenvalue     X1         X2         X3         X4         X5
1     3.623598    -0.502625  -0.518719  -0.284390   0.304255  -0.487824
2     1.051811    -0.199033  -0.007497  -0.592083  -0.751940   0.210802
3     0.211248     0.190854   0.222736  -0.539120   0.569859   0.425416
4     0.104043     0.552851   0.292886  -0.303237  -0.109447  -0.710381
5     0.009300     0.604718  -0.771674  -0.035429  -0.074329   0.179037

Table (5): Ridge vs. least squares comparison for k = 0.125260

Explanatory   Standardized       Standardized     Ridge    LS      Ridge        LS
variable      ridge coefficient  LS coefficient   VIF      VIF     std. error   std. error
X1            0.0126             -0.9859          0.955    42.54   0.0884       0.4965
X2            0.2329              1.1247          0.637    65.16   0.0184       0.1562
X3            0.1695              0.2634          1.248    3.326   0.0093       0.0127
X4            0.1653              0.4680          1.085    2.808   0.0114       0.0154
X5            0.5359              0.7006          1.447    9.261   0.0085       0.0181

Table (6): Ridge vs. least squares comparison for k = 0.494107

Explanatory   Standardized       Standardized     Ridge    LS      Ridge        LS
variable      ridge coefficient  LS coefficient   VIF      VIF     std. error   std. error
X1            0.1319             -0.9859          0.1892   42.54   0.0447       0.4965
X2            0.1970              1.1247          0.1254   65.16   0.0092       0.1562
X3            0.1563              0.2634          0.3861   3.326   0.0059       0.0127
X4            0.0452              0.4680          0.4101   2.808   0.0079       0.0154
X5            0.3126              0.7006          0.2951   9.261   0.0044       0.0181

Employing the Bayesian Method in Ridge Regression Analysis

Hazim Mansour Gorgees
Department of Mathematics, College of Education Ibn Al-Haitham, University of Baghdad

Abstract (Arabic-language summary, translated):
In this research the Bayesian method is employed in ridge regression analysis and in estimating the ridge parameter, under the assumption that prior information about the parameters of the general linear regression model is available and that the model suffers from a high degree of imperfect multicollinearity. A new approach was also used to derive the estimator of the ridge parameter proposed by Hoerl and Kennard in 1970, and through the study of a numerical example a comparison of the performance of these estimators was carried out.