IBN AL-HAITHAM J. FOR PURE & APPL. SCI. VOL. 23 (3) 2010

Bayesian Analysis of Ridge Regression Problems

H. M. Gorgees
Department of Mathematics, College of Education Ibn Al-Haitham, University of Baghdad

Abstract:
A Bayesian formulation of the ridge regression problem is considered, derived from a direct specification of prior information about the parameters of the general linear regression model when the data suffer from a high degree of multicollinearity. A new approach for deriving the conventional estimator of the ridge parameter proposed by Hoerl and Kennard (1970), as well as a Bayesian estimator, is presented. A numerical example is studied in order to compare the performance of these estimators.

Introduction:
The problem of multicollinearity exists when there is a linear relationship, or an approximate linear relationship, among two or more explanatory variables. Multicollinearity can be thought of as a situation where two or more explanatory variables in the data set move together. As a consequence, it is impossible to use such a data set to decide which of the explanatory variables is producing the observed change in the response variable. No treatment of the data or transformation of the model will cure this deficiency. Consequently, the best way to deal with multicollinearity may be to find a different data set, or additional data, to break the association between the related variables. However, some multicollinearity is nearly always present; the important point is whether it is serious enough to cause appreciable damage to the regression. Indicators of multicollinearity include a low determinant of the information matrix, very high correlations among two or more explanatory variables, very high correlations among two or more estimated coefficients, and a significant regression of one explanatory variable on one or more of the other explanatory variables.
Key words: linear regression model, multicollinearity, ridge regression, generalized shrinkage estimators, Bayesian estimator, singular value decomposition.

This paper deals with multicollinearity in the classical linear regression model

y = Xβ + u                                                  .....(1)

where y is an (n×1) vector of observations on the response variable, X = (x₁, x₂, ..., x_p) is an (n×p) matrix of full column rank, β is a (p×1) parameter vector (the vector of unknown regression coefficients), and u is an (n×1) vector of random disturbances with E(u) = 0 and var(u) = σ²I; both β and σ² are unknown. The least squares estimator of β is (see [1]):

β̂ = (X'X)⁻¹X'y                                             .....(2)

where β̂ denotes the least squares estimator of β. The two key properties of β̂ are that it is unbiased, E(β̂) = β, and that it has minimum variance among all linear unbiased estimators. The total mean square error of β̂ is (see [2]):

MSE(β̂) = σ² Σᵢ₌₁ᵖ (1/λᵢ)                                    .....(3)

where the λᵢ's are the eigenvalues of X'X and λ₁ ≥ λ₂ ≥ ... ≥ λ_p > 0. If the smallest eigenvalue of X'X is very much smaller than 1, a seriously ill-conditioned (multicollinearity) problem arises. Thus, for ill-conditioned data, the least squares solution yields coefficients whose absolute values are too large and whose signs may actually reverse with negligible changes in the data. That is, in the case of multicollinearity the least squares estimator can be poor in terms of various mean squared error criteria. Consequently, a great deal of work has been done to construct alternatives to the least squares estimator when multicollinearity is present. In the seventies, Hoerl and Kennard introduced a class of biased estimators for the parameters of the general linear regression model, labeled ridge estimators, as a rival to the least squares estimator when sample data are affected by a high degree of multicollinearity.
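The ill-conditioning described above is easy to reproduce. The following is a minimal numerical sketch (NumPy, synthetic data; all variable names are illustrative, not from the paper): it computes the least squares estimator (2) via the normal equations and shows that near-collinear columns make the smallest eigenvalue of X'X, and hence the spread λ₁/λ_p, extreme.

```python
# Sketch: OLS via the normal equations on nearly collinear data,
# and the eigenvalues of X'X that diagnose multicollinearity.
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 1e-4 * rng.normal(size=n)      # x2 nearly collinear with x1
X = np.column_stack([x1, x2])
y = X @ np.array([1.0, 2.0]) + 0.1 * rng.normal(size=n)

XtX = X.T @ X
beta_ols = np.linalg.solve(XtX, X.T @ y)         # (X'X)^{-1} X'y
eigvals = np.linalg.eigvalsh(XtX)                # ascending eigenvalues of X'X
condition_ratio = eigvals.max() / eigvals.min()  # huge when columns are collinear
```

With near-collinear columns the smallest eigenvalue is orders of magnitude below the largest, so the inverse in (2) inflates the coefficient variances, exactly the situation ridge regression addresses.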
The authors show that in any given problem there is at least one member of this class which has total mean square error smaller than the total variance of the corresponding least squares estimator. The ridge estimator depends crucially upon an exogenous parameter, say k. For any k the corresponding ridge estimator, denoted by β̂(k), is defined to be (see [1]):

β̂(k) = (X'X + kI)⁻¹X'y                                     .....(4)

We argue that prior information about the parameters is typically available, and that this may be exploited to find improved estimators. In this paper attention is focused upon a Bayesian formulation of generalized shrinkage estimators and the ordinary ridge regression estimator; moreover, a Bayes estimator as well as a conventional estimator of the ridge parameter is derived.

Generalized Shrinkage Estimators:
Given an (n×p) matrix of regressors X and an (n×1) vector of corresponding responses y, assume that sample means have been removed from the data (so that 1'X = 0' and 1'y = 0, where 1 is an n-vector of ones) and write the standard linear regression model as E(y|X) = Xβ and var(y|X) = σ²(I − 11'/n), where β is a vector of unknown regression coefficients and σ² is the unknown error variance. The singular value decomposition of X will be denoted by (see [3]):

X = HΛ^(1/2)G'                                              .....(5)

where H is an (n×p) semi-orthogonal matrix satisfying H'H = I_p, Λ^(1/2) is a (p×p) diagonal matrix of the ordered singular values of X, λ₁^(1/2) ≥ λ₂^(1/2) ≥ ... ≥ λ_p^(1/2) > 0, and G is a (p×p) orthogonal matrix whose columns represent the eigenvectors of the information matrix X'X. Assume that λ_p > 0, so that β is estimable; then, as in Obenchain (1978) (see [4]), we get β̂ = GC, where C = Λ^(−1/2)H'y contains the uncorrelated components of β̂, with E(C) = E(G'β̂) = G'β = γ, say, and:

var(C) = var(G'β̂) = G'var(β̂)G = σ²G'(X'X)⁻¹G = σ²Λ⁻¹

Notice that the elements of C are uncorrelated, since their variance matrix is diagonal.
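The decomposition (5) and the uncorrelated components C = Λ^(−1/2)H'y can be verified numerically. A minimal sketch (NumPy, synthetic data; names illustrative), using the fact that numpy.linalg.svd returns the factors of X = HΛ^(1/2)G' directly:

```python
# Sketch: SVD of a mean-centered X, the uncorrelated components C,
# and the identity beta_hat = G C for the least squares estimator.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4))
X = X - X.mean(axis=0)          # remove sample means so 1'X = 0'
y = rng.normal(size=30)
y = y - y.mean()                # 1'y = 0

H, s, Gt = np.linalg.svd(X, full_matrices=False)  # X = H diag(s) Gt, s = singular values
C = (H.T @ y) / s               # uncorrelated components: Lambda^{-1/2} H'y
beta_via_svd = Gt.T @ C         # G C
beta_ls = np.linalg.lstsq(X, y, rcond=None)[0]    # ordinary least squares
```

The two estimates agree, and H'H = I confirms the semi-orthogonality assumed in (5).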
The vector of generalized shrinkage estimators (or generalized ridge regression estimators) will be denoted here by β̂* and has the general form (see [4]):

β̂* = G∆C = Σⱼ₌₁ᵖ δⱼcⱼgⱼ                                     .....(6)

where gⱼ is the j-th column of the matrix G, δⱼ is the j-th diagonal element of the shrinkage factors matrix ∆ (we will usually restrict the range of the shrinkage factors to 0 ≤ δⱼ ≤ 1, j = 1, 2, ..., p), and cⱼ is the j-th element of the uncorrelated components vector C. In the remainder of this section we discuss Bayesian methods for defining the form of shrinkage of sample estimates towards a subjective prior distribution. Lindley and Smith (1972) describe a Bayesian formalism for hierarchical (multi-stage) analysis of linear models using conjugate multivariate normal prior distributions (see [5]). This formalism expresses the unknown parameters at each stage of an analysis in terms of a linear model at the previous, lower stage. But, although the dispersion matrices at each stage can be arbitrary, they must be known; and, at the final stage, both the mean vector and the dispersion matrix must be known. The fundamental lemma of Lindley and Smith (1972) states:

Lemma: If the sampling distribution of the response y is y|θ₁ ~ N(A₁θ₁, C₁), where θ₁ is a (p₁×1) parameter vector, and the prior distribution is θ₁ ~ N(A₂θ₂, C₂), where θ₂ is a (p₂×1) parameter vector, then the marginal (unconditional) distribution of y is:

y ~ N(A₁A₂θ₂, C₁ + A₁C₂A₁')                                 .....(7)

and the posterior (conditional) distribution of θ₁ given y is:

θ₁|y ~ N(Bb, B)                                             .....(8)

where B⁻¹ = A₁'C₁⁻¹A₁ + C₂⁻¹ and b = A₁'C₁⁻¹y + C₂⁻¹A₂θ₂.

To apply this lemma and demonstrate that a simple 2-stage Bayesian formalism produces generalized shrinkage estimators, we first make the identification A₁θ₁ = Xβ and C₁ = σ²I, so that B⁻¹ = X'X/σ² + C₂⁻¹.
Next, we set the prior mean value for β to zero by taking A₂θ₂ = 0, and assume that C₂ (and C₂⁻¹) are simultaneously diagonalizable with X'X by restricting attention to prior variance-covariance matrices of the general form C₂ = σ²GK⁻¹G', where K is a diagonal p×p matrix and G is defined as in (5). Now the Bayes estimate is the mean Bb of the posterior distribution of β given y, and this mean vector has the general form:

E(β|y) = Bb = (X'X/σ² + C₂⁻¹)⁻¹ X'y/σ²
            = (X'X + GKG')⁻¹X'y
            = G(Λ + K)⁻¹G'X'y
            = G(Λ + K)⁻¹Λ^(1/2)H'y
            = G(Λ + K)⁻¹Λ Λ^(−1/2)H'y
            = G∆C = β̂*                                      .....(9)

where ∆ = (Λ + K)⁻¹Λ is the diagonal matrix of generalized shrinkage factors and C is the vector of uncorrelated components of the least squares estimator.

The Ordinary Ridge Regression Estimator:
In the previous section we demonstrated that all generalized shrinkage estimators are 2-stage Bayes estimators. This includes, of course, the ordinary ridge regression estimator proposed by Hoerl and Kennard (1970). To demonstrate this fact, let us assume that an orthogonal matrix P is given such that P'X'XP = D = diag(λ₁, ..., λ_p), λ₁ ≥ ... ≥ λ_p > 0. Moreover, suppose that Z = P'β̂ = (z₁, ..., z_p)' and W = P'β = (w₁, ..., w_p)'; then Z ~ N(W, σ²D⁻¹). Let us assume that W has a prior distribution given by W ~ N(0, λI) for some positive constant λ. Thus, according to the lemma stated in section 2, the posterior distribution of W given Z is:

W|Z ~ N[λ(λI + σ²D⁻¹)⁻¹Z, λI − λ²(λI + σ²D⁻¹)⁻¹]            .....(10)

Then the Bayes estimator of W is Ŵ = λ(λI + σ²D⁻¹)⁻¹Z = (D + kI)⁻¹DZ for Z = P'β̂ and k = σ²/λ. Consequently, the Bayes estimator of β is:

PŴ = P(D + kI)⁻¹DP'β̂ = (X'X + kI)⁻¹X'y = β̂(k)               .....(11)

It is obvious that the estimator in (11) coincides with the ordinary ridge regression estimator given in (4).

Estimating the Ridge Parameter: Bayesian Approach:
There are many different methods for selecting the value of the ridge parameter k. The method we try here is based upon the lemma stated in section 2. Accordingly, the marginal distribution of Z is:

Z ~ N(0, σ²D⁻¹ + λI)
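The coincidence of the Bayes estimator (11) with the ordinary ridge estimator (4) can be checked numerically. A minimal sketch (NumPy, synthetic data; names illustrative): it forms (X'X + kI)⁻¹X'y directly, and separately shrinks each component of Z = P'β̂ by λᵢ/(λᵢ + k) and rotates back with P.

```python
# Sketch: ordinary ridge estimator vs. its Bayes / diagonalized form
# P (D + kI)^{-1} D P' beta_hat, with P'X'XP = D.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(40, 3))
y = rng.normal(size=40)
k = 0.5                                   # illustrative ridge parameter

XtX = X.T @ X
beta_ridge = np.linalg.solve(XtX + k * np.eye(3), X.T @ y)   # (X'X + kI)^{-1} X'y

D, P = np.linalg.eigh(XtX)                # eigenvalues D, orthogonal P
beta_hat = np.linalg.solve(XtX, X.T @ y)  # least squares estimator
Z = P.T @ beta_hat
W_bayes = (D / (D + k)) * Z               # shrink component i by lambda_i/(lambda_i + k)
beta_bayes = P @ W_bayes
```

The two vectors coincide, which is exactly the identity asserted in (11).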
Hence

E(Z'DZ) = tr[D var(Z)] = tr[D(σ²D⁻¹ + λI)] = pσ² + λ tr(D)  .....(12)

where "tr" denotes the trace of the matrix. To find an unbiased estimator for λ, the following result is necessary (see [6]):

E(1/σ̂²) = (n − p − 1)/[(n − p − 3)σ²]                       .....(13)

where σ̂² is the usual unbiased estimator of σ². The result in (13) can easily be proved by setting (n − p − 1)σ̂²/σ² = r; then r ~ χ²(n − p − 1), and

E(1/σ̂²) = [(n − p − 1)/σ²] E(1/r) = (n − p − 1)/[(n − p − 3)σ²]   (see [7], page 176)

From (12) and (13) an estimator for λ can be obtained as:

λ̂ = σ̂² [((n − p − 3)/(n − p − 1))(Z'DZ/σ̂²) − p] / tr(D)

or equivalently, since Z'DZ = β̂'X'Xβ̂ and tr(D) = tr(X'X):

λ̂ = σ̂² [((n − p − 3)/(n − p − 1))(β̂'X'Xβ̂/σ̂²) − p] / tr(X'X)   .....(14)

The Bayesian estimate of the ridge parameter is then k̂ = σ̂²/λ̂.

Estimating the Ridge Parameter: New Approach:
Substituting ∆ = (Λ + kI)⁻¹Λ in formula (6), it can easily be shown that the ordinary ridge regression estimator is a member of the class of generalized shrinkage estimators; in this case the shrinkage factors of the ordinary ridge regression estimator have the form:

δᵢ = λᵢ/(λᵢ + k),  0 ≤ δᵢ ≤ 1                               .....(15)

In (1970) Hoerl and Kennard proposed an estimator for the ridge parameter given as (see [8]):

k̂ = σ̂²/γ̂²ₘₐₓ                                               .....(16)

In this section, a new approach is used to derive the estimator given in (16). This approach is based on minimizing MSE(β̂*), as follows:

MSE(β̂*) = MSE(G∆C) = G MSE(∆C) G'

where

MSE(∆C) = σ²∆Λ⁻¹∆ + (I − ∆)γγ'(I − ∆)

is the mean squared error matrix of ∆C, with i-th diagonal element given as (see [3]):

MSE(δᵢcᵢ) = δᵢ²σ²/λᵢ + (1 − δᵢ)²γᵢ²                         .....(17)

Differentiating MSE(δᵢcᵢ) with respect to δᵢ, we obtain:

∂MSE(δᵢcᵢ)/∂δᵢ = 2δᵢσ²/λᵢ − 2(1 − δᵢ)γᵢ²                    .....(18)

Since the second partial derivative is a nonnegative constant, equating the derivative in (18) to zero yields the minimum value of MSE(δᵢcᵢ). This optimal amount of shrinkage for the i-th uncorrelated component is:

δᵢ* = γᵢ²/(σ²/λᵢ + γᵢ²) = λᵢ/(λᵢ + σ²/γᵢ²)                  .....(19)

We derive the formula in (19) as follows: setting 2δᵢσ²/λᵢ − 2(1 − δᵢ)γᵢ² = 0 gives δᵢσ²/λᵢ = γᵢ² − δᵢγᵢ², hence δᵢσ²/λᵢ + δᵢγᵢ² = γᵢ², and solving for δᵢ we obtain the required result. Comparing the shrinkage factor δᵢ of the ridge regression estimator given in equation (15) with δᵢ* given in equation (19), we conclude that the value of the ridge parameter k must be equal to σ²/γᵢ².
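The closed form (19) can be confirmed against a brute-force minimization of (17), and checked against the ridge form (15) once k = σ²/γᵢ². A minimal sketch (NumPy; the numerical values of σ², λᵢ and γᵢ² are illustrative):

```python
# Sketch: MSE(delta) = delta^2 sigma^2/lambda + (1 - delta)^2 gamma^2
# is minimized at delta* = gamma^2/(sigma^2/lambda + gamma^2),
# which equals lambda/(lambda + k) with k = sigma^2/gamma^2.
import numpy as np

sigma2, lam, gamma2 = 2.0, 0.5, 3.0     # illustrative values

def mse(delta):
    return delta**2 * sigma2 / lam + (1 - delta)**2 * gamma2

delta_opt = gamma2 / (sigma2 / lam + gamma2)   # closed form (19)
k = sigma2 / gamma2
delta_ridge = lam / (lam + k)                  # ridge form (15) at k = sigma^2/gamma^2

grid = np.linspace(0.0, 1.0, 100001)           # brute-force check over [0, 1]
delta_grid = grid[np.argmin(mse(grid))]
```

The grid minimizer agrees with the closed form, and both shrinkage expressions coincide.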
Since each of σ² and γᵢ² is unknown, we use their estimated values; thus:

k̂ = σ̂²/γ̂²ₘₐₓ                                               .....(20)

where σ̂² is the residual mean square in the analysis of variance table obtained from the standard least squares fit.

Numerical Example:
In this section a data set suffering from a high degree of multicollinearity is used. The data are from Al-Jibouri (2004) (see [11] for details). The study was designed to measure the effect of five explanatory variables x₁, x₂, ..., x₅ on the response variable y, where the explanatory variables represent the numbers of managers, technicians, skilled workers, unskilled workers and service workers respectively, while the response variable y represents the productivity of the industrial sector in Iraq, measured by the surplus value method, for the period of 21 years from 1970 to 1990. Our purpose is only to compare the performance of the Bayesian and conventional estimators of the ridge parameter. The original data are presented in table (1). For these data we found that the estimated values of the ridge parameter were k̂ = 0.125260 and k̂ = 0.494107, obtained by applying the formulas in (14) and (20) respectively. In order to carry out the ridge regression analysis we used the NCSS statistical system, and the results are given in tables (2) through (6). Pearson correlations for all variables are given in table (2). These correlation coefficients show which explanatory variables are highly correlated with the response variable and with each other. Explanatory variables that are highly correlated with one another may cause a multicollinearity problem. Table (3) gives an eigenvalue analysis of the explanatory variables after they have been centered and scaled. Notice that the incremental percent is the percent each eigenvalue contributes to the total. Percents near zero indicate a multicollinearity problem.
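The diagnostics used in this example, eigenvalues and condition numbers of the correlation matrix, together with the variance inflation factor VIFⱼ = 1/(1 − Rⱼ²) used in the conclusions, can be computed as follows. A minimal sketch (NumPy, synthetic data; names illustrative), using the standard identity that VIFⱼ is the j-th diagonal element of the inverse correlation matrix of the standardized regressors:

```python
# Sketch: condition numbers of the correlation matrix and VIFs
# as multicollinearity diagnostics (thresholds 100 and 10 as in the text).
import numpy as np

rng = np.random.default_rng(3)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)   # strongly related to x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

Xs = (X - X.mean(axis=0)) / X.std(axis=0)   # center and scale
R = (Xs.T @ Xs) / n                         # correlation matrix
eig = np.linalg.eigvalsh(R)
cond = eig.max() / eig                      # condition number for each eigenvalue

vif = np.diag(np.linalg.inv(R))             # VIF_j = [R^{-1}]_jj = 1/(1 - R_j^2)
```

Here the collinear pair x1, x2 produces condition numbers above 100 and VIFs above 10, while the unrelated x3 has a VIF near 1.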
The condition number is the largest eigenvalue divided by each corresponding eigenvalue. Condition numbers greater than 100 indicate a multicollinearity problem (see [9]).

Conclusions:
1. A new approach for estimating the ridge parameter was introduced by using the singular value decomposition technique.
2. In their development of ridge regression, Hoerl and Kennard focus attention on the eigenvalues of the information matrix X'X. A seriously non-orthogonal problem is characterized by the fact that the smallest eigenvalue is very much smaller than unity. In our problem the smallest eigenvalue is 0.0093; this indicates that our data set suffers from a high degree of multicollinearity.
3. It should also be noted that the variance inflation factor (VIF) is an additional measure of multicollinearity. It is the reciprocal of (1 − Rⱼ²), where Rⱼ² is the squared multiple correlation coefficient between the j-th explanatory variable and the other explanatory variables. A VIF of 10 or more indicates a multicollinearity problem (see [10]). In our problem the largest VIF value is 65.16, which is an additional indication that our data set suffers from a high degree of multicollinearity.
4. Since one of the objectives of ridge regression is to reduce the standard errors of the regression coefficients, it is of interest to see how much reduction has taken place. For our problem it should be noted from tables (5) and (6) that the standard errors of the ridge regression coefficients are less than the corresponding standard errors of the least squares coefficients. We also note that the standard errors of the ridge regression coefficients obtained by using the conventional method for estimating the ridge parameter are less than the corresponding standard errors of the ridge regression coefficients obtained by using the Bayesian method.
From this comparison we conclude that the conventional method performed better than the Bayesian method for this data set.

References:
1. Draper, N.R. and Smith, H. (1981), "Applied Regression Analysis," second edition, John Wiley and Sons, New York.
2. Goldstein, M. and Smith, A.F.M. (1974), "Ridge-type estimators for regression analysis," Journal of the Royal Statistical Society, B36, 284-291.
3. Rao, C.R. (1973), "Linear Statistical Inference and Its Applications," second edition, John Wiley and Sons, New York.
4. Obenchain, R.L. (1978), "Good and optimal ridge estimators," Ann. Statist., 6, 1111-1121.
5. Lindley, D.V. and Smith, A.F.M. (1972), "Bayes estimates for the linear model," Journal of the Royal Statistical Society, B34, 1-41.
6. Srivastava, M.S. (2002), "Methods of Multivariate Statistics," Wiley, New York.
7. Hogg, R.V. and Craig, A.T. (1978), "Introduction to Mathematical Statistics," fourth edition, Macmillan Pub. Co., New York.
8. Hoerl, A.E. and Kennard, R.W. (1970), "Ridge regression: Biased estimation for nonorthogonal problems," Technometrics, 12, 55-67.
9. Chatterjee, S. and Price, B. (1977), "Regression Analysis by Example," John Wiley and Sons, New York.
10. Marquardt, D.W. (1970), "Generalized inverses, ridge regression, biased linear estimation, and nonlinear estimation," Technometrics, 12, 591-612.
11. Al-Jibouri, Khalid Ibrahim Salman (2004), "Human Resources Planning and Its Role in the Development of the Industrial Sector in Iraq for the Period 1970-1990," M.Sc. thesis submitted to the council of the High Institute for Political and International Studies, Al-Mustansiriya University.
Table (1): Effects of the five explanatory variables x1, x2, ..., x5 on the response variable y

  y      X1     X2      X3      X4      X5
 1271   1516    473    24524   50096   14216
 1440   1514   1923    26012   57133   16467
 1355   2037   2490    30360   66795   18697
 1509   2019   2734    29385   66263   18716
 1254   2210   3322    36619   61718   20696
 1220   2234   3563    37393   61028   23474
 1546   2525   4205    40034   69344   25283
 1916   2381   4293    10310   66181   26172
 2381   2944   5278    49163   62039   28108
 2585   3528   8529    57299   69339   30707
 2810   4308   9661    60965   69671   32561
 1440   4444   9840    56600   69047   33609
 2493   4588  10830    55789   65441   32789
 3285   4075  12968    44125   44175   52340
 3062   4563  12875    46492   52476   47633
 3403   4031  12163    46599   48322   63398
 2875   4479  12712    49070   49672   62205
 2861   4343  12610    49035   47330   47530
 2596   4299  12615    48013   46160   46260
 1710   4345  12630    40860   45123    4510
 1777   4865  12955    49095   44925   44915

Table (2): Correlation matrix

        X1         X2         X3         X4         X5         y
X1    1.000000   0.967806   0.780634  -0.380475   0.821688   0.749380
X2    0.967806   1.000000   0.688120  -0.541955   0.912351   0.790267
X3    0.780634   0.688120   1.000000  -0.028945   0.513239   0.631133
X4   -0.380475  -0.541955  -0.028945   1.000000  -0.645230  -0.295671
X5    0.821688   0.912351   0.513239  -0.645230   1.000000   0.806230
y     0.749380   0.790267   0.631133  -0.295671   0.806230   1.000000

Table (3): Eigenvalues of correlations

No.   Eigenvalue   Incremental percent   Cumulative percent   Condition number
1     3.623598     72.47                 72.47                1.00
2     1.051811     21.04                 93.51                3.45
3     0.211248     4.22                  97.73                17.15
4     0.104043     2.08                  99.81                34.83
5     0.009300     0.19                  100.00               389.65

Table (4): Eigenvectors of correlations

No.   Eigenvalue     X1         X2         X3         X4         X5
1     3.623598    -0.502625  -0.518719  -0.284390   0.304255  -0.487824
2     1.051811    -0.199033  -0.007497  -0.592083  -0.751940   0.210802
3     0.211248     0.190854   0.222736  -0.539120   0.569859   0.425416
4     0.104043     0.552851   0.292886  -0.303237  -0.109447  -0.710381
5     0.009300     0.604718  -0.771674  -0.035429  -0.074329   0.179037

Table (5): Ridge vs. least squares comparison for k = 0.125260

Explanatory   Standardized       Standardized     Ridge    LS      Ridge        LS
variable      ridge coefficient  LS coefficient   VIF      VIF     std. error   std. error
X1            0.0126             -0.9859          0.955    42.54   0.0884       0.4965
X2            0.2329              1.1247          0.637    65.16   0.0184       0.1562
X3            0.1695              0.2634          1.248    3.326   0.0093       0.0127
X4            0.1653              0.4680          1.085    2.808   0.0114       0.0154
X5            0.5359              0.7006          1.447    9.261   0.0085       0.0181

Table (6): Ridge vs. least squares comparison for k = 0.494107

Explanatory   Standardized       Standardized     Ridge    LS      Ridge        LS
variable      ridge coefficient  LS coefficient   VIF      VIF     std. error   std. error
X1            0.1319             -0.9859          0.1892   42.54   0.0447       0.4965
X2            0.1970              1.1247          0.1254   65.16   0.0092       0.1562
X3            0.1563              0.2634          0.3861   3.326   0.0059       0.0127
X4            0.0452              0.4680          0.4101   2.808   0.0079       0.0154
X5            0.3126              0.7006          0.2951   9.261   0.0044       0.0181

Employing the Bayesian Method in Ridge Regression Analysis

Hazim Mansour Gorgees
Department of Mathematics, College of Education Ibn Al-Haitham, University of Baghdad

Abstract (Arabic-language summary, translated):
In this research the Bayesian method is employed in ridge regression analysis and in estimating the ridge parameter, under the assumption that prior information about the parameters of the general linear regression model is available and that the model suffers from a high degree of imperfect multicollinearity. A new approach was also used to derive the estimator of the ridge parameter proposed by Hoerl and Kennard in 1970, and through the study of a numerical example a comparison of the performance of these estimators was carried out.