Ibn Al-Haitham Jour. for Pure & Appl. Sci. Vol. 26 (1) 2013

Employing Ridge Regression Procedure to Remedy the Multicollinearity Problem

Hazim M. Gorgees, Bushra A. Ali
Dept. of Mathematics / College of Education for Pure Science (Ibn Al-Haitham) / University of Baghdad

Received: 11 September 2012. Accepted: 15 October 2012

Abstract
In this paper we introduce several ridge regression methods for treating the multicollinearity problem in the linear regression model. These methods comprise two types of ordinary ridge regression, (ORR1) and (ORR2), distinguished by the choice of ridge parameter, as well as generalized ridge regression (GRR). The methods were applied to a dataset suffering from a high degree of multicollinearity; according to the criteria of mean square error (MSE) and coefficient of determination (R2), the (GRR) method was found to perform better than the other two.

Keywords: Ordinary ridge regression, Generalized ridge regression, Shrinkage estimators, Singular value decomposition, Coefficient of determination.

Introduction
In this paper we deal with the classical linear regression model
y = Xβ + ε .....(1)
where y is the (n×1) vector of the response variable, X is the (n×p) matrix (n > p) of explanatory variables, β is the (p×1) vector of unknown parameters and ε is the (n×1) vector of unobservable random errors, with E(ε) = 0 and var(ε) = σ²I.
Considerable attention is currently being focused on biased estimation of the parameters of a linear regression model. This attention is due to the inability of classical least squares to provide reasonable point estimates when the matrix of regression variables is ill-conditioned. Despite possessing the very desirable property of being minimum variance in the class of linear unbiased estimators under the usual conditions imposed on the model, the least squares estimators can nevertheless have extremely large variances when the data are multicollinear, which is one form of ill-conditioning. Much research is therefore being conducted on obtaining biased estimators with better overall performance than the least squares estimator. This paper discusses ridge regression estimators for use with multicollinear data. In contrast to least squares, these estimators allow a small amount of bias in order to achieve a major reduction in variance. A numerical example is included to illustrate the theoretical relationships.

The Case of Multicollinearity
The problem of multicollinearity occurs when there exists a linear relationship, or an approximate linear relationship, among two or more explanatory variables. Two types of multicollinearity may be faced in regression analysis: perfect and near multicollinearity. As an example of perfect multicollinearity, suppose the three components of a mixture are studied by including their percentages of the total, p1, p2, p3; obviously these variables will satisfy the exact linear relationship p1 + p2 + p3 = 100. During regression calculations, an exact linear relationship causes a division by zero, which in turn causes the calculations to be aborted. When the relationship is not exact, the division by zero does not occur and the calculations are not aborted; however, division by a very small quantity still distorts the results.
Hence, one of the first steps in a regression analysis is to determine whether multicollinearity is a problem. Multicollinearity can be thought of as a situation where two or more explanatory variables in the data set move together. As a consequence, it is impossible to use such a data set to decide which of the explanatory variables is producing the observed change in the response variable. Moreover, multicollinearity can create inaccurate estimates of the regression coefficients. To deal with multicollinearity we must be able to identify its source, since the source affects the analysis, the corrections and the interpretation of the linear model. The sources of multicollinearity may be summarized as follows: [1]
1- Data collection. In this case the data have been collected from a narrow subspace of the explanatory variables. The multicollinearity has been created by the sampling methodology and does not exist in the population; obtaining more data over an expanded range would cure this multicollinearity problem.
2- Physical constraints on the linear model or population. This source will exist no matter what sampling technique is used. Many manufacturing or service processes have constraints on explanatory variables (as to their range), whether physical, political or legal, which will create multicollinearity. Moreover, extreme values or outliers in the X space can cause multicollinearity.
Some multicollinearity is nearly always present, but the important point is whether the multicollinearity is serious enough to cause appreciable damage to the regression analysis. Indicators of multicollinearity include: a low determinant of the information matrix X′X; a very high correlation among two or more explanatory variables; a very high correlation among two or more estimated coefficients; and very small (near zero) eigenvalues of the correlation matrix of the explanatory variables.
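These indicators can be checked numerically. The sketch below is a minimal illustration using NumPy and synthetic data (not the paper's dataset, which appears later): it computes the determinant of the correlation matrix, the largest pairwise correlation and the eigenvalues of the correlation matrix.

```python
import numpy as np

def collinearity_indicators(X):
    """Simple indicators listed above: determinant of the correlation
    matrix, largest pairwise correlation, and its eigenvalues."""
    R = np.corrcoef(X, rowvar=False)           # correlation matrix of the columns
    off = R - np.eye(R.shape[0])               # zero out the unit diagonal
    return {
        "det": np.linalg.det(R),               # near zero signals multicollinearity
        "max_abs_corr": np.abs(off).max(),     # near one signals multicollinearity
        "eigenvalues": np.linalg.eigvalsh(R),  # near-zero eigenvalues likewise
    }

# Two nearly collinear columns plus one independent column.
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
X = np.column_stack([x1, x1 + 1e-3 * rng.normal(size=50), rng.normal(size=50)])
ind = collinearity_indicators(X)
```

On data like this, the determinant and smallest eigenvalue are both close to zero, while the largest off-diagonal correlation is close to one.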
Moreover, the Farrar-Glauber test based on the Chi-square statistic may be used to detect multicollinearity. Accordingly, the null hypothesis to be tested is:
H0: the Xj are orthogonal, j = 1,2,...,p
against the alternative H1: the Xj are not orthogonal. The test statistic is
χ² = −[(n−1) − (1/6)(2p+5)] ln|D| .....(2)
where n is the number of observations, p is the number of explanatory variables and |D| is the determinant of the correlation matrix. Comparing the calculated value of χ² with the theoretical value at p(p−1)/2 degrees of freedom and a specified level of significance, we reject H0 if the calculated value exceeds the theoretical value, which means that the dataset suffers from a multicollinearity problem; otherwise the null hypothesis H0 cannot be rejected.

The Shrinkage Estimators
Applying the singular value decomposition, we can decompose an (n×p) matrix into three matrices as follows:
X = H D^(1/2) G′ .....(3)
where H is an (n×p) semi-orthogonal matrix satisfying H′H = Ip, D^(1/2) is a (p×p) diagonal matrix of the ordered singular values of X, d1^(1/2) ≥ d2^(1/2) ≥ ... ≥ dp^(1/2) > 0, and G is a (p×p) orthogonal matrix whose columns are the eigenvectors of X′X. Accordingly, the ordinary least squares estimator of the regression parameter vector β can be written as:
bOLS = (X′X)⁻¹X′Y = GC
where C = D^(−1/2)H′Y is a (p×1) vector containing the uncorrelated components of bOLS [2]. The generalized shrinkage estimator, denoted by bSH, may be defined as: [1]
bSH = G∆C = Σ (j=1 to p) δj cj gj .....(4)
where gj is the j-th column of the matrix G, δj is the j-th diagonal element of the diagonal matrix of shrinkage factors ∆, 0 ≤ δj ≤ 1, j = 1,2,...,p, and cj is the j-th element of the uncorrelated component vector C.
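As a sketch of how the test statistic in equation (2) could be computed, the following NumPy illustration uses synthetic collinear data rather than the paper's dataset; the hard-coded critical value is the tabulated χ² 0.95 quantile for this example's 3 degrees of freedom.

```python
import numpy as np

def farrar_glauber(X):
    """Farrar-Glauber statistic of equation (2):
    chi2 = -[(n-1) - (1/6)(2p+5)] * ln|D|, with p(p-1)/2 d.f."""
    n, p = X.shape
    D = np.corrcoef(X, rowvar=False)  # correlation matrix of the regressors
    stat = -((n - 1) - (2 * p + 5) / 6.0) * np.log(np.linalg.det(D))
    df = p * (p - 1) // 2
    return stat, df

# Synthetic data with one nearly redundant column.
rng = np.random.default_rng(1)
x1 = rng.normal(size=40)
X = np.column_stack([x1, 0.99 * x1 + 0.05 * rng.normal(size=40),
                     rng.normal(size=40)])
stat, df = farrar_glauber(X)
critical = 7.815              # tabulated chi-square 0.95 quantile at df = 3
reject_h0 = stat > critical   # True => multicollinearity is present
```

With a nearly redundant column, |D| is close to zero, ln|D| is strongly negative, and the statistic far exceeds the critical value, so H0 is rejected.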
Ordinary Ridge Regression Estimators
One of several methods that have been proposed to remedy the multicollinearity problem, by modifying the method of least squares to allow biased estimators of the regression coefficients, is the ridge regression method. The ridge estimator depends crucially upon an exogenous parameter k, called the ridge parameter or the biasing parameter of the estimator. For any k ≥ 0, the corresponding ridge estimator, denoted by bRR, is defined as:
bRR = (X′X + kI)⁻¹X′Y .....(5)
where k ≥ 0 is a constant chosen by the statistician on the basis of some intuitively plausible criteria put forward by Hoerl and Kennard [3]. It can be shown that the ridge regression estimator given in (5) is a member of the class of shrinkage estimators as follows [2]. Using matrix algebra and the singular value decomposition of the matrix X we get:
bRR = (X′X + kI)⁻¹X′Y
= [G(D + kI)G′]⁻¹GD^(1/2)H′Y
= G(D + kI)⁻¹G′GD^(1/2)H′Y
= G(D + kI)⁻¹D^(1/2)H′Y
= G[(D + kI)⁻¹D]D^(−1/2)H′Y
= G∆C .....(6)
where ∆ = (D + kI)⁻¹D. Equivalently, the shrinkage factors δj of the ridge estimator have the form
δj = dj/(dj + k), j = 1,2,...,p .....(7)
where dj is the j-th diagonal element (eigenvalue) of the diagonal matrix D and k is the ridge parameter.

The Generalized Ridge Regression (GRR)
In this section we suggest using the singular value decomposition technique in order to derive the generalized ridge regression estimator for the first time (as far as we know). Let G be a (p×p) orthogonal matrix whose columns are the eigenvectors (g1, g2, ..., gp) of X′X. Hence G′(X′X)G = G′(GDG′)G = D = diag(d1, d2, ..., dp). Then we can rewrite the linear model as:
Y = Xβ + ε = (HD^(1/2))G′β + ε = X*α + ε .....
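The equivalence between equation (5) and the shrinkage form (6)-(7) can be verified numerically. The following sketch (NumPy, illustrative random data) computes bRR both ways; the two results agree to machine precision, and at k = 0 the shrinkage form reduces to OLS.

```python
import numpy as np

def ridge_direct(X, y, k):
    """Equation (5): bRR = (X'X + kI)^(-1) X'Y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

def ridge_svd(X, y, k):
    """Equations (6)-(7): bRR = G * delta * C via the SVD X = H D^(1/2) G'."""
    H, s, Gt = np.linalg.svd(X, full_matrices=False)  # s holds d_j^(1/2)
    d = s ** 2                      # eigenvalues d_j of X'X
    c = (H.T @ y) / s               # uncorrelated components c_j
    delta = d / (d + k)             # shrinkage factors of equation (7)
    return Gt.T @ (delta * c)

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 4))
y = rng.normal(size=30)
b1 = ridge_direct(X, y, 0.5)
b2 = ridge_svd(X, y, 0.5)
```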
(8) where X* = HD^(1/2) and α = G′β. This model is called the canonical linear model or the uncorrelated components model. The OLS estimator of α is given by:
αOLS = (X*′X*)⁻¹X*′Y = (D^(1/2)H′HD^(1/2))⁻¹X*′Y = D⁻¹X*′Y .....(9)
and var(αOLS) = σ²(X*′X*)⁻¹ = σ²D⁻¹, which is diagonal. This shows the important property of this parameterization: the elements of αOLS, namely (α1, α2, ..., αp)OLS, are uncorrelated. The ridge estimator of α is given by:
αRR = (X*′X* + K)⁻¹X*′Y = (D + K)⁻¹X*′Y .....(10)
= (D + K)⁻¹X*′X* αOLS = (I + KD⁻¹)⁻¹αOLS = wK αOLS = diag(di/(di + ki)) αOLS
where K is a diagonal matrix with entries (k1, k2, ..., kp). This estimator is known as the generalized ridge estimator. The mean square error of αRR is given by:
MSE(αRR) = tr[var(αRR)] + (bias αRR)′(bias αRR)
= σ² tr(wK D⁻¹ w′K) + α′OLS(wK − I)′(wK − I)αOLS
= σ² Σi di/(di + ki)² + Σi ki² α²(OLS)i/(di + ki)² .....(11)
To obtain the value of ki that minimizes MSE(αRR), we differentiate equation (11) with respect to ki and equate the resulting derivative to zero:
∂MSE(αRR)/∂ki = −2σ² di/(di + ki)³ + 2ki di α²(OLS)i/(di + ki)³ = 0
Solving for ki we obtain:
ki = σ²/α²(OLS)i
Since the value of σ² is usually unknown, we use the estimated value σ̂². Accordingly, when the matrix K satisfies
k̂i = σ̂²/α²(OLS)i, that is, K̂ = diag(σ̂²/α²(OLS)1, σ̂²/α²(OLS)2, ..., σ̂²/α²(OLS)p),
the mean square error of the generalized ridge regression estimator αRR attains its minimum value. The original form of the ridge regression estimator can be recovered from the canonical form by:
b(GRR) = G α(RR) .....(12)
All the basic results concerning the ordinary ridge regression estimator can be shown to hold for this more general formulation.
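A minimal sketch of the GRR recipe above, under stated assumptions: σ² is replaced by the usual residual-based estimate, the per-component ki follow the σ̂²/α²(OLS)i rule, and the data are synthetic (NumPy), not the refinery dataset.

```python
import numpy as np

def grr(X, y):
    """Generalized ridge: alpha_OLS in the canonical model (9),
    k_i = sigma_hat^2 / alpha_OLS_i^2, componentwise shrinkage as in (10),
    then b_GRR = G alpha_RR as in (12)."""
    n, p = X.shape
    H, s, Gt = np.linalg.svd(X, full_matrices=False)
    d = s ** 2
    alpha_ols = (H.T @ y) / s             # OLS in the canonical coordinates
    resid = y - X @ (Gt.T @ alpha_ols)
    sigma2 = resid @ resid / (n - p)      # residual-based estimate of sigma^2
    k = sigma2 / alpha_ols ** 2           # one ridge parameter per component
    alpha_rr = d / (d + k) * alpha_ols    # componentwise shrinkage
    return Gt.T @ alpha_rr                # back-transform to b_GRR

rng = np.random.default_rng(3)
x1 = rng.normal(size=40)
X = np.column_stack([x1, x1 + 0.01 * rng.normal(size=40), rng.normal(size=40)])
y = X @ np.array([1.0, 1.0, 2.0]) + rng.normal(size=40)
b_grr = grr(X, y)
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
```

Since every canonical component is shrunk by a factor in (0, 1) and G is orthogonal, the GRR coefficient vector always has smaller norm than the OLS one.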
Choice of Ridge Parameter
The ridge regression estimator does not provide a unique solution to the problem of multicollinearity but rather a family of solutions, depending on the value of k (the ridge biasing parameter). No explicit optimum value can be found for k; yet several stochastic choices have been proposed for this shrinkage parameter. Some of these choices may be summarized as follows: [4]
Hoerl and Kennard (1970) suggested a graphical method called the ridge trace to select the value of the ridge parameter k. This plot shows the ridge regression coefficients as a function of k. When viewing the ridge trace, the analyst picks the value of k for which the regression coefficients have stabilized. Often the regression coefficients vary widely for small values of k and then stabilize. We choose the smallest possible value of k (which introduces the smallest bias) after which the regression coefficients seem to remain constant.
Hoerl, Kennard and Baldwin (1975) proposed another method to select a single value of k, given as:
k̂(HKB) = pS² / (b′OLS bOLS) .....(13)
where p is the number of predictor variables, S² is the OLS estimator of σ², and bOLS is the OLS estimator of the vector of regression coefficients.
Lawless and Wang (1976) proposed selecting the value of k by the formula:
k̂(LW) = pS² / (b′OLS X′X bOLS) .....(14)
Hoerl and Kennard (1970) suggested an iterative method to estimate the value of k based on the formula:
kj+1 = pS² / ([bRR(kj)]′[bRR(kj)]) .....(15)
The first value of k is assumed to be zero, and hence [bRR(k0)]′[bRR(k0)] = b′OLS bOLS. Substituting k0 in the right-hand side of (15) we obtain the first adjusted value k1, which in turn is substituted in the right-hand side of equation (15) to obtain the second adjusted value k2; the iterations then continue until the following inequality is satisfied: [5]
(kj+1 − kj)/kj ≤ ε .....(16)
where ε is a small positive number (close to zero).
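The three stochastic choices, equations (13)-(15), can be sketched as follows (NumPy, synthetic collinear data; the helper `_ols_and_s2` is illustrative, not from the paper). The iterative rule uses |kj+1 − kj|/kj as the stopping test, a slightly more robust form of inequality (16).

```python
import numpy as np

def _ols_and_s2(X, y):
    """OLS coefficients and the residual variance estimate s^2."""
    n, p = X.shape
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    s2 = np.sum((y - X @ b) ** 2) / (n - p)
    return b, s2

def k_hkb(X, y):
    """Hoerl-Kennard-Baldwin choice, equation (13)."""
    b, s2 = _ols_and_s2(X, y)
    return X.shape[1] * s2 / (b @ b)

def k_lw(X, y):
    """Lawless-Wang choice, equation (14)."""
    b, s2 = _ols_and_s2(X, y)
    return X.shape[1] * s2 / (b @ X.T @ X @ b)

def k_iter(X, y, eps=1e-8, max_iter=200):
    """Hoerl-Kennard iteration, equations (15)-(16), starting from k0 = 0."""
    p = X.shape[1]
    _, s2 = _ols_and_s2(X, y)
    k = 0.0
    for _ in range(max_iter):
        b_rr = np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)
        k_new = p * s2 / (b_rr @ b_rr)
        if k > 0 and abs(k_new - k) / k <= eps:  # stopping rule (16)
            return k_new
        k = k_new
    return k

rng = np.random.default_rng(4)
x1 = rng.normal(size=40)
X = np.column_stack([x1, x1 + 0.01 * rng.normal(size=40), rng.normal(size=40)])
y = X @ np.array([1.0, 1.0, 2.0]) + rng.normal(size=40)
```

Since ||bRR(k)|| decreases as k grows, the iterates kj form an increasing sequence, so the iterative value is never below the one-shot HKB value.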
Hoerl and Kennard proposed the value of ε to be [5]
ε = 20 [trace((X′X)⁻¹)/p]^(−1.3) .....(17)
In the case of generalized ridge regression, Hoerl and Kennard proposed selecting ki as follows:
ki = S² / α²(OLS)i .....(18)

Numerical Example
In this section we apply the procedures discussed earlier, employing data obtained from the Midland Refineries Company to determine the effect of six factors (explanatory variables X1, X2, ..., X6) on the productivity of labor (response variable Y). The data are given in Table (1). Applying the Farrar-Glauber test given in equation (2), the calculated χ² is 137.456, while the theoretical value at 15 degrees of freedom and the 0.05 level of significance is 24.996. Obviously the calculated value is greater than the theoretical value of χ², which implies that the data suffer from a high degree of multicollinearity. Let us assume that ORR1 represents the ordinary ridge regression estimator with the ridge parameter obtained by Hoerl, Kennard and Baldwin, k̂(HKB); ORR2 represents the ordinary ridge regression estimator with the ridge parameter obtained by Lawless and Wang, k̂(LW); and GRR represents the generalized ridge regression estimator. Applying the formulas in equations (13), (14) and (18) we obtain:
k̂(HKB) = 0.0075410, k̂(LW) = 0.027179
k̂(GRR)i = [0.121641, 0.0117446, 0.0051056, 0.053459, 0.002102, 0.083658]
The computed variance inflation factors (VIF) for each explanatory variable, the mean square error (MSE) and the coefficient of determination R² for each method are presented in Tables (2), (3) and (4) respectively.
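The VIF values reported in Table (2) follow the standard definition VIFj = 1/(1 − Rj²), where Rj² comes from regressing Xj on the remaining explanatory variables. A minimal sketch (NumPy, synthetic data rather than the refinery dataset):

```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j
    on the other columns (plus an intercept)."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        xj = X[:, j]
        coef = np.linalg.lstsq(Z, xj, rcond=None)[0]
        rss = np.sum((xj - Z @ coef) ** 2)        # residual sum of squares
        tss = np.sum((xj - xj.mean()) ** 2)       # total sum of squares
        r2 = 1.0 - rss / tss
        out[j] = 1.0 / (1.0 - r2)
    return out

rng = np.random.default_rng(5)
x1 = rng.normal(size=50)
X = np.column_stack([x1, x1 + 0.02 * rng.normal(size=50), rng.normal(size=50)])
v = vif(X)
```

The two nearly collinear columns produce very large VIFs, while the independent column stays near 1; a common rule of thumb flags VIF values above 10.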
Conclusion
In addition to the Farrar-Glauber test, the large values of VIF in Table (2) are another indicator that our dataset suffers from a high degree of multicollinearity [6]. Since the GRR estimator has a smaller MSE and a larger R2 than the other two estimators (ORR1, ORR2), as shown in Tables (3) and (4), we conclude that GRR is better than ORR1 and ORR2 for remedying the multicollinearity problem in our dataset.

References
1. Obenchain, R.L. (1975), "Ridge analysis following a preliminary test of the shrunken hypothesis", Technometrics, 17, 431-445.
2. Gorgees, H.M. (2009), "Using singular value decomposition method for estimating the ridge parameter", Journal of Economics and Administrative Science, 53, 1-10.
3. Hoerl, A.E. and Kennard, R.W. (1970), "Ridge Regression: Biased Estimation for Nonorthogonal Problems", Technometrics, 12, 55-67.
4. El-Dereny, M. and Rashwan, N.I. (2011), "Solving Multicollinearity Problem Using Ridge Regression Models", Int. J. Contemp. Math. Sciences, 6, 585-600.
5. Draper, N.R. and Smith, H. (1981), "Applied Regression Analysis", Second Edition, John Wiley and Sons, New York.
6. Pasha, G.R. and Ali Shah, M.A. (2004), "Application of ridge regression to multicollinear data", Journal of Research (Science), Bahauddin Zakariya University, Multan, Pakistan, 15, 97-106.
Table (1): Effect of six explanatory variables X1, X2, ..., X6 on response variable Y

    Y      X1    X2     X3     X4     X5     X6
  3193      7    79    305    230   1580    337
  3506      8    80    390    266   1590    358
  5203      8    81    415    280   1610    416
  3118      9    84    425    330   1640    454
 10565      9    85    434    368   1642    465
 28245      7   137    692    416   1535    470
 34701     35   200    759    440   1894    574
 33660      3   833   2475   1222    353    533
 45240      4  1153   2480   1285    345    733
 51157      4  1285   2745   1141    311    873
 65085      4  1353   2854   1087    350    878
 62893      4  1331   2895   1082    382    908

Table (2): VIF for all variables

  Predictor   VIF
  X1           25.279
  X2          366.798
  X3          220.081
  X4           89.442
  X5          884.559
  X6           92.407

Table (3): MSE for each method

  Method   MSE
  ORR1     0.074426
  ORR2     0.093867
  GRR      0.069632

Table (4): R2 for each method

  Method   R-Square
  ORR1     95.94
  ORR2     94.88
  GRR      96.2018