https://doi.org/10.30526/31.1.1841
Ibn Al-Haitham Journal for Pure and Applied Sciences, Vol. 31 (1), 2018, Mathematics, pp. 212-221

The Comparison Between Different Approaches to Overcome the Multicollinearity Problem in Linear Regression Models

Hazim Mansoor Gorgees          Fatimah Assim Mahdi
Dept. of Mathematics, College of Pure Science (Ibn Al-Haithem), Baghdad University
hazim5656@yahoo.com            fatima_assim92@yahoo.com

Received: 10 September 2017, Accepted: 22 October 2017

Abstract
In the presence of the multicollinearity problem, parameter estimation based on the ordinary least squares procedure is unsatisfactory. In 1970, Hoerl and Kennard introduced an alternative method known as the ridge regression estimator. In this estimator, the ridge parameter plays an important role in estimation. Various methods have been proposed by statisticians to select the biasing constant (ridge parameter). Another popular method used to deal with the multicollinearity problem is the principal component method. In this paper, we employ simulation to compare the performance of the principal component estimator with several types of ordinary ridge regression estimators that differ in the value of the biasing constant (ridge parameter). The mean square error (MSE) is used as the criterion to assess the performance of these estimators.

Keywords: multicollinearity, ridge regression, ridge parameter, condition number, principal components.

Introduction
Consider the linear regression model

    y = Xβ + ε                                                     (1)

where y is an n×1 vector of the response variable, X is an n×p matrix of explanatory variables with n > p, β is a p×1 vector of unknown parameters, and ε is an n×1 vector of unobservable random errors with E(ε) = 0 and Var(ε) = σ²I. The aim of regression analysis is to estimate the numerical values of the linear model parameters. Biased estimators of the regression parameters have recently attracted the attention of many researchers, because the ordinary least squares procedure is unable to provide reasonable point estimates when the explanatory variables are multicollinear. Throughout the paper we consider ridge regression estimators and principal component estimators as alternatives to the ordinary least squares estimator for multicollinear data. Both the ridge regression and the principal component estimators allow a small amount of bias in order to achieve a major reduction in variance compared with ordinary least squares.

The Case of Multicollinearity
The problem of multicollinearity occurs when there exists an exact or approximate linear relationship among two or more explanatory variables; accordingly, two types of multicollinearity may be faced in regression analysis, exact and near multicollinearity. During the regression calculations, an exact linear relationship causes a division by zero, which aborts the calculations. When the relationship is not exact, the calculations are not aborted and no division by zero occurs; nevertheless, the results will be distorted whenever the division is done by a very small quantity. Therefore, determining whether multicollinearity is a problem is one of the first steps in a regression analysis.

Multicollinearity can be thought of as a situation where two or more explanatory variables in the data set move together; as a consequence, it is impossible to use the data set to decide which of the explanatory variables is producing the observed change in the response variable. Some multicollinearity is nearly always present, but the important point is whether it is serious enough to cause appreciable damage to the regression analysis. Indicators of multicollinearity include a low determinant of the information matrix X'X, very high correlation among two or more explanatory variables, very high correlation among two or more estimated coefficients, very small (near zero) eigenvalues of the correlation matrix of the explanatory variables, and a very large condition number.
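For illustration, the indicators listed above can be computed directly from a standardized design matrix. The following is a minimal sketch (not part of the original paper) assuming the predictors are stored column-wise in a NumPy array X; all names are illustrative.

```python
import numpy as np

def collinearity_diagnostics(X):
    """Indicators of multicollinearity for a standardized design matrix X."""
    XtX = X.T @ X
    eigvals = np.linalg.eigvalsh(XtX)                 # eigenvalues of X'X (ascending)
    sing_vals = np.linalg.svd(X, compute_uv=False)    # singular values of X
    corr = np.corrcoef(X, rowvar=False)               # correlation matrix of the predictors
    return {
        "det(X'X)": np.linalg.det(XtX),               # near zero signals collinearity
        "min eigenvalue": eigvals.min(),
        "condition number": sing_vals.max() / sing_vals.min(),
        "max |corr| between predictors": np.max(np.abs(corr - np.eye(X.shape[1]))),
    }
```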
The Class of Shrinkage Estimators
Applying the singular value decomposition technique, we can decompose the matrix X as follows [1]:

    X = H Λ^{1/2} G'                                               (2)

where H is an n×p matrix satisfying H'H = I_p, Λ^{1/2} is a p×p diagonal matrix of the ordered singular values of X, λ_1^{1/2} ≥ λ_2^{1/2} ≥ ... ≥ λ_p^{1/2} > 0, and G is a p×p orthogonal matrix whose columns are the normalized eigenvectors of X'X. Consequently, the ordinary least squares estimator of the regression parameter vector β can be rewritten as:

    b_OLS = (X'X)^{-1} X'y = (G Λ G')^{-1} G Λ^{1/2} H'y = G Λ^{-1/2} H'y = GC

where C = Λ^{-1/2} H'y = G' b_OLS is the vector of uncorrelated components of b_OLS. This can be seen by considering the variance-covariance matrix of C, which is easily shown to equal the diagonal matrix σ² Λ^{-1}.

The generalized shrinkage estimator, denoted by b_SH, is defined as

    b_SH = G Δ C = Σ_{j=1}^{p} g_j δ_j C_j                          (3)

where g_j is the j-th column of the matrix G, δ_j is the j-th diagonal element of the diagonal matrix of shrinkage factors Δ, with 0 ≤ δ_j ≤ 1, j = 1, 2, ..., p, and C_j is the j-th element of the uncorrelated components vector C.

Ordinary Ridge Regression Estimators
The most popular method proposed to deal with the multicollinearity problem is ordinary ridge regression. The ordinary ridge regression method is a modification of the ordinary least squares method that allows biased estimators of the regression coefficients. The ridge estimator depends crucially upon an exogenous parameter, say k, called the ridge parameter or the biasing parameter of the estimator. For any k ≥ 0, the corresponding ordinary ridge estimator, denoted by b_RR, is defined as:

    b_RR = (X'X + kI)^{-1} X'y                                      (4)

where k ≥ 0 is a constant selected by the statistician according to some intuitively plausible criteria put forward by Hoerl and Kennard [2]. It can be shown that the ridge regression estimator given in equation (4) is a member of the class of shrinkage estimators. Using matrix algebra and the singular value decomposition, we get

    b_RR = (X'X + kI)^{-1} X'y
         = [G (Λ + kI) G']^{-1} G Λ^{1/2} H'y
         = G (Λ + kI)^{-1} G'G Λ^{1/2} H'y
         = G (Λ + kI)^{-1} Λ^{1/2} H'y
         = G [(Λ + kI)^{-1} Λ] Λ^{-1/2} H'y
         = G Δ C                                                    (5)

where Δ = (Λ + kI)^{-1} Λ. Equivalently, the shrinkage factors δ_j, j = 1, 2, ..., p, of the ridge estimator have the form

    δ_j = λ_j / (λ_j + k)                                           (6)

where λ_j is the j-th diagonal element (eigenvalue) of the diagonal matrix Λ and k is the ridge parameter.
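The equivalence of equations (4) and (5) can be checked numerically. Below is a minimal sketch (not part of the original paper) assuming NumPy; the data X, y and the ridge parameter k are placeholders.

```python
import numpy as np

def ridge_direct(X, y, k):
    """Ordinary ridge estimator of equation (4): b_RR = (X'X + kI)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

def ridge_svd(X, y, k):
    """Shrinkage form of equation (5): b_RR = G diag(lambda_j / (lambda_j + k)) C."""
    H, s, Gt = np.linalg.svd(X, full_matrices=False)   # X = H diag(s) G', lambda_j = s_j^2
    lam = s ** 2
    C = (H.T @ y) / s                                  # C = Lambda^{-1/2} H'y
    delta = lam / (lam + k)                            # shrinkage factors of equation (6)
    return Gt.T @ (delta * C)

# The two forms agree up to rounding error:
# np.allclose(ridge_direct(X, y, k), ridge_svd(X, y, k))  -> True
```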
The mean square error of the ordinary ridge regression estimator can easily be shown to be [2]

    MSE(b_RR) = σ² Σ_{i=1}^{p} λ_i / (λ_i + k)² + k² β'(X'X + kI)^{-2} β      (7)

The first term is the sum of the variances (total variance) of the parameter estimates, and the second term is the squared bias introduced when b_RR is used instead of b_OLS.

Choice of Ridge Parameter
The ordinary ridge regression estimators do not provide a unique solution to the multicollinearity problem but rather a family of solutions. These solutions depend upon the value of the ridge parameter k. No explicit optimum value can be found for k; nevertheless, several stochastic choices have been proposed for the ridge parameter. Some of these choices may be summarized as follows.

Hoerl and Kennard (1970) suggested a graphical method, called the ridge trace, to select the value of the ridge parameter k. When viewing the ridge trace, the analyst picks the value of k for which the regression coefficients have stabilized. Often the regression coefficients vary widely for small values of k and then stabilize. We select the smallest value of k (which introduces the smallest bias) beyond which the regression coefficients appear to remain constant.

Hoerl, Kennard and Baldwin (1975) proposed another method to select a single value of k, given as [3]

    k_HKB = p S² / (b'_OLS b_OLS)                                   (8)

where p is the number of explanatory variables, S² is the OLS estimator of σ², and b_OLS is the OLS estimator of the vector of regression coefficients β.

Lawless and Wang (1976) proposed selecting the value of k by using the formula [4]

    k_LW = p S² / (b'_OLS X'X b_OLS)                                (9)

Assuming that the regression coefficient vector has a certain prior distribution, Srivastava followed a Bayesian approach to estimate the ridge parameter. He concluded that [5]

    k_Bayes = max{ 0, tr(X'X) / [ ((n - p - 3)/(n - p + 1)) (b'_OLS X'X b_OLS / S²) - p ] }      (10)

where tr(X'X) denotes the trace of the matrix X'X.

Hazim Mansoor Gorgees and Fatimah Assim Mahdi (2017) proposed a new method for selecting the ridge parameter by employing the concept of the condition number [6]. The suggested estimator, denoted by k̂_CN, is defined as

    k̂_CN = max{ 0, S² / (b'_OLS b_OLS · CN) }                      (11)

where CN refers to the condition number, which is the ratio of the largest to the smallest singular value of the matrix of explanatory variables X.
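For concreteness, the stochastic choices in equations (8), (9) and (11) can be computed from the OLS fit as sketched below (not part of the original paper; NumPy is assumed, S² is the usual residual mean square, and the form of k̂_CN follows the reconstruction of equation (11) above).

```python
import numpy as np

def ridge_parameters(X, y):
    """Data-driven ridge parameters k_HKB (eq. 8), k_LW (eq. 9) and k_CN (eq. 11)."""
    n, p = X.shape
    b_ols = np.linalg.solve(X.T @ X, X.T @ y)             # OLS estimator
    resid = y - X @ b_ols
    s2 = resid @ resid / (n - p)                          # S^2, estimator of sigma^2
    sing = np.linalg.svd(X, compute_uv=False)
    cn = sing.max() / sing.min()                          # condition number of X
    k_hkb = p * s2 / (b_ols @ b_ols)
    k_lw = p * s2 / (b_ols @ (X.T @ X) @ b_ols)
    k_cn = max(0.0, s2 / (b_ols @ b_ols * cn))
    return {"HKB": k_hkb, "LW": k_lw, "CN": k_cn}
```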
Principal Components Regression
Ridge regression was offered as a technique that attempts to overcome the multicollinearity problem. An alternative procedure, known as the principal components approach, was first proposed by Harold Hotelling (1933). To obtain a good understanding of this approach, let us proceed with the case of two predictors, x_1 and x_2. If these predictors are correlated, then the matrix X will not be orthogonal; consequently, this complicates the interpretation of the effects of x_1 and x_2 on the response variable y. From the geometric point of view, suppose we rotate the coordinate axes so that in the new system the predictors are orthogonal. Moreover, let us make the rotation so that the first axis lies in the direction of the greatest variation in the data and the second axis lies in the direction of the second greatest variation. These rotated directions (z_1 and z_2, say, in our two-predictor case) are simply linear combinations of the original predictors. We now illustrate how these directions can be calculated.

Using the singular value decomposition, X = H Λ^{1/2} G', where H, Λ and G are as defined earlier, so

    X'X = G Λ^{1/2} H'H Λ^{1/2} G' = G Λ G'

Since G is an orthogonal matrix, the general linear regression model y = Xβ + ε can be rewritten as

    y = X G G' β + ε = Z α + ε                                      (12)

where Z = XG and α = G'β. Hence

    Z'Z = G'X'XG = G'(G Λ G')G = Λ = diag(λ_1, λ_2, ..., λ_p)

where λ_1 ≥ λ_2 ≥ ... ≥ λ_p ≥ 0 are the eigenvalues of X'X. The columns of G are the eigenvectors of X'X and the columns of Z are the principal components of X, which are orthogonal to each other. Thus, the procedure creates a set of artificial variables z_j from the original x_j via the linear transformation Z = XG in such a way that the z vectors are orthogonal to each other. The z_j corresponding to the largest eigenvalue is called the first principal component, and it explains the largest proportion of the variation in the standardized data set. The remaining z_j explain smaller and smaller proportions until all the variation is explained. Typically one does not use all the z_j but follows some type of selection rule. No universal rule exists for selecting the components. Some statisticians use the rule that only eigenvalues greater than 1 are of interest; others suggest computing components until some arbitrarily large proportion (perhaps 0.75 or more) of the variance has been explained.

The OLS estimator of α is given as

    α̂ = (Z'Z)^{-1} Z'y = Λ^{-1} G'X'y                               (13)

Assuming that the first q (q ≤ p) principal components are selected, the reduced estimator can be written as

    α̂_q = (Z'_q Z_q)^{-1} Z'_q y = Λ_q^{-1} G'_q X'y                 (14)

where Z_q = X G_q, G_q denotes the matrix of the first q eigenvectors of X'X, and Λ_q is the diagonal matrix containing the first q eigenvalues of X'X. To find the principal component estimator of the regression coefficients in terms of the original variables, we solve α = G'β for β to get β = Gα, since G is an orthogonal matrix. Let b_PC denote the principal component estimator of β; then b_PC = G α̂. If only q principal components are selected, then

    b_PC = G_q α̂_q = G_q Λ_q^{-1} G'_q X'y                           (15)
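A minimal sketch of the principal component estimator of equation (15) follows (not part of the original paper; NumPy and a standardized design matrix X are assumed, and the default retention rule, keeping eigenvalues greater than 1, is one of the options mentioned above).

```python
import numpy as np

def principal_component_estimator(X, y, q=None):
    """Principal component estimator b_PC of equation (15).

    If q is not given, retain the components whose eigenvalues exceed 1
    (one of the selection rules discussed above).
    """
    eigvals, G = np.linalg.eigh(X.T @ X)          # eigenvalues of X'X (ascending)
    order = np.argsort(eigvals)[::-1]             # reorder to descending eigenvalues
    eigvals, G = eigvals[order], G[:, order]
    if q is None:
        q = max(1, int(np.sum(eigvals > 1.0)))
    G_q = G[:, :q]
    alpha_q = (G_q.T @ X.T @ y) / eigvals[:q]     # alpha_hat_q = Lambda_q^{-1} G_q' X'y  (eq. 14)
    return G_q @ alpha_q                          # b_PC = G_q alpha_hat_q  (eq. 15)
```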
The Simulation Results
To exhibit multicollinearity in the simulated data, we use different degrees of correlation between the variables included in the model. Specifically, we take the correlation values ρ = 0.75, 0.80 and 0.95, and four predictor variables are generated. Since the performance of the different estimators is influenced by the sample size, we use three types of sample sizes: small (n = 20), medium (n = 50 and n = 80) and large (n = 200). The standard deviations of the error terms are taken as σ = 10, 25 and 30. The ordinary ridge estimates are computed using the different ridge parameters given in equations (8) to (11), and the principal components regression estimates using equations (12) to (15). The mean square error (MSE) is used as the criterion to assess the performance of the stated methods. The experiment is repeated 1000 times, and the results are presented in Tables (1), (2) and (3); a minimal sketch of one replication of this design is given after Table (3).

Table (1): The values of MSE at ρ = 0.75

  n     Method      σ = 10          σ = 25          σ = 30
  20    PC          3.9252e-018     9.4206e-018     1.1776e-017
        k_HKB       0.0215          0.0215          0.0214
        k_LW        0.0027          0.0027          0.0027
        k_Bayes     0.0275          0.0276          0.0276
        k_CN        1.2561e-017     8.6355e-018     1.9626e-018
  50    PC          2.9790e-018     4.9651e-018     1.4895e-017
        k_HKB       0.0217          0.0220          0.0220
        k_LW        8.2379e-004     8.2387e-004     8.2388e-004
        k_Bayes     0.0151          0.0151          0.0151
        k_CN        2.8797e-017     1.2909e-017     1.9860e-018
  80    PC          3.9252e-018     4.7103e-018     1.2561e-017
        k_HKB       0.0287          0.0287          0.0287
        k_LW        6.2243e-004     6.2243e-004     6.2243e-004
        k_Bayes     0.0575          0.0576          0.0576
        k_CN        2.0411e-017     4.2392e-017     8.6355e-018
  200   PC          2.4205e-018     2.7680e-017     4.2948e-017
        k_HKB       0.0271          0.0273          0.0273
        k_LW        1.1161e-004     1.1161e-004     1.1161e-004
        k_Bayes     0.0370          0.0372          0.0372
        k_CN        3.4524e-004     7.7579e-017     2.8425e-017

Table (2): The values of MSE at ρ = 0.80

  n     Method      σ = 10          σ = 25          σ = 30
  20    PC          5.8878e-018     9.4206e-018     2.3551e-018
        k_HKB       0.0215          0.0215          0.0214
        k_LW        0.0027          0.0027          0.0027
        k_Bayes     0.0275          0.0276          0.0276
        k_CN        1.8056e-017     3.9252e-018     1.4916e-017
  50    PC          7.9441e-018     1.6881e-017     6.9511e-018
        k_HKB       0.0217          0.0220          0.0220
        k_LW        8.2379e-004     8.2387e-004     8.2388e-004
        k_Bayes     0.0151          0.0151          0.0151
        k_CN        3.4755e-017     6.7525e-017     2.1846e-017
  80    PC          2.7477e-017     1.2561e-017     4.7103e-018
        k_HKB       0.0287          0.0287          0.0287
        k_LW        6.2243e-004     6.2243e-004     6.2243e-004
        k_Bayes     0.0575          0.0576          0.0576
        k_CN        7.8505e-018     1.8056e-017     5.4953e-018
  200   PC          2.6067e-018     4.2762e-017     1.8867e-017
        k_HKB       0.0271          0.0273          0.0273
        k_LW        1.1161e-004     1.1161e-004     1.1161e-004
        k_Bayes     0.0370          0.0372          0.0372
        k_CN        3.4879e-004     1.1935e-016     2.4701e-017

Table (3): The values of MSE at ρ = 0.95

  n     Method      σ = 10          σ = 25          σ = 30
  20    PC          3.9252e-018     9.4206e-018     9.8131e-018
        k_HKB       0.0215          0.0215          0.0215
        k_LW        0.0027          0.0027          0.0027
        k_Bayes     0.0275          0.0276          0.0276
        k_CN        2.5121e-017     8.6355e-018     1.0206e-017
  50    PC          1.7874e-017     5.9581e-018     0
        k_HKB       0.0216          0.0220          0.0220
        k_LW        8.2376e-004     8.2387e-004     8.2387e-004
        k_Bayes     0.0151          0.0151          0.0151
        k_CN        7.9441e-018     1.7874e-017     2.3832e-017
  80    PC          7.8505e-018     1.5701e-017     4.7103e-018
        k_HKB       0.0287          0.0287          0.0287
        k_LW        6.2243e-004     6.2243e-004     6.2243e-004
        k_Bayes     0.0575          0.0576          0.0576
        k_CN        1.3346e-017     2.0411e-017     5.4953e-018
  200   PC          2.5446e-018     4.4686e-018     2.4081e-017
        k_HKB       0.0271          0.0273          0.0273
        k_LW        1.1161e-004     1.1161e-004     1.1161e-004
        k_Bayes     0.0370          0.0372          0.0372
        k_CN        3.8924e-004     1.1116e-016     1.6720e-016
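The following is a minimal sketch of one replication of a design comparable to the Monte Carlo experiment above. It is not the authors' code: the generation scheme (equicorrelated standard normal predictors), the true coefficient vector, and the use of the HKB ridge parameter as the example are all assumptions, since these details are not stated in the paper. Averaging the returned squared errors over many replications yields MSE values of the kind reported in the tables.

```python
import numpy as np

rng = np.random.default_rng(0)

def one_replication(n=20, p=4, rho=0.75, sigma=10.0):
    """One replication of a design comparable to the study above (assumed details)."""
    # Equicorrelated predictors with pairwise correlation rho (assumed scheme).
    cov = np.full((p, p), rho) + (1.0 - rho) * np.eye(p)
    X = rng.multivariate_normal(np.zeros(p), cov, size=n)
    beta = np.ones(p)                                  # assumed true coefficient vector
    y = X @ beta + rng.normal(0.0, sigma, size=n)

    # OLS quantities needed by the ridge parameter.
    b_ols = np.linalg.solve(X.T @ X, X.T @ y)
    s2 = np.sum((y - X @ b_ols) ** 2) / (n - p)

    # Ridge estimator with the HKB parameter of equation (8).
    k = p * s2 / (b_ols @ b_ols)
    b_rr = np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

    # Principal component estimator keeping components with eigenvalue > 1.
    lam, G = np.linalg.eigh(X.T @ X)
    order = np.argsort(lam)[::-1]
    lam, G = lam[order], G[:, order]
    q = max(1, int(np.sum(lam > 1.0)))
    b_pc = G[:, :q] @ ((G[:, :q].T @ X.T @ y) / lam[:q])

    # Squared estimation errors; averaging over replications gives the MSE.
    return np.sum((b_rr - beta) ** 2), np.sum((b_pc - beta) ** 2)
```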
In our paper "An Alternative Approach for selecting Ridge Parameter for Ordinary Ridge Regression Estimator Regression Estimator" International Journal of Science and Research (IJSR).We made the comparison between the performance of different type ordinary ridge regression as well as the generalize ridge regression and we found the proposed method was the best in the since of MSE, while in this paper we introduced another which will be known method of estimation which is the principal component method and compared it with many different types of ridge https://doi.org/10.30526/31.1.1841 Mathematics | 221 2018) عام 1(العدد ) 31(لمجلد ا مجلة إبن الهيثم للعلوم الصرفة و التطبيقية Ibn Al-Haitham J. for Pure & Appl. Sci. Vol.31 (1) 2018 regression estimator, the simulation results displayed that the principal components estimator performs better than all types of ordinary ridge regression estimators that are included in this articles while the ordinary regression estimator based on the ridge parameter ^ CNk seems to be better than other studied types of ordinary ridge regression estimators in all conditions of multi- collinearity levels, sample sizes and the standard deviations of the error terms. However, for the purpose of future works, many other methods can be used to overcome the multi-collinearity problem such as the generalized inverse method and the jackknife ridge regression method. References [1] H.M. Gorgees, “Using Singular Value Decomposition Method for Estimating the Ridge Parameter", Journal of Economic and Administrative Science, 1-10. 2009 [2] A. E. Hoerl and R.W. Kennard. "Ridge Regression: Biased Estimation of Nonorthogonal Problems", Techno metrics. 55-67. 1970 [3] A.E. Hoerl, R. W. Kennard and K.F. Baldwin. "Ridge Regression: some simulation" Communications in Statistics. 105-123. 1975 [4] J.F Lawless and p. Wang. "A simulation study of Ridge and other Regression Estimators", Communications in Statistics - theory and Methods .1177-1182. 2005 [5] M.S srivastava. "Methods of Multivariate Statistics", Wiley, New York. 2002 [6] H.M. Gorgees, and F. A. Mahdi, "An Alternative Approach for selecting Ridge Parameter for Ordinary Ridge Regression Estimator Regression Estimator", International Journal of Science and Research (IJSR). 2426-2429. 2017