J. Nig. Soc. Phys. Sci. 3 (2021) 278–281

Journal of the Nigerian Society of Physical Sciences

Ridge Estimation's Effectiveness for Multiple Linear Regression with Multicollinearity: An Investigation Using Monte-Carlo Simulations

O. G. Obadina(a), A. F. Adedotun(b,*), O. A. Odusanya(c)

(a) Department of Mathematical Sciences, Olabisi Onabanjo University, Ago-Iwoye, Ogun State, Nigeria
(b) Department of Mathematics, Covenant University Ota, Ogun State, Nigeria
(c) Department of Mathematics, D.S Adegbenro (ICT) Polytechnic, Itori, Ogun State, Nigeria

Abstract

The goal of this research is to compare techniques for estimating multiple linear regression coefficients under multicollinearity. The ordinary least squares method (OLS), the modified ridge regression method (MRR), and the generalized Liu-Kejian method (LKM) are compared using the average mean square error (AMSE). The simulation scenarios use 3 and 5 independent variables with zero-mean normally distributed random errors of variance 1, 5, and 10; three correlation levels, low (0.2), medium (0.5), and high (0.8), are set for the independent variables; and all combinations are run with sample sizes 15, 55, and 95, each repeated 1,000 times by the Monte Carlo simulation technique. As the sample size rises, the AMSE decreases. The MRR and LKM both outperform the OLS; the MRR is the most suitable method in most circumstances, while the LKM performs best for small samples with large error variance.

DOI: 10.46481/jnsps.2021.304

Keywords: Monte-Carlo, Multicollinearity, Regression Model, Ridge Estimation, Simulations

Article History:
Received: 16 July 2021
Received in revised form: 30 September 2021
Accepted for publication: 05 October 2021
Published: 29 November 2021

©2021 Journal of the Nigerian Society of Physical Sciences. All rights reserved.
Communicated by: T. Latunde
*Corresponding author tel. no.: +2348055711272; email address: adedayo.adedotun@covenantuniversity.edu.ng (A. F. Adedotun)

1. Introduction

Multiple linear regression (MLR), a widely used and well-known statistical technique, is now applied in a variety of fields [1,2]. The method is a statistical strategy that predicts the values of a response by combining many predictors (independent variables). MLR's purpose is to identify the optimal model for describing the linear relationship between predictor and response variables. After obtaining the best subsets of predictors, MLR's main objective is to estimate coefficients, find the best-fitting estimates, and minimize errors. The least squares approach has long been the standard estimation tool; it is a common and accepted approach. However, the technique is constrained by multicollinearity, a major MLR roadblock.

The literature on ridge regression is concerned with the problem of finding a better substitute for the least squares estimator. Common methods for dealing with multicollinearity include, but are not limited to, ridge regression [3]. Estimation procedures rest on specific assumptions, such as that the elements of the random error vector ε are independent and identically distributed. When these assumptions are violated, the methods do not yield the desired results, leading to problems such as heteroscedasticity and autocorrelation [4]. Li and Yang [5] suggested the jackknifed modified ridge estimator (JMRE) and showed it to be superior to competing estimators.

Giacalone et al. [7] define multicollinearity as a condition where the regressor variables in a multiple linear regression model are nearly linearly dependent. This condition causes the variance of the least squares estimator to be large, and the estimator becomes unstable.
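This inflation of the least squares variance under near-linear dependence is easy to see numerically: the total variance of the OLS estimator is σ²·tr((X′X)⁻¹), which grows rapidly as the pairwise correlation ρ among the predictors approaches 1. A minimal sketch (the sample size, dimension, and ρ grid here are illustrative choices, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_X(rho, n=100, p=3):
    """Draw n observations of p predictors with common pairwise correlation rho."""
    cov = np.full((p, p), rho) + (1.0 - rho) * np.eye(p)
    return rng.multivariate_normal(np.zeros(p), cov, size=n)

# tr((X'X)^-1) scales the total variance of the OLS estimator,
# sigma^2 * tr((X'X)^-1); it blows up as rho -> 1.
trace_inv = {}
for rho in (0.2, 0.5, 0.8, 0.99):
    X = make_X(rho)
    trace_inv[rho] = np.trace(np.linalg.inv(X.T @ X))
    print(f"rho={rho}: tr((X'X)^-1) = {trace_inv[rho]:.4f}")
```

The jump between ρ = 0.2 and ρ = 0.99 illustrates why the least squares estimates become unstable even though the estimator remains unbiased.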
Hence, this condition reduces the explanatory power of the regression model, and ridge regression is used to address these difficulties.

Ref. [7] introduced the Lp-min method to detect and address the multicollinearity problem. The major advantage of the approach is that it produces more efficient estimates of the model's parameters than the ordinary least squares method. Ref. [8] proposed a new collinearity diagnostic test, based on the coefficient of determination and the adjusted coefficient of determination of auxiliary regressions of the regressors, and conducted a Monte Carlo simulation study to compare the existing and proposed tests. Ref. [9] examined estimators that are resistant to the combined problems of multicollinearity and non-normal disturbance distributions, asking whether the ridge estimator and robust estimation techniques can be combined to produce a robust ridge regression estimator.

An algorithm that uses the α-level estimation method to evaluate the parameters of the ridge fuzzy regression model was proposed by Ref. [10]. Parameter bias, Type I and Type II error, and variance inflation factor (VIF) values produced by multiple regressions with two, four, and six predictors under various multicollinearity conditions were examined in Ref. [11]. According to the findings, multicollinearity is not linked to Type I error, but it does increase Type II error. Multicollinearity also appears to increase the variability of parameter bias while producing overall parameter underestimation, and it inflates the VIF. Increasing the number of predictors, moreover, interacts with multicollinearity in all diagnostics to exacerbate these difficulties.

An extended conventional semi-parametric partial linear regression model was introduced in Ref. [12]. The effectiveness of the proposed method is illustrated through two numerical examples, including a simulation study.
The authors also compared it with some common fuzzy multiple regression models with fuzzy predictors and fuzzy responses. In ridge regression, a constant ridge bias k is added to the X′X matrix. Ref. [13] illustrates the use of the restricted ridge regression method to disable multicollinearity in a regression model; the method was developed using prior information on the parameter β.

2. Materials and Methods

Ridge regression is suited to the problem of multicollinearity, especially when the predictors are highly correlated. In 1970, Hoerl and Kennard [14] proposed the ridge regression estimator, which adds a scalar multiple of the identity matrix (a positive real number times the identity) inside the inverse component of the least squares estimator. This yields more stable ridge parameter estimates, whose variance and mean square error are frequently lower than those of the least squares estimates. The following three approaches for estimating multiple linear regression coefficients are compared: the ordinary least squares method (OLS), the modified ridge regression method (MRR), and the Liu-Kejian method (LKM).

2.1. Ordinary Least Squares Method (OLS)

OLS is the best linear unbiased estimator (BLUE): among unbiased estimators of the multiple linear regression coefficients, it has the smallest variance. The estimated value is

β̂OLS = (X′X)⁻¹X′Y, (1)

where X is the n × p predictor matrix, Y is the n × 1 observation vector, and β̂OLS is the vector of coefficient estimates. The mean square error of β̂OLS is σ²tr(X′X)⁻¹.

2.2. Modified Ridge Regression (MRR)

The ridge estimator of the multiple linear regression coefficients is

β̂Ridge = (X′X + kIp)⁻¹X′Y, k > 0, (2)

where β̂Ridge is the p × 1 ridge estimator, k is a positive real number also known as the ridge biasing constant, and Ip is the p × p identity matrix.
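Equations (1) and (2) map directly onto a few lines of linear algebra. A sketch in numpy (using np.linalg.solve rather than forming the inverse explicitly, which is the numerically preferred route):

```python
import numpy as np

def ols(X, Y):
    # Eq. (1): beta_hat_OLS = (X'X)^-1 X'Y
    return np.linalg.solve(X.T @ X, X.T @ Y)

def ridge(X, Y, k):
    # Eq. (2): beta_hat_Ridge = (X'X + k I_p)^-1 X'Y, k > 0
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ Y)
```

With k = 0 the ridge estimate coincides with the OLS estimate; increasing k shrinks the coefficient vector and stabilises the solution when X′X is ill-conditioned.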
The approximation of the linear regression coefficients is more accurate and closer to the true values when historical data are used with the ridge regression approach. The modified ridge regression (MRR) estimator is

β̂MRR = (X′X + kIp)⁻¹(X′Y + kJ), (3)

where J is the p × 1 historical observation vector, J = ((1/p)∑ᵢ₌₁ᵖ β̂OLS,i)1, and 1 is the p × 1 vector of ones. From equation (3), β̂MRR = β̂OLS when k = 0.

The estimate of k is obtained in two cases:

1. σ² known:

k̂ = pσ²/[(β̂OLS − J)′(β̂OLS − J) − σ²tr(X′X)⁻¹] if (β̂OLS − J)′(β̂OLS − J) − σ²tr(X′X)⁻¹ > 0,
k̂ = pσ²/[(β̂OLS − J)′(β̂OLS − J)] otherwise.

2. σ² unknown: σ² is replaced by its unbiased estimator σ̂² = (Y − Xβ̂OLS)′(Y − Xβ̂OLS)/(n − p), giving

k̂ = pσ̂²/[(β̂OLS − J)′(β̂OLS − J) − σ̂²tr(X′X)⁻¹] if (β̂OLS − J)′(β̂OLS − J) − σ̂²tr(X′X)⁻¹ > 0,
k̂ = pσ̂²/[(β̂OLS − J)′(β̂OLS − J)] otherwise.

The mean square error of β̂MRR is

σ̂²tr[(X′X + k̂Ip)⁻¹(X′X)(X′X + k̂Ip)⁻¹] + k̂²(β̂OLS − J)′(X′X + k̂Ip)⁻²(β̂OLS − J).

2.3. Generalized Liu-Kejian Method (LKM)

This method estimates the multiple linear regression coefficients when the independent variables are nearly linearly related; it combines the advantages of the ridge regression approach and the Stein method. The estimator has the form

β̂LKM = (X′X + Ip)⁻¹(X′Y + dβ̂OLS), 0 < d < 1, (4)

and β̂LKM = β̂OLS when d = 1. With a diagonal matrix of biasing parameters D = diag(d₁, d₂, ..., dp), 0 < dᵢ < 1, i = 1, 2, ..., p,

β̂LKM = (X′X + Ip)⁻¹(X′Y + Dβ̂OLS)
= (X′X + Ip)⁻¹(X′X + D)β̂OLS
= (Ip − (X′X + Ip)⁻¹(Ip − D))β̂OLS, (5)

and the estimate of dᵢ is

d̂ᵢ = 1 − σ̂(λᵢ + 1)/√(λᵢβ̂²OLS,i + σ̂²),

where λᵢ, i = 1, 2, ..., p, are the eigenvalues of X′X. The mean square error of β̂LKM is

σ²(Ip − ∆)(X′X)⁻¹(Ip − ∆)′ + ∆ββ′∆′, where ∆ = (X′X + Ip)⁻¹(Ip − D).

2.4. Monte Carlo Simulation

Following Ref. [8], Monte Carlo simulation scenarios with three and five independent variables were developed: the random errors are zero-mean normally distributed with variance 1, 5, or 10; three correlation levels, low (0.2), medium (0.5), and high (0.8), are set for the independent variables; and all combinations are run with sample sizes 15, 55, and 95. The steps for carrying out the simulation are:

1. The random error ε is simulated as ε ∼ N(0, σ²ε In), where σ²ε = 1, 5, 10.
2. The observation matrix X is simulated from a multivariate normal distribution whose columns have pairwise correlation ρ = 0.2, 0.5, 0.8.
3. Response values Y are generated from the model with multiple linear regression coefficient vector β.
4. The multiple linear regression coefficients are estimated with all three methods. Steps 1-4 are repeated 1,000 times in each scenario.
5. The mean of the mean square errors is calculated, AMSE = (1/1000)∑ᵢ₌₁¹⁰⁰⁰ MSEᵢ, and the method with the lowest AMSE is selected as the best method for the scenario.

Table 1. The best method for all scenarios from 1,000 Monte Carlo simulations.

Predictors   σ    ρ     n=15   n=55   n=90
3            1    0.3   MRR    MRR    MRR
3            1    0.6   MRR    MRR    MRR
3            1    0.9   MRR    MRR    MRR
3            5    0.3   MRR    MRR    MRR
3            5    0.6   MRR    MRR    MRR
3            5    0.9   MRR    MRR    MRR
3            10   0.3   LKM    MRR    MRR
3            10   0.6   LKM    LKM    MRR
3            10   0.9   LKM    MRR    MRR
5            1    0.3   MRR    MRR    MRR
5            1    0.6   MRR    MRR    MRR
5            1    0.9   MRR    MRR    MRR
5            5    0.3   LKM    MRR    MRR
5            5    0.6   MRR    MRR    MRR
5            5    0.9   MRR    MRR    MRR
5            10   0.3   LKM    MRR    MRR
5            10   0.6   LKM    LKM    MRR
5            10   0.9   LKM    LKM    LKM

3. Conclusion

Table 1 shows the optimum estimation approach for the MLR coefficients under multicollinearity in each simulation scenario. Clearly, the OLS is not the best method in any scenario.
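The simulation in steps 1-5 of Section 2.4 can be sketched end to end for one scenario. The block below implements equation (3) with the σ²-unknown estimate of k̂, and equation (4) with a single fixed d = 0.5; the fixed d (the paper estimates per-component d̂ᵢ), the true β = (1, ..., 1)′, and the use of coefficient mean square error (β̂ − β)′(β̂ − β) are all simplifying assumptions, since the paper does not report these details:

```python
import numpy as np

rng = np.random.default_rng(1)

def mrr(X, Y, b_ols):
    # Eq. (3): beta_hat_MRR = (X'X + k I)^-1 (X'Y + k J),
    # with J = mean(b_ols) * 1 and k estimated as in Sec. 2.2 (sigma^2 unknown).
    n, p = X.shape
    J = np.full(p, b_ols.mean())
    resid = Y - X @ b_ols
    s2 = resid @ resid / (n - p)          # unbiased estimate of sigma^2
    dev = b_ols - J
    denom = dev @ dev - s2 * np.trace(np.linalg.inv(X.T @ X))
    k = p * s2 / denom if denom > 0 else p * s2 / (dev @ dev)
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ Y + k * J)

def lkm(X, Y, b_ols, d=0.5):
    # Eq. (4) with a single biasing parameter d in (0, 1); a fixed
    # d = 0.5 replaces the per-component d_i for brevity (assumption).
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + np.eye(p), X.T @ Y + d * b_ols)

def simulate(n=55, p=3, rho=0.5, sigma=1.0, reps=1000):
    beta = np.ones(p)                     # assumed true coefficients
    cov = np.full((p, p), rho) + (1 - rho) * np.eye(p)
    sse = {"OLS": 0.0, "MRR": 0.0, "LKM": 0.0}
    for _ in range(reps):
        X = rng.multivariate_normal(np.zeros(p), cov, size=n)   # step 2
        Y = X @ beta + rng.normal(0, sigma, n)                  # steps 1, 3
        b_ols = np.linalg.solve(X.T @ X, X.T @ Y)               # step 4
        for name, b in (("OLS", b_ols), ("MRR", mrr(X, Y, b_ols)),
                        ("LKM", lkm(X, Y, b_ols))):
            sse[name] += (b - beta) @ (b - beta)
    # Step 5: average the MSEs over the repetitions
    return {name: s / reps for name, s in sse.items()}

amse = simulate()
print(amse)
```

In this moderate scenario (n = 55, ρ = 0.5, σ = 1) the shrinkage estimators give lower AMSE than OLS, in line with the pattern reported in Table 1.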
MRR and LKM are the best approaches for determining MLR coefficients when multicollinearity exists. MRR is appropriate for all sample sizes and for data with low predictor correlation and small-to-moderate error variance. The generalized Liu-Kejian method is well suited to small datasets with a high degree of predictor correlation and a large error variance. LKM outperforms MRR as the number of predictors grows, although the more predictors there are, the greater the risk of multicollinearity.

References

[1] K. K. Adesanya, A. I. Taiwo, A. F. Adedotun & T. O. Olatayo, "Modeling Continuous Non-Linear Data with Lagged Fractional Polynomial Regression", Asian Journal of Applied Sciences 6 (2018) 315.
[2] G. Ciulla & A. D'Amico, "Building energy performance forecasting: A multiple linear regression approach", Applied Energy 253 (2019) 113500.
[3] H. Yang & X. Chang, "A new two-parameter estimator in linear regression", Communications in Statistics - Theory and Methods 39 (2010) 923. doi: 10.1080/0361092090280791.
[4] B. M. G. Kibria, "Performance of some new ridge regression estimators", Communications in Statistics - Simulation and Computation 32 (2003) 419. doi: 10.1081/SAC-120017499.
[5] Y. Li & H. Yang, "On the performance of the jackknifed modified ridge estimator in the linear regression model with correlated or heteroscedastic errors", Communications in Statistics - Theory and Methods 40 (2011) 2695. doi: 10.1080/03610926.2010.491589.
[6] D. C. Montgomery, E. A. Peck & G. G. Vining, "Introduction to Linear Regression Analysis", John Wiley & Sons, United States, 2001.
[7] M. Giacalone, D. Panarello & R. Mattera, "Multicollinearity in regression: an efficiency comparison between Lp-norm and least squares estimators", Quality & Quantity 52 (2018) 1831.
[8] M. I. Ullah, M. Aslam, S. Altaf & M. Ahmed, "Some new diagnostics of multicollinearity in linear regression model", Sains Malaysiana 48 (2019) 2051.
[9] A. F. Lukman, K. Ayinde & A. S. Ajiboye, "Monte Carlo study of some classification-based ridge parameter estimators", Journal of Modern Applied Statistical Methods 16 (2017) 24.
[10] S. H. Choi, H. Y. Jung & H. Kim, "Ridge fuzzy regression model", International Journal of Fuzzy Systems 21 (2019) 2077.
[11] M. R. Lavery, P. Acharya, S. A. Sivo & L. Xu, "Number of predictors and multicollinearity: What are their effects on error and bias in regression?", Communications in Statistics - Simulation and Computation 48 (2019) 27.
[12] M. G. Akbari & G. Hesamian, "A partial-robust-ridge-based regression model with fuzzy predictors-responses", Journal of Computational and Applied Mathematics 351 (2019) 290.
[13] F. A. O. Rumere, S. M. Soemartojo & Y. Widyaningsih, "Restricted Ridge Regression estimator as a parameter estimation in multiple linear regression model for multicollinearity case", Journal of Physics: Conference Series 24 (2021) 1725.
[14] A. E. Hoerl & R. W. Kennard, "Ridge Regression: Biased Estimation for Nonorthogonal Problems", Technometrics 12 (1970) 55.