Journal of Applied Economics and Business Studies, Volume. 5, Issue 1 (2021) 131-142 https://doi.org/10.34260/jaebs.517 


 
Journal of Applied Economics and 
Business Studies (JAEBS) 

Journal homepage: https://pepri.edu.pk/jaebs 

ISSN (Print): 2523-2614 

   ISSN (Online) 2663-693X 

 
Evaluation of Ridge, Elastic Net and Lasso Regression Methods in Precedence of Multicollinearity Problem: A Simulation Study

 
Shady I. Altelbany1 

1. Faculty of Economic and administrative Sciences, Al Azhar University– Gaza, Palestine 

Abstract 

This study evaluates the performance of the Ridge, Elastic Net and Lasso regression methods in handling different degrees of multicollinearity among the independent variables of a multiple regression model, using simulated data. Data sets were simulated with sample sizes n = 200, 1000, 10000, 50000 and 100000 and p = 10 independent variables. The performance of the three methods was compared using the mean squared error (MSE). The study found that the Elastic Net method outperforms the Ridge and Lasso methods in estimating the regression coefficients when the degree of multicollinearity is low, moderate or high, for any sample size. The Lasso method, however, is the most accurate estimator of the regression coefficients when the data contain severe multicollinearity and the sample size is less than 10000 observations.

Keywords: Ridge, Lasso, Elastic Net, Multicollinearity, Regression.

JEL Classification: C02, C31, C63

 
1 Introduction 

Multiple linear regression is frequently employed either to fit a model that predicts the expected response or to explore the relationship between the dependent variable and the independent variables. For the first goal, the prediction accuracy of the model is critical; for the second goal, the complexity of the model is more important. Ordinary linear regression procedures are known for often performing poorly with respect to both prediction accuracy and model complexity (Doreswamy and Vastrad, 2013). Regression analysis rests on a number of assumptions about the model. The most important one concerns the absence of multicollinearity, in addition to homogeneity of variance, absence of autocorrelation, and linearity. If one or more assumptions are violated, the model becomes unreliable and is no longer suitable for estimating population parameters (Herawati et al., 2018).

Multicollinearity occurs in multiple regression analysis when there is a strong correlation or linear relationship between two or more independent variables. Multicollinearity can produce inaccurate estimates of the regression coefficients, inflate their standard errors, deflate the partial t-tests for the regression coefficients, give false non-significant p-values, and reduce the predictability of the model (Draper and Smith, 1998; Gujarati, 1995).

The key issue with multicollinearity is that, as the degree of collinearity rises, the coefficient estimates in the regression model become unstable and the standard errors of the coefficients become heavily inflated. Multicollinearity is of two types: full (perfect or exact) multicollinearity and partial (less than perfect) multicollinearity. The first type exists when the independent variables are perfectly linearly related. Under this condition, no unique least squares solution to the multiple regression problem can be computed (Slinker and Glantz, 1985).

Since multicollinearity is a serious problem when trying to make inferences or build predictive models, it is crucial to find the best way to deal with it (Judge, 1988). Multicollinearity can be detected using a variety of techniques. Common approaches include inspecting pair-wise scatter plots of the independent variables for near-perfect relationships, examining the correlation matrix for high correlations, computing the variance inflation factors (VIF), examining the eigenvalues of the correlation matrix of the independent variables, and checking the signs of the regression coefficients (Montgomery and Peck, 1992; Kutner et al., 2005).
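As a rough illustration of two of the diagnostics listed above, the following sketch computes the variance inflation factors and the eigenvalues of the correlation matrix with NumPy. It relies on the standard identity that the VIF of variable j is the j-th diagonal element of the inverse correlation matrix. The function name `vif_and_eigenvalues`, the toy data and the seed are assumptions made for this example and are not part of the study.

```python
import numpy as np

def vif_and_eigenvalues(X):
    """Return (VIFs, eigenvalues) computed from the correlation matrix of X's columns."""
    R = np.corrcoef(X, rowvar=False)        # p x p correlation matrix
    vif = np.diag(np.linalg.inv(R))         # VIF_j = [R^{-1}]_jj = 1 / (1 - R_j^2)
    eigenvalues = np.linalg.eigvalsh(R)     # ascending; values near 0 signal collinearity
    return vif, eigenvalues

# Hypothetical toy data: x2 is almost a copy of x1, x3 is unrelated.
rng = np.random.default_rng(0)
n = 500
x1 = rng.standard_normal(n)
x2 = x1 + 0.1 * rng.standard_normal(n)
x3 = rng.standard_normal(n)
X = np.column_stack([x1, x2, x3])

vif, eig = vif_and_eigenvalues(X)
print("VIF:", vif)
print("eigenvalues:", eig)
```

In this toy setting the first two variables should show large VIFs and the smallest eigenvalue should be close to zero, while the unrelated third variable keeps a VIF near one.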

One remedy is to reduce the variance of the estimates at the cost of introducing some bias. Methods of this kind are known as regularization or shrinkage methods and are generally beneficial for the predictive performance of the model. In the analysis of modern data, regularization is crucial. Regularized regression methods for linear regression were introduced to overcome the shortcomings of ordinary least squares regression in terms of prediction accuracy. Regularization methods help to formulate a unique solution to this ill-posed problem. These techniques shrink the coefficients towards zero. This does not by itself perform descriptor selection, but it does reduce the variance at the expense of a small increase in bias, which in turn improves the generalization of the estimate (Doreswamy and Vastrad, 2013). Ridge, the least absolute shrinkage and selection operator (Lasso), and Elastic Net are among these methods.

Using simulated data, this study examines these three regression methods to see which one works best in the presence of multicollinearity.

 

2 Materials and Methods: 

We first review the basics of regression and the parts of the estimating equation that change when a different model is used. The relationship between a dependent variable and a set of independent variables can be estimated using a multiple linear regression model and the Ordinary Least Squares (OLS) method. If the data comprise n observations $\{y_i, x_i\}_{i=1}^{n}$, where each observation has a scalar response $y_i$ and a vector of p independent variables $x_{ij}$ for j = 1, ..., p, the multiple linear regression model can be written as:

$$Y = X\beta + \varepsilon \qquad (1)$$

where $Y_{n \times 1}$ is the vector of the dependent variable, $X_{n \times p}$ is the matrix of independent variables, $\beta_{p \times 1}$ is the vector of regression coefficients to be estimated, and $\varepsilon_{n \times 1}$ is the vector of residuals.

$$\hat{\beta}_{ols} = (X'X)^{-1}X'Y \qquad (2)$$

The regression coefficients are calculated using Ordinary Least Squares by minimizing the squared distances between the predicted and the observed values of the dependent variable (Montgomery and Peck, 1992).
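The OLS estimator in equation (2) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `ols`, the simulated data and the coefficient values are hypothetical, and the normal equations are solved with `numpy.linalg.solve` rather than by forming the inverse explicitly.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y, as in equation (2)."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Hypothetical example data (not the data of the study).
rng = np.random.default_rng(1)
n, p = 200, 3
X = rng.standard_normal((n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.standard_normal(n)

beta_ols = ols(X, y)
print(beta_ols)
```

Solving the linear system is numerically preferable to inverting $X'X$, and it fails in the same situation as equation (2): when $X'X$ is singular because of perfect multicollinearity.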

When building a regression model, the model becomes more complicated as the number of observations and variables grows, and major optimization issues arise (Zou and Hastie, 2005). Furthermore, classical regression analysis fails when assumptions such as constant variance, absence of multicollinearity, and normality are not met (Ogutu et al., 2012). As a result, large coefficients in the model must be corrected, or penalized.

Regularized regression is a form of regression in which the coefficient estimates are shrunk towards zero. It penalizes the magnitude of the coefficients in addition to the magnitude of the error term. Complex models are thereby discouraged, mainly to prevent over-fitting. A typical least squares model has some flaws, such as the fact that it does not generalize well to data sets other than its training data. Regularization greatly reduces the model's variance while having little effect on its bias. The effect on bias and variance is governed by the tuning parameter λ used in the regularization methods discussed here. As the value of λ increases, the magnitude of the coefficients decreases, lowering the variance. Up to a point, this increase in λ is advantageous because it only reduces variance (thus preventing over-fitting) without losing significant properties of the data; beyond a certain value, however, the model loses significant properties, resulting in bias and under-fitting. Accordingly, the value of λ should be selected carefully (Biswas, 2019). Three regularization methods are considered here: Ridge, Lasso, and Elastic Net.

The Ridge penalty is based on the squared values of the coefficients, while the Lasso penalty is based on their absolute values. Elastic Net regression combines the Ridge and Lasso biased estimation methods (Zou and Hastie, 2003).

  

2.1 Ridge Regression: 

Ordinary Least Squares (OLS) is unstable and produces estimates with high variance when multicollinearity is present among the independent variables, that is, when the columns of X are highly correlated. Hoerl et al. (1975) developed ridge regression, a modification of the least squares method that allows for biased estimators of the regression coefficients (Myers, 1986).

Ridge regression adds a ridge parameter λ to the diagonal of the matrix $X'X$, resulting in the new matrix $X'X + \lambda I$. The method is called ridge regression because the diagonal of ones in the correlation matrix can be thought of as a ridge (Hoerl and Kennard, 2000). The ridge formula for the coefficients is:

$$\hat{\beta}_{ridge} = (X'X + \lambda I)^{-1}X'Y, \quad \lambda \ge 0 \qquad (3)$$

When λ equals zero, the ridge estimator reduces to the Ordinary Least Squares (OLS) estimator. If the ridge parameters added to the diagonal are all equal to the same λ, the resulting estimators are called ordinary ridge estimators (Hoerl, 1962; Hoerl et al., 1975). It is often convenient to write ridge regression in Lagrangian form:

$$\hat{\beta}_{ridge} = \arg\min_{\beta} \left\{ \|y - X\beta\|_2^2 + \lambda\|\beta\|_2^2 \right\} \qquad (4)$$

where $\|y - X\beta\|_2^2 = \sum_{i=1}^{n} (y_i - x_i^{T}\beta)^2$ is the L2-norm (quadratic) loss function (i.e. the residual sum of squares), $x_i^{T}$ is the i-th row of X, $\|\beta\|_2^2 = \sum_{j=1}^{p} \beta_j^2$ is the L2-norm penalty on $\beta$, and $\lambda \ge 0$ is the tuning (penalty, regularization) parameter, which controls the strength of the penalty (linear shrinkage) by determining the relative importance of the data-dependent empirical error and the penalty term. The larger the value of λ, the greater the amount of shrinkage. Since the value of λ depends on the data, it can be determined using data-driven methods such as cross-validation (Doreswamy and Vastrad, 2013).

By constraining the coefficient estimates, Ridge regression can overcome this multicollinearity; as a result, it may decrease the variance of the estimator while also introducing bias (James et al., 2013).
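A minimal sketch of the closed-form ridge estimator in equation (3) is given below, assuming NumPy. The function name `ridge`, the two nearly collinear toy columns and the value λ = 1 are assumptions chosen only to make the shrinkage visible.

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge coefficients (X'X + lam * I)^{-1} X'y, as in equation (3)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Hypothetical data with two nearly collinear columns.
rng = np.random.default_rng(2)
n = 100
z = rng.standard_normal(n)
X = np.column_stack([z + 0.01 * rng.standard_normal(n),
                     z + 0.01 * rng.standard_normal(n)])
y = X @ np.array([1.0, 1.0]) + rng.standard_normal(n)

beta_ols = ridge(X, y, 0.0)    # lam = 0 reproduces the OLS estimator
beta_ridge = ridge(X, y, 1.0)  # lam > 0 shrinks the coefficient vector
print(np.linalg.norm(beta_ols), np.linalg.norm(beta_ridge))
```

With nearly collinear columns the OLS coefficients tend to be large and of opposite sign, while even a small λ pulls them towards a more stable solution with a smaller norm.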

2.2 Lasso Regression: 

Lasso regression methods are widely used with large data sets, such as those in drug discovery, where efficient and fast algorithms are required (Hastie and Friedman, 2010). The Lasso estimator is also known as basis pursuit (Chen et al., 1998). However, when there are high correlations between descriptors, Lasso will choose one and ignore the others, and it breaks down when all descriptors are identical. The Lasso penalty expects many coefficients to be close to zero, with only a small subset being larger (and not equal to zero). To obtain a sparse solution to the following optimization problem, the Lasso estimator uses the L1-penalized least squares criterion (Tibshirani, 1996):

$$\hat{\beta}_{Lasso} = \arg\min_{\beta} \left\{ \|y - X\beta\|_2^2 + \lambda\|\beta\|_1 \right\} \qquad (5)$$

where $\|\beta\|_1 = \sum_{j=1}^{p} |\beta_j|$ is the L1-norm penalty on $\beta$, which causes the solution to become sparse, and $\lambda \ge 0$ is a tuning parameter. Penalizing the absolute values of the coefficients introduces shrinkage towards zero, as in ridge regression. Unlike ridge regression, however, it reduces certain coefficients exactly to zero, so that the solutions contain a large number of identically zero values. The penalty thus acts as a continuous variable selection tool (Herawati et al., 2018). The Lasso estimation method handles both the multicollinearity issue and feature selection together in the high-dimensional linear regression model. Nonetheless, according to Zou and Hastie (2005), the Lasso estimation procedure is unstable if the number of predictors is greater than the number of observations. Further, the prediction performance of the ridge estimator dominates that of Lasso if there is high multicollinearity among the predictors.
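The contrast between the two penalties is easiest to see in the special case of an orthonormal design ($X'X = I$), which is an assumption made only for this sketch and not a setting used in the study. In that case the minimizer of the Lasso criterion $\|y - X\beta\|_2^2 + \lambda\|\beta\|_1$ is the soft-thresholded OLS estimate with threshold λ/2, and the minimizer of the ridge criterion $\|y - X\beta\|_2^2 + \lambda\|\beta\|_2^2$ is the OLS estimate divided by 1 + λ. The data and the value of λ below are hypothetical.

```python
import numpy as np

def soft_threshold(b, t):
    """Soft-thresholding operator: sign(b) * max(|b| - t, 0), applied elementwise."""
    return np.sign(b) * np.maximum(np.abs(b) - t, 0.0)

rng = np.random.default_rng(3)
# Orthonormal design: the Q factor of a QR decomposition has X'X = I.
X, _ = np.linalg.qr(rng.standard_normal((50, 4)))
beta_true = np.array([2.0, 0.0, -1.0, 0.0])       # hypothetical sparse coefficients
y = X @ beta_true + 0.05 * rng.standard_normal(50)

lam = 0.5
beta_ols = X.T @ y                                  # OLS estimate when X'X = I
beta_lasso = soft_threshold(beta_ols, lam / 2.0)    # Lasso solution when X'X = I
beta_ridge = beta_ols / (1.0 + lam)                 # ridge solution when X'X = I

print("OLS:  ", beta_ols)
print("Lasso:", beta_lasso)
print("Ridge:", beta_ridge)
```

Lasso subtracts a fixed amount from every coefficient and sets the small ones exactly to zero, whereas ridge rescales all coefficients by the same factor and leaves none of them exactly zero.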

2.3 Elastic Net Regression: 

According to Friedman et al. (2010), the Elastic Net is an extension of the Lasso that is robust to extreme correlations among the predictor variables. To overcome the instability of the Lasso solution paths when predictor variables are strongly correlated, the Elastic Net was proposed for analysing high-dimensional data. Zou and Hastie (2005) proposed the Elastic Net estimator, which uses a mixture of the ridge and lasso penalties:

$$\hat{\beta}_{Elastic\ Net} = \arg\min_{\beta} \left\{ \|y - X\beta\|_2^2 + \lambda_1\|\beta\|_1 + \lambda_2\|\beta\|_2^2 \right\} \qquad (6)$$

The regularization parameter λ is the sum of the two nonnegative penalties, $\lambda = \lambda_1 + \lambda_2$. Now let $\alpha = \dfrac{\lambda_1}{\lambda_1 + \lambda_2}$, so that $1 - \alpha = \dfrac{\lambda_2}{\lambda_1 + \lambda_2}$, where $0 \le \alpha \le 1$. The estimator can then be written as:

$$\hat{\beta}_{Elastic\ Net} = \arg\min_{\beta} \left\{ \|y - X\beta\|_2^2 + \lambda\left[\alpha\|\beta\|_1 + (1 - \alpha)\|\beta\|_2^2\right] \right\} \qquad (7)$$

Note that if $\alpha = 0$, the Elastic Net estimator in equation (7) is equivalent to Ridge. Similarly, if $\alpha = 1$, the Elastic Net estimator in equation (7) is equivalent to Lasso. If $\lambda = 0$, the Elastic Net reduces to ordinary least squares regression.
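The special cases noted above can be checked numerically with a small cyclic coordinate descent solver for the criterion $\|y - X\beta\|_2^2 + \lambda[\alpha\|\beta\|_1 + (1-\alpha)\|\beta\|_2^2]$. This is a sketch under assumptions, not the algorithm used in the study: the function name `elastic_net_cd`, the fixed number of sweeps, and the simulated data are all hypothetical. Each coordinate update is the exact one-dimensional minimizer, $\beta_j = S(x_j' r_j, \lambda\alpha/2) / (x_j' x_j + \lambda(1-\alpha))$, where S is the soft-thresholding operator and $r_j$ is the partial residual.

```python
import numpy as np

def elastic_net_cd(X, y, lam, alpha, n_iter=1000):
    """Cyclic coordinate descent for
    ||y - X b||_2^2 + lam * (alpha * ||b||_1 + (1 - alpha) * ||b||_2^2).
    Runs a fixed number of sweeps; no convergence check is performed."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)                        # x_j' x_j for each column
    for _ in range(n_iter):
        for j in range(p):
            r_j = y - X @ beta + X[:, j] * beta[j]       # residual with x_j left out
            z = X[:, j] @ r_j
            shrunk = np.sign(z) * max(abs(z) - lam * alpha / 2.0, 0.0)
            beta[j] = shrunk / (col_sq[j] + lam * (1.0 - alpha))
    return beta

# Hypothetical example data.
rng = np.random.default_rng(4)
X = rng.standard_normal((200, 4))
y = X @ np.array([1.5, 0.0, -1.0, 0.0]) + 0.5 * rng.standard_normal(200)
lam = 10.0

beta_ridge_cd = elastic_net_cd(X, y, lam, alpha=0.0)     # alpha = 0: ridge
beta_ridge_closed = np.linalg.solve(X.T @ X + lam * np.eye(4), X.T @ y)
beta_ols_cd = elastic_net_cd(X, y, 0.0, alpha=0.5)       # lam = 0: OLS
print(beta_ridge_cd, beta_ridge_closed, beta_ols_cd)
```

With α = 0 the result should agree with the closed-form ridge estimator for the same λ, and with λ = 0 it should agree with OLS, whatever the value of α.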



Hence, the Ridge, Lasso and Elastic Net estimators can be written in a common form in the misspecified regression model as follows:

$$\hat{\beta}(\alpha) = \arg\min_{\beta} \left\{ \|y - X\beta\|_2^2 + \lambda\left[\alpha\|\beta\|_1 + (1 - \alpha)\|\beta\|_2^2\right] \right\} \qquad (8)$$

where

$$\hat{\beta}(\alpha) = \begin{cases} \hat{\beta}_{Ridge}, & \alpha = 0 \\ \hat{\beta}_{Elastic\ Net}, & 0 < \alpha < 1 \\ \hat{\beta}_{Lasso}, & \alpha = 1 \end{cases}$$

The MSE, which is the expected prediction error of the estimators, is given by:

$$MSE\left(\hat{\beta}(\alpha)\right) = \frac{1}{n}\left(y_{new} - X_{new}\hat{\beta}(\alpha)\right)'\left(y_{new} - X_{new}\hat{\beta}(\alpha)\right) \qquad (9)$$

where $(y_{new}, X_{new})$ denotes new observations that are not used to obtain the coefficient estimates $\hat{\beta}(\alpha)$.

In brief, the following are some salient distinctions between Lasso, Ridge and Elastic Net (Hastie et al., 2001):

• Lasso performs sparse selection, whereas Ridge does not.

• When variables are highly correlated, Ridge regression shrinks their coefficients towards one another, whereas Lasso is indifferent and generally picks one over the other. Depending on the context, one does not know which variable will be picked. Elastic Net is a compromise between the two that attempts to shrink and to do a sparse selection at the same time.

• Ridge estimators are indifferent to multiplicative scaling of the data: if both X and Y are multiplied by constants, the coefficients of the fit do not change for a given λ parameter. For Lasso, however, the fit is not independent of the scaling. In fact, the λ parameter must be scaled up by the multiplier to get the same result. It is more complicated for Elastic Net.

• Ridge penalizes the largest β's more than it penalizes the smaller ones (as they are squared in the penalty term), whereas Lasso penalizes them more uniformly. This may or may not be of consequence. In a forecasting problem with a strong predictor, Ridge shrinks the predictor's effectiveness more than Lasso does.

3 Results: Simulation Study  

Using R, we simulate the linear regression model for sample sizes n = 200, 1000, 10000, 50000 and 100000 observations and p = 10 independent variables. To explore the effects of different degrees of multicollinearity on the estimators, we choose $\rho = (0.70, 0.80, 0.90, 0.99)$, which represent low, moderate, high and severe multicollinearity. Following McDonald and Galarneau (1975), the independent variables are generated as:

$$x_{ij} = (1 - \rho^2)^{1/2} u_{ij} + \rho\, u_{i,p+1}, \quad i = 1, 2, \ldots, n; \; j = 1, 2, \ldots, p \qquad (10)$$

where $u_{ij}$ (j = 1, ..., p + 1) are independent standard normal pseudo-random numbers and $\rho$ is fixed, so that the theoretical correlation between any two independent variables is given by $\rho^2$.

The performance of the ridge, lasso and elastic net approaches is compared on the basis of the MSE value. Cross-validation is used to determine the value of λ for Ridge, Lasso and Elastic Net, and the resulting values are displayed in Tables 1-4.
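The generating scheme of McDonald and Galarneau (1975) referred to in equation (10) can be sketched as follows. Each variable mixes its own standard normal draw with one further draw that is shared by all variables of the same observation, and this shared component is what produces the pairwise correlation $\rho^2$. The study itself used R; the NumPy function `generate_collinear_X`, the seed and the sample size below are assumptions made for illustration.

```python
import numpy as np

def generate_collinear_X(n, p, rho, rng):
    """x_ij = sqrt(1 - rho^2) * u_ij + rho * u_{i,p+1}, with u standard normal."""
    u = rng.standard_normal((n, p + 1))
    # Column p (0-based) plays the role of the shared draw u_{i,p+1}.
    return np.sqrt(1.0 - rho ** 2) * u[:, :p] + rho * u[:, [p]]

rng = np.random.default_rng(5)
rho = 0.9
X = generate_collinear_X(n=100_000, p=10, rho=rho, rng=rng)

R = np.corrcoef(X, rowvar=False)
off_diag = R[~np.eye(10, dtype=bool)]    # all pairwise correlations
print(off_diag.mean(), rho ** 2)
```

For ρ = 0.9 the empirical pairwise correlations should be close to 0.81, so the four values of ρ used in the study correspond to theoretical correlations of 0.49, 0.64, 0.81 and about 0.98.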

The estimated MSE values of the ridge, lasso and elastic net estimators, together with the optimal values of the regularization parameter, for $\rho = 0.70$, $\rho = 0.80$, $\rho = 0.90$ and $\rho = 0.99$ are summarized in Tables 1–4.
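The cross-validated choice of λ and the out-of-sample MSE of equation (9) can be illustrated with a small k-fold routine. This is a sketch under assumptions and not the procedure of the study, which was carried out in R: ridge is used as the example estimator because it has a closed form, and the function names, the grid of λ values, the number of folds and the simulated data are all hypothetical.

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge estimator (X'X + lam * I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def mse(X_new, y_new, beta):
    """Mean squared prediction error on observations not used for fitting."""
    resid = y_new - X_new @ beta
    return float(resid @ resid) / len(y_new)

def cv_select_lambda(X, y, lambdas, k=5, seed=0):
    """Return (best lambda, mean CV MSE per lambda) from k-fold cross-validation."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    cv_mse = []
    for lam in lambdas:
        fold_mse = []
        for i in range(k):
            test = folds[i]
            train = np.concatenate([folds[m] for m in range(k) if m != i])
            beta = ridge(X[train], y[train], lam)
            fold_mse.append(mse(X[test], y[test], beta))
        cv_mse.append(np.mean(fold_mse))
    cv_mse = np.array(cv_mse)
    return lambdas[int(np.argmin(cv_mse))], cv_mse

# Hypothetical collinear data (pairwise correlation rho^2 with rho = 0.99).
rng = np.random.default_rng(6)
n, p, rho = 200, 10, 0.99
u = rng.standard_normal((n, p + 1))
X = np.sqrt(1.0 - rho ** 2) * u[:, :p] + rho * u[:, [p]]
y = X @ np.ones(p) + rng.standard_normal(n)

lambdas = np.array([0.0, 0.1, 1.0, 10.0, 100.0])
best_lam, cv_mse = cv_select_lambda(X, y, lambdas)
print(best_lam, cv_mse)
```

The same loop applies to Lasso or Elastic Net by swapping the fitting function, with an additional outer loop over α for the Elastic Net.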

Tables 1-3 show that Elastic Net outperforms Ridge and Lasso at n = 200, 1000, 10000, 50000 and 100000 observations when the degree of multicollinearity is low, moderate or high.

Table 4 shows that the Lasso method outperforms Ridge and Elastic Net when 𝜌 = 0.99 (severe multicollinearity) at n = 200, 1000 and 10000 observations, while at n = 50000 and 100000 observations the Elastic Net method is the best. Overall, the Elastic Net method is the most accurate estimator of the regression coefficients.

 

Table 1. MSE values and optimal value of the regularization parameter λ when ρ = 0.70

Method       α     n=200                 n=1000                n=10000               n=50000               n=100000
                   λ          MSE        λ          MSE        λ          MSE        λ          MSE        λ          MSE
Ridge        0     2.4943870  1.3542480  2.435132   1.77365    2.393143   2.071687   2.397843   1.90527    2.398618   1.925571
Elastic Net  0.1   0.8556920  0.8115550  0.5246333  1.09823    0.4280494  1.111337   0.3560721  1.026351   0.3561871  1.011122
             0.2   0.6812518  0.8255325  0.4584058  1.094004   0.3740143  1.111348   0.3414572  1.034078   0.3112236  1.013724
             0.3   0.4984487  0.8274622  0.3353999  1.086273   0.3003344  1.105096   0.2741909  1.030563   0.2499133  1.009879
             0.4   0.4502874  0.8562786  0.2760757  1.08463    0.2472124  1.10109    0.247698   1.034514   0.2257661  1.013563
             0.5   0.3953519  0.8766654  0.2919647  1.08985    0.2170523  1.101262   0.1981584  1.029075   0.1982224  1.014189
             0.6   0.3294599  0.8792247  0.2670257  1.091589   0.1808769  1.09714    0.1812321  1.031515   0.1651853  1.009974
             0.7   0.3099273  0.9037126  0.2085462  1.086522   0.1701533  1.100581   0.1704875  1.035552   0.155392   1.013765
             0.8   0.2251437  0.8662195  0.2002693  1.088843   0.1488841  1.098445   0.1359241  1.027507   0.135968   1.011472
             0.9   0.2410546  0.9089749  0.1953736  1.092268   0.1452446  1.103449   0.1208214  1.025793   0.1208604  1.008507
Lasso        1     0.2169491  0.9112325  0.1602154  1.088178   0.1307201  1.102091   0.1193412  1.031095   0.1087744  1.00979

Table 2. MSE values and optimal value of the regularization parameter λ when ρ = 0.80

Method       α     n=200                 n=1000                n=10000               n=50000               n=100000
                   λ          MSE        λ          MSE        λ          MSE        λ          MSE        λ          MSE
Ridge        0     3.072037   2.377661   2.65657    1.925118   2.72915    2.103151   2.720972   2.013501   2.710478   2.047791
Elastic Net  0.1   0.9602314  1.001544   0.6281431  1.120764   0.4881492  1.111701   0.4040558  1.034427   0.4417403  1.02362
             0.2   0.76448    0.9836472  0.5000907  1.101727   0.4265273  1.110792   0.3874714  1.040867   0.3516877  1.016339
             0.3   0.5593439  0.9327243  0.401574   1.091794   0.375896   1.112307   0.3111404  1.03481    0.3099403  1.01822
             0.4   0.5052988  0.9567745  0.3305453  1.086529   0.281922   1.100099   0.2810773  1.03758    0.2551193  1.014054
             0.5   0.4436518  0.9626045  0.3495691  1.09122    0.2716609  1.10776    0.2467856  1.03770    0.2458337  1.022227
             0.6   0.3697098  0.9457097  0.3197097  1.091918   0.2062727  1.096673   0.2257057  1.039964   0.2048614  1.016942
             0.7   0.3477909  0.9656147  0.2496922  1.085563   0.2129625  1.107571   0.2123244  1.044483   0.1927159  1.020984
             0.8   0.2772824  0.9290207  0.2184807  1.08461    0.1863422  1.105263   0.1692793  1.034339   0.1686264  1.018251
             0.9   0.2968779  0.9840657  0.2131398  1.086901   0.1817869  1.111765   0.1504705  1.031538   0.1498901  1.01628
Lasso        1     0.2434537  0.9478264  0.1918258  1.086458   0.1636082  1.110342   0.1354234  1.032729   0.1349011  1.014802



Table 3. MSE values and optimal value of the regularization parameter λ when ρ = 0.90

Method       α     n=200                 n=1000                n=10000               n=50000               n=100000
                   λ          MSE        λ          MSE        λ          MSE        λ          MSE        λ          MSE
Ridge        0     3.3935400  2.7443080  3.10191    2.095804   3.123437   2.490004   3.105966   2.234529   3.098342   2.260106
Elastic Net  0.1   1.0607240  1.1089750  0.8834351  1.183933   0.6131434  1.159529   0.4612261  1.037333   0.4600939  1.024278
             0.2   0.7694645  0.9855784  0.771914   1.164514   0.5357428  1.147965   0.4422951  1.041303   0.4020136  1.020438
             0.3   0.6781246  0.9557141  0.6198487  1.136541   0.4721469  1.141938   0.3897921  1.040322   0.3888352  1.028375
             0.4   0.5581806  0.9200789  0.6145527  1.144592   0.3886355  1.128757   0.3208472  1.034618   0.3200596  1.021084
             0.5   0.5378645  0.9313477  0.5921847  1.154328   0.3744903  1.137142   0.2817036  1.034401   0.3084104  1.029477
             0.6   0.4482204  0.9077476  0.4934872  1.135276   0.3120752  1.126321   0.2576411  1.037091   0.2570087  1.022753
             0.7   0.4216468  0.9163924  0.46423    1.140038   0.2935733  1.13074    0.2659968  1.050992   0.2417714  1.027734
             0.8   0.3361653  0.8828899  0.4062012  1.130843   0.2819218  1.138457   0.2120706  1.039766   0.21155    1.024674
             0.9   0.3599220  0.9240131  0.3610678  1.120571   0.2505971  1.13428    0.1885072  1.03851    0.1880445  1.022657
Lasso        1     0.2951528  0.8923225  0.324961   1.124729   0.2255374  1.131318   0.1861978  1.045797   0.1857407  1.030667

Table 4. MSE values and optimal value of the regularization parameter λ when ρ = 0.99

Method       α     n=200                 n=1000                n=10000               n=50000               n=100000
                   λ          MSE        λ          MSE        λ          MSE        λ          MSE        λ          MSE
Ridge        0     3.654888   3.0121850  3.546463   3.483199   3.585424   4.344516   3.557993   3.859463   3.544116   3.739378
Elastic Net  0.1   1.040925   1.0764300  0.8385574  1.334766   0.7038333  1.387571   0.6984484  1.251602   0.6339181  1.216287
             0.2   0.9982006  1.0413740  0.8825415  1.313373   0.7407508  1.391776   0.7350835  1.251386   0.6671684  1.224578
             0.3   0.8015574  0.9893414  0.853611   1.305417   0.7164684  1.361745   0.7109868  1.25647    0.7082139  1.246001
             0.4   0.7241091  0.9845812  0.8463177  1.295802   0.7796048  1.379147   0.6422896  1.222855   0.7021628  1.25335
             0.5   0.6357671  0.9766932  0.8155141  1.300482   0.6844922  1.35112    0.6189121  1.223221   0.676606   1.227457
             0.6   0.5814613  0.9817787  0.7458548  1.274246   0.6870611  1.359158   0.6212348  1.237149   0.6188119  1.221965
             0.7   0.4983954  0.9641743  0.7016354  1.266754   0.6463274  1.353982   0.5844038  1.235244   0.5821245  1.227421
             0.8   0.4786148  0.9831136  0.6737885  1.272795   0.6206756  1.358227   0.5612097  1.244024   0.5590209  1.236977
             0.9   0.4669148  1.0107910  0.6573174  1.281283   0.5517117  1.322116   0.5474906  1.260935   0.5453553  1.257195
Lasso        1     0.3828918  0.9524901  0.5915857  1.242952   0.4965405  1.296512   0.4927415  1.244684   0.4908198  1.24159
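The comparison reported for Table 4 can be checked mechanically by locating the smallest MSE in a column. The sketch below does this for the n = 200 and n = 50000 columns, with the MSE values copied from Table 4; the helper name `best_alpha` is an assumption made for this example.

```python
# MSE values copied from Table 4 (rho = 0.99); alpha = 0 is Ridge, alpha = 1 is Lasso.
alphas = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
mse_n200 = [3.0121850, 1.0764300, 1.0413740, 0.9893414, 0.9845812, 0.9766932,
            0.9817787, 0.9641743, 0.9831136, 1.0107910, 0.9524901]
mse_n50000 = [3.859463, 1.251602, 1.251386, 1.25647, 1.222855, 1.223221,
              1.237149, 1.235244, 1.244024, 1.260935, 1.244684]

def best_alpha(alphas, mse_values):
    """Return (alpha, MSE) for the row with the smallest MSE."""
    i = min(range(len(mse_values)), key=mse_values.__getitem__)
    return alphas[i], mse_values[i]

print(best_alpha(alphas, mse_n200))     # (1.0, 0.9524901)
print(best_alpha(alphas, mse_n50000))   # (0.4, 1.222855)
```

At n = 200 the minimum is attained by Lasso (α = 1), while at n = 50000 it is attained by an Elastic Net fit (α = 0.4), in line with the description of Table 4 in the text.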



4   Conclusion 

According to the simulation results for p = 10 and n = 200, 1000, 10000, 50000 and 100000 observations, with different degrees of multicollinearity among all independent variables, the findings can be summarized as follows:

• The Elastic Net method outperforms the Ridge and Lasso methods in estimating the regression coefficients when the degree of multicollinearity is low (ρ = 0.70), moderate (ρ = 0.80) and high (ρ = 0.90) at all sample sizes (n = 200, 1000, 10000, 50000 and 100000 observations).

• If the data include severe multicollinearity among all independent variables, the Lasso method outperforms the Ridge and Elastic Net methods at n = 200, 1000 and 10000 observations.

• For the larger sample sizes (n = 50000 and 100000 observations), the Elastic Net method outperforms the Ridge and Lasso methods when the data contain severe multicollinearity.

• The results suggest that the performance of these methods depends greatly on the value of α.

• Elastic Net regression outperformed the other two methods for low, moderate and high levels of multicollinearity when 0 < α < 1. It can also be inferred that severe multicollinearity requires a higher value (α = 1). This is consistent with the theoretical framework.

• The Ridge method was an unsuitable estimator of the regression coefficients compared with the Lasso and Elastic Net methods.

• In general, when studying relationships among interconnected economic and social factors, we recommend that decision makers use the Elastic Net method for any sample size.

• We also recommend that decision makers use the Lasso method when working with real data in which the variables are strongly related (severe multicollinearity) and the sample size is less than 10000 observations.

  

References:  

Biswas, S. (2019). How Regularization Helps in Data Overfitting. https://medium.com/towards-artificial-intelligence/how-regularization-can-help-in-overfitting-the-data-ad9ff80f9ccc

Chen, S. S., Donoho, D., & Saunders, M. (1998). Atomic decomposition by basis pursuit. SIAM Journal on Scientific Computing, 20(1), 33–61. https://doi.org/10.1137/S1064827596304010

Doreswamy, A., & Vastrad, C. M. (2013). Performance Analysis of Regularized Linear Regression Models for Oxazolines and Oxazoles Derivatives Descriptor Dataset. International Journal of Computational Science and Information Technology (IJCSITY), 1(4), 111-123. DOI: 10.5121/ijcsity.2013.1408

Draper, N. R., & Smith, H. (1998). Applied Regression Analysis. 3rd ed., New York: Wiley. https://doi.org/10.1002/9781118625590

Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1-22. DOI: 10.1163/ej.9789004178922.i-328.7

Gujarati, D. (1995). Basic Econometrics. 4th ed., New York: McGraw-Hill.

Hastie, T., Tibshirani, R., & Friedman, J. (2001). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed., Springer-Verlag New York Inc. https://www.springer.com/gp/book/9780387848570

Herawati, N., Nisa, K., Setiawan, E., & Nusyirwan, T. (2018). Regularized multiple regression methods to deal with severe multicollinearity. International Journal of Statistics and Applications, 8(4), 167-172. DOI: 10.5923/j.statistics.20180804.02

Hoerl, A. E., & Kennard, R. W. (2000). Ridge Regression: Biased Estimation for Nonorthogonal Problems. Technometrics, 42, 80-86. https://doi.org/10.2307/1271436

Hoerl, A. E. (1962). Application of ridge analysis to regression problems. Chem. Eng. Prog., 58, 54-59.

Hoerl, A. E., Kennard, R. W., & Baldwin, K. F. (1975). Ridge regression: Some simulations. Communications in Statistics, 4, 105-123. https://doi.org/10.1080/03610927508827232


James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning: With Applications in R. New York: Springer Publishing Company, Inc. https://link.springer.com/book/10.1007/978-1-4614-7138-7

Judge, G. G. (1988). Introduction to Theory and Practice of Econometrics. New York: John Wiley and Sons. https://doi.org/10.1002/jae.3950050311

Kutner, M. H., Nachtsheim, C. J., Neter, J., & Li, W. (2005). Applied Linear Statistical Models. 5th Edition. New York: McGraw-Hill. https://www.amazon.com/Applied-Linear-Statistical-Models-Michael/dp/007310874X

McDonald, G. C., & Galarneau, D. I. (1975). A Monte Carlo evaluation of some ridge type estimators. J. Amer. Statist. Assoc., 70, 407-416. https://www.tandfonline.com/doi/abs/10.1080/01621459.1975.10479882

Montgomery, D. C., & Peck, E. A. (1992). Introduction to Linear Regression Analysis. New York: John Wiley and Sons. https://doi.org/10.1111/biom.12129

Myers, R. H. (1986). Classical and modern regression with applications, 2nd Ed, USA: PWS-KENT Publishing Company. https://lib.ugent.be/catalog/rug01:000851135

Ogutu, J. O., Schulz-Streeck, T., & Piepho, H. P. (2012). Genomic selection using regularized linear regression models: Ridge regression, Lasso, Elastic Net and their extensions. BMC Proceedings, 6(2), 10. https://doi.org/10.1186/1753-6561-6-S2-S10

Slinker, B. K., & Glantz, S. A. (1985). Multiple regression for physiological data analysis: the problem of multicollinearity. American Journal of Physiology - Regulatory, Integrative and Comparative Physiology, 249(1), R1–R12. https://doi.org/10.1152/ajpregu.1985.249.1.R1

Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society: Series B, 58(1), 267-288. https://doi.org/10.1111/j.2517-6161.1996.tb02080.x

Zou, H., & Hastie, T. (2005). Regularization and variable selection via the Elastic Net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301-320. https://doi.org/10.1111/j.1467-9868.2005.00503.x
