Int. J. Anal. Appl. (2023), 21:40

Random and Fixed Effects Selection for Weighted Ridge

Lulah Alnaji∗

Department of Mathematics, College of Sciences, University of Hafr Al Batin, Saudi Arabia

∗Corresponding author: laalnaji@uhb.edu.sa

Abstract. We propose a method for selecting the fixed and random effects in linear mixed models using the penalized profiled log-likelihood and the penalized restricted profiled log-likelihood, respectively, together with a weighted ridge penalty term. First, we use the penalized restricted profiled log-likelihood to select the random effects for a chosen tuning parameter. Second, we use the penalized profiled log-likelihood to select the fixed effect parameters. Because there is no closed-form solution for the selection of the fixed and random effects, the Newton-Raphson technique is employed to estimate the parameters iteratively. A simulation study shows how well the proposed strategy works. Lastly, we apply the method to two real datasets to further evaluate it.

1. Introduction

With longitudinal data, each individual is observed repeatedly at various points in time. The independence assumption is therefore not appropriate, because the observations of each participant are correlated. The linear mixed model is a common choice for longitudinal research and is a helpful tool, since it incorporates random effects to account for the within-subject correlation [1].

As the dimension of the fixed and random effect parameters has grown over the past 20 years, new challenges have arisen. The problem of variable selection has been studied extensively, and many approaches have been put forth to narrow the set of parameters to those that are most important, such as Ridge Regression [2], the LASSO [3], the Adaptive LASSO [4], the Elastic Net [5] and SCAD [6], among many others.

Selection criteria such as the Akaike information criterion (AIC [7]) and the Bayesian information criterion (BIC [8]) have been used for variable selection and have been shown to provide consistent model selection rules.

Received: Mar. 4, 2023.

2020 Mathematics Subject Classification. 62J05, 62J07.

Key words and phrases. random effect; fixed effects; Akaike information criterion; variable selection; ridge.

https://doi.org/10.28924/2291-8639-21-2023-40
ISSN: 2291-8639

© 2023 the author(s).




In particular, BIC is asymptotically consistent for model selection. Many studies have extensively explored and used these asymptotic properties when the number of candidate regressors is fixed [9].

Several studies have focused on one-stage variable selection for the fixed effects using a penalized log-likelihood approach, for example fixed effects selection in the linear mixed-effects model using an adaptive ridge procedure for L0 penalty performance [10] and model selection in linear mixed effect models [11]. Although these methods are highly helpful, the fundamental features of the fixed and random effects are quite different, so the methods listed above might not reveal these differences.

The Adaptive LASSO has been the subject of various studies on variable selection. For example, [12] investigated fixed and random effects selection via REML and pathwise coordinate optimization, and [13] proposed a profile log-likelihood-based adaptive LASSO for linear mixed model selection, in which the fixed and random effects are selected using ML and REML procedures.

For the weighted ridge, [14] studied a weighted ridge procedure for L0 regularization in fixed effects models. In [10], the selection of fixed effects in the linear mixed-effects model was studied, and the fixed effects, random effects, and variance components were estimated using the weighted ridge approach for L0 penalty performance. To further improve on the behavior of the current penalized techniques, we provide a model selection procedure based on a weighted ridge penalty in mixed models for both random and fixed effects.

The remaining sections of this article are structured as follows. Section 2 presents variable selection for the linear mixed-effects model. The methodology of the weighted ridge is presented in Section 3. Section 4 presents simulation studies, Section 5 presents two real data applications, and Section 6 presents the important conclusions from this study.

2. Variable selection for the linear mixed-effects model

Maximum likelihood (ML) and restricted maximum likelihood (REML) are the major methods that have been proposed to estimate the parameters in (2.1) under the assumption that λj and εj are normally distributed. See Random-effects models for longitudinal data [15], Unbalanced repeated-measures models with structured covariance matrices [16], and Newton-Raphson and EM algorithms for linear mixed-effects models for repeated-measures data [17].

In this article, restricted maximum likelihood (REML) and Maximum likelihood (ML) are used to

select the random effects and the fixed effects, respectively.

2.1. Literature review of the classical linear mixed-effect model. In this section, we consider the classical linear mixed-effect model setting in order to establish a selection method for the weighted ridge mixed model:

\[ Y_j = X_j\psi + Z_j\lambda_j + \varepsilon_j, \qquad j = 1, \dots, m, \tag{2.1} \]



where $Y_j$ is the $m_j \times 1$ response vector for the observations of subject $j$, $\psi \in \mathbb{R}^r$ is the fixed effects vector corresponding to its $m_j \times r$ full rank design matrix $X_j$, $\lambda_j \in \mathbb{R}^l$ is the random effects vector corresponding to its $m_j \times l$ design matrix $Z_j$, and $\varepsilon_j$ is the $m_j \times 1$ vector of the model errors.

Denote the total number of observations by $M = \sum_{j=1}^{m} m_j$. Assume that $Y_1, \dots, Y_m$ are independent and that $\varepsilon_j$ and $\lambda_j$ are independent with $\varepsilon_j \sim N(0, \sigma^2 I_{m_j})$ and $\lambda_j \sim MVN(0, G)$; hence $Y_j \sim N(X_j\psi, \sigma^2 D_j(\alpha))$, where $D_j(\alpha) = Z_j G Z_j' + I_{m_j}\sigma^2$ and $\alpha$ is the vector of length $k = \tfrac{1}{2}\, l(l+1)$ that consists of the unknown covariance parameters which characterize the matrix $G$.
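
As a small worked illustration, which is not part of the original derivation, consider a random-intercept model, so that l = 1, Zj is a column of ones and G reduces to a single variance g. Taking the definitions above at face value, the number of covariance parameters and the matrix Dj (shown here for mj = 3) become:

```latex
k = \tfrac{1}{2}\, l(l+1) = \tfrac{1}{2}\cdot 1 \cdot 2 = 1,
\qquad
D_j(\alpha) = g\,\mathbf{1}_{m_j}\mathbf{1}_{m_j}' + \sigma^2 I_{m_j}
= \begin{pmatrix}
g + \sigma^2 & g & g \\
g & g + \sigma^2 & g \\
g & g & g + \sigma^2
\end{pmatrix}.
```

With l = 4 random effects, as in the simulation study of Section 4, the same formula gives k = (4 x 5)/2 = 10 covariance parameters.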

Model (2.1) is a classical model, and the fixed and random effect parameters can be estimated by the well-known methods of best linear unbiased estimation (BLUE) and best linear unbiased prediction (BLUP), respectively, using the maximum likelihood (ML) approach [18].

2.2. Selection of Random Effects Parameters. Random effects selection has not received as much attention as fixed effects selection, because the random nature of these effects makes it more difficult to estimate the structure of the variance-covariance matrix. Although the parameters for the fixed effects are not very sensitive to the choice of random effects, choosing the random effects incorrectly can affect how efficiently the fixed effects are estimated.

We implement the selection of the random effects by selecting the parameter α that maximizes the penalized restricted profiled log-likelihood, with the weighted ridge [14] as the penalty term:

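\[
\begin{aligned}
T_{ran}(\alpha) = {} & -\frac{1}{2}\log\left|\sum_{j=1}^{m} X_j' D_j X_j\right| - \frac{1}{2}\sum_{j=1}^{m}\log|D_j| \\
& - \frac{1}{2} M \log\left(\sum_{j=1}^{m} e_j' D_j^{-1} e_j\right) + \frac{1}{2} r \log\left(\sum_{j=1}^{m} e_j' D_j^{-1} e_j\right) - \gamma_{1m}\sum_{j=1}^{l} w_{1j} g_j^2
\end{aligned}
\tag{2.2}
\]

where $\gamma_{1m} \ge 0$ is the tuning parameter (also called the regularization parameter), $W_1 = \mathrm{diag}(w_{11}, \dots, w_{1l})$ is an $l \times l$ diagonal matrix and $w_{1j}$ is the $j$-th element of the diagonal of $W_1$ (see Section 3), $e_j = Y_j - X_j\hat{\psi}$, and $g_j$ is the $j$-th element of the diagonal of the matrix $G$.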

The primary interest lies in the selection of ψ and α. The proposed variable selection method has no closed-form solution and must be solved iteratively, for example by the popular Newton-Raphson method. The first and second derivatives required by the Newton-Raphson method can be found in detail in [16].
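
To make the structure of the criterion (2.2) concrete, the following minimal sketch evaluates it term by term for given matrices Dj. It is an illustration only, not the author's implementation: the function name, the argument layout (lists of NumPy arrays, one entry per subject) and the choice to pass the diagonal of G separately are all assumptions made here. The two log terms in (2.2) are combined as −(M − r)/2 times the log of the weighted residual sum of squares.

```python
import numpy as np


def penalized_restricted_profile_loglik(Y, X, D, psi_hat, g_diag, w1, gamma1):
    """Evaluate criterion (2.2) as printed (hypothetical helper).

    Y, X, D : lists over subjects of arrays with shapes (m_j,), (m_j, r), (m_j, m_j).
    psi_hat : (r,) current fixed-effects estimate.
    g_diag  : (l,) diagonal of G;  w1 : (l,) ridge weights;  gamma1 : tuning parameter.
    """
    r = X[0].shape[1]
    M = sum(len(Yj) for Yj in Y)

    # First term: log-determinant of sum_j X_j' D_j X_j, as written in (2.2).
    xdx = sum(Xj.T @ Dj @ Xj for Xj, Dj in zip(X, D))
    logdet_xdx = np.linalg.slogdet(xdx)[1]

    # Second term: sum_j log|D_j|.
    logdet_D = sum(np.linalg.slogdet(Dj)[1] for Dj in D)

    # Weighted residual sum of squares: sum_j e_j' D_j^{-1} e_j.
    quad = 0.0
    for Yj, Xj, Dj in zip(Y, X, D):
        e = Yj - Xj @ psi_hat
        quad += e @ np.linalg.solve(Dj, e)

    # Weighted ridge penalty on the diagonal elements of G.
    penalty = gamma1 * np.sum(w1 * g_diag ** 2)

    return (-0.5 * logdet_xdx - 0.5 * logdet_D
            - 0.5 * (M - r) * np.log(quad) - penalty)
```

In practice this function would be called inside an optimizer over α, with each Dj rebuilt from the current α before every evaluation.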



2.3. Selection of Fixed Effects Parameters. To select the important covariates of the fixed effects,

we propose to maximize the penalized profiled log-likelihood

\[
T_{fix}(\psi) = -\frac{1}{2}\sum_{j=1}^{m}\log|D_j| - \frac{1}{2} M \log\left(\sum_{j=1}^{m}(Y_j - X_j\psi)' D_j^{-1}(Y_j - X_j\psi)\right) - \gamma_{2n}\sum_{j=1}^{r} w_{2j}\psi_j^2
\tag{2.3}
\]

where $\gamma_{2n} \ge 0$ is the tuning parameter (also called the regularization parameter), $W_2 = \mathrm{diag}(w_{21}, \dots, w_{2r})$ is an $r \times r$ diagonal matrix and $w_{2j}$ is the $j$-th element of the diagonal of $W_2$ (see Section 3). Note that at this step, the penalized profiled log-likelihood (2.3) is a function of ψ only, since the matrix Dj has already been selected in the first stage using the penalized restricted profiled log-likelihood (2.2).

After the matrix Dj has been selected in the first stage by maximizing the penalized restricted profiled log-likelihood (2.2), the primary interest in the second stage lies in the selection of the covariates of ψ. As before, the proposed variable selection method has no closed-form solution and can be solved iteratively by the Newton-Raphson method of the form

\[ \psi^{t+1} = \psi^{t} - H_{\psi\psi}^{-1} S_{\psi}, \qquad t = 0, 1, 2, \dots \tag{2.4} \]

where $\psi^{t}$ is the current iterate for (2.3), $\psi^{t+1}$ is the updated one, $S_{\psi}$ is the score vector and $H_{\psi\psi}$ is the Hessian matrix, both evaluated at $\psi^{t}$.
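
For concreteness, the score and Hessian used in the Newton-Raphson update can be written out by differentiating (2.3) directly. The following is a sketch obtained from (2.3) as stated, with each Dj symmetric and held fixed from the first stage and the weights W2 held fixed within a step; the shorthand Q, A and B is introduced here for illustration only and does not appear in the original.

```latex
Q(\psi) = \sum_{j=1}^{m} (Y_j - X_j\psi)' D_j^{-1} (Y_j - X_j\psi), \qquad
A(\psi) = \sum_{j=1}^{m} X_j' D_j^{-1} (Y_j - X_j\psi), \qquad
B = \sum_{j=1}^{m} X_j' D_j^{-1} X_j,

S_{\psi} = \frac{\partial T_{fix}}{\partial \psi}
         = \frac{M}{Q(\psi)}\, A(\psi) - 2\gamma_{2n} W_2 \psi,

H_{\psi\psi} = \frac{\partial^2 T_{fix}}{\partial \psi\, \partial \psi'}
             = -\frac{M}{Q(\psi)}\, B + \frac{2M}{Q(\psi)^2}\, A(\psi) A(\psi)' - 2\gamma_{2n} W_2 .
```

Because of the rank-one term in the Hessian, negative definiteness is not guaranteed far from the maximizer, so a starting value near the generalized least squares estimate (where A vanishes) is likely a safer choice than an arbitrary one.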

3. The methodology of the weighted ridge

[14] proposed a weighted ridge strategy that improves the performance of the L0-penalty for fixed effect models, motivated by Least absolute shrinkage is equivalent to quadratic penalization [19], One-step sparse estimates in non-concave penalized likelihood models [20] and Visualization of genomic changes by segmented smoothing using an L0 penalty [21]. In 2017, the approach of [14] was extended to the linear mixed effects model by [10], who proposed a selection strategy for the fixed effects in which the covariance matrix of the random effects is Cholesky factorized. In this article, we propose a selection approach for both the fixed and random effects using an iteratively weighted ridge strategy, following the procedure for the weight matrix in [14].

Consider the objective functions (2.2) and (2.3), where the weights $w_{1j}$ and $w_{2j}$ are calculated by

\[ w_{1j}^{(t)} = \frac{1}{|\alpha_j^{(t)}|^2 + \delta^2} \tag{3.1} \]

and

\[ w_{2j}^{(t)} = \frac{1}{|\psi_j^{(t)}|^2 + \delta^2} \tag{3.2} \]

where w1j is the j-th element of diag(W1), αj is the j-th element of the diagonal of the covariance matrix selected in the first stage, t is the iteration number, δ is a constant, w2j is the j-th element of diag(W2) and ψj is the j-th element of the vector of fixed effect parameters. In numerical practice, small positive choices for δ seem to perform better than δ = 0 (for further details see [10], [14], [21], [22]).
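
The effect of the weights (3.1) and (3.2) can be seen in a short numerical sketch. The function names and the value of δ below are illustrative assumptions, not taken from the paper. With weights computed from the current estimate, each term w_j ψ_j² is close to 1 for a clearly nonzero coefficient and equal to 0 for a zero coefficient, so the weighted ridge penalty roughly counts the nonzero coefficients, which is the sense in which it mimics an L0 penalty.

```python
def adaptive_ridge_weights(values, delta=1e-5):
    """Weights of the form (3.1)/(3.2): w_j = 1 / (|v_j|^2 + delta^2).

    `delta` is a small positive constant; its default here is an assumption.
    """
    return [1.0 / (abs(v) ** 2 + delta ** 2) for v in values]


def weighted_ridge_penalty(values, weights, gamma):
    """Penalty term gamma * sum_j w_j * v_j^2 used in (2.2) and (2.3)."""
    return gamma * sum(w * v * v for w, v in zip(weights, values))


# Illustrative current estimate: two active and two inactive coefficients.
psi = [1.2, -0.8, 0.0, 0.0]
weights = adaptive_ridge_weights(psi)
terms = [w * p * p for w, p in zip(weights, psi)]
penalty = weighted_ridge_penalty(psi, weights, gamma=1.0)
```

In an iterative scheme the weights would be recomputed from the updated estimate after every step, so that coefficients drifting toward zero receive ever larger weights.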

Although selecting the tuning parameter γ adds computational work, since it requires a search over a grid of candidate values of γ, it is an important step: the choice of the tuning parameter has a strong influence on the performance of penalized methods.

Selection criteria such as the Akaike information criterion (AIC [7]), the Bayesian information criterion (BIC [8]) and Generalized Cross Validation (GCV [23]) have been used for variable selection. For comparison, we employ all three criteria for the fixed and random effect parameters.
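
The grid search over γ can be sketched as follows. The paper does not state the exact form of the criteria it uses, so this sketch assumes the textbook definitions AIC = −2ℓ + 2·df and BIC = −2ℓ + log(n)·df, with df taken to be the number of selected parameters; the function names and the candidate tuple layout are likewise hypothetical. GCV is omitted because it needs the effective degrees of freedom of the fitted smoother, which are not specified here.

```python
import math


def information_criteria(loglik, df, n_obs):
    """Textbook AIC and BIC for a fitted model (assumed forms)."""
    return {
        "AIC": -2.0 * loglik + 2.0 * df,
        "BIC": -2.0 * loglik + math.log(n_obs) * df,
    }


def select_gamma(candidates, n_obs, criterion="BIC"):
    """Return the gamma with the smallest criterion value.

    candidates: list of (gamma, loglik, df) tuples, one per grid point,
    where loglik and df come from the model fitted at that gamma.
    """
    best = min(
        candidates,
        key=lambda c: information_criteria(c[1], c[2], n_obs)[criterion],
    )
    return best[0]


# Made-up grid results, for illustration only.
grid = [(0.01, -100.0, 9), (0.1, -101.0, 4), (1.0, -120.0, 1)]
best_gamma = select_gamma(grid, n_obs=150, criterion="BIC")
```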

4. Simulation studies

A simulation study is conducted to examine the asymptotic properties and the performance of our newly developed method. All of the simulated data are generated according to model (2.1) using the R statistical software. Following the examples in [24], the simulation study assumes repeated observations for each subject.

(I) Assume mj = 5 observations for each subject j with m = 30 subjects, and consider the true fixed effects vector ψ = (1,1,0,0,0,0,0,0,0) with r = 9. We further consider l = 4 random effects, with normal distributions for the model errors and the random effects, εj ∼ N(0, σ²I) and λj = (λj0, 0)′ with λj0 ∼ N(0, G), where σ² = 1 and the true covariance matrix is

\[
G = \begin{pmatrix}
9 & 4.8 & 0.6 & 0 \\
4.8 & 4 & 1 & 0 \\
0.6 & 1 & 1 & 0 \\
0 & 0 & 0 & 0
\end{pmatrix}
\]

The entries of the design matrix Xj are drawn from a uniform(−2, 2) distribution. The first column of the matrix Zj consists of ones for the subject-specific intercept, while the remaining columns are also drawn from a uniform(−2, 2) distribution.

(II) This case follows case (I) with an increased sample size. Assume m = 60 with mj = 10 and generate 200 datasets following the same methodology.

(III) This case follows case (I) with an increased sample size. Assume m = 60 with mj = 10 and generate 500 datasets following the same methodology.
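
The data-generating process of case (I) can be sketched in a few lines. The paper's simulations were run in R; the Python version below is only an illustration under stated assumptions: the function names, the seed and the dictionary layout are hypothetical, and the fourth random effect is set exactly to zero because the last row and column of G are zero. The remaining three random effects are drawn through a hand-written Cholesky factor of the nonzero 3 x 3 block of G.

```python
import math
import random

PSI = [1, 1, 0, 0, 0, 0, 0, 0, 0]          # true fixed effects, r = 9
G_BLOCK = [[9.0, 4.8, 0.6],
           [4.8, 4.0, 1.0],
           [0.6, 1.0, 1.0]]                 # nonzero 3 x 3 block of G
SIGMA = 1.0                                 # error standard deviation


def cholesky(a):
    """Lower-triangular factor L with L L' = a (a must be positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            s = sum(low[i][p] * low[k][p] for p in range(k))
            if i == k:
                low[i][k] = math.sqrt(a[i][i] - s)
            else:
                low[i][k] = (a[i][k] - s) / low[k][k]
    return low


def simulate_case_one(m=30, m_j=5, seed=0):
    """Generate one dataset from model (2.1) under the case (I) settings."""
    rng = random.Random(seed)
    low = cholesky(G_BLOCK)
    subjects = []
    for _ in range(m):
        z = [rng.gauss(0.0, 1.0) for _ in range(3)]
        # Correlated random effects; the fourth effect has zero variance.
        lam = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(3)]
        lam.append(0.0)
        X = [[rng.uniform(-2, 2) for _ in PSI] for _ in range(m_j)]
        Z = [[1.0] + [rng.uniform(-2, 2) for _ in range(3)] for _ in range(m_j)]
        Y = [
            sum(x * p for x, p in zip(X[i], PSI))
            + sum(zv * lv for zv, lv in zip(Z[i], lam))
            + rng.gauss(0.0, SIGMA)
            for i in range(m_j)
        ]
        subjects.append({"Y": Y, "X": X, "Z": Z, "lambda": lam})
    return subjects


data = simulate_case_one()
```

Cases (II) and (III) would use the same function with m=60 and m_j=10, called once per replication with a different seed.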



Table 1. The simulation results for case (I)

Criteria    BIC   AIC   GCV
%Correct     64    62    65
%CR          69    67    58
%CF          73    70    70

Table 2. The simulation results for case (II)

Criteria    BIC   AIC   GCV
%Correct     87    82    85
%CR          90    77    71
%CF          94    69    71

Table 3. The simulation results for case (III)

Criteria    BIC   AIC   GCV
%Correct     90    83    85
%CR          90    80    75
%CF          97    73    78

Table 4. The simulation results for case (I) compared with some existing studies (m = 30, mj = 5)

            WRidge   [13]   [12]
%Correct        64     73     61
%CR             69     81     79
%CF             73     88     79

%Correct denotes the percentage of times that the correct model (fixed and random effects) is selected, %CR denotes the percentage of times that the correct random effects are selected, and %CF denotes the percentage of times that the correct fixed effects are selected.

Table 5. The simulation results for case (II) compared with some existing studies (m = 60, mj = 10)

            WRidge   [13]   [12]
%Correct        87     92     88
%CR             90     92     91
%CF             94    100     97

While there is a large body of literature on the estimation of the parameters, only a few references have studied parameter selection. In particular, the selection of the random effects has received less attention than the selection of the fixed effects. [14] studied a one-stage weighted ridge procedure for L0 regularization in fixed effects models. Also, [10] studied the selection of fixed effects, together with the estimation of the fixed effects, random effects and variance components, in the linear mixed-effects model using a weighted ridge procedure for L0 penalty performance.

In Tables 1 and 2, we use the AIC, BIC and GCV criteria, and it can be seen that the empirical results confirm the asymptotic properties; that is, the selection percentages increase as the sample size increases. In Tables 4 and 5, we use the BIC criterion and compare the results of our newly proposed approach (WRidge) with the results of [13] and [12]. The selection percentages for our method are somewhat lower, which is due to the nature of the method: the weighted ridge, unlike the LASSO, does not shrink coefficients exactly to zero.

Overall, the newly proposed weighted ridge method performs well across the simulation experiments for model selection of both fixed and random effects.
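
The three percentages reported in the tables (%Correct, %CR and %CF) can be computed from the selected index sets of each replication as sketched below. This assumes that "correct" means the selected set coincides exactly with the true set of nonzero effects; that reading, the function name and the example inputs are assumptions made for illustration.

```python
def selection_rates(selected, true_fixed, true_random):
    """Percentages of replications with correct selection.

    selected: list of (fixed_indices, random_indices) pairs, one per replication.
    true_fixed, true_random: index sets of the truly nonzero effects.
    """
    n = len(selected)
    true_fixed, true_random = set(true_fixed), set(true_random)
    fixed_ok = [set(f) == true_fixed for f, _ in selected]
    random_ok = [set(r) == true_random for _, r in selected]
    both_ok = [f and r for f, r in zip(fixed_ok, random_ok)]
    return {
        "%Correct": 100.0 * sum(both_ok) / n,
        "%CR": 100.0 * sum(random_ok) / n,
        "%CF": 100.0 * sum(fixed_ok) / n,
    }


# Four made-up replications; the true model has fixed effects {0, 1}
# and random effects {0, 1, 2}, matching the case (I) design.
replications = [
    ({0, 1}, {0, 1, 2}),
    ({0, 1, 2}, {0, 1, 2}),
    ({0, 1}, {0, 1}),
    ({0, 1}, {0, 1, 2}),
]
rates = selection_rates(replications, true_fixed={0, 1}, true_random={0, 1, 2})
```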

5. Real Data Applications

To show the efficiency of the suggested penalized technique in mixed model selection, we apply it to two datasets.

5.1. First Dataset. We use data from the Amsterdam Growth and Health Study, a distinctive, interdisciplinary cohort study established to investigate development and health among teenagers [25]. The data were gathered to investigate the connection between lifestyle and health in adolescence and early adulthood. The study included 147 people, who were assessed at six different time points, for a total of 882 observations. The five factors considered are age, gender, body fat, fitness, and smoking. This paper follows [26] and [13] for comparison purposes. The random effects are permitted to have an intercept, but the fixed effects are not: the response variable cholesterol has been centered and all of the inputs have been normalized, and hence the fixed effects have no intercept. Table 6 shows that body fat and time are selected by all approaches as important fixed effects. For random effects selection, gender is selected by all approaches, while fitness is selected by the SAW and WRidge methods and smoking is selected by HARD only.

Table 6. Comparison of All Methods.

             Fixed Effect                      Random Effect
             HARD    SAW     PS      WRidge    HARD    SAW     PS      WRidge
intercept    -       -       -       -         0.405   0.347   0.017   0.211
fitness      0       0       0       0         0       0.006   0       0.001
body_fat     0.174   0.165   0.170   0.168     0       0       0       0
smoking      0       0       0       0         0.149   0       0       0
gender       0       0       0       0         0.668   0.624   0.888   0.691
time         0.156   0.167   0.165   0.161     0       0       0       0

5.2. Second Dataset. The data were collected from 27 children to study the distance (mm) between the center of the pituitary gland and the pterygomaxillary fissure for both genders [27]. The distances were measured in 16 boys and 11 girls at ages 8, 10, 12, and 14. The purpose of the study was to examine the basic age functions for boys and girls before describing the gap between boys and girls as a function of age. The two factors are gender and age, while distance is the response. The response variable distance has been centered and all of the inputs have been normalized, and hence there is no intercept for the fixed effects.

Table 7. Results of WRidge Selection Method.

             Fixed Effect    Random Effect
             WRidge          WRidge
intercept    -               0.021
gender       0.623           0.861
age          0.117           0

Table 7 shows that gender and age are selected as important fixed effects. For random effects selection, gender is selected by the newly proposed method as an important random effect.

6. Conclusion

In this paper, we propose weighted ridge selection for a linear mixed model, focusing on the case of longitudinal data with M observations coming from m subjects. The simulation studies in low-dimensional settings show that the newly established method is efficient in general, and that the percentages of times the random effects, the fixed effects, and both combined are correctly selected are high.

Conflicts of Interest: The author declares that there are no conflicts of interest regarding the publication of this paper.

References

[1] G. Verbeke, Linear Mixed Models for Longitudinal Data, in: Linear Mixed Models in Practice, Springer, New York, NY, 1997: pp. 63-153. https://doi.org/10.1007/978-1-4612-2294-1_3.

[2] A.E. Hoerl, R.W. Kennard, Ridge Regression: Biased Estimation for Nonorthogonal Problems, Technometrics. 12 (1970), 55-67.

[3] R. Tibshirani, Regression Shrinkage and Selection via the Lasso, J. R. Stat. Soc.: Ser. B (Methodol.) 58 (1996), 267-288. https://doi.org/10.1111/j.2517-6161.1996.tb02080.x.

[4] H. Zou, The Adaptive Lasso and Its Oracle Properties, J. Amer. Stat. Assoc. 101 (2006), 1418-1429. https://doi.org/10.1198/016214506000000735.

[5] H. Zou, T. Hastie, Regularization and Variable Selection via the Elastic Net, J. R. Stat. Soc. Ser. B: Stat. Methodol. 67 (2005), 301-320. https://doi.org/10.1111/j.1467-9868.2005.00503.x.

[6] J. Fan, R. Li, Variable Selection via Nonconcave Penalized Likelihood and Its Oracle Properties, J. Amer. Stat. Assoc. 96 (2001), 1348-1360. https://doi.org/10.1198/016214501753382273.

[7] H. Akaike, A New Look at the Statistical Model Identification, IEEE Trans. Automat. Contr. 19 (1974), 716-723. https://doi.org/10.1109/tac.1974.1100705.

[8] G. Schwarz, Estimating the Dimension of a Model, Ann. Stat. 6 (1978), 461-464. https://www.jstor.org/stable/2958889.

[9] Y. Yang, Can the Strengths of AIC and BIC Be Shared? A Conflict Between Model Identification and Regression Estimation, Biometrika. 92 (2005), 937-950. https://doi.org/10.1093/biomet/92.4.937.

[10] E. Adjakossa, G. Nuel, Fixed Effects Selection in the Linear Mixed-Effects Model Using Adaptive Ridge Procedure for L0 Penalty Performance, arXiv:1705.01308. (2017). https://doi.org/10.48550/ARXIV.1705.01308.

[11] H. Peng, Y. Lu, Model Selection in Linear Mixed Effect Models, J. Multivar. Anal. 109 (2012), 109-129. https://doi.org/10.1016/j.jmva.2012.02.005.

[12] B. Lin, Z. Pang, J. Jiang, Fixed and Random Effects Selection by REML and Pathwise Coordinate Optimization, J. Comput. Graph. Stat. 22 (2013), 341-355. https://doi.org/10.1080/10618600.2012.681219.

[13] J. Pan, J. Shang, Adaptive LASSO for Linear Mixed Model Selection via Profile Log-Likelihood, Commun. Stat. - Theory Methods. 47 (2017), 1882-1900. https://doi.org/10.1080/03610926.2017.1332219.

[14] F. Frommlet, G. Nuel, An Adaptive Ridge Procedure for L0 Regularization, PLoS ONE. 11 (2016), e0148620. https://doi.org/10.1371/journal.pone.0148620.

[15] N.M. Laird, J.H. Ware, Random-Effects Models for Longitudinal Data, Biometrics. 38 (1982), 963. https://doi.org/10.2307/2529876.

[16] R.I. Jennrich, M.D. Schluchter, Unbalanced Repeated-Measures Models with Structured Covariance Matrices, Biometrics. 42 (1986), 805-820. https://doi.org/10.2307/2530695.

[17] M.J. Lindstrom, D.M. Bates, Newton-Raphson and EM Algorithms for Linear Mixed-Effects Models for Repeated-Measures Data, J. Amer. Stat. Assoc. 83 (1988), 1014-1022. https://doi.org/10.1080/01621459.1988.10478693.


[18] D.A. Harville, Maximum Likelihood Approaches to Variance Component Estimation and to Related Problems, J. Amer. Stat. Assoc. 72 (1977), 320-338.

[19] Y. Grandvalet, Least Absolute Shrinkage is Equivalent to Quadratic Penalization, in: L. Niklasson, M. Boden, T. Ziemke (Eds.), ICANN 98, Springer London, London, 1998: pp. 201-206. https://doi.org/10.1007/978-1-4471-1599-1_27.

[20] P. Bühlmann, L. Meier, H. Zou, Discussion of "One-Step Sparse Estimates in Nonconcave Penalized Likelihood Models" by H. Zou and R. Li, Ann. Stat. 36 (2008), 1534-1541.

[21] R.C.A. Rippe, J.J. Meulman, P.H.C. Eilers, Visualization of Genomic Changes by Segmented Smoothing Using an L0 Penalty, PLoS ONE. 7 (2012), e38230. https://doi.org/10.1371/journal.pone.0038230.

[22] E.J. Candes, M.B. Wakin, S.P. Boyd, Enhancing Sparsity by Reweighted ℓ1 Minimization, J. Fourier Anal. Appl. 14 (2008), 877-905. https://doi.org/10.1007/s00041-008-9045-x.

[23] M. Stone, An Asymptotic Equivalence of Choice of Model by Cross-Validation and Akaike's Criterion, J. R. Stat. Soc.: Ser. B (Methodol.) 39 (1977), 44-47. https://doi.org/10.1111/j.2517-6161.1977.tb01603.x.

[24] H.D. Bondell, A. Krishna, S.K. Ghosh, Joint Variable Selection for Fixed and Random Effects in Linear Mixed-Effects Models, Biometrics. 66 (2010), 1069-1077. https://doi.org/10.1111/j.1541-0420.2010.01391.x.

[25] J.W.R. Twisk, H.C.G. Kemper, G.J. Mellenbergh, Longitudinal Development of Lipoprotein Levels in Males and Females Aged 12-28 Years: The Amsterdam Growth and Health Study, Int. J. Epidemiol. 24 (1995), 69-77. https://doi.org/10.1093/ije/24.1.69.

[26] M. Ahn, H.H. Zhang, W. Lu, Moment-Based Method for Random Effects Selection in Linear Mixed Models, Stat. Sinica. 22 (2012), 1539-1562. https://doi.org/10.5705/ss.2011.054.

[27] R.F. Potthoff, S.N. Roy, A Generalized Multivariate Analysis of Variance Model Useful Especially for Growth Curve Problems, Biometrika. 51 (1964), 313-326. https://doi.org/10.1093/biomet/51.3-4.313.

