Int. J. Anal. Appl. (2023), 21:40

Random and Fixed Effects Selection for Weighted Ridge

Lulah Alnaji∗
Department of Mathematics, College of Sciences, University of Hafr Al Batin, Saudi Arabia
∗Corresponding author: laalnaji@uhb.edu.sa

Abstract. We propose a method for selecting the fixed and random effects in linear mixed models, using the penalized profiled log-likelihood and the penalized restricted profiled log-likelihood, respectively, each combined with a weighted ridge penalty term. First, we use the penalized restricted profiled log-likelihood to select the random effects for the chosen tuning parameter. Second, we use the penalized profiled log-likelihood to select the fixed effect parameters. Because there is no closed-form solution for the selection of the fixed and random effects, the Newton-Raphson technique is employed to estimate the parameters iteratively. A simulation study shows how well the suggested strategy works. Lastly, we apply the method to two separate datasets to further evaluate the newly proposed model.

1. Introduction

With longitudinal data, each individual is followed repeatedly across several points in time. The independence assumption is therefore not appropriate, because the observations of each participant are correlated. The linear mixed model is a common option for longitudinal research and is a helpful tool, since it incorporates random effects to account for the within-subject correlation [1]. Over the past 20 years, the increasing dimension of the fixed and random effect parameters has created certain challenges. The issue of variable selection has been researched, and many different approaches have been put forth to narrow the selection of parameters to those that are most important, such as Ridge Regression [2], the LASSO [3], the Adaptive LASSO [4], the Elastic Net [5] and SCAD [6], among many others.
Received: Mar. 4, 2023.
2020 Mathematics Subject Classification. 62J05, 62J07.
Key words and phrases. random effect; fixed effects; Akaike information criterion; variable selection; ridge.
https://doi.org/10.28924/2291-8639-21-2023-40 ISSN: 2291-8639
© 2023 the author(s).

Selection criteria such as the Akaike information criterion (AIC [7]) and the Bayesian information criterion (BIC [8]) have been used for variable selection and have been shown to provide consistent model selection rules. In particular, BIC is asymptotically consistent for model selection. Many studies have extensively explored and used these asymptotic properties when the number of candidate regressors is fixed [9]. One-stage variable selection for the fixed effects using a penalized log-likelihood approach has received considerable attention, for example fixed effects selection in the linear mixed-effects model using an adaptive ridge procedure for L0 penalty performance [10] and model selection in linear mixed effect models [11]. Although these methods are highly helpful, the fundamental features of the fixed and random effects are quite different, and the methods listed above might not reveal these differences.

The adaptive LASSO has been the subject of various studies on variable selection. For example, [12] investigated fixed and random effects selection via REML and pathwise coordinate optimization, and [13] proposed a profile log-likelihood-based adaptive LASSO for linear mixed model selection, discussing the selection of the fixed and random effects using ML and REML procedures. For the weighted ridge, [14] studied a weighted ridge procedure for L0 regularization in fixed models. In [10], the linear mixed-effects model was used to study the selection of fixed effects, and the fixed effects, random effects, and variance components were estimated using the weighted ridge approach for L0 penalty performance.
We provide a model selection process based on a weighted ridge penalty in mixed models, for both the random and the fixed effects, to further improve on the behavior of the current penalized techniques.

The remaining sections of this article are structured as follows. Section 2 presents variable selection for the linear mixed-effects model. The methodology of the weighted ridge is presented in Section 3. Section 4 presents simulation studies, Section 5 presents real data applications, and Section 6 presents the main conclusions of this study.

2. Variable selection for the linear mixed-effects model

Maximum likelihood (ML) and restricted maximum likelihood (REML) are the major methods that have been proposed to estimate the parameters in (2.1) when $\lambda_j$ and $\varepsilon_j$ are assumed to be normally distributed. See Random-effects models for longitudinal data [15], Unbalanced repeated-measures models with structured covariance matrices [16], and Newton-Raphson and EM algorithms for linear mixed-effects models for repeated-measures data [17]. In this article, REML and ML are used to select the random effects and the fixed effects, respectively.

2.1. Literature review of classical linear mixed-effect model. In this section, we consider the classical linear mixed-effect model setting to establish a selection method for the weighted ridge mixed model:

$$Y_j = X_j\psi + Z_j\lambda_j + \varepsilon_j, \quad j = 1, \dots, m, \tag{2.1}$$

where $Y_j$ is the $m_j \times 1$ response vector for the observations of subject $j$, $\psi \in \mathbb{R}^r$ is the fixed effects vector corresponding to its $m_j \times r$ full rank design matrix $X_j$, $\lambda_j \in \mathbb{R}^l$ is the random effects vector corresponding to its $m_j \times l$ design matrix $Z_j$, and $\varepsilon_j$ is the $m_j \times 1$ vector of model errors. Denote the total number of observations by $M = \sum_{j=1}^{m} m_j$.
Assume that $Y_1, \dots, Y_m$ are independent and that $\varepsilon_j$ and $\lambda_j$ are independent with $\varepsilon_j \sim N(0, \sigma^2 I_{m_j})$ and $\lambda_j \sim MVN(0, G)$; hence $Y_j \sim N(X_j\psi, \sigma^2 D_j(\alpha))$, where $D_j(\alpha) = Z_j G Z_j' + I_{m_j}\sigma^2$ and $\alpha$ is the vector of length $k = l(l+1)/2$ that consists of the unknown covariance parameters characterizing the matrix $G$. Model (2.1) is a classical model, and the fixed and random effect parameters can be estimated with the well-known methods of best linear unbiased estimation (BLUE) and best linear unbiased prediction (BLUP), respectively, using the maximum likelihood (ML) approach [18].

2.2. Selection of Random Effects Parameters. Because of the random nature of these effects, which makes the structure of the variance-covariance matrix harder to estimate, random effects selection has not received as much attention as fixed effects selection. Although the fixed effect parameters are not very sensitive to the choice of random effects, choosing the random effects incorrectly can affect how well the fixed effects are estimated. We implement the selection of the random effects by selecting the parameter $\alpha$ that maximizes the penalized restricted profiled log-likelihood, with the weighted ridge [14] as the penalty term:

$$T_{ran}(\alpha) = -\frac{1}{2}\log\left|\sum_{j=1}^{m} X_j' D_j X_j\right| - \frac{1}{2}\sum_{j=1}^{m}\log|D_j| - \frac{1}{2} M \log\left(\sum_{j=1}^{m} e_j' D_j^{-1} e_j\right) + \frac{1}{2} r \log\left(\sum_{j=1}^{m} e_j' D_j^{-1} e_j\right) - \gamma_{1m}\sum_{j=1}^{l} w_{1j} g_j^2 \tag{2.2}$$

where $\gamma_{1m} \ge 0$ is the tuning parameter (also called the regularization parameter), $W_1 = \mathrm{diag}(w_{11}, \dots, w_{1l})$ is an $l \times l$ diagonal matrix, $w_{1j}$ is the $j$-th element of the diagonal of $W_1$ (see Section 3), $e_j = Y_j - X_j\hat{\psi}$, and $g_j$ is the $j$-th element of the diagonal of the matrix $G$. The primary interest lies in the selection of $\psi$ and $\alpha$. The proposed selection method has no closed-form solution and can be solved iteratively; Newton-Raphson, for example, is a popular iterative method for this purpose.
The first and second derivatives needed for the Newton-Raphson method can be found in detail in [16].

2.3. Selection of Fixed Effects Parameters. To select the important covariates of the fixed effects, we propose to maximize the penalized profiled log-likelihood

$$T_{fix}(\psi) = -\frac{1}{2}\sum_{j=1}^{m}\log|D_j| - \frac{1}{2} M \log\left(\sum_{j=1}^{m} (Y_j - X_j\psi)' D_j^{-1} (Y_j - X_j\psi)\right) - \gamma_{2n}\sum_{j=1}^{r} w_{2j}\psi_j^2 \tag{2.3}$$

where $\gamma_{2n} \ge 0$ is the tuning parameter (also called the regularization parameter), $W_2 = \mathrm{diag}(w_{21}, \dots, w_{2r})$ is an $r \times r$ diagonal matrix, and $w_{2j}$ is the $j$-th element of the diagonal of $W_2$ (see Section 3). Note that at this step the penalized profiled log-likelihood (2.3) is a function of $\psi$ only, since the matrix $D_j$ has already been selected in the first stage by maximizing the penalized restricted profiled log-likelihood (2.2). In the second stage, the primary interest therefore lies in the selection of the covariates of $\psi$. As before, the proposed selection method has no closed-form solution and can be solved iteratively by the Newton-Raphson method of the form

$$\psi^{t+1} = \psi^{t} - H_{\psi\psi}^{-1} S_{\psi}, \quad t = 0, 1, 2, \dots \tag{2.4}$$

where $\psi^{t}$ is the current estimate for (2.3), $\psi^{t+1}$ is the updated one, $S_{\psi}$ is the score vector and $H_{\psi\psi}$ is the Hessian matrix.

3. The methodology of the weighted ridge

[14] proposed a weighted ridge strategy that improves the performance of the L0 penalty for fixed effect models, motivated by Least absolute shrinkage is equivalent to quadratic penalization [19], One-step sparse estimates in non-concave penalized likelihood models [20], and Visualization of genomic changes by segmented smoothing using an L0 penalty [21]. In 2017, [14] was extended to the linear mixed-effects model by [10], who proposed a selection strategy for the fixed effects in which the covariance matrix of the random effects is Cholesky factorized.
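To make the reweighting idea concrete, the following is a minimal sketch of an iteratively weighted ridge fit in an ordinary linear regression without random effects, which is the fixed-effect setting studied in [14]. It is an illustration under stated assumptions and not the procedure used in this paper: the function name `adaptive_ridge`, the least-squares objective, the simulated data, the selection threshold, and the values of `gamma` and `delta` are all hypothetical choices. The weights take the form $1/(\beta_j^2 + \delta^2)$.

```python
import numpy as np

def adaptive_ridge(X, y, gamma=5.0, delta=1e-4, n_iter=50):
    """Iteratively reweighted ridge for least squares (illustrative sketch).

    Each pass solves the ridge system (X'X + gamma * W) beta = X'y and then
    refreshes the weights as w_j = 1 / (beta_j**2 + delta**2).
    """
    p = X.shape[1]
    xtx, xty = X.T @ X, X.T @ y
    w = np.ones(p)                      # first pass is an ordinary ridge fit
    beta = np.zeros(p)
    for _ in range(n_iter):
        beta = np.linalg.solve(xtx + gamma * np.diag(w), xty)
        w = 1.0 / (beta**2 + delta**2)  # small coefficients get large penalties
    return beta

# Hypothetical data: 200 observations, 9 covariates, two nonzero coefficients.
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(200, 9))
beta_true = np.array([1.0, 1.0, 0, 0, 0, 0, 0, 0, 0])
y = X @ beta_true + rng.normal(size=200)

beta_hat = adaptive_ridge(X, y)
selected = np.abs(beta_hat) > 1e-3      # assumed threshold for "selected"
```

At a fixed point of this iteration the penalty contribution of coefficient $j$ is $w_j\beta_j^2 = \beta_j^2/(\beta_j^2 + \delta^2)$, which is close to 1 when $|\beta_j| \gg \delta$ and close to 0 when $\beta_j \approx 0$. The weighted ridge penalty therefore behaves approximately like a count of the nonzero coefficients, which is the sense in which it mimics an L0 penalty.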
In this article, we propose a selection approach for both the fixed and random effects using an iteratively weighted ridge strategy, following the procedure for the weight matrix in [14]. Consider the objective functions (2.2) and (2.3), where the weights $w_{1j}$ and $w_{2j}$ are calculated by

$$w_{1j}^{t} = \frac{1}{|\alpha_j^{(t)}|^2 + \delta^2} \tag{3.1}$$

and

$$w_{2j}^{t} = \frac{1}{|\psi_j^{(t)}|^2 + \delta^2} \tag{3.2}$$

where $w_{1j}$ is the $j$-th element of $\mathrm{diag}(W_1)$, $\alpha_j$ is the $j$-th element of the diagonal of the covariance matrix selected in the first stage, $t$ is the iteration number, $\delta$ is a constant, $w_{2j}$ is the $j$-th element of $\mathrm{diag}(W_2)$ and $\psi_j$ is the $j$-th element of the vector of fixed effect parameters. In numerical practice, small positive choices for $\delta$ seem to perform better than $\delta = 0$ (for further details see [10], [14], [21], [22]).

Although selecting the tuning parameter $\gamma$ adds computational work, since it requires a search over a grid of $\gamma$ values, it is an important step: the choice of the tuning parameter is an influential part of penalized methods. Selection criteria such as the Akaike information criterion (AIC [7]), the Bayesian information criterion (BIC [8]) and Generalized Cross Validation (GCV [23]) have been used for variable selection. For comparison, we employ all three criteria for the fixed and random effect parameters.

4. Simulation studies

A simulation study is conducted to examine the asymptotic properties and the performance of our newly developed method. All of the simulated data are generated according to model (2.1) using R statistical software. Following the examples in [24], the simulation study assumes repeated observations for each subject.

(I) Assume $m_j = 5$ observations for each subject $j$ with $m = 30$, and consider the true fixed effects vector to be $\psi = (1, 1, 0, 0, 0, 0, 0, 0, 0)$ with $r = 9$.
We further consider $l = 4$ for the random effects, with normal distributions assumed for the model errors and the random effects: $\varepsilon_j \sim N(0, \sigma^2 I)$ and $\lambda_j = (\lambda_{j0}, 0)'$ with $\lambda_{j0} \sim N(0, G)$, where $\sigma^2 = 1$ and the true covariance matrix is

$$G = \begin{pmatrix} 9 & 4.8 & 0.6 & 0 \\ 4.8 & 4 & 1 & 0 \\ 0.6 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$

The entries of the design matrix $X_j$ are drawn from a uniform(-2, 2) distribution. The first column of the matrix $Z_j$ consists of ones for the subject-specific intercept, while the remaining columns are also drawn from a uniform(-2, 2) distribution.

(II) This case follows case (I) with an increased sample size. Assume $m = 60$ with $m_j = 10$ and generate 200 datasets following the same methodology.

(III) This case follows case (I) with an increased sample size. Assume $m = 60$ with $m_j = 10$ and generate 500 datasets following the same methodology.

Table 1. The Simulation Results for Case (I)

Criteria    BIC   AIC   GCV
%Correct    64    62    65
%CR         69    67    58
%CF         73    70    70

Table 2. The Simulation Results for Case (II)

Criteria    BIC   AIC   GCV
%Correct    87    82    85
%CR         90    77    71
%CF         94    69    71

Table 3. The Simulation Results for Case (III)

Criteria    BIC   AIC   GCV
%Correct    90    83    85
%CR         90    80    75
%CF         97    73    78

Table 4. The simulation results for case (I) compared to some existing studies

m = 30, mj = 5    WRidge   [13]   [12]
%Correct          64       73     61
%CR               69       81     79
%CF               73       88     79

%Correct denotes the percentage of times that the correct model (fixed and random effects) is selected, %CR denotes the percentage of times that the correct random effects are selected, and %CF denotes the percentage of times that the correct fixed effects are selected. Here m = 30 and mj = 5.

Table 5. The simulation results for case (II) compared to some existing studies

m = 60, mj = 10   WRidge   [13]   [12]
%Correct          87       92     88
%CR               90       92     91
%CF               94       100    97

While there is a large body of literature on the estimation of the parameters, only a limited number of references have studied parameter selection.
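For readers who wish to reproduce the design, the data-generating mechanism of Case (I) can be sketched as follows. The simulations in this paper were run in R; the Python/NumPy version below is only an assumed translation, and the function name `simulate_case_one`, the seed, and the Cholesky-based sampling of the singular matrix $G$ are illustrative choices rather than the author's code.

```python
import numpy as np

def simulate_case_one(m=30, m_j=5, seed=0):
    """Generate one dataset under the Case (I) design (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    psi = np.array([1.0, 1.0, 0, 0, 0, 0, 0, 0, 0])   # true fixed effects, r = 9
    G = np.array([[9.0, 4.8, 0.6, 0.0],
                  [4.8, 4.0, 1.0, 0.0],
                  [0.6, 1.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0, 0.0]])               # true covariance, l = 4
    # G is singular (the last random effect is identically zero), so draw the
    # first three components through a Cholesky factor and append a zero.
    chol = np.linalg.cholesky(G[:3, :3])
    data = []
    for _ in range(m):
        X = rng.uniform(-2.0, 2.0, size=(m_j, psi.size))
        Z = np.column_stack([np.ones(m_j),             # subject-specific intercept
                             rng.uniform(-2.0, 2.0, size=(m_j, 3))])
        lam = np.append(chol @ rng.normal(size=3), 0.0)
        eps = rng.normal(size=m_j)                     # sigma^2 = 1
        data.append((X @ psi + Z @ lam + eps, X, Z))
    return data

dataset = simulate_case_one()
```

Each element of `dataset` holds the response vector and the two design matrices of one subject. Calling `simulate_case_one(m=60, m_j=10)` gives the larger design used in Cases (II) and (III).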
In particular, the selection of the random effects has received less attention than the selection of the fixed effects. [14] studied a one-stage weighted ridge procedure for L0 regularization in fixed models. Also, [10] studied the selection of fixed effects, together with the estimation of the fixed effects, random effects and variance components, in the linear mixed-effects model using a weighted ridge procedure for L0 penalty performance.

In Tables 1-3, we use the AIC, BIC and GCV criteria, and it can be seen that the empirical results confirm the asymptotic properties: the selection percentages get higher as the sample size increases. In Tables 4 and 5, we use the BIC criterion and compare the results of our newly proposed approach (WRidge) with the results of [13] and [12]. The selection percentages of our method are somewhat lower, which is due to the nature of the method: it is known that the weighted ridge does not shrink predictors exactly to zero as the LASSO does. Nevertheless, our newly proposed method performs well across the simulation experiments for model selection of both fixed and random effects.

5. Real Data Applications

To show the efficiency of the suggested penalized technique in mixed model selection, the method is applied to two datasets.

5.1. First Dataset. We use the data from the Amsterdam Growth and Health Study, a distinctive, interdisciplinary cohort study established to investigate development and health among teens [25]. The information was gathered to investigate the connection between lifestyle and health in adolescence and early adulthood. The study had 147 people in all, who were assessed at six different time points, for a total of 882 observations. The five factors considered are age, gender, body fat, fitness, and smoking. This paper follows [26] and [13] for comparison purposes.
The random effects are permitted to include an intercept, but the fixed effects are not: the response variable cholesterol has been centered and all of the inputs have been normalized, and hence the fixed effects have no intercept. Table 6 shows that body_fat and time are selected by all approaches as important fixed effects. For random effects selection, gender is selected by all approaches, while fitness is selected by the SAW and WRidge methods, and smoking is selected by HARD only.

Table 6. Comparison of All Methods.

            Fixed Effect                     Random Effect
            HARD    SAW     PS      WRidge   HARD    SAW     PS      WRidge
intercept   -       -       -       -        0.405   0.347   0.017   0.211
fitness     0       0       0       0        0       0.006   0       0.001
body_fat    0.174   0.165   0.170   0.168    0       0       0       0
smoking     0       0       0       0        0.149   0       0       0
gender      0       0       0       0        0.668   0.624   0.888   0.691
time        0.156   0.167   0.165   0.161    0       0       0       0

5.2. Second Dataset. The data were collected from 27 children to study the distance (mm) between the center of the pituitary gland and the pterygomaxillary fissure for both genders [27]. The distances were measured in 16 boys and 11 girls at ages 8, 10, 12, and 14. The purpose of the study was to examine the basic age functions for boys and girls before describing the gap between boys and girls as a function of age. The two factors are gender and age, while distance is the response. The response variable distance has been centered and all of the inputs have been normalized, and hence there is no intercept for the fixed effects.

Table 7. Results of WRidge Selection Method.

            Fixed Effect   Random Effect
            WRidge         WRidge
intercept   -              0.021
gender      0.623          0.861
age         0.117          0

Table 7 shows that gender and age are selected as important fixed effects. For random effects selection, gender is selected by the newly proposed method as an important random effect.

6. Conclusion

In this paper, we propose weighted ridge selection for a linear mixed model, focusing on the case of longitudinal data with N observations coming from n subjects. The simulation studies in low-dimensional settings show that the newly established method is efficient in general, and that the percentages of times the random effects, the fixed effects, and both combined are correctly selected are high.

Conflicts of Interest: The author declares that there are no conflicts of interest regarding the publication of this paper.

References

[1] G. Verbeke, Linear Mixed Models for Longitudinal Data, in: Linear Mixed Models in Practice, Springer, New York, NY, 1997: pp. 63-153. https://doi.org/10.1007/978-1-4612-2294-1_3.
[2] A.E. Hoerl, R.W. Kennard, Ridge Regression: Biased Estimation for Nonorthogonal Problems, Technometrics. 12 (1970), 55-67.
[3] R. Tibshirani, Regression Shrinkage and Selection via the Lasso, J. R. Stat. Soc.: Ser. B (Methodol.) 58 (1996), 267-288. https://doi.org/10.1111/j.2517-6161.1996.tb02080.x.
[4] H. Zou, The Adaptive Lasso and Its Oracle Properties, J. Amer. Stat. Assoc. 101 (2006), 1418-1429. https://doi.org/10.1198/016214506000000735.
[5] H. Zou, T. Hastie, Regularization and Variable Selection via the Elastic Net, J. R. Stat. Soc. Ser. B: Stat. Methodol. 67 (2005), 301-320. https://doi.org/10.1111/j.1467-9868.2005.00503.x.
[6] J. Fan, R. Li, Variable Selection via Nonconcave Penalized Likelihood and Its Oracle Properties, J. Amer. Stat. Assoc. 96 (2001), 1348-1360. https://doi.org/10.1198/016214501753382273.
[7] H. Akaike, A New Look at the Statistical Model Identification, IEEE Trans. Automat. Contr. 19 (1974), 716-723. https://doi.org/10.1109/tac.1974.1100705.
[8] G. Schwarz, Estimating the Dimension of a Model, Ann. Stat. 6 (1978), 461-464. https://www.jstor.org/stable/2958889.
[9] Y. Yang, Can the Strengths of AIC and BIC Be Shared?
A Conflict Between Model Identification and Regression Estimation, Biometrika. 92 (2005), 937-950. https://doi.org/10.1093/biomet/92.4.937.
[10] E. Adjakossa, G. Nuel, Fixed Effects Selection in the Linear Mixed-Effects Model Using Adaptive Ridge Procedure for L0 Penalty Performance, arXiv:1705.01308. (2017). https://doi.org/10.48550/ARXIV.1705.01308.
[11] H. Peng, Y. Lu, Model Selection in Linear Mixed Effect Models, J. Multivar. Anal. 109 (2012), 109-129. https://doi.org/10.1016/j.jmva.2012.02.005.
[12] B. Lin, Z. Pang, J. Jiang, Fixed and Random Effects Selection by REML and Pathwise Coordinate Optimization, J. Comput. Graph. Stat. 22 (2013), 341-355. https://doi.org/10.1080/10618600.2012.681219.
[13] J. Pan, J. Shang, Adaptive LASSO for Linear Mixed Model Selection via Profile Log-Likelihood, Commun. Stat. - Theory Methods. 47 (2017), 1882-1900. https://doi.org/10.1080/03610926.2017.1332219.
[14] F. Frommlet, G. Nuel, An Adaptive Ridge Procedure for L0 Regularization, PLoS ONE. 11 (2016), e0148620. https://doi.org/10.1371/journal.pone.0148620.
[15] N.M. Laird, J.H. Ware, Random-Effects Models for Longitudinal Data, Biometrics. 38 (1982), 963. https://doi.org/10.2307/2529876.
[16] R.I. Jennrich, M.D. Schluchter, Unbalanced Repeated-Measures Models with Structured Covariance Matrices, Biometrics. 42 (1986), 805-820. https://doi.org/10.2307/2530695.
[17] M.J. Lindstrom, D.M. Bates, Newton-Raphson and EM Algorithms for Linear Mixed-Effects Models for Repeated-Measures Data, J. Amer. Stat. Assoc. 83 (1988), 1014-1022. https://doi.org/10.1080/01621459.1988.10478693.
[18] D.A. Harville, Maximum Likelihood Approaches to Variance Component Estimation and to Related Problems, J. Amer. Stat. Assoc. 72 (1977), 320-338.
[19] Y. Grandvalet, Least Absolute Shrinkage is Equivalent to Quadratic Penalization, in: L. Niklasson, M. Boden, T. Ziemke (Eds.), ICANN 98, Springer London, London, 1998: pp. 201-206. https://doi.org/10.1007/978-1-4471-1599-1_27.
[20] P. Bühlmann, L. Meier, H. Zou, Discussion of "One-Step Sparse Estimates in Nonconcave Penalized Likelihood Models" by H. Zou and R. Li, Ann. Stat. 36 (2008), 1534-1541.
[21] R.C.A. Rippe, J.J. Meulman, P.H.C. Eilers, Visualization of Genomic Changes by Segmented Smoothing Using an L0 Penalty, PLoS ONE. 7 (2012), e38230. https://doi.org/10.1371/journal.pone.0038230.
[22] E.J. Candes, M.B. Wakin, S.P. Boyd, Enhancing Sparsity by Reweighted ℓ1 Minimization, J. Fourier Anal. Appl. 14 (2008), 877-905. https://doi.org/10.1007/s00041-008-9045-x.
[23] M. Stone, An Asymptotic Equivalence of Choice of Model by Cross-Validation and Akaike's Criterion, J. R. Stat. Soc.: Ser. B (Methodol.) 39 (1977), 44-47. https://doi.org/10.1111/j.2517-6161.1977.tb01603.x.
[24] H.D. Bondell, A. Krishna, S.K. Ghosh, Joint Variable Selection for Fixed and Random Effects in Linear Mixed-Effects Models, Biometrics. 66 (2010), 1069-1077. https://doi.org/10.1111/j.1541-0420.2010.01391.x.
[25] J.W.R. Twisk, H.C.G. Kemper, G.J. Mellenbergh, Longitudinal Development of Lipoprotein Levels in Males and Females Aged 12-28 Years: The Amsterdam Growth and Health Study, Int. J. Epidemiol. 24 (1995), 69-77. https://doi.org/10.1093/ije/24.1.69.
[26] M. Ahn, H.H. Zhang, W. Lu, Moment-Based Method for Random Effects Selection in Linear Mixed Models, Stat. Sinica. 22 (2012), 1539-1562. https://doi.org/10.5705/ss.2011.054.
[27] R.F. Potthoff, S.N. Roy, A Generalized Multivariate Analysis of Variance Model Useful Especially for Growth Curve Problems, Biometrika. 51 (1964), 313-326. https://doi.org/10.1093/biomet/51.3-4.313.