International Journal of Analysis and Applications
Volume 19, Number 2 (2021), 264-279
DOI: 10.28924/2291-8639-19-2021-264

SMOOTHING APPROXIMATIONS FOR LEAST SQUARES MINIMIZATION WITH L1-NORM REGULARIZATION FUNCTIONAL

HENRIETTA NKANSAH∗, FRANCIS BENYAH, HENRY AMANKWAH
Department of Mathematics, University of Cape Coast, Cape Coast, Ghana
∗Corresponding author: hnkansah@ucc.edu.gh

Abstract. The paper considers the problem of least squares minimization with an L1-norm regularization functional. It investigates various smoothing approximations for the L1-norm functional, namely the Quadratic, Sigmoid and Cubic Hermite functionals. A Tikhonov regularization is then applied to each of the resulting smooth least squares minimization problems. Results of numerical simulations for each smoothing approximation are presented. The results indicate that our regularization method is as good as any other non-smoothing method used in developed solvers.

1. Introduction

We consider the problem

(1.1)  min_α g(α) = f(α) + λ j(α),

where f(α) is smooth, j(α) is non-smooth, and λ > 0 is the regularization parameter. In particular, we examine f(α) = ‖Xα − y‖₂² and j(α) = ‖α‖₁. Therefore, the problem becomes

(1.2)  min_α g(α) = ‖Xα − y‖₂² + λ‖Lα‖₁,

where L is the p × n discrete approximation of the (n − p)-th derivative operator.

Received October 8th, 2019; accepted November 4th, 2019; published March 5th, 2021.
2010 Mathematics Subject Classification. 68W25.
Key words and phrases. least squares minimization; regularization; smoothing approximations; over-determined systems.
©2021 Authors retain the copyrights of their papers, and all open access articles are distributed under the terms of the Creative Commons Attribution License.

In this paper, we focus on an over-determined linear model of the form Xα = y, where α ∈
For λ ≫ σᵢ², we have fᵢ ≈ σᵢ²/λ, and σᵢ²/λ → 0. This indicates that the filter factors have an effect on the solution; in this case, they reduce the effect of the smaller singular values. Thus, Equation (1.4) gives the solution with regularization.

Another technique is L1-regularized least squares, in which we substitute a sum of absolute values for the sum of squares used in the L2-norm regularization, to obtain Equation (1.2). The problem in Equation (1.2) always has a solution, but it need not be unique. The first part of g(α) is smooth, but the second part is non-smooth. In this paper, we explore three smoothing approximations that can be used to replace the L1-norm regularized term, thereby enabling us to apply the Tikhonov regularization method. These approximations are the Quadratic Approximation [2], the Sigmoid Function Approximation [1], and the Cubic Hermite Approximation. These three approximations are used to obtain a regularized solution to the least-squares problem in the case where L = I_p, which is the Tikhonov regularization of order zero. In each case, we compare the solution from our regularization method with that of the Modified Newton's Method, which is mostly used in the literature. We begin by implementing the Modified Newton's Method used by Lee et al. (2006) for solving an unconstrained optimization problem, in order to ascertain the challenges associated with the method.

2. Smoothing Approximations

2.1. Quadratic Approximation. Lee et al. (2006) proposed a method for transforming the non-differentiable L1-norm function into a differentiable function by replacing it with a differentiable approximation. For the one-dimensional case, the approximation to the absolute value function is given by

|x| ≈ √(x² + ε).

To determine the best approximate solution, we first examine the nature of the plot for various values of ε. The approximation of the absolute value function for different values of ε is shown in Figure 1.
Figure 1. Quadratic approximation |x|_ε of |x| for various values of the approximation parameter ε.

Figure 1 indicates that

lim_{ε→0} |x|_ε = |x|.

Thus, we choose ε = 0.0001 for the subsequent implementation. The gradient ∇(|x|_ε) and the Hessian ∇²(|x|_ε) of the smoothing approximation of the absolute value function, in single-variable form, are derived as follows:

∇(|x|_ε) = x / √(x² + ε)   and   ∇²(|x|_ε) = ε / (√(x² + ε))³.

For x ∈
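As a quick sanity check on these formulas, the smoothed absolute value and its derivatives can be verified numerically. The following Python sketch (the function names, test point, and finite-difference step sizes are our own illustrative choices, not from the paper) compares ∇(|x|_ε) and ∇²(|x|_ε) against central finite differences:

```python
import numpy as np

def abs_eps(x, eps):
    """Quadratic smoothing of |x|: |x|_eps = sqrt(x^2 + eps)."""
    return np.sqrt(x**2 + eps)

def grad_abs_eps(x, eps):
    """First derivative: x / sqrt(x^2 + eps)."""
    return x / np.sqrt(x**2 + eps)

def hess_abs_eps(x, eps):
    """Second derivative: eps / (x^2 + eps)^(3/2)."""
    return eps / np.sqrt(x**2 + eps)**3

x, eps = 0.5, 1e-4

# |x|_eps is close to |x| for small eps
assert abs(abs_eps(x, eps) - abs(x)) < 2e-4

# central finite differences agree with the closed-form derivatives
h = 1e-5
fd_grad = (abs_eps(x + h, eps) - abs_eps(x - h, eps)) / (2 * h)
assert abs(grad_abs_eps(x, eps) - fd_grad) < 1e-6

h = 1e-4
fd_hess = (abs_eps(x + h, eps) - 2 * abs_eps(x, eps) + abs_eps(x - h, eps)) / h**2
assert abs(hess_abs_eps(x, eps) - fd_hess) < 1e-3
```

Note that ∇²(|x|_ε) is strictly positive everywhere, so the smoothed term is strictly convex; at x = 0 it equals ε^(−1/2), which is why very small ε makes the Hessian large near the origin.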
0 is the step size and H(x_k) is the Hessian at the current iterate. From (2.1), the gradient of g(α) is given as

∇g(α) = 2Xᵀ(Xα − y) + λG(α),

where

G(α) = [α₁(α₁² + ε)^(−1/2), α₂(α₂² + ε)^(−1/2), ..., α_p(α_p² + ε)^(−1/2)]ᵀ,

and the Hessian is also given as

H(α) = 2XᵀX + ελ h(α),

where

h(α) = diag[(α₁² + ε)^(−3/2), (α₂² + ε)^(−3/2), ..., (α_p² + ε)^(−3/2)].

We now consider the implementation of the algorithm.

2.1.2. Numerical Experiment. To illustrate our results, we make use of the 12 × 7 Hilbert submatrix of the 12 × 12 Hilbert matrix, which constitutes an over-determined system. Hilbert matrices are known to be very ill-conditioned, since the coefficient matrix XᵀX is nearly singular. The vector y is chosen such that the true solution is α = [1, 1, 1, 1, 1, 1, 1]ᵀ. We want to find α ∈