International Journal of Analysis and Applications
Volume 19, Number 2 (2021), 264-279
DOI: 10.28924/2291-8639-19-2021-264

SMOOTHING APPROXIMATIONS FOR LEAST SQUARES MINIMIZATION WITH L1-NORM REGULARIZATION FUNCTIONAL

HENRIETTA NKANSAH∗, FRANCIS BENYAH, HENRY AMANKWAH
Department of Mathematics, University of Cape Coast, Cape Coast, Ghana
∗Corresponding author: hnkansah@ucc.edu.gh

Abstract. The paper considers the problem of least squares minimization with an L1-norm regularization functional. It investigates various smoothing approximations for the L1-norm functional, namely the Quadratic, Sigmoid and Cubic Hermite functionals. A Tikhonov regularization is then applied to each of the resulting smooth least squares minimization problems. Results of numerical simulations for each smoothing approximation are presented. The results indicate that our regularization method is as good as any other non-smoothing method used in developed solvers.

1. Introduction

We consider the problem

(1.1)  min_α g(α) = f(α) + λ j(α),

where f(α) is smooth, j(α) is non-smooth, and λ > 0 is the regularization parameter. In particular, we examine f(α) = ‖Xα − y‖₂² and j(α) = ‖α‖₁. Therefore, the problem becomes

(1.2)  min_α g(α) = ‖Xα − y‖₂² + λ‖Lα‖₁,

where L is the p × n discrete approximation of the (n − p)-th derivative operator.

Received October 8th, 2019; accepted November 4th, 2019; published March 5th, 2021.
2010 Mathematics Subject Classification. 68W25.
Key words and phrases. least squares minimization; regularization; smoothing approximations; over-determined systems.
©2021 Authors retain the copyrights of their papers, and all open access articles are distributed under the terms of the Creative Commons Attribution License.

In this paper, we focus on an over-determined linear model of the form Xα = y, where α ∈
For λ ≫ σᵢ², we have fᵢ ≈ σᵢ²/λ, and σᵢ²/λ → 0. This indicates that the filter factors have an effect on the solution; in this case, they reduce the effect of the smaller singular values. Thus, Equation (1.4) gives the solution with regularization.

Another technique is L1-regularized least squares, in which we substitute a sum of absolute values for the sum of squares used in the L2-norm regularization, to obtain Equation (1.2). The problem in Equation (1.2) always has a solution, but it need not be unique. The first part of g(α) is smooth, but the second part is non-smooth. In this paper, we explore three smoothing approximations that can be used to replace the L1-norm regularized term, thereby enabling us to apply the Tikhonov regularization method. These approximations are the Quadratic Approximation [2], the Sigmoid Function Approximation [1], and the Cubic Hermite Approximation. These three approximations are used to obtain a regularized solution to the least-squares problem in the case where L = I_p, which is the Tikhonov regularization of order zero. In each case, we compare the solution from our regularization method with that of the Modified Newton's Method, which is mostly used in the literature. We begin by implementing the Modified Newton's Method used by Lee et al. (2006) for solving an unconstrained optimization problem, in order to ascertain the challenges associated with the method.

2. Smoothing Approximations

2.1. Quadratic Approximation. Lee et al. (2006) proposed a method for transforming the non-differentiable L1-norm function into a differentiable function by replacing it with a differentiable approximation. For the one-dimensional case, the approximation to the absolute value function is given by

|x| ≈ √(x² + ε).

To determine the best approximate solution, we first examine the nature of the plot for various values of ε. The approximation of the absolute value function for different values of ε is shown in Figure 1.
Figure 1. Quadratic approximation |x|_ε of |x| for various values of the approximation parameter ε.

Figure 1 indicates that

lim_{ε→0} |x|_ε = |x|.

Thus, we choose ε = 0.0001 for the subsequent implementation. The gradient ∇(|x|_ε) and the Hessian ∇²(|x|_ε) of the smoothing approximation of the absolute value function, in single-variable form, are derived as follows:

∇(|x|_ε) = x / √(x² + ε)   and   ∇²(|x|_ε) = ε / (√(x² + ε))³.

For x ∈
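As a quick sanity check on these formulas, the smoothed absolute value and its derivatives can be verified numerically. The following Python sketch (the function names, test point, and finite-difference step sizes are our own illustrative choices, not from the paper) compares ∇(|x|_ε) and ∇²(|x|_ε) against central finite differences:

```python
import numpy as np

def abs_eps(x, eps):
    """Quadratic smoothing of |x|: |x|_eps = sqrt(x^2 + eps)."""
    return np.sqrt(x**2 + eps)

def grad_abs_eps(x, eps):
    """First derivative: x / sqrt(x^2 + eps)."""
    return x / np.sqrt(x**2 + eps)

def hess_abs_eps(x, eps):
    """Second derivative: eps / (x^2 + eps)^(3/2)."""
    return eps / np.sqrt(x**2 + eps)**3

x, eps = 0.5, 1e-4

# |x|_eps is close to |x| for small eps
assert abs(abs_eps(x, eps) - abs(x)) < 2e-4

# central finite differences agree with the closed-form derivatives
h = 1e-5
fd_grad = (abs_eps(x + h, eps) - abs_eps(x - h, eps)) / (2 * h)
assert abs(grad_abs_eps(x, eps) - fd_grad) < 1e-6

h = 1e-4
fd_hess = (abs_eps(x + h, eps) - 2 * abs_eps(x, eps) + abs_eps(x - h, eps)) / h**2
assert abs(hess_abs_eps(x, eps) - fd_hess) < 1e-3
```

Note that ∇²(|x|_ε) is strictly positive everywhere, so the smoothed term is strictly convex; at x = 0 it equals ε^(−1/2), which is why very small ε makes the Hessian large near the origin.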
0 is the step size and H(x_k) is the Hessian at the current iterate. From (2.1), the gradient of g(α) is given as

∇g(α) = 2Xᵀ(Xα − y) + λG(α),

where

G(α) = [α₁(α₁² + ε)^(−1/2), α₂(α₂² + ε)^(−1/2), ..., α_p(α_p² + ε)^(−1/2)]ᵀ,

and the Hessian is also given as

H(α) = 2XᵀX + ελ h(α),

where

h(α) = diag[(α₁² + ε)^(−3/2), (α₂² + ε)^(−3/2), ..., (α_p² + ε)^(−3/2)].

We now consider the implementation of the algorithm.

2.1.2. Numerical Experiment. To illustrate our results, we make use of the 12 × 7 Hilbert submatrix of the 12 × 12 Hilbert matrix, which constitutes an over-determined system. Hilbert matrices are known to be very ill-conditioned, since the coefficient matrix XᵀX is nearly singular. The vector y is chosen such that the true solution is α = [1, 1, 1, 1, 1, 1, 1]ᵀ. We want to find α ∈