SQU Journal for Science, 17 (1) (2012) 103-124
© 2012 Sultan Qaboos University

A Superlinearly Convergent Penalty Method with Nonsmooth Line Search for Constrained Nonlinear Least Squares

Nezam Mahdavi-Amiri* and Mohammad Reza Ansari**

Faculty of Mathematical Sciences, Sharif University of Technology, Tehran, Iran, *Email: nezamm@sina.sharif.edu, **Email: ansari.mr@gmail.com.

ABSTRACT: Recently, we presented a projected structured algorithm for solving constrained nonlinear least squares problems and established its local two-step Q-superlinear convergence. The approach is based on an adaptive structured scheme, due to Mahdavi-Amiri and Bartels, for the exact penalty method. The structured adaptation also makes use of the ideas of Nocedal and Overton for handling the quasi-Newton updates of projected Hessians, and appropriates the structuring scheme of Dennis, Martinez and Tapia. Here, for robustness, we present a specific nonsmooth line search strategy that takes account of the least squares objective. We also discuss the details of our new nonsmooth line search strategy and of the implementation of the algorithm, and provide comparative results obtained by testing our program and three nonlinear programming codes from KNITRO on test problems (with both small and large residuals) from Hock and Schittkowski, Lukšan and Vlček, and some randomly generated ones due to Bartels and Mahdavi-Amiri. The results affirm the practical relevance of our special considerations for the inherent structure of the least squares objective.

KEYWORDS: Constrained nonlinear programming, Exact penalty method, Nonlinear least squares, Nonsmooth line search, Projected structured Hessian update.
1. Introduction

An exact penalty method is a sequential unconstrained minimization approach for solving general nonlinear programming (NLP) problems. Some attractive features of exact penalty methods are:
- They overcome the difficulties posed by inconsistent constraint linearizations (Fletcher, 1987).
- They are successful in solving certain classes of problems in which standard constraint qualifications are not satisfied (Anitescu, 2005; Benson et al., 2006; Leyffer et al., 2006).
- They are useful in enhancing the robustness and ensuring the feasibility of the subproblems produced in the iterations of the algorithm (Byrd et al., 2004; Gill et al., 2002).

Coleman and Conn (1982a, 1982b, 1984) presented an effective exact penalty algorithm for solving generally constrained nonlinear minimization problems. Mahdavi-Amiri and Bartels (1989) adapted the approach to develop an algorithm using projected structured Hessians for solving constrained least squares problems.
Although the computational results given in (Mahdavi-Amiri and Bartels, 1989) showed promising two-step superlinear asymptotic convergence, the authors did not provide an analytical proof. We presented a variant of the approach of Mahdavi-Amiri and Bartels (1989) and established both its global and its local superlinear convergence (Mahdavi-Amiri and Ansari, 2011a, 2011b). Our algorithm appropriates the ideas of Dennis et al. (1989) for the unconstrained case to the projected structured Hessian updating. Here, we present the details of our new structurally adapted nonsmooth line search strategy, implementation details and more extensive computational results.

The remainder of our work is organized as follows. Section 2 gives the notation and relevant background on constrained nonlinear least squares problems, the exact penalty approach and the projected structured Hessian updating scheme for constrained nonlinear least squares problems. Section 3 discusses the new nonsmooth line search procedure, accounting for the structure in the nonlinear least squares objective. Section 4 is devoted to the global and the local two-step superlinear convergence results. Practical issues relating to implementation are discussed in Section 5. Computational results in Section 6 substantiate the theoretical results and show that the new algorithm is competitive with the best methods tested in the collection of Hock and Schittkowski (1981) and with three algorithms developed in the KNITRO (Byrd et al., 2006) integrated package. There, we also show the effectiveness of the algorithm on some larger test problems taken from (Lukšan and Vlček, 1999) and on randomly generated test problems due to Bartels and Mahdavi-Amiri (1986). Finally, we conclude in Section 7. Throughout, $\|\cdot\|$ denotes the $\ell_2$ norm for vectors or matrices.

2. Constrained nonlinear least squares

The problem to be solved is:
$$\min_x \ \phi(x) = \tfrac{1}{2} F(x)^T F(x)$$
$$\text{s.t.} \quad c_i(x) = 0, \ i \in M_1, \qquad c_j(x) \geq 0, \ j \in M_2, \qquad (1)$$

where $F(x) = (f_1(x), \ldots, f_l(x))^T$, $x$ is an $n$-vector, and each $f_\ell$, $c_i$ and $c_j$ is a function from $R^n$ to $R^1$, all assumed to be twice continuously differentiable. We refer to each $f_\ell$ as a residual. There are good reasons for numerical analysts to study least squares problems, mainly because of their applicability to many practical problems. Almost anyone who formulates a parameterized model for a chemical, physical, financial, social or economic application uses a problem of the form (1) to select values for the parameters that best match the model to the data.

Although problem (1) can be solved by a general constrained nonlinear optimization method, in most circumstances the inherent structure of the objective function in (1) makes it worthwhile to use specialized techniques. Notice that if $G(x)$ is the matrix whose columns are the gradients $\nabla f_\ell(x)$, then the gradient of $\phi(x)$ is

$$\nabla \phi(x) = G(x) F(x),$$

and the Hessian of $\phi(x)$ is

$$\nabla^2 \phi(x) = G(x) G(x)^T + \sum_{\ell=1}^{l} f_\ell(x) \nabla^2 f_\ell(x) =: G(x) G(x)^T + S(x). \qquad (2)$$

In many applications, it is possible to calculate explicitly the first partial derivatives that make up the matrix $G(x)$. These can be used to calculate the gradient $\nabla \phi(x)$. However, the distinctive feature of a least squares problem is that, knowing $G(x)$, we can compute the first part of the Hessian $\nabla^2 \phi(x)$ for free. Moreover, the term $G(x) G(x)^T$ is often more important than the second summation term in (2), either because of near-linearity of the model near the solution (that is, the $\nabla^2 f_\ell(x)$ being small) or because of small residuals (that is, the $f_\ell$ being small). Most algorithms for unconstrained nonlinear least squares exploit these structural properties of the Hessian.
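To make the split in (2) concrete, here is a small self-contained sketch (not the authors' code) that forms the gradient $G(x)F(x)$ and the structured Hessian $G(x)G(x)^T + S(x)$ for a hypothetical two-residual problem $F(x) = (x_1 - 1, \ x_1 x_2)$:

```python
# Hypothetical residuals F(x) = (x0 - 1, x0*x1), so phi(x) = 0.5 * F^T F.

def F(x):
    return [x[0] - 1.0, x[0] * x[1]]

def G(x):
    # g[l] is the gradient of residual f_l (a column of G).
    return [[1.0, 0.0],        # grad f1
            [x[1], x[0]]]      # grad f2

def grad_phi(x):
    g, f = G(x), F(x)
    return [sum(g[l][i] * f[l] for l in range(2)) for i in range(2)]

def hessian_phi(x):
    g, f = G(x), F(x)
    # First part G G^T comes "for free" from first derivatives only.
    GGt = [[sum(g[l][i] * g[l][j] for l in range(2)) for j in range(2)]
           for i in range(2)]
    # Second part S(x) = sum_l f_l * Hess f_l; only f2 = x0*x1 has curvature.
    S = [[0.0, f[1]], [f[1], 0.0]]
    return [[GGt[i][j] + S[i][j] for j in range(2)] for i in range(2)]
```

At $x = (2, 3)$ this gives $\nabla\phi(x) = (19, 12)$, and the first part of the Hessian, available from first derivatives alone, is $\begin{pmatrix} 10 & 6 \\ 6 & 4 \end{pmatrix}$.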
Newton's method uses the search direction $p$ found by solving

$$\left( G(x) G(x)^T + S(x) \right) p = -G(x) F(x). \qquad (3)$$

For problems where $S(x)$ is small compared to $G(x) G(x)^T$ as $x$ approaches a minimizer $x^*$, the term $S(x)$ is dropped in (3) to yield the Gauss-Newton method. This method works well when $G(x)$ has full row rank and $S(x)$ is "sufficiently small"; for discussions see (Dennis, 1977; Gill and Murray, 1978).

Approximating $S(x)$ in cases where it cannot be neglected is a recurring theme in the literature. In the unconstrained case, quasi-Newton approximations to only the second term, $S(x)$, of the Hessian matrix (2) have been developed (Dennis et al., 1989; Dennis and Walker, 1981; Engels and Martinez, 1991). Pertinent to our proposal is the work by Dennis et al. (1981), who investigated quasi-Newton updates to a matrix $B \approx S(x)$. This additive structure was analyzed by Dennis et al. (1989), was used by Al-Baali and Fletcher (1985) and was extended to an infinite dimensional setting by Griewank (1987). These strategies are called "structured quasi-Newton" methods. Tapia (1988) extended the class of secant updates to equality constrained optimization, utilizing the structure present in the Hessian of the augmented Lagrangian. Local Q-superlinear convergence of these structured secant methods was established by Dennis and Walker (1981). Based on Tapia's approach, Huschens (1993) also presented an approach to exploit the nonlinear least squares structure. A hybrid method, linking the reduced successive quadratic programming (SQP) method with the ideas of Fletcher and Xu (1987), was proposed by Tjoa and Biegler (1991). Mahdavi-Amiri and Bartels (1989) adjoined the reduced SQP method with the ideas developed by Dennis et al. (1981) for the unconstrained structured quasi-Newton BFGS update to develop a projected structured approach for solving problem (1).
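To illustrate dropping $S(x)$ in (3), the following sketch (hypothetical residuals, not the authors' implementation) performs Gauss-Newton steps on $F(x) = (x_1 - 1, \ x_1 x_2)$, whose zero-residual minimizer is $x^* = (1, 0)$; the $2 \times 2$ system is solved by Cramer's rule:

```python
def gauss_newton_step(x):
    f = [x[0] - 1.0, x[0] * x[1]]
    g = [[1.0, 0.0], [x[1], x[0]]]               # residual gradients (columns of G)
    # Gauss-Newton system: (G G^T) p = -G F, with S(x) dropped.
    A = [[sum(g[l][i] * g[l][j] for l in range(2)) for j in range(2)]
         for i in range(2)]
    b = [-sum(g[l][i] * f[l] for l in range(2)) for i in range(2)]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]  # 2x2 solve by Cramer's rule
    p = [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
         (A[0][0] * b[1] - A[1][0] * b[0]) / det]
    return [x[0] + p[0], x[1] + p[1]]

x = [2.0, 3.0]
for _ in range(2):
    x = gauss_newton_step(x)
# x is now the zero-residual minimizer (1, 0)
```

From $x = (2, 3)$ the iteration reaches $x^*$ exactly in two steps here, reflecting the fast local behavior of Gauss-Newton on small-residual problems.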
2.1 An exact penalty function

Here $P(x, \mu)$ denotes the $\ell_1$ exact penalty function,

$$P(x, \mu) = \mu \phi(x) + \sum_{i \in M_1} |c_i(x)| - \sum_{j \in M_2} \min(0, c_j(x)),$$

where the penalty parameter $\mu$ is a positive number. It is well known that, for appropriate values of the penalty parameter $\mu$, stationary points of $P(x, \mu)$ are either KKT points of the nonlinear program (1) or infeasible stationary points (Byrd et al., 2005). Pietrzykowski (1969) showed this penalty function to be exact in the sense that if $x^*$ is an isolated minimizer of (1) and the gradients of the active (binding) constraints at $x^*$ are linearly independent, then there exists a real number $\mu^* > 0$ such that $x^*$ is also an isolated local minimizer of $P(x, \mu)$, for each $\mu$, $0 < \mu \leq \mu^*$. Luenberger (1970) determined, in the convex case, the threshold value of $\mu$.

The parameter $\varepsilon \geq 0$ is used to determine the closeness of any constraint to zero, and hence to judge its activity. Using the approach of Mahdavi-Amiri and Bartels (1989), we define the index sets

$$AC(x, \varepsilon) = \{ i \in M_1 \cup M_2 : |c_i(x)| \leq \varepsilon \}$$

of active constraints, $AE(x, \varepsilon) = AC(x, \varepsilon) \cap M_1$ of active equality constraints, $AI(x, \varepsilon) = AC(x, \varepsilon) \cap M_2$ of active inequality constraints, $VE(x, \varepsilon) = M_1 - AE(x, \varepsilon)$ of violated equality constraints, and $VI(x, \varepsilon) = \{ j \in M_2 - AI(x, \varepsilon) : c_j(x) < 0 \}$ of violated inequality constraints. Thus, the minimization of $P$ is carried out with the aid of an $\varepsilon$-active merit function:

$$P_\varepsilon(x, \mu) = \mu \phi(x) + \sum_{i \in VE(x, \varepsilon)} \mathrm{sgn}(c_i(x)) \, c_i(x) - \sum_{j \in VI(x, \varepsilon)} c_j(x).$$

This provides a differentiable approximation to the true merit function $P$ in a neighborhood of $x$.
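The following sketch (hypothetical data and thresholds, not taken from the paper's code) evaluates the $\ell_1$ penalty above and classifies constraints into $\varepsilon$-active and violated index sets:

```python
def penalty(mu, phi, c_eq, c_ineq):
    # l1 exact penalty: mu*phi + sum |c_i| (equalities) - sum min(0, c_j) (inequalities)
    return mu * phi + sum(abs(v) for v in c_eq) - sum(min(0.0, v) for v in c_ineq)

def classify(c_eq, c_ineq, eps):
    # eps-active: |c| <= eps; violated equalities: the rest of M1;
    # violated inequalities: not eps-active and negative.
    AE = [i for i, v in enumerate(c_eq) if abs(v) <= eps]
    AI = [j for j, v in enumerate(c_ineq) if abs(v) <= eps]
    VE = [i for i in range(len(c_eq)) if i not in AE]
    VI = [j for j, v in enumerate(c_ineq) if j not in AI and v < 0.0]
    return AE, AI, VE, VI
```

For instance, with $\mu = 1$, $\phi = 2$, one equality value $0.5$ and one inequality value $-0.25$, the penalty is $2 + 0.5 + 0.25 = 2.75$.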
It is trivial that $P_\varepsilon(x, \mu)$ is equal in value to $P(x, \mu)$ when $\varepsilon = 0$. Loosely speaking, step directions are determined using $P_\varepsilon$, but line searches and optimality tests use $P$. A necessary condition for $x^*$ to be an isolated local minimizer of $P$, under the assumptions made above on $\phi$, the $c_i$ and the $c_j$, is that there exist multipliers $\lambda_r^*$, for $r \in AC(x^*, 0)$, such that

$$\mu \nabla \phi(x^*) = \sum_{r \in AC(x^*, 0)} \lambda_r^* \nabla c_r(x^*) = A(x^*) \lambda^*, \qquad (4)$$

and

$$-1 < \lambda_r^* < 1, \quad r \in AE(x^*, 0), \qquad (5)$$

$$0 < \lambda_r^* < 1, \quad r \in AI(x^*, 0), \qquad (6)$$

where $A(x)$ is the matrix whose columns are the gradients of the active constraints at $x$; for more details see (Coleman and Conn, 1980). A point $x$ for which only (4) above is satisfied is said to be a stationary point of $P$. If $x^*$ is an isolated minimizer, then it is necessary for $x^*$ to be a stationary point and to satisfy (5) and (6). Note that stationarity and optimality are determined using $P_0$, which is $P_\varepsilon$ with $\varepsilon = 0$. We estimate the numbers $\lambda_r^*$ by $\lambda_r$ only in the neighborhoods of stationary points. In such neighborhoods, the numbers $\lambda_r$ are taken to be the least squares solution to

$$\min_\lambda \ \| A(x) \lambda - \nabla P_\varepsilon(x, \mu) \|, \qquad (7)$$

and in practice the QR decomposition of $A(x)$ is used to solve the least squares problem:

$$A(x) = Q \begin{bmatrix} R \\ 0 \end{bmatrix} = \begin{bmatrix} Y & Z \end{bmatrix} \begin{bmatrix} R \\ 0 \end{bmatrix}. \qquad (8)$$

If the columns of $A(x)$ are linearly independent, then the columns of $Y$ and $Z$, respectively, serve as bases for the range space of $A(x)$ and the null space of $A(x)^T$; that is, $A(x) = Y R$ with $R$ nonsingular, $A(x)^T Z = 0$, and $Y^T Y$ and $Z^T Z$ are identities. Nearness to a stationary point is governed by the stationarity tolerance $\delta > 0$. The $\lambda_r$ are computed only if the projected gradient, $Z^T \nabla P_\varepsilon(x, \mu)$, is deemed "small enough" according to this tolerance.
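For intuition, consider the simplest instance of (7)-(8): a single active constraint gradient $a$ in $R^2$, so $A(x) = a$ has one column. A hypothetical sketch (all numbers invented for illustration, not the authors' code):

```python
import math

a = [3.0, 4.0]                       # active constraint gradient, A(x) = a
grad = [1.0, 2.0]                    # stand-in for grad P_eps(x, mu)

nrm = math.hypot(a[0], a[1])         # ||a|| = 5
Y = [a[0] / nrm, a[1] / nrm]         # basis for range(A)
Z = [-Y[1], Y[0]]                    # basis for null(A^T): A^T Z = 0
# Least squares multiplier of (7): minimizes ||a*lam - grad||.
lam = (a[0] * grad[0] + a[1] * grad[1]) / (a[0] ** 2 + a[1] ** 2)
# Projected gradient Z^T grad: what the stationarity test measures.
proj = Z[0] * grad[0] + Z[1] * grad[1]
```

Here the residual $grad - a\lambda$ is orthogonal to $a$, and a small $|Z^T grad|$ would flag nearness to a stationary point.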
2.2 The quadratic subproblem

Fundamental to the Mahdavi-Amiri and Bartels (1989) approach is a particular unconstrained quadratic problem:

$$\min_w \ (Z^T \nabla P_\varepsilon(x, \mu))^T w + \frac{1}{2} w^T H_z w. \qquad (9)$$

The matrix $H_z$ is a positive definite approximation of one of the two projected Hessian matrices

$$Z^T \Big( \mu \nabla^2 \phi(x) - \sum_{r \in AC(x, \varepsilon)} \lambda_r \nabla^2 c_r(x) \Big) Z,$$

as explained next. When the algorithm is in its "local" state, that is, in the proximity of a stationary point, the dual variables $\lambda$ are computed; in its "global" state, that is, far from a stationary point, the values of the $\lambda_r$ are taken to be zero. For general nonlinear programming problems, a variety of projected quasi-Newton updates is available. For general constrained nonlinear least squares problems, we make use of the ideas in (Dennis et al., 1989), developed for the unconstrained case, for structuring, and of those in (Nocedal and Overton, 1985) for projecting.

2.3 The step directions

The minimization of $P$ is carried out using several alternative step directions derived from $P_\varepsilon$: (1) the global horizontal step direction; (2) the dropping step direction; (3) the Newton step direction, composed of (3a) the asymptotic horizontal step direction and (3b) the vertical step direction.

The global horizontal step direction is $h_G := Z w$, where $w$ is the solution of the quadratic problem (9). The matrix $H_z$, of course, approximates the global projected Hessian. The asymptotic horizontal step direction $h_A$ is a component of the Newton step direction, $h_A + v$. The Newton steps are only attempted in close neighborhoods of stationary points that are expected to be minimizers. The step direction $h_A$ is computed as in the global case above. In this case, the matrix $H_z$ is to be an approximation of the local projected Hessian. At $x + h_A$, the constraints of $AC(x, \varepsilon)$ may no longer be within $\varepsilon$ of zero.
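Both horizontal directions come from (9); when $H_z$ is positive definite, the subproblem has the closed-form solution $w = -H_z^{-1} (Z^T \nabla P_\varepsilon(x, \mu))$. A toy $2 \times 2$ instance with hypothetical numbers (a sketch, not the paper's code):

```python
Hz = [[2.0, 0.0], [0.0, 4.0]]    # positive definite projected Hessian approximation
g  = [1.0, -2.0]                 # hypothetical Z^T grad P_eps(x, mu)

# Solve Hz w = -g by Cramer's rule (2x2 case).
det = Hz[0][0] * Hz[1][1] - Hz[0][1] * Hz[1][0]
w = [-(g[0] * Hz[1][1] - Hz[0][1] * g[1]) / det,
     -(Hz[0][0] * g[1] - Hz[1][0] * g[0]) / det]
# w = [-0.5, 0.5]; the horizontal step direction is then Z w.
```

Positive definiteness of $H_z$ guarantees that $w$, and hence $Zw$, is a descent direction for $P_\varepsilon$.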
The vertical step is derived by taking a single Newton step toward the value of $v$ that solves the system of equations

$$A(x)^T v = -C_{AC(x, \varepsilon)}(x + h_A),$$

where $C_{AC(x, \varepsilon)}$ is the vector of the active constraint functions, ordered in accordance with the columns of $A(x)$.

The dropping step direction tries to drop a function $c_i$ from the equalities or $c_j$ from the inequalities of the collection of active constraints, and provides a direction that gives a local first-order decrease in the penalty function value. It is the step direction $d$ that satisfies the system of equations

$$A(x)^T d = \mathrm{sgn}(\lambda_r) e_r,$$

where $r \in AC(x, \varepsilon)$ is chosen for whichever one of (5) or (6) is violated, and $e_r$ is the $r$th unit vector.

The steps described above are used, broadly speaking, as follows:
(1) When $\| Z^T \nabla P_\varepsilon(x, \mu) \| > \delta$, we set $x \leftarrow x + \alpha h_G$, where a line search is used to determine $\alpha > 0$.
(2) When $\| Z^T \nabla P_\varepsilon(x, \mu) \| \leq \delta$, the multipliers $\lambda_r$, $r \in AC(x, \varepsilon)$, are approximated by the least squares solution of (7).
(a) If (5) or (6) is not satisfied, then an index $r \in AC(x, \varepsilon)$ is chosen for whichever one of (5) or (6) is violated, and we set $x \leftarrow x + \alpha d$, where a line search is used to determine $\alpha > 0$.
(b) If (5) and (6) are satisfied, then we set $x \leftarrow x + h_A + v$ (no line search is used for $h_A$ or $v$).

Under the standing assumptions on the $f_\ell$, the $c_i$ and the $c_j$, the steps incurred by the $\varepsilon$-approximation to the penalty function produce descent for the penalty function if $\varepsilon$ and $\mu$ are "correctly set". Algorithm 1 below gives an outline of the classical $\ell_1$-penalty method.

Algorithm 1. Classical $\ell_1$-penalty method.
Give $\mu_0 > 0$ and a starting point $x_s^0$;
for $k = 0, 1, 2, \ldots$
  find a minimizer $x^k$ of $P(x, \mu_k)$, starting at $x_s^k$;
  if $x^k$ is feasible for (1) then STOP;
  else choose a new $\mu_{k+1}$ with $0 < \mu_{k+1} < \mu_k$ and a new starting point $x_s^{k+1}$;
end (for)

3. The nonsmooth line search

In the global and dropping steps, a line search along the computed descent direction $h$ determines a step length $\alpha > 0$ such that a sufficient reduction in $P_\varepsilon$ is imposed, that is,
$$P_\varepsilon(x + \alpha h, \mu) < P_\varepsilon(x, \mu),$$

with the amount of decrease governed by a small positive constant $\gamma$ specified by the following assumption.

Assumption 3.1 If $h$ is a descent direction for $P_\varepsilon$ at $x$, then $\alpha$ is determined so that

$$P_\varepsilon(x, \mu) - P_\varepsilon(x + \alpha h, \mu) \geq -\gamma \alpha \, h^T g, \qquad \gamma > 0, \qquad (13)$$

where

$$g = \nabla P_\varepsilon(x, \mu) + \sum_{i \in VE^+(x, h)} \nabla c_i(x) - \sum_{j \in VI^+(x, h)} \nabla c_j(x),$$

and

$$VE^+(x, h) = \{ i \in AE(x, \varepsilon) : \nabla c_i(x)^T h > 0 \}, \qquad VI^+(x, h) = \{ j \in AI(x, \varepsilon) : \nabla c_j(x)^T h < 0 \}.$$

We note that Assumption 3.1 has also been used by Coleman and Conn (1980). It is a direct extension of Proposition 1 of Conn and Pietrzykowski (1977) to show that the above condition can be satisfied.

For a function $p(x)$, let $p(\alpha) = p(x + \alpha h)$. It should always be clear from the context whether we are thinking of $p$ as a function of a vector or of a step length value $\alpha$. Also, we denote the derivative of $p(\alpha)$ along $h$ by $p'(\alpha) = \nabla p(x + \alpha h)^T h$, and the left and right derivatives of $p(\alpha)$, written $p'(\alpha^-)$ and $p'(\alpha^+)$ respectively, by

$$p'(\alpha^-) = \lim_{t \to 0^-} \frac{p(\alpha + t) - p(\alpha)}{t}, \qquad p'(\alpha^+) = \lim_{t \to 0^+} \frac{p(\alpha + t) - p(\alpha)}{t}.$$

Then

$$P(\alpha) = \mu \phi(\alpha) + \sum_{i \in M_1} |c_i(\alpha)| - \sum_{j \in M_2} \min(0, c_j(\alpha)).$$

A point where the derivative of $P(\alpha)$ does not exist will be referred to as a breakpoint. Note that the breakpoints occur at the zeros of the $c_i(\alpha)$. Since $P(\alpha)$ is a continuous function and bounded below, a minimizer $\alpha^*$ exists, and a minimum occurs either at a breakpoint, where $P'(\alpha^{*-}) \leq 0$ and $P'(\alpha^{*+}) \geq 0$, or at a point where $P'(\alpha^*) = 0$.

We gain our motivation from the case when the $f_\ell$ and the $c_i$ are taken to be linear. A similar idea was suggested by Coleman and Conn (1982b) for general nonlinear programming problems, but our approach differs in its consideration of the objective function.
They make a linear approximation to both the objective function and the constraints; we, in contrast, make a quadratic (sum of squares of linear functions) approximation of the objective function and a linear approximation of the constraints. Clearly, this should provide a better approximation. In the new scheme, we try out a sequence of candidate values $\alpha$, stopping to accept one of these values when certain conditions are satisfied. The numerical results show the new strategy to do better than the general line search of Murray and Overton (1978) considered by Mahdavi-Amiri and Bartels (1989).

Suppose that $h$ is a descent direction for $P_\varepsilon$, so that $\nabla P_\varepsilon(x, \mu)^T h < 0$. As an approximation of $P(\alpha)$, we define

$$\tilde{P}(\alpha) = \mu \tilde{\phi}(\alpha) + \sum_{i \in M_1} | c_i(x) + \alpha \nabla c_i(x)^T h | - \sum_{j \in M_2} \min(0, \ c_j(x) + \alpha \nabla c_j(x)^T h),$$

where

$$\tilde{\phi}(\alpha) = \frac{1}{2} \sum_{\ell=1}^{l} \left( f_\ell(x) + \alpha \nabla f_\ell(x)^T h \right)^2.$$

Note that the breakpoints of $\tilde{P}$ occur at the zeros of the $c_i(x) + \alpha \nabla c_i(x)^T h$, $i \in M_1 \cup M_2$. We can compute $\tilde{P}'(\alpha^-)$, $\tilde{P}'(\alpha^+)$ or $\tilde{P}'(\alpha)$, for all $\alpha$ and without any new objective or constraint function evaluation, if needed. Let $\{ \alpha_j \mid \alpha_j > 0, \ j = 1, \ldots, r \}$ be a sorted list (in nondecreasing order of the $\alpha_j$) of these breakpoints along $h$, with $x_j = x + \alpha_j h$. Now, an algorithm to determine the minimizer of $\tilde{P}(\alpha)$ is straightforward: by examining the breakpoints sequentially, we either find a breakpoint that is a minimizer (that is, a breakpoint at which $\tilde{P}'(\alpha^-) \leq 0$ and $\tilde{P}'(\alpha^+) > 0$) or determine an interval (say, $(\alpha_j, \alpha_{j+1})$ or $(\alpha_r, \infty)$) on which $\tilde{P}'$ is continuous and which contains the minimizer, found by setting $\tilde{P}'(\alpha) = 0$. Next, we explain this strategy in more detail. We consider moving along the positive direction of $h$ as long as the corresponding directional derivative of $\tilde{P}$ along $h$ is non-positive (in doing this, we may encounter breakpoints).
Later, we discuss how to compute $\tilde{P}'(\alpha^-)$, $\tilde{P}'(\alpha^+)$ and $\tilde{P}'(\alpha)$. Now, suppose we are in the interval $[\alpha_{i-1}, \alpha_i]$, $\tilde{P}'(\alpha_{i-1}^+) \leq 0$, and a minimizer of $\tilde{P}$ is yet to be determined. We compute $\tilde{P}'(\alpha_i^-)$. If $\tilde{P}'(\alpha_i^-) > 0$, then the minimizer is in $(\alpha_{i-1}, \alpha_i)$ and we determine the step length $\alpha$ so that $\tilde{P}'(\alpha) = 0$, for $\alpha \in (\alpha_{i-1}, \alpha_i)$. But if $\tilde{P}'(\alpha_i^-) \leq 0$, then we compute $\tilde{P}'(\alpha_i^+)$. Now, if $\tilde{P}'(\alpha_i^+) > 0$, then clearly $\alpha_i$ is the desired step length and the breakpoint $x_i = x + \alpha_i h$ is a minimizer of $\tilde{P}$ along $h$. In cases where $\tilde{P}'(\alpha_i^+) \leq 0$, we expect more reduction along $h$, and thus the next interval $[\alpha_i, \alpha_{i+1}]$ is determined and the process is repeated. If we progress through the entire sorted list of breakpoints and $\tilde{P}$ continues to decrease past the last breakpoint $x_r = x + \alpha_r h$, then the step length $\alpha$ is determined by setting $\tilde{P}'(\alpha) = 0$, for $\alpha \in (\alpha_r, \infty)$.

Now, we discuss the computational details. We compute the breakpoints of $\tilde{P}$ by

$$\alpha_i = -c_{j_i}(x) / \nabla c_{j_i}(x)^T h > 0.$$

For computing the derivative and the directional derivative of $\tilde{\phi}$, we note that $\tilde{\phi}$ is a quadratic in $\alpha$ whose second derivative is the constant $\tilde{\phi}''(\alpha) = h^T G G^T h$. Inductively, suppose we are in the interval $[\alpha_{i-1}, \alpha_i)$ and $\tilde{P}'(\alpha_{i-1}^+)$ has been computed. Along this interval, the derivatives of the linearized constraints remain unchanged. Therefore, the change in $\tilde{P}'$ is determined by the change in $\mu \tilde{\phi}'$ only. Considering the above, for $\alpha \in (\alpha_{i-1}, \alpha_i)$ we have

$$\tilde{P}'(\alpha) = \tilde{P}'(\alpha_{i-1}^+) + (\alpha - \alpha_{i-1}) \, \mu \, h^T G G^T h, \qquad \alpha \in (\alpha_{i-1}, \alpha_i), \qquad (14)$$

and

$$\tilde{P}'(\alpha_i^-) = \lim_{\alpha \to \alpha_i^-} \tilde{P}'(\alpha) = \tilde{P}'(\alpha_{i-1}^+) + (\alpha_i - \alpha_{i-1}) \, \mu \, h^T G G^T h.$$

Now, suppose $\alpha_i$ corresponds to $c_{j_i}$, and let $\theta_i(\alpha)$ be the term of $\tilde{P}(\alpha)$ associated with $c_{j_i}$:

$$\theta_i(\alpha) = \begin{cases} | c_{j_i}(x) + \alpha \nabla c_{j_i}(x)^T h |, & \text{if } j_i \in M_1, \\ -\min(0, \ c_{j_i}(x) + \alpha \nabla c_{j_i}(x)^T h), & \text{if } j_i \in M_2. \end{cases}$$

We have (by the fact that $c_{j_i}(x) + \alpha_i \nabla c_{j_i}(x)^T h = 0$)

$$\theta_i'(\alpha_i^+) = \theta_i'(\alpha_i^-) - \kappa_i \, \mathrm{sgn}(c_{j_i}(x)) \, \nabla c_{j_i}(x)^T h,$$
where

$$\kappa_i = \begin{cases} 2, & \text{if } j_i \in M_1, \\ 1, & \text{if } j_i \in M_2. \end{cases} \qquad (15)$$

Since at $\alpha_i$ the change in the directional derivative of $\tilde{P}$ is determined by the change in the directional derivative of $\theta_i$ only, we have

$$\tilde{P}'(\alpha_i^+) = \tilde{P}'(\alpha_i^-) + \theta_i'(\alpha_i^+) - \theta_i'(\alpha_i^-) = \tilde{P}'(\alpha_i^-) - \kappa_i \, \mathrm{sgn}(c_{j_i}(x)) \, \nabla c_{j_i}(x)^T h,$$

where $\kappa_i$ is as defined by (15). Finally, by (14), the solution of $\tilde{P}'(\alpha) = 0$, for $\alpha \in (\alpha_{i-1}, \alpha_i)$, satisfies

$$\tilde{P}'(\alpha_{i-1}^+) + (\alpha - \alpha_{i-1}) \, \mu \, h^T G G^T h = 0,$$

and therefore

$$\alpha = \alpha_{i-1} - \frac{\tilde{P}'(\alpha_{i-1}^+)}{\mu \, h^T G G^T h} \, .$$

It is clear that if $G^T h = 0$, then $\tilde{P}'$ is constant along each interval and therefore we are certain that the minimizer is a breakpoint.

Algorithm 2. Line search procedure.
Input: $x$, the current point; $h$, a descent direction for $P_\varepsilon$ at $x$; $\varepsilon_1 > 0$ and $\mu > 0$.
Output: a step length $\alpha$ such that $P_\varepsilon(x + \alpha h, \mu) < P_\varepsilon(x, \mu)$ with the sufficient decrease specified by (13).
$\alpha_l := 0$; $x_1 := x$;
while ($\tilde{P}'(\alpha_l^+) < 0$) do
  determine $\bar{\alpha}$, the minimizer of $\tilde{P}(\alpha)$ beyond $\alpha_l$;
  if ($\alpha_l = 0$) then $\alpha := \min(1.0, \bar{\alpha})$ else $\alpha := \max(\bar{\alpha}, \alpha_l + \varepsilon_1)$ endif;
  $x_1 := x + \alpha h$;
  if $P_\varepsilon(x_1, \mu) < P_\varepsilon(x, \mu)$ with the sufficient decrease of (13) then return $\alpha$ endif;
  $\alpha_l := \alpha$;
endwhile;
{Perform the (Murray and Overton, 1978) line search with the initial step length $\alpha$};
return $\alpha$;

Note that for $P_\varepsilon(x, \mu)$, the resulting point $x + \alpha h$ may not necessarily satisfy the sufficient decrease. In this situation, if $\tilde{P}'(\alpha^+) < 0$, then we repeat the above procedure with $x$ replaced by $x + \alpha h$. On the other hand, if $\tilde{P}'(\alpha^+) \geq 0$, then we have a good interval containing the minimizer and thus the approach of Murray and Overton (1978) can be performed conveniently using their successive polynomial interpolation with a safeguard strategy (using the available information at the last point $x$ and at $x + \alpha h$, their algorithm works out more favorably; for details, see (Murray and Overton, 1978)).
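The interval-by-interval scan that Algorithm 2 builds on can be sketched as follows (a simplified stand-alone model, not the paper's implementation). The model derivative $\tilde{P}'$ is affine between breakpoints with slope $\mu h^T G G^T h$, so we walk the sorted breakpoints, returning either a breakpoint where the right derivative turns nonnegative or the interior zero of $\tilde{P}'$ given by (14). Residuals and constraints are supplied directly as linearizations $(f_\ell, \nabla f_\ell^T h)$, $(c_i, \nabla c_i^T h)$; the sketch assumes $h^T G G^T h > 0$:

```python
def argmin_ptilde(mu, lin_res, eqs, ineqs):
    """Minimize Ptilde(alpha) over alpha >= 0 for a hypothetical 1-D model.
    lin_res: (f_l, a_l) pairs with a_l = grad f_l^T h; eqs/ineqs: (c, s) pairs
    with s = grad c^T h. Assumes curv = mu * sum a_l^2 > 0."""
    def dP(alpha):  # derivative of the model at a non-breakpoint alpha
        s = mu * sum(a * (f + alpha * a) for f, a in lin_res)
        s += sum(b if c + alpha * b > 0 else -b for c, b in eqs)
        s -= sum(d for c, d in ineqs if c + alpha * d < 0)
        return s

    curv = mu * sum(a * a for _, a in lin_res)   # slope of dP between breakpoints
    bps = sorted({-c / s for c, s in eqs + ineqs if s != 0 and -c / s > 0})
    left = 0.0
    for right in bps:
        mid = 0.5 * (left + right)
        d_left = dP(mid) - (mid - left) * curv   # right-derivative at `left`
        if d_left >= 0:
            return left        # breakpoint `left` (or alpha = 0) is the minimizer
        cand = left - d_left / curv              # interior zero of dP, cf. (14)
        if cand <= right:
            return cand
        left = right
    d_left = dP(left + 1.0) - curv               # right-derivative past last breakpoint
    return left if d_left >= 0 else left - d_left / curv
```

For example, on the model $\tfrac{1}{2}(1 - \alpha)^2 + |{-0.5} + \alpha|$ the scan stops at the breakpoint $\alpha = 0.5$, where the one-sided derivatives change sign; with no constraint terms it returns the smooth minimizer $\alpha = 1$.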
Our numerical testing showed that a large initial value of $\alpha$ may harm the performance of the line search. Thus, for the first trial, we adjust the choice of $\alpha$ by setting $\alpha = \min(1, \bar{\alpha})$, to ensure that the Newton-like step is tried (if it gives enough reduction). For subsequent values of $\alpha$, when the first trial does not provide the sufficient decrease specified by (13), we increase $\alpha$ by at least $\varepsilon_1 > 0$, to ensure enough change in the value of $\alpha$ and to admit a finite termination of the algorithm. We are now ready to give our line search algorithm. In Algorithm 2, the parameter $\varepsilon_1$ (we used $\varepsilon_1 = 10^{-3}$ in our experiments in Section 6) is used to avoid acceptance of a small correction to $\alpha$. Our experiments showed that the usage of this parameter accelerates the procedure. The parameter $\gamma$ is used to determine a sufficient decrease in $P_\varepsilon$ and is characterized by the line search condition (13) of Assumption 3.1. It should be pointed out that our numerical experiments in Section 6 showed that Algorithm 2 often achieved adequate reductions inside the loop, without needing to perform the Murray and Overton line search.

4. Global and local convergence results

Here, we briefly state the global and the local two-step Q-superlinear convergence results. The analysis of global convergence is based on the results of Byrd and Nocedal (1991), who studied convergence of the Coleman and Conn (1984) approach, with our necessary adjustments for nonlinear least squares. Let $D$ denote a compact set such that $x_k \in D$ for all $k$. Let $C_{AC}(x)$ denote the vector of active constraints at $x$ and $\hat{S}$ denote the set of first-order points of $P$ in $D$; see (Coleman and Conn, 1980) for justification of this terminology.

Assumptions 4.1 Let $D$ be a compact set.
(A)  and 1 2, ,ic i M M  are twice continuously differentiable and their first and second derivatives are uniformly bounded in norm on D; (B)  kx is generated by Algorithm 1 starting from an arbitrary initial point, and ,kx D for all k; (C) the gradients of the active constraints at any point in D are linearly independent. Theorem 4.2 Suppose that Assumptions 4.1 hold and assume that (i) the number of stationary points of  in D is finite; (ii) all first-order points of  in D are strict second-order points of  ; (iii) if  k ix is a subsequence and x is a first-order point such that k ix x and ( , ) = ( , 0) , k i k i AC x AC x then there exists a sufficiently small positive constant  such that ,( ( , ) ) , kT i z k k k k kki i i i ii B Z S x Z s s   where ks is defined by (10); N. MAHDAVI-AMIRI and M.R. ANSARI 330 (iv) the proposed line search strategy produces a step length  satisfying Assumption 3.1. Then, (1) 0k . (2) ˆ k x x S  . (3) For k sufficiently large, ( , ) = ( , 0) k kAC x AC x . (4) For k sufficiently large, the "Newton step" is executed. Proof. See Theorem 3.4 in (Mahdavi-Amiri and Ansari, 2011b). Remark 4.3 It should be noted that a similar result for the general nonlinear case (see Coleman and Conn, 1982b, Theorem 1) makes use of a stronger condition, , ( ) T z k Li H Z G x Z as , k ix x where LG is the Hessian of Lagrangian function, while Theorem 4.2 is proved under a weaker condition as specified by (iii). Considering Theorem 4.13 of Mahdavi-Amiri and Ansari (2011a), our condition in (iii) is reasonable. Now, we state a local two-step superlinear convergence of our method. The asymptotic analysis is developed adapting the approach of Nocedal and Overton (1985) to the nonlinear least squares. We will denote *x to be a local optimal solution of problem (1) and suppose the active set at *x to be  1, , .t Furthermore, we assume that we are sufficiently close to *x so that the proper active set has been identified. 
We need the following assumptions.

Assumptions 4.4 Assume that Assumptions 4.1 hold, and that $Y(x)$ and $Z(x)$, given by a particular implementation of (8), are Lipschitz continuous in $D$. We also assume that $\nabla^2 \phi$, $\nabla^2 c_i$, $i \in AC$, and $Q_z(x)$ are Lipschitz continuous matrix functions.

Theorem 4.5 Suppose that Assumptions 4.4 hold. Let the matrix $B_{z,k}$ be updated by (12). Then, the sequence $\{x_k\}$ generated by Algorithm 1 converges to $x^*$ with a local two-step Q-superlinear rate.
Proof. See Theorem 4.17 in (Mahdavi-Amiri and Ansari, 2011a).

5. Computational considerations

Here, we indicate the measures taken to implement Algorithm 1. The reader is referred to (Mahdavi-Amiri and Bartels, 1989) for more details. The test for nearness to stationarity has the form

$$\| Z^T \nabla P_\varepsilon(x, \mu) \| \leq \mathit{reference}_1(\delta, \| \nabla P_\varepsilon(x, \mu) \|), \qquad (16)$$

where, for given scalars $\theta$ and $val$, we define $\mathit{reference}_1$ to be a composite absolute-relative test,

$$\mathit{reference}_1(\theta, val) = \theta \, val + \theta^2.$$

This avoids underflow, or a stringent test, in either case that $val$ becomes far from or close to zero. A much smaller tolerance, $\tau$, of course, replaces $\delta$ in (16) when the tests for convergence are made. By the same token, activity is determined by

$$| c_r(x) | \leq \mathit{reference}_2(\varepsilon, c_r(x)),$$

and the violation of an equality constraint is determined by the reverse of this inequality. If $c_r(x)$ defines an inequality constraint, it is regarded as violated if we have

$$c_r(x) < -\mathit{reference}_2(\varepsilon, c_r(x)).$$

For any scalar $\theta$, we define

$$\mathit{reference}_2(\theta, c_r(x)) = \mathit{reference}_1 \left( \theta, \ \frac{ \tfrac{1}{2} \| F(x) \|^2 + \sum_{i \in M_1 \cup M_2} | c_i(x) | }{ 1 + |M_1| + |M_2| } \right),$$

in which the second argument serves us as a general-purpose average function value.
Tests for feasibility are carried out exactly as for activity, except that a much smaller user-defined tolerance, $\tau$, is used in place of $\varepsilon$. The main difficulty in the context of an exact penalty method is the strategy for choosing and updating the penalty parameter $\mu$. The initial value $\mu = 1$ was used for most test problems, though we sometimes found other values more advantageous to use; these are noted in the reported results. Only naive changes are made to $\mu$ during the running of the program: the value of $\mu$ is reduced through division by 8 each time a minimizer $x$ of $P(x, \mu)$ is found that is not feasible for (1). Recently, Byrd et al. (2008) proposed an effective penalty parameter update strategy appropriate for trust region approaches. Of course, a possible adaptation to line search strategies is welcome. We have work in progress considering a new penalty parameter updating strategy in the context of combined line search and trust region exact penalty methods.

The value of $\lambda_r$ must be tested to determine whether it satisfies (5) or (6). We carried this out relative to a boundary tolerance, in analogy with the corresponding tests in (Mahdavi-Amiri and Bartels, 1989). The value of $\mu$ is regarded as being too small when

$$\mu \, \| F(x) \|^2 \leq \mathit{reference}_2(\mathit{macheps}, c_r(x)),$$

where macheps is the unit round-off error. We regard $\varepsilon$ as being too small when $\varepsilon \leq \tau$, and $\delta$ is regarded as being too small when $\delta \leq \tau$.

The difference in penalty function values, $P_\varepsilon(x, \mu) - P_\varepsilon(\bar{x}, \mu)$, has to be tested to determine whether a sufficient decrease has been obtained. We choose the condition for a sufficient decrease on the Newton step to be

$$P_\varepsilon(x + h_A + v, \mu) < P_\varepsilon(x, \mu) - \mathit{reference}_1 \left( \theta, \ \| Z^T \nabla P_\varepsilon(x, \mu) \|^2 + \| C_{AC(x, \varepsilon)}(x) \| \right).$$

A sufficient decrease for the other two steps (global and dropping) is demanded by setting the parameter $\gamma$ in the line search condition (13).
The numerical positive definiteness of the matrix $H_z$ is enforced, if necessary, by using the modified Cholesky factorization described in (Fang and O'Leary, 2008) during the process of solving (9), to guarantee descent. The requirements for optimality are that, for reasonable choices of the reference values,
(1) $\| Z^T \nabla P_\varepsilon(x, \mu) \| \leq \mathit{reference}_1(\tau, \| \nabla P_\varepsilon(x, \mu) \|)$;
(2a) $-1 - \tau \leq \lambda_i \leq 1 + \tau$, $i \in AE(x, \varepsilon)$;
(2b) $-\tau \leq \lambda_j \leq 1 + \tau$, $j \in AI(x, \varepsilon)$;
(3) $| P_\varepsilon(x, \mu) - P(x, \mu) | \leq \mathit{reference}_1(\tau, | P_\varepsilon(x, \mu) |)$;
(4) $\| \bar{x} - x \| \leq \mathit{reference}_1(\tau, \| x \|)$.

One might also be prepared to accept, as a case of optimal termination, the event that one or more of the above fail to be satisfied, but the user should be notified whenever this happens; for details see (Mahdavi-Amiri and Bartels, 1989).

6. Computational results

Here, we report the comparative results obtained on three sets of problems. First, thirty CNLLS problems were taken from Hock and Schittkowski (1981): 1, 2, 6, 13, 14, 16, 17, 18, 20, 22, 23, 26, 27, 28, 30, 31, 32, 42, 46, 48, 49, 50, 51, 52, 53, 60, 65, 77 and 79. These were chosen because they were least squares problems or could be conveniently cast in that form. The number of variables in these problems varies from 1 to 5, and the number of constraints varies from 1 to 13. Problems 28, 48, 50, 51, 52 and 53 have both linear constraints and linear residual functions; each of the other problems has at least one nonlinear constraint or residual function. Second, we considered seven problems generated with the program of Bartels and Mahdavi-Amiri (1986), who presented a program for the random generation of general nonlinearly constrained nonlinear programming problems, and who also generated an example of a least squares problem in which the dimensions and the curvature of the problem can be varied. These random test problems have both nonlinear constraints and nonlinear residual functions.
Third, we used twelve larger test problems in constrained nonlinear least squares form from (Lukšan and Vlček, 1999), as listed below:
1. Chained Rosenbrock function with trigonometric-exponential constraints.
2. Chained Wood function with Broyden banded constraints.
3. Chained Powell singular function with simplified trigonometric-exponential constraints.
4. Chained Cragg-Levy function with tridiagonal constraints.
5. Chained HS46 problem (problem 46 of Hock and Schittkowski, 1981).
6. Chained HS47 problem.
7. Chained modified HS48 problem.
8. Chained modified HS49 problem.
9. Chained modified HS50 problem.
10. Chained modified HS51 problem.
11. Chained modified HS52 problem.
12. Chained modified HS53 problem.
These problems were modified in such a way that a variable number of variables n may be used. In our tests we take n = 300 (we were not able to consider larger n, since the free KNITRO package we are using is limited to problems with n ≤ 300). In all of these test problems the constraints and the residual functions are both nonlinear. The algorithm successfully solved all of the considered problems quite efficiently. The local convergence on all problems clearly showed a two-step superlinear rate. Moreover, the significant reduction in the number of global steps confirms the effectiveness of our proposed line search procedure (Algorithm 2), as compared to the results obtained by Mahdavi-Amiri and Bartels (1989), the methods reported by Hock and Schittkowski (1981), and the results obtained by our testing of three algorithms (Interior/Direct, Interior/CG and Active-Set) recently developed in the KNITRO 6.0 package. Our calls to programs in KNITRO are managed by a C++ program so as to have the same testing environment as our program. Our algorithm was coded in a portable subset of C++ and run in double precision arithmetic under the Microsoft Visual C++ 6.0 compiler on an x86-based PC with an AMD 1667 MHz processor.
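To illustrate how such variable-dimension least squares tests are built, here is a sketch of chained Rosenbrock residuals in one common form. This is an assumption for illustration only: the exact formulation (constants, indexing, and the accompanying trigonometric-exponential constraints) in Lukšan and Vlček (1999) may differ.

```python
def chained_rosenbrock_residuals(x):
    # Residuals r_k so that the objective is 0.5 * sum(r_k(x)**2); this is a
    # common chained-Rosenbrock variant for variable dimension n = len(x),
    # not necessarily the exact test function of Luksan and Vlcek (1999).
    r = []
    for i in range(1, len(x)):
        r.append(10.0 * (x[i - 1] ** 2 - x[i]))  # coupling between neighbors
        r.append(x[i - 1] - 1.0)                 # pulls components toward 1
    return r

def sum_of_squares(x):
    # Least squares objective built from the residuals above.
    return 0.5 * sum(v * v for v in chained_rosenbrock_residuals(x))
```

In this variant all residuals vanish at x = (1, …, 1), so the unconstrained minimum value is zero for every dimension n.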
The parameters of Sections 2 and 5 were set as follows: the initial penalty parameter was set to 1, and the tolerance parameters were set to 10⁻¹, 10⁻², 10⁻⁴, 10⁻⁶, 10⁻¹² and 10⁻¹⁴. These values were determined by experience as ones that have worked well on most of the problems we tried. Tables 1 and 2 present the results on problems from (Hock and Schittkowski, 1981). The same initial points as in this reference were used. The function value obtained by our algorithm is found in the column headed "FV" for comparison with those listed. It should be noted that our objective functions correspond to those reported in (Hock and Schittkowski, 1981) multiplied by 1/2; this was simply a matter of convenience for us in arranging the test problems in least squares format. The number given in the "NI" column counts the cumulative number of steps through the main loop of the minimization of the penalty function. The "NAI" column counts the number of steps needed for termination after the identification of the correct active set at the solution and the execution of the Newton steps. The "NFE" column counts the number of times the value of the objective function was computed. The spread between NI and NFE gives an impression of how much work was required by the line search; indeed, the spread was not found to be significant, which points to the efficiency of our proposed line search scheme. The numbers listed in the column headed "H&SH" give the number of function evaluations required by the best method tested in (Hock and Schittkowski, 1981) to attain a function value at least as good as the one we attained. We list the number of function evaluations needed by the method of Mahdavi-Amiri and Bartels (1989) in the column headed "MA&B".
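A statistic such as NFE is typically gathered by wrapping the objective callback in a counter; a minimal sketch (the class name and interface are hypothetical, not the authors' implementation):

```python
class CountingFunction:
    # Wraps an objective callback and counts how many times it is evaluated,
    # in the spirit of the NFE statistic reported in Tables 1 and 2.
    def __init__(self, fn):
        self.fn = fn
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return self.fn(x)
```

The minimizer then receives the wrapper in place of the raw objective, and `calls` is read off at termination.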
For the three algorithms provided by the KNITRO 6.0 (Byrd et al., 2006) integrated package for general constrained nonlinear problems, we used the default parameter values, without defining any special case for the objective function or constraints, and used the dense quasi-Newton BFGS approximation of the Hessians.

Table 1. Initializing with the zero matrix

PN      FV           NI  NAI  NFE  H&SH  MA&B    D   CG   AS
1       7.45045E-24   9    3   10    24    21   44   87   70
2       2.15624E-02   8    2    8    40     6   15   19   11
6 (1)   7.54965E-30   5    2    7    10     6   13   14   17
13 (2)  5.07423E-01  20    5   24    45    29   47   15   65
14      6.96732E-01   4    1    5     6     8   47   32   44
15 (3)  1.53250E+02   4    1    5     5     3   13   15    7
16 (3)  1.15723E+00   6    2    8    89    19   10   18    5
17      5.00000E-01   9    1   12    12    20    9   15   17
18      2.50000E+00  13    4   19     8    36   13   12   19
20 (3)  2.00994E+01   7    1   10    17    46    8    5    4
22      5.00000E-01   4    1    5     9     9    7    7    5
23      1.00000E+00   5    3    6     7     7    9    8    8
26 (1)  2.01655E-23   5    2    6    19    37   29   27  104
27      2.00000E-02  13    3   18    25    23 3588   22   26
28      5.23853E-32   2    1    2     5     2   11    9   10
30      5.00000E-01   2    1    3    14     2    9    7   21
31      3.00000E+00   9    5   11    10    15   13   10   22
32      5.00000E-01   4    1    3     3     9    8    9    4
42      6.92893E+00   7    3    8    10    14   13   10   18
46      5.84019E-09  16    9   16    14    40   44   32   94
48      5.54668E-32   2    1    2     7     2   11   12   28
49      1.35973E-07  11    6   11     9    11   23   26   26
50      1.70714E-30   2    1    2    18     8    9    8   10
51      3.69779E-32   2    1    2     5     2   13    7   13
52      2.66332E+00   3    1    3     8     7    7    9   10
53      2.04651E+00   3    1    3     8     8    9    9    8
60 (1)  1.62841E-02  12    3   14     9    29   11   16  100
65 (4)  4.76764E-01  11    3   13     -    20   11   19   21
77 (4)  1.20753E-01  15    4   17    16    39   39   19   25
79      3.93884E-02  12    4   15    10    29   13   12   17

Table 2.
Initializing with the identity matrix

PN      FV           NI  NAI  NFE  H&SH  MA&B    D   CG   AS
1       9.59790E-18   9    4   10    24    29   44   87   70
2       2.15624E-02   7    2    7    40     8   15   19   11
6 (1)   4.48518E-23   8    2   11    10    13   13   14   17
13 (2)  5.03425E-01  22    6   28    45    30   47   15   65
14      6.96732E-01   4    1    5     6     8   47   32   44
15 (3)  1.53250E+02   3    1    4     5     3   13   15    7
16 (3)  1.15723E+00   6    2    8    89     9   10    8    5
17      5.00000E-01   9    1   12    12    20    9    5   17
18      2.50000E+00  13    4   19     8    32   13   12   19
20 (3)  2.00994E+01   7    2   10    17    32    8    5    4
22      5.00000E-01   4    1    5     9    11    7    7    5
23      1.00000E+00   5    3    6     7    34    9    8    8
26 (1)  2.41009E-24   7    4    8    19    15   29   27  104
27      2.00000E-02  14    8   16    25    18 3588   22   26
28      7.70372E-33   3    2    3     5     4   11    9   10
30      5.00000E-01   3    2    3    14     3    9    7   21
31      3.00000E+00   9    5   11    10    15   13   10   22
32      5.00000E-01   5    1    6     3    11    8    9    4
42      6.92893E+00  12    4   22    10    25   13   10   18
46      5.41352E-08  20   10   22    14    27   44   32   94
48      0.00000E+00   5    3    4     7     5   11   12   28
49      2.25804E-07  13    7   13     9    14   23   26   26
50      1.63935E-30   5    1    5    18     8    9    8   10
51      1.23260E-32   3    1    3     5     4   13    7   13
52      2.66332E+00   5    3    5     8    10    7    9   10
53      2.04651E+00   4    3    4     8    10    9    9    8
60 (1)  1.62841E-02  12    4   14     9    18   11   16  100
65 (4)  4.76764E-01   9    2   11     -    19   11   19   21
77 (4)  1.20753E-01  20    8   22    16    33   39   19   25
79      3.93884E-02  15    7   19    10    29   13   12   17

In Tables 1 and 2, the columns headed "D", "CG" and "AS" respectively give the number of function evaluations required by the Interior/Direct (barrier), Interior/CG (barrier) and Active-Set algorithms in this package. Our method of counting function evaluations was consistent with (Hock and Schittkowski, 1981), (Mahdavi-Amiri and Bartels, 1989) and KNITRO, so that these columns serve as a basis for comparison. Of course, function evaluations are compared only on cases where the algorithms converged to the same solution point. The results for initialization of the projected matrix with the zero matrix are given in Table 1, and those for initialization with the identity matrix are given in Table 2.
There is no entry in the "H&SH" column for problem 65; this indicates that our program reached a minimum value significantly lower than that of any of the methods reported on in (Hock and Schittkowski, 1981). In the first column of Tables 1 and 2, the superscript (1) or (4) indicates that the initial value of μ was set to 100 or 10, respectively, rather than 1. The problems in question, i.e. 6, 26, 60, 65 and 77, had constraint function values at some points whose magnitudes dominated the magnitude of the penalty function, forcing the minimization process to take an unusually long time in attempting to keep the constraints within feasibility. The larger value of μ compensated for the imbalance in scale between the objective function and the constraints. The superscript (2) or (3) indicates that the initial value of μ was set to 0.01 or 0.001, respectively, instead of 1. In the problems for which this was done, i.e. 13, 15, 16 and 20, there was a reverse imbalance of scale: the magnitude of the objective function at some points was far larger than that of the constraints. This caused the minimization to waste time finding a sequence of infeasible stationary points for a sequence of decreasing values of μ. The smaller value of μ compensated for this imbalance. It should be pointed out that we have work in progress on a new automatic penalty parameter updating strategy in the context of combined line search and trust region exact penalty methods. Even so, the method we have described shows itself competitive with those tested by Hock and Schittkowski (1981) and those solved by KNITRO.
Our algorithm did as well as, better than, and poorer than the best of the methods in Hock and Schittkowski (1981) on approximately 40%, 50% and 10% of the test problems, respectively; in comparison with the method of Mahdavi-Amiri and Bartels (1989), the corresponding figures are 30%, 70% and 0%. In comparison with the Interior/Direct algorithm, our algorithm did as well as, better than, and poorer than it on 13%, 70% and 17% of the problems, respectively. In comparison with the Interior/CG algorithm, the corresponding figures are 13%, 74% and 13%; and in comparison with the Active-Set algorithm, they are 37%, 50% and 13%. Thus, our new code, with the modifications proposed here along with the new line search strategy, significantly outperforms the ones reported in (Bartels and Mahdavi-Amiri, 1986; Mahdavi-Amiri and Bartels, 1989), as well as the three general nonlinear algorithms of KNITRO (Byrd et al., 2006), on the test problems. This should justify the appropriateness of our structured approach to nonlinear least squares.

Table 3. Results on randomly generated test problems initializing with the zero matrix

PN   SP    σ    FV       NI  NAI  NFE  NCE  RFE
1     1    1   21.6541   19    5   19   24  E-12
1    10    1   21.6541   37    6   40   46  E-12
1   100    1   21.6541   39    7   40   47  E-13
2     1   -1   21.6541   24    6   25   31  E-12
2    10   -1   21.6541   39    7   45   52  E-11
2   100   -1   21.6541   41    6   42   48  E-11
3     1  -10   21.6541   33    7   37   44  E-12
3    10  -10   21.6541   39    7   45   52  E-11
3   100  -10   21.6541   36    7   37   44  E-12
4     1    1   99.3233   35    4   36   40  E-15
4    10    1   98.8173   51    4   56   60  E-14
5     1   -1   98.8173   51    4   55   59  E-16
5    10   -1   98.8173  115    8  117  125  E-15
6     1    1   862.628  105    3  113  116  E-15
6    10    1   862.628  116    4  117  121  E-16
7     1   -1   861.698  125    3  130  133  E-14
7    10   -1   861.698  140    5  152  155  E-14

In Tables 3 and 4, we report the results obtained by our program on three sets of random problems (problems 1, 2 and 3 as set 1, problems 4 and 5 as set 2, and problems 6 and 7 as set 3) generated and tested by Bartels and Mahdavi-Amiri (1986).
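The "RFE" column of Table 3 reports a composite absolute-relative feasibility error at the final point. A minimal sketch of computing such a measure follows, under stated assumptions: equality violations are measured by absolute values, inequality violations by negative parts, and the scaling divides by one plus the Euclidean norm of the full constraint vector (the function name and the choice of norm are ours, for illustration).

```python
import math

def composite_feasibility_error(c_eq_vals, c_ineq_vals):
    # Total constraint violation, scaled by 1 + ||C(x)|| so the measure is
    # absolute for small constraint values and relative for large ones.
    violation = sum(abs(v) for v in c_eq_vals) \
              - sum(min(0.0, v) for v in c_ineq_vals)
    norm_c = math.sqrt(sum(v * v for v in c_eq_vals + c_ineq_vals))
    return violation / (1.0 + norm_c)
```

A feasible point (all equality values zero, all inequality values nonnegative) yields an error of exactly zero, matching the very small RFE magnitudes reported in the tables.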
Each set of random problems is generated with all quantities (x*, the Lagrange multipliers, U₁, etc.) being exactly the same, differing only in ∇²L, which is obtained by setting different values of σ (∇²L is definite if σ > 0 and indefinite if σ ≤ 0; for details, see Bartels and Mahdavi-Amiri, 1986). The numbers of equality constraints, inequality constraints and active constraints at the solution for the three sets are, respectively, (2, 3, 4), (5, 5, 7) and (8, 12, 10). The results obtained by each algorithm provided by KNITRO 6.0 on these random problems are reported separately in Table 5. In Tables 3 and 4, the "SP" column shows the value to which all components of the starting point are set, and the "σ" column shows the value used for σ. The "NCE" column gives the number of times the constraint function was computed. We compute the composite absolute-relative feasibility error at the final point x as

rfe(x) = ( Σ_{i∈M₁} |c_i(x)| − Σ_{j∈M₂} min(0, c_j(x)) ) / ( 1 + ‖C(x)‖ ),

where M₁ and M₂ index the equality and inequality constraints and C(x) denotes the vector of constraints evaluated at x. The rounded value of rfe(x) is shown in the column headed "RFE". The other column headings have the same meanings as those given for Tables 1 and 2.

Table 4.
Results on randomly generated test problems initializing with the identity matrix

PN   SP    σ    FV       NI  NAI  NFE  NCE  RFE
1     1    1   21.6541   25    5   25   30  E-10
1    10    1   21.6541   45    6   47   53  E-12
1   100    1   21.6541   37    6   38   44  E-12
2     1   -1   21.6541   27    6   30   36  E-11
2    10   -1   21.6541   56    7   56   63  E-11
2   100   -1   21.6541   45    6   45   51  E-12
3     1  -10   21.6541   28    7   31   38  E-11
3    10  -10   21.6541   45    7   52   59  E-12
3   100  -10   21.6541   44    7   45   52  E-11
4     1    1   99.3233   38    6   40   46  E-15
4    10    1   98.8173   52    4   58   62  E-15
5     1   -1   98.8961   30    5   31   36  E-15
5    10   -1   98.9681   89   10   95  105  E-15
6     1    1   862.628  132    4  138  142  E-15
6    10    1   862.628  163    3  173  176  E-15
7     1   -1   861.698  152    7  157  164  E-15
7    10   -1   861.698  138    3  146  149  E-14

In Table 5, for each algorithm of KNITRO we have three columns, and all column headings have the same meanings as those given for the other tables. Note that on problems 4 (with starting point 10), 5 (with starting point 1), 6 and 7, our program reached a locally optimal solution having significantly smaller objective values than those obtained by the other three algorithms. At these solutions, our program terminated with the active constraint set {1, … ,7,9} on problems 4 and 5, and with the active constraint set {1, … ,15,17,18,19} on problems 6 and 7. In Table 6, we report the results obtained by our program and the three programs of KNITRO 6.0 on problems taken from (Lukšan and Vlček, 1999). The same initial points as in this reference were used. The columns headed "n" and "m" respectively give the number of variables and constraints. The superscript (in the first column) and the other column headings have the same meanings as those given for Table 1. On problems 2, 3, 4, 7 and 9 the Interior/Direct program reached the iteration limit (of 10000) or failed to solve the problem, and on problem 8 it found a locally optimal solution having objective value 1.67453E+02, which is significantly smaller than the values obtained by the other three programs.
On problem 3, the Interior/CG program and the Active-Set program terminated with the objective values 3.40390E+02 and 7.08426E+00, respectively. On problem 1, the Active-Set program found a locally optimal solution having the objective value 1.92827E+00, which is significantly larger than the values obtained by the other three programs. The results in Tables 3, 4, 5 and 6 show the effectiveness of the proposed algorithm, as compared to the three algorithms recently developed in KNITRO (Byrd et al., 2006), on constrained nonlinear least squares problems. The low values of the number of local iterations in the "NAI" column of all the tables support the local two-step superlinear convergence of our proposed algorithm. Moreover, the overall performance of the proposed algorithm on the test problems attests to its robustness.

Table 5. Results of KNITRO on randomly generated test problems

                       D                     CG                    AS
PN   SP    σ    FV       NFE   RFE    FV       NFE   RFE    FV       NFE   RFE
1     1    1   21.6541    36   E-9   21.6543    45   E-8   21.6541    45   E-9
1    10    1   21.6544    58   E-9   21.6657    62   E-7   21.6542   327   E-10
1   100    1   21.6543    83   E-6   21.6562    77   E-5   22.8154    97   E-18
2     1   -1   21.6541    42   E-8   21.6543    45   E-8   21.6541    45   E-9
2    10   -1   21.6546    64   E-9   21.6565    76   E-8   21.6564   256   E-9
2   100   -1   21.6543    91   E-8   21.7348    72   E-5   21.8106   132   E-15
3     1  -10   21.6541    52   E-9   21.6541    69   E-8   21.6541    54   E-9
3    10  -10   21.6541    67   E-7   21.6665    77   E-7   21.6542   312   E-10
3   100  -10   21.6543    93   E-8   21.6546    87   E-8   21.8070    74   E-18
4     1    1   99.3281   266   E-8   99.3902   163   E-7   99.3281   151   E-11
4    10    1   99.3522   169   E-7   99.3531   130   E-7   99.3307   417   E-13
5     1   -1   99.3305   386   E-7   99.3304   139   E-9   99.3281   193   E-12
5    10   -1   98.8990   182   E-9   99.7139    99   E-8   99.3301   766   E-12
6     1    1   959.052   533   E-7   959.567  3096   E-7   959.327   406   E-12
6    10    1   959.061  8140   E-7   971.849 17787   E-14  959.327   406   E-12
7     1   -1   960.240 13174   E-6   959.517   423   E-10  960.210   127   E-15
7    10   -1   959.406  7835   E-5   959.514   255   E-9   959.040  3813   E-15

Table 6.
Results on problems taken from (Lukšan and Vlček, 1999)

PN      n    m    FV           NI  NAI  NFE    D   CG   AS
1 (1)  300  298   1.12048E-14  30    5   40   36   34   31
2 (3)  300  290   2.62320E+03 250    9  340    -  186  699
3      300    2   2.00542E+02  27    6   34    -  544  189
4      300  296   7.15723E+02  50    8   65    -   35  180
5 (1)  299  198   2.34529E-13  30    4   38   25   23   37
6      297  222   2.22740E+02  35    5   48   25   27  174
7      299  198   1.18060E+03  32    6   39    -   41   80
8      299  198   7.79817E+02  75    8  125  159  183  184
9 (1)  297  222   1.67930E-06 115    8  135    -  194  588
10     297  222   5.06731E-14  29    7   38   91   24   10
11 (3) 297  222   2.11459E+02  50    9   75   55   56  121
12 (3) 297  222   1.77561E+02  40    6   53   19   27   71

Remark 6.1 To investigate the effectiveness of the projected structured updating schemes proposed by Mahdavi-Amiri and Bartels (1989) and of (11) proposed here, we used the alternative update rules instead of (11) in the local steps and ran the algorithm on all the test problems again. The asymptotic results showed a marginal outperformance of (11). Thus, the overall clear outperformance of our new code over the one in (Mahdavi-Amiri and Bartels, 1989) (as mentioned before) should be attributed to our global steps and the use of our new nonsmooth line search strategy. With this observation, it is remarkable that the two-step superlinear convergence of the method in (Mahdavi-Amiri and Bartels, 1989) is yet to be proved.

7. Conclusions

We presented a structured projected exact penalty method adapted for solving constrained nonlinear least squares problems. The projected structure made appropriate use of the ideas of Nocedal and Overton (1985) for general nonlinear programs. Numerical results showed our proposed algorithm to be efficient and reliable. For robustness, we devised a nonsmooth line search procedure accounting for the structure in the nonlinear least squares objective.
We implemented the proposed algorithm and tested the program, together with three nonlinear programming codes from KNITRO, on both small and large residual problems from (Hock and Schittkowski, 1981; Lukšan and Vlček, 1999) and on some randomly generated ones due to Bartels and Mahdavi-Amiri (1986). The results showed the practical relevance of our special line search and projected structure considerations for the sum of squares objective.

8. Acknowledgment

The first author thanks the Research Council of Sharif University of Technology for supporting this work.

9. References

AL-BAALI, M. and FLETCHER, R., 1985. Variational methods for non-linear least squares. Journal of the Operational Research Society, 36: 405-421.
ANITESCU, M., 2005. Global convergence of an elastic mode approach for a class of mathematical programs with complementarity constraints. SIAM J. Optim., 16(1): 120-145.
BARTELS, R.H. and MAHDAVI-AMIRI, N., 1986. On generating test problems for nonlinear programming algorithms. SIAM Journal on Scientific and Statistical Computing, 7(3): 769-798.
BENSON, H.Y., SEN, A., SHANNO, D.F. and VANDERBEI, R.J., 2006. Interior-point algorithms, penalty methods and equilibrium problems. Computational Optimization and Applications, 34(2): 155-182.
BYRD, R.H., GOULD, N.I.M., NOCEDAL, J. and WALTZ, R.A., 2004. An algorithm for nonlinear optimization using linear programming and equality constrained subproblems. Math. Program. Series B, 100(1): 27-48.
BYRD, R.H. and NOCEDAL, J., 1991. An analysis of reduced Hessian methods for constrained optimization. Mathematical Programming, 49: 285-323.
BYRD, R.H., NOCEDAL, J. and WALTZ, R.A., 2006. KNITRO: An integrated package for nonlinear optimization. In Large-Scale Nonlinear Optimization, eds G. Di Pillo and M. Roma, Springer, pp. 35-59.
BYRD, R.H., NOCEDAL, J. and WALTZ, R.A., 2008. Steering exact penalty methods for nonlinear programming. Optimization Methods & Software, 23(2): 197-213.
COLEMAN, T.F. and CONN, A.R., 1980.
Second-order conditions for an exact penalty function. Math. Program., 19: 155-177.
COLEMAN, T.F. and CONN, A.R., 1982a. Nonlinear programming via an exact penalty function: Asymptotic analysis. Mathematical Programming, 24: 123-136.
COLEMAN, T.F. and CONN, A.R., 1982b. Nonlinear programming via an exact penalty function: Global analysis. Mathematical Programming, 24: 137-161.
COLEMAN, T.F. and CONN, A.R., 1984. On the local convergence of a quasi-Newton method for the nonlinear programming problem. SIAM Journal on Numerical Analysis, 21: 755-769.
CONN, A.R. and PIETRZYKOWSKI, T., 1977. A penalty function method converging directly to a constrained minimum. SIAM Journal on Numerical Analysis, 14: 348-375.
DENNIS, Jr., J.E., 1977. Nonlinear least squares and equations. In The State of the Art in Numerical Analysis, ed. D. Jacobs, Academic Press, Orlando, Florida, USA.
DENNIS, Jr., J.E., GAY, D.M. and WELSCH, R.E., 1981. An adaptive nonlinear least squares algorithm. ACM Trans. Math. Soft., 7(3): 348-368.
DENNIS, Jr., J.E., MARTINEZ, H.J. and TAPIA, R.A., 1989. A convergence theory for the structured BFGS secant method with an application to nonlinear least squares. Journal of Optimization Theory and Applications, 61: 161-178.
DENNIS, Jr., J.E. and WALKER, H.F., 1981. Convergence theorems for least-change secant update methods. SIAM J. Numer. Anal., 18: 949-987.
ENGELS, J.R. and MARTINEZ, H.J., 1991. Local and superlinear convergence for partially known quasi-Newton methods. SIAM Journal on Optimization, 1: 42-56.
FANG, H.R. and O'LEARY, D.P., 2008. Modified Cholesky algorithms: A catalog with new approaches. Math. Program., 115(2): 319-349.
FLETCHER, R., 1987. Practical Methods of Optimization (2nd Edition). J. Wiley and Sons, Chichester, England.
FLETCHER, R. and XU, C., 1987. Hybrid methods for nonlinear least squares. IMA Journal of Numerical Analysis, 7: 371-389.
GILL, P.E. and MURRAY, W., 1978.
Algorithms for the solution of the nonlinear least squares problem. SIAM J. Numer. Anal., 15(5): 977-992.
GILL, P.E., MURRAY, W. and SAUNDERS, M.A., 2002. SNOPT: An SQP algorithm for large-scale constrained optimization. SIAM J. Optim., 12: 979-1006.
GRIEWANK, A., 1987. On the iterative solution of differential and integral equations using secant updating techniques. In The State of the Art in Numerical Analysis, eds A. Iserles and M.J.D. Powell, Clarendon Press: 299-324.
HOCK, W. and SCHITTKOWSKI, K., 1981. Test Examples for Nonlinear Programming Codes. Lecture Notes in Economics and Mathematical Systems No. 187, Springer-Verlag, New York.
HUSCHENS, J., 1993. Exploiting additional structure in equality constrained optimization by structured SQP secant algorithms. Journal of Optimization Theory and Applications, 77: 359-382.
LEYFFER, S., LÓPEZ-CALVA, G. and NOCEDAL, J., 2006. Interior methods for mathematical programs with complementarity constraints. SIAM Journal on Optimization, 17(1): 52-77.
LUENBERGER, D.G., 1970. Control problems with kinks. IEEE Transactions on Automatic Control, 15: 570-574.
LUKŠAN, L. and VLČEK, J., 1999. Sparse and Partially Separable Test Problems for Unconstrained and Equality Constrained Optimization. Institute of Computer Science, Academy of Sciences of the Czech Republic, Technical Report 767.
MAHDAVI-AMIRI, N. and ANSARI, M.R., 2011a. Superlinearly convergent exact penalty projected structured Hessian updating schemes for constrained nonlinear least squares: Asymptotic analysis. Bulletin of the Iranian Mathematical Society (accepted).
MAHDAVI-AMIRI, N. and ANSARI, M.R., 2011b. Superlinearly convergent exact penalty projected structured Hessian updating schemes for constrained nonlinear least squares: Global analysis. Submitted.
MAHDAVI-AMIRI, N. and BARTELS, R.H., 1989. Constrained nonlinear least squares: An exact penalty approach with projected structured quasi-Newton updates. ACM Transactions on Mathematical Software, 15(3): 220-242.
MURRAY, W. and OVERTON, M.L., 1978. Steplength Algorithms for Minimizing a Class of Nondifferentiable Functions. STAN-CS-78-679, Stanford University, Stanford, California, USA.
NOCEDAL, J. and OVERTON, M.L., 1985. Projected Hessian updating algorithms for nonlinearly constrained optimization. SIAM Journal on Numerical Analysis, 22(5): 821-850.
PIETRZYKOWSKI, T., 1969. An exact potential method for constrained maxima. SIAM Journal on Numerical Analysis, 6: 299-304.
TAPIA, R.A., 1988. On secant updates for use in general constrained optimization. Mathematics of Computation, 51: 181-203.
TJOA, I.B. and BIEGLER, L.T., 1991. Simultaneous solution and optimization strategies for parameter estimation of differential-algebraic equation systems. Industrial & Engineering Chemistry Research, 30: 376-385.

Received 11 May 2011
Accepted 16 October 2011