SQU Journal for Science, 17 (1) (2012) 1-11
© 2012 Sultan Qaboos University

Numerical Experience with Damped Quasi-Newton Optimization Methods when the Objective Function is Quadratic

Mehiddin Al-Baali* and Anton Purnama**

Department of Mathematics and Statistics, Sultan Qaboos University, Muscat, Oman, *Email: albaali@squ.edu.om and **Email: antonp@squ.edu.om.

ABSTRACT: A class of damped quasi-Newton methods for nonlinear optimization has recently been proposed by extending the damped-technique of Powell for the BFGS method to the Broyden family of quasi-Newton methods. It has been shown that this damped class has the global and superlinear convergence property that a restricted class of 'undamped' methods has for convex objective functions in unconstrained optimization. To test this result, we applied several members of the Broyden family and their corresponding damped methods to a simple quadratic function and observed several useful features of the damped-technique. These observations and other numerical experiences are described in this paper. The important role of the damped-technique is shown not only for enforcing the above convergence property, but also for improving the performance of efficient, inefficient and divergent undamped methods substantially (significantly in the latter case). Thus, some appropriate ways of employing the damped-technique are suggested.

KEYWORDS: Nonlinear optimization, Quasi-Newton methods, Damped-technique, Line search framework.
1. Introduction

Consider iterative quasi-Newton methods with line search for solving the unconstrained optimization problem
$$\min_{x \in R^n} f(x),$$
where $f: R^n \to R$ is a smooth function. On each iteration $k$ of these methods, an estimate of a solution $x_k$ and a positive definite Hessian approximation $B_k$ are used to obtain a new estimate $x_{k+1}$. Then $B_k$ is updated to a new Hessian approximation $B_{k+1}$ by using a member of the Broyden family in terms of the differences in points $s_k = x_{k+1} - x_k$ and gradients $y_k = g_{k+1} - g_k$, where $g_k$ denotes the gradient $\nabla f(x_k)$. This family consists of efficient, inefficient and divergent methods. The former two types of methods are usually defined sufficiently closely to the well-known BFGS and DFP methods, respectively, while the latter type is defined sufficiently remotely away from these methods. For further details see for instance (Fletcher, 1987; Dennis and Schnabel, 1996; Nocedal and Wright, 1999).

Al-Baali and Grandinetti (2009) show that the performance of the BFGS method can be improved if $y_k$ is modified before updating by the damped-technique
$$\hat{y}_k = \varphi_k y_k + (1 - \varphi_k) B_k s_k, \qquad (1)$$
where $\varphi_k$ is a parameter chosen appropriately and sufficiently large in the interval $(0,1]$.
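In code, the damped modification (1) is simply a convex combination of $y_k$ and $B_k s_k$. A minimal sketch (the function and variable names are ours, not from the paper):

```python
import numpy as np

def damped_y(y, B, s, phi):
    """Damped modification (1): y_hat = phi*y + (1 - phi)*B s.

    phi = 1 leaves y unchanged (the undamped case); smaller phi pulls
    y_hat towards the model curvature B s, so s^T y_hat stays positive
    whenever s^T B s > 0.
    """
    return phi * y + (1.0 - phi) * (B @ s)
```

For example, with $B = 2I$, $s = (1,0)^T$ and $y = (-1,0)^T$, the choice $\varphi = 0.5$ turns the negative curvature estimate $s^T y = -1$ into $s^T \hat{y} = 0.5 > 0$.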
The resulting damped (D)-BFGS method was proposed by Powell (1978) for the Lagrangian function in constrained optimization and has been used many times with only values of $\varphi_k \ge 0.8$; see for example (Fletcher, 1987; Nocedal and Wright, 1999). The aim of this paper is to show that small values of $\varphi_k$ can be used to improve the behaviour of not only the BFGS method, but also all members of the Broyden family of methods for unconstrained optimization. To illustrate this possibility, we applied several members of this family and their corresponding damped methods to a simple quadratic function of two variables, using a certain initial point $x_1$ and Hessian approximation $B_1$. For particular choices of $\varphi_k$, it is shown that many damped methods work substantially better than the BFGS method.

The analysis is organized in the following way. Section 2 describes the Broyden and D-Broyden classes of methods, Section 3 provides some numerical results and Section 4 concludes the paper.

2. Damped quasi-Newton methods

We now briefly describe the Broyden and D-Broyden classes of methods. On each iteration of these methods, the current point $x_k$ and a symmetric and positive-definite Hessian approximation $B_k$ are available, given initially by $x_1$ and $B_1$. Using these available data, a new point
$$x_{k+1} = x_k - \alpha_k B_k^{-1} g_k$$
is calculated. Here, $\alpha_k$ denotes a steplength which is usually chosen such that the Wolfe conditions
$$f_{k+1} \le f_k + \sigma_0 s_k^T g_k \qquad (2)$$
and
$$s_k^T y_k \ge -(1 - \sigma_1) s_k^T g_k, \qquad (3)$$
where $f_k$ denotes $f(x_k)$, $\sigma_0 \in (0, 0.5)$ and $\sigma_1 \in (\sigma_0, 1)$, are satisfied.
Then $B_k$ is updated to a new Hessian approximation
$$B_{k+1} = B_k - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k} + \frac{\hat{y}_k \hat{y}_k^T}{s_k^T \hat{y}_k} + \theta_k \, (s_k^T B_k s_k) \, v_k v_k^T, \qquad (4)$$
where $\theta_k$ is a parameter,
$$v_k = \frac{\hat{y}_k}{s_k^T \hat{y}_k} - \frac{B_k s_k}{s_k^T B_k s_k} \qquad (5)$$
and $\hat{y}_k$ is defined by (1) for a suitable value of $\varphi_k$. This class of damped updates is reduced to the Broyden family if $\hat{y}_k = y_k$ (which corresponds to $\varphi_k = 1$). Thus, if this equality holds for all iterations, we obtain the Broyden family of methods. Otherwise, we obtain the D-Broyden class of methods. In particular, the choices $\theta_k = 0$ and $\theta_k = 1$ yield the D-BFGS and D-DFP methods which correspond (for these choices and $\varphi_k = 1$) to the well-known BFGS and DFP methods, respectively.

Although the restricted class of quasi-Newton methods (defined for $\varphi_k = 1$ and $\bar{\theta}_k < \theta_k < 1$, where $\bar{\theta}_k$ is a certain negative value sufficiently close to zero) converges globally and q-superlinearly for convex functions, the performance of the class becomes worse as $\theta_k$ increases. In general, the BFGS method is robust and the DFP method is inefficient, though the former method usually suffers from large eigenvalues of $B_k$; see in particular (Powell, 1986; Byrd et al., 1992). Therefore several modification techniques have been introduced to the BFGS method, see for example (Yabe et al., 2007; Gratton and Toint, 2010). Here, we test the D-Broyden class of methods which is an extension of the D-BFGS update of Powell (1978) in the augmented Lagrangian and SQP methods for constrained optimization. For details on the latter case, see for instance (Fletcher, 1987; Nocedal and Wright, 1999).
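The damped update (4)-(5) can be sketched as follows. This is an illustrative implementation (names are ours), assuming $s_k^T \hat{y}_k > 0$ and $s_k^T B_k s_k > 0$:

```python
import numpy as np

def broyden_update(B, s, y_hat, theta):
    """Damped Broyden family update (4)-(5) of the Hessian approximation.

    theta = 0 gives the (damped) BFGS update, theta = 1 the (damped) DFP
    update.  Assumes s^T y_hat > 0 and s^T B s > 0, so both denominators
    are positive.
    """
    Bs = B @ s
    sBs = s @ Bs
    sy = s @ y_hat
    v = y_hat / sy - Bs / sBs                      # (5); note v^T s = 0
    return (B - np.outer(Bs, Bs) / sBs
              + np.outer(y_hat, y_hat) / sy
              + theta * sBs * np.outer(v, v))      # (4)
```

Since $v_k^T s_k = 0$, the updated matrix satisfies the secant condition $B_{k+1} s_k = \hat{y}_k$ for every value of $\theta_k$, which is easy to verify numerically.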
This damped method was applied to unconstrained optimization problems for the first time by Al-Baali (2003), who considered
$$\varphi_k = \begin{cases} \dfrac{\sigma_2}{1-\tau_k}, & \tau_k < 1-\sigma_2 \\[1ex] \dfrac{\sigma_3}{\tau_k - 1}, & \tau_k > 1+\sigma_3 \\[1ex] 1, & \text{otherwise}, \end{cases} \qquad (6)$$
where
$$\tau_k = \frac{s_k^T y_k}{s_k^T B_k s_k} \qquad (7)$$
and $\sigma_2, \sigma_3 > 0$. If $\sigma_3 \ge c\,(\tau_k - 1)$, where $c > 0$, then the bound $\varphi_k \ge c$ is guaranteed. This formula with damped-technique (1) ensures that $s_k^T \hat{y}_k$ is sufficiently close to either the curvature $s_k^T y_k$ or the approximate curvature $s_k^T B_k s_k$ if $\varphi_k$ is sufficiently large or small, respectively. Another useful feature of formula (6) is that $s_k^T \hat{y}_k$ remains sufficiently positive even when $s_k^T y_k \le 0$ (which may occur if the Wolfe condition (3) is not enforced on $s_k$). We also observe that formula (6) is reduced to the choice of Powell (1978) if $\sigma_2 = 0.8$ and $\sigma_3 = \infty$.

For certain choices of these parameters, Al-Baali (2004) introduced formula (6) to the limited memory L-BFGS method of Nocedal (1980) and reported encouraging numerical results in certain cases. Other encouraging results for the D-BFGS method with formula (6) on unconstrained optimization have been reported by Al-Baali and Grandinetti (2009). The authors also show that this formula works better than other modified BFGS formulae which modify $y_k$ to the form
$$\hat{y}_k = y_k + \left[ 6(f_k - f_{k+1}) + 3(g_k + g_{k+1})^T s_k \right] u_k, \qquad (8)$$
for certain vectors $u_k$; see for example (Yabe et al., 2007) and the references therein. Because this form is reduced to $y_k$ when the objective function is quadratic, the proposed modified BFGS methods maintain the difficulty associated with the BFGS method in this case. Due to this feature and the above observations, we consider only formula (6) here and make the following investigations.

In practice, we observed that formula (6) with $\sigma_2$ sufficiently close to 0.6 seems to work well in some cases, but generally worsens the performance of the D-BFGS method as $\sigma_2$ decreases.
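The damping rule (6)-(7) can be sketched as a small function of the curvature ratio $\tau_k$; the threshold parameters are passed in explicitly (names are ours):

```python
import numpy as np

def phi_formula6(tau, sigma2, sigma3):
    """Damped parameter choice (6), with tau = s^T y / (s^T B s) as in (7).

    Powell's (1978) choice is recovered for sigma2 = 0.8, sigma3 = inf
    (the upper branch is then never taken).
    """
    if tau < 1.0 - sigma2:
        return sigma2 / (1.0 - tau)
    if tau > 1.0 + sigma3:
        return sigma3 / (tau - 1.0)
    return 1.0
```

With $\sigma_2 = 0.8$ and $\sigma_3 = \infty$, a ratio $\tau_k = 0.1 < 0.2$ gives $\varphi_k = 0.8/0.9 \approx 0.89$, while any $\tau_k \ge 0.2$ leaves the update undamped.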
This observation illustrates why large values of $\sigma_2$ such as 0.8 or 0.9 are usually recommended when the D-BFGS update is employed in methods for constrained optimization; see for example (Fletcher, 1987; Nocedal and Wright, 1999; Powell, 1978, 2009). We will argue below that small values of $\varphi_k$ are also useful in some cases, although large values of $\varphi_k$ yield small changes in $y_k$.

Al-Baali (2011) extends the global convergence property of the restricted Broyden class of methods to that of D-Broyden, assuming that
$$\nu_1 - 1 \le \theta_k \mu_k^2 \le 1 - \nu_2, \qquad (9)$$
where $\nu_1, \nu_2 \in (0,1)$,
$$b_k = \frac{s_k^T B_k s_k}{s_k^T y_k}, \qquad h_k = \frac{y_k^T B_k^{-1} y_k}{s_k^T y_k}, \qquad \bar{\theta}_k = \frac{1}{1 - b_k h_k} \qquad (10)$$
and $\mu_k \in (0,1]$ is given by
$$\mu_k = \frac{\varphi_k}{\varphi_k + (1-\varphi_k)\, b_k}. \qquad (11)$$
We observe that $\bar{\theta}_k < 0$, and that condition (9) holds for the choice $\theta_k = 0$ (the BFGS option) and any values of $\nu_1$ and $\nu_2$. Thus, this condition adds another useful feature to the BFGS method over the other methods. Although prior to this result $\theta_k$ was never chosen outside the interval $(\bar{\theta}_k, 1]$, except for a well defined SR1 method (see for example Fletcher, 1987), condition (9) allows any real number $\theta_k$ for sufficiently small values of $\mu_k$ (or $\varphi_k$).

Recently, for further restrictions on $\varphi_k$, Al-Baali (2011) extends the above superlinear convergence property to the damped methods and shows that
$$\varphi_k (\tau_k - 1) \to 0 \quad \text{and} \quad \varphi_k^2 \,(b_k h_k - 1) \to 0, \qquad (12)$$
noting that $\tau_k = 1/b_k$. Although this result suggests using a small value of $\varphi_k < 1$ when either term $\tau_k - 1$ or $b_k h_k - 1$ is sufficiently remotely away from zero, we use this value only when both terms satisfy the latter condition. This suggestion is due to the fact that the Broyden family of updates is reduced to a single update if $b_k h_k = 1$ (see for example Al-Baali, 1993).
Therefore, since the first limit in (12) yields formula (6) (see Al-Baali, 1993), we consider employing it only when the value of the product $b_k h_k$ is sufficiently larger than one, so that we obtain
$$\varphi_k = \begin{cases} \dfrac{\sigma_2}{1-\tau_k}, & \tau_k < 1-\sigma_2 \text{ and } b_k h_k > 1+\sigma_4 \\[1ex] \dfrac{\sigma_3}{\tau_k - 1}, & \tau_k > 1+\sigma_3 \text{ and } b_k h_k > 1+\sigma_4 \\[1ex] 1, & \text{otherwise}, \end{cases} \qquad (13)$$
where $\sigma_4 \ge 0$. This modified formula ensures that the value of $\tau_k$ or its modification lies in $[1-\sigma_2,\, 1+\sigma_3]$ when $b_k h_k > 1 + \sigma_4$. If the value $\sigma_4 = 0$ is used, then the damped-technique is always employed, except when the Broyden family is reduced to a single update. In particular, this value with $\sigma_2 = 0.8$ and $\sigma_3 = \infty$ reduces formula (13) to that of Powell (1978) for D-BFGS, except when $b_k h_k = 1$ and $\tau_k < 1-\sigma_2$. Indeed, for some choices of these parameters, we observed that formula (13) works better than (6).

We now consider the second limit of (12), which suggests using $\mu_k = \sigma_4 / (b_k h_k - 1)^{1/2}$ if $b_k h_k > 1 + \sigma_4^2$, where $\sigma_4 \ge 0$; this can be substituted into (11) to obtain the corresponding value of $\varphi_k$. However, since $\sigma_4$ is chosen by the user and $\mu_k \le \varphi_k$ if $b_k \ge 1$, we consider
$$\varphi_k = \begin{cases} \dfrac{\sigma_4}{\sqrt{b_k h_k - 1}}, & b_k h_k > 1 + \sigma_4^2 \\[1ex] 1, & \text{otherwise}, \end{cases} \qquad (14)$$
where, as above, $\sigma_4 \ge 0$. If $\sigma_4 = 0$ is used, then as for formula (13), the damped-technique is always employed, except when the Broyden family is reduced to a single update. Since $\mu_k^2 \le \mu_k$, Al-Baali (2011) considers formula (14) with the denominator replaced by $b_k h_k - 1$. Although other choices for $\varphi_k$ exist, formulae (13) and (14) are considered here for illustration.

We note that the above formulae for $\varphi_k$ are independent of the updating parameter $\theta_k$, so that the global convergence condition (9) may not hold for sufficiently large values of $|\theta_k|$. Therefore, in this case, this condition can be enforced by choosing $\varphi_k$ sufficiently small.
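Formula (14) is a one-liner in code. The following sketch (names are ours) assumes $b_k h_k \ge 1$, which holds by the Cauchy-Schwarz inequality whenever $s_k^T y_k > 0$ and $B_k$ is positive definite:

```python
import numpy as np

def phi_formula14(b, h, sigma4):
    """Damped parameter choice (14), driven by the second limit in (12).

    b and h are the scalars defined in (10); b*h >= 1, with equality
    exactly when the Broyden family collapses to a single update, in
    which case no damping is applied.
    """
    if b * h > 1.0 + sigma4**2:
        return sigma4 / np.sqrt(b * h - 1.0)
    return 1.0
```

Note that the rule is continuous at the threshold: when $b_k h_k = 1 + \sigma_4^2$, both branches give $\varphi_k = 1$.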
For example, the equality in (9) could be solved for $\mu_k$, which can be substituted into (11) to obtain the largest acceptable value of $\varphi_k$. In this way, all the damped algorithms which we consider below for $\theta_k \notin (\bar{\theta}_k, 1)$ (the divergent Broyden options) solved the considered problem successfully. However, alternatively, replacing $b_k h_k - 1$ by
$$a_k = (b_k h_k - 1) \max(1, |\theta_k|) \qquad (15)$$
in formula (14), it follows that
$$\varphi_k = \begin{cases} \dfrac{\sigma_4}{\sqrt{a_k}}, & a_k > \sigma_4^2 \\[1ex] 1, & \text{otherwise}. \end{cases} \qquad (16)$$
Using formula (16), we observed that all the damped algorithms which we consider below, for several values of $\theta_k$ from inside and outside the convex interval, solved the problem successfully with performance significantly better than that of BFGS.

On the basis of the above discussions, we state the following outline for practical damped quasi-Newton methods. We note that the strong Wolfe conditions in Step 3 yield the Wolfe conditions (2)-(3). If, on every iteration, the choice $\hat{y}_k = y_k$ is used in Step 5, which follows from (1) and (13) with values of $\sigma_2 = 1$, $\sigma_3 = \infty$ and $\sigma_4 = \infty$, then Algorithm 2.1 is reduced to the 'undamped' Broyden family of methods. In particular, this choice with $\theta_k = 0$, for all values of $k$, reduces the D-BFGS method to the standard BFGS method. In practice, we observed that several values of $\sigma_2 > 1/2$ improve the performance of the BFGS and DFP methods (the latter significantly). If $\varphi_k$ is chosen such that the usual positive definiteness property holds (which we consider here), then any value of $\theta_k$ satisfies condition (9). Indeed, we observed that sufficiently small values of $\varphi_k$ are sometimes useful, particularly when $|\theta_k|$ is remotely away from zero. These features will be illustrated in the next section.
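The safeguarded choice (15)-(16) can be sketched similarly (names are ours); scaling $b_k h_k - 1$ by $\max(1, |\theta_k|)$ is what strengthens the damping for Broyden parameters chosen remotely away from the BFGS value $\theta_k = 0$:

```python
import numpy as np

def phi_formula16(b, h, theta, sigma4):
    """Safeguarded damped parameter (16), using a_k from (15).

    For |theta| <= 1 this reduces to formula (14); for remote theta the
    larger a_k forces a smaller phi, i.e. heavier damping.
    """
    a = (b * h - 1.0) * max(1.0, abs(theta))       # (15)
    if a > sigma4**2:
        return sigma4 / np.sqrt(a)
    return 1.0
```

For instance, with $b_k h_k = 5$ and $\sigma_4 = 1$, the BFGS option $\theta_k = 0$ gives $\varphi_k = 0.5$, whereas the remote option $\theta_k = -100$ gives the much heavier damping $\varphi_k = 0.05$.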
Algorithm 2.1 (D-Broyden class)

Step 0: Given a starting point $x_1$, a symmetric and positive-definite matrix $B_1$, positive values of $\sigma_0$ and $\sigma_1$, and a tolerance $\epsilon > 0$; set $k = 1$.
Step 1: Terminate if $\|g_k\| \le \epsilon$.
Step 2: Compute the search direction $d_k = -B_k^{-1} g_k$.
Step 3: Find a steplength $\alpha_k$ and a new point $x_{k+1} = x_k + \alpha_k d_k$ such that the following strong Wolfe conditions hold:
$$f_{k+1} \le f_k + \sigma_0 \alpha_k d_k^T g_k, \qquad |d_k^T g_{k+1}| \le -\sigma_1 d_k^T g_k. \qquad (17)$$
Step 4: Compute $s_k$, $y_k$, $b_k$ and $h_k$.
Step 5: Choose values for $\theta_k$ and $\varphi_k$.
Step 6: Update $y_k$ to $\hat{y}_k$ and hence $B_k$ to $B_{k+1}$, using (1) and (4), respectively.
Step 7: Set $k := k+1$ and go to Step 1.

3. A numerical example

The question that has been most useful to the development of successful algorithms for unconstrained optimization is 'Does the method work well when the objective function is quadratic?' This statement is given by Powell (2009), who also states that the answer is very welcome and encouraging for the updating of second derivative matrices of quadratic models by a Broyden family method. Since this family usually suffers from ill-conditioned problems, we would like to apply some selected Broyden family methods to a quadratic function with a very ill-conditioned Hessian matrix. Because quasi-Newton methods are invariant under linear transformation, the difficulty is unchanged if we choose the simple quadratic function
$$f(x) = \frac{1}{2} x^T x, \qquad x \in R^2,$$
which has the solution $x^* = 0$ and identity Hessian $\nabla^2 f(x) = I$, while the initial Hessian approximation $B_1$ is very ill-conditioned. In particular, Powell (1986) uses
$$B_1 = \mathrm{diag}(1, \eta), \qquad \eta > 0, \qquad x_1 = \frac{1}{\sqrt{1+c^2}} \begin{pmatrix} c \\ 1 \end{pmatrix},$$
where $c$ is a constant. The author simplifies the analytic derivation of $x_{k+1}$ when calculated by either the BFGS or DFP method with $\alpha_k = 1$. He shows that for large values of $\eta$, BFGS performs badly and DFP is far worse, although $y_k$ is exact. Note that the former performance is maintained for the modified BFGS methods which use (8).
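For illustration, Algorithm 2.1 applied to $f(x) = \frac{1}{2} x^T x$ with unit steplengths (as used in this section) can be sketched as follows. This is a minimal sketch, not the authors' Matlab code: the damped parameter is supplied as a user function, and the starting point below is illustrative rather than Powell's exact choice.

```python
import numpy as np

def d_broyden_quadratic(x, B, phi_rule, theta=0.0, eps=1e-7, max_iter=50000):
    """Algorithm 2.1 on f(x) = 0.5*x.T@x (gradient g(x) = x), unit steps.

    phi_rule(s, y, B) returns the damped parameter phi_k in (0, 1];
    returning 1 everywhere gives the undamped Broyden method with
    parameter theta (theta = 0 is BFGS).  Returns the number of
    gradient evaluations (nfe) needed to reach ||g|| <= eps.
    """
    g = x.copy()
    for k in range(1, max_iter + 1):
        if np.linalg.norm(g) <= eps:
            return k                              # Step 1: terminate
        s = -np.linalg.solve(B, g)                # Steps 2-3 with alpha_k = 1
        x = x + s
        g_new = x.copy()                          # gradient of 0.5*x.T@x is x
        y = g_new - g
        phi = phi_rule(s, y, B)
        y_hat = phi * y + (1.0 - phi) * (B @ s)   # damped-technique (1)
        Bs = B @ s
        sBs = s @ Bs
        sy = s @ y_hat
        v = y_hat / sy - Bs / sBs                 # (5)
        B = (B - np.outer(Bs, Bs) / sBs
               + np.outer(y_hat, y_hat) / sy
               + theta * sBs * np.outer(v, v))    # (4)
        g = g_new
    return max_iter

# Ill-conditioned start in the spirit of Powell (1986); x1 is illustrative.
eta = 1e10
B1 = np.diag([1.0, eta])
x1 = np.array([1.0, 1.0]) / np.sqrt(2.0)
nfe_bfgs = d_broyden_quadratic(x1, B1, lambda s, y, B: 1.0)  # undamped BFGS
```

With a well-conditioned $B_1 = I$ the method terminates in one step, since the first quasi-Newton step is exact; the ill-conditioned $B_1$ above is what makes the undamped BFGS iteration slow.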
The derivation of Powell (1986) for $x_{k+1}$ has been extended to a restricted Broyden class of methods by Byrd et al. (1992), who also show that the performance of this class becomes worse as $\theta_k$ increases. Since these papers illustrate that these methods usually suffer from large eigenvalues of $B_k$ rather than small ones, we will describe below the numerical results for a very large value of $\eta$.

In a similar manner, we have tested the D-Broyden class of methods by applying Algorithm 2.1 to the above problem. To define Step 0 of this algorithm, we let $x_1$ and $B_1$ be given as above, using $\eta = 10^{10}$ and $\epsilon = 10^{-7}$. We do not define $\sigma_0$ and $\sigma_1$ here, because $\alpha_k = 1$ was used in Step 3 on all iterations, whether this value satisfies the Wolfe conditions or not. Nevertheless, these conditions hold in the limit if a quasi-Newton method converges superlinearly (Dennis and Moré, 1974). Thus the algorithms we consider below differ only in the choices of $\theta_k$ and $\varphi_k$ as required in Step 5 of Algorithm 2.1. All experiments were run in Matlab.

The damped-technique (1) with suitable values of $\varphi_k$ has the ability of correcting the Hessian approximations successfully. Here, we report the results for several choices of $\varphi_k$ and
$$\theta_k \in \{-10^7, -100, -0.5, 0, 0.5, 1, 1.5, 100, 10^7\},$$
which belong to the so-called convex, preconvex and postconvex classes of methods (defined for $0 \le \theta_k \le 1$, $\theta_k < 0$ and $\theta_k > 1$, respectively). We will report the number of function evaluations (nfe) which is required to terminate Algorithm 2.1 in Step 1 with the Euclidean norm $\|g_k\| \le \epsilon$. This number is the same as that of the gradient evaluations as well as that of iterations. 'F' indicates that the algorithm was terminated before the latter inequality held, i.e. the algorithm failed to solve the problem. We also consider the BFGS method for comparison, which required nfe = 32 to solve the problem.

Table 1.
D-BFGS with formula (6)

| $\sigma_2$ | 0.95 | 0.9 | 0.7 | 0.6 | 0.5 | 0.4 | 0.1 | 0.01 | 0.001 | $10^{-6}$ |
| nfe | 32 | 32 | 32 | 27 | 35 | 47 | 220 | 2107 | 18887 | F |

We firstly tested the D-BFGS method with $\varphi_k$ given by formulae (6), (13) and (14) for various choices of the parameters $\sigma_2$ and $\sigma_4$. We do not consider choices for $\sigma_3$ here, because the second case in (6) and (13) was not used, as the inequality $\tau_k > 1 + \sigma_3$ was not satisfied for $\sigma_3 > 0$. The results expressed in terms of nfe are presented in Table 1. We used (6) for 10 different values of $\sigma_2 \in (0,1)$. For sufficiently large values of $\sigma_2$, D-BFGS required nfe = 32 to solve the problem, which is the same as that required by BFGS. The reason is that using $\sigma_2 = 1$ in formula (6) yields $\varphi_k = 1$, which (by the damped-technique (1)) reduces the D-BFGS update to the BFGS one. We also note that D-BFGS works slightly better than BFGS when $\sigma_2$ is close to 0.6, while it becomes worse as $\sigma_2$ decreases in $(0, 0.5)$. Indeed, D-BFGS also failed to solve the problem when values of $\sigma_2 \le 10^{-6}$ were used. This observation is somewhat expected on the basis of some results from the literature, because (to our knowledge) values of $\sigma_2 = 0.8$ or 0.9 have been used in the D-BFGS method for constrained optimization during the last three decades; see for example (Powell, 2009).

This drawback of formula (6) with choices of $\sigma_2 \le 0.5$ can be avoided by employing it only when the BFGS update is sufficiently away from the rank one update, that is when $b_k h_k$ is sufficiently away from 1. To illustrate this feature, we repeated the run using formula (13) with most of the above values of $\sigma_2$ and some values of $\sigma_4 \ge 0$. The results are given in Table 2.
We do not report the results for $\sigma_4 > 2$ here, because we observed that D-BFGS required the same nfe as that required for $\sigma_4 = 2$. It is clear that the damped-technique improves BFGS substantially when $\sigma_4$ is defined sufficiently close to 0.5 and $\sigma_2 \le 0.6$, and significantly for very small values of $\sigma_2$. It is rather surprising that D-BFGS with $\sigma_2 = 10^{-6}$ and $\sigma_4 < 2$ required at most nfe = 7 to solve the problem. This result shows that small values of $\sigma_2$ (and hence $\varphi_k$) are useful in practice. This table also shows that the choice of $\sigma_4 = 0$ (or nearly so), which yields that the damped-technique is employed on most iterations, is not desirable.

Table 2. D-BFGS with formula (13)

| $\sigma_4 \backslash \sigma_2$ | 0.95 | 0.9 | 0.7 | 0.6 | 0.5 | 0.4 | 0.1 | 0.01 | $10^{-3}$ | $10^{-6}$ |
| 2 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 |
| 1.5 | 32 | 32 | 32 | 20 | 18 | 17 | 12 | 8 | 7 | 6 |
| 0.95 | 32 | 32 | 32 | 20 | 18 | 17 | 12 | 8 | 8 | 6 |
| 0.5 | 32 | 32 | 32 | 20 | 18 | 17 | 12 | 8 | 8 | 5 |
| 0.1 | 32 | 32 | 32 | 20 | 19 | 18 | 12 | 8 | 8 | 5 |
| 0.001 | 32 | 32 | 32 | 22 | 20 | 19 | 13 | 8 | 8 | 5 |
| $10^{-6}$ | 32 | 32 | 32 | 24 | 21 | 19 | 14 | 9 | 8 | 5 |
| 0 | 32 | 32 | 32 | 27 | 27 | 25 | 87 | 625 | 3918 | 7 |

The damped-technique seems to work well if the left limit in (12) is enforced only when the term under the right limit, $b_k h_k - 1$, is sufficiently away from zero. Therefore, it is worth testing formula (14), which enforces the right limit in (12). The results are given in Table 3. We observe that the damped-technique improves over BFGS as $\sigma_4$ decreases. For example, when $\sigma_4 = 0.01$, D-BFGS required only nfe = 8, which is very small compared to the 32 required by BFGS.

Table 3. D-BFGS with formula (14)

| $\sigma_4$ | 2 | 1 | 0.7 | 0.6 | 0.5 | 0.4 | 0.1 | 0.01 | $10^{-3}$ | $10^{-6}$ | 0 |
| nfe | 32 | 19 | 17 | 16 | 15 | 14 | 11 | 8 | 7 | 5 | 4 |

To see if the above encouraging numerical results for D-BFGS generalise to other members of the D-Broyden class, we repeated the run for formula (14) with the same values of $\sigma_4$ and several values of $\theta_k$, some of which are very remotely away from 0 (the BFGS option). The numerical results are presented in Table 4.
In the first row of results, we used $\sigma_4 = \infty$ to obtain $\varphi_k = 1$ for all $k$ and, hence, the corresponding undamped quasi-Newton methods were considered for comparison. The failures for $\theta_k < 0$ and $\theta_k > 1$ are expected; see for example (Byrd et al., 1992). Note that the nfe = 1010 required for $\theta_k = 1$ (the DFP option) is expected; see (Powell, 1986). The remaining results in Table 4 were obtained by testing condition (9) with $\nu_1 = \nu_2 = 0.05$. If it does not hold for the computed $\varphi_k$, we reduce this value such that the condition holds with equality. In this way convergence is enforced for all algorithms which, indeed, solve the problem successfully. Thus the second row of results with $\sigma_4 = \infty$ and condition (9) being enforced indicates that the damped-technique was employed for all algorithms defined with $\theta_k \notin [0, 0.95]$.

Table 4. D-Broyden with formula (14) and enforcing condition (9)

| $\sigma_4 \backslash \theta_k$ | $-10^7$ | $-100$ | $-0.5$ | 0 | 0.5 | 1 | 1.5 | 100 | $10^7$ |
| $\infty$* | F | F | F | 32 | 78 | 1010 | F | F | F |
| $\infty$ | 16 | 21 | 16 | 32 | 78 | 411 | 118 | 150 | 18424 |
| 2 | 17 | 9 | 8 | 32 | 78 | 411 | 118 | 65 | 18425 |
| 0.95 | 17 | 9 | 13 | 19 | 22 | 30 | 30 | 65 | 18425 |
| 0.5 | 14 | 10 | 15 | 15 | 16 | 18 | 21 | 65 | 18425 |
| 0.1 | 11 | 19 | 10 | 11 | 11 | 12 | 15 | 66 | 18425 |
| 0.01 | 11 | 7 | 8 | 8 | 8 | 10 | 13 | 65 | 18425 |
| 0.001 | 11 | 7 | 7 | 7 | 7 | 9 | 12 | 64 | 18425 |
| $10^{-6}$ | 6 | 5 | 5 | 5 | 5 | 7 | 10 | 62 | 18425 |
| 0 | 7 | 7 | 7 | 7 | 7 | 9 | 12 | 79 | 22172 |

* Condition (9) was not enforced.

An examination of the other results in Table 4 shows the following. The damped algorithms with $\theta_k \le 1.5$ and $\sigma_4 \le 0.1$ work substantially better than BFGS. It is rather surprising that all algorithms with $\theta_k < 0$, which sometimes yields indefinite Hessian approximations in the undamped algorithms, solved the problem successfully. Although BFGS works much better than the algorithms for $\theta_k \ge 100$, it is also surprising that they solved the problem successfully. Note that the slow convergence of these algorithms is avoided as described in the following paragraph.

Table 5.
D-Broyden with formula (16)

| $\sigma_4 \backslash \theta_k$ | $-10^7$ | $-100$ | $-0.5$ | 0 | 0.5 | 1 | 1.5 | 100 | $10^7$ |
| $\infty$* | F | F | F | 32 | 78 | 1010 | F | F | F |
| $\infty$ | 16 | 21 | 16 | 32 | 78 | 411 | 118 | 150 | 18424 |
| 0.95 | 20 | 16 | 13 | 19 | 22 | 27 | 23 | 16 | 18 |
| 0.5 | 13 | 11 | 15 | 15 | 16 | 17 | 16 | 12 | 12 |
| 0.1 | 12 | 8 | 10 | 11 | 11 | 10 | 10 | 8 | 34 |
| 0.01 | 820 | 7 | 8 | 8 | 8 | 8 | 8 | 8 | 1109 |
| 0.001 | 7 | 24 | 7 | 7 | 7 | 7 | 7 | 10 | 6 |
| $10^{-6}$ | 5 | 6 | 5 | 5 | 5 | 6 | 6 | 5 | 5 |
| 0 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |

* Condition (9) was not enforced.

We now repeat the test with formula (16). The results are given in Table 5. We observe the surprising result that all damped algorithms solve the problem successfully. For sufficiently small values of $\sigma_4$, the algorithms' performance is significantly better than that of BFGS, and the algorithms improve as $\sigma_4$ decreases. The choice $\sigma_4 = 0$ means that the damped-technique is always employed, except when the Broyden family of updating formulae is reduced to the SR1 update. All algorithms approach an optimal method with nfe = 4, which is required to solve the problem by a well defined SR1 method. Although $\sigma_4 = 0$ gives the best choice here, the value $\sigma_4 = 0.95$ might be typical for a general function, and further experiments on several test problems should be considered.

It is worth reporting that an application of some algorithms to the above quadratic problem (or its linear transformation) with some choices for $x_1$ and $B_1 = \mathrm{diag}(1, 10^{-10})$ has been considered. We observed, as expected, that all the algorithms solved the problem successfully with nfe = 2 or 3, whether the damped-technique was employed or not.

To see if the above results generalise to general functions, we applied Algorithm 2.1, employing the strong Wolfe conditions (17) with the usual values of $\sigma_0 = 10^{-4}$ and $\sigma_1 = 0.9$, to some unconstrained optimization problems and observed the following. When $\theta_k$ is sufficiently away from zero, the damped algorithms work significantly better than the undamped Broyden methods.
For $\theta_k = 0$, we observed that generally the standard BFGS method is preferable to the D-BFGS algorithm unless further restrictions on the damped parameter $\varphi_k$ are made or, similarly, the scalars $\sigma_2$, $\sigma_3$ and $\sigma_4$ are appropriately chosen. Indeed, Al-Baali and Grandinetti (2009) showed that the performance of the D-BFGS method with formula (6) for $\sigma_2 = 0.9$, $\sigma_3 = 9$ and further restrictions on $\varphi_k$ is substantially better than that of the BFGS method. This performance is improved further when both the damped and self-scaling techniques are combined in a certain sense (Al-Baali and Khalfan, 2009). For large-scale problems, Al-Baali (2004) introduced the damped formula (6) to the limited memory L-BFGS method; see for example (Nocedal and Wright, 1999). Using $\sigma_2 = 0.6$, $\sigma_3 = 3$ and sufficiently large values of $\varphi_k$, Al-Baali (2004) reported encouraging numerical results in certain cases.

4. Conclusion

It has been shown that the D-Broyden class of methods with appropriate choices for the damped-technique works very well when applied to the quadratic function of Powell (1986). In particular, the damped parameter (16) improves the performance of robust methods (e.g. BFGS) substantially and inefficient methods (e.g. DFP) significantly, and it enforces fast convergence of divergent methods. The numerical results also demonstrate that small values of the damped parameter $\varphi_k$ are useful in some cases. Since this finding contradicts the well-known fact that large values of $\varphi_k \ge 0.8$ should be used in the D-BFGS update of methods for the Lagrangian function in constrained optimization (see for example Fletcher, 1987; Nocedal and Wright, 1999; Powell, 1978; Powell, 2009; Gill and Leonard, 2003), it is worth noting that smaller values of $\varphi_k$ using formulae (13) and (14), which modify that of Powell (1978), should also be further investigated. For a general function, these formulae can be used with $y_k$ replaced by the right hand side of (8), as in (Al-Baali and Grandinetti, 2009).

5.
Acknowledgment

We would like to thank the two anonymous referees for making a number of valuable comments on a draft of this paper.

6. References

AL-BAALI, M. 1993. Variational quasi-Newton methods for unconstrained optimization. JOTA, 77: 127-143.
AL-BAALI, M. 2003. Quasi-Wolfe conditions for Newton-like methods. 18th International Symposium on Mathematical Programming, Copenhagen, Denmark, August 18-22, 2003.
AL-BAALI, M. 2004. Quasi-Wolfe conditions for quasi-Newton methods for large-scale optimization. 40th Workshop on Large-Scale Nonlinear Optimization, Erice, Italy, June 22 - July 1, 2004.
AL-BAALI, M. 2011. Convergence analysis of a family of damped quasi-Newton methods for nonlinear optimization. Research Report DOMAS 11/2, Sultan Qaboos University, Oman.
AL-BAALI, M. and GRANDINETTI, L. 2009. On practical modifications of the quasi-Newton BFGS method. Advanced Modeling and Optimization, 11: 63-76.
AL-BAALI, M. and KHALFAN, H.F. 2009. A combined class of self-scaling and modified quasi-Newton methods. Optimization Online, 07/2334 (to appear in COAP).
BYRD, R.H., LIU, D.C. and NOCEDAL, J. 1992. On the behavior of Broyden's class of quasi-Newton methods. SIAM J. Optim., 2: 533-557.
DENNIS, J.E. and MORÉ, J.J. 1974. A characterization of superlinear convergence and its application to quasi-Newton methods. Math. Comp., 28: 549-560.
DENNIS, J.E. and SCHNABEL, R.B. 1996. Numerical Methods for Unconstrained Optimization and Nonlinear Equations. SIAM Publications, Philadelphia.
FLETCHER, R. 1987. Practical Methods of Optimization (2nd edition). Wiley, Chichester, England.
GILL, P.E. and LEONARD, M.W. 2003. Limited-memory reduced-Hessian methods for large-scale unconstrained optimization. SIAM J. Optim., 14: 380-401.
GRATTON, S. and TOINT, P.L. 2010. Approximate invariant subspaces and quasi-Newton optimization methods. OMS, 25: 507-529.
NOCEDAL, J. 1980. Updating quasi-Newton matrices with limited storage.
Math. Comput., 35: 773-782.
NOCEDAL, J. and WRIGHT, S.J. 1999. Numerical Optimization. Springer, London.
POWELL, M.J.D. 1978. Algorithms for nonlinear constraints that use Lagrange functions. Math. Programming, 14: 224-248.
POWELL, M.J.D. 1986. How bad are the BFGS and DFP methods when the objective function is quadratic? Math. Programming, 34: 34-47.
POWELL, M.J.D. 2009. On nonlinear optimization since 1959. Optimization Online, 11/2477.
YABE, H., OGASAWARA, H. and YOSHINO, M. 2007. Local and superlinear convergence of quasi-Newton methods based on modified secant conditions. Journal of Computational and Applied Mathematics, 205: 617-632.

Received 20 April 2011
Accepted 4 October 2011