A New Sparse Quasi-Newton Update


SQU Journal for Science, 17 (1) (2012) 30-43 

© 2012 Sultan Qaboos University 

30 

A New Sparse Quasi-Newton Update  
Method  

Minghou Cheng*,  Yu-Hong Dai** and  Rui Diao* 

Institute of Computational Mathematics and Scientific/Engineering Computing, Academy of 
Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China, *Email: 
chengmh@lsec.cc.ac.cn; diaorui@lsec.cc.ac.cn. **Email: dyh@lsec.cc.ac.cn. 

 
ABSTRACT: Based on the idea of maximum determinant positive definite matrix completion, 

Yamashita proposed a sparse quasi-Newton update, called MCQN, for unconstrained optimization 

problems with sparse Hessian structures. Such an MCQN update keeps the sparsity structure of the 

Hessian while relaxing the secant condition. In this paper, we propose an alternative to the MCQN 

update, in which the quasi-Newton matrix satisfies the secant condition, but does not have the same 

sparsity structure as the Hessian in general. Our numerical results demonstrate the usefulness of the 

new MCQN update with the BFGS formula for a collection of test problems. A local and superlinear 

convergence analysis is also provided for the new MCQN update with the DFP formula. 

 
KEYWORDS: Large-scale, Matrix completion, Quasi-Newton methods, Secant condition, Sparsity, 

Unconstrained optimization. 

 المتناثرة نيوتنمتماثلة تحسين جديد لطريقة 

  يوهونج داي و روي دياومنغو تشينج و 

نيوتن  متماثلةلى فكرة إكمال المحدد األقصى لمصفوفة موجبة محددة، اقترح ياماشيتا تحسيناً لطريقة إباالستناد ص: خمل
مسائل األمثليات غير المقيدة مع أصفار متناثرة في مصفوفة هس. يحافظ هذا وذلك لحل   MCQNتناثرة وسماها الم

 MCQN التناثر في مصفوفة هس مع تخفيف شرط القاطع. نقترح في هذا البحث بديال عن تحسينالتحسين على هيكلية 
هيكلية التناثر لمصفوفة هس بشكل عام. تبين نفس لك تمحققة لشرط القاطع، ولكنها التم نيوتن متماثلةبحيث تكون مصفوفة 

. كذلك تم تجريبيةمجموعة من المسائل ال لحل  BFGSقانونالجديد مع  MCQN استخدام تحسينفائدة نتائجنا العددية 
 .DFP قانونالجديد مع  MCQN قانون الخطي الفائق لتحسينالموضعي وتحليل التقارب 

 
1. Introduction 

onsider the unconstrained optimization problem  

                                             ( ) ,min
n

x R

f x



                                                                                  (1) 

where : 
n

f R R  is continuously differentiable and its gradient ( )f x  is available. If the dimension n is not 

C 


A NEW SPARSE QUASI-NEWTON UPDATE METHOD 

31 

large, the quasi-Newton method is one choice for solving problem (1) because of its superlinear convergence and 

the unnecessity to calculate the function Hessian. Assuming that kx is the current iterate and kH is the 

approximation to the inverse Hessian, the quasi-Newton method generates the next iterate by  

                                 1 = ( ) ,k k k k kx x H f x    (2) 

where > 0k  is a stepsize obtained via some line search, and updates the approximation kH  to 1kH   to meet 

the secant condition  

                                          1 = ,k k kH y s  (3) 

where 1=k k ks x x   and 1= ( ) ( ).k k ky f x f x   

If the dimension of the problem (1) is large, the direct use of the quasi-Newton method is not possible due 

to the storage of an n n  matrix. In order to overcome this difficulty, several methods have been proposed. The 
limited-memory BFGS (L-BFGS) method (Liu and Nocedal, 1989; Nocedal, 1980) is only to store a few 

curvature pairs ( , )i is y  in the construction of the Hessian approximation. Since there is no need to know any 

information about the Hessian, the L-BFGS method is friendly to users and has been widely used in practice. For 

many large-scale problems, the function f can be written in the form  

=1

( ) = ( ) ,
ne

i
i

f x f x  

where each of the en  element functions if  depends only on a few variables. In this case, the partitioned quasi-

Newton method, developed by Griewank and Toint (see Griewank and Toint, 1982a, 1982b; Griewank and 

Toint, 1984; and the references therein), performs very well in practice, and is now regarded as one of the 

important practical optimization algorithms. Their basic idea is to update a Hessian approximation 
i
kB  for each 

element function if  and then to assemble these matrices to obtain an approximation kB  to the whole Hessian 

of f. Further, they determine the search direction by solving the linear system 

=1
( ) = ( ) .

n ie
k k ki

B d f x  

Their method was implemented with the trust region strategy since the matrix kB  is not positive definite in 

general. 

There are also many large-scale problems where the function Hessian 
2

( )f x  is sparse and the sparsity 

structure is available. Suppose that for all ,
n

x R   

                                      
2

,[ ( )] = 0,   ( , ) ,i jf x i j F   (4) 

where F  is some subset of I I  and = {1, 2, , }.I n  In this case, it is possible to establish faster optimization 

methods by exploiting the sparsity structure of the Hessian. Toint (1977) and Fletcher (1995) studied such 

updates and required 1kH   to meet the sparsity requirement, namely, 
1

1 , 1 ,( ) = ( ) = 0k i j k i jH B

   when 

( , ) ,i j F  and the secant equation (3) simultaneously. As a result, their methods involve the solution of a linear 

system or a convex program at each iteration. If some component of ks  is zero, the obtained approximate 

Hessian may be ill-posed (see Sorensen's example (Sorensen, 1982)). Inspired by the successful use of positive 

definite matrix completion in (Fukuda et al., 2000) for semidefinite programming, Yamashita (2008) proposed a 

novel type of quasi-Newton update for problem (1) satisfying the sparse Hessian structure. Let :
n n

R R


  

be a strictly convex function defined by  

                                  ( ) =tr( ) ln det( )A A A   (5) 

(This function is introduced in (Byrd and Nocedal, 1989) as a powerful tool for the convergence analysis of 


MINGHOU CHENG, YU-HONG DAI and RUI DIAO 

32 

quasi-Newton methods). Yamashita determines the new approximation matrix 1kH   from kH  by two steps: 

(i) update kH  to 
QN

H  by certain ordinary quasi-Newton formula; 

(ii) obtain 1kH   by solving the following subproblem with H,  

                          
1/ 2 1/ 2

, ,

1
,

min ( )

s.t. = , ( , )

( ) = 0 , ( , )

.

k k

QN
i j i j

i j

H HH

H H i j F

H i j F

H S


 











 (6) 

Here S   denotes the set of symmetric positive semidefinite matrices. Notice that since step (ii) uses , ,
QN
i j

H  

( , ) ,i j F  we only have to update | |F  elements of kH  in step (i), where | |F  means the cardinality of F. As 

in (Yamashita, 2008), we call the above update MCQN (Matrix Completion Quasi-Newton). The use of DFP and 

BFGS methods in step (i) are considered in (Yamashita, 2008). 

Further, Yamashita showed that, if the sparsity pattern of the Hessian is such that there is not any fill-in in 

its Cholesky factorization, or equivalently, the graph induced by the Hessian is chordal (see (Yamashita, 2008) 

for details), problem (6) is equivalent to finding a maximum-determinant positive definite matrix completion of 

,
,

QN
i j

H  ( , )i j F :  

                            , ,

max det( )

s.t. = , ( , )

.

QN
i j i j

H

H H i j F

H S 





 (7) 

The above problem can be easily solved by analyzing the clique tree of the graph induced by the Hessian (see 

(Yamashita, 2008) for details). In addition, it is shown in (Yamashita, 2008) that the update does not suffer from 

the drawback in Sorensen's example (Sorensen, 1982). Therefore by relaxing the secant equation, the MCQN 

update is easy to implement and is well-posed. 

The numerical experiments in (Yamashita, 2008) show that, the MCQN update with BFGS obviously 

performs better than the MCQN update with DFP. As seen from the above procedure, the MCQN update by 

Yamashita keeps the sparsity structure of the Hessian, but does not satisfy the quasi-Newton condition. 

Nevertheless, local and superlinear convergence results are only established for an MCQN update with DFP. Dai 

and Yamashita (2007) extended the results to the MCQN update with Broyden's family. 

In this paper, we propose an alternative of the MCQN update, in which the quasi-Newton matrix satisfies 

the secant condition, but does not have the same sparsity structure as the Hessian in general (see the next 

section). A local and superlinear convergence analysis is also provided for the new MCQN update with DFP (see 

Section 3). Our numerical results for a collection of test problems demonstrate that the new MCQN update with 

BFGS clearly outperforms the previous MCQN update with BFGS (see Section 4). Conclusions and discussions 

are presented in the last section. 

2. The new MCQN method 

Looking back to the MCQN update by Yamashita (2008), the whole sequence of quasi-Newton matrices, 

which were used for the calculations of search directions, keep the same sparsity structure as the function 

Hessian. The intermediate matrix 
QN

H satisfies the secant condition, but does not necessarily have the same 
sparsity structure as the function Hessian. As an alternative of such an MCQN update, we may think of the 

possible use of those intermediate matrices in the calculations of search directions. In this paper, we explore this 


A NEW SPARSE QUASI-NEWTON UPDATE METHOD 

33 

possibility and our numerical results in Section 4 demonstrate the usefulness of such an idea. 

To describe the new MCQN update, we assume that the current quasi-Newton matrix is ,kH  which is 

symmetric and positive definite. Since kH does not have the sparsity structure of the function Hessian in 

general, we consider the optimal solution of the following subproblem as an intermediate matrix ,
S
kH   

                        
1/ 2 1/ 2

, ,

1
,

min ( )

s.t. = ( ) , ( , )

( ) = 0 , ( , )

,

k k

i j k i j

i j

H HH

H H i j F

H i j F

H S


 











 (8) 

where, again S   denotes the set of symmetric positive semidefinite matrices and F is some subset of I I  such 

that (4) holds. If the graph induced by the Hessian is chordal (otherwise, we extend the set F such that the 

induced graph has such property), we know that 
S
kH  possesses the sparse structure and has the following form 

of sparse clique factorization  

                         1 2 1 2 1= ,
T T T

l l lH P P P QP P P P  (9) 

where , = 1, 2, ,iP i l  and Q are some sparse matrices (see (Yamashita, 2008)). Having obtained the 

intermediate matrix ,
S
kH  we can use for example the BFGS formula to generate a new quasi-Newton matrix  

 1 = 1 ,
S T T S T S T

S k k k k k k k k k k k
k k T T T

k k k k k k

H y s s y H y H y s s
H H

s y s y s y


 
   

 
 

 (10) 

which satisfies the secant condition 1 =k k kH y s  and will be used for the calculation of search directions. In 

other words, we alter the two steps in the original MCQN method and determine the new approximation matrix 

1kH   in the following way: 

(i) obtain 
S
kH  by solving problem (7); 

(ii) update 
S
kH  to 1kH   by certain quasi-Newton formula. 

It is not difficult to see that the amount of computation of such a strategy is almost the same as that 

required by the original MCQN. However, since the new approximation matrix 1kH   satisfies exactly the secant 

condition, we think that the new quasi-Newton matrix 1kH   contains more information about the function 

Hessian. The numerical results confirm our idea. 

A description of the new sparse quasi-Newton method is given as follows.  

 
Algorithm 2.1 (NMCQN)    

Step 1. Obtain an extension of F (still denoted by F) such that ( , )G V F  is chordal. Choose 0
n

x R  and  

a positive definite matrix 0H  with 
1

0( ) = 0, ( , ) .ijH i j F


   Set := 0.k   

Step 2.  If kx  satisfies the termination criterion, then stop.  

Step 3.  1 = ( ).k k k kx x H f x     

Step 4.  Obtain the sparse clique-factorization formula (9) of .
S
kH   

Step 5.  Obtain 1,kH   ( , )i j F  by some ordinary quasi-Newton update.  

Step 6.  Set := 1k k   and go to Step 2.  

  
MINGHOU CHENG, YU-HONG DAI and RUI DIAO 

34 

We call the New MCQN method NMCQN. If, in Step 5, the quasi-Newton matrix 1kH   is obtained from 

S
kH  by (10), we denote 1 = BFGS( , , )

S
k k k kH H s y  and call the corresponding algorithm NMCQN-BFGS. If 

the DFP update formula is used, namely,  

                        1 = ,
S T S T

S k k k k k k
k k T S T

k k k k k

H y y H s s
H H

y H y s y
    (11) 

 we denote 1 = DFP( , , )
S

k k k kH H s y  and call the corresponding algorithm NMCQN-DFP. 

3. Convergence analysis 

In this section, we show the local and superlinear convergence of NMCQN-DFP, namely, Algorithm 2.1 

with 1 = DFP( , , ).
S

k k k kH H s y  The results are established in a manner similar to (Yamashita, 2008). 

We give the following assumptions on the objective function, where   means the two-norm. 

 
Assumption 3.1  Let *x  be a solution of (1) and let *= { : }
n

x R x x b    with a positive constant b. 

(i) The objective function f is twice continuously differentiable on .  

(ii) There exist positive constants m and M such that  

  
12 22

( ) ,
T n

m z z f x z M z z R


      

for all .x    

 
If the second-order sufficient optimality condition holds at the solution *x  and b is sufficiently small, 

Assumption 3.1(ii) holds. From Assumption 3.1(i), 
2

( )f x  is Lipschitz continuous on .  Then, from Lemmas 

4.1.12 and 4.1.15 in (Dennis and Schnabel, 1983), there exist 1L  and 2L  such that for all 1, ,k kx x     

                             
22

* 1( )k k ky f x s L s   (12) 

and  

                            
2

* 2( ) ,k k k ky f x s L s   (13) 

where k  is defined by  

                             1 * *= max , .k k kx x x x     (14) 

Moreover, there exists a positive constant 3L  such that for all 1 2,z z  ,  

                          1 2 3 1 2( ) ( ) .f z f z L z z     (15) 

Therefore, we have  

        1 3 1= ( ) ( ) , .k k k k k ky f x f x L s x x C       (16) 

From Eq. (8.1.2) of (Nocedal and Wright, 1999) we have  

                                            = ,k k ky G s  (17) 

where kG  is the average Hessian defined by 
1 2
0

= ( ) .k k kG f x ts dt   

For convenience, the following notations are used in the analysis.  

 
1

2 2
* * * *= ( ),  = ( ) ,G f x H f x


  
 

   
1/2 1/2

* *= ,  = ,k k k ks H s y H y


 
A NEW SPARSE QUASI-NEWTON UPDATE METHOD 

35 

1/2 1/2 1/2 1/2
* * * *= ,  = ,

S S
k k k kH H H H H H H H

   
 

2
cos = , = ,

T S T S
k k k k k k

k kS
k k k k

y H y y H y
q

y H y y
       

2

= , = ,
T

k k k
k kT T

k k k k

s y s
M m

y s y y
 

where k  is the angle between ky  and .k kH y  

Firstly, we introduce the following two lemmas similarly to (Yamashita, 2008).  

 
Lemma 3.1  Suppose that Assumption 3.1 holds. Then there exist (0, )c    and (0, )b   such that  

ln 2 ,

1 .

k k

k k

m c

M c





 

 
 

whenever < .k    

 
Lemma 3.2  Suppose that Assumption 3.1 holds. Consider Algorithm 2.1 with 1 = DFP( , , ).
S

k k k kH H s y  Then 

we have  

( ) ( ) ,
S
k kH H   

where ( ) = tr( ) ln det( ).A A A   

 
By using the above lemmas, we show the following key inequality.  

 
Lemma 3.3  Suppose that Assumption 3.1 holds. Consider Algorithm 2.1 with 1 = DFP( , , ).
S

k k k kH H s y  Let 

  be the constant specified in (13). If ,k   then we have  

 1 2 2 2

1
( ) ln 1 ln ( ) 3 .

cos cos cos

k k
k k k

k k k

q q
H H c  

  


 
      

 
 

Proof. By Assumption 3.1(ii) and (17), we have  

=
T T
k k k k k

T T
k k k k

y s y H y
m

y y y y
  

and  

= ,
T T
k k k k k

T T
k k k k

y y z H z
M

y s z z
  

where 
1/2

=k k kz H y


 and 
1

= .k kH G


 
Since 1kH   is obtained from 
S
kH  by the DFP formula, we have  

1/ 2 1/ 2
1 * 1 *

1/ 2 1/ 2 1/ 2 1/ 2
* * * *

1/ 2 1/ 2 1/ 2 1/ 2
* * * *

1/ 2 1/ 2 1/ 2 1/ 2 1/ 2 1/ 2
* * * * * *

=

=

=

k k

S T S T
S k k k k k k
k T S T

k k k k k

S T S T
S k k k k k k
k T S T

k k k k k

H H H H

H y y H s s
H H H H H

y H y y s

H H y y H H H s s H
H

y H H H H H y y H H s

 
 

   

 

  

 
  

 
 

 

 
MINGHOU CHENG, YU-HONG DAI and RUI DIAO 

36 

          = .
S T S T

S k k k k k k
k T S T

k k k k k

H y y H s s
H

y H y y s
   (18) 

Since 
2

tr( ) =
T

zz z  for ,
n

z R  it follows from (18) that  

                
2 2

1tr( ) = tr( ) .

S
S k k k

k k T S T
k k k k k

H y s
H H

y H y y s
    (19) 

In a manner similar to the use of Eqs. (8.45) in (Nocedal and Wright, 1999), we can show that  

                          1det( ) = det( ) .
T

S k k
k k T S

k k k

y s
H H

y H y
  (20) 

Moreover, simple calculations show that  

                        
2

2
= =

T T
kk k k k k

T S T S
kk k k k k kk

yy s y s m

qy H y y H yy
 (21) 

and  

           
2 2 2

2 2 2
= = .

( ) cos

S ST S
k k k k kk k k k

T S T S
kk k k k k kk

H y H y yy H y q

y H y y H yy 
 (22) 

It follows from (19), (20), (21) and (22) that  

 
1 1 1

2

2

2 2

( ) = tr( ) ln det( )

= tr( ) ln det( ) ln ln
cos

= ( ) ln 1 1 ln ln cos .
cos cos

k k k

S Sk
k k k k k

k

S k k
k k k k

k k

H H H

q
H M H m q

q q
H M m





 
 

  

    

      

 (23) 

Lemmas 3.2 and 3.3 give  

ln 1 1 2 1 = 3k k k k kM m c c c         

and  

( ) ( ) .
S
k kH H   

Then it follows from (23) that  

 
2

1 2 2
( ) ( ) 3 ln cos 1 ln .

cos cos

k k
k k k k

k k

q q
H H c   

 
        (24) 

Therefore  

 1 2 2 2

1
( ) ln 1 ln ( ) 3 ,

cos cos cos

k k
k k k

k k k

q q
H H c  

  


 
      

 
 (25) 

which completes the proof.                                                                                                                                       □ 

 
Using inequality (25), the local and superlinear convergence will be shown.  

 
Theorem 3.1  Suppose that Assumption 3.1 holds. Consider Algorithm 2.1 with 1 = DFP( , , ).
S

k k k kH H s y  

Then, for any (0,1),   there exist x  and H  such that 0 * xx x    and 0 * HH H    imply  

1 * *k kx x x x    , for all k.  


A NEW SPARSE QUASI-NEWTON UPDATE METHOD 

37 

 
Proof. Suppose that (0,1).   The following inequalities will be shown to hold for all k,  

                              1 * * ,k kx x x x     (26) 

                                    *
3

,
2

kH H
L


   (27) 

where 3L  is the Lipschitz constant of .f  

First, note that by choosing x  to be sufficiently small, we have  

                                     1 < ,  ,
2

x xL M


    (28) 

where 1,L  M and   are the constants specified in (12), Assumption 3.1(ii) and (13), respectively. Moreover, 

according to Lemma 4 in (Yamashita, 2008), by choosing x  and H  to be sufficiently small, there exists 

(0, )   such that  

                                     0( ) < ,
2

H n


   (29) 

                      *
3

( ) <
2

H n H H
L


      (30) 

and  

                                        
3

,
1 2

xc 





 (31) 

where H is a symmetric positive definite matrix, 
1/2 1/2

* *= ,H H HH
 

 and c is the constant specified in Lemma 

3.1. 

We prove the theorem by induction. When = 0,k  the inequality (27) holds from (29) and (30). Moreover, 

we have  

 
1 * 0 0 0 *

0 * * 0 0 * 0 *

* 0 * * 0 * 3 0 * 0 *

2
1 * 0 * 0 *

1 0 *

0 *

= ( )

( ) ( )( ( ) ( ))

( ( ) ( ) ( ))

2

( )
2

.

x

x x x H f x x

x x H f x H H f x f x

H f x f x G x x L H H x x

L H x x x x

L M x x

x x








   

       

       

   

  

 

 
where the second inequality follows from (15), the third inequality follows from (12) and (30), the forth 

inequality follows from Assumption 3.1(ii) and 0 * ,xx x    and the final inequality follows from (28) and 

< .   

Next, assuming (26) and (27) for = 0, 1, , 1,k l   we shall show they are also true for = .k l  In fact, 

similarly to the case in which = 0,k  we have  


MINGHOU CHENG, YU-HONG DAI and RUI DIAO 

38 

1 * *

* * * *

* * * * 3 * *

2
1 * * *

1 * *

( )

( ) ( )( ( ) ( ))

( ( ) ( ) ( ))

2

( )
2

l l l l

l l l l

l l l l

l l

l l

x x x H f x x

x x H f x H H f x f x

H f x f x G x x L H H x x

L H x x x x

L M x x x x





     

       

       

   

   

 
and thus 

1 * 1 *

*

( )
2

l
l x l

l

x x L M x x

x x


 



    

 

 
where the first inequality follows from the induction assumption. This shows the truth of (26) for = .k l  In the 

following, we show (27) by (25) in Lemma 3.3. Summing up the inequality (25) with = 0, 1, , 1,k l   we have  

1 1

02 2 2
=0 =0

1
( ) ln 1 ln ( ) 3 .

cos cos cos

l l
k k

l k
k kk k k

q q
H H c  

  

   
         

  

 
Since 0 < cos 1k   and the term in the square brackets is nonpositive, we have  

                        
1

0
=0

( ) ( ) 3 .
l

l k
k

H n H n c  


      (32) 

From (14) and the induction assumption, we have  

   11 * *= max , max , =k k kk k k x x xx x x x           
for = 0, 1, , 1,k l   and thus  

 
1 1

=0 =0

1
= .

1 1

ll l
k x

k x x
k k


   

 

  
  

 
 

It then follows from (32), (29) and (31) that  

0

3
( ) ( ) < .

1

x
l

c
H n H n


  


   


 

From (30), we have 
2 1

*
3

( ) ,
2

lH f x
L


   which is (27) for = .k l  Thus by induction, (26) and (27) hold 

for all k.                                                                                                                                                                     □ 

 
In order to show the superlinear convergence, we build the following relationship similarly to (Yamashita, 

2008),  

       
* *( ) ( )

= 0 = 0 .lim lim
k k k k

k kk k

H H y B G s

y s 

 
  (33) 

 
Lemma 3.4  Suppose that Assumption 3.1 holds. Consider Algorithm 2.1 with 1 = DFP( , , ).
S

k k k kH H s y  

Suppose also that 0 * xx x    and 0 * HH H    with the constants x  and H  specified in Theorem 3.1 

for sufficiently small (0,1).   Then relationship (33) holds.  

 
A NEW SPARSE QUASI-NEWTON UPDATE METHOD 

39 

Proof. Let , = 1, 2, ,
k
i i n  be the eigenvalues of .kH  Since the inequality (27) holds for sufficiently small 

,  we can assume that there exists > 0min  such that 
k
i min   for all i and k. Moreover, since 

* *= ( ) ,k k k ky G s G G s   we have  

* * * * *

* * *

* * *

( ) = ( ) ( )( )

( )

( ) .

k k k k k k k

k k k k k k

min k k k k k

H H y H H G s H H G G s

H G B s H H G G s

B G s H H G G s

    

    

    

 
It then follows from (16) that  

* * * *

3 3

( ) ( ) ( )
.

k k min k k k k

k k

H H y B G s H H G G

y L s L

   
   

Since 
1 2
0

= ( )k k kG f x ts dt   and *kx x  by Theorem 3.1, the second term of the right-hand side of the 

inequality converges to 0 as .k   Then, relationship (33) holds.                                                                      □ 

 
In the following, we give the main result of this section.  

 
Theorem 3.2  Suppose that Assumption 3.1 holds. Suppose also that 0 * xx x    and 0 * HH H    hold 

for sufficiently small , > 0.x H   Then the sequence { }kx  generated by Algorithm 2.1 with 

1 = DFP( , , )
S

k k k kH H s y  converges to *x  superlinearly.  

  
Proof. From Lemma 3.4, it suffices to show  

                                     
*( )

= 0 .lim
k k

k k

H H y

y


 (34) 

Notice by Theorem 3.1 that for any (0,1),   relation (27) holds provided , > 0x H   are sufficiently small. 

Then we must have that  

                                        * = 0 ,lim k
k

H H


  (35) 

otherwise there is a contradiction. Therefore (34) is true since * *( ) .k k k kH H y H H y    By Lemma 

3.4, we conclude the superlinear convergence of the algorithm.                                                                             □  

 
Comparing the proof of Theorem 4 in (Yamashita, 2008), the above proof is simpler and is directly 

obtained by one statement in Theorem 3.1. 

4. Numerical results 

We tested the new MCQN algorithm with the DFP and BFGS formulas, namely, NMCQN-DFP and 

NMCQN-BFGS, by using Matlab R2008a on Core(TM)2 PC with Windows-XP. We used the three problems in 

(Yamashita, 2008) and two more problems in (Dai and Yamashita, 2007) in our numerical experiments. Various 

dimensions, namely, = 10,n  100, 1000 and 10000, were chosen for the problems. All tested problems were 

tridiagonal. Therefore, the chordal extensions of their sparsity pattern can easily be obtained. 

 
The details of the problems are given as follows, where inix  means the used initial point. 

 
MINGHOU CHENG, YU-HONG DAI and RUI DIAO 

40 

Problem I (TRIDIA (Gould and Toint, 2003))  

2 2
1 1

=2

( ) = ( 1) ( 2 ) ,

= (1, 1, , 1) .

n

i i
i

T
ini

f x x x x

x

  
 

Problem II (Extended Rosenbrock function (Fletcher,1995))  
1

2 2 2
1

=1

( ) = 100( ) (1 ) ,

= ( 1.2, 1, 1.2, 1, , 1.2, 1) .

n

i i i
i

T
ini

f x x x x

x



   

  

 
Problem III (Boundary value problem (Fletcher,1995))  

2
=1

1 1
( ) = (cos 2 ),

2 ( 1)

1 2
= ( , , , ) ,

1 1 1

n
T T

n i i
i

T
ini

f x x Tx e x x x
n

n
x

n n n

  


  

 
where = (1, 1, , 1) ,
T

ne   

2 1

1 2 1

= .1 2

1

1 2

T

 
 
 
 
 
 

 
  

 
Problem IV (Extended Powell singular function (Moré et al., 1981))  
/4

4 4 2 2
4 3 4 4 2 4 1 4 1 4 4 3 4 2

=1

( ) = 10( ) ( 2 ) 5( ) ( 10 ) ,

= (3, 1, 0, 1, , 3, 1, 0, 1) .

n

i i i i i i i i
i

T
ini

f x x x x x x x x x

x

     
       
 

 

 
Problem V (Broyden tridiagonal function (Moré et al., 1981)) 
2 2 2 2

1 1 2 1

1
2 2

1 1
=2

( ) = (3 2 2 1) (3 2 1)

(3 2 2 1) ,

= ( 1, , 1) .

n n n

n

i i i i
i

T
ini

f x x x x x x x

x x x x

x





 

      

    

 

 
The following termination criterion is employed:  

5( )
10 or > 50000 .

kf x
k

n


  

Instead of Step 3 in Algorithm 2.1, we set 1 = ( )k k k k kx x H f x    with a step size k  to improve the 

numerical performance. k  is chosen to satisfy Wolfe's rule :  


A NEW SPARSE QUASI-NEWTON UPDATE METHOD 

41 

4
( ) ( ) 10 ( ) ,

| ( ) | 0.9 ( ) .

T
k k k k k k k

T T
k k k k k k

f x d f x f x d

f x d d f x d

 




   

    
 

For convenience, we also list the numerical results reported for the MCQN update, as well as for the BFGS 

and L-BFGS methods, in (Yamashita, 2008). See Tables 1 and 2. For the L-BFGS method, = 5,m  which is the 

number of stored curvature pairs of L-BFGS, and the scaling factor 
2

1 1 1/
T
k k ks y y    was employed. L-BFGS 

just stores the pairs of vectors ( , ), = 1, , ,k ks y k m  while our method just stores a few entries of kH  to 

exploit the sparsity structure of the Hessian. Therefore, both of them have fast implementation in practice. 

Although NMCQN-DFP has a nice theoretical convergence property as shown in the previous section, its 

numerical performance is not very good. Hence we only list the results of NMCQN-BFGS. The tables list the 

total number of iterations. The symbol "F" denotes that the number of iterations exceeds 50000  and "-" means 

that the BFGS method could not be implemented for = 10000.n  

 
Table 1. Results of problems I, II and III 

 
Problem  n   BFGS L-BFGS MCQN-DFP MCQN-BFGS NMCQN-BFGS 

  10 15 31 20 29 47 

I 100 108 126 167 72 75 

 1000 662 415 1498 192 195 

 10000 - 1191 11626 528 475 

  10 78 68 76 60 514 

II 100 487 527 665 341 1002 

 1000 4525 4979 6574 3207 690 

 10000 - 49580 F 31737 403 

  10 15 24 15 15 13 

III 100 107 299 49 50 18 

 1000 571 3117 86 54 23 

 10000 - F 2600 402 25 

  
Table 2. Results for problems IV and V 

 
Problem  n   MCQN-BFGS NMCQN-BFGS 

  10 40 37 

IV 100 211 324 

 1000 589 121 

 10000 998 97 

  10 30 52 

V 100 56 54 

 1000 49 61 

 10000 56 52 

  
The results in Table 1 show that NMCQN-BFGS is the best among the methods compared. For Problems 

IV and V, we only present numerical results for MCQN-BFGS by Yamashita (2008) and NMCQN-BFGS. From 

Tables 1 and 2, we can see that the NMCQN-BFGS method performs much better than MCQN-BFGS and BFGS 

and L-BFGS. 


MINGHOU CHENG, YU-HONG DAI and RUI DIAO 

42 

For a further comparison between NMCQN-BFGS with MCQN-BFGS, we tested the new method with 

different choices of initial points. The initial points for the five test problems are ,inix  2 ,inix  4 ,inix  7 inix  and 

10 .inix  The dimension of the test problems are fixed to be = 1000.n  We list the number of iterations of the two 

methods for different test problems in Table 3. The results show that NMCQN-BFGS is better than MCQN-

BFGS. Therefore the new method, NMCQN-BFGS, is a promising alternative of MCQN-BFGS especially for 

large-scale problems. 

5. Conclusions and discussion 

In this paper, we have proposed an alternative to the sparse quasi-Newton update method (MCQN) by 

Yamashita (2008). The quasi-Newton matrix in the new MCQN method (denoted by NMCQN) satisfies exactly 

the secant condition, but does not possess the same sparsity structure as the function Hessian in general. We have 

established the local and superlinear convergence of NMCQN with the DFP updating formula, namely, 

NMCQN-DFP. The numerical experiments showed that NMCQN-BFGS is promising especially for large-scale 

problems. 

 
Table 3. Results with different initial points 

 
Problem 
Initial MCQN NMCQN 

Problem 
Initial MCQN NMCQN 

Point -BFGS -BFGS Point -BFGS -BFGS 

  
inix  192 195   inix  54 121 

 2 inix  210 202   2 inix  213 191 

I 4 inix  213 199 IV 4 inix  210 249 

 7 inix  220 215   7 inix  228 370 

 10 inix  213 210   10 inix  294 252 

  
inix  3207 690   inix  54 61 

 2 inix  4850 3753   2 inix  213 54 

II 4 inix  5056 3786 V 4 inix  210 90 

 7 inix  2157 813   7 inix  228 130 

 10 inix  4961 853   10 inix  294 62 

  
inix  54 23         

 2 inix  213 24         

III 4 inix  210 24         

 7 inix  228 24         

 10 inix  294 25         

 
It still remains under investigations how to provide a good combination of the MCQN update and the new 

NMCQN method by adaptively choosing one of them based on the property of the problems. When the sparsity 

dominates the performance, the MCQN update may be preferable; otherwise we could implement the NMCQN 

update. In addition, the new update method can obviously be extended to the whole Broyden's convex family. 

 
A NEW SPARSE QUASI-NEWTON UPDATE METHOD 

43 

 
6. Acknowledgements  

This work was partly supported by the Chinese NSF grants (no. 10831106), the CAS grant (no. kjcx-yw-

s7-03) and the China National Funds for Distinguished Young Scientists (no. 11125107) 

7. References 

BYRD, R. and NOCEDAL, J. 1989. A tool for the analysis of quasi-Newton methods with application to 

unconstrained optimization. SIAM Journal on Numerical Analysis, 26: 727-739. 

DAI, Y.H. and YAMASHITA, N. 2007. Analysis of sparse quasi-Newton updates with positive definte matrix 

completion. Research report, Academy of Mathematics and Systems Science, Chinese Academy of 

Sciences, Beijing. 

DENNIS, J.E. and SCHNABEL, R.B. 1983. Numerical Methods for Unconstrained Optimization and Nonlinear 

Equations, Prentice-Hall Inc., New Jersey. 

FLETCHER, R. 1995. An optimal positive definite update for sparse Hessian matrices. SIAM Journal on 

Optimization, 5: 192-218. 

FUKUDA, M., KOJIMA, M., MUROTA, K. and NAKATA, K. 2000. Exploiting sparsity in semidefinite 

programming via matrix completion I: General frameworks. SIAM Journal on Optimization, 11: 647-674. 

GRIEWANK, A. and TOINT, Ph.L. 1982a. Partitioned variable metric updates for large structure optimization 

problems. Numerische Mathematik, 39: 119-137. 

GRIEWANK, A. and TOINT, Ph.L. 1982b. Local convergence analysis of partitioned quasi-Newton updates. 

Numerische Mathematik, 39: 429-448. 

GRIEWANK, A. and TOINT, Ph.L. 1984. Numerical experiments with partially separable optimization 

problems. Numerical Analysis: Proceedings Dundee 1983 (Lecture Notes in Mathematics 1066) (D.F. 

GRIFFITHS, ed.), Springer Verlag, (Berlin). Pp. 203-220. 

GOULD, N.I.M. and TOINT, Ph.L. 2003. CUTEr, a constrained and unconstrained testing environment: 

revisited. ACM Trans. Math. Softw., 29: 373-394. 

LIU, D.C. and NOCEDAL, J. 1989. On the limited memory BFGS method for large scale optimization. 

Mathematical Programming, 45: 503-528. 

MORÉ, J.J., GARBOW, B.S. and HILLSTROM, K.E. 1981. Testing Unconstrained Optimization Software. 

ACM Trans. Math. Softw., 7: 17-41. 

NOCEDAL, J. 1980. Updating quasi-Newton matrices with limited storage. Mathematics of Computation, 35: 

773-782. 

NOCEDAL, J. and WRIGHT, S.J. 1999. Numerical Optimization, Springer, New York. 

SORENSEN, D.C. 1982. Collinear scaling and sequential estimation in sparse optimization algorithm. 

Mathematical Programming Study, 18: 135-159. 

TOINT, Ph.L. 1977. On sparse and symmetric matrix updating subject to a linear equation. Mathematics of 

Computation, 31: 954-961. 

YAMASHITA, N. 2008. Sparse quasi-Newton updates with positive definite matrix completion. Mathematical 

Programming, 115: 1-30. 

 
Received   24 August 2011                

Accepted   29 November 2011