A New Sparse Quasi-Newton Update SQU Journal for Science, 17 (1) (2012) 30-43 © 2012 Sultan Qaboos University 30 A New Sparse Quasi-Newton Update Method Minghou Cheng*, Yu-Hong Dai** and Rui Diao* Institute of Computational Mathematics and Scientific/Engineering Computing, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China, *Email: chengmh@lsec.cc.ac.cn; diaorui@lsec.cc.ac.cn. **Email: dyh@lsec.cc.ac.cn. ABSTRACT: Based on the idea of maximum determinant positive definite matrix completion, Yamashita proposed a sparse quasi-Newton update, called MCQN, for unconstrained optimization problems with sparse Hessian structures. Such an MCQN update keeps the sparsity structure of the Hessian while relaxing the secant condition. In this paper, we propose an alternative to the MCQN update, in which the quasi-Newton matrix satisfies the secant condition, but does not have the same sparsity structure as the Hessian in general. Our numerical results demonstrate the usefulness of the new MCQN update with the BFGS formula for a collection of test problems. A local and superlinear convergence analysis is also provided for the new MCQN update with the DFP formula. KEYWORDS: Large-scale, Matrix completion, Quasi-Newton methods, Secant condition, Sparsity, Unconstrained optimization. المتناثرة نيوتنمتماثلة تحسين جديد لطريقة يوهونج داي و روي دياومنغو تشينج و نيوتن متماثلةلى فكرة إكمال المحدد األقصى لمصفوفة موجبة محددة، اقترح ياماشيتا تحسيناً لطريقة إباالستناد ص: خمل مسائل األمثليات غير المقيدة مع أصفار متناثرة في مصفوفة هس. يحافظ هذا وذلك لحل MCQNتناثرة وسماها الم MCQN التناثر في مصفوفة هس مع تخفيف شرط القاطع. نقترح في هذا البحث بديال عن تحسينالتحسين على هيكلية هيكلية التناثر لمصفوفة هس بشكل عام. تبين نفس لك تمحققة لشرط القاطع، ولكنها التم نيوتن متماثلةبحيث تكون مصفوفة . كذلك تم تجريبيةمجموعة من المسائل ال لحل BFGSقانونالجديد مع MCQN استخدام تحسينفائدة نتائجنا العددية .DFP قانونالجديد مع MCQN قانون الخطي الفائق لتحسينالموضعي وتحليل التقارب 1. Introduction onsider the unconstrained optimization problem ( ) ,min n x R f x  (1) where : n f R R is continuously differentiable and its gradient ( )f x is available. If the dimension n is not C A NEW SPARSE QUASI-NEWTON UPDATE METHOD 31 large, the quasi-Newton method is one choice for solving problem (1) because of its superlinear convergence and the unnecessity to calculate the function Hessian. Assuming that kx is the current iterate and kH is the approximation to the inverse Hessian, the quasi-Newton method generates the next iterate by 1 = ( ) ,k k k k kx x H f x   (2) where > 0k is a stepsize obtained via some line search, and updates the approximation kH to 1kH  to meet the secant condition 1 = ,k k kH y s (3) where 1=k k ks x x  and 1= ( ) ( ).k k ky f x f x  If the dimension of the problem (1) is large, the direct use of the quasi-Newton method is not possible due to the storage of an n n matrix. In order to overcome this difficulty, several methods have been proposed. The limited-memory BFGS (L-BFGS) method (Liu and Nocedal, 1989; Nocedal, 1980) is only to store a few curvature pairs ( , )i is y in the construction of the Hessian approximation. Since there is no need to know any information about the Hessian, the L-BFGS method is friendly to users and has been widely used in practice. For many large-scale problems, the function f can be written in the form =1 ( ) = ( ) , ne i i f x f x where each of the en element functions if depends only on a few variables. In this case, the partitioned quasi- Newton method, developed by Griewank and Toint (see Griewank and Toint, 1982a, 1982b; Griewank and Toint, 1984; and the references therein), performs very well in practice, and is now regarded as one of the important practical optimization algorithms. Their basic idea is to update a Hessian approximation i kB for each element function if and then to assemble these matrices to obtain an approximation kB to the whole Hessian of f. Further, they determine the search direction by solving the linear system =1 ( ) = ( ) . n ie k k ki B d f x Their method was implemented with the trust region strategy since the matrix kB is not positive definite in general. There are also many large-scale problems where the function Hessian 2 ( )f x is sparse and the sparsity structure is available. Suppose that for all , n x R 2 ,[ ( )] = 0, ( , ) ,i jf x i j F  (4) where F is some subset of I I and = {1, 2, , }.I n In this case, it is possible to establish faster optimization methods by exploiting the sparsity structure of the Hessian. Toint (1977) and Fletcher (1995) studied such updates and required 1kH  to meet the sparsity requirement, namely, 1 1 , 1 ,( ) = ( ) = 0k i j k i jH B    when ( , ) ,i j F and the secant equation (3) simultaneously. As a result, their methods involve the solution of a linear system or a convex program at each iteration. If some component of ks is zero, the obtained approximate Hessian may be ill-posed (see Sorensen's example (Sorensen, 1982)). Inspired by the successful use of positive definite matrix completion in (Fukuda et al., 2000) for semidefinite programming, Yamashita (2008) proposed a novel type of quasi-Newton update for problem (1) satisfying the sparse Hessian structure. Let : n n R R   be a strictly convex function defined by ( ) =tr( ) ln det( )A A A  (5) (This function is introduced in (Byrd and Nocedal, 1989) as a powerful tool for the convergence analysis of MINGHOU CHENG, YU-HONG DAI and RUI DIAO 32 quasi-Newton methods). Yamashita determines the new approximation matrix 1kH  from kH by two steps: (i) update kH to QN H by certain ordinary quasi-Newton formula; (ii) obtain 1kH  by solving the following subproblem with H, 1/ 2 1/ 2 , , 1 , min ( ) s.t. = , ( , ) ( ) = 0 , ( , ) . k k QN i j i j i j H HH H H i j F H i j F H S         (6) Here S  denotes the set of symmetric positive semidefinite matrices. Notice that since step (ii) uses , , QN i j H ( , ) ,i j F we only have to update | |F elements of kH in step (i), where | |F means the cardinality of F. As in (Yamashita, 2008), we call the above update MCQN (Matrix Completion Quasi-Newton). The use of DFP and BFGS methods in step (i) are considered in (Yamashita, 2008). Further, Yamashita showed that, if the sparsity pattern of the Hessian is such that there is not any fill-in in its Cholesky factorization, or equivalently, the graph induced by the Hessian is chordal (see (Yamashita, 2008) for details), problem (6) is equivalent to finding a maximum-determinant positive definite matrix completion of , , QN i j H ( , )i j F : , , max det( ) s.t. = , ( , ) . QN i j i j H H H i j F H S    (7) The above problem can be easily solved by analyzing the clique tree of the graph induced by the Hessian (see (Yamashita, 2008) for details). In addition, it is shown in (Yamashita, 2008) that the update does not suffer from the drawback in Sorensen's example (Sorensen, 1982). Therefore by relaxing the secant equation, the MCQN update is easy to implement and is well-posed. The numerical experiments in (Yamashita, 2008) show that, the MCQN update with BFGS obviously performs better than the MCQN update with DFP. As seen from the above procedure, the MCQN update by Yamashita keeps the sparsity structure of the Hessian, but does not satisfy the quasi-Newton condition. Nevertheless, local and superlinear convergence results are only established for an MCQN update with DFP. Dai and Yamashita (2007) extended the results to the MCQN update with Broyden's family. In this paper, we propose an alternative of the MCQN update, in which the quasi-Newton matrix satisfies the secant condition, but does not have the same sparsity structure as the Hessian in general (see the next section). A local and superlinear convergence analysis is also provided for the new MCQN update with DFP (see Section 3). Our numerical results for a collection of test problems demonstrate that the new MCQN update with BFGS clearly outperforms the previous MCQN update with BFGS (see Section 4). Conclusions and discussions are presented in the last section. 2. The new MCQN method Looking back to the MCQN update by Yamashita (2008), the whole sequence of quasi-Newton matrices, which were used for the calculations of search directions, keep the same sparsity structure as the function Hessian. The intermediate matrix QN H satisfies the secant condition, but does not necessarily have the same sparsity structure as the function Hessian. As an alternative of such an MCQN update, we may think of the possible use of those intermediate matrices in the calculations of search directions. In this paper, we explore this A NEW SPARSE QUASI-NEWTON UPDATE METHOD 33 possibility and our numerical results in Section 4 demonstrate the usefulness of such an idea. To describe the new MCQN update, we assume that the current quasi-Newton matrix is ,kH which is symmetric and positive definite. Since kH does not have the sparsity structure of the function Hessian in general, we consider the optimal solution of the following subproblem as an intermediate matrix , S kH 1/ 2 1/ 2 , , 1 , min ( ) s.t. = ( ) , ( , ) ( ) = 0 , ( , ) , k k i j k i j i j H HH H H i j F H i j F H S         (8) where, again S  denotes the set of symmetric positive semidefinite matrices and F is some subset of I I such that (4) holds. If the graph induced by the Hessian is chordal (otherwise, we extend the set F such that the induced graph has such property), we know that S kH possesses the sparse structure and has the following form of sparse clique factorization 1 2 1 2 1= , T T T l l lH P P P QP P P P (9) where , = 1, 2, ,iP i l and Q are some sparse matrices (see (Yamashita, 2008)). Having obtained the intermediate matrix , S kH we can use for example the BFGS formula to generate a new quasi-Newton matrix 1 = 1 , S T T S T S T S k k k k k k k k k k k k k T T T k k k k k k H y s s y H y H y s s H H s y s y s y            (10) which satisfies the secant condition 1 =k k kH y s and will be used for the calculation of search directions. In other words, we alter the two steps in the original MCQN method and determine the new approximation matrix 1kH  in the following way: (i) obtain S kH by solving problem (7); (ii) update S kH to 1kH  by certain quasi-Newton formula. It is not difficult to see that the amount of computation of such a strategy is almost the same as that required by the original MCQN. However, since the new approximation matrix 1kH  satisfies exactly the secant condition, we think that the new quasi-Newton matrix 1kH  contains more information about the function Hessian. The numerical results confirm our idea. A description of the new sparse quasi-Newton method is given as follows. Algorithm 2.1 (NMCQN) Step 1. Obtain an extension of F (still denoted by F) such that ( , )G V F is chordal. Choose 0 n x R and a positive definite matrix 0H with 1 0( ) = 0, ( , ) .ijH i j F    Set := 0.k Step 2. If kx satisfies the termination criterion, then stop. Step 3. 1 = ( ).k k k kx x H f x   Step 4. Obtain the sparse clique-factorization formula (9) of . S kH Step 5. Obtain 1,kH  ( , )i j F by some ordinary quasi-Newton update. Step 6. Set := 1k k  and go to Step 2. MINGHOU CHENG, YU-HONG DAI and RUI DIAO 34 We call the New MCQN method NMCQN. If, in Step 5, the quasi-Newton matrix 1kH  is obtained from S kH by (10), we denote 1 = BFGS( , , ) S k k k kH H s y and call the corresponding algorithm NMCQN-BFGS. If the DFP update formula is used, namely, 1 = , S T S T S k k k k k k k k T S T k k k k k H y y H s s H H y H y s y    (11) we denote 1 = DFP( , , ) S k k k kH H s y and call the corresponding algorithm NMCQN-DFP. 3. Convergence analysis In this section, we show the local and superlinear convergence of NMCQN-DFP, namely, Algorithm 2.1 with 1 = DFP( , , ). S k k k kH H s y The results are established in a manner similar to (Yamashita, 2008). We give the following assumptions on the objective function, where  means the two-norm. Assumption 3.1 Let *x be a solution of (1) and let *= { : } n x R x x b   with a positive constant b. (i) The objective function f is twice continuously differentiable on . (ii) There exist positive constants m and M such that   12 22 ( ) , T n m z z f x z M z z R       for all .x  If the second-order sufficient optimality condition holds at the solution *x and b is sufficiently small, Assumption 3.1(ii) holds. From Assumption 3.1(i), 2 ( )f x is Lipschitz continuous on . Then, from Lemmas 4.1.12 and 4.1.15 in (Dennis and Schnabel, 1983), there exist 1L and 2L such that for all 1, ,k kx x   22 * 1( )k k ky f x s L s  (12) and 2 * 2( ) ,k k k ky f x s L s  (13) where k is defined by  1 * *= max , .k k kx x x x    (14) Moreover, there exists a positive constant 3L such that for all 1 2,z z  , 1 2 3 1 2( ) ( ) .f z f z L z z    (15) Therefore, we have 1 3 1= ( ) ( ) , .k k k k k ky f x f x L s x x C      (16) From Eq. (8.1.2) of (Nocedal and Wright, 1999) we have = ,k k ky G s (17) where kG is the average Hessian defined by 1 2 0 = ( ) .k k kG f x ts dt  For convenience, the following notations are used in the analysis. 1 2 2 * * * *= ( ), = ( ) ,G f x H f x       1/2 1/2 * *= , = ,k k k ks H s y H y  A NEW SPARSE QUASI-NEWTON UPDATE METHOD 35 1/2 1/2 1/2 1/2 * * * *= , = , S S k k k kH H H H H H H H     2 cos = , = , T S T S k k k k k k k kS k k k k y H y y H y q y H y y  2 = , = , T k k k k kT T k k k k s y s M m y s y y where k is the angle between ky and .k kH y Firstly, we introduce the following two lemmas similarly to (Yamashita, 2008). Lemma 3.1 Suppose that Assumption 3.1 holds. Then there exist (0, )c   and (0, )b  such that ln 2 , 1 . k k k k m c M c       whenever < .k  Lemma 3.2 Suppose that Assumption 3.1 holds. Consider Algorithm 2.1 with 1 = DFP( , , ). S k k k kH H s y Then we have ( ) ( ) , S k kH H  where ( ) = tr( ) ln det( ).A A A  By using the above lemmas, we show the following key inequality. Lemma 3.3 Suppose that Assumption 3.1 holds. Consider Algorithm 2.1 with 1 = DFP( , , ). S k k k kH H s y Let  be the constant specified in (13). If ,k  then we have 1 2 2 2 1 ( ) ln 1 ln ( ) 3 . cos cos cos k k k k k k k k q q H H c                  Proof. By Assumption 3.1(ii) and (17), we have = T T k k k k k T T k k k k y s y H y m y y y y  and = , T T k k k k k T T k k k k y y z H z M y s z z  where 1/2 =k k kz H y  and 1 = .k kH G  Since 1kH  is obtained from S kH by the DFP formula, we have 1/ 2 1/ 2 1 * 1 * 1/ 2 1/ 2 1/ 2 1/ 2 * * * * 1/ 2 1/ 2 1/ 2 1/ 2 * * * * 1/ 2 1/ 2 1/ 2 1/ 2 1/ 2 1/ 2 * * * * * * = = = k k S T S T S k k k k k k k T S T k k k k k S T S T S k k k k k k k T S T k k k k k H H H H H y y H s s H H H H H y H y y s H H y y H H H s s H H y H H H H H y y H H s                         MINGHOU CHENG, YU-HONG DAI and RUI DIAO 36 = . S T S T S k k k k k k k T S T k k k k k H y y H s s H y H y y s   (18) Since 2 tr( ) = T zz z for , n z R it follows from (18) that 2 2 1tr( ) = tr( ) . S S k k k k k T S T k k k k k H y s H H y H y y s    (19) In a manner similar to the use of Eqs. (8.45) in (Nocedal and Wright, 1999), we can show that 1det( ) = det( ) . T S k k k k T S k k k y s H H y H y  (20) Moreover, simple calculations show that 2 2 = = T T kk k k k k T S T S kk k k k k kk yy s y s m qy H y y H yy (21) and 2 2 2 2 2 2 = = . ( ) cos S ST S k k k k kk k k k T S T S kk k k k k kk H y H y yy H y q y H y y H yy  (22) It follows from (19), (20), (21) and (22) that 1 1 1 2 2 2 2 ( ) = tr( ) ln det( ) = tr( ) ln det( ) ln ln cos = ( ) ln 1 1 ln ln cos . cos cos k k k S Sk k k k k k k S k k k k k k k k H H H q H M H m q q q H M m                      (23) Lemmas 3.2 and 3.3 give ln 1 1 2 1 = 3k k k k kM m c c c        and ( ) ( ) . S k kH H  Then it follows from (23) that 2 1 2 2 ( ) ( ) 3 ln cos 1 ln . cos cos k k k k k k k k q q H H c             (24) Therefore 1 2 2 2 1 ( ) ln 1 ln ( ) 3 , cos cos cos k k k k k k k k q q H H c                  (25) which completes the proof. □ Using inequality (25), the local and superlinear convergence will be shown. Theorem 3.1 Suppose that Assumption 3.1 holds. Consider Algorithm 2.1 with 1 = DFP( , , ). S k k k kH H s y Then, for any (0,1),  there exist x and H such that 0 * xx x   and 0 * HH H   imply 1 * *k kx x x x    , for all k. A NEW SPARSE QUASI-NEWTON UPDATE METHOD 37 Proof. Suppose that (0,1).  The following inequalities will be shown to hold for all k, 1 * * ,k kx x x x    (26) * 3 , 2 kH H L    (27) where 3L is the Lipschitz constant of .f First, note that by choosing x to be sufficiently small, we have 1 < , , 2 x xL M     (28) where 1,L M and  are the constants specified in (12), Assumption 3.1(ii) and (13), respectively. Moreover, according to Lemma 4 in (Yamashita, 2008), by choosing x and H to be sufficiently small, there exists (0, )  such that 0( ) < , 2 H n    (29) * 3 ( ) < 2 H n H H L       (30) and 3 , 1 2 xc     (31) where H is a symmetric positive definite matrix, 1/2 1/2 * *= ,H H HH   and c is the constant specified in Lemma 3.1. We prove the theorem by induction. When = 0,k the inequality (27) holds from (29) and (30). Moreover, we have 1 * 0 0 0 * 0 * * 0 0 * 0 * * 0 * * 0 * 3 0 * 0 * 2 1 * 0 * 0 * 1 0 * 0 * = ( ) ( ) ( )( ( ) ( )) ( ( ) ( ) ( )) 2 ( ) 2 . x x x x H f x x x x H f x H H f x f x H f x f x G x x L H H x x L H x x x x L M x x x x                                  where the second inequality follows from (15), the third inequality follows from (12) and (30), the forth inequality follows from Assumption 3.1(ii) and 0 * ,xx x   and the final inequality follows from (28) and < .  Next, assuming (26) and (27) for = 0, 1, , 1,k l  we shall show they are also true for = .k l In fact, similarly to the case in which = 0,k we have MINGHOU CHENG, YU-HONG DAI and RUI DIAO 38 1 * * * * * * * * * * 3 * * 2 1 * * * 1 * * ( ) ( ) ( )( ( ) ( )) ( ( ) ( ) ( )) 2 ( ) 2 l l l l l l l l l l l l l l l l x x x H f x x x x H f x H H f x f x H f x f x G x x L H H x x L H x x x x L M x x x x                                 and thus 1 * 1 * * ( ) 2 l l x l l x x L M x x x x            where the first inequality follows from the induction assumption. This shows the truth of (26) for = .k l In the following, we show (27) by (25) in Lemma 3.3. Summing up the inequality (25) with = 0, 1, , 1,k l  we have 1 1 02 2 2 =0 =0 1 ( ) ln 1 ln ( ) 3 . cos cos cos l l k k l k k kk k k q q H H c                       Since 0 < cos 1k  and the term in the square brackets is nonpositive, we have 1 0 =0 ( ) ( ) 3 . l l k k H n H n c         (32) From (14) and the induction assumption, we have    11 * *= max , max , =k k kk k k x x xx x x x          for = 0, 1, , 1,k l  and thus 1 1 =0 =0 1 = . 1 1 ll l k x k x x k k                It then follows from (32), (29) and (31) that 0 3 ( ) ( ) < . 1 x l c H n H n           From (30), we have 2 1 * 3 ( ) , 2 lH f x L    which is (27) for = .k l Thus by induction, (26) and (27) hold for all k. □ In order to show the superlinear convergence, we build the following relationship similarly to (Yamashita, 2008), * *( ) ( ) = 0 = 0 .lim lim k k k k k kk k H H y B G s y s     (33) Lemma 3.4 Suppose that Assumption 3.1 holds. Consider Algorithm 2.1 with 1 = DFP( , , ). S k k k kH H s y Suppose also that 0 * xx x   and 0 * HH H   with the constants x and H specified in Theorem 3.1 for sufficiently small (0,1).  Then relationship (33) holds. A NEW SPARSE QUASI-NEWTON UPDATE METHOD 39 Proof. Let , = 1, 2, , k i i n be the eigenvalues of .kH Since the inequality (27) holds for sufficiently small , we can assume that there exists > 0min such that k i min  for all i and k. Moreover, since * *= ( ) ,k k k ky G s G G s  we have * * * * * * * * * * * ( ) = ( ) ( )( ) ( ) ( ) . k k k k k k k k k k k k k min k k k k k H H y H H G s H H G G s H G B s H H G G s B G s H H G G s                It then follows from (16) that * * * * 3 3 ( ) ( ) ( ) . k k min k k k k k k H H y B G s H H G G y L s L       Since 1 2 0 = ( )k k kG f x ts dt  and *kx x by Theorem 3.1, the second term of the right-hand side of the inequality converges to 0 as .k  Then, relationship (33) holds. □ In the following, we give the main result of this section. Theorem 3.2 Suppose that Assumption 3.1 holds. Suppose also that 0 * xx x   and 0 * HH H   hold for sufficiently small , > 0.x H  Then the sequence { }kx generated by Algorithm 2.1 with 1 = DFP( , , ) S k k k kH H s y converges to *x superlinearly. Proof. From Lemma 3.4, it suffices to show *( ) = 0 .lim k k k k H H y y  (34) Notice by Theorem 3.1 that for any (0,1),  relation (27) holds provided , > 0x H  are sufficiently small. Then we must have that * = 0 ,lim k k H H   (35) otherwise there is a contradiction. Therefore (34) is true since * *( ) .k k k kH H y H H y   By Lemma 3.4, we conclude the superlinear convergence of the algorithm. □ Comparing the proof of Theorem 4 in (Yamashita, 2008), the above proof is simpler and is directly obtained by one statement in Theorem 3.1. 4. Numerical results We tested the new MCQN algorithm with the DFP and BFGS formulas, namely, NMCQN-DFP and NMCQN-BFGS, by using Matlab R2008a on Core(TM)2 PC with Windows-XP. We used the three problems in (Yamashita, 2008) and two more problems in (Dai and Yamashita, 2007) in our numerical experiments. Various dimensions, namely, = 10,n 100, 1000 and 10000, were chosen for the problems. All tested problems were tridiagonal. Therefore, the chordal extensions of their sparsity pattern can easily be obtained. The details of the problems are given as follows, where inix means the used initial point. MINGHOU CHENG, YU-HONG DAI and RUI DIAO 40 Problem I (TRIDIA (Gould and Toint, 2003)) 2 2 1 1 =2 ( ) = ( 1) ( 2 ) , = (1, 1, , 1) . n i i i T ini f x x x x x    Problem II (Extended Rosenbrock function (Fletcher,1995)) 1 2 2 2 1 =1 ( ) = 100( ) (1 ) , = ( 1.2, 1, 1.2, 1, , 1.2, 1) . n i i i i T ini f x x x x x         Problem III (Boundary value problem (Fletcher,1995)) 2 =1 1 1 ( ) = (cos 2 ), 2 ( 1) 1 2 = ( , , , ) , 1 1 1 n T T n i i i T ini f x x Tx e x x x n n x n n n        where = (1, 1, , 1) , T ne 2 1 1 2 1 = .1 2 1 1 2 T                  Problem IV (Extended Powell singular function (Moré et al., 1981)) /4 4 4 2 2 4 3 4 4 2 4 1 4 1 4 4 3 4 2 =1 ( ) = 10( ) ( 2 ) 5( ) ( 10 ) , = (3, 1, 0, 1, , 3, 1, 0, 1) . n i i i i i i i i i T ini f x x x x x x x x x x                   Problem V (Broyden tridiagonal function (Moré et al., 1981)) 2 2 2 2 1 1 2 1 1 2 2 1 1 =2 ( ) = (3 2 2 1) (3 2 1) (3 2 2 1) , = ( 1, , 1) . n n n n i i i i i T ini f x x x x x x x x x x x x                   The following termination criterion is employed: 5( ) 10 or > 50000 . kf x k n   Instead of Step 3 in Algorithm 2.1, we set 1 = ( )k k k k kx x H f x   with a step size k to improve the numerical performance. k is chosen to satisfy Wolfe's rule : A NEW SPARSE QUASI-NEWTON UPDATE METHOD 41 4 ( ) ( ) 10 ( ) , | ( ) | 0.9 ( ) . T k k k k k k k T T k k k k k k f x d f x f x d f x d d f x d              For convenience, we also list the numerical results reported for the MCQN update, as well as for the BFGS and L-BFGS methods, in (Yamashita, 2008). See Tables 1 and 2. For the L-BFGS method, = 5,m which is the number of stored curvature pairs of L-BFGS, and the scaling factor 2 1 1 1/ T k k ks y y   was employed. L-BFGS just stores the pairs of vectors ( , ), = 1, , ,k ks y k m while our method just stores a few entries of kH to exploit the sparsity structure of the Hessian. Therefore, both of them have fast implementation in practice. Although NMCQN-DFP has a nice theoretical convergence property as shown in the previous section, its numerical performance is not very good. Hence we only list the results of NMCQN-BFGS. The tables list the total number of iterations. The symbol "F" denotes that the number of iterations exceeds 50000 and "-" means that the BFGS method could not be implemented for = 10000.n Table 1. Results of problems I, II and III Problem n BFGS L-BFGS MCQN-DFP MCQN-BFGS NMCQN-BFGS 10 15 31 20 29 47 I 100 108 126 167 72 75 1000 662 415 1498 192 195 10000 - 1191 11626 528 475 10 78 68 76 60 514 II 100 487 527 665 341 1002 1000 4525 4979 6574 3207 690 10000 - 49580 F 31737 403 10 15 24 15 15 13 III 100 107 299 49 50 18 1000 571 3117 86 54 23 10000 - F 2600 402 25 Table 2. Results for problems IV and V Problem n MCQN-BFGS NMCQN-BFGS 10 40 37 IV 100 211 324 1000 589 121 10000 998 97 10 30 52 V 100 56 54 1000 49 61 10000 56 52 The results in Table 1 show that NMCQN-BFGS is the best among the methods compared. For Problems IV and V, we only present numerical results for MCQN-BFGS by Yamashita (2008) and NMCQN-BFGS. From Tables 1 and 2, we can see that the NMCQN-BFGS method performs much better than MCQN-BFGS and BFGS and L-BFGS. MINGHOU CHENG, YU-HONG DAI and RUI DIAO 42 For a further comparison between NMCQN-BFGS with MCQN-BFGS, we tested the new method with different choices of initial points. The initial points for the five test problems are ,inix 2 ,inix 4 ,inix 7 inix and 10 .inix The dimension of the test problems are fixed to be = 1000.n We list the number of iterations of the two methods for different test problems in Table 3. The results show that NMCQN-BFGS is better than MCQN- BFGS. Therefore the new method, NMCQN-BFGS, is a promising alternative of MCQN-BFGS especially for large-scale problems. 5. Conclusions and discussion In this paper, we have proposed an alternative to the sparse quasi-Newton update method (MCQN) by Yamashita (2008). The quasi-Newton matrix in the new MCQN method (denoted by NMCQN) satisfies exactly the secant condition, but does not possess the same sparsity structure as the function Hessian in general. We have established the local and superlinear convergence of NMCQN with the DFP updating formula, namely, NMCQN-DFP. The numerical experiments showed that NMCQN-BFGS is promising especially for large-scale problems. Table 3. Results with different initial points Problem Initial MCQN NMCQN Problem Initial MCQN NMCQN Point -BFGS -BFGS Point -BFGS -BFGS inix 192 195 inix 54 121 2 inix 210 202 2 inix 213 191 I 4 inix 213 199 IV 4 inix 210 249 7 inix 220 215 7 inix 228 370 10 inix 213 210 10 inix 294 252 inix 3207 690 inix 54 61 2 inix 4850 3753 2 inix 213 54 II 4 inix 5056 3786 V 4 inix 210 90 7 inix 2157 813 7 inix 228 130 10 inix 4961 853 10 inix 294 62 inix 54 23 2 inix 213 24 III 4 inix 210 24 7 inix 228 24 10 inix 294 25 It still remains under investigations how to provide a good combination of the MCQN update and the new NMCQN method by adaptively choosing one of them based on the property of the problems. When the sparsity dominates the performance, the MCQN update may be preferable; otherwise we could implement the NMCQN update. In addition, the new update method can obviously be extended to the whole Broyden's convex family. A NEW SPARSE QUASI-NEWTON UPDATE METHOD 43 6. Acknowledgements This work was partly supported by the Chinese NSF grants (no. 10831106), the CAS grant (no. kjcx-yw- s7-03) and the China National Funds for Distinguished Young Scientists (no. 11125107) 7. References BYRD, R. and NOCEDAL, J. 1989. A tool for the analysis of quasi-Newton methods with application to unconstrained optimization. SIAM Journal on Numerical Analysis, 26: 727-739. DAI, Y.H. and YAMASHITA, N. 2007. Analysis of sparse quasi-Newton updates with positive definte matrix completion. Research report, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing. DENNIS, J.E. and SCHNABEL, R.B. 1983. Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Prentice-Hall Inc., New Jersey. FLETCHER, R. 1995. An optimal positive definite update for sparse Hessian matrices. SIAM Journal on Optimization, 5: 192-218. FUKUDA, M., KOJIMA, M., MUROTA, K. and NAKATA, K. 2000. Exploiting sparsity in semidefinite programming via matrix completion I: General frameworks. SIAM Journal on Optimization, 11: 647-674. GRIEWANK, A. and TOINT, Ph.L. 1982a. Partitioned variable metric updates for large structure optimization problems. Numerische Mathematik, 39: 119-137. GRIEWANK, A. and TOINT, Ph.L. 1982b. Local convergence analysis of partitioned quasi-Newton updates. Numerische Mathematik, 39: 429-448. GRIEWANK, A. and TOINT, Ph.L. 1984. Numerical experiments with partially separable optimization problems. Numerical Analysis: Proceedings Dundee 1983 (Lecture Notes in Mathematics 1066) (D.F. GRIFFITHS, ed.), Springer Verlag, (Berlin). Pp. 203-220. GOULD, N.I.M. and TOINT, Ph.L. 2003. CUTEr, a constrained and unconstrained testing environment: revisited. ACM Trans. Math. Softw., 29: 373-394. LIU, D.C. and NOCEDAL, J. 1989. On the limited memory BFGS method for large scale optimization. Mathematical Programming, 45: 503-528. MORÉ, J.J., GARBOW, B.S. and HILLSTROM, K.E. 1981. Testing Unconstrained Optimization Software. ACM Trans. Math. Softw., 7: 17-41. NOCEDAL, J. 1980. Updating quasi-Newton matrices with limited storage. Mathematics of Computation, 35: 773-782. NOCEDAL, J. and WRIGHT, S.J. 1999. Numerical Optimization, Springer, New York. SORENSEN, D.C. 1982. Collinear scaling and sequential estimation in sparse optimization algorithm. Mathematical Programming Study, 18: 135-159. TOINT, Ph.L. 1977. On sparse and symmetric matrix updating subject to a linear equation. Mathematics of Computation, 31: 954-961. YAMASHITA, N. 2008. Sparse quasi-Newton updates with positive definite matrix completion. Mathematical Programming, 115: 1-30. Received 24 August 2011 Accepted 29 November 2011