SQU Journal For Science, 12 (2) (2007) 199-209
© 2007 Sultan Qaboos University

An Overview of Some Practical Quasi-Newton Methods for Unconstrained Optimization

M. Al-Baali* and H. Khalfan**

*Department of Mathematics and Statistics, College of Science, Sultan Qaboos University, P.O. Box 36, PC 123, Al-Khodh, Muscat, Sultanate of Oman, Email: albaali@squ.edu.om. **Department of Mathematical Sciences, UAE University, Al-Ain 17551, United Arab Emirates, Email: humaidf@uaeu.ac.ae.

Arabic abstract (translated): Quasi-Newton algorithms are among the most efficient practical methods for unconstrained optimization problems. This article presents a comprehensive study of some recent algorithms of this type, with emphasis on the formulas used to approximate the second-derivative (Hessian) matrix and on ways of improving them.

ABSTRACT: Quasi-Newton methods are among the most practical and efficient iterative methods for solving unconstrained minimization problems. In this paper we give an overview of some of these methods, focusing primarily on Hessian approximation updates and on modifications aimed at improving their performance.

KEYWORDS: Unconstrained optimization, line search technique, quasi-Newton methods.

1. Introduction

In this paper we give an overview of some line search quasi-Newton methods for solving the unconstrained minimization problem

    \min_{x \in R^n} f(x),                                                    (1)

where $f$ is a twice continuously differentiable function. Emphasis will be on the Hessian approximation formulas used in these methods, and on techniques developed to improve their performance.

The basic iteration of a quasi-Newton method consists of the following. Starting with an initial approximation $x_1$ to a solution $x^*$ of (1), and an initial positive definite Hessian approximation $B_1$, a new approximation is calculated at iteration $k$ by

    x_{k+1} = x_k + \alpha_k d_k,                                             (2)

where $\alpha_k$ is a steplength and $d_k$ is a search direction obtained by

    d_k = -B_k^{-1} \nabla f(x_k),                                            (3)

where $B_k$ is a positive definite $n \times n$ Hessian approximation matrix chosen from the general Broyden class of updates discussed in the next section. The steplength $\alpha_k$ is calculated such that the Wolfe conditions

    f(x_k + \alpha_k d_k) - f(x_k) \le \rho_1 \alpha_k \nabla f(x_k)^T d_k, \qquad \rho_1 \in (0, 1/2),        (4)

and

    \nabla f(x_k + \alpha_k d_k)^T d_k \ge \rho_2 \nabla f(x_k)^T d_k, \qquad \rho_2 \in (\rho_1, 1),          (5)

are satisfied. The first condition ensures a sufficient reduction in $f$, and the second one guarantees that the steplength is not too small relative to the initial rate of decrease in $f$. In practice, the strong Wolfe conditions, given by (4) and

    |\nabla f(x_k + \alpha_k d_k)^T d_k| \le -\rho_2 \nabla f(x_k)^T d_k,

are usually used. If, besides the standard assumptions on $f$, the gradient $\nabla f$ is Lipschitz continuous, the steplength $\alpha_k$ satisfies the Wolfe conditions, and the matrices $B_k$ are positive definite with a bounded condition number, then iteration (2) is globally convergent (see e.g. Fletcher, 1987, Dennis and Schnabel, 1996, and Nocedal and Wright, 1999).

In the next section we discuss some well-known members of the Broyden class of Hessian approximation updates. In Section 3 we outline some approaches for improving the performance of standard updates by modifying the gradient difference $\nabla f(x_{k+1}) - \nabla f(x_k)$. In Section 4 we discuss self-scaling quasi-Newton methods aimed at handling ill-conditioned problems. Some quasi-Newton methods for large-scale optimization are discussed in Section 5.
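To make the basic iteration concrete, the following is a minimal Python sketch of steps (2) and (3) combined with a Wolfe line search. It uses scipy.optimize.line_search, whose parameters c1 and c2 play the roles of $\rho_1$ and $\rho_2$ in (4) and (5); the names quasi_newton and update, and the fallback steplength, are illustrative choices rather than part of any particular method reviewed here.

```python
import numpy as np
from scipy.optimize import line_search

def quasi_newton(f, grad, x1, update, B1=None, tol=1e-6, max_iter=200):
    """Generic line search quasi-Newton driver; `update` maps (B, s, y) to a new B."""
    x = np.asarray(x1, dtype=float)
    B = np.eye(x.size) if B1 is None else B1.copy()   # initial positive definite B_1
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:
            break
        d = -np.linalg.solve(B, g)                    # d_k = -B_k^{-1} grad f(x_k), eq. (3)
        alpha = line_search(f, grad, x, d, gfk=g, c1=1e-4, c2=0.9)[0]
        if alpha is None:                             # line search failure; take a small step
            alpha = 1e-3
        s = alpha * d                                 # s_k = x_{k+1} - x_k
        x_new = x + s                                 # eq. (2)
        y = grad(x_new) - g                           # gradient difference y_k
        if y @ s > 1e-12:                             # update only if the curvature condition holds
            B = update(B, s, y)
        x = x_new
    return x
```

With update set to the BFGS formula of the next section, this sketch reduces to the standard BFGS method with a Wolfe line search.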
2. Quasi-Newton updates

Setting $\alpha_k = 1$ and $B_k = \nabla^2 f(x_k)$ in (2) and (3), respectively, we obtain the well-known Newton method, which converges quadratically to a solution $x^*$ if $x_1$ is sufficiently close to $x^*$. For many practical problems, however, analytic second derivatives are unavailable, and approximating them by finite differences requires either an additional $n$ gradient evaluations or $O(n^2)$ function evaluations per iteration.

Quasi-Newton updates for Hessian approximation avoid these disadvantages by approximating $\nabla^2 f(x_{k+1})$ by a symmetric matrix $B_{k+1} = B_k + E_k$, where $E_k$ is an $n \times n$ matrix, such that the secant equation

    B_{k+1} s_k = y_k                                                         (6)

is satisfied, where $s_k = x_{k+1} - x_k$ and $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$.

The solution to equation (6), however, is not unique, and a variety of updating formulas is obtained by imposing further conditions on $B_{k+1}$. If $E_k$ is assumed to be a symmetric rank-one matrix, for instance, then (6) yields the unique symmetric rank-one (SR1) update

    B_{k+1} = B_k + \frac{(y_k - B_k s_k)(y_k - B_k s_k)^T}{(y_k - B_k s_k)^T s_k}.       (7)

Requiring $E_k$ to be a symmetric rank-two matrix, equation (6) yields a variety of possible updates such as the well-known BFGS (Broyden-Fletcher-Goldfarb-Shanno) and DFP (Davidon-Fletcher-Powell) formulas. A more general formula satisfying the secant equation with $E_k$ symmetric and of rank two at most, known as the Broyden class, is given by

    B_{k+1}(\theta) = B_k - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k} + \frac{y_k y_k^T}{y_k^T s_k} + \theta\, w_k w_k^T,

where

    w_k = (s_k^T B_k s_k)^{1/2} \left[ \frac{y_k}{y_k^T s_k} - \frac{B_k s_k}{s_k^T B_k s_k} \right]

and $\theta$ is a parameter. This class includes as special cases the BFGS update, when $\theta = 0$; the DFP update, when $\theta = 1$; and the SR1 update, when $\theta = s_k^T y_k / (s_k^T y_k - s_k^T B_k s_k)$. (Another family of updates was proposed by Huang (1970), but it is not discussed here since it was shown to be equivalent to the self-scaling Broyden family discussed in Section 4.)

If $B_k$ is positive definite, and the curvature condition $s_k^T y_k > 0$ holds (which is guaranteed by the second Wolfe condition (5)), then $B_{k+1}$ is positive definite for any $\theta > \theta_k$, where

    \theta_k = \frac{1}{1 - b_k h_k}, \qquad b_k = \frac{s_k^T B_k s_k}{s_k^T y_k}, \qquad h_k = \frac{y_k^T H_k y_k}{s_k^T y_k},

and $H_k = B_k^{-1}$. Since $\theta_k < 0$ (by Cauchy's inequality), it clearly follows that any update with $\theta \ge 0$ (such as the BFGS and DFP updates) preserves positive definiteness if the curvature condition holds. The SR1 update preserves positive definiteness only if either $b_k < 1$ or $h_k < 1$, which may not hold even for quadratic functions (see e.g. Fletcher, 1987).

Powell (1976) showed that if $f$ is a convex function and the Wolfe conditions hold, then for any starting point $x_1$ and any positive definite initial matrix $B_1$, the BFGS method converges globally; and if furthermore the true Hessian $\nabla^2 f(x^*)$ is positive definite, then the rate of convergence is q-superlinear. This result was extended by Byrd, Nocedal and Yuan (1987) to the interval $0 \le \theta < 1$ (updates belonging to $[0, 1]$ are known as the convex class). Although Dixon (1972) showed that for a general nonlinear function $f$, all well-defined members of the Broyden family with $\theta \ne \theta_k$ generate the same sequence of iterates when used with exact line searches, numerical experience showed that only some updates, which we discuss next, work well in practice when inexact line searches are used, and that performance deteriorates as $\theta$ increases above 0 (see e.g. Byrd, Liu and Nocedal, 1992).
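As an illustration, the Broyden class can be written in a few lines of Python using numpy only; the helper names broyden_update and sr1_theta are ours. Setting theta to 0, to 1, or to $s_k^T y_k/(s_k^T y_k - s_k^T B_k s_k)$ reproduces the BFGS, DFP and SR1 formulas, respectively, and the function can be passed as the update rule to the driver sketched in Section 1.

```python
import numpy as np

def broyden_update(B, s, y, theta=0.0):
    """One Broyden-class update of the Hessian approximation B from the pair (s, y)."""
    Bs = B @ s
    sBs = s @ Bs                          # s_k' B_k s_k
    ys = y @ s                            # curvature s_k' y_k (assumed positive)
    w = y / ys - Bs / sBs                 # w_k without its (s'Bs)^(1/2) factor, which is
                                          # accounted for by the sBs multiplier below
    return (B
            - np.outer(Bs, Bs) / sBs
            + np.outer(y, y) / ys
            + theta * sBs * np.outer(w, w))

def sr1_theta(B, s, y):
    """Value of theta for which the Broyden class reduces to the SR1 update (7)."""
    ys, sBs = y @ s, s @ (B @ s)
    return ys / (ys - sBs)                # undefined when s'y = s'Bs
```

For example, broyden_update(B, s, y, 0.0) performs a BFGS update, while broyden_update(B, s, y, sr1_theta(B, s, y)) performs an SR1 update.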
The SR1 method enjoys some desirable features which are not shared by other standard updates. Fiacco and McCormick (1968) showed that for a positive definite quadratic function, if the SR1 update is used with linearly independent steps and all the updates are well defined, then the solution is reached in at most $n+1$ iterations. Furthermore, if $n+1$ iterations are required, then the final Hessian approximation $B_{n+1}$ is the actual Hessian. This quadratic termination property is not generally true for other members of the Broyden family, unless exact line searches are used. For general functions, Conn, Gould and Toint (1991) proved that the sequence of SR1 Hessian approximations converges to the true Hessian at the solution, provided that the steps are uniformly linearly independent, that the SR1 update denominator is always sufficiently different from zero, and that the iterates converge to a finite limit. Hence under these conditions the rate of convergence is q-superlinear. If the assumption of uniform linear independence is dropped, then, as shown in Khalfan, Byrd and Schnabel (1993), the SR1 method converges $(n+1)$-step q-superlinearly provided that, for all $k$, $B_k$ is positive definite and bounded. On the other hand, Ge and Powell (1983) showed that the sequence of matrices generated by the BFGS method converges to a matrix not necessarily equal to the true Hessian.

In order to obtain a well-conditioned update, Davidon (1975) proposed the update

    \theta_k = \begin{cases} \dfrac{h_k - 1}{b_k h_k - 1}, & b_k + h_k \le 2 b_k h_k, \\[6pt] \dfrac{1}{1 - b_k}, & \text{otherwise}. \end{cases}        (8)

The first update in this formula is obtained by minimizing over $\theta$ the condition number $\kappa(B_{k+1}(\theta) H_k)$, and the second one is the SR1 formula. Practical experience, however, showed no significant improvement using this update (see e.g. Al-Baali, 1993 and Lukšan and Spedicato, 2000).

Since updates from the preconvex class work well in practice (see e.g. Zhang and Tewarson, 1988 and Byrd, Liu and Nocedal, 1992), Al-Baali (1993) reported improved numerical performance using the switching BFGS/SR1 update

    \theta_k = \begin{cases} 0, & h_k \ge 1, \\[4pt] \dfrac{1}{1 - b_k}, & \text{otherwise}, \end{cases}

which preserves positive definiteness. Lukšan and Spedicato (2000) also reported competitive performance using this update. Other updates of the switching type are given in Al-Baali, Fuduli and Musmanno (2004).

An interval of globally convergent updates was tested by Al-Baali and Khalfan (2005), defined by $\theta_k^- \le \theta \le \theta_k^+$, where

    \theta_k^\pm = 1 \pm \frac{1}{\nu_k}, \qquad \nu_k = \left( 1 - \frac{1}{b_k h_k} \right)^{1/2}.        (9)

Their numerical experience indicated that several updates from this interval worked well in practice, especially the modified SR1 update

    \theta_k = \begin{cases} \max\left( \theta_k^-, \dfrac{1}{1 - b_k} \right), & b_k > 1, \\[6pt] \min\left( \theta_k^+, \dfrac{1}{1 - b_k} \right), & b_k < 1, \end{cases}

and the update defined by

    \theta_k = \max(\theta_k^-, 1 - b_k),

which does not include the SR1 formula. The fast rate of convergence observed in the numerical experience with these updates suggests further study of their convergence properties.
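A sketch of the switching BFGS/SR1 choice of $\theta_k$ is given below in the same style as the earlier examples, reusing broyden_update; the quantities b and h follow the definitions of $b_k$ and $h_k$ in this section, and the name switching_theta is ours. Note that when $h_k < 1$, the curvature condition together with Cauchy's inequality gives $b_k > 1$, so the SR1 branch is well defined and the resulting update remains positive definite.

```python
import numpy as np

def switching_theta(B, s, y):
    """Switching BFGS/SR1 parameter: 0 if h_k >= 1, and the SR1 value otherwise."""
    ys = y @ s
    b = (s @ (B @ s)) / ys                    # b_k = s'Bs / s'y
    h = (y @ np.linalg.solve(B, y)) / ys      # h_k = y'H y / s'y with H = B^{-1}
    if h >= 1.0:
        return 0.0                            # BFGS
    return 1.0 / (1.0 - b)                    # SR1 (b_k > 1 here, so no division by zero)

# Example use with the earlier sketches:
#   B = broyden_update(B, s, y, switching_theta(B, s, y))
```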
3. Modifying the gradient-difference vector

Many approaches have been proposed to improve quasi-Newton Hessian approximation updates. In this section we outline some recently suggested updates obtained by modifying the vector $y_k$.

Zhang, Deng and Chen (1999) suggested replacing $y_k$ in the BFGS formula by the vector

    y_k^* = y_k + \frac{6(f_k - f_{k+1}) + 3(g_k + g_{k+1})^T s_k}{\|s_k\|^2}\, s_k,        (10)

where $f_k = f(x_k)$ and $g_k = \nabla f(x_k)$, which has the property that

    s_k^T \nabla^2 f(x_{k+1}) s_k - s_k^T y_k^* = O(\|s_k\|^4)

and reduces to $y_k$ if $f$ is quadratic. The resulting modified BFGS method retains global and q-superlinear convergence for convex functions, and performs slightly better than the standard BFGS method on some test problems.

Li and Fukushima (2001) proposed using

    y_k^* = y_k + \left( \max\left( 0, \frac{-y_k^T s_k}{\|s_k\|^2} \right) + \|g_k\| \right) s_k

instead of $y_k$ in the BFGS update, to ensure that $y_k^{*T} s_k > 0$. They showed that for a general function, the resulting BFGS method with a backtracking line search converges globally and superlinearly under standard assumptions on the objective function. Their numerical experience also indicated some improvement in performance. For other modifications of this type, see Yuan (1991), Xu and Zhang (2001), Wei et al. (2004), and Zhang (2005).

Modifying $y_k$ was originally suggested by Powell (1978), who proposed a BFGS method for constrained optimization with

    y_k^* = y_k + (1 - \varphi)(B_k s_k - y_k),

where $0 < \varphi \le 1$. Al-Baali (2003b) used this modification for unconstrained problems and reported improved performance for the BFGS and many other methods.

Other approaches for improving the Hessian approximation employ multiple quasi-Newton updates at each iteration, using information from the current and previous steps. Some improvement was observed for certain types of test problems (Khalfan, 1989, Ford and Tharmlikit, 2003, and Al-Baali et al., 2004).

4. Self-scaling quasi-Newton methods

The standard quasi-Newton methods that we have considered so far may have difficulties in solving some ill-conditioned problems. Powell (1986) showed that the BFGS and DFP methods behave badly (the latter far worse) when applied to a simple ill-conditioned quadratic function. Moreover, Dai (2002) and Mascarenhas (2004) gave examples of nonconvex functions for which the BFGS method fails to converge to the solution of the problem. In this section we consider self-scaling Hessian approximation updates for handling ill-conditioned problems.

Oren and Luenberger (1974) proposed the two-parameter class of self-scaling Hessian approximation updates

    B_{k+1}(\theta, \tau) = \tau \left[ B_k - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k} + \theta\, w_k w_k^T \right] + \frac{y_k y_k^T}{y_k^T s_k},        (11)

where $\theta$ and $\tau$ are chosen such that the new update is optimally conditioned in some sense. For the parameters $\theta$ and $\tau$ the authors suggested the intervals

    \theta \le 1, \qquad \frac{1}{b_k} \le \tau \le h_k,        (12)

to ensure that

    \kappa(B_{k+1}(\theta, \tau) G^{-1}) \le \kappa(B_k G^{-1})

when $f$ is a quadratic function with a positive definite Hessian $G$. Using intervals (12), Oren and Spedicato (1976) proposed the class

    \theta = \frac{h_k - \tau}{\tau (b_k h_k - 1)},        (13)

which can be obtained by minimizing the condition number $\kappa(B_{k+1}(\theta, \tau) H_k / \tau)$ over $\theta$ (see Spedicato (1978)). Notice that substituting $\tau = 1$ in class (13) gives the Davidon optimally conditioned update, given in the first case of (8). Moreover, as shown in Al-Baali (1995), class (13) can also be obtained by minimizing the same condition number over $\tau$.

In order to include members from the nonconvex class, such as the SR1 update, and maintain positive definiteness, Al-Baali (1995) and Hu and Storey (1994) proposed the intervals

    \theta_k^- \le \theta \le \theta_k^+, \qquad \tau_k^- \le \tau \le \tau_k^+,        (14)

where $\tau_k^\pm = h_k (1 \pm \nu_k)$, and where $\theta_k^\pm$ and $\nu_k$ are defined by (9).
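The following sketch illustrates the two-parameter class (11) in the same numpy style as the earlier examples; the name self_scaling_update is ours, and setting tau to 1 recovers the unscaled Broyden class of Section 2.

```python
import numpy as np

def self_scaling_update(B, s, y, theta=0.0, tau=1.0):
    """Self-scaling Broyden-class update (11): the old information in B is scaled by tau."""
    Bs = B @ s
    sBs = s @ Bs                                   # s_k' B_k s_k
    ys = y @ s                                     # s_k' y_k
    w = np.sqrt(sBs) * (y / ys - Bs / sBs)         # w_k as defined in Section 2
    scaled_part = B - np.outer(Bs, Bs) / sBs + theta * np.outer(w, w)
    return tau * scaled_part + np.outer(y, y) / ys
```

For instance, theta = 0 with tau set to $1/b_k = y_k^T s_k / (s_k^T B_k s_k)$ gives the self-scaling BFGS method of Oren and Luenberger (1974) discussed below.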
The end points of intervals (14) define two self-scaling SR1 updates, which are the same updates obtained by Osborne and Sun (1988) and Wolkowicz (1996). The choice $\theta = 1/(1 - b_k \tau_k^+) \le 0$ with $\tau = \tau_k^+$ is preferable, since methods from the preconvex class work well in practice.

Most of these self-scaling updates, however, do not generally improve the performance of the unscaled methods, as reported by several authors. In fact, Shanno and Phua (1978) showed that the BFGS method worked better when scaling is used only for the initial Hessian approximation $B_1$. Moreover, Nocedal and Yuan (1993) showed that, compared with the (unscaled) BFGS method, the best self-scaling BFGS algorithm of Oren and Luenberger (1974), defined by $\theta = 0$ and $\tau = 1/b_k$, performs badly when used for solving a simple quadratic problem of two variables. They also showed that for the same problem superlinear convergence is not obtained unless certain steplength values are used, which cannot be guaranteed in practice.

Al-Baali (1998), however, extended the global and superlinear convergence theory of Byrd, Liu and Nocedal (1992) for convex functions to the self-scaling class (11) on intervals of the form

    c_1 \le \tau_k \le 1, \qquad \theta_k < 1,        (15)

where $c_1, c_2, c_3 \in (0, 1)$ are constants that keep $\theta_k$ and $\tau_k$ away from their limiting values, and reported that self-scaling methods from these intervals outperformed the corresponding unscaled methods. Further numerical testing reported by Al-Baali and Khalfan (2005) showed that these methods succeeded in solving more problems than the unscaled methods, especially when $\theta \ge 1$.

Using scaling only when $\tau < 1$ also improves the performance of the other self-scaling methods discussed above. For example, replacing the scaling in the best self-scaling BFGS method of Oren and Luenberger (1974) mentioned above by

    \tau = \min\left( \frac{1}{b_k}, 1 \right),

the resulting method outperformed even the standard BFGS method. Performance improvement was also reported, especially for the DFP method, by Contreras and Tapia (1993), and Yabe, Martinez and Tapia (2004), when a similar self-scaling approach was used for a certain type of problems.

5. Large-scale quasi-Newton methods

The storage and computational requirement of the methods considered so far is $O(n^2)$ for a problem of $n$ variables. Several modifications of quasi-Newton updates have been proposed to improve their efficiency when this cost is excessive. In this section we consider two approaches that are widely used in practice: the limited-memory method, which is suitable for problems in which the true Hessian is not sparse, and the partitioned method, which is the method of choice for problems with partially separable Hessians.

5.1 Limited-memory methods

In these methods only a few vectors of length $n$ are used to approximate the inverse of the Hessian implicitly, instead of storing a full $n \times n$ matrix. For example, if we write the BFGS update for the inverse Hessian approximation in the form

    H_{k+1} = V_k^T H_k V_k + \rho_k s_k s_k^T, \qquad V_k = I - \rho_k y_k s_k^T, \qquad \rho_k = \frac{1}{y_k^T s_k},

then, using the $m$ vector pairs $\{s_i, y_i\}$, $i = k, k-1, \ldots, k-m+1$, the matrix $H_{k+1}$ can be expressed as

    H_{k+1} = (V_k^T \cdots V_{k-m+1}^T) H_{k-m+1} (V_{k-m+1} \cdots V_k)
            + \rho_{k-m+1} (V_k^T \cdots V_{k-m+2}^T) s_{k-m+1} s_{k-m+1}^T (V_{k-m+2} \cdots V_k)
            + \cdots
            + \rho_k s_k s_k^T.
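In practice the product of this implicitly defined matrix with a gradient vector is formed by the standard two-loop recursion (see, e.g., Nocedal and Wright, 1999) rather than by building $H_{k+1}$ explicitly. The Python sketch below assumes that the stored pairs are kept in oldest-to-newest order and that the initial matrix is a positive diagonal supplied as a vector D; the function name lbfgs_direction is an illustrative choice.

```python
import numpy as np

def lbfgs_direction(g, s_list, y_list, D):
    """Return the search direction -H g, with H defined implicitly by the stored pairs."""
    q = g.copy()
    rhos = [1.0 / (y @ s) for s, y in zip(s_list, y_list)]
    alphas = []
    # First loop: newest pair to oldest.
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        a = rho * (s @ q)
        q -= a * y
        alphas.append(a)
    r = D * q                                  # apply the diagonal initial matrix
    # Second loop: oldest pair to newest.
    for (s, y, rho), a in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        beta = rho * (y @ r)
        r += (a - beta) * s
    return -r                                  # search direction d_{k+1}
```

The cost is about 4mn multiplications per iteration, in agreement with the operation count quoted below for the L-BFGS method.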
In the limited-memory method of Nocedal (1980), the matrix $H_{k-m+1}$ is replaced at each iteration by a positive definite diagonal matrix $D_k$, and the search direction $d_{k+1} = -H_{k+1} \nabla f(x_{k+1})$ is obtained in $4mn$ multiplications involving the $m$ pairs $\{s_i, y_i\}$ and $\nabla f(x_{k+1})$, as shown for example in Nocedal and Wright (1999). This method is referred to as the L-BFGS method, and it converges globally (though in general only linearly) for convex functions, as shown in Liu and Nocedal (1989). A compact representation of limited-memory methods for general quasi-Newton updates is given by Byrd, Nocedal and Schnabel (1994).

Computational experience with $D_k = (1/h_k) I$, a multiple of the identity matrix, indicates that for large-scale problems in which $\nabla^2 f(x_{k+1})$ is not sparse, the L-BFGS method outperforms other methods such as the nonlinear conjugate gradient method, and that its performance improves substantially, in terms of computing time, as $n$ gets large. The method, however, may suffer from slow convergence, which costs more function evaluations, especially on very ill-conditioned problems (Nocedal and Wright, 1999). For large-scale least-squares problems, Al-Baali (2003a) considered a modified L-BFGS method using a vector $y_k^*$, similar to the one discussed in Section 3, instead of $y_k$, and reported substantial improvement in numerical performance.

Another limited-memory approach is based on the fact that the standard BFGS method accumulates approximate curvature in a sequence of expanding subspaces, which allows the Hessian to be approximated by a smaller reduced matrix that increases in dimension at each iteration. This feature is used to define limited-memory reduced-Hessian methods that require half the storage of conventional limited-memory methods. For more on these methods see Gill and Leonard (2003) and Lukšan and Vlček (2006).

5.2 Partitioned methods

Every function $f$ with a sparse Hessian can be written in the form

    f(x) = f_1(x) + f_2(x) + \cdots + f_m(x),

where each element function $f_i$ depends only on a few, say $n_i$, of the variables, and $n_1 + n_2 + \cdots + n_m = n$ (see Griewank and Toint (1982)). In a partitioned method the Hessian of each element function $f_i$ is approximated by a quasi-Newton Hessian approximation $B_k^i$. These matrices are then assembled to define a sparse Hessian approximation $B_k$ to $\nabla^2 f(x_k)$, and the search direction is found by solving

    B_k d_k = -\nabla f(x_k).                                                 (16)

If all the element functions $f_i(x)$ are convex, then the partitioned BFGS method converges globally, even if the system (16) is solved inexactly, as shown in Griewank (1991). The partitioned BFGS method performs well in practice provided that partial separability is fully exploited. Practical implementations, however, mostly use the SR1 update, since some element functions may have indefinite Hessians (see Griewank and Toint (1984), and Conn, Gould and Toint (1992, 1996)).

Finally, we mention that there are many other modifications of quasi-Newton methods that we did not cover in this paper. For other reviews of quasi-Newton methods see, for instance, Nocedal (1992) and Lukšan and Spedicato (2000).

6. References

AL-BAALI, M. 1993. Variational Quasi-Newton Methods for Unconstrained Optimization. JOTA, 77: 127-143.
AL-BAALI, M. 1995. On Measure Functions for the Self-Scaling Updating Formulae for Quasi-Newton Methods. Optimization, 32: 59-69.
AL-BAALI, M. 1998. Global and Superlinear Convergence of a Restricted Class of Self-Scaling Methods with Inexact Line Searches for Convex Functions. COAP, 9: 191-203.
AL-BAALI, M. 2003a. Quasi-Newton Algorithms for Large-Scale Nonlinear Least-Squares. In High Performance Algorithms and Software for Nonlinear Optimization (G. Di Pillo and A. Murli, Eds.), Kluwer Academic, pp. 1-21.
AL-BAALI, M. 2003b. 18th International Symposium on Mathematical Programming, Copenhagen, August 18-22.
AL-BAALI, M., FUDULI, A. and MUSMANNO, R. 2004. On the Performance of Switching BFGS/SR1 Algorithms for Unconstrained Optimization. OMS, 19: 153-164.
AL-BAALI, M. and KHALFAN, H. 2005. A Wide Interval for Efficient Self-Scaling Quasi-Newton Algorithms. OMS, 20: 679-691.
BYRD, R.H., LIU, D.C. and NOCEDAL, J. 1992. On the Behavior of Broyden's Class of Quasi-Newton Methods. SIAM J. Optim., 2: 533-557.
BYRD, R.H., NOCEDAL, J. and SCHNABEL, R.B. 1994. Representations of Quasi-Newton Matrices and their Use in Limited-Memory Methods. Math. Programming, Series A, 63: 129-156.
BYRD, R.H., NOCEDAL, J. and YUAN, Y. 1987. Global Convergence of a Class of Quasi-Newton Methods on Convex Problems. SIAM J. Numer. Anal., 24: 1171-1190.
CONN, A.R., GOULD, N.I.M. and TOINT, PH.L. 1991. Convergence of Quasi-Newton Matrices Generated by the Symmetric Rank One Update. Math. Programming, 50: 177-195.
CONN, A.R., GOULD, N.I.M. and TOINT, PH.L. 1992. LANCELOT: A Fortran Package for Large-Scale Nonlinear Optimization (Release A), Springer Series in Computational Mathematics, Springer, New York.
CONN, A.R., GOULD, N.I.M. and TOINT, PH.L. 1994. Large-Scale Nonlinear Constrained Optimization: A Current Survey. In Algorithms for Continuous Optimization: The State of the Art (E. Spedicato, Ed.), Vol. 434 of NATO ASI Series C: Mathematical and Physical Sciences, Kluwer, Dordrecht, Netherlands, pp. 287-332.
CONN, A.R., GOULD, N.I.M. and TOINT, PH.L. 1996. Numerical Experiments with the LANCELOT Package (Release A) for Large-Scale Nonlinear Optimization. Math. Programming, Ser. A, 73: 73-110.
CONTRERAS, M. and TAPIA, R.A. 1993. Sizing the BFGS and DFP Updates: A Numerical Study. JOTA, 78: 93-108.
DAI, Y. 2002. Convergence Properties of the BFGS Algorithm. SIAM J. Optim., 13: 693-701.
DAVIDON, W.C. 1975. Optimally Conditioned Optimization Algorithms without Line Searches. Math. Programming, 9: 1-30.
DENNIS, J.E. and SCHNABEL, R.B. 1996. Numerical Methods for Unconstrained Optimization and Nonlinear Equations, SIAM Publications.
DIXON, L.C.W. 1972. Quasi-Newton Methods Generate Identical Points. Math. Programming, 2: 383-387.
FIACCO, A.V. and MCCORMICK, G.P. 1968. Nonlinear Programming: Sequential Unconstrained Minimization Techniques, Wiley, New York. Reprinted by SIAM Publications, 1990.
FLETCHER, R. 1987. Practical Methods of Optimization (2nd edition), Wiley, Chichester, England. Reprinted in 2000.
FLETCHER, R. 1994. An Overview of Unconstrained Optimization. In Algorithms for Continuous Optimization: The State of the Art (E. Spedicato, Ed.), Kluwer Academic, Boston, pp. 109-143.
FORD, J.A. and THARMLIKIT, S. 2003. New Implicit Updates in Multi-Step Quasi-Newton Methods for Unconstrained Optimization. J. Compt. Appl. Math., 152: 133-146.
GE, R.-P. and POWELL, M.J.D. 1983. The Convergence of Variable Metric Matrices in Unconstrained Optimization. Math. Programming, 27: 123-143.
GILL, P.E. and LEONARD, M.W. 2003. Limited-Memory Reduced-Hessian Methods for Large-Scale Unconstrained Optimization. SIAM J. Optim., 14: 380-401.
GRIEWANK, A. 1991. The Global Convergence of Partitioned BFGS on Problems with Convex Decomposition and Lipschitzian Gradient. Math. Programming, 50: 141-175.
GRIEWANK, A. and TOINT, PH.L. 1982. On the Unconstrained Optimization of Partially Separable Objective Functions. In Nonlinear Optimization 1981 (M.J.D. Powell, Ed.), Academic Press, London, pp. 301-302.
GRIEWANK, A. and TOINT, PH.L. 1984. Numerical Experiments with Partially Separable Optimization Problems. In Numerical Analysis Proceedings Dundee 1983, Lecture Notes in Mathematics 1066 (D.F. Griffiths, Ed.), Springer Verlag, Berlin, pp. 203-220.
KHALFAN, H.F. 1989. Topics in Quasi-Newton Methods for Unconstrained Optimization. Ph.D. thesis, University of Colorado at Boulder, USA.
KHALFAN, H.F., BYRD, R.H. and SCHNABEL, R.B. 1993. A Theoretical and Experimental Study of the Symmetric Rank One Update. SIAM J. Optim., 3: 1-24.
HU, Y.F. and STOREY, C. 1994. Family of Optimally Conditioned Quasi-Newton Updates for Unconstrained Optimization. JOTA, 83: 421-431.
HUANG, H.Y. 1970. Unified Approach to Quadratically Convergent Algorithms for Function Minimization. JOTA, 5: 405-423.
LI, D. and FUKUSHIMA, M. 2001. A Modified BFGS Method and its Global Convergence in Nonconvex Minimization. J. Compt. Appl. Math., 129: 15-35.
LIU, D.C. and NOCEDAL, J. 1989. On the Limited Memory BFGS Method for Large Scale Optimization. Math. Programming (Series B), 45: 503-528.
LUKŠAN, L. and SPEDICATO, E. 2000. Variable Metric Methods for Unconstrained Optimization and Nonlinear Least Squares. J. Compt. Appl. Math., 124: 61-95.
LUKŠAN, L. and VLČEK, J. 2006. Variable Metric Method for Minimization of Partially Separable Nonsmooth Functions. Pacific Journal of Optimization, 2: 59-70.
MASCARENHAS, W.F. 2004. The BFGS Method with Exact Line Searches Fails for Nonconvex Objective Functions. Math. Programming, 99: 49-61.
NOCEDAL, J. 1980. Updating Quasi-Newton Matrices with Limited Storage. Math. Comput., 35: 773-782.
NOCEDAL, J. 1992. Theory of Algorithms for Unconstrained Optimization. Acta Numerica, 1: 199-242.
NOCEDAL, J. and WRIGHT, S.J. 1999. Numerical Optimization, Springer, London.
NOCEDAL, J. and YUAN, Y. 1993. Analysis of a Self-Scaling Quasi-Newton Method. Math. Programming, 61: 19-37.
OREN, S.S. and LUENBERGER, D.G. 1974. Self-Scaling Variable Metric (SSVM) Algorithms, Part I: Criteria and Sufficient Conditions for Scaling a Class of Algorithms. Manage. Sci., 20: 845-862.
OREN, S.S. and SPEDICATO, E. 1976. Optimal Conditioning of Self-Scaling Variable Metric Algorithms. Math. Programming, 10: 70-90.
OSBORNE, M.R. and SUN, L.P. 1988. A New Approach to the Symmetric Rank-One Updating Algorithm. Tech. Report NMO/01, School of Mathematics, Australian National University.
POWELL, M.J.D. 1976. Some Global Convergence Properties of a Variable Metric Algorithm for Minimization without Exact Line Searches. In Nonlinear Programming (R.W. Cottle and C.E. Lemke, Eds.), SIAM-AMS Proceedings, Vol. IX, SIAM Publications.
POWELL, M.J.D. 1978. Algorithms for Nonlinear Constraints that Use Lagrange Functions. Math. Programming, 14: 224-248.
POWELL, M.J.D. 1986. How Bad are the BFGS and DFP Methods when the Objective Function is Quadratic? Math. Programming, 34: 34-47.
SHANNO, D.F. and PHUA, K.H. 1978. Matrix Conditioning and Nonlinear Optimization. Math. Programming, 14: 149-160.
SPEDICATO, E. 1978. On a Conjecture of Dixon and Other Topics in Variable Metric Methods. Math. Programming, 15: 123-129.
WEI, Z., YU, G., YUAN, G. and LIAN, Z. 2004. The Superlinear Convergence of a Modified BFGS-Type Method for Unconstrained Optimization. COAP, 29: 315-332.
WOLKOWICZ, H. 1996. An Efficient Region of Optimal Updates for Least Change Secant Methods. SIAM J. Optim., 5: 172-191.
XU, C. and ZHANG, J. 2001. A Survey of Quasi-Newton Equations and Quasi-Newton Methods for Optimization. Annals of Operations Research, 103: 213-234.
YABE, H., MARTINEZ, H.J. and TAPIA, R.A. 2004. On Sizing and Shifting the BFGS Update within the Sized-Broyden Family of Secant Updates. SIAM J. Optim., 15: 139-160.
YUAN, Y. 1991. A Modified BFGS Algorithm for Unconstrained Optimization. IMA J. Numerical Analysis, 11: 325-332.
ZHANG, J.Z., DENG, N.Y. and CHEN, L.H. 1999. Quasi-Newton Equation and Related Methods for Unconstrained Optimization. JOTA, 102: 147-167.
ZHANG, L. 2005. A Globally Convergent BFGS Method for Nonconvex Minimization without Line Searches. OMS, 20: 737-747.
ZHANG, Y. and TEWARSON, R.P. 1988. Quasi-Newton Algorithms with Updates from the Preconvex Part of Broyden's Family. IMA J. Numer. Anal., 8: 487-509.
YANG, Y. and XU, C. 2007. A Compact Limited Memory Method for Large Scale Unconstrained Optimization. European Journal of Operational Research, 180: 48-56.

Received 13 February 2007
Accepted 9 October 2007