SQU Journal for Science, 17 (1) (2012) 1-11
© 2012 Sultan Qaboos University

Numerical Experience with Damped Quasi-Newton Optimization Methods when the Objective Function is Quadratic

Mehiddin Al-Baali* and Anton Purnama**

Department of Mathematics and Statistics, Sultan Qaboos University, Muscat, Oman, *Email: albaali@squ.edu.om and **Email: antonp@squ.edu.om.

ABSTRACT: A class of damped quasi-Newton methods for nonlinear optimization has recently been proposed by extending the damped-technique of Powell for the BFGS method to the Broyden family of quasi-Newton methods. It has been shown that this damped class has the global and superlinear convergence property that a restricted class of 'undamped' methods has for convex objective functions in unconstrained optimization. To test this result, we applied several members of the Broyden family and their corresponding damped methods to a simple quadratic function and observed several useful features of the damped-technique. These observations and other numerical experiences are described in this paper. The important role of the damped-technique is shown not only for enforcing the above convergence property, but also for improving the performance of efficient, inefficient and divergent undamped methods substantially (significantly in the latter case). Thus, some appropriate ways of employing the damped-technique are suggested.

KEYWORDS: Nonlinear optimization, Quasi-Newton methods, Damped-technique, Line search framework.
1. Introduction

Consider iterative quasi-Newton methods with line search for solving the unconstrained optimization problem
$$\min_{x \in R^n} f(x),$$
where $f: R^n \to R$ is a smooth function. On each iteration $k$ of these methods, an estimate of a solution $x_k$ and a positive definite Hessian approximation $B_k$ are used to obtain a new estimate $x_{k+1}$. Then $B_k$ is updated to a new Hessian approximation $B_{k+1}$ by using a member of the Broyden family in terms of the differences in points $s_k = x_{k+1} - x_k$ and gradients $y_k = g_{k+1} - g_k$, where $g_k$ denotes the gradient $\nabla f(x_k)$. This family consists of efficient, inefficient and divergent methods. The former two types of methods are usually defined sufficiently closely to the well-known BFGS and DFP methods, respectively, while the latter type is defined sufficiently remotely away from these methods. For further details see for instance (Fletcher, 1987; Dennis and Schnabel, 1996; Nocedal and Wright, 1999).

Al-Baali and Grandinetti (2009) show that the performance of the BFGS method can be improved if $y_k$ is modified before updating by the damped-technique
$$\hat{y}_k = \varphi_k y_k + (1 - \varphi_k) B_k s_k, \qquad (1)$$
where $\varphi_k$ is a parameter chosen appropriately and sufficiently large in the interval $(0,1]$.
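In code, the damped modification (1) is simply a convex combination of $y_k$ and $B_k s_k$. A minimal sketch (the function and variable names are ours, not from the paper):

```python
import numpy as np

def damped_y(y, B, s, phi):
    """Damped modification (1): y_hat = phi*y + (1 - phi)*B s.

    phi = 1 leaves y unchanged (the undamped case); smaller phi pulls
    y_hat towards the model curvature B s, so s^T y_hat stays positive
    whenever s^T B s > 0.
    """
    return phi * y + (1.0 - phi) * (B @ s)
```

For example, with $B = 2I$, $s = (1,0)^T$ and $y = (-1,0)^T$, the choice $\varphi = 0.5$ turns the negative curvature estimate $s^T y = -1$ into $s^T \hat{y} = 0.5 > 0$.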
The resulting damped (D)-BFGS method was proposed by Powell (1978) for the Lagrangian function in constrained optimization and has been used many times with only values of $\varphi_k \ge 0.8$; see for example (Fletcher, 1987; Nocedal and Wright, 1999). The aim of this paper is to show that small values of $\varphi_k$ can be used to improve the behaviour of not only the BFGS method, but also all members of the Broyden family of methods for unconstrained optimization. To illustrate this possibility, we applied several members of this family and their corresponding damped methods to a simple quadratic function of two variables, using a certain initial point $x_1$ and Hessian approximation $B_1$. For particular choices of $\varphi_k$, it is shown that many damped methods work substantially better than the BFGS method.

The analysis is organized in the following way. Section 2 describes the Broyden and D-Broyden classes of methods, Section 3 provides some numerical results and Section 4 concludes the paper.

2. Damped quasi-Newton methods

We now briefly describe the Broyden and D-Broyden classes of methods. On each iteration of these methods, the current point $x_k$ and a symmetric and positive-definite Hessian approximation $B_k$ are available, given initially by $x_1$ and $B_1$. Using these available data, a new point
$$x_{k+1} = x_k - \alpha_k B_k^{-1} g_k$$
is calculated. Here, $\alpha_k$ denotes a steplength which is usually chosen such that the Wolfe conditions
$$f_{k+1} \le f_k + \sigma_0 s_k^T g_k \qquad (2)$$
and
$$s_k^T y_k \ge -(1 - \sigma_1) s_k^T g_k, \qquad (3)$$
where $f_k$ denotes $f(x_k)$, $\sigma_0 \in (0, 0.5)$ and $\sigma_1 \in (\sigma_0, 1)$, are satisfied.
Then $B_k$ is updated to a new Hessian approximation
$$B_{k+1} = B_k - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k} + \frac{\hat{y}_k \hat{y}_k^T}{s_k^T \hat{y}_k} + \theta_k \, (s_k^T B_k s_k) \, v_k v_k^T, \qquad (4)$$
where $\theta_k$ is a parameter,
$$v_k = \frac{\hat{y}_k}{s_k^T \hat{y}_k} - \frac{B_k s_k}{s_k^T B_k s_k} \qquad (5)$$
and $\hat{y}_k$ is defined by (1) for a suitable value of $\varphi_k$. This class of damped updates is reduced to the Broyden family if $\hat{y}_k = y_k$ (which corresponds to $\varphi_k = 1$). Thus, if this equality holds for all iterations, we obtain the Broyden family of methods. Otherwise, we obtain the D-Broyden class of methods. In particular, the choices $\theta_k = 0$ and $\theta_k = 1$ yield the D-BFGS and D-DFP methods which correspond (for these choices and $\varphi_k = 1$) to the well-known BFGS and DFP methods, respectively.

Although the restricted class of quasi-Newton methods (defined for $\varphi_k = 1$ and $\bar{\theta}_k < \theta_k < 1$, where $\bar{\theta}_k$ is a certain negative value sufficiently close to zero) converges globally and q-superlinearly for convex functions, the performance of the class becomes worse as $\theta_k$ increases. In general, the BFGS method is robust and the DFP method is inefficient, though the former method usually suffers from large eigenvalues of $B_k$; see in particular (Powell, 1986; Byrd et al., 1992). Therefore several modification techniques have been introduced to the BFGS method, see for example (Yabe et al., 2007; Gratton and Toint, 2010). Here, we test the D-Broyden class of methods which is an extension of the D-BFGS update of Powell (1978) in the augmented Lagrangian and SQP methods for constrained optimization. For details on the latter case, see for instance (Fletcher, 1987; Nocedal and Wright, 1999).
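The damped update (4)-(5) can be sketched as follows. This is an illustrative implementation (names are ours), assuming $s_k^T \hat{y}_k > 0$ and $s_k^T B_k s_k > 0$:

```python
import numpy as np

def broyden_update(B, s, y_hat, theta):
    """Damped Broyden family update (4)-(5) of the Hessian approximation.

    theta = 0 gives the (damped) BFGS update, theta = 1 the (damped) DFP
    update.  Assumes s^T y_hat > 0 and s^T B s > 0, so both denominators
    are positive.
    """
    Bs = B @ s
    sBs = s @ Bs
    sy = s @ y_hat
    v = y_hat / sy - Bs / sBs                      # (5); note v^T s = 0
    return (B - np.outer(Bs, Bs) / sBs
              + np.outer(y_hat, y_hat) / sy
              + theta * sBs * np.outer(v, v))      # (4)
```

Since $v_k^T s_k = 0$, the updated matrix satisfies the secant condition $B_{k+1} s_k = \hat{y}_k$ for every value of $\theta_k$, which is easy to verify numerically.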
This damped method was applied to unconstrained optimization problems for the first time by Al-Baali (2003), who considered
$$\varphi_k = \begin{cases} \dfrac{\sigma_2}{1-\tau_k}, & \tau_k < 1-\sigma_2 \\[1ex] \dfrac{\sigma_3}{\tau_k - 1}, & \tau_k > 1+\sigma_3 \\[1ex] 1, & \text{otherwise}, \end{cases} \qquad (6)$$
where
$$\tau_k = \frac{s_k^T y_k}{s_k^T B_k s_k} \qquad (7)$$
and $\sigma_2, \sigma_3 > 0$. If $\sigma_3 \ge c\,(\tau_k - 1)$, where $c > 0$, then the bound $\varphi_k \ge c$ is guaranteed. This formula with damped-technique (1) ensures that $s_k^T \hat{y}_k$ is sufficiently close to either the curvature $s_k^T y_k$ or the approximate curvature $s_k^T B_k s_k$ if $\varphi_k$ is sufficiently large or small, respectively. Another useful feature of formula (6) is that $s_k^T \hat{y}_k$ remains sufficiently positive even when $s_k^T y_k \le 0$ (which may occur if the Wolfe condition (3) is not enforced on $s_k$). We also observe that formula (6) is reduced to the choice of Powell (1978) if $\sigma_2 = 0.8$ and $\sigma_3 = \infty$.

For certain choices of these parameters, Al-Baali (2004) introduced formula (6) to the limited memory L-BFGS method of Nocedal (1980) and reported encouraging numerical results in certain cases. Other encouraging results for the D-BFGS method with formula (6) on unconstrained optimization have been reported by Al-Baali and Grandinetti (2009). The authors also show that this formula works better than other modified BFGS formulae which modify $y_k$ to the form
$$\hat{y}_k = y_k + \left[ 6(f_k - f_{k+1}) + 3(g_k + g_{k+1})^T s_k \right] u_k, \qquad (8)$$
for certain vectors $u_k$; see for example (Yabe et al., 2007) and the references therein. Because this form is reduced to $y_k$ when the objective function is quadratic, the proposed modified BFGS methods maintain the difficulty associated with the BFGS method in this case. Due to this feature and the above observations, we consider only formula (6) here and make the following investigations.

In practice, we observed that formula (6) with $\sigma_2$ sufficiently close to 0.6 seems to work well in some cases, but generally worsens the performance of the D-BFGS method as $\sigma_2$ decreases.
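The damping rule (6)-(7) can be sketched as a small function of the curvature ratio $\tau_k$; the threshold parameters are passed in explicitly (names are ours):

```python
import numpy as np

def phi_formula6(tau, sigma2, sigma3):
    """Damped parameter choice (6), with tau = s^T y / (s^T B s) as in (7).

    Powell's (1978) choice is recovered for sigma2 = 0.8, sigma3 = inf
    (the upper branch is then never taken).
    """
    if tau < 1.0 - sigma2:
        return sigma2 / (1.0 - tau)
    if tau > 1.0 + sigma3:
        return sigma3 / (tau - 1.0)
    return 1.0
```

With $\sigma_2 = 0.8$ and $\sigma_3 = \infty$, a ratio $\tau_k = 0.1 < 0.2$ gives $\varphi_k = 0.8/0.9 \approx 0.89$, while any $\tau_k \ge 0.2$ leaves the update undamped.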
This observation illustrates why large values of $\sigma_2$ such as 0.8 or 0.9 are usually recommended when the D-BFGS update is employed in methods for constrained optimization; see for example (Fletcher, 1987; Nocedal and Wright, 1999; Powell, 1978, 2009). We will argue below that small values of $\varphi_k$ are also useful in some cases, although large values of $\varphi_k$ yield small changes in $y_k$.

Al-Baali (2011) extends the global convergence property of the restricted Broyden class of methods to that of D-Broyden, assuming that
$$\nu_1 - 1 \le \theta_k \mu_k^2 \le 1 - \nu_2, \qquad (9)$$
where $\nu_1, \nu_2 \in (0,1)$,
$$b_k = \frac{s_k^T B_k s_k}{s_k^T y_k}, \qquad h_k = \frac{y_k^T B_k^{-1} y_k}{s_k^T y_k}, \qquad \bar{\theta}_k = \frac{1}{1 - b_k h_k} \qquad (10)$$
and $\mu_k \in (0,1]$ is given by
$$\mu_k = \frac{\varphi_k}{\varphi_k + (1-\varphi_k)\, b_k}. \qquad (11)$$
We observe that $\bar{\theta}_k < 0$, and that condition (9) holds for the choice $\theta_k = 0$ (the BFGS option) and any values of $\nu_1$ and $\nu_2$. Thus, this condition adds another useful feature to the BFGS method over the other methods. Although prior to this result $\theta_k$ was never chosen outside the interval $(\bar{\theta}_k, 1]$, except for a well defined SR1 method (see for example Fletcher, 1987), condition (9) allows any real number $\theta_k$ for sufficiently small values of $\mu_k$ (or $\varphi_k$).

Recently, for further restrictions on $\varphi_k$, Al-Baali (2011) extends the above superlinear convergence property to the damped methods and shows that
$$\varphi_k (\tau_k - 1) \to 0 \quad \text{and} \quad \varphi_k^2 \,(b_k h_k - 1) \to 0, \qquad (12)$$
noting that $\tau_k = 1/b_k$. Although this result suggests using a small value of $\varphi_k < 1$ when either term $\tau_k - 1$ or $b_k h_k - 1$ is sufficiently remotely away from zero, we use this value only when both terms satisfy the latter condition. This suggestion is due to the fact that the Broyden family of updates is reduced to a single update if $b_k h_k = 1$ (see for example Al-Baali, 1993).
Therefore, since the first limit in (12) yields formula (6) (see Al-Baali, 1993), we consider employing it only when the value of the product $b_k h_k$ is sufficiently larger than one, so that we obtain
$$\varphi_k = \begin{cases} \dfrac{\sigma_2}{1-\tau_k}, & \tau_k < 1-\sigma_2 \text{ and } b_k h_k > 1+\sigma_4 \\[1ex] \dfrac{\sigma_3}{\tau_k - 1}, & \tau_k > 1+\sigma_3 \text{ and } b_k h_k > 1+\sigma_4 \\[1ex] 1, & \text{otherwise}, \end{cases} \qquad (13)$$
where $\sigma_4 \ge 0$. This modified formula ensures that the value of $\tau_k$ or its modification lies in $[1-\sigma_2,\, 1+\sigma_3]$ when $b_k h_k > 1 + \sigma_4$. If the value $\sigma_4 = 0$ is used, then the damped-technique is always employed, except when the Broyden family is reduced to a single update. In particular, this value with $\sigma_2 = 0.8$ and $\sigma_3 = \infty$ reduces formula (13) to that of Powell (1978) for D-BFGS, except when $b_k h_k = 1$ and $\tau_k < 1-\sigma_2$. Indeed, for some choices of these parameters, we observed that formula (13) works better than (6).

We now consider the second limit of (12), which suggests using $\mu_k = \sigma_4 / (b_k h_k - 1)^{1/2}$ if $b_k h_k > 1 + \sigma_4^2$, where $\sigma_4 \ge 0$; this can be substituted into (11) to obtain the corresponding value of $\varphi_k$. However, since $\sigma_4$ is chosen by the user and $\mu_k \le \varphi_k$ if $b_k \ge 1$, we consider
$$\varphi_k = \begin{cases} \dfrac{\sigma_4}{\sqrt{b_k h_k - 1}}, & b_k h_k > 1 + \sigma_4^2 \\[1ex] 1, & \text{otherwise}, \end{cases} \qquad (14)$$
where, as above, $\sigma_4 \ge 0$. If $\sigma_4 = 0$ is used, then as for formula (13), the damped-technique is always employed, except when the Broyden family is reduced to a single update. Since $\mu_k^2 \le \mu_k$, Al-Baali (2011) considers formula (14) with the denominator replaced by $b_k h_k - 1$. Although other choices for $\varphi_k$ exist, formulae (13) and (14) are considered here for illustration.

We note that the above formulae for $\varphi_k$ are independent of the updating parameter $\theta_k$, so that the global convergence condition (9) may not hold for sufficiently large values of $|\theta_k|$. Therefore, in this case, this condition can be enforced by choosing $\varphi_k$ sufficiently small.
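Formula (14) is a one-liner in code. The following sketch (names are ours) assumes $b_k h_k \ge 1$, which holds by the Cauchy-Schwarz inequality whenever $s_k^T y_k > 0$ and $B_k$ is positive definite:

```python
import numpy as np

def phi_formula14(b, h, sigma4):
    """Damped parameter choice (14), driven by the second limit in (12).

    b and h are the scalars defined in (10); b*h >= 1, with equality
    exactly when the Broyden family collapses to a single update, in
    which case no damping is applied.
    """
    if b * h > 1.0 + sigma4**2:
        return sigma4 / np.sqrt(b * h - 1.0)
    return 1.0
```

Note that the rule is continuous at the threshold: when $b_k h_k = 1 + \sigma_4^2$, both branches give $\varphi_k = 1$.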
For example, the equality in (9) could be solved for $\mu_k$, which can be substituted into (11) to obtain the largest acceptable value of $\varphi_k$. In this way, all the damped algorithms which we consider below for $\theta_k \notin (\bar{\theta}_k, 1)$ (the divergent Broyden options) solved the considered problem successfully. However, alternatively, replacing $b_k h_k - 1$ by
$$a_k = (b_k h_k - 1) \max(1, |\theta_k|) \qquad (15)$$
in formula (14), it follows that
$$\varphi_k = \begin{cases} \dfrac{\sigma_4}{\sqrt{a_k}}, & a_k > \sigma_4^2 \\[1ex] 1, & \text{otherwise}. \end{cases} \qquad (16)$$
Using formula (16), we observed that all the damped algorithms which we consider below, for several values of $\theta_k$ from inside and outside the convex interval, solved the problem successfully with performance significantly better than that of BFGS.

On the basis of the above discussions, we state the following outline for practical damped quasi-Newton methods. We note that the strong Wolfe conditions in Step 3 yield the Wolfe conditions (2)-(3). If, on every iteration, the choice $\hat{y}_k = y_k$ is used in Step 5, which follows from (1) and (13) with values of $\sigma_2 = 1$, $\sigma_3 = \infty$ and $\sigma_4 = \infty$, then Algorithm 2.1 is reduced to the 'undamped' Broyden family of methods. In particular, this choice with $\theta_k = 0$, for all values of $k$, reduces the D-BFGS method to the standard BFGS method. In practice, we observed that several values of $\sigma_2 > 1/2$ improve the performance of the BFGS and DFP methods (the latter significantly). If $\varphi_k$ is chosen such that the usual positive definiteness property holds (which we consider here), then any value of $\theta_k$ satisfies condition (9). Indeed, we observed that sufficiently small values of $\varphi_k$ are sometimes useful, particularly when $|\theta_k|$ is remotely away from zero. These features will be illustrated in the next section.
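The safeguarded choice (15)-(16) can be sketched similarly (names are ours); scaling $b_k h_k - 1$ by $\max(1, |\theta_k|)$ is what strengthens the damping for Broyden parameters chosen remotely away from the BFGS value $\theta_k = 0$:

```python
import numpy as np

def phi_formula16(b, h, theta, sigma4):
    """Safeguarded damped parameter (16), using a_k from (15).

    For |theta| <= 1 this reduces to formula (14); for remote theta the
    larger a_k forces a smaller phi, i.e. heavier damping.
    """
    a = (b * h - 1.0) * max(1.0, abs(theta))       # (15)
    if a > sigma4**2:
        return sigma4 / np.sqrt(a)
    return 1.0
```

For instance, with $b_k h_k = 5$ and $\sigma_4 = 1$, the BFGS option $\theta_k = 0$ gives $\varphi_k = 0.5$, whereas the remote option $\theta_k = -100$ gives the much heavier damping $\varphi_k = 0.05$.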
Algorithm 2.1 (D-Broyden class)

Step 0: Given a starting point $x_1$, a symmetric and positive-definite matrix $B_1$, positive values of $\sigma_0$ and $\sigma_1$, and a tolerance $\epsilon > 0$; set $k = 1$.
Step 1: Terminate if $\|g_k\| \le \epsilon$.
Step 2: Compute the search direction $d_k = -B_k^{-1} g_k$.
Step 3: Find a steplength $\alpha_k$ and a new point $x_{k+1} = x_k + \alpha_k d_k$ such that the following strong Wolfe conditions hold:
$$f_{k+1} \le f_k + \sigma_0 \alpha_k d_k^T g_k, \qquad |d_k^T g_{k+1}| \le -\sigma_1 d_k^T g_k. \qquad (17)$$
Step 4: Compute $s_k$, $y_k$, $b_k$ and $h_k$.
Step 5: Choose values for $\theta_k$ and $\varphi_k$.
Step 6: Update $y_k$ to $\hat{y}_k$ and hence $B_k$ to $B_{k+1}$, using (1) and (4), respectively.
Step 7: Set $k := k+1$ and go to Step 1.

3. A numerical example

The question that has been most useful to the development of successful algorithms for unconstrained optimization is 'Does the method work well when the objective function is quadratic?' This statement is given by Powell (2009), who also states that the answer is very welcome and encouraging for the updating of second derivative matrices of quadratic models by a Broyden family method. Since this family usually suffers from ill-conditioned problems, we would like to apply some selected Broyden family methods to a quadratic function with a very ill-conditioned Hessian matrix. Because quasi-Newton methods are invariant under linear transformation, the difficulty is unchanged if we choose the simple quadratic function
$$f(x) = \frac{1}{2} x^T x, \qquad x \in R^2,$$
which has the solution $x^* = 0$ and identity Hessian $\nabla^2 f(x) = I$, while the initial Hessian approximation $B_1$ is very ill-conditioned. In particular, Powell (1986) uses
$$B_1 = \mathrm{diag}(1, \eta), \qquad \eta > 0, \qquad x_1 = \frac{1}{\sqrt{1+c^2}} \begin{pmatrix} c \\ 1 \end{pmatrix},$$
where $c$ is a constant. The author simplifies the analytic derivation of $x_{k+1}$ when calculated by either the BFGS or DFP method with $\alpha_k = 1$. He shows that for large values of $\eta$, BFGS performs badly and DFP is far worse, although $y_k$ is exact. Note that the former performance is maintained for the modified BFGS methods which use (8).
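For illustration, Algorithm 2.1 applied to $f(x) = \frac{1}{2} x^T x$ with unit steplengths (as used in this section) can be sketched as follows. This is a minimal sketch, not the authors' Matlab code: the damped parameter is supplied as a user function, and the starting point below is illustrative rather than Powell's exact choice.

```python
import numpy as np

def d_broyden_quadratic(x, B, phi_rule, theta=0.0, eps=1e-7, max_iter=50000):
    """Algorithm 2.1 on f(x) = 0.5*x.T@x (gradient g(x) = x), unit steps.

    phi_rule(s, y, B) returns the damped parameter phi_k in (0, 1];
    returning 1 everywhere gives the undamped Broyden method with
    parameter theta (theta = 0 is BFGS).  Returns the number of
    gradient evaluations (nfe) needed to reach ||g|| <= eps.
    """
    g = x.copy()
    for k in range(1, max_iter + 1):
        if np.linalg.norm(g) <= eps:
            return k                              # Step 1: terminate
        s = -np.linalg.solve(B, g)                # Steps 2-3 with alpha_k = 1
        x = x + s
        g_new = x.copy()                          # gradient of 0.5*x.T@x is x
        y = g_new - g
        phi = phi_rule(s, y, B)
        y_hat = phi * y + (1.0 - phi) * (B @ s)   # damped-technique (1)
        Bs = B @ s
        sBs = s @ Bs
        sy = s @ y_hat
        v = y_hat / sy - Bs / sBs                 # (5)
        B = (B - np.outer(Bs, Bs) / sBs
               + np.outer(y_hat, y_hat) / sy
               + theta * sBs * np.outer(v, v))    # (4)
        g = g_new
    return max_iter

# Ill-conditioned start in the spirit of Powell (1986); x1 is illustrative.
eta = 1e10
B1 = np.diag([1.0, eta])
x1 = np.array([1.0, 1.0]) / np.sqrt(2.0)
nfe_bfgs = d_broyden_quadratic(x1, B1, lambda s, y, B: 1.0)  # undamped BFGS
```

With a well-conditioned $B_1 = I$ the method terminates in one step, since the first quasi-Newton step is exact; the ill-conditioned $B_1$ above is what makes the undamped BFGS iteration slow.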
The derivation of Powell (1986) for $x_{k+1}$ has been extended to a restricted Broyden class of methods by Byrd et al. (1992), who also show that the performance of this class becomes worse as $\theta_k$ increases. Since these papers illustrate that these methods usually suffer from large eigenvalues of $B_k$ rather than small ones, we will describe below the numerical results for a very large value of $\eta$.

In a similar manner, we have tested the D-Broyden class of methods by applying Algorithm 2.1 to the above problem. To define Step 0 of this algorithm, we let $x_1$ and $B_1$ be given as above, using $\eta = 10^{10}$ and $\epsilon = 10^{-7}$. We do not define $\sigma_0$ and $\sigma_1$ here, because $\alpha_k = 1$ was used in Step 3 on all iterations, whether this value satisfies the Wolfe conditions or not. Nevertheless, these conditions hold in the limit if a quasi-Newton method converges superlinearly (Dennis and Moré, 1974). Thus the algorithms we consider below differ only in the choices of $\theta_k$ and $\varphi_k$ as required in Step 5 of Algorithm 2.1. All experiments were run in Matlab.

The damped-technique (1) with suitable values of $\varphi_k$ has the ability of correcting the Hessian approximations successfully. Here, we report the results for several choices of $\varphi_k$ and
$$\theta_k \in \{-10^7, -100, -0.5, 0, 0.5, 1, 1.5, 100, 10^7\},$$
which belong to the so-called convex, preconvex and postconvex classes of methods (defined for $0 \le \theta_k \le 1$, $\theta_k < 0$ and $\theta_k > 1$, respectively). We will report the number of function evaluations (nfe) which is required to terminate Algorithm 2.1 in Step 1 with the Euclidean norm $\|g_k\| \le \epsilon$. This number is the same as that of the gradient evaluations as well as that of iterations. 'F' indicates that the algorithm was terminated before the latter inequality held, i.e. the algorithm failed to solve the problem. We also consider the BFGS method for comparison, which required nfe = 32 to solve the problem.

Table 1.
D-BFGS with formula (6)

| $\sigma_2$ | 0.95 | 0.9 | 0.7 | 0.6 | 0.5 | 0.4 | 0.1 | 0.01 | 0.001 | $10^{-6}$ |
| nfe | 32 | 32 | 32 | 27 | 35 | 47 | 220 | 2107 | 18887 | F |

We firstly tested the D-BFGS method with $\varphi_k$ given by formulae (6), (13) and (14) for various choices of the parameters $\sigma_2$ and $\sigma_4$. We do not consider choices for $\sigma_3$ here, because the second case in (6) and (13) was not used, as the inequality $\tau_k > 1 + \sigma_3$ was not satisfied for $\sigma_3 > 0$. The results expressed in terms of nfe are presented in Table 1. We used (6) for 10 different values of $\sigma_2 \in (0,1)$. For sufficiently large values of $\sigma_2$, D-BFGS required nfe = 32 to solve the problem, which is the same as that required by BFGS. The reason is that using $\sigma_2 = 1$ in formula (6) yields $\varphi_k = 1$, which (by the damped-technique (1)) reduces the D-BFGS update to the BFGS one. We also note that D-BFGS works slightly better than BFGS when $\sigma_2$ is close to 0.6, while it becomes worse as $\sigma_2$ decreases in $(0, 0.5)$. Indeed, D-BFGS also failed to solve the problem when values of $\sigma_2 \le 10^{-6}$ were used. This observation is somewhat expected on the basis of some results from the literature, because (to our knowledge) values of $\sigma_2 = 0.8$ or 0.9 have been used in the D-BFGS method for constrained optimization during the last three decades; see for example (Powell, 2009).

This drawback of formula (6) with choices of $\sigma_2 \le 0.5$ can be avoided by employing it only when the BFGS update is sufficiently away from the rank one update, that is when $b_k h_k$ is sufficiently away from 1. To illustrate this feature, we repeated the run using formula (13) with most of the above values of $\sigma_2$ and some values of $\sigma_4 \ge 0$. The results are given in Table 2.
We do not report the results for $\sigma_4 > 2$ here, because we observed that D-BFGS required the same nfe as that required for $\sigma_4 = 2$. It is clear that the damped-technique improves BFGS substantially when $\sigma_4$ is defined sufficiently close to 0.5 and $\sigma_2 \le 0.6$, and significantly for very small values of $\sigma_2$. It is rather surprising that D-BFGS with $\sigma_2 = 10^{-6}$ and $\sigma_4 < 2$ required at most nfe = 7 to solve the problem. This result shows that small values of $\sigma_2$ (and hence $\varphi_k$) are useful in practice. This table also shows that the choice of $\sigma_4 = 0$ (or nearly so), which yields that the damped-technique is employed on most iterations, is not desirable.

Table 2. D-BFGS with formula (13)

| $\sigma_4 \backslash \sigma_2$ | 0.95 | 0.9 | 0.7 | 0.6 | 0.5 | 0.4 | 0.1 | 0.01 | $10^{-3}$ | $10^{-6}$ |
| 2 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 |
| 1.5 | 32 | 32 | 32 | 20 | 18 | 17 | 12 | 8 | 7 | 6 |
| 0.95 | 32 | 32 | 32 | 20 | 18 | 17 | 12 | 8 | 8 | 6 |
| 0.5 | 32 | 32 | 32 | 20 | 18 | 17 | 12 | 8 | 8 | 5 |
| 0.1 | 32 | 32 | 32 | 20 | 19 | 18 | 12 | 8 | 8 | 5 |
| 0.001 | 32 | 32 | 32 | 22 | 20 | 19 | 13 | 8 | 8 | 5 |
| $10^{-6}$ | 32 | 32 | 32 | 24 | 21 | 19 | 14 | 9 | 8 | 5 |
| 0 | 32 | 32 | 32 | 27 | 27 | 25 | 87 | 625 | 3918 | 7 |

The damped-technique seems to work well if the left limit in (12) is enforced only when the term under the right limit, $b_k h_k - 1$, is sufficiently away from zero. Therefore, it is worth testing formula (14), which enforces the right limit in (12). The results are given in Table 3. We observe that the damped-technique improves over BFGS as $\sigma_4$ decreases. For example, when $\sigma_4 = 0.01$, D-BFGS required only nfe = 8, which is very small compared to the 32 required by BFGS.

Table 3. D-BFGS with formula (14)

| $\sigma_4$ | 2 | 1 | 0.7 | 0.6 | 0.5 | 0.4 | 0.1 | 0.01 | $10^{-3}$ | $10^{-6}$ | 0 |
| nfe | 32 | 19 | 17 | 16 | 15 | 14 | 11 | 8 | 7 | 5 | 4 |

To see if the above encouraging numerical results for D-BFGS generalise to other members of the D-Broyden class, we repeated the run for formula (14) with the same values of $\sigma_4$ and several values of $\theta_k$, some of which are very remotely away from 0 (the BFGS option). The numerical results are presented in Table 4.
In the first row of results, we used $\sigma_4 = \infty$ to obtain $\varphi_k = 1$ for all $k$ and, hence, the corresponding undamped quasi-Newton methods were considered for comparison. The failures for $\theta_k < 0$ and $\theta_k > 1$ are expected; see for example (Byrd et al., 1992). Note that the nfe = 1010 required for $\theta_k = 1$ (the DFP option) is expected; see (Powell, 1986). The remaining results in Table 4 were obtained by testing condition (9) with $\nu_1 = \nu_2 = 0.05$. If it does not hold for the computed $\varphi_k$, we reduce this value such that the condition holds with equality. In this way convergence is enforced for all algorithms which, indeed, solve the problem successfully. Thus the second row of results with $\sigma_4 = \infty$ and condition (9) being enforced indicates that the damped-technique was employed for all algorithms defined with $\theta_k \notin [0, 0.95]$.

Table 4. D-Broyden with formula (14) and enforcing condition (9)

| $\sigma_4 \backslash \theta_k$ | $-10^7$ | $-100$ | $-0.5$ | 0 | 0.5 | 1 | 1.5 | 100 | $10^7$ |
| $\infty$* | F | F | F | 32 | 78 | 1010 | F | F | F |
| $\infty$ | 16 | 21 | 16 | 32 | 78 | 411 | 118 | 150 | 18424 |
| 2 | 17 | 9 | 8 | 32 | 78 | 411 | 118 | 65 | 18425 |
| 0.95 | 17 | 9 | 13 | 19 | 22 | 30 | 30 | 65 | 18425 |
| 0.5 | 14 | 10 | 15 | 15 | 16 | 18 | 21 | 65 | 18425 |
| 0.1 | 11 | 19 | 10 | 11 | 11 | 12 | 15 | 66 | 18425 |
| 0.01 | 11 | 7 | 8 | 8 | 8 | 10 | 13 | 65 | 18425 |
| 0.001 | 11 | 7 | 7 | 7 | 7 | 9 | 12 | 64 | 18425 |
| $10^{-6}$ | 6 | 5 | 5 | 5 | 5 | 7 | 10 | 62 | 18425 |
| 0 | 7 | 7 | 7 | 7 | 7 | 9 | 12 | 79 | 22172 |

* Condition (9) was not enforced.

An examination of the other results in Table 4 shows the following. The damped algorithms with $\theta_k \le 1.5$ and $\sigma_4 \le 0.1$ work substantially better than BFGS. It is rather surprising that all algorithms with $\theta_k < 0$, which sometimes yields indefinite Hessian approximations in the undamped algorithms, solved the problem successfully. Although BFGS works much better than the algorithms for $\theta_k \ge 100$, it is also surprising that they solved the problem successfully. Note that the slow convergence of these algorithms is avoided as described in the following paragraph.

Table 5.
D-Broyden with formula (16)

| $\sigma_4 \backslash \theta_k$ | $-10^7$ | $-100$ | $-0.5$ | 0 | 0.5 | 1 | 1.5 | 100 | $10^7$ |
| $\infty$* | F | F | F | 32 | 78 | 1010 | F | F | F |
| $\infty$ | 16 | 21 | 16 | 32 | 78 | 411 | 118 | 150 | 18424 |
| 0.95 | 20 | 16 | 13 | 19 | 22 | 27 | 23 | 16 | 18 |
| 0.5 | 13 | 11 | 15 | 15 | 16 | 17 | 16 | 12 | 12 |
| 0.1 | 12 | 8 | 10 | 11 | 11 | 10 | 10 | 8 | 34 |
| 0.01 | 820 | 7 | 8 | 8 | 8 | 8 | 8 | 8 | 1109 |
| 0.001 | 7 | 24 | 7 | 7 | 7 | 7 | 7 | 10 | 6 |
| $10^{-6}$ | 5 | 6 | 5 | 5 | 5 | 6 | 6 | 5 | 5 |
| 0 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |

* Condition (9) was not enforced.

We now repeat the test with formula (16). The results are given in Table 5. We observe the surprising result that all damped algorithms solve the problem successfully. For sufficiently small values of $\sigma_4$, the algorithms' performance is significantly better than that of BFGS, and the algorithms improve as $\sigma_4$ decreases. The choice $\sigma_4 = 0$ means that the damped-technique is always employed, except when the Broyden family of updating formulae is reduced to the SR1 update. All algorithms approach an optimal method with nfe = 4, which is required to solve the problem by a well defined SR1 method. Although $\sigma_4 = 0$ gives the best choice here, the value $\sigma_4 = 0.95$ might be typical for a general function, and further experiments on several test problems should be considered.

It is worth reporting that an application of some algorithms to the above quadratic problem (or its linear transformation) with some choices for $x_1$ and $B_1 = \mathrm{diag}(1, 10^{-10})$ has been considered. We observed, as expected, that all the algorithms solved the problem successfully with nfe = 2 or 3, whether the damped-technique was employed or not.

To see if the above results generalise to general functions, we applied Algorithm 2.1, employing the strong Wolfe conditions (17) with the usual values of $\sigma_0 = 10^{-4}$ and $\sigma_1 = 0.9$, to some unconstrained optimization problems and observed the following. When $\theta_k$ is sufficiently away from zero, the damped algorithms work significantly better than the undamped Broyden methods.
For $\theta_k = 0$, we observed that generally the standard BFGS method is preferable to the D-BFGS algorithm unless further restrictions on the damped parameter $\varphi_k$ are made or, similarly, the scalars $\sigma_2$, $\sigma_3$ and $\sigma_4$ are appropriately chosen. Indeed, Al-Baali and Grandinetti (2009) showed that the performance of the D-BFGS method with formula (6) for $\sigma_2 = 0.9$, $\sigma_3 = 9$ and further restrictions on $\varphi_k$ is substantially better than that of the BFGS method. This performance is improved further when both the damped and self-scaling techniques are combined in a certain sense (Al-Baali and Khalfan, 2009). For large-scale problems, Al-Baali (2004) introduced the damped formula (6) to the limited memory L-BFGS method; see for example (Nocedal and Wright, 1999). Using $\sigma_2 = 0.6$, $\sigma_3 = 3$ and sufficiently large values of $\varphi_k$, Al-Baali (2004) reported encouraging numerical results in certain cases.

4. Conclusion

It has been shown that the D-Broyden class of methods with appropriate choices for the damped-technique works very well when applied to the quadratic function of Powell (1986). In particular, the damped parameter (16) improves the performance of robust methods (e.g. BFGS) substantially and inefficient methods (e.g. DFP) significantly, and it enforces fast convergence of divergent methods. The numerical results also demonstrate that small values of the damped parameter $\varphi_k$ are useful in some cases. Since this finding contradicts the well-known fact that large values of $\varphi_k \ge 0.8$ should be used in the D-BFGS update of methods for the Lagrangian function in constrained optimization (see for example Fletcher, 1987; Nocedal and Wright, 1999; Powell, 1978; Powell, 2009; Gill and Leonard, 2003), it is worth noting that smaller values of $\varphi_k$ using formulae (13) and (14), which modify that of Powell (1978), should also be further investigated. For a general function, these formulae can be used with $y_k$ replaced by the right hand side of (8), as in (Al-Baali and Grandinetti, 2009).

5.
Acknowledgment

We would like to thank the two anonymous referees for making a number of valuable comments on a draft of this paper.

6. References

AL-BAALI, M. 1993. Variational quasi-Newton methods for unconstrained optimization. JOTA, 77: 127-143.
AL-BAALI, M. 2003. Quasi-Wolfe conditions for Newton-like methods. 18th International Symposium on Mathematical Programming, Copenhagen, Denmark, August 18-22, 2003.
AL-BAALI, M. 2004. Quasi-Wolfe conditions for quasi-Newton methods for large-scale optimization. 40th Workshop on Large-Scale Nonlinear Optimization, Erice, Italy, June 22 - July 1, 2004.
AL-BAALI, M. 2011. Convergence analysis of a family of damped quasi-Newton methods for nonlinear optimization. Research Report DOMAS 11/2, Sultan Qaboos University, Oman.
AL-BAALI, M. and GRANDINETTI, L. 2009. On practical modifications of the quasi-Newton BFGS method. Advanced Modeling and Optimization, 11: 63-76.
AL-BAALI, M. and KHALFAN, H.F. 2009. A combined class of self-scaling and modified quasi-Newton methods. Optimization Online, 07/2334 (to appear in COAP).
BYRD, R.H., LIU, D.C. and NOCEDAL, J. 1992. On the behavior of Broyden's class of quasi-Newton methods. SIAM J. Optim., 2: 533-557.
DENNIS, J.E. and MORÉ, J.J. 1974. A characterization of superlinear convergence and its application to quasi-Newton methods. Math. Comp., 28: 549-560.
DENNIS, J.E. and SCHNABEL, R.B. 1996. Numerical Methods for Unconstrained Optimization and Nonlinear Equations. SIAM Publications, Philadelphia.
FLETCHER, R. 1987. Practical Methods of Optimization (2nd edition). Wiley, Chichester, England.
GILL, P.E. and LEONARD, M.W. 2003. Limited-memory reduced-Hessian methods for large-scale unconstrained optimization. SIAM J. Optim., 14: 380-401.
GRATTON, S. and TOINT, P.L. 2010. Approximate invariant subspaces and quasi-Newton optimization methods. OMS, 25: 507-529.
NOCEDAL, J. 1980. Updating quasi-Newton matrices with limited storage.
Math. Comput., 35: 773-782.
NOCEDAL, J. and WRIGHT, S.J. 1999. Numerical Optimization. Springer, London.
POWELL, M.J.D. 1978. Algorithms for nonlinear constraints that use Lagrange functions. Math. Programming, 14: 224-248.
POWELL, M.J.D. 1986. How bad are the BFGS and DFP methods when the objective function is quadratic? Math. Programming, 34: 34-47.
POWELL, M.J.D. 2009. On nonlinear optimization since 1959. Optimization Online, 11/2477.
YABE, H., OGASAWARA, H. and YOSHINO, M. 2007. Local and superlinear convergence of quasi-Newton methods based on modified secant conditions. Journal of Computational and Applied Mathematics, 205: 617-632.

Received 20 April 2011
Accepted 4 October 2011