SQU Journal for Science, 2021, 26(2), 141-151 DOI:10.53539/squjs.vol26iss2pp141-151 Sultan Qaboos University 141 ϯ Some preliminary numerical results were presented at 5 th International Conference on Numerical Analysis and Optimization (NAO-V), January 2020, Muscat, Oman Improved Fletcher–Reeves Methods Based on New Scaling Techniques ϯ Amal Al-Saidi 1,* and Mehiddin Al-Baali 2 1 Department of Mathematics, Sultan Qaboos University, Muscat, Sultanate of Oman. 2 Department of Mathematics, Sultan Qaboos University, Muscat, Sultanate of Oman; and GERAD-HEC, Montréal, Canada. *Email: Amal2.AlSaeedi@moe.om. ABSTRACT: This paper introduces a scaling parameter to the Fletcher-Reeves (FR) nonlinear conjugate gradient method. The main aim is to improve its theoretical and numerical properties when applied with inexact line searches to unconstrained optimization problems. We show that the sufficient descent and global convergence properties of Al- Baali for the FR method with a fairly accurate line search are maintained. We also consider the possibility of extending this result to less accurate line search for appropriate values of the scaling parameter. The reported numerical results show that several values for the proposed scaling parameter improve the performance of the FR method significantly. Keywords: Unconstrained optimization; Large-scale optimization; Line search framework; Nonlinear conjugate gradient methods; Fletcher-Reeves method. جديدةموازنة تقنيات إستنادا إلى ريفز المحسنة -فليتشر طرق أمل السعيدي و محي الدين البعلي ة عدديالنظرية وال الصفات. الهدف الرئيسي هو تحسين ةغيرالخطي ةطرق التدرج المترافقفي ريفز -رفليتش لطريقة موازنةيقدم هذا البحث معامل الملخص: نحدارالكافي والتقارب اإلفي البعليصفات الحفاظ على بين إمكانيةغيرالمقيدة. ن على مسائل األمثليات ةتقريبي بحثمستقيمات عند تطبيقها مع لتلك الطريقة قيم مناسبة ستخدام إتقريبي ببحث مستقيمهذه النتيجة إلى عميمعتبارأيًضا إمكانية تاإل. نأخذ في ة معقولةدقتقريبي ببحث مستقيممع FRلطريقة لامشال .بشكل ملحوظ FRن أداء طريقةيتحسل موازنةالعديد من قيم معامل الوجود . تظهرالنتائج العددية موازنةلمعامل ال ريفز. -فليتشر، طريقة ةغيرالخطي ة، طرق التدرج المترافقالبحث مستقيم، مسائل ذات أبعاد عالية، غيرالمقيدة األمثليات الكلمات المفتاحية: 1. Introduction This paper is concerned with the line-search Fletcher-Reeves (FR) conjugate gradient (CG) method for solving the unconstrained optimization problem min 𝑥∈𝑅𝑛  𝑓(𝑥), (1) where f: Rn → R is a smooth function and its gradient 𝑔(𝑥) = ∇f (x) is available for any value of x. For a given starting point x1, the method defines a sequence of points {𝑥𝑘} iteratively as follows. On each iteration, once the function value 𝑓𝑘 = 𝑓(𝑥𝑘) and the gradient value 𝑔𝑘 = 𝑔(𝑥𝑘) are calculated, a search direction 𝑑𝑘 is defined such that the descent property 𝑑𝑘 𝑇𝑔𝑘 < 0 holds, assuming 𝑔𝑘 ≠ 0 since 𝑔𝑘 = 0 holds at a solution of problem (1). Then it is possible to find a positive steplength αk and hence a new point mailto:Amal2.AlSaeedi@moe.om AMAL AL-SAIDI and MEHIDDIN AL-BAALI 142 𝑥𝑘+1 = 𝑥𝑘 + 𝛼𝑘𝑑𝑘 (2) such that the function reduction 𝑓𝑘 − 𝑓𝑘+1 is sufficiently positive (see for example Fletcher [1]). In the class of CG methods, the search direction 𝑑𝑘 is defined initially by the steepest descent choice 𝑑1= −𝑔1 that satisfies the above descent property for k = 1 . The other directions are defined by 𝑑𝑘+1 = −𝑔𝑘+1 + 𝛽𝑘𝑑𝑘, (3) where βk is the CG parameter. In particular, Fletcher and Reeves [2] propose the positive value of 𝛽𝑘 = ∥∥𝑔𝑘+1∥∥ 2 ∥∥𝑔𝑘∥∥ 2 (4) (referred to as 𝛽𝑘 𝐹𝑅), where || ∙ || denotes the Euclidean norm. If the exact line search equation 𝑑𝑘 𝑇𝑔𝑘+1 = 0 (5) is satisfied, which is guaranteed if the steplength αk = α ∗ is obtained by solving the minimization subproblem α∗ = arg min𝛼 𝑓(𝑥𝑘 + 𝛼𝑑𝑘) exactly, then the above descent property with 𝑘 replaced by 𝑘 + 1 is also satisfied. Since solving this subproblem exactly (referred to as exact line search) is impractical, inexact line search is usually used so that the descent property may not hold. In this case, choice (4) is replaced by 𝛽𝑘 = 0 (i.e., direction (3) is reset to that of the steepest descent) so that the descent property holds for all k (for further detail, see for example Fletcher [1], Nocedal and Wright [3] and Pytlak [4]). Because this resetting may destroy the useful features of the CG method, in certain cases, many choices for replacing the CG parameter FR formula have been proposed (see for example Hager and Zhang [5] and the references therein). If exact line search is employed and f (x) is quadratic, all the directions generated via (3) are mutually conjugate. Hence, the gradients of 𝑓(𝑥) at the different iterates are mutually orthogonal, i.e., we have 𝑔𝑘+1 𝑇 𝑔𝑘 = 0, (6) and these formulae are reduced to the FR one (see e.g. [1, 3, 4]). Our aim in this paper is to maintain the global convergence property of the FR method and improve its behavior by scaling formula (4) by 𝜉𝑘 before using it in (3) (as in Al-Baali [6]). Therefore, the resulting scaled direction 𝑑𝑘+1 = −𝑔𝑘+1 + 𝜉𝑘𝛽𝑘𝑑𝑘 (7) would be close to the steepest descent one if 𝜉𝑘 is chosen sufficiently close to zero. Hence, by continuity, the corresponding class of scaled CG methods (referred to as ScFR when 𝛽𝑘 = 𝛽𝑘 𝐹𝑅) would satisfy the descent and convergence properties that the steepest descent method has. It is important to mention that Powell [7] and essentially Zoutendijk [8] show that the FR method with exact line searches converges globally, in the sense that lim𝑘→∞ inf ‖𝑔𝑘‖ = 0. This result has been extended by Al-Baali [9] to inexact line searches with a fairly accurate values of αk, known as the first practical global convergence result for the FR method (see for example Nocedal [10]). Al-Baali has obtained this result based on showing that the sufficient descent condition 𝑑𝑘 𝑇𝑔𝑘 ≤ −𝑐∥∥𝑔𝑘∥∥ 2 (8) holds for some positive constant c, assuming the steplength αk satisfies the strong Wolfe conditions 𝑓𝑘+1 ≤ 𝑓𝑘 + 𝜌𝛼𝑘𝑔𝑘 𝑇𝑑𝑘, | 𝑔𝑘+1 𝑇 𝑑𝑘 | ≤ −𝜎𝑔𝑘 𝑇𝑑𝑘, (9) for 0 < ρ < σ < 1 2 . It is worth noting that a value of 𝛼𝑘 > 0 which satisfies these conditions, for 0 < ρ < 1 2 and ρ < σ < 1, can be found in a finite number of operations whenever the above descent property holds (see for example Al-Baali and Fletcher [11] and Fletcher [1]). In Section 2, we show that the ScFR class of methods with 𝜉𝑘 ∈ (0, 1] maintains the sufficient descent property that the FR method has for the strong Wolfe conditions (9) with σ < 1 2 . We will also enforce this property for any line search technique with sufficiently small values of the scaling parameter ξk. In addition, that section defines some choices for 𝜉𝑘. In Section 3, we introduce the quasi-Newton feature to 𝜉𝑘 for the FR method as considered by Al-Saidi et al. [12] for the scaled CG methods. Section 4 shows that the proposed class of ScFR methods maintains the global convergence property that the FR method has when the strong Wolfe conditions are employed with σ < 1 2 . In Section 5, we study the behavior of the proposed ScFR methods by applying them to a set of standard test problems. It is shown that the proposed scaling technique improves the performance of the FR method significantly in many cases. Finally, Section 6 concludes the paper. 2. The ScFR Class of Methods We now define the scaled FR (ScFR) class of methods as suggested by Al-Baali [6] for the CG class of methods. The author replaces the search direction (3) by (7), for k > 1, where 0 < 𝜉𝑘 ≤ 1 is the scaling parameter, and maintains 𝑑1 = −𝑔1. Since the value of 𝜉𝑘 = 0 reduces this direction to that of the steepest descent, it follows by IMPROVED FLETCHER–REEVES METHODS 143 continuity that the ScFR class of methods, with values of 𝜉𝑘 sufficiently close to zero, has the sufficient descent and global convergence properties that the steepest descent method has. We first define some values of 𝜉𝑘 that enforce the sufficient descent condition (8) as follows (Al-Baali [6]). We note that this condition holds for k = 1 and c = 1. We now prove this condition for k ≥ 1 as follows. On substituting (7) into (8) with k replaced by k + 1, we have 𝑔𝑘+1 𝑇 𝑑𝑘+1 = −∥∥𝑔𝑘+1∥∥ 2 + 𝜉𝑘𝛽𝑘𝑑𝑘 𝑇𝑔𝑘+1 ≤ −𝑐∥∥𝑔𝑘+1∥∥ 2 . (10) Hence, 𝜉𝑘𝛽𝑘(𝑑𝑘 𝑇𝑔𝑘+1) ≤ (1 − 𝑐)∥∥𝑔𝑘+1∥∥ 2 (11) or equivalently by the FR formula (4), 𝜉𝑘(𝑑𝑘 𝑇𝑔𝑘+1) ≤ (1 − 𝑐)∥∥𝑔𝑘∥∥ 2 , (12) where 0 < c ≤ 1. This upper bound on c is included because condition (11) cannot hold for c > 1 and 𝑑𝑘 𝑇𝑔𝑘+1= 0 (the exact line search option). We note that (11) holds if its left hand side is nonpositive or either 𝜉𝑘, 𝑑𝑘 𝑇𝑔𝑘+1 or βk is sufficiently close to zero. This paper assumes that the CG parameter βk is given by the FR formula (4), unless otherwise stated, and defines 𝜉𝑘 such that (11) holds whether dk satisfies the sufficient descent condition (8) or not. For convenience, we start with the following result of Al-Saidi et al. [12] assuming βk satisfies (8). Theorem 1. Consider the ScFR class of methods, defined by (2), (4) and (7) and assume that the search direction (7) with 𝜉𝑘= 1 satisfies the sufficient descent condition (8). Then, this condition remains satisfied for 0 ≤ 𝜉𝑘 ≤ 1. Proof. When k = 1, 𝑑𝑘= −𝑔𝑘 and hence condition (8) holds with c = 1. For k ≥ 1, it follows from (7) that 𝑔𝑘+1 𝑇 𝑑𝑘+1 = −∥∥𝑔𝑘+1∥∥ 2 + 𝜉𝑘𝛽𝑘(𝑑𝑘 𝑇𝑔𝑘+1). (13) If 𝛽𝑘(𝑑𝑘 𝑇𝑔𝑘+1) < 0, it follows that 𝑔𝑘+1 𝑇 𝑑𝑘+1 ≤ −∥∥𝑔𝑘+1∥∥ 2 , which is (8) with c = 1 and k replaced by k + 1, since 𝜉𝑘 ≥ 0. Otherwise, if 𝛽𝑘(𝑑𝑘 𝑇𝑔𝑘+1) ≥ 0, then (13) implies 𝑔𝑘+1 𝑇 𝑑𝑘+1 ≤ −∥∥𝑔𝑘+1∥∥ 2 + 𝛽𝑘(𝑑𝑘 𝑇𝑔𝑘+1), since 𝜉𝑘 ≤ 1. Hence, by the theorem assumption we obtain (8) with k replaced by k + 1. □ We note that this result remains valid for any value of c > 0 if the choice 𝜉𝑘 = 1 is used when 𝛽𝑘(dk T 𝑔𝑘+1) ≤ 0 as we consider below for some choices of 𝜉𝑘. It is important to note that if the strong Wolfe conditions (9) are employed with σ < 1/2, then the sufficient descent condition (8) holds for c = 𝑐σ ≡ 1−2σ 1−σ (see Al-Baali [9], for detail). Thus, for any choice of c ≤ cσ, 𝜉𝑘 = 1 and the ScFR method is reduced to the standard FR method. However, using values of c > 𝑐σ require defining 𝜉𝑘 < 1, for which we suggest the following way. If condition (11) holds with 𝜉𝑘 = 1, then we use this value so that the CG search direction is unchanged. Otherwise, we choose 𝜉𝑘 such that condition (11) holds with equality. Thus, we choose 𝜉𝑘 = { (1 − 𝑐)∥∥𝑔𝑘+1∥∥ 2 𝛽𝑘(𝑑𝑘 𝑇𝑔𝑘+1)      if 𝛽𝑘(𝑑𝑘 𝑇𝑔𝑘+1) > (1 − 𝑐)∥∥𝑔𝑘+1∥∥ 2 , 1     otherwise, (14) or equivalently for the ScFR method, 𝜉𝑘 1 = { (1 − 𝑐)∥∥𝑔𝑘∥∥ 2 𝑑𝑘 𝑇𝑔𝑘+1      if 𝑑𝑘 𝑇𝑔𝑘+1 > (1 − 𝑐)∥∥𝑔𝑘∥∥ 2 , 1     otherwise. (15) In this case, it follows that 𝜉𝑘 1𝛽𝑘 = { (1 − 𝑐)∥∥𝑔𝑘+1∥∥ 2 𝑑𝑘 𝑇𝑔𝑘+1      if 𝑑𝑘 𝑇𝑔𝑘+1 > (1 − 𝑐)∥∥𝑔𝑘∥∥ 2 , 𝛽𝑘 𝐹𝑅     otherwise (16) which shows that scaling is employed only when αk is sufficiently remote away from the right hand side of a minimizer α∗. To be precise, 𝜉𝑘 = 1 is used when 𝛽𝑘(𝑑𝑘 𝑇𝑔𝑘+1) ≤ 0, although in this case (11) holds for any value of 𝜉𝑘 > 0. To obtain the least change in the CG parameter, c should be chosen close to zero (in practice, the value of c = 0.001 seems reasonable as observed by Al-Saidi [13]). We also note that 𝜉𝑘 1= 1 usually is used for a fairly accurate line search (σ ≤ 0.1). Thus, we consider the outline of the scaled FR (ScFR) method as in Algorithm 1 that is reduced to the standard FR method if 𝜉𝑘=1 for all k. If in Step 2 exact line searches (or nearly so) are employed for all k, then Algorithm 1 is reduced to the standard FR method. In practice, we observed that the above scaling technique improves the performance of the FR method significantly in several cases (see Section 5, for details). AMAL AL-SAIDI and MEHIDDIN AL-BAALI 144 Algorithm 1 (ScFR) Step 0: Given ε > 0, 0 < c ≤ 1 and x1, let 𝑑1= −𝑔1 and set k = 1 Step 1: Stop if ||𝑔1||≤ ε Step 2: Compute a positive steplength αk which satisfies certain standard conditions Step 3: Compute a new point by (2); 𝑥𝑘+1 = 𝑥𝑘 + 𝛼𝑘𝑑𝑘 Step 4: Define the FR parameter βk by (4) Step 5: Define 𝜉𝑘 (e.g., by (14)) Step 6: Compute a new search direction by (7); 𝑑𝑘+1 = −𝑔𝑘+1 + 𝜉𝑘𝛽𝑘𝑑𝑘 Step 7: Set k := k + 1 and go to Step 1. It is important to realize that the choice of 𝜉𝑘 1 in (15) is independent of the line search technique. However, if the strong Wolfe conditions (9) are employed, the first case of (15) could be replaced by a smaller value to obtain 𝜉𝑘 2 = { (1 − 𝑐)∥∥𝑔𝑘∥∥ 2 𝜎|𝑑𝑘 𝑇𝑔𝑘|      if 𝑑𝑘 𝑇𝑔𝑘+1 > (1 − 𝑐)∥∥𝑔𝑘∥∥ 2 , 1     otherwise. (17) This choice satisfies the sufficient descent condition (11) with 𝜉𝑘 = 𝜉𝑘 2, because 𝜉𝑘 2 ≤ ξk 1 and we have 𝑑𝑘 𝑇𝑔𝑘+1 > 0 and ξk 2(𝑑𝑘 𝑇𝑔𝑘+1) ≤ ξ1(𝑑𝑘 𝑇𝑔𝑘+1) = (1 − c) ∥∥𝑔𝑘∥∥ 2 . Indeed, replacing 𝜉𝑘 2 by a smaller value, which we consider below, maintains the sufficient descent condition (12). In practice, for c = 0.001, we observed that choice (17) works better than (15). We now consider the rearrangement of Al-Baali [6] for the FR parameter as 𝛽𝑘 = 𝜂𝑘 𝑇𝑔𝑘+1, 𝜂𝑘 = 𝑔𝑘+1 ∥∥𝑔𝑘∥∥ 2 , (18) which is possible for most CG methods (e.g., see Hager and Zhang [5] and Narushima and Yabe [14] and the references therein). Using this form and observing that 𝑑𝑘 𝑇𝑔𝑘+1 ≤ ∥∥𝑑𝑘∥∥∥∥𝑔𝑘+1∥∥, (19) the first case of (15) can be reduced to obtain 𝜉𝑘 3 = { (1−𝑐)∥∥𝑔𝑘∥∥ 2 ∥∥𝑑𝑘∥∥∥∥𝑔𝑘+1∥∥ if 𝑑𝑘 𝑇𝑔𝑘+1 > (1 − 𝑐)∥∥𝑔𝑘∥∥ 2 1 otherwise. , (20) To increase the interval for the scaling parameter, we use (19) again in the condition of (20) so that we obtain the following choice of Al-Baali [6]: 𝜉𝑘 4 = { (1−𝑐)∥∥𝑔𝑘∥∥ 2 ∥∥𝑑𝑘∥∥∥∥𝑔𝑘+1∥∥ if ∥∥𝑑𝑘∥∥∥∥𝑔𝑘+1∥∥ > (1 − 𝑐)∥∥𝑔𝑘∥∥ 2 1 otherwise . , (21) We note that the interval for using 𝜉𝑘 < 1 in (21) is larger than or equal to that used by (20) so that sufficiently large values of c might be used. Indeed, in practice, we observed that choice (17) is preferable to (21). Thus, we will not provide further details about (21). 3. A Quasi-Newton Feature One feature for choices (20) and (21) is given by Al-Baali [6] who shows that the values of 𝜉𝑘 ≤ (1−𝑐) ∥∥η𝑘∥∥∥∥d𝑘∥∥ , where η for the FR formula is given in (18), enforce some positive eigenvalues of the ScCG matrix 𝐻𝑘+1 = 𝐼 − 𝜉𝑘𝑑𝑘𝜂𝑘 𝑇, (22) noting that direction (7) (similar to that of Perry [15]) can be rearranged as follows 𝑑𝑘+1 = −𝐻𝑘+1𝑔𝑘+1. (23) Thus, we introduce another feature to the ScFR method like that of Al-Saidi et al. [12] for the CG methods by modifying the scaling parameter 𝜉𝑘 such that ideally the quasi-Newton condition 𝐻𝑘+1𝛾𝑘 = 𝛿𝑘 , (24) Where 𝛿𝑘 = 𝑥𝑘+1 − 𝑥𝑘, 𝛾𝑘 = 𝑔𝑘+1 − 𝑔𝑘, (25) holds, assuming 𝐻𝑘+1 approximates the inverse of Hessian of 𝑓(𝑥) at 𝑥𝑘. Because, in general, we cannot fulfill condition (24), we let IMPROVED FLETCHER–REEVES METHODS 145 𝜉𝑘 = arg min 𝜉 ∥∥𝐻𝑘+1𝛾𝑘 − 𝛿𝑘∥∥ = arg min 𝜉  ∥∥𝛾𝑘 − 𝛿𝑘 − 𝜉(𝜂𝑘 𝑇𝛾𝑘)𝑑𝑘∥∥. (26) Solving this minimization problem (see for example Al-Saidi et al. [12]), it follows that 𝜉𝑘 𝑞 = (𝛾𝑘−𝛿𝑘) 𝑇𝑑𝑘 𝜂𝑘 𝑇𝛾𝑘∥∥𝑑𝑘∥∥ 2 . (27) On substituting 𝜂𝑘 for the FR formula (18), (27) is reduced to 𝜉𝑘 𝑞 = (𝛾𝑘−𝛿𝑘) 𝑇𝑑𝑘∥∥𝑔𝑘∥∥ 2 𝛾𝑘 𝑇𝑔𝑘+1∥∥𝑑𝑘∥∥ 2 . (28) To maintain the search direction differs from that of the steepest descent, the values of 𝜉𝑘 should be bounded away from zero. Since, in addition, the first case of (15) enforces the sufficient descent condition (12) with equality, and that replacing 𝜉𝑘 by a smaller value keeps inequality (12) holds, we may enforce 𝜉𝑘 ∈ [ĉ, 1], where ĉ is a small positive number. Thus, we modify (15), which defines 𝜉𝑘 1, to 𝜉𝑘 𝑞1 = { (1 − 𝑐)∥∥𝑔𝑘∥∥ 2 𝑑𝑘 𝑇𝑔𝑘+1      if 𝑑𝑘 𝑇𝑔𝑘+1 > (1 − 𝑐)∥∥𝑔𝑘∥∥ 2 and 𝜉𝑘 𝑞 ∉ [�̂�,1], min{ (1 − 𝑐)∥∥𝑔𝑘∥∥ 2 𝑑𝑘 𝑇𝑔𝑘+1 ,𝜉𝑘 𝑞 }     if 𝑑𝑘 𝑇𝑔𝑘+1 > (1 − 𝑐)∥∥𝑔𝑘∥∥ 2 and 𝜉𝑘 𝑞 ∈ [�̂�,1], 1     otherwise. (29) and modify (17), which defines 𝜉𝑘 2, to 𝜉𝑘 𝑞2 = { (1 − 𝑐)∥∥𝑔𝑘∥∥ 2 𝜎|𝑑𝑘 𝑇𝑔𝑘|      if 𝑑𝑘 𝑇𝑔𝑘+1 > (1 − 𝑐)∥∥𝑔𝑘∥∥ 2 and 𝜉𝑘 𝑞 ∉ [�̂�,1], min{ (1 − 𝑐)∥∥𝑔𝑘∥∥ 2 𝜎|𝑑𝑘 𝑇𝑔𝑘| ,𝜉𝑘 𝑞 }     if 𝑑𝑘 𝑇𝑔𝑘+1 > (1 − 𝑐)∥∥𝑔𝑘∥∥ 2 and 𝜉𝑘 𝑞 ∈ [�̂�,1], 1     otherwise. (30) Since choices (29) and (30) can be rewritten as 𝜉𝑘 𝑞𝑖 = min(max( 𝜉𝑘 𝑞 , ĉ) ,𝜉𝑘 𝑖), for 𝑖 = 1,2 respectively, we observe that 𝜉𝑘 𝑞𝑖 ≤ 𝜉𝑘 𝑖 . Similarly, we modify any 𝜉𝑘 𝑖 , for i = 1, 2, 3, 4, to 𝜉𝑘 𝑞𝑖 = min(max(𝜉𝑘 𝑞 , ĉ), 𝜉𝑘 𝑖 ). In practice, we observed that these modified choices with 𝜎 = 0.1 and ĉ = 0.001 improves the performance of 𝜉𝑘 2 (further possible choices and details can be seen in Al-Saidi [13]). Therefore, the above choices for 𝜉𝑘 are chosen to enforce the sufficient descent condition as shown in the following result which is an extension to Theorem 1. Theorem 2. Consider the ScFR class of methods, defined by (2), (4) and (7), and let 𝜉𝑘 = 𝜉𝑘 𝑖 or 𝜉𝑘 = 𝜉𝑘 𝑞𝑖 , for 𝑖 = 1,2,3,4. Then, the sufficient descent condition (8) holds for all k ≥ 1. Proof. It is obvious for k = 1. For k ≥ 1, 𝜉𝑘 = 𝜉𝑘 𝑖 , 1 ≤ i ≤ 4, are chosen such that inequality (12) holds which is equivalent to the sufficient descent condition (10). Since 𝜉𝑘 𝑞𝑖 ≤ 𝜉𝑘 𝑖 , 1 ≤ i ≤ 4, inequalities (11) and (12) remain satisfied so that condition (10) holds. Hence, condition (10) implies (8). □ Introducing other features to the scaling parameter 𝜉𝑘 are also possible. In particular, we could introduce the useful conjugacy condition γ𝑘 𝑇 𝑑𝑘+1= 0. This equation with (7) holds if 𝜉𝑘 = 𝜉𝑘 𝑐 ≡ γ𝑘 𝑇 𝑔𝑘+1 β𝑘γ𝑘 𝑇 𝑑𝑘 which exists if the Wolfe condition holds. Similarly if we consider the modified conjugacy condition of Dai and Liao [16], γ𝑘 𝑇 𝑑𝑘+1= td𝑘 𝑇 𝑔𝑘+1 where t ≥ 0 is a scalar, we obtain 𝜉𝑘 = (𝛾𝑘−𝑡𝑑𝑘) 𝑇𝑔𝑘+1 𝛽𝑘𝛾𝑘 𝑇𝑑𝑘 which reduces direction (7), for any βk to the class of Dai and Liao. It is worth noting that Al-Baali [6] defines a bound on 𝜉𝑘 to ensure that matrix (22) is positive definite, but we do not consider it here because it may increase the value of 𝜉𝑘 so that the sufficient descent condition (11) may not hold. 4. Global Convergence Property We now study the possibility of imposing the global convergence property lim 𝑘→∞  inf ∥∥𝑔𝑘∥∥ = 0 (31) to the ScFR methods which we proposed in the previous sections such that the sufficient descent condition (8) holds for all k ≥ 1. Therefore, we first state the following standard assumptions on the objective function. Assumption 1. 1. The level set ℒ = {x: f (x) ≤ f (x0)} is bounded, 2. T he objective function f is continuously differentiable in some neighborhood 𝒩 of ℒ, 3. The gradient 𝑔(𝑥) 𝑖𝑠 Lipschitz continuous, that is there exists a positive constant L such that ∥ 𝑔(𝑥) − 𝑔(�̃�) ∥≤ 𝐿 ∥ 𝑥 − �̃� ∥ for all 𝑥, �̃� ∈ 𝒩. (32) AMAL AL-SAIDI and MEHIDDIN AL-BAALI 146 Supposing Assumption 1 holds, Powell [7] and Zoutendijk [8] prove the ideal result that the FR method, defined by (2), (3) and (4), satisfies the global convergence property (31) if exact line searches are used. This means that impractical choice of 𝜎 = 0 in (9) is considered. Al-Baali [9], however, considers the practical strong Wolfe conditions with 𝜎 < 1 2 to show that the sufficient descent condition (8) holds and extends the above global convergence result to the practical FR method. This result has been extended by Touati-Ahmed and Storey [17] to the interval 0 ≤ βk ≤ 𝛽𝑘 𝐹𝑅, while Gilbert and Nocedal [18] have extended it to −𝛽𝑘 𝐹𝑅 ≤ 𝛽𝑘 ≤ 𝛽𝑘 𝐹𝑅. (33) Thus, we obtain the following convergence result for the ScFR methods. Theorem 3. Suppose Assumption 1 holds and let the ScFR class of methods be defined by (2), (4), (7), and |𝜉𝑘| ≤ 1. If the strong Wolfe conditions (9) are employed with 𝜎 < 1 2 , then the ScFR methods converge globally in the sense that limit (31) holds. Proof. The ScFR methods are defined in the previous sections like the FR method, except that the FR formula 𝛽𝑘 𝐹𝑅 > 0, defined by (4), is replaced by 𝛽𝑘 𝑆𝑐𝐹𝑅 = 𝜉𝑘𝛽𝑘 𝐹𝑅. Because |𝜉𝑘| ≤ 1, conditions (33) hold and hence the result follows from that of Gilbert and Nocedal [18]. □ 5. Numerical Results To show the efficiency of the proposed scaling technique for the Fletcher-Reeves method, we describe some numerical results. The results are obtained by applying the FR method with several scaled choices to a set of 46 different type of unconstrained optimization problems; their function names are given in Table 1. They belong to the CUTE [19] and Moré, Garbow and Hillstrom [20] collections of test problems. We also include the simple quadratic function of Al-Baali (see for example Al-Saidi [13]). For each type of problem, we let the dimension n varies as 2, 10, 100, 1000, 10000 so that the total number of test problems is 230. All th e ScFR methods that we consider are defined by Algorithm 1 and differ only in Step 5 for defining the scaling parameter 𝜉𝑘 unless otherwise stated. For the purposes of an accurate comparison, we use in Step 2 for all methods the Matlab line search program routine of Al- Baali (which essentially written in Fortran by Fletcher) which computes a value of the steplength αk that satisfies the strong Wolfe conditions (9) as described in Al-Baali and Fletcher [11] and Fletcher [1]. The run was stopped as in Step 1 when either ||𝑔𝑘|| ≤ 10 −6 or the number of iterations reaches 10 5 . We consider the following algorithms:  FR; the standard FR method, defined by Algorithm 1 with ρ = 10 −4 and σ = 0.1 in Step 2 and 𝜉𝑘 = 1 in Step 5. Table 1. The test functions. Number Function’s name Number Function’s name 1 A Simple Quadratic function 24 Trecanni 2 Extended White and Holst 25 Zettl 3 Extended Rosenbrock 26 Shallow 4 Extended Freudenstein and Roth 27 Generalized Quartic 5 Extended Beale 28 Axis Parallel hyper-ellipsoid 6 Extended Wood 29 Leon 7 Raydan 1 30 Generalized Tridiagonal 1 8 Generalized Tridiagonal 1 31 Generalized Tridiagonal 2 9 Diagonal 4 32 POWER 10 Extended Himmelblau 33 Quadratic QF1 11 Extended Hiebert 34 Extended quadratic penalty QP2 12 FLETCHCR 35 Dixon and Price 13 Extended Powell 36 Quartic 14 NONSCOMP 37 Diagonal4 15 Extended DENSCHNB 38 Colville 16 Extended quadratic penalty QP1 39 Schumer Steilitz 17 Extended Penalty I 40 Sphere 18 De Jong’s 41 Sum Squares 19 Hager 42 Powell Singular 20 Extended Maratos 43 Qing 21 Cube 44 Generalized PSC1 22 Three hump function 45 Perturbed Quadratic 23 Booth 46 Diagonal 2 IMPROVED FLETCHER–REEVES METHODS 147  ScFRi; same as FR except that in Step 5, 𝜉𝑘 = 𝜉𝑘 𝑖 , for i = 1, 2, 3, 4, defined by (15), (17), (20), and (21), respectively.  ScFRqi, i ≤ 4; same as ScFRi except that in Step 5, 𝜉𝑘 = 𝜉𝑘 𝑞𝑖 = min(max( 𝜉𝑘 𝑞 , ĉ) ,𝜉𝑘 𝑖) with ĉ = 0.001, where 𝜉𝑘 𝑞 is given by (28). We observed that ScFR1, ScFR4 and ScFRqi, for i = 1, 3, 4, are less efficient than the others. Therefore, we will not give further details about them. Although it is possible to study the behaviour of other methods that we considered here, we do not consider them in our comparisons because they were less efficient than the above algorithms. For a useful comparison, we used the performance profiles tool of Dolan and Moré [21], which compares some solvers on the above set of problems in terms of the number of line searches, the number of function evaluations, the number of gradient evaluations and CPU time, required to solve the problems. In general the performance profile 𝑃𝑀(𝜏), 𝜏 ≥ 0, is defined by the formula 𝑃𝑀(𝜏) = number of problems where log2(𝜏𝑃,𝑀) ≤ 𝜏 total number of problems , (34) where 𝜏𝑃,𝑀 is the performance ratio of the number of line searches (similarly for the other measures) required to solve problem p by a method M to the lowest number of line searches required to solve problem p. The ratio 𝜏𝑃,𝑀 is set to ∞ (or some large number) if the method M fails to solve problem p. The value of PM(τ) at τ = 0 gives the percentage of test problems for which the M method is the best. The value for τ large enough is the percentage of test problems that M method can solve. The relative efficiency and reliability of each method can be directly seen from the performance profiles: the higher the particular curve, the better is the corresponding method. The performance profiles for the FR, ScFR2, ScFR3 and ScFRq2 algorithms are given in Figures 1. We observe that all scaling techniques improve the performance of the FR method substantially. We also observe that ScFR2 is slightly better than ScFRq2 and both of them perform better than ScFR1. Thus, ScFR2 seems to be the best o f the ScFR algorithms. Figure 1. Comparison among FR, ScFR2, ScFR3 and ScFRq2 for σ = 0.1. (a) # Line searches (b) # Function Evaluation (c) # Gradient Evaluation (d) CPU time AMAL AL-SAIDI and MEHIDDIN AL-BAALI 148 (c) # Gradient Evaluation (d) CPU time Figure 2. Comparison among FR, ScFR2, ScFR3 and ScFRq2 for σ = 0.9. (a) # Line searches (b) # Function Evaluation To give further idea about the behaviour of these methods for a large value of σ, we repeated the run for the previous four methods but with 𝜎 = 0.9 instead of 𝜎 = 0.1 in Step 2. The comparisons are given in Figures 2. Although these figures show that the FR method failed to solve about 40% of the test problems which is expected, the three ScFR methods solved almost all problems successfully. Thus, the ScFR methods are superior to the FR method. To choose the best choice for σ, we repeated the run for the best ScFR2 method, using the values of σ = 0.1, 0.4, 0.7, 0.9 and 0.9999 (for the first value, the method is referred to as ScFR2). The comparisons are given in Figures 3. We observe that the performance of the ScFR method as improves as σ decreases in terms of the number of line searches, function evaluations and gradient evaluations. However, the choice σ = 0.1 is significantly better than the others in terms of CPU time. Thus, σ = 0.1 is the best value for all the measurements. Another useful observation is that the scaled methods solved all problems for not only 𝜎 < 1/2 but also for 𝜎 ≥ 1/2. Thus, the scaled technique improves not only the good behaviour of the FR method when σ is sufficiently small but also solved problems that cannot be solved by the FR method. We now compare the ScFR2 method with some useful CG methods such as the Polak-Ribière-Polyak [22, 23], Dai-Yuan [24] and Hager-Zhang [25] methods. These methods (referred to as PRP, DY and HZ) are defined by Algorithm 1 with 𝜉𝑘 = 1 in Step 5 and βk in Step 4 is given respectively by the following formulae: 𝛽𝑘 𝑃𝑅𝑃 = 𝛾𝑘 𝑇𝑔𝑘+1 ∥∥𝑔𝑘∥∥ 2 , (34) 𝛽𝑘 𝐷𝑌 = ∥∥𝑔𝑘+1∥∥ 2 𝑑𝑘 𝑇𝛾𝑘 (35) and IMPROVED FLETCHER–REEVES METHODS 149 (d) CPU time (c) # Gradient Evaluation 𝛽𝑘 𝐻𝑍 = (𝛾𝑘 − 2 ∥∥𝛾𝑘∥∥ 2 𝛾𝑘 𝑇𝑑𝑘 𝑑𝑘) 𝑇 𝑔𝑘+1. 𝛾𝑘 𝑇𝑑𝑘 . (36) Using 𝜎 = 0.1, the comparisons of the methods are given in Figures 4. We see that ScFR2 performs a little better than DY and a little worse than PRP and HZ. Thus, the ScFR2 method is efficient and there is a room for improving it. The above numerical results show indeed that all the ScFR methods perform well and the ScFR2 method is the best of them. 6. Conclusion We show that introducing a simple scaling technique to the well-known Fletcher-Reeves method maintains its global convergence property, improves its performance significantly and solved problems that cannot be solved by the FR method. Since the ScFR methods work well when the strong Wolfe conditions are used with σ ≥ 1/2, it is expected to extend their global convergence to σ < 1. It is also worth testing further possible choices for the FR scaled parameter (e.g., similar to that applied to the DY method by Esmaeili et al. [26]). Conflict of interest The authors declare no conflict of interest. Acknowledgment The authors would like to thank the editor and two anonymous referees for their critical and constructive comments Figure 3. ScFR2 Comparison among the values of σ = 0.1, 0.4, 0.7, 0.9 and 0.9999. (a) # Line searches (b) # Function Evaluation AMAL AL-SAIDI and MEHIDDIN AL-BAALI 150 which improve the quality of the paper. We would also like to thank Lucio Grandinetti, Calabria University, Italy, and Ekkehard Sachs, Trier University, Germany, for the useful discussions on the preparation of the paper. The first author would like to thank the Oman Ministry of Education for the award of a scholarship. Figure 4. Comparison among ScFR2, PRP, DY and HZ for σ = 0.1. References 1. Fletcher, R. Practical Methods of Optimization, 2nd edition, 2013, Wiley, Chichester, England. 2. Fletcher, R., and Reeves, C. Function minimization by conjugate gradients, Computational Journal, 1964, 7, 149-154. 3. Nocedal, J., and Wright, S. J. Numerical Optimization, 2nd Edition, 2006, Springer, London. 4. Pytlak, R. Conjugate Gradient Algorithms in Nonconvex Optimization, 2009, Springer, Berlin. 5. Hager W., and Zhang, H. A survey of nonlinear conjugate gradient methods, Pacific Journal of Optimization, 2006, 2, 35-58. 6. Al-Baali, M. A general class of conjugate gradient methods, 22 nd International Symposium on Mathematical Programming (ISMP), Pittsburg, USA, 2015. 7. Powell, M.J.D. Nonconvex minimization calculations and the conjugate gradient method. In Numerical Analysis, Lecture Notes in Mathematics (Ed. Griffiths D.F.), Springer, 1984, 1066, 122-141. 8. Zoutendijk, G. Nonlinear Programming, Computational Methods, in Integer and Nonlinear Programming (Ed. J. Abadie), North-Holland, Amsterdam, 1970, 37-86. 9. Al-Baali, M. Descent property and global convergence of the Fletcher-Reeves method with inexact line search, IMA Journal of Numerical Analysis, 1985, 5, 121-124. 10. Nocedal, J. Theory of algorithms for unconstrained optimization, Acta Numerica, 1992, 1, 199-242. 11. Al-Baali, M., and Fletcher, R. An efficient line search for nonlinear least squares, Journal of Optimization Theory and Applications, 1986, 48, 359-378. 12. Al-Saidi, A., Al-Baali, M., Fasano, G., and Roma, M. New scaled and shifted conjugate gradient methods for (a) # Line searches (b) # Function Evaluation (c) # Gradient Evaluation (d) CPU time IMPROVED FLETCHER–REEVES METHODS 151 large-scale unconstrained optimization, Technical Report DM215, Department of Mathematics, Sultan Qaboos University, Muscat, Oman, 2021. 13. Al-Saidi, A. Improved conjugate gradient methods for large-scale unconstrained optimization, PhD Thesis, Sultan Qaboos University, Muscat, Oman, 2021. 14. Narushima, Y., and Yabe, H. A survey of sufficient descent conjugate gradient methods for unconstrained optimization, SUT Journal of Mathematics, 2014, 50, 167-203. 15. Perry, J. M. A class of conjugate gradient algorithms with a two-step variable-metric memory, Discussion Paper 269, Center for Mathematical Studies in Economics and Management Sciences, Northwestern University, Evanston, Illinois, USA, 1977. 16. Dai, Y. H., and Liao, L. Z. New conjugacy conditions and related nonlinear conjugate gradient methods, Applied Mathematics and Optimization, 2001, 43, 87-101. 17. Touati-Ahmed, D., and Storey, C. Globally convergent hybrid conjugate gradient methods, Journal of Optimization Theory and Applications, 1990, 64, 379-397. 18. Gilbert, J. C., and Nocedal, J. Global convergence properties of conjugate gradient methods for optimization, SIAM Journal on Optimization, 1992, 2, 21-42. 19. Bongartz, I., Conn A. R., Gould, N., and Toint, Ph. L. CUTE: Constrained and unconstrained testing environment, ACM Transactions on Mathematical Software, 1995, 21, 123-160. 20. Moré, J. J., Garbow, B. S., Hillstrom, K. E. Testing unconstrained optimization software, ACM Transactions on Mathematical Software, 1981, 7, 17-41. 21. Dolan, E. D., Moré, J. J. Benchmarking optimization software with performance profiles, Mathematical Programming, 2002, 91, 201-213. 22. Polak, E., and Ribière, G. Note sur la convergence de directions conjugées, Rev. Francaise Informat Recherche Opertionelle, 1969, 3e Annee 16, 35-43. 23. Polyak, B. The conjugate gradient method in extreme problems, USSR Computational Mathematics and Mathematical Physics, 1969, 9, 94-112. 24. Dai, Y. H., and Yuan, Y. X. A nonlinear conjugate gradient method with a strong global convergence property, SIAM Journal on Optimization, 1999, 10, 177-182. 25. Hager, W., and Zhang, H. A new conjugate gradient method with guaranteed descent and an efficient line search. SIAM Journal on Optimization, 2005, 16, 170-192. 26. Esmaeili. H., Rostami, M., and Kimiaei. M. Extended Dai-Yuan conjugate gradient strategy for large-scale unconstrained optimization with applications to compressive sensing, Filomat, 2018, 32 (6), 2173-2191. Received 30 June 2021 Accepted 6 September 2021