SQU Journal for Science, 2021, 26(2), 141-151  DOI:10.53539/squjs.vol26iss2pp141-151 

Sultan Qaboos University  

 
141 
ϯ
Some preliminary numerical results were presented at 5

th
 International Conference on Numerical Analysis and Optimization (NAO-V), January 

2020, Muscat, Oman 

 
Improved Fletcher–Reeves Methods Based 
on New Scaling Techniques

ϯ 

Amal Al-Saidi
1,* 

and Mehiddin Al-Baali
2 

1
Department of Mathematics, Sultan Qaboos University, Muscat, Sultanate of Oman. 

2
Department 

of Mathematics, Sultan Qaboos University, Muscat, Sultanate of Oman; and GERAD-HEC, 
Montréal, Canada. *Email: Amal2.AlSaeedi@moe.om. 

ABSTRACT: This paper introduces a scaling parameter to the Fletcher-Reeves (FR) nonlinear conjugate gradient 

method. The main aim is to improve its theoretical and numerical properties when applied with inexact line searches to 

unconstrained optimization problems. We show that the sufficient descent and global convergence properties of Al-

Baali for the FR method with a fairly accurate line search are maintained. We also consider the possibility of extending 

this result to less accurate line search for appropriate values of the scaling parameter. The reported numerical results 

show that several values for the proposed scaling parameter improve the performance of the FR method significantly.  

 
Keywords: Unconstrained optimization; Large-scale optimization; Line search framework; Nonlinear conjugate 

gradient methods; Fletcher-Reeves method. 

 جديدةموازنة تقنيات إستنادا إلى ريفز المحسنة  -فليتشر طرق

 أمل السعيدي و محي الدين البعلي

ة عدديالنظرية وال الصفات. الهدف الرئيسي هو تحسين ةغيرالخطي ةطرق التدرج المترافقفي ريفز  -رفليتش لطريقة موازنةيقدم هذا البحث معامل  الملخص:

نحدارالكافي والتقارب اإلفي  البعليصفات الحفاظ على  بين إمكانيةغيرالمقيدة. ن على مسائل األمثليات ةتقريبي بحثمستقيمات عند تطبيقها مع لتلك الطريقة 

قيم مناسبة ستخدام إتقريبي ببحث  مستقيمهذه النتيجة إلى  عميمعتبارأيًضا إمكانية تاإل. نأخذ في ة معقولةدقتقريبي ببحث  مستقيممع  FRلطريقة  لامشال

 .بشكل ملحوظ  FRن أداء طريقةيتحسل موازنةالعديد من قيم معامل الوجود . تظهرالنتائج العددية موازنةلمعامل ال

 ريفز. -فليتشر، طريقة ةغيرالخطي ة، طرق التدرج المترافقالبحث  مستقيم، مسائل ذات أبعاد عالية، غيرالمقيدة األمثليات  الكلمات المفتاحية:

 
1. Introduction 

This paper is concerned with the line-search Fletcher-Reeves (FR) conjugate gradient (CG) method for solving the 
unconstrained optimization problem 

 min
𝑥∈𝑅𝑛

 𝑓(𝑥),                                                                               (1) 

where f: Rn → R is a smooth function and its gradient 𝑔(𝑥) = ∇f (x) is available for any value  of x. For a given 
starting point x1, the method defines a sequence of points {𝑥𝑘} iteratively as follows. On each iteration, once 
the function value 𝑓𝑘 = 𝑓(𝑥𝑘) and the gradient value 𝑔𝑘 = 𝑔(𝑥𝑘) are calculated, a search direction 𝑑𝑘 is 
defined such that the descent property 𝑑𝑘

𝑇𝑔𝑘 < 0 holds, assuming 𝑔𝑘 ≠ 0 since 𝑔𝑘 = 0 holds at a solution of 
problem (1). Then it is    possible to find a positive steplength αk and hence a new point 

mailto:Amal2.AlSaeedi@moe.om


AMAL AL-SAIDI
 
and MEHIDDIN AL-BAALI 

 
142 

 
 𝑥𝑘+1 = 𝑥𝑘 + 𝛼𝑘𝑑𝑘  (2) 

such that the function reduction 𝑓𝑘 − 𝑓𝑘+1 is sufficiently positive (see for example Fletcher [1]). 
In the class of CG methods, the search direction 𝑑𝑘 is defined initially by the steepest descent choice  𝑑1= −𝑔1 

that  satisfies  the  above  descent  property for k = 1 .  The other directions are defined by 

 𝑑𝑘+1 = −𝑔𝑘+1 + 𝛽𝑘𝑑𝑘, (3) 

where βk is the CG parameter. In particular, Fletcher and      Reeves [2] propose the positive value of 
 

𝛽𝑘 =
∥∥𝑔𝑘+1∥∥

2

∥∥𝑔𝑘∥∥
2   

(4) 

(referred to as 𝛽𝑘
𝐹𝑅), where || ∙ || denotes the Euclidean norm. If the exact line search equation  

 𝑑𝑘
𝑇𝑔𝑘+1 = 0  (5) 

is satisfied, which is guaranteed if the steplength αk = α
∗ is obtained by solving the minimization subproblem              

α∗ = arg min𝛼 𝑓(𝑥𝑘 + 𝛼𝑑𝑘) exactly, then the above descent property with 𝑘 replaced by 𝑘 + 1 is also satisfied. Since 
solving this subproblem exactly (referred to as exact line search) is impractical, inexact line search is usually used so 

that the descent property may not hold. In this case, choice (4) is replaced by 𝛽𝑘 = 0 (i.e., direction (3) is  reset to 
that of the steepest descent) so that the descent property holds for all k (for further detail, see for example Fletcher [1], 

Nocedal and Wright [3] and Pytlak [4]). Because this resetting may destroy the useful features of the CG method, in 

certain cases, many choices for replacing the CG parameter FR formula have been proposed (see for example Hager 

and Zhang [5] and the references therein). If exact line search is employed and f (x) is quadratic, all the 

directions generated via (3) are mutually conjugate. Hence, the gradients of 𝑓(𝑥) at the different iterates are 
mutually orthogonal,  i.e., we have 

 𝑔𝑘+1
𝑇 𝑔𝑘 = 0, (6) 

and these formulae are reduced to the FR one (see e.g.  [1, 3, 4]).  

Our aim in this paper is to maintain the global convergence property of the FR method and improve its behavior 

by scaling formula (4) by 𝜉𝑘 before using it in (3) (as in Al-Baali [6]). Therefore, the resulting scaled direction 
 

 𝑑𝑘+1 = −𝑔𝑘+1 + 𝜉𝑘𝛽𝑘𝑑𝑘  (7) 
 

would be close to the steepest descent one if 𝜉𝑘 is chosen sufficiently close to zero. Hence, by continuity, the 
corresponding class of scaled CG methods (referred to as ScFR when 𝛽𝑘 = 𝛽𝑘

𝐹𝑅) would satisfy the descent and 

convergence properties that the steepest descent method has. 

It is important to mention that Powell [7] and essentially Zoutendijk [8] show that the FR method with exact line 

searches converges globally, in the sense that lim𝑘→∞ inf ‖𝑔𝑘‖ = 0. This result has been extended by Al-Baali [9] to 
inexact line searches with a fairly accurate values of αk, known as the first practical global convergence result for the FR 

method (see for example Nocedal [10]). Al-Baali has obtained this result based on showing that the sufficient descent 

condition 

 𝑑𝑘
𝑇𝑔𝑘 ≤ −𝑐∥∥𝑔𝑘∥∥

2
       (8) 

holds for some positive constant c, assuming the steplength αk satisfies the strong Wolfe conditions 
 

                                                 𝑓𝑘+1 ≤ 𝑓𝑘 + 𝜌𝛼𝑘𝑔𝑘
𝑇𝑑𝑘,       | 𝑔𝑘+1 

𝑇 𝑑𝑘 | ≤ −𝜎𝑔𝑘
𝑇𝑑𝑘,     

                           
      (9)  

for 0 < ρ < σ < 
1

2
 .  It is worth noting that a value of 𝛼𝑘 > 0 which satisfies these conditions, for    0 < ρ <  

1

2
 and ρ < 

σ < 1, can be found in a finite number of operations whenever the above descent property holds (see for example 

Al-Baali and Fletcher [11] and Fletcher [1]). 

In Section 2, we show that the ScFR class of methods with 𝜉𝑘 ∈ (0, 1] maintains the sufficient  descent property 

that the FR method has for the strong Wolfe conditions (9) with σ < 
1

2
. We will also enforce this property for any 

line search technique with sufficiently small values of the scaling parameter ξk. In addition, that section defines some 

choices for 𝜉𝑘. In Section 3, we introduce the quasi-Newton feature to 𝜉𝑘 for the FR method as considered by Al-Saidi 
et al. [12] for the scaled CG methods. Section 4 shows that the proposed class of ScFR methods maintains the global 

convergence property that the FR method has when the strong Wolfe conditions are employed with σ < 
1

2
. In 

Section 5, we study the behavior of the proposed ScFR methods by applying them to a set of standard test 

problems. It is shown that the proposed scaling technique improves the performance of the FR method significantly 

in many cases. Finally, Section 6 concludes the paper.  

2. The ScFR Class of Methods 

We now define the scaled FR (ScFR) class of methods as suggested by Al-Baali [6] for the CG class of methods.  

The author replaces the search direction (3) by (7), for k > 1, where 0 <  𝜉𝑘 ≤ 1 is  the  scaling  parameter,  and  
maintains  𝑑1 = −𝑔1. Since  the  value  of  𝜉𝑘 = 0 reduces this direction to that of the steepest descent, it follows by 


IMPROVED FLETCHER–REEVES METHODS 

143 

 
continuity that the ScFR class of methods, with values of 𝜉𝑘 sufficiently close to zero, has the sufficient descent and 
global convergence properties that the steepest descent method has. 

       We first define some values of 𝜉𝑘 that enforce the sufficient descent condition (8) as follows (Al-Baali [6]). We 
note that this condition holds for k = 1 and c = 1. We now prove this condition for k ≥ 1 as follows. On substituting 

(7) into (8) with k replaced by k + 1, we have 
 

 𝑔𝑘+1
𝑇 𝑑𝑘+1 = −∥∥𝑔𝑘+1∥∥

2
+ 𝜉𝑘𝛽𝑘𝑑𝑘

𝑇𝑔𝑘+1 ≤ −𝑐∥∥𝑔𝑘+1∥∥
2
. (10) 

Hence, 

 𝜉𝑘𝛽𝑘(𝑑𝑘
𝑇𝑔𝑘+1) ≤ (1 − 𝑐)∥∥𝑔𝑘+1∥∥

2
 (11) 

or equivalently by the FR formula (4),  

 𝜉𝑘(𝑑𝑘
𝑇𝑔𝑘+1) ≤ (1 − 𝑐)∥∥𝑔𝑘∥∥

2
, (12) 

where 0 < c ≤ 1. This upper bound on c is included because condition (11) cannot hold for c > 1 and 𝑑𝑘
𝑇𝑔𝑘+1= 0 (the 

exact line search option). We note that (11) holds if its left hand side is nonpositive or either 𝜉𝑘, 𝑑𝑘
𝑇𝑔𝑘+1 or βk is 

sufficiently close to zero. 

This paper assumes that the CG parameter βk is given by the FR formula (4), unless otherwise stated, and defines 

𝜉𝑘 such that (11) holds whether dk satisfies the sufficient descent condition (8) or not. For convenience, we start with 
the following result of Al-Saidi et al. [12] assuming βk satisfies (8). 

Theorem 1. Consider the ScFR class of methods, defined by (2), (4) and (7) and assume that the       search direction 

(7) with 𝜉𝑘= 1 satisfies the sufficient descent condition (8). Then, this condition remains satisfied for 0 ≤ 𝜉𝑘 ≤ 1. 
Proof.  When k  = 1, 𝑑𝑘= −𝑔𝑘 and hence condition (8) holds with c = 1.  For k ≥ 1, it follows from (7) that 
 

 𝑔𝑘+1
𝑇 𝑑𝑘+1 = −∥∥𝑔𝑘+1∥∥

2
+ 𝜉𝑘𝛽𝑘(𝑑𝑘

𝑇𝑔𝑘+1). (13) 

If 𝛽𝑘(𝑑𝑘
𝑇𝑔𝑘+1) < 0, it follows that 

 
𝑔𝑘+1
𝑇 𝑑𝑘+1 ≤ −∥∥𝑔𝑘+1∥∥

2
,  

which is (8) with c = 1 and k replaced by k + 1,  since 𝜉𝑘  ≥ 0. Otherwise, if 𝛽𝑘(𝑑𝑘
𝑇𝑔𝑘+1) ≥ 0, then (13) implies 

 
 𝑔𝑘+1
𝑇 𝑑𝑘+1 ≤ −∥∥𝑔𝑘+1∥∥

2
+ 𝛽𝑘(𝑑𝑘

𝑇𝑔𝑘+1), 
 

since 𝜉𝑘 ≤ 1. Hence, by the theorem assumption we obtain (8) with k replaced by k + 1.                                □ 
 

      We note that this result remains valid for any value of c > 0 if the choice 𝜉𝑘 = 1 is used when 𝛽𝑘(dk
T 𝑔𝑘+1) ≤ 0 as  

we consider  below  for  some  choices of 𝜉𝑘. It is important to note that if the strong Wolfe conditions (9) are 

employed with σ < 1/2, then the sufficient descent condition (8) holds for c = 𝑐σ ≡ 
1−2σ 

1−σ
 (see Al-Baali [9], for detail). 

Thus, for any choice of c ≤ cσ,  𝜉𝑘 = 1 and the ScFR method is reduced to the standard FR method. However, using 
values of c > 𝑐σ  require defining  𝜉𝑘 < 1, for which we suggest the following way. 
       If condition (11) holds with 𝜉𝑘 = 1, then we use this value so that the CG search direction is unchanged. 
Otherwise, we choose 𝜉𝑘 such that condition (11) holds with equality. Thus, we     choose 
 

𝜉𝑘 = {

(1 − 𝑐)∥∥𝑔𝑘+1∥∥
2

𝛽𝑘(𝑑𝑘
𝑇𝑔𝑘+1)

     if 𝛽𝑘(𝑑𝑘
𝑇𝑔𝑘+1) > (1 − 𝑐)∥∥𝑔𝑘+1∥∥

2
,

1      otherwise,

 
(14) 

or equivalently for the ScFR method, 

 
𝜉𝑘
1 = {

(1 − 𝑐)∥∥𝑔𝑘∥∥
2

𝑑𝑘
𝑇𝑔𝑘+1

     if 𝑑𝑘
𝑇𝑔𝑘+1 > (1 − 𝑐)∥∥𝑔𝑘∥∥

2
,

1      otherwise. 

 
(15) 

In this case, it follows that 
 

𝜉𝑘
1𝛽𝑘 = {

(1 − 𝑐)∥∥𝑔𝑘+1∥∥
2

𝑑𝑘
𝑇𝑔𝑘+1

     if 𝑑𝑘
𝑇𝑔𝑘+1 > (1 − 𝑐)∥∥𝑔𝑘∥∥

2
,

𝛽𝑘
𝐹𝑅      otherwise

 
(16) 

which shows that scaling is employed only when αk is sufficiently remote away from the right hand side of a 

minimizer α∗. To be precise,  𝜉𝑘  = 1 is used when  𝛽𝑘(𝑑𝑘
𝑇𝑔𝑘+1) ≤ 0, although in this case (11) holds for any value of 

𝜉𝑘 > 0. To obtain the least change in the CG parameter, c should be chosen close to zero (in practice, the value 
of c = 0.001 seems reasonable as observed by Al-Saidi [13]). We also note that  𝜉𝑘

1= 1 usually is used for a fairly 

accurate line search (σ ≤ 0.1). Thus, we consider the outline of the scaled FR (ScFR) method as in Algorithm 1 

that is reduced to the standard FR method if 𝜉𝑘=1 for all k. If in Step 2 exact line searches (or nearly so) are 
employed for all k, then Algorithm 1 is reduced to the standard FR method. In practice, we observed that the above 

scaling technique improves the performance of the FR method significantly in several cases (see Section 5, for details). 


AMAL AL-SAIDI
 
and MEHIDDIN AL-BAALI 

 
144 

 
Algorithm 1 (ScFR) 

Step 0:  Given ε > 0, 0 < c ≤ 1 and x1, let 𝑑1= −𝑔1 and set k = 1 
Step 1:  Stop if ||𝑔1||≤ ε 
Step 2: Compute a positive steplength αk which satisfies certain standard conditions 

Step 3: Compute a new point by (2);  𝑥𝑘+1 = 𝑥𝑘 + 𝛼𝑘𝑑𝑘   
Step 4: Define the FR parameter βk by (4)  

Step 5: Define 𝜉𝑘  (e.g., by (14)) 
Step 6:  Compute a new search direction by (7);  𝑑𝑘+1 = −𝑔𝑘+1 + 𝜉𝑘𝛽𝑘𝑑𝑘 
Step 7: Set k := k + 1 and go to Step 1. 

 
It is important to realize that the choice of 𝜉𝑘
1 in (15) is independent of the line search technique. 

However, if the strong Wolfe conditions (9) are employed, the first case of (15) could be replaced by a smaller value 

to obtain  
 

𝜉𝑘
2 = {

(1 − 𝑐)∥∥𝑔𝑘∥∥
2

𝜎|𝑑𝑘
𝑇𝑔𝑘|

     if 𝑑𝑘
𝑇𝑔𝑘+1 > (1 − 𝑐)∥∥𝑔𝑘∥∥

2
,

1      otherwise. 

 
(17) 

 
This choice satisfies the sufficient descent condition (11) with 𝜉𝑘 = 𝜉𝑘
2, because 𝜉𝑘

2 ≤ ξk
1 and we have 𝑑𝑘

𝑇𝑔𝑘+1 > 0  and 

ξk
2(𝑑𝑘

𝑇𝑔𝑘+1) ≤ ξ1(𝑑𝑘
𝑇𝑔𝑘+1) = (1 − c) ∥∥𝑔𝑘∥∥

2
. Indeed, replacing 𝜉𝑘

2 by a smaller value, which we consider below, 

maintains the sufficient descent condition (12). In practice, for c = 0.001, we observed that choice (17) works better 

than (15). 

We now consider the rearrangement of Al-Baali [6] for the FR parameter as 
 

 𝛽𝑘 = 𝜂𝑘
𝑇𝑔𝑘+1,       𝜂𝑘 =

𝑔𝑘+1

∥∥𝑔𝑘∥∥
2 ,    (18) 

 
which is possible for most CG methods (e.g., see Hager and Zhang [5] and Narushima and Yabe [14] and the 

references therein). Using this form and observing that 
 

 𝑑𝑘
𝑇𝑔𝑘+1 ≤ ∥∥𝑑𝑘∥∥∥∥𝑔𝑘+1∥∥, (19) 

the first case of (15) can be reduced to obtain 
 

𝜉𝑘
3 = {

(1−𝑐)∥∥𝑔𝑘∥∥
2

∥∥𝑑𝑘∥∥∥∥𝑔𝑘+1∥∥
     if 𝑑𝑘

𝑇𝑔𝑘+1 > (1 − 𝑐)∥∥𝑔𝑘∥∥
2

1     otherwise. 
, 

(20) 

  
To increase the interval for the scaling parameter, we use (19) again in the condition of (20) so that we obtain 

the following choice of Al-Baali [6]: 
 

𝜉𝑘
4 = {

(1−𝑐)∥∥𝑔𝑘∥∥
2

∥∥𝑑𝑘∥∥∥∥𝑔𝑘+1∥∥
     if ∥∥𝑑𝑘∥∥∥∥𝑔𝑘+1∥∥ > (1 − 𝑐)∥∥𝑔𝑘∥∥

2

1     otherwise .
, 

(21) 

We note that the interval for using 𝜉𝑘 < 1 in (21) is larger than or equal to that used by (20) so  that sufficiently 
large values of c might be used. Indeed, in practice, we observed that choice (17) is preferable to (21). Thus, we 

will not provide further details about (21). 

3. A Quasi-Newton Feature 

One feature for choices (20) and (21) is given by Al-Baali [6] who shows that the values of 𝜉𝑘 ≤ 
(1−𝑐)

∥∥η𝑘∥∥∥∥d𝑘∥∥
, 

where η for the FR formula is given in (18), enforce some positive eigenvalues of the ScCG matrix 
 

  𝐻𝑘+1 = 𝐼 − 𝜉𝑘𝑑𝑘𝜂𝑘
𝑇,    

 
(22) 

noting that direction (7) (similar to that of Perry [15]) can be rearranged as follows 
 

 𝑑𝑘+1 = −𝐻𝑘+1𝑔𝑘+1. 
 

(23) 

Thus, we introduce another feature to the ScFR method like that of Al-Saidi et al. [12] for the CG  methods by 

modifying the scaling parameter 𝜉𝑘 such that ideally the quasi-Newton condition 
 

 𝐻𝑘+1𝛾𝑘 = 𝛿𝑘 , (24) 
Where 
 

 𝛿𝑘 = 𝑥𝑘+1 − 𝑥𝑘,     𝛾𝑘 = 𝑔𝑘+1 − 𝑔𝑘, (25) 
 

holds, assuming 𝐻𝑘+1 approximates the inverse of Hessian of 𝑓(𝑥) at 𝑥𝑘. Because, in general, we cannot fulfill 
condition (24), we let 


IMPROVED FLETCHER–REEVES METHODS 

145 

 
             𝜉𝑘 = arg min
𝜉
 ∥∥𝐻𝑘+1𝛾𝑘 − 𝛿𝑘∥∥ = arg min 

𝜉
 ∥∥𝛾𝑘 − 𝛿𝑘 − 𝜉(𝜂𝑘

𝑇𝛾𝑘)𝑑𝑘∥∥. 
 

(26) 

Solving this minimization problem (see for example Al-Saidi et al. [12]), it follows that 
 

                                                                𝜉𝑘

𝑞
=

(𝛾𝑘−𝛿𝑘)
𝑇𝑑𝑘

𝜂𝑘
𝑇𝛾𝑘∥∥𝑑𝑘∥∥

2 . 
(27) 

On substituting 𝜂𝑘 for the FR formula (18), (27) is reduced to 
 

𝜉𝑘
𝑞
=

(𝛾𝑘−𝛿𝑘)
𝑇𝑑𝑘∥∥𝑔𝑘∥∥

2

𝛾𝑘
𝑇𝑔𝑘+1∥∥𝑑𝑘∥∥

2 . 
(28) 

To maintain the search direction differs from that of the steepest descent, the values of 𝜉𝑘 should be bounded 
away from zero. Since, in addition, the first case of (15) enforces the sufficient descent condition (12) with equality, 

and that replacing 𝜉𝑘 by a smaller value keeps inequality (12) holds, we may enforce 𝜉𝑘 ∈ [ĉ, 1],  where ĉ is a small 
positive number. Thus, we modify (15), which defines 𝜉𝑘

1, to 
 

𝜉𝑘
𝑞1
=

{
 
 
(1 − 𝑐)∥∥𝑔𝑘∥∥

2

𝑑𝑘
𝑇𝑔𝑘+1

      if 𝑑𝑘
𝑇𝑔𝑘+1 > (1 − 𝑐)∥∥𝑔𝑘∥∥

2
 and 𝜉𝑘

𝑞
∉ [�̂�,1],

min{
(1 − 𝑐)∥∥𝑔𝑘∥∥

2

𝑑𝑘
𝑇𝑔𝑘+1

,𝜉𝑘
𝑞
}     if 𝑑𝑘

𝑇𝑔𝑘+1 > (1 − 𝑐)∥∥𝑔𝑘∥∥
2
 and 𝜉𝑘

𝑞
∈ [�̂�,1],

1      otherwise. 

 
(29) 

and modify (17), which defines 𝜉𝑘
2, to 

 
𝜉𝑘
𝑞2
=

{
 
 
(1 − 𝑐)∥∥𝑔𝑘∥∥

2

𝜎|𝑑𝑘
𝑇𝑔𝑘|

      if 𝑑𝑘
𝑇𝑔𝑘+1 > (1 − 𝑐)∥∥𝑔𝑘∥∥

2
 and 𝜉𝑘

𝑞
∉ [�̂�,1],

min{
(1 − 𝑐)∥∥𝑔𝑘∥∥

2

𝜎|𝑑𝑘
𝑇𝑔𝑘|

,𝜉𝑘
𝑞
}     if 𝑑𝑘

𝑇𝑔𝑘+1 > (1 − 𝑐)∥∥𝑔𝑘∥∥
2
 and 𝜉𝑘

𝑞
∈ [�̂�,1],

1      otherwise. 

 
(30) 

Since choices (29) and (30) can be rewritten as  𝜉𝑘
𝑞𝑖
= min(max( 𝜉𝑘

𝑞
, ĉ) ,𝜉𝑘

𝑖), for 𝑖 = 1,2 respectively, we observe that 

𝜉𝑘
𝑞𝑖
≤ 𝜉𝑘

𝑖 . Similarly, we modify any 𝜉𝑘
𝑖 , for i = 1, 2, 3, 4, to 𝜉𝑘

𝑞𝑖
= min(max(𝜉𝑘

𝑞
, ĉ), 𝜉𝑘

𝑖 ). In practice, we observed that 

these modified choices with 𝜎 =  0.1 and ĉ = 0.001 improves the performance of 𝜉𝑘
2 (further possible choices and 

details can be seen in Al-Saidi [13]). Therefore, the above choices for 𝜉𝑘 are chosen to enforce the sufficient descent 
condition as shown in the following result which is an extension to Theorem 1. 

Theorem 2. Consider the ScFR class of methods, defined by (2), (4) and (7), and let 𝜉𝑘 = 𝜉𝑘
𝑖  or 𝜉𝑘 = 𝜉𝑘

𝑞𝑖
, for 

𝑖 =  1,2,3,4. Then, the sufficient descent condition (8) holds for all k ≥ 1. 

Proof. It is obvious for k = 1. For k ≥ 1, 𝜉𝑘 = 𝜉𝑘
𝑖  , 1 ≤ i ≤ 4, are chosen such that inequality (12) holds which is 

equivalent to the sufficient descent condition (10). Since 𝜉𝑘
𝑞𝑖

 ≤        𝜉𝑘
𝑖 , 1 ≤ i ≤ 4, inequalities (11) and (12) remain 

satisfied so that condition (10) holds. Hence, condition (10) implies (8).                                                    □ 

Introducing other features to the scaling parameter 𝜉𝑘 are also possible. In particular, we could introduce the 

useful conjugacy condition γ𝑘
𝑇 𝑑𝑘+1= 0. This equation with (7) holds if 𝜉𝑘 = 𝜉𝑘

𝑐 ≡
γ𝑘
𝑇 𝑔𝑘+1 

β𝑘γ𝑘
𝑇 𝑑𝑘 

 which exists if the Wolfe 

condition holds. Similarly if we consider the modified conjugacy condition of Dai and Liao [16], γ𝑘
𝑇 𝑑𝑘+1= td𝑘

𝑇 𝑔𝑘+1 

where t ≥ 0 is a scalar, we obtain 𝜉𝑘 =
(𝛾𝑘−𝑡𝑑𝑘)

𝑇𝑔𝑘+1

𝛽𝑘𝛾𝑘
𝑇𝑑𝑘

 which reduces direction (7), for any βk to the class of Dai and 

Liao. 

 It is worth noting that Al-Baali [6] defines a bound on 𝜉𝑘 to ensure that matrix (22) is   positive definite, but we do 
not consider it here because it may increase the value of  𝜉𝑘 so that the sufficient descent condition (11) may not hold. 

4. Global Convergence Property 

We now study the possibility of imposing the global convergence property 
 

 lim
𝑘→∞

 inf  ∥∥𝑔𝑘∥∥ = 0 (31) 
   

to the ScFR methods which we proposed in the previous sections such that the sufficient descent condition (8) holds 

for all k ≥ 1. Therefore, we first state the following standard assumptions    on the objective function. 

Assumption 1. 

1. The level set ℒ = {x: f (x) ≤ f (x0)} is bounded, 
2. T he objective function f is continuously differentiable in some neighborhood 𝒩 of ℒ,   
3. The gradient 𝑔(𝑥) 𝑖𝑠 Lipschitz continuous,  that is there exists a positive constant L such that 

 ∥ 𝑔(𝑥) − 𝑔(�̃�) ∥≤ 𝐿 ∥ 𝑥 − �̃� ∥  for all 𝑥, �̃� ∈ 𝒩. (32) 
 

AMAL AL-SAIDI
 
and MEHIDDIN AL-BAALI 

 
146 

 
     Supposing Assumption 1 holds, Powell [7] and Zoutendijk [8] prove the ideal result that the FR method, defined by 

(2), (3) and (4), satisfies the global convergence property (31) if exact line searches are used. This means that 

impractical choice of 𝜎 = 0 in (9) is considered. Al-Baali [9], however, considers the practical strong Wolfe 

conditions with 𝜎 <
1

2
  to show that the sufficient descent condition (8) holds and extends the above global 

convergence result to  the practical FR method. This result has been extended by Touati-Ahmed and Storey [17] to 

the interval 0 ≤ βk ≤  𝛽𝑘
𝐹𝑅, while Gilbert and Nocedal [18] have extended it to 

 
 −𝛽𝑘
𝐹𝑅 ≤ 𝛽𝑘 ≤ 𝛽𝑘

𝐹𝑅. (33) 
 

Thus, we obtain the following convergence result for the ScFR methods. 

Theorem 3. Suppose Assumption 1 holds and let the ScFR class of methods be defined by (2), (4),  (7), and |𝜉𝑘| ≤ 1. 

If the strong Wolfe conditions (9) are employed with 𝜎 <
1

2
 , then the ScFR methods converge globally in the 

sense that limit (31) holds. 

Proof. The ScFR methods are defined in the previous sections like the FR method, except that    the FR formula 

𝛽𝑘
𝐹𝑅 > 0, defined by (4), is replaced by 𝛽𝑘

𝑆𝑐𝐹𝑅 = 𝜉𝑘𝛽𝑘
𝐹𝑅. Because |𝜉𝑘| ≤ 1, conditions (33) hold and hence the 

result follows from that of Gilbert and Nocedal [18].                                                                                   □ 

5.  Numerical Results 

To show the efficiency of the proposed scaling technique for the Fletcher-Reeves method, we    describe some 

numerical results. The results are obtained by applying the FR method with several scaled choices to a set of 46 

different type of unconstrained optimization problems; their function names are given in Table 1. They belong to the 

CUTE [19] and Moré, Garbow and Hillstrom [20] collections of test problems. We also include the simple quadratic 

function of Al-Baali (see for example Al-Saidi [13]). For each type of problem, we let the dimension n varies as 2, 

10, 100, 1000, 10000 so that the total number of test problems is 230. All th e ScFR methods that we consider are 

defined by Algorithm 1 and differ only in Step 5 for defining the scaling parameter 𝜉𝑘 unless otherwise stated. For the 
purposes of an accurate comparison, we use in Step 2 for all methods the Matlab line search program routine of Al-

Baali (which essentially written in Fortran by Fletcher) which computes a value of the steplength αk that satisfies the 

strong Wolfe conditions (9) as described in Al-Baali and Fletcher [11] and Fletcher [1]. The run was stopped as in  

Step 1 when either ||𝑔𝑘|| ≤ 10
−6

 or the number of iterations reaches 10
5
.  We consider the following algorithms: 

 FR; the standard FR method, defined by Algorithm 1 with ρ = 10
−4

 and σ = 0.1 in Step 2  and 𝜉𝑘 = 1 in 
Step 5. 

 
      Table 1. The test functions. 

 
Number Function’s name Number Function’s name 

1 A Simple Quadratic function 24 Trecanni 

2 Extended White and Holst 25 Zettl 

3 Extended Rosenbrock 26 Shallow 

4 Extended Freudenstein and Roth 27 Generalized Quartic 

5 Extended Beale 28 Axis Parallel hyper-ellipsoid 

6 Extended Wood 29 Leon 

7 Raydan 1 30 Generalized Tridiagonal 1 

8 Generalized Tridiagonal 1 31 Generalized Tridiagonal 2 

9 Diagonal 4 32 POWER 

10 Extended Himmelblau 33 Quadratic QF1 

11 Extended Hiebert 34 Extended quadratic penalty QP2 

12 FLETCHCR 35 Dixon and Price 

13 Extended Powell 36 Quartic 

14 NONSCOMP 37 Diagonal4 

15 Extended DENSCHNB 38 Colville 

16 Extended quadratic penalty QP1 39 Schumer Steilitz 

17 Extended Penalty I 40 Sphere 

18 De Jong’s 41 Sum Squares 

19 Hager 42 Powell Singular 

20 Extended Maratos 43 Qing 

21 Cube 44 Generalized PSC1 

22 Three hump function 45 Perturbed Quadratic 

23 Booth 46 Diagonal 2 

 
IMPROVED FLETCHER–REEVES METHODS 

147 

 
 ScFRi; same as FR except that in Step 5, 𝜉𝑘 = 𝜉𝑘
𝑖 , for i = 1, 2, 3, 4, defined by (15), (17), (20), and (21), 

respectively. 

 ScFRqi,  i ≤ 4; same as ScFRi except that in Step 5,  𝜉𝑘 =  𝜉𝑘
𝑞𝑖
= min(max( 𝜉𝑘

𝑞
, ĉ) ,𝜉𝑘

𝑖) with ĉ = 0.001, where 

𝜉𝑘
𝑞
 is given by (28). 

We observed that ScFR1, ScFR4 and ScFRqi, for i = 1, 3, 4, are less efficient than the others. Therefore, 

we will not give further details about them. Although it is possible to study the behaviour of other methods that we 

considered here, we do not consider them in our comparisons because they were less efficient than the above 

algorithms.  

For a useful comparison, we used the performance profiles tool of Dolan and Moré [21], which compares some 

solvers on the above set of problems in terms of the number of line searches, the number of function evaluations, the 

number of gradient evaluations and CPU time, required to solve the problems. In general the performance profile 

𝑃𝑀(𝜏), 𝜏 ≥ 0, is defined by the formula  
 

𝑃𝑀(𝜏) =
 number of problems where log2(𝜏𝑃,𝑀) ≤ 𝜏

 total number of problems 
, 

(34) 

 
where  𝜏𝑃,𝑀 is the performance ratio of the number of line searches (similarly for the other measures) required to solve 

problem p by a method M to the lowest number of line searches required to solve problem p. The ratio 𝜏𝑃,𝑀 is set to ∞ 
(or some large number) if the method M fails to solve problem p. The value of PM(τ) at τ = 0 gives the percentage of 
test problems for which the M method  is the best. The value for τ large enough is the percentage of test problems that 

M method can solve. The relative efficiency and reliability of each method can be directly seen from the performance 

profiles: the higher the particular curve, the better is the corresponding method. 

The performance profiles for the FR, ScFR2, ScFR3 and ScFRq2 algorithms are given in Figures 1. We observe 

that all scaling techniques improve the performance of the FR method substantially. We also observe that ScFR2 is 

slightly better than ScFRq2 and both of them perform better than ScFR1. Thus, ScFR2 seems to be the best o f the 

ScFR algorithms. 

 
            Figure 1. Comparison among FR, ScFR2, ScFR3 and ScFRq2 for σ = 0.1. 

 
(a) # Line searches  (b) # Function Evaluation 

 
(c) # Gradient Evaluation  (d) CPU time 


AMAL AL-SAIDI
 
and MEHIDDIN AL-BAALI 

 
148 

 
(c) # Gradient Evaluation  

 
(d) CPU time 

Figure 2. Comparison among FR, ScFR2, ScFR3 and ScFRq2 for σ = 0.9. 

 
(a) # Line searches

  
 (b) # Function Evaluation 

 
       To give further idea about the behaviour of these methods for a large value of σ, we repeated the run for the 

previous four methods but with 𝜎 =  0.9 instead of 𝜎 =  0.1 in Step 2. The comparisons are given in Figures 2. 
Although these figures show that the FR method failed to solve about 40% of the test problems which is expected, 

the three ScFR methods solved almost all problems successfully. Thus, the ScFR methods are superior to the FR 

method. 

       To choose the best choice for σ, we repeated the run for the best ScFR2 method, using the values of σ = 0.1, 0.4, 

0.7, 0.9 and 0.9999 (for the first value, the method is referred to as ScFR2). The comparisons are given in Figures 

3. We observe that the performance of the ScFR method as improves as σ decreases in terms of the number of line 

searches, function evaluations and gradient evaluations. However, the choice σ = 0.1 is significantly better than the 

others in terms of CPU time. Thus, σ = 0.1 is the best value for all the measurements.  Another useful observation is 

that the scaled methods solved all problems for not only 𝜎 < 1/2  but also for 𝜎 ≥  1/2. Thus, the scaled technique 
improves not only the good behaviour of the FR method when σ is sufficiently small but also solved problems that 

cannot be solved by the FR method. 

We now compare the ScFR2 method with some useful CG methods such as the Polak-Ribière-Polyak [22, 23], 

Dai-Yuan [24] and Hager-Zhang [25] methods. These methods (referred to as PRP, DY and HZ) are defined by 

Algorithm 1 with 𝜉𝑘 = 1 in Step 5 and βk in Step 4 is given respectively by the following formulae: 
 

𝛽𝑘
𝑃𝑅𝑃 =

𝛾𝑘
𝑇𝑔𝑘+1

∥∥𝑔𝑘∥∥
2
, 

(34) 

 
𝛽𝑘
𝐷𝑌 =

∥∥𝑔𝑘+1∥∥
2

𝑑𝑘
𝑇𝛾𝑘

 
(35) 

and 


IMPROVED FLETCHER–REEVES METHODS 

149 

 
(d) CPU time (c) # Gradient Evaluation  

𝛽𝑘
𝐻𝑍 = (𝛾𝑘 − 2

∥∥𝛾𝑘∥∥
2

𝛾𝑘
𝑇𝑑𝑘

𝑑𝑘)

𝑇
𝑔𝑘+1.

𝛾𝑘
𝑇𝑑𝑘

. 
  (36) 

Using 𝜎 =  0.1, the comparisons of the methods are given in Figures 4. We see that ScFR2 performs a little better than 
DY and a little worse than PRP and HZ. Thus, the ScFR2 method is efficient and there is a room for improving it. The 

above numerical results show indeed that all the ScFR methods perform well and the ScFR2 method is the best of them. 

 
6. Conclusion 

We show that introducing a simple scaling technique to the well-known Fletcher-Reeves method maintains its global 

convergence property, improves its performance significantly and solved problems that cannot be solved by the FR method. 

Since the ScFR methods work well when the strong Wolfe conditions are used with σ ≥ 1/2, it is expected to extend their 

global convergence to σ < 1. It is also worth testing further possible choices for the FR scaled parameter (e.g., similar to 

that applied to the DY method by Esmaeili et al. [26]). 

Conflict of interest 

The authors declare no conflict of interest. 

Acknowledgment 

The authors would like to thank the editor and two anonymous referees for their critical and constructive comments  

 
Figure 3. ScFR2 Comparison among the values of σ = 0.1, 0.4, 0.7, 0.9 and 0.9999. 

(a) # Line searches

  
 (b) # Function Evaluation 

 
AMAL AL-SAIDI
 
and MEHIDDIN AL-BAALI 

 
150 

 
which improve the quality of the paper. We would also like to thank Lucio Grandinetti, Calabria University, Italy, and 

Ekkehard Sachs, Trier University, Germany, for the useful discussions on the preparation of the paper. The first author 

would like to thank the Oman Ministry of Education for the award of a scholarship. 

 
Figure 4. Comparison among ScFR2, PRP, DY and HZ for σ = 0.1. 

References 

1. Fletcher, R. Practical Methods of Optimization, 2nd edition, 2013, Wiley, Chichester, England. 
2. Fletcher, R., and Reeves, C. Function minimization by conjugate gradients, Computational Journal, 1964, 7, 

149-154. 

3. Nocedal, J., and Wright, S. J. Numerical Optimization, 2nd Edition, 2006, Springer, London. 
4. Pytlak, R. Conjugate Gradient Algorithms in Nonconvex Optimization, 2009, Springer, Berlin. 
5. Hager W., and Zhang, H. A survey of nonlinear conjugate gradient methods, Pacific Journal of 

Optimization, 2006, 2, 35-58. 

6. Al-Baali, M. A general class of conjugate gradient methods, 22
nd

 International Symposium on Mathematical 
Programming (ISMP), Pittsburg, USA, 2015. 

7. Powell, M.J.D. Nonconvex minimization calculations and the conjugate gradient method. In Numerical Analysis, 
Lecture Notes in Mathematics (Ed. Griffiths D.F.), Springer, 1984, 1066, 122-141. 

8. Zoutendijk, G. Nonlinear Programming, Computational Methods, in Integer and Nonlinear Programming 
(Ed. J. Abadie), North-Holland, Amsterdam, 1970, 37-86. 

9. Al-Baali, M. Descent property and global convergence of the Fletcher-Reeves method with inexact line search, 
IMA Journal of Numerical Analysis, 1985, 5, 121-124. 

10. Nocedal, J. Theory of algorithms for unconstrained optimization, Acta Numerica, 1992, 1, 199-242. 
11. Al-Baali, M., and Fletcher, R. An efficient line search for nonlinear least squares, Journal of Optimization Theory 

and Applications, 1986, 48, 359-378. 

12. Al-Saidi, A., Al-Baali, M., Fasano, G., and Roma, M. New scaled and shifted conjugate gradient methods for 

 
(a) # Line searches  (b) # Function Evaluation 

 
(c) # Gradient Evaluation  (d) CPU time 


IMPROVED FLETCHER–REEVES METHODS 

151 

 
large-scale unconstrained optimization, Technical Report DM215, Department of Mathematics, Sultan Qaboos 

University, Muscat, Oman, 2021. 

13. Al-Saidi, A. Improved conjugate gradient methods for large-scale unconstrained optimization, PhD Thesis, Sultan 
Qaboos University, Muscat, Oman, 2021. 

14. Narushima, Y., and Yabe, H. A survey of sufficient descent conjugate gradient methods for unconstrained 
optimization, SUT Journal of Mathematics, 2014, 50, 167-203. 

15. Perry, J. M. A class of conjugate gradient algorithms with a two-step variable-metric memory, Discussion Paper 
269, Center for Mathematical Studies in Economics and Management Sciences, Northwestern University, 

Evanston, Illinois, USA, 1977. 

16. Dai, Y. H., and Liao, L. Z. New conjugacy conditions and related nonlinear conjugate gradient methods, 
Applied Mathematics and Optimization, 2001, 43, 87-101. 

17. Touati-Ahmed, D., and Storey, C. Globally convergent hybrid conjugate gradient methods, Journal of 
Optimization Theory and Applications, 1990, 64, 379-397. 

18. Gilbert, J. C., and Nocedal, J. Global convergence properties of conjugate gradient methods for optimization, 
SIAM Journal on Optimization, 1992, 2, 21-42. 

19. Bongartz, I., Conn A. R., Gould, N., and Toint, Ph. L. CUTE: Constrained and unconstrained testing 
environment, ACM Transactions on Mathematical Software, 1995, 21, 123-160. 

20. Moré, J. J., Garbow, B. S., Hillstrom, K. E. Testing unconstrained optimization software, ACM Transactions 
on Mathematical Software, 1981, 7, 17-41. 

21. Dolan, E. D., Moré, J. J. Benchmarking optimization software with performance profiles, Mathematical 
Programming, 2002, 91, 201-213. 

22. Polak, E., and Ribière, G. Note sur la convergence de directions conjugées, Rev. Francaise Informat 
Recherche Opertionelle, 1969, 3e Annee 16, 35-43. 

23. Polyak, B. The conjugate gradient method in extreme problems, USSR Computational Mathematics and 
Mathematical Physics, 1969, 9, 94-112. 

24. Dai, Y. H., and Yuan, Y. X. A nonlinear conjugate gradient method with a strong global convergence 
property, SIAM Journal on Optimization, 1999, 10, 177-182. 

25. Hager, W., and Zhang, H. A new conjugate gradient method with guaranteed descent and an efficient line 
search. SIAM Journal on Optimization, 2005, 16, 170-192. 

26. Esmaeili. H., Rostami, M., and Kimiaei. M. Extended Dai-Yuan conjugate gradient strategy for large-scale 
unconstrained optimization with applications to compressive sensing, Filomat, 2018, 32 (6), 2173-2191. 

 
Received   30 June 2021 

Accepted   6 September 2021