SQU Journal for Science, 2021, 26(2), 126-140. DOI: 10.53539/squjs.vol26iss2pp126-140

Improved Diagonal Hessian Approximations for Large-Scale Unconstrained Optimization

Ahmed Al-Siyabi* and Mehiddin Al-Baali
Department of Mathematics, College of Science, Sultan Qaboos University, P.O. Box 36, PC 123, Al-Khod, Muscat, Sultanate of Oman. *Email: aalsiyabi@squ.edu.om.

ABSTRACT: We consider some diagonal quasi-Newton methods for solving large-scale unconstrained optimization problems. A simple and effective approach for diagonal quasi-Newton algorithms is presented by proposing new updates of the diagonal entries of the Hessian approximation. Moreover, we suggest employing an extra BFGS update of the diagonal updating matrix and using its diagonal again. Numerical experiments on a collection of standard test problems show, in particular, that the proposed diagonal quasi-Newton methods perform substantially better than certain available diagonal methods.

Keywords: Unconstrained optimization; Large-scale problems; Quasi-Newton methods; BFGS update.

1. Introduction

This paper is concerned with quasi-Newton methods for solving the large-scale unconstrained optimization problem
$$\min_{x \in \Re^n} f(x), \qquad (1.1)$$
where $f: \Re^n \rightarrow \Re$ is a twice continuously differentiable function. It is assumed that $n$ is large, so that a matrix cannot be stored explicitly. The methods are defined like the Newton method with a line search framework, except that the Hessian $G(x_k) = \nabla^2 f(x_k)$ is replaced by a symmetric and positive definite matrix $B_k$. This Hessian approximation satisfies the so-called quasi-Newton condition for $k > 1$, assuming $B_1$ is given (usually, $B_1 = I$, the identity matrix).

The quasi-Newton methods are defined iteratively as follows. At the beginning of each iteration, $x_k$ is available ($x_1$ is given) so that the gradient $g(x_k) = \nabla f(x_k)$ is computed. If this vector (denoted by $g_k$) is not sufficiently close to zero, a search direction $s_k$ is provided ($s_1 = -B_1^{-1} g_1$) such that the descent property $s_k^T g_k < 0$ holds. Thus, a positive steplength $\alpha_k$ which reduces $f(x_k)$ along $s_k$ exists. In practice, $\alpha_k$ is usually chosen to satisfy the Wolfe-Powell conditions
$$f_{k+1} \leq f_k + \sigma_0 \alpha_k g_k^T s_k, \qquad g_{k+1}^T s_k \geq \sigma_1 g_k^T s_k, \qquad (1.2)$$
where $f_k$ denotes $f(x_k)$, $0 < \sigma_0 < 0.5$ and $\sigma_0 < \sigma_1 < 1$. Then a new point is given by
$$x_{k+1} = x_k + \alpha_k s_k. \qquad (1.3)$$
For the next iteration, the Hessian approximation $B_k$ is updated to $B_{k+1}$ in terms of the two vectors
$$\delta_k = x_{k+1} - x_k, \qquad \gamma_k = g_{k+1} - g_k, \qquad (1.4)$$
such that the quasi-Newton condition
$$B_{k+1} \delta_k = \gamma_k \qquad (1.5)$$
holds. Hence, the next search direction $s_{k+1}$ is computed by solving the system of linear equations
$$B_{k+1} s_{k+1} = -g_{k+1}. \qquad (1.6)$$
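For illustration, the two inequalities in (1.2) can be checked directly once $f$ and $g$ are available at the trial point; the following Python sketch is only indicative (the function and argument names are illustrative and are not taken from our MATLAB implementation).

```python
import numpy as np

def satisfies_wolfe_powell(f, grad, x, s, alpha, sigma0=1e-4, sigma1=0.9):
    """Check the Wolfe-Powell conditions (1.2) for a trial steplength alpha.

    f, grad : callables returning f(x) and g(x); x, s : current point and a
    descent direction with s^T g(x) < 0; 0 < sigma0 < 0.5 and sigma0 < sigma1 < 1.
    """
    g = grad(x)
    x_new = x + alpha * s
    sufficient_decrease = f(x_new) <= f(x) + sigma0 * alpha * np.dot(g, s)
    curvature = np.dot(grad(x_new), s) >= sigma1 * np.dot(g, s)
    return sufficient_decrease and curvature
```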
Although there exist many quasi-Newton updating formulae in the literature, we will focus on the popular BFGS update
$$B_{k+1} = B_k - \frac{B_k\delta_k\delta_k^T B_k}{\delta_k^T B_k\delta_k} + \frac{\gamma_k\gamma_k^T}{\delta_k^T\gamma_k}, \qquad (1.7)$$
because this formula has the following useful features. It satisfies the quasi-Newton condition (1.5) and maintains the Hessian approximations positive definite if the curvature condition
$$\delta_k^T\gamma_k > 0 \qquad (1.8)$$
holds, which is guaranteed if the Wolfe-Powell conditions (1.2) are satisfied. In addition, the corresponding BFGS method converges superlinearly for convex objective functions. (For more details, see, for example, Fletcher [1].)

Since the updated matrix $B_{k+1}$ cannot be stored explicitly for sufficiently large values of $n$, it is replaced by another matrix that can be stored implicitly, so that the search direction in (1.6) is simply computed for all $k$ (see, for example, Andrei [2], and Nocedal and Wright [3]). Here, we consider several proposals for maintaining $B_k$ diagonal, although the quasi-Newton feature (1.5) is not expected to be satisfied. Thus, a number of diagonal updates have been proposed (see the next section), some of which satisfy the weak quasi-Newton condition
$$\delta_k^T B_{k+1}\delta_k = \delta_k^T\gamma_k. \qquad (1.9)$$
However, we will consider the possibility of improving the role of these matrices by trying to impose the quasi-Newton feature through an extra BFGS updating strategy.

The rest of our work is organized as follows. In the next section, we discuss a selection strategy for diagonal entries of quasi-Newton BFGS Hessian and inverse Hessian approximations. Section 3 defines other diagonal updates (some of which satisfy the weak quasi-Newton condition (1.9)). In Section 4 we discuss the convergence property, while in Section 5 some numerical results are presented. We will see that the proposed technique improves upon the performance of the diagonal matrix updates substantially. Finally, we conclude in Section 6.

2. Diagonal quasi-Newton BFGS updates

Although it is possible to define the inverse Hessian approximation by employing a number of BFGS updates in terms of vector pairs $\{\delta_i, \gamma_i\}$, for some $i \leq k$, without storing a matrix explicitly (see, for example, Buckley and LeNir [4], Fletcher [5], Liu and Nocedal [6], and Shanno [7]), we do not consider them here because we focus on the simple diagonal Hessian approximations and diagonal inverse Hessian approximations.

In this section, we describe the BFGS update for maintaining the Hessian approximation $B_k$, or its inverse $H_k$, diagonal for all $k$. For convenience, we denote $\hat{B} = \mathrm{diag}[B] \equiv \mathrm{diag}[\hat{B}^{(1)}, \ldots, \hat{B}^{(n)}]$ and note that for any two vectors $u$ and $v$ we have
$$\mathrm{diag}(uv^T) = \mathrm{diag}[u^{(1)}v^{(1)}, \ldots, u^{(n)}v^{(n)}], \quad \mathrm{diag}(\hat{B}u) = \mathrm{diag}[\hat{B}^{(1)}u^{(1)}, \ldots, \hat{B}^{(n)}u^{(n)}], \quad \mathrm{diag}(\hat{B}uv^T) = \mathrm{diag}[\hat{B}^{(1)}u^{(1)}v^{(1)}, \ldots, \hat{B}^{(n)}u^{(n)}v^{(n)}].$$
Thus, in particular, it follows from (1.7) that the diagonal BFGS update of a diagonal matrix $\hat{B}_k$ can be written as follows:
$$\hat{B}_{k+1} = \hat{B}_k - \frac{\mathrm{diag}(\hat{B}_k\delta_k\delta_k^T\hat{B}_k)}{\delta_k^T\hat{B}_k\delta_k} + \frac{\mathrm{diag}(\gamma_k\gamma_k^T)}{\delta_k^T\gamma_k}, \qquad (2.1)$$
or equivalently,
$$\hat{B}_{k+1}^{(i)} = \hat{B}_k^{(i)} - \frac{\big(\hat{B}_k^{(i)}\delta_k^{(i)}\big)^2}{\sum_{j=1}^n \hat{B}_k^{(j)}\big(\delta_k^{(j)}\big)^2} + \frac{\big(\gamma_k^{(i)}\big)^2}{\delta_k^T\gamma_k}, \qquad (2.2)$$
for $i = 1, \ldots, n$. We note that this formula requires the storage of only three vectors. We first consider the most efficient quasi-Newton BFGS method that is defined by (1.3), (1.7) and (1.6).
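For concreteness, the componentwise form (2.2) can be coded in a few lines. The following Python sketch (the array-based representation and the function name are our own illustrative choices, not taken from our MATLAB code) stores only the three vectors mentioned above and costs $O(n)$ operations per update.

```python
import numpy as np

def diag_bfgs_update(b, delta, gamma):
    """Diagonal BFGS update (2.2): b holds the diagonal entries of B-hat_k and the
    returned array holds those of B-hat_{k+1}; delta and gamma are delta_k, gamma_k.
    Positivity of the result requires the curvature condition delta^T gamma > 0."""
    dBd = np.dot(b, delta**2)      # delta_k^T B-hat_k delta_k
    dg = np.dot(delta, gamma)      # delta_k^T gamma_k
    return b - (b * delta)**2 / dBd + gamma**2 / dg
```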
For convenience, we rewrite the BFGS update (1.7) as the bfgs matrix function
$$B_{k+1} = \mathrm{bfgs}(B_k, \delta_k, \gamma_k), \qquad (2.3)$$
where
$$\mathrm{bfgs}(B, \delta, \gamma) = B - \frac{B\delta\delta^T B}{\delta^T B\delta} + \frac{\gamma\gamma^T}{\delta^T\gamma} \qquad (2.4)$$
is the BFGS formula for updating any symmetric matrix $B$ in terms of any two vectors $\delta$ and $\gamma$. This formula has the useful features that it maintains the positive definiteness of $B$ if $\delta^T\gamma > 0$ and satisfies the quasi-Newton condition $\mathrm{bfgs}(B, \delta, \gamma)\delta = \gamma$. Thus, the updated matrix (2.3) is maintained positive definite if the curvature condition (1.8) holds.

Since the updated matrix (2.3) cannot be stored explicitly, Gilbert and Lemaréchal [8] suggest using the diagonal BFGS update
$$\hat{B}_{k+1} = \mathrm{diag}[\mathrm{bfgs}(\hat{B}_k, \delta_k, \gamma_k)], \qquad (2.5)$$
which is equivalent to both (2.1) and (2.2), assuming $\hat{B}_1$ is a given diagonal matrix. In this case, $\hat{B}_k$ remains diagonal and positive definite, so that the search direction (1.6) is simply computed as
$$s_{k+1} = -\hat{B}_{k+1}^{-1} g_{k+1}, \qquad (2.6)$$
without computing the inverse matrix $\hat{B}_{k+1}^{-1}$ ($= \hat{H}_{k+1}$, say) explicitly. In the next section, we will consider a modification of this method.

We now show that the inverse matrix of (2.3), which defines the BFGS inverse Hessian approximation, yields a choice for defining $\hat{H}_{k+1}$ explicitly as follows. Let the BFGS inverse Hessian approximation be defined by
$$H_{k+1} = \mathrm{bfgs}^{-1}(H_k, \delta_k, \gamma_k), \qquad (2.7)$$
where for any symmetric matrix $H$ and two vectors $\delta$ and $\gamma$,
$$\mathrm{bfgs}^{-1}(H, \delta, \gamma) = H + \left(1 + \frac{\gamma^T H\gamma}{\delta^T\gamma}\right)\frac{\delta\delta^T}{\delta^T\gamma} - \frac{\delta\gamma^T H + H\gamma\delta^T}{\delta^T\gamma}. \qquad (2.8)$$
Thus, if $H$ is stored implicitly (particularly, when it is diagonal) such that the product $Hu$ is available for any vector $u$, then the product $\mathrm{bfgs}^{-1}(H, \delta, \gamma)v$, for any vector $v$, can be computed without storing the updated matrix $\mathrm{bfgs}^{-1}(H, \delta, \gamma)$ explicitly. Hence, the search direction (1.6) for the next iteration can be computed directly by
$$s_{k+1} = -H_{k+1} g_{k+1}, \qquad (2.9)$$
although $H_{k+1}$ is a dense nondiagonal matrix. This technique maintains the useful quasi-Newton condition $H_{k+1}\gamma_k = \delta_k$ and will be applied to some diagonal matrices to impose the quasi-Newton condition again (see Al-Siyabi [9]).

As for the BFGS update, we now apply the diagonal technique to the inverse BFGS update (2.7) to obtain
$$\hat{H}_{k+1} = \mathrm{diag}[\mathrm{bfgs}^{-1}(\hat{H}_k, \delta_k, \gamma_k)]. \qquad (2.10)$$
Although the inverse BFGS updated matrix satisfies $\mathrm{bfgs}(\hat{B}_k, \delta_k, \gamma_k)^{-1} = \mathrm{bfgs}^{-1}(\hat{H}_k, \delta_k, \gamma_k)$, since $\hat{H}_k = \hat{B}_k^{-1}$, we note that
$$\mathrm{diag}\big(\mathrm{bfgs}(\hat{B}_k, \delta_k, \gamma_k)\big)^{-1} \neq \mathrm{diag}\big(\mathrm{bfgs}^{-1}(\hat{H}_k, \delta_k, \gamma_k)\big);$$
i.e., the inverse of the diagonal updated matrix (2.5) differs from (2.10). Therefore, for convenience, we rewrite the search direction (2.6) as follows:
$$s_{k+1} = -\hat{H}_{k+1} g_{k+1}. \qquad (2.11)$$
In practice, the diagonal choice (2.5)-(2.6) is preferred to (2.10)-(2.11). In fact, the diagonal inverse BFGS update of (2.10) has been suggested by Gilbert and Lemaréchal [8], as follows:
$$\hat{H}_{k+1} = \hat{H}_k + \left(1 + \frac{\gamma_k^T\hat{H}_k\gamma_k}{\delta_k^T\gamma_k}\right)\frac{\mathrm{diag}(\delta_k\delta_k^T)}{\delta_k^T\gamma_k} - 2\,\frac{\mathrm{diag}(\hat{H}_k\gamma_k\delta_k^T)}{\delta_k^T\gamma_k}, \qquad (2.12)$$
or equivalently,
$$\hat{H}_{k+1}^{(i)} = \hat{H}_k^{(i)} + \left(1 + \frac{\sum_{j=1}^n \hat{H}_k^{(j)}\big(\gamma_k^{(j)}\big)^2}{\delta_k^T\gamma_k}\right)\frac{\big(\delta_k^{(i)}\big)^2}{\delta_k^T\gamma_k} - 2\,\frac{\delta_k^{(i)}\gamma_k^{(i)}\hat{H}_k^{(i)}}{\delta_k^T\gamma_k}, \qquad (2.13)$$
with a certain positive definite diagonal matrix $\hat{H}_1$, so that the updated diagonal matrices are maintained positive definite if the curvature condition $\delta_k^T\gamma_k > 0$ is satisfied.
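Similarly, the componentwise form (2.13) of the diagonal inverse BFGS update can be sketched as follows; again this is a minimal Python illustration with names of our own choosing, assuming the diagonals are stored as 1-D arrays.

```python
import numpy as np

def diag_inverse_bfgs_update(h, delta, gamma):
    """Diagonal inverse BFGS update (2.13): h holds the diagonal entries of H-hat_k."""
    dg = np.dot(delta, gamma)      # delta_k^T gamma_k (assumed positive)
    gHg = np.dot(h, gamma**2)      # gamma_k^T H-hat_k gamma_k
    return h + (1.0 + gHg / dg) * delta**2 / dg - 2.0 * delta * gamma * h / dg
```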
Now, the corresponding algorithm for the above diagonal quasi-Newton updates can be outlined in the following way, where throughout the paper $\|\cdot\|$ denotes the Euclidean vector norm.

Algorithm 2.1.
Step 0: Given an initial point $x_1$, a symmetric and positive definite diagonal matrix $\hat{B}_1$ and $\epsilon > 0$ (an acceptance tolerance on the gradient norm), set $k = 1$ and compute the initial search direction $s_1 = -\hat{B}_1^{-1} g_1$.
Step 1: Compute a steplength $\alpha_k$ and a new point $x_{k+1} = x_k + \alpha_k s_k$ such that the Wolfe-Powell conditions (1.2) hold.
Step 2: If $\|g_{k+1}\| \leq \epsilon$, then stop.
Step 3: Compute the vectors $\delta_k = x_{k+1} - x_k$ and $\gamma_k = g_{k+1} - g_k$.
Step 4: Define a new diagonal Hessian approximation $\hat{B}_{k+1}$ by (2.1), using $\hat{B}_k$, $\delta_k$ and $\gamma_k$.
Step 5: Compute the new search direction $s_{k+1} = -\hat{B}_{k+1}^{-1} g_{k+1}$.
Step 6: Set $k = k + 1$ and go to Step 1.

Note that in Steps 0 and 5, the search direction is computed without forming $\hat{B}_{k+1}^{-1}$ explicitly; i.e., it is calculated as $s_{k+1}^{(i)} = -g_{k+1}^{(i)}/\hat{B}_{k+1}^{(i)}$, for $i = 1, 2, \ldots, n$. In Step 4, we define $\hat{B}_{k+1}$ in particular by the diagonal BFGS Hessian approximation (2.1), or equivalently (2.5), unless otherwise stated. However, if a diagonal inverse Hessian approximation is considered, using for example formula (2.10) for updating $\hat{H}_k$ to $\hat{H}_{k+1}$, then Steps 0, 4 and 5 are used with $\hat{B}_j^{-1}$ replaced by $\hat{H}_j$, for $j = 1$, $k$ and $k+1$, respectively.

3. Diagonal non-quasi-Newton updates

In this section, we describe some diagonal Hessian approximations $\hat{B}_k$ which satisfy certain useful properties. Although it is possible to define diagonal inverse Hessian approximations $\hat{H}_k$, we do not consider them here because they are inferior to the direct Hessian approximations (Al-Siyabi [9]).

Nazareth [10] has proposed the following Hessian approximation update:
$$B_{k+1} = B_k + \frac{\delta_k^T\gamma_k - \delta_k^T B_k\delta_k}{(\delta_k^T B_k\delta_k)^2}\, B_k\delta_k\delta_k^T B_k. \qquad (3.1)$$
This satisfies the weak quasi-Newton condition (1.9) and maintains the Hessian approximations positive definite whenever the curvature condition (1.8) holds. The author suggests replacing $B_k$ by a diagonal matrix $\hat{B}_k$ to obtain the new diagonal Hessian approximation
$$\hat{B}_{k+1} = \hat{B}_k + \frac{\delta_k^T\gamma_k - \delta_k^T\hat{B}_k\delta_k}{(\delta_k^T\hat{B}_k\delta_k)^2}\,\mathrm{diag}(\hat{B}_k\delta_k\delta_k^T\hat{B}_k), \qquad (3.2)$$
or equivalently,
$$\hat{B}_{k+1}^{(i)} = \hat{B}_k^{(i)} + \frac{\delta_k^T\gamma_k - \sum_{j=1}^n \hat{B}_k^{(j)}\big(\delta_k^{(j)}\big)^2}{\Big(\sum_{j=1}^n \hat{B}_k^{(j)}\big(\delta_k^{(j)}\big)^2\Big)^2}\,\big(\hat{B}_k^{(i)}\big)^2\big(\delta_k^{(i)}\big)^2, \qquad (3.3)$$
for $i = 1, \ldots, n$. This also remains positive definite if the curvature condition (1.8) holds. Although a formula for the inverse update of (3.1) can be used to define diagonal inverse Hessian approximations $\hat{H}_k$, we do not consider it here, because the performance of the corresponding method is worse than that of the above one (for details, see Al-Siyabi [9]).

Moreover, Zhu et al. [11] proposed the direct diagonal update
$$\hat{B}_{k+1} = \hat{B}_k + \frac{\delta_k^T\gamma_k - \delta_k^T\hat{B}_k\delta_k}{\mathrm{tr}(\hat{E}_k^2)}\,\hat{E}_k, \qquad (3.4)$$
where $\hat{E}_k = \mathrm{diag}\big[(\delta_k^{(1)})^2, (\delta_k^{(2)})^2, \ldots, (\delta_k^{(n)})^2\big]$, or equivalently,
$$\hat{B}_{k+1}^{(i)} = \hat{B}_k^{(i)} + \frac{\delta_k^T\gamma_k - \sum_{j=1}^n \hat{B}_k^{(j)}\big(\delta_k^{(j)}\big)^2}{\sum_{j=1}^n\big(\delta_k^{(j)}\big)^4}\,\big(\delta_k^{(i)}\big)^2, \qquad (3.5)$$
for $i = 1, \ldots, n$, which satisfies the weak quasi-Newton condition (1.9). Since the positive definiteness of this update is not guaranteed, the authors introduced the following safeguarding test. If the inequality $\hat{B}_{k+1}^{(i)} < \epsilon_1$ holds for any $i$ and some $\epsilon_1 > 0$ (we used $\epsilon_1 = 10^{-6}$), the authors suggest resetting the above updated matrix to the scaled identity matrix $\tau_k^0 I$, where
$$\tau_k^0 = \frac{\gamma_k^T\gamma_k}{\delta_k^T\gamma_k} \qquad (3.6)$$
is suggested by Oren and Luenberger [12] and is positive if the curvature condition holds. Thus, the authors propose the positive definite diagonal update as follows:
$$\hat{B}_{k+1} = \begin{cases}\hat{B}_{k+1}, & \text{if } \hat{B}_{k+1}^{(i)} \geq \epsilon_1, \ \forall i,\\ \tau_k^0 I, & \text{otherwise.}\end{cases} \qquad (3.7)$$
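The safeguarded update (3.5)-(3.7) can be sketched as follows; this Python fragment is illustrative only (the function name is ours, and the reset value follows (3.6)).

```python
import numpy as np

def zhu_diag_update(b, delta, gamma, eps1=1e-6):
    """Weak quasi-Newton diagonal update (3.5) with the safeguard (3.7):
    reset to the Oren-Luenberger scaled identity (3.6) if any entry drops below eps1."""
    dg = np.dot(delta, gamma)                                # delta_k^T gamma_k
    dBd = np.dot(b, delta**2)                                # delta_k^T B-hat_k delta_k
    b_new = b + (dg - dBd) / np.sum(delta**4) * delta**2
    if np.any(b_new < eps1):
        b_new = np.full_like(b, np.dot(gamma, gamma) / dg)   # tau_k^0 * I
    return b_new
```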
Recently, Sim et al. [13] proposed the following diagonal Hessian approximation:
$$\hat{B}_{k+1} = \begin{cases}\hat{B}_{k+1}, & \text{if } \hat{\tau}_k^0 < 1,\\ \hat{\tau}_k^0 I, & \text{otherwise},\end{cases} \qquad (3.8)$$
where
$$\hat{B}_{k+1}^{(i)} = \frac{1}{1+\omega_k\big(\delta_k^{(i)}\big)^2}, \qquad \omega_k = \frac{\delta_k^T\delta_k - \delta_k^T\gamma_k}{\sum_{j=1}^n\big(\delta_k^{(j)}\big)^4}, \qquad (3.9)$$
and
$$\hat{\tau}_k^0 = \frac{\delta_k^T\gamma_k}{\delta_k^T\delta_k}. \qquad (3.10)$$
Since this self-scaling parameter of Oren and Luenberger [12] is positive if the curvature condition holds, the proposed diagonal matrix (3.8) is positive definite. The authors derived the above diagonal matrix $\hat{B}_{k+1}$ as follows. If $\hat{\tau}_k^0 \geq 1$, then they use the other known scaled identity matrix $\hat{B}_{k+1} = \hat{\tau}_k^0 I$. Otherwise, they define a positive definite $\hat{B}_{k+1}$ as an approximate solution of the constrained optimization problem
$$\min_{\hat{B}_+} \ \psi(\hat{B}_+) = \mathrm{tr}(\hat{B}_+) - \ln\big(\det(\hat{B}_+)\big) \qquad (3.11)$$
$$\text{s.t.} \quad \delta_k^T\hat{B}_+\delta_k = \delta_k^T\gamma_k, \quad \hat{B}_+ \ \text{diagonal}, \qquad (3.12)$$
assuming $\hat{B}_+$ is positive definite and approximating the Lagrange multiplier by the reasonable value $\omega_k$. Since the $\psi$ function is useful for deriving the BFGS formula (Byrd and Nocedal [14]), it is expected that the first case in (3.8) would work well. In practice, it works a little better than (3.7) and slightly worse than the diagonal BFGS update (2.5) (see Section 5 for details).

Andrei [15] considers the possibility of defining a diagonal quasi-Newton Hessian approximation (say, $\hat{B}_{k+1}$) without updating a matrix, by enforcing the quasi-Newton condition $\hat{B}_{k+1}\delta_k = \gamma_k$, which he writes as follows:
$$\hat{B}_{k+1} S_k = Y_k, \qquad (3.13)$$
where $S_k = \mathrm{diag}[\delta_k^{(1)}, \ldots, \delta_k^{(n)}]$ and $Y_k = \mathrm{diag}[\gamma_k^{(1)}, \ldots, \gamma_k^{(n)}]$. Because, in general, equation (3.13), subject to $\hat{B}_{k+1}$ positive definite, cannot be solved exactly, the author suggests, for $i = 1, 2, \ldots, n$, the choice
$$\hat{B}_{k+1}^{(i)} = \begin{cases}\dfrac{\gamma_k^{(i)}}{\delta_k^{(i)}}, & \text{if } \dfrac{\gamma_k^{(i)}}{\delta_k^{(i)}} \geq \epsilon_2,\\[6pt] 1, & \text{otherwise},\end{cases} \qquad (3.14)$$
where $\epsilon_2 > 0$. He reports that the corresponding algorithm performs better than certain diagonal quasi-Newton algorithms. We note that the second case in (3.14) has the drawback of losing the details of $\hat{B}_k$. Therefore, we suggest the following modified diagonal choice
$$\hat{B}_{k+1}^{(i)} = \begin{cases}\dfrac{\gamma_k^{(i)}}{\delta_k^{(i)}}, & \text{if } \epsilon_2 \leq \dfrac{\gamma_k^{(i)}}{\delta_k^{(i)}} \leq \dfrac{1}{\epsilon_3},\\[6pt] \hat{B}_k^{(i)}, & \text{otherwise},\end{cases} \qquad (3.15)$$
where $\epsilon_3 > 0$, which maintains the positive definiteness property. In practice, this choice works better than (3.14) (see Section 5 for details). We let $\epsilon_3 = 10^{-14}$ be small enough so that the second inequality in (3.15) was always satisfied in our experiments.

Although other useful choices for the second case in (3.15) are possible (see Al-Siyabi [9]), we do not consider them here because we can improve the choice (3.15) as follows. Since the above choices for the Hessian approximation $\hat{B}_{k+1}$ do not satisfy the quasi-Newton condition (1.5), we suggest updating this matrix by any quasi-Newton formula (in particular, the BFGS update). Hence, replacing the updated matrix by its diagonal, as in (2.5), we obtain the positive definite improved diagonal BFGS update
$$\hat{B}_{k+1} = \mathrm{diag}[\mathrm{bfgs}(\hat{B}_{k+1}^*, \delta_k, \gamma_k)], \qquad (3.16)$$
where $\hat{B}_{k+1}^*$ denotes any of the above choices of $\hat{B}_{k+1}$. Finally, the corresponding algorithm for the above diagonal quasi-Newton updates can be outlined along the lines of Algorithm 2.1, where the difference among the above updates occurs in Step 4 for defining $\hat{B}_{k+1}$ (given in particular by (3.15)-(3.16), unless otherwise stated). We will show that this improves the performance of choice (3.15) in Section 5.
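To make the proposed combination concrete, the following Python sketch applies the choice (3.15) and then the extra diagonal BFGS update (3.16) (cf. the sketch of (2.2) above). The function name and the handling of zero components of $\delta_k$ are our own illustrative choices and are not part of the method's specification.

```python
import numpy as np

def improved_diag_update(b, delta, gamma, eps2=1e-2, eps3=1e-14):
    """Choice (3.15) followed by the extra diagonal BFGS update (3.16)."""
    # (3.15): take gamma_i/delta_i where it lies in [eps2, 1/eps3]; keep b_i otherwise.
    with np.errstate(divide='ignore', invalid='ignore'):
        ratio = gamma / delta
    accept = np.isfinite(ratio) & (ratio >= eps2) & (ratio <= 1.0 / eps3)
    b_star = np.where(accept, ratio, b)            # diagonal of B-hat*_{k+1}
    # (3.16): diagonal of the BFGS update of the diagonal matrix B-hat*_{k+1}.
    dBd = np.dot(b_star, delta**2)
    dg = np.dot(delta, gamma)
    return b_star - (b_star * delta)**2 / dBd + gamma**2 / dg
```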
4. Convergence analysis

In this section, we study the convergence property of our proposed diagonal Hessian approximations. Since they are maintained positive definite, the descent condition
$$s_k^T g_k < 0 \qquad (4.1)$$
is satisfied for all $k$. Before presenting the convergence property, we first state the following standard assumption.

Assumption 4.1.
a) Consider the level set $\Omega = \{x \in \Re^n : f(x) \leq f(x_1)\}$ and let $\tilde{\Omega}$ be an open set containing $\Omega$.
b) The objective function $f(x)$ is bounded and continuously differentiable in $\tilde{\Omega}$.
c) The gradient $g(x)$ is Lipschitz continuous on $\tilde{\Omega}$; that is, there exists a constant $L > 0$ such that
$$\|g(x) - g(\tilde{x})\| \leq L\|x - \tilde{x}\|, \quad \forall x, \tilde{x} \in \tilde{\Omega}. \qquad (4.2)$$

Similar to the well-known result of Zoutendijk [16], we state the following result.

Theorem 4.1. Suppose Assumption 4.1 holds. Consider the iterations of the form (1.3), with $x_1$ being any starting point, the search direction $s_k$ being defined such that the descent condition (4.1) holds, and the steplength $\alpha_k$ satisfying the Wolfe-Powell conditions (1.2). Then, the so-called Zoutendijk condition
$$\sum_{k=1}^{\infty}\frac{(s_k^T g_k)^2}{\|s_k\|^2} < \infty \qquad (4.3)$$
is obtained.

Proof. Similar to many analyses (see, for example, Nocedal and Wright [3]), we state the following proof for completeness. Rearranging the second Wolfe-Powell condition in (1.2) and using the Lipschitz condition (4.2), it follows that
$$L\|s_k\|\,\|x_{k+1} - x_k\| \geq s_k^T\big(g(x_{k+1}) - g(x_k)\big) \geq (\sigma_1 - 1)\,s_k^T g_k.$$
Substituting $x_{k+1} = x_k + \alpha_k s_k$ and using (4.1), we obtain
$$\alpha_k \geq \frac{1 - \sigma_1}{L}\,\frac{|s_k^T g_k|}{\|s_k\|^2}.$$
Using this result, we rewrite the first Wolfe-Powell condition in (1.2) as follows:
$$f_k - f_{k+1} \geq \frac{\sigma_0(1 - \sigma_1)}{L}\,\frac{(s_k^T g_k)^2}{\|s_k\|^2}.$$
By summing this expression over $k$ and using the assumed bound on $f_k$, we obtain the Zoutendijk condition (4.3).

To obtain the global convergence result for Algorithm 2.1, we assume the condition number of the positive definite diagonal Hessian approximation $\hat{B}_k$ to be uniformly bounded; that is, there is a constant $M$ such that
$$\kappa(\hat{B}_k) = \frac{\lambda_1}{\lambda_n} \leq M, \quad \forall k, \qquad (4.4)$$
where $\lambda_1$ and $\lambda_n$ are the largest and smallest eigenvalues of $\hat{B}_k$, respectively.

Theorem 4.2. Suppose that $f$ satisfies Assumption 4.1. Let $x_1$ be a starting point and $\hat{B}_1$ be a positive definite diagonal matrix. Consider Algorithm 2.1 with $\epsilon = 0$ in Step 2, with $\hat{B}_{k+1}$ in Step 4 defined such that condition (4.4) holds, and with Step 1 defining the steplength $\alpha_k$ such that the Wolfe-Powell conditions (1.2) hold. Then, the algorithm converges globally; that is,
$$\lim_{k\to\infty}\|g_k\| = 0. \qquad (4.5)$$

Proof. Substituting $s_k = -\hat{B}_k^{-1} g_k$ into the Zoutendijk condition (4.3), we obtain
$$\frac{1}{M}\sum_{k=1}^{\infty}\|g_k\|^2 \leq \sum_{k=1}^{\infty}\frac{(s_k^T\hat{B}_k s_k)(g_k^T\hat{B}_k^{-1}g_k)}{\|s_k\|^2} < \infty,$$
where $M$ is given as in (4.4). Hence, the limit (4.5) is obtained.

5. Numerical results

In this section, we study the performance of our proposed methods on a set of standard unconstrained optimization test problems. All the methods are implemented as in Algorithm 2.1, differing only in Step 4 for defining the Hessian approximation $\hat{B}_{k+1}$ by the choices (2.5), (2.12), (3.2), (3.7) (with $\epsilon_1 = 10^{-6}$), (3.8), (3.14) (with $\epsilon_2 = 10^{-2}$ as in Andrei [15]), and (3.15) (with $\epsilon_2 = 10^{-2}$ and $\epsilon_3 = 10^{-14}$). These choices (referred to as L1, L2, ..., and L7), except for L7, have been proposed by Gilbert and Lemaréchal [8], Nazareth [10], Zhu et al. [11], Sim et al. [13] and Andrei [15], respectively. The choice L7 defines our proposed modification of L6. Because the L2 update defines the inverse Hessian approximation $\hat{H}_{k+1}$, Steps 0, 4 and 5 are used with $\hat{B}_j^{-1}$ replaced by $\hat{H}_j$, for $j = 1$, $k$ and $k+1$, respectively. In all algorithms, we consider the following.
For Step 0, we choose $\hat{B}_1 = I$ and $\epsilon = 10^{-7}$. For Step 1, we calculate a value of the steplength $\alpha_k$ such that the strong Wolfe-Powell conditions
$$f_{k+1} \leq f_k + \sigma_0\alpha_k s_k^T g_k, \qquad |s_k^T g_{k+1}| \leq -\sigma_1 s_k^T g_k \qquad (5.1)$$
hold, using the usual values of $\sigma_0 = 10^{-4}$ and $\sigma_1 = 0.9$, which imply the Wolfe-Powell conditions (1.2). We used the MATLAB line search routine 'lswpc' of Al-Baali, which is essentially the routine written in Fortran by Fletcher, with slight differences. It is based on using quadratic and cubic interpolations for estimating a value of the steplength $\alpha_k$, and it guarantees finding a positive value of $\alpha_k$ in a finite number of operations (see Al-Baali and Fletcher [17] and Fletcher [1]). We stopped a run when either $\|g_k\| \leq 10^{-7}\max\{\|g_1\|, 1\}$, $f_k - f_{k+1} \leq 10^{-14}$, or the number of line searches reached $10^5$. All codes were written in MATLAB R2017b and the runs were made on an Intel(R) Core(TM) i7 processor (2.7 GHz) with 16.0 GB of RAM.

The test problems were selected from the collection of Andrei [18] (which belong to the CUTEst collection established by Gould et al. [19], Himmelblau [20] and Moré et al. [21]). We picked 84 test problems (as given in Table 1). For certain extended test problems (e.g., Extended Rosenbrock), increasing the number of variables does not increase the number of line searches, function and gradient evaluations required to solve the problems (see, for example, Al-Baali [22]). To avoid this occurrence, the author modifies the standard starting point $\bar{x}_1$ to $\bar{\bar{x}}_1$, where
$$\bar{\bar{x}}_1^{(i)} = \bar{x}_1^{(i)} + \frac{1}{i+1}, \qquad (5.2)$$
for $i = 1, \ldots, n$. We used all 84 test functions with the standard starting points and their modifications (5.2), for $n = 900$, 9000 and 27000, to define three sets of test problems. Each set (referred to as Set1, Set2 and Set3, respectively) consists of 168 test problems. We also consider all the test problems together as a single set of 504 test problems (referred to as Set4).

To study the behaviour of the above algorithms, we compared the numerical results required to solve the tests, using the performance profiles of Dolan and Moré [23] based on the numbers of line searches (#ls), function evaluations (#fun) and gradient evaluations (#gra), as well as the cpu time in seconds, required to solve the test problems. The Dolan-Moré performance profile can be briefly described as follows. It illustrates the relative performance of the solvers on a set of test problems in terms of #ls (similarly for #fun, #gra and cpu time). In general, $P_M(\tau)$, the fraction of problems with performance ratio $\tau \geq 0$, is defined by
$$P_M(\tau) = \frac{\text{number of problems for which } \log_2(\tau_{p,M}) \leq \tau}{\text{total number of problems}}. \qquad (5.3)$$
Here, $\tau_{p,M}$ is the ratio of the #ls required to solve problem $p$ by method M to the lowest #ls required to solve problem $p$. The ratio $\tau_{p,M}$ is set to $\infty$ (or some large number) if method M fails to solve problem $p$. The value of $P_M(\tau)$ at $\tau = 0$ gives the percentage of test problems for which method M performs best, and the value for $\tau$ large enough is the percentage of test problems that method M can solve. Thus, a solver with high values of $P_M(\tau)$, i.e. one whose curve lies towards the top of the figure, performs better than ones located at lower levels.

Applying the above algorithms to the four sets, Set1, Set2, Set3 and Set4, of test problems, we obtained some numerical results. Their comparisons are given in Figures 1-4, respectively, with respect to #ls, #fun, #gra and cpu time in each figure.
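The profile values $P_M(\tau)$ in (5.3) plotted in Figures 1-4 are obtained from the raw counts as in the following Python sketch; the data layout and function name are ours, and it assumes every problem is solved by at least one method (failures are marked by infinity, as described above).

```python
import numpy as np

def performance_profile(costs, taus):
    """Dolan-More performance profile (5.3).

    costs[p, m] is, e.g., the #ls needed by method m on problem p (np.inf if it fails);
    returns P with P[j, m] = fraction of problems where log2(cost ratio) <= taus[j]."""
    best = np.min(costs, axis=1, keepdims=True)    # lowest count per problem
    log_ratios = np.log2(costs / best)             # log2 of tau_{p,M}; inf for failures
    return np.array([np.mean(log_ratios <= t, axis=0) for t in taus])
```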
We observe that L7 appears to be the best, being a little better than L1, L2, L4 and L6, with L3 and L5 being a little worse than the other methods.

To give another fair and useful comparison, which shows the percentage improvement or worsening of the algorithms, we also considered the comparison rule of Al-Baali (see, e.g., Al-Baali [24] and essentially Al-Baali [25]). To compare two methods (say, M1 and M2) with respect to #ls (similarly for #fun, #gra and cpu time), the author proposes the average ratio measure of the form
$$r = \frac{1}{t}\sum_{i=1}^{t} r_i, \qquad (5.4)$$
where $t$ is the number of test problems and
$$r_i = \begin{cases}\dfrac{p_i}{q_i}, & \text{if } p_i \leq q_i,\\[6pt] 2 - \dfrac{q_i}{p_i}, & \text{if } p_i > q_i,\end{cases} \qquad (5.5)$$
with $p_i$ and $q_i$ denoting the #ls required to solve problem $i$ by the M1 and M2 methods, respectively. If only M1 or only M2 failed to solve the problem, we set $r_i = 2$ and $r_i = 0$, respectively. If both the M1 and M2 methods either failed or converged to two different local solutions for some test problem $i$, then we set $r_i = 1$. The average ratio $r$ in (5.4) always falls in the interval $[0, 2]$. A value of $r \leq 1$ indicates that M1 is better than M2 by $100(1 - r)\%$. Otherwise, when $r > 1$, M1 is worse than M2 (or M2 is better than M1) by $100(r - 1)\%$ (for more details on this measurement ratio, see Al-Baali [24], for instance).

Using the same numerical results used to obtain the comparison Figures 1-4 for Set$i$, $i = 1, \ldots, 4$, we applied the above average ratio measure to compare L1, L2, ..., L6 versus L7 and obtained Tables 2-5, using both types of starting points for $n = 900$, 9000, 27000 and all three values of $n$, respectively. Since we have $r > 1$ in all cases, it is clear that L7 gives the best performance, being at least 10% better than L1, L2, L4 and L6, and more than 48% better than L3, in terms of #ls, #fun, #gra and cpu time. Thus, these observations agree with those in Figures 1-4. We observe that the performance of L6 improves as $n$ increases in comparison with L1, L2 and L4. We also observe that L1, L2 and L4 have nearly identical performances in terms of all measurements, whereas L3 is the worst among all the algorithms.

Since the L3, L4, L5, L6 and L7 methods do not consider imposing the quasi-Newton condition (1.5) as L1 and L2 do, we suggest using (3.16) with $\hat{B}_{k+1}^*$ given by (3.2), (3.7), (3.8), (3.14) and (3.15), which define the former five methods, respectively. We compared the corresponding algorithms (referred to as L3a, L4a, L5a, L6a and L7a, respectively) and observed that each L$i$a, for $i = 3, \ldots, 7$, performs better than L$i$ and that the performance of L7a remains the best (see Al-Siyabi [9]). Thus, we present only the comparison of L7a versus L7, as shown in Figure 5 and Table 6 for Set4 of test problems. We notice that L7a outperforms L7 in terms of all measurements. Moreover, L7a performs better than L7 by at least 6% in terms of #ls, #fun, #gra and cpu time, which provides further illustration of the comparison shown in Figure 5. We also observe, from the combination of Figures 1-4 and Figure 5 as well as Tables 2-5 and Table 6, that L7a performs substantially better than L6, with at least 15% improvement in terms of #ls, #fun and #gra and 20% in terms of cpu time.
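The average ratio measure (5.4)-(5.5) used in Tables 2-6 is equally easy to compute from the stored counts. The Python sketch below handles the failure cases from the counts alone (failures marked by infinity); the case of convergence to different local solutions would require extra information and is not modelled here. The function name and data layout are ours.

```python
import numpy as np

def average_ratio(p, q):
    """Al-Baali's average ratio (5.4)-(5.5) of method M1 (counts p) versus M2 (counts q);
    np.inf marks a failure.  r < 1 means M1 is better than M2 by 100*(1 - r) percent."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    r = np.ones_like(p)                            # r_i = 1 when both methods fail
    r[np.isinf(p) & ~np.isinf(q)] = 2.0            # only M1 fails
    r[~np.isinf(p) & np.isinf(q)] = 0.0            # only M2 fails
    ok = ~np.isinf(p) & ~np.isinf(q)
    r[ok] = np.where(p[ok] <= q[ok], p[ok] / q[ok], 2.0 - q[ok] / p[ok])
    return float(np.mean(r))
```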
Table 1. List of test functions.

No.  Function's Name                            No.  Function's Name
1    Extended Freudenstein & Roth               43   ARGLINB
2    Extended Trigonometric                     44   ARWHEAD
3    Extended Rosenbrock                        45   NONDIA
4    Generalized Rosenbrock                     46   NONDQUAR
5    Extended White & Holst                     47   BQDRTIC
6    Extended Beale                             48   EG2
7    Extended Penalty                           49   DIXMAANA
8    Perturbed Quadratic function               50   DIXMAANB
9    Raydan 1                                   51   DIXMAANC
10   Raydan 2                                   52   DIXMAAND
11   Diagonal 1                                 53   DIXMAANE
12   Diagonal 2                                 54   DIXMAANF
13   Diagonal 3                                 55   DIXMAANG
14   Hager                                      56   DIXMAANH
15   Generalized Tridiagonal 1                  57   DIXMAANI
16   Extended Tridiagonal 1                     58   DIXMAANJ
17   Extended TET (Three exponential terms)     59   DIXMAANK
18   Generalized Tridiagonal 2                  60   DIXMAANL
19   Diagonal 4                                 61   Broyden Tridiagonal
20   Diagonal 5                                 62   Almost Perturbed Quadratic
21   Extended Himmelblau                        63   Staircase 1
22   Generalized White & Holst                  64   Staircase 2
23   Generalized PSC1                           65   LIARWHD
24   Extended PSC1                              66   ENGVAL1
25   Extended Powell                            67   EDENSCH
26   Full Hessian FH1                           68   CUBE
27   Full Hessian FH2                           69   NONSCOMP
28   Extended BD1 (Block Diagonal)              70   QUARTC
29   Extended Maratos                           71   Diagonal 6
30   Extended Cliff                             72   SIQUAD
31   Perturbed quadratic diagonal               73   Extended DENSHNB
32   Extended Wood                              74   Extended DENSHNF
33   Extended Hiebert                           75   COSINE
34   Quadratic QF1                              76   Generalized Quartic
35   Extended quadratic penalty QP1             77   Diagonal 7
36   Extended quadratic penalty QP2             78   Diagonal 8
37   Quadratic QF2                              79   Full Hessian FH3
38   Extended quadratic exponential EP1         80   SINCOS
39   Extended Tridiagonal 2                     81   Diagonal 9
40   FLETCHR                                    82   HIMMELBG
41   BDQRTIC                                    83   HIMMELH
42   TRIDIA                                     84   INDEF

Figure 1. Comparison among L1, L2, L3, L4, L5, L6 and L7, for Set1; n = 900. Panels: (a) number of line searches, (b) number of function evaluations, (c) number of gradient evaluations, (d) CPU times.

Table 2. Average ratios r for methods versus L7, for Set1; n = 900.

Method   #ls      #fun     #gra     cpu
L1       1.153    1.203    1.155    1.130
L2       1.074    1.158    1.0714   1.097
L3       1.598    1.622    1.596    1.597
L4       1.207    1.211    1.209    1.173
L5       1.214    1.232    1.211    1.224
L6       1.124    1.144    1.129    1.077

Table 3. Average ratios r for methods versus L7, for Set2; n = 9000.

Method   #ls      #fun     #gra     cpu
L1       1.190    1.259    1.205    1.281
L2       1.136    1.229    1.150    1.268
L3       1.479    1.542    1.506    1.566
L4       1.197    1.197    1.201    1.240
L5       1.237    1.249    1.250    1.290
L6       1.102    1.134    1.121    1.235

Figure 2. Comparison among L1, L2, L3, L4, L5, L6 and L7, for Set2; n = 9000. Panels: (a) number of line searches, (b) number of function evaluations, (c) number of gradient evaluations, (d) CPU times.

Table 4. Average ratios r for methods versus L7, for Set3; n = 27000.

Method   #ls      #fun     #gra     cpu
L1       1.239    1.311    1.254    1.290
L2       1.137    1.232    1.152    1.166
L3       1.482    1.556    1.507    1.671
L4       1.228    1.222    1.225    1.223
L5       1.231    1.243    1.238    1.280
L6       1.079    1.114    1.095    1.096

Figure 3. Comparison among L1, L2, L3, L4, L5, L6 and L7, for Set3; n = 27000. Panels: (a) number of line searches, (b) number of function evaluations, (c) number of gradient evaluations, (d) CPU times.

Table 5. Average ratios r for methods versus L7, for Set4; n ∈ {900, 9000, 27000}.

Method   #ls      #fun     #gra     cpu
L1       1.211    1.285    1.229    1.267
L2       1.126    1.222    1.141    1.191
L3       1.487    1.556    1.512    1.585
L4       1.203    1.200    1.204    1.217
L5       1.223    1.235    1.233    1.262
L6       1.087    1.118    1.101    1.145

Figure 4. Comparison among L1, L2, L3, L4, L5, L6 and L7, for Set4; n ∈ {900, 9000, 27000}. Panels: (a) number of line searches, (b) number of function evaluations, (c) number of gradient evaluations, (d) CPU times.
Table 6. Average ratios r for L7a versus L7, for Set4; n ∈ {900, 9000, 27000}.

Method   #ls      #fun     #gra     cpu
L7a      0.942    0.933    0.934    0.943

Figure 5. Comparison between L7 and L7a, for Set4; n ∈ {900, 9000, 27000}. Panels: (a) number of line searches, (b) number of function evaluations, (c) number of gradient evaluations, (d) CPU times.

6. Conclusion

We first studied some diagonal Hessian approximation methods for large-scale unconstrained optimization, then presented new diagonal Hessian approximations and established their global convergence. Based on extensive numerical experiments, we observed that a number of our algorithms were more efficient and more robust than several similar available methods. Two of the proposed methods require storing only a few vectors while sharing certain desirable features of the quasi-Newton methods.

Conflict of interest

The authors declare no conflict of interest.

Acknowledgment

We thank the two anonymous referees for their careful reading of the paper and for many constructive comments and suggestions. The first author would like to thank the Oman Ministry of Education for the award of a scholarship.

References

1. Fletcher, R. Practical Methods of Optimization. John Wiley & Sons, 2013.
2. Andrei, N. Nonlinear Conjugate Gradient Methods for Unconstrained Optimization. Springer Optimization and Its Applications, Vol. 158, Springer, 2020.
3. Nocedal, J. and Wright, S. Numerical Optimization. Springer Science & Business Media, 2006.
4. Buckley, A. and LeNir, A. QN-like variable storage conjugate gradients. Mathematical Programming, 1989, 27(2), 155-175.
5. Fletcher, R. Low storage methods for unconstrained optimization. In Computational Solution of Nonlinear Systems of Equations, E.L. Allgower and K. Georg (Eds.), Lectures in Applied Mathematics, Vol. 26, AMS Publications, Providence, RI, 1990.
6. Liu, D.C. and Nocedal, J. On the limited memory BFGS method for large scale optimization. Mathematical Programming, 1989, 45(1-3), 503-528.
7. Shanno, D.F. Conjugate gradient methods with inexact searches. Mathematics of Operations Research, 1978, 3(3), 244-256.
8. Gilbert, J.C. and Lemaréchal, C. Some numerical experiments with variable-storage quasi-Newton algorithms. Mathematical Programming, 1989, 45(1-3), 407-435.
9. Al-Siyabi, A. Efficient Methods for Large-Scale Nonlinear Least-Squares Unconstrained Optimization. Ph.D. thesis, Sultan Qaboos University, Muscat, Oman, 2021.
10. Nazareth, J. If quasi-Newton then why not quasi-Cauchy? SIAG/Opt Views-and-News, 1995, 6, 11-14.
11. Zhu, M., Nazareth, J.L. and Wolkowicz, H. The quasi-Cauchy relation and diagonal updating. SIAM Journal on Optimization, 1999, 9(4), 1192-1204.
12. Oren, S.S. and Luenberger, D.G. Self-scaling variable metric (SSVM) algorithms, part I: Criteria and sufficient conditions for scaling a class of algorithms. Management Science, 1974, 20, 845-862.
13. Sim, H.S., Leong, W.J. and Chen, C.Y. Gradient method with multiple damping for large-scale unconstrained optimization. Optimization Letters, 2019, 13(3), 617-632.
14. Byrd, R.H. and Nocedal, J. A tool for the analysis of quasi-Newton methods with application to unconstrained minimization. SIAM Journal on Numerical Analysis, 1989, 26(3), 727-739.
15. Andrei, N. A new diagonal quasi-Newton updating method with scaled forward finite differences directional derivative for unconstrained optimization. Numerical Functional Analysis and Optimization, 2019, 40(13), 1467-1488.
16. Zoutendijk, G. Nonlinear programming, computational methods. In J. Abadie (Ed.), Integer and Nonlinear Programming, North-Holland, Amsterdam, 1970, 37-86.
17. Al-Baali, M. and Fletcher, R. An efficient line search for nonlinear least squares. Journal of Optimization Theory and Applications, 1986, 48(3), 359-377.
18. Andrei, N. An unconstrained optimization test functions collection. Advanced Modeling and Optimization, 2008, 10(1), 147-161.
19. Gould, N.I., Orban, D. and Toint, Ph. CUTEst: a constrained and unconstrained testing environment with safe threads for mathematical optimization. Computational Optimization and Applications, 2015, 60(3), 545-557.
20. Himmelblau, D. Applied Nonlinear Programming. McGraw-Hill, New York, 1972.
21. Moré, J.J., Garbow, B.S. and Hillstrom, K.E. Testing unconstrained optimization software. ACM Transactions on Mathematical Software (TOMS), 1981, 7, 17-41.
22. Al-Baali, M. Improved Hessian approximations for the limited memory BFGS method. Numerical Algorithms, 1999, 22(1), 99-112.
23. Dolan, E.D. and Moré, J.J. Benchmarking optimization software with performance profiles. Mathematical Programming, 2002, 91(2), 201-213.
24. Al-Baali, M. Numerical experience with a class of self-scaling quasi-Newton algorithms. Journal of Optimization Theory and Applications, 1998, 96(3), 533-553.
25. Al-Baali, M. A rule for comparing two methods in practical optimization. Technical Report 119, Dipartimento di Sistemi, Università della Calabria, Italy, 1991.

Received 6 May 2021
Accepted 6 July 2021