CG Versus MINRES: An Empirical Comparison

SQU Journal for Science, 17 (1) (2012) 44-62
© 2012 Sultan Qaboos University

David Chin-Lung Fong* and Michael Saunders**

*ICME, Stanford University, California, USA, Email: david3.14159@gmail.com.
**Systems Optimization Laboratory, Department of Management Science and Engineering, Stanford University, California, USA, Email: saunders@stanford.edu.

ABSTRACT: For iterative solution of symmetric systems $Ax = b$, the conjugate gradient method (CG) is commonly used when $A$ is positive definite, while the minimum residual method (MINRES) is typically reserved for indefinite systems. We investigate the sequence of approximate solutions $x_k$ generated by each method and suggest that even if $A$ is positive definite, MINRES may be preferable to CG if iterations are to be terminated early. In particular, we show for MINRES that the solution norms $\|x_k\|$ are monotonically increasing when $A$ is positive definite (as was already known for CG), and the solution errors $\|x^* - x_k\|$ are monotonically decreasing. We also show that the backward errors for the MINRES iterates $x_k$ are monotonically decreasing.

KEYWORDS: Conjugate gradient method (CG), Conjugate residual method (CR), Iterative method, Krylov subspace method, Linear equations, Minimum residual method (MINRES), Sparse matrix, Trust-region method.

1. Introduction

The conjugate gradient method (CG) (Hestenes and Stiefel, 1952) and the minimum residual method (MINRES) (Paige and Saunders, 1975) are both Krylov subspace methods for the iterative solution of symmetric linear equations $Ax = b$. CG is commonly used when the matrix $A$ is positive definite, while MINRES is generally reserved for indefinite systems (van der Vorst, 2003, p85). We re-examine this wisdom from the point of view of early termination on positive-definite systems.

We assume that the system $Ax = b$ is real with $A$ symmetric positive definite (spd) and of dimension $n \times n$. The Lanczos process (Lanczos, 1950) with starting vector $b$ may be used to generate the $n \times k$ matrix $V_k = \begin{pmatrix} v_1 & v_2 & \cdots & v_k \end{pmatrix}$ and the $(k+1) \times k$ Hessenberg tridiagonal matrix $T_k$ such that $AV_k = V_{k+1} T_k$ for $k = 1, 2, \ldots, \ell-1$ and $AV_\ell = V_\ell T_\ell$ for some $\ell \le n$, where the columns of $V_k$ form a theoretically orthonormal basis for the $k$th Krylov subspace
$$\mathcal{K}_k(A, b) \equiv \mathrm{span}\{b, Ab, A^2 b, \ldots, A^{k-1} b\},$$
and $T_\ell$ is $\ell \times \ell$ and tridiagonal. Approximate solutions within the $k$th Krylov subspace may be formed as $x_k = V_k y_k$ for some $k$-vector $y_k$. As shown in (Paige and Saunders, 1975), three iterative methods CG, MINRES, and SYMMLQ may be derived by choosing $y_k$ appropriately at each iteration. CG is well defined if $A$ is spd, while MINRES and SYMMLQ are stable for any symmetric nonsingular $A$. As noted by Choi (2006), SYMMLQ can form an approximation $x_{k+1} = V_{k+1} y_{k+1}$ in the $(k+1)$th Krylov subspace when CG and MINRES are forming their approximations $x_k = V_k y_k$ in the $k$th subspace.
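For concreteness, the Lanczos recurrence that generates $V_k$ and $T_k$ can be sketched in a few lines. This is a minimal dense NumPy sketch under exact-arithmetic assumptions (the function name and interface are ours, not from the paper's software); careful implementations inside CG and MINRES update these quantities without storing $V_k$.

```python
import numpy as np

def lanczos(A, b, k):
    """Run k steps of the Lanczos process on symmetric A with starting vector b.

    Returns V (n x (k+1)), whose columns are theoretically orthonormal, and the
    (k+1) x k tridiagonal matrix T such that A V[:, :k] = V T.
    """
    n = b.size
    V = np.zeros((n, k + 1))
    T = np.zeros((k + 1, k))
    beta = np.linalg.norm(b)
    V[:, 0] = b / beta
    for j in range(k):
        w = A @ V[:, j]
        if j > 0:
            w -= T[j, j - 1] * V[:, j - 1]   # subtract beta_j * v_{j-1}
        alpha = V[:, j] @ w
        w -= alpha * V[:, j]                 # three-term recurrence
        beta = np.linalg.norm(w)
        T[j, j] = alpha
        T[j + 1, j] = beta
        if j + 1 < k:
            T[j, j + 1] = beta               # T is symmetric in its leading block
        if beta == 0:                        # Krylov subspace is invariant: stop
            break
        V[:, j + 1] = w / beta
    return V, T
```

A quick check on a small spd matrix confirms the relation $AV_k = V_{k+1}T_k$ and the orthonormality of the basis.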
It would be of future interest to compare all three methods on spd systems, but for the remainder of this paper we focus on CG and MINRES.

With different methods using the same information $V_{k+1}$ and $T_k$ to compute solution estimates $x_k = V_k y_k$ within the same Krylov subspace (for each $k$), it is commonly thought that the number of iterations required will be similar for each method, and hence that CG should be preferable on spd systems because it requires somewhat fewer floating-point operations per iteration. This view is justified if an accurate solution is required (stopping tolerance close to machine precision $\epsilon$). We show that with looser stopping tolerances, MINRES is sure to terminate sooner than CG when the stopping rule is based on the backward error for $x_k$, and by numerical examples we illustrate that the difference in iteration numbers can be substantial.

1.1 Notation

We study the application of CG and MINRES to real symmetric positive-definite (spd) systems $Ax = b$. The unique solution is denoted by $x^*$. The initial approximate solution is $x_0 = 0$, and $r_k = b - Ax_k$ is the residual vector for an approximation $x_k$ within the $k$th Krylov subspace. For a vector $v$ and matrix $A$, $\|v\|$ and $\|A\|$ denote the 2-norm and the Frobenius norm respectively, and $A \succ 0$ indicates that $A$ is spd.

2. Minimization properties of Krylov subspace methods

With exact arithmetic, the Lanczos process terminates with $k = \ell$ for some $\ell \le n$. To ensure that the approximations $x_k = V_k y_k$ improve by some measure as $k$ increases toward $\ell$, the Krylov solvers minimize some convex function within the expanding Krylov subspaces (Freund et al., 1992).
2.1 CG

When $A$ is spd, the quadratic form $\phi(x) \equiv \tfrac{1}{2} x^T A x - b^T x$ is bounded below, and its unique minimizer solves $Ax = b$. A characterization of the CG iterations is that they minimize the quadratic form within each Krylov subspace (Freund et al., 1992; Meurant, 2006, §2.4; Watkins, 2010, §8.8-§8.9):
$$x_k^C = V_k y_k^C, \quad \text{where} \quad y_k^C = \arg\min_y \phi(V_k y).$$
With $b = Ax^*$ and $2\phi(x_k) = x_k^T A x_k - 2\,x^{*T} A x_k$, this is equivalent to minimizing the function
$$\|x^* - x_k\|_A \equiv \sqrt{(x^* - x_k)^T A (x^* - x_k)},$$
known as the energy norm of the error, within each Krylov subspace. For some applications, this is a desirable property (Steihaug, 1983; Sun, 2003; Arioli, 2004; Meurant, 2006; Watkins, 2010).

2.2 MINRES

For nonsingular (and possibly indefinite) systems, the residual norm was used in (Paige and Saunders, 1975) to characterize the MINRES iterations:
$$x_k^M = V_k y_k^M, \quad \text{where} \quad y_k^M = \arg\min_y \|b - A V_k y\|. \qquad (1)$$
Thus, MINRES minimizes $\|r_k\|$ within the $k$th Krylov subspace. This was also an aim of Stiefel's Conjugate Residual method (CR) (Stiefel, 1955) for spd systems (and of Luenberger's extensions of CR to indefinite systems (Luenberger, 1969, 1970)). Thus, CR and MINRES must generate the same iterates on spd systems. We use this connection to prove that $\|x_k\|$ increases monotonically when MINRES is applied to an spd system.

2.3 CG and CR

The two methods for solving spd systems $Ax = b$ are summarized in Table 1. The first two columns are pseudocodes for CG and CR with iteration number $k$ omitted for clarity; they match our Matlab implementations. Note that $q = Ap$ in both methods, but it is not computed as such in CR. Termination occurs when $r = 0$ ($\rho = \beta = 0$). To prove our main result we need to introduce iteration indices; see column 3 of Table 1. Termination occurs when $r_\ell = 0$ for some index $\ell \le n$ ($\rho_\ell = \beta_\ell = 0$ and $r_\ell = s_\ell = p_\ell = q_\ell = 0$).
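The two iterations can be transcribed as follows. This is a NumPy sketch of the standard Hestenes-Stiefel CG and Stiefel CR recurrences that Table 1 tabulates (the implementations below are ours, not the paper's Matlab codes); variable names follow the text's $x, r, s, p, q, \rho, \alpha, \beta$.

```python
import numpy as np

def cg(A, b, tol=1e-10, maxit=500):
    """Conjugate gradient method for spd A, returning x and the norms ||x_k||."""
    x = np.zeros_like(b)
    r = b.copy()                 # r = b - A x with x = 0
    rho = r @ r                  # rho = r'r
    p = r.copy()
    xnorms = []
    for _ in range(maxit):
        if np.sqrt(rho) <= tol:  # terminate when ||r|| is small
            break
        q = A @ p
        alpha = rho / (p @ q)
        x += alpha * p
        r -= alpha * q
        rho_new = r @ r
        beta = rho_new / rho
        rho = rho_new
        p = r + beta * p
        xnorms.append(np.linalg.norm(x))
    return x, xnorms

def cr(A, b, tol=1e-10, maxit=500):
    """Conjugate residual method (same iterates as MINRES when A is spd)."""
    x = np.zeros_like(b)
    r = b.copy()
    s = A @ r
    rho = r @ s                  # rho = r'Ar
    p, q = r.copy(), s.copy()    # q = Ap, maintained without extra products with A
    xnorms = []
    for _ in range(maxit):
        if np.linalg.norm(r) <= tol:
            break
        alpha = rho / (q @ q)
        x += alpha * p
        r -= alpha * q
        s = A @ r
        rho_new = r @ s
        beta = rho_new / rho
        rho = rho_new
        p = r + beta * p
        q = s + beta * q         # q = Ap without forming Ap explicitly
        xnorms.append(np.linalg.norm(x))
    return x, xnorms
```

On an spd system both methods converge to the same solution, and by Theorem 2.3 the CR (and hence MINRES) solution norms are nondecreasing.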
Note: this $\ell$ is the same $\ell$ at which the Lanczos process theoretically terminates for the given $A$ and $b$.

Theorem 2.1 The following properties hold for Algorithm CR:
(a) $q_i^T q_j = 0$ $(i \ne j)$
(b) $r_i^T q_j = 0$ $(j \le i - 1)$

Proof. Given in Luenberger (1970, Theorem 1). □

Theorem 2.2 The following properties hold for Algorithm CR:
(a) $\alpha_i \ge 0$
(b) $\beta_i \ge 0$
(c) $p_i^T q_j \ge 0$
(d) $p_i^T p_j \ge 0$
(e) $x_i^T p_j \ge 0$
(f) $r_i^T p_j \ge 0$

Proof.
(a) Here we use the fact that $A$ is spd. The inequalities are strict until $i = \ell$ (and $r_\ell = 0$):
$$\rho_i = r_i^T s_i = r_i^T A r_i \ge 0 \quad (A \succ 0), \qquad (2)$$
$$\alpha_i = \rho_i / \|q_i\|^2 \ge 0.$$

(b) And again: $\beta_i = \rho_{i+1} / \rho_i \ge 0$ (by (2)).

(c) Case I: $i = j$. Then $p_i^T q_i = p_i^T A p_i \ge 0$ $(A \succ 0)$.

Case II: $i = j + k$, $k > 0$. Then
$$p_i^T q_j = p_i^T q_{i-k} = (r_i + \beta_{i-1} p_{i-1})^T q_{i-k} = r_i^T q_{i-k} + \beta_{i-1}\, p_{i-1}^T q_{i-k} = \beta_{i-1}\, p_{i-1}^T q_{i-k} \quad \text{(by Theorem 2.1(b))} \quad \ge 0,$$
where $\beta_{i-1} \ge 0$ by (b) and $p_{i-1}^T q_{i-k} \ge 0$ by induction, as $(i-1) - (i-k) = k - 1 < k$.

Case III: $j = i + k$, $k > 0$. Then
$$p_i^T q_j = p_i^T q_{i+k} = p_i^T A p_{i+k} = q_i^T p_{i+k} = q_i^T (r_{i+k} + \beta_{i+k-1} p_{i+k-1}) = \beta_{i+k-1}\, p_{i+k-1}^T q_i \quad \text{(by Theorem 2.1(b))} \quad \ge 0,$$
where $\beta_{i+k-1} \ge 0$ by (b) and $p_{i+k-1}^T q_i \ge 0$ by induction, as $(i+k-1) - i = k - 1 < k$.

(d) At termination, define $\mathcal{P} \equiv \mathrm{span}\{p_0, p_1, \ldots, p_{\ell-1}\}$ and $\mathcal{Q} \equiv \mathrm{span}\{q_0, \ldots, q_{\ell-1}\}$. By construction, $\mathcal{P} = \mathrm{span}\{b, Ab, \ldots, A^{\ell-1} b\}$ and $\mathcal{Q} = \mathrm{span}\{Ab, \ldots, A^{\ell} b\}$ (since $q_i = A p_i$). Again by construction, $x_\ell \in \mathcal{P}$, and since $r_\ell = 0$ we have $b = A x_\ell \in \mathcal{Q}$. We see that $\mathcal{P} \subseteq \mathcal{Q}$. By Theorem 2.1(a), $\{q_i/\|q_i\|\}_{i=0}^{\ell-1}$ forms an orthonormal basis for $\mathcal{Q}$. If we project $p_i \in \mathcal{P} \subseteq \mathcal{Q}$ onto this basis, we have
$$p_i = \sum_{k=0}^{\ell-1} \frac{p_i^T q_k}{q_k^T q_k}\, q_k,$$
where all coordinates are non-negative from (c). Similarly for any other $p_j$, $j < \ell$. Therefore $p_i^T p_j \ge 0$ for any $i, j < \ell$.

(e) By construction,
$$x_i = x_{i-1} + \alpha_{i-1} p_{i-1} = \sum_{k=1}^{i} \alpha_{k-1} p_{k-1} \qquad (x_0 = 0).$$
Therefore $x_i^T p_j \ge 0$ by (d) and (a).
(f) Note that any $r_i$ can be expressed as a sum of the $q_l$:
$$r_i = r_{i+1} + \alpha_i q_i = \cdots = \alpha_i q_i + \alpha_{i+1} q_{i+1} + \cdots + \alpha_{\ell-1} q_{\ell-1}$$
(since $r_\ell = 0$). Thus we have
$$r_i^T p_j = (\alpha_i q_i + \cdots + \alpha_{\ell-1} q_{\ell-1})^T p_j \ge 0,$$
where the inequality follows from (a) and (c). □

Table 1. Pseudocode for algorithms CG and CR

We are now able to prove our main theorem about the monotonic increase of $\|x_k\|$ for CR and MINRES. A similar result was proved for CG by Steihaug (1983).

Theorem 2.3 For CR (and hence MINRES) on an spd system $Ax = b$, $\|x_k\|$ increases monotonically.

Proof.
$$\|x_i\|^2 - \|x_{i-1}\|^2 = 2\,\alpha_{i-1}\, x_{i-1}^T p_{i-1} + \alpha_{i-1}^2\, p_{i-1}^T p_{i-1} \ge 0,$$
where the last inequality follows from Theorem 2.2(a), (d) and (e). Therefore $\|x_{i-1}\| \le \|x_i\|$. □

$\|x^* - x_k\|$ is known to be monotonic for CG (Hestenes and Stiefel, 1952). The corresponding result for CR is a direct consequence of (Hestenes and Stiefel, 1952, Theorem 7.5). However, the second half of that theorem, $\|x^* - x_k^C\| > \|x^* - x_k^M\|$, rarely holds in machine arithmetic. We give here an alternative proof that does not depend on CG.

Theorem 2.4 For CR (and hence MINRES) on an spd system $Ax = b$, the error $\|x^* - x_k\|$ decreases monotonically.

Proof. From the update rule for $x_k$, we can express the final solution $x_\ell = x^*$ as
$$x_\ell = x_{\ell-1} + \alpha_{\ell-1} p_{\ell-1} = \cdots = x_k + \alpha_k p_k + \cdots + \alpha_{\ell-1} p_{\ell-1} \qquad (3)$$
$$\phantom{x_\ell} = x_{k+1} + \alpha_{k+1} p_{k+1} + \cdots + \alpha_{\ell-1} p_{\ell-1}. \qquad (4)$$
Using equations (3) and (4), we can write
$$\|x_\ell - x_k\|^2 - \|x_\ell - x_{k+1}\|^2 = (x_\ell - x_k)^T (x_\ell - x_k) - (x_\ell - x_{k+1})^T (x_\ell - x_{k+1}) = \alpha_k^2\, p_k^T p_k + 2\,\alpha_k \sum_{i=k+1}^{\ell-1} \alpha_i\, p_k^T p_i \ge 0,$$
where the last inequality follows from Theorem 2.2(a), (d). □

The energy norm error $\|x^* - x_k\|_A$ is monotonic for CG by design. The corresponding result for CR is given in (Hestenes and Stiefel, 1952, Theorem 7.4). We give an alternative proof here.

Theorem 2.5 For CR (and hence MINRES) on an spd system $Ax = b$, the error in energy norm $\|x^* - x_k\|_A$ is strictly decreasing.

Proof.
From (3) and (4), we can write
$$\|x_\ell - x_k\|_A^2 - \|x_\ell - x_{k+1}\|_A^2 = (x_\ell - x_k)^T A (x_\ell - x_k) - (x_\ell - x_{k+1})^T A (x_\ell - x_{k+1}) = \alpha_k^2\, p_k^T A p_k + 2\,\alpha_k \sum_{i=k+1}^{\ell-1} \alpha_i\, p_i^T A p_k = \alpha_k^2\, q_k^T p_k + 2\,\alpha_k \sum_{i=k+1}^{\ell-1} \alpha_i\, p_i^T q_k > 0,$$
where the last inequality follows from Theorem 2.2(a), (c). □

3. Backward error analysis

For many physical problems requiring numerical solution, we are given inexact or uncertain input data (in this case $A$ and/or $b$). It is not justifiable to seek a solution beyond the accuracy of the data (Dongarra et al., 1979). Instead, it is more reasonable to stop an iterative solver once we know that the current approximate solution solves a nearby problem. The measure of "nearby" should match the error in the input data. The design of such stopping rules is an important application of backward error analysis.

For a consistent linear system $Ax = b$, we think of $x_k$ as coming from the $k$th iteration of one of the iterative solvers. Following Titley-Péloquin (2010) we say that $x_k$ is an acceptable solution if and only if there exist perturbations $E$ and $f$ satisfying
$$(A + E)\, x_k = b + f, \qquad \|E\| \le \alpha \|A\|, \qquad \|f\| \le \beta \|b\|, \qquad (5)$$
for some tolerances $\alpha \ge 0$, $\beta \ge 0$ that reflect the (preferably known) accuracy of the data. We are naturally interested in minimizing the size of $E$ and $f$. If we define the optimization problem
$$\min_{\xi,\, E,\, f}\ \xi \quad \text{s.t.} \quad (A + E)\, x_k = b + f, \qquad \|E\| \le \xi\,\alpha \|A\|, \qquad \|f\| \le \xi\,\beta \|b\|,$$
to have optimal solution $\xi_k$, $E_k$, $f_k$ (all functions of $x_k$, $\alpha$, and $\beta$), we see that $x_k$ is an acceptable solution if and only if $\xi_k \le 1$. We call $\xi_k$ the normwise relative backward error (NRBE) for $x_k$.

With $r_k = b - A x_k$, the optimal solution $\xi_k$, $E_k$, $f_k$ is shown in (Titley-Péloquin, 2010) to be
$$\phi_k = \frac{\alpha \|A\| \|x_k\|}{\alpha \|A\| \|x_k\| + \beta \|b\|}, \qquad E_k = \frac{\phi_k}{\|x_k\|^2}\, r_k x_k^T, \qquad (6)$$
$$f_k = -(1 - \phi_k)\, r_k, \qquad \xi_k = \frac{\|r_k\|}{\alpha \|A\| \|x_k\| + \beta \|b\|}. \qquad (7)$$
See (Higham, 2002, p12) for the case $\alpha = 0$ and (Higham, 2002, §7.1 and p336) for the case $\beta = \alpha$.

3.1 Stopping rule

For general tolerances $\alpha$ and $\beta$, the condition $\xi_k \le 1$ for $x_k$ to be an acceptable solution becomes
$$\|r_k\| \le \alpha \|A\| \|x_k\| + \beta \|b\|, \qquad (8)$$
the stopping rule used in LSQR for consistent systems (Paige and Saunders, 1982a, p54, rule S1).

Figure 1. Comparison of the distribution of condition numbers for the matrices used for CG vs MINRES, before and after diagonal preconditioning.

3.2 Monotonic backward errors

Of interest is the size of the perturbations to $A$ and $b$ for which $x_k$ is an exact solution of $Ax = b$. From (6)-(7), the perturbations have the following norms:
$$\|E_k\| = \phi_k \frac{\|r_k\|}{\|x_k\|} = \frac{\alpha \|A\| \|r_k\|}{\alpha \|A\| \|x_k\| + \beta \|b\|}, \qquad (9)$$
$$\|f_k\| = (1 - \phi_k) \|r_k\| = \frac{\beta \|b\| \|r_k\|}{\alpha \|A\| \|x_k\| + \beta \|b\|}. \qquad (10)$$
Since $\|x_k\|$ is monotonically increasing for CG and MINRES, the denominators in (9)-(10) are monotonically increasing for both solvers. Since $\|r_k\|$ is monotonically decreasing for MINRES (but not for CG), we have the following result.

Theorem 3.1 Suppose $\alpha > 0$ and $\beta > 0$ in (5). For CR and MINRES (but not CG), the relative backward errors $\|E_k\|/\|A\|$ and $\|f_k\|/\|b\|$ decrease monotonically.

Proof. This follows from (9)-(10) with $\|x_k\|$ increasing for both solvers and $\|r_k\|$ decreasing for CR and MINRES but not for CG. □

4. Numerical results

Here we compare the convergence of CG and MINRES on various spd systems $Ax = b$ and some associated indefinite systems $(A - \delta I)x = b$. The test examples are drawn from the University of Florida Sparse Matrix Collection (Davis, 2007). We experimented with all 26 cases for which $A$ is real spd and $b$ is supplied. In Matlab we computed the condition number for each test matrix by finding the largest and smallest eigenvalues using eigs(A,1,'LM') and eigs(A,1,'SM') respectively. For this test set, the condition numbers range from 1.7E+03 to 3.1E+13.
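The optimal perturbations (6)-(7) and their norms (9)-(10) are easy to compute and verify numerically. A sketch (the function name `nrbe` is ours; $\|A\|$ is taken as the Frobenius norm, as in Section 1.1, and for the rank-one $E_k$ the 2-norm and Frobenius norm coincide):

```python
import numpy as np

def nrbe(A, b, x, alpha, beta):
    """Normwise relative backward error xi_k and optimal perturbations (6)-(7).

    Returns (xi, E, f) such that (A + E) x = b + f holds exactly.
    """
    r = b - A @ x
    normA = np.linalg.norm(A, 'fro')
    denom = alpha * normA * np.linalg.norm(x) + beta * np.linalg.norm(b)
    xi = np.linalg.norm(r) / denom
    phi = alpha * normA * np.linalg.norm(x) / denom
    E = (phi / (x @ x)) * np.outer(r, x)    # rank-one perturbation of A, eq. (6)
    f = -(1.0 - phi) * r                    # perturbation of b, eq. (7)
    return xi, E, f
```

Indeed, $(A + E_k)x_k = Ax_k + \phi_k r_k = b - (1 - \phi_k) r_k = b + f_k$, so the constraint in (5) holds exactly for any trial iterate.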
Since $A$ is spd, we apply diagonal preconditioning by redefining $A$ and $b$ as follows:
$$d = \mathrm{diag}(A), \quad D = \mathrm{diag}(1./\mathrm{sqrt}(d)), \quad A \leftarrow DAD, \quad b \leftarrow Db, \quad b \leftarrow b/\|b\|.$$
Thus in the figures below we have $\mathrm{diag}(A) = I$ and $\|b\| = 1$. With this preconditioning, the condition numbers range from 1.2E+01 to 2.2E+11. The distribution of the condition numbers of the test matrices before and after preconditioning is shown in Figure 1.

The stopping rule used for CG and MINRES was (8) with $\alpha = 0$ and $\beta = 10^{-8}$. That is, $\|r_k\| \le 10^{-8}\|b\| = 10^{-8}$ (but with a maximum of $5n$ iterations for spd systems and $n$ iterations for indefinite systems).

4.1 Positive-definite systems

In plotting backward errors, we assume for simplicity that $\alpha > 0$ and $\beta = 0$ in (5)-(7), even though this doesn't match the choice of $\alpha$ and $\beta$ in the stopping rule (8). This gives $f_k = 0$ and $\|E_k\| = \|r_k\|/\|x_k\|$ in (9). Thus, as in Theorem 3.1, we expect $\|E_k\|$ to decrease monotonically for CR and MINRES but not for CG.

Figure 2. Comparison of backward and forward errors for CG and MINRES solving two spd systems $Ax = b$.

Figure 3. Comparison of backward and forward errors for CG and MINRES solving two more spd systems $Ax = b$.

In Figures 2 and 3, we plot $\|r_k\|/\|x_k\|$, $\|x^* - x_k\|_A$ (the quantity that CG minimizes at each iteration), and $\|x^* - x_k\|$ against iteration number $k$ for CG and MINRES on four different problems. For CG, the plots confirm that $\|x^* - x_k\|_A$ and $\|x^* - x_k\|$ are monotonic. For MINRES, the plots confirm the predictions of Theorems 3.1, 2.5, and 2.4 that $\|r_k\|/\|x_k\|$, $\|x^* - x_k\|_A$, and $\|x^* - x_k\|$ are monotonic.

Figure 2 (left) shows problem Schenk_AFE_af_shell8 with $A$ of size $504855 \times 504855$ and cond$(A)$ = 2.7E+05 (MINRES stops significantly sooner than CG). From the plot of backward errors $\|r_k\|/\|x_k\|$, we see that both CG and MINRES converge quickly in the early iterations.
Then the backward error of MINRES plateaus at about iteration 80, and the backward error of CG stays about 1 order of magnitude behind MINRES. A similar phenomenon of fast convergence in the early iterations followed by slow convergence is observed in the energy-norm error and 2-norm error plots.

Figure 2 (right) shows problem Cannizzo_sts4098 with $A$ of size $4098 \times 4098$ and cond$(A)$ = 6.7E+03 (MINRES stops slightly sooner than CG). MINRES converges slightly faster in terms of backward error, while CG converges slightly faster in terms of both error norms.

Figure 4. Comparison of residual and solution norms for CG and MINRES solving two spd systems $Ax = b$.

Figure 3 (left) shows problem Simon_raefsky4 with $A$ of size $19779 \times 19779$ and cond$(A)$ = 2.2E+11. Because of the high condition number, both algorithms hit the $5n$ iteration limit. We see that the backward error for MINRES converges faster than for CG, as expected. For the energy-norm error, CG is able to decrease over 5 orders of magnitude while MINRES plateaus after a decrease of 2 orders of magnitude. For both the energy-norm error and the 2-norm error, MINRES reaches a lower point than CG for some iterations. This must be due to numerical error in CG and MINRES (a result of loss of orthogonality in $V_k$).

Figure 3 (right) shows problem BenElechi_BenElechi1 with $A$ of size $245874 \times 245874$ and cond$(A)$ = 1.8E+09. The backward error of MINRES stays ahead of CG by 2 orders of magnitude for most iterations. Around iteration 32000, the backward error of both algorithms goes down rapidly and CG catches up with MINRES. Both algorithms exhibit a plateau in the energy-norm error for the first 20000 iterations. The error norms for CG start decreasing around iteration 20000 and decrease even faster after iteration 30000.

Figures 4 and 5 show $\|r_k\|$ and $\|x_k\|$ against iteration number $k$ for CG and MINRES on four different problems. The solution norms grow somewhat faster for CG than for MINRES.
Both reach the limiting value $\|x^*\|$ significantly before $x_k$ is close to $x^*$.

Figure 4 (left) shows problem Simon_olafu with $A$ of size $16146 \times 16146$ and cond$(A)$ = 4.3E+08, and (right) shows problem Cannizzo_sts4098 with $A$ of size $4098 \times 4098$ and cond$(A)$ = 6.7E+03. We see that $\|x_k\|$ is monotonically increasing for both solvers, and the $\|x_k\|$ values rise fairly rapidly to their limiting value, with a moderate delay for MINRES.

Figure 5. Comparison of residual and solution norms for CG and MINRES solving two more spd systems $Ax = b$.

Figure 5 (left) shows problem Schmid_thermal1 with $A$ of size $82654 \times 82654$ and cond$(A)$ = 3.0E+05, and (right) shows problem BenElechi_BenElechi1 with $A$ of size $245874 \times 245874$ and cond$(A)$ = 1.8E+09, in which the residual decrease and the solution norm increase are somewhat slower than typical. The rise of $\|x_k\|$ for MINRES is rather more delayed. In the second case, if the stopping tolerance were $\beta = 10^{-6}$ rather than $\beta = 10^{-8}$, the final MINRES $\|x_k\|$ $(k \approx 10000)$ would be less than half the exact value $\|x^*\|$. It will be of future interest to evaluate this effect within the context of trust-region methods for optimization.

4.1.1 Why does $\|r_k\|$ for CG lag behind MINRES?

It is commonly thought that even though MINRES is known to minimize $\|r_k\|$ at each iteration, the cumulative minimum of $\|r_k\|$ for CG should approximately match that of MINRES. That is,
$$\min_{0 \le i \le k} \|r_i^C\| \approx \|r_k^M\|.$$
However, in Figures 2 and 3 we see that $\|r_k\|$ for MINRES is often smaller than for CG by 1 or 2 orders of magnitude. This phenomenon can be explained by the following relation between $\|r_k^C\|$ and $\|r_k^M\|$ (Greenbaum, 1997, Lemma 5.4.1; Titley-Peloquin, 2011):
$$\|r_k^C\| = \frac{\|r_k^M\|}{\sqrt{1 - \left(\|r_k^M\| / \|r_{k-1}^M\|\right)^2}}. \qquad (11)$$
From (11), one can infer that if $\|r_k^M\|$ decreases a lot between iterations $k-1$ and $k$, then $\|r_k^C\|$ will be roughly the same as $\|r_k^M\|$. The converse also holds, in that $\|r_k^C\|$ will be much larger than $\|r_k^M\|$ if MINRES is almost stalling at iteration $k$ (i.e., $\|r_k^M\|$ did not decrease much relative to the previous iteration).

The above analysis was pointed out by Titley-Peloquin (2011) in comparing LSQR and LSMR (Fong and Saunders, 2011). We repeat the analysis here for CG vs MINRES and extend it to demonstrate why the lag is typical for large problems. With $\alpha = 0$ in stopping rule (8), CG and MINRES stop when $\|r_k\| \le \beta \|b\|$. If this occurs at iteration $l$, we have
$$\prod_{k=1}^{l} \frac{\|r_k\|}{\|r_{k-1}\|} = \frac{\|r_l\|}{\|b\|} \le \beta.$$
Thus on average, $\|r_k^M\|/\|r_{k-1}^M\|$ will be closer to 1 if $l$ is large. This means that the larger $l$ is (in absolute terms), the more CG will lag behind MINRES (a bigger gap between $\|r_k^C\|$ and $\|r_k^M\|$).

4.2 Indefinite systems

A key part of Steihaug's trust-region method for large-scale unconstrained optimization (Steihaug, 1983) (see also (Conn et al., 2000)) is his proof that when CG is applied to a symmetric (possibly indefinite) system $Ax = b$, the solution norms $\|x_1\|, \ldots, \|x_k\|$ are strictly increasing as long as $p_j^T A p_j > 0$ for all iterations $1 \le j \le k$. (We are using the same notation as in Table 1.)

From our proof of Theorem 2.2, we see that the same property holds for CR and MINRES as long as both $p_j^T A p_j > 0$ and $r_j^T A r_j > 0$ for all iterations $1 \le j \le k$. In case future research finds that MINRES is a useful solver in the trust-region context, it is of interest now to offer some empirical results about the behavior of $\|x_k\|$ when MINRES is applied to indefinite systems.

First, on the nonsingular indefinite system
$$\begin{pmatrix} 2 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 2 \end{pmatrix} x = \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}, \qquad (12)$$
MINRES gives non-monotonic solution norms, as shown in the left plot of Figure 6. The decrease in $\|x_k\|$ implies that the backward errors $\|r_k\|/\|x_k\|$ may not be monotonic, as illustrated in the right plot.

Figure 6. For MINRES on the indefinite problem (12), $\|x_k\|$ and the backward error $\|r_k\|/\|x_k\|$ are both slightly non-monotonic.

More generally, we can gain an impression of the behavior of $\|x_k\|$ by recalling from (Choi et al., 2011) the connection between MINRES and MINRES-QLP. Both methods compute the iterates $x_k^M = V_k y_k^M$ in (1) from the subproblems
$$y_k^M = \arg\min_y \|T_k y - \beta_1 e_1\| \quad \text{and possibly} \quad T_\ell\, y_\ell^M = \beta_1 e_1.$$
When $A$ is nonsingular or $Ax = b$ is consistent (which we now assume), $y_k^M$ is uniquely defined for each $k \le \ell$, and the methods compute the same iterates $x_k^M$ (but by different numerical methods). In fact they both compute the expanding QR factorizations
$$Q_k \begin{pmatrix} T_k & \beta_1 e_1 \end{pmatrix} = \begin{pmatrix} R_k & t_k \\ 0 & \bar{\phi}_{k+1} \end{pmatrix}$$
(with $R_k$ upper tridiagonal), and MINRES-QLP also computes the orthogonal factorizations $R_k P_k = L_k$ (with $L_k$ lower tridiagonal), from which the $k$th solution estimate is defined by $W_k = V_k P_k$, $L_k u_k = t_k$, and $x_k^M = W_k u_k$. As shown in (Choi et al., 2011, §5.3), the construction of these quantities is such that the first $k-3$ columns of $W_k$ are the same as those of $W_{k-1}$, and the first $k-3$ elements of $u_k$ are the same as those of $u_{k-1}$.

Since $W_k$ has orthonormal columns, $\|x_k^M\| = \|u_k\|$, where the first $k-2$ elements of $u_k$ are unaltered by later iterations. As shown in (Choi et al., 2011, §6.5), this means that certain quantities can be cheaply updated to give norm estimates of the form
$$\|x_k^M\|^2 = \chi^2 + \hat{\mu}_{k-1}^2 + \tilde{\mu}_k^2,$$
where it is clear that $\chi^2$ increases monotonically. Although the last two terms are of unpredictable size, $\|x_k^M\|^2$ tends to be dominated by the monotonic term $\chi^2$, and we can expect $\|x_k^M\|$ to be approximately monotonic as $k$ increases from 1 to $\ell$.

Figure 7. Residual norms and solution norms when MINRES is applied to two indefinite systems $(A - \delta I)x = b$.

Experimentally we find that for most MINRES iterations on an indefinite problem, $\|x_k\|$ does increase.
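The non-monotonicity on the small system (12) can be reproduced directly from definition (1): build an orthonormal Krylov basis and solve each small least-squares subproblem explicitly. This is a brute-force NumPy sketch (the helper name is ours), not a practical MINRES implementation. The solution norm dips at $k = 2$ before jumping to $\|x^*\| = \sqrt{2}$ at $k = 3$:

```python
import numpy as np

def minres_iterates(A, b):
    """MINRES iterates x_k = V_k y_k with y_k = argmin ||b - A V_k y||, via dense lstsq."""
    n = b.size
    V = np.zeros((n, 0))
    v = b / np.linalg.norm(b)
    xs = []
    for _ in range(n):
        V = np.column_stack([V, v])
        y, *_ = np.linalg.lstsq(A @ V, b, rcond=None)  # subproblem (1)
        xs.append(V @ y)
        w = A @ v
        w -= V @ (V.T @ w)            # orthogonalize against the current basis
        nw = np.linalg.norm(w)
        if nw < 1e-12:                # invariant subspace reached
            break
        v = w / nw
    return xs

# The indefinite system (12)
A = np.array([[2.0, 1, 1], [1, 0, 1], [1, 1, 2]])
b = np.array([0.0, 1, 1])
xs = minres_iterates(A, b)
norms = [np.linalg.norm(x) for x in xs]
```

By construction the residual norms are non-increasing (each subspace contains the previous one), yet the solution norms are not monotonic, which is exactly the behavior plotted in Figure 6.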
To obtain indefinite examples that were sensibly scaled, we used the four spd $(A, b)$ cases in Figures 4 and 5, applied diagonal scaling as before, and solved $(A - \delta I)x = b$ with $\delta = 0.5$, where $A$ and $b$ are now the scaled quantities (so that $\mathrm{diag}(A) = I$). The number of iterations increased significantly but was limited to $n$.

Figures 7 and 8 show residual norms $\log_{10}\|r_k\|$ and solution norms $\|x_k\|$ against iteration number $k$ when MINRES is applied to the four indefinite systems $(A - \delta I)x = b$, where $\delta = 0.5$ is large enough to make the systems indefinite.

Figure 7 (left) shows problem Simon_olafu with $n = 16146$, and (right) shows problem Cannizzo_sts4098 with $n = 4098$ (where $A$ is one of the spd matrices in Figure 4). The values of $\|x_k\|$ are essentially monotonic, and the backward errors $\|r_k\|/\|x_k\|$ (not shown) were even closer to being monotonic (at least for the first $n$ iterations). During the $n = 16146$ iterations on Simon_olafu, $\|x_k\|$ increased 83% of the time and the backward errors decreased 96% of the time; during the $n = 4098$ iterations on Cannizzo_sts4098, $\|x_k\|$ increased 90% of the time and the backward errors decreased 98% of the time.

Figure 8. Residual norms and solution norms when MINRES is applied to two more indefinite systems $(A - \delta I)x = b$.

Figure 8 (left) shows problem Schmid_thermal1 with $n = 82654$, and (right) shows problem BenElechi_BenElechi1 with $n = 245874$ (where $A$ is one of the spd matrices in Figure 5). Problem Schmid_thermal1 reveals a definite period of decrease in $\|x_k\|$: there is a mild but clear decrease over an interval of about 10000 iterations. Nevertheless, during the $n = 82654$ iterations, $\|x_k\|$ increased 83% of the time and the backward errors $\|r_k\|/\|x_k\|$ decreased 91% of the time. Problem BenElechi_BenElechi1 is more like those in Figure 7, where the solution norms and backward errors are essentially monotonic.
During the $n = 245874$ iterations, $\|x_k\|$ increased 88% of the time and the backward errors $\|r_k\|/\|x_k\|$ (not shown) decreased 95% of the time.

5. Conclusions

For full-rank least-squares problems $\min \|Ax - b\|$, the solvers LSQR (Paige and Saunders, 1982a, 1982b) and LSMR (Fong and Saunders, 2011, 2010) are equivalent to CG and MINRES on the (spd) normal equation $A^T A x = A^T b$. Comparisons in (Fong and Saunders, 2011) indicated that LSMR can often stop much sooner than LSQR when the stopping rule is based on Stewart's backward error norm $\|A^T r_k\|/\|r_k\|$ for least-squares problems (Stewart, 1977).

Table 2. Comparison of CG and MINRES properties on an spd system $Ax = b$. [11]=Hestenes and Stiefel (1952); [18]=Paige and Saunders (1975); [21]=Steihaug (1983)

Table 3. Comparison of LSQR and LSMR properties on $\min \|Ax - b\|$. [7]=Fong (2011); [8]=Fong and Saunders (2011); [19]=Paige and Saunders (1982)

Our theoretical and experimental results here provide analogous evidence that MINRES can often stop much sooner than CG on spd systems when the stopping rule is based on the backward error $\|r_k\|/\|x_k\|$ for $Ax = b$ (or the more general backward errors in Theorem 3.1). In some cases, MINRES can converge faster than CG by as much as 2 orders of magnitude (Figure 3). On the other hand, CG converges somewhat faster than MINRES in terms of both $\|x^* - x_k\|_A$ and $\|x^* - x_k\|$ (same figure).

For spd systems, Table 2 summarizes properties that were already known by Hestenes and Stiefel (1952) and Steihaug (1983), along with the two additional properties we proved here (Theorems 2.3 and 3.1). These theorems and experiments on CG and MINRES are part of the first author's PhD thesis (Fong, 2011), which also discusses LSQR and LSMR and derives some new results for both solvers. Table 3 summarizes the known results for LSQR and LSMR in (Paige and Saunders, 1982a) and (Fong and Saunders, 2011), respectively, and the newly derived properties for both solvers in (Fong, 2011).

6. Acknowledgements

We kindly thank Professor Mehiddin Al-Baali and other colleagues for organizing the Second International Conference on Numerical Analysis and Optimization at Sultan Qaboos University (January 3-6, 2011, Muscat, Sultanate of Oman). Their wish to publish some of the conference papers in a special issue of SQU Journal for Science gave added motivation for this research. We also thank Dr Sou-Cheng Choi for many helpful discussions of the iterative solvers, and Michael Friedlander for a final welcome tip.

The first author (David Fong) was partially supported by a Stanford Graduate Fellowship. The second author (Michael Saunders) was partially supported by Office of Naval Research grant N00014-08-1-0191 and NSF grant DMS-1009005, and by the U.S. Army Research Laboratory, through the Army High Performance Computing Research Center, Cooperative Agreement W911NF-07-0027.

7. References

ARIOLI, M. 2004. A stopping criterion for the conjugate gradient algorithm in a finite element method framework. Numerische Mathematik, 97: 1-24.
CHOI, S.-C. 2006. Iterative Methods for Singular Linear Equations and Least-Squares Problems. PhD thesis, Stanford University, Stanford, California, USA.
CHOI, S.-C., PAIGE, C.C. and SAUNDERS, M.A. 2011. MINRES-QLP: A Krylov subspace method for indefinite or singular symmetric systems. SIAM Journal on Scientific Computing, 33: 1810-1836.
CONN, A.R., GOULD, N.I.M. and TOINT, PH.L. 2000. Trust-Region Methods. SIAM, Philadelphia, USA.
DAVIS, T.A. 2007. University of Florida Sparse Matrix Collection. http://www.cise.ufl.edu/research/sparse/matrices.
DONGARRA, J.J., BUNCH, J.R., MOLER, C.B. and STEWART, G.W. 1979. LINPACK Users' Guide. SIAM, Philadelphia, USA.
FONG, D.C.-L. 2011. Minimum-Residual Methods for Sparse Least-Squares Using Golub-Kahan Bidiagonalization. PhD thesis, Stanford University, Stanford, California, USA.
FONG, D.C.-L. and SAUNDERS, M.A. 2010. LSMR software for linear systems and least squares. http://www.stanford.edu/group/SOL/software.html.
FONG, D.C.-L. and SAUNDERS, M.A. 2011. LSMR: An iterative algorithm for sparse least-squares problems. SIAM Journal on Scientific Computing, 33: 2950-2971.
FREUND, R.W., GOLUB, G.H. and NACHTIGAL, N.M. 1992. Iterative solution of linear systems. Acta Numerica, 1: 57-100.
GREENBAUM, A. 1997. Iterative Methods for Solving Linear Systems. SIAM, Philadelphia, USA.
HESTENES, M.R. and STIEFEL, E. 1952. Methods of conjugate gradients for solving linear systems. Journal of Research of the National Bureau of Standards, 49: 409-436.
HIGHAM, N.J. 2002. Accuracy and Stability of Numerical Algorithms (2nd Edition). SIAM, Philadelphia, USA.
LANCZOS, C. 1950. An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. Journal of Research of the National Bureau of Standards, 45: 255-282.
LUENBERGER, D.G. 1969. Hyperbolic pairs in the method of conjugate gradients. SIAM Journal on Applied Mathematics, 17: 1263-1267.
LUENBERGER, D.G. 1970. The conjugate residual method for constrained minimization problems. SIAM Journal on Numerical Analysis, 7: 390-398.
MEURANT, G.A. 2006. The Lanczos and Conjugate Gradient Algorithms: From Theory to Finite Precision Computations. Vol. 19 of Software, Environments, and Tools, SIAM, Philadelphia, USA.
PAIGE, C.C. and SAUNDERS, M.A. 1975. Solution of sparse indefinite systems of linear equations. SIAM Journal on Numerical Analysis, 12: 617-629.
PAIGE, C.C. and SAUNDERS, M.A. 1982a. LSQR: An algorithm for sparse linear equations and sparse least squares. ACM Transactions on Mathematical Software, 8: 43-71.
PAIGE, C.C. and SAUNDERS, M.A. 1982b. Algorithm 583; LSQR: Sparse linear equations and least-squares problems. ACM Transactions on Mathematical Software, 8: 195-209.
STEIHAUG, T. 1983. The conjugate gradient method and trust regions in large scale optimization. SIAM Journal on Numerical Analysis, 20: 626-637.
STEWART, G.W. 1977. Research, development and LINPACK. In Mathematical Software III, J.R. RICE, ed., Academic Press, New York, USA, 1-14.
STIEFEL, E. 1955. Relaxationsmethoden bester Strategie zur Lösung linearer Gleichungssysteme. Commentarii Mathematici Helvetici, 29: 157-179.
SUN, Y. 2003. The Filter Algorithm for Solving Large-Scale Eigenproblems from Accelerator Simulations. PhD thesis, Stanford University, Stanford, California, USA.
TITLEY-PELOQUIN, D. 2010. Backward Perturbation Analysis of Least Squares Problems. PhD thesis, School of Computer Science, McGill University, Canada.
TITLEY-PELOQUIN, D. 2011. Convergence of backward error bounds in LSQR and LSMR. Private communication.
VAN DER VORST, H.A. 2003. Iterative Krylov Methods for Large Linear Systems. Cambridge University Press, Cambridge, UK.
WATKINS, D.S. 2010. Fundamentals of Matrix Computations (3rd Edition). Pure and Applied Mathematics, Wiley, Hoboken, New Jersey, USA.

Received 20 October 2011
Accepted 29 November 2011