SQU Journal for Science, 17 (1) (2012) 63-79
© 2012 Sultan Qaboos University

Preconditioning Large Indefinite Linear Systems

Giovanni Fasano* and Massimo Roma**

*Department of Management, University Ca' Foscari Venezia, Venice, Italy, and The Italian Ship Model Basin - INSEAN, Rome, Italy. E-mail: fasano@unive.it.
**Dipartimento di Ingegneria Informatica Automatica e Gestionale "A. Ruberti", SAPIENZA Università di Roma, Roma, Italy. E-mail: roma@dis.uniroma1.it.

ABSTRACT: After briefly recalling some relevant approaches for preconditioning large symmetric linear systems, we describe a novel class of preconditioners. Our proposal is tailored to large indefinite linear systems, which arise very frequently in many different contexts of numerical analysis and nonlinear optimization. Our preconditioners are built as a byproduct of the Krylov subspace method used to solve the system. We describe the theoretical properties of the proposed class of preconditioners, namely their capability of both shifting some eigenvalues of the system matrix to controlled values, and reducing the modulus of the remaining ones. The results of a numerical experimentation give evidence of the good performance of our proposal.

KEYWORDS: Krylov subspace methods, Large indefinite linear systems, Large scale optimization, Preconditioners, Quasi-Newton updates.
1. Introduction

The efficient solution of large linear systems (or of a sequence of slowly varying linear systems) is of fundamental importance in many contexts of numerical analysis and nonlinear optimization. In this paper we first recall a few relevant approaches for preconditioning large indefinite linear systems. Observe that in many contexts of numerical analysis (see e.g. http://math.nist.gov/MatrixMarket) and nonlinear optimization, the efficient iterative solution of linear systems and of sequences of linear systems is sought. Truncated Newton methods in unconstrained optimization, KKT systems arising in constrained optimization, interior point methods, and PDE constrained optimization are just some examples from optimization. We first show that using information from quasi-Newton updates may often provide effective preconditioners. The latter are sometimes endowed with theoretical properties related to the spectrum and the condition number of the preconditioned matrix. Then, we describe a novel class of preconditioners for the solution of large indefinite linear systems, without assuming any sparsity pattern of the system matrix. In particular, the class of preconditioners we propose uses information collected from Krylov subspace methods, in order to assess the structural properties of the system matrix. We iteratively construct our preconditioners either by indirectly using (but not performing) a factorization of the system matrix (see, e.g. (Fasano and Roma, 2007; Golub and Van Loan, 1996; Stoer, 1983)), obtained as a byproduct of Krylov subspace methods, or by computing a Jordan canonical form of a very small matrix.
We address our preconditioners using a general Krylov subspace method; then, we prove theoretical properties for such preconditioners, and describe results which indicate how to possibly select the parameters which characterize their definition. The basic idea of our approach is to apply a Krylov subspace method to generate a positive definite approximation of the inverse of the system matrix. The latter is then used to build our preconditioners, which require storing just a few vectors, without any product of matrices. We assume that the entries of the system matrix are not known, and that the information necessary to build the preconditioner is gained by a routine which computes the product of the system matrix times a vector. In the paper (Fasano and Roma, 2011b) we test our preconditioners in both numerical analysis and nonconvex optimization frameworks. In particular, we test our proposal on significant linear systems from the literature. Then, we focus on the so-called Newton-Krylov methods, also known as Truncated Newton methods (see (Nash, 2000) for a survey). In these contexts, both positive definite and indefinite linear systems are considered. We recall that in case the optimization problem at hand is nonconvex, i.e. the Hessian matrix of the objective function is possibly indefinite with at least one negative eigenvalue, the solution of Newton's equation within Truncated Newton schemes may require some care. Indeed, the Krylov subspace method used to solve Newton's equation should be suitably applied, considering that optimization frameworks require the computation of descent directions, which have to satisfy additional properties (Dennis and Schnabel, 1983; Nocedal and Wright, 2000). In this regard our proposal provides a tool to preserve the latter properties. The paper is organized as follows. In Section 2 we briefly recall relevant approaches from the literature.
Then, in Section 3 we describe our class of preconditioners for large indefinite linear systems, obtained by using a general Krylov subspace method. We detail some theoretical properties of our proposal, along with some hints on its computation. Finally, a section of conclusions and future work completes the paper.

Regarding the notation, for an $n \times n$ real matrix $M$ we denote by $\sigma[M]$ the spectrum of $M$; $I_k$ is the identity matrix of order $k$, and we use the capital letter $T$ to indicate a tridiagonal matrix. Finally, with $C \succ 0$ we indicate that the matrix $C$ is positive definite, and $\|\cdot\|$ denotes the Euclidean norm.

2. Some approaches for preconditioning large symmetric systems

Let us consider the following linear system

$$Ax = b, \qquad A \in \mathbb{R}^{n \times n}, \; b \in \mathbb{R}^n, \qquad (1)$$

where $A$ is symmetric and nonsingular, and $n$ is large. We assume that the structure of the matrix $A$ is unknown, as is its sparsity pattern. We recall that in case of special structures of the matrix $A$, suitable preconditioners may be built for solving (1) (see (Higham, 2002)). As is well known, the main rationale behind the idea of using a preconditioner to solve the linear system (1) consists in introducing a nonsingular matrix $M$, such that solving

$$MAx = Mb \qquad (2)$$

is possibly simpler, in some sense, than solving (1). Of course, for the latter purpose the extreme choices for $M$ are $M = I$ and $M = A^{-1}$ (the latter being the ideal choice). In most cases (always, when $n$ is large) there is no chance to compute $A^{-1}$ (or computing $A^{-1}$ is no cheaper than solving (1)). Notwithstanding this difficulty, the preconditioner $M$ can be chosen according to the following alternative guidelines:

- the linear system (2) should be easy to solve thanks to the structure of the matrix $M$ (e.g. preconditioners for linear systems from PDE discretizations often have a suitable block structure which is suggested by the problem at hand);
- the condition number $\kappa(MA)$ should be relatively small.
The latter fact may be helpful when attempting to solve the preconditioned system (2) with a technique sensitive to $\kappa(MA)$ (e.g. the Krylov subspace methods);
- the eigenvalues in the spectrum $\sigma[MA]$ should be as clustered as possible (see e.g. (Nocedal and Wright, 2000)). The latter fact may again be helpful whenever Krylov subspace methods are adopted to solve (2).

Since we want to deal with the large scale case, without any assumption on the sparsity pattern of $A$, the main approaches in the literature for building and applying a preconditioner to (1) often gain information on the system matrix by computing the matrix-vector product $Av$, with $v \in \mathbb{R}^n$, or rely on numerical differentiation. In particular, among the approaches proposed we have the following:

- Approximate Inverse Preconditioners based on the BFGS update method (see also (Nash, 1985; Benzi et al., 2000), and references therein). Here, the main idea is that a BFGS update may be suitably used to compute an approximate inverse $A^\#$ of the matrix $A$. Then, the matrix $A^\#$ is applied as a preconditioner.
- Preconditioners based on the L-BFGS method (see also (Morales and Nocedal, 2000; 2001), and the class of Limited Memory Preconditioners (LMPs) by Giraud and Gratton (2006)), which pursue an idea similar to the Approximate Inverse Preconditioners in the previous item.
- Approximate Inverse Preconditioners based on the use of Krylov subspace methods (see also (Simoncini and Szyld, 2007; Fasano and Roma, 2009)), where a Krylov subspace method is used to determine the solution of (1) and to provide information for building a preconditioner for (1).
- Band Preconditioners based on matrix scaling/balancing (see also (Roma, 2005)).
- Preconditioners based on numerical differentiation (see e.g. (Higham, 2002)).
- Band Preconditioners based on the BFGS method (see e.g. (Luksan et al., 2010)), where a BFGS update is partially modified, so that suitable band preconditioners are defined for linear systems.
This approach was mainly tested within truncated Newton frameworks.

For each of the preconditioning strategies mentioned above, both a theoretical analysis and a numerical experience are available for validation. In this paper we follow and generalize the approach proposed in Fasano and Roma (2009), where at any step an iterative Krylov subspace method is used to compute the approximate inverse $A^\#$ over a subspace of increasing dimension. Note that, as detailed in Section 3, our approach also encompasses diagonal and block-diagonal preconditioners.

3. Our class of preconditioners

In this section we first introduce some preliminaries, then we propose our class of preconditioners. Consider the indefinite linear system

$$Ax = b, \qquad (3)$$

where $A \in \mathbb{R}^{n \times n}$ is symmetric, $n$ is large and $b \in \mathbb{R}^n$. Suppose any Krylov subspace method is used for the solution of (3), e.g. the Lanczos process or the CG method (Golub and Van Loan, 1996) (but MINRES (Saad, 2003) or Planar-CG methods (Hestenes, 1980; Fasano, 2005) may also be alternative choices). They are equivalent as long as $A \succ 0$, whereas the CG, though cheaper, in principle may not cope with the indefinite case. In the next Assumption 3.1 we suppose that a finite number of steps, say $h \le n$, of the adopted Krylov subspace method are performed.

Assumption 3.1 Let us consider any Krylov subspace method to solve the symmetric linear system (3).
Suppose that at step $h$ of the Krylov subspace method, with $1 \le h \le n$, the matrices $R_h \in \mathbb{R}^{n \times h}$, $T_h \in \mathbb{R}^{h \times h}$ and the vector $u_{h+1} \in \mathbb{R}^n$ are generated, such that

$$A R_h = R_h T_h + \rho_{h+1} u_{h+1} e_h^T, \qquad \rho_{h+1} \in \mathbb{R}, \qquad (4)$$

$$T_h = \begin{cases} V_h B_h V_h^T, & \text{if } T_h \text{ is indefinite}, \\ L_h D_h L_h^T, & \text{if } T_h \text{ is positive definite}, \end{cases} \qquad (5)$$

where
- $R_h = (u_1 \cdots u_h)$, with $u_i^T u_j = 0$ and $\|u_i\| = 1$, for $1 \le i \ne j \le h$;
- $u_{h+1}^T u_i = 0$ and $\|u_{h+1}\| = 1$, for $1 \le i \le h$;
- $T_h$ is irreducible and nonsingular, with eigenvalues $\mu_1, \ldots, \mu_h$ not all coincident;
- $B_h = \mathrm{diag}_{1 \le i \le h}\{\mu_i\}$, $V_h = (v_1 \cdots v_h) \in \mathbb{R}^{h \times h}$ is orthogonal, and $(\mu_i, v_i)$ is an eigenpair of $T_h$;
- $D_h \succ 0$ is diagonal, and $L_h$ is unit lower bidiagonal.

Remark 3.1 Note that most of the common Krylov subspace methods for the solution of symmetric linear systems may easily satisfy Assumption 3.1 at iteration $h$ (Saad, 2003; Stoer, 1983). In particular, observe also that from (4) we have $T_h = R_h^T A R_h$, so that whenever $A \succ 0$, then $T_h \succ 0$. Since the Jordan form of $T_h$ in (5) is required only when $T_h$ is indefinite, it is important to check whether $T_h \succ 0$ without computing the eigenpairs of $T_h$, if unnecessary. For this purpose, note that the adopted Krylov subspace method always provides the relation $T_h = L_h D_h L_h^T$, with $L_h$ nonsingular and $D_h$ block diagonal (blocks can be $1 \times 1$ or at most $2 \times 2$), even when $T_h$ is indefinite (Saad, 2003; Stoer, 1983; Fasano and Roma, 2007). Thus, checking the entries of $D_h$ will reveal whether the Jordan form $T_h = V_h B_h V_h^T$ is really needed for $T_h$, i.e. whether $T_h$ is indefinite. Furthermore, the matrix $T_h$ captures much information on the eigenvalues of $A$ corresponding to eigenvectors in $\mathrm{span}\{u_1, \ldots, u_h\}$. Indeed, let $v \in \mathbb{R}^n$; then $R_h v$ belongs to the Krylov subspace $\mathrm{span}\{u_1, \ldots, u_h\}$. Now, considering that $T_h = R_h^T A R_h$ from (4), and recalling that $R_h^T R_h = I_h$, the relation $m \|v\|^2 \le v^T T_h v \le M \|v\|^2$ implies

$$m \|R_h v\|^2 = m \|v\|^2 \le (R_h v)^T A (R_h v) \le M \|v\|^2 = M \|R_h v\|^2.$$

Thus, $m$ and $M$ are respectively a lower bound and an upper bound on the eigenvalues of $A$ corresponding to eigenvectors in $\mathrm{span}\{u_1, \ldots, u_h\}$. Observe also that from Assumption 3.1 the parameter $\rho_{h+1}$ may possibly be nonzero, i.e. the subspace $\mathrm{span}\{u_1, \ldots, u_h\}$ is possibly not an invariant subspace under the transformation performed by the matrix $A$ (thus, in this paper we consider a more general case with respect to (Baglama et al., 1998)).

Remark 3.2 The adopted Krylov subspace method may in general perform $m \ge h$ iterations, generating the orthonormal vectors $u_1, \ldots, u_m$. Then, we can set $R_h = (u_{j_1}, \ldots, u_{j_h})$, with $\{j_1, \ldots, j_h\} \subseteq \{1, \ldots, m\}$, and change relations (4)-(5) accordingly; i.e. Assumption 3.1 may hold selecting any $h$ out of the $m$ vectors (among $u_1, \ldots, u_m$) computed by the Krylov subspace method.

Remark 3.3 For relatively small values of the parameter $h$ in Assumption 3.1 (say $h \le 20$, as often suffices in the applications), the computation of the eigenpairs $(\mu_i, v_i)$, $i = 1, \ldots, h$, of $T_h$ when $T_h$ is indefinite may be extremely fast with standard codes. For example, if the CG is the Krylov subspace method used in Assumption 3.1 to solve (3), then the Matlab (2011) (general) function eigs() requires as little as $4 \times 10^{-4}$ seconds to fully compute all the eigenpairs of $T_h$, for $h = 20$, on a commercial laptop. In the case where the CG is the Krylov subspace method of choice, the matrix $T_h$ is tridiagonal. Nonetheless, in a separate paper (Fasano and Roma, 2009) we consider a special case where the request (5) on $T_h$ may be considerably weakened under mild assumptions. Moreover, in the paper (Fasano and Roma, 2011b) we also prove that for a special choice of the parameter $a$ used in our class of preconditioners (see below), strong theoretical properties may be stated.

On the basis of the latter assumption, we can now define our preconditioners and show their properties.
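As an illustration of how the quantities in Assumption 3.1 arise, the following sketch (ours, not from the paper; NumPy is assumed) runs a plain Lanczos process on a random symmetric matrix and checks relation (4), the orthonormality of $u_1, \ldots, u_h$, and the identity $T_h = R_h^T A R_h$ of Remark 3.1:

```python
import numpy as np

def lanczos(A, r0, h):
    """Plain Lanczos process: after h steps returns R_h (orthonormal columns),
    the tridiagonal T_h, the next vector u_{h+1} and the scalar rho_{h+1},
    so that A R_h = R_h T_h + rho_{h+1} u_{h+1} e_h^T (relation (4))."""
    n = r0.size
    R = np.zeros((n, h))
    alpha = np.zeros(h)          # diagonal of T_h
    beta = np.zeros(h)           # off-diagonals of T_h
    u = r0 / np.linalg.norm(r0)
    u_prev, b_prev = np.zeros(n), 0.0
    for i in range(h):
        R[:, i] = u
        w = A @ u - b_prev * u_prev       # three-term recurrence
        alpha[i] = u @ w
        w -= alpha[i] * u
        b = np.linalg.norm(w)
        u_prev, b_prev, u = u, b, w / b
        if i < h - 1:
            beta[i] = b
    T = np.diag(alpha) + np.diag(beta[:h - 1], 1) + np.diag(beta[:h - 1], -1)
    return R, T, u, b_prev               # u = u_{h+1}, b_prev = rho_{h+1}

rng = np.random.default_rng(0)
n, h = 60, 8
A = rng.uniform(-10, 10, (n, n)); A = (A + A.T) / 2   # symmetric test matrix, as in (15)
R, T, u_next, rho = lanczos(A, rng.standard_normal(n), h)

e_h = np.zeros(h); e_h[-1] = 1.0
assert np.allclose(R.T @ R, np.eye(h), atol=1e-8)      # orthonormal u_1, ..., u_h
assert np.allclose(A @ R, R @ T + rho * np.outer(u_next, e_h), atol=1e-6)  # relation (4)
assert np.allclose(T, R.T @ A @ R, atol=1e-6)          # T_h = R_h^T A R_h
```

In this sketch $\rho_{h+1}$ is simply the last Lanczos off-diagonal coefficient; it is nonzero here because the Krylov subspace has not yet become invariant under $A$.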
To this aim, considering for the matrix $T_h$ the expression (5), we define (see also (Gill et al., 1992))

$$|T_h| \stackrel{def}{=} \begin{cases} V_h |B_h| V_h^T, \quad |B_h| = \mathrm{diag}_{1 \le i \le h}\{|\mu_i|\}, & \text{if } T_h \text{ is indefinite}, \\ T_h, & \text{if } T_h \text{ is positive definite}. \end{cases}$$

As a consequence, when $T_h$ is indefinite we have

$$|T_h|^{-1} T_h = T_h |T_h|^{-1} = V_h \hat{I}_h V_h^T,$$

where the $h$ nonzero diagonal entries of the matrix $\hat{I}_h$ are in the set $\{-1, +1\}$. Furthermore, it can be easily seen that $|T_h|$ is positive definite for any $h$, and the matrix $|T_h|^{-1/2}\,|T_h|\,|T_h|^{-1/2} = I_h$ is the identity matrix. Now let us introduce the following $n \times n$ matrix, which depends on the real parameter $a$:

$$M_h \stackrel{def}{=} (I_n - R_h R_h^T) + R_h |T_h| R_h^T + a\,(u_{h+1} u_h^T + u_h u_{h+1}^T), \qquad 1 \le h \le n-1,$$

$$\phantom{M_h} = \big(R_h \,|\, u_{h+1} \,|\, R_{n,h+1}\big) \begin{pmatrix} |T_h| & a e_h & 0 \\ a e_h^T & 1 & 0 \\ 0 & 0 & I_{n-(h+1)} \end{pmatrix} \begin{pmatrix} R_h^T \\ u_{h+1}^T \\ R_{n,h+1}^T \end{pmatrix}, \qquad (6)$$

$$M_n \stackrel{def}{=} (I_n - R_n R_n^T) + R_n |T_n| R_n^T = R_n |T_n| R_n^T, \qquad (7)$$

where $R_h$ and $T_h$ satisfy relations (4)-(5), $a \in \mathbb{R}$, and the matrix $R_{n,h+1} \in \mathbb{R}^{n \times [n-(h+1)]}$ is such that $R_{n,h+1}^T R_{n,h+1} = I_{n-(h+1)}$ and $\big(R_h \,|\, u_{h+1} \,|\, R_{n,h+1}\big)$ is orthogonal. Observe that of course the matrix $R_{n,h+1}$ in (6) always exists, with

$$R_{n,h+1} R_{n,h+1}^T = I_n - \big(R_h \,|\, u_{h+1}\big)\big(R_h \,|\, u_{h+1}\big)^T.$$

Using the parameter dependent matrix $M_h$ in (6)-(7), we are now ready to introduce our class of preconditioners:

$$M_h^\#(a, \delta, D) \stackrel{def}{=} D\left[I_n - \big(R_h \,|\, u_{h+1}\big)\big(R_h \,|\, u_{h+1}\big)^T\right]D + \big(R_h \,|\, D u_{h+1}\big) \begin{pmatrix} |T_h| & a e_h \\ a e_h^T & \delta^2 \end{pmatrix}^{-1} \big(R_h \,|\, D u_{h+1}\big)^T, \qquad (8)$$

$$M_n^\#(a, \delta, D) \stackrel{def}{=} R_n |T_n|^{-1} R_n^T. \qquad (9)$$

Theorem 3.1 Consider any Krylov subspace method to solve the symmetric linear system (3). Suppose that Assumption 3.1 holds and the Krylov method performs $h \le n$ iterations. Let $a \in \mathbb{R}$, $\delta \ne 0$, and let the matrix $D \in \mathbb{R}^{n \times n}$ be such that $\big(R_h \,|\, u_{h+1} \,|\, D R_{n,h+1}\big)$ is nonsingular, where $R_{n,h+1} R_{n,h+1}^T = I_n - \big(R_h \,|\, u_{h+1}\big)\big(R_h \,|\, u_{h+1}\big)^T$. Then, we have the following properties:
a) the matrix $M_h^\#(a,\delta,D)$ is symmetric.
Furthermore,
- when $h < n$, for any $a \ne \pm\,\delta\,\big(e_h^T |T_h|^{-1} e_h\big)^{-1/2}$, $M_h^\#(a,\delta,D)$ is nonsingular;
- when $h = n$ the matrix $M_h^\#(a,\delta,D)$ is nonsingular;
b) the matrix $M_h^\#(a,\delta,D)$ coincides with $M_h^{-1}$ as long as either $D = I_n$ and $\delta = 1$, or $h = n$;
c) for $|a| < \delta\,\big(e_h^T |T_h|^{-1} e_h\big)^{-1/2}$ the matrix $M_h^\#(a,\delta,D)$ is positive definite. Moreover, if $D = I_n$ the spectrum $\sigma[M_h^\#(a,\delta,I_n)]$ is given by

$$\sigma\big[M_h^\#(a,\delta,I_n)\big] = \sigma\left[\begin{pmatrix} |T_h| & a e_h \\ a e_h^T & \delta^2 \end{pmatrix}^{-1}\right] \cup \sigma\big[I_{n-(h+1)}\big];$$

d) when $1 \le h < n$:
- if $D$ is nonsingular, then $M_h^\#(a,\delta,D)\,A$ has at least $h-3$ singular values equal to $1/\delta^2$;
- if $D$ is nonsingular and $a = 0$, then the matrix $M_h^\#(a,\delta,D)\,A$ has at least $h-2$ singular values equal to $1/\delta^2$;
e) when $h = n$, then $M_n^\#(a,\delta,D) = M_n^{-1}$, $\sigma[M_n] = \sigma[|T_n|]$ and $\sigma[M_n^{-1} A] = \sigma[A M_n^{-1}] \subseteq \{-1, +1\}$, i.e. the $n$ eigenvalues of the preconditioned matrix $M_n^\#(a,\delta,D)\,A$ are either $+1$ or $-1$.

Proof. See (Fasano and Roma, 2011b) for the proof. □

The corollary which follows considers the important particular case obtained by setting $a = 0$, $\delta = 1$ and $D = I_n$ in the definition of the preconditioner $M_h^\#(a,\delta,D)$.

Corollary 3.2 Consider any Krylov subspace method to solve the symmetric linear system (3). Suppose that Assumption 3.1 holds and the Krylov subspace method performs $h \le n$ iterations. Then, the preconditioner

$$M_h^\#(0,1,I_n) = I_n - \big(R_h \,|\, u_{h+1}\big)\big(R_h \,|\, u_{h+1}\big)^T + \big(R_h \,|\, u_{h+1}\big) \begin{pmatrix} |T_h|^{-1} & 0 \\ 0 & 1 \end{pmatrix} \big(R_h \,|\, u_{h+1}\big)^T, \qquad (10)$$

$$M_n^\#(0,1,I_n) = R_n |T_n|^{-1} R_n^T, \qquad (11)$$

is such that
a) the matrix $M_h^\#(0,1,I_n)$ is symmetric and nonsingular, for any $h \le n$;
b) the matrix $M_h^\#(0,1,I_n)$ coincides with $M_h^{-1}$, for any $h \le n$;
c) the matrix $M_h^\#(0,1,I_n)$ is positive definite. Moreover, its spectrum $\sigma[M_h^\#(0,1,I_n)]$ is given by

$$\sigma\big[M_h^\#(0,1,I_n)\big] = \sigma\big[|T_h|^{-1}\big] \cup \sigma\big[I_{n-h}\big];$$

d) when $h < n$, then the matrix $M_h^\#(0,1,I_n)\,A$ has at least $h-2$ singular values equal to $+1$;
e) when $h = n$, then $\sigma[M_n] = \sigma[|T_n|]$ and $\sigma[M_n^\#(0,1,I_n)\,A] = \sigma[M_n^{-1} A] = \sigma[A M_n^{-1}] \subseteq \{-1, +1\}$, i.e.
the $n$ eigenvalues of $M_n^\#(0,1,I_n)\,A$ are either $+1$ or $-1$.

Proof. The result is directly obtained from (6)-(7) and Theorem 3.1, with $a = 0$, $\delta = 1$ and $D = I_n$. □

Remark 3.4 As stated in the comments to relation (6), the matrix $R_{n,h+1}$ in the statement of Theorem 3.1 always exists, and is such that $\big(R_h \,|\, u_{h+1} \,|\, R_{n,h+1}\big)$ is orthogonal. However, $R_{n,h+1}$ is neither built nor used in (8)-(9), and it is introduced only for theoretical purposes. Furthermore, it is easy to see that since $\big(R_h \,|\, u_{h+1} \,|\, R_{n,h+1}\big)$ is orthogonal, any nonsingular diagonal matrix $D$ may be used in order to satisfy the hypotheses of Theorem 3.1.

Remark 3.5 Observe that the case $h \approx n$ in Theorem 3.1 is of scarce interest for large scale problems. Indeed, in the literature on preconditioners the values of $h$ typically do not exceed 10-20 (Morales and Nocedal, 2001; Gratton et al., 2011). Moreover, for small values of $h$ in (8), the computation of the inverse of the matrix

$$\begin{pmatrix} |T_h| & a e_h \\ a e_h^T & \delta^2 \end{pmatrix}, \qquad (12)$$

needed to provide $M_h^\#(a,\delta,I_n)$ or $M_h^\#(a,\delta,D)$, may be cheaply performed, whether $T_h$ is indefinite or positive definite. Indeed, after a brief computation we have

$$\begin{pmatrix} |T_h| & a e_h \\ a e_h^T & \delta^2 \end{pmatrix}^{-1} = \begin{pmatrix} |T_h|^{-1} + \hat{a}\, a^2\, |T_h|^{-1} e_h e_h^T |T_h|^{-1} & -\hat{a}\, a\, |T_h|^{-1} e_h \\ -\hat{a}\, a\, e_h^T |T_h|^{-1} & \hat{a} \end{pmatrix}, \qquad (13)$$

with

$$\hat{a} = \frac{1}{\delta^2 - a^2\, e_h^T |T_h|^{-1} e_h}. \qquad (14)$$

Thus, when $T_h$ is indefinite, Remark 3.3 and relation (13) provide the result. On the other hand, in case $T_h \succ 0$, it suffices to use (13). Finally, a proper setting of the parameter $a$ allows us to easily control the condition number of the matrix (12).

Figure 1. The condition number of the matrix $A$ (Cond(A)) along with the condition number of the matrix $M_h^\#(0,1,I_n)\,A$ (Cond(M^{-1}A)), when $h \in \{10, 20, 30, 40, 50, 60, 70, 80, 90\}$, and $A$ is randomly chosen with entries from the uniform distribution $U[-10, 10]$.

4.
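The computations in Remark 3.5 can be checked numerically. The sketch below (ours; a small indefinite tridiagonal matrix stands in for $T_h$, its diagonal chosen via Gershgorin discs so that it is provably indefinite and nonsingular) forms $|T_h|$ from the spectral decomposition and verifies both the $\pm 1$ eigenvalues of $|T_h|^{-1} T_h$ and our reconstruction of the block inverse (13)-(14):

```python
import numpy as np

h, a, delta = 6, 0.7, 1.0
# Indefinite, nonsingular tridiagonal stand-in for T_h (Gershgorin: |mu_i| >= 2)
d = np.array([-4., -4., -4., 4., 4., 4.])
T = np.diag(d) + np.diag(np.ones(h - 1), 1) + np.diag(np.ones(h - 1), -1)

mu, V = np.linalg.eigh(T)                  # spectral form T_h = V_h B_h V_h^T of (5)
absT = V @ np.diag(np.abs(mu)) @ V.T       # |T_h| = V_h |B_h| V_h^T

assert np.all(np.linalg.eigvalsh(absT) > 0)             # |T_h| is positive definite
signs = np.linalg.eigvals(np.linalg.solve(absT, T)).real
assert np.allclose(np.sort(signs), np.sort(np.sign(mu)))  # |T_h|^{-1} T_h has eigenvalues +-1

# Closed-form inverse (13)-(14) of the 2x2-block matrix (12)
e_h = np.zeros(h); e_h[-1] = 1.0
W = np.block([[absT, a * e_h[:, None]],
              [a * e_h[None, :], np.array([[delta ** 2]])]])
invT = np.linalg.inv(absT)
a_hat = 1.0 / (delta ** 2 - a ** 2 * (e_h @ invT @ e_h))            # relation (14)
w = invT @ e_h
W_inv = np.block([[invT + a_hat * a ** 2 * np.outer(w, w), -a_hat * a * w[:, None]],
                  [-a_hat * a * w[None, :], np.array([[a_hat]])]])  # relation (13)
assert np.allclose(W @ W_inv, np.eye(h + 1), atol=1e-10)
```

Note that (13)-(14) is the standard $2 \times 2$ block inversion with Schur complement $\delta^2 - a^2 e_h^T |T_h|^{-1} e_h$; only a tridiagonal solve and one vector $|T_h|^{-1} e_h$ are really needed, consistent with the cheapness claim of Remark 3.5.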
Preliminary numerical results

In order to preliminarily test our proposal in a general framework, without any assumption on the sparsity pattern of the matrix $A$, we used our parameter dependent class of preconditioners $M_h^\#(a,\delta,D)$, setting $\delta = 1$ and $D = I_n$. We anticipate that in our numerical experience we obtained very interesting results concerning the correspondence between theoretical and numerical results. Indeed, all the results stated in Theorem 3.1 for the singular values of the (possibly) unsymmetric matrix $M_h^\#(a,\delta,D)\,A$ seem to hold in practice also for the eigenvalues of $M_h^\#(a,\delta,D)\,A$ (it is worth recalling that, since $M_h^\#(a,\delta,D) \succ 0$, then $\sigma[M_h^\#(a,\delta,D)\,A] \equiv \sigma[M_h^\#(a,\delta,D)^{1/2}\, A\, M_h^\#(a,\delta,D)^{1/2}]$), so that $M_h^\#(a,\delta,D)\,A$ has only real eigenvalues.

Regarding the numerical investigation, we used 3 different sets of test problems. First, we considered a set of symmetric linear systems as in (3), where the number of unknowns is set as $n = 1000$, and the matrix $A$ also has a moderate condition number. We simply wanted to see how our class of preconditioners affects the condition number of $A$. In particular (see also (Geman, 1980)), a possible choice for the latter class of matrices is given by

$$A = \{a_{i,j}\}, \qquad a_{i,j} \in U[-10, 10], \qquad i, j = 1, \ldots, n, \qquad (15)$$

where $a_{i,j} = a_{j,i}$ are random entries from the uniform distribution $U[-10, 10]$, between $-10$ and $+10$. Then, the vector $b$ in (3) is also computed randomly, with entries in $U[-10, 10]$. We computed the preconditioners (8)-(9) by using the Conjugate Gradient (CG) method (Higham, 2002), which is one of the most popular Krylov subspace iterative methods to solve (3) (Golub and Van Loan, 1996). We remark that the CG is also often used in case the matrix $A$ is indefinite, though it can stop prematurely.
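The limiting case $h = n$ of item e) of Theorem 3.1 can be checked directly on matrices of type (15). In the sketch below (ours), a full spectral decomposition replaces the $n$ Krylov iterations; this yields the same matrix $M_n^\#(0,1,I_n) = R_n |T_n|^{-1} R_n^T = |A|^{-1}$, since $|T_n|$ is basis-independent, and the preconditioned matrix indeed has all eigenvalues equal to $\pm 1$:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 40
A = rng.uniform(-10, 10, (n, n)); A = (A + A.T) / 2  # indefinite symmetric, as in (15)

# Stand-in for a full (h = n) Krylov run: A = Q diag(lam) Q^T,
# so M_n^#(0,1,I_n) = R_n |T_n|^{-1} R_n^T = Q |diag(lam)|^{-1} Q^T
lam, Q = np.linalg.eigh(A)
Mhash = Q @ np.diag(1.0 / np.abs(lam)) @ Q.T

eigs = np.linalg.eigvals(Mhash @ A).real
assert np.allclose(np.sort(np.abs(eigs)), np.ones(n), atol=1e-8)   # all moduli equal 1
assert set(np.round(np.sort(eigs)).astype(int)) <= {-1, 1}         # eigenvalues in {-1,+1}
```

For $h < n$ only part of the spectrum is shifted in this way, which is exactly the clustering effect discussed around Figure 2.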
As an alternative choice, in order to satisfy Assumption 3.1 with $A$ indefinite, we can use the Lanczos (1950) process, the MINRES method (Paige and Saunders, 1975) or Planar-CG methods (Fasano, 2005). In (8) we set the parameter $h$ in the range $h \in \{20, 30, 40, 50, 60, 70, 80, 90\}$, and we preliminarily chose $a = 0$ (though other choices of the parameter $a$ yield similar results), which satisfied items a) and c) of Theorem 3.1. We generated several matrices like (15), obtaining very similar results. In particular, given one instance of $A$ as in (15), we plotted in Figure 1 the condition number $\kappa(A)$ of $A$ (Cond(A)), along with the condition number $\kappa(M_h^\#(0,1,I_n)\,A)$ of $M_h^\#(0,1,I_n)\,A$ (Cond(M^{-1}A)): in both cases the condition number was calculated by first computing the eigenvalues $\lambda_1, \ldots, \lambda_n$ (using the Matlab (2011) routine eigs()) of $A$ and $M_h^\#(0,1,I_n)\,A$ respectively, and then forming the ratio

$$\kappa = \frac{\max_i |\lambda_i|}{\min_i |\lambda_i|}.$$

Evidently, the numerical results confirm that the order of the condition number of $A$ is pretty similar to that of the condition number of $M_h^\#(0,1,I_n)\,A$. This indicates that if the preconditioners (8) are used as a tool to solve (3), then most preconditioned iterative methods which are sensitive to the condition number (e.g. the Krylov subspace methods) are, on average, not expected to perform worse with respect to the unpreconditioned case. However, it is important to remark that the spectrum $\sigma[M_h^\#(0,1,I_n)\,A]$ tends to be shifted with respect to $\sigma[A]$, inasmuch as the eigenvalues in $\sigma[A]$ whose absolute value is larger than $+1$ tend to be scaled in $\sigma[M_h^\#(0,1,I_n)\,A]$ (see Figure 2). The latter property is an appealing result, as described in Section 1, since the eigenvalues of $M_h^\#(0,1,I_n)\,A$ will be 'more clustered'. The latter phenomenon was better investigated by introducing other sets of test problems, described hereafter.

Table 1.
The adopted linesearch-based truncated Newton method.

Figure 2. Comparison between the full/detailed spectra (left/right figures) $\sigma[A]$ (Unprecond) and $\sigma[M_h^\#(0,1,I_n)\,A]$ (Precond), with $A$ randomly chosen (eigenvalues are sorted for simplicity); without loss of generality we show the results for the values $h_5 = 20$ and $h_6 = 30$. The intermediate eigenvalues in the spectrum $\sigma[M_h^\#(0,1,I_n)\,A]$ whose absolute value is larger than 1 are in general smaller than the corresponding eigenvalues in $\sigma[A]$. The eigenvalues in $\sigma[M_h^\#(0,1,I_n)\,A]$ are more clustered near $+1$ or $-1$ than those in $\sigma[A]$.

In a second experiment we generated a set of matrices $A$ such that

$$A = H D H, \qquad (16)$$

where $H \in \mathbb{R}^{n \times n}$, with $n = 500$, is a Householder transformation given by $H = I - 2vv^T$, with $v \in \mathbb{R}^n$ a randomly chosen unit vector. The matrix $D \in \mathbb{R}^{n \times n}$ is diagonal (so that its nonzero entries are also eigenvalues of $A$, while each column of $H$ is also an eigenvector of $A$). The matrix $D$ is such that $perc \cdot n$ of its eigenvalues are larger (by about one order of magnitude) than the remaining $(1 - perc) \cdot n$ eigenvalues (we set $perc = 0.3$). Finally, we again computed the preconditioners (8)-(9) by using the CG, choosing the starting point $x_0$ so that the initial residual $b - Ax_0$ was a linear combination (with randomly chosen coefficients $-1$ and $+1$) of all the $n$ eigenvectors of $A$. We strongly highlight that the latter choice of $x_0$ is expected to be unfavorable when applying the CG in order to build our preconditioners. In the latter case the CG method is indeed expected to perform exactly $n$ iterations before stopping (see also (Nocedal and Wright, 2000; Saad, 2003)), so that the preconditioners (8)-(9) may be significant for testing effectiveness in the case of small values of $h$ (broadly speaking, a small $h$ implies that the preconditioner contains correspondingly little information on the inverse matrix $A^{-1}$).
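The test matrices (16) can be generated as follows (a sketch under our illustrative choices: the "large" eigenvalues are drawn from $U[10,20]$ and the remaining ones from $U[1,2]$, roughly one order of magnitude apart as in the text):

```python
import numpy as np

rng = np.random.default_rng(4)
n, perc = 500, 0.3
v = rng.standard_normal(n); v /= np.linalg.norm(v)
H = np.eye(n) - 2.0 * np.outer(v, v)         # Householder matrix: symmetric and orthogonal

k = int(perc * n)                             # perc*n "large" eigenvalues ...
d = np.concatenate([rng.uniform(10, 20, k),   # ... about one order of magnitude above
                    rng.uniform(1, 2, n - k)])  # the remaining (1 - perc)*n ones
A = H @ np.diag(d) @ H                        # relation (16): A = H D H

# Since H^2 = I, we have A H = H D: the columns of H are eigenvectors of A,
# and the entries of D are its eigenvalues
assert np.allclose(A @ H[:, 0], d[0] * H[:, 0])
assert np.allclose(np.sort(np.linalg.eigvalsh(A)), np.sort(d))
```

This construction gives full control of the spectrum of $A$ while keeping $A$ dense and unstructured, which is exactly what is needed to study the clustering effect of the preconditioner.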
We compared the spectra $\sigma[A]$ and $\sigma[M_h^\#(0,1,I_n)\,A]$, in order to verify again how well the preconditioners (8) are able to cluster the eigenvalues of $A$. Following the guidelines in (Morales and Nocedal, 2001), in order to test our proposal also on a different range of values for the parameter $h$, we set $h \in \{4, 8, 12, 16, 20\}$. The results are given in Figure 3 (full comparison), which includes all the 500 eigenvalues, and Figure 4 (detail), which includes only the eigenvalues from the 410-th to the 450-th. Observe that our preconditioners are able to shift the largest (in absolute value) eigenvalues of $A$ towards $-1$ or $+1$, so that the clustering of the eigenvalues is enhanced as the parameter $h$ increases. For each value of $h$ the matrix $A$ is (randomly) recomputed from scratch, according to relation (16). This explains why in the five plots of Figures 3-4 the spectrum of $A$ changes. Again, a behavior very similar to that of Figures 3-4 is obtained also using different values for the parameter $a$.

Figure 3. Comparison between the full spectra $\sigma[A]$ (Unprecond) and $\sigma[M_h^\#(0,1,I_n)\,A]$ (Precond), with $A$ nonsingular and given by (16) (eigenvalues are sorted for simplicity); we used different values of $h$ ($h_1 = 4$, $h_2 = 8$, $h_3 = 12$, $h_4 = 16$, $h_5 = 20$), setting $n = 500$. The large eigenvalues in the spectrum $\sigma[M_h^\#(0,1,I_n)\,A]$ are in general smaller (in modulus) than the corresponding large eigenvalues in $\sigma[A]$. A 'flatter' piecewise-line of the eigenvalues in $\sigma[M_h^\#(0,1,I_n)\,A]$ indicates that the eigenvalues tend to cluster around $-1$ and $+1$, in accordance with the theory.

Figure 4.
Comparison between a detail of the spectra $\sigma[A]$ (Unprecond) and $\sigma[M_h^\#(0,1,I_n)\,A]$ (Precond), with $A$ nonsingular and given by (16) (eigenvalues are sorted for simplicity); we used different values of $h$ ($h_1 = 4$, $h_2 = 8$, $h_3 = 12$, $h_4 = 16$, $h_5 = 20$), setting $n = 500$. The large eigenvalues in the spectrum $\sigma[M_h^\#(0,1,I_n)\,A]$ are in general smaller (in modulus) than the corresponding large eigenvalues in $\sigma[A]$. A 'flatter' piecewise-line of the eigenvalues in $\sigma[M_h^\#(0,1,I_n)\,A]$ indicates that the eigenvalues tend to cluster around $-1$ and $+1$, in accordance with the theory.

To complete our preliminary experience, we tested our class of preconditioners in optimization frameworks. In particular, we considered the standard linesearch-based truncated Newton method in Table 1, where for any $k \ge 0$ the solution of the symmetric linear system (Newton's equation)

$$\nabla^2 f(x_k)\, d = -\nabla f(x_k)$$

is required. We considered several unconstrained optimization problems from the CUTEr collection (Gould et al., 2003), and for each problem we applied the truncated Newton method in Table 1. At the outer iteration $\bar{k}$ we computed the preconditioner $M_h^\#(0,1,I_n)$, with $h \in \{4, 8, 12, 16, 20\}$, by using the CG to solve the equation $\nabla^2 f(x_{\bar{k}})\, d = -\nabla f(x_{\bar{k}})$. Then, we adopted $M_h^\#(0,1,I_n)$ as a preconditioner for the solution of Newton's equation at the subsequent iteration,

$$\nabla^2 f(x_{\bar{k}+1})\, d = -\nabla f(x_{\bar{k}+1}).$$

The iteration index $\bar{k}$ was the first index such that both the relations

$$\frac{\|x_{\bar{k}+1} - x_{\bar{k}}\|}{\|x_{\bar{k}}\|} \le 10^{-3} \qquad \text{and} \qquad \alpha_{\bar{k}} \ge 0.95 \qquad (17)$$

hold (the first relation implies that $x_{\bar{k}+1} \approx x_{\bar{k}}$, while the second holds when the search direction $(x_{\bar{k}+1} - x_{\bar{k}})/\alpha_{\bar{k}}$ approaches Newton's step). Thus, the index $\bar{k}$ was chosen in order to have $\|x_{\bar{k}+1} - x_{\bar{k}}\|$ small, i.e. the entries of the Hessian matrices $\nabla^2 f(x_{\bar{k}})$ and $\nabla^2 f(x_{\bar{k}+1})$ are not expected to differ significantly. For simplicity, among all the optimization problems tested we report the results on just two test problems, using $n = 1000$.
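Under our reconstruction of the switching rule (17) (relative step at most $10^{-3}$ and steplength $\alpha_{\bar k} \ge 0.95$), selecting $\bar k$ amounts to scanning the outer iterates; the helper below and its toy data are ours, not from the paper:

```python
import numpy as np

def select_kbar(xs, alphas, tol=1e-3, alpha_min=0.95):
    """Return the first outer-iteration index k satisfying (17):
    ||x_{k+1} - x_k|| / ||x_k|| <= tol  and  alpha_k >= alpha_min.
    (Our reconstruction of the rule; returns None if no index qualifies.)"""
    for k in range(len(xs) - 1):
        rel_step = np.linalg.norm(xs[k + 1] - xs[k]) / np.linalg.norm(xs[k])
        if rel_step <= tol and alphas[k] >= alpha_min:
            return k
    return None

# Toy iterates: steps shrink geometrically, unit steplengths after a warm-up phase
xs = [np.ones(3) + 0.5 ** k * np.array([1.0, 0.0, 0.0]) for k in range(25)]
alphas = [0.5] * 3 + [1.0] * 22
kbar = select_kbar(xs, alphas)
```

Once $\bar k$ is found, the preconditioner built at iteration $\bar k$ is reused at iteration $\bar k + 1$, which is justified precisely because (17) guarantees the two Hessians are close.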
Very similar results were obtained for almost all the test problems.

Figure 5. The condition number of the matrix $\nabla^2 f(x_{\bar{k}+1})$ (Cond(A)) along with the condition number of the matrix $M_h^\#(0,1,I_n)\,\nabla^2 f(x_{\bar{k}+1})$ (Cond(M^{-1}A)), for the optimization problem NONCVXUN, when $1 \le h \le 17$. The condition number of $\nabla^2 f(x_{\bar{k}+1})$ is close to the condition number of $M_h^\#(0,1,I_n)\,\nabla^2 f(x_{\bar{k}+1})$, for any value of the parameter $h$. The value $\bar{k} = 175$ was chosen as in (17), and it was $\|x_{176} - x_{175}\| \approx 0.083$.

In Figures 5-6 we consider the problem NONCVXUN. For the sake of brevity we only show the numerical results using $h = 16$ in (8). Observe that since $x_{\bar{k}+1}$ is close to $x_{\bar{k}}$ (i.e. we are eventually converging to a local minimum), the Hessian matrix $\nabla^2 f(x_{\bar{k}+1})$ is positive semidefinite. Furthermore, again the eigenvalues larger than $+1$ in $\sigma[\nabla^2 f(x_{\bar{k}+1})]$ are scaled in $\sigma[M_h^\#(0,1,I_n)\,\nabla^2 f(x_{\bar{k}+1})]$. Similarly, we show in Figures 7-8 the results for the test function NONDQUAR from the CUTEr collection. The test problems in this optimization framework, where the preconditioner $M_h^\#(0,1,I_n)$ is computed at the outer iteration $\bar{k}$ and used at the outer iteration $\bar{k}+1$, confirm that the properties of Theorem 3.1 may hold also when $M_h^\#(0,1,I_n)$ is used on a sequence of linear systems $A_k x = b_k$, when $A_k$ changes slightly with $k$.

Figure 6. Comparison between the full/detailed spectra (left figure/right figure) of $\nabla^2 f(x_{\bar{k}+1})$ (Unprecond) and $M_h^\#(0,1,I_n)\,\nabla^2 f(x_{\bar{k}+1})$ (Precond), for the optimization problem NONCVXUN, with $h_4 = 16$. The eigenvalues in $\sigma[M_h^\#(0,1,I_n)\,\nabla^2 f(x_{\bar{k}+1})]$ larger than $+1$ are evidently scaled, so that $\sigma[M_h^\#(0,1,I_n)\,\nabla^2 f(x_{\bar{k}+1})]$ is more clustered.

Figure 7.
The condition number of the matrix $\nabla^2 f(x_{\bar{k}+1})$ (Cond(A)) along with the condition number of the matrix $M_h^\#(0,1,I_n)\,\nabla^2 f(x_{\bar{k}+1})$ (Cond(M^{-1}A)), for the optimization problem NONDQUAR, when $1 \le h \le 17$. The condition number of $\nabla^2 f(x_{\bar{k}+1})$ is now slightly larger than the condition number of $M_h^\#(0,1,I_n)\,\nabla^2 f(x_{\bar{k}+1})$ (though they are both $\approx 10^{10}$). The value $\bar{k} = 40$ was chosen as in (17), and it was $\|x_{41} - x_{40}\| \approx 0.203$.

5. Conclusions

We have given theoretical and numerical results for a class of parameter dependent preconditioners. The preconditioners in our proposal can be built by using any Krylov subspace method for the symmetric linear system (3), provided that it is able to satisfy the general conditions (4)-(5) in Assumption 3.1. The latter property may be appealing in several real problems, where a few iterations of the adopted Krylov subspace method may suffice to compute an effective preconditioner.

Figure 8. Comparison between the full/detailed spectra (upper figure/lower figures) $\sigma[\nabla^2 f(x_{\bar{k}+1})]$ (Unprecond) and $\sigma[M_h^\#(0,1,I_n)\,\nabla^2 f(x_{\bar{k}+1})]$ (Precond), for the optimization problem NONDQUAR, with $h_4 = 16$. Some nearly-zero eigenvalues in the spectrum $\sigma[\nabla^2 f(x_{\bar{k}+1})]$ are shifted to nonzero values in $\sigma[M_h^\#(0,1,I_n)\,\nabla^2 f(x_{\bar{k}+1})]$. Since many eigenvalues in $\sigma[\nabla^2 f(x_{\bar{k}+1})]$ are zero or nearly zero, the preconditioner $M_h^\#(0,1,I_n)$ may be of slight effect, unless large values of the parameter $h$ are considered.

Our proposal also seems tailored for those cases where a sequence of linear systems of the form $A_k x = b_k$, $k = 1, 2, \ldots$, requires a solution (e.g., see (Morales and Nocedal, 2001) for details), where $A_k$ changes slightly with the index $k$. In the latter case, the preconditioner $M_h^\#(a,\delta,D)$ in (8)-(9) can be computed by applying the Krylov subspace method to the first linear system $A_1 x = b_1$. Then, $M_h^\#(a,\delta,D)$ can be used to efficiently solve $A_k x = b_k$, with $k = 2, 3, \ldots$
Along these lines, in future work we plan to compare our proposal with the other preconditioners described in Section 2. In particular, we believe that a comparison with the proposals in (Gratton et al., 2011; Morales and Nocedal, 2000) could be noteworthy. Finally, the class of preconditioners in this paper also seems to be an interesting tool for the solution of linear systems arising in financial frameworks. In particular, in future work we want to focus on the symmetric linear systems arising when imposing the KKT conditions in portfolio selection problems with a large number of assets in the portfolio, along with linear equality constraints (see also (Al-Jeiroudi et al., 2008)).

6. Acknowledgment

The first author wishes to thank the Italian Ship Model Basin - INSEAN, CNR institute, for its support.

7. References

AL-JEIROUDI, G., GONDZIO, J. and HALL, J. 2008. Preconditioning indefinite systems in interior point methods for large scale linear optimization. Optimization Methods and Software, 23: 345–363.
BAGLAMA, J., CALVETTI, D., GOLUB, G. and REICHEL, L. 1998. Adaptively preconditioned GMRES algorithms. SIAM Journal on Scientific Computing, 20: 243–269.
BENZI, M., CULLUM, J. and TUMA, M. 2000. Robust approximate inverse preconditioning for the conjugate gradient method. SIAM Journal on Scientific Computing, 22: 1318–1332.
DENNIS, J. and SCHNABEL, R. 1983. Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Prentice-Hall, Englewood Cliffs.
FASANO, G. 2005. Planar-conjugate gradient algorithm for large-scale unconstrained optimization, Part 1: Theory. Journal of Optimization Theory and Applications, 125: 523–541.
FASANO, G. and ROMA, M. 2007. Iterative computation of negative curvature directions in large scale optimization. Computational Optimization and Applications, 38: 81–104.
FASANO, G. and ROMA, M. 2009. Preconditioning Newton-Krylov methods in nonconvex large scale optimization. Submitted to Computational Optimization and Applications.
FASANO, G. and ROMA, M. 2011a. A class of preconditioners for large indefinite linear systems, as by-product of Krylov-based methods: Part 1. Technical Report n. 4, Department of Management, University Ca'Foscari, Venice, Italy.
FASANO, G. and ROMA, M. 2011b. A class of preconditioners for large indefinite linear systems, as by-product of Krylov-based methods: Part 2. Technical Report n. 5, Department of Management, University Ca'Foscari, Venice, Italy.
GEMAN, S. 1980. A limit theorem for the norm of random matrices. The Annals of Probability, 8: 252–261.
GILL, P.E., MURRAY, W., PONCELEÓN, D.B. and SAUNDERS, M.A. 1992. Preconditioners for indefinite systems arising in optimization. SIAM Journal on Matrix Analysis and Applications, 13: 292–311.
GIRAUD, L. and GRATTON, S. 2006. On the sensitivity of some spectral preconditioners. SIAM Journal on Matrix Analysis and Applications, 27: 1089–1105.
GOLUB, G. and VAN LOAN, C. 1996. Matrix Computations. The Johns Hopkins University Press, Baltimore, Third edition.
GOULD, N.I.M., ORBAN, D. and TOINT, P.L. 2003. CUTEr and SifDec: a constrained and unconstrained testing environment, revisited. ACM Transactions on Mathematical Software, 29: 373–394.
GRATTON, S., SARTENAER, A. and TSHIMANGA, J. 2011. On a class of limited memory preconditioners for large scale linear systems with multiple right-hand sides. SIAM Journal on Optimization, 21: 912–935.
HESTENES, M. 1980. Conjugate Direction Methods in Optimization. Springer Verlag, New York.
HIGHAM, N. 2002. Accuracy and Stability of Numerical Algorithms. SIAM, Philadelphia, PA, Second edition.
LANCZOS, C. 1950. An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. Journal of Research of the National Bureau of Standards, 45: 255–282.
LUKSAN, L., MATONOHA, C. and VLCEK, J. 2010. Band preconditioners for the matrix free truncated Newton method. Technical Report n. 1079, Institute of Computer Science, Academy of Sciences of the Czech Republic.
MATLAB. 2011. Release 2011a, The MathWorks Inc.
MORALES, J. and NOCEDAL, J. 2000. Automatic preconditioning by limited memory quasi-Newton updating. SIAM Journal on Optimization, 10: 1079–1096.
MORALES, J. and NOCEDAL, J. 2001. Algorithm PREQN: Fortran 77 subroutines for preconditioning the conjugate gradient method. ACM Transactions on Mathematical Software, 27: 83–91.
NASH, S. 1985. Preconditioning of truncated-Newton methods. SIAM Journal on Scientific and Statistical Computing, 6: 599–616.
NASH, S. 2000. A survey of truncated-Newton methods. Journal of Computational and Applied Mathematics, 124: 45–59.
NOCEDAL, J. and WRIGHT, S. 2000. Numerical Optimization (Springer Series in Operations Research and Financial Engineering), Second edition. Springer, New York.
PAIGE, C. and SAUNDERS, M. 1975. Solution of sparse indefinite systems of linear equations. SIAM Journal on Numerical Analysis, 12: 617–629.
ROMA, M. 2005. A dynamic scaling based preconditioning for truncated Newton methods in large scale unconstrained optimization. Optimization Methods and Software, 20: 693–713.
SAAD, Y. 2003. Iterative Methods for Sparse Linear Systems. Second edition, SIAM, Philadelphia.
SIMONCINI, V. and SZYLD, D.B. 2007. Recent computational developments in Krylov subspace methods for linear systems. Numerical Linear Algebra with Applications, 14: 1–59.
STOER, J. 1983. Solution of large linear systems of equations by conjugate gradient type methods. In Mathematical Programming. The State of the Art, A. BACHEM, M. GRÖTSCHEL, and B. KORTE, eds., Berlin Heidelberg, Springer-Verlag, pp. 540–565.

Received 15 July 2011
Accepted 13 September 2011