A Mathematical Journal Vol. 6, No 4, (73 - 94). December 2004. A Restricted Additive Schwarz Preconditioner with Harmonic Overlap for Symmetric Positive Definite Linear Systems Xiao-Chuan Cai 1 Department of Computer Science, University of Colorado Boulder, CO 80309 cai@cs.colorado.edu Maksymilian Dryja 2 Faculty of Mathematics, Informatics and Mechanics Warsaw University, Warsaw dryja@mimuw.edu.pl Marcus Sarkis 3 Mathematical Sciences Department, Worcester Polytechnic Institute Worcester, MA 01609 msarkis@wpi.edu ABSTRACT A restricted additive Schwarz (RAS) preconditioning technique was intro- duced recently for solving general nonsymmetric sparse linear systems. In this paper, we provide an extension of RAS for symmetric positive definite prob- lems using the so-called harmonic overlaps (RASHO). Both RAS and RASHO outperform their counterparts of the classical additive Schwarz variants (AS). The design of RASHO is based on a much deeper understanding of the behavior of Schwarz type methods in overlapping subregions, and in the construction of 1Supported in part by the NSF grants ASC-9457534, ECS-9725504, and ACI-0072089. 2Supported in part by the NSF grant CCR-9732208 and in part by the Polish Science Foundation grant 2 P03A 021 16. 3Supported in part by the NSF grant CCR-9984404. 74 6, 4(2004) the overlap. In RASHO, the overlap is obtained by extending the nonoverlap- ping subdomains only in the directions that do not cut the boundaries of other subdomains, and all functions are made harmonic in the overlapping regions. As a result, the subdomain problems in RASHO are smaller than that of AS, and the communication cost is also smaller when implemented on distributed memory computers, since the right-hand sides of discrete harmonic systems are always zero that do not need to be communicated. We also show numerically that RASHO preconditioned CG takes fewer number of iterations than the cor- responding AS preconditioned CG. A nearly optimal theory is included for the convergence of RASHO/CG for solving elliptic problems discretized with a finite element method. Key words and phrases: Restricted additive Schwarz preconditioner, domain decomposition, harmonic overlap, elliptic equations, finite elements Math. Subj. Class.: 65N30, 65F10 1 Introduction A restricted additive Schwarz (RAS) preconditioning technique was introduced re- cently for solving general nonsymmetric sparse linear systems [1, 5, 7, 13, 15, 16, 17]. RAS outperforms the classical additive Schwarz preconditioner (AS) [8, 20] in the sense that it requires fewer number of iterations, as well as smaller communication and CPU time costs when implemented on distributed memory computers, [1]. Un- fortunately, RAS in its original form is nonsymmetric and therefore the conjugate gradient method (CG) cannot be used [14]. Although a symmetrized version was constructed in [7], our numerical experiments show that it often takes more itera- tions than the corresponding AS/CG. In this paper we propose another modification of RAS and show in both theory and numerical experiments that this new variant works well for symmetric positive definite sparse linear systems and is superior to AS. Recall that the basic building blocks of classical Schwarz type algorithms are realized by solving the linear systems of the form Aδi w = R δ i v (1) on each extended subdomain, where Aδi is the extended subdomain stiffness matrix and Rδi is the restriction operator for the extended subdomain (formal definitions will be given later in the paper). The key idea of RAS is that equation (1) is replaced by Aδi w = { v inside the un-extended subdomain 0 in the overlapping part of the subdomain. (2) 6, 4(2004) A Restricted Additive Schwarz Preconditioner with ... 75 Note that the solution of (2) is discrete harmonic in the overlapping part of the subdomain, and therefore carries minimum energy in some sense. Setting part of the right-hand side vector to zero reduces the energy of the solution, and also destroys the symmetry of the additive Schwarz operator. In this paper, we further explore the idea of “harmonic overlap” and at the same time keep the symmetry of the Schwarz preconditioner. We mention that other approaches can also be taken to modify the Schwarz algorithm in the overlapping regions, such as allowing the functions to be discontinuous [4]. The algorithm to be discussed below is applicable for general symmetric positive definite problems. However, in order to provide a complete mathematical analysis, we restrict our discussion to a finite element problem, [3]. We consider a simple variational problem: Find u ∈ H10 (Ω), such that a(u, v) = f (v), ∀ v ∈ H10 (Ω), (3) where a(u, v) = ∫ Ω ∇u · ∇v dx and f (v) = ∫ Ω f v dx for f ∈ L2(Ω). For simplicity, let Ω be a bounded polygonal region in �2 with a diameter of size O(1). The extension of the results to �3 can be carried out easily by using the theory developed here in this paper and the well-known three-dimensional additive Schwarz techniques; [9, 10, 12]. Let T h(Ω) be a shape regular, quasi-uniform triangulation, of size O(h), of Ω and V ⊂ H10 (Ω) the finite element space consisting of continuous piecewise linear functions associated with the triangulation. We are interested in solving the following discrete problem associated with (3): Find u∗ ∈ V such that a(u∗, v) = f (v), ∀ v ∈ V. (4) Using the standard basis functions, (4) can be rewritten as a linear system of equations Au∗ = f. (5) For simplicity, we understand u∗ and f both as functions and vectors depending on the situation. The paper is organized as follows. In section 2, we introduce notations. The new algorithm is described in section 3. Section 4 is devoted to the mathematical analysis of the new algorithm. We conclude the paper in section 5 by providing some numerical results and final remarks. Through out this paper, C and C0, are positive generic constants that are independent of any of the mesh parameters and the number of subdomains. All the domains and subdomains are assumed to be open; i.e., boundaries are not included in their definitions. 2 Notations Let n be the total number of interior nodes of T h(Ω) and W the set containing all the interior nodes. We assume that a node-based partitioning has been applied and 76 Xiao-Chuan Cai, Maksymilian Dryja and Marcus Sarkis 6, 4(2004) resulted in N nonoverlapping subsets W 0i , i = 1, . . . , N , whose union is W . For each W 0i , we define a subregion Ω R i as the union of all elements of T h(Ω) that have all three vertices in W 0i ∪∂Ω. Note that ∪Ω̄Ri is not equal to Ω̄; see Fig. 1(b). We denote by H as the representative size (diameter) of the subregion ΩRi . We define the overlapping partition of W as follows. Let {W 1i } be the one-overlap partition of W , where W 1i ⊃ W 0i is obtained by including all the immediate neigh- boring vertices of all vertices in W 0i ; see Fig. 1(c). Using the idea recursively, we can define a δ-overlap partition of W , W = N⋃ i=1 W δi . Here the integer δ indicates the level of overlap with its neighboring subdomains and δh is approximately the extend of the extension. The definition of W δi , as well as many other subsets, can be found in an illustrative picture, Fig. 1. We next define a subregion of Ω induced by a subset of nodes of T h(Ω) as follows. Let Z be a subset of W . The induced subregion, denoted as Ω(Z), is defined as the union of: (1) the set Z itself; (2) the union all the open elements (triangles) of T h(Ω) that have at least one vertex in Z; and (3) the union of the open edges of these triangles that have at least one endpoint as a vertex of Z. Note that Ω(Z) is always an open region. The extended subregion Ωδi is defined as Ω(W δ i ), and the corresponding subspace as Vδi ≡ V ∩ H10 (Ωδi ) extended by zero to Ω\Ωδi . It is easy to verify that V = Vδ1 + Vδ2 + · · · + VδN . This decomposition is used in defining the classical one-level additive Schwarz algo- rithm [8]. Note that for δ = 0 this decomposition is a direct sum. Let us define P δi : V → Vδi by: for any u ∈ V, a(P δi u, v) = a(u, v), ∀v ∈ Vδi . (6) Then, the classical one-level additive Schwarz operator has the form P δ = P δ1 + · · · + P δN . In the classical AS as defined above, all the nodes of W δi are treated equally even through some subsets of the nodes play different roles in determining the convergence rate of the AS preconditioned CG. To further understand the issue, we classify the nodes as follows. Let Γδi = ∂Ω δ i \∂Ω; i.e., the part of the boundary of Ωδi that does not belong to the Dirichlet part of the physical boundary ∂Ω. We define the interface overlapping boundary Γδ as the union of all Γδi ; i.e., Γ δ = ∪Ni=1Γδi . We also need to define the following subsets of W , see, for examples, Fig. 1, where δ = 1 6, 4(2004) A Restricted Additive Schwarz Preconditioner with ... 77 (a) (b) (c) (d) (e) (f) (g) (h) (i) (j) (k) (l) Figure 1: The partition of a finite element mesh into 9 subdomains with the overlapping factor δ = 1. (a) the finite element mesh and nodal points; (b) a node-based partition of the mesh into 9 nonoverlapping subsets, and the collection of “•” forms the set W 0i ; (c) W δi ; (d) W Γ δ ; (e) W Γ δ i ; (f) W Γδ i,in; (g) W Γδ i,cut; (h) W δ i,ovl; (i) W δ i,non; (j) W δ i,in; (k) W̃ δ i ; (l) the shadowed area is Ωδi . 78 Xiao-Chuan Cai, Maksymilian Dryja and Marcus Sarkis 6, 4(2004) • W Γδ ≡ W ⋂Γδ (interface nodes) • W Γδi ≡ W Γ δ ⋂ W δi (local interface nodes) • W Γδi,in ≡ W Γ δ ⋂ W 0i (local internal interface nodes) • W Γδi,cut ≡ W Γ δ i \W Γ δ i,in (local cut interface nodes) • W δi,ovl ≡ (W δi \W Γ δ i ) ⋂ ( ⋃ j �=i W δ j ) (local overlapping nodes) • W δi,non ≡ W δi \(W Γ δ i ⋃ W δi,ovl) (local nonoverlapping nodes) • W δi,in ≡ W δi,non ⋃ W Γ δ i,in (internal nodes) We note that the most northwest and the southeast nodes in (c) were added to Γδi in order to make Ω δ i a rectangle. This just to simplify the presentation and it is not required in the implementation of the algorithms. We frequently use functions that are discrete harmonic at certain nodes. Let xk ∈ W be a mesh point and φxk (x) ∈ V the finite element basis function associated with xk; i.e., φxk (xk) = 1, and φxk (xj ) = 0, j �= k. We say u ∈ V is discrete harmonic at xk if a(u, φxk ) = 0. If u is discrete harmonic at a set of nodal points Z, we say u is discrete harmonic in Ω(Z). Our new algorithm will be built on the subspace Ṽδi defined as a subspace of Vδi . Ṽδi consists of all functions that vanish on the cuting nodes W Γ δ i,cut and discrete harmonic at the nodes of W δi,ovl. Note that the support of the subspace Ṽδi is W̃ δi ≡ W δi \W Γ δ i,cut and, since the values at the harmonic nodes are not independent, they can not be counted toward the degree of freedoms. The dimension of Ṽδi is dim ( Ṽδi ) = |W δi,in|. Let Ω(W̃ δi ) be the induced domain. It is easy to see that Ω(W̃ δ i ) is the same as Ω δ i but with cuts. We denote Ω(W̃ δi ) by Ω̃ δ i . We have then Ṽδi = V ∩ H10 (Ω̃δi ) and discrete harmonic on Ω(W δi,ovl). We denote Ω(W δ i,ovl) by Ω δ i,ovl. We define Ṽδ ⊂ Vδ as Ṽδ = Ṽδ1 ⊕ · · · ⊕ ṼδN , 6, 4(2004) A Restricted Additive Schwarz Preconditioner with ... 79 which is a direct sum. We remark that functions in Ṽδ are, by definition, the sum of functions ui ∈ Ṽδi , i = 1, · · · , N . Functions in Ṽδ can, in fact, be characterized easily as in the following lemma. Lemma 2.1 If u ∈ V and u is discrete harmonic at all the overlapping nodes, i.e., on ∪Ni=1W δi,ovl, then u ∈ Ṽδ. Proof. To prove that u ∈ Ṽδ, all we need is to find a decomposition u = N∑ i=1 ui, with ui ∈ Ṽδi , i = 1, · · · , N. For the given u, we define ui piece by piece as follows. On the nodes in W δi,in we let ui = u. On the nodes in W δi,cut we let ui be zero. On the nodes outside W δ i we set ui to zero. We now need only to define ui on the nodes belong to W δi,ovl. There, we extend ui as a discrete harmonic function with boundary data given by ui just defined. 3 Restricted additive Schwarz with harmonic over- lap (RASHO) Using notations introduced in the previous section, we now describe a new method, namely a restricted additive Schwarz with harmonic overlap. We first define P̃ δi : Ṽδ → Ṽδi as a projection operator, such that, for any u ∈ Ṽδ a(P̃ δi u, v) = a(u, v), ∀v ∈ Ṽδi . (7) The RASHO operator can then be defined as P̃ δ = P̃ δ1 + · · · + P̃ δN . (8) Note, however, that the solution u∗ of (4), see also (5), is not, generally speaking, in the subspace Ṽδ, therefore, the operator P̃ δ cannot be used to solve the linear system (5) directly. We will need to modify the right-hand side of the system (5). A reformulated (5) will be presented in Lemma 3.1 below. We will show that the elimination of the variables associated with the overlapping nodes is not needed in order to apply P̃ δ to any given vector v ∈ P̃ δ. We now introduce a matrix form of (8). We define the restriction operator, or a matrix, R̃δi as follows. Let v = (v1, . . . , vn) T be a vector corresponding to the nodal values of a function u ∈ V; namely for any node xi ∈ W , vi = u(xi). For convenience, we say “v is defined on W ”. Its restriction on W̃ δi , R̃ δ i v, is defined as ( R̃δi v ) (xi) = ⎧⎨ ⎩ vi if xi ∈ W̃ δi 0 otherwise. (9) 80 Xiao-Chuan Cai, Maksymilian Dryja and Marcus Sarkis 6, 4(2004) The matrix representation of R̃δi is given by a diagonal matrix with 1 for nodal points in W̃ δi and zero for the remaining nodal points. We remark that, by way of definition, the operator R̃δi is symmetric; i.e., (R̃ δ i ) T = R̃δi . Use this restriction operator, we define the subdomain stiffness matrix as Ãδi = R̃ δ i A (R̃ δ i ) T , which can also be obtained by the discretization of the original finite element problem on W̃ δi with zero Dirichlet data on nodes W \ W̃ δi . The matrix Ãδi is block diagonal with blocks corresponding to the structure of R̃δi and its inverse is understood as an inverse of the nonzero block. A matrix representation of P̃ δi denoted also by P̃ δ i is equals to P̃ δi = ( Ãδi )−1 A and P̃ δ = ( (Ãδ1) −1 + · · · + (ÃδN )−1 ) A. (10) Using the matrix notations, the next lemma shows how to modify the system (5) so that its solution belongs to Ṽδ. Lemma 3.1 Let u∗ and f be the exact solution and the right-hand side of (5), and w = N∑ i=1 (Ãδi ) −1R̃0i f, (11) then, we have ũ∗ = u∗ − w ∈ Ṽδ, which is the solution of the modified linear system of equations Aũ∗ = f − Aw = f̃ . Proof. If we can show that a(w, φk) = f (φk), for a regular basis function associated with an arbitrary overlapping node xk ∈ W δi,ovl, for some i, then we will have a(u∗ − w, φk) = f (φk) − f (φk) = 0, (12) which says that ũ∗ = u∗ − w is discrete harmonic at the overlapping node xk. We can then use Lemma 2.1 to conclude the proof. Let us now consider wi = (à δ i ) −1R̃0i f, which, by definition, is the same as a(wi, φj ) = (φj , R̃ 0 i f ), ∀xj ∈ W̃ δi . 6, 4(2004) A Restricted Additive Schwarz Preconditioner with ... 81 Here and in the rest of the proof, φj is the basis function associated with the node xj ∈ W̃ δi . Using that R̃0i is symmetric and (φj , R̃ 0 i f ) = (f, R̃ 0 i φj ) = a(u ∗, R̃0i φj ), we get a(wi, φj ) = a(u ∗, R̃0i φj ). (13) Let us compute a(wi, φk). Since xk is an overlapping node, it cannot be on the boundary of Ω̃δi . This leaves us with the following two cases. Case (1): The support of φk(x) belongs to the exterior of Ω̃δi . Since the supports of wi and φk do not overlap, we have a(wi, φk) = 0. Case (2): The support of φk(x) belongs to the interior of Ω̃δi . In this case, we have a(wi, φk) = a(u ∗, R̃0i φk). Taking the sum of the above equality for i = 1, · · · , N , a(w, φk) = a ( N∑ i=1 wi, φk ) = a ( u∗, N∑ i=1 R̃0i φk ) = a(u∗, φk), which proves (12). Here the fact ∑N i=1 R̃ 0 i = I is used. There are basically two ways to compute w in practice. Suppose that subdomain problems are solved using some LU factorization based method. One can use the same factorization of Ãδi to modify the right-hand side of the system and to solve subdomain problems in the preconditioning steps, as what was suggested in Lemma 3.1. Or, one can obtain w by solving several small Poisson problems on each subdomain with zero Dirichlet boundary conditions in the overlapping regions Ωδi,ovl. In both strategies, the computation can be done in parallel and no communication is needed in a distributed memory implementation. Let f̃ = f − Aw, then ũ∗ is the solution of the following linear system of equations Aũ∗ = f̃ . (14) Since ũ∗ ∈ Ṽδ, g ≡ P̃ δũ∗ is well defined, and can be computed without knowing ũ∗ by using the following relations: a(P̃ δi ũ ∗, v) = a(ũ∗, v) = (f̃ , v), ∀v ∈ Ṽδi and i = 1, · · · , N. 82 Xiao-Chuan Cai, Maksymilian Dryja and Marcus Sarkis 6, 4(2004) More precisely speaking, we can obtain g by solving the subdomain problems a(gi, v) = (f̃ , v), ∀v ∈ Ṽδi , for i = 1, · · · , N , and taking g = g1 + · · · + gN . With such a right-hand side, we introduce a new linear system P̃ δũ∗ = g, (15) which is equivalent to the linear system (14). The system (15) is a symmetric positive definite system under the usual energy inner product, and therefore, can be solved using the conjugate gradient method. RASHO has a few advantages over the classical AS preconditioner. Let us recall AS briefly. Let ( Rδi v ) (xi) = ⎧⎨ ⎩ vi if xi ∈ W δi 0 otherwise. (16) Then the AS operator takes the following matrix form P δ = ( (Aδ1) −1 + · · · + (AδN )−1 ) A (17) where Aδi = R δ i A(R δ i ) T . Because of the inclusion of the cut interface nodes, the size of the matrix Aδi is |W δi |, which is slightly larger than the size of the matrix Ãδi , which is |W̃ δi |. In a distributed memory implementation, the operation Rδi v involves moving data from one processor to another, but the operation R̃δi v does not involve any communication. More precisely speaking, in RASHO, if u ∈ Ṽδ, then it is easy to see that R̃δi Au = R̃ δ i,inAu, (18) where R̃δi,in is defined as ( R̃δi,inv ) (xi) = ⎧⎨ ⎩ vi if xi ∈ W δi,in 0 otherwise. (19) Therefore, for functions in Ṽδ, we can rewrite P̃ δ, as in (10), in the following form P̃ δ = ( (Ãδ1) −1R̃δ1,in + · · · + (ÃδN )−1R̃δN,in ) A. (20) Although the operator (20) does not look like a symmetric operator, but it is indeed symmetric when applying to functions in the subspace Ṽδ. The form (18) takes the advantage of the fact that the operator R̃δi,in is communication-free in the sense that it needs only the residual associated with nodes in W Γ δ i,in ⊂ Ω0i . We make some further comments on how the residual Au can be calculated in a distributed memory environment, for a given vector u ∈ Ṽδ. In a typical implementa- tion, the matrix A is constructed and stored in the form of {Ãδi }, each processor has 6, 4(2004) A Restricted Additive Schwarz Preconditioner with ... 83 one or several of the subdomain matrix Ãδi . Similarly u is stored in the form of {ui}, where ui ∈ Ṽδi . We note, however, that to compute the residual at nodes W Γ δ i,in some communications are required. The processor associated with subdomain Ωδi needs to obtain the local solution from the neighboring subdomains at nodes connected to W Γ δ i,in. It is important to note that the amount of communications does not depend on the size of the overlap since only one layer of nodes is required. This shows that in terms of communications, the RASHO is superior to AS and RAS. 4 Theoretical analysis The algorithm presented in the previous section is applicable for general sparse, sym- metric positive definite linear systems. The notions of subdomains, harmonic overlaps, the classification of the nodal points, etc, can all be defined in terms of the graph of the sparse matrix. In this section we provide a nearly optimal estimate for a Poisson equation discretized with a piecewise linear finite element method. We estimate the condition number of the RASHO operator P̃ δ in terms of the fine mesh size h, the subdomain size H, and the overlapping factor δ. We note that because we do not include a coarse space, the constant will depend on the subdomain size H. We shall follow the abstract additive Schwarz theory [20]: Lemma 4.1 Suppose the following assumptions hold: i) There exists a constant C0 such that for any u ∈ Ṽδ there exists a de- composition u = N∑ i=1 ui, where ui ∈ Ṽδi , and N∑ i=1 |ui|2H1(Ωδ i ) ≤ C20 |u|2H1(Ω). ii) There exist constants �ij , i, j = 1, . . . , N such that a(ui, uj ) ≤ �ij a(ui, ui)1/2a(uj, uj )1/2, ∀ui ∈ Ṽδi , ∀uj ∈ Ṽδj . Then, P̃ δ is invertible, symmetric; i.e., a(P̃ δu, v) = a(u, P̃ δv), ∀u, v ∈ Ṽδ, and C−20 a(u, u) ≤ a(P̃ δu, u) ≤ ρ(E)a(u, u), ∀u ∈ Ṽδ. (21) Here ρ(E) is the spectral radius of E, which is a (N ) × (N ) matrix made of {�ij}. It is trivial to see that ρ(E) ≤ C. So our focus in the rest of the paper is in bounding C0. 84 Xiao-Chuan Cai, Maksymilian Dryja and Marcus Sarkis 6, 4(2004) 4.1 A partition of unity and a comparison function The construction of a partition of unity is one of the key steps in an additive Schwarz analysis. We construct φi(x) as follows: φi(xk) = ⎧⎪⎨ ⎪⎩ 1 if xk ∈ W Γ δ i,in discrete harmonic if xk ∈ W δi,ovl ∪ W δi,non 0 if xk ∈ W \W̃ δi Note that φi(x) = 0 if x �∈ Ω̃δi . Let us denote Ω(W δi,non) by Ωδi,non, then φi(xk) = 1 at xk ∈ W δi,non for the case Ωδi,non ∩ ∂Ω = ∅ since all the boundary nodes of Ωδi,non belong to W Γ δ i,in. Also, it is easy to see that {φi(x), i = 1, . . . , N} restricted to W Γ δ form a partition of unit. ����� ����� ����� ����� ����� ����� ����� ����� ����� ����� ����� ����� ������ ������ ������ ������ ������ ������ ������ ������ ������ ������ ����� ����� ����� ����� ����� ����� ����� ����� ����� ����� ����� ����� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� ������� �������� �������� �������� �������� �������� �������� �������� �������� �������� �������� �������� �������� �������� �������� �������� �������� �������� �������� �������� �������� �������� �������� �������� �������� �������� �������� H ������ ������ ������ ������ ������ ������ ������ ������ ������ ������ Type (2) Type (3) Type (4) Type (1) Figure 2: The partition of Ωδi into the union of four types of subregions. This is a ‘floating’ subdomain with δ = 2. The collection of “•” forms the set W 0i . In addition to φi(x), we also need to construct a comparison function θi(x) for each subdomain Ωδi . Comparison functions, or barrier functions, are very useful for many Schwarz algorithms, such as these on non-matching grids [6]. We will show 6, 4(2004) A Restricted Additive Schwarz Preconditioner with ... 85 that, even though θi(x) ∈ Vδi , not in Ṽδi as we wished, it can still be used to bound functions in Ṽδi . Both θi(x) and φi(x) depend on the overlapping factor δ. Because φi(x) is discrete harmonic at W δi,ovl ∪ W δi,non, we have a(φi, φi) ≤ a(θi, θi). To construct the function θi(x), we first consider the case when Ω0i is a floating square subdomain. “Floating” refers to the fact that the subdomain doesn’t touch the boundary ∂Ω. The extension to cases when Ωδi touches the boundary is simple and we will comment on it later. To further simplify our arguments, we assume that Ωδi and its neighboring extended subdomains Ωδj are squares of the same size; i.e. sides length equals to H + 2(δ + 1)h. This assumption is equivalent to that ΩR has size H and δ levels of overlap is applied; see Fig. 2. And we also assume the overlap is not too large; for the analysis given below δh no larger than H/4 is enough. Our techniques can be modified to consider larger overlaps and more complex subdomains, although too large of an overlap has little practical value. Roughly speaking, θi(x) equals to φi(x) on W \W δi,ovl. On the overlapping region W δi,ovl we need to define θi(x) carefully so that we can control its energy in the semi H1 norm. For this purpose, we decompose Ωδi into subregions of four types: Ω δ i,non, Ωδδi , Ω δH i , and Ω δδ̃ i and define θi(x) on each piece of the subregion separately. Type (1): The first subregion is Ωδi,non, which a square with sides of size H − 2δh. Type (2): The second subregion Ωδδi is the area where Ω δ i overlaps simulatneously with three neighbors Ωδj . Ω δδ i therefore represents the union of the four corner pieces of Ωδi ; i.e. four squares with sides of size (2δ + 1)h. Type (3) and (4): The area where Ωδi overlaps only one neighbor are four rect- angles of size H − 2δh by (2δ + 1)h. We further partition each of the four rectangles into three smaller rectangles; i.e. two of them are of Ωδδ̃i type and one of them of ΩδHi type. For instance, without lost of generality, Let us consider the intersection of Ωδi with its right neighbor Ω δ j , excluding the corner parts. In this case, the subregion to be partitioned is a rectangle of size (2δ + 1)h in the x direction and H − 2δh in the y direction. The partition of this rectangles gives two smaller rectangles of Ωδδ̃i type with dimensions 2(δ + 1)h × δh and each one has an edge in common with a square of Ωδδi type. We denote them as transition subregions because they are placed between a corner type subregion Ωδδi and a face type subregion Ω δH i . The Ω δH i face type subregions are the smaller rectangles that are placed between the two smaller rectangles of Ωδδ̃i type. Ω δH i face type regions are of size (2δ + 1)h by H − 4δh. For any node x belonging to a Type (1) region Ωδi,non, we define θi(x) to be equal to one; i.e., equals to φi(x). Therefore |φi(x)|2 H1(Ωδ i,non ) = |θi(x)|2H1(Ωδ i,non ) = 0. We next define θi(x), node by node, in Ωδi,ovl, which is the union of corner, tran- sition and face type regions defined above. 86 Xiao-Chuan Cai, Maksymilian Dryja and Marcus Sarkis 6, 4(2004) For a Type (2) region Ωδδi . Let Q be such a square with vertices V1 = (a, b), V2 = (a + (2δ + 1)h, b), V3 = (a, b + (2δ + 1)h), and V4 = (a + (2δ + 1)h, b + (2δ + 1)h). We assume that V1, V2, and V4 belong to ∂Ωδi . In other words, Q is located on the southeast corner of Ωδi . Let use also introduce another square region Q̃, with vertices V3 = (a, b + (2δ + 1)h), Ṽ1 = (a, b + δh), Ṽ2 = (a + (δ + 1)h, b + δh), and Ṽ4 = (a + (δ + 1)h, b + (2δ + 1)h). Note that Q̃ is contained in Q, with V3 as the common vertex. To define θi(x) on Q, we set θi(V3) = 1, θi(Ṽ1) = 0, θi(Ṽ2) = 0, θi(Ṽ4) = 0. At the remaining nodes x on the edges Ṽ1Ṽ2 and Ṽ2Ṽ4 we set θi(x) = 0, and on the edges V3Ṽ1 and V3Ṽ4 we set θi(x) = 1. For nodes on Q\Q̃ we set θi(x) = 0. It remains only to define θi(x) for nodes x in the interior of Q̃. To define θi(x) there we use a well-known cutoff function technique, such as the one introduced in Lemma 4.4 of [10] but for two-dimensional square regions. An illustrative picture of θi(x) in a typical region Ωδδi is shown in Fig 3. 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 1.2 1.4 Figure 3: An illustrative picture of θi(x) in a typical region Ωδδi . For the completeness of this paper, we include the construction below. Let C be the center of the square Q̃. The construction of θi(x) is defined by the following steps: (1) Define θi(V3) = 1, θi(Ṽ2) = 0, θi(Ṽ1) = 0 and θi(Ṽ4) = 0. (2) For a point P that belongs to the segments V3Ṽ4 or V3Ṽ1, define θi(P ) = 1. For a point P that belongs to the segments Ṽ4Ṽ2 or Ṽ1Ṽ2, define θi(P ) = 0. (3) For a point Y that belongs to the line segment connecting C to V3, define θi(Y ) by linear interpolation between values θi(C) = 1/2 and θi(V3) = 1. For a point 6, 4(2004) A Restricted Additive Schwarz Preconditioner with ... 87 Y that belongs to the line segment connecting C to Ṽ2, define θi(Y ) by linear interpolation between values θi(C) = 1/2 and θi(Ṽ2) = 0. (4) For a point S that belongs to a line segment connecting a point Y to a vertex Ṽ1 or Ṽ4, define θi(S) = θi(Y ). (5) Note that the θi is defined everywhere on Q̃ ∪ ∂Q̃. θi is continuous everywhere except at the points Ṽ1 and Ṽ4. We redefine θi as the continuous piecewise linear finite element function given by the standard pointwise interpolation. The most important observation of the construction of θi(x) inside Q̃ is that |∇θi(x)| ≤ C/r near Ṽ1 or Ṽ4. Here r is the distance of x to Ṽ1 or Ṽ4. Therefore, we obtain (see [10] and [19]) |θi(x)|2H1(Q) = |θi(x)|2H1(Q̃) ≤ C ( 1 + log ( (δ + 1)h h )) = C(1 + log(δ + 1)). Since inside of Ωδi there are four of those squares we obtain |θi(x)|2H1(Ωδδ i ) ≤ C (1 + log(δ + 1)) . Type (3) regions consist of transition type rectangles. Let us consider one of them and denote it by T , which we assume has vertices at V3 = (a, b + (2δ + 1)h), V4 = (a+(2δ+1)h, b+(2δ+1)h), V5 = (a, b+(3δ+1)h), and V6 = (a+(2δ+1)h, b+(3δ+1)h). Note that T stands on the top of the square Q introduced above and has the common edge V3V4. We define θi(x) over the edge V3V4 to be equal to φi(x). Over the edge V3V5, we set θi(x) = 1. Over the edge V4V6, we set θi(x) = 0. And over the edge V5V6 we let θi(x) decrease linearly from the value 1 to 0. What remains is to define θi(x) inside T . Let us define the nodes Vl = (a + δh, b + (2δ + 1)h) and Vr = (a + (δ + 1)h, b + (2δ + 1)h), which is the same as the node Ṽ4 used in the description of Type (2) regions. The nodes Vl and Vr are exactly the places on the edge V3V4 where φi(x) jumps from 1 to 0. On the triangle V3VlV5 we set θi(x) = 1. On the triangle VrV4V6 we set θi(x) = 0. On the region VlVrV6V5, we let θi(x) decrease linearly in the x direction from the value 1 to 0. We note that next to the nodes VlVr , θi(x) has a singular behavior similar to |∇θi(x)| ≤ C/r where r is the distance from x to the line Vl Vr. Similarly, we have |θi(x)|2H1(T ) ≤ C (1 + log(δ + 1))) . Since there are eight rectangles of Type (3) inside Ωδδ̃i , we obtain |θi(x)|2H1(Ωδδ̃ i ) ≤ C (1 + log(δ + 1)) . Type (4) regions are rectangles of face type. Let R be one of them, and we assume that the vertices are given by V5 = (a, b+(3δ +1)h), V6 = (a+(2δ +1)h, b+(3δ +1)h), V7 = (a, b + H − (δ − 1)h), and V8 = (a + (2δ + 1)h, b + H − (δ − 1)h). Note that R is 88 Xiao-Chuan Cai, Maksymilian Dryja and Marcus Sarkis 6, 4(2004) on the top of the rectangle T defined above and its height is H − 4δh. The vertices V6 and V8 are the vertices that belong to ∂Ωδi . We define θi(x) = 1 if x is on the edge V5V7, and equals zero if x is on the edge V6V8, and linear in the x direction for the remaining points. We obtain then |θi(x)|2H1 (R) ≤ H − 4δh (2δ + 1)h . Since there are four of those rectangles inside ΩδHi , we obtain |θi(x)|H1(ΩδH i ) ≤ C H − 4δh (2δ + 1)h ≤ C H (2δ + 1)h . For the cases in which Ω0i touches the boundary ∂Ω, the analysis needs to be modified slightly. The first modification is because the shape of the overlapping region changes slightly, i.e. the longer side is shorter. It is easy to see that we get similar bounds as before. The other modification is because φi on Ωδi,non is not identically equal to one and therefore the corresponding energy is not necessarily zero. For this case we can design θi similarly and obtain |θi(x)|2H1(Ωδi,non) ≤ C ( 1 + log ( H h )) . Putting all pieces of θi(x) together, we see that θi(x) ∈ Vδi and it equals to φi(x) on W Γ δ . Adding all the estimates on subregions of four types, we arrive at the following lemma. Lemma 4.2 For i = 1, · · · , N. θi(x) ∈ Vδi , and φi(x) ∈ Ṽδi , and also (1) |φi|2 H1(Ωδ i ) ≤ |θi|2H1(Ωδ i ) . (2) |θi|2H1(Ωδ i \Ωδ i,non ) ≤ C ( 1 + log(δ + 1) + H (2δ + 1)h ) . (3) if Ωδi,non ∩ ∂Ω = ∅ then |θi|2H1(Ωδ i,non ) = 0. (4) if Ωδi,non ∩ ∂Ω �= ∅ then |θi|2H1(Ωδ i,non ) ≤ C ( 1 + log ( H h )) . Here C > 0 is independent of the parameters h, H and δ. 4.2 A bounded partition lemma To obtain the parameter C0 of Assumption i) of the abstract additive Schwarz theory, see Lemma 4.1, we construct a decomposition of Ṽδ and prove its boundedness below. 6, 4(2004) A Restricted Additive Schwarz Preconditioner with ... 89 Lemma 4.3 There exists a constant C > 0, independent of h, H, and δ, such that for any u ∈ Ṽδ, there exist ui ∈ Ṽδi , such that u = N∑ i=1 ui, (22) and N∑ i=1 |ui|2H1(Ω) ≤ C (1 + log(δ + 1)) ( 1 + log ( H h )) |u|2 H1(Ω) + C 1 H2 ( 1 + log(δ + 1) + H (2δ + 1)h ) |u|2H1(Ω). (23) Proof. We first construct the decomposition (22). For any given u ∈ Ṽδ, we define ui ∈ Ṽδi as ui(xk) = ⎧⎨ ⎩ u(xk) if xk ∈ W δi,in discrete harmonic if xk ∈ W δi,ovl 0 if xk ∈ W \W̃ δi . It is easy to see (22) holds. For i = 1, . . . , N , let us define vi ∈ Ṽδi by vi = ui − ūiφi ∈ Ṽδi , where ūi = 1 |Ωδi | ∫ Ωδ i udx is the average of u on the extended region Ωδi . Here |Ωδi | is the area of the region Ωδi . The next step is to bound the sums ∑N i=1 |vi|2H1(Ω) and ∑N i=1 |ūiφi|2H1(Ω). For the second sum, we use Lemma 4.2 to obtain N∑ i=1 |ūiφi|2H1(Ω) ≤ C ( 1 + log ( H h )) ∑ i∈∂Ω |ūi|2+ C ( 1 + log(δ + 1) + H (2δ + 1)h )∑ i |ūi|2. Here we use i ∈ ∂Ω to denote the subdomains Ω0i that touch the boundary ∂Ω with a face. By Cauchy-Schwarz and Friedrichs inequalities we have N∑ i=1 |ūi|2 = N∑ i=1 ( 1 |Ωδi | ∫ Ωδ i udx )2 ≤ C N∑ i=1 1 H2 ‖u‖2 L2(Ωδ i ) ≤ C 1 H2 ‖u‖2L2(Ω) ≤ C 1 H2 |u|2H1(Ω). 90 Xiao-Chuan Cai, Maksymilian Dryja and Marcus Sarkis 6, 4(2004) And for the cases i ∈ ∂Ω, we can use a Poincaré inequality to obtain ∑ i∈∂Ω |ūi|2 ≤ C ∑ i∈∂Ω 1 H2 ‖u‖2 L2(Ωδ i ) ≤ C ∑ i∈∂Ω |u|2 H1(Ωδ i ) ≤ C|u|2H1(Ω). To bound the other terms |vi|2H1(Ω), i = 1, . . . , N , we use θi(x), i = 1, . . . , N , introduced before. Consider ṽi ∈ Vδi defined as follows ṽi(x) = Ih(θi(x)(u(x) − ūi)). Note that ṽi(x) is equal to vi(x) on W Γ δ i and on ∂Ω δ i . On Ω δ i,ovl and Ω δ i,non, vi is discrete harmonic. Therefore, we have |vi|2H1(Ωδ i ) ≤ |ṽi|2H1(Ωδ i ) . The rest of the proof will be devoted to the estimate of |ṽi|2H1(Ωδ i ) in terms of |u|2 H1(Ωδ i ) . Let K be an element of Ωδi and let us denote wi = u − ūi then |ṽi|2H1(K) = |Ih(θiwi)|2H1(K) ≤ 2|θ̄iwi|2H1(K) + 2|Ih((θ̄i − θi)wi)|2H1(K). (24) Here, θ̄i is the average of θi on K, and Ih is the standard pointwise interpolation. To estimate the first part of (24) we use the fact that |θ̄i| ≤ 1, to obtain |θ̄iwi|2H1(K) = |θ̄i(u − ūi)|2H1(K) ≤ |u − ūi|2H1(K) = |u|2H1(K). The last equality comes from the fact that ūi is a constant. For the second part of (24), according to an inverse inequality we have |Ih((θ̄i − θi)wi)|2H1(K) ≤ C 1 h2 ‖Ih((θ̄i − θi)wi)‖2L2(K). (25) To obtain the bound for the right-hand side of (25), we consider the element K in four different situations corresponding to the four types of subregions into which the the subregion Ωδi is split i.e., Ω δ i,non, Ω δH i , Ω δδ̃ i and Ω δδ i . The proof for the cases K ⊂ ΩδHi and K ⊂ Ωδδ̃i are nearly the same, so we only consider one of them here. For K ⊂ ΩδHi , since ‖θ̄i − θi‖2L∞(K) ≤ C ( h (2δ + 1)h )2 , we obtain 1 h2 ‖Ih((θ̄i − θi)wi)‖2L2(K) ≤ C 1 ((2δ + 1)h)2 ‖wi‖2L2(K). Applying a technique developed in Dryja and Widlund [11], we obtain 1 ((2δ + 1)h)2 ‖wi‖2L2(ΩδH i ) ≤ C ( H (2δ + 1)h |wi|2H1(Ωδ i ) + 1 H((2δ + 1)h) ‖wi‖2L2(Ωδ i ) ) . (26) 6, 4(2004) A Restricted Additive Schwarz Preconditioner with ... 91 Using the fact |wi|2H1(Ωδ i ) = |u|2 H1(Ωδ i ) and a Friedrichs inequality ‖wi‖2L2(Ωδ i ) ≤ CH2|u|2 H1(Ωδ i ) . (27) Combining the estimates (26) and (27), we obtain 1 ((2δ + 1)h)2 ‖wi‖2L2(ΩδH i ) ≤ C H (2δ + 1)h |u|2 H1(Ωδ i ) . For the case when K ⊂ Ωδδi , we use similar arguments as in Dryja, Smith and Widlund [10] to obtain ∑ K∈Ωδδ i 1 h2 ‖Ih((θ̄i − θi)wi‖2L2(K) ≤ ∑ K∈Ωδδ i C 1 r2 ‖wi‖2L2(K), (28) where ch ≤ r ≤ C((δ + 1)h) is the distance to those “cut pieces”. We have used here that θi(x) has the singular behavior C/r on Ωδδi . We have then ∑ K∈Ωδδ i 1 r2 ‖wi‖2L2(K) ≤ C ∫ C(δ+1)h ch ∫ α r−2r‖wi‖2L∞(Ωδδ i ) dαdr (29) and ‖wi‖2L∞(Ωδδ i ) ≤ C ( 1 + log ( H h )) |u|2 H1(Ωδ i ) . (30) For the inequality (30), we have used a well-known result (see Bramble [2]) ‖u − ūi‖2L∞(Ωδδ i ) ≤ ‖u − ūi‖L∞(Ωδ i ) ≤ C ( 1 + log ( H h )) ‖u − ūi‖2H1(Ωδ i ) and that ūi is the average of u on Ωδi ; i.e., a Friedrichs inequality ‖u − ūi‖2H1(Ωδ i ) ≤ C|u|2 H1(Ωδ i ) . Putting (29) and (30) together, we obtain ∑ K∈Ωδδ i 1 r2 ‖wi‖2L2(K) ≤ C ( (1 + log(δ + 1)) ( 1 + log ( H h ))) |u|2 H1Ωδi . (31) For the case K ⊂ Ωδi,non. If Ω0i is a floating subdomain, which is to say that Ωδi,non does not touch ∂Ω, then θ̄i − θi is zero. If Ωδi,non touches the boundary ∂Ω, then the estimate becomes |vi|2H1(Ωδ i,non ) ≤ C ( |u|2 H1(Ωδ i,non ) + |ūi|2|φi|2H1(Ωδ i,non ) ) ≤ C ( 1 + log ( H h )) |u|2 H1(Ωδ i ) . (32) 92 Xiao-Chuan Cai, Maksymilian Dryja and Marcus Sarkis 6, 4(2004) 4.3 The main theorem We state the main theorem of this paper here, the proof follows directly from the abstract AS theory and the lemmas just proved. Theorem 4.1 The RASHO operator P̃ δ is symmetric in the inner product a(·, ·), nonsingular, and bounded in the following sense C−20 a(u, u) ≤ a(P̃ δu, u) ≤ C1a(u, u) ∀u ∈ Ṽδ. (33) Here C20 = C ( (1 + log(δ + 1)) ( 1 + log ( H h )) + 1 H2 ( 1 + log(δ + 1) + H (2δ + 1)h )) . The constants C, C1 > 0 are independent of h, H, and δ. We remark that the corresponding convergence rate estimate for the regular one- level AS [11], in terms of the constant C0, is C20 = C ( 1 + 1 H(2δ + 1)h ) . The lower bound C20 of RASHO is theoretically slightly worse than the lower bound of AS in case of large overlap, but roughly the same for small overlap. On the other hand, the upper bound C1 of RASHO is smaller than the upper bound of AS. We can see this since Ṽδk ⊂ Vδk , ∀k, implies that the positive numbers �ij defined in Lemma 4.1 are smaller for RASHO than the correspondent �ij for AS. Consequently, the spectral radius of E in RASHO is smaller. Because C1 of RASHO is smaller, the numerical performance of RASHO presented in the next section is better than that of AS. We also remark that the results of the paper is for one-level Schwarz algorithms. Because of the “harmonic overlap“ requirement, the extension of the algorithm to multiply levels is not as trivial as the multilevel AS. 5 Numerical experiments In this section, we present some numerical results for solving the Poisson’s equation on the unit square with zero Dirichlet boundary conditions. We compare the perfor- mance of RASHO and AS preconditioned Conjugate Gradient methods in terms of the number of iterations and the condition numbers. We pay particular attention to the dependence on the number of subdomains and the size of overlap. We first discuss a few implementation issues related to the new preconditioner. In order to apply the RASHO/CG method, it is necessary to force the solution to belong to Ṽδ. To do so, a pre-CG-computation is needed, and it is done through the formula (11). We note that u = u∗ − w ∈ Ṽδ, see Lemma 3.1, and therefore, we can apply the regular PCG to the RASHO preconditioned system (15). The AS/CG is 6, 4(2004) A Restricted Additive Schwarz Preconditioner with ... 93 Table 1: RASHO and AS preconditioned CG for solving the Poisson’s equation on a 128 × 128 mesh decomposed into 2 × 2 = 4 subdomains with overlap = ovlp. The AS/CG results are shown in ( ). The “+1” is for the preprocessing step needed for RASHO. ovlp iter cond max min h 42 (42) 129.(129.) 1.98 (1.98) 0.0154 (0.0154) 3h 24+1 (28) 48.4 (86.3) 1.94 (4.00) 0.0402 (0.0464) 5h 20+1 (23) 33.3 (51.8) 1.91 (4.00) 0.0574 (0.0773) 7h 18+1 (20) 27.2 (37.0) 1.89 (4.00) 0.0694 (0.1081) Table 2: RASHO and AS preconditioned CG for solving the Poisson’s equation on a 32 ∗ DOM × 32 ∗ DOM mesh decomposed into DOM × DOM subdomains with overlap = 3h, i.e. δ = 1. DOM × DOM iter cond max min 2 × 2 19+1 (20) 26.8 (43.7) 1.89 (4.00) 0.0708 (0.0916) 4 × 4 39+1 (42) 86.9 (145.) 1.95 (4.00) 0.0225 (0.0276) 8 × 8 75+1 (78) 328. (550.) 1.97 (4.00) 0.0060 (0.0073) 16 × 16 147+1 (156) 1295 (2168.) 1.98 (4.00) 0.0015 (0.0018) the classical additive Schwarz preconditioned CG as described in [8]. We note that in the case δ = 0, i.e. ovlp = h, RASHO and AS are the same. The stopping condition for CG is to reduce the initial residual by a factor of 10−6. The exact solution of the equation is u(x, y) = e5(x+y) sin(πx) sin(πy). All subdomain problems are solved exactly. The iteration counts (iter), condition numbers (cond), maximum (max) and minimum (min) eigenvalues of the preconditioned matrix are summerized in Table 1, and Table 2. From Table 1 and Table 2, it is clear that RASHO/CG is always better than the classical AS/CG in terms of the iteration counts and condition numbers. Note that there is a practical suggestion for AS that the overlap should be 3h − 5h width. In this case the condition number of RASHO is almost twice smaller than AS. This is an important result since it is easy to modify a (parallel) AS/CG code to obtain a RASHO/CG implementation. Although we do not have any parallel results to report here, we are confident to predict that RASHO/CG would be even better than AS/CG on a parallel computer with distributed memory since much less communications are required. Also the local solvers in RASHO are slightly cheaper since the local solvers have slightly smaller numbers of unknowns than for the regular AS. 94 Xiao-Chuan Cai, Maksymilian Dryja and Marcus Sarkis 6, 4(2004) References [1] S. Balay, W. Gropp, L. McInnes, and B. Smith, The Portable Extensible Toolkit for Scientific Computing (PETSc), www.mcs.anl.gov/petsc, 2001. [2] J. Bramble, A second order finite difference analogue of the first biharmonic boundary value problem, Numer. Math., 9(1966), pp. 236-249. [3] S. Brenner and R. Scott, The Mathematical Theory of Finite Element Meth- ods, Springer, 1994. [4] X.-C. Cai, M. Casarin, F. Elliot, and O. Widlund, Overlapping Schwarz algorithms for solving Helmholtz’s equation, Contemporary Mathematics, 218 (1998), pp. 391-399. [5] X.-C. Cai, C. Farhat, and M. Sarkis, A minimum overlap restricted additive Schwarz preconditioner and applications in 3D flow simulations, Contemporary Mathematics, 218 (1998), pp. 479-485. [6] X.-C. Cai, T. Mathew, and M. Sarkis, Maximum norm analysis of over- lapping nonmatching grid discretizations of elliptic equations, SIAM J. Numer. Anal., 37 (2000), pp. 1709-1728. [7] X.-C. Cai and M. Sarkis, A restricted additive Schwarz preconditioner for general sparse linear systems, SIAM J. Sci. Comput., 21 (1999), pp. 792-797. [8] M. Dryja and O. Widlund, An additive variant of the Schwarz alternating method for the case of many subregions, Technical Report 339, also Ultracom- puter Note 131, Department of Computer Science, Courant Institute, 1987. [9] M. Dryja, M. Sarkis, and O. Widlund, Multilevel Schwarz methods for ellip- tic problems with discontinuous coefficients in three dimensions, Numer. Math., 72(1996), pp. 313–348. [10] M. Dryja, B. Smith, and O. Widlund, Schwarz analysis of iterative sub- structuring algorithms for elliptic problems in three dimensions, SIAM J. Numer. Anal., 31(1994), pp. 1662–1694. [11] M. Dryja and O. Widlund, Domain decomposition algorithms with small overlap, SIAM J. Sci. Comp., 15(1994), pp. 604–620. [12] M. Dryja and O. Widlund, Schwarz Methods of Neumann-Neumann type for three-dimensional elliptic finite element problems, Comm. Pure Appl. Math., 48(1995), pp. 121–155. [13] A. Frommer and D. Szyld, An algebraic convergence theory for restricted additive Schwarz methods using weighted max norms, Research Report 00-3-23, Department of Mathematics, Temple University, 2000. (SIAM J. Numer. Anal., to appear) 6, 4(2004) A Restricted Additive Schwarz Preconditioner with ... 95 [14] G. Golub and C. van Loan, Matrix Computations The Johns Hopkins Uni- versity Press, 1983. [15] W. Gropp, D. Kaushik, D. Keyes, and B. Smith, Performance modeling and tuning of an unstructured mesh CFD application, Proceedings of SC2000, IEEE Computer Society, 2000. [16] M. Lesoinne, M. Sarkis, U. Hetmaniuk, and C. Farhat, A linearized method for the frequency analysis of three-dimensional fluid/structure interaction problems in all flow regimes, Comp. Meth. Appl. Mech. Eng., 2001. (to appear). [17] A. Quarteroni and A. Valli, Domain Decomposition Methods for Partial Differential Equations, Oxford Science Publications, 1999. [18] Y. Saad and M. Schultz, GMRES: A generalized minimum residual algorithm for solving nonsymmetric linear systems, SIAM J. Sci. Stat. Comput., 7(1986), pp. 856–869. [19] M. Sarkis, Nonstandard coarse spaces and Schwarz methods for elliptic problems with discontinuous coefficients using non-conforming elements, Numer. Math., 77(1997), pp. 383-406. [20] B. Smith, P. Bjørstad, and W. Gropp, Domain Decomposition: Parallel Multilevel Methods for Elliptic Partial Differential Equations, Cambridge Uni- versity Press, 1996.