JOURNAL OF THEORETICAL AND APPLIED MECHANICS 42, 4, pp. 869-903, Warsaw 2004 COMPUTATIONS OF AN UNSTEADY VISCOUS FLOW IN A THREE-DIMENSIONAL SYSTEM OF DUCTS. PART II: IMPLEMENTATION OF THE SPECTRAL ELEMENT METHOD AND SAMPLE RESULTS Jacek Szumbarski Piotr Olszewski Institute of Aeronautics and Applied Mechanics, Warsaw University of Technology e-mail: jasz@meil.pw.edu.pl; polsz@meil.pw.edu.pl Konrad Wawruch Interdisciplinary Centre for Mathematical and Computational Modeling, Warsaw University e-mail: kwaw@icm.edu.pl Zbigniew Małota Foundation of Cardiac Surgery Development, Zabrze e0mail: zmalota@o2.pl The second part the paper focuses on implementation issues of the spec- tral element technique and presents sample results of numerical simulations. Particular algorithms like preconditioning of the presser solver, fast diago- nalizationmethod and the projectionmethod for solving sequences of large linear systems are described in some details. The computational results, ob- tained for both test domains andgeometries ofmedical origin, are presented. The comparisonwith the results of computationswith theFIDAPpackage is provided.The difficulties and limitations of the spectralmethod and current implementation are discussed.Attempts of the parallelization of the spectral code and the obtained efficiency are briefly described. Keywords:Navier-Stokes equations, spectral elementmethods, pressurepre- conditioners, fast diagonalization, parallel computations 1. Introduction In the first part of the paper, the problem of a nonstationary flow of a viscous incompressible fluid in a three dimensional pipe system has been for- mulated. The most remarkable element of this formulation is, following the 870 J.Szumbarski et al. terminology introduced byFormaggia et al. (2000), the enforcement of ”defec- tive” boundary conditions at inlet/outlet sections of a computational domain. Such conditions prescribe temporal variation of the flow rate (the volume flux) or area-averaged static pressure. It means that distributions of velocity and pressure fields across the inlet/outlet section are not a priori assumed – they are obtained during a solution process. Moreover, the flow rate inlet/outlet conditions (referred to as V F-type conditions) can coexist with the area- averaged pressure conditions (referred to as AP-type conditions), meaning that each of the inlet/outlet sections can be of either of these types (but not of both at the same time). While the boundary conditions of the AP-type can be incorporated directly in theweak formulation, the V F-type conditions are, in fact, additional constrains imposed on the problem, and they are en- forced using Lagrange’s multipliers. Ifmixed explicit-implicit time integration schemes are used, the algebraic problem to be solved during each time step becomes linear. It is convenient to seek for a solution to such a system in the form of a linear combination of particular solutions, which all but one can be computed in advance. This procedure allows also for simple evaluation of time-dependentLagrange’smultipliers, which are interpreted as instantaneous values of the area-averaged pressure at V F-type inlets/outlets. In this part of the paper, we discuss the most important elements of the numerical implementation based on the spectral elements and present sample results of numerical calculations. For a convenient reference, the summary of the numerical algorithm, formulated inPart I, is provided in Section 2.Abrief account of the spectral element discretization used in this study is presented in Section 3. For a detailed exposition of the spectral methods, the reader may refer to recent monographs (Deville et al., 2002; Karniadakis and Sherwin, 1999). Some issues, particularly important for the overall performance of theme- thod, also are addressed: the construction of the preconditioner of the pressure solver (Section 4), the fast diagonalizationmethod (Section 5) and the projec- tionmethod for solving a sequence of large linear problems (Section 6). Some longermathematical derivations and formulas have been shifted to Appendix. Sample numerical results are described in Section 7. The focus is on the validation of the algorithm in the case of mixed inlet/outlet conditions rather than on solving a particular (technically interested) problem. Therefore, most of the calculations are carried out for a simple T-shaped pipe system. Some preliminary results, obtained for a more complicated model of the Blallock- Taussig shunt, are also presented. Finally, we shortly discuss an important issue of the parallelization and present some preliminary results concerning the parallel efficiency of the spectral solver. Computations of an unsteady viscous flow... 871 2. Description of the numerical method In this section, the numerical method, described in Part I of the paper, is summarized for convenient reference. Let u (m) α (α =1,2,3) and π (m) denote vectors of a spectral representation of velocity components and pressure computed in the time step t(m). The following procedure, based on the OIFS scheme, yields the solution in the next time step t(m+1): Computation of the convection substep The following initial value problems (k = 1, ...,K, where K is the order of OIFS scheme) d dt (ûα)k =−Cα[(û1)k,(û2)k,(û3)k](ûα)k α =1,2,3 (2.1) (ûα)k(t = t (m−k+1))=u(m−k+1)α are integrated numerically using the 4th order Runge-Kutta explicit scheme. The integration is carried out with the uniform sub-step h = ∆t/MS (MS is typically not larger that 5) until the time t(m+1) = t(m) + ∆t is reached. The system of differential equations (2.1) is a space-discretized counterpart of equations (3.13) from Part I. As a result, one obtains vectors (ûα) (m+1) k = (ûα)k(t = t (m+1)), α =1,2,3, k =1, ...,K. Determination of the solution to the special Stokes problem Next, the following (unsteady) Stokes problem is solved   A 0 0 (D1) > 0 A 0 (D2) > 0 0 A (D3) > D1 D2 D3 0     u {0} 1 u {0} 2 u {0} 3 π{0}   (m+1) =   r1 r2 r3 0   (m+1) (2.2) where the right-hand side vectors are defined as follows r(m+1)α =− 1 ∆t K∑ k=1 βkMV (ûα) (m+1) k − (ΛPα) >P (m+1) α =1,2,3 (2.3) In the above, the symbol P (m+1) denotes the vector of the section-averaged pressureprescribed for all inlets/outlets of the AP-typeat the time t = t(m+1). 872 J.Szumbarski et al. The solution to linear system (2.2) is sought using the Schur-compliment method in an incremental version, which can be summarized as follows: • Solution of three (independent) linear systems Aũα = r (m+1) α − (Dα) > π {0}(m) α =1,2,3 (2.4) The Helmholtz matrix A is symmetric, positive definite and, typically, strongly diagonally dominant. Thus, the above systems can be efficien- tly solved with the Conjugate Gradients Method (CGM) with simple diagonal preconditioning. • Solution of the linear system for the pressure correction Sπ′ =D1ũ1+D2ũ2+D3ũ3 (2.5) where S=D1A −1(D1) >+D2A −1(D2) >+D3A −1(D3) > (2.6) The Uzawa matrix S is symmetric and positive definite. For an unste- ady Stokes problem, this matrix is very poorly conditioned. Hence, an effective preconditioning technique is compulsory. We will discuss this problem in details later. Note also that computing product, like S ·p, involves solving of three linear systemswith theHelmholtzmatrix A. In Part I, we shortly described themethod of approximation of thematrix A −1 based on the Neumann expansion. This method, however, is not used in the current implementation of the algorithm. In effect of this part of calculations, we obtain the total pressure of the Stokes solution π{0}(m+1) =π{0}(m) +π′ (2.7) • Calculation of the velocity components The new velocity field can be calculated as follows. First, the Helmholtz linear systems for velocity corrections Au ′ α =−(Dα) > π ′ α =1,2,3 (2.8) are solved by PCGM iterations with the diagonal preconditioner. After- wards, the final velocity components are obtained u {0}(m+1) α =u {0}(m) α +u ′ α α =1,2,3 (2.9) Computations of an unsteady viscous flow... 873 Finally, thevector of correspondingvolumefluxes throughall inlets/outlets of the VF type is calculated f {0}(m+1) VF = 3∑ α=1 ΛFαu {0}(m+1) α (2.10) Determination of the full solution at the time t = t(m+1) The full solution at t = t(m+1) is obtained in the form of the following linear combination    u (m+1) 1 u (m+1) 2 u (m+1) 3 π(m+1)    =    u {0}(m+1) 1 u {0}(m+1) 2 u {0}(m+1) 3 π{0}(m+1)    + NVF∑ k=1 λ (m+1) k    u {k} 1 u {k} 2 u {k} 3 π{k}    (2.11) Let us remind that the symbols u {k} α , α =1,2,3, k =1, ...,NVF , refer to the solutions to special Stokes problems   A 0 0 (D1) > 0 A 0 (D2) > 0 0 A (D3) > D1 D2 D3 0     u {k} 1 u {k} 2 u {k} 3 π{k}   =   −(ΛF1 ) >λ{k} −(ΛF2 ) >λ{k} −(ΛF3 ) >λ{k} 0   (2.12) where λ {k} j = { 0 if j 6= k 1 if j = k For a fixed geometry and time step, these solutions can be calculated once and forever. The vector of the Lagrange multipliers λ(m+1) is found from the linear system TFλ (m+1) =F (m+1)−f {0}(m+1) VF (2.13) Thematrix TF of system (2.13) is defined as follows TF =   f {1} 1 · · · f {NVF} 1 ... ... ... f {1} NVF · · · f {NVF} NVF   (2.14) where f {k} j = ( 3∑ α=1 ΛFαu {k} α ) j j =1, ...,NVF (2.15) 874 J.Szumbarski et al. Note that parts (2) and (3) of the above procedure provide a solution to the following linear system   A 0 0 (D1) > (ΛF1 ) > 0 A 0 (D2) > (ΛF3 ) > 0 0 A (D3) > (ΛF3 ) > D1 D2 D3 0 0 ΛF1 Λ F 2 Λ F 3 0 0     u1 u2 u3 π λ   (m+1) =   r1− (Λ P 1 ) >P r2− (Λ P 2 ) >P r3− (Λ P 3 ) >P 0 F   (m+1) (2.16) Typically, theabove systemcontains onlya fewequationsmore that the special Stokes systems described above. However, the matrix of (2.16) is not positive definite, and a different iterative technique fromPCGMhas to be applied. 3. Spectral element discretization 3.1. Physical and standard spectral elements In thepresentwork, a block-structured grid of hexahedral elements is used. In order to perform local operations, each element in the physical space is mapped to the standard spectral element: the cube [−1,1]3 (see Fig.1). Fig. 1. Standard cube [−1,1]3 and its transformation to a spectral element in the physical space Computations of an unsteady viscous flow... 875 The mapping is based only on vertex information and takes the following triple-linear form x(ξ1,ξ2,ξ3) = 1 8 (1− ξ1)(1− ξ2)(1− ξ3)x1+ 1 8 (1+ ξ1)(1− ξ2)(1− ξ3)x2+ + 1 8 (1+ ξ1)(1+ ξ2)(1− ξ3)x3+ 1 8 (1− ξ1)(1+ ξ2)(1− ξ3)x4+ (3.1) + 1 8 (1− ξ1)(1− ξ2)(1+ ξ3)x5+ 1 8 (1+ ξ1)(1− ξ2)(1+ ξ3)x6+ + 1 8 (1+ ξ1)(1+ ξ2)(1+ ξ3)x7+ 1 8 (1− ξ1)(1+ ξ2)(1+ ξ3)x8 More sophisticated mappings including detailed description of curvilinear in- terfaces of the element can also beused.An efficient algorithm for dealingwith suchmappingswas proposedbyGordon andHall (see, for instance, Chapter 4 of the work by Deville et al. (2002). Inorder to carryout local differentiation and integration, theJacobimatrix of transformation (3.1) should be calculated. The rows of this matrix can be expressed as follows ∂ ∂ξ1 x = 1 8 (1− ξ2)(1− ξ3)(x2−x1)+ 1 8 (1+ ξ2)(1−ξ3)(x3−x4)+ + 1 8 (1− ξ2)(1+ ξ3)(x6−x5)+ 1 8 (1+ ξ2)(1+ξ3)(x7−x8) ∂ ∂ξ2 x = 1 8 (1− ξ1)(1− ξ3)(x4−x1)+ 1 8 (1+ ξ1)(1−ξ3)(x3−x2)+ (3.2) + 1 8 (1− ξ1)(1+ ξ3)(x8−x5)+ 1 8 (1+ ξ1)(1+ξ3)(x7−x6) ∂ ∂ξ3 x = 1 8 (1− ξ1)(1− ξ2)(x5−x1)+ 1 8 (1+ ξ1)(1−ξ2)(x6−x2)+ + 1 8 (1− ξ1)(1+ ξ2)(x7−x3)+ 1 8 (1+ ξ1)(1+ξ2)(x8−x4) 3.2. Local basic functions The local representation of any scalar field defined in the element is based on theusage of theLagrange interpolation.The interpolation grids in the stan- dard cube [−1,1]3 are definedas triple tensor products of one-dimensional sets 876 J.Szumbarski et al. of nodes in the interval [−1,1]. The grid for velocity components is generated over the Gauss-Jacobi-Lobatto (GJL) nodes ξ (0) V =−1 ξ (MV−1) V =1 (3.3) {ξ (j) V , j =1, ...,MV −2} – roots of the Jacobi polynomial P 1,1 MV−2 The pressure field is interpolated using a grid generated using the Gauss- Legendre (GL) nodes {ξ (j) P , j =0, ...,MP −1} – roots of the Legendre polynomial PMP−1 (3.4) Note that the pressure grid contains no interface points. The reason forusing two separate (andstaggered) grids for thevelocity and pressure is to satisfy the div-stability (or inf-sup) condition (see Chapter 8 in Karnidakis andSherwin (1999) and further references therein).Wealso assume that MV = MP +2. Thus, the numbers of internal nodes of both grids are equal. Thecomponents of thevelocity field in the spectral element are represented by local interpolants, which are linear combinations of the following basic functions bVijk(ξ1,ξ2,ξ3)= L V i (ξ1)L V j (ξ2)L V k (ξ3) i,j,k =0, ...,MV −1 (3.5) The Lagrange polynomials LVi (ξ) (i = 0, ...,MV − 1) satisfy the conditions LVi (ξ (j) V )= δ j i (j =0, ...,MV −1). Analogously, the local basis for the pressure is given as bPijk(ξ1,ξ2,ξ3)= L P i (ξ1)L P j (ξ2)L P k (ξ3) i,j,k =0, ...,MP −1 (3.6) where the Lagrange polynomials LPi (ξ) satisfy the conditions L P i (ξ (j) P ) = δ j i (j =0, ...,MP −1). 3.3. Local differentiation and integration Consider a function f(ξ) defined in the interval [−1,1] and its interpolant f̂(ξ) on the GJL nodes, i.e. the following polynomial f̂(ξ)= MV−1∑ α=0 f(ξVα )L V α(ξ) (3.7) Computations of an unsteady viscous flow... 877 The collocation approximation of the derivative f ′(ξ) (called also pseudo- spectral derivative) is defined as follows: it is the GJL interpolant of the deri- vative of the interpolating polynomial (3.7). Thus, we have f ′(ξ)≈ MV−1∑ α=0 φαL V α(ξ)⇒ φα ≡ f̂ ′(ξVα )= MV−1∑ β=0 L ′V β (ξ V α )f(ξ V α ) (3.8) This way, the vector of nodal values of the derivative φ is directly related to the vector f containing the nodal values of the differentiated function f, namely φ=df (3.9) where the symbol d denotes the matrix of pseudo-spectral differentiation on the GJL grid (d)αβ = L ′V β (ξ V α ) α,β =0,MV −1 (3.10) Another one-dimensional operator used in the spectral element method is the interpolation from GJL grid to GL grid. Given a function f(ξ) by its GJL interpolant f̂(ξ), this operator yields the GL interpolant defined as f̃(ξ)= MP−1∑ α=1 ψαL P α(ξ) (3.11) where ψα ≡ f̂(ξ P α )= MV−1∑ β=0 LVβ (ξ P α )f(ξ V β ) (3.12) In terms of the vectors of nodal values, the above interpolation can bewritten in the form of ψ = tf (3.13) where t is arectangular matrix defined as follows (t)αβ = b V β (ξ P α ) α =0, ...,MP −1 β =0, ...,MV −1 (3.14) In the calculations involving the gradient and divergence operators, the com- ponents of the velocity field are differentiated and then interpolated to theGL grid. Therefore, it is convenient to introduce a composite operator. Its matrix representation is obtained as the product of thematrices definedby (3.10) and (3.14) z= td (z)αβ = MV−1∑ γ=0 (t)αγ(d)γβ (3.15) 878 J.Szumbarski et al. All volume integrals are computed element-by-element using local transfor- mations (3.1) to the standard cube [−1,1]3. The integrals in the cube are computed using Gauss-Jacobi-Lobatto or Gauss-Legendre integration formu- las (see Appendix B in Karniadakis and Sherwin, 1999). 4. Preconditioning of the pressure solver During each time step, algebraic Stokes problem(2.2) has tobe solved.The solution is calculated in three major steps explained in Section 2. The most difficult element of the described procedure is the evaluation of the pressure correction. It consists in solving of linear system(2.5)withUzawamatrix (2.6). The solution is computed approximately using the iterativemethod of precon- ditioned conjugate gradients (PCGM). Since system (2.5) is ill conditioned (Maday et al., 1993), the problem of selection of an appropriate preconditio- ner becomes crucial. According to Maday et al. (1993), Cahouet and Chabard (1986), conver- gence of the CGM for linear system (2.5) can be efficiently accelerated by means of the following preconditioner R −1 = νM−1 P + β0 ∆t E −1 (4.1) where E denotes the pseudo-Laplace matrix E=D1M −1 V D > 1 +D2M −1 V D > 2 +D3M −1 V D > 3 (4.2) and MP is the pressure mass matrix. It should be noted that, in the nodal variant of the spectral elementmethod, allmassmatrices arediagonal and thus trivial to invert. The reason for this convenient structure stems from the fact that the volume integrals of products of the basic functions are computedwith theGauss orGauss-Lobatto formulas based on the same localmeshes that are used for the pressure or velocity interpolation. Thus, the basic functions form (discretely) orthogonal systems. Themajor difficulty, however, presents efficient calculation of the matrix- vector product with the matrix E−1. In each iteration of the PCGM, one has to solve a linear system with the matrix E, which can be formally written as Eq = r (4.3) Computations of an unsteady viscous flow... 879 The solution shouldbedeterminedwith reasonable accuracy, perhapsnot very high, but sufficient for the preconditioner to maintain a satisfactory conver- gence rate. Thematrix E is symmetric and positive definite, thus themethod of conjugate gradients is a natural choice. The point is that the matrix E is itself ill-conditioned and thus the convergence of the internal iterations will be very slow, unless another lower-level preconditioner is used. The complete preconditioning strategy used in this study is based on the approach proposed by Ronquist (1991) and further developed by Couzy and Deville (1994). In the latter work, several variants of preconditioning of the Uzawamatrix were discussed. Numerical tests showed that the two-stage pre- conditioners, referred to as P5 and P7, are the most effective. In this study, the implementation based on the variant P5, described below, proved to be superior. Assume we need to solve linear system (4.3) where dimE = N. Assume next that a K-dimensional subspace of RN is defined. In the context of the Stokes problem, K is equal to the number of the spectral elements in the flow domain, and the subspace will correspond to the pressure which is constant inside each individual element, i.e. piecewise constant in the computational domain. Assume that K orthonormal vectors have been chosen in the subspace. Using these vectors as columns, one can construct a matrix J with N rows and K columns. Clearly, the matrix J is orthogonal, i.e. J>J = I. Now, the K-dimensional subspace can be formally defined formally as Π = {π ∈ RN : π = Ju, u ∈RK} (4.4) We will seek a solution to (4.3) in the following form q = Jq0+q1 (4.5) where q0 is a vector from R K. Thus, the vector q is expressed as a sumof two parts. The first part corresponds to the ”background” distribution (uniform in each element), while the second one describes local variations with respect to the locally uniform ”background”. The above representation, (4.5), is clearly not unique.We will find a par- ticular pair {q0,q1} in the following way. First, we define the operator PΠ⊥ = I−EJE −1 0 J > (4.6) where E0 = J > EJ (4.7) 880 J.Szumbarski et al. Note that PΠ⊥ is a projection operator, as the following calculation proves PΠ⊥PΠ⊥ =(I−EJE −1 0 J >)2 = (4.8) = I−2EJE−10 J >+EJE−10 J > EJE −1 0 J > = I−EJE−10 J=PΠ⊥ The operator PΠ⊥ projects all vectors onto a subspace orthogonal to Π, denoted here as Π⊥. Indeed, it is easy to verify that J>PΠ⊥w = 0. On the other hand, this projection is not orthogonal. The following equality, valid for each vector π ∈ Π PΠ⊥Eπ ≡PΠ⊥EJu=0 (4.9) shows that the operator PΠ⊥ is the projection onto Π ⊥ along the subspace E(Π). Since PΠ⊥EJq0 = 0, the application of the projection operator to the equation EJq0+Eq1 = r (4.10) yields PΠ⊥Eq1 =PΠ⊥r or, equivalently (I−EJE−10 J >)Eq1 =(I−EJE −1 0 J >)r (4.11) The matrix of the linear system E1 = (I−EJE −1 0 J >)E is singular. Neverthe- less, the system is uniquely solvable in the subspace Π⊥. The solution can be computed approximately using the Conjugate Gradient Method with an ap- propriate preconditioner described later. Once the vector q1 ∈ Π ⊥ is found, the K-dimensional vector q0 is evaluated as a solution to the following linear system E0q0 = J >(r−Eq1) (4.12) In the context of the spectral element method, linear system (4.12) has as many unknowns as the number of elements in the computational domain. In most situations, thenumberof the elements doesnot exceed several thousands, and system (4.12) can be solved by an exact solver. Since the matrix E0 is symmetric and positive definite, the natural method is the Choleski Decom- position. For a fixed grid geometry, this decomposition can be evaluated once and forever and the forward/backward substitution part of the solution can be implemented efficiently using one of the sparse storage schemes (see Saad, 2003). It should be noted that the solution to system (4.11) in the subspace Π⊥ corresponds to the pressure distribution orthogonal to all piecewise-constant Computations of an unsteady viscous flow... 881 ”background” distributions (generated by vectors from Π). The main ad- vantage of the orthogonal decomposition is that linear system (4.11) can be efficiently preconditioned by the block matrix {E}. 5. Block-diagonal preconditioning of inner iterations using the Fast Diagonalization Method In this Section, we will briefly describe the preconditioning of conjugate gradients for linear system (4.11), based on the usage of the blockmatrix {E} (Couzy andDeville, 1994). Consider a preconditioning step in PCGM iterations (see Appendix A). In order to find the vector r̃ out of the current residuum r, we follow the following procedure: (1) The global residual vector r is disassembled into local vectors rlock , k =1, ...,K corresponding to individual spectral elements. (2) Local linear systems Elock r̃ loc k = r loc k , k =1, ...,K are solved independen- tly for each spectral element. Thematrices of these systems E loc k = 3∑ j=1 (Dj) loc k (M −1 V )lock (D > j ) loc k (5.1) are constructed using local velocity and pressure basic functions. The dimension of the local systems is equal to M3P . The dimension of the velocity local base depends on the conditions imposed at interfaces of the element. For internal elements, all components of thematrix Elock are constructed using the full velocity base of M3V functions. If some collo- cation nodes are located on the material surface of the computational domain, the local velocity base is accordingly reduced. (3) The resulting local vectors r̃lock , k = 1, ...,K are assembled into the global vector r̃. The local systems canbe solved inmanyways.Note that their dimension is small (typically they contain from several tens to several hundreds equations), so it is natural to use exact solvers.One can simply construct all localmatrices explicitly and compute local Choleski factorizations. The numerical cost of such a preconditioner will be roughly as K · M6P . With a greater number 882 J.Szumbarski et al. of spectral elements with different shapes, the memory required to store all decomposed local matrices becomes large. Couzy andDeville (1994) proposed an alternative approach based on the Fast Diagonalization Method (FDM), see Appendix B. In order to use the FDM one has to construct the local matrix as a sum of appropriate tensor products of matrices corresponding to one-dimensional operations.Consider the standardelement [−1,1]3.Note thatwe canuse index notations within each element: a ”natural” triple-index notation and a single- index numeration. The latter is usually more convenient because it leads to a simpler formalism. The local velocity and pressure basic functions in the standard element, introduced in Section 2, can be conveniently numerated as follows BVi (ξ1,ξ2,ξ3)= b V I1(i) (ξ1)b V I2(i) (ξ2)b V I3(i) (ξ3) (5.2) BPj (ξ1,ξ2,ξ3)= b P J1(j) (ξ1)b P J2(j) (ξ2)b P J3(j) (ξ3) In the above, the integer functions Iα(i) and Jα(j) (α = 1,2,3) define the mappings between triple-index and single-index numeration of the basic func- tions for the velocity and pressure, respectively. Wewill now consider systematic calculations of the blockmatrix {E}. It is sufficient to consider an arbitrary element in the physical space. However, in the case of general hexahedral geometry, the volume integralswill not factorize into three one-dimensional ones, and direct application of the FDM is not possible. Thus, let us assume that all physical elements are parallelepipeds. In order to compute matrix (5.1), the following local matrices have to be computed (D>α)ij =− ∫ ΩK ∂vi ∂Xα qj dx =− ∫ [−1,1]3 ( 3∑ β=1 ∂BVi ∂ξβ ∂ξβ ∂Xα ) BPj ∣∣∣ ∂(x) ∂(ξ) ∣∣∣ dξ (5.3) The elements of the Jacobi matrix and its inverse in a parallelepiped are the same at each point of the integration domain. Introducing edge vectors x2−x1 =x3−x4 =x6−x5 =x7−x8 ≡k1 x4−x1 =x3−x2 =x8−x5 =x7−x6 ≡k2 (5.4) x5−x1 =x6−x2 =x7−x3 =x8−x4 ≡k3 it is easy to show that ∂x ∂ξ = 1 2 [k1,k2,k3] ∣∣∣ ∂x ∂ξ ∣∣∣= 1 8 |k1 · (k2×k3)| ≡ 1 8 volΩ (5.5) Computations of an unsteady viscous flow... 883 and ∂ξ ∂x = 2 volΩ    (k2×k3) > (k3×k1) > (k1×k2) >    (5.6) Now, the volume integral can be expressed in terms of one-dimensional inte- grals in [−1,1] ∫ Ω ∂vi ∂xα qj dx = = ∣∣∣ ∂x ∂ξ ∣∣∣ [∂ξ ∂x ] 1,α 1∫ −1 b ′V I1(i) bPJ1(j) dξ1 1∫ −1 bVI2(i)b P J2(j) dξ2 1∫ −1 bVI3(i)b P J3(j) dξ3+ (5.7) + ∣∣∣ ∂x ∂ξ ∣∣∣ [∂ξ ∂x ] 2,α 1∫ −1 bVI1(i)b P J1(j) dξ1 1∫ −1 b ′V I2(i) bPJ2(j) dξ2 1∫ −1 bVI3(i)b P J3(j) dξ3+ + ∣∣∣ ∂x ∂ξ ∣∣∣ [∂ξ ∂x ] 3,α 1∫ −1 bVI1(i)b P J1(j) dξ1 1∫ −1 bVI2(i)b P J2(j) dξ2 1∫ −1 b ′V I3(i) bPJ3(j) dξ3 All integrals in (5.7) are calculated approximately using theGauss quadrature based on the pressure nodes. To this end, the velocity basic functions and their derivatives are interpolated from the Gauss-Lobatto nodes to the Gauss nodes. Using one-dimensional operations defined in Section 2, one can write formulas 1∫ −1 b ′V r b P s dξ GL ≈ wPs b ′V r (ξ P s )= w P s (z)s,r (5.8) 1∫ −1 bVr b P s dξ GL ≈ wPs b V r (ξ P s )= w P s (l)s,r In order to use the FDM solver, one has to switch back to the triple-index tensor notation. The final formulas are D̃α =−(w P ⊗wP ⊗wP)(C1,αz⊗ l⊗ l+C2,αl⊗z⊗ l+C3,αl⊗ l⊗z) (5.9) D̃ > α =−(C1,αz >⊗ l>⊗ l>+C2,αl >⊗z>⊗ l>+C3,αl >⊗ l>⊗z>) · ·(wP ⊗wP ⊗wP) 884 J.Szumbarski et al. where the matrix C is defined as C= ∣∣∣ ∂x ∂ξ ∣∣∣ ∂ξ ∂x (5.10) The local velocity mass matrix and its inverse can be written as follows M̃V = ∣∣∣ ∂x ∂ξ ∣∣∣(wV ⊗wV ⊗wV ) (5.11) M̃ −1 V = ∣∣∣ ∂x ∂ξ ∣∣∣ −1 (ŵV ⊗ ŵV ⊗ ŵV ) In the above, the structure wV can be viewed as a diagonalmatrix containing the weights of the Gauss-Lobatto quadrature, and ŵV ≡ (wV )−1. The final form of the local Ẽmatrix reads Ẽ≡ 3∑ α=1 D̃αM̃ −1 V D̃ > α = ∣∣∣ ∂x ∂ξ ∣∣∣(wP ⊗wP ⊗wP) · · { (CC>)1,1(z⊗ l⊗ l)(ŵ V ⊗ ŵV ⊗ ŵV )(z>⊗ l>⊗ l>)+ +(CC>)2,2(l⊗z⊗ l)(ŵ V ⊗ ŵV ⊗ ŵV )(l>⊗z>⊗ l>)+ +(CC>)3,3(l⊗ l⊗z)(ŵ V ⊗ ŵV ⊗ ŵV )(l>⊗ l>⊗z>)+ +(CC>)1,2 [ (z⊗ l⊗ l)(ŵV ⊗ ŵV ⊗ ŵV )(l>⊗z>⊗ l>)+ (5.12) +(l⊗z⊗ l)(ŵV ⊗ ŵV ⊗ ŵV )(z>⊗ l>⊗ l>) ] + +(CC>)1,3 [ (z⊗ l⊗ l)(ŵV ⊗ ŵV ⊗ ŵV )(l>⊗ l>⊗z>)+ +(l⊗ l⊗z)(ŵV ⊗ ŵV ⊗ ŵV )(z>⊗ l>⊗ l>) ] + +(CC>)2,3 [ (l⊗z⊗ l)(ŵV ⊗ ŵV ⊗ ŵV )(l>⊗ l>⊗z>)+ +(l⊗ l⊗z)(ŵV ⊗ ŵV ⊗ ŵV )(l>⊗z>⊗ l>) ]} (wP ⊗wP ⊗wP) Unfortunately, formula (5.12) still cannot be cast into a form amenable to the FDM solver, and an additional restriction on the shape of an element is necessary. To this end, we will assume that the element is rectangular. Then, the inverse Jacobi matrix is expressed as follows Computations of an unsteady viscous flow... 885 ∂ξ ∂x = 2 k1k2k3    k2k3 k1 k > 1 k1k3 k2 k>2 k1k2 k3 k > 3    =2    1 k21 k>1 1 k22 k>2 1 k23 k > 3    (5.13) ∣∣∣ ∂x ∂ξ ∣∣∣ −1 CC > = ∣∣∣ ∂x ∂ξ ∣∣∣ ∂ξ ∂x ∂ξ> ∂x =   k2k3 2k1 0 0 0 k1k3 2k2 0 0 0 k1k2 2k3   Eventually, the local matrix Ẽ takes the form of Ẽ=(wP ⊗wP ⊗wP) [k2k3 2k1 (z⊗ l⊗ l)(ŵV ⊗ ŵV ⊗ ŵV )(z>⊗ l>⊗ l>)+ + k1k3 2k2 (l⊗z⊗ l)(ŵV ⊗ ŵV ⊗ ŵV )(l>⊗z>⊗ l>)+ (5.14) + k1k2 2k3 (l⊗ l⊗z)(ŵV ⊗ ŵV ⊗ ŵV )(l>⊗ l>⊗z>) ] (wP ⊗wP ⊗wP) Consider now the local linear system Ẽỹ = g̃ (5.15) where thematrix Ẽ is given by (5.14). After some algebra, this system can be transformed into an equivalent form [ (k21lŵ V l>)−1zŵVz>⊗ I⊗ I+ I⊗ (k22lŵ V l>)−1zŵVz>⊗ I+ +I⊗ I⊗ (k23lŵ V l>)−1zŵVz> ] (wP ⊗wP ⊗wP)ỹ = (5.16) = 2 k1k2k3 (wPlŵV l>)⊗ (wPlŵV l>)⊗ (wPlŵV l>)g̃ which can be viewed as a special case of form (B.1) fromAppendix B with Lα =(k 2 αlŵ V l >)−1zŵVz> α =1,2,3 (5.17) y =(wP ⊗wP ⊗wP)ỹ and r = 2 k1k2k3 (wPlŵV l>)⊗ (wPlŵV l>)⊗ (wPlŵV l>)g̃ (5.18) 886 J.Szumbarski et al. Note that all operators Lα can be obtained as Lα =L0/k 2 α, where L0 =(lŵ V l >)−1zŵVz> (5.19) In order to implement the FDM solver, all eigenvalues and eigenvectors of the matrix L0 shouldbe found.The eigenvalue problem for L0 canbe transformed to a generalized eigenvalue problem as follows [(lŵV l>)−1zŵVz>]υ = λυ ⇒ (zŵVz>)υ = λ(lŵV l>)υ (5.20) and easily solved using standard procedures from the LAPACK library. Note also that, in the spectral elementmethod, the dimension of the above eigenva- lue problemwill be very small, typically not larger that 10. 6. A projection method for solving a sequence of linear systems with a fixed matrix The most time-consuming part of the calculations during each time step is the determination of new pressure fields. In order to obtain convergence in a reasonable number of iterations, some nontrivial preconditioning techniques are compulsory. Additionally, the multiplication by the pressure matrix S requires a solution to the internal inversion of the Helmholtz matrix. The approximation of the matrix H−1 discussed in Part I can be applied to avoid costly inversion, however, at the price of inaccuracy in the mass balance. As long as the grid geometry and the time step are fixed, the problem of the pressure evaluation in subsequent time instants consists actually in solving a sequence of algebraic problems with the same matrix and different right-hand sides. Due to the continuity of the flow evolution, the difference between the right-hand side vectors corresponding to subsequent time steps is expected to be small, so the solution in a given time instant may be used as a sensibly accurate initial approximation for iterations in the next time step. One can probably do even better by extrapolating this initial approximation from the recent history of the solution. Such an extrapolation will typically involve several past time steps. Fisher (1998) proposed amuchmore sophisticated and efficient approach. It is shortly presented in this paper, however in a slightlymodified form,which avoids matrix-vector multiplication appearing in the original formulation. Consider a sequence of linear systems Ax(k) = b(k) k =1,2, ... (6.1) Computations of an unsteady viscous flow... 887 Thematrix of all systems is the same, and it is assumedSPD, large and sparse. Themethod starts with solving the first system Ax(1) = b(1) (6.2) At this point,we assume that no”special” initial approximations are available. After the solution is obtained, it is normalized as follows e1 = x(1) ‖x(1)‖A (6.3) The norm induced by the SDPmatrix A is defined as usual as ‖x(1)‖A = √ 〈x,x〉A ≡ √ 〈x,Ax〉 (6.4) Assume now that we want to solve the k-th system in sequence (6.1) (k > 1), and that we have, previously generated, a set of A-normalized and A-orthogonal vectors e1, ...,ek−1. Then we construct the vector x̃ as follows x̃= k−1∑ j=1 αjej (6.5) The coefficients of linear combination (6.5) are chosen in such a way that the orthogonality conditions hold 〈ej,b (k)−Ax̃〉A =0 which implies that αj = 〈ej,b (k)〉 (6.6) The new vector is now computed as b̃=Ax̃ (6.7) and the sought solution is expressed as a sum x (k) = x̃+x′ (6.8) The correction vector x′ will be found from the following linear system Ax′ = b(k)− b̃ ≡ b′ (6.9) 888 J.Szumbarski et al. and the iterations of the PCGMstart from the zero vector. Note that the resi- dual vectors of original system (6.1) and new system (6.9) (explicitly available in PCGM iterations) are the same r ≡ b(k)−Ax(k) = b′−Ax′ ≡ r′ (6.10) and the following stopping criterion is convenient ‖r‖2 ‖b(k)‖2 < ε (6.11) In order to continue the calculation, we have to create the next vector ek. Theoretically, it would be sufficient to A-normalize the vector x′ and obtain ek = x′ ‖x′‖A (6.12) Indeed, it is easy to show that if x′ were the exact solution to (6.9), then we would have x′⊥span{e1, ...,ek−1}. However, the vector x ′ is only an approxi- mate solution, so the explicit orthogonalization is recommended. To this end, we write x ′ ⊥ =x ′− k−1∑ j=1 βjej (6.13) where βj = 〈ej,x ′〉A =−〈ej,r ′〉 (6.14) The A-norm of the orthogonal projection x′⊥ should be calculated ‖x′⊥‖ 2 A = ‖x ′‖2A− k−1∑ j=1 β2j (6.15) It can be shown that the A-norm of the vector x′ can be computed without additional matrix-vector multiplication, namely ‖x′‖2A ≡〈x ′,Ax′〉= 〈x(k),b−r′〉− k−1∑ j=1 (αj +βj) 2+ k−1∑ j=1 β2j (6.16) Thus, the new vector ek is obtained as ek = x′⊥ ‖x′⊥‖ (6.17) Computations of an unsteady viscous flow... 889 where ‖x′⊥‖A ≡ √ 〈x′,Ax′〉= √√√√√〈x(k),b−r′〉− k−1∑ j=1 (αj +βj) 2 (6.18) If the number of the linear systems in sequence (6.1) is large, it is practical to interrupt the process of generation of the A-orthogonal basis having N systems completed, and then restart the process by setting e1 = x(N) ‖x(N)‖A Thenumber N should be large enough to ensure fast convergence of the itera- tive solver of system (6.9). On the other hand, the value of N may be limited by the memory available for the additional storage of the basic vectors. Fi- sher (1998) presented test calculations using various values of N, the largest being equal to 80. The obtained results show that the number of iterations necessary to solve each system (6.9) saturateswith increasing N. If suchbeha- vior is generic, it may a reasonable decision to use the lowest N ensuring the ”saturated” iteration. However, the authors’ experience suggests that using even larger values of N is advantageous. Indeed, anytime the sequence of the basic vector is restarted a ”transient state” is observed, i.e., during next cal- culations the number of iterations is much larger than that in the ”saturated state”. Thus, the larger N is used the better overall performance is obtained, since ”transient states” occur more seldom. 7. Numerical simulations and parallel efficiency Thetest calculations havebeencarriedout forflows in the T-shapeddoma- in presented in Fig.2. The hexahedral grids used in the calculations contained 1648 elements (grid A) or 3064 elements (grid B). Densities of the veloci- ty/pressure meshes of the collocation points in each element have been set to (MV ,MP) = (5,3) or (MV ,MP) = (6,4). However, in the case of the larger grid B, only the smaller density (5,3) of the collocation mesh has been used in order to avoid excessive computational times (all tests have been run on a PC computer with a single Pentium 4 processor) The dimension of the computational problem depends on the number of spectral elements and the density of the collocation mesh: 890 J.Szumbarski et al. Fig. 2. Computational domain for test calculations (the large grid with 3064 spectral elements is shown) • grid A with (5,3) collocation mesh: 3 · 99854 and 44496 unknowns for the velocity and pressure fields, respectively • grid A with (6,4) collocation mesh: 3 ·197177 and 105472 unknowns for the velocity and pressure fields, respectively • grid B with (5,3) collocation mesh: 3 ·189230 and 83268 unknowns for the velocity and pressure fields, respectively. The goal of the first part of test calculations was to compare results ob- tained for different discretization parameters. The sample flow was driven by an oscillating static pressure applied symmetrically to inlet sections 1 and 2. The pressure at outlet Section 3was fixed in time and equal to zero. The tem- poral variation of the static pressure at both inlets was given by the following formulae P1,2(t)= 1500sin 2(2πt) (7.1) The kinematic viscosity ν was set to 0.1 and the time step to ∆t = 0.0002. In order to get an idea how these parameters refer to any physical flow, it was assumed that the length was measured in centimeters. The spatial extent of the computational domain was several centimeters and the diameters of the pipes were about 0.5cm. Then it was assumed that the density of the fluid was that of water (1000kg/m3), which made the corresponding amplitude of the static pressure to be 150Pa and the viscosity 10−5m2/s2. The latter value is approximately ten times larger than the viscosity of water, and two and a half times larger than the blood viscosity in large- andmedium-sized arteries. The results of the calculations include time histories of volume fluxes thro- ugh the inlet and outlet sections and instantaneous patterns of the static pres- sure andvelocitymagnitude.The time variation of the volume fluxes obtained for different discretization parameters are presented inFig.3.Thedriving inlet Computations of an unsteady viscous flow... 891 pressure is also plotted to show the phase shift between the forcing and the fluid response. The instantaneous pressure distribution in the symmetry plane of the domain, calculated at the instant of themaximumflow rate is presented in Fig.4a, while the corresponding field of the velocity magnitude is shown in Fig.4b. It is clearly seen thatmost of the hydraulic pressure loss is located (as expected) in the area of pipe connection. Fig. 3. The volume flux of the test flow plotted as a function of time, calculated for different grids and collocationmeshes: grid A with (5,3) mesh (solid line), grid A with (6,4) mesh (thicker dashed line) and grid B with (5,3) mesh (thinner dashed line). The thick grey line corresponds to reference result obtained with FIDAP. The dashed-dotted line depicts the time dependence of the prescribed inlet pressure (formulae (7.1) in the text) In order to verify the obtained results, analogous tests have been done at the Foundation of Cardiac Surgery Development with the use of the finite- element FIDAP package. The calculations have been carried out on a hexahe- dral grid containing about 38 thousands of elements. The boundary conditions at all inlet/outlets have been formulated in terms of the static pressure: it has been assumed transversally uniform at each section, with the temporal varia- tion as describedabove.Additionally, the tangential component of the velocity fieldwas assumedtobe zero identically at all inlet/outlet sections. Such setting of the inlet/outlet data seems to bemost similar (although not equivalent) to the integral boundary conditions implemented in the spectral solver. Indeed, the results obtained with FIDAP and the spectral solver were in very good agreement (see Fig.3). In certain aspects, the spectral solver is superior, espe- cially as far as the accuracy of the mass (or volume) balance of the fluid is concerned. Indeed, the relative error of the mass balance in the spectral so- 892 J.Szumbarski et al. Fig. 4. Instantaneous patterns of the static pressure (a) and the magnitude of the velocity field (b) calculated at the symmetry plane of the test domain, at the time of the maximal flow rate lver remains of the order of 10−12, while in the FIDAP calculations it was practically difficult to achieve the accuracy level of 1%. Some improvement of FIDAP accuracy would be certainly achieved with further reduction of con- vergence tolerances, but at the expense of excessive CPU time. Surprisingly enough, the overall performance of the FIDAP and spectral solver turned out to be comparable. Fig. 5. Simple geometric model of the modified Blallock-Taussig (mBT) shunt, coveredwith the grid containing 1672 spectral elements Thenumerical simulation of a pulsatile flow in amore complicated flowdo- main depicted in Fig.5 has also been attempted. The presented configuration of the ducts can be viewed as a geometric model of the (modified) Blallock- Computations of an unsteady viscous flow... 893 Taussig shunt. The calculations have been performed using a grid with 1672 spectral elements and (5,3) inner collocation meshes (3 · 101231 and 45144 unknowns for the velocity and pressure, respectively). The AP-type condi- tions have been set for all inlet/outlet sections. The section-averaged static pressure applied at sections 1 and 2 was zero, while at sections 3 and 4 it changed in time similarly to real physiological pressure pulses at 120 heartbe- ats per minute (Fig.6). The fluidmotion has been accelerated from rest. The remarkable observation is that, initially both sections 3 and 4 are the inlets, while at later time the flow direction at section 4 reverses and this section remains the flow outlet for the rest of the simulation. Instantaneous patterns of velocity and pressure fields in the symmetry plane, calculated at during the maximum flow rate, are presented in Fig.7. Fig. 6. Temporal variation of the volume fluxes calculated for the computational domain in Fig.5 using AP-type conditions at all inlet/outlet sections. The lines V3 and P3 show, respectively, the volume flux and static pressure at the inlet section 3. The lines V4 and P4 represent the same data for the outlet section 4. The lines V1 and V2 show the temporal variations of the volume fluxes at the outlet sections 1 and 2, respectively. The pressure applied at these outlets was fixed in time and equal to zero Finally, the numerical simulations of a pulsatile flow with the V F-type inlet/outlet conditions have been performed. The results shown in the paper have been obtained for the geometric model of the mBT shunt divided into 3760 spectral elements with (5,3) internal mesh of collocation points. The al- gebraic problems solved at each time step contained 3 · 230303 and 101520 unknowns for the velocity and pressure fields, respectively. The V F-type con- ditions were formulated as follows: • section 3 was the V F-type inlet. The time variation of volumetric flow rate is depicted in the top part of Fig.8 894 J.Szumbarski et al. Fig. 7. Pulsatile flow in the mBT shunt calculated using AP-type conditions at all inlet/outlet sections: instantaneous patterns of themagnitude of the velocity (a) and the static pressure (b) calculated in the symmetry plane, at the instant of the maximal flow rate • sections 1 and 4were the V F-type outlets. It was assumed that the flow distribution between all outlets was fixed in time: 25% of the incoming volume flux was leaving the computational domain through section 4, while remaining 75%were divided equally between outlets 1 and 2. The latter one was of the AP-type (with the average static pressure set to zero) in order to make the pressure problem uniquely solvable. Asdescribed earlier in the text, theLagrangemultipliers calculated at each time step along with the flow field correspond to instantaneous values of the section-averaged static pressure at all V F-type inlets/outlets. The obtained results are shown in Fig.8 (bottom part). The reader may note that, initially, the large pressure difference between sections 3 and 4 is necessary in order to avoid the transient reversed flow through section 4, observed in the simulation with pure AP-type conditions described above. Since outlets 1 and 2 are located nearly symmetrically with respect to the shunt’s downstream end, the corresponding pressure difference is very small. For presentation purposes, the pressure at outlet 1 shown in Fig.8 has been multiplied by a factor of −100 (it is actually slightly smaller than the ”reference” pressure at outlet 2, equal to zero). Figure 9 shows patterns of the velocitymagnitude andpressure fields calculated at the time when the maximum flow rate is attained (t =0.6). Oneof themajorproblemswithnumerical simulations of three-dimensional flows is computational efficiency. The spectral element discretization in space is an efficient method in the sense that a sufficient spatial resolution can be achievedwith a relatively small number of unknowns.On the other hand, the- re is an ”overhead” caused bymore complicated forms of data structures and Computations of an unsteady viscous flow... 895 Fig. 8. Time histories of the volume fluxes and the averaged static pressures at the inlet/outlet sections of the geometric model of the mBT shunt, obtained in the simulation with V F-type inlet/outlet conditions. The corresponding fluid density is ρ =103kg/m3 and the kinematic viscosity is ν =10−5m2/s2 Fig. 9. Pulsatile flow in the geometric model of mBT shunt using V F-type inlet/outlet conditions: instantaneous patterns of the velocitymagnitude (a) and the pressure (b) fields in the symmetry plane, calculated at the instant of the maximal flow rate operations performed on the elementary level, in comparison with ”ordinary” FEM.Consequently, the algebraic problemsappearingduring the solutionpro- cess are smaller but less sparse andmore difficult for efficient preconditioning. In the current study, most of the numerical simulations described above have been carried out on a Pentium 4 2.6GHz personal computer. The com- putational efficiency can be evaluated using a CPU time required to perform one time step of numerical simulation. Clearly, it depends on the number of spectral elements, the dimension of the local collocation mesh and the length of the time step. The most time consuming part of each step is the pressure solver, thus it is essential that it works as quickly as possible. This is achie- 896 J.Szumbarski et al. ved by using the two-level preconditioning procedure (Sections 4 and 5) and ”smart” selection of the initial approximation based on the projectionmethod (Section 6) originally suggested by Paul Fisher. The pressure preconditioner has been tested on the smaller grid used in the mBT flow simulation (1672 spectral elements, (5,3) internal collocation meshes), at the occasion of deter- mination of particular Stokes solutions (2.12). It has been established that the efficiency of the preconditioning depends on the number of iterations admitted for the internal FDM-based preconditioner (Section 5). Fig. 10. Convergence history of the pressure preconditioner P5 working with different numbers of iterations of the FDM-based low-level preconditioner The convergence histories for different numbers of internal iterations are shown in Fig.10. The use of 15-18 iterations gives effective reductions of CPU times by the factor of 10. Spectacular effects on the computational efficiency arebroughtalso by theFisher’s ”trick” (Section6).Wehave found it extremely advantageous to use long (even up to 100) sequences of previous pressure states to predict the optimal initial approximation. Only in a few initial steps of each sequence, the pressure solver requires a larger number of iterations to converge. Typically, 8-15 iterations are necessary for the first few steps after the re-start, and after that the number of steps drops immediately. In most of the later steps during each sequence, the pressure solver makes only one or three iterations. This typical behaviour is shown in the Fig.11. A thick, almost horizontal line corresponds to linear regression, giving an idea about theaverage numerical cost perone time step.For the case of theT-shape region with 1648 spectral elements and (5,3)-collocation mesh, the average time for one step is around 16 seconds. It also grows almost linearly with the number of spectral elements and the number of local collocation nodes. Convergence Computations of an unsteady viscous flow... 897 of the pressure solver has found to be slower if the dimension of the local collocation mesh was larger. Consequently, the method performs better with a larger number of spectral elements having smaller collocation meshes than in the case of a smaller number of spectral elements having larger collocation meshes. Fig. 11. The figure shows the effect of the Fisher’s projectionmethod for predicting an initial approximation for the PCG iterations in the pressure solver. The black line shows the history of the CPU’s time of consecutive time steps corresponding to the simulation of flow in the test domain (grid A and (5,3) collocationmesh), carried out on Pentium 4 2.6GHz workstation. The peeks occur when the pressure base is restarted (every 100 steps), then the ”saturated state” is achieved and maintained for the rest of each cycle. The result is significant reduction of the average CPU time per one time step. The thick grey line presents the linear regression of the black-line relation Another issue is the choice of the time step. If the time step is small, the convergence of the iterative process for the Helmholtz problem is faster since the matrix A is more diagonally dominant. Consequently, also the pressure solver runsmore efficiently.Ontheotherhand, theoverall numberof time steps needed to cover a certain time interval is larger. Certainly, some restriction on the admissible time step comes from numerical stability since the time- integration method is based on a mixed explicit-implicit approach and, as such, it is only conditionally stable. Typically, the stability limits are due to an explicit treatment of the convection part of the Navier-Stokes equations. However, during the numerical test we encountered unstable behaviour that 898 J.Szumbarski et al. may be rather related to the ”integral” nature of the inlet/outlet conditions used in the study. The characteristic pattern of development of the numerical instability is presented in Fig.12. Initially, small regions of rapidly growing velocity appear at one or more inlet sections. Later, these regions penetrate the flowdomain,merge and lead to a ”blow-up” of the numerical solution. It is interesting that the rapid increase of the velocity magnitude is drivenmostly by its transversal components. The mechanism of this instability has not yet been resolved, and further computational tests are necessary. Fig. 12. Characteristic pattern of the initial stage of numerical instability encountered in computations with larger flow rates: small regions of large local velocity appear at one ormore inlet/outlet sections In view of computational complexity of the solver, the parallelization of the code, using theMPImessage passing library (version 1.2 of the standard) has been attempted. The parallelization had to be performed in two different ways, due to significant structural differences between various part of the se- quential code. During each time step, most of the calculations are organized in the loops running over spatial dimensions and over all spectral elements. In some parts of the code, the spatial-dimensions order is superior meaning that the loops over the elements run as lower-level calculations. Such parts of the code consume about 40% of the computational time. In the remaining 60%, the hierarchy of the computations is reversed: the loops running over the elements are external, and lower-level operations are now spatial-dimensions oriented. Another difficulty is imposed by the data model used in the sequen- Computations of an unsteady viscous flow... 899 tial solver which requires disassembling and assembling of local data from and to global data structures several times in each time step.Moreover, the global structures are used for permanent storage of the velocity and pressure repre- sentation, which allows one to avoid the redundant storage of the interfacial data but, on the other hand,may be another source of problemswith efficient parallelization. After the code profiling, the parallelization has been performed on 14 sub- routines, which take up approximately 99% of the computational time. The parallelization of the remainingparts of the codewas impossible (input/output and indivisible operations), or not sufficiently profitable. The efficiency of the current implementation is summarized in Table 1. Table 1 Number of CPU’s Efficiency Acceleration 1 1.00 1.00 2 0.83 1.66 4 0.71 2.84 8 0.53 4.26 The efficiency is defined as the ratio between the parallel code execution time divided by the number of utilized CPUs, and time of execution of a sin- gle CPU code. The tests have been performed on four dual AthlonMP2200+ computers, with the Gigabit Ethernet interconnection. To simulate the scala- bility for 8 separate computers, the communication between 2 CPUs located in the same computer was also routed through the Gigabit Ethernet network, instead of using sharedmemory. When using faster interprocessor communication in the computer, the fol- lowing performance results have been obtained (Table 2). Table 2 Number of CPU’s Efficiency Acceleration 1 1.00 1.00 2 0.91 1.81 4 0.76 3.03 8 0.65 5.19 The communication model used in the parallel solver is the synchronous message passing. Due to the large communication overhead in such a model, we develop an asynchronous parallel code that should allow larger speedups and better efficiency. 900 J.Szumbarski et al. To improve the performance, we consider re-implementing of the interpro- cessor communication in one computer with OpenMP, and replacing global data structures by local ones. Such a modification would impose a smaller communication overhead, though some global data relocationsmay still be ne- cessary, due to changing the order of space- dimensions-oriented and spectral- elements-oriented calculations in each time step. 8. Summary and final remarks The obtained results prove correctness of the formulation of the numerical method and its spectral element implementation. On the other hand, numeri- cal computations revealed someweak points, which restrain practical usability of the code in the current version. The most important limitation is due to numerical instability of the time integration scheme. The mechanism of this instability has not yet been recognized, although the obtained results may in- dicate that the source of unstable behaviour lies in the integral inlet/outlet conditions rather that explicit treatment of convection terms. To resolve this vital issue, a further theoretical workandnumerous test calculations are neces- sary. The latter task can be accomplished within a reasonable time only when an efficient parallel version of the code is developed. The MPI-based version using synchronous communications can work efficiently on systems with only a few processors. The performance of such implementation might be slightly improvedwith appropriate reorganization of source codes (modification of da- ta structures, unified ordering of the element-oriented and coordinate-oriented part of the code, etc.). However, an essential enhancement of the computa- tional efficiency will be achieved when the asynchronous communication is implemented. Such a version of the code is currently tested and the expected speed-up on 32 processors has been estimated to be 20 times. This ratio will certainly increase when appropriate codemodifications are made. Further de- velopment of the algorithms used in the code is also planned. In particular, the option of curvilinear spectral elements will be incorporated and more so- phisticated versions of the pressure solver preconditioning, based on domain decomposition, will be considered. Appendix A. The Method of Preconditioned Conjugate Gradients For the completeness of the exposition, we present here a description of the PCGmethod. Computations of an unsteady viscous flow... 901 Consider a symmetric and positive definite system Ax = b. The PCG method can be summarized as follows Choose x(0), compute r(0) = b−Ax(0), compute r̃(0) =R−1r(0) (precon- ditioning), p(0) = r̃(0). For k =1,2, ... αk =− (r̃(k),r(k)) (p(k),Ap(k)) x(k+1) =x(k)−αkp (k) r(k+1) = r(k)+αkAp (k) → convergence test r̃ (k+1) =R−1r(k+1) (preconditioning) βk = (r̃(k+1),r(k+1)) r̃ (k),r(k)) p (k+1) = r̃(k+1)+βkp (k) Appendix B. Fast Diagonalization Method (FDM) The Fast Diagonalization Method (see, for instance Couzy and Deville, 1994) is an algorithm for rapid solving of linear algebraic systems with a special tensor-product structure. Consider a linear algebraic operator L=L1⊗ I⊗ I+ I⊗L2⊗ I+ I⊗ I⊗L3 (B.1) where the squarematrices Lα, α =1,2,3 are given, the symbol I refers to the identitymatrix and the symbol ⊗ denotes the tensor product. The dimension of all matrices in (B.1) is equal to N. The operator L can be written in an index notation as follows (L)iα,jβ,kγ =(L1)iαδjβδkγ + δiα(L2)jβδkγ + δiαδjβ(L3)kγ (B.2) This operator acts on triple-index structures and returns triple-index results according to the formula (Lw)i,j,k = N−1∑ α=0 (L1)iαwα,j,k+ N−1∑ β=0 (L2)jβwi,β,k+ N−1∑ α=0 (L3)kγwi,j,γ (B.3) Assume that the matrices Lα, α =1,2,3 have complete sets of linearly inde- pendent eigenvectors. Then, following equalities hold LαPα =PαΛα α =1,2,3 (B.4) 902 J.Szumbarski et al. where Pα is the fundamental matrix for Lα and Λα is a diagonal matrix containing the corresponding eigenvalues. Then, the following formula for the inverse operator L−1 can be proved L−1 =(P1⊗P2⊗P3)Λ −1(P−11 ⊗P −1 2 ⊗P −1 3 ) (B.5) where the operator Λ is defined as Λ=Λ1⊗ I⊗ I+ I⊗Λ2⊗ I+ I⊗ I⊗Λ3 (B.6) Consider a system of tlinear equations expressed by the operator L Ly = r (B.7) According to formulas (B.5) and (B.6), the above system can be solved in the following way • compute the product r′ =(R−11 ⊗R −1 2 ⊗R −1 3 )r, with the formula (r′)ijk = N−1∑ α=0 (R−11 )iα {N−1∑ β=0 (R−12 )jβ [N−1∑ γ=0 (R−13 )kγ(r)αβγ ]} (B.8) i,j,k =0, ...,N −1 • find the solution to the ”diagonal” linear system (Λ1⊗ I⊗ I+ I⊗Λ2⊗ I+ I⊗ I⊗Λ3)r ′′ = r′ which is (r′′)ijk =(λ 3 i +λ 2 j +λ 1 k) −1(r′)ijk (B.9) • compute the product y =(R1⊗R2⊗R3)r ′′ with the formula (y)ijk = N−1∑ α=0 (R1)iα {N−1∑ β=0 (R2)jβ [N−1∑ γ=0 (R3)kγ(r ′′)αβγ ]} (B.10) i,j,k =0, ...,N −1 The most important feature of the above procedure is its computational efficiency. It can be noticed that the total numerical cost can be estimated as, roughly, 6 · N4 floating point operations. It means that solving linear system (B.7) is only twice as expensive as the calculation of the ”matrix- vector” product (B.3). Acknowledgements This work has been supported by the State Committee for Scientific Research (KBN), grant No. 7T11F01820. Computations of an unsteady viscous flow... 903 References 1. Cahouet J., Chabard J.P., 1986, Some fast 3Dfinite element solvers for the generalized Stokes problems, Internat. J. Numer. Meth. Fluids, 8, 869-895 2. Couzy W., Deville M.O., 1994, Spectral-element for the Uzawa pressure operator applied to incompressible flows, J. Sci. Comput., 9, 2, 107-122 3. Deville M.O., Fisher P.F., Mund E.H., 2002, High-Order Methods for Incompressible Fluid Flow, Cambridge University Press 4. Fisher P.F., 1998, Projection techniques for iterative solution ofAx= b with successive right-hand sides,Comput. Methods. Appl. Mech. Eng., 163, 193-204 5. Formaggia L., Gerbeau J.F., Nobile F., Quarteroni A., 2000, Nume- rical Treatment of Defective Boundary Conditions for the Navier-Stokes Equ- ations, EPFLPreprint 20.2000 6. KarniadakisG.E.,SherwinS., 1999,Spectral/hpElementMethods forCFD, Oxford University Press 7. Maday Y., Meiron D., Patera A.T., Ronquist E.M., 1993, Analysis of iterative methods for the steady and unsteady Stokes problem: application to spectral element discretizations, SIAM J. Sci. Comp., 14, 2, 310-337 8. Ronquist E.M., 1991, A domain decompositionmethod for elliptic boundary value problems: application to unsteady incompressible fluid flow, Proceedings of 5th Conference on Domain Decomposition Methods for Partial Differential Equations (Ed. D.E. Keyes et al.), 545-557 9. Saad Y., 2003, Iterative Methods for Sparse Linear Systems, 2nd Ed. SIAM Wyznaczenie nieustalonych przepływów cieczy lepkiej w trójwymiarowym układzie przewodów. Część II: Implementacja metody spektralnych elementów skończonych i przykładowe wyniki Streszczenie Druga część pracy omawia zagadnienia związane z implementacją metody spek- tralnych elementów skończonych oraz prezentuje wybrane wyniki obliczeń. Przedsta- wiono równieżwybranealgorytmy: skalowanie (ang.preconditioning) procesu iteracyj- nego dla ciśnienia, metodę FDM oraz metodę efektywnego rozwiązywania sekwencji układów liniowych dużych rozmiarów z ustaloną macierzą. Omówiono wyniki obli- czeń testowych. Przedyskutowano trudności i ograniczenia aktualnej implementacji oraz zaprezentowanowstępne efekty paralelizacji solwera spektralnego. Manuscript received April 23, 2004; accepted for print May 25, 2004