JOURNAL OF THEORETICAL AND APPLIED MECHANICS
50, 1, pp. 285-300, Warsaw 2012
50th anniversary of JTAM

VORTEX PARTICLE METHOD AND PARALLEL COMPUTING

Andrzej Kosior
Henryk Kudela
Wroclaw University of Technology, Institute of Aviation, Processing and Power Machines Engineering, Wrocław, Poland
e-mail: andrzej.kosior@pwr.wroc.pl; henryk.kudela@pwr.wroc.pl

In this paper, numerical results of a three-dimensional simulation of the motion of a vortex ring are presented. The Vortex-In-Cell method was chosen for the simulation and is briefly described in the paper. The numerical results were obtained on a single-processor (x86) architecture, whose main disadvantage is a very long computation time. To manage this problem, we switched to a parallel architecture. In our first approach to the multicore architecture, we tested algorithms for the solution of the system of algebraic equations that results from discretization of the Poisson equation; as is well known, solving the Poisson equation numerically is equivalent to solving such an algebraic system. We present the results obtained with the CUDA architecture. To better explain how parallel algorithms work on the CUDA architecture, we briefly describe the layout of the device and how programs are executed on it. We also show our results on the parallelization of simple iterative methods, such as the Jacobi method and the Red-Black Gauss-Seidel method, for the solution of the algebraic system. The results were encouraging: for the Red-Black Gauss-Seidel method on a GTX 480 card, the calculations were 90 times faster than on a single processor.

Key words: vortex particle method, vortex ring, parallel computations

1. Introduction

It is well known that vorticity plays a great role in fluid dynamics. Many problems can be analysed using the vorticity and its evolution in time and space. From this fact stems the great importance of vortex particle methods. To use them, one introduces particles that carry information about vorticity. Tracing the positions of these particles allows one to study the evolution of the vorticity field, and from the distribution of the vorticity one can calculate the fluid velocity. We can distinguish two groups of vortex methods:

• direct methods, where the velocity of the vortex particles is calculated on the basis of the Biot-Savart law by summing the contributions to the velocity induced by all particles in the flow,

• Eulerian-Lagrangian methods, like the Vortex-In-Cell (VIC) method, where the velocity field is determined on a grid by solving the Poisson equation for a vector potential.

The direct methods are very attractive from a theoretical point of view. They are based on a fundamental law of vector analysis, are grid free and allow one to realize the far-field boundary condition exactly. But the direct vortex methods also have a great disadvantage: the amount of computational time is much larger than for the Eulerian-Lagrangian methods (Cottet and Koumoutsakos, 2000).
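To make this cost concrete, the following is a minimal sketch of direct Biot-Savart summation, written as CUDA-compatible host C++. The function names and the smoothing radius delta used to regularize the singular kernel are our illustrative assumptions, not details taken from the paper.

    // Direct method: velocity induced at a point r by vortex particles
    // with positions xp[] and strengths alpha[], via the regularized
    // Biot-Savart law
    //   u(r) = (1/4pi) sum_p alpha_p x (r - x_p) / |r - x_p|^3
    #include <cmath>
    #include <cstddef>
    #include <vector>

    struct Vec3 { double x, y, z; };

    Vec3 biotSavartVelocity(const Vec3& r,
                            const std::vector<Vec3>& xp,
                            const std::vector<Vec3>& alpha,
                            double delta)          // smoothing radius
    {
        const double kFourPi = 4.0 * 3.141592653589793;
        Vec3 u = {0.0, 0.0, 0.0};
        for (std::size_t p = 0; p < xp.size(); ++p) {
            const double dx = r.x - xp[p].x;
            const double dy = r.y - xp[p].y;
            const double dz = r.z - xp[p].z;
            const double r2 = dx*dx + dy*dy + dz*dz + delta*delta;
            const double w  = 1.0 / (kFourPi * r2 * std::sqrt(r2));
            // cross product alpha_p x (r - x_p), scaled by w
            u.x += w * (alpha[p].y * dz - alpha[p].z * dy);
            u.y += w * (alpha[p].z * dx - alpha[p].x * dz);
            u.z += w * (alpha[p].x * dy - alpha[p].y * dx);
        }
        return u;
    }

Evaluating the velocity at all N particle positions with this routine costs O(N²) operations per time step; the VIC method replaces this all-pairs sum with grid Poisson solves, which is why it scales much better.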
In 2D simulations, the vortex particle method is particularly efficient. In every time step, one must solve only one Poisson equation, which links the single non-zero component of the vector potential (the stream function) with the third component of vorticity. In 3D computations, in each time step we need to solve three Poisson equations, one for each component of the vector potential.

In the vortex particle method, the particles after several steps have a tendency to concentrate in areas where the velocity gradient is very high. This may lead to spurious vortex structures. To avoid this situation, after an arbitrary number of time steps the particles are redistributed to regular positions. In the 2D case, we noted (Kudela and Malecha, 2008, 2009; Kudela and Kozlowski, 2009) that it is useful to do the remeshing at every time step. At the beginning, the vortex particles are put on the grid nodes. After the displacement in every time step, the intensities of the particles are redistributed again to the initial grid nodes. This strategy has several advantages, such as shortening the computational time and allowing an accurate simulation of the viscosity. In the present paper, we apply this idea to the case of 3D flow.

Due to the very long computation time, we found the speed-up offered by parallel computing to be necessary. The VIC method is very well suited for parallel computation. The Poisson equation for each component of the vector potential can be solved independently. The remeshing process has a local character, and the computations for each particle can be done independently. The displacement of the vortex particles can be done in the same manner.

In this work, we present the results of the VIC method with remeshing in each time step applied to the 3D motion of a single vortex ring in an inviscid fluid. To speed up the calculations, we decided to use multicore architectures, which are well suited for solving sets of algebraic equations. To find out how large a speed-up can be obtained, we decided to conduct some tests. In this paper, we show procedures for solving the system of algebraic equations resulting from discretization of the Poisson equation with the use of a multicore architecture. It was found that Graphical Processing Units, developed for video games, can be used for scientific computations; they provide cheap and easily accessible hardware for scientific calculations. Execution of a sequential algorithm on a multicore architecture does not give any speed-up; parallel computations need special algorithms. In this work, we used the following algorithms for parallel computation:

• the Jacobi method,

• the Red-Black Gauss-Seidel method.

In the next section, the equations of fluid motion are given. In the third section, the foundations of the 3D VIC method are described. In the fourth section, some early tests of the method are shown. In the last two sections, it is shown how to speed up the calculations using parallel algorithms and multicore architectures.

2. Equations of motion

The equations of motion of an incompressible, inviscid fluid have the following form

    ∂u/∂t + (u · ∇)u = −(1/ρ)∇p,     ∇ · u = 0     (2.1)

where u = [u, v, w] is the velocity vector, ρ – fluid density, p – pressure.

Equation (2.1)₁ can be transformed into the Helmholtz equation for the vorticity evolution

    ∂ω/∂t + (u · ∇)ω = (ω · ∇)u     (2.2)

where ω = ∇ × u. From incompressibility (2.1)₂ stems the existence of the vector potential A

    u = ∇ × A     (2.3)

Assuming additionally that the vector potential is divergence free (∇ · A = 0), its components can be obtained by solving the Poisson equation

    ΔA_i = −ω_i,     i = 1, 2, 3     (2.4)
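As an illustration of how the velocity is recovered on the grid from (2.3) once (2.4) has been solved, here is a minimal sketch of a second-order central-difference curl at an interior node, in CUDA-compatible C++. The row-major layout, the idx() helper and all names are our assumptions, not the authors' code.

    // u = curl A at interior node (i,j,k) of an N x N x N grid with
    // spacing h, by second-order central differences:
    //   u = dA3/dy - dA2/dz, v = dA1/dz - dA3/dx, w = dA2/dx - dA1/dy
    inline int idx(int i, int j, int k, int N) { return (k * N + j) * N + i; }

    void velocityFromPotential(const double* A1, const double* A2,
                               const double* A3, int i, int j, int k,
                               int N, double h,
                               double& u, double& v, double& w)
    {
        const double c = 1.0 / (2.0 * h);
        u = c * (A3[idx(i, j+1, k, N)] - A3[idx(i, j-1, k, N)]
               - A2[idx(i, j, k+1, N)] + A2[idx(i, j, k-1, N)]);
        v = c * (A1[idx(i, j, k+1, N)] - A1[idx(i, j, k-1, N)]
               - A3[idx(i+1, j, k, N)] + A3[idx(i-1, j, k, N)]);
        w = c * (A2[idx(i+1, j, k, N)] - A2[idx(i-1, j, k, N)]
               - A1[idx(i, j+1, k, N)] + A1[idx(i, j-1, k, N)]);
    }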
3. Description of the VIC method for the three-dimensional case

First, we have to discretize the computational domain. To do this, we set up a regular 3D grid (j_1Δx, j_2Δy, j_3Δz), j_1, j_2, j_3 = 1, 2, . . . , N, where Δx = Δy = Δz = h. The same mesh will be used for solving the Poisson equation. The continuous field of vorticity is replaced by a discrete distribution of Dirac delta measures (Cottet and Koumoutsakos, 2000; Kudela and Regucki, 2009)

    ω(x) = Σ_{p=1}^{N} α_p(x_p) δ(x − x_p)     (3.1)

where α_p = (α_{p1}, α_{p2}, α_{p3}) denotes the vorticity particle at the position x_p = (x_{p1}, x_{p2}, x_{p3}). The domain of the flow is covered by a numerical mesh (N_x × N_y × N_z) with equidistant spacing h, and the i-th component of the vector particle α is defined by the expression

    α_i = ∫_{V_p} ω_i(x_1, x_2, x_3) dx ≈ h³ ω_i(x_p),     x_p ∈ V_p,  |V_p| = h³     (3.2)

From the Helmholtz theorems (Wu et al., 2006) we know that the vorticity is carried by the fluid

    dx_p/dt = u(x_p, t)     (3.3)

We must also take into account the fact that, due to the three-dimensionality of the vorticity field, the intensities of the particles are changed by the stretching effect

    dα_p/dt = [∇u(x_p, t)] · α_p     (3.4)

The velocity at the grid nodes was obtained by solving Poisson equation (2.4) with the finite difference method and using (2.3). The system of equations (3.3), (3.4) was solved by the Runge-Kutta method of the 4th order.

3.1. Remeshing

In the Vortex-In-Cell method, the particles have a tendency to gather in regions with high velocity gradients. This can lead to inaccuracies as particles come too close to one another. To overcome this problem, the particles have to be remeshed, that is, distributed back to the nodes of the rectangular mesh. In practice, the remeshing is done after several time steps. This may cause some difficulty with the accurate simulation of the viscosity, which is done by numerical solution of the diffusion equation. To be able to simulate the viscosity accurately, in this work the particles are remeshed in every time step. It is done using the interpolation

    ω_j = Σ_p Γ̃_p^n ϕ((x_j − x̃_p)/h) h⁻³     (3.5)

where j is the index of a grid node and p is the index of a particle. Let us assume that x ∈ R. In this work, we used the following interpolation kernel

    ϕ(x) = (1/2)(2 − 5x² + 3|x|³)   if 0 ≤ |x| ≤ 1
         = (1/2)(2 − |x|)²(1 − |x|)  if 1 ≤ |x| ≤ 2
         = 0                         if 2 ≤ |x|     (3.6)

In the 3D case, ϕ(x, y, z) = ϕ(x)ϕ(y)ϕ(z). We require our interpolation kernel to satisfy

    Σ_p ϕ((x − x_p)/h) ≡ 1     (3.7)

To measure the discrepancy between the old (distorted) and new (regular) particle distributions

    Σ_p Γ̃_p^n δ(x − x̃_p) − Σ_p Γ_p^n δ(x − x_p)     (3.8)

one can multiply the above function by a test function φ and get (Cottet and Koumoutsakos, 2000)

    E = Σ_p Γ̃_p^n φ(x̃_p) − Σ_p Γ_p^n φ(x_p)     (3.9)

where Γ̃_p^n and x̃_p are values from the old distribution. Using (3.5), we can write

    E = Σ_p Γ̃_p^n [φ(x̃_p) − Σ_j φ(x_j) ϕ((x_j − x̃_p)/h)]     (3.10)

Thus we have to evaluate the function

    f(x) = φ(x) − Σ_j φ(x_j) ϕ((x_j − x)/h)     (3.11)

Using (3.7), (3.11) can be rewritten as

    f(x) = Σ_j [φ(x) − φ(x_j)] ϕ((x_j − x)/h)     (3.12)

The Taylor expansion of φ yields

    f(x) = Σ_α Σ_j [(1/α!)(x_j − x)^α d^α φ/dx^α] ϕ((x − x_j)/h)     (3.13)

We come to the conclusion that if ϕ satisfies the moment conditions

    Σ_q (x − x_q)^α ϕ((x − x_q)/h) = 0,     1 ≤ |α| ≤ m − 1     (3.14)

then

    f(x) = O(h^m)     (3.15)

and the remeshing procedure will be of the order m. It means that polynomial functions up to the order m will be exactly represented by this interpolation. Kernel (3.6) used in this work is of order m = 3.
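For illustration, below is a direct transcription of kernel (3.6) and its tensor-product use in (3.5), as CUDA-compatible C++; the function names and the per-particle weight routine are our sketch, not the authors' code.

    #include <cmath>

    // Interpolation kernel (3.6), of order m = 3; phi(0) = 1 and
    // phi vanishes for |x| >= 2, so a particle contributes to at
    // most 4 nodes per direction (4^3 = 64 nodes in 3D).
    double phi(double x)
    {
        const double a = std::fabs(x);
        if (a <= 1.0) return 0.5 * (2.0 - 5.0*a*a + 3.0*a*a*a);
        if (a <= 2.0) return 0.5 * (2.0 - a) * (2.0 - a) * (1.0 - a);
        return 0.0;
    }

    // Tensor-product weight, cf. (3.5): the fraction of a particle's
    // strength redistributed from (xp,yp,zp) to grid node (xj,yj,zj).
    double remeshWeight(double xj, double yj, double zj,
                        double xp, double yp, double zp, double h)
    {
        return phi((xj - xp) / h) * phi((yj - yp) / h) * phi((zj - zp) / h);
    }

Since each particle writes only to the 64 nodes surrounding it, the remeshing is local, which is what makes it parallelize well.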
4. Test problem

To check the correctness of our code, with remeshing of the particle positions in every time step, we carried out a test for a vortex ring. The calculations were done on a single processor.

For a thin-cored ring with circulation Γ, the formula for the translation velocity of the vortex ring is (Green, 1995)

    U = Γ/(4πR_0) [ln(8R_0/ε_0) − 1/4 + O(ε_0/R_0)]     (4.1)

where R_0 is the ring radius and ε_0 is the core radius (ε_0/R_0 ≪ 1).

We carried out four computational experiments. The parameters of the vortex ring R_0 and ε_0 were constant, R_0 = 1.5, ε_0 = 0.3. The computational domain was 2π × 2π × 2π and the numerical grid had 128 nodes in each direction. Periodic boundary conditions in all directions were assumed. In the initial step, every vector particle α_p whose position (x_p, y_p, z_p) fulfills the inequality

    (√(x_p² + y_p²) − R_0)² + z_p² − ε_0² < 0     (4.2)

obtains a constant portion of vorticity. The initial circulation of the ring Γ was changed in the range Γ ∈ [1.06, 4.24]. To advance in time, the fourth-order Runge-Kutta method was used with the time step Δt = 0.01. The results are shown in Fig. 1.

Fig. 1. Comparison of theoretical and computed velocities of the vortex ring

The sequence of positions of the vortex ring for the largest value of Γ is shown in Fig. 2.

Fig. 2. Vortex ring evolution for the case Γ = 4.23. The sequence of the iso-vorticity surface for |ω| ∈ [6, 14] at different times t

We can see in Fig. 1 good agreement between the theoretical and numerical results. Some discrepancy between the results from analytical Kelvin formula (4.1) and our numerical calculations may stem from the fact that formula (4.1) was derived for a vortex ring with a uniform distribution of the vorticity inside the vortex core. During the motion of the ring, the vorticity distribution in the core changes in time and is not uniform (see Fig. 3). It is worth noticing that our results fit better the formula derived by Hicks for stagnant fluid inside the core (Saffman, 1992), in which the coefficient 1/4 in (4.1) is replaced by 1/2. In Fig. 1, the velocity calculated from the Hicks formula is drawn with a dashed line.

Fig. 3. Isolines of vorticity in the vortex ring core after t = 6. The initial vorticity of the core was |ω| = 15
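A small numerical comparison of formula (4.1) with its Hicks variant can be scripted directly. In the sketch below (CUDA-compatible C++), the four equally spaced Γ values are our illustrative assumption; the paper states only the range Γ ∈ [1.06, 4.24].

    #include <cmath>
    #include <cstdio>

    // Translation velocity (4.1) of a thin vortex ring. C = 0.25
    // corresponds to Kelvin's uniform-vorticity core, C = 0.5 to
    // Hicks' stagnant core (Saffman, 1992); the O(eps0/R0) term
    // is dropped.
    double ringVelocity(double Gamma, double R0, double eps0, double C)
    {
        const double kPi = 3.141592653589793;
        return Gamma / (4.0 * kPi * R0) * (std::log(8.0 * R0 / eps0) - C);
    }

    int main()
    {
        const double R0 = 1.5, eps0 = 0.3;   // values used in Section 4
        // Circulations spanning the stated range [1.06, 4.24];
        // the intermediate values are our illustrative choice.
        const double Gammas[] = {1.06, 2.12, 3.18, 4.24};
        for (double G : Gammas)
            std::printf("Gamma = %.2f  U_Kelvin = %.4f  U_Hicks = %.4f\n",
                        G, ringVelocity(G, R0, eps0, 0.25),
                        ringVelocity(G, R0, eps0, 0.5));
        return 0;
    }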
We may divide our code into two parts: • part for which the parallelization is straightforward (movement of par- ticles, computation of velocity in grid nodes), • and part for which the parallelization will causemore problems (solving Poisson equation). In this paper, we concentrated on the second part of the code, that is on solution of the Poisson equation. First of all, in the three-dimensional case we have to solve three Poisson’s equations in every time step. Because they are independent of each other, we can solve them simultaneously. Because Fast Poisson Solvers were designed for sequential computing, they will not take the advantage of SIMD architecture. This is why we need to find some other algorithms. 294 A. Kosior, H. Kudela For our calculations, we have selectedGraphical ProcessingUnits (GPUs). It has great computingpower-to-cost ratio. But it has a different programming model and offers few types of memory. One need to properly recognize pro- blems connected with this architecture in order to take the advantage of its computational power.Todo that,we tested differentmemory types andnume- rical algorithms. To find which is the most efficient, we compared them with the same algorithm executed on a single processor. Our results are presented in this paper. 6. Parallel algorithms Poisson’s equation can be written in the following form ∂2Al ∂x2 + ∂2Al ∂y2 + ∂2Al ∂z2 =−ωl (6.1) where Al is a component of the vector potential A= [A1,A2,A3], and ωl is a component of the vorticity vector ω= [ω1,ω2,ω3]. The velocity in grid nodes can be calculated from relation (2.3). For two-dimensional flows, there is only one non-zero component of the vector potential A= [0,0,ψ]. For the sake of simplicity, we will derive all of the following formulas for the two-dimensional case. A five-point stencil is used for discretization of the Poisson equation on a rectangular grid (ihx,jhy). If we additionally assume that the grid steps in each direction are equal (hx = hy = h) we can rewrite equation (6.1) in the form ψi,j+1+ψi+1,j −4ψi,j +ψi−1,j +ψi,j−1 =−ωi,jh 2 (6.2) where i =0,1,2, . . . ,Nx j =0,1,2, . . . ,Ny This way we obtain a set of algebraic equations with an unknown vector of the stream function ψi,j at the grid nodes. This set of equations is solved by iterative methods. 6.1. Jacobi method One of the earliest and simplestmethods of solving sets of linear equations is the Jacobimethod (Braide, 2006; Thomas, 1999). In every iteration, the va- lues of unknown variables are calculated independently of each other. Thanks to this, the Jacobi method can be easily parallelized. For the two-dimensional Vortex particle methods and parallel computing 295 case, in each iteration the value of ψi,j is computed using ψi,j values from the previous iteration. One can transform equation (6.2) in order to compute the central element in the form ψ (k) i,j = 1 4 ( ωi,j +ψ (k−1) i,j+1 +ψ (k−1) i+1,j +ψ (k−1) i−1,j +ψ (k−1) i,j−1 ) (6.3) where k is the iteration number.Because in each iteration the unknownvalues are independent, one can use parallel computations. Values at each grid node can be evaluated by different threads. Here, the threads are related to the inner structure of the computing device. 6.2. Red-Black Gauss-Seidel method The second iterative method for solving sets of linear equations is the Gauss-Seidel method (Braide, 2006; Thomas, 1999). 
6.2. Red-Black Gauss-Seidel method

The second iterative method for solving sets of linear equations is the Gauss-Seidel method (Braide, 2006; Thomas, 1999). In sequential computing, the unknowns can be computed in the lexicographical order shown in Fig. 5. At each node, some neighbouring values have already been evaluated in the current iteration; for example, to compute the new value at node 13, we can use the values from nodes 12 and 8, for which the new values have already been found. One can take advantage of this fact and rewrite equation (6.3) in the form

    ψ_{i,j}^{(k)} = (1/4)(h² ω_{i,j} + ψ_{i,j+1}^{(k−1)} + ψ_{i+1,j}^{(k−1)} + ψ_{i−1,j}^{(k)} + ψ_{i,j−1}^{(k)})     (6.4)

Fig. 5. Lexicographical ordering

Thanks to this, the Gauss-Seidel method needs fewer iterations than the previous method to solve a set of equations. Unfortunately, it cannot be used in parallel computing in this form, because we need all computations to be independent. What we can do here is to split our task into two parts. We divide our computational grid as shown in Fig. 6.

Fig. 6. Red-Black ordering

Like on a chessboard, we divide the nodes into red and black ones. In the first step of this method, we evaluate the values only at the black nodes; it can be seen that in order to do so we only need the values of the unknowns at the red nodes. Thanks to this, the computations are independent and can be parallelized. In the second step, we do the same with the red nodes, using the new values from the black ones. We can write the new equations in the following form

    ψ_{i,j}^{B(k)} = (1/4)(h² ω_{i,j} + ψ_{i,j+1}^{R(k−1)} + ψ_{i+1,j}^{R(k−1)} + ψ_{i−1,j}^{R(k−1)} + ψ_{i,j−1}^{R(k−1)})

    ψ_{i,j}^{R(k)} = (1/4)(h² ω_{i,j} + ψ_{i,j+1}^{B(k)} + ψ_{i+1,j}^{B(k)} + ψ_{i−1,j}^{B(k)} + ψ_{i,j−1}^{B(k)})     (6.5)
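The two half-sweeps (6.5) map onto CUDA in the same way as the Jacobi sweep; below is our illustrative sketch, in which the kernel name and the parity convention (black nodes where i + j is even) are assumptions. Because each colour depends only on the other, the update can safely be done in place, in a single buffer.

    // One half-sweep of the Red-Black Gauss-Seidel method (6.5).
    // colour = 0 updates the black nodes ((i + j) even), colour = 1
    // the red ones; two launches per iteration, black then red.
    __global__ void redBlackSweep(double* psi, const double* omega,
                                  int nx, int ny, double h, int colour)
    {
        const int i = blockIdx.x * blockDim.x + threadIdx.x;
        const int j = blockIdx.y * blockDim.y + threadIdx.y;
        if (i < 1 || i >= nx - 1 || j < 1 || j >= ny - 1) return;
        if (((i + j) & 1) != colour) return;   // not this half-sweep

        const int c = j * nx + i;
        psi[c] = 0.25 * (h * h * omega[c]
                         + psi[c + 1]  + psi[c - 1]
                         + psi[c + nx] + psi[c - nx]);
    }

A thread whose node has the wrong colour, or lies on the boundary, simply returns; this is an example of the conditional branching that the enlarged-grid variant discussed in Section 8 (Fig. 7) is designed to avoid.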
7. Different types of memory in the CUDA architecture

An important element of using parallel architectures is efficient usage of the memory. In the CUDA architecture, we can distinguish the following memory types [8]:

• device (global) memory,
• texture memory,
• constant memory,
• shared memory,
• register memory.

The first of the mentioned types can be accessed by threads from all streaming processors. Unfortunately, reading from this type of memory is very slow (it takes hundreds of clock cycles). During that time no further operations can be done, which slows down the execution of the program. If some data are used more than once in the program, there is a way to shorten the access time: we can use shared memory. Access to this memory is very fast (only a few clock cycles), but it can be accessed only by a limited set of threads. Nonetheless, it is useful for mitigating memory latency in some problems. Making use of it in the test programs enabled a speed-up of over 2 times.

8. Results

8.1. Jacobi method

The test problem was a three-dimensional Poisson equation whose solution was the function

    ψ(x, y, z) = 100xyz(x − 1)(y − 1)(z − 1),     x, y, z ∈ [0, 1]     (8.1)

We tested our method for different types of device memory (shared memory, texture memory). We also tested a case with an enlarged computational grid (Fig. 7), in which one does not have to use conditional branching inside the GPU code.

Fig. 7. Enlarged numerical grid allowing for a code with no conditional branching

The parameters of the computational grid and the obtained speed-ups are presented in Table 1. In each test, 100 iterations of the Jacobi method were performed. The computations were performed on: CPU (Intel Core 2 Quad Q9550), TESLA GPU (NVIDIA TESLA S1070) and FERMI GPU (NVIDIA GTX 480).

Table 1. Speed-up of the Jacobi method

    Number of nodes    TSL SH   TSL TX   TSL NC   FRM NC
    32×32×32             4.05     6.61     6.94    12.32
    64×64×64            17.71    26.32    31.26    52.82
    128×128×128         24.78    29.95    43.67    58.89
    256×256×256         25.47    24.59    41.92    60.96

The abbreviations used in Table 1 and Table 2 denote: TSL – TESLA GPU, FRM – FERMI GPU, SH – shared memory, TX – texture memory, NC – no conditional branching.

As can be seen, the speed-up strongly depends both on the type of memory used (the results for the global memory, which were worse than those for the shared memory, are not presented) and on the shape of the numerical grid.

8.2. Red-Black Gauss-Seidel method

The next test problem was the same equation as for the Jacobi method

    ψ(x, y, z) = 100xyz(x − 1)(y − 1)(z − 1),     x, y, z ∈ [0, 1]     (8.2)

Here we tested only the case which gave the best results for the Jacobi method, that is, the no-conditional-branching case. The parameters of the computational grid and the obtained results are presented in Table 2. In each test, 100 iterations of the Red-Black Gauss-Seidel method were performed. The computations were performed on: CPU (Intel Core 2 Quad Q9550), TESLA GPU (NVIDIA TESLA S1070) and FERMI GPU (NVIDIA GTX 480).

Table 2. Speed-up of the Red-Black Gauss-Seidel method

    Number of nodes    TSL NC   FRM NC
    32×32×32            11.17    45.06
    64×64×64            26.11    63.01
    128×128×128         35.95    88.62
    256×256×256         31.22    89.47

9. Conclusions

Nowadays, it is not difficult to notice that the computational power of a single processor has stopped rising. Parallel architectures deliver the means to speed up computations, and developing programs on GPUs is an interesting alternative to the CPU. Thanks to hundreds of streaming processors working in parallel, we can get the results faster. GPUs are also quite cheap and easily accessible. An important element of parallel computations is choosing the right computational method, one that allows for effective use of the computer architecture; moving a sequential program to this hardware may not be the best solution. Proper use of GPUs (memory management, parallel algorithms, etc.) allows the programs to be executed much faster (even 60-90 times faster) at a relatively low cost.

Acknowledgments

The authors would like to thank the Institute of Informatics, Wroclaw University of Technology, for its support and access to the resources of the Cumulus Computing Environment.

References

1. Braide B., 2006, A Friendly Introduction to Numerical Analysis, Pearson Prentice Hall
2. Cottet G.H., Koumoutsakos P.D., 2000, Vortex Methods: Theory and Practice, Cambridge University Press
3. Green S.I., 1995, Fluid Vortices, Springer
4. Kudela H., Kozlowski T., 2009, Vortex in cell method for exterior problems, Journal of Theoretical and Applied Mechanics, 47, 4, 779-796
5. Kudela H., Malecha Z.M., 2008, Viscous flow modeling using the vortex particles method, Task Quarterly, 13, 1/2, 15-32
6. Kudela H., Malecha Z.M., 2009, Eruption of a boundary layer induced by a 2D vortex patch, Fluid Dynamics Research, 41
7. Kudela H., Regucki P., 2009, The vortex-in-cell method for the study of three-dimensional flows by vortex methods, [In:] Tubes, Sheets and Singularities in Fluid Dynamics, Vol. 7 of Fluid Mechanics and Its Applications, 49-54, Kluwer Academic Publishers, Dordrecht
8. NVIDIA CUDA Programming Guide, 2009, www.nvidia.com
9. Saffman P.G., 1992, Vortex Dynamics, Cambridge University Press
10. Thomas J.W., 1999, Numerical Partial Differential Equations: Conservation Laws and Elliptic Equations, Springer-Verlag, New York
11. Wu J.Z., Ma H.Y., Zhou M.D., 2006, Vorticity and Vortex Dynamics, Springer
Vortex particle method in parallel computations

Summary

The paper presents numerical results for the motion of a three-dimensional vortex ring. The vortex particle method, which is briefly described, was used in the computations. The computations were carried out on a single (x86) processor; the drawback of such an implementation is the long computation time. To accelerate the computations, a parallel algorithm was proposed for the multiprocessor environment of a graphics card with CUDA technology. The architecture of the card is briefly described, as knowledge of the architecture is essential for the efficiency of the code. The program was tested by solving the system of algebraic equations obtained after discretization of the Poisson equation. Results are presented for parallelized simple iterative methods of solving systems of equations, such as the Jacobi method and the Red-Black Gauss-Seidel method. For the Red-Black Gauss-Seidel method on a GTX 480 card, a 90-fold speed-up of the computation time relative to a single processor was obtained.

Manuscript received June 25, 2010; accepted for print June 20, 2011