Vol52,5,2009 505 ANNALS OF GEOPHYSICS, VOL. 52, N. 5 October 2009 Key words parallel computing – SMS – seismic wave propagation – numerical simulations – site effects 1. Introduction An important contribution to assessing in- teraction between near-surface geology and seismic radiation arises from studies involving the so called site effects P.-Y. Bard, 1995; 1998; Panza et al., 2001; S. Bonnefoy-Claudet et al., 2006, amongst others). These latter are related to seismic wave propagation phenomena (Aki, 1988; 1993) with important implications for seismic hazard mitigation. Site effects can be studied using an instru- mental approach during earthquakes recording soil shaking on specific stations located on suit- able chosen sites (Caserta et al., 2000; Di Giulio et al., 2002; Cultrera et al., 2003; Cara et al., 2005). Nevertheless, it would be more use- ful to know the site characteristics of the ground motion before an earthquake occurs, in this respect numerical modelling is a powerful approach (Caserta and Lanucara, 2000; Moczo et al., 2001) also for estimating parameters commonly used for earthquake engineering purposes (Fäh et al., 1993; Rovelli et al., 1994). This is particularly true in urban areas where the instrumental approach may not always be used because of the high level of anthropogenic noise (Olsen et al., 2006). The great drawback to modelling more real- istic site dynamics is the impressive computa- tional requirements needed for numerical simu- lations such as: gigabytes of memory, perform- ances at gigaflops rate, days of computational time to simulate a minute’s soil shaking, etc. (Sáncez-Sesma and Luzon, 1995; Graves, 1998). Seismologists have been addressing these issues through the use of parallel comput- Parallelisation technique for serial 3D seismic codes: SMS approach Arrigo Caserta (1)(2), Vittorio Ruggiero (3), Maria Pia Busico (3) and Ivo Oprš al (2) (1) Istituto Nazionale di Geofisica e Vulcanologia, Roma, Italy (2) Charles University, Prague, Czech Republic (3) C.A.S.P.U.R., Roma, Italy Abstract We investigate a fast and easy way to parallelise seismological serial codes mainly oriented for simulating the seismic wave propagation through anelastic dissipative media. Having an efficient modelling tool is important in both assessing strong ground motion and mitigation of seismic hazard when the site effects are considered, and in crustal propagation when the crustal geological structures are of interest. Our chosen case study is repre- sentative of a set of such seismological 3D problems. The Scalable Modelling System (SMS) tool for paralleliza- tion is considered. The IBM SP5 native compiler has been used. Results such as Speed-Up and Efficiency are shown and discussed. SMS can run both in shared and distributed memory environments. The greater advantages of using SMS in such environments become apparent with the utilisation of a higher number of multiprocessor machines arranged in a cluster. We also demonstrate how successful porting from serial to parallel codes is re- alised by way of minimal instructions (6% of the serial original code only) provided that an ad hoc profiling analysis of the serial code is first performed. Mailing address: Dr. Arrigo Caserta, Istituto Naziona- le di Geofisica e Vulcanologia, Via di Vigna Murata 605, 00143 Roma, Italy; e-mail: arrigo.caserta@ingv.it Vol52,5,2009 1-12-2009 18:35 Pagina 505 506 A. Caserta, V. Ruggiero, M. P. Busico and Ivo Opršal ers (Olsen and Archuleta, 1996) coupled with optimization techniques such as the use of un- structured grids to reduce the number of com- putational nodes (Bao et al., 1996). During the last 20 years a considerable amount of work has been done in numerically modelling the interaction between both near- surface and inner geological heterogeneities and seismic radiation (see the review of Bard, 1998; Bielak et al., 1998, Panza et al., 2001; Moczo et al., 2007, and references therein). The results are huge serial codes able to numerical- ly simulate the process. It would be useful to re- convert such serial codes into parallel ones, to have more powerful simulations that realistical- ly represent the dynamics of such interaction. On the other hand, the conversion of a seri- al code to a parallel one requires typically rewriting only choice available is to rewrite the serial codes from the beginning, and this is not an easy task as well as being energy demand- ing. Not to mention that writing a 3D code from the scratch, on the basis of our experience, is more time demanding than paralleling an exist- ing serial 3D code. This paper is aimed at inves- tigating a more simple and efficient way to par- allelise a 3D serial code, more specifically codes for simulating the seismic wave propaga- tion through viscoelastic dissipative 3D media. For the purposes of our study, we adopted the serial code of Opršal and Zahradnik (2002), which contains general aspects that make it rep- resentative of the dynamics that we want to simulate. Based on a finite-difference technique on a Cartesian mesh (Opršal and Zahradnik, 2002; Opršal et al., 2004, 2005), it computes the full 3D seismic wave-field. Null normal stress boundary condition on the topographic surface is realized by the so-called vacuum for- malism (Opršal and Zahardnik, 2002). Its sta- bility conditions are represented by the classical C.F.L. criterion for explicit methods (Mitchel and Griffits, 1980). These features are similarly adopted in many of the serial codes developed in the last years dealing with the seismic wave propaga- tion problem, used to assess strong ground mo- tion (Zahradnik et al., 1993; Graves, 1998; Opršal and Zahradnik, 1999; Pitarka, 1999; Moczo et al., 2001; amongst others). 2. Study case: serial 3D wave propagation code The main guidelines for the previously-ref- erenced program by Opršal and Zahradnik (2002) are based on the formulation of an initial boundary value problem that represents the vis- coelastic dynamics driving the full 3D wave- field propagation. The medium is dissipative with arbitrary 3D shape and topography. Source energy nucleation and its crustal propagation can be numerically modelled through a discrete wave-number method (DW), as well as ray theory or even an analytical solu- tion; the resulting wave-field represents the seismic radiation ready to light up the site un- der study. The finite-difference (FD) technique is adopted to model the interaction between the site and the seismic radiation previously com- puted yielding the site response. In such a way we are able to study the combined source – path – site response of the medium under the action of input radiation as it arises from the seismic source and propagated through the crust be- neath the site. The DW-FD coupling used here is realized by extending hybrid technique developed first for the 2D case by Zahradnik and Moczo (1996) and generalized by Opršal et al. (2009) to the 3D case. The main advantage of such a hybrid tech- nique is fast evaluation of site effects and seis- mic hazard because it needs less computer memory and time than all-in-one source-path- site computational methods. Computations are made in the time domain. The results are syn- thetic seismograms on the free surface and snapshots of the wave amplitude inside a 3D profile (for further details see Opršal and Zahradnik, 2002). Furthermore, this hybrid technique allows us to study the source-crustal propagation in case we are interested in the crustal propagation only. Our test case deals with a real geological structure (Fäh et al., 2006), representing the stratigraphy underlying the Roman city Augus- ta Raurica (fig. 1), located east of Basel, Switzerland. The 3D structure of the area was retrieved through array investigations (H/V, am- bient vibrations) and borehole data (Fäh et al., Vol52,5,2009 1-12-2009 13:29 Pagina 506 507 Parallelisation technique for serial 3D seismic codes: SMS approach 2006; Opršal and Fäh, 2007), and the topmost structure (namely so-called Top_Muschelkalk interface), was refined in terms of fundamental frequencies fit by modifying the shape of the interface limiting the upper geology from be- low (see fig. 2). The geological model was con- verted into a computational one according to the stability conditions of the numerical scheme (fig. 1). According to Fäh et al. (2006), the section of interest is 1552 m (WE) x 1696 m (SN) large rectangular area with constant spatial sampling of 4 m. The computational domain covers the wave dumping area around, resulting in 442 x 480 grid points corresponding to 1764 m x 1916 m of computational model, respectively. The vertical extent of the computational model is 115 grid points corresponding to ~793 m depth while the geological structure differing from bedrock up to topography peaks is less then 150 m. The vertical grid step is not con- stant, being scaled with respect to the vertical velocity profile. This study also used the same time-step, 0.00023 s and these parameters guarantee the accuracy of the FD computation up to a fre- quency of 12 Hz with a usable range up to 10 Hz. We simulate 5 s ground shaking for a total of 21740 time-steps. A problem of near 73 mil- lion unknowns is solved at each time-step. Al- though the Augusta Raurica site was chosen for the purpose of this test case, our study code can be applied to any 3D stratigraphy, as it de- pends on neither the shape nor the geological details of the site. Indeed, other recent real case studies (Opršal and Fäh, 2007; Sorensen et al., 2006; Opršal et al., 2004; 2005), have used this study code to simulate wave-field time evolu- tion and its interaction with the geological structure. Fig. 1. Geological model at Augusta Raurica. The EW vertical slice is a 3D FD computational model extrac- tion. See also fig. 2. Vol52,5,2009 1-12-2009 13:29 Pagina 507 508 A. Caserta, V. Ruggiero, M. P. Busico and Ivo Opršal 3. Parallelisation Tools to parallelise a code can be divided in- to high-level and low-level programming. Par- allel virtual machine (PVM) and message pass- ing interface (MPI) belong to low-level pro- gramming, where the parallelisation scheme is planned, designed and realized by the user. In more detail, the management of information ex- changed among processors (e.g., message pass- ing, synchronisations, etc.) is realised by the user adding code segments. Although efficient on shared and distributed memory architec- tures, low-level techniques will not be consid- ered in this study as, on the basis of our experi- ence, too much effort (and time) must be spent to convert a serial code into an MPI (or PVM) parallel one (Caserta et al., 2002). To parallelise our serial code we chose tools based on a high level technique where all low level parallel ma- chinery (creation, communications and mainte- nance of processes) is realised by directives to the compiler in a user-hiding mode. The best candidates for such an approach are High Performance Fortran (HPF), OpenMP and Scalable Modelling System (SMS). It must be noted that while OpenMP is only available on shared memory architectures, both HPF and SMS can be used in shared as well as distributed memory environments. Contrary to Caserta et al. (2002), we have avoided adopting OpenMP for several reasons. First of all, our main target is to simulate numerical problems with tens of millions of variables, nearly 20 times the aver- age size problem of Caserta et al. Therefore, we prefer to adapt our code to distributed memory machines that allow to handle much larger amount of memory. We choose a high-level technique based on distributed memory archi- tecture as it is more suited for our purposes. Fig. 2. Intensities for Mw=5.9 1999 Athens-like earthquake simulation in Augusta Raurica (left panel); for de- tails see Fäh et al. (2006) and Opršal and Fäh (2007). Thick white lines are rivers Ergolz (left) and Violenbach (right), red semi-ring denotes the amphitheatre, and the green lines in the southern part are for present-time high- way. The thin white lines are historically re-mapped streets of the upper and lower city of Augusta Raurica. The right panel depicts vertical slices of computational model realization of the geology structure; vacuum shown as black color. Black arrows connect the surface traces of the vertical geology slices. Vol52,5,2009 1-12-2009 13:29 Pagina 508 509 Parallelisation technique for serial 3D seismic codes: SMS approach Because HPF is maintained by third-party and private companies (see the HPF official web- site http://hpff.rice.edu), we have chosen the SMS tool which is freely available. See Foster (1995) for a complete survey of parallel program- ming design and tools on Distributed Memory ar- chitectures, while refer to Nichols et al. (1996) and the official OpenMP web site for Shared Memory references. As far as SMS is concerned, see Govett et al. (2003) and the official SMS website (http://wwwad.fsl.noaa.gov/ac/sms.html). The detection of code segments that must be mod- ified to optimize the serial code and to reduce the memory requirement is performed by the user via the application of a profiling analysis. In such a way, we can ensure a uniformly optimized serial code suitable for parallelisation. Below, we detail how this operation is carried out. 3.1. SMS technique SMS has been developed by the Advanced Computing Branch of the Forecast Systems Laboratory at NOAA (National Oceanic and Atmospheric Administration) (Govett et al., 2003). The user inserts directives in the form of comments (indetified by!sms$) into the exist- ing serial Fortran code. SMS translates the code and directives into a parallel version that runs efficiently on shared and distributed mem- ory high-performance computing platforms. The translation is realized by the Parallelising Pre-Processor (PPP), a component of SMS. In other words, SMS translates user directives in- to MPI-like code. The main advantage of using SMS is that such translation is hidden to the user. This is because SMS is built on top of MPI. As a consequence no complicated com- piler-generated communication statements have to be included in the code. Moreover, SMS contains a number of features to speed up the debugging process and to support incre- mental parallelization. In order to obtain con- structs ready to be automatically parallelised by the SMS, the first step is to upgrade some commands of the original Fortran77 code to Fortran90. In case the code is already written in Fortran90, the first step is to modify the code in such a way that it can directly understand SMS directives. So, according to SMS paral- lelization philosophy, we have first to define the decomposition type through the decom- position declaration as in the follow- ing example module decomp !sms$distribute (DECOMP_XY. 1. 2) end module decomp (3.1) With the previous declaration we have cre- ated a new Fortran90 module to define decom- position structure, DECOMP_XY, needed to support data exchange, local and global address translation, do-loop transformations and data decomposition. The number that follows DE- COMP_XY in the decomposition declaration is the number of dimensions involved in the de- composition declaration. Even though we deal with a 3D problem, the maximum number of dimensions allowed by SMS for parallelizing in the decomposition declaration is 2. The next step is to distribute the array repre- senting the solution of our equations of motion, more specifically the array u (:,:,:,:,:) representing soil displacement, as well as all other auxiliary arrays. According to the previ- ous decomposition declaration we have: !sms$distribute (DECOMP_XY. 1. 2) begin real. allocatable : : u (:,:,:,:,:) !sms$distribute end (3.2) This directive links the data decomposition structures, defined in eq. (3.1), with arrays tar- geted for decomposition. With the previous SMS directive we have chosen to distribute the first two indices of array u. Declarations (3.1) and (3.2) allow SMS to understand how the chosen arrays must be dis- tributed among processors. This is because all information necessary to access, communicate, input and output, decomposed and non-decom- posed arrays are specified in the previous two declarations. To go on with the parallelization, we insert SMS directives in the original For- tran90 code, after the declaration section. The first instruction is the decomposition creation Vol52,5,2009 1-12-2009 13:29 Pagina 509 510 A. Caserta, V. Ruggiero, M. P. Busico and Ivo Opršal where the decomposition name, DECOMP_XY, the global size of the 1st and 2nd decomposed di- mensions, and width of the halo re- gion, <1, 1>, are specified. More in details, represents the maximum number of grid points of the 1st and 2nd dimension that must be decomposed, whereas <1, 1> represents the number of grid points involved in the halo region; we use a nearest neighbours algorithm in which we have specified the halo region made by just one grid point. Of course, such a halo region is in common between two neighbour processors. SMS requires the indices that have to be paral- lelized must be both, the first two in the array and they must refer to the grid’s nodes. Such con- strains might require modifications of the serial code to have the indices in correct order for the SMS parallelisation. The following old Fortran90 instruction in the serial code ALLOCATE (u(ndim,nx,nz,nt)) has the first index not linked with the grid’s nodes, ndim is an index re- ferring to the component, actually. Moreover, the indices of the grid’s nodes, i.e., nx,ny,nz are in the wrong position to be parallelized. So, accord- ing to the aforementioned SMS rules, we have been forced to change the index order as follows ALLOCATE (u(nx,ny,nz,nt,ndim) (3.4) Once we have detected segments of the pro- gram that should be parallelized analyzing the code with a profiling software, we insert SMS directives in the serial code. From such profil- ing analysis it turned out that more than 84% of the time is spent by the serial code in executing the following section for a single time step, that’s why we have focused our attention on it: !sms$parallel(DECOMP_XY, , ) begin !sms$exchange (u) ..... ..... do kz=2,nz-1 do ky=2,ny-1 do kx=2, nx-1 call stencil_full(kx,ky,kx) ! full form stencil enddo ennddo enddo ..... ..... !sms$parallel end ..... ..... subroutine stencil_full(kx,ky,kz) ! the 3-D stencil: ..... ..... unew=2.*u(kx,ky,kz,2,kul)-(u(kx,ky,kz,1,kul)*dtdt_rho* (& & u(kx+1,ky-1,2,ku2)-u(kx,ky-1,kz,2,ku2)) - & &(u(kx,ky,kz,2,ku1)-u(kx-1.ky,kz,2,ku1))))+ & & ((muf*r1_dyky*(u(k,ky+1,kz,2,ku1)-u(kx,ky,kz,2ku1)))- & ..... ..... u(kx,ky,kz,kt3,ku1)=unew end soubroutine stencil_full (3.5) !sms$create_decomp(DECOMP_XY, , <1, 1>) (3.3) Vol52,5,2009 1-12-2009 13:29 Pagina 510 511 Parallelisation technique for serial 3D seismic codes: SMS approach The parallel section starts with the SMS in- struction !sms$parallel(DECOMP_XY, , ) begin. This instruction de- fines a region over which parallel computations will be done on each processor’s local data, as defined by the given data decomposition DE- COMP_XY. All do-loops inside a parallel region that reference the specified loop variables (, ) will be translated. It is worth noting that the order of the indices in the u array coupled with the order of the nest- ed loops in (3.5), is the most efficient combina- tion in respect to the so called easy memory ac- cess patterns problem (Dowd and Severance, 1998; Hennessy and Patterson, 2006). This holds for both serial and parallel codes provided they are written in Fortran language. For differ- ent programming languages the coupling be- tween the order of indices in the array and the order of nested loops might be different. The SMS directive !sms$exchange (u) has been introduced to communicate with neighbouring processors to update halo regions and to maintain consistency with the serial code in the updating. Such a communication is nec- essary because the array u shows a index de- pendence on the first neighbouring grid points, e.g. u(kx+1,ky 1,kz,2,ku2). Infor- mation provided by (3.2) is used to generate the correct communication code for our exchanged variable. In other words, !sms$exchange (u) is in charge of the message passing. Time updating procedure has not been changed from serial to parallel code, details about such a pro- cedure can be found in the Opršal and Zahrad- nik (2002). Probable serial parts of the code, that cannot be parallelized, are nested into a parallel one by !sms$serial directive. The parallel region is closed by adding the directive !sms$par- allel end. It is worth noting that we have obtained the parallel version inserting directives that repre- sent only 6% of the whole serial code. 3.2. SMS performances For this study, numerical tests are run on an IBM SP5 cluster, with 48 processors arranged in six symmetric multi processor (SMP) ma- chines (8 processors each). Communication be- tween processors is realised via shared memory within each 8-way SMP machine, while a high performance switch (HPS) is used to realize communication between the six 8-way SMP machines in a cluster. Each processor has 1.9 Ghz Power5, the integrated cluster has a peak performance rate of 364 Gflops and a global RAM of 192 Gbytes. We used the XLF IBM native Fortran third party compiler optimized according to the target architecture, moreover, we have optimized the SMS tool to the IBM SP5 cluster. In order to achieve suitable per- formances the first step is to balance the com- putational work among processors. By default, SMS uses a pre-determined set of rules to de- cide how grid points are assigned to each process and how many processes are allocated to each decomposed dimension. Roughly speaking, the grid-points are distributed evenly among the processes along a given decomposed dimension. In order to adopt its domain decomposition we have to run our SMS executable code as fol- lows smsRun np 3dcpp_SMS where 3dcpp_SMS is our executable whereas np is the number of processors. Ideally, each processor will have the same number of computational grid-points. In prac- tice, most models have computations that vary spatially so some processes may have more work than others. This is commonly known as load imbalance. Load imbalances slow down a parallel program because processes with less work are forced to wait for processes with more work to do. The criterion used by SMS to divide the computational workload is to minimize infor- mation that must be exchanged among proces- sors. However, attention should be paid in real- izing the domain decomposition process. This can be easily done by the user because SMS allows to specify the assignment of processors using a process configuration file in the form of a Fortran name list. In details, we run our executable SMS code as follows Vol52,5,2009 1-12-2009 13:29 Pagina 511 512 A. Caserta, V. Ruggiero, M. P. Busico and Ivo Opršal smsRun -cf my_config 3dcpp_SMS where my_config is our configuration file &decomp decomp1_name = ‘decomp_xy’, decomp1_nps = 2 23/ in which we specify, in the last line, the number of processors along x-direction and y-direction, respectively. In this case we have 46 processors available. Figure 3 shows the time needed for a single time-step versus number of processors for both the domain decomposition suggested by SMS (Default in fig. 3), and the one preferred by us (User in fig. 3). Oscillations in time are clearly recognized as the number of processors in- crease. Such an effect is strictly linked to the load balancing problem. Indeed, in our case SMS has to divide 480 nodes along the x axis and 442 nodes along the y-axis. In the worst case with 46 processors (see fig. 3), SMS real- ized such division by default assigning 23 processors along x-axis and 2 processors along y-axis. This causes a big load imbalance well represented in the time needed for a single com- putational step. To fix the load balancing problem, we have inverted the default SMS decomposition as- signing 2 processors along x-axis and 23 processors along y-axis. Our choice, of 46 processors, improved the time needed for a sin- gle iteration by almost 19% (see fig. 3). Such a procedure has been adopted to improve the do- main decomposition for other numbers of processors. A useful index to check any gain in per- formance with respect to the serial code is the Speed-up, S(n). It is defined as the ratio be- tween the real time employed by one processor to run the code, T (1), and the real time needed by n processors, T (n): . Figure 4 left panel compares the Speed-up of both SMS default domain decomposition and our specified decomposition with ideal per- formance. It emerges that for our modeling the Speed-up scaling is satisfactory up to 24 S n T n T 1 =] ] ] g g g processors. Beyond this threshold the onset of saturation for S(n) starts. Saturation is an im- portant marker, to illustrate its meaning we have to consider that the user-hiding message passing (among processors) is a time consum- ing process. Saturation starts when the mes- sage-passing time is comparable with the com- putation time needed by each processor to run its own part of the workload. The communica- tion time increases with the number of proces- sors while the computational time of each processor decreases. Therefore, there is no ad- vantage in increasing the number of processors within a Speed-up saturated regime. In looking at fig. 4 left panel, we see that the saturation is approached through oscillations in the Speed-up values and this is not surprising taking into account the load balancing analysis discussed above. Figure 4 right panel shows the performance of the parallel code in terms of the Efficiency, which is defined as the ratio between the Speed-up and the number of processors: . By definition, Efficiency is a measure of the relevance of user-hiding message passing for a single process. From fig. 4 right panel, the or- der of magnitude of the improvement achieved can be seen comparing the values of the Effi- ciency for the same number of processors in the case of the User and Default domain decompo- sition. As an example, looking at fig. 4 right panel, at 36 processors we see that the Efficien- cy corresponding to the User decomposition is almost 10% higher that the Default one. Moreover, in looking at the Efficiency plot we see an improvement in performances each time a new 8 processors SP5 machine is added to the cluster. This effect is better seen and rec- ognizable in the Efficiency curve corresponding to our domain decomposition choice. In fact, the Efficiency shows a decrease followed by a flat curve almost after every 8 processors. The same behavior has been observed in Caserta et al. (2002) for HPF; in our case this trend is more difficult to recognize because load bal- ancing effects are superimposed. It follows from this that advantages in using SMS are ob- E n n S n =] ] g g Vol52,5,2009 1-12-2009 13:29 Pagina 512 513 Parallelisation technique for serial 3D seismic codes: SMS approach Fig. 3. Iteration time per time step (in milliseconds) vs. number of processors for the domain decomposition suggested by SMS (Default) and that one suggested by us (User). The plot starts from 10 processors in order to better magnify the differences between the two decompositions. Under 10 processors no difference takes place. Fig. 4. Plots of Speed-up (left panel) and Efficiency (right panel) vs. number of processors for both the domain decompositions as planned and realized by SMS (Default) and as planned and realized by us (User). Compari- son shows up to 48 processors, i.e., using six 8-processor SP5 machines in a cluster. Straight line with asterisks represents the ideal Speed Vol52,5,2009 1-12-2009 13:29 Pagina 513 514 A. Caserta, V. Ruggiero, M. P. Busico and Ivo Opršal tained when a cluster of multiprocessor ma- chines are used in a distributed memory envi- ronment. 4. Conclusions We investigated a simple and efficient way to parallelise a serial code, namely 3D seismo- logical code able to simulate the interaction be- tween seismic radiation and near-surface geo- logical structures. The chosen case study code (Opršal and Zahradnik, 2002) and the case test geological site (Fäh et al., 2006) are representa- tive of a wide set of seismological problems dealing with the aforementioned dynamic inter- action. Our parallel approach is based on the SMS high-level tool. It allows the user to avoid all the low-level machinery needed to plan and re- alise the management of information ex- changed among processors, apart from taking care of the domain decomposition problem as discussed in the previous paragraph. This does not mean algorithmic modifications in the seri- al code were not performed by the user to in- crease both the Speed-up and the Efficiency. On the contrary, these latter must be included in the profiling analysis applied to the serial code. In the parallelisation of our case study, nu- merical tests indicate care must be taken in the domain decomposition process to reduce oscil- lations in both Speed-up and Efficiency, im- proving parallel performances. Generally speaking, in a shared memory environment communication is much faster because it takes place in the same cluster node. Nevertheless, we have seen less degradation of performances in adding multiprocessor machines. This phe- nomenon is clearer in the Efficiency plot (fig. 3 right panel) where the Efficiency remains al- most constant when we add a new 8 processor machine, then in the Speed-up plot (fig. 3 left panel); this is mainly due to the load balance problem that is more evident in the Speed-up plot. Such an effect was also observed in Caserta et al. (2002) with smaller size numerical prob- lems, when a new multiprocessor machine is added under HPF compiler. Moreover, porting from serial to parallel versions can be realized relatively simply for both environments. In fact, the porting is real- ized through the following simple steps: – definition of the decomposition structure (see eq. 3.1), – distribution of variables among processors (see eq. 3.2), – domain decomposition creation (see eq. 3.3), – creation of parallel region and exchange of variables among processors, i.e., message passing (see eq. 3.5). Such SMS directives represent around 6% of serial code instructions. On the other hand, SMS limitations are mainly: – it can be used for Fortran codes only, – it is intended for applications using struc- tured Cartesian grids, – maximum number of array indices that can be distributed is two. We point out that they must be the first two indices in the array. This constrain might require modifications of the se- rial code to have the indices in the required or- der. This is what we have been forced to do as in eq. 3.4. No further limitations are arranged concern- ing, in particular, the numerical implementa- tion, i.e., boundary elements, finite elements, spectral elements, finite differences, and so on. Under the above limitations, the simplicity and efficiency of the SMS tool are evident as it may allow broader and/or detailed physical models to be studied due to the use of parallel ma- chines. It is worth noting that the majority of 3D seismic codes that represent our target are writ- ten in Fortran and are based on the finite differ- ences method on a Cartesian grid. On the other hand, no limitations so, we can conclude that for our purposes the previous limitations are not so severe. Acknowledgements We thank Piero Lanucara for useful discus- sions on SMS and HPF tools and for help in us- ing the sophisticated cluster of parallel machines Vol52,5,2009 1-12-2009 13:29 Pagina 514 515 Parallelisation technique for serial 3D seismic codes: SMS approach at CASPUR computer centre. We also wish to thank Antonio Rovelli for help and criticism. Part of the research activities were carried out by A. Caserta, for his PhD thesis, in cooperation with the Charles University in Prague (Czech Republic) and in the frame of the projects GACR 205/07/0502 and MSM 0021620860. Partially supported by Japanese Society for Promotion of Science award FY2006/P06320. REFERENCES AKI, K. (1988): Local site effect on ground motion, in Earthquake engineering and soil dynamics, II: Recent advances in ground-motion evaluation, edited by J. LAWRENCE VON THUN, ASCE, 103-155. AKI, K. (1993): Local site effects on weak and strong ground motions, Tectono-physics, 218, 93-111. BAO, H., J. BIELAK, O. GHATTAS, D.R. O’HALLARON, L.F. KALLIVOKAS, J.R. SHEWCHUK and J. XU (1996): Earth- quake Ground Motion Modeling on Parallel Comput- ers, Supercomputing ‘96, (Pittsburgh, Pennsylvania, November 1996). BARD, P.-Y. (1995): Effects of surface geology on ground motion: recent results and remaining issues, in Pro- ceedings of the 10th European Conference on Earth- quake Engineering, (Vienna, Austria), 305-324. BARD, P.-Y. (1998): Microtremor measurements: a tool for site effect estimation?, in Proceeding of the Second Inter- national Symposium on the Effects of Surface Geology on Seismic Motion, (Yokohama, Japan), pp. 1251-1279. BIELAK, J., O. GHATTAS and H. BAO (1998): Ground motion modeling using 3D finite element methods, in Proceed- ings of the 2nd International Symposium on the Effects of Surface Geology on the Seismic Motion, (Yokohama, Japan), 121-133. BONNEFOY-CLAUDET, S., F. COTTON and P.-Y. BARD (2006): The nature of noise wavefield and its applications for site effects studies. A literature review, Earth Science Review, 79, 205-227. CARA, F., A. ROVELLI, G. DI GIULIO, F. MARRA, T. BRAUN, G. CULTRERA, R. AZZARA and E. BOSCHI (2005): The role of site effects on the intensity anomaly of S. Giu- liano di Puglia inferred from aftershocks of the Molise, Central Southern Italy, sequence, November 2002, Bul- letin of the Seismological Society of America, 95 (4), 1457-1468. CASERTA, A. and P. LANUCARA (2000): Computer animation as a tool to visualize effects of seismic wave propaga- tion inside heterogeneous media, Annali di Geofisica 43 (1), 119-134. CASERTA, A., F. BELLUCCI, G. CULTRERA, S. DONATI, F. MARRA, G. MELE, B. PALOMBO and A. ROVELLI (2000): Study of site effects in the area of Nocera Umbra (Cen- tral Italy) during the 1997 Umbria-Marche seismic se- quence, Journal of Seismology, 4, 555-565. CASERTA, A., V. RUGGIERO and P. LANUCARA (2002): Nu- merical Modelling of dynamical interaction between seismic radiation and near-surface geological struc- tures: a parallel approach, Computer & Geoscience, 28, 1069-1077. CULTRERA, G., A. ROVELLI, G. MELE, R. AZZARA, A. CASER- TA and F. MARRA (2003): Azimuth dependent amplifi- cation of the weak and strong ground motions within a fault zone (Nocera Umbra, central Italy), Journal of Geophysical Research, 108, 2156-2170. DOWD, K. and C. SEVERANCE (1998): High performance computing, (O’Reilly, Cambridge- Kölon-Paris-Se- bastopol-Tokyo), pp. 446. DI GIULIO G., A. ROVELLI, F. CARA, R.M. AZZARA, F. MAR- RA, R. BASILI and A. CASERTA (2002): Long duration asynchronous ground motions in the Colforito plain, central Italy, observed on a two dimensional dense ar- ray, Journal of Geophysical Research, 108, 2486-2498. FÄH, D., C. IODICE and P. SUHADOLC (1993): A new method for realistic estimation of seismic ground motion in megacities: the case of Rome, Earthquake Spectra, 4, 643-668. FÄH, D., S. STEIMEN, I. OPRŠAL, J. RIPPERGER, J. WÖSSNER, R. SCHATZMANN, P. KÄSTLI, I. SPOTTKE and P. HUGGEN- BERGER (2006): The earthquake of 250 A.D. in Augus- ta Raurica, A real event with a 3D site effect?, Journal of Seismology, 10 (4), 459-477. FOSTER, I. (1995): Designing and Building Parallel Pro- grams: Concept and Tools for Parallel Software Engi- neering. Addison-Wesley, Reading, MA, pp. 381. GOVETT, M., L. HART, T. HENDERSON, J. MIDDLECOFF and D. SCHAFFER (2003): The scalable modeling system: di- rective-based code parallelization for distributed and shared memory computers, Parallel Computing, 29 (8), 995-1020. GRAVES, R.W. (1998): Three-dimensional computer simula- tions of realistic earthquake ground motions in regions of deep sedimentary basin. in Proceedings 2nd Interna- tional Symposium on the effects of surface geology on seismic motion, Yokohama, Japan, pp. 103-120. HENNESSY, J. and D. PATTERSON (2006): Computer Archi- tecture: A Quantitative Approach [With CD-ROM], (Morgan Kaufmann Publishers, San Francisco, CA), pp. 672. MITCHEL, A.R. and D.F. GRIFFITHS (1980): The Finite Dif- ference Method in Partial Differential Equations, (Wi- ley, New York), pp. 272. MOCZO, P., J. KRISTEK and E. BYSTRICKY (2001): Efficien- cy and optimization of the 3-D finite-difference model- ing of seismic ground motion, Journal of Computation- al Acoustics, 9 (2), 593-609. MOCZO, P., J. KRISTEK, M. GALIS, P. PAZAK and M. BALAZOV- JECH (2007): The Finite-Difference and Finite-Element Modeling of Seismic Wave Propagation and Earthquake Motion, Acta Physica Slovaca, 57 (2), 177-406. NICHOLS, B., D. BUTTLAR and J. FARRELL (1996): Pthreads Programming, (1st ed. O’Reilly, Sebastopol, CA), pp. 283. OLSEN, K.B. and R.J. ARCHULETA (1996): Three-dimen- sional simulations of earthquakes on the Los Angeles fault system, Bulletin of the Seismological Society of America, 86, 575-596. OLSEN, K.B., A. AKINCI, A. ROVELLI, F. MARRA and L. MALAGNINI (2006): 3D ground-motion estimation in Rome, Italy, Bulletin of the Seisological Society of America, 96, 133- 146. Vol52,5,2009 1-12-2009 13:29 Pagina 515 516 A. Caserta, V. Ruggiero, M. P. Busico and Ivo Opršal OPRŠAL, I. and J. ZAHRADNIK (1999) Elastic finite-differ- ence method for irregular grids. Geophysics, 64 (1), 240-250. OPRŠAL, I. and J. ZAHRADNIK (2002): Three-dimensional fi- nite difference method and hybrid modeling of earth- quake ground motion, Journal Of Gephysisical Re- search, 107, B8, 2161. OPRŠAL, I., J. ZAHRADNÍK, A, SERPETSIDAKI and G.-A. TSE- LENITS (2004): 3D hybrid simulation of the source and site effects during the 1999 Athens earthquake, in Proc. Of 13th World Conference on Earthquake Engineering, (Vancouver, B.C., Canada, August 1-6, 2004), Paper No. 3337, pp. 15. OPRŠAL, I., D. FÄH, M. MAI and D. GIARDINI (2005): Deter- ministic earthquake scenario for the Basel area: Simu- lating strong motions and site effects for Basel, Switzerland, J. Geophys. Res., 110 (B4), 19, B04305, doi:10.1029/2004JB003188. OPRŠAL, I. and D. FÄH (2007): 1D vs. 3D strong ground mo- tion hybrid modelling of site, and pronounced topogra- phy effects at Augusta Raurica, Switzerland - earth- quakes or battles?, in Proceedings of 4th International Conference on Earthquake Geotechnical Engineering June 25-28, 2007, Greece, Paper No. 1416, pp. 12. OPRŠAL, I., C. MATYSKA and K. IRIKURA (2009): The source-box wave propagation hybrid methods: general formulation and implementation, Geophysical Journal International, 176 (2), 555-564, doi: 10.1111/j.1365- 246X.2008.03986x. OPRŠAL, G.F., F. ROMANELLI and F. VACCARI (2001): Seis- mic wave propagation in laterally heterogeneous anelastic media: theory and applications to seismic zonation, Advances in Geophysics, 43, 1-95. PITARKA, A. (1999): 3D elastic finite difference modeling of seismic wave propagation using staggered grid non- uniform spacing, Bulletin of the Seismological Society of America, 89, 54-68. ROVELLI, A., A. CASERTA, L. MALAGNINI and F. MARRA (1994): Assessment of potential strong ground motions in the city of Rome, Annali di Geofisica, 37 (6), 1745-1769. SÁNCEZ-SESMA, F.J. and F. LUZON (1995): Seismic response of three-dimensional valleys for incident P, S, and Rayleigh waves, Bulletin. of the Seismological Society of America, 85, 269-284. SORENSEN, M.B., I. OPRŠAL, S. BONNEFOY-CLAUDET, K. ATAKAN, P.M. MAI, N. PULIDO and C. YALCINER (2006): Local site effects in Ataköy, Istanbul, Turkey, due to a future large earthquake in the Marmara Sea, Geophy- sical Journal International, 167, 1413-1424, doi: 10.1111/j.1365-246X.2006.03204.x. ZAHRADNIK, J., P. MOCZO and F. HRON (1993): Testing four elastic finite-difference schemes for behaviour at dis- continuities, Bulletin of the Seismological Society of America, 83, 107-129. ZAHRADNIK, J. and P. MOCZO (1996): Hybrid seismic model- ing based on discretewave number and finite difference methods, Pure and Applied Geophysics, 148, 21-38. (received, February 25, 2009; accepted, July 14, 2009) Vol52,5,2009 2-12-2009 10:54 Pagina 516