Int. J. of Computers, Communications & Control, ISSN 1841-9836, E-ISSN 1841-9844 Vol. III (2008), No. 2, pp. 172-182 Analytical Model For a Multiprocessor With Private Caches And Shared Memory Angel Vassilev Nikolov Abstract: We develop an analytical model of multiprocessor with private caches and shared memory and obtain the following results: the instantaneous state proba- bilities and the steady-state probabilities of the system. Both transient behaviour and equilibrium can be studied and analyzed. We showed that results can be applied to determine the output parameters for both blocking and non-blocking caches. Keywords: Invalidate cache-coherence protocol, queuing system, discrete transform 1 Introduction Shared memory multiprocessors are widely used as platforms for technical and commercial comput- ing [2]. Performance evaluation is a key technology for design in computer architecture. The continuous growth in complexity of systems is making this task increasingly complex [7]. In general, the problem of developing effective performance evaluation techniques can be stated as finding the best trade-off between accuracy and speed. The most common approach to estimate the performance of a superscalar multiprocessor is through building a software model and simulating the execution of a set of benchmarks. Since processors are synchronous machines, however, simulators usually work at cycle-level and this leads to enormous slow- down [9]. It might take hours even days to simulate. For memory structures relatively accurate analytical models were developed [3, 7, 9, 10] through extensive use of various queuing systems. Open queue system with Poisson arrivals and exponential service times is considered quite good for description of memory hierarchies [7]. Our focus is on the impact of the cache-coherence protocols on the overall system performance. The most commonly used technique for this purpose is the Mean Value Analysis (MVA) [3, 5, 7, 8, 9]. It allows the total number of the customers to be fixed (closed queue system), and this seems to be more adequate representation of the processes of self-blocking requestors [5]. Calculations of output parameters such as residency times, waiting times and utilization are shown in [3, 8, 9]. MVA is based on the forced flow that means in equilibrium output rate equals input rate. However, instantaneously, we can have input rate different from output rate, so that the instantaneous probabilities could be different from equilibrium [7]. MVA offers no possibility to study transient effects. Moreover, the assumption of exponential service times is not realistic, in fact all bus access times and memory access times are constants. It will be seen later in this paper that state probabilities depend on the server’s time density function. We use the technique of Markov processes to describe the behaviour of the multiprocessor imple- menting cache-coherence protocols. 2 Definition and Analysis of the Model A multiprocessor consists of several processors connected together to a shared main memory by a common complete transaction bus. Each processor has a private cache. When a processor issues a request to its cache, the cache controller examines the state of the cache and takes suitable action, which may include generating bus transaction to access main memory. Coherence is maintained by having all cache controllers "snoop" on the bus and monitor the transaction. Snoopy cache-coherence protocols fall in two major categories: Invalidate and Update [2, 3, 10]. Invalidating protocols are studied here but the concepts can be applied with some modifications to updating protocols too. Transactions may Copyright © 2006-2008 by CCC Publications Analytical Model For a Multiprocessor With Private Caches And Shared Memory 173 or may not include the memory block and the shared bus. Typical transaction that does not include memory block is Invalidate Cache Copy which occurs when a processor requests writing in the cache. All other processors simply change the status bit(s) of their on copies to Invalid. If the memory block is uncached or not clean it can be uploaded from the main memory, but in today’s multiprocessors it is rather uploaded from another cache designated as Owner (O) (cache-to cache transfer). Memory-to- cache transfer occurs when the only clean copy is in the main memory. A cache block is written back (WB) in the main memory (bus is used) when a dirty copy is evicted [6]. The bus and the main memory are also used when synchronization procedures are executed [2]. Apparently the bus can be considered as the bottleneck of the system. In terms of the queuing theory processors can be viewed as customers (clients) and the bus can be viewed as a server. Inter-arrival times are exponentially distributed with parameter λ . This assumption is adequate for most applications [7]. Requests are served on First Come First Served (FCFS) basis. Immediately after issuing a request for cache-to-cache transfer or synchronization procedure the customer blocks itself. The service time for blocking request has a density function f1(x). When service is completed the processor (customer) resumes processing with probability p or resumes processing and generates a new request with probability q (p+q=1). Details on how to obtain the input parameters are given in [2, 3, 8, 9]. This new request has a different density function f2(x) and corresponds to WB transaction. It does not block the customer but the server is held until completion of WB transaction therefore adding to the queue. The system can be in one of the following states: 1) N: all N customers are doing internal processing; 2) j, 1: j customers are doing internal processing (N-j are blocked respectively) and all requests are of type 1(0≤j≤N-1), 3) j ,2: j customers are doing internal processing , the server is serving request of type 2, and N-j customers are waiting in the queue for service of type 1 (0≤j≤N). The transitions between these states are illustrated in Fig. 1. Throughout this paper we use the following notations PN (t) Probability[all N customers are doing internal processing at time t] P j,i(t,x) Probability[j customers are doing internal processing, N-j are in the queue and/or in the server, and the server is busy doing service of type i at time t and the elapsed service time lies between x and x+dx ] P j,i(x) Probability[in the equilibrium state j customers are doing internal processing, N-j are in the queue and/or in the server, the server is busy doing service of type i and the elapsed service time lies between x and x+dx ] P j,i(t) Probability[j customers are doing internal processing, N-j are in the queue or in the server, the server is busy doing service of type i at time t] PN , P j,i steady-state probabilities. PN = limt→∞ PN (t), Pj,i = ∫ ∞ 0 Pj,i(x)dx βi = jλ Fi(x) cumulative distribution function (c.d.f.) of the service time of type i ; i=1,2 fi(x) probability density function (p.d.f.) of the service time of type i ; i=1,2 δ m,n Kronecker delta 1 µi = ∫ ∞ 0 xfi(x)dx i=1,2 hi(x) = fi(x) 1−Fi(x) service rate for type i; i=1,2 fi(s), fi(s+βn), fi(βn) Laplace Transforms (LT) of fi(x) t.u. time unit Viewing the nature of the system, we obtain the following set of integro-differential equations [ d dt + βn ] PN = p ∫ t 0 PN−1(t, x)h1(x)dx + ∫ t 0 PN,2(t, x)h2(x)dx (1) 174 Angel Vassilev Nikolov Figure 1: State-transition diagram of the model. 1 ≤ j ≤ N [ d dt + ∂ ∂ t + βN−1 + h1(x) ] PN−1,1(t, x) = 0 (2) [ d dt + ∂ ∂ t + βN + h2(x) ] PN,2(t, x) = 0 (3) [ d dt + ∂ ∂ t + β j + hi(x) ] Pj,i(t, x) = β j+1Pj+1,i(t, x) (4) for i = 1, 1 ≤ j ≤ N −1; i = 2, 1 ≤ j ≤ N [ d dt + ∂ ∂ t + hi(x) ] P0,i(t, x) = β1P1,i(t, x) (5) for i=1,2 having the following boundary and initial conditions Pj,1(t, 0) = (1−δ j,0)p ∫ ∞ 0 Pj−1,1(t, x)h1(x)dx + ∫ ∞ 0 Pj,2(t, x)h2(x)dx + δ j,N−1βN PN (t) (6) for 0 ≤ j ≤ N −1 Pj,2(t, 0) = q ∫ ∞ 0 Pj−1,1(t, x)h1(x)dx (7) for 1 ≤ j ≤ N PN (0) = 1, P0,2(t, 0) = 0, Pj,i(0, 0) = 0 (8) for i = 1, 1 ≤ j ≤ N −1; i = 2, 1 ≤ j ≤ N By using Laplace transform and discrete transform [4, 8] the above equations are transformed as follows (s + βn) PN = 1 + p ∫ ∞ 0 PN−1(s, x)h1(x)dx + ∫ ∞ 0 PN,2(s, x)h2(x)dx (9) Analytical Model For a Multiprocessor With Private Caches And Shared Memory 175 [ s + d dx + β j + hi(x) ] u j,i(s, x) = 0 (10) for i = 1, 1 ≤ j ≤ N −1; i = 2, 1 ≤ j ≤ N [ s + d dx + +hi(x) ] P0,i(s, x) = β1P1,i(s, x) (11) for i=1,2 where u j,1(s, x) = ∑N−1n= j ( n j ) Pn,1(s, x), Pj,1(s, x) = ∑N−1n= j (−1)n− j ( n j ) un,1(s, x) for 1 ≤ j ≤ N − 1, and u j,2(s, x) = ∑Nn= j ( n j ) Pn,2(s, x), Pj,2(s, x) = ∑Nn= j(−1)n− j ( n j ) un,2(s, x) for 1 ≤ j ≤ N. Let v j,i(s, x) = u j,i(s,x) 1−Fi(x) and P ′ 0,1(s, x) = P0,1(s,x) 1−Fi(x) . Then from (10 and 11) we have after some transfor- mations [ s + d dx + βi ] v j,i(s, x) = 0 for i = 1, 1 ≤ j ≤ N −1; i = 2, 1 ≤ j ≤ N and [ s + d dx ] P ′ 0,i(s, x) = β1P1,i(s, x) for i = 1, 2. Hence the solutions of (9-11) are u j,i(s, x) = [1−Fi(x)]u j,i(s, 0)e−(s+βi)x (12) PN (s) = 1 + p f1(s + βN−1)uN−1,1(s, 0) + f2(s + βN )uN,2(s, 0) s + βN (13) P0,1(s, x) = [1−F1(x)]β1e−sx [ P0,1(s, 0) + N−1 ∑ n=1 (−1)n−1n 1−e −βnx βn un,1(s, 0) ] (14) P0,2(s, x) = [1−F2(x)]β1e−sx [ N ∑ n=1 (−1)n−1n 1−e −βnx βn un,2(s, 0) ] . (15) By integrating (12, 14, and 15) we obtain the LT of the instantaneous probabilities Pj,1(s) = N−1 ∑ n= j (−1)n− j ( n j ) [ 1− f1(s + βn) s + βn ] un,1(s, 0) (16) for 1 ≤ j ≤ N −1 Pj,2(s) = N ∑ n= j (−1)n− j ( n j ) [ 1− f2(s + βn) s + βn ] un,2(s, 0) (17) for 1 ≤ j ≤ N P0,1(s) = P0,1(s, 0) [ 1− f1(s) s ] + β1 N−1 ∑ n=1 (−1)n−1n [ 1− f1(s) s − 1− f1(s + βn) s + βn ] un,1(s, 0) βn (18) 176 Angel Vassilev Nikolov P0,2(s) = β1 N ∑ n=1 (−1)n−1n [ 1− f2(s) s − 1− f2(s + βn) s + βn ] un,2(s, 0) βn . (19) Taking LT of (6-7) and using (8 and 12-15) we get after some transformations the following system of linear equations N−1 ∑ n= j (−1)n− j ( n j ) un,1(s, 0) = p N−1 ∑ n= j (−1)n− j+1 ( n j −1 ) f1(s + βn)un,1(s, 0) (20) + N ∑ n= j (−1)n− j ( n j ) f2(s + βn)un,2(s, 0) + δ j,N−1βN PN for 2 ≤ j ≤ N −1 N ∑ n= j (−1)n− j ( n j ) un,2(s, 0) = q N−1 ∑ n= j−1 (−1)n− j+1 ( n j −1 ) f1(s + βn)un,1(s, 0) (21) for 2 ≤ j ≤ N N−1 ∑ n=1 (−1)n−1 ( n j ) un,1(s, 0) = pP0,1(s, 0) f1(s) + pβ1 [ N−1 ∑ n=1 (−1)n−1n f1(s)− f1(s + βn) βn un,1(s, 0) ] (22) N ∑ n=1 (−1)n−1 ( n j ) un,2(s, 0) = qP0,1(s, 0) f1(s) + qβ1 [ N−1 ∑ n=1 (−1)n−1n f1(s)− f1(s + βn) βn un,1(s, 0) ] (23) Coefficients u j,i(s,0) can now be determined from the above equations. We can apply the final-value theorem to (16-19) to obtain the steady-state probabilities but it will require use of the L’Hopital rule and seems difficult and impractical [11]. Instead we set the following differential equations βnPN = p ∫ ∞ 0 PN−1(x)h1(x)dx + ∫ ∞ 0 PN,2(x)h2(x)dx (24) [ d dx + βN−1 + h1(x) ] PN−1,1(x) = 0 (25) [ d dx + βN + h2(x) ] PN,2(x) = 0 (26) [ d dx + β j + hi(x) ] Pj,i(x) = β j+1Pj+1,i(x) (27) for i = 1, 1 ≤ j ≤ N −1; i = 2, 1 ≤ j ≤ N [ d dx + hi(x) ] P0,i(x) = β1P1,i(x) (28) for i=1,2. Equations (24-28) are to be solved under the following boundary conditions and normalizing condition Pj,1(0) = (1−δ j,0)p ∫ ∞ 0 Pj−1,1(x)h1(x)dx + ∫ ∞ 0 Pj,2(x)h2(x)dx + δ j,N−1βN PN (29) Analytical Model For a Multiprocessor With Private Caches And Shared Memory 177 for 0 ≤ j ≤ N −1 Pj,2(0) = q ∫ ∞ 0 Pj−1,1(x)h1(x)dx (30) for 1 ≤ j ≤ N −1 P0,2(0) = 0 (31) PN + N−1 ∑ j=0 Pj,1 + N ∑ j=0 Pj,2 = 1. (32) The solutions of (2.29-2.32) are PN = 1 + p f1(βN−1)uN−1,1(0) + f2(βN )uN,2(0) βN (33) Pj,1 = N−1 ∑ n= j (−1)n− j ( n j ) [ 1− f1(βn) βn ] un,1(0) (34) for 1 ≤ j ≤ N −1 Pj,2 = N ∑ n= j (−1)n− j ( n j ) [ 1− f2(βn) βn ] un,2(0) (35) for 1 ≤ j ≤ N −1 P0,1 = P0,1(0) µ1 + N−1 ∑ n= j (−1)n− jn [ 1 µ1 − 1− f1(βn) βn ] un,1(0) (36) P0,2 = N ∑ n= j (−1)n− jn [ 1 µ2 − 1− f2(βn) βn ] un,2(0) (37) For u j,i(0) and P0,1(0) we have N−1 ∑ n= j (−1)n− j ( n j ) un,1(0) = p N−1 ∑ n= j (−1)n− j+1 ( n j −1 ) f1(βn)un,1(0) + δ j,N−1βN PN (38) for 2 ≤ j ≤ N −1 N−1 ∑ n= j (−1)n− j ( n j ) un,2(0) = q N−1 ∑ n= j (−1)n− j+1 ( n j −1 ) f2(βn)un,2(0) (39) for 2 ≤ j ≤ N −1 P0,1(0) = β1 N ∑ n=1 (−1)n−1n [ 1− f2(βn βn ] un,2(0) (40) N−1 ∑ n=1 (−1)n−1nun,1(0) = pP0,1(0) + pβ1 N ∑ n=1 (−1)n−1n [ 1− f1(βn βn ] un,1(0) + N ∑ n=1 n f2βnun,2(0) (41) N ∑ n=1 (−1)n−1nun,2(0) = qP0,1(0) + qβ1 N ∑ n=1 (−1)n−1n [ 1− f1(βn βn ] un,1(0) (42) The coefficients u j,i(0) can be determined from (32) and (38-42). 178 Angel Vassilev Nikolov 3 Examples In order to obtain the transient state probabilities first we have to determine PN (s) and Pj,i(s) from (16- 19) and (20-24) and then to apply the Inverse Laplace Transform to them. We used the packages of Maple 8 on a standard PC platform under Windows XP for these computations [12]. Results were produced and printed in less than a second. For N=4 the instantaneous probabilities are listed in Appendix A. Various performance characteristics can be computed using the steady-state probabilities. For ex- ample, the average number of blocked customers (ANBC) in the case of blocking caches will be given by ANBC = 2 ∑ i=1 N ∑ j=0 (N − j)Pj,i. (43) In the case of non-blocking caches ANBC will be ANBC = N ∑ j=0 (N − j −1 + k)Pj,1 + N−1 ∑ j=0 (N − j)Pj,2. (44) where k is the ratio of average memory stall time [2] . k depends strongly on the application. (1-k) actually refers to the fraction time the processor is consuming data while cache-to-cache or memory-to- cache transfer is in progress. In Appendix B we list the ANBC for two popular service time distributions: exponential and erlangian [1], for blocking and fully non-blocking caches (k=0). The time to solve (33-42) and calculate ANBC was meaninglessly short. 4 Concluding Remarks This work presented a model for a shared bus, shared memory multiprocessor with private caches and captures the whole spectrum of Invalidate type cache coherence protocols. Although we started with fairly sophisticated set of integro-differential equations, the output of the model is a set of few linear equations from which the state probabilities can be determined. The approach eliminates the main drawbacks of the most commonly used MVA analysis: inability to deal with transients and constraint on the service time distribution. The model gives insights into the transient behaviour of the system. Moreover, the assumption of exponentially distributed service times can be dropped; any continuous distribution can be used. The ease of obtaining performance measures in a meaningless time makes very feasible the incorpo- ration of the model in a multiprocessor design tool. Bibliography [1] S. K. Bose, Introduction to Queuing Systems, Kluwer/Plenum Publishers, 2001 [2] J. L. Hennessy, D. A. Patterson; Computer Architecture: A Quantitative Approach, Pearson Pub- lishers, 2003 [3] M. C. Chiang, Memory System Design For Bus Based Multiprocessor, PhD Thesis, University of Wisconsin, 1991 [4] T. Itoi, T. Nishida, M. Kodama and E. Ohi, N-unit parallel redundant system with correlated failures and single repair facility, Microelectronics and Reliability, vol. 17, pp. 279-285, 1978 Analytical Model For a Multiprocessor With Private Caches And Shared Memory 179 [5] E. Lazowska, J. Zahorjan, G. Graham, and K. Sevcik, Quantitative System Performance, Computer System Analysis Using Queuing Network Models, Prentice-Hall, Englewood Clis, NJ, May 1984 [6] A. Louri, A.K. Kodi, An optical interconnection network and a modifying snooping protocol for the design of large-scale symmetric multiprocessors (SMPs), IEEE Transactions on Parallel and Distributing Systems, vol. 15, No. 12, Dec. 2004, pp. 1093-11047. [7] R. E. Matick, Comparison of analytic performance models using closed mean-value analysis versus open-queuing theory for estimating cycles per instruction of memory hierarchies, IBM Journal of Research and Development, Jul 2003 [8] D. J. Sorin et. al., A customized MVA model for ILP multiprocessors, Technical report No.1369, University of Wisconsin-Madison, 1998 [9] D. J. Sorin et. al., Evaluation of shared-memory parallel system with ILP processors , Proc. 25th Int’l Symp. On Computer Architecture, June 1998, pp. 180-191 [10] J. Sustersic, A. Hurson, Coherence protocol for bus-based and scalable multiprocessors, Internet and wireless distributed computing environments: a survey , Advances in Computers, vol.59, 2003, pp. 211-278 [11] Schiff, Joel L., The Laplace Transform, Springer, 1999 [12] Waterloo Maple Inc., Introduction to Maple 8, 2002 APPENDIX A For N=4, λ =0.001[1/t.u.], f1(x)=0.1exp(-0.1x), and f2(x)=0.01exp(-0.01x) the instantaneous proba- bilities are P4(t) =0 .9211361286+0.8058476879e-2*exp(-0.1248619627*t) +0.8535072295e-2*exp(-0.1089825679*t)+0.9049529656e-2*exp(-0.9494144284e-1*t) +0.9696074769e-2*exp(-0.8072343638e-1*t)+0.1774027054e-3 exp(-0.1510201407e-1*t) +0.1728181365e-2*exp(-0.1398702636e-1*t) +0.5618533851e-2*exp(-0.1256085210e-1*t) +0.1211910345e-1 exp(-0.1067946234e-1*t)+0.2388149701e-1*exp(-0.8161235321e-2*t), P31(t) =0.3792913471e-1-0.1093143496e-1*exp(-0.1248619627*t) -0.1007354818e-1 *exp(-0.1089825679*t)-0.9271731350e-2*exp(-0.9494144284e-1*t) -0.8405607569e-2*exp(-0.8072343638e-1*t)+0.2658572506e-2 *exp(-0.1510201407e-1*t)-0.2212663963e-5*exp(-0.1398702636e-1*t) -0.3015750621e-3*exp(-0.1256085210e-1*t)-0.6739486112e-3 *exp(-0.1067946234e-1*t)-0.9276492013e-3*exp(-0.8161235321e-2*t), P21(t) = 0.1420742616e-2+0.2324288902e-2*exp(-0.1248619627*t) +0.2986557798e-3* exp(-0.1089825679*t)-0.1243034332e-2 *exp(-0.9494144284e-1*t)-0.2544948528e-2*exp(-0.8072343638e-1*t) -0.5329230583e-2*exp(-0.1510201407e-1*t)+0.6737760872e-2 *exp(-0.1398702636e-1*t)+0.4442015626e-3*exp(-0.1256085210e-1*t) -0.5688666731e-3*exp(-0.1067946234e-1*t)-0.1539483112e-2 *exp(-0.8161235321e-2*t), P11(t) =0 .7684028624e-4-0.2290986748e-3*exp(-0.1248619627*t) +0.3160128043e-3*exp(-0.1089825679*t)+0.2148505581e-3 *exp(-0.9494144284e-1*t)-0.3252577041e-3 180 Angel Vassilev Nikolov *exp(-0.8072343638e-1*t)+0.3775725072e-2*exp(-0.1510201407e-1*t) -0.8400171763e-2*exp(-0.1398702636e-1*t)+0.4974831098e-2 *exp(-0.1256085210e-1*t)+0.5143708805e-3*exp(-0.1067946234e-1*t) -0.9181578239e-3*exp(-0.8161235321e-2*t), P01(t) = 0.9242283829e-5*exp(0-.1248619627*t)-0.3513395647e-4 *exp(-0.1089825679*t)+0.4257071327e-4*exp(-0.9494144284e-1*t) -0.1688587212e-4*exp(-0.8072343638e-1*t)-0.9200675435e-3 *exp(-0.1510201407e-1*t)+0.2810623081e-2*exp(-0.1398702636e-1*t) -0.3183584754e-2*exp(-0.1256085210e-1*t)+0.1611954912e-2 *exp(-0.1067946234e-1*t)-0.3239077071e-3*exp(-0.8161235321e-2*t) +0.5218152790e-5, P42(t) = 0.2709223908e-1+0.9859983387e-3*exp(-.1248619627*t) +0.1060558367e-2*exp(-0.1089825679*t)+0.1145474465e-2 *exp(-0.9494144284e-1*t)+0.1259769099e-2*exp(-0.8072343638e-1*t) -0.2412466943e-1*exp(-0.1510201407e-1*t)-0.1705507775e-2 *exp(-0.1398702636e-1*t)-0.2095511260e-2*exp(-0.1256085210e-1*t) -0.2029637013e-2*exp(-0.1067946234e-1*t)-0.1588776483e-2 *exp(-0.8161235321e-2*t), P32(t) = -0.2421204825e-3*exp(-0.1248619627*t)-0.7509940526e-4 *exp(-0.1089825679*t)+0.9576676158e-4 *exp(-0.9494144284e-1*t)+0.3013803504e-3 *exp(-0.8072343638e-1*t)+0.7126069503e-1*exp(-0.1510201407e-1*t) -0.6135152996e-1*exp(-0.1398702636e-1*t)-0.8971987351e-2 *exp(-0.1256085210e-1*t)-0.5950006752e-2*exp(-0.1067946234e-1*t) -0.4494935895e-2*exp(-0.8161235321e-2*t)+0.9428952497e-2, P22(t) =0 .2421271696e-2+0.2626333487e-4*exp(-0.1248619627*t) -.3154175021e-4*exp(-0.1089825679*t)-0.2945613244e-4 *exp(-0.9494144284e-1*t)+0.3412946115e-4*exp(-0.8072343638e-1*t) -0.8108903801e-1*exp(-0.1510201407e-1*t)+0.1349032466 *exp(-0.1398702636e-1*t)-0.4071010637e-1*exp(-0.1256085210e-1*t) -0.9622074403e-2*exp(-0.1067946234e-1*t)-0.5904604182e-2 *exp(-0.8161235321e-2*t), P12(t) = -0.1765308077e-5*exp(-0.1248619627*t)+0.4800731626e-5 *exp(-0.1089825679*t)-0.4448905932e-5*exp(-0.9494144284e-1*t) +0.1599282603e-5*exp(-0.8072343638e-1*t)+0.4177917201e-1 *exp(-0.1510201407e-1*t)-0.9973555226e-1*exp(-0.139870263e-1*t +0.7256040480e-1*exp(-0.1256085210e-1*t)-0.9747995399e-2 *exp(-0.1067946234e-1*t)-0.5300997812e-2*exp(-0.8161235321e-2*t) +0.4449749927e-3, P02(t) = -0.4618227199e-6*exp(-0.1248619627*t)-0.2203030325e-6 *exp(-0.1089825679*t)+0.4881890483e-7*exp(0-.9494144284e-1*t) -0.1699392257e-7*exp(-0.8072343638e-1*t)-0.8188760719e-2 *exp(-0.1510201407e-1*t)+0.2501502200e-1*exp(-0.1398702636e-1*t) -0.2833447693e-1*exp(-0.1256085210e-1*t)+0.1434663085e-1 *exp(-0.1067946234e-1*t)-0.2882912592e-2*exp(-0.8161235321e-2*t) +0.4449749927e-4. In the above expressions e-i means 10-i for i=1,7. Analytical Model For a Multiprocessor With Private Caches And Shared Memory 181 APPENDIX B Table 1: N=8, f1(x)=0.1exp(-0.1x), f2(x)=0.01exp(-0.01x) λ [1/t.u.] p ANBC for blocking ANBC for fully caches nonblocking caches 0.001 0.9 0.154099881194466 0.075640880006411 0.002 0.9 0.441552853804251 0.290383910880334 0.003 0.9 0.822750601431095 0.607433119474025 0.004 0.9 1.253944990222998 0.984102789831906 0.001 0.8 0.230012889507952 0.152313018403034 0.002 0.8 0.729883782777377 0.584481458432927 0.003 0.8 1.382033782478873 1.183494795953230 0.004 0.8 2.063720956300253 1.826269794552253 Table 2: N=8, f1(x)=0.13x2ex p(−0.1x)/2!, f2(x)=0.013x2ex p(−0.01x)/2! λ [1/t.u] p ANBC for blocking ANBC for fully caches nonblocking caches 0.001 0.9 0.384839057891723 0.211437492029451 0.002 0.9 1.313451009452606 0.582993712839022 0.003 0.9 2.390481400874492 1.782339618354729 0.004 0.9 3.691834116720534 2.882438452093385 0.001 0.8 0.614956120345239 0.400820549913285 0.002 0.8 2.611487230549326 1.722034656332087 0.003 0.8 4.062557145097248 3.429652938504840 0.004 0.8 5.899361833023557 5.394204692051840 Angel Vassilev Nikolov National University of Lesotho Department of Mathematics and Computer Science Roma 180 Lesotho E-mail: av.nikolov@nul.ls Received: December 17, 2007 182 Angel Vassilev Nikolov Angel Vassilev Nikolov received the BEng degree in Electronic and Computer Engineering from the Technical University of Bu- dapest, Hungary in 1974 and the PhD degree in Computer Sci- ence from the Bulgarian Academy of Sciences in 1982 where he worked as a Research Associate. In 1989 he was promoted to As- sociate Research Professor in Bulgaria. Dr Nikolov also served as a Lecturer of Computer Science at the National University of Science and Technology, Bulawayo, Zimbabwe and at the Grande Prairie Regional College, Alberta, Canada and as an Associate Professor at Sharjah College, United Arab Emirates. His research interests include computer architecture, performance evaluation of multiprocessors, and reliability modeling. He has published numerous journal and conference articles and holds four patents on the above topics.