Cubo_Reliability_versija5.dvi


CUBO A Mathematical Journal
Vol.12, No¯ 01, (7–13). March 2010

Analysis of the Component-Based Reliability in

Computer Networks

Saulius Minkevičius

Vilnius University, Faculty of Mathematics and Informatics

Naugarduko 24, 03225 Vilnius, Lithuania

email : stst@ktl.mii.lt

ABSTRACT

Performance in terms of reliability of computer networks motivates this paper. Limit theorems

on the extreme queue length and extreme virtual waiting time in open queueing networks in

heavy traffic are derived and applied to a reliability model for computer networks where we

relate the time of failure of a computer network to the system parameters.

RESUMEN

Desempeño en términos de fiabilidad de redes de computador notiva este art́ıculo. Teoremas

ĺımite sobre la duración extrema de cola y el tiempo de espera virtual extremo en redes de

cola abierta en trafico pesado sao derivados y aplicados a un modelo de fiabilidad para redes

de computador donde relacionamos el tiempo de falha de una red de computador al sistema

de parámetros.

Key words and phrases: Performance evaluation, reliability theory, queueing theory, mathemat-

ical models of technical systems, open queueing network, heavy traffic, the probability limit theorem,

extreme values, queue length of customers, virtual waiting time of a customer.

Math. Subj. Class.: 60K25, 60G70, 60F17.


8 Saulius Minkevičius CUBO
12, 1 (2010)

1 Introduction

Probabilistic models and queueing networks have long been used to study the performance and

reliability of computer systems [1, 2] and to analyse the performance and reliability of computer

networks and of distributed information systems [3, 4]. In this paper, we will first briefly review

the works related to using the queueing theory of computer system reliability, and then present

some new results on the estimation of the time of failure of a computer network.

In one of the first papers of this kind [6], the reliability of execution of programs in a distributed

computing system is considered, showing that a program, which runs on multiple processing el-

ements that have to communicate with other processing elements for remote data files, can be

executed successfully despite that certain system components may be unreliable. In order to anal-

yse the performance of multimedia service systems which have unreliable resources and to estimate

their capacity requirements, a capacity planning model using an open queueing network is pre-

sented in [9], and in [5] a novel model for a reliable system composed of N unreliable systems,

which can hinder or enhance one anothers reliability, is discussed. In [10], the management pol-

icy of an M /G/1 queue with a single removable and non-reliable server is discussed and analytic

results are explored, using an efficient Matlab program to calculate the optimal threshold of the

management policy and to evaluate system performance. In [11], the authors consider a single

machine subject to breakdown and employ a fluid queue model with repair. In [12], the behaviour

of a heterogeneous finite-source system with a single server is considered and applications in the

field of telecommunications and reliability theory are treated.

In this paper, first we present probability limit theorems on the extreme queue length of

customers and the extreme virtual waiting time of a customer in heavy traffic for open queueing

networks. We consider open networks with the “first come, first served” service discipline at

each station and general distributions of interarrival and service times. The basic components of

the queueing network are arrival processes, service processes, and routing processes among the

different queues. The queueing network that we study has k single server stations, each of which

has an associated infinite capacity waiting room. Every station has an arrival stream from outside

the network, and the arrival streams are assumed to be mutually independent renewal processes.

Customers are served in the order of arrival and after service they are randomly routed to either

another station in the network, or out of the network entirely. Service times and routing decisions

form mutually independent sequences of independent identically distributed random variables.

2 The Mathematical Model

Let us consider the mutually independent sequences of independent identically distributed ran-

dom variables
{

z
(j)
n , n ≥ 1

}

,
{

S
(j)
n , n ≥ 1

}

and
{

Φ
(j)
n , n ≥ 1

}

for j = 1, 2, . . . , k; defined on the

probability space. The random variables z
(j)
n and S

(j)
n are strictly positive, and Φ

(j)
n have sup-


CUBO
12, 1 (2010)

Analysis of the Component-Based Reliability ... 9

port in {0, 1, 2, . . . , k}. We define µj =
(

M
[

S
(j)
n

])−1

> 0, σj = D
(

S
(j)
n

)

> 0 and λj =
(

M
[

z
(j)
n

])−1

> 0, aj = D
(

z
(j)
n

)

> 0, j = 1, 2, ..., k; with all of these terms assumed finite.

Denote pij = P
(

Φ
(i)
n = j

)

> 0, j = 1, 2, . . . , k.

In the context of the queueing network being considered, the random variables z
(j)
n represent

an interarrival time from outside the network at the station j, while S
(j)
n is the nth service time

at the station j, and Φ
(j)
n is a routing indicator for the nth customer served at the station j. If

Φ
(i)
n = j (which occurs with probability pij ), then the nth customer served at the station i is routed

to the station j. When Φ
(i)
n = 0, the associated customer leaves the network. The matrix P is

called a routing matrix.

We observe that this system is quite general, encompassing tandem systems, acyclic networks

of GI/G/1 queues, networks of GI/G/1 queues with feedback and open queueing networks.

Let us define Qj(t) as the queue length of customers at the jth station of the queueing network

at time t, β̂j = λj +
k
∑

i=1

µi · pij − µj > 0, σ̂2j = (λj )
3·Dz(j)n +

k
∑

i=1

(µi)
3 ·DS(i)n ·(pij )2 +(µj )3·DS(j)n >

0, j = 1, 2, . . . , k and t > 0. Also, let us define Wj (t) as the virtual waiting time of a customer

at the jth station of a queueing network at time t (one must wait until a customer arrives at the

jth station of the queueing network to be served at time t), β̃j =

λj +
k
∑

i=1

µi · pij

µj
− 1, σ̃2j =

k
∑

i=1

p2ij · µi ·
(

σj +

(

µi
µj

)2

· σi
)

+ λj ·
(

σj +

(

λj
µj

)2

· aj
)

,

j = 1, 2, · · · , k and t > 0.

We will also assume that the following “overload conditions” are fulfilled:

λj +

k
∑

i=1

µi · pij > µj , j = 1, 2, . . . , k. (1)

Note that these conditions guarantee that the length of all the queues will grow indefinitely with

probability one. The results of the present paper are based on the following theorems:

Theorem 1. If conditions (1) are fulfilled, then

lim
n→∞

P







sup
0≤s≤t

Qj (ns) − β̂j · n · t

σ̂j ·
√

n
< x






=

∫ x

−∞

exp

(

− y
2

2t

)

dy,

0 ≤ t ≤ 1 and j = 1, 2, . . . , k.

and


10 Saulius Minkevičius CUBO
12, 1 (2010)

Theorem 2. If conditions (1) are fulfilled, then

lim
n→∞

P







sup
0≤s≤t

Vj (ns) − β̃j · n · t

σ̃j ·
√

n
< x






=

∫ x

−∞

exp

(

− y
2

2t

)

dy,

0 ≤ t ≤ 1 and j = 1, 2, . . . , k.

Proof. These theorems are proved in [7], and the proof is therefore omitted here so as not to

lengthen this short paper.

3 The Reliability of a Computer Network

Now we present a technical example from the computer network practice. Assume that queues

arrive at a computer vj at the rate λj per hour during business hours, j = 1, 2, . . . ,p+r. These

queues are served at the rate µj per hour in the computer vj , j = 1, 2, . . . ,p+r. After service in the

computer vj , with probability pj (usually pj ≥ 0.9), they leave the network and with probability
pji, i 6= j, 1 ≤ i ≤p+r (usually 0 < pji ≤ 0.1) arrive at the computer vi, i = 1, 2, . . . ,p+r.
Also, we assume the computer vi fails when the virtual waiting time of queues is more than

kj , j = 1, 2, . . . , p, and the computer vj fails when the queue length of queues is more than

γi, i = 1, 2, . . . , r.

In this section, we prove the following theorem on the probability that a computer network

fails due to overload.

Theorem 3. If t ≥ max
(

max
1≤j≤p

kj

β̂j
, max
1≤i≤r

γi

β̃i

)

and conditions (1) are fulfilled, the computer

network becomes unreliable (all computers fail).

Proof. At first, using Theorem 1 and Theorem 2, we get that for x > 0

lim
n→∞

P







sup
0≤s≤t

Qj (ns) − β̂j · n · t

σ̂j ·
√

n
< x






=

∫ x

−∞

exp

(

− y
2

2t

)

dy, j = 1, 2, . . . , p. (2)

and

lim
n→∞

P







sup
0≤s≤t

Vi(ns) − β̃i · n · t

σ̃i ·
√

n
< x






=

∫ x

−∞

exp

(

− y
2

2t

)

dy, i = 1, 2, . . . , r. (3)


CUBO
12, 1 (2010)

Analysis of the Component-Based Reliability ... 11

Let us investigate a computer network which consists of the elements (computers) αj that

are indicators of stations Xj , j = 1, 2, . . . , p and elements (computers) νi that are indicators of

stations Yi, i = 1, 2, . . . , r

Denote

Xj =

{

1, if the element αj is reliable

0, if the element αj is not reliable,

j = 1, 2, . . . , p and

Yi =

{

1, if the element νi is reliable

0, if the element νi is not reliable,

i = 1, 2, . . . , r.

Note that {Xj = 1} = { sup
0≤s≤t

Qj(ns) < kj}, j = 1, 2, . . . , p and {Yi = 1} = { sup
0≤s≤t

Vi(ns) <

γi}, i = 1, 2, . . . , r. Denote the structural function of the system of elements, connected by scheme
1 from p + r (see, for example, [8]), as follows:

φ(X1, X2, . . . , Xp, Y1, Y2, . . . , Yr, t) =

{

1,
∑p

j=1 Xj +
∑r

i=1 Yi ≥ 1
0,

∑p

j=1 Xj +
∑r

i=1 Yi < 1.

Assume y =
∑p

j=2 Xi +
∑r

i=1 Yi. Estimate the reliability function of the system (computer

network) using the formula of conditional probability

h(X1, X2, . . . , Xp, Y1, Y2, . . . , Yr, t) = Eφ(X1, X2, . . . , Xp, Y1, Y2, . . . , Yr, t) =

P (φ(X1, X2, . . . , Xp, Y1, Y2, . . . , Yr, t) = 1) = P (

p
∑

j=1

Xj +

r
∑

i=1

Yi ≥ 1) =

P (X1 + y ≥ 1) = P (X1 + y ≥ 1|y = 1) · P (y = 1) + P (X1 + y ≥ 1|y = 0) · P (y = 0) =
P (X1 ≥ 0) · P (y = 1) + P (X1 ≥ 1) · P (y = 0) ≤ P (y = 1) + P (X1 ≥ 1) =
P (y = 1) + P (X1 = 1) ≤ P (y ≥ 1) + P (X1 = 1) =

P (

p
∑

j=2

Xj +

r
∑

i=1

Yi ≥ 1) + P (X1 = 1) ≤ · · · ≤
p
∑

j=1

P (Xj = 1) +

r
∑

i=1

P (Yi ≥ 1)

≤ · · · ≤
p
∑

j=1

P (Xj = 1) +
r
∑

i=1

P (Yi = 1) =

p
∑

j=1

P ( sup
0≤s≤t

Qj (ns) ≤ kj ) +
r
∑

i=1

P ( sup
0≤s≤t

Vi(ns) ≤ γi).

Thus,

0 ≤ h(X1, X2, . . . , Xp, Y1, Y2, . . . , Yr, t) ≤
p
∑

j=1

P ( sup
0≤s≤t

Qj (ns) ≤ kj ) +
r
∑

i=1

P ( sup
0≤s≤t

Vi(ns) ≤ γi).

(4)


12 Saulius Minkevičius CUBO
12, 1 (2010)

Applying Theorem 1, we obtain that

0 ≤ lim
n→∞

P ( sup
0≤s≤t

Qj (ns) < kj ) =

lim
n→∞

P







sup
0≤s≤t

Qj(ns) − β̂j · n · t

σ̂j ·
√

n
<

kj − β̂j · n · t
σ̂j ·

√
n






=

∫ −∞

−∞

exp

(

− y
2

2t

)

dy = 0.
(5)

j = 1, 2, . . . , p.

From (5) it follows that for kj < ∞,

lim
n→∞

P

(

sup
0≤s≤t

Qj(ns) < kj

)

= 0, j = 1, 2, . . . , p. (6)

Similarly as in (5) - (6) we prove that for γi < ∞

lim
n→∞

P

(

sup
0≤s≤t

Vi(ns) < γi

)

= 0, i = 1, 2, . . . , r. (7)

Consequently, lim
n→∞

h(X1, X2, . . . , Xp, Y1, Y2, . . . , Yr, t) = 0 (see (5)-(7)), which completes the

proof.

Finally, we provide an expression for h(X1, X2, . . . , Xp, Y1, Y2, . . . , Yr, t), t > 0 by proving the

following theorem.

Theorem 4. h(X1, X2, . . . , Xp, Y1, Y2, . . . , Yr, t) is equal to exp(−
∑p

j=1 P ( sup
0≤s≤t

Qj(ns) < kj ) −
∑r

i=1 P ( sup
0≤s≤t

Vi(ns) < γi)).

Proof. Let λj , j = 1, 2, . . . , p and qi, i = 1, 2, . . . , r be the traffic intensities related to each server.

Then the probability of stopping this system is equal to e−
∑p

j=1
λj −

∑

r
i=1

qj (see, for example, [13]).

But

λj = M Xj = P (Xj = 1) = P ( sup
0≤s≤t

Qj (ns) < kj ), j = 1, 2, . . . , p. (8)

and

qi = M Yi = P (Yi = 1) = P ( sup
0≤s≤t

Vi(ns) < γi), j = 1, 2, . . . , .r. (9)

Applying (8) and (9), we obtain that h(X1, X2, . . . , Xp, Y1, Y2, . . . , Yr, t) is equal to

e

−

p
∑

j=1

λj −
r
∑

i=1

qj

= e
−
∑p

j=1
P ( sup

0≤s≤t

Qj (ns)<kj )−
∑

r
i=1

P ( sup
0≤s≤t

Vj (ns)<γi)

and the proof is complete.


CUBO
12, 1 (2010)

Analysis of the Component-Based Reliability ... 13

We thus see that Theorem 4 provides an estimate for the probability that the computer

network fails due to overload at time t.

Received: May, 2008. Revised: September, 2009.

References

[1] Gelenbe, E., Unified approach to the evaluation of a class of replacement algorithms, IEEE

Transaction on Computers, C22(6): 611–618, June 1973.

[2] Gelenbe, E., Probabilistic Models of Computer Systems, Part II, Acta Informatica, 12:

285–303, 1979.

[3] Gelenbe, E., Finkel, D. and Tripathi, S.K., Availability of a distributed computer

system with failures, Acta Informatica, 23(6): 643–655, November 1986.

[4] Gelenbe, E., Wang, X.W. and Onvural, R., Diffusion based statistical call

admission control in ATM, Performance Evaluation, 27-8: 411–436, October 1996.

[5] Gelenbe, E. and Fourneau, J.M., G-networks with resets, Performance Evaluation, 49(1-

4): 179–191, 2002.

[6] Lin, M.S. and Chen, D.J., The computational complexity of reliability problem on dis-

tributed systems, Information Processing Letters, 64(3): 143–147, 1997.

[7] Minkevičius, S., Extreme values in open queueing networks. II, to appear, 2009.

[8] Morder, J.J. and Elmaghraby, S.E., Handbook of operational research models and ap-

plications, Van Nostrand Reinhold, New York, 1978.

[9] Park, K. and Kim, S., A capacity planning model of unreliable multimedia service, Journal

of Systems and Software, 63(1): 69–76, 2002.

[10] Pearn, W.L, Ke, J.C. and Chang Y.C., Sensitivity analysis of the optimal management

policy for a queueing system with a removable and non-reliable server, Computers and Indus-

trial Engineering, 46(1): 87–99, 2004.

[11] Perry, D., Posner, M.J.M., A correlated M/G/1-type queue with randomized server repair

and maintenance modes, Operations Research Letters, 26(3): 137–148, 2000.

[12] Sztrik, J. and Kim C.S., Markov-modulated finite-source queueing models in evaluation

of computer and communication systems, Mathematical and Computer Modelling, 38(7-9):

961–968, 2003.

[13] Zubov, V., Mathematical theory of reliability of queueing systems, Radio i Sviaz, 1964.