Microsoft Word - Proofreading-Comp040616-1.doc


51-62 SQU  Journal For  Science, 10 (2005) 
 © 2005 Sultan Qaboos University 

51 

Conditional Fault-Diameter of Torus 
Networks 

Abderezak Touzene and Khaled Day 

Department of Computer Science, College of Science, Sultan Qaboos 
University, P.O. Box 36, Al-Khod 123, Sultanate of Oman,  Email: 
touzene@squ.edu.om, kday@squ.edu.om.  

  Torusالقطر المختل المشروط لشبكات 

  عبدالرزاق توزان و خالد داي 

 المربعة المختلة مع افتراض توزيعات مشروطة لنقاط        Torusنحصل في هذا البحث على قطر شبكات         :خالصة  
نبـرهن أن   . الخلل حيث أننا نفترض أن لكل نقطة سليمة في الشبكة ما ال يقل عن نقطة مجاورة سليمة من الخلل                  

 المربعة والتي يساوي مقدار الربط فيها أربعة، وتحت الشرط المذكور لتوزيع نقاط الخلل، يمكن أن                Torusشبكة  
 Torusنستنتج من هذا أن القطر المختل المشروط لشبكات         . تتحمل إلى خمس نقاط خلل بدون قطع االتصال بينها        

 المربعة يساوي القطـر فـي       Torusوط لشبكات   كما نستنتج أن القطر المختل المشر      .المربعة قيمته ستة وحدات   
التي تحتوي علـى    ( المربعة إلى مجموعة الشبكات      Torusبهذه النتيجة تنضم شبكة     . غياب أي خلل زائد وحدتان    

والتي يساوي القطر المختل المشروط فيها القطـر السـليم زائـد            ) star-graph وشبكات   hypercubeشبكات  
راسة نظامين خوارزميين لتوجيه االتصاالت ذات قدرة على تحمل نقاط خلل في            نقترح أيضا في هذه الد    . وحدتان

 .باستخدام المسالك غير المتقاطعة التي تم تصميمها في هذا البحثالشبكة وذلك 
 

ABSTRACT: We obtain the conditional fault-diameter of the square torus 
interconnection network under the condition of forbidden faulty sets (i.e. assuming that 
each non-faulty processor has at least one non-faulty neighbor). We show that under this 
condition, the square torus, whose connectivity is 4, can tolerate up to 5 faulty nodes 
without becoming disconnected. The conditional node connectivity is, therefore, 6. We 
also show that the conditional fault-diameter of the square torus is equal to the fault-free 
diameter plus two. With this result the torus joins a group of interconnection networks 
(including the hypercube and the star-graph) whose conditional fault-diameter has been 
shown to be only two units over the fault-free diameter. Two fault-tolerant routing 
algorithms are discussed based on the proposed vertex disjoint paths construction. 
 
KEYWORDS: Fault-tolerance, forbidden faulty sets, torus, node connectivity, 
conditional node connectivity, conditional fault-diameter, fault-tolerant routing. 
 
 
ABDEREZAK TOUZENE AND KHALED DAY  

 52

1. Introduction 

he node connectivity and the fault diameter have been used as measures of the fault-tolerance of 
interconnection networks. These measures however do not reflect the real resilience of these 

networks. It is true that when the number of faulty nodes is equal to the connectivity, the network may 
become disconnected. However this is very unlikely to happen since only very special fault 
distributions of these faults cause disconnection. For instance, in the torus, the node-connectivity is 4 
and therefore 4 faulty nodes may cause disconnection. However, the network becomes disconnected 
only when all the 4 faulty nodes are neighbors of the same node which is very improbable. 

In an attempt to better quantify the fault resilience of a network, the concept of forbidden faulty 
sets has been introduced Esfahanian (1989). The idea is to assume that each node has at least one non-
faulty neighbor. Under this forbidden faulty set condition the number of tolerable faulty nodes is 
significantly larger with a slight increase in the fault diameter. Esfahanian (1989) proved that for the 
binary n-cube, whose connectivity is n, 2n-3 nodes can fail (under the forbidden faulty set condition) 
without disconnecting the network. Latifi  (1993) then showed that the corresponding conditional fault 
diameter increases only by 2. In Rouskov et al. (1996) similar results for the star graph network have 
been established.  (Latifi and Naraghi-Pour, 1994) has generalized this idea by assuming that each 
node has at least k non-faulty neighbors. Similar results for the m-ary generalized n-cube network 
have also been obtained (Wu, 1998). 

In this paper we contribute to the study of the properties of the torus by establishing its 
conditional node connectivity and its conditional fault-diameter. We show that under the condition of 
forbidden faulty sets, the torus (whose connectivity is 4) can tolerate up to 5 faulty nodes without 
becoming disconnected. The corresponding conditional fault-diameter is shown to be equal to the 
fault-free diameter plus two (i.e. 2+k ). A fault tolerant routing algorithm is discussed based on the 
construction of node-disjoint paths in the presence of faults under the forbidden faulty sets condition. 
The rest of the paper is organized as follows. In section 2 we introduce the notations and include some 
preliminaries. Section 3 establishes the conditional faulty diameter of the torus. In section 4 we 
discuss how a fault tolerant routing can be derived and section 5 concludes the paper. 

2.   Notations and Preliminaries 

Definition 1: The node-connectivity C(G) (or point-connectivity) of a graph G is the minimum 
number of nodes of G whose removal results in a disconnected or trivial graph.  

For any two nodes in the torus there are four node-disjoint paths connecting them (Day and Al-
Ayyoub, 1997), therefore C(torus) = 4. We now define the conditional node connectivity. 
Definition 2: The conditional node connectivity CC(G) of a graph G is the minimum number of nodes 
of G whose removal results in a disconnected or trivial graph, provided that each of the remaining 
nodes has at least one adjacent node in G that is not removed. 

We will obtain the conditional node connectivity of the torus as a consequence of obtaining its 
conditional fault-diameter. 
 Definition 3: The square torus (k-torus) is a wrap-around mesh which consists of  k rows and k 
columns. Each row and each column consists of a cycle of  k  nodes.  Any two nodes are connected by 
two paths one is called the shortest path on the cycle and the other is called the longest path on the 
cycle. 

In what follows we will adopt the following notations: If we consider two nodes x and y in the 
same column Cm ( xm, and y m), m=0..k-1, we denote by xS

m, yS
m the respective neighbors of x and y 

located on the shortest path between x and y in Cm.  We denote by xL
m, yL

m the respective neighbors of 
x and y located on the longest path between x and y in Cm.  We use similar notation for the second 

T 


CONDITIONAL FAULT-DIAMETER OF TORUS NETWORKS  

 53 

node in the shortest path  xSS
m, ySS

m .  And xLL
m, yLL

m  are  the second nodes in the longest path between 
x and y in Cm.  Any node xm in Cm has 4 neighbors: xS

m, xL
m, xm+1, xm-1.  See Figure 1. 

 
                                                     1−mC        mC              1+mC             

                                             x LL 
 
                                                                            xL  
                                                                             x  
           
                                                                               x S 
                                                                               x SS 
 
 
                                                                               y SS   
                                                                              y S 
                                                                             y  
 
                                                                             y L 
                                                                             y LL 
 
 
Figure 1.  x and y in the same cycle mC . 

 
3.   Conditional Fault-Diameter of the k-Torus 

The fault-diameter of an interconnection network is the maximum distance between any two 
nodes in the presence of a number of faults equal to the connectivity of the network minus one. 
Definition 4: The fault-diameter FD(G) of a graph G is the maximum distance between any two 
nodes of G in the presence of at most C(G)-1 faulty nodes.  
Definition 5: The conditional node diameter CFD(G) of a graph G is the maximum distance between 
any two nodes of G in the presence of at most C(G)-1 faulty nodes, provided that each of the non-
faulty nodes has at least one non-faulty adjacent node in G. 
We now obtain the conditional fault-diameter of the  k-torus. 
Theorem 1. The conditional fault diameter of the k-torus under the forbidden sets assumption is k+2. 
Proof: Consider two nodes x and y at distance D(x, y)=L in a k-torus with at most 5 faulty nodes. In 
the first case, we consider that x and y belong to the same column C0.  The case where x and y belong 
to a same row is symmetric and can be treated similarly. In the second case we consider that x and y 
are located in different columns and different rows of the torus. 
Case 1.  If x and y are on the same columns (say C0 ) and similarly for x and y on the same row. 
Case 1.1. If  all faults are either outside the target cycle C0 or are located along the longest path 
between x and y, then the shortest path between x and y is fault-free of length L ≤ k+2 
Case 1.2. If there is only one fault in the target cycle and this fault is located on the shortest path 
between x and y, then the longest path between x and y is fault-free of length  k-L ≤ k+2. 
1.3. If the number of faults in the target cycle is 4 or 5, then there are two node-disjoint paths between 
x and y:   

• B1: x, x1, shortest path in C1 to reach y1, y. 


ABDEREZAK TOUZENE AND KHALED DAY  

 54

• B2: x, xk-1, shortest path in Ck-1 to reach yk-1, y. 
       At  least  one  of these two paths is fault-free.  The length of these paths is L+2 ≤  k+2.  See    
       Figure 2 (a). 
Case 1.4. The number of faults in  C0 is 2 or 3: 

We discuss several cases based on the number of faults which are neighbors of x and y.   
Case 1.4.1. If  these faults are located on the shortest path between x and  y, then we have the same               
conclusion as in case 1.2.  
Case 1.4.2. All faults are not neighbors of x or y.  The following four paths have no common nodes 
other than x, y and their neighbors: 

• P1: x, xS0, xS1, shortest path in C1 to reach ys1, ys, y. 
• P2: x, xS0, xSk-1, shortest path in Ck-1 to reach ySk-1, yS, y. 
• P3: x, xL0, xL1, longest path in C1 to reach yL1, yL, y. 
• P4: x, xL0, xLk-1, longest path in Ck-1 to reach yLk-1, yL, y. 

If the number of faults in C0 is 3, then at least two of these paths are fault-free.      
If the number of faults in C0 is 2, then at least one of these paths is fault-free.   
The length of the above paths is less than k+1 < k+2.  See Figure 2 (b). 
Case 1.4.3.  One fault is neighbor of y, the second and the third (if 3 faults) are not a neighbor of  x: 
this case can be solved using slight modification of the paths in case 1.4.2 as follows:  
If the faulty neighbor of y is yS

0: We use P3, P4 and  
• M1: x, xS0, xS1, shortest path in C1 to reach y1, y (path length L+2 ≤  k+2). 
• M2: x, xS0, xSk-1, shortest path in Ck-1 to reach yk-1, y (path length L+2 ≤ k+2).   

If  the faulty neighbor of y is yL
0 , we use P1, P2  and  

• M3: x, xL0, xL1, longest path in C1 to reach yL1, y1, y (path length k-L+2 ≤  k+2) 
• M4: x, xL0, xLk-1, longest path in Ck-1 to reach yLk-1, y k-1, y (path length k-L+2 k+2). See Figure 2  
(c). 

The symmetric case where the fault is neighbor of x is similar.  
Case 1.4.4. Two faults are neighbors of y (yS

0 and yL
0), the third fault, if any, is not neighbor of x. 

There are four paths:  M1, M2, and  
•  S1: x, x1, x2, shortest path in C2 to reach y2, y1, y (path length L+4 ≤  k+2). 
•  S2: x, xk-1, xk-2, shortest path in Ck-2 to reach yk-2, yk-1, y (path length L+4 ≤ k+2). 

Note that y1 and yk-1 cannot be both faulty at the same time (forbidden sets assumption). If both are not 
faulty, then one of these four paths will be fault-free. Assume now that one of them is faulty (say yk-1).   
We end up only with M1, S1 (a minimum of three faults already consumed). An additional path is             
M3.  In the presence of a maximum of 2 other faults, one among the three paths is fault-free of  length  
less than or equal to k-L+2 < k+2. See Figure 2 and 3. Now we consider a third fault in C0, which is a 
neighbor of x.  If the third faulty node is  xL

0
 , then we make use of the same paths even without the 

need to M3.  If the third faulty node is xS
0, then we use the four paths: B1, B2, M3 and M4.  Since y1 

and yk-1  are common nodes to the four paths, the same distribution of faults on  y1 and yk-1 leads to the 
same  conclusions.   
The case where both faults are neighbors of x only is a symmetric case and can be solved similarly. 
Case 1.4.5. One fault is neighbor of y and the second is neighbor of x: If the faulty neighbor of x is xL

0 
we will use the two paths P1, P2 (if yS

0 is fault-free),  or M1 and  M2  (if  yS
0 is faulty).  The two other 

paths are:    
• M5: x,  x1, longest path in C1 to reach yL1, y1, y (path length k-L+2 ≤  k+2), 
• M6: x, xk-1, longest path in Ck-1 to reach yLk-1, y k-1, y (path length k-L+2 ≤  k+2),  if yL0 is faulty 
(yS

0  is fault-free), or  
• M7: x, x1, longest path in C1 to reach yL1, yL, y  (path length L+2 ≤ k+2), 

 
CONDITIONAL FAULT-DIAMETER OF TORUS NETWORKS  

 55 

  1−kC       0C          1C             1−kC          0C          1C          1−kC          0C          1C   
   P4                                   P3                                                                      
                           xL

0                                                                      M4                                     M3 
                                                                          xL

0                                                            xL
0 

                      x 
                                   x 

k-1                                  x 
                    x 

1                                              x  
                                                                     xS

0                                                                       xS
0    

                              xS
0                                                                                                                                        

 P2 P1       B2                                    B1    M2                                   M1             
  
         y s0                                      yS

0                                     yS
0 

                                                                                             
          y                       y 

k-1                               y                     y 
1                              y 

                                   yL
0                              yL

0                                                          yL
0 

  
                          (b)                                           (a)                                         (c) 
 

Figure 2.  Paths for cases 1.3, 1.4.2 and 1.4.3. 
 

            1−kC        0C        1C                   1−kC        0C          1C        2−kC 1−kC 0C   1C   2C        
 
      M6                                    M5      M8                                  M7   
 x  x x 
 
 
                                                                                                           S2                                  S1 
 
                                
                                y                                                    y                                             y                
                             
 
Figure 3. Paths for  cases 1.4.3, 1.4.4 and 1.4.5. 
 

• M8: x, xk-1, longest path in Ck-1 to reach yLk-1, yL, y (path length L+2 ≤ k+2),  if  yL0 is fault- 
          free (yS

0 is faulty). Now if the neighbor of x, xS
0  is faulty: A similar technique is used  to 

construct four node-disjoint paths (the common nodes are fault-free) between x and y. These paths 
will depend on which  neighbor of y will be faulty (yS

0 or yL
0
 ).  

 
ABDEREZAK TOUZENE AND KHALED DAY  

 56

   2−kC 1−kC      0C      1C   2C                                                    1−kC      0C      1C    
  
  x x 
  
 
 L2 L1  B2  B1 
L4                                                L3 
 
                                y y 
 
 
      L6  L5                                                 L6                                 L5 
                           (a)                                                                                       (b) 
                                                      

Figure 4.  Paths for  case 1.4.6. 
 
Case 1.4.6. The number of faults in C0  is two and both faults are neighbors of x and y at the same 
time: In this case y0S , y

0
L  are faulty and x

0
L = y

0
L , (L=2).   The case x

0
S = y

0
S  is symmetric and can be 

solved similarly.  We can construct 6 paths as follows: 
• L1: x, x0S, shortest path in C0 to reach y0SS, ySS1, yS1, y1, y (path length L+2 ≤  k+2). 
• L2: x, x0S, shortest path in C0 to reach y0SS, ySSk-1, ySk-1, yk-1, y (path length L+2 ≤ k+2). 
• L3: x, x0S, xS1, xS2, shortest path in C2 to reach y2,y1, y (path length L+4 ≤ k+2). 
• L4: x, x0S, xSk-1, xSk-2, shortest path in Ck-2 to reach yk-2,yk-1, y (path length L+4 ≤  k+2). 
• L5: x, x1, yL1, y1, y (path length 4). 
• L6: x, xk-1, yLk-1, yk-1, y (path length 4). 

 
Note that the nodes y1 and yk-1  are common nodes to the six paths, but fortunately they cannot be 
faulty at the same time  (forbidden sets  assumption). See Figure 4 (a). If  both are not faulty, then in 
presence of 3 faults outside C0, three among the six  above paths will be fault-free.  If we assume that 
yk-1 is faulty and y1 is fault-free, then at least one of the three paths L1, L3, L5  will be fault-free.  If the 
node yk-1 is not faulty and y1 is faulty,  then at least one of the three paths L2, L4, L6 will be fault-free.  
The length of the above paths is less than or equal to  k+2. 

Now if we consider that there is a third faulty node in C0, which is  xS
0.  We make use of the 

following paths: B1, B2 and L5, L6. See Figure 4(b). Note that these paths have some common nodes:  
L5 and B1 have two nodes in common, which are  x1  and  y1.  L6 and B2 have two nodes in common, 
which are  xk-1  and yk-1.  In presence of two faults outside C0, if all these four nodes are not faulty, then 
there are at least two fault-free paths between x and  y. Under the forbidden sets assumption, nodes  x1   
and xk-1   cannot be both faulty in the same time.  And nodes  y1 and  yk-1  cannot be both faulty  in the 
same time. If both nodes x1 and y1 are faulty, then  B2  and L6 will be fault-free. 

If both nodes xk-1  and  yk-1  are faulty, then B1 and L5 will be fault-free.  If   yk-1 and x1 are faulty, 
then an alternative path will be x, xk-1  in B2,  xS

k-1 in B2,  xSS
k-1 in B2 ,  xSS

0
 in C

0,    xSS
1  in B1, continue 

on B1 to reach y.  And similarly for the symmetric case (y1 and   xk-1  are faulty), jump from B1 to B2 
from the node xSS

1 in B1 to the node xSS
0 , xSS

k-1 in B2, then  continue on B2.  The length of these paths 
is less than or equal to k+2. 
Case 2.  Nodes x and y do not belong to the same  column or the same row. Without loss of generality,  
we assume that x =x0 belongs to C0,  and y=ym belongs to Cm. Let us consider the six-node disjoint 
paths connecting the columns C0 to Cm.   These paths can be divided into two groups: 


CONDITIONAL FAULT-DIAMETER OF TORUS NETWORKS  

 57 

       1−kC  0C              Cm-2 Cm-1 Cm  Cm+1                                    1−kC  0C      Cm-3  Cm-2 Cm-1 Cm  Cm+1 

 
 P* P* 
 
 x x 
  
 
 B1                                    A5     A4    A3  A2          A1 
 
 
 Short_rc 
                                                             y y 
 
 
Figure 5.  Paths for cases 2.3 and 2.4.1. 

 
Group 1 

• xL0 ,xL1, xL2 , shortest path in this row to reach xLm-1 , xLm . 
• x, x1 , x2 , shortest path in this row to reach xm-1 , xm . 

 
Group 2  

• xS0, xS1 , xS2 , shortest path in this row to reach xSm-1 , xSm . 
• yL0, yL1 , yL2 , shortest path in this row to reach yLm-1 , yLm . 
• y0, y1 , y2 , shortest path in this row to reach ym-1 , ym . 
• yS0, yS1 , yS2 , shortest path in this row to reach ySm-1 , ySm . 
 

In presence of five faults, at least one of the above six paths is fault-free.   If the fault-free path is 
in group 1, then we will construct paths starting from x going to y. Otherwise, we construct paths 
starting from y going to x.  In what follows, to simplify the analysis, we will always consider that the 
fault-free path is from group 1, and we denote this path  P* . The other case is similar (symmetric). 
Case 2.1. The column Cm is fault-free: There is at least one fault-free path from x to y, which starts at 
x,  P*  to reach x*

 m
, shortest path in C

m to reach y (x*
 m  represents xS

 m, x m or xL
 m  depending on which 

of the three paths of group 1 corresponds to P* ). The length of the constructed fault-free path between 
x and y is at most  L+2 ≤ k+2. 
Case 2.2. The column Cm contains 5 faults: The following path named Short_rc (rc stands for rows 
then columns): x, xS

0
,  shortest path in C

0 to reach  y0, shortest path in this row to reach ym-1 , y.  This 
path is fault-free of  length L ≤ k+2. 
Case 2.3. The number of faults in Cm  is 4. We use the path Short_rc, if it is fault-free. Otherwise 
Short_rc contains the fifth fault and there are no more faults outside C

m. An alternative way is to start 
from x, P*  to reach x*

 m
 , x*

 m+1 , shortest path on Cm+1 to reach ym+1 , y. The length of the constructed 
fault-free path using B1 is L+4.  See Figure 5. 
Remark 1.  If  L ≤ k-2, then the length of the constructed path is less than or equal to k+2.  Otherwise, 
(L= k-1, or k), we construct another path as follows:   x, xk-1,  xk-2,  longest path on this row to reach 
xm+1 ,  shortest path in Cm+1 to reach y m+1, y.  It is simple to see that the length of this fault-free path in 
all cases (L >  k-2) is less than or equal to k+2. 


ABDEREZAK TOUZENE AND KHALED DAY  

 58

Case 2.4. The number of faults in Cm  is 1 (4 faults outside). 
Case 2.4.1.  The nodes  yS

m , yL
m  are both not faulty:  We consider the 5  following paths: 

• A1: x, P*  to reach x* m,   x* m+1, shortest path in Cm+1 to reach ym+1, y (path length L+4). 
• A2: x, P*  to reach x* m-1, shortest path in Cm-1 to reach ySm-1, ySm, y  (path length L+2). 
• A3: x, P*  to reach x* m-2, shortest path in Cm-2 to reach  ym-2, ym-1  , y (path length L+2). 
• A4: x, P*  to reach x* m-3, shortest path in Cm-3 to reach yLm-3, yLm-2,yLm-1, yLm, y  (path length L+4). 
• A5: x, xS 0 , shortest path in C0 to reach yLL0, shortest path in this row to reach  yLLm,yLm , y (path 
length L+4 ). 

These five paths are node-disjoint or the common nodes are fault-free. At least one among the five 
path will be fault-free.  If  the fault-free path is of length L+4,  then if L ≤ k-2, the considered path will 
have a length ≤ k+2. Otherwise, we construct an alternative path using a similar technique  as 
described in remark 1.  See Figure 5. 
Remark 2. The existence of the paths A4, A3, A2 is subject to the condition: the column distance c 
between C0  and Cm is greater or equal to 4. Otherwise, we consider the extreme case where no one of 
these paths exists (c=1).  Since there is only one fault in Cm , if the fault belongs to the longest path 
between xm  and y in Cm, then  the path x, xm , shortest path in Cm  to reach y, will be fault-free.  
Otherwise, the longest path between xm  and y  in Cm will be fault-free and we construct five paths as 
follows: 

• 1: x, x0L,  PL  to reach xL m,   xL m+1, shortest path in Cm+1 to reach ySm+1, ySm, y.  
• 2: x, x0L,  PL  to reach xL m,  longest path in Cm  to reach  y. 
• 3: x, x 0L,  PL  to reach xL m,  xLL m,   xLL m+1,   longest path in Cm+1  to reach ym+1,  y. 
• 4: x, x0S,   shortest path in C0 to reach y0, shortest path in this row to reach  y.  
• 5: x, x0L,  longest path in C0  to reach yL0, shortest path in this row to reach  y. 

These five paths are node disjoint or the common nodes are fault-free. If we denote by  r the row 
distance between x and y , the longest path will be of length k-r+1 ≤  k+2. See Figure 6. 
Case 2.4.2. Node  yS

m  is  faulty. The case where yL
m  is  faulty is trivial, because the shortest path 

between x*
 m

 and  y  will be fault-free and of maximum length of  k+2.   
We construct the following four paths: 

• C1: x, P*  to reach x* m,   x* m+1, shortest path in Cm+1  to reach ym+1, y (path length L+4). 
• C2: x, P*  to reach x* m-1, shortest path in Cm-1 to reach ym-1, y (path length L+2 ). 
• C3: x, P*  to reach x* m-2, shortest path in Cm-2 to reach  yLm-2, yLm-1, yLm, y (path length L+4). 
• C4: x, xS0, shortest path in C0  to  reach  yLL0 , shortest path in this row to reach yLLm, yLm, y (path 
length L+4). 

If any one of the above paths is fault-free, the problem is solved. Otherwise, all the faults are 
consumed. We focus on the fault of  path C4: If the fault belongs to the shortest path within C0 , an 
alternative path (disjoint with C1, C2, C3) will be x, x1, shortest path in C

1  to reach  yLL
1 , shortest path 

in this row to reach yLL
m, yL

m,  y (path length ≤ L+4 ).  If the faulty node is not in C0 , an alternative 
path will be : C4’:  x, xS, shortest path in C

0  to reach  yLL
0 ,  yLLL

0  (successor of yLL
0 ), shortest path in 

this row to reach yLLL
m, yLL

m,  yL
m,  y (path length L+6).   If L  ≤  k-4, this path is fault-free of length ≤  

k+2.   See Figure 7 (a). 
Remark 3. If  L > k-4, we need to construct another path using longest path which are not much 
longer than the shortest paths (see remark 1 for  L= k or k-s). Let us decompose L=  r+c, where r 
(respectively c) is the row distance (respectively the column distance) between x and y. If we focus on 
the case L= k-3, there are two cases: 

• r=
 

/ 2k⎢ ⎥⎣ ⎦  and c= / 2 3k⎢ ⎥ −⎣ ⎦ , the alternative path will be Pr: x, x L
0 , longest path in  C0 to 

reach yLLL
0, shortest path in this row to reach yLLL

m, yLL
m, yL

m, y. (path length=k-2 < k+2).  If 


CONDITIONAL FAULT-DIAMETER OF TORUS NETWORKS  

 59 

c= / 2k⎢ ⎥⎣ ⎦  
and r= / 2 3k −⎢ ⎥⎣ ⎦ , the alternative path will be Pc: x, x 

k-1, longest  path that row to reach  
xm+2, shortest  path in Cm+2  to reach yL

m+2, yL
m+1, yL

m,  y. (path length=k-2 < k+2) 
 
                                              1−kC 0C   Cm  Cm+1 

 
 x 
 
 
1 
 4 
 
 
 y 
  
 
  5 2 3 

 
Figure 6.  Distance between C0  and Cm is 1 . 

 
• r= / 2 1k⎢ ⎥ −⎣ ⎦ and c= / 2 2k⎢ ⎥ −⎣ ⎦ , the alternative path is Pr (path length= k-1 < k+2).   Pc will 
be used for the reversed case. 

A similar technique leads to the construction of a fault-free path between x and y, of length ≤  k+2, 
when L= k-2. 
Case 2.5. The number of faults in Cm  is 2 or 3: 
Case 2.5.1. The nodes  yS

m , yL
m  are both not faulty or only one of them is faulty: Similar  to 2.4.  

Case 2.5.2. The nodes  yS
m , yL

m  are both faulty: The nodes ym-1 , ym+1 could not be both faulty at the 
same time (forbidden sets assumption). 
Case 2.5.2.1. If the node ym-1 is not faulty. We consider the paths  C1, C2, A3 and  

• D1 : x, xS  , shortest path in  C0  to  reach   yL0 , shortest path in this row to reach   yLm-1,  ym-1,  y  
(path length L+2 ). 

Note that the three paths A3, C2 , D1 have only one common node (ym-1).  If  there are 3 faults in Cm, 
then two among the four above paths will be fault-free.   
If there are only two faults in Cm, then only one among the four above paths will be fault-free.   
Case 2.5.2.2.  If   the node ym+1 is fault-free (ym-1 is faulty), we consider the paths  C1 and 

• D2: x, xS, shortest path in C0 to  reach  yLL0 , shortest path in this row to reach yLLm+1, yLm+1, ym+1, 
y. (path length L+6 ) 

If C1 is fault-free, the problem is solved.  Otherwise, we use the following path: 
• D3: is a modification of the path C1, bridging the faulty node, using the nodes of Cm+2, then 

back to C1 (path length L+6 ). 
In the presence of  three faults in Cm, D2 and D3 will be fault-free.  If there are only two  faults in Cm, 
then only D2 or D3  will be fault-free.  The longest path (L+6) will be acceptable when  L ≤  k-4.   
Otherwise, an alternative path of length less than k+2 will be derived using the same technique as in 
remark 3. See Figure 7 (b). 

 
ABDEREZAK TOUZENE AND KHALED DAY  

 60

    1−kC  0C               Cm-2 Cm-1 Cm  Cm+1   1−kC  0C             Cm-2 Cm-1 Cm  Cm+1 
 
 P* P* 
 
 x  x 
 
 
      C4              C3 C2 C1                                                                                   D3 
 
 
 y y 
                                                                                                   D1  
 D2 
 
 C4’ 
  
                           (a)    (b) 
 

Figure 7.  Paths for cases 2.4.2 and 2.5. 

4.     Fault-Tolerant Routing 

We now illustrate the usefulness of the fault-free paths constructions of the previous section by 
describing two fault-tolerant routing algorithms that make use of these paths. Fault-tolerant routing 
has drawn a lot of attention in the literature (Suh et al. 2000; Chen et al. 2001; Ho et al. 2002; Wu 
2003). Different ways of achieving fault-tolerant routing have been proposed. Some methods require 
extra hardware resources to route packets around faulty components while others simply bypass faulty 
components, as well as some healthy components, to maintain network regularity such as in (IBM 
2002). Another technique is based on reconfiguring the routing tables in the case of failure, adapting 
them to the new topology after the failure such as in Casado et al. (2001), Pinkston et al. (2003) and 
Lysne et al. (2003). A more recently proposed methodology uses intermediate nodes (Gomez et al. 
2004). The methodology is valid for any network topology. In Nordbotten et al. (2004), the idea of 
using intermediate nodes is extended to using multiple intermediate nodes for some paths. 

In what follows, we describe two fault-tolerant routing algorithms for the square torus. It has 
been shown in the previous section that for any two nodes X and Y at distance L, we are able to 
construct between these two nodes a fault-free path of length at most L+6  ≤  k+2 (assuming at most 
five faulty nodes satisfying the forbidden faulty set condition). We propose the following two fault-
tolerant routing algorithms: 

4.1    Parallel Routing Algorithm  

Parallel routing algorithm has been used to design fault tolerant routing for the star graph in 
Rouskov et al. (1996).  In our case, the basic idea is first to identify the case (see section 3) according 
to the location of X and Y in the torus and then apply the described construction of the parallel paths 
between X and Y. The routing process can send a message in parallel on the constructed vertex-
disjoint paths to guarantee delivery because at least one of these paths is not faulty. This routing 
process is known as parallel routing algorithm. It consists of generating all the possible paths of 
interesting length between the source and the destination nodes. The generation of the routes results 


CONDITIONAL FAULT-DIAMETER OF TORUS NETWORKS  

 61 

from the union of all the paths generated for each case described in the previous section. As an 
example, if the source node X and the destination node Y do not belong to the same row and not to the 
same column, all the paths of case two will be generated. The generation of all paths of the case is 
necessary because the distribution of the faults is not known. Each fault distribution may correspond 
to a sub-case of the given case and thus to a set of paths to cover the corresponding fault distribution.  
The disadvantage of this routing algorithm is the communication overhead due to the presence of 
multiple copies of the same message in the network. If a wormhole routing model is used, this 
algorithm may increase link contention problem and thus the communication delay.   

4.2    Single Path Fault-Tolerant Routing Algorithm 

It is possible to achieve lower cost fault-tolerant routing using a single routing path but at the 
expense of additional failure detection and routing path reconfiguration mechanisms. A similar 
approach has been used in Day and Al-Ayyoub (2001) for the star graph. We can associate with each 
source node X and destination node Y, at any time, only one path among all the possible paths 
generated in the cases of the previous section.  This path is called the active routing path.  In our 
routing algorithm the choice of the active routing path starts by simply selecting the shortest path 
among the generated ones (cases) to ensure high communication efficiency. In fact, this routing 
approach requires link failure detection mechanisms and a mechanism for switching to a new active 
routing path for a given source node when a link failure is detected on its current active routing path. 
First we make the following failure assumptions: 
1. Lower level fault detection and diagnosis mechanisms exist. Each node knows exactly the failure 

status of all its adjacent links. Each node's information about the status of its adjacent links is 
updated dynamically via these fault detection and diagnosis mechanisms. 

2. Link failure and repairs may happen at any time, but the total number of faulty links does not 
exceed 5 under forbidden faulty set condition at any time. 

3. We also assume that the delay for detecting an adjacent link failure is negligible. 
A routing initiated at a source node X continues to follow the edges of its current active routing path 
(ARP), until a link failure is detected. The node that has detected the failure will report it to the source 
node X using the ARP backward. The source node will switch to another ARP chosen from the 
remaining possible paths (the shortest among them) and then resends the message using the new ARP.  

This approach guarantees message delivery; in the worst case it may use all the possible ARPs 
(at least one among them is fault-free). This approach is efficient because it uses shortest paths as 
much as possible and it reduces considerable link contentions under heavy load network condition. 

5.    Conclusion 

We have contributed in this paper to the study of the fault-tolerance of the square torus 
interconnection network by establishing its conditional fault-diameter under the condition of 
forbidden faulty sets (i.e., assuming that each non-faulty processor has at least one non-faulty 
neighbor). We have shown that under this condition the k-torus, whose connectivity is 4, can tolerate 
up to 5 faulty nodes without becoming disconnected. The conditional node connectivity in this case is 
therefore 6.  We have also shown that the conditional fault-diameter of the k-torus is 2k + . With this 
result the torus joins a group of interconnection networks (including the hypercube and the star-graph) 
whose conditional fault-diameter has been shown to be only two units over the fault-free diameter.  
We have also illustrated the usefulness of our results by describing two possible fault tolerant routing 
algorithms for the torus under the forbidden faulty set condition. 


ABDEREZAK TOUZENE AND KHALED DAY  

 62

6.    References 

CASADO, R., BERMUDEZ, A., DUATO, J., QUILES, F.J., and SANCHEZ, J.L. 2001. A Protocol 
for Deadlock-Free Dynamic Reconfiguration in High Speed Local Area Networks, IEEE 
Transactions on Parallel and Distributed Systems, 12(2): 115-132.  

CHEN, C.L., and CHIU, G.M. 2001. A Fault-Tolerant Routing Scheme for Meshes with Non Convex 
Faults," IEEE Transactions on Parallel and Distributed Systems, 12(5): 467-475.  

DAY, K., and AL-AYYOUB, A.E. 1997. Fault Diameter of k-ary n-cube Networks, IEEE Trans. on 
Parallel and Distributed Systems, 8(9): 903-907.   

DAY, K., and AL-AYYOUB, A.E. 2001. Adaptive Fault-Tolerant Routing in Star Networks, Journal 
of Interconnection Networks, 2(2): 213-231.  

ESFAHANIAN, A.H. 1989. Generalized Measures of Fault Tolerance with Application to n-Cube 
Networks, IEEE Transactions on Computers, 38(11): 1586-1591.  

GOMEZ, M., DUATO, J., FLICH, J., LOPEZ, P., ROBLES, A., NORDBOTTEN, N., LYSNE, O., 
and SKEIE, T. 2004. An Efficient Fault-Tolerant Routing Methodology for Meshes and Tori, 
Computer Architecture Letters, 3, May. 

HO, C.T., and STOCKMEYER, L.  2002. A New Approach to Fault-Tolerant Wormhole Routing for 
Mesh-Connected Parallel Computers, Proceedings of 16th International Parallel and Distributed 
Processing Symposium, April. 

IBM BG/L TEAM, An Overview of BlueGene/L Supercomputer,  ACM Supercomputing Conference, 
2002. 

LATIFI, S. 1993. Combinatorial Analysis of Fault-Diameter of the n-Cube, IEEE Transactions on 
Computers, 42(1): 27-33.   

LATIFI, S., HEGDE, M., and NARAGHI-POUR, M. 1994. Conditional Connectivity Measures for 
large Multiprocessor Systems, IEEE Trans. on Computers, 43(2): 218-222.  

LYSNE, O., PINKSTON, T., and DUATO, J. 2003. A Methodology for Developing Dynamic 
Network Reconfiguration Processes, Proceedings of the International Conference on Parallel 
Processing, pp. 77-86. 

NORDBOTTEN, N., GOMEZ, M., FLICH, J., LOPEZ, P., ROBLES, A., SKEIE, T., LYSNE, O., and 
DUSTO, J. 2004. A Fully Adaptive Fault-Tolerant Routing Methodology Based on Intermediate 
Nodes, Proceedings of the IFIP International Conference on Network and Parallel Computing, 
pp. 341-356. 

PINKSTON, T., PANG, R., and DUATO, J. 2003. Deadlock-Free Dynamic Reconfiguration Schemes 
for Increased Network Dependability, IEEE Transactions on Parallel and Distributed Systems, 
Vol. 14(8): 780-794.  

ROUSKOV, Y., LATIFI, S., and SRIMANI, P.K. 1996. Conditional Fault Diameter of Star Graph 
Networks, Journal of Parallel and Distributed Computing, 33: 91-97.   

SUH, Y., DAO, B., DUATO, J., and YALAMANCHILI, S. 2000. Software-Based Rerouting for 
Fault-Tolerant Pipelined Communication, IEEE Transactions on Parallel and Distributed 
Systems, 11(3): 193-211.  

WU,  J. 1998. Fault Tolerance Measures for m-Ary n-Dimensional Hypercube Based on Forbidden 
Faulty Sets, IEEE Transactions on Computers, 47(8): 888-983.   

WU, J. 2003. Fault-Tolerant and Deadlock-Free Routing in 2D Meshes Based on Odd-Even Turn 
Model, IEEE Transactions on Computers, 52, No. 9, September. 

 
Received   16 June 2004     
Accepted   10 July 2005