Instruction


FACTA UNIVERSITATIS  

Series: Electronics and Energetics Vol. 29, No 2, June 2016, pp. 309 - 323 

DOI: 10.2298/FUEE1602309S 

IN-CHANNEL MISROUTING SUPPRESSION TECHNIQUE 

FOR DEFLECTION-ROUTED NETWORKS ON CHIP

 

Igor Z. Stojanovic, Goran Lj. Djordjevic 

Faculty of Electronic Engineering, University of Niš, Serbia 

Abstract. Deflection routing, where port-contentions in routers are resolved by intentionally 

misrouting some of packets along unwanted directions instead of storing them, has been 

proposed as a promising approach for improving power and area efficiency of large-scale 

networks on chip (NoCs). However, at high network load, when packets are misrouted more 

frequently, the cost and energy benefits of this simple routing scheme are offset by the 

performance degradation. To address this problem, we propose a technique that uses small 

in-channel buffers to capture some of deflected packets before they take a misrouting hop. 

The captured packets are then looped-back to the routers where they suffered deflection and 

routed again. To improve the efficiency of this in-channel misrouting suppression scheme 

we also slightly modify the routing function of the deflection router by restricting the 

choice of productive directions for misrouted packets. Evaluations on synthetic traffic 

patterns show that the proposed misrouting suppression mechanism yields an improvement 

of 36.2% in network saturation throughput when implemented into the conventional 

deflection-routed network. 

Key words: Network-on-chip, multi-core, deflection routing, misrouting suppression. 

1. INTRODUCTION 

Network-on-chip (NoC) has been proposed as an efficient and scalable solution to the 

challenging on-chip interconnection problems in modern many-core systems on chip 

(SoCs). To accommodate the communication needs of tens or even hundreds of processing 

elements (PEs) integrated on a single chip, this architecture employs dedicated routers 

interconnected by some form of network topology. NoCs typically use wormhole routing 

with virtual channel (wormhole/VC) flow control to route data packets from the source to 

the destination PE. This flow control scheme enables deadlock avoidance, optimize 

channel utilization, improve performance and provide quality of service [1, 2]. Although 

wormhole/VC routing needs considerably less amount of buffer storage then other traditional 

flow control schemes (e.g. virtual cut-through and store-and-forward), the in-router buffers 

are still a significant source of area and energy overhead. For a static random access memory 

                                                           
Received June 3, 2015; received in revised form August 3, 2015 

Corresponding author: igor.stojanovic@elfak.ni.ac.rs 

Faculty of Electronic Engineering, University of Niš, A. Medvedeva 14, 18000 Niš, Serbia 

(e-mail: mita@iritel.com) 


310 I. Z. STOJANOVIC, G. LJ. DJORDJEVIC 

(SRAM) buffer implementation, the input buffers can consume 46% of the total on-chip 

network power while occupying 17% of the total area [3]. To address the issue, several 

bufferless NoC architectures have recently been proposed. In these architectures, in-router 

buffers are removed and contentions among packets are handled by employing the 

deflection routing [4-13]. 

With deflection routing, data packets are divided into flits (flow control units) which 

are then routed independently through the network and reassembled at their destination. 

Flits arrive synchronously on the router’s input ports, and each flit is routed via the output 

port that offers the shortest path to its destination. When two incoming flits require the 

same output port, the router deflects one of the flits to an alternative output port (this is 

always possible as long as the router has as many outgoing as incoming ports). In this 

way, port contentions cause flits to be misrouted temporarily, in contrast with the 

wormhole/VC scheme where such flits must be buffered. 

Deflection routing has several advantages over wormhole/VC scheme. First, since the 

number of incoming ports is equal to the number of outgoing ports, and flits move 

between routers synchronously, deadlock cannot occur. The adaptive nature of deflection 

routing also enables hot spots avoidance and provides fault-tolerance in the network [4]. 

This approach also eliminates the need for backward status links to implement flow 

control, and thus the design of the router is greatly simplified. Finally, the deflection 

routing permits the use of as few as one flit-wide register per inter-router link, thereby 

realizing significant savings in hardware cost and power consumption over wormhole/VC 

NoCs, which must provide ample buffers in each router. Recent studies have shown that 

in the deflection-routed NoCs, the power consumption is reduced by 20-40%, and the 

router area on die is reduced by 40-75% [6]. 

Deflection routers target mainly low-latency operation at low network load [5]. Under 

such load conditions, deflections are rare so that flits rapidly advance toward their 

destinations over shortest paths. On the other hand, under high load, frequent deflections 

might cause flits to deviate significantly from their shortest paths, leading to early saturation 

and poor energy efficiency. The issue of limited maximum throughput of deflection-routed 

networks has been addressed by several prior works. One line of research is aimed at 

improving the design of router’s port allocator and switching (PAS) stage. Within this 

stage, input flits are first permuted and then passed to the router’s output ports so that as 

many flits as possible are directed toward their desired directions. BLESS router uses the 

PAS stage composed of a 44 crossbar switch controlled by an allocator unit that arbitrates 

the flits to output ports based on oldest-first arbitration policy [6]. The full priority ordering 

of flits results in fewer deflections, but it incurs a long critical path delay, thus limiting router 

operation to low clock frequencies. CHIPPER router speeds up the critical path of the router 

by replacing the crossbar with a two-stage permutation network composed of four 

independently controlled 22 switch modules [7]. However, the simplicity of this design results 

in an increased deflection rate, and consequently lowers the maximum network throughput. 

Another line of research deals with techniques for reducing the overhead of flit 

deflection. Such misrouting suppression mechanisms try to prevent deflected flit to take a 

misrouting hop by temporary holding the flit at its current route position. The minimally 

buffered deflection router (MinBD) achieves the misrouting suppression by a small side-

buffer attached between the output and the input of the router’s PAS stage [8]. At each 

clock cycle, the side-buffer can accept up to one of deflected flits from PAS output, and 


 In-channel Misrouting Suppression Technique 311 

resubmit that flit to the PAS input at some later cycle. By preventing a fraction of deflected flits 

to leave the router, this technique significantly improves the maximum network throughput. 

However, it also introduces the contention between the buffered flits and the new flits waiting 

for injection, which can cause the injection unfairness among routers in a highly loaded 

network. In our previous work, we proposed an in-channel misrouting suppression technique, 

referred to as the dual-mode channel, which uses a lightweight link-control mechanism to force 

deflected flits, when possible, to loop-back to their current routers instead of being misrouted 

[9]. This simple and effective method improves performances without compromising the 

injection fairness, but the obtained maximum network throughput is lower than that obtained 

with the side-buffering technique. 

In this paper, we further improve the misrouting suppression efficiency of the dual-

mode channel by adding small buffers at both ends of the channel. These buffers temporary 

store deflected flits that cannot be looped-back during the same clock cycle when they are 

entering the channel. Also, we slightly modify the routing function of the baseline deflection 

router to remove the tendency of misrouted flits to take immediate reverse hops. This 

modification is motivated by our observation that such hops have an adverse effect on how 

often the channel is able to loop-back the deflected flits. When combined, the proposed 

mechanisms suppress more than 50% of misrouting hops, raising the maximum throughput 

by 36.2% with respect to the baseline deflection-routed network. The throughput improvement 

is 8.7% higher than with the side-buffering technique, and is achieved without compromising 

the injection fairness in the network.  

The remainder of the paper is organized as follows. Section 2 provides a background 

on deflection routing including the overview of two representative misrouting suppression 

techniques: the side-buffering and the dual-mode channel. Section 3 presents the novel 

misrouting suppression scheme for deflection-routed NoCs. In Section 4, evaluation and 

results are presented. Section 5 concludes this paper. 

2. DEFLECTION-ROUTED NOC ARCHITECTURE OVERVIEW 

In this section, we first provide a generic model of deflection-routed NoC architecture, 

which includes only the essential features reported in several previous proposals [5-13]. In 

particular, we consider a network of 2D mesh topology composed of non-pipelined (i.e. 

combinational) deflection routers connected by synchronous bidirectional communication 

channels. Then we also discuss two existing techniques to improve the performance of the 

baseline deflection-routed network via misrouting suppression. 

2.1. Baseline 2D mesh deflection network 

Figure 1 illustrates the fundamental elements of a generic 2D mesh deflection-routed 

NoC. The NoC is constructed as a grid of routers where each router is connected by 

bidirectional communication channels only to its neighbors. Each router is also connected 

to a local PE, which serves as a source and sink for data packets. Before being injected to 

the router, packets are split into smaller flow control units, so called flits, and each flit is 

routed independently through the network. In the most basic form, the deflection router is 

a pure combinational logic module, which directs the incoming flits from the input ports 

to the proper output ports. The inter-router communication channel includes a pair of 


312 I. Z. STOJANOVIC, G. LJ. DJORDJEVIC 

oppositely oriented flit-wide edge-triggered registers. Since there are no in-router buffers, 

these so-called flit-registers are the only memory elements for storing flits in transit. 

Therefore, during traveling towards their destinations, flits are always on the move, by 

hopping between the flit-registers and propagating through the routers. 

x,y EW

N

S

flit register

x-1,y x-1,y+1

x,y+1

PE

PE

PE

PE

 
Fig. 1 2D mesh deflection-routed NoC architecture 

Routers attempt to route each flit along a shortest path to its destination. A router 

forwards a flit through a productive output port in a productive direction if the distance 

between the current flit position and its destination decreases. In 2D mesh network, when 

a flit reaches a router, there are at most two productive directions (i.e. output ports) to its 

destination. If the router is not able to grant the productive output port, the flit is deflected 

to any free but non-productive output port. Deflection occurs within the internal router 

structure when multiple incoming flits contend for the same output port. On the other hand, 

the term misrouting refers to an external manifestation of the flit deflection. It corresponds to 

a transfer of a deflected flit over the inter-router channel one hop further in a non-productive 

direction. The cost of misrouting is two clock cycles since each non-productive hop must be 

compensated by one productive hop in the opposite direction. Let note that in the baseline 

deflection-routed network, every flit deflection leads to a flit misrouting. 

E I

C1

C2

C3

C4

PAS

Nin
Ein
Sin

Win

Pin Pout

R

Sout
Nout
Eout
Wout

 
Nout

Eout

Sout

Wout

C1
C2

C3
C4  

a) b) 

Fig. 2 Architecture of baseline deflection router:  

a) internal structure, and b) PAS based on permutation network 


 In-channel Misrouting Suppression Technique 313 

Figure 2a shows the architecture of the deflection router with four pairs of input and 

output network ports (denoted as N - North, S - South, W - West and E - East) and a pair of 

inject and eject ports (denoted as Pin and Pout) which are connected to the local PE. The 

router is composed of four consecutive stages: the routing stage (R), the eject stage (E), 

the inject stage (I), and the port allocation and switching stage (PAS). Through these 

stages, four internal flit-channels, C1, ..., C4, are established to guide flits from the set of 

input to the set of output ports. The routing stage associates a set of productive ports to 

each incoming flit. The routing function is based on offsets in X and Y dimensions 

between the current router and the flit’s destination router. The number of productive 

ports assigned to a flit can be: 0 (flit is addressed to the local PE, i.e. both X- and Y-offset 

are zero), 1 (flit is already at one of the axes of its final destination, i.e. either X- or Y-

offset is zero) or 2 (both X- and Y-offset are different than zero). The eject stage picks 

randomly one of locally-addressed flits (if any), and directs that flit to the local PE. The 

inject stage detects the presence of a free flit-channel and directs the new flit (generated 

by the local PE) to that channel. If the new flit is not injected into the network because all 

flit-channels are occupied, then that flit remains in the PE’s transmission queue and is 

resubmitted in the next clock cycle. The PAS stage permutes and passes the flits from flit-

channels to output network ports. Here, we adopt a PAS stage introduced in CHIPPER 

router [7], which consists of four two-input switch modules arranged into two stages (Fig. 

2b). Each switch module is controlled by an arbitration logic which first, decides the 

winner between two flits, and then, sends the winning flit toward its productive output 

port. The losing flit is directed to the other output of the module. The winner between two 

input flits is determined according to the silver-flit arbitration policy [8]. In this arbitration 

scheme, a single randomly selected flit is designated as a silver flit, i.e. it is prioritized 

above the others. The silver flit always wins in arbitration. The winner between any two 

non-silver flits is decided randomly. 

2.2. Misrouting suppression techniques 

The term misrouting suppression refers to any technique for reducing delay overhead 

incurred by flit deflection in deflection-routed networks [9]. These mechanisms cannot 

cancel flit deflection, which occurs within the PAS stage of the router, but they can 

recognize a deflected flit and force it to temporary stay at its current route position instead 

of making a non-productive hop. The misrouting suppression can be implemented either 

within the deflection router or within the inter-router communication channel. 

 
FRA

FRB

Router
B

Router
A

straight-through loop-back  
a) b) 

Fig. 3 Misrouting suppression techniques: a) in-router misrouting suppression with  

side-buffer, and b) in-channel misrouting suppression with dual-mode channel 


314 I. Z. STOJANOVIC, G. LJ. DJORDJEVIC 

Side-buffering. The side-buffering [8] is an in-router misrouting suppression technique 

which uses a small buffer memory (so-called side buffer) attached to each router to buffer 

some deflected flits that otherwise would be misrouted. The side buffer can be implemented 

either as a single flit-register or as a small-size FIFO (composed of several flit-registers). As 

shown in Fig. 3a, the side buffer (SB) is attached to the deflection router via two additional 

stages: the buffer-eject stage (BE) and the buffer-inject stage (BI). The BE stage recognizes 

deflected flits at the output of the PAS stage, and puts one of them into the side buffer if the 

side buffer is not full. This flit is picked randomly among the deflected flits. The buffered flit 

will be re-ejected through the BI stage in some later clock cycle, when there is a free flit-

channel after flit ejection. 

Previous studies have shown that even adding the smallest side-buffer (1-flit in size) 

can reduce the misrouting rate by 50%, and can improve the maximum network throughput 

by 26% [8]. However, the studies have also shown that the performance improvement of this 

technique does not scale with the increasing side-buffer size because increasing the buffer 

size over 2 flits leads to only marginal performance gain. More importantly, as pointed out in 

[9], the presence of side buffers can cause an imbalance between the injection and ejection 

bandwidth available to PEs in the areas of the network congested with in-transit traffic. This 

occurs because of the arrangement of stages within the side-buffered deflection router, 

which gives injection precedence to the flit residing in the side-buffer over the new flit 

waiting at the PE inject port. When the router is overloaded with in-transit flits, a free flit-

channel appears rarely and is occupied by buffered flit in most cases, leaving the new flit 

to wait for another chance. 

Dual-mode channel. The dual-mode channel is an in-channel misrouting suppression 

technique which prevents some non-productive network hops by forcing deflected flits, 

when possible, to loop-back to their current routers instead of being misrouted [9]. The 

datapath for this design is shown in Fig. 3b. The approach is based on enhancing the inter-

router communication channel with the capability to dynamically (i.e. on a cycle-by-cycle 

basis) switches between two modes of operation. If deflected flits are present on both ends of 

the channel, or one flit is deflected and the other one is absent, then the channel activates the 

loop-back mode (indicated by dotted lines in Fig. 3b). In this mode, the flits are returned 

back to the corresponding input ports of their current routers. Otherwise, the channel is 

configured in the straight-through mode (indicated by dashed lines) allowing both flits to 

make one network hop. With this scheme, a deflected flit will be misrouted only if there is a 

productively-routed flit on the opposite side of the channel. In all other cases, the deflected 

flit will stay at its current route position. It is important to note that the loop-back mechanism 

is transparent for productively-routed flits, which flow as in a network with the conventional 

inter-route channels. 

Our previous simulation results show that this simple in-channel misrouting suppression 

mechanism offers 14.3% performance improvement in terms of maximum network 

throughput when implemented in the baseline deflection-routed NoC [9]. The improvement 

is smaller when comparing with the side-buffering technique, but is accomplished with lower 

implementation cost (i.e. there is no need for additional buffer memory) and without any 

modification to the underlying router microarchitecture. An important advantage of the dual-

mode channel approach over the side-buffering is that it preserves the injection fairness in 

the network. 


 In-channel Misrouting Suppression Technique 315 

3. MISROUTING SUPPRESSION WITH IN-CHANNEL BUFFERING 

The limited misrouting suppression efficiency of the dual-mode channel is a consequence 

of the fact that the channel cannot save a deflected flit from misrouting if a productively-

routed flit is present on the opposite end of the channel. Under high traffic, when the inter-

router channels are almost fully utilized, the loop-back mode can only be activated when 

both ends of the channel are occupied by deflected flits, which occur rarely. In this section 

we propose two techniques to mitigate the performance limitation of the dual-mode channel. 

The first one relates to modifying the routing function of the baseline deflection router with 

goal to increase the frequency of simultaneous appearance of deflected flits at both sides of 

the channel. The second technique deals with adding a small in-channel buffer memory for 

temporary storing deflected flits that cannot be looped-back immediately. 

3.1. Optimized routing function 

According to the results of our simulation experimentation with 2D mesh deflection 

networks under saturated load with uniform random traffic pattern, a deflected flit appears 

at a router’s output port with the probability of δ = 0.3. Assuming that flit deflections 

occur in neighboring routers independent, the probability that both sides of an inter-router 

channel are fed with deflected flits should be δ
2
 = 0.09. However, the simulation results 

show that this probability is actually 0.05. That is, the loop-backs in dual-mode channels 

occur less frequently than would be expected. 

A closer examination of the patterns of inter-router communication reviles that the 

discrepancy between expected and measured loop-back probability is caused by the 

tendency of the misrouted flits to return back to the routers wherein they have suffered 

deflection during the previous clock cycle. Suppose that a flit f is deflected in a router A 

and then misrouted to a router B over channel CAB. Upon arriving at router B, the flit f is 

assigned with at most two productive ports. Because flit f is misrouted, one of its productive 

ports must be the port through which it just has entered the router B. Therefore, during the 

next clock cycle, there is a high chance that flit f will be returned back to the router A over 

the channel CAB, but now as a productively-routed flit, thus forcing the straight-through 

configuration of the dual-mode channel. If happens that router A sends deflected flit to 

channel CAB during the next clock cycle, that flit will be misrouted, too. Thus, the net effect 

of such behavior is that the likelihood of flit misrouting depends on whether a flit sent by the 

same router over the same channel during the previous clock cycle was misrouted or not. 

In order to resolve this performance issue, we slightly modify the routing function of 

the baseline deflection router by restricting the choice of productive ports for misrouted 

flits. In particular, we extend the routing function of the deflection router with the 

following rule:  

Rule 1: Let flit f has entered a router A through the input port T  {N,S,E,W}, and let 

P  {N,S,E,W}be the set of productive output ports for flit f in router A. If the size of P is 

two, then remove T from P. 

Rule 1 only impacts the implementation of the routing stage of a deflection router 

(Fig. 3a). It is applied after the incoming flits are assigned with productive ports. If flit f 

has reached router A by a productive hop, then Rule 1 has no effect on the routing 

decision regarding f because T cannot be in P. Otherwise, if flit f has arrived at router A 


316 I. Z. STOJANOVIC, G. LJ. DJORDJEVIC 

by a misrouting hop, then port T must be in P. In this case, port T will be preserved in P if 

T is the only productive option for f. Otherwise, T is removed form P. Without T in the set 

of its productive ports, flit f will not be intentionally returned back to the previous router, 

unless it is deflected within the PAS stage of router A. It should be noted that Rule 1 does 

not preclude the possibility that a misrouted flit will be returned back to the previous 

router; it only decreases the likelihood of such event to occur. 

3.2. In-channel buffering 

The main motivation for using the in-channel buffers is to decouple the operations of 

the two sides of the dual-mode channel by enabling each side to buffer incoming deflected 

flits which cannot be looped-back immediately. Thus, instead of being misrouted to a 

neighboring router, the buffered deflected flit will be kept at its current route position 

until the condition for looping-back is met. When eventually looped-back to the router 

that has caused its deflection, the flit will get a new chance to continue traveling along a 

productive direction toward its destination. 

The datapath of the proposed inter-router channel with in-built flit-buffers is shown in 

Fig. 4a. In comparison to the dual-mode channel (Fig. 3(b)), the buffered channel contains 

two additional small-sized FIFO sections which parallel direct loop-back paths. With 

FIFOs included, the dual-mode channel is enhanced with several new options on how to 

handle the incoming and buffered flits. As indicated by dotted lines in Fig. 4a, the buffered 

channel can carry out one or more of ten different flit-transfer actions during each clock 

cycle. The choice of the actions depends on the routing statuses of the incoming flits as 

well as the statuses of the two FIFOs. The first set of options is for transferring of an 

incoming flit straight-through to the flit-register on the opposite side of the channel. If the 

incoming flit is productively-routed, this action leads to a productive hop (actions labelled 

as 1A/1B); otherwise, if the flit is deflected, the straight-through transfer causes a misrouting 

hop (2A/2B). The second set of options is those that keep an incoming flit on the same side 

of the channel. The flit loop-back action (3A/3B) allows an incoming flit to bypass the FIFO 

and immediately reach the flit-register (FRA/FRB) on the same side of the channel. The 

incoming flit can also be buffered (4A/4B), and a buffered flit can be looped-back (5A/5B). 

A C-like pseudo code describing the operation of the dual-mode channel with in-built 

buffers is shown in Fig. 4b. Consider the operation of the A-side part of the channel in 

more details. The B-side part operates analogously. The A-side part of the channel can be 

configured in either the straight-through or the loop-back mode. The straight-through 

mode moves the opposite-side flit, fB, into the A-side flit-register, FRA. In the loop-back 

mode, either the A-side incoming flit, fA, or the flit taken form FIFOA is written into the 

FRA. The straight-through mode is prioritized over the loop-back mode, and occurs in two 

distinct situations: when the flit fB is productively-routed (1B), and when the flit fB is 

deflected and must be misrouted (2B). The deflected flit fB is misrouted if there are no other 

options for handling that flit, i.e. the loop-back path of B-side is blocked by a productively-

routed flit fA and the FIFOB is full. Even if the A-side part of the channel is configured in the 

straight-through mode, a deflected flit fA can still be saved from misrouting by storing into 

FIFOA if FIFOA is not full (4A). If the A-side loop-back path is enabled, the flit-register 

FRA receives either a flit from FIFOA (5A) if FIFOA is not empty or an incoming flit fA, if 

that flit is deflected and FIFOA is empty. In the case of buffered loop-back action (5A), 


 In-channel Misrouting Suppression Technique 317 

the incoming flit fA, if deflected, is written into FIFOA (4A). It should be noted that a 

situation where both incoming flits are misrouted is not possible with this scheme. The 

critical case is one where both FIFOs are full, and both incoming flits, fA and fB, are 

deflected. According to the algorithm, in this case, both sides of the channel activate the 

buffered loop-back operation (5A/5B), which enables buffering of both flits (4A/4B) 

regardless of the current FIFOs statuses. 

Out

In
Out

In

(4A) buffering
(3A) flit loopback

(2A) misrouting hop

(1A) productive hop

(5A) 
buffered 
loopback

(4B)
(3B)
(2B)

(1B)

(5B)

fA

fB

FIFOB

FIFOA

FRA

FRB
Side B

Side A

Router
B

Router
A

 
a) 

 Side A:  Side B: 

 if(fB.P ||  
   fB.N && fA.P && FIFOB.Full){ 

 if(fA.P ||  
   fA.N && fB.P && FIFOA.Full){ 

1B/2B    FRA ← fB; 1A/2A    FRB ← fA; 

    if(fA.N && !FIFOA.Full){     if(fB.N && !FIFOB.Full){ 

4A       FIFOA ← fA; 4B       FIFOB ← fB; 

    }     } 

 } else if(!FIFOA.Empty){  } else if(!FIFOB.Empty){ 

5A    FRA ← FIFOA; 5B    FRB ← FIFOB; 

    if(fA.N){     if(fB.N){ 

4A       FIFOA ← fA; 4B       FIFOB ← fB; 

    }     } 

 } else if(fA.N){  } else if(fB.N){ 

3A    FRA ← fA; 3B    FRB ← fB; 

 }  } 

b) 

Fig. 4 Misrouting suppression with in-channel buffering: (a) datapath; (b) pseudo code. 
Notice: f.P is true if flit f is productively-routed; f.N is true if flit f is deflected;  

sign “←” denotes a register transfer operation. 

In-channel buffering vs. side-buffering. The rationale of using in-channel buffering is 

similar to that of using side-buffering – to buffer some deflected flits that otherwise would 

be misrouted. In difference to the side-buffering, which picks and buffers deflected flits 

before they leave the router, the in-channel buffers store deflected flits that have entered 


318 I. Z. STOJANOVIC, G. LJ. DJORDJEVIC 

the inter-router channel but cannot be looped-back immediately. By placing buffers within 

the channels instead of within the routers brings the following advantages. As opposite to 

the side-buffering that can accept up to one deflected flit per router per clock cycle, the 

buffered dual-mode channel can loop-back/store up to two deflected flits at each clock cycle. 

In a 2D mesh network with dimension of NN  , the number of routers is N
2
 and the number 

of inter-router channels is 2N
2 

- 2N. Because the number of inter-router channels is almost 

two times greater than the number of routers, the opportunities to capture deflected flits are 

more frequent with the in-channel buffering than with the side-buffering. Moreover, being 

stored outside the routers, the flits buffered into the in-channel FIFOs re-enter the routers via 

network ports, and consequently they do not block the new flits generated by PE to enter 

the router. In this way, the problem of injection unfairness is avoided. The minimum 

delay overhead of a deflected flit which is buffered into an in-channel FIFO is two clock 

cycles: the first cycle is used for buffering, and the second one for looping-back the 

buffered flit. Although the delay overhead is the same as in the case of misrouting, the in-

channel buffering is still beneficial since the buffered flit does not occupy the resources of 

the neighboring router. 

4. PERFORMANCE EVALUATIONS 

In order to evaluate the performance impact of the proposed misrouting suppression 

technique, we have developed a discrete-event, cycle-accurate simulator for modeling 

deflection-routed NoC using SystemC [14]. It provides support to experiment with 

deflection NoC with various options available, such as network topology and size, router/ 

channel architecture, buffer parameters, and traffic modelling. The simulator provides output 

performance metrics, such as latency, throughput, transport delay, and deflection rate for a 

given set of choices. The main building blocks of the simulator are: 1) processing element, 

2) deflection router, and 3) inter-router channel (IRC). The processing element block 

generates and injects flits into the network according to the user-specified configuration, 

including the traffic pattern and injection rate. It is also responsible for ejecting flits from the 

destination endpoints and collecting appropriate statistics. The router block mimics the 

behavior of the generic non-pipelined deflection router described in Section 2. It can be 

configured in the bufferless mode (i.e. without side-buffer) or the buffered mode (with side-

buffer of configurable size). The configuration options for the IRC block are the following: 

conventional channel (a pair of oppositely oriented flit-registers), dual-mode channel (see 

Fig 3b), and buffered channel (see Fig 4). The simulation results presented in this section 

are obtained for 2D mesh network with size of 88 nodes. The default buffer size in 

buffered architectures was set to 1 flit. Each simulation run was started with a warm-up 

period of 1,000 cycles followed by a measurement period of 20,000 cycles. 

4.1. Performance under saturation load 

The first set of evaluations was carried out in a saturation mode under uniform random 

traffic pattern. In this mode, the transmission queue of each PE is assumed to be always 

nonempty. Under such overloaded conditions, each PE injects a new flit into the network 

in every clock cycle when a free flit-channel is available in the router inject stage. The 


 In-channel Misrouting Suppression Technique 319 

injected flits are destined randomly to other PEs with an equal probability. A summary of 

the results is given Table 1.  

Table 1 Comparison of saturation performance of baseline deflection-routed NoC 

architecture and architectures with misrouting suppression support  

 th td h r r e 

Baseline 0.265 13.216 13.216 0.298 0.298 0 

Side-buffering 0.332 11.016 8.696 0.295 0.143 51.5% 

Dual-mode channel 0.303 11.555 10.889 0.298 0.240 19.36% 

In-channel buffering 0.361 14.541 8.144 0.305 0.145 52.3% 

The details of the performance measures reported in Table 1 are as follow. The 

saturation throughput (th) is defined as the average number of flits received per PE per 

clock cycle. It is the single most important network-level performance indicator, which 

being measured under saturation load provides an absolute limit reached by the 

throughput of a deflection-routed network. The transport delay (td) is the time, measured 

in clock cycles, elapsed from the instant when the source PE injects a flit to the network 

to the instant when the destination PE receives it. Both the saturation throughput and the 

transport delay are correlated with the average hop count (h), which is defined as the 

average number of hops (i.e. channels traverses) a flit takes from source to destination. 

The average hop count accounts for both productive and non-productive (i.e. misrouting) 

inter-router hops. In networks where deflected flits are misrouted more often, the average 

hop count is larger, and consequently the transport delay is longer and throughput is 

lower. Deflections occur within the routers due to inability of PAS stage to grant 

productive ports to all incoming flits. The tendency of the PAS stage to produce deflections 

is measured with the deflection rate (r), which is defined as r = nd  / nr, where nd  is the total 
number of deflected flits, and nr is the total number of flits that are processed by PAS stages 

of all routers during the simulation. Similarly, the misrouting rate (r) is defined as 

r = nm / nr, where nm is the total number of flits that are misrouted after deflection. The 

baseline deflection-routed network misroutes every deflected flit, thereby r = r . With a 
misrouting suppression mechanism implemented, not all deflected flits are misrouted. The 

misrouting suppression efficiency is defined as  e =((r  r) / r)100% . 
The results in Table 1 show that the implementation of misrouting suppression 

techniques brings a significant improvement in saturation throughput over the baseline 

architecture. The dual-mode channel, as the simplest misrouting suppression technique, 

raises the throughput by 14.3% over the baseline, while the improvement reaches 25.3% 

for the side-buffering technique. The highest throughput of 0.361 flits/cycle is achieved 

with the in-channel buffering, which represents an improvement of 36.2% over the 

baseline. 

In the baseline architecture, a flit takes 13.2 inter-router hops on average to reach its 

destinations. Misrouting suppression techniques decrease the average hop count (and thus 

increase the throughput) by temporary holding some of deflected flits at their current 

route positions. This way, in the network with dual-mode channels, the average hop count 

is reduced for 2.33 hops with respect to baseline, while the reduction for 4.52 hops has 

achieved with the side-buffering. As expected, the lowest average hop count of 8.14 hops 


320 I. Z. STOJANOVIC, G. LJ. DJORDJEVIC 

is achieved with the in-channel buffering, which represents a decrease of 5.07 inter-router 

hops per flit (or, 38.4%) with respect to the baseline. 

In the baseline architecture, the transport delay equals the average hop count because 

each hop (either productive or misrouting) takes one clock cycle. In the networks with a 

misrouting suppression support, the transport delay incurred by a flit is the sum of two 

components: the hop count and the time the flit spends blocked by a misrouting suppression 

mechanism. For example, each time the dual-mode channel activates the loop-back mode, it 

adds one clock cycle to the transport delay of the looped-back flits. However, since the loop-

back saves two hops, the total transport delay is lower than in the baseline NoC. In 

difference to the dual-mode channel, deflected flits captured by the side-buffering or in-

channel buffering mechanism may stay buffered at their current route positions for several 

clock cycles before they get a chance to make the next inter-router hop. A closer examination of 

the simulation statistics revealed that flits, while traveling toward their destinations, spend 2.32 

clock cycles in the side-buffers on average, which is low enough to provide a 16.6% lower total 

transport delay than in the baseline network. On the other hand, with the in-channel buffering 

the average buffer delay is 4.85 clock cycles. The high buffer delay is the reason why the 

transport delay with the in-channel buffering is larger than in the baseline architecture, 

despite a significant reduction in hop count. Note that the in-channel buffering achieves a 

high saturation throughput even with a high transport delay. This is because buffered flits 

waiting to be looped-back do not block other flits that could otherwise make forward progress. 

Let note that the transport delay can be reduced by limiting the time (i.e. the number of clock 

cycles) that flits are allowed to spent in in-channel buffers – when the time limit is reached, the 

buffered flit is forced to loop-back, regardless of the routing status of the flit arriving from the 

opposite side of the channel. However, an inevitable consequence of such buffering policy will 

be reduction of the network throughput due to lower utilization of in-channel buffers. For this 

reason, we have excluded this design option from further consideration. 

The results in Table 1 do not show significant difference in deflection rates between the 

baseline and NoCs architectures with the misrouting suppression support. This is because the 

same PAS stage (i.e. permutation network with silver flit arbitration policy) is used in all 

investigated NoC configurations. On the other hand, the misrouting rate depends not only on 

how often flits deflect, but it also depends on how efficiently the misrouting suppression 

mechanism prevents the deflected flits to make misrouting hops. The side-buffering 

technique reduces the misrouting rate by preventing some of deflected flits to leave the 

router. In this way, 51.5% of misrouting hops are prevented. The dual-mode channel uses the 

loop-back mode to return some of deflected flits back to their current routers. With this 

strategy, the dual-mode channel succeeds to prohibit about 19.36% of all deflected flits to 

make misrouting hops without adding extra buffers. By adding buffers into the dual-mode 

channels and optimizing the routing function of the deflection router, the proposed in-

channel buffering technique reaches the misrouting suppression efficiency which is slightly 

higher than that of the side-buffering technique. 

4.2. Injection fairness 

As emphasized out in Section 3, the arrangement of stages within the side-buffered 

deflection router may create injection unfairness in the network, in sense that some PEs get 

to transmit more flits than others. This phenomenon can be best observed in Fig. 5a, which 

shows distribution of the injection rate (i.e. the number of flits injected by each PE per clock 


 In-channel Misrouting Suppression Technique 321 

cycle) over all PEs in the side-buffered deflection NoC under saturated load with uniform 

traffic pattern. As can be seen, the injection rate differences between PEs are significant: 

while corner PEs can inject their flits at almost every cycle, the PEs in the middle of the 

mesh get a chance to inject a flit on every tenth cycle. As shown in Fig. 5b, the in-channel 

buffering provides almost uniform injection rate distribution under the same load conditions. 

This advantage occurs because the in-channel buffering is transparent for the deflection 

router, which treats each incoming flit equally, regardless of whether the flit is looped-back 

by the in-channel misrouting suppression logic or it comes from a neighboring router. 

  
a) b) 

Fig. 5 Injection rate distribution under saturation load in deflection-routed 2D mesh NoCs 

with misrouting suppression support: a) side-buffering; b) in-channel buffering 

4.3. Sensitivity to buffer size 

The second set of simulations deal with the impact of buffer size on the effectiveness of 

the side-buffering and in-channel buffering techniques. Observed form Table 2, although 

increasing the buffer size improves the throughput and misrouting suppression efficiency 

under saturated traffic load, this improvement is relatively small and rapidly saturates. 

Doubling the buffer size from 1-flit to 2-flits increases the saturation throughput by only 

2.71% for side-buffering, and 4.15% for in-channel buffering technique. In addition to high 

hardware cost, the price paid for this throughput improvement is 10% longer transport delay 

for side-buffering, and even 28% longer for in-channel buffering. Further increase of buffer 

size increases the saturation throughput only marginally, while the transport delay continues 

to steadily increase. These results suggest that buffers with size larger than 1 flit increases 

hardware complexity and wastes power without significant performance benefit. 

Table 2 Comparison of saturation performance of baseline deflection-routed NoC 

architecture and architectures with misrouting suppression support  

 Side-buffering In-channel buffering 

Buffer size th td e th td e 
1 flit 0.332 11.016 51.5% 0.361 14.541 52.3% 

2 flits 0.341 12.126 57.2% 0.376 18.613 58.6% 

3 flits 0.344 13.476 59.2% 0.382 22.899 61.2% 

4 flits 0.346 14.915 60.0% 0.386 27.201 62.4% 


322 I. Z. STOJANOVIC, G. LJ. DJORDJEVIC 

4.3. Latency analysis 

Finally, we evaluate the impact of different misrouting suppression schemes on the latency 

performance of deflection-routed network. Latency is defined as the time (in clock cycles) 

since the flit is generated at the source PE until it arrives at the destination PE, including the 

time the flit spends in the source PE’s transmission queue. In these simulations, each PE 

generates flits following Poisson distribution with mean rate λ (λ is also called the average 

flit injection rate for the NoC). Generated flits remain in its queue until they are successfully 

injected to the network. For each network configuration, the flit injection rate is varied from 

zero to the point when the first transmission-queue in the network becomes saturated. 

 
Fig. 6 Latency comparison of baseline deflection NoC architecture and architectures 

with misrouting suppression support under uniform traffic pattern 

Figure 6 contains load-latency graph under uniform traffic pattern. As observed, at low 

injection rates, deflection-routed networks with the misrouting suppression support experience 

almost the same average flit latency as the baseline deflection network. This is because of 

the fact that the network is free from congestion. However, as load in the network increases, 

the effect of misrouting suppression technique adopted becomes more visible. The graph in 

Fig. 6 shows that the proposed in-channel buffering technique significantly improves the 

routing performance by providing low-latency communication at higher injection rates.  

As can be observed in Fig. 6, for every deflection scheme, except for the side-buffering 

technique, the maximum injection rate achieved closely matches the saturation throughput 

reported in Table 1. This is because these schemes provide injection fairness so that all 

transmission queues in the network become saturated at approximately the same injection 

rate. On the other hand, in the side-buffered deflection network, the transmission queues of 

PEs in the middle area of the network become saturated at much lower injection rate than 

those of boundary PEs, leading to early saturation. 


 In-channel Misrouting Suppression Technique 323 

5. CONCLUSIONS 

In this paper, a misrouting suppression technique for deflection-routed networks on 

chip was presented. The presented technique avoids misrouting hops by looping-back or 

capturing deflected flits into small in-channel buffers, immediately after they have appeared 

at router’s output ports. The efficiency of the technique is further improved by modifying the 

routing function of deflection router in a way to prevent misrouted flits to take immediate 

reverse hops. The simulation results show that the proposed schemes improves performance 

of the baseline deflection-routed NoC by 36.2% in terms of saturation throughput. Results 

also show that the misrouting suppression with the in-channel buffering offers higher 

saturation throughput than with the in-router buffering (i.e. side-buffering) although with a 

penalty in terms of hardware cost. Moreover, the performance improvement is achieved 

without incurring injection unfairness among network nodes, which characterizes the side-

buffering approach. 

Acknowledgement: This work was partially supported by the Serbian Ministry of Science 

and Technological Development Project No. TR-33035. 

REFERENCES 

[1] W. J. Dally "Virtual-channel flow control", IEEE Trans. Parallel Distributed Syst., 1992, vol. 3, no. 2, 

pp. 194-205. 

[2] T. Bjerregaard, S. Mahadevan, "A survey of research and practices of network-on-chip", ACM Comput. 

Surv., vol 38, no. 1, 2006. 

[3] A. Kumar, P. Kundu, A. Singh, L. S. Peh and N. Jha, "A 4.6 Tbits/s 3.6 GHz single-cycle NOC router 

with a novel switch allocator in 65 nm CMOS", In Proc. of 25th International Conference on Computer 

Design, ICCD, 2007, pp. 63-70.  

[4] A. Kohler and M. Radetzki, "Fault-tolerant architecture and deflection routing for degradable NoC 

switches", In Proc. of the 3
rd

 IEEE International Symposium on Networks-on-Chip, 2009, pp. 22–31. 

[5] G. Michelogiannakis, D. Sanchez, W.J. Dally, C. Kozyrakis, "Evaluating bufferless flow control for on-

chip networks", In Proc. of the 4
th

 ACM/IEEE Int. Symposium on Networks-on-Chip, 2010, pp. 9-16. 

[6] T. Moscibroda and O. Mutlu, "A Case for Bufferless Routing in On-Chip Networks", In Proc. of the 36
th
 

annual international symposium on Computer architecture, ACM, New York, 2009, pp. 196-207. 

[7] C. Fallin, C. Craik and O. Mutlu, "CHIPPER: A low-complexity bufferless deflection router", In Proc. of 

the 17
th
 International Symposium on High Performance Computer Architecture (HPCA), 2011, pp. 144–155.  

[8] C. Fallin, G. Nazario, X. Yu, K. Chang, R. Ausavarungnirun and O. Mutlu, "MinBD: Minimally-

Buffered Deflection Routing for Energy-Efficient Interconnect", In Proc. of the 6th IEEE/ACM 

International Symposium on Networks on Chip, 2012, pp. 1-10. 

[9] I. Z. Stojanovic, M. D. Jovanovic and G. Lj. Djordjevic, "Dual-mode inter-router communication 

channel for deflection-routed networks-on-chip", The Journal of Supercomputing, Springer US, 

Published online: March 2015. 

[10] Y. Li, K. Mei, Y. Liu, N. Zheng, Yi Xu, "LDBR: Low-deflection bufferless router for cost-sensitive 

network-on-chip design", Microprocessors and Microsystems, 2014, vol. 38, no. 7, pp. 669-680.  

[11] M. Hayenga, "SCARAB: A single cycle adaptive routing and bufferless network”, In Proc. of the 42
nd

 
Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42), 2009, pp. 244-254. 

[12] J. Jose, B. Nayak, K. Kumar and Muyam M, "DeBAR: Deflection based adaptive router with minimal 

buffering", In Proc. of the Design, Automation & Test in Europe Conference & Exhibition (DATE), 

2013, pp. 1583–1588. 

[13] C. Feng, J. Li, Z. Lu, A. Jantsch, M. Zhang, "Evaluation of Deflection Routing on Various NoC 

Topologies", In Proc. of IEEE 9th International Conference on ASIC (ASICON 2011), pp. 163-166.  

[14] Open SystemC Initiative. SystemC v2.1 Language Reference Manual, 2005. http://www.systemc.org/