

FACTA UNIVERSITATIS  

Series: Electronics and Energetics Vol. 31, No 1, March 2018, pp. 25 - 39

https://doi.org/10.2298/FUEE1801025B 

 
COMPARATIVE EVALUATION OF QUASI-DELAY-INSENSITIVE 

ASYNCHRONOUS ADDERS CORRESPONDING TO  

RETURN-TO-ZERO AND RETURN-TO-ONE HANDSHAKING 

 
Padmanabhan Balasubramanian 

School of Electrical and Electronic Engineering, Nanyang Technological University, 

Singapore 

Abstract. This article makes a comparative evaluation of quasi-delay-insensitive (QDI) 

asynchronous adders, realized using the delay-insensitive dual-rail code, which adhere to 

4-phase return-to-zero (RTZ) and 4-phase return-to-one (RTO) handshake protocols. The 

QDI adders realized correspond to the following adder architectures: i) ripple carry 

adder, ii) carry lookahead adder, and iii) carry select adder. The QDI adders correspond 

to three different timing regimes viz. strong-indication, weak-indication, and early output. 

They are physically implemented using a 32/28nm CMOS process. The comparative 

evaluation shows that, overall, QDI adders which correspond to the 4-phase RTO handshake 

protocol are better than the QDI adder counterparts which correspond to the 4-phase RTZ 

handshake protocol in terms of latency, area, and average power dissipation.  

Key words: asynchronous circuits, QDI, adders, indication, standard cells, CMOS 

1. INTRODUCTION 

The International Technology Roadmap for Semiconductors (ITRS 2.0) [1] has 

identified design for variability as one of the key challenges for nanoelectronics. Process 

variability and device variability have assumed more significance in the nanoelectronics 

era compared to the microelectronics era. This is because random dopant and atomistic 

fluctuations, high heat flux, negative bias temperature instability, electro-migration, hot 

carrier effects, stress-induced variation, process-induced defects, electrostatic discharge, 

and metrology and other manufacturing issues have become more prominent in the 

nanoelectronics era compared to the microelectronics era. To overcome these issues, 

solutions are being developed at various levels such as at material-level, process-level, 

device-level, circuit-level, and the system-level [2].    

                                                           
Received September 18, 2017 

Corresponding author: Padmanabhan Balasubramanian  

The author is now with the School of Computer Science and Engineering, Nanyang Technological University, 

50 Nanyang Avenue, Singapore 639798 

(E-mail: balasubramanian@ntu.edu.sg) 



 
At the circuit-level, the QDI¹ asynchronous design method [3] employing delay-insensitive code(s) for data representation and processing and a 4-phase handshake protocol

for data communication is considered to be robust and is construed to be a viable alternative 

to the synchronous design method [4]. This is because QDI circuits encompass several 

advantages [5] such as low power [6 – 9], tolerance to noise and electromagnetic 

interference [10 – 12], ability to withstand process, voltage and temperature variations [13] 

[14], self-checking capability [15], and resistance to side-channel attacks in secure applications [16 – 19].

In general, QDI circuits widely employ the delay-insensitive dual-rail data encoding 

and the 4-phase RTZ handshaking [20]. However, a new 4-phase RTO handshake 

protocol was proposed [21] for QDI circuits. Based on a few case studies [22] [23], it was 

reported that QDI circuits which correspond to the RTO protocol report better design 

metrics than their QDI circuit counterparts adhering to the RTZ protocol. QDI circuits 

performing data transactions based on either the RTZ or the RTO protocol are robust. 

QDI circuits and systems are guaranteed to be correct by construction since they adopt 

unbounded delay models for gates and wires, with the only exception of isochronic forks² [24], which represent the weakest compromise to delay-insensitivity.

In this work, the adder which forms an important datapath of any processing unit is 

considered for the analysis to compare the efficiency of the RTO protocol versus the RTZ 

protocol. Various adder architectures such as the ripple carry adder (RCA), the carry 

lookahead adder (CLA), and the carry select adder (CSLA) are considered for QDI 

implementation based on the RTZ and RTO protocols to perform a comprehensive 

comparative evaluation. This work builds upon [25], wherein only the RCA architecture 

was considered to comparatively evaluate the RTZ and RTO protocols.         

The rest of this article is organized as follows. Section 2 provides an overview of: i) 

QDI circuit operation encompassing delay-insensitive data encoding and data transaction 

using the RTZ and RTO handshake protocols, and ii) the types of QDI circuits, their 

timing characteristics, and their general properties. Section 3 presents the logic rules for 

transforming QDI circuits corresponding to the RTZ protocol into QDI circuits adhering 

to the RTO protocol and vice-versa. Also, some circuit illustrations are provided in this 

section. Section 4 presents the simulation results corresponding to several 32-bit QDI 

RCAs, CLAs, and CSLAs, implemented using delay-insensitive dual-rail data encoding 

and adhering to RTZ and RTO handshaking. The QDI adders realized correspond to 

strong-indication, weak-indication, and early output. Section 5 provides the conclusions.  

2. QDI CIRCUIT OPERATION, TYPES AND PROPERTIES 

2.1. Operation of QDI Circuit 

The architecture of a QDI circuit is correlated with the sender (SX) and receiver (RX) 

analogy in Figure 1a. The QDI circuit is sandwiched between the current stage and the 

                                                           
¹ QDI design represents a robust flavor of asynchronous circuit design methods. QDI circuits are the practically realizable delay-insensitive asynchronous circuits.
² An isochronic fork implies that the up-going or down-going signal transitions on all the ends of the fork are assumed to be concurrent.



 
next stage register banks. A register in a QDI design is a 2-input C-element that is 

represented by the circle with the marking C in the figures. The C-element outputs binary 

1 or 0 only if all its inputs are binary 1 or 0 respectively, and otherwise (when its inputs differ) maintains its existing steady-state.
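The C-element behaviour described above can be illustrated with a small behavioural model. This is only a sketch in Python: the class name `CElement` and its `update` method are illustrative assumptions and do not correspond to the 12-transistor cell used later in the article.

```python
class CElement:
    """Behavioural model of a C-element (hypothetical helper, not a cell model)."""

    def __init__(self, initial=0):
        self.state = initial  # the retained steady-state output

    def update(self, *inputs):
        # Output goes to 1 only when all inputs are 1, to 0 only when all are 0.
        if all(v == 1 for v in inputs):
            self.state = 1
        elif all(v == 0 for v in inputs):
            self.state = 0
        # Otherwise the inputs differ and the previous state is retained.
        return self.state


c = CElement()
trace = [c.update(a, b) for a, b in [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]]
print(trace)  # [0, 0, 1, 1, 0]
```

The second and fourth input pairs have differing values, so the output simply holds whatever it was before.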

 
[Figure 1 graphic. Recoverable labels: (a) Sender (SX), current stage register, QDI circuit, next stage register, Receiver (RX), completion detectors (CD), ACKIN, ACKOUT; (b) inputs (X1, X0), (Y1, Y0), (Z1, Z0), a C-element, output CDRTZ; (c) inputs (X1, X0), (Y1, Y0), (Z1, Z0), a C-element, output CDRTO.]

 
Fig. 1 (a) an asynchronous circuit, correlated with the sender-receiver analogy for 

illustration. Completion detectors corresponding to (b) the RTZ handshake 

protocol, and (c) the RTO handshake protocol.  

 
A single-rail data wire X is encoded using the dual-rail code [26] into two data wires 

as X1 and X0. Based on the RTZ protocol [20], the data X = 1 is represented by X1 = 1 

and X0 = 0, and the data X = 0 is represented by X0 = 1 and X1 = 0. X1 = X0 = 0 

represents the spacer. X1 = X0 = 1 is invalid since the coding scheme is unordered [27], i.e. no code word is allowed to be a subset of another code word. According to the

RTZ protocol, the application of primary inputs to a QDI circuit should follow the 

sequence: data-spacer-data-spacer, and so forth, with each input data followed by the RTZ 

of the encoded data wires. Note that binary 1 is used to represent data with respect to the 

RTZ protocol. On the other hand, according to the RTO protocol [21], binary 0 is used to 



 
represent data. As per the RTO protocol, the valid data Y = 1 is represented by Y1 = 0 

and Y0 = 1, and Y = 0 is represented by Y0 = 0 and Y1 = 1. The spacer is represented by 

Y0 = Y1 = 1. Y1 = Y0 = 0 is deemed invalid since the coding scheme is unordered. As 

per the RTO protocol, the application of primary inputs to a QDI circuit follows the 

sequence: spacer-data-spacer-data, and so forth, with each input data followed by the 

RTO of the encoded data wires.  
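The two dual-rail encodings just described can be summarised in a short sketch. The function names `encode_rtz`, `encode_rto` and `classify` are hypothetical helpers introduced here for illustration; rail pairs are written in the order (rail 1, rail 0), as in the text.

```python
RTZ_SPACER = (0, 0)  # all-zeroes spacer
RTO_SPACER = (1, 1)  # all-ones spacer


def encode_rtz(bit):
    """Return (X1, X0): under RTZ the asserted rail carries binary 1."""
    return (1, 0) if bit else (0, 1)


def encode_rto(bit):
    """Return (Y1, Y0): under RTO the asserted rail carries binary 0."""
    return (0, 1) if bit else (1, 0)


def classify(rails, protocol):
    """Label a rail pair as 'spacer', 'invalid', 'data 1' or 'data 0'."""
    spacer = RTZ_SPACER if protocol == "RTZ" else RTO_SPACER
    if rails == spacer:
        return "spacer"
    if rails[0] == rails[1]:
        return "invalid"  # (1, 1) under RTZ, (0, 0) under RTO
    active = 1 if protocol == "RTZ" else 0
    return "data 1" if rails[0] == active else "data 0"


for bit in (0, 1):
    print(bit, encode_rtz(bit), encode_rto(bit))
```

Each RTO code word is the bitwise complement of the corresponding RTZ code word, which is what allows the dual logic gates of Section 3 to be used.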

The 4-phase handshake protocol, whether it is RTZ or RTO, consists of four phases 

which will be explained with reference to Figure 1a by considering dual-rail encoded data. 

However, the explanation would be applicable for data represented using any delay-

insensitive 1-of-n code [26]. As per the RTZ protocol, in the first phase, the dual-rail data 

bus shown in Figure 1a which is specified by (X1, X0), (Y1, Y0), and (Z1, Z0) is in the 

spacer state, and ACKIN is high. SX transmits data and this results in a rising signal transition on exactly one rail of each dual-rail pair of the data bus. In the

second phase, RX receives the data sent, and it drives ACKOUT high. In the third phase, SX 

waits for ACKIN to go low and then resets the entire dual-rail data bus to the spacer state i.e. 

all 0s. In the fourth phase, after an unbounded but finite and positive time, RX would drive ACKOUT low, i.e. ACKIN becomes high. With this, one data transaction is said to be

complete, and the asynchronous circuit is ready to proceed with the next data transaction. An 

example completion detector, which comprises the dual-rail encoded primary inputs (X1, 

X0), (Y1, Y0), and (Z1, Z0), that indicates or acknowledges the receipt of data and the all 

zeroes spacer on the primary inputs through its output CDRTZ is illustrated in Figure 1b. 

The completion detector shown in Figure 1b corresponds to the RTZ protocol.  
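The four phases of one RTZ data transaction can be traced with a minimal sketch. It assumes, as the text implies, that ACKIN is the complement of ACKOUT; the generator `rtz_transaction` and its tuple layout are illustrative assumptions, and real circuits would of course exhibit unbounded delays between the steps.

```python
def encode_rtz(bit):
    """Return (rail 1, rail 0) for a data bit under the RTZ protocol."""
    return (1, 0) if bit else (0, 1)


def rtz_transaction(bits):
    """Yield (description, bus, ACKOUT, ACKIN) for one 4-phase RTZ transaction."""
    data = [encode_rtz(b) for b in bits]
    spacer = [(0, 0)] * len(bits)
    ackout = 0  # ACKIN is modelled as the complement of ACKOUT
    yield ("phase 1: SX sends data", data, ackout, 1 - ackout)
    ackout = 1
    yield ("phase 2: RX acknowledges data", data, ackout, 1 - ackout)
    yield ("phase 3: SX resets bus to spacer", spacer, ackout, 1 - ackout)
    ackout = 0
    yield ("phase 4: RX acknowledges spacer", spacer, ackout, 1 - ackout)


steps = list(rtz_transaction([1, 0, 1]))
for description, bus, ackout, ackin in steps:
    print(f"{description:34s} bus={bus} ACKOUT={ackout} ACKIN={ackin}")
```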

With respect to the RTO handshake protocol, in the first phase, ACKIN is 1. SX 

would transmit the spacer i.e. all 1s, and this causes rising signal transitions on all the rails 

of the dual-rail data bus. In the second phase, RX receives the spacer sent, and it drives 

ACKOUT high. In the third phase, SX waits for ACKIN to assume 0 and then sends the input data by resetting exactly one rail of each dual-rail pair of the data bus. Then in the fourth phase, after an unbounded but finite and positive time, RX would drive ACKOUT low, i.e. ACKIN becomes high. With this, one data transaction is

said to be complete, and the QDI circuit is ready to commence the next data transaction. 

An example completion detector that comprises the dual-rail encoded primary inputs (X1, 

X0), (Y1, Y0), and (Z1, Z0), which indicates the receipt of data and the all ones spacer on 

the primary inputs through its output CDRTO is depicted by Figure 1c. This completion 

detector corresponds to the RTO protocol.  
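The two completion detectors can be sketched behaviourally as follows. Following the description given later in Section 4, each dual-rail pair feeds a 2-input OR gate (RTZ) or a 2-input AND gate (RTO), and the gate outputs are synchronised by C-elements. As a simplifying assumption, the C-element tree is modelled here as a single multi-input C-element, and the class name `CompletionDetector` is hypothetical.

```python
class CompletionDetector:
    """Behavioural sketch of the RTZ or RTO completion detector."""

    def __init__(self, protocol):
        if protocol not in ("RTZ", "RTO"):
            raise ValueError("protocol must be 'RTZ' or 'RTO'")
        self.protocol = protocol
        # Start in the spacer state: output 0 for RTZ, output 1 for RTO.
        self.state = 0 if protocol == "RTZ" else 1

    def update(self, pairs):
        if self.protocol == "RTZ":
            gates = [r1 | r0 for r1, r0 in pairs]  # 1 once a pair holds data
        else:
            gates = [r1 & r0 for r1, r0 in pairs]  # 0 once a pair holds data
        # C-element behaviour: change only when all gate outputs agree.
        if all(g == 1 for g in gates):
            self.state = 1
        elif all(g == 0 for g in gates):
            self.state = 0
        return self.state


rtz_sequence = [
    [(0, 0), (0, 0), (0, 0)],  # spacer on all inputs
    [(1, 0), (0, 0), (0, 0)],  # data has arrived on one input only
    [(1, 0), (0, 1), (1, 0)],  # data on all inputs
    [(0, 0), (0, 1), (1, 0)],  # spacer has arrived on one input only
    [(0, 0), (0, 0), (0, 0)],  # spacer on all inputs
]
# The RTO code words are the bitwise complements of the RTZ code words.
rto_sequence = [[(1 - a, 1 - b) for a, b in step] for step in rtz_sequence]

cd_rtz = CompletionDetector("RTZ")
cd_rto = CompletionDetector("RTO")
cdrtz = [cd_rtz.update(step) for step in rtz_sequence]
cdrto = [cd_rto.update(step) for step in rto_sequence]
print(cdrtz)  # [0, 0, 1, 1, 0]
print(cdrto)  # [1, 1, 0, 0, 1]
```

In both cases the detector output changes only after every dual-rail input has received data or spacer, which is the indication behaviour the handshake relies on.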

2.2. Types of QDI Circuits  

QDI circuits are classified as strongly indicating, weakly indicating, and early output 

types [28]. A strong-indication QDI circuit [29] [30] waits to receive all the primary 

inputs, whether they are data or spacer, and then starts data processing to produce the 

required primary outputs. A weak-indication QDI circuit [29] [31] would produce some 

of the primary outputs after receiving a subset of the primary inputs. However, the 

production of at least one primary output is delayed till the last primary input is received. 

An early output QDI circuit [32] [33] is the most relaxed of the three in that it is able to 

produce all the primary outputs after receiving a subset of the primary inputs. If an early 

output QDI circuit produces data early, it is said to be of early set type, and if an early 

output QDI circuit assumes the spacer state early, it is said to be of early reset type. The 



 
input-output timing behaviour of strong-indication, weak-indication, and early output QDI 

circuits is captured by Figure 2. The early set and reset behaviours are shown in Figure 2.  

 
[Figure 2 graphic. Recoverable labels: traces for inputs arrival and for outputs production under strong-indication, weak-indication, and early output, each ranging between None and All; phases marked valid data arriving, valid data arrived, spacer data arriving, and spacer data arrived; early set behaviour and early reset behaviour annotated on the early output trace.]
 

Fig. 2 Input-output timing characteristic of strong-indication,  

weak-indication, and early output QDI circuits          

2.3. General Properties of QDI Circuits 

QDI circuits, regardless of whether they are strongly indicating or weakly indicating 

or early output type, have some properties in common. Firstly, QDI circuits should be free 

of wire and gate orphans [34] [35]. A wire orphan refers to an unacknowledged signal 

transition on a wire. The wire orphan problem, if any, can be resolved through the 

isochronic fork assumption. A gate orphan is an unacknowledged signal transition on an 

intermediate gate output. The gate orphan problem is difficult to resolve and to overcome 

it, sophisticated timing assumption(s) might be required. Secondly, QDI circuits tend to 

satisfy the monotonic cover constraint [16], which implies the activation of a unique 

signal path from a primary input to a primary output for each input data applied. The 

monotonic cover constraint is implicit in a disjoint sum-of-products expression [36], 

which is used to synthesize a QDI circuit. In a disjoint sum-of-products expression, the 

product terms are mutually orthogonal, i.e. the logical conjunction of any two product 

terms in a disjoint sum-of-products expression yields null [37 – 39]. Thirdly, the signal 



 
transitions ripple monotonically [40] from the first logic level up to the last logic level in 

a QDI circuit [41]. The transitions either increase or decrease monotonically. For a QDI 

circuit that adheres to the RTZ protocol, for the application of data, the transitions would 

increase monotonically and for the application of spacer, the transitions would decrease 

monotonically. On the contrary, for a QDI circuit adhering to the RTO protocol, for the 

application of spacer, the transitions would increase monotonically, and for the 

application of data, the transitions would decrease monotonically throughout the circuit.  
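The orthogonality condition on a disjoint sum-of-products expression can be checked mechanically. The sketch below is an illustration only: product terms are represented as dictionaries mapping a variable name to its required value, and the two example expressions are hypothetical and not taken from any of the adders discussed.

```python
from itertools import combinations


def orthogonal(term_a, term_b):
    """Two product terms are orthogonal if some shared variable has opposite polarity."""
    return any(var in term_b and term_b[var] != val for var, val in term_a.items())


def is_disjoint_sop(terms):
    """True if every pair of product terms has a null logical conjunction."""
    return all(orthogonal(a, b) for a, b in combinations(terms, 2))


disjoint = [{"a": 1, "b": 1}, {"a": 0, "c": 1}]     # ab + a'c
overlapping = [{"a": 1, "b": 1}, {"b": 1, "c": 1}]  # ab + bc (both true for a=b=c=1)

print(is_disjoint_sop(disjoint))     # True
print(is_disjoint_sop(overlapping))  # False
```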

It is important to ascertain the type of a QDI circuit when it is composed using many 

QDI sub-circuits, as is common in the design of QDI arithmetic circuits. In general, a 

cascade of strong-indication or weak-indication or early output QDI sub-circuits yields a 

strong-indication or a weak-indication or an early output QDI circuit respectively. 

Sometimes there might be an exception when composing early output QDI sub-circuits. For 

example, it was noted in [42] [43] that a cascade of early output QDI full adders led to a 

relative-timed RCA, whereas in [33] [44] a cascade of early output QDI full adders led to an 

early output RCA. This might be because in terms of robustness, the strong-indication timing 

model tops the hierarchy followed by the weak-indication timing model, which is succeeded 

by the early output timing model. The relative-timing model is not QDI and is the least 

robust of the asynchronous timing models described. Relative-timed asynchronous circuits 

[45] require explicit and perhaps complicated timing assumptions to ensure their safe 

operation but could exhibit more optimized design metrics compared to the QDI circuits. 

Hence, in the case of relative-timing, the robustness is traded off for greater design 

optimization [46]. Further, a cascade of QDI sub-circuits with more robust and less robust 

timing models generally causes the least robust timing model to be ascribed to the resultant 

QDI circuit. For example, a cascade of strong-indication and weak-indication QDI sub-

circuits leads to a weak-indication QDI circuit. A cascade of strong-indication and/or weak-

indication QDI sub-circuits and early output QDI sub-circuit(s) leads to an early output QDI 

circuit.     
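The general composition rule stated above, in which the least robust timing model among the sub-circuits is ascribed to the composed circuit, can be written as a one-line selection. This is a sketch of the general rule only: the ranking table and the function name `composed_type` are assumptions for illustration, and the exception noted above (an early output cascade that turns out to be relative-timed) is not captured.

```python
# Higher number means a more robust timing model, per the hierarchy in the text.
ROBUSTNESS = {
    "strong-indication": 3,
    "weak-indication": 2,
    "early output": 1,
}


def composed_type(sub_circuit_types):
    """Return the least robust timing model among the cascaded sub-circuits."""
    return min(sub_circuit_types, key=ROBUSTNESS.__getitem__)


print(composed_type(["strong-indication", "weak-indication"]))
print(composed_type(["strong-indication", "weak-indication", "early output"]))
```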

3. LOGIC RULES FOR RTZ TO RTO AND VICE-VERSA PROTOCOL CONVERSION 

QDI circuits, regardless of whether they correspond to the RTZ or the RTO protocol, when physically realized, generally consist of C-elements³ and simple and complex logic

gates. Any C-elements used in a QDI circuit, whether they correspond to the RTZ or the 

RTO protocol, would remain unchanged and their inputs would also be unchanged when 

transforming a QDI circuit which adheres to the RTZ protocol into a QDI circuit which 

corresponds to the RTO protocol and vice-versa. The logic transformation rules to be 

discussed below, which could facilitate the RTZ to RTO and vice-versa protocol 

conversion, are applicable only to the discrete and complex logic gates comprising the respective circuits and exclude any C-elements. The logic transformation rules for the

handshake protocols conversion tend to obey the well-established duality principle of 

Boolean algebra. The duality principle [47] states that every algebraic expression that is 

deduced using the postulates of Boolean algebra remains valid if the logical operators and 

                                                           
³ The C-element outputs binary 1 or 0 only when all its inputs are binary 1 or 0. If any of its inputs is different, the C-element would maintain its existing steady-state. The C-element is portrayed by an AND gate with the marking 'C' on its periphery.



 
identity elements are interchanged. Herein, it implies that for the RTZ to RTO protocol 

conversion the AND operator should be replaced by the OR operator and the OR operator 

should be replaced by the AND operator; the reverse is applicable for the RTO to RTZ 

protocol conversion. An example set of logic transformation rules for the handshake 

protocols conversion and their proofs by perfect induction (exhaustive truth-table enumeration) are provided below. These rules may

be extended without any loss of generality depending upon a QDI circuit composition.  

 RTZ: P + Q ↔ RTO: PQ   (1) 

 RTZ: P + QR ↔ RTO: P (Q + R)  (2) 

 RTZ: PQ + RS ↔ RTO: (P + Q) (R + S)  (3) 

The function (P + Q) corresponding to the RTZ protocol, given in (1), is implemented 

using a 2-input OR gate, and the RTO equivalent viz. PQ is implemented using a 2-input 

AND gate. Table 1 shows the proof by induction for (1). The 2-input OR and AND gates 

are simple logic gates present in a standard digital cell library [48]. The function (P + QR) 

corresponding to the RTZ protocol, given in (2), can be implemented using the AO21 

gate and its RTO equivalent viz. P (Q + R) can be implemented using the OA21 gate. 

Table 2 shows the proof by induction for (2). The function (PQ + RS) corresponding to 

the RTZ protocol, given in (3), can be implemented using the AO22 gate and the RTO 

equivalent i.e. (P + Q) (R + S) can be implemented using the OA22 gate. Table 3 shows 

the proof by induction for (3). The AO21, OA21, AO22 and OA22 gates are complex 

logic gates present in a standard digital cell library [48].  
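Rules (1) to (3) can also be checked exhaustively against the duality principle: the RTO function should equal the complement of the RTZ function evaluated on complemented inputs. The following is a minimal sketch; the dictionary `RULES` and the helper `is_dual` are names introduced here for illustration.

```python
from itertools import product

# Each entry pairs the RTZ function with its claimed RTO equivalent.
RULES = {
    "(1)": (lambda p, q: p | q, lambda p, q: p & q),
    "(2)": (lambda p, q, r: p | (q & r), lambda p, q, r: p & (q | r)),
    "(3)": (lambda p, q, r, s: (p & q) | (r & s), lambda p, q, r, s: (p | q) & (r | s)),
}


def is_dual(rtz_fn, rto_fn):
    """Check rto_fn(x) == NOT rtz_fn(NOT x) for every input combination."""
    n = rtz_fn.__code__.co_argcount
    return all(
        rto_fn(*x) == 1 - rtz_fn(*(1 - v for v in x))
        for x in product((0, 1), repeat=n)
    )


for name, (rtz_fn, rto_fn) in RULES.items():
    print(name, is_dual(rtz_fn, rto_fn))
```

Because RTO code words are the complements of RTZ code words, this duality means that an RTO gate fed with RTO-encoded inputs produces the RTO encoding of what the RTZ gate would produce.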

Table 1 Proof by induction for (1)

Inputs    RTZ      RTO
P  Q      P + Q    PQ
0  0      0        0
0  1      1        0
1  0      1        0
1  1      1        1

Recall that binary 1 is used to represent the data with respect to the RTZ protocol and 

binary 0 is used to represent the data with respect to the RTO protocol after data 

encoding. This is in conformance with the duality property of Boolean algebra, which states

that identity elements can be interchanged [47]. As mentioned in Section 2.1, the zeroes 

spacer is used in the case of the RTZ protocol and the ones spacer is used in the case of 

the RTO protocol. Given these, it can be seen from Table 1 that if the input P or Q is 1, 

which indicates the data with respect to the RTZ protocol, (P + Q) would yield 1, and 

when P and Q are 0 then (P + Q) would yield 0 indicating the RTZ state. On the other 

hand, if either P or Q is 0 in Table 1, which indicates the data based on the RTO protocol, 

PQ would evaluate to 0, and when P and Q are 1, PQ would evaluate to 1, which indicates 

the RTO state.  

In Tables 2 and 3, sub-functions are additionally introduced for the sake of clarity

and illustration. In the case of Table 2, if P or QR is 1, then (P + QR) evaluates to 1 

signifying the data according to the RTZ protocol, and if P and QR are 0, then (P + QR) 



 
evaluates to 0 signifying the RTZ state. If P or (Q + R) is 0, then P (Q + R) evaluates to 0 

signifying the data according to the RTO protocol, and if P and (Q + R) are 1, then P (Q + 

R) evaluates to 1 signifying the RTO state. With respect to Table 3, if PQ or RS is 1, then 

(PQ + RS) evaluates to 1 signifying the data as per the RTZ protocol, and if PQ and RS 

are 0, then (PQ + RS) evaluates to 0 signifying the RTZ state. However, if (P + Q) or (R + 

S) is 0, then (P + Q) (R + S) evaluates to 0 signifying the data as per the RTO protocol. 

Supposing (P + Q) and (R + S) are 1, then (P + Q) (R + S) would evaluate to 1 signifying 

the RTO state.    

Table 2 Proof by induction for (2)

Inputs     RTZ sub-function   RTZ       RTO sub-function   RTO
P  Q  R    QR                 P + QR    Q + R              P (Q + R)
0  0  0    0                  0         0                  0
0  0  1    0                  0         1                  0
0  1  0    0                  0         1                  0
0  1  1    1                  1         1                  0
1  0  0    0                  1         0                  0
1  0  1    0                  1         1                  1
1  1  0    0                  1         1                  1
1  1  1    1                  1         1                  1

Table 3 Proof by induction for (3)

Inputs        RTZ sub-functions   RTZ        RTO sub-functions   RTO
P  Q  R  S    PQ    RS            PQ + RS    P + Q    R + S      (P + Q) (R + S)
0  0  0  0    0     0             0          0        0          0
0  0  0  1    0     0             0          0        1          0
0  0  1  0    0     0             0          0        1          0
0  0  1  1    0     1             1          0        1          0
0  1  0  0    0     0             0          1        0          0
0  1  0  1    0     0             0          1        1          1
0  1  1  0    0     0             0          1        1          1
0  1  1  1    0     1             1          1        1          1
1  0  0  0    0     0             0          1        0          0
1  0  0  1    0     0             0          1        1          1
1  0  1  0    0     0             0          1        1          1
1  0  1  1    0     1             1          1        1          1
1  1  0  0    1     0             1          1        0          0
1  1  0  1    1     0             1          1        1          1
1  1  1  0    1     0             1          1        1          1
1  1  1  1    1     1             1          1        1          1

Example circuits to illustrate the conversion from RTZ to RTO protocol and vice-

versa are shown in Figure 3.   



 
[Figure 3 graphic. Recoverable labels: six gate-level full adder schematics, panels (a) to (f), each with the dual-rail inputs (A1, A0), (B1, B0), (CIN1, CIN0), the dual-rail outputs (SUM1, SUM0), (COUT1, COUT0), and 2-input C-elements marked C.]

 
Fig. 3 Strongly indicating full adder [50] corresponding to (a) RTZ handshaking and

(b) RTO handshaking; Weakly indicating full adder [51] corresponding to (c) RTZ 

handshaking and (d) RTO handshaking; Early output full adder [33], corresponding 

to (e) RTZ handshaking (early reset type) and (f) RTO handshaking (early set type)  

Figure 3 portrays strong-indication, weak-indication and early output implementations 

of the full adder. The full adder adds an augend and an addend along with a carry input 



 
and produces the sum output and any carry overflow. In Figure 3, (A1, A0), (B1, B0) and 

(CIN1, CIN0) represent the dual-rail augend, addend and carry inputs of the full adders, 

and (SUM1, SUM0) and (COUT1, COUT0) represent the dual-rail sum and carry 

outputs. Figures 3a, 3c and 3e depict full adder implementations which correspond to the 

RTZ protocol, and Figures 3b, 3d and 3f show the respective full adder realizations which 

correspond to the RTO protocol. The 2-input OR gates, AO21 gates and AO22 gates of 

Figures 3a, 3c and 3e are replaced by 2-input AND gates, OA21 gates and OA22 gates 

respectively in Figures 3b, 3d and 3f, in accordance with (1), (2) and (3), given earlier. 

Note that there is no change whatsoever in the inputs or outputs of the corresponding 

circuits belonging to the RTZ and the RTO protocols in Figure 3. Moreover, the 2-input 

C-elements and their corresponding inputs remain unchanged.  
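The gate substitution described above amounts to a lookup over the gate types of a netlist, with C-elements, inputs and outputs left untouched. The sketch below assumes a simple netlist format of (gate type, input names, output name) tuples. The cell names `OR2`, `AND2` and `C2`, the net names, and the three-gate fragment are hypothetical; the fragment is not the netlist of any adder in Figure 3.

```python
RTZ_TO_RTO = {"OR2": "AND2", "AO21": "OA21", "AO22": "OA22", "C2": "C2"}
RTO_TO_RTZ = {rto: rtz for rtz, rto in RTZ_TO_RTO.items()}


def convert(netlist, mapping):
    """Swap each gate for its dual; inputs, outputs and C-elements are unchanged."""
    return [(mapping[gate], inputs, output) for gate, inputs, output in netlist]


# Hypothetical RTZ fragment, for illustration only.
rtz_fragment = [
    ("C2", ("A1", "B1"), "n1"),
    ("C2", ("A0", "B0"), "n2"),
    ("OR2", ("n1", "n2"), "n3"),
]

rto_fragment = convert(rtz_fragment, RTZ_TO_RTO)
for gate in rto_fragment:
    print(gate)
```

Applying `RTO_TO_RTZ` to the converted fragment returns the original, reflecting that the conversion works in both directions.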

4. SIMULATION RESULTS 

 Several 32-bit QDI RCAs [49 – 55], CLAs [56] [57] and CSLAs [58] were semi-

custom realized using the standard digital library cells of a 32/28nm CMOS process [48]. 

Only the 2-input C-element was custom realized, by modifying the AO222 complex gate to introduce feedback. The 2-input C-element was realized using 12 transistors and was

made available to implement the various QDI adders, which correspond to RTZ and RTO 

protocols. Any high-input C-element functionality, wherever likely in an adder design, 

was safely decomposed in QDI style [59] to avoid the problem of gate orphans.  

About 1000 random input vectors were identically supplied to all the QDI adders 

through a test bench at time intervals of 20ns to perform the functional simulations and to 

capture their respective switching activities. The value change dump (.vcd) files generated 

through the functional simulations were used to estimate the average power dissipation. 

The worst-case (forward) latency, i.e. the critical path delay and the area of the QDI 

adders were also estimated. A default wire load model was considered while estimating 

the design metrics to include the effect of parasitics in the simulations. The design metrics

viz. latency, area, and average power dissipation, estimated for the various QDI adders, 

which correspond to the RTZ and RTO protocols are given in Table 4. The input registers 

and the completion detectors of the various QDI adders corresponding to the RTZ and 

RTO protocols are respectively identical. So the differences between their design metrics 

can be attributed to the respective differences between their function blocks.    

Before discussing the results, it should be noted that the focus of this article is not to 

comment on the efficiency of various adder architectures or about the type of the adders 

with relation to latency or area or power optimization, and these have already been 

discussed in the published literature. Rather, the intent of this article is to provide a 

comparison between the design metrics of different QDI adders based on their realization 

using RTZ and RTO protocols, and eventually to arrive at a general conclusion regarding 

which of these handshake protocols is preferable to potentially achieve enhanced

optimizations in the design metrics regardless of the extent of optimization achievable. 

The improvements in the design metrics which may be achieved by one protocol over the 

other could in part be explained as due to the differences in the implementation. However, 

the extent of optimizations in the design metrics achievable would also depend on the 

digital cell library targeted, the technology node and the PVT corner chosen to perform 



 
the simulations. Thus the results given in Table 4 are to be used as a reference to guide 

the choice of a 4-phase handshake protocol for the effective design of QDI circuits.  

     
Table 4 Design metrics of 32-bit QDI adders corresponding to RTZ and RTO protocols, 

estimated using Synopsys tools based on implementation using a 32/28nm 

CMOS process 

 
                                          4-phase RTZ handshake protocol   4-phase RTO handshake protocol
QDI adder reference; and adder type       Latency    Area       Power      Latency    Area       Power
                                          (ns)       (µm²)      (µW)       (ns)       (µm²)      (µW)

RCAs
[49]; SI                                  14.61      2529       2190       14.15      2529       2185
[52]; SI                                  9.26       2504.60    2181       8.74       2374.48    2167
[50]; SI                                  9.04       2293.14    2172       8.88       2293.15    2168
[52]; WI                                  8.24       2423.27    2177       8.03       2358.21    2167
[53]; WI                                  7          2016.63    2171       6.95       2016.63    2167
[54]; WI                                  9.66       2642.85    2192       9.66       2642.85    2191
[55]; WI                                  4.43       2097.96    2174       3.79       2097.96    2170
[51]; WI                                  3.32       2049.16    2171       3.31       2049.16    2167
[33]; EO                                  3.10       1658.80    2161       2.93       1658.80    2157
[44]; EO                                  2.14       2436.48    2173       2.13       2649.96    2176

CLAs
[56], [55]; WI: Regular                   3.31       2951.88    2191       3.19       2984.41    2184
[56], [55]; WI: Hybrid                    3.08       2845.14    2189       2.97       2873.61    2182
[56], [55]; WI: Regular with alias logic  2.46       2992.55    2192       2.36       3025.08    2185
[56], [55]; WI: Hybrid with alias logic   2.38       2880.72    2190       2.29       2909.19    2183
[56], [51]; WI: Regular                   3.14       2915.29    2188       3.10       2947.82    2182
[56], [51]; WI: Hybrid                    2.93       2807.02    2186       2.89       2835.49    2180
[56], [51]; WI: Regular with alias logic  2.32       2955.95    2190       2.30       2988.48    2183
[56], [51]; WI: Hybrid with alias logic   2.25       2842.60    2187       2.22       2871.07    2181
[57]; EO: Regular                         2.75       2569.65    2177       2.73       2553.39    2169
[57]; EO: Hybrid                          2.53       2455.80    2175       2.51       2441.56    2167

CSLAs
[58] – [33], [60]; EO: Non-uniform        3.23       3384.44    2312       3.15       3384.44    2303
[58] – [33], [60]; EO: Uniform            2.46       3000.17    2293       2.38       3000.17    2285

Legends used: SI – Strong-indication; WI – Weak-indication; EO – Early output 

Hybrid CLAs incorporate a 4-bit least significant RCA, which improves the design metrics of Regular CLAs 
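To make the latency figures of Table 4 easier to compare, the relative reduction obtained with the RTO protocol can be computed per adder. The sketch below uses three rows copied from Table 4; the helper name `reduction_percent` and the choice of rows are illustrative assumptions.

```python
# (RTZ latency in ns, RTO latency in ns), copied from Table 4.
LATENCY_NS = {
    "[55]; WI RCA": (4.43, 3.79),
    "[33]; EO RCA": (3.10, 2.93),
    "[58] - [33], [60]; EO uniform CSLA": (2.46, 2.38),
}


def reduction_percent(rtz_value, rto_value):
    """Percentage reduction of a metric when moving from RTZ to RTO."""
    return 100.0 * (rtz_value - rto_value) / rtz_value


for adder, (rtz_ns, rto_ns) in LATENCY_NS.items():
    print(f"{adder}: {reduction_percent(rtz_ns, rto_ns):.2f}% lower latency with RTO")
```

The same helper applies to the area and power columns; a negative result would indicate that the RTZ version is the better of the two for that metric.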

Overall, it can be observed from Table 4 that the QDI adders based on the RTO 

protocol feature less latency (and hence less cycle time) and power dissipation and 

occupy almost the same area as their QDI adder counterparts based on the RTZ

protocol. The completion detector of a QDI circuit corresponding to the RTZ protocol 

consists of a series of 2-input OR gates whose outputs are synchronized by a C-element 

tree. On the other hand, the completion detector of a QDI circuit adhering to the RTO 



 
protocol comprises a series of 2-input AND gates whose outputs are synchronized by a 

tree of C-elements. Further, any 2-input OR gates present in the functional block(s) of a 

QDI adder corresponding to the RTZ protocol would be replaced by 2-input AND gates 

in the functional block(s) of a QDI adder counterpart adhering to the RTO protocol. In 

static CMOS implementations, it is well known that the OR gate is more expensive than 

the AND gate in terms of delay, area, and power dissipation [61] due to the series stacking of pMOS transistors in the pull-up network of the former, in contrast to the parallel connection of pMOS transistors in the pull-up network of the latter. Hence the use of 2-input

AND gates instead of 2-input OR gates in the QDI adders and their respective completion 

detectors implies better optimized design metrics can be expected for the RTO protocol 

compared to the RTZ protocol.  

In Table 4, it can be noticed that in some scenarios the areas of the QDI adders 

corresponding to the RTZ and RTO protocols are the same. For example, the non-uniform 32-bit CSLA with the input partition of 8-7-6-4-3-2-2 and the uniform 32-bit CSLA with the input partition of 8-8-8-8 each occupy the same area with respect to both the

handshake protocols. The non-uniform and uniform QDI CSLAs, highlighted in Table 4, 

are constructed using the early output full adder of [33], and the strongly indicating 2:1 

multiplexer (MUX) of [60]. With respect to the RTZ protocol, the early output full adder 

of [33] consists of four AO22 gates, four 2-input C-elements and two 2-input OR gates, as 

shown in Figure 3e. Based on the RTO protocol, the early output full adder of [33] would 

comprise four OA22 gates, four 2-input C-elements and two 2-input AND gates, as shown 

in Figure 3f. The strongly indicating 2:1 MUX design of [60], which is called SIDCO, 

requires seven 2-input C-elements and four 2-input OR gates for realization based on the 

RTZ protocol. On the other hand, for implementation based on the RTO protocol, the 

strongly indicating 2:1 MUX design would require seven 2-input C-elements and four 2-input AND gates. The AO22 and OA22 gates of the digital cell library [48] have the same area of 2.54 µm², and the 2-input OR gate and the 2-input AND gate occupy the same area of 2.03 µm². As a result, the areas of the full adder and the 2:1 MUX of a QDI CSLA

would be the same regardless of the handshake protocol adopted. This explains why the 

non-uniform and uniform QDI CSLAs in Table 4 feature the same area with respect to 

both RTZ and RTO protocols. Although the areas of AO22 and OA22 gates, and the areas 

of the 2-input OR gate and the 2-input AND gate are the same in [48], their corresponding 

delay and power dissipation values are different. This is the reason why the QDI CSLAs 

based on the RTO protocol have less latency and power dissipation than the QDI CSLAs 

based on the RTZ protocol, as seen in Table 4.   
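The area argument above can be reproduced with simple bookkeeping. The sketch below uses the gate counts and the AO22/OA22 and OR/AND cell areas quoted in the text; the C-element area is not stated in this section, so `C2_AREA_PLACEHOLDER` is a hypothetical value chosen only to make the script runnable, and the names `block_area`, `FULL_ADDER`, and `MUX` are likewise illustrative.

```python
# Cell areas in square micrometres, as quoted in the text for library [48].
AREA_UM2 = {"AO22": 2.54, "OA22": 2.54, "OR2": 2.03, "AND2": 2.03}

# Gate counts per block and protocol, as stated in the text.
FULL_ADDER = {
    "RTZ": {"AO22": 4, "C2": 4, "OR2": 2},
    "RTO": {"OA22": 4, "C2": 4, "AND2": 2},
}
MUX = {
    "RTZ": {"C2": 7, "OR2": 4},
    "RTO": {"C2": 7, "AND2": 4},
}

# Hypothetical 2-input C-element area; the real value is not given here.
C2_AREA_PLACEHOLDER = 1.0


def block_area(cells, c2_area=C2_AREA_PLACEHOLDER):
    # Sum count * area over all cells in the block.
    areas = dict(AREA_UM2, C2=c2_area)
    return sum(count * areas[name] for name, count in cells.items())


fa_difference = block_area(FULL_ADDER["RTZ"]) - block_area(FULL_ADDER["RTO"])
mux_difference = block_area(MUX["RTZ"]) - block_area(MUX["RTO"])

# Area contributed by the non-C-element gates of the RTZ full adder.
fa_logic_area = block_area(FULL_ADDER["RTZ"], c2_area=0.0)

print(fa_difference, mux_difference, round(fa_logic_area, 2))
```

Because the C-element count is identical under both protocols and the dual gates have equal areas, the RTZ and RTO differences are zero whatever C-element area is substituted.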

Having the same cell areas for dual logic gates, viz. OR and AND, AO21 and OA21,
AO22 and OA22, etc., as in [48], is rather uncommon in commercial standard cell
libraries. The standard digital cell library [48] does not have foundry support and is
intended for academic teaching and research. Hence, it may be hypothesized that if a
commercial digital cell library is used for the physical implementation of the QDI adders
given in Table 4, then the RTO protocol would yield larger percentage improvements in the
design metrics over the RTZ protocol, and therefore the improvements reported in Table 4
would tend to serve as a baseline.


5. CONCLUSIONS 

 This article discussed the implementation of various QDI adders, which correspond to 

diverse architectures and timing regimes by utilizing the delay-insensitive dual-rail code, 

based on the 4-phase RTZ and RTO handshake protocols. The logic transformation rules 

governing the circuit conversions between RTZ and RTO protocols were presented, and 

their proofs by induction were also provided. The simulations were performed by using a 

32/28nm CMOS process. The simulation results show that QDI adders corresponding to
the RTO protocol generally feature better design parameters than their QDI adder
counterparts which adhere to the RTZ protocol. Hence it is concluded that the 4-phase

RTO protocol is potentially more efficient than the 4-phase RTZ protocol to implement 

handshaking in QDI asynchronous (arithmetic) circuits.                        

REFERENCES 

[1] ITRS design report. Available: http://www.itrs2.net 

[2] S. Kundu and A. Sreedhar, Nanoscale CMOS VLSI Circuits: Design for Manufacturability, McGraw-

Hill, New York, USA, 2010.  

[3] A.J. Martin, S.M. Burns, T.K. Lee, D. Borkovic and P.J. Hazewindus, “The first asynchronous 

microprocessor: the test results,” ACM SIGARCH Computer Architecture News, vol. 17, pp. 95-98, 1989. 

[4] A.J. Martin and M. Nystrom, “Asynchronous techniques for system-on-chip design,” Proceedings of the 

IEEE, vol. 94, pp. 1089-1120, 2006.  

[5] C.H. Van Kees Berkel, M.B. Josephs and S.M. Nowick, “Scanning the technology applications of 

asynchronous circuits”, Proceedings of the IEEE, vol. 87, pp. 223-233, 1999.  

[6] S.B. Furber, D.A. Edwards and J.D. Garside, “AMULET3: a 100 MIPS asynchronous embedded 

processor,” In Proceedings of the International Conference on Computer Design, pp. 329-334, 2000.  

[7] L. Necchi, L. Lavagno, D. Pandini and L. Vanzago, “An ultra-low energy asynchronous processor for wireless sensor networks,” In Proceedings of the 12th IEEE International Symposium on Asynchronous Circuits and Systems, 2006, pp. 1-8.

[8] B.Z. Tang and F. Lane, “Low power QDI asynchronous FFT,” In Proceedings of the 22nd IEEE International Symposium on Asynchronous Circuits and Systems, 2016, pp. 87-88.

[9] W. Jiang, D. Bertozzi, G. Miorandi, S.M. Nowick, W. Burleson and G. Sadowski, “An asynchronous NoC 

router in a 14nm FinFET library: comparison to an industrial synchronous counterpart,” In Proceedings of 

the Design, Automation and Test in Europe Conference and Exhibition, 2017, pp. 732-733.  

[10] N.C. Paver, P. Day, C. Farnsworth, D.L. Jackson, W.A. Lien and J. Liu, “A low-power, low noise, configurable self-timed DSP”, In Proceedings of the 4th International Symposium on Advanced Research in Asynchronous Circuits and Systems, pp. 32-42, 1998.

[11] A.J. Martin and M. Nystrom, “Asynchronous techniques for noise tolerant nanoelectronics,” Technical 

Report Situs-TR-04-01, Situs Logic, Pasadena, CA, USA, 2004.  

[12] G.F. Bouesse, G. Sicard, A. Baixas and M. Renaudin, “Quasi delay insensitive asynchronous circuits for low EMI”, In Proceedings of the 4th International Workshop on Electromagnetic Compatibility of Integrated Circuits, 2004, pp. 27-31.

[13] K.J. Kulikowski, V. Venkataraman, Z. Wang, A. Taubin and M. Karpovsky, “Asynchronous balanced 

gates tolerant to interconnect variability”, In Proceedings of the IEEE International Symposium on 

Circuits and Systems, 2008, pp. 3190-3193.   

[14] I.J. Chang, S.P. Park and K. Roy, “Exploring asynchronous design techniques for process-tolerant and 

energy-efficient subthreshold operation”, IEEE Journal of Solid-State Circuits, vol. 45, pp. 401-410, 2010.  

[15] I. David, R. Ginosar and M. Yoeli, “Self-timed is self-checking”, Journal of Electronic Testing: Theory 

and Applications, vol. 6, pp. 219-228, 1995.  

[16] L.A. Plana, P.A. Riocreux, W.J. Bainbridge, A. Bardsley, S. Temple, J.D. Garside, Z.C. Yu, “SPA – a 

secure Amulet core for smartcard applications,” Microprocessors and Microsystems, vol. 27, pp. 431-

446, 2003.   

[17] D. Sokolov, J. Murphy, A. Bystrov and A. Yakovlev, “Design and analysis of dual-rail circuits for 

security applications”, IEEE Transactions on Computers, vol. 54, pp. 449-460, 2005.  

[18] F. Burns, A. Bystrov, A. Koelmans and A. Yakovlev, “Design and security evaluation of balanced 1-of-n 

circuits,” IET Computers and Digital Techniques, vol. 6, pp. 125-135, 2012.   

[19] W. Cilio, M. Linder, C. Porter, J. Di, D.R. Thompson and S.C. Smith, “Mitigating power- and timing-

based side-channel attacks using dual-spacer dual-rail delay-insensitive asynchronous logic,” 

Microelectronics Journal, vol. 44, pp. 258-269, 2013.   

[20] J. Sparsø and S. Furber (Eds.), Principles of Asynchronous Circuit Design: A Systems Perspective, 

Kluwer Academic Publishers, 2001.  

[21] M.T. Moreira and N.L.V. Calazans, “Quasi-delay-insensitive return-to-one design,” In Proceedings of 

the Design, Automation and Test in Europe Conference and Exhibition PhD Forum, 2014, pp. 1-2.   

[22] M.T. Moreira, J.J.H. Pontes and N.L.V. Calazans, “Tradeoffs between RTO and RTZ in WCHB QDI asynchronous design,” In Proceedings of the 15th International Symposium on Quality Electronic Design, 2014, pp. 692-699.

[23] R.A. Guazzelli, M.T. Moreira and N.L.V. Calazans, “A comparison of asynchronous QDI templates using static logic,” In Proceedings of the 8th IEEE Latin American Symposium on Circuits and Systems, 2017, pp. 1-4.

[24] A.J. Martin, “The limitation to delay-insensitivity in asynchronous circuits,” In Proceedings of the 6th MIT Conference on Advanced Research in VLSI, 1990, pp. 263-278.

[25] P. Balasubramanian and C. Dang, “A comparison of quasi-delay-insensitive asynchronous adder designs corresponding to return-to-zero and return-to-one handshaking,” In Proceedings of the 60th IEEE International Midwest Symposium on Circuits and Systems, 2017, pp. 1192-1195.

[26] T. Verhoeff, “Delay-insensitive codes – an overview”, Distributed Computing, vol. 3, pp. 1-8, 1988.  

[27] B. Bose, “On unordered codes”, IEEE Transactions on Computers, vol. 40, pp. 1-8, 1988.  

[28] P. Balasubramanian, “Comments on “Dual-rail asynchronous logic multi-level implementation”,” 

Integration, the VLSI Journal, vol. 52, pp. 34-40, 2016.   

[29] C.L. Seitz, “System Timing”, in Introduction to VLSI Systems, C. Mead and L. Conway (Editors), pp. 

218-262, Addison-Wesley, Reading, Massachusetts, USA, 1980.  

[30] P. Balasubramanian and D.A. Edwards, “Efficient realization of strongly indicating function blocks”, In 

Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2008, pp. 429-432.   

[31] P. Balasubramanian and D.A. Edwards, “A new design technique for weakly indicating function blocks”, In Proceedings of the 11th IEEE Workshop on Design and Diagnostics of Electronic Circuits and Systems, 2008, pp. 116-121.

[32] C. Brej, “Early output logic and anti-tokens,” PhD thesis, School of Computer Science, The University 

of Manchester, 2006.   

[33] P. Balasubramanian, “A robust asynchronous early output full adder,” WSEAS Transactions on Circuits 

and Systems, vol. 10, pp. 221-230, 2011.  

[34] C. Jeong and S.M. Nowick, “Block-level relaxation for timing-robust asynchronous circuits based on eager evaluation”, In Proceedings of the 14th IEEE International Symposium on Asynchronous Circuits and Systems, 2008, pp. 95-104.

[35] P. Balasubramanian, K. Prasad and N.E. Mastorakis, “Robust asynchronous implementation of Boolean functions on the basis of duality,” In Proceedings of the 14th WSEAS International Conference on Circuits, 2010, pp. 37-43.

[36] P. Balasubramanian, R. Arisaka and H.R. Arabnia, “RB_DSOP: a rule based disjoint sum of products synthesis method”, In Proceedings of the 12th International Conference on Computer Design, 2012, pp. 39-43.

[37] P. Balasubramanian and D.A. Edwards, “Self-timed realization of combinational logic”, In Proceedings of the 19th International Workshop on Logic and Synthesis, 2010, pp. 55-62.

[38] P. Balasubramanian, “Self-timed logic and the design of self-timed adders”, PhD thesis, School of 

Computer Science, The University of Manchester, 2010.   

[39] P. Balasubramanian and N.E. Mastorakis, “A set theory based method to derive network reliability 

expressions of complex system topologies,” In Proceedings of the Applied Computing Conference, 2010, 

pp. 108-114.    

[40] J. Cortadella, A. Kondratyev, L. Lavagno and C. Sotiriou, “Coping with the variability of combinational 

logic delays,” In Proceedings of the IEEE International Conference on Computer Design: VLSI in 

Computers and Processors, 2004, pp. 505-508.  


[41] V.I. Varshavsky (Ed.), Self-Timed Control of Concurrent Processes: The Design of Aperiodic Logical 

Circuits in Computers and Discrete Systems, Chapter 4: Aperiodic Circuits, pp. 77-85, (Translated from 

the Russian by A.V. Yakovlev), Kluwer Academic Publishers, 1990.  

[42] P. Balasubramanian and K. Prasad, “Early output hybrid input encoded asynchronous full adder and relative-timed ripple carry adder,” In Proceedings of the 14th International Conference on Embedded Systems, Cyber-physical Systems, and Applications, 2016, pp. 62-65.

[43] P. Balasubramanian and S. Yamashita, “Area/latency optimized early output asynchronous full adders 

and relative-timed ripple carry adders,” SpringerPlus, vol. 5, pages 26, 2016.  

[44] P. Balasubramanian and K. Prasad, “Latency optimized asynchronous early output ripple carry adder 

based on delay-insensitive dual-rail data encoding,” International Journal of Circuits, Systems and 

Signal Processing, vol. 11, pp. 65-74, 2017.  

[45] K.S. Stevens, R. Ginosar and S. Rotem, “Relative timing,” IEEE Transactions on VLSI Systems, vol. 11, 

pp. 129-140, 2003.  

[46] D. Bhadra and K.S. Stevens, “Design of a low power, relative timing based asynchronous MSP430 

processor,” In Proceedings of the Design, Automation and Test in Europe Conference and Exhibition, 

pp. 794-799, 2017.   

[47] M.M. Mano and M.D. Ciletti, Digital Design, 4th edition, Prentice-Hall, New Jersey, USA, 2007.

[48] Synopsys Digital Standard Cell Library SAED_EDK32/28_CORE Databook, Revision 1.0.0, 2012.   

[49] N.P. Singh, “A design methodology for self-timed systems,” MSc dissertation, Massachusetts Institute of 

Technology, USA, 1981.  

[50] W.B. Toms, “Synthesis of quasi-delay-insensitive datapath circuits”, PhD thesis, School of Computer 

Science, The University of Manchester, UK, 2006.  

[51] P. Balasubramanian, “A latency optimized biased implementation style weak-indication self-timed full 

adder,” Facta Universitatis, Series: Electronics and Energetics, vol. 28, pp. 657-671, 2015.  

[52] J. Sparsø and J. Staunstrup, “Delay-insensitive multi-ring structures”, Integration, the VLSI Journal, vol. 

15, pp. 313-340, 1993.  

[53] B. Folco, V. Bregier, L. Fesquet and M. Renaudin, “Technology mapping for area optimized quasi delay insensitive circuits”, In Proceedings of the IFIP 13th International Conference on Very Large Scale Integration of System-on-Chip, 2005, pp. 146-151.

[54] W.B. Toms and D.A. Edwards, “A complete synthesis method for block-level relaxation in self-timed datapaths,” In Proceedings of the 10th International Conference on Application of Concurrency to System Design, 2010, pp. 24-34.

[55] P. Balasubramanian and D.A. Edwards, “A delay efficient robust self-timed full adder”, In Proceedings of the IEEE 3rd International Design and Test Workshop, 2008, pp. 129-134.

[56] P. Balasubramanian, D.A. Edwards and W.B. Toms, “Self-timed section-carry based carry lookahead 

adders and the concept of alias logic,” Journal of Circuits, Systems, and Computers, vol. 22, pp. 

1350028-1–1350028-24, 2013.  

[57] P. Balasubramanian, D. Dhivyaa, J.P. Jayakirthika, P. Kaviyarasi and K. Prasad, “Low power self-timed carry lookahead adders,” In Proceedings of the 56th IEEE International Midwest Symposium on Circuits and Systems, 2013, pp. 457-460.

[58] P. Balasubramanian, “Asynchronous carry select adders,” Engineering Science and Technology, an 

International Journal, vol. 20, pp. 1066-1074, 2017.  

[59] P. Balasubramanian and N.E. Mastorakis, “QDI decomposed DIMS method featuring homogeneous/ 

heterogeneous data encoding”, In Proceedings of the International Conference on Computers, Digital 

Communications and Computing, 2011, pp. 93-101.  

[60] P. Balasubramanian and D.A. Edwards, “Power, delay and area efficient self-timed multiplexer and demultiplexer designs,” In Proceedings of the 4th IEEE International Conference on Design and Technology of Integrated Systems in Nanoscale Era, 2009, pp. 173-178.

[61] N.H.E. Weste and K. Eshraghian, Principles of CMOS VLSI Design: A Systems Perspective, 2nd edition, Addison-Wesley Publishing Company, Massachusetts, USA, 1993.