AP05_2.vp


1 Introduction
RC5 is a fast block cipher designed by R. Rivest [1, 2, 3]. It

is a parameterized algorithm in which the block size, key
length, and number of rounds are variable. The parameters
are often specified in the RC5-w/r/b notation, where w is the
word size in bits, r�[0, 255] is the number of rounds, and
b�[0, 255] is the number of bytes in the secret key. The block
size is twice the word size and the allowed values are 32 bits
(recommended for experiments and testing only), 64 bits, or
128 bits. This flexibility allows the optimal level of security
and efficiency to be chosen. A suggested “nominal” choice is
RC5-32/12/16, that is, RC5 with 32-bit words (64-bit blocks),
12 rounds, and a 128-bit (16-byte) key.

Although RC5 is extremely simple and easy to implement,
it provides a good level of security, as long as a sufficient key
length and enough rounds are employed. Attempts to crack
RC5 through various cryptanalysis techniques [4, 5, 6, 7, 8, 9]
have been published; [6] concludes that at least 16 rounds are
necessary to prevent a partial differential attack. However, all
of these works consider a chosen-plaintext attack, which is not
always plausible.

In 1997, RSA Security Inc. announced a “Secret-Key Chal-
lenge” [10]. The objective of the challenge is to find the secret
key, given one ciphertext sample together with a portion of
the corresponding plaintext. This corresponds to a real-world
scenario, where an attacker could, for instance, capture pack-
ets of certain network communication and guess the contents
of the encrypted packet header. Unfortunately, chosen-plain-
text attack methods are of limited use in this case, and the
exhaustive keyspace search method is the only known option.

In this paper, we analyze the feasibility of a brute-force
attack, using general-purpose computers as well as dedicated
hardware. We present the design of a fast, fully pipelined
cracking engine, and compare the cost and speed with
software-based solutions used by the currently employed
methods. Based on the results, we suggest a minimum key
length that can be considered secure against various types of
attackers. These numbers are then compared to suggestions
published by a group of computer scientists and cryptogra-
phers in January 1996 [11].

2 The RC5 algorithm

The RC5 algorithm is fully described and discussed in [1,
2, 3]. We include the relevant routines here for the sake of
completeness.

The algorithm consists of three routines: key expansion,
encryption, and decryption. It makes heavy use of data-
-dependent rotations, besides additions and subtractions
modulo 2w and XORs.

In the following description, � and � denote addition and
subtraction modulo 2w respectively, � denotes bit-wise XOR,
A � B and A � B denote a rotation of A by B bits to the left or
right respectively. The least-significant bit is assumed to be at
the rightmost position.

2.1 Constants and variables
Besides the already-defined constants w, r, b, [1] defines

the following constants to describe the algorithm: c � �b/(w/8)�
is the number of w-bit words necessary to hold the b bytes of
the key, and t � 2r � 2 is the size of the expanded key table (ar-
ray S). The key expansion routine uses the magic constants Pw
and Qw, which are defined in [1] and are derived from certain
irrational numbers. Their values, for w � 32, are equal to (in
hex):

P32 � 0xB7E15163, Q32 � 0xE93779B9.

The key is initially stored in the array K of b bytes. The key
expansion routine uses two arrays, S and L, containing t or
c words respectively, two w-bit variables A and B, and the
counters i, j.

2.2 Key expansion
First, the array L[0 	 c � 1] is filled with the secret key

K[0 	 b � 1]. On little-endian machines, this is accomplished
by zeroing-out the array L and copying the contents of K
directly into the same memory area. Then, the array
S[0 	 t � 1] is initialized and the secret key is mixed in:

©  Czech Technical University Publishing House http://ctn.cvut.cz/ap/ 61

Czech Technical University in Prague Acta Polytechnica Vol. 45  No. 2/2005

Cost-Effective Architectures for RC5
Brute Force Cracking

J. Buček, J. Hlaváč, M. Matušková, R. Lórencz

In this paper, we discuss the options for brute-force cracking of the RC5 block cipher, that is, for revealing the unknown secret key, given a
sample ciphertext and a portion of the corresponding plaintext. First, we summarize the methods employed by the current cracking efforts.
Then, we present two hardware architectures for finding the secret key using the “brute force” method. We implement the hardware in FPGA
and ASIC and, based on the results, we discuss the cost and time needed to crack the cipher using today’s technology and suggest a minimum
key length that can be considered secure.

Keywords: RC5 cipher, decryption, brute-force cracking, FPGA, ASIC.


S Pw[ ]0 � ;
for i � 1 to t do

S i S i Q w[ ] [ ]� � �1 ;

i j A B� � � �0 0; ;
repeat 3 max (t, c) times

A S i S i A B� � � �[ ] ( [ ] ) � 3;
B L j L j A B� � � �[ ] ( [ ] ) �( )A B� ;
i i t� �( ) mod1 ;
j j c� �( ) mod1 ;

2.3 Decryption
The following decryption routine assumes that the array S

contains the mixed key and that the ciphertext block is in
registers A and B.

for i r� downto 1 do

B B S i� � �( [ ])2 1 � �A A� ;

A A S i� �( [ ])2 � �B B� ;

B B S� � [ ]1 ;
A A S� � [ ]0 ;

2.4 Encryption
Although encryption is not a primary concern of this pa-

per, the appropriate routine is given here for completeness.
The plaintext is assumed to be in A and B.
A A S� � [ ]0 ;
B B S� � [ ]1 ;
for i � 1 to r do


A A B� �( ) � �B S i� [ ]2 ;

B B A� �( ) � �A S i� �[ ]2 1 ;

3 Current cracking efforts

3.1 Distributed.net
The most well-known project aiming to crack the RC5

challenges is organized by Distributed Computing Techno-
logies, Inc. [12]. On October 19, 1997, after 250 days of
searching, distributed.net found the 56-bit secret key for the
RC5-32/12/7 challenge. On July 14, 2002, they found the
64-bit secret key for the RC5-32/12/8 challenge (it took 1757
days). At the time of writing this paper (June 2004), the
remaining challenges still remain to be solved.

The philosophy of the distributed.net project is to search
the keyspace using the idle CPU time of many computers con-
nected to the Internet, creating a virtual massively parallel
system. Participating users download and install client soft-
ware that runs in the background, searching portions of the
keyspace and sending the results to the main server.

Table 1 lists some of the results published at the distrib-
uted.net web site [12], combined with the approximate cost of
the respective CPU. The numbers pertain to the currently
running RC5-72 project (RC5-32/12/9). (Note that the fre-
quency of the AMD processors is actually their performance
rating. This follows from the numbers from distributed.net.)

It turns out that older common processors provide a more
favorable cost per Mkey/s than recent releases. In addition,
Intel Pentium 4 implements the rotate instructions less effi-

ciently than the other CPUs, attaining the least favorable
results out of those presented.

3.2 Distributed reconfigurable hardware
A recent work by Morrison et al [13] attempts to crack

RC5 using a cluster of PC compatible computers with FPGA
accelerator boards. A rate of 168 kkeys/s for a Pentium 4 and
1.7 Mkeys/s per node of their distributed system is cited.
These numbers appear rather low in comparison with the -
distributed.net [12] results; unfortunately, it is not clear what is
the cause of the mismatch.

4 Hardware cracking engine
We have designed two different architectures of dedicated

hardware (one of them was presented in [14]) for brute-force
cracking of the RC5 cipher (any of its variants with w � 32). To
attain the maximum speed possible, the design is not recon-
figurable - the parameters of the RC5 variant, which is to be
cracked, must be specified at design (synthesis) time.

4.1 Pipelined architecture
The first design utilizes a long pipeline in order to attain a

high clock rate and effectively check one key every clock cycle.
The number of pipeline stages is equal to 3max (t, c) � r.

The overall block diagram of our hardware design is
shown in Figure 1. The inputs are “SECRET KEY” – the

62 ©  Czech Technical University Publishing House http://ctn.cvut.cz/ap/

Acta Polytechnica Vol. 45  No. 2/2005 Czech Technical University in Prague

CPU Rate
(Mkeys/s)

Cost
(USD)

Cost/Rate

AMD Athlon 64
(3.4 GHz)

8.4 245 29.2

AMD Athlon XP
(3 GHz)

7.7 163 21.1

Intel Pentium 4
(3.2 GHz)

4.5 237 52.7

PowerPC G4
(1.25 GHz)

13.1 399 30.5

Table 1: General-purpose CPU results

Fig. 1: Block diagram


key which is to be tested, “INITIAL VALUES FOR S” –
the contents of the S array detailed in section 2.2, and
“CIPHERTEXT ” – the one-block sample of the encoded
message, to which the corresponding plaintext block is
known. Hardware for generating individual keys and for
comparing the computed plaintext to the expected value is
omitted from the figure.

The implementation of the key expansion pipeline (“KEY
EXPANSION” block in Fig. 1) in our design is shown in Fig. 2.
Again, its inputs are the initial contents of the array S (con-
stants calculated at design time using the values Pw and Qw)
and the secret key to be tested. The dash-and-dotted line rep-
resents registers (fields of the array S) that would still contain
original initialization values (which are known at design time)
and therefore are not physically implemented.

The structure of one key expansion element (K-E in Fig.2)
is shown in Fig. 3. It performs the operations that are inside
the body of the main loop described in section 2.2.

The structure of the decryption pipeline (the “DECRYP-
TION” block in Fig. 1) is shown in Fig. 4. It performs the func-
tions described by the algorithm in section 2.3. Its inputs are

the computed array S (with the secret key mixed in) and the
ciphertext block. The dotted lines represent registers (por-
tions of the array S) that are no longer needed for the
calculation and thus are not physically implemented. The
structure of one decryption round element is shown in Fig. 5.
It performs the operations inside the main loop detailed in
section 2.3.

The final round (not included in r) differs from the others
and its hardware structure is shown in Fig. 6. To give the
reader an idea about the size of the entire hardware, let us
take the RC5-32/12/9 variant as an example and count the
number of hardware elements.

Every stage of the key expansion pipeline contains 3 regis-
ters (L array), up to 26 registers (S array), 4 adders, 1 barrel
shifter. Every stage of the decryption pipeline contains up to

©  Czech Technical University Publishing House http://ctn.cvut.cz/ap/ 63

Czech Technical University in Prague Acta Polytechnica Vol. 45  No. 2/2005

Fig. 2: Key expansion pipeline

Fig. 3: Key expansion element

Fig. 4: Decryption pipeline

Fig. 5: Decryption round element

Fig. 6: Final decryption round


26 registers (S array), 2 registers (A, B), 2 subtractors, 2 XORs,
2 barrel shifters. The last round element contains 2
subtractors. Thus, the entire pipeline would contain 1906
w-bit registers, 312 adders, 26 subtractors, 24 XORs and 102
barrel shifters.

4.2 Sequential (not pipelined) architecture
As stated in section 4.1 above, the pipelined architecture is

resource hungry. We anticipate that we would need a large
FPGA (or a large ASIC) to implement the full pipeline. How-
ever, there is a class of cheap FPGAs (e.g. the Xilinx Spartan-3
series), which could be also utilized, if we can design such a
circuit that fits into these small-to-medium sized devices.

We have therefore designed another variant of the crack-
ing engine, which is sequential but not pipelined. (In fact, it
still has two stages, but it is not fully pipelined, so we call it se-
quential here.) The basic block diagram stays the same (see
Figure 1). The key expansion and decryption units have been
changed so that the computation is performed sequentially,
much like the specification of the original algorithm (sections
2.2 and 2.3). The key expansion and decryption units contain
only one key expansion element and one decryption round
element, respectively.

The structure of the sequential key expansion unit is
shown in Fig. 7. The main difference from the pipelined unit
(Fig. 2) is that the S and L arrays are now implemented as shift
registers and the A and B registers are now inside the key
expansion element.

The S and L registers shift by one word to the lower indi-
ces (to the left in the figure) in order to implement the key
expansion sequence. The number of cycles is the same as
the number of pipeline stages in the pipelined version, i.e.
3max (t, c) � r, and the end of key expansion is determined by
a counter (not shown here for simplicity). The unit now con-
tains only one key expansion element, whose structure is
shown in Fig. 8. It performs the operations that are inside the
body of the main loop described in section 2.2, and contains
the A and B registers.

The structure of the sequential decryption unit is shown in
Fig. 9. It performs the functions described by the algorithm in
section 2.3. The S register is now implemented as a shift regis-
ter, but this time it shifts by two words toward the higher indi-
ces (to the right in the figure). The structure of one decryption
round element is shown in Fig. 10. It performs the operations
inside the main loop detailed in section 2.3. The last round is
now treated differently, and the final decrypted output is now
taken directly from the decryption round element (outputs A,
and B, in Fig. 10). The end of decryption is determined by a
counter (not shown here for simplicity).

5 Results
We described both hardware architectures in VHDL, simu-

lated it and synthesized it for FPGA. We synthesized the
pipelined architecture also for ASIC. The following experi-
ments were conducted with the RC5-32/12/9 variant to make
the results directly comparable with those in section 3.1
(distributed.net).

64 ©  Czech Technical University Publishing House http://ctn.cvut.cz/ap/

Acta Polytechnica Vol. 45  No. 2/2005 Czech Technical University in Prague

Fig. 7: Sequential key expansion unit

Fig. 8: Sequential key expansion element

Fig. 9: Sequential decryption unit

Fig. 10: Sequential decryption round element


5.1 FPGA implementation
The pipelined hardware is rather complicated. Further-

more, barrel shifters are not “FPGA-friendly” due to high
requirements on routing resources. As a result, a rather large
FPGA is necessary to implement the entire design, e.g. a
Xilinx xc2vp100 (Virtex II Pro) or xc2v8000 (Virtex II). It is
clear that FPGA is not a suitable platform for the pipelined
architecture, compared to general-purpose CPUs (Table 1).

The sequential hardware is much smaller than our
pipelined implementation. The circuit now fits into cheap,
small- to medium-sized FPGAs such as Xilinx xc3s200 and
xc3s400 (Spartan-3). The downside is the substantial drop in
the decryption rate. However, as we can see, the lower price
compensates for the speed decrease. We can see that the
sequential FPGA implementation is comparable with gen-
eral-purpose CPUs, from the Cost/Rate point of view. The
results of both variants implemented in FPGA are listed in
Table 2.

5.2 ASIC Implementation
We synthesized the pipelined design using a 0.18 �m tech-

nology library (unfortunately, we did not have the necessary
data for synthesis into the latest 90 nm and 60 nm technolo-
gies). Out of the wide range of tradeoffs between speed and
area, one of the favorable settings resulted in a design that
occupies approximately 15 mm2 and can run at 215 MHz
under typical conditions. Several instances of the design can
be placed on one die to further reduce the cost.

For ASICs, it is difficult to obtain an exact price per chip
since it greatly depends, among other factors, on the total vol-
ume of chips produced. We tried to get a reasonable estimate
for a small quantity of chips (based on MOSIS price lists [15]
for a lot of 40 chips) and for a large quantity (based on the
market price of common ICs with a similar die size). Table 3
lists our results and the estimated prices.

While it would be possible to fine-tune and optimize the
synthesis process or implement the design in more advanced

technology for even better results, the obtained figures are
sufficient for comparison with the other platforms considered
in this paper.

5.3 Minimum secure key length
Concluding from the previous sections, it is clear that

FPGA is not a reasonable option for the pipelined archi-
tecture since it is too expensive. On the other hand, the
sequential architecture can be used with smaller and cheaper
FPGAs. A low-budget attacker would turn to systems built
upon general-purpose CPUs or cheap FPGAs, while a well-
-funded attacker is likely to use ASICs.

The Cost/Rate ratio of small FPGAs is comparable or even
superior to that of general purpose CPUs. However, when
considering whether to build a FPGA-based cracking engine
or a CPU-based one, we have to take into account additional
costs that were not addressed in this paper. These involve the
costs of custom built or off-the-shelf mainboards, input/out-
put processing circuits, etc. In this respect, the CPU-based
cracking engine would probably be better.

To find the unknown secret key using a brute-force attack,
an attacker must search 50% of the keyspace (28b � 1 keys) on
average, and the entire keyspace (28b keys) in the worst case.
Table 4 details the number of keys an attacker would be able to
try every second, based on the amount invested, using the
technology discussed in the preceding sections.

From the data in Table 4, we can conclude that 40-bit keys
are not secure at all, and 56-bit keys can only protect sensitive
data from low-budget attackers for a limited period of time.
Should we wish to protect sensitive data against the strongest
attacker listed in Table 4, the secret key should be longer than
70 bits. However, considering that more advanced technology
is available today than the discussed 0.18 �m process, we
would recommend at least 80 to 90 bits. 128-bit keys that
are common in today’s cryptographic systems satisfy this
condition.

Another view is that, for instance, a 54-bit key is inade-
quate, if an attacker can gain more than $500,000 by recover-
ing it in a day.

It should be noted that while the cost of brute-force attack
increases exponentially with the length of the secret key, the
cost of encryption and decryption increases negligibly. There-
fore, a prudent cryptographic system would use a key that is at
least twice or thrice as long as the recommended minimum, to
allow a margin for error or technology improvements [11].

©  Czech Technical University Publishing House http://ctn.cvut.cz/ap/ 65

Czech Technical University in Prague Acta Polytechnica Vol. 45  No. 2/2005

Device Architecture Rate
[Mkeys/s]

Cost
[USD]

Cost/Rate

xc2vp100 Pipelined 39 7900 202

xc2v8000 Pipelined 43 11000 256

xc3s200 Sequential 0.66 13.45 20.4

xc3s400 Sequential 1.32 22 16.7

Table 2: FPGA results

Production type Rate/chip
(Mkeys/s)

Cost/chip
(USD)

Cost/Rate

Low volume, small die 215 1050 4.9

Low volume, large die 2150 4500 2.1

High volume, large die 2150 180 0.08

Table 3: ASIC Results

Investment Technology
Keys per

sec day year

$5k GP CPU or FPGA 228 244 253

$50k GP CPU or FPGA 231 247 256

ASIC 233 249 258

$500k ASIC 238 254 263

$5M ASIC 246 262 270

Table 4: Speed vs. investment


6 Conclusion
We have designed two hardware architectures for attack-

ing the RC5 cipher using the exhaustive search (also known
as “brute-force”) method. We have described the architec-
tures in VHDL, synthesized it for various FPGA and ASIC
platforms and compared the speed with common general-
-purpose processors.

It turns out that FPGAs are comparable to general-pur-
pose CPUs, although the additional costs may favor using
CPU-based systems. For a low-budget attacker, today’s
general-purpose CPUs provide better performance. For an
attacker with a generous budget, ASIC is the best option.

Today, an 80 to 90-bit key seems to be secure enough
against most attackers. For data that should remain protected
for some time to come, at least 128 bits should be used; how-
ever, a prudent cryptographer would use a much longer key –
the increased cost of encryption and decryption is negligible,
while a brute-force attack becomes impossible.

Our future work will include testing the sequental design
in a physical implementation in a small or medium-sized
FPGA.

References
[1] Rivest, R. L.: The RC5 Encryption Algorithm.

CryptoBytes, Vol. 1 (1995), No. 1, p. 9–11.
[2] Baldwin, R., Rivest, R.: The RC5, RC5-CBC,

RC5-CBC-Pad, and RC5-CTS Algorithms. RFC 2040,
Network Working Group, 1996.

[3] Rivest, R. L.: Block Encryption Algorithm with Data-
-Dependent Rotations. U.S. Patents No. 5,724,428 and
No. 5,835,600, 1998.

[4] Kaliski, B. S. Jr., Yin, Y. L.: “On Differential and Linear
Cryptanalysis of the RC5 Encryption Algorithm.”
Advances in Cryptology – Crypto’95, Springer-Verlag,
1995, p. 171–183.

[5] Knudsen, L. R., Meier, W.: “Improved Differential At-
tacks on RC5.” Advances in Cryptology – Crypto’96,
Springer-Verlag, 1996, p. 216–228.

[6] Biryukov, A., Kushilevitz, E.: “Improved Cryptanalysis
of RC5.” Advances in Cryptology – Eurocrypt’98,
Springer-Verlag, 1998.

[7] Selcuk, A. A.: “New Results in Linear Cryptanalysis
of RC5.” Proceedings of 5th International Workshop
on Fast Software Encryption, Springer-Verlag, 1998,
p. 1–16.

[8] Shimoyama, T., Takeuchi K., Hayakawa J.: “Correlation
Attack to the Block Cipher RC5 and the Simplified Vari-
ants of RC6.” Proceedings AES3, New York, 2001.

[9] Yin Y. L.: “The RC5 Encryption Algorithm: Two Years
On,” CryptoBytes, Vol. 2 (1997), No. 3, p. 14–15.

[10] RSA Data Security, Inc.: Secret-Key Challenges. On-line
at
http://www.rsasecurity.com/rsalabs/chalenges/secretkey/
index.html

[11] Blaze, M., Diffie, W., Rivest, R. L. et al.: “Minimal Key
Lengths for Symmetric Ciphers to Provide Adequate
Commercial Security.” A Report by an Ad Hoc Group of
Cryptographers and Computer Scientists, 1996. On-line
at
http://theory.lcs.mit.edu/rivest/bsa-final-report.pdf

[12] Distributed Computing Technologies, Inc.: The “dis-
tributed.net” project. On-line at
http://www.distributed.net

[13] Morrison, J. P., O’Dowd, P. J., Healy, P. D.: “Searching
RC5 Keyspaces with Distributed Reconfigurable Hard-
ware.” In: Engineering of Reconfigurable Systems and
Algorithms, CSREA Press, 2003, p. 269–272.

[14] Matušková, M., Hlaváč, J., Buček, J., Lórencz, R.: “RC5
Brute Force Cracking Engine.” Proceedings of the Sixth
International Scientific Conference Electronic Com-
puters and Informatics ECI 2004. Technical University
Košice, 2004, p. 259–264. ISBN 80-8073-150-0.

[15] MOSIS IC Fabrication Service. On-line at
http://www.mosis.org

Ing. Jiří Buček
e-mail: bucekj@fel.cvut.cz

Ing. Josef Hlaváč
e-mail: hlavacj2@fel.cvut.cz

Ing. Róbert Lórencz, CSc.
e-mail: lorencz@fel.cvut.cz

Department of Computer Science and Engineering

Czech Technical University in Prague
Faculty of Electrical Engineering
Karlovo nám. 13
121 35 Prague, Czech Republic

Ing. Monika Matušková
e-mail: monika.matuskova@vslib.cz

Department of Software Engineering

Technical University of Liberec
Faculty of Mechatronics
Hálkova 6
460 15 Liberec, Czech Republic

66 ©  Czech Technical University Publishing House http://ctn.cvut.cz/ap/

Acta Polytechnica Vol. 45  No. 2/2005 Czech Technical University in Prague