A Highly Regular and Scalable
AES Hardware Architecture
Stefan Mangard, Student Member, IEEE, Manfred Aigner, and Sandra Dominikus
Abstract—This article presents a highly regular and scalable AES hardware architecture, suited for full-custom as well as for semi-custom design flows. Contrary to other publications, a complete architecture (even including CBC mode) that is scalable in terms of throughput and in terms of the used key size is described. Similarities of encryption and decryption are utilized to provide a high level of performance using only a relatively small area (10,799 gate equivalents for the standard configuration). This performance is reached by balancing the combinational paths of the design. No other published AES hardware architecture provides similar balancing or a comparable regularity. Implementations of the fastest configuration of the architecture provide a throughput of 241 Mbits/sec on a 0.6 μm CMOS process using standard cells.
Index Terms—Advanced Encryption Standard (AES), hardware architecture, IP module, VLSI, scalability, regularity.
1 INTRODUCTION
The symmetric block cipher Rijndael [1] was standardized by NIST¹ as Advanced Encryption Standard (AES) [2] in November 2001. Being the successor of the Data Encryption Standard (DES) [3], the AES is used in a wide range of applications.
The AES is the preferred algorithm for implementations of cryptographic protocols that are based on a symmetric cipher. It is not only used to secure data transfers between small, mobile consumer products, but it is also used in high-end servers. Consequently, the requirements for implementations of the AES differ significantly.
Applications with strict requirements concerning performance, power consumption, or side-channel leakage are, in practice, usually implemented by dedicated hardware. Hardware implementations of the AES are, for example, used in Internet servers as performance accelerators or in smart cards (besides other reasons) to increase the resistance against side-channel attacks.
Due to the practical importance of hardware implementations, the different AES candidates were implemented and compared on FPGAs (see [4], [5], and [6]) and on ASICs [7] before Rijndael was finally selected to become the AES. After this selection, more effort was dedicated toward the development of efficient hardware implementations of this particular algorithm (see [8], [9], [10], and [11]). The most recent proposal for an ASIC architecture of the AES is [12]. However, this architecture has very unbalanced combinational paths and requires a time- and area-consuming selector function, which is not part of the actual AES algorithm.
This article presents a highly regular and scalable AES hardware architecture that requires only 10,799 gate equivalents to provide a throughput of 128 Mbits/sec (for AES-128 encryption and decryption) on a 0.6 μm standard cell library. These numbers include an AMBA APB bus interface, a CBC register, and a key storage register.
The architecture uses similarities of encryption and decryption to provide a high level of performance while keeping the chip size small. The high performance is achieved in particular by keeping combinational paths balanced so that every clock cycle is fully utilized. The fact that the combinational paths are short compared to other published AES architectures makes the presented architecture a favorable choice for low-power applications. This is because glitches, which occur more frequently in long combinational paths than in short ones, cause significant power consumption.
Besides the small area requirements and the high performance, the presented architecture has another important property: It is highly regular. This helps to keep the size of the AES architecture small during place-and-route of a semi-custom design flow and facilitates the creation of full-custom designs. Full-custom approaches are particularly interesting for smart card implementations that are required to provide protection against power analysis attacks [13]. In a full-custom approach, the designer can balance the capacitive loads of differential nodes well, as is desired, for example, for logic styles like the one described in [14].
Another very important property of the presented
architecture is its scalability. The performance of the
architecture can be increased gradually at the cost of an
increased chip size. Furthermore, the key size can easily be
changed from 128 to 192 or 256 bits. However, the overall
architecture does not change for versions with different
performance and key sizes.
Section 2 gives a brief overview of the AES algorithm. In Section 3, the AES hardware architecture and the corresponding implementation options are described. The performance of the architecture is summarized and compared with other AES hardware implementations in Section 4. Concluding remarks can be found in Section 5.

IEEE TRANSACTIONS ON COMPUTERS, VOL. 52, NO. 4, APRIL 2003 483
The authors are with the Institute for Applied Information Processing and Communications (IAIK), Graz University of Technology, Inffeldgasse 16a, A-8010 Graz, Austria. E-mail: {stefan.mangard, manfred.aigner, sandra.dominikus}@iaik.at.
Manuscript received 15 June 2002; revised 2 Dec. 2002; accepted 2 Dec. 2002.
For information on obtaining reprints of this article, please send e-mail to: tc@computer.org, and reference IEEECS Log Number 117871.
1. National Institute of Standards and Technology.
0018-9340/03/$17.00 © 2003 IEEE Published by the IEEE Computer Society

2 AES ALGORITHM
The AES is a round-based, symmetric block cipher. It is
defined for a block size of 128 bits and key lengths of 128,
192, and 256 bits. According to the key length, these
variants of the AES are called AES-128, AES-192, and
AES-256. This article mainly focuses on implementing the
AES-128, which is the most commonly used AES variant.
However, the presented architecture can also be used for
the other standardized key sizes.
The following subsection describes the AES transformations, which are the building blocks of AES encryptions and decryptions. In Section 2.2, the AES-128 key expansion is discussed.
2.1 AES Transformations
The AES takes a 128-bit data block as input and performs
several different transformations on this block. In case of an
encryption, the input block of the AES is called plaintext
and the returned block is called ciphertext. All intermediate
results of this block, as well as the input and the output
block, are called states. For a discussion of the different
transformations, executed on the 128-bit states in an AES
encryption or decryption, it is best to picture a state as a
4-by-4 matrix of bytes (see Fig. 1). A 128-bit input/output
block of the AES is mapped to an AES state by putting the
first byte of the block in the upper left corner of the matrix
and by filling in the remaining bytes column by column.
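The column-by-column mapping described above can be illustrated with a short Python sketch. The function names `block_to_state` and `state_to_block` are hypothetical and chosen only for this illustration; the state is modeled as `state[row][col]`.

```python
def block_to_state(block):
    """Map a 16-byte block to a 4x4 state, filling the matrix column by column."""
    if len(block) != 16:
        raise ValueError("an AES block has exactly 16 bytes")
    # Byte 0 goes to the upper left corner; bytes 0..3 form column 0.
    return [[block[4 * col + row] for col in range(4)] for row in range(4)]


def state_to_block(state):
    """Inverse mapping: read the state column by column."""
    return bytes(state[row][col] for col in range(4) for row in range(4))


block = bytes(range(16))
state = block_to_state(block)
```

With this convention, the first row of the state holds block bytes 0, 4, 8, and 12, and each column of the state corresponds to one 32-bit word of the block.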
AES encryptions and decryptions are based on four
different transformations that are performed repeatedly in a
certain sequence. Each of these transformations, which are
described in the following, maps a 128-bit input state to a
128-bit output state.
• SubBytes: The SubBytes transformation is a nonlinear substitution operation that works on bytes. Each byte of the input state is replaced using the same substitution function (called S-Box).
The S-Box is defined as the multiplicative inverse in the Galois field $GF(2^8)$ with the irreducible polynomial $m(x) = x^8 + x^4 + x^3 + x + 1$, followed by an affine transformation. The InvSubBytes transformation, which is needed for decryption, is the inverse of the affine transformation followed by the same inversion as in the SubBytes transformation.
• ShiftRows: The ShiftRows transformation rotates each row of the input state to the left, whereby the offset of the rotation corresponds to the row number. For example, row one (the row consisting of the elements $D_{1,0}$, $D_{1,1}$, $D_{1,2}$, and $D_{1,3}$) is rotated by one position to the left. The inverse of this transformation is computed by performing the corresponding rotations to the right.
• MixColumns: The MixColumns transformation maps each column of the input state to a new column in the output state. Each input column is considered as a polynomial over $GF(2^8)$ and multiplied with the constant polynomial $a(x) = \{03\}x^3 + \{01\}x^2 + \{01\}x + \{02\}$ modulo $x^4 - 1$. The coefficients of $a(x)$ are also elements of $GF(2^8)$ and are represented by hexadecimal values in this equation. The InvMixColumns transformation is the multiplication of each column with $a^{-1}(x) = \{0B\}x^3 + \{0D\}x^2 + \{09\}x + \{0E\}$ modulo $x^4 - 1$.
• AddRoundKey: The AddRoundKey transformation is self-inverting. It maps a 128-bit input state to a 128-bit output state by XORing the input state with a 128-bit round key.
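The four transformations listed above can be sketched in software as follows. This is a minimal Python model for illustration only, not the hardware implementation of the article: the S-Box is computed as inversion plus affine transformation, MixColumns is written as a polynomial product modulo $x^4 - 1$, and all function names are assumptions introduced here. The rotate-and-XOR form of the affine transformation and its constants 0x63 and 0x05 are taken from the AES standard [2], not from the text above. The state is modeled as `state[row][col]`.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo m(x) = x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B  # 0x1B encodes x^4 + x^3 + x + 1
        b >>= 1
    return result


def gf_inv(a):
    """Multiplicative inverse via a^254; the input 0 is mapped to 0."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def rotl8(x, n):
    return ((x << n) | (x >> (8 - n))) & 0xFF


def sbox(x):
    """SubBytes for one byte: inversion followed by the affine transformation."""
    y = gf_inv(x)
    return y ^ rotl8(y, 1) ^ rotl8(y, 2) ^ rotl8(y, 3) ^ rotl8(y, 4) ^ 0x63


def inv_sbox(x):
    """InvSubBytes for one byte: inverse affine transformation, then inversion."""
    return gf_inv(rotl8(x, 1) ^ rotl8(x, 3) ^ rotl8(x, 6) ^ 0x05)


def shift_rows(state):
    """Rotate row r of state[row][col] to the left by r positions."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


A = [0x02, 0x01, 0x01, 0x03]      # a(x): coefficients of x^0 .. x^3
A_INV = [0x0E, 0x09, 0x0D, 0x0B]  # a^-1(x): coefficients of x^0 .. x^3


def poly_mul_mod(column, coeffs):
    """Multiply a column (row i is the coefficient of x^i) by coeffs modulo x^4 - 1."""
    out = [0, 0, 0, 0]
    for i in range(4):
        for j in range(4):
            out[(i + j) % 4] ^= gf_mul(coeffs[i], column[j])
    return out


def mix_columns(state, coeffs=A):
    columns = [poly_mul_mod([state[r][c] for r in range(4)], coeffs) for c in range(4)]
    return [[columns[c][r] for c in range(4)] for r in range(4)]


def add_round_key(state, round_key):
    return [[s ^ k for s, k in zip(s_row, k_row)] for s_row, k_row in zip(state, round_key)]
```

Because addition and subtraction coincide in fields of characteristic 2, reducing modulo $x^4 - 1$ is the same as reducing modulo $x^4 + 1$, which is why the exponent in `poly_mul_mod` simply wraps around with `% 4`.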
These transformations are applied to a 128-bit input block in a certain sequence to perform an AES encryption or decryption. In both cases, the transformations are grouped into so-called rounds. There are three different types of rounds, namely, the initial round, the normal round, and the final round. The transformations of the different rounds and the sequence of the rounds are shown in Fig. 2. The rounds are slightly different for encryption and decryption and the number of rounds, Nr, depends on the key size.
Fig. 1. Alignment of an AES state.
Fig. 2. Sequence of the execution of the four different transformations used in an AES encryption/decryption.

The presented decryption algorithm is called Inverse Cipher. Compared to the encryption algorithm, it is simply the execution of the inverse transformations in reversed order. Alternatively, the so-called Equivalent Inverse Cipher can be used for decryption. However, for the presented AES hardware architecture, the Inverse Cipher is more suitable.
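The round sequence (initial round, normal rounds, final round) and the Inverse Cipher can be sketched as a compact software model of AES-128. This is an illustrative Python sketch, not the presented hardware: the state is a flat list with index `4 * col + row`, the round ordering follows the AES standard [2], and all function names are assumptions made for this example.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        a = ((a << 1) ^ (0x1B if a & 0x80 else 0)) & 0xFF
        b >>= 1
    return r

def rotl8(x, n):
    return ((x << n) | (x >> (8 - n))) & 0xFF

def make_sbox():
    table = []
    for x in range(256):
        inv = 1
        for _ in range(254):
            inv = gf_mul(inv, x)  # x^254 is the inverse; stays 0 for x = 0
        table.append(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63)
    return table

SBOX = make_sbox()
INV_SBOX = [SBOX.index(x) for x in range(256)]
MIX, INV_MIX = (0x02, 0x03, 0x01, 0x01), (0x0E, 0x0B, 0x0D, 0x09)

def expand_key_128(key):
    """Return the eleven 16-byte round keys as lists of integers."""
    w = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    rc = 1
    for i in range(4, 44):
        temp = list(w[i - 1])
        if i % 4 == 0:
            temp = [SBOX[b] for b in temp[1:] + temp[:1]]  # RotWord, then SubWord
            temp[0] ^= rc
            rc = gf_mul(rc, 2)
        w.append([a ^ b for a, b in zip(w[i - 4], temp)])
    return [sum(w[4 * r:4 * r + 4], []) for r in range(11)]

def sub_bytes(s, box):
    return [box[b] for b in s]

def shift_rows(s, inverse=False):
    sign = -1 if inverse else 1
    return [s[4 * ((c + sign * r) % 4) + r] for c in range(4) for r in range(4)]

def mix_columns(s, k):
    out = []
    for c in range(4):
        col = s[4 * c:4 * c + 4]
        for r in range(4):
            out.append(gf_mul(k[0], col[r]) ^ gf_mul(k[1], col[(r + 1) % 4])
                       ^ gf_mul(k[2], col[(r + 2) % 4]) ^ gf_mul(k[3], col[(r + 3) % 4]))
    return out

def add_round_key(s, rk):
    return [a ^ b for a, b in zip(s, rk)]

def encrypt_block(block, key):
    rk = expand_key_128(key)
    s = add_round_key(list(block), rk[0])        # initial round
    for rnd in range(1, 10):                     # nine normal rounds
        s = mix_columns(shift_rows(sub_bytes(s, SBOX)), MIX)
        s = add_round_key(s, rk[rnd])
    s = shift_rows(sub_bytes(s, SBOX))           # final round: no MixColumns
    return bytes(add_round_key(s, rk[10]))

def decrypt_block(block, key):
    """Inverse Cipher: inverse transformations in reversed order."""
    rk = expand_key_128(key)
    s = add_round_key(list(block), rk[10])
    for rnd in range(9, 0, -1):
        s = sub_bytes(shift_rows(s, inverse=True), INV_SBOX)
        s = mix_columns(add_round_key(s, rk[rnd]), INV_MIX)
    s = sub_bytes(shift_rows(s, inverse=True), INV_SBOX)
    return bytes(add_round_key(s, rk[0]))
```

Note that in `decrypt_block` the round keys are consumed in reversed order, which is the behavior the key unit has to provide for decryptions.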
2.2 AES-128 Key Expansion
For an AES-128 encryption, the 128-bit cipher key needs to be expanded to eleven 128-bit round keys. The principal idea of this key expansion is that the first round key, $\mathit{RoundKey}_0$, corresponds to the cipher key. All subsequent round keys are derived from their respective predecessor using a function $f$. So, $\mathit{RoundKey}_i = f(\mathit{RoundKey}_{i-1})$ for all $0 < i < 11$.
For an AES-128 decryption, the same round keys are used in reversed order. Using the inverse of the key expansion function, $f^{-1}$, the round keys can be derived recursively from $\mathit{RoundKey}_{10}$.
In Fig. 3, pseudocode for the AES-128 key expansion is shown. This pseudocode is based on 32-bit key words, so the eleven 128-bit round keys are stored one after the other in the word array W[0..43]. The RotWord function, used in the pseudocode, rotates the input word by one byte to the left. The SubWord function applies the S-Box function to each byte of the input word. The RC values, finally, are the powers $x^{i-1}$ of $x$ in the same Galois field $GF(2^8)$ as used for the S-Box transformation.
Fig. 4 shows how the word array W[0..43] is mapped
to the corresponding round keys. The key expansions for
the AES-192 and for the AES-256 are very similar and
described in detail in [2].
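Since the pseudocode of Fig. 3 is not reproduced here, the following Python sketch illustrates the key expansion in the round-key-to-round-key form $f$ and $f^{-1}$ described in this section. It follows the AES-128 key schedule of the standard [2]; the function names (`f`, `f_inv`, `g`, `round_constant`) are labels chosen for this example and are not taken from the article's figure.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        a = ((a << 1) ^ (0x1B if a & 0x80 else 0)) & 0xFF
        b >>= 1
    return r


def rotl8(x, n):
    return ((x << n) | (x >> (8 - n))) & 0xFF


def sbox(x):
    inv = 1
    for _ in range(254):
        inv = gf_mul(inv, x)  # x^254 is the inverse; stays 0 for x = 0
    return inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63


SBOX = [sbox(x) for x in range(256)]


def round_constant(i):
    """RC value for round i: the power x^(i-1) in GF(2^8)."""
    rc = 1
    for _ in range(i - 1):
        rc = gf_mul(rc, 0x02)
    return rc


def g(word, i):
    """RotWord, SubWord, and addition of the round constant."""
    rotated = word[1:] + word[:1]
    substituted = [SBOX[b] for b in rotated]
    substituted[0] ^= round_constant(i)
    return substituted


def xor_words(a, b):
    return [x ^ y for x, y in zip(a, b)]


def f(round_key, i):
    """Derive RoundKey_i from RoundKey_(i-1)."""
    w = [list(round_key[4 * k:4 * k + 4]) for k in range(4)]
    out = [xor_words(w[0], g(w[3], i))]
    for k in range(1, 4):
        out.append(xor_words(w[k], out[k - 1]))
    return bytes(b for word in out for b in word)


def f_inv(round_key, i):
    """Derive RoundKey_(i-1) from RoundKey_i (used for decryption)."""
    w = [list(round_key[4 * k:4 * k + 4]) for k in range(4)]
    prev = [None] * 4
    for k in (3, 2, 1):
        prev[k] = xor_words(w[k], w[k - 1])
    prev[0] = xor_words(w[0], g(prev[3], i))
    return bytes(b for word in prev for b in word)


def expand_key_128(cipher_key):
    """Return RoundKey_0 .. RoundKey_10; concatenated, they form W[0..43]."""
    keys = [bytes(cipher_key)]
    for i in range(1, 11):
        keys.append(f(keys[-1], i))
    return keys
```

The sketch makes visible why the S-Boxes are needed for the key expansion: every application of `f` or `f_inv` performs one SubWord, that is, four S-Box lookups.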
3 AES HARDWARE ARCHITECTURE
The AES hardware architecture presented in this article is very modular and provides a high level of scalability. While the standard version of the architecture is suited for smart cards, USB dongles, and similar devices, the high-performance version provides enough throughput to be used as an acceleration module in high-end servers. It is important to point out that, in both versions, the overall structure of the architecture remains the same, even for different key sizes.
This overall structure of the architecture, which is
capable of performing AES encryptions and decryptions,
is shown in Fig. 5. The AES hardware module consists of
the following four components:
• Interface: The interface handles all communication of the AES module with its environment. It communicates based on 32-bit words with the other components of the AES module and via an AMBA APB bus with the environment of the module.
• Data Unit: The data unit is the main module of the architecture. It can perform any kind of AES encryption or decryption round using the round key that is assigned to its key input. Although the number of rounds is different for the three standardized key sizes, the types of rounds that are executed are always the same. Consequently, the data unit is independent of the key size.
The data unit has a highly regular structure, as indicated in Fig. 5. It consists of 16 instances of a so-called data cell and a certain number of S-Boxes. The more S-Boxes are used, the higher the performance of the AES module. The standard version of the data unit has four S-Boxes and is described in detail in Section 3.1. A high-performance version with 16 S-Boxes is presented in Section 3.2. In principle, it is also possible to implement a data unit with eight S-Boxes. This version can easily be derived from the description of the other two versions and is not presented separately.
• Key Unit: The key unit serves two main purposes: the storage of cipher keys and the calculation of the round keys. To save die size, the S-Boxes of the data unit are reused to perform the key expansion. In the presented architecture, this reuse is possible for any key size without loss of performance.
Since 128 bits is currently the most commonly used key size, a key unit capable of performing the 128-bit key expansion is described in detail in this article (see Section 3.3). The overall structure of the AES module, however, allows the usage of key modules supporting multiple key sizes in parallel or any of the standardized key sizes on its own.

Fig. 3. Pseudocode for the AES-128 key expansion.
Fig. 4. Mapping of the key words to round keys.
Fig. 5. Overall structure of the AES module.

• CBC Unit: An AES module just consisting of a key unit, a data unit, and an interface can already perform the AES algorithm in ECB (Electronic Code Book) mode. However, because there exist certain attacks (e.g., reordering of blocks) against this mode, usually other modes of operation [15] are used. The most popular one is the CBC (Cipher Block Chaining) mode, where the result of an AES encryption is XORed with the next 128-bit input block before that block is encrypted. This procedure needs to be reversed when performing a decryption. The CBC unit of the AES module implements the CBC mode without any negative influence on the overall performance of the AES module.
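The chaining performed by the CBC unit can be sketched independently of the block cipher itself. In the following Python sketch, `toy_encrypt` and `toy_decrypt` are deliberately trivial stand-ins and not AES; they only keep the example self-contained. The initial chaining value `iv` is an assumption of this sketch, since the text above does not describe how the CBC register is initialized.

```python
def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def cbc_encrypt(blocks, iv, encrypt_block):
    """XOR each input block with the previous ciphertext block, then encrypt."""
    chain = iv
    out = []
    for block in blocks:
        chain = encrypt_block(xor_bytes(block, chain))
        out.append(chain)
    return out


def cbc_decrypt(blocks, iv, decrypt_block):
    """Reverse procedure: decrypt, then XOR with the previous ciphertext block."""
    chain = iv
    out = []
    for block in blocks:
        out.append(xor_bytes(decrypt_block(block), chain))
        chain = block
    return out


# Hypothetical stand-in for a 128-bit block cipher (NOT AES, NOT secure).
def toy_encrypt(block):
    return bytes((b + 1) % 256 for b in reversed(block))


def toy_decrypt(block):
    return bytes((b - 1) % 256 for b in reversed(block))


iv = bytes(16)
plaintext = [b"A" * 16, b"A" * 16, b"B" * 16]
ciphertext = cbc_encrypt(plaintext, iv, toy_encrypt)
recovered = cbc_decrypt(ciphertext, iv, toy_decrypt)
```

Even with the toy cipher, the two identical plaintext blocks yield different ciphertext blocks, which is the property that distinguishes CBC from ECB.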
In the presented architecture, a 128-bit block of data is
encrypted as follows: First, a cipher key needs to be loaded
via the interface into the key unit. Once a key is loaded, it
can be used for an arbitrary number of encryptions and
decryptions. After loading the cipher key, the first 128-bit
block of data is transferred via the interface and the CBC
unit into the data unit. The data unit then iteratively
performs the number of AES rounds that are required for
the used key size.
In each round, the key unit provides the corresponding round key to the data unit. To calculate these round keys, the key unit uses the S-Boxes of the data unit during a clock cycle in which they are not used by the data unit. After the calculation of the AES rounds, the encryption result is passed in 32-bit words to the interface via the CBC unit. Decryptions are computed in a very similar way. In this case, the data unit performs the inverse AES transformations in reversed order and the key unit likewise provides the round keys in reversed order.
The remainder of this section presents the details of the
standard data unit and those of the high-performance data
unit. Additionally, an AES-128 key unit that can be used
with both data units is described.
3.1 Standard Data Unit
The data unit is the biggest and the most important component of the AES architecture. It stores the current 128-bit state (see Fig. 1) of an encryption or decryption and is capable of performing any number and type of encryption/decryption rounds on this state. Consequently, all four AES transformations (SubBytes, ShiftRows, MixColumns, and AddRoundKey) and the corresponding inverse transformations are implemented within the data unit. For the AddRoundKey transformation, a round key needs to be provided by the key unit.
Fig. 6 shows the standard version of the data unit. Its structure is highly regular and closely related to the definition of the AES state. The standard data unit consists of 16 so-called data cells and four S-Boxes. An S-Box of the architecture is a circuit capable of performing the S-Box and the inverse S-Box transformation for an 8-bit input. The data cells store eight bits per cell and perform all other AES transformations and the corresponding inverses, when connected appropriately. In full-custom designs, inputs and outputs of the data cells can be defined in a way that connection by abutment is possible when they are placed next to one another.
However, the regular design does not only facilitate full-custom designs. A regular circuit is also very desirable for FPGA and standard-cell synthesis. If one improves the synthesis results of a single data cell by special attributes for the synthesizer, the area saving is obtained 16 times, once for every data cell, and is therefore worth the effort.
Another distinguishing feature of the presented architecture is the fact that the combinational paths are relatively short and, what is even more important, very balanced. The commonly used approach to implement the AES in hardware is to store the 128-bit state in a register and to perform the AES transformations (except for the ShiftRows transformation) column by column. So, in order to perform a normal AES encryption round, first the ShiftRows transformation is done in one clock cycle. Then, the remaining transformations of an AES round are done column by column, whereby all transformations for one column are usually done within one clock cycle.
Fig. 6. Architecture of the standard data unit.

The problem of this approach is that the combinational path to perform a SubBytes, a MixColumns, and an AddRoundKey transformation in one clock cycle is very long. Additionally, the implementation of the ShiftRows transformation causes a significant wiring overhead. The data unit presented in this section solves both problems. It performs AES encryptions and decryptions in the following way:
In order to load a data block, the input data is shifted column by column from the right side (see Fig. 6) into the data cells. The inputs labeled "In" are connected via the CBC unit to the interface. The initial AddRoundKey transformation is done in the fourth clock cycle at the same time as the last column is loaded.
To compute a normal AES round, the registers are rotated vertically to perform the Inv-/SubBytes and the Inv-/ShiftRows transformation row by row. In the first clock cycle, the Inv-/SubBytes transformation starts for row three. Because the implementation of the S-Boxes is pipelined (the motivation is given in Section 3.1.1), the result of this Inv-/SubBytes transformation is stored in row zero (see Fig. 6) two clock cycles later. Using the pipelined S-Boxes and the barrel shifter between row zero and row one of the registers, the Inv-/SubBytes and the Inv-/ShiftRows transformations can be applied to all 16 bytes of the state within five clock cycles.
In the sixth clock cycle of a normal AES round, the Inv-/MixColumns and the AddRoundKey transformations are performed by all data cells in parallel. Since the S-Boxes are not used by the data unit during the sixth clock cycle, they can be utilized by the key unit to perform the key expansion for the next round key. In order to compute the final round of an encryption or decryption, the Inv-/MixColumns transformation is omitted by the data cells in this clock cycle.
In this way, the required number of encryption or
decryption rounds can be executed by the data unit and the
key unit until the 128-bit result is finally stored in the
registers of the data unit. This result is then shifted column
by column to the left (to the interface of the AES module).
At the same time, a new input state can be loaded.
Using the standard data unit, the minimal number of
clock cycles that are required to perform an AES-128
encryption or decryption is 64. Four clock cycles are
required for the I/O of the data unit, 54 clock cycles are
required to perform the nine normal AES rounds, and six
are required for the final round.
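The cycle count above can be checked with a few lines of Python. The function names and the clock frequency parameter are assumptions of this sketch. The 64 MHz value is inferred from the 128 Mbits/sec figure quoted in the introduction together with the 64 cycles per block, and the extension to 12 and 14 rounds (AES-192 and AES-256, per the standard [2]) assumes that the same six-cycle round schedule applies, which the text suggests but does not state as a number.

```python
def cycles_per_block(rounds, io_cycles=4, cycles_per_round=6):
    """Clock cycles for one block: I/O plus six cycles for each normal or final round."""
    return io_cycles + cycles_per_round * rounds


def throughput_mbits(clock_mhz, rounds=10, block_bits=128):
    """Throughput in Mbits/sec when one block is processed at a time."""
    return block_bits * clock_mhz / cycles_per_block(rounds)


aes128_cycles = cycles_per_block(10)   # nine normal rounds plus the final round
aes128_rate = throughput_mbits(64.0)   # assumed 64 MHz clock
```

Under these assumptions, `cycles_per_block(12)` and `cycles_per_block(14)` evaluate to 76 and 88 cycles; these two values are an extrapolation and not figures reported in the text.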
The following two subsections present the architecture of
the S-Boxes and the data cells.
3.1.1 S-Boxes
In hardware implementations, the SubBytes transformation
and its inverse are the most expensive AES transformations.
This is why the standard data unit does not contain as many
S-Boxes as data cells.
In principle, there are two ways for implementing an S-Box in hardware that can be used for the SubBytes transformation and its inverse. It can either be implemented as ROM lookup or it can be calculated with combinational logic. The straightforward way to implement an S-Box is to store all possible output values for the S-Box function and its inverse in a ROM. However, this requires a small ROM with 512 bytes, where the overhead for address decoding and output signal conditioning outweighs the area requirements of the ROM matrix.
Alternatively, just the result of the inversion in $GF(2^8)$ could be stored in a 256-byte ROM and the affine transformation and its inverse could be calculated with combinational logic. This approach would only need half the ROM size of the first approach, but it would have an even worse overhead-to-matrix ratio.
The best way to implement an S-Box is to use combinational logic for the affine transformation, for its inverse, and also for the computation of the inverse in $GF(2^8)$. This approach was first proposed by Rijmen in [16] and used by Rudra et al. in [11]. Implementations of S-Boxes that are particularly interesting for the presented architecture have been proposed by Satoh et al. in [12] and by Wolkerstorfer et al. in [17].
For the presented AES module, a pipelined (one stage) implementation of the S-Box as described in [17] is used. The main idea of this implementation is to build an efficient combinational circuit for the S-Box, which is based on the fact that $GF(2^8)$ can be seen as a quadratic extension of the field $GF(2^4)$. A pipelined version of the S-Box is used to ensure that the combinational paths in the architecture are balanced (i.e., the paths of the S-Boxes and those of a MixColumns-and-AddRoundKey step are roughly the same).
3.1.2 Data Cells
The design of the data cells is crucial for the overall
architecture of the data unit. The data cells serve as storage
elements of the AES state and perform the Inv-/MixColumns
and the AddRoundKey transformation. Each data cell
consists of the following components:
• Eight flip-flops: Each data cell stores one byte of the current AES state (see Figs. 1 and 6).
• One Multiplier: The MixColumns transformation maps one column of the input state to a new column in the output state. The multiplier that is a part of each data cell computes one output byte of the MixColumns transformation based on a four-byte input. This multiplier considers its four-byte input as a polynomial over $GF(2^8)$ and is capable of performing a multiplication of the input with the constant polynomial $a(x) = \{03\}x^3 + \{01\}x^2 + \{01\}x + \{02\}$ and with its inverse, $a^{-1}(x)$, modulo $x^4 - 1$.
The inputs of each multiplier are connected to the outputs of the four data cells that are in the same column as the multiplier itself (see Fig. 6). However, due to the definition of the MixColumns and the InvMixColumns transformation, the input connections are different in each row. The multipliers of the architecture are designed in a way that there is a maximum reuse of components between the multiplication with $a(x)$ and the one with $a^{-1}(x)$. A detailed description of this multiplier architecture can be found in [18].
• Eight XOR Gates: The AddRoundKey transformation is performed in parallel in the presented architecture. Consequently, eight XOR gates are required in each data cell.
• Input Selection: The data cells support unidirectional vertical and horizontal shifting. Consequently, each data cell contains a multiplexer to select which input is loaded into the data cell.
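The role of the per-cell multiplier and the row-dependent input connections can be modeled in a few lines of Python. This is a behavioral sketch under stated assumptions, not the multiplier architecture of [18]: the names `cell_multiplier`, `wire_inputs`, and `cell_encrypt_step` are hypothetical, every cell uses the same fixed coefficients, and only the wiring of the four column bytes differs per row. The combined MixColumns-and-AddRoundKey step is modeled for the encryption direction only.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        a = ((a << 1) ^ (0x1B if a & 0x80 else 0)) & 0xFF
        b >>= 1
    return r


# Coefficients seen by a cell for: its own byte, then the bytes 1, 2, 3 rows below (cyclic).
MIX_COEFFS = (0x02, 0x03, 0x01, 0x01)
INV_MIX_COEFFS = (0x0E, 0x0B, 0x0D, 0x09)


def wire_inputs(column, row):
    """Row-dependent input connection: rotate the column so the cell's own byte comes first."""
    return [column[(row + k) % 4] for k in range(4)]


def cell_multiplier(inputs, inverse=False):
    """Compute one output byte of MixColumns or InvMixColumns from four input bytes."""
    coeffs = INV_MIX_COEFFS if inverse else MIX_COEFFS
    out = 0
    for coeff, byte in zip(coeffs, inputs):
        out ^= gf_mul(coeff, byte)
    return out


def cell_encrypt_step(column, row, key_byte, final_round=False):
    """New register value of one data cell in the MixColumns-and-AddRoundKey cycle."""
    if final_round:
        value = column[row]  # MixColumns is omitted in the final round
    else:
        value = cell_multiplier(wire_inputs(column, row))
    return value ^ key_byte  # eight XOR gates: one key byte per cell


column = [0xDB, 0x13, 0x53, 0x45]
mixed = [cell_multiplier(wire_inputs(column, r)) for r in range(4)]
restored = [cell_multiplier(wire_inputs(mixed, r), inverse=True) for r in range(4)]
```

Since all four cells of a column evaluate their output from the same four register values at once, a whole column, and with 16 cells the whole state, is updated in a single clock cycle in this model.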
References
Differential Power Analysis (book chapter). Examines specific methods for analyzing power consumption measurements to find secret keys from tamper-resistant devices, and discusses approaches for building cryptosystems that can operate securely in existing hardware that leaks information.
The Design of Rijndael (book). The authoritative guide to the Rijndael algorithm and AES; a source of information and reference for professionals, researchers, and students active or interested in data encryption.
A Compact Rijndael Hardware Architecture with S-Box Optimization (book chapter). Describes compact and high-speed hardware architectures and logic optimization methods for the AES algorithm Rijndael, including a new composite field and an optimized S-Box structure.
A dynamic and differential CMOS logic with signal independent power consumption to withstand differential power analysis on smart cards (proceedings article). Builds a set of logic gates and flip-flops needed for cryptographic functions and compares them to static complementary CMOS implementations, with the aim of protecting security devices such as smart cards against power attacks.
An ASIC Implementation of the AES SBoxes (book chapter). Presents a hardware implementation of the S-Boxes from the Advanced Encryption Standard (AES) and shows that a calculation of this function and its inverse can be done efficiently with combinational logic.