Showing papers on "Reconfigurable computing" published in 2003


Proceedings ArticleDOI
09 Apr 2003
TL;DR: A module has been implemented in Field Programmable Gate Array (FPGA) hardware that scans the content of Internet packets at Gigabits/second rates and automatically generates the Finite State Machines (FSMs) to search for regular expressions.
Abstract: A module has been implemented in Field Programmable Gate Array (FPGA) hardware that scans the content of Internet packets at Gigabits/second rates. All of the packet processing operations are performed using reconfigurable hardware within a single Xilinx Virtex XCV2000E FPGA. A set of layered protocol wrappers is used to parse the headers and payloads of packets for Internet protocol data. A content matching server automatically generates the Finite State Machines (FSMs) to search for regular expressions. The complete system is operated on the Field-programmable Port Extender (FPX) platform.

286 citations


Book ChapterDOI
08 Sep 2003
TL;DR: This paper presents a rigorous study of the possible implementation schemes, proposes heuristics to evaluate hardware efficiency at different steps of the design process, and defines an optimal pipeline that takes the place and route constraints into account.
Abstract: Performance evaluation of the Advanced Encryption Standard candidates has led to intensive study of both hardware and software implementations. However, although plentiful papers present various implementation results, it seems that efficiency could still be greatly improved by applying good design rules adapted to devices and algorithms. This paper addresses various approaches for efficient FPGA implementations of the Advanced Encryption Standard algorithm. As different applications of the AES algorithm may require different speed/area tradeoffs, we propose a rigorous study of the possible implementation schemes, but also discuss design methodology and algorithmic optimization in order to improve previously reported results. We propose heuristics to evaluate hardware efficiency at different steps of the design process. We also define an optimal pipeline that takes the place and route constraints into account. Resulting circuits significantly improve previously reported results: throughput is up to 18.5 Gbits/sec and area requirements can be limited to 542 slices and 10 RAM blocks with a ratio throughput/area improved by at least 25% of the best-known designs in the Xilinx Virtex-E technology.

192 citations


Proceedings ArticleDOI
Tim Tuan, Bo-Cheng Lai
03 Dec 2003
TL;DR: This work analyzes the leakage power of a low-cost, 90 nm FPGA using detailed device-level simulations, and identifies promising approaches for FPGA leakage power reduction.
Abstract: Reconfigurable architectures, including FPGAs, are promising solutions for managing increasing design complexity while achieving both performance and flexibility. To support reconfiguration, FPGAs use more transistors per function than fixed-logic solutions, resulting in higher leakage power consumption. Consequently, FPGAs are generally not found in mobile applications. In this work, we analyze the leakage power of a low-cost, 90 nm FPGA using detailed device-level simulations. The simulation methodology accounts for design-dependent variations and provides detailed leakage power breakdowns. The analysis quantifies the leakage power challenge in FPGAs, and identifies promising approaches for FPGA leakage power reduction.

172 citations


Journal ArticleDOI
01 Feb 2003
TL;DR: It is found that in FPGAs with more than 20 K four-input look-up tables, the reduction in channel width, interconnect delay and power dissipation can be over 50% by 3-D implementation.
Abstract: In this paper, analytical models for predicting interconnect requirements in field-programmable gate arrays (FPGAs) are presented, and opportunities for three-dimensional (3-D) implementation of FPGAs are examined. The analytical models for two-dimensional FPGAs are calibrated by routing and placement experiments with benchmark circuits and extended to 3-D FPGAs. Based on system-level modeling, we find that in FPGAs with more than 20 K four-input look-up tables, the reduction in channel width, interconnect delay and power dissipation can be over 50% by 3-D implementation.

157 citations


Proceedings ArticleDOI
03 Mar 2003
TL;DR: The general scope of the research is presented, and the communication scheme, the design environment and the hardware/software context switching issues are detailed; the infrastructure proved its feasibility by allowing the authors to design a relocatable video decoder.
Abstract: The ability to (re)schedule a task either in hardware or software will be an important asset in reconfigurable systems-on-chip. To support this feature we have developed an infrastructure that, combined with a suitable design environment, permits the implementation and management of hardware/software relocatable tasks. This paper presents the general scope of our research, and details the communication scheme, the design environment and the hardware/software context switching issues. The infrastructure proved its feasibility by allowing us to design a relocatable video decoder. When implemented on an embedded platform, the decoder performs at 23 frames/s (320×240 pixels, 16 bits per pixel) in reconfigurable hardware and 6 frames/s in software.

157 citations


Proceedings ArticleDOI
02 Jun 2003
TL;DR: The vision of some of the key changes that will emerge in the design of complex Systems-on-a-Chip for nanometer-scale semiconductor technologies and their impact on design automation requirements, from the perspective of a broad range SoC supplier is presented.
Abstract: In this paper, we analyze the emerging trends in the design of complex Systems-on-a-Chip for nanometer-scale semiconductor technologies and their impact on design automation requirements, from the perspective of a broad range SoC supplier. We present our vision of some of the key changes that will emerge in the next five years. This vision is characterized by two major paradigm changes. The first is that SoC design will become divided into four mostly non-overlapping distinct abstraction levels. Very different competences and design automation tools will be needed at each level. The second paradigm change is the emergence of domain-specific S/W programmable SoC platforms consisting of large, heterogeneous sets of embedded processors. These will be complemented by embedded reconfigurable hardware and networks-on-chip. A key enabler for the effective use of these flexible SoC platforms is a high-level parallel programming model supporting automatic specification-to-platform mapping.

150 citations


Proceedings ArticleDOI
03 Mar 2003
TL;DR: A hardware and software infrastructure is reported that enables an FPGA to dynamically reconfigure itself under the control of a soft microprocessor core that is instantiated on the same array.
Abstract: This paper presents a lightweight approach for embedded reconfiguration of Xilinx Virtex II™ series FPGAs. A hardware and software infrastructure is reported that enables an FPGA to dynamically reconfigure itself under the control of a soft microprocessor core that is instantiated on the same array. The system provides a highly integrated, lightweight approach to dynamic reconfiguration for embedded systems. It combines the benefits of intelligent control, fast reconfiguration and small overhead.

143 citations


Journal ArticleDOI
TL;DR: Single-assignment C is a C language variant designed to create an automated compilation path from an algorithmic programming language to an FPGA-based reconfigurable computing system.
Abstract: RC systems typically consist of an array of configurable computing elements. The computational granularity of these elements ranges from simple gates - as abstracted by FPGA lookup tables - to complete arithmetic-logic units with or without registers. A rich programmable interconnect completes the array. The RC system developer manually partitions an application into two segments: a hardware component in a hardware description language such as VHDL or Verilog that will execute as a circuit on the FPGA, and a software component that will execute as a program on the host. Single-assignment C is a C language variant designed to create an automated compilation path from an algorithmic programming language to an FPGA-based reconfigurable computing system.

142 citations


Proceedings ArticleDOI
22 Apr 2003
TL;DR: A novel approach for designing an operating system for reconfigurable systems (OS4RS) is presented, which will ease application development by shielding the programmer from the complexity of the system and by providing a clear application development API.
Abstract: The emerging need for large amounts of flexible computational power on embedded devices motivated many researchers to incorporate reconfigurable logic together with an instruction-set-processor (ISP) into their architectures. This implies that tomorrow's applications will make use of both the ISP and the reconfigurable logic in order to provide the user with maximum performance. Today, however, a few stumbling blocks prevent these kinds of heterogeneous architectures from becoming mainstream. The technology still lacks a form of run-time management infrastructure. This infrastructure should ease application development by shielding the programmer from the complexity of the system and by providing a clear application development API. This paper presents a novel approach for designing an operating system for reconfigurable systems (OS4RS). Creating such an operating system is an integral part of our ongoing research regarding reconfigurable computing. An initial version of our operating system was used to manage a reconfigurable systems demonstrator.

128 citations


Journal ArticleDOI
27 Oct 2003
TL;DR: A new architecture for embedded reconfigurable computing, based on a very-long instruction word (VLIW) processor enhanced with an additional run-time configurable datapath, leading to an improvement in both timing performance and power consumption.
Abstract: This paper describes a new architecture for embedded reconfigurable computing, based on a very-long instruction word (VLIW) processor enhanced with an additional run-time configurable datapath. The reconfigurable unit is tightly coupled with the processor, featuring an application-specific instruction-set extension. Mapping computation intensive algorithmic portions on the reconfigurable unit allows a more efficient elaboration, thus leading to an improvement in both timing performance and power consumption. A test chip has been implemented in a standard 0.18-µm CMOS technology. The test of a signal processing algorithmic benchmark showed speedups ranging from 4.3× to 13.5× and energy consumption reduced up to 92%.

127 citations


Book ChapterDOI
17 Mar 2003
TL;DR: A new virtual reconfigurable circuit, whose granularity and configuration schema exactly fit to requirements of a given application, is designed on the top of an ordinary FPGA.
Abstract: The paper introduces a new method for the design of real-world applications of evolvable hardware using common FPGAs (Field Programmable Gate Arrays). In order to avoid "reconfiguration problems" of current FPGAs a new virtual reconfigurable circuit, whose granularity and configuration schema exactly fit to requirements of a given application, is designed on the top of an ordinary FPGA. As an example, a virtual reconfigurable circuit is constructed to speed up the software model, which was utilized for the evolutionary design of image operators.

Journal ArticleDOI
TL;DR: This paper presents an overview of hardware implementations for the two commonly used types of public key cryptography, i.e. RSA and elliptic curve cryptography, both based on modular arithmetic.

Proceedings ArticleDOI
09 Apr 2003
TL;DR: An SEU simulator based on the SLAAC-1V computing board has been developed and is being used to characterize the reliability of SEU mitigation techniques for FPGAs.
Abstract: FPGAs are an appealing solution for space-based remote sensing applications. However, in a low-Earth orbit, FPGAs (field programmable gate arrays) are susceptible to Single-Event Upsets (SEUs). In an effort to understand the effects of SEUs, an SEU simulator based on the SLAAC-1V computing board has been developed. This simulator artificially upsets the configuration memory of an FPGA and measures its impact on FPGA designs. The accuracy of this simulation environment has been verified using ground-based radiation testing. This simulation tool is being used to characterize the reliability of SEU mitigation techniques for FPGAs.

Book ChapterDOI
17 Mar 2003
TL;DR: The goal of the project is the development of a hardware platform capable of implementing systems inspired by all the three major axes (phylogenesis, ontogenesis, and epigenesis) of bio-inspiration, in digital hardware.
Abstract: It is clear to all, after a moment's thought, that nature has much we might be inspired by when designing our systems, for example: robustness, adaptability and complexity, to name a few. The implementation of bioinspired systems in hardware has however been limited, and more often than not been more a matter of artistry than engineering. The reasons for this are many, but one of the main problems has always been the lack of a universal platform, and of a proper methodology for the implementation of such systems. The ideas presented in this paper are early results of a new research project, "Reconfigurable POEtic Tissue". The goal of the project is the development of a hardware platform capable of implementing systems inspired by all the three major axes (phylogenesis, ontogenesis, and epigenesis) of bio-inspiration, in digital hardware.

Proceedings ArticleDOI
15 Dec 2003
TL;DR: Results show that the parallel implementation of 2-D FFT achieves virtually linear speed-up and real-time performance for large matrix sizes, and an FPGA-based parametrisable environment based on the developed parallel 2-D FFT architecture is presented as a solution for frequency-domain image filtering application.
Abstract: Applications based on Fast Fourier Transform (FFT) such as signal and image processing require high computational power, plus the ability to experiment with algorithms. Reconfigurable hardware devices in the form of Field Programmable Gate Arrays (FPGAs) have been proposed as a way of obtaining high performance at an economical price. At present, however, users must program FPGAs at a very low level and have a detailed knowledge of the architecture of the device being used. To try to reconcile the dual requirements of high performance and ease of development, this paper reports on the design and realisation of a High Level framework for the implementation of 1-D and 2-D FFTs for real-time applications. Results show that the parallel implementation of 2-D FFT achieves virtually linear speed-up and real-time performance for large matrix sizes. Finally, an FPGA-based parametrisable environment based on the developed parallel 2-D FFT architecture is presented as a solution for frequency-domain image filtering application.

Proceedings ArticleDOI
09 Jul 2003
TL;DR: A cellular hardware implementation of a spiking neural network with run-time reconfigurable connectivity is presented on a compact custom FPGA board, which provides a powerful reconfigurable hardware platform for hardware and software design.
Abstract: A cellular hardware implementation of a spiking neural network with run-time reconfigurable connectivity is presented. It is implemented on a compact custom FPGA board, which provides a powerful reconfigurable hardware platform for hardware and software design. Complementing the system, a CPU synthesized on the FPGA takes care of interfacing the network with the external world. The FPGA board and the hardware network are demonstrated in the form of a controller embedded on the Khepera robot for a task of obstacle avoidance. Finally, future implementations on new multi-cellular hardware are discussed.

01 Jan 2003
TL;DR: This paper proposes a multitasking environment that executes relocatable hardware tasks, uses a memory management unit translating task requests to internal and external memory accesses, and relies on device drivers and triggers to connect to external I/O.
Abstract: In this paper, we approach the rather new area of reconfigurable hardware operating systems in a top-down manner. First, we describe a design concept that defines basic abstractions and operating system services in a device-independent way. Then, we refine this model to an implementation concept on the Xilinx Virtex XCV-800 technology. The implementation concept proposes a multitasking environment that executes relocatable hardware tasks, uses a memory management unit translating task requests to internal and external memory accesses, and relies on device drivers and triggers to connect to external I/O. Finally, we present a prototypical implementation and an application case study. The application consists of a set of dynamically loaded and executed networking and multimedia tasks such as IP packet processing, AES decryption, and audio stream decoding.

Journal ArticleDOI
TL;DR: The paper gives an introduction to fine grain and coarse grain morphware, reconfigurable computing, and its impact on classical computer science and business models, and points out trends driven by microelectronics technology, EDA, and the mind set of data-stream-based computing.

Proceedings ArticleDOI
23 Feb 2003
TL;DR: It is shown that the intrinsic cost of traditional general-purpose FPGAs can be reduced if they are designed to target an application domain or a class of applications only, and a novel FPGA logic block architecture derived based on such an analysis, and which exploits properties of target applications, is presented.
Abstract: Although FPGAs are a cost-efficient alternative for both ASICs and general purpose processors, they still result in designs which are more than an order of magnitude more costly and slower than their equivalents implemented in dedicated logic. This efficiency gap makes FPGAs less suitable for high-volume cost-sensitive applications (e.g. embedded systems). We show that the intrinsic cost of traditional general-purpose FPGAs can be reduced if they are designed to target an application domain or a class of applications only. We propose a method of the application-domain characterization and apply it to characterize DSP. A novel FPGA logic block architecture derived based on such an analysis, and which exploits properties of target applications, is presented. Its key feature is the 'mixed-level granularity' being a trade-off between fine and coarse granularity required for the implementation of datapath and random logic functions, respectively. This leads to a factor of four improvement in the LUT memory size compared to commercial FPGAs, and, assuming a standard-cell implementation, a 1.6-2.8 times lower datapath mapping cost. A modified mixed-grain architecture with the ALU-like functionality reduces the LUT memory size by a factor of 16 compared to commercial FPGAs, and mapped onto standard cells has a 1.9-3.3 times higher datapath mapping efficiency. For these reasons, the proposed FPGA architectures may be an interesting alternative to the traditional general-purpose FPGA devices, especially if characteristics of a target application domain are known a priori.

Journal ArticleDOI
TL;DR: A system chip targeting image and voice processing and recognition application domains is implemented as a representative of the potential of using programmable logic in system design.
Abstract: A system chip targeting image and voice processing and recognition application domains is implemented as a representative of the potential of using programmable logic in system design. It features an embedded reconfigurable processor built by joining a configurable and extensible processor core and an SRAM-based embedded field-programmable gate array (FPGA). Application-specific bus-mapped coprocessors and flexible input/output peripherals and interfaces can also be added and dynamically modified by reconfiguring the embedded FPGA. The architecture of the system is discussed as well as the design flows for pre- and post-silicon design and customization. The silicon area required by the system is 20 mm² in a 0.18-µm CMOS technology. The embedded FPGA accounts for about 40% of the system area.

Proceedings ArticleDOI
09 Jun 2003
TL;DR: A versatile partial self-reconfiguration framework for FPGA field updates that customizes to specific applications, reduces reconfiguration times, and minimizes the need for external hardware is proposed.
Abstract: Field programmable gate arrays (FPGAs) provide an attractive solution to developers needing custom logic for short time-to-market products. Products embedding FPGA system-on-chip solutions have the advantage in that they can be updated once deployed. New FPGA firmware may be loaded via manufacturer-supplied memory devices or remotely via a network connection. Recent FPGAs allow for self-reconfiguration, where the user-FPGA fabric can internally modify its own configuration data. Using self-reconfiguration, configuration control protocols can be implemented in user logic. This allows new FPGA programming methods to be designed. We propose a versatile partial self-reconfiguration framework for FPGA field updates that customizes to specific applications, reduces reconfiguration times, and minimizes the need for external hardware. The framework provides flexibility in media sources and design security. A prototype using this framework is demonstrated on a Xilinx Virtex-II FPGA.

Proceedings ArticleDOI
09 Apr 2003
TL;DR: An application framework is discussed for developing CCM-based applications beyond just the hardware configuration. The framework allows dynamic circuit configurations that include data folding optimizations based on user input, and the resulting system aids in creating applications that are potentially more intuitive, easier to develop, and better performing.
Abstract: FPGA-based (field programmable gate array) configurable computing machines (CCMs) offer powerful and flexible general-purpose computing platforms. However, development for FPGA-based designs using modern CAD (computer aided design) tools is geared mainly toward an ASIC-like process. This is inadequate for the needs of CCM application development. This paper discusses an application framework for developing CCM-based applications beyond just the hardware configuration. This framework leverages the advantages of CCMs (availability, programmability, visibility, and controllability) to help create CCM-based applications throughout the entire development process (i.e. design, debug, and deploy). The framework itself is deployed with the final application, thus permitting dynamic circuit configurations that include data folding optimizations based on user input. The resulting system aids in creating applications that are potentially more intuitive, easier to develop, and better performing. An example application demonstrates the use of the application framework and the potential benefits.

Proceedings ArticleDOI
08 Sep 2003
TL;DR: An automatic and parameterized implementation for hyperspectral images has been developed in a hardware/software codesign approach, and the unsupervised k-means clustering technique, which uses the Euclidean distance to calculate the pixel-to-center distance, was used as a case study to validate the methodology.
Abstract: Unsupervised clustering is a powerful technique for understanding multispectral and hyperspectral images, k-means being one of the most used iterative approaches. It is a simple though computationally expensive algorithm, particularly for clustering large hyperspectral images into many categories. Software implementation presents advantages such as flexibility and low cost for implementation of complex functions. However, it presents limitations, such as difficulties in exploiting parallelism for high performance applications. In order to accelerate the k-means clustering, a hardware implementation could be used. The disadvantage in this approach is that any change in the project requires previous knowledge of the hardware design process and can take several weeks to be implemented. In order to improve the design methodology, an automatic and parameterized implementation for hyperspectral images has been developed in a hardware/software codesign approach. The unsupervised k-means clustering technique, which uses the Euclidean distance to calculate the pixel-to-center distance, was used as a case study to validate the methodology. Two implementations, a software one and a hardware/software codesign one, have been developed. Although the hardware component operates at 40 MHz, being 12.5 times less than the software operating frequency (PC), the codesign implementation was approximately 2 times faster than the software one.
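For readers unfamiliar with the algorithm named in this abstract, the following is a minimal software sketch of one k-means iteration using the Euclidean pixel-to-center distance. It is not the authors' hardware/software codesign implementation; the function names `nearest_center` and `kmeans_step` and the tuple-based pixel representation are assumptions made for illustration only.

```python
import math

def nearest_center(pixel, centers):
    """Return the index of the center closest to `pixel` (Euclidean distance)."""
    return min(range(len(centers)), key=lambda k: math.dist(pixel, centers[k]))

def kmeans_step(pixels, centers):
    """Run one k-means iteration: assign every pixel, then recompute centers.

    `pixels` and `centers` are sequences of equal-length numeric tuples
    (one value per spectral band). Hypothetical helper, for illustration.
    """
    labels = [nearest_center(p, centers) for p in pixels]
    new_centers = []
    for k, old in enumerate(centers):
        members = [p for p, label in zip(pixels, labels) if label == k]
        if members:
            # New center is the per-band mean of the pixels assigned to cluster k.
            new_centers.append(tuple(sum(band) / len(members) for band in zip(*members)))
        else:
            # Keep the old center when no pixel was assigned to this cluster.
            new_centers.append(tuple(old))
    return labels, new_centers

pixels = [(0, 0), (0, 2), (10, 10), (10, 12)]
labels, centers = kmeans_step(pixels, [(0, 0), (10, 10)])
```

The distance evaluation in `nearest_center` is independent for every pixel, which is the kind of data parallelism a hardware implementation could exploit.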

Proceedings ArticleDOI
23 Feb 2003
TL;DR: This paper shows how a systolic structure can accelerate placement by assigning one processing element to each possible location for an FPGA LUT from a design netlist, and demonstrates that this technique approaches the same quality point as traditional simulated annealing as measured by a simple linear wirelength metric.
Abstract: To truly exploit FPGAs for rapid turn-around development and prototyping, placement times must be reduced to seconds; late-bound, reconfigurable computing applications may demand placement times as short as microseconds. In this paper, we show how a systolic structure can accelerate placement by assigning one processing element to each possible location for an FPGA LUT from a design netlist. We demonstrate that our technique approaches the same quality point as traditional simulated annealing as measured by a simple linear wirelength metric. Experimental results look ahead to compare quality against VPR's fast placer when considering the minimum channel width required to route as the primary optimization criterion. Preliminary results from an FPGA implementation show the feasibility of accelerating simulated annealing by three orders of magnitude using this approach. This means we can place the largest design in the University of Toronto's "FPGA Placement and Routing Challenge" in around 4 ms.

Proceedings ArticleDOI
28 Sep 2003
TL;DR: A general framework under which a significant portion of the data mining task is implemented in fast hardware, close to the magnetic media on which it is stored is described and initial performance results for a set of applications are provided.
Abstract: In many data mining applications, the size of the database is not only extremely large, it is also growing rapidly. Even for relatively simple searches, the time required to move the data off magnetic media, cross the system bus into main memory, copy into processor cache, and then execute code to perform a search is prohibitive. We are building a system in which a significant portion of the data mining task (i.e., the portion that examines the bulk of the raw data) is implemented in fast hardware, close to the magnetic media on which it is stored. Furthermore, this hardware can be replicated allowing mining tasks to be performed in parallel, thus providing further speedup for the overall mining application. In this paper, we describe a general framework under which this can be accomplished and provide initial performance results for a set of applications.

Book ChapterDOI
01 Sep 2003
TL;DR: In this paper, the authors propose an island-style FPGA architecture that provides interconnect capable of scaling at the same rate as typical netlists, unlike traditionally tiled FPGAs.
Abstract: This paper proposes modifications to standard island-style FPGAs that provide interconnect capable of scaling at the same rate as typical netlists, unlike traditionally tiled FPGAs. The proposal uses logical third and fourth dimensions to create increasing wire density for increasing logic capacity. The additional dimensions are mapped to standard two-dimensional silicon. This innovation will increase the longevity of a given cell architecture, and reduce the cost of hardware, CAD tool and Intellectual Property (IP) redesign. In addition, extra-dimensional FPGA architectures provide a conceptual unification of standard FPGAs and time-multiplexed FPGAs.

Proceedings ArticleDOI
22 Apr 2003
TL;DR: A methodology for modeling of dynamically reconfigurable blocks at the system-level using SystemC 2.0 allows us to do true design space exploration at the system-level, without the need to map the design first to an actual technology implementation.
Abstract: To cope with the increasing demand for higher computational power and flexibility, dynamically reconfigurable blocks have become an important part inside a system-on-chip. Several methods have been proposed to incorporate their reconfiguration aspects into a design flow. They all lack either an interface to commercially available and industrially used tools or are restricted to a single vendor or technology environment. Therefore a methodology for modeling of dynamically reconfigurable blocks at the system-level using SystemC 2.0 is presented. The high-level model is based on a multi-context representation of the different functionalities that will be mapped on the reconfigurable block during different run-time periods. By specifying the estimated times of context-switching and active-running in the selected functionality modes, the methodology allows us to do true design space exploration at the system-level, without the need to map the design first to an actual technology implementation.

01 Jan 2003
TL;DR: A platform has been implemented that actively scans and filters Internet traffic for Internet worms and viruses at multi-Gigabit/second rates using the Field-programmable Port Extender (FPX), and logic that allows modules to be dynamically reconfigured to scan for new signatures.
Abstract: The security of the Internet can be improved using Programmable Logic Devices (PLDs). A platform has been implemented that actively scans and filters Internet traffic for Internet worms and viruses at multi-Gigabit/second rates using the Field-programmable Port Extender (FPX). Modular components implemented with Field Programmable Gate Array (FPGA) logic on the FPX process packet headers and scan for signatures of malicious software (malware) carried in packet payloads. FPGA logic is used to implement circuits that track the state of Internet flows and search for regular expressions and fixed-strings that appear in the content of packets. The FPX contains logic that allows modules to be dynamically reconfigured to scan for new signatures. Network-wide protection is achieved by the deployment of multiple systems throughout the Internet.
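To make the idea of FSM-based payload scanning concrete, here is a small software sketch of a table-driven matcher for a single fixed string. It is an illustration only: it uses the classic Knuth-Morris-Pratt automaton construction, handles fixed strings rather than full regular expressions, and says nothing about how the FPX circuits are actually generated. The names `build_dfa` and `scan` are hypothetical.

```python
def build_dfa(pattern: bytes):
    """Build a KMP-style transition table: dfa[state][byte] -> next state.

    State i means "the last i bytes seen match the first i bytes of pattern";
    state len(pattern) is the accepting state. Requires a non-empty pattern.
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    m = len(pattern)
    dfa = [[0] * 256 for _ in range(m + 1)]
    dfa[0][pattern[0]] = 1
    restart = 0  # state to fall back to on a mismatch
    for state in range(1, m + 1):
        # On a mismatch, behave as the restart state would.
        dfa[state] = list(dfa[restart])
        if state < m:
            dfa[state][pattern[state]] = state + 1
            restart = dfa[restart][pattern[state]]
    return dfa

def scan(dfa, payload: bytes):
    """Consume one payload byte per step; return start offsets of all matches."""
    accept = len(dfa) - 1
    state = 0
    hits = []
    for i, byte in enumerate(payload):
        state = dfa[state][byte]
        if state == accept:
            hits.append(i - accept + 1)
    return hits

signature = build_dfa(b"worm")
offsets = scan(signature, b"GET /worm.exe")
```

Because the matcher performs exactly one table lookup per input byte and keeps only the current state, the same structure maps naturally onto a hardware state machine clocked once per byte.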

Journal ArticleDOI
TL;DR: In this paper, contemporary CCM architectures that allow dynamic hardware reconfiguration with maximum flexibility are reviewed and assessed, followed by design recommendations for CCM architecture for use in software radio handsets.
Abstract: The advent of software radios has brought a paradigm shift to radio design. A multimode handset with dynamic reconfigurability has the promise of integrated services and global roaming capabilities. However, most of the work to date has been focused on software radio base stations, which do not have as tight constraints on area and power as handsets. Base station software radio technology progressed dramatically with advances in system design, adaptive modulation and coding techniques, reconfigurable hardware, A/D converters, RF design, and rapid prototyping systems, and has helped bring software radio handsets a step closer to reality. However, supporting multimode radios on a small handset still remains a design challenge. A configurable computing machine, which is an optimized FPGA with application-specific capabilities, shows promise for software radio handsets in optimizing hardware implementations for heterogeneous systems. In this article, contemporary CCM architectures that allow dynamic hardware reconfiguration with maximum flexibility are reviewed and assessed. This is followed by design recommendations for CCM architectures for use in software radio handsets.

Proceedings ArticleDOI
23 Feb 2003
TL;DR: This report investigates a methodology to efficiently implement block ciphers in CLB-based FPGAs and proposes designs that unroll the 10 AES rounds and pipeline them in order to optimize the frequency and throughput results.
Abstract: Reprogrammable devices such as Field Programmable Gate Arrays (FPGAs) are highly attractive options for hardware implementations of encryption algorithms, and this report investigates a methodology to efficiently implement block ciphers in CLB-based FPGAs. Our methodology is applied to the new Advanced Encryption Standard RIJNDAEL and the resulting designs offer better performance than previously published in the literature. We propose designs that unroll the 10 AES rounds and pipeline them in order to optimize the frequency and throughput results. In addition, we implemented solutions that allow the plaintext and the key to be changed on a cycle-by-cycle basis with no dead cycles. Another strong focus is placed on low area circuits and we propose sequential designs with very low area requirements. Finally, we demonstrate that RAM-based implementations imply different constraints but our methodology still holds.