
Showing papers on "Reconfigurable computing" published in 2004


Journal Article
TL;DR: A microarchitecture based on reconfigurable hardware emulation is proposed to allow high-speed reconfiguration and execution, and the viability of the proposal is demonstrated with the MPEG-2 encoder and decoder on a Xilinx Virtex II Pro FPGA.
Abstract: In this paper, we present a polymorphic processor paradigm incorporating both general-purpose and custom computing processing. The proposal incorporates an arbitrary number of programmable units, exposes the hardware to the programmers/designers, and allows them to modify and extend the processor functionality at will. To achieve the previously stated attributes, we present a new programming paradigm, a new instruction set architecture, a microcode-based microarchitecture, and a compiler methodology. The programming paradigm, in contrast with the conventional programming paradigms, allows general-purpose conventional code and hardware descriptions to coexist in a program. In our proposal, for a given instruction set architecture, a one-time instruction set extension of eight instructions is sufficient to implement the reconfigurable functionality of the processor. We propose a microarchitecture based on reconfigurable hardware emulation to allow high-speed reconfiguration and execution. To prove the viability of the proposal, we experimented with the MPEG-2 encoder and decoder and a Xilinx Virtex II Pro FPGA. We have implemented three operations: SAD, DCT, and IDCT. The overall attainable application speedup for the MPEG-2 encoder and decoder is between 2.64 and 3.18 and between 1.56 and 1.94, respectively, representing between 93 percent and 98 percent of the theoretically obtainable speedups.
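For readers unfamiliar with the first of the three accelerated operations, SAD (sum of absolute differences) is the block-matching cost commonly used in video motion estimation. The sketch below is a plain software reference of that definition only; the function name `sad` and the sample blocks are illustrative assumptions and say nothing about how the paper maps the operation to hardware.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2D pixel blocks."""
    if len(block_a) != len(block_b):
        raise ValueError("blocks must have the same number of rows")
    total = 0
    for row_a, row_b in zip(block_a, block_b):
        if len(row_a) != len(row_b):
            raise ValueError("rows must have the same length")
        # Accumulate |a - b| for every pixel pair in this row.
        total += sum(abs(a - b) for a, b in zip(row_a, row_b))
    return total


# Hypothetical 2x2 blocks, chosen only to make the arithmetic easy to follow.
current = [[10, 12], [8, 9]]
reference = [[11, 12], [5, 9]]
print(sad(current, reference))  # 1 + 0 + 3 + 0 = 4
```

Every pixel pair is independent of the others, which is one reason the operation is a natural candidate for a parallel hardware implementation.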

436 citations


Journal Article
TL;DR: This paper focuses on a runtime system for guarantee-based scheduling of hard real-time tasks, formulates the scheduling problem for the 1D and 2D resource models, and presents two heuristics, the horizon and the stuffing technique, to tackle it.
Abstract: Today's reconfigurable hardware devices have huge densities and are partially reconfigurable, allowing for the configuration and execution of hardware tasks in a true multitasking manner. This makes reconfigurable platforms an ideal target for many modern embedded systems that combine high computation demands with dynamic task sets. A rather new line of research is engaged in the construction of operating systems for reconfigurable embedded platforms. Such an operating system provides a minimal programming model and a runtime system. The runtime system performs online task and resource management. In this paper, we first discuss design issues for reconfigurable hardware operating systems. Then, we focus on a runtime system for guarantee-based scheduling of hard real-time tasks. We formulate the scheduling problem for the 1D and 2D resource models and present two heuristics, the horizon and the stuffing technique, to tackle it. Simulation experiments conducted with synthetic workloads evaluate the performance and the runtime efficiency of the proposed schedulers. The scheduling performance for the 1D resource model is strongly dependent on the aspect ratios of the tasks. Compared to the 1D model, the 2D resource model is clearly superior. Finally, the runtime overhead of the scheduling algorithms is shown to be acceptably low.

302 citations


Book
01 Jan 2004
TL;DR: This book introduces the essentials of VLSI: fabrication, circuits, interconnects, combinational and sequential logic design, system architectures, and more, and demonstrates how to reflect this VLSI knowledge in a state-of-the-art design methodology that leverages the FPGA's most valuable characteristics while mitigating its limitations.
Abstract: Everything FPGA designers need to know about FPGAs and VLSI. Digital designs once built in custom silicon are increasingly implemented in field programmable gate arrays (FPGAs). Effective FPGA system design requires a strong understanding of VLSI issues and constraints, and an understanding of the latest FPGA-specific techniques. In this book, Princeton University's Wayne Wolf covers everything FPGA designers need to know about all these topics: both the "how" and the "why." Wolf begins by introducing the essentials of VLSI: fabrication, circuits, interconnects, combinational and sequential logic design, system architectures, and more. Next, he demonstrates how to reflect this VLSI knowledge in a state-of-the-art design methodology that leverages the FPGA's most valuable characteristics while mitigating its limitations. Coverage includes: how VLSI characteristics affect FPGAs and FPGA-based logic design; how classical logic design techniques relate to FPGA-based logic design; understanding FPGA fabrics, the basic programmable structures of FPGAs; specifying and optimizing logic to address size, speed, and power consumption; Verilog, VHDL, and software tools for optimizing logic and designs; the structure of large digital systems, including register-transfer design methodology; building large-scale platform and multi-FPGA systems; and a start-to-finish DSP case study addressing a wide range of design problems. Prentice Hall Professional Technical Reference, Upper Saddle River, NJ 07458, www.phptr.com. ISBN: 0-13-142461-0

248 citations


Journal Article
TL;DR: The ACE16k is a member of the third generation of ACE chips; it is designed in a 0.35-μm standard CMOS technology and exhibits peak computing figures of 330 GOPS, 3.6 GOPS/mm², and 82.5 GOPS/W.
Abstract: Today, with 0.18-μm technologies mature and stable enough for mixed-signal design, with a large variety of CMOS-compatible optical sensors available, and with 0.09-μm technologies knocking at the door of designers, we can face the design of integrated systems, instead of just integrated circuits. In fact, significant progress has been made in the last few years toward the realization of vision systems on chips (VSoCs). Such VSoCs are eventually targeted to integrate within a semiconductor substrate the functions of optical sensing, image processing in space and time, high-level processing, and the control of actuators. The consecutive generations of ACE chips define a roadmap toward flexible VSoCs. These chips consist of arrays of mixed-signal processing elements (PEs) which operate in accordance with single instruction multiple data (SIMD) computing architectures and exhibit the functional features of CNN Universal Machines. They have been conceived to cover the early stages of the visual processing path in a fully-parallel manner, and hence more efficiently than DSP-based systems. Across the different generations, different improvements and modifications have been made looking to converge with the newest discoveries of neurobiologists regarding the behavior of natural retinas. This paper presents considerations pertaining to the design of a member of the third generation of ACE chips, namely the so-called ACE16k chip. This chip, designed in a 0.35-μm standard CMOS technology, contains about 3.75 million transistors and exhibits peak computing figures of 330 GOPS, 3.6 GOPS/mm², and 82.5 GOPS/W. Each PE in the array contains a reconfigurable computing kernel capable of calculating linear convolutions on 3×3 neighborhoods in less than 1.5 μs, imagewise Boolean combinations in less than 200 ns, imagewise arithmetic operations in about 5 μs, and CNN-like temporal evolutions with a time constant of about 0.5 μs. Unfortunately, the many ideas underlying the design of this chip cannot be covered in a single paper; hence, this paper is focused on, first, placing the ACE16k in the ACE chip roadmap and, then, discussing the most significant modifications of the ACE16k versus its predecessors in the family.
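The three peak figures quoted for this chip are related by simple division, which gives a quick sanity check on them. The sketch below assumes that all three numbers describe the same peak operating point and that the per-area and per-watt figures are normalized by a single area and a single power value; the abstract does not say which area or power is meant, so the results are implied quantities, not values stated in the paper. The function name `implied_totals` is made up for this illustration.

```python
def implied_totals(peak_gops, gops_per_mm2, gops_per_watt):
    """Back out the area and power implied by a set of peak-throughput figures."""
    area_mm2 = peak_gops / gops_per_mm2   # GOPS / (GOPS/mm^2) = mm^2
    power_w = peak_gops / gops_per_watt   # GOPS / (GOPS/W)    = W
    return area_mm2, power_w


# Figures quoted in the abstract above.
area_mm2, power_w = implied_totals(330.0, 3.6, 82.5)
print(f"implied area  ~ {area_mm2:.1f} mm^2")
print(f"implied power ~ {power_w:.1f} W")
```

Under those assumptions the figures imply roughly 91.7 mm² and 4 W.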

230 citations


Journal Article
TL;DR: This contribution provides a state-of-the-art description of security issues on FPGAs, both from the system and implementation perspectives, and summarizes both public and symmetric-key algorithm implementations on FPGAs.
Abstract: In the last decade, it has become apparent that embedded systems are integral parts of our everyday lives. The wireless nature of many embedded applications as well as their omnipresence has made the need for security and privacy-preserving mechanisms particularly important. Thus, as field programmable gate arrays (FPGAs) become integral parts of embedded systems, it is imperative to consider their security as a whole. This contribution provides a state-of-the-art description of security issues on FPGAs, both from the system and implementation perspectives. We discuss the advantages of reconfigurable hardware for cryptographic applications, show potential security problems of FPGAs, and provide a list of open research problems. Moreover, we summarize both public and symmetric-key algorithm implementations on FPGAs.

203 citations


Journal Article
TL;DR: Chimaera, a system that overcomes the communication bottleneck by integrating reconfigurable logic into the host processor itself, is described; it enables the creation of multi-operand instructions and a speculative execution model key to high-performance, general-purpose reconfigurable computing.
Abstract: By strictly separating reconfigurable logic from the host processor, current custom computing systems suffer from a significant communication bottleneck. In this paper, we describe Chimaera, a system that overcomes the communication bottleneck by integrating reconfigurable logic into the host processor itself. With direct access to the host processor's register file, the system enables the creation of multi-operand instructions and a speculative execution model key to high-performance, general-purpose reconfigurable computing. Chimaera also supports multi-output functions and utilizes partial run-time reconfiguration to reduce reconfiguration time. Combined, the system can provide speedups of a factor of two or more for general-purpose computing, and speedups of 160 or more are possible for hand-mapped applications.

179 citations


Proceedings Article
20 Apr 2004
TL;DR: An analysis of three BLAS functions on microprocessors, FPGAs, and reconfigurable computing platforms highlights the amount of memory bandwidth and internal storage needed to sustain peak performance with FPGAs; it considers the historical context of the last six years and extrapolates to the next six years.
Abstract: Field programmable gate arrays (FPGAs) have long been an attractive alternative to microprocessors for computing tasks - as long as floating-point arithmetic is not required. Fueled by the advance of Moore's law, FPGAs are rapidly reaching sufficient densities to enhance peak floating-point performance as well. The question, however, is how much of this peak performance can be sustained. This paper examines three of the basic linear algebra subroutine (BLAS) functions: vector dot product, matrix-vector multiply, and matrix multiply. A comparison of microprocessors, FPGAs, and reconfigurable computing platforms is performed for each operation. The analysis highlights the amount of memory bandwidth and internal storage needed to sustain peak performance with FPGAs. This analysis considers the historical context of the last six years and is extrapolated for the next six years.
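Why memory bandwidth decides how much peak performance can be sustained is easiest to see from the ratio of floating-point operations to memory words moved for each of the three kernels. The sketch below uses the standard operation counts and assumes the best case, in which every operand is read from external memory exactly once and every result is written once. The function names and the 10 GFLOP/s figure in the usage lines are illustrative assumptions, not numbers from the paper.

```python
def blas_intensity(n):
    """Flops per memory word for three BLAS kernels on size-n operands.

    Best-case traffic: each input word is read once, each output word written once.
    """
    kernels = {
        # name: (floating-point operations, memory words moved)
        "dot":  (2 * n,      2 * n + 1),       # read x, y; write one scalar
        "gemv": (2 * n * n,  n * n + 2 * n),   # read A, x; write y
        "gemm": (2 * n ** 3, 3 * n * n),       # read A, B; write C
    }
    return {name: flops / words for name, (flops, words) in kernels.items()}


def required_bandwidth(peak_flops_per_s, flops_per_word):
    """Memory words per second needed to keep the arithmetic units busy."""
    return peak_flops_per_s / flops_per_word


intensity = blas_intensity(1000)
for name, value in intensity.items():
    # Hypothetical 10 GFLOP/s device, used only to make the scaling visible.
    words_per_s = required_bandwidth(10e9, value)
    print(f"{name}: {value:.2f} flops/word, {words_per_s:.3g} words/s needed")
```

Dot product and matrix-vector multiply stay at roughly one and two operations per word regardless of n, so they are limited by bandwidth. Matrix multiply grows as 2n/3 operations per word, but only if enough internal storage is available to achieve that reuse.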

171 citations


Proceedings Article
22 Feb 2004
TL;DR: The results provide a deeper understanding of the tradeoff between system complexity and performance when designing Configurable SoC (CSoC) hardware and software, and help explain the one to two orders of magnitude in speedup of FPGAs over CPUs after accounting for clock frequencies.
Abstract: The speedup over a microprocessor that can be achieved by implementing some programs on an FPGA has been extensively reported. This paper presents an analysis, both quantitative and qualitative, at the architecture level of the components of this speedup. Obviously, the spatial parallelism that can be exploited on the FPGA is a big component. By itself, however, it does not account for the whole speedup. In this paper we experimentally analyze the remaining components of the speedup. We compare the performance of image processing application programs executing in hardware on a Xilinx Virtex E2000 FPGA to that on three general-purpose processor platforms: MIPS, Pentium III and VLIW. The question we set out to answer is what the inherent advantage of a hardware implementation over a von Neumann platform is. On the one hand, the clock frequency of general-purpose processors is about 20 times that of typical FPGA implementations. On the other hand, the iteration-level parallelism on the FPGA is one to two orders of magnitude higher than that on the CPUs. In addition to these two factors, we identify the efficiency advantage of FPGAs as an important factor and show that it ranges from 6 to 47 on our test benchmarks. We also identify some of the components of this factor: the streaming of data from memory, the overlap of control and data flow, and the elimination of some instructions on the FPGA. The results provide a deeper understanding of the tradeoff between system complexity and performance when designing Configurable SoC as well as designing software for CSoC. They also help understand the one to two orders of magnitude in speedup of FPGAs over CPU after accounting for clock frequencies.
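One way to see how the three factors named in this abstract could interact is a back-of-the-envelope model in which they simply multiply. That multiplicative form is an assumption made here for illustration; the abstract does not give the paper's actual decomposition. The function name `estimated_speedup` and the sample inputs are hypothetical, with values picked from inside the ranges quoted above and not taken from any measured benchmark.

```python
def estimated_speedup(cpu_to_fpga_clock_ratio, parallelism, efficiency):
    """Rough FPGA-over-CPU speedup under an assumed multiplicative model.

    cpu_to_fpga_clock_ratio: how many times faster the CPU clock is than the FPGA clock
    parallelism:             FPGA iteration-level parallelism relative to the CPU
    efficiency:              remaining per-iteration efficiency advantage of the FPGA
    """
    if cpu_to_fpga_clock_ratio <= 0:
        raise ValueError("clock ratio must be positive")
    # The clock disadvantage divides the gain; the other two factors multiply it.
    return parallelism * efficiency / cpu_to_fpga_clock_ratio


# Illustrative inputs only: 20x slower clock, 50x parallelism, 10x efficiency.
print(estimated_speedup(20, 50, 10))  # 25.0
```

The model only shows the direction of each effect: a large clock-frequency deficit can be outweighed once parallelism and efficiency gains compound.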

167 citations


Journal Article
TL;DR: The design of a high-performance field programmable gate array (FPGA) architecture that efficiently prototypes asynchronous (clockless) logic is discussed; the architecture maintains most of the performance benefits of a custom asynchronous design while also providing postfabrication logic reconfigurability.
Abstract: We discuss the design of a high-performance field programmable gate array (FPGA) architecture that efficiently prototypes asynchronous (clockless) logic. In this FPGA architecture, low-level application logic is described using asynchronous dataflow functions that obey a token-based compute model. We implement these dataflow functions using finely pipelined asynchronous circuits that achieve high computation rates. This asynchronous dataflow FPGA architecture maintains most of the performance benefits of a custom asynchronous design, while also providing postfabrication logic reconfigurability. We report results for two asynchronous dataflow FPGA designs that operate at up to 400 MHz in a typical TSMC 0.25 μm CMOS process.

137 citations


Proceedings Article
20 Apr 2004
TL;DR: The role of design patterns in reconfigurable computing is articulated, a few example patterns are provided, a starting point for the contents of the catalog is offered, and the potential benefits of this effort are discussed.
Abstract: It is valuable to identify and catalog design patterns for reconfigurable computing. These design patterns are canonical solutions to common and recurring design challenges which arise in reconfigurable systems and applications. The catalog can form the basis for creating designs, for educating new designers, for understanding the needs of tools and languages, and for discussing reconfigurable design. Tying application and implementation lessons to the expansion and refinement of this catalog makes those lessons more relevant to the design community. In this paper, we articulate this role for design patterns in reconfigurable computing, provide a few example patterns, offer a starting point for the contents of the catalog, and discuss the potential benefits of this effort.

132 citations


Journal Article
TL;DR: In this paper, hybrid chips containing both CPU and FPGA components are presented as an exciting new development promising commercial off-the-shelf economies of scale, while also supporting hardware customization.
Abstract: Emerging hybrid chips containing CPU and FPGA components are an exciting new development promising commercial off-the-shelf economies of scale, while also supporting hardware customization.

Book
26 Apr 2004
TL;DR: This book is the first to focus exclusively and comprehensively on FPGA use for embedded systems, and will help engineers become familiar with and succeed with this new technology by providing much-needed advice on choosing the right FPGA for any design project.
Abstract: Field Programmable Gate Arrays (FPGAs) are devices that provide a fast, low-cost way for embedded system designers to customize products and deliver new versions with upgraded features, because they can handle very complicated functions and be reconfigured an infinite number of times. In addition to introducing the various architectural features available in the latest generation of FPGAs, The Design Warrior's Guide to FPGAs also covers different design tools and flows. This book covers information ranging from schematic-driven entry, through traditional HDL/RTL-based simulation and logic synthesis, all the way up to the current state-of-the-art in pure C/C++ design capture and synthesis technology. Also discussed are specialist areas such as mixed hardware/software and DSP-based design flows, along with innovative new devices such as field programmable node arrays (FPNAs). Clive "Max" Maxfield is a bestselling author and engineer with a large following in the electronic design automation (EDA) and embedded systems industry. In this comprehensive book, he covers all the issues of interest to designers working with, or contemplating a move to, FPGAs in their product designs. While other books cover fragments of FPGA technology or applications, this is the first to focus exclusively and comprehensively on FPGA use for embedded systems. Key features: first book to focus exclusively and comprehensively on FPGA use in embedded designs; world-renowned best-selling author; will help engineers become familiar with and succeed with this new technology by providing much-needed advice on choosing the right FPGA for any design project.

Proceedings Article
26 Jun 2004
TL;DR: A state-of-the-art review of reconfigurability and reconfigurable manufacturing systems is presented, in which reconfigurability is defined as the ability to repeatedly change and rearrange the components of a system in a cost-effective way.
Abstract: This paper presents a state-of-the-art review of reconfigurability and reconfigurable manufacturing systems. Reconfigurability is defined as the ability to repeatedly change and rearrange the components of a system in a cost-effective way. This concept is illustrated through its application in computing, automated assembly and robotics. Then, the evolution of manufacturing, from dedicated to flexible manufacturing systems, is briefly discussed and the need for reconfigurable manufacturing systems is outlined. These are further studied by analysing their key features (modularity, integrability, customisation, convertibility, and diagnosability) and challenges (product variability, responsiveness, nonobsolescence, cost-effectiveness, reliability and simplicity). It is shown that there are common research issues in reconfigurable computing, robotics and manufacturing such as system-module-component interfaces, design methodologies, modularity, tools and toolsuites development, strategic analysis and business modelling, training, and support. Finally, the research priorities of the I*PROMS Network of Excellence in the area of reconfigurable manufacturing are outlined.

Journal Article
TL;DR: This work introduces an effective, low-cost repair solution in which originally unused blocks and routing resources replace faulty parts, and the proposed reconfiguration hardware allows autonomous repair, that is, the system does not require external intervention for recovery.
Abstract: Fault-tolerant systems typically require expensive additional resources (spare pins, columns, and chips) and external control for reconfiguration. We introduce an effective, low-cost repair solution in which originally unused blocks and routing resources replace faulty parts. In addition, the proposed reconfiguration hardware allows autonomous repair, that is, the system does not require external intervention for recovery.

Book
01 Jan 2004
TL;DR: This book discusses the development of evolvable computational machines and their applications in dynamic environments, as well as some of the techniques used to design and implement these machines.
Abstract: 1 Introduction.- 1.1 Natural Computing.- 1.1.1 Soft Computing.- 1.1.2 Quantum Computing.- 1.1.3 DNA Computing.- 1.1.4 Membrane Computing.- 1.2 Bioinspired Hardware.- 1.3 Motivation for Research.- 2 Reconfigurable Hardware.- 2.1 Digital Circuits.- 2.2 Digital Circuit Design.- 2.3 Field Programmable Gate Arrays.- 2.3.1 Architecture of FPGAs.- 2.3.2 The XC4000 Family.- 2.3.3 The Virtex Family.- 2.3.4 The XC6200 Family.- 2.3.5 Atmel FPGAs.- 2.3.6 Features of FPGAs.- 2.4 Hardware Reused as Software.- 2.5 Reconfigurable Computing.- 2.6 Nanotechnology.- 2.7 Cell Matrix.- 2.8 Summary.- 3 Evolutionary Algorithms.- 3.1 Introduction.- 3.2 Variants of Evolutionary Algorithms.- 3.2.1 Genetic Algorithms.- 3.2.2 Genetic Programming.- 3.2.3 Evolutionary Strategies.- 3.2.4 Evolutionary Programming.- 3.3 Some Other Features of Evolutionary Algorithms.- 3.3.1 Parallel Implementations.- 3.3.2 Dynamic Fitness Function.- 3.4 Evolutionary Design and Optimization.- 3.5 The Evolutionary Algorithm Design.- 3.5.1 Missing Theories.- 3.5.2 The Design Strategies.- 3.6 Formal Approach.- 3.7 Summary.- 4 Evolvable Hardware.- 4.1 Basic Concept.- 4.2 Cartesian Genetic Programming.- 4.3 Features of Cartesian Genetic Programming.- 4.3.1 Redundancy and Neutrality.- 4.3.2 Fitness Landscape Analysis.- 4.3.3 Implementation Issues.- 4.4 From Chromosome to Fitness Value.- 4.4.1 Representation.- 4.4.2 Platforms for Circuit Evolution.- 4.4.3 Circuit Evaluation.- 4.5 Fitness Function.- 4.5.1 Fitness Function and Circuit Behavior.- 4.5.2 Evolutionary Circuit Design: Static Fitness Function.- 4.5.3 Evolvable Hardware: Dynamic Fitness Function.- 4.5.4 Discussion.- 4.6 Applications and Degree of Hardware Implementation.- 4.7 Promising Results.- 4.8 Major Current Problems and Potential Solutions.- 4.8.1 Scalability of Representation.- 4.8.2 Scalability of Fitness Evaluation.- 4.8.3 Robustness of the Evolved Circuits.- 4.8.4 Applications in Dynamic Environments.- 4.9 Summary.- 5 Towards Evolvable Components.- 5.1 Component Approach to Problem Solving.- 5.2 Evolvable Components.- 5.2.1 System Decomposition.- 5.2.2 Interface.- 5.3 Hardware Implementation.- 5.3.1 Evolvable Components.- 5.3.2 Environment.- 5.3.3 Communication Between Evolvable Component and Environment.- 5.4 Extension of Evolvable Components.- 5.5 Summary.- 6 Evolvable Computational Machines.- 6.1 Computational Machines and Evolutionary Design.- 6.2 Cellular Automata.- 6.2.1 Basic Model.- 6.2.2 Evolvable Non-Uniform Cellular Automaton.- 6.2.3 An example: Evolvable Non-Uniform Cellular Automaton as a Sequence Generator.- 6.3 General Evolvable Computational Machine.- 6.4 Dynamic Environment.- 6.5 Evolvable Computational System.- 6.5.1 Formal Definition.- 6.5.2 An example: Formal Description of a Simple Image Compression.- 6.6 Properties of Evolvable Machines.- 6.6.1 On the Computation of Evolvable Machines.- 6.6.2 Mappings g and f.- 6.6.3 Changing Fitness Function.- 6.7 The Computational Power.- 6.7.1 The Turing Machine and the Church Turing Thesis.- 6.7.2 Beyond the Turing Machines.- 6.7.3 A New Paradigm.- 6.7.4 Site Machine.- 6.7.5 The Power of an Evolvable System.- 6.7.6 Discussion.- 6.8 Summary.- 7 An Evolvable Component for Image Pre-processing.- 7.1 Motivation and Problem Specification.- 7.2 The Image Filter Design.- 7.2.1 Types of Noise Considered for Testing.- 7.2.2 Conventional Approaches.- 7.2.3 Implementation of FPGAs.- 7.2.4 A Brief Survey of Evolutionary Approaches.- 7.3 Analysis of Reconfigurability and Size of the Search Space.- 7.3.1 Elementary Measures.- 7.3.2 Cartesian Genetic Programming in Hardware.- 7.3.3 Cartesian Genetic Programming at the Functional Level.- 7.4 Evolutionary Design: Experimental Framework.- 7.4.1 Reconfigurable Circuit.- 7.4.2 Evolutionary Algorithms.- 7.4.3 Fitness Function.- 7.5 Filters for Smoothing.- 7.5.1 The Results.- 7.5.2 Discussion.- 7.6 Other Image Operators.- 7.6.1 "Salt and Pepper" Noise Filters.- 7.6.2 Random Shot-Noise Filters.- 7.6.3 Edge Detectors.- 7.7 Dynamic Environment.- 7.7.1 Experimental Setup.- 7.7.2 The Results in Tables 7.9 and 7.10.- 7.7.3 Discussion.- 7.8 A Note on a Single Filter Design.- 7.9 Summary.- 8 Virtual Reconfigurable Devices.- 8.1 Chip on Top of a Chip.- 8.2 Architecture of Virtual Reconfigurable Circuits.- 8.2.1 Overview.- 8.2.2 Routing Logic and Configuration Memory.- 8.2.3 Configuration Options.- 8.3 Implementation Costs.- 8.4 Speeding up the Evolutionary Design.- 8.5 Genetic Unit.- 8.6 Physical Realization.- 8.7 Discussion.- 8.8 Summary.- 9 Concluding Statements.- 9.1 The Approach.- 9.2 The Obtained Results.- 9.3 Future Work.- References.

Book Chapter
05 Feb 2004
TL;DR: The resulting design offers better hardware efficiency than other recent 128-key-bit block ciphers, and resistance against side-channel cryptanalysis was also considered as a design criterion for ICEBERG.
Abstract: We present a fast involutional block cipher optimized for reconfigurable hardware implementations. ICEBERG uses 64-bit text blocks and 128-bit keys. All components are involutional and allow very efficient combinations of encryption/decryption. Hardware implementations of ICEBERG allow the key to be changed at every clock cycle without any performance loss, and its round keys are derived "on-the-fly" in encryption and decryption modes (no storage of round keys is needed). The resulting design offers better hardware efficiency than other recent 128-key-bit block ciphers. Resistance against side-channel cryptanalysis was also considered as a design criterion for ICEBERG.
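An involutional component is one that is its own inverse, so the same logic serves both encryption and decryption. The toy sketch below illustrates that property on a 4-bit substitution table built from swapped pairs. The table `TOY_SBOX`, the helper names, and the single "round" are all invented for this illustration; they are not ICEBERG's actual S-boxes, key schedule, or round function.

```python
def make_involution(pairs, size):
    """Build a substitution table that swaps each listed pair and fixes the rest."""
    table = list(range(size))
    for a, b in pairs:
        table[a], table[b] = b, a
    return table


def is_involution(table):
    """True if applying the table twice returns every input unchanged."""
    return all(table[table[x]] == x for x in range(len(table)))


# Hypothetical 4-bit table: eight disjoint swaps covering all 16 values.
TOY_SBOX = make_involution(
    [(0, 5), (1, 9), (2, 14), (3, 7), (4, 11), (6, 13), (8, 15), (10, 12)], 16
)


def toy_round(x, key):
    """Key XOR, involutional substitution, key XOR: itself an involution."""
    return TOY_SBOX[x ^ key] ^ key


print(is_involution(TOY_SBOX))                                          # True
print(all(toy_round(toy_round(x, 0xA), 0xA) == x for x in range(16)))  # True
```

Because the round undoes itself, a hardware implementation of this kind of component needs no separate inverse circuit, which is the efficiency argument the abstract makes for combining encryption and decryption.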

Book Chapter
11 Aug 2004
TL;DR: In this paper, the authors investigate the vulnerability of Rijndael FPGA (Field Programmable Gate Array) implementations to power analysis attacks and propose theoretical predictions of the attacks that are confirmed experimentally.
Abstract: Since their publication in 1998, power analysis attacks have attracted significant attention within the cryptographic community. So far, they have been successfully applied to different kinds of (unprotected) implementations of symmetric and public-key encryption schemes. However, most published attacks apply to smart cards and only a few publications assess the vulnerability of hardware implementations. In this paper we investigate the vulnerability of Rijndael FPGA (Field Programmable Gate Array) implementations to power analysis attacks. The design used to carry out the experiments is an optimized architecture with high clock frequencies, presented at CHES 2003. First, we provide a clear discussion of the hypothesis used to mount the attack. Then, we propose theoretical predictions of the attacks that we confirmed experimentally, which are the first successful experiments against an FPGA implementation of Rijndael. In addition, we evaluate the effect of pipelining and unrolling techniques in terms of resistance against power analysis. We also emphasize how the efficiency of the attack significantly depends on the knowledge of the design.

Proceedings Article
26 Apr 2004
TL;DR: Two FPGA-based algorithms for floating-point matrix multiplication, a fundamental kernel in a number of scientific applications, are proposed, and the design tradeoffs in implementing this kernel on FPGAs are analyzed.
Abstract: Summary form only given. The abundant hardware resources on current FPGAs provide new opportunities to improve the performance of hardware implementations of scientific computations. We propose two FPGA-based algorithms for floating-point matrix multiplication, a fundamental kernel in a number of scientific applications. We analyze the design tradeoffs in implementing this kernel on FPGAs. Our algorithms employ a linear array architecture with a small control logic. This architecture effectively utilizes the hardware resources on the entire FPGA and reduces the routing complexity. The processing elements (PEs) used in our algorithms are modular so that floating-point units can be easily embedded into them. In our designs, the floating-point units are optimized to maximize the number of PEs integrated on the FPGA as well as the clock speed. Experimental results show that our algorithms achieve high clock speeds and provide good scalability. Our algorithms achieve superior sustained floating-point performance compared with existing FPGA-based implementations and state-of-the-art processors.

Proceedings Article
11 Jun 2004
TL;DR: A compile-time approach to reuse data in window-based codes is presented, which simplifies the HDL code generation and improves the resulting hardware performance.
Abstract: Balancing computation with I/O has been considered a critical factor in the overall performance of embedded systems in general and reconfigurable computing systems in particular. Data I/O often dominates the overall computation performance for window operations, which are frequently used in image processing, image compression, pattern recognition and digital signal processing. This problem is more acute in reconfigurable systems since the compiler must generate the data path and the sequence of operations. The challenge is to intelligently exploit data reuse on the reconfigurable fabric (FPGA) to minimize the required memory or I/O bandwidth while maximizing parallelism. In this paper, we present a compile-time approach to reuse data in window-based codes. The compiler, called ROCCC, first analyzes and optimizes the window operation in C. It then computes the size of the hardware buffer and defines three sets of data values for each window: the window set, the managed set and the killed set. This compile-time analysis simplifies the HDL code generation and improves the resulting hardware performance. We also discuss in-place window operations.
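The payoff of data reuse in window operations can be shown with a one-dimensional example: consecutive windows overlap, so a small buffer lets each input be fetched from memory once instead of once per window that contains it. The sketch below is a generic software illustration of that idea under stated assumptions. It does not reproduce ROCCC's analysis or its window, managed, and killed sets, and the function names are made up here.

```python
from collections import deque


def sliding_window_sums(samples, width):
    """Compute every full-window sum while reading each input sample once.

    Returns (sums, reads), where reads counts fetches from the input stream.
    """
    buffer = deque(maxlen=width)  # holds only the values still needed
    sums = []
    reads = 0
    for value in samples:
        buffer.append(value)      # the oldest value drops out automatically
        reads += 1
        if len(buffer) == width:
            sums.append(sum(buffer))
    return sums, reads


def naive_reads(num_samples, width):
    """Fetches needed if every window reloads all of its inputs from memory."""
    return max(num_samples - width + 1, 0) * width


sums, reads = sliding_window_sums([1, 2, 3, 4, 5], 3)
print(sums)                 # [6, 9, 12]
print(reads)                # 5
print(naive_reads(5, 3))    # 9
```

The buffer here holds `width` values; for a two-dimensional window the same idea requires buffering whole rows, which is why the size of the hardware buffer becomes a quantity worth computing at compile time.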

Journal Article
TL;DR: Results demonstrate the benefits of the approach (achieving similar performance to a static configuration solution but using half of the resources) and show that the hardware configuration prefetch unit is useful in applications with a low level of parallelism.
Abstract: Dynamic scheduling for system-on-chip (SoC) platforms has become an important field of research due to the emerging range of applications with dynamic behavior (e.g., MPEG-4). Dynamically reconfigurable architectures are an interesting solution for this type of application. Scheduling for dynamically reconfigurable architectures might be classified in two major broad categories: (1) static scheduling techniques or (2) use of an operating system (OS) for reconfigurable computing. However, research efforts demonstrate a trend to move tasks traditionally assigned to the OS into hardware (thus increasing performance and reducing power). In this paper, we introduce a methodology for dynamically reconfigurable architectures. The dynamic scheduling of tasks to several reconfigurable units is performed by a hardware-based multitasking support unit. Two different versions of the microarchitecture are possible (with or without a hardware configuration prefetch unit). The dynamic scheduling algorithms are also explained. Both algorithms try to minimize the reconfiguration overhead by overlapping the execution of tasks with device reconfigurations. An exhaustive study (using the developed simulation and performance analysis framework) of this novel proposal is presented, and the effect of the microarchitecture parameters has been studied. Results demonstrate the benefits of our approach (achieving similar performance to a static configuration solution but using half of the resources). The hardware configuration prefetch unit is useful (i.e., minimizes the execution time) in applications with a low level of parallelism.
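The benefit of overlapping execution with reconfiguration can be estimated with a small timing model. The sketch below assumes a chain of tasks that must run one after another, a single configuration port, and at least two reconfigurable units, so that the next task can be loaded while the current one executes. It is an illustrative model with invented function names and sample times, not the scheduling algorithms or the simulation framework described in the paper.

```python
def total_time_no_prefetch(reconfig, execute):
    """Each task is loaded, then executed, strictly in sequence."""
    return sum(reconfig) + sum(execute)


def total_time_with_prefetch(reconfig, execute):
    """Load task i+1 while task i executes; wait for whichever finishes last."""
    if len(reconfig) != len(execute):
        raise ValueError("need one reconfiguration time per task")
    if not execute:
        return 0
    total = reconfig[0]  # the first load cannot be hidden behind anything
    for i in range(len(execute) - 1):
        # Execution of task i overlaps with the reconfiguration for task i+1.
        total += max(execute[i], reconfig[i + 1])
    return total + execute[-1]


# Hypothetical task times in arbitrary units.
reconfig = [4, 4, 4]
execute = [10, 2, 6]
print(total_time_no_prefetch(reconfig, execute))    # 30
print(total_time_with_prefetch(reconfig, execute))  # 24
```

In this model the overhead disappears only when a task runs at least as long as the next task's reconfiguration; the second task in the example is too short to hide the load that follows it.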

Journal ArticleDOI
TL;DR: In this paper, the state-of-the-art in reconfigurable hardware SAT satisfiers is presented, and the analysis and classification of existing systems has been performed according to such criteria as algorithmic issues, reconfiguration modes, the execution model, the programming model, logic capacity, and performance.
Abstract: By adapting to computations that are not so well-supported by general-purpose processors, reconfigurable systems achieve significant increases in performance. Such computational systems use high-capacity programmable logic devices and are based on processing units customized to the requirements of a particular application. A great deal of the research effort in this area is aimed at accelerating the solution of combinatorial optimization problems. Special attention in this context was given to the Boolean satisfiability (SAT) problem resulting in a considerable number of different architectures being proposed. This paper presents the state-of-the-art in reconfigurable hardware SAT satisfiers. The analysis and classification of existing systems has been performed according to such criteria as algorithmic issues, reconfiguration modes, the execution model, the programming model, logic capacity, and performance.

Journal ArticleDOI
TL;DR: The design and implementation of an OFDM receiver in the RaPiD reconfigurable architecture is presented as a case study for comparing the relative cost and performance of ASIC, DSP, FPGA, and coarse-grained reconfigurable architectures.
Abstract: Field-programmable gate arrays (FPGAs) have become an extremely popular implementation technology for custom hardware because they offer a combination of low cost and very fast turnaround. Because of their in-system reconfigurability, FPGAs have also been suggested as an efficient replacement for application-specific integrated circuits (ASICs) and digital signal processors (DSPs) for applications that require a combination of high performance, low cost, and flexibility. Unfortunately, the use of FPGAs in mobile embedded systems platforms is hampered by the very large overhead of FPGA-based architectures. Coarse-grained configurable architectures can reduce this overhead substantially by taking advantage of the application domain to specialize the reconfigurable architecture via coarse-grained components and interconnects. This paper presents the design and implementation of an OFDM receiver in the RaPiD reconfigurable architecture as a case study for comparing the relative cost and performance of ASIC, DSP, FPGA, and coarse-grained reconfigurable architectures. RaPiD is a coarse-grained reconfigurable architecture specialized to the domain of signal and image processing. The RaPiD architecture provides a reconfigurable pipelined datapath controlled by efficient reconfigurable control logic. We have implemented the computationally intensive parts of an OFDM receiver on the RaPiD architecture and have developed careful estimates of corresponding implementations in representative ASIC, DSP and FPGA technology. Our results show that, for this application, RaPiD fills the cost/performance gap between programmable DSP and ASIC architectures, achieving a factor of 6 better than a DSP implementation but a factor of 6 less than an ASIC implementation.

Proceedings ArticleDOI
22 Feb 2004
TL;DR: An FPGA implementation is presented to speed up the pseudo-2D FDTD algorithm, which is a simplified version of the 3D FDTD model and can be upgraded to 3D with limited modification of structure.
Abstract: Understanding and predicting electromagnetic behavior is needed more and more in modern technology. The Finite-Difference Time-Domain (FDTD) method is a powerful computational electromagnetic technique for modelling the electromagnetic space. The 3D FDTD buried object detection forward model is emerging as a useful application in mine detection and other subsurface sensing areas. However, the computation of this model is complex and time consuming. Implementing this algorithm in hardware will greatly increase its computational speed and widen its use in many other areas. We present an FPGA implementation to speed up the pseudo-2D FDTD algorithm, which is a simplified version of the 3D FDTD model. The pseudo-2D model can be upgraded to 3D with limited modification of structure. We implement the pseudo-2D FDTD model for layered media and complete boundary conditions on an FPGA. The computational speed on the reconfigurable hardware design is about 24 times faster than a software implementation on a 3.0GHz PC. The speedup is due to pipelining, parallelism, use of fixed point arithmetic, and careful memory architecture design.
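
For readers unfamiliar with FDTD, the kind of leapfrog update that such hardware pipelines accelerate can be sketched in one dimension. This is a textbook-style 1D free-space example in floating point, not the paper's pseudo-2D layered-media model or its fixed-point design; the grid size, the Gaussian source parameters, and the 0.5 update coefficient (a normalized time step at half the stability limit) are assumptions.

```python
import math

def fdtd_1d(cells=200, steps=150, source_cell=100):
    """Leapfrog update of normalized E and H fields on a 1D grid."""
    ez = [0.0] * cells  # electric field samples
    hy = [0.0] * cells  # magnetic field samples, staggered by half a cell
    for t in range(steps):
        # Update E from the spatial difference of H (ez[0] stays fixed at 0).
        for k in range(1, cells):
            ez[k] += 0.5 * (hy[k - 1] - hy[k])
        # Inject a Gaussian pulse at the source cell (assumed parameters).
        ez[source_cell] += math.exp(-0.5 * ((t - 40) / 12.0) ** 2)
        # Update H from the spatial difference of E.
        for k in range(cells - 1):
            hy[k] += 0.5 * (ez[k] - ez[k + 1])
    return ez, hy

ez, hy = fdtd_1d()
```

Each cell's update depends only on its nearest neighbours, which is what makes the inner loops natural candidates for the pipelining and parallelism the abstract credits for the speedup.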

Proceedings ArticleDOI
16 Feb 2004
TL;DR: The configurable logic architecture represents a good candidate for platforms that will support dynamic hardware/software partitioning, and enables ultra-fast desktop tools for hardware/software partitioning and even for fast configurable logic design in general.
Abstract: In previous work, we showed the benefits and feasibility of having a processor dynamically partition its executing software such that critical software kernels are transparently partitioned to execute as a hardware coprocessor on configurable logic - an approach we call warp processing. The configurable logic place and route step is the most computationally intensive part of such hardware/software partitioning, normally running for many minutes or hours on powerful desktop processors. In contrast, dynamic partitioning requires place and route to execute in just seconds and on a lean embedded processor. We have therefore designed a configurable logic architecture specifically for dynamic hardware/software partitioning. Through experiments with popular benchmarks, we show that by specifically focusing on the goal of software kernel speedup when designing the FPGA architecture, rather than on the more general goal of ASIC prototyping, we can perform place and route for our architecture 50 times faster, using 10,000 times less data memory, and 1,000 times less code memory, than popular commercial tools mapping to commercial configurable logic. Yet, we show that we obtain speedups (2x on average, and as much as 4x) and energy savings (33% on average, and up to 74%) when partitioning even just one loop, which are comparable to commercial tools and fabrics. Thus, our configurable logic architecture represents a good candidate for platforms that will support dynamic hardware/software partitioning, and enables ultra-fast desktop tools for hardware/software partitioning, and even for fast configurable logic design in general.

Proceedings ArticleDOI
07 Nov 2004
TL;DR: This paper proposes a novel configuration compression technique that exploits redundancies both within a configuration's bitstream as well as between bitstreams of multiple configurations, and is the first work that performs inter-configuration compression.
Abstract: Field programmable gate arrays (FPGAs) hold the possibility of dynamic reconfiguration. The key advantages of dynamic reconfiguration are the ability to rapidly adapt to dynamic changes and better utilization of the programmable hardware resources for multiple applications. However, with the advent of multi-million gate equivalent FPGAs, configuration time is increasingly becoming a concern. High reconfiguration cost can potentially wipe out any gains from dynamic reconfiguration. One solution to alleviate this problem is to exploit the high levels of redundancy in the configuration bitstream by compression. In this paper, we propose a novel configuration compression technique that exploits redundancies both within a configuration's bitstream as well as between bitstreams of multiple configurations. By maximizing reuse, our results show that the proposed technique performs 26.5-75.8% better than the previously proposed techniques. To the best of our knowledge, ours is the first work that performs inter-configuration compression.
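
The difference between intra-configuration and inter-configuration redundancy can be shown with a small sketch. The abstract does not describe the paper's actual encoding, so the scheme below is a swapped-in illustration: run-length encoding stands in for intra-configuration compression, and an XOR delta against a reference bitstream stands in for inter-configuration reuse. All names and the toy byte strings are hypothetical.

```python
def rle_encode(data):
    """Run-length encode bytes into (value, count) pairs."""
    runs = []
    for byte in data:
        if runs and runs[-1][0] == byte:
            runs[-1][1] += 1
        else:
            runs.append([byte, 1])
    return [(byte, count) for byte, count in runs]

def rle_decode(runs):
    """Expand (value, count) pairs back into bytes."""
    return bytes(byte for byte, count in runs for _ in range(count))

def xor_bytes(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

# Two toy "bitstreams" that differ in a single byte.
reference = bytes([0xAA] * 8 + [0x0F] * 8)
config = bytes([0xAA] * 8 + [0x0F] * 7 + [0x1F])

intra = rle_encode(config)                        # redundancy inside one stream
inter = rle_encode(xor_bytes(config, reference))  # redundancy between streams
restored = xor_bytes(rle_decode(inter), reference)
```

Here the delta against the reference collapses to two runs while the standalone encoding needs three, and decoding the delta and XORing it with the reference recovers the original configuration.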

Journal Article
TL;DR: In this paper, the authors present a runtime environment that partially reconfigures and executes hardware tasks on Xilinx Virtex. The FPGA's reconfigurable surface is split into a varying number of variable-sized vertical task slots that can accommodate the hardware tasks.
Abstract: We present a runtime environment that partially reconfigures and executes hardware tasks on Xilinx Virtex. To that end, the FPGA's reconfigurable surface is split into a varying number of variable-sized vertical task slots that can accommodate the hardware tasks. A bus-based communication infrastructure allows for task communication and I/O. We discuss the design of the runtime system and its prototype implementation on a reconfigurable board architecture that was specifically tailored to reconfigurable hardware operating system research.

Proceedings ArticleDOI
26 Apr 2004
TL;DR: To study single event upset (SEU) impact on signal processing applications, a novel fault injection technique to corrupt configuration bits is used, thereby simulating SEU faults and highlighting the benefits of dynamic reconfiguration for space-based reconfigurable computing.
Abstract: Summary form only given. We describe novel methods of exploiting the partial, dynamic reconfiguration capabilities of Xilinx Virtex 1000 FPGAs to manage transient faults due to radiation in space environments. The on-orbit fault detection scheme uses a radiation-hardened reconfiguration controller to continuously monitor the configuration bit streams of 9 Virtex FPGAs and to correct errors by partial, dynamic reconfiguration of the FPGAs while they continue to execute. To study single event upset (SEU) impact on our signal processing applications, we use a novel fault injection technique to corrupt configuration bits, thereby simulating SEU faults. By using dynamic reconfiguration, we can run the corrupted designs directly on the FPGA hardware, giving many orders of magnitude speed-up over purely software techniques. The fault injection method has been validated against proton beam testing, showing 97.6% agreement. Our work highlights the benefits of dynamic reconfiguration for space-based reconfigurable computing.
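
The detect-and-repair loop described in this abstract can be modelled in a few lines of software. This is only a conceptual sketch: the helper names, the 4-byte frame size, and the byte-array model of configuration memory are assumptions, and nothing here reflects the actual Virtex frame format or the radiation-hardened controller's interface.

```python
def inject_seu(config, bit_index):
    """Flip one configuration bit in place to mimic a single event upset."""
    config[bit_index // 8] ^= 1 << (bit_index % 8)

def scrub(config, golden, frame_size):
    """Compare each frame with the golden copy and rewrite only bad frames."""
    repaired = []
    for start in range(0, len(golden), frame_size):
        end = start + frame_size
        if config[start:end] != golden[start:end]:
            config[start:end] = golden[start:end]  # partial rewrite of one frame
            repaired.append(start // frame_size)
    return repaired

golden = bytes(range(16))        # hypothetical fault-free configuration
config = bytearray(golden)       # live configuration memory (modelled)
inject_seu(config, 37)           # corrupts bit 5 of byte 4, i.e. frame 1
repaired_frames = scrub(config, golden, frame_size=4)
```

Only the frame containing the flipped bit is rewritten, mirroring the idea of correcting errors by partial reconfiguration while the rest of the design is left untouched.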

Book ChapterDOI
30 Aug 2004
TL;DR: A runtime environment that partially reconfigures and executes hardware tasks on Xilinx Virtex by splitting the FPGA’s reconfigurable surface into a varying number of variable-sized vertical task slots that can accommodate the hardware tasks.
Abstract: We present a runtime environment that partially reconfigures and executes hardware tasks on Xilinx Virtex. To that end, the FPGA’s reconfigurable surface is split into a varying number of variable-sized vertical task slots that can accommodate the hardware tasks. A bus-based communication infrastructure allows for task communication and I/O. We discuss the design of the runtime system and its prototype implementation on a reconfigurable board architecture that was specifically tailored to reconfigurable hardware operating system research.

Journal ArticleDOI
TL;DR: It is shown that networks-on-chip (NoC) are an ideal communication layer for dynamically reconfigurable SoCs, and how the OS provides run-time support for dynamic task relocation is explained.

Proceedings ArticleDOI
07 Nov 2004
TL;DR: This work presents a tree-based data structure, called T-trees, to represent the spatial and temporal relations among tasks, develops an efficient packing method, and derives the condition that ensures the satisfaction of precedence constraints, which model the temporal ordering among tasks induced by the execution of dynamically reconfigurable FPGAs.
Abstract: Improving logic capacity by time-sharing, dynamically reconfigurable FPGAs are employed to handle designs of high complexity and functionality. We model each task as a 3D-box and deal with the temporal floorplanning/placement problem for dynamically reconfigurable FPGA architectures. We present a tree-based data structure, called T-trees, to represent the spatial and temporal relations among tasks. Each node in a T-tree has at most three children which represent the dimensional relationship among tasks. For the T-tree, we develop an efficient packing method and derive the condition to ensure the satisfaction of precedence constraints which model the temporal ordering among tasks induced by the execution of dynamically reconfigurable FPGAs. Experimental results show that our tree-based formulation can achieve significantly better solution quality with less execution time than the most recent state-of-the-art work.