Showing papers on "Reconfigurable computing" published in 2007


Book
02 Nov 2007
TL;DR: This book is intended as an introduction to the entire range of issues important to reconfigurable computing, using FPGAs as the context, or "computing vehicles" to implement this powerful technology.
Abstract: The main characteristic of Reconfigurable Computing is the presence of hardware that can be reconfigured to implement specific functionality more suitable for specially tailored hardware than on a simple uniprocessor. Reconfigurable computing systems join microprocessors and programmable hardware in order to take advantage of the combined strengths of hardware and software and have been used in applications ranging from embedded systems to high performance computing. Many of the fundamental theories have been identified and used by the Hardware/Software Co-Design research field. Although the same background ideas are shared in both areas, they have different goals and use different approaches. This book is intended as an introduction to the entire range of issues important to reconfigurable computing, using FPGAs as the context, or "computing vehicles" to implement this powerful technology. It will take a reader with a background in the basics of digital design and software programming and provide them with the knowledge needed to be an effective designer or researcher in this rapidly evolving field.
· Treatment of FPGAs as computing vehicles rather than glue-logic or ASIC substitutes
· Views of FPGA programming beyond Verilog/VHDL
· Broad set of case studies demonstrating how to use FPGAs in novel and efficient ways

531 citations


Book
01 Jan 2007
TL;DR: Digital Hardware Evolution.
Abstract: Digital Hardware Evolution.- An Online EHW Pattern Recognition System Applied to Sonar Spectrum Classification.- Design of Electronic Circuits Using a Divide-and-Conquer Approach.- Implementing Multi-VRC Cores to Evolve Combinational Logic Circuits in Parallel.- An Intrinsic Evolvable Hardware Based on Multiplexer Module Array.- Estimating Array Connectivity and Applying Multi-output Node Structure in Evolutionary Design of Digital Circuits.- Research on the Online Evaluation Approach for the Digital Evolvable Hardware.- Research on Multi-objective On-Line Evolution Technology of Digital Circuit Based on FPGA Model.- Evolutionary Design of Generic Combinational Multipliers Using Development.- Analog Hardware Evolution.- Automatic Synthesis of Practical Passive Filters Using Clonal Selection Principle-Based Gene Expression Programming.- Research on Fault-Tolerance of Analog Circuits Based on Evolvable Hardware.- Analog Circuit Evolution Based on FPTA-2.- Bio-inspired Systems.- Knowledge Network Management System with Medicine Self Repairing Strategy.- Design of a Cell in Embryonic Systems with Improved Efficiency and Fault-Tolerance.- Design on Operator-Based Reconfigurable Hardware Architecture and Cell Circuit.- Bio-inspired Systems with Self-developing Mechanisms.- Development of a Tiny Computer-Assisted Wireless EEG Biofeedback System.- Steps Forward to Evolve Bio-inspired Embryonic Cell-Based Electronic Systems.- Evolution of Polymorphic Self-checking Circuits.- Mechanical Hardware Evolution.- Sliding Algorithm for Reconfigurable Arrays of Processors.- System-Level Modeling and Multi-objective Evolutionary Design of Pipelined FFT Processors for Wireless OFDM Receivers.- Reducing the Area on a Chip Using a Bank of Evolved Filters.- Evolutionary Design.- Walsh Function Systems: The Bisectional Evolutional Generation Pattern.- Extrinsic Evolvable Hardware on the RISA Architecture.- Evolving and Analysing "Useful" Redundant Logic.- Adaptive Transmission 
Technique in Underwater Acoustic Wireless Communication.- Autonomous Robot Path Planning Based on Swarm Intelligence and Stream Functions.- Research on Adaptive System of the BTT-45 Air-to-Air Missile Based on Multilevel Hierarchical Intelligent Controller.- The Design of an Evolvable On-Board Computer.- Evolutionary Algorithms in Hardware Design.- Extending Artificial Development: Exploiting Environmental Information for the Achievement of Phenotypic Plasticity.- UDT-Based Multi-objective Evolutionary Design of Passive Power Filters of a Hybrid Power Filter System.- Designing Electronic Circuits by Means of Gene Expression Programming II.- Designing Polymorphic Circuits with Evolutionary Algorithm Based on Weighted Sum Method.- Robust and Efficient Multi-objective Automatic Adjustment for Optical Axes in Laser Systems Using Stochastic Binary Search Algorithm.- Minimization of the Redundant Sensor Nodes in Dense Wireless Sensor Networks.- Evolving in Extended Hamming Distance Space: Hierarchical Mutation Strategy and Local Learning Principle for EHW.- Hardware Implementation of Evolutionary Algorithms.- Adaptive and Evolvable Analog Electronics for Space Applications.- Improving Flexibility in On-Line Evolvable Systems by Reconfigurable Computing.- Evolutionary Design of Resilient Substitution Boxes: From Coding to Hardware Implementation.- A Sophisticated Architecture for Evolutionary Multiobjective Optimization Utilizing High Performance DSP.- FPGA-Based Genetic Algorithm Kernel Design.- Using Systolic Technique to Accelerate an EHW Engine for Lossless Image Compression.

231 citations


Journal ArticleDOI
TL;DR: This survey paper compares native double precision solvers with emulated- and mixed-precision solvers of linear systems of equations as they typically arise in finite element discretisations and concludes that the mixed precision approach works very well with the parallel co-processors gaining speedup factors and area savings, while maintaining the same accuracy as a reference solver executing everything in double precision.
Abstract: In this survey paper, we compare native double precision solvers with emulated- and mixed-precision solvers of linear systems of equations as they typically arise in finite element discretisations. The emulation utilises two single float numbers to achieve higher precision, while the mixed precision iterative refinement computes residuals and updates the solution vector in double precision but solves the residual systems in single precision. Both techniques have been known since the 1960s, but little attention has been devoted to their performance aspects. Motivated by changing paradigms in processor technology and the emergence of highly-parallel devices with outstanding single float performance, we adapt the emulation and mixed precision techniques to coupled hardware configurations, where the parallel devices serve as scientific co-processors. The performance advantages are examined with respect to speedups over a native double precision implementation (time aspect) and reduced area requirements for a chip (space aspect). The paper begins with an overview of the theoretical background, algorithmic approaches and suitable hardware architectures. We then employ several conjugate gradient (CG) and multigrid solvers and study their behaviour for different parameter settings of the iterative refinement technique. Concrete speedup factors are evaluated on the coupled hardware configuration of a general-purpose CPU and a graphics processor. The dual performance aspect of potential area savings is assessed on a field programmable gate array (FPGA). In the last part, we test the applicability of the proposed mixed precision schemes with ill-conditioned matrices. We conclude that the mixed precision approach works very well with the parallel co-processors gaining speedup factors of four to five, and area savings of three to four, while maintaining the same accuracy as a reference solver executing everything in double precision.
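
The mixed precision iterative refinement described in this abstract can be illustrated with a small sketch. It is not the paper's implementation: the paper uses CG and multigrid solvers on a GPU or FPGA, whereas this sketch assumes a tiny dense system, swaps in Gaussian elimination as the inner solver, and emulates single precision on the CPU by rounding through `struct`. The names `f32`, `solve_single`, and `refine` are hypothetical.

```python
import struct

def f32(x):
    """Round a Python float (double) to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def solve_single(A, b):
    """Inner solver: Gaussian elimination with partial pivoting, with every
    intermediate result rounded to single precision (stand-in for the co-processor)."""
    n = len(b)
    M = [[f32(v) for v in row] + [f32(rhs)] for row, rhs in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))  # pivot row
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            m = f32(M[i][k] / M[k][k])
            for j in range(k, n + 1):
                M[i][j] = f32(M[i][j] - f32(m * M[k][j]))
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = M[i][n]
        for j in range(i + 1, n):
            s = f32(s - f32(M[i][j] * x[j]))
        x[i] = f32(s / M[i][i])
    return x

def refine(A, b, iters=10):
    """Outer loop: residual and solution update in double precision,
    correction solved in (emulated) single precision."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]  # double
        d = solve_single(A, r)                                                # single
        x = [x[i] + d[i] for i in range(n)]                                   # double
    return x
```

For a well-conditioned system, `refine(A, b)` should approach double-precision accuracy even though every inner solve is single precision. A practical version would factorise the matrix once and stop on a residual tolerance instead of a fixed iteration count.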

166 citations


Journal ArticleDOI
TL;DR: The challenge is identifying the design techniques that can extract high performance potential from the FPGA fabric.
Abstract: Numerous application areas, including bioinformatics and computational biology, demand increasing amounts of processing capability. In many cases, the computation cores and data types are suited to field-programmable gate arrays. The challenge is identifying the design techniques that can extract high performance potential from the FPGA fabric.

154 citations


Journal ArticleDOI
02 Apr 2007
TL;DR: A new adaptive software/hardware reconfigurable system is presented in this paper, using a real application in the automotive domain implemented on a Xilinx Virtex-II 3000 FPGA to present results.
Abstract: Today's field programmable gate array (FPGA) architectures, like Xilinx's Virtex-II series, enable partial and dynamic run-time self-reconfiguration. This feature allows the substitution of parts of a hardware design implemented on this reconfigurable hardware, and therefore, a system can be adapted to the actual demands of applications running on the chip. Exploiting this possibility enables the development of adaptive hardware for a huge variety of applications. A novel method for communication interfaces using look up table (LUT)-based communication primitives enables an exact separation of reconfigurable parts and a fast and intelligent bus-system. A new adaptive software/hardware reconfigurable system is presented in this paper, using a real application in the automotive domain implemented on a Xilinx Virtex-II 3000 FPGA to present results.

139 citations


Book
01 Jan 2007

134 citations


Proceedings ArticleDOI
05 Aug 2007
TL;DR: The machine itself - Maxwell - its hardware and software environment is described and very early benchmark results from runs of the demonstrators are presented.
Abstract: We present the initial results from the FHPCA Supercomputer project at the University of Edinburgh. The project has successfully built a general-purpose 64 FPGA computer and ported to it three demonstration applications from the oil, medical and finance sectors. This paper describes in brief the machine itself - Maxwell - its hardware and software environment and presents very early benchmark results from runs of the demonstrators.

124 citations


Proceedings ArticleDOI
20 May 2007
TL;DR: This work proposes an isolation primitive, moats and drawbridges, built around four design properties: logical isolation, interconnect traceability, secure reconfigurable broadcast, and configuration scrubbing. Each is a fundamental operation with easily understood formal properties, yet maps cleanly and efficiently to a wide variety of reconfigurable devices.
Abstract: Blurring the line between software and hardware, reconfigurable devices strike a balance between the raw high speed of custom silicon and the post-fabrication flexibility of general-purpose processors. While this flexibility is a boon for embedded system developers, who can now rapidly prototype and deploy solutions with performance approaching custom designs, this results in a system development methodology where functionality is stitched together from a variety of "soft IP cores," often provided by multiple vendors with different levels of trust. Unlike traditional software where resources are managed by an operating system, soft IP cores necessarily have very fine grain control over the underlying hardware. To address this problem, the embedded systems community requires novel security primitives which address the realities of modern reconfigurable hardware. We propose an isolation primitive, moats and drawbridges, that are built around four design properties: logical isolation, interconnect traceability, secure reconfigurable broadcast, and configuration scrubbing. Each of these is a fundamental operation with easily understood formal properties, yet maps cleanly and efficiently to a wide variety of reconfigurable devices. We carefully quantify the required overheads on real FPGAs and demonstrate the utility of our methods by applying them to the practical problem of memory protection.

122 citations


Journal ArticleDOI
TL;DR: New fault-tolerant techniques for FPGA logic blocks are presented, developed as part of the roving self-test areas (STARs) approach to online testing, diagnosis, and reconfiguration.
Abstract: Most adaptive computing systems use reconfigurable hardware in the form of field programmable gate arrays (FPGAs). For these systems to be fielded in harsh environments where high reliability and availability are a must, the applications running on the FPGAs must tolerate hardware faults that may occur during the lifetime of the system. In this paper, we present new fault-tolerant techniques for FPGA logic blocks, developed as part of the roving self-test areas (STARs) approach to online testing, diagnosis, and reconfiguration. Our techniques can handle large numbers of faults (we show tolerance of over 100 logic faults via actual implementation on an FPGA consisting of a 20 × 20 array of logic blocks). A key novel feature is the reuse of defective logic blocks to increase the number of effective spares and extend the mission life. To increase fault tolerance, we not only use nonfaulty parts of defective or partially faulty logic blocks, but we also use faulty parts of defective logic blocks in nonfaulty modes. By using and reusing faulty resources, our multilevel approach extends the number of tolerable faults beyond the number of currently available spare logic resources. Unlike many column, row, or tile-based methods, our multilevel approach can tolerate not only faults that are evenly distributed over the logic area, but also clusters of faults in the same local area. Furthermore, system operation is not interrupted for fault diagnosis or for computing fault-bypassing configurations. Our fault tolerance techniques have been implemented using ORCA 2C series FPGAs which feature incremental dynamic runtime reconfiguration.

112 citations


Journal ArticleDOI
01 Apr 2007
TL;DR: The development of a new FPGA-based reconfigurable computer called the Erlangen Slot Machine, which overcomes many architectural constraints of existing platforms and allows a user to partially reconfigure hardware modules arranged in so-called slots.
Abstract: Computer architects have been studying the dynamically reconfigurable computer (Schaumont, Verbauwhede, Keutzer, and Sarrafzadeh, "A Quick Safari through the Reconfiguration Jungle," in Proc. of the 38th Design Automation Conference, Las Vegas, pp. 127-177, 2001) for a number of years. New capabilities such as on-demand computing power, self-adaptiveness and self-optimization by restructuring the hardware on the fly at run-time are seen as a driving technology factor for current research initiatives such as autonomic (Kephart and Chess, Computer, 36:41-52, 2003; IBM Autonomic Computing Initiative, (http://www.research.ibm.com/autonomic/)) and organic computing (Muller-Schloer, von der Malsburg, and Wurtz, Inform.-Spektrum, 27:332-336, 2004; The Organic Computing Page, (http://www.organic-computing.org)). Much research work is currently devoted to models for partial hardware module relocation (SPP1148 Reconfigurable Computing Priority Program, (http://www12.informatik.uni-erlangen.de/spprr/)) and dynamic hardware reconfiguration on, e.g., FPGA-based platforms. However, there are many physical restrictions and technical problems limiting the scope or applicability of these approaches. This led us to the development of a new FPGA-based reconfigurable computer called the Erlangen Slot Machine. The architecture overcomes many architectural constraints of existing platforms and allows a user to partially reconfigure hardware modules arranged in so-called slots. The uniqueness of this computer stems from (a) a new slot-oriented hardware architecture, (b) a set of novel inter-module communication paradigms, and (c) concepts for dynamic and partial reconfiguration management.

107 citations


Journal ArticleDOI
TL;DR: This book is an excellent state-of-the-art review of RC and would be a worthwhile acquisition by anyone seriously considering speeding up a specific application.
Abstract: Reconfigurable Computing: Accelerating Computation with Field-Programmable Gate Arrays, by Maya B. Gokhale and Paul S. Graham. Springer, 2005, 238 pp. ISBN-13 978-0387-26105-8, $87.20. Reconfigurable Computing: Accelerating Computation with Field-Programmable Gate Arrays is an expository and easy-to-digest book. The authors are recognized leaders with many years of experience in the field of reconfigurable computing. The book is written so that non-specialists can understand the principles, techniques and algorithms. Each chapter has many excellent references for interested readers. It surveys methods, algorithms, programming languages and applications targeted to reconfigurable computing. Automatic generation of parallel code from a sequential program on conventional microprocessor architectures remains an open problem. Nevertheless, a wide range of computationally intensive applications have benefited from many tools developed to tackle such a problem. For RC, it is an even harder problem (perhaps 10x and up) and intense research is being devoted to making RC a commonplace practical tool. The aim of the authors is threefold. First, guide the readers to know current issues on HLL for RC. Second, help the readers understand the intricate process of algorithmic-to-hardware compilation. And third, show that, even though this process is painful, if the application is suitable for RC the gains in performance are huge. The book is divided into two parts. The first part contains four chapters about reconfigurable computing and languages. Chapter 1 presents an introduction to RC, contrasting conventional fixed instruction microprocessors with RC architectures. This chapter also contains comprehensive reference material for further reading. Chapter 2 introduces reconfigurable logic devices by explaining the basic architecture and configuration of FPGAs. Chapter 3 deals with RC systems by discussing how parallel processing is achieved on reconfigurable computers and also gives a survey of RC systems today. Then, in Chapter 4, languages, compilation, debugging and their related manual vs. automatic issues are discussed. The second part of the book comprises five chapters about applications of RC. Chapters 5 and 6 discuss digital signal and image processing applications. Chapter 7 covers the application of RC to secure network communications. The aim of Chapter 8 is to discuss some important bioinformatics applications for which RC is a good candidate, their algorithmic problems and hardware implementations. Finally, Chapter 9 covers two applications of reconfigurable supercomputers. The first one is a simulation of radiative heat transfer and the second one models large urban road traffic. This book is neither a technical nor a text book, but in the opinion of this reviewer, it is an excellent state-of-the-art review of RC and would be a worthwhile acquisition by anyone seriously considering speeding up a specific application. On the downside, it is somewhat disappointing that the book does not contain more information about HLL tools that could be used to help close the gap between the traditional HPC community and the raw computing power of RC. Edusmildo Orozco, Department of Computer Science, University of Puerto Rico at Rio Piedras.

Proceedings ArticleDOI
18 Feb 2007
TL;DR: This paper systematically extends the concept of checkpointing known from software systems to hardware tasks running on reconfigurable devices and reveals a tool that takes over the burden of modifying hardware modules for checkpointing.
Abstract: Progress in reconfigurable hardware technology allows the implementation of complete SoCs in today's FPGAs. In the context of design for reliability, software checkpointing is an effective methodology to cope with faults. In this paper, we systematically extend the concept of checkpointing known from software systems to hardware tasks running on reconfigurable devices. We will classify different mechanisms for hardware checkpointing and present formulas for estimating the hardware overhead. Moreover, we will reveal a tool that takes over the burden of modifying hardware modules for checkpointing. Post-synthesis results of applying our methodology to different hardware accelerators will be presented and the results will be compared with the theoretical estimations.

Proceedings ArticleDOI
23 Apr 2007
TL;DR: A new platform for reconfigurable computing has an object-based programming model, with architecture, silicon and tools designed to faithfully realize this model, aimed at application developers using software languages and methodologies.
Abstract: A new platform for reconfigurable computing has an object-based programming model, with architecture, silicon and tools designed to faithfully realize this model. The platform is aimed at application developers using software languages and methodologies. Its objectives are massive performance, long-term scalability, and easy development. In our structural object programming model, objects are strictly encapsulated software programs running concurrently on an asynchronous array of processors and memories. They exchange data and control through a structure of self-synchronizing asynchronous channels. Objects are combined hierarchically to create new objects, connected through the common channel interface. The first chip is a 130nm ASIC with 360 32-bit processors, 360 1KB RAM banks with access engines, and a configurable word-wide channel interconnect. Applications written in Java and block diagrams compile in one minute. Sub-millisecond runtime reconfiguration is inherent.

BookDOI
06 Dec 2007
TL;DR: The basic concepts and building blocks for the design of Fine- (or FPGA) and Coarse-Grain Reconfigurable Architectures are given and a new classification according to microcoded architectural criteria is described.
Abstract: Fine- and Coarse-Grain Reconfigurable Computing gives the basic concepts and building blocks for the design of Fine- (or FPGA) and Coarse-Grain Reconfigurable Architectures. Recently developed integrated architecture design and software-supported design flows for FPGA and coarse-grain reconfigurable architectures are also described. The book is accompanied by an interactive CD which includes case studies and lab projects for the design of FPGA and coarse-grain architectures based on the European funded projects AMDREL and MOLEN, respectively. Part I consists of two extensive surveys of FPGA and Coarse-Grain Reconfigurable Architectures: The FPGA technology is defined, which includes architecture, logic block structure, interconnect, and configuration methods, as well as existing fine-grain reconfigurable architectures that emerged from both academia and industry. Additionally, the implementation techniques and CAD tools developed by industry and academia to facilitate the implementation of a system in reconfigurable hardware are provided. In addition, the features, advantages and limitations of coarse-grain reconfigurable systems, the specific issues that should be addressed during the design phase, as well as representative existing coarse-grain reconfigurable systems are explained. In Part II, case studies, innovative research results about reconfigurable architectures and design frameworks from three projects (AMDREL, MOLEN and ADRES&DRESC), and a new classification according to microcoded architectural criteria are described. Fine- and Coarse-Grain Reconfigurable Computing is an essential reference for researchers and professionals and can be used as a textbook by undergraduate and graduate students and professors.

Journal ArticleDOI
TL;DR: The Trident open source compiler translates C code to a hardware circuit description, providing designers with extreme flexibility in prototyping reconfigurable supercomputers.
Abstract: Unlocking the potential of field-programmable gate arrays requires compilers that translate algorithmic high-level language code into hardware circuits. The Trident open source compiler translates C code to a hardware circuit description, providing designers with extreme flexibility in prototyping reconfigurable supercomputers.

Journal ArticleDOI
TL;DR: This paper studies designs for floating-point matrix multiplication, a fundamental kernel in a number of scientific applications, on reconfigurable computing systems, and proposes three parameterized algorithms which can be tuned according to the problem size and the available hardware resources.
Abstract: The abundant hardware resources on current reconfigurable computing systems provide new opportunities for high-performance parallel implementations of scientific computations. In this paper, we study designs for floating-point matrix multiplication, a fundamental kernel in a number of scientific applications, on reconfigurable computing systems. We first analyze design trade-offs in implementing this kernel. These trade-offs are caused by the inherent parallelism of matrix multiplication and the resource constraints, including the number of configurable slices, the size of on-chip memory, and the available memory bandwidth. We propose three parameterized algorithms which can be tuned according to the problem size and the available hardware resources. Our algorithms employ a linear array architecture with simple control logic. This architecture effectively utilizes the available resources and reduces routing complexity. The processing elements (PEs) used in our algorithms are modular so that it is easy to embed floating-point units into them. Experimental results on a Xilinx Virtex-II Pro XC2VP100 show that our algorithms achieve good scalability and high sustained GFLOPS performance. We also implement our algorithms on a Cray XD1. The XD1 is a high-end reconfigurable computing system that employs both general-purpose processors and reconfigurable devices. Our algorithms achieve a sustained performance of 2.06 GFLOPS on a single node of the XD1.

Proceedings ArticleDOI
12 Nov 2007
TL;DR: This work allocates a sandbox region in which modules from a library can be flexibly placed and interconnected, insulating the designer from reconfiguration details.
Abstract: In systems typified by software defined radio, existing flows for run-time FPGA reconfiguration limit resource efficiency when constructing a variety of datapaths. Our approach allocates a sandbox region in which modules from a library can be flexibly placed and interconnected. An efficient run-time framework makes use of lightweight placement and routing techniques to respond on-demand to application requests. Compile time tools automate the task of adding interface wrappers to modules, insulating the designer from reconfiguration details.

Journal ArticleDOI
TL;DR: A field-programmable gate array implementation of a molecular dynamics simulation method reduces the microprocessor time-to-solution by a factor of three while using only high-level languages.
Abstract: A field-programmable gate array implementation of a molecular dynamics simulation method reduces the microprocessor time-to-solution by a factor of three while using only high-level languages. The application speedup on FPGA devices increases with the problem size. The authors use a performance model to analyze the potential of simulating large-scale biological systems faster than many cluster-based supercomputing platforms.

Proceedings ArticleDOI
02 Nov 2007
TL;DR: This paper proposes a reconfigurable (hardware) architecture with TC functionalities where it focuses on TPMs proposed by the TCG specifically designed for embedded platforms.
Abstract: Trusted Computing (TC) is an emerging technology towards building trustworthy computing platforms. The Trusted Computing Group (TCG) has proposed several specifications to implement TC functionalities by extensions to common computing platforms, particularly the underlying hardware with a Trusted Platform Module (TPM). However, actual TPMs are mostly available for workstations and servers nowadays and rather for specific domain applications and not primarily for embedded systems. Further, the TPM specifications are becoming monolithic and more complex while the applications demand a scalable and flexible usage of TPM functionalities. In this paper we propose a reconfigurable (hardware) architecture with TC functionalities where we focus on TPMs as proposed by the TCG specifically designed for embedded platforms. Our approach allows for (i) an efficient and scalable design and update of TPM functionalities, in particular for hardware-based crypto engines and accelerators, (ii) establishing a minimal trusted computing base in hardware, (iii) including the TPM as well as its functionalities into the chain of trust that enables to bind sensitive data to the underlying reconfigurable hardware, and (iv) designing a manufacturer independent TPM. We discuss possible implementations based on current FPGAs and point out the associated challenges, in particular with respect to protection of the internal TPM state since it must not be subject to manipulation, replay, and cloning.

Journal ArticleDOI
TL;DR: This paper identifies two basic methods for designing serial reduction circuits: the tree-traversal method and the striding method and proposes high-performance and area-efficient designs using each method.
Abstract: Field-programmable gate arrays (FPGAs) have become an attractive option for accelerating scientific applications. Many scientific operations such as matrix-vector multiplication and dot product involve the reduction of a sequentially produced stream of values. Unfortunately, because of the pipelining in FPGA-based floating-point units, data hazards may occur during these sequential reduction operations. Improperly designed reduction circuits can adversely impact the performance, impose unrealistic buffer requirements, and consume a significant portion of the FPGA. In this paper, we identify two basic methods for designing serial reduction circuits: the tree-traversal method and the striding method. Using accumulation as an example, we analyze the design trade-offs among the number of adders, buffer size, and latency. We then propose high-performance and area-efficient designs using each method. The proposed designs reduce multiple sets of sequentially delivered floating-point values without stalling the pipeline or imposing unrealistic buffer requirements. Using a Xilinx Virtex-II Pro FPGA as the target device, we implemented our designs and present performance and area results.
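
The data hazard mentioned in this abstract arises because a pipelined adder cannot consume its own result on the next cycle. A generic way to picture a striding-style fix is the software model below. It is only a sketch of the general idea (interleaving independent partial sums), not the circuit proposed in the paper. The function name `strided_sum` and the `latency` parameter (standing in for the adder pipeline depth) are assumptions for illustration.

```python
def strided_sum(values, latency=4):
    """Accumulate a stream using `latency` interleaved partial sums.

    Input i is added to partial sum (i mod latency), so an addition never
    depends on a result issued fewer than `latency` steps earlier. The
    partial sums are then combined pairwise, tree style.
    """
    partial = [0.0] * latency
    for i, v in enumerate(values):
        partial[i % latency] += v  # independent of the previous latency-1 adds

    # Final combine: halve the number of partial sums each round.
    while len(partial) > 1:
        partial = [sum(partial[i:i + 2]) for i in range(0, len(partial), 2)]
    return partial[0]
```

Because floating-point addition is not associative, the result of `strided_sum` can differ in the last bits from a strictly sequential sum of the same values.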

Proceedings ArticleDOI
04 Jun 2007
TL;DR: The RISPP approach advances the extensible processor concept by providing flexibility through runtime adaptation by what is called "instruction rotation", which allows sharing resources in a highly flexible scheme of compatible components (called atoms and molecules).
Abstract: Adaptation in embedded processing is key in order to address efficiency. The concept of extensible embedded processors works well if a few a-priori known hot spots exist. However, they are far less efficient if many hot spots, possibly unknown at design time, need to be dealt with. Our RISPP approach advances the extensible processor concept by providing flexibility through runtime adaptation by what we call "instruction rotation". It allows sharing resources in a highly flexible scheme of compatible components (called atoms and molecules). As a result, we achieve high speed-ups at moderate additional hardware. Furthermore, we can dynamically trade off between area and speed-up through runtime adaptation. We present the main components of our platform and discuss them by means of an H.264 video codec.

Proceedings ArticleDOI
16 Apr 2007
TL;DR: A reconfigurable hardware architecture for the acceleration of video-based driver assistance applications in future automotive systems that makes use of the partial dynamic reconfiguration capabilities of Xilinx Virtex FPGAs.
Abstract: In this paper we show a reconfigurable hardware architecture for the acceleration of video-based driver assistance applications in future automotive systems. The concept is based on a separation of pixel-level operations and high level application code. Pixel-level operations are accelerated by coprocessors, whereas high level application code is implemented fully programmable on standard PowerPC CPU cores to allow flexibility for new algorithms. In addition, the application code is able to dynamically reconfigure the coprocessors available on the system, allowing for a much larger set of hardware accelerated functionality than would normally fit onto a device. This process makes use of the partial dynamic reconfiguration capabilities of Xilinx Virtex FPGAs.

01 Jan 2007
TL;DR: The goal of this work is to explore and demonstrate the feasibility of providing a systematic and easy to understand view into reconfigurable computers through OS support without incurring significant performance penalties, and to observe increased productivity among high-level application developers who have little experience in FPGA application design.
Abstract: Reconfigurable computing is a promising technology to meet future computational demand by leveraging flexibilities and the high degree of parallelism found in reconfigurable hardware fabrics, such as field programmable gate arrays (FPGAs). However, despite the promising performance that researchers have demonstrated, reconfigurable computers are yet to be widely adopted. One reason is the lack of a common and intuitive operating system for these platforms. This dissertation work explores the design and implementation trade-offs of an operating system for FPGA-based reconfigurable computers, BORPH, the Berkeley Operating system for ReProgrammable Hardware. The goal of this work is to explore and demonstrate the feasibility of providing a systematic and easy to understand view into reconfigurable computers through OS support without incurring significant performance penalties. BORPH provides kernel support for FPGA applications by extending a standard Linux operating system. It establishes the notion of hardware process for executing user FPGA applications. Users therefore compile and execute hardware designs on FPGA resources the same way they run software programs on conventional processor-based systems. BORPH offers run-time general file system support to hardware processes as if they were software. The unified file interface allows hardware and software processes to communicate via standard UNIX file pipes. Furthermore, a virtual file system is built to allow access to memories and registers defined in the FPGA, providing communication links between hardware and software. The functions of BORPH are demonstrated on a BEE2 compute module. The performance of BORPH is measured to identify bottlenecks of our system. The clean OS kernel/user separation of BORPH has allowed us to improve overall system performance without affecting existing user designs.
Furthermore, BORPH's unified runtime environment has enabled designers to make fair and end-to-end comparisons among software/hardware implementations of the same application. Most importantly, since the introduction of BORPH to our FPGA-based platform, we have observed increased productivity among high-level application developers who have little experience in FPGA application design.

Journal ArticleDOI
TL;DR: A reconfigurable curve-based cryptoprocessor that accelerates scalar multiplication of ECC and HECC of genus 2 over GF(2^n) and it can handle various curve parameters and arbitrary irreducible polynomials.
Abstract: This paper presents a reconfigurable curve-based cryptoprocessor that accelerates scalar multiplication of Elliptic Curve Cryptography (ECC) and HyperElliptic Curve Cryptography (HECC) of genus 2 over GF(2^n). By allocating a copies of processing cores that embed reconfigurable Modular Arithmetic Logic Units (MALUs) over GF(2^n), the scalar multiplication of ECC/HECC can be accelerated by exploiting Instruction-Level Parallelism (ILP). The supported field size can be arbitrary up to a(n + 1) - 1. The superscaling feature is facilitated by defining a single instruction that can be used for all field operations and point/divisor operations. In addition, the cryptoprocessor is fully programmable and it can handle various curve parameters and arbitrary irreducible polynomials. The cost, performance, and security trade-offs are thoroughly discussed for different hardware configurations and software programs. The synthesis results with a 0.13-µm CMOS technology show that the proposed reconfigurable cryptoprocessor runs at 292 MHz, whereas the field sizes can be supported up to 587 bits. The compact and fastest configuration of our design is also synthesized with a fixed field size and irreducible polynomial. The results show that the scalar multiplication of ECC over GF(2^163) and HECC over GF(2^83) can be performed in 29 and 63 µs, respectively.
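As background for the field arithmetic mentioned above, here is a minimal software sketch of multiplication in GF(2^n) modulo an irreducible polynomial. It is a generic shift-and-add routine, not the MALU datapath from the paper. The function name and the bit-mask encoding of polynomials are assumptions, and the small field GF(2^8) with the polynomial x^8 + x^4 + x^3 + x + 1 is chosen only because its products are easy to check by hand.

```python
def gf2n_mul(a, b, n, poly):
    """Multiply a and b in GF(2^n).

    Elements are integers whose bits are polynomial coefficients.
    `poly` is the irreducible polynomial as a bit mask that includes
    the x^n term. Both a and b must be below 2**n.
    """
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^n) is XOR
        b >>= 1
        a <<= 1                  # multiply a by x
        if (a >> n) & 1:
            a ^= poly            # reduce once the degree reaches n
    return result


# Illustrative field only: GF(2^8) with x^8 + x^4 + x^3 + x + 1 (0x11B).
print(hex(gf2n_mul(0x57, 0x83, 8, 0x11B)))
```

Passing `poly` as a parameter mirrors the abstract's point that the irreducible polynomial need not be fixed in advance; a larger field only requires a different `n` and mask.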

Proceedings ArticleDOI
12 Nov 2007
TL;DR: The European integrated project MORPHEUS is presented, to develop new heterogeneous reconfigurable SoCs with various sizes of reconfiguration granularity and to provide an integrated toolset of spatial and sequential design that can be used for mapping and execution of the target applications.
Abstract: Reconfigurable architectures and NoC (network-on-chip) communication systems have introduced new research directions for technology and flexibility issues, which have been largely investigated in the last decades. Exploiting the flexibility of reconfigurable architectures, the run-time adaptivity through run-time reconfiguration, opens a new area of research by considering dynamic reconfiguration. Since software parts of an embedded system can also be included into reconfigurable hardware by integration of an IP-based microcontroller, the reconfigurable architecture provides a flexible, multi-adaptive heterogeneous platform for HW/SW co-design. In this paper, we present the European integrated project MORPHEUS (IST 027342). Its goal is to develop new heterogeneous reconfigurable SoCs with various sizes of reconfiguration granularity and to provide an integrated toolset of spatial and sequential design that can be used for mapping and execution of the target applications. Additionally, a NoC approach is included in order to demonstrate the mentioned benefits and scalability for actual and future SoC design. The power of this approach will be demonstrated with four applications from the industrial environment.

Proceedings ArticleDOI
23 Apr 2007
TL;DR: The reconfigurable computing cluster (RCC) project is a multi-institution, multi-disciplinary project investigating the use of Platform FPGAs to build cost-effective petascale computers and a 64-node prototype cluster is described.
Abstract: While medium- and large-sized computing centers have increasingly relied on clusters of commodity PC hardware to provide cost-effective capacity and capability, it is not clear that this technology will scale to the PetaFLOP range. It is expected that semiconductor technology will continue its exponential advancements over the next fifteen years; however, new issues are rapidly emerging and the relative importance of current performance metrics is shifting. Future PetaFLOP architectures will require system designers to solve computer architecture problems, including how to house, power, and cool the machine, all the while remaining sensitive to cost. The reconfigurable computing cluster (RCC) project is a multi-institution, multi-disciplinary project investigating the use of Platform FPGAs to build cost-effective petascale computers. This paper describes the nascent project's objectives and a 64-node prototype cluster. Specifically, the aim is to provide a detailed motivation for the project, describe the design principles guiding development, and present a preliminary performance assessment. Microbenchmark results are reported to answer several pragmatic questions about key subsystems, including the system software, network performance, memory bandwidth, and power consumption of nodes in the cluster. Results suggest that the approach is sound.

Journal ArticleDOI
TL;DR: The Molen compiler is described, which automatically generates optimized binary code for C applications, based on pragma annotation of the code executed on the reconfigurable hardware, and an instruction-scheduling algorithm for the dynamic hardware configuration instructions.
Abstract: In this paper, we describe the compiler developed to target the Molen reconfigurable processor and programming paradigm. The compiler automatically generates optimized binary code for C applications, based on pragma annotation of the code executed on the reconfigurable hardware. For the IBM PowerPC 405 processor included in the Virtex II Pro platform FPGA, we implemented code generation, register, and stack frame allocation following the PowerPC EABI (embedded application binary interface). The PowerPC backend has been extended to generate the appropriate instructions for the reconfigurable hardware and data transfer, taking into account the information of the specific hardware implementations and system. Starting with an annotated C application, a complete design flow has been integrated to generate the executable bitstream for the reconfigurable processor. The flexible design of the proposed infrastructure allows the special features of the reconfigurable architectures to be considered. In order to hide the reconfiguration latencies, we implemented an instruction-scheduling algorithm for the dynamic hardware configuration instructions. The algorithm schedules, in advance, the hardware configuration instructions, taking into account the conflicts for the reconfigurable hardware resources (FPGA area) between the hardware operations. To verify the Molen compiler, we used the multimedia video frame M-JPEG encoder of which the extended discrete cosine transform (DCTa) function was mapped on the FPGA. We obtained an overall speedup of 2.5 (about 84% efficiency over the maximal theoretical speedup of 2.96). The performance efficiency is achieved using automatically generated nonoptimized DCTa hardware implementation. The instruction-scheduling algorithm has been tested for DCT, quantization, and VLC operations. Based on simulation results, we determine that, while a simple scheduling produces a significant performance decrease, our proposed scheduling contributes to an M-JPEG encoder speedup of up to 16x.
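The efficiency figure quoted above can be checked with a short calculation. The sketch below divides the achieved speedup by the stated maximum. It also derives, as an assumption not stated in the abstract, the fraction of run time that would have to be accelerated if the 2.96 bound came from Amdahl's law with an infinitely fast kernel. The function names are illustrative.

```python
def efficiency(achieved_speedup, max_speedup):
    """Fraction of the theoretical maximum speedup actually achieved."""
    return achieved_speedup / max_speedup


def amdahl_fraction(max_speedup):
    """Accelerated fraction f implied by max_speedup = 1 / (1 - f).

    Assumption: the bound follows Amdahl's law with an infinitely
    fast accelerated part; the abstract does not say how 2.96 was
    derived.
    """
    return 1.0 - 1.0 / max_speedup


print(f"efficiency: {efficiency(2.5, 2.96):.1%}")
print(f"implied accelerated fraction: {amdahl_fraction(2.96):.3f}")
```

Under that assumption, roughly two thirds of the original run time would be spent in the function mapped to hardware.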

Journal ArticleDOI
TL;DR: A comparative analysis of FPGAs and traditional processors is presented, focusing on floating-point performance and procurement costs, revealing economic hurdles in the adoption of FPGAs for general high-performance computing (HPC).
Abstract: For certain applications, custom computational hardware created using field programmable gate arrays (FPGAs) can produce significant performance improvements over processors, leading some in academia and industry to call for the inclusion of FPGAs in supercomputing clusters. This paper presents a comparative analysis of FPGAs and traditional processors, focusing on floating-point performance and procurement costs, revealing economic hurdles in the adoption of FPGAs for general high-performance computing (HPC).

BookDOI
20 Dec 2007
TL;DR: A Programming Model and Architectural Extensions for Fine-Grain Parallelism, and Applications Using FG to reduce the Effect of Latency in Parallel Programs Running on Clusters.
Abstract: PREFACE
MODELS
Evolving Computational Systems (S.G. Akl)
Decomposable BSP: A Bandwidth-Latency Model for Parallel and Hierarchical Computation (G. Bilardi, A. Pietracaprina, and G. Pucci)
Membrane Systems: A "Natural" Way of Computing with Cells (O.H. Ibarra and A. Paun)
Optical Transpose Systems: Models and Algorithms (C.-F. Wang and S. Sahni)
Models for Advancing PRAM and Other Algorithms into Parallel Programs for a PRAM-On-Chip Platform (U. Vishkin, G. Caragea, and B. Lee)
Deterministic and Randomized Sorting Algorithms for the Parallel Disks Model (S. Rajasekaran)
A Programming Model and Architectural Extensions for Fine-Grain Parallelism (A. Gontmakher, A. Mendelson, A. Schuster, and G. Shklover)
Computing with Mobile Agents in Distributed Networks (E. Kranakis, D. Krizanc, and S. Rajsbaum)
Transitional Issues: Fine-Grain to Coarse-Grain Multicomputers (S. Olariu)
Distributed Computing in the Presence of Mobile Faults (N. Santoro and P. Widmayer)
A Hierarchical Performance Model for Reconfigurable Computers (R. Scorfano and V.K. Prasanna)
Hierarchical Performance Modeling and Analysis of Distributed Software Systems (R.A. Ammar)
Randomized Packet Routing, Selection, and Sorting on the POPS Network (J. Davila and S. Rajasekaran)
Dynamic Reconfiguration on the R-Mesh (R. Vaidyanathan and J.L. Trahan)
Fundamental Algorithms on the Reconfigurable Mesh (K. Nakano)
Reconfigurable Computing with Optical Buses (A.G. Bourgeois)
ALGORITHMS
Distributed Peer-to-Peer Data Structures (M.T. Goodrich and M.J. Nelson)
Parallel Algorithms via the Probabilistic Method (L. Kliemann and A. Srivastav)
Broadcasting on Networks of Workstations (S. Khuller, Y.-A. Kim, and Y.-C. Wan)
Atomic Selfish Routing in Networks: A Survey (S. Kontogiannis and P. Spirakis)
Scheduling in Grid Environments (Y-C. Lee and A.Y. Zomaya)
QoS Scheduling in Network and Storage Systems (P.J. Varman and A. Gulati)
Optimal Parallel Scheduling Algorithms in WDM Packet Interconnects (Y. Yang)
Online Real-Time Scheduling Algorithms for Multiprocessor Systems (M.A. Palis)
Parallel Algorithms for Maximal Independent Set and Maximal Matching (Y. Han)
Efficient Parallel Graph Algorithms for Shared-Memory Multiprocessors (D.A. Bader and G. Cong)
Parallel Algorithms for Volumetric Surface Construction (J. JaJa, Q. Shi, and A. Varshney)
Mesh-Based Parallel Algorithms for Ultra-Fast Computer Vision (S. Olariu)
Prospectus for a Dense Linear Algebra Software Library (J. Demmel and J. Dongarra)
Parallel Algorithms on Strings (W. Rytter)
Design of Multithreaded Algorithms for Combinatorial Problems (D.A. Bader, K. Madduri, G. Cong, and J. Feo)
Parallel Data Mining Algorithms for Association Rules and Clustering (J. Li, Y. Liu, W.-K. Liao, and A. Choudhary)
An Overview of Mobile Computing Algorithmics (S. Olariu and A.Y. Zomaya)
APPLICATIONS
Using FG to Reduce the Effect of Latency in Parallel Programs Running on Clusters (T.H. Cormen and E.R. Davidson)
High-Performance Techniques for Parallel I/O (A. Ching, K. Coloma, A. Choudhary, and W.-K. Liao)
Message Dissemination Using Modern Communication Primitives (T. Gonzalez)
Online Computation in Large Networks (S. Albers)
Online Call Admission Control in Wireless Cellular Networks (I. Caragiannis, C. Kaklamanis, and E. Papaioannou)
Minimum Energy Communication in Ad Hoc Wireless Networks (I. Caragiannis, C. Kaklamanis, and P. Kanellopoulos)
Power Aware Mapping of Real-Time Tasks to Multiprocessors (D. Zhu, B.R. Childers, D. Mosse, and R. Melhem)
Perspectives on Robust Resource Allocation for Heterogeneous Parallel and Distributed Systems (S. Ali, H.J. Siegel, and A.A. Maciejewski)
A Transparent Distributed Runtime for Java (M. Factor, A. Schuster, and K. Shagin)
Scalability of Parallel Programs (A. Grama and V. Kumar)
Spatial Domain Decomposition Methods in Parallel Scientific Computing (Sudip Seal and Srinivas Aluru)
Game Theoretical Solutions for Data Replication in Distributed Computing Systems (S.U. Khan and I. Ahmad)
Effectively Managing Data on a Grid (C.L. Ruby and R. Miller)
Fast and Scalable Parallel Matrix Multiplication and Its Applications on Distributed Memory Systems (K. Li)
INDEX

Journal ArticleDOI
TL;DR: High-performance reconfigurable computers have the potential to exploit coarse-grained functional parallelism as well as fine-grained instruction-level parallelism through direct hardware execution on FPGAs.
Abstract: High-performance reconfigurable computers have the potential to exploit coarse-grained functional parallelism as well as fine-grained instruction-level parallelism through direct hardware execution on FPGAs.