
Showing papers on "Reconfigurable computing" published in 2006


Proceedings ArticleDOI
01 Aug 2006
TL;DR: In this article, the authors describe architectural enhancements to Xilinx FPGAs that provide better support for the creation of dynamically reconfigurable designs, augmented by a new design methodology that uses pre-routed IP cores for communication between static and dynamic modules and permits static designs to route through regions otherwise reserved for dynamic modules.
Abstract: The paper describes architectural enhancements to Xilinx FPGAs that provide better support for the creation of dynamically reconfigurable designs. These are augmented by a new design methodology that uses pre-routed IP cores for communication between static and dynamic modules and permits static designs to route through regions otherwise reserved for dynamic modules. A new CAD tool flow to automate the methodology is also presented. The new tools initially target the Virtex-II, Virtex-II Pro and Virtex-4 families and are derived from Xilinx's commercial CAD tools.

308 citations


Proceedings ArticleDOI
22 Oct 2006
TL;DR: This paper presents a hw/sw codesign methodology based on BORPH, an operating system designed for FPGA-based reconfigurable computers (RC's).
Abstract: This paper presents a hw/sw codesign methodology based on BORPH, an operating system designed for FPGA-based reconfigurable computers (RC's). By providing native kernel support for FPGA hardware, BORPH offers a homogeneous UNIX interface for both software and hardware processes. Hardware processes inherit the same level of service from the kernel, such as file system support, as typical UNIX software processes. Hardware and software components of a design therefore run as hardware and software processes within BORPH's run-time environment. The familiar and language independent UNIX kernel interface facilitates easy design reuse and rapid application development. Performance of our current implementation and our experience with developing a real-time wireless digital signal processing system based on BORPH will be presented.

179 citations


Journal ArticleDOI
TL;DR: An overview of reconfigurable computing in embedded systems, in terms of benefits it can provide, how it has already been used, design issues, and hurdles that have slowed its adoption are presented.
Abstract: Over the past few years, the realm of embedded systems has expanded to include a wide variety of products, ranging from digital cameras, to sensor networks, to medical imaging systems. Consequently, engineers strive to create ever smaller and faster products, many of which have stringent power requirements. Coupled with increasing pressure to decrease costs and time-to-market, the design constraints of embedded systems pose a serious challenge to embedded systems designers. Reconfigurable hardware can provide a flexible and efficient platform for satisfying the area, performance, cost, and power requirements of many embedded systems. This article presents an overview of reconfigurable computing in embedded systems, in terms of benefits it can provide, how it has already been used, design issues, and hurdles that have slowed its adoption.

157 citations


Proceedings ArticleDOI
01 Dec 2006
TL;DR: A nondeterministic finite automata (NFA) based implementation was presented, which takes advantage of new basic building blocks to support more complex regular expressions than the previous approaches.
Abstract: Recent intrusion detection systems (IDS) use regular expressions instead of static patterns as a more efficient way to represent hazardous packet payload contents. This paper focuses on regular expression pattern matching engines implemented in reconfigurable hardware. A nondeterministic finite automata (NFA) based implementation was presented, which takes advantage of new basic building blocks to support more complex regular expressions than the previous approaches. The methodology is supported by a tool that automatically generates the circuitry for the given regular expressions, outputting VHDL representations ready for logic synthesis. Furthermore, techniques to reduce the area cost of our designs and maximize performance when targeting FPGAs were included. Experimental results show that our tool is able to generate a regular expression engine to match more than 500 IDS regular expressions (from the Snort ruleset) using only 25K logic cells and achieving 2 Gbps throughput on a Virtex-2 device and 2.9 Gbps on a Virtex-4 device. Concerning the throughput per area required per matching non-meta character, our design is 3.4 and 10 times more efficient than previous ASIC and FPGA approaches, respectively.
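
Hardware NFA engines of this kind typically map each NFA state to a flip-flop and advance every active state in parallel on each input character. The plain-Python sketch below illustrates only that parallel-update evaluation model, with a hand-written NFA for a single pattern; the transition table, pattern, and function name are invented for the example and are not taken from the paper's block generator.

    # Minimal NFA evaluation in the style of hardware regex engines:
    # every state is one bit of a flip-flop vector, and all active states
    # advance in parallel on each input character.
    def nfa_match(transitions, start, accept, text):
        """transitions maps (state, frozenset_of_chars) -> set of next states."""
        active = {start}
        for ch in text:
            nxt = set()
            for state in active:
                for (src, chars), dests in transitions.items():
                    if src == state and ch in chars:
                        nxt |= dests
            nxt.add(start)        # keep the start state live so a match can begin anywhere
            active = nxt
            if active & accept:
                return True
        return False

    # Hand-built NFA for the regular expression "ab+c":
    # 0 --a--> 1 --b--> 2 --b--> 2 --c--> 3 (accepting)
    trans = {
        (0, frozenset("a")): {1},
        (1, frozenset("b")): {2},
        (2, frozenset("b")): {2},
        (2, frozenset("c")): {3},
    }
    print(nfa_match(trans, start=0, accept={3}, text="xxabbbcyy"))  # True
    print(nfa_match(trans, start=0, accept={3}, text="xxacyy"))     # False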

146 citations


Proceedings ArticleDOI
14 Jun 2006
TL;DR: In this paper, the authors explore the implementation of MPC technology in reconfigurable hardware such as an FPGA chip and present a rapid prototyping environment suitable for exploring the various implementation issues to bring MPC onto a chip.
Abstract: With its natural ability to handle constraints, model predictive control (MPC) has become an established control technology in the petrochemical industry, and its use is currently being pioneered in an increasingly wide range of process industries. It is also being proposed for a range of higher bandwidth applications, such as ships, aerospace and road vehicles. To extend its applications to miniaturized devices and/or embedded systems, this paper explores the implementation of MPC technology in reconfigurable hardware such as an FPGA chip. A rapid prototyping environment suitable for exploring the various implementation issues to bring MPC onto a chip is described. Tests were conducted to verify the applicability of the "MPC on a chip" idea. It is shown that a modest FPGA chip could be used to implement a reasonably sized constrained MPC controller.
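
To give a sense of the arithmetic an embedded MPC controller repeats every sample, the sketch below builds the condensed prediction matrices for a small linear model, solves the resulting unconstrained quadratic cost, and clips the first input to its bounds. The plant model, horizon, weights, and the crude clipping (used here in place of a proper QP solver) are all assumptions made for illustration and do not describe the paper's FPGA design.

    # Toy receding-horizon MPC step for x(k+1) = A x(k) + B u(k):
    # condense the predictions, solve the unconstrained quadratic cost in
    # closed form, then clip the input (a stand-in for real constraint handling).
    import numpy as np

    def mpc_step(A, B, Q, R, x0, N, u_min, u_max):
        n, m = B.shape
        # Predicted states over the horizon: X = Phi x0 + Gamma U
        Phi = np.vstack([np.linalg.matrix_power(A, i + 1) for i in range(N)])
        Gamma = np.zeros((N * n, N * m))
        for i in range(N):
            for j in range(i + 1):
                Gamma[i*n:(i+1)*n, j*m:(j+1)*m] = np.linalg.matrix_power(A, i - j) @ B
        Qbar, Rbar = np.kron(np.eye(N), Q), np.kron(np.eye(N), R)
        # Minimise X'Qbar X + U'Rbar U  =>  (Gamma'Qbar Gamma + Rbar) U = -Gamma'Qbar Phi x0
        H = Gamma.T @ Qbar @ Gamma + Rbar
        f = Gamma.T @ Qbar @ Phi @ x0
        U = np.linalg.solve(H, -f)
        return np.clip(U[:m], u_min, u_max)      # apply only the first move

    A = np.array([[1.0, 0.1], [0.0, 1.0]])       # assumed double-integrator plant
    B = np.array([[0.005], [0.1]])
    u0 = mpc_step(A, B, Q=np.eye(2), R=0.1 * np.eye(1),
                  x0=np.array([1.0, 0.0]), N=10, u_min=-1.0, u_max=1.0)
    print(u0)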

131 citations


Proceedings ArticleDOI
20 Oct 2006
TL;DR: The initial investigation reveals that a hierarchical RF architecture, designed around a scalable interconnect, is instrumental in harnessing the benefits of spatial computation and can provide an order of magnitude improvement in energy-delay compared to an aggressive superscalar core on single-threaded workloads.
Abstract: Spatial Computing (SC) has been shown to be an energy-efficient model for implementing program kernels. In this paper we explore the feasibility of using SC for more than small kernels. To this end, we evaluate the performance and energy efficiency of entire applications on Tartan, a general-purpose architecture which integrates a reconfigurable fabric (RF) with a superscalar core. Our compiler automatically partitions and compiles an application into an instruction stream for the core and a configuration for the RF. We use a detailed simulator to capture both timing and energy numbers for all parts of the system. Our results indicate that a hierarchical RF architecture, designed around a scalable interconnect, is instrumental in harnessing the benefits of spatial computation. The interconnect uses static configuration and routing at the lower levels and a packet-switched, dynamically-routed network at the top level. Tartan is most energy-efficient when almost all of the application is mapped to the RF, indicating the need for the RF to support most general-purpose programming constructs. Our initial investigation reveals that such a system can provide, on average, an order of magnitude improvement in energy-delay compared to an aggressive superscalar core on single-threaded workloads.

97 citations


Journal ArticleDOI
TL;DR: The overall goal is to investigate biologically realistic models for the real-time control of robots operating within closed action-perception loops, and so the performance of the system is evaluated on a simulation of a cerebellum model in which emulating the temporal dynamics of the synaptic integration process is important.
Abstract: A computing platform is described for simulating arbitrary networks of spiking neurons in real time. A hybrid computing scheme is adopted that uses both software and hardware components to manage the tradeoff between flexibility and computational power; the neuron model is implemented in hardware and the network model and the learning are implemented in software. The incremental transition of the software components into hardware is supported. We focus on a spike response model (SRM) for a neuron where the synapses are modeled as input-driven conductances. The temporal dynamics of the synaptic integration process are modeled with a synaptic time constant that results in a gradual injection of charge. This type of model is computationally expensive and is not easily amenable to existing software-based event-driven approaches. As an alternative we have designed an efficient time-based computing architecture in hardware, where the different stages of the neuron model are processed in parallel. Further improvements occur by computing multiple neurons in parallel using multiple processing units. This design is tested using reconfigurable hardware and its scalability and performance evaluated. Our overall goal is to investigate biologically realistic models for the real-time control of robots operating within closed action-perception loops, and so we evaluate the performance of the system on simulating a model of the cerebellum where the emulation of the temporal dynamics of the synaptic integration process is important.
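
A discrete-time sketch of the synaptic dynamics described above makes it clear why the model is expensive for event-driven simulators: each input spike increments a conductance that then decays with its own time constant, so charge is injected gradually over many time steps rather than instantaneously. The constants and spike times below are arbitrary illustration values, not parameters taken from the paper.

    # Leaky integrator driven by an exponentially decaying synaptic conductance
    # (tau_syn), so every presynaptic spike injects charge gradually.
    import math

    dt, tau_m, tau_syn = 0.1, 20.0, 5.0     # ms; assumed step and time constants
    v_thresh, w = 1.0, 0.4                  # firing threshold and synaptic weight (assumed)
    decay_m, decay_s = math.exp(-dt / tau_m), math.exp(-dt / tau_syn)

    v, g = 0.0, 0.0
    input_spikes = {10, 12, 14, 40}         # time steps with presynaptic spikes
    for t in range(100):
        if t in input_spikes:
            g += w                          # spike increments the conductance
        v = v * decay_m + g * dt            # membrane integrates the synaptic current
        g *= decay_s                        # conductance decays toward zero
        if v >= v_thresh:
            print(f"output spike at step {t}")
            v = 0.0                         # reset after firing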

94 citations


Proceedings ArticleDOI
01 Aug 2006
TL;DR: Two variants of CoNoChi are presented: one is based on a homogeneous hardware structure that is dynamically reconfigurable at the logic block level, and the other one is adapted to the limited partial reconfiguration capabilities of Xilinx Virtex-II (Pro) FPGAs.
Abstract: This paper presents CoNoChi, an adaptable Network-on-Chip for dynamically reconfigurable hardware designs. CoNoChi is designed to take advantage of the partial dynamic reconfiguration capabilities of modern FPGAs and applies this feature to adapt the network structure to the location, number and size of currently configured hardware modules. The network consists of the minimal number of switches required. Switches can be added to or removed from the network by a global control instance at runtime. Compared to common fixed Network-on-Chip structures, the CoNoChi architecture reduces the area requirements and latency of the network and eases the online placement of hardware modules. Two variants of CoNoChi are presented: one is based on a homogeneous hardware structure that is dynamically reconfigurable at the logic block level, and the other one is adapted to the limited partial reconfiguration capabilities of Xilinx Virtex-II (Pro) FPGAs.

91 citations


Book ChapterDOI
03 Oct 2006
TL;DR: In this paper, a hardware implementation of a neural network on FPGAs is presented; the digital system architecture is described in the Very High Speed Integrated Circuit Hardware Description Language (VHDL) and implemented on an FPGA chip.
Abstract: Using FPGAs (field-programmable gate arrays) for neural network implementation provides flexibility in programmable systems. For neural-network-based instrument prototypes in real-time applications, conventional application-specific VLSI neural chip design suffers from limitations in time and cost. With low-precision artificial neural network designs, FPGAs offer higher speed and smaller size for real-time applications than VLSI designs. In addition, FPGA-based artificial neural networks have achieved fair results in classification applications. The programmability of reconfigurable FPGAs makes fast special-purpose hardware available for a wide range of applications. This programmability also makes it possible to explore new neural network algorithms and problems of a scale that would not be feasible with a conventional processor. The goal of this work is to realize the hardware implementation of a neural network using FPGAs. The digital system architecture is described in the Very High Speed Integrated Circuit Hardware Description Language (VHDL) and implemented on an FPGA chip. The design was tested on an FPGA demo board.

89 citations


Journal ArticleDOI
TL;DR: It is experimentally shown that the multibit routing architecture can achieve 14% routing area reduction for implementing datapath circuits, which represents an overall FPGA area savings of 10%.
Abstract: As the logic capacity of field-programmable gate arrays (FPGAs) increases, they are increasingly being used to implement large arithmetic-intensive applications, which often contain a large proportion of datapath circuits. Since datapath circuits usually consist of regularly structured components (called bit-slices) which are connected together by regularly structured signals (called buses), it is possible to utilize datapath regularity in order to achieve significant area savings through FPGA architectural innovations. This paper describes such an FPGA routing architecture, called the multibit routing architecture, which employs bus-based connections in order to exploit datapath regularity. It is experimentally shown that, compared to conventional FPGA routing architectures, the multibit routing architecture can achieve 14% routing area reduction for implementing datapath circuits, which represents an overall FPGA area savings of 10%. This paper also empirically determines the best values of several important architectural parameters for the new routing architecture including the most area efficient granularity values and the most area efficient proportion of bus-based connections.

83 citations


Proceedings ArticleDOI
03 May 2006
TL;DR: The REPLICA2Pro (Relocation per online Configuration Alteration in Virtex-2/-Pro) filter is developed, which is capable of performing task relocations by manipulating the task's bitstream during the regular allocation process without any extra time overhead.
Abstract: One vision of dynamic hardware reconfiguration is to deliver virtually unlimited hardware resources to a set of hardware tasks implementing arbitrary functions. By using partial reconfiguration, these tasks can be allocated and de-allocated on the reconfigurable architecture while others continue to operate. However, the exact placement of each task can only be determined during runtime according to the current resource allocation. This requires relocating each task from its original position after place and route to an area of available resources. The process of relocating tasks can result in a major time overhead. In order to solve this problem we have developed the REPLICA2Pro (Relocation per online Configuration Alteration in Virtex-2/-Pro) filter, which is capable of performing task relocations by manipulating the task's bitstream during the regular allocation process without any extra time overhead. The filter architecture, our reconfigurable system approach as well as our design flow and an experimental system setup are presented in this paper.

Proceedings ArticleDOI
24 Apr 2006
TL;DR: The rationale in choosing the development path taken and the general framework for porting an existing scientific code, such as NAMD, to the SRC-6 platform are presented and discussed in detail and the results are applicable to the large class of problems in scientific computing.
Abstract: This case study presents the results of porting a production scientific code, called NAMD, to the SRC-6 high-performance reconfigurable computing platform based on Field Programmable Gate Array (FPGA) technology. NAMD is a molecular dynamics code designed to run on large supercomputing systems and used extensively by the computational biophysics community. NAMD's computational kernel is highly optimized to run on conventional von Neumann processors; this presents numerous challenges to its reimplementation on FPGA architecture. This paper presents an overview of the SRC-6 architecture and the NAMD application and then discusses the challenges, solutions, and results of the porting effort. The rationale in choosing the development path taken and the general framework for porting an existing scientific code, such as NAMD, to the SRC-6 platform are presented and discussed in detail. The results and methods presented in this paper are applicable to the large class of problems in scientific computing.


Book
20 Oct 2006
TL;DR: This chapter discusses the development of EHW-Based Fault Recovery for Online Systems, which involves designing Self-Adaptive Systems and quantifying Intrinsic Reconfiguration Time.
Abstract: PREFACE. ACKNOWLEDGMENTS. ACRONYMS. 1 INTRODUCTION. 1.1 Characteristics of Evolvable Circuits and Systems. 1.2 Why Evolvable Hardware Is Good (and Bad!). 1.3 Technology. 1.4 Evolvable Hardware vs. Evolved Hardware. 1.5 Intrinsic vs. Extrinsic Evolution. 1.6 Online vs. Offline Evolution. 1.7 Evolvable Hardware Applications. References. 2 FUNDAMENTALS OF EVOLUTIONARY COMPUTATION. 2.1 What Is an EA? 2.2 Components of an EA. 2.2.1 Representation. 2.2.2 Variation. 2.2.3 Evaluation. 2.2.4 Selection. 2.2.5 Population. 2.2.6 Termination Criteria. 2.3 Getting the EA to Work. 2.4 Which EA Is Best? References. 3 RECONFIGURABLE DIGITAL DEVICES. 3.1 Basic Architectures. 3.1.1 Programmable Logic Devices. 3.1.2 Field Programmable Gate Array. 3.2 Using Reconfigurable Hardware. 3.2.1 Design Phase. 3.2.2 Execution Phase. 3.3 Experimental Results. 3.4 Functional Overview of the POEtic Architecture. 3.4.1 Organic Subsystem. 3.4.2 Description of the Molecules. 3.4.3 Description of the Routing Layer. 3.4.4 Dynamic Routing. 3.5 Remarks. References. 4 RECONFIGURABLE ANALOG DEVICES. 4.1 Basic Architectures. 4.2 Transistor Arrays. 4.2.1 The NASA FTPA. 4.2.2 The Heidelberg FPTA. 4.3 Analog Arrays. 4.4 Remarks. References. 5 PUTTING EVOLVABLE HARDWARE TO USE. 5.1 Synthesis vs. Adaption. 5.2 Designing Self-Adaptive Systems. 5.2.1 Fault Tolerant Systems. 5.2.2 Real-Time Systems. 5.3 Creating Fault Tolerant Systems Using EHW. 5.4 Why Intrinsic Reconfiguration for Online Systems? 5.5 Quantifying Intrinsic Reconfiguration Time. 5.6 Putting Theory Into Practice. 5.6.1 Minimizing Risk With Anticipated Faults. 5.6.2 Minimizing Risk With Unanticipated Faults. 5.6.3 Suggested Practices. 5.7 Examples of EHW-Based Fault Recovery. 5.7.1 Population vs. Fitness-Based Designs. 5.7.2 EHW Compensators. 5.7.3 Robot Control. 5.7.4 The POEtic Project. 5.7.5 Embryo Development. 5.8 Remarks. References. 6 FUTURE WORK. 6.1 Circuit Synthesis Topics. 6.1.1 Digital Design. 6.1.2 Analog Design. 6.2 Circuit Adaption Topics. References. INDEX . ABOUT THE AUTHORS.

Journal ArticleDOI
TL;DR: Results and case studies with optimizations that are: on the gate level, Kasumi and International Data Encryption Algorithm encryptions; on the arithmetic level, redundant addition and multiplication function evaluation for two-dimensional rotation; and on the architecture level, Wavelet and Lempel-Ziv (LZ)-like compression are presented.
Abstract: A stream compiler (ASC) for computing with field programmable gate arrays (FPGAs) emerges from the ambition to bridge the hardware-design productivity gap, where the number of available transistors grows more rapidly than the productivity of very large scale integration (VLSI) and FPGA computer-aided-design (CAD) tools. ASC addresses this problem with a software-like programming interface to hardware design (FPGAs) while maintaining the performance of hand-designed circuits. ASC improves productivity by letting the programmer optimize the implementation on the algorithm level, the architecture level, the arithmetic level, and the gate level, all within the same C++ program. The increased productivity of ASC is applied to the hardware acceleration of a wide range of applications. Traditionally, hardware accelerators are tediously handcrafted to achieve top performance. ASC simplifies design-space exploration of hardware accelerators by transforming the hardware-design task into a software-design process, using only "GNU compiler collection (GCC)" and "make" to obtain a hardware netlist. From experience, the hardware-design productivity and ease of use are close to pure software development. This paper presents results and case studies with optimizations that are: 1) on the gate level, Kasumi and International Data Encryption Algorithm (IDEA) encryptions; 2) on the arithmetic level, redundant addition and multiplication function evaluation for two-dimensional (2-D) rotation; and 3) on the architecture level, Wavelet and Lempel-Ziv (LZ)-like compression.

Journal ArticleDOI
TL;DR: To study single event upset (SEU) impact on signal processing applications, a novel fault injection technique to corrupt configuration bits is used, thereby simulating SEU faults and highlighting the benefits of dynamic reconfiguration for space-based reconfigurable computing.
Abstract: This paper describes novel methods of exploiting the partial, dynamic reconfiguration capabilities of Xilinx Virtex V1000 FPGAs to manage Single-Event Upset (SEU) faults owing to radiation in space environments. The on-orbit fault detection scheme uses radiation-hardened reconfiguration controllers to continuously monitor the configuration bitstreams of nine Virtex FPGAs and to correct errors by partial, dynamic reconfiguration of the FPGAs while they continue to execute. To study the SEU impact on our signal processing applications, we use a novel fault injection technique to corrupt configuration bits, thereby simulating SEU faults. By using dynamic reconfiguration, we can run the corrupted designs directly on the FPGA hardware, giving many orders of magnitude speed-up over purely software techniques. The fault injection method has been validated against proton beam testing, showing 97.6% agreement. Our work highlights the benefits of dynamic reconfiguration for space-based reconfigurable computing.
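
Stripped of the platform specifics, the core of such a fault-injection technique is flipping a single configuration bit and then observing how the design behaves. A toy sketch of that step follows (the readback image is faked here, and writing the corrupted frame back through partial reconfiguration, as the paper's flow does, is omitted):

    # Emulate a single-event upset by flipping one randomly chosen bit in a
    # configuration bitstream image held in memory. A real flow would write the
    # corrupted frame back to the FPGA via partial, dynamic reconfiguration and
    # compare the design's outputs against a golden run.
    import random

    def inject_seu(bitstream: bytearray, rng=random.Random(0)):
        bit = rng.randrange(len(bitstream) * 8)
        byte_index, bit_index = divmod(bit, 8)
        bitstream[byte_index] ^= 1 << bit_index          # the simulated upset
        return byte_index, bit_index

    config = bytearray(1024)                             # stand-in for a readback image
    print("flipped bit", inject_seu(config))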

Proceedings ArticleDOI
24 Apr 2006
TL;DR: This paper describes how to partition the application between software and hardware and then model the performance of several alternatives for the task mapped to hardware, and demonstrates an implementation of one of these alternatives on a reconfigurable computer that achieves a 2 times speed-up over the software baseline.
Abstract: With advances in reconfigurable hardware, especially field-programmable gate arrays (FPGAs), it has become possible to use reconfigurable hardware to accelerate complex applications, such as those in scientific computing. There has been a resulting development of reconfigurable computers - computers which have both general purpose processors and reconfigurable hardware, as well as memory and high-performance interconnection networks. In this paper, we study the acceleration of molecular dynamics simulations using reconfigurable computers. We describe how we partition the application between software and hardware and then model the performance of several alternatives for the task mapped to hardware. We describe an implementation of one of these alternatives on a reconfigurable computer and demonstrate that for two real-world simulations, it achieves a 2 times speed-up over the software baseline. We then compare our design and results to those of prior efforts and explain the advantages of the hardware/software approach, including flexibility.
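
The partition-then-model step follows the usual Amdahl-style reasoning: only the fraction of run time mapped to hardware is accelerated, and communication adds some cost back. A back-of-the-envelope version of that model follows; the numbers are invented for illustration, not the paper's measurements.

    # Amdahl-style estimate for a hardware/software partition: a fraction f of
    # the software run time is mapped to the FPGA and sped up by a factor k,
    # with a fixed communication overhead c (as a fraction of the original time).
    def hw_sw_speedup(f, k, c=0.0):
        return 1.0 / ((1.0 - f) + f / k + c)

    # e.g. 60% of the time in the accelerated kernel, 10x faster in hardware,
    # 5% of the original run time spent moving data over the interconnect
    print(round(hw_sw_speedup(f=0.60, k=10.0, c=0.05), 2))   # ~2x overall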

Journal ArticleDOI
TL;DR: The final aim is the comparison of the methodologies applied in the two abstraction levels for designing hardware MLP’s or similar computational structures.

Proceedings ArticleDOI
25 Apr 2006
TL;DR: This paper proposes a highly parallel FPGA design for the Floyd-Warshall algorithm to solve the all-pairs shortest-paths problem in a directed graph to maximize parallelism in the presence of significant data dependences.
Abstract: With rapid advances in VLSI technology, field programmable gate arrays (FPGAs) are receiving the attention of the parallel and high performance computing community. In this paper, we propose a highly parallel FPGA design for the Floyd-Warshall algorithm to solve the all-pairs shortest-paths problem in a directed graph. Our work is motivated by a computationally intensive bio-informatics application that employs this algorithm. The design we propose makes efficient and maximal utilization of the large amount of resources available on an FPGA to maximize parallelism in the presence of significant data dependences. Experimental results from a working FPGA implementation on the Cray XD1 show a speedup of 22 over execution on the XD1's processor.
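
For reference, this is the Floyd-Warshall recurrence the design parallelises, in its plain software form. Only the outer k loop carries a true data dependence; for a fixed k, every (i, j) update is independent, which is the parallelism the FPGA implementation exploits.

    # Floyd-Warshall all-pairs shortest paths: dist[i][j] is relaxed through
    # every intermediate vertex k. The i and j loops for a fixed k can run
    # in parallel, which is what the hardware design takes advantage of.
    INF = float("inf")

    def floyd_warshall(dist):
        n = len(dist)
        for k in range(n):
            for i in range(n):
                for j in range(n):
                    if dist[i][k] + dist[k][j] < dist[i][j]:
                        dist[i][j] = dist[i][k] + dist[k][j]
        return dist

    graph = [[0, 3, INF, 7],
             [8, 0, 2, INF],
             [5, INF, 0, 1],
             [2, INF, INF, 0]]
    print(floyd_warshall(graph))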

Proceedings ArticleDOI
24 Apr 2006
TL;DR: An architecture for a scalable computing machine built entirely using FPGA computing nodes that enables designers to implement large-scale computing applications using a heterogeneous combination of hardware accelerators and embedded microprocessors spread across many FPGAs, all interconnected by a flexible communication network is proposed.
Abstract: It has been shown that a small number of FPGAs can significantly accelerate certain computing tasks by up to two or three orders of magnitude. However, particularly intensive large-scale computing applications, such as molecular dynamics simulations of biological systems, underscore the need for even greater speedups to address relevant length and time scales. In this work, we propose an architecture for a scalable computing machine built entirely using FPGA computing nodes. The machine enables designers to implement large-scale computing applications using a heterogeneous combination of hardware accelerators and embedded microprocessors spread across many FPGAs, all interconnected by a flexible communication network. Parallelism at multiple levels of granularity within an application can be exploited to obtain the maximum computational throughput. By focusing on applications that exhibit a high computation-to-communication ratio, we narrow the extent of this investigation to the development of a suitable communication infrastructure for our machine, as well as an appropriate programming model and design flow for implementing applications. By providing a simple, abstracted communication interface with the objective of being able to scale to thousands of FPGA nodes, the proposed architecture appears to the programmer as a unified, extensible FPGA fabric. A programming model based on the MPI message-passing standard is also presented as a means for partitioning an application into independent computing tasks that can be implemented on our architecture. Finally, we demonstrate the first use of our design flow by developing a simple molecular dynamics simulation application for the proposed machine, which runs on a small platform of development boards.
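
The programming model borrows what MPI programs already do: decompose the work into ranks that communicate by explicit messages. The conventional mpi4py sketch below shows that style on ordinary processors; in the proposed machine, each rank would instead correspond to a hardware accelerator or embedded microprocessor on an FPGA node, and the toy pairwise computation merely stands in for a real molecular dynamics kernel.

    # MPI-style decomposition of a toy pairwise interaction: each rank owns a
    # slice of the particles, computes its partial result, and everything is
    # reduced at rank 0. Run with e.g. `mpiexec -n 4 python toy_md.py`.
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    n = 64
    positions = np.arange(n, dtype=np.float64)          # toy 1-D "coordinates"
    lo, hi = rank * n // size, (rank + 1) * n // size   # this rank's slice

    local = np.zeros(n)
    for i in range(lo, hi):
        for j in range(n):
            if i != j:
                local[i] += 1.0 / (positions[i] - positions[j])

    total = np.zeros(n)
    comm.Reduce(local, total, op=MPI.SUM, root=0)
    if rank == 0:
        print("combined result for", n, "particles across", size, "ranks")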

Book ChapterDOI
01 Jan 2006
TL;DR: This chapter introduces the VHSIC Hardware Description Language (VHDL) and Verilog, together with the context for their use in the development of modern real-time systems.
Abstract: Reconfigurable hardware is becoming available with large enough capacity and performance to support complete systems. Field-programmable gate arrays (FPGAs) and programmable logic devices (PLDs) can be configured using software techniques that are not unfamiliar to software engineers working in the field of real-time systems. This chapter introduces the VHSIC Hardware Description Language (VHDL) and Verilog, and the context for their use in the development of modern real-time systems. Systems-on-chip, using reconfigurable hardware in the form of FPGAs or complex programmable logic devices (CPLDs), are becoming a popular form for embedded systems, often because of their reduced power requirements. This approach demands an integrated approach to hardware and software development if it is to be successful, and opportunities for all kinds of new applications will emerge using this cost-effective technology. The two most common hardware specification languages are VHDL in Europe and Verilog in the US.

Proceedings ArticleDOI
01 Aug 2006
TL;DR: BORPH, an operating system framework for FPGA-based reconfigurable computers with a goal to ease and accelerate development of high-level applications to run on these computers, provides kernel support for FPGA resources by extending a standard Linux operating system.
Abstract: Advances in FPGA-based reconfigurable computers have made them a viable computing platform for a vast variety of computation-demanding areas such as bioinformatics, speech recognition, and high-end digital signal processing. The lack of common, intuitive operating system support, however, hinders their wide deployment. This paper presents BORPH, an operating system framework for FPGA-based reconfigurable computers with the goal of easing and accelerating the development of high-level applications to run on these computers. It provides kernel support for FPGA resources by extending a standard Linux operating system. Users therefore compile and execute hardware processes on FPGA resources the same way they run software processes on conventional processor-based systems. The operating system offers run-time general file system support to hardware processes as if they were software. Furthermore, a virtual file system is built to allow access to memories and registers defined in the FPGA, which provides communication links with running hardware processes. Increased productivity has been observed for high-level application developers with little previous hardware design experience when implementing complex mixed software/hardware designs on an FPGA-based reconfigurable computer running BORPH.
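
Conceptually, a hardware process under such a kernel is driven the same way a software process would be: through ordinary file I/O. The sketch below only illustrates that usage style; the mount point and register names are hypothetical placeholders invented for the example, not BORPH's actual file system layout.

    # "Hardware process as a UNIX process": the design's registers appear as
    # files, so plain reads and writes configure and monitor it. Paths and
    # register names here are made up for illustration.
    import os

    HW_PROC_DIR = "/proc/12345/hw"       # hypothetical location of one hardware process

    def write_reg(name, value):
        with open(os.path.join(HW_PROC_DIR, name), "w") as reg:
            reg.write(str(value))

    def read_reg(name):
        with open(os.path.join(HW_PROC_DIR, name)) as reg:
            return int(reg.read())

    if os.path.isdir(HW_PROC_DIR):       # only meaningful where such a process is mounted
        write_reg("decimation", 8)
        write_reg("start", 1)
        print("samples processed:", read_reg("sample_count"))
    else:
        print("no hardware process mounted at", HW_PROC_DIR)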

Journal ArticleDOI
TL;DR: This letter presents a reconfigurable hardware implementation of feed-forward neural networks using stochastic techniques to approximate the nonlinear sigmoid activation functions with reduced digital logic resources on an FPGA device with high fault tolerance capability.
Abstract: This letter presents a reconfigurable hardware implementation of feed-forward neural networks using stochastic techniques. The design is based on stochastic computation theory to approximate the nonlinear sigmoid activation functions with reduced digital logic resources. The large parallel neural network structure is then implemented on a reconfigurable field-programmable gate array (FPGA) device with high fault tolerance capability. The method is applied to a neural-network-based wind-speed sensorless control of a small wind turbine system. The experimental results confirmed the validity of the developed stochastic FPGA implementation. The general design method can be extended to include other power electronics applications with different feed-forward neural network structures.
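
Stochastic computing trades logic for time: a value in [0, 1] becomes the probability that a bit in a pseudo-random stream is 1, so a multiplication reduces to a single AND gate applied bit by bit. The sketch below shows only that encoding and the multiply; the paper applies the same style of bit-serial logic to approximate the sigmoid activations, which is not reproduced here.

    # Unipolar stochastic encoding: a value p in [0, 1] is a bitstream whose
    # bits are 1 with probability p; multiplying two values is a bitwise AND.
    # This is why the FPGA implementation needs so little logic per operation.
    import random

    def encode(p, length, rng):
        return [1 if rng.random() < p else 0 for _ in range(length)]

    def decode(stream):
        return sum(stream) / len(stream)

    rng = random.Random(42)
    N = 10_000
    a, b = encode(0.8, N, rng), encode(0.5, N, rng)
    product = [x & y for x, y in zip(a, b)]
    print(decode(product))    # close to 0.8 * 0.5 = 0.4, within stochastic noise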

Journal ArticleDOI
TL;DR: A solution to improve the security of SRAM FPGAs through bitstream encryption that doesn't need any external battery to store the secret key and opens a new way of application partitioning according to the security policy.
Abstract: FPGAs are becoming increasingly attractive thanks to improvements in their capacity and performance. Today, FPGAs represent an efficient design solution for numerous systems. Moreover, since FPGAs are important to the electronics industry, it becomes necessary to improve their security, particularly for SRAM FPGAs, which are more vulnerable than other FPGA technologies. This paper proposes a solution to improve the security of SRAM FPGAs through flexible bitstream encryption. This proposition is distinct from other works because it uses the latest capabilities of SRAM FPGAs, such as partial dynamic reconfiguration and self-reconfiguration. It does not need an external battery to store the secret key, and it opens a new way of application partitioning guided by the security policy.
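
For context, bitstream encryption itself is ordinary symmetric encryption of the configuration data; the paper's contribution lies in how the key is stored and how decryption is performed on-chip using partial and self-reconfiguration rather than a battery-backed register. The generic sketch below (AES in CTR mode via the Python cryptography package) only shows what encrypting a bitstream involves; the cipher, mode, and library are choices made for the example, not ones prescribed by the paper.

    # Encrypt and decrypt a configuration bitstream with AES-CTR. Key storage
    # and on-chip decryption -- the actual subject of the paper -- are out of
    # scope here; the bitstream is random stand-in data.
    import os
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    key, nonce = os.urandom(32), os.urandom(16)
    bitstream = os.urandom(4096)                         # stand-in configuration data

    encryptor = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
    protected = encryptor.update(bitstream) + encryptor.finalize()

    decryptor = Cipher(algorithms.AES(key), modes.CTR(nonce)).decryptor()
    assert decryptor.update(protected) + decryptor.finalize() == bitstream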

Patent
20 Oct 2006
TL;DR: In this paper, a reconfigurable hardware element is reconfigured by the processor in response to a predetermined condition, such as the presence of a mixed tag population, proximity to an interferer, historical read rates, RF noise level, and reader location.
Abstract: Techniques for RFID communication adaptive to an environment are provided. An RFID reader includes a processor and a reconfigurable hardware element. The reconfigurable hardware element is reconfigurable by the processor in response to a predetermined condition. Non-volatile memory stores configuration code for the reconfigurable hardware element. In specific embodiments, the predetermined condition can be the presence of a mixed tag population, proximity to an interferer, historical read rates, RF noise level, and reader location, as well as other factors.

Journal ArticleDOI
TL;DR: The use of FPGAs, or field programmable gate arrays, is described for easily implementing a wide variety of neural models with the performance of custom analogue circuits or computer clusters, the reconfigurability of software, and a cost rivalling personal computers.
Abstract: As the complexity of neural models continues to increase (larger populations, varied ionic conductances, more detailed morphologies, etc) traditional software-based models have difficulty scaling to reach the performance levels desired. This paper describes the use of FPGAs, or field programmable gate arrays, to easily implement a wide variety of neural models with the performance of custom analogue circuits or computer clusters, the reconfigurability of software, and at a cost rivalling personal computers. FPGAs reach this level of performance by enabling the design of neural models as parallel processed data paths. These architectures provide for a wide range of single-compartment, multi-compartment and population models to be readily converted to FPGA implementations. Generalized architectures are described for the efficient modelling of a first-order, nonlinear differential equation in throughput maximizing or latency minimizing data-path configurations. The homogeneity of population and multicompartment models is exploited to form deep pipelines for improved performance. Limitations of FPGA architectures and future research areas are explored.

Proceedings ArticleDOI
24 Apr 2006
TL;DR: The authors recently added three advanced components: floating-point division, floating-point square root and floating-point accumulation to their library, which can be used to achieve more parallelism and less power dissipation than adhering to a standard format.
Abstract: Optimal reconfigurable hardware implementations may require the use of arbitrary floating-point formats that do not necessarily conform to IEEE specified sizes. We have previously presented a variable precision floating-point library for use with reconfigurable hardware. We recently added three advanced components: floating-point division, floating-point square root and floating-point accumulation to our library. These advanced components use algorithms that are well suited to FPGA implementations and exhibit a good tradeoff between area, latency and throughput. The floating-point format of our library is both general and flexible. All IEEE formats, including the 64-bit double-precision format, are a subset of our format. All previously published floating-point formats for reconfigurable hardware are a subset of our format as well. The generic floating-point format supported by all of our library components makes it easy and convenient to create a pipelined, custom data path with optimal bitwidth for each operation. Our library can be used to achieve more parallelism and less power dissipation than adhering to a standard format. To further increase parallelism and reduce power dissipation, our library also supports hybrid fixed- and floating-point operations in the same design. The division and square root designs are based on table lookup and Taylor series expansion, and make use of memories and multipliers embedded on the FPGA chip. The iterative accumulator utilizes the library addition module as well as buffering and control logic to achieve performance similar to that of the addition by itself. They are all fully pipelined designs with clock speed comparable to that of other library components, to aid the designer in implementing fast, complex, pipelined designs.
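
The idea behind the division design, a table lookup followed by a short Taylor correction, can be shown numerically: approximate the reciprocal of a normalised significand from a small table of 1/x0 values plus the first-order term. The table size and first-order truncation below are choices made for the example; the library's actual units also use the FPGA's embedded memories and multipliers and handle exponents and rounding, none of which is modelled here.

    # Reciprocal of a significand x in [1, 2) via table lookup plus a
    # first-order Taylor correction: 1/(x0 + d) ~= 1/x0 - d/x0^2.
    TABLE_BITS = 6
    STEP = 1.0 / (1 << TABLE_BITS)
    # table entry i holds 1/x0 for the midpoint x0 of interval i in [1, 2)
    recip_table = [1.0 / (1.0 + (i + 0.5) * STEP) for i in range(1 << TABLE_BITS)]

    def approx_recip(x):
        i = int((x - 1.0) / STEP)
        x0 = 1.0 + (i + 0.5) * STEP
        r0 = recip_table[i]
        return r0 - (x - x0) * r0 * r0     # first-order Taylor term

    worst = max(abs(approx_recip(1.0 + k / 4096.0) * (1.0 + k / 4096.0) - 1.0)
                for k in range(4096))
    print(f"worst relative error: {worst:.1e}")   # below 1e-4 with a 64-entry table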

Proceedings ArticleDOI
25 Apr 2006
TL;DR: This paper extends the bit-split string-matching architecture to satisfy the requirements of the IDS state-of-the-art, and shows that the architecture can be effectively optimized for FPGA implementation.
Abstract: The use of reconfigurable hardware for network security applications has recently made great strides as field-programmable gate array (FPGA) devices have provided larger and faster resources. The performance of an intrusion detection system is dependent on two metrics: throughput and the total number of patterns that can fit on a device. In this paper, we consider the FPGA implementation details of the bit-split string-matching architecture. The bit-split algorithm allows large hardware state machines to be converted into a form with much higher memory efficiency. We extend the architecture to satisfy the requirements of the IDS state-of-the-art. We show that the architecture can be effectively optimized for FPGA implementation. We have optimized the pattern memory system parameters and developed new interface hardware for communicating with an external controller. The overall performance (bandwidth * number of patterns) is competitive with other memory-based string matching architectures implemented in FPGA.

Proceedings ArticleDOI
05 Nov 2006
TL;DR: Two approaches are considered, one using a traditional CAD approach that does an initial characterization using synthesis to create an abstract problem model and then explores the solution space using a knapsack algorithm, and the other using a synthesis-in-the-loop exploration approach that improved speedups by an average of 20% when size constraints were tight.
Abstract: Soft-core microprocessors mapped onto field-programmable gate arrays (FPGAs) represent an increasingly common embedded software implementation option. Modern FPGA soft-cores are parameterized to support application-specific customization, wherein pre-defined units, such as a multiplication unit or floating-point unit, may be included in the microprocessor architecture to speed up software execution at the expense of increased size. We introduce a methodology for fast application-specific customization of a parameterized FPGA soft core, using synthesis and execution to obtain size and performance data in order to create a tool that can be used across a variety of tool platforms and FPGA devices. As synthesizing a soft core takes tens of minutes, developing heuristics that execute in an acceptable time of an hour or two, yet find near-optimal results, is a challenge. We consider two approaches, one using a traditional CAD approach that does an initial characterization using synthesis to create an abstract problem model and then explores the solution space using a knapsack algorithm, and the other using a synthesis-in-the-loop exploration approach. We compare approaches for a variety of design constraints, on 11 EEMBC benchmarks, using an actual Xilinx soft-core processor, and for two different commercial Xilinx FPGA devices. Our results show that the approaches can generate a customized configuration exhibiting roughly 2x speedups over a base soft core, reaching within 4% of optimal in about 1.5 hours, including complete synthesis of the soft-core onto the FPGA, compared to over 11 hours for exhaustive search. Our results also show that including synthesis-in-the-loop, compared to a traditional CAD approach, improved speedups by an average of 20% when size constraints were tight. The approaches may also be applicable to soft-core processors targeted to ASICs in addition to FPGAs.
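
The traditional-CAD variant reduces unit selection to a knapsack problem: each optional unit gets an area cost from a one-time synthesis characterization and an estimated benefit from profiling, and the tool picks the subset that fits the size budget. The toy version below treats benefits as additive cycle savings and uses invented unit names and numbers; the paper's characterization and benefit model are considerably more detailed.

    # 0/1 knapsack over optional soft-core units: maximise estimated cycle
    # savings subject to an FPGA area budget. Unit names, areas, and savings
    # are made-up characterization data for the example.
    def select_units(units, area_budget):
        best = {0: (0, [])}                    # area used -> (savings, chosen units)
        for name, area, savings in units:
            for used, (val, chosen) in sorted(best.items(), reverse=True):
                new_used = used + area
                if new_used <= area_budget and val + savings > best.get(new_used, (-1,))[0]:
                    best[new_used] = (val + savings, chosen + [name])
        return max(best.values())

    units = [                                  # (unit, area in LUTs, cycles saved)
        ("multiplier",     600,   900_000),
        ("barrel_shifter", 250,   150_000),
        ("divider",        900,   400_000),
        ("fpu",           3000, 2_000_000),
    ]
    savings, chosen = select_units(units, area_budget=3500)
    print(chosen, savings)                     # the best subset that fits the budget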

Proceedings ArticleDOI
24 Apr 2006
TL;DR: The hardware thread interface (HWTI) component provides an abstract, platform independent compilation target that enables thread and instruction-level parallelism across the software/hardware boundary.
Abstract: In this paper, we present hthreads, a unifying programming model for specifying application threads running within a hybrid CPU/FPGA system. Threads are specified from a single pthreads multithreaded application program and compiled to run on the CPU or synthesized to run on the FPGA. The hthreads system, in general, is unique within the reconfigurable computing community as it abstracts the CPU/FPGA components into a unified custom threaded multiprocessor architecture platform. To support the abstraction of the CPU/FPGA component boundary, we have created the hardware thread interface (HWTI) component that frees the designer from having to specify and embed platform specific instructions to form customized hardware/ software interactions. Instead, the hardware thread interface supports the generalized pthreads API semantics, and allows passing of abstract data types between hardware and software threads. Thus the hardware thread interface provides an abstract, platform independent compilation target that enables thread and instruction-level parallelism across the software/hardware boundary.