
Showing papers on "Reconfigurable computing" published in 2006


Proceedings ArticleDOI
01 Aug 2006
TL;DR: In this article, the authors describe architectural enhancements to Xilinx FPGAs that provide better support for the creation of dynamically reconfigurable designs, augmented by a new design methodology that uses pre-routed IP cores for communication between static and dynamic modules and permits static designs to route through regions otherwise reserved for dynamic modules.
Abstract: The paper describes architectural enhancements to Xilinx FPGAs that provide better support for the creation of dynamically reconfigurable designs. These are augmented by a new design methodology that uses pre-routed IP cores for communication between static and dynamic modules and permits static designs to route through regions otherwise reserved for dynamic modules. A new CAD tool flow to automate the methodology is also presented. The new tools initially target the Virtex-II, Virtex-II Pro and Virtex-4 families and are derived from Xilinx's commercial CAD tools.

308 citations


Proceedings ArticleDOI
22 Oct 2006
TL;DR: This paper presents a hw/sw codesign methodology based on BORPH, an operating system designed for FPGA-based reconfigurable computers (RC's).
Abstract: This paper presents a hw/sw codesign methodology based on BORPH, an operating system designed for FPGA-based reconfigurable computers (RC's). By providing native kernel support for FPGA hardware, BORPH offers a homogeneous UNIX interface for both software and hardware processes. Hardware processes inherit the same level of service from the kernel, such as file system support, as typical UNIX software processes. Hardware and software components of a design therefore run as hardware and software processes within BORPH's run-time environment. The familiar and language independent UNIX kernel interface facilitates easy design reuse and rapid application development. Performance of our current implementation and our experience with developing a real-time wireless digital signal processing system based on BORPH will be presented.

179 citations


Journal ArticleDOI
TL;DR: An overview of reconfigurable computing in embedded systems, in terms of benefits it can provide, how it has already been used, design issues, and hurdles that have slowed its adoption are presented.
Abstract: Over the past few years, the realm of embedded systems has expanded to include a wide variety of products, ranging from digital cameras, to sensor networks, to medical imaging systems. Consequently, engineers strive to create ever smaller and faster products, many of which have stringent power requirements. Coupled with increasing pressure to decrease costs and time-to-market, the design constraints of embedded systems pose a serious challenge to embedded systems designers. Reconfigurable hardware can provide a flexible and efficient platform for satisfying the area, performance, cost, and power requirements of many embedded systems. This article presents an overview of reconfigurable computing in embedded systems, in terms of benefits it can provide, how it has already been used, design issues, and hurdles that have slowed its adoption.

157 citations


Proceedings ArticleDOI
01 Dec 2006
TL;DR: A nondeterministic finite automata (NFA) based implementation was presented, which takes advantage of new basic building blocks to support more complex regular expressions than the previous approaches.
Abstract: Recent intrusion detection systems (IDS) use regular expressions instead of static patterns as a more efficient way to represent hazardous packet payload contents. This paper focuses on regular expression pattern matching engines implemented in reconfigurable hardware. A nondeterministic finite automata (NFA) based implementation was presented, which takes advantage of new basic building blocks to support more complex regular expressions than the previous approaches. The methodology is supported by a tool that automatically generates the circuitry for the given regular expressions, outputting VHDL representations ready for logic synthesis. Furthermore, techniques to reduce the area cost of our designs and maximize performance when targeting FPGAs were included. Experimental results show that our tool is able to generate a regular expression engine to match more than 500 IDS regular expressions (from the Snort ruleset) using only 25K logic cells and achieving 2 Gbps throughput on a Virtex-2 device and 2.9 Gbps on a Virtex-4 device. Concerning the throughput per area required per matching non-meta character, our design is 3.4 and 10 times more efficient than previous ASIC and FPGA approaches, respectively.
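
Hardware NFA engines of this kind typically map each NFA state to a flip-flop and advance every active state in parallel on each input character. The plain-Python sketch below illustrates only that parallel-update evaluation model, with a hand-written NFA for a single pattern; the transition table, pattern, and function name are invented for the example and are not taken from the paper's block generator.

    # Minimal NFA evaluation in the style of hardware regex engines:
    # every state is one bit of a flip-flop vector, and all active states
    # advance in parallel on each input character.
    def nfa_match(transitions, start, accept, text):
        """transitions maps (state, frozenset_of_chars) -> set of next states."""
        active = {start}
        for ch in text:
            nxt = set()
            for state in active:
                for (src, chars), dests in transitions.items():
                    if src == state and ch in chars:
                        nxt |= dests
            nxt.add(start)        # keep the start state live so a match can begin anywhere
            active = nxt
            if active & accept:
                return True
        return False

    # Hand-built NFA for the regular expression "ab+c":
    # 0 --a--> 1 --b--> 2 --b--> 2 --c--> 3 (accepting)
    trans = {
        (0, frozenset("a")): {1},
        (1, frozenset("b")): {2},
        (2, frozenset("b")): {2},
        (2, frozenset("c")): {3},
    }
    print(nfa_match(trans, start=0, accept={3}, text="xxabbbcyy"))  # True
    print(nfa_match(trans, start=0, accept={3}, text="xxacyy"))     # False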

146 citations


Proceedings ArticleDOI
14 Jun 2006
TL;DR: In this paper, the authors explore the implementation of MPC technology in reconfigurable hardware such as an FPGA chip and present a rapid prototyping environment suitable for exploring the various implementation issues to bring MPC onto a chip.
Abstract: With its natural ability to handle constraints, model predictive control (MPC) has become an established control technology in the petrochemical industry, and its use is currently being pioneered in an increasingly wide range of process industries. It is also being proposed for a range of higher bandwidth applications, such as ships, aerospace and road vehicles. To extend its applications to miniaturized devices and/or embedded systems, this paper explores the implementation of MPC technology in reconfigurable hardware such as an FPGA chip. A rapid prototyping environment suitable for exploring the various implementation issues to bring MPC onto a chip is described. Tests were conducted to verify the applicability of the "MPC on a chip" idea. It is shown that a modest FPGA chip could be used to implement a reasonably sized constrained MPC controller.
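
To give a sense of the arithmetic an embedded MPC controller repeats every sample, the sketch below builds the condensed prediction matrices for a small linear model, solves the resulting unconstrained quadratic cost, and clips the first input to its bounds. The plant model, horizon, weights, and the crude clipping (used here in place of a proper QP solver) are all assumptions made for illustration and do not describe the paper's FPGA design.

    # Toy receding-horizon MPC step for x(k+1) = A x(k) + B u(k):
    # condense the predictions, solve the unconstrained quadratic cost in
    # closed form, then clip the input (a stand-in for real constraint handling).
    import numpy as np

    def mpc_step(A, B, Q, R, x0, N, u_min, u_max):
        n, m = B.shape
        # Predicted states over the horizon: X = Phi x0 + Gamma U
        Phi = np.vstack([np.linalg.matrix_power(A, i + 1) for i in range(N)])
        Gamma = np.zeros((N * n, N * m))
        for i in range(N):
            for j in range(i + 1):
                Gamma[i*n:(i+1)*n, j*m:(j+1)*m] = np.linalg.matrix_power(A, i - j) @ B
        Qbar, Rbar = np.kron(np.eye(N), Q), np.kron(np.eye(N), R)
        # Minimise X'Qbar X + U'Rbar U  =>  (Gamma'Qbar Gamma + Rbar) U = -Gamma'Qbar Phi x0
        H = Gamma.T @ Qbar @ Gamma + Rbar
        f = Gamma.T @ Qbar @ Phi @ x0
        U = np.linalg.solve(H, -f)
        return np.clip(U[:m], u_min, u_max)      # apply only the first move

    A = np.array([[1.0, 0.1], [0.0, 1.0]])       # assumed double-integrator plant
    B = np.array([[0.005], [0.1]])
    u0 = mpc_step(A, B, Q=np.eye(2), R=0.1 * np.eye(1),
                  x0=np.array([1.0, 0.0]), N=10, u_min=-1.0, u_max=1.0)
    print(u0)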

131 citations


Proceedings ArticleDOI
20 Oct 2006
TL;DR: The initial investigation reveals that a hierarchical RF architecture, designed around a scalable interconnect, is instrumental in harnessing the benefits of spatial computation and can provide an order of magnitude improvement in energy-delay compared to an aggressive superscalar core on single-threaded workloads.
Abstract: Spatial Computing (SC) has been shown to be an energy-efficient model for implementing program kernels. In this paper we explore the feasibility of using SC for more than small kernels. To this end, we evaluate the performance and energy efficiency of entire applications on Tartan, a general-purpose architecture which integrates a reconfigurable fabric (RF) with a superscalar core. Our compiler automatically partitions and compiles an application into an instruction stream for the core and a configuration for the RF. We use a detailed simulator to capture both timing and energy numbers for all parts of the system. Our results indicate that a hierarchical RF architecture, designed around a scalable interconnect, is instrumental in harnessing the benefits of spatial computation. The interconnect uses static configuration and routing at the lower levels and a packet-switched, dynamically-routed network at the top level. Tartan is most energy-efficient when almost all of the application is mapped to the RF, indicating the need for the RF to support most general-purpose programming constructs. Our initial investigation reveals that such a system can provide, on average, an order of magnitude improvement in energy-delay compared to an aggressive superscalar core on single-threaded workloads.

97 citations


Journal ArticleDOI
TL;DR: The overall goal is to investigate biologically realistic models for the real-time control of robots operating within closed action-perception loops, and so the performance of the system is evaluated on a simulation of a cerebellum model in which emulating the temporal dynamics of the synaptic integration process is important.
Abstract: A computing platform is described for simulating arbitrary networks of spiking neurons in real time. A hybrid computing scheme is adopted that uses both software and hardware components to manage the tradeoff between flexibility and computational power; the neuron model is implemented in hardware and the network model and the learning are implemented in software. The incremental transition of the software components into hardware is supported. We focus on a spike response model (SRM) for a neuron where the synapses are modeled as input-driven conductances. The temporal dynamics of the synaptic integration process are modeled with a synaptic time constant that results in a gradual injection of charge. This type of model is computationally expensive and is not easily amenable to existing software-based event-driven approaches. As an alternative we have designed an efficient time-based computing architecture in hardware, where the different stages of the neuron model are processed in parallel. Further improvements occur by computing multiple neurons in parallel using multiple processing units. This design is tested using reconfigurable hardware and its scalability and performance evaluated. Our overall goal is to investigate biologically realistic models for the real-time control of robots operating within closed action-perception loops, and so we evaluate the performance of the system on simulating a model of the cerebellum where the emulation of the temporal dynamics of the synaptic integration process is important.
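
A discrete-time sketch of the synaptic dynamics described above makes it clear why the model is expensive for event-driven simulators: each input spike increments a conductance that then decays with its own time constant, so charge is injected gradually over many time steps rather than instantaneously. The constants and spike times below are arbitrary illustration values, not parameters taken from the paper.

    # Leaky integrator driven by an exponentially decaying synaptic conductance
    # (tau_syn), so every presynaptic spike injects charge gradually.
    import math

    dt, tau_m, tau_syn = 0.1, 20.0, 5.0     # ms; assumed step and time constants
    v_thresh, w = 1.0, 0.4                  # firing threshold and synaptic weight (assumed)
    decay_m, decay_s = math.exp(-dt / tau_m), math.exp(-dt / tau_syn)

    v, g = 0.0, 0.0
    input_spikes = {10, 12, 14, 40}         # time steps with presynaptic spikes
    for t in range(100):
        if t in input_spikes:
            g += w                          # spike increments the conductance
        v = v * decay_m + g * dt            # membrane integrates the synaptic current
        g *= decay_s                        # conductance decays toward zero
        if v >= v_thresh:
            print(f"output spike at step {t}")
            v = 0.0                         # reset after firing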

94 citations


Proceedings ArticleDOI
01 Aug 2006
TL;DR: Two variants of CoNoChi are presented: one is based on a homogeneous hardware structure that is dynamically reconfigurable at the logic block level, and the other one is adapted to the limited partial reconfiguration capabilities of Xilinx Virtex-II (Pro) FPGAs.
Abstract: This paper presents CoNoChi, an adaptable Network-on-Chip for dynamically reconfigurable hardware designs. CoNoChi is designed to take advantage of the partial dynamic reconfiguration capabilities of modern FPGAs and applies this feature to adapt the network structure to the location, number and size of currently configured hardware modules. The network consists of the minimal number of switches required. Switches can be added to or removed from the network by a global control instance at runtime. Compared to common fixed Network-on-Chip structures, the CoNoChi architecture reduces the area requirements and latency of the network and eases the online placement of hardware modules. Two variants of CoNoChi are presented: one is based on a homogeneous hardware structure that is dynamically reconfigurable at the logic block level, and the other one is adapted to the limited partial reconfiguration capabilities of Xilinx Virtex-II (Pro) FPGAs.

91 citations


Book ChapterDOI
03 Oct 2006
TL;DR: In this paper, a hardware implementation of a neural network on FPGAs is presented; the digital system architecture is described in the Very High Speed Integrated Circuit Hardware Description Language (VHDL) and implemented on an FPGA chip.
Abstract: Using FPGAs (field-programmable gate arrays) for neural network implementation provides flexibility in programmable systems. For neural-network-based instrument prototypes in real-time applications, conventional application-specific VLSI neural chip design suffers from limitations in time and cost. With low-precision artificial neural network designs, FPGAs offer higher speed and smaller size for real-time applications than VLSI designs. In addition, FPGA-based artificial neural networks have achieved fair results in classification applications. The programmability of reconfigurable FPGAs makes fast special-purpose hardware available for a wide range of applications. This programmability also makes it possible to explore new neural network algorithms and problems of a scale that would not be feasible with a conventional processor. The goal of this work is to realize the hardware implementation of a neural network using FPGAs. The digital system architecture is described in the Very High Speed Integrated Circuit Hardware Description Language (VHDL) and implemented on an FPGA chip. The design was tested on an FPGA demo board.

89 citations


Journal ArticleDOI
TL;DR: It is experimentally shown that the multibit routing architecture can achieve 14% routing area reduction for implementing datapath circuits, which represents an overall FPGA area savings of 10%.
Abstract: As the logic capacity of field-programmable gate arrays (FPGAs) increases, they are increasingly being used to implement large arithmetic-intensive applications, which often contain a large proportion of datapath circuits. Since datapath circuits usually consist of regularly structured components (called bit-slices) which are connected together by regularly structured signals (called buses), it is possible to utilize datapath regularity in order to achieve significant area savings through FPGA architectural innovations. This paper describes such an FPGA routing architecture, called the multibit routing architecture, which employs bus-based connections in order to exploit datapath regularity. It is experimentally shown that, compared to conventional FPGA routing architectures, the multibit routing architecture can achieve 14% routing area reduction for implementing datapath circuits, which represents an overall FPGA area savings of 10%. This paper also empirically determines the best values of several important architectural parameters for the new routing architecture including the most area efficient granularity values and the most area efficient proportion of bus-based connections.

83 citations


Proceedings ArticleDOI
03 May 2006
TL;DR: The REPLICA2Pro (Relocation per online Configuration Alteration in Virtex-2/-Pro) filter is developed, which is capable of performing task relocations by manipulating the task's bitstream during the regular allocation process without any extra time overhead.
Abstract: One vision of dynamic hardware reconfiguration is to deliver virtually unlimited hardware resources to a set of hardware tasks implementing arbitrary functions. By using partial reconfiguration, these tasks can be allocated and de-allocated on the reconfigurable architecture while others continue to operate. However, the exact placement of each task can only be determined during runtime according to the current resource allocation. This requires relocating each task from its original position after place and route to an area of available resources. The process of relocating tasks can result in a major time overhead. In order to solve this problem we have developed the REPLICA2Pro (Relocation per online Configuration Alteration in Virtex-2/-Pro) filter, which is capable of performing task relocations by manipulating the task's bitstream during the regular allocation process without any extra time overhead. The filter architecture, our reconfigurable system approach as well as our design flow and an experimental system setup are presented in this paper.

Proceedings ArticleDOI
24 Apr 2006
TL;DR: The rationale in choosing the development path taken and the general framework for porting an existing scientific code, such as NAMD, to the SRC-6 platform are presented and discussed in detail and the results are applicable to the large class of problems in scientific computing.
Abstract: This case study presents the results of porting a production scientific code, called NAMD, to the SRC-6 high-performance reconfigurable computing platform based on Field Programmable Gate Array (FPGA) technology. NAMD is a molecular dynamics code designed to run on large supercomputing systems and used extensively by the computational biophysics community. NAMD's computational kernel is highly optimized to run on conventional von Neumann processors; this presents numerous challenges to its reimplementation on FPGA architecture. This paper presents an overview of the SRC-6 architecture and the NAMD application and then discusses the challenges, solutions, and results of the porting effort. The rationale in choosing the development path taken and the general framework for porting an existing scientific code, such as NAMD, to the SRC-6 platform are presented and discussed in detail. The results and methods presented in this paper are applicable to the large class of problems in scientific computing.


Book
20 Oct 2006
TL;DR: This chapter discusses the development of EHW-Based Fault Recovery for Online Systems, which involves designing Self-Adaptive Systems and quantifying Intrinsic Reconfiguration Time.
Abstract: PREFACE. ACKNOWLEDGMENTS. ACRONYMS. 1 INTRODUCTION. 1.1 Characteristics of Evolvable Circuits and Systems. 1.2 Why Evolvable Hardware Is Good (and Bad!). 1.3 Technology. 1.4 Evolvable Hardware vs. Evolved Hardware. 1.5 Intrinsic vs. Extrinsic Evolution. 1.6 Online vs. Offline Evolution. 1.7 Evolvable Hardware Applications. References. 2 FUNDAMENTALS OF EVOLUTIONARY COMPUTATION. 2.1 What Is an EA? 2.2 Components of an EA. 2.2.1 Representation. 2.2.2 Variation. 2.2.3 Evaluation. 2.2.4 Selection. 2.2.5 Population. 2.2.6 Termination Criteria. 2.3 Getting the EA to Work. 2.4 Which EA Is Best? References. 3 RECONFIGURABLE DIGITAL DEVICES. 3.1 Basic Architectures. 3.1.1 Programmable Logic Devices. 3.1.2 Field Programmable Gate Array. 3.2 Using Reconfigurable Hardware. 3.2.1 Design Phase. 3.2.2 Execution Phase. 3.3 Experimental Results. 3.4 Functional Overview of the POEtic Architecture. 3.4.1 Organic Subsystem. 3.4.2 Description of the Molecules. 3.4.3 Description of the Routing Layer. 3.4.4 Dynamic Routing. 3.5 Remarks. References. 4 RECONFIGURABLE ANALOG DEVICES. 4.1 Basic Architectures. 4.2 Transistor Arrays. 4.2.1 The NASA FTPA. 4.2.2 The Heidelberg FPTA. 4.3 Analog Arrays. 4.4 Remarks. References. 5 PUTTING EVOLVABLE HARDWARE TO USE. 5.1 Synthesis vs. Adaption. 5.2 Designing Self-Adaptive Systems. 5.2.1 Fault Tolerant Systems. 5.2.2 Real-Time Systems. 5.3 Creating Fault Tolerant Systems Using EHW. 5.4 Why Intrinsic Reconfiguration for Online Systems? 5.5 Quantifying Intrinsic Reconfiguration Time. 5.6 Putting Theory Into Practice. 5.6.1 Minimizing Risk With Anticipated Faults. 5.6.2 Minimizing Risk With Unanticipated Faults. 5.6.3 Suggested Practices. 5.7 Examples of EHW-Based Fault Recovery. 5.7.1 Population vs. Fitness-Based Designs. 5.7.2 EHW Compensators. 5.7.3 Robot Control. 5.7.4 The POEtic Project. 5.7.5 Embryo Development. 5.8 Remarks. References. 6 FUTURE WORK. 6.1 Circuit Synthesis Topics. 6.1.1 Digital Design. 6.1.2 Analog Design. 6.2 Circuit Adaption Topics. References. INDEX . ABOUT THE AUTHORS.

Journal ArticleDOI
TL;DR: Results and case studies with optimizations that are: on the gate level, Kasumi and International Data Encryption Algorithm encryptions; on the arithmetic level, redundant addition and multiplication function evaluation for two-dimensional rotation; and on the architecture level, Wavelet and Lempel-Ziv (LZ)-like compression are presented.
Abstract: A stream compiler (ASC) for computing with field programmable gate arrays (FPGAs) emerges from the ambition to bridge the hardware-design productivity gap, where the number of available transistors grows more rapidly than the productivity of very large scale integration (VLSI) and FPGA computer-aided-design (CAD) tools. ASC addresses this problem with a software-like programming interface to hardware design (FPGAs) while maintaining the performance of hand-designed circuits. ASC improves productivity by letting the programmer optimize the implementation on the algorithm level, the architecture level, the arithmetic level, and the gate level, all within the same C++ program. The increased productivity of ASC is applied to the hardware acceleration of a wide range of applications. Traditionally, hardware accelerators are tediously handcrafted to achieve top performance. ASC simplifies design-space exploration of hardware accelerators by transforming the hardware-design task into a software-design process, using only "GNU compiler collection (GCC)" and "make" to obtain a hardware netlist. From experience, the hardware-design productivity and ease of use are close to pure software development. This paper presents results and case studies with optimizations that are: 1) on the gate level, Kasumi and International Data Encryption Algorithm (IDEA) encryptions; 2) on the arithmetic level, redundant addition and multiplication function evaluation for two-dimensional (2-D) rotation; and 3) on the architecture level, Wavelet and Lempel-Ziv (LZ)-like compression.

Journal ArticleDOI
TL;DR: To study single event upset (SEU) impact on signal processing applications, a novel fault injection technique to corrupt configuration bits is used, thereby simulating SEU faults and highlighting the benefits of dynamic reconfiguration for space-based reconfigurable computing.
Abstract: This paper describes novel methods of exploiting the partial, dynamic reconfiguration capabilities of Xilinx Virtex V1000 FPGAs to manage Single-Event Upset (SEU) faults owing to radiation in space environments. The on-orbit fault detection scheme uses radiation-hardened reconfiguration controllers to continuously monitor the configuration bitstreams of nine Virtex FPGAs and to correct errors by partial, dynamic reconfiguration of the FPGAs while they continue to execute. To study the SEU impact on our signal processing applications, we use a novel fault injection technique to corrupt configuration bits, thereby simulating SEU faults. By using dynamic reconfiguration, we can run the corrupted designs directly on the FPGA hardware, giving many orders of magnitude speed-up over purely software techniques. The fault injection method has been validated against proton beam testing, showing 97.6% agreement. Our work highlights the benefits of dynamic reconfiguration for space-based reconfigurable computing.
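
Stripped of the platform specifics, the core of such a fault-injection technique is flipping a single configuration bit and then observing how the design behaves. A toy sketch of that step follows (the readback image is faked here, and writing the corrupted frame back through partial reconfiguration, as the paper's flow does, is omitted):

    # Emulate a single-event upset by flipping one randomly chosen bit in a
    # configuration bitstream image held in memory. A real flow would write the
    # corrupted frame back to the FPGA via partial, dynamic reconfiguration and
    # compare the design's outputs against a golden run.
    import random

    def inject_seu(bitstream: bytearray, rng=random.Random(0)):
        bit = rng.randrange(len(bitstream) * 8)
        byte_index, bit_index = divmod(bit, 8)
        bitstream[byte_index] ^= 1 << bit_index          # the simulated upset
        return byte_index, bit_index

    config = bytearray(1024)                             # stand-in for a readback image
    print("flipped bit", inject_seu(config))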

Proceedings ArticleDOI
24 Apr 2006
TL;DR: This paper describes how to partition the application between software and hardware and then model the performance of several alternatives for the task mapped to hardware, and demonstrates an implementation of one of these alternatives on a reconfigurable computer that achieves a 2 times speed-up over the software baseline.
Abstract: With advances in reconfigurable hardware, especially field-programmable gate arrays (FPGAs), it has become possible to use reconfigurable hardware to accelerate complex applications, such as those in scientific computing. There has been a resulting development of reconfigurable computers - computers which have both general purpose processors and reconfigurable hardware, as well as memory and high-performance interconnection networks. In this paper, we study the acceleration of molecular dynamics simulations using reconfigurable computers. We describe how we partition the application between software and hardware and then model the performance of several alternatives for the task mapped to hardware. We describe an implementation of one of these alternatives on a reconfigurable computer and demonstrate that for two real-world simulations, it achieves a 2 times speed-up over the software baseline. We then compare our design and results to those of prior efforts and explain the advantages of the hardware/software approach, including flexibility.
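
The partition-then-model step follows the usual Amdahl-style reasoning: only the fraction of run time mapped to hardware is accelerated, and communication adds some cost back. A back-of-the-envelope version of that model follows; the numbers are invented for illustration, not the paper's measurements.

    # Amdahl-style estimate for a hardware/software partition: a fraction f of
    # the software run time is mapped to the FPGA and sped up by a factor k,
    # with a fixed communication overhead c (as a fraction of the original time).
    def hw_sw_speedup(f, k, c=0.0):
        return 1.0 / ((1.0 - f) + f / k + c)

    # e.g. 60% of the time in the accelerated kernel, 10x faster in hardware,
    # 5% of the original run time spent moving data over the interconnect
    print(round(hw_sw_speedup(f=0.60, k=10.0, c=0.05), 2))   # ~2x overall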

Journal ArticleDOI
TL;DR: The final aim is the comparison of the methodologies applied in the two abstraction levels for designing hardware MLP’s or similar computational structures.

Proceedings ArticleDOI
25 Apr 2006
TL;DR: This paper proposes a highly parallel FPGA design for the Floyd-Warshall algorithm to solve the all-pairs shortest-paths problem in a directed graph to maximize parallelism in the presence of significant data dependences.
Abstract: With rapid advances in VLSI technology, field programmable gate arrays (FPGAs) are receiving the attention of the parallel and high performance computing community. In this paper, we propose a highly parallel FPGA design for the Floyd-Warshall algorithm to solve the all-pairs shortest-paths problem in a directed graph. Our work is motivated by a computationally intensive bio-informatics application that employs this algorithm. The design we propose makes efficient and maximal utilization of the large amount of resources available on an FPGA to maximize parallelism in the presence of significant data dependences. Experimental results from a working FPGA implementation on the Cray XD1 show a speedup of 22 over execution on the XD1's processor.
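
For reference, this is the Floyd-Warshall recurrence the design parallelises, in its plain software form. Only the outer k loop carries a true data dependence; for a fixed k, every (i, j) update is independent, which is the parallelism the FPGA implementation exploits.

    # Floyd-Warshall all-pairs shortest paths: dist[i][j] is relaxed through
    # every intermediate vertex k. The i and j loops for a fixed k can run
    # in parallel, which is what the hardware design takes advantage of.
    INF = float("inf")

    def floyd_warshall(dist):
        n = len(dist)
        for k in range(n):
            for i in range(n):
                for j in range(n):
                    if dist[i][k] + dist[k][j] < dist[i][j]:
                        dist[i][j] = dist[i][k] + dist[k][j]
        return dist

    graph = [[0, 3, INF, 7],
             [8, 0, 2, INF],
             [5, INF, 0, 1],
             [2, INF, INF, 0]]
    print(floyd_warshall(graph))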

Proceedings ArticleDOI
24 Apr 2006
TL;DR: An architecture for a scalable computing machine built entirely using FPGA computing nodes that enables designers to implement large-scale computing applications using a heterogeneous combination of hardware accelerators and embedded microprocessors spread across many FPGAs, all interconnected by a flexible communication network is proposed.
Abstract: It has been shown that a small number of FPGAs can significantly accelerate certain computing tasks by up to two or three orders of magnitude. However, particularly intensive large-scale computing applications, such as molecular dynamics simulations of biological systems, underscore the need for even greater speedups to address relevant length and time scales. In this work, we propose an architecture for a scalable computing machine built entirely using FPGA computing nodes. The machine enables designers to implement large-scale computing applications using a heterogeneous combination of hardware accelerators and embedded microprocessors spread across many FPGAs, all interconnected by a flexible communication network. Parallelism at multiple levels of granularity within an application can be exploited to obtain the maximum computational throughput. By focusing on applications that exhibit a high computation-to-communication ratio, we narrow the extent of this investigation to the development of a suitable communication infrastructure for our machine, as well as an appropriate programming model and design flow for implementing applications. By providing a simple, abstracted communication interface with the objective of being able to scale to thousands of FPGA nodes, the proposed architecture appears to the programmer as a unified, extensible FPGA fabric. A programming model based on the MPI message-passing standard is also presented as a means for partitioning an application into independent computing tasks that can be implemented on our architecture. Finally, we demonstrate the first use of our design flow by developing a simple molecular dynamics simulation application for the proposed machine, which runs on a small platform of development boards.
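
The programming model borrows what MPI programs already do: decompose the work into ranks that communicate by explicit messages. The conventional mpi4py sketch below shows that style on ordinary processors; in the proposed machine, each rank would instead correspond to a hardware accelerator or embedded microprocessor on an FPGA node, and the toy pairwise computation merely stands in for a real molecular dynamics kernel.

    # MPI-style decomposition of a toy pairwise interaction: each rank owns a
    # slice of the particles, computes its partial result, and everything is
    # reduced at rank 0. Run with e.g. `mpiexec -n 4 python toy_md.py`.
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    n = 64
    positions = np.arange(n, dtype=np.float64)          # toy 1-D "coordinates"
    lo, hi = rank * n // size, (rank + 1) * n // size   # this rank's slice

    local = np.zeros(n)
    for i in range(lo, hi):
        for j in range(n):
            if i != j:
                local[i] += 1.0 / (positions[i] - positions[j])

    total = np.zeros(n)
    comm.Reduce(local, total, op=MPI.SUM, root=0)
    if rank == 0:
        print("combined result for", n, "particles across", size, "ranks")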

Book ChapterDOI
01 Jan 2006
TL;DR: This chapter introduces the VHSIC Hardware Description Language (VHDL) and Verilog, together with the context for their use in the development of modern real-time systems.
Abstract: Reconfigurable hardware is becoming available with large enough capacity and performance to support complete systems. Field-programmable gate arrays (FPGAs) and programmable logic devices (PLDs) can be configured using software techniques that are not unfamiliar to software engineers working in the field of real-time systems. This chapter introduces the VHSIC Hardware Description Language (VHDL) and Verilog, and the context for their use in the development of modern real-time systems. Systems-on-chip, using reconfigurable hardware in the form of FPGAs or complex programmable logic devices (CPLDs), are becoming a popular form for embedded systems, often because of their reduced power requirements. This approach demands an integrated approach to hardware and software development if it is to be successful, and opportunities for all kinds of new applications will emerge using this cost-effective technology. The two most common hardware specification languages are VHDL in Europe and Verilog in the US.

Proceedings ArticleDOI
01 Aug 2006
TL;DR: BORPH, an operating system framework for FPGA-based reconfigurable computers with a goal to ease and accelerate development of high-level applications to run on these computers, provides kernel support for FPGA resources by extending a standard Linux operating system.
Abstract: Advances in FPGA-based reconfigurable computers have made them a viable computing platform for a vast variety of computation-demanding areas such as bioinformatics, speech recognition, and high-end digital signal processing. The lack of common, intuitive operating system support, however, hinders their wide deployment. This paper presents BORPH, an operating system framework for FPGA-based reconfigurable computers with the goal of easing and accelerating the development of high-level applications to run on these computers. It provides kernel support for FPGA resources by extending a standard Linux operating system. Users therefore compile and execute hardware processes on FPGA resources the same way they run software processes on conventional processor-based systems. The operating system offers run-time general file system support to hardware processes as if they were software. Furthermore, a virtual file system is built to allow access to memories and registers defined in the FPGA, which provides communication links with running hardware processes. Increased productivity has been observed for high-level application developers with little previous hardware design experience when implementing complex mixed software/hardware designs on an FPGA-based reconfigurable computer running BORPH.
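
Conceptually, a hardware process under such a kernel is driven the same way a software process would be: through ordinary file I/O. The sketch below only illustrates that usage style; the mount point and register names are hypothetical placeholders invented for the example, not BORPH's actual file system layout.

    # "Hardware process as a UNIX process": the design's registers appear as
    # files, so plain reads and writes configure and monitor it. Paths and
    # register names here are made up for illustration.
    import os

    HW_PROC_DIR = "/proc/12345/hw"       # hypothetical location of one hardware process

    def write_reg(name, value):
        with open(os.path.join(HW_PROC_DIR, name), "w") as reg:
            reg.write(str(value))

    def read_reg(name):
        with open(os.path.join(HW_PROC_DIR, name)) as reg:
            return int(reg.read())

    if os.path.isdir(HW_PROC_DIR):       # only meaningful where such a process is mounted
        write_reg("decimation", 8)
        write_reg("start", 1)
        print("samples processed:", read_reg("sample_count"))
    else:
        print("no hardware process mounted at", HW_PROC_DIR)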

Journal ArticleDOI
TL;DR: This letter presents a reconfigurable hardware implementation of feed-forward neural networks using stochastic techniques to approximate the nonlinear sigmoid activation functions with reduced digital logic resources on an FPGA device with high fault tolerance capability.
Abstract: This letter presents a reconfigurable hardware implementation of feed-forward neural networks using stochastic techniques. The design is based on stochastic computation theory to approximate the nonlinear sigmoid activation functions with reduced digital logic resources. The large parallel neural network structure is then implemented on a reconfigurable field-programmable gate array (FPGA) device with high fault tolerance capability. The method is applied to a neural-network-based wind-speed sensorless control of a small wind turbine system. The experimental results confirmed the validity of the developed stochastic FPGA implementation. The general design method can be extended to include other power electronics applications with different feed-forward neural network structures.
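
Stochastic computing trades logic for time: a value in [0, 1] becomes the probability that a bit in a pseudo-random stream is 1, so a multiplication reduces to a single AND gate applied bit by bit. The sketch below shows only that encoding and the multiply; the paper applies the same style of bit-serial logic to approximate the sigmoid activations, which is not reproduced here.

    # Unipolar stochastic encoding: a value p in [0, 1] is a bitstream whose
    # bits are 1 with probability p; multiplying two values is a bitwise AND.
    # This is why the FPGA implementation needs so little logic per operation.
    import random

    def encode(p, length, rng):
        return [1 if rng.random() < p else 0 for _ in range(length)]

    def decode(stream):
        return sum(stream) / len(stream)

    rng = random.Random(42)
    N = 10_000
    a, b = encode(0.8, N, rng), encode(0.5, N, rng)
    product = [x & y for x, y in zip(a, b)]
    print(decode(product))    # close to 0.8 * 0.5 = 0.4, within stochastic noise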

Journal ArticleDOI
TL;DR: A solution to improve the security of SRAM FPGAs through bitstream encryption that doesn't need any external battery to store the secret key and opens a new way of application partitioning according to the security policy.
Abstract: FPGAs are becoming increasingly attractive thanks to improvements in their capacity and performance. Today, FPGAs represent an efficient design solution for numerous systems. Moreover, since FPGAs are important to the electronics industry, it becomes necessary to improve their security, particularly for SRAM FPGAs, which are more vulnerable than other FPGA technologies. This paper proposes a solution to improve the security of SRAM FPGAs through flexible bitstream encryption. This proposition is distinct from other works because it uses the latest capabilities of SRAM FPGAs, such as partial dynamic reconfiguration and self-reconfiguration. It does not need an external battery to store the secret key, and it opens a new way of application partitioning guided by the security policy.
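
For context, bitstream encryption itself is ordinary symmetric encryption of the configuration data; the paper's contribution lies in how the key is stored and how decryption is performed on-chip using partial and self-reconfiguration rather than a battery-backed register. The generic sketch below (AES in CTR mode via the Python cryptography package) only shows what encrypting a bitstream involves; the cipher, mode, and library are choices made for the example, not ones prescribed by the paper.

    # Encrypt and decrypt a configuration bitstream with AES-CTR. Key storage
    # and on-chip decryption -- the actual subject of the paper -- are out of
    # scope here; the bitstream is random stand-in data.
    import os
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    key, nonce = os.urandom(32), os.urandom(16)
    bitstream = os.urandom(4096)                         # stand-in configuration data

    encryptor = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
    protected = encryptor.update(bitstream) + encryptor.finalize()

    decryptor = Cipher(algorithms.AES(key), modes.CTR(nonce)).decryptor()
    assert decryptor.update(protected) + decryptor.finalize() == bitstream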

Patent
20 Oct 2006
TL;DR: In this paper, a reconfigurable hardware element is reconfigured by the processor in response to a predetermined condition, such as the presence of a mixed tag population, proximity to an interferer, historical read rates, RF noise level, and reader location.
Abstract: Techniques for RFID communication adaptive to an environment are provided. An RFID reader includes a processor and a reconfigurable hardware element. The reconfigurable hardware element is reconfigurable by the processor in response to a predetermined condition. Non-volatile memory stores configuration code for the reconfigurable hardware element. In specific embodiments, the predetermined condition can be the presence of a mixed tag population, proximity to an interferer, historical read rates, RF noise level, and reader location, as well as other factors.

Journal ArticleDOI
TL;DR: The use of FPGAs, or field programmable gate arrays, is described for easily implementing a wide variety of neural models with the performance of custom analogue circuits or computer clusters, the reconfigurability of software, and a cost rivalling personal computers.
Abstract: As the complexity of neural models continues to increase (larger populations, varied ionic conductances, more detailed morphologies, etc) traditional software-based models have difficulty scaling to reach the performance levels desired. This paper describes the use of FPGAs, or field programmable gate arrays, to easily implement a wide variety of neural models with the performance of custom analogue circuits or computer clusters, the reconfigurability of software, and at a cost rivalling personal computers. FPGAs reach this level of performance by enabling the design of neural models as parallel processed data paths. These architectures provide for a wide range of single-compartment, multi-compartment and population models to be readily converted to FPGA implementations. Generalized architectures are described for the efficient modelling of a first-order, nonlinear differential equation in throughput maximizing or latency minimizing data-path configurations. The homogeneity of population and multicompartment models is exploited to form deep pipelines for improved performance. Limitations of FPGA architectures and future research areas are explored.

Proceedings ArticleDOI
24 Apr 2006
TL;DR: The authors recently added three advanced components: floating-point division, floating-point square root and floating-point accumulation to their library, which can be used to achieve more parallelism and less power dissipation than adhering to a standard format.
Abstract: Optimal reconfigurable hardware implementations may require the use of arbitrary floating-point formats that do not necessarily conform to IEEE specified sizes. We have previously presented a variable precision floating-point library for use with reconfigurable hardware. We recently added three advanced components: floating-point division, floating-point square root and floating-point accumulation to our library. These advanced components use algorithms that are well suited to FPGA implementations and exhibit a good tradeoff between area, latency and throughput. The floating-point format of our library is both general and flexible. All IEEE formats, including the 64-bit double-precision format, are a subset of our format. All previously published floating-point formats for reconfigurable hardware are a subset of our format as well. The generic floating-point format supported by all of our library components makes it easy and convenient to create a pipelined, custom data path with optimal bitwidth for each operation. Our library can be used to achieve more parallelism and less power dissipation than adhering to a standard format. To further increase parallelism and reduce power dissipation, our library also supports hybrid fixed- and floating-point operations in the same design. The division and square root designs are based on table lookup and Taylor series expansion, and make use of memories and multipliers embedded on the FPGA chip. The iterative accumulator utilizes the library addition module as well as buffering and control logic to achieve performance similar to that of the addition by itself. They are all fully pipelined designs with clock speed comparable to that of other library components, to aid the designer in implementing fast, complex, pipelined designs.
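
The idea behind the division design, a table lookup followed by a short Taylor correction, can be shown numerically: approximate the reciprocal of a normalised significand from a small table of 1/x0 values plus the first-order term. The table size and first-order truncation below are choices made for the example; the library's actual units also use the FPGA's embedded memories and multipliers and handle exponents and rounding, none of which is modelled here.

    # Reciprocal of a significand x in [1, 2) via table lookup plus a
    # first-order Taylor correction: 1/(x0 + d) ~= 1/x0 - d/x0^2.
    TABLE_BITS = 6
    STEP = 1.0 / (1 << TABLE_BITS)
    # table entry i holds 1/x0 for the midpoint x0 of interval i in [1, 2)
    recip_table = [1.0 / (1.0 + (i + 0.5) * STEP) for i in range(1 << TABLE_BITS)]

    def approx_recip(x):
        i = int((x - 1.0) / STEP)
        x0 = 1.0 + (i + 0.5) * STEP
        r0 = recip_table[i]
        return r0 - (x - x0) * r0 * r0     # first-order Taylor term

    worst = max(abs(approx_recip(1.0 + k / 4096.0) * (1.0 + k / 4096.0) - 1.0)
                for k in range(4096))
    print(f"worst relative error: {worst:.1e}")   # below 1e-4 with a 64-entry table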

Proceedings ArticleDOI
25 Apr 2006
TL;DR: This paper extends the bit-split string-matching architecture to satisfy the requirements of the IDS state-of-the-art, and shows that the architecture can be effectively optimized for FPGA implementation.
Abstract: The use of reconfigurable hardware for network security applications has recently made great strides as field-programmable gate array (FPGA) devices have provided larger and faster resources. The performance of an intrusion detection system is dependent on two metrics: throughput and the total number of patterns that can fit on a device. In this paper, we consider the FPGA implementation details of the bit-split string-matching architecture. The bit-split algorithm allows large hardware state machines to be converted into a form with much higher memory efficiency. We extend the architecture to satisfy the requirements of the IDS state-of-the-art. We show that the architecture can be effectively optimized for FPGA implementation. We have optimized the pattern memory system parameters and developed new interface hardware for communicating with an external controller. The overall performance (bandwidth * number of patterns) is competitive with other memory-based string matching architectures implemented in FPGA.

Proceedings ArticleDOI
05 Nov 2006
TL;DR: Two approaches are considered, one using a traditional CAD approach that does an initial characterization using synthesis to create an abstract problem model and then explores the solution space using a knapsack algorithm, and the other using a synthesis-in-the-loop exploration approach that improved speedups by an average of 20% when size constraints were tight.
Abstract: Soft-core microprocessors mapped onto field-programmable gate arrays (FPGAs) represent an increasingly common embedded software implementation option. Modern FPGA soft-cores are parameterized to support application-specific customization, wherein pre-defined units, such as a multiplication unit or floating-point unit, may be included in the microprocessor architecture to speed up software execution at the expense of increased size. We introduce a methodology for fast application-specific customization of a parameterized FPGA soft core, using synthesis and execution to obtain size and performance data in order to create a tool that can be used across a variety of tool platforms and FPGA devices. As synthesizing a soft core takes tens of minutes, developing heuristics that execute in an acceptable time of an hour or two, yet find near-optimal results, is a challenge. We consider two approaches, one using a traditional CAD approach that does an initial characterization using synthesis to create an abstract problem model and then explores the solution space using a knapsack algorithm, and the other using a synthesis-in-the-loop exploration approach. We compare approaches for a variety of design constraints, on 11 EEMBC benchmarks, using an actual Xilinx soft-core processor, and for two different commercial Xilinx FPGA devices. Our results show that the approaches can generate a customized configuration exhibiting roughly 2x speedups over a base soft core, reaching within 4% of optimal in about 1.5 hours, including complete synthesis of the soft-core onto the FPGA, compared to over 11 hours for exhaustive search. Our results also show that including synthesis-in-the-loop, compared to a traditional CAD approach, improved speedups by an average of 20% when size constraints were tight. The approaches may also be applicable to soft-core processors targeted to ASICs in addition to FPGAs.
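
The traditional-CAD variant reduces unit selection to a knapsack problem: each optional unit gets an area cost from a one-time synthesis characterization and an estimated benefit from profiling, and the tool picks the subset that fits the size budget. The toy version below treats benefits as additive cycle savings and uses invented unit names and numbers; the paper's characterization and benefit model are considerably more detailed.

    # 0/1 knapsack over optional soft-core units: maximise estimated cycle
    # savings subject to an FPGA area budget. Unit names, areas, and savings
    # are made-up characterization data for the example.
    def select_units(units, area_budget):
        best = {0: (0, [])}                    # area used -> (savings, chosen units)
        for name, area, savings in units:
            for used, (val, chosen) in sorted(best.items(), reverse=True):
                new_used = used + area
                if new_used <= area_budget and val + savings > best.get(new_used, (-1,))[0]:
                    best[new_used] = (val + savings, chosen + [name])
        return max(best.values())

    units = [                                  # (unit, area in LUTs, cycles saved)
        ("multiplier",     600,   900_000),
        ("barrel_shifter", 250,   150_000),
        ("divider",        900,   400_000),
        ("fpu",           3000, 2_000_000),
    ]
    savings, chosen = select_units(units, area_budget=3500)
    print(chosen, savings)                     # the best subset that fits the budget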

Proceedings ArticleDOI
24 Apr 2006
TL;DR: The hardware thread interface (HWTI) component provides an abstract, platform independent compilation target that enables thread and instruction-level parallelism across the software/hardware boundary.
Abstract: In this paper, we present hthreads, a unifying programming model for specifying application threads running within a hybrid CPU/FPGA system. Threads are specified from a single pthreads multithreaded application program and compiled to run on the CPU or synthesized to run on the FPGA. The hthreads system, in general, is unique within the reconfigurable computing community as it abstracts the CPU/FPGA components into a unified custom threaded multiprocessor architecture platform. To support the abstraction of the CPU/FPGA component boundary, we have created the hardware thread interface (HWTI) component that frees the designer from having to specify and embed platform specific instructions to form customized hardware/ software interactions. Instead, the hardware thread interface supports the generalized pthreads API semantics, and allows passing of abstract data types between hardware and software threads. Thus the hardware thread interface provides an abstract, platform independent compilation target that enables thread and instruction-level parallelism across the software/hardware boundary.