Showing papers on "Field-programmable gate array" published in 2007


Journal ArticleDOI
TL;DR: Experimental measurements of the differences between a 90-nm CMOS field programmable gate array (FPGA) and 90-nm CMOS standard-cell application-specific integrated circuits (ASICs) in terms of logic density, circuit speed, and power consumption for core logic are presented.
Abstract: This paper presents experimental measurements of the differences between a 90-nm CMOS field programmable gate array (FPGA) and 90-nm CMOS standard-cell application-specific integrated circuits (ASICs) in terms of logic density, circuit speed, and power consumption for core logic. We are motivated to make these measurements to enable system designers to make better informed choices between these two media and to give insight to FPGA makers on the deficiencies to attack and, thereby, improve FPGAs. We describe the methodology by which the measurements were obtained and show that, for circuits containing only look-up table-based logic and flip-flops, the ratio of silicon area required to implement them in FPGAs and ASICs is on average 35. Modern FPGAs also contain "hard" blocks such as multiplier/accumulators and block memories. We find that these blocks reduce this average area gap significantly to as little as 18 for our benchmarks, and we estimate that extensive use of these hard blocks could potentially lower the gap to below five. The ratio of critical-path delay, from FPGA to ASIC, is roughly three to four with less influence from block memory and hard multipliers. The dynamic power consumption ratio is approximately 14 times and, with hard blocks, this gap generally becomes smaller.

1,078 citations


Journal ArticleDOI
TL;DR: This paper reviews the state of the art of field-programmable gate array (FPGA) design methodologies with a focus on industrial control system applications and presents three main design rules: algorithm refinement, modularity, and systematic search for the best compromise between the control performance and the architectural constraints.
Abstract: This paper reviews the state of the art of field-programmable gate array (FPGA) design methodologies with a focus on industrial control system applications. This paper starts with an overview of FPGA technology development, followed by a presentation of design methodologies, development tools and relevant CAD environments, including the use of portable hardware description languages and system level programming/design tools. They enable a holistic functional approach with the major advantage of setting up a unique modeling and evaluation environment for complete industrial electronics systems. Three main design rules are then presented. These are algorithm refinement, modularity, and systematic search for the best compromise between the control performance and the architectural constraints. An overview of contributions and limits of FPGAs is also given, followed by a short survey of FPGA-based intelligent controllers for modern industrial systems. Finally, two complete and timely case studies are presented to illustrate the benefits of an FPGA implementation when using the proposed system modeling and design methodology. These consist of the direct torque control for induction motor drives and the control of a diesel-driven synchronous stand-alone generator with the help of fuzzy logic.

882 citations


Book
02 Nov 2007
TL;DR: This book is intended as an introduction to the entire range of issues important to reconfigurable computing, using FPGAs as the context, or "computing vehicles" to implement this powerful technology.
Abstract: The main characteristic of Reconfigurable Computing is the presence of hardware that can be reconfigured to implement specific functionality more suitable for specially tailored hardware than on a simple uniprocessor. Reconfigurable computing systems join microprocessors and programmable hardware in order to take advantage of the combined strengths of hardware and software and have been used in applications ranging from embedded systems to high performance computing. Many of the fundamental theories have been identified and used by the Hardware/Software Co-Design research field. Although the same background ideas are shared in both areas, they have different goals and use different approaches. This book is intended as an introduction to the entire range of issues important to reconfigurable computing, using FPGAs as the context, or "computing vehicles" to implement this powerful technology. It will take a reader with a background in the basics of digital design and software programming and provide them with the knowledge needed to be an effective designer or researcher in this rapidly evolving field. · Treatment of FPGAs as computing vehicles rather than glue-logic or ASIC substitutes · Views of FPGA programming beyond Verilog/VHDL · Broad set of case studies demonstrating how to use FPGAs in novel and efficient ways

531 citations


Proceedings ArticleDOI
03 Jun 2007
TL;DR: A new version of the NetFPGA 2.1 platform has been developed and is available for use by the academic community and has interfaces that can be parameterized, therefore enabling development of modular hardware designs with varied word sizes.
Abstract: The NetFPGA platform enables students and researchers to build high-performance networking systems in hardware. A new version of the NetFPGA platform has been developed and is available for use by the academic community. The NetFPGA 2.1 platform now has interfaces that can be parameterized, therefore enabling development of modular hardware designs with varied word sizes. It also includes more logic and faster memory than the previous platform. Field Programmable Gate Array (FPGA) logic is used to implement the core data processing functions while software running on embedded cores within the FPGA and/or programs running on an attached host computer implement only control functions. Reference designs and component libraries have been developed for the CS344 course at Stanford University. Open-source Verilog code is available for download from the project website.

360 citations


Journal ArticleDOI
TL;DR: The aim of this paper is to present the interest of implementing digital controllers using field-programmable gate array (FPGA) components; the current control techniques implemented consist of on-off current controllers, a proportional-integral current controller, and a predictive current controller.
Abstract: The aim of this paper is to present the interest of implementing digital controllers using field-programmable gate array (FPGA) components. To this purpose, a variety of current control techniques, which are applied to alternating current machine drives, are designed and implemented. They consist of on-off current controllers, a proportional-integral current controller, and a predictive current controller. The quality of the regulated current is significantly improved. This is mainly due to a very important reduction of the execution time delay. Indeed, in all described techniques, the execution time of the designed hardware architectures is only a few microseconds. This time reduction derives directly from the possibility offered by FPGAs to design very powerful dedicated architectures. Numerous experimental results are given in order to illustrate the efficiency of FPGA-based solutions to achieve high-performance control of electrical systems.

259 citations


Journal ArticleDOI
TL;DR: The designed intelligent control hardware can perform real-time control of the backpropagation learning algorithm of a neural network and becomes cost effective by using a high capacity of an FPGA chip.
Abstract: In this paper, we implement the intelligent neural network controller hardware with a field programmable gate array (FPGA)-based general purpose chip and a digital signal processing (DSP) board to solve nonlinear system control problems. The designed intelligent control hardware can perform real-time control of the backpropagation learning algorithm of a neural network. The basic proportional-integral-derivative (PID) control algorithms are implemented in an FPGA chip and a neural network controller is implemented in a DSP board. By using a high capacity of an FPGA chip, the additional hardware such as an encoder counter and a pulsewidth modulation (PWM) generator is implemented in a single FPGA chip. As a result, the controller becomes cost effective. It was tested for controlling nonlinear systems such as a robot finger and an inverted pendulum on a moving cart to show performance of the controller.

244 citations


Journal ArticleDOI
TL;DR: This study presents a speed control integrated circuit (IC) for permanent magnet synchronous motor (PMSM) drive under this SoPC environment with two IPs, a Nios II embedded processor IP and an application IP.
Abstract: The new generation of field programmable gate array (FPGA) technologies enables an embedded processor intellectual property (IP) and an application IP to be integrated into a system-on-a-programmable-chip (SoPC) developing environment. Therefore, this study presents a speed control integrated circuit (IC) for permanent magnet synchronous motor (PMSM) drive under this SoPC environment. First, the mathematic model of PMSM is defined and the vector control used in the current loop of PMSM drive is explained. Then, an adaptive fuzzy controller adopted to cope with the dynamic uncertainty and external load effect in the speed loop of PMSM drive is proposed. After that, an FPGA-based speed control IC is designed to realize the controllers. The proposed speed control IC has two IPs, a Nios II embedded processor IP and an application IP. The Nios II processor is used to develop the adaptive fuzzy controller in software due to the complicated control algorithm and low sampling frequency control (speed control: 2 kHz). The designed application IP is utilized to implement the current vector controller in hardware owing to the requirement for high sampling frequency control (current loop: 16 kHz, pulsewidth modulation circuit: 4-8 MHz) but simple computation. Finally, an experimental system is set up and some experimental results are demonstrated.

233 citations


Book
01 Jan 2007
TL;DR: Digital Hardware Evolution.
Abstract: Digital Hardware Evolution.- An Online EHW Pattern Recognition System Applied to Sonar Spectrum Classification.- Design of Electronic Circuits Using a Divide-and-Conquer Approach.- Implementing Multi-VRC Cores to Evolve Combinational Logic Circuits in Parallel.- An Intrinsic Evolvable Hardware Based on Multiplexer Module Array.- Estimating Array Connectivity and Applying Multi-output Node Structure in Evolutionary Design of Digital Circuits.- Research on the Online Evaluation Approach for the Digital Evolvable Hardware.- Research on Multi-objective On-Line Evolution Technology of Digital Circuit Based on FPGA Model.- Evolutionary Design of Generic Combinational Multipliers Using Development.- Analog Hardware Evolution.- Automatic Synthesis of Practical Passive Filters Using Clonal Selection Principle-Based Gene Expression Programming.- Research on Fault-Tolerance of Analog Circuits Based on Evolvable Hardware.- Analog Circuit Evolution Based on FPTA-2.- Bio-inspired Systems.- Knowledge Network Management System with Medicine Self Repairing Strategy.- Design of a Cell in Embryonic Systems with Improved Efficiency and Fault-Tolerance.- Design on Operator-Based Reconfigurable Hardware Architecture and Cell Circuit.- Bio-inspired Systems with Self-developing Mechanisms.- Development of a Tiny Computer-Assisted Wireless EEG Biofeedback System.- Steps Forward to Evolve Bio-inspired Embryonic Cell-Based Electronic Systems.- Evolution of Polymorphic Self-checking Circuits.- Mechanical Hardware Evolution.- Sliding Algorithm for Reconfigurable Arrays of Processors.- System-Level Modeling and Multi-objective Evolutionary Design of Pipelined FFT Processors for Wireless OFDM Receivers.- Reducing the Area on a Chip Using a Bank of Evolved Filters.- Evolutionary Design.- Walsh Function Systems: The Bisectional Evolutional Generation Pattern.- Extrinsic Evolvable Hardware on the RISA Architecture.- Evolving and Analysing "Useful" Redundant Logic.- Adaptive Transmission Technique in Underwater Acoustic Wireless Communication.- Autonomous Robot Path Planning Based on Swarm Intelligence and Stream Functions.- Research on Adaptive System of the BTT-45 Air-to-Air Missile Based on Multilevel Hierarchical Intelligent Controller.- The Design of an Evolvable On-Board Computer.- Evolutionary Algorithms in Hardware Design.- Extending Artificial Development: Exploiting Environmental Information for the Achievement of Phenotypic Plasticity.- UDT-Based Multi-objective Evolutionary Design of Passive Power Filters of a Hybrid Power Filter System.- Designing Electronic Circuits by Means of Gene Expression Programming II.- Designing Polymorphic Circuits with Evolutionary Algorithm Based on Weighted Sum Method.- Robust and Efficient Multi-objective Automatic Adjustment for Optical Axes in Laser Systems Using Stochastic Binary Search Algorithm.- Minimization of the Redundant Sensor Nodes in Dense Wireless Sensor Networks.- Evolving in Extended Hamming Distance Space: Hierarchical Mutation Strategy and Local Learning Principle for EHW.- Hardware Implementation of Evolutionary Algorithms.- Adaptive and Evolvable Analog Electronics for Space Applications.- Improving Flexibility in On-Line Evolvable Systems by Reconfigurable Computing.- Evolutionary Design of Resilient Substitution Boxes: From Coding to Hardware Implementation.- A Sophisticated Architecture for Evolutionary Multiobjective Optimization Utilizing High Performance DSP.- FPGA-Based Genetic Algorithm Kernel Design.- Using Systolic Technique to Accelerate an EHW Engine for Lossless Image Compression.

231 citations


Journal ArticleDOI
TL;DR: A digital hardware realization of a real-time simulator for a complete induction machine drive, using a field programmable gate array (FPGA) as the computational engine, is presented; real-time simulation results demonstrate modeling accuracy and efficiency.
Abstract: This paper presents a digital hardware realization of a real-time simulator for a complete induction machine drive using a field-programmable gate array (FPGA) as the computational engine. The simulator was developed using Very High Speed Integrated Circuit Hardware Description Language (VHDL), making it flexible and portable. A novel device-characteristic based model suitable for FPGA implementation has been proposed for the 2-level 6-pulse IGBT-based voltage-source converter (VSC). The VSC model is computed at a fixed time-step of 12.5 ns allowing a highly detailed and precise accounting of gating signals. The simulator also models a squirrel cage induction machine, a direct field-oriented control system, a space-vector pulse-width modulation scheme (SVPWM) and a measurement system. A multirate simulation of the system shows the slow (machine) as well as the fast (VSC and control) dynamic components. Real-time simulation results under steady-state and transient conditions demonstrate modeling accuracy and efficiency.

217 citations


Journal ArticleDOI
TL;DR: The sequential processing of the layers in an NN has been exploited in this paper to implement large NNs using a method of layer multiplexing, so that a larger NN can be realized on a single chip at a lower cost.
Abstract: This paper presents a hardware implementation of multilayer feedforward neural networks (NN) using reconfigurable field-programmable gate arrays (FPGAs). Despite improvements in FPGA densities, the numerous multipliers in an NN limit the size of the network that can be implemented using a single FPGA, thus making NN applications not viable commercially. The proposed implementation is aimed at reducing resource requirement, without much compromise on the speed, so that a larger NN can be realized on a single chip at a lower cost. The sequential processing of the layers in an NN has been exploited in this paper to implement large NNs using a method of layer multiplexing. Instead of realizing a complete network, only the single largest layer is implemented. The same layer behaves as different layers with the help of a control block. The control block ensures proper functioning by assigning the appropriate inputs, weights, biases, and excitation function of the layer that is currently being computed. Multilayer networks have been implemented using Xilinx FPGA "XCV400hq240." The concept used is shown to be very effective in reducing resource requirements at the cost of a moderate overhead on speed. This implementation is proposed to make NN applications viable in terms of cost and speed for online applications. An NN-based flux estimator is implemented in FPGA and the results obtained are presented.
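
The layer-multiplexing idea in this abstract can be illustrated in software. The following is a minimal Python sketch, not the paper's hardware design: `physical_layer` stands in for the single implemented layer, and `layer_multiplexed_forward` plays the role of the control block that loads each layer's inputs, weights, biases, and excitation function in turn. All names and the example network are hypothetical.

```python
import math

def sigmoid(x):
    """Logistic excitation function."""
    return 1.0 / (1.0 + math.exp(-x))

def physical_layer(inputs, weights, biases, activation):
    """The one layer that exists 'in hardware': a multiply-accumulate per neuron."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def layer_multiplexed_forward(x, layer_params):
    """Control block: reuse the same physical layer once per network layer,
    feeding the previous output back in as the next input."""
    signal = x
    for weights, biases, activation in layer_params:
        signal = physical_layer(signal, weights, biases, activation)
    return signal

# Hypothetical 2-3-1 network; the values are made up for illustration.
params = [
    ([[0.5, -0.5], [1.0, 1.0], [-1.0, 0.5]], [0.0, -1.0, 0.5], sigmoid),
    ([[1.0, -1.0, 0.5]], [0.0], sigmoid),
]
output = layer_multiplexed_forward([1.0, 0.0], params)
```

The time cost grows with the number of layers because they are processed one after another, which matches the abstract's remark about a moderate speed overhead in exchange for fewer resources.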

205 citations


Journal ArticleDOI
TL;DR: A hierarchical software structure has been established to perform all data processing and the control of the hand and provides basic API functions and skills to access all hardware resources for data acquisition, computation and tele-operation.

Journal ArticleDOI
TL;DR: A magnetic nanofabric, which may provide a route to building reconfigurable spin-based logic circuits compatible with conventional electron-based devices, and its ability to realize logic gates with fewer devices than in CMOS-based circuits is described.
Abstract: We propose and describe a magnetic NanoFabric which provides a route to building reconfigurable spin-based logic circuits compatible with conventional electron-based devices. A distinctive feature of the proposed NanoFabric is that a bit of information is encoded into the phase of the spin wave signal. This makes it possible to transmit information without the use of electric current and to utilize wave interference for useful logic functionality. The basic elements include voltage-to-spin wave and wave-to-voltage converters, spin waveguides, a modulator, and a magnetoelectric cell. As an example of a magnetoelectric cell, we consider a two-phase piezoelectric-piezomagnetic system, where the spin wave signal modulation is due to the stress-induced anisotropy caused by the applied electric field. The performance of the basic elements is illustrated by experimental data and results of numerical modeling. The combination of the basic elements lets us construct magnetic circuits for NOT and Majority logic gates. Logic gates AND, OR, NAND and NOR are shown to be constructed as combinations of NOT and reconfigurable Majority gates. Examples of computational architectures such as Cellular Automata, Cellular Nonlinear Network and Field Programmable Gate Array are described. The main advantage of the proposed NanoFabric is the ability to realize logic gates with fewer devices than required for CMOS-based circuits. Potentially, the area of the elementary reconfigurable Majority gate can be scaled down to 0.1 μm². The disadvantages and limitations of the proposed NanoFabric are discussed.
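
The statement that AND, OR, NAND and NOR follow from NOT plus a reconfigurable Majority gate can be checked with a minimal Python sketch. It models the gates as Boolean functions only, not the spin-wave physics, and the function names are illustrative. Tying the third Majority input to 0 yields AND; tying it to 1 yields OR.

```python
def maj(a, b, c):
    """Three-input majority: 1 when at least two inputs are 1."""
    return int(a + b + c >= 2)

def inv(a):
    """NOT gate."""
    return 1 - a

def and_gate(a, b):
    return maj(a, b, 0)       # control input fixed to 0

def or_gate(a, b):
    return maj(a, b, 1)       # control input fixed to 1

def nand_gate(a, b):
    return inv(maj(a, b, 0))

def nor_gate(a, b):
    return inv(maj(a, b, 1))
```

The "reconfigurable" part corresponds to changing the control input, so one Majority gate serves as either AND or OR.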

Journal ArticleDOI
TL;DR: A novel distributed-arithmetic (DA)-based proportional-integral-derivative (PID) controller algorithm is proposed and integrated into a digital feedback control system, resulting in cost reduction, high speed, and low power consumption, which is desirable in embedded control applications.
Abstract: In this paper, modular design of embedded feedback controllers using field-programmable gate array (FPGA) technology is studied. To this end, a novel distributed-arithmetic (DA)-based proportional-integral-derivative (PID) controller algorithm is proposed and integrated into a digital feedback control system. The DA-based PID controller demonstrates 80% savings in hardware utilization and 40% savings in power consumption compared to the multiplier-based scheme. It also offers good closed-loop performance while using less resources, resulting in cost reduction, high speed, and low power consumption, which is desirable in embedded control applications. The complete digital control system is built using commercial FPGAs to demonstrate the efficiency. The design uses a modular approach, so that some modules can be reused in other applications. These reusable modules can be ported into Matlab/Simulink as Simulink blocks for hardware/software cosimulation or integrated into a larger design in the Matlab/Simulink environment to allow for rapid prototyping applications.
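
For readers unfamiliar with distributed arithmetic, the following Python sketch shows the general technique of replacing multipliers with a look-up table plus shift-and-add. It is not the paper's architecture: the velocity-form PID update, the integer-scaled gains, and the 8-bit signed error samples are all assumptions made for illustration.

```python
def build_lut(coeffs):
    """LUT[addr] = sum of the coefficients whose bit is set in addr."""
    n = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << n)]

def da_inner_product(coeffs, samples, bits=8):
    """Compute sum(c * x) using only table look-ups, shifts, and adds.
    Each sample must be a signed integer that fits in `bits` bits."""
    lut = build_lut(coeffs)
    mask = (1 << bits) - 1
    acc = 0
    for b in range(bits):
        # Gather bit b of every sample (two's complement) into a LUT address.
        addr = 0
        for k, x in enumerate(samples):
            addr |= (((x & mask) >> b) & 1) << k
        term = lut[addr] << b
        # The sign bit carries weight -2**(bits-1), so its term is subtracted.
        acc += -term if b == bits - 1 else term
    return acc

# Hypothetical integer-scaled gains for the velocity-form update
# u[n] = u[n-1] + k0*e[n] + k1*e[n-1] + k2*e[n-2]
K = [37, -52, 18]
errors = [12, 9, -3]          # e[n], e[n-1], e[n-2]
delta_u = da_inner_product(K, errors)
```

With three coefficients the table has only eight entries, which gives a sense of why a DA formulation can use far fewer resources than three hardware multipliers.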

Journal ArticleDOI
01 Sep 2007
TL;DR: In this article, a comparison of two scrubbing mitigation schemes for Xilinx field programmable gate array (FPGA) devices is presented, along with an examination of mitigation limitations.
Abstract: A comparison of two scrubbing mitigation schemes for Xilinx field programmable gate array (FPGA) devices is presented. The design of the scrubbers is briefly discussed along with an examination of mitigation limitations. Heavy ion data are then presented and analyzed.

Journal ArticleDOI
TL;DR: Enhanced partially parallel decoding architectures for quasi-cyclic low density parity check (QC-LDPC) codes are proposed to linearly increase the throughput of conventional partially parallel decoders through introducing a small percentage of extra hardware.
Abstract: This paper studies low-complexity high-speed decoder architectures for quasi-cyclic low density parity check (QC-LDPC) codes. Algorithmic transformation and architectural level optimization are incorporated to reduce the critical path. Enhanced partially parallel decoding architectures are proposed to linearly increase the throughput of conventional partially parallel decoders through introducing a small percentage of extra hardware. Based on the proposed architectures, a (8176, 7154) Euclidian geometry-based QC-LDPC code decoder is implemented on Xilinx field programmable gate array (FPGA) Virtex-II 6000, where an efficient nonuniform quantization scheme is employed to reduce the size of memories storing soft messages. FPGA implementation results show that the proposed decoder can achieve a maximum (source data) decoding throughput of 172 Mb/s at 15 iterations.

Journal ArticleDOI
TL;DR: The results show that an MLP-BP network uses fewer clock cycles and consumes less real estate when compiled in an FXP format, compared with a larger and slower functioning compilation in an FLP format with similar data representation width, in bits, or a similar precision and range.
Abstract: In this paper, arithmetic representations for implementing multilayer perceptrons trained using the error backpropagation algorithm (MLP-BP) neural networks on field-programmable gate arrays (FPGAs) are examined in detail. Both floating-point (FLP) and fixed-point (FXP) formats are studied and the effect of precision of representation and FPGA area requirements are considered. A generic very high-speed integrated circuit hardware description language (VHDL) program was developed to help experiment with a large number of formats and designs. The results show that an MLP-BP network uses fewer clock cycles and consumes less real estate when compiled in an FXP format, compared with a larger and slower functioning compilation in an FLP format with similar data representation width, in bits, or a similar precision and range.
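
To make the fixed-point (FXP) side of this comparison concrete, here is a small Python sketch of signed fixed-point quantization and multiplication. The 16-bit word with 8 fractional bits is a hypothetical format chosen for illustration, not one reported in the paper.

```python
def to_fixed(x, frac_bits=8, total_bits=16):
    """Quantize x to a signed fixed-point integer code, saturating on overflow."""
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, round(x * (1 << frac_bits))))

def from_fixed(code, frac_bits=8):
    """Convert an integer code back to a real value."""
    return code / (1 << frac_bits)

def fxp_mul(a, b, frac_bits=8):
    """Multiply two codes; the product has 2*frac_bits fractional bits,
    so shift right to return to the working format (rounds toward -infinity)."""
    return (a * b) >> frac_bits

weight = to_fixed(1.5)
activation = to_fixed(2.0)
product = from_fixed(fxp_mul(weight, activation))
```

Because every operation here is an integer add, multiply, or shift, it maps to much simpler logic than a floating-point unit of the same width, at the price of a fixed range and a resolution of 2^-8 in this format.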

Journal ArticleDOI
TL;DR: The performance benefits of a monolithically stacked three-dimensional (3-D) field-programmable gate array (FPGA), whereby the programming overhead of an FPGA is stacked on top of a standard CMOS layer containing logic blocks and interconnects, are investigated.
Abstract: The performance benefits of a monolithically stacked three-dimensional (3-D) field-programmable gate array (FPGA), whereby the programming overhead of an FPGA is stacked on top of a standard CMOS layer containing logic blocks (LBs) and interconnects, are investigated. A Virtex-II-style two-dimensional (2-D) FPGA fabric is used as a baseline architecture to quantify the relative improvements in logic density, delay, and power consumption achieved by such a 3-D FPGA. It is assumed that only the switch transistor and configuration memory cells can be moved to the top layers and that the 3-D FPGA employs the same LB and programmable interconnect architecture as the baseline 2-D FPGA. It is further assumed that the configuration memory cells are ≤ 0.7 times the area of a static random-access memory cell and that switch transistors having the same characteristics as n-channel metal-oxide-semiconductor devices in the CMOS layer are used. It is shown that a monolithically stacked 3-D FPGA can achieve 3.2 times higher logic density, 1.7 times lower critical path delay, and 1.7 times lower total dynamic power consumption than the baseline 2-D FPGA fabricated in the same 65-nm technology node.

Journal ArticleDOI
01 Nov 2007
TL;DR: By using a combination of 32-bit and 64-bit floating point arithmetic, the performance of many dense and sparse linear algebra algorithms can be significantly enhanced while maintaining the 64-bit accuracy of the resulting solution.
Abstract: By using a combination of 32-bit and 64-bit floating point arithmetic, the performance of many dense and sparse linear algebra algorithms can be significantly enhanced while maintaining the 64-bit accuracy of the resulting solution. The approach presented here can apply not only to conventional processors but also to exotic technologies such as Field Programmable Gate Arrays (FPGA), Graphical Processing Units (GPU), and the Cell BE processor. Results on modern processor architectures and the Cell BE are presented.
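
One common way to combine the two precisions is iterative refinement: solve in 32-bit arithmetic, then correct the answer using residuals computed in 64-bit. The pure-Python sketch below assumes that approach and emulates 32-bit rounding with `struct`; it re-runs the elimination on every pass for brevity, whereas a practical solver would factor once and reuse the factors. The function names and the test matrix are hypothetical.

```python
import struct

def f32(x):
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

def solve_f32(A, b):
    """Gaussian elimination with partial pivoting, rounding each result to 32 bits."""
    n = len(A)
    M = [[f32(v) for v in row] + [f32(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = f32(M[r][col] / M[col][col])
            for c in range(col, n + 1):
                M[r][c] = f32(M[r][c] - f32(factor * M[col][c]))
    x = [0.0] * n
    for r in reversed(range(n)):
        s = M[r][n]
        for c in range(r + 1, n):
            s = f32(s - f32(M[r][c] * x[c]))
        x[r] = f32(s / M[r][r])
    return x

def refine(A, b, iterations=5):
    """Low-precision solve followed by 64-bit residual corrections."""
    x = solve_f32(A, b)
    for _ in range(iterations):
        # Residual r = b - A x, evaluated in full 64-bit arithmetic.
        residual = [bi - sum(a * xj for a, xj in zip(row, x))
                    for row, bi in zip(A, b)]
        correction = solve_f32(A, residual)
        x = [xi + ci for xi, ci in zip(x, correction)]
    return x

A = [[4.0, 1.0, 2.0], [1.0, 3.0, 0.0], [2.0, 0.0, 5.0]]
x_true = [1.0 / 3.0, 2.0 / 7.0, -1.0 / 9.0]
b = [sum(a * xj for a, xj in zip(row, x_true)) for row in A]
err_single = max(abs(u - v) for u, v in zip(solve_f32(A, b), x_true))
err_refined = max(abs(u - v) for u, v in zip(refine(A, b), x_true))
```

The expensive elimination runs entirely in the faster 32-bit format, while the cheap residual step restores 64-bit accuracy for well-conditioned systems such as this one.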

Proceedings ArticleDOI
11 Nov 2007
TL;DR: The implementations of the Smith-Waterman algorithm for both DNA and protein sequences on the XD1000 platform are presented and a multistage PE (processing element) design is brought forward which significantly reduces the FPGA resource usage and hence allows more parallelism to be exploited.
Abstract: An innovative reconfigurable supercomputing platform, the XD1000, has been developed by XtremeData Inc. to exploit the rapid progress of FPGA technology and the high performance of the HyperTransport interconnect. In this paper, we present implementations of the Smith-Waterman algorithm for both DNA and protein sequences on the platform. The main features include: (1) we bring forward a multistage PE (processing element) design which significantly reduces the FPGA resource usage and hence allows more parallelism to be exploited; (2) our design features a pipelined control mechanism with uneven stage latencies, a key to minimizing the overall PE pipeline cycle time; (3) we also put forward a compressed substitution matrix storage structure, resulting in a substantial decrease in on-chip SRAM usage. Finally, we implement a 384-PE systolic array running at 66.7 MHz, which can achieve 25.6 GCUPS peak performance. Compared with the 2.2 GHz AMD Opteron host processor, the FPGA coprocessor achieves speedups of 185X and 250X, respectively.
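
The quoted peak figure is consistent with a simple estimate, sketched here in Python under the assumption that each processing element updates one matrix cell per clock cycle (the abstract does not state this explicitly). The function name is illustrative.

```python
def peak_gcups(num_pes, clock_hz, cells_per_pe_per_cycle=1):
    """Peak giga cell updates per second for a systolic array."""
    return num_pes * clock_hz * cells_per_pe_per_cycle / 1e9

estimate = peak_gcups(384, 66.7e6)
print(round(estimate, 1))  # 25.6
```

This is a peak number: it ignores pipeline fill, host transfers, and sequences shorter than the array.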

Journal ArticleDOI
TL;DR: The design methodology and tool chain presented in this paper have been applied to the realization of a control system for solving the navigation tasks of an autonomous vehicle.
Abstract: Fuzzy-logic-based inference techniques provide efficient solutions for control problems in classical and emerging applications. However, the lack of specific design tools and systematic approaches for hardware implementation of complex fuzzy controllers limits the applicability of these techniques in modern microelectronics products. This paper discusses a design strategy that eases the implementation of embedded fuzzy controllers as systems on programmable chips. The development of the controllers is carried out by means of a reconfigurable platform based on field-programmable gate arrays. This platform combines specific hardware to implement fuzzy inference modules with a general-purpose processor, thus allowing the realization of hybrid hardware/software solutions. As happens to the components of the processing system, the specific fuzzy elements are conceived as configurable intellectual property modules in order to accelerate the controller design cycle. The design methodology and tool chain presented in this paper have been applied to the realization of a control system for solving the navigation tasks of an autonomous vehicle.

Journal ArticleDOI
02 Apr 2007
TL;DR: A new adaptive software/hardware reconfigurable system is presented in this paper, using a real application in the automotive domain implemented on a Xilinx Virtex-II 3000 FPGA to present results.
Abstract: Today's field programmable gate array (FPGA) architectures, like Xilinx's Virtex-II series, enable partial and dynamic run-time self-reconfiguration. This feature allows the substitution of parts of a hardware design implemented on this reconfigurable hardware, and therefore, a system can be adapted to the actual demands of applications running on the chip. Exploiting this possibility enables the development of adaptive hardware for a huge variety of applications. A novel method for communication interfaces using look up table (LUT)-based communication primitives enables an exact separation of reconfigurable parts and a fast and intelligent bus-system. A new adaptive software/hardware reconfigurable system is presented in this paper, using a real application in the automotive domain implemented on a Xilinx Virtex-II 3000 FPGA to present results.

Journal ArticleDOI
TL;DR: This paper focused on accelerating the Smith-Waterman algorithm by using FPGA-based hardware that implemented a module for computing the score of a single cell of the SW matrix; using a grid of this module, the entire SW matrix was computed at the speed of field propagation through the FPGA circuit.
Abstract: To infer homology and subsequently gene function, the Smith-Waterman (SW) algorithm is used to find the optimal local alignment between two sequences. When searching sequence databases that may contain hundreds of millions of sequences, this algorithm becomes computationally expensive. In this paper, we focused on accelerating the Smith-Waterman algorithm by using FPGA-based hardware that implemented a module for computing the score of a single cell of the SW matrix. Then, using a grid of this module, the entire SW matrix was computed at the speed of field propagation through the FPGA circuit. These modifications dramatically accelerated the algorithm's computation time, by up to 160-fold compared to a pure software implementation running on the same FPGA with an Altera Nios II soft processor. This design of FPGA-accelerated hardware offers a promising new direction for improving the computation of genomic database searches.
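
For reference, the per-cell computation that such a hardware module implements can be written as a short software routine. The Python sketch below uses a linear gap penalty and hypothetical scoring values (match 2, mismatch -1, gap -1); the paper's scoring scheme is not given in the abstract.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Single-cell update: depends only on the upper-left, upper,
            # and left neighbours, and is clamped at zero.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

score = smith_waterman_score("ACGT", "TTACGTAA")
```

Each cell depends only on three neighbours, so all cells on one anti-diagonal are independent of each other. That property is what lets a grid or array of identical cell modules work in parallel where this software loop visits cells one at a time.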

Journal ArticleDOI
Tim Tuan, Arif Rahman, Satyaki Das, Steve Trimberger, Sean Kao
TL;DR: The design and implementation of Pika, a low-power FPGA core targeting battery-powered applications that achieves substantial power savings through a series of power optimizations and is compatible with existing commercial design tools.
Abstract: Programmable logic devices such as field-programmable gate arrays (FPGAs) are useful for a wide range of applications. However, FPGAs are not commonly used in battery-powered applications because they consume more power than application-specific integrated circuits and lack power management features. In this paper, we describe the design and implementation of Pika, a low-power FPGA core targeting battery-powered applications. Our design is based on a commercial low-cost FPGA and achieves substantial power savings through a series of power optimizations. The resulting architecture is compatible with existing commercial design tools. The implementation is done in a 90-nm triple-oxide CMOS process. Compared to the baseline design, Pika consumes 46% less active power and 99% less standby power. Furthermore, it retains circuit and configuration state during standby mode and wakes up from standby mode in approximately 100 ns.

01 Jan 2007
TL;DR: A Genetic Algorithm is presented which is capable of evolving 100% functional arithmetic circuits, based on evolving the functionality and connectivity of a rectangular array of logic cells and is modelled on the resources available on the Xilinx 6216 FPGA device.
Abstract: A Genetic Algorithm is presented which is capable of evolving 100% functional arithmetic circuits. Evolved designs are presented for one-bit and two-bit adders with carry and for two-bit and three-bit multipliers, together with details of the 100% correct evolution of three-bit and four-bit adders. The largest of these circuits are the most complex digital circuits to have been designed by purely evolutionary means. The algorithm is able to re-discover conventionally optimum designs for the one-bit and two-bit adders, but more significantly is able to improve on the conventional designs for the two-bit multiplier. By analysing the history of an evolving design up to complete functionality, it is possible to gain insight into the evolutionary process. The technique is based on evolving the functionality and connectivity of a rectangular array of logic cells and is modelled on the resources available on the Xilinx 6216 FPGA device. Further work is described about plans to evolve the designs directly onto this device.

Journal ArticleDOI
TL;DR: Two versions of a hardware processing architecture for modeling large networks of leaky-integrate-and-fire (LIF) neurons are presented; the second version provides performance-enhancing features relative to the first.
Abstract: In this paper, we present two versions of a hardware processing architecture for modeling large networks of leaky-integrate-and-fire (LIF) neurons; the second version provides performance-enhancing features relative to the first. Both versions of the architecture use fixed-point arithmetic and have been implemented using a single field-programmable gate array (FPGA). They have successfully simulated networks of over 1000 neurons configured using biologically plausible models of mammalian neural systems. The neuroprocessor has been designed primarily for use on mobile robotic vehicles, allowing bio-inspired neural processing models to be integrated directly into real-world control environments. When a neuroprocessor has been designed to act as part of the closed-loop system of a feedback controller, it is imperative to maintain strict real-time performance at all times in order to maintain the integrity of the control system. This resulted in the reevaluation of some of the architectural features of existing hardware for biologically plausible neural networks (NNs). In addition, we describe a development system for rapidly porting an underlying model (based on floating-point arithmetic) to the fixed-point representation of the FPGA-based neuroprocessor, thereby allowing validation of the hardware architecture. The development system facilitates the cooperation of computational neuroscientists and engineers working on embodied (robotic) systems with neural controllers, as demonstrated by our own experience on the Whiskerbot project, in which we developed models of the rodent whisker sensory system.
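
To make the fixed-point LIF idea concrete, here is a minimal sketch of a single neuron update using integer arithmetic only. The abstract does not give the neuron equations or the number format, so everything here is an assumption: the 8 fractional bits, the simple update v <- v - leak*v + input, and the names `to_fixed` and `lif_step` are all illustrative.

```python
FRAC_BITS = 8          # assumed fixed-point format: 8 fractional bits
ONE = 1 << FRAC_BITS   # fixed-point representation of 1.0


def to_fixed(x):
    """Convert a float to the assumed fixed-point integer representation."""
    return int(round(x * ONE))


def lif_step(v, i_in, leak, v_thresh, v_reset=0):
    """One leaky-integrate-and-fire update in fixed-point arithmetic.

    All arguments are fixed-point integers. Returns (new_v, spiked).
    """
    # Leak a fraction of the membrane potential, then integrate the input.
    # The right shift rescales the product of two fixed-point numbers.
    v = v - ((v * leak) >> FRAC_BITS) + i_in
    if v >= v_thresh:
        return v_reset, True   # fire and reset
    return v, False
```

For example, with `leak = to_fixed(0.5)` and `v_thresh = to_fixed(1.0)`, a constant input of `to_fixed(0.25)` settles below threshold, while `to_fixed(0.6)` drives the neuron to spike after a few steps. Comparing such integer traces against a floating-point reference is one way a ported model could be validated.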

Proceedings ArticleDOI
16 Apr 2007
TL;DR: A novel architecture is proposed for at-speed silicon debug that enables a methodology where the designer can iteratively zoom in only on the intervals containing erroneous samples, while significantly reducing the number of debug sessions.
Abstract: The size of on-chip trace buffers used for at-speed silicon debug limits the observation window in any debug session. Whenever the debug experiment can be repeated, we propose a novel architecture for at-speed silicon debug that enables a methodology where the designer can iteratively zoom in only on the intervals containing erroneous samples. When compared to increasing the size of the trace buffer, the proposed architecture has a small impact on silicon area, while significantly reducing the number of debug sessions.

Proceedings ArticleDOI
20 May 2007
TL;DR: This work proposes an isolation primitive, moats and drawbridges, that is built around four design properties: logical isolation, interconnect traceability, secure reconfigurable broadcast, and configuration scrubbing. Each is a fundamental operation with easily understood formal properties, yet maps cleanly and efficiently to a wide variety of reconfigurable devices.
Abstract: Blurring the line between software and hardware, reconfigurable devices strike a balance between the raw high speed of custom silicon and the post-fabrication flexibility of general-purpose processors. While this flexibility is a boon for embedded system developers, who can now rapidly prototype and deploy solutions with performance approaching custom designs, it results in a system development methodology where functionality is stitched together from a variety of "soft IP cores," often provided by multiple vendors with different levels of trust. Unlike traditional software, where resources are managed by an operating system, soft IP cores necessarily have very fine-grained control over the underlying hardware. To address this problem, the embedded systems community requires novel security primitives which address the realities of modern reconfigurable hardware. We propose an isolation primitive, moats and drawbridges, that is built around four design properties: logical isolation, interconnect traceability, secure reconfigurable broadcast, and configuration scrubbing. Each of these is a fundamental operation with easily understood formal properties, yet maps cleanly and efficiently to a wide variety of reconfigurable devices. We carefully quantify the required overheads on real FPGAs and demonstrate the utility of our methods by applying them to the practical problem of memory protection.

Journal ArticleDOI
TL;DR: In this paper, the authors describe a system based on partial reconfiguration for running fault-injection experiments within the configuration memory of SRAM-based FPGAs, which uses the internal configuration capabilities that modern FPGAs offer in order to inject SEUs within the configuration memory.
Abstract: Modern SRAM-based field programmable gate array (FPGA) devices offer high capability in implementing complex systems. Unfortunately, SRAM-based FPGAs are extremely sensitive to single event upsets (SEUs) induced by radiation particles. In order to successfully deploy safety- or mission-critical applications, designers need to validate the correctness of the obtained designs. In this paper, we describe a system based on partial reconfiguration for running fault-injection experiments within the configuration memory of SRAM-based FPGAs. The proposed fault-injection system uses the internal configuration capabilities that modern FPGAs offer in order to inject SEUs within the configuration memory. Detailed experimental results show that the technique is orders of magnitude faster than previously proposed ones.
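
As a rough software analogy for what a fault-injection experiment does, the sketch below models configuration memory as a byte array and an SEU as a single bit flip. This is an illustration only: it does not use any FPGA configuration interface, and the names `inject_seu` and `differing_bytes` are hypothetical.

```python
def inject_seu(config_mem: bytearray, bit_index: int) -> None:
    """Flip one bit in place, modelling a single event upset (SEU)."""
    byte, bit = divmod(bit_index, 8)
    config_mem[byte] ^= 1 << bit


def differing_bytes(golden: bytes, faulty: bytes) -> list:
    """Byte offsets at which the faulty image differs from the golden one."""
    return [i for i, (g, f) in enumerate(zip(golden, faulty)) if g != f]
```

An experiment loop built on this model would flip a bit, run the design under test, compare its outputs against a fault-free run, and then flip the same bit again to restore the original configuration.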

Proceedings ArticleDOI
30 Sep 2007
TL;DR: This work shows that the direct mapping of a secure ASIC circuit-style in an FPGA does not preserve the same level of security, unless the symmetrical routing technique is employed, and demonstrates that secure logic implemented with this approach is resistant whereas non-routing-aware directly mapped circuits can be successfully attacked.
Abstract: In current field-programmable gate array (FPGA) design flows, it is very hard to control the routing of submodules. It is thus very hard to make an identical copy of an existing circuit within the same FPGA fabric. We have solved this problem in a way that still enables us to modify the logic function of the copied submodule. Our technique has important applications in the design of side-channel-resistant implementations in FPGAs. Starting from an existing single-ended design, we are able to create a complementary circuit. The resulting overall circuit strongly reduces the power-consumption-dependent information leaks. We show that the direct mapping of a secure ASIC circuit style in an FPGA does not preserve the same level of security unless our symmetrical routing technique is employed. We demonstrate our approach on an FPGA prototype of a cryptographic design, and show through power measurements followed by side-channel power analysis that secure logic implemented with our approach is resistant, whereas non-routing-aware directly mapped circuits can be successfully attacked.

Journal ArticleDOI
TL;DR: From this study, it is found that the configurable logic block's routing network is vulnerable to domain crossing errors, or TMR defeats, by even 2-bit multiple-bit upsets.
Abstract: This paper discusses the limitations of single-FPGA triple-modular redundancy in the presence of multiple-bit upsets on Xilinx Virtex-II devices. This paper presents results from both fault injection and accelerated testing. From this study we have found that the configurable logic block's routing network is vulnerable to domain crossing errors, or TMR defeats, by even 2-bit multiple-bit upsets.
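
The reason a 2-bit upset can defeat triple-modular redundancy follows from how a majority voter works. The sketch below is a generic bitwise voter, not the paper's test setup, and the name `tmr_vote` is hypothetical. It shows that an error in one redundant domain is masked, while the same error in two domains is voted through.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority of three redundant copies of a value."""
    return (a & b) | (a & c) | (b & c)


correct = 0b1010
upset = correct ^ 0b0001  # same value with its lowest bit flipped

# One corrupted domain: the two good copies outvote it.
single_fault = tmr_vote(correct, correct, upset)

# Two corrupted domains (for example, one upset affecting routing shared
# by two domains): the wrong value wins the vote.
double_fault = tmr_vote(upset, upset, correct)
```

Here `single_fault` equals `correct`, whereas `double_fault` equals `upset`, which is the voter-level picture of a TMR defeat.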