
Showing papers on "Field-programmable gate array" published in 2008


Patent
02 Jul 2008
TL;DR: This patent presents techniques to protect intellectual property cores on field programmable gate arrays (FPGAs) by associating each FPGA with a secret key, so that an encrypted bitstream configures only that device and the core vendor can charge a customer per use or per configuration.
Abstract: Techniques are used to protect intellectual property cores on field programmable gate arrays. An approach is to associate each field programmable gate array, or a limited number of field programmable gate arrays, with a secret key. Each field programmable gate array may only be properly configured or programmed by an appropriate encrypted bitstream (which includes one or more intellectual property cores). This encrypted bitstream has been encoded by or for the secret key associated with a particular FPGA. Other techniques are also presented in this application and include network-based, nonnetwork-based, software-based, layered, and other approaches. The techniques allow an intellectual property core vendor to charge a customer per-use or per-configuration of their intellectual property. This is because an encrypted bitstream is useable only in a limited number, possibly just one, of the integrated circuits.

218 citations
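
The per-device key idea in the abstract above can be illustrated with a toy sketch. This is not the patent's scheme: the keystream construction, the 32-byte tag, and the names `encrypt_bitstream` and `configure` are all assumptions made for illustration, and a real design would use a vetted authenticated cipher.

```python
import hashlib
import hmac
from typing import Optional


def _keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 in counter mode (illustration only, not a vetted cipher)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def encrypt_bitstream(device_key: bytes, bitstream: bytes) -> bytes:
    """Vendor side (hypothetical): bind an IP-core bitstream to one device key."""
    stream = _keystream(device_key, len(bitstream))
    body = bytes(b ^ k for b, k in zip(bitstream, stream))
    tag = hmac.new(device_key, body, hashlib.sha256).digest()
    return tag + body


def configure(device_key: bytes, blob: bytes) -> Optional[bytes]:
    """Device side (hypothetical): return the plaintext bitstream,
    or None when the blob was encoded for a different key."""
    tag, body = blob[:32], blob[32:]
    expected = hmac.new(device_key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None
    stream = _keystream(device_key, len(body))
    return bytes(b ^ k for b, k in zip(body, stream))
```

A blob produced for one key configures only a device holding that key, which is the property that makes per-device or per-configuration charging possible.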


MonographDOI
10 Dec 2008
TL;DR: FPGA-based Implementation of Signal Processing Systems is an important reference for practising engineers and researchers working on the design and development of DSP systems for radio, telecommunication, information, audio-visual and security applications.
Abstract: Field programmable gate arrays (FPGAs) are an increasingly popular technology for implementing digital signal processing (DSP) systems. By allowing designers to create circuit architectures developed for the specific applications, high levels of performance can be achieved for many DSP applications providing considerable improvements over conventional microprocessor and dedicated DSP processor solutions. The book addresses the key issue in this process specifically, the methods and tools needed for the design, optimization and implementation of DSP systems in programmable FPGA hardware. It presents a review of the leading-edge techniques in this field, analyzing advanced DSP-based design flows for both signal flow graph- (SFG-) based and dataflow-based implementation, system on chip (SoC) aspects, and future trends and challenges for FPGAs. The automation of the techniques for component architectural synthesis, computational models, and the reduction of energy consumption to help improve FPGA performance, are given in detail. Written from a system level design perspective and with a DSP focus, the authors present many practical application examples of complex DSP implementation, involving: high-performance computing e.g. matrix operations such as matrix multiplication; high-speed filtering including finite impulse response (FIR) filters and wave digital filters (WDFs); adaptive filtering e.g. recursive least squares (RLS) filtering; transforms such as the fast Fourier transform (FFT). FPGA-based Implementation of Signal Processing Systems is an important reference for practising engineers and researchers working on the design and development of DSP systems for radio, telecommunication, information, audio-visual and security applications. Senior level electrical and computer engineering graduates taking courses in signal processing or digital signal processing shall also find this volume of interest.

215 citations


Journal ArticleDOI
TL;DR: The achieved system performance is at least one order of magnitude better than a PC-based solution, a result achieved by investigating the impact of several hardware-orientated optimizations on performance, area and accuracy.
Abstract: This paper proposes a parallel hardware architecture for image feature detection based on the scale invariant feature transform algorithm and applied to the simultaneous localization and mapping problem. The work also proposes specific hardware optimizations considered fundamental to embed such a robotic control system on-a-chip. The proposed architecture is completely stand-alone; it reads the input data directly from a CMOS image sensor and provides the results via a field-programmable gate array coupled to an embedded processor. The results may either be used directly in an on-chip application or accessed through an Ethernet connection. The system is able to detect features up to 30 frames per second (320×240 pixels) and has accuracy similar to a PC-based implementation. The achieved system performance is at least one order of magnitude better than a PC-based solution, a result achieved by investigating the impact of several hardware-orientated optimizations on performance, area and accuracy.

198 citations


Journal ArticleDOI
TL;DR: The systolic decomposition scheme is found to offer a flexible choice of the address length of the lookup tables (LUT) for DA-based computation to decide on suitable area time tradeoff, and the choice of address length yields the best of area-delay-power-efficient realizations of the FIR filter for various filter orders.
Abstract: In this paper, we present the design optimization of one- and two-dimensional fully pipelined computing structures for area-delay-power-efficient implementation of finite-impulse-response (FIR) filters by systolic decomposition of distributed arithmetic (DA)-based inner-product computation. The systolic decomposition scheme is found to offer a flexible choice of the address length of the lookup tables (LUT) for DA-based computation to decide on a suitable area-time tradeoff. It is observed that by using smaller address lengths for DA-based computing units, it is possible to reduce the memory size, but this increases the adder complexity and the latency. For efficient DA-based realization of FIR filters of different orders, the flexible linear systolic design is implemented on a Xilinx Virtex-E XCV2000E FPGA using a hybrid combination of Handel-C and parameterizable VHDL cores. Various key performance metrics such as number of slices, maximum usable frequency, dynamic power consumption, energy density, and energy throughput are estimated for different filter orders and address lengths. Analysis of the results indicates that the performance metrics of the proposed implementation are broadly in line with theoretical expectations. It is found that the choice of address length yields the best of area-delay-power-efficient realizations of the FIR filter for various filter orders. Moreover, the proposed FPGA implementation is found to involve significantly less area-delay complexity compared with the existing DA-based implementations of FIR filter.

194 citations
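
The tradeoff described in the abstract above (shorter LUT addresses save memory but need more adders) can be sketched in software. This is a minimal bit-serial model of a DA inner product, assuming unsigned input samples; it is not the paper's systolic design, and the function names are hypothetical.

```python
def build_lut(coeffs):
    """LUT[a] = sum of coeffs[k] for every bit k that is set in address a."""
    lut = [0] * (1 << len(coeffs))
    for addr in range(1, len(lut)):
        low = addr & -addr                 # lowest set bit of the address
        k = low.bit_length() - 1           # index of the coefficient it selects
        lut[addr] = lut[addr ^ low] + coeffs[k]
    return lut


def da_inner_product(coeffs, samples, word_bits, addr_len):
    """Bit-serial DA inner product with the taps split into groups of addr_len.

    Assumes every sample is an unsigned integer below 2**word_bits.
    """
    groups = [
        (build_lut(coeffs[i:i + addr_len]), samples[i:i + addr_len])
        for i in range(0, len(coeffs), addr_len)
    ]
    acc = 0
    for bit in range(word_bits):
        partial = 0
        for lut, xs in groups:
            addr = 0
            for k, x in enumerate(xs):
                addr |= ((x >> bit) & 1) << k   # one bit from each sample
            partial += lut[addr]                # adder across the groups
        acc += partial << bit                   # shift-accumulate
    return acc


def lut_words(num_taps, addr_len):
    """Total LUT entries needed when the taps are split into groups of addr_len."""
    full, rem = divmod(num_taps, addr_len)
    return full * (1 << addr_len) + ((1 << rem) if rem else 0)
```

For eight taps, `lut_words(8, 8)` is 256 entries in one table, while `lut_words(8, 2)` is 16 entries spread over four tables whose outputs must be added together each cycle.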


Proceedings ArticleDOI
24 Feb 2008
TL;DR: This work aims to raise awareness about security issues for users of FPGAs and makes custom compilation and low-level tinkering with bitstreams - à la JBits - possible.
Abstract: This poster presents an in-depth analysis of the Xilinx bitstream file format. This theoretical analysis is backed by a simple and efficient implementation of a reverse-engineering tool for Xilinx bitstreams. The development process followed these lines. First, publicly available documentation from Xilinx has been analyzed; then some custom assumptions about the bitstream format have been made. This information allowed a suitable algorithm to be run on well-chosen bitstreams. The output from this automated analysis step is a database which relates raw bitstream data to low-level netlist elements. This database is subsequently used as input to an efficient bitstream compiler which can either generate a bitstream from a low-level (XDL) description of the netlist, or conversely decompile any given bitstream to its low-level netlist elements. This work has been validated for the spartan3, virtex2, virtex4 and virtex5 FPGA lines from Xilinx. Decompiling a bitstream is very fast; it is two orders of magnitude faster than the reverse operation of compilation with Xilinx' bitgen. This work aims to raise awareness about security issues for users of FPGAs. It also makes custom compilation and low-level tinkering with bitstreams - à la JBits - possible.

182 citations


Journal ArticleDOI
TL;DR: This paper details the design of a new high-speed pipelined application-specific instruction set processor (ASIP) for elliptic curve cryptography (ECC) using field-programmable gate-array (FPGA) technology.
Abstract: This paper details the design of a new high-speed pipelined application-specific instruction set processor (ASIP) for elliptic curve cryptography (ECC) using field-programmable gate-array (FPGA) technology. Different levels of pipelining were applied to the data path to explore the resulting performances and find an optimal pipeline depth. Three complex instructions were used to reduce the latency by reducing the overall number of instructions, and a new combined algorithm was developed to perform point doubling and point addition using the application specific instructions. An implementation for the United States Government National Institute of Standards and Technology-recommended curve over GF(2^163) is shown, which achieves a point multiplication time of 33.05 μs at 91 MHz on a Xilinx Virtex-E FPGA, the fastest figure reported in the literature to date. Using the more modern Xilinx Virtex-4 technology, a point multiplication time of 19.55 μs was achieved, which translates to over 51120 point multiplications per second.

167 citations
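
Point multiplication on this curve is built from arithmetic in the binary field with 2^163 elements. The sketch below shows plain shift-and-add field multiplication, assuming the NIST reduction polynomial x^163 + x^7 + x^6 + x^3 + 1. It only illustrates the underlying operation; it is not the pipelined datapath or the combined algorithm from the paper.

```python
M = 163
# Assumed reduction polynomial: x^163 + x^7 + x^6 + x^3 + 1
F = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1


def gf2m_mul(a: int, b: int) -> int:
    """Multiply two field elements; bit i of each int is the coefficient of x^i."""
    result = 0
    while b:
        if b & 1:
            result ^= a      # addition in GF(2) is XOR
        b >>= 1
        a <<= 1              # multiply a by x
        if a >> M:           # degree reached 163, so reduce modulo F
            a ^= F
    return result


def multiplications_per_second(time_us: float) -> float:
    """Convert a point multiplication time in microseconds to a rate."""
    return 1e6 / time_us
```

As a consistency check on the units, `multiplications_per_second(19.55)` is roughly 51,150, which agrees with the "over 51120 point multiplications per second" quoted in the abstract.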


Patent
10 Dec 2008
TL;DR: A platform supporting reconfigurable computing, enabling the introduction of reconfigurable hardware into portable devices, is described. It is a heterogeneous multi-processor platform containing one or more instruction set processors (ISP) and a reconfigurable matrix (for instance a gate array, especially an FPGA).
Abstract: A platform supporting reconfigurable computing, enabling the introduction of reconfigurable hardware into portable devices is described. Dynamic hardware/software multitasking methods for a reconfigurable computing platform including reconfigurable hardware devices such as gate arrays, especially FPGA's, and software, such as dedicated hardware/software operating systems and middleware, adapted for supporting the methods, especially multitasking, are described. A computing platform, which is a heterogeneous multi-processor platform, containing one or more instruction set processors (ISP) and a reconfigurable matrix (for instance a gate array, especially an FPGA), adapted for (dynamic) hardware/software multitasking is described.

147 citations


Journal ArticleDOI
TL;DR: Results show that the reconfigurable instruction cell array consumes considerably less power than leading VLIW and low-power digital signal processors while maintaining their throughput performance.
Abstract: This paper presents a novel instruction cell-based reconfigurable computing architecture for low-power applications, hereafter referred to as the reconfigurable instruction cell array (RICA). For the development of the RICA, a top-down software driven approach was taken and revealed as one of the key design decisions for a flexible, easy to program, low-power architecture. These features make RICA an architecture that inherently solves the main design requirements of modern low-power devices. Results show that it consumes considerably less power than leading VLIW and low-power digital signal processors while maintaining their throughput performance.

145 citations


Proceedings ArticleDOI
01 Jan 2008

143 citations


Journal Article
TL;DR: This paper implements Feldkamp-Davis-Kress (FDK) algorithm on commodity GPU using an acceleration scheme that saves the copy time, and the combination of z-axis symmetry and multiple render targets (MRTs) reduces the computational cost on the geometry mapping between slices to be reconstructed and projection views.
Abstract: Three-dimensional Computed Tomography (CT) reconstruction is computationally demanding. To accelerate reconstruction, Application Specific Integrated Circuits (ASICs) or Field Programmable Gate Arrays (FPGAs) have been used, but they are expensive, inflexible and not easy to upgrade. The modern Graphics Processing Unit (GPU) with its programmable features improves this situation and has become one of the powerful and flexible tools for 3D CT reconstruction. In this paper, we implement the Feldkamp-Davis-Kress (FDK) algorithm on a commodity GPU using an acceleration scheme. In the scheme, two techniques are developed and combined. One is cyclic render-to-texture (CRTT), which saves the copy time, and the other is the combination of z-axis symmetry and multiple render targets (MRTs), which reduces the computational cost of the geometry mapping between slices to be reconstructed and projection views. Our algorithm reconstructs a 512^3 volume from 360 views of size 512 x 512 in about 52s on a single NVIDIA GeForce 8800GTX card.

138 citations


Journal ArticleDOI
TL;DR: Two algorithms, 2-D and 3-D, are analyzed and implemented in an FPGA and both implementations are compared in terms of implementation complexity and logic resources required.
Abstract: Multilevel converters can meet the increasing demand of power ratings and power quality associated with reduced harmonic distortion and lower electromagnetic interference. When the number of levels increases, it is necessary to control more and more switches in parallel. Field programmable gate arrays (FPGAs), with their concurrent processing capability, are suitable for the implementation of multilevel modulation algorithms. Among them, space vector pulsewidth modulation algorithms offer great flexibility to optimize switching waveforms and are well suited for digital implementation. In this paper, two algorithms, 2-D and 3-D, are analyzed and implemented in an FPGA. In order to carry out the implementation, both algorithms have been described in very high speed integrated circuit hardware description language, partly hand coded, and partly automatically generated using the system generator tool. Both implementations are compared in terms of implementation complexity and logic resources required. Finally, test results with a neutral-point-clamped inverter are presented.

Book
01 Jan 2008
TL;DR: FPGA Prototyping by VHDL Examples is an indispensable companion text for introductory digital design courses and also serves as a valuable self-teaching guide for practicing engineers who wish to learn more about this emerging area of interest.
Abstract: A hands-on introduction to VHDL synthesis and FPGA prototyping. Hardware description language (HDL) and Field Programmable Gate Array (FPGA) devices allow designers to quickly develop and simulate a sophisticated digital circuit, realize it on a prototyping device, and verify the operation of its physical implementation. As these technologies have matured, they have become accepted mainstream practice so that it is possible to use a PC and an inexpensive FPGA prototyping board to construct a complex digital system. This book uses a "learn by doing" approach to introduce the concepts and techniques of VHDL and FPGA to designers through a series of hands-on experiments. FPGA Prototyping by VHDL Examples provides: a collection of clear, easy-to-follow templates for quick code development; a large number of practical examples to illustrate and reinforce the concepts and design techniques; realistic projects that can be implemented and tested on a Xilinx prototyping board; and a thorough exploration of the Xilinx PicoBlaze soft-core microcontroller. Although the book is an introductory text, the examples are developed in a rigorous manner and the derivations follow strict design guidelines and coding practices used for large, complex systems. It lays a solid foundation for students and new engineers and prepares them for future development tasks. FPGA Prototyping by VHDL Examples is an indispensable companion text for introductory digital design courses and also serves as a valuable self-teaching guide for practicing engineers who wish to learn more about this emerging area of interest.

Journal ArticleDOI
Glen Gibb, John W. Lockwood, Jad Naous, P. Hartke, Nick McKeown
TL;DR: A new version of the NetFPGA platform has been developed and is available for use by the academic community, with modular interfaces that enable development of complex hardware designs by integration of simple building blocks.
Abstract: The NetFPGA platform enables students and researchers to build high-performance networking systems using field-programmable gate array (FPGA) hardware. A new version of the NetFPGA platform has been developed and is available for use by the academic community. The NetFPGA platform has modular interfaces that enable development of complex hardware designs by integration of simple building blocks. FPGA logic is used to implement the core data processing functions while software running on an attached host computer or embedded cores within the device implement control functions. Reference designs and component libraries have been developed for the CS344 course at Stanford University, Stanford, CA, and taught at a series of tutorials held in the United States, United Kingdom, India, China, Australia, and Europe. The open-source Verilog, C, Perl, and Java reference design is available for download from the project website.

Journal ArticleDOI
TL;DR: A new approach using field-programmable gate array (FPGA) to implement a fully digital control algorithm of active power filter (APF) is proposed in this paper, and experimental results on a laboratory prototype are given to demonstrate performance of the proposed approach during steady-state and dynamic operations.
Abstract: A new approach using field-programmable gate array (FPGA) to implement a fully digital control algorithm of active power filter (APF) is proposed in this paper. This FPGA-based controller integrates the whole signal-processing function of an APF, including synchronous-reference-frame transform, low-pass filter, three-phase phase-locked loop, inverter-current controller, etc. By case studies on the principle, performance, and architecture, these control blocks are implemented in real-time and synthesized into a medium-scale FPGA chip by adopting some useful digital-signal-processing techniques, such as pipelining, folding and strength reduction, with respect to minimization of hardware resource and enhancement of operating frequency. As a result, the whole algorithm needs around 5000 logic elements and can run at synchronous system-clock rates of up to 65 MHz. Experimental results on a laboratory prototype are given to demonstrate performance of the proposed approach during steady-state and dynamic operations.
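
One of the blocks listed in the abstract above, the synchronous-reference-frame transform, can be sketched as follows. This is a floating-point illustration assuming the amplitude-invariant Clarke scaling and a known grid angle `theta`; the paper's fixed-point FPGA implementation and its exact conventions are not given in the abstract.

```python
import math


def abc_to_dq(a: float, b: float, c: float, theta: float):
    """Map three-phase quantities to the rotating d-q frame.

    Assumes amplitude-invariant scaling; theta is the reference-frame angle
    that a phase-locked loop would normally supply.
    """
    # Clarke transform: three phases -> stationary alpha/beta frame
    alpha = (2.0 / 3.0) * (a - 0.5 * b - 0.5 * c)
    beta = (b - c) / math.sqrt(3.0)
    # Park transform: rotate by -theta into the synchronous frame
    d = alpha * math.cos(theta) + beta * math.sin(theta)
    q = -alpha * math.sin(theta) + beta * math.cos(theta)
    return d, q
```

For a balanced fundamental aligned with `theta`, the result is the constant pair (1, 0). Harmonic currents appear as oscillating terms in d and q, which is why a low-pass filter typically follows this stage.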

Journal ArticleDOI
TL;DR: A hierarchical software structure has been established to perform all data processing and the control of the hand and provides basic application programming interface (API) functions and skills to access all hardware resources for data acquisition, computation, and teleoperation.
Abstract: This paper presents the hardware and software architecture of the newly developed compact multisensory German Aerospace Center (DLR)-Harbin Institute of Technology (HIT)-Hand. The hand has four identical fingers and an extra degree of freedom for the palm. In each finger, there is a field-programmable gate array (FPGA) for data collection and brushless dc motors for control, and communication with the palm's FPGA is accomplished by point-to-point serial communication (PPSeCo). The kernel of the hardware system is a peripheral component interconnect (PCI)-based high-speed floating-point DSP for data processing, and an FPGA for high-speed (up to 25 Mb/s) real-time serial communication with the palm's FPGA. In order to achieve high modularity and reliability of the hand, a fully mechatronic integration and analog signals in situ digitalization philosophy is implemented to minimize the dimension and number of the cables (five cables including power supply), and protect data communication from outside disturbances. Furthermore, according to the hardware structure of the hand, a hierarchical software structure has been established to perform all data processing and the control of the hand. It provides basic application programming interface (API) functions and skills to access all hardware resources for data acquisition, computation, and teleoperation. With the design of its envelope, the hand looks more like a humanoid hand.

Journal ArticleDOI
TL;DR: A stochastic NN structure is proposed in this paper for an FPGA implementation of a feedforward NN to estimate the feedback signals in an induction motor drive and significantly reduces the number of logic gates required for the proposed NN estimator.
Abstract: This paper applies stochastic theory to the design and implementation of field-oriented control of an induction motor drive using a single field-programmable gate array (FPGA) device and integrated neural network (NN) algorithms. Normally, NNs are characterized as heavily parallel calculation algorithms that employ enormous computational resources and are less useful for economical digital hardware implementations. A stochastic NN structure is proposed in this paper for an FPGA implementation of a feedforward NN to estimate the feedback signals in an induction motor drive. The stochastic arithmetic simplifies the computational elements of the NN and significantly reduces the number of logic gates required for the proposed NN estimator. A new stochastic proportional-integral speed controller is also developed with antiwindup functionality. Compared with conventional digital controls for motor drives, the proposed stochastic-based algorithm enhances the arithmetic operations of the FPGA, saves digital resources, and permits the NN algorithms and classical control algorithms to be easily interfaced and implemented on a single low-complexity, inexpensive FPGA. The algorithm has been realized using a single FPGA XC3S400 from Xilinx, Inc. A hardware-in-the-loop (HIL) test platform using a Real Time Digital Simulator is built in the laboratory. The HIL experimental results are provided to verify the proposed FPGA controller.
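
The reason stochastic arithmetic saves logic gates can be seen in a small software model. The sketch below assumes the common unipolar encoding, in which a value in [0, 1] is the probability of a 1 in a random bitstream, so that a single AND gate multiplies two independent streams. The encoding actually used in the paper is not stated in the abstract, and the function names are hypothetical.

```python
import random


def to_stream(p: float, n: int, rng: random.Random):
    """Encode probability p as n random bits (unipolar encoding, assumed)."""
    return [1 if rng.random() < p else 0 for _ in range(n)]


def stochastic_multiply(p1: float, p2: float, n: int = 100_000, seed: int = 0) -> float:
    """Estimate p1 * p2 by ANDing two independent bitstreams and counting ones."""
    rng = random.Random(seed)
    s1 = to_stream(p1, n, rng)
    s2 = to_stream(p2, n, rng)
    ones = sum(x & y for x, y in zip(s1, s2))   # one AND gate per bit pair
    return ones / n
```

The estimate is only approximate and its error shrinks as the stream gets longer, so precision is traded for a very small multiplier.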

Proceedings ArticleDOI
19 Oct 2008
TL;DR: A system of vectorized software and soft vector processor hardware that is portable to any FPGA architecture and vector processor configuration, scalable to larger yet higher-performance designs, and flexible, allowing the underlying vector processor to be customized to match the needs of each application.
Abstract: While soft processors are increasingly common in FPGA-based embedded systems, it remains a challenge to scale their performance. We propose extending soft processor instruction sets to include support for vector processing. The resulting system of vectorized software and soft vector processor hardware is (i) portable to any FPGA architecture and vector processor configuration, (ii) scalable to larger yet higher-performance designs, and (iii) flexible, allowing the underlying vector processor to be customized to match the needs of each application. Using our robust and verified parameterized vector processor design and industry-standard EEMBC benchmarks, we evaluate the performance and area trade-offs for different soft vector processor configurations using an FPGA development platform with DDR SDRAM. We find that on average we can scale performance from 1.8x up to 6.3x for a vector processor design that saturates the capacity of our platform's Stratix 1S80 FPGA. We also automatically generate application-specific vector processors with reduced datapath width and instruction set support which combined reduce the area by up to 70% (61% on average) without affecting performance.

Book
01 Jan 2008
TL;DR: This book will be ideal for electronic and computer engineering students taking a practical or Lab based course on digital systems development using PLDs and for engineers in industry looking for concrete advice on developing a digital system using a FPGA or CPLD as its core.
Abstract: This textbook explains how to design and develop digital electronic systems using programmable logic devices (PLDs). Totally practical in nature, the book features numerous case study designs using a variety of Field Programmable Gate Array (FPGA) and Complex Programmable Logic Devices (CPLD), for a range of applications from control and instrumentation to semiconductor automatic test equipment.
Key features include:
* Case studies that provide a walk through of the design process, highlighting the trade-offs involved.
* Discussion of real world issues such as choice of device, pin-out, power supply, power supply decoupling, signal integrity, for embedding FPGAs within a PCB based design.
With this book engineers will be able to:
* Use PLD technology to develop digital and mixed signal electronic systems
* Develop PLD based designs using both schematic capture and VHDL synthesis techniques
* Interface a PLD to digital and mixed-signal systems
* Undertake complete design exercises from design concept through to the build and test of PLD based electronic hardware
This book will be ideal for electronic and computer engineering students taking a practical or Lab based course on digital systems development using PLDs and for engineers in industry looking for concrete advice on developing a digital system using a FPGA or CPLD as its core.

Proceedings ArticleDOI
06 Nov 2008
TL;DR: The proposed REM architecture, based on nondeterministic finite automaton (RE-NFA), efficiently constructs regular expression matching engines (REME) of arbitrary regular patterns and character classes in a uniform structure, utilizing both logic slices and block memory available on modern FPGA devices.
Abstract: In this paper we present a novel architecture for high-speed and high-capacity regular expression matching (REM) on FPGA. The proposed REM architecture, based on nondeterministic finite automaton (RE-NFA), efficiently constructs regular expression matching engines (REME) of arbitrary regular patterns and character classes in a uniform structure, utilizing both logic slices and block memory (BRAM) available on modern FPGA devices. The resulting circuits take advantage of synthesis and routing optimizations to achieve high operating speed and area efficiency. The uniform structure of our RE-NFA design can be stacked in a simple way to produce multi-character input circuits to scale up throughput further. An n-state m-character input REME takes only O(n × log2 m) time to construct and occupies no more than O(n × m) logic units. The REMEs can be staged and pipelined in large numbers to achieve high parallelism without sacrificing clock frequency. Using the proposed RE-NFA architecture, we are able to implement 3 copies of two-character input REMEs, each with 760 regular expressions, 18715 states and 371 character classes, onto a single Xilinx Virtex 4 LX-100-12 device. Each copy processes 2 characters per clock cycle at 300 MHz, resulting in a concurrent throughput of 14.4 Gbps for 760 REMEs. Compared with the automatic NFA-to-VHDL REME compilation [13], our approach achieves over 9x throughput efficiency (Gbps*state/LUT). Compared with state-of-the-art REMEs on FPGA, our approach also indicates up to 70% better throughput efficiency.
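
Hardware NFA matchers keep one flip-flop per state and update all states in parallel on each input character. The sketch below models that idea in software with a bit vector. It is a deliberate simplification that handles only a fixed sequence of character classes, not arbitrary regular expressions, and it is not the RE-NFA construction from the paper. The throughput helper assumes 8-bit characters.

```python
def compile_pattern(classes):
    """classes: one set of accepted characters per NFA state, in order.

    Returns a per-character bit mask (bit i set when state i accepts the
    character) and the bit marking the accepting state.
    """
    masks = {}
    for i, cls in enumerate(classes):
        for ch in cls:
            masks[ch] = masks.get(ch, 0) | (1 << i)
    return masks, 1 << (len(classes) - 1)


def find_matches(classes, text):
    """Return the end positions of every match, updating all states at once."""
    masks, accept = compile_pattern(classes)
    state = 0            # bit i set means state i is active
    ends = []
    for pos, ch in enumerate(text):
        # Advance every active state by one and restart at state 0,
        # then keep only the states whose character class accepts ch.
        state = ((state << 1) | 1) & masks.get(ch, 0)
        if state & accept:
            ends.append(pos)
    return ends


def throughput_gbps(copies, chars_per_cycle, clock_mhz, bits_per_char=8):
    """Aggregate input throughput, assuming bits_per_char bits per character."""
    return copies * chars_per_cycle * bits_per_char * clock_mhz / 1000
```

With the figures quoted in the abstract, `throughput_gbps(3, 2, 300)` gives 14.4, matching the stated concurrent throughput under the 8-bit character assumption.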

Journal ArticleDOI
TL;DR: Simulation studies on a realistic example show that it is possible to implement constrained MPC on an FPGA chip with a 25MHz clock and achieve MPC implementation rates comparable to those achievable on a Pentium 3.0 GHz PC.

Book ChapterDOI
11 Jun 2008
TL;DR: An updated implementation of the Advanced Encryption Standard (AES) on the recent Xilinx Virtex-5 FPGAs is presented, showing how a modified slice structure in these reconfigurable hardware devices results in significant improvement of the design efficiency.
Abstract: This paper presents an updated implementation of the Advanced Encryption Standard (AES) on the recent Xilinx Virtex-5 FPGAs. We show how a modified slice structure in these reconfigurable hardware devices results in significant improvement of the design efficiency. In particular, a single substitution box of the AES can fit in 8 FPGA slices. We combine these technological changes with a sound intertwining of the round and key round functionalities in order to produce encryption and decryption architectures that perfectly fit with the Digital Cinema Initiative specifications. More generally, our implementations are convenient for any application requiring Gbps-range throughput.

Proceedings ArticleDOI
23 Sep 2008
TL;DR: This paper addresses problems, limitations and results of on-chip reconfiguration that enable the user to decide whether DPR is suitable for a certain design prior to its implementation and presents an IP core that enables fast on-chip DPR close to the maximum achievable speed.
Abstract: Dynamic and partial reconfiguration (DPR) is a special feature offered by Xilinx Field Programmable Gate Arrays (FPGAs), giving the designer the ability to reconfigure a certain portion of the FPGA during run-time without influencing the other parts. This feature allows the hardware to be adaptable to any potential situation. For some applications, such as video-based driver assistance, the time needed to exchange a certain portion of the device might be critical. This paper addresses problems, limitations and results of on-chip reconfiguration that enable the user to decide whether DPR is suitable for a certain design prior to its implementation. A method is therefore introduced to calculate the expected reconfiguration throughput and latency. In addition, an IP core is presented that enables fast on-chip DPR close to the maximum achievable speed. Compared to an alternative state-of-the-art realization, an increase in speed by a factor of 58 can be obtained.
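
A first-order estimate of reconfiguration latency needs only the partial bitstream size and the configuration port bandwidth. The sketch below is not the paper's method. It assumes a simple model in which the port transfers one word per clock cycle, scaled by an `efficiency` factor for controller overhead, and the numbers in the usage note are hypothetical.

```python
def reconfig_throughput_bytes_per_s(port_width_bits: int, clock_hz: float,
                                    efficiency: float = 1.0) -> float:
    """Upper bound on configuration throughput: one port word per clock cycle,
    derated by an assumed efficiency factor between 0 and 1."""
    return port_width_bits / 8 * clock_hz * efficiency


def reconfig_latency_s(bitstream_bytes: int, throughput_bytes_per_s: float) -> float:
    """Time to load a partial bitstream at the given throughput."""
    return bitstream_bytes / throughput_bytes_per_s
```

For example, a hypothetical 32-bit port clocked at 100 MHz would move at most 400 MB/s, so a 200,000-byte partial bitstream would take about 0.5 ms to load. Such a figure can then be compared against a frame period in a video application.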

01 Jan 2008
TL;DR: This survey establishes the foundations for discussing FPGA security, examines a wide range of attacks and defenses along with the current state of industry offerings, and outlines on-going research and latest developments.
Abstract: Volatile FPGAs, the dominant type of programmable logic devices, are used in space, military, automotive, and consumer electronics applications which require them to operate in a wide range of environments. The continuous growth in both their capability and capacity now requires significant resources to be invested in the designs that are created for them. This has brought increased interest in the security attributes of FPGAs; specifically, how well do they protect the information processed within it, how are designs protected during distribution, and how developers’ ownership rights are protected while designs from multiple sources are combined. This survey establishes the foundations for discussing "FPGA security", examines a wide range of attacks and defenses along with the current state of industry offerings, and finally, outlines on-going research and latest developments.


Proceedings ArticleDOI
23 Sep 2008
TL;DR: This paper provides the first comprehensive survey of fault detection methods and fault tolerance schemes specifically for FPGAs, with the goal of laying a strong foundation for future research in this field.
Abstract: Reliability and process variability are serious issues for FPGAs in the future. Fortunately FPGAs have the ability to reconfigure in the field and at runtime, thus providing opportunities to overcome some of these issues. This paper provides the first comprehensive survey of fault detection methods and fault tolerance schemes specifically for FPGAs, with the goal of laying a strong foundation for future research in this field. All methods and schemes are qualitatively compared and some particularly promising approaches highlighted.

Journal ArticleDOI
TL;DR: A general-purpose correlator architecture using standard 10-Gbit Ethernet switches to pass data between flexible hardware modules containing Field Programmable Gate Array (FPGA) chips that are programmed using open-source signal-processing libraries that are designed to be flexible, scalable, and chip-independent.
Abstract: A new generation of radio telescopes is achieving unprecedented levels of sensitivity and resolution, as well as increased agility and field of view, by employing high-performance digital signal-processing hardware to phase and correlate signals from large numbers of antennas. The computational demands of these imaging systems scale in proportion to $BMN^2$, where $B$ is the signal bandwidth, $M$ is the number of independent beams, and $N$ is the number of antennas. The specifications of many new arrays lead to demands in excess of tens of PetaOps per second. To meet this challenge, we have developed a general-purpose correlator architecture using standard 10-Gbit Ethernet switches to pass data between flexible hardware modules containing Field Programmable Gate Array (FPGA) chips. These chips are programmed using open-source signal-processing libraries that we have developed to be flexible, scalable, and chip-independent. This work reduces the time and cost of implementing a wide range of...
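The scaling law quoted in this abstract can be illustrated with a short worked example. This is only a sketch: the function name `correlator_load`, the proportionality constant `ops_per_term`, and the sample bandwidth and antenna counts are hypothetical values chosen for illustration, not figures from the paper.

```python
def correlator_load(bandwidth_hz, n_beams, n_antennas, ops_per_term=1.0):
    """Relative compute demand, proportional to B * M * N^2.

    ops_per_term is an assumed constant of proportionality; the abstract
    only states the scaling, not an absolute operation count.
    """
    return ops_per_term * bandwidth_hz * n_beams * n_antennas ** 2


# Hypothetical arrays: same bandwidth and beam count, antenna count doubled.
base = correlator_load(bandwidth_hz=1e9, n_beams=1, n_antennas=64)
doubled = correlator_load(bandwidth_hz=1e9, n_beams=1, n_antennas=128)

# Doubling N quadruples the load, while doubling B or M would only double it.
ratio = doubled / base
print(ratio)
```

Because the antenna count enters quadratically, it dominates the cost of large arrays, which is consistent with the abstract's motivation for a scalable, switch-based architecture.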

Journal ArticleDOI
TL;DR: A new architecture and set of dynamic CAD tools demonstrate warp processing's potential, resulting in 2X to 100X speedup over executing on microprocessors.
Abstract: Warp processing dynamically and transparently transforms an executing microprocessor's binary kernels into customized field-programmable gate array (FPGA) circuits, commonly resulting in 2X to 100X speedup over executing on microprocessors. A new architecture and set of dynamic CAD tools demonstrate warp processing's potential.

Journal ArticleDOI
TL;DR: In this paper, the authors present an application driven hardware exploration where they implement real-time, isolated digit speech recognition using a Liquid State Machine, a recurrent neural network of spiking neurons where only the output layer is trained.

Journal ArticleDOI
TL;DR: A field programmable gate array (FPGA)-based speed controller for a synchronous machine with an internal current control loop based on a predictive current controller is presented, and experimental results are shown to prove the efficiency of FPGA-based solutions in achieving high performance.
Abstract: In this paper, a field programmable gate array (FPGA)-based speed controller for a synchronous machine with an internal current control loop based on a predictive current controller is presented. Due to their complex computation schemes, predictive current controllers implemented in a fully digital system are characterized by an inevitable delay in calculating and applying the switching states to the inverter. Consequently, their performance is affected and the achievable sampling frequency is limited. These digital control limitations are mainly due to the trade-off between processing speed and computational complexity. To cope with this problem, specific digital hardware technology such as FPGAs can be used as an alternative digital solution to ensure fast processing and to preserve the performance of predictive current controllers in spite of their complex computation schemes. Such performance can be preserved thanks to the high flexibility and high computation capabilities of FPGAs. In order to illustrate this, an FPGA implementation of a synchronous machine speed controller based on a predictive current controller is presented and fully analyzed in this work. The obtained execution time is only a few microseconds for the whole control algorithm. Experimental results are shown to prove the efficiency of FPGA-based solutions in achieving high performance.

Proceedings Article
01 Jan 2008
TL;DR: This work presents an application driven digital hardware exploration where real-time, isolated digit speech recognition is implemented using a Liquid State Machine, a recurrent neural network of spiking neurons where only the output layer is trained.
Abstract: Hardware implementations of Spiking Neural Networks are numerous because they are well suited for implementation in digital and analog hardware, and outperform classic neural networks. This work presents an application driven digital hardware exploration where we implement real-time, isolated digit speech recognition using a Liquid State Machine. The Liquid State Machine is a recurrent neural network of spiking neurons where only the output layer is trained. First we test two existing hardware architectures which we improve and extend, but these appear to be too fast, and thus too area-consuming, for this application. Next, we present a scalable, serialized architecture that allows a very compact implementation of spiking neural networks that is still fast enough for real-time processing. All architectures support leaky integrate-and-fire membranes with exponential synaptic models. This work shows that there is actually a large hardware design space of Spiking Neural Network hardware that can be explored. Existing architectures have only spanned part of it.
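For readers unfamiliar with the neuron model named in this abstract, a leaky integrate-and-fire membrane driven by an exponentially decaying synaptic current can be sketched in a few lines of software. This is a generic discrete-time textbook formulation, not the paper's hardware architecture; the function name `simulate_lif` and all parameter values (time constants, weight, threshold, reset to zero) are assumptions made for illustration.

```python
import math


def simulate_lif(input_spikes, dt=1.0, tau_mem=20.0, tau_syn=5.0,
                 weight=1.5, threshold=1.0):
    """Simulate one leaky integrate-and-fire neuron with an exponential synapse.

    input_spikes is a sequence of 0/1 values, one per time step.
    All parameter defaults are hypothetical, chosen only for illustration.
    """
    mem_decay = math.exp(-dt / tau_mem)  # per-step leak of the membrane
    syn_decay = math.exp(-dt / tau_syn)  # per-step decay of synaptic current
    v = 0.0       # membrane potential
    i_syn = 0.0   # synaptic current
    output = []
    for spike in input_spikes:
        # Exponential synapse: decay, then add the weighted input spike.
        i_syn = i_syn * syn_decay + weight * spike
        # Leaky integration of the synaptic current.
        v = v * mem_decay + i_syn
        if v >= threshold:
            output.append(1)
            v = 0.0  # assumed reset-to-zero after firing
        else:
            output.append(0)
    return output


strong = simulate_lif([1, 0, 0, 0])              # one strong input spike
weak = simulate_lif([1] + [0] * 50, weight=0.1)  # one weak input spike
print(sum(strong), sum(weak))
```

Each step needs only two multiplications by fixed decay factors, an addition, and a comparison, which gives some intuition for why such neurons can be time-multiplexed onto compact serialized hardware.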