Showing papers on "Field-programmable gate array" published in 2011


Journal ArticleDOI
Jason Cong, Bin Liu, Stephen Neuendorffer, Juanjo Noguera, Kees Vissers, Zhiru Zhang
TL;DR: AutoESL's AutoPilot HLS tool coupled with domain-specific system-level implementation platforms developed by Xilinx are used as an example to demonstrate the effectiveness of state-of-art C-to-FPGA synthesis solutions targeting multiple application domains.
Abstract: Escalating system-on-chip design complexity is pushing the design community to raise the level of abstraction beyond register transfer level. Despite the unsuccessful adoptions of early generations of commercial high-level synthesis (HLS) systems, we believe that the tipping point for transitioning to HLS methodology is happening now, especially for field-programmable gate array (FPGA) designs. The latest generation of HLS tools has made significant progress in providing wide language coverage and robust compilation technology, platform-based modeling, advancement in core HLS algorithms, and a domain-specific approach. In this paper, we use AutoESL's AutoPilot HLS tool coupled with domain-specific system-level implementation platforms developed by Xilinx as an example to demonstrate the effectiveness of state-of-art C-to-FPGA synthesis solutions targeting multiple application domains. Complex industrial designs targeting Xilinx FPGAs are also presented as case studies, including comparison of HLS solutions versus optimized manual designs. In particular, the experiment on a sphere decoder shows that the HLS solution can achieve an 11-31% reduction in FPGA resource usage with improved design productivity compared to hand-coded design.

728 citations


Proceedings ArticleDOI
27 Feb 2011
TL;DR: A new open source high-level synthesis tool called LegUp that allows software techniques to be used for hardware design and produces hardware solutions of comparable quality to a commercial high-level synthesis tool.
Abstract: In this paper, we introduce a new open source high-level synthesis tool called LegUp that allows software techniques to be used for hardware design. LegUp accepts a standard C program as input and automatically compiles the program to a hybrid architecture containing an FPGA-based MIPS soft processor and custom hardware accelerators that communicate through a standard bus interface. Results show that the tool produces hardware solutions of comparable quality to a commercial high-level synthesis tool.

531 citations


Proceedings ArticleDOI
20 Jun 2011
TL;DR: A scalable dataflow hardware architecture optimized for the computation of general-purpose vision algorithms (neuFlow) and a dataflow compiler (luaFlow) that transforms high-level flow-graph representations of these algorithms into machine code for neuFlow are presented.
Abstract: In this paper we present a scalable dataflow hardware architecture optimized for the computation of general-purpose vision algorithms — neuFlow — and a dataflow compiler — luaFlow — that transforms high-level flow-graph representations of these algorithms into machine code for neuFlow. This system was designed with the goal of providing real-time detection, categorization and localization of objects in complex scenes, while consuming 10 Watts when implemented on a Xilinx Virtex 6 FPGA platform, or about ten times less than a laptop computer, and producing speedups of up to 100 times in real-world applications. We present an application of the system on street scene analysis, segmenting 20 categories on 500 × 375 frames at 12 frames per second on our custom hardware neuFlow.

407 citations


Book
13 Jun 2011
TL;DR: Design for Embedded Image Processing on FPGAs is ideal for researchers and engineers in the vision or image processing industry, who are looking at smart sensors, machine vision, and robotic vision, as well as FPGA developers and application engineers.
Abstract: Dr Donald Bailey starts with introductory material considering the problem of embedded image processing, and how some of the issues may be solved using parallel hardware solutions. Field programmable gate arrays (FPGAs) are introduced as a technology that provides flexible, fine-grained hardware that can readily exploit parallelism within many image processing algorithms. A brief review of FPGA programming languages provides the link between a software mindset normally associated with image processing algorithms, and the hardware mindset required for efficient utilization of a parallel hardware design. The design process for implementing an image processing algorithm on an FPGA is compared with that for a conventional software implementation, with the key differences highlighted. Particular attention is given to the techniques for mapping an algorithm onto an FPGA implementation, considering timing, memory bandwidth and resource constraints, and efficient hardware computational techniques. Extensive coverage is given of a range of low and intermediate level image processing operations, discussing efficient implementations and how these may vary according to the application. The techniques are illustrated with several example applications or case studies from projects or applications he has been involved with. Issues such as interfacing between the FPGA and peripheral devices are covered briefly, as is designing the system in such a way that it can be more readily debugged and tuned. Provides a bridge between algorithms and hardware. Demonstrates how to avoid many of the potential pitfalls. Offers practical recommendations and solutions. Illustrates several real-world applications and case studies. Allows those with software backgrounds to understand efficient hardware implementation. Design for Embedded Image Processing on FPGAs is ideal for researchers and engineers in the vision or image processing industry, who are looking at smart sensors, machine vision, and robotic vision, as well as FPGA developers and application engineers. The book can also be used by graduate students studying imaging systems, computer engineering, digital design, circuit design, or computer science. It can also be used as supplementary text for courses in advanced digital design, algorithm and hardware implementation, and digital signal processing and applications. Companion website for the book: www.wiley.com/go/bailey/fpga

302 citations


01 Jan 2011
TL;DR: LegUp is a high-level synthesis tool that allows software techniques to be used for hardware design and can synthesize most of the C language to hardware, including fixed-sized multi-dimensional arrays, structs, global variables and pointer arithmetic.
Abstract: It is generally accepted that a custom hardware implementation of a set of computations will provide superior speed and energy-efficiency relative to a software implementation. However, the cost and difficulty of hardware design is often prohibitive, and consequently, a software approach is used for most applications. In this paper, we introduce a new high-level synthesis tool called LegUp that allows software techniques to be used for hardware design. LegUp accepts a standard C program as input and automatically compiles the program to a hybrid architecture containing an FPGA-based MIPS soft processor and custom hardware accelerators that communicate through a standard bus interface. In the hybrid processor/accelerator architecture, program segments that are unsuitable for hardware implementation can execute in software on the processor. LegUp can synthesize most of the C language to hardware, including fixed-sized multi-dimensional arrays, structs, global variables and pointer arithmetic. Results show that the tool produces hardware solutions of comparable quality to a commercial high-level synthesis tool. We also give results demonstrating the ability of the tool to explore the hardware/software co-design space by varying the amount of a program that runs in software vs. hardware. LegUp, along with a set of benchmark C programs, is open source and freely downloadable, providing a powerful platform that can be leveraged for new research on a wide range of high-level synthesis topics.

250 citations


Journal ArticleDOI
TL;DR: The near-optimum designs for membership functions and control rules were found simultaneously by genetic algorithms (GAs), search algorithms based on the mechanisms of natural selection and genetics that are easy to implement and efficient for multivariable optimization problems such as fuzzy controller design.

240 citations


Journal ArticleDOI
TL;DR: A new version of the VPR toolset is described and illustrated that supports a broad range of single-driver routing architectures, and provides optimized electrical models for a wide range of architectures in different process technologies, including a range of area-delay trade-offs for each single architecture.
Abstract: The VPR toolset has been widely used in FPGA architecture and CAD research, but has not evolved over the past decade. This article describes and illustrates the use of a new version of the toolset that includes four new features: first, it supports a broad range of single-driver routing architectures, which have superior architectural and electrical properties over the prior multidriver approach (and which is now employed in the majority of FPGAs sold). Second, it can now model, for placement and routing, a heterogeneous selection of hard logic blocks. This is a key (but not final) step toward the inclusion of blocks such as memory and multipliers. Third, we provide optimized electrical models for a wide range of architectures in different process technologies, including a range of area-delay trade-offs for each single architecture. Finally, to maintain robustness and support future development, the release includes a set of regression tests for the software. To illustrate the use of the new features, we explore several architectural issues: the FPGA area efficiency versus logic block granularity, the effect of single-driver routing, and a simple use of the heterogeneity to explore the impact of hard multipliers on wiring track count.

215 citations


Patent
31 Mar 2011
TL;DR: A combination of software logic and firmware logic can be used to efficiently control and manage the high speed flow of financial market data to and from the reconfigurable logic.
Abstract: Methods and systems for processing financial market data using reconfigurable logic are disclosed. Various functional operations to be performed on the financial market data can be implemented in firmware pipelines to accelerate the speed of processing. Also, a combination of software logic and firmware logic can be used to efficiently control and manage the high speed flow of financial market data to and from the reconfigurable logic.

191 citations


Journal ArticleDOI
TL;DR: This paper presents a field-programmable gate array (FPGA)-based real-time digital simulator for power electronic apparatus based on a realistic device-level behavioral model, implemented using very high speed integrated circuit hardware description language (VHDL).
Abstract: This paper presents a field-programmable gate array (FPGA)-based real-time digital simulator for power electronic apparatus based on a realistic device-level behavioral model. A three-level 12-pulse voltage source converter (VSC)-fed induction machine drive is implemented on the FPGA. The VSC model is computed at a fixed time step of 12.5 ns, allowing a realistic representation of insulated-gate bipolar transistor (IGBT) nonlinear switching characteristics and power losses. The simulator also models a squirrel-cage induction machine, a direct field-oriented control system, and a pulsewidth modulator to achieve the real-time simulation of the complete drive system. All the models have been implemented using very high speed integrated circuit hardware description language (VHDL). Real-time simulation results have been validated using the measured device-level IGBT characteristics.

189 citations


Journal ArticleDOI
TL;DR: A new readout interface for silicon pixel detectors of the Medipix family has been developed in this group in order to provide a higher frame rate and enhanced flexibility of operation.
Abstract: The semiconductor pixel detector Timepix contains an array of 256 × 256 square pixels with a pitch of 55 μm. In addition to high spatial granularity, the single quantum counting detector Timepix can also provide energy or time information in each pixel. This device is a powerful tool for radiation and particle detection, imaging and tracking. A new readout interface for silicon pixel detectors of the Medipix family has been developed in our group in order to provide a higher frame rate and enhanced flexibility of operation. The interface consists of a field programmable gate array, a USB 2.0 interface chip, DAC, ADC and a circuit which generates bias voltage for the sensor. The main control system is placed in the FPGA circuit, which fully controls the Timepix device. This approach offers an easy way to include new functionality and extended operation. The interface for Timepix supports all operation modes of the detector (counting, TOT, timing). The FITPix is a successor of the USB 1.22 Interface, and the electronic readout is built with the latest available components, which allows achieving up to 90 frames per second with a single detector. The frame rate is about 20 times higher than that of the previous system while maintaining all of the same capabilities. In addition, FITPix newly enables an adjustable clock frequency and hardware triggering, which is a useful tool when there is a need for synchronized operation of multiple devices. Three modes of hardware trigger have been implemented: a hardware trigger which starts the measurement, a hardware trigger which terminates the measurement and a hardware trigger which controls the measurement fully. The entire system is fully powered through the USB bus. FITPix also supports readout from several detectors in a chain, in which case just an external power source is required. FITPix is a fully flexible device and the user needs no other equipment. FITPix combines high performance and mobility and it opens new fields of applications. The current version of the FITPix interface has dimensions of 45 mm × 60 mm.

184 citations


Journal ArticleDOI
TL;DR: The benefits of using field-programmable gate array (FPGA)-based controllers for power electronics and drive applications and the constraints specifically linked to the control of power converters are discussed.
Abstract: This article presents the benefits of using field-programmable gate array (FPGA)-based controllers for power electronics and drive applications. For this purpose, an algorithm perspective is first proposed, where it is stated that, depending on the intrinsic parallelism properties as well as the level of complexity, it makes sense to implement each control algorithm on a specific hardware and/or software architecture to get the best performance in terms of execution time or the best performance-to-cost ratio. Then, an application perspective is proposed where the constraints specifically linked to the control of power converters are discussed.

Journal ArticleDOI
TL;DR: Results of the experimental tests confirm that the multilayer NN, implemented in the FPGA with the use of the higher level programming language, ensures a high-quality state variable estimation of the two-mass drive system.
Abstract: This paper presents a practical realization of a neural network (NN)-based estimator of the load machine speed for a drive system with elastic coupling, using a reconfigurable field-programmable gate array (FPGA). The system presented is unique because the multilayer NN is implemented in the FPGA placed inside the NI CompactRIO controller. The neural network used as a state estimator was trained with the Levenberg-Marquardt algorithm. A special algorithm for implementing multilayer neural networks on such a hardware platform is presented, focused on minimizing the number of programmable blocks of the FPGA matrix that are used. The algorithm code for the neural estimator implemented in C-RIO was realized using the LabVIEW software. The neural estimators are tested offline (based on the measured testing database) and online (in the closed-loop control structure). These estimators are also tested for a changeable moment of inertia of the load machine of the drive system with an elastic joint. The presented results of the experimental tests confirm that the multilayer NN, implemented in the FPGA with the use of the higher level programming language, ensures a high-quality state variable estimation of the two-mass drive system.

Journal ArticleDOI
TL;DR: In this article, the authors propose a maximum power point tracking (MPPT) method based on a fuzzy logic controller (FLC), applied to a stand-alone photovoltaic system under variable temperature and irradiance conditions.

Proceedings ArticleDOI
27 Feb 2011
TL;DR: This paper compares the delay and area of a comprehensive set of processor building block circuits when implemented on custom CMOS and FPGA substrates to infer how the microarchitecture of soft processors on FPGAs should be different from hard processors on custom CMOS.
Abstract: As soft processors are increasingly used in diverse applications, there is a need to evolve their microarchitectures in a way that suits the FPGA implementation substrate. This paper compares the delay and area of a comprehensive set of processor building block circuits when implemented on custom CMOS and FPGA substrates. We then use the results of these comparisons to infer how the microarchitecture of soft processors on FPGAs should be different from hard processors on custom CMOS. We find that the ratios of the area required by an FPGA to that of custom CMOS for different building blocks vary significantly more than the speed ratios. As area is often a key design constraint in FPGA circuits, area ratios have the most impact on microarchitecture choices. Complete processor cores have area ratios of 17-27x and delay ratios of 18-26x. Building blocks that have dedicated hardware support on FPGAs such as SRAMs, adders, and multipliers are particularly area-efficient (2-7x area ratio), while multiplexers and CAMs are particularly area-inefficient (>100x area ratio), leading to cheaper ALUs, larger caches of low associativity, and more expensive bypass networks than on similar hard processors. We also find that a low delay ratio for pipeline latches (12-19x) suggests soft processors should have pipeline depths 20% greater than hard processors of similar complexity.

Proceedings ArticleDOI
14 Mar 2011
TL;DR: A novel on-chip structure including a ring oscillator network (RON) distributed across the entire chip is proposed to verify whether the chip is Trojan-free, which effectively eliminates the issue of measurement noise, localizes the measurement of dynamic power, and additionally compensates for the impact of process variations.
Abstract: Integrated circuits (ICs) are becoming increasingly vulnerable to malicious alterations, referred to as hardware Trojans. Detection of these inclusions is of utmost importance, as they may potentially be inserted into ICs bound for military, financial, or other critical applications. A novel on-chip structure including a ring oscillator network (RON), distributed across the entire chip, is proposed to verify whether the chip is Trojan-free. This structure effectively eliminates the issue of measurement noise, localizes the measurement of dynamic power, and additionally compensates for the impact of process variations. Combined with statistical data analysis, the separation of process variations from the Trojan contribution to the circuit's transient power is made possible. Simulation results featuring Trojans inserted into a benchmark circuit using 90nm technology and experimental results on Xilinx Spartan-3E FPGA demonstrate the efficiency and scalability of the RON architecture for Trojan detection.

Proceedings ArticleDOI
01 Nov 2011
TL;DR: A belief propagation (BP) decoder architecture for polar codes is proposed for an increasingly popular hardware platform, the Field Programmable Gate Array (FPGA); the architecture supports any code rate and is quite flexible in terms of hardware complexity and throughput.
Abstract: Polar codes are a class of codes versatile enough to achieve the Shannon bound in a large array of source and channel coding problems. For that reason it is important to have efficient implementation architectures for polar codes in hardware. Motivated by this fact, we propose a belief propagation (BP) decoder architecture for an increasingly popular hardware platform: the Field Programmable Gate Array (FPGA). The proposed architecture supports any code rate and is quite flexible in terms of hardware complexity and throughput. The architecture can also be extended to support multiple block lengths without a large increase in hardware complexity. Moreover, various schedulers can be adapted into the proposed architecture so that list decoding techniques can be used with a single block. Finally, the proposed architecture is compared with a convolutional turbo code (CTC) decoder for WiMAX taken from a Xilinx Product Specification, and it is seen that polar codes are superior to CTC codes in both hardware complexity and throughput.

Proceedings ArticleDOI
05 Sep 2011
TL;DR: This paper formally introduces RapidSmith, a new set of tools and APIs that enable CAD tool creation for Xilinx FPGAs that alleviates several of the difficulties of using XDL and demonstrates the kinds of research facilitated by removing such challenges.
Abstract: Creating CAD tools for commercial FPGAs is a difficult task. Closed proprietary device databases and unsupported interfaces are largely to blame for the lack of CAD research found on commercial architectures versus hypothetical architectures. This paper formally introduces RapidSmith, a new set of tools and APIs that enable CAD tool creation for Xilinx FPGAs. Based on the Xilinx Design Language (XDL), RapidSmith provides a compact yet fast device database with hundreds of APIs that enable the creation of placers, routers and several other tools for Xilinx devices. RapidSmith alleviates several of the difficulties of using XDL, and this work demonstrates the kinds of research facilitated by removing such challenges.

Journal ArticleDOI
TL;DR: This work describes a platform that offers a high degree of parameterization while maintaining a generalized network design with performance comparable to other hardware-based MLP implementations, and presents the hardware implementation of an ANN with the backpropagation learning algorithm applied to a realistic application.
Abstract: This paper presents the development and implementation of a generalized backpropagation multilayer perceptron (MLP) architecture described in very high speed integrated circuit hardware description language (VHDL). The development of hardware platforms has been complicated by the high hardware cost and quantity of the arithmetic operations required in online artificial neural networks (ANNs), i.e., general purpose ANNs with learning capability. Besides, there remains a dearth of hardware platforms for design space exploration, fast prototyping, and testing of these networks. Our general purpose architecture seeks to fill that gap and at the same time serve as a tool to gain a better understanding of issues unique to ANNs implemented in hardware, particularly using field programmable gate arrays (FPGAs). The challenge is thus to find an architecture that minimizes hardware costs, while maximizing performance, accuracy, and parameterization. This work describes a platform that offers a high degree of parameterization, while maintaining a generalized network design with performance comparable to other hardware-based MLP implementations. An application of the hardware ANN implementation with the backpropagation learning algorithm to a realistic problem is also presented.

Journal ArticleDOI
TL;DR: In this paper, an FPGA-based implementation of a real-time perturb and observe (P&O) algorithm for tracking the maximum power point (MPP) of a photovoltaic (PV) generator is presented.
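
For readers unfamiliar with perturb and observe, the following is a minimal sketch of one iteration of the generic textbook form of the algorithm. It is not the paper's FPGA implementation; the function name `perturb_and_observe`, its parameters, and the fixed step size are assumptions made only for illustration.

```python
def perturb_and_observe(v, p, v_prev, p_prev, v_ref, step=0.1):
    """Return the next voltage reference for a PV generator.

    v, p           -- present voltage and power samples
    v_prev, p_prev -- samples from the previous iteration
    v_ref          -- present voltage reference
    step           -- perturbation size (hypothetical fixed value)
    """
    dp = p - p_prev
    dv = v - v_prev
    if dp == 0:
        # No change in power: hold the reference.
        return v_ref
    # Power and voltage moved in the same direction: the operating point is
    # left of the maximum power point, so raise the reference.
    # Otherwise it is right of the maximum, so lower the reference.
    if (dp > 0) == (dv > 0):
        return v_ref + step
    return v_ref - step
```

Called once per sampling period, this rule climbs toward the maximum power point and then oscillates around it by roughly one step size, which is the usual trade-off of a fixed-step P&O tracker.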

Journal ArticleDOI
TL;DR: This paper introduces a unified approach to the validation of power-electronics (PE) control hardware, firmware, and software designs based on a scalable application-specific ultralow-latency (ULL) digital processor core.
Abstract: This paper introduces a unified approach to the validation of power-electronics (PE) control hardware, firmware, and software designs. It is based on a scalable application-specific ultralow-latency (ULL) digital processor core. The proposed ULL processor core simulates PE converters and systems comprising multiple power converters with a fixed 1-μs simulation time step and latency, regardless of the size of the system. Owing to its ULL, the proposed platform enables the fully automatic testing and validation of the complete PE design comprising component safe-operating-area validation, system protection, firmware, and software implementation as well as overall system performance optimization.

Proceedings ArticleDOI
03 Oct 2011
TL;DR: PP: a simple high-level language for describing packet parsing algorithms in an implementation-independent manner is introduced and it is demonstrated that this language can be compiled to give high-speed FPGA-based packet parsers that can be integrated alongside other packet processing components to build network nodes.
Abstract: Packet parsing is necessary at all points in the modern networking infrastructure, to support packet classification and security functions, as well as for protocol implementation. Increasingly high line rates call for advanced hardware packet processing solutions, while increasing rates of change call for high-level programmability of these solutions. This paper presents an approach for harnessing modern Field Programmable Gate Array (FPGA) devices, which are a natural technology for implementing the necessary high-speed programmable packet processing. The paper introduces PP: a simple high-level language for describing packet parsing algorithms in an implementation-independent manner. It demonstrates that this language can be compiled to give high-speed FPGA-based packet parsers that can be integrated alongside other packet processing components to build network nodes. Compilation involves generating virtual processing architectures tailored to specific packet parsing requirements. Scalability of these architectures allows parsing at line rates from 1 to 400 Gb/s as required in different network contexts. Run-time programmability of these architectures allows dynamic updating of parsing algorithms during operation in the field. Implementation results show that programmable packet parsing of 600 million small packets per second can be supported on a single Xilinx Virtex-7 FPGA device handling a 400 Gb/s line rate.
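
The two figures in the last sentence are consistent with each other under a common reading, which the short calculation below sketches. The assumptions, not stated in the abstract, are that "small packets" means minimum-size 64-byte Ethernet frames and that each frame also occupies 8 bytes of preamble and 12 bytes of inter-frame gap on the wire.

```python
# Assumed Ethernet framing overheads (not taken from the abstract).
MIN_FRAME_BYTES = 64        # minimum Ethernet frame size
PREAMBLE_BYTES = 8          # preamble plus start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12   # minimum idle time between frames, in byte times


def max_packets_per_second(line_rate_bps, frame_bytes=MIN_FRAME_BYTES):
    """Upper bound on the packet rate for a given line rate and frame size."""
    bits_on_wire = (frame_bytes + PREAMBLE_BYTES + INTERFRAME_GAP_BYTES) * 8
    return line_rate_bps / bits_on_wire


if __name__ == "__main__":
    rate = max_packets_per_second(400e9)
    print(f"{rate / 1e6:.1f} million packets per second")
```

Under these assumptions a 400 Gb/s link carries at most about 595 million minimum-size packets per second, in line with the roughly 600 million quoted.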

Journal ArticleDOI
TL;DR: It is shown that the proposed implementation method enables high switching frequency operation with high pulse resolution as well as a negligible propagation time for the generation of the gating pulses.
Abstract: This paper proposes a real-time DSP- and FPGA-based implementation method of a space vector modulation (SVM) algorithm for an indirect matrix converter (IMC). A low-cost and compact control platform is built using a 32-bit fixed-point DSP (TMS320F2812) operating at 150 MHz and a SPARTAN 3E FPGA operating at 50 MHz. The method consists of using the event-manager modules of the DSP to build specified pulses at its PWM output peripherals, which are fed to the digital input ports of an FPGA. Simple logical processing and delay times are thereafter implemented in the FPGA so as to synthesize the suitable gate pulse patterns for the semiconductor-controlled devices. It is shown that the proposed implementation method enables high switching frequency operation with high pulse resolution as well as a negligible propagation time for the generation of the gating pulses. Experimental results from an IMC prototype confirm the practical feasibility of the proposed technique.

Journal ArticleDOI
TL;DR: The hardware implementation of a two-input, one-output digital Fuzzy Logic Controller on a Xilinx reconfigurable Field-Programmable Gate Array (FPGA) using VHDL shows satisfactory performance, with good agreement between the expected and the obtained values.

Patent
01 Apr 2011
TL;DR: In this paper, the authors describe a debug system that generates hardware elements from normally non-synthesizable code elements for placement on an FPGA device, called a Behavior Processor.
Abstract: The debug system described in this patent specification provides a system that generates hardware elements from normally non-synthesizable code elements for placement on an FPGA device. This particular FPGA device is called a Behavior Processor. This Behavior Processor executes in hardware those code constructs that were previously executed in software. When some condition is satisfied (e.g., If . . . then . . . else loop) which requires some intervention by the workstation or the software model, the Behavior Processor works with an Xtrigger device to send a callback signal to the workstation for immediate response.

Proceedings ArticleDOI
01 May 2011
TL;DR: This work presents results from creating a new FPGA design flow based on hard macros called HMFlow, designed for rapid prototyping, that has shown speedups of 10-50X over the fastest configuration of the Xilinx tools.
Abstract: The FPGA compilation process (synthesis, map, place, and route) is a time consuming task that severely limits designer productivity. Compilation time can be reduced by saving implementation data in the form of hard macros. Hard macros consist of previously synthesized, placed and routed circuits that enable rapid design assembly because of the native FPGA circuitry (primitives and nets) which they encapsulate. This work presents results from creating a new FPGA design flow based on hard macros called HMFlow. HMFlow has shown speedups of 10-50X over the fastest configuration of the Xilinx tools. Designed for rapid prototyping, HMFlow achieves these speedups by only utilizing up to 50 percent of the resources on an FPGA and produces implementations that run 2-4X slower than those produced by Xilinx. These speedups are obtained on a wide range of benchmark designs with some exceeding 18,000 slices on a Virtex 4 LX200.

Journal ArticleDOI
TL;DR: A hardware-in-the-loop (HIL) simulation technique applied to a series-resonant multiple-output inverter for new multi-inductor domestic induction heating platforms is presented and a real-time simulation test bench is proposed.
Abstract: This paper presents a hardware-in-the-loop (HIL) simulation technique applied to a series-resonant multiple-output inverter for new multi-inductor domestic induction heating platforms. The control of the topology is based on a system-on-programmable chip (SoPC) solution, which combines the MicroBlaze embedded soft-core processor and a customized peripheral that generates the power converter control signals. The firmware is written in C, and the customized peripheral is described using a hardware description language. Simulating the whole system using digital or mixed-signal simulation tools is a very time-consuming task due to the embedded processor model complexity, and additionally, it does not support tracing C instructions. To overcome these limitations, this paper proposes a real-time simulation test bench. The embedded processor core, peripherals, and the power converter model are all implemented in the same field-programmable gate array (FPGA). Using the hardware and software debugging tools supplied by the FPGA vendor, currents and voltages of the power converter model are monitored, and firmware C instructions are traced while running on the embedded processor core. A design flow is then presented that is proven to be an effective and low-cost solution to verify the functionality of the customized peripheral and to implement a platform to perform firmware verification.

Book
01 Mar 2011
TL;DR: This book describes the synthesis of logic functions using memories, which is useful for designing field programmable gate arrays (FPGAs) that contain both small-scale memories, called look-up tables (LUTs), and medium-scale memories, called embedded memories.
Abstract: This book describes the synthesis of logic functions using memories. It is useful for designing field programmable gate arrays (FPGAs) that contain both small-scale memories, called look-up tables (LUTs), and medium-scale memories, called embedded memories. This is a valuable reference for both FPGA system designers and CAD tool developers concerned with logic synthesis for FPGAs.

Journal ArticleDOI
TL;DR: An architectural template, called the expression-grained reconfigurable array (EGRA), is proposed to enable design space exploration of different possible CGRA designs; its defining feature is the ability to generate complex computational cells that execute expressions as opposed to single operations.
Abstract: Reconfigurable arrays combine the benefit of spatial execution, typical of hardware solutions, with that of programmability, present in microprocessors. When mapping software applications (or parts of them) onto hardware, however, fine-grain arrays, such as field-programmable gate arrays (FPGAs), often provide more flexibility than is needed, and do not implement coarser-level operations efficiently. Therefore, coarse-grained reconfigurable arrays (CGRAs) have been proposed to address this. Most CGRA designs that have emerged from research present ad hoc solutions in many respects; in this paper we propose an architectural template to enable design space exploration of different possible CGRA designs. We call the template the expression-grained reconfigurable array (EGRA), as its ability to generate complex computational cells, executing expressions as opposed to single operations, is a defining feature. Other notable EGRA characteristics include the ability to support heterogeneous cells and different storage requirements through various memory interfaces. The performed design explorations, as shown through the experimental data provided, can effectively drive designers to further close the performance gap between reconfigurable and hardwired logic by providing guidelines on architectural design choices. Performance results on a number of embedded applications show that EGRA instances can be used as a reconfigurable fabric for customizable processors, outperforming more traditional CGRA designs.

Journal ArticleDOI
TL;DR: This paper combines a digital signal processor (DSP) and a field programmable gate array (FPGA) to realize the online empirical mode decomposition (EMD)-based signal processing system and presents a prototype of the online EMD-based electrocardiogram denoise system.
Abstract: This paper combines a digital signal processor (DSP) and a field programmable gate array (FPGA) to realize the online empirical mode decomposition (EMD)-based signal processing system. The EMD algorithm is a novel signal analysis technique, decomposing signals into a series of intrinsic mode functions. First, the EMD algorithm is implemented in the DSP, named the EMD processor, which has the ability to eliminate noise from the original signal. Next, in order to process the online sequential signal, this paper proposes and implements pipeline and data transfer controllers in the FPGA, called the data processing flow processor. Then, the data processing flow processor coordinates the EMD processor, analog-to-digital converter, and digital-to-analog converter module boards. Finally, this paper presents a prototype of the online EMD-based electrocardiogram denoise system to verify the features of the proposed architecture. The emulations and experimental results demonstrate the effectiveness of the presented system as expected.

Proceedings ArticleDOI
27 Feb 2011
TL;DR: This paper presents an area-driven generic packing tool that can pack the logical atoms into any heterogeneous FPGA described in the new language, including many different kinds of soft and hard logic blocks.
Abstract: The development of future FPGA fabrics with more sophisticated and complex logic blocks requires a new CAD flow that permits the expression of that complexity and the ability to synthesize to it. In this paper, we present a new logic block description language that can depict complex intra-block interconnect, hierarchy and modes of operation. These features are necessary to support modern and future FPGA complex soft logic blocks, memory and hard blocks. The key part of the CAD flow associated with this complexity is the packer, which takes the logical atomic pieces of the complex blocks and groups them into whole physical entities. We present an area-driven generic packing tool that can pack the logical atoms into any heterogeneous FPGA described in the new language, including many different kinds of soft and hard logic blocks. We gauge its area quality by comparing the results achieved with a lower bound on the number of blocks required, and then illustrate its explorative capability in two ways: on fracturable LUT soft logic architectures, and on hard block memory architectures. The new infrastructure attaches to a flow that begins with a Verilog front-end, permitting the use of benchmarks that are significantly larger than the usual ones, and can target heterogeneous FPGAs.