Showing papers on "Field-programmable gate array" published in 2010


Book
10 Feb 2010
TL;DR: This book covers fractional order systems, fractional order PID controllers, chaotic fractional order systems, field programmable gate array, microcontroller and field programmable analog array implementation, switched capacitor and integrated circuit design, and modeling of ionic polymeric metal composite.
Abstract: Fractional Order Systems; Fractional Order PID Controller; Chaotic Fractional Order Systems; Field Programmable Gate Array, Microcontroller and Field Programmable Analog Array Implementation; Switched Capacitor and Integrated Circuit Design; Modeling of Ionic Polymeric Metal Composite.

713 citations


Proceedings ArticleDOI
03 Aug 2010
TL;DR: This system is fully digital and is a modular vision engine with the goal of performing real-time detection, recognition and segmentation of mega-pixel images.
Abstract: In this paper we present a scalable hardware architecture to implement large-scale convolutional neural networks and state-of-the-art multi-layered artificial vision systems. This system is fully digital and is a modular vision engine with the goal of performing real-time detection, recognition and segmentation of mega-pixel images. We present a performance comparison between a software, FPGA and ASIC implementation that shows a speed up in custom hardware implementations.

298 citations


Journal ArticleDOI
TL;DR: In this paper, the authors present an overview of the state-of-the-art in heterogeneous computing, focusing on three commonly found architectures: the Cell Broadband Engine Architecture, graphics processing units (GPUs), and field programmable gate arrays (FPGAs).
Abstract: Node level heterogeneous architectures have become attractive during the last decade for several reasons: compared to traditional symmetric CPUs, they offer high peak performance and are energy and/or cost efficient. With the increase of fine-grained parallelism in high-performance computing, as well as the introduction of parallelism in workstations, there is an acute need for a good overview and understanding of these architectures. We give an overview of the state-of-the-art in heterogeneous computing, focusing on three commonly found architectures: the Cell Broadband Engine Architecture, graphics processing units (GPUs), and field programmable gate arrays (FPGAs). We present a review of hardware, available software tools, and an overview of state-of-the-art techniques and algorithms. Furthermore, we present a qualitative and quantitative comparison of the architectures, and give our view on the future of heterogeneous computing.

227 citations


01 Jan 2010
TL;DR: The technical and economic challenges that led Xilinx to develop stacked silicon interconnect technology and innovations that make it possible are explored.
Abstract: The programmable imperative—the critical need to achieve more with less, to reduce risks wherever possible, and to quickly create differentiated products using programmable hardware design platforms—is driving the search for FPGA-based solutions that provide the capacity, lower power, and higher bandwidth with which users can create the system-level functionality currently delivered by ASICs and ASSPs. Xilinx has developed an innovative approach for designing and manufacturing FPGAs that address two key requirements of the programmable imperative. Stacked silicon interconnect technology is the foundation of a new generation of FPGAs that breaks through the limitations of Moore's law and delivers the capabilities to satisfy the most demanding design requirements. It also enables Xilinx to reduce the time required to deliver the largest FPGAs in the quantities needed to satisfy end-customer volume production requirements. This white paper explores the technical and economic challenges that led Xilinx to develop stacked silicon interconnect technology and innovations that make it possible.

193 citations


Book
19 Nov 2010
TL;DR: This book provides a strong theoretical and practical background to the field of reconfigurable computing, from the early Estrin's machine to modern architectures such as coarse-grained reconfigurable devices and embedded logic devices.
Abstract: Introduction to Reconfigurable Computing provides a comprehensive study of the field of reconfigurable computing. It provides an entry point for the novice willing to move into the research field of reconfigurable computing, FPGA and system on programmable chip design. The book can also be used as a teaching reference for a graduate course in computer engineering, or as a reference for advanced electrical and computer engineers. It provides a very strong theoretical and practical background to the field of reconfigurable computing, from the early Estrin's machine to very modern architectures like coarse-grained reconfigurable devices and embedded logic devices. Apart from the introduction and the conclusion, the main chapters of the book are Architecture of reconfigurable systems, Design and implementation, High-Level Synthesis for Reconfigurable Devices, Temporal placement, On-line and Dynamic Interconnection, Designing a reconfigurable application on Xilinx Virtex FPGA, System on programmable chip, Applications.

190 citations


Proceedings ArticleDOI
01 Dec 2010
TL;DR: A high resolution programmable delay logic (PDL) implemented by lookup table (LUT) internal structure is introduced, and fine tuning is performed to cancel out delay skews caused by asymmetries in routing and systematic variations.
Abstract: This paper proposes a novel approach for efficient implementation of a real-valued arbiter-based physical unclonable function (PUF) on FPGA. We introduce a high resolution programmable delay logic (PDL) implemented by lookup table (LUT) internal structure. Using the PDL, we perform fine tuning to cancel out delay skews caused by asymmetries in routing and systematic variations. We devise a symmetric switch structure that can be easily implemented on FPGA. To mitigate the arbiter metastability problem, we present and analyze methods for majority voting of responses. Lastly, a method to classify and group challenges into different robustness sets is introduced, to further increase the corresponding responses' stability in the face of environmental variations. Experimental evaluations show that the responses to robust challenges have an average error rate of less than 2% under temperature variations from −10°C to 75°C.
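
The majority-voting idea mentioned in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions: `noisy_arbiter` is a hypothetical stand-in for one hardware evaluation (the real arbiter's noise behaviour is not described here), and the function names and the flip probability are illustrative, not taken from the paper.

```python
import random

def noisy_arbiter(ideal_bit, flip_prob, rng):
    """Hypothetical model of one PUF evaluation: the ideal response bit,
    flipped with probability flip_prob to mimic metastability noise."""
    flipped = int(rng.random() < flip_prob)
    return ideal_bit ^ flipped

def majority_vote(samples):
    """Return 1 if strictly more than half of the samples are 1, else 0.
    An odd number of samples avoids ties."""
    return 1 if 2 * sum(samples) > len(samples) else 0

def voted_response(ideal_bit, flip_prob, repeats, rng):
    """Evaluate the same challenge `repeats` times and vote on the result."""
    samples = [noisy_arbiter(ideal_bit, flip_prob, rng) for _ in range(repeats)]
    return majority_vote(samples)

if __name__ == "__main__":
    rng = random.Random(0)
    # Repeating the evaluation and voting makes the reported bit more stable
    # than a single noisy read, as long as flip_prob is below 0.5.
    print(voted_response(ideal_bit=1, flip_prob=0.1, repeats=11, rng=rng))
```

Using an odd `repeats` keeps the vote unambiguous; larger values trade evaluation time for stability.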

179 citations


Journal ArticleDOI
TL;DR: Two new reconfigurable architectures of low complexity FIR filters are proposed, namely constant shifts method and programmable shifts method, which are capable of operating for different wordlength filter coefficients without any overhead in the hardware circuitry.
Abstract: Reconfigurability and low complexity are the two key requirements of finite impulse response (FIR) filters employed in multistandard wireless communication systems. In this paper, two new reconfigurable architectures of low complexity FIR filters are proposed, namely constant shifts method and programmable shifts method. The proposed FIR filter architecture is capable of operating for different wordlength filter coefficients without any overhead in the hardware circuitry. We show that dynamically reconfigurable filters can be efficiently implemented by using common subexpression elimination algorithms. The proposed architectures have been implemented and tested on Virtex 2v3000ff1152-4 field-programmable gate array and synthesized on 0.18 μm complementary metal-oxide-semiconductor technology with a precision of 16 bits. Design examples show that the proposed architectures offer good area and power reductions and speed improvement compared to the best existing reconfigurable FIR filter implementations in the literature.
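
As background for the shift-based architectures named above, the following sketch shows generic multiplierless FIR filtering, where each coefficient multiplication is replaced by shifts and adds. It is not the paper's constant shifts or programmable shifts method, and it does no common subexpression elimination; the function names and the restriction to non-negative integer coefficients are assumptions made for illustration.

```python
def shift_add_multiply(x, coeff, wordlength=16):
    """Multiply integer x by a non-negative integer coefficient using only
    shifts and adds: one shifted copy of x per set bit of the coefficient."""
    acc = 0
    for bit in range(wordlength):
        if (coeff >> bit) & 1:
            acc += x << bit
    return acc

def fir_shift_add(samples, coeffs, wordlength=16):
    """Direct-form FIR filter, y[n] = sum_k coeffs[k] * samples[n - k],
    with every product computed by shift_add_multiply."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += shift_add_multiply(samples[n - k], c, wordlength)
        out.append(acc)
    return out

if __name__ == "__main__":
    print(fir_shift_add([1, 2, 3], [3, 5]))  # [3, 11, 19]
```

Changing `wordlength` changes how many coefficient bits are scanned, which loosely mirrors the idea of supporting different coefficient wordlengths with the same structure.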

158 citations


Journal ArticleDOI
TL;DR: This paper discusses the implementation of the Wave Union TDC, a novel FPGA TDC scheme that improves time measurement precision using multiple measurements, along with several other topics in FPGA delay line based TDCs.
Abstract: This paper discusses implementation of the Wave Union TDC, a novel scheme of FPGA TDC to improve time measurement precision using multiple measurements, along with several other topics in FPGA delay line based TDCs. FPGA specific issues such as considerations on the delay line choice in different FPGA families and encoding logic are first examined. Next, common problems for both FPGA TDCs and ASIC TDCs such as schemes of coarse time counter implementation, bin-by-bin calibration and noise issues due to single ended signals are discussed. Several resource/power saving design approaches for various processing stages are described in the document.

157 citations


Book
29 Nov 2010
TL;DR: The author emphasizes the practical aspects of reconfigurable hardware design, explaining the basic mathematics involved, and giving a comprehensive description of state-of-the-art implementation techniques.
Abstract: Software-based cryptography can be used for security applications where data traffic is not too large and low encryption rate is tolerable. But hardware methods are more suitable where speed and real-time encryption are needed. Until now, there has been no book explaining how cryptographic algorithms can be implemented on reconfigurable hardware devices. This book covers computational methods, computer arithmetic algorithms, and design improvement techniques needed to implement efficient cryptographic algorithms in FPGA reconfigurable hardware platforms. The author emphasizes the practical aspects of reconfigurable hardware design, explaining the basic mathematics involved, and giving a comprehensive description of state-of-the-art implementation techniques.

145 citations


Proceedings ArticleDOI
18 Jan 2010
TL;DR: The novel design makes use of the underlying FPGA architecture and, unlike prior published PUFs, can be naturally embedded into a design's HDL, consumes very little area, and does not require the use of "hard macros" with fixed routing.
Abstract: The concept of having an integrated circuit (IC) generate its own unique digital signature has broad application in areas such as embedded systems security, and IP/IC counter-piracy. Physically unclonable functions (PUFs) are circuits that compute a unique signature for a given IC based on the process variations inherent in the IC manufacturing process. This paper presents the first PUF design specifically targeted for field-programmable gate arrays (FPGAs). Our novel design makes use of the underlying FPGA architecture, and unlike prior published PUFs, the proposed PUF can be naturally embedded into a design's HDL, consuming very little area, and does not require the use of "hard macros" with fixed routing. Measured results on the Xilinx Virtex-5 65 nm FPGA demonstrate PUF signatures to be both unique and reliable under temperature variation.

139 citations


Patent
Lei He
04 May 2010
TL;DR: In this article, the authors describe power reduction mechanisms for various portions of the FPGA, including logic blocks, routing circuits, connection blocks, switch blocks, configuration memory cells, and so forth.
Abstract: Field Programmable Gate Arrays (FPGAs) are described which utilize multiple power supply voltages to reduce both dynamic power and leakage power without sacrificing speed or substantially increasing device area. Power reduction mechanisms are described for numerous portions of the FPGA, including logic blocks, routing circuits, connection blocks, switch blocks, configuration memory cells, and so forth. Embodiments describe circuits and methods for implementing multiple supplies as sources of Vdd, multiple voltage thresholding Vt, signal level translators, and power gating of circuitry to deactivate portions of the circuit which are inactive. The supply voltage levels can be fixed or programmable. Methods are described for performing circuit CAD in the routing and assignment process on FPGAs, in particular for optimizing FPGA use having the power reduction circuits taught. Routing methods describe utilizing slack timing, power sensitivity, trace-based simulations, and other techniques to optimize circuit utilization on a multi-Vdd FPGA.

Proceedings ArticleDOI
21 Feb 2010
TL;DR: A new design is introduced that efficiently combines block RAMs into multi-ported memories with arbitrary numbers of read and write ports and true random access to any memory location, while achieving significantly higher operating frequencies than conventional approaches.
Abstract: Multi-ported memories are challenging to implement with FPGAs since the provided block RAMs typically have only two ports. We present a thorough exploration of the design space of FPGA-based soft multi-ported memories by evaluating conventional solutions to this problem, and introduce a new design that efficiently combines block RAMs into multi-ported memories with arbitrary numbers of read and write ports and true random access to any memory location, while achieving significantly higher operating frequencies than conventional approaches. For example we build a 256-location, 32-bit, 12-ported (4-write, 8-read) memory that operates at 281 MHz on Altera Stratix III FPGAs while consuming an area equivalent to 3679 ALMs: a 43% speed improvement and 84% area reduction over a pure ALM implementation, and a 61% speed improvement over a pure "multipumped" implementation, although the pure multipumped implementation is 7.2x smaller.

Journal ArticleDOI
TL;DR: This paper proposes low-cost structure-independent fault detection schemes for the AES encryption and decryption using new formulations for the fault detection of SubBytes and inverse SubBytes using the relation between the input and the output of the S-box and the inverse S-boxes.
Abstract: The Advanced Encryption Standard (AES) has been lately accepted as the symmetric cryptography standard for confidential data transmission. However, the natural and malicious injected faults reduce its reliability and may cause confidential information leakage. In this paper, we study concurrent fault detection schemes for reaching a reliable AES architecture. Specifically, we propose low-cost structure-independent fault detection schemes for the AES encryption and decryption. We have obtained new formulations for the fault detection of SubBytes and inverse SubBytes using the relation between the input and the output of the S-box and the inverse S-box. The proposed schemes are independent of the way the S-box and the inverse S-box are constructed. Therefore, they can be used for both the S-boxes and the inverse S-boxes using lookup tables and those utilizing logic gates based on composite fields. Our simulation results show the error coverage of greater than 99 percent for the proposed schemes. Moreover, the proposed and the previously reported fault detection schemes have been implemented on the most recent Xilinx Virtex FPGAs. Their area and delay overheads have been compared and it is shown that the proposed schemes outperform the previously reported ones.

Proceedings ArticleDOI
13 Jun 2010
TL;DR: The RAMP Gold prototype is a high-throughput, cycle-accurate full-system simulator that runs on a single Xilinx Virtex-5 FPGA board, and which simulates a 64-core shared-memory target machine capable of booting real operating systems.
Abstract: We present RAMP Gold, an economical FPGA-based architecture simulator that allows rapid early design-space exploration of manycore systems. The RAMP Gold prototype is a high-throughput, cycle-accurate full-system simulator that runs on a single Xilinx Virtex-5 FPGA board, and which simulates a 64-core shared-memory target machine capable of booting real operating systems. To improve FPGA implementation efficiency, functionality and timing are modeled separately and host multithreading is used in both models. We evaluate the prototype's performance using a modern parallel benchmark suite running on our manycore research operating system, achieving two orders of magnitude speedup compared to a widely-used software-based architecture simulator.

Journal ArticleDOI
TL;DR: A novel universal reversible logic gate (URG) and a set of basic sequential elements that could be used for building reversible sequential circuits, with 25% less garbage than the best reported in the literature are proposed.
Abstract: With the advent of nanometer technology, circuits are more prone to transient faults that can occur during their operation. Of the different types of transient faults reported in the literature, the single-event upset (SEU) is prominent. Traditional techniques such as triple-modular redundancy (TMR) consume large area and power. Reversible logic has been gaining interest in the recent past due to its lower heat dissipation. This paper proposes the following: (1) a novel universal reversible logic gate (URG) and a set of basic sequential elements that could be used for building reversible sequential circuits, with 25% less garbage than the best reported in the literature; (2) a reversible gate that can mimic the functionality of a lookup table (LUT) that can be used to construct a reversible field-programmable gate array (FPGA); and (3) automatic conversion of any given reversible circuit into an online testable circuit that can detect online any single-bit errors, including soft errors in the logic blocks, using theoretically proved minimum garbage, which is significantly less than the best reported in the literature.

Journal ArticleDOI
TL;DR: A systematic approach to the comparison of the graphics processor (GPU) and reconfigurable logic is defined in terms of three throughput drivers, applied to five case study algorithms, characterized by their arithmetic complexity, memory access requirements, and data dependence.
Abstract: A systematic approach to the comparison of the graphics processor (GPU) and reconfigurable logic is defined in terms of three throughput drivers. The approach is applied to five case study algorithms, characterized by their arithmetic complexity, memory access requirements, and data dependence, and two target devices: the nVidia GeForce 7900 GTX GPU and a Xilinx Virtex-4 field programmable gate array (FPGA). Two orders of magnitude speedup, over a general-purpose processor, is observed for each device for arithmetic intensive algorithms. An FPGA is superior, over a GPU, for algorithms requiring large numbers of regular memory accesses, while the GPU is superior for algorithms with variable data reuse. In the presence of data dependence, the implementation of a customized data path in an FPGA exceeds GPU performance by up to eight times. The trends of the analysis to newer and future technologies are analyzed.

Journal ArticleDOI
TL;DR: A fast search algorithm capable of operating in multi-dimensional spaces is introduced and its utility in the 2D and 3D maximum-likelihood position-estimation problem that arises in the processing of PMT signals to derive interaction locations in compact gamma cameras is demonstrated.
Abstract: A fast search algorithm capable of operating in multi-dimensional spaces is introduced. As a sample application, we demonstrate its utility in the 2D and 3D maximum-likelihood position-estimation problem that arises in the processing of PMT signals to derive interaction locations in compact gamma cameras. We demonstrate that the algorithm can be parallelized in pipelines, and thereby efficiently implemented in specialized hardware, such as field-programmable gate arrays (FPGAs). A 2D implementation of the algorithm is achieved in Cell/BE processors, resulting in processing speeds above one million events per second, which is a 20× increase in speed over a conventional desktop machine. Graphics processing units (GPUs) are used for a 3D application of the algorithm, resulting in processing speeds of nearly 250,000 events per second which is a 250× increase in speed over a conventional desktop machine. These implementations indicate the viability of the algorithm for use in real-time imaging applications.

Journal ArticleDOI
TL;DR: This paper proposes its own methodology for doing an FPGA-based AES implementation, which combines the use of three hardware languages with partial and dynamic reconfiguration, and a pipelined and parallel implementation.

Journal ArticleDOI
TL;DR: A novel stereo matching algorithm is proposed that is designed for high efficiency when realized in hardware, targeting deployment in Field Programmable Gate Arrays (FPGAs) and Application Specific Integrated Circuits (ASICs).

Journal ArticleDOI
01 Sep 2010
TL;DR: This paper presents a hardware-based complex event detection system implemented on field-programmable gate arrays (FPGAs) that can detect complex events at gigabit wire speed with constant and fully predictable latency, independently of network load, packet size, or data distribution.
Abstract: Complex event detection is an advanced form of data stream processing where the stream(s) are scrutinized to identify given event patterns. The challenge for many complex event processing (CEP) systems is to be able to evaluate event patterns on high-volume data streams while adhering to real-time constraints. To solve this problem, in this paper we present a hardware-based complex event detection system implemented on field-programmable gate arrays (FPGAs). By inserting the FPGA directly into the data path between the network interface and the CPU, our solution can detect complex events at gigabit wire speed with constant and fully predictable latency, independently of network load, packet size, or data distribution. This is a significant improvement over CPU-based systems and an architectural approach that opens up interesting opportunities for hybrid stream engines that combine the flexibility of the CPU with the parallelism and processing power of FPGAs.

Proceedings ArticleDOI
03 Aug 2010
TL;DR: This work presents a VLSI implementation of a computationally efficient algorithm named Orthogonal Matching Pursuit, and further optimize the algorithm to meet typical hardware constraints and describe the different block units of the design.
Abstract: Compressive Sampling reconstruction techniques require computationally intensive algorithms, often using L1 optimization to reconstruct a signal that was originally sampled at a sub-Nyquist rate. In this work we present a VLSI implementation of a computationally efficient algorithm named Orthogonal Matching Pursuit. We further optimize the algorithm to meet typical hardware constraints and describe the different block units of our design. We synthesize our design for the Xilinx Virtex 5 FPGA and give timing and area results. We summarize our work with a short discussion of the possible uses for our system.

Book ChapterDOI
08 Jun 2010
TL;DR: This paper presents a lightweight implementation of the permutations Keccak-f[200] and Keccak-f[400] of the SHA-3 candidate hash function Keccak, which is also the first lightweight implementation of a sponge function, differentiating it from previous works.
Abstract: In this paper, we present a lightweight implementation of the permutation Keccak-f[200] and Keccak-f[400] of the SHA-3 candidate hash function Keccak. Our design is well suited for radio-frequency identification (RFID) applications that have limited resources and demand lightweight cryptographic hardware. Besides its low-area and low-power, our design gives a decent throughput. To the best of our knowledge, it is also the first lightweight implementation of a sponge function, which differentiates it from the previous works. By implementing the new hash algorithm Keccak, we have utilized unique advantages of the sponge construction. Although the implementation is targeted for Application Specific Integrated Circuit (ASIC) platforms, it is also suitable for Field Programmable Gate Arrays (FPGA). To obtain a compact design, serialized data processing principles are exploited together with algorithm-specific optimizations. The design requires only 2.52K gates with a throughput of 8 Kbps at 100 KHz system clock based on 0.13-µm CMOS standard cell library.

Journal ArticleDOI
TL;DR: This article presents a novel procedure for implementing a software defined radio modem using a graphics processing unit instead of conventional digital signal processors and/or field programmable gate arrays; the GPU-driven modem is observed to be nearly 90 times faster than the conventional 8-way Very Long Instruction Word DSP-driven modem.
Abstract: This article presents a novel procedure of implementing software defined radio modem using a graphics processing unit instead of conventional digital signal processors and/or field programmable gate arrays. Considering that modern GPU is suitable for parallel computing due to its numerous powerful arithmetic logic units, we suggest a proper architecture of hardware and software platform for the SDR modem to be implemented on GPU. Then, we show a design example of mobile WiMAX terminal implemented on the proposed GPU platform. In our experimental tests, we observed that the GPU-driven modem is nearly 90 times faster than the conventional 8-way Very Long Instruction Word architectured DSP-driven modem for the application of Viterbi decoder implementation of mobile WiMAX terminal.

Journal ArticleDOI
TL;DR: The Durham adaptive optics real-time controller was initially a proof of concept design for a generic AO control system and has since been developed into a modern and powerful central-processing-unit-based real-time control system, capable of using hardware acceleration.
Abstract: The Durham adaptive optics (AO) real-time controller was initially a proof of concept design for a generic AO control system. It has since been developed into a modern and powerful central-processing-unit-based real-time control system, capable of using hardware acceleration (including field programmable gate arrays and graphical processing units), based primarily around commercial off-the-shelf hardware. It is powerful enough to be used as the real-time controller for all currently planned 8 m class telescope AO systems. Here we give details of this controller and the concepts behind it, and report on performance, including latency and jitter, which is less than 10 μs for small AO systems.

Journal ArticleDOI
TL;DR: The design of an IP core that implements a general purpose GA engine which has been successfully synthesized and verified on a Xilinx Virtex II Pro FPGA device (XC2VP30) is reported.
Abstract: Hardware implementation of genetic algorithms (GAs) is gaining importance because of their proven effectiveness as optimization engines for real-time applications (e.g., evolvable hardware). Earlier hardware implementations suffer from major drawbacks such as absence of GA parameter programmability, rigid predefined system architecture, and lack of support for multiple fitness functions. In this paper, we report the design of an IP core that implements a general-purpose GA engine that addresses these problems. Specifically, the proposed GA IP core can be customized in terms of the population size, number of generations, crossover and mutation rates, random number generator seed, and the fitness function. It has been successfully synthesized and verified on a Xilinx Virtex II Pro field-programmable gate array device (xc2vp30-7ff896) with only 13% logic slice utilization, 1% block memory utilization for GA memory, and a clock speed of 50 MHz. The GA core has been used as a search engine for real-time adaptive healing but can be tailored to any given application by interfacing with the appropriate application-specific fitness evaluation module as well as the required storage memory and by programming the values of the desired GA parameters. The core is soft in nature, i.e., a gate-level netlist is provided which can be readily integrated with the user's system. The performance of the GA core was tested using standard optimization test functions. In the hardware experiments, the proposed core either found the globally optimum solution or found a solution that was within 3.7% of the value of the globally optimal solution. The experimental test setup including the GA core achieved a speedup of around 5.16× over an analogous software implementation.
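
The programmable parameters listed in the abstract (population size, number of generations, crossover and mutation rates, random number generator seed, fitness function) map naturally onto the arguments of a software GA. The sketch below is a plain Python analogue, not the IP core's design: the 16-bit chromosome, tournament selection, single-point crossover and the name `run_ga` are all assumptions chosen for brevity.

```python
import random

def run_ga(fitness, pop_size=32, generations=64, crossover_rate=0.75,
           mutation_rate=0.02, seed=1, bits=16):
    """Maximise fitness(candidate) over `bits`-bit integers.
    Every argument is a tunable parameter, mirroring the programmability
    described in the abstract; the operators themselves are assumptions."""
    rng = random.Random(seed)
    mask = (1 << bits) - 1
    pop = [rng.getrandbits(bits) for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def pick():
        # Binary tournament: keep the fitter of two random individuals.
        a, b = rng.choice(pop), rng.choice(pop)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = pick(), pick()
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, bits)          # single-point crossover
                low = (1 << cut) - 1
                child = (p1 & low) | (p2 & ~low & mask)
            else:
                child = p1
            for i in range(bits):                     # per-bit mutation
                if rng.random() < mutation_rate:
                    child ^= 1 << i
            nxt.append(child)
        pop = nxt
        best = max(pop + [best], key=fitness)         # remember the best seen
    return best

if __name__ == "__main__":
    ones = lambda v: bin(v).count("1")  # toy fitness: number of set bits
    print(run_ga(ones, seed=3))
```

Swapping in a different `fitness` callable plays the role of attaching an application-specific fitness evaluation module.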

Journal ArticleDOI
01 Jan 2010
TL;DR: The discrete wavelet transformation, the morphology edge enhancement sharpness measurement algorithms, and the self-organizing map (SOM) neural network were used in developing the control mechanism of the passive auto-focus camera control system.
Abstract: This paper presents a passive auto-focus camera control system which can easily achieve the function of auto-focus with no necessary of any active component (e.g., infrared or ultrasonic sensor) in comparison with the conventional active focus system. To implement the technique we developed, the hardware system including the adjustable lens with CMOS sensor and servo motor, an 8051 image capture micro-controller, a field programmable gate array (FPGA) sharpness measurement circuit, a pulse width modulation (PWM) controller, and a personal digital assistant (PDA) image displayer was constructed. The discrete wavelet transformation (DWT), the morphology edge enhancement sharpness measurement algorithms, and the self-organizing map (SOM) neural network were used in developing the control mechanism of the system. Compared with other passive auto-focus methods, the method we proposed has the advantages of lower computational complexity and easier hardware implementation.

Journal ArticleDOI
24 May 2010
TL;DR: In this article, a 48-channel Time-to-Digital Converter (TDC) implemented in a general purpose Field Programmable Gate Array (FPGA) is presented, where dedicated carry chains of the FPGA are utilized for time interpolation purposes inside a clock cycle.
Abstract: A high-resolution 48-Channel Time-to-Digital Converter (TDC) implemented in a general purpose Field Programmable Gate Array (FPGA) is presented. Dedicated carry chains of the FPGA are utilized for time interpolation purposes inside a clock cycle. A counter running at the system clock frequency provides a global time stamp. These two values, along with the channel number, are stored for readout. An extra effort was made to improve the resolution beyond the intrinsic cell delay of the carry chain as well as to achieve the same resolution on all 48 channels. Due to large bin width variations a bin-by-bin calibration scheme was used. Time interval (TI) measurements between two channels were made to determine the RMS and the time resolution of a single channel. At least 6 ps single channel resolution was achieved for all channels. Additional measurements were performed to characterize the influence of the temperature and voltage variations on the RMS value and the mean as well as the sensitivity of the TDC to crosstalk. The results of these measurements are also presented in this paper.
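
A common way to do bin-by-bin calibration of a delay-line TDC is a code density test: hits that are uniformly distributed over the clock period fill each bin in proportion to its width. The sketch below assumes that approach and a simple sign convention (fine time added to the coarse time); neither detail is stated in the abstract, and the function names are illustrative.

```python
def calibrate_bins(hit_counts, clock_period_ps):
    """Estimate each bin's width and centre from a code density histogram.
    hit_counts[i] is how many uniformly distributed test hits landed in bin i."""
    total = sum(hit_counts)
    if total == 0:
        raise ValueError("calibration histogram is empty")
    widths = [clock_period_ps * n / total for n in hit_counts]
    centers = []
    edge = 0.0
    for w in widths:
        centers.append(edge + w / 2)  # centre of the bin on the time axis
        edge += w
    return widths, centers

def timestamp_ps(coarse_count, fine_bin, centers, clock_period_ps):
    """Combine the coarse counter with the calibrated fine time.
    Assumes the fine time is measured forward from the clock edge."""
    return coarse_count * clock_period_ps + centers[fine_bin]

if __name__ == "__main__":
    widths, centers = calibrate_bins([1, 3, 4, 2], clock_period_ps=1000)
    print(widths)   # estimated bin widths in ps
    print(timestamp_ps(2, 1, centers, 1000))
```

Using the bin centre rather than a fixed step per bin is what compensates for large bin width variations.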

Journal ArticleDOI
TL;DR: A novel sort-free approach to path extension, as well as quantized metrics result in a high-throughput VLSI architecture with lower power and area consumption compared to state-of-the-art published systems.
Abstract: This paper describes the design and very-large-scale integration (VLSI) architecture for a 4 × 4 breadth-first K-best multiple-input-multiple-output (MIMO) decoder using a 64 quadrature-amplitude modulation (QAM) scheme. A novel sort-free approach to path extension, as well as quantized metrics result in a high-throughput VLSI architecture with lower power and area consumption compared to state-of-the-art published systems. Functionality is confirmed via a field-programmable gate array (FPGA) implementation on a Xilinx Virtex II Pro FPGA. Comparison of simulation and measurements are given, and FPGA utilization figures are provided. Finally, VLSI architectural tradeoffs are explored for a synthesized application-specific IC (ASIC) implementation in a 65-nm CMOS technology.

Proceedings ArticleDOI
06 Jun 2010
TL;DR: This demonstration shows Glacier, a library and a compiler that can be employed to implement streaming queries as hardware circuits on FPGAs, with the goal of showing the flexibility of the compositional approach.
Abstract: Field-programmable gate arrays (FPGAs) are a promising technology that can be used in database systems. In this demonstration we show Glacier, a library and a compiler that can be employed to implement streaming queries as hardware circuits on FPGAs. Glacier consists of a library of compositional hardware modules that represent stream processing operators. Given a query execution plan, the compiler instantiates the corresponding components and wires them up to a digital circuit. The goal of this demo is to show the flexibility of the compositional approach.
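
The compositional idea, where a plan of operators is instantiated and wired together, can be mimicked in software with generator pipelines. This is only an analogy: Glacier emits hardware circuits, whereas the sketch below chains Python generators, and `select`, `project` and `compile_plan` are hypothetical names rather than Glacier's API.

```python
def select(pred):
    """Operator factory: pass through only tuples that satisfy pred."""
    def op(stream):
        return (t for t in stream if pred(t))
    return op

def project(fields):
    """Operator factory: keep only the named fields of each tuple."""
    def op(stream):
        return ({f: t[f] for f in fields} for t in stream)
    return op

def compile_plan(plan):
    """Wire operators in order so each one consumes the previous one's output."""
    def circuit(stream):
        for op in plan:
            stream = op(stream)
        return stream
    return circuit

if __name__ == "__main__":
    trades = [
        {"sym": "A", "price": 10, "vol": 5},
        {"sym": "B", "price": 25, "vol": 1},
        {"sym": "C", "price": 40, "vol": 7},
    ]
    plan = [select(lambda t: t["price"] > 10), project(["sym", "price"])]
    print(list(compile_plan(plan)(trades)))
```

Because each operator only sees a stream in and a stream out, new operators can be added without changing `compile_plan`.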

Journal ArticleDOI
TL;DR: The proposed design is one of the first lifting based complete 3-D-DWT architectures without group of pictures restriction, and the new computing technique based on analysis of lifting signal flow graph minimizes the storage requirement.
Abstract: This paper presents an architecture of the lifting-based running 3-D discrete wavelet transform (DWT), which is a powerful image and video compression algorithm. The proposed design is one of the first lifting based complete 3-D-DWT architectures without group of pictures restriction. The new computing technique based on analysis of lifting signal flow graph minimizes the storage requirement. This architecture enjoys reduced memory referencing and related low power consumption, low latency, and high throughput compared to those of earlier reported works. The proposed architecture has been successfully implemented on Xilinx Virtex-IV series field-programmable gate array, offering a speed of 321 MHz, making it suitable for real-time compression even with large frame dimensions. Moreover, the architecture is fully scalable beyond the present coherent Daubechies filterbank (9, 7).
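
For readers unfamiliar with lifting, the sketch below shows one level of a 1-D (9, 7) lifting transform and its inverse. It is a minimal illustration, not the paper's running 3-D architecture: it uses the commonly cited 9/7 lifting coefficients, omits the final scaling step, assumes an even-length input, and handles borders by symmetric extension, all of which are assumptions made here.

```python
# Commonly cited lifting coefficients for the (9, 7) filterbank.
ALPHA, BETA, GAMMA, DELTA = -1.586134342, -0.052980118, 0.882911075, 0.443506852

def _predict(s, d, c, sign):
    """d[n] += sign * c * (s[n] + s[n+1]), mirroring s at the right border."""
    m = len(d)
    for n in range(m):
        right = s[n + 1] if n + 1 < m else s[n]
        d[n] += sign * c * (s[n] + right)

def _update(s, d, c, sign):
    """s[n] += sign * c * (d[n-1] + d[n]), mirroring d at the left border."""
    for n in range(len(s)):
        left = d[n - 1] if n > 0 else d[0]
        s[n] += sign * c * (left + d[n])

def forward_97(x):
    """Split x into even/odd samples and apply the four lifting steps.
    Returns (low-pass, high-pass) lists; the scaling step is omitted."""
    if len(x) < 2 or len(x) % 2:
        raise ValueError("input length must be even and at least 2")
    s = [float(v) for v in x[0::2]]
    d = [float(v) for v in x[1::2]]
    _predict(s, d, ALPHA, +1)
    _update(s, d, BETA, +1)
    _predict(s, d, GAMMA, +1)
    _update(s, d, DELTA, +1)
    return s, d

def inverse_97(s, d):
    """Undo the lifting steps in reverse order with opposite signs."""
    s, d = list(s), list(d)
    _update(s, d, DELTA, -1)
    _predict(s, d, GAMMA, -1)
    _update(s, d, BETA, -1)
    _predict(s, d, ALPHA, -1)
    x = [0.0] * (2 * len(s))
    x[0::2], x[1::2] = s, d
    return x

if __name__ == "__main__":
    low, high = forward_97([3, 1, 4, 1, 5, 9, 2, 6])
    print(inverse_97(low, high))  # recovers the input up to rounding
```

Each lifting step updates one half of the samples in place from the other half, which is why the inverse is simply the same steps run backwards.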