Papers on "Field-programmable gate array" published in 2010

Open Access

Book

Fractional Order Systems: Modeling and Control Applications

Riccardo Caponetto, Giovanni Dongola, Luigi Fortuna, Ivo Petras

10 Feb 2010

TL;DR: The book covers fractional order systems, fractional order PID controllers, chaotic fractional order systems, field programmable gate array, microcontroller and field programmable analog array implementations, switched capacitor and integrated circuit design, and modeling of ionic polymeric metal composites.

Abstract: Fractional Order Systems; Fractional Order PID Controller; Chaotic Fractional Order Systems; Field Programmable Gate Array, Microcontroller and Field Programmable Analog Array Implementation; Switched Capacitor and Integrated Circuit Design; Modeling of Ionic Polymeric Metal Composite

713 citations

Proceedings Article

Hardware accelerated convolutional neural networks for synthetic vision systems

Clement Farabet¹, Berin Martini², Polina Akselrod², S. Talay², Yann LeCun¹, Eugenio Culurciello²

Courant Institute of Mathematical Sciences¹, Yale University²

03 Aug 2010

TL;DR: This system is fully digital and is a modular vision engine with the goal of performing real-time detection, recognition and segmentation of mega-pixel images.

Abstract: In this paper, we present a scalable hardware architecture to implement large-scale convolutional neural networks and state-of-the-art multi-layered artificial vision systems. This system is fully digital and is a modular vision engine with the goal of performing real-time detection, recognition and segmentation of mega-pixel images. We present a performance comparison between software, FPGA, and ASIC implementations that shows a speedup for the custom hardware implementations.

298 citations

Journal Article

State-of-the-art in heterogeneous computing

André R. Brodtkorb¹, Christopher Dyken¹, Trond Runar Hagen¹, Jon M. Hjelmervik¹, Olaf O. Storaasli²

SINTEF¹, Oak Ridge National Laboratory²

01 Jan 2010, Scientific Programming

TL;DR: In this paper, the authors present an overview of the state-of-the-art in heterogeneous computing, focusing on three commonly found architectures: the Cell Broadband Engine Architecture, graphics processing units (GPUs), and field programmable gate arrays (FPGAs).

Abstract: Node level heterogeneous architectures have become attractive during the last decade for several reasons: compared to traditional symmetric CPUs, they offer high peak performance and are energy and/or cost efficient. With the increase of fine-grained parallelism in high-performance computing, as well as the introduction of parallelism in workstations, there is an acute need for a good overview and understanding of these architectures. We give an overview of the state-of-the-art in heterogeneous computing, focusing on three commonly found architectures: the Cell Broadband Engine Architecture, graphics processing units (GPUs), and field programmable gate arrays (FPGAs). We present a review of hardware, available software tools, and an overview of state-of-the-art techniques and algorithms. Furthermore, we present a qualitative and quantitative comparison of the architectures, and give our view on the future of heterogeneous computing.

227 citations

Xilinx Stacked Silicon Interconnect Technology Delivers Breakthrough FPGA Capacity, Bandwidth, and Power Efficiency

Patrick Dorsey

01 Jan 2010

TL;DR: The technical and economic challenges that led Xilinx to develop stacked silicon interconnect technology and innovations that make it possible are explored.

Abstract: The programmable imperative—the critical need to achieve more with less, to reduce risks wherever possible, and to quickly create differentiated products using programmable hardware design platforms—is driving the search for FPGA-based solutions that provide the capacity, lower power, and higher bandwidth with which users can create the system-level functionality currently delivered by ASICs and ASSPs. Xilinx has developed an innovative approach for designing and manufacturing FPGAs that address two key requirements of the programmable imperative. Stacked silicon interconnect technology is the foundation of a new generation of FPGAs that breaks through the limitations of Moore's law and delivers the capabilities to satisfy the most demanding design requirements. It also enables Xilinx to reduce the time required to deliver the largest FPGAs in the quantities needed to satisfy end-customer volume production requirements. This white paper explores the technical and economic challenges that led Xilinx to develop stacked silicon interconnect technology and innovations that make it possible.

193 citations

Book

Introduction to Reconfigurable Computing: Architectures, Algorithms, and Applications

Christophe Bobda

19 Nov 2010

TL;DR: Apart from the introduction and the conclusion, the book's main chapters provide a very strong theoretical and practical background in reconfigurable computing, from the early Estrin machine to very modern architectures such as coarse-grained reconfigurable devices and embedded logic devices.

Abstract: Introduction to Reconfigurable Computing provides a comprehensive study of the field of reconfigurable computing. It provides an entry point for novices who want to move into the research fields of reconfigurable computing, FPGA, and system-on-programmable-chip design. The book can also be used as a teaching reference for a graduate course in computer engineering, or as a reference for advanced electrical and computer engineers. It provides a very strong theoretical and practical background in reconfigurable computing, from the early Estrin machine to very modern architectures such as coarse-grained reconfigurable devices and embedded logic devices. Apart from the introduction and the conclusion, the main chapters of the book are Architecture of reconfigurable systems, Design and implementation, High-Level Synthesis for Reconfigurable Devices, Temporal placement, On-line and Dynamic Interconnection, Designing a reconfigurable application on Xilinx Virtex FPGA, System on programmable chip, and Applications.

190 citations

Proceedings Article

FPGA PUF using programmable delay lines

Mehrdad Majzoobi¹, Farinaz Koushanfar¹, Srinivas Devadas²

Rice University¹, Massachusetts Institute of Technology²

01 Dec 2010

TL;DR: A high resolution programmable delay logic (PDL) implemented by lookup table (LUT) internal structure is introduced, and fine tuning is performed to cancel out delay skews caused by asymmetries in routing and systematic variations.

Abstract: This paper proposes a novel approach for efficient implementation of a real-valued arbiter-based physical unclonable function (PUF) on FPGA. We introduce a high resolution programmable delay logic (PDL) implemented by lookup table (LUT) internal structure. Using the PDL, we perform fine tuning to cancel out delay skews caused by asymmetries in routing and systematic variations. We devise a symmetric switch structure that can be easily implemented on FPGA. To mitigate the arbiter metastability problem, we present and analyze methods for majority voting of responses. Lastly, a method to classify and group challenges into different robustness sets is introduced, to further increase the corresponding responses' stability in the face of environmental variations. Experimental evaluations show that the responses to robust challenges have an average error rate of less than 2% under temperature variations from −10°C to 75°C.
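
The arbiter, majority-voting, and robust-challenge ideas can be illustrated in software. The following is a minimal Python sketch that assumes the widely used additive (linear) delay model of an arbiter PUF rather than the paper's PDL-tuned FPGA circuit; the stage weights, noise level, vote count, and threshold are hypothetical, and a real PUF's delays come from silicon, not from a weight vector.

```python
import random

def challenge_features(challenge):
    """Parity feature vector of the additive delay model (an assumed model).

    phi[i] = prod_{j >= i} (1 - 2 * c[j]) for each stage i, plus a constant bias term.
    """
    n = len(challenge)
    phi = [1] * (n + 1)  # the last entry is the bias term
    prod = 1
    for i in range(n - 1, -1, -1):
        prod *= 1 - 2 * challenge[i]
        phi[i] = prod
    return phi

def delay_difference(weights, challenge):
    """Modeled delay difference between the two racing paths for one challenge."""
    return sum(w * f for w, f in zip(weights, challenge_features(challenge)))

def noisy_response(weights, challenge, noise_sigma, rng):
    """One arbiter evaluation: sign of the delay difference plus Gaussian noise."""
    delta = delay_difference(weights, challenge) + rng.gauss(0.0, noise_sigma)
    return 1 if delta > 0 else 0

def majority_response(weights, challenge, noise_sigma, votes, rng):
    """Evaluate the same challenge an odd number of times and keep the majority bit."""
    if votes % 2 == 0:
        raise ValueError("use an odd number of votes to avoid ties")
    ones = sum(noisy_response(weights, challenge, noise_sigma, rng) for _ in range(votes))
    return 1 if 2 * ones > votes else 0

def is_robust(weights, challenge, threshold):
    """Treat a challenge as robust if its modeled delay margin exceeds a threshold."""
    return abs(delay_difference(weights, challenge)) > threshold
```

With `noise_sigma > 0`, a challenge whose modeled margin is small can flip between evaluations, and voting over several evaluations tends to settle it on one bit; `is_robust` mimics grouping challenges by margin so that only the more stable ones are used.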

179 citations

Journal Article

New Reconfigurable Architectures for Implementing FIR Filters With Low Complexity

R. Mahesh¹, A. P. Vinod¹

Nanyang Technological University¹

01 Feb 2010, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

TL;DR: Two new reconfigurable architectures of low complexity FIR filters are proposed, namely constant shifts method and programmable shifts method, which are capable of operating for different wordlength filter coefficients without any overhead in the hardware circuitry.

Abstract: Reconfigurability and low complexity are the two key requirements of finite impulse response (FIR) filters employed in multistandard wireless communication systems. In this paper, two new reconfigurable architectures of low complexity FIR filters are proposed, namely constant shifts method and programmable shifts method. The proposed FIR filter architecture is capable of operating for different wordlength filter coefficients without any overhead in the hardware circuitry. We show that dynamically reconfigurable filters can be efficiently implemented by using common subexpression elimination algorithms. The proposed architectures have been implemented and tested on Virtex 2v3000ff1152-4 field-programmable gate array and synthesized on 0.18 µm complementary metal-oxide-semiconductor technology with a precision of 16 bits. Design examples show that the proposed architectures offer good area and power reductions and speed improvement compared to the best existing reconfigurable FIR filter implementations in the literature.
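
The multiplierless idea behind shift-based FIR architectures can be illustrated with canonical signed digit (CSD) recoding, which rewrites each constant coefficient as a few signed powers of two so that every product becomes shifts, additions, and subtractions. This Python sketch is a simpler, single-coefficient technique; it is not the paper's constant-shifts or programmable-shifts architecture, and it does not perform the common subexpression elimination the paper uses to share terms across coefficients.

```python
def csd_digits(n):
    """Return the CSD (non-adjacent form) digits of integer n, least significant first.

    Each digit is -1, 0, or +1, and no two adjacent digits are both nonzero.
    """
    digits = []
    while n != 0:
        if n % 2:  # odd: pick +1 or -1 so the remainder becomes divisible by 4
            d = 1 if n % 4 == 1 else -1
            n -= d
        else:
            d = 0
        digits.append(d)
        n //= 2
    return digits

def shift_add_multiply(x, coefficient):
    """Multiply x by a constant using only shifts, additions, and subtractions."""
    acc = 0
    for shift, d in enumerate(csd_digits(coefficient)):
        if d == 1:
            acc += x << shift
        elif d == -1:
            acc -= x << shift
    return acc

def fir_filter(samples, coefficients):
    """Direct-form FIR filter y[n] = sum_k h[k] * x[n - k] built from shift-add products."""
    out = []
    for n in range(len(samples)):
        y = 0
        for k, h in enumerate(coefficients):
            if n - k >= 0:
                y += shift_add_multiply(samples[n - k], h)
        out.append(y)
    return out
```

For example, `csd_digits(7)` gives `[-1, 0, 0, 1]`, so 7x is computed as `(x << 3) - x`: one subtraction instead of a general multiplier.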

158 citations

Journal Article

Several Key Issues on Implementing Delay Line Based TDCs Using FPGAs

Jinyuan Wu

14 Jun 2010, IEEE Transactions on Nuclear Science

TL;DR: This paper discusses the Wave Union TDC, a novel FPGA TDC scheme that improves time measurement precision using multiple measurements, along with several other topics in FPGA delay-line-based TDCs.

Abstract: This paper discusses implementation of the Wave Union TDC, a novel scheme of FPGA TDC to improve time measurement precision using multiple measurements, along with several other topics in FPGA delay line based TDCs. FPGA specific issues such as considerations on the delay line choice in different FPGA families and encoding logic are first examined. Next, common problems for both FPGA TDCs and ASIC TDCs such as schemes of coarse time counter implementation, bin-by-bin calibration and noise issues due to single ended signals are discussed. Several resource/power saving design approaches for various processing stages are described in the document.
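
Bin-by-bin calibration of a delay-line TDC is commonly done with a code-density test. Hits that are uncorrelated with the clock land in each bin with probability proportional to that bin's width, so a histogram of many such hits measures the widths. The following is a minimal Python sketch. It assumes the raw per-bin hit counts have already been collected and that the hits span exactly one clock period of the line. It does not model the Wave Union launcher itself.

```python
def calibrate_bins(hit_counts, clock_period):
    """Code-density calibration of a delay-line TDC.

    hit_counts[i] is the number of random (clock-uncorrelated) hits recorded in bin i.
    Returns (bin_widths, bin_centers) in the same time unit as clock_period.
    """
    total = sum(hit_counts)
    if total == 0:
        raise ValueError("no calibration hits recorded")
    # Each bin's width is its share of the hits times the period the line covers.
    widths = [clock_period * count / total for count in hit_counts]
    centers = []
    left_edge = 0.0
    for width in widths:
        centers.append(left_edge + width / 2)  # report the middle of the bin
        left_edge += width
    return widths, centers

def fine_time(bin_index, centers):
    """Map a raw bin index to a calibrated fine time through the lookup table."""
    return centers[bin_index]
```

With counts `[10, 30, 20, 40]` and a 4.0 ns period, the bins come out 0.4, 1.2, 0.8, and 1.6 ns wide. Their calibrated centers are 0.2, 1.0, 2.0, and 3.2 ns. A hit in bin 2 is therefore reported at 2.0 ns rather than at the nominal 2 × 1.0 + 0.5 = 2.5 ns that equal-width bins would give.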

157 citations

Book

Cryptographic Algorithms on Reconfigurable Hardware

Francisco Rodríguez-Henríquez, Nazar Abbas Saqib, Arturo Díaz Pérez, Çetin Kaya Koç

29 Nov 2010

TL;DR: The authors emphasize the practical aspects of reconfigurable hardware design, explaining the basic mathematics involved and giving a comprehensive description of state-of-the-art implementation techniques.

Abstract: Software-based cryptography can be used for security applications where data traffic is not too large and a low encryption rate is tolerable. Hardware methods, however, are more suitable where speed and real-time encryption are needed. Until now, there has been no book explaining how cryptographic algorithms can be implemented on reconfigurable hardware devices. This book covers computational methods, computer arithmetic algorithms, and design improvement techniques needed to implement efficient cryptographic algorithms in FPGA reconfigurable hardware platforms. The authors emphasize the practical aspects of reconfigurable hardware design, explaining the basic mathematics involved, and giving a comprehensive description of state-of-the-art implementation techniques.

145 citations

Proceedings Article

A PUF design for secure FPGA-based embedded systems

Jason H. Anderson¹

University of Toronto¹

18 Jan 2010

TL;DR: The novel design makes use of the underlying FPGA architecture and, unlike previously published PUFs, can be naturally embedded into a design's HDL, consumes very little area, and does not require the use of "hard macros" with fixed routing.

Abstract: The concept of having an integrated circuit (IC) generate its own unique digital signature has broad application in areas such as embedded systems security, and IP/IC counter-piracy. Physically unclonable functions (PUFs) are circuits that compute a unique signature for a given IC based on the process variations inherent in the IC manufacturing process. This paper presents the first PUF design specifically targeted for field-programmable gate arrays (FPGAs). Our novel design makes use of the underlying FPGA architecture, and unlike prior published PUFs, the proposed PUF can be naturally embedded into a design's HDL, consuming very little area, and does not require the use of "hard macros" with fixed routing. Measured results on the Xilinx Virtex-5 65 nm FPGA demonstrate PUF signatures to be both unique and reliable under temperature variation.

139 citations

Patent

Low-power FPGA circuits and methods

Lei He¹

University of California¹

04 May 2010

TL;DR: In this article, the authors describe power reduction mechanisms for various portions of the FPGA, including logic blocks, routing circuits, connection blocks, switch blocks, configuration memory cells, and so forth.

Abstract: Field Programmable Gate Arrays (FPGAs) are described which utilize multiple power supply voltages to reduce both dynamic power and leakage power without sacrificing speed or substantially increasing device area. Power reduction mechanisms are described for numerous portions of the FPGA, including logic blocks, routing circuits, connection blocks, switch blocks, configuration memory cells, and so forth. Embodiments describe circuits and methods for implementing multiple supplies as sources of Vdd, multiple threshold voltages Vt, signal level translators, and power gating of circuitry to deactivate portions of the circuit that are inactive. The supply voltage levels can be fixed or programmable. Methods are described for performing circuit CAD in the routing and assignment process on FPGAs, in particular for optimizing the use of FPGAs that contain the taught power reduction circuits. Routing methods describe utilizing slack timing, power sensitivity, trace-based simulations, and other techniques to optimize circuit utilization on a multi-Vdd FPGA.
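
The benefit of running non-critical circuitry at a lower supply follows from the standard first-order dynamic power model. In that model, α is the switching activity, C the switched capacitance, and f the clock frequency. The two voltages below are illustrative and are not taken from the patent. The quadratic relation applies to dynamic power only, not to leakage.

$$
P_{\mathrm{dyn}} = \alpha\, C\, V_{dd}^{2}\, f,
\qquad
\frac{P_{\mathrm{dyn}}(V_{dd} = 0.9\,\mathrm{V})}{P_{\mathrm{dyn}}(V_{dd} = 1.2\,\mathrm{V})} = \left(\frac{0.9}{1.2}\right)^{2} = 0.5625
$$

At fixed activity, capacitance, and frequency, moving a block from 1.2 V to 0.9 V therefore cuts its dynamic power by about 44%. Because lower-voltage gates are also slower, such blocks have to be placed on paths with enough timing slack, which is why slack is one of the routing criteria listed above.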

Proceedings Article

Efficient multi-ported memories for FPGAs

Charles Eric LaForest¹, J. Gregory Steffan¹

University of Toronto¹

21 Feb 2010

TL;DR: A new design is introduced that efficiently combines block RAMs into multi-ported memories with arbitrary numbers of read and write ports and true random access to any memory location, while achieving significantly higher operating frequencies than conventional approaches.

Abstract: Multi-ported memories are challenging to implement with FPGAs since the provided block RAMs typically have only two ports. We present a thorough exploration of the design space of FPGA-based soft multi-ported memories by evaluating conventional solutions to this problem, and introduce a new design that efficiently combines block RAMs into multi-ported memories with arbitrary numbers of read and write ports and true random access to any memory location, while achieving significantly higher operating frequencies than conventional approaches. For example, we build a 256-location, 32-bit, 12-ported (4-write, 8-read) memory that operates at 281 MHz on Altera Stratix III FPGAs while consuming an area equivalent to 3679 ALMs: a 43% speed improvement and 84% area reduction over a pure ALM implementation, and a 61% speed improvement over a pure "multipumped" implementation, although the pure multipumped implementation is 7.2x smaller.
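
The following is a behavioral Python sketch of one way to build such a memory from simpler banks. Each write port gets its own bank, and a small table records which write port last wrote each address, so a read fetches the value from that bank. This live-value-table style arrangement is assumed here for illustration. The paper's exact circuit and port timing are not modeled, and neither is the per-read-port replication of each bank.

```python
class MultiPortedMemory:
    """Behavioral model: W write ports and any number of reads, built from W banks."""

    def __init__(self, depth, write_ports):
        self.depth = depth
        # One bank per write port; in hardware each bank would be replicated per read port.
        self.banks = [[0] * depth for _ in range(write_ports)]
        # For each address, the index of the write port (bank) holding the latest value.
        self.live = [0] * depth

    def cycle(self, writes=(), reads=()):
        """Perform one clock cycle.

        writes: iterable of (port, address, value), at most one write per port.
        reads:  iterable of addresses; returns the values stored *before* this
                cycle's writes (an assumed read-before-write convention).
        """
        read_data = [self.banks[self.live[a]][a] for a in reads]
        used_ports = set()
        for port, address, value in writes:
            if port in used_ports:
                raise ValueError(f"write port {port} used twice in one cycle")
            used_ports.add(port)
            self.banks[port][address] = value
            self.live[address] = port  # this bank now holds the live value
        return read_data
```

For example, if port 0 writes 11 to address 5 and port 1 later writes 33 to the same address, the table points address 5 at bank 1, so a subsequent read returns 33 without any bank ever needing more than one write port.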

Journal Article

Concurrent Structure-Independent Fault Detection Schemes for the Advanced Encryption Standard

Mehran Mozaffari-Kermani¹, Arash Reyhani-Masoleh¹

University of Western Ontario¹

01 May 2010, IEEE Transactions on Computers

TL;DR: This paper proposes low-cost structure-independent fault detection schemes for the AES encryption and decryption using new formulations for the fault detection of SubBytes and inverse SubBytes using the relation between the input and the output of the S-box and the inverse S-boxes.

Abstract: The Advanced Encryption Standard (AES) has recently been accepted as the symmetric cryptography standard for confidential data transmission. However, naturally occurring and maliciously injected faults reduce its reliability and may cause leakage of confidential information. In this paper, we study concurrent fault detection schemes for achieving a reliable AES architecture. Specifically, we propose low-cost structure-independent fault detection schemes for AES encryption and decryption. We have obtained new formulations for the fault detection of SubBytes and inverse SubBytes using the relation between the input and the output of the S-box and the inverse S-box. The proposed schemes are independent of the way the S-box and the inverse S-box are constructed. Therefore, they can be used both for S-boxes and inverse S-boxes built from lookup tables and for those using logic gates based on composite fields. Our simulation results show an error coverage greater than 99 percent for the proposed schemes. Moreover, the proposed and the previously reported fault detection schemes have been implemented on the most recent Xilinx Virtex FPGAs. Their area and delay overheads have been compared, showing that the proposed schemes outperform the previously reported ones.
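
One input/output relation of this kind is assumed here for illustration; the paper's own formulations may differ. The AES S-box applies an affine map to the multiplicative inverse in GF(2^8). So if you undo the affine map on the output and multiply the result by the input, you must get 1 for every nonzero input. The Python sketch below checks only the S-box's input and output, so it does not depend on whether the S-box is a lookup table or composite-field logic. The brute-force inverse exists only to build a reference S-box for testing.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

def rotl8(x, n):
    """Rotate an 8-bit value left by n bits (1 <= n <= 7)."""
    return ((x << n) | (x >> (8 - n))) & 0xFF

def gf_inv(a):
    """Multiplicative inverse by brute force; AES defines the inverse of 0 as 0."""
    if a == 0:
        return 0
    return next(b for b in range(1, 256) if gf_mul(a, b) == 1)

def sbox(x):
    """Reference AES S-box: affine transform of the GF(2^8) inverse."""
    b = gf_inv(x)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63

def inverse_affine(y):
    """Undo the S-box affine transform."""
    return rotl8(y, 1) ^ rotl8(y, 3) ^ rotl8(y, 6) ^ 0x05

def sbox_output_is_consistent(x, y):
    """Fault check that uses only the S-box input x and the (possibly faulty) output y."""
    if x == 0:
        return y == 0x63  # S(0) = 0x63 because inv(0) is defined as 0
    return gf_mul(x, inverse_affine(y)) == 1
```

Because the inverse is unique and the affine map is a bijection, any corruption of the output for a correct input breaks the relation. For example, flipping any single bit of S(0x53) = 0xED makes the check fail. Faults that corrupt the checked input itself are outside what this sketch covers.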

Proceedings Article

RAMP Gold: an FPGA-based architecture simulator for multiprocessors

Zhangxi Tan¹, Andrew Waterman¹, Rimas Avizienis¹, Yunsup Lee¹, Henry Cook¹, David A. Patterson¹, Krste Asanović¹

University of California, Berkeley¹

13 Jun 2010

TL;DR: The RAMP Gold prototype is a high-throughput, cycle-accurate full-system simulator that runs on a single Xilinx Virtex-5 FPGA board, and which simulates a 64-core shared-memory target machine capable of booting real operating systems.

Abstract: We present RAMP Gold, an economical FPGA-based architecture simulator that allows rapid early design-space exploration of manycore systems. The RAMP Gold prototype is a high-throughput, cycle-accurate full-system simulator that runs on a single Xilinx Virtex-5 FPGA board, and which simulates a 64-core shared-memory target machine capable of booting real operating systems. To improve FPGA implementation efficiency, functionality and timing are modeled separately and host multithreading is used in both models. We evaluate the prototype's performance using a modern parallel benchmark suite running on our manycore research operating system, achieving two orders of magnitude speedup compared to a widely-used software-based architecture simulator.

Journal Article

Constructing Online Testable Circuits Using Reversible Logic

S.N. Mahammad¹, K. Veezhinathan²

Indian Institutes of Technology¹, Indian Institute of Technology Madras²

01 Jan 2010, IEEE Transactions on Instrumentation and Measurement

TL;DR: The paper proposes a novel universal reversible logic gate (URG) and a set of basic sequential elements for building reversible sequential circuits, with 25% less garbage than the best reported in the literature.

Abstract: With the advent of nanometer technology, circuits are more prone to transient faults that can occur during their operation. Of the different types of transient faults reported in the literature, the single-event upset (SEU) is prominent. Traditional techniques such as triple-modular redundancy (TMR) consume large area and power. Reversible logic has been gaining interest in the recent past due to its low heat dissipation. This paper proposes the following: (1) a novel universal reversible logic gate (URG) and a set of basic sequential elements that could be used for building reversible sequential circuits, with 25% less garbage than the best reported in the literature; (2) a reversible gate that can mimic the functionality of a lookup table (LUT) and can be used to construct a reversible field-programmable gate array (FPGA); and (3) automatic conversion of any given reversible circuit into an online testable circuit that can detect online any single-bit errors, including soft errors in the logic blocks, using theoretically proved minimum garbage, which is significantly less than the best reported in the literature.

Journal Article

Performance Comparison of Graphics Processors to Reconfigurable Logic: A Case Study

Benjamin Cope¹, Peter Y. K. Cheung¹, Wayne Luk¹, Lee Howes¹

Imperial College London¹

01 Apr 2010, IEEE Transactions on Computers

TL;DR: A systematic approach to the comparison of the graphics processor (GPU) and reconfigurable logic is defined in terms of three throughput drivers, applied to five case study algorithms, characterized by their arithmetic complexity, memory access requirements, and data dependence.

Abstract: A systematic approach to the comparison of the graphics processor (GPU) and reconfigurable logic is defined in terms of three throughput drivers. The approach is applied to five case study algorithms, characterized by their arithmetic complexity, memory access requirements, and data dependence, and to two target devices: the nVidia GeForce 7900 GTX GPU and a Xilinx Virtex-4 field programmable gate array (FPGA). A speedup of two orders of magnitude over a general-purpose processor is observed for each device for arithmetic-intensive algorithms. An FPGA outperforms a GPU for algorithms requiring large numbers of regular memory accesses, while the GPU is superior for algorithms with variable data reuse. In the presence of data dependence, a customized data path implemented in an FPGA exceeds GPU performance by up to eight times. The implications of the analysis for newer and future technologies are also examined.

Journal Article

Maximum-Likelihood Estimation With a Contracting-Grid Search Algorithm

Jacob Y. Hesterman, Luca Caucci¹, Matthew A. Kupinski¹, Harrison H. Barrett¹, Lars R. Furenlid¹

University of Arizona¹

01 Jun 2010, IEEE Transactions on Nuclear Science

TL;DR: A fast search algorithm capable of operating in multi-dimensional spaces is introduced and its utility in the 2D and 3D maximum-likelihood position-estimation problem that arises in the processing of PMT signals to derive interaction locations in compact gamma cameras is demonstrated.

Abstract: A fast search algorithm capable of operating in multi-dimensional spaces is introduced. As a sample application, we demonstrate its utility in the 2D and 3D maximum-likelihood position-estimation problem that arises in the processing of PMT signals to derive interaction locations in compact gamma cameras. We demonstrate that the algorithm can be parallelized in pipelines, and thereby efficiently implemented in specialized hardware, such as field-programmable gate arrays (FPGAs). A 2D implementation of the algorithm is achieved on Cell/BE processors, resulting in processing speeds above one million events per second, which is a 20× increase in speed over a conventional desktop machine. Graphics processing units (GPUs) are used for a 3D application of the algorithm, resulting in processing speeds of nearly 250,000 events per second, which is a 250× increase in speed over a conventional desktop machine. These implementations indicate the viability of the algorithm for use in real-time imaging applications.
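
The contracting-grid idea can be sketched in a few lines of Python. Evaluate the likelihood on a small grid around the current estimate, move to the best grid point, shrink the grid spacing, and repeat. The 5 x 5 grid, the contraction factor of 0.5, the iteration count, and the toy quadratic log-likelihood mentioned below are assumptions, not the paper's gamma-camera model.

```python
def contracting_grid_search(log_likelihood, center, spacing,
                            half_width=2, shrink=0.5, iterations=20):
    """Maximize a 2-D function by repeatedly searching a shrinking grid.

    Each iteration evaluates a (2 * half_width + 1)^2 grid centered on the current
    estimate with the given spacing, recenters on the best point, then multiplies
    the spacing by `shrink`.
    """
    cx, cy = center
    for _ in range(iterations):
        offsets = [k * spacing for k in range(-half_width, half_width + 1)]
        candidates = [(cx + dx, cy + dy) for dx in offsets for dy in offsets]
        # Every candidate is independent, so this step parallelizes or pipelines easily.
        cx, cy = max(candidates, key=lambda p: log_likelihood(p[0], p[1]))
        spacing *= shrink
    return cx, cy
```

For example, with `log_likelihood = lambda x, y: -((x - 0.3) ** 2 + (y + 0.7) ** 2)`, a starting center of `(0, 0)`, and a spacing of 0.5, the estimate converges to within about 1e-6 of (0.3, -0.7). Like any local search, it can miss the true maximum if the initial grid does not bracket it.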

Journal Article

A new methodology to implement the AES algorithm using partial and dynamic reconfiguration

José M. Granado-Criado¹, Miguel A. Vega-Rodríguez¹, Juan M. Sánchez-Pérez¹, Juan A. Gómez-Pulido¹

University of Extremadura¹

01 Jan 2010, Integration

TL;DR: This paper proposes a methodology for FPGA-based AES implementation that combines the use of three hardware languages with partial and dynamic reconfiguration and a pipelined, parallel implementation.

Journal Article

Accurate hardware-based stereo vision

Kristian Ambrosch¹, Wilfried Kubinger¹

Austrian Institute of Technology¹

01 Nov 2010, Computer Vision and Image Understanding

TL;DR: A novel stereo matching algorithm is proposed that is designed for high efficiency when realized in hardware, targeting deployment in Field Programmable Gate Arrays (FPGAs) and Application Specific Integrated Circuits (ASICs).

Journal Article

Complex event detection at wire speed with FPGAs

Louis Woods¹, Jens Teubner¹, Gustavo Alonso¹

ETH Zurich¹

01 Sep 2010

TL;DR: This paper presents a hardware-based complex event detection system implemented on field-programmable gate arrays (FPGAs) that can detect complex events at gigabit wire speed with constant and fully predictable latency, independently of network load, packet size, or data distribution.

Abstract: Complex event detection is an advanced form of data stream processing where the stream(s) are scrutinized to identify given event patterns. The challenge for many complex event processing (CEP) systems is to be able to evaluate event patterns on high-volume data streams while adhering to real-time constraints. To solve this problem, in this paper we present a hardware-based complex event detection system implemented on field-programmable gate arrays (FPGAs). By inserting the FPGA directly into the data path between the network interface and the CPU, our solution can detect complex events at gigabit wire speed with constant and fully predictable latency, independently of network load, packet size, or data distribution. This is a significant improvement over CPU-based systems and an architectural approach that opens up interesting opportunities for hybrid stream engines that combine the flexibility of the CPU with the parallelism and processing power of FPGAs.
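
Hardware event detectors of this kind commonly compile a pattern into a state machine whose states all update in parallel once per input event. That parallel update is what gives a constant per-event latency. A minimal Python sketch of the idea for a plain event sequence uses the bit-parallel shift-and algorithm, in which each bit stands for one NFA state (one flip-flop in hardware). The pattern and event names are hypothetical, and the paper's actual pattern language and compiler are not reproduced here.

```python
def compile_sequence(pattern):
    """Build per-event bit masks and the accept bit for a sequence such as ["a", "b", "c"]."""
    if not pattern:
        raise ValueError("pattern must contain at least one event")
    masks = {}
    for position, event in enumerate(pattern):
        masks[event] = masks.get(event, 0) | (1 << position)
    return masks, 1 << (len(pattern) - 1)

def detect(stream, pattern):
    """Yield the index of every event that completes an occurrence of the pattern."""
    masks, accept_bit = compile_sequence(pattern)
    state = 0  # bit i set: the first i + 1 pattern events were just seen in a row
    for index, event in enumerate(stream):
        # One fixed-cost update per event: advance all partial matches and start a new one.
        state = ((state << 1) | 1) & masks.get(event, 0)
        if state & accept_bit:
            yield index
```

For example, `list(detect("xabcabc", ["a", "b", "c"]))` reports matches at indices 3 and 6. The work per event is the same no matter how many partial matches are in flight, which mirrors the load-independent latency claimed above.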

Proceedings Article

Compressive sampling hardware reconstruction

Avi Septimus¹, Raphael Steinberg¹

Technion – Israel Institute of Technology¹

03 Aug 2010

TL;DR: This work presents a VLSI implementation of a computationally efficient algorithm named Orthogonal Matching Pursuit, further optimizes the algorithm to meet typical hardware constraints, and describes the different block units of the design.

Abstract: Compressive Sampling reconstruction techniques require computationally intensive algorithms, often using L1 optimization to reconstruct a signal that was originally sampled at a sub-Nyquist rate. In this work we present a VLSI implementation of a computationally efficient algorithm named Orthogonal Matching Pursuit. We further optimize the algorithm to meet typical hardware constraints and describe the different block units of our design. We synthesize our design for the Xilinx Virtex 5 FPGA and give timing and area results. We summarize our work with a short discussion of the possible uses for our system.
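
For readers unfamiliar with Orthogonal Matching Pursuit, here is a minimal NumPy sketch of the textbook algorithm. It is a floating-point software reference, not the paper's fixed-point hardware design. At each step it picks the dictionary column most correlated with the residual, then jointly re-fits all chosen coefficients by least squares. It assumes the columns of `A` have roughly unit norm and that the sparsity level is known in advance.

```python
import numpy as np

def orthogonal_matching_pursuit(A, y, sparsity, tol=1e-10):
    """Recover a sparse x with y ~= A @ x by greedy column selection.

    Assumes the columns of A have (roughly) unit norm.
    """
    n = A.shape[1]
    residual = y.astype(float)
    support = []
    coef = np.zeros(0)
    for _ in range(sparsity):
        # Pick the column most correlated with what is still unexplained.
        correlation = np.abs(A.T @ residual)
        if support:
            correlation[support] = 0.0  # never pick the same column twice
        support.append(int(np.argmax(correlation)))
        # Re-fit all selected coefficients jointly (the "orthogonal" step).
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
        if np.linalg.norm(residual) < tol:
            break
    x = np.zeros(n)
    x[support] = coef
    return x
```

In a genuine compressive setting, `A` has fewer rows than columns, for example a random Gaussian matrix with normalized columns. Recovery is then likely, but not guaranteed, when `x` is sparse enough relative to the number of measurements. The repeated least-squares solve is the costly part of the loop.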

Book Chapter

A lightweight implementation of Keccak hash function for radio-frequency identification applications

Elif Bilge Kavun¹, Tolga Yalcin¹

Middle East Technical University¹

08 Jun 2010

TL;DR: This paper presents a lightweight implementation of the permutations Keccak-f[200] and Keccak-f[400] of the SHA-3 candidate hash function Keccak, which is also the first lightweight implementation of a sponge function, differentiating it from previous works.

Abstract: In this paper, we present a lightweight implementation of the permutations Keccak-f[200] and Keccak-f[400] of the SHA-3 candidate hash function Keccak. Our design is well suited for radio-frequency identification (RFID) applications that have limited resources and demand lightweight cryptographic hardware. Besides its low area and low power, our design gives a decent throughput. To the best of our knowledge, it is also the first lightweight implementation of a sponge function, which differentiates it from the previous works. By implementing the new hash algorithm Keccak, we have utilized unique advantages of the sponge construction. Although the implementation is targeted for Application Specific Integrated Circuit (ASIC) platforms, it is also suitable for Field Programmable Gate Arrays (FPGA). To obtain a compact design, serialized data processing principles are exploited together with algorithm-specific optimizations. The design requires only 2.52K gates with a throughput of 8 Kbps at a 100 kHz system clock, based on a 0.13-µm CMOS standard cell library.

Journal Article

Implementation of an SDR system using graphics processing unit

Jae Hong Kim¹, Seungheon Hyeon¹, Seungwon Choi¹

Hanyang University¹

01 Mar 2010, IEEE Communications Magazine

TL;DR: A novel procedure is presented for implementing a software defined radio modem on a graphics processing unit instead of conventional digital signal processors and/or field programmable gate arrays; for Viterbi decoding in a mobile WiMAX terminal, the GPU-driven modem is observed to be nearly 90 times faster than a conventional DSP-driven modem with an 8-way Very Long Instruction Word architecture.

Abstract: This article presents a novel procedure for implementing a software defined radio modem using a graphics processing unit instead of conventional digital signal processors and/or field programmable gate arrays. Since modern GPUs are well suited to parallel computing owing to their numerous powerful arithmetic logic units, we suggest a suitable hardware and software platform architecture for implementing the SDR modem on a GPU. We then show a design example of a mobile WiMAX terminal implemented on the proposed GPU platform. In our experimental tests, we observed that the GPU-driven modem is nearly 90 times faster than a conventional DSP-driven modem with an 8-way Very Long Instruction Word architecture when implementing the Viterbi decoder of a mobile WiMAX terminal.

Journal Article

Durham adaptive optics real-time controller

Alastair Basden¹, Deli Geng¹, Richard M. Myers¹, Eddy Younger¹

Durham University¹

10 Nov 2010, Applied Optics

TL;DR: The Durham adaptive optics real-time controller was initially a proof of concept design for a generic AO control system and has since been developed into a modern and powerful central-processing-unit-based real-time control system, capable of using hardware acceleration.

Abstract: The Durham adaptive optics (AO) real-time controller was initially a proof of concept design for a generic AO control system. It has since been developed into a modern and powerful central-processing-unit-based real-time control system, capable of using hardware acceleration (including field programmable gate arrays and graphical processing units), based primarily around commercial off-the-shelf hardware. It is powerful enough to be used as the real-time controller for all currently planned 8 m class telescope AO systems. Here we give details of this controller and the concepts behind it, and report on performance, including latency and jitter, which is less than 10 μs for small AO systems.

Journal Article•DOI•

Customizable FPGA IP Core Implementation of a General-Purpose Genetic Algorithm Engine

Pradeep Ruben Fernando¹, Srinivas Katkoori¹, Didier Keymeulen², Ricardo Zebulum², Adrian Stoica²

University of South Florida¹, Jet Propulsion Laboratory²

01 Feb 2010-IEEE Transactions on Evolutionary Computation

TL;DR: The design of an IP core that implements a general-purpose GA engine, which has been successfully synthesized and verified on a Xilinx Virtex II Pro FPGA device (XC2VP30), is reported.

Abstract: Hardware implementation of genetic algorithms (GAs) is gaining importance because of their proven effectiveness as optimization engines for real-time applications (e.g., evolvable hardware). Earlier hardware implementations suffer from major drawbacks such as the absence of GA parameter programmability, a rigid predefined system architecture, and lack of support for multiple fitness functions. In this paper, we report the design of an IP core that implements a general-purpose GA engine that addresses these problems. Specifically, the proposed GA IP core can be customized in terms of the population size, number of generations, crossover and mutation rates, random number generator seed, and the fitness function. It has been successfully synthesized and verified on a Xilinx Virtex II Pro field-programmable gate array (FPGA) device (xc2vp30-7ff896) with only 13% logic slice utilization, 1% block memory utilization for GA memory, and a clock speed of 50 MHz. The GA core has been used as a search engine for real-time adaptive healing, but it can be tailored to any given application by interfacing it with the appropriate application-specific fitness evaluation module and the required storage memory, and by programming the desired GA parameter values. The core is soft in nature, i.e., it is provided as a gate-level netlist that can be readily integrated into the user's system. The performance of the GA core was tested using standard optimization test functions. In the hardware experiments, the proposed core either found the globally optimal solution or found a solution within 3.7% of the globally optimal value. The experimental setup, including the GA core, achieved a speedup of around 5.16× over an analogous software implementation.
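The GA core above exposes population size, number of generations, crossover and mutation rates, random seed, and fitness function as programmable parameters. The Python sketch below mirrors that parameter set in software. The selection scheme (binary tournament), single-point crossover, bit-flip mutation, and one-individual elitism are assumptions chosen for brevity and are not taken from the paper; `run_ga` is a hypothetical name.

```python
import random
from typing import Callable, List, Tuple

Chromosome = List[int]


def run_ga(fitness: Callable[[Chromosome], float], n_bits: int,
           pop_size: int = 32, generations: int = 50,
           crossover_rate: float = 0.9, mutation_rate: float = 0.02,
           seed: int = 1) -> Tuple[Chromosome, float, List[float]]:
    """Maximize fitness over n_bits-long bit strings; returns (best, score, history)."""
    rng = random.Random(seed)          # seeded RNG, mirroring the core's programmable seed
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history: List[float] = []

    def evaluate(population):
        scored = [(fitness(c), c) for c in population]
        return scored, max(scored, key=lambda s: s[0])

    def tournament(scored):
        a, b = rng.sample(scored, 2)   # binary tournament selection (assumption)
        return a[1] if a[0] >= b[0] else b[1]

    for _ in range(generations):
        scored, (best_fit, best) = evaluate(pop)
        history.append(best_fit)
        nxt = [best[:]]                # elitism: carry the best forward unchanged
        while len(nxt) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_bits)      # single-point crossover
                children = [p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]]
            else:
                children = [p1[:], p2[:]]
            for child in children:
                for i in range(n_bits):             # bit-flip mutation
                    if rng.random() < mutation_rate:
                        child[i] ^= 1
                if len(nxt) < pop_size:
                    nxt.append(child)
        pop = nxt
    _, (best_fit, best) = evaluate(pop)
    history.append(best_fit)
    return best, best_fit, history


# OneMax test function: fitness is the number of 1 bits.
best, score, history = run_ga(sum, n_bits=16, seed=7)
```

As in the hardware core, swapping the fitness callback is all it takes to retarget the engine; elitism keeps the best score from ever decreasing between generations.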

Journal Article•DOI•

A passive auto-focus camera control system

Chih-Yung Chen, Rey-Chue Hwang¹, Yu-Ju Chen²

I-Shou University¹, Cheng Shiu University²

01 Jan 2010

TL;DR: The discrete wavelet transformation, the morphology edge enhancement sharpness measurement algorithms, and the self-organizing map (SOM) neural network were used in developing the control mechanism of the passive auto-focus camera control system.

Abstract: This paper presents a passive auto-focus camera control system that achieves auto-focus without requiring any active component (e.g., an infrared or ultrasonic sensor), unlike conventional active focus systems. To implement the technique, we constructed a hardware system comprising an adjustable lens with a CMOS sensor and servo motor, an 8051 image-capture microcontroller, a field programmable gate array (FPGA) sharpness measurement circuit, a pulse width modulation (PWM) controller, and a personal digital assistant (PDA) image display. The discrete wavelet transform (DWT), morphological edge-enhancement sharpness measurement algorithms, and a self-organizing map (SOM) neural network were used to develop the system's control mechanism. Compared with other passive auto-focus methods, the proposed method has lower computational complexity and is easier to implement in hardware.
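A passive auto-focus loop of this kind scores each captured frame with a sharpness measure and drives the lens toward the position with the highest score. The sketch below substitutes a simple wavelet-based measure, the energy of one-level 2-D Haar detail coefficients, for the paper's DWT and morphological edge-enhancement pipeline, and it uses an exhaustive lens sweep instead of the SOM-based controller. The function names and the `capture` callback are hypothetical.

```python
from typing import Callable, Sequence

Image = Sequence[Sequence[float]]


def haar_detail_energy(img: Image) -> float:
    """Sharpness score: sum of squared one-level 2-D Haar detail coefficients.

    Odd trailing rows/columns are ignored. A defocused (smooth) image has
    little high-frequency content and therefore a low score.
    """
    h, w = len(img) // 2 * 2, len(img[0]) // 2 * 2
    energy = 0.0
    for y in range(0, h, 2):
        for x in range(0, w, 2):
            a, b = img[y][x], img[y][x + 1]
            c, d = img[y + 1][x], img[y + 1][x + 1]
            lh = (a + b - c - d) / 2   # vertical detail
            hl = (a - b + c - d) / 2   # horizontal detail
            hh = (a - b - c + d) / 2   # diagonal detail
            energy += lh * lh + hl * hl + hh * hh
    return energy


def best_focus(capture: Callable[[int], Image], positions: Sequence[int]) -> int:
    """Exhaustive sweep: return the lens position whose frame scores highest."""
    return max(positions, key=lambda p: haar_detail_energy(capture(p)))


# Stand-in frames: a flat (blurred) image and a sharp checkerboard.
flat = [[128.0] * 4 for _ in range(4)]
sharp = [[255.0 if (x + y) % 2 else 0.0 for x in range(4)] for y in range(4)]
frames = {0: flat, 1: sharp, 2: flat}
```

With `frames` standing in for the camera, `best_focus(frames.__getitem__, [0, 1, 2])` selects position 1. A real controller would typically hill-climb rather than sweep every position, to limit motor travel.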

Journal Article•DOI•

A high-resolution (< 10 ps RMS) 32-Channel Time-to-Digital Converter (TDC) implemented in a Field Programmable Gate Array (FPGA)

Eugen Bayer¹, M. Traxler²

Goethe University Frankfurt¹, GSI Helmholtz Centre for Heavy Ion Research²

24 May 2010

TL;DR: In this article, a 48-channel Time-to-Digital Converter (TDC) implemented in a general purpose Field Programmable Gate Array (FPGA) is presented, where dedicated carry chains of the FPGA are utilized for time interpolation purposes inside a clock cycle.

Abstract: A high-resolution 48-Channel Time-to-Digital Converter (TDC) implemented in a general purpose Field Programmable Gate Array (FPGA) is presented. Dedicated carry chains of the FPGA are utilized for time interpolation purposes inside a clock cycle. A counter running at the system clock frequency provides a global time stamp. These two values, along with the channel number, are stored for readout. An extra effort was made to improve the resolution beyond the intrinsic cell delay of the carry chain as well as to achieve the same resolution on all 48 channels. Due to large bin width variations, a bin-by-bin calibration scheme was used. Time interval (TI) measurements between two channels were made to determine the RMS and the time resolution of a single channel. At least 6 ps single channel resolution was achieved for all channels. Additional measurements were performed to characterize the influence of temperature and voltage variations on the RMS value and the mean, as well as the sensitivity of the TDC to crosstalk. The results of these measurements are also presented in this paper.
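A common way to perform the bin-by-bin calibration mentioned above is a code-density test. Hits that are uncorrelated with the system clock land in each carry-chain bin in proportion to that bin's width, so the hit histogram gives the bin widths directly. The sketch below assumes that method, a fixed clock period, and a convention in which the fine time counts forward from the coarse clock edge. The paper may use a different sign convention or calibration detail, and the function names and counts are hypothetical.

```python
from typing import List, Sequence


def build_calibration(hit_counts: Sequence[int], clock_period_ps: float) -> List[float]:
    """Code-density calibration: map each fine bin to the time of its centre (ps).

    Assumes hits were uniformly distributed over one clock period, so each
    bin's share of the hits equals its share of the period.
    """
    total = sum(hit_counts)
    if total <= 0:
        raise ValueError("calibration needs at least one hit")
    table, edge = [], 0.0
    for count in hit_counts:
        width = count * clock_period_ps / total   # estimated bin width
        table.append(edge + width / 2)            # use the bin centre
        edge += width
    return table


def timestamp_ps(coarse: int, fine_bin: int, table: Sequence[float],
                 clock_period_ps: float) -> float:
    """Combine the coarse counter and the calibrated fine bin into one time stamp.

    Assumed convention: the fine time is measured forward from the coarse edge.
    """
    return coarse * clock_period_ps + table[fine_bin]


# Uneven bins, as real carry chains produce (illustrative counts only).
table = build_calibration([10, 30, 20, 40], clock_period_ps=1000.0)
# A time-interval measurement between two channels is a difference of stamps.
dt = timestamp_ps(3, 2, table, 1000.0) - timestamp_ps(1, 0, table, 1000.0)
```

Here the bins are estimated as 100, 300, 200, and 400 ps wide, so their centres are 50, 250, 500, and 800 ps, and `dt` is 2450 ps. Using bin centres rather than raw bin indices is what removes the systematic error caused by the uneven carry-chain delays.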

Journal Article•DOI•

Design and Implementation of a Sort-Free K-Best Sphere Decoder

S. Mondal¹, Ahmed M. Eltawil², Chung-An Shen², Khaled N. Salama³

Rensselaer Polytechnic Institute¹, University of California, Irvine², King Abdullah University of Science and Technology³

01 Oct 2010-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: A novel sort-free approach to path extension, as well as quantized metrics result in a high-throughput VLSI architecture with lower power and area consumption compared to state-of-the-art published systems.

Abstract: This paper describes the design and very-large-scale integration (VLSI) architecture of a 4 × 4 breadth-first K-best multiple-input-multiple-output (MIMO) decoder for a 64 quadrature-amplitude modulation (QAM) scheme. A novel sort-free approach to path extension, together with quantized metrics, results in a high-throughput VLSI architecture with lower power and area consumption than state-of-the-art published systems. Functionality is confirmed via a field-programmable gate array (FPGA) implementation on a Xilinx Virtex II Pro FPGA. Comparisons of simulation and measurement results are given, and FPGA utilization figures are provided. Finally, VLSI architectural tradeoffs are explored for a synthesized application-specific IC (ASIC) implementation in a 65-nm CMOS technology.
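A K-best decoder performs a breadth-first search over the tree of transmitted symbols and keeps only the K lowest-metric partial paths at each layer. The sketch below shows the conventional version, which sorts all expanded children at every layer. That full sort is exactly what the paper's sort-free path extension avoids, so this is a baseline for comparison rather than the authors' method. It assumes a real-valued model after QR decomposition (R upper triangular, z = Q^T y), with each real dimension of 64-QAM mapped to 8-PAM; `k_best_detect` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

PAM8 = (-7, -5, -3, -1, 1, 3, 5, 7)   # one real dimension of 64-QAM


def k_best_detect(R: Sequence[Sequence[float]], z: Sequence[float],
                  alphabet: Sequence[float], k: int) -> List[float]:
    """Breadth-first K-best detection for z = R s + noise, R upper triangular.

    Layers are processed from the last row of R up to the first. Each
    candidate is (partial Euclidean distance, symbols for layers i..n-1).
    """
    n = len(z)
    candidates: List[Tuple[float, List[float]]] = [(0.0, [])]
    for i in range(n - 1, -1, -1):
        children = []
        for ped, tail in candidates:
            # Interference from the symbols already fixed on layers i+1..n-1.
            interference = sum(R[i][i + 1 + j] * s for j, s in enumerate(tail))
            for a in alphabet:
                err = z[i] - interference - R[i][i] * a
                children.append((ped + err * err, [a] + tail))
        # Conventional K-best: full sort, keep the K smallest metrics.
        children.sort(key=lambda c: c[0])
        candidates = children[:k]
    return candidates[0][1]


# Noise-free 3-layer example with a made-up upper-triangular R.
R = [[2.0, 0.5, 0.3], [0.0, 1.5, -0.4], [0.0, 0.0, 1.8]]
s_true = [3, -5, 1]
z = [sum(R[i][j] * s_true[j] for j in range(3)) for i in range(3)]
```

On this noise-free input, `k_best_detect(R, z, PAM8, k=4)` returns `[3, -5, 1]`. Each layer expands K × 8 children here, and in hardware the sorting network for those children dominates cost, which motivates sort-free alternatives.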

Proceedings Article•DOI•

Glacier: a query-to-hardware compiler

Rene Mueller¹, Jens Teubner¹, Gustavo Alonso¹

ETH Zurich¹

06 Jun 2010

TL;DR: This demonstration shows Glacier, a library and a compiler that can be employed to implement streaming queries as hardware circuits on FPGAs, to show the flexibility of the compositional approach.

Abstract: Field-programmable gate arrays (FPGAs) are a promising technology that can be used in database systems. In this demonstration we show Glacier, a library and a compiler that can be employed to implement streaming queries as hardware circuits on FPGAs. Glacier consists of a library of compositional hardware modules that represent stream processing operators. Given a query execution plan, the compiler instantiates the corresponding components and wires them up to a digital circuit. The goal of this demo is to show the flexibility of the compositional approach.
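Glacier's compositional approach maps a query plan onto a library of operator modules that are then wired together. The same idea can be illustrated in software. The Python sketch below uses generators in place of hardware circuits. The nested-tuple plan format, the operator names, and `compile_plan` are hypothetical and do not reflect Glacier's actual input language or module library.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, Sequence

Row = Dict[str, Any]


def select(pred: Callable[[Row], bool], child: Iterable[Row]) -> Iterator[Row]:
    """Filter operator: pass through rows that satisfy pred."""
    return (row for row in child if pred(row))


def project(cols: Sequence[str], child: Iterable[Row]) -> Iterator[Row]:
    """Projection operator: keep only the listed columns."""
    return ({c: row[c] for c in cols} for row in child)


def tumbling_sum(col: str, size: int, child: Iterable[Row]) -> Iterator[Row]:
    """Aggregate operator: sum col over consecutive non-overlapping windows.

    An incomplete final window is not emitted.
    """
    window = []
    for row in child:
        window.append(row[col])
        if len(window) == size:
            yield {"sum_" + col: sum(window)}
            window = []


# The "library" of composable operator modules.
OPERATORS = {"select": select, "project": project, "tumbling_sum": tumbling_sum}


def compile_plan(plan: tuple, source: Iterable[Row]) -> Iterator[Row]:
    """Instantiate one operator per plan node and wire child outputs to inputs."""
    op, *args = plan
    if op == "scan":
        return iter(source)
    if op not in OPERATORS:
        raise ValueError(f"unknown operator {op!r}")
    *params, child = args
    return OPERATORS[op](*params, compile_plan(child, source))


trades = [{"sym": "A", "price": 10}, {"sym": "B", "price": 25},
          {"sym": "A", "price": 30}, {"sym": "A", "price": 5}]
is_a = lambda r: r["sym"] == "A"
prices = list(compile_plan(("project", ["price"], ("select", is_a, ("scan",))), trades))
sums = list(compile_plan(("tumbling_sum", "price", 2, ("select", is_a, ("scan",))), trades))
```

Here `prices` is `[{"price": 10}, {"price": 30}, {"price": 5}]` and `sums` is `[{"sum_price": 40}]`. Adding an operator means adding one entry to the library, which is the kind of flexibility the demonstration targets, although in Glacier each module becomes a circuit rather than a generator.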

Journal Article•DOI•

An Efficient Architecture for 3-D Discrete Wavelet Transform

A. Das¹, A. Hazra², Swapna Banerjee³

Nvidia¹, STMicroelectronics², Indian Institute of Technology Kharagpur³

01 Feb 2010-IEEE Transactions on Circuits and Systems for Video Technology

TL;DR: The proposed design is one of the first lifting-based complete 3-D DWT architectures without a group-of-pictures restriction, and the new computing technique, based on analysis of the lifting signal flow graph, minimizes the storage requirement.

Abstract: This paper presents an architecture for the lifting-based running 3-D discrete wavelet transform (DWT), a powerful image and video compression algorithm. The proposed design is one of the first lifting-based complete 3-D DWT architectures without a group-of-pictures restriction. A new computing technique based on analysis of the lifting signal flow graph minimizes the storage requirement. Compared with earlier reported works, this architecture offers reduced memory referencing and the associated low power consumption, as well as low latency and high throughput. The proposed architecture has been successfully implemented on a Xilinx Virtex-IV series field-programmable gate array, operating at 321 MHz, which makes it suitable for real-time compression even with large frame dimensions. Moreover, the architecture is fully scalable beyond the present coherent Daubechies (9, 7) filter bank.
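The lifting scheme factors the Daubechies (9, 7) filter bank into alternating predict and update steps, which lets hardware compute it with few multipliers and little buffering. The Python sketch below is a one-dimensional forward and inverse 9/7 lifting transform with symmetric boundary extension; a 3-D DWT applies such a step along the row, column, and temporal axes. The coefficients are the widely published CDF 9/7 lifting constants. The scaling convention used here (low band divided by K, high band multiplied by K) varies between references and is an assumption, and none of the paper's memory-saving scheduling is modeled.

```python
from typing import List, Sequence, Tuple

# Widely published CDF 9/7 lifting constants.
ALPHA = -1.586134342059924
BETA = -0.052980118572961
GAMMA = 0.882911075530934
DELTA = 0.443506852043971
K = 1.230174104914001      # scaling factor (convention is an assumption)


def _predict(s: List[float], d: List[float], c: float, sign: int = 1) -> None:
    """Odd samples += c * (left even + right even), mirrored at the end."""
    n = len(d)
    for i in range(n):
        right = s[i + 1] if i + 1 < n else s[i]   # symmetric extension
        d[i] += sign * c * (s[i] + right)


def _update(s: List[float], d: List[float], c: float, sign: int = 1) -> None:
    """Even samples += c * (left odd + right odd), mirrored at the start."""
    for i in range(len(s)):
        left = d[i - 1] if i > 0 else d[i]        # symmetric extension
        s[i] += sign * c * (left + d[i])


def dwt97_forward(x: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One level of the 9/7 lifting DWT on an even-length signal."""
    if len(x) < 2 or len(x) % 2:
        raise ValueError("signal length must be even and at least 2")
    s = [float(v) for v in x[0::2]]    # even samples -> low band
    d = [float(v) for v in x[1::2]]    # odd samples  -> high band
    _predict(s, d, ALPHA)
    _update(s, d, BETA)
    _predict(s, d, GAMMA)
    _update(s, d, DELTA)
    return [v / K for v in s], [v * K for v in d]


def dwt97_inverse(low: Sequence[float], high: Sequence[float]) -> List[float]:
    """Undo the lifting steps in reverse order with opposite signs."""
    s = [v * K for v in low]
    d = [v / K for v in high]
    _update(s, d, DELTA, -1)
    _predict(s, d, GAMMA, -1)
    _update(s, d, BETA, -1)
    _predict(s, d, ALPHA, -1)
    out: List[float] = []
    for a, b in zip(s, d):
        out.extend((a, b))
    return out


x = [1.0, 4.0, 2.0, 8.0, 5.0, 7.0, 3.0, 0.0]
low, high = dwt97_forward(x)
rec = dwt97_inverse(low, high)
```

Because every lifting step only adds a function of the other band, each step can be undone exactly by subtracting the same quantity, so `rec` matches `x` up to floating-point rounding regardless of the filter coefficients.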
