Showing papers on "Very-large-scale integration" published in 2008

Open Access

Journal Article•DOI•

Soft-output sphere decoding: algorithms and VLSI implementation

Christoph Studer¹, Andreas Burg, Helmut Bölcskei•Institutions (1)

01 Feb 2008

TL;DR: VLSI implementation results are provided which demonstrate that single tree-search, sorted QR-decomposition, channel matrix regularization, log-likelihood ratio clipping, and imposing runtime constraints are the key ingredients for realizing soft-output MIMO detectors with near max-log performance at a chip area that is only 58% higher than that of the best-known hard-output sphere decoder VLSI implementation.

Abstract: Multiple-input multiple-output (MIMO) detection algorithms providing soft information for a subsequent channel decoder pose significant implementation challenges due to their high computational complexity. In this paper, we show how sphere decoding can be used as an efficient tool to implement soft-output MIMO detection with flexible trade-offs between computational complexity and (error rate) performance. In particular, we provide VLSI implementation results which demonstrate that single tree-search, sorted QR-decomposition, channel matrix regularization, log-likelihood ratio clipping, and imposing runtime constraints are the key ingredients for realizing soft-output MIMO detectors with near max-log performance at a chip area that is only 58% higher than that of the best-known hard-output sphere decoder VLSI implementation.
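As a rough illustration of two terms used in the abstract above (max-log soft outputs and log-likelihood ratio clipping), the following sketch computes max-log LLRs by exhaustive search and then clips them. It is not the single tree-search sphere decoder from the paper; the function name `max_log_llrs`, the real-valued BPSK model, and the `clip` parameter are assumptions made only for this example.

```python
from itertools import product


def max_log_llrs(y, H, noise_var, clip):
    """Max-log LLRs for BPSK symbols (+1 = bit 1, -1 = bit 0), then clipped.

    Exhaustive-search reference (assumption: small real-valued system),
    not a sphere decoder. H is a list of rows, y the received vector.
    """
    n_rx, n_tx = len(H), len(H[0])
    # best[i][b]: smallest distance seen with bit i equal to b
    best = [[float("inf"), float("inf")] for _ in range(n_tx)]
    for s in product((-1.0, 1.0), repeat=n_tx):
        dist = sum(
            (y[r] - sum(H[r][c] * s[c] for c in range(n_tx))) ** 2
            for r in range(n_rx)
        )
        for i, sym in enumerate(s):
            bit = 1 if sym > 0 else 0
            if dist < best[i][bit]:
                best[i][bit] = dist
    llrs = []
    for d0, d1 in best:
        llr = (d0 - d1) / (2.0 * noise_var)      # positive favours bit 1
        llrs.append(max(-clip, min(clip, llr)))  # LLR clipping
    return llrs
```

Clipping bounds the magnitude of each soft output; in a tree search this kind of bound is what lets the decoder prune more aggressively, which is the complexity and performance trade-off the abstract refers to.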

404 citations

Proceedings Article•DOI•

Wafer-scale integration of analog neural networks

Johannes Schemmel¹, J. Fieres¹, Karlheinz Meier¹•Institutions (1)

Heidelberg University¹

01 Jun 2008

TL;DR: A novel asynchronous low-voltage signaling scheme is presented that makes the wafer-scale approach feasible by limiting the total power consumption while simultaneously providing a flexible, programmable network topology.

Abstract: This paper introduces a novel design of an artificial neural network tailored for wafer-scale integration. The presented VLSI implementation includes continuous-time analog neurons with up to 16 k inputs. A novel interconnection and routing scheme allows the mapping of a multitude of network models derived from biology on the VLSI neural network while maintaining a high resource usage. A single 20 cm wafer contains about 60 million synapses. The implemented neurons are highly accelerated compared to biological real time. The power consumption of the dense interconnection network providing the necessary communication bandwidth is a critical aspect of the system integration. A novel asynchronous low-voltage signaling scheme is presented that makes the wafer-scale approach feasible by limiting the total power consumption while simultaneously providing a flexible, programmable network topology.

277 citations

Book•

Digital Integrated Circuit Design: From VLSI Architectures to CMOS Fabrication

Hubert Kaeslin¹•Institutions (1)

ETH Zurich¹

28 Apr 2008

TL;DR: This comprehensive guide to how and when to design VLSI circuits covers the advances, challenges and past mistakes in design, serving as an introduction for graduate students and a reference for practising electronic engineers.

Abstract: VLSI circuits are ubiquitous in the modern world, and designing them efficiently is becoming increasingly challenging with the development of ever smaller chips. This practically oriented textbook covers the important aspects of VLSI design using a top-down approach, reflecting the way digital circuits are actually designed. Using practical hints and tips, case studies and checklists, this comprehensive guide to how and when to design VLSI circuits covers the advances, challenges and past mistakes in design, serving as an introduction for graduate students and a reference for practising electronic engineers.

155 citations

Proceedings Article•DOI•

VLSI implementation of a neural network memory with several hundreds of neurons

Hans Peter Graf, Lawrence D. Jackel, Richard Howard, B. Straughn, John S. Denker, W. Hubbard, D. M. Tennant, Daniel B. Schwartz

05 Jun 2008

TL;DR: An Electronic Neural Network (ENN) memory with 256 neurons on a single chip is designed using a combination of analog and digital VLSI technology plus a custom microfabrication process that allows a very dense packing of the neurons.

Abstract: We designed an Electronic Neural Network (ENN) memory with 256 neurons on a single chip using a combination of analog and digital VLSI technology plus a custom microfabrication process. Amplifiers with inverting and noninverting outputs are used for the neurons to make inhibitory and excitatory connections. The connections between the individual neurons are provided by amorphous‐silicon resistors which are placed on the CMOS chip in the last fabrication step. This technique allows a very dense packing of the neurons. Electron‐beam direct‐writing is used to pattern the resistors making it easy to change the information stored in the network from one chip to the next.

143 citations

Journal Article•DOI•

A High-Performance Droplet Routing Algorithm for Digital Microfluidic Biochips

Minsik Cho¹, David Z. Pan¹•Institutions (1)

University of Texas at Austin¹

01 Oct 2008-IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

TL;DR: A high-performance droplet router for a digital microfluidic biochip (DMFB) design that achieves over 35× and 20× better routability with comparable timing and fault tolerance than the popular prioritized A* search and the state-of-the-art network-flow-based algorithm, respectively.

Abstract: In this paper, we propose a high-performance droplet router for a digital microfluidic biochip (DMFB) design. Due to recent advancements in the bio-microelectromechanical system and its various applications to clinical, environmental, and military operations, the design complexity and the scale of a DMFB are expected to explode in the near future, thus requiring strong support from CAD as in conventional VLSI design. Among the multiple design stages of a DMFB, droplet routing, which schedules the movement of each droplet in a time-multiplexed manner, is one of the most critical design challenges due to high complexity as well as large impacts on performance. Our algorithm first routes a droplet with higher bypassibility, which is less likely to block the movement of the others. When multiple droplets form a deadlock, our algorithm resolves it by backing off some droplets for concession. The final compaction step further enhances timing as well as fault tolerance by tuning each droplet movement greedily. The experimental results on hard benchmarks show that our algorithm achieves over 35× and 20× better routability with comparable timing and fault tolerance than the popular prioritized A* search and the state-of-the-art network-flow-based algorithm, respectively.

141 citations

Proceedings Article•DOI•

Integrated circuit design with NEM relays

Fred F. Chen¹, Hei Kam², Dejan Markovic³, Tiehui Liu², Vladimir Stojanovic¹, Elad Alon²•Institutions (3)

Massachusetts Institute of Technology¹, University of California, Berkeley², University of California, Los Angeles³

10 Nov 2008

TL;DR: It is shown that NEM relay-based adders can achieve an order of magnitude or more improvement in energy efficiency over CMOS adders with ns-range delays and with no area penalty, and that this improvement can be achieved at higher throughputs at the cost of increased area.

Abstract: To overcome the energy-efficiency limitations imposed by finite sub-threshold slope in CMOS transistors, this paper explores the design of integrated circuits based on nano-electro-mechanical (NEM) relays. A dynamical Verilog-A model of the NEM relay is described and correlated to device measurements. Using this model we explore NEM relay design strategies for digital logic and I/O that can significantly improve the energy efficiency of the whole VLSI system. By exploiting the low effective threshold voltage and zero leakage achievable with these relays, we show that NEM relay-based adders can achieve an order of magnitude or more improvement in energy efficiency over CMOS adders with ns-range delays and with no area penalty. By applying parallelism, this improvement in energy-efficiency can be achieved at higher throughputs as well, at the cost of increased area. Similar improvements in high-speed I/O energy are also predicted by making use of the relays to implement highly energy-efficient digital-to-analog and analog-to-digital converters.

139 citations

Journal Article•DOI•

High-Speed VLSI Implementation of 2-D Discrete Wavelet Transform

Chao Cheng¹, Keshab K. Parhi¹•Institutions (1)

University of Minnesota¹

01 Jan 2008-IEEE Transactions on Signal Processing

TL;DR: This paper presents a systematic high-speed VLSI implementation of the discrete wavelet transform (DWT) based on hardware-efficient parallel FIR filter structures, with which high-speed 2-D DWT can be easily achieved for an N×N image with a controlled increase of hardware cost.

Abstract: This paper presents a systematic high-speed VLSI implementation of the discrete wavelet transform (DWT) based on hardware-efficient parallel FIR filter structures. High-speed 2-D DWT with computation time as low as N²/12 can be easily achieved for an N×N image with controlled increase of hardware cost. Compared with recently published 2-D DWT architectures with computation time of N²/3 and 2N²/3, the proposed designs can also save a large amount of multipliers and/or storage elements. It can also be used to implement those 2-D DWT traditionally suitable for lifting or flipping-based designs, such as (9,7) and (6,10) DWT. The throughput rate can be improved by a factor of 4 by the proposed approach, but the hardware cost increases by a factor of around 3. Furthermore, the proposed designs have very simple control signals, regular structures and 100% hardware utilization for continuous images.

107 citations

Journal Article•DOI•

Low-Power and High-Performance 1-Bit CMOS Full-Adder Cell

Keivan Navi, Omid Kavehei¹, M. Rouholamini, Amir Sahafi, Shima Mehrabi, Nooshin Dadkhahi•Institutions (1)

Shahid Beheshti University¹

02 Jan 2008-Journal of Computers

TL;DR: Simulation results illustrate the superiority of the resulting proposed adder against conventional CMOS 1-bit full-adder in terms of power, delay and PDP.

Abstract: In this paper a new low power and high performance adder cell using a new design style called “Bridge” is proposed. The bridge design style enjoys a high degree of regularity, higher density than conventional CMOS design style as well as lower power consumption, by using some transistors, named bridge transistors. Simulation results illustrate the superiority of the resulting proposed adder against conventional CMOS 1-bit full-adder in terms of power, delay and power-delay product (PDP). We have performed simulations using HSPICE in a 90 nanometer (nm) standard CMOS technology at room temperature, with the supply voltage varied from 0.65 V to 1.5 V in 0.05 V steps.

107 citations

Proceedings Article•DOI•

A low-overhead fault tolerance scheme for TSV-based 3D network on chip links

Loi, Mitra, Lee, Fujita, Benini

01 Jan 2008

101 citations

Book•

Low-Power High-Level Synthesis for Nanoscale CMOS Circuits

Saraju P. Mohanty, Nagarajan Ranganathan, Elias Kougianos, Priyadarsan Patra

07 Jul 2008

TL;DR: Low-Power High-Level Synthesis for Nanoscale CMOS Circuits addresses the need for analysis, characterization, estimation, and optimization of the various forms of power dissipation in the presence of process variations of nano-CMOS technologies.

Abstract: Low-Power High-Level Synthesis for Nanoscale CMOS Circuits addresses the need for analysis, characterization, estimation, and optimization of the various forms of power dissipation in the presence of process variations of nano-CMOS technologies. The authors show very large-scale integration (VLSI) researchers and engineers how to minimize the different types of power consumption of digital circuits. The material deals primarily with high-level (architectural or behavioral) energy dissipation because the behavioral level is not as highly abstracted as the system level nor is it as complex as the gate/transistor level. At the behavioral level there is a balanced degree of freedom to explore power reduction mechanisms, the power reduction opportunities are greater, and it can cost-effectively help in investigating lower power design alternatives prior to actual circuit layout or silicon implementation. The book is a self-contained low-power, high-level synthesis text for Nanoscale VLSI design engineers and researchers. Each chapter has simple relevant examples for a better grasp of the principles presented. Several algorithms are given to provide a better understanding of the underlying concepts. The initial chapters deal with the basics of high-level synthesis, power dissipation mechanisms, and power estimation. In subsequent parts of the text, a detailed discussion of methodologies for the reduction of different types of power is presented, including: Power Reduction Fundamentals; Energy or Average Power Reduction; Peak Power Reduction; Transient Power Reduction; Leakage Power Reduction. Low-Power High-Level Synthesis for Nanoscale CMOS Circuits provides a valuable resource for the design of low-power CMOS circuits.

79 citations

Book•

VLSI Circuits for Biomedical Applications

Krzysztof Iniewski

30 Jun 2008

TL;DR: This book presents a comprehensive, state-of-the-art overview of VLSI circuit design for a wide range of applications in biology and medicine, supported with over 280 illustrations and over 160 equations.

Abstract: VLSI (very large scale integration) is the process of creating integrated circuits by combining thousands of transistor-based circuits into a single chip. Written by top-notch international experts in industry and academia, this groundbreaking resource presents a comprehensive, state-of-the-art overview of VLSI circuit design for a wide range of applications in biology and medicine. Supported with over 280 illustrations and over 160 equations, the book offers cutting-edge guidance on designing integrated circuits for wireless biosensing, body implants, biosensing interfaces, and molecular biology. Engineers discover innovative design techniques and novel materials to help them achieve higher levels of circuit and system performance. This invaluable volume is essential reading for professionals and graduate students with a serious interest in circuit design and future biomedical technology.

Proceedings Article•DOI•

Gram-Schmidt-based QR decomposition for MIMO detection: VLSI implementation and comparison

P. Luethi¹, Christoph Studer¹, S. Duetsch¹, Eugen Zgraggen¹, Hubert Kaeslin¹, Norbert Felber¹, Wolfgang Fichtner¹•Institutions (1)

ETH Zurich¹

01 Nov 2008

TL;DR: An optimized fixed-point VLSI implementation of the modified Gram-Schmidt (MGS) QRD algorithm that incorporates regularization and additional sorting of the MIMO channel matrix is presented; a comparison of implementation results clearly shows the superiority of the Givens rotation (GR)-based solution in terms of area, processing cycles, and throughput.

Abstract: The QR decomposition (QRD) is an important prerequisite for many different detection algorithms in multiple-input multiple-output (MIMO) wireless communication systems. This paper presents an optimized fixed-point VLSI implementation of the modified Gram-Schmidt (MGS) QRD algorithm that incorporates regularization and additional sorting of the MIMO channel matrix. Integrated in 0.18 µm CMOS technology, the proposed VLSI architecture processes up to 1.56 million complex-valued 4×4-dimensional matrices per second. The implementation results of this work are extensively compared to the Givens rotation (GR)-based QRD implementation of Luethi et al., ISCAS 2007. In order to ensure a fair comparison, both QRD circuits have been integrated in the same IC manufacturing technology, with equal functionality, and the same numeric precision. The comparison of the implementation results clearly showed superiority of the GR-based VLSI solution in terms of area, processing cycles, and throughput.
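For readers unfamiliar with the modified Gram-Schmidt procedure named in the abstract above, a minimal floating-point sketch follows. It assumes a real-valued, full-column-rank matrix and omits the regularization, column sorting, and fixed-point arithmetic of the paper; the function name `mgs_qr` is hypothetical.

```python
import math


def mgs_qr(A):
    """Modified Gram-Schmidt QR of a real matrix A given as a list of rows.

    Returns (Q, R) with A = Q R, Q having orthonormal columns and
    R upper triangular. Assumes full column rank (no pivoting/sorting).
    """
    m, n = len(A), len(A[0])
    V = [[A[r][c] for r in range(m)] for c in range(n)]  # working columns
    Q_cols = []
    R = [[0.0] * n for _ in range(n)]
    for k in range(n):
        R[k][k] = math.sqrt(sum(x * x for x in V[k]))
        q = [x / R[k][k] for x in V[k]]
        Q_cols.append(q)
        # Immediately remove the q-component from all remaining columns;
        # this update order is what distinguishes MGS from classical GS.
        for j in range(k + 1, n):
            R[k][j] = sum(qi * vi for qi, vi in zip(q, V[j]))
            V[j] = [vi - R[k][j] * qi for vi, qi in zip(V[j], q)]
    Q = [[Q_cols[c][r] for c in range(n)] for r in range(m)]
    return Q, R
```

For example, `mgs_qr([[3.0, 1.0], [4.0, 2.0]])` yields an `R` whose first diagonal entry is the norm of the first column, 5.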

Journal Article•DOI•

An Efficient VLSI Architecture for Normal I/O Order Pipeline FFT Design

Yun-Nan Chang¹•Institutions (1)

National Sun Yat-sen University¹

22 Dec 2008-IEEE Transactions on Circuits and Systems II: Express Briefs

TL;DR: An efficient VLSI architecture of a pipeline fast Fourier transform (FFT) processor capable of producing the normal output order sequence is presented, together with a sequence conversion method that integrates the conversion function into the last-stage data commutator module.

Abstract: In this paper, an efficient VLSI architecture of a pipeline fast Fourier transform (FFT) processor capable of producing the normal output order sequence is presented. A new FFT design based on the decimated dual-path delay feed-forward data commutator unit by splitting the input stream into two half-word streams is first proposed. The resulting architecture can achieve full hardware efficiency such that the required number of adders can be reduced by half. Next, in order to generate the normal output order sequence, this paper also presents a sequence conversion method by integrating the conversion function into the last-stage data commutator module.

Proceedings Article•DOI•

Realizing biological spiking network models in a configurable wafer-scale hardware system

J. Fieres¹, Johannes Schemmel¹, Karlheinz Meier¹•Institutions (1)

Heidelberg University¹

01 Jun 2008

TL;DR: This paper focuses on the usability of the analog VLSI hardware architecture for the distributed simulation of large-scale spiking neural networks by demonstrating that biologically relevant network models can in fact be mapped to this system.

Abstract: An analog VLSI hardware architecture for the distributed simulation of large-scale spiking neural networks has been developed. Several hundred integrated computing nodes, each hosting up to 512 neurons, will be interconnected and operated on un-cut silicon wafers. The electro-technical aspects and the details of the hardware implementation are covered in a separate contribution to this conference. This paper focuses on the usability of the system by demonstrating that biologically relevant network models can in fact be mapped to this system. Different network configurations are established on the hardware by programmable switch matrices, repeaters, and address decoders. Systematic routing algorithms are presented to map a given network model to the hardware system. Routing is simulated for several network examples, proving the system's practical applicability. Furthermore, the routing simulations are used to fix values for yet open hardware parameters.

Journal Article•DOI•

Hardware Implementation Trade-Offs of Polynomial Approximations and Interpolations

Dong-U Lee, Ray C. C. Cheung, Wayne Luk¹, J.D. Villasenor²•Institutions (2)

Imperial College London¹, University of California, Los Angeles²

01 May 2008-IEEE Transactions on Computers

TL;DR: The results show that the extent of memory savings realized by using interpolation is significantly lower than what is commonly believed, and the availability of both interpolation-based and approximation-based designs offers a richer set of design trade-offs than what was available using either interpolation or approximation alone.

Abstract: This paper examines the hardware implementation trade-offs when evaluating functions via piecewise polynomial approximations and interpolations for precisions of up to 24 bits. In polynomial approximations, polynomials are evaluated using stored coefficients. Polynomial interpolations, however, require the coefficients to be computed on-the-fly by using stored function values. Although it is known that interpolations require less memory than approximations at the expense of additional computations, the trade-offs in memory, area, delay, and power consumption between the two approaches have not been examined in detail. This work quantitatively analyzes these trade-offs for optimized approximations and interpolations across different functions and target precisions. Hardware architectures for degree-1 and degree-2 approximations and interpolations are described. The results show that the extent of memory savings realized by using interpolation is significantly lower than what is commonly believed. Furthermore, experimental results on a field-programmable gate array (FPGA) show that, for high output precision, degree-1 interpolations offer considerable area and power savings over degree-1 approximations, but similar savings are not realized when degree-2 interpolations and approximations are compared. The availability of both interpolation-based and approximation-based designs offers a richer set of design trade-offs than what is available using either interpolation or approximation alone.
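The distinction drawn in the abstract above between stored coefficients and stored function values can be sketched for the degree-1 case as follows. This is only an illustration under simplifying assumptions: the domain is [0, 1), segments are uniform, and the stored coefficients are simple chord fits rather than the optimized approximations the paper analyzes. All function names are hypothetical.

```python
def build_tables(f, n_seg):
    """Build both lookup tables for f on [0, 1) with n_seg uniform segments."""
    values = [f(i / n_seg) for i in range(n_seg + 1)]  # n_seg + 1 stored words
    # Approximation table: (c0, c1) per segment, 2 * n_seg stored words.
    coeffs = [
        (values[i], (values[i + 1] - values[i]) * n_seg) for i in range(n_seg)
    ]
    return values, coeffs


def eval_approx(coeffs, x):
    """Degree-1 approximation: look up c0, c1 and evaluate c0 + c1 * dx."""
    n = len(coeffs)
    i = min(int(x * n), n - 1)
    c0, c1 = coeffs[i]
    return c0 + c1 * (x - i / n)


def eval_interp(values, x):
    """Degree-1 interpolation: the slope is computed on the fly from two
    stored function values, trading an extra subtraction for less memory."""
    n = len(values) - 1
    i = min(int(x * n), n - 1)
    t = x * n - i
    return values[i] + t * (values[i + 1] - values[i])
```

In this toy setup the interpolation table holds `n_seg + 1` words against `2 * n_seg` for the coefficient table, while the interpolation path needs one more arithmetic step per evaluation.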

Journal Article•DOI•

Evaluation of gunshot detection algorithms

Alfonso Chacon-Rodriguez, Pedro Julian, L Castro, P Alvarado, N Hernandez

07 Oct 2008

TL;DR: Five pre-processing algorithms for the detection of firearm gunshots are statistically evaluated, using the receiver operating characteristic method, as a previous feasibility metric for their implementation on a low power VLSI circuit.

Abstract: Six preprocessing algorithms for the detection of firearm gunshots are statistically evaluated, using the receiver operating characteristic method as a previous feasibility metric for their implementation on a low-power VLSI circuit. Circuits are intended to serve as the input detection sensors of a low-power environmental surveillance network. Some possible VLSI implementations for the evaluated algorithms are also evaluated. Results indicate that the use of wavelet bank filters, either discrete or continuous, might be the best choice in terms of the compromise between detection efficiency and the power requirements of the intended application.

Proceedings Article•DOI•

Using TMR Architectures for Yield Improvement

Julien Vial, Alberto Bosio, Patrick Girard, Christian Landrault, Serge Pravossoudovitch, Arnaud Virazel

01 Oct 2008

TL;DR: The classical triple modular redundancy (TMR) fault tolerant architecture is used as a case study and a new manner to implement the TMR architecture is proposed that makes it very effective for yield improvement purpose.

Abstract: With the technology entering the nano dimension, manufacturing processes are less and less reliable, thus drastically impacting the yield. A possible solution to alleviate this problem in the future could consist in using fault tolerant architectures to tolerate manufacturing defects. In this paper, we use the classical triple modular redundancy (TMR) fault tolerant architecture as a case study. Firstly we analyze the conditions that make the use of TMR architectures interesting for yield improvement purpose. In the second part of the paper, we investigate the test requirements for the TMR architecture and we propose a solution for generating test patterns for this type of architecture. Finally, we propose a new manner to implement the TMR architecture that makes it very effective for yield improvement purpose. Experimental results are provided on ISCAS and ITC benchmark circuits to prove the efficiency of the proposed approach in terms of yield improvement with a low area overhead.
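The classical TMR scheme referred to in the abstract above runs three copies of a module and takes a majority vote over their outputs. A minimal sketch of the voter on integer-encoded bit vectors follows; the function name `majority_vote` is hypothetical, and the sketch shows only the classical voter, not the new implementation the paper proposes.

```python
def majority_vote(a, b, c):
    """Bitwise 2-out-of-3 majority of three module outputs (as integers)."""
    return (a & b) | (a & c) | (b & c)


# One faulty replica (the third) is outvoted by the two agreeing ones.
voted = majority_vote(0b1010, 0b1010, 0b0110)
```

Each output bit is correct as long as at most one of the three replicas is wrong at that bit position.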

Proceedings Article•DOI•

The efficient VLSI design of BI-CUBIC convolution interpolation for digital image processing

Chung-chi Lin¹, Ming-Hwa Sheu¹, Huann-keng Chiang¹, Chishyan Liaw², Zeng-chuan Wu¹•Institutions (2)

National Yunlin University of Science and Technology¹, Tunghai University²

18 May 2008

TL;DR: The simulation results demonstrate that the high performance architecture of bi-cubic convolution interpolation at 279 MHz with 30643 gates in a 498×498 µm chip is able to process digital image scaling for HDTV in real-time.

Abstract: This paper presents an efficient VLSI design of bicubic convolution interpolation for digital image processing. An architecture that reduces the computational complexity of generating coefficients and decreases the number of memory accesses is proposed. Our proposed method provides a simple hardware architecture design, low computation cost and is easy to implement. Based on our technique, the high-speed VLSI architecture has been successfully designed and implemented with TSMC 0.13 µm standard cell library. The simulation results demonstrate that the high performance architecture of bi-cubic convolution interpolation at 279 MHz with 30643 gates in a 498×498 µm chip is able to process digital image scaling for HDTV in real-time.
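To make the "generating coefficients" step in the abstract above concrete, here is a sketch of the standard cubic convolution kernel and a 1-D interpolation that uses it; a 2-D image would typically apply it separably along rows and then columns. The kernel parameter a = -0.5 is a common choice and an assumption here, since the listing does not state what the paper uses, and the function names are hypothetical.

```python
def cubic_kernel(x, a=-0.5):
    """Cubic convolution kernel W(x); a = -0.5 is an assumed, common value."""
    x = abs(x)
    if x <= 1.0:
        return (a + 2.0) * x**3 - (a + 3.0) * x**2 + 1.0
    if x < 2.0:
        return a * x**3 - 5.0 * a * x**2 + 8.0 * a * x - 4.0 * a
    return 0.0


def bicubic_1d(p, t):
    """Interpolate between p[1] and p[2] at fractional offset t in [0, 1).

    p holds four consecutive samples at positions -1, 0, 1, 2.
    """
    weights = [cubic_kernel(t - k) for k in (-1, 0, 1, 2)]  # four coefficients
    return sum(w * s for w, s in zip(weights, p))
```

The four weights depend only on `t`, so a hardware design can generate them once per output position and reuse them across pixels that share that offset.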

Journal Article•DOI•

Body Bias Voltage Computations for Process and Temperature Compensation

Saurabh Kumar¹, Chris H. Kim¹, Sachin S. Sapatnekar¹•Institutions (1)

University of Minnesota¹

01 Mar 2008-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: This work provides a means to efficiently compute the body bias voltages required for ensuring high performance operation in gigascale systems and provides a computer-aided design (CAD) perspective for determining the exact amount of bias voltage that can compensate both temperature and process variations.

Abstract: With continued scaling into the sub-90-nm regime, the role of process, voltage, and temperature (PVT) variations on the performance of VLSI circuits has become extremely important. These variations can cause the delay and the leakage of the chip to vary significantly from their expected values, thereby affecting the yield. Circuit designers have proposed the use of threshold voltage modulation techniques to pull back the chip to the nominal operational region. One such scheme, known as adaptive body bias (ABB), has become extremely effective in ensuring optimal performance or leakage savings. Our work provides a means to efficiently compute the body bias voltages required for ensuring high performance operation in gigascale systems. We provide a computer-aided design (CAD) perspective for determining the exact amount of bias voltages that can compensate both temperature and process variations. Mathematical models for delay and leakage based on minimal tester measurements are built, and a nonlinear optimization problem is formulated to ensure highest frequency operation under all conditions, and thereby minimize the overall circuit leakage. Three different algorithms are presented and their accuracies and runtimes are compared. The algorithms have been applied to a wide range of process and temperature corners, for a 65- and 45-nm technology node-based process. A suitable implementation mechanism has also been outlined.

Proceedings Article•DOI•

NBTI resilient circuits using adaptive body biasing

Zhenyu Qi¹, Mircea R. Stan¹•Institutions (1)

University of Virginia¹

04 May 2008

TL;DR: The proposed reliability monitor not only tracks the NBTI effect but also mitigates the degradation by forward biasing the PMOS.

Abstract: Reliability has become a practical concern in today's VLSI design with advanced technologies. In-situ sensors have been proposed for reliability monitoring to provide advance warnings before system errors occur. This paper presents a reliability monitor design for NBTI (Negative Bias Temperature Instability). NBTI is recognized as very critical as it leads to short device lifetime. The proposed reliability monitor not only tracks the NBTI effect but also mitigates the degradation by forward biasing the PMOS. A worst case scenario static stress experiment demonstrates two orders of magnitude improvement in system lifetime using PTM 65nm technology. A ring oscillator example shows how frequency degradation can be compensated. Deployment of the proposed NBTI monitor is also discussed and two compatible strategies are provided to incorporate these monitors efficiently: the first focuses on low area overhead while the second features low power.

Journal Article•DOI•

A high-performance reconfigurable VLSI architecture for vbsme in H.264

Cao Wei¹, Hou Hui¹, Tong Jia-rong¹, Lai Jinmei¹, Min Hao¹•Institutions (1)

Fudan University¹

01 Aug 2008-IEEE Transactions on Consumer Electronics

TL;DR: A new high-performance reconfigurable VLSI architecture is proposed that supports a "meander"-like scan format for high data reuse of the search area, in order to increase the hardware utilization for VBSME with FSBMA.

Abstract: VBSME (variable block size motion estimation) is adopted in the MPEG-4 AVC/H.264 standard. In order to increase the hardware utilization for VBSME with FSBMA (full search block matching algorithm), this paper proposes a new high-performance reconfigurable VLSI architecture to support a "meander"-like scan format for a high data reuse of the search area. The architecture can support the three data flows of the scan format through a reconfigurable computing array and a memory of the search area. The computing array can achieve 100% processing element (PE) utilization and can reuse the smaller blocks' SADs to calculate 41 motion vectors (MVs) of a 16×16 block in parallel. The design is implemented with TSMC 0.18 µm CMOS technology. Under a clock frequency of 180 MHz, the architecture allows the real-time processing of 1280×720 at 45 fps in a search range [-16, +16].
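The reuse of smaller blocks' SADs mentioned in the abstract above can be sketched as follows: the sixteen 4×4 SADs of a 16×16 macroblock are summed into the SADs of every larger H.264 partition, giving 16 + 8 + 8 + 4 + 2 + 2 + 1 = 41 values for one candidate position. The function name `combine_sads` and the width-by-height key names are assumptions for this example; the sketch says nothing about the paper's array organization or scan format.

```python
def combine_sads(s4):
    """Derive all 41 partition SADs from a 4x4 grid of 4x4-block SADs.

    s4[r][c] is the SAD of the 4x4 sub-block in row r, column c.
    Keys are written width x height (a naming assumption).
    """
    sads = {"4x4": [s4[r][c] for r in range(4) for c in range(4)]}
    sads["8x4"] = [s4[r][c] + s4[r][c + 1] for r in range(4) for c in (0, 2)]
    sads["4x8"] = [s4[r][c] + s4[r + 1][c] for r in (0, 2) for c in range(4)]
    # 8x8 SADs as a 2x2 grid, each the sum of four neighbouring 4x4 SADs.
    s8 = [
        [s4[r][c] + s4[r][c + 1] + s4[r + 1][c] + s4[r + 1][c + 1]
         for c in (0, 2)]
        for r in (0, 2)
    ]
    sads["8x8"] = [v for row in s8 for v in row]
    sads["16x8"] = [s8[0][0] + s8[0][1], s8[1][0] + s8[1][1]]  # top, bottom
    sads["8x16"] = [s8[0][0] + s8[1][0], s8[0][1] + s8[1][1]]  # left, right
    sads["16x16"] = [sum(sads["8x8"])]
    return sads
```

Because every larger SAD is a sum of already computed smaller ones, only the 4×4 SADs require pixel-level work; the remaining 25 values need additions alone.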

Journal Article•DOI•

Low-Complexity Link Microarchitecture for Mesochronous Communication in Networks-on-Chip

Francesco Vitullo¹, Nicola E. L'Insalata¹, Esa Petri¹, Sergio Saponara¹, Luca Fanucci¹, Michele Casula¹, Riccardo Locatelli², Marcello Coppola²•Institutions (2)

University of Pisa¹, STMicroelectronics²

01 Sep 2008-IEEE Transactions on Computers

TL;DR: This work presents a low-complexity link microarchitecture for mesochronous on-chip communication that enables skew constraint looseness in the clock tree synthesis, frequency speed-up, power consumption reduction and faster back-end turnarounds.

Abstract: Clock distribution is an important issue when designing multiprocessor systems-on-chip on deep sub-micron technology nodes, and non-synchronous approaches are becoming popular in this field. This work presents a low-complexity link microarchitecture for mesochronous on-chip communication that enables skew constraint looseness in the clock tree synthesis, frequency speed-up, power consumption reduction and faster back-end turnarounds. With respect to the state of the art, the proposed link architecture stands out for its low power and low complexity overheads; moreover, it can be easily integrated in a conventional digital design flow since it is implemented by means of standard cells only. Results are presented referring to the link integrated within a multiprocessor tiled architecture based on a network-on-chip communication backbone on a CMOS 65 nm technology.

Journal Article•DOI•

Superpipelined high-performance optical-flow computation architecture

Javier Diaz¹, Eduardo Ros¹, Rodrigo Agís¹, José Luis Bernier¹•Institutions (1)

University of Granada¹

01 Dec 2008-Computer Vision and Image Understanding

TL;DR: This work describes a novel superpipelined, fully parallelized architecture for optical-flow processing, which is capable of processing up to 170 frames per second at a resolution of 800x600 pixels, and discusses the advantages of high-frame-rate processing.

Journal Article•DOI•

A Speed-Optimized Systolic Array Processor Architecture for Spatio-Temporal 2-D IIR Broadband Beam Filters

H.L.P. Arjuna Madanayake¹, Len T. Bruton¹•Institutions (1)

University of Calgary¹

08 Feb 2008-IEEE Transactions on Circuits and Systems I: Regular Papers

TL;DR: A novel high-speed systolic array architecture for a first-order 2-D broadband frequency-planar spatio-temporal beam filter is proposed and employs a field-programmable gate array (FPGA) circuit where the critical path latency is minimized by timing optimization of inter- and intra-parallel processor pipelines, together with 3-D look-ahead techniques.

Abstract: For high-speed plane-wave filtering applications, real-time 2-D spatio-temporal linear-array broadband beam filters are required, operating at temporal frame rates in excess of hundreds of megahertz. The corresponding application specific VLSI circuits must have low critical-path latencies. A novel high-speed systolic array architecture for a first-order 2-D broadband frequency-planar spatio-temporal beam filter is proposed for this purpose and employs a field-programmable gate array (FPGA) circuit where the critical path latency is minimized by timing optimization of inter- and intra-parallel processor pipelines, together with 3-D look-ahead techniques. The method facilitates single-chip VLSI circuit implementations operating at real-time frame rates of several hundred megahertz.

Journal Article•DOI•

A Power-Delay Efficient Hybrid Carry-Lookahead/Carry-Select Based Redundant Binary to Two's Complement Converter

Yajuan He¹, Chip-Hong Chang¹•Institutions (1)

Nanyang Technological University¹

14 Mar 2008-IEEE Transactions on Circuits and Systems I: Regular Papers

TL;DR: An efficient reverse converter for transforming the redundant binary representation into two's complement form that expends at least two times less energy than the competitor circuit and is capable of completing a 64-bit conversion in 829 ps and dissipates merely 5.84 mW.

Abstract: This paper presents an efficient reverse converter for transforming the redundant binary (RB) representation into two's complement form. The hierarchical expansion of the carry equation for the reverse conversion algorithm creates a regular multilevel structure, from which a high-speed hybrid carry-lookahead/carry-select (CLA/CSL) architecture is proposed to fully exploit the redundancy of RB encoding for VLSI-efficient implementation. The optimally designed CSL sections are interleaved evenly in the mixed-radix CLA network to boost the performance of the reverse converter well above that of converters based on a homogeneous type of carry propagation adder. The logical effort characterization captures the effect of the circuit's fan-in, fan-out and transistor sizing on performance, and the evaluation shows that our proposed architecture leads to the fastest design. A 64-bit transistor-level circuit implementation of our proposed reverse converter and that of its most competitive contender were simulated to validate the logical effort delay model. The pre- and post-layout HSPICE simulation results reveal that our new converter expends at least two times less energy (power-delay product) than the competitor circuit, completes a 64-bit conversion in 829 ps, and dissipates merely 5.84 mW at a data rate of 1 GHz and a supply voltage of 1.8 V in TSMC 0.18-μm CMOS technology.
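
For readers unfamiliar with reverse conversion, the arithmetic being accelerated can be shown in a few lines. This is a behavioural sketch only, assuming the common signed-digit set {-1, 0, 1} and hypothetical function names; it does not model the hybrid CLA/CSL carry network described in the abstract. The RB number is split into a word of positive digits and a word of negative digits, and the two's complement result is their difference. The borrow chain of that subtraction is the carry propagation that a hardware converter has to speed up.

```python
def rb_to_twos_complement(digits, width):
    """Convert redundant binary digits (LSB first, each -1, 0 or 1)
    into a `width`-bit two's complement word, returned as an unsigned int."""
    plus = 0   # bit i is set where digit i is +1
    minus = 0  # bit i is set where digit i is -1
    for i, d in enumerate(digits):
        if d not in (-1, 0, 1):
            raise ValueError(f"digit {i} is {d}, expected -1, 0 or 1")
        if d == 1:
            plus |= 1 << i
        elif d == -1:
            minus |= 1 << i
    # Value = plus - minus; masking wraps negatives into two's complement form.
    return (plus - minus) & ((1 << width) - 1)

def to_signed(word, width):
    """Interpret a `width`-bit two's complement word as a Python int."""
    return word - (1 << width) if word & (1 << (width - 1)) else word

if __name__ == "__main__":
    digits = [1, 0, -1]  # 1*1 + 0*2 + (-1)*4
    word = rb_to_twos_complement(digits, 8)
    print(f"{word:08b}", to_signed(word, 8))
```

The same value has several RB encodings, for example `[1, -1, 0, 1]` and `[1, 1, 1]` both represent 7, whereas the two's complement output is unique.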

Proceedings Article•DOI•

Guaranteed stable projection-based model reduction for indefinite and unstable linear systems

Bradley N. Bond¹, Luca Daniel¹•Institutions (1)

Massachusetts Institute of Technology¹

10 Nov 2008

TL;DR: This work presents a stability-preserving projection framework for model reduction of linear systems that can create accurate stable and passive models of arbitrary indefinite systems at a significantly cheaper cost than existing methods such as balanced truncation.

Abstract: In this work we present a stability-preserving projection framework for model reduction of linear systems. Specifically, given one projection matrix (e.g. a right-projection matrix), we derive a set of linear constraints for the other projection matrix (e.g. the left-projection matrix) resulting in a projection framework that is guaranteed to generate a stable reduced model. Several efficient techniques for solving the proposed system of constraints are presented, including an optimization problem formulation for finding the optimal stabilizing projection, and a formulation with computational complexity independent of the size of the original system. The resulting algorithms can create accurate stable and passive models of arbitrary indefinite systems at a significantly cheaper cost than existing methods such as balanced truncation. Nevertheless, our algorithms integrate fully and effortlessly with most of the available standard model order reduction approaches for very large systems generated in VLSI applications (such as moment-matching methods, POD, or poor man's TBR), which can guarantee stability and passivity only in very specialized cases. Our algorithms have been tested on a large variety of typical VLSI applications, including field-solver-extracted models of RF inductors for analog applications, power distribution grids for large VLSI digital integrated circuits, and MEMS devices for sensing and actuation applications. The results have been successfully compared to those from existing and much more expensive stabilizing reduction techniques.

Journal Article•DOI•

Liquid crystal holographic configurations for optically reconfigurable gate arrays

Naoki Yamaguchi¹, Minoru Watanabe¹•Institutions (1)

Shizuoka University¹

10 Sep 2008-Applied Optics

TL;DR: An optically reconfigurable gate array (ORGA) system, which consists of an ORGA very large scale integration (VLSI), an easily rewritable liquid crystal holographic memory recording four configuration contexts, and a laser array, is proposed.

Abstract: An optically reconfigurable gate array (ORGA) system, which consists of an ORGA very large scale integration (VLSI), an easily rewritable liquid crystal holographic memory recording four configuration contexts, and a laser array, is proposed. Circuits on a gate array of the ORGA-VLSI can be programmed rapidly by exploiting large parallel connections between a holographic memory and a gate array VLSI; that programming can be executed even as it is being programmed. Consequently, the gate array can be switched from a certain circuit to another circuit instantaneously. We present a demonstration of the ORGA system and experimental results.

Journal Article•DOI•

Fast Low-Cost Implementation of Single-Clock-Cycle Binary Comparator

Stefania Perri¹, Pasquale Corsonello¹•Institutions (1)

University of Calabria¹

22 Dec 2008-IEEE Transactions on Circuits and Systems II: Express Briefs

TL;DR: Comparison with the fastest comparator known in the literature demonstrates that, in the same technology, the novel architecture is ~12% faster and requires ~69% fewer transistors.

Abstract: This paper presents a new efficient architecture for the design of fast low-cost single-clock-cycle binary comparators. The proposed 64-bit circuit requires only 1051 transistors and, when implemented in the ST 90-nm 1-V CMOS technology, it exhibits a running frequency higher than 4 GHz with an average power dissipation of only ~4 mW. Comparison with the fastest comparator known in the literature demonstrates that, in the same technology, the novel architecture is ~12% faster and requires ~69% fewer transistors.

Journal Article•DOI•

Monolithic Integration of CMOS VLSI and Carbon Nanotubes for Hybrid Nanotechnology Applications

Deji Akinwande¹, Shinichi Yasuda², Bipul C. Paul², Shinobu Fujita², G.F. Close¹, Hon-Sum Philip Wong¹•Institutions (2)

Stanford University¹, Toshiba²

01 Sep 2008-IEEE Transactions on Nanotechnology

TL;DR: In this paper, the authors integrate carbon nanotube fabrication with standard commercial CMOS very large scale integration on a single substrate suitable for emerging hybrid nanotechnology applications, such as optical, biological, chemical, and gas sensors.

Abstract: We integrate carbon nanotube (CNT) fabrication with standard commercial CMOS very large scale integration on a single substrate suitable for emerging hybrid nanotechnology applications. This cointegration combines the inherent advantages of CMOS and CNTs. These emerging applications include CNT optical, biological, chemical, and gas sensors that require complex CMOS electronics for sensor control, calibration, and signal processing. We demonstrate the successful cointegration on a single chip with a vehicle circuit, a two-transistor cascode megahertz amplifier utilizing both silicon n-channel MOSFET and CNT transistors with a total power consumption of 62.5 μW.

Journal Article•DOI•

Low-Power VLSI Implementation of the Inner Receiver for OFDM-Based WLAN Systems

Alfonso Troya, Koushik Maharatna¹, Milos Krstic¹, Eckhard Grass¹, Ulrich Jagdhold¹, Rolf Kraemer¹•Institutions (1)

Innovations for High Performance Microelectronics¹

14 Mar 2008-IEEE Transactions on Circuits and Systems I: Regular Papers

TL;DR: Low-power designs for the synchronizer and channel estimator units of the Inner Receiver in wireless local area network systems are proposed and the use of multiple clock domains and clock gating reduces the power consumption.

Abstract: In this paper, we propose low-power designs for the synchronizer and channel estimator units of the Inner Receiver in wireless local area network systems. The objective of the work is the optimization, with respect to power, area, and latency, of both the signal processing algorithms themselves and their implementation. Novel circuit design strategies have been employed to realize optimal hardware and power-efficient architectures for the fast Fourier transform, arc tangent computation unit, numerically controlled oscillator, and the decimation filters. The use of multiple clock domains and clock gating reduces the power consumption further. These blocks have been integrated into an experimental digital baseband processor for the IEEE 802.11a standard implemented in the 0.25-μm, 5-metal-layer BiCMOS technology from the Institute for High Performance Microelectronics.
