Showing papers on "Very-large-scale integration" published in 2009

Open Access

Journal Article•DOI•

Networks-on-Chip in a Three-Dimensional Environment: A Performance Evaluation

B. Feero, Partha Pratim Pande¹•Institutions (1)

01 Jan 2009-IEEE Transactions on Computers

TL;DR: 3D NoC architectures are evaluated and demonstrate their superior functionality in terms of throughput, latency, energy dissipation and wiring area overhead compared to traditional 2D implementations.

Abstract: The Network-on-Chip (NoC) paradigm has emerged as a revolutionary methodology for integrating a very high number of intellectual property (IP) blocks in a single die. The achievable performance benefit arising out of adopting NoCs is constrained by the performance limitation imposed by the metal wire, which is the physical realization of communication channels. With technology scaling, relying on material innovation alone will extend the lifetime of conventional interconnect systems by only a few technology generations. According to the International Technology Roadmap for Semiconductors (ITRS), new interconnect paradigms are needed for the longer term. The conventional two-dimensional (2D) integrated circuit (IC) has limited floor-planning choices, and consequently it limits the performance enhancements arising out of NoC architectures. Three-dimensional (3D) ICs are capable of achieving better performance, functionality, and packaging density compared to more traditional planar ICs. On the other hand, NoC is an enabling solution for integrating large numbers of embedded cores in a single die. 3D NoC architectures combine the benefits of these two new domains to offer an unprecedented performance gain. In this paper we evaluate the performance of 3D NoC architectures and demonstrate their superior functionality in terms of throughput, latency, energy dissipation and wiring area overhead compared to traditional 2D implementations.

474 citations

Journal Article•DOI•

Real-Time Classification of Complex Patterns Using Spike-Based Learning in Neuromorphic VLSI

Srinjoy Mitra¹, Stefano Fusi¹, Giacomo Indiveri¹•Institutions (1)

University of Zurich¹

01 Feb 2009-IEEE Transactions on Biomedical Circuits and Systems

TL;DR: Experimental data are presented that demonstrate how the VLSI neural network can learn to classify patterns of neural activities, even when they are highly correlated.

Abstract: Real-time classification of patterns of spike trains is a difficult computational problem that both natural and artificial networks of spiking neurons are confronted with. The solution to this problem not only could contribute to understanding the fundamental mechanisms of computation used in the biological brain, but could also lead to efficient hardware implementations of a wide range of applications ranging from autonomous sensory-motor systems to brain-machine interfaces. Here we demonstrate real-time classification of complex patterns of mean firing rates, using a VLSI network of spiking neurons and dynamic synapses which implement a robust spike-driven plasticity mechanism. The learning rule implemented is a supervised one: a teacher signal provides the output neuron with an extra input spike-train during training, in parallel to the spike-trains that represent the input pattern. The teacher signal simply indicates if the neuron should respond to the input pattern with a high rate or with a low one. The learning mechanism modifies the synaptic weights only as long as the current generated by all the stimulated plastic synapses does not match the output desired by the teacher, as in the perceptron learning rule. We describe the implementation of this learning mechanism and present experimental data that demonstrate how the VLSI neural network can learn to classify patterns of neural activities, also in the case in which they are highly correlated.
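
The abstract compares its teacher-gated update to the perceptron learning rule. The sketch below shows only that classic software rule, not the chip's spike-driven circuit. The function names, the unit learning rate, and the AND-gate training set are assumptions chosen for illustration.

```python
def predict(weights, bias, x):
    """Return 1 if the weighted input exceeds the threshold, else 0."""
    total = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if total > 0 else 0


def train_perceptron(samples, epochs=100, lr=1.0):
    """Perceptron rule: weights change only when the output disagrees with the teacher."""
    n_inputs = len(samples[0][0])
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, target in samples:
            error = target - predict(weights, bias, x)
            if error != 0:  # update only on a mismatch with the teacher signal
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
                mistakes += 1
        if mistakes == 0:  # every pattern is already classified as the teacher wants
            break
    return weights, bias


# Hypothetical training set: the logical AND of two binary inputs.
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(AND_SAMPLES)
```

Once every pattern matches its target, no further updates occur, which mirrors the "only as long as the output does not match" condition in the abstract.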

228 citations

Book•

Electronic Design Automation: Synthesis, Verification, and Test

Laung-Terng Wang, Yao-Wen Chang, Kwang-Ting Cheng¹•Institutions (1)

University of California, Santa Barbara¹

11 Mar 2009

TL;DR: EDA/VLSI practitioners and researchers in need of fluency in an "adjacent" field will find this an invaluable reference to the basic EDA concepts, principles, data structures, algorithms, and architectures for the design, verification, and test of VLSI circuits.

Abstract: This book provides broad and comprehensive coverage of the entire EDA flow. EDA/VLSI practitioners and researchers in need of fluency in an "adjacent" field will find this an invaluable reference to the basic EDA concepts, principles, data structures, algorithms, and architectures for the design, verification, and test of VLSI circuits. Anyone who needs to learn the concepts, principles, data structures, algorithms, and architectures of the EDA flow will benefit from this book.

Covers complete spectrum of the EDA flow, from ESL design modeling to logic/test synthesis, verification, physical design, and test - helps EDA newcomers to get "up-and-running" quickly.

Includes comprehensive coverage of EDA concepts, principles, data structures, algorithms, and architectures - helps all readers improve their VLSI design competence.

Contains latest advancements not yet available in other books, including test compression, ESL design modeling, large-scale floorplanning, placement, routing, synthesis of clock and power/ground networks - helps readers to design/develop testable chips or products.

Includes industry best-practices wherever appropriate in most chapters - helps readers avoid costly mistakes.

Table of Contents
Chapter 1: Introduction
Chapter 2: Fundamentals of CMOS Design
Chapter 3: Design for Testability
Chapter 4: Fundamentals of Algorithms
Chapter 5: Electronic System-Level Design and High-Level Synthesis
Chapter 6: Logic Synthesis in a Nutshell
Chapter 7: Test Synthesis
Chapter 8: Logic and Circuit Simulation
Chapter 9: Functional Verification
Chapter 10: Floorplanning
Chapter 11: Placement
Chapter 12: Global and Detailed Routing
Chapter 13: Synthesis of Clock and Power/Ground Networks
Chapter 14: Fault Simulation and Test Generation

200 citations

An enhanced low-power high-speed Adder For Error-Tolerant application

Ning Zhu¹, Wang Ling Goh¹, Kiat Seng Yeo¹•Institutions (1)

Nanyang Technological University¹

01 Dec 2009

TL;DR: A novel Error-Tolerant Adder, the Type II (ETAII), is proposed; by relaxing the strict restriction on accuracy, it achieves more than 60% improvement in Power-Delay Product over its conventional counterparts.

Abstract: The occurrence of errors is inevitable in modern VLSI technology, and overcoming all possible errors is an expensive task. It not only consumes a lot of power but also degrades the speed performance. By adopting an emerging concept in VLSI design and test, Error-Tolerance (ET), we managed to develop a novel Error-Tolerant Adder, which we named the Type II (ETAII). By easing the strict restriction on accuracy to some extent, the circuit achieves tremendous improvements in both power consumption and speed performance. When compared to its conventional counterparts, the proposed ETAII is able to achieve more than 60% improvement in the Power-Delay Product (PDP). The proposed ETAII is an enhancement of our earlier design, the ETAI, which has problems adding small-number inputs.
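
The abstract does not describe the ETAII circuit itself, so the following is only a generic illustration of trading accuracy for a shorter carry chain. It assumes a segmented-carry scheme in which each block's carry-in is predicted from the previous block's operand bits alone. The function name `segmented_add` and the 16-bit/4-bit sizes are hypothetical and are not taken from the paper.

```python
def segmented_add(a, b, width=16, block=4):
    """Approximate adder: the carry into each block depends only on the previous block.

    Limiting carry propagation to one block shortens the critical path,
    at the cost of occasional wrong results when a carry should ripple further.
    """
    mask = (1 << block) - 1
    result = 0
    carry_in = 0
    for shift in range(0, width, block):
        xa = (a >> shift) & mask
        xb = (b >> shift) & mask
        block_sum = xa + xb + carry_in
        result |= (block_sum & mask) << shift
        # Predict the next carry from this block's operands only,
        # deliberately ignoring the carry that entered this block.
        carry_in = (xa + xb) >> block
    return result


exact_case = segmented_add(0x0F, 0x01)     # carry crosses one block boundary: exact
inexact_case = segmented_add(0x00FF, 0x0001)  # carry must ripple across two blocks: lost
```

In this sketch a carry that crosses a single block boundary is handled exactly, while a carry that would have to ripple through a whole block is dropped, which is the kind of bounded error an error-tolerant application is assumed to accept.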

173 citations

Proceedings Article•DOI•

A current-mode conductance-based silicon neuron for address-event neuromorphic systems

Paolo Livi¹, Giacomo Indiveri²•Institutions (2)

ETH Zurich¹, University of Zurich²

24 May 2009

TL;DR: This work presents a current-mode conductance-based neuron circuit, with spike-frequency adaptation, refractory period, and bio-physically realistic dynamics which is compact, low-power and compatible with fast asynchronous digital circuits.

Abstract: Silicon neuron circuits emulate the electrophysiological behavior of real neurons. Many circuits can be integrated on a single Very Large Scale Integration (VLSI) device, and form large networks of spiking neurons. Connectivity among neurons can be achieved by using time multiplexing and fast asynchronous digital circuits. As the basic characteristics of the silicon neurons are determined at design time, and cannot be changed after the chip is fabricated, it is crucial to implement a circuit which represents an accurate model of real neurons, but at the same time is compact, low-power and compatible with asynchronous logic. Here we present a current-mode conductance-based neuron circuit, with spike-frequency adaptation, refractory period, and bio-physically realistic dynamics which is compact, low-power and compatible with fast asynchronous digital circuits.

128 citations

Journal Article•DOI•

High-Throughput Layered LDPC Decoding Architecture

Zhiqiang Cui¹, Zhongfeng Wang¹, Youjian Liu²•Institutions (2)

Oregon State University¹, University of Colorado Boulder²

01 Apr 2009-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: This paper presents a high-throughput decoder architecture for generic quasi-cyclic low-density parity-check (QC-LDPC) codes and an approximate layered decoding approach is explored to reduce the critical path of the layered LDPC decoder.

Abstract: This paper presents a high-throughput decoder architecture for generic quasi-cyclic low-density parity-check (QC-LDPC) codes. Various optimizations are employed to increase the clock speed. A row permutation scheme is proposed to significantly simplify the implementation of the shuffle network in LDPC decoder. An approximate layered decoding approach is explored to reduce the critical path of the layered LDPC decoder. The computation core is further optimized to reduce the computation delay. It is estimated that 4.7 Gb/s decoding throughput can be achieved at 15 iterations using the current technology.
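
For readers unfamiliar with layered decoding, the sketch below shows a plain layered min-sum schedule in software. It is not the paper's approximate layered approach or its hardware architecture. The function name, the use of min-sum check updates, and the small Hamming(7,4) parity-check matrix are all assumptions made for illustration.

```python
def layered_min_sum(H, llr, iterations=5):
    """Layered min-sum decoding: each parity-check row is one layer.

    Posterior LLRs are updated immediately after every layer, so later
    layers in the same iteration already see the refreshed values.
    """
    rows = [[n for n, bit in enumerate(row) if bit] for row in H]
    check_msgs = [{n: 0.0 for n in cols} for cols in rows]
    posterior = list(llr)
    for _ in range(iterations):
        for m, cols in enumerate(rows):
            # Remove this layer's old contribution to get variable-to-check values.
            q = {n: posterior[n] - check_msgs[m][n] for n in cols}
            for n in cols:
                others = [q[k] for k in cols if k != n]
                negatives = sum(1 for v in others if v < 0)
                sign = -1.0 if negatives % 2 else 1.0
                new_msg = sign * min(abs(v) for v in others)
                check_msgs[m][n] = new_msg
                posterior[n] = q[n] + new_msg
    return [1 if v < 0 else 0 for v in posterior]


# Hypothetical example: Hamming(7,4) parity checks, all-zero codeword,
# with bit 0 received unreliably on the wrong side (negative LLR).
H_EXAMPLE = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]
decoded = layered_min_sum(H_EXAMPLE, [-1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0])
```

The dependency between consecutive layers (each needs the posterior values written by the previous one) is what makes the critical path long in hardware, which is the motivation the abstract gives for an approximate variant.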

125 citations

Book•DOI•

Embedded Memories for Nano-Scale VLSIs

Kevin Zhang

08 May 2009

TL;DR: This book provides readers a broad knowledge on the entire embedded memory technologies in order to better comprehend the technologies and create optimal memory solutions in real applications.

Abstract: The book provides a comprehensive and in-depth view on the state-of-the-art embedded memory technologies. The book helps practicing engineers grasp key technology attributes and advanced design techniques in nano-scale VLSI design. It also helps them make decisions concerning the right design tradeoffs in real product development. This book first provides an overview on the landscape and trend of embedded memory in various VLSI system designs, including high-performance microprocessor, low-power mobile handheld devices, micro-controllers, and various consumer electronics. It then shows an in-depth view on each different type of embedded memory technology, including high-speed SRAM, ultra-low-voltage and alternative SRAM, embedded DRAM, embedded nonvolatile memory, and emerging or so-called universal memories such as FeRAM, MRAM, and PRAM. Each topic covers all the key technology attributes from a product application perspective, ranging from technology scaling challenges to advanced circuit techniques for achieving optimal design tradeoff in performance and power. As VLSI systems become increasingly dependent on on-die memory to provide adequate memory bandwidth for various applications, the book gives readers a broader view of this important field and helps them to achieve their optimal design goals for different applications. This book provides readers a broad knowledge on the entire embedded memory technologies in order to better comprehend the technologies and create optimal memory solutions in real applications.

94 citations

Area and energy efficient VLSI architectures for low-density parity-check decoders using an on-the-fly computation

Kiran Gunnam

15 May 2009

TL;DR: This dissertation presents the decoder architectures for regular and irregular LDPC codes that provide substantial gains over existing academic and commercial implementations and utilize an on-the-fly computation paradigm which permits scheduling of the computations in a way that the memory requirements and re-computations are reduced.

Abstract: Area and Energy Efficient VLSI Architectures for Low-Density Parity-Check Decoders Using an On-the-Fly Computation. (December 2006) Kiran Kumar Gunnam, M.S., Texas A&M University. Co-Chairs of Advisory Committee: Dr. Gwan Choi, Dr. Scott Miller.

The VLSI implementation complexity of a low density parity check (LDPC) decoder is largely influenced by the interconnect and the storage requirements. This dissertation presents the decoder architectures for regular and irregular LDPC codes that provide substantial gains over existing academic and commercial implementations. Several structured properties of LDPC codes and decoding algorithms are observed and are used to construct hardware implementation with reduced processing complexity. The proposed architectures utilize an on-the-fly computation paradigm which permits scheduling of the computations in a way that the memory requirements and re-computations are reduced. Using this paradigm, the run-time configurable and multi-rate VLSI architectures for the rate compatible array LDPC codes and irregular block LDPC codes are designed. Rate compatible array codes are considered for DSL applications. Irregular block LDPC codes are proposed for IEEE 802.16e, IEEE 802.11n, and IEEE 802.20. When compared with a recent implementation of an 802.11n LDPC decoder, the proposed decoder reduces the logic complexity by 6.45x and memory complexity by 2x for a given data throughput. When compared to the latest reported multi-rate decoders, this decoder design has an area

85 citations

Journal Article•DOI•

HDTV1080p H.264/AVC Encoder Chip Design and Performance Analysis

Zhenyu Liu¹, Yang Song², Ming Shao, Shen Li³, Lingfeng Li, Shunichi Ishiwata³, Michio Nakagawa³, Satoshi Goto¹, Takeshi Ikenaga¹•Institutions (3)

Waseda University¹, Fujitsu², Toshiba³

27 Jan 2009-IEEE Journal of Solid-State Circuits

TL;DR: An H.264/AVC baseline-profile real-time encoder for HDTV-1080p at 30 fps is proposed in this paper and the design considerations for chief components, including high throughput integer motion estimation, data reusing fractional motion estimation, and hardware friendly mode reduction for intra prediction are described.

Abstract: An H.264/AVC baseline-profile real-time encoder for HDTV-1080p at 30 fps is proposed in this paper. On the basis of the specifications and algorithm optimizations, the dedicated hardware engines and one 32-bit media embedded processor (MeP) equipped with hardware extensions are mapped into the three-stage macroblock pipelining system architecture. This paper describes the design considerations for chief components, including high throughput integer motion estimation, data reusing fractional motion estimation, and hardware friendly mode reduction for intra prediction. The 11.5 Gbps 64 Mb system-in-silicon DRAM is embedded to alleviate the external memory bandwidth. Using TSMC one-poly six-metal 0.18 µm CMOS technology, the prototype chip is implemented with 1140 k logic gates and 108.3 KB internal SRAM. The SoC core occupies 27.1 mm² die area and consumes 1.41 W at 200 MHz execution speed in typical work conditions.

82 citations

Proceedings Article•DOI•

Fast circuit simulation on graphics processing units

Kanupriya Gulati¹, John F. Croix, Sunil P. Khatri¹, Rahm Shastry•Institutions (1)

Texas A&M University¹

19 Jan 2009

TL;DR: This paper reports on early efforts to accelerate transistor model evaluations using a Graphics Processing Unit (GPU), integrates the accelerator with a commercial fast SPICE tool, and demonstrates that significant speedups can be obtained.

Abstract: SPICE based circuit simulation is a traditional workhorse in the VLSI design process. Given the pivotal role of SPICE in the IC design flow, there has been significant interest in accelerating SPICE. Since a large fraction (on average 75%) of the SPICE runtime is spent in evaluating transistor model equations, a significant speedup can be availed if these evaluations are accelerated. This paper reports on our early efforts to accelerate transistor model evaluations using a Graphics Processing Unit (GPU). We have integrated this accelerator with a commercial fast SPICE tool. Our experiments demonstrate that significant speedups (2.36× on average) can be obtained. The asymptotic speedup that can be obtained is about 4×. We demonstrate that with circuits consisting of as few as about 1000 transistors, speedups in the neighborhood of this asymptotic value can be obtained. By utilizing the recently announced (but not currently available) quad GPU systems, this speedup could be enhanced further, especially for larger designs.
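
The 75% runtime fraction and the roughly 4× asymptotic speedup quoted above are consistent with Amdahl's law. The sketch below works through that arithmetic under the assumption that only the model-evaluation portion is accelerated and that data-transfer overhead is ignored. The function names are illustrative, and the implied kernel speedup is an inference from the quoted averages, not a figure reported in the abstract.

```python
def amdahl_speedup(fraction, kernel_speedup):
    """Overall speedup when `fraction` of the runtime is accelerated by `kernel_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / kernel_speedup)


def asymptotic_speedup(fraction):
    """Upper bound on overall speedup as the accelerated part becomes infinitely fast."""
    return 1.0 / (1.0 - fraction)


def implied_kernel_speedup(fraction, overall_speedup):
    """Invert Amdahl's law: kernel speedup needed to reach a given overall speedup."""
    return fraction / (1.0 / overall_speedup - (1.0 - fraction))


MODEL_EVAL_FRACTION = 0.75   # from the abstract: about 75% of SPICE runtime
OBSERVED_SPEEDUP = 2.36      # from the abstract: average overall speedup

bound = asymptotic_speedup(MODEL_EVAL_FRACTION)
kernel = implied_kernel_speedup(MODEL_EVAL_FRACTION, OBSERVED_SPEEDUP)
```

Under these assumptions the bound evaluates to 4×, matching the abstract, and an average overall speedup of 2.36× would correspond to the model evaluations themselves running roughly 4.3× faster.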

80 citations

Proceedings Article•DOI•

An ILP formulation for application mapping onto Network-on-Chips

Suleyman Tosun¹, Ozcan Ozturk², Meltem Ozen¹•Institutions (2)

Ankara University¹, Bilkent University²

31 Dec 2009

TL;DR: An Integer Linear Programming (ILP) formulation is presented for application mapping onto mesh-based Network-on-Chips to minimize the energy consumption of the system, and the impact of the size of the mesh architecture on the application mapping and total communication is investigated experimentally.

Abstract: Ever-shrinking technologies in the VLSI era have made it possible to place several modules onto a single die. However, the need for new communication methods has also increased dramatically, since traditional bus-based systems suffer from signal propagation delays, signal integrity problems, and poor scalability. Network-on-Chip (NoC) is the biggest step towards resolving the communication bottleneck of System-on-Chip (SoC) architectures. In this paper, we present an Integer Linear Programming (ILP) formulation for application mapping onto mesh-based Network-on-Chips to minimize the energy consumption of the system. The proposed method obtains optimal or close to optimal results within the given computation time limit. We also experimentally investigate the impact of the size of the mesh architecture on the application mapping and total communication.
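
To make the mapping objective concrete, the sketch below evaluates a communication cost for task-to-tile assignments on a small mesh and finds the minimum by exhaustive search. This is not the paper's ILP formulation: the cost model (traffic volume times Manhattan hop count as a proxy for energy), the function names, and the four-task traffic table are assumptions for illustration, and brute force is only feasible for very small meshes.

```python
from itertools import permutations


def manhattan(tile_a, tile_b, cols):
    """Hop count between two tiles of a mesh numbered row by row."""
    row_a, col_a = divmod(tile_a, cols)
    row_b, col_b = divmod(tile_b, cols)
    return abs(row_a - row_b) + abs(col_a - col_b)


def communication_cost(mapping, traffic, cols):
    """Sum of volume * hops over all communicating task pairs (assumed energy proxy)."""
    return sum(
        volume * manhattan(mapping[src], mapping[dst], cols)
        for (src, dst), volume in traffic.items()
    )


def best_mapping(n_tasks, traffic, rows, cols):
    """Try every assignment of tasks to distinct tiles and keep the cheapest."""
    tiles = range(rows * cols)
    best = min(
        permutations(tiles, n_tasks),
        key=lambda m: communication_cost(m, traffic, cols),
    )
    return best, communication_cost(best, traffic, cols)


# Hypothetical application graph: (source task, destination task) -> traffic volume.
TRAFFIC = {(0, 1): 10, (1, 2): 10, (2, 3): 10, (0, 3): 1}
mapping, cost = best_mapping(4, TRAFFIC, rows=2, cols=2)
```

An ILP solver explores the same search space with binary placement variables and linear constraints, which is what allows it to scale beyond the handful of tiles a permutation search can handle.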

Journal Article•DOI•

Probabilistic Analysis and Design of Metallic-Carbon-Nanotube-Tolerant Digital Logic Circuits

Jie Zhang¹, Nishant Patil¹, Subhasish Mitra¹•Institutions (1)

Stanford University¹

01 Sep 2009-IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

TL;DR: A probabilistic model is presented which incorporates processing and design parameters and enables quantitative analysis of the impact of metallic CNTs on leakage, noise margin, and delay variations of CNFET-based digital logic circuits and provides design and processing guidelines for very large scale integration (VLSI)-scale metallic-CNT-tolerant digital circuits.

Abstract: Metallic carbon nanotubes (CNTs) pose a major barrier to the design of digital logic circuits using CNT field-effect transistors (CNFETs). Metallic CNTs create source to drain shorts in CNFETs, resulting in undesirable effects such as excessive leakage and degraded noise margins. No known CNT growth technique guarantees 0% metallic CNTs. Therefore, special processing techniques are required for removing metallic CNTs after CNT growth. This paper presents a probabilistic model which incorporates processing and design parameters and enables quantitative analysis of the impact of metallic CNTs on leakage, noise margin, and delay variations of CNFET-based digital logic circuits. With practical constraints on these key circuit performance metrics, the model provides design and processing guidelines that are required for very large scale integration (VLSI)-scale metallic-CNT-tolerant digital circuits.

Journal Article•DOI•

VLSI Implementation of an Edge-Oriented Image Scaling Processor

Pei-Yin Chen¹, Chih-Yuan Lien¹, Chi-Pin Lu¹•Institutions (1)

National Cheng Kung University¹

01 Sep 2009-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: An edge-oriented area-pixel scaling processor is implemented with a low-complexity VLSI architecture to achieve the goal of low cost, and it performs better than previous low-complexity techniques in terms of both quantitative evaluation and visual quality.

Abstract: Image scaling is a very important technique and has been widely used in many image processing applications. In this paper, we present an edge-oriented area-pixel scaling processor. To achieve the goal of low cost, the area-pixel scaling technique is implemented with a low-complexity VLSI architecture in our design. A simple edge catching technique is adopted to preserve the image edge features effectively so as to achieve better image quality. Compared with the previous low-complexity techniques, our method performs better in terms of both quantitative evaluation and visual quality. The seven-stage VLSI architecture of our image scaling processor contains 10.4-K gate counts and yields a processing rate of about 200 MHz by using TSMC 0.18-µm technology.

Proceedings Article•DOI•

Low-Power Low-Voltage Analog Circuit Design Using Hierarchical Particle Swarm Optimization

Rajesh A. Thakker¹, M. Shojaei Baghini¹, Mahesh B. Patil¹•Institutions (1)

Indian Institute of Technology Bombay¹

05 Jan 2009

TL;DR: This paper presents application and effectiveness of Hierarchical particle swarm optimization (HPSO) algorithm for automatic sizing of low-power analog circuits and shows that HPSO algorithm converges to a better solution, compared to PSO and GA.

Abstract: This paper presents the application and effectiveness of the Hierarchical particle swarm optimization (HPSO) algorithm for automatic sizing of low-power analog circuits. For the purpose of comparison, circuits are also designed using PSO and Genetic Algorithm (GA). CMOS technologies from 0.35 µm down to 0.13 µm are used. PVT (process, voltage, temperature) variations are considered during the design of circuits. We show that the HPSO algorithm converges to a better solution, compared to PSO and GA. For the CMOS Miller OTA, the performance of the circuit designed by the HPSO algorithm is even better than that of a recently reported manually designed circuit. For the first time, a design of this OTA at 0.4 V supply voltage is also presented. For this new design, the HPSO algorithm took 23.5 minutes of CPU time on a Sun system with a 1.2 GHz processor and 8 GB RAM.

Exploration of energy efficient design methodologies: High performance VLSI adders

Dursun Baran

01 Jan 2009

TL;DR: By applying energy-delay tradeoffs on various levels, adder topology is developed yielding up to 20% performance improvement and 4.5× energy reduction over existing designs.

Book•

Analysis and Design of Resilient VLSI Circuits: Mitigating Soft Errors and Process Variations

Rajesh Garg, Sunil P. Khatri

22 Oct 2009

TL;DR: This book presents algorithms to analyze the effects of radiation particle strikes and processing variations on the electrical behavior of VLSI circuits and circuit design techniques to mitigate the impact of these problems.

Abstract: This book describes the design of resilient VLSI circuits. VLSI design has become more challenging recently, due to the detrimental effects of radiation particle strikes and processing variations. This book presents algorithms to analyze the effects of these issues on the electrical behavior of VLSI circuits and circuit design techniques to mitigate the impact of these problems.

Performance Analysis of 32-Bit Array Multiplier with a Carry Save Adder and with a Carry-Look-Ahead Adder

Raminder Preet, Pal Singh, Parveen Kumar, Balwinder Singh, Sri Sai, Dept Ece

01 Jan 2009

TL;DR: Designs of two different array multipliers are presented, one using carry-look-ahead (CLA) logic for addition of partial product terms and another introducing a Carry Save Adder (CSA) in the partial product lines.

Abstract: In this paper, designs of two different array multipliers are presented, one using carry-look-ahead (CLA) logic for addition of partial product terms and another introducing a Carry Save Adder (CSA) in the partial product lines. The multipliers presented in this paper were all modeled using VHDL (Very High Speed Integrated Circuit Hardware Description Language) for 32-bit unsigned data. The comparison is done on the basis of three performance parameters, i.e., area, speed and power consumption. Designing an integrated circuit that is efficient in terms of area, power and speed has become a challenging task in the modern VLSI design field. Previously in the literature, performance analysis was carried out between a multiplier using a ripple carry adder (RCA) and one using CLA. In this work, the same multiplier is designed using CSA logic and its performance is compared with that of the multiplier designed using CLA logic. The multiplier with CSA gives better results in terms of speed (78.3% improvement), area (reduced by 4.2%) and power consumption (decreased by 1.4%).
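
The carry-save idea can be illustrated at the bit level in software. The sketch below is a behavioural model, not the paper's VHDL design: it accumulates shifted partial products with a 3:2 carry-save step and performs a single carry-propagate addition at the end. The function names are illustrative assumptions.

```python
def carry_save_add(a, b, c):
    """3:2 compressor on whole words: a + b + c == partial_sum + carry."""
    partial_sum = a ^ b ^ c
    carry = ((a & b) | (b & c) | (a & c)) << 1
    return partial_sum, carry


def csa_multiply(x, y, width=32):
    """Unsigned multiply that keeps the running total in carry-save form."""
    mask = (1 << width) - 1
    x &= mask
    y &= mask
    acc_sum, acc_carry = 0, 0
    for i in range(width):
        if (y >> i) & 1:
            # Add the shifted partial product without propagating carries.
            acc_sum, acc_carry = carry_save_add(acc_sum, acc_carry, x << i)
    # Only this final addition needs a full carry-propagate adder.
    return acc_sum + acc_carry


product = csa_multiply(13, 11)
```

Because each carry-save step has no carry chain, its delay does not grow with word length, which is the usual explanation for the speed advantage the abstract reports.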

Journal Article•DOI•

Architectural Optimizations for Low-Power K-Best MIMO Decoders

S. Mondal¹, Ahmed M. Eltawil², Khaled N. Salama•Institutions (2)

Cypress Semiconductor¹, University of California, Irvine²

16 Mar 2009-IEEE Transactions on Vehicular Technology

TL;DR: A modified approach for MIMO detection is proposed, which takes advantage of the quadrature amplitude modulation (QAM) constellation structure to accelerate the detection procedure and achieves low-power operation by extending the minimum number of paths and reducing the number of required computations for each path extension.

Abstract: Maximum-likelihood (ML) detection for higher order multiple-input-multiple-output (MIMO) systems faces a major challenge in computational complexity. This limits the practicality of these systems from an implementation point of view, particularly for mobile battery-operated devices. In this paper, we propose a modified approach for MIMO detection, which takes advantage of the quadrature amplitude modulation (QAM) constellation structure to accelerate the detection procedure. This approach achieves low-power operation by extending the minimum number of paths and reducing the number of required computations for each path extension, which results in an order-of-magnitude reduction in computations in comparison with existing algorithms. This paper also describes the very-large-scale integration (VLSI) design of the low-power path metric computation unit. The approach is applied to a 4×4, 64-QAM MIMO detector system. Results show negligible performance degradation compared with conventional algorithms while reducing the complexity by more than 50%.

Proceedings Article•DOI•

An efficient implementation of 1-D median filter

Vasily G. Moshnyaga¹, Koji Hashimoto¹•Institutions (1)

Fukuoka University¹

15 Sep 2009

TL;DR: This paper presents a new architecture and circuit implementation of 1-D median filter that has linear hardware complexity, minimal latency and achieves throughput of 1/2 of the sampling rate.

Abstract: This paper presents a new architecture and circuit implementation of 1-D median filter. The proposed circuit belongs to the class of non-recursive sorting network architectures that process the input samples sequentially in the word-based manner. In comparison to the related schemes, it maintains sorting of samples from the previous position of the sliding window, positioning only the incoming sample to the correct rank. Unlike existing 1-D filter implementations, the circuit has linear hardware complexity, minimal latency and achieves throughput of 1/2 of the sampling rate. Experimental evaluation and comparisons show high efficiency of our design.
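
The key idea in the abstract, keeping the previous window sorted and placing only the incoming sample at its rank, can be mimicked in software. The sketch below is a software analogy rather than the proposed circuit. The function name and the use of `bisect` on a Python list are assumptions for illustration.

```python
from bisect import bisect_left, insort
from collections import deque


def sliding_median(samples, window):
    """1-D median filter that reuses the sorted order of the previous window."""
    if window % 2 == 0:
        raise ValueError("window length must be odd")
    history = deque()   # samples in arrival order, to know which one leaves
    ranked = []         # the same samples kept sorted by value
    medians = []
    for sample in samples:
        history.append(sample)
        insort(ranked, sample)            # place only the incoming sample at its rank
        if len(history) > window:
            oldest = history.popleft()
            del ranked[bisect_left(ranked, oldest)]  # drop the outgoing sample
        if len(history) == window:
            medians.append(ranked[window // 2])
    return medians


filtered = sliding_median([5, 1, 4, 2, 3], window=3)
```

Each new sample costs one insertion and one deletion instead of a full re-sort of the window, which is the software counterpart of the incremental ranking the abstract describes.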

Proceedings Article•DOI•

Quaternary CMOS Combinational Logic Circuits

K S Vasundara Patel¹, K. S. Gurumurthy²•Institutions (2)

B.M.S. College of Engineering¹, University Visvesvaraya College of Engineering²

16 Dec 2009

TL;DR: Voltage mode quaternary CMOS circuit design using 90nm technology is presented, suitable to be implemented in classical CMOS VLSI technology.

Abstract: The good characteristics and advantages of multi-valued logic (MVL) electronic systems and circuits have created great interest in their practical implementation. This paper presents voltage mode quaternary CMOS circuit design using 90 nm technology. Basic gates such as quaternary inverter, NMAX, NMIN and quaternary multiplexer are designed and simulated. Low power consumption of 14 µW is observed at 2.2 GHz with a 1.2 V power supply. Circuits are verified using HSPICE simulations. The circuits described here are also suitable to be implemented in classical CMOS VLSI technology.

Proceedings Article•DOI•

GPU-based parallelization for fast circuit optimization

Yifang Liu¹, Jiang Hu¹•Institutions (1)

Texas A&M University¹

26 Jul 2009

TL;DR: This work proposes GPU-based parallel computing techniques and applies them on simultaneous gate sizing and threshold voltage assignment for accelerating VLSI circuit optimization, aimed to fully utilize the benefits of GPU through efficient task scheduling and memory organization.

Abstract: The progress of GPU (Graphics Processing Unit) technology opens a new avenue for boosting computing power. This work is an attempt to exploit GPU for accelerating VLSI circuit optimization. We propose GPU-based parallel computing techniques and apply them on simultaneous gate sizing and threshold voltage assignment, which is often employed in practice for performance and power optimization. These techniques are aimed to fully utilize the benefits of GPU through efficient task scheduling and memory organization. Compared to conventional sequential computation, our techniques can provide up to 56× speedup without any sacrifice on solution quality.

Book•

Novel Algorithms for Fast Statistical Analysis of Scaled Circuits

Rob A. Rutenbar¹, Amith Singhee¹•Institutions (1)

Carnegie Mellon University¹

16 Aug 2009

TL;DR: SiLVR is a nonlinear response surface modeling (RSM) and performance-driven dimensionality reduction strategy, that uses the concepts of projection pursuit and latent variable regression to obtain an absolute improvement in modeling error of up to 34%, over the best quadratic RSM method.

Abstract: As VLSI technology moves to the nanometer scale for transistor feature sizes, the impact of manufacturing imperfections results in large variations in the circuit performance. Traditional CAD tools are not well-equipped to handle this scenario, since they do not model this statistical nature of the circuit parameters and performances, or if they do, the existing techniques tend to be over-simplified or intractably slow. We draw upon ideas for attacking parallel problems in other technical fields, such as computational finance, machine learning and hydrology, and synthesize them with innovative attacks for our problem domain of integrated circuits, to develop novel solutions to problems of efficient statistical analysis of circuits in the nanometer regime. In particular, this thesis makes three contributions: (1) SiLVR, a nonlinear response surface modeling (RSM) and performance-driven dimensionality reduction strategy, that uses the concepts of projection pursuit and latent variable regression to obtain an absolute improvement in modeling error of up to 34%, over the best quadratic RSM method. SiLVR also captures the designer's insight into the circuit behavior, by automatically extracting quantitative measures of relative global sensitivities and nonlinear correlation. (2) Fast Monte Carlo simulation of circuits using quasi-Monte Carlo, showing speedups of 2× to 50× over standard Monte Carlo. (3) Statistical blockade, an efficient method for sampling rare events and estimating their probability distribution using limit results from extreme value theory, applied to high-replication circuits like SRAM cells.

Journal Article•DOI•

A novel VLSI architecture for full-search variable block-size motion estimation

Jin-Wook Kim¹, Taegeun Park¹•Institutions (1)

Catholic University of Korea¹

01 May 2009-IEEE Transactions on Consumer Electronics

TL;DR: This paper proposes a scalable VLSI architecture for VBSME in H.264/AVC based on a full-search motion estimation algorithm that shows higher throughput rate with less hardware.

Abstract: Variable block-size motion estimation (VBSME) has become an important technique in H.264/AVC to improve video quality. In this paper, we propose a scalable VLSI architecture for VBSME in H.264/AVC based on a full-search motion estimation algorithm. A new scan order is introduced to re-use the sum of absolute differences (SAD) values of smaller sub-blocks on an "as-early-as-possible" basis, thus the complexity of the required hardware resources, such as registers, multiplexers, and controls is reduced. It also spreads the timing for the final SAD outputs so that the number of output buses is reduced. The architecture is flexible and scalable with regard to the size of the searching windows and PE arrays. Compared to the conventional approaches, the architecture shows higher throughput rate with less hardware. After logic synthesis using DongbuAnam 0.18 μm standard cell library, the number of gates is 39K (16 PEs) in two-input equivalent NAND gates and the maximum operating clock frequency is 416 MHz (256 fps@CIF).
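
The SAD re-use that the abstract describes can be sketched in software: the SADs of the sixteen 4x4 sub-blocks of a 16x16 macroblock are computed once, and every larger H.264 block size is obtained by adding them up. This is only an illustration of the re-use principle for a single candidate position; the paper's scan order, PE array, and timing are not modeled, and the function names are hypothetical.

```python
def sad_4x4_grid(cur, ref):
    """SAD of each 4x4 sub-block of a 16x16 macroblock, as a 4x4 grid."""
    grid = [[0] * 4 for _ in range(4)]
    for y in range(16):
        for x in range(16):
            grid[y // 4][x // 4] += abs(cur[y][x] - ref[y][x])
    return grid

def merge(grid, by, bx, h, w):
    """SAD of a larger block, built by summing h x w neighbouring 4x4 SADs."""
    return sum(grid[y][x] for y in range(by, by + h) for x in range(bx, bx + w))

def all_block_sads(cur, ref):
    """SADs for all seven H.264 block sizes, re-using the 4x4 results."""
    grid = sad_4x4_grid(cur, ref)
    # Block name is width x height in pixels; value is (rows, cols) of 4x4 units.
    shapes = {"4x4": (1, 1), "8x4": (1, 2), "4x8": (2, 1), "8x8": (2, 2),
              "16x8": (2, 4), "8x16": (4, 2), "16x16": (4, 4)}
    return {name: [merge(grid, by, bx, h, w)
                   for by in range(0, 4, h) for bx in range(0, 4, w)]
            for name, (h, w) in shapes.items()}

# Toy data: every pixel differs by exactly 1.
cur = [[1] * 16 for _ in range(16)]
ref = [[0] * 16 for _ in range(16)]
sads = all_block_sads(cur, ref)
print({name: len(values) for name, values in sads.items()})
```

Only the 4x4 pass touches pixel data; the other block sizes cost a handful of additions each, which is why re-use reduces hardware compared with computing each block size independently.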

Journal Article•DOI•

Memory Reduction Methodology for Distributed-Arithmetic-Based DWT/IDWT Exploiting Data Symmetry

Amit Acharyya¹, Koushik Maharatna¹, Bashir M. Al-Hashimi¹, Steve R. Gunn¹•Institutions (1)

University of Southampton¹

01 Apr 2009-IEEE Transactions on Circuits and Systems II: Express Briefs

TL;DR: By exploiting the inherent symmetry of the discrete wavelet transform (DWT) algorithm and consequently storing only the nonrepetitive combinations of filter coefficients, the size of required memory can be significantly reduced.

Abstract: In this brief, we show that by exploiting the inherent symmetry of the discrete wavelet transform (DWT) algorithm and consequently storing only the nonrepetitive combinations of filter coefficients, the size of required memory can be significantly reduced. Subsequently, a memory-efficient architecture for DWT/inverse DWT is proposed. It occupies 6.5-mm² silicon area and consumes 46.8-μW power at 1 MHz for 1.2 V using 0.13-μm standard cell technology.
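
As a rough illustration of why symmetry saves memory in distributed arithmetic (DA), the sketch below builds the full DA lookup table for a small symmetric filter and counts how many stored partial sums are actually distinct. The coefficient values and the 4-tap length are hypothetical, and this is not the authors' memory organization; it only shows that repeated coefficients produce repeated table entries.

```python
from itertools import product

def da_lut(coeffs):
    """Full DA lookup table: one partial sum per pattern of input bits."""
    return {bits: sum(c for c, b in zip(coeffs, bits) if b)
            for bits in product((0, 1), repeat=len(coeffs))}

def da_dot(coeffs, xs, nbits):
    """Bit-serial inner product of coeffs and unsigned inputs xs via the LUT."""
    lut = da_lut(coeffs)
    acc = 0
    for bit in range(nbits):
        address = tuple((x >> bit) & 1 for x in xs)  # one bit from each input
        acc += lut[address] << bit                   # shift-and-accumulate
    return acc

coeffs = [3, 7, 7, 3]  # hypothetical symmetric 4-tap filter, not from the paper
lut = da_lut(coeffs)
distinct = sorted(set(lut.values()))

print("table entries:", len(lut), "distinct values:", len(distinct))
print("DA result:", da_dot(coeffs, [1, 2, 3, 4], nbits=3))
```

In this toy case the 16-entry table holds only 9 distinct partial sums, so a design that stores each combination once and maps addresses onto it needs less memory than the naive table.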

Book•

Digital VLSI Chip Design with Cadence and Synopsys CAD Tools

Erik Brunvand

25 Feb 2009

TL;DR: Digital VLSI Chip Design with Cadence and Synopsys CAD Tools leads students through the complete process of building a ready-to-fabricate CMOS integrated circuit using popular commercial design software.

Abstract: Digital VLSI Chip Design with Cadence and Synopsys CAD Tools leads students through the complete process of building a ready-to-fabricate CMOS integrated circuit using popular commercial design software. Detailed tutorials include step-by-step instructions and screen shots of tool windows and dialog boxes. This hands-on book is for use in conjunction with a primary textbook on digital VLSI.

Proceedings Article•DOI•

Globally optimal time-multiplexing in inter-FPGA connections for accelerating multi-FPGA systems

Masato Inagi¹, Yasuhiro Takashima², Yuichi Nakamura³•Institutions (3)

Hiroshima City University¹, University of Kitakyushu², Core Laboratories³

29 Sep 2009

TL;DR: This paper extends an ILP-based optimization method of the inter-FPGA connections to improve the system performance and shows that the method improved the circuit performance on a 4-FPGA system by 26.4% compared with a conventional method, on average.

Abstract: Multi-FPGA systems are widely used for rapid prototyping and logic verification of VLSIs. To implement a huge logic circuit in a multi-FPGA system, the circuit needs to be partitioned into multiple FPGAs. Because of the limited interconnection resources between FPGAs, time-multiplexed I/Os are used for inter-FPGA connections. Due to the large delay of time-multiplexed I/Os, inter-FPGA connections strongly affect the system performance. In this paper, we extend an ILP-based optimization method of the inter-FPGA connections to improve the system performance. Our method uses both a normal I/O and a time-multiplexed I/O, and decides whether each inter-FPGA signal is transferred by a time-multiplexed I/O or not. Our extended method improves the system performance considering the variation of the amount of interconnection resources, and the variation of the number of inter-FPGA signals, from an FPGA pair to another FPGA pair. Experiments showed that our method improved the circuit performance on a 4-FPGA system by 26.4% compared with a conventional method, on average.

Proceedings Article•DOI•

VLSI Implementation of a 4×4 MIMO-OFDM transceiver with an 80-MHz channel bandwidth

Shingo Yoshizawa¹, Yoshikazu Miyanaga¹•Institutions (1)

Hokkaido University¹

24 May 2009

TL;DR: VLSI Implementation for a 4×4 multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) transceiver is described that targets 1-Gbps data transmission for next-generation wireless LAN systems and incorporates a minimum mean-square error MIMO detector that drastically shortens processing latency.

Abstract: VLSI Implementation for a 4×4 multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) transceiver is described that targets 1-Gbps data transmission for next-generation wireless LAN systems. The IEEE802.11 Very High Throughput (VHT) Study Group concluded that a signal bandwidth of more than 80 MHz is needed to achieve 1-Gbps throughput in the MAC layer. The proposed architecture is suitable for VLSI implementation that meets this specification and enables real-time processing in a 4×4 MIMO-OFDM configuration. It incorporates a minimum mean-square error (MMSE) MIMO detector that drastically shortens processing latency. Evaluation of a MIMO-OFDM transceiver implemented in CMOS with 128, 256, or 512 OFDM subcarriers showed that the power dissipation ranged from 451 to 577 mW.
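
For orientation, the textbook form of a linear MMSE detector is given below. It assumes the per-subcarrier model y = Hx + n with a 4×4 channel matrix H, unit-power transmit symbols, and white noise of variance σ²; these symbols are introduced here for illustration, and the paper's hardware formulation may differ.

```latex
\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}, \qquad
\hat{\mathbf{x}}_{\mathrm{MMSE}}
  = \left(\mathbf{H}^{H}\mathbf{H} + \sigma^{2}\mathbf{I}\right)^{-1}
    \mathbf{H}^{H}\mathbf{y}
```

The matrix inverse has to be evaluated for every OFDM subcarrier, which is why the detector tends to dominate processing latency in such transceivers.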

Journal Article•

New Design Methodologies for High Speed Low Power XOR-XNOR Circuits

Shiv Shankar Mishra, Subodh Wairya, Rajendra Kumar Nagaria, Sudarshan Tiwari

26 Jul 2009-World Academy of Science, Engineering and Technology, International Journal of Electrical, Computer, Energetic, Electronic and Communication Engineering

TL;DR: This paper evaluates and compares the performance of various XOR-XNOR circuits based on TSMC 0.18µm process models and reveals that the proposed circuit exhibits lower PDP and EDP and is more power-efficient and faster than the best available XOR-XNOR circuits in the literature.

Abstract: New methodologies for XOR-XNOR circuits are proposed to improve speed and power, as these circuits are basic building blocks of many arithmetic circuits. This paper evaluates and compares the performance of various XOR-XNOR circuits. The performance of the XOR-XNOR circuits, based on TSMC 0.18µm process models over the full range of supply voltages from 0.6V to 3.3V, is evaluated by comparing simulation results obtained from HSPICE. Simulation results reveal that the proposed circuit exhibits lower PDP and EDP and is more power-efficient and faster than the best available XOR-XNOR circuits in the literature.

Keywords: Exclusive-OR (XOR), Exclusive-NOR (XNOR), High speed, Low power, Arithmetic Circuits.

I. INTRODUCTION

While the growth of the electronics market has driven the VLSI industry towards very high integration density, system-on-chip designs, and operating frequencies beyond a few GHz, critical concerns have arisen about the severe increase in power consumption and the need to further reduce it. Moreover, the explosive growth in the demand for and popularity of portable electronics is driving designers to strive for smaller silicon area, higher speeds, longer battery life, and more reliability. Power is one of the premium resources a designer tries to save when designing a system. The XOR-XNOR circuits are basic building blocks in various circuits, especially arithmetic circuits (full adders and multipliers), compressors, comparators, parity checkers, code converters, error-detecting or error-correcting codes, and phase detector circuits in PLLs. The performance of the complex logic circuits is affected by the individual performance of the XOR-XNOR circuits that are included in them (1)-(6). Therefore, careful design and analysis is required for XOR-XNOR circuits to obtained -full output
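
The two figures of merit named in the abstract have standard definitions, restated here for reference. The symbols P_avg (average power) and t_d (propagation delay) are notation chosen for this note, not taken from the paper.

```latex
\mathrm{PDP} = P_{\mathrm{avg}} \cdot t_{d}, \qquad
\mathrm{EDP} = \mathrm{PDP} \cdot t_{d} = P_{\mathrm{avg}} \cdot t_{d}^{2}
```

PDP is the energy spent per switching event, while EDP weights delay a second time, so a circuit that saves energy only by running slower improves PDP but not necessarily EDP.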

Journal Article•DOI•

Inversion schemes for sublithographic programmable logic arrays

Benjamin Gojman¹, Harika Manem², Garrett S. Rose², André DeHon¹•Institutions (2)

University of Pennsylvania¹, New York University²

03 Nov 2009-IET Computers and Digital Techniques

TL;DR: The authors develop a mapping flow for the dual-rail logic and quantify its cost in both logical product terms and physical implementation area and also develop area and timing models for all three schemes.

Abstract: A programmable logic array (PLA) needs its inputs available in both the positive and negative polarities. In lithographic-scale VLSI PLAs, programmable array logics (PALs) and programmable logic devices (PLDs), a buffer and inverter at the PLA input typically produce both polarities from a single polarity input. However, the extreme regularity required for sublithographic designs has driven nanoscale architectures to consider alternate solutions. Consequently, the authors compare three schemes: one based on producing both polarities in a restoration stage (selective inversion), one based on a local inversion stage and one based on a full dual-rail logic implementation. The authors develop a mapping flow for the dual-rail logic and quantify its cost in both logical product terms and physical implementation area and also develop area and timing models for all three schemes. Mapping benchmarks from the Toronto 20 set, the authors are able to show that the local inversion scheme is faster (less than one-fifth the latency), lower energy (one-half the energy) and comparable size to the selective inversion scheme and faster (less than half the latency), smaller (one-third of the area) and lower energy (one-ninth the energy) than the dual-rail scheme.
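
The dual-rail option can be illustrated with a small sketch: each signal travels as a pair (true rail, false rail), so inversion becomes a swap of the two rails and no inverting device is needed. The function names below are hypothetical, and the code shows only the general dual-rail principle, not the authors' mapping flow.

```python
def rail(x):
    """Encode a Boolean as a (true_rail, false_rail) pair."""
    return (x, not x)

def dr_not(a):
    # Inversion is a swap of the two rails; no inverter is required.
    return (a[1], a[0])

def dr_and(a, b):
    # True rail: AND of the true rails.
    # False rail: OR of the false rails (De Morgan).
    return (a[0] and b[0], a[1] or b[1])

def dr_or(a, b):
    # True rail: OR of the true rails.
    # False rail: AND of the false rails (De Morgan).
    return (a[0] or b[0], a[1] and b[1])

def dr_xor(a, b):
    """XOR composed only from non-inverting gates and rail swaps."""
    return dr_or(dr_and(a, dr_not(b)), dr_and(dr_not(a), b))

for x in (False, True):
    for y in (False, True):
        print(x, y, dr_xor(rail(x), rail(y)))
```

Every gate computes both polarities of its output, which hints at why the dual-rail scheme costs more product terms and area than schemes that invert only where needed.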

Proceedings Article•DOI•

Simultaneous buffer and interlayer via planning for 3D floorplanning

Xu He¹, Sheqin Dong¹, Yuchun Ma¹, Xianlong Hong¹•Institutions (1)

Tsinghua University¹

16 Mar 2009

TL;DR: This paper gives an efficient buffer and interlayer via planning algorithm with linear complexity, which makes sure buffers and interlayer vias are inserted as successfully as possible in 3D ICs.

Abstract: As technology advances, the interconnect delay among modules plays a dominant role in chip performance. Buffer insertion, as a traditional approach to reduce wire delay in 2D ICs, is still necessary in 3D ICs to further optimize interconnects. Since nets that cross multiple layers in 3D ICs need to go through vertical interlayer vias, traditional buffer planning turns into simultaneous buffer and interlayer via planning in 3D ICs. In this paper, we give an efficient buffer and interlayer via planning algorithm with linear complexity, which makes sure buffers and interlayer vias are inserted as successfully as possible. Experimental results show that 3D ICs can significantly improve the interconnect delay.
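
Why buffer insertion reduces wire delay can be seen from a simple Elmore-delay sketch of a uniform RC wire split into equal segments by identical repeaters. All parameter values below are hypothetical, and the model ignores interlayer vias and placement constraints, so it illustrates only the traditional 2D motivation and not the paper's planning algorithm.

```python
def wire_delay(length_um, n_buffers,
               r=0.1,         # wire resistance, ohm per um (assumed)
               c=0.2e-15,     # wire capacitance, F per um (assumed)
               r_buf=100.0,   # driver/buffer output resistance, ohm (assumed)
               c_buf=5e-15,   # buffer input capacitance, F (assumed)
               t_buf=20e-12): # buffer intrinsic delay, s (assumed)
    """Elmore delay in seconds of a wire with n_buffers evenly spaced repeaters."""
    segments = n_buffers + 1
    seg = length_um / segments
    # Driver resistance charges the whole segment plus the next buffer input.
    driver_term = r_buf * (c * seg + c_buf)
    # Distributed wire: its own capacitance counts half, the load counts fully.
    wire_term = r * seg * (0.5 * c * seg + c_buf)
    return segments * (driver_term + wire_term) + n_buffers * t_buf

for n in (0, 1, 2, 4, 8):
    print(n, "buffers:", round(wire_delay(10000, n) * 1e12, 1), "ps")
```

The unbuffered delay grows with the square of the wire length, while each added repeater shortens the segments and adds only a fixed cost, so a moderate number of buffers wins on long wires.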
