
Showing papers on "Very-large-scale integration" published in 2012


Journal ArticleDOI
TL;DR: It is shown that many paradigms and approaches borrowed from traditional above-threshold low-power VLSI design are actually incorrect, and common misconceptions in the ULP domain are debunked and replaced with technically sound explanations.
Abstract: In this paper, the state of the art in ultra-low power (ULP) VLSI design is presented within a unitary framework for the first time. A few general principles are first introduced to gain an insight into the design issues and the approaches that are specific to ULP systems, as well as to better understand the challenges that have to be faced in the foreseeable future. Intuitive understanding is accompanied by rigorous analysis for each key concept. The analysis ranges from the circuit to the micro-architectural level, and reference is given to process, physical and system levels when necessary. Among the main goals of this paper, it is shown that many paradigms and approaches borrowed from traditional above-threshold low-power VLSI design are actually incorrect. Accordingly, common misconceptions in the ULP domain are debunked and replaced with technically sound explanations.

363 citations


Proceedings ArticleDOI
01 Mar 2012
TL;DR: Low power dissipation during test application is becoming increasingly important in today's VLSI systems design and is a major goal in the future development of VLSI design.
Abstract: The System-On-Chip (SoC) revolution challenges both design and test engineers, especially in the area of power dissipation. Generally, a circuit or system consumes more power in test mode than in normal mode. This extra power consumption can give rise to severe hazards in circuit reliability or, in some cases, can provoke instant circuit damage. Moreover, it can create problems such as increased product cost, difficulty in performance verification, reduced autonomy of portable systems, and decrease of overall yield. Low power dissipation during test application is becoming increasingly important in today's VLSI systems design and is a major goal in the future development of VLSI design.

200 citations


Journal ArticleDOI
TL;DR: This paper presents two generic very-large-scale integration (VLSI) architectures that implement the approximate message passing (AMP) algorithm for sparse signal recovery and shows that AMP-T is superior to AMP-M with respect to silicon area, throughput, and power consumption, whereas AMP-M offers more flexibility.
Abstract: Sparse signal recovery finds use in a variety of practical applications, such as signal and image restoration and the recovery of signals acquired by compressive sensing. In this paper, we present two generic very-large-scale integration (VLSI) architectures that implement the approximate message passing (AMP) algorithm for sparse signal recovery. The first architecture, referred to as AMP-M, employs parallel multiply-accumulate units and is suitable for recovery problems based on unstructured (e.g., random) matrices. The second architecture, referred to as AMP-T, takes advantage of fast linear transforms, which arise in many real-world applications. To demonstrate the effectiveness of both architectures, we present corresponding VLSI and field-programmable gate array implementation results for an audio restoration application. We show that AMP-T is superior to AMP-M with respect to silicon area, throughput, and power consumption, whereas AMP-M offers more flexibility.
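
For readers unfamiliar with AMP, the iteration both architectures implement is compact. Below is a minimal NumPy sketch of an AMP variant with a soft-thresholding denoiser; the threshold schedule and toy problem are simplifying assumptions, and this is not the paper's fixed-point datapath.

```python
import numpy as np

def soft(x, t):
    # Soft-thresholding denoiser used for sparse recovery.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def amp(y, A, iters=30):
    # Approximate message passing: x <- eta(x + A^T z), with the Onsager
    # correction term added back into the residual z.
    m, n = A.shape
    x = np.zeros(n)
    z = y.copy()
    for _ in range(iters):
        theta = np.linalg.norm(z) / np.sqrt(m)       # simple threshold heuristic (assumption)
        r = x + A.T @ z                               # pseudo-data
        x_new = soft(r, theta)
        onsager = (z / m) * np.count_nonzero(x_new)   # Onsager reaction term
        z = y - A @ x_new + onsager
        x = x_new
    return x

# Toy usage: recover a sparse vector from random measurements.
rng = np.random.default_rng(0)
n, m, k = 256, 128, 10
A = rng.standard_normal((m, n)) / np.sqrt(m)
x0 = np.zeros(n)
x0[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
y = A @ x0
print(np.linalg.norm(amp(y, A) - x0) / np.linalg.norm(x0))
```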

82 citations


Journal ArticleDOI
TL;DR: A comparison study of frame-based ConvNet convolution processors and frame-free spike-based convolution processors, two neuro-inspired solutions for real-time visual processing.
Abstract: Most scene segmentation and categorization architectures for the extraction of features in images and patches make exhaustive use of 2D convolution operations for template matching, template search and denoising. Convolutional Neural Networks (ConvNets) are one example of such architectures that can implement general-purpose bio-inspired vision systems. In standard digital computers 2D convolutions are usually expensive in terms of resource consumption and impose severe limitations for efficient real-time applications. Nevertheless, neuro-cortex inspired solutions, like dedicated Frame-Based or Frame-Free Spiking ConvNet Convolution Processors, are advancing real-time visual processing. These two approaches share the neural inspiration, but each of them solves the problem in different ways. Frame-Based ConvNets process frame-by-frame video information in a very robust and fast way that requires using and sharing the available hardware resources (such as multipliers and adders). Hardware resources are fixed and time-multiplexed by fetching data in and out; thus memory bandwidth and size are important for good performance. On the other hand, spike-based convolution processors are a frame-free alternative that is able to perform convolution of a spike-based source of visual information with very low latency, which makes them ideal for very high speed applications. However, hardware resources need to be available all the time and cannot be time-multiplexed; thus, hardware should be modular, reconfigurable and expandable. Hardware implementations in both VLSI custom integrated circuits (digital and analog) and FPGA have already been used to demonstrate the performance of these systems. In this paper we present a comparison study of these two neuro-inspired solutions. A brief description of both systems is given, along with a discussion of their differences, advantages and drawbacks.
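
As a rough illustration of the two processing styles (not the authors' hardware), the sketch below contrasts a frame-based 2-D convolution with an event-driven version that, for each incoming spike address, adds a shifted copy of the kernel into an accumulator map; the kernel, map size, and event list are placeholders.

```python
import numpy as np
from scipy.signal import convolve2d

kernel = np.array([[0., 1., 0.],
                   [1., -4., 1.],
                   [0., 1., 0.]])           # example 3x3 kernel (assumption)

# Frame-based: the whole frame is available; multipliers/adders are time-multiplexed.
frame = np.random.rand(32, 32)
frame_result = convolve2d(frame, kernel, mode="same")

# Event-driven (frame-free): each address-event (x, y) immediately updates the
# neighbourhood of an accumulator map, so the latency per event is very low.
events = [(5, 7), (5, 8), (20, 3)]           # placeholder spike addresses
acc = np.zeros((32, 32))
kh, kw = kernel.shape
for (x, y) in events:
    for dx in range(kh):
        for dy in range(kw):
            xi, yi = x + dx - kh // 2, y + dy - kw // 2
            if 0 <= xi < acc.shape[0] and 0 <= yi < acc.shape[1]:
                acc[xi, yi] += kernel[dx, dy]
```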

68 citations


Proceedings ArticleDOI
31 Dec 2012
TL;DR: The proposed analog-to-digital conversion scheme accumulates pre-synaptic weights of a neuron efficiently and reduces silicon area by using only one shared adder for processing LIF operations of N neurons.
Abstract: This paper presents a reconfigurable digital neuromorphic VLSI architecture for large scale spiking neural networks. We leverage the memristor nanodevice to build an N×N crossbar array to store synaptic weights with significantly reduced area cost. Our design integrates N digital leaky integrate-and-fire (LIF) neurons and the respective on-line learning circuits for a spike timing-dependent learning rule. The proposed analog-to-digital conversion scheme accumulates pre-synaptic weights of a neuron efficiently and reduces silicon area by using only one shared adder for processing LIF operations of N neurons. The proposed architecture is shown to be both area and power efficient. With 256 neurons and 64K synapses, the power dissipation and the area of our design are evaluated as 9.46 mW and 0.66 mm², respectively, in a 90-nm CMOS technology.
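
A software analogue of the datapath described above might look like the sketch below: an N×N weight matrix stands in for the memristor crossbar, and a single sequential accumulation loop plays the role of the one shared adder serving all N digital LIF neurons. The constants are illustrative assumptions.

```python
import numpy as np

N = 256
rng = np.random.default_rng(1)
weights = rng.uniform(0.0, 1.0, size=(N, N))    # stands in for the memristor crossbar
v = np.zeros(N)                                  # membrane potentials
V_TH, LEAK = 10.0, 0.98                          # threshold and leak factor (assumptions)

def step(spikes_in, v):
    """One discrete LIF update; the inner loop mimics a single time-shared adder."""
    out = np.zeros(N, dtype=bool)
    for j in range(N):                           # neurons processed sequentially
        acc = 0.0
        for i in np.nonzero(spikes_in)[0]:       # accumulate pre-synaptic weights
            acc += weights[i, j]
        v[j] = LEAK * v[j] + acc                 # leaky integration
        if v[j] >= V_TH:                         # fire and reset
            out[j] = True
            v[j] = 0.0
    return out, v

spikes = rng.random(N) < 0.05                    # random input spike vector
out, v = step(spikes, v)
```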

65 citations


Journal ArticleDOI
TL;DR: This work presents an efficient application specific integrated circuit chip design for sequential minimal optimization, implemented as an intellectual property core suitable for use in an SVM-based recognition system on a chip.
Abstract: The sequential minimal optimization (SMO) algorithm has been extensively employed to train the support vector machine (SVM). This work presents an efficient application specific integrated circuit chip design for sequential minimal optimization. This chip is implemented as an intellectual property core, suitable for use in an SVM-based recognition system on a chip. The proposed SMO chip was tested and found to be fully functional, using a prototype system based on the Altera DE2 board with a Cyclone II 2C70 field-programmable gate array.
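
At the heart of SMO is the analytic optimization of one pair of Lagrange multipliers at a time; a plain-Python sketch of that two-variable update (following the textbook formulas, without the working-set selection heuristics and not tied to the chip's datapath) is shown below.

```python
import numpy as np

def smo_pair_update(i, j, alpha, y, K, errors, C=1.0):
    """Analytically optimize the pair (alpha_i, alpha_j); returns updated values.
    K is the kernel matrix and errors[k] = f(x_k) - y_k. Simplified sketch."""
    if i == j:
        return alpha[i], alpha[j]
    # Feasible box for alpha_j given the equality constraint sum(alpha * y) = const.
    if y[i] != y[j]:
        L, H = max(0.0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])
    else:
        L, H = max(0.0, alpha[i] + alpha[j] - C), min(C, alpha[i] + alpha[j])
    eta = K[i, i] + K[j, j] - 2.0 * K[i, j]       # curvature along the constraint
    if eta <= 0 or L == H:
        return alpha[i], alpha[j]                  # skip degenerate pairs in this sketch
    aj = alpha[j] + y[j] * (errors[i] - errors[j]) / eta
    aj = np.clip(aj, L, H)
    ai = alpha[i] + y[i] * y[j] * (alpha[j] - aj)
    return ai, aj
```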

63 citations


Journal ArticleDOI
TL;DR: In this article, a 6-input lookup table (LUT) circuit using nonvolatile logic-in-memory (LIM) architecture with series/parallel-connected magnetic tunnel junction (MTJ) devices is proposed for a standby-power-free field-programmable gate array.
Abstract: A compact 6-input lookup table (LUT) circuit using nonvolatile logic-in-memory (LIM) architecture with series/parallel-connected magnetic tunnel junction (MTJ) devices is proposed for a standby-power-free field-programmable gate array. Series/parallel connections of MTJ devices make it possible not only to reduce the effect of resistance variation, but also to enhance the programmability of resistance values, which achieves a sufficient sensing margin even when process variation is serious in the recent nanometer-scaled VLSI. Moreover, the additional MTJ devices do not increase the effective chip area because the configuration circuit using MTJ devices is simplified and these devices are stacked over the CMOS plane. As a result, the transistor counts of the proposed circuit are reduced by 62% in comparison with those of a conventional nonvolatile LUT circuit where CMOS-only-based volatile static random access memory cell circuits are replaced by MTJ-based nonvolatile ones.

54 citations


Journal ArticleDOI
TL;DR: A new class of applications that is inherently capable of absorbing some degree of vulnerability and providing fault tolerance (FT) through its natural properties is categorized in this paper, and relaxed fault-tolerant (RFT) techniques are developed for the VLSI implementation of such imprecision-tolerant applications.
Abstract: Reliability should be identified as the most important challenge in future nano-scale very large scale integration (VLSI) implementation technologies for the development of complex integrated systems. Normally, fault tolerance (FT) in a conventional system is achieved by increasing its redundancy, which also implies higher implementation costs and lower performance that sometimes makes it even infeasible. In contrast to custom approaches, a new class of applications is categorized in this paper, which is inherently capable of absorbing some degrees of vulnerability and providing FT based on their natural properties. Neural networks are good indicators of imprecision-tolerant applications. We have also proposed a new class of FT techniques called relaxed fault-tolerant (RFT) techniques which are developed for VLSI implementation of imprecision-tolerant applications. The main advantage of RFT techniques with respect to traditional FT solutions is that they exploit inherent FT of different applications to reduce their implementation costs while improving their performance. To show the applicability as well as the efficiency of the RFT method, the experimental results for implementation of a face-recognition computationally intensive neural network and its corresponding RFT realization are presented in this paper. The results demonstrate promising higher performance of artificial neural network VLSI solutions for complex applications in faulty nano-scale implementation environments.

53 citations


Journal ArticleDOI
TL;DR: The performance in terms of the processing speed of the architecture designed based on the proposed scheme is superior to those of the architectures designed using other existing schemes, and it has similar or lower hardware consumption.
Abstract: In this paper, a scheme for the design of a high-speed pipeline VLSI architecture for the computation of the 2-D discrete wavelet transform (DWT) is proposed. The main focus in the development of the architecture is on providing a high operating frequency and a small number of clock cycles along with an efficient hardware utilization by maximizing the inter-stage and intra-stage computational parallelism for the pipeline. The inter-stage parallelism is enhanced by optimally mapping the computational task of multi decomposition levels to the stages of the pipeline and synchronizing their operations. The intra-stage parallelism is enhanced by dividing the 2-D filtering operation into four subtasks that can be performed independently in parallel and minimizing the delay of the critical path of bit-wise adder networks for performing the filtering operation. To validate the proposed scheme, a circuit is designed, simulated, and implemented in FPGA for the 2-D DWT computation. The results of the implementation show that the circuit is capable of operating with a maximum clock frequency of 134 MHz and processing 1022 frames of size 512 × 512 per second with this operating frequency. It is shown that the performance in terms of the processing speed of the architecture designed based on the proposed scheme is superior to those of the architectures designed using other existing schemes, and it has similar or lower hardware consumption.
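
The separable row/column structure that the pipeline exploits can be illustrated with a one-level Haar 2-D DWT in NumPy; this is only a reference model of the computation, not the paper's filters or pipeline scheduling.

```python
import numpy as np

def haar_1d(x):
    # One level of the 1-D Haar transform along the last axis (orthonormal).
    a = (x[..., 0::2] + x[..., 1::2]) / np.sqrt(2.0)   # approximation (low-pass)
    d = (x[..., 0::2] - x[..., 1::2]) / np.sqrt(2.0)   # detail (high-pass)
    return a, d

def haar_2d(img):
    # Separable 2-D DWT: filter rows, then columns, producing four subbands.
    a_rows, d_rows = haar_1d(img)
    ll, lh = haar_1d(a_rows.T)
    hl, hh = haar_1d(d_rows.T)
    return ll.T, lh.T, hl.T, hh.T

img = np.random.rand(512, 512)
LL, LH, HL, HH = haar_2d(img)        # four 256x256 subbands, as in one DWT level
```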

51 citations


Journal ArticleDOI
TL;DR: A very-large-scale integration (VLSI) friendly electrocardiogram (ECG) QRS detector for body sensor networks, in which baseline wander is removed by a mathematical morphological method and multipixel modulus accumulation acts as a low-pass filter to enhance the QRS complex and improve the signal-to-noise ratio.
Abstract: This paper aims to present a very-large-scale integration (VLSI) friendly electrocardiogram (ECG) QRS detector for body sensor networks. Baseline wander and background noise are removed from the original ECG signal by a mathematical morphological method. Then multipixel modulus accumulation is employed to act as a low-pass filter to enhance the QRS complex and improve the signal-to-noise ratio. The performance of the algorithm is evaluated with the standard MIT-BIH arrhythmia database and wearable exercise ECG data. A corresponding power- and area-efficient VLSI architecture is designed and implemented on a commercial nano-FPGA. A high detection rate and high speed demonstrate the effectiveness of the proposed detector.
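
A software approximation of the first two processing stages, morphological baseline removal followed by a short smoothing accumulation that emphasizes the QRS complex, might look like the SciPy sketch below; the structuring-element length, window size, and the derivative-based stand-in for multipixel modulus accumulation are assumptions, not the paper's exact operators.

```python
import numpy as np
from scipy.ndimage import grey_opening, grey_closing, uniform_filter1d

def remove_baseline(ecg, se_len=71):
    # Opening then closing with a flat structuring element estimates the
    # slowly varying baseline, which is subtracted from the signal.
    baseline = grey_closing(grey_opening(ecg, size=se_len), size=se_len)
    return ecg - baseline

def enhance_qrs(ecg, win=5):
    # Accumulating the absolute derivative over a short window emphasizes the
    # steep QRS slopes (a stand-in for the paper's modulus accumulation).
    return uniform_filter1d(np.abs(np.diff(ecg, prepend=ecg[0])), size=win)

fs = 360                                   # MIT-BIH sampling rate, Hz
t = np.arange(10 * fs) / fs
ecg = 0.3 * np.sin(2 * np.pi * 0.3 * t)    # synthetic baseline wander only (placeholder)
clean = remove_baseline(ecg)
feature = enhance_qrs(clean)
```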

50 citations


Journal ArticleDOI
TL;DR: The design and implementation of three neuromorphic integrated circuits developed for the COLAMN ("Novel Computing Architecture for Cognitive Systems based on the Laminar Microcircuitry of the Neocortex") project are overviewed.

Journal ArticleDOI
TL;DR: This paper presents the design and characterization of 12 full-adder circuits in the IBM 90-nm process, including three new full-adders circuits using the recently proposed split-path data driven dynamic logic.
Abstract: This paper presents the design and characterization of 12 full-adder circuits in the IBM 90-nm process. These include three new full-adder circuits using the recently proposed split-path data driven dynamic logic. Based on the logic function realized, the adders were characterized for performance and power consumption when operated under various supply voltages and fan-out loads. The adders were then further deployed in a 32 bit ripple carry adder and an 8×4 multiplier to evaluate the impact of sum and carry propagation delays on the performance and power of these systems. Performance characterization of the adder circuits in the presence of process and voltage variations was also performed through Monte Carlo simulations. Besides analyzing and comparing circuit performance, the possible impact of the choice of logic function has also been underlined in this study.
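
Whatever the circuit family, each of the 12 designs realizes the same Boolean full-adder function; the tiny check below enumerates its truth table (purely illustrative, not tied to the IBM 90-nm cells).

```python
# Full-adder logic: sum = a ^ b ^ cin, carry-out = majority(a, b, cin).
for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            s = a ^ b ^ cin
            cout = (a & b) | (a & cin) | (b & cin)
            assert 2 * cout + s == a + b + cin   # arithmetic check of the logic
```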

Journal ArticleDOI
TL;DR: A useful framework for building bio-inspired systems in real-time environments with reduced computational complexity is presented, and a complete quantization study of a neuromorphic robust optical-flow architecture, based on properties found in the cortical motion pathway, is performed.

Proceedings ArticleDOI
01 Jul 2012
TL;DR: Fundamental information-theoretic bounds are provided on the required circuit wiring complexity and power consumption for encoding and decoding of error-correcting codes, showing that there is a fundamental tradeoff between the transmit and encoding/decoding power, and that bounded transmit-power schemes face an even faster divergence of total power.
Abstract: We provide fundamental information-theoretic bounds on the required circuit wiring complexity and power consumption for encoding and decoding of error-correcting codes. These bounds hold for all codes and all encoding and decoding algorithms implemented within the paradigm of our VLSI model. This model essentially views computation on a 2-D VLSI circuit as a computation on a network of connected nodes. The bounds are derived based on analyzing information flow in the circuit. They are then used to show that there is a fundamental tradeoff between the transmit and encoding/decoding power, and that the total (transmit + encoding + decoding) power must diverge to infinity at least as fast as the cube root of log(1/P_e), where P_e is the average block-error probability. On the other hand, for bounded transmit-power schemes, the total power must diverge to infinity at least as fast as the square root of log(1/P_e) due to the burden of encoding/decoding.
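
Restated compactly as formulas (a transcription of the scaling claims above, with P_e the average block-error probability and constant factors omitted):

```latex
P_{\text{total}} = P_{\text{transmit}} + P_{\text{enc}} + P_{\text{dec}}
  \;\gtrsim\; \left(\log \tfrac{1}{P_e}\right)^{1/3}
  \quad \text{(general schemes)},
\qquad
P_{\text{total}} \;\gtrsim\; \left(\log \tfrac{1}{P_e}\right)^{1/2}
  \quad \text{(bounded transmit power)}.
```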

Journal ArticleDOI
TL;DR: It is implied from the current literature that an appropriate choice of leakage power minimization technique for a specific application can only be made effectively by a VLSI circuit designer following a sequential analytical approach.

Book
13 Nov 2012
TL;DR: In this book, the authors discuss the physical and geometrical effects on the behavior of the MOS transistor and the effects of scaling on MOS IC design and its consequences for the roadmap.
Abstract: 1. Basic Principles; 2. Physical and geometrical effects on the behaviour of the MOS transistor; 3. Manufacture of MOS devices; 4. CMOS circuits; 5. Special circuits, devices and technologies; 6. Memories; 7. Very Large Scale Integration (VLSI) and ASICs; 8. Low power, a hot topic in IC design; 9. Circuit reliability and signal integrity in deep-submicron designs; 10. Testing, debugging, yield and packaging; 11. Effects of scaling on MOS IC design and consequences for the roadmap.

Journal ArticleDOI
TL;DR: A singular value decomposition (SVD) algorithm with superlinear-convergence rate, which is suitable for the beamforming mechanism in MIMO-OFDM channels with short coherent time, or short training sequence, is proposed.
Abstract: In this paper, we propose a singular value decomposition (SVD) algorithm with a superlinear convergence rate, which is suitable for the beamforming mechanism in MIMO-OFDM channels with short coherence time or a short training sequence. The proposed superlinear-convergence SVD (SL-SVD) algorithm has the following features: 1) superlinear convergence rate; 2) the ability to be extended to smaller numbers of transmit and receive antennas; 3) insensitivity to dynamic range problems during the iterative process in hardware implementations; and 4) low computational cost. We verify the proposed design with a VLSI implementation in 90 nm CMOS technology. The post-layout design has a 0.48 core area and 18 mW power consumption. Our design can achieve 7 M channel-matrices/s and can be extended to deal with different transmit and receive antenna sets.
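
For context, the beamforming use of an SVD can be shown with NumPy's reference decomposition: the dominant right and left singular vectors of the channel matrix form the transmit and receive beamformers. The SL-SVD iteration itself is not reproduced, and the 4×4 channel below is an assumption.

```python
import numpy as np

rng = np.random.default_rng(7)
H = (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))) / np.sqrt(2)

U, s, Vh = np.linalg.svd(H)        # H = U diag(s) V^H
v1 = Vh.conj().T[:, 0]             # transmit beamforming vector (right singular vector)
u1 = U[:, 0]                       # receive combining vector (left singular vector)

# Beamforming over the dominant eigenmode yields an effective scalar channel
# whose gain equals the largest singular value.
effective_gain = u1.conj() @ H @ v1
assert np.isclose(effective_gain.real, s[0])
assert np.isclose(effective_gain.imag, 0.0, atol=1e-12)
```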

Book ChapterDOI
01 Jan 2012
TL;DR: This chapter provides an overview of conic optimization models for facility layout and VLSI floorplanning problems, and finds that the semidefinite optimization approaches can provide global optimal solutions for instances with up to 40 facilities, and tight global bounds for instances with up to 100 facilities.
Abstract: This chapter provides an overview of conic optimization models for facility layout and VLSI floorplanning problems. We focus on two classes of problems to which conic optimization approaches have been successfully applied, namely the single-row facility layout problem, and fixed-outline floorplanning in VLSI circuit design. For the former, a close connection to the cut polytope has been exploited in positive semidefinite and integer programming approaches. In particular, the semidefinite optimization approaches can provide global optimal solutions for instances with up to 40 facilities, and tight global bounds for instances with up to 100 facilities. For the floorplanning problem, a conic optimization model provided the first non-trivial lower bounds in the literature.

Book
12 Jun 2012
TL;DR: In this book, the authors present a unified approach to yield analysis of fault-free or fault-tolerant VLSI manufacturing, including the effect of defect density on yield.
Abstract: 1. Models for VLSI Manufacturing Yield: Fault-Free or Fault-Tolerant VLSI Manufacturing; Yield Models - Comparative Study. 2. Models for Defects and Yield: A Unified Approach to Yield Analysis of Defect Tolerant Circuits; Systematic Extraction of Critical Areas From IC Layouts; The Effect on Yield of Clustering and Radial Variations in Defect Density. 3. Implementation of Wafer Scale Integration: Practical Experiences in the Design of a Wafer Scale 2-D Array; Yield Evaluation of a Soft-Configurable WSI Switch Network; ASP Modules: WSI Building-Blocks for Cost-Effective Parallel Computing. 4. Fault Tolerance: Fault-Tolerant k-out-of-n Logic Unit Network With Minimum Interconnection; Extended Duplex Fault Tolerant System With Integrated Control Flow Checking; Experience in Functional Test and Fault Coverage in a Silicon Compiler. 5. Array Processors: APES: An Evaluation Environment of Fault-Tolerance Capabilities of Array Processors; Comparison of Reconfiguration Schemes for Defect Tolerant Mesh Arrays; An Integer Linear Programming Approach to General Fault Covering Problems; Probabilistic Analysis of Memory Repair and Reconfiguration Heuristics; Arithmetic-Based Diagnostics in VLSI Array Processors. 6. New Approaches and Issues: Yield Improvement Through X-RAY Lithography; Reliability Analysis of Application-Specific Architectures; Fault Tolerance in Analog VLSI: Case Study of a Focal Plane Processor. 7. Yield and Manufacturing Defects: Yield Model With Critical Geometry Analysis for Yield Projection from Test Sites on a Wafer Basis With Confidence Limits; SRAM/TEG Yield Methodology; A Fault Detection and Tolerance Tradeoff Evaluation Methodology for VLSI Systems. 8. Designs for Wafer Scale Integration: A Hypercube Design on WSI; An Efficient Reconfiguration Scheme for WSI of Cube-Connected Cycles With Bounded Channel Width; A Communication Scheme for Defect Tolerant Arrays.

Book
19 Apr 2012
TL;DR: An edited volume on VLSI architectures and algorithms for pattern recognition and image processing, covering systolic arrays for convolution and resampling, minimum-distance and syntactic pattern recognition, wavefront and cellular logic processors, VLSI-based image resampling for electronic publishing, and a VLSI-based multicomputer architecture for dynamic scene analysis.
Abstract: 1.1 VLSI System; 1.2 VLSI Algorithms; 1.3 Summary of Book; References. Part I: General VLSI Design Considerations. 2. One-Dimensional Systolic Arrays for Multidimensional Convolution and Resampling: 2.1 Background; 2.2 Systolic Convolution Arrays; 2.3 Variants in the Convolution Problem; 2.4 Implementation; 2.5 Concluding Remarks; References. 3. VLSI Arrays for Pattern Recognition and Image Processing: I/O Bandwidth Considerations: 3.1 Background; 3.2 Arrays for Matrix Operations; 3.3 Arrays for Pattern Analysis; 3.4 Image-Processing Array; 3.5 Conclusions; References. Part II: VLSI Systems for Pattern Recognition. 4. VLSI Arrays for Minimum-Distance Classifications: 4.1 Minimum-Distance Classification; 4.2 Vector Distances; 4.3 String Distances; 4.4 Examples of Application; 4.5 Summary; References. 5. Design of a Pattern Cluster Using Two-Level Pipelined Systolic Array: 5.1 Background; 5.2 Description of Squared-Error Pattern Clustering; 5.3 The Systolic Pattern Clustering Array; 5.4 System Operating Characteristics; 5.5 Conclusion; References. 6. VLSI Arrays for Syntactic Pattern Recognition: 6.1 Pattern Description and Recognition; 6.2 Syntactic Pattern Recognition; 6.3 VLSI Implementation; 6.4 Simulation and Examples; 6.5 Concluding Remarks; References. Part III: VLSI Systems for Image Processing. 7. Concurrent Systems for Image Analysis: 7.1 Background; 7.2 Processing Requirements for Image Analysis; 7.3 Concurrent VLSI Architectures; 7.4 Conclusions; References. 8. VLSI Wavefront Arrays for Image Processing: 8.1 Background; 8.2 Parallel Computers for Image Processing; 8.3 Design of Pipelined Array Processors; 8.4 Image-Processing Applications; 8.5 Wafer-Scale Integrated System; 8.6 Conclusion; References. 9. Curve Detection in VLSI: 9.1 Background; 9.2 Montanari's Algorithm; 9.3 Systolic Architectures; 9.4 A VLSI Design of the Column Pipeline Chip; 9.5 SIMD Array Algorithm; 9.6 Concluding Remarks; References. 10. VLSI Implementation of Cellular Logic Processors: 10.1 Cellular Logic Processors; 10.2 Binary Neighborhood Functions; 10.3 CELLSCAN; 10.4 The Golay Parallel Pattern Transform; 10.5 The diff3-GLOPR; 10.6 The Preston-Herron Processor (PHP); 10.7 The LSI/PHP; References. 11. Design of VLSI Based Multicomputer Architecture for Dynamic Scene Analysis: 11.1 Background; 11.2 Dynamic Scene Analysis Algorithm; 11.3 Existing Multicomputer Architecture; 11.4 Scheduling and Parameters of Interest; 11.5 Performance Evaluation; 11.6 Conclusion; References. 12. VLSI-Based Image Resampling for Electronic Publishing: 12.1 Introduction to the "Electronic Darkroom"; 12.2 Overview of System Design Concepts; 12.3 Resampling Algorithms; 12.4 System Architecture and Performance; 12.5 Trade-Offs and Conclusions; References.

Proceedings ArticleDOI
10 Jun 2012
TL;DR: This paper demonstrates a strategy to implement temporal delays in hardware spiking neural networks distributed across multiple Very Large Scale Integration (VLSI) chips by exploiting the inherent device mismatch present in the analog circuits that implement silicon neurons and synapses inside the chips.
Abstract: Axonal delays are used in neural computation to implement faithful models of biological neural systems, and in spiking neural networks models to solve computationally demanding tasks. While there is an increasing number of software simulations of spiking neural networks that make use of axonal delays, only a small fraction of currently existing hardware neuromorphic systems supports them. In this paper we demonstrate a strategy to implement temporal delays in hardware spiking neural networks distributed across multiple Very Large Scale Integration (VLSI) chips. This is achieved by exploiting the inherent device mismatch present in the analog circuits that implement silicon neurons and synapses inside the chips, and the digital communication infrastructure used to configure the network topology and transmit the spikes across chips. We present an example of a recurrent VLSI spiking neural network that employs axonal delays and demonstrate how the proposed strategy efficiently implements them in hardware.
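
The strategy of using per-synapse mismatch as a source of axonal delays can be mimicked in software: each connection is assigned a fixed random delay and spikes are delivered through an event queue. The sketch below is a toy model; the delay distribution, weights, and threshold are placeholders rather than measurements of the chips.

```python
import heapq
import numpy as np

rng = np.random.default_rng(3)
n = 8
delay = rng.lognormal(mean=1.0, sigma=0.3, size=(n, n))   # per-synapse delays in ms (placeholder)
weight = rng.uniform(0.1, 0.3, size=(n, n))
v = np.zeros(n)
THRESH = 0.5

def emit_spike(t, src, queue):
    # A spike from `src` reaches each target neuron after its own synaptic delay.
    for dst in range(n):
        heapq.heappush(queue, (t + delay[src, dst], dst, weight[src, dst]))

queue = []
emit_spike(0.0, 0, queue)            # seed spike from neuron 0
fired = []
while queue and len(fired) < 50:
    t, dst, w = heapq.heappop(queue)
    v[dst] += w                      # integrate the delayed synaptic input
    if v[dst] >= THRESH:
        v[dst] = 0.0
        fired.append((t, dst))
        emit_spike(t, dst, queue)
```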

Journal ArticleDOI
27 Sep 2012 - Sensors
TL;DR: The developed low-cost system is practical for real-time throughput and reduced power consumption and is useful in robotic applications, such as tracking, navigation using an unmanned vehicle, or as part of a more complex system.
Abstract: This work presents the implementation of a matching-based motion estimation sensor on a Field Programmable Gate Array (FPGA) and NIOS II microprocessor applying a C to Hardware (C2H) acceleration paradigm. The design, which involves several matching algorithms, is mapped using Very Large Scale Integration (VLSI) technology. These algorithms, as well as the hardware implementation, are presented here together with an extensive analysis of the resources needed and the throughput obtained. The developed low-cost system is practical for real-time throughput and reduced power consumption and is useful in robotic applications, such as tracking, navigation using an unmanned vehicle, or as part of a more complex system.
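
As background, the core operation in matching-based motion estimation is a block search that minimizes a matching cost such as the sum of absolute differences (SAD); a minimal full-search reference model is sketched below (block size, search range, and the synthetic frames are assumptions, and the C2H/NIOS II mapping is not addressed).

```python
import numpy as np

def full_search(prev, curr, bx, by, bs=8, rng_px=4):
    """Return the (dy, dx) displacement into the previous frame minimizing SAD."""
    block = curr[by:by + bs, bx:bx + bs].astype(int)
    best, best_mv = np.inf, (0, 0)
    for dy in range(-rng_px, rng_px + 1):
        for dx in range(-rng_px, rng_px + 1):
            y0, x0 = by + dy, bx + dx
            if 0 <= y0 and y0 + bs <= prev.shape[0] and 0 <= x0 and x0 + bs <= prev.shape[1]:
                sad = np.abs(block - prev[y0:y0 + bs, x0:x0 + bs].astype(int)).sum()
                if sad < best:
                    best, best_mv = sad, (dy, dx)
    return best_mv

prev = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
curr = np.roll(prev, shift=(1, 2), axis=(0, 1))   # frame shifted down by 1 and right by 2
print(full_search(prev, curr, bx=24, by=24))      # best match lies at (-1, -2) in the previous frame
```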

Journal ArticleDOI
TL;DR: The core and chip design methodology and specific design features are presented, focusing on techniques used to enable high-frequency operation; chip power, IR drop, and supply noise, which are key design focus areas, are also discussed.
Abstract: This paper describes the circuit and physical design features of the z196 processor chip, implemented in a 45 nm SOI technology. The chip contains 4 super-scalar, out-of-order processor cores, running at 5.2 GHz, on a die with an area of 512 mm2 containing an estimated 1.4 billion transistors. The core and chip design methodology and specific design features are presented, focusing on techniques used to enable high-frequency operation. In addition, chip power, IR drop, and supply noise are discussed, being key design focus areas. The chip's ground-breaking RAS features are also described, engineered for maximum reliability and system stability.

Book
14 Feb 2012
TL;DR: This book addresses low-power VLSI design, covering high-performance adders and multipliers, register files, embedded BiCMOS/ECL SRAMs, BiCMOS on-chip drivers, and inter-chip low-voltage swing transceivers.
Abstract: List of Figures; List of Tables; Preface; 1. Low-Power VLSI Design; 2. Low-Power High-Performance Adders; 3. Low-Power High-Performance Multipliers; 4. Low-Power Register File; 5. Low-Power Embedded BiCMOS/ECL SRAMs; 6. BiCMOS on-Chip Drivers; 7. Inter-Chip Low-Voltage Swing Transceivers; References; Index.

Journal ArticleDOI
TL;DR: This paper introduces a modeling and analysis framework based on continuous computations and zero-bit message channels, and employs this framework for the correctness & performance analysis of a distributed fault-tolerant clocking approach for Systems-on-Chip (SoCs).
Abstract: Classic distributed computing abstractions do not match well the reality of digital logic gates, which are the elementary building blocks of Systems-on-Chip (SoCs) and other Very Large Scale Integrated (VLSI) circuits: Massively concurrent, continuous computations undermine the concept of sequential processes executing sequences of atomic zero-time computing steps, and very limited computational resources at gate-level make even simple operations prohibitively costly. In this paper, we introduce a modeling and analysis framework based on continuous computations and zero-bit message channels, and employ this framework for the correctness & performance analysis of a distributed fault-tolerant clocking approach for Systems-on-Chip (SoCs). Starting out from a “classic” distributed Byzantine fault-tolerant tick generation algorithm, we show how to adapt it for direct implementation in clockless digital logic, and rigorously prove its correctness and derive analytic expressions for worst case performance metrics like synchronization precision and clock frequency. Rather than on absolute delay values, both the algorithm’s correctness and the achievable synchronization precision depend solely on the ratio of certain path delays. Since these ratios can be mapped directly to placement & routing constraints, there is typically no need for changing the algorithm when migrating to a faster implementation technology and/or when using a slightly different layout in an SoC.

Journal ArticleDOI
TL;DR: This paper implements a reversible 5421-to-binary code converter within the quantum circuit model, in which a computation is performed by a sequence of quantum gates that apply reversible transformations to a quantum mechanical analog of an n-bit register, referred to as an n-qubit register.
Abstract: Quantum computing is one of the emerging fields of technology that has been a guiding light for low power VLSI, low power CMOS design, optical information computing, DNA computing, bio-informatics and nano-technology. A quantum circuit is a model for quantum computation in which a computation is performed by a sequence of quantum gates, which are reversible transformations on a quantum mechanical analog of an n-bit register, referred to as an n-qubit register. Code converters are required day in and day out. In this paper, a reversible 5421-to-binary code converter is implemented.
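
The underlying (irreversible) arithmetic of a 5421-to-binary conversion is straightforward and is shown below as a reference model; the paper's contribution lies in realizing it with reversible gates, which this sketch does not attempt.

```python
def code_5421_to_decimal(b3, b2, b1, b0):
    # In the 5421 weighted code, the bit weights are 5, 4, 2 and 1.
    return 5 * b3 + 4 * b2 + 2 * b1 + 1 * b0

# Each decimal digit 0-9 has a 5421 representation; convert one digit to binary.
digit = code_5421_to_decimal(1, 0, 1, 1)   # 5 + 2 + 1 = 8
print(format(digit, '04b'))                # '1000'
```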

Journal ArticleDOI
TL;DR: A trellis-search based soft-input soft-output detection algorithm and its very large scale integration (VLSI) architecture for iterative multiple-input multiple-output (MIMO) receivers is proposed and an approximate Log-MAP algorithm is developed by using a small list of largest exponential terms to compute the LLR (log-likelihood ratio) values.
Abstract: In this paper, we propose a trellis-search based soft-input soft-output detection algorithm and its very large scale integration (VLSI) architecture for iterative multiple-input multiple-output (MIMO) receivers. We construct a trellis diagram to represent the search space of a transmitted MIMO signal. With the trellis model, we evenly distribute the workload of candidate searching among multiple trellis nodes for parallel processing. The search complexity is significantly reduced because the number of candidates is greatly limited at each trellis node. By leveraging the trellis structure, we develop an approximate Log-MAP algorithm by using a small list of the largest exponential terms to compute the LLR (log-likelihood ratio) values. The trellis-search based detector has fixed complexity and is very suitable for parallel VLSI implementation. As a case study, we have designed and synthesized a trellis-search based soft-input soft-output MIMO detector for a 4 × 4 16-QAM system using a 1.08 V TSMC 65 nm technology. The detector can achieve a maximum throughput of 1.7 Gb/s with a core area of 1.58 mm².
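
The LLR computation that the trellis search approximates can be written down directly for a small system; this exhaustive max-log sketch for a 2×2 QPSK channel (an assumption, much smaller than the paper's 4×4 16-QAM detector) shows what the hardware's candidate lists stand in for.

```python
import itertools
import numpy as np

# Gray-mapped QPSK: 2 bits per complex symbol (unit average energy).
CONST = {(0, 0): 1 + 1j, (0, 1): 1 - 1j, (1, 0): -1 + 1j, (1, 1): -1 - 1j}
CONST = {b: s / np.sqrt(2) for b, s in CONST.items()}

def maxlog_llrs(y, H, sigma2):
    """Exhaustive max-log LLRs for a 2x2 MIMO system with QPSK (4 bits total)."""
    nbits = 4
    best0 = np.full(nbits, np.inf)       # best metric over candidates with bit k = 0
    best1 = np.full(nbits, np.inf)       # best metric over candidates with bit k = 1
    for bits in itertools.product((0, 1), repeat=nbits):
        s = np.array([CONST[bits[0:2]], CONST[bits[2:4]]])
        metric = np.linalg.norm(y - H @ s) ** 2 / sigma2
        for k, b in enumerate(bits):
            if b == 0:
                best0[k] = min(best0[k], metric)
            else:
                best1[k] = min(best1[k], metric)
    return best1 - best0                 # LLR_k = min over bit=1 minus min over bit=0 (positive favours 0)

rng = np.random.default_rng(5)
H = (rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))) / np.sqrt(2)
tx_bits = (1, 0, 0, 1)
s = np.array([CONST[tx_bits[0:2]], CONST[tx_bits[2:4]]])
y = H @ s + 0.05 * (rng.standard_normal(2) + 1j * rng.standard_normal(2))
print(maxlog_llrs(y, H, sigma2=0.005))
```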

Patent
Benjamin J. Bowers, Matthew W. Baker, Anthony Correale, Irfan Rashid, Paul M. Steinmetz
01 Feb 2012
TL;DR: In this article, a behavioral representation of RTL HDL with one or more 1×N building blocks is presented, where the behavioral representation can be modified by using logic design tools, synthesis tools, physical design tools and timing analysis tools.
Abstract: Embodiments that design integrated circuits using a 1×N compiler in a closed-loop 1×N methodology are disclosed. Some embodiments create a physical design representation based on a behavioral representation of a design for an integrated circuit. The behavioral representation may comprise RTL HDL with one or more 1×N building blocks. The embodiments may alter elements of the 1×N building block by using logic design tools, synthesis tools, physical design tools, and timing analysis tools. Further embodiments comprise an apparatus having a first generator to generate a behavioral representation of a design for an integrated circuit, a second generator to generate a logical representation of the design, and a third generator to generate a physical design representation of the design, wherein the representation generators may create updated versions of the representations which reflect alterations made to 1×N building block elements.


Journal ArticleDOI
TL;DR: The output of the simulation for the two-stage opamp shows that the PSO technique is an accurate and promising approach in determining the device sizes in an analog circuit.
Abstract: Problem statement: More and more products rely on analog circuits to improve speed and reduce power consumption. Analog circuit design plays an important role in VLSI implementation. Analog circuit synthesis may be the most challenging and time-consuming task, because it consists not only of topology and layout synthesis but also of component sizing. Approach: A Particle Swarm Optimization (PSO) technique for the optimal design of analog circuits. Analog signal processing finds many applications and widely uses OpAmp-based amplifiers, mixers, comparators, and filters. Results: A two-stage opamp (Miller Operational Trans-conductance Amplifier (OTA)) is considered for synthesis, satisfying certain design specifications. Performance has been evaluated with the Simulation Program with Integrated Circuit Emphasis (SPICE) circuit simulator until optimal sizes of the transistors are found. Conclusion: The output of the simulation for the two-stage opamp shows that the PSO technique is an accurate and promising approach in determining the device sizes in an analog circuit.
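
Separated from SPICE and the opamp specifications, the PSO loop itself is compact. Below is a minimal textbook-style sketch with a stand-in objective function; in the flow described above, the objective would instead invoke a SPICE simulation of the two-stage opamp and score it against the design specifications.

```python
import numpy as np

def pso(objective, lower, upper, n_particles=30, iters=100, w=0.7, c1=1.5, c2=1.5):
    rng = np.random.default_rng(0)
    dim = len(lower)
    x = rng.uniform(lower, upper, size=(n_particles, dim))   # positions (candidate device sizes)
    v = np.zeros_like(x)                                      # velocities
    pbest, pbest_f = x.copy(), np.array([objective(p) for p in x])
    gbest = pbest[np.argmin(pbest_f)]
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lower, upper)                      # keep sizes inside the allowed bounds
        f = np.array([objective(p) for p in x])
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        gbest = pbest[np.argmin(pbest_f)]
    return gbest, pbest_f.min()

# Stand-in objective: in the actual flow this would run SPICE and score gain,
# bandwidth, phase margin, etc. of the two-stage opamp against its specifications.
objective = lambda p: np.sum((p - np.array([2.0, 0.5, 10.0])) ** 2)
best, best_f = pso(objective, lower=np.array([0.1, 0.1, 1.0]), upper=np.array([10.0, 5.0, 50.0]))
print(best, best_f)
```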