
Showing papers on "Very-large-scale integration" published in 2012


Journal ArticleDOI
TL;DR: It is shown that many paradigms and approaches borrowed from traditional above-threshold low-power VLSI design are actually incorrect, and common misconceptions in the ULP domain are debunked and replaced with technically sound explanations.
Abstract: In this paper, the state of the art in ultra-low power (ULP) VLSI design is presented within a unitary framework for the first time. A few general principles are first introduced to gain an insight into the design issues and the approaches that are specific to ULP systems, as well as to better understand the challenges that have to be faced in the foreseeable future. Intuitive understanding is accompanied by rigorous analysis for each key concept. The analysis ranges from the circuit to the micro-architectural level, and reference is given to process, physical and system levels when necessary. Among the main goals of this paper, it is shown that many paradigms and approaches borrowed from traditional above-threshold low-power VLSI design are actually incorrect. Accordingly, common misconceptions in the ULP domain are debunked and replaced with technically sound explanations.

363 citations


Proceedings ArticleDOI
01 Mar 2012
TL;DR: Low power dissipation during test application is becoming increasingly important in today's VLSI systems design and is a major goal in the future development of VLSI design.
Abstract: The System-On-Chip (SoC) revolution challenges both design and test engineers, especially in the area of power dissipation. Generally, a circuit or system consumes more power in test mode than in normal mode. This extra power consumption can give rise to severe hazards in circuit reliability or, in some cases, can provoke instant circuit damage. Moreover, it can create problems such as increased product cost, difficulty in performance verification, reduced autonomy of portable systems, and decrease of overall yield. Low power dissipation during test application is becoming increasingly important in today's VLSI systems design and is a major goal in the future development of VLSI design.

200 citations


Journal ArticleDOI
TL;DR: This paper presents two generic very-large-scale integration (VLSI) architectures that implement the approximate message passing (AMP) algorithm for sparse signal recovery and shows that AMP-T is superior to AMP-M with respect to silicon area, throughput, and power consumption, whereas AMP-M offers more flexibility.
Abstract: Sparse signal recovery finds use in a variety of practical applications, such as signal and image restoration and the recovery of signals acquired by compressive sensing. In this paper, we present two generic very-large-scale integration (VLSI) architectures that implement the approximate message passing (AMP) algorithm for sparse signal recovery. The first architecture, referred to as AMP-M, employs parallel multiply-accumulate units and is suitable for recovery problems based on unstructured (e.g., random) matrices. The second architecture, referred to as AMP-T, takes advantage of fast linear transforms, which arise in many real-world applications. To demonstrate the effectiveness of both architectures, we present corresponding VLSI and field-programmable gate array implementation results for an audio restoration application. We show that AMP-T is superior to AMP-M with respect to silicon area, throughput, and power consumption, whereas AMP-M offers more flexibility.
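
For readers unfamiliar with AMP, the iteration both architectures implement is compact. Below is a minimal NumPy sketch of an AMP variant with a soft-thresholding denoiser; the threshold schedule and toy problem are simplifying assumptions, and this is not the paper's fixed-point datapath.

```python
import numpy as np

def soft(x, t):
    # Soft-thresholding denoiser used for sparse recovery.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def amp(y, A, iters=30):
    # Approximate message passing: x <- eta(x + A^T z), with the Onsager
    # correction term added back into the residual z.
    m, n = A.shape
    x = np.zeros(n)
    z = y.copy()
    for _ in range(iters):
        theta = np.linalg.norm(z) / np.sqrt(m)       # simple threshold heuristic (assumption)
        r = x + A.T @ z                               # pseudo-data
        x_new = soft(r, theta)
        onsager = (z / m) * np.count_nonzero(x_new)   # Onsager reaction term
        z = y - A @ x_new + onsager
        x = x_new
    return x

# Toy usage: recover a sparse vector from random measurements.
rng = np.random.default_rng(0)
n, m, k = 256, 128, 10
A = rng.standard_normal((m, n)) / np.sqrt(m)
x0 = np.zeros(n)
x0[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
y = A @ x0
print(np.linalg.norm(amp(y, A) - x0) / np.linalg.norm(x0))
```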

82 citations


Journal ArticleDOI
TL;DR: A comparison study of frame-based ConvNet convolution processors and frame-free spike-based convolution processors, two neuro-inspired solutions for real-time visual processing.
Abstract: Most scene segmentation and categorization architectures for the extraction of features in images and patches make exhaustive use of 2D convolution operations for template matching, template search and denoising. Convolutional Neural Networks (ConvNets) are one example of such architectures that can implement general-purpose bio-inspired vision systems. In standard digital computers 2D convolutions are usually expensive in terms of resource consumption and impose severe limitations for efficient real-time applications. Nevertheless, neuro-cortex inspired solutions, like dedicated Frame-Based or Frame-Free Spiking ConvNet Convolution Processors, are advancing real-time visual processing. These two approaches share the neural inspiration, but each of them solves the problem in different ways. Frame-Based ConvNets process frame-by-frame video information in a very robust and fast way that requires using and sharing the available hardware resources (such as multipliers and adders). Hardware resources are fixed and time-multiplexed by fetching data in and out; thus memory bandwidth and size are important for good performance. On the other hand, spike-based convolution processors are a frame-free alternative that is able to perform convolution of a spike-based source of visual information with very low latency, which makes them ideal for very high speed applications. However, hardware resources need to be available all the time and cannot be time-multiplexed; thus, hardware should be modular, reconfigurable and expandable. Hardware implementations in both VLSI custom integrated circuits (digital and analog) and FPGA have already been used to demonstrate the performance of these systems. In this paper we present a comparison study of these two neuro-inspired solutions. A brief description of both systems is given, along with a discussion of their differences, advantages and drawbacks.
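
As a rough illustration of the two processing styles (not the authors' hardware), the sketch below contrasts a frame-based 2-D convolution with an event-driven version that, for each incoming spike address, adds a shifted copy of the kernel into an accumulator map; the kernel, map size, and event list are placeholders.

```python
import numpy as np
from scipy.signal import convolve2d

kernel = np.array([[0., 1., 0.],
                   [1., -4., 1.],
                   [0., 1., 0.]])           # example 3x3 kernel (assumption)

# Frame-based: the whole frame is available; multipliers/adders are time-multiplexed.
frame = np.random.rand(32, 32)
frame_result = convolve2d(frame, kernel, mode="same")

# Event-driven (frame-free): each address-event (x, y) immediately updates the
# neighbourhood of an accumulator map, so the latency per event is very low.
events = [(5, 7), (5, 8), (20, 3)]           # placeholder spike addresses
acc = np.zeros((32, 32))
kh, kw = kernel.shape
for (x, y) in events:
    for dx in range(kh):
        for dy in range(kw):
            xi, yi = x + dx - kh // 2, y + dy - kw // 2
            if 0 <= xi < acc.shape[0] and 0 <= yi < acc.shape[1]:
                acc[xi, yi] += kernel[dx, dy]
```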

68 citations


Proceedings ArticleDOI
31 Dec 2012
TL;DR: The proposed analog-to-digital conversion scheme accumulates pre-synaptic weights of a neuron efficiently and reduces silicon area by using only one shared adder for processing LIF operations of N neurons.
Abstract: This paper presents a reconfigurable digital neuromorphic VLSI architecture for large scale spiking neural networks. We leverage the memristor nanodevice to build an N×N crossbar array to store synaptic weights with significantly reduced area cost. Our design integrates N digital leaky integrate-and-fire (LIF) neurons and the respective on-line learning circuits for a spike timing-dependent learning rule. The proposed analog-to-digital conversion scheme accumulates pre-synaptic weights of a neuron efficiently and reduces silicon area by using only one shared adder for processing LIF operations of N neurons. The proposed architecture is shown to be both area and power efficient. With 256 neurons and 64K synapses, the power dissipation and the area of our design are evaluated as 9.46 mW and 0.66 mm², respectively, in a 90-nm CMOS technology.
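
A software analogue of the datapath described above might look like the sketch below: an N×N weight matrix stands in for the memristor crossbar, and a single sequential accumulation loop plays the role of the one shared adder serving all N digital LIF neurons. The constants are illustrative assumptions.

```python
import numpy as np

N = 256
rng = np.random.default_rng(1)
weights = rng.uniform(0.0, 1.0, size=(N, N))    # stands in for the memristor crossbar
v = np.zeros(N)                                  # membrane potentials
V_TH, LEAK = 10.0, 0.98                          # threshold and leak factor (assumptions)

def step(spikes_in, v):
    """One discrete LIF update; the inner loop mimics a single time-shared adder."""
    out = np.zeros(N, dtype=bool)
    for j in range(N):                           # neurons processed sequentially
        acc = 0.0
        for i in np.nonzero(spikes_in)[0]:       # accumulate pre-synaptic weights
            acc += weights[i, j]
        v[j] = LEAK * v[j] + acc                 # leaky integration
        if v[j] >= V_TH:                         # fire and reset
            out[j] = True
            v[j] = 0.0
    return out, v

spikes = rng.random(N) < 0.05                    # random input spike vector
out, v = step(spikes, v)
```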

65 citations


Journal ArticleDOI
TL;DR: This work presents an efficient application specific integrated circuit chip design for sequential minimal optimization, implemented as an intellectual property core suitable for use in an SVM-based recognition system on a chip.
Abstract: The sequential minimal optimization (SMO) algorithm has been extensively employed to train the support vector machine (SVM). This work presents an efficient application specific integrated circuit chip design for sequential minimal optimization. This chip is implemented as an intellectual property core, suitable for use in an SVM-based recognition system on a chip. The proposed SMO chip was tested and found to be fully functional, using a prototype system based on the Altera DE2 board with a Cyclone II 2C70 field-programmable gate array.
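
At the heart of SMO is the analytic optimization of one pair of Lagrange multipliers at a time; a plain-Python sketch of that two-variable update (following the textbook formulas, without the working-set selection heuristics and not tied to the chip's datapath) is shown below.

```python
import numpy as np

def smo_pair_update(i, j, alpha, y, K, errors, C=1.0):
    """Analytically optimize the pair (alpha_i, alpha_j); returns updated values.
    K is the kernel matrix and errors[k] = f(x_k) - y_k. Simplified sketch."""
    if i == j:
        return alpha[i], alpha[j]
    # Feasible box for alpha_j given the equality constraint sum(alpha * y) = const.
    if y[i] != y[j]:
        L, H = max(0.0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])
    else:
        L, H = max(0.0, alpha[i] + alpha[j] - C), min(C, alpha[i] + alpha[j])
    eta = K[i, i] + K[j, j] - 2.0 * K[i, j]       # curvature along the constraint
    if eta <= 0 or L == H:
        return alpha[i], alpha[j]                  # skip degenerate pairs in this sketch
    aj = alpha[j] + y[j] * (errors[i] - errors[j]) / eta
    aj = np.clip(aj, L, H)
    ai = alpha[i] + y[i] * y[j] * (alpha[j] - aj)
    return ai, aj
```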

63 citations


Journal ArticleDOI
TL;DR: In this article, a 6-input lookup table (LUT) circuit using nonvolatile logic-in-memory (LIM) architecture with series/parallel-connected magnetic tunnel junction (MTJ) devices is proposed for a standby-power-free field-programmable gate array.
Abstract: A compact 6-input lookup table (LUT) circuit using nonvolatile logic-in-memory (LIM) architecture with series/parallel-connected magnetic tunnel junction (MTJ) devices is proposed for a standby-power-free field-programmable gate array. Series/parallel connections of MTJ devices make it possible not only to reduce the effect of resistance variation, but also to enhance the programmability of resistance values, which achieves a sufficient sensing margin even when process variation is serious in the recent nanometer-scaled VLSI. Moreover, the additional MTJ devices do not increase the effective chip area because the configuration circuit using MTJ devices is simplified and these devices are stacked over the CMOS plane. As a result, the transistor counts of the proposed circuit are reduced by 62% in comparison with those of a conventional nonvolatile LUT circuit where CMOS-only-based volatile static random access memory cell circuits are replaced by MTJ-based nonvolatile ones.

54 citations


Journal ArticleDOI
TL;DR: A new class of applications that is inherently capable of absorbing some degree of vulnerability and providing fault tolerance (FT) through its natural properties is categorized in this paper, and relaxed fault-tolerant (RFT) techniques are developed for the VLSI implementation of such imprecision-tolerant applications.
Abstract: Reliability should be identified as the most important challenge in future nano-scale very large scale integration (VLSI) implementation technologies for the development of complex integrated systems. Normally, fault tolerance (FT) in a conventional system is achieved by increasing its redundancy, which also implies higher implementation costs and lower performance that sometimes makes it even infeasible. In contrast to custom approaches, a new class of applications is categorized in this paper, which is inherently capable of absorbing some degrees of vulnerability and providing FT based on their natural properties. Neural networks are good indicators of imprecision-tolerant applications. We have also proposed a new class of FT techniques called relaxed fault-tolerant (RFT) techniques which are developed for VLSI implementation of imprecision-tolerant applications. The main advantage of RFT techniques with respect to traditional FT solutions is that they exploit inherent FT of different applications to reduce their implementation costs while improving their performance. To show the applicability as well as the efficiency of the RFT method, the experimental results for implementation of a face-recognition computationally intensive neural network and its corresponding RFT realization are presented in this paper. The results demonstrate promising higher performance of artificial neural network VLSI solutions for complex applications in faulty nano-scale implementation environments.

53 citations


Journal ArticleDOI
TL;DR: The performance in terms of the processing speed of the architecture designed based on the proposed scheme is superior to those of the architectures designed using other existing schemes, and it has similar or lower hardware consumption.
Abstract: In this paper, a scheme for the design of a high-speed pipeline VLSI architecture for the computation of the 2-D discrete wavelet transform (DWT) is proposed. The main focus in the development of the architecture is on providing a high operating frequency and a small number of clock cycles along with an efficient hardware utilization by maximizing the inter-stage and intra-stage computational parallelism for the pipeline. The inter-stage parallelism is enhanced by optimally mapping the computational task of multi decomposition levels to the stages of the pipeline and synchronizing their operations. The intra-stage parallelism is enhanced by dividing the 2-D filtering operation into four subtasks that can be performed independently in parallel and minimizing the delay of the critical path of bit-wise adder networks for performing the filtering operation. To validate the proposed scheme, a circuit is designed, simulated, and implemented in FPGA for the 2-D DWT computation. The results of the implementation show that the circuit is capable of operating with a maximum clock frequency of 134 MHz and processing 1022 frames of size 512 × 512 per second with this operating frequency. It is shown that the performance in terms of the processing speed of the architecture designed based on the proposed scheme is superior to those of the architectures designed using other existing schemes, and it has similar or lower hardware consumption.
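
The separable row/column structure that the pipeline exploits can be illustrated with a one-level Haar 2-D DWT in NumPy; this is only a reference model of the computation, not the paper's filters or pipeline scheduling.

```python
import numpy as np

def haar_1d(x):
    # One level of the 1-D Haar transform along the last axis (orthonormal).
    a = (x[..., 0::2] + x[..., 1::2]) / np.sqrt(2.0)   # approximation (low-pass)
    d = (x[..., 0::2] - x[..., 1::2]) / np.sqrt(2.0)   # detail (high-pass)
    return a, d

def haar_2d(img):
    # Separable 2-D DWT: filter rows, then columns, producing four subbands.
    a_rows, d_rows = haar_1d(img)
    ll, lh = haar_1d(a_rows.T)
    hl, hh = haar_1d(d_rows.T)
    return ll.T, lh.T, hl.T, hh.T

img = np.random.rand(512, 512)
LL, LH, HL, HH = haar_2d(img)        # four 256x256 subbands, as in one DWT level
```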

51 citations


Journal ArticleDOI
TL;DR: A very-large-scale integration (VLSI) friendly electrocardiogram (ECG) QRS detector for body sensor networks, in which baseline wander is removed by a mathematical morphological method and multipixel modulus accumulation acts as a low-pass filter to enhance the QRS complex and improve the signal-to-noise ratio.
Abstract: This paper aims to present a very-large-scale integration (VLSI) friendly electrocardiogram (ECG) QRS detector for body sensor networks. Baseline wander and background noise are removed from the original ECG signal by a mathematical morphological method. Then multipixel modulus accumulation is employed to act as a low-pass filter to enhance the QRS complex and improve the signal-to-noise ratio. The performance of the algorithm is evaluated with the standard MIT-BIH arrhythmia database and wearable exercise ECG data. A corresponding power- and area-efficient VLSI architecture is designed and implemented on a commercial nano-FPGA. A high detection rate and high speed demonstrate the effectiveness of the proposed detector.
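
A software approximation of the first two processing stages, morphological baseline removal followed by a short smoothing accumulation that emphasizes the QRS complex, might look like the SciPy sketch below; the structuring-element length, window size, and the derivative-based stand-in for multipixel modulus accumulation are assumptions, not the paper's exact operators.

```python
import numpy as np
from scipy.ndimage import grey_opening, grey_closing, uniform_filter1d

def remove_baseline(ecg, se_len=71):
    # Opening then closing with a flat structuring element estimates the
    # slowly varying baseline, which is subtracted from the signal.
    baseline = grey_closing(grey_opening(ecg, size=se_len), size=se_len)
    return ecg - baseline

def enhance_qrs(ecg, win=5):
    # Accumulating the absolute derivative over a short window emphasizes the
    # steep QRS slopes (a stand-in for the paper's modulus accumulation).
    return uniform_filter1d(np.abs(np.diff(ecg, prepend=ecg[0])), size=win)

fs = 360                                   # MIT-BIH sampling rate, Hz
t = np.arange(10 * fs) / fs
ecg = 0.3 * np.sin(2 * np.pi * 0.3 * t)    # synthetic baseline wander only (placeholder)
clean = remove_baseline(ecg)
feature = enhance_qrs(clean)
```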

50 citations


Journal ArticleDOI
TL;DR: The design and implementation of three neuromorphic integrated circuits developed for the COLAMN ("Novel Computing Architecture for Cognitive Systems based on the Laminar Microcircuitry of the Neocortex") project are overviewed.

Journal ArticleDOI
TL;DR: This paper presents the design and characterization of 12 full-adder circuits in the IBM 90-nm process, including three new full-adders circuits using the recently proposed split-path data driven dynamic logic.
Abstract: This paper presents the design and characterization of 12 full-adder circuits in the IBM 90-nm process. These include three new full-adder circuits using the recently proposed split-path data driven dynamic logic. Based on the logic function realized, the adders were characterized for performance and power consumption when operated under various supply voltages and fan-out loads. The adders were then further deployed in a 32 bit ripple carry adder and an 8×4 multiplier to evaluate the impact of sum and carry propagation delays on the performance and power of these systems. Performance characterization of the adder circuits in the presence of process and voltage variations was also performed through Monte Carlo simulations. Besides analyzing and comparing circuit performance, the possible impact of the choice of logic function has also been underlined in this study.
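
Whatever the circuit family, each of the 12 designs realizes the same Boolean full-adder function; the tiny check below enumerates its truth table (purely illustrative, not tied to the IBM 90-nm cells).

```python
# Full-adder logic: sum = a ^ b ^ cin, carry-out = majority(a, b, cin).
for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            s = a ^ b ^ cin
            cout = (a & b) | (a & cin) | (b & cin)
            assert 2 * cout + s == a + b + cin   # arithmetic check of the logic
```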

Journal ArticleDOI
TL;DR: A useful framework for building bio-inspired systems in real-time environments with reduced computational complexity is presented, and a complete quantization study of a neuromorphic robust optical-flow architecture, based on properties found in the cortical motion pathway, is performed.

Proceedings ArticleDOI
01 Jul 2012
TL;DR: Fundamental information-theoretic bounds are provided on the required circuit wiring complexity and power consumption for encoding and decoding of error-correcting codes, showing that there is a fundamental tradeoff between the transmit and encoding/decoding power, and that bounded transmit-power schemes face an even faster divergence of total power.
Abstract: We provide fundamental information-theoretic bounds on the required circuit wiring complexity and power consumption for encoding and decoding of error-correcting codes. These bounds hold for all codes and all encoding and decoding algorithms implemented within the paradigm of our VLSI model. This model essentially views computation on a 2-D VLSI circuit as a computation on a network of connected nodes. The bounds are derived based on analyzing information flow in the circuit. They are then used to show that there is a fundamental tradeoff between the transmit and encoding/decoding power, and that the total (transmit + encoding + decoding) power must diverge to infinity at least as fast as the cube root of log(1/P_e), where P_e is the average block-error probability. On the other hand, for bounded transmit-power schemes, the total power must diverge to infinity at least as fast as the square root of log(1/P_e) due to the burden of encoding/decoding.
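
Restated compactly as formulas (a transcription of the scaling claims above, with P_e the average block-error probability and constant factors omitted):

```latex
P_{\text{total}} = P_{\text{transmit}} + P_{\text{enc}} + P_{\text{dec}}
  \;\gtrsim\; \left(\log \tfrac{1}{P_e}\right)^{1/3}
  \quad \text{(general schemes)},
\qquad
P_{\text{total}} \;\gtrsim\; \left(\log \tfrac{1}{P_e}\right)^{1/2}
  \quad \text{(bounded transmit power)}.
```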

Journal ArticleDOI
TL;DR: It is implied from the current literature that an appropriate choice of leakage power minimization technique for a specific application can only be made effectively by a VLSI circuit designer following a sequential analytical approach.

Book
13 Nov 2012
TL;DR: In this book, the authors discuss the physical and geometrical effects on the behavior of the MOS transistor and the effects of scaling on MOS IC design and its consequences for the roadmap.
Abstract: 1. Basic Principles; 2. Physical and geometrical effects on the behaviour of the MOS transistor; 3. Manufacture of MOS devices; 4. CMOS circuits; 5. Special circuits, devices and technologies; 6. Memories; 7. Very Large Scale Integration (VLSI) and ASICs; 8. Low power, a hot topic in IC design; 9. Circuit reliability and signal integrity in deep-submicron designs; 10. Testing, debugging, yield and packaging; 11. Effects of scaling on MOS IC design and consequences for the roadmap.

Journal ArticleDOI
TL;DR: A singular value decomposition (SVD) algorithm with superlinear-convergence rate, which is suitable for the beamforming mechanism in MIMO-OFDM channels with short coherent time, or short training sequence, is proposed.
Abstract: In this paper, we propose a singular value decomposition (SVD) algorithm with a superlinear convergence rate, which is suitable for the beamforming mechanism in MIMO-OFDM channels with short coherence time or a short training sequence. The proposed superlinear-convergence SVD (SL-SVD) algorithm has the following features: 1) superlinear convergence rate; 2) the ability to be extended to smaller numbers of transmit and receive antennas; 3) insensitivity to dynamic range problems during the iterative process in hardware implementations; and 4) low computational cost. We verify the proposed design with a VLSI implementation in 90 nm CMOS technology. The post-layout design has a 0.48 core area and 18 mW power consumption. Our design can achieve 7 M channel-matrices/s and can be extended to deal with different transmit and receive antenna sets.
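
For context, the beamforming use of an SVD can be shown with NumPy's reference decomposition: the dominant right and left singular vectors of the channel matrix form the transmit and receive beamformers. The SL-SVD iteration itself is not reproduced, and the 4×4 channel below is an assumption.

```python
import numpy as np

rng = np.random.default_rng(7)
H = (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))) / np.sqrt(2)

U, s, Vh = np.linalg.svd(H)        # H = U diag(s) V^H
v1 = Vh.conj().T[:, 0]             # transmit beamforming vector (right singular vector)
u1 = U[:, 0]                       # receive combining vector (left singular vector)

# Beamforming over the dominant eigenmode yields an effective scalar channel
# whose gain equals the largest singular value.
effective_gain = u1.conj() @ H @ v1
assert np.isclose(effective_gain.real, s[0])
assert np.isclose(effective_gain.imag, 0.0, atol=1e-12)
```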

Book ChapterDOI
01 Jan 2012
TL;DR: This chapter provides an overview of conic optimization models for facility layout and VLSI floorplanning problems, and finds that the semidefinite optimization approaches can provide global optimal solutions for instances with up to 40 facilities, and tight global bounds for instances with up to 100 facilities.
Abstract: This chapter provides an overview of conic optimization models for facility layout and VLSI floorplanning problems. We focus on two classes of problems to which conic optimization approaches have been successfully applied, namely the single-row facility layout problem, and fixed-outline floorplanning in VLSI circuit design. For the former, a close connection to the cut polytope has been exploited in positive semidefinite and integer programming approaches. In particular, the semidefinite optimization approaches can provide global optimal solutions for instances with up to 40 facilities, and tight global bounds for instances with up to 100 facilities. For the floorplanning problem, a conic optimization model provided the first non-trivial lower bounds in the literature.

Book
12 Jun 2012
TL;DR: In this book, the authors present a unified approach to yield analysis of fault-free or fault-tolerant VLSI manufacturing, including the effect of defect density on yield.
Abstract: 1. Models for VLSI Manufacturing Yield: Fault-Free or Fault-Tolerant VLSI Manufacturing; Yield Models - Comparative Study. 2. Models for Defects and Yield: A Unified Approach to Yield Analysis of Defect Tolerant Circuits; Systematic Extraction of Critical Areas From IC Layouts; The Effect on Yield of Clustering and Radial Variations in Defect Density. 3. Implementation of Wafer Scale Integration: Practical Experiences in the Design of a Wafer Scale 2-D Array; Yield Evaluation of a Soft-Configurable WSI Switch Network; ASP Modules: WSI Building-Blocks for Cost-Effective Parallel Computing. 4. Fault Tolerance: Fault-Tolerant k-out-of-n Logic Unit Network With Minimum Interconnection; Extended Duplex Fault Tolerant System With Integrated Control Flow Checking; Experience in Functional Test and Fault Coverage in a Silicon Compiler. 5. Array Processors: APES: An Evaluation Environment of Fault-Tolerance Capabilities of Array Processors; Comparison of Reconfiguration Schemes for Defect Tolerant Mesh Arrays; An Integer Linear Programming Approach to General Fault Covering Problems; Probabilistic Analysis of Memory Repair and Reconfiguration Heuristics; Arithmetic-Based Diagnostics in VLSI Array Processors. 6. New Approaches and Issues: Yield Improvement Through X-RAY Lithography; Reliability Analysis of Application-Specific Architectures; Fault Tolerance in Analog VLSI: Case Study of a Focal Plane Processor. 7. Yield and Manufacturing Defects: Yield Model With Critical Geometry Analysis for Yield Projection from Test Sites on a Wafer Basis With Confidence Limits; SRAM/TEG Yield Methodology; A Fault Detection and Tolerance Tradeoff Evaluation Methodology for VLSI Systems. 8. Designs for Wafer Scale Integration: A Hypercube Design on WSI; An Efficient Reconfiguration Scheme for WSI of Cube-Connected Cycles With Bounded Channel Width; A Communication Scheme for Defect Tolerant Arrays.

Book
19 Apr 2012
TL;DR: An edited volume on VLSI architectures and algorithms for pattern recognition and image processing, covering systolic arrays for convolution and resampling, minimum-distance and syntactic pattern recognition, wavefront and cellular logic processors, VLSI-based image resampling for electronic publishing, and a VLSI-based multicomputer architecture for dynamic scene analysis.
Abstract: 1.1 VLSI System; 1.2 VLSI Algorithms; 1.3 Summary of Book; References. Part I: General VLSI Design Considerations. 2. One-Dimensional Systolic Arrays for Multidimensional Convolution and Resampling: 2.1 Background; 2.2 Systolic Convolution Arrays; 2.3 Variants in the Convolution Problem; 2.4 Implementation; 2.5 Concluding Remarks; References. 3. VLSI Arrays for Pattern Recognition and Image Processing: I/O Bandwidth Considerations: 3.1 Background; 3.2 Arrays for Matrix Operations; 3.3 Arrays for Pattern Analysis; 3.4 Image-Processing Array; 3.5 Conclusions; References. Part II: VLSI Systems for Pattern Recognition. 4. VLSI Arrays for Minimum-Distance Classifications: 4.1 Minimum-Distance Classification; 4.2 Vector Distances; 4.3 String Distances; 4.4 Examples of Application; 4.5 Summary; References. 5. Design of a Pattern Cluster Using Two-Level Pipelined Systolic Array: 5.1 Background; 5.2 Description of Squared-Error Pattern Clustering; 5.3 The Systolic Pattern Clustering Array; 5.4 System Operating Characteristics; 5.5 Conclusion; References. 6. VLSI Arrays for Syntactic Pattern Recognition: 6.1 Pattern Description and Recognition; 6.2 Syntactic Pattern Recognition; 6.3 VLSI Implementation; 6.4 Simulation and Examples; 6.5 Concluding Remarks; References. Part III: VLSI Systems for Image Processing. 7. Concurrent Systems for Image Analysis: 7.1 Background; 7.2 Processing Requirements for Image Analysis; 7.3 Concurrent VLSI Architectures; 7.4 Conclusions; References. 8. VLSI Wavefront Arrays for Image Processing: 8.1 Background; 8.2 Parallel Computers for Image Processing; 8.3 Design of Pipelined Array Processors; 8.4 Image-Processing Applications; 8.5 Wafer-Scale Integrated System; 8.6 Conclusion; References. 9. Curve Detection in VLSI: 9.1 Background; 9.2 Montanari's Algorithm; 9.3 Systolic Architectures; 9.4 A VLSI Design of the Column Pipeline Chip; 9.5 SIMD Array Algorithm; 9.6 Concluding Remarks; References. 10. VLSI Implementation of Cellular Logic Processors: 10.1 Cellular Logic Processors; 10.2 Binary Neighborhood Functions; 10.3 CELLSCAN; 10.4 The Golay Parallel Pattern Transform; 10.5 The diff3-GLOPR; 10.6 The Preston-Herron Processor (PHP); 10.7 The LSI/PHP; References. 11. Design of VLSI Based Multicomputer Architecture for Dynamic Scene Analysis: 11.1 Background; 11.2 Dynamic Scene Analysis Algorithm; 11.3 Existing Multicomputer Architecture; 11.4 Scheduling and Parameters of Interest; 11.5 Performance Evaluation; 11.6 Conclusion; References. 12. VLSI-Based Image Resampling for Electronic Publishing: 12.1 Introduction to the "Electronic Darkroom"; 12.2 Overview of System Design Concepts; 12.3 Resampling Algorithms; 12.4 System Architecture and Performance; 12.5 Trade-Offs and Conclusions; References.

Proceedings ArticleDOI
10 Jun 2012
TL;DR: This paper demonstrates a strategy to implement temporal delays in hardware spiking neural networks distributed across multiple Very Large Scale Integration (VLSI) chips by exploiting the inherent device mismatch present in the analog circuits that implement silicon neurons and synapses inside the chips.
Abstract: Axonal delays are used in neural computation to implement faithful models of biological neural systems, and in spiking neural networks models to solve computationally demanding tasks. While there is an increasing number of software simulations of spiking neural networks that make use of axonal delays, only a small fraction of currently existing hardware neuromorphic systems supports them. In this paper we demonstrate a strategy to implement temporal delays in hardware spiking neural networks distributed across multiple Very Large Scale Integration (VLSI) chips. This is achieved by exploiting the inherent device mismatch present in the analog circuits that implement silicon neurons and synapses inside the chips, and the digital communication infrastructure used to configure the network topology and transmit the spikes across chips. We present an example of a recurrent VLSI spiking neural network that employs axonal delays and demonstrate how the proposed strategy efficiently implements them in hardware.
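
The strategy of using per-synapse mismatch as a source of axonal delays can be mimicked in software: each connection is assigned a fixed random delay and spikes are delivered through an event queue. The sketch below is a toy model; the delay distribution, weights, and threshold are placeholders rather than measurements of the chips.

```python
import heapq
import numpy as np

rng = np.random.default_rng(3)
n = 8
delay = rng.lognormal(mean=1.0, sigma=0.3, size=(n, n))   # per-synapse delays in ms (placeholder)
weight = rng.uniform(0.1, 0.3, size=(n, n))
v = np.zeros(n)
THRESH = 0.5

def emit_spike(t, src, queue):
    # A spike from `src` reaches each target neuron after its own synaptic delay.
    for dst in range(n):
        heapq.heappush(queue, (t + delay[src, dst], dst, weight[src, dst]))

queue = []
emit_spike(0.0, 0, queue)            # seed spike from neuron 0
fired = []
while queue and len(fired) < 50:
    t, dst, w = heapq.heappop(queue)
    v[dst] += w                      # integrate the delayed synaptic input
    if v[dst] >= THRESH:
        v[dst] = 0.0
        fired.append((t, dst))
        emit_spike(t, dst, queue)
```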

Journal ArticleDOI
27 Sep 2012 - Sensors
TL;DR: The developed low-cost system is practical for real-time throughput and reduced power consumption and is useful in robotic applications, such as tracking, navigation using an unmanned vehicle, or as part of a more complex system.
Abstract: This work presents the implementation of a matching-based motion estimation sensor on a Field Programmable Gate Array (FPGA) and NIOS II microprocessor applying a C to Hardware (C2H) acceleration paradigm. The design, which involves several matching algorithms, is mapped using Very Large Scale Integration (VLSI) technology. These algorithms, as well as the hardware implementation, are presented here together with an extensive analysis of the resources needed and the throughput obtained. The developed low-cost system is practical for real-time throughput and reduced power consumption and is useful in robotic applications, such as tracking, navigation using an unmanned vehicle, or as part of a more complex system.
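
As background, the core operation in matching-based motion estimation is a block search that minimizes a matching cost such as the sum of absolute differences (SAD); a minimal full-search reference model is sketched below (block size, search range, and the synthetic frames are assumptions, and the C2H/NIOS II mapping is not addressed).

```python
import numpy as np

def full_search(prev, curr, bx, by, bs=8, rng_px=4):
    """Return the (dy, dx) displacement into the previous frame minimizing SAD."""
    block = curr[by:by + bs, bx:bx + bs].astype(int)
    best, best_mv = np.inf, (0, 0)
    for dy in range(-rng_px, rng_px + 1):
        for dx in range(-rng_px, rng_px + 1):
            y0, x0 = by + dy, bx + dx
            if 0 <= y0 and y0 + bs <= prev.shape[0] and 0 <= x0 and x0 + bs <= prev.shape[1]:
                sad = np.abs(block - prev[y0:y0 + bs, x0:x0 + bs].astype(int)).sum()
                if sad < best:
                    best, best_mv = sad, (dy, dx)
    return best_mv

prev = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
curr = np.roll(prev, shift=(1, 2), axis=(0, 1))   # frame shifted down by 1 and right by 2
print(full_search(prev, curr, bx=24, by=24))      # best match lies at (-1, -2) in the previous frame
```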

Journal ArticleDOI
TL;DR: The core and chip design methodology and specific design features are presented, focusing on techniques used to enable high-frequency operation; chip power, IR drop, and supply noise, which are key design focus areas, are also discussed.
Abstract: This paper describes the circuit and physical design features of the z196 processor chip, implemented in a 45 nm SOI technology. The chip contains 4 super-scalar, out-of-order processor cores, running at 5.2 GHz, on a die with an area of 512 mm2 containing an estimated 1.4 billion transistors. The core and chip design methodology and specific design features are presented, focusing on techniques used to enable high-frequency operation. In addition, chip power, IR drop, and supply noise are discussed, being key design focus areas. The chip's ground-breaking RAS features are also described, engineered for maximum reliability and system stability.

Book
14 Feb 2012
TL;DR: This book addresses low-power VLSI design, covering high-performance adders and multipliers, register files, embedded BiCMOS/ECL SRAMs, BiCMOS on-chip drivers, and inter-chip low-voltage swing transceivers.
Abstract: List of Figures; List of Tables; Preface; 1. Low-Power VLSI Design; 2. Low-Power High-Performance Adders; 3. Low-Power High-Performance Multipliers; 4. Low-Power Register File; 5. Low-Power Embedded BiCMOS/ECL SRAMs; 6. BiCMOS on-Chip Drivers; 7. Inter-Chip Low-Voltage Swing Transceivers; References; Index.

Journal ArticleDOI
TL;DR: This paper introduces a modeling and analysis framework based on continuous computations and zero-bit message channels, and employs this framework for the correctness & performance analysis of a distributed fault-tolerant clocking approach for Systems-on-Chip (SoCs).
Abstract: Classic distributed computing abstractions do not match well the reality of digital logic gates, which are the elementary building blocks of Systems-on-Chip (SoCs) and other Very Large Scale Integrated (VLSI) circuits: Massively concurrent, continuous computations undermine the concept of sequential processes executing sequences of atomic zero-time computing steps, and very limited computational resources at gate-level make even simple operations prohibitively costly. In this paper, we introduce a modeling and analysis framework based on continuous computations and zero-bit message channels, and employ this framework for the correctness & performance analysis of a distributed fault-tolerant clocking approach for Systems-on-Chip (SoCs). Starting out from a “classic” distributed Byzantine fault-tolerant tick generation algorithm, we show how to adapt it for direct implementation in clockless digital logic, and rigorously prove its correctness and derive analytic expressions for worst case performance metrics like synchronization precision and clock frequency. Rather than on absolute delay values, both the algorithm’s correctness and the achievable synchronization precision depend solely on the ratio of certain path delays. Since these ratios can be mapped directly to placement & routing constraints, there is typically no need for changing the algorithm when migrating to a faster implementation technology and/or when using a slightly different layout in an SoC.

Journal ArticleDOI
TL;DR: This paper implements a reversible 5421-to-binary code converter within the quantum circuit model, in which a computation is performed by a sequence of quantum gates that apply reversible transformations to a quantum mechanical analog of an n-bit register, referred to as an n-qubit register.
Abstract: Quantum computing is one of the emerging fields of technology that has been a guiding light for low power VLSI, low power CMOS design, optical information computing, DNA computing, bio-informatics and nano-technology. A quantum circuit is a model for quantum computation in which a computation is performed by a sequence of quantum gates, which are reversible transformations on a quantum mechanical analog of an n-bit register, referred to as an n-qubit register. Code converters are required day in and day out. In this paper, a reversible 5421-to-binary code converter is implemented.
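
The underlying (irreversible) arithmetic of a 5421-to-binary conversion is straightforward and is shown below as a reference model; the paper's contribution lies in realizing it with reversible gates, which this sketch does not attempt.

```python
def code_5421_to_decimal(b3, b2, b1, b0):
    # In the 5421 weighted code, the bit weights are 5, 4, 2 and 1.
    return 5 * b3 + 4 * b2 + 2 * b1 + 1 * b0

# Each decimal digit 0-9 has a 5421 representation; convert one digit to binary.
digit = code_5421_to_decimal(1, 0, 1, 1)   # 5 + 2 + 1 = 8
print(format(digit, '04b'))                # '1000'
```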

Journal ArticleDOI
TL;DR: A trellis-search based soft-input soft-output detection algorithm and its very large scale integration (VLSI) architecture for iterative multiple-input multiple-output (MIMO) receivers is proposed and an approximate Log-MAP algorithm is developed by using a small list of largest exponential terms to compute the LLR (log-likelihood ratio) values.
Abstract: In this paper, we propose a trellis-search based soft-input soft-output detection algorithm and its very large scale integration (VLSI) architecture for iterative multiple-input multiple-output (MIMO) receivers. We construct a trellis diagram to represent the search space of a transmitted MIMO signal. With the trellis model, we evenly distribute the workload of candidate searching among multiple trellis nodes for parallel processing. The search complexity is significantly reduced because the number of candidates is greatly limited at each trellis node. By leveraging the trellis structure, we develop an approximate Log-MAP algorithm by using a small list of the largest exponential terms to compute the LLR (log-likelihood ratio) values. The trellis-search based detector has fixed complexity and is very suitable for parallel VLSI implementation. As a case study, we have designed and synthesized a trellis-search based soft-input soft-output MIMO detector for a 4 × 4 16-QAM system using a 1.08 V TSMC 65 nm technology. The detector can achieve a maximum throughput of 1.7 Gb/s with a core area of 1.58 mm².
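
The LLR computation that the trellis search approximates can be written down directly for a small system; this exhaustive max-log sketch for a 2×2 QPSK channel (an assumption, much smaller than the paper's 4×4 16-QAM detector) shows what the hardware's candidate lists stand in for.

```python
import itertools
import numpy as np

# Gray-mapped QPSK: 2 bits per complex symbol (unit average energy).
CONST = {(0, 0): 1 + 1j, (0, 1): 1 - 1j, (1, 0): -1 + 1j, (1, 1): -1 - 1j}
CONST = {b: s / np.sqrt(2) for b, s in CONST.items()}

def maxlog_llrs(y, H, sigma2):
    """Exhaustive max-log LLRs for a 2x2 MIMO system with QPSK (4 bits total)."""
    nbits = 4
    best0 = np.full(nbits, np.inf)       # best metric over candidates with bit k = 0
    best1 = np.full(nbits, np.inf)       # best metric over candidates with bit k = 1
    for bits in itertools.product((0, 1), repeat=nbits):
        s = np.array([CONST[bits[0:2]], CONST[bits[2:4]]])
        metric = np.linalg.norm(y - H @ s) ** 2 / sigma2
        for k, b in enumerate(bits):
            if b == 0:
                best0[k] = min(best0[k], metric)
            else:
                best1[k] = min(best1[k], metric)
    return best1 - best0                 # LLR_k = min over bit=1 minus min over bit=0 (positive favours 0)

rng = np.random.default_rng(5)
H = (rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))) / np.sqrt(2)
tx_bits = (1, 0, 0, 1)
s = np.array([CONST[tx_bits[0:2]], CONST[tx_bits[2:4]]])
y = H @ s + 0.05 * (rng.standard_normal(2) + 1j * rng.standard_normal(2))
print(maxlog_llrs(y, H, sigma2=0.005))
```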

Patent
Benjamin J. Bowers, Matthew W. Baker, Anthony Correale, Irfan Rashid, Paul M. Steinmetz
01 Feb 2012
TL;DR: In this article, a behavioral representation of RTL HDL with one or more 1×N building blocks is presented, where the behavioral representation can be modified by using logic design tools, synthesis tools, physical design tools and timing analysis tools.
Abstract: Embodiments that design integrated circuits using a 1×N compiler in a closed-loop 1×N methodology are disclosed. Some embodiments create a physical design representation based on a behavioral representation of a design for an integrated circuit. The behavioral representation may comprise RTL HDL with one or more 1×N building blocks. The embodiments may alter elements of the 1×N building block by using logic design tools, synthesis tools, physical design tools, and timing analysis tools. Further embodiments comprise an apparatus having a first generator to generate a behavioral representation of a design for an integrated circuit, a second generator to generate a logical representation of the design, and a third generator to generate a physical design representation of the design, wherein the representation generators may create updated versions of the representations which reflect alterations made to 1×N building block elements.


Journal ArticleDOI
TL;DR: The output of the simulation for the two-stage opamp shows that the PSO technique is an accurate and promising approach in determining the device sizes in an analog circuit.
Abstract: Problem statement: More and more products rely on analog circuits to improve speed and reduce power consumption. Analog circuit design plays an important role in VLSI implementation. Analog circuit synthesis may be the most challenging and time-consuming task, because it consists not only of topology and layout synthesis but also of component sizing. Approach: A Particle Swarm Optimization (PSO) technique for the optimal design of analog circuits. Analog signal processing finds many applications and widely uses OpAmp-based amplifiers, mixers, comparators, and filters. Results: A two-stage opamp (Miller Operational Trans-conductance Amplifier (OTA)) is considered for synthesis, satisfying certain design specifications. Performance has been evaluated with the Simulation Program with Integrated Circuit Emphasis (SPICE) circuit simulator until optimal sizes of the transistors are found. Conclusion: The output of the simulation for the two-stage opamp shows that the PSO technique is an accurate and promising approach in determining the device sizes in an analog circuit.
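
Separated from SPICE and the opamp specifications, the PSO loop itself is compact. Below is a minimal textbook-style sketch with a stand-in objective function; in the flow described above, the objective would instead invoke a SPICE simulation of the two-stage opamp and score it against the design specifications.

```python
import numpy as np

def pso(objective, lower, upper, n_particles=30, iters=100, w=0.7, c1=1.5, c2=1.5):
    rng = np.random.default_rng(0)
    dim = len(lower)
    x = rng.uniform(lower, upper, size=(n_particles, dim))   # positions (candidate device sizes)
    v = np.zeros_like(x)                                      # velocities
    pbest, pbest_f = x.copy(), np.array([objective(p) for p in x])
    gbest = pbest[np.argmin(pbest_f)]
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lower, upper)                      # keep sizes inside the allowed bounds
        f = np.array([objective(p) for p in x])
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        gbest = pbest[np.argmin(pbest_f)]
    return gbest, pbest_f.min()

# Stand-in objective: in the actual flow this would run SPICE and score gain,
# bandwidth, phase margin, etc. of the two-stage opamp against its specifications.
objective = lambda p: np.sum((p - np.array([2.0, 0.5, 10.0])) ** 2)
best, best_f = pso(objective, lower=np.array([0.1, 0.1, 1.0]), upper=np.array([10.0, 5.0, 50.0]))
print(best, best_f)
```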