
Showing papers on "Very-large-scale integration" published in 2005


Book
01 Jan 2005
TL;DR: A multi-section handbook covering circuits, electronics, VLSI systems, digital systems and computer engineering, electromagnetics, electric power systems, signal processing, digital communication and communication networks, and control and systems.
Abstract: Contents by section.
Section 1: Circuits, ed. Krishnaiyan Thulasiraman. 1. Basic Circuit Analysis, by Rajan and Sekar; 2. Circuit Analysis: A Graph Theoretic Foundation, by Thulasiraman and Swamy; 3. Computer Aided Design, by Opal; 4. Synthesis of Networks, by Vlach; 5. Nonlinear Circuits, by Trajkovic.
Section 2: Electronics, ed. Krishna Shenai. 1. Power Electronics, by Lee and Zhou; 2. Noise in Analog and Digital Systems, by McShane and Shenai; 3. Field Effect Transistors, by Ozturk and Misra; 4. Active Filters, by Schaumann; 5. Junction Diodes and Bipolar Junction Transistors, by Schroter; 6. Semiconductors, by Shur; 7. Power Semiconductor Devices, by Trivedi and Shenai.
Section 3: VLSI Systems, ed. Magdy A. Bayoumi. 1. VLSI Arithmetic, by Stouraitis; 2. Memories, by Cathoor; 3. Hardware Description Languages, by Huss; 4. Clock Skew Scheduling for Improved Reliability, by Kourtev and Friedman; 5. Low Power Technology, by Bayoumi; 6. MEM's, by Zaghloul; 7. Interconnect Noise Analysis and Optimization, by Elgamel and Bayoumi; 8. Noise Analysis and Design in Deep Submicron, by Elgamel and Bayoumi.
Section 4: Digital Systems & Computer Engineering, ed. Wai-Kai Chen. 1. Computer Architecture, by Chang; 2. Multiprocessors, Parallel Processors and Reconfigurable Computing, by Luk, Cheung and Constantinides; 3. Configurable Computing, by Luk; 4. Operating Systems, by Lien; 5. Expert Systems, by Shang; 6. Multimedia Systems, by Khokhar; 7. Multimedia Networks and Communication, by Khokhar; 8. Fault-Tolerant Computing, by Dutt; 9. Petri Nets, High-Level Petri Nets and Applications, by Murata and He.
Section 5: Electromagnetics, ed. David Yang. Preface, by Yang; 1. Magnetostatics, by Whites; 2. Electrostatics, by Diaz; 3. Plane Wave Propagation and Reflection, by Jackson; 4. Transmission Lines, by Naishadham; 5. Guided-Waves, by DeFlaviis; 6. Antennas, by Das; 7. Microwave Passive Components, by Wu, Zhu and Vahldieck; 8. Computational Electromagnetics I: The Method of Moments, by Jin and Chew; 9. Computational Electromagnetics II: The Finite Difference Time Domain Method, by Taflove; 10. Radar and Inverse Scattering, by Li and Kiang; 11. Microwave Active Circuits and Integrated Antennas, by Deal et al.
Section 6: Electric Power Systems, ed. Anjan Bose. Preface, by Bose; 1. Three Phase Alternating Current Systems, by Bose; 2. Electric Power System Components, by Bose; 3. Power Transformers, by Degeneff; 4. Rotating Machines, by Salon; 5. High Voltage Transmission, by Gorur; 6. Power Distribution, by Gonen; 7. Power Systems Analysis, by Venkatasubramanian and Tomsovic; 8. Power System Operation and Control, by Venkatasubramanian and Tomsovic; 9. Fundamentals of Power System Protection, by Kezunovic; 10. Power Quality, by Heydt.
Section 7: Signal Processing, ed. Yih-Fang Huang. 1. An Introduction to Signal Processing, by Ansari; 2. Digital Filters, by Diniz; 3. Speech Processing, by Deller; 4. Image Processing, by da Silva; 5. Multimedia Systems and Signal Processing, by Smith; 6. Adaptive and Statistical Signal Processing, by Huang; 7. VLSI Signal Processing, by Hu.
Section 8: Digital Communication and Communication Networks, authors Vijay Garg and Yih-Chen Wang. 1. Signal Types, Properties and Processes; 2. Analog Formatting to Digital Systems: Sampling, Quantization, Coding and Corruption; 3. Transmission of Digital Data in Baseband Channels; 4. Modulation and Demodulation of Baseband Signals to RF Carriers; 5. Access Technologies: FDMA, TDMA, and CDMA; 6. Convergence, Data Networking, Transmission, and Network Architecture.
Section 9: Control and Systems, ed. Michael Sain. 1. Algebraic Topics in Control, by Schrader; 2. Stability, by Liu; 3. Robust Multivariable Control, by Gonzalez and Kelkar; 4. State Estimation, by Farrell; 5. Cost Cumulants and Risk-Sensitive Control, by Won; 6. Frequency Domain System Identification, by Jin; 7. Modeling Components and Connections, by Liberty; 8. Fault Tolerant Control, by Yen; 9. Gain Scheduled Controllers, by Bett; 10. Sliding Mode Control, by Yurkovich; 11. Nonlinear Input/Output Control: Volterra Synthesis, by Sain; 12. Intelligent Control of Nonlinear Systems with Time-Varying Structure, by Passino and Ordonez; 13. Learning Controllers, by Si; 14. Software Technologies for Complex Control Systems, by Heck.

244 citations


Journal ArticleDOI
TL;DR: A smart-sensor VLSI circuit suitable for focal-plane low-level image processing applications is presented, based on a fine-grain software-programmable SIMD processor array utilising a switched-current analog microprocessor concept.
Abstract: A smart-sensor VLSI circuit suitable for focal-plane low-level image processing applications is presented. The architecture of the device is based on a fine-grain software-programmable SIMD processor array. Processing elements, integrated within each pixel of the imager, are implemented utilising a switched-current analog microprocessor concept. This allows the achievement of real-time image processing speeds with high efficiency in terms of silicon area and power dissipation. The prototype 21 × 21 vision chip is fabricated in a 0.6 μm CMOS technology and achieves a cell size of 98.6 μm × 98.6 μm. It executes over 1.1 giga instructions per second (GIPS) while dissipating under 40 mW of power. The architecture, circuit design and experimental results are presented in this paper.

179 citations


Journal ArticleDOI
TL;DR: Experimental results reveal that the proposed adders achieve delay reductions of up to 14 percent when compared to the fastest parallel-prefix architectures presented for the traditional definition of carry equations.
Abstract: Parallel-prefix adders offer a highly efficient solution to the binary addition problem and are well-suited for VLSI implementations. A novel framework is introduced, which allows the design of parallel-prefix Ling adders. The proposed approach saves one logic level of implementation compared to the parallel-prefix structures proposed for the traditional definition of carry lookahead equations and reduces the fanout requirements of the design. Experimental results reveal that the proposed adders achieve delay reductions of up to 14 percent when compared to the fastest parallel-prefix architectures presented for the traditional definition of carry equations.

146 citations
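
The parallel-prefix carry computation that the entry above refers to can be illustrated with a small software model. The sketch below follows the traditional generate/propagate carry equations in a Kogge-Stone arrangement, which is the kind of baseline the abstract compares against; it is not the Ling formulation proposed in the paper. The function name `kogge_stone_add` and the `width` parameter are hypothetical, chosen only for this illustration.

```python
def kogge_stone_add(a, b, width=8):
    """Add two unsigned integers with a Kogge-Stone style prefix network.

    Bit i of `g` / `p` holds the group generate / propagate signal for the
    bit range that currently ends at position i. Carry-in is assumed to be 0.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask

    g = a & b           # bitwise generate:  g_i = a_i AND b_i
    p = a ^ b           # bitwise propagate: p_i = a_i XOR b_i
    half_sum = p        # kept for the final sum stage

    dist = 1
    while dist < width:  # log2(width) prefix levels
        # Combine each group with the group `dist` positions below it.
        g = (g | (p & (g << dist))) & mask
        p = (p & (p << dist)) & mask
        dist <<= 1

    carries = (g << 1) & mask            # carry into bit i comes from bits [0, i-1]
    total = half_sum ^ carries
    carry_out = (g >> (width - 1)) & 1
    return total, carry_out


if __name__ == "__main__":
    print(kogge_stone_add(200, 100))
```

Each pass of the loop corresponds to one level of the prefix tree, so an 8-bit addition needs three levels. Saving a logic level, as the abstract describes, would shorten this critical path in hardware.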


Journal ArticleDOI
Xiang Xie1, Guolin Li1, Xinkai Chen1, Lu Liu1, Chun Zhang1, Zhihua Wang1 
01 Nov 2005
TL;DR: An architecture of a wireless endoscopy system for diagnosis of the whole human digestive tract and real-time endoscopic image monitoring is proposed; a very large scale integration (VLSI) architecture of three-stage clock management is applied, which can save 46% power inside the capsule compared with a design without this low-power technique.
Abstract: This paper proposes an architecture of a wireless endoscopy system for diagnosis of the whole human digestive tract and real-time endoscopic image monitoring. The low-power digital IC design inside the wireless endoscopic capsule is discussed in detail. A very large scale integration (VLSI) architecture of three-stage clock management is applied, which can save 46% power inside the capsule compared with a design without this low-power technique. A stoppable ring crystal oscillator with minimal overhead is used in the sleep mode, which results in about 60-μW system power dissipation in sleep mode. A new image compression algorithm based on the Bayer image format and its corresponding VLSI architecture are both proposed for low-power, high-data volume. Thus, 8 frames per second with 320×288 pixels can be transmitted with 2 Mb/s. The digital IC design also assures that the capsule has many flexible and useful functions for clinical application. The digital circuits were verified on field-programmable gate arrays and have been implemented in 0.18-μm CMOS process with 6.2 mW

121 citations
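
The frame rate, resolution, and link rate quoted in the entry above imply a minimum compression ratio, which a few lines of arithmetic make explicit. The sample depth is not stated in the abstract; the sketch assumes 8 bits per Bayer sample, and the names `raw_bitrate` and `BITS_PER_SAMPLE` are hypothetical.

```python
BITS_PER_SAMPLE = 8  # assumption: the abstract does not state the sample depth


def raw_bitrate(width, height, fps, bits_per_sample):
    """Uncompressed bit rate of a Bayer stream (one sample per pixel)."""
    return width * height * fps * bits_per_sample


raw = raw_bitrate(320, 288, 8, BITS_PER_SAMPLE)  # bits per second before compression
link = 2_000_000                                  # 2 Mb/s link from the abstract
required_ratio = raw / link

if __name__ == "__main__":
    print(raw, round(required_ratio, 2))
```

Under that assumption the raw stream is about 5.9 Mb/s, so the compressor would have to reach roughly 3:1 to fit the 2 Mb/s link.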


Proceedings ArticleDOI
07 Mar 2005
TL;DR: In this article, the authors present a digital VLSI design flow to create secure, side-channel attack resistant integrated circuits, where the design flow starts from a normal design in a hardware description language, such as VHDL or Verilog, and provides a direct path to an SCA resistant layout.
Abstract: The paper presents a digital VLSI design flow to create secure, side-channel attack (SCA) resistant integrated circuits. The design flow starts from a normal design in a hardware description language, such as VHDL or Verilog, and provides a direct path to an SCA resistant layout. Instead of a full custom layout or an iterative design process with extensive simulations, a few key modifications are incorporated in a regular synchronous CMOS standard cell design flow. We discuss the basis for side-channel attack resistance and adjust the library databases and constraints files of the synthesis and place-and-route procedures accordingly. Experimental results show that a DPA (differential power analysis) attack on a regular single ended CMOS standard cell implementation of a module of the DES algorithm discloses the secret key after 200 measurements. The same attack on a secure version still does not disclose the secret key after more than 2000 measurements.

104 citations


Proceedings ArticleDOI
30 Aug 2005
TL;DR: It is shown that for many VLSI implementations of signal processing algorithms, such as MPEG and JPEG encoders, a significant proportion of chips having low levels of defects provide erroneous but acceptable results.
Abstract: As feature sizes continue to decrease and clock rates and device count on a VLSI chip increase, it becomes increasingly more difficult to maintain yields at their present levels. Process variation, noise and spot defects create very costly problems for our industry. Luckily, in the domain of multi-media, there exists a large body of functions where computational results need not always be correct. We show that for many VLSI implementations of signal processing algorithms, such as MPEG and JPEG encoders, a significant proportion of chips having low levels of defects provide erroneous but acceptable results. We introduce the concept of error-tolerance, and mention related issues needed to support this concept, including ways for specifying performance, design techniques that consider yield, test techniques for quantifying erroneous behavior, and finally the issue of marketing. The motivation for this work is to significantly increase the effective yield of a process, encourage the implementation of complex data processing chips, and drastically reduce chip costs.

97 citations


Journal ArticleDOI
TL;DR: A detailed analysis of VLSI architectures for the 1-D and 2-D discrete wavelet transform is presented, and a flexible and efficient architecture for a one-level 2-D DWT that exploits many advantages of the presented analysis is proposed.
Abstract: In this paper, a detailed analysis of very large scale integration (VLSI) architectures for the one-dimensional (1-D) and two-dimensional (2-D) discrete wavelet transform (DWT) is presented in many aspects, and three related architectures are proposed as well. The 1-D DWT and inverse DWT (IDWT) architectures are classified into three categories: convolution-based, lifting-based, and B-spline-based. They are discussed in terms of hardware complexity, critical path, and registers. As for the 2-D DWT, the large amount of the frame memory access and the die area occupied by the embedded internal buffer become the most critical issues. The 2-D DWT architectures are categorized and analyzed by different external memory scan methods. The implementation issues of the internal buffer are also discussed, and some real-life experiments are given to show that the area and power for the internal buffer are highly related to memory technology and working frequency, instead of the required memory size only. Besides the analysis, the B-spline-based IDWT architecture and the overlapped stripe-based scan method are also proposed. Last, we propose a flexible and efficient architecture for a one-level 2-D DWT that exploits many advantages of the presented analysis.

95 citations


Book ChapterDOI
TL;DR: This chapter reviews the reduced-order modeling techniques that are most widely employed in VLSI circuit simulation.
Abstract: In recent years, reduced-order modeling techniques have proven to be powerful tools for various problems in circuit simulation. For example, today, reduction techniques are routinely used to replace the large RCL subcircuits that model the interconnect or the pin package of VLSI circuits by models of much smaller dimension. In this chapter, we review the reduced-order modeling techniques that are most widely employed in VLSI circuit simulation.

90 citations


Journal ArticleDOI
TL;DR: Three generic RAM-based architectures are proposed to efficiently construct the corresponding two-dimensional architectures by use of the line-based method for any given hardware architecture of one-dimensional wavelet filters, including conventional convolution-based and lifting-based architecture.
Abstract: In this paper, three generic RAM-based architectures are proposed to efficiently construct the corresponding two-dimensional architectures by use of the line-based method for any given hardware architecture of one-dimensional (1-D) wavelet filters, including conventional convolution-based and lifting-based architectures. An exhaustive analysis of two-dimensional architectures for discrete wavelet transform in the system view is also given. The first proposed architecture is for 1-level decomposition, which is presented by introducing the categories of internal line buffers, the strategy of optimizing the line buffer size, and the method of integrating any 1-D wavelet filter. The other two proposed architectures are for multi-level decomposition. One applies the recursive pyramid algorithm directly to the proposed 1-level architecture, and the other one combines the two previously proposed architectures to increase the hardware utilization. According to the comparison results, the proposed architecture outperforms previous architectures in the aspects of line buffer size, hardware cost, hardware utilization, and flexibility.

89 citations


Journal ArticleDOI
TL;DR: The estimation is quick, not requiring extensive simulation or use of computer-aided design tools, yet sufficiently accurate to provide guidance through various choices in the design process.
Abstract: In this paper, we motivate the concept of comparing very large scale integration adders based on their energy-delay characteristics and present results of our estimation technique. This stems from a need to make appropriate selection at the beginning of the design process. The estimation is quick, not requiring extensive simulation or use of computer-aided design tools, yet sufficiently accurate to provide guidance through various choices in the design process. We demonstrate the accuracy of the method by applying it to examples of high-performance 32- and 64-b adders in 100- and 130-nm CMOS technologies.

84 citations


Journal ArticleDOI
TL;DR: A closed-loop voltage swing controller that samples the error retransmission rate to determine the operational voltage swing and an embodiment of a self-calibrating circuit that compensates for significant manufacturing parameter deviations and environmental variations is described.
Abstract: Systems-on-Chip (SoC) design involves several challenges, stemming from the extreme miniaturization of the physical features and from the large number of devices and wires on a chip. Since most SoCs are used within embedded systems, specific concerns are increasingly related to correct, reliable, and robust operation. We believe that in the future most SoCs will be assembled by using large-scale macro-cells and interconnected by means of on-chip networks. We examine some physical properties of on-chip interconnect busses, with the goal of achieving fast, reliable, and low-energy communication. These objectives are reached by dynamically scaling down the voltage swing, while ensuring data integrity, in spite of the decreased signal-to-noise ratio, by means of encoding and retransmission schemes. In particular, we describe a closed-loop voltage swing controller that samples the error retransmission rate to determine the operational voltage swing. We present a control policy which achieves our goals with minimal complexity; such simplicity is demonstrated by implementing the policy in a synthesizable controller. Such a controller is an embodiment of a self-calibrating circuit that compensates for significant manufacturing parameter deviations and environmental variations. Experimental results show that energy savings amount to up to 42%, while at the same time meeting performance requirements.

Proceedings ArticleDOI
23 May 2005
TL;DR: A novel, low-cost, high-performance VLSI architecture design for MPEG-4 AVC/H.264 CAVLC decoding that can decode every syntax element per cycle and achieves maximum speed at 175 MHz.
Abstract: The demand for high-quality video and high data compression leads MPEG-4 AVC/H.264 to adopt the context-based adaptive variable length code (CAVLC) technique, in contrast to the traditional MPEG-4 VLC techniques. The paper presents a novel, low-cost, high-performance VLSI architecture design for MPEG-4 AVC/H.264 CAVLC decoding. We exploit five different techniques to reduce both the hardware cost and power consumption, and to increase the data throughput rate. They are PCCF (partial combinational component freezing), HLLT (hierarchical logic for look-up tables), ZTEBA (zero-left table elimination by arithmetic), IDS (interleaved double stacks), and ZCS (zero codeword skip). The proposed design can decode every syntax element per cycle. The synthesis result shows that the design achieves maximum speed at 175 MHz. When we synthesize the proposed design at the clock constraint of 125 MHz, the hardware cost is about 4720 gates under a 0.18 μm CMOS technology, which achieves the real-time processing requirement for H.264 video decoding on HD1080i format video.

Journal ArticleDOI
14 Nov 2005
TL;DR: In this paper, a new reverse conversion algorithm for the four-moduli set is presented for even values of n, where the number theoretic properties of the popular three moduli set are exploited to realize a VLSI efficient alternative to that reported in the literature.
Abstract: A new reverse conversion algorithm is presented for the four-moduli set $\{2^n-1, 2^n, 2^n+1, 2^{n+1}-1\}$, for even values of $n$. The number theoretic properties of the popular three-moduli set $\{2^n-1, 2^n, 2^n+1\}$ have been exploited to realise a VLSI efficient alternative to that reported in the literature. The architecture proposed for most time efficient implementation provides for about three times speed-up. Another four-moduli set $\{2^n-1, 2^n, 2^n+1, 2^{n-1}-1\}$ has also been proposed by further extending this algorithm in an attempt to better adjust to dynamic ranges that cannot be best represented by the former four-moduli set. Unlike the existing reverse converter for the four-moduli set $\{2^n-1, 2^n, 2^n+1, 2^{n-1}-1\}$, the proposed architecture is shown to be more efficient both in terms of area and time, mainly due to deploying the properties of the three-moduli set $\{2^n-1, 2^n, 2^n+1\}$. Moreover, adder-based architectures for each moduli set lend themselves well to VLSI efficient implementations. Finally, both the architectures can be readily pipelined to achieve higher throughputs.
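
To make "reverse conversion" concrete, the sketch below converts an integer to residues over the four-moduli set with moduli 2^n - 1, 2^n, 2^n + 1 and 2^(n+1) - 1, then recovers it with the textbook Chinese Remainder Theorem. This is a generic reference model, not the adder-based algorithm of the paper, and the function names are hypothetical. The restriction to even n matters: for odd n both 2^n + 1 and 2^(n+1) - 1 are divisible by 3, so the moduli would not be pairwise coprime.

```python
from math import gcd, prod


def moduli_set(n):
    """Four-moduli set from the abstract; n must be even for coprimality."""
    if n < 2 or n % 2 != 0:
        raise ValueError("n must be an even integer >= 2")
    return [2**n - 1, 2**n, 2**n + 1, 2**(n + 1) - 1]


def to_residues(x, moduli):
    """Forward conversion: binary integer -> residue digits."""
    return [x % m for m in moduli]


def from_residues(residues, moduli):
    """Reverse conversion via the textbook CRT (illustrative, not the paper's method)."""
    big_m = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        m_i = big_m // m
        x += r * m_i * pow(m_i, -1, m)  # pow(..., -1, m) is the modular inverse
    return x % big_m


if __name__ == "__main__":
    ms = moduli_set(4)
    print(ms, from_residues(to_residues(12345, ms), ms))
```

The CRT form needs a large modular reduction by the product of all moduli, and that reduction is the main cost that hardware-oriented reverse converters are designed to lower.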


Journal ArticleDOI
TL;DR: The whole control strategy is hybrid in the sense that the gait generation is accomplished by a fully analog CNN, while a simple logic unit modulates the behavior of the CNN-based CPG, so that the strategy is suitable to eventually include sensory feedback.
Abstract: In this paper, the paradigm of emergent computation is applied to locomotion control in legged robots: the locomotion gait is the result of self-organization of a network of locally coupled nonlinear oscillators. This means to adopt the biological paradigm of central pattern generator (CPG), implemented by using cellular neural networks (CNNs). The whole control strategy is hybrid in the sense that the gait generation is accomplished by a fully analog CNN, while a simple logic unit modulates the behavior of the CNN-based CPG, so that the strategy is suitable to eventually include sensory feedback. The design of a VLSI chip implementing the CNN-based CPG and some experimental results on the chip are presented. The chip is designed using a switched-capacitor technique, fundamental to obtain in a simple and direct way some key features of the hybrid control discussed. The experimental results confirm the suitability of the approach.

Journal ArticleDOI
TL;DR: An efficient metric to estimate the capacitive crosstalk in nanometer high-speed very large scale integration circuits is presented and closed-form expressions for the peak amplitude, the pulsewidth, and the time-domain waveform of the crosstalk noise are provided.
Abstract: Rapid technology scaling along with the continuous increase in the operation frequency cause the crosstalk noise to become a major source of performance degradation in high-speed integrated circuits. This paper presents an efficient metric to estimate the capacitive crosstalk in nanometer high-speed very large scale integration circuits. In particular, we provide closed-form expressions for the peak amplitude, the pulsewidth, and the time-domain waveform of the crosstalk noise. Experimental results show that the maximum error of our noise predictions is less than 13%, while the average error is only 5.82%.

Journal ArticleDOI
R.R. Harrison1
TL;DR: A single-chip analog VLSI sensor that detects imminent collisions by measuring radially expanding optic flow is designed and tested and is capable of 94% correct performance in this task using an ultra-low-resolution (132-pixel) image as input.
Abstract: We have designed and tested a single-chip analog VLSI sensor that detects imminent collisions by measuring radially expanding optic flow. The design of the chip is based on a model proposed to explain leg-extension behavior in flies during landing approaches. We evaluated a detailed version of this model in simulation using a library of 50 test movies taken through a fisheye lens. The algorithm was evaluated on its ability to distinguish movies ending in collisions from movies in which no collision occurred. This biologically inspired algorithm is capable of 94% correct performance in this task using an ultra-low-resolution (132-pixel) image as input. A new elementary motion detector (EMD) circuit was developed to measure optic flow on a CMOS focal-plane sensor. This EMD circuit models the bandpass nature of large monopolar cells (LMCs) immediately postsynaptic to photoreceptors in the fly visual system as well as a saturating multiplication operation proposed for Reichardt-type motion detectors. A 16 × 16 array of two-dimensional motion detectors was fabricated in a standard 0.5-μm CMOS process. The chip consumes 140 μW of power from a 5 V supply. With the addition of wide-angle optics, the sensor is able to detect collisions 100-400 ms before impact in complex, real-world scenes.

Journal ArticleDOI
TL;DR: In this article, the authors discuss the architectural possibilities enabled by TSVs and the necessary TSV dimensions for dense Z-axis interconnect among logic blocks, and describe the TSV requirements for the DARPA-funded Vertically Integrated Sensor Arrays (VISA) program, and how those requirements differ from a more general purpose TSV technology.
Abstract: A through-silicon via (TSV) process provides a means of implementing complex, multichip systems entirely in silicon, with a physical packing density many times greater than today's advanced multichip modules. This technology overcomes the resistance-capacitance (RC) delays associated with long, in-plane interconnects by bringing out-of-plane logic blocks much closer electrically, and provides a connection density that makes using those blocks for random logic possible by even small system partitions. TSVs and three dimensional (3-D) stacking technology have the potential to reduce significantly the average wire length of block-to-block interconnects by stacking logic blocks vertically instead of spreading them out horizontally. Although TSVs have great potential, there are many fabrication obstacles that must be overcome. This paper discusses the architectural possibilities enabled by TSVs, and the necessary TSV dimensions for dense Z-axis interconnect among logic blocks. It then describes the TSV requirements for the Defense Advanced Research Projects Agency (DARPA)-funded Vertically Integrated Sensor Arrays (VISA) program, and how those requirements differ from a more general purpose TSV technology. Finally, the TSV fabrication process being implemented at the University of Arkansas (UA) is described in detail. Though this process is being developed for the VISA program, it embodies many of the characteristics of a widely applicable TSV technology.

Journal ArticleDOI
TL;DR: An overview of the field is presented and two chip designs are focused on that highlight some of the promising charge recovering techniques in practice that rely on controlled charge recovery to operate at substantially lower power dissipation levels than their conventional counterparts.
Abstract: Three decades ago, theoretical physicists suggested that the controlled recovery of charges could result in electronic circuitry whose power dissipation approaches thermodynamic limits, growing at a significantly slower pace than the fCV² rate for CMOS switching power. Early engineering research in this field, which became generally known as adiabatic computing, focused on the asymptotic energetics of computation, exploring VLSI designs that use reversible logic and adiabatic switching to preserve information and achieve nearly zero power dissipation as operating frequencies approach zero. Recent advances in CMOS VLSI design have taken us to real working chips that rely on controlled charge recovery to operate at substantially lower power dissipation levels than their conventional counterparts. Although their origins can be traced back to the early adiabatic circuits, these charge-recovering systems approach energy recycling from a more practical angle, shedding reversibility to achieve operating frequencies in the hundreds of MHz with relatively low overhead. Among other charge-recovery designs, researchers have demonstrated microcontrollers, standard-cell ASICs, SRAMs, LCD panel drivers, I/O drivers, and multi-GHz clock networks. In this paper, we present an overview of the field and focus on two chip designs that highlight some of the promising charge recovering techniques in practice.

Patent
25 Feb 2005
TL;DR: In this paper, a circuit layout methodology is provided for eliminating the extra processing time and file space requirements associated with the optical proximity correction (OPC) of a VLSI design.
Abstract: A circuit layout methodology is provided for eliminating the extra processing time and file-space requirements associated with the optical proximity correction (OPC) of a VLSI design. The methodology starts with the design rules for a given manufacturing technology and establishes a new set of layer-specific grid values. A layout obeying these new grid requirements leads to a significant reduction in data preparation time, cost, and file size. A layout-migration tool can be used to modify an existing layout in order to enforce the new grid requirements.

Journal ArticleDOI
TL;DR: A low-power, high-speed architecture which performs two-dimensional forward and inverse discrete wavelet transform (DWT) for the set of filters in JPEG2000 is proposed by using a line-based and lifting scheme.
Abstract: A low-power, high-speed architecture which performs two-dimensional forward and inverse discrete wavelet transform (DWT) for the set of filters in JPEG2000 is proposed by using a line-based and lifting scheme. It consists of one row processor and one column processor, each of which contains four sub-filters. The row processor, which is time-multiplexed, operates in parallel with the column processor. Optimized shift-add operations are substituted for multiplications, and edge extension is implemented by an embedded circuit. The whole architecture, which is pipelined to increase speed and achieve higher hardware utilization, has been demonstrated in FPGA. Two pixels per clock cycle can be encoded at 100 MHz. The architecture can be used as a compact and independent IP core for JPEG2000 VLSI implementation and various real-time image/video applications.
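
As a rough illustration of the lifting scheme with shift-add arithmetic and edge extension mentioned in the abstract above, the sketch below runs one level of the reversible 5/3 filter, one of the filters used in JPEG2000, on a 1-D integer signal. It is a software model under stated assumptions (even-length input, symmetric extension at both edges) and does not reproduce the row/column processor architecture of the paper. The function names `forward_53` and `inverse_53` are hypothetical.

```python
def forward_53(x):
    """One level of the reversible 5/3 lifting DWT on an even-length integer list.

    Returns (s, d): low-pass and high-pass coefficients. Only adds and
    arithmetic shifts are used; edges use symmetric extension.
    """
    if len(x) < 2 or len(x) % 2 != 0:
        raise ValueError("this sketch assumes an even-length input")
    even, odd = x[0::2], x[1::2]
    half = len(even)

    # Predict step: d[i] = odd[i] - floor((even[i] + even[i+1]) / 2)
    d = []
    for i in range(half):
        right = even[i + 1] if i + 1 < half else even[i]  # mirrored right edge
        d.append(odd[i] - ((even[i] + right) >> 1))

    # Update step: s[i] = even[i] + floor((d[i-1] + d[i] + 2) / 4)
    s = []
    for i in range(half):
        left = d[i - 1] if i > 0 else d[0]                # mirrored left edge
        s.append(even[i] + ((left + d[i] + 2) >> 2))
    return s, d


def inverse_53(s, d):
    """Undo forward_53 by running the lifting steps in reverse order."""
    half = len(s)
    even = []
    for i in range(half):
        left = d[i - 1] if i > 0 else d[0]
        even.append(s[i] - ((left + d[i] + 2) >> 2))
    x = []
    for i in range(half):
        right = even[i + 1] if i + 1 < half else even[i]
        x.append(even[i])
        x.append(d[i] + ((even[i] + right) >> 1))
    return x


if __name__ == "__main__":
    signal = [10, 12, 15, 11, 9, 8, 14, 20]
    low, high = forward_53(signal)
    print(low, high, inverse_53(low, high) == signal)
```

Because each lifting step is undone exactly by subtracting what was added, the inverse transform reuses the same shift-add datapath. This is one reason lifting suits a shared forward/inverse hardware design.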

Journal ArticleDOI
TL;DR: The proposed design provides a superior performance in terms of the hardware complexity, speed, I/O costs, in addition to such features as regularity, modularity, pipelining capability, and local connectivity, which make the unified structure well suited for VLSI implementation.
Abstract: In this paper, an efficient design approach for a unified very large-scale integration (VLSI) implementation of the discrete cosine transform/discrete sine transform/inverse discrete cosine transform/inverse discrete sine transform based on an appropriate formulation of the four transforms into cyclic convolution structures is presented. This formulation allows an efficient memory-based systolic array implementation of the unified architecture using dual-port ROMs and appropriate hardware sharing methods. The performance of the unified design is compared to that of some of the existing ones. It is found that the proposed design provides a superior performance in terms of the hardware complexity, speed, I/O costs, in addition to such features as regularity, modularity, pipelining capability, and local connectivity, which make the unified structure well suited for VLSI implementation.

Journal ArticleDOI
TL;DR: The proposed class-AB analog cells are very compact, exhibit low total harmonic distortion and low nonlinearity, have a wide bandwidth, and are compatible with low-power and low-voltage operation.
Abstract: Analog computations such as four-quadrant multiplication, linear voltage-to-current conversion and sum-square or difference-square are fundamental for many analog signal processing systems. All these functions can be realized based on the principle of the linearized differential pair using floating-voltage sources. This paper describes an improved practical realization of this principle, which is particularly suited to analog VLSI computational systems. The proposed class-AB analog cells are very compact, exhibit low total harmonic distortion and low nonlinearity, have a wide bandwidth, and are compatible with low-power and low-voltage operation. A mathematical discussion on stability and harmonic distortion of the proposed realization is presented. Both simulated results and measurements from fabricated cell samples in a 0.8-μm CMOS process are given. The described circuits operate from a single 2-V power supply.

Proceedings ArticleDOI
18 Jan 2005
TL;DR: A highly accurate fast algorithm for computing the on-chip temperature distribution due to power sources located on the top surface of the chip using a combination of several computational techniques including the Green function method, the discrete cosine transform (DCT), and the table look-up technique.
Abstract: Temperature-related effects are critical in determining both the performance and reliability of VLSI circuits. Accurate and efficient estimation of the temperature distribution corresponding to a specific circuit layout is indispensable in physical design automation tools. In this paper, we propose a highly accurate fast algorithm for computing the on-chip temperature distribution due to power sources located on the top surface of the chip. The method is a combination of several computational techniques including the Green function method, the discrete cosine transform (DCT), and the table look-up technique. The high accuracy of the algorithm comes from the fully analytical nature of the Green function method, and the high efficiency is due to the application of the fast Fourier transform (FFT) technique to compute the DCT and later obtaining the temperature field for any power source distribution using the pre-calculated look-up table. Experimental results have demonstrated that our method has a relative error of below 1% compared with commercial computational fluid dynamics (CFD) software for thermal analysis, while the efficiency of our method is orders of magnitude higher than the direct application of the Green function method.

01 Jan 2005
TL;DR: An ALU capable of performing basic ternary arithmetic and logic operations is proposed; it is designed for two-bit operation and can be used for n-bit operations by cascading n/2 ALU slices.
Abstract: This paper describes the architecture, design, and implementation of a 2-bit ternary ALU (T-ALU) slice. The proposed ALU is designed for two-bit operation and can be used for n-bit operations by cascading n/2 ALU slices. The ALU is implemented using CMOS ternary logic gates (T-gates) for ternary arithmetic and logic circuits. The ternary gates are implemented using enhancement/depletion MOSFET technology, so the proposed ALU is suitable for LSI/VLSI implementation. The design technique used here requires only two stages (decoder and T-gates), as against the three stages (decoder, binary gates, and encoder) required in a conventional ternary logic implementation. Index Terms: Ternary, Unary function, T-Gates, Literal. I. Introduction. Alexander [1964] showed that the natural base (e ≈ 2.71828) is the most efficient radix for the implementation of switching circuits. Because 3 is the integer nearest to e, this suggests that radix 3, rather than radix 2, is the more efficient choice for implementing digital systems. A ternary logic system uses three-valued switching. Ternary systems have several important advantages over binary ones: fewer interconnections are required to implement logic functions, which reduces chip area; more information can be transmitted over a given set of lines; and less memory is required for a given data length. In addition, serial and some serial-parallel operations can be carried out at higher speed [1][2][3]. These advantages have been confirmed in applications such as memories, communications, and digital signal processing [7]. It has been proven that combinational and sequential functions can be realized and implemented in ternary systems [4][5][6][7]. The implementation is based on bipolar transistors, MOSFETs, etc. as basic switching elements, which are referred to as T-gates [8]. Several authors have also proposed reduction techniques to realize ternary functions [9][10][11][12].
In this contribution, we propose an ALU capable of performing the basic ternary arithmetic and logic operations listed in Table 1. We also suggest a scheme that takes advantage of the minimization techniques proposed in [9][11][13] and is implemented using T-gates designed for ternary operations. This scheme reduces the gate count needed to implement ternary functions. We first describe the design of the 2-bit ALU and then its integration into an ALU slice. The paper is organized as follows: Section II describes the basic T-gate implementation, Section III gives the 2-bit ALU architecture, and Section IV describes the 2-bit ALU design and the ALU slice design. Experimental results and performance evaluation are given in Section V. Finally, the conclusion is given in Section VI. Table 1: Functional Table of T-ALU
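
The cascading of n/2 two-digit slices described above can be sketched behaviourally. Table 1 is not reproduced in this listing, so the sketch below covers only addition and assumes unbalanced ternary digits (0, 1, 2) with a ripple carry between slices; the function names are illustrative and do not come from the paper.

```python
def to_trits(value, n):
    """Little-endian list of n base-3 digits (0, 1, 2) of a non-negative int."""
    trits = []
    for _ in range(n):
        value, digit = divmod(value, 3)
        trits.append(digit)
    return trits

def from_trits(trits):
    """Inverse of to_trits: little-endian base-3 digits back to an int."""
    return sum(d * 3 ** i for i, d in enumerate(trits))

def full_add_trit(a, b, carry_in):
    """One-digit ternary full adder; a + b + carry_in is at most 5."""
    total = a + b + carry_in
    return total % 3, total // 3

def add_slice(a2, b2, carry_in):
    """Hypothetical 2-digit ALU slice performing addition only."""
    s0, carry = full_add_trit(a2[0], b2[0], carry_in)
    s1, carry = full_add_trit(a2[1], b2[1], carry)
    return [s0, s1], carry

def add_cascaded(a, b, n):
    """Add two n-digit ternary numbers by chaining n/2 slices (n must be even)."""
    if n % 2 != 0:
        raise ValueError("n must be even to use 2-digit slices")
    ta, tb = to_trits(a, n), to_trits(b, n)
    result, carry = [], 0
    for i in range(0, n, 2):
        digits, carry = add_slice(ta[i:i + 2], tb[i:i + 2], carry)
        result.extend(digits)
    return from_trits(result), carry

# Example: a 4-digit adder built from two slices.
total, carry_out = add_cascaded(5, 7, 4)
```

The carry out of each slice feeds the carry in of the next, so the word width grows in steps of two digits without changing the slice itself.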

Journal ArticleDOI
TL;DR: In this paper, a tunneling magnetoresistive (TMR)-based logic-in-memory circuit is proposed for a low-power VLSI system.
Abstract: A tunneling magnetoresistive (TMR)-based logic-in-memory circuit, where storage functions are distributed over a logic-circuit plane, is proposed for a low-power VLSI system. Since the TMR device is regarded as a variable resistor with a non-volatile storage capability, any logic function with external inputs and stored inputs can be performed by using the TMR-based resistor/transistor network. The combination of dynamic current-mode circuitry and a TMR-based logic network makes it possible to perform any switching operation without steady current, which results in power saving. A design example of an SAD unit for MPEG encoding is discussed, and its advantages are demonstrated.

Proceedings ArticleDOI
31 May 2005
TL;DR: A defect-tolerant design flow to minimize customized post-fabrication design efforts to be performed per chip and a greedy O(n log n) mapping algorithm which makes the connection between defect-unaware design steps and the final defect-aware step are presented.
Abstract: Self-assembled nano-fabrication processes yield regular and reconfigurable devices. However, defect densities in this emerging nanotechnology are higher than those in conventional lithography-based VLSI. In this paper, we present a defect-tolerant design flow to minimize customized post-fabrication design efforts to be performed per chip. We also present a greedy O(n log n) mapping algorithm which makes the connection between defect-unaware design steps and the final defect-aware step. Experiments show that the results obtained by this algorithm are very close to the exact solutions.

Journal ArticleDOI
TL;DR: This paper provides the first detailed instruction-level simulation results on motion estimation based on a programmable CPU core and analyzes various aspects of the selected motion estimation algorithms, such as search speed and power distribution.
Abstract: Motion estimation is the most computationally expensive task in MPEG-style video compression. Video compression is starting to be widely used in battery-powered terminals, but surprisingly little is known about the power consumption of modern motion estimation algorithms. This paper describes our effort to analyze the power and performance of realistic motion estimation algorithms in both hardware and software realizations. For custom hardware realizations, this paper presents a general model of VLSI motion estimation architectures. This model allows us to analyze in detail the power consumption of a large class of modern motion estimation engines that can execute the motion estimation algorithms of interest to us. We compare these algorithms in terms of their power consumption and performance. For software realizations, this paper provides the first detailed instruction-level simulation results on motion estimation based on a programmable CPU core. We analyze various aspects of the selected motion estimation algorithms, such as search speed and power distribution. This paper provides guidelines for two types of machine design for motion estimation: custom ASIC (application-specific integrated circuit) design and custom ASIP (application-specific instruction-set processor) design.
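
To make the computational cost concrete, here is a minimal sketch of exhaustive (full-search) block matching using the sum of absolute differences as the matching cost. This is a standard baseline and is an assumption here: the abstract does not say which algorithms or cost functions the paper evaluates, and the function names and frame layout (lists of rows of integer pixel values) are illustrative.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block in `cur` at (bx, by)
    and the block in `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total

def full_search(cur, ref, bx, by, size, search_range):
    """Try every displacement in [-search_range, search_range] on both axes
    and return (best_cost, dx, dy). Candidates that would read outside the
    reference frame are skipped."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            inside = (0 <= by + dy and by + dy + size <= height and
                      0 <= bx + dx and bx + dx + size <= width)
            if not inside:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best

# Example frames: a smooth gradient and a copy displaced by (dx=2, dy=-1).
reference = [[7 * x + 13 * y for x in range(16)] for y in range(16)]
current = [[7 * (x + 2) + 13 * (y - 1) for x in range(16)] for y in range(16)]
match = full_search(current, reference, bx=4, by=4, size=4, search_range=3)
```

Each block costs (2R + 1)² candidate positions times size² pixel differences, which is why reduced-search algorithms and dedicated hardware are attractive for battery-powered devices.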

Journal ArticleDOI
TL;DR: A design exploration framework for application-adaptive multiple-clock processors which provides the means for analyzing and identifying the right interdomain communication scheme and the proper granularity for the choice of voltage/frequency islands in case of superscalar, out-of-order processors is proposed.
Abstract: Enabled by the continuous advancement in fabrication technology, present-day synchronous microprocessors include more than 100 million transistors and have clock speeds well in excess of the 1-GHz mark. Distributing a low-skew clock signal in this frequency range to all areas of a large chip is a task of growing complexity. As a solution to this problem, designers have recently suggested the use of frequency islands that are locally clocked and externally communicate with each other using mixed clock communication schemes. Such a design style fits nicely with the recently proposed concept of voltage islands that, in addition, can potentially enable fine-grain dynamic power management by simultaneous voltage and frequency scaling. This paper proposes a design exploration framework for application-adaptive multiple-clock processors which provides the means for analyzing and identifying the right interdomain communication scheme and the proper granularity for the choice of voltage/frequency islands in case of superscalar, out-of-order processors. In addition, the presented design exploration framework allows for comparative analysis of newly proposed or already published application-driven dynamic power management strategies. Such a design exploration framework and accompanying results can help designers and computer architects in choosing the right design strategy for achieving better power-performance tradeoffs in multiple-clock high-end processors.

Journal ArticleDOI
TL;DR: Results from 0.35 μm CMOS temporal differentiating pixels and STDP circuits show that the system is capable of adapting to substantially reduce the effects of process variations without interrupting the algorithm's natural processes.
Abstract: A transient-detecting very large scale integration (VLSI) pixel is described, suitable for use in a visual-processing, depth-recovery algorithm based upon spike timing. A small array of pixels is coupled to an adaptive system, based upon spike timing dependent plasticity (STDP), that aims to reduce the effect of VLSI process variations on the algorithm's performance. Results from 0.35 μm CMOS temporal differentiating pixels and STDP circuits show that the system is capable of adapting to substantially reduce the effects of process variations without interrupting the algorithm's natural processes. The concept is generic to all spike timing driven processing algorithms in a VLSI.
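
For readers unfamiliar with STDP, the textbook pair-based rule is sketched below. It is not the circuit or the adaptation scheme of this paper, which the abstract does not detail; the amplitudes, time constants, and weight bounds are illustrative assumptions.

```python
import math

def stdp_delta(dt, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Weight change for one spike pair, with dt = t_post - t_pre (in ms).

    A presynaptic spike shortly before a postsynaptic one (dt > 0)
    strengthens the connection; the reverse order (dt < 0) weakens it.
    The effect decays exponentially with the size of the timing gap.
    """
    if dt > 0:
        return a_plus * math.exp(-dt / tau_plus)
    if dt < 0:
        return -a_minus * math.exp(dt / tau_minus)
    return 0.0

def apply_pairs(weight, pre_times, post_times, w_min=0.0, w_max=1.0):
    """Accumulate the rule over all pre/post spike pairs, then clamp."""
    for t_pre in pre_times:
        for t_post in post_times:
            weight += stdp_delta(t_post - t_pre)
    return min(max(weight, w_min), w_max)

# Example: the postsynaptic spike follows the presynaptic one by 5 ms.
new_weight = apply_pairs(0.5, pre_times=[0.0], post_times=[5.0])
```

Because the update depends only on relative spike times, a rule of this kind can adjust each connection locally, which fits the abstract's aim of compensating for device-to-device variation while the algorithm keeps running.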