Showing papers on "Very-large-scale integration" published in 2008


Journal ArticleDOI
01 Feb 2008
TL;DR: VLSI implementation results are provided which demonstrate that single tree-search, sorted QR-decomposition, channel matrix regularization, log-likelihood ratio clipping, and imposing runtime constraints are the key ingredients for realizing soft-output MIMO detectors with near max-log performance at a chip area that is only 58% higher than that of the best-known hard-output sphere decoder VLSI implementation.
Abstract: Multiple-input multiple-output (MIMO) detection algorithms providing soft information for a subsequent channel decoder pose significant implementation challenges due to their high computational complexity. In this paper, we show how sphere decoding can be used as an efficient tool to implement soft-output MIMO detection with flexible trade-offs between computational complexity and (error rate) performance. In particular, we provide VLSI implementation results which demonstrate that single tree-search, sorted QR-decomposition, channel matrix regularization, log-likelihood ratio clipping, and imposing runtime constraints are the key ingredients for realizing soft-output MIMO detectors with near max-log performance at a chip area that is only 58% higher than that of the best-known hard-output sphere decoder VLSI implementation.

404 citations


Proceedings ArticleDOI
01 Jun 2008
TL;DR: A novel asynchronous low-voltage signaling scheme is presented that makes the wafer-scale approach feasible by limiting the total power consumption while simultaneously providing a flexible, programmable network topology.
Abstract: This paper introduces a novel design of an artificial neural network tailored for wafer-scale integration. The presented VLSI implementation includes continuous-time analog neurons with up to 16 k inputs. A novel interconnection and routing scheme allows the mapping of a multitude of network models derived from biology on the VLSI neural network while maintaining a high resource usage. A single 20 cm wafer contains about 60 million synapses. The implemented neurons are highly accelerated compared to biological real time. The power consumption of the dense interconnection network providing the necessary communication bandwidth is a critical aspect of the system integration. A novel asynchronous low-voltage signaling scheme is presented that makes the wafer-scale approach feasible by limiting the total power consumption while simultaneously providing a flexible, programmable network topology.

277 citations


Book
Hubert Kaeslin
28 Apr 2008
TL;DR: This comprehensive guide to how and when to design VLSI circuits covers the advances, challenges and past mistakes in design, acting as an introduction for graduate students and a reference for practising electronic engineers.
Abstract: VLSI circuits are ubiquitous in the modern world, and designing them efficiently is becoming increasingly challenging with the development of ever smaller chips. This practically oriented textbook covers the important aspects of VLSI design using a top-down approach, reflecting the way digital circuits are actually designed. Using practical hints and tips, case studies and checklists, this comprehensive guide to how and when to design VLSI circuits covers the advances, challenges and past mistakes in design, acting as an introduction for graduate students and a reference for practising electronic engineers.

155 citations


Proceedings ArticleDOI
05 Jun 2008
TL;DR: An Electronic Neural Network memory with 256 neurons on a single chip using a combination of analog and digital VLSI technology plus a custom microfabrication process to allow a very dense packing of the neurons.
Abstract: We designed an Electronic Neural Network (ENN) memory with 256 neurons on a single chip using a combination of analog and digital VLSI technology plus a custom microfabrication process. Amplifiers with inverting and noninverting outputs are used for the neurons to make inhibitory and excitatory connections. The connections between the individual neurons are provided by amorphous‐silicon resistors which are placed on the CMOS chip in the last fabrication step. This technique allows a very dense packing of the neurons. Electron‐beam direct‐writing is used to pattern the resistors making it easy to change the information stored in the network from one chip to the next.

143 citations


Journal ArticleDOI
TL;DR: A high-performance droplet router for a digital microfluidic biochip (DMFB) design that achieves over 35× and 20× better routability, with comparable timing and fault tolerance, than the popular prioritized A* search and the state-of-the-art network-flow-based algorithm, respectively.
Abstract: In this paper, we propose a high-performance droplet router for a digital microfluidic biochip (DMFB) design. Due to recent advancements in the bio-microelectromechanical system and its various applications to clinical, environmental, and military operations, the design complexity and the scale of a DMFB are expected to explode in the near future, thus requiring strong support from CAD as in conventional VLSI design. Among the multiple design stages of a DMFB, droplet routing, which schedules the movement of each droplet in a time-multiplexed manner, is one of the most critical design challenges due to its high complexity as well as its large impact on performance. Our algorithm first routes a droplet with higher bypassibility, which is less likely to block the movement of the others. When multiple droplets form a deadlock, our algorithm resolves it by backing off some droplets for concession. The final compaction step further enhances timing as well as fault tolerance by tuning each droplet movement greedily. The experimental results on hard benchmarks show that our algorithm achieves over 35× and 20× better routability, with comparable timing and fault tolerance, than the popular prioritized A* search and the state-of-the-art network-flow-based algorithm, respectively.

141 citations


Proceedings ArticleDOI
10 Nov 2008
TL;DR: It is shown that NEM relay-based adders can achieve an order of magnitude or more improvement in energy efficiency over CMOS adders with ns-range delays and with no area penalty, and that this improvement can be achieved at higher throughputs at the cost of increased area.
Abstract: To overcome the energy-efficiency limitations imposed by finite sub-threshold slope in CMOS transistors, this paper explores the design of integrated circuits based on nano-electro-mechanical (NEM) relays. A dynamical Verilog-A model of the NEM relay is described and correlated to device measurements. Using this model we explore NEM relay design strategies for digital logic and I/O that can significantly improve the energy efficiency of the whole VLSI system. By exploiting the low effective threshold voltage and zero leakage achievable with these relays, we show that NEM relay-based adders can achieve an order of magnitude or more improvement in energy efficiency over CMOS adders with ns-range delays and with no area penalty. By applying parallelism, this improvement in energy-efficiency can be achieved at higher throughputs as well, at the cost of increased area. Similar improvements in high-speed I/O energy are also predicted by making use of the relays to implement highly energy-efficient digital-to-analog and analog-to-digital converters.

139 citations


Journal ArticleDOI
TL;DR: This paper presents a systematic high-speed VLSI implementation of the discrete wavelet transform (DWT) based on hardware-efficient parallel FIR filter structures, with which a high-speed 2-D DWT can be easily achieved for an N×N image with a controlled increase of hardware cost.
Abstract: This paper presents a systematic high-speed VLSI implementation of the discrete wavelet transform (DWT) based on hardware-efficient parallel FIR filter structures. High-speed 2-D DWT with computation time as low as N²/12 can be easily achieved for an N×N image with a controlled increase of hardware cost. Compared with recently published 2-D DWT architectures with computation times of N²/3 and 2N²/3, the proposed designs can also save a large number of multipliers and/or storage elements. The approach can also be used to implement those 2-D DWTs traditionally suitable for lifting or flipping-based designs, such as the (9,7) and (6,10) DWT. The throughput rate can be improved by a factor of 4 by the proposed approach, but the hardware cost increases by a factor of around 3. Furthermore, the proposed designs have very simple control signals, regular structures and 100% hardware utilization for continuous images.
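The computation times quoted in this abstract (N²/12 for the proposed design versus N²/3 and 2N²/3 for earlier architectures) can be compared with a few lines of exact arithmetic. The sketch below is only an illustration: the unit (clock cycles) and the names `PROPOSED`, `PRIOR`, `computation_time` and `speedup` are assumptions, not taken from the paper.

```python
from fractions import Fraction

# Computation time written as c * N^2; the unit is assumed to be clock cycles.
PROPOSED = Fraction(1, 12)                      # N^2/12 from the abstract
PRIOR = {"N^2/3": Fraction(1, 3),               # earlier architectures
         "2N^2/3": Fraction(2, 3)}              # quoted in the abstract


def computation_time(n, coeff):
    """Return c * n^2 exactly for an n x n image."""
    return coeff * n * n


def speedup(prior_coeff, new_coeff=PROPOSED):
    """Ratio of the prior computation time to the new one (independent of N)."""
    return prior_coeff / new_coeff


if __name__ == "__main__":
    for name, coeff in PRIOR.items():
        print(name, "->", speedup(coeff), "x faster at N^2/12")
```

The ratio against the N²/3 designs is 4, which matches the factor-of-4 throughput improvement stated in the abstract.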

107 citations


Journal ArticleDOI
TL;DR: Simulation results illustrate the superiority of the resulting proposed adder against conventional CMOS 1-bit full-adder in terms of power, delay and PDP.
Abstract: In this paper a new low-power and high-performance adder cell using a new design style called "Bridge" is proposed. The bridge design style offers a high degree of regularity and higher density than the conventional CMOS design style, as well as lower power consumption, by using some transistors named bridge transistors. Simulation results illustrate the superiority of the proposed adder over the conventional CMOS 1-bit full adder in terms of power, delay and power-delay product (PDP). We have performed simulations using HSPICE in a 90 nm standard CMOS technology at room temperature, with the supply voltage varied from 0.65 V to 1.5 V in 0.05 V steps.

107 citations


Proceedings ArticleDOI
Loi, Mitra, Lee, Fujita, Benini 
01 Jan 2008

101 citations


Book
07 Jul 2008
TL;DR: Low-Power High-Level Synthesis for Nanoscale CMOS Circuits addresses the need for analysis, characterization, estimation, and optimization of the various forms of power dissipation in the presence of process variations of nano-CMOS technologies.
Abstract: Low-Power High-Level Synthesis for Nanoscale CMOS Circuits addresses the need for analysis, characterization, estimation, and optimization of the various forms of power dissipation in the presence of process variations of nano-CMOS technologies. The authors show very large-scale integration (VLSI) researchers and engineers how to minimize the different types of power consumption of digital circuits. The material deals primarily with high-level (architectural or behavioral) energy dissipation because the behavioral level is not as highly abstracted as the system level nor is it as complex as the gate/transistor level. At the behavioral level there is a balanced degree of freedom to explore power reduction mechanisms, the power reduction opportunities are greater, and it can cost-effectively help in investigating lower power design alternatives prior to actual circuit layout or silicon implementation. The book is a self-contained low-power, high-level synthesis text for Nanoscale VLSI design engineers and researchers. Each chapter has simple relevant examples for a better grasp of the principles presented. Several algorithms are given to provide a better understanding of the underlying concepts. The initial chapters deal with the basics of high-level synthesis, power dissipation mechanisms, and power estimation. In subsequent parts of the text, a detailed discussion of methodologies for the reduction of different types of power is presented including: Power Reduction Fundamentals Energy or Average Power Reduction Peak Power Reduction Transient Power Reduction Leakage Power Reduction Low-Power High-Level Synthesis for Nanoscale CMOS Circuits provides a valuable resource for the design of low-power CMOS circuits.

79 citations


Book
30 Jun 2008
TL;DR: In this article, the authors present a comprehensive, state-of-the-art overview of VLSI circuit design for a wide range of applications in biology and medicine supported with over 280 illustrations and over 160 equations.
Abstract: VLSI (very large scale integration) is the process of creating integrated circuits by combining thousands of transistor-based circuits into a single chip. Written by top-notch international experts in industry and academia, this groundbreaking resource presents a comprehensive, state-of-the-art overview of VLSI circuit design for a wide range of applications in biology and medicine. Supported with over 280 illustrations and over 160 equations, the book offers cutting-edge guidance on designing integrated circuits for wireless biosensing, body implants, biosensing interfaces, and molecular biology. Engineers discover innovative design techniques and novel materials to help them achieve higher levels of circuit and system performance. This invaluable volume is essential reading for professionals and graduate students with a serious interest in circuit design and future biomedical technology.

Proceedings ArticleDOI
01 Nov 2008
TL;DR: An optimized fixed-point VLSI implementation of the modified Gram-Schmidt (MGS) QRD algorithm that incorporates regularization and additional sorting of the MIMO channel matrix; a like-for-like comparison clearly showed the superiority of the Givens rotation (GR)-based solution in terms of area, processing cycles, and throughput.
Abstract: The QR decomposition (QRD) is an important prerequisite for many different detection algorithms in multiple-input multiple-output (MIMO) wireless communication systems. This paper presents an optimized fixed-point VLSI implementation of the modified Gram-Schmidt (MGS) QRD algorithm that incorporates regularization and additional sorting of the MIMO channel matrix. Integrated in 0.18 µm CMOS technology, the proposed VLSI architecture processes up to 1.56 million complex-valued 4×4-dimensional matrices per second. The implementation results of this work are extensively compared to the Givens rotation (GR)-based QRD implementation of Luethi et al., ISCAS 2007. In order to ensure a fair comparison, both QRD circuits have been integrated in the same IC manufacturing technology, with equal functionality, and the same numeric precision. The comparison of the implementation results clearly showed the superiority of the GR-based VLSI solution in terms of area, processing cycles, and throughput.
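For readers unfamiliar with the algorithm named in this abstract, the following is a minimal floating-point sketch of the textbook modified Gram-Schmidt QR decomposition for a complex matrix. It is not the paper's fixed-point architecture and it omits the regularization and sorting steps; the function names `mgs_qr` and `reconstruct` are illustrative, and the input is assumed to have full column rank.

```python
def mgs_qr(H):
    """Textbook modified Gram-Schmidt QR of a complex matrix.

    H is a list of rows. Returns (Q, R) where Q is a list of orthonormal
    COLUMNS and R is an upper-triangular list of rows, so that H = Q R.
    Assumes full column rank (otherwise a column norm would be zero).
    """
    m, n = len(H), len(H[0])
    # Work on a column-wise copy of H.
    V = [[complex(H[i][j]) for i in range(m)] for j in range(n)]
    Q = []
    R = [[0j] * n for _ in range(n)]
    for k in range(n):
        norm = sum(abs(x) ** 2 for x in V[k]) ** 0.5
        R[k][k] = norm
        q = [x / norm for x in V[k]]
        Q.append(q)
        # "Modified" step: remove the q-component from every remaining
        # column immediately, using the already-updated columns.
        for j in range(k + 1, n):
            r = sum(qi.conjugate() * vj for qi, vj in zip(q, V[j]))
            R[k][j] = r
            V[j] = [vj - r * qi for qi, vj in zip(q, V[j])]
    return Q, R


def reconstruct(Q, R):
    """Multiply Q (list of columns) by R (list of rows) back into rows of H."""
    n, m = len(Q), len(Q[0])
    return [[sum(Q[k][i] * R[k][j] for k in range(n)) for j in range(n)]
            for i in range(m)]


if __name__ == "__main__":
    H = [[1 + 1j, 2], [0, 1 - 1j]]
    Q, R = mgs_qr(H)
    print(reconstruct(Q, R))  # should be close to H
```

A hardware implementation would replace the divisions and square root with fixed-point units, which is where the design trade-offs discussed in the abstract arise.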

Journal ArticleDOI
TL;DR: In this paper, an efficient VLSI architecture of a pipeline fast Fourier transform (FFT) processor capable of producing the normal output order sequence is presented and a sequence conversion method by integrating the conversion function into the last-stage data commutator module is presented.
Abstract: In this paper, an efficient VLSI architecture of a pipeline fast Fourier transform (FFT) processor capable of producing the normal output order sequence is presented. A new FFT design based on the decimated dual-path delay feed-forward data commutator unit by splitting the input stream into two half-word streams is first proposed. The resulting architecture can achieve full hardware efficiency such that the required number of adders can be reduced by half. Next, in order to generate the normal output order sequence, this paper also presents a sequence conversion method by integrating the conversion function into the last-stage data commutator module.

Proceedings ArticleDOI
01 Jun 2008
TL;DR: This paper focuses on the usability of the analog VLSI hardware architecture for the distributed simulation of large-scale spiking neural networks by demonstrating that biologically relevant network models can in fact be mapped to this system.
Abstract: An analog VLSI hardware architecture for the distributed simulation of large-scale spiking neural networks has been developed. Several hundred integrated computing nodes, each hosting up to 512 neurons, will be interconnected and operated on un-cut silicon wafers. The electro-technical aspects and the details of the hardware implementation are covered in a separate contribution to this conference. This paper focuses on the usability of the system by demonstrating that biologically relevant network models can in fact be mapped to this system. Different network configurations are established on the hardware by programmable switch matrices, repeaters, and address decoders. Systematic routing algorithms are presented to map a given network model to the hardware system. Routing is simulated for several network examples, proving the system's practical applicability. Furthermore, the routing simulations are used to fix values for yet open hardware parameters.

Journal ArticleDOI
TL;DR: The results show that the extent of memory savings realized by using interpolation is significantly lower than what is commonly believed, and the availability of both interpolation-based and approximation-based designs offers a richer set of design trade-offs than what was available using either interpolation or approximation alone.
Abstract: This paper examines the hardware implementation trade-offs when evaluating functions via piecewise polynomial approximations and interpolations for precisions of up to 24 bits. In polynomial approximations, polynomials are evaluated using stored coefficients. Polynomial interpolations, however, require the coefficients to be computed on-the-fly by using stored function values. Although it is known that interpolations require less memory than approximations at the expense of additional computations, the trade-offs in memory, area, delay, and power consumption between the two approaches have not been examined in detail. This work quantitatively analyzes these trade-offs for optimized approximations and interpolations across different functions and target precisions. Hardware architectures for degree-1 and degree-2 approximations and interpolations are described. The results show that the extent of memory savings realized by using interpolation is significantly lower than what is commonly believed. Furthermore, experimental results on a field-programmable gate array (FPGA) show that, for high output precision, degree-1 interpolations offer considerable area and power savings over degree-1 approximations, but similar savings are not realized when degree-2 interpolations and approximations are compared. The availability of both interpolation-based and approximation-based designs offers a richer set of design trade-offs than what is available using either interpolation or approximation alone.
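The distinction drawn in this abstract, stored coefficients versus coefficients computed on the fly from stored function values, can be sketched for the degree-1 case as follows. This is an illustrative floating-point model only: the names `build_tables`, `eval_approx` and `eval_interp` are hypothetical, the input range is assumed to be [0, 1), and the approximation table here simply stores the chord through the segment endpoints rather than the optimized coefficients the paper studies.

```python
import math


def build_tables(f, segments):
    """Build both tables for f on [0, 1) split into equal segments.

    samples: segments + 1 function values (interpolation table).
    coeffs:  one (c0, c1) pair per segment (approximation table).
    """
    width = 1.0 / segments
    samples = [f(i * width) for i in range(segments + 1)]
    coeffs = [(samples[i], (samples[i + 1] - samples[i]) / width)
              for i in range(segments)]
    return samples, coeffs


def eval_approx(coeffs, x):
    """Degree-1 approximation: look up stored coefficients, then c0 + c1*dx."""
    segments = len(coeffs)
    i = min(int(x * segments), segments - 1)
    c0, c1 = coeffs[i]
    return c0 + c1 * (x - i / segments)


def eval_interp(samples, x):
    """Degree-1 interpolation: derive the slope on the fly from two samples."""
    segments = len(samples) - 1
    i = min(int(x * segments), segments - 1)
    t = x * segments - i  # fractional position inside the segment
    return samples[i] + (samples[i + 1] - samples[i]) * t


if __name__ == "__main__":
    samples, coeffs = build_tables(math.sin, 16)
    print("interpolation words:", len(samples))      # segments + 1
    print("approximation words:", 2 * len(coeffs))   # 2 per segment
    print(eval_approx(coeffs, 0.3), eval_interp(samples, 0.3), math.sin(0.3))
```

In this simplified model the interpolation table holds roughly half as many words as the coefficient table but needs an extra subtraction per evaluation, which is the kind of trade-off the paper quantifies in hardware.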

Journal ArticleDOI
07 Oct 2008
TL;DR: Five pre-processing algorithms for the detection of firearm gunshots are statistically evaluated, using the receiver operating characteristic method, as a previous feasibility metric for their implementation on a low power VLSI circuit.
Abstract: Six preprocessing algorithms for the detection of firearm gunshots are statistically evaluated, using the receiver operating characteristic method as a previous feasibility metric for their implementation on a low-power VLSI circuit. Circuits are intended to serve as the input detection sensors of a low-power environmental surveillance network. Some possible VLSI implementations for the evaluated algorithms are also evaluated. Results indicate that the use of wavelet bank filters, either discrete or continuous, might be the best choice in terms of the compromise between detection efficiency and the power requirements of the intended application.

Proceedings ArticleDOI
01 Oct 2008
TL;DR: The classical triple modular redundancy (TMR) fault tolerant architecture is used as a case study and a new manner to implement the TMR architecture is proposed that makes it very effective for yield improvement purpose.
Abstract: With the technology entering the nano dimension, manufacturing processes are less and less reliable, thus drastically impacting the yield. A possible solution to alleviate this problem in the future could consist in using fault tolerant architectures to tolerate manufacturing defects. In this paper, we use the classical triple modular redundancy (TMR) fault tolerant architecture as a case study. Firstly we analyze the conditions that make the use of TMR architectures interesting for yield improvement purpose. In the second part of the paper, we investigate the test requirements for the TMR architecture and we propose a solution for generating test patterns for this type of architecture. Finally, we propose a new manner to implement the TMR architecture that makes it very effective for yield improvement purpose. Experimental results are provided on ISCAS and ITC benchmark circuits to prove the efficiency of the proposed approach in terms of yield improvement with a low area overhead.
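As background for the TMR architecture discussed here, the sketch below models a bitwise majority voter and the textbook probability that at least two of three independent modules are fault-free. Both are generic illustrations under stated assumptions (a perfect voter, independent module defects); they are not the yield analysis or the new TMR implementation proposed in the paper, and the names `majority` and `two_of_three` are hypothetical.

```python
def majority(a, b, c):
    """Bitwise 2-out-of-3 majority vote over three module outputs."""
    return (a & b) | (a & c) | (b & c)


def two_of_three(y):
    """Probability that at least two of three modules are good.

    Assumes each module is good independently with probability y and that
    the voter itself never fails (textbook simplification).
    """
    return 3 * y ** 2 - 2 * y ** 3


if __name__ == "__main__":
    good = 0b1010
    faulty = 0b0101  # one module produces a wrong word
    print(bin(majority(good, good, faulty)))  # the fault is masked
    print(two_of_three(0.9))
```

Under these simplifying assumptions the 2-out-of-3 probability exceeds the single-module probability only when y is above 0.5, which hints at why the conditions for a yield benefit need careful analysis.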

Proceedings ArticleDOI
18 May 2008
TL;DR: The simulation results demonstrate that the high-performance bicubic convolution interpolation architecture, running at 279 MHz with 30643 gates in a 498×498 µm chip, is able to process digital image scaling for HDTV in real time.
Abstract: This paper presents an efficient VLSI design of bicubic convolution interpolation for digital image processing. An architecture that reduces the computational complexity of generating coefficients and decreases the number of memory accesses is proposed. Our proposed method provides a simple hardware architecture design, has a low computation cost and is easy to implement. Based on our technique, the high-speed VLSI architecture has been successfully designed and implemented with the TSMC 0.13 µm standard cell library. The simulation results demonstrate that the high-performance bicubic convolution interpolation architecture, running at 279 MHz with 30643 gates in a 498×498 µm chip, is able to process digital image scaling for HDTV in real time.
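The coefficient generation mentioned in this abstract can be illustrated with the standard cubic convolution kernel. The sketch below is a plain software model in one dimension, assuming the commonly used kernel parameter a = -0.5; the paper's actual parameter choice and hardware simplifications are not stated in the abstract, and the names `cubic_kernel`, `weights` and `interpolate` are illustrative.

```python
def cubic_kernel(x, a=-0.5):
    """Standard cubic convolution kernel; a = -0.5 is an assumed default."""
    x = abs(x)
    if x <= 1:
        return (a + 2) * x ** 3 - (a + 3) * x ** 2 + 1
    if x < 2:
        return a * x ** 3 - 5 * a * x ** 2 + 8 * a * x - 4 * a
    return 0.0


def weights(t, a=-0.5):
    """Four coefficients for a sample at fractional offset t in [0, 1)."""
    return [cubic_kernel(1 + t, a), cubic_kernel(t, a),
            cubic_kernel(1 - t, a), cubic_kernel(2 - t, a)]


def interpolate(p0, p1, p2, p3, t, a=-0.5):
    """Interpolate between p1 and p2 using the four neighbouring samples."""
    w = weights(t, a)
    return w[0] * p0 + w[1] * p1 + w[2] * p2 + w[3] * p3


if __name__ == "__main__":
    print(weights(0.5))
    print(interpolate(0, 1, 2, 3, 0.5))
```

A 2-D bicubic result is obtained by applying the same 1-D step along rows and then along columns over a 4×4 neighbourhood, so each output pixel needs two sets of four coefficients.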

Journal ArticleDOI
TL;DR: This work provides a means to efficiently compute the body bias voltages required for ensuring high performance operation in gigascale systems and provides a computer-aided design (CAD) perspective for determining the exact amount of bias voltage that can compensate both temperature and process variations.
Abstract: With continued scaling into the sub-90-nm regime, the role of process, voltage, and temperature (PVT) variations on the performance of VLSI circuits has become extremely important. These variations can cause the delay and the leakage of the chip to vary significantly from their expected values, thereby affecting the yield. Circuit designers have proposed the use of threshold voltage modulation techniques to pull back the chip to the nominal operational region. One such scheme, known as adaptive body bias (ABB), has become extremely effective in ensuring optimal performance or leakage savings. Our work provides a means to efficiently compute the body bias voltages required for ensuring high performance operation in gigascale systems. We provide a computer-aided design (CAD) perspective for determining the exact amount of bias voltages that can compensate both temperature and process variations. Mathematical models for delay and leakage based on minimal tester measurements are built, and a nonlinear optimization problem is formulated to ensure highest frequency operation under all conditions, and thereby minimize the overall circuit leakage. Three different algorithms are presented and their accuracies and runtimes are compared. The algorithms have been applied to a wide range of process and temperature corners, for a 65- and 45-nm technology node-based process. A suitable implementation mechanism has also been outlined.

Proceedings ArticleDOI
04 May 2008
TL;DR: The proposed reliability monitor not only tracks the NBTI effect but also mitigates the degradation by forward biasing the PMOS.
Abstract: Reliability has become a practical concern in today's VLSI design with advanced technologies. In-situ sensors have been proposed for reliability monitoring to provide advance warnings before system errors occur. This paper presents a reliability monitor design for NBTI (Negative Bias Temperature Instability). NBTI is recognized as very critical as it leads to short device lifetime. The proposed reliability monitor not only tracks the NBTI effect but also mitigates the degradation by forward biasing the PMOS. A worst case scenario static stress experiment demonstrates two orders of magnitude improvement in system lifetime using PTM 65nm technology. A ring oscillator example shows how frequency degradation can be compensated. Deployment of the proposed NBTI monitor is also discussed and two compatible strategies are provided to incorporate these monitors efficiently: the first focuses on low area overhead while the second features low power.

Journal ArticleDOI
Cao Wei, Hou Hui, Tong Jia-rong, Lai Jinmei, Min Hao
TL;DR: A new high-performance reconfigurable VLSI architecture that supports a "meander"-like scan format for high data reuse of the search area, increasing the hardware utilization for VBSME with FSBMA.
Abstract: VBSME (variable block size motion estimation) is adopted in the MPEG-4 AVC/H.264 standard. In order to increase the hardware utilization for VBSME with FSBMA (full search block matching algorithm), this paper proposes a new high-performance reconfigurable VLSI architecture that supports a "meander"-like scan format for high data reuse of the search area. The architecture can support the three data flows of the scan format through a reconfigurable computing array and a memory of the search area. The computing array can achieve 100% processing element (PE) utilization and can reuse the smaller blocks' SADs to calculate 41 motion vectors (MVs) of a 16×16 block in parallel. The design is implemented with TSMC 0.18 µm CMOS technology. Under a clock frequency of 180 MHz, the architecture allows the real-time processing of 1280 × 720 video at 45 fps in a search range of [-16, +16].
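The SAD reuse mentioned in this abstract rests on the fact that the sum of absolute differences of a larger partition is the sum of the 4×4 SADs it contains. The sketch below shows this for one candidate displacement of a 16×16 block, producing the 41 partition SADs (16 + 8 + 8 + 4 + 2 + 2 + 1) that correspond to the 41 motion vectors. It is a software illustration with hypothetical names (`partition_sads`), not the paper's computing array; a full search would repeat it for every candidate position and keep the minimum per partition.

```python
def partition_sads(cur, ref):
    """Return SADs of all H.264 partitions of a 16x16 block, keyed by size.

    cur and ref are 16x16 lists of pixel values for one candidate
    displacement. Sizes are written as width x height.
    """
    # 4x4 base SADs: the only place pixels are actually compared.
    s4 = [[sum(abs(cur[4 * by + y][4 * bx + x] - ref[4 * by + y][4 * bx + x])
               for y in range(4) for x in range(4))
           for bx in range(4)] for by in range(4)]
    # 8x8 SADs reuse four 4x4 SADs each.
    s8 = [[s4[by][bx] + s4[by][bx + 1] + s4[by + 1][bx] + s4[by + 1][bx + 1]
           for bx in (0, 2)] for by in (0, 2)]
    return {
        "4x4": [s4[by][bx] for by in range(4) for bx in range(4)],
        "8x4": [s4[by][bx] + s4[by][bx + 1] for by in range(4) for bx in (0, 2)],
        "4x8": [s4[by][bx] + s4[by + 1][bx] for by in (0, 2) for bx in range(4)],
        "8x8": [s8[by][bx] for by in range(2) for bx in range(2)],
        "16x8": [s8[0][0] + s8[0][1], s8[1][0] + s8[1][1]],
        "8x16": [s8[0][0] + s8[1][0], s8[0][1] + s8[1][1]],
        "16x16": [s8[0][0] + s8[0][1] + s8[1][0] + s8[1][1]],
    }


if __name__ == "__main__":
    cur = [[(16 * y + x) % 256 for x in range(16)] for y in range(16)]
    ref = [[(3 * x + 7 * y) % 256 for x in range(16)] for y in range(16)]
    sads = partition_sads(cur, ref)
    print(sum(len(v) for v in sads.values()))  # number of partition SADs
```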

Journal ArticleDOI
TL;DR: This work presents a low-complexity link microarchitecture for mesochronous on-chip communication that enables skew constraint looseness in the clock tree synthesis, frequency speed-up, power consumption reduction and faster back-end turnarounds.
Abstract: Clock distribution is an important issue when designing multiprocessor systems-on-chip on deep sub-micron technology nodes, and non-synchronous approaches are becoming popular in this field. This work presents a low-complexity link microarchitecture for mesochronous on-chip communication that enables looser skew constraints in clock tree synthesis, frequency speed-up, power consumption reduction and faster back-end turnarounds. With respect to the state of the art, the proposed link architecture stands out for its low power and low complexity overheads; moreover, it can be easily integrated in a conventional digital design flow since it is implemented by means of standard cells only. Results are presented for the link integrated within a multiprocessor tiled architecture based on a network-on-chip communication backbone in a 65 nm CMOS technology.

Journal ArticleDOI
TL;DR: This work describes a novel superpipelined, fully parallelized architecture for optical-flow processing, which is capable of processing up to 170 frames per second at a resolution of 800x600 pixels, and discusses the advantages of high-frame-rate processing.

Journal ArticleDOI
TL;DR: A novel high-speed systolic array architecture for a first-order 2-D broadband frequency-planar spatio-temporal beam filter is proposed and employs a field-programmable gate array (FPGA) circuit where the critical path latency is minimized by timing optimization of inter- and intra-parallel processor pipelines, together with 3-D look-ahead techniques.
Abstract: For high-speed plane-wave filtering applications, real-time 2-D spatio-temporal linear-array broadband beam filters are required, operating at temporal frame rates in excess of hundreds of megahertz. The corresponding application specific VLSI circuits must have low critical-path latencies. A novel high-speed systolic array architecture for a first-order 2-D broadband frequency-planar spatio-temporal beam filter is proposed for this purpose and employs a field-programmable gate array (FPGA) circuit where the critical path latency is minimized by timing optimization of inter- and intra-parallel processor pipelines, together with 3-D look-ahead techniques. The method facilitates single-chip VLSI circuit implementations operating at real-time frame rates of several hundred megahertz.

Journal ArticleDOI
TL;DR: An efficient reverse converter for transforming the redundant binary representation into two's complement form that expends at least two times less energy than the competitor circuit and is capable of completing a 64-bit conversion in 829 ps and dissipates merely 5.84 mW.
Abstract: This paper presents an efficient reverse converter for transforming the redundant binary (RB) representation into two's complement form. The hierarchical expansion of the carry equation for the reverse conversion algorithm creates a regular multilevel structure, from which a high-speed hybrid carry-lookahead/carry-select (CLA/CSL) architecture is proposed to fully exploit the redundancy of RB encoding for efficient VLSI implementation. The optimally designed CSL sections are interleaved evenly in the mixed-radix CLA network to boost the performance of the reverse converter well above that of converters based on a homogeneous type of carry propagation adder. The logical effort characterization captures the effect of the circuit's fan-in, fan-out and transistor sizing on performance, and the evaluation shows that our proposed architecture leads to the fastest design. A 64-bit transistor-level circuit implementation of our proposed reverse converter and that of its most competitive contender were simulated to validate the logical effort delay model. The pre- and post-layout HSPICE simulation results reveal that our new converter expends at least two times less energy (power-delay product) than the competitor circuit, is capable of completing a 64-bit conversion in 829 ps, and dissipates merely 5.84 mW at a data rate of 1 GHz and a supply voltage of 1.8 V in TSMC 0.18-µm CMOS technology.
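The arithmetic that a reverse converter performs can be sketched independently of the carry architecture. In the model below, an RB number is assumed to be a list of signed digits in {-1, 0, 1}, split into a positive word and a negative word whose difference is taken with an ordinary two's complement subtraction. This shows only the function being computed, not the hybrid CLA/CSL circuit from the paper; `rb_to_twos_complement` is an illustrative name.

```python
def rb_to_twos_complement(digits, width):
    """Convert MSB-first signed digits in {-1, 0, 1} to two's complement.

    The result is returned as an unsigned integer of `width` bits; width
    should be at least len(digits) + 1 so the sign fits.
    """
    plus = 0   # bits set where the digit is +1
    minus = 0  # bits set where the digit is -1
    for d in digits:
        if d not in (-1, 0, 1):
            raise ValueError("redundant binary digits must be -1, 0 or 1")
        plus = (plus << 1) | int(d == 1)
        minus = (minus << 1) | int(d == -1)
    mask = (1 << width) - 1
    # plus - minus computed as plus + NOT(minus) + 1, truncated to width bits.
    return (plus + (~minus & mask) + 1) & mask


if __name__ == "__main__":
    print(rb_to_twos_complement([1, 0, -1, 1], 5))   # 8 - 2 + 1
    print(bin(rb_to_twos_complement([-1, 0, 1, 0], 5)))  # -8 + 2
```

The carry chain hidden inside the addition is the part whose delay the paper's lookahead and select structure is designed to shorten.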

Proceedings ArticleDOI
10 Nov 2008
TL;DR: This work presents a stability-preserving projection framework for model reduction of linear systems that can create accurate stable and passive models of arbitrary indefinite systems at a significantly cheaper cost than existing methods such as balanced truncation.
Abstract: In this work we present a stability-preserving projection framework for model reduction of linear systems. Specifically, given one projection matrix (e.g. a right-projection matrix), we derive a set of linear constraints for the other projection matrix (e.g. the left-projection matrix) resulting in a projection framework that is guaranteed to generate a stable reduced model. Several efficient techniques for solving the proposed system of constraints are presented, including an optimization problem formulation for finding the optimal stabilizing projection, and a formulation with computational complexity independent of the size of the original system. The resulting algorithms can create accurate stable and passive models of arbitrary indefinite systems at a significantly cheaper cost than existing methods such as balanced truncation. Nevertheless, our algorithms integrate fully and effortlessly with most of the available standard model order reduction approaches for very large systems generated in VLSI applications (such as moment-matching methods, POD, or poor man's TBR), which can guarantee stability and passivity only in very specialized cases. Our algorithms have been tested on a large variety of typical VLSI applications, including field-solver-extracted models of RF inductors for analog applications, power distribution grids for large VLSI digital integrated circuits, and MEMS devices for sensing and actuation applications. The results have been successfully compared to those from existing and much more expensive stabilizing reduction techniques.

Journal ArticleDOI
TL;DR: An optically reconfigurable gate array (ORGA) system, which consists of an ORGA very large scale integration (VLSI), an easily rewritable liquid crystal holographic memory recording four configuration contexts, and a laser array, is proposed.
Abstract: An optically reconfigurable gate array (ORGA) system, which consists of an ORGA very large scale integration (VLSI), an easily rewritable liquid crystal holographic memory recording four configuration contexts, and a laser array, is proposed. Circuits on a gate array of the ORGA-VLSI can be programmed rapidly by exploiting large parallel connections between a holographic memory and a gate array VLSI; that programming can be executed even as it is being programmed. Consequently, the gate array can be switched from a certain circuit to another circuit instantaneously. We present a demonstration of the ORGA system and experimental results.

Journal ArticleDOI
TL;DR: Comparison with the fastest comparator known in the literature demonstrates that, at a parity of technology used, the novel architecture is ~12% faster and requires ~69% fewer transistors.
Abstract: This paper presents a new efficient architecture for the design of fast low-cost single-clock-cycle binary comparators. The proposed 64-bit circuit requires only 1051 transistors and, when implemented by using the ST 90-nm 1-V CMOS technology, it exhibits a running frequency higher than 4 GHz with an average power dissipation of only ~4 mW. Comparison with the fastest comparator known in the literature demonstrates that, at a parity of technology used, the novel architecture is ~12% faster and requires ~69% fewer transistors.

Journal ArticleDOI
TL;DR: In this paper, the authors integrate carbon nanotube fabrication with standard commercial CMOS very large scale integration on a single substrate suitable for emerging hybrid nanotechnology applications, such as optical, biological, chemical, and gas sensors.
Abstract: We integrate carbon nanotube (CNT) fabrication with standard commercial CMOS very large scale integration on a single substrate suitable for emerging hybrid nanotechnology applications. This cointegration combines the inherent advantages of CMOS and CNTs. These emerging applications include CNT optical, biological, chemical, and gas sensors that require complex CMOS electronics for sensor control, calibration, and signal processing. We demonstrate the successful cointegration on a single chip with a vehicle circuit, a two-transistor cascode megahertz amplifier utilizing both silicon n-channel MOSFET and CNT transistors with a total power consumption of 62.5 µW.

Journal ArticleDOI
TL;DR: Low-power designs for the synchronizer and channel estimator units of the Inner Receiver in wireless local area network systems are proposed and the use of multiple clock domains and clock gating reduces the power consumption.
Abstract: In this paper, we propose low-power designs for the synchronizer and channel estimator units of the Inner Receiver in wireless local area network systems. The objective of the work is the optimization, with respect to power, area, and latency, of both the signal processing algorithms themselves and their implementation. Novel circuit design strategies have been employed to realize optimal hardware- and power-efficient architectures for the fast Fourier transform, arc tangent computation unit, numerically controlled oscillator, and the decimation filters. The use of multiple clock domains and clock gating reduces the power consumption further. These blocks have been integrated into an experimental digital baseband processor for the IEEE 802.11a standard implemented in the 0.25-µm 5-metal-layer BiCMOS technology from the Institute for High Performance Microelectronics.