
Showing papers on "Very-large-scale integration" published in 2013


Book
07 Apr 2013
TL;DR: This book provides a careful selection of essential topics on all three types of circuits, namely, digital, memory, and mixed-signal, each requiring different test and design for testability methods.
Abstract: Today's electronic design and test engineers deal with several types of subsystems, namely, digital, memory, and mixed-signal, each requiring different test and design for testability methods. This book provides a careful selection of essential topics on all three types of circuits. The outcome of testing is product quality, which means "meeting the user's needs at a minimum cost". The book includes test economics and techniques for determining the defect level of VLSI chips. Besides being a textbook for a course on testing, it is a complete testability guide for an engineer working on any kind of electronic device or system or a system-on-a-chip.

1,484 citations


Book
17 Jan 2013
TL;DR: Algorithms, Complexity Analysis and VLSI Architectures for MPEG-4 Motion Estimation is an important introduction to numerous algorithmic, architectural and system design aspects of the multimedia standard MPEG-4.
Abstract: MPEG-4 is the multimedia standard for combining interactivity, natural and synthetic digital video, audio and computer graphics. Typical applications are internet, video conferencing, mobile videophones, multimedia cooperative work, teleteaching and games. With MPEG-4, the next step from block-based video (ISO/IEC MPEG-1, MPEG-2, CCITT H.261, ITU-T H.263) to arbitrarily-shaped visual objects is taken. This significant step demands a new methodology for system analysis and design to meet the considerably higher flexibility of MPEG-4. Motion estimation is a central part of the MPEG-1/2/4 and H.261/H.263 video compression standards and has attracted much attention in research and industry for the following reasons: it is computationally the most demanding algorithm of a video encoder (about 60-80% of the total computation time), it has a high impact on the visual quality of a video encoder, and it is not standardized, thus being open to competition. Algorithms, Complexity Analysis, and VLSI Architectures for MPEG-4 Motion Estimation covers in detail every single step in the design of an MPEG-1/2/4 or H.261/H.263 compliant video encoder: fast motion estimation algorithms; complexity analysis tools; detailed complexity analysis of a software implementation of MPEG-4 video; complexity and visual quality analysis of fast motion estimation algorithms within MPEG-4; design space on motion estimation VLSI architectures; and detailed VLSI design examples of (1) a high-throughput and (2) a low-power MPEG-4 motion estimator. Algorithms, Complexity Analysis and VLSI Architectures for MPEG-4 Motion Estimation is an important introduction to numerous algorithmic, architectural and system design aspects of the multimedia standard MPEG-4. As such, all researchers, students and practitioners working in image processing, video coding or system and VLSI design will find this book of interest.

368 citations



Proceedings ArticleDOI
18 Nov 2013
TL;DR: A novel approximate adder design to significantly reduce energy consumption with a very moderate error rate and critical path delay is proposed that has been adopted in a VLSI-based neuromorphic character recognition chip using unsupervised learning.
Abstract: We propose a novel approximate adder design to significantly reduce energy consumption with a very moderate error rate. The significantly improved error rate and critical path delay stem from the employed carry prediction technique that leverages the information from less significant input bits in a parallel manner. An error magnitude reduction scheme is proposed to further reduce the amount of error, once detected, at low cost. Implemented in a commercial 90 nm CMOS process, the proposed adder is shown to be up to 2.4× faster and 43% more energy efficient than traditional adders while having an error rate of only 0.18%. The proposed adder has been adopted in a VLSI-based neuromorphic character recognition chip using unsupervised learning. The approximation errors of the proposed adder have been shown to have negligible impact on the training process. Moreover, energy savings of up to 48.5% over traditional adders are achieved for the neuromorphic circuit with scaled supply level. Finally, we achieve error-free operation by including low-overhead error correction logic.

112 citations
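
The carry-prediction idea in the approximate adder entry above can be illustrated with a generic speculative adder, in which the carry into each bit is computed only from a short window of less significant bits instead of the full carry chain. This is a minimal sketch under that assumption and is not the design from the paper; the names `approx_add`, `error_rate`, `width`, and `window` are hypothetical.

```python
def approx_add(a: int, b: int, width: int = 16, window: int = 4) -> int:
    """Add two unsigned `width`-bit integers using speculative carries.

    Illustrative assumption: the carry into bit i is derived only from
    bits [i - window, i - 1], with a carry-in of 0 to that window.
    The result is truncated to `width` bits.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    result = 0
    for i in range(width):
        lo = max(0, i - window)          # lowest bit seen by the predictor
        span = i - lo                    # number of bits in the window
        seg_mask = (1 << span) - 1
        seg_a = (a >> lo) & seg_mask
        seg_b = (b >> lo) & seg_mask
        carry_in = (seg_a + seg_b) >> span   # carry out of the window
        bit = ((a >> i) ^ (b >> i) ^ carry_in) & 1
        result |= bit << i
    return result


def error_rate(width: int = 8, window: int = 4) -> float:
    """Fraction of all input pairs whose approximate sum is wrong."""
    mask = (1 << width) - 1
    wrong = sum(
        approx_add(a, b, width, window) != ((a + b) & mask)
        for a in range(1 << width)
        for b in range(1 << width)
    )
    return wrong / (1 << (2 * width))
```

When `window` is at least `width`, every carry sees the full chain and the result is exact; shrinking the window shortens the critical path at the price of occasional wrong sums, which is the trade-off the abstract describes.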


Journal ArticleDOI
TL;DR: A floating random walk (FRW) solver, called RWCap, is presented for the capacitance extraction of very-large-scale integration (VLSI) interconnects and it is demonstrated that the parallel RWCap is over 6× faster than its serial-computing version.
Abstract: A floating random walk (FRW) solver, called RWCap, is presented for the capacitance extraction of very-large-scale integration (VLSI) interconnects. An approach, including the numerical characterization of the cross-interface transition probability and weight value, is proposed to accelerate the extraction of structures with multiple dielectric layers. A comprehensive variance reduction scheme based on importance sampling and stratified sampling is proposed to improve the convergence rate of the FRW algorithm. Finally, a space management technique using an octree data structure and a parallel computing technique are presented to further improve the efficiency. Numerical experiments are carried out with test cases generated under the 180- and 45-nm process technologies. They demonstrate that the proposed multidielectric FRW algorithm achieves up to 160× speedup over the FRW algorithm that uses spherical transition domains to cross dielectric interfaces, with very small memory overhead. The variance reduction techniques further bring 3× or more speedup without memory overhead or loss of accuracy. RWCap also outperforms other existing FRW algorithms and fast boundary element method solvers in terms of computational time or scalability. The experiments on an 8-core CPU machine show that the parallel RWCap is over 6× faster than its serial-computing version.

83 citations


Journal ArticleDOI
TL;DR: An efficient denoising scheme and its VLSI architecture for the removal of random-valued impulse noise is proposed and can obtain better performances in terms of both quantitative evaluation and visual quality than the previous lower complexity methods.
Abstract: Images are often corrupted by impulse noise in the procedures of image acquisition and transmission. In this paper, we propose an efficient denoising scheme and its VLSI architecture for the removal of random-valued impulse noise. To achieve the goal of low cost, a low-complexity VLSI architecture is proposed. We employ a decision-tree-based impulse noise detector to detect the noisy pixels, and an edge-preserving filter to reconstruct the intensity values of noisy pixels. Furthermore, an adaptive technology is used to enhance the effects of removal of impulse noise. Our extensive experimental results demonstrate that the proposed technique can obtain better performance in terms of both quantitative evaluation and visual quality than the previous lower-complexity methods. Moreover, the performance is comparable to that of the higher-complexity methods. The VLSI architecture of our design yields a processing rate of about 200 MHz by using TSMC 0.18 μm technology. Compared with the state-of-the-art techniques, this work can reduce memory storage by more than 99 percent. The design requires only low computational complexity and two line memory buffers. Its hardware cost is low, and it is suitable for many real-time applications.

75 citations


BookDOI
10 Nov 2013
TL;DR: In this article, the authors argue that readers will feel bored only if they do not like the book, and that readers who try to force themselves to read may prefer other entertaining activities.
Abstract: You do not need to finish this book in a single day. Reading it alongside your other daily activities may feel tedious, and if you try to force yourself to read, you may prefer to do something more entertaining. However, one of the ideas behind this book is that it will not make you feel bored; you will feel bored reading it only if you do not like the book. VLSI Specification, Verification and Synthesis really offers what everybody wants.

72 citations


Journal ArticleDOI
TL;DR: Compared with the previous low-complexity and high performance techniques, this work achieves lower hardware cost, lower power consumption, and a better compression rate than other lossless ECG encoder designs.
Abstract: An efficient VLSI architecture of a lossless ECG encoding circuit is proposed for wireless healthcare monitoring applications. To reduce the transmission and storage data, a novel lossless compression algorithm is proposed for ECG signal compression. It consists of a novel adaptive trending predictor and a novel two-stage entropy encoder based on two Huffman coding tables. The proposed lossless ECG encoder design was implemented using only simple arithmetic units. To improve the performance, the proposed ECG encoder was pipelined, and the two-stage entropy encoder was implemented with a look-up-table architecture. The VLSI architecture of this work contains 3.55 K gate counts and its core area is 45987 µm² synthesised by a 0.18 µm CMOS process. It can operate at 100 MHz processing rate with only 36.4 µW. The data compression rate reaches an average value of 2.43 for the MIT-BIH Arrhythmia Database. Compared with the previous low-complexity and high performance techniques, this work achieves lower hardware cost, lower power consumption, and a better compression rate than other lossless ECG encoder designs.

70 citations


Proceedings ArticleDOI
28 Mar 2013
TL;DR: This paper presents an MTJ/MOS-hybrid video coding hardware that uses a cycle-based power-gating technique for a practical-scale MTJ-based NV-LIM LSI, which is fully designed using the established semi-automated MTJ-oriented design flow.
Abstract: Nonvolatile logic-in-memory (NV-LIM) architecture [1], where magnetic tunnel junction (MTJ) devices [2] are distributed over a CMOS logic-circuit plane, has the potential of overcoming the serious power-consumption problem that has rapidly become a dominant constraint on the performance improvement of today's VLSI processors. Normally-off and instant-on capabilities with a small area penalty due to non-volatility and three-dimensional-stackability of MTJ devices in the above structure allow us to apply a power-gating technique in a fine temporal granularity, which can perfectly eliminate wasted power dissipation due to leakage current. The impact of embedding nonvolatile memory devices into a logic circuit was, however, demonstrated by using only small fabricated primitive logic-circuit elements [3], memory-like structures such as FPGA [4], or circuit simulation because of the lack of an established MTJ-oriented design flow reflecting the chip-fabrication environment, while larger-capacity and/or high-speed-access MRAM has been increasingly developed. In this paper, we present an MTJ/MOS-hybrid video coding hardware that uses a cycle-based power-gating technique for a practical-scale MTJ-based NV-LIM LSI, which is fully designed using the established semi-automated MTJ-oriented design flow.

69 citations


Journal ArticleDOI
TL;DR: The results of synthesis show that, in the first implementation, 17 929 slices or 20% of the chip area is occupied, which makes it suitable for speed-critical cryptographic applications, while in the second implementation, 14 203 slices or 16% of the chip area is utilized, which makes it suitable for applications that may require a speed-area tradeoff.
Abstract: A new and highly efficient architecture for elliptic curve scalar point multiplication is presented. To achieve the maximum architectural and timing improvements, we reorganize and reorder the critical path of the Lopez-Dahab scalar point multiplication architecture such that logic structures are implemented in parallel and operations in the critical path are diverted to noncritical paths. The results we obtained show that with G=55 our proposed design is able to compute scalar multiplication over GF(2^163) in 9.6 μs with the maximum achievable frequency of 250 MHz on Xilinx Virtex-4 (XC4VLX200), where G is the digit size of the underlying digit-serial finite-field multiplier. Another implementation variant for less resource consumption is also proposed; with G=33, the design performs the same operation in 11.6 μs at 263 MHz on the same platform. The results of synthesis show that, in the first implementation, 17 929 slices or 20% of the chip area is occupied, which makes it suitable for speed-critical cryptographic applications, while in the second implementation 14 203 slices or 16% of the chip area is utilized, which makes it suitable for applications that may require a speed-area tradeoff.

68 citations
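
The latency and clock figures in the elliptic curve entry above imply a clock-cycle count per scalar multiplication, and the digit size G determines how many cycles one field multiplication takes. The short calculation below uses only the numbers quoted in the abstract. It assumes the usual digit-serial behavior of processing G bits per clock, so that one GF(2^m) multiplication takes ceil(m / G) cycles; the function names are hypothetical.

```python
import math


def cycles_per_field_mult(m: int, digit_size: int) -> int:
    """Cycles for one GF(2^m) multiplication, assuming G bits per clock."""
    return math.ceil(m / digit_size)


def cycles_for_latency(latency_us: float, freq_mhz: float) -> float:
    """Clock cycles elapsed in `latency_us` microseconds at `freq_mhz` MHz."""
    return latency_us * freq_mhz  # us * MHz = cycles


if __name__ == "__main__":
    for g, latency_us, freq_mhz in [(55, 9.6, 250.0), (33, 11.6, 263.0)]:
        print(
            f"G={g}: {cycles_per_field_mult(163, g)} cycles per field multiplication, "
            f"about {cycles_for_latency(latency_us, freq_mhz):.0f} cycles per scalar multiplication"
        )
```

Under these assumptions the G=55 variant needs 3 cycles per field multiplication and roughly 2400 cycles overall, while the G=33 variant needs 5 cycles per field multiplication and roughly 3051 cycles overall, which matches the speed-versus-area framing of the two implementations.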


Journal ArticleDOI
TL;DR: This work proposes a hierarchical way to merge flip-flops that reduces clock power by 20-30% with a very short running time, even in the largest test case, which contains 1 700 000 flip-flops.
Abstract: Power has become a burning issue in modern VLSI design. In modern integrated circuits, the power consumed by clocking gradually takes a dominant part. Given a design, we can reduce its power consumption by replacing some flip-flops with fewer multi-bit flip-flops. However, this procedure may affect the performance of the original circuit. Hence, replacing flip-flops without violating timing and placement capacity constraints becomes a quite complex problem. To deal with the difficulty efficiently, we have proposed several techniques. First, we perform a co-ordinate transformation to identify those flip-flops that can be merged and their legal regions. Besides, we show how to build a combination table to enumerate possible combinations of flip-flops provided by a library. Finally, we use a hierarchical way to merge flip-flops. Besides power reduction, the objective of minimizing the total wirelength is also considered. The time complexity of our algorithm is Θ(n^1.12), less than the empirical complexity of Θ(n^2). According to the experimental results, our algorithm significantly reduces clock power by 20-30% and the running time is very short. In the largest test case, which contains 1 700 000 flip-flops, our algorithm only takes about 5 min to replace flip-flops and the power reduction can reach 21%.

Journal ArticleDOI
TL;DR: A Topology Aware Adaptive Routing (TAAR) to balance the traffic load for NSI-Mesh in 3D NoC and according to the proposed VLSI architecture, the TAAR only needs less than 24.8 percent hardware overhead.
Abstract: Three-dimensional network-on-chip (3D NoC) has been proposed to solve the complex on-chip communication issues in future 3D multicore systems. However, the thermal problems of 3D NoC are more serious than those of 2D NoC due to chip stacking. To keep the temperature below a certain thermal limit, the thermal emergent routers are usually throttled. Then, the topology of 3D NoC becomes a Nonstationary Irregular Mesh (NSI-Mesh). To ensure successful packet delivery in the NSI-Mesh, some routing algorithms have been proposed in previous works. However, the network still suffers from extreme traffic imbalance among the lateral and vertical logic layers. In this paper, we propose a Topology Aware Adaptive Routing (TAAR) to balance the traffic load for NSI-Mesh in 3D NoC. TAAR has three routing modes, which can be dynamically adjusted based on the topology status of the routing path. In addition to increasing routing flexibility, the TAAR also increases both vertical and lateral path diversity to balance the traffic load. Compared with the related adaptive routing methods, the experimental results show that the proposed TAAR can reduce 19 to 295 percent traffic loads in the bottom logic layer and improve around 7.7 to 380 percent network throughput. According to our proposed VLSI architecture, the TAAR only needs less than 24.8 percent hardware overhead compared with the previous works.

Journal ArticleDOI
TL;DR: A novel detection algorithm with an efficient VLSI architecture featuring efficient operation over infinite complex lattices and support of unbounded infinite lattice decoding distinguishes the present method from previous K-Best strategies and also allows its complexity to scale sublinearly with the modulation order.
Abstract: A novel detection algorithm with an efficient VLSI architecture featuring efficient operation over infinite complex lattices is proposed. The proposed design results in the highest throughput, the lowest latency, and the lowest energy compared to the complex-domain VLSI implementations to date. The main innovations are a novel complex-domain means of expanding/visiting the intermediate nodes of the search tree on demand, rather than exhaustively, as well as a new distributed sorting scheme to keep track of the best candidates at each search phase. Its support of unbounded infinite lattice decoding distinguishes the present method from previous K-Best strategies and also allows its complexity to scale sublinearly with the modulation order. Since the expansion and sorting cores are data-driven, the architecture is well suited for a pipelined parallel VLSI implementation. The proposed algorithm is used to fabricate a 4×4, 64-QAM complex multiple-input-multiple-output detector in a 0.13-μm CMOS technology, achieving a clock rate of 417 MHz with the core area of 340 kgates. The chip test results prove that the fabricated design can sustain a throughput of 1 Gb/s with energy efficiency of 110 pJ/bit, the best numbers reported to date.

Journal ArticleDOI
TL;DR: Two novel design approaches called Probabilistic Pruning and Probabilistic Logic Minimization are proposed to realize inexact circuits with zero hardware overhead and can independently achieve normalized gains as large as 2x to 9.5x in energy-delay-area product for relative error magnitude as low as 10^-4% to 8% compared to corresponding conventional correct circuits.
Abstract: The domain of inexact circuit design, in which accuracy of the circuit can be exchanged for substantial cost (energy, delay, and/or area) savings, has been gathering increasing prominence of late owing to a growing desire for reducing energy consumption of the systems, particularly in the domain of embedded and (portable) multimedia applications. Most of the previous approaches to realizing inexact circuits relied on scaling of circuit parameters (such as supply voltage) taking advantage of an application’s error tolerance to achieve the cost and accuracy trade-offs, thus suffering from acute drawbacks of considerable implementation overheads that significantly reduced the gains. In this article, two novel design approaches called Probabilistic Pruning and Probabilistic Logic Minimization are proposed to realize inexact circuits with zero hardware overhead. Extensive simulations on various architectures of critical datapath elements demonstrate that each of the techniques can independently achieve normalized gains as large as 2x to 9.5x in energy-delay-area product for relative error magnitude as low as 10^-4% to 8% compared to corresponding conventional correct circuits.

Journal ArticleDOI
TL;DR: The proposed VLSI circuit has been designed using the AMS 0.35 μm CMOS process and has been simulated using design kits for Synopsys and Cadence tools, and is shown to give rise to a BCM-like learning rule, which is a rate-based rule.

Journal ArticleDOI
TL;DR: In this article, the authors describe a class of 4-D light field filters for image processing, 4-D hyper-fan filters for low-light imaging, depth filtering, denoising and attenuation of distracting objects, with applications in computational photography and habitat monitoring.
Abstract: Advances in the performance of VLSI circuits are leading to a number of emerging applications of multidimensional (MD) filters. Early progress was focused on the numerical design of two-dimensional (2-D) transfer functions and the challenging stability issues associated with low-complexity infinite impulse response (IIR) implementations. However, over the last decade or so, important practical advances have occurred in the design of 3-D and 4-D IIR filters, leading to some important emerging applications. In this tutorial article, some of these applications are described, with emphasis on 2-D spatio-temporal beamforming and 4-D light field processing. In particular, advances in spatio-temporal beamforming for cognitive radio systems and for synthetic aperture radio telescopes are considered. In the 4-D case, we describe a class of 4-D light field filters for image processing, 4-D hyper-fan filters for low-light imaging, depth filtering, denoising and the attenuation of distracting objects, with applications in computational photography and habitat monitoring. Both analog and digital systolic VLSI circuit implementations are described with emphasis on recent progress using field programmable gate array (FPGA)-based and digital VLSI circuits that can potentially operate at radio frequencies in the multi-GHz range. These innovations open up exciting possibilities for real-time MD filters having frame rates in the multi-GHz range for emerging radio frequency (RF) antenna signal processing and imaging systems.

Journal ArticleDOI
TL;DR: A novel low-power dual mode logic (DML) family, designed to operate in the subthreshold region, is introduced with a simple and intuitive design concept.
Abstract: In this brief, we introduce a novel low-power dual mode logic (DML) family, designed to operate in the subthreshold region. The proposed logic family can be switched between static and dynamic modes of operation according to system requirements. In static mode, the DML gates feature very low-power dissipation with moderate performance, while in dynamic mode they achieve higher performance, albeit with increased power dissipation. This is achieved with a simple and intuitive design concept. SPICE and Monte Carlo simulations compare performance, power dissipation, and robustness of the proposed DML gates to their CMOS and domino counterparts in the 80-nm process. Measurements of an 80-nm test chip are presented in order to prove the proposed concept.

Proceedings ArticleDOI
19 May 2013
TL;DR: Synthesis results on the FPGA platform indicate that the proposed design can double the throughput compared with previous work, and synthesis results under 45nm technology show that it can support real-time processing of 4Kx2K (4096×2048, 30fps) video sequences.
Abstract: In this paper, a high-performance multiplierless VLSI architecture for the transform applied in the emerging video coding standard, High Efficiency Video Coding (HEVC), is presented. The proposed architecture can support a variety of transform sizes from 4×4 to 32×32, and some simplification strategies are adopted during the implementation, such as letting smaller transforms reuse part of the structure of a larger one, and turning multiplications by constants into shift and sum operations. Synthesis results on the FPGA platform indicate that the proposed design can double the throughput compared with previous work, with almost the same hardware cost. Moreover, synthesis results under 45nm technology show that it can support real-time processing of 4Kx2K (4096×2048, 30fps) video sequences. When a comparison index called “data throughput per unit area” is adopted, the proposed architecture is almost five times more efficient than the previous design.
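
The step of turning a multiplication by a constant into shift and sum operations can be sketched as follows. The example uses plain binary expansion of the constant, which is an assumption for illustration and not necessarily the decomposition chosen in the paper. The constants 64, 83, and 36 are the coefficient magnitudes of the HEVC 4-point core transform; the function names are hypothetical.

```python
def shift_add_terms(constant: int) -> list[int]:
    """Return the shift amounts whose powers of two sum to `constant`."""
    if constant < 0:
        raise ValueError("constant must be non-negative")
    return [bit for bit in range(constant.bit_length()) if (constant >> bit) & 1]


def multiply_by_constant(x: int, constant: int) -> int:
    """Multiply x by a non-negative constant using only shifts and additions."""
    return sum(x << shift for shift in shift_add_terms(constant))


if __name__ == "__main__":
    for c in (64, 83, 36):
        terms = " + ".join(f"(x << {s})" for s in shift_add_terms(c))
        print(f"{c} * x = {terms}")
```

For example, 83 = 64 + 16 + 2 + 1, so 83 * x becomes (x << 6) + (x << 4) + (x << 1) + x, which in hardware costs three adders and no multiplier. Hardware designs often reduce the adder count further by sharing partial sums between constants.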

Proceedings ArticleDOI
01 Sep 2013
TL;DR: This work adapts VLSI testing principles (justification and sensitization) to quantify the ability of a reverse engineer to unambiguously resolve the functionality of look-alike camouflaged gates.
Abstract: An Integrated Circuit (IC) can be reverse engineered by imaging its layout and reconstructing the netlist. IC camouflaging is a layout-level technique that hampers imaging-based reverse engineering by using, in one embodiment, functionally different standard cells that look alike. Reverse engineering will fail if the functionality of a camouflaged gate cannot be correctly resolved. We adapt VLSI testing principles (justification and sensitization) to quantify the ability of a reverse engineer to unambiguously resolve the functionality of look-alike camouflaged gates. We evaluate the security of IC camouflaging based on look-alike standard cells by applying it to the controllers in the OpenSPARC T1 processor.

Journal ArticleDOI
TL;DR: A soft-input soft-output fixed-complexity-sphere-decoding algorithm and its very large scale integration architecture are proposed for the iterative MIMO receiver and its deeply pipelined architecture improves the detection performance significantly with low detection latency.
Abstract: By exchanging soft information between the multiple-input multiple-output (MIMO) detector and the channel decoder, an iterative receiver can significantly improve the performance compared to the noniterative receiver. In this brief, a soft-input soft-output fixed-complexity-sphere-decoding algorithm and its very large scale integration architecture are proposed for the iterative MIMO receiver. The deeply pipelined architecture employs the optimized hybrid enumeration to search for the best child node estimate efficiently. By adding the counter hypotheses in parallel with other candidates, the proposed iterative MIMO detector improves the detection performance significantly with low detection latency. An iterative detector for a 4 × 4 64-quadrature amplitude modulation (QAM) MIMO system based on our proposed architecture is designed and implemented using the 90-nm CMOS technology. The detector can achieve a maximum throughput of 2.2 Gbit/s with an area efficiency of 3.96 Mbit/s/kGE, which is more efficient than other iterative MIMO detectors.

Proceedings Article
12 Jun 2013
TL;DR: This work proposes for the first time a complete SRAM offer in FDSOI technology, covering low leakage, high speed and low voltage customer requirements, through simple and innovative process/design solutions.
Abstract: We propose for the first time a complete SRAM offer in FDSOI technology, covering low leakage, high speed and low voltage customer requirements, through simple and innovative process/design solutions. Starting from a bulk-design direct porting, we evidenced +50% and +200% Iread at Vdd=1V and 0.6V, respectively, vs 28LP bulk. Additionally, -100mV Vmin reduction has been demonstrated with 28FDSOI. Alternative flip-well and single well architecture provides further speed and Vmin improvement, down to 0.42V on 1Mb 0.197 μm². Ultimate stand-by leakage below 1pA on 0.120 μm² bitcell at Vdd=0.6V is finally reached by taking the full benefits of the back bias capability of FDSOI.

Journal ArticleDOI
TL;DR: This paper presents the first very-large-scale integration (VLSI) design of a monolithic wideband CS-based A2I converter that includes a signal acquisition stage capable of acquiring RF signals having large bandwidths and a high-throughput spectral activity detection unit.
Abstract: One of the key tasks in cognitive radio and communications intelligence is to detect active bands in the radio-frequency (RF) spectrum. In order to perform spectral activity detection in wideband RF signals, expensive and energy-inefficient high-rate analog-to-digital converters (ADCs) in combination with sophisticated digital detection circuitry are typically used. In many practical situations, however, the RF spectrum is sparsely populated, i.e., only a few frequency bands are active at a time. This property enables the design of so-called analog-to-information (A2I) converters, which are capable of acquiring and directly extracting the spectral activity information at low cost and low power by means of compressive sensing (CS). In this paper, we present the first very-large-scale integration (VLSI) design of a monolithic wideband CS-based A2I converter that includes a signal acquisition stage capable of acquiring RF signals having large bandwidths and a high-throughput spectral activity detection unit. Low-cost wideband signal acquisition is obtained via CS-based randomized temporal subsampling in combination with a 4-bit flash ADC. High-throughput spectrum activity detection from the coarsely quantized and compressive measurements is achieved by means of a massively-parallel VLSI design of a novel accelerated sparse spectrum dequantization (ASSD) algorithm. The resulting monolithic A2I converter is designed in 28 nm CMOS, acquires RF signals up to 6 GS/s, and the on-chip ASSD unit detects the active RF bands at a rate 30 × below real-time.

Journal ArticleDOI
TL;DR: Compared with previous low-complexity techniques, this paper achieves better quality, higher performance, lower memory requirements, and lower hardware cost than other image scaling methods.
Abstract: In this paper, a low-complexity adaptive edge-enhanced algorithm is proposed for the implementation of 2-D image scaling applications. The proposed novel algorithm consists of a linear space-variant edge detector, a low-complexity sharpening spatial filter, and a simplified bilinear interpolation. The edge detector is designed to discover the image edges by a low-cost edge-catching technique. The sharpening spatial filter is added as a prefilter to reduce the blurring effect produced by the bilinear interpolation. Furthermore, an adaptive technology is used to enhance the effect of the edge detector by adaptively selecting the input pixels of the bilinear interpolation. In addition, algebraic manipulation and hardware sharing techniques are used to simplify bilinear interpolation, which efficiently reduces the computing resources and silicon area in very large scale integration (VLSI) circuits. By adding eight 8-bit registers as a register bank, this design can process streaming data directly and requires only a one-line-buffer memory. The VLSI architecture of this paper contains 6.67-K gate counts and achieves about 280-MHz processing rate by using the TSMC 0.13-μm CMOS process. Compared with previous low-complexity techniques, this paper achieves better quality, higher performance, lower memory requirements, and lower hardware cost than other image scaling methods.
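
One common algebraic manipulation for simplifying bilinear interpolation rewrites the weighted sum (1 - t) * a + t * b as a + t * (b - a), which needs one multiplier instead of two per 1-D interpolation. The sketch below shows that rearrangement in fixed-point arithmetic. It is an illustration under stated assumptions and may differ from the manipulation used in the paper; `FRAC_BITS` and all function names are hypothetical.

```python
FRAC_BITS = 8            # assumed fixed-point precision of the weights
ONE = 1 << FRAC_BITS     # fixed-point representation of 1.0


def lerp_two_mults(a: int, b: int, t: int) -> int:
    """Direct form: (1 - t) * a + t * b, two multiplications."""
    return ((ONE - t) * a + t * b) >> FRAC_BITS


def lerp_one_mult(a: int, b: int, t: int) -> int:
    """Rearranged form: a + t * (b - a), one multiplication."""
    return a + ((t * (b - a)) >> FRAC_BITS)


def bilinear(p00: int, p01: int, p10: int, p11: int, tx: int, ty: int) -> int:
    """Bilinear interpolation of four neighbors with three multiplications.

    p00 and p01 are the top-left and top-right pixels, p10 and p11 the
    bottom-left and bottom-right pixels; tx and ty are weights in [0, ONE].
    """
    top = lerp_one_mult(p00, p01, tx)
    bottom = lerp_one_mult(p10, p11, tx)
    return lerp_one_mult(top, bottom, ty)
```

Because `ONE * a` is an exact multiple of 2^FRAC_BITS, both 1-D forms return identical integers, so the rearrangement saves a multiplier without changing the result.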

Journal ArticleDOI
TL;DR: Quality tests show that the proposed method reaches significantly better accuracy than alternative hardware-oriented approaches for the extraction of disparity maps from stereo images.

Journal ArticleDOI
TL;DR: A low-complexity, low-memory-requirement, and high-quality algorithm is proposed for VLSI implementation of an image scaling processor which reduces gate counts by more than 34.4% and requires only a one-line-buffer memory.
Abstract: In this brief, a low-complexity, low-memory-requirement, and high-quality algorithm is proposed for VLSI implementation of an image scaling processor. The proposed image scaling algorithm consists of a sharpening spatial filter, a clamp filter, and a bilinear interpolation. To reduce the blurring and aliasing artifacts produced by the bilinear interpolation, the sharpening spatial and clamp filters are added as prefilters. To minimize the memory buffers and computing resources for the proposed image processor design, T-model and inversed T-model convolution kernels are created for realizing the sharpening spatial and clamp filters. Furthermore, two T-model or inversed T-model filters are combined into a combined filter which requires only a one-line-buffer memory. A reconfigurable calculation unit is devised to decrease the hardware cost of the combined filter. Moreover, the computing resource and hardware cost of the bilinear interpolator can be efficiently reduced by algebraic manipulation and hardware sharing techniques. The VLSI architecture in this work can achieve 280 MHz with 6.08-K gate counts, and its core area is 30378 μm² synthesized by a 0.13-μm CMOS process. Compared with previous low-complexity techniques, this work reduces gate counts by more than 34.4% and requires only a one-line-buffer memory.

Journal ArticleDOI
TL;DR: Techniques that use a cell distance limit and search only a cell's neighbor region are proposed to accelerate the construction of the spatial structures, and a grid-Octree hybrid structure is proposed, which has advantages over existing structures.
Abstract: In capacitance extraction with the floating random walk (FRW) algorithm, a space management approach is required to facilitate finding the nearest conductor. Octree and grid-based spatial structures have been used to decompose the whole domain into cells and to store information about local conductors. In this letter, techniques that use a cell distance limit and search only a cell's neighbor region are proposed to accelerate the construction of the spatial structures. A fast inquiry technique is proposed to speed up the nearest-conductor query. We also propose a grid-Octree hybrid structure, which has advantages over existing structures. Experiments on large VLSI structures with up to 484441 conductors have validated the efficiency of the proposed techniques. The improved FRW algorithm is thousands of times faster than RWCap when extracting a single net, and several to tens of times faster when extracting 100 nets.
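
To make the idea of a grid-based nearest-conductor query concrete, here is a heavily simplified sketch. It is not the letter's method: conductors are reduced to 2-D points (real conductors are 3-D blocks), the grid is a plain dictionary, and the query expands ring by ring around the query cell, stopping once no farther ring can hold a closer point. The names `build_grid` and `nearest` are hypothetical.

```python
import math
from collections import defaultdict


def build_grid(points, cell):
    """Bucket 2-D points into square cells of side `cell`."""
    grid = defaultdict(list)
    for p in points:
        key = (math.floor(p[0] / cell), math.floor(p[1] / cell))
        grid[key].append(p)
    return grid


def nearest(grid, cell, q):
    """Return (point, distance) of the stored point closest to q."""
    if not grid:
        return None, math.inf
    qx, qy = math.floor(q[0] / cell), math.floor(q[1] / cell)
    # Farthest occupied ring (Chebyshev distance in cell units) bounds the search.
    max_ring = max(max(abs(cx - qx), abs(cy - qy)) for cx, cy in grid)
    best, best_d = None, math.inf
    for r in range(max_ring + 1):
        for cx in range(qx - r, qx + r + 1):
            for cy in range(qy - r, qy + r + 1):
                if max(abs(cx - qx), abs(cy - qy)) != r:
                    continue  # visit only the cells on ring r
                for p in grid.get((cx, cy), ()):
                    d = math.dist(p, q)
                    if d < best_d:
                        best, best_d = p, d
        # Any point on ring r + 1 is at least r * cell away from q,
        # so the search can stop once the best distance is within that bound.
        if best_d <= r * cell:
            break
    return best, best_d
```

For example, `nearest(build_grid(pts, 1.0), 1.0, (3.0, 3.0))` inspects only the cells near the query rather than every stored point, which is the motivation for such spatial structures in FRW-style solvers.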

Proceedings ArticleDOI
28 Mar 2013
TL;DR: This paper presents a monolithically integrated optical modulator with a new all-digital driver circuit in a commercial 45nm SOI process that enables the carrier-injection modulator to operate at 2.5Gb/s with an energy-cost of 1.23pJ/b, making it ~4× faster and more energy-efficient than the previous monolithically integrated driver/modulator presented in [5].
Abstract: Integrated photonic interconnect technology presents a disruptive alternative to electrical I/O for many VLSI applications. Superior bandwidth-density and energy-efficient operation can be realized through dense wavelength-division multiplexing (DWDM) and lower transmission losses. There are two main paths towards an integrated platform. Hybrid/heterogeneous designs [1-3] enable each component to be custom-tailored, but suffer from large packaging parasitics, increased manufacturing costs due to requisite process flows, and costly 3D integration or microbump packaging. Monolithic integration mitigates integration overheads, but has not penetrated deeply-scaled technologies due to necessary process customizations [4]. The first monolithic integration of photonic devices and electronic-photonic operation in sub-100 nm (45 nm SOI process with zero foundry changes) is demonstrated in [5]. This paper presents a monolithically integrated optical modulator with a new all-digital driver circuit in a commercial 45nm SOI process. The waveform-conditioning driver circuit enables the carrier-injection modulator to operate at 2.5Gb/s with an energy-cost of 1.23pJ/b, making it ~4× faster and more energy-efficient than the previous monolithically integrated driver/modulator presented in [5].


Proceedings ArticleDOI
29 May 2013
TL;DR: The micro-architectural and circuit design techniques for building complex VLSI circuits with microelectromechanical (MEM) relays are described and experimental results are presented to demonstrate the viability of this technology.
Abstract: This paper describes the micro-architectural and circuit design techniques for building complex VLSI circuits with micro-electromechanical (MEM) relays and presents experimental results to demonstrate the viability of this technology. By tailoring the circuits and micro-architecture to the relay device characteristics, the performance of the relay-based multiplier is improved by an order of magnitude over any known static CMOS style implementation, and by ~4x over CMOS pass-gate equivalent implementations. A 16-bit relay multiplier is shown to offer ~10x lower energy per operation at sub-10 MOPS throughputs when compared to an optimized CMOS multiplier at an equivalent 90 nm technology node. The functionality of the primary multiplier building block, a full (7:3) compressor built with 46 scaled MEM-relays, which is the largest working MEM-relay circuit reported to date, is also demonstrated.

Journal ArticleDOI
TL;DR: An efficient VLSI design of a lossless electrocardiogram (ECG) encoder is proposed for wireless body sensor networks and achieves an average compression rate of 2.56 for the MIT-BIH arrhythmia database.
Abstract: An efficient VLSI design of a lossless electrocardiogram (ECG) encoder is proposed for wireless body sensor networks. To save wireless transmission power, a novel lossless encoding algorithm is created for ECG signal compression. The proposed algorithm consists of a novel adaptive predictor based on fuzzy decision control, and a novel hybrid entropy encoder including both a two-stage Huffman and a Golomb-Rice coding. The VLSI architecture contains only 2.71 K gate counts and its core area is 33 929 μm² synthesized by a 0.18 μm CMOS process. Moreover, this design can be operated at 100 MHz processing rate while consuming only 30 μW. It achieves an average compression rate of 2.56 for the MIT-BIH arrhythmia database. Compared with previous low-complexity and high-performance lossless ECG encoder designs, this design has a higher compression rate, lower power consumption, and lower hardware cost.
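
As a minimal sketch of the Golomb-Rice part of such an encoder, the code below predicts each sample from the previous one, maps the signed residual to a non-negative integer, and Rice-codes it with a fixed parameter `k`. Several things here are assumptions for illustration: the previous-sample predictor stands in for the paper's fuzzy adaptive predictor, `k` is fixed rather than adapted, the two-stage Huffman stage is omitted, and bits are kept as a Python string for readability. All function names are hypothetical.

```python
def zigzag(n):
    """Map signed residuals 0, -1, 1, -2, ... to 0, 1, 2, 3, ..."""
    return 2 * n if n >= 0 else -2 * n - 1


def unzigzag(u):
    """Inverse of zigzag."""
    return u >> 1 if u % 2 == 0 else -((u + 1) >> 1)


def rice_encode(u, k):
    """Rice code: quotient in unary (ones ended by a zero), then k remainder bits."""
    q, r = u >> k, u & ((1 << k) - 1)
    bits = "1" * q + "0"
    if k:
        bits += format(r, "0{}b".format(k))
    return bits


def encode_samples(samples, k):
    """Encode integer samples using previous-sample prediction (first predictor is 0)."""
    prev, out = 0, []
    for s in samples:
        out.append(rice_encode(zigzag(s - prev), k))
        prev = s
    return "".join(out)


def decode_samples(bits, k):
    """Invert encode_samples for a well-formed bit string."""
    samples, prev, pos = [], 0, 0
    while pos < len(bits):
        q = 0
        while bits[pos] == "1":   # count the unary quotient
            q += 1
            pos += 1
        pos += 1                  # skip the terminating zero
        r = int(bits[pos:pos + k], 2) if k else 0
        pos += k
        prev += unzigzag((q << k) | r)
        samples.append(prev)
    return samples
```

Small residuals produce short codewords, which is why a good predictor matters more than the entropy coder itself. Decoding the encoder's output with the same `k` recovers the samples exactly, as a lossless scheme requires.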