
Showing papers in "VLSI Design" in 2011


Journal ArticleDOI
TL;DR: The proposed lossless algorithm achieves a compression ratio of approximately 73% for endoscopic images and offers a better compression ratio, lower computational complexity, and a smaller memory requirement than existing lossless compression standards such as JPEG-LS.
Abstract: We present a lossless and low-complexity image compression algorithm for endoscopic images. The algorithm consists of a static prediction scheme and a combination of Golomb-Rice and unary encoding. It does not require any buffer memory and is suitable to work with any commercial low-power image sensor that outputs image pixels in raster-scan fashion. The proposed lossless algorithm achieves a compression ratio of approximately 73% for endoscopic images. Compared to existing lossless compression standards such as JPEG-LS, the proposed scheme has a better compression ratio, lower computational complexity, and a smaller memory requirement. The algorithm is implemented in a 0.18 µm CMOS technology, occupies 0.16 mm × 0.16 mm of silicon area, and consumes 18 µW of power when working at 2 frames per second.

54 citations
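To make the coding stage concrete, here is a minimal Python sketch of Golomb-Rice coding applied to prediction residuals, the style of entropy coder the abstract describes. The left-neighbor predictor, the zigzag mapping, and the parameter k = 2 are illustrative assumptions, not the paper's exact scheme.

```python
# Illustrative sketch (not the authors' implementation): Golomb-Rice
# coding of prediction residuals from a static left-neighbor predictor.
# The zigzag mapping and k = 2 are assumptions for the example.

def zigzag(r: int) -> int:
    """Map a signed residual to unsigned: 0,-1,1,-2,2 -> 0,1,2,3,4."""
    return (r << 1) if r >= 0 else ((-r << 1) - 1)

def golomb_rice_encode(residual: int, k: int) -> str:
    """Encode one residual: unary quotient, then k-bit binary remainder."""
    u = zigzag(residual)
    q, rem = u >> k, u & ((1 << k) - 1)
    return "1" * q + "0" + format(rem, f"0{k}b")

# Example: a raster-scan pixel stream predicted by its left neighbor.
pixels = [52, 55, 54, 54, 60]
prev, bits = 0, ""
for p in pixels:
    bits += golomb_rice_encode(p - prev, k=2)  # static prediction
    prev = p
print(bits)
```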


Journal ArticleDOI
TL;DR: This work offers new algorithms and a methodology for SPICE-accurate optimization of clock networks, coordinated to satisfy slew constraints and achieve the best trade-offs between skew, insertion delay, power, and tolerance to variations.
Abstract: On-chip clock networks are remarkable in their impact on the performance and power of synchronous circuits, in their susceptibility to adverse effects of semiconductor technology scaling, and in their strong potential for improvement through better CAD algorithms and tools. Existing literature is rich in ideas and techniques but performs large-scale optimization using analytical models that have lost accuracy at recent technology nodes and have rarely been validated by realistic SPICE simulations on large industrial designs. Our work offers a methodology for SPICE-accurate optimization of clock networks, coordinated to satisfy slew constraints and achieve the best tradeoffs between skew, insertion delay, power, and tolerance to variations. Our implementation, called Contango, is evaluated on 45 nm benchmarks from IBM Research and Texas Instruments with up to 50K sinks. It outperforms all published results in terms of skew and shows superior scalability.

27 citations
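As a toy illustration of the metrics being traded off, the sketch below estimates sink latencies and skew on a tiny clock tree using the Elmore delay model, the kind of analytical model the abstract contrasts with SPICE-accurate evaluation. The topology and RC values are invented.

```python
# Toy Elmore-delay estimate of sink latency and skew on a small clock
# tree; this is an analytical model of the sort the paper says loses
# accuracy versus SPICE. Topology and RC values are made up.

# node -> (parent, wire_resistance_ohm, wire_capacitance_farad)
tree = {
    "a":  ("root", 100.0, 20e-15),
    "s1": ("a",    150.0, 30e-15),
    "s2": ("a",    120.0, 25e-15),
}
sink_cap = {"s1": 10e-15, "s2": 10e-15}

def downstream_cap(node: str) -> float:
    """Total capacitance at and below a node (wire + sink loads)."""
    cap = tree[node][2] + sink_cap.get(node, 0.0)
    return cap + sum(downstream_cap(c) for c, (p, _, _) in tree.items() if p == node)

def elmore_delay(sink: str) -> float:
    """Sum of R * downstream C along the root-to-sink path."""
    delay, node = 0.0, sink
    while node != "root":
        parent, r, _ = tree[node]
        delay += r * downstream_cap(node)
        node = parent
    return delay

latencies = {s: elmore_delay(s) for s in sink_cap}
skew = max(latencies.values()) - min(latencies.values())
print(latencies, "skew:", skew)
```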


Journal ArticleDOI
TL;DR: This paper attempts to categorize the challenges and solutions for low-power and low-voltage applications and thus provides separate roadmaps for device designers working in the submicron and deep submicron regions of CMOS devices.
Abstract: In recent years, the demand for power-sensitive designs has grown significantly due to the fast growth of battery-operated portable applications. As technology scaling continues unabated, subthreshold device design has gained a lot of attention due to its low-power and ultra-low-power consumption in various applications. The design of low-power, high-performance submicron and deep submicron CMOS devices and circuits is a big challenge. The short-channel effect is a major challenge when scaling the gate length down to and below 0.1 µm. A detailed review of potential solutions for prolonging CMOS as the leading information technology, proposed by various researchers over the past two decades, is presented in this paper. This paper attempts to categorize the challenges and solutions for low-power and low-voltage applications and thus provides separate roadmaps for device designers working in the submicron and deep submicron regions of CMOS devices.

23 citations


Journal ArticleDOI
TL;DR: The IP-based SoC design flow is discussed to highlight the exact locations and the nature of infringements in the flow; the paper identifies the adversaries, categorizes these infringements, and applies strategic analysis to the effectiveness of the existing IPP techniques for these categories of infringement.
Abstract: Increased design complexity, shrinking design cycles, and low cost: this three-dimensional demand mandates the advent of the system-on-chip (SoC) methodology in the semiconductor industry. The key concept of SoC is the reuse of intellectual property (IP) cores. Reuse of IPs on an SoC increases the risk of misappropriation of IPs due to the introduction of several new attacks and the involvement of various parties as adversaries. The existing literature has a huge number of proposals for IP protection (IPP) techniques to be incorporated into the IP design flow as well as into the SoC design methodology. However, these are quite scattered, limited in their possibilities in a multithreat environment, and sometimes mutually conflicting. Existing works need a critical survey, proper categorization, and summarization to focus on the inherent tradeoffs, existing security holes, and new research directions. This paper discusses the IP-based SoC design flow to highlight the exact locations and the nature of infringements in the flow, identifies the adversaries, categorizes these infringements, and applies strategic analysis to the effectiveness of the existing IPP techniques for these categories of infringements. It also clearly highlights recent challenges and new opportunities in this emerging field of research.

21 citations


Journal ArticleDOI
TL;DR: The Memetic Algorithm is an Evolutionary Algorithm that includes one or more local search phases within its evolutionary cycle; it is used to obtain minimum wirelength by reducing delay in partitioning and area in floorplanning.
Abstract: Minimizing wirelength plays an important role in the physical design automation of very large-scale integration (VLSI) chips. The objective of wirelength minimization can be achieved by finding optimal solutions for VLSI physical design stages like partitioning and floorplanning. In VLSI circuit partitioning, the problem of obtaining minimum delay is of prime importance; in VLSI circuit floorplanning, the problem of minimizing silicon area is also a hot issue. Reducing delay in partitioning and area in floorplanning helps to minimize the wirelength, and the enhancements in partitioning and floorplanning also influence other criteria like power, cost, clock speed, and so forth. The Memetic Algorithm (MA) is an Evolutionary Algorithm that includes one or more local search phases within its evolutionary cycle; it is used here to obtain minimum wirelength by reducing delay in partitioning and area in floorplanning. MA applies local search for the optimization of VLSI partitioning and floorplanning: the algorithm combines a hierarchical design technique, the genetic algorithm, with a constructive technique, Simulated Annealing, as the local search to solve the VLSI partitioning and floorplanning problems. MA can quickly produce optimal solutions for the popular benchmarks.

13 citations
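A minimal sketch of such a memetic loop, combining a genetic algorithm with a simulated-annealing local search as the abstract describes, is shown below on a toy two-way min-cut partitioning instance. The netlist and all parameters are invented for illustration.

```python
# Memetic loop sketch: GA selection/crossover/mutation, with each
# offspring refined by a simulated-annealing local search. The toy
# objective is min-cut two-way partitioning of an invented netlist.
import math
import random

nets = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5), (1, 4)]
N = 6  # number of cells in the toy netlist

def cut_size(part):
    """Number of nets whose endpoints lie in different partitions."""
    return sum(part[a] != part[b] for a, b in nets)

def sa_local_search(part, temp=2.0, cooling=0.9, steps=50):
    """SA refinement: flip one cell, accept via the Metropolis rule."""
    part = part[:]
    for _ in range(steps):
        i = random.randrange(N)
        before = cut_size(part)
        part[i] ^= 1
        delta = cut_size(part) - before
        if delta > 0 and random.random() >= math.exp(-delta / temp):
            part[i] ^= 1          # reject the worsening move
        temp *= cooling
    return part

def memetic(pop_size=8, generations=20):
    """GA with an SA local-search phase applied to each offspring."""
    pop = [[random.randint(0, 1) for _ in range(N)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cut_size)
        a, b = pop[0], pop[1]                    # fitness-based selection
        cx = random.randrange(1, N)
        child = a[:cx] + b[cx:]                  # one-point crossover
        if random.random() < 0.2:
            child[random.randrange(N)] ^= 1      # mutation
        pop[-1] = sa_local_search(child)         # memetic local-search phase
    return min(pop, key=cut_size)

best = memetic()
print(best, cut_size(best))
```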


Journal ArticleDOI
TL;DR: The proposed "Weighted Transition Based Reordering-Columnwise Bit Filling-Difference Vector (WTRCBF-DV)" is a modification of the earlier proposed "Hamming Distance Based Reordering-Columnwise Bit Filling-Difference Vector."
Abstract: Test data compression is a major issue for the external testing of IP core-based SoCs. From the large pool of diverse available compression techniques, run-length-based schemes are the most appropriate for IP cores. To improve compression and to reduce test power, test data processing schemes like "don't care bit filling" and "reordering", which require no modification of the internal structure and no test development tools, can be used for SoCs containing IP cores with hidden structure. The proposed "Weighted Transition Based Reordering-Columnwise Bit Filling-Difference Vector (WTRCBF-DV)" is a modification of the earlier proposed "Hamming Distance Based Reordering-Columnwise Bit Filling-Difference Vector." The new method aims not only at very high compression but also at reducing shift-in test power without any significant on-chip area overhead. The experimental results on ISCAS89 benchmark circuits show that the test data compression ratio improves significantly in each case. It is also noteworthy that, in most cases, the scheme does not involve any extra silicon area overhead compared to the base code with which it is used; a few cases require only an extra XOR gate and a feedback path. Because application of the scheme increases the run lengths of zeroes in the test set, the number of transitions during scan shifting is reduced, which may lower scan power. The proposed scheme can be easily integrated into the existing industrial flow.

10 citations
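The sketch below illustrates, under one plausible reading of the abstract, the preprocessing ideas named in the method: a weighted-transitions metric for reordering test vectors, a stand-in for don't-care bit filling, and the difference vector (XOR of consecutive vectors) that lengthens runs of zeroes before run-length coding. The fill strategy and the test vectors are invented.

```python
# Hedged sketch of the named preprocessing steps, not the paper's exact
# algorithm: weighted-transition reordering, a simple don't-care fill,
# and difference-vector generation.

def weighted_transitions(vec: str) -> int:
    """Scan-in power estimate: a transition j positions from the scan
    input is shifted through (len-1-j) cells, so weight it accordingly."""
    L = len(vec)
    return sum((L - 1 - j) for j in range(L - 1) if vec[j] != vec[j + 1])

def fill_dont_cares(vec: str) -> str:
    """Stand-in fill strategy: copy the previous defined bit into X's."""
    out, last = [], "0"
    for b in vec:
        last = b if b in "01" else last
        out.append(last)
    return "".join(out)

def difference_vectors(vecs):
    """First vector as-is, then the XOR of each vector with its predecessor."""
    out, prev = [vecs[0]], vecs[0]
    for v in vecs[1:]:
        out.append("".join("1" if a != b else "0" for a, b in zip(prev, v)))
        prev = v
    return out

tests = ["01X10X01", "01110001", "0X110X11"]          # invented test cubes
filled = [fill_dont_cares(t) for t in tests]
ordered = sorted(filled, key=weighted_transitions)    # reorder for low power
print(difference_vectors(ordered))
```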


Journal ArticleDOI
TL;DR: Starting from the terminology and models for power consumption during test, the state of the art in low-power testing is presented, along with a detailed survey of the power reduction techniques proposed for all aspects of testing, including external testing, Built-In Self-Test techniques, and advances in DFT techniques emphasizing low power.
Abstract: Test power is a major issue for current-generation VLSI testing and has become the biggest concern for today's SoCs. While reducing design effort, the modular design approach in SoCs (i.e., the use of IP cores) has further aggravated the test power issue. It is not easy to select an effective low-power testing strategy from a large pool of diverse available techniques. To find proper solutions for a test power reduction strategy for IP core-based SoCs, this paper presents the state of the art in low-power testing, starting from the terminology and models for power consumption during test. The paper contains a detailed survey of the power reduction techniques proposed for all aspects of testing, including external testing, Built-In Self-Test techniques, and advances in DFT techniques emphasizing low power. Further, all the available low-power testing techniques are thoroughly analyzed for their suitability to IP core-based SoCs.

6 citations


Journal ArticleDOI
TL;DR: Different closed-form approximations are reviewed and compared against the CDF matching method, which is shown to be the most effective method for accurate statistical leakage modeling.
Abstract: Device mismatch and process variation models play a key role in determining the functionality and yield of sub-100 nm designs. Average characteristics are often of interest, such as the average leakage current or the average read delay. However, detecting rare functional fails is critical for memory design, and designers often seek techniques that enable accurate modeling of such events. Extremely leaky devices can inflict functionality fails, and the plurality of leaky devices on a bitline increases the dimensionality of the yield estimation problem. Simplified models are possible by adopting approximations to the underlying sum of lognormals. The implications of such approximations on tail probabilities may in turn bias the yield estimate. We review different closed-form approximations and compare them against the CDF matching method, which is shown to be the most effective method for accurate statistical leakage modeling.

5 citations
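The following numeric sketch illustrates the underlying modeling problem: total leakage across n devices is a sum of lognormals, which has no closed-form distribution. It compares Wilkinson's moment-matching approximation, one classical closed form, against Monte Carlo at a tail point. The distribution parameters are invented, and this is not the paper's CDF matching method itself.

```python
# Sum-of-lognormals tail probability: Monte Carlo reference versus
# Wilkinson's moment-matched single lognormal. Parameters are invented.
import math
import numpy as np

mu, sigma, n = 0.0, 0.8, 8          # per-device log-leakage parameters
rng = np.random.default_rng(0)
samples = np.exp(rng.normal(mu, sigma, size=(200_000, n))).sum(axis=1)

# Wilkinson: match mean and variance of the sum with one lognormal.
m1 = n * math.exp(mu + sigma**2 / 2)                         # E[sum]
var = n * (math.exp(sigma**2) - 1) * math.exp(2 * mu + sigma**2)  # Var[sum]
s2 = math.log(1 + var / m1**2)
mu_w, sigma_w = math.log(m1) - s2 / 2, math.sqrt(s2)

x = float(np.quantile(samples, 0.999))          # a far-tail point
z = (math.log(x) - mu_w) / sigma_w
tail_wilkinson = 0.5 * math.erfc(z / math.sqrt(2))   # 1 - Phi(z)
print("MC tail   :", float((samples > x).mean()))
print("Wilkinson :", tail_wilkinson)
```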


Journal ArticleDOI
TL;DR: Most of the energy savings occur at the final stages of the circuits, while the largest relative downsizing occurs in the middle stages.
Abstract: A design scenario examined in this paper assumes that a circuit has been designed initially for high speed and is redesigned for low power by downsizing of the gates. In recent years, as power consumption has become a dominant issue, new optimizations of circuits are required for saving energy; this is done by trading off some speed in exchange for reduced power. For each feasible speed, an optimization problem is solved, finding new sizes for the gates such that the circuit satisfies the speed goal while dissipating minimal power. Energy/delay gain (EDG) is defined as a metric to quantify the most efficient tradeoff. The EDG of the circuit is evaluated for a range of reduced circuit speeds, and the power-optimal gate sizes are compared with the initial sizes. Most of the energy savings occur at the final stages of the circuits, while the largest relative downsizing occurs in the middle stages. Typical tapering factors for power-efficient circuits are larger than those for speed-optimal circuits. Signal activity and signal probability affect the optimal gate sizes in the combined optimization of speed and power.

5 citations
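One natural way to formalize such a metric, which may differ in detail from the paper's exact definition, is the relative energy saving per unit of relative delay increase:

```latex
% Hedged reconstruction, not necessarily the paper's exact definition:
\[
  \mathrm{EDG}(\Delta D) \;=\;
  \frac{\bigl(E_0 - E_{\min}(D_0 + \Delta D)\bigr)/E_0}{\Delta D / D_0}
\]
% where E_0 and D_0 are the energy and delay of the initial
% speed-optimal design, and E_min(D) is the minimum energy over all
% gate sizings that meet the relaxed delay target D.
```

Under this reading, a large EDG at some delay relaxation means the circuit gives up little speed for a large energy saving, which is how the most efficient tradeoff point would be identified.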


Journal ArticleDOI
Ying Zhou, Charles J. Alpert, Zhuo Li, Cliff Sze, Louise H. Trevillyan
TL;DR: Two simple yet efficient buffering and gate sizing techniques are presented that achieve a physical synthesis flow with much smaller area bloat, yielding a 12% reduction in logic area growth and a 5.8% total area reduction.
Abstract: Area bloat in physical synthesis not only increases power dissipation but also creates congestion problems, forces designers to enlarge the die area, rerun the whole design flow, and postpone design deadlines. As a result, it is vital for physical synthesis tools to achieve timing closure and low power consumption with intelligent area control. The major sources of area increase in a typical physical synthesis flow are buffer insertion and gate sizing, both of which have been discussed extensively over the last two decades, where the main focus has been individually optimized algorithms. However, building a practical physical synthesis flow with buffering and gate sizing to achieve the best timing/area/runtime is rarely discussed in the previous literature. In this paper, we present two simple yet efficient buffering and gate sizing techniques and achieve a physical synthesis flow with much smaller area bloat. Compared to a traditional timing-driven flow, our work achieves a 12% logic area growth reduction, a 5.8% total area reduction, a 10.1% wirelength reduction, and a 770 ps worst-slack improvement on average on 20 industrial designs in 65 nm and 45 nm technologies.

4 citations


Journal ArticleDOI
TL;DR: These considerations incorporate discussions of the feasibility of extending this classification technique beyond n = 5; a new implementation is presented along with a basic analysis of the complexity of the problem.
Abstract: This paper presents some new considerations for spectral techniques for classification of Boolean functions. These considerations incorporate discussions of the feasibility of extending this classification technique beyond n = 5. A new implementation is presented along with a basic analysis of the complexity of the problem. We also note a correction to results in this area that were reported in previous work.
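For concreteness, here is a small Python sketch of the spectral computation that underlies this line of work: the Walsh spectrum of a Boolean function via the fast Walsh-Hadamard transform. Classification then groups functions whose spectra coincide up to operations such as input permutation/negation and output negation; the example function and sign conventions are illustrative only.

```python
# Fast Walsh-Hadamard transform of a Boolean function's +1/-1 encoding,
# the spectrum on which spectral classification operates. Conventions
# (ordering, signs) vary across the literature; this is one common choice.

def walsh_spectrum(truth_table):
    """In-place butterfly FWHT of f encoded as +1/-1 values."""
    a = [1 - 2 * b for b in truth_table]   # 0/1 -> +1/-1
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a

# 3-input majority function, truth table in input order 000..111.
maj3 = [0, 0, 0, 1, 0, 1, 1, 1]
print(walsh_spectrum(maj3))   # [0, 4, 4, 0, 4, 0, 0, -4] in this convention
```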

Journal ArticleDOI
TL;DR: A dedicated hardware module is proposed that can reconfigure itself to either the OFDM Wireless LAN or the WCDMA standard; results show that the proposed hardware architecture utilizes fewer resources than the conventional Reconfigurable Receiver Architecture System.
Abstract: The Fourth Generation (4G) network is expected to serve mobile subscribers under dynamic network conditions and offer any type of service: anytime, anywhere, and anyhow. Two technologies that can deliver such services are Wideband Code Division Multiple Access (WCDMA) and Orthogonal Frequency Division Multiplexing (OFDM). The main contribution of this paper is a dedicated hardware module which can reconfigure itself to either the OFDM Wireless LAN or the WCDMA standard. In this paper, the Fast Fourier Transform (FFT) algorithm is implemented for the OFDM standard, and a rake receiver is implemented for the WCDMA standard. Initially, efficient implementations of these two algorithms are tested separately to identify the resources they utilize. Then a new hardware architecture, which configures itself to either of the two standards on demand, is proposed. This architecture efficiently shares the resources needed for the two standards. The proposed architecture is simulated using ModelSim SE v6.5 and mapped onto a Virtex-5 FPGA device (xc5vlx30ff324) using Xilinx ISE 9.2i, and the results are compared with the standard approach. These results show that the proposed hardware architecture utilizes fewer resources than the conventional Reconfigurable Receiver Architecture System.
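As background for the OFDM path, the sketch below is a textbook radix-2 decimation-in-time FFT in Python, showing the butterfly structure that an FFT datapath maps onto. It is only an algorithmic illustration, not the paper's shared hardware architecture.

```python
# Textbook radix-2 decimation-in-time FFT; illustrates the butterfly
# structure an OFDM FFT datapath implements. Not the paper's design.
import cmath

def fft(x):
    n = len(x)                      # n must be a power of two
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]       # twiddle
        out[k], out[k + n // 2] = even[k] + tw, even[k] - tw  # butterfly
    return out

print([round(abs(v), 3) for v in fft([1, 1, 1, 1, 0, 0, 0, 0])])
```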

Journal ArticleDOI
TL;DR: This paper presents novel techniques for reducing congestion and minimizing overflows, based on ripping up nets that go through congested areas and replacing them with congestion-aware topologies, and presents an algorithm for identifying efficient congestion-aware network coding topologies.
Abstract: With the advent of smaller devices, a significant increase in the density of on-chip components has made congestion and overflow critical issues in VLSI physical design automation. In this paper, we present novel techniques for reducing congestion and minimizing overflows. Our methods are based on ripping up nets that go through congested areas and replacing them with congestion-aware topologies. Our contributions can be summarized as follows. First, we present several efficient algorithms for finding congestion-aware Steiner trees, that is, trees that avoid congested areas of the chip. Next, we show that the novel technique of network coding can lead to further improvements in routability, reduction of congestion, and overflow avoidance. Finally, we present an algorithm for identifying efficient congestion-aware network coding topologies. We evaluate the performance of the proposed algorithms through extensive simulations.
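The sketch below illustrates the rip-up-and-reroute ingredient on a two-pin net: Dijkstra search on a routing grid where edge costs grow with congestion, so paths bend around crowded regions. A congestion-aware Steiner tree would join several such paths; the grid size, cost function, and capacities are invented, and this is not the authors' algorithm.

```python
# Congestion-aware two-pin routing sketch: Dijkstra on a grid with
# usage-dependent edge costs. Cost model and parameters are invented.
import heapq

W, H, CAP = 8, 8, 2
usage = {}   # edge -> number of nets currently using it

def cost(e):
    u = usage.get(e, 0)
    return 1.0 + (4.0 * u / CAP if u < CAP else 100.0)  # penalize overflow

def neighbors(v):
    x, y = v
    for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
        if 0 <= nx < W and 0 <= ny < H:
            yield (nx, ny)

def route(src, dst):
    """Dijkstra with congestion-dependent edge costs."""
    dist, prev, pq = {src: 0.0}, {}, [(0.0, src)]
    while pq:
        d, v = heapq.heappop(pq)
        if v == dst:
            break
        if d > dist[v]:
            continue
        for w in neighbors(v):
            e = tuple(sorted((v, w)))
            nd = d + cost(e)
            if nd < dist.get(w, float("inf")):
                dist[w], prev[w] = nd, v
                heapq.heappush(pq, (nd, w))
    path, v = [dst], dst
    while v != src:
        v = prev[v]
        path.append(v)
    return path[::-1]

path = route((0, 0), (7, 5))
for a, b in zip(path, path[1:]):        # commit this net's edge usage
    e = tuple(sorted((a, b)))
    usage[e] = usage.get(e, 0) + 1
print(path)
```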

Journal ArticleDOI
TL;DR: This paper considers buffer insertion under a fixed-outline constraint using the Less Flexibility First (LFF) algorithm, which is able to distinguish geometric differences between two floorplan candidates even if they have the same topological structure.
Abstract: IP cores are widely used in modern SoC designs. Hierarchical design has been employed to handle the growing design complexity, which stimulates the need for fixed-outline floorplanning. Meanwhile, buffer insertion is usually adopted to meet timing requirements. In this paper, buffer insertion is considered under a fixed-outline constraint using the Less Flexibility First (LFF) algorithm. Compared with Simulated Annealing (SA), our approach is able to distinguish geometric differences between two floorplan candidates, even if they have the same topological structure. This helps produce better results for buffer planning, since buffer insertion is quite sensitive to geometric changes. We also extend the previous LFF to a more robust version called Sliced-LFF to improve buffer planning. Moreover, a 2-staged LFF framework and a post-greedy procedure are introduced based on our net-classing strategy, finally achieving a significant improvement in the success rate of buffer insertion (40.7% and 37.1% at different feature sizes). Our approach is also much faster than SA, since it is deterministic and requires no iterations.

Journal ArticleDOI
Guanyi Sun, Shengnan Xu, Wang Xu, Dawei Wang, Eugene Tang, Yangdong Deng, Sun Chan
TL;DR: This work proposes a system-level simulation framework, the System Performance Simulation Implementation Mechanism, or SPSIM, which consists of an executable SoC model, a simulation tool chain, and a modeling methodology, and incorporates effective timing models that can achieve high accuracy after hardware-based calibration.
Abstract: Today's System-on-Chip (SoC) design is extremely challenging because it involves complicated design tradeoffs and heterogeneous design expertise. To explore the large solution space, system architects have to rely on system-level simulators to identify an optimized SoC architecture. In this paper, we propose a system-level simulation framework, the System Performance Simulation Implementation Mechanism, or SPSIM. Based on SystemC TLM2.0, the framework consists of an executable SoC model, a simulation tool chain, and a modeling methodology. Compared with the large body of existing research in this area, this work aims at delivering a high simulation throughput while, at the same time, guaranteeing high accuracy on real industrial applications. Integrating the leading TLM techniques, our simulator attains a simulation speed within a factor of 35 of real hardware execution on a set of real-world applications. SPSIM incorporates effective timing models, which can achieve high accuracy after hardware-based calibration. Experimental results on a set of mobile applications show that the difference between the simulated and measured timing performance is within 10%, which previously could only be attained by cycle-accurate models.
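To illustrate the transaction-level style SPSIM builds on (in Python rather than SystemC), the toy below advances simulated time by whole, latency-annotated transactions instead of cycle by cycle, which is the source of the throughput/accuracy tradeoff the abstract discusses. All classes and latencies here are invented.

```python
# Toy loosely timed TLM-style model: an initiator issues transactions
# that return timing annotations, and simulated time jumps by whole
# transactions. Illustrative only; not SPSIM or SystemC TLM2.0 itself.

class Memory:
    LATENCY_NS = 50                      # invented access latency

    def __init__(self):
        self.data = {}

    def transport(self, cmd, addr, value=None):
        """Process one transaction; return (result, delay in ns)."""
        if cmd == "write":
            self.data[addr] = value
            return None, self.LATENCY_NS
        return self.data.get(addr, 0), self.LATENCY_NS

class Cpu:
    def __init__(self, mem):
        self.mem, self.now_ns = mem, 0

    def run(self, program):
        for cmd, addr, value in program:
            result, delay = self.mem.transport(cmd, addr, value)
            self.now_ns += delay + 1     # 1 ns assumed per instruction
        return self.now_ns

cpu = Cpu(Memory())
total = cpu.run([("write", 0x10, 42), ("read", 0x10, None)])
print("simulated time:", total, "ns")
```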

Journal ArticleDOI
TL;DR: The constructed performance models have been used to implement a GA-based high-level topology sizing process and are accurate with respect to real circuit-level simulation results, fast to evaluate, and have good generalization ability.
Abstract: This paper presents a systematic methodology for the generation of high-level performance models for analog component blocks. The transistor sizes of the circuit-level implementations of the component blocks, along with a set of geometry constraints applied over them, define the sample space. A Halton sequence generator is used as the sampling algorithm. Performance data are generated by simulating each sampled circuit configuration through SPICE. A least squares support vector machine (LS-SVM) is used as the regression function. Optimal values of the model hyperparameters are determined through a grid-search-based technique and a genetic algorithm (GA)-based technique. The high-level models of the individual component blocks are combined analytically to construct the high-level model of a complete system. The constructed performance models have been used to implement a GA-based high-level topology sizing process. The advantages of the present methodology are that the constructed models are accurate with respect to real circuit-level simulation results, fast to evaluate, and have good generalization ability. In addition, the model construction time is low, and the construction process does not require any detailed knowledge of circuit design. The entire methodology has been demonstrated with a set of numerical results.
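A compact sketch of this modeling flow, under stated assumptions, appears below: Halton sampling of a two-dimensional design space, a stand-in analytic function in place of the SPICE simulation step, and an LS-SVM regressor with an RBF kernel fitted by solving its dual linear system. All names, kernels, and parameters are invented for illustration.

```python
# Halton sampling + LS-SVM regression sketch. The "performance"
# function stands in for SPICE; hyperparameters are invented, not the
# grid-search/GA-tuned values the paper uses.
import numpy as np

def halton(i, base):
    """i-th element of the van der Corput sequence in the given base."""
    f, r = 1.0, 0.0
    while i > 0:
        f /= base
        r += f * (i % base)
        i //= base
    return r

# 2-D sample space (e.g., two normalized transistor widths).
X = np.array([[halton(i, 2), halton(i, 3)] for i in range(1, 41)])
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2        # stand-in for SPICE data

def rbf(A, B, s=0.3):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * s * s))

# LS-SVM dual system: [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]
gamma, n = 100.0, len(X)
K = rbf(X, X)
A = np.block([[np.zeros((1, 1)), np.ones((1, n))],
              [np.ones((n, 1)),  K + np.eye(n) / gamma]])
sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
b, alpha = sol[0], sol[1:]

def predict(Xq):
    return rbf(Xq, X) @ alpha + b

print(predict(np.array([[0.5, 0.5]])))        # query the fitted model
```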

Journal ArticleDOI
TL;DR: Process simulation of the proposed VG RF SOI NLIGBT was carried out with TCAD to provide a virtually fabricated device structure, and an approximate latching current model was derived from the condition of minimum regenerative feedback coupling between the parasitic dual transistors.
Abstract: Building on previous achievements in improving the latch-up immunity of SOI LIGBTs, process simulation of our proposed VG RF SOI NLIGBT was carried out with TCAD to provide a virtually fabricated device structure. Then, an approximate latching current model was derived from the condition of minimum regenerative feedback coupling between the parasitic dual transistors. The model indicates that the device's latching current is a few orders of magnitude higher than previously reported values. Further verification through device simulation with TCAD showed that its weak snapback voltage in the off state is about 0.5-2.75 times higher than previously reported breakdown voltages, its breakdown voltage in the off state is about 19 V higher than its weak snapback voltage, and its latching current density in the on state is about 2-3 orders of magnitude higher than previously reported values at room temperature, due to the hole current bypass through the P+ contact in the P-well region. The device is therefore characterized by significantly improved latch-up immunity.