
Showing papers presented at "Asia and South Pacific Design Automation Conference in 2004"


Proceedings ArticleDOI
27 Jan 2004
TL;DR: In this article, a quick tutorial on the Design for Manufacturability problems of these process generations is presented, concentrating primarily on the limitations of optical lithography; the remainder of the talk covers the changes to physical design tools, such as placement and routing, that are needed to cope with these problems.
Abstract: The next few process generations (65 nm and below) will have serious lithography and manufacturing constraints since the feature size is shrinking much more rapidly than the wavelengths used in manufacturing the chips. This paper starts with a quick tutorial on the Design for Manufacturability problems of these process generations, concentrating primarily on the limitations of optical lithography. The remainder of the talk covers the changes to physical design tools, such as placement and routing, that are needed to cope with these problems.

108 citations


Proceedings ArticleDOI
27 Jan 2004
TL;DR: A system-level design flow and respective EDA support tools for low-power designs are presented; the architecture of an algorithm-level power estimation tool is described, together with use cases based on an EDA product commercially developed from the research results of several collaborative projects funded by the Commission of the European Community.
Abstract: Each year tens of billions of dollars are wasted by the microelectronics industry because of missed deadlines and delayed design projects. These delays are partially due to design iterations, many of which could have been avoided if the low-level ramifications of high-level design decisions, at the architecture and algorithm levels, had been known before the time-consuming and tedious RT- and lower-level implementation started. In this contribution we present a system-level design flow and respective EDA support tools for low-power designs. We analyze the requirements for such a design technology, which shifts more responsibility to the system architect. We exemplify this approach with a design flow for low-power systems. The architecture of an algorithm-level power estimation tool is presented, together with some use cases based on an EDA product which has been commercially developed from the research results of several collaborative projects funded by the Commission of the European Community.

81 citations


Proceedings ArticleDOI
27 Jan 2004
TL;DR: This paper first formulates a k-cofamily-based register binding algorithm targeting the multiplexer optimization problem, then further reduces the multiplexer width through an efficient port assignment algorithm, consistently achieving significantly better results.
Abstract: Data path connection elements, such as multiplexers, consume a significant amount of area on a VLSI chip, especially for FPGA designs. Multiplexer optimization is a difficult problem because both register binding and port assignment to reduce total multiplexer connectivity during high-level synthesis are NP-complete problems. In this paper, we first formulate a k-cofamily-based register binding algorithm targeting the multiplexer optimization problem. We then further reduce the multiplexer width through an efficient port assignment algorithm. Experimental results show that we are 44% better overall than the left-edge register binding algorithm on the total usage of multiplexer inputs and 7% better than a bipartite graph-based algorithm. For large designs, we consistently achieve significantly better results. After technology mapping, placement and routing for an FPGA architecture, our approach shows a considerable positive impact on chip area, delay and power consumption.
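The left-edge binding used as the baseline above greedily packs non-overlapping variable lifetimes into the same register. A minimal Python sketch of that baseline, with made-up lifetime intervals (the paper's own k-cofamily algorithm is not reproduced here):

```python
def left_edge(lifetimes):
    """Classical left-edge binding: sort variable lifetimes by start
    time and greedily pack non-overlapping ones into one register.
    Intervals are half-open [start, end)."""
    order = sorted(lifetimes.items(), key=lambda kv: kv[1][0])
    registers = []                      # each: (end of last interval, [vars])
    for var, (s, e) in order:
        for i, (last_end, vars_) in enumerate(registers):
            if s >= last_end:           # fits after the register's last variable
                registers[i] = (e, vars_ + [var])
                break
        else:
            registers.append((e, [var]))
    return [vars_ for _, vars_ in registers]

# variable -> (birth cycle, death cycle); four variables fit in two registers
lifetimes = {"a": (0, 3), "b": (1, 4), "c": (3, 6), "d": (4, 7)}
binding = left_edge(lifetimes)
```

This minimizes register count, but it ignores which functional-unit ports the registers feed, which is why multiplexer-aware binding can do better on interconnect.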

75 citations


Proceedings ArticleDOI
27 Jan 2004
TL;DR: A method for translating Boolean formulas from formal verification of microprocessors to CNF by identifying gates with fanout count of 1, and merging them with their fanout gate to generate a single set of equivalent CNF clauses, which eliminates the intermediate CNF variable for the output of the first gate.
Abstract: We present a method for translating Boolean formulas to CNF by identifying gates with a fanout count of 1 and merging them with their fanout gate to generate a single set of equivalent CNF clauses. This eliminates the intermediate CNF variable for the output of the first gate and reduces the number of CNF clauses, compared to the conventional translation to CNF, where each gate is assigned an output variable and is represented with a separate set of CNF clauses. Chains of nested ITE operators, where each ITE is used only as the else-argument of the next ITE, are similarly merged and represented with a single set of clauses without intermediate variables. This method was applied to Boolean formulas from formal verification of microprocessors. The formulas require up to hundreds of thousands of variables and millions of clauses when translated to CNF with the conventional approach. The best translation reduced the CNF variables by up to 2×, the SAT-solver decisions by up to 5×, and the SAT-solver conflicts by up to 6×, and accelerated the SAT checking by up to 7.6× for unsatisfiable formulas and 136× for satisfiable ones.
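The gate-merging idea can be seen on a tiny netlist: an AND gate whose second input is a fanout-1 OR gate. The brute-force sketch below (an illustration, not the paper's implementation) compares the conventional per-gate encoding against the merged one and checks that they agree once the intermediate variable is projected away:

```python
from itertools import product

# Encoding of h = a AND (b OR c).  Variables: 1=a, 2=b, 3=c, 4=g, 5=h;
# a negative number is a negated literal.
# Conventional translation: the OR gate gets its own variable g (6 clauses).
conventional = [
    (-2, 4), (-3, 4), (2, 3, -4),      # g <-> (b OR c)
    (-5, 1), (-5, 4), (-1, -4, 5),     # h <-> (a AND g)
]
# Merged translation: the fanout-1 OR is folded into the AND gate's
# clauses, eliminating g entirely (4 clauses).
merged = [
    (-5, 1), (-5, 2, 3),               # h -> a,  h -> (b OR c)
    (-1, -2, 5), (-1, -3, 5),          # a AND (b OR c) -> h
]

def models(clauses, nvars, keep):
    """All satisfying assignments, projected onto the variables in keep."""
    out = set()
    for bits in product([False, True], repeat=nvars):
        def val(lit):
            v = bits[abs(lit) - 1]
            return v if lit > 0 else not v
        if all(any(val(l) for l in c) for c in clauses):
            out.add(tuple(bits[v - 1] for v in keep))
    return out

# Both encodings agree on (a, b, c, h) once g is projected away.
same = models(conventional, 5, [1, 2, 3, 5]) == models(merged, 5, [1, 2, 3, 5])
```

The merged form has fewer clauses and one variable fewer, which is the per-gate saving the paper scales up to formulas with millions of clauses.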

75 citations


Proceedings ArticleDOI
27 Jan 2004
TL;DR: It is shown that satisfiability captures significant problem characteristics and offers different trade-offs; it also provides new opportunities for satisfiability-based diagnosis tools and diagnosis-specific satisfiability algorithms.
Abstract: Recent advances in Boolean satisfiability have made it an attractive engine for solving many digital VLSI design problems such as verification, model checking, optimization and test generation. Fault diagnosis and logic debugging have not been addressed by existing satisfiability-based solutions. This paper attempts to bridge this gap by proposing a satisfiability-based solution to these problems. The proposed formulation is intuitive and easy to implement. It shows that satisfiability captures significant problem characteristics and offers different trade-offs. It also provides new opportunities for satisfiability-based diagnosis tools and diagnosis-specific satisfiability algorithms. Theory and experiments validate the claims and demonstrate its potential.

72 citations


Proceedings ArticleDOI
27 Jan 2004
TL;DR: A SAT-based ATPG tool targeting a path-oriented transition fault model, utilizing an efficient false-path pruning technique to identify the longest sensitizable path through each fault site; it can be orders of magnitude faster than a commercial ATPG tool.
Abstract: This paper presents a SAT-based ATPG tool targeting a path-oriented transition fault model. Under this fault model, a transition fault is detected through the longest sensitizable path. In the ATPG process, we utilize an efficient false-path pruning technique to identify the longest sensitizable path through each fault site. We demonstrate that our new SAT-based ATPG can be orders of magnitude faster than a commercial ATPG tool. To demonstrate the quality of the tests generated by our approach, we compare its resulting test set to three other test sets: a single-detection transition fault test set, a multiple-detection transition fault test set, and a traditional critical path test set added to the single-detection set. The superiority of our approach is demonstrated through various experiments based on statistical delay simulation and defect injection using benchmark circuits.

72 citations


Proceedings ArticleDOI
27 Jan 2004
TL;DR: This paper proposes a hardware architecture for the standard HMAC function that supports both SHA-1 and MD5 algorithms and automatically generates the padding words and reuses the key for consecutive HMAC jobs that use the same key.
Abstract: Cryptographic algorithms are prevalent and important in digital communications and storage; e.g., both the SHA-1 and MD5 algorithms are widely used hash functions in IPSec and SSL for checking data integrity. In this paper, we propose a hardware architecture for the standard HMAC function that supports both SHA-1 and MD5. Our HMAC design automatically generates the padding words and reuses the key for consecutive HMAC jobs that use the same key. We have also implemented the HMAC design in silicon. Compared with existing designs, our HMAC processor has a 12.5% lower hardware cost, achieved by sharing the SHA-1 and MD5 circuitry, at a small performance penalty.
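For reference, the keyed construction the hardware implements is the standard HMAC of RFC 2104; in software it is one call per hash, and reusing a cached key across consecutive messages (what the processor does in hardware) simply means invoking HMAC again with the same key. The key and message below are placeholders:

```python
import hashlib
import hmac

key = b"shared-secret"                 # placeholder key
msg = b"payload to authenticate"       # placeholder message

# Same HMAC construction over the two hash cores the chip supports.
tag_sha1 = hmac.new(key, msg, hashlib.sha1).hexdigest()   # 160-bit tag
tag_md5 = hmac.new(key, msg, hashlib.md5).hexdigest()     # 128-bit tag
```

The two digests differ in width (20 vs. 16 bytes), but the surrounding inner/outer padding logic is identical, which is what makes sharing the circuitry between SHA-1 and MD5 attractive.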

60 citations


Proceedings ArticleDOI
27 Jan 2004
TL;DR: This work presents an adaptive and incremental re-compression technique to maintain efficiency under frequent partial frame buffer updates, based on a run-length encoding for on-the-fly compression, with a negligible burden in resources and time.
Abstract: Despite the limited power available in a battery-operated hand-held device, a display system must still have sufficient resolution and color depth to deliver the necessary information. We introduce methodologies for frame buffer compression that efficiently reduce the power consumption of display systems and thus distinctly extend battery life for hand-held applications. Our algorithm is based on run-length encoding for on-the-fly compression, with a negligible burden in resources and time. We present an adaptive and incremental re-compression technique to maintain efficiency under frequent partial frame buffer updates. We reduce frame buffer activity by about 30% to 90% on average for various hand-held applications. We have implemented an LCD controller with frame buffer compression occupying 1,026 slices and 960 flip-flops in a Xilinx Spartan-II FPGA, which has an equivalent gate count of 65,000 gates. It consumes 30mW more power and 10% additional silicon area than an LCD controller without frame buffer compression, but reduces the power consumption of the frame buffer memory by 400mW.
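The core of the scheme is plain run-length encoding of pixel words. A software sketch of the idea (the hardware does this on the fly per scanline; the scanline below is invented):

```python
def rle_encode(pixels):
    """Run-length encode a scanline of pixel words as (value, count) pairs."""
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1][1] += 1
        else:
            runs.append([p, 1])
    return [tuple(r) for r in runs]

def rle_decode(runs):
    """Expand (value, count) pairs back into the original scanline."""
    return [v for v, n in runs for _ in range(n)]

# A mostly-white scanline with a short colored region, typical of GUI content.
line = [0xFFFF] * 120 + [0x001F] * 8 + [0xFFFF] * 120
runs = rle_encode(line)
```

Here 248 pixel words collapse to 3 runs; refreshing the display from the compressed buffer touches far fewer memory words, which is where the frame-buffer power saving comes from.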

56 citations


Proceedings ArticleDOI
27 Jan 2004
TL;DR: This paper investigates a 3D die-stacking based VLSI integration strategy, so-called 2.5D integration, which can potentially overcome many problems hindering the development of monolithic System-on-Chip (SoC).
Abstract: This paper investigates a 3D die-stacking based VLSI integration strategy, so-called 2.5D integration, which can potentially overcome many problems hindering the development of monolithic System-on-Chip (SoC). In this paper, we review available fabrication technologies and testing solutions for the new integration strategy. We also propose a design-driven system implementation scheme for this new integration strategy. We are developing a layout synthesis framework to analyze typical "what if" questions and resolve major physical attributes of a 2.5D system according to the design specification and constraints.

53 citations


Proceedings ArticleDOI
27 Jan 2004
TL;DR: Off-line and on-line dynamic voltage scaling algorithms are proposed that build on existing DVS algorithms and utilize the execution behavior of a scheduling server for aperiodic tasks.
Abstract: We describe dynamic voltage scaling (DVS) algorithms for real-time systems with both periodic and aperiodic tasks. Although many DVS algorithms have been developed for real-time systems with periodic tasks, none of them can be used for systems with both periodic and aperiodic tasks because of the arbitrary temporal behavior of aperiodic tasks. We propose an off-line DVS algorithm and on-line DVS algorithms that are based on existing DVS algorithms. The proposed algorithms utilize the execution behavior of a scheduling server for aperiodic tasks. Experimental results show that the proposed algorithms reduce the energy consumption by 12% and 32% under the RM scheduling policy and the EDF scheduling policy, respectively.

52 citations


Proceedings ArticleDOI
27 Jan 2004
TL;DR: In this article, a deterministic placement method for standard cells is proposed to minimize total power consumption and lead to a smooth temperature distribution over the die, where the overall weighted net length is minimized.
Abstract: This paper describes a deterministic placement method for standard cells which minimizes total power consumption and leads to a smooth temperature distribution over the die. It is based on the Quadratic Placement formulation, where the overall weighted net length is minimized. Two innovations are introduced to achieve the above goals. First, overall power consumption is minimized by shortening nets with a high power dissipation. Second, cells are spread over the placement area such that the die temperature profile inside the package is flattened. Experimental results show a significant reduction of the maximum temperature on the die and a reduction of total power consumption.

Proceedings ArticleDOI
27 Jan 2004
TL;DR: A novel topological floorplan representation, named 3D-subTCG (3-Dimensional sub-Transitive Closure Graph) is used to deal with the 3-dimensional (temporal) floorplanning/placement problem, arising from dynamically reconfigurable FPGAs.
Abstract: By improving logic capacity through time-sharing, dynamically reconfigurable FPGAs can handle designs of high complexity and functionality. In this paper, we use a novel topological floorplan representation, named 3D-subTCG (3-Dimensional sub-Transitive Closure Graph), to deal with the 3-dimensional (temporal) floorplanning/placement problem arising from dynamically reconfigurable FPGAs. The 3D-subTCG uses three transitive closure graphs to model the temporal and spatial relations between modules. We derive the feasibility conditions for the precedence constraints induced by the execution of the dynamically reconfigurable FPGAs. Because the geometric relationship is transparent to the 3D-subTCG and its induced operations, we can easily detect any violation of temporal precedence constraints on the 3D-subTCG. We also derive important properties of the 3D-subTCG to reduce the solution space and shorten the running time for 3D (temporal) floorplanning/placement. Experimental results show that our 3D-subTCG based algorithm is very effective and efficient.

Proceedings ArticleDOI
27 Jan 2004
TL;DR: A power grid analyzer that combines a divide-and-conquer strategy with a random-walk engine, first described and then extended to multi-level and "virtual-layer" hierarchy, shows speedups over the generic random- walk method and is more robust in solving various types of industrial circuits.
Abstract: This paper presents a power grid analyzer that combines a divide-and-conquer strategy with a random-walk engine. A single-level hierarchical method is first described and then extended to multi-level and "virtual-layer" hierarchy. Experimental results show that these algorithms not only achieve speedups over the generic random-walk method, but also are more robust in solving various types of industrial circuits. For example, a 71K-node circuit is solved in 4.16 seconds, showing a more than 4 times speedup over the generic method; a 348K-node wire-bond power grid, for which the performance of the generic method degrades, is solved in 75.88 seconds.
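The generic random-walk method that this analyzer builds on estimates a node voltage by repeated walks that pay a small "motel cost" at each node visited and collect the pad voltage where they terminate. A minimal sketch on a hypothetical 1-ohm resistor chain with one current load (the exact answer at the middle node is 0.9 V); the hierarchy and virtual layers of the paper are not modeled here:

```python
import random

# Chain: pad(1.0V) - n1 - n2 - n3 - pad(1.0V), all resistors 1 ohm,
# with a 0.1 A current load drawn at n2.
neighbors = {                       # node -> [(neighbor, conductance in S)]
    1: [('pad', 1.0), (2, 1.0)],
    2: [(1, 1.0), (3, 1.0)],
    3: [(2, 1.0), ('pad', 1.0)],
}
load = {1: 0.0, 2: 0.1, 3: 0.0}    # current drawn at each node (A)
V_PAD = 1.0

def walk(start, rng):
    """One walk: pay -I/G at every node visited, end with the pad voltage."""
    node, gain = start, 0.0
    while node != 'pad':
        G = sum(g for _, g in neighbors[node])
        gain -= load[node] / G                  # motel cost at this node
        r, acc = rng.random() * G, 0.0
        for nxt, g in neighbors[node]:          # step with probability g/G
            acc += g
            if r <= acc:
                node = nxt
                break
    return gain + V_PAD

rng = random.Random(7)
v2 = sum(walk(2, rng) for _ in range(20000)) / 20000   # estimate of V(n2)
```

The estimate converges to the nodal-analysis solution (0.9 V here); the paper's contribution is the divide-and-conquer hierarchy wrapped around this engine to make it robust on large industrial grids.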

Proceedings ArticleDOI
27 Jan 2004
TL;DR: An efficient method to budget on-chip decoupling capacitors (decaps) to optimize power delivery networks in an area efficient way and an efficient gradient computation method and a novel equivalent circuit modeling technique to speed up the optimization process are presented.
Abstract: In this paper, we present an efficient method to budget on-chip decoupling capacitors (decaps) to optimize power delivery networks in an area-efficient way. Our algorithm is based on an efficient gradient-based non-linear programming method for searching the solution. Our contributions are an efficient gradient computation method (time-domain merged adjoint network) and a novel equivalent circuit modeling technique to speed up the optimization process. Experimental results demonstrate that the algorithm is capable of efficiently optimizing very large scale P/G networks.

Proceedings ArticleDOI
27 Jan 2004
TL;DR: A method of automatically generating embedded software from system specification written in SLDL is presented and the effectiveness of the proposed method is demonstrated by a tool which can generate efficient ANSI C code from system models written inSLDL.
Abstract: To meet the challenge of increasing design complexity, designers are turning to system-level design languages (SLDLs) to model systems at a higher level of abstraction. This paper presents a method of automatically generating embedded software from a system specification written in an SLDL. Several refinement steps and intermediate models are introduced in our software generation flow. We demonstrate the effectiveness of the proposed method with a tool which can generate efficient ANSI C code from system models written in an SLDL.

Proceedings ArticleDOI
27 Jan 2004
TL;DR: Experimental results show that the proposed methodology to perform early design stage validation of hardware/software (HW/SW) systems using a HW/SW interface simulation model is easy to use and efficient while providing fast simulation and accuracy.
Abstract: In this paper, we propose a methodology to perform early design stage validation of hardware/software (HW/SW) systems using a HW/SW interface simulation model. Given a SW application described at the OS abstraction level and a HW Platform described at an arbitrary abstraction level, we aim at providing the adaptation layer, i.e. simulation model of the HW/SW interface, which will enable the timed HW/SW cosimulation of the entire system at an early design stage before the system design is completed. Experimental results show that our approach is easy to use and efficient while providing fast simulation (up to 3 orders of magnitude faster than a HW/SW cosimulation with instruction set simulator, ISS) and accuracy (86% compared with a HW/SW cosimulation with ISS).

Proceedings ArticleDOI
Xiang Lu1, Zhuo Li1, Wangqi Qiu1, Duncan M. Walker1, Weiping Shi1 
27 Jan 2004
TL;DR: In this paper, a path-pruning technique is proposed to discard paths that are not longest, resulting in a significant reduction in the number of paths compared with the previous best method; the technique can be applied to any process variation as long as its impact on delay is linear.
Abstract: Under manufacturing process variation, a path through a fault site is called longest for delay test if there exists a process condition under which the path has the maximum delay among all paths through that fault site. There are often multiple longest paths for each fault site in the circuit, due to different process conditions. To detect the smallest delay fault, it is necessary to test all longest paths through the fault site. However, previous methods are either inefficient or their results include too many paths that are not longest. This paper presents an efficient method to generate the longest path set for delay test under process variation. To capture both structural and systematic process correlation, we use linear delay functions to express path delays under process variation. A novel path-pruning technique is proposed to discard paths that are not longest, resulting in a significant reduction in the number of paths compared with the previous best method. The new method can be applied to any process variation as long as its impact on delay is linear.
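Under the linear delay model, one path can be discarded when another path through the same fault site is at least as slow at every process corner. A small sketch of that dominance test, assuming each parameter varies in [-1, 1] (the coefficients below are invented, and the paper's pruning is integrated into path generation rather than applied pairwise like this):

```python
def dominated(p, q):
    """True if path q is at least as slow as path p at every corner.
    Delays are linear in parameters x_i in [-1, 1]:
        d = a0 + sum(a_i * x_i),
    so min over x of (d_q - d_p) = (q0 - p0) - sum(|q_i - p_i|)."""
    return (q[0] - p[0]) - sum(abs(qi - pi)
                               for qi, pi in zip(q[1:], p[1:])) >= 0

def prune(paths):
    """Keep only paths that can be the longest under some process condition."""
    return [p for i, p in enumerate(paths)
            if not any(i != j and dominated(p, q)
                       for j, q in enumerate(paths))]

# (a0, a1, a2): nominal delay plus sensitivities to two process parameters.
paths = [
    (10.0, 1.0, 0.5),   # slower path exists at every corner: prunable
    (12.0, 1.2, 0.6),
    (11.5, -2.0, 0.6),  # becomes longest when x1 is low: must be kept
]
survivors = prune(paths)
```

The first path is dominated by the second everywhere, so only two candidate longest paths remain; the third survives because its opposite sensitivity to x1 makes it the slowest at some corners.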

Proceedings ArticleDOI
27 Jan 2004
TL;DR: This paper settles the open problem of whether it is possible to optimize slack and at the same time minimize buffer usage by showing that, for arbitrary integer cost functions, the problem is NP-complete.
Abstract: As gate delays decrease faster than wire delays for each technology generation, buffer insertion becomes a popular method to reduce the interconnect delay. Several modern buffer insertion algorithms (e.g., [7, 6, 15]) are based on van Ginneken's dynamic programming paradigm [14]. However, van Ginneken's original algorithm does not control buffering resources and tends to over-buffer, thereby wasting area and power. It has been a major open problem whether it is possible to optimize slack and at the same time minimize buffer usage. This paper settles this open problem by showing that for arbitrary integer cost functions, the problem is NP-complete. We also extend the pre-buffer slack technique [12] to minimize the buffer cost. This technique can significantly reduce the running time and memory of the buffer cost minimization problem. The experimental results show that our algorithm speeds up the running time by up to 17 times and reduces memory to 1/30 of that of the best-known traditional algorithm. Finally, we show how to efficiently deal with multiway merge in buffer insertion.

Proceedings ArticleDOI
27 Jan 2004
TL;DR: A mathematical framework for such a rate analysis for streaming applications is presented, and its feasibility is illustrated through a detailed case study of an MPEG-2 decoder application.
Abstract: While mapping a streaming (such as multimedia or network packet processing) application onto a specified architecture, an important issue is to determine the input stream rates that can be supported by the architecture for any given mapping. This is subject to typical constraints such as that on-chip buffers should not overflow and that specified playout buffers (which feed audio or video devices) should not underflow, so that the quality of the audio/video output is maintained. The main difficulty in this problem arises from the high variability in execution times of stream processing algorithms, coupled with the bursty nature of the streams to be processed. In this paper we present a mathematical framework for such a rate analysis for streaming applications, and illustrate its feasibility through a detailed case study of an MPEG-2 decoder application. When integrated into a tool for automated design-space exploration, such an analysis can be used for fast performance evaluation of different stream processing architectures.

Proceedings ArticleDOI
27 Jan 2004
TL;DR: This paper shows that the application of the object-oriented programming paradigm together with the de facto industry standard SystemC is feasible and presents the integration of SystemC into a continuous object- oriented design flow.
Abstract: The constantly increasing complexity of today's systems demands specifications on highest levels of abstraction. In addition to a transition towards the system-level more elaborate techniques are necessary to close a growing productivity gap. Our solution to this problem is the application of the object-oriented programming paradigm together with the de facto industry standard SystemC. In this paper we show that this approach is feasible and present the integration of SystemC into a continuous object-oriented design flow. The design flow includes modeling with UML, hardware/software partitioning and synthesis of object-oriented specifications. We support our claim by results from a case study.

Proceedings ArticleDOI
27 Jan 2004
TL;DR: A comprehensive set of algorithms that can be applied at design time in order to maximally exploit scratch pad memories (SPMs) is described, showing that both the energy consumption and the computed worst-case execution time (WCET) can be reduced by up to 80% and 48%, respectively, by establishing a strong link between the memory architecture and the compiler.
Abstract: The design of future high-performance embedded systems is hampered by two problems: First, the required hardware needs more energy than is available from batteries. Second, current cache-based approaches for bridging the increasing speed gap between processors and memories cannot guarantee predictable real-time behavior. A contribution to solving both problems is made in this paper, which describes a comprehensive set of algorithms that can be applied at design time in order to maximally exploit scratch pad memories (SPMs). We show that both the energy consumption and the computed worst-case execution time (WCET) can be reduced by up to 80% and 48%, respectively, by establishing a strong link between the memory architecture and the compiler.
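A common core formulation behind static SPM allocation is a 0/1 knapsack over memory objects: each function or data object has a size and an estimated energy saving if placed in the scratch pad, and the compiler picks the subset that fits. This is a sketch of that formulation only, not the paper's full algorithm set; the object names and numbers are invented:

```python
def spm_allocate(objects, capacity):
    """0/1 knapsack over (name, size, energy_saving) memory objects:
    choose the subset that fits in the scratch pad and saves the most
    energy.  Dynamic program over capacity in bytes."""
    best = [0] * (capacity + 1)
    choice = [set() for _ in range(capacity + 1)]
    for name, size, saving in objects:
        for c in range(capacity, size - 1, -1):   # descending: use each once
            if best[c - size] + saving > best[c]:
                best[c] = best[c - size] + saving
                choice[c] = choice[c - size] | {name}
    return best[capacity], choice[capacity]

# (object, size in bytes, estimated energy saved if placed in the SPM)
objects = [("main_loop", 512, 90), ("fir_coeffs", 256, 40),
           ("stack", 512, 50), ("init_code", 768, 20)]
saving, placed = spm_allocate(objects, 1024)
```

Because the allocation is fixed at compile time, the same placement also tightens the WCET bound: every access to a placed object is a guaranteed SPM hit, unlike a cache access.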

Proceedings ArticleDOI
27 Jan 2004
TL;DR: This paper proposes a novel and efficient method for optimizing worst-case static IR-drop in hierarchical, uniform power distribution networks and presents a novel approach to optimize the two-level mesh topology.
Abstract: Robust power distribution within available routing area resources is critical to chip performance and reliability. In this paper, we propose a novel and efficient method for optimizing worst-case static IR-drop in hierarchical, uniform power distribution networks. Our results can be used for planning of hierarchical power distribution in early design stages, so that for a fixed total routing area the worst-case IR-drop on the power mesh is minimal, or for a given IR-drop tolerance the power mesh achieves the IR-drop specification with minimal routing area. Our contributions are as follows. (1) We derive a closed-form approximation for the worst-case IR-drop on a single-level power mesh. The formula shows that for a given total routing area, the worst-case IR-drop increases logarithmically with the number of metal lines on the mesh. (2) Based on the previous analysis and empirical studies, we propose a model for the worst-case static IR-drop on a two-level power mesh, and obtain an accurate empirical expression. (3) Using this expression, we present a novel approach to optimize the two-level mesh topology. (4) We extend our study to three-level power meshes, and find that a third, middle-level mesh helps to reduce IR-drop by only a relatively small extent (about 5%, according to our experiments).

Proceedings ArticleDOI
27 Jan 2004
TL;DR: This paper defines the simplified RC circuit model of a hybrid clock mesh/tree structure and proposes a hybrid multi-level mesh and tree structure for global clock distribution and shows that by adding a mesh to the bottom-level leaves of an H-tree, the clock skew can be reduced.
Abstract: In this paper, we investigate the effect of multi-level networks on clock skew. We first define the simplified RC circuit model of a hybrid clock mesh/tree structure. The skew-reduction effect of the shunt segments contributed by the mesh is derived analytically from the simplified model. The result indicates that the skew decreases proportionally to the exponential of -R1/R, where R1 is the driving resistance of a leaf node in the clock tree and R is the resistance of a mesh segment. Based on our analysis, we propose a hybrid multi-level mesh and tree structure for global clock distribution. A simple optimization scheme is adopted to optimize the routing resource distribution of the multi-level mesh. Experimental results show that by adding a mesh to the bottom-level leaves of an H-tree, the clock skew can be reduced from 29.2 ps to 8.7 ps, and that multi-level networks with the same total routing area can further reduce the clock skew by another 30%. We also discuss the inductive effect of the mesh in the appendix. When the clock frequency is below 4 GHz, our RC model remains valid for clock meshes with grounded shielding or differential signaling.

Proceedings ArticleDOI
27 Jan 2004
TL;DR: An algorithm with evolutionary search is developed to efficiently handle the fixed-die floorplanning problem, achieving a near-100% success rate on average.
Abstract: In this paper, we address the practical problem of fixed-outline VLSI floorplanning with the objective of minimizing area. This problem has been shown to be significantly more difficult than the well-researched floorplanning problems without a fixed-outline regime [1]. We develop an algorithm with evolutionary search to efficiently handle the fixed-die floorplanning problem, achieving a near-100% success rate on average.

Proceedings ArticleDOI
27 Jan 2004
TL;DR: An efficient technique for solving a Boolean matching problem in cell-library binding, where the number of cells in the library is large, and is more than two orders of magnitude faster than the Hinsberger-Kolla algorithm---the fastest Boolean matching algorithm for large libraries.
Abstract: This paper presents an efficient technique for solving a Boolean matching problem in cell-library binding, where the number of cells in the library is large. As a basis of the Boolean matching, we use the notion of an NP-representative (NPR); two functions have the same NPR if one can be obtained from the other by a permutation and/or complementation(s) of the variables. By using a table look-up and a tree-based breadth-first search strategy, our method quickly computes the NPR for a given function. Boolean matching of the given function against the whole library is determined by checking for the presence of its NPR in a hash table, which stores the NPRs for all the library functions and their complements. The effectiveness of our method is demonstrated through experimental results, which show that it is more than two orders of magnitude faster than the Hinsberger-Kolla algorithm---the fastest Boolean matching algorithm for large libraries.
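The NPR notion can be made concrete for small functions by brute force: take the lexicographically smallest truth table reachable by permuting and/or complementing inputs. The sketch below is exponential in the input count and exists only to illustrate the invariant; the paper's table look-ups and breadth-first search are what make the computation practical for a whole library:

```python
from itertools import permutations, product

def truth_table(f, n):
    """Truth table of f as a tuple, inputs enumerated in binary order."""
    return tuple(f(*bits) for bits in product([0, 1], repeat=n))

def np_representative(tt, n):
    """Lexicographically smallest truth table over all input
    permutations and complementations (brute-force NPR)."""
    best = None
    for perm in permutations(range(n)):
        for flips in product([0, 1], repeat=n):
            cand = []
            for bits in product([0, 1], repeat=n):
                src = [bits[perm[i]] ^ flips[i] for i in range(n)]
                idx = 0
                for b in src:          # index of transformed inputs in tt
                    idx = idx * 2 + b
                cand.append(tt[idx])
            cand = tuple(cand)
            if best is None or cand < best:
                best = cand
    return best

f_and = lambda a, b: a & b
g = lambda a, b: (1 - a) & (1 - b)   # AND with both inputs complemented
h = lambda a, b: a | b               # OR is not NP-equivalent to AND

same_rep = (np_representative(truth_table(f_and, 2), 2)
            == np_representative(truth_table(g, 2), 2))
distinct = (np_representative(truth_table(f_and, 2), 2)
            != np_representative(truth_table(h, 2), 2))
```

Two NP-equivalent functions land on the same representative, so library matching reduces to a single hash-table probe on the NPR, exactly as the abstract describes.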

Proceedings ArticleDOI
27 Jan 2004
TL;DR: In this paper, a performance comparison of two PLLs for clock generation using a ring oscillator based VCO and an LC oscillator-based VCO is presented, based on a qualitative evaluation in an analytic way.
Abstract: This paper describes a performance comparison of two PLLs for clock generation, one using a ring oscillator based VCO and the other an LC oscillator based VCO. We fabricate two 1.6GHz PLLs in a 0.18 μm digital CMOS process and compare their performances based on the measurement results. We also predict major future PLL performance metrics, such as jitter, power consumption and chip area, based on a qualitative, analytic evaluation.

Proceedings ArticleDOI
27 Jan 2004
TL;DR: This work addresses a problem of reusing and customizing soft IP components by introducing a concept of design process - a series of common, well-defined and well-proven domain-specific actions and methods performed to achieve a certain design aim.
Abstract: We address a problem of reusing and customizing soft IP components by introducing a concept of design process - a series of common, well-defined and well-proven domain-specific actions and methods performed to achieve a certain design aim. We especially examine system-level design processes that are aimed at designing a hardware system by integrating soft IPs at a high level of abstraction. We combine this concept with object-oriented hardware design using UML and metaprogramming paradigm for describing generation of domain code.

Proceedings ArticleDOI
27 Jan 2004
TL;DR: Experimental results show that this analysis method can accurately estimate the amount and frequencies of periodic and random jitter of a multi-gigahertz signal.
Abstract: In this paper, we propose a method for extracting the spectral information of a multi-gigahertz jittery signal. This method may utilize existing on-chip single-shot period measurement techniques to measure the multi-gigahertz signal periods for spectral analysis. This method does not require an external sampling clock, nor any additional measurement beyond existing techniques. Experimental results show that this analysis method can accurately estimate the amount and frequencies of periodic and random jitter of a multi-gigahertz signal.

Proceedings ArticleDOI
27 Jan 2004
TL;DR: In this article, the authors proposed a maximum independent set based slack assignment algorithm to select a maximum number of gates working at high-Vth such that the total power gain is maximized.
Abstract: In this paper, we study the reduction of static power consumption by dual threshold voltage assignment. Our goal is, under a given timing constraint, to select a maximum number of gates working at high Vth such that the total power gain is maximized. We propose a maximum-independent-set based slack assignment algorithm to select gates for high Vth. The results show that our assignment algorithm can achieve about 68% improvement compared to results without using dual Vth.

Proceedings ArticleDOI
Kazutoshi Wakabayashi1
27 Jan 2004
TL;DR: The effects of SoC design with C language-based behavioral synthesis and verification are presented, with statistical analysis of several industrial designs; several merits of C-based design are examined using actual chip design results.
Abstract: This paper presents the effects of SoC design with C language-based behavioral synthesis and verification. Initially, the proposed C-based design environment for a large SoC consisting of hardware and embedded software is explained. Next, the increase in design productivity from shifting from RTL to behavioral design is discussed with statistical analysis of several industrial designs. Then, several merits of C-based design are examined using actual chip design results.