
Showing papers in "IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems in 1998"


Journal Article•DOI•
TL;DR: An algorithm for generating provably passive reduced-order N-port models for linear RLC interconnect circuits is described; in addition to macromodel stability, macromodel passivity is needed to guarantee the overall circuit stability.
Abstract: This paper describes an algorithm for generating provably passive reduced-order N-port models for RLC interconnect circuits. It is demonstrated that, in addition to macromodel stability, macromodel passivity is needed to guarantee the overall circuit stability once the active and passive driver/load models are connected. The approach proposed here, PRIMA, is a general method for obtaining passive reduced-order macromodels for linear RLC systems. In this paper, PRIMA is demonstrated in terms of a simple implementation which extends the block Arnoldi technique to include guaranteed passivity while providing superior accuracy. While the same passivity extension is not possible for MPVL, comparable accuracy in the frequency domain for all examples is observed.
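
For readers who want the shape of the computation, here is a minimal numpy sketch of a block Arnoldi projection followed by a congruence transform, the basic structure PRIMA builds on. The matrix names (G, C, B for the MNA conductance, susceptance, and port matrices) and the block Gram-Schmidt loop are illustrative assumptions, not the paper's implementation.

    import numpy as np

    def prima_style_reduce(G, C, B, k):
        """Reduce an MNA system (G + sC)x = Bu by projecting onto k blocks
        of the Krylov subspace of A = G^{-1}C and R = G^{-1}B.
        Sketch only: assumes G nonsingular, no deflation handling."""
        A = np.linalg.solve(G, C)            # A = G^{-1} C
        R = np.linalg.solve(G, B)            # R = G^{-1} B
        Q, _ = np.linalg.qr(R)               # orthonormal first block
        V = [Q]
        for _ in range(k - 1):
            W = A @ V[-1]                    # next Krylov block
            for Vj in V:                     # orthogonalize against earlier blocks
                W = W - Vj @ (Vj.T @ W)
            Q, _ = np.linalg.qr(W)
            V.append(Q)
        X = np.hstack(V)                     # projection matrix
        # The congruence transform X^T (.) X is what preserves passivity
        # when G and C have the RLC structure described in the paper.
        return X.T @ G @ X, X.T @ C @ X, X.T @ B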

1,465 citations


Journal Article•DOI•
TL;DR: A denotational framework (a "meta model") is given within which certain properties of models of computation can be compared; it describes concurrent processes in general terms as sets of possible behaviors.
Abstract: We give a denotational framework (a "meta model") within which certain properties of models of computation can be compared. It describes concurrent processes in general terms as sets of possible behaviors. A process is determinate if, given the constraints imposed by the inputs, there are exactly one or exactly zero behaviors. Compositions of processes are processes with behaviors in the intersection of the behaviors of the component processes. The interaction between processes is through signals, which are collections of events. Each event is a value-tag pair, where the tags can come from a partially ordered or totally ordered set. Timed models are where the set of tags is totally ordered. Synchronous events share the same tag, and synchronous signals contain events with the same set of tags. Synchronous processes have only synchronous signals as behaviors. Strict causality (in timed tag systems) and continuity (in untimed tag systems) ensure determinacy under certain technical conditions. The framework is used to compare certain essential features of various models of computation, including Kahn process networks, dataflow, sequential processes, concurrent sequential processes with rendezvous, Petri nets, and discrete-event systems.
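
The framework's basic objects are simple enough to state executably. Below is a toy Python encoding under our own naming (the paper is mathematical and defines no code): an event is a (tag, value) pair, a signal is a frozenset of events, a behavior is a tuple of signals, and a process is a set of behaviors.

    def compose(p, q):
        """Composition: behaviors in the intersection of component behaviors."""
        return p & q

    def is_determinate(process, input_constraint):
        """Determinate: exactly one or zero behaviors meet the input constraint."""
        return len([b for b in process if input_constraint(b)]) <= 1

    # Two one-signal processes over a totally ordered (timed) tag set.
    s1 = frozenset({(0, 'a'), (1, 'b')})
    s2 = frozenset({(0, 'a'), (1, 'c')})
    P = {(s1,), (s2,)}                       # two possible behaviors
    Q = {(s1,)}                              # one possible behavior
    assert compose(P, Q) == {(s1,)}
    assert is_determinate(compose(P, Q), lambda b: True)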

687 citations


Journal Article•DOI•
TL;DR: Heuristics with good performance bounds can be derived for combinational circuits tested using built-in self-test (BIST), and considerable reductions in power dissipation can be obtained using the proposed techniques.
Abstract: Reduction of power dissipation during test application is studied for scan designs and for combinational circuits tested using built-in self-test (BIST). The problems are shown to be intractable. Heuristics to solve these problems are discussed. We show that heuristics with good performance bounds can be derived for combinational circuits tested using BIST. Experimental results show that considerable reduction in power dissipation can be obtained using the proposed techniques.

338 citations


Journal Article•DOI•
TL;DR: A hardware-software cosynthesis system, called MOGAC, that partitions and schedules embedded system specifications consisting of multiple periodic task graphs using an adaptive multiobjective genetic algorithm that can escape local minima.
Abstract: In this paper, we present a hardware-software cosynthesis system, called MOGAC, that partitions and schedules embedded system specifications consisting of multiple periodic task graphs. MOGAC synthesizes real-time heterogeneous distributed architectures using an adaptive multiobjective genetic algorithm that can escape local minima. Price and power consumption are optimized while hard real-time constraints are met. MOGAC places no limit on the number of hardware or software processing elements in the architectures it synthesizes. Our general model for bus and point-to-point communication links allows a number of link types to be used in an architecture. Application-specific integrated circuits consisting of multiple processing elements are modeled. Heuristics are used to tackle multirate systems, as well as systems containing task graphs whose hyperperiods are large relative to their periods. The application of a multiobjective optimization strategy allows a single cosynthesis run to produce multiple designs that trade off different architectural features. Experimental results indicate that MOGAC has advantages over previous work in terms of solution quality and running time.

248 citations


Journal Article•DOI•
TL;DR: This paper surveys representative contributions from the recent literature on power modeling, estimation, synthesis, and optimization techniques that account for power dissipation during the early stages of the design flow.
Abstract: Silicon area, performance, and testability have been, so far, the major design constraints to be met during the development of digital very-large-scale-integration (VLSI) systems. In recent years, however, things have changed; increasingly, power has been given weight comparable to the other design parameters. This is primarily due to the remarkable success of personal computing devices and wireless communication systems, which demand high-speed computations with low power consumption. In addition, there is strong pressure on manufacturers of high-end products to keep power under control, due to the increased costs of packaging and cooling this type of device. Finally, the need to ensure high circuit reliability has become more stringent. The availability of tools for the automatic design of low-power VLSI systems has thus become necessary. More specifically, following a natural trend, researchers' interests have lately shifted to the investigation of power modeling, estimation, synthesis, and optimization techniques that account for power dissipation during the early stages of the design flow. This paper surveys representative contributions to this area that have appeared in the recent literature.

232 citations


Journal Article•DOI•
TL;DR: This paper shows that Karp's algorithm processes more nodes and arcs than needed to find the maximum cycle mean of a digraph, and proposes a new graph-unfolding scheme that remedies this deficiency and leads to two faster algorithms with different characteristics.
Abstract: Maximum and minimum mean cycle problems are important problems with many applications in performance analysis of synchronous and asynchronous digital systems including rate analysis of embedded systems, in discrete-event systems, and in graph theory. Karp's algorithm is one of the fastest and most common algorithms for these problems. We present this paper mainly in the context of the maximum mean cycle problem. We show that Karp's algorithm processes more nodes and arcs than needed to find the maximum cycle mean of a digraph. This observation motivated us to propose a new graph-unfolding scheme that remedies this deficiency and leads to two faster algorithms with different characteristics. Theoretical analysis tells us that our algorithms always run faster than Karp's algorithm and that they are among the fastest to date. Experiments on small benchmark graphs confirm this fact for most of the graphs. These algorithms have been used in building a framework for analysis of timing constraints for embedded systems.
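
As a baseline for what the paper improves on, here is a reference implementation of Karp's maximum-cycle-mean algorithm in Python; the paper's faster unfolding-based variants are not reproduced. It assumes every node is reachable from a designated source (node 0) and that the graph contains at least one cycle.

    def karp_max_cycle_mean(n, edges):
        """n: node count (nodes 0..n-1); edges: list of (u, v, weight).
        Returns max over cycles of (cycle weight) / (cycle length)."""
        NEG = float('-inf')
        D = [[NEG] * n for _ in range(n + 1)]   # D[k][v]: best k-edge walk to v
        D[0][0] = 0.0                           # source node 0
        for k in range(1, n + 1):
            for u, v, w in edges:
                if D[k - 1][u] != NEG:
                    D[k][v] = max(D[k][v], D[k - 1][u] + w)
        # Karp's characterization: max over v of min over k of
        # (D[n][v] - D[k][v]) / (n - k)
        best = NEG
        for v in range(n):
            if D[n][v] == NEG:
                continue
            best = max(best, min((D[n][v] - D[k][v]) / (n - k)
                                 for k in range(n) if D[k][v] != NEG))
        return best

    # A two-node cycle of total weight 4 and length 2 has mean 2.
    assert karp_max_cycle_mean(2, [(0, 1, 3.0), (1, 0, 1.0)]) == 2.0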

200 citations


Journal Article•DOI•
TL;DR: This paper proposes a new multilevel partitioning algorithm that exploits some of the latest innovations of classical iterative partitioning approaches and presents quadrisection results which compare favorably to the partitionings obtained by the GORDIAN cell placement tool.
Abstract: Many previous works in partitioning have used some underlying clustering algorithm to improve performance. As problem sizes reach new levels of complexity, a single application of a clustering algorithm is insufficient to produce excellent solutions. Recent work has illustrated the promise of multilevel approaches. A multilevel partitioning algorithm recursively clusters the instance until its size is smaller than a given threshold, then unclusters the instance, while applying a partitioning refinement algorithm. In this paper, we propose a new multilevel partitioning algorithm that exploits some of the latest innovations of classical iterative partitioning approaches. Our method also uses a new technique to control the number of levels in our matching-based clustering algorithm. Experimental results show that our heuristic outperforms numerous existing bipartitioning heuristics with improvements ranging from 6.9 to 27.9% for 100 runs and 3.0 to 20.6% for just ten runs (while also using less CPU time). Further, our algorithm generates solutions better than the best known mincut bipartitionings for seven of the ACM/SIGDA benchmark circuits, including golem3 (which has over 100000 cells). We also present quadrisection results which compare favorably to the partitionings obtained by the GORDIAN cell placement tool. Our work in multilevel quadrisection has been used as the basis for an effective cell placement package.
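
To make the recursion concrete, here is a self-contained Python toy of the multilevel V-cycle: coarsen by heavy-edge matching until the instance is small, bisect the coarsest graph, then project the parts back level by level with a greedy refinement pass. The matching rule, the single refinement pass, and the absence of balance constraints are all simplifications relative to real multilevel partitioners, including this paper's.

    def cut_size(edges, side):
        """Total weight of edges crossing the bipartition."""
        return sum(w for (u, v), w in edges.items() if side[u] != side[v])

    def multilevel_bisect(nodes, edges, threshold=4):
        """nodes: set; edges: dict (u, v) -> weight with u < v."""
        matched, rep = set(), {}                 # heavy-edge matching
        if len(nodes) > threshold:
            for (u, v), w in sorted(edges.items(), key=lambda e: -e[1]):
                if u not in matched and v not in matched:
                    matched |= {u, v}
                    rep[v] = u                   # node v is absorbed into u
        if not rep:                              # small (or unmatchable): split directly
            order = sorted(nodes)
            return {v: int(i < len(order) // 2) for i, v in enumerate(order)}
        f = lambda x: rep.get(x, x)
        cnodes = {f(v) for v in nodes}
        cedges = {}
        for (u, v), w in edges.items():          # rebuild edges between clusters
            a, b = min(f(u), f(v)), max(f(u), f(v))
            if a != b:
                cedges[(a, b)] = cedges.get((a, b), 0) + w
        coarse = multilevel_bisect(cnodes, cedges, threshold)
        side = {v: coarse[f(v)] for v in nodes}  # project onto the finer level
        for v in sorted(nodes):                  # one greedy refinement pass
            before = cut_size(edges, side)
            side[v] ^= 1
            if cut_size(edges, side) >= before:
                side[v] ^= 1                     # no gain: undo the move
        return side

    # Two triangles joined by one light edge: the optimal cut has weight 1.
    E = {(0, 1): 2, (1, 2): 2, (0, 2): 2, (3, 4): 2, (4, 5): 2, (3, 5): 2, (2, 3): 1}
    side = multilevel_bisect({0, 1, 2, 3, 4, 5}, E, threshold=2)
    assert cut_size(E, side) == 1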

171 citations


Journal Article•DOI•
TL;DR: The Green function over a multilayer substrate is used to solve for the impedance matrix for an arbitrary three-dimensional arrangement of conductors placed anywhere in the substrate, so that substrate coupling and loss in integrated circuits can be analyzed.
Abstract: The Green function over a multilayer substrate is derived by solving Poisson's equation analytically in the z coordinate and numerically in the x and y coordinates. The x and y functional dependence is transformed into a discrete cosine transform (DCT) representation for rapid evaluation. The Green function is further transformed into a numerically stable form appropriate for finite-precision machine evaluation. This Green function is used to solve for the impedance matrix for an arbitrary three-dimensional arrangement of conductors placed anywhere in the substrate. Using this technique, the substrate coupling and loss in integrated circuits can be analyzed. A spiral inductor is presented as an example. Experimental measurement results verify the accuracy of the technique.

157 citations


Journal Article•DOI•
B.P. Dave1, N.K. Jha2•
TL;DR: This paper addresses the problem of hardware-software cosynthesis of hierarchical heterogeneous distributed embedded system architectures from hierarchical or nonhierarchical task graphs and shows how the cosynthesis algorithm can be easily extended to consider fault tolerance or low-power objectives or both.
Abstract: Hardware-software cosynthesis of an embedded system architecture entails partitioning of its specification into hardware and software modules such that its real-time and other constraints are met. Embedded systems are generally specified in terms of a set of acyclic task graphs. For medium- to large-scale embedded systems, the task graphs are usually hierarchical in nature. The embedded system architecture, which is the output of the cosynthesis system, may itself be nonhierarchical or hierarchical. Traditional nonhierarchical architectures create communication and processing bottlenecks and are impractical for large embedded systems. Such systems require a large number of processing elements and communication links connected in a hierarchical manner, thus forming a hierarchical distributed architecture, to meet performance and cost objectives. In this paper, we address the problem of hardware-software cosynthesis of hierarchical heterogeneous distributed embedded system architectures from hierarchical or nonhierarchical task graphs. Our cosynthesis algorithm has the following features: 1) it supports periodic task graphs with real-time constraints, 2) it supports pipelining of task graphs, 3) it supports a heterogeneous set of processing elements and communication links, 4) it allows both sequential and concurrent modes of communication and computation, 5) it employs a combination of preemptive and nonpreemptive static scheduling, 6) it employs a new task-clustering technique suitable for hierarchical task graphs, and 7) it uses the concept of association arrays to tackle the problem of multirate tasks encountered in multimedia systems. We show how our cosynthesis algorithm can be easily extended to consider fault tolerance or low-power objectives or both. Although hierarchical architectures have been proposed before, to the best of our knowledge, this is the first time the notion of hierarchical task graphs and hierarchical architectures has been supported in a cosynthesis algorithm.

145 citations


Journal Article•DOI•
TL;DR: A new chip-level electrothermal timing simulator for CMOS VLSI circuits is presented, with which temperature-dependent reliability and timing problems of VLSI circuits can be accurately identified.
Abstract: In this paper, we present a new chip-level electrothermal timing simulator for CMOS VLSI circuits. Given the chip layout, the packaging specification, and the periodic input signal pattern, it finds the on-chip steady-state temperature profile and the resulting circuit performance. A tester chip has been designed for verification of ILLIADS-T, and very good agreement between simulation and experiment was found. Using this electrothermal simulator, temperature-dependent reliability and timing problems of VLSI circuits can be accurately identified.

142 citations


Journal Article•DOI•
TL;DR: A new method of packing rectangles is proposed with applications to integrated circuit (IC) layout design, called the bounded-sliceline grid, which consists of special segments that dissect the plane into rooms to which binary relations "right-of" and "above" are associated.
Abstract: A new method of packing rectangles is proposed with applications to integrated circuit (IC) layout design. A special work-sheet, called the bounded-sliceline grid, is introduced. It consists of special segments that dissect the plane into rooms to which binary relations "right-of" and "above" are associated such that any two rooms are uniquely in either relation. A packing is obtained through an assignment of the modules into the rooms followed by a compaction procedure. Changing the assignments by swapping the contents of two rooms, a simulated annealing strategy is implemented to search for a good packing. Empirical results show that hundreds of rectangles are packed with quite good area efficiency. Wide adaptability specific to IC layout design is demonstrated. Ideas for handling multilayer or nonrectangular chips and L-shaped modules are suggested.
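
Stripped of the BSG-specific room and compaction machinery, the search loop described here is a standard simulated-annealing swap loop. The Python sketch below is generic: the cost function (in the paper, packing area after compaction) and all annealing parameters are placeholders.

    import math, random

    def anneal(assign, cost, t0=1.0, cooling=0.995, iters=2000, seed=0):
        """assign: dict room -> module; cost: callable on such dicts."""
        rng = random.Random(seed)
        cur = cost(assign)
        best, best_c, t = dict(assign), cur, t0
        for _ in range(iters):
            a, b = rng.sample(sorted(assign), 2)
            assign[a], assign[b] = assign[b], assign[a]   # swap two rooms' contents
            c = cost(assign)
            if c <= cur or rng.random() < math.exp((cur - c) / t):
                cur = c                                   # accept (maybe uphill)
                if c < best_c:
                    best, best_c = dict(assign), c
            else:
                assign[a], assign[b] = assign[b], assign[a]  # reject: undo swap
            t *= cooling                                  # cool the temperature
        return best, best_c

    rooms = {i: m for i, m in enumerate([3, 2, 1, 0])}    # room -> module
    best, c = anneal(rooms, lambda a: sum(abs(r - a[r]) for r in a))
    assert c <= 8                                         # never worse than the start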

Journal Article•DOI•
TL;DR: In this article, the authors proposed several heuristics and exact algorithms for the minimum shortest path Steiner arborescence (MSPSA) problem with applications to VLSI physical design.
Abstract: Given an undirected graph G=(V, E) with positive edge weights (lengths) ω: E → R+, a set of terminals (sinks) N ⊆ V, and a unique root node r ∈ N, a shortest path Steiner arborescence (hereafter an arborescence) is a Steiner tree rooted at r, spanning all terminals in N such that every source-to-sink path is a shortest path in G. Given a triple (G, N, r), the minimum shortest path Steiner arborescence (MSPSA) problem seeks an arborescence with minimum weight. The MSPSA problem has various applications in the areas of physical design of very large-scale integrated circuits (VLSI), multicast network communication, and supercomputer message routing; various cases have been studied in the literature. In this paper, we propose several heuristics and exact algorithms for the MSPSA problem with applications to VLSI physical design. Experiments indicate that our heuristics generate near optimal results and achieve speedups of orders of magnitude over existing algorithms.
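
The defining constraint is straightforward to check executably. The Python sketch below (the adjacency-dict encoding and names are ours) verifies that a candidate tree, given as child-to-parent edges, is an arborescence: every root-to-terminal tree path must match the shortest-path distance in G.

    import heapq

    def shortest_dists(adj, r):
        """Dijkstra from root r; adj: dict u -> {v: weight}."""
        dist, pq = {r: 0}, [(0, r)]
        while pq:
            d, u = heapq.heappop(pq)
            if d > dist.get(u, float('inf')):
                continue                       # stale queue entry
            for v, w in adj[u].items():
                if d + w < dist.get(v, float('inf')):
                    dist[v] = d + w
                    heapq.heappush(pq, (d + w, v))
        return dist

    def is_arborescence(adj, parent, r, terminals):
        """parent: child -> parent edges of the candidate tree."""
        dist = shortest_dists(adj, r)
        for t in terminals:
            v, d = t, 0
            while v != r:                      # walk the tree path back to r
                d += adj[parent[v]][v]
                v = parent[v]
            if d != dist[t]:
                return False                   # tree path is not a shortest path
        return True

    adj = {'r': {'a': 1, 'b': 3}, 'a': {'r': 1, 'b': 1}, 'b': {'r': 3, 'a': 1}}
    assert is_arborescence(adj, {'a': 'r', 'b': 'a'}, 'r', {'b'})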

Journal Article•DOI•
TL;DR: This paper describes an approach termed guarded evaluation, which implements this idea by automatically determining the parts of the circuit that can be disabled on a per-clock-cycle basis; initial experiments indicate substantial power savings and the strong potential of the approach on a large number of benchmark circuits.
Abstract: The need to reduce the power consumption of the next generation of digital systems is clearly recognized at all levels of system design. At the system level, power management is a very powerful technique and delivers large and unambiguous savings. The ideas behind power management can be extended to the logic level. This would involve determining which parts of a circuit are computing results that will be used and which are not. The parts that are not needed are then "shut off". This paper describes an approach termed guarded evaluation, which is an implementation of this idea. A theoretical framework and the algorithms that form the basis of the approach are presented. The underlying idea is to automatically determine the parts of the circuit that can be disabled on a per-clock-cycle basis. This saves the power used in all the useless transitions in those parts of the circuit. Initial experiments indicate substantial power savings and the strong potential of this approach for a large number of benchmark circuits. While this paper presents the development of these ideas at the logic level of design, the same ideas have direct application at the register-transfer level of design also.

Journal Article•DOI•
TL;DR: This paper addresses the issue of switching activity estimation in combinational circuits under the zero-delay model using lag-one Markov chains; it proves that the conditional independence problem is NP-complete, and approximate techniques with bounded error are proposed for estimating the switching activity.
Abstract: This paper addresses, from a probabilistic point of view, the issue of switching activity estimation in combinational circuits under the zero-delay model. As the main theoretical contribution, we extend the previous work done on switching activity estimation to explicitly account for complex spatiotemporal correlations which occur at the primary inputs when the target circuit receives data from real applications. More precisely, using lag-one Markov chains, two new concepts, conditional independence and signal isotropy, are brought to attention, and based on them, sufficient conditions for exact analysis of complex dependencies are given. From a practical point of view, it is shown that the relative error in calculating the switching activity of a logic gate using only pairwise probabilities can be upper-bounded. It is proved that the conditional independence problem is NP-complete and thus, relying on the concept of signal isotropy, approximate techniques with bounded error are proposed for estimating the switching activity. Evaluations of the model and a comparative analysis on benchmark circuits show that node-by-node switching activities are strongly pattern dependent and therefore, accounting for spatiotemporal dependencies is mandatory if accuracy is a major concern.
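
For a single line modeled as a lag-one Markov chain, the quantity being estimated has a simple closed form. Writing p01 = P(x(t+1)=1 | x(t)=0) and p10 = P(x(t+1)=0 | x(t)=1) (our notation, not necessarily the paper's), the stationary signal and switching probabilities are:

    p1 = p01 / (p01 + p10),   p0 = 1 - p1
    a(x) = p0*p01 + p1*p10 = 2*p0*p01

Temporal correlation enters through the conditioning on the previous value; a temporally independent model would give a(x) = 2*p1*(1 - p1) instead. The paper's contribution concerns the spatial side, i.e., correlations across multiple lines, where such single-line formulas no longer suffice.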

Journal Article•DOI•
TL;DR: This paper introduces a new combinatorial optimization problem, matrix synthesis problem (MSP), to model the thermal placement problem for gate arrays and shows that MSP is NP-complete and presents several provably good approximation algorithms for the problem.
Abstract: In this paper, we consider the thermal placement problem for gate arrays. We introduce a new combinatorial optimization problem, the matrix synthesis problem (MSP), to model the thermal placement problem. Given a list of mn nonnegative real numbers and an integer t, MSP constructs an m×n matrix out of the given numbers such that the maximum sum among all t×t submatrices is minimized. We show that MSP is NP-complete and present several provably good approximation algorithms for the problem. We also demonstrate that our thermal placement strategy is flexible enough to allow simultaneous consideration of other objectives such as wiring.
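
The objective is concrete enough to evaluate directly. Below is a brute-force cost function for a candidate placement, in Python (our encoding, reading the t×t submatrices as contiguous windows, which is the natural reading for thermal placement; the paper's approximation algorithms are not reproduced).

    def msp_cost(M, t):
        """M: m x n matrix (list of lists); assumes t <= min(m, n).
        Returns the maximum sum over all contiguous t x t submatrices,
        the quantity MSP minimizes."""
        m, n = len(M), len(M[0])
        return max(sum(M[i + a][j + b] for a in range(t) for b in range(t))
                   for i in range(m - t + 1) for j in range(n - t + 1))

    # Placing [1, 2, 3, 4] as a 2 x 2 matrix: the single 2 x 2 window sums to 10.
    assert msp_cost([[1, 2], [3, 4]], 2) == 10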

Journal Article•DOI•
TL;DR: A probabilistic simulation technique to estimate the power consumption of a CMOS circuit under a general delay model based on the notion of a tagged (probability) waveform, which models the set of all possible events at the output of each circuit node.
Abstract: In this paper, we present a probabilistic simulation technique to estimate the power consumption of a CMOS circuit under a general delay model. This technique is based on the notion of a tagged (probability) waveform, which models the set of all possible events at the output of each circuit node. Tagged waveforms are obtained by partitioning the logic waveform space of a circuit node according to the initial and final values of each logic waveform and compacting all logic waveforms in each partition by a single tagged waveform. To improve the efficiency of tagged probabilistic simulation, only tagged waveforms at the circuit inputs are exactly computed. The tagged waveforms of the remaining nodes are computed using a compositional scheme that propagates the tagged waveforms from circuit inputs to circuit outputs. We obtain significant speedup over explicit simulation methods with an average error of only 6%. This also represents a factor of 2-3× improvement in the accuracy of power estimates over previous probabilistic simulation approaches.

Journal Article•DOI•
TL;DR: An algorithm for automatically restructuring the controllers of the data paths in which variable-latency units have been introduced is formulated, and results show an average throughput improvement exceeding 27%, at the price of a modest area increase.
Abstract: This paper introduces a novel optimization paradigm for increasing the throughput of digital systems. The basic idea consists of transforming fixed-latency units into variable-latency ones that run with a faster clock cycle. The transformation is fully automatic and can be used in conjunction with traditional design techniques to improve the overall performance of speed-critical units. In addition, we introduce procedures for reducing the area overhead of the modified units, and we formulate an algorithm for automatically restructuring the controllers of the data paths in which variable-latency units have been introduced. Results, obtained on a large set of benchmark circuits, show an average throughput improvement exceeding 27%, at the price of a modest area increase (less than 8% on average).

Journal Article•DOI•
Taewhan Kim1, W. Jao, Steve Tjiang2•
TL;DR: Experimental results from a set of typical arithmetic computations found in industry designs indicate that automating CSA optimization with the established algorithm produces designs with up to 53% faster timing and up to 42% smaller area.
Abstract: The carry-save adder (CSA) is the type of operation most often used in industry to implement fast arithmetic computations in register-transfer-level designs. This paper establishes a relationship between the properties of arithmetic computations and several optimizing transformations using CSAs to derive consistently better qualities of results than those of manual implementations. In particular, we introduce two important concepts, operation duplication and operation split, which are the main driving techniques of our algorithm for achieving an extensive utilization of CSAs. Experimental results from a set of typical arithmetic computations found in industry designs indicate that automating CSA optimization with our algorithm produces designs with up to 53% faster timing and up to 42% smaller area.
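
The primitive underlying these transformations is the carry-save step itself: a CSA compresses three operands into a sum word and a carry word whose (shifted) total equals the sum of the inputs, deferring carry propagation to a single final adder. A tiny Python model of one step (illustrative only, not the paper's tooling):

    def csa(a, b, c):
        """One carry-save step on nonnegative integers (bitwise full adder)."""
        s = a ^ b ^ c                          # per-bit sum, ignoring carries
        carry = (a & b) | (b & c) | (a & c)    # per-bit majority
        return s, carry << 1                   # carry feeds the next bit up

    s, c = csa(13, 7, 9)
    assert s + c == 13 + 7 + 9                 # 3 + 26 = 29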

Journal Article•DOI•
TL;DR: This work addresses the problem of code compression in systems with embedded DSP processors by using data-compression methods to develop code-size minimization strategies and shows that the dictionary can be computed by solving a set-covering problem derived from the original program.
Abstract: Code-size minimization in embedded systems is an important problem because code size directly affects production cost. We address the problem of code compression in systems with embedded DSP processors. We use data-compression methods to develop code-size minimization strategies. In our framework, the compressed program consists of a skeleton and a dictionary. We show that the dictionary can be computed by solving a set-covering problem derived from the original program. We also address performance considerations, and show that they can be incorporated easily into the set-covering formulation. Experimental results are presented.
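
The optimization core here is the classical set-covering problem. The textbook greedy heuristic, which repeatedly picks the set covering the most uncovered items and carries a logarithmic approximation guarantee, is sketched below in Python; the mapping from program fragments to sets, which is the paper's contribution, is not reproduced.

    def greedy_set_cover(universe, sets):
        """universe: set of items; sets: dict name -> covered items.
        Returns a list of chosen set names covering the universe."""
        uncovered, chosen = set(universe), []
        while uncovered:
            best = max(sets, key=lambda s: len(sets[s] & uncovered))
            if not sets[best] & uncovered:
                raise ValueError("universe is not coverable")
            chosen.append(best)                # pick the most-covering set
            uncovered -= sets[best]
        return chosen

    assert set(greedy_set_cover({1, 2, 3}, {'a': {1, 2}, 'b': {2, 3}})) == {'a', 'b'}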

Journal Article•DOI•
TL;DR: A fast fault simulation approach based on ordinary logic emulation is proposed; hybrid variants reduce the number of faults actually emulated by screening off, before emulation, faults that are not activated or have short propagation distances, and by collapsing nonstem faults into their equivalent stem faults.
Abstract: A fast fault simulation approach based on ordinary logic emulation is proposed. The circuit configured into our system to emulate the faulty circuit's behavior is synthesized from the good circuit and the given fault list in a novel way. Fault injection is made easy by shifting the content of a fault injection scan chain or by selecting the output of a parallel fault injection selector, thereby eliminating the time-consuming bit-stream regeneration process. Experimental results for ISCAS-89 benchmark circuits show that our serial fault emulator is about 20 times faster than HOPE. Our analysis shows that the speedup grows with the circuit size. Two hybrid fault emulation approaches are also proposed. The first reduces the number of faults actually emulated by screening off, before emulation, faults that are not activated or have short propagation distances, and by collapsing nonstem faults into their equivalent stem faults. The second reduces the hardware requirement of the fault emulator by incorporating an ordinary fault simulator.

Journal Article•DOI•
Andrew R. Conn1, Paula Kristine Coulman1, R. A. Haring1, G.L. Morrill1, Chandu Visweswariah1, Chai Wah Wu1 •
TL;DR: A circuit optimization tool that automates the tuning task by means of state-of-the-art nonlinear optimization is presented; it makes use of a fast circuit simulator and a general-purpose nonlinear optimization package, and extensive circuit optimization results are reported.
Abstract: Automating the transistor and wire-sizing process is an important step toward being able to rapidly design high-performance, custom circuits. This paper presents a circuit optimization tool that automates the tuning task by means of state-of-the-art nonlinear optimization. It makes use of a fast circuit simulator and a general-purpose nonlinear optimization package. It includes minimax and power optimization, simultaneous transistor and wire tuning, general choices of objective functions and constraints, and recovery from nonworking circuits. In addition, the tool makes use of designer-friendly interfaces that automate the specification of the optimization task, the running of the optimizer, and the back-annotation of the results of optimization onto the circuit schematic. Particularly for large circuits, gradient computation is usually the bottleneck in the optimization procedure. In addition to traditional adjoint and direct methods, we use a technique called the adjoint Lagrangian method, which computes all the gradients necessary for one iteration of optimization in a single adjoint analysis. This paper describes the algorithms and the environment in which they are used and presents extensive circuit optimization results. A circuit with 6900 transistors, 4128 tunable transistors, and 60 independent parameters was optimized in about 108 min of CPU time on an IBM RISC/System 6000, model 590.

Journal Article•DOI•
TL;DR: This paper presents a set of well-conditioned transformations called "split congruence transformations" (SCTs) which can be used to preserve specified moments and resonances for RLC network reduction, and these transformations are proven to preserve passivity.
Abstract: Resistance-inductance-capacitance (RLC) network reduction refers to the formulation of small networks whose port behavior is similar to that of large RLC networks. Several network reduction algorithms have been developed in the last few years, but none of them preserves passivity for RLC networks. The loss of passivity can be a serious problem because simulations of the reduced networks may encounter artificial oscillations or "time step too small" errors which render the simulations useless. This paper presents a set of well-conditioned transformations called "split congruence transformations" (SCTs) which can be used to preserve specified moments and resonances for RLC network reduction, and these transformations are proven to preserve passivity. Network reduction examples are provided to demonstrate the utility of SCTs.

Journal Article•DOI•
TL;DR: This paper shows how to apply a generalization of an algorithm due to Weiszfeld to placement with a linear wirelength objective, shows that the main GORDIAN-L loop is actually a special case of this algorithm, and proposes applying a regularization parameter to the generalized Weiszfeld algorithm to control the tradeoff between convergence and solution accuracy.
Abstract: A linear wirelength objective more effectively captures timing, congestion, and other global placement considerations than a squared wirelength objective. The GORDIAN-L cell placement tool minimizes linear wirelength by first approximating the linear wirelength objective by a modified squared wirelength objective, then executing the following loop until the solution converges: (1) minimize the current objective to yield some approximate solution, and (2) use the resulting solution to construct a more accurate objective. This paper shows how to apply a generalization of an algorithm due to Weiszfeld (1937) to placement with a linear wirelength objective, and shows that the main GORDIAN-L loop is actually a special case of this algorithm. We then propose applying a regularization parameter to the generalized Weiszfeld algorithm to control the tradeoff between convergence and solution accuracy; the GORDIAN-L iteration is equivalent to setting this regularization parameter to zero. We also apply novel numerical methods, such as the primal-Newton and primal-dual Newton iterations, to optimize the linear wirelength objective. Finally, we show both theoretically and empirically that the primal-dual Newton iteration stably attains quadratic convergence, while the generalized Weiszfeld iteration is linearly convergent. Hence, primal-dual Newton is a superior choice for implementing a placer such as GORDIAN-L, or for any linear wirelength optimization.
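
To fix ideas, a common regularized form of this iteration for a one-dimensional placement (our rendering; see the paper for the exact formulation) solves a reweighted quadratic problem at each step:

    w_ij(k) = 1 / sqrt( (x_i(k) - x_j(k))^2 + beta )
    x(k+1)  = argmin_x  sum over connected pairs (i,j) of  w_ij(k) * (x_i - x_j)^2

With beta = 0, the weights reduce to 1/|x_i(k) - x_j(k)| and each quadratic solve reproduces the GORDIAN-L loop; beta > 0 keeps the weights bounded when two cells coincide, which is what trades a little solution accuracy for a convergence guarantee.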

Journal Article•DOI•
TL;DR: The problem of designing linear, zero-aliasing space compactors that provide a high compaction ratio and introduce bounded hardware overhead is investigated and a graph model is developed for the space-compaction process and related to the graph coloring problem.
Abstract: Space compaction is employed in built-in self-testing schemes to compress the test responses from a k-output circuit to q signature streams, where q ≪ k. The effectiveness of a compaction method is measured by its compaction ratio k/q and the amount of hardware required to implement the compaction circuit. However, a high compaction ratio can require a very large compactor as well as introduce aliasing, which occurs when a faulty test response maps to the fault-free signature. We investigate the problem of designing linear, zero-aliasing space compactors that provide a high compaction ratio and introduce bounded hardware overhead. We develop a graph model for the space-compaction process and relate space-compactor design to the graph coloring problem. This technique can also be used to reduce the width of multiple-input signature registers that are used for response compaction. We apply our design method to the ISCAS 85 benchmark circuits and present experimental data on the compaction ratio achieved for these circuits.
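
Once compactor design is phrased as a coloring problem, any coloring heuristic yields a candidate compactor structure; roughly, the color classes indicate which circuit outputs may share a signature stream. Below is a generic sequential greedy coloring in Python (a standard routine, not the authors' procedure): each node takes the smallest color absent from its already-colored neighbors.

    def greedy_coloring(adj):
        """adj: dict node -> set of neighbor nodes. Returns node -> color."""
        color = {}
        for v in adj:                          # color nodes in dictionary order
            used = {color[u] for u in adj[v] if u in color}
            color[v] = next(c for c in range(len(adj)) if c not in used)
        return color

    # A three-node path needs only two colors.
    assert greedy_coloring({0: {1}, 1: {0, 2}, 2: {1}}) == {0: 0, 1: 1, 2: 0}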

Journal Article•DOI•
TL;DR: Experimental results show that BIST TPGs based on input reduction achieve complete stuck-at fault coverage in practical test lengths (≤ 2^30) for many benchmark circuits.
Abstract: A new technique called input reduction is proposed for built-in self test (BIST) test pattern generator (TPG) design and test set compaction. This technique analyzes the circuit function and identifies sets of compatible and inversely compatible inputs; inputs in each set can be combined into a test signal in the test mode without sacrificing fault coverage, even if they belong to the same circuit cone. The test signals are used to design BIST TPGs that guarantee the detection of all detectable stuck-at faults in practical test lengths. A deterministic test set generated for the reduced circuit obtained by combining inputs into test signals is usually more compact than that generated for the original circuit. Experimental results show that BIST TPGs based on input reduction achieve complete stuck-at fault coverage in practical test lengths (≤ 2^30) for many benchmark circuits. These are achieved with low area overhead and performance penalty to the circuit under test. Results also show that the memory storage and test application time for external testing using deterministic test sets can be reduced by as much as 85%.

Journal Article•DOI•
TL;DR: This scheme identifies a suitable control and data flow from the register-transfer level circuit, and uses it to test each embedded element in the circuit by symbolically justifying its precomputed test set from the system primary inputs to the element inputs and symbolically propagating the output response to the system primary outputs.
Abstract: In this paper, we present a technique for extracting functional (control/data flow) information from register-transfer level controller/data path circuits, and illustrate its use in design for hierarchical testability of these circuits. This scheme does not require any additional behavioral information. It identifies a suitable control and data flow from the register-transfer level circuit, and uses it to test each embedded element in the circuit by symbolically justifying its precomputed test set from the system primary inputs to the element inputs and symbolically propagating the output response to the system primary outputs. When symbolic justification and propagation become difficult, it inserts test multiplexers at suitable points to increase the symbolic controllability and observability of the circuit. These test multiplexers are mostly restricted to off-critical paths. Testability analysis and insertion are completely based on the register-transfer level circuit and the functional information automatically extracted from it, and are independent of the data path bit width owing to their symbolic nature. Furthermore, the data path test set is obtained as a byproduct of this analysis without any further search. Unlike many other design-for-testability techniques, this scheme makes the combined controller-data path very highly testable. It is general enough to handle control-flow-intensive register-transfer level circuits like protocol handlers as well as data-flow-intensive circuits like digital filters. It results in low area/delay/power overheads, high fault coverage, and very low test generation times (because it is symbolic and independent of bit width). Also, a large part of our system-level test sets can be applied at speed. Experimental results on many benchmarks show the average area, delay, and power overheads for testability to be 3.1, 1.0, and 4.2%, respectively. Over 99% fault coverage is obtained in most cases, with a two to four orders of magnitude test generation time advantage over an efficient gate-level sequential test pattern generator and a one to three orders of magnitude advantage over an efficient gate-level combinational test pattern generator (that assumes full scan). In addition, the test application times obtained for our method are comparable with those of gate-level sequential test pattern generators, and up to two orders of magnitude smaller than designs using full scan.

Journal Article•DOI•
TL;DR: A new approach for estimating power dissipation in a high performance microprocessor chip is presented; results obtained for Intel's Pentium processor executing standard benchmark programs show a simulation-time reduction of three to five orders of magnitude.
Abstract: This paper presents a new approach for estimating power dissipation in a high performance microprocessor chip. A characteristic profile (including parameters such as the cache miss rate, branch-prediction miss rate, pipeline stalls, instruction mix, and so on) is first extracted from the application programs. Mixed-integer linear-programming and heuristic rules are then used to gradually transform a generic program template into a fully functional program. The synthesized program exhibits the same characteristics (and hence the same performance and power-dissipation behavior), yet it has an instruction trace that is orders of magnitude smaller than the initial trace. The synthesized program is subsequently simulated on a register-transfer-level description of the target microprocessor to provide the power-dissipation value. Results obtained for Intel's Pentium processor executing standard benchmark programs show a simulation-time reduction of three to five orders of magnitude.

Journal Article•DOI•
TL;DR: A method and a tool are presented for generating parameterized and realistic synthetic circuits; a set of graph-theoretic characteristics that describe a physical netlist is proposed, and a tool is built that can measure these characteristics on existing circuits.
Abstract: The development of new field-programmed, mask-programmed, and laser-programmed gate-array architectures is hampered by the lack of realistic test circuits that exercise both the architectures and their automatic placement and routing algorithms. In this paper, we present a method and a tool for generating parameterized and realistic synthetic circuits. To obtain realism, we propose a set of graph-theoretic characteristics that describe a physical netlist, and have built a tool that can measure these characteristics on existing circuits. The generation tool uses the characteristics as constraints in the synthetic circuit generation. To validate the quality of the generated netlists, parameters that are not specified in the generation are compared with those of real circuits and with those of more "random" graphs.

Journal Article•DOI•
TL;DR: FBB-MW, an extension of FBB, is presented to solve the problem of multiway partitioning with area and pin constraints; experimental results show that FBB-MW outperforms previous approaches for multiple field programmable gate array partitioning.
Abstract: Network flow is an excellent approach to finding min-cuts because of the celebrated max-flow min-cut theorem. For a long time, however, it was perceived as computationally expensive and deemed impractical for circuit partitioning. Recently, the algorithm FBB successfully applied network flow to two-way balanced partitioning, demonstrating for the first time that network flow is a viable approach to circuit partitioning. In this paper, we present FBB-MW, an extension of FBB, to solve the problem of multiway partitioning with area and pin constraints. Experimental results show that FBB-MW outperforms previous approaches for multiple field programmable gate array partitioning. In particular, although FBB-MW does not employ logic replication and logic resynthesis, it still outperforms some other algorithms which allow replication and resynthesis for optimization.

Journal Article•DOI•
TL;DR: A clustering algorithm is presented that does not segment circuits by removing FF's and that considers the effect of retiming; it can produce clustering solutions with the optimal clock period under the unit delay model.
Abstract: In this paper, we consider the problem of clustering sequential circuits subject to a bound on the area of each cluster, with the objective of minimizing the clock period. Current algorithms address combinational circuits only, and treat a sequential circuit as a special case by removing all flip-flops (FF's) and clustering the combinational part of the sequential circuit. This approach breaks the signal dependencies and assumes the positions of FF's are fixed. The positions of the FF's in a sequential circuit are in fact dynamic, because of retiming. As a result, current algorithms can only consider a small portion of the whole solution space. In this paper, we present a clustering algorithm that does not segment circuits by removing FF's. In addition, it considers the effect of retiming. The algorithm can produce clustering solutions with the optimal clock period under the unit delay model. For the general delay model, it can produce clustering solutions with a clock period provably close to optimal.