Showing papers in "IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems in 1999"


Journal ArticleDOI
TL;DR: A finite-state, abstract system model for power-managed systems based on Markov decision processes is introduced and the problem of finding policies that optimally trade off performance for power can be cast as a stochastic optimization problem and solved exactly and efficiently.
Abstract: Dynamic power management schemes (also called policies) reduce the power consumption of complex electronic systems by trading off performance for power in a controlled fashion, taking system workload into account. In a power-managed system it is possible to set components into different states, each characterized by performance and power consumption levels. The main function of a power management policy is to decide when to perform component state transitions and which transition should be performed, depending on system history, workload, and performance constraints. In the past, power management policies have been formulated heuristically. The main contribution of this paper is to introduce a finite-state, abstract system model for power-managed systems based on Markov decision processes. Under this model, the problem of finding policies that optimally trade off performance for power can be cast as a stochastic optimization problem and solved exactly and efficiently. The applicability and generality of the approach are assessed by formulating the Markov model and optimizing power management policies for several systems.

459 citations
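
The optimization the abstract describes works over a Markov decision process whose states combine the power-managed component's state with the workload state. As a rough illustration only (the paper solves a constrained stochastic optimization exactly; this is an unconstrained value-iteration sketch with invented transition probabilities, power numbers, and penalty weights), the following shows how a policy trading power against a latency penalty can be computed for a hypothetical two-state device:

```python
# Minimal value-iteration sketch of the power/performance tradeoff for a
# hypothetical two-state device with a two-state workload.  All numbers are
# invented; the paper's formulation is an exact constrained optimization over
# a Markov decision process, not this unconstrained penalty form.
POWER = {"ON": 1.0, "SLEEP": 0.1}            # W, assumed
WAKEUP_ENERGY = 0.5                           # extra cost per SLEEP->ON switch, assumed
LATENCY_PENALTY = 2.0                         # cost of a request arriving while asleep
WORKLOAD_P = {"idle": {"idle": 0.9, "busy": 0.1},
              "busy": {"idle": 0.3, "busy": 0.7}}

states = [(d, w) for d in ("ON", "SLEEP") for w in ("idle", "busy")]
actions = ("ON", "SLEEP")                     # action = device state for the next period
GAMMA = 0.95

def cost(device, workload, action):
    c = POWER[action]
    if device == "SLEEP" and action == "ON":
        c += WAKEUP_ENERGY
    if workload == "busy" and action == "SLEEP":
        c += LATENCY_PENALTY                  # performance loss folded into the cost
    return c

V = {s: 0.0 for s in states}
for _ in range(200):                          # value iteration to convergence
    V = {(d, w): min(cost(d, w, a)
                     + GAMMA * sum(p * V[(a, w2)] for w2, p in WORKLOAD_P[w].items())
                     for a in actions)
         for (d, w) in states}

policy = {(d, w): min(actions,
                      key=lambda a: cost(d, w, a)
                      + GAMMA * sum(p * V[(a, w2)] for w2, p in WORKLOAD_P[w].items()))
          for (d, w) in states}
print(policy)
```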


Journal ArticleDOI
TL;DR: This paper studies the semantics of hierarchical finite state machines that are composed using various concurrency models, particularly dataflow, discrete-events, and synchronous/reactive modeling, and argues that all three combinations are useful, and that the concurrency model can be selected independently of the decision to use hierarchical FSM's.
Abstract: This paper studies the semantics of hierarchical finite state machines (FSM's) that are composed using various concurrency models, particularly dataflow, discrete-events, and synchronous/reactive modeling. It is argued that all three combinations are useful, and that the concurrency model can be selected independently of the decision to use hierarchical FSM's. In contrast, most formalisms that combine FSM's with concurrency models, such as statecharts (and its variants) and hybrid systems, tightly integrate the FSM semantics with the concurrency semantics. An implementation that supports three combinations is described.

349 citations


Journal ArticleDOI
TL;DR: A new solution of the multiple constant multiplication problem based on the common subexpression elimination technique is presented and it is shown that the number of add/subtract operations can be reduced significantly this way.
Abstract: The problem of an efficient hardware implementation of multiplications with one or more constants is encountered in many different digital signal-processing areas, such as image processing or digital filter optimization. In a more general form, this is a problem of common subexpression elimination, and as such it also occurs in compiler optimization and many high-level synthesis tasks. An efficient solution of this problem can yield significant improvements in important design parameters like implementation area or power consumption. In this paper, a new solution of the multiple constant multiplication problem based on the common subexpression elimination technique is presented. The performance of our method is demonstrated primarily on a finite-duration impulse response filter design. The idea is to implement a set of constant multiplications as a set of add-shift operations and to optimize these with respect to the common subexpressions afterwards. We show that the number of add/subtract operations can be reduced significantly this way. The applicability of the presented algorithm to the different high-level synthesis tasks is also indicated. Benchmarks demonstrating the algorithm's efficiency are included as well.

297 citations
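
The core idea, implementing constant multiplications as shift-and-add networks and then sharing common subexpressions, can be sketched as follows. This is not the paper's algorithm, only the flavor of shift-add common subexpression elimination: each constant is written in canonical signed-digit form, two-digit patterns are counted across the coefficient set, and the most frequent pattern is shared.

```python
from collections import Counter

def csd(n):
    """Canonical signed-digit representation of n as a list of (sign, shift)."""
    digits, shift = [], 0
    while n:
        if n & 1:
            d = 2 - (n & 3)          # +1 if the low bits are ...01, -1 if ...11
            digits.append((d, shift))
            n -= d
        n >>= 1
        shift += 1
    return digits

def two_term_patterns(terms):
    """All pairs of signed digits, normalized to shift offset 0."""
    pats = Counter()
    for i in range(len(terms)):
        for j in range(i + 1, len(terms)):
            (s1, k1), (s2, k2) = terms[i], terms[j]
            base = min(k1, k2)
            pats[((s1, k1 - base), (s2, k2 - base))] += 1
    return pats

constants = [7, 11, 13, 23]                   # example coefficient set
decomps = {c: csd(c) for c in constants}
adds_before = sum(len(d) - 1 for d in decomps.values())

shared = Counter()
for d in decomps.values():
    shared += two_term_patterns(d)
best, count = shared.most_common(1)[0]
# Each reuse of the most frequent pattern saves one adder after the first build.
print("add/sub ops without sharing:", adds_before)
print("most frequent subexpression:", best, "occurs", count, "times")
print("add/sub ops with one shared pattern:", adds_before - (count - 1))
```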


Journal ArticleDOI
TL;DR: This work develops a design methodology for low-power core-based real-time SOCs based on dynamically variable voltage hardware and proposes a nonpreemptive scheduling heuristic that yields solutions very close to optimal for many test cases.
Abstract: The growing class of portable systems, such as personal computing and communication devices, has resulted in a new set of system design requirements, mainly characterized by dominant importance of power minimization and design reuse. The energy efficiency of systems-on-a-chip (SOC) could be much improved if one were to vary the supply voltage dynamically at run time. We developed the design methodology for the low-power core-based real-time SOC based on dynamically variable voltage hardware. The key challenge is to develop effective scheduling techniques that treat voltage as a variable to be determined, in addition to the conventional task scheduling and allocation. Our synthesis technique also addresses the selection of the processor core and the determination of the instruction and data cache size and configuration so as to fully exploit dynamically variable voltage hardware, which results in significantly lower power consumption for a set of target applications than existing techniques. The highlight of the proposed approach is the nonpreemptive scheduling heuristic, which results in solutions very close to optimal ones for many test cases. The effectiveness of the approach is demonstrated on a variety of modern industrial strength multimedia and communication applications.

270 citations
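
The benefit of treating supply voltage as a scheduling variable comes from the quadratic dependence of dynamic energy on voltage versus the milder growth of delay as voltage drops. A back-of-envelope sketch (an assumed alpha-power-style delay model with invented constants, not the paper's scheduling formulation) shows why filling deadline slack by lowering the voltage saves energy:

```python
# Rough illustration (not the paper's model): with dynamic energy E ~ C*V^2 and
# gate delay ~ V / (V - Vt)^2, running a task slower at a reduced supply voltage
# within its deadline saves energy versus running fast and idling.
VT = 0.6            # threshold voltage (V), assumed
VMAX = 3.3          # nominal supply (V), assumed
CYCLES = 1.0e6      # task length in cycles, assumed
DEADLINE = 2.0e-3   # seconds, assumed
K_DELAY = 1.0e-9    # chosen so the cycle time is 1 ns at VMAX

def cycle_time(v, k=K_DELAY):
    return k * v / (v - VT) ** 2 * (VMAX - VT) ** 2 / VMAX

def energy(v, cycles=CYCLES, c_eff=1e-12):
    return c_eff * v * v * cycles             # dynamic energy only

# Find the lowest voltage that still meets the deadline (simple linear scan).
v = VMAX
while v > VT + 0.1 and cycle_time(v - 0.01) * CYCLES <= DEADLINE:
    v -= 0.01

print(f"run at {VMAX} V: time {cycle_time(VMAX)*CYCLES*1e3:.2f} ms, "
      f"energy {energy(VMAX)*1e6:.1f} uJ")
print(f"run at {v:.2f} V: time {cycle_time(v)*CYCLES*1e3:.2f} ms, "
      f"energy {energy(v)*1e6:.1f} uJ")
```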


Journal ArticleDOI
TL;DR: Methods for estimating leakage at the circuit level are outlined, and heuristic and exact algorithms are proposed to accomplish the same task for random combinational logic.
Abstract: Subthreshold leakage current in deep submicron MOS transistors is becoming a significant contributor to power dissipation in CMOS circuits as threshold voltages and channel lengths are reduced. Consequently, estimation of leakage current and identification of minimum and maximum leakage conditions are becoming important, especially in low power applications. In this paper we outline methods for estimating leakage at the circuit level and then propose heuristic and exact algorithms to accomplish the same task for random combinational logic. In most cases the heuristic is found to obtain bounds on leakage that are close and often identical to bounds determined by a complete branch and bound search. Methods are also demonstrated to show how estimation accuracy can be traded off against execution time. The proposed algorithms have potential application in power management or quiescent-current (I_DDQ) testing if one wishes to control leakage by applying appropriate input vectors. For a variety of benchmark circuits, leakage was found to vary by as much as a factor of six over the space of possible input vectors.

199 citations
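
The state dependence that the minimum/maximum leakage search exploits can be seen on a toy netlist. The sketch below brute-forces all input vectors of a three-gate circuit with invented per-input-state NAND leakage values; it stands in for the paper's heuristic and branch-and-bound algorithms, which scale this idea to real combinational logic:

```python
from itertools import product

# Assumed per-gate leakage (arbitrary units) for each input state of a 2-input
# NAND; real values depend on the transistor stacks and technology.
NAND_LEAKAGE = {(0, 0): 1.0, (0, 1): 2.5, (1, 0): 3.0, (1, 1): 6.0}
def nand(a, b): return int(not (a and b))

# Tiny combinational netlist: g1 = NAND(x1, x2), g2 = NAND(x2, x3),
# g3 = NAND(g1, g2).  Total leakage = sum of state-dependent gate leakages.
def leakage(x1, x2, x3):
    g1, g2 = nand(x1, x2), nand(x2, x3)
    return (NAND_LEAKAGE[(x1, x2)] + NAND_LEAKAGE[(x2, x3)]
            + NAND_LEAKAGE[(g1, g2)])

vectors = list(product((0, 1), repeat=3))
lo = min(vectors, key=lambda v: leakage(*v))
hi = max(vectors, key=lambda v: leakage(*v))
print("min-leakage vector", lo, "->", leakage(*lo))
print("max-leakage vector", hi, "->", leakage(*hi))
print("spread:", leakage(*hi) / leakage(*lo))
```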


Journal ArticleDOI
TL;DR: This paper provides easily computable expressions for crosstalk amplitude and pulse width in resistive, capacitively coupled lines; the expressions hold for nets with an arbitrary number of pins and arbitrary topology under any specified input excitation.
Abstract: We address the problem of crosstalk computation and reduction using circuit and layout techniques in this paper. We provide easily computable expressions for crosstalk amplitude and pulse width in resistive, capacitively coupled lines. The expressions hold for nets with an arbitrary number of pins and arbitrary topology under any specified input excitation. Experimental results show that the average error is about 10% and the maximum error is less than 20%. The expressions are used to motivate circuit techniques, such as transistor sizing, and layout techniques, such as wire ordering and wire width optimization, to reduce crosstalk.

165 citations
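
The paper's closed-form expressions are not reproduced here, but the kind of quantity they compute can be illustrated with a textbook single-lump coupled-RC approximation of peak noise on a quiet victim (assumed parameter values; this is a generic bound, not the paper's multi-pin, arbitrary-topology formulas):

```python
import math

def crosstalk_peak(vdd, rd_victim, c_ground, c_couple, t_rise):
    """One-lump coupled-RC estimate of peak noise on a quiet victim.

    Victim: resistance rd_victim to ground (holding driver), c_ground to
    ground, c_couple to an aggressor ramping 0 -> vdd in t_rise seconds.
    Not the paper's expressions -- a textbook single-pole approximation.
    """
    tau = rd_victim * (c_ground + c_couple)
    return (vdd * rd_victim * c_couple / t_rise) * (1.0 - math.exp(-t_rise / tau))

# Assumed numbers: 500-ohm holding driver, 100 fF ground cap, 60 fF coupling
# cap, 1.8 V swing, 200 ps aggressor transition.
vpeak = crosstalk_peak(1.8, 500.0, 100e-15, 60e-15, 200e-12)
print(f"estimated peak noise: {vpeak*1000:.0f} mV "
      f"(charge-sharing limit {1.8*60/160*1000:.0f} mV)")
```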


Journal ArticleDOI
TL;DR: It is shown that optimizing delay alone cannot fix all of the noise violations and that the performance penalty induced by optimizing both delay and noise as opposed to only delay is less than 2%.
Abstract: Interconnect-driven optimization is an increasingly important step in high-performance design. Algorithms for buffer insertion have been successfully utilized to reduce delay in global interconnect paths; however, existing techniques only optimize delay and timing slack. With the continually increasing ratio of coupling capacitance to total capacitance and the use of aggressive dynamic logic circuit families, noise analysis and avoidance is becoming a major design bottleneck. Hence, timing and noise must be simultaneously optimized to achieve maximum performance. This paper presents comprehensive buffer insertion techniques for noise and delay optimization. Three algorithms are presented: the first for noise avoidance for single-sink trees, the second for noise avoidance for multiple-sink trees, and the last for simultaneous noise and delay optimization. We prove the optimality of each algorithm (under various assumptions) and present other theoretical results as well. We ran experiments on a high-performance microprocessor design and show that our approach fixes all noise violations. Our approach was separately verified by a detailed, simulation-based noise analysis tool. Further, we show that optimizing delay alone cannot fix all of the noise violations and that the performance penalty induced by optimizing both delay and noise as opposed to only delay is less than 2%.

135 citations


Journal ArticleDOI
TL;DR: The proposed area model transforms the given multi-output Boolean function description into an equivalent single-output function; the model is empirical, and results demonstrating its feasibility and utility are presented.
Abstract: High-level power estimation, when given only a high-level design specification such as a functional or register-transfer level (RTL) description, requires high-level estimation of the circuit average activity and total capacitance. Considering that total capacitance is related to circuit area, this paper addresses the problem of computing the "area complexity" of multi-output combinational logic given only its functional description, i.e., Boolean equations, where area complexity refers to the number of gates required for an optimal multilevel implementation of the combinational logic. The proposed area model is based on transforming the multi-output Boolean function description into an equivalent single-output function. The area model is empirical and results demonstrating its feasibility and utility are presented. Also, a methodology for converting the gate count estimates, obtained from the area model, into capacitance estimates is presented. High-level power estimates based on the total capacitance estimates and average activity estimates are also presented.

119 citations


Journal ArticleDOI
TL;DR: A metric for noise immunity is defined, and a static noise analysis methodology based on this noise-stability metric is introduced to demonstrate how noise can be analyzed systematically on a full-chip basis using simulation-based transistor-level analysis.
Abstract: As technology scales into the deep submicron regime, noise immunity is becoming a metric of comparable importance to area, timing, and power for the analysis and design of very large scale integrated (VLSI) systems. A metric for noise immunity is defined, and a static noise analysis methodology based on this noise-stability metric is introduced to demonstrate how noise can be analyzed systematically on a full-chip basis using simulation-based transistor-level analysis. We then describe Harmony, a two-level (macro and global) hierarchical implementation of static noise analysis. At the macro level, simplified interconnect models and timing assumptions guide efficient analysis. The global level involves a careful combination of static noise analysis, static timing analysis, and detailed interconnect macromodels based on reduced-order modeling techniques. We describe how the interconnect macromodels are practically employed to perform coupling analysis and how timing constraints can be used to limit pessimism in the analysis.

113 citations


Journal ArticleDOI
TL;DR: A test vector simulation-based approach for multiple design error diagnosis and correction in digital VLSI circuits that is applicable to circuits with no global binary decision diagram representation.
Abstract: With the increase in the complexity of digital VLSI circuit design, logic design errors can occur during synthesis. In this paper, we present a test vector simulation-based approach for multiple design error diagnosis and correction. Diagnosis is performed through an implicit enumeration of the erroneous lines in an effort to avoid the exponential explosion of the error space as the number of errors increases. Resynthesis during correction is as little as possible so that most of the engineering effort invested in the design is preserved. Since both steps are based on test vector simulation, the proposed approach is applicable to circuits with no global binary decision diagram representation. Experiments on ISCAS'85 benchmark circuits exhibit the robustness and error resolution of the proposed methodology. Experiments also indicate that test vector simulation is indeed an attractive technique for multiple design error diagnosis and correction in digital VLSI circuits.

112 citations


Journal ArticleDOI
TL;DR: This paper compresses the instruction segment of the executable running on the embedded system and shows how to design a run-time decompression unit to decompress code on the fly before execution.
Abstract: In this paper, we present a method for reducing the memory requirements of an embedded system by using code compression. We compress the instruction segment of the executable running on the embedded system, and we show how to design a run-time decompression unit to decompress code on the fly before execution. Our algorithm uses arithmetic coding in combination with a Markov model, which is adapted to the instruction set and the application. We provide experimental results on two architectures, Analog Devices' SHARC and ARM's ARM and Thumb instruction sets, and show that programs can often be reduced by more than 50%. Furthermore, we suggest a table-based design that allows multibit decoding to speed up decompression.
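
A rough feel for how much an instruction-set-adapted Markov model can buy comes from measuring the conditional entropy of the code under a simple order-1 byte model; an ideal arithmetic coder driven by that model cannot beat this bound. The sketch below is only a proxy (byte-level, two-pass, model not transmitted), not the paper's compressor, and the executable path is just an example:

```python
import math
from collections import Counter, defaultdict

def markov1_entropy_bits(data: bytes) -> float:
    """Bits needed by an ideal coder driven by an order-1 byte-level Markov
    model (a rough stand-in for the paper's instruction-set-adapted model)."""
    ctx_counts = defaultdict(Counter)
    for prev, cur in zip(data, data[1:]):
        ctx_counts[prev][cur] += 1
    totals = {ctx: sum(c.values()) for ctx, c in ctx_counts.items()}
    bits = 8.0                                   # first byte sent raw
    for prev, cur in zip(data, data[1:]):
        p = ctx_counts[prev][cur] / totals[prev]
        bits += -math.log2(p)
    return bits

code = open("/bin/ls", "rb").read()              # any executable segment will do
orig_bits = 8 * len(code)
est_bits = markov1_entropy_bits(code)
print(f"original: {orig_bits/8/1024:.1f} KiB, "
      f"order-1 model bound: {est_bits/8/1024:.1f} KiB "
      f"({100*est_bits/orig_bits:.0f}% of original)")
```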

Journal ArticleDOI
TL;DR: This paper develops efficient wirelength estimation techniques for top-down floorplanning and placement of cell-based designs, including new wirelength estimates that are functions of a block's complexity (number of cell instances) and aspect ratio.
Abstract: Wirelength estimation in very large scale integration layout is fundamental to any predetailed routing estimate of timing or routability. In this paper, we develop efficient wirelength estimation techniques appropriate for wirelength estimation during top-down floorplanning and placement of cell-based designs. Our methods give accurate, linear-time approaches, typically with sublinear time complexity for dynamic updating of estimates (e.g., for annealing placement). Our techniques offer advantages not only for early on-line wirelength estimation during top-down placement, but also for a posteriori estimation of routed wirelength given a final placement. In developing these new estimators, we have made several contributions, including (1) insight into the contrast between region-based and bounding box-based rectilinear Steiner minimal tree (RStMT) estimation techniques; (2) empirical assessment of the correlations between pin placements of a multipin net that is contained in a block; and (3) new wirelength estimates that are functions of a block's complexity (number of cell instances) and aspect ratio.
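
As a point of reference for what such estimators refine, the sketch below computes the common per-net half-perimeter bounding-box (HPWL) estimate with a pin-count-dependent Steiner correction; the factors beyond three pins are illustrative placeholders rather than the paper's fitted, aspect-ratio-aware values:

```python
def hpwl(pins):
    """Half-perimeter of the bounding box of a net's pin locations."""
    xs, ys = zip(*pins)
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

# Pin-count-dependent factors approximating the RSMT/HPWL ratio.  For 2- and
# 3-pin nets the ratio is exactly 1; the larger factors are illustrative
# placeholders, not the paper's fitted values.
CORRECTION = {2: 1.00, 3: 1.00, 4: 1.08, 5: 1.15}

def estimated_wirelength(nets):
    return sum(CORRECTION[min(len(pins), 5)] * hpwl(pins) for pins in nets)

nets = [
    [(0, 0), (3, 4)],                          # two-pin net
    [(1, 1), (5, 2), (2, 6)],                  # three-pin net
    [(0, 0), (4, 0), (0, 4), (4, 4), (2, 2)],  # five-pin net
]
print("estimated total wirelength:", estimated_wirelength(nets))
```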

Journal ArticleDOI
TL;DR: A serial fault emulation algorithm, enhanced by two speed-up techniques, uses a field programmable gate array (FPGA)-based emulation system for fault grading; performance estimates show that this approach could be several orders of magnitude faster than existing software approaches for large sequential designs.
Abstract: In this paper, we introduce a method that uses the field programmable gate array (FPGA)-based emulation system for fault grading. The real-time simulation capability of a hardware emulator could significantly improve the performance of fault grading, which is one of the most time consuming tasks in the circuit design and test process. We employ a serial fault emulation algorithm enhanced by two speed-up techniques. First, a set of independent faults can be injected and emulated at the same time. Second, multiple dependent faults can be simultaneously injected within a single FPGA configuration by adding extra circuitry. Because the reconfiguration time of mapping the numerous faulty circuits into the FPGA's is pure overhead and could be the bottleneck of the entire process, using extra circuitry for injecting a large number of faults can reduce the number of FPGA reconfigurations and thus improve the performance significantly. In addition, we address the issue of handling potentially detected faults in this hardware emulation environment by using dual-railed logic. The performance estimation shows that this approach could be several orders of magnitude faster than the existing software approaches for large sequential designs.

Journal ArticleDOI
TL;DR: This paper presents a fast eigendecomposition technique that accelerates operator application in BEM methods and avoids the dense-matrix storage while taking all of the substrate boundary effects into account explicitly.
Abstract: Industry trends aimed at integrating higher levels of circuit functionality have triggered a proliferation of mixed analog-digital systems. Magnified noise coupling through the common chip substrate has made the design and verification of such systems an increasingly difficult task. In this paper we present a fast eigendecomposition technique that accelerates operator application in BEM methods and avoids the dense-matrix storage while taking all of the substrate boundary effects into account explicitly. This technique can be used for accurate and efficient modeling of substrate coupling effects in mixed-signal integrated circuits.

Journal ArticleDOI
B. Chess, T. Larrabee
TL;DR: This work demonstrates that if information is removed from a fault dictionary, its ability to diagnose unmodeled faults may be severely curtailed even if dictionary quality metrics remain unaffected; it presents a new dictionary organization based on error sets, which is amenable to standard data-compression techniques.
Abstract: Diagnostic fault simulation can generate enormous amounts of data. The techniques used to manage this data can have a significant effect on the outcome of the fault diagnosis procedure. We first demonstrate that if information is removed from a fault dictionary, its ability to diagnose unmodeled faults may be severely curtailed even if dictionary quality metrics remain unaffected; we therefore focus on methods for producing small, lossless dictionaries. We present a new dictionary organization based on error sets, which is amenable to standard data-compression techniques. We compare several dictionary organizations and the effect of standard data-compression techniques on each of them. An appropriate organization and encoding makes dictionary-based diagnosis practical for very large circuits.

Journal ArticleDOI
TL;DR: In this paper, the authors introduce a new design style called extended burst-mode, which can synthesize multiple-input change asynchronous finite state machines and many circuits that are difficult or impossible to synthesize automatically using existing methods.
Abstract: We introduce a new design style called extended burst-mode. The extended burst-mode design style covers a wide spectrum of sequential circuits ranging from delay-insensitive to synchronous. We can synthesize multiple-input change asynchronous finite state machines and many circuits that fall in the gray area (hard to classify as synchronous or asynchronous) which are difficult or impossible to synthesize automatically using existing methods. Our implementation of extended burst-mode machines uses standard CMOS logic, generates low-latency outputs, and guarantees freedom from hazards at the gate level. In Part I, we formally define the extended burst-mode specification, provide an overview of the synthesis methods, and describe the hazard-free synthesis requirements for two different next-state logic synthesis methods: two-level sums-of-products implementation and generalized C-elements implementation. We also present an extension to existing theories for hazard-free combinational synthesis to handle nonmonotonic input changes.

Journal ArticleDOI
TL;DR: Techniques that attempt to reduce glitching power consumption by minimizing propagation of glitches in the RTL circuit are developed, which include restructuring multiplexer networks, clocking control signals, and inserting selective rising/falling delays, in order to kill the propagation of glitches from control as well as data signals.
Abstract: We present design-for-low-power techniques for register-transfer level (RTL) controller/data path circuits. We analyze the generation and propagation of glitches in both the control and data path parts of the circuit. In data-flow intensive designs, glitching power is primarily due to the chaining of arithmetic functional units. In control-flow intensive designs, on the other hand, multiplexer networks and registers dominate the total circuit power consumption, and the control logic can generate a significant amount of glitches at its outputs, which in turn propagate through the data path to account for a large portion of the glitching power in the entire circuit. Our analysis also highlights the relationship between the propagation of glitches from control signals and the bit-level correlation between data signals. Based on the analysis, we develop techniques that attempt to reduce glitching power consumption by minimizing propagation of glitches in the RTL circuit. Our techniques include restructuring multiplexer networks (to enhance data correlations and eliminate glitchy control signals), clocking control signals, and inserting selective rising/falling delays, in order to kill the propagation of glitches from control as well as data signals. In addition, we present a procedure to automatically perform the well-known power-reduction technique of clock gating through an efficient structural analysis of the RTL circuit, while avoiding the introduction of glitches on the clock signals. Application of the proposed power optimization techniques to several RTL circuits shows significant power savings, with negligible area and delay overheads.

Journal ArticleDOI
TL;DR: A software generation methodology is proposed that takes advantage of a restricted class of specifications and allows for tight control over the implementation cost, and exploits several techniques from the domain of Boolean function optimization.
Abstract: Software components for embedded reactive real-time applications must satisfy tight code size and run-time constraints. Cooperating finite state machines provide a convenient intermediate format for embedded system co-synthesis, between high-level specification languages and software or hardware implementations. We propose a software generation methodology that takes advantage of a restricted class of specifications and allows for tight control over the implementation cost. The methodology exploits several techniques from the domain of Boolean function optimization. We also describe how the simplified control/data-flow graph used as an intermediate representation can be used to accurately estimate the size and timing cost of the final executable code.

Journal ArticleDOI
TL;DR: Techniques are presented to compactly represent substrate noise currents injected by digital networks; device-level simulation and standard benchmark circuits are used to verify the validity of the assumptions and to measure the accuracy of the obtained power spectra.
Abstract: Techniques are presented to compactly represent substrate noise currents injected by digital networks. Using device-level simulation, every gate in a given library is modeled by means of the signal waveform it injects into the substrate, depending on its input transition scheme. For a given sequence of input vectors, the switching activity of every node in the Boolean network is computed. Assuming that technology mapping has been performed, each node corresponds to a gate in the library, hence, to a specific injection waveform. The noise contribution of each node is computed by convolving its switching activity with the associated injection waveforms. The total injected noise for the digital block is then obtained by summing all the noise contributions in the circuit. The resulting injected noise can be viewed as a random process, whose power spectrum is computed using standard signal processing techniques. A study was performed on a number of standard benchmark circuits to verify the validity of the assumptions and to measure the accuracy of the obtained power spectra.
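
The convolution step at the heart of this flow is easy to picture numerically. The sketch below uses invented injection templates and switching rates in place of device-level-characterized waveforms and logic-simulation activities, then sums the contributions and takes a power spectrum:

```python
import numpy as np

FS = 10e9                                    # simulation sample rate, assumed (10 GS/s)
T = np.arange(0, 2e-9, 1 / FS)               # 2 ns injection-template window

def injection_waveform(peak_ua, decay_ps):
    """Assumed per-event template: a decaying-exponential substrate current
    spike (a real flow would use device-level-simulated waveforms per gate)."""
    return peak_ua * 1e-6 * np.exp(-T / (decay_ps * 1e-12))

rng = np.random.default_rng(0)
n_samples = int(100e-9 * FS)                 # 100 ns of switching activity
gates = [                                    # invented library characterization
    {"waveform": injection_waveform(40.0, 150.0), "events_per_ns": 0.20},
    {"waveform": injection_waveform(15.0, 300.0), "events_per_ns": 0.45},
]

total = np.zeros(n_samples)
for g in gates:
    p = g["events_per_ns"] / (FS * 1e-9)     # per-sample switching probability
    events = (rng.random(n_samples) < p).astype(float)
    total += np.convolve(events, g["waveform"])[:n_samples]

spectrum = np.abs(np.fft.rfft(total)) ** 2 / n_samples   # power spectrum of the noise
freqs = np.fft.rfftfreq(n_samples, 1 / FS)
print("peak injected current: %.1f uA" % (1e6 * total.max()))
print("strongest non-DC bin: %.2f GHz" % (freqs[1 + np.argmax(spectrum[1:])] / 1e9))
```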

Journal ArticleDOI
TL;DR: A matrix-based derivation of the error between the original circuit transfer function and the reduced-order transfer function generated using the PVL technique is presented; this error measure may be used to develop an automated termination of the Lanczos process in the PVL technique and achieve the desired accuracy of the approximate transfer function.
Abstract: Recently, there has been a great deal of interest in using the Pade Via Lanczos (PVL) technique to analyze the transfer functions and impulse responses of large-scale linear circuits. In this paper, a matrix-based derivation of the error between the original circuit transfer function and the reduced-order transfer function generated using the PVL technique is presented. This error measure may be used to develop an automated termination of the Lanczos process in the PVL technique and to achieve the desired accuracy of the approximate transfer function. PVL coupled with such an error bound will be referred to as the PVL-WEB algorithm.

Journal ArticleDOI
Yuejian Wu, S.M.I. Adham
TL;DR: This paper presents a novel BIST fault diagnostic technique for scan-based VLSI devices based on faulty signature information that is applicable to all voltage-detectable faults, and applies naturally to multifrequency BIST.
Abstract: Existing built-in self-test (BIST) diagnostic techniques assume the existence of a few bit errors in a test response sequence. This assumption is unrealistic since in a BIST environment a single defect can usually cause hundreds or thousands of errors in a test response sequence. Without making the above assumption, this paper presents a novel BIST fault diagnostic technique for scan-based VLSI devices. Based on faulty signature information, our scheme guarantees correct identification of the scan flip-flops that capture errors during test, regardless of the number of errors the circuit may produce. In addition, it is able to identify failing test vectors with a better diagnostic capacity than existing techniques. The proposed scheme does not assume any specific fault model. Thus, it is applicable to all voltage-detectable faults. It also applies naturally to multifrequency BIST. This paper analyzes the efficiency of the scheme in terms of diagnostic coverage. Experimental results on several large ISCAS89 benchmark circuits and industrial circuits are also reported.

Journal ArticleDOI
TL;DR: A modeling technique for CMOS gates, based on the reduction of each gate to an equivalent inverter, is presented and can be incorporated in existing timing simulators in order to improve their accuracy.
Abstract: In this paper, a modeling technique for CMOS gates, based on the reduction of each gate to an equivalent inverter, is presented. The proposed method can be incorporated in existing timing simulators in order to improve their accuracy. The conducting and parasitic behavior of parallel and serially connected transistors is accurately analyzed and an equivalent transistor is extracted for each case, taking into account the actual operating conditions of each device in the structure. The proposed model incorporates short-channel effects, the influence of body effect and is developed for nonzero transition time inputs. The exact time point when the gate starts conducting is efficiently calculated improving significantly the accuracy of the method. A mapping algorithm for reducing every possible input pattern of a gate to an equivalent signal is introduced and the "weight" of each transistor position in the gate structure is extracted. Complex gates are treated by first mapping every possible structure to a NAND/NOR gate and then by collapsing this gate to an equivalent inverter. Results are validated by comparisons to SPICE and ILLIADS2 for three submicron technologies.
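
The first-order collapsing rules behind such a reduction are the familiar ones: parallel transistors add their widths, while series stacks combine like parallel resistors. The sketch below applies only these rules; the paper refines them with position weights, body effect, and the actual operating conditions of each device:

```python
def parallel_equiv_width(widths):
    """Parallel transistors conduct together: effective width is the sum."""
    return sum(widths)

def series_equiv_width(widths):
    """Series transistors add resistance: for equal lengths, the effective
    width follows the reciprocal-sum rule (first order only, ignoring body
    effect and the position weights the paper introduces)."""
    return 1.0 / sum(1.0 / w for w in widths)

# 3-input NAND with 3 um NMOS pull-down stack and 2 um PMOS pull-up devices:
# collapse to an equivalent inverter for the all-inputs-switching case.
wn_eq = series_equiv_width([3.0, 3.0, 3.0])     # series NMOS stack -> 1.0 um
wp_eq = parallel_equiv_width([2.0, 2.0, 2.0])   # parallel PMOS -> 6.0 um
print(f"equivalent inverter: Wn = {wn_eq:.2f} um, Wp = {wp_eq:.2f} um")
```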

Journal ArticleDOI
TL;DR: This work presents an analytical strategy for exploring the on-chip memory architecture for a given application, based on a memory performance estimation scheme, and demonstrates that its estimations closely follow the actual simulated performance at significantly reduced run times.
Abstract: Embedded processor-based systems allow for the tailoring of the on-chip memory architecture based on application specific requirements. We present an analytical strategy for exploring the on-chip memory architecture for a given application, based on a memory performance estimation scheme. The analytical technique has the important advantage of enabling a fast evaluation of candidate memory architectures in the early stages of system design. Many digital signal-processing applications involve array accesses and loop nests that can benefit from such an exploration. Our experiments demonstrate that our estimations closely follow the actual simulated performance at significantly reduced run times.

Journal ArticleDOI
TL;DR: This paper proves that using information about (partial) symmetries for the minimization of reduced ordered binary decision diagrams (ROBDD's) leads to improvements in ROBDD sizes of up to 70%.
Abstract: In this paper we study the effect of using information about (partial) symmetries for the minimization of reduced ordered binary decision diagrams (ROBDD's). The influence of symmetries for the integration in dynamic variable ordering is studied for both completely and incompletely specified Boolean functions. The problems above are studied from a theoretical and practical point of view. Statistical results and benchmark results are reported to underline the efficiency of the approach. They prove that our techniques lead to improvements of the ROBDD sizes by up to 70%.

Journal ArticleDOI
TL;DR: To reduce processor time and memory requirements at high drain voltage, a self-consistent option based on a solution of the current continuity equation restricted to a thin slab of the channel is developed.
Abstract: We present a hierarchical approach to the "atomistic" simulation of aggressively scaled sub-0.1-µm MOSFETs. These devices are so small that their characteristics depend on the precise location of dopant atoms within them, not just on their average density. A full-scale three-dimensional drift-diffusion atomistic simulation approach is first described and used to verify more economical, but restricted, options. To reduce processor time and memory requirements at high drain voltage, we have developed a self-consistent option based on a solution of the current continuity equation restricted to a thin slab of the channel. This is coupled to the solution of the Poisson equation in the whole simulation domain in the Gummel iteration cycles. The accuracy of this approach is investigated in comparison to the full self-consistent solution. At low drain voltage, a single solution of the nonlinear Poisson equation is sufficient to extract the current with satisfactory accuracy. In this case, the current is calculated by solving the current continuity equation in a drift approximation only, also in a thin slab containing the MOSFET channel. The regions of applicability for the different components of this hierarchical approach are illustrated in example simulations covering the random dopant-induced threshold voltage fluctuations, threshold voltage lowering, threshold voltage asymmetry, and drain current fluctuations.

Journal ArticleDOI
TL;DR: A novel test methodology that not only substantially reduces the total test pattern number for multiple circuits but also allows a single input data line to support multiple scan chains and provides a low-cost and high-performance method to integrate the boundary scan and scan architectures.
Abstract: Scan designs can alleviate test difficulties of sequential circuits by replacing the memory elements with scannable registers. However, scan operations usually result in long test application time. Most classical methods to solving this problem either perform test compaction to obtain fewer test vectors or use multiple scan chain design to reduce the scan time. For a large system, test vector compaction is a time-consuming process, while multiple scan chains either require extra pin overhead or need the sharing of normal I/O and scan I/O pins. In this paper, we present a novel test methodology that not only substantially reduces the total test pattern number for multiple circuits but also allows a single input data line to support multiple scan chains. Our main idea is to explore the "sharing" property of test patterns among all circuits under test (CUT's). By appropriately connecting the inputs of all CUT's during the automatic test-pattern generation process such that the generated test patterns can be broadcast to all scan chains when the actual testing operation is executed, the above-mentioned problems can be solved effectively. Our method also provides a low-cost and high-performance method to integrate the boundary scan and scan architectures. Experimental results show that 157 test patterns are enough to detect all detectable faults in the ten ISCAS'85 combinational circuits, while 280 are enough for the ten largest ISCAS'89 scan-based sequential circuits.

Journal ArticleDOI
TL;DR: In this paper, the critical area for shorts in a circuit layout is computed in O(n log n) time, where n is the size of the input, and is based on the concept of Voronoi diagrams.
Abstract: In this paper, we present a new approach for computing the critical area for shorts in a circuit layout. The critical area calculation is the main computational problem in very large scale integration yield prediction. The method is based on the concept of Voronoi diagrams and computes the critical area for shorts (for all possible defect radii, assuming square defects) accurately in O(n log n) time, where n is the size of the input. The method is presented for rectilinear layouts and layouts containing edges of slope ±1. As a byproduct, we briefly sketch how to speed up the grid method of Wagner and Koren [1995].
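
For a single pair of parallel wires the quantity being accumulated is simple to state: a square defect of side r causes a short when its center falls within r minus the spacing of the gap, and the expected critical area integrates this against a defect size distribution (commonly taken proportional to 1/r^3). The sketch below computes that per-pair value directly; the paper's contribution is computing it for whole layouts in O(n log n) with Voronoi diagrams:

```python
def critical_area_parallel(spacing, overlap_len, r):
    """Critical area for a short between two parallel wires for a square
    defect of side r: the defect center must fall in a band of width
    (r - spacing) along the overlap (end effects ignored)."""
    return overlap_len * max(0.0, r - spacing)

def average_critical_area(spacing, overlap_len, r0=0.1, r_max=10.0, steps=10000):
    """Integrate A(r) against the common defect size distribution
    D(r) = 2*r0^2 / r^3 for r >= r0 (normalized on [r0, infinity))."""
    dr = (r_max - r0) / steps
    total = 0.0
    for i in range(steps):
        r = r0 + (i + 0.5) * dr
        density = 2.0 * r0 ** 2 / r ** 3
        total += critical_area_parallel(spacing, overlap_len, r) * density * dr
    return total

# Two 100-um parallel segments at 0.5-um spacing (units: um, um^2).
print("avg critical area: %.3f um^2" % average_critical_area(0.5, 100.0))
```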

Journal ArticleDOI
TL;DR: In this paper, the authors present a communication estimation model and show, by the use of this model, the importance of integrating communication protocol selection with hardware/software partitioning, which is illustrated by a number of design space exploration experiments performed within the LYCOS cosynthesis system, using models of the PCI and USB protocols.
Abstract: This paper explores the problem of determining the characteristics of the communication links in a computer system as well as determining the best functional partitioning. In particular, we present a communication estimation model and show, by the use of this model, the importance of integrating communication protocol selection with hardware/software partitioning. The communication estimation model allows for fast estimation but is still sufficiently detailed as to allow the designer or design tool to efficiently explore tradeoffs between throughputs, bus widths, burst/nonburst transfers, operating frequencies of system components such as buses, CPU's, ASIC's, software code size, hardware area, and component prices. A distinct feature of the model is the modeling of driver processing of data (packing, splitting, compression, etc.) and its impact on communication throughput. The integration of communication protocol selection and communication driver design with hardware/software partitioning is illustrated by a number of design space exploration experiments carried out within the LYCOS cosynthesis system, using models of the PCI and USB protocols.
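
A drastically simplified transfer-time estimate in the spirit of such a model (the parameters and the formula are illustrative, not the LYCOS model or real PCI/USB timing) might look like this:

```python
from math import ceil

def transfer_time(payload_bytes, bus_width_bits, bus_mhz,
                  burst_words=8, setup_cycles=4, pack_cycles_per_word=1):
    """Rough channel estimate in the spirit of the paper's model (not its
    actual equations): per-burst setup overhead plus per-word transfer and
    driver packing cost."""
    word_bytes = bus_width_bits // 8
    words = ceil(payload_bytes / word_bytes)
    bursts = ceil(words / burst_words)
    cycles = bursts * setup_cycles + words * (1 + pack_cycles_per_word)
    return cycles / (bus_mhz * 1e6)

# Compare a 32-bit 33-MHz PCI-like link with an 8-bit 12-MHz USB-like link
# for a 4-KiB block (purely illustrative parameter values).
for name, width, mhz in (("PCI-like", 32, 33.0), ("USB-like", 8, 12.0)):
    t = transfer_time(4096, width, mhz)
    print(f"{name}: {t*1e6:.1f} us  ({4096/t/1e6:.1f} MB/s)")
```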

Journal ArticleDOI
TL;DR: An algorithm is developed, targeted to the decompression hardware embedded in the Xilinx XC6200 series field-programmable gate array architecture, that can radically reduce the amount of data that must be transferred during reconfiguration.
Abstract: One of the major overheads in reconfigurable computing is the time it takes to reconfigure the devices in the system. This overhead limits the speedups possible in this exciting new paradigm. In this paper we explore one technique for reducing this overhead: the compression of configuration datastreams. We develop an algorithm, targeted to the decompression hardware embedded in the Xilinx XC6200 series field-programmable gate array architecture, that can radically reduce the amount of data that must be transferred during reconfiguration. This results in an overall reduction of about a factor of four in total bandwidth required for reconfiguration.
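
The redundancy such a scheme exploits is visible even with a generic stand-in: consecutive configuration writes that carry the same cell data collapse into runs, much as the XC6200 wildcard registers let one write configure many matching addresses. The sketch below is that stand-in, not the paper's wildcard-based algorithm:

```python
def rle_compress(writes):
    """Collapse consecutive configuration writes that carry identical data
    into (data, count) runs -- a generic stand-in for the XC6200 wildcard
    mechanism, which matches sets of addresses in one write."""
    runs = []
    for _, data in writes:
        if runs and runs[-1][0] == data:
            runs[-1][1] += 1
        else:
            runs.append([data, 1])
    return runs

# Synthetic configuration stream: many identical cell configurations in a row,
# as is typical for regular datapaths (values invented for illustration).
writes = ([(addr, 0x2A) for addr in range(0, 64)]
          + [(addr, 0x15) for addr in range(64, 80)]
          + [(addr, 0x2A) for addr in range(80, 96)])
runs = rle_compress(writes)
print(f"{len(writes)} writes -> {len(runs)} runs "
      f"({len(writes) / len(runs):.0f}x fewer transfers)")
```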

Journal ArticleDOI
TL;DR: A formula-specific method for implementing Boolean satisfiability solver circuits in configurable hardware using a template generator, which realizes a large amount of fine-grained parallelism and has broad applications in the very large scale integration CAD area.
Abstract: The issues of software compute time and complexity are very important in current computer-aided design (CAD) tools. As field-programmable gate array (FPGA) speeds and densities increase, the opportunity for effective hardware accelerators built from FPGA technology has opened up. This paper describes and evaluates a formula-specific method for implementing Boolean satisfiability solver circuits in configurable hardware. That is, using a template generator, we create circuits specific to the problem instance to be solved. This approach yields impressive runtime speedups of up to several hundred times compared to the software approaches. The high performance comes from realizing fine-grained parallelism inherent in the clause evaluation and implication and from direct mapping of Boolean relations into logic gates. Our implementation uses a commercially available hardware system for proof of concept. This system yields more than 100 times run-time speedup on many problems, even though the clock rate of the hardware is 100 times slower than that of the workstation running the software solver. While the time to compile the solver circuit to configurable hardware can be quite long on current platforms (20-40 min per chip), this paper discusses new approaches to overcome this compilation overhead. More broadly, we view this work as a case study in the burgeoning domain of high performance configurable computing. Our approach realizes a large amount of fine-grained parallelism and has broad applications in the very large scale integration CAD area.
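
The "direct mapping of Boolean relations into logic gates" is the easiest part to sketch: each CNF clause becomes an OR of possibly inverted variable wires and the clause outputs are ANDed. The generator below emits such an evaluator as Verilog text for a small formula; the real template generator also builds the implication and backtracking control logic, which is not modeled here:

```python
def cnf_to_verilog(clauses, n_vars, name="sat_eval"):
    """Emit a combinational evaluator for a CNF formula: one OR per clause,
    one AND across clauses.  A sketch of the formula-specific-circuit idea
    only -- not the paper's full solver circuit."""
    lines = [f"module {name}(input [{n_vars - 1}:0] x, output sat);"]
    for i, clause in enumerate(clauses):
        lits = [f"{'~' if lit < 0 else ''}x[{abs(lit) - 1}]" for lit in clause]
        lines.append(f"  wire c{i} = " + " | ".join(lits) + ";")
    lines.append("  assign sat = "
                 + " & ".join(f"c{i}" for i in range(len(clauses))) + ";")
    lines.append("endmodule")
    return "\n".join(lines)

# (x1 | ~x3) & (x2 | x3 | ~x1) in DIMACS-style literal numbering.
print(cnf_to_verilog([[1, -3], [2, 3, -1]], n_vars=3))
```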