
Showing papers in "IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems in 2006"


Journal ArticleDOI
TL;DR: Efficient quantum-logic circuits that perform two tasks are discussed: 1) implementing generic quantum computations and 2) initializing quantum registers; the proposed circuits are asymptotically optimal for their respective tasks.
Abstract: The pressure of fundamental limits on classical computation and the promise of exponential speedups from quantum effects have recently brought quantum circuits (Proc. R. Soc. Lond. A, Math. Phys. Sci., vol. 425, p. 73, 1989) to the attention of the electronic design automation community (Proc. 40th ACM/IEEE Design Automation Conf., 2003), (Phys. Rev. A, At. Mol. Opt. Phy., vol. 68, p. 012318, 2003), (Proc. 41st Design Automation Conf., 2004), (Proc. 39th Design Automation Conf., 2002), (Proc. Design, Automation, and Test Eur., 2004), (Phys. Rev. A, At. Mol. Opt. Phy., vol. 69, p. 062321, 2004), (IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 22, p. 710, 2003). Efficient quantum-logic circuits that perform two tasks are discussed: 1) implementing generic quantum computations, and 2) initializing quantum registers. In contrast to conventional computing, the latter task is nontrivial because the state space of an n-qubit register is not finite and contains exponential superpositions of classical bitstrings. The proposed circuits are asymptotically optimal for their respective tasks and improve earlier published results by at least a factor of 2. The circuits for generic quantum computation constructed by the algorithms are the most efficient known today in terms of the number of most expensive gates [quantum controlled-NOTs (CNOTs)]. They are based on an analog of the Shannon decomposition of Boolean functions and a new circuit block, called quantum multiplexor (QMUX), which generalizes several known constructions. A theoretical lower bound implies that the circuits cannot be improved by more than a factor of 2. It is additionally shown how to accommodate the severe architectural limitation of using only nearest neighbor gates, which is representative of current implementation technologies. This increases the number of gates by almost an order of magnitude, but preserves the asymptotic optimality of gate counts.
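
The quantum multiplexor (QMUX) mentioned above applies one of two operators to the target qubits depending on the state of a control qubit; as a matrix it is simply block diagonal, and the quantum Shannon decomposition recursively splits an arbitrary unitary into such multiplexors and multiplexed rotations. A minimal numpy sketch of the QMUX structure (an illustration only, not the paper's synthesis algorithm; function names are ours):

```python
import numpy as np

def qmux(u0, u1):
    """Quantum multiplexor: apply u0 when the control qubit is |0>,
    u1 when it is |1>.  As a matrix this is block diagonal."""
    n = u0.shape[0]
    m = np.zeros((2 * n, 2 * n), dtype=complex)
    m[:n, :n] = u0
    m[n:, n:] = u1
    return m

def rz(theta):
    """Single-qubit z-rotation."""
    return np.array([[np.exp(-1j * theta / 2), 0],
                     [0, np.exp(1j * theta / 2)]])

# A multiplexed (uniformly controlled) R_z acting on two qubits.
U = qmux(rz(0.3), rz(1.1))
assert np.allclose(U.conj().T @ U, np.eye(4))   # still unitary
```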

545 citations


Journal ArticleDOI
TL;DR: A canonical first-order delay model that takes into account both correlated and independent randomness is proposed, and the first incremental statistical timer in the literature is reported, suitable for use in the inner loop of physical synthesis or other optimization programs.
Abstract: Variability in digital integrated circuits makes timing verification an extremely challenging task. In this paper, a canonical first-order delay model that takes into account both correlated and independent randomness is proposed. A novel linear-time block-based statistical timing algorithm is employed to propagate timing quantities like arrival times and required arrival times through the timing graph in this canonical form. At the end of the statistical timing, the sensitivity of all timing quantities to each of the sources of variation is available. Excessive sensitivities can then be targeted by manual or automatic optimization methods to improve the robustness of the design. This paper also reports the first incremental statistical timer in the literature, which is suitable for use in the inner loop of physical synthesis or other optimization programs. The third novel contribution of this paper is the computation of local and global criticality probabilities. For a very small cost in computer time, the probability of each edge or node of the timing graph being critical is computed. Numerical results are presented on industrial application-specific integrated circuit (ASIC) chips with over two million logic gates, and statistical timing results are compared to exhaustive corner analysis on a chip design whose hardware showed early mode timing violations.
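
The canonical first-order form expresses every timing quantity as a nominal value plus sensitivities to shared global variation sources plus an independent random term. A hedged sketch of propagating such forms through a timing graph follows; the statistical max uses the standard Clark/tightness-probability approximation, which is in the spirit of block-based SSTA but not necessarily the paper's exact formulation (class and function names are ours):

```python
import math
from dataclasses import dataclass
from typing import List

def _phi(x):   # standard normal pdf
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def _Phi(x):   # standard normal cdf
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

@dataclass
class Canonical:
    """a0 + sum_i a[i]*dX_i + r*dR, with dX_i, dR ~ N(0, 1)."""
    a0: float
    a: List[float]
    r: float = 0.0

    @property
    def sigma(self):
        return math.sqrt(sum(s * s for s in self.a) + self.r * self.r)

    def add(self, other):
        return Canonical(self.a0 + other.a0,
                         [x + y for x, y in zip(self.a, other.a)],
                         math.hypot(self.r, other.r))

def stat_max(p, q):
    """Clark-style approximation of max(p, q), re-expressed in canonical form."""
    cov = sum(x * y for x, y in zip(p.a, q.a))       # shared global sources
    theta = math.sqrt(max(p.sigma**2 + q.sigma**2 - 2.0 * cov, 1e-12))
    x = (p.a0 - q.a0) / theta
    T = _Phi(x)                                      # tightness probability
    mean = p.a0 * T + q.a0 * (1 - T) + theta * _phi(x)
    second = ((p.a0**2 + p.sigma**2) * T + (q.a0**2 + q.sigma**2) * (1 - T)
              + (p.a0 + q.a0) * theta * _phi(x))
    var = max(second - mean * mean, 0.0)
    sens = [T * u + (1 - T) * v for u, v in zip(p.a, q.a)]
    r2 = max(var - sum(s * s for s in sens), 0.0)    # residual goes to the independent term
    return Canonical(mean, sens, math.sqrt(r2))

# Arrival-time propagation through one gate: AT_out = max(AT_a, AT_b) + delay.
a = Canonical(100.0, [5.0, 2.0], 3.0)
b = Canonical(102.0, [1.0, 4.0], 2.0)
d = Canonical(10.0,  [0.5, 0.5], 1.0)
print(stat_max(a, b).add(d))
```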

416 citations


Journal ArticleDOI
TL;DR: The algorithm uses the positive-polarity Reed-Muller expansion of a reversible function to synthesize the function as a network of Toffoli gates, and is able to quickly synthesize all four-variable and most five-variable reversible functions that were in the test suite.
Abstract: Reversible logic finds many applications, especially in the area of quantum computing. A completely specified n-input, n-output Boolean function is called reversible if it maps each input assignment to a unique output assignment and vice versa. Logic synthesis for reversible functions differs substantially from traditional logic synthesis and is currently an active area of research. The authors present an algorithm and tool for the synthesis of reversible functions. The algorithm uses the positive-polarity Reed-Muller expansion of a reversible function to synthesize the function as a network of Toffoli gates. At each stage, candidate factors, which represent subexpressions common between the Reed-Muller expansions of multiple outputs, are explored in the order of their attractiveness. The algorithm utilizes a priority-based search tree, and heuristics are used to rapidly prune the search space. The synthesis algorithm currently targets the generalized n-bit Toffoli gate library. However, other algorithms exist that can convert an n-bit Toffoli gate into a cascade of smaller Toffoli gates. Experimental results indicate that the authors' algorithm quickly synthesizes circuits when tested on the set of all reversible functions of three variables. Furthermore, it is able to quickly synthesize all four-variable and most five-variable reversible functions that were in the test suite. The authors also present results for some benchmark functions widely discussed in the literature and some new benchmarks that the authors have developed. The algorithm is shown to synthesize many, but not all, randomly generated reversible functions of as many as 16 variables with a maximum gate count of 25.
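
The positive-polarity Reed-Muller (PPRM) expansion that the synthesis starts from writes a function as an XOR of positive-polarity product terms; it can be computed from a truth table with a GF(2) butterfly transform. A small sketch of that expansion only (the factor search and Toffoli-network construction are not reproduced):

```python
def pprm(truth):
    """Positive-polarity Reed-Muller coefficients of an n-variable Boolean
    function given as a truth table of length 2**n (index = input bits).
    coeff[m] == 1 means the product of the variables in bitmask m appears."""
    coeff = list(truth)
    n = len(coeff).bit_length() - 1
    for i in range(n):                      # GF(2) "butterfly" transform
        step = 1 << i
        for j in range(len(coeff)):
            if j & step:
                coeff[j] ^= coeff[j ^ step]
    return coeff

# Example: f(a, b, c) = a XOR (b AND c); index bit 0 = a, bit 1 = b, bit 2 = c.
tt = [(j & 1) ^ ((j >> 1) & (j >> 2) & 1) for j in range(8)]
print(pprm(tt))   # nonzero at masks 0b001 (term a) and 0b110 (term b*c)
```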

377 citations


Journal ArticleDOI
TL;DR: A gate-level radiation hardening technique for cost-effective reduction of the soft error failure rate in combinational logic circuits is described, which uses a novel gate (transistor) sizing technique that is both efficient and accurate.
Abstract: A gate-level radiation hardening technique for cost-effective reduction of the soft error failure rate in combinational logic circuits is described. The key idea is to exploit the asymmetric logical masking probabilities of gates, hardening gates that have the lowest logical masking probability to achieve cost-effective tradeoffs between overhead and soft error failure rate reduction. The asymmetry in the logical masking probabilities at a gate is leveraged by decoupling the physical from the logical (Boolean) aspects of soft error susceptibility of the gate. Gates are hardened to single-event upsets (SEUs) with specified worst case characteristics in increasing order of their logical masking probability, thereby maximizing the reduction in the soft error failure rate for specified overhead costs (area, power, and delay). Gate sizing for radiation hardening uses a novel gate (transistor) sizing technique that is both efficient and accurate. A full set of experimental results for process technologies ranging from 180 to 70 nm demonstrates the cost-effective tradeoffs that can be achieved. On average, the proposed technique has a radiation hardening overhead of 38.3%, 27.1%, and 3.8% in area, power, and delay for worst case SEUs across the four process technologies.
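
A hedged sketch of the ordering idea: estimate each gate's logical masking probability (here by random-vector fault injection on a toy netlist, not the paper's decoupled physical/logical analysis) and harden gates in increasing order of that probability until an assumed area budget is exhausted:

```python
import random

# Toy netlist: each gate = (output, function, inputs); evaluated in order.
NETLIST = [
    ("n1", "and", ["a", "b"]),
    ("n2", "or",  ["n1", "c"]),
    ("y",  "and", ["n2", "d"]),
]
PRIMARY_INPUTS = ["a", "b", "c", "d"]
PRIMARY_OUTPUTS = ["y"]
OPS = {"and": lambda v: all(v), "or": lambda v: any(v), "xor": lambda v: sum(v) % 2}

def simulate(vec, flipped=None):
    vals = dict(vec)
    for out, op, ins in NETLIST:
        v = int(OPS[op]([vals[i] for i in ins]))
        vals[out] = v ^ 1 if out == flipped else v    # inject an SEU at one gate
    return tuple(vals[o] for o in PRIMARY_OUTPUTS)

def masking_probability(gate_out, trials=2000, rng=random.Random(0)):
    """Fraction of random vectors for which an upset at this gate's output
    is logically masked (no primary output changes)."""
    masked = 0
    for _ in range(trials):
        vec = {i: rng.randint(0, 1) for i in PRIMARY_INPUTS}
        masked += simulate(vec) == simulate(vec, flipped=gate_out)
    return masked / trials

# Harden the gates with the LOWEST masking probability first, within a budget.
AREA_BUDGET, HARDENING_COST = 2.0, 1.0    # assumed per-gate upsizing cost
ranked = sorted((g[0] for g in NETLIST), key=masking_probability)
hardened, spent = [], 0.0
for g in ranked:
    if spent + HARDENING_COST > AREA_BUDGET:
        break
    hardened.append(g)
    spent += HARDENING_COST
print("harden:", hardened)
```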

332 citations


Journal ArticleDOI
TL;DR: This paper constitutes the first successful application of formal methods and satisfiability to quantum logic synthesis, making it possible in principle to synthesize arbitrary multi-output Boolean functions with a quantum gate library.
Abstract: This paper proposes an approach to optimally synthesize quantum circuits by symbolic reachability analysis, where the primary inputs and outputs are basis binary and the internal signals can be nonbinary in a multiple-valued domain. The authors present an optimal synthesis method to minimize quantum cost and some speedup methods with nonoptimal quantum cost. The methods here are applicable to small reversible functions. Unlike previous works that use permutative reversible gates, a lower level library that includes nonpermutative quantum gates is used here. The proposed approach obtains the minimum cost quantum circuits for the Miller gate, half adder, and full adder, which are better than previous results. This cost is minimum for any circuit using the set of quantum gates in this paper, where the control qubit of 2-qubit gates is always basis binary. In addition, the minimum quantum cost in the same manner for the Fredkin, Peres, and Toffoli gates is proven. The method can also find the best conversion from an irreversible function to a reversible circuit as a byproduct of the generality of its formulation, thus synthesizing in principle arbitrary multi-output Boolean functions with a quantum gate library. This paper constitutes the first successful application of formal methods and satisfiability to quantum logic synthesis.

254 citations


Journal ArticleDOI
TL;DR: The proposed top-down design-automation approach is expected to relieve biochip users from the burden of manual optimization of bioassays, time-consuming hardware design, and costly testing and maintenance procedures, and it will facilitate the integration of fluidic components with microelectronic components in next-generation systems-on-chip (SOCs).
Abstract: Microfluidics-based biochips are soon expected to revolutionize clinical diagnosis, deoxyribonucleic acid (DNA) sequencing, and other laboratory procedures involving molecular biology. In contrast to continuous-flow systems that rely on permanently etched microchannels, micropumps, and microvalves, digital microfluidics offers a scalable system architecture and dynamic reconfigurability; groups of unit cells in a microfluidics array can be reconfigured to change their functionality during the concurrent execution of a set of bioassays. As more bioassays are executed concurrently on a biochip, system integration and design complexity are expected to increase dramatically. This paper presents an overview of an integrated system-level design methodology that attempts to address key issues in the synthesis, testing, and reconfiguration of digital microfluidics-based biochips. Different actuation mechanisms for microfluidics-based biochips, and associated design-automation trends and challenges, are also discussed. The proposed top-down design-automation approach is expected to relieve biochip users from the burden of manual optimization of bioassays, time-consuming hardware design, and costly testing and maintenance procedures, and it will facilitate the integration of fluidic components with microelectronic components in next-generation systems-on-chip (SOCs).

253 citations


Journal ArticleDOI
TL;DR: The authors used a hardware implementation of the Advanced Encryption Standard (AES) to show that the traditional scan DFT scheme can compromise the secret key, and showed that by using secure-scan DFT, neither the secret key nor the testability of the AES implementation is compromised.
Abstract: Scan-based design for test (DFT) is a powerful testing scheme, but it can be used to retrieve the secrets stored in a crypto chip, thus compromising its security. On one hand, sacrificing the security for testability by using a traditional scan-based DFT restricts its use in privacy sensitive applications. On the other hand, sacrificing the testability for security by abandoning the scan-based DFT hurts the product quality. The security of a crypto chip comes from the small secret key stored in a few registers, and the testability of a crypto chip comes from the data path and control path implementing the crypto algorithm. Based on this key observation, the authors propose a novel scan DFT architecture called secure scan that maintains the high test quality of traditional scan DFT without compromising the security. They used a hardware implementation of the Advanced Encryption Standard (AES) to show that the traditional scan DFT scheme can compromise the secret key. They then showed that by using secure-scan DFT, neither the secret key nor the testability of the AES implementation is compromised.

231 citations


Journal ArticleDOI
TL;DR: An automated static approach for optimizing bit widths of fixed-point feedforward designs with guaranteed accuracy, called MiniBit, is presented and is demonstrated with polynomial approximation, RGB-to-YCbCr conversion, matrix multiplication, B-splines, and discrete cosine transform placed and routed on a Xilinx Virtex-4 FPGA.
Abstract: An automated static approach for optimizing bit widths of fixed-point feedforward designs with guaranteed accuracy, called MiniBit, is presented. Methods to minimize both the integer and fraction parts of fixed-point signals with the aim of minimizing the circuit area are described. For range analysis, the technique in this paper identifies the number of integer bits necessary to meet range requirements. For precision analysis, a semianalytical approach with analytical error models in conjunction with adaptive simulated annealing is employed to optimize the number of fraction bits. The analytical models make it possible to guarantee overflow/underflow protection and numerical accuracy for all inputs over the user-specified input intervals. Using a stream compiler for field-programmable gate arrays (FPGAs), the approach in this paper is demonstrated with polynomial approximation, RGB-to-YCbCr conversion, matrix multiplication, B-splines, and discrete cosine transform placed and routed on a Xilinx Virtex-4 FPGA. Improvements for a given design reduce the area and the latency by up to 26% and 12%, respectively, over a design using optimum uniform fraction bit widths. Studies show that MiniBit-optimized designs are within 1% of the area produced from the integer linear programming approach.
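
A hedged sketch of the two halves of the bit-width problem on a toy expression y = a*b + c: range analysis picks the integer bits from the interval bounds, and a first-order additive truncation-error bound drives the fraction-bit search. MiniBit optimizes per-signal widths with adaptive simulated annealing; the uniform exhaustive search below is only an illustration, and all names and constants are ours:

```python
import math

def integer_bits(lo, hi, signed=True):
    """Smallest number of integer bits that covers the range [lo, hi]
    (two's complement if signed) -- the 'range analysis' half."""
    mag = max(abs(lo), abs(hi))
    return max(1, math.ceil(math.log2(mag + 1))) + (1 if signed else 0)

def min_fraction_bits(error_budget, ranges):
    """Smallest uniform fraction bit-width fb such that a first-order
    additive truncation-error bound for y = a*b + c stays within budget."""
    a_max = max(abs(x) for x in ranges["a"])
    b_max = max(abs(x) for x in ranges["b"])
    for fb in range(1, 64):
        eps = 2.0 ** -fb                        # max quantization error per signal
        worst = b_max * eps + a_max * eps + eps # |d(a*b + c)| to first order
        if worst <= error_budget:
            return fb
    raise ValueError("budget not reachable with 64 fraction bits")

ranges = {"a": (-3.2, 5.0), "b": (-1.0, 1.0), "c": (-8.0, 8.0)}
print(integer_bits(*ranges["a"]))            # integer part of signal 'a'
print(min_fraction_bits(1e-4, ranges))       # uniform fraction bits for y
```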

226 citations


Journal ArticleDOI
TL;DR: In this paper, a set of algorithms is proposed to find the best instruction set extensions (ISEs) for a given application, based on a detailed analysis of the application code.
Abstract: In embedded computing, cost, power, and performance constraints call for the design of specialized processors, rather than for the use of the existing off-the-shelf solutions. While the design of these application-specific CPUs could be tackled from scratch, a cheaper and more effective option is that of extending the existing processors and toolchains. Extensibility is indeed a feature now offered in real designs, e.g., by processors such as Tensilica Xtensa [T. R. Halfhill, Microprocess Rep., 2003], ARC ARCtangent [T. R. Halfhill, Microprocess Rep., 2000], STMicroelectronics ST200 [P. Faraboschi, G. Brown, J. A. Fisher, G. Desoli, and F. Homewood, Proc. 27th Annu. Int. Symp. Computer Architecture, 2000, p. 203], and MIPS CorExtend [T. R. Halfhill, Microprocess Rep., 2003]. While all these processors provide development environments with simulation capabilities for efficiently evaluating hand-crafted solutions, the tools to automatically identify the best processor configuration for a given application are less common. In particular, solutions to choose specialized instruction-set extensions (ISEs) have been investigated in the past years but are still seldom part of commercial toolchains. This paper provides a formal methodology and a set of algorithms that help address the problem. It proposes exact algorithms to derive optimal ISEs; exact identification of a single ISE is applicable to basic blocks of up to 1500 assembler-like instructions. This paper also introduces approximate methods that can process basic blocks of larger size. Results show that the described algorithms find solutions close to those that a designer would obtain by a detailed study of the application code. Both heuristic and exact algorithms find ISEs able to speed up unextended processors by up to 5.0x. State-of-the-art comparisons show that the presented algorithms outperform existing ones by up to 2.6x.

212 citations


Journal ArticleDOI
TL;DR: A novel system-level buffer planning algorithm that can be used to customize the router design in networks-on-chip (NoCs) is presented, which automatically assigns the buffer depth for each input channel, in different routers across the chip, such that the overall performance is maximized.
Abstract: In this paper, a novel system-level buffer planning algorithm that can be used to customize the router design in networks-on-chip (NoCs) is presented. More precisely, given the traffic characteristics of the target application and the total budget of the available buffering space, the proposed algorithm automatically assigns the buffer depth for each input channel, in different routers across the chip, such that the overall performance is maximized. This is in deep contrast with the uniform assignment of buffering resources (currently used in NoC design), which can significantly degrade the overall system performance. Indeed, the experimental results show that while the proposed algorithm is very fast, significant performance improvements can be achieved compared to the uniform buffer allocation. For instance, for a complex audio/video application, about 80% savings in buffering resources can be achieved by smart buffer allocation using the proposed algorithm.
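
A hedged sketch of budgeted buffer-depth assignment: repeatedly grant one more slot to the input channel with the largest estimated marginal gain. The gain function below is a simple queueing-flavored placeholder driven by each channel's traffic rate, not the paper's analytical performance model:

```python
import heapq

def allocate_buffers(channels, total_budget):
    """Greedy buffer-depth assignment: give one more buffer slot at a time
    to the channel with the largest estimated marginal gain.
    `channels` maps channel name -> normalized arrival rate in (0, 1)."""
    def marginal_gain(rate, depth):
        # Placeholder: estimated blocking reduction when depth -> depth + 1
        # (geometric tail of a heavily used queue).
        return rate ** (depth + 1) - rate ** (depth + 2)

    depth = {c: 1 for c in channels}               # every channel needs >= 1 slot
    remaining = total_budget - len(channels)
    heap = [(-marginal_gain(r, 1), c) for c, r in channels.items()]
    heapq.heapify(heap)
    for _ in range(remaining):
        _, c = heapq.heappop(heap)
        depth[c] += 1
        heapq.heappush(heap, (-marginal_gain(channels[c], depth[c]), c))
    return depth

traffic = {"r0_in_N": 0.7, "r0_in_E": 0.3, "r1_in_W": 0.9, "r1_in_S": 0.1}
print(allocate_buffers(traffic, total_budget=12))
```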

205 citations


Journal ArticleDOI
TL;DR: This paper proves the feasibility and effectiveness of the proposed approach to desynchronization by showing its application to a set of real designs, including a complete implementation of the DLX microprocessor architecture.
Abstract: Asynchronous implementation techniques, which measure logic delays at runtime and activate registers accordingly, are inherently more robust than their synchronous counterparts, which estimate worst case delays at design time and constrain the clock cycle accordingly. Desynchronization is a new paradigm to automate the design of asynchronous circuits from synchronous specifications, thus, permitting widespread adoption of asynchronicity without requiring special design skills or tools. In this paper, different protocols for desynchronization are first studied, and their correctness is formally proven using techniques originally developed for distributed deployment of synchronous language specifications. A taxonomy of existing protocols for asynchronous latch controllers, covering, in particular, the four-phase handshake protocols devised in the literature for micropipelines, is also provided. A new controller that exhibits provably maximal concurrency is then proposed, and the performance of desynchronized circuits is analyzed with respect to the original synchronous optimized implementation. Finally, this paper proves the feasibility and effectiveness of the proposed approach by showing its application to a set of real designs, including a complete implementation of the DLX microprocessor architecture

Journal ArticleDOI
TL;DR: This paper presents an efficient circuit-compatible RLC model for metallic SWCNTs, and analyzes the impact of SWC NTs on the performance of ultrascaled digital very large scale integration (VLSI) design.
Abstract: Semiconducting carbon nanotubes (CNTs) have gained immense popularity as possible successors to silicon as the channel material for ultrahigh-performance field-effect transistors (FETs). On the other hand, their metallic counterparts have often been regarded as ideal interconnects for future technology generations. Owing to their high current densities and increased reliability, metallic single-walled CNTs (SWCNTs) have been subjects of fundamental research, both in theory, as well as experiments. Metallic CNTs have been modeled for radio-frequency (RF) applications using a transmission-line model. In this paper, we present an efficient circuit-compatible RLC model for metallic SWCNTs, and analyze the impact of SWCNTs on the performance of ultrascaled digital very large scale integration (VLSI) design.
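
In the transmission-line treatment of a metallic SWCNT (Burke-style), the per-unit-length elements include a kinetic inductance and a quantum capacitance for each of the nanotube's four conducting channels, on top of the geometry-dependent magnetic inductance and electrostatic capacitance. A sketch computing the commonly quoted per-unit-length values (standard textbook expressions and constants, not numbers taken from this paper):

```python
import math

H = 6.62607015e-34        # Planck constant, J*s
E = 1.602176634e-19       # electron charge, C
V_F = 8.0e5               # Fermi velocity in a CNT, m/s (commonly used value)
CHANNELS = 4              # 2 subbands x 2 spins in a metallic SWCNT

l_kinetic_per_channel = H / (2 * E**2 * V_F)        # H/m per channel
c_quantum_per_channel = 2 * E**2 / (H * V_F)        # F/m per channel

l_k = l_kinetic_per_channel / CHANNELS              # channels act in parallel
c_q = c_quantum_per_channel * CHANNELS

print(f"L_K ~ {l_k * 1e3:.2f} nH/um")               # roughly 4 nH/um
print(f"C_Q ~ {c_q * 1e12:.0f} aF/um")              # roughly 390 aF/um
```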

Journal ArticleDOI
TL;DR: The proposed model can be applied to investigate the impact of various physical-design optimizations on total RLC coupled noise; results indicate that common (capacitive) noise-avoidance techniques can behave quite differently when both capacitive and inductive coupling are considered together.
Abstract: At current operating frequencies, inductive-coupling effects can be significant and should be included for accurate crosstalk-noise analysis. In this paper, an analytical framework to model crosstalk noise in coupled RLC interconnects is presented. The proposed model is based on transmission-line theory and captures high-frequency effects in on-chip interconnects. The new model is generic in nature and can be applied to asymmetric driver-and-line configurations for aggressor and victim wires. The model is compared against SPICE simulations and is shown to capture both the waveform shape and peak noise accurately. Over a large set of random test cases, the average error in noise-peak estimation is approximately 6.5%. A key feature of the new model is that its derivation and form enables physical insight into the total coupling-noise-waveform shape and its dependence on relevant physical-design parameters. Due to its simplicity and physical nature, the proposed model can be applied to investigate the impact of various physical-design optimizations (e.g., wire sizing and spacing, shield insertion) on total RLC coupled noise. The effectiveness of various existing noise-reduction techniques in the presence of mutual-inductance coupling is studied here. The obtained results indicate that common (capacitive) noise-avoidance techniques can behave quite differently when both capacitive and inductive coupling are considered together.

Journal ArticleDOI
TL;DR: A polynomial-time algorithm for coordinating droplet movement under such hardware limitations is developed and described, and a layout-based system that can be rapidly reconfigured for new biochemical analyses is introduced.
Abstract: This paper describes a computational approach to designing a digital microfluidic system (DMFS) that can be rapidly reconfigured for new biochemical analyses. Such a “lab-on-a-chip” system for biochemical analysis, based on electrowetting or dielectrophoresis, must coordinate the motions of discrete droplets or biological cells using a planar array of electrodes. The authors have earlier introduced a layout-based system and demonstrated its flexibility through simulation, including the system's ability to perform multiple assays simultaneously. Since array-layout design and droplet-routing strategies are closely related in such a DMFS, their goal is to provide designers with algorithms that enable rapid simulation and control of these DMFS devices. In this paper, the effects of variations in the basic array-layout design, droplet-routing control algorithms, and droplet spacing on system performance are characterized. DMFS arrays with hardware limited row-column addressing are considered, and a polynomial-time algorithm for coordinating droplet movement under such hardware limitations is developed. To demonstrate the capabilities of our system, we describe example scenarios, including dilution control and minimalist layouts, in which our system can be successfully applied.

Journal ArticleDOI
TL;DR: Measurement-based experimental results have demonstrated that the secure digital design flow is a functional technique to thwart side-channel power analysis, and successfully protects a prototype Advanced Encryption Standard (AES) IC fabricated in a 0.18-μm CMOS process.
Abstract: Small embedded integrated circuits (ICs) such as smart cards are vulnerable to the so-called side-channel attacks (SCAs). The attacker can gain information by monitoring the power consumption, execution time, electromagnetic radiation, and other information leaked by the switching behavior of digital complementary metal-oxide-semiconductor (CMOS) gates. This paper presents a digital very large scale integrated (VLSI) design flow to create secure power-analysis-attack-resistant ICs. The design flow starts from a normal design in a hardware description language such as very-high-speed integrated circuit (VHSIC) hardware description language (VHDL) or Verilog and provides a direct path to an SCA-resistant layout. Instead of a full custom layout or an iterative design process with extensive simulations, a few key modifications are incorporated in a regular synchronous CMOS standard cell design flow. The basis for power analysis attack resistance is discussed. This paper describes how to adjust the library databases such that the regular single-ended static CMOS standard cells implement a dynamic and differential logic style and such that 20 000+ differential nets can be routed in parallel. This paper also explains how to modify the constraints and rules files for the synthesis, place, and differential route procedures. Measurement-based experimental results have demonstrated that the secure digital design flow is a functional technique to thwart side-channel power analysis. It successfully protects a prototype Advanced Encryption Standard (AES) IC fabricated in a 0.18-μm CMOS process.

Journal ArticleDOI
TL;DR: Thermal vias are assigned to specific areas of a 3-D IC and used to adjust their effective-thermal conductivities, and the method efficiently achieves its thermal objective while minimizing the thermal-via utilization.
Abstract: As thermal problems become more evident, new physical design paradigms and tools are needed to alleviate them. Incorporating thermal vias into integrated circuits (ICs) is a promising way of mitigating thermal issues by lowering the effective-thermal resistance of the chip. However, thermal vias take up valuable routing space, and therefore, algorithms are needed to minimize their usage while placing them in areas where they would make the greatest impact. With the developing technology of three-dimensional integrated circuits (3-D ICs), thermal problems are expected to be more prominent, and thermal vias can have a larger impact on them than in traditional two-dimensional integrated circuits (2-D ICs). In this paper, thermal vias are assigned to specific areas of a 3-D IC and used to adjust their effective-thermal conductivities. The method, which uses finite-element analysis (FEA) to calculate temperatures quickly during each iteration, makes iterative adjustments to these thermal conductivities in order to achieve a desired thermal objective and is general enough to handle a number of different thermal objectives such as achieving a desired maximum operating temperature. With this method, 49% fewer thermal vias are needed to obtain a 47% reduction in the maximum temperatures, and 57% fewer thermal vias are needed to obtain a 68% reduction in the maximum thermal gradients than would be needed using a uniform distribution of thermal vias to obtain these same thermal improvements. Similar results were seen for other thermal objectives, and the method efficiently achieves its thermal objective while minimizing the thermal-via utilization.
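
A hedged sketch of the adjust-and-resolve loop on a deliberately crude 1-D thermal model: each layer's effective conductivity is a mix of silicon and via fill, the stack is re-solved after every adjustment, and via density is raised below the hot spot until the temperature target is met or a density cap is hit. The paper uses full finite-element analysis over 3-D via regions; everything below (constants, geometry, update rule) is illustrative only:

```python
K_SI, K_VIA = 120.0, 400.0                    # W/(m*K): silicon vs. via fill (approx.)
AREA, T_LAYER, T_SINK = 2e-7, 50e-6, 25.0     # tile area (m^2), layer thickness (m), sink temp (C)

def solve_stack(densities, power):
    """Temperatures at the top of each layer of a 1-D stack over a heat sink.
    Each layer's effective conductivity is a linear mix of silicon and vias."""
    temps, heat, t = [], sum(power), T_SINK
    for d, p in zip(densities, power):
        k_eff = (1 - d) * K_SI + d * K_VIA
        t += heat * T_LAYER / (k_eff * AREA)  # all heat from above flows through this layer
        temps.append(t)
        heat -= p                             # this layer's own power has now passed
    return temps

def assign_vias(power, t_max, step=0.02, cap=0.30):
    densities = [0.0] * len(power)
    while True:
        temps = solve_stack(densities, power)
        if max(temps) <= t_max or all(d >= cap for d in densities):
            return densities, temps
        hot = max(range(len(temps)), key=temps.__getitem__)
        for i in range(hot + 1):              # more vias help every layer below the hot spot
            densities[i] = min(densities[i] + step, cap)

dens, temps = assign_vias(power=[8.0, 10.0, 6.0], t_max=85.0)
print([round(d, 2) for d in dens], [round(t, 1) for t in temps])
```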

Journal ArticleDOI
TL;DR: This paper presents general hardware-independent models and algorithms to automate the operation of droplet-based microfluidic systems; an approach toward automatic mapping of a biochemical analysis task onto a DMFS is also investigated.
Abstract: This paper presents general hardware-independent models and algorithms to automate the operation of droplet-based microfluidic systems. In these systems, discrete liquid volumes of typically less than 1 μl are transported across a planar array by dielectrophoretic or electrowetting effects for biochemical analysis. Unlike in systems based on continuous flow through channels, valves, and pumps, the droplet paths can be reconfigured on demand and even in real time. Algorithms that generate efficient sequences of control signals for moving one or many droplets from start to goal positions, subject to constraints such as specific features and obstacles on the array surface or limitations in the control circuitry, are developed. In addition, an approach toward automatic mapping of a biochemical analysis task onto a DMFS is investigated. Achieving optimality in these algorithms can be prohibitive for large-scale configurations because of the high asymptotic complexity of coordinating multiple moving droplets. Instead, these algorithms achieve a compromise between high runtime efficiency and a more limited nonglobal optimality in the generated control sequences.
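
A hedged sketch of the simplest routing subproblem: breadth-first search of the electrode array moves a single droplet from start to goal around blocked cells. The multi-droplet coordination, spacing constraints, and task mapping that dominate the complexity discussed above are deliberately omitted:

```python
from collections import deque

def route_droplet(rows, cols, start, goal, obstacles=frozenset()):
    """Shortest sequence of electrode activations moving one droplet from
    `start` to `goal` on a rows x cols array, avoiding `obstacles`.
    (Single droplet only; a real controller must also keep concurrently
    moving droplets several cells apart to avoid unintended merging.)"""
    moves = [(0, 1), (0, -1), (1, 0), (-1, 0)]       # E, W, S, N neighbors
    parent = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:                             # reconstruct the path
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        for dr, dc in moves:
            nxt = (cell[0] + dr, cell[1] + dc)
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and nxt not in obstacles and nxt not in parent):
                parent[nxt] = cell
                queue.append(nxt)
    return None                                      # goal unreachable

print(route_droplet(8, 8, (0, 0), (5, 6), obstacles={(2, 2), (3, 2), (4, 2)}))
```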

Journal ArticleDOI
TL;DR: Results show that the SER of logic is a much stronger function of timing parameters than the supply voltage, and an "SER peaking" phenomenon in multipliers is observed where the center bits have an SER that is orders of magnitude greater than that of the LSBs and MSBs.
Abstract: We present a soft-error-rate analysis (SERA) methodology for combinational and memory circuits. SERA is based on a modeling and analysis approach that employs a judicious mix of probability theory, circuit simulation, graph theory, and fault simulation. SERA achieves five orders of magnitude speedup over Monte Carlo-based simulation approaches with less than 5% error. Dependence of the soft-error rate (SER) of combinational logic circuits on a supply voltage, clock period, latching window, circuit topology, and input vector is explicitly captured and studied for a typical 0.18-μm CMOS process. Results show that the SER of logic is a much stronger function of timing parameters than the supply voltage. Also, an SER peaking phenomenon in multipliers is observed where the center bits have an SER that is orders of magnitude greater than that of the LSBs and the MSBs. An increase of up to 25% in the SER for multiplier circuits of various sizes has been observed as technology scales from 0.18 to 0.13 μm.

Journal ArticleDOI
TL;DR: This floorplanner uses the B*-tree floorplan representation based on a fast three-stage simulated annealing (SA) scheme called Fast-SA and obtains much smaller dead space for floorplanning with hard/soft macro blocks, compared with the most recent work.
Abstract: Unlike classical floorplanning that usually handles only block packing to minimize silicon area, modern very large scale integration (VLSI) floorplanning typically needs to pack blocks within a fixed die (outline), and additionally considers the packing with block positions and interconnect constraints. Floorplanning with bus planning is one of the most challenging modern floorplanning problems because it needs to consider the constraints with interconnect and block positions simultaneously. In this paper, the authors study two types of modern floorplanning problems: 1) fixed-outline floorplanning and 2) bus-driven floorplanning (BDF). This floorplanner uses the B*-tree floorplan representation based on a fast three-stage simulated annealing (SA) scheme called Fast-SA. For fixed-outline floorplanning, the authors present an adaptive Fast-SA that can dynamically change the weights in the cost function to optimize the wirelength under the outline constraint. Experimental results show that this floorplanner can achieve 100% success rates efficiently for fixed-outline floorplanning with various aspect ratios. For the BDF, the authors explore the feasibility conditions of the B*-tree with the bus constraints, and develop a BDF algorithm based on the conditions and Fast-SA. Experimental results show that this floorplanner obtains much smaller dead space for the floorplanning with hard/soft macro blocks, compared with the most recent work. In particular, this floorplanner is more efficient than the previous works.
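
A hedged skeleton of fixed-outline floorplanning by annealing with an adaptive cost: the outline-violation weight is increased while the packing spills outside the die and relaxed once it fits, so wirelength is optimized under the outline constraint. The toy shelf packer below stands in for B*-tree packing, and the schedule is a generic SA rather than the three-stage Fast-SA:

```python
import math
import random

BLOCKS = {"a": (4, 3), "b": (3, 3), "c": (2, 5), "d": (5, 2), "e": (3, 2)}
NETS = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
OUTLINE = (9.0, 8.0)                         # fixed die outline (width, height)

def pack(order):
    """Toy shelf packer standing in for B*-tree packing: place blocks left to
    right and start a new shelf when the outline width would be exceeded."""
    pos, x, y, shelf_h = {}, 0.0, 0.0, 0.0
    for name in order:
        w, h = BLOCKS[name]
        if x + w > OUTLINE[0]:
            x, y, shelf_h = 0.0, y + shelf_h, 0.0
        pos[name] = (x + w / 2, y + h / 2)   # block center
        x, shelf_h = x + w, max(shelf_h, h)
    width = max(p[0] + BLOCKS[n][0] / 2 for n, p in pos.items())
    height = max(p[1] + BLOCKS[n][1] / 2 for n, p in pos.items())
    return pos, width, height

def cost(order, w_outline):
    pos, w, h = pack(order)
    hpwl = sum(abs(pos[a][0] - pos[b][0]) + abs(pos[a][1] - pos[b][1])
               for a, b in NETS)
    excess = max(0.0, w - OUTLINE[0]) + max(0.0, h - OUTLINE[1])
    return hpwl + w_outline * excess, excess

def anneal(iters=20000, t0=50.0, seed=1):
    rng = random.Random(seed)
    order, w_outline = list(BLOCKS), 10.0
    for i in range(iters):
        t = t0 * (1.0 - i / iters) + 1e-3
        cand = order[:]
        a, b = rng.sample(range(len(cand)), 2)
        cand[a], cand[b] = cand[b], cand[a]  # swap two blocks
        c_new, excess = cost(cand, w_outline)
        c_old, _ = cost(order, w_outline)
        if c_new < c_old or rng.random() < math.exp((c_old - c_new) / t):
            order = cand
        # adaptive weighting: lean on the outline term while it is violated
        w_outline = min(w_outline * 1.01, 1e4) if excess > 0 else max(w_outline * 0.99, 1.0)
    return order, pack(order)[1:]

order, (w, h) = anneal()
print(order, (round(w, 1), round(h, 1)), "fits:", w <= OUTLINE[0] and h <= OUTLINE[1])
```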

Journal ArticleDOI
TL;DR: The authors propose to integrate min-cut placement with fixed-outline floorplanning to solve the more general placement problem, which includes cell placement,floorplanning, mixed-size placement, and achieving routability.
Abstract: Large macro blocks, predesigned datapaths, embedded memories, and analog blocks are increasingly used in application-specific integrated circuit (ASIC) designs. However, robust algorithms for large-scale placement of such designs have only recently been considered in the literature. Large macros can be handled by traditional floorplanning, but are harder to account for in min-cut and analytical placement. On the other hand, traditional floorplanning techniques do not scale to large numbers of objects, especially in terms of solution quality. The authors propose to integrate min-cut placement with fixed-outline floorplanning to solve the more general placement problem, which includes cell placement, floorplanning, mixed-size placement, and achieving routability. At every step of min-cut placement, either partitioning or wirelength-driven fixed-outline floorplanning is invoked. If the latter fails, the authors undo an earlier partitioning decision, merge adjacent placement regions, and refloorplan the larger region to find a legal placement for the macros. Empirically, this framework improves the scalability and quality of results for traditional wirelength-driven floorplanning. It has been validated on recent designs with embedded memories and accounts for routability. Additionally, the authors propose that free-shape rectilinear floorplanning can be used with rough module-area estimates before logic synthesis.

Journal ArticleDOI
TL;DR: In this paper, an iterative technology-mapping tool called IMap is presented; it supports depth-oriented, area-oriented, and duplication-free mapping modes, and the edge-delay model is used throughout.
Abstract: In this paper, an iterative technology-mapping tool called IMap is presented. It supports depth-oriented (area is a secondary objective), area-oriented (depth is a secondary objective), and duplication-free mapping modes. The edge-delay model (as opposed to the more commonly used unit-delay model) is used throughout. Two new heuristics are used to obtain area reductions over previously published methods. The first heuristic predicts the effects of various mapping decisions on the area of the final solution, and the second heuristic bounds the depth of the mapping solution at each node. In depth-oriented mode, when targeting five lookup tables (LUTs), IMap obtains depth optimal solutions that are 44.4%, 19.4%, and 5% smaller than those produced by FlowMap, CutMap, and DAOMap, respectively. Targeting the same LUT size in area-oriented mode, IMap obtains solutions that are 17.5% and 9.4% smaller than those produced by duplication-free mapping and ZMap, respectively. IMap is also shown to be highly efficient. Runtime improvements of between 2.3× and 82× are obtained over existing algorithms when targeting five LUTs. Area and runtime results comparing IMap to the other mappers when targeting four and six LUTs are also presented.
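
Cut-based LUT mapping starts by enumerating the k-feasible cuts of every node bottom-up: each node's cuts are the trivial cut plus all merges of one cut per fanin that stay within k leaves. A standard enumeration sketch (IMap's edge-delay model and area/depth heuristics are not reproduced; the toy network and limits are ours):

```python
from itertools import product

# Toy DAG: node -> list of fanins; primary inputs have no fanins.
NETWORK = {"a": [], "b": [], "c": [], "d": [],
           "g1": ["a", "b"], "g2": ["c", "d"],
           "g3": ["g1", "g2"], "g4": ["g3", "d"]}

def topo_order():
    order, seen = [], set()
    def visit(n):
        if n in seen:
            return
        seen.add(n)
        for f in NETWORK[n]:
            visit(f)
        order.append(n)
    for n in NETWORK:
        visit(n)
    return order

def enumerate_cuts(k=4, cut_limit=8):
    """K-feasible cuts per node: merge fanin cuts, keep those with <= k leaves.
    `cut_limit` caps the number of cuts stored per node, as practical mappers do."""
    cuts = {}
    for node in topo_order():
        fanins = NETWORK[node]
        if not fanins:
            cuts[node] = [frozenset([node])]
            continue
        merged = {frozenset([node])}                  # the trivial cut
        for combo in product(*(cuts[f] for f in fanins)):
            c = frozenset().union(*combo)
            if len(c) <= k:
                merged.add(c)
        cuts[node] = sorted(merged, key=len)[:cut_limit]
    return cuts

for node, cs in enumerate_cuts(k=4).items():
    print(node, [sorted(c) for c in cs])
```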

Journal ArticleDOI
TL;DR: A framework that uses BDDs and ADDs and enables the analysis of combinational circuit reliability from different aspects, e.g., output susceptibility to error, influence of individual gates on individual outputs and overall circuit reliability, and the dependence of circuit reliability on glitch duration, amplitude, and input patterns is presented.
Abstract: Due to the shrinking of feature size and the significant reduction in noise margins, nanoscale circuits have become more susceptible to manufacturing defects, noise-related transient faults, and interference from radiation. Traditionally, soft errors have been a much greater concern in memories than in logic circuits. However, as technology continues to scale, logic circuits are becoming more susceptible to soft errors than memories. To estimate the susceptibility to errors in combinational logic, the use of binary decision diagrams (BDDs) and algebraic decision diagrams (ADDs) for the unified symbolic analysis of circuit reliability is proposed. A framework that uses BDDs and ADDs and enables the analysis of combinational circuit reliability from different aspects, e.g., output susceptibility to error, influence of individual gates on individual outputs and overall circuit reliability, and the dependence of circuit reliability on glitch duration, amplitude, and input patterns, is presented. This is demonstrated by the set of experimental results, which show that the mean output error susceptibility can vary from less than 0.1% for large circuits and short glitches (20% cycle time) to about 30% for very small circuits and long enough glitches (50% cycle time).

Journal ArticleDOI
TL;DR: An analytical model is proposed to compute the fringe capacitance between two nonoverlapping interconnects in different layers using a conformal mapping technique; the model significantly reduces the computational complexity and time in calculating the interconnect capacitances.
Abstract: An analytical model is proposed to compute the fringe capacitance between two nonoverlapping interconnects in different layers using a conformal mapping technique. With this technique, electric field lines are geometrically approximated to separately model the different capacitive components. These components are finally combined to obtain the equivalent fringe capacitance. Using the aforementioned technique, a model was developed to compute the capacitances of typical interconnect geometries using technology-dependent parameters. The proposed model closely matches with FASTCAP results and significantly reduces the computational complexity and time in calculating the interconnect capacitances.

Journal ArticleDOI
TL;DR: This paper surveys progress in the field with self-contained expositions of fundamental early results, an account of the recent advances, and some new classifications, as well as a discussion of the major remaining open problems in the complexity of logic minimization.
Abstract: The complexity of two-level logic minimization is a topic of interest to both computer-aided design (CAD) specialists and computer science theoreticians. In the logic synthesis community, two-level logic minimization forms the foundation for more complex optimization procedures that have significant real-world impact. At the same time, the computational complexity of two-level logic minimization has posed challenges since the beginning of the field in the 1960s; indeed, some central questions have been resolved only within the last few years, and others remain open. This recent activity has classified some logic optimization problems of high practical relevance, such as finding the minimal sum-of-products (SOP) form and maximal term expansion and reduction. This paper surveys progress in the field with self-contained expositions of fundamental early results, an account of the recent advances, and some new classifications. It includes an introduction to the relevant concepts and terminology from computational complexity, as well as a discussion of the major remaining open problems in the complexity of logic minimization.

Journal ArticleDOI
TL;DR: The authors' second approach to leakage optimization consists of altering the routing step of the FPGA computer-aided design (CAD) flow to encourage more frequent use of routing resources that have low leakage power consumption, which allows active leakage to be further reduced without compromising design performance.
Abstract: Active leakage power dissipation is considered in field-programmable gate arrays (FPGAs) and two "no cost" approaches for active leakage reduction are presented. It is well known that the leakage power consumed by a digital CMOS circuit depends strongly on the state of its inputs. The authors' first leakage reduction technique leverages a fundamental property of basic FPGA logic elements [look-up tables (LUTs)] that allows a logic signal in an FPGA design to be interchanged with its complemented form without any area or delay penalty. This property is applied to select polarities for logic signals so that FPGA hardware structures spend the majority of time in low-leakage states. In an experimental study, active leakage power is optimized in circuits mapped into a state-of-the-art 90-nm commercial FPGA. Results show that the proposed approach reduces active leakage by 25%, on average. The authors' second approach to leakage optimization consists of altering the routing step of the FPGA computer-aided design (CAD) flow to encourage more frequent use of routing resources that have low leakage power consumptions. Such "leakage-aware routing" allows active leakage to be further reduced, without compromising design performance. Combined, the two approaches offer a total active leakage power reduction of 30%, on average.
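
A hedged sketch of the polarity-selection idea: estimate each signal's static probability with random vectors, then use the inverted form of any signal that would otherwise spend most of its time in the assumed high-leakage state; in an FPGA the inversion is absorbed into downstream LUT masks at no area or delay cost. The leakage numbers and netlist below are placeholders, and the leakage-aware routing technique is not sketched:

```python
import random

# Assumed per-state leakage of a routing buffer/LUT input (arbitrary units):
# the '1' state is taken to leak more than the '0' state in this sketch.
LEAK = {0: 1.0, 1: 1.8}

def static_probabilities(netlist, primary_inputs, trials=5000, seed=0):
    """Estimate P(signal = 1) with random input vectors."""
    rng, ones = random.Random(seed), {}
    for _ in range(trials):
        vals = {i: rng.randint(0, 1) for i in primary_inputs}
        for out, fn, ins in netlist:
            vals[out] = fn(*(vals[i] for i in ins))
        for s, v in vals.items():
            ones[s] = ones.get(s, 0) + v
    return {s: c / trials for s, c in ones.items()}

def choose_polarities(prob1):
    """Invert any signal more likely to sit in the high-leakage state.
    The inversion is free because downstream LUT masks absorb it."""
    return {s: (p > 0.5) for s, p in prob1.items()}   # True -> use inverted form

netlist = [("n1", lambda a, b: a | b, ["a", "b"]),
           ("y",  lambda n1, c: n1 & c, ["n1", "c"])]
p1 = static_probabilities(netlist, ["a", "b", "c"])
inv = choose_polarities(p1)
saved = sum((2 * p - 1) * (LEAK[1] - LEAK[0]) for s, p in p1.items() if inv[s])
print(inv, f"estimated leakage saving ~ {saved:.2f} (arbitrary units)")
```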

Journal ArticleDOI
TL;DR: This paper addresses the problem of dynamic voltage scaling (DVS) in the presence of task synchronization, computing task slowdown factors under both static- and dynamic-priority scheduling.
Abstract: Slowdown factors determine the extent of slowdown that a computing system can experience based on functional and performance requirements. Dynamic voltage scaling (DVS) of a processor based on slowdown factors can lead to considerable energy savings. This paper addresses the problem of DVS in the presence of task synchronization. Tasks synchronize to enforce mutually exclusive access to the shared resources and can be blocked by lower priority tasks. Task slowdown factors that guarantee meeting all task deadlines are computed. Both static- and dynamic-priority scheduling, viz., rate-monotonic (RM) and earliest-deadline-first (EDF) scheduling, respectively, are studied.
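
A hedged sketch of computing a constant slowdown factor with blocking folded in: under EDF a utilization-plus-blocking test is used, and under RM the Liu-Layland utilization bound extended with blocking. These are simplified sufficient tests in the spirit of the problem statement, not the paper's exact analysis:

```python
def edf_slowdown(tasks):
    """Sufficient constant slowdown factor for EDF with shared resources.
    tasks: list of (C, T, B) = (WCET, period == deadline, worst-case blocking),
    sorted by period.  Returns the slowest feasible speed, or None if the set
    is infeasible even at full speed."""
    eta = 0.0
    for k, (_, Tk, Bk) in enumerate(tasks):
        demand = sum(C / T for C, T, _ in tasks[:k + 1]) + Bk / Tk
        eta = max(eta, demand)
    return eta if eta <= 1.0 else None

def rm_slowdown(tasks):
    """Same idea under rate-monotonic scheduling, using the Liu-Layland
    utilization bound n*(2**(1/n) - 1) extended with a blocking term."""
    eta = 0.0
    for k, (_, Tk, Bk) in enumerate(tasks):
        bound = (k + 1) * (2 ** (1.0 / (k + 1)) - 1)
        demand = sum(C / T for C, T, _ in tasks[:k + 1]) + Bk / Tk
        eta = max(eta, demand / bound)
    return eta if eta <= 1.0 else None

tasks = [(1.0, 10.0, 0.5), (2.0, 20.0, 0.5), (3.0, 50.0, 0.0)]  # (C, T, B)
print("EDF slowdown:", edf_slowdown(tasks))
print("RM  slowdown:", rm_slowdown(tasks))
```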

Journal ArticleDOI
TL;DR: The authors apply gate-length biasing only to those devices that do not appear in critical paths, thus assuring zero or negligible degradation in chip performance, and show that leakage variability is reduced by up to 41%, which may lead to substantial improvements in the manufacturing yield and the product cost.
Abstract: Leakage power has become one of the most critical design concerns for the system level chip designer. While lowered supplies (and consequently, lowered threshold voltage) and aggressive clock gating can achieve dynamic power reduction, these techniques increase the leakage power and, therefore, cause its share of total power to increase. Manufacturers face the additional challenge of leakage variability: Recent data indicate that the leakage of microprocessor chips from a single 180-nm wafer can vary by as much as 20×. Previously proposed techniques for leakage-power reduction include the use of multiple supply and gate threshold voltages, and the assignment of input values to inactive gates, such that leakage is minimized. The additional design space afforded by the biasing of device gate lengths to reduce chip leakage power and its variability is studied. It is well known that leakage power decreases exponentially and delay increases linearly with increasing gate length. Thus, it is possible to increase gate length only marginally to take advantage of the exponential leakage reduction, while impairing performance only linearly. From a design-flow standpoint, the use of only slight increases in gate length preserves both pin and layout compatibility; therefore, the authors' technique can be applied as a postlayout enhancement step. The authors apply gate-length biasing only to those devices that do not appear in critical paths, thus assuring zero or negligible degradation in chip performance. To highlight the value of the technique, the multithreshold voltage technique, which is widely used for leakage reduction, is first applied and then gate-length biasing is used to show further reduction in leakage. Experimental results show that gate-length biasing reduces leakage by 24%-38% for the most commonly used cells, while incurring delay penalties of under 10%. Selective gate-length biasing at the circuit level reduces circuit leakage by up to 30% with no delay penalty. Leakage variability is reduced significantly by up to 41%, which may lead to substantial improvements in the manufacturing yield and the product cost. The use of gate-length biasing for leakage optimization of cell instances is also assessed, in which: 1) not all timing arcs are timing critical and/or 2) the rise and fall transitions are not both timing critical at the same time.
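
A hedged sketch of the post-layout selection step: bias only cells whose slack absorbs the small, roughly linear delay penalty, and estimate the leakage saving with an exponential dependence on gate length. The model constants and cell data below are illustrative placeholders, not values from the paper:

```python
import math

# Illustrative device models: leakage falls exponentially and delay grows
# roughly linearly with gate length (constants below are assumptions).
L_NOM = 100e-9            # nominal gate length, m
BIAS = 4e-9               # a few nm of bias keeps pin/layout compatibility
LEAK_LAMBDA = 0.08        # per-nm exponential leakage sensitivity (assumed)
DELAY_SLOPE = 0.004       # fractional delay increase per nm of bias (assumed)

def leakage(l):
    """Cell leakage normalized to the nominal-length value."""
    return math.exp(-LEAK_LAMBDA * (l - L_NOM) * 1e9)

def bias_noncritical(cells):
    """cells: name -> (slack_ps, delay_ps, leakage_weight).  Bias only cells
    whose slack absorbs the resulting delay increase (post-layout step)."""
    biased, total_before, total_after = [], 0.0, 0.0
    for name, (slack, delay, weight) in cells.items():
        penalty = delay * DELAY_SLOPE * BIAS * 1e9    # extra delay in ps
        total_before += weight
        if slack > penalty:                           # keep critical paths untouched
            biased.append(name)
            total_after += weight * leakage(L_NOM + BIAS)
        else:
            total_after += weight
    return biased, 1.0 - total_after / total_before

cells = {"u1": (50.0, 80.0, 1.0), "u2": (0.5, 120.0, 2.5), "u3": (30.0, 60.0, 1.2)}
names, saving = bias_noncritical(cells)
print(names, f"estimated leakage reduction: {saving:.1%}")
```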

Journal ArticleDOI
TL;DR: This paper presents a new mapper aimed at mitigating structural bias, based on a simplified cut-based Boolean-matching algorithm; using the speed afforded by this simplification, two ideas to reduce structural bias are explored.
Abstract: Technology mapping, based on directed acyclic graph covering, suffers from the problem of structural bias: The structure of the mapped netlist depends strongly on the subject graph. In this paper, the authors present a new mapper aimed at mitigating structural bias. It is based on a simplified cut-based Boolean-matching algorithm, and using the speed afforded by this simplification, they explore two ideas to reduce structural bias. The first, called lossless synthesis, leverages recent advances in structure-based combinational-equivalence checking to combine the different networks seen during technology-independent synthesis into a single network with choices in a scalable manner. They show how cut-based mapping extends naturally to handle such networks with choices. The second idea is to combine several library gates into a single gate (called a supergate) in order to make the matching process less local. They show how supergates help address the structural-bias problem and how they fit naturally into the cut-based Boolean-matching scheme. An implementation based on these ideas significantly outperforms state-of-the-art mappers in terms of delay, area, and run-time on academic and industrial benchmarks.

Journal ArticleDOI
TL;DR: Numerical results based on real-life checkpointing data and processor data sheets show that the proposed approach significantly reduces power consumption and guarantees timely task completion in the presence of faults.
Abstract: This paper investigates an integrated approach for achieving fault tolerance and energy savings in real-time embedded systems. Fault tolerance is achieved via checkpointing, and energy is saved using dynamic voltage scaling (DVS). The authors present a feasibility analysis for checkpointing schemes for a constant processor speed as well as for variable processor speeds. DVS is then carried out on the basis of the feasibility analysis. The authors incorporate important practical issues such as faults during checkpointing, rollback recovery time, memory access time, and energy needed for checkpointing, as well as DVS and context switching overhead. Numerical results based on real-life checkpointing data and processor data sheets show that compared to fault-oblivious methods, the proposed approach significantly reduces power consumption and guarantees timely task completion in the presence of faults.
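
A hedged sketch of the feasibility idea: with m equally spaced checkpoints and up to k faults to tolerate, the worst-case completion time at a given speed adds checkpoint overhead plus, per fault, a recovery time and re-execution of one inter-checkpoint segment; the lowest discrete speed for which some checkpoint count meets the deadline is then chosen. The accounting below omits several practical effects the paper includes (faults during checkpointing, memory access time, DVS and context-switching overhead):

```python
def worst_case_time(C, speed, m, k, chk_overhead, recovery):
    """Completion time of a task with WCET C (at full speed), m checkpoints,
    tolerating k faults: each fault costs a recovery plus re-execution of at
    most one inter-checkpoint segment."""
    exec_time = C / speed
    segment = exec_time / (m + 1)
    return exec_time + m * chk_overhead + k * (recovery + segment)

def lowest_feasible_speed(C, deadline, k, chk_overhead=0.1, recovery=0.05,
                          speeds=(0.4, 0.6, 0.8, 1.0), max_chkpts=50):
    """Pick the lowest discrete speed for which some checkpoint count meets
    the deadline; return (speed, checkpoint count) or None if infeasible."""
    for s in speeds:                                  # lower speed = lower energy
        for m in range(max_chkpts + 1):
            if worst_case_time(C, s, m, k, chk_overhead, recovery) <= deadline:
                return s, m
    return None

print(lowest_feasible_speed(C=4.0, deadline=8.0, k=2))
```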

Journal ArticleDOI
TL;DR: The proposed algorithm is implemented in a procedure called orthogonal polynomial expansions for response analysis (OPERA), and results from OPERA simulations on a number of design test cases match well with those from classical Monte Carlo SPICE simulations and from perturbation methods.
Abstract: Variations in the interconnect geometry of nanoscale ICs translate to variations in their performance. The resulting diminished accuracy in the estimates of performance at the design stage can lead to a significant reduction in the parametric yield. Thus, determining an accurate statistical description (e.g., moments, distribution, etc.) of the interconnect's response is critical for designers. In the presence of significant variations, device or interconnect model parameters such as wire resistance, capacitance, etc., need to be modeled as random variables or as spatial random processes. The corner-based analysis is not accurate, and simulations based on sampling require long computation times due to the large number of parameters or random variables. This study proposes an efficient method of computing the stochastic response of interconnects. The technique models the stochastic response in an infinite dimensional Hilbert space in terms of orthogonal polynomial expansions. A finite representation is obtained by projecting the infinite series representation onto a finite dimensional subspace. The advantage of the proposed method is that it provides a functional representation of the response of the system in terms of the random variables that represent the process variations. The proposed algorithm has been implemented in a procedure called orthogonal polynomial expansions for response analysis (OPERA). Results from OPERA simulations on a number of design test cases match well with those from classical Monte Carlo simulations using SPICE (simulation program with integrated circuit emphasis) and from perturbation methods. Additionally, OPERA shows good computational efficiency: speedups of up to two orders of magnitude have been observed over Monte Carlo SPICE simulations.
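
The orthogonal-polynomial (polynomial chaos) machinery can be illustrated in one dimension: expand the response in probabilists' Hermite polynomials of a normalized Gaussian parameter, obtain the coefficients by spectral projection with Gauss quadrature, and read the mean and variance off the coefficients. The example below (Elmore delay of one RC segment with a Gaussian resistance variation) shows the mechanics only and is not OPERA itself:

```python
import math
import numpy as np
from numpy.polynomial import hermite_e as H

# Response of interest: Elmore delay d = R*C of one wire segment, where the
# resistance varies as R = R0*(1 + 0.15*x) with x ~ N(0, 1).
R0, C0 = 100.0, 2e-12
def response(x):
    return R0 * (1.0 + 0.15 * x) * C0

ORDER = 4                                     # expansion order
nodes, weights = H.hermegauss(20)             # Gauss quadrature, weight exp(-x^2/2)
weights = weights / math.sqrt(2.0 * math.pi)  # normalize to an N(0,1) expectation

coeffs = []
for k in range(ORDER + 1):
    basis = np.zeros(ORDER + 1)
    basis[k] = 1.0
    hk = H.hermeval(nodes, basis)             # He_k at the quadrature nodes
    # Spectral projection: c_k = E[f(x) He_k(x)] / E[He_k(x)^2], E[He_k^2] = k!
    coeffs.append(np.sum(weights * response(nodes) * hk) / math.factorial(k))

mean = coeffs[0]
var = sum(math.factorial(k) * coeffs[k] ** 2 for k in range(1, ORDER + 1))
print(f"mean delay {mean*1e12:.2f} ps, sigma {math.sqrt(var)*1e12:.3f} ps")
# Linear response -> matches R0*C0 = 200 ps and 0.15*R0*C0 = 30 ps exactly.
```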