Showing papers in "IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems in 2010"
TL;DR: A new simulation program with integrated circuit emphasis (SPICE) macromodel of the recently physically implemented memristor, which models the boundary conditions by following exactly the published mathematical model of HP Labs.
Abstract: In this paper, we present a new simulation program with integrated circuit emphasis macromodel of the recently physically implemented memristor. This macromodel could be a powerful tool for electrical engineers to design and experiment with new circuits containing memristors. Our simulation results show similar behavior to the already published measurements of the physical implementation. Our approach provides a solution for the modeling of boundary conditions following exactly the published mathematical model of HP Labs. The functionality of our macromodel is demonstrated with computer simulations. The source code of our macromodel is provided.
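The HP Labs model the macromodel follows can be sketched in a few lines. This is a minimal illustration of the published linear-drift equations (M = Ron·x + Roff·(1−x), dw/dt = μv·(Ron/D)·i) with a Joglekar-style window standing in for the boundary-condition handling; all parameter values and names below are illustrative assumptions, not taken from the paper's macromodel.

```python
# Sketch of the HP Labs linear-drift memristor model (assumed parameters),
# integrated with forward Euler under a voltage drive.

def simulate_memristor(v_of_t, t_end, dt=1e-4,
                       D=10e-9, Ron=100.0, Roff=16e3, mu_v=1e-14, x0=0.5):
    """Return (times, currents, memristances) for v_of_t(t) -> volts."""
    x = x0                                   # normalized boundary position w/D
    times, currents, memristances = [], [], []
    t = 0.0
    while t < t_end:
        M = Ron * x + Roff * (1.0 - x)       # memristance
        i = v_of_t(t) / M
        f = 1.0 - (2.0 * x - 1.0) ** 2       # Joglekar window: pins x at 0 and 1
        x += (mu_v * Ron / D**2) * i * f * dt  # linear ionic drift (dx = dw/D)
        x = min(max(x, 0.0), 1.0)
        times.append(t); currents.append(i); memristances.append(M)
        t += dt
    return times, currents, memristances
```

Driving this with a sinusoid and plotting current against voltage produces the pinched hysteresis loop characteristic of memristors; the window function keeps the state within its physical bounds, which is the boundary-condition issue the macromodel addresses.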
TL;DR: In this article, compact models for memristors are developed, in a few simple steps, based on the fundamental constitutive relationship between charge and flux of the memristor, and implemented in circuit simulators, including SPICE, Verilog-A, and Spectre.
Abstract: This paper introduces compact models for memristors. The models are developed based on the fundamental constitutive relationships between charge and flux of memristors. The modeling process, which takes only a few simple steps, is introduced. For memristors with limited resistance ranges, a simple method to find their constitutive relationships is discussed, and examples of compact models are shown for both current-controlled and voltage-controlled memristors. Our models satisfy all of the memristor properties, such as frequency-dependent hysteresis, and provide boundary assurance so that memristors can be simulated whether they behave memristively or resistively. Our models are implementable in circuit simulators, including SPICE, Verilog-A, and Spectre.
TL;DR: This paper proposes a new approach based on a system-theoretic tool, the Loewner matrix pencil constructed in the context of tangential interpolation; the resulting algorithms are fast and accurate, build low-order models, and are especially designed for devices with a large number of terminals.
Abstract: This paper addresses the problem of modeling systems from measurements of their frequency response. For multiport devices, currently available techniques are expensive. We propose a new approach which is based on a system-theoretic tool, the Loewner matrix pencil constructed in the context of tangential interpolation. Several implementations are presented. They are fast, accurate, they build low order models and are especially designed for a large number of terminals. Moreover, they identify the underlying system, rather than merely fitting the measurements. The numerical results show that our algorithms yield smaller models in less time, when compared to vector fitting.
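The central object here is easy to state: given frequency samples at two disjoint point sets, the Loewner matrix is the divided-difference matrix L[i, j] = (v_i − w_j)/(μ_i − λ_j), and for data generated by a rational transfer function its rank reveals the underlying model order. The sketch below uses an illustrative second-order H(s) and made-up sample points, not anything from the paper.

```python
import numpy as np

# Build a Loewner matrix from frequency-response samples; for data from an
# order-n rational function, its numerical rank is n (illustrative example).

def loewner_matrix(mu, lam, v, w):
    """L[i, j] = (v[i] - w[j]) / (mu[i] - lam[j])."""
    return (v[:, None] - w[None, :]) / (mu[:, None] - lam[None, :])

H = lambda s: 1.0 / (s**2 + 0.5 * s + 2.0)   # assumed order-2 system
mu = 1j * np.linspace(0.1, 5.0, 6)           # "row" sample points
lam = 1j * np.linspace(0.15, 5.05, 6)        # "column" points (disjoint set)
L = loewner_matrix(mu, lam, H(mu), H(lam))
print(np.linalg.matrix_rank(L, tol=1e-8))    # → 2
```

This rank property is what lets the pencil identify the underlying system rather than merely fit the measurements: the model order falls out of the data instead of being guessed.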
TL;DR: A detailed experimental study of how one celebrated technique from computational finance, quasi-Monte Carlo (QMC) simulation, can be adapted effectively for fast statistical circuit analysis, with rigorous theoretical arguments that support and explain the superior performance of QMC.
Abstract: At the nanoscale, no circuit parameters are truly deterministic; most quantities of practical interest present themselves as probability distributions. Thus, Monte Carlo techniques comprise the strategy of choice for statistical circuit analysis. There are many challenges in applying these techniques efficiently: circuit size, nonlinearity, simulation time, and required accuracy often conspire to make Monte Carlo analysis expensive and slow. Are we, the integrated circuit community, alone in facing such problems? As it turns out, the answer is “no.” Problems in computational finance share many of these characteristics: high dimensionality, profound nonlinearity, stringent accuracy requirements, and expensive sample evaluation. We perform a detailed experimental study of how one celebrated technique from that domain, quasi-Monte Carlo (QMC) simulation, can be adapted effectively for fast statistical circuit analysis. In contrast to traditional pseudorandom Monte Carlo sampling, QMC uses a (shorter) sequence of deterministically chosen sample points. We perform rigorous comparisons with both Monte Carlo and Latin hypercube sampling across a set of digital and analog circuits, in 90 and 45 nm technologies, varying in size from 30 to 400 devices. We consistently see superior performance from QMC, giving 2× to 8× speedup over conventional Monte Carlo for roughly 1% accuracy levels. We present rigorous theoretical arguments that support and explain this superior performance of QMC. The arguments also reveal insights regarding the (low) latent dimensionality of these circuit problems; for example, we observe that over half of the variance in our test circuits is from unidimensional behavior. This analysis provides quantitative support for recent enthusiasm in dimensionality reduction of circuit problems.
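The core QMC idea can be shown on a toy integral: replace pseudorandom samples with a deterministically chosen low-discrepancy sequence. The Halton sequence and the smooth two-dimensional integrand below are illustrative stand-ins, not the paper's generator or its circuit test cases.

```python
import random

# Toy comparison: quasi-Monte Carlo (Halton sequence) vs. pseudorandom
# Monte Carlo for estimating the integral of f(x, y) = x*y over [0,1]^2
# (exact value 1/4). All choices here are illustrative assumptions.

def halton(i, base):
    """i-th element (1-indexed) of the van der Corput sequence in `base`."""
    f, r = 1.0, 0.0
    while i > 0:
        f /= base
        r += f * (i % base)
        i //= base
    return r

def estimate(points):
    """Sample-mean estimate of the integral of x*y."""
    return sum(x * y for x, y in points) / len(points)

n = 4096
qmc_err = abs(estimate([(halton(i, 2), halton(i, 3))
                        for i in range(1, n + 1)]) - 0.25)
rng = random.Random(0)
mc_err = abs(estimate([(rng.random(), rng.random())
                       for _ in range(n)]) - 0.25)
print(qmc_err, mc_err)
```

For smooth, low-dimensional integrands like this one the low-discrepancy points typically converge markedly faster than the roughly n^(-1/2) rate of pseudorandom sampling, which is the effect the paper exploits for statistical circuit analysis.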
TL;DR: Two new reconfigurable architectures of low-complexity FIR filters are proposed, namely the constant shifts method and the programmable shifts method, both capable of operating on filter coefficients of different wordlengths without any overhead in the hardware circuitry.
Abstract: Reconfigurability and low complexity are the two key requirements of finite impulse response (FIR) filters employed in multistandard wireless communication systems. In this paper, two new reconfigurable architectures of low complexity FIR filters are proposed, namely constant shifts method and programmable shifts method. The proposed FIR filter architecture is capable of operating for different wordlength filter coefficients without any overhead in the hardware circuitry. We show that dynamically reconfigurable filters can be efficiently implemented by using common subexpression elimination algorithms. The proposed architectures have been implemented and tested on Virtex 2v3000ff1152-4 field-programmable gate array and synthesized on 0.18 µm complementary metal-oxide-semiconductor technology with a precision of 16 bits. Design examples show that the proposed architectures offer good area and power reductions and speed improvement compared to the best existing reconfigurable FIR filter implementations in the literature.
TL;DR: The proposed model can be used not only to obtain fast and accurate performance estimates, but also to guide the NoC design process within an optimization loop.
Abstract: Networks-on-chip (NoCs) have recently emerged as a scalable alternative to classical bus and point-to-point architectures. To date, performance evaluation of NoC designs is largely based on simulation which, besides being extremely slow, provides little insight on how different design parameters affect the actual network performance. Therefore, it is practically impossible to use simulation for optimization purposes. In this paper, we present a mathematical model for on-chip routers and utilize this new model for NoC performance analysis. The proposed model can be used not only to obtain fast and accurate performance estimates, but also to guide the NoC design process within an optimization loop. The accuracy of our approach and its practical use is illustrated through extensive simulation results.
TL;DR: A dilution/mixing algorithm is presented that significantly reduces the production of waste droplets, as well as the total number of input droplets, compared to earlier methods, and always yields nonnegative savings in the number of waste droplets.
Abstract: The recent emergence of lab-on-a-chip (LoC) technology has led to a paradigm shift in many healthcare-related application areas, e.g., point-of-care clinical diagnostics, high-throughput sequencing, and proteomics. A promising category of LoCs is digital microfluidic (DMF)-based biochips, in which nanoliter-volume fluid droplets are manipulated on a 2-D electrode array. A key challenge in designing such chips and mapping lab-bench protocols to a LoC is to carry out the dilution process of biochemical samples efficiently. As an optimization and automation technique, we present a dilution/mixing algorithm that significantly reduces the production of waste droplets. This algorithm takes O(n) time to compute at most n sequential mix/split operations required to achieve any given target concentration with an error in concentration factor less than 1/2^n. To implement the algorithm, we design an architectural layout of a DMF-based LoC consisting of two O(n)-size rotary mixers and O(n) storage electrodes. Simulation results show that the proposed technique always yields nonnegative savings in the number of waste droplets and also in the total number of input droplets compared to earlier methods.
TL;DR: A novel L0-norm regularization method is adapted to address the modeling challenge of aggressive scaling of integrated circuit technology and achieves up to 25× speedup compared to the traditional least-squares fitting method.
Abstract: The aggressive scaling of integrated circuit technology results in high-dimensional, strongly-nonlinear performance variability that cannot be efficiently captured by traditional modeling techniques. In this paper, we adapt a novel L0-norm regularization method to address this modeling challenge. Our goal is to solve a large number of (e.g., 10^4-10^6) model coefficients from a small set of (e.g., 10^2-10^3) sampling points without over-fitting. This is facilitated by exploiting the underlying sparsity of model coefficients. Namely, although numerous basis functions are needed to span the high-dimensional, strongly-nonlinear variation space, only a few of them play an important role for a given performance of interest. An efficient orthogonal matching pursuit (OMP) algorithm is applied to automatically select these important basis functions based on a limited number of simulation samples. Several circuit examples designed in a commercial 65 nm process demonstrate that OMP achieves up to 25× speedup compared to the traditional least-squares fitting method.
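The OMP selection loop described above is compact enough to sketch: greedily pick the basis column most correlated with the current residual, then re-fit the chosen set by least squares. The random dictionary and 3-sparse ground truth below are synthetic stand-ins for the basis functions and performance model, chosen only to show the recovery behavior.

```python
import numpy as np

# Minimal orthogonal matching pursuit on a synthetic sparse-recovery problem:
# 100 samples, 1000 candidate basis functions, only 3 true nonzeros.

def omp(A, y, n_nonzero):
    residual, support = y.copy(), []
    for _ in range(n_nonzero):
        j = int(np.argmax(np.abs(A.T @ residual)))   # most-correlated column
        support.append(j)
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef          # orthogonalized residual
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 1000))
A /= np.linalg.norm(A, axis=0)                  # unit-norm basis columns
x_true = np.zeros(1000)
x_true[[7, 42, 99]] = [3.0, -2.0, 1.5]          # only 3 of 1000 matter
x_hat = omp(A, A @ x_true, n_nonzero=3)
print(sorted(map(int, np.flatnonzero(x_hat))))
```

With far fewer samples than unknowns, the greedy selection recovers the sparse support, which is exactly the regime the paper targets: many candidate coefficients, few simulation samples, and an underlying sparse model.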
TL;DR: Experimental results demonstrate that the proposed supervised learning based power management technique ensures system-wide energy savings under rapidly and widely varying workloads.
Abstract: This paper presents a supervised learning based power management framework for a multi-processor system, where a power manager (PM) learns to predict the system performance state from some readily available input features (such as the occupancy state of a global service queue) and then uses this predicted state to look up the optimal power management action (e.g., voltage-frequency setting) from a precomputed policy table. The motivation for utilizing supervised learning in the form of a Bayesian classifier is to reduce the overhead of the PM, which has to repetitively determine and assign voltage-frequency settings for each processor core in the system. Experimental results demonstrate that the proposed supervised learning based power management technique ensures system-wide energy savings under rapidly and widely varying workloads.
TL;DR: This paper claims that a far superior result can be achieved by moving the design-to-manufacturing interface from design rules to a higher level of abstraction based on a defined set of pre-characterized layout templates and demonstrates how this methodology can simplify optical proximity correction and lithography processes for sub-32 nm technology nodes.
Abstract: The financial backbone of the semiconductor industry is based on doubling the functional density of integrated circuits every two years at fixed wafer costs and die yields. The increasing demands for 'computational' rather than 'physical' lithography to achieve the aggressive density targets, along with the complex device-engineering solutions needed to maintain the power density objectives, have caused a rapid escalation in systematic yield limiters that threaten scaling. Specifically, the traditional contract between design and manufacturing based solely on design rules is no longer sufficient to guarantee functional silicon and instead requires a convoluted set of restrictions that force complex modifications to the already costly design flows. In this paper, we claim that a far superior result can be achieved by moving the design-to-manufacturing interface from design rules to a higher level of abstraction based on a defined set of pre-characterized layout templates. We will demonstrate how this methodology can simplify optical proximity correction and lithography processes for sub-32 nm technology nodes, along with various digital block design examples for synthesized intellectual property (IP) cores. Furthermore, with a cost-per-good-die analysis we will show that this methodology will extend economical scaling to sub-32 nm technology nodes.
TL;DR: This paper presents a novel formulation of the debugging problem using MaxSAT to improve the performance and applicability of automated debuggers, and introduces two performance improvements to further reduce the time required to find all error sources within the design by an order of magnitude.
Abstract: As contemporary very large scale integration designs grow in complexity, design debugging has rapidly established itself as one of the largest bottlenecks in the design cycle today. Automated debug solutions such as those based on Boolean satisfiability (SAT) enable engineers to reduce the debug effort by localizing possible error sources in the design. Unfortunately, adaptation of these techniques to industrial designs is still limited by the performance and capacity of the underlying engines. This paper presents a novel formulation of the debugging problem using MaxSAT to improve the performance and applicability of automated debuggers. Our technique not only identifies errors in the design but also indicates when the bug is excited in the error trace. MaxSAT allows for a simpler formulation of the debugging problem, reducing the problem size by 80% compared to a conventional SAT-based technique. Empirical results demonstrate the effectiveness of the proposed formulation as run-time improvements of 4.5 × are observed on average. This paper introduces two performance improvements to further reduce the time required to find all error sources within the design by an order of magnitude.
TL;DR: This work proposes efficient algorithms for three types of analysis of large resistor networks: 1) computation of path resistances; 2) computation of resistor currents; and 3) reduction of resistor networks. The algorithms enable simulation of very large networks.
Abstract: Large resistor networks arise during the design of very-large-scale integration chips as a result of parasitic extraction and electro static discharge analysis. Simulating these large parasitic resistor networks is of vital importance, since it gives an insight into the functional and physical performance of the chip. However, due to the increasing amount of interconnect and metal layers, these networks may contain millions of resistors and nodes, making accurate simulation time consuming or even infeasible. We propose efficient algorithms for three types of analysis of large resistor networks: 1) computation of path resistances; 2) computation of resistor currents; and 3) reduction of resistor networks. The algorithms are exact, orders of magnitude faster than conventional approaches, and enable simulation of very large networks.
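For the first of the three analyses, the textbook baseline is worth stating: the path (effective) resistance between nodes i and j follows from the pseudoinverse of the network's conductance Laplacian, R_eff = (e_i − e_j)^T L^+ (e_i − e_j). The paper's algorithms are orders of magnitude faster than this dense formulation, which is shown here only to make the quantity concrete; the example network is illustrative.

```python
import numpy as np

# Effective (path) resistance of a resistor network via the Laplacian
# pseudoinverse. Dense and slow; for illustration only.

def effective_resistance(n_nodes, branches, i, j):
    """branches: iterable of (node_a, node_b, resistance_ohms)."""
    L = np.zeros((n_nodes, n_nodes))
    for a, b, R in branches:
        g = 1.0 / R                       # branch conductance
        L[a, a] += g; L[b, b] += g
        L[a, b] -= g; L[b, a] -= g
    e = np.zeros(n_nodes)
    e[i], e[j] = 1.0, -1.0
    return float(e @ np.linalg.pinv(L) @ e)

# two 1-ohm resistors in series between nodes 0 and 2
print(effective_resistance(3, [(0, 1, 1.0), (1, 2, 1.0)], 0, 2))   # ≈ 2.0
```

The pseudoinverse costs O(n^3), which is exactly what becomes infeasible at millions of nodes; the paper's contribution is computing the same quantities without forming L^+.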
TL;DR: This paper proposes a simultaneous conflict and stitch minimization algorithm with an integer linear programming (ILP) formulation that reduces stitches by 33% and conflicts by 87.6% compared with two-phase greedy decomposition.
Abstract: Double patterning lithography (DPL) is considered the most likely solution for 32 nm/22 nm technology. In DPL, the layout patterns are decomposed into two masks (colors), and manufactured through two exposure and etch steps. If the spacing between two features (polygons) is less than a certain minimum coloring distance, they have to be assigned opposite colors. However, a proper coloring is not always feasible because two neighboring patterns within the minimum distance may end up in the same mask due to complex pattern configurations. In that case, a feature may need to be split into two parts to resolve the conflict, resulting in stitch insertion, which causes yield loss due to overlay and line-end effects. While previous layout decomposition approaches perform coloring and splitting separately, in this paper we propose a simultaneous conflict and stitch minimization algorithm with an integer linear programming (ILP) formulation. Since ILP is NP-hard, the algorithm includes three speed-up techniques: (1) grid merging; (2) independent component computation; and (3) layout partition. In addition, our algorithm can be extended to handle design rules such as overlap margin and minimum width for practical use, as well as off-grid layouts. Our approach reduces stitches by 33% and removes 87.6% of conflicts compared with two-phase greedy decomposition.
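The feasibility question underlying decomposition can be shown on the conflict graph alone: nodes are features, edges join pairs closer than the minimum coloring distance, and a legal two-mask assignment exists exactly when the graph is bipartite. The BFS sketch below flags the edges that cannot be two-colored (odd cycles), i.e., where a stitch or layout change would be needed; it is a toy with no geometry and no ILP, and all names are illustrative.

```python
from collections import deque

# 2-color a conflict graph by BFS; report same-color (unresolvable) edges.

def two_color(n, conflict_edges):
    adj = [[] for _ in range(n)]
    for a, b in conflict_edges:
        adj[a].append(b); adj[b].append(a)
    color, conflicts = [-1] * n, []
    for s in range(n):
        if color[s] != -1:
            continue
        color[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if color[v] == -1:
                    color[v] = 1 - color[u]     # assign the opposite mask
                    queue.append(v)
                elif color[v] == color[u] and u < v:
                    conflicts.append((u, v))    # odd cycle: conflict edge
    return color, conflicts

# a triangle of mutually close features cannot be split onto two masks
print(two_color(3, [(0, 1), (1, 2), (0, 2)])[1])   # → [(1, 2)]
```

The paper's ILP goes further by choosing where to split features (insert stitches) so that the resulting graph becomes colorable while minimizing both stitches and remaining conflicts simultaneously.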
TL;DR: The droplet-based “digital” microfluidic technology platform and emerging applications are described, and computer-aided design tools for simulation, synthesis and chip optimization are presented.
Abstract: Microfluidics-based biochips enable the precise control of nanoliter volumes of biochemical samples and reagents. They combine electronics with biology, and they integrate various bioassay operations, such as sample preparation, analysis, separation, and detection. Compared to conventional laboratory procedures, which are cumbersome and expensive, miniaturized biochips offer the advantages of higher sensitivity, lower cost due to smaller sample and reagent volumes, system integration, and less likelihood of human error. This paper first describes the droplet-based “digital” microfluidic technology platform and emerging applications. The physical principles underlying droplet actuation are next described. Finally, the paper presents computer-aided design tools for simulation, synthesis and chip optimization. These tools target modeling and simulation, scheduling, module placement, droplet routing, pin-constrained chip design, and testing.
TL;DR: This technique employs network calculus to first compute the equivalent service curve for an individual flow and then calculate its packet delay bound; a closed-form formula is also derived to compute a flow's delay bound under all-to-one gather communication.
Abstract: In network-on-chip (NoC), computing worst-case delay bounds for packet delivery is crucial for designing predictable systems, yet it remains an intractable problem. This paper presents an analysis technique to derive per-flow communication delay bounds. Based on a network contention model, this technique, which is topology independent, employs network calculus to first compute the equivalent service curve for an individual flow and then calculate its packet delay bound. To exemplify this method, this paper also presents the derivation of a closed-form formula to compute a flow's delay bound under all-to-one gather communication. Experimental results demonstrate that the theoretical bounds are correct and tight.
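The basic network-calculus bound that such analyses build on is simple to state: for a flow with a token-bucket arrival curve α(t) = σ + ρt served by a latency-rate service curve β(t) = R(t − T)+, the worst-case delay is bounded by D = T + σ/R whenever ρ ≤ R. The paper derives the per-flow equivalent service curve from its contention model; the numbers below are made up for illustration.

```python
# Textbook network-calculus delay bound for a token-bucket flow
# (burst sigma, rate rho) over a latency-rate server (rate R, latency T).

def delay_bound(sigma_bits, rho_bps, R_bps, T_s):
    """Worst-case delay in seconds; requires rho <= R for stability."""
    assert rho_bps <= R_bps, "unstable: arrival rate exceeds service rate"
    return T_s + sigma_bits / R_bps

# e.g. a 4 kbit burst at 50 Mbit/s through a 100 Mbit/s server with 2 us latency
print(delay_bound(4e3, 50e6, 100e6, 2e-6))
```

The bound is the maximal horizontal distance between the arrival and service curves; composing the per-hop service curves of the routers a flow traverses yields the end-to-end equivalent service curve used in the paper.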
TL;DR: A cost estimation method is presented and a set of cost models that include wafer cost, 3-D bonding cost, package cost, and cooling cost are proposed that can help designers analyze the cost implication for3-D ICs during the design space exploration at the early stage.
Abstract: 3-D integration technology is emerging as an attractive alternative to increase the transistor count for future chips. The majority of the existing 3-D integrated circuit (IC) research is focused on the performance, power, density, and heterogeneous integration benefits offered by 3-D integration. All such advantages, however, ultimately have to translate into cost evaluation when a design strategy has to be decided. Consequently, system-level cost analysis at early design stages is imperative to decide on whether 3-D integration should be adopted. This paper presents a cost estimation method for 3-D ICs at early design stages and proposes a set of cost models that include wafer cost, 3-D bonding cost, package cost, and cooling cost. The proposed 3-D IC cost estimation method can help designers analyze the cost implication for 3-D ICs during the design space exploration at the early stage, and it enables a cost-driven 3-D IC design flow that can guide the design choice toward a cost-effective direction. Based on the proposed cost estimation method, this paper demonstrates two case studies that explore the cost benefits of 3-D integration for application-specific integrated circuit designs and many-core microprocessor designs, respectively. Finally, this paper suggests the optimum partitioning strategy for future 3-D IC designs.
TL;DR: This work presents a design tool, SunFloor 3D, to synthesize application-specific 3D NoCs, and shows that the synthesized topologies result in large power and delay savings when compared to standard topologies.
Abstract: Three-dimensional integrated circuits (3-D ICs) are a promising approach to address the integration challenges faced by current systems on chips (SoCs). Designing an efficient network on chip (NoC) interconnect for a 3-D SoC that meets not only the application performance constraints but also the constraints imposed by the 3-D technology is a significant challenge. In this paper, we present a design tool, SunFloor 3D, to synthesize application-specific 3-D NoCs. The proposed tool determines the best NoC topology for the application, finds paths for the communication flows, assigns the network components to the 3-D layers, and places them in each layer. We perform experiments on several SoC benchmarks and present a comparative study between 3-D and 2-D NoC designs. Our studies show large improvements in interconnect power consumption (average of 38%) and delay (average of 13%) for the 3-D NoC when compared to the corresponding 2-D implementation. Our studies also show that the synthesized topologies result in large power (average of 54%) and delay savings (average of 21%) when compared to standard topologies.
TL;DR: The core idea of this approach is joint relaxation and restriction, which employs consistency relaxation and a coupled bi-directional solution search and leads to about 22% less power dissipation subject to the same timing constraints.
Abstract: Gate sizing and threshold voltage (Vt) assignment are popular techniques for circuit timing and power optimization. Existing methods, by and large, are either sensitivity-driven heuristics or based on discretizing continuous optimization solutions. Sensitivity-driven heuristics are easily trapped in local optima and the discretization may be subject to remarkable errors. In this paper, we propose a systematic combinatorial approach for simultaneous gate sizing and Vt assignment. The core idea of this approach is joint relaxation and restriction, which employs consistency relaxation and coupled bi-directional solution search. The process of joint relaxation and restriction is conducted iteratively to systematically improve solutions. Our algorithm is compared with a state-of-the-art previous work on benchmark circuits. The results from our algorithm can lead to about 22% less power dissipation subject to the same timing constraints.
TL;DR: A run-time strategy for allocating application tasks to embedded multiprocessor systems- on-chip platforms where communication happens via the network-on-chip approach is proposed, which observes more than 70% communication energy savings compared to an arbitrary contiguous task allocation strategy.
Abstract: In this paper, we propose a run-time strategy for allocating application tasks to embedded multiprocessor systems-on-chip platforms where communication happens via the network-on-chip approach. As a novel contribution, we incorporate the user behavior information in the resource allocation process; this allows the system to better respond to real-time changes and to adapt dynamically to different user needs. Several algorithms are proposed for solving the task allocation problem while minimizing the communication energy consumption and network contention. When the user behavior is taken into consideration, we observe more than 70% communication energy savings (with negligible energy and run-time overhead) compared to an arbitrary contiguous task allocation strategy.
TL;DR: DeFer is presented, a fast, high-quality, scalable, and nonstochastic fixed-outline floorplanning algorithm that achieves the best success rate, the best wirelength, the best runtime, and the best area on average compared with all other state-of-the-art floorplanners.
Abstract: In this paper, we present DeFer, a fast, high-quality, scalable, and nonstochastic fixed-outline floorplanning algorithm. DeFer generates a nonslicing floorplan by compacting a slicing floorplan. To find a good slicing floorplan, instead of searching through numerous slicing trees by simulated annealing as in traditional approaches, DeFer considers only one single slicing tree. However, we generalize the notion of the slicing tree based on the principle of deferred decision making (DDM). When two subfloorplans are combined at each node of the generalized slicing tree, DeFer does not specify their orientations, the left-right/top-bottom order between them, or the slice line direction. DeFer does not even specify the slicing tree structure for small subfloorplans. In other words, we defer the decisions on these factors, which are specified arbitrarily at an early step in traditional approaches. Because of DDM, one slicing tree actually corresponds to a large number of slicing floorplan solutions, all of which are efficiently maintained in one single shape curve. With the final shape curve, it is straightforward to choose a good floorplan fitting into the fixed outline. Several techniques are also proposed to further optimize the wirelength. For both fixed-outline and classical floorplanning problems, experimental results show that DeFer achieves the best success rate, the best wirelength, the best runtime, and the best area on average compared with all other state-of-the-art floorplanners.
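The shape-curve combination performed at each slicing-tree node can be sketched directly. Here a shape curve is kept as a list of non-dominated (width, height) implementations; a vertical slice adds widths and takes the maximum height, after which dominated points are pruned. This minimal version shows one slice direction and one operand order only; DeFer's deferred decisions would merge the horizontal combination and both orders into the same curve. All data are illustrative.

```python
# Combine two shape curves under a vertical slice and keep the
# non-dominated (width, height) staircase.

def combine_vertical(curve_a, curve_b):
    pts = sorted((wa + wb, max(ha, hb))
                 for wa, ha in curve_a for wb, hb in curve_b)
    front, best_h = [], float("inf")
    for w, h in pts:                 # widths ascending: keep strict h-improvers
        if h < best_h:
            front.append((w, h))
            best_h = h
    return front

# two subfloorplans, each with two candidate shapes
print(combine_vertical([(1, 4), (2, 2)], [(1, 3), (5, 1)]))
```

Because every node keeps only its Pareto staircase, one generalized slicing tree compactly represents a huge family of floorplans, and picking a point under the fixed outline at the root recovers a concrete solution.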
TL;DR: Based on the pin-constrained chip design, an efficient shuttle-passenger-like droplet manipulation method and test procedure is presented to achieve high-throughput and defect-tolerant well loading.
Abstract: Protein crystallization is a commonly used technique for protein analysis and subsequent drug design. It predicts the 3-D arrangement of the constituent amino acids, which in turn indicates the specific biological function of a protein. Protein crystallization experiments are typically carried out in well-plates in the laboratory. As a result, these experiments are slow, expensive, and error-prone due to the need for repeated human intervention. Recently, droplet-based “digital” microfluidics have been used for executing protein assays on a chip. Protein samples in the form of nanoliter-volume droplets are manipulated using the principle of electrowetting-on-dielectric. We present the design of a multi-well-plate microfluidic biochip for protein crystallization; this biochip can transfer protein samples, prepare candidate solutions, and carry out crystallization automatically. To reduce the manufacturing cost of such devices, we present an efficient algorithm to generate a pin-assignment plan for the proposed design. The resulting biochip enables control of a large number of on-chip electrodes using only a small number of pins. Based on the pin-constrained chip design, we present an efficient shuttle-passenger-like droplet manipulation method and test procedure to achieve high-throughput and defect-tolerant well loading.
TL;DR: A passivity-preserving balanced truncation model reduction method for circuit equations whose special structure allows the numerical effort to be reduced significantly; the property of reciprocity is shown to be preserved in the reduced-order model as well.
Abstract: We present a passivity-preserving balanced truncation model reduction method for differential-algebraic equations arising in circuit simulation. This method is based on balancing the solutions of projected Lur'e equations. By making use of the special structure of circuit equations, we can reduce the numerical effort for balanced truncation significantly. It is shown that the property of reciprocity is also preserved in the reduced-order model. Network topological interpretations of certain circuit effects are given. The presented model reduction method is illustrated by numerical examples.
TL;DR: A test-grading technique that uses the method of output deviations for screening small-delay defects (SDDs), based on a new gate-delay defect probability measure defined to model delay variations for nanometer technologies.
Abstract: Timing-related defects are major contributors to test escapes and in-field reliability problems for very-deep submicrometer integrated circuits. Small delay variations induced by crosstalk, process variations, power-supply noise, as well as resistive opens and shorts can potentially cause timing failures in a design, thereby leading to quality and reliability concerns. We present a test-grading technique that uses the method of output deviations for screening small-delay defects (SDDs). A new gate-delay defect probability measure is defined to model delay variations for nanometer technologies. The proposed technique intelligently selects the best set of patterns for SDD detection from an n-detect pattern set generated using timing-unaware automatic test-pattern generation (ATPG). It offers significantly lower computational complexity and excites a larger number of long paths compared to a current generation commercial timing-aware ATPG tool. Our results also show that, for the same pattern count, the selected patterns provide more effective coverage ramp-up than timing-aware ATPG and a recent pattern-selection method for random SDDs potentially caused by resistive shorts, resistive opens, and process variations.
TL;DR: An efficient technique to perform design space exploration of a multiprocessor platform that minimizes the number of simulations needed to identify a Pareto curve with metrics like energy and delay is presented.
Abstract: This paper presents an efficient technique to perform design space exploration of a multiprocessor platform that minimizes the number of simulations needed to identify a Pareto curve with metrics like energy and delay. Instead of using semi-random search algorithms (like simulated annealing, tabu search, genetic algorithms, etc.), we use the domain knowledge derived from the platform architecture to set up the exploration as a discrete-space Markov decision process. The system walks the design space changing its parameters, performing simulations only when probabilistic information becomes insufficient for a decision. A learning algorithm updates the probabilities of decision outcomes as simulations are performed. The proposed technique has been tested with two multimedia industrial applications, namely the ffmpeg transcoder and the parallel pigz compression algorithm. Results show that the exploration can be performed with 5% of the simulations necessary for the most used algorithms (Pareto simulated annealing, nondominated sorting genetic algorithm, etc.), increasing the exploration speed by more than one order of magnitude.
TL;DR: A contamination-aware droplet routing algorithm for DMFBs is proposed that, evaluated on four widely used bioassays, significantly reduces the cells used and the execution time compared with the state-of-the-art algorithm.
Abstract: Recent advances in digital microfluidic biochips (DMFBs) have revolutionized traditional laboratory procedures. By providing a droplet-based system, a DMFB can perform real-time biological analysis and safety-critical biomedical applications. However, the different droplets being transported and manipulated on the DMFB may introduce a contamination problem caused by liquid residue between different biomolecules. To overcome this problem, a wash droplet is introduced to clean the contaminations on the surface of the microfluidic array. However, current wash-droplet scheduling does not limit the extra cells used or the execution time of the bioassay, thereby significantly degrading reliability and fault tolerance. In this paper, we propose a contamination-aware droplet routing algorithm for DMFBs. To reduce the routing complexity and the number of cells used, we first construct preferred routing tracks by analyzing the global moving vector of droplets to guide the droplet routing. To cope with contaminations within one subproblem, we apply a k-shortest path routing technique to minimize the contaminated spots. Then, to take advantage of multiple wash droplets, we adopt a minimum cost circulation (MCC) algorithm for optimal wash-droplet routing to simultaneously minimize the cells used and the cleaning time. Since the droplet routing problem consists of several subproblems, a look-ahead prediction technique is further used to determine the contaminations between successive subproblems. After that, we can simultaneously clean both contaminations within one subproblem and those between successive subproblems by using the MCC-based algorithm to significantly reduce the execution time and the number of cells used. Based on four widely used bioassays, our algorithm significantly reduces the cells used and the execution time compared with the state-of-the-art algorithm.
TL;DR: A robust global router called NTHU-Route 2.0 is presented that improves the solution quality and runtime of NTHU-Route through the following enhancements: 1) a new history-based cost function; 2) new ordering methods for congested-region identification and rip-up and reroute; and 3) two implementation techniques.
Abstract: This paper presents a robust global router called NTHU-Route 2.0 that improves the solution quality and runtime of NTHU-Route through the following enhancements: 1) a new history-based cost function; 2) new ordering methods for congested-region identification and rip-up and reroute; and 3) two implementation techniques. We report convincing experimental results that show the effectiveness of each individual enhancement. With all of these enhancements together, NTHU-Route 2.0 solves all ISPD98 benchmarks with very good quality. Moreover, NTHU-Route 2.0 routes 7 of the 8 ISPD07 benchmarks and 12 of the 16 ISPD08 benchmarks without any overflow. Compared with other state-of-the-art global routers, NTHU-Route 2.0 produces better solution quality and/or runs more efficiently.
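The history-based cost idea behind negotiation-style rip-up and reroute can be sketched as follows. The function names, the sigmoid congestion term, and the weights here are illustrative assumptions, not NTHU-Route 2.0's actual cost function:

```python
import math

def edge_cost(demand, capacity, history, h_weight=1.0, slope=5.0):
    # Congestion pressure: ~0 while the edge is under capacity,
    # ~1 once demand exceeds capacity (smooth sigmoid transition).
    pressure = 1.0 / (1.0 + math.exp(-slope * (demand - capacity)))
    # Base wirelength cost of 1, inflated by congestion; the accumulated
    # history term keeps raising the price of edges that stay overflowed,
    # pushing later reroutes away from chronically congested regions.
    return 1.0 + (1.0 + h_weight * history) * pressure

def update_history(history, demand, capacity):
    # After each rip-up-and-reroute pass, bump the history of any
    # edge that is still overflowed.
    return history + 1 if demand > capacity else history
```

The key property is that an edge which stays overflowed across iterations becomes progressively more expensive, so nets gradually negotiate their way out of contention.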
TL;DR: This paper proposes novel methods to cluster similar properties and develops efficient learning techniques that can significantly reduce the overall test generation time for the properties in a cluster by sharing knowledge across similar test generation instances.
Abstract: Functional verification is one of the major bottlenecks in system-on-chip design due to the combined effects of increasing complexity and a lack of automated techniques for generating efficient tests. Several promising ideas using bounded model checking have been proposed over the years to efficiently generate counterexamples (tests). Existing approaches use incremental satisfiability to improve counterexample generation for a single property by sharing knowledge across instances of the same property with incremental bounds. In this paper, we present a framework that can efficiently reduce the overall test generation time by exploiting the similarity among different properties. This paper makes two primary contributions: (1) it proposes novel methods to cluster similar properties; and (2) it develops efficient learning techniques that can significantly reduce the overall test generation time for the properties in a cluster by sharing knowledge across similar test generation instances. Our experimental results using both software and hardware benchmarks demonstrate that our approach can drastically reduce (on average, three to five times) the overall test generation time compared to existing methods.
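One plausible way to cluster properties, sketched below, is greedy grouping by the Jaccard similarity of the signal sets each property mentions, since properties constraining overlapping signals are the ones most likely to share learned knowledge. The heuristic, the names, and the threshold are all hypothetical; the paper develops its own clustering methods.

```python
def jaccard(a, b):
    """Similarity of two variable-support sets (1.0 when both are empty)."""
    return len(a & b) / len(a | b) if a | b else 1.0

def cluster_properties(props, threshold=0.5):
    """Greedily assign each property to the first cluster whose
    representative (its first member) is similar enough, otherwise
    open a new cluster.  `props` maps property names to the set of
    design signals each property refers to."""
    clusters = []
    for name, support in props.items():
        for cluster in clusters:
            rep_support = cluster[0][1]
            if jaccard(support, rep_support) >= threshold:
                cluster.append((name, support))
                break
        else:
            clusters.append([(name, support)])
    return clusters
```

Test generation would then be run cluster by cluster, reusing learned conflict clauses within each group.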
TL;DR: Numerical results are presented, and it is shown that the dynamical models can accurately predict important circuit performance metrics and may thus be useful for design optimization of analog systems.
Abstract: This paper presents a system identification technique for generating stable compact models of typical analog circuit blocks in radio frequency systems. The identification procedure is based on minimizing the model error over a given training data set subject to an incremental stability constraint, which is formulated as a semidefinite optimization problem. Numerical results are presented for several analog circuits, including a distributed power amplifier, as well as a MEM device. It is also shown that our dynamical models can accurately predict important circuit performance metrics and may thus be useful for design optimization of analog systems.
TL;DR: The proposed BIRA minimizes area overhead by restricting fault storage to must-repair fault information only, and it analyzes redundancies quickly and efficiently by evaluating all nodes of a branch in parallel with a new analyzer that is simple and easy to implement.
Abstract: As memory capacity and density grow, a corresponding increase in the number of defects decreases the yield and quality of embedded memories for systems-on-chip as well as of commodity memories. For embedded memories, built-in redundancy analysis (BIRA) is widely used to address quality and yield issues by replacing faulty cells with healthy redundant cells. Many BIRA approaches require extra hardware overhead to achieve optimal repair rates, or they sacrifice repair rate to minimize the hardware overhead. An innovative BIRA approach is proposed to achieve optimal repair rates, lower area overhead, and faster analysis. The proposed BIRA minimizes area overhead by restricting fault storage to must-repair fault information only. It analyzes redundancies quickly and efficiently by evaluating all nodes of a branch in parallel with a new analyzer that is simple and easy to implement. Experimental results show that the proposed BIRA achieves much faster analysis than state-of-the-art BIRA approaches, as well as the optimal repair rate, with relatively small area overhead.
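The classic must-repair rule that underlies this kind of analysis can be sketched as: a row holding more faulty cells than there are spare columns can only be fixed by a spare row, and symmetrically for columns. The sketch below shows only that standard filtering step, not the paper's parallel branch evaluation; the function and names are illustrative.

```python
from collections import Counter

def must_repair(faults, spare_rows, spare_cols):
    """Return the rows and columns whose replacement is forced.
    `faults` is a list of (row, col) coordinates of faulty cells."""
    row_counts = Counter(r for r, _ in faults)
    col_counts = Counter(c for _, c in faults)
    # A row with more faults than available spare columns cannot be
    # covered fault-by-fault with columns, so a spare row is forced
    # (and the symmetric argument forces spare columns).
    forced_rows = {r for r, n in row_counts.items() if n > spare_cols}
    forced_cols = {c for c, n in col_counts.items() if n > spare_rows}
    return forced_rows, forced_cols
```

Applying this rule first shrinks the remaining sparse-repair problem that the branch-based analyzer has to search.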
TL;DR: In this paper, the authors present an approach to automatically validate the implementation against its initial high-level specification using insights from translation validation, automated theorem proving, and relational approaches to reasoning about programs.
Abstract: The growing complexity of systems and their implementation in silicon encourages designers to look for ways to model designs at higher levels of abstraction and then incrementally build portions of these designs - automatically or manually - from these high-level specifications. Unfortunately, this translation process itself can be buggy, which can create a mismatch between what a designer intends and what is actually implemented in the circuit. Therefore, checking whether the implementation is a refinement of, or equivalent to, its initial specification is of tremendous value. In this paper, we present an approach to automatically validate the implementation against its initial high-level specification using insights from translation validation, automated theorem proving, and relational approaches to reasoning about programs. In our experiments, we first focus on concurrent systems modeled as communicating sequential processes and show that their refinements can be validated using our approach. We then apply our validation approach to a realistic scenario: a parallelizing high-level synthesis framework called Spark. We present the details of our algorithm and experimental results.