
Showing papers in "ACM Transactions on Design Automation of Electronic Systems in 2005"


Journal ArticleDOI
TL;DR: A detailed and flexible power model, integrated into the widely used Versatile Place and Route (VPR) CAD tool, is described; it estimates the dynamic, short-circuit, and leakage power consumed by FPGAs.
Abstract: Power has become a critical issue for field-programmable gate array (FPGA) vendors. Understanding the power dissipation within FPGAs is the first step in developing power-efficient architectures and computer-aided design (CAD) tools for FPGAs. This article describes a detailed and flexible power model which has been integrated into the widely used Versatile Place and Route (VPR) CAD tool. This power model estimates the dynamic, short-circuit, and leakage power consumed by FPGAs. It is the first flexible power model developed to evaluate architectural tradeoffs and the efficiency of power-aware CAD tools for a variety of FPGA architectures, and is freely available for noncommercial use. The model is flexible, in that it can estimate the power for a wide variety of FPGA architectures, and it is fast, in that it does not require extensive simulation, meaning it can be used to explore a large architectural space. We show how the model can be used to investigate the impact of various architectural parameters on the energy consumed by the FPGA, focusing on the segment length, switch block topology, lookup-table size, and cluster size.

187 citations
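As a rough illustration of the kind of estimate such a power model aggregates, the sketch below sums the textbook dynamic-power term 0.5·C·Vdd²·f·α over a few nets. The capacitances, supply voltage, clock frequency, and switching activities are illustrative assumptions, not values taken from the VPR power model.

```python
# Minimal sketch of per-net dynamic power estimation (0.5 * C * Vdd^2 * f * alpha),
# the dominant term an FPGA power model sums over all nets. All numbers below are
# illustrative assumptions, not figures from the VPR power model.

def dynamic_power(c_load, vdd, freq, activity):
    """Average dynamic power of one net in watts."""
    return 0.5 * c_load * vdd ** 2 * freq * activity

nets = [
    # (load capacitance in farads, switching activity per cycle)
    (120e-15, 0.20),  # short routing segment
    (480e-15, 0.10),  # long routing segment
    (60e-15, 0.35),   # LUT input pin
]

vdd, freq = 1.8, 100e6
total = sum(dynamic_power(c, vdd, freq, a) for c, a in nets)
print(f"estimated dynamic power: {total * 1e6:.2f} uW")
```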


Journal ArticleDOI
TL;DR: This article mixes two encoding techniques to reduce test data volume, test pattern delivery time, and power dissipation in scan test applications by using run-length encoding followed by Huffman encoding.
Abstract: This article mixes two encoding techniques to reduce test data volume, test pattern delivery time, and power dissipation in scan test applications. This is achieved by using run-length encoding followed by Huffman encoding. This combination is especially effective when the percentage of don't cares in a test set is high, which is a common case in today's large systems-on-chips (SoCs). Our analysis and experimental results confirm that achieving up to an 89% compression ratio and a 93% scan-in power reduction is possible for scan-testable circuits such as ISCAS89 benchmarks.

139 citations
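To make the two-stage idea concrete, here is a small, hedged sketch: don't-care bits are filled with 0 to lengthen the runs, the vector is run-length encoded as lengths of 0-runs, and the run lengths are then Huffman-coded. The fill policy, run alphabet, and test cubes are illustrative assumptions, not the paper's exact encoding.

```python
# Sketch of run-length encoding followed by Huffman coding of the run lengths.
# Assumptions for illustration: 'X' (don't care) is filled with '0', and the
# symbols being Huffman-coded are the lengths of 0-runs terminated by a '1'.
import heapq
from collections import Counter
from itertools import count

def zero_runs(bits):
    """Lengths of 0-runs, each terminated by a '1' (a trailing run may be open)."""
    runs, run = [], 0
    for b in bits:
        if b == '1':
            runs.append(run)
            run = 0
        else:
            run += 1
    if run:
        runs.append(run)
    return runs

def huffman_code(symbols):
    """Map each symbol to a prefix-free codeword based on its frequency."""
    tie = count()  # tie-breaker so the heap never compares dicts
    heap = [(f, next(tie), {s: ''}) for s, f in Counter(symbols).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: '0' + w for s, w in c1.items()}
        merged.update({s: '1' + w for s, w in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]

cubes = ["0XX0000X10000000X01", "X000110XXX000001000X", "00X1000000X00010001X"]
bits = ''.join(c.replace('X', '0') for c in cubes)   # fill don't cares with 0
runs = zero_runs(bits)
code = huffman_code(runs)
encoded = ''.join(code[r] for r in runs)
print(f"{len(bits)} scan bits -> {len(encoded)} encoded bits (runs: {runs})")
```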


Journal ArticleDOI
TL;DR: The essence of the new approach is the addition of a set of design and timing constraints that encodes the author's signature; the resulting signature data is highly resilient, difficult to detect and remove, yet easy to verify, and can be embedded in designs with very low hardware overhead.
Abstract: We introduce dynamic watermarking techniques for protecting the value of intellectual property of CAD and compilation tools and reusable design components. The essence of the new approach is the addition of a set of design and timing constraints which encodes the author's signature. The constraints are selected in such a way that they result in a minimal hardware overhead while embedding a unique signature that is difficult to remove and forge. Techniques are applicable in conjunction with an arbitrary behavioral synthesis task such as scheduling, assignment, allocation, transformation, and template matching. On a large set of design examples, studies indicate the effectiveness of the new approach that results in signature data that is highly resilient, difficult to detect and remove, and yet is easy to verify and can be embedded in designs with very low hardware overhead. For example, the probability that the same design with the embedded signature is obtained by any other designers by themselves is less than 1 in 10^102, and no register overhead was incurred. The probability of tampering, the probability that part of the embedded signature can be removed by random attempts, is shown to be extremely low, and the watermark is additionally protected from such tampering with error-correcting codes.

137 citations


Journal ArticleDOI
TL;DR: The polynomial-time algorithm serves as the basis for a highly efficient novel heuristic for the NP-hard version of the problem, which makes use of problem-specific knowledge, and can thus find high-quality solutions rapidly.
Abstract: One of the most crucial steps in the design of embedded systems is hardware/software partitioning, that is, deciding which components of the system should be implemented in hardware and which ones in software. Most formulations of the hardware/software partitioning problem are NP-hard, so the majority of research efforts on hardware/software partitioning have focused on developing efficient heuristics. This article considers the combinatorial structure behind hardware/software partitioning. Two similar versions of the partitioning problem are defined, one of which turns out to be NP-hard, whereas the other one can be solved in polynomial time. This helps in understanding the real cause of complexity in hardware/software partitioning. Moreover, the polynomial-time algorithm serves as the basis for a highly efficient novel heuristic for the NP-hard version of the problem. Unlike general-purpose heuristics such as genetic algorithms or simulated annealing, this heuristic makes use of problem-specific knowledge, and can thus find high-quality solutions rapidly. Moreover, it has the unique characteristic that it also calculates lower bounds on the optimum solution. It is demonstrated on several benchmarks and also large random examples that the new algorithm clearly outperforms other heuristics that are generally applied to hardware/software partitioning.

102 citations
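To make the combinatorial structure tangible, the toy sketch below enumerates all hardware/software assignments of a four-node task graph and keeps the cheapest one that meets a software-time budget. The cost model, weights, and budget are made-up illustrations of the general problem, not the article's formulation or its heuristic.

```python
# Brute-force illustration of hardware/software partitioning: each task goes to
# HW or SW; minimize hardware area subject to a bound on software time plus
# communication across the cut. All costs are made-up example values.
from itertools import product

hw_cost = {'a': 5, 'b': 8, 'c': 3, 'd': 6}               # area if mapped to HW
sw_time = {'a': 4, 'b': 9, 'c': 2, 'd': 7}               # time if mapped to SW
comm = {('a', 'b'): 2, ('b', 'c'): 1, ('c', 'd'): 3}     # penalty if edge is cut
TIME_BUDGET = 12

def evaluate(assign):
    area = sum(hw_cost[v] for v, side in assign.items() if side == 'HW')
    time = sum(sw_time[v] for v, side in assign.items() if side == 'SW')
    time += sum(w for (u, v), w in comm.items() if assign[u] != assign[v])
    return area, time

best = None
for sides in product(('HW', 'SW'), repeat=len(hw_cost)):
    assign = dict(zip(hw_cost, sides))
    area, time = evaluate(assign)
    if time <= TIME_BUDGET and (best is None or area < best[0]):
        best = (area, time, assign)

print(best)  # minimum-area partition that meets the time budget
```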


Journal ArticleDOI
TL;DR: This tutorial summarizes results from recent optimality and scalability studies of existing placement tools, and highlights the recent progress on large-scale circuit placement, including techniques for wirelength minimization, routability optimization, and performance optimization.
Abstract: Placement is one of the most important steps in the RTL-to-GDSII synthesis process, as it directly defines the interconnects, which have become the bottleneck in circuit and system performance in deep submicron technologies. The placement problem has been studied extensively in the past 30 years. However, recent studies show that existing placement solutions are surprisingly far from optimal. The first part of this tutorial summarizes results from recent optimality and scalability studies of existing placement tools. These studies show that the results of leading placement tools from both industry and academia may be up to 50% to 150% away from optimal in total wirelength. If such a gap can be closed, the corresponding performance improvement will be equivalent to several technology-generation advancements. The second part of the tutorial highlights the recent progress on large-scale circuit placement, including techniques for wirelength minimization, routability optimization, and performance optimization.

77 citations


Journal ArticleDOI
TL;DR: A generic reconfigurable online event-based NoC monitoring service, based on hardware probes attached to NoC components, offering run-time observability of NoC behavior and supporting system-level debugging is proposed.
Abstract: Networks on chip (NoCs) are a scalable interconnect solution for multiprocessor systems on chip. We propose a generic reconfigurable online event-based NoC monitoring service, based on hardware probes attached to NoC components, offering run-time observability of NoC behavior and supporting system-level debugging. We present a probe architecture, its programming model, traffic management strategies, and a cost analysis. We prove feasibility via a prototype implementation for the AEthereal NoC. Two MPEG NoC examples show that the monitoring service area, without advanced optimizations, is 17--24% of the NoC area. Two realistic monitoring examples show that monitoring traffic is several orders of magnitude lower than the 2GB/s/link raw bandwidth.

58 citations


Journal ArticleDOI
TL;DR: This work shows how to place macros consistently with large numbers of small standard cells and addresses the computational difficulty of layout problems involving large macros and numerous small logic cells at the same time.
Abstract: While recent literature on circuit layout addresses large-scale standard-cell placement, the authors typically assume that all macros are fixed. Floorplanning techniques are very good at handling macros, but do not scale to hundreds of thousands of placeable objects. Therefore we combine floorplanning techniques with placement techniques to solve the more general placement problem. Our work shows how to place macros consistently with large numbers of small standard cells. Proposed techniques can also be used to guide circuit designers who prefer to place macros by hand. We address the computational difficulty of layout problems involving large macros and numerous small logic cells at the same time. Proposed algorithms are evaluated in the context of wirelength minimization because a computational method that is not scalable in optimizing wirelength is unlikely to be successful for more complex objectives (congestion, delay, power, etc.). We propose several different design flows to place mixed-size placement instances. The first flow relies on an arbitrary black-box standard-cell placer to obtain an initial placement and then removes possible overlaps using a fixed-outline floorplanner. This results in valid placements for macros, which are considered fixed. Remaining standard cells are then placed by another call to the standard-cell placer. In the second flow a standard-cell placer generates an initial placement and a force-directed placer is used in the engineering change order (ECO) mode to generate an overlap-free placement. Empirical evaluation on IBM benchmarks shows that in most cases our proposed flows compare favorably with previously published mixed-size placers, Kraftwerk, and the mixed-size floor-placer proposed at the 2003 Conference on Design, Automation, and Test in Europe (DATE 2003), and are competitive with mPG-MS.

49 citations


Journal ArticleDOI
TL;DR: The foundations of the layered approach to modeling and performance simulation of PHMs are described, showing an example design space of a network processor explored using the simulation approach.
Abstract: Heterogeneous multiprocessing is the future of chip design with the potential for tens to hundreds of programmable elements on single chips within the next several years. These chips will have heterogeneous, programmable hardware elements that lead to different execution times for the same software executing on different resources as well as a mix of desktop-style and embedded-style software. They will also have a layer of programming across multiple programmable elements forming the basis of a new kind of programmable system which we refer to as a Programmable Heterogeneous Multiprocessor (PHM). Current modeling approaches use instruction set simulation for performance modeling, but this will become far too prohibitive in terms of simulation time for these larger designs. The fundamental question is what the next higher level of design will be. The high-level modeling, simulation and design required for these programmable systems poses unique challenges, representing a break from traditional hardware design. Programmable systems, including layered concurrent software executing via schedulers on concurrent hardware, are not characterizable with traditional component-based hierarchical composition approaches, including discrete event simulation. We describe the foundations of our layered approach to modeling and performance simulation of PHMs, showing an example design space of a network processor explored using our simulation approach.

46 citations


Journal ArticleDOI
TL;DR: This article proposes an algorithm to iteratively find the variable partition such that the maximum energy saving is achieved while satisfying the given performance constraint.
Abstract: Many high-end DSP processors employ both multiple memory banks and heterogeneous register files to improve performance and power consumption. The complexity of such architectures presents a great challenge to compiler design. In this article, we present an approach for variable partitioning and instruction scheduling to maximally exploit the benefits provided by such architectures. Our approach is built on a novel graph model which strives to capture both performance and power demands. We propose an algorithm to iteratively find the variable partition such that the maximum energy saving is achieved while satisfying the given performance constraint. Experimental results demonstrate the effectiveness of our approach.

37 citations


Journal ArticleDOI
TL;DR: An agile formal methodology based on Extreme Programming concepts to construct abstract models from natural language specifications of complex systems, focusing on Prescriptive Formal Models (PFMs) that capture the specification of the system under design in a mathematically precise manner is presented.
Abstract: We present an agile formal methodology named eXtreme Formal Modeling (XFM), based on Extreme Programming (XP) concepts to construct abstract models from natural language specifications of complex systems. In particular, we focus on Prescriptive Formal Models (PFMs) that capture the specification of the system under design in a mathematically precise manner. Such models can be used as golden reference models for formal verification, test generation, coverage monitor generation, etc. This methodology for incrementally building PFMs works by adding user stories expressed as LTL formulae gleaned from the natural language specifications, one by one, into the model. XFM builds the models, retaining correctness with respect to incrementally added properties by regressively model-checking all the LTL properties captured theretofore in the model. We illustrate XFM with a graded set of examples consisting of a traffic light controller and a DLX pipeline. To make the regressive model-checking steps feasible with current model-checking tools, we need to control the model size increments at each subsequent step in the process. We therefore analyze the effects of ordering the LTL properties in XFM on the state-space growth rate of the model. We compare three different property-ordering methodologies: ad hoc ordering, property-based ordering, and predicate-based ordering. We experiment on the models of the ISA bus monitor and the arbitration phase of the Pentium Pro bus. We experimentally show and mathematically reason that the predicate-based ordering is the best among these orderings. Finally, we present a GUI-based toolbox that we implemented to build PFMs using XFM.

30 citations
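For flavor, the block below lists the kind of LTL user stories that might be added one at a time in an XFM-style flow for a simple three-phase traffic light; these example properties are hypothetical illustrations, not the properties used in the article.

```latex
% Hypothetical XFM-style user stories for a traffic light with signals
% green, yellow, red (illustrative only).
\begin{align*}
  &\mathbf{G}\,\bigl(\mathit{green} \rightarrow \mathbf{X}\,(\mathit{green} \lor \mathit{yellow})\bigr)
     && \text{green is followed by green or yellow}\\
  &\mathbf{G}\,\bigl(\mathit{yellow} \rightarrow \mathbf{X}\,\mathit{red}\bigr)
     && \text{yellow always steps to red}\\
  &\mathbf{G}\,\bigl(\mathit{red} \rightarrow \mathbf{F}\,\mathit{green}\bigr)
     && \text{every red is eventually followed by green}
\end{align*}
```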


Journal ArticleDOI
TL;DR: A very efficient technology mapping algorithm, k_m_flow, is developed for a novel field-programmable gate array (FPGA) architecture that is based on k-input single-output programmable logic array- (PLA-) like cells, or k/m-macrocells, and can outperform 4-LUT-based FPGAs on this set of benchmarks.
Abstract: In this article, we study the technology mapping problem for a novel field-programmable gate array (FPGA) architecture that is based on k-input single-output programmable logic array- (PLA-) like cells, or k/m-macrocells. Each cell in this architecture can implement a single output function of up to k inputs and up to m product terms. We develop a very efficient technology mapping algorithm, k_m_flow, for this new type of architecture. The experimental results show that our algorithm can achieve depth-optimality on almost all the testcases in a set of 16 Microelectronics Center of North Carolina (MCNC) benchmarks. Furthermore, it is shown that on this set of benchmarks, with only a relatively small number of product terms (m ≤ k + 3), the k/m-macrocell-based FPGAs can achieve the same or similar mapping depth compared with the traditional k-input single-output lookup table- (k-LUT-) based FPGAs. We also investigate the total area and delay of k/m-macrocell-based FPGAs and compare them with those of the commonly used 4-LUT-based FPGAs. The experimental results show that k/m-macrocell-based FPGAs can outperform 4-LUT-based FPGAs in terms of both delay and area after placement and routing by VPR on this set of benchmarks.

Journal ArticleDOI
TL;DR: In this paper, the equivalence between behavioral level and RTL designs can be defined precisely using the proposed "attribute statements" in an interactive fashion, and implementation issues as well as considerations on real life industrial design examples are also presented.
Abstract: In this article, we present techniques for comparison between behavioral level and register transfer level (RTL) design descriptions by mapping the designs into virtual controllers and virtual datapaths. We also discuss how the equivalence between behavioral level and RTL designs can be defined precisely using the proposed “attribute statements” in an interactive fashion. Implementation issues as well as considerations on real-life industrial design examples are also presented.

Journal ArticleDOI
TL;DR: New datapath scheduling algorithms that use multiple supply voltages and dynamic frequency clocking in a coordinated manner in order to reduce the energy consumption ofdatapath circuits are developed.
Abstract: Recently, dynamic frequency scaling has been explored at the CPU and system levels for power optimization. Low-power datapath scheduling using multiple supply voltages has been well researched. In this work, we develop new datapath scheduling algorithms that use multiple supply voltages and dynamic frequency clocking in a coordinated manner in order to reduce the energy consumption of datapath circuits. In dynamic frequency clocking, the functional units can be operated at different frequencies depending on the computations occurring within the datapath during a given clock cycle. The strategy is to schedule high-energy units, such as multipliers, at lower frequencies so that they can be operated at lower voltages to reduce energy consumption, and the low-energy units, such as adders, at higher frequencies to compensate for speed. The proposed time- and resource-constrained algorithms have been applied to various high-level synthesis benchmark circuits under different time and resource constraints. The experimental results show significant reduction in energy for both the algorithms.
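Since per-operation dynamic energy scales roughly with the square of the supply voltage, a quick back-of-the-envelope comparison shows why running the high-energy multipliers at a reduced voltage pays off. The voltages, effective capacitances, and operation mix below are illustrative assumptions, not the article's benchmarks or algorithms.

```python
# Rough sketch: energy per operation ~ C_eff * Vdd^2. Compare a single-voltage
# schedule with one that runs multipliers at a lower voltage (and frequency).
# All numbers are illustrative assumptions.

def op_energy(c_eff, vdd):
    return c_eff * vdd ** 2

ops = [('mult', 8.0), ('mult', 8.0), ('add', 1.0), ('add', 1.0)]  # (unit, C_eff)

single_v = sum(op_energy(c, 3.3) for _, c in ops)                 # all at 3.3 V
mixed_v = sum(op_energy(c, 2.4 if unit == 'mult' else 3.3)        # mults at 2.4 V
              for unit, c in ops)

print(f"energy saving: {100 * (1 - mixed_v / single_v):.1f}%")
```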

Journal ArticleDOI
TL;DR: This work presents an algorithm that schedules a chain of operations with data dependencies among consecutive operations at a single step, and uses a technique from the computational geometry domain to solve the matching problem.
Abstract: Complexities of applications implemented on embedded and programmable systems grow with the advances in capacities and capabilities of these systems. Mapping applications onto them manually is becoming a very tedious task. This draws attention to using high-level synthesis within design flows. Meanwhile, it is essential to provide a flexible formulation of optimization objectives as well as to perform efficient planning for various design objectives early on in the design flow. In this work, we address these issues in the context of data flow graph (DFG) scheduling, which is an essential element within the high-level synthesis flow. We present an algorithm that schedules a chain of operations with data dependencies among consecutive operations at a single step. This local problem is repeated to generate the schedule for the whole DFG. The local problem is formulated as a maximum weight noncrossing bipartite matching. We use a technique from the computational geometry domain to solve the matching problem. This technique provides a theoretical guarantee on the solution quality for scheduling a single chain of operations. Although still being local, this provides a relatively wider perspective on the global scheduling objectives. In our experiments we compared the latencies obtained using our algorithm with the optimal latencies given by the exact solution to the integer linear programming (ILP) formulation of the problem. In 9 out of 14 DFGs tested, our algorithm found the optimal solution, while generating latencies comparable to the optimal solution in the remaining five benchmarks. The formulation of the objective function in our algorithm provides flexibility to incorporate different optimization goals. We present examples of how to exploit the versatility of our algorithm with specific examples of objective functions and experimental results on the ability of our algorithm to capture these objectives efficiently in the final schedules.
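The local step described above is a maximum-weight noncrossing bipartite matching. As a simpler stand-in for the article's computational-geometry technique, the sketch below solves that matching problem with a classical O(n·m) dynamic program; the weight values are made up for illustration.

```python
# Maximum-weight noncrossing bipartite matching between two linearly ordered
# sides (e.g., a chain of operations vs. candidate slots) via dynamic
# programming. This illustrates the matching subproblem, not the article's
# geometric algorithm; the weights are made-up example values.

def max_weight_noncrossing_matching(w):
    """w[i][j] >= 0 is the gain of matching left item i with right item j."""
    n, m = len(w), len(w[0])
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j],                        # skip left item i
                           dp[i][j - 1],                        # skip right item j
                           dp[i - 1][j - 1] + w[i - 1][j - 1])  # match i with j
    return dp[n][m]

# three chained operations vs. four candidate (cycle, unit) slots
weights = [[4, 2, 0, 0],
           [0, 5, 3, 0],
           [0, 1, 6, 2]]
print(max_weight_noncrossing_matching(weights))  # 15 = 4 + 5 + 6
```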

Journal ArticleDOI
TL;DR: A routing-driven methodology for scan chain ordering with minimum wirelength objective is presented and substantial wirelength reductions for the routing-based flow versus the traditional placement-based flow are shown.
Abstract: Scan chain insertion can have a large impact on routability, wirelength, and timing of the design. We present a routing-driven methodology for scan chain ordering with minimum wirelength objective. A routing-based approach to scan chain ordering, while potentially more accurate, can result in TSP (Traveling Salesman Problem) instances which are asymmetric and highly nonmetric; this may require a careful choice of solvers. We evaluate our new methodology on recent industry place-and-route blocks with 1200 to 5000 scan cells. We show substantial wirelength reductions for the routing-based flow versus the traditional placement-based flow. In a number of our test cases, over 86% of scan routing overhead is saved. Even though our experiments are, so far, timing oblivious, the routing-based flow also improves evaluated timing, and practical timing-driven extensions appear feasible.
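For contrast with the routing-driven flow, the sketch below builds a scan order with the simplest placement-based heuristic: a greedy nearest-neighbor tour over scan-cell locations using Manhattan distance. The cell coordinates and scan-in location are made up, and this is only a baseline for the kind of TSP instance discussed above, not the article's method.

```python
# Tiny placement-based baseline for scan chain ordering: greedy nearest-neighbor
# ordering of scan cells by Manhattan distance starting from the scan-in pin.
# Coordinates are made-up example data; the article's flow uses routed wirelength.

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def order_scan_chain(scan_in, cells):
    """Return a cell order built greedily from the scan-in location."""
    order, current, remaining = [], scan_in, dict(cells)
    while remaining:
        name = min(remaining, key=lambda c: manhattan(current, remaining[c]))
        order.append(name)
        current = remaining.pop(name)
    return order

cells = {'FF0': (2, 9), 'FF1': (8, 1), 'FF2': (3, 8), 'FF3': (7, 2), 'FF4': (1, 5)}
chain = order_scan_chain(scan_in=(0, 0), cells=cells)
length = sum(manhattan(cells[a], cells[b]) for a, b in zip(chain, chain[1:]))
print(chain, "intra-chain wirelength:", length)
```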

Journal ArticleDOI
TL;DR: This article shows how phase shifters can be synthesized uniformly and efficiently for any LFSM, including the aforementioned ones, and demonstrates the method by showing how to obtain phase shifter for two-dimensional cellular automata and for ring generators.
Abstract: Phase shifters are used to shift the bit sequences produced by the successive stages of a built-in test pattern generator (TPG) based on a linear finite state machine (LFSM) by a specified amount (phase shift) relative to the characteristic sequence. An upper bound on the number of taps to be used for each phase shifter and a lower bound on the phase-shift value between successive stages of the TPG mechanism are the general parameters of the problem. Methods to design such phase shifters have been given in the past separately for Type-1 LFSRs, Type-2 LFSRs, and three-neighborhood cellular automata. In this article, we show how phase shifters can be synthesized uniformly and efficiently for any LFSM, including the aforementioned ones. We demonstrate the method by showing how to obtain phase shifters for two-dimensional cellular automata and for ring generators.

Journal ArticleDOI
TL;DR: Two different implementations of TIS are presented, one of which employs a dedicated hardware module for test vector generation, while the other is a software-based approach that reads test vectors from memory.
Abstract: TIS is an instruction-level methodology for processor core self-testing that enhances the instruction set of a CPU with test instructions. Since the functionality of test instructions is the same as that of the NOP instruction, NOP instructions can be replaced with test instructions. Online testing can be accomplished without any performance penalty. TIS tests different parts of the processor and detects stuck-at faults. This method can be employed in offline and online testing of single-cycle, multicycle, and pipelined processors. However, TIS is more appropriate for online testing of pipelined architectures in which NOP instructions are frequently executed because of data, control, and structural hazards. Running test instructions instead of these NOP instructions, TIS utilizes the time that is otherwise wasted by NOPs. In this article, two different implementations of TIS are presented. One implementation employs a dedicated hardware module for test vector generation, while the other is a software-based approach that reads test vectors from memory. These two approaches are implemented on a pipelined processor core and their area overheads are compared. To demonstrate the appropriateness of the TIS test technique, several programs are executed and fault coverage results are presented.

Journal ArticleDOI
TL;DR: To solve the obstacle-avoiding rectilinear and 4-geometry Steiner tree problems, a heuristic algorithm is presented that utilizes a cost accumulation scheme based on the maze router to determine the Torricelli vertices (points) for improving the quality of multiterminal nets.
Abstract: The maze routing problem is to find an optimal path between a given pair of cells on a grid plane. Lee's algorithm and its variants, probably the most widely used maze routing method, fail to work in the 4-geometry of the grid plane. Our algorithm solves this problem by using a suitable data structure for uniform wave propagation in the 4-geometry, 8-geometry, etc. The algorithm guarantees finding an optimal path if it exists and has linear time and space complexities. Next, to solve the obstacle-avoiding rectilinear and 4-geometry Steiner tree problems, a heuristic algorithm is presented. The algorithm utilizes a cost accumulation scheme based on the maze router to determine the Torricelli vertices (points) for improving the quality of multiterminal nets. Our experimental results show that the algorithm works well in practice. Furthermore, using the 4-geometry router, path lengths can be reduced by up to 12% compared to those in the rectilinear router.
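For reference, here is the classical rectilinear (2-geometry) Lee-style wave expansion that the article generalizes: a BFS over grid cells that is guaranteed to return a shortest obstacle-avoiding path if one exists. The grid and terminals are made up, and the 4-geometry and Steiner-tree extensions are not shown.

```python
# Classical Lee-style maze router on a rectilinear grid: BFS wave expansion
# from the source; backtrack via predecessor links once the target is reached.
# The blockage map and terminals below are made-up example data.
from collections import deque

def lee_route(grid, src, dst):
    """grid[r][c] == 1 marks an obstacle; returns a shortest path or None."""
    rows, cols = len(grid), len(grid[0])
    prev = {src: None}          # also serves as the visited set
    frontier = deque([src])
    while frontier:
        cell = frontier.popleft()
        if cell == dst:
            path = []
            while cell is not None:
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in prev):
                prev[(nr, nc)] = cell
                frontier.append((nr, nc))
    return None

blockage = [[0, 0, 0, 0, 0],
            [0, 1, 1, 1, 0],
            [0, 0, 0, 1, 0],
            [1, 1, 0, 1, 0],
            [0, 0, 0, 0, 0]]
print(lee_route(blockage, (0, 0), (4, 4)))
```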

Journal ArticleDOI
TL;DR: Techniques are proposed to address the problem of yield loss due to incidental overtesting of functionally-untestable transition faults, along with an efficient adjustment to the algorithm that keeps the overtest ratio low.
Abstract: Scan-based transition tests are added to improve the detection of speed failures in sequential circuits. Empirical data suggests that both data volume and application time will increase dramatically for such transition testing. Techniques to address the above problem for a class of transition tests, called enhanced transition tests, are proposed in this article. The first technique, which combines the proposed transition test chains with the ATE repeat capability, reduces test data volume by 46.5% when compared with transition tests computed by a commercial transition test ATPG tool. However, the test application time may sometimes increase. To address the test time issue, a new DFT technique, Exchange Scan, is proposed. Exchange Scan reduces both data volume and application time by 46.5%. These techniques rely on the use of hold-scan cells and highlight the effectiveness of hold-scan design to address test time and test data volume issues. In addition, we address the problem of yield loss due to incidental overtesting of functionally-untestable transition faults, and we formulate an efficient adjustment to the algorithm to keep the overtest ratio low. Our experimental results show that up to 14.5% reduction in overtest ratio can be achieved, with an average overtest reduction of 4.68%.

Journal ArticleDOI
TL;DR: It is shown that by using band-limited transient test signals, which can be supported by wafer-probe test instrumentation, significant numbers of bad ICs can be detected early during the wafer-probe test.
Abstract: It is well known that wafer-probe test costs of analog ICs are an order of magnitude less than the corresponding test costs of assembled packages. It is therefore natural to push as much of the testing process into wafer-probe testing as possible to reduce the scope of assembled package testing. However, the signal drive and response observation capabilities during wafer probe testing are limited in comparison to assembled packages. In this article, it is shown that by using band-limited transient test signals, which can be supported by wafer-probe test instrumentation, significant numbers of bad ICs can be detected early during the wafer-probe test. The optimal test stimuli are determined by cooptimizing the wafer-probe and assembled package test waveforms. Overall test costs, including the cost of packaging bad ICs, are minimized and are reduced up to four times. The proposed method has been validated using hardware test data, which were obtained through measurements made on a prototype.

Journal ArticleDOI
TL;DR: The proposed algorithm is combined with existing postprocessing procedures that find the gates that can be duplicated, and on a set of benchmark examples it is shown that, except for some cases, the proposed algorithm finds an optimal solution of the given problem.
Abstract: Minimum area is one of the important objectives in technology mapping for lookup table-based field-programmable gate arrays (FPGAs). Although there is an algorithm that can find an optimal solution in polynomial time for the minimal-area FPGA technology mapping problem without gate duplication, its time complexity can grow exponentially with the number of inputs of the lookup tables. This article proposes an algorithm that approximates the area-optimal solution with lower time complexity. The time complexity of this algorithm is proven theoretically to be bounded by O(n^3), where n is the total number of gates in the given circuit. It is shown that, except for some cases, the proposed algorithm can find an optimal solution of a given problem. We have combined the proposed algorithm with the existing postprocessing procedures which are used to find the gates that can be duplicated on a set of benchmark examples. The experimental results demonstrate the effectiveness of our algorithm.

Journal ArticleDOI
TL;DR: Four different approaches for reducing the number of accesses to the Translation Look-aside Buffer are proposed, and it is experimentally demonstrated that one of these schemes, which uses a combination of compiler and hardware enhancements, can reduce iTLB dynamic power by over 85% in most cases.
Abstract: Power consumption and power density for the Translation Look-aside Buffer (TLB) are important considerations not only in its design, but can have a consequence on cache design as well. After pointing out the importance of instruction TLB (iTLB) power optimization, this article embarks on a new philosophy for reducing the number of accesses to this structure. The overall idea is to keep a translation currently being used in a register and avoid going to the iTLB as far as possible---until there is a page change. We propose four different approaches for achieving this, and experimentally demonstrate that one of these schemes that uses a combination of compiler and hardware enhancements can reduce iTLB dynamic power by over 85% in most cases. The proposed approaches can work with different instruction-cache (iL1) lookup mechanisms and achieve significant iTLB power savings without compromising on performance. Their importance grows with higher iL1 miss rates and larger page sizes. They can work very well with large iTLB structures that can possibly consume more power and take longer to lookup, without the iTLB getting into the common case. Further, we also experimentally demonstrate that they can provide performance savings for virtually indexed, virtually tagged iL1 caches, and can even make physically indexed, physically tagged iL1 caches a possible choice for implementation.
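A tiny trace-driven count illustrates the idea of keeping the current page's translation in a register and consulting the iTLB only on a page change. The instruction-address trace and 4 KB page size are made-up assumptions, not the article's benchmarks.

```python
# Toy illustration: count iTLB lookups with and without a "current page
# translation" register that is reused until the fetch address leaves the page.
# Trace and page size are made-up assumptions.
PAGE_SHIFT = 12  # 4 KB pages

def itlb_accesses(trace, use_page_register):
    accesses, last_page = 0, None
    for pc in trace:
        page = pc >> PAGE_SHIFT
        if not use_page_register or page != last_page:
            accesses += 1          # translation must come from the iTLB
        last_page = page
    return accesses

# mostly sequential fetches inside one page, then a far jump to another page
trace = [0x400000 + 4 * i for i in range(1000)] + \
        [0x7F0000 + 4 * i for i in range(1000)]
print("baseline:", itlb_accesses(trace, False),
      "with page register:", itlb_accesses(trace, True))
```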

Journal ArticleDOI
TL;DR: A technique that simplifies the design of pipelined circuits and automates the specification and verification of structural-hazard and datapath correctness properties for pipelined circuits.
Abstract: This article describes a technique that simplifies the design of pipelined circuits and automates the specification and verification of structural-hazard and datapath correctness properties for pipelined circuits. The technique is based upon a template for pipeline stages, a control-circuit cell library, a decomposition of structural hazard and datapath correctness into a collection of simple properties, and a prototype design tool that generates verification scripts for use by external tools. Our case studies include scalar and superscalar implementations of a 32-bit OpenRISC integer microprocessor.

Journal ArticleDOI
TL;DR: This article proposes a novel risk management based technique that is capable of generating an effective tradeoff between power and “risk”: the more the risk, the less the power.
Abstract: This article addresses the problem of voltage scheduling in unpredictable situations. The voltage scheduling problem assigns voltages to operations such that the power is minimized under a clock delay constraint. In the presence of unpredictabilities, meeting the clock latency constraint cannot be guaranteed. This article proposes a novel risk management based technique to solve this problem. Here, the risk management paradigm assigns a quantified value to the amount of risk the designer is willing to take on the clock cycle constraint. The algorithm then assigns voltages in order to meet the expected value of clock cycle constraint while keeping the maximum delay within the specified “risk” and minimizing the power. The proposed algorithm is based on dynamic programming and is optimal for trees. Experimental results show that the traditional voltage scheduling approach is incapable of handling unpredictabilities. Our approach is capable of generating an effective tradeoff between power and “risk”: the more the risk, the less the power. The results show that a small increase in design risk positively affects the power dissipation.

Journal ArticleDOI
TL;DR: An experiment involving a practical situation with an early deadlock condition showed that the time measured from application initialization to deadlock detection was reduced by 46% by employing the DDU as compared to detecting deadlock in software.
Abstract: This article presents a novel Parallel Deadlock Detection Algorithm (PDDA) and its hardware implementation, Deadlock Detection Unit (DDU). PDDA uses simple Boolean representations of request, grant, and no activity so that the hardware implementation of PDDA becomes easier and operates faster. We prove the correctness of PDDA and that the DDU has a runtime complexity of O(min(m,n)), where m is the number of resources and n is the number of processes. The DDU reduces deadlock detection time by 99% (i.e., 100X) or more compared to software implementations of deadlock detection algorithms. An experiment involving a practical situation with an early deadlock condition showed that the time measured from application initialization to deadlock detection was reduced by 46% by employing the DDU as compared to detecting deadlock in software.
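As a purely software-level illustration of the condition PDDA detects, the sketch below builds a wait-for graph from Boolean request and grant matrices and checks it for a cycle; this is not the parallel hardware algorithm or the DDU, and the two-process example is made up.

```python
# Software illustration of matrix-based deadlock detection: derive the wait-for
# graph from Boolean request/grant matrices (single-instance resources assumed)
# and test it for a cycle. This is not the PDDA/DDU hardware algorithm.

def deadlocked(request, grant):
    """request[p][r] / grant[p][r] are 0/1; deadlock iff the wait-for graph has a cycle."""
    n_proc, n_res = len(request), len(request[0])
    holder = {r: p for p in range(n_proc) for r in range(n_res) if grant[p][r]}
    waits = {p: {holder[r] for r in range(n_res)
                 if request[p][r] and r in holder and holder[r] != p}
             for p in range(n_proc)}

    WHITE, GREY, BLACK = 0, 1, 2
    color = [WHITE] * n_proc

    def on_cycle(p):
        color[p] = GREY
        for q in waits[p]:
            if color[q] == GREY or (color[q] == WHITE and on_cycle(q)):
                return True
        color[p] = BLACK
        return False

    return any(color[p] == WHITE and on_cycle(p) for p in range(n_proc))

# P0 holds R0 and requests R1; P1 holds R1 and requests R0 -> circular wait
grant = [[1, 0], [0, 1]]
request = [[0, 1], [1, 0]]
print(deadlocked(request, grant))  # True
```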

Journal ArticleDOI
TL;DR: This work exploits the bipartition approach as well as encoding techniques to reduce power dissipation not only of combinational logic blocks but also of the pipeline registers to optimize power for pipelined circuits.
Abstract: In this article, we present a bipartition dual-encoding architecture for low-power pipelined circuits. We exploit the bipartition approach as well as encoding techniques to reduce power dissipation not only of combinational logic blocks but also of the pipeline registers. Based on Shannon expansion, we partition a given circuit into two subcircuits such that the number of different outputs of both subcircuits is reduced, and then encode the output of both subcircuits to minimize the Hamming distance for transitions with a high switching probability. We measure the benefits of four different combinational bipartitioning and encoding architectures for comparison. The transistor-level simulation results show that bipartition dual-encoding can effectively reduce power by 72.7% for the pipeline registers and 27.1% for the total power consumption on average. To the best of our knowledge, this is the first work that presents an in-depth study on bipartition and encoding techniques to optimize power for pipelined circuits.

Journal ArticleDOI
TL;DR: A new style for writing temporal specifications of open systems is proposed that can be integrated with traditional symbolic model-checking techniques, together with a complete tool for the verification of Verilog RTL modules in isolation.
Abstract: Assume-guarantee style verification of modules relies on the appropriate modeling of the interaction of the module with its environment. Popular temporal logics such as Computation Tree Logic (CTL) and Linear Temporal Logic (LTL) that were originally defined for closed systems (Kripke structures) do not make any syntactic discrimination between input and output variables. As a result, these logics and their recent derivatives (such as System Verilog, Sugar, Forspec, etc.) permit the specification of properties that have some semantic problems when interpreted over open systems or modules. These semantic problems are quite common in practice, but are computationally hard to detect within a given specification. In this article, we propose a new style for writing temporal specifications of open systems that helps the designer to avoid most of these problems. In the proposed style, the basic temporal operators (such as next and until) are annotated with assume constraints over the input variables. We formalize this style through an extension of LTL, namely Open-LTL, and an extension of CTL with fairness, called Open-CTL. We show that this simple syntactic separation between the assume and the guarantee achieves the desired results. We show that the proposed style can be integrated with traditional symbolic model-checking techniques and present a complete tool for the verification of Verilog RTL modules in isolation.

Journal ArticleDOI
TL;DR: This article presents a polynomial-time exact algorithm for integrated pin assignment and buffer planning for all two-pin nets from one macro block (source block) to all other blocks of a given buffer block plan, while minimizing the total cost.
Abstract: The buffer block methodology has become increasingly popular as more and more buffers are needed in deep-submicron design, and it leads to many challenging problems in physical design. In this article, we present a polynomial-time exact algorithm for integrated pin assignment and buffer planning for all two-pin nets from one macro block (source block) to all other blocks of a given buffer block plan, while minimizing the total cost αW + βR for any positive α and β, where W is the total wirelength and R is the number of buffers. By applying this algorithm iteratively (each time, pick one block as the source block), it provides a polynomial-time algorithm for pin assignment and buffer planning for nets among multiple macro blocks. Experimental results demonstrate its efficiency and effectiveness.

Journal ArticleDOI
TL;DR: The proposed approach provides solutions to the problem of how to place the minimal number of registers in Step 3 and can handle nonzero clock skew; the problem is conjectured to be NP-hard in its general form.
Abstract: Data dependency constraints constitute a lower bound P on the minimal clock period of single-phase clocked sequential circuits. In contrast to methods based on basic retiming, clocked sequential circuits with clock period P can always be obtained using software pipelining techniques. Such circuits can be derived by any method that can be framed in the following four-step process: Step 1, determine P; Step 2, compute a valid periodic schedule of the computational elements; Step 3, place registers back into the circuit; Step 4, assign the clock signals to control registers. Methods with polynomial run-time to implement this process are proposed in the literature. They implement these steps sequentially, starting with Step 1. These methods do not know how to optimally place registers, which leads to an unnecessary number of registers. In this article, we address the problem of how to simultaneously implement Steps 2 and 3 in order to minimize the total number of registers. We conjecture that the problem is NP-hard in its general form. We formulate the problem for the first time in the literature, and devise a Mixed Integer Linear Program (MILP) to solve it. From this MILP, we derive a linear program to determine approximate solutions to the problem for large general circuits. We show that the proposed approach can handle nonzero clock skew. Experimental results confirm the effectiveness of the approach and show that significant reductions of the number of registers can be obtained although register sharing is not used. When the schedule is given, the proposed approach provides solutions to the problem of how to place the minimal number of registers in Step 3.

Journal ArticleDOI
TL;DR: This article proposes a two-step synthesis scheme for skewed logic circuits; in the first step, an integer linear programming-based approach is presented to overcome the logic reconvergence problem in skewed logic circuits with minimal logic duplication cost.
Abstract: Skewed logic circuits belong to a noise-tolerant high-performance static circuit family. Skewed logic circuits can achieve performance comparable to that of Domino logic circuits but with much lower power consumption. Two factors contribute to the reduction in power. First, by exploiting the static nature of skewed logic circuits, we can alleviate the cost of logic duplication which is typically required to overcome the logic reconvergence problem in both Domino logic and skewed logic circuits. Second, a selective clocking scheme can be applied to a skewed logic circuit to reduce the clock load and hence, clock power. In this article, we propose a two-step synthesis scheme of skewed logic circuits. In the first step, an integer linear programming-based approach is presented to overcome the logic reconvergence problem in skewed logic circuits with minimal logic duplication cost. In the second step, a dynamic programming-based heuristic is applied to achieve an optimal selective clocking scheme. Experimental results show that the average power saving of skewed logic circuits over Domino logic circuits is 41.1%.