Showing papers presented at "Asia and South Pacific Design Automation Conference in 2005"

Proceedings Article•DOI•

Redundant-via enhanced maze routing for yield improvement

Gang Xu, Li-Da Huang¹, David Z. Pan², Martin D. F. Wong³•Institutions (3)

Texas Instruments¹, University of Texas at Austin², University of Illinois at Urbana–Champaign³

18 Jan 2005

TL;DR: This paper proposes the first routing algorithm that considers the feasibility of redundant via insertion in the detailed routing stage; the routing problem is transformed into a multiple-constraint shortest path problem and solved by Lagrangian relaxation.

Abstract: Redundant via insertion is a good solution to reduce the yield loss caused by via failure. However, the existing methods are all post-layout optimizations that insert redundant vias after detailed routing. In this paper, we propose the first routing algorithm that considers the feasibility of redundant via insertion in the detailed routing stage. Our routing problem is formulated as maze routing with redundant via constraints. The problem is transformed into a multiple-constraint shortest path problem and solved by a Lagrangian relaxation technique. Experimental results show that our algorithm can find routing layouts with a much higher rate of redundant vias than conventional maze routing.

159 citations
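
The abstract above reduces routing to a constrained shortest path problem solved by Lagrangian relaxation. The listing does not give the paper's formulation or multiplier update, so the following is only a minimal sketch of the general idea on a toy graph. The edge encoding `(neighbor, cost, penalty)`, the names `constrained_route`, `budget` and `lam_max`, and the bisection on the multiplier are all assumptions made for illustration.

```python
import heapq

def shortest_path(graph, src, dst, lam):
    """Dijkstra on the relaxed weight cost + lam * penalty.

    Returns (cost, penalty, path) of the path found, or None if dst is unreachable.
    """
    best = {src: 0.0}
    heap = [(0.0, 0.0, 0.0, src, (src,))]
    while heap:
        weight, cost, pen, node, path = heapq.heappop(heap)
        if node == dst:
            return cost, pen, list(path)
        if weight > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, c, p in graph.get(node, []):
            new_weight = weight + c + lam * p
            if new_weight < best.get(nxt, float("inf")):
                best[nxt] = new_weight
                heapq.heappush(heap, (new_weight, cost + c, pen + p, nxt, path + (nxt,)))
    return None

def constrained_route(graph, src, dst, budget, lam_max=64.0, iters=30):
    """Heuristically minimize cost subject to total penalty <= budget.

    The multiplier is tuned by bisection (an illustrative choice); the cheapest
    feasible path seen is returned. Because of the duality gap this is not
    guaranteed to be the true constrained optimum.
    """
    best = None
    lo, hi, lam = 0.0, lam_max, 0.0
    for _ in range(iters):
        found = shortest_path(graph, src, dst, lam)
        if found is None:
            return None
        cost, pen, _path = found
        if pen <= budget:
            if best is None or cost < best[0]:
                best = found
            hi = lam  # feasible: try a smaller multiplier
        else:
            lo = lam  # infeasible: penalize the constrained resource more
        lam = (lo + hi) / 2.0
    return best

# Hypothetical example: penalty counts vias that could not be made redundant.
graph = {
    "s": [("a", 1.0, 2), ("b", 2.0, 0)],
    "a": [("t", 1.0, 2)],
    "b": [("t", 4.0, 0), ("a", 0.5, 0)],
}
print(constrained_route(graph, "s", "t", budget=2))
```

With `lam = 0` the search is an ordinary shortest-path route; raising `lam` trades wirelength for fewer penalized vias, which is the trade-off the relaxation exposes.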

Proceedings Article•DOI•

Synthesis of quantum logic circuits

Vivek Shende¹, Stephen S. Bullock², Igor L. Markov¹•Institutions (2)

University of Michigan¹, National Institute of Standards and Technology²

18 Jan 2005

TL;DR: Efficient quantum-logic circuits are discussed for two tasks: 1) implementing generic quantum computations and 2) initializing quantum registers; the circuits are asymptotically optimal for their respective tasks.

Abstract: The pressure of fundamental limits on classical computation and the promise of exponential speedups from quantum effects have recently brought quantum circuits to the attention of the EDA community (Iwama et al., 2002; Shende et al., 2003; Bullock and Markov, 2003; Shende et al., 2004; Hung et al., 2004). We discuss efficient circuits to initialize quantum registers and implement generic quantum computations. Our techniques yield circuits that are twice as small as the best previously published technique. Moreover, a theoretical lower bound shows that our new circuits can be improved by at most a factor of two. Further, the circuits grow by at most a factor of nine under severe architectural restrictions.

159 citations

Proceedings Article•DOI•

Thermal-driven multilevel routing for 3-D ICs

Jason Cong¹, Yan Zhang¹•Institutions (1)

University of California, Los Angeles¹

18 Jan 2005

TL;DR: This paper proposes an efficient 3D multilevel routing approach that includes a novel through-the-silicon via (TS-via) planning algorithm and features an adaptive lumped resistive thermal model and a two-step multilevel TS-via planning scheme.

Abstract: 3-D IC has a great potential for improving circuit performance and degree of integration. It is also an attractive platform for system-on-chip or system-in-package solutions. A critical issue in 3-D circuit design is heat dissipation. In this paper we propose an efficient 3-D multilevel routing approach that includes a novel through-the-silicon via (TS-via) planning algorithm. The proposed approach features an adaptive lumped resistive thermal model and a two-step multilevel TS-via planning scheme. Experimental results show that with multilevel TS-via planning, the thermal-driven approach can reduce the maximum temperature to the required temperature with reasonable wirelength increase. Compared to a post processing approach for dummy TS-via insertion, to achieve the same required temperature, our approach uses 80% fewer TS-vias. To our knowledge, this proposed approach is the first thermal-driven 3-D routing algorithm.

143 citations

Proceedings Article•DOI•

Opportunities and challenges for better than worst-case design

Todd Austin¹, Valeria Bertacco¹, David Blaauw¹, Trevor Mudge¹•Institutions (1)

University of Michigan¹

18 Jan 2005

TL;DR: This paper presents the concepts of better than worst-case design, highlights two exemplary designs (the DIVA checker and Razor logic), and shows how this approach to system implementation relaxes design constraints on core components, which reduces the effects of physical design challenges and creates opportunities to optimize performance and power characteristics.

Abstract: The progressive trend of fabrication technologies towards the nanometer regime has created a number of new physical design challenges for computer architects. Design complexity, uncertainty in environmental and fabrication conditions, and single-event upsets all conspire to compromise system correctness and reliability. Recently, researchers have begun to advocate a new design strategy called Better Than Worst-Case design that couples a complex core component with a simple reliable checker mechanism. By delegating the responsibility for correctness and reliability of the design to the checker, it becomes possible to build provably correct designs that effectively address the challenges of deep submicron design. In this paper, we present the concepts of Better Than Worst-Case design and highlight two exemplary designs: the DIVA checker and Razor logic. We show how this approach to system implementation relaxes design constraints on core components, which reduces the effects of physical design challenges and creates opportunities to optimize performance and power characteristics. We demonstrate the advantages of relaxed design constraints for the core components by applying typical-case optimization (TCO) techniques to an adder circuit. Finally, we discuss the challenges and opportunities posed to CAD tools in the context of Better Than Worst-Case design. In particular, we describe the additional support required for analyzing run-time characteristics of designs and the many opportunities which are created to incorporate typical-case optimizations into synthesis and verification.

135 citations

Proceedings Article•DOI•

Mapping and physical planning of networks-on-chip architectures with quality-of-service guarantees

Srinivasan Murali¹, Luca Benini², G. De Micheli¹•Institutions (2)

Stanford University¹, University of Bologna²

18 Jan 2005

TL;DR: This work presents an integrated approach to mapping of cores onto NoC topologies and physical planning of NoCs, where the position and size of the cores and network components are computed.

Abstract: Networks on chips (NoCs) have evolved as the communication design paradigm of future systems on chips (SoCs). In this work we target the NoC design of complex SoCs with heterogeneous processor/memory cores, providing quality-of-service (QoS) for the application. We present an integrated approach to mapping of cores onto NoC topologies and physical planning of NoCs, where the position and size of the cores and network components are computed. Our design methodology automates NoC mapping, physical planning, topology selection, topology optimization and instantiation, bridging an important design gap in building application specific NoCs. We also present a methodology to guarantee QoS for the application during the mapping-physical planning process by satisfying the delay/jitter constraints and real-time constraints of the traffic streams. Experimental studies show large area savings (up to 2×), bandwidth savings (up to 5×) and network component savings (up to 2.2× in buffer count, 3.8× in number of wires, 1.6× in switch ports) compared to traditional design approaches.

123 citations

Proceedings Article•DOI•

Detailed placement for improved depth of focus and CD control

Puneet Gupta, A.B. Kahng, Chul-Hong Park¹•Institutions (1)

University of California, San Diego¹

18 Jan 2005

TL;DR: In this article, a dynamic programming-based technique for assist-feature correctness (AFCorr) was proposed to improve the depth of focus of standard-cell designs in subwavelength lithography.

Abstract: Sub-resolution assist features (SRAFs) provide an absolutely essential technique for critical dimension (CD) control and process window enhancement in subwavelength lithography. However, as focus levels change during manufacturing, CDs at a given "legal" pitch can fail to achieve manufacturing tolerances required for adequate yield. Furthermore, adoption of off-axis illumination (OAI) and SRAF techniques to enhance resolution at minimum pitch worsens printability of patterns at other pitches. This paper describes a novel dynamic programming-based technique for assist-feature correctness (AFCorr) in detailed placement of standard-cell designs. For benchmark designs in 130 nm and 90 nm technologies, AFCorr achieves improved depth of focus and substantial improvement in CD control with negligible timing, area, or CPU overhead. The advantages of AFCorr are expected to increase in future technology nodes.

100 citations

Proceedings Article•DOI•

Dynamic power management using on demand paging for networked embedded systems

Yuvraj Agarwal¹, Curt Schurgers¹, Rajesh Gupta¹•Institutions (1)

University of California, San Diego¹

18 Jan 2005

TL;DR: This paper has implemented an on-demand paging scheme on an infrastructure based WLAN consisting of iPAQ PDAs equipped with Bluetooth radios and Cisco Aironet wireless networking cards and shows power saving ranging from 23% to 48% over the present 802.11b wireless LAN.

Abstract: The power consumption of the network interface plays a major role in determining the total operating lifetime of wireless networked embedded systems. In case of on-demand paging, a low power secondary radio is used to wake up the higher power radio, allowing the latter to sleep for longer periods of time. In this paper we present use of Bluetooth radios to serve as a paging channel for the 802.11b wireless LAN. We have implemented an on-demand paging scheme on an infrastructure based WLAN consisting of iPAQ PDAs equipped with Bluetooth radios and Cisco Aironet wireless networking cards. Our results show power saving ranging from 23% to 48% over the present 802.11b standard operating modes with negligible impact on performance.

99 citations

Proceedings Article•DOI•

Constraint extraction for pseudo-functional scan-based delay testing

Yung-Chieh Lin¹, Feng Lu¹, Kai Yang¹, Kwang-Ting Cheng¹•Institutions (1)

University of California, Santa Barbara¹

18 Jan 2005

TL;DR: In this article, the authors proposed a pseudo-functional test methodology that attempts to minimize the over-testing problem of the scan-based circuits for the delay faults, where the first pattern of a two-pattern test is still delivered by scan in the test mode but the pattern is generated in such a way that it does not violate the functional constraints extracted from the functional logic.

Abstract: Recent research results have shown that traditional structural testing for delay and crosstalk faults may result in over-testing due to the non-trivial number of such faults that are untestable in the functional mode while testable in the test mode. This paper presents a pseudo-functional test methodology that attempts to minimize the over-testing problem of scan-based circuits for delay faults. The first pattern of a two-pattern test is still delivered by scan in the test mode but the pattern is generated in such a way that it does not violate the functional constraints extracted from the functional logic. In this paper, we use a SAT solver to extract a set of functional constraints which consists of illegal states and internal signal correlation. Along with the functional justification (also called broad-side) test application scheme, the functional constraints are imposed on a commercial delay-fault ATPG tool to generate pseudo-functional delay tests. The experimental results indicate that the percentage of untestable delay faults is non-trivial for many circuits, which supports the hypothesis of the over-testing problem in delay testing. The results also indicate the effectiveness of the proposed constraint extraction method.

97 citations

Proceedings Article•DOI•

A fast VLSI architecture for full-search variable block size motion estimation in MPEG-4 AVC/H.264

Minho Kim¹, In-Gu Hwang¹, Soo-Ik Chae¹•Institutions (1)

Seoul National University¹

18 Jan 2005

TL;DR: The proposed VBSME can achieve 100% PE utilization by employing a preload register and a search data buffer inside each PE and allows real-time processing of 4CIF (704×576) video with 15 fps at 100 MHz for a search range of [-32~+31].

Abstract: We describe a fast VLSI architecture for full-search motion estimation for blocks of 7 different sizes in MPEG-4 AVC/H.264. The proposed variable block size motion estimation (VBSME) architecture consists of a 16×16 PE array, an adder tree and comparators to find all 41 motion vectors and their minimum SADs for the blocks of 16×16, 16×8, 8×16, 8×8, 8×4, 4×8 and 4×4. It employs a 2D datapath and its control of the search area data is simple and regular. The proposed VBSME can achieve 100% PE utilization by employing a preload register and a search data buffer inside each PE and allows real-time processing of 4CIF (704×576) video with 15 fps at 100 MHz for a search range of [-32~+31].

96 citations
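
The count of 41 motion vectors in the abstract above follows from the seven block sizes of a 16×16 macroblock: 16 + 8 + 8 + 4 + 2 + 2 + 1 = 41 partitions. The sketch below shows, in software, how SADs (sums of absolute differences) for every partition can be derived from the sixteen 4×4 SADs for one candidate displacement. It illustrates only the arithmetic, not the paper's hardware datapath; the function names and the width×height labels are assumptions chosen for this example.

```python
import random

def sad_4x4_grid(cur, ref):
    """Return a 4x4 grid of SADs, one per 4x4 sub-block of a 16x16 macroblock."""
    grid = [[0] * 4 for _ in range(4)]
    for y in range(16):
        for x in range(16):
            grid[y // 4][x // 4] += abs(cur[y][x] - ref[y][x])
    return grid

def merge_sads(grid):
    """Combine the 4x4 SADs into SADs for all seven block sizes.

    Labels are width x height in pixels. Each shape is given as (rows, cols)
    measured in 4x4 sub-blocks. A hardware adder tree would reuse partial
    sums; here each region is summed directly for clarity.
    """
    shapes = {
        "4x4": (1, 1), "8x4": (1, 2), "4x8": (2, 1), "8x8": (2, 2),
        "16x8": (2, 4), "8x16": (4, 2), "16x16": (4, 4),
    }

    def region(r0, c0, rows, cols):
        return sum(grid[r][c]
                   for r in range(r0, r0 + rows)
                   for c in range(c0, c0 + cols))

    return {
        name: [region(r, c, rows, cols)
               for r in range(0, 4, rows)
               for c in range(0, 4, cols)]
        for name, (rows, cols) in shapes.items()
    }

# Example with random 8-bit pixel data for one candidate position.
rng = random.Random(0)
cur = [[rng.randrange(256) for _ in range(16)] for _ in range(16)]
ref = [[rng.randrange(256) for _ in range(16)] for _ in range(16)]
sads = merge_sads(sad_4x4_grid(cur, ref))
print({name: len(values) for name, values in sads.items()})
```

A full search would repeat this for every displacement in the search range and keep, per partition, the displacement with the smallest SAD.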

Proceedings Article•DOI•

Time and energy efficient mapping of embedded applications onto NoCs

Cesar Marcon¹, Andre Borin¹, Altamiro Amadeu Susin¹, Luigi Carro¹, Flávio Rech Wagner¹•Institutions (1)

Universidade Federal do Rio Grande do Sul¹

18 Jan 2005

TL;DR: This work analyzes the mapping of applications onto generic regular networks-on-chip (NoCs) by taking into consideration the dynamic behavior of the target application and thus potential contentions in the intercommunication of the cores.

Abstract: This work analyzes the mapping of applications onto generic regular networks-on-chip (NoCs). Cores must be placed considering communication requirements, so as to minimize the overall application execution time and energy consumption. We expand previous mapping strategies by taking into consideration the dynamic behavior of the target application and thus potential contentions in the intercommunication of the cores. Experimental results for a suite of 22 benchmarks and various NoC sizes show that a 42% average reduction in the execution time of the mapped application can be obtained, together with a 21% average reduction in the total energy consumption for state-of-the-art technologies.

92 citations

Proceedings Article•DOI•

Methodology for high level estimation of FPGA power consumption

V. Degalahal¹, Tim Tuan²•Institutions (2)

Pennsylvania State University¹, Xilinx²

18 Jan 2005

TL;DR: The methodology uses device-level simulations to characterize a coarse-grained architectural model and incorporates architectural parameters to estimate the dominant wire capacitance; the results show that the routing resources and the clock consume the most power.

Abstract: Power consumption in FPGA designs calls for power-aware design and power budgeting early in the design cycle. In this work, we leverage the FPGA architecture to present an efficient and accurate methodology for pre-silicon dynamic power estimation of FPGA-based designs. Our methodology uses device-level simulations to characterize a coarse-grained architectural model and incorporates architectural parameters to estimate the dominant wire capacitance. Such an approach not only reduces the need for tedious and time-consuming silicon characterizations but also ensures accurate pre-silicon power predictions. We apply the methodology to estimate the power consumption of a state-of-the-art Spartan-3™ FPGA family, evaluate the estimation results against silicon measurements, and present a detailed power breakdown of the FPGA. Our results show that the routing resources and the clock consume the most power.

Proceedings Article•DOI•

Speed and voltage selection for GALS systems based on voltage/frequency islands

Koushik Niyogi¹, Diana Marculescu¹•Institutions (1)

Carnegie Mellon University¹

18 Jan 2005

TL;DR: The results show that static voltage and speed assignment can achieve up to 42% savings in total energy for various media and signal processing applications, while application specific dynamic approaches provide up to 44% energy savings in the case of MPEG-2 encoder application, when compared to a single clocked system architecture.

Abstract: Due to increasing clock speeds and shrinking technologies, distributing a single global clock signal throughout a chip is becoming a difficult and challenging proposition. In this paper, we address the problem of energy optimal local speed and voltage selection in frequency/voltage island based systems under given performance constraints. Our results show that static voltage and speed assignment can achieve up to 42% savings in total energy for various media and signal processing applications, while application specific dynamic approaches provide up to 44% energy savings in the case of MPEG-2 encoder application, when compared to a single clocked system architecture.

Proceedings Article•DOI•

Fast computation of the temperature distribution in VLSI chips using the discrete cosine transform and table look-up

Yong Zhan¹, Sachin S. Sapatnekar¹•Institutions (1)

University of Minnesota¹

18 Jan 2005

TL;DR: A highly accurate fast algorithm for computing the on-chip temperature distribution due to power sources located on the top surface of the chip using a combination of several computational techniques including the Green function method, the discrete cosine transform (DCT), and the table look-up technique.

Abstract: Temperature-related effects are critical in determining both the performance and reliability of VLSI circuits. Accurate and efficient estimation of the temperature distribution corresponding to a specific circuit layout is indispensable in physical design automation tools. In this paper, we propose a highly accurate fast algorithm for computing the on-chip temperature distribution due to power sources located on the top surface of the chip. The method is a combination of several computational techniques including the Green function method, the discrete cosine transform (DCT), and the table look-up technique. The high accuracy of the algorithm comes from the fully analytical nature of the Green function method, and the high efficiency is due to the application of the fast Fourier transform (FFT) technique to compute the DCT and later obtaining the temperature field for any power source distribution using the pre-calculated look-up table. Experimental results have demonstrated that our method has a relative error of below 1% compared with commercial computational fluid dynamics (CFD) software for thermal analysis, while the efficiency of our method is orders of magnitude higher than the direct application of the Green function method.

Proceedings Article•DOI•

Floorplanning for 3-D VLSI design

Lei Cheng¹, Liang Deng¹, Martin D. F. Wong¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

18 Jan 2005

TL;DR: When specialized to 2-D problems, the new floorplanner produces slicing floorplans for the two largest MCNC benchmarks, ami33 and ami49, with the smallest areas ever reported in the literature.

Abstract: In this paper we present a floorplanning algorithm for 3-D ICs. The problem can be formulated as that of packing a given set of 3-D rectangular blocks while minimizing a suitable cost function. Our algorithm is based on a generalization of the classical 2-D slicing floorplans to 3-D slicing floorplans. A new encoding scheme of slicing floorplans (2-D/3-D) and its associated set of moves form the basis of the new simulated annealing based algorithm. The best-known algorithm for packing 3-D rectangular blocks is based on simulated annealing using sequence-triple floorplan representation. Experimental results show that our algorithm produces packing results on average 3% better than the sequence-triple-based algorithm under the same annealing parameters, and our algorithm runs much faster (17 times for problems containing 100 blocks) than the sequence-triple. Moreover, our algorithm can be extended to consider various types of placement constraints and thermal distribution while the existing sequence-triple-based algorithm does not have such capabilities. Finally, when specializing to 2-D problems, our algorithm is a new 2-D slicing floorplanning algorithm. We are excited to report the surprising results that our new 2-D floorplanner has produced slicing floorplans for the two largest MCNC benchmarks ami33 and ami49 which have the smallest areas (among all slicing/nonslicing floorplanning algorithms) ever reported in the literature.
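
To make the notion of a 3-D slicing floorplan concrete, the sketch below evaluates the bounding box of a slicing packing written as a postfix expression. This is not the new encoding the abstract refers to, which the listing does not describe; it is the classical postfix (Polish expression) idea for 2-D slicing floorplans with a third cut direction added. The operator names `X`, `Y`, `Z` and the function name are assumptions for illustration.

```python
def slicing_bounding_box(postfix, blocks):
    """Evaluate a postfix slicing expression over 3-D blocks.

    postfix: sequence of block names and cut operators "X", "Y" or "Z".
    blocks:  dict mapping block name -> (dx, dy, dz).
    An operator joins its two operands side by side along its own axis, so the
    sizes add on that axis and take the maximum on the other two.
    Block names must not be "X", "Y" or "Z".
    """
    stack = []
    for token in postfix:
        if token in ("X", "Y", "Z"):
            right, left = stack.pop(), stack.pop()
            axis = "XYZ".index(token)
            stack.append(tuple(
                l + r if i == axis else max(l, r)
                for i, (l, r) in enumerate(zip(left, right))
            ))
        else:
            stack.append(blocks[token])
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]

blocks = {"a": (2, 3, 1), "b": (2, 1, 1), "c": (4, 2, 2)}
dx, dy, dz = slicing_bounding_box(["a", "b", "X", "c", "Z"], blocks)
print((dx, dy, dz), "volume", dx * dy * dz)
```

A simulated annealing packer in this style would perturb the expression (for example by swapping operands or changing an operator) and use the resulting volume, possibly combined with other terms, as its cost.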

Proceedings Article•DOI•

MAIA: a framework for networks on chip generation and verification

Luciano Ost¹, Aline Mello¹, J.C.S. Palma², Fernando Moraes¹, Ney Calazans¹•Institutions (2)

Pontifícia Universidade Católica do Rio Grande do Sul¹, Universidade Federal do Rio Grande do Sul²

18 Jan 2005

TL;DR: The objective of this paper is to present the MAIA framework, which includes functions for automated NoC generation, automated production of NoC-IP core interfaces, and seamless analysis of NoC traffic parameters.

Abstract: The increasing complexity of SoCs makes networks on chip (NoCs) a promising substitute for buses and dedicated-wire interconnection schemes. However, new tools need to be developed to integrate NoC interconnection architectures and IP cores into SoCs. Such tools have to fulfill three main requirements: (i) automated NoC generation; (ii) automated production of NoC-IP core interfaces; and (iii) seamless analysis of NoC traffic parameters. The objective of this paper is to present the MAIA framework, which includes functions to address all these requirements. NoCs generated by the MAIA framework have been used to successfully prototype SoCs in FPGAs.

Proceedings Article•DOI•

MUP: a minimal unsatisfiability prover

Jinbo Huang¹•Institutions (1)

University of California, Los Angeles¹

18 Jan 2005

TL;DR: This paper describes a practical algorithm that decides the minimal unsatisfiability of any CNF formula through BDD manipulation and provides an empirical evaluation of the algorithm, highlighting its efficiency on a set of hard problems as well as its ability to work with existing subformula extraction tools to achieve optimal results.

Abstract: After establishing the unsatisfiability of a SAT instance encoding a typical design task, there is a practical need to identify its minimal unsatisfiable subsets, which pinpoint the reasons for the infeasibility of the design. Due to the potentially expensive computation, existing tools for the extraction of unsatisfiable subformulas do not guarantee the minimality of the results. This paper describes a practical algorithm that decides the minimal unsatisfiability of any CNF formula through BDD manipulation. This algorithm has a worst-case complexity that is exponential only in the treewidth of the CNF formula. We provide an empirical evaluation of the algorithm, highlighting its efficiency on a set of hard problems as well as its ability to work with existing subformula extraction tools to achieve optimal results.

Proceedings Article•DOI•

Fast floorplanning by look-ahead enabled recursive bipartitioning

Jason Cong¹, Michail Romesis¹, Joseph R. Shinnerl¹•Institutions (1)

University of California, Los Angeles¹

18 Jan 2005

TL;DR: For large floorplanning benchmarks, an implementation, called partitioning to optimize module arrangement (PATOMA), generates solutions with half the wirelength of state-of-the-art floorplanners in orders of magnitude less run time.

Abstract: A new paradigm is introduced for floorplanning any combination of fixed-shape and variable-shape blocks under tight fixed-outline area constraints and a wirelength objective. Dramatic improvement over traditional floorplanning methods is achieved by explicit construction of strictly legal layouts for every partition block at every level of a cutsize-driven, top-down hierarchy. By scalably incorporating legalization into the hierarchical flow, post-hoc legalization is successfully eliminated. For large floorplanning benchmarks, an implementation, called PATOMA, generates solutions with half the wirelength of state-of-the-art floorplanners in orders of magnitude less run time.

Proceedings Article•DOI•

Three-dimensional place and route for FPGAs

Cristinel Ababei¹, Hushrav Mogal¹, Kia Bazargan¹•Institutions (1)

University of Minnesota¹

18 Jan 2005

TL;DR: This work presents timing-driven partitioning and simulated-annealing (SA)-based placement algorithms together with a detailed routing tool for three-dimensional (3-D) field-programmable gate array (FPGA) integration.

Abstract: We present timing-driven partitioning and simulated annealing based placement algorithms together with a detailed routing tool for 3D FPGA integration. The circuit is first divided into layers with a limited number of inter-layer vias, and then placed on individual layers, while minimizing the delay of critical paths. We use our tool as a platform to explore the potential benefits in terms of delay and wire-length that 3D technologies can offer for FPGA fabrics. Experimental results show that, on average, a total decrease of 21% in wire-length and 24% in delay can be achieved over traditional 2D chips when five layers are used in 3D integration.

Proceedings Article•DOI•

Evaluation of the statistical delay quality model

Yasuo Sato, Shuji Hamada, Toshiyuki Maeda, Atsuo Takatori, Seiji Kajihara¹•Institutions (1)

Kyushu Institute of Technology¹

18 Jan 2005

TL;DR: A quality model that reflects fabrication process quality, design delay margin, and test timing accuracy is introduced that provides a measure that can predict the level of chip defects that cause delay failure, including marginal delay.

Abstract: In this paper we introduce a quality model that reflects fabrication process quality, design delay margin, and test timing accuracy. The model provides a measure that can predict the level of chip defects that cause delay failure, including marginal delay. We can therefore use the model to make test vectors that are effective in terms of both testing cost and chip quality. The results of experiments using ISCAS89 benchmark data and some large industrial design data reflect various characteristics of our statistical delay quality model.

Proceedings Article•DOI•

Sleep transistor sizing using timing criticality and temporal currents

Anand Ramalingam¹, Bin Zhang¹, Anirudh Devgan, David Z. Pan¹•Institutions (1)

University of Texas at Austin¹

18 Jan 2005

TL;DR: In this article, the authors present a new methodology based on timing criticality and temporal currents to size the sleep transistor, which results in area reduction of sleep transistors by 80% and 49% compared to module based design and cluster based design respectively.

Abstract: Power gating is a circuit technique that enables high performance and low power operation. One of the challenges in power gating is sizing the sleep transistor which is used to gate the power supply. This paper presents a new methodology based on timing criticality and temporal currents to size the sleep transistor. The timing criticality information and temporal current estimation are obtained using a static timing analyzer. The results indicate that our proposed technique reduces the area of sleep transistors by 80% and 49% compared to module-based design and cluster-based design, respectively.

Proceedings Article•DOI•

Fast synthesis of exact minimal reversible circuits using group theory

Guowu Yang¹, Xiaoyu Song¹, William N. N. Hung¹, Marek Perkowski¹•Institutions (1)

Portland State University¹

18 Jan 2005

TL;DR: Fast algorithms to synthesize exact minimal reversible circuits for various types of gates and costs are presented, and it is shown that the Peres gate is a better choice than the standard Toffoli gate in libraries of universal reversible gates.

Abstract: We present fast algorithms to synthesize exact minimal reversible circuits for various types of gates and costs. By reducing reversible logic synthesis problems to group theory problems, we use the powerful algebraic software GAP to solve such problems. Our algorithms are not only able to minimize for arbitrary cost functions of gates, but are also orders of magnitude faster than the existing approaches to reversible logic synthesis. In addition, we show that the Peres gate is a better choice than the standard Toffoli gate in libraries of universal reversible gates.

Proceedings Article•DOI•

Fast and effective gate-sizing with multiple-V_t assignment using generalized Lagrangian relaxation

Hsinwei Chou¹, Yu-Hao Wang, Charlie Chung-Ping Chen²•Institutions (2)

University of Wisconsin-Madison¹, National Taiwan University²

18 Jan 2005

TL;DR: Experimental results show that the novel gate-sizing and multi-V_t assignment technique based on generalized Lagrangian relaxation exhibits linear runtime and memory usage, and can effectively tune circuits with over 15,000 variables and 8,000 constraints in under 8 minutes.

Abstract: Simultaneous gate-sizing with multiple V_t assignment for delay and power optimization is a complicated task in modern custom designs. In this work, we make the key contribution of a novel gate-sizing and multi-V_t assignment technique based on generalized Lagrangian relaxation. Experimental results show that our technique exhibits linear runtime and memory usage, and can effectively tune circuits with over 15,000 variables and 8,000 constraints in under 8 minutes (250× faster than state-of-the-art optimization solvers).

Proceedings Article•DOI•

Routing track duplication with fine-grained power-gating for FPGA interconnect power reduction

Yan Lin¹, Fei Li¹, Lei He¹•Institutions (1)

University of California, Los Angeles¹

18 Jan 2005

TL;DR: In this paper, the authors design area-efficient circuits for programmable fine-grained power-gating of individual unused interconnect switches, and reduce interconnect leakage power dramatically.

Abstract: Power has become an increasingly important design constraint for FPGAs in nanometer technologies, and global interconnects should be the focus of FPGA power reduction as they consume more power than logic cells. We design area-efficient circuits for programmable fine-grained power-gating of individual unused interconnect switches, and reduce interconnect leakage power dramatically because the interconnect switches have an intrinsically low utilization rate for the purpose of programmability. The low leakage interconnect via power-gating reduces total power by 38.18% for the FPGA in 100nm technology. Furthermore, it enables interconnect dynamic power reduction. We design a routing channel containing abundant or duplicated routing tracks with pre-determined high and low Vdd, and develop a routing algorithm using low Vdd for non-critical routing to reduce dynamic power. The track-duplicated routing channel has small leakage power and increases the FPGA power reduction to 45.00%.

Proceedings Article•DOI•

Fast PLL simulation using nonlinear VCO macromodels for accurate prediction of jitter and cycle-slipping due to loop non-idealities and supply noise

Xiaolue Lai¹, Yayun Wan¹, Jaijeet Roychowdhury¹•Institutions (1)

University of Minnesota¹

18 Jan 2005

TL;DR: In this paper, a nonlinear macro model based PLL simulation technique was proposed to capture the dynamics of complex phenomena such as locking, cycle slipping and power supply noise induced PLL jitter, replicating qualitative features from full SPICE simulations accurately.

Abstract: Phase-locked loops (PLLs) are widely used in electronic systems. As PLL malfunction is one of the most important factors in re-fabs of SoCs, fast simulation of PLLs to capture non-ideal behavior accurately is an immediate, pressing need in the semiconductor design industry. In this paper, we present a nonlinear macro model based PLL simulation technique that is considerably more accurate than prior linear PLL simulation techniques. Our method is able to accurately capture transient behavior and faithfully estimate timing jitter in noisy PLLs. We demonstrate the proposed technique on ring and LC voltage-controlled oscillator (VCO) based PLLs, and compare results against linear PLL macromodels and full SPICE-level simulation. We show that, unlike prior linear macromodel based approaches, the proposed nonlinear technique captures the dynamics of complex phenomena such as locking, cycle slipping and power supply noise induced PLL jitter, replicating qualitative features from full SPICE simulations accurately while providing speedups of over two orders of magnitude.

Proceedings Article•DOI•

Optimal redistribution of white space for wire length minimization

Xiaoping Tang¹, Ruiqi Tian², Martin D. F. Wong³•Institutions (3)

IBM¹, Freescale Semiconductor², University of Illinois at Urbana–Champaign³

18 Jan 2005

TL;DR: This paper first shows that the problem can be formulated as linear programming, and then proposes an efficient min-cost flow based approach to solve it, which guarantees to obtain the minimum of total wire length in polynomial time and meanwhile keep the minimum area by distributing white space smarter for a given floorplan topology.

Abstract: Existing floorplanning algorithms compact blocks to the left and bottom. Although the compaction obtains an optimal area, it may not be good to meet other objectives such as minimizing total wire length which is the first-order objective. It is not known in the literature how to place blocks to obtain an optimal wire length. In this paper, we first show that the problem can be formulated as linear programming. Thereafter, instead of using the general but slow linear programming, we propose an efficient min-cost flow based approach to solve it. Our approach guarantees to obtain the minimum of total wire length in polynomial time and meanwhile keep the minimum area by distributing white space smarter for a given floorplan topology. We also show that the approach can be easily extended to handle constraints such as fixed-frame (fixed area), IO pins, pre-placed blocks, boundary blocks, range placement, alignment and abutment, rectilinear blocks, soft blocks, one-dimensional cluster placement, and bounded net delay, without loss of optimality. Practically, the algorithm is so efficient that it finishes in less than 0.4 seconds for all MCNC benchmarks of block placement. It is also very effective. Experimental results show we can improve wire length by 4.2% even on very compact floorplans. Thus it provides an ideal way of post-floorplanning (refine floorplanning).
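
The effect described above can be seen on a one-dimensional toy case. The sketch below keeps a fixed left-to-right block order inside a fixed frame and compares the left-compacted placement with the best placement found by exhaustive search over integer positions. The brute-force search stands in for the linear-programming and min-cost flow solvers of the paper, and the block widths, the net, and the fixed pin are made-up values.

```python
from itertools import product


def wirelength(xs, widths, nets, pins):
    """Total length of two-pin nets between block centers, plus nets
    from block centers to fixed pins (all coordinates on one axis)."""
    centers = [x + w / 2 for x, w in zip(xs, widths)]
    total = sum(abs(centers[a] - centers[b]) for a, b in nets)
    total += sum(abs(centers[blk] - px) for blk, px in pins)
    return total


def best_placement(widths, frame, nets, pins):
    """Exhaustive search over integer left edges that preserve the given
    left-to-right order (the topology) and stay inside the frame."""
    n = len(widths)
    best = None
    for xs in product(range(frame + 1), repeat=n):
        ordered = all(xs[i] + widths[i] <= xs[i + 1] for i in range(n - 1))
        if not ordered or xs[-1] + widths[-1] > frame:
            continue
        wl = wirelength(xs, widths, nets, pins)
        if best is None or wl < best[0]:
            best = (wl, xs)
    return best


widths = [2, 2, 2]  # hypothetical blocks A, B, C, in that order
frame = 10          # fixed frame width, so the area is unchanged
nets = [(0, 1)]     # a net between A and B
pins = [(2, 10)]    # block C connects to a fixed pin at the right edge

compacted = wirelength((0, 2, 4), widths, nets, pins)
optimal = best_placement(widths, frame, nets, pins)
print(compacted, optimal)
```

In this toy case the white space between B and C is what lets C move toward its pin: the frame and block order are unchanged, yet the wire length drops compared with the left-compacted placement.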

Proceedings Article•DOI•

Wire congestion and thermal aware 3D global placement

Karthik Balakrishnan¹, Vidit Nanda¹, S. Easwar¹, Sung Kyu Lim¹•Institutions (1)

Georgia Institute of Technology¹

18 Jan 2005

TL;DR: This paper develops techniques to reduce the maximum temperature and wire congestion of 3D circuits without compromising total wirelength and via count, and shows a smooth tradeoff among congestion, temperature, wirelength, and via count.

Abstract: The recent popularity of 3D IC technology stems from its enhanced performance capabilities and reduced wire-length. However, wire congestion and thermal issues are exacerbated due to the compact nature of these layered technologies. In this paper, we develop techniques to reduce the maximum temperature and wire congestion of 3D circuits without compromising total wirelength and via count. Our approach consists of two phases. First, we use a multi-level min-cut placement with a modified gain function for local wire congestion and dynamic power consumption reduction. Second, we perform simulated annealing together with full-length thermal analysis and global routing for global wire congestion and maximum temperature reduction. Our experimental results show smooth tradeoff among congestion, temperature, wirelength, and via.

Proceedings Article•DOI•

On-chip thermal gradient analysis and temperature flattening for SoC design

Takashi Sato¹, Junji Ichimiya², Nobuto Ono, Kotaro Hachiya, Masanori Hashimoto³•Institutions (3)

Renesas Electronics¹, Ricoh², Osaka University³

18 Jan 2005

TL;DR: In this paper, the impact of dominant parameters, such as area occupancy of memory/logic, power density, and floorplan, on thermal gradient and clock skew is studied, and a procedure to amend thermal gradient is proposed.

Abstract: This paper quantitatively analyzes the thermal gradient of SoCs and proposes a thermal flattening procedure. First, the impact of dominant parameters, such as area occupancy of memory/logic, power density, and floorplan, on thermal gradient and clock skew is studied. Important results obtained here are 1) the maximum temperature difference increases with higher memory area occupancy and 2) the difference is very floorplan sensitive. Then, we propose a procedure to amend thermal gradient. A slight floorplan modification using the proposed procedure improves on-chip thermal gradient significantly.

Proceedings Article•DOI•

Placement with symmetry constraints for analog layout design using TCG-S

Jai-Ming Lin¹, Guang-Ming Wu, Yao-Wen Chang², Jen-Hui Chuang³•Institutions (3)

Realtek¹, National Taiwan University², National Chiao Tung University³

18 Jan 2005

TL;DR: This paper proposes the necessary and sufficient conditions of TCG-S for symmetry modules, and proposes a polynomial-time packing algorithm for a TCG-S with symmetry constraints that results in the best area utilization.

Abstract: In order to handle device matching for analog circuits, some pairs of modules need to be placed symmetrically with respect to a common axis. In this paper, we deal with the module placement with symmetry constraints for analog design using the transitive closure graph-sequence (TCG-S) representation. Since the geometric relationships of modules are transparent to TCG-S and its induced operations, TCG-S has better flexibility than previous works in dealing with symmetry constraints. We first propose the necessary and sufficient conditions of TCG-S for symmetry modules. Then, we propose a polynomial-time packing algorithm for a TCG-S with symmetry constraints. Experimental results show that the TCG-S based algorithm results in the best area utilization.

Proceedings Article•DOI•

An-OARSMan: obstacle-avoiding routing tree construction with good length performance

Yu Hu¹, Tong Jing¹, Xianlong Hong¹, Zhe Feng¹, Xiaodong Hu², Guiying Yan²•Institutions (2)

Tsinghua University¹, Chinese Academy of Sciences²

18 Jan 2005

TL;DR: This paper focuses on the OARSMT problem and presents an algorithm, named An-OARSMan, based on ant colony optimization, which can handle complex obstacle cases including both convex and concave polygon obstacles with good length performance.

Abstract: Routing is one of the important steps in VLSI/ULSI physical design. The rectilinear Steiner minimum tree (RSMT) construction is an essential part of routing. Since macro cells, IP blocks, and pre-routed nets are often regarded as obstacles in the routing phase, obstacle-avoiding RSMT (OARSMT) algorithms are useful for practical routing applications. This paper focuses on the OARSMT problem and presents an algorithm, named An-OARSMan, based on ant colony optimization. A greedy obstacle penalty distance (OP-distance) local heuristic is used in the algorithm and performed on the track graph. The algorithm has been implemented and tested on different kinds of obstacles. Experimental results show that An-OARSMan can handle complex obstacle cases including both convex and concave polygon obstacles with good length performance. It can always achieve the optimal solution in the cases with no more than 7 terminals.

Proceedings Article•DOI•

Feasibility analysis of messages for on-chip networks using wormhole routing

Zhonghai Lu¹, Axel Jantsch¹, Ingo Sander¹•Institutions (1)

Royal Institute of Technology¹

18 Jan 2005

TL;DR: A novel feasibility analysis for real-time (RT) and nonreal-time (NT) messages in wormhole-routed networks on chip is presented and a contention tree is formulated that captures contentions in the network.

Abstract: The feasibility of a message in a network concerns whether its timing property can be satisfied without preventing any message already in the network from meeting its own timing property. We present a novel feasibility analysis for real-time (RT) and nonreal-time (NT) messages in wormhole-routed networks on chip. For RT messages, we formulate a contention tree that captures contentions in the network. For coexisting RT and NT messages, we propose a simple bandwidth partitioning method that allows us to analyze their feasibility independently.
