
Showing papers presented at "Asia and South Pacific Design Automation Conference in 2005"


Proceedings ArticleDOI
18 Jan 2005
TL;DR: This paper proposes the first routing algorithm that considers the feasibility of redundant via insertion in the detailed routing stage; the routing problem is transformed into a multiple-constraint shortest path problem and solved by Lagrangian relaxation.
Abstract: Redundant via insertion is a good solution to reduce the yield loss caused by via failure. However, the existing methods are all post-layout optimizations that insert redundant vias after detailed routing. In this paper, we propose the first routing algorithm that considers the feasibility of redundant via insertion in the detailed routing stage. Our routing problem is formulated as maze routing with redundant via constraints. The problem is transformed into a multiple-constraint shortest path problem and solved by a Lagrangian relaxation technique. Experimental results show that our algorithm can find routing layouts with a much higher rate of redundant vias than conventional maze routing.

159 citations
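
The entry above describes solving a constrained shortest path problem by Lagrangian relaxation. A minimal sketch of that general idea follows, assuming a single side constraint: each edge carries a cost and a hypothetical "penalty" (standing in for whatever quantity the via constraints limit), the penalty is folded into the edge weight with a multiplier `lam`, and the multiplier is adjusted by a simple subgradient step. The graph encoding, the function names, and the update rule are illustrative assumptions, not the paper's formulation, and the relaxation may return a feasible path that is not optimal.

```python
import heapq

def dijkstra(graph, source, target, lam):
    """Shortest path under the combined weight cost + lam * penalty.

    graph maps node -> list of (neighbor, cost, penalty) edges (assumed encoding).
    Returns (cost, penalty, path) for the path found, or None if unreachable.
    """
    best = {source: 0.0}
    heap = [(0.0, 0.0, 0.0, source, [source])]
    while heap:
        combined, cost, penalty, node, path = heapq.heappop(heap)
        if node == target:
            return cost, penalty, path
        if combined > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, c, p in graph.get(node, []):
            new = combined + c + lam * p
            if new < best.get(nxt, float("inf")):
                best[nxt] = new
                heapq.heappush(heap, (new, cost + c, penalty + p, nxt, path + [nxt]))
    return None

def relaxed_route(graph, source, target, budget, step=0.5, iters=50):
    """Search for a cheap path whose total penalty stays within budget."""
    lam = 0.0
    feasible = None
    for _ in range(iters):
        result = dijkstra(graph, source, target, lam)
        if result is None:
            return None
        cost, penalty, _ = result
        if penalty <= budget and (feasible is None or cost < feasible[0]):
            feasible = result
        # Subgradient step: raise lam when the constraint is violated, lower it otherwise.
        lam = max(0.0, lam + step * (penalty - budget))
    return feasible

# Hypothetical example: the cheaper route s-a-t violates the budget, s-b-t does not.
GRAPH = {
    "s": [("a", 1, 3), ("b", 2, 0)],
    "a": [("t", 1, 3)],
    "b": [("t", 2, 0)],
}
print(relaxed_route(GRAPH, "s", "t", budget=2))
```

With `lam = 0` the search reduces to an ordinary shortest path; increasing `lam` steers it toward routes that use less of the constrained resource.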


Proceedings ArticleDOI
18 Jan 2005
TL;DR: Efficient quantum-logic circuits are discussed for two tasks: 1) implementing generic quantum computations, and 2) initializing quantum registers; the circuits are asymptotically optimal for their respective tasks.
Abstract: The pressure of fundamental limits on classical computation and the promise of exponential speedups from quantum effects have recently brought quantum circuits to the attention of the EDA community (Iwama et al., 2002; Shende et al., 2003; Bullock and Markov, 2003; Shende et al., 2004; Hung et al., 2004). We discuss efficient circuits to initialize quantum registers and implement generic quantum computations. Our techniques yield circuits that are twice as small as the best previously published technique. Moreover, a theoretical lower bound shows that our new circuits can be improved by at most a factor of two. Further, the circuits grow by at most a factor of nine under severe architectural restrictions.

159 citations


Proceedings ArticleDOI
18 Jan 2005
TL;DR: This paper proposes an efficient 3D multilevel routing approach that includes a novel through-the-silicon via (TS-via) planning algorithm that features an adaptive lumped resistive thermal model and a two-step multilevel TS-via planning scheme.
Abstract: 3-D IC has a great potential for improving circuit performance and degree of integration. It is also an attractive platform for system-on-chip or system-in-package solutions. A critical issue in 3-D circuit design is heat dissipation. In this paper we propose an efficient 3-D multilevel routing approach that includes a novel through-the-silicon via (TS-via) planning algorithm. The proposed approach features an adaptive lumped resistive thermal model and a two-step multilevel TS-via planning scheme. Experimental results show that with multilevel TS-via planning, the thermal-driven approach can reduce the maximum temperature to the required temperature with reasonable wirelength increase. Compared to a post processing approach for dummy TS-via insertion, to achieve the same required temperature, our approach uses 80% fewer TS-vias. To our knowledge, this proposed approach is the first thermal-driven 3-D routing algorithm.

143 citations


Proceedings ArticleDOI
18 Jan 2005
TL;DR: This paper presents the concepts of better than worst-case design and highlights two exemplary designs: the DIVA checker and Razor logic and shows how this approach to system implementation relaxes design constraints on core components, which reduces the effects of physical design challenges and creates opportunities to optimize performance and power characteristics.
Abstract: The progressive trend of fabrication technologies towards the nanometer regime has created a number of new physical design challenges for computer architects. Design complexity, uncertainty in environmental and fabrication conditions, and single-event upsets all conspire to compromise system correctness and reliability. Recently, researchers have begun to advocate a new design strategy called Better Than Worst-Case design that couples a complex core component with a simple reliable checker mechanism. By delegating the responsibility for correctness and reliability of the design to the checker, it becomes possible to build provably correct designs that effectively address the challenges of deep submicron design. In this paper, we present the concepts of Better Than Worst-Case design and highlight two exemplary designs: the DIVA checker and Razor logic. We show how this approach to system implementation relaxes design constraints on core components, which reduces the effects of physical design challenges and creates opportunities to optimize performance and power characteristics. We demonstrate the advantages of relaxed design constraints for the core components by applying typical-case optimization (TCO) techniques to an adder circuit. Finally, we discuss the challenges and opportunities posed to CAD tools in the context of Better Than Worst-Case design. In particular, we describe the additional support required for analyzing run-time characteristics of designs and the many opportunities which are created to incorporate typical-case optimizations into synthesis and verification.

135 citations


Proceedings ArticleDOI
18 Jan 2005
TL;DR: This work presents an integrated approach to mapping of cores onto NoC topologies and physical planning of NoCs, where the position and size of the cores and network components are computed.
Abstract: Networks on chips (NoCs) have evolved as the communication design paradigm of future systems on chips (SoCs). In this work we target the NoC design of complex SoCs with heterogeneous processor/memory cores, providing quality-of-service (QoS) for the application. We present an integrated approach to mapping of cores onto NoC topologies and physical planning of NoCs, where the position and size of the cores and network components are computed. Our design methodology automates NoC mapping, physical planning, topology selection, topology optimization and instantiation, bridging an important design gap in building application specific NoCs. We also present a methodology to guarantee QoS for the application during the mapping-physical planning process by satisfying the delay/jitter constraints and real-time constraints of the traffic streams. Experimental studies show large area savings (up to 2×), bandwidth savings (up to 5×) and network component savings (up to 2.2× in buffer count, 3.8× in number of wires, 1.6× in switch ports) compared to traditional design approaches.

123 citations


Proceedings ArticleDOI
18 Jan 2005
TL;DR: In this article, a dynamic programming-based technique for assist-feature correctness (AFCorr) was proposed to improve the depth of focus of standard-cell designs in subwavelength lithography.
Abstract: Sub-resolution assist features (SRAFs) provide an absolutely essential technique for critical dimension (CD) control and process window enhancement in subwavelength lithography. However, as focus levels change during manufacturing, CDs at a given "legal" pitch can fail to achieve manufacturing tolerances required for adequate yield. Furthermore, adoption of off-axis illumination (OAI) and SRAF techniques to enhance resolution at minimum pitch worsens printability of patterns at other pitches. This paper describes a novel dynamic programming-based technique for assist-feature correctness (AFCorr) in detailed placement of standard-cell designs. For benchmark designs in 130 nm and 90 nm technologies, AFCorr achieves improved depth of focus and substantial improvement in CD control with negligible timing, area, or CPU overhead. The advantages of AFCorr are expected to increase in future technology nodes.

100 citations


Proceedings ArticleDOI
18 Jan 2005
TL;DR: This paper has implemented an on-demand paging scheme on an infrastructure based WLAN consisting of iPAQ PDAs equipped with Bluetooth radios and Cisco Aironet wireless networking cards and shows power saving ranging from 23% to 48% over the present 802.11b wireless LAN.
Abstract: The power consumption of the network interface plays a major role in determining the total operating lifetime of wireless networked embedded systems. In case of on-demand paging, a low power secondary radio is used to wake up the higher power radio, allowing the latter to sleep for longer periods of time. In this paper we present use of Bluetooth radios to serve as a paging channel for the 802.11b wireless LAN. We have implemented an on-demand paging scheme on an infrastructure based WLAN consisting of iPAQ PDAs equipped with Bluetooth radios and Cisco Aironet wireless networking cards. Our results show power saving ranging from 23% to 48% over the present 802.11b standard operating modes with negligible impact on performance.

99 citations


Proceedings ArticleDOI
18 Jan 2005
TL;DR: In this article, the authors proposed a pseudo-functional test methodology that attempts to minimize the over-testing problem of the scan-based circuits for the delay faults, where the first pattern of a two-pattern test is still delivered by scan in the test mode but the pattern is generated in such a way that it does not violate the functional constraints extracted from the functional logic.
Abstract: Recent research results have shown that traditional structural testing for delay and crosstalk faults may result in over-testing due to the non-trivial number of such faults that are untestable in the functional mode while testable in the test mode. This paper presents a pseudo-functional test methodology that attempts to minimize the over-testing problem of scan-based circuits for delay faults. The first pattern of a two-pattern test is still delivered by scan in the test mode, but the pattern is generated in such a way that it does not violate the functional constraints extracted from the functional logic. In this paper, we use a SAT solver to extract a set of functional constraints which consists of illegal states and internal signal correlation. Along with the functional justification (also called broad-side) test application scheme, the functional constraints are imposed on a commercial delay-fault ATPG tool to generate pseudo-functional delay tests. The experimental results indicate that the percentage of untestable delay faults is non-trivial for many circuits, which supports the hypothesis of the over-testing problem in delay testing. The results also indicate the effectiveness of the proposed constraint extraction method.

97 citations


Proceedings ArticleDOI
18 Jan 2005
TL;DR: The proposed VBSME can achieve 100% PE utilization by employing a preload register and a search data buffer inside each PE, and allows real-time processing of 4CIF (704×576) video at 15 fps and 100 MHz for a search range of [-32, +31].
Abstract: We describe a fast VLSI architecture for full-search motion estimation for the blocks with 7 different sizes in MPEG-4 AVC/H.264. The proposed variable block size motion estimation (VBSME) architecture consists of a 16×16 PE array, an adder tree and comparators to find all 41 motion vectors and their minimum SADs for the blocks of 16×16, 16×8, 8×16, 8×8, 8×4, 4×8 and 4×4. It employs a 2D datapath, and its control of the search area data is simple and regular. The proposed VBSME can achieve 100% PE utilization by employing a preload register and a search data buffer inside each PE, and allows real-time processing of 4CIF (704×576) video at 15 fps and 100 MHz for a search range of [-32, +31].

96 citations
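
The count of 41 motion vectors in the entry above follows from the seven block sizes: a 16×16 macroblock contains 1 + 2 + 2 + 4 + 8 + 8 + 16 = 41 blocks. The sketch below illustrates one common way to obtain all 41 SADs (sums of absolute differences) for a single candidate position: compute the sixteen 4×4 SADs once, then add them up into the larger blocks. It is a software illustration under that assumption, not a description of the paper's PE array or adder tree; the function names are hypothetical, and block sizes are keyed as (width, height).

```python
def sad_4x4_grid(cur, ref):
    """SADs of the sixteen 4x4 sub-blocks of a 16x16 macroblock, as a 4x4 grid."""
    grid = [[0] * 4 for _ in range(4)]
    for y in range(16):
        for x in range(16):
            grid[y // 4][x // 4] += abs(cur[y][x] - ref[y][x])
    return grid

def merge(grid, rows, cols):
    """Sum the 4x4-block SADs into larger blocks spanning rows x cols grid cells."""
    return [
        sum(grid[r + dr][c + dc] for dr in range(rows) for dc in range(cols))
        for r in range(0, 4, rows)
        for c in range(0, 4, cols)
    ]

# The seven block sizes named in the abstract, as (width, height) in pixels.
BLOCK_SIZES = [(16, 16), (16, 8), (8, 16), (8, 8), (8, 4), (4, 8), (4, 4)]

def all_block_sads(cur, ref):
    """Return {(width, height): [SAD per block]} for one candidate position."""
    grid = sad_4x4_grid(cur, ref)
    return {(w, h): merge(grid, h // 4, w // 4) for w, h in BLOCK_SIZES}

# Hypothetical data: current block of ones against a reference block of zeros.
cur = [[1] * 16 for _ in range(16)]
ref = [[0] * 16 for _ in range(16)]
sads = all_block_sads(cur, ref)
print(sum(len(v) for v in sads.values()))
```

A full-search estimator would repeat this for every candidate position in the search range and keep, for each of the 41 blocks, the position with the smallest SAD.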


Proceedings ArticleDOI
18 Jan 2005
TL;DR: This work analyzes the mapping of applications onto generic regular networks-on-chip (NoCs) by taking into consideration the dynamic behavior of the target application and thus potential contentions in the intercommunication of the cores.
Abstract: This work analyzes the mapping of applications onto generic regular networks-on-chip (NoCs). Cores must be placed considering communication requirements, so as to minimize the overall application execution time and energy consumption. We expand previous mapping strategies by taking into consideration the dynamic behavior of the target application and thus potential contentions in the intercommunication of the cores. Experimental results for a suite of 22 benchmarks and various NoC sizes show that a 42% average reduction in the execution time of the mapped application can be obtained, together with a 21% average reduction in the total energy consumption for state-of-the-art technologies.

92 citations


Proceedings ArticleDOI
18 Jan 2005
TL;DR: The methodology uses device-level simulations to characterize a coarse-grained architectural model and incorporates architectural parameters to estimate the dominant wire capacitance; the results show that the routing resources and the clock consume the most power.
Abstract: Power consumption in FPGA designs calls for power-aware design and power budgeting early in the design cycle. In this work, we leverage the FPGA architecture to present an efficient and accurate methodology for pre-silicon dynamic power estimation of FPGA-based designs. Our methodology uses device-level simulations to characterize a coarse-grained architectural model and incorporates architectural parameters to estimate the dominant wire capacitance. Such an approach not only reduces the need for tedious and time-consuming silicon characterizations but also ensures accurate pre-silicon power predictions. We apply the methodology to estimate the power consumption of a state-of-the-art Spartan-3™ FPGA family, evaluate the estimation results against silicon measurements, and present a detailed power breakdown of the FPGA. Our results show that the routing resources and the clock consume the most power.

Proceedings ArticleDOI
18 Jan 2005
TL;DR: The results show that static voltage and speed assignment can achieve up to 42% savings in total energy for various media and signal processing applications, while application specific dynamic approaches provide up to 44% energy savings in the case of MPEG-2 encoder application, when compared to a single clocked system architecture.
Abstract: Due to increasing clock speeds and shrinking technologies, distributing a single global clock signal throughout a chip is becoming a difficult and challenging proposition. In this paper, we address the problem of energy optimal local speed and voltage selection in frequency/voltage island based systems under given performance constraints. Our results show that static voltage and speed assignment can achieve up to 42% savings in total energy for various media and signal processing applications, while application specific dynamic approaches provide up to 44% energy savings in the case of MPEG-2 encoder application, when compared to a single clocked system architecture.

Proceedings ArticleDOI
18 Jan 2005
TL;DR: A highly accurate fast algorithm for computing the on-chip temperature distribution due to power sources located on the top surface of the chip using a combination of several computational techniques including the Green function method, the discrete cosine transform (DCT), and the table look-up technique.
Abstract: Temperature-related effects are critical in determining both the performance and reliability of VLSI circuits. Accurate and efficient estimation of the temperature distribution corresponding to a specific circuit layout is indispensable in physical design automation tools. In this paper, we propose a highly accurate fast algorithm for computing the on-chip temperature distribution due to power sources located on the top surface of the chip. The method is a combination of several computational techniques including the Green function method, the discrete cosine transform (DCT), and the table look-up technique. The high accuracy of the algorithm comes from the fully analytical nature of the Green function method, and the high efficiency is due to the application of the fast Fourier transform (FFT) technique to compute the DCT and later obtaining the temperature field for any power source distribution using the pre-calculated look-up table. Experimental results have demonstrated that our method has a relative error of below 1% compared with commercial computational fluid dynamic (CFD) softwares for thermal analysis, while the efficiency of our method is orders of magnitude higher than the direct application of the Green function method.

Proceedings ArticleDOI
18 Jan 2005
TL;DR: The new 2D floorplanner is reported to produce slicing floorplans for the two largest MCNC benchmarks, ami33 and ami49, with the smallest areas ever reported in the literature.
Abstract: In this paper we present a floorplanning algorithm for 3-D ICs. The problem can be formulated as that of packing a given set of 3-D rectangular blocks while minimizing a suitable cost function. Our algorithm is based on a generalization of the classical 2-D slicing floorplans to 3-D slicing floorplans. A new encoding scheme of slicing floorplans (2-D/3-D) and its associated set of moves form the basis of the new simulated annealing based algorithm. The best-known algorithm for packing 3-D rectangular blocks is based on simulated annealing using sequence-triple floorplan representation. Experimental results show that our algorithm produces packing results on average 3% better than the sequence-triple-based algorithm under the same annealing parameters, and our algorithm runs much faster (17 times for problems containing 100 blocks) than the sequence-triple-based algorithm. Moreover, our algorithm can be extended to consider various types of placement constraints and thermal distribution while the existing sequence-triple-based algorithm does not have such capabilities. Finally, when specializing to 2-D problems, our algorithm is a new 2-D slicing floorplanning algorithm. We are excited to report the surprising results that our new 2-D floorplanner has produced slicing floorplans for the two largest MCNC benchmarks ami33 and ami49 which have the smallest areas (among all slicing/nonslicing floorplanning algorithms) ever reported in the literature.
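
To make the notion of a 3-D slicing floorplan in the entry above concrete, here is a small sketch that evaluates the bounding box of a slicing structure written as a postfix expression. It uses the classical postfix (Polish-expression) encoding extended with a third cut operator, which is an assumption chosen for illustration and is not the new encoding scheme the abstract refers to. Blocks have fixed orientation, and the block names and dimensions are hypothetical.

```python
def bounding_box(expr, blocks):
    """Evaluate a postfix slicing expression over 3-D blocks.

    expr is a list of tokens: block names, or "X", "Y", "Z" meaning
    "place the two sub-floorplans side by side along that axis".
    blocks maps block name -> (width, height, depth).
    Returns the (width, height, depth) of the enclosing box.
    """
    stack = []
    for token in expr:
        if token in ("X", "Y", "Z"):
            right = stack.pop()
            left = stack.pop()
            axis = "XYZ".index(token)
            # Sizes add along the cut axis; the other two take the larger child.
            stack.append(tuple(
                left[i] + right[i] if i == axis else max(left[i], right[i])
                for i in range(3)
            ))
        else:
            stack.append(blocks[token])
    if len(stack) != 1:
        raise ValueError("malformed slicing expression")
    return stack[0]

# Hypothetical blocks: a and b sit side by side along X, c is stacked along Y.
blocks = {"a": (2, 1, 1), "b": (1, 1, 1), "c": (3, 1, 1)}
w, h, d = bounding_box(["a", "b", "X", "c", "Y"], blocks)
print((w, h, d), w * h * d)
```

A simulated-annealing floorplanner would perturb the expression (for example by swapping operands or changing an operator) and use the resulting volume, possibly combined with other terms, as its cost.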

Proceedings ArticleDOI
18 Jan 2005
TL;DR: The objective of this paper is to present the MAIA framework, which includes functions to address all requirements of automated NoC generation, production and analysis, and seamless analysis of NoC traffic parameters.
Abstract: The increasing complexity of SoCs makes networks on chip (NoC) a promising substitute for busses and dedicated wires interconnection schemes. However, new tools need to be developed to integrate NoC interconnection architectures and IP cores into SoCs. Such tools have to fulfill three main requirements: (i) automated NoC generation; (ii) automated production of NoC-IP core interfaces; and (iii) seamless analysis of NoC traffic parameters. The objective of this paper is to present the MAIA framework, which includes functions to address all these requirements. NoCs generated by the MAIA framework have been used to successfully prototype SoCs in FPGAs.

Proceedings ArticleDOI
18 Jan 2005
TL;DR: This paper describes a practical algorithm that decides the minimal unsatisfiability of any CNF formula through BDD manipulation and provides an empirical evaluation of the algorithm, highlighting its efficiency on a set of hard problems as well as its ability to work with existing subformula extraction tools to achieve optimal results.
Abstract: After establishing the unsatisfiability of a SAT instance encoding a typical design task, there is a practical need to identify its minimal unsatisfiable subsets, which pinpoint the reasons for the infeasibility of the design. Due to the potentially expensive computation, existing tools for the extraction of unsatisfiable subformulas do not guarantee the minimality of the results. This paper describes a practical algorithm that decides the minimal unsatisfiability of any CNF formula through BDD manipulation. This algorithm has a worst-case complexity that is exponential only in the treewidth of the CNF formula. We provide an empirical evaluation of the algorithm, highlighting its efficiency on a set of hard problems as well as its ability to work with existing subformula extraction tools to achieve optimal results.

Proceedings ArticleDOI
18 Jan 2005
TL;DR: For large floorplanning benchmarks, an implementation, called partitioning to optimize module arrangement (PATOMA), generates solutions with half the wirelength of state-of-the-art floorplanners in orders of magnitude less run time.
Abstract: A new paradigm is introduced for floorplanning any combination of fixed-shape and variable-shape blocks under tight fixed-outline area constraints and a wirelength objective. Dramatic improvement over traditional floorplanning methods is achieved by explicit construction of strictly legal layouts for every partition block at every level of a cutsize-driven, top-down hierarchy. By scalably incorporating legalization into the hierarchical flow, post-hoc legalization is successfully eliminated. For large floorplanning benchmarks, an implementation, called PATOMA, generates solutions with half the wirelength of state-of-the-art floorplanners in orders of magnitude less run time.

Proceedings ArticleDOI
18 Jan 2005
TL;DR: This work presents timing-driven partitioning and simulated-annealing (SA)-based placement algorithms together with a detailed routing tool for three-dimensional (3-D) field-programmable gate array (FPGA) integration.
Abstract: We present timing-driven partitioning and simulated annealing based placement algorithms together with a detailed routing tool for 3D FPGA integration. The circuit is first divided into layers with a limited number of inter-layer vias, and then placed on individual layers, while minimizing the delay of critical paths. We use our tool as a platform to explore the potential benefits in terms of delay and wire-length that 3D technologies can offer for FPGA fabrics. Experimental results show that, on average, a total decrease of 21% in wire-length and 24% in delay can be achieved over traditional 2D chips when five layers are used in 3D integration.

Proceedings ArticleDOI
18 Jan 2005
TL;DR: A quality model that reflects fabrication process quality, design delay margin, and test timing accuracy is introduced that provides a measure that can predict the level of chip defects that cause delay failure, including marginal delay.
Abstract: In this paper we introduce a quality model that reflects fabrication process quality, design delay margin, and test timing accuracy. The model provides a measure that can predict the level of chip defects that cause delay failure, including marginal delay. We can therefore use the model to make test vectors that are effective in terms of both testing cost and chip quality. The results of experiments using ISCAS89 benchmark data and some large industrial design data reflect various characteristics of our statistical delay quality model.

Proceedings ArticleDOI
18 Jan 2005
TL;DR: In this article, the authors present a new methodology based on timing criticality and temporal currents to size the sleep transistor, which results in area reduction of sleep transistors by 80% and 49% compared to module based design and cluster based design respectively.
Abstract: Power gating is a circuit technique that enables high performance and low power operation. One of the challenges in power gating is sizing the sleep transistor which is used to gate the power supply. This paper presents a new methodology based on timing criticality and temporal currents to size the sleep transistor. The timing criticality information and temporal current estimation are obtained using static timing analyzer. The results obtained indicate that our proposed technique results in area reduction of sleep transistors by 80% and 49% compared to module based design and cluster based design respectively.

Proceedings ArticleDOI
18 Jan 2005
TL;DR: Fast algorithms to synthesize exact minimal reversible circuits for various types of gates and costs are presented, and it is shown that the Peres gate is a better choice than the standard Toffoli gate in libraries of universal reversible gates.
Abstract: We present fast algorithms to synthesize exact minimal reversible circuits for various types of gates and costs. By reducing reversible logic synthesis problems to group theory problems, we use the powerful algebraic software GAP to solve such problems. Our algorithms are not only able to minimize for arbitrary cost functions of gates, but also orders of magnitude faster than the existing approaches to reversible logic synthesis. In addition, we show that the Peres gate is a better choice than the standard Toffoli gate in libraries of universal reversible gates.

Proceedings ArticleDOI
18 Jan 2005
TL;DR: Experimental results show that the novel gate-sizing and multi-V_t assignment technique based on generalized Lagrangian relaxation exhibits linear runtime and memory usage, and can effectively tune circuits with over 15,000 variables and 8,000 constraints in under 8 minutes.
Abstract: Simultaneous gate-sizing with multiple V_t assignment for delay and power optimization is a complicated task in modern custom designs. In this work, we make the key contribution of a novel gate-sizing and multi-V_t assignment technique based on generalized Lagrangian relaxation. Experimental results show that our technique exhibits linear runtime and memory usage, and can effectively tune circuits with over 15,000 variables and 8,000 constraints in under 8 minutes (250× faster than state-of-the-art optimization solvers).

Proceedings ArticleDOI
18 Jan 2005
TL;DR: In this paper, the authors design area-efficient circuits for programmable fine-grained power-gating of individual unused interconnect switches, and reduce interconnect leakage power dramatically.
Abstract: Power has become an increasingly important design constraint for FPGAs in nanometer technologies, and global interconnects should be the focus of FPGA power reduction as they consume more power than logic cells. We design area-efficient circuits for programmable fine-grained power-gating of individual unused interconnect switches, and reduce interconnect leakage power dramatically because the interconnect switches have an intrinsically low utilization rate for the purpose of programmability. The low-leakage interconnect via power-gating reduces total power by 38.18% for the FPGA in 100nm technology. Furthermore, it enables interconnect dynamic power reduction. We design a routing channel containing abundant or duplicated routing tracks with pre-determined high and low Vdd, and develop a routing algorithm that uses low Vdd for non-critical routing to reduce dynamic power. The track-duplicated routing channel has small leakage power and increases the FPGA power reduction to 45.00%.

Proceedings ArticleDOI
18 Jan 2005
TL;DR: In this paper, a nonlinear macro model based PLL simulation technique was proposed to capture the dynamics of complex phenomena such as locking, cycle slipping and power supply noise induced PLL jitter, replicating qualitative features from full SPICE simulations accurately.
Abstract: Phase-locked loops (PLLs) are widely used in electronic systems. As PLL malfunction is one of the most important factors in re-fabs of SoCs, fast simulation of PLLs to capture non-ideal behavior accurately is an immediate, pressing need in the semiconductor design industry. In this paper, we present a nonlinear macro model based PLL simulation technique that is considerably more accurate than prior linear PLL simulation techniques. Our method is able to accurately capture transient behavior and faithfully estimate timing jitter in noisy PLLs. We demonstrate the proposed technique on ring and LC voltage-controlled oscillator (VCO) based PLLs, and compare results against linear PLL macromodels and full SPICE-level simulation. We show that, unlike prior linear macromodel based approaches, the proposed nonlinear technique captures the dynamics of complex phenomena such as locking, cycle slipping and power supply noise induced PLL jitter, replicating qualitative features from full SPICE simulations accurately while providing speedups of over two orders of magnitude.

Proceedings ArticleDOI
18 Jan 2005
TL;DR: This paper first shows that the problem can be formulated as linear programming, and then proposes an efficient min-cost flow based approach to solve it, which guarantees to obtain the minimum of total wire length in polynomial time and meanwhile keep the minimum area by distributing white space smarter for a given floorplan topology.
Abstract: Existing floorplanning algorithms compact blocks to the left and bottom. Although the compaction obtains an optimal area, it may not be good for meeting other objectives such as minimizing total wire length, which is the first-order objective. It is not known in the literature how to place blocks to obtain an optimal wire length. In this paper, we first show that the problem can be formulated as linear programming. Thereafter, instead of using the general but slow linear programming, we propose an efficient min-cost flow based approach to solve it. Our approach guarantees to obtain the minimum total wire length in polynomial time while keeping the minimum area, by distributing white space smarter for a given floorplan topology. We also show that the approach can be easily extended to handle constraints such as fixed-frame (fixed area), IO pins, pre-placed blocks, boundary blocks, range placement, alignment and abutment, rectilinear blocks, soft blocks, one-dimensional cluster placement, and bounded net delay, without loss of optimality. Practically, the algorithm is so efficient that it finishes in less than 0.4 seconds for all MCNC benchmarks of block placement. It is also very effective. Experimental results show we can improve wire length by 4.2% even on very compact floorplans. Thus it provides an ideal way of post-floorplanning (floorplan refinement).

Proceedings ArticleDOI
18 Jan 2005
TL;DR: This paper develops techniques to reduce the maximum temperature and wire congestion of 3D circuits without compromising total wirelength and via count and shows smooth tradeoff among congestion, temperature, wirelength, and via count.
Abstract: The recent popularity of 3D IC technology stems from its enhanced performance capabilities and reduced wire-length. However, wire congestion and thermal issues are exacerbated due to the compact nature of these layered technologies. In this paper, we develop techniques to reduce the maximum temperature and wire congestion of 3D circuits without compromising total wirelength and via count. Our approach consists of two phases. First, we use a multi-level min-cut placement with a modified gain function for local wire congestion and dynamic power consumption reduction. Second, we perform simulated annealing together with full-length thermal analysis and global routing for global wire congestion and maximum temperature reduction. Our experimental results show smooth tradeoff among congestion, temperature, wirelength, and via count.

Proceedings ArticleDOI
18 Jan 2005
TL;DR: In this paper, the impact of dominant parameters, such as area occupancy of memory/logic, power density, and floorplan, on thermal gradient and clock skew is studied, and a procedure to amend thermal gradient is proposed.
Abstract: This paper quantitatively analyzes the thermal gradient of SoCs and proposes a thermal flattening procedure. First, the impact of dominant parameters, such as area occupancy of memory/logic, power density, and floorplan, on thermal gradient and clock skew is studied. Important results obtained here are 1) the maximum temperature difference increases with higher memory area occupancy and 2) the difference is very floorplan sensitive. Then, we propose a procedure to amend thermal gradient. A slight floorplan modification using the proposed procedure improves on-chip thermal gradient significantly.

Proceedings ArticleDOI
18 Jan 2005
TL;DR: This paper proposes the necessary and sufficient conditions of TCG-S for symmetry modules, and proposes a polynomial-time packing algorithm for a TCG-S with symmetry constraints that results in the best area utilization.
Abstract: In order to handle device matching for analog circuits, some pairs of modules need to be placed symmetrically with respect to a common axis. In this paper, we deal with the module placement with symmetry constraints for analog design using the transitive closure graph-sequence (TCG-S) representation. Since the geometric relationships of modules are transparent to TCG-S and its induced operations, TCG-S has better flexibility than previous works in dealing with symmetry constraints. We first propose the necessary and sufficient conditions of TCG-S for symmetry modules. Then, we propose a polynomial-time packing algorithm for a TCG-S with symmetry constraints. Experimental results show that the TCG-S based algorithm results in the best area utilization.

Proceedings ArticleDOI
18 Jan 2005
TL;DR: This paper focuses on the OARSMT problem and presents an algorithm, named An-OARSMan, based on ant colony optimization, which can handle complex obstacle cases including both convex and concave polygon obstacles with good length performance.
Abstract: Routing is one of the important steps in VLSI/ULSI physical design. The rectilinear Steiner minimum tree (RSMT) construction is an essential part of routing. Since macro cells, IP blocks, and pre-routed nets are often regarded as obstacles in the routing phase, obstacle-avoiding RSMT (OARSMT) algorithms are useful for practical routing applications. This paper focuses on the OARSMT problem and presents an algorithm, named An-OARSMan, based on ant colony optimization. A greedy obstacle penalty distance (OP-distance) local heuristic is used in the algorithm and performed on the track graph. The algorithm has been implemented and tested on different kinds of obstacles. Experimental results show that An-OARSMan can handle complex obstacle cases including both convex and concave polygon obstacles with good length performance. It can always achieve the optimal solution in the cases with no more than 7 terminals.

Proceedings ArticleDOI
18 Jan 2005
TL;DR: A novel feasibility analysis for real-time (RT) and nonreal-time (NT) messages in wormhole-routed networks on chip is presented, and a contention tree is formulated that captures contentions in the network.
Abstract: The feasibility of a message in a network concerns whether its timing property can be satisfied without preventing any messages already in the network from meeting their timing properties. We present a novel feasibility analysis for real-time (RT) and nonreal-time (NT) messages in wormhole-routed networks on chip. For RT messages, we formulate a contention tree that captures contentions in the network. For coexisting RT and NT messages, we propose a simple bandwidth partitioning method that allows us to analyze their feasibility independently.