Proceedings Article

Fast algorithms for IR drop analysis in large power grid

31 May 2005, pp. 351-357
TL;DR: Two iterative algorithms are presented, based on node-by-node and row-by-row traversals of the power grid, respectively; both can be considered efficient implementations of the classical successive over-relaxation iterative method for solving linear systems.
Abstract: Due to the extremely large size of power grids, IR drop analysis has become a computationally challenging problem in terms of both runtime and memory usage. Although IR drop analysis can be naturally formulated as the problem of solving a linear system, the system is too large to be solved by existing linear solvers. In this paper, we present two iterative algorithms based on node-by-node traversals and row-by-row traversals of the power grid, respectively. Our algorithms are extremely fast and guarantee convergence to the exact solutions. In fact, they can be considered as efficient implementations of the classical successive over-relaxation iterative method for solving linear systems. Our methods take full advantage of the special structure of the power grid. Experimental results show that our algorithms outperform the random-walk-based algorithm, which is the best known method today. For a 16-million-node problem, our row-based algorithm took 26.47 minutes while the random-walk-based algorithm took 19.6 hours. Our row-based algorithm produced an exact solution while the random walk produced a solution with a maximum error of 5.7 mV.
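To make the link between IR drop analysis and successive over-relaxation concrete, here is a minimal Python sketch of a node-by-node SOR sweep on a uniform resistive mesh. It is the textbook SOR update applied to Kirchhoff's current law at each node, not the authors' implementation. The function name `solve_ir_drop_sor`, the uniform segment conductance `g`, the pad and load representation, and the default relaxation factor are all assumptions made for illustration.

```python
def solve_ir_drop_sor(rows, cols, g, vdd, pads, load,
                      omega=1.5, tol=1e-9, max_sweeps=10000):
    """Node-by-node SOR on a rows x cols resistive mesh (illustrative only).

    g     : conductance of every grid segment in siemens (assumed uniform)
    vdd   : supply voltage held at the pad nodes
    pads  : set of (row, col) nodes tied to vdd
    load  : dict mapping (row, col) to the current drawn at that node (A)
    omega : relaxation factor, 0 < omega < 2
    """
    v = [[vdd] * cols for _ in range(rows)]  # initial guess: no drop anywhere
    for sweep in range(1, max_sweeps + 1):
        max_change = 0.0
        for r in range(rows):
            for c in range(cols):
                if (r, c) in pads:
                    continue  # pad voltages are fixed boundary values
                nbrs = [(r + dr, c + dc)
                        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                        if 0 <= r + dr < rows and 0 <= c + dc < cols]
                # KCL at the node: sum_n g * (v_n - v) = I_load
                gauss_seidel = (g * sum(v[a][b] for a, b in nbrs)
                                - load.get((r, c), 0.0)) / (g * len(nbrs))
                # SOR: move past the Gauss-Seidel value by the factor omega
                new_v = v[r][c] + omega * (gauss_seidel - v[r][c])
                max_change = max(max_change, abs(new_v - v[r][c]))
                v[r][c] = new_v
        if max_change < tol:
            return v, sweep
    raise RuntimeError("SOR did not converge within max_sweeps")


# A 1 x 3 chain: pad at one end, 0.1 A load at the other, 0.1 ohm segments.
voltages, sweeps = solve_ir_drop_sor(1, 3, g=10.0, vdd=1.0,
                                     pads={(0, 0)}, load={(0, 2): 0.1})
```

In the chain example the full 0.1 A flows through both 0.1 ohm segments, so each segment drops 10 mV. Only the voltage array is stored, which is one reason relaxation methods of this kind are attractive when memory is the bottleneck.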
Citations
Book
11 Mar 2009
TL;DR: EDA/VLSI practitioners and researchers in need of fluency in an "adjacent" field will find this an invaluable reference to the basic EDA concepts, principles, data structures, algorithms, and architectures for the design, verification, and test of VLSI circuits.
Abstract: This book provides broad and comprehensive coverage of the entire EDA flow. EDA/VLSI practitioners and researchers in need of fluency in an "adjacent" field will find this an invaluable reference to the basic EDA concepts, principles, data structures, algorithms, and architectures for the design, verification, and test of VLSI circuits. Anyone who needs to learn the concepts, principles, data structures, algorithms, and architectures of the EDA flow will benefit from this book.

  • Covers complete spectrum of the EDA flow, from ESL design modeling to logic/test synthesis, verification, physical design, and test - helps EDA newcomers to get "up-and-running" quickly
  • Includes comprehensive coverage of EDA concepts, principles, data structures, algorithms, and architectures - helps all readers improve their VLSI design competence
  • Contains latest advancements not yet available in other books, including test compression, ESL design modeling, large-scale floorplanning, placement, routing, synthesis of clock and power/ground networks - helps readers to design/develop testable chips or products
  • Includes industry best-practices wherever appropriate in most chapters - helps readers avoid costly mistakes

Table of Contents: Chapter 1: Introduction; Chapter 2: Fundamentals of CMOS Design; Chapter 3: Design for Testability; Chapter 4: Fundamentals of Algorithms; Chapter 5: Electronic System-Level Design and High-Level Synthesis; Chapter 6: Logic Synthesis in a Nutshell; Chapter 7: Test Synthesis; Chapter 8: Logic and Circuit Simulation; Chapter 9: Functional Verification; Chapter 10: Floorplanning; Chapter 11: Placement; Chapter 12: Global and Detailed Routing; Chapter 13: Synthesis of Clock and Power/Ground Networks; Chapter 14: Fault Simulation and Test Generation.

200 citations

Proceedings Article
Zhuo Feng1, Peng Li1
10 Nov 2008
TL;DR: This work shows how to exploit recent massively parallel single-instruction multiple-thread (SIMT) based graphics processing unit (GPU) platforms to tackle power grid analysis with promising performance and proposes a GPU-accelerated hybrid multigrid algorithm, GpuHMD, and its implementation.
Abstract: The challenging task of analyzing on-chip power (ground) distribution networks with multi-million node complexity and beyond is key to today's large chip designs. For the first time, we show how to exploit recent massively parallel single-instruction multiple-thread (SIMT) based graphics processing unit (GPU) platforms to tackle power grid analysis with promising performance. Several key enablers including GPU-specific algorithm design, circuit topology transformation, workload partitioning, and performance tuning are embodied in our GPU-accelerated hybrid multigrid algorithm, GpuHMD, and its implementation. In particular, a proper interplay between algorithm design and SIMT architecture consideration is shown to be essential to achieve good runtime performance. Different from the standard CPU based CAD development, care must be taken to balance between computing and memory access, reduce random memory access patterns and simplify flow control to achieve efficiency on the GPU platform. Extensive experiments on industrial and synthetic benchmarks have shown that the proposed GpuHMD engine can achieve 100× runtime speedup over a state-of-the-art direct solver and be more than 15× faster than the CPU based multigrid implementation. The DC analysis of a 1.6 million-node industrial power grid benchmark can be accurately solved in three seconds with less than 50 MB memory on a commodity GPU. It is observed that the proposed approach scales favorably with the circuit complexity, at a rate about one second per million nodes.

109 citations


Cites methods from "Fast algorithms for IR drop analysi..."

  • ...In the past decade, on the standard general-purpose CPU platform, a body of power grid analysis methods have been proposed [1], [2], [3], [4], [5], [6], [7], [8] with various tradeoffs....

    [...]

Journal Article
TL;DR: In this article, the authors investigated the low-frequency behavior of the PEEC matrix and demonstrated that the system matrix is well behaved from a full-wave solution at high frequencies to a pure resistive circuit solution at dc, thereby enabling dc-to-daylight simulations.
Abstract: The partial element equivalent circuit (PEEC) formulation is an integral equation based approach for the solution of combined electromagnetic and circuit (EM-CKT) problems. In this paper, the low-frequency behavior of the PEEC matrix is investigated. Traditional EM solution methods, like the method of moments, suffer from singularity of the system matrix due to the decoupling of the charge and currents at low frequencies. Remedial techniques for this problem, like loop-star decomposition, require detection of loops and therefore present a complicated problem with nonlinear time scaling for practical geometries with holes and handles. Furthermore, for an adaptive mesh of an electrically large structure, the low-frequency problem may still occur at certain finely meshed regions. A widespread application of loop-star basis functions for the entire mesh is counterproductive to the matrix conditioning. Therefore, it is necessary to preidentify regions of low-frequency ill conditioning, which in itself represents a complex problem. In contrast, the charge and current basis functions are separated in the PEEC formulation and the system matrix is formulated accordingly. The incorporation of the resistive loss (R) for conductors and dielectric loss (G) for the surrounding medium leads to better system matrix conditioning throughout the entire frequency spectrum, and it also leads to a clean dc solution. We demonstrate that the system matrix is well behaved from a full-wave solution at high frequencies to a pure resistive circuit solution at dc, thereby enabling dc-to-daylight simulations. Finally, these techniques are applied to remedy the low-frequency conditioning of the electric field integral equation matrix

77 citations


Cites background from "Fast algorithms for IR drop analysi..."

  • ...At dc, the problem degenerates to the solution of a resistive circuit [23]....

    [...]

Journal Article
TL;DR: A finite-difference formulation based on the latency insertion method (LIM) has been employed for simulating the power-supply noise in the on-chip PDN, and a new common-mode type equivalent circuit has been proposed.
Abstract: Ensuring the integrity of the power supply in the power distribution networks (PDNs) of a chip is essential for building reliable high-performance chips. To ensure the power integrity, accurate, memory-efficient, and time-efficient approaches for simulating the power-supply noise in the on-chip PDN are essential. In this paper, a finite-difference formulation based on the latency insertion method (LIM) has been employed for simulating the power-supply noise in the on-chip PDN. A new common-mode type equivalent circuit has been proposed. In this equivalent circuit, a capacitance to ideal ground may not be present at all the nodes. Further, the nodes can be capacitively coupled to each other. To avoid inverting a large nonbanded matrix, a small capacitance to ground is added to a node that did not have any capacitance to ground, and a small series inductance is added to any floating capacitor that did not have any series inductance. Approximate closed-form expressions to compute the values of these capacitances to ground and series inductances have been proposed. The accuracy of the LIM-enabled transient simulation and the accuracy of the proposed closed-form expressions have been demonstrated. The memory and time complexity of the simulation for each time step have been shown to be O(Nn) each, where Nn is the number of nodes in the equivalent circuit. A stability condition is derived for the first time for a multidimensional inhomogeneous RLC circuit. An upper bound on the time step is derived from the stability condition. Using this bound on the time step, the runtime of the overall transient simulation has been estimated to be approximately proportional to Nn 2-2.5 for Nn in the order of millions.

72 citations

Proceedings Article
27 Dec 2005
TL;DR: In this article, the low frequency behavior of the PEEC matrix is investigated, and techniques leading to an excellent condition number throughout the entire frequency spectrum are discussed, and these schemes are applied to remedy the low-frequency conditioning of the EFIE method.
Abstract: The partial element equivalent circuit (PEEC) formulation is an integral equation based approach for the solution of combined electromagnetic and circuit (EM-CKT) problems. Traditional EM solvers like the electric field integral equation (EFIE) method suffer from numerical problems at low-frequencies arising from the decoupling of the charge and current basis functions. In this paper, the low frequency behavior of the PEEC matrix is investigated. Techniques leading to an excellent condition number throughout the entire frequency spectrum are discussed. Finally, these schemes are applied to remedy the low-frequency conditioning of the EFIE method.

40 citations

References
Book
01 Jan 1971
TL;DR: The ASM preconditioner $B$ is characterized by three parameters, $C_0$, $\rho(E)$, and $\omega$, which enter via assumptions on the subspaces $V_i$ and the bilinear forms $a_i(\cdot,\cdot)$ (the approximate local problems).
Abstract: theory for ASM. In the following we characterize the ASM preconditioner $B$ by three parameters: $C_0$, $\rho(E)$, and $\omega$, which enter via assumptions on the subspaces $V_i$ and the bilinear forms $a_i(\cdot,\cdot)$ (the approximate local problems).

Assumption 14.6 (stable decomposition). There exists a constant $C_0 > 0$ such that every $u \in V$ admits a decomposition $u = \sum_{i=0}^{N} u_i$ with $u_i \in V_i$ such that

$$\sum_{i=0}^{N} a_i(u_i, u_i) \le C_0^2 \, a(u, u) \tag{14.29}$$

Assumption 14.7 (strengthened Cauchy-Schwarz inequality). For $i, j = 1 \ldots N$, let $E_{i,j} = E_{j,i} \in [0, 1]$ be defined by the inequalities

$$|a(u_i, u_j)|^2 \le E_{i,j}^2 \, a(u_i, u_i) \, a(u_j, u_j) \quad \forall\, u_i \in V_i,\ u_j \in V_j \tag{14.30}$$

By $\rho(E)$ we denote the spectral radius of the symmetric matrix $E = (E_{i,j}) \in \mathbb{R}^{N \times N}$. The particular assumption is that we have a nontrivial bound for $\rho(E)$ at our disposal. Note that due to $E_{i,j} \le 1$ (Cauchy-Schwarz inequality), the trivial bound $\rho(E) = \|E\|_2 \le \sqrt{\|E\|_1 \|E\|_\infty} \le N$ always holds; for particular Schwarz methods one usually aims at bounds for $\rho(E)$ which are independent of $N$.

Assumption 14.8 (local stability). There exists $\omega > 0$ such that for all $i = 1 \ldots N$:

$$a(u_i, u_i) \le \omega \, a_i(u_i, u_i) \quad \forall\, u_i \in V_i \tag{14.31}$$

Remark 14.9. The space $V_0$ is not included in the definition of $E$; as we will see below, this space is allowed to play a special role. $E_{i,j} = 0$ implies that the spaces $V_i$ and $V_j$ are orthogonal (in the $a(\cdot,\cdot)$ inner product). We will see below that small $\rho(E)$ is desirable. We will also see below that a small $C_0$ is desirable. The parameter $\omega$ represents a one-sided measure of the approximation properties of the approximate solvers $a_i$. If the local solver is of (exact) Galerkin type, i.e., $a_i(u, v) \equiv a(u, v)$ for $u, v \in V_i$, then $\omega = 1$. However, this does not necessarily imply that Assumptions 14.6 and 14.7 are satisfied.

Lemma 14.10 (P. L. Lions). Let $P_{\mathrm{ASM}}$ be defined by (14.23) resp. (14.24). Then, under Assumption 14.6,

(i) $P_{\mathrm{ASM}} : V \to V$ is a bijection, and

$$a(u, u) \le C_0^2 \, a(P_{\mathrm{ASM}} u, u) \quad \forall\, u \in V \tag{14.32}$$

(ii) Characterization of $b(u, u)$:

$$b(u, u) = a(P_{\mathrm{ASM}}^{-1} u, u) = \min \Big\{ \sum_{i=0}^{N} a_i(u_i, u_i) \,:\, u = \sum_{i=0}^{N} u_i,\ u_i \in V_i \Big\} \tag{14.33}$$

Proof: We make use of the fundamental identity (14.27) and Cauchy-Schwarz inequalities.

Proof of (i): Let $u \in V$ and $u = \sum_i u_i$ be a decomposition of the type guaranteed by Assumption 14.6. Then:

$$\begin{aligned}
a(u, u) &= a\Big(u, \sum_i u_i\Big) = \sum_i a(u, u_i) = \sum_i a_i(P_i u, u_i) \\
&\le \sum_i \sqrt{a_i(P_i u, P_i u) \, a_i(u_i, u_i)} = \sum_i \sqrt{a(u, P_i u) \, a_i(u_i, u_i)} \\
&\le \sqrt{\sum_i a(u, P_i u)} \, \sqrt{\sum_i a_i(u_i, u_i)} = \sqrt{a(u, P_{\mathrm{ASM}} u)} \, \sqrt{\sum_i a_i(u_i, u_i)} \\
&\le \sqrt{a(u, P_{\mathrm{ASM}} u)} \; C_0 \sqrt{a(u, u)}
\end{aligned}$$

This implies the estimate (14.32). In particular, it follows that $P_{\mathrm{ASM}}$ is injective, because with (14.32), $P_{\mathrm{ASM}} u = 0$ implies $a(u, u) = 0$, hence $u = 0$. Due to finite dimension, we conclude that $P_{\mathrm{ASM}}$ is bijective.

Proof of (ii): We first show that the minimum on the right-hand side of (14.33) cannot be smaller than $a(P_{\mathrm{ASM}}^{-1} u, u)$. To this end, we consider an arbitrary decomposition $u = \sum_i u_i$ with $u_i \in V_i$ and estimate

$$\begin{aligned}
a(P_{\mathrm{ASM}}^{-1} u, u) &= \sum_i a(P_{\mathrm{ASM}}^{-1} u, u_i) = \sum_i a_i(P_i P_{\mathrm{ASM}}^{-1} u, u_i) \\
&\le \sqrt{\sum_i a_i(P_i P_{\mathrm{ASM}}^{-1} u, P_i P_{\mathrm{ASM}}^{-1} u)} \, \sqrt{\sum_i a_i(u_i, u_i)} \\
&= \sqrt{\sum_i a(P_{\mathrm{ASM}}^{-1} u, P_i P_{\mathrm{ASM}}^{-1} u)} \, \sqrt{\sum_i a_i(u_i, u_i)} \\
&= \sqrt{a(P_{\mathrm{ASM}}^{-1} u, u)} \, \sqrt{\sum_i a_i(u_i, u_i)}
\end{aligned}$$

In order to see that $a(P_{\mathrm{ASM}}^{-1} u, u)$ is indeed the minimum of the right-hand side of (14.33), we define $u_i = P_i P_{\mathrm{ASM}}^{-1} u$. Obviously, $u_i \in V_i$ and $\sum_i u_i = u$. Furthermore,

$$\begin{aligned}
\sum_i a_i(u_i, u_i) &= \sum_i a_i(P_i P_{\mathrm{ASM}}^{-1} u, P_i P_{\mathrm{ASM}}^{-1} u) = \sum_i a(P_{\mathrm{ASM}}^{-1} u, P_i P_{\mathrm{ASM}}^{-1} u) \\
&= a\Big(P_{\mathrm{ASM}}^{-1} u, \sum_i P_i P_{\mathrm{ASM}}^{-1} u\Big) = a(P_{\mathrm{ASM}}^{-1} u, u)
\end{aligned}$$

This concludes the proof.

The matrix $P'_{\mathrm{ASM}} = B^{-1} A$ from (14.23) is the matrix representation of the operator $P_{\mathrm{ASM}}$. Since $P_{\mathrm{ASM}}$ is self-adjoint in the $A$-inner product (see Lemma 14.2), we can estimate the smallest and the largest eigenvalue of $B^{-1} A$ by:

$$\lambda_{\min}(B^{-1} A) = \inf_{0 \ne u \in V} \frac{a(P_{\mathrm{ASM}} u, u)}{a(u, u)}, \qquad \lambda_{\max}(B^{-1} A) = \sup_{0 \ne u \in V} \frac{a(P_{\mathrm{ASM}} u, u)}{a(u, u)} \tag{14.34}$$

Lemma 14.10 (i) in conjunction with Assumption 14.6 readily yields

$$\lambda_{\min}(B^{-1} A) \ge \frac{1}{C_0^2}$$

An upper bound for $\lambda_{\max}(B^{-1} A)$ is obtained with the help of the following lemma.

Lemma 14.11. Under Assumptions 14.7 and 14.8 we have

$$\|P_i\|_A \le \omega, \quad i = 0 \ldots N \tag{14.35}$$

$$a(P_{\mathrm{ASM}} u, u) \le \omega \, (1 + \rho(E)) \, a(u, u) \quad \text{for all } u \in V \tag{14.36}$$

Proof: Again we make use of identity (14.27). We start with the proof of (14.35): From Assumption 14.8, (14.31) we infer for all $u \in V$:

$$\|P_i u\|_A^2 = a(P_i u, P_i u) \le \omega \, a_i(P_i u, P_i u) = \omega \, a(u, P_i u) \le \omega \, \|u\|_A \, \|P_i u\|_A$$

which implies (14.35). For the proof of (14.36), we observe that the space $V_0$ is assumed to play a special role. We define

2,527 citations

Proceedings Article
01 May 1998
TL;DR: The 21264 is a third generation Alpha microprocessor implementation, containing 15.2 million transistors and operating at 600 MHz, and the electrical design of the power, ground, and clock networks is presented.
Abstract: Power dissipation is rapidly becoming a limiting factor in high performance microprocessor design due to ever increasing device counts and clock rates. The 21264 is a third generation Alpha microprocessor implementation, containing 15.2 million transistors and operating at 600 MHz. This paper describes some of the techniques the Alpha design team utilized to help manage power dissipation. In addition, the electrical design of the power, ground, and clock networks is presented.

391 citations

Journal Article
TL;DR: This paper presents a new technique for analyzing a power grid using macromodels that are created for a set of partitions of the grid, and shows that even for a 60 million-node power grid, the approach allows for an efficient analysis, whereas previous approaches have been unable to handle power grids of such size.
Abstract: Careful design and verification of the power distribution network of a chip are of critical importance to ensure its reliable performance. With the increasing number of transistors on a chip, the size of the power network has grown so large as to make the verification task very challenging. The available computational power and memory resources impose limitations on the size of networks that can be analyzed using currently known techniques. Many of today's designs have power networks that are too large to be analyzed in the traditional way as flat networks. In this paper, we propose a hierarchical analysis technique to overcome the aforesaid capacity limitation. We present a new technique for analyzing a power grid using macromodels that are created for a set of partitions of the grid. Efficient numerical techniques for the computation and sparsification of the port admittance matrices of the macromodels are presented. A novel sparsification technique using a 0-1 integer linear programming formulation is proposed to achieve superior sparsification for a specified error. The run-time and memory efficiency of the proposed method are illustrated on industrial designs. It is shown that even for a 60 million-node power grid, our approach allows for an efficient analysis, whereas previous approaches have been unable to handle power grids of such size.

284 citations

Journal Article
TL;DR: A multigrid-like technique is proposed in which the power grid is reduced to a coarser structure and the solution is mapped back to the original grid; experimental results show that the method is very efficient and suitable for both DC and transient analysis of power grids.
Abstract: Modern submicron very large scale integration designs include huge power grids that are required to distribute large amounts of current, at increasingly lower voltages. The resulting voltage drop on the grid reduces noise margin and increases gate delay, resulting in a serious performance impact. Checking the integrity of the supply voltage using traditional circuit simulation is not practical, for reasons of time and memory complexity. The authors propose a novel multigrid-like technique for the analysis of power grids. The grid is reduced to a coarser structure, and the solution is mapped back to the original grid. Experimental results show that the proposed method is very efficient as well as suitable for both DC and transient analysis of power grids.

268 citations

Proceedings Article
02 Jun 2003
TL;DR: This paper presents a power grid analyzer based on a random walk technique that has the desirable property of localizing computation, so that it shows massive benefits over conventional methods when only a small part of the grid is to be analyzed.
Abstract: This paper presents a power grid analyzer based on a random walk technique. A linear-time algorithm is first demonstrated for DC analysis, and is then extended to perform transient analysis. The method has the desirable property of localizing computation, so that it shows massive benefits over conventional methods when only a small part of the grid is to be analyzed (for example, when the effects of small changes to the grid are to be examined). Even for the full analysis of the grid, experimental results show that the method is faster than existing approaches and has an acceptable error margin. This method has been applied to test circuits of up to 2.3M nodes. For example, for a circuit with 70K nodes, the solution time for a single node was 0.42 sec and the complete solution was obtained in 17.6 sec.
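The general idea behind a random-walk estimate of a single node voltage can be sketched in a few lines of Python. This is a simplified illustration of the standard correspondence between a resistive network and a random walk, not the analyzer described in the abstract. The function name `random_walk_voltage`, the adjacency-list format, and the walk count are assumptions. Each walk steps to a neighbor with probability proportional to the connecting conductance, pays a cost of `load / g_sum` at every node it visits, and collects `vdd` when it reaches a pad; the average net gain estimates the node voltage.

```python
import random


def random_walk_voltage(start, neighbors, load, pads, vdd,
                        n_walks=20000, seed=0):
    """Monte Carlo estimate of the DC voltage at one node (illustrative only).

    neighbors : dict node -> list of (neighbor, conductance) pairs
    load      : dict node -> current drawn at that node (A)
    pads      : set of nodes held at vdd, where every walk terminates
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    total = 0.0
    for _ in range(n_walks):
        node, gain = start, 0.0
        while node not in pads:
            nbrs = neighbors[node]
            g_sum = sum(g for _, g in nbrs)
            # Cost of visiting this node: its load current over total conductance
            gain -= load.get(node, 0.0) / g_sum
            # Step to a neighbor with probability g / g_sum
            node = rng.choices([n for n, _ in nbrs],
                               weights=[g for _, g in nbrs])[0]
        total += gain + vdd  # reward collected on reaching a pad
    return total / n_walks


# Chain 0 - 1 - 2 with a pad at node 0, 0.1 ohm segments, 0.1 A load at node 2.
chain = {0: [(1, 10.0)], 1: [(0, 10.0), (2, 10.0)], 2: [(1, 10.0)]}
estimate = random_walk_voltage(2, chain, load={2: 0.1}, pads={0}, vdd=1.0)
```

For this chain the exact voltage at node 2 is 0.98 V, and the estimate approaches it with a statistical error that shrinks roughly as one over the square root of the number of walks. Because each walk only touches the nodes it visits, the computation stays local to the node of interest, which matches the localization property the abstract highlights.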

165 citations