# A METHODOLOGY FOR THE COMPUTATION OF AN UPPER BOUND ON NOISE CURRENT SPECTRUM OF CMOS SWITCHING ACTIVITY

Alessandra Nardi, Haibo Zeng, Joshua L. Garrett, Luca Daniel, Alberto L. Sangiovanni-Vincentelli

University of California, Berkeley

#### ABSTRACT

Currents injected by CMOS digital circuit blocks into the power grid and into the substrate of a system-on-a-chip may affect the reliability and performance of other sensitive circuit blocks. To verify the correct operation of the system, an *upper bound* for the spectrum of the noise current has to be provided with respect to all possible transitions of the circuit inputs. The number of input transitions is exponential in the number of circuit inputs. In this paper, we present a novel approach for the computation of the upper bound that avoids the intractable exhaustive exploration of the entire space. Its computational complexity is linear in the number of gates. Our approach requires CMOS standard cell libraries to be characterized for injected noise current, and in this paper we also present an approach for this characterization of CMOS standard cells. Experimental results confirm the accuracy of both the algorithm and the noise current models used for the library characterization.

#### 1. INTRODUCTION

The complexity of system-on-a-chip design requires aggressive re-use of IP (Intellectual Property) circuit blocks. However, IP blocks can be safely re-used only if they do not affect other sensitive components. The switching activity of CMOS digital circuit blocks typically injects high-frequency current noise both into the Gnd/Vdd system and into the substrate of integrated circuits. Such currents can potentially affect the reliability and performance of other sensitive components [9]. For instance, the Gnd/Vdd currents may produce electromigration, IR voltage drops, voltage oscillations due to resonances, and, possibly, electromagnetic interference. The substrate currents may couple noise to sensitive analog circuitry through body effect or direct capacitive coupling. Current injection analysis is needed to properly account for all such effects during the design phase. Different effects require different types of current injection models. For instance, power consumption analysis requires time-domain average current estimation over several clock periods. Electromigration, IR drop, and timing performance analysis require a time-domain noise current upper bound with respect to all possible combinations of the inputs. Signal integrity, Gnd/Vdd grid resonances, electromagnetic interference, and substrate coupling on mixed-signal ICs instead require an upper bound, over all possible input transition vectors, on the spectrum of the current injected into the Gnd/Vdd system or into the substrate, respectively.

The methodology in [6] can be used to accurately estimate both time-domain and frequency-domain injected noise for a given set of input vectors. However, upper-bound estimation would require exhaustive circuit simulation over all possible input transition vectors. The stochastic approach in [4] can estimate frequency-domain *average* current injection, but not an upper bound over all possible input transition vectors.

The approaches in [3, 7, 1, 8] can estimate such an upper-bound in the time domain, but not in the frequency domain. In fact, these methodologies are suited to derive the maximum current envelope in the time domain, which in general does not correspond to an upper bound in the frequency domain. All these approaches divide the time domain into time intervals and search for an upper bound to the current in each interval, by identifying all the gates that could potentially switch in that interval. Therefore, the logic correlation inside the circuit is neglected or is at most considered only between each pair of gates. Devadas et al. [5] account for this logic correlation, but, since the original target was power consumption estimation, the approach relies on the assumption that the maximum (weighted) switching activity corresponds to the maximum current. The problem is translated into a weighted max-satisfiability problem and, therefore, it can be solved only for relatively small circuits. Furthermore, the use of this methodology is restricted to the time domain.

Signal integrity, Gnd/Vdd grid resonances, electromagnetic interference, and substrate coupling on mixed-signal ICs are not addressed by any of the existing current injection analysis algorithms. For these problems, we have developed a general methodology that estimates the '*Noise Current Spectrum Upper Bound*' (*NISUB*) of a digital block by combining the noise current injected by each gate and accounting for the circuit logic functionality. Our approach includes glitches and accounts for the logic correlation, path by path, at the entire circuit level. Even though the entire logic space is explored by the algorithm, the complexity is simply linear in the number of gates. We describe a heuristic algorithm for a tight *NISUB* estimation and also introduce several heuristics to improve the algorithm speed and accuracy.

In our algorithm, we need a standard cell library characterization for the spectrum of the current noise. CMOS standard cell libraries are commonly characterized for timing performance analysis purposes, measuring and tabulating only output transition times and propagation delays. To the best of our knowledge, no procedure for noise current analysis library characterization is yet available. In this paper we also present a methodology to characterize CMOS standard cell libraries for injected current noise. This part has highlighted some interesting issues about multiple input switching events that were mostly neglected in the past.

The paper is organized as follows: Section 2 gives a formal description of the problem we want to solve, Section 3 reports observations and results concerning the library characterization, and Section 4 describes the proposed algorithm and experimental results.

This work was partially supported by the Semiconductor Research Corporation (SRC) and the Microelectronics Innovation and Computer Research Opportunities (MICRO) program.

#### 2. PROBLEM DEFINITION

#### 2.1. Assumptions

The characterization of the noise current spectrum proposed in this paper is intended to be used for signal integrity, electromagnetic interference and noise coupling analysis.

We approach these problems by decomposing them into three stages:

- noise source characterization
- noise propagation
- impact on victim

In this paper we focus on the first stage, so the noise source characterization does not depend on the distance between the source and the victim. Such dependence is taken into account in the transmission model, i.e. during the second stage of the problem.

The transmission part of the problem is usually tackled with electromagnetic field solvers that consider interconnect (or substrate) layout geometries and can account for all sorts of capacitive, inductive, skin, proximity, diffusion, and even fullwave effects.

Such field solvers typically identify on the power grid (or substrate) some input ports and some output ports and use reduced order modeling techniques (e.g. [10]) to calculate frequency domain noise transfer functions from unit current excitations located at the input ports to the output ports.

A key assumption in this approach is that a large integrated circuit or multi-chip module (MCM) can be subdivided into smaller circuit blocks, such that within each of these blocks all effects accounted for by the field solvers are negligible. Hence, we can assume that:

- for each small circuit block all noise injectors can be collected into one single injection port in the global transmission model.
- within a circuit block the power supply voltage is uniformly constant and the current drawn by each gate does not affect significantly such voltage. In other words we assume each gate can be modeled as an independent current source.

Within this framework, the work presented in this paper is intended to provide an estimation for the maximum amplitude of the input excitation current of a single circuit block to be applied at its injection port.

#### 2.2. Maximum Current Spectrum Envelope

The current spectrum due to the switching activity of a CMOS digital circuit block is typically discrete. Significant non-zero components are present at the clock frequency  $f_0$  and at its first *P* harmonics:  $f_k = k \cdot f_0$ , k = 0, ..., P. In practical circuits, *P* is typically no larger than 10 to 15. The goal of this work is to find an upper bound of such a noise current spectrum. One practical way to estimate such an upper bound is to consider each harmonic  $f_k$  in the spectrum separately, and to independently estimate an upper bound  $I_{max}(f_k)$  for the current drawn by the circuit at that particular harmonic. The final result of this procedure is a "Maximum Current Spectrum Envelope", obtained by collecting the individual bounds

$$\{I_{max}(f_0), I_{max}(f_1), \dots, I_{max}(f_P)\}$$

at the P+1 harmonics in the spectrum.
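As a minimal sketch of how the envelope is assembled (not of how each $I_{max}(f_k)$ is computed, which is the subject of Section 4), the Python fragment below takes a few hypothetical candidate current waveforms, each sampled uniformly over one clock period, computes the magnitude of their Fourier-series coefficients for $k = 0, \dots, P$, and keeps the largest value per harmonic independently. The function names `harmonic` and `mcse` are illustrative only.

```python
import cmath
import math


def harmonic(samples, k):
    """k-th Fourier-series coefficient of one clock period sampled uniformly."""
    m_total = len(samples)
    return sum(
        x * cmath.exp(-2j * math.pi * k * m / m_total) for m, x in enumerate(samples)
    ) / m_total


def mcse(waveforms, p):
    """Per-harmonic maximum of |I(f_k)|, k = 0..p, over candidate current waveforms."""
    return [max(abs(harmonic(w, k)) for w in waveforms) for k in range(p + 1)]


# Two hypothetical candidate waveforms, 8 samples per clock period.
dc_only = [1.0] * 8
first_harmonic = [math.cos(2 * math.pi * m / 8) for m in range(8)]
envelope = mcse([dc_only, first_harmonic], p=3)
print([round(a, 6) for a in envelope])  # [1.0, 0.5, 0.0, 0.0]
```

Because the maximum is taken harmonic by harmonic, the entries of the envelope can come from different waveforms (here the DC entry comes from the first waveform and the first-harmonic entry from the second), so the envelope bounds every harmonic without necessarily being the spectrum of any single input transition.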

#### 2.3. Noise Current Model

**Definition 1** 'The current of gate G', denoted by  $I_G(f_k)$  (or often simply  $I_G$ ), is defined as the noise current injected by the gate G alone at frequency  $f_k$ , assuming a constant supply  $V_{DD}$ .

**Definition 2** 'The current of a node z', denoted by  $I_z(f_k)$  (or often simply  $I_z$ ), is defined as the sum of the noise currents injected at frequency  $f_k$  by all the gates in the transitive fanin network<sup>1</sup> of node z.

**Definition 3** The transition time of a node z is the interval of time between the 10% and 90% points of the waveform<sup>2</sup> at node z. The transition time of a gate G is the transition time of its output node.

**Definition 4** The arrival time of a node z is the instant of time corresponding to the 50% point of the node waveform, with respect to the beginning of a clock cycle.

**Definition 5** The propagation delay of a gate G is the interval of time between the 50% values of the input and output waveforms.

Let G be a gate with n inputs and a single output z. For the sake of simplicity, and without loss of generality, we consider gates with only one output. The noise current spectrum value of G at a given frequency  $f_k$  is given by:

$$I_G = f(v, T_T, T_A, C_L); \quad I_G \in \mathbb{C}$$

where:

- $v = \{v_1, v_2, \dots, v_n\}$  is the input transition vector.  $v_i \in B^{q_i}$  where  $B = \{00, 01, 10, 11\}$  and  $q_i$  is the number of transitions on the i-th input. Therefore,  $v \in V = B^{q_1} \times B^{q_2} \times \dots \times B^{q_n}$
- $T_T = \{T_{T1}, T_{T2}, \dots, T_{Tn}\}$  is the input transition time vector.  $T_{Ti} \in S_i^{q_i}$  where  $S_i = [T_{Tmi}, T_{TMi}]$  and  $q_i$  is the number of transitions on the i-th input. Therefore,  $T_T \in S = S_1^{q_1} \times S_2^{q_2} \times \dots \times S_n^{q_n}$ .  $[T_{Tmi}, T_{TMi}]$  represents the range of possible values for the transition time of input *i*. This range is specified in the standard cell library characterization.
- $T_A = \{T_{A1}, T_{A2}, \dots, T_{An}\}$  is the input arrival time vector.  $T_{Ai} \in [0, T_c)^{q_i}$, where  $q_i$  is the number of transitions on the i-th input and  $T_c$  is the clock period. Finally,  $T_A \in A = [0, T_c)^{q_1} \times [0, T_c)^{q_2} \times \dots \times [0, T_c)^{q_n}$ .
- $C_L \in \mathbb{R}^+$  is the output capacitive load.

To understand this model more intuitively, we report an example in Figure 1.

Similarly, the output transition time and the propagation delay of the gate are given by:

$$T_{TG} = g(v, T_T, T_A, C_L) \in [0, T_c)$$
$$T_{PG} = h(v, T_T, T_A, C_L) \in [0, T_c)$$

Given a certain circuit topology,  $C_L$  is fixed and, therefore, it can be eliminated from the search space.

If the notation introduced above for a gate is also used for a circuit block with p primary inputs, then the primary input transition vector space  $\mathcal{L}_p$  corresponds to the case  $q_i = 1, \forall i = 1, ..., p$ .

Therefore,  $\mathcal{L}_p = B^p$  and has cardinality  $card(\mathcal{L}_p) = 4^p = 2^{2p}$ .

<sup>1</sup>The transitive fanin network of z is the cone at node z including z and *all* its predecessors. A cone at node z, denoted as  $C_z$ , is a subgraph consisting of z and some of its predecessors such that any path connecting a node in  $C_z$  and z lies entirely in  $C_z$ .

<sup>2</sup>Since our analysis is performed only on digital circuits, the input waveform space is restricted to linear ramps.



$$T_A = \{ \{35ps, 125ps\}, \{15ps, 65ps, 200ps\} \} = \{T_{A1}, T_{A2}\}$$

Figure 1: Example to illustrate the notation in the case n = 2;  $q_1 = 2$  and  $q_2 = 3$ .

#### 2.4. Problem Statement

For a circuit *C* with *p* primary inputs, input transition vector  $v = \{v_1, v_2, ..., v_p\}$ , input transition times  $T_T$  and arrival times  $T_A$ , for each frequency  $f_k$  in the spectrum we want to calculate:

$$I_{max}(f_k) = \max_{v \in \mathcal{L}_p, T_T \in S, T_A \in A} |I_C(f_k)|$$

where  $I_C(f_k)$  is the total noise current of the circuit block:

$$I_C(f_k) = \sum_{i=1}^N I_{G_i}(f_k).$$

and *N* is the number of gates in the circuit block. We further restrict the exploration space by assuming that the circuit block under examination is a combinational block between edge-triggered flip-flops: therefore, the primary input transition times and arrival times are assumed to be given. In particular, we assume that all primary inputs switch only at time t = 0. The problem is then reformulated as finding, for each frequency  $f_k$  in the spectrum,

$$I_{max}(f_k) = \max_{v \in \mathcal{L}_p} |I_C(f_k)|.$$

#### 3. LIBRARY CHARACTERIZATION

The algorithm for the upper bound of the noise current spectrum presented in this paper requires that each gate in the library be characterized both for timing and for noise injection analysis purposes. This means deriving the current spectrum, the output transition time and propagation delay of a gate for all possible input vectors. This section

- gives an overview of the library characterization issues,
- highlights some cases that require special attention and that are typically not considered, and
- describes the criteria we derived to handle these special cases.

Note that, referring to the formalism presented in the previous section, the characterization process for a gate *G* with *n* inputs assumes  $q_i = 1, \forall i = 1, \dots, n$ .

**Non-Switching-Output Events** A Non-Switching-Output (NSO) event occurs when, in response to some input transitions, the output of a gate does not switch. For example,  $(a: 0 \rightarrow 0, b: 0 \rightarrow 1, z: 0 \rightarrow 0)$  and  $(a: 0 \rightarrow 1, b: 0 \rightarrow 0, z: 0 \rightarrow 0)$  are NSO events for a 2-input AND gate with inputs *a* and *b* and output *z*. Noise current may be injected as a consequence of input transitions even if the output does not switch. Therefore, for an *n*-input gate, all the  $2^{2n}$  possible input transitions should be modeled. For general transitions, the spectral content of the injected noise current is a function of both the input transition time and the output capacitive load. However, the mechanism for noise current injection in NSO cases is different from that in Switching-Output (SO) cases. For example, the noise current injected during NSO events depends only on the input transition time and not on the capacitive load.

**Multiple-Input-Switching Transitions** We define a Multiple-Input-Switching (MIS) transition as a transition in which more than one gate input switches at the same time; a transition in which only one input switches is a Single-Input-Switching (SIS) transition. For example,  $(a : 0 \rightarrow 1, b : 0 \rightarrow 1)$ ,  $(a : 0 \rightarrow 1, b : 1 \rightarrow 0)$ ,  $(a : 1 \rightarrow 0, b : 0 \rightarrow 1)$ , and  $(a : 1 \rightarrow 0, b : 1 \rightarrow 0)$  are all the MIS transitions for a 2-input gate.
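The counting argument of the last two paragraphs can be checked with a short Python sketch that enumerates all $2^{2n}$ (initial, final) input pairs of a gate and labels each one as SIS or MIS and as SO or NSO. The helper `classify_transitions` and the extra label `static` for transitions in which no input changes are hypothetical conventions introduced here for illustration.

```python
from collections import Counter
from itertools import product


def classify_transitions(logic, n):
    """Label every (initial, final) input pair of an n-input gate."""
    labels = {}
    for initial in product((0, 1), repeat=n):
        for final in product((0, 1), repeat=n):
            switching = sum(i != f for i, f in zip(initial, final))
            if switching == 0:
                labels[(initial, final)] = ("static", "NSO")  # no input event at all
                continue
            kind = "SIS" if switching == 1 else "MIS"
            out = "SO" if logic(*initial) != logic(*final) else "NSO"
            labels[(initial, final)] = (kind, out)
    return labels


def and2(a, b):
    return a & b


labels = classify_transitions(and2, n=2)
counts = Counter(labels.values())
print(len(labels))  # 16 = 2^(2n)
print(counts[("MIS", "SO")], counts[("MIS", "NSO")],
      counts[("SIS", "SO")], counts[("SIS", "NSO")])  # 2 2 4 4
```

For the 2-input AND gate this recovers the four MIS transitions listed above, half of which leave the output unchanged, and it labels the two NSO examples of the previous paragraph, such as `((0, 0), (0, 1))`, as SIS NSO.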

A first attempt at characterizing MIS transitions has been presented in [2]. However, in that work only timing performance is considered.

Intuitively, if the current waveforms due to two consecutive input events do not overlap, then the MIS transition current is simply the superposition of the corresponding two SIS transition currents. For the sake of simplicity, and without loss of generality, we refer to a 2-input gate. Let  $T_{Ai,k}$  and  $T_{Az,k}$  be the mid-points of the input and output voltage waveforms, respectively, of an SIS transition k, and  $T_{Ti,k}$  and  $T_{Tz,k}$  the input and output transition times, respectively, for transition k.

We call  $\Delta_k$  the *Temporal Distance* between two consecutive input events in transition k:

1. If  $(MIS,NSO)_k = (SIS,NSO)_{k1} + (SIS,NSO)_{k2}$ , then  $\Delta_k = T_{Ai,k2} - T_{Ai,k1}$ .

2. If  $(MIS,NSO)_k = (SIS,SO)_{k1} + (SIS,SO)_{k2}$ , then

   $$\Delta_k = 1/2 \cdot [T_{Ai,k2} - 5/8 \cdot T_{Ti,k2} + T_{Az,k2} + 5/8 \cdot T_{Tz,k2}] - 1/2 \cdot [T_{Ai,k1} - 5/8 \cdot T_{Ti,k1} + T_{Az,k1} + 5/8 \cdot T_{Tz,k1}]$$

3. If  $(MIS,SO)_k = (SIS,NSO)_{k1} + (SIS,SO)_{k2}$ , then

   $$\Delta_k = 1/2 \cdot [T_{Ai,k2} - 5/8 \cdot T_{Ti,k2} + T_{Az,k2} + 5/8 \cdot T_{Tz,k2}] - T_{Ai,k1}$$

4. If  $(MIS,SO)_k = (SIS,SO)_{k1} + (SIS,NSO)_{k2}$ , then

   $$\Delta_k = T_{Ai,k2} - 1/2 \cdot [T_{Ai,k1} - 5/8 \cdot T_{Ti,k1} + T_{Az,k1} + 5/8 \cdot T_{Tz,k1}]$$
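The four cases above share one rule: $\Delta_k$ is the difference between the mid-points of the two SIS current windows, where an NSO window is centered at the input mid-point $T_{Ai}$ and an SO window spans from the start of the input ramp to the end of the output ramp. The $5/8$ factor is consistent with the linear-ramp assumption of footnote 2: a 10%-90% transition time $T_T$ covers 80% of the swing, so the full ramp lasts $1.25 \cdot T_T$ and its 50% point lies $5/8 \cdot T_T$ after the ramp starts. The Python sketch below encodes that rule; the dictionary-based event description and the example times are hypothetical.

```python
def current_center(t_ai, t_ti, t_az=None, t_tz=None):
    """Mid-point of the current window of one SIS transition (times in ps).

    NSO (t_az is None): the input mid-point T_Ai.
    SO: halfway between the input ramp start, T_Ai - 5/8*T_Ti, and the
    output ramp end, T_Az + 5/8*T_Tz.
    """
    if t_az is None:
        return t_ai
    return 0.5 * ((t_ai - 5 / 8 * t_ti) + (t_az + 5 / 8 * t_tz))


def temporal_distance(first, second):
    """Delta_k between SIS events k1 (first) and k2 (second); covers cases 1-4."""
    return current_center(**second) - current_center(**first)


nso_event = {"t_ai": 60.0, "t_ti": 40.0}
so_event = {"t_ai": 100.0, "t_ti": 40.0, "t_az": 130.0, "t_tz": 48.0}
print(temporal_distance(nso_event, so_event))  # case 3: 57.5
```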

We define the *Current Width*  $W_k$  of a transition k as the interval of time during which the current waveform related to transition k is nonzero. We call *Disjunction Threshold*  $\Delta_{TH,k}$  the value of the *Temporal Distance*  $\Delta_k$  beyond which the MIS transition k becomes the superposition of its corresponding SIS transitions  $k_1$  and  $k_2$ . Since two current windows of widths  $W_{k1}$  and  $W_{k2}$  centered at their mid-points stop overlapping once the centers are more than half the sum of the widths apart,  $\Delta_{TH,k}$  is comparable to that half-sum:  $\Delta_{TH,k} = 1/2 \cdot (W_{k1} + W_{k2})$ . Finally, we propose to use an on-off model for the characterization of MIS transitions:

- If  $\Delta_k \ge \Delta_{TH,k}$ , we consider the MIS transition k simply as the superposition of the SIS transitions  $k_1$  and  $k_2$ , each with its own input slope.
- If  $\Delta_k < \Delta_{TH,k}$ , we treat the inputs as simultaneous  $(\Delta_k = 0)$ .

If this simple model is used, the library needs to be characterized only for  $\Delta_k = 0$ . An intuitive motivation for the model can be given by observing that for NSO transitions the current injection begins when the first input starts moving and ends when the last input settles, while for SO transitions it begins when the first input starts moving and ends when the output settles. For both types of transitions, the current waveform is approximately a peak centered around its mid-point.
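A minimal Python sketch of the on-off model at one harmonic, assuming each characterized SIS or MIS current is available as a complex phasor and that delaying a current by $t$ multiplies its phasor by $e^{-j 2\pi f t}$. The argument names and, in particular, the choice of placing the $\Delta_k = 0$ entry at the time of the first event are assumptions made for illustration; the text above does not specify that placement.

```python
import cmath
import math


def shift(i, t, f):
    """Delay a current phasor i by t seconds at frequency f (time-shift property)."""
    return i * cmath.exp(-2j * math.pi * f * t)


def mis_current(delta, w1, w2, i_sis1, i_sis2, i_mis0, t1, t2, f):
    """On-off MIS model at one harmonic f.

    i_sis1, i_sis2: characterized SIS phasors, placed at times t1 and t2.
    i_mis0: characterized phasor for the simultaneous (Delta = 0) case.
    """
    threshold = 0.5 * (w1 + w2)  # Disjunction Threshold
    if delta >= threshold:
        # Current windows do not overlap: superpose the two SIS currents.
        return shift(i_sis1, t1, f) + shift(i_sis2, t2, f)
    # Windows overlap: treat the inputs as simultaneous.
    # Assumption: the Delta = 0 entry is placed at the first event's time t1.
    return shift(i_mis0, t1, f)


# Widths of 60 ps and 80 ps give a 70 ps threshold (values are hypothetical).
print(mis_current(100e-12, 60e-12, 80e-12, 1.0, 2.0, 2.5, 0.0, 100e-12, 0.0))  # superposition
print(mis_current(50e-12, 60e-12, 80e-12, 1.0, 2.0, 2.5, 0.0, 100e-12, 0.0))   # simultaneous
```

At $f = 0$ the superposed case simply adds the two SIS values, while at a harmonic whose period is comparable to $\Delta_k$ the two SIS phasors can partially cancel, which is why the model is applied separately at each harmonic.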

**The Base Table** In summary, the model we propose for a gate G is a *Base Table* ($BT_G$), an example of which is shown in Table 1. Such a table is derived for each gate G in the library and for each frequency  $f = k \cdot f_0$ , as defined in Subsection 2.2.

As can be seen from Table 1, the  $BT_G$  of a gate G with n inputs has  $2^{2n}$  rows: each row corresponds to an input transition vector.

Table 1: The *Base Table* for a 2-input AND gate G with inputs a, b and output z (unprimed and primed symbols denote initial and final values).

| j: aba'b' | zz' | $T_{TG}$      | $T_{PG}$      | $I_G$    |
|-----------|-----|---------------|---------------|----------|
| 0: 0000   | 00  | 0             | 0             | 0        |
| 1: 0001   | 00  | $T_{TG}^1$    | $T_{PG}^1$    | $I^1$    |
| ⋮         | ⋮   | ⋮             | ⋮             | ⋮        |
| 14: 1110  | 10  | $T_{TG}^{14}$ | $T_{PG}^{14}$ | $I^{14}$ |
| 15: 1111  | 11  | 0             | 0             | 0        |

For each transition j, the following data are calculated during the characterization:

- The current spectrum value of the gate  $I_G^j = f_j(v, T_T, C_L)$
- The propagation delay  $T_{PG}^{j} = h_{j}(v, T_{T}, C_{L})$
- The transition time  $T_{TG}^{j} = g_{j}(v, T_{T}, C_{L})$

where  $T_T$  and  $C_L$  are the gate input transition time and output capacitive load respectively.

The characterization assumes that all gate input ramps start together at t = 0, i.e.,  $T_{Ai} = 5/8 \cdot T_{Ti}$  for each input *i*.

Note that the Base Table is a characteristic of a library gate: nevertheless, since  $I_G^j$ ,  $T_{PG}^j$  and  $T_{TG}^j$  depend on the gate input transition time and output capacitive load, the *Base Table* has to be "instantiated" in the real circuit to obtain the values for the gate as it is placed in the circuit.

An *Instantiated Base Table* of gate G ($IBT_G$) is obtained from  $BT_G$  by calculating  $I_G^j$ ,  $T_{PG}^j$ ,  $T_{TG}^j$  for each transition *j* using the value of the gate input transition time imposed by the circuit environment. Notice that the current injected at frequency  $f_k$  by a gate G whose input arrival time is  $T_s$  can be obtained from  $I_G$  in  $BT_G$  by a phase shift of  $\phi = -2\pi f_k (T_s - 5/8 \cdot T_{Ti})$  in the frequency domain, since  $BT_G$  is characterized with input arrival time  $5/8 \cdot T_{Ti}$ .
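The phase shift follows from the time-shift property of the Fourier series. The Python sketch below checks it numerically: a hypothetical characterized current pulse (whose input mid-point sits at $5/8 \cdot T_{Ti}$) is delayed circularly by $T_s - 5/8 \cdot T_{Ti}$, and its harmonics are compared with the Base Table phasor rotated by $\phi$ at $f_k = k / T_c$. The clock period, sampling step, and pulse shape are assumed values.

```python
import cmath
import math


def harmonic(samples, k):
    """k-th Fourier-series coefficient of one uniformly sampled clock period."""
    n = len(samples)
    return sum(x * cmath.exp(-2j * math.pi * k * m / n) for m, x in enumerate(samples)) / n


def instantiate(i_bt, t_s, t_ti, f):
    """Move a Base Table phasor (input mid-point at 5/8*T_Ti) to an input arrival time T_s."""
    return i_bt * cmath.exp(-2j * math.pi * f * (t_s - 5 / 8 * t_ti))


T_CLK = 1e-9         # assumed 1 GHz clock, sampled in 100 steps of 10 ps
T_TI = 16e-12        # 5/8 * 16 ps = 10 ps, i.e. one sample
T_S = 510e-12        # instantiated arrival time: 500 ps (50 samples) later
base = [0.0] * 100
base[1:6] = [0.2, 0.6, 1.0, 0.6, 0.2]   # hypothetical characterized current pulse
moved = base[-50:] + base[:-50]         # the same pulse delayed by 50 samples

for k in range(4):
    f_k = k / T_CLK
    predicted = instantiate(harmonic(base, k), T_S, T_TI, f_k)
    print(k, abs(predicted - harmonic(moved, k)) < 1e-9)
```

Only the phase changes: the magnitude of each harmonic is the same for every arrival time, which is why a single characterized entry per row suffices.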

**Equivalence Classes** According to the BT concept, the input transition vector space V of a gate is partitioned into four equivalence classes by the equivalence-relation "generating the same output transition". An equivalence-class  $E_G^b$ ,  $b \in B$ , for gate G is the set of all the rows of  $BT_G$  such that the output transition is equal to b.

The equivalence-classes defined by this relation for a two-input AND gate are:

$$E_G^{00} = \{0000, 0001, 0010, 0100, 0101, 0110, 1000, 1001, 1010\}$$
$$E_G^{01} = \{0011, 0111, 1011\}, \quad E_G^{10} = \{1100, 1101, 1110\}, \quad E_G^{11} = \{1111\}$$

Since there is a bijective relation between the row number and the input transition vector, we can equivalently write:

$$E_G^{00} = \{0, 1, 2, 4, 5, 6, 8, 9, 10\}, \quad E_G^{01} = \{3, 7, 11\}, \quad E_G^{10} = \{12, 13, 14\}, \quad E_G^{11} = \{15\}$$
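The equivalence classes above can be regenerated mechanically. In the Python sketch below (the function name is hypothetical), the row index j is the binary value of $a\,b\,a'\,b'$, matching the numbering of Table 1, and each row is filed under the output transition $z\,z'$ it produces.

```python
from itertools import product


def equivalence_classes(logic):
    """Group the 16 Base Table rows j = aba'b' of a 2-input gate by output transition zz'."""
    classes = {t: [] for t in ("00", "01", "10", "11")}
    # product() yields 0000, 0001, ..., 1111, so j equals the binary value of aba'b'.
    for j, (a, b, a_new, b_new) in enumerate(product((0, 1), repeat=4)):
        classes[f"{logic(a, b)}{logic(a_new, b_new)}"].append(j)
    return classes


print(equivalence_classes(lambda a, b: a & b))
# {'00': [0, 1, 2, 4, 5, 6, 8, 9, 10], '01': [3, 7, 11], '10': [12, 13, 14], '11': [15]}
```

Passing a different two-input function (for instance `lambda a, b: a | b`) gives the classes of that gate in the same row numbering.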

#### 3.1. Experimental Results

In this Section we present results obtained by using our procedure to characterize the STMicroelectronics 0.18µm library optimized for high-speed performance. As a test case we considered a small circuit containing 6 gates: two ANDs, three ORs with different driving capabilities, and one EXOR. We analyzed only one primary input transition, but the input transition times and arrival times of the primary inputs are chosen in order to generate on the internal nodes all the particularly critical cases mentioned in the previous library characterization section. We compare in Fig. 2 two waveforms representing the current injected into Gnd/Vdd: the solid curve is obtained by circuit-level simulation, while the dashed one is derived by using exclusively library characterization information. In particular, we used gate delays from the library to determine the switching instants of each gate. The corresponding current injection waveforms from the library are then positioned accordingly and added together to obtain the total current injection. The case shown in Fig. 2.a includes: 2 (MIS,SO) transitions with  $\Delta < \Delta_{TH}$ , 2 (MIS,SO) with  $\Delta \ge \Delta_{TH}$ , 4 (MIS,NSO) with  $\Delta \ge \Delta_{TH}$ . Glitches are also present. Fig. 2.b compares the spectra of the two curves, obtained by using a Fast Fourier Transform. As mentioned in the previous Section, simultaneous MIS transitions correspond to the case  $\Delta < \Delta_{TH}$ , and we propose to model such cases assuming for simplicity  $\Delta = 0$ . Fig. 2.c and Fig. 2.d show that good results are obtained even when our algorithm uses  $\Delta = 0$  to model one of the simultaneous MIS transitions with  $\Delta \approx \Delta_{TH}$ . The complete set of transitions in Fig. 2.c and Fig. 2.d includes: 2 (MIS,SO) transitions with  $\Delta < \Delta_{TH}$ , 1 (MIS,NSO) with  $\Delta < \Delta_{TH}$ , 1 (SIS,NSO), and 1 (SIS,SO).

Figure 2: a) and c) compare (for two different input transitions) current injection into Vdd/Gnd according to a circuit simulation (solid) and according to a reconstruction from the characterized library (dashed). b) and d) show the spectra of the two curves in a) and c), respectively.

As an additional remark, while performing the tests in this Section we observed that the propagation delays of corresponding MIS and SIS transitions differ, as claimed in [2]. We observed that this difference can be particularly critical when modeling injected noise currents. Hence, when propagation delays are used in a current injection estimation algorithm, a library characterized for timing in the classical way may not be appropriate; a timing model that distinguishes between MIS and SIS transitions should be used instead.

#### 4. UPPER BOUND ESTIMATION

In this section we describe our algorithm for the estimation of a Noise Current Spectrum Upper Bound (NISUB). First, to simplify the presentation, we describe the algorithm neglecting glitches. Subsection 4.2 explains how to modify the algorithm to include glitches, obtaining a heuristic estimate of the upper bound, while the computational complexity and a heuristic for speed improvement are analyzed in Subsection 4.3. In the last subsection, we discuss the experimental results and introduce more heuristics for improving the algorithm accuracy.

#### 4.1. Computing NISUB without Glitches

**The Composite Table** Before describing the algorithm, we need to introduce the notion of *Composite Table* of a node z ( $CT_z$ ). The *Composite Table* has the same structure as the *Instantiated Base Table*, i.e., each row j corresponds to an input transition vector. For each transition vector the following data are included in  $CT_z$ :

- The maximum current at frequency  $f_k$  of node z:  $I_z^j$
- The arrival time:  $T_{A_z}^j$
- The transition time:  $T_{T_z}^j$

Considering a gate G with output z, it is important to keep in mind that, although  $IBT_G$  and  $CT_z$  have basically the same structure, there is a crucial difference between them: the values reported in  $IBT_G$  are related to the single gate G, while those in  $CT_z$  are related to the entire transitive fanin network of node z.

The same equivalence-classes defined in Section 3 for the  $IBT_G$  of a gate G can be used for the  $CT_z$  of a node z. We recall here that an equivalence-class  $E_z^b$ ,  $b \in B$ , for node z is the set of all rows of  $CT_z$  such that the output transition is equal to b.

Observe that the case of no glitches corresponds to  $q_k = 1$  for each input k of each gate, in the formal model given in Section 2.3.

**The Algorithm** The recursive algorithm for estimating the *NISUB* of a combinational circuit has the following key properties:

- The recursion step processes one and only one gate
- Each gate is processed just once
- A gate is processed only when all its inputs have already been processed
- To process a gate means to calculate its *Composite Table*
- It is applied for each frequency f = k ⋅ f<sub>0</sub>, as defined in Subsection 2.2. The composition of all the resulting values gives a *Maximum Current Spectrum Envelope (MCSE)* for the circuit.

The key idea is that each row of  $CT_z$  is associated with a different input transition vector and includes, for that input transition vector, the upper bound on the current injected by the transitive fanin up to node z. Hence, when the algorithm has processed all gates and terminates, the upper bound for the entire circuit is obtained by simply picking, among the rows of the composite table of the primary output, the one with the largest current.

The pseudo-code for the algorithm is reported in Figure 3. Some comments and explanations on the algorithm:

1. The *Composite Table* of a gate is generated from the *Composite Tables* of its inputs and its *Instantiated Base Table*.
2. For the primary inputs, the composite table is given and represents the constraints we mentioned in Subsection 2.4. Formally, for uniformity of notation, each primary input may be considered as the output of a dummy buffer.

Legend

- PI = Primary Input of the circuit
- PO = Primary Output of the circuit
- $CT_z^j$  = row *j* of the Composite Table at node *z*
- fanin\_set(z) = set of all the fanin nodes of node z
- *z*.status: if set to VISITED, the  $CT_z$  has already been calculated

```
SEARCH_UB(PO);                   // top-level call on the primary output

SEARCH_UB(z) {
  if ( (z = PI) OR (z.status = VISITED) )
    return;
  else
    foreach FI ∈ fanin_set(z)
      SEARCH_UB(FI);
    end foreach
    CALC_CT(z);
    z.status ← VISITED;
    return;
  end if
}

CALC_CT(z) {
  foreach row j in CT_z
    foreach input k
      E_jk ← EXTRACT_CLASS(j, k);
    end foreach
    E_j ← E_j1 × ... × E_jn;
    e_MAX^j ← argmax_{e ∈ E_j} CALC_ROW_CURRENT(e, j);
    CT_z^j ← CALC_ROW(e_MAX^j, j);
  end foreach
}

EXTRACT_CLASS(j, k) {
  // extracts from the CT of input k the equivalence-class
  // matching the transition of input k in row j
}

CALC_ROW_CURRENT(e, j) {
  // calculates the current of a prospective row j of the CT of node z
  // from the input rows specified by e
}

CALC_ROW(e, j) {
  // calculates a prospective row j of the CT of node z
  // from the input rows specified by e
}
```

Figure 3: Algorithm pseudo-code.

3. Each recursion step builds the *Composite Table* of a node *z*. In particular, the *Composite Table* is built row by row. For each row, there is a local search for the maximum: among all the possible combinations in the Cartesian product of the equivalence-classes of the inputs, only the one giving the maximum current is chosen. It is worth noticing that the reduction in complexity comes exactly from this step.
4. For the cases  $v_i = 00$  or  $v_i = 11$ , we assign the symbolic value VOID to  $T_{Tz}$  and  $T_{Az}$ .
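To make the recursion concrete, the following Python sketch implements SEARCH_UB and CALC_CT for circuits of 2-input gates at a single harmonic, without glitches and without the reconvergent-fanout correction. Everything about the gate model is a hypothetical stand-in for a characterized library: `base_row` invents a current of 0.1 per switching input plus 1.0 when the output switches, every gate has a 30 ps delay, and each gate current is placed at the arrival of its latest switching input. Composite Table rows are tuples (output transition, current phasor, arrival time), with `None` playing the role of VOID. `exhaustive` enumerates all $2^{2p}$ primary input vectors for comparison.

```python
import cmath
import math
from itertools import product

B = ("00", "01", "10", "11")  # signal transition: initial value, final value
LOGIC = {"AND": lambda a, b: a & b, "OR": lambda a, b: a | b}
DELAY = 30e-12                # hypothetical gate delay (s)


def base_row(kind, va, vb):
    """Toy stand-in for a Base Table row: (output transition, I_G, delay)."""
    g = LOGIC[kind]
    out = f"{g(int(va[0]), int(vb[0]))}{g(int(va[1]), int(vb[1]))}"
    n_sw = (va[0] != va[1]) + (vb[0] != vb[1])
    current = 0.1 * n_sw + (1.0 if out[0] != out[1] else 0.0)  # NSO events inject too
    return out, current, DELAY


def place(kind, va, ta, vb, tb, f):
    """Gate current shifted to its latest switching input, plus output arrival."""
    out, i_g, delay = base_row(kind, va, vb)
    t_in = max([t for t in (ta, tb) if t is not None], default=0.0)
    t_out = t_in + delay if out[0] != out[1] else None  # VOID when z does not switch
    return out, i_g * cmath.exp(-2j * math.pi * f * t_in), t_out


def calc_ct(kind, ct_a, ct_b, f):
    """CALC_CT: one row per input transition vector, keeping the max-|I| combination."""
    ct = []
    for va, vb in product(B, B):
        best = None
        for ra, rb in product([r for r in ct_a if r[0] == va],    # EXTRACT_CLASS(j, a)
                              [r for r in ct_b if r[0] == vb]):   # EXTRACT_CLASS(j, b)
            out, i_g, t_out = place(kind, va, ra[2], vb, rb[2], f)
            row = (out, ra[1] + rb[1] + i_g, t_out)
            if best is None or abs(row[1]) > abs(best[1]):
                best = row
        if best is not None:
            ct.append(best)
    return ct


def nisub(circuit, pis, po, f):
    """SEARCH_UB from the primary output; PIs act as dummy buffers with zero current."""
    cts = {pi: [(v, 0j, 0.0 if v[0] != v[1] else None) for v in B] for pi in pis}

    def search(z):
        if z not in cts:  # not a PI and not yet VISITED
            kind, a, b = circuit[z]
            cts[z] = calc_ct(kind, search(a), search(b), f)
        return cts[z]

    return max(abs(row[1]) for row in search(po))


def exhaustive(circuit, pis, f):
    """Reference: enumerate all 4^p primary input transition vectors."""
    best = 0.0
    for vec in product(B, repeat=len(pis)):
        sig = {pi: (v, 0.0 if v[0] != v[1] else None) for pi, v in zip(pis, vec)}
        total = 0j
        for z, (kind, a, b) in circuit.items():  # dict assumed in topological order
            out, i_g, t_out = place(kind, *sig[a], *sig[b], f)
            total += i_g
            sig[z] = (out, t_out)
        best = max(best, abs(total))
    return best


# Fanout-free example: out = AND(AND(a, b), OR(c, d)); p = 4, so 256 vectors.
CIRCUIT = {"n1": ("AND", "a", "b"), "n2": ("OR", "c", "d"), "out": ("AND", "n1", "n2")}
PIS = ("a", "b", "c", "d")
print(round(nisub(CIRCUIT, PIS, "out", 0.0), 6), round(exhaustive(CIRCUIT, PIS, 0.0), 6))  # 3.6 3.6
print(nisub(CIRCUIT, PIS, "out", 1e9), exhaustive(CIRCUIT, PIS, 1e9))
```

In this fanout-free example at $f = 0$ all currents are real and non-negative, so keeping only the largest row of each equivalence class loses nothing and the two results coincide. At other harmonics the phasors of different gates do not add in magnitude, and the local choice becomes the heuristic discussed in the Remarks below; in a fanout-free circuit every retained row still corresponds to an actual primary input vector, so in this sketch the result cannot exceed the exhaustive maximum.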

**Reconvergent Fanout** The algorithm sketched in Figure 3 can introduce a large error if the circuit presents reconvergent fanout. In fact, the current of a node z with fanout  $fo_z$  is counted  $fo_z$  times in the total circuit current.

It is actually correct to include these multiple counts, to prevent certain nodes from having an improperly low weight during the algorithm selection. However, the final total current has to be adjusted to remove the multiple contributions.

One way to do so is to store the  $CT_z$  of a node z if  $fo_z > 1$  (otherwise it is discarded after use). From these we can derive the values to be subtracted from the circuit current  $I_C$  to obtain the corrected value  $I_{C,corr}$ :

$$I_{C,corr} = I_C - \sum_{k=1}^N I_k \cdot (fo_k - 1)$$

where N is the number of gates in the circuit, $I_k$ is the current of node k, and $fo_k$ is its fanout.
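A small worked example of the correction, written as a Python sketch with made-up currents: node n, whose transitive fanin current is 2.0, drives gates g1 and g2, which reconverge at the output gate o. Summing the two fanin cones of o counts n twice, and the correction removes the extra copy. Real-valued currents are used for readability; the same arithmetic applies to complex phasors. The `fo_k > 1` filter is an assumption of this sketch that also keeps nodes with zero fanout (such as the primary output) from being added back.

```python
def corrected_current(i_c, node_current, fanout):
    """I_C,corr = I_C - sum_k I_k * (fo_k - 1), over nodes with fo_k > 1."""
    return i_c - sum(i * (fanout[k] - 1) for k, i in node_current.items() if fanout[k] > 1)


gate = {"n": 2.0, "g1": 1.0, "g2": 1.5, "o": 0.5}   # hypothetical single-gate currents
cone_n = gate["n"]                                  # I_n: n has no fanin gates here
i_c = (cone_n + gate["g1"]) + (cone_n + gate["g2"]) + gate["o"]   # n counted twice
print(i_c, corrected_current(i_c, {"n": cone_n}, {"n": 2}))     # 7.0 5.0
```

The corrected value 5.0 equals the plain sum of the four gate currents, which is what the circuit actually injects in this example.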

**Remarks** The logic space exploration is complete according to the definition of equivalence-class we have given, but the choice of the representative element of a class is performed using as cost function only the maximum current for the node under analysis. This choice does not necessarily imply maximum current for the following nodes, because the current of the gate in the next step also depends on the combination with the arrival times of its other inputs. Nevertheless, the effect of this error should not be significant, given that our approach explores the entire primary input transition vector space. Most of the previous approaches [3, 7, 1, 8] use a much simpler model for the gate current and rely on stronger assumptions: e.g., the current injection of a gate is considered only if the output node switches, and the dependence of the current, the propagation delay, and the output transition time on the input transition time is neglected. Furthermore, the logic correlation inside the circuit is neglected or is at most considered only between each pair of gates.

#### 4.2. Computing NISUB with Glitches

The generic formulation in Section 2.3 accounts for glitches using the variables  $q_k$  for each gate input k. Exploiting this formulation, we can easily extend the approach without glitches presented in Subsection 4.1 to include glitches by simply re-defining the equivalence-relation that partitions the input transition vector space. Specifically, a gate input space V can be partitioned into four equivalence classes by the following equivalence-relation: *'Two input transition vectors are in the same equivalence class if their corresponding output transitions have the same initial and final values.'* The same algorithm in 4.1 can now be used to operate on the newly defined classes; only minor changes to the CALC\_ROW function are needed to handle the new classes. Notice also that, having assumed that the *p* primary inputs can switch only once at the beginning of the clock cycle, the cardinality of the input transition vector space is still  $2^{2p}$ .
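Under the glitch-aware relation, only the initial and final values of a transition matter. The Python sketch below (helper names hypothetical; a zero-delay evaluation that ignores gate delays and inertial filtering) applies time-ordered input events to a 2-input AND gate and shows that the output pulse $0 \rightarrow 1 \rightarrow 0$, i.e. $q = 2$ output transitions, falls into the class $00$, the same class as a quiet output.

```python
def output_sequence(logic, initial, events):
    """Zero-delay sketch: successive distinct output values as input events are applied in time order."""
    values = dict(initial)
    seq = [logic(**values)]
    for _, name, value in sorted(events):
        values[name] = value
        z = logic(**values)
        if z != seq[-1]:
            seq.append(z)
    return seq


def transition_class(seq):
    """Equivalence class under the glitch-aware relation: (initial value, final value)."""
    return f"{seq[0]}{seq[-1]}"


def and2(a, b):
    return a & b


# a rises at 10 ps, b falls at 20 ps: the output pulses 0 -> 1 -> 0 (a glitch).
seq = output_sequence(and2, {"a": 0, "b": 1}, [(10, "a", 1), (20, "b", 0)])
print(seq, transition_class(seq))  # [0, 1, 0] 00
```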

#### 4.3. Computational Complexity

Since each gate in the circuit is processed just once, the complexity of graph traversal is O(N), where N is the number of gates in the circuit. The cost of processing each node, i.e. building its *Composite Table*, is basically  $C_{node} = c \cdot C_p$  where:

- $C_p$  is the cost of the CALC\_ROW function
- *c* represents the number of times the CALC\_ROW function is applied to process a node

Therefore, for each clock harmonic, the cost of our algorithm is $C_{imp} = O(N \cdot C_{node}) = O(N \cdot C_p \cdot c)$. Given an *n*-input gate, let the i-th input be the output of an $m_i$-input gate. It can be proven that, independent of the gate logic function, $c = \prod_{i=1}^{n} 2^{2m_i} = 2^{2\sum_{i=1}^{n} m_i}$. Therefore, the value of *c* is bounded by $2^{2n_{max}^2}$, where $n_{max}$ is the maximum number of inputs of a gate (a commonly used value is $n_{max} = 5$).

Since a good CMOS design rule is to minimize the transistor stack size, this worst case (every gate having $n_{max}$ inputs, each driven by an $n_{max}$-input gate) is extremely unlikely.
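As a quick check of the expression for c, the following Python sketch evaluates $c = \prod_{i=1}^{n} 2^{2m_i} = 2^{2\sum_i m_i}$ for a list of fan-in sizes $m_i$, together with the worst-case bound $2^{2n_{max}^2}$. Reading c as this product is an interpretation, chosen because it is the form consistent with the stated bound and with the value $c = 2^{2n}$ given later for the Reduced Composite Table; the function names are hypothetical.

```python
import math


def composite_products(fanin_sizes):
    """c = prod_i 2**(2*m_i): number of CALC_ROW calls for a gate whose i-th
    input is driven by an m_i-input gate (hypothetical helper)."""
    return math.prod(2 ** (2 * m) for m in fanin_sizes)


def worst_case_bound(n_max):
    """Bound 2**(2*n_max**2): n_max inputs, each driven by an n_max-input gate."""
    return 2 ** (2 * n_max * n_max)


# A 3-input gate whose inputs are all driven by 3-input gates.
print(composite_products([3, 3, 3]) == 2 ** 18)  # True
print(worst_case_bound(5) == 2 ** 50)            # True
```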

For a circuit with p primary inputs, an explicit exhaustive search on  $\mathcal{L}_p$  (using for example the methodology in [6]) would have cost  $C_{exp} = N \cdot C_p \cdot 2^{2p}$ .

Regardless of the assumptions dictated by good design rules, the comparison shows that  $C_{imp} < C_{exp}$  for any circuit with p > 25. The comparison becomes even sharper when using a more realistic value for c.
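The threshold p > 25 follows from the worst-case bound on c with $n_{max} = 5$, treating the constants hidden in the $O(\cdot)$ notation as equal to one, as the comparison above implicitly does:

$$
C_{imp} = N \cdot C_p \cdot c \le N \cdot C_p \cdot 2^{2n_{max}^2} = N \cdot C_p \cdot 2^{50},
\qquad
N \cdot C_p \cdot 2^{50} < N \cdot C_p \cdot 2^{2p} = C_{exp} \iff 2p > 50 \iff p > 25 .
$$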

It is worth noting that the use of the equivalence classes of the *Composite Table* is what allows us to explore the space $\mathcal{L}_p$ *implicitly*, and therefore without processing all the $2^{2p}$ input transitions.

Finally, since c differs from gate to gate (it is related to the number of the gate's inputs and to the gate's logic function) and $C_p$ also depends on the number of the gate's inputs, the linear behavior of the runtime is not easily observed in practice. Nevertheless, such linearity can be shown on regular structures composed only of gates with the same logic function and number of inputs.



Figure 4: Results for different versions of the algorithm (circles, squares, stars) compared to exhaustive Spice simulations for a majority circuit.

**Heuristic for speed** Experimental results on the ISCAS-MCNC91 benchmark suite show that, even though c is about $2^{18}$ on average, there are cases for which c may become prohibitively large.

Therefore, to extend the applicability of the algorithm to any kind of circuit, we have introduced a heuristic that dramatically reduces the computational complexity while maintaining accuracy. As previously mentioned, a large value of c may significantly slow down the algorithm. For a gate G, the value of c is determined by the number of inputs of G and by the cardinality of the equivalence classes in the composite tables of its inputs. Since the number of inputs is fixed by the circuit, c can only be reduced by reducing the cardinality of the equivalence classes. In particular, we shrink the Composite Table of a node to a Reduced Composite Table (*RCT*) by keeping a single row as the representative of each equivalence class, so that the cardinality of each equivalence class is always one. For an n-input gate, it can then be proven that, independent of the gate logic function, $c = 2^{2n}$.

Clearly, instead of a single row for each equivalence class, one might decide to store two or three rows: slight variations of this heuristic can be used to trade off between speed and accuracy.
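A minimal Python sketch of the Reduced Composite Table, under stated assumptions: each row is a hypothetical pair (equivalence class, complex current phasor at one harmonic); the row kept for each class is the one with the largest current magnitude, a selection criterion assumed here rather than taken from the text; and each table has four equivalence classes. The `keep` parameter mirrors the variant that stores two or three rows per class. Under the same assumptions, with `keep=1` every input table has four rows, so an n-input gate needs $4^n = 2^{2n}$ CALC\_ROW calls, matching the value of c given above.

```python
from collections import defaultdict

# Hypothetical row format: (equivalence_class, phasor), where equivalence_class is
# the (initial, final) output value and phasor is the complex current at one harmonic.


def reduce_composite_table(rows, keep=1):
    """Keep at most `keep` rows per equivalence class. Assumed criterion:
    largest current magnitude at the harmonic under analysis."""
    by_class = defaultdict(list)
    for cls, phasor in rows:
        by_class[cls].append((cls, phasor))
    reduced = []
    for cls in sorted(by_class):
        ranked = sorted(by_class[cls], key=lambda row: abs(row[1]), reverse=True)
        reduced.extend(ranked[:keep])
    return reduced


def rct_products(n_inputs, keep=1, n_classes=4):
    """CALC_ROW calls for an n-input gate when each input table holds
    n_classes * keep rows; keep=1 gives c = 2**(2n)."""
    return (n_classes * keep) ** n_inputs


table = [((0, 1), 2 + 1j), ((0, 1), 0.5j), ((1, 0), -3 + 0j), ((1, 1), 0j)]
print(reduce_composite_table(table))  # one row per class
print(rct_products(4))                # 256 == 2**(2*4)
```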

#### 4.4. Experimental Results

We evaluated the *NISUB* algorithm on several testcases from the ISCAS-MCNC91 benchmark suite mapped onto the STMicroelectronics standard cell library.

Figure 4 reports the normalized Vdd/Gnd noise current spectrum of a majority circuit for frequencies up to the 10th clock harmonic. The experimental results reported in this paper have been generated for the Vdd/Gnd noise current, but the same algorithm can be used to estimate the upper bound for the substrate noise.

To validate the algorithm, we ran exhaustive Spice simulations (for all possible input transition vectors): the shaded area in Figure 4 represents the envelope of this set of simulations for a majority circuit. In this figure, squares correspond to the *NISUB* algorithm, while circles are obtained with the Reduced Composite Table version (*NISUB\_RCT*).

These results confirm the validity of the *NISUB* algorithm and of the heuristic we introduced for speed.

Furthermore, results from a number of other testcases show extremely good agreement between these two versions and motivate the use of the faster *NISUB\_RCT*.

Table 2 reports runtimes for the *NISUB\_RCT* algorithm and for exhaustive Spice simulation: these results confirm the theoretical complexity analysis reported above. As can be seen from the table, the speedup ranges from 7 to 28 for circuits with four or five primary inputs and exceeds three orders of magnitude (1618) for 9symml, which has nine primary inputs. To get a rough idea of the speedup to expect in general, consider a circuit with 50 4-input gates and 7 primary inputs. Based on experimental data<sup>3</sup>, we can roughly predict a runtime of 91 hours for exhaustive Spice simulation and 8.5 minutes for the *NISUB\_RCT* algorithm, i.e. a speedup of about 640.
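The arithmetic behind this estimate, using the figures given in footnote 3, can be reproduced in a few lines of Python (variable names are illustrative):

```python
# Figures taken from footnote 3.
vectors = 2 ** (2 * 7)           # 7 primary inputs -> 2**14 input transition vectors
spice_hours = vectors / 180      # 180 Spice simulations per hour
k, N, n, C_p = 20, 50, 4, 0.002  # harmonics, gates, inputs per gate, seconds per CALC_ROW
rct_seconds = k * N * 2 ** (2 * n) * C_p  # NISUB_RCT: c = 2**(2n) per gate
speedup = spice_hours * 3600 / rct_seconds
print(f"Spice: {spice_hours:.0f} h, NISUB_RCT: {rct_seconds / 60:.1f} min, speedup ~ {speedup:.0f}x")
# prints: Spice: 91 h, NISUB_RCT: 8.5 min, speedup ~ 640x
```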

Figures 5.a and 5.c show good agreement between the *NISUB\_RCT* algorithm (circles) and exhaustive Spice simulation (shaded area) up to the 15th and 18th clock harmonic for circuits cm82a and cm42a, respectively. Figure 5.b also shows an accurate upper bound estimate (circles) with respect to a large set of Spice simulations (shaded area) for circuit 9symml.

**Heuristic for a Tighter Bound** The accuracy desired for the upper bound estimate may depend on the target application: some problems may require a conservative approach, while others could benefit from a tighter estimate.

Although experimental results show the *NISUB\_RCT* algorithm to be quite accurate, the algorithm can still be improved to obtain a tighter estimate. As mentioned in the introduction, our algorithm, which accounts for logic correlation at the path level, is a step forward with respect to previous approaches that only consider correlation at gate level. Nonetheless, overestimation can be reduced further by accounting for logic correlation at circuit level. This contribution may become especially significant for multiple-output circuits.

Therefore, we developed a new implementation of the algorithm that accounts for correlation among gate inputs (*NISUB\_RCT\_GIC*).



Figure 5: Normalized noise current spectrum for 20 harmonics of the clock for circuits cm82a (a), 9symml (b) and cm42a (c). Circles, stars and squares represent the upper bound estimation obtained by using the *NISUB\_RCT*, *NISUB\_RCT\_GIC* and *NISUB\_RCT\_GIC\_NP* algorithms respectively.

Figures 4, 5.a and 5.b show that *NISUB\_RCT\_GIC* (stars) actually gives a tighter upper bound than *NISUB\_RCT* (circles).

Furthermore, Table 2 shows that this heuristic for a tighter bound may also result in better runtimes. This is because accounting for logic correlation at a gate's inputs may actually reduce c, i.e. the number of products we need to perform.

Clearly, more heuristic techniques can be added to account for other types of logic correlation (e.g. between outputs).

**Remarks on high-frequency behavior** Experimental results have shown that the algorithm we developed (including the different heuristics) gives an accurate estimate of the noise current spectrum upper bound. Nevertheless, Figures 5.a and 5.c show that the accuracy of the estimate may worsen at high frequencies for some circuits (e.g. after the 15th harmonic for cm82a and after the 18th harmonic for cm42a).

<sup>3</sup>The exhaustive Spice simulation needs to run $2^{14}$ input transition vectors at an estimated rate of 180 simulations per hour. For the algorithm, the runtime is given by $k \cdot N \cdot 2^{2n} \cdot C_p$, where k = 20 (the number of harmonics of interest), N = 50, n = 4, and $C_p$ has been bounded at approximately 0.002 s.

| Circuit (# primary inputs) | *NISUB\_RCT* | *NISUB\_RCT\_GIC* | Exhaustive Spice | Speedup (Spice / *NISUB\_RCT\_GIC*) |
|----------------------------|--------------|-------------------|------------------|-------------------------------------|
| cm42a (4)                  | 8 min        | 12 min            | 1.4 hours        | 7                                   |
| cm82a (5)                  | 27 min       | 23 min            | 5.7 hours        | 15                                  |
| majority (5)               | 12 min       | 12 min            | 5.7 hours        | 28                                  |
| 9symml (9)                 | 83 min       | 54 min            | 60.7 days        | 1618                                |

Table 2: Comparison of approximate runtimes for *NISUB\_RCT*, *NISUB\_RCT\_GIC* and exhaustive Spice simulation. The speedup is the ratio of the Spice runtime to the *NISUB\_RCT\_GIC* runtime.

This underestimation is related to the fact that the sensitivity of the current phase increases with frequency. An error $\Delta t$ in the delay evaluation results in an error $\Delta \phi$ in the phase evaluation according to the relationship $\Delta \phi = 2\pi f_k \Delta t$, where $f_k = k f$ is the frequency of the k-th clock harmonic. Given the same delay error $\Delta t$, the phase error $\Delta \phi$ is therefore 10 times larger at the 10th harmonic than at the fundamental frequency. A large error in the noise current phase may significantly impact the results, since it affects the way contributions from different nodes are added. The accuracy of the delay calculation can, therefore, limit the accuracy of the noise current spectrum upper bound<sup>4</sup>.
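The delay-to-phase relationship can be evaluated directly. The following Python sketch (the function name `phase_error` is hypothetical) computes $\Delta\phi = 2\pi k f \Delta t$ and reproduces the example of footnote 4: a 750 ps delay error at the fundamental and a 50 ps delay error at the 15th harmonic of a 333 MHz clock both give a phase error of about $\pi/2$.

```python
import math


def phase_error(f_clock, k, delay_error):
    """Phase error (rad) at the k-th clock harmonic: 2*pi*(k*f_clock)*delay_error."""
    return 2 * math.pi * k * f_clock * delay_error


f = 333e6  # clock frequency used in footnote 4
print(phase_error(f, 1, 750e-12) / (math.pi / 2))  # ~1.0: 750 ps -> pi/2 at the fundamental
print(phase_error(f, 15, 50e-12) / (math.pi / 2))  # ~1.0: 50 ps -> pi/2 at the 15th harmonic
```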

Hence, at such high frequencies we may use another heuristic (*NISUB\_RCT\_GIC\_NP*) that avoids these large phase errors: the phase of the noise current spectrum is neglected and only the magnitude is stored in the composite table. In this way, all currents are summed in phase, giving a conservative bound. The comparison of *NISUB\_RCT\_GIC\_NP* against *NISUB\_RCT\_GIC* (shown in Figure 5 as squares and stars, respectively) confirms the observation reported above: the phase error is small at low frequencies and gradually increases at high frequencies.
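Dropping the phase yields a conservative bound because, by the triangle inequality, the magnitude of a sum of phasors never exceeds the sum of their magnitudes. The Python sketch below (hypothetical helper names, illustrative phasor values) contrasts the two ways of combining per-node contributions at one harmonic.

```python
import cmath


def sum_with_phase(phasors):
    """Magnitude of the coherent sum (phases retained, as in NISUB_RCT_GIC)."""
    return abs(sum(phasors))


def sum_without_phase(phasors):
    """All contributions added in phase, as in NISUB_RCT_GIC_NP."""
    return sum(abs(p) for p in phasors)


# Illustrative per-node current phasors (magnitude, phase in rad) at one harmonic.
contributions = [cmath.rect(1.0, 0.0), cmath.rect(0.8, 2.5), cmath.rect(0.5, -1.0)]
print(sum_with_phase(contributions), sum_without_phase(contributions))
```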

Thus, one may decide to use the *NISUB\_RCT\_GIC* algorithm for the first k harmonics and then switch to *NISUB\_RCT\_GIC\_NP*, where k may be evaluated by using the relationship between delay and phase errors.
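One possible way to choose k, sketched in Python under the assumption that a bound $\Delta t$ on the delay error and a maximum tolerable phase error are known: take the largest k for which $2\pi k f \Delta t$ stays within the tolerance. The values used below (333 MHz clock, 50 ps delay error, $\pi/2$ tolerance) are illustrative, borrowed from footnote 4, and the function name is hypothetical.

```python
import math


def last_phase_harmonic(f_clock, delay_error, max_phase_error):
    """Largest harmonic k with 2*pi*k*f_clock*delay_error <= max_phase_error."""
    return math.floor(max_phase_error / (2 * math.pi * f_clock * delay_error))


k_switch = last_phase_harmonic(333e6, 50e-12, math.pi / 2)
print(k_switch)  # 15

# Keep phases up to k_switch, then fall back to the magnitude-only bound.
methods = ["NISUB_RCT_GIC" if k <= k_switch else "NISUB_RCT_GIC_NP" for k in range(1, 21)]
```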

Furthermore, the error may also derive from other factors; for example, the noise current spectrum at high frequencies may be more sensitive to the different input slopes, and the approximation used in the library characterization may need to be refined.

#### 5. CONCLUSIONS

A methodology has been presented for the characterization of the noise current spectrum injected by CMOS switching gates into the Gnd/Vdd system or into the substrate of integrated circuits. Specifically, we have described a procedure to estimate an upper bound for such noise current spectrum with respect to all possible transition vectors at the circuit primary inputs. Our algorithm has linear complexity in the number of gates and has been shown to provide a significant computational advantage with respect to an exhaustive exploration of the input space. Furthermore, we have developed several heuristics to improve the speed and accuracy of the base algorithm; the particular application determines which heuristics to use to trade off between a conservative and an accurate estimate. Experimental results have proven the accuracy of the algorithm and have also shown a large speed improvement with respect to the exhaustive Spice simulation that would otherwise be needed to guarantee a conservative noise current estimate. A procedure has also been presented for characterizing the switching current injection of CMOS standard cell libraries, which we use in the upper bound estimation algorithm. Our model captures important special cases, such as Non-Switching-Output and Multiple-Input-Switching events, which are typically neglected in classical library characterization procedures.

## 6. ACKNOWLEDGMENTS

The authors would like to thank Claudio Pinello for useful discussions and suggestions on the formalization of the problem.

## 7. REFERENCES

- [1] S. Bobba and I. N. Hajj. Estimation of maximum current envelope for power bus analysis and design. In *Proceedings of International Symposium on Physical Design*, pages 141–146, April 1998.
- [2] V. Chandramouli and K. A. Sakallah. Modeling the effects of temporal proximity of input transitions on gate propagation delay and transition time. In *Design Automation Conference*, pages 617–622, June 1996.
- [3] S. Chowdhury and J. S. Barkatullah. Estimation of maximum currents in MOS IC logic circuits. *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, 9(6):642–654, June 1990.
- [4] A. Demir and P. Feldmann. Modeling and simulation of the interference due to digital switching in mixed-signal ICs. In *Proceedings of International Conference on Computer-Aided Design*, pages 70–74, November 1999.
- [5] S. Devadas, K. Keutzer, and J. White. Estimation of power dissipation in CMOS combinational circuits using boolean function manipulation. *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, 11(3):373–383, March 1992.
- [6] E. Charbon, R. Gharpurey, P. Miliozzi, R. G. Meyer, and A. Sangiovanni-Vincentelli. *Substrate Noise: Analysis and Optimization for IC Design*. Kluwer Academic Publishers, 2001.
- [7] H. Kriplani, F. N. Najm, and I. N. Hajj. Pattern independent maximum current estimation in power and ground buses of CMOS VLSI circuits: Algorithms, signal correlations, and their resolution. *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, 14(8):998–1012, August 1995.
- [8] A. Krstic and K. Cheng. Vector generation for maximum instantaneous current through supply lines for CMOS circuits. In *Proceedings of Design Automation Conference*, pages 383–388, June 1997.
- [9] S. Lin and N. Chang. Challenges in power-ground integrity. In *IEEE/ACM International Conference on Computer-Aided Design*, November 2001.
- [10] A. Odabasioglu, M. Celik, and L. T. Pileggi. PRIMA: passive reduced-order interconnect macromodeling algorithm. *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, 17(8):645–654, August 1998.

<sup>4</sup>For example, consider a circuit whose clock frequency is f = 333 MHz. A phase error of $\frac{\pi}{2}$ results from a 750 ps delay calculation error at the clock frequency, but from only a 50 ps delay calculation error at the 15th clock harmonic.