## **Estimation of Maximum Power-up Current**\*

Fei Li, ECE Department, University of Wisconsin, Madison, WI 53706, USA, feil@ece.wisc.edu

Lei He, ECE Department, University of Wisconsin, Madison, WI 53706, USA, lhe@ece.wisc.edu

Kewal K. Saluja, ECE Department, University of Wisconsin, Madison, WI 53706, saluja@engr.wisc.edu

#### Abstract

Power gating is emerging as a viable way to reduce leakage current. However, power-gated circuits differ from conventional designs in that a power-gated circuit must be brought to a valid state from the power-off state, in which all nodes in the circuit are at logic zero, before useful computation can begin. Thus, estimating the maximum current in a power-gated circuit requires determining the maximum over all possible power-up and normal switching currents. In this paper, we propose a cluster-based ATPG algorithm to estimate the maximum power-up current for combinational circuits. Our method achieves substantial improvement over simulation-based methods and also over the previously proposed ATPG-based methods. Further, we formulate the sequential circuit maximum current problem as a combinational ATPG problem, and solve it using the cluster-based estimation algorithm. Experimental results show that the maximum power-up current for sequential circuits can be up to 73% larger than the maximum normal switching current.

## 1. Introduction

Power dissipation has been an important constraint for high-performance microprocessor and portable computing system design. Clock gating is effective at reducing dynamic power but not leakage power, which has become an increasingly large portion of the total power dissipation [9]. Power gating has been introduced to reduce the leakage power [4]. Figure 1 shows a circuit structure employing the power-gating technique, where a PMOS sleep transistor with a high threshold voltage is used to turn on or turn off the Vdd supply to the conventional functional block. Note that an NMOS sleep transistor may be used instead of the PMOS sleep transistor to disconnect the ground from the functional block. Knowledge of the maximum current drawn by the functional block is needed to design the sleep transistor. An undersized sleep transistor severely degrades the circuit performance and reliability. An oversized sleep transistor incurs extra area and leads to a long turn-on time when powering up the functional block. Maximum current is also related to Vdd voltage drop, ground bounce, electromigration and $L\,di/dt$ noise in P/G wires.



Figure 1. Circuit structure using power gating.

Clearly, the maximum current is the larger of the normal switching current and the power-up current. A review of maximum switching current estimation can be found in [6]. The power-up current occurs when the circuit is woken up from the power-off state. For a combinational circuit, the power-up current depends on only one input vector, while the switching current depends on two consecutive vectors [6]. Further, two ATPG-based algorithms were proposed in [6] to estimate the maximum power-up current for combinational circuits.

In this paper, we propose a new cluster-based ATPG algorithm, and solve the estimation problem for power-up current of sequential circuits. The remainder of the paper is organized as follows. Section 2 first gives a review of the power-up current estimation, and then presents our cluster-based ATPG algorithm *Imax/Gain-cluster* for combinational circuits. Section 3 formulates and solves the estimation problem for sequential circuits. Section 4 concludes this paper. A full version of this paper is available as a technical report [7] at http://eda.ece.wisc.edu/.

<sup>\*</sup>This research is partially supported by SRC grant 2000-HJ-782, and a grant from Intel. We used computers donated by SUN Microsystems. Address comments to lhe@ece.wisc.edu.

## 2. Cluster-based ATPG Algorithm for Combinational Circuits

#### 2.1. Review and Motivation

The maximum power-up current has been studied in [6] for combinational circuits. Because all the internal nodes of the circuit using a PMOS sleep transistor are fully discharged during sleep mode, power-up current only depends on the first input vector after the circuit is woken up. Under the assumption that the power-up current is proportional to the total charge that needs to be recovered after wake-up<sup>1</sup>, the power-up current can be represented as follows:

$$P_{i} = \sum_{\text{for all gates}} VAL(g) \cdot C(g) \cdot V_{dd}$$
(1)

where $C(g)$ is the load capacitance of gate $g$, $VAL(g)$ is the logic value of the gate output, and $V_{dd}$ is the supply voltage. If the gate output is logic "1", a charge of $C(g) \cdot V_{dd}$ is stored in the load capacitance. Assuming that all gates have the same input capacitance and ignoring the wire capacitance, $C(g)$ is proportional to the number of fanouts of $g$; dropping the constant factors, equation (1) reduces to

$$P_{i} = \sum_{\text{for all gates}} VAL(g) \cdot F_{out}(g)$$
(2)

where  $F_{out}(g)$  is the number of fanouts for gate g.
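The metric in equation (2) can be evaluated directly for a given input vector. The following is a minimal sketch, assuming a hypothetical dictionary-based netlist format and a small illustrative circuit (neither is from the paper); by default it counts only gate-to-gate fanouts, and counting each primary output as one extra load is offered as an optional assumption.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical netlist format: gate name -> (gate type, fanin names).
# Any fanin name that is not a key of the dict is a primary input.
Netlist = Dict[str, Tuple[str, List[str]]]


def evaluate(netlist: Netlist, inputs: Dict[str, int]) -> Dict[str, int]:
    """Return the logic value of every signal for one input vector."""
    values = dict(inputs)

    def value_of(name: str) -> int:
        if name not in values:
            kind, fanins = netlist[name]
            ins = [value_of(f) for f in fanins]
            if kind in ("AND", "NAND"):
                out = int(all(ins))
            elif kind in ("OR", "NOR"):
                out = int(any(ins))
            elif kind == "NOT":
                out = ins[0]
            else:
                raise ValueError(f"unsupported gate type: {kind}")
            if kind in ("NAND", "NOR", "NOT"):
                out = 1 - out  # inverting gate types
            values[name] = out
        return values[name]

    for gate in netlist:
        value_of(gate)
    return values


def fanout_counts(netlist: Netlist, primary_outputs: Iterable[str] = ()) -> Dict[str, int]:
    """F_out(g): number of gate inputs driven by g (plus one per listed PO, an assumption)."""
    counts = {g: 0 for g in netlist}
    for _, fanins in netlist.values():
        for f in fanins:
            if f in counts:
                counts[f] += 1
    for po in primary_outputs:
        counts[po] += 1
    return counts


def power_up_metric(netlist: Netlist, inputs: Dict[str, int],
                    primary_outputs: Iterable[str] = ()) -> int:
    """P_i of equation (2): sum over all gates of VAL(g) * F_out(g)."""
    values = evaluate(netlist, inputs)
    fout = fanout_counts(netlist, primary_outputs)
    return sum(values[g] * fout[g] for g in netlist)


# Illustrative four-gate circuit (not from the paper).
EXAMPLE: Netlist = {
    "n1": ("NAND", ["a", "b"]),
    "n2": ("NOT", ["n1"]),
    "n3": ("OR", ["n1", "c"]),
    "n4": ("AND", ["n2", "n3"]),
}

print(power_up_metric(EXAMPLE, {"a": 0, "b": 0, "c": 0}))  # 3
```

With all inputs at 0, gates `n1` and `n3` are at logic "1" and contribute their fanouts (2 and 1), so the metric is 3.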

Two estimation algorithms based on ATPG were proposed in [6]: fanout-based algorithm *Imax/Fanout* and gain-based algorithm *Imax/Gain*. Both algorithms are gate-level ATPG algorithms, which assign logic value "1" to the gate outputs in a greedy fashion to maximize  $P_i$ , the metric for power-up current.

In this paper, we also use $P_i$ as the metric for power-up current. We speculate that clustering may improve the quality of the estimation results. Our speculation is motivated by the following well-known result [1]: the outputs of a fanout-free sub-circuit (FFSC) can be justified independently without logic conflicts within the FFSC. Therefore, to find the maximum power-up current for an FFSC, we only need to enumerate the two possible assignments for each primary output and justify all the outputs without any logic conflict. If there are multiple justifications for one assignment, the justification producing the largest power-up current leads to the optimal solution. Compared to the *Imax/Fanout-gate* and *Imax/Gain-gate* algorithms, a cluster-based algorithm processes gates in a different order and may lead to a better estimate. This motivates us to cluster the circuit and perform the ATPG algorithm at the cluster level. Our speculation that clustering improves the estimate is verified by the experimental results in Section 2.4.

#### 2.2. Clustering Based on Maximum Fanout Free Cone

Because the FFSC-based clustering can only reduce the size of the netlist to a limited degree, we use the Maximum Fanout Free Cone to cluster the circuit. Fanout Free Cone (FFC) and Maximum Fanout Free Cone (MFFC) have been used for FPGA technology mapping in [2, 3], and they are defined as follows:

**Definition 1 (FFC & MFFC)** For a given node v in a combinational circuit,

$$FFC_v = \{\, u \mid \text{every path from } u \text{ to some PO passes through } v \text{ in the circuit} \,\}$$

$$MFFC_v = \{\, u \mid \text{for all } FFC_v,\ u \in FFC_v \,\}$$



Figure 2. Maximum fanout free cones.

In Figure 2, we show the MFFCs rooted at nodes $u$, $v$ and $w$ in the circuit. $MFFC_u$, $MFFC_v$ and $MFFC_w$ form a disjoint partition of the circuit. We use the algorithm in [3] to construct the MFFCs and cluster the whole circuit. In general, an MFFC may not be an FFSC, because reconvergence is allowed within an MFFC but not within an FFSC. For example, in Figure 2, $MFFC_v$ is an FFSC, but $MFFC_u$ is not.
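To make the clustering concrete, here is a minimal sketch of one common way to build an MFFC and partition a circuit into disjoint MFFCs. It is not the algorithm of [3]. A cone is grown from its root by absorbing any gate whose fanouts all lie inside the cone, and the inputs of each finished cone become new roots. The netlist format, the function names and the example circuit are hypothetical, and primary inputs are left out of the cones as a simplification.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical format: gate name -> list of fanin names.
# Names that are not keys of the dict are primary inputs.
Fanins = Dict[str, List[str]]


def build_fanouts(fanins: Fanins) -> Dict[str, List[str]]:
    """Invert the fanin lists to get, for every gate, the gates it drives."""
    fanouts: Dict[str, List[str]] = {g: [] for g in fanins}
    for gate, ins in fanins.items():
        for u in ins:
            if u in fanouts:
                fanouts[u].append(gate)
    return fanouts


def mffc(fanins: Fanins, fanouts: Dict[str, List[str]], root: str,
         primary_outputs: Set[str]) -> Set[str]:
    """Grow the cone rooted at `root`: absorb a gate once all its fanouts are inside."""
    cone = {root}
    stack = [root]
    while stack:
        node = stack.pop()
        for u in fanins[node]:
            # Skip primary inputs, gates already absorbed, and gates that are
            # themselves primary outputs (they reach a PO without passing the root).
            if u not in fanins or u in cone or u in primary_outputs:
                continue
            if all(w in cone for w in fanouts[u]):
                cone.add(u)
                stack.append(u)
    return cone


def partition_into_mffcs(fanins: Fanins, primary_outputs: Iterable[str]) -> Dict[str, Set[str]]:
    """Return root -> cone, starting from the POs and recursing on cone inputs."""
    pos = set(primary_outputs)
    fanouts = build_fanouts(fanins)
    clusters: Dict[str, Set[str]] = {}
    assigned: Set[str] = set()
    roots = list(primary_outputs)
    while roots:
        root = roots.pop()
        if root in assigned or root not in fanins:
            continue
        cone = mffc(fanins, fanouts, root, pos)
        clusters[root] = cone
        assigned |= cone
        for node in cone:
            roots.extend(u for u in fanins[node] if u not in cone)
    return clusters


# Illustrative circuit: g1 reconverges at g4, and g4 feeds two primary outputs.
EXAMPLE: Fanins = {
    "g1": ["a", "b"],
    "g2": ["g1", "c"],
    "g3": ["g1", "d"],
    "g4": ["g2", "g3"],
    "g5": ["g4", "e"],
    "g6": ["g4", "f"],
}

clusters = partition_into_mffcs(EXAMPLE, ["g5", "g6"])
for root in sorted(clusters):
    print(root, sorted(clusters[root]))
```

In this example the cone rooted at `g4` contains the reconvergent paths through `g2` and `g3`, so it is an MFFC but not an FFSC, while `g5` and `g6` form single-gate cones.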

#### 2.3. Cluster-based ATPG Algorithm

<sup>1</sup>For those circuits using an NMOS sleep transistor, all internal nodes are charged to Vdd during the sleep mode and power-up current is proportional to the total charge that needs to be discharged after wake-up.

| Circuit | (2) $P_i$, Imax/Gain-gate | (3) $P_i$, Imax/Gain-cluster | (4) $P_i$, simulation | (5) $P_s$, simulation | (6) Runtime (s), Imax/Gain-gate | (7) Runtime (s), Imax/Gain-cluster | (8) Runtime (s), simulation |
|---------|------|------|------|------|-------|-------|-------|
| C432    | 223  | 219  | 219  | 207  | 0.18  | 0.95  | 20.1  |
| C499    | 210  | 222  | 230  | 280  | 0.4   | 8.9   | 27.2  |
| C880    | 517  | 517  | 456  | 437  | 0.5   | 6.5   | 46.6  |
| C1355   | 688  | 689  | 680  | 567  | 1.6   | 26.1  | 62.8  |
| C1908   | 882  | 898  | 887  | 911  | 2.8   | 4.1   | 95.5  |
| C2670   | 1366 | 1324 | 1182 | 1228 | 2.87  | 60.7  | 161.3 |
| C3540   | 1545 | 1569 | 1548 | 1623 | 10.13 | 155.4 | 179.2 |
| C5315   | 2668 | 2663 | 2354 | 2575 | 6.5   | 157.6 | 289.5 |
| C6288   | 1799 | 2084 | 2021 | 2539 | 9.6   | 114.9 | 277.6 |
| C7552   | 3524 | 3681 | 3380 | 3547 | 14.7  | 256.4 | 427.1 |

Table 1. Comparison between power-up current and normal switching current for combinational circuits. $P_i$ is the maximum power-up current and $P_s$ is the maximum switching current; runtimes are for the maximum power-up current estimation. The numbers in parentheses are the column numbers referenced in the text.

After the clustering, we have a network of MFFCs. The MFFCs can be treated as supergates and ATPG/gate algorithms can be performed at the MFFC level. We first define the following concept.

**Definition 2 (MFFC Assignment)** The logic value assignment to an MFFC (MFFC Assignment) is to assign the target value to the root gate and also assign logic values to a subset of the MFFC inputs so that the assignment to the root gate can be justified within the MFFC.

Because there may be different input combinations to the MFFC that can achieve the target assignment to the root gate, we use *backtracing* starting from the root gate to choose one "proper" input combination. To avoid logic conflict between the assignments to different MFFCs, *backtracing* for an MFFC further maps the assignment to a set of PIs.

The gain metric for a gate $g$ was defined in [6] as follows:

$$gain(g, v) = (-1)^{(v+1)} \cdot F_{out}(g) + \sum_{h \in IMP} ((-1)^{V(h)+1} \cdot F_{out}(h))$$
(3)

where $v$ is the output value of gate $g$, $IMP$ is the set of all the gates whose output values can be uniquely determined by the implication process of the assignment of gate $g$, $V(h)$ is the output value of gate $h$, and $F_{out}(h)$ is the fanout number of gate $h$.

Moreover, we define the gain for an MFFC as follows:

**Definition 3 (Gain of an MFFC)** *The gain of an MFFC is the gain of its root gate.* 
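Equation (3) and Definition 3 can be illustrated with a small sketch. It assumes a deliberately simple implication engine: forward three-valued evaluation plus the backward rule "a non-controlled output forces all inputs", which is only a subset of what a full ATPG implication procedure would derive. The netlist format, function names, example circuit and fanout table are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical netlist format: gate name -> (gate type, fanin names).
Netlist = Dict[str, Tuple[str, List[str]]]

# Output values that uniquely force every input of the gate to one value.
FORCED_INPUT = {("AND", 1): 1, ("NAND", 0): 1, ("OR", 0): 0,
                ("NOR", 1): 0, ("NOT", 0): 1, ("NOT", 1): 0}


def eval3(kind: str, ins: List[Optional[int]]) -> Optional[int]:
    """Three-valued gate evaluation; None means 'not yet determined'."""
    if kind == "NOT":
        return None if ins[0] is None else 1 - ins[0]
    if kind in ("AND", "NAND"):
        ctrl = 0  # controlling input value
    elif kind in ("OR", "NOR"):
        ctrl = 1
    else:
        raise ValueError(f"unsupported gate type: {kind}")
    if ctrl in ins:
        out = ctrl
    elif None in ins:
        return None
    else:
        out = 1 - ctrl
    return 1 - out if kind in ("NAND", "NOR") else out


def imply(netlist: Netlist, gate: str, value: int) -> Dict[str, int]:
    """Return the gates (other than `gate`) whose outputs are implied by gate=value."""
    known: Dict[str, int] = {gate: value}
    changed = True
    while changed:
        changed = False
        for name, (kind, fanins) in netlist.items():
            out = eval3(kind, [known.get(f) for f in fanins])  # forward implication
            if out is not None:
                if name not in known:
                    known[name] = out
                    changed = True
                elif known[name] != out:
                    raise ValueError(f"logic conflict at {name}")
            forced = FORCED_INPUT.get((kind, known.get(name)))  # backward implication
            if forced is not None:
                for f in fanins:
                    if f not in known:
                        known[f] = forced
                        changed = True
                    elif known[f] != forced:
                        raise ValueError(f"logic conflict at {f}")
    return {n: v for n, v in known.items() if n in netlist and n != gate}


def gain(netlist: Netlist, fout: Dict[str, int], gate: str, value: int) -> int:
    """Equation (3): signed fanout of the gate plus signed fanouts of implied gates."""
    sign = lambda v: 1 if v == 1 else -1  # equals (-1)^(v+1)
    implied = imply(netlist, gate, value)
    return sign(value) * fout[gate] + sum(sign(v) * fout[h] for h, v in implied.items())


def mffc_gain(netlist: Netlist, fout: Dict[str, int], root: str, value: int) -> int:
    """Definition 3: the gain of an MFFC is the gain of its root gate."""
    return gain(netlist, fout, root, value)


# Illustrative circuit and fanout numbers (n4 drives one primary output).
EXAMPLE: Netlist = {
    "n1": ("NAND", ["a", "b"]),
    "n2": ("NOT", ["n1"]),
    "n3": ("OR", ["n1", "c"]),
    "n4": ("AND", ["n2", "n3"]),
}
FOUT = {"n1": 2, "n2": 1, "n3": 1, "n4": 1}

print(gain(EXAMPLE, FOUT, "n1", 1))  # 1
```

Setting `n1` to 1 implies `n2 = 0`, `n3 = 1` and `n4 = 0`, so the gain is 2 - 1 + 1 - 1 = 1: the large fanout of `n1` is partly offset by the gates it forces to logic "0".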

The flow of our cluster-based algorithm *Imax/Gain-cluster* is as follows:

1. Cluster the circuit into disjoint MFFCs.
2. Compute gains for all the MFFCs and sort them in non-increasing order.
3. Iteratively select the MFFC with the largest gain and justify the MFFC assignment using a *justification* mechanism similar to that in [6]. After the assignment of the current MFFC, the gains of the remaining MFFCs are updated.
4. Because it is possible that only part of the MFFC inputs needs to be assigned to complete the MFFC assignment, there may be some non-root gates that are not assigned after all the MFFC assignments are completed. We use the *Imax/Gain-gate* algorithm in [6] to finish the assignment to those gates.
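The root-first control flow of the steps above can be mimicked on a tiny circuit. The sketch below is a toy stand-in, not *Imax/Gain-cluster* itself. It assumes the MFFC roots are given, orders them by root fanout only (the first term of the gain, without implications or gain updates), always targets logic "1", replaces justification and backtracking with exhaustive enumeration of primary-input vectors, and replaces the final *Imax/Gain-gate* pass with picking the best surviving vector. All names and the example circuit are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

Netlist = Dict[str, Tuple[str, List[str]]]  # gate -> (type, fanins); hypothetical format


def evaluate(netlist: Netlist, inputs: Dict[str, int]) -> Dict[str, int]:
    """Return the logic value of every signal for one input vector."""
    values = dict(inputs)

    def value_of(name: str) -> int:
        if name not in values:
            kind, fanins = netlist[name]
            ins = [value_of(f) for f in fanins]
            if kind in ("AND", "NAND"):
                out = int(all(ins))
            elif kind in ("OR", "NOR"):
                out = int(any(ins))
            elif kind == "NOT":
                out = ins[0]
            else:
                raise ValueError(f"unsupported gate type: {kind}")
            values[name] = 1 - out if kind in ("NAND", "NOR", "NOT") else out
        return values[name]

    for gate in netlist:
        value_of(gate)
    return values


def metric(values: Dict[str, int], fout: Dict[str, int]) -> int:
    """P_i of equation (2)."""
    return sum(values[g] * fout[g] for g in fout)


def toy_root_first_estimate(netlist: Netlist, pis: List[str], roots: List[str],
                            fout: Dict[str, int]) -> Tuple[Dict[str, int], int]:
    # Enumerate every PI vector once; only feasible for very small circuits.
    vectors = [dict(zip(pis, bits)) for bits in product((0, 1), repeat=len(pis))]
    sims = [evaluate(netlist, v) for v in vectors]
    alive = list(range(len(vectors)))  # vectors consistent with accepted assignments

    # Steps 2-3 (simplified): visit roots by non-increasing fanout and try root = 1.
    for root in sorted(roots, key=lambda g: -fout[g]):
        justified = [i for i in alive if sims[i][root] == 1]
        if justified:          # the assignment can be justified without conflict
            alive = justified  # otherwise it is dropped and the root stays free

    # Step 4 (simplified): settle the remaining freedom by taking the best survivor.
    best = max(alive, key=lambda i: metric(sims[i], fout))
    return vectors[best], metric(sims[best], fout)


# Illustrative circuit with two primary outputs (n4, n5), each counted as one load.
EXAMPLE: Netlist = {
    "n1": ("NAND", ["a", "b"]),
    "n2": ("NOT", ["n1"]),
    "n3": ("OR", ["n1", "c"]),
    "n4": ("AND", ["n2", "n3"]),
    "n5": ("NOR", ["n3", "d"]),
}
FOUT = {"n1": 2, "n2": 1, "n3": 2, "n4": 1, "n5": 1}
ROOTS = ["n4", "n5", "n3", "n1"]  # MFFC roots of this circuit

vector, estimate = toy_root_first_estimate(EXAMPLE, ["a", "b", "c", "d"], ROOTS, FOUT)
print(vector, estimate)
```

On this circuit the two high-fanout roots `n3` and `n1` are set to 1 first; the later requests `n4 = 1` and `n5 = 1` conflict with them and are dropped, which is the kind of situation the real algorithm handles with justification and backtracking.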

The above algorithm assigns the root gate of an MFFC earlier than any other gate in the MFFC. This is motivated by the analogy between the MFFC root gate and the FFSC output. Moreover, because an MFFC allows reconvergence, logic conflicts may occur during justification even within an MFFC. Therefore, the justification in the above algorithm includes *backtracking* to resolve conflicts within an MFFC and between MFFCs.

#### 2.4. Experimental Results

We have implemented the cluster-based ATPG algorithm in C on a Sun Enterprise 220R server. In order to evaluate the quality of our ATPG-based algorithm, we also implemented a simulation-based algorithm, which simulates each circuit with 5000 uniformly distributed input vectors. Both power-up current and normal switching current are monitored during the simulation. We have tested all the algorithms using ISCAS'85 benchmarks and report the results in Table 1.
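The simulation-based baseline amounts to random sampling of input vectors while tracking the largest $P_i$ seen. A minimal sketch follows, assuming a hypothetical dictionary netlist, a fixed random seed for repeatability, and a tiny illustrative circuit; monitoring of the switching current is omitted because its metric is not defined in this section.

```python
import random
from typing import Dict, List, Optional, Tuple

Netlist = Dict[str, Tuple[str, List[str]]]  # gate -> (type, fanins); hypothetical format


def evaluate(netlist: Netlist, inputs: Dict[str, int]) -> Dict[str, int]:
    """Return the logic value of every signal for one input vector."""
    values = dict(inputs)

    def value_of(name: str) -> int:
        if name not in values:
            kind, fanins = netlist[name]
            ins = [value_of(f) for f in fanins]
            if kind in ("AND", "NAND"):
                out = int(all(ins))
            elif kind in ("OR", "NOR"):
                out = int(any(ins))
            elif kind == "NOT":
                out = ins[0]
            else:
                raise ValueError(f"unsupported gate type: {kind}")
            values[name] = 1 - out if kind in ("NAND", "NOR", "NOT") else out
        return values[name]

    for gate in netlist:
        value_of(gate)
    return values


def simulate_max_power_up(netlist: Netlist, pis: List[str], fout: Dict[str, int],
                          num_vectors: int = 5000,
                          seed: int = 0) -> Tuple[int, Optional[Dict[str, int]]]:
    """Apply uniformly random input vectors and keep the largest P_i observed."""
    rng = random.Random(seed)
    best_value, best_vector = -1, None
    for _ in range(num_vectors):
        vector = {pi: rng.randint(0, 1) for pi in pis}
        values = evaluate(netlist, vector)
        p_i = sum(values[g] * fout[g] for g in netlist)  # equation (2)
        if p_i > best_value:
            best_value, best_vector = p_i, vector
    return best_value, best_vector


# Illustrative circuit and gate-to-gate fanout numbers.
EXAMPLE: Netlist = {
    "n1": ("NAND", ["a", "b"]),
    "n2": ("NOT", ["n1"]),
    "n3": ("OR", ["n1", "c"]),
    "n4": ("AND", ["n2", "n3"]),
}
FOUT = {"n1": 2, "n2": 1, "n3": 1, "n4": 0}

best_value, best_vector = simulate_max_power_up(EXAMPLE, ["a", "b", "c"], FOUT, num_vectors=200)
print(best_value, best_vector)
```

Random sampling yields a lower bound on the true maximum. On a three-input circuit it finds the maximum almost immediately, but on circuits with many inputs the sampled space is a vanishing fraction of all vectors, which is where an ATPG-style search can do better.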

In columns 2 and 3 of Table 1, we compare the estimation results of the *Imax/Gain-gate* algorithm and the *Imax/Gain-cluster* algorithm<sup>2</sup>. For seven out of the ten circuits, the *Imax/Gain-cluster* estimate is at least as large as the *Imax/Gain-gate* estimate (the two are equal for C880). The improvement is up to 16% (see circuit C6288). In columns 3 and 4 of Table 1, we compare the estimation results of the *Imax/Gain-cluster* algorithm and the simulation-based algorithm. For eight out of the ten circuits, the *Imax/Gain-cluster* algorithm achieves a larger estimate than the simulation-based algorithm. The improvement is up to 13% (see circuit C5315). In columns 6 to 8, we compare the runtime. Because the new algorithm needs to cluster the circuit into MFFCs first, it takes more runtime than the gate-level ATPG algorithm. But our cluster-based algorithm is still much faster than the simulation-based algorithm. In column 5, we present the estimation results of normal switching current obtained by the simulation-based algorithm. For six out of the ten combinational circuits, the maximum power-up current is larger than the maximum normal switching current.

<sup>2</sup>The *Imax/Gain-gate* algorithm is re-implemented on the same Sun server in order to make a runtime comparison.

## 3. Estimation for Sequential Circuits

#### 3.1. ATPG-based Algorithm

In this section, we study the maximum power-up current estimation for sequential circuits using power gating. Figure 4 shows a basic circuit structure of a sequential circuit using power gating. Both the combinational block and the flip-flops are gated by sleep transistors<sup>3</sup>.

To study the power-up current, we use SPICE to simulate a simple sequential circuit with only one master/slave flip-flop (FF). The combinational block for this sequential circuit is shown in Figure 5. The *current state line* comes from the output of the FF and the *next state line* feeds to the input of the FF. All the input signals are shown in the figure except the system clock with 4ns period. The circuit will switch to sleep mode when the  $V_{gating}$  signal goes high.

In Figure 3 (a) - (d), we show the SPICE simulation for this sequential circuit. We mainly focus on the FF in this experiment. Figure 3 (a) gives the overall waveform for the FF's output. One can clearly see that the FF's output voltage goes down during the sleep mode. Figure 3 (d) shows the zoom-in of the power-up region for the current through the power rail. The first current spike is the power-up surge current, which is significantly larger than the normal switching current in this experiment.

Since FFs are storage elements and they lose their data during sleep mode, the original data need to be recovered after the circuit is woken up in order to operate correctly. The FFs' states immediately after power-up depend on the power-up timing with respect to the system clock, FF structure, signal race within the FF, etc. In order to obtain a conservative estimate of power-up current, we assume that on powering up after sleep mode, the FFs can be in any arbitrary state. Thus our method will choose a state that maximizes the power-up current, leading to a conservative estimate.

Then, the estimation problem for a sequential circuit can be transformed into a problem for a combinational circuit, where the primary inputs of the transformed combinational circuit include the PIs of the sequential circuit and the pseudo primary inputs (PPIs) that correspond to the feedback lines from the FFs (see Figure 6). A logic vector for the primary inputs of the transformed circuit can be found by using our cluster-based ATPG algorithm *Imax/Gain-cluster*. More specifically, the fanout numbers of the FFs are counted into the PPIs and the FFs are removed from the circuit. After the remaining combinational block is clustered, the *Imax/Gain-cluster* algorithm is applied to obtain the logic vector for both PIs and PPIs of the original circuit. Note that we solve the sequential circuit estimation problem using combinational ATPG methods. This strategy of mapping sequential circuits into combinational circuits has been used for estimation of normal switching current [10, 8] and for acyclic sequential circuit testing [5].
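The transformation can be sketched as follows, under stated assumptions. Each FF output becomes a PPI that carries the FF's fanout number, and each FF data input is counted as one load on its driving gate (this last point is an assumption). The maximization over PIs and PPIs is done by exhaustive enumeration purely as a stand-in for *Imax/Gain-cluster*. The netlist format, function names and the one-FF example are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

Netlist = Dict[str, Tuple[str, List[str]]]  # gate -> (type, fanins); hypothetical format


def evaluate(netlist: Netlist, inputs: Dict[str, int]) -> Dict[str, int]:
    """Return the logic value of every signal for one input vector."""
    values = dict(inputs)

    def value_of(name: str) -> int:
        if name not in values:
            kind, fanins = netlist[name]
            ins = [value_of(f) for f in fanins]
            if kind in ("AND", "NAND"):
                out = int(all(ins))
            elif kind in ("OR", "NOR"):
                out = int(any(ins))
            elif kind == "NOT":
                out = ins[0]
            else:
                raise ValueError(f"unsupported gate type: {kind}")
            values[name] = 1 - out if kind in ("NAND", "NOR", "NOT") else out
        return values[name]

    for gate in netlist:
        value_of(gate)
    return values


def to_combinational(gates: Netlist, flip_flops: Dict[str, str],
                     pis: List[str]) -> Tuple[List[str], Dict[str, int]]:
    """Remove the FFs: FF outputs become PPIs that carry the FFs' fanout numbers.

    flip_flops maps each FF output name (current state line) to the signal
    driving its data input (next state line).
    """
    ppis = list(flip_flops)
    fout = {name: 0 for name in list(gates) + ppis}
    for _, fanins in gates.values():
        for f in fanins:
            if f in fout:
                fout[f] += 1
    for data_input in flip_flops.values():
        if data_input in fout:
            fout[data_input] += 1  # assumption: an FF data pin is one load
    return pis + ppis, fout


def max_power_up(gates: Netlist, inputs: List[str],
                 fout: Dict[str, int]) -> Tuple[int, Dict[str, int]]:
    """Exhaustive stand-in for the ATPG search over PIs and PPIs (tiny circuits only)."""
    best_value, best_vector = -1, {}
    for bits in product((0, 1), repeat=len(inputs)):
        vector = dict(zip(inputs, bits))
        values = evaluate(gates, vector)
        p_i = sum(values[s] * fout[s] for s in fout)  # gates and PPIs both contribute
        if p_i > best_value:
            best_value, best_vector = p_i, vector
    return best_value, best_vector


# Illustrative sequential circuit: one FF with output q and data input n2.
GATES: Netlist = {"n1": ("NAND", ["x", "q"]), "n2": ("OR", ["n1", "y"])}
FLIP_FLOPS = {"q": "n2"}

inputs, fout = to_combinational(GATES, FLIP_FLOPS, ["x", "y"])
print(max_power_up(GATES, inputs, fout))
```

Here the maximizing vector sets the PPI `q` to 1, which reflects the conservative assumption that the FF may wake up in whichever state draws the most charge.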



Figure 4. A sequential circuit using power gating.

#### 3.2. Experimental Results

We present the results for ISCAS'89 benchmark circuits in Table 2. Again, a simulation-based algorithm is implemented to evaluate the estimation quality. We use 5000 randomly generated vectors in the simulation to obtain the maximum power-up current. In columns 2 and 3, we compare the results of the *Imax/Gain-cluster* algorithm and the simulation-based algorithm. For three out of the five small circuits with fewer than 1000 gates (s1196 - s1494), *Imax/Gain-cluster* slightly improves on the simulation-based result. For all of the seven large circuits (s5378 - s38584.1), *Imax/Gain-cluster* consistently improves on the simulation-based result. The largest improvement is 13% for circuit s15850.1. In the bottom row, we present the total maximum current over all the circuits, which corresponds to the average case. On average, the *Imax/Gain-cluster* algorithm achieves an 8% improvement. The *Imax/Gain-cluster* algorithm also takes much less runtime than the simulation-based algorithm, as shown in columns 5 and 6.

Figure 3. (a) Simulation result for the voltage of the FF's output; (b) Simulation result for the current through the power rail; (c) Zoom-in of the power-up region of (a); (d) Zoom-in of the power-up region of (b).

Figure 5. The combinational block used in a sequential circuit for SPICE simulation.

<sup>3</sup>For simplicity, the following discussions are for the case using PMOS sleep transistors. Similar analysis can also be applied to the case using NMOS sleep transistors.

Figure 6. The combinational circuit transformed from the sequential circuit in Figure 4 by removing the FFs.

In columns 2 to 4, we compare the power-up current and the normal switching current for the sequential circuits. For all of the seven large sequential circuits (s5378 - s38584.1), the maximum power-up current is always larger than the maximum switching current. The largest difference is 73% for circuit s38417. Comparison in the bottom row shows that on average the maximum power-up current obtained by the *Imax/Gain-cluster* algorithm is 31% larger than the maximum switching current.
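As a quick check, the quoted percentages follow from the Table 2 entries: the column totals 90274 (*Imax/Gain-cluster*), 83262 (simulated power-up) and 68543 (simulated switching), and the s38417 row (20504 versus 11885):

$$\frac{90274 - 68543}{68543} \approx 0.317, \qquad \frac{90274 - 83262}{83262} \approx 0.084, \qquad \frac{20504 - 11885}{11885} \approx 0.725$$

These correspond to the 31.7% entry in the bottom row of the table, the 8% average improvement over simulation, and the 73% largest difference between power-up and switching current.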

## 4. Conclusions and Future Work

| Circuit  | (2) Max. power-up current $P_i$, Imax/Gain-cluster | (3) Max. power-up current $P_i$, simulation | (4) Max. switching current, simulation | (5) Runtime (s), Imax/Gain-cluster | (6) Runtime (s), simulation |
|----------|---------------|----------------|------------|--------|--------|
| s1196    | 553           | 558            | 576        | 3.1    | 33.8   |
| s1238    | 567           | 565            | 579        | 3.2    | 33.2   |
| s1423    | 921           | 814            | 743        | 2.4    | 47.2   |
| s1488    | 640           | 639            | 959        | 5.2    | 38.8   |
| s1494    | 588           | 645            | 913        | 5.0    | 38.2   |
| s5378    | 2424          | 2406           | 2165       | 25.0   | 196.9  |
| s9234    | 4630          | 4304           | 3163       | 69.0   | 387.5  |
| s13207.1 | 7409          | 6749           | 4263       | 138.0  | 663.0  |
| s15850.1 | 8811          | 7806           | 5429       | 232.2  | 809.8  |
| s35932   | 22937         | 20802          | 20897      | 942.6  | 1616.0 |
| s38417   | 20504         | 19042          | 11885      | 1099.5 | 2249.3 |
| s38584.1 | 20290         | 18932          | 16971      | 1420.3 | 2022.7 |
| total    | 90274 (31.7%) | 83262 (21.47%) | 68543 (0%) | 3945.5 | 8136.4 |

Table 2. Comparison between power-up current and normal switching current for sequential circuits. Runtimes are for the maximum power-up current estimation. Percentages in the bottom row are relative to the total maximum switching current. The numbers in parentheses in the header are the column numbers referenced in the text.

In this paper, we have presented a new cluster-based ATPG algorithm *Imax/Gain-cluster* to estimate the maximum power-up current for combinational circuits. Circuits are first clustered into Maximum Fanout Free Cones (MFFCs) and then the gain-based ATPG algorithm *Imax/Gain* is performed at the cluster level. Our experimental results show that this new algorithm can achieve up to 16% improvement compared to the original gate-level *Imax/Gain* algorithm in [6]. We also compare the estimation results of the *Imax/Gain-cluster* algorithm and the simulation-based algorithm. For eight out of the ten circuits, the *Imax/Gain-cluster* algorithm achieves a larger estimate than the simulation-based algorithm. The improvement is up to 13%.

We have studied the estimation problem for sequential circuits. Mapping the states of flip-flops (FFs) to pseudo primary inputs (PPIs) and transforming the sequential circuits to combinational circuits, we solve the problem by using the *Imax/Gain-cluster* algorithm. The ATPG-based algorithm for sequential circuits uses much less runtime, and achieves 8% larger estimation than the simulation-based algorithm on average. The experimental results have also shown that the maximum switching current is often larger than the maximum power-up current for small circuits (fewer than 1000 gates). But for large circuits, the maximum power-up current is often larger than the maximum switching current and the difference can be up to 73%.

In the future, we plan to develop new methods to estimate maximum current for unmapped logic functions.

## References

- [1] M. Abramovici, M. A. Breuer, and A. D. Friedman. *Digital Systems Testing and Testable Design*. IEEE Press, 1990.
- [2] J. Cong and Y. Ding. On area/depth trade-off in lut-based FPGA technology mapping. In *Proc. Design Automation Conf*, pages 213–218, 1993.
- [3] J. Cong, H. P. Li, S. K. Lim, T. Shibuya, and D. Xu. Large scale circuit partitioning with loose/stable net removal and signal flow based clustering. In *Proc. Int. Conf. on Computer Aided Design*, pages 441–446, 1997.
- [4] J. T. Kao and A. P. Chandrakasan. Dual-threshold voltage techniques for low-power digital circuits. *IEEE Journal of Solid-state circuits*, 35(7):1009–1018, July 2000.

- [5] Y. Kim, K. K. Saluja, and V. D. Agrawal. Combinational test generation for acyclic sequential circuits using a balanced ATPG model. *International Conference on VLSI Design*, pages 143–148, January 2001.
- [6] F. Li and L. He. Maximum current estimation considering power gating. In *Proc. Int. Symp. on Physical Design*, 2001.
- [7] F. Li, L. He, and K. K. Saluja. Estimation of maximum power-up current. University of Wisconsin-Madison, Technical Report ECE-01-2, July 2001.
- [8] F. Li, W. Zhao, and P. Tang. Improved ATPG-based maximum power estimation. *Chinese Journal of Computer-Aided Design and Computer Graphics*, 12(7):538–543, July 2000.
- [9] S. Thompson, P. Packan, and M. Bohr. MOS scaling: Transistor challenges for the 21st century. *Intel Technology Journal*, Q3, 1998.
- [10] C.-Y. Wang, K. Roy, and T.-L. Chou. Maximum power estimation for sequential circuits using a test generation based technique. In *Proc. IEEE Custom Integrated Circuits Conf.*, pages 229–232, Apr. 1996.