# **NBTI-Aware Synthesis of Digital Circuits**<sup>\*</sup>

Sanjay V. Kumar University of Minnesota sanjay@umn.edu Chris H. Kim University of Minnesota chriskim@umn.edu Sachin S. Sapatnekar University of Minnesota sachin@umn.edu

## ABSTRACT

Negative Bias Temperature Instability (NBTI) in PMOS transistors has become a major reliability concern in nanometer scale design, causing temporal degradation of the threshold voltage of the PMOS transistors, and hence of the delay of digital circuits. A novel method to characterize the delay of every gate in the standard cell library, as a function of the signal probability of each of its inputs, is developed. Accordingly, a technology mapping technique that incorporates the NBTI stress and recovery effects, in order to ensure optimal performance of the circuit during its entire lifetime, is presented. Our technique, demonstrated over 65nm benchmarks, shows an average of 10% area recovery and 12% power savings, compared with a pessimistic method that assumes constant stress on all PMOS transistors in the design.

## **Categories and Subject Descriptors**

B.8.2 [**Performance and Reliability**]: Performance Analysis and Design Aids

## **General Terms**

Reliability, Performance, Design

## **Keywords**

Negative Bias Temperature Instability (NBTI), Signal Probability, Technology Mapping, Area, Delay

## 1. INTRODUCTION

With transistor scaling into the sub-65nm technology node, the impact of NBTI (Negative Bias Temperature Instability) on temporal circuit performance has become extremely important. NBTI can be explained by mechanisms related to the generation of interface traps in PMOS transistors due to the dissociation of Si - H bonds along the oxide interface, and has become a major reliability concern. The effects of this phenomenon are maximized when the gate-to-source voltage,  $V_{gs}$ , across a PMOS transistor equals  $-V_{dd}$ , and is accelerated by elevated operating temperatures. NBTI manifests itself as an increase in the transistor threshold voltage, causing the drive current to decrease. This causes the logic gates to slow down, and the critical paths may no longer be able to sustain the required timing.

While there have been several different physical explanations for the NBTI mechanism, leading to various models [1,2,3,4,5,6,7], most of them have shown that the PMOS transistor threshold voltage rises logarithmically with time, leading to about 25-30% increase in its value after 10 years. The extent of threshold voltage degradation is strongly dependent on the amount of time for which the device has been stressed and relaxed, since the generation of traps under negative bias stress is followed by annealing of the traps during recovery, (i.e., when the stress is relaxed). Circuit simulations using these models have shown that NBTI causes the delay of digital circuits to worsen by about 10% [1,8,9].

A general solution to maintaining optimal performance under the influence of NBTI has been to reduce the delay of the critical paths through the use of gate sizing  $\left[3,9\right]$  . The work in [9] formulates a nonlinear optimization problem to determine the optimal set of gate sizes required to ensure that the circuit runs at its delay specification, after 10 years of operation. The work is based on a model for NBTI, that ignores the effect of recovery, in computing the threshold voltage degradation. The model cumulatively adds the time for which the gates are stressed during their entire lifetime, and estimates the threshold voltage degradation, assuming that the gates are continuously stressed for that duration. Hence, their results show that the increase in the circuit area is rather weakly dependent on the signal probabilities of the nodes, and assuming that all gates in the circuit are always NBTI affected (worst case design) does not significantly affect the final solution. The authors consider the gate sizes to be continuous, and show that an increase in area of about 8.7%, as compared to a design that ignores NBTI effects, is required to meet the target delay.

We observe that the above idea can be readily used in other transforms, such as technology mapping, by replacing the nominal value of the delays of the gates in the standard cell library, with the delay under worst case NBTI. The target frequency is given to the synthesis tool, and technology mapping can be performed using these NBTI-affected library cells to produce a circuit, which is structurally different from that obtained using the sizing algorithm in [9], but is functionally equivalent, and meets the timing.

However, we find the conclusion that the delay is independent of signal probability does not hold, under a model that captures the healing of NBTI, on removal of the applied stress. This happens frequently in a circuit: for example, when the input signal to a CMOS inverter changes from logic 0 to logic 1, the  $V_{gs}$  stress is relaxed from  $-V_{dd}$  to zero. The recovery in threshold voltage on removing the applied stress, can be explained by physical mechanisms related to annealing of interface traps, and reformation of Si - H bonds. Experiments in [2, 7], and subsequently the models in [1, 3, 4], have shown that considering the effect of annealing and recovery has a significant bearing on the overall NBTI impact.

<sup>\*</sup>This research was supported in part by the NSF under award CCF-0541367, and by the SRC under contract 2007-TJ-1572.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

DAC 2007, June 4-8, 2007, San Diego, CA, USA

Copyright 2007 ACM 978-1-59593-627-1/07/0006 ...\$5.00.

Further, the work in [9] merely computes the fractional increase in the area of each gate on an existing design (implying a library consisting of infinitely many sizes of each gate), to counter the temporal degradation caused by NBTI. Instead, our work relies on integrating the effects of NBTI stress and recovery into circuit design at a much earlier stage, i.e., during technology mapping. Since a digital circuit consists of millions of nodes with differing signal probabilities, it is essential to estimate the delay of the gates in the library based on their input node signal probabilities, and use these delay values during technology mapping. We present a novel model to estimate the delay degradation of every gate in the standard cell library, as a function of the signal probabilities of its input nodes. Accordingly, our paper proposes an approach to modify the process of technology mapping, based on the signal probability (defined as the probability that the signal is low<sup>1</sup> and denoted by SP) of the nodes in the circuit. The SP values of the primary inputs can be determined based on RTL-level simulations and statistical estimates. The SP values at every other node in the subject graph are calculated accordingly, and this information is used to choose the best gate to meet the timing at each node.

The effectiveness of this scheme is compared with the worst case NBTI library based synthesis approach, where the delay of every gate in the standard cell library is replaced with its equivalent NBTI affected value, computed by assuming that the PMOS transistors are continuously stressed, and technology mapping is performed using this library. Our results indicate that an average 10% recovery in area can be obtained using the SP based method, as opposed to the worst case NBTI based method.

## 2. NBTI MODELING AND LIBRARY CHARACTERIZATION

In this section, we present an overview of the NBTI model used to calibrate the delay of the gates in the standard cell library. We first characterize the threshold voltage, denoted by  $V_{th}$ , of the PMOS transistors, as a function of the number of interface traps, which is further dependent on the probability that the input to the PMOS device is low. We obtain a look-up table of  $V_{th}$  versus the signal probability, SP, and use this look-up table to estimate the threshold voltage degradation of each PMOS transistor in the gate, based on the nodal signal probabilities, and thereby compute the delay of every cell in the library. The delay of each standard cell in the library is precomputed as a function of the SP of each input node, and can be referenced accordingly, during technology mapping.

The extent of threshold voltage degradation as a function of the number of interface traps is calculated from [1] as:

$$\Delta V_{th} = \frac{(m+1)qN_{IT}}{C_{ox}} \tag{1}$$

where  $C_{ox}$  is the oxide capacitance,  $N_{IT}$  is the number of interface traps, and m is the additional factor added due to mobility degradation. The  $N_{IT}$  computations are applicable to deterministic waveforms, whose exact nature of stress and relaxation are known. However, it is impossible to predict the exact input pattern for digital circuits, since the input waveforms are random in nature, and can merely be represented using statistical information, such as signal probabilities and activity factors or switching probabilities. Hence, these random waveforms are converted into equivalent deterministic periodic rectangular signals [1] by ensuring that the signal probabilities of the random waveform and that of the deterministic periodic waveform, i.e., the cumulative off-times for both the waveforms are the same. The method is explained in detail, and proven analytically as well, for square waveforms, in [1], and is pictorially shown in Fig. 1. For convenience p and q are chosen to be integral values such that the time period of this periodic waveform is  $qt_0$ , where  $t_0$  is the time step value used for simulations, and  $\frac{p}{q}$ , i.e., the signal probability or the off-time duty cycle of the periodic waveform, is equal to SP, where SP denotes the signal probability of the random waveform.



Figure 1: A random waveform (with signal probability = SP) and its equivalent periodic rectangular waveform with the same stress duty cycle  $(\frac{p}{q} = SP)$ .

The above method estimates the number of interface traps  $N_{IT}$ , by assuming that stress is applied on the PMOS device for the period between  $nqt_0$  and  $(nq+p)t_0$ , and is followed by relaxation from time  $(nq+p)t_0$  to  $(n+1)qt_0$ , for n = 0,1,2..., as shown in the figure. From [1],  $N_{IT}$  at any time  $jt_0$  (where j is a positive integer) for this periodic waveform is given by the equation:

$$N_{IT}(jt_0) = s_j N_{IT}(t_0) \tag{2}$$

and  $s_j$  for any j = nq + i is given by:

$$s_{nq+i} = \begin{cases} (i+s_{nq}^{6})^{\frac{1}{6}} & 0 < i \le p\\ \frac{s_{nq+p}+s_{nq}\left(\frac{i-p}{2i}\right)^{0.5}}{1+\left(\frac{i-p}{2i}\right)^{0.5}} & p < i \le q \end{cases} \tag{3}$$
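The recurrence in Eq. (3) can be evaluated one period at a time, since only the values at $i = p$ (end of stress) and $i = q$ (end of relaxation) feed the next period. The following is a minimal sketch, not the authors' implementation: the function name `scale_after_cycles` is hypothetical, and it assumes $s_0 = 0$ (no interface traps at time zero).

```python
import math


def scale_after_cycles(p: int, q: int, n_cycles: int) -> float:
    """Return s_{n*q} of Eq. (3) after n_cycles full periods of length q*t0.

    p/q is the stress duty cycle (SP). Assumes s_0 = 0 (illustrative choice).
    """
    if q <= 0 or not 0 <= p <= q:
        raise ValueError("need q > 0 and 0 <= p <= q")
    s = 0.0
    # Relaxation term of Eq. (3) evaluated at the end of a period (i = q).
    r = math.sqrt((q - p) / (2.0 * q))
    for _ in range(n_cycles):
        s_stress = (p + s ** 6) ** (1.0 / 6.0)  # first branch at i = p
        s = (s_stress + s * r) / (1.0 + r)      # second branch at i = q
    return s
```

With $p = q$ the relaxation term vanishes and the sketch reduces to $s_j = j^{1/6}$, the continuous-stress case. Reaching $3 \times 10^8$ seconds with a one-second period needs $3 \times 10^8$ iterations, so a practical implementation would likely need something faster than this plain loop.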



Figure 2: $|V_{th}|$ versus signal probability (SP) for a 65nm PMOS transistor.

Using the above method, random waveforms at the gate input nodes are equivalently represented as deterministic waveforms with the same SP values, and the number of interface traps, and the  $V_{th}$  values, after  $3 \times 10^8$  seconds ( $\approx 10$ years) of NBTI stress and relaxation are computed, using  $t_0 = 0.01$ s and q = 100 in the above equations. This process is repeated for different values of SP (i.e., p ranging from 0 to 100, such that SP ( $= \frac{p}{q}$ ) varies from 0 to 1, in steps of 0.01) to obtain a plot of  $|V_{th}|$  versus SP for a 65nm PTM transistor [10], as shown in Fig. 2. The nominal value of  $|V_{th}|$  is 0.365V, while the value of  $|V_{th}|$  under constant NBTI stress (also known as static NBTI stress or DC stress) is 0.456V. The curve shows a steep rise near SP = 0.01. Although SP = 0.01 corresponds to only  $3 \times 10^6$  seconds of stress and approximately  $3 \times 10^8$  seconds of relaxation, the amount of time for which stress is applied in this case is still large enough to cause a significant amount of trap build-up.

<sup>1</sup>Conventionally, the signal probability has been defined as the probability that the signal is high. However, since NBTI stress on PMOS devices is caused by logic 0 signals, for convenience, we define SP as the probability that the signal is at logic zero.

A look-up table of  $V_{th}$  versus SP is built for different values of SP, ranging from 0 to 1. Clearly, SP = 0 corresponds to the case where there is no NBTI impact, and is referred to as the nominal case, while SP = 1 corresponds to the case where the gate is under continuous DC stress, and hence represents the worst case NBTI.
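One way to relate a table entry to the two endpoint values quoted above: Eqs. (1) and (2) make $\Delta V_{th}$ proportional to $s_j$, so the prefactor $(m+1)qN_{IT}(t_0)/C_{ox}$ cancels when an entry is normalized to the DC stress case. The following is a sketch under stated assumptions rather than a formula from the source: both waveforms are evaluated at the same $j$, $s_0 = 0$, and $s_j(\mathrm{SP})$ is notation introduced here for the value of $s_j$ obtained with $\frac{p}{q} = \mathrm{SP}$. With $p = q$, the first branch of Eq. (3) gives $s_j(1) = j^{1/6}$.

$$\frac{\Delta V_{th}(\mathrm{SP})}{\Delta V_{th}(1)} = \frac{s_j(\mathrm{SP})}{s_j(1)} = \frac{s_j(\mathrm{SP})}{j^{1/6}}$$

$$|V_{th}|(\mathrm{SP}) \approx 0.365\,\mathrm{V} + (0.456 - 0.365)\,\mathrm{V} \cdot \frac{s_j(\mathrm{SP})}{j^{1/6}}$$

The two endpoints are recovered as expected: $s_j(0) = 0$ gives the nominal 0.365V, and $s_j(1) = j^{1/6}$ gives 0.456V.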

Since the SP at a gate input, and hence the delay, depends on where the gate is placed in the circuit, a characterization of gate delay as a function of the input node signal probabilities, is necessary. This is achieved by first computing the threshold voltage of each PMOS device in the gate as a function of the nodal SP values, and performing SPICE based delay simulations on the gate with the modified  $V_{th}$  values. Thus, the NBTI-affected delay of each gate in the library can be characterized. We provide an overview of the effect of NBTI on different classes of logic gates, namely NAND, NOR and inverters, in the next subsection.

### 2.1 Inverters:





Since an inverter consists of a single PMOS transistor whose source is always connected to  $V_{dd}$ , applying a logic 0 at the input node corresponds to the case of NBTI stress, while applying a logic 1 corresponds to relaxation.

Thus, given the SP value at the input node, the  $V_{th}$  of the PMOS transistor can be computed using the look-up table described in the previous section. The rise delay of the inverter can be calibrated with respect to the  $V_{th}$  of the PMOS transistor through SPICE simulations, and the plot of delay after 10 years of operation, versus SP, is shown in Fig. 3(a). The results indicate that the rise delay increases linearly with SP. The steep rise near SP = 0 follows from the shape of the  $V_{th}$ -SP plot in Fig. 2, where the threshold voltage rises sharply initially. The fall delay depends on the  $V_{th}$  of the NMOS transistor, and hence largely remains unaffected.

### 2.2 NAND Gates:

The behavior of NAND gates is similar to that of inverters since they consist of a single PMOS transistor between the output node and  $V_{dd}$ . The SP of node  $I_i$ , denoted as  $SP_i$ , determines the  $V_{th}$  of the PMOS transistor connected to that node only<sup>2</sup>. Accordingly, the rise delay  $D_i$  of the gate, defined as the delay of the transition from input node  $I_i$  to the output node O, is a function of  $SP_i$ , as shown in Fig. 4(a). The rise delay of a two input NAND gate after 10 years of operation is shown in Fig. 3(b). The upper curve plots the delay  $D_2$  from node  $I_2$  to the output node O, as a function of  $SP_2$ , while the lower curve plots the delay  $D_1$  from node  $I_1$ , as a function of  $SP_1$ . The shape of the plot is similar to that of the inverter curve shown in Fig. 3(a). The fall transition occurs through the NMOS device stack and is largely unaffected by NBTI.



Figure 4: Schematic of (a) a k-input NAND, and (b) a two input NOR.

### 2.3 NOR Gates:

Unlike the NAND gates or the inverters, which consist of a single PMOS device between  $V_{dd}$  and the output node, NOR gates consist of PMOS stacks, and hence the steady states of the intermediate nodes determine whether each individual transistor is under negative bias stress or relaxation. The analysis for two input NOR gates is presented below:



Figure 5: Rise delay of a two input NOR gate versus  $SP_i$ , taking into account the NBTI effect, after 10 years of operation. The curves plot delay versus  $SP_2$ for  $SP_1 = [0, 0.33, 0.67, 1]$ .

Let us consider a two input NOR gate represented as O  $=\overline{I_1 + I_2}$ , whose schematic is shown in Fig. 4(b). The probability that the upper PMOS transistor, i.e., the one connected to  $I_2$  in Fig. 4(b) is under stress is simply

$$p(\mathbf{I}_2) = \mathbf{SP}_2 \tag{4}$$

since its source is always at  $V_{dd}$ . However, the probability that the lower transistor is NBTI affected is given by

$$p(\mathbf{I}_1) = \mathbf{SP}_1 \cdot \mathbf{SP}_2 \tag{5}$$

since the probability of its source being at  $V_{dd}$  depends on the probability that the gate input at  $I_2$  is 0. Thus, stacking reduces the impact of NBTI.
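Eqs. (4) and (5) can be written as a small helper. This is an illustrative sketch only: the function name is hypothetical, and the product in Eq. (5) is taken at face value, which implicitly treats the two inputs as statistically independent.

```python
def nor2_stress_probabilities(sp1: float, sp2: float) -> dict:
    """Stress probability of each PMOS device in a two-input NOR stack.

    sp1, sp2: probability that inputs I1, I2 are at logic 0 (the SP
    convention of this paper). I2 drives the upper device (source at Vdd),
    I1 drives the lower device, closer to the output.
    """
    for sp in (sp1, sp2):
        if not 0.0 <= sp <= 1.0:
            raise ValueError("signal probabilities must lie in [0, 1]")
    return {
        "I2": sp2,        # Eq. (4): source is always at Vdd
        "I1": sp1 * sp2,  # Eq. (5): stressed only when I2 is also low
    }
```

For example, with both inputs at SP = 0.5, the upper device is stressed half of the time and the lower device only a quarter of the time.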

The delay of a two-input NOR gate as a function of the input SP values is shown in Fig. 5. The worst case rise delay after 10 years of operation, is computed as the delay from node  $I_2$  to the output node. The delay depends on the  $V_{th}$  of all transistors in the stack, and is hence plotted as a function of both  $SP_1$  and  $SP_2$ . When  $SP_2 = 0$ , there is no trap build-up for the PMOS transistor connected to node  $I_2$ , and the probability that its drain is at  $V_{dd}$  is zero. Consequently, the source of the transistor at node  $I_1$  is never at  $V_{dd}$ , and there is no trap build-up for the PMOS device connected to  $I_1$  as well. Hence, the rise delay is equal to its nominal value. For the remaining cases, the rise delay increases as a function of  $SP_1$  and  $SP_2$ , as shown in the figure. It is evident from the above expressions that pin reordering can be performed to ensure that the nodes with higher SP values are placed closer to the output node, thereby minimizing the overall delay degradation of the gate. The analysis for three-input NOR gates can be performed similarly, and is omitted due to space constraints.

<sup>2</sup>The worst case rising transition of a NAND gate from  $I_k$  to O occurs with  $I_1 = I_2 = \dots = I_{k-1}$  all at  $V_{dd}$ , i.e., the PMOS devices are all off. Hence,  $D_k$  depends on  $SP_k$  only.

Figure 6: Results of technology mapping for C17 benchmark: (a) shows the result for nominal synthesis, which results in a circuit that fails with aging, while (b) shows the result for worst case NBTI synthesis, and (c) indicates the result for SP based synthesis. Note that "a" represents the minimum size for a particular gate and "b", "c", "d", etc. represent gates of higher sizes.

For complex gates, a combination of the analysis for NAND and NOR gates is performed to characterize the delay as a function of the input signal probabilities. The next section describes the logic synthesis mechanism using the above library characterization, and the results obtained.

## 3. NBTI-AWARE TECHNOLOGY MAPPING

In this section, we describe the process of technology mapping using the smallest ISCAS85 benchmark C17, as an example. The benchmark consists of five inputs i0, i1, i2, i3, and i4. There are two primary outputs, y9 and y10. The logic function computed by this circuit is given as follows:

$$\begin{array}{rcl} y9 & = & (i1+i4) \cdot (\overline{i2}+\overline{i3}) \\ y10 & = & i0 \cdot i2 + i1 \cdot (\overline{i2}+\overline{i3}) \end{array} \tag{6}$$

The subject graph for C17 is obtained through SiS [11], using a NAND2-NOT representation. Technology mapping is then performed, using a 65nm PTM [10] based standard cell library, consisting of 10 NOT gates, 6 NAND2 gates, 6 NOR2 gates, 5 NAND3 gates, 3 NOR3 gates, 3 AOI22 gates, 3 AOI12 gates, 3 OAI22 gates, and 3 OAI12 gates of varying sizes. A large library set consisting of different gates of varying sizes is chosen, to provide the synthesis tool with different options to implement a given logic cone. The delay of each of these gates in the library is precharacterized as a function of their input signal probabilities, at each of the original data points. The overhead in precharacterizing the library depends on the original number of corners at which the delay was characterized, and can be reduced by using linear models with respect to the signal probabilities, which provide an accurate fit, as can be observed from Figs. 3(a), 3(b), and 5.

During technology mapping, the logic cone for each node can be implemented in various structurally different ways, to realize the same functionality. The mapping tool computes the area and the delay of each of these realizable structures, using data from the standard cell library, and performs a "best match" search over the candidate gates, based on the optimization constraints (minimum area, minimum delay, a linear combination of both, etc.). The best match is retained and the corresponding input binding is preserved. This procedure is repeated over all nodes in the covering phase, in a primary output to primary input order, such that the fan-out nodes for a particular node are synthesized before mapping the node itself. This step is followed by global area recovery, and fanout-optimization, during which the gate sizes are altered. A buffer insertion step is also performed to further optimize the circuit (this step does not change the SP values of the existing nodes). Three different objectives are used to synthesize the circuits, namely:

- Nominal Synthesis: Technology mapping is realized using the nominal timing library, where the delay of each gate at time t = 0 is used. It must be noted that circuits designed using this method fail well before their lifetime, due to the NBTI induced temporal delay degradation.
- Worst case NBTI Synthesis: Technology mapping is performed using the NBTI-affected timing library, where the worst case delay of each gate computed after 10 years of continuous NBTI stress, is used, instead of the nominal delay values.
- SP based Synthesis: Technology mapping is performed using the SP information at each node to choose the gate with the least area overhead to meet the timing requirement. The delay of the gates in the library as a function of SP, is precomputed in the form of linear best fit-curves.
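The difference between the three objectives can be illustrated with a toy "best match" search over sizes of a single cell type. This is a simplified sketch, not the mapper used in this work: the `Cell` class, the `best_match` function, the linear delay model `d_nominal + slope * sp`, and all numbers in `EXAMPLE_LIB` are hypothetical, and a real mapper also accounts for loads, slews, and multiple input pins.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Cell:
    name: str
    area: float       # sum of transistor widths (um), made-up values
    d_nominal: float  # rise delay at SP = 0 (ps), made-up values
    slope: float      # added delay per unit SP from a linear fit (ps)

    def delay(self, sp: float) -> float:
        return self.d_nominal + self.slope * sp


def best_match(cells: Sequence[Cell], required_ps: float, sp: float) -> Optional[Cell]:
    """Smallest-area cell whose aged delay at the given SP meets timing."""
    feasible = [c for c in cells if c.delay(sp) <= required_ps]
    return min(feasible, key=lambda c: c.area, default=None)


# Hypothetical sizes "a", "b", "c" of one gate type.
EXAMPLE_LIB = [
    Cell("NAND2a", 1.0, 20.0, 4.0),
    Cell("NAND2b", 1.5, 17.0, 3.0),
    Cell("NAND2c", 2.2, 14.0, 2.5),
]

nominal = best_match(EXAMPLE_LIB, 21.0, sp=0.0)    # nominal synthesis
worst = best_match(EXAMPLE_LIB, 21.0, sp=1.0)      # worst case NBTI synthesis
sp_based = best_match(EXAMPLE_LIB, 21.0, sp=0.25)  # SP based synthesis
```

In this toy library, worst case synthesis is forced up to size "b", while SP based synthesis can keep size "a" at SP = 0.25 and still meet the 21ps requirement after aging.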

The input parameters to the synthesis tool and the results of technology mapping, for each of these three cases for a given target delay, are described below, using C17 as the test circuit. The target delay is chosen as 70ps for this case.

### **3.1** Nominal Synthesis

This corresponds to the case where the nominal delay of each gate is used during technology mapping. The final result of the synthesizer is shown in Fig. 6(a). The area of the circuit, computed as the sum of the widths of all transistors, is 7.4 $\mu$ m, and all the gates used in the circuit for mapping are minimum sized (of size "a"). The nominal synthesis method is considered for comparison purposes only, since circuits designed using this scheme are not NBTI-tolerant and fail to meet the timing after a certain period of time, due to the temporal degradation caused by NBTI.

### **3.2** Worst case NBTI Synthesis

In this case, the rising delays of the gates in the library are replaced by their NBTI-affected value, after 10 years of continuous stress, while the falling delays remain unaltered. SPICE simulations are performed with the  $V_{th}$  value corresponding to constant DC stress in the  $V_{th}$ -SP look-up table, (i.e.,  $V_{th} = 0.456$ V), and the corresponding delays are used in the timing library, instead of the nominal values. Expectedly, larger sized gates must be used to meet the timing required, resulting in higher area as compared with the nominal case. The mapped circuit for C17 is shown in Fig. 6(b). The size of each gate is shown in the figure, and it is evident that the gates along the critical path must be appropriately sized to meet the timing constraints. The final area of the circuit is  $11.6 \mu$ m.

### **3.3** SP based Synthesis

For the SP based synthesis, it is vital to propagate the SP information across all nodes based on the logic function being realized. This step is performed on the subject graph in SiS, which consists of a NAND2-NOT based decomposition of the circuit. The SP values for the primary inputs are assigned initial values, determined by RTL level simulations and statistical estimates (0.5 in our case), and these values are propagated along the various nodes in the subject graph in a PI-PO (primary input-primary output) order. During technology mapping, the SP values of the nodes are passed to a function that determines the delay of the various logic structures that realize the logic cone. The best-delay match is subsequently obtained and the corresponding gate that has the minimum NBTI-impact is chosen. The above step is repeated globally, until all nodes have been mapped to their best matches. The final mapped circuit for C17 is shown in Fig. 6(c). The area for this case is  $9.8\mu$ m. The SP based synthesis requires 15% less area as compared with the worst case NBTI synthesis, for a particular target delay.
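The PI-PO propagation on a NAND2-NOT subject graph can be sketched as below. This is an assumption-laden illustration, not the SiS code: the function name and netlist format are hypothetical, and gate inputs are treated as statistically independent, which ignores reconvergent fanout. Recall that SP here is the probability that a signal is at logic 0.

```python
from typing import Dict, List, Tuple

Gate = Tuple[str, str, Tuple[str, ...]]  # (output node, kind, input nodes)


def propagate_sp(primary_sp: Dict[str, float], gates: List[Gate]) -> Dict[str, float]:
    """Propagate SP = P(signal is low) through gates listed in PI-to-PO order."""
    sp = dict(primary_sp)
    for out, kind, ins in gates:
        if kind == "NOT":
            (a,) = ins
            sp[out] = 1.0 - sp[a]  # output is low when the input is high
        elif kind == "NAND2":
            a, b = ins
            # Output is low only when both inputs are high (independence assumed).
            sp[out] = (1.0 - sp[a]) * (1.0 - sp[b])
        else:
            raise ValueError(f"unsupported gate kind: {kind}")
    return sp


# Tiny example: n1 = NAND2(a, b), n2 = NOT(n1), primary inputs at SP = 0.5.
example = propagate_sp(
    {"a": 0.5, "b": 0.5},
    [("n1", "NAND2", ("a", "b")), ("n2", "NOT", ("n1",))],
)
```

Here the NAND2 output is low a quarter of the time and the inverter output three quarters of the time, so the two PMOS devices they drive see quite different stress.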

The experimental results for the SP based and worst case synthesis methods, obtained over different ISCAS85 and LGSYNTH benchmarks, are presented in the next section.

## 4. **RESULTS**

This section presents the results of technology mapping using a PTM [10] 65nm technology node based library, for the worst case NBTI, and SP based synthesis. The results are shown for some ISCAS85 and LGSYNTH benchmarks in Table 1. The target delay for each benchmark is set such that it lies in the region of the area-delay curve where the percentage change in the area is comparable with the percentage change in the delay, thereby providing scope for optimization. The area (calculated as the sum of the transistor widths of all gates) and the sum of active power (computed using the formula  $P = f C_L V_{dd}^2 \alpha$ , where f = frequency of operation,  $C_L$  = loading capacitance,  $V_{dd}$  = supply voltage, and  $\alpha$  = activity factor or switching probability), and leakage power (computed using the formula  $L = (I_{sub} + I_{gate})V_{dd}$  over all input combinations, where  $I_{sub}$  is the subthreshold current, and  $I_{gate}$  is the current due to gate-leakage), are reported for each circuit. The columns titled "Savings" estimate the amount of area or power that can be recovered using the SP based synthesis as against using the worst case NBTI synthesis. Most benchmarks result in better area and power when synthesized using the SP based method, as opposed to the worst case NBTI synthesis. Although technology mapping was performed to obtain a circuit with minimal area, the objective function can be modified to minimize the active power, leakage, etc. The table shows an average of 10% recovery in area and an average of 12% savings in power (active + leakage) for the benchmarks, due to significant reduction in the total device size, and capacitance. The results indicate that considering the SP values during technology mapping has a significant bearing on the circuit generated during logic synthesis.
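The reported quantities can be reproduced with two small helpers. This is a sketch with hypothetical function names: it assumes savings are taken relative to the worst case NBTI value, which matches, for example, the C432 and C1355 area entries in Table 1, and it evaluates leakage for a single input combination rather than over all combinations.

```python
def total_power(f_hz: float, c_load_f: float, vdd: float, alpha: float,
                i_sub: float, i_gate: float) -> float:
    """Active plus leakage power in watts, using P = f*C_L*Vdd^2*alpha
    and L = (I_sub + I_gate)*Vdd for one input combination."""
    active = f_hz * c_load_f * vdd ** 2 * alpha
    leakage = (i_sub + i_gate) * vdd
    return active + leakage


def percent_savings(worst_case: float, sp_based: float) -> float:
    """Savings of SP based synthesis relative to worst case NBTI synthesis."""
    if worst_case <= 0:
        raise ValueError("worst case value must be positive")
    return 100.0 * (worst_case - sp_based) / worst_case


# Area entries for C432 from Table 1 (um).
c432_area_savings = percent_savings(594.3, 548.4)
```

For C432 this gives roughly 7.7%, which rounds to the 8% listed in the table.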

The area versus delay curve for varying target delay values is shown for the SP based synthesis and worst case NBTI synthesis methods for the LGSYNTH benchmark b1 in Fig. 7. It must be noted that the figure only shows the central linear region of the area-delay curve, where the percentage change in area is comparable with the percentage change in the delay. The target delay of the circuits is chosen to lie in this region, since efficient area-delay trade-offs can be achieved here. Beyond this region, either the area or the delay overhead is large, thereby leading to a suboptimal design. The upper curve represents data for the worst case NBTI synthesis, while the lower curve corresponds to SP based synthesis. In this region, the area of the SP based synthesis method is clearly less than the area of the worst case NBTI library based method by about 10%, for any target delay. Accordingly, for a target delay of 108ps, 11% area savings can be obtained, as seen from the figure.



Figure 7: Area-delay curve for the benchmark b1.

In order to obtain a comparison of the reliability of the circuits synthesized using the three methods, namely nominal, worst case NBTI, and SP based synthesis, timing simulations are performed on each of the three synthesized circuits on all benchmarks, at various time stamps. The threshold voltage at each time stamp is computed, and the gate delays are characterized to obtain a library that corresponds to the NBTI induced degradation on the standard cells, at the given time stamp. The SP of all primary inputs are assigned to be 0.5, and the SP values at the intermediate nodes are calculated through Monte Carlo simulations, based on the method in [12]. Accordingly, the arrival times at the primary output nodes at different time stamps are computed. The results for C432 are shown in Fig. 8. The topmost curve shows the results for the nominal case, while the bottommost curve shows the results for the worst case NBTI synthesis case, and the middle curve shows the results for the circuit designed using the SP based synthesis method. The results show that the delay of the benchmarks increases with time logarithmically, and the three curves run almost parallel to one another.

Since the target delay for C432 is desired to be 790ps, we assume that the circuits are no longer functional if the arrival time exceeds the target delay. Although the area of the circuit synthesized using the nominal case is less than that using the SP based synthesis method, the circuit becomes dysfunctional after  $4 \times 10^4$ s, ( $\approx$  half a day), rendering it practically useless, whereas the circuit synthesized using the SP based method can sustain timing degradation up to 10 years. The circuit synthesized using the worst case NBTI synthesis method is reliable for over 10 years, but this method overestimates the extent of temporal degradation, and hence leads to a design that requires higher area and power. Thus, using the SP based synthesis method leads

| Benchmark                 | Target Delay (ps) | Worst case NBTI Synthesis |                 | SP based Synthesis |           |                 |           |
|---------------------------|-------------------|---------------------------|-----------------|--------------------|-----------|-----------------|-----------|
|                           |                   | Area $(\mu m)$            | Power $(\mu W)$ | Area $(\mu m)$     | % Savings | Power $(\mu W)$ | % Savings |
| C17                       | 70                | 11.3                      | 0.8             | 9.8                | 12%       | 0.7             | 10%       |
| C432                      | 790               | 594.3                     | 57.2            | 548.4              | 8%        | 52.7            | 8%        |
| C499                      | 648               | 1192.9                    | 57.1            | 1075.7             | 10%       | 52.2            | 9%        |
| C880                      | 610               | 636.2                     | 121.7           | 588.5              | 7%        | 107.8           | 11%       |
| C1355                     | 735               | 1282.2                    | 122.0           | 1051.8             | 18%       | 99.2            | 19%       |
| C1908                     | 860               | 1234.6                    | 122.5           | 1191.7             | 3%        | 117.2           | 4%        |
| C2670                     | 765               | 1347.1                    | 127.7           | 1337.9             | 1%        | 127.5           | 0%        |
| C3540                     | 1100              | 2569.8                    | 256.4           | 2057.4             | 20%       | 206.2           | 20%       |
| C6288                     | 3200              | 4356.2                    | 448.0           | 3817.5             | 28%       | 387.4           | 14%       |
| C7552                     | 990               | 4009.9                    | 409.0           | 3858.4             | 4%        | 394.2           | 4%        |
| majority                  | 110               | 19.2                      | 1.6             | 16.4               | 14%       | 1.2             | 25%       |
| b1                        | 108               | 27.1                      | 2.8             | 24.0               | 11%       | 2.2             | 23%       |
| decod                     | 151               | 143.4                     | 11.9            | 118.9              | 17%       | 9.2             | 22%       |
| cordic                    | 297               | 162.9                     | 13.1            | 152.1              | 7%        | 12.6            | 4%        |
| alu2                      | 923               | 760.2                     | 74.3            | 691.3              | 9%        | 65.5            | 12%       |
| apex6                     | 365               | 1080.9                    | 98.0            | 1044.2             | 3%        | 90.2            | 8%        |
| des                       | 620               | 8738.4                    | 891.0           | 8657.1             | 1%        | 866.0           | 3%        |
| alu4                      | 940               | 1498.6                    | 149.0           | 1302.1             | 13%       | 126.2           | 15%       |
| too <u>l</u> arge         | 545               | 1582.1                    | 153.4           | 1511.0             | 4%        | 140.7           | 8%        |
| vda                       | 480               | 2088.0                    | 243.1           | 1966.7             | 6%        | 222.6           | 8%        |
| Average                   |                   |                           |                 |                    | 10%       |                 | 12%       |

Table 1: Results of Technology Mapping for ISCAS85 and LGSYNTH benchmarks
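
The savings columns in Table 1 appear to be relative reductions with respect to the worst case NBTI synthesis result. A minimal sketch of that arithmetic follows, using three rows copied from the table; the function name `savings_pct` and the dictionary layout are illustrative and not taken from the paper.

```python
def savings_pct(worst_case: float, sp_based: float) -> float:
    """Relative reduction of the SP based value versus the worst case value, in percent."""
    return 100.0 * (worst_case - sp_based) / worst_case


# (worst case area, SP based area, worst case power, SP based power), from Table 1
rows = {
    "C432": (594.3, 548.4, 57.2, 52.7),
    "C1355": (1282.2, 1051.8, 122.0, 99.2),
    "C3540": (2569.8, 2057.4, 256.4, 206.2),
}

for name, (area_wc, area_sp, power_wc, power_sp) in rows.items():
    area_saving = round(savings_pct(area_wc, area_sp))
    power_saving = round(savings_pct(power_wc, power_sp))
    print(f"{name}: area {area_saving}%, power {power_saving}%")
```

For these three rows the rounded values agree with the percentages listed in the table.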



Figure 8: Temporal degradation of C432.

to an optimized circuit that minimizes the area and power overhead needed to ensure reliability for up to 10 years.
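
The failure criterion used above (a circuit fails once its degraded arrival time exceeds the target delay) can be illustrated with a rough sketch. The degradation law here is an assumption made only for illustration: delay is taken to grow as a power law in time with exponent 1/6, scaled so that the increase is 10% after 10 years (the approximate figure quoted in the conclusion). The function names, the exponent, and the sample fresh delays are hypothetical and are not the paper's model or data.

```python
TEN_YEARS_S = 10 * 365 * 24 * 3600  # about 3.15e8 seconds


def degraded_delay_ps(d0_ps, t_s, frac_at_10y=0.10, n=1 / 6):
    """Delay after t_s seconds under an ASSUMED power law d0 * (1 + k * t**n).

    The law is scaled so that the relative increase equals frac_at_10y at 10 years.
    """
    return d0_ps * (1.0 + frac_at_10y * (t_s / TEN_YEARS_S) ** n)


def time_to_failure_s(d0_ps, target_ps, frac_at_10y=0.10, n=1 / 6):
    """Time at which the degraded delay reaches target_ps (0.0 if it fails at t = 0)."""
    if d0_ps > target_ps:
        return 0.0
    margin = target_ps / d0_ps - 1.0  # relative slack available at t = 0
    return TEN_YEARS_S * (margin / frac_at_10y) ** (1.0 / n)


# Hypothetical fresh delays against the 790 ps target used for C432.
for d0 in (780.0, 718.0):
    years = time_to_failure_s(d0, 790.0) / (365 * 24 * 3600)
    print(f"fresh delay {d0} ps -> meets 790 ps for about {years:.3g} years")
```

Under this assumed law, a design with little initial slack violates timing very early in its life, while a design with roughly 10% slack lasts the full 10 years, which is the qualitative behavior described for the nominal and NBTI-aware circuits.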

#### 5. CONCLUSION

NBTI has become an important reliability concern in circuit design. Its deleterious effect on PMOS transistors causes circuit delays to worsen by about 10% after 10 years of operation, forcing designers either to relax the target operating frequency or to seek solutions that sustain performance. Our work proposes a method to perform technology mapping that takes into account the exact NBTI effect and its dependence on the amount of time for which each gate has been stressed and relaxed. The delay of each gate in the standard cell library is represented as a function of its input node signal probabilities, and this information is used to perform optimization during technology mapping. Circuits are thus synthesized to meet their performance target over the entire lifetime of around 10 years, despite NBTI-induced temporal degradation. The results of this SP based NBTI-aware synthesis scheme are compared with those of synthesis based on a worst case NBTI library, and the achievable area and power savings are reported. Our experimental results indicate that this method achieves average savings of 10% in area and around 12% in power.
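
As a small illustration of the signal probability (SP) quantities that the method relies on, the sketch below propagates SP through a 2-input NAND gate and derives the fraction of time a PMOS device is under NBTI stress. It assumes statistically independent inputs and a PMOS whose source is tied to $V_{dd}$, so that the device is stressed whenever its gate input is at logic 0; PMOS devices in series stacks need a more careful treatment. The function names are hypothetical, and this is not the paper's characterization procedure.

```python
def nand2_output_sp(p_a: float, p_b: float) -> float:
    """P(output = 1) of a 2-input NAND, assuming independent inputs."""
    return 1.0 - p_a * p_b


def pmos_stress_probability(p_input: float) -> float:
    """Fraction of time a PMOS with source at Vdd is stressed (gate input at logic 0)."""
    return 1.0 - p_input


# Two primary inputs with SP = 0.5 driving a NAND2 that feeds an inverter.
sp_nand_out = nand2_output_sp(0.5, 0.5)
print("NAND2 PMOS stress probability:", pmos_stress_probability(0.5))
print("NAND2 output SP:", sp_nand_out)
print("Inverter PMOS stress probability:", pmos_stress_probability(sp_nand_out))
```

In this example the inverter's PMOS is stressed only a quarter of the time, far less than the constant stress assumed by a worst case library.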

#### 6. REFERENCES

- REFERENCES
  S. V. Kumar, C. H. Kim, and S. S. Sapatnekar, "An Analytical Model for Negative Bias Temperature Instability (NBTI)," in Proceedings of the IEEE/ACM International Conference for Computer Aided Design, November 2006.
  M. A. Alam, "A Critical Examination of the Mechanics of Dynamic NBTI for pMOSFETs," in IEEE International Electronic Devices Meeting, pp. 14.4.1-14.4.4, December 2003.
  R. Vattikonda, W. Wang, and Y. Cao, "Modeling and Minimization of PMOS NBTI Effect for Robust Nanometer Design," in Proceedings of the IEEE/ACM Design Automation Conference, pp. 1047-1052, July 2006.
  S. Bhardwaj, W. Wang, R. Vattikonda, Y. Cao, and
  S. Vrudhula, "Predictive Modeling of the NBTI Effect for Reliable Design," in Proceedings of the Custom Integrated Circuits Conference, September 2006.
  M. A. Alam and S. Mahapatra, "A Comprehensive Model of [1]
- [3]
- M. A. Alam and S. Mahapatra, "A Comprehensive Model of PMOS NBTI Degradation," *Journal of Microelectronics Reliability*, vol. 45, pp. 71-81, August 2004. Available at www.sciencedirect.com. [5]
- M. A. Alam, "On the Reliability of Micro-electronic Devices: [6]
- M. A. Alam, "On the Rehability of Micro-electronic Devices: An Introductory Lecture on Negative Bias Temperature Instability," in Nanotechnology 501 Lecture Series, September 2005. Available at http://www.nanohub.org/resources/?id=193.
   S. Chakravarthi, A. T. Krishnan, V. Reddy, C. Machala, and S. Krishnan, "A Comprehensive Framework for Predictive Modeling of Negative Bias Temperature Instability," in Proceedings of the IEEE International Reliability Physics Summering and 272, 282. April 2004 Symposium, pp. 273-282, April 2004.
- [8] B. C. Paul, K. Kang, H. Kufluoglu, M. A. Alam, and K. Roy, "Impact of NBTI on the Temporal Performance Degradation of Digital Circuits," *IEEE Electron Device Letters*, vol. 26, pp. 560-562, August 2003.
- B. C. Paul, K. Kang, H. Kufluoglu, M. A. Alam, and K. Roy, "Temporal Performance Degradation under NBTI: Estimation and Design for Improved Reliability of Nanoscale Circuits," in
- [10]
- and Design for Improved Reliability of Nanoscale Circuits," in Proceedings of the Design Automation and Testing Europe, pp. 1-6, March 2006. "Predictive Technology Model." Device Group at Arizona State University, Available at http://www.eas.asu.edu/~ptm. E. M. Sentovich, K. J. Singh, L. Lavagno, C. Moon, R. Murgai, A. Saldanha, H. Savoj, P. R. Stephan, R. K. Brayton, and A. Sangiovanni-Vincentelli, "SIS: A System for Sequential Circuit Synthesis," Tech. Rep. UCB/ERL M92/41, University of California, Berkeley, 1992. Available at http://www-cad.eecs.berkelew.edu/research/sis. [11]
- [12] R. Burch, F. N. Najm, P. Yang, and T. N. Trick, "A Monte Carlo Approach for Power Estimation," *IEEE Transactions on VLSI Systems*, vol. 1, pp. 63-71, March 1993.