# An assigned probability technique to derive realistic worst-case timing models of digital standard cells

Alessandro Dal Fabbro, Bruno Franzini, Luigi Croce and Carlo Guardiani

SGS-THOMSON Microelectronics, v. C. Olivetti 2, 20041 Agrate Brianza (MI), ITALY

Abstract - The ability to determine the accurate worst-case timing performance of a library of standard cells is of great importance in a modern VLSI structured semicustom IC design flow. The margin for profitability is extremely tight because of the ever-increasing performance demand, which can hardly be satisfied by a corresponding progress of the process technology. It is therefore of utmost importance to avoid excessively pessimistic estimates of the actual cell performance in order to exploit the full potential of the fabrication process. This paper describes a technique that determines the worst-case points with an assigned probability value. It is thus possible to select the desired level of confidence for the worst-case evaluation of digital IC designs with good accuracy. The results of the Assigned Probability Technique (APT) are presented and compared with those obtained by standard methods, both at cell and at circuit level, showing the considerable benefits of the new method.

### I - INTRODUCTION

The current methodologies for extracting worst-case information from measured random fluctuations of the fabrication process lead to an overly pessimistic estimate of the circuit performance across the different possible levels of simulation. This paper presents a new methodology that can be used to find a set of process parameters giving the worst-case performance, in terms of propagation delay, for all the cells of a library. Moreover, the model parameter set derived with this methodology represents a more realistic worst case. A realistic worst case is one that has a finite, predefined probability of being realized in practice. In general, the existing techniques for deriving worst-case models cannot take into account the actual probability values in the circuit performance space but only in the process or device parameter space.

32nd ACM/IEEE Design Automation Conference ®

Permission to copy without fee all or part of this material is granted, provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. © 1995 ACM 0-89791-756-1/95/0006 \$3.50

In fact, the mapping from the device parameter space to the circuit performance space, which is achieved through the circuit topology, cannot be defined before the circuit is actually designed. Often the worst-case simulation models are derived without even considering the effective joint probability of the corresponding combination of device values, and only represent a heuristic mixture of corner points assembled to give the worst possible performance of the CMOS cell. In most cases this leads to a number of problems:

- excessive cell area
- errors in the identification of the actual critical path
- underestimated circuit potential

The methodology presented in this paper allows the desired probability value of the worst case to be chosen in advance <u>in the performance space</u> (e.g. the $4\sigma$ point of a Gaussian timing distribution, corresponding to a $1.6 \times 10^{-5}$ probability of realizing a circuit with larger delay). The corresponding set of process/device parameters achieving the desired performance is then computed accordingly. The worst-case probability can be chosen as a trade-off between the conflicting objectives of exploiting as much as possible of the available potential of the fabrication process and of achieving the maximum yield. By adopting the technique presented in this paper, the designer is provided with the proper tools to take this fundamental decision.

### **II - THE ASSIGNED PROBABILITY TECHNIQUE**

The APT flow is shown in figure 1. Six major steps can be identified:

1. The initial step consists in selecting a subset of the library cells that represent a reasonable sample of the whole library. If a target IC exists then the sample cells could be those actually used by the circuit being designed. Even though the whole library can be used it is possible to reduce considerably the computational effort by using just a small number of cells (about 10% of the cells in the library may be sufficient) that have been properly selected from the existing categories, e.g. combinational, sequential, etc.



2. A Response Surface Model (RSM, [1]-[2]) is built for all the timing parameters of the selected cells as a function of the process parameter set. The process parameter set can be described by a vector of random variables $\vec{p}$ characterized by a joint probability density function (jpdf) $f(\vec{p})$, which can be accurately derived from measurements. The timing vector $\vec{T}$ is a function of $\vec{p}$, i.e. $\vec{T} = T(\vec{p})$. The explicit form of $T(\vec{p})$ is generally unknown, but an accurate analytical approximation can be derived with the RSM technique.

3. The third APT step consists in estimating the statistical parameters of the jpdf of $\vec{T}$ by means of a Monte Carlo technique [3], [4].
4. The desired probability value of the worst-case points is selected. This value can be expressed as the probability of realizing a circuit timing performance which is worse than that achieved by the worst-case point:

$$\bar{p} = P\left[\vec{T} \ge \vec{T}^*\right] = \int_{\vec{T}^*}^{\infty} f\left(\vec{T}\right) d\vec{T}$$
(1)

In practice it can be defined indirectly by assigning $\vec{T}^*$, which can be expressed as its distance from the distribution mean $\mu$ in standard deviation units (i.e. $T^* = \mu + N\sigma$).

The corresponding $\bar{p}$-percentile of the random vector $\vec{p}$ can be found by numerical optimization of the RSM models [2], [5]-[8]. For each component of $\vec{T}$ a different $\bar{p}$-percentile value is found, which can be represented by a point in the process parameter hyperspace.

5. The fifth APT step consists in grouping the $\bar{p}$-percentile points according to their mutual Euclidean distance in the process parameter hyperspace. Only a limited number of physical effects tend to increase the propagation delay. It is therefore reasonable to suppose that the different propagation delays tend to be maximized by a limited number of worst-case points. The results presented in this paper confirm that it is possible to identify a small number of worst-case points, each corresponding to a particular physical effect.
6. Finally, a single process parameter vector is selected for each group. This can be done by finding the combination of process variables that achieves the worst performance relative to all the timing parameters in each group. This problem can be expressed as a multi-objective nonlinear optimization problem [5], [6], which can be easily solved by using a numerical minimization algorithm [8].
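The grouping of step 5 can be sketched as follows. This is only a minimal illustration: the paper does not state which clustering rule is used, so a simple single-linkage rule with a hypothetical distance threshold is assumed here, and the percentile points are made-up values expressed in standard deviation units.

```python
import math

def group_points(points, threshold):
    """Group points whose mutual Euclidean distance is below `threshold`.

    Single-linkage rule (an assumption, not the paper's stated method):
    a point joins a group if it is close to any member of that group.
    """
    groups = []
    for idx, point in enumerate(points):
        # Find every existing group that has a member close to this point.
        near = [g for g in groups
                if any(math.dist(point, points[j]) < threshold for j in g)]
        merged = [idx]
        for g in near:
            merged.extend(g)
            groups.remove(g)
        groups.append(sorted(merged))
    return sorted(groups)

# Hypothetical percentile points (two process parameters, sigma units).
percentile_points = [
    (2.9, -2.7), (3.1, -2.5), (2.8, -2.9),   # points sharing one effect
    (-2.6, 3.0), (-2.8, 2.7),                # points sharing another effect
]
groups = group_points(percentile_points, threshold=1.0)
print(groups)
```

Each inner list holds the indices of the delay paths whose percentile points fall in the same group; step 6 would then select one representative parameter vector per group.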

### III - DETERMINING THE WORST-CASE MODELS WITH APT: AN APPLICATION EXAMPLE

The APT has been applied to the determination of the worst-case models of a standard cell library in a $0.75\,\mu\mathrm{m}$ CMOS process. The library is composed of about 60 combinational cells and 40 sequential cells. A sample of 8 combinational cells (and, nand, or, nor, inverter, buffer, xor) and of two sequential cells (FFs) has been selected. The timing vector $\vec{T}$ is represented by pin-to-pin propagation delays and timing check (setup, hold) measurements for each cell. The relevant process parameter set ($\vec{p}$) used for this technology is shown in table 1. These parameters have been characterized as first-order independent random variables and their probability distribution function has been accurately estimated from in-line measurements. Another set of process parameters that track the independent ones has also been characterized (e.g. p-$\Delta l$ and p-$\Delta w$ tracking n-$\Delta l$ and n-$\Delta w$ respectively). An analytic approximation to the unknown function $\vec{T} = T(\vec{p})$, which relates the timing vector to the process parameter vector, is obtained using the RSM statistical design CAD system named PLUTO [2]. The basic steps to obtain an analytic approximation $\hat{T}$ to $T(\vec{p})$ consist in generating an experimental design [1] (a Central Composite Design, or CCD, with 81 samples has been used) in the geometrical $\vec{p}$ space, running an accurate electrical simulation with ELDO [9] at each of the CCD points, and finally finding the regression surface $\hat{T}$ by minimizing a suitable measure of the approximation error, e.g. the least squares.

| Parameter name | Description                      |
|----------------|----------------------------------|
| p_nab          | n-well doping                    |
| p_dsurf        | p-surface doping                 |
| n_dsurf        | n-surface doping                 |
| n_Δl           | drawn-effective n-channel length |
| n_Δw           | drawn-effective n-channel width  |
| n_uo           | zero-field electron mobility     |
| n_eox          | oxide thickness                  |

Table 1: The independent process parameters for the $0.75\,\mu\mathrm{m}$ technology described in the example
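The response surface construction described above (experimental design, simulation at each design point, least-squares regression) can be sketched in a few lines. This is a hedged illustration only: it builds a small full-factorial CCD for three parameters rather than the 81-sample design used in the paper, the function `toy_delay` is a hypothetical stand-in for an ELDO simulation, and all helper names are invented for the example. It does not reflect how PLUTO is implemented.

```python
import itertools

def central_composite_design(k, alpha=None):
    """Full-factorial CCD: 2**k corner points, 2*k axial points, 1 center."""
    alpha = alpha if alpha is not None else (2 ** k) ** 0.25  # rotatable choice
    corners = [list(c) for c in itertools.product((-1.0, 1.0), repeat=k)]
    axial = []
    for i in range(k):
        for sign in (-alpha, alpha):
            point = [0.0] * k
            point[i] = sign
            axial.append(point)
    return corners + axial + [[0.0] * k]

def quadratic_terms(p):
    """Regressors of a full second-order model: 1, p_i, p_i*p_j (i <= j)."""
    k = len(p)
    cross = [p[i] * p[j] for i in range(k) for j in range(i, k)]
    return [1.0] + list(p) + cross

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_rsm(points, responses):
    """Least-squares fit via the normal equations (X^T X) beta = X^T y."""
    x = [quadratic_terms(p) for p in points]
    n = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(n)] for i in range(n)]
    xty = [sum(row[i] * y for row, y in zip(x, responses)) for i in range(n)]
    return solve(xtx, xty)

def toy_delay(p):
    # Hypothetical "simulator": stands in for one electrical simulation run.
    return 1.0 + 0.20 * p[0] - 0.10 * p[1] + 0.05 * p[0] * p[1] + 0.02 * p[2] ** 2

design = central_composite_design(3)
beta = fit_rsm(design, [toy_delay(p) for p in design])
```

Because the toy response is itself quadratic, the fitted coefficients in `beta` reproduce it up to rounding; with a real simulator the residual of the fit would measure the approximation error mentioned in the text.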

The analytical approximation $\hat{T}$ thus obtained is used to characterize the mean $\mu$ and the variance $\sigma^2$ of the timing vector jpdf using a Monte Carlo simulation with $n = 10{,}000$ sample points. Assuming a Gaussian probability distribution function for the timing parameters, the true variance $\sigma^2$ can be estimated by the sample variance $s^2$ with a confidence interval given by:

$$\frac{(n-1)s^2}{\chi^2_{1-\frac{\delta}{2}}(n-1)} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{\frac{\delta}{2}}(n-1)}$$
(2)

where $\chi^2_u(n)$ is the $u$-percentile of a chi-square density with $n$ degrees of freedom and $1-\delta$ is the desired confidence level. Once the parameters of the jpdf of the vector of random variables $\hat{T}$ are known, it is possible to select the desired level of confidence of the worst-case point(s). Higher-probability worst-case points (i.e. closer to the mean of the timing jpdf) will generate faster timing models but also lower yields, i.e. a higher probability that the cell will actually be slower than the simulated value. The $4\sigma$-percentile point ($T_{4\sigma}$) in the timing parameter space has been chosen as the assigned probability value of the desired worst-case points. The problem of finding one or more feasible combinations of process parameters achieving the desired worst-case timing performance ($T_{4\sigma}$) can be described as a nonlinear programming optimization problem, i.e.:

$$\vec{p}_{4\sigma} = \min\left(\left|T(\vec{p}) - T_{4\sigma}\right|\right), \forall T$$
(3)

which has been solved by using a sequential quadratic programming algorithm [5], [8]. In general this operation generates as many different worst-case points as there are delay paths in the timing vector. In practice the different worst-case points obtained by solving (3) tend to gather in a limited number of groups. Each group can be associated with a particular physical effect causing the increase of the delay for that particular class of delay paths. In our example it was possible to identify two distinct groups of worst-case points (G1, G2 in table 2), as illustrated by the two-dimensional cross-section of the process parameter hyperspace shown in figure 2.
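The chain from Monte Carlo estimation through equations (2) and (3) can be illustrated with a small sketch. Several assumptions are made and should be read as such: `t_hat` is a hypothetical two-parameter response surface, the chi-square percentiles are computed with the Wilson-Hilferty approximation (adequate for large $n$) instead of exact tables, and the search for the $4\sigma$ point uses minimum-norm Newton steps from the nominal point as a simple stand-in for the sequential quadratic programming algorithm used in the paper.

```python
import math
import random
from statistics import NormalDist, fmean, variance

def t_hat(p):
    # Hypothetical response surface for one delay (ns); p in sigma units.
    return 1.0 + 0.08 * p[0] - 0.05 * p[1] + 0.004 * p[0] * p[1]

def grad_t_hat(p):
    # Analytical gradient of the toy response surface above.
    return [0.08 + 0.004 * p[1], -0.05 + 0.004 * p[0]]

def chi2_percentile(u, dof):
    """Wilson-Hilferty approximation to the u-percentile of chi-square(dof)."""
    z = NormalDist().inv_cdf(u)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * math.sqrt(c)) ** 3

# Monte Carlo on the response surface with independent normalized parameters.
rng = random.Random(0)
n = 10_000
samples = [t_hat([rng.gauss(0, 1), rng.gauss(0, 1)]) for _ in range(n)]
mu, s2 = fmean(samples), variance(samples)

# Equation (2): confidence interval for the true variance, 1 - delta = 95 %.
delta = 0.05
var_low = (n - 1) * s2 / chi2_percentile(1 - delta / 2, n - 1)
var_high = (n - 1) * s2 / chi2_percentile(delta / 2, n - 1)

# Target delay at the assigned probability (N = 4).
t_target = mu + 4.0 * math.sqrt(s2)

# Equation (3): drive |T(p) - T_4sigma| to zero, starting from nominal.
p = [0.0, 0.0]
for _ in range(50):
    residual = t_target - t_hat(p)
    if abs(residual) < 1e-12:
        break
    g = grad_t_hat(p)
    step = residual / sum(gi * gi for gi in g)
    p = [pi + step * gi for pi, gi in zip(p, g)]

print(f"mu={mu:.4f} s={math.sqrt(s2):.4f} T4s={t_target:.4f} p4s={p}")
```

Equation (3) has many solutions for a single delay, since a whole level surface satisfies $T(\vec{p}) = T_{4\sigma}$; starting from the nominal point and stepping along the gradient is one simple way to pick a point close to nominal, and is a choice of this sketch rather than of the paper.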

Fig. 2: Two-dimensional cross-section of the process parameter hyperspace showing the position of the two worst-case points W1 and W2 relative to the worst, typical and best case points derived with the classical methodology



The strong correlation shown by the scatter plot in figure 2 can be explained by analyzing the different delay paths in the two groups. In fact all the delay paths in the first group (G1) correspond to rising input transitions, while all the falling input delay paths are in the second group (G2). This is a clear indication that a modification of the threshold voltage of the basic NMOS and PMOS devices could be the physical effect responsible for the increase of the delays in G1 and G2 with respect to the typical case. This analysis is confirmed by a direct examination of the process parameters associated with G1 and G2 respectively. The data in table 3 show that G1 corresponds to an increase of Vtn and a decrease of Vtp. Vice versa, G2 is obtained in correspondence with a higher p-channel surface doping, leading to an increase in Vtp and a decrease of Vtn. This causes a shift of the thresholds Vil and Vih of a basic inverter: the delays of rising input events will increase at W1 and decrease at W2. Vice versa, the delays of falling input events will decrease at W1 and increase at W2.

Once the delay paths have been grouped, it is necessary to define a point in the $\vec{p}$ hyperspace which is representative of each group and that can be used as the actual worst-case point. This problem can again be expressed as a nonlinear programming problem with multiple objectives [5]-[6]:

$$W_{i} = \min\left(\left|T_{j}(\vec{p}) - T_{j4\sigma}\right|, \forall T_{j} \in G_{i}\right)$$
(4)

where $G_i$ and $W_i$ are respectively the $i$-th group and the worst-case point associated with it.
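Equation (4) does not say how the individual objectives of a group are combined into a single cost. One common scalarization, given here purely as an illustrative assumption and not as the formulation used by the authors, is a least-squares sum over the delays of the group:

$$W_{i} = \arg\min_{\vec{p}} \sum_{T_{j} \in G_{i}} \left(T_{j}(\vec{p}) - T_{j4\sigma}\right)^{2}$$

A min-max form, which minimizes the largest deviation $\left|T_{j}(\vec{p}) - T_{j4\sigma}\right|$ over the group, would be an equally plausible alternative; either can be handled by the general optimization methods of [5], [6].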

Comparing the parameters of W1 and W2 with those of the classical worst-case point (WORST), it is evident that the latter is obtained by combining both high p-channel and high n-channel surface doping. This combination of process parameters is however extremely unlikely, its probability corresponding to that of having a delay greater than $\mu + 8\sigma$, because of the intrinsic correlation of the surface doping in the p-channel and in the n-channel that is typical of the particular fabrication process used. The trade-off between manufacturability margins and performance realized at this point is very unbalanced in favor of the former and will therefore excessively penalize the maximum clocking speed that can be safely adopted with this technology.

TABLE 2: PROPAGATION EVENTS ARE GROUPED ACCORDING TO THEIR MUTUAL EUCLIDEAN DISTANCE IN THE PROCESS PARAMETERS SPACE. THE LETTERS R AND F DENOTE RISING AND FALLING OUTPUTS RESPECTIVELY.

| G1          | G2         |
|-------------|------------|
| AN2_A_Z_R   | AN2_A_Z_F  |
| AN2_B_Z_R   | AN2_B_Z_F  |
| AO1_A_Z_F   | AO1_A_Z_R  |
| AO1_B_Z_F   | AO1_B_Z_R  |
| AO1_C_Z_F   | AO1_C_Z_R  |
| AO1_D_Z_F   | AO1_D_Z_R  |
| BTREE_A_Z_R | EO_A_Z_F_1 |
| EO_A_Z_F_0  | EO_A_Z_R_1 |
| EO_B_Z_F_0  | EO_B_Z_R_1 |
| IV_A_Z_F    | IV_A_Z_R   |
| ND2_A_Z_F   | ND2_A_Z_R  |
| ND2_B_Z_F   | ND2_B_Z_R  |
| NR2_A_Z_F   | NR2_A_Z_R  |
| NR2_B_Z_F   | NR2_B_Z_R  |
| OR2_A_Z_R   | OR2_A_Z_F  |
| OR2_B_Z_R   | OR2_B_Z_F  |

TABLE 3: WORST-CASE MODELS, STANDARD TECHNIQUE (WORST) AND APT (W1,W2)

|                            | Worst  | W1       | W2       |
|----------------------------|--------|----------|----------|
|                            | -400 % | -379.6 % | -388.1 % |
|                            | 31.6 % | 29.5 %   | -29.6 %  |
|                            | 57.5 % | 42 %     | -51.8 %  |
|                            | 8.3 %  | 7.7 %    | 7.9 %    |
|                            | -7.5 % | -6.8 %   | -2.5 %   |
|                            | 91 %   | -80.1 %  | 89.9 %   |
|                            | 4.33 % | 0.7 %    | 2 %      |
| MOS Threshold Voltage      |        |          |          |
|                            | 36 %   | -28.2 %  | 33 %     |
|                            | 20.7 % | 13.3 %   | -6.8 %   |
| Inverter Threshold Voltage |        |          |          |
| Vil                        |        |          | -1.6 %   |
|                            |        | 6.1 %    | -2.9 %   |

Moreover it is important to remark that, because of the existence of two distinct worst-case points for rising and falling transitions, the different delay paths in a circuit will be affected differently depending on the number of rising and falling transitions along the path. Therefore it is reasonable to expect that, not only the value of the longest sensitizable path (critical path) will be affected but also its topological definition. The results obtained have been verified using two different circuits taken from the ISCAS benchmarks [7] (C6288, C7552) and a 16-bit adder. The data in table 4 show the percent variation of the critical path delay with respect to typical conditions in the new worst-case points. The critical path has been extracted in the nominal case using a static timing analyzer with false path elimination [10] and simulated with ELDO [9] at transistor level. The results of electrical simulation show a considerable decrease of the critical path delay both in W1 and in W2 with respect to the previous worst-case point.

TABLE 4: PERCENT VARIATION OF THE CRITICAL PATH DELAY WITH RESPECT TO THE NOMINAL CASE

| Circuit | No. of Gates | WORST | W1   | W2   |
|---------|--------------|-------|------|------|
| C7552   | ~ 7000       | +52%  | +9%  | +15% |
| C6288   | ~ 3000       | +41%  | +14% | +10% |
| Add16   | ~ 100        | +51%  | +9%  | +16% |
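One way to read Table 4 is to compare the critical path delay predicted by the classical worst-case model with the slower of the two APT predictions. The short calculation below does this with the table's own figures; taking the maximum of W1 and W2 as the APT worst case for a circuit, and the name `pessimism_ratio`, are assumptions of this sketch rather than definitions from the paper.

```python
# Percent increase of the critical path delay over nominal, from Table 4.
table4 = {
    "C7552": {"WORST": 52, "W1": 9, "W2": 15},
    "C6288": {"WORST": 41, "W1": 14, "W2": 10},
    "Add16": {"WORST": 51, "W1": 9, "W2": 16},
}

def pessimism_ratio(row):
    """Classical worst-case delay divided by the slower of the two APT delays."""
    apt = max(row["W1"], row["W2"])
    return (100 + row["WORST"]) / (100 + apt)

ratios = {name: round(pessimism_ratio(row), 3) for name, row in table4.items()}
print(ratios)
```

A ratio of about 1.3 means that the classical model predicts a critical path roughly 30% slower than the APT worst case for the same circuit.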

### **IV - CONCLUSIONS AND FUTURE WORKS**

A new methodology to derive the worst-case process conditions for the characterization of standard cell libraries has been presented. The most important feature of the proposed methodology is that it is possible to assign the desired worst-case probability value in the target (timing) space. Another important characteristic of the method is that it preserves the parameter correlations, thus ensuring the physical realizability of the worst-case conditions. The technique has been applied to derive the worst-case models of a standard cell library in a $0.75\,\mu\mathrm{m}$ CMOS technology. The experimental results presented in this paper confirm the considerable practical improvement of the described methodology with respect to classical worst-case techniques.

We are currently focusing our research on extending the current methodology to derive realistic best-case models and realistic power and combined timing-power worst-case models. At the same time, the possibilities of exploiting circuit-level correlations (versus cell-level) and of dropping the Gaussian hypothesis are being explored to further improve the accuracy of the APT-generated worst-case models.

### ACKNOWLEDGEMENTS

The authors wish to thank M. Sivaraman and Professor A. Strojwas of Carnegie-Mellon University and J. Benkoski of EurEpic for their valuable suggestions.

#### REFERENCES

- [1] G.E.P. Box, N.R. Draper, *Empirical model building and response surfaces*, second edition, New York, John Wiley & Sons, 1987
- [2] C. Guardiani, M. Amadori, "PLUTO: an RSM analog circuits optimizer," in *Proc. EDAC* (Amsterdam), Feb. 1991, p. 256
- [3] A. Papoulis, *Probability, Random Variables and Stochastic Processes*, third edition, New York, McGraw-Hill, 1991
- [4] J.M. Hammersley and D.C. Handscomb, *Monte Carlo Methods*, London, Methuen, 1964
- [5] D.G. Luenberger, *Linear and Nonlinear programming*, Reading (MA), Addison-Wesley, 1984
- [6] R.K.Brayton, G.D.Hachtel and A.L. Sangiovanni-Vincentelli, "A survey of optimization techniques for integrated-circuits design", *IEEE Proc.*, vol. 69, pp. 1334-1362, Oct. 1981
- [7] ISCAS-85 Benchmarks, "Special Session: Recent Algorithms for Gate Level ATPG with Fault Simulation and Their Performance Assessment", presented at *IEEE Int. Symp. Circuits and Systems*, June 1985
- [8] R.E. Massara, Optimization methods in electronic circuit design, Harlow, Longman Scientific & Technical, 1991
- [9] ELDO user manual, v.4.2, Ulm, Anacad EES, 1993
- [10] J.Benkoski et al, "Timing verification using statically sensitizable paths," *IEEE Trans. on CAD*, vol. 9, No. 10, Oct. 1990