# A 0.2-$\mu$m, 1.8-V, SOI, 550-MHz, 64-b PowerPC Microprocessor with Copper Interconnects

Anthony G. Aipperspach, David H. Allen, Dennis T. Cox, Nghia V. Phan, and Salvatore N. Storino

Abstract— A 550-MHz 64-b PowerPC processor in 0.2-$\mu$m silicon-on-insulator (SOI) copper technology achieves a 22% frequency gain over a similar design in a bulk CMOS technology. Performance gains are 15%–40% at the circuit level and 24%–28% for critical paths. Unique SOI design aspects such as the history effect, lowered noise margins, parasitic bipolar current, and self-heating are considered.

*Index Terms*— CMOS memory integrated circuits, integrated circuit design, integrated circuit noise, microprocessors, silicon-on-insulator (SOI) technology.

## I. INTRODUCTION

The basis for this processor design was a 350-MHz, 64-b PowerPC processor fabricated in a 2.5-V, 0.35-$\mu$m bulk CMOS technology with five layers of aluminum wiring [1]. This design was migrated to a 0.22-$\mu$m bulk CMOS technology with copper interconnects and multithreshold transistors [2]. In addition to the technology migration, some architectural changes were made to increase the number of instructions executed per cycle. The instruction and data caches were each doubled to 128 kbyte, and a 256-kbyte L2 directory was added. These and other changes increased the transistor count from 12 to 34 million devices. The design was further migrated to a 0.22-$\mu$m SOI process which, like the bulk copper technology, uses six layers of copper wiring [3]. Table I details the physical, electrical, and technology attributes of the original design and of the two-step migration that led to the silicon-on-insulator (SOI) processor described in this paper. Note the progression in core clock frequency from 350 to 450 MHz in bulk technologies and then to 550 MHz in SOI. Reducing the supply voltage from 2.5 to 1.8 V lowered power despite the nearly 30% increase in clock frequency. The 22% increase in clock frequency in the SOI design cost less than a 10% increase in power at the same supply voltage. The technology shrink also reduced the die size despite the larger L1 caches and the addition of the L2 directory.
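The percentages quoted above follow directly from Table I. As a quick arithmetic check, the Python sketch below recomputes them; the dictionary and function names are illustrative only and are not part of any design tool.

```python
# Values transcribed from Table I (CMOS6S2 -> CMOS7S -> CMOS7S SOI).
freq_mhz = {"CMOS6S2": 350, "CMOS7S": 450, "CMOS7S SOI": 550}
power_w = {"CMOS6S2": 34, "CMOS7S": 22, "CMOS7S SOI": 24}
die_mm2 = {"CMOS6S2": 162, "CMOS7S": 139, "CMOS7S SOI": 139}


def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new (positive = increase)."""
    return 100.0 * (new - old) / old


bulk_freq_gain = pct_change(freq_mhz["CMOS6S2"], freq_mhz["CMOS7S"])
bulk_power_change = pct_change(power_w["CMOS6S2"], power_w["CMOS7S"])
soi_freq_gain = pct_change(freq_mhz["CMOS7S"], freq_mhz["CMOS7S SOI"])
soi_power_gain = pct_change(power_w["CMOS7S"], power_w["CMOS7S SOI"])
die_change = pct_change(die_mm2["CMOS6S2"], die_mm2["CMOS7S"])

print(f"350 -> 450 MHz: {bulk_freq_gain:+.1f}% frequency, {bulk_power_change:+.1f}% power")
print(f"450 -> 550 MHz: {soi_freq_gain:+.1f}% frequency, {soi_power_gain:+.1f}% power")
print(f"Die size after the shrink: {die_change:+.1f}%")
```

The bulk step shows a frequency gain of roughly 29% together with a power reduction, while the SOI step trades a 22% frequency gain for a power increase of under 10%.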

## II. FREQUENCY IMPROVEMENT IN SOI

Fig. 1 illustrates the improvements in cycle time starting with the original design and progressing through the two-stage conversion that led to the SOI microprocessor described in this paper. The original design, shown for reference, was fabricated in CMOS6S2, a bulk technology using aluminum wiring. The first conversion was a shrink to CMOS7S, a copper metallization technology. This resulted in a 29% improvement in core clock frequency, achieving 450 MHz in nominal hardware. A further conversion to CMOS7S SOI technology produced hardware capable of running at 550 MHz, a further 22% frequency gain.

Manuscript received March 21, 1999; revised June 4, 1999.

The authors are with IBM Corp., Rochester, MN 55901 USA.

TABLE I

Physical, Electrical, and Technology Comparison

| Attribute               | CMOS6S2           | CMOS7S            | CMOS7S SOI        |
|-------------------------|-------------------|-------------------|-------------------|
| Core clock frequency    | 350 MHz           | 450 MHz           | 550 MHz           |
| Supply voltage          | 2.5 V             | 1.8 V             | 1.8 V             |
| Power                   | 34 W              | 22 W              | 24 W              |
| Transistors             | 12M               | 34M               | 34M               |
| Die size                | 162 mm<sup>2</sup> | 139 mm<sup>2</sup> | 139 mm<sup>2</sup> |
| Poly pitch/2            | 0.35 μm           | 0.22 μm           | 0.22 μm           |
| L<sub>eff</sub> (NFET)  | 0.18 μm           | 0.12 μm           | 0.12 μm           |
| T<sub>ox</sub>          | 5.0 nm            | 3.5 nm            | 3.5 nm            |
| Metallization           | 5 layers Al       | 6 layers Cu       | 6 layers Cu       |
| Contacted M2–M4 pitch   | 1.26 μm           | 0.81 μm           | 0.81 μm           |



Fig. 1. Hardware frequency measurements.

### A. Circuit Delay Reduction in SOI

The improvement in operating frequency in SOI is a direct result of reduced delay through the circuits in frequency-limiting critical paths. The delay improvement of circuits is characteristic of SOI [4] and depends on the topology of the circuit. Generally, the more topologically complex the circuit, the more leverage SOI provides. This is especially true of circuits using stacked transistors, since the body voltage is rarely negative with respect to the source, so the upper devices in a stack avoid the threshold increase that reverse body bias causes in bulk. Fig. 2 illustrates the performance improvement of SOI over bulk for three types of circuits: static, dynamic, and arrays. Simpler circuits such as inverters and two-input NAND gates realized a gain of 15–20%. More complex circuits with higher stacks, such as four-input NAND gates and AND-OR-INVERT circuits, improved 25–40%. Dynamic circuits, which tend to be dominated by the delay through NFET stacks, improved 15–25%. The improvement in the access time of an SRAM, shown here as 20%, is highly dependent on the organization of the array and on which parasitics dominate the delay.

Publisher Item Identifier S 0018-9200(99)08347-X.



Fig. 2. Circuit delay improvements in SOI: CMOS7S bulk versus CMOS7S SOI.

### B. Path Delay Improvement in SOI

Because the improvement of individual circuits in SOI depends on their topologies, the improvement in overall cycle time provided by SOI is the net effect of the improvements of all the individual circuits composing the frequency-limiting critical paths. Table II describes six paths composed of a variety of circuit types, one path per row. The first two columns give the path and its dominant circuit type. For reference, the third column gives the delay through the path in the original CMOS6S2 design. The fourth and fifth columns (CMOS7S) give the delay through each path and its improvement from the migration to a bulk technology with copper wiring. The last two columns give the delay and the improvement of each path from the migration to SOI. The migration to SOI improved these six paths by 15%–21% over the same paths in bulk. Fig. 3 illustrates the topology of one of the paths in Table II, the data cache address generation and setup. This path combines elements of static, domino, and array circuits, as well as significant wire loading, and improved 21% in SOI over the same path in a bulk technology.
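The improvement columns of Table II can be reproduced from its delay columns. The Python sketch below (names are illustrative) computes each path's percentage delay change for the bulk copper migration and for the SOI migration, rounded to whole percent as in the table.

```python
# Path delays (ns) transcribed from Table II: (CMOS6S2, CMOS7S bulk, CMOS7S SOI).
paths = {
    "Rotator": (1.37, 1.04, 0.83),
    "Adder": (1.13, 0.84, 0.66),
    "Branch predict": (3.3, 2.1, 1.7),
    "ICache read": (2.0, 1.3, 1.1),
    "DCache address": (1.7, 1.4, 1.1),
    "FXR read": (0.95, 0.57, 0.48),
}


def delay_change(old: float, new: float) -> int:
    """Delay change in percent, rounded to an integer (negative = faster)."""
    return round(100.0 * (new - old) / old)


soi_gains = {}
for name, (d6s2, d7s, d7s_soi) in paths.items():
    bulk = delay_change(d6s2, d7s)    # CMOS6S2 -> CMOS7S bulk
    soi = delay_change(d7s, d7s_soi)  # CMOS7S bulk -> CMOS7S SOI
    soi_gains[name] = soi
    print(f"{name:15s} bulk {bulk:+d}%  SOI {soi:+d}%")

print("SOI range:", min(soi_gains.values()), "to", max(soi_gains.values()), "%")
```

Note that the two migrations are measured against different baselines, so the SOI column reflects only the bulk-to-SOI step.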

## III. CIRCUIT DESIGN CHALLENGES IN SOI

Thus far, this paper has dealt with the improvement in cycle time achieved by mapping a bulk design to an SOI technology. However, SOI also introduces a number of concerns that are not present in bulk CMOS. Most of these concerns are due to uncertainty in the potential of the FET body.

Fig. 3. DCache address path.

TABLE II

Path Delay Improvement in SOI

| Path           | Circuit type     | CMOS6S2 delay | CMOS7S delay | CMOS7S change vs. CMOS6S2 | CMOS7S SOI delay | SOI change vs. CMOS7S |
|----------------|------------------|---------------|--------------|---------------------------|------------------|-----------------------|
| Rotator        | dynamic          | 1.37 ns       | 1.04 ns      | -24%                      | 0.83 ns          | -20%                  |
| Adder          | dynamic          | 1.13 ns       | 0.84 ns      | -26%                      | 0.66 ns          | -21%                  |
| Branch predict | static           | 3.3 ns        | 2.1 ns       | -36%                      | 1.7 ns           | -19%                  |
| ICache read    | array            | 2.0 ns        | 1.3 ns       | -35%                      | 1.1 ns           | -15%                  |
| DCache address | static + dynamic | 1.7 ns        | 1.4 ns       | -18%                      | 1.1 ns           | -21%                  |
| FXR read       | array            | 0.95 ns       | 0.57 ns      | -40%                      | 0.48 ns          | -16%                  |

### A. History Effect on Timing

The potential of the body with respect to ground is a function of many factors, including the circuit topology and switching history. A consequence of this "history effect" is that the delay through a particular circuit or path cannot be predicted without full knowledge of the prior states and transitions of the circuit, information that static timing tools do not possess. The history effect on delay is highly dependent on circuit topology, environment, and other factors. For a gate-level circuit, the history effect can make the delay up to about 8% faster or slower than the delay when the circuit is switched frequently. Therefore, a long path composed of many different circuit elements with varying susceptibility to the history effect will rarely accumulate more than a few percent difference in delay, because the per-gate shifts tend to partially cancel. Short paths, however, can be more of a concern.
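The argument that long paths average out the history effect can be illustrated with a small Monte Carlo sketch. It assumes, purely for illustration, that every gate in a path has the same nominal delay and that each gate's history-induced shift is independent and uniformly distributed within the ±8% bound quoted above. Real circuits are correlated and topology dependent, so the printed numbers are not predictions for this design.

```python
import random

MAX_SHIFT = 0.08  # per-gate history shift bound (+/-8%) quoted in the text


def path_shift(n_gates: int, rng: random.Random) -> float:
    """Fractional delay shift of an n-gate path with equal nominal gate delays.

    Assumption (illustrative only): each gate's shift is independent and
    uniform in [-MAX_SHIFT, +MAX_SHIFT].
    """
    shifts = [rng.uniform(-MAX_SHIFT, MAX_SHIFT) for _ in range(n_gates)]
    return sum(shifts) / n_gates


def shift_percentile(n_gates: int, trials: int = 20000, q: float = 0.99, seed: int = 1) -> float:
    """q-th percentile of |path shift| over random trials."""
    rng = random.Random(seed)
    samples = sorted(abs(path_shift(n_gates, rng)) for _ in range(trials))
    return samples[int(q * (trials - 1))]


for n in (1, 4, 16):
    print(f"{n:2d} gates: 99th-percentile |shift| = {100 * shift_percentile(n):.1f}%")
```

Under these assumptions the spread shrinks roughly as one over the square root of the gate count, which is why short paths, with few gates to average over, remain the main concern.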

To account for the SOI history effect in static timing, we added a small margin to our latch hold tests to protect against early-mode problems. For late-mode timing, we assume a history that does not start the circuits in a long path at either the fastest or the slowest possible body voltages. To verify this assumption, test patterns designed to expose history effects were run on the hardware. Within the limits of our test setup and procedures, history effects produced no variability in cycle time, and no fast-path failures were observed.

Fig. 4. History effect on OR-AND-INVERT (OAI) gate.



Fig. 5. Dynamic circuit design techniques in SOI.

Fig. 4 illustrates the range of delays predicted for a simple gate depending on its input switching history. Forcing the body voltages to the highest or lowest possible values results in delays somewhat outside the range observed when a random set of patterns is used to precondition the body voltages. To avoid being too optimistic or pessimistic in long-path timing, timing models for individual circuits were generated assuming a history that yields a delay slightly slower than the average delay observed with random patterns. Because fast-path (early-mode) failures are catastrophic, since unlike late-mode failures they cannot be avoided by lowering the clock frequency, the most pessimistic, or fastest, history was assumed for the timing models generated for early-mode timing. As mentioned earlier, an additional safety margin was added, as is usually done to guard against other delay variables such as on-chip process variation and power-supply noise.
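The model-selection policy described above can be summarized in a few lines. In the sketch below, the delay values, the 1% late-mode guard band, and the 5% early-mode margin are hypothetical placeholders; the paper does not state the actual values used.

```python
# Hypothetical delays (ps) of one gate under different switching histories.
random_pattern_delays = [101.0, 98.5, 103.2, 99.8, 100.4, 97.9, 102.1]
forced_fast_delay = 94.0   # bodies forced to the fastest possible potential
forced_slow_delay = 106.5  # bodies forced to the slowest possible potential

LATE_GUARD = 0.01    # assumption: late-mode model 1% slower than the random-pattern mean
EARLY_MARGIN = 0.05  # assumption: extra margin for process variation and supply noise


def late_mode_delay(samples):
    """Late mode: slightly slower than the average delay seen with random preconditioning."""
    return sum(samples) / len(samples) * (1 + LATE_GUARD)


def early_mode_delay(fastest):
    """Early mode: the fastest (most pessimistic for hold) history, minus a safety margin."""
    return fastest * (1 - EARLY_MARGIN)


late = late_mode_delay(random_pattern_delays)
early = early_mode_delay(forced_fast_delay)
assert late < forced_slow_delay  # late mode deliberately avoids the forced-slow extreme
print(f"late-mode model: {late:.1f} ps, early-mode model: {early:.1f} ps")
```

The asymmetry is intentional: the late-mode model aims at a realistic average for long paths, while the early-mode model takes the fastest history and then subtracts further margin.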

### B. Parasitic Bipolar Current

Dynamic circuits, and some static circuit families, are at risk from the parasitic bipolar effect, in which current flows from the drain to the source of an otherwise OFF FET through the parallel parasitic lateral bipolar transistor formed by its source, body, and drain. This is generally not a serious problem in fully restoring static circuits. Circuit topologies that dot together very wide parallel devices, such as wide muxes and OR gates, can see significant parasitic current, which may affect delay. Dynamic circuits and arrays are at the greatest risk of functional failure from parasitic bipolar current and may require design changes.

### C. Lowered Noise Margin in Dynamic Circuits

The floating body in SOI transistors leads to uncertainty in threshold voltages, which in turn lowers the noise margins of dynamic circuits. A lower noise margin, coupled with other mechanisms that erode stored charge in dynamic circuits, requires additional attention in dynamic circuit design and may require some redesign of circuits originally intended for a bulk technology. Several design techniques were used to improve the noise immunity of dynamic circuits in SOI while minimizing any impact on their delay (see Fig. 5). These techniques include setting up inputs during the precharge phase, cross-connecting inputs to stacked devices, predischarging intermediate nodes, reordering the pulldown stack, and remapping logic.

Predischarging is a simple technique that discharges intermediate nodes so that the bodies of transistors high in a stack cannot charge up. This prevents both parasitic bipolar current and the sensitivity to input noise caused by the threshold-lowering effect of a positive body voltage. Note that this assumes that the inputs gating devices in the stack above the predischarged node are low during precharge (for instance, because they come from the outputs of other domino circuits). If the predischarge device is a PFET, the same clock that gates the precharge device can control the predischarge device. Provided that the input transitions are reasonably fast, an input can be used to gate a PFET predischarge device, but device-level simulations should be used to ensure that crowbar current through the predischarge device does not inadvertently prevent precharge of the circuit. While predischarging is not advisable in a bulk design because of charge sharing, it can be exploited in SOI because of the reduced diffusion capacitances.

Reordering the NFET logic tree of a dynamic circuit to place the widest parallel group of transistors at the bottom of the stack prevents their bodies from charging high and thereby reduces parasitic bipolar current. This topology would be nonoptimal in bulk but is possible in SOI because charge sharing is less of a concern. Likewise, cross-connecting the inputs of multiple-fingered stacked transistors can reduce the opportunity for bipolar current by 50%: if the signal driving the gate of the upper transistor in one stack drives the lower transistor in the other stack, at most one-half of the transistor bodies can charge high and enable parasitic bipolar current.

Remapping is the technique of moving Boolean inputs forward or backward in a cone of logic to reduce the parallel stacks that can cause parasitic bipolar current. In some cases, inputs can be remapped to static circuits, eliminating dynamic circuit concerns entirely. Complex domino structures replace the output inverter of a domino gate with a static NAND or NOR gate, allowing multiple precharged signals to be logically combined and buffered. This circuit style can be used to break up large parallel NFET logic trees that would otherwise induce intolerable parasitic bipolar currents. The reduced loading on the precharge nodes and the improved performance of stacked structures make this technique especially effective in SOI.
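Why cross-connecting the inputs of a two-finger stack halves the exposure can be checked with a simple switch-level model. The Python sketch below is a simplification under stated assumptions: an intermediate stack node counts as charged high (a proxy for the upper device's body charging high) when the upper device is ON and the lower device is OFF, as grounded when the lower device is ON, and as holding its previous value when both are OFF; the dynamic node is assumed to stay precharged. The function names are hypothetical. It searches all input sequences of a fixed length for the largest number of simultaneously charged nodes.

```python
from itertools import product


def step(nodes, stacks, inputs):
    """Update each stack's intermediate node for one input vector (switch-level model)."""
    new = []
    for state, (top, bottom) in zip(nodes, stacks):
        if inputs[bottom]:
            new.append(0)      # lower device ON: node tied to ground
        elif inputs[top]:
            new.append(1)      # upper ON, lower OFF: node charged from the precharged node
        else:
            new.append(state)  # both OFF: node floats and keeps its previous value
    return new


def worst_case_high(stacks, depth=4):
    """Max number of intermediate nodes charged high over all input sequences of length depth."""
    vectors = [dict(A=a, B=b) for a, b in product((0, 1), repeat=2)]
    worst = 0
    for seq in product(vectors, repeat=depth):
        nodes = [0] * len(stacks)  # start with all intermediate nodes discharged
        for vec in seq:
            nodes = step(nodes, stacks, vec)
            worst = max(worst, sum(nodes))
    return worst


# Each stack is (upper input, lower input).
uncrossed = [("A", "B"), ("A", "B")]
crossed = [("A", "B"), ("B", "A")]
print(worst_case_high(uncrossed), worst_case_high(crossed))
```

In the uncrossed pair, the vector A = 1, B = 0 charges both intermediate nodes at once; after cross-connection, any vector that charges one stack's node grounds the other's, so at most one of the two is ever high.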

If the worst-case environmental conditions for dynamic circuit noise immunity are known in advance and limited to specific situations (for example, test and reliability stress screens), a logical mode can be added to the design that causes at-risk circuits to trade circuit delay for noise immunity. In dynamic circuits, additional feedback was provided during periods of extreme temperature and voltage through a half-latch conditionally gated by a test signal. The delay added by the additional feedback is inconsequential during stress testing, and the impact on delay during normal operation is insignificant.



Fig. 6. Self-heating in a nonfooted domino gate.

### D. Self-Heating

Device self-heating can occur because the buried oxide layer has a high thermal resistance, which impedes heat flow from the device to the substrate. Devices of concern are those that are in a high-current state for a significant portion of the clock cycle, such as some off-chip drivers. In typical CMOS circuits, where transistors sink or source current for only a small fraction of the clock cycle, self-heating will generally not be significant. Some special cases, such as off-chip drivers driving heavy loads and domino circuits in precharge contention, merit additional attention. An example, shown in Fig. 6, is a simple domino circuit that does not contain a foot device in the pulldown stack. When the clock goes low to begin the precharge phase, current flows through P0, N0, and N1 until either A or B goes low. If this does not happen until late in the precharge phase, the devices heat slightly and are slower during subsequent evaluate and precharge phases. However, modeling shows that even significant contention current, repeated every cycle for a long period, raises the temperature by only a few degrees and has a small impact on delay. The effect may be more significant in circuits that sink or source high currents for a larger fraction of the cycle; for example, a heavily loaded off-chip driver with a high switching factor could experience a temperature increase of 10–15 °C.
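A first-order estimate of the temperature rise treats the device as a thermal resistance to the substrate driven by its average dissipated power. The sketch below assumes the thermal time constant is long compared with the clock period, so the rise scales with the fraction of the cycle during which the device conducts. The thermal resistance, voltage, and current values are hypothetical placeholders, not data from this design.

```python
R_TH_K_PER_W = 2.0e4  # hypothetical device-to-substrate thermal resistance through the buried oxide (K/W)


def temperature_rise(v_ds, i_on, duty, r_th=R_TH_K_PER_W):
    """First-order self-heating estimate: delta_T = R_th * V_DS * I_ON * duty.

    v_ds: voltage across the device while conducting (V)
    i_on: current while conducting (A)
    duty: fraction of each clock cycle the device conducts
    Assumes the thermal time constant is much longer than the clock period,
    so the temperature follows the average power.
    """
    return r_th * v_ds * i_on * duty


# Hypothetical cases: brief precharge contention vs. a device conducting most of the cycle.
cases = [("contention for 10% of the cycle", 0.10), ("conducting for 60% of the cycle", 0.60)]
for label, duty in cases:
    rise = temperature_rise(v_ds=1.8, i_on=0.5e-3, duty=duty)
    print(f"{label}: about {rise:.1f} K")
```

The linear dependence on duty factor is the point of the sketch: circuits that conduct only briefly each cycle stay cool, while heavily loaded drivers that conduct for much of the cycle deserve closer thermal modeling.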

## IV. ARRAY DESIGN CHALLENGES IN SOI

Array design in SOI presents many of the same challenges as does dynamic circuit design, such as the effect of switching history on timing chains. Other effects unique to array design include parasitic bipolar current during the write operation, bitline leakage during the read operation, and sense amplifier drift.

### A. WRITE Cycle

A major concern for the array designer in an SOI technology is the parasitic bipolar effect. Because arrays consist of many cells dotted together, an access to one cell can induce bipolar current from the other cells. If the array has been sitting in the precharge condition for a long time and all the cells store the same data, the bodies of the transfer devices on the "1" side of the cells will leak up to a $V_{DD}$ level. If the initial access writes one of these cells to the opposite value, the sources of the nonselected transfer devices, which are tied to the falling bitline, drop below their body voltage, causing the parasitic bipolar devices to conduct. This current is relatively small in each cell, but summed over all the cells it becomes significant. The waveforms in Fig. 7 illustrate the additional time required to write the cell because of this additional current. On subsequent cycles, the bipolar current decreases very quickly: the forward-biased body-source junctions discharge the bodies and collapse the body-to-source voltage difference, so only the first couple of write operations show this effect. The designer must ensure that the write driver devices are large enough, and the pulses of the selecting signals wide enough, to maintain functionality when this occurs.



Fig. 7. Effect of bipolar current on SRAM write-cycle timing. (a) Bipolar current during write cycles. (b) Increase in cell write time due to bipolar current.
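The extra write time can be approximated with a constant-current model in which the summed parasitic bipolar current of the nonselected cells opposes the write driver on the falling bitline. In the Python sketch below, the function name and all numerical values are hypothetical; the point is the dependence on driver strength, which is why the write drivers must be sized for the first-write case.

```python
import math


def bitline_fall_time(c_bl, dv, i_drive, n_unsel=0, i_bip=0.0):
    """Time for the write driver to pull a bitline down by dv (constant-current model).

    The parasitic bipolar current of n_unsel nonselected cells (i_bip each) flows
    into the falling bitline and opposes the driver. Returns math.inf when the
    driver cannot overcome it, i.e., the write would fail.
    """
    i_net = i_drive - n_unsel * i_bip
    if i_net <= 0:
        return math.inf
    return c_bl * dv / i_net


# Hypothetical values: 200 fF bitline, 1.8 V swing, 1 mA driver, 255 other cells at 1 uA each.
t_nominal = bitline_fall_time(200e-15, 1.8, 1e-3)
t_first_write = bitline_fall_time(200e-15, 1.8, 1e-3, n_unsel=255, i_bip=1e-6)
print(f"{t_nominal * 1e12:.0f} ps without bipolar current, "
      f"{t_first_write * 1e12:.0f} ps on the first write")
```

Because the bipolar current collapses after the first couple of writes, the first-write case sets both the driver size and the minimum width of the selecting pulses.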

### B. READ Cycle

A second effect that must be considered in an SOI technology occurs during a READ cycle. In an SOI technology, modeling both the states and the histories of nonselected cells sharing a common bitline with a selected cell is critical to predicting performance, much more so than in a bulk technology. Two effects must be considered when modeling array cells in SOI.

The first effect involves the source capacitance of the transfer devices on the common bitlines. If all cells on a bitline pair are in the same state, the bitline connected to the transfer devices on the logical "1" side has about 10% more capacitance than the bitline on the logical "0" side. When setting the signal margin at the sense amplifier, it is therefore important to initialize the nonselected cells so that they present the worst-case capacitance.

The second effect involves leakage currents. Because the bodies of the transfer devices are floating, the leakage currents of nonselected cells are higher in SOI. A cell just written from one state to the other has a higher body voltage on its "0"-side transfer device than cells that have been in that state for a long time.



Fig. 8. SOI effects on SRAM READ-cycle timing.

Analysis shows that the worst-case leakage and capacitance condition for a READ cycle occurs when all cells except the one being accessed are in the same state, opposite to the accessed cell, and one of the nonselected cells was written in the previous cycle. The waveforms in Fig. 8 show the signal generated at the sense amplifier. Under nominal conditions, if all nonselected cells hold the value opposite to the one being read, the added capacitance reduces the signal by 10 mV relative to a background in which half of the cells store ones and half store zeroes. Another 10 mV of signal is lost to the added leakage currents. Under fast process, high supply voltage, and high temperature, the leakage effect becomes much larger: the signal is cut in half if all cells hold the polarity opposite to that of the cell being read. The sense trigger signal is timed assuming a normal offset, but during stress testing it is delayed by an additional amount to account for the extreme conditions. This ensures functionality during stress testing without affecting performance under normal operating conditions.
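The two READ-cycle penalties, and the delayed stress-mode sense trigger, can be illustrated with a first-order constant-current bitline model. The sketch below simplifies by lumping the leakage of nonselected cells into a single current that directly opposes the read current; the function names and all values are hypothetical, chosen only to show the direction of each effect.

```python
def sense_signal(i_cell, t_sense, c_bl, i_leak_total=0.0):
    """Differential signal (V) developed at the sense amplifier after t_sense seconds.

    Simplified model: the selected cell sinks a constant i_cell from the bitline,
    and the lumped leakage of nonselected cells (i_leak_total) opposes it.
    """
    return (i_cell - i_leak_total) * t_sense / c_bl


def time_to_signal(v_target, i_cell, c_bl, i_leak_total=0.0):
    """Sense-trigger delay needed to develop v_target under the same model."""
    return v_target * c_bl / (i_cell - i_leak_total)


# Hypothetical values: 50 uA read current, 200 ps nominal trigger delay, 100 fF bitline.
I_CELL, T_SENSE, C_BL = 50e-6, 200e-12, 100e-15

balanced = sense_signal(I_CELL, T_SENSE, C_BL)                 # half ones, half zeroes
worst_cap = sense_signal(I_CELL, T_SENSE, 1.10 * C_BL)         # about 10% more bitline capacitance
worst_both = sense_signal(I_CELL, T_SENSE, 1.10 * C_BL, 5e-6)  # plus added leakage

for label, v in [("balanced", balanced), ("+capacitance", worst_cap),
                 ("+capacitance +leakage", worst_both)]:
    print(f"{label}: {v * 1e3:.1f} mV")

# Stress corner: if leakage reaches half the read current, the trigger must be
# delayed to recover the nominal signal.
stress_delay = time_to_signal(balanced, I_CELL, 1.10 * C_BL, i_leak_total=0.5 * I_CELL)
print(f"stress-mode trigger delay: {stress_delay * 1e12:.0f} ps")
```

Delaying the trigger only in stress mode recovers the signal at the extreme corner without lengthening the nominal access time.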

### C. Sense Amp Mismatch

A third effect involves sense amplifier design. If the bodies of the cross-coupled NFET's are left floating and the same data value is sensed many times, a body voltage mismatch develops. Because the body voltages directly affect the NFET threshold voltages, this becomes a $V_T$ mismatch of about 70 mV in the nominal case and more than 100 mV at a higher supply voltage. Tying the bodies of the cross-coupled NFET's together eliminates this floating-body contribution to sense amplifier mismatch.

## V. CONCLUSIONS

The hardware resulting from the design of this SOI microprocessor demonstrates the performance leverage provided by SOI technology. While SOI introduces additional design challenges, especially in dynamic circuits and arrays, circuits originally designed for a bulk technology can be made SOI-compatible without compromising the benefits of SOI.

## REFERENCES

- [1] S. N. Storino, S. R. Kunkel, R. J. Eickemeyer, and J. M. Borkenhagen, "A commercial multi-threaded RISC processor," in *Proc. IEEE Int. Solid-State Circuits Conf.*, Feb. 1998, pp. 236–237.
- [2] N. J. Rohrer, C. C. Akrout, M. G. Canada, D. Cawthron, B. Davari, R. Floyd, S. F. Geissler, R. D. Goldblatt, R. M. Houle, P. D. Kartschoke, D. Kramer, P. McCormick, G. M. Salem, L. Su, R. Schulz, and L. Whitney, "A 480 MHz RISC microprocessor in a 0.12 μm Leff CMOS technology with copper interconnects," in *ISSCC Dig. Tech. Papers*, Feb. 1998, pp. 240–241.
- [3] F. Assaderaghi, W. Rausch, A. Ajmera, E. Leobandung, D. Schepis, L. Wagner, H.-J. Wann, R. Bolam, D. Yee, B. Davari, and G. Shahidi, "A 7.9/5.5 ps room/low temperature SOI CMOS," in *Proc. IEDM* '97, pp. 415–418.
- [4] G. Shahidi, A. Ajmera, F. Assaderaghi, R. J. Bolam, E. Leobandung, W. Rausch, D. Sankus, D. Schepis, L. F. Wagner, K. Wu, and B. Davari, "Partially-depleted SOI technology for digital logic," in *Proc. IEEE Int. Solid-State Circuits Conf.*, Feb. 1999, pp. 427–427.



**Anthony G. Aipperspach** received the B.S. degree in electrical engineering from Montana State University, Bozeman, in 1979.

In 1979, he joined IBM Rochester Development Laboratory, Rochester, MN, where he was involved in the design of an on-chip DRAM in 3-$\mu$m NMOS. From 1981 to 1993, he was involved with various ASIC library designs, as well as integrated SRAM design. From 1994 to 1996, he was involved in BiCMOS circuit design and a variety of memory configurations, including SRAM, CAM, and register arrays. He is currently Team Leader of a group that provides custom memory macros in multiple technologies.



**David H. Allen** received the B.S. degree in electrical engineering from the University of Kansas, Lawrence, in 1980.

He was involved in the design of high-speed memory, logic, and custom circuits at Intel (Portland, OR), VTC (Bloomington, MN), and Micron Technology (Boise, ID) before joining IBM's AS/400 Division in Rochester, MN, in 1992. Since then, he has developed PowerPC processors for AS/400 and RS/6000 servers. He is currently leading the future processor circuit design team for IBM server development.



**Dennis T. Cox** received the B.S. degree from the University of Wisconsin, Madison, in 1970 and the M.S. degree from Syracuse University, Syracuse, NY, both in electrical engineering.

He developed SRAM and PLA designs with IBM Circuit Technology, Kingston, NY. In 1976, he joined IBM Circuit Technology, Rochester, MN, where he incorporated automatic macro generation into the ASIC design system. Most recently, he has been Technology Leader on several generations of PowerPC processors for IBM server development.



**Nghia V. Phan** received the B.S. degree in electrical engineering from Wilkes College, Wilkes-Barre, PA, in 1976.

He worked for Compact Video from 1976 to 1977 as a Video Production Engineer. He joined IBM Circuit Technology, Rochester, MN, designing processors for AS/400 and RS/6000. He has been involved in various bipolar/BiCMOS processors, SRAM's, and CMOS processors, and is presently involved in CMOS SOI processors. He is currently a Senior Design Engineer with the IBM PowerPC processor design team in Rochester.



**Salvatore N. Storino** received the B.S. degree from the University of Illinois, Urbana, in 1982 and the M.S. degree from the University of Minnesota, Minneapolis, in 1990, both in electrical engineering.

In 1990, he joined IBM's AS/400 Development organization in Rochester, MN, where he was involved with high-performance microprocessor development for commercial servers. His current work has been in advanced custom ALU's, high-speed VLSI circuits, and technology applications for copper interconnects and SOI devices.