# Design Aspects of Carry Lookahead Adders with Vertically-Stacked Nanowire Transistors

Davide Sacchetto<sup>1,2</sup>, M. Haykel Ben-Jamaa<sup>1</sup>, Giovanni De Micheli<sup>1</sup> and Yusuf Leblebici<sup>2</sup>

<sup>1</sup>Integrated System Laboratory (LSI), <sup>2</sup>Microelectronic System Laboratory (LSM)

Ecole Polytechnique Fédérale de Lausanne, Switzerland

e-mail: davide.sacchetto@epfl.ch

Abstract—This paper discusses the newly introduced vertically-stacked silicon nanowire gate-all-around field-effect-transistor technology and its advantages for higher-density layout design. The vertical nanowire stacking technology allows a very high-density arrangement of nanowire transistors with near-ideal characteristics, and opens the possibility of design optimization by adjusting the number of nanowire stacks without affecting the footprint area of the device. Several libraries for combinational logic synthesis have been designed and used for the synthesis of carry-lookahead adders in the vertically-stacked nanowire technology. The reduction in silicon active area occupancy of vertically-stacked gates is expected to be of great significance for regular cell mapping in disruptive future applications based on nanowire transistor arrays.

*Index Terms*—logic synthesis, nanowire arrays, cell library, arithmetic blocks

## I. INTRODUCTION

During the past few years, the scaling trend for complementary metal-oxide-semiconductor (CMOS) technology has included emerging research directions in order to address unbalanced voltage scaling, short-channel effects (SCEs) and the exponential increase of static power consumption [1]. Solid-state research has introduced novel materials, such as high-$\kappa$ dielectrics with metal gates, or strained Si or SiGe channel replacement [2], and novel configurations, such as double-gate, FinFET or gate-all-around (GAA) constructions [3]. All this has required more advanced processing and additional effort in the evaluation of the state-of-the-art technology. One of the most promising devices investigated recently is the nanowire transistor. Silicon nanowire channels have the advantage of better immunity to SCEs and full compatibility with the current planar technology [4]. In addition, nanowire channels can be configured in stacks for increased performance [5] or used for new functionalities [6]. However, only a few works have assessed the impact of nanowire technology on complex circuits [7], [8].

In this work we introduce the vertically-stacked Si nanowire (SiNW) top-down technology from a device level description to design and implementation of more complex circuits, such as *carry-lookahead adders* (*CLAs*). Performance and layout density are compared with planar *silicon-on-insulator* (SOI) technology.

Section II gives an overview of the vertically-stacked SiNW transistor technology. Section III shows how vertically-stacked SiNWs can be used to build logic gates, such as an inverter. Section IV then reports on the assumptions and the modeling for logic gate design and on building the libraries. Section V discusses the synthesis of CLAs and the performance results after mapping with the obtained nanowire libraries. In Section VI we discuss potential opportunities offered by the technology. Finally, in Section VII we draw the conclusions.

## II. OVERVIEW OF VERTICALLY-STACKED NANOWIRE TECHNOLOGY

Vertically-stacked Si nanowire (SiNW) technology makes use of smart processing to fabricate transistors having parallel SiNW channels, or *fingers*. As depicted in Fig. 1, the basic device is composed of a stack of SiNW channels embedded in a poly-Si GAA structure. Each channel forms a unit-width transistor. The fabrication process was reported elsewhere [9]; it employs a combination of *deep-reactive-ion-etching (DRIE)* and sacrificial oxidation steps to form the SiNW channels, starting from a relatively inexpensive standard bulk-Si or SOI substrate. The process allows for easy tuning of the vertical and horizontal separation between channels as well as of the number of channels used in a given device. All the channels are anchored to source/drain regions. Due to stacking, the SiNW channels have different values of access resistance. This point is investigated in Section IV. One advantage of this technology is that the width of the transistor is wrapped around the stacked SiNW channels, thus reducing the silicon real estate taken by the active area. In addition, the versatility of stacking a variable number of fingers reduces the active area occupancy even more. Besides, due to the one-dimensionality of the channels, specific technology boosters such as strain [10] can be envisaged, enhancing the performance compared with planar SOI technology.



Figure 1. Vertically-stacked transistor in GAA configuration. The stack is composed of 4 SiNW channels (fingers). The total width of the transistor is the sum of all channel widths. The ends of the SiNW channels composing the stack are electrically connected to form source/drain regions.

## III. INVERTER CONSTRUCTION

The vertically-stacked transistor technology presented in the previous section can be used to implement different logic gates with a reduced active area. For instance, an inverter can be fabricated as depicted in Fig. 2. In this construction both the pull-up and the pull-down networks are made of SiNW transistors anchored to source/drain pillars. Source/drain electrical contacts are made with metal vias, to reduce the access resistance of the bottom channels. This configuration with metal vias will be used to assess the total effective resistance of the vertically-stacked transistors in the next section. A fabricated example of the inverter is shown in Fig. 3. In this example the pull-up and pull-down networks are each composed of three parallel SiNW stacks, each of which is composed of 4 SiNW channels.

Figure 2. Vertically-stacked inverter structure with SiNW channels anchored to Si pillars with GAA configuration. The electrical equivalent circuit is a CMOS inverter with double drive. The number of SiNW channels in the pull-up network is double that in the pull-down network.

Figure 3. Tilted *scanning electron micrograph (SEM)* view of an inverter construction composed of 3 parallel stacks, each composed of 4 SiNW channels. In this case the pull-up and pull-down transistors have a width of 12 units each.

## IV. COMBINATIONAL LOGIC DESIGN

To assess the performance of complex logic circuits using the vertically-stacked nanowire technology, we perform combinational logic synthesis of carry-lookahead adders that are then mapped with vertically-stacked nanowire libraries. The electrical behavior of the vertically-stacked transistor was modeled as a switch in series with a resistance, which includes both access and channel resistance. An estimation of the total diffusion capacitance is also carried out. Then, performance and active area estimation are calculated for different combinational logic gates. Finally different libraries of logic gates are built varying both design and technology parameters of the basic transistor cell.

### A. Basic assumptions

The configuration of channels of a vertically-stacked transistor can be generalized into a 3D matrix construction. We introduce the general construction to evaluate the total effective resistance,  $R_{\rm tot}$  of the transistor (see Fig. 4). Since the topmost SiNW in the stack is closer to the via contacts than the other wires, its resistance is simply modeled as the channel resistance,  $R_{\rm channel}$ . The other SiNWs in the stack will have an additional contribution due to the access resistance. Then, each additional channel carries a resistance calculated as follows:

$$R = R_{\text{channel}} + 2 \cdot R_{\text{s}} = \alpha \cdot R_{\text{channel}} \tag{1}$$



Figure 4. Resistive network construction representative of the channel 3D matrix composing a transistor.

where $R_{\rm channel}$ is the resistance of a single SiNW transistor, $R_{\rm s}$ is the series resistance of the path between the end of the SiNW channel and the via contact, and $\alpha = \frac{2 \cdot R_{\rm s}}{R_{\rm channel}} + 1$ expresses the ratio between the resistance of any of the other SiNWs composing the stack and that of the topmost SiNW. We assume $R_{\rm channel} \simeq 2.5 \, \mathrm{k\Omega} \cdot \mu \mathrm{m}$ [11]. Since the top wire has less access resistance than the other wires in its stack, the total effective resistance can be reduced by using parallel stacks of fingers. Then, depending on the channel configuration (see Fig. 4), we get a general formulation of the total effective resistance:

$$R_{\rm tot} = \frac{\alpha \cdot R_{\rm channel}}{(\alpha - 1) \cdot k + n} \tag{2}$$

where k is the total number of stacks placed in parallel along the horizontal direction and n is the total number of channels composing the transistor. The total effective resistance of planar SOI transistors was calculated with Equation 2 assuming  $\alpha = 1$  and k = 0.
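As a rough numerical illustration of Equations 1 and 2, the following Python sketch evaluates $R_{\rm tot}$ and cross-checks it against a direct parallel combination of the wires. The function names and the cross-check helper are hypothetical and introduced only for this example; the sketch assumes that each of the $k$ stacks contributes one topmost wire of resistance $R_{\rm channel}$ and that the remaining $n-k$ wires each have resistance $\alpha \cdot R_{\rm channel}$, as described above.

```python
def alpha_from_ratio(rs_over_rchannel):
    """Eq. (1): alpha = 2*Rs/Rchannel + 1."""
    return 2.0 * rs_over_rchannel + 1.0


def r_tot(alpha, k, n, r_channel=2.5):
    """Eq. (2): effective resistance of n channels arranged in k parallel stacks.

    The result is in the same unit as r_channel (2.5 is the value assumed in the text).
    """
    return alpha * r_channel / ((alpha - 1.0) * k + n)


def r_tot_parallel(alpha, k, n, r_channel=2.5):
    """Hypothetical cross-check: sum the conductances of all wires directly.

    k topmost wires have resistance r_channel; the other n - k wires have
    resistance alpha * r_channel.
    """
    conductance = k / r_channel + (n - k) / (alpha * r_channel)
    return 1.0 / conductance
```

Under these assumptions, with $R_{\rm s} = R_{\rm channel}$ (so $\alpha = 3$), four wires in a single stack give `r_tot(3.0, 1, 4) = 1.25`, the same four wires split into two stacks give `r_tot(3.0, 2, 4) = 0.9375`, and the planar reference `r_tot(1.0, 0, 4)` gives `0.625`, which shows how adding stacks moves the effective resistance back toward the planar value.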

The same 3D matrix configuration of Fig. 4 is used to calculate the values of the input and diffusion capacitances. We assume that each channel makes the same contribution, $C_{\rm d}$, to the total diffusion capacitance; thus the total diffusion capacitance of a single gate becomes:

$$C_{\rm d,tot} = C_{\rm d} \cdot n \tag{3}$$

The number of wires that compose the channel depends on the drive required by the transistor, which in turn determines $C_{\rm d,tot}$. The gate input capacitance, $C_{\rm in}$, is expected to grow linearly with the number of SiNW channels. In order to assess the impact of capacitances on gate delay, we assume equal $C_{\rm in}$ and $C_{\rm d,tot}$ [11]. A common value for the capacitances that is used in linear delay modeling is $C_{\rm d} \simeq 2 \, {\rm fF}/\mu {\rm m}$ [11]. The interconnect contribution is not considered here. Thus $C_{\rm d,tot}$ corresponds both to the parasitic output load of a transistor and to its input capacitance.

### B. Gate modeling and libraries

We considered 9 different combinational logic gates, which are listed in Table I. We implemented 14 libraries, listed in Table II, each of them formed by these 9 logic gates. The libraries include a planar SOI technology library and 13 different SiNW technology libraries. The latter are obtained by varying two technology parameters, the access resistance (represented by $R_{\rm s}$) and the strain (represented by the actual channel resistance), and one design parameter, the number of nanowire stacks. We designed the 9 logic gates in every library using the linear switch model [11], which assumes that every logic gate drives the same current as a unit inverter when it switches. We thereby obtained the required characterization in terms of area and delay. The latter was estimated as the fanout-of-four (FO4) delay, assuming that every gate drives its own parasitic load and a load of 4 instances of itself. The obtained area, FO4 delay and input capacitance for 3 sample libraries are reported in Table I.

Table I

GATE PARAMETERS OF THREE DIFFERENT LIBRARIES REPORTING ACTIVE AREA OCCUPANCY, FO4 DELAY AND CAPACITIVE INPUT LOAD.

| Gate  | planar SOI: Active Area [units] | planar SOI: FO4 delay [ps] | planar SOI: $C_{\rm in}$ [units] | L06 (SiNW, single stack): Active Area [units] | L06 (SiNW, single stack): FO4 delay [ps] | L06 (SiNW, single stack): $C_{\rm in}$ [units] | L07 (SiNW, double stack): Active Area [units] | L07 (SiNW, double stack): FO4 delay [ps] | L07 (SiNW, double stack): $C_{\rm in}$ [units] |
|-------|-----|-----|---|----|-----|----|----|-----|----|
| INV   | 3   | 75  | 3 | 2  | 125 | 5  | 3  | 75  | 3  |
| NAND2 | 8   | 110 | 4 | 4  | 220 | 7  | 8  | 110 | 4  |
| NAND3 | 15  | 145 | 5 | 6  | 315 | 11 | 12 | 195 | 7  |
| NAND4 | 24  | 180 | 6 | 8  | 410 | 14 | 16 | 280 | 10 |
| NOR2  | 10  | 130 | 5 | 4  | 280 | 11 | 6  | 170 | 7  |
| NOR3  | 21  | 185 | 7 | 6  | 435 | 17 | 9  | 315 | 13 |
| NOR4  | 36  | 240 | 9 | 8  | 590 | 23 | 12 | 460 | 19 |
| MUX2  | 24  | 180 | 6 | 8  | 420 | 14 | 16 | 300 | 10 |
| MUX4  | 72  | 240 | 6 | 24 | 560 | 14 | 48 | 400 | 10 |

Table II

LIBRARY LIST OBTAINED VARYING SERIES RESISTANCE, NUMBER OF HORIZONTAL STACKS AND $R_{\text{channel}}$ REDUCTION DUE TO STRAIN BOOSTER.

| Library    | $R_{\rm s}/R_{\rm channel}$ | No. of stacks | $R_{\rm channel,strained}/R_{\rm channel}$ | add8b Area [units] | add8b Delay [ns] | add16b Area [units] | add16b Delay [ns] | add32b Area [units] | add32b Delay [ns] | add64b Area [units] | add64b Delay [ns] |
|------------|------|---|------|-----|------|------|------|------|-------|------|-------|
| planar SOI | 0%   | - | 100% | 601 | 1.87 | 1265 | 3.87 | 2593 | 7.87  | 5249 | 15.87 |
| L01        | 0%   | 1 | 100% | 302 | 1.87 | 638  | 3.87 | 1310 | 7.87  | 2654 | 15.87 |
| L02        | 25%  | 1 | 100% | 302 | 2.44 | 638  | 5.08 | 1310 | 10.36 | 2654 | 20.92 |
| L03        | 25%  | 2 | 100% | 595 | 1.97 | 1267 | 4.13 | 2611 | 8.45  | 5299 | 17.09 |
| L04        | 50%  | 1 | 100% | 302 | 2.95 | 638  | 6.15 | 1310 | 12.55 | 2654 | 25.35 |
| L05        | 50%  | 2 | 100% | 571 | 2.02 | 1211 | 4.26 | 2491 | 8.74  | 5051 | 17.7  |
| L06        | 100% | 1 | 100% | 302 | 3.92 | 638  | 8.24 | 1310 | 16.88 | 2654 | 34.16 |
| L07        | 100% | 2 | 100% | 570 | 2.17 | 1210 | 4.65 | 2490 | 9.61  | 5050 | 19.53 |
| L08        | 50%  | 1 | 80%  | 302 | 2.07 | 638  | 4.31 | 1310 | 8.79  | 2654 | 17.75 |
| L09        | 50%  | 2 | 80%  | 586 | 1.61 | 1211 | 3.37 | 2491 | 6.89  | 5051 | 13.93 |
| L10        | 100% | 1 | 80%  | 302 | 2.49 | 638  | 5.21 | 1310 | 10.65 | 2654 | 21.53 |
| L11        | 100% | 2 | 80%  | 571 | 1.66 | 1211 | 3.5  | 2491 | 7.18  | 5051 | 14.54 |
| L12        | 100% | 1 | 60%  | 302 | 2.12 | 638  | 4.44 | 1310 | 9.08  | 2654 | 18.36 |
| L13        | 100% | 2 | 60%  | 597 | 1.29 | 1261 | 2.73 | 2589 | 5.61  | 5245 | 11.37 |
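The FO4 estimate described above can be illustrated with a small Python sketch of the linear delay model. It rests on assumptions that are not stated in the text: the textbook normalization $\tau = 3RC$ for the unit inverter, and the standard logical-effort and parasitic-delay values for static CMOS NAND and NOR gates with a 2:1 pull-up/pull-down sizing. All names in the code are hypothetical.

```python
from fractions import Fraction

R_CHANNEL_KOHM_UM = 2.5  # channel resistance assumed in the text [kOhm*um]
C_D_FF_PER_UM = 2.0      # capacitance assumed in the text [fF/um]

# Assumption: unit delay tau = 3*R*C (kOhm * fF = ps), i.e. 15 ps here.
TAU_PS = 3 * R_CHANNEL_KOHM_UM * C_D_FF_PER_UM


def nand(n):
    """Assumed textbook values for an n-input NAND: g = (n + 2)/3, p = n."""
    return Fraction(n + 2, 3), n


def nor(n):
    """Assumed textbook values for an n-input NOR: g = (2n + 1)/3, p = n."""
    return Fraction(2 * n + 1, 3), n


GATES = {
    "INV": (Fraction(1), 1),
    "NAND2": nand(2), "NAND3": nand(3), "NAND4": nand(4),
    "NOR2": nor(2), "NOR3": nor(3), "NOR4": nor(4),
}


def fo4_delay_ps(gate, tau_ps=TAU_PS, fanout=4):
    """Linear delay model: d = (p + g * h) * tau with electrical effort h = fanout."""
    g, p = GATES[gate]
    return float((p + g * fanout) * Fraction(tau_ps))
```

With these assumptions the sketch returns 75 ps for the inverter, 110, 145 and 180 ps for NAND2 to NAND4, and 130, 185 and 240 ps for NOR2 to NOR4, which matches the planar SOI FO4 column of Table I. It is not meant to reproduce the SiNW columns, whose values also depend on the access resistance and stacking.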

## V. SYNTHESIS/MAPPING RESULTS

The ABC tool from Berkeley [12] was used to generate CLAs of 8, 16, 32 and 64 bits. Fast and efficient synthesis with the designed libraries is achieved by *directed acyclic graph (DAG)*-aware rewriting of the *and-inverter-graph (AIG)* representation of the adder network. We used the script resyn2 for synthesis, which consists of an alternation of network rewriting and balancing algorithms that reduce the AIG size and the number of AIG levels. The optimized network was then mapped with the different libraries (see Table II). The impact of the different design and technology parameters on the mapped CLAs is analyzed in the following.
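For readers less familiar with the adder structure being synthesized, the following Python sketch shows the generic carry-lookahead principle: every carry is computed directly from the per-bit generate and propagate signals rather than rippling through the previous stage. It is only a behavioral illustration with hypothetical names; it does not represent the network produced by ABC or the gate-level mapping used in this work.

```python
def cla_add(a, b, width, cin=0):
    """Add two width-bit integers using flattened carry-lookahead equations.

    Returns (sum modulo 2**width, carry-out).
    """
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(width)]  # generate
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(width)]  # propagate

    carries = [cin]
    for i in range(width):
        # c[i+1] = g[i] | p[i]g[i-1] | p[i]p[i-1]g[i-2] | ... | p[i]...p[0]cin
        carry = g[i]
        prefix = p[i]
        for j in range(i - 1, -1, -1):
            carry |= prefix & g[j]
            prefix &= p[j]
        carry |= prefix & cin
        carries.append(carry)

    total = 0
    for i in range(width):
        total |= (p[i] ^ carries[i]) << i  # sum bit = propagate XOR carry-in
    return total, carries[width]
```

For example, `cla_add(200, 100, 8)` returns `(44, 1)`, since 300 exceeds the 8-bit range and the ninth bit appears as the carry-out.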

### A. Impact of series resistance on delay

In Fig. 5 the impact of series resistance on FO4 delays is shown. The data refer to CLAs with different numbers of bits mapped with the single-stack configuration. As expected, the FO4 delay increases linearly with series resistance. This behavior is enhanced for adders with more inputs, due to the use of more complex gates. Data points at $\frac{R_{\rm s}}{R_{\rm channel}} = 0$ correspond to an FO4 delay that is the same as in the planar SOI case.



Figure 5. Effect of series resistance on delay. The values shown in the graph are obtained after mapping of carry-lookahead adders.

### B. Impact of the number of stacks on delay and area

The detrimental effect of series resistance can be counterbalanced by using more stacks instead of a single one, thus reducing the effect of $R_{\rm s}$ on $R_{\rm tot}$, which in turn is reflected in the delay. For instance, for a 64-bit carry-lookahead adder synthesized with 2 stacks, the increase of FO4 delay with series resistance is 6 times lower than in the configuration that uses 1 stack (see Fig. 6). Comparing the delays of the two cases, we observed a delay reduction between 18% and 43%, depending on the series resistance. The speed-up achieved by double stacking comes at the cost of additional active area occupancy.



Figure 6. Evolution of the 64-bit carry-lookahead adder delay with increasing series resistance (R<sub>s</sub> as depicted in Fig. 4). The dashed horizontal line represents the delay of planar technology. The use of a double stack significantly reduces the impact of series resistance.



Figure 7. Effect of strain on delay (64-bit carry-lookahead adder data). The dashed horizontal line represents the delay of planar technology. The plot shows that using double stacks, strain, or both improves delay. Notice that strained nanowires with double stacking outperform planar Si technology.

However, the total active area occupancy remains lower than in the planar case. For instance, the 64-bit CLA with a single stack reduces the occupied area by 49% compared with the one mapped in planar technology. The use of a double stack still reduces the occupied area by 4%. Overall, the improvement in delay achieved by using a double stack outweighs the additional cost in area.
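The percentages quoted in this subsection can be checked against the 64-bit columns of Table II with a few lines of Python. The sketch below is only an arithmetic cross-check; the dictionary layout and function name are hypothetical, and the area comparison for the double stack uses library L07.

```python
# 64-bit CLA entries copied from Table II: (area [units], delay [ns]),
# keyed by the ratio Rs/Rchannel, unstrained libraries only.
PLANAR = (5249, 15.87)
SINGLE_STACK = {0.25: (2654, 20.92), 0.50: (2654, 25.35), 1.00: (2654, 34.16)}
DOUBLE_STACK = {0.25: (5299, 17.09), 0.50: (5051, 17.70), 1.00: (5050, 19.53)}


def reduction_percent(new, reference):
    """Relative reduction of `new` with respect to `reference`, in percent."""
    return 100.0 * (reference - new) / reference


# Delay reduction of the double stack versus the single stack, per series resistance.
delay_gain = {
    ratio: reduction_percent(DOUBLE_STACK[ratio][1], SINGLE_STACK[ratio][1])
    for ratio in SINGLE_STACK
}

# Area reduction versus planar SOI.
area_gain_single = reduction_percent(SINGLE_STACK[1.00][0], PLANAR[0])
area_gain_double = reduction_percent(DOUBLE_STACK[1.00][0], PLANAR[0])
```

Rounded to the nearest percent, `delay_gain` spans 18% (at a ratio of 25%) to 43% (at 100%), and the area reductions are 49% for the single stack and 4% for the double stack, in agreement with the figures given in the text.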

### C. Impact of strain on delay

A more significant performance improvement can be achieved by using strained nanowires. The impact of strain was evaluated by varying $\alpha$ within single- or double-stack configurations. By comparing FO4 delays (Fig. 7) we see that strain can effectively counterbalance the effect of series resistance, eventually outperforming planar technology. For instance, a strained double-stacked library achieves a delay reduction between 8% and 28% with respect to planar SOI, depending on the level of strain (see Table II).
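As a quick cross-check of the strain results, the Python sketch below filters the strained libraries of Table II (64-bit CLA delays) for those that are faster than planar SOI and reports their gain. The data layout and names are hypothetical; the numbers are copied from Table II.

```python
PLANAR_DELAY_NS = 15.87  # planar SOI, 64-bit CLA (Table II)

# library: (number of stacks, Rchannel,strained / Rchannel, 64-bit delay [ns])
STRAINED_LIBRARIES = {
    "L08": (1, 0.8, 17.75),
    "L09": (2, 0.8, 13.93),
    "L10": (1, 0.8, 21.53),
    "L11": (2, 0.8, 14.54),
    "L12": (1, 0.6, 18.36),
    "L13": (2, 0.6, 11.37),
}


def gain_vs_planar(delay_ns):
    """Delay reduction with respect to planar SOI, in percent."""
    return 100.0 * (PLANAR_DELAY_NS - delay_ns) / PLANAR_DELAY_NS


# Keep only the strained libraries that beat the planar reference.
faster_than_planar = {
    name: gain_vs_planar(delay)
    for name, (_, _, delay) in STRAINED_LIBRARIES.items()
    if delay < PLANAR_DELAY_NS
}
```

Only the double-stack libraries L09, L11 and L13 end up in `faster_than_planar`, with gains of roughly 12%, 8% and 28% respectively, which is consistent with the 8% to 28% range quoted above; the strained single-stack libraries remain slower than planar SOI in this data.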

## VI. OPPORTUNITIES

The presented architecture can be employed for standard cell libraries with minimum-sized gates. Alternatively, the vertical stacking of channels can be used to increase the area density of regular array architectures such as programmable logic arrays (PLAs). As a matter of fact, combinational logic functions can be expressed in sum-of-products canonical form, which can be mapped onto a PLA. In general, the PLA construction uses AND and OR planes that can be programmed to give a certain combinational logic function. For instance, an adder can be built using a PLA architecture (see Fig. 8). The PLA planes can be implemented with different logic gates. The smallest possible PLA design makes use of NORs and inverters to construct the AND and the OR planes. The implementation of the planes using vertically-stacked technology would dramatically improve the density (see Fig. 8).

Figure 8. a) Nanowire-based pseudo-NMOS PLA with NOR structure: example showing 6 parallel gate lines, 4 levels of stacked nanowires and 3 parallel channel stacks. b) Schematic representation of the NOR PLA.
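The mapping of a sum-of-products function onto NOR planes and inverters can be sketched behaviorally in Python. The example below is a hypothetical illustration, not the layout of Fig. 8: it programs a NOR-NOR PLA as a one-bit full adder, using the identity that an AND of literals equals the NOR of their complements, and an output inverter after the second NOR plane to obtain the OR of the product terms.

```python
def nor(*bits):
    """NOR gate; with a single input it acts as an inverter."""
    return 0 if any(bits) else 1


def nor_nor_pla(inputs, and_plane, or_plane):
    """Evaluate a NOR-NOR PLA.

    inputs:    dict mapping input name -> bit
    and_plane: list of product terms, each a dict name -> required bit value
    or_plane:  list (one entry per output) of product-term indices to be ORed
    """
    # Input inverters make both polarities of every input available.
    literal = {}
    for name, value in inputs.items():
        literal[(name, 1)] = value
        literal[(name, 0)] = nor(value)

    # First plane: AND(l1, ..., ln) = NOR(~l1, ..., ~ln).
    terms = [
        nor(*(literal[(name, 1 - required)] for name, required in term.items()))
        for term in and_plane
    ]

    # Second plane: NOR of the selected terms, followed by an output inverter.
    return [nor(nor(*(terms[i] for i in rows))) for rows in or_plane]


# Full adder: carry-out uses 3 product terms, sum uses 4 minterms.
FULL_ADDER_AND_PLANE = [
    {"a": 1, "b": 1}, {"a": 1, "c": 1}, {"b": 1, "c": 1},
    {"a": 1, "b": 0, "c": 0}, {"a": 0, "b": 1, "c": 0},
    {"a": 0, "b": 0, "c": 1}, {"a": 1, "b": 1, "c": 1},
]
FULL_ADDER_OR_PLANE = [[3, 4, 5, 6], [0, 1, 2]]  # outputs: [sum, carry-out]
```

For instance, `nor_nor_pla({"a": 1, "b": 1, "c": 0}, FULL_ADDER_AND_PLANE, FULL_ADDER_OR_PLANE)` returns `[0, 1]`, that is, sum 0 and carry-out 1. Wider adders follow the same pattern with more product terms, which is where the density of the planes becomes the dominant cost.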

## VII. CONCLUSIONS

In this paper we reported on the assessment of the vertically-stacked Si nanowire technology for combinational logic circuit design. CLAs of different sizes were synthesized and mapped with vertically-stacked nanowire libraries. We showed that the vertically-stacked nanowire technology can effectively reduce the active area of gates by 4% to 49% compared with planar SOI technology. A delay reduction of up to 43% has been observed using double stacks in place of single stacks, even in the presence of a high series resistance. In addition, the use of strain boosters demonstrated the capability of the nanowire technology to overcome the additional series resistance, reducing the delay by up to 28% when compared with planar SOI results. Finally, we introduced the conceptual idea of using vertically-stacked nanowire technology as a way to enhance the density of regular PLA architectures.

## REFERENCES

- [1] H. Iwai, "Roadmap for 22nm and beyond (Invited Paper)," *Microelec. Eng.*, vol. 86(7-9), pp. 1520–1528, 2009.
- [2] W. Fang *et al.*, "Vertically Stacked SiGe Nanowire Array Channel CMOS Transistors," *Elec. Dev. Lett.*, vol. 28(3), pp. 211–213.
- [3] J.-P. Colinge, "From Gate-all-Around to Nanowire MOSFETs," *Proc. of the Int. Sem. Conf. CAS*, vol. 1, pp. 11–17, 2007.
- [4] N. Singh *et al.*, "Si, SiGe Nanowire Devices by Top-Down Technology and Their Applications," *Trans. on Elec. Dev.*, vol. 55(11), pp. 3107–3118, 2008.
- [5] T. Ernst *et al.*, "Novel 3D integration process for highly scalable Nano-Beam stacked-channels GAA (NBG) FinFETs with HfO<sub>2</sub>/TiN gate stack," *IEDM*, pp. 1–4, 2006.
- [6] J. Appenzeller *et al.*, "Toward nanowire electronics," *Trans. on Elec. Dev.*, vol. 55(11), pp. 2827–2845, 2008.
- [7] N. Balasubramanian *et al.*, "Si Nanowire CMOS Transistors and Circuits by Top-Down Technology Approach," *ECS Meeting Abstracts*, vol. 801(16), p. 634, 2008.
- [8] M. Dong and L. Zhong, "Nanowire crossbar logic and standard cell-based integration," *Trans. on VLSI Systems*, vol. 17(8), pp. 997–1007, 2009.

- [11] N. H. E. Weste and D. Harris, *CMOS VLSI Design: A Circuits and Systems Perspective*, 2005.
- [12] http://www.eecs.berkeley.edu/~alanmi/abc/.