# Level Conversion for Dual-Supply Systems

Fujio Ishihara, Farhana Sheikh, Member, IEEE, and Borivoje Nikolić, Member, IEEE

Abstract—Dual-supply voltage design using a clustered voltage scaling (CVS) scheme is an effective approach to reduce chip power. The optimal CVS design relies on a level converter implemented in a flip-flop to minimize energy, delay, and area penalties due to level conversion. Additionally, circuit robustness against supply bounce is a key property that differentiates good level converter design. Novel flip-flops presented in this paper incorporate a half-latch level converter and a precharged level converter. These flip-flops are optimized in the energy-delay design space to achieve over 30% reduction of energy-delay product and about 10% savings of total power in a CVS design as compared to the conventional flip-flop. These benefits are accompanied by 24% flip-flop robustness improvement leading to 13% delay spread reduction in a CVS critical path. The proposed flip-flops also show 18% layout area reduction. Advantages of level conversion in a flip-flop over asynchronous level conversion in combinational logic are also discussed in terms of delay penalty and its sensitivity to supply bounce.

*Index Terms*—Dual-supply voltage, flip-flop, level conversion, low power, robustness, supply bounce.

#### I. INTRODUCTION

POWER DISSIPATION is a limiting factor in both high-performance and mobile applications. Independent of application, desired performance is achieved by maximizing operating frequency under power constraints that may be dictated by battery life, chip packaging, and/or cooling costs. Transistor sizing is an efficient method for optimizing the tradeoff between power and performance of a design. However, power savings from sizing alone diminish quickly when available slack in the circuit begins to disappear [1]. Lowering supply voltage results in a quadratic reduction in power dissipation but it significantly impacts delay. In constant-throughput applications, the performance loss due to low supply operation is recovered by increased pipelining or parallelism [2], but it increases the latency of the design. When both throughput and latency are constrained, there exists an optimum energy for a given delay of any block, achieved through circuit sizing, supply, and transistor threshold $(V_{TH})$ adjustments. To achieve power savings that exceed these conventional boundaries, power reduction techniques such as sizing and supply adjustments have to be extended [1].

Multiple supply voltages can lower power dissipation beyond the conventional supply-sizing energy-delay boundary. A reduction in supply voltage for circuits outside critical paths can save power without sacrificing either throughput or latency. Key challenges in design of efficient multiple-supply circuits are minimizing the cost of level conversion and realizing efficient power distribution networks while maintaining the overall robustness of the design. Although these issues have been addressed for a custom data-path design [3], an effective solution for synthesized ASICs is necessary.

F. Sheikh and B. Nikolić are with the Department of Electrical Engineering and Computer Science, University of California, Berkeley, CA 94720 USA (e-mail: farhana@eecs.berkeley.edu; bora@eecs.berkeley.edu).

Digital Object Identifier 10.1109/TVLSI.2003.821548

In a multiple-supply design, level converters are placed on the boundary between low- $V_{DD}$  ( $V_{DDL}$ ) and high- $V_{DD}$  ( $V_{DDH}$ ) to provide full swing input to  $V_{DDH}$  domain. If a pMOS transistor in the  $V_{DDH}$  region is directly driven by a  $V_{DDL}$  signal, it increases the low-to-high delay and results in significant dc current flowing through the pMOS. Instead, a pMOS cross-coupled level converter (CCLC) in Fig. 1(a) is widely used to suppress the dc current.

Dual-supply voltage (dual- $V_{DD}$ ) design using a clustered voltage scaling (CVS) scheme proposed in [4] minimizes area and delay penalties caused by level converters. In this scheme, a level converter can be combined with a flip-flop (LCFF) which becomes the key element at the voltage boundary, but very few LCFF structures have been investigated [5], [6].

Circuit robustness against supply noise is an important metric to take into account when designing a dual-supply system. The CMOS gate delay is proportional to  $V_{DD}/(V_{DD} - V_{TH})^{\alpha}$  [7] and its sensitivity to supply bounce increases as  $V_{DD}$  is lowered from  $V_{DDH}$  to  $V_{DDL}$ . Fig. 2 illustrates this by comparing the delay spread values of a  $V_{DDH}$  inverter and a  $V_{DDL}$  inverter for  $\pm 10\%$  supply bounce. The figure also includes the delay spread of CCLC which is even more severe than that of the  $V_{DDL}$  inverter thereby making robustness analysis in dual- $V_{DD}$  design indispensable. In synthesized designs, low-supply wires can be exposed to coupling from  $V_{DDH}$  signals. A robust design of a level converter must exhibit the same input noise rejection properties as a static CMOS gate.
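The growth of supply-bounce sensitivity at lower supply can be illustrated numerically with the alpha-power-law expression above. The following is a minimal sketch, not the SPICE analysis behind Fig. 2: the threshold voltage `V_TH` and the exponent `ALPHA` are assumed values chosen only for illustration, while the two supply values are the 1.20 V and 0.84 V operating points listed in Table I.

```python
# Illustrative alpha-power-law delay model. V_TH and ALPHA are assumed values,
# not parameters reported in the paper.
V_TH = 0.30    # threshold voltage in volts (assumption)
ALPHA = 1.3    # velocity-saturation index (assumption)


def gate_delay(vdd: float, vth: float = V_TH, alpha: float = ALPHA) -> float:
    """Relative gate delay, proportional to Vdd / (Vdd - Vth)^alpha."""
    if vdd <= vth:
        raise ValueError("supply voltage must exceed the threshold voltage")
    return vdd / (vdd - vth) ** alpha


def relative_delay_spread(vdd: float, bounce: float = 0.10) -> float:
    """Delay spread over +/-bounce supply variation, normalized to nominal delay."""
    slowest = gate_delay(vdd * (1.0 - bounce))  # lowered supply gives the longest delay
    fastest = gate_delay(vdd * (1.0 + bounce))  # raised supply gives the shortest delay
    return (slowest - fastest) / gate_delay(vdd)


if __name__ == "__main__":
    for name, vdd in (("VDDH", 1.20), ("VDDL", 0.84)):
        spread = relative_delay_spread(vdd)
        print(f"{name} = {vdd:.2f} V: spread = {spread:.1%} of nominal delay")
```

Under these assumed parameters the normalized spread at the lower supply is larger than at the higher supply, which is the qualitative trend the text describes; the absolute numbers depend entirely on the assumed `V_TH` and `ALPHA`.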

In this paper, we expand our study [8] where we examine key properties and design metrics of level converters for dual- $V_{DD}$ systems and present several new LCFF circuits which exhibit improved energy-delay product values, reduced system-level power and better immunity to supply noise without incurring significant layout area penalties. Advantages of level conversion at synchronous boundaries over asynchronous level conversion in combinational logic are also presented in terms of delay penalty and sensitivity to supply bounce.

#### II. DUAL-SUPPLY DESIGN

## A. Optimal $V_{DDL}$ Selection

A theoretical model to investigate power reduction via CVS is proposed in [5]. We employ a similar top-down approach to determine the $V_{DDL}/V_{DDH}$ ratio for LCFF optimization and comparisons. Two types of path delay distributions, lambda and wedge, are assumed to find the optimal $V_{DDL}$ value. These two distributions best approximate the delay distributions of real chip designs [5], [9]. Parameters for general-purpose 0.13-$\mu$m technology are used to simulate delay and power in the theoretical analysis. As shown in Fig. 3, the optimal $V_{DDL}$ is between 60% and 70% of $V_{DDH}$ regardless of the delay distribution. The latter value (70%) is chosen for higher noise immunity of $V_{DDL}$ signals against $V_{DDH}$ noise.

Manuscript received March 1, 2003; and revised June 29, 2003. This work was supported in part by the MARCO Gigascale Silicon Research Center (GSRC) and a gift by Toshiba Corporation.

F. Ishihara is with the Broadband System LSI Project, System LSI Division, Toshiba Corporation, Kawasaki 212-8520, Japan (e-mail: fuji.ishihara@toshiba.co.jp).

Fig. 1. Basic level converter structures. A shaded gate represents a $V_{DDL}$ gate and underlined nodes show $V_{DDL}$-swing signals. (a) Cross-coupled pMOS pair (CCLC) [11]. (b) Single-supply diode-voltage-limited buffer (SSLC) [12]. (c) Pass-transistor half latch. (d) Precharged circuit.

Fig. 2. Delay spread of a $V_{DDH}$ inverter, a $V_{DDL}$ inverter, and the cross-coupled level converter (CCLC) relative to $T_{\rm pd}$ of each circuit for $\pm 10\%$ supply bounce. The $V_{DDL}$ gate exhibits higher sensitivity to supply bounce than the $V_{DDH}$ gate, and the CCLC shows even higher sensitivity.



Fig. 3. Theoretical analysis of CVS power based on [5] and selection of optimal  $V_{DDL}/V_{DDH}$  ratio at 0.13- $\mu$ m technology. The optimal ratio is 0.6–0.7 regardless of path delay distributions.

Choosing lower $V_{DDL}$ voltages, such as 50% of $V_{DDH}$ as suggested in [10], combined with multi-threshold designs yields additional energy savings; however, such a low supply in a mixed-supply design presents significant challenges in signal integrity and robustness of the design. In the interest of fair comparison, our work focuses on single-threshold designs only.

## B. Dual-$V_{DD}$ CVS Simulation

A Perl-script-based simulator is implemented to estimate the power reduction of a dual-$V_{DD}$ CVS system. As illustrated in Fig. 4, the simulator models the initial single-$V_{DD}$ design as a series of paths, each of which consists of a chain of fanout-of-four (FO4) inverters sandwiched between two flip-flops. The initial path delay distribution is assumed to be either lambda or wedge, as shown in Fig. 3. Three different logic depths (12, 20, and 40 FO4 inverter unit delays) are employed to evaluate the impact on power savings of a CVS system.

Fig. 4. Dual-$V_{DD}$ CVS simulation steps.

Initially, all flip-flops and inverters are  $V_{DDH}$  cells. The first step substitutes all  $V_{DDH}$  flip-flops with LCFFs. Since all LCFFs investigated are driven by a  $V_{DDL}$ -swing clock, this substitution can reduce clocking power as well [11]. For negative slack paths caused by the increased delay of LCFFs, the  $V_{DDH}$ inverters are upsized to maintain the original clock cycle time. The FO3-equivalent capacitive load connected to the output of each  $V_{DDH}$  inverter remains unchanged. Then,  $V_{DDH}$  inverters are replaced with  $V_{DDL}$  inverters in each noncritical path until positive slack disappears. This replacement proceeds in reverse order from the end of each path to build the CVS structure. Finally, the simulator calculates the power of the CVS structure and compares it with the power of the initial single- $V_{DD}$  design. The impact of different LCFFs and different logic depths on power saving is quantified by this simulator, which is not possible using a theoretical approach [5].
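The gate replacement step described above can be sketched in a few lines. This is a simplified illustration and not the authors' Perl simulator: it omits the LCFF substitution and inverter upsizing steps, counts only inverter switching power, and uses assumed values for the $V_{DDL}$ inverter delay (`d_low`, in FO4 units) and for the flip-flop delay. All function and parameter names are hypothetical.

```python
def assign_vddl(n_inverters, cycle_time, ff_delay, d_high=1.0, d_low=1.4):
    """Count how many inverters, taken from the end of the path backwards,
    can be moved to VDDL before the positive slack disappears.

    Delays are in FO4 units; d_low = 1.4 is an assumed VDDL/VDDH delay ratio.
    """
    n_low = 0
    delay = ff_delay + n_inverters * d_high
    penalty = d_low - d_high  # extra delay for each inverter moved to VDDL
    while n_low < n_inverters and delay + penalty <= cycle_time + 1e-12:
        delay += penalty
        n_low += 1
    return n_low


def cvs_logic_power_ratio(path_depths, cycle_time, ff_delay,
                          vdd_ratio=0.7, d_low=1.4):
    """Combinational switching power of the CVS structure, normalized to the
    initial single-VDD design. Each VDDL inverter is assumed to consume
    vdd_ratio**2 of the power of a VDDH inverter."""
    p_single = 0.0
    p_cvs = 0.0
    for n in path_depths:
        n_low = assign_vddl(n, cycle_time, ff_delay, d_low=d_low)
        p_single += n
        p_cvs += (n - n_low) + n_low * vdd_ratio ** 2
    return p_cvs / p_single


if __name__ == "__main__":
    # Hypothetical set of path depths with a 12-FO4 cycle and a 2-FO4 flip-flop delay.
    depths = [10, 9, 8, 6, 5, 3]
    print(cvs_logic_power_ratio(depths, cycle_time=12.0, ff_delay=2.0))
```

A critical path (no slack) keeps all of its $V_{DDH}$ inverters, whereas a short path can be converted entirely to $V_{DDL}$, which is what produces the clustered structure.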

#### III. REFERENCE LEVEL CONVERTERS

## A. Basic Circuit Structures for Level Conversion

Fig. 1 shows four types of basic level converter circuits: (a) a cross-coupled pMOS pair (CCLC) [11]; (b) a single-supply ($V_{DDH}$-only) diode-voltage-limited buffer (SSLC) [12]; (c) a pass-transistor half latch; and (d) a precharged circuit. A simple inverter pair suffers from a severe leakage current flowing through a pMOS which is weakly turned off by a $V_{DDL}$ input. Our SPICE simulation shows that the dc current is 2400 times larger than the subthreshold leakage current of a pMOS properly cut off by a $V_{DDH}$ input in a typical 0.13-$\mu$m technology. Such excessive leakage is not acceptable for standby-power-constrained applications. The CCLC has been widely used but its operation is relatively slow. The SSLC has been recently proposed in order to eliminate the layout placement restrictions of the level converter [12]. The performance comparison between these two asynchronous level converters is discussed in Section III-B. The half-latch topology contains a small number of transistors and is a promising level converter to minimize delay, power, and area penalties. The precharged implementation is fast, but it requires a low-swing-clock precharge mechanism. The last two circuits are embedded in the proposed LCFFs shown in Section IV.

## B. Asynchronous Level Converters for Extended CVS

Extended CVS (ECVS) [11] for dual-supply designs allows conversion from  $V_{DDL}$  to  $V_{DDH}$  anywhere within the combinational logic block using an asynchronous level converter. This technique provides added flexibility in assigning gates to different supply domains which yields incremental savings over CVS for some delay distributions. In an ECVS design, an asynchronous level converter is separated from a flip-flop and the sum delay of the two circuit elements tends to be larger than the delay of a flip-flop embedding a level converter which is used for a CVS design. The increased delay penalty reduces the amount of the added power saving of ECVS and negatively impacts the robustness of the dual-supply design.

In order to make a fair comparison between asynchronous and synchronous level conversions, it is necessary to find the best performing asynchronous level converter as a reference for a level-converting flip-flop. We employ the level converter circuits shown in Fig. 1(a) and (b), CCLC and SSLC, as candidates for our investigation. An alternative structure for CCLC has been proposed in [13], but it exhibits smaller delay and power than the conventional CCLC only at extremely low $V_{DDL}$ ($V_{DDL} < 45\%$ of $V_{DDH}$); thus, it is excluded from our analysis. The $V_{DDL}$ value of 1.02 V (= 85% of $V_{DDH}$) is used for the simulation, which is determined from the $V_{TH}$ drop across MN1 of SSLC as depicted in Fig. 1(b). Each level converter is sized for minimal delay with the simulation conditions summarized in Table I.

TABLE I SPICE SIMULATION CONDITIONS

| Parameter             | Value                                                                                                                       |
|-----------------------|-----------------------------------------------------------------------------------------------------------------------------|
| Channel length        | 0.13 µm                                                                                                                     |
| $V_{DDH}$ / $V_{DDL}$ | 1.20 V / 1.02 V (= 85% of $V_{DDH}$) for async. LC comparison<br>1.20 V / 0.84 V (= 70% of $V_{DDH}$) for LCFF comparison |
| Temperature           | 27 °C                                                                                                                       |

TABLE II COMPARISON OF ASYNCHRONOUS LEVEL CONVERTER PROPERTIES

| Property                                        | CCLC | SSLC | Ratio (SSLC/CCLC) |
|-------------------------------------------------|------|------|-------------------|
| Delay [ps]                                      | 127  | 166  | 1.31              |
| Delay spread for ±10% supply bounce [ps]        | 28   | 93   | 3.32              |
| Energy per transition [fJ]                      | 12.7 | 11.2 | 0.88              |
| Leakage power at nominal supply voltage [nW]    | 145  | 113  | 0.78              |
| Worst leakage power for ±10% supply bounce [nW] | 174  | 206  | 1.18              |
| Total transistor width W [µm]                   | 3.68 | 4.82 | 1.31              |

Table II compares the properties of the two level converters. Delay spread and worst-case leakage for $\pm 10\%$ $V_{DDL}/V_{DDH}$ supply bounce are measured as robustness metrics. Although SSLC shows improved energy per transition and leakage power at nominal supplies, its delay and area penalties are larger than those of CCLC. The SSLC circuit also performs poorly in terms of robustness: the delay spread and worst-case leakage power values indicate that CCLC is the better choice for the reference asynchronous level converter. Since SSLC limits its first-stage inverter supply voltage by the diode-connected nMOS, MN1, its drive strength is extremely sensitive to supply bounce, which leads to the large delay spread of the circuit. In addition, the diode-limited voltage of the first-stage supply depends on $V_{DDH}$ rather than $V_{DDL}$, so the converter incurs a significant leakage increase when it experiences a lowered $V_{DDL}$ swing on its input together with a raised $V_{DDH}$ supply.

As a result of the above analysis, the conventional CCLC is employed as the reference asynchronous level converter for an ECVS design in the following discussions, although the SSLC is very attractive from the layout perspective.

#### IV. LEVEL-CONVERTING FLIP-FLOPS

## A. Flip-Flop Characterization Metrics

Two important metrics to characterize flip-flop timing are d-q delay D and race immunity R [14]. The former is the sum of setup time $t_{setup}$ and clk-q delay $t_{clk-q}$, while the latter is the difference between $t_{clk-q}$ and hold time $t_{hold}$. We introduce another timing metric, sampling window S, which is the sum of $t_{setup}$ and $t_{hold}$. Average flip-flop energy per clock cycle, E, defined in [14], is obtained by summing the energy value for each data transition ($0 \rightarrow 0$, $0 \rightarrow 1$, $1 \rightarrow 1$, and $1 \rightarrow 0$) weighted by the corresponding probability of each transition. The energy-delay product (EDP) [14], [15] is also calculated from the delay D and the energy E to compare the energy-delay tradeoff among the flip-flops. HSPICE is used to obtain the parameter values. Simulation conditions for LCFF characterization are listed in Table I. Since circuit robustness to supply noise is an important criterion in dual-supply design, the sensitivity to supply bounce is measured in terms of 1) the d-q delay spread of each LCFF with respect to $\pm 10\%$ $V_{DDH}/V_{DDL}$ bounce and 2) the dual-$V_{DD}$ CVS critical path delay spread with respect to $\pm 10\%$ $V_{DDH}/V_{DDL}$ bounce at various logic depths.

## B. Flip-Flop Optimization Method

The flip-flop test bench is similar to one in [16] with flip-flop input pin capacitance constrained to be less than 3 fF and output load fixed at 17 fF. Data transition probability for calculating the energy E is assumed to be 10% of clock activity for both transitions (0  $\rightarrow$  1 and 1  $\rightarrow$  0) [17].
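The metrics of Section IV-A and the transition probabilities above can be combined as in the following minimal sketch. The class and function names are hypothetical, the timing and energy numbers in the usage example are placeholders rather than measured values, and the split of the remaining 80% of cycles equally between $0 \rightarrow 0$ and $1 \rightarrow 1$ is an assumption (the text specifies only the 10% probabilities of the two switching transitions).

```python
from dataclasses import dataclass


@dataclass
class FlipFlopTiming:
    t_setup: float  # setup time [ps]
    t_hold: float   # hold time [ps]
    t_clk_q: float  # clock-to-output delay [ps]

    @property
    def d(self) -> float:
        """d-q delay D = t_setup + t_clk-q."""
        return self.t_setup + self.t_clk_q

    @property
    def r(self) -> float:
        """Race immunity R = t_clk-q - t_hold."""
        return self.t_clk_q - self.t_hold

    @property
    def s(self) -> float:
        """Sampling window S = t_setup + t_hold."""
        return self.t_setup + self.t_hold


def average_energy(energy_fj: dict, probability: dict) -> float:
    """Average energy per clock cycle: per-transition energy weighted by probability."""
    if abs(sum(probability.values()) - 1.0) > 1e-9:
        raise ValueError("transition probabilities must sum to 1")
    return sum(energy_fj[t] * probability[t] for t in probability)


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    timing = FlipFlopTiming(t_setup=50.0, t_hold=20.0, t_clk_q=150.0)
    probs = {"0->1": 0.10, "1->0": 0.10, "0->0": 0.40, "1->1": 0.40}  # 40/40 split assumed
    energies = {"0->1": 10.0, "1->0": 8.0, "0->0": 1.0, "1->1": 2.0}
    e = average_energy(energies, probs)
    print(timing.d, timing.r, timing.s, e, e * timing.d)  # last value is the EDP
```

Note that D = S + R by construction, which is why the d-q delay bar in a timing comparison can be divided into the sampling window and the race immunity.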

We use the optimizer built in HSPICE to explore the energy-delay (E-D) design space and to find the optimal transistor sizing of each LCFF circuit which gives the minimal EDP value. Transistor sizes in each flip-flop are changed by the optimizer to find a minimal flip-flop energy E under a given  $t_{clk-q}$  constraint. Fig. 5 is obtained by repeating this optimization with different  $t_{clk-q}$  targets with added  $t_{setup}$  to obtain the d-q delay D. The thin lines show the EDP contours. The plot touching the minimal EDP curve gives the optimal sizing for each LCFF, which is indicated by a solid symbol in the figure.
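The selection of the optimal sizing from the sweep can be sketched as follows. This is only an illustration of the final selection step, assuming the energy-minimizing sizing for each $t_{clk-q}$ target has already been obtained from the circuit optimizer; the function name and the sweep points in the usage example are hypothetical.

```python
def minimal_edp_point(sweep):
    """Return the (D [ps], E [fJ]) pair with the smallest energy-delay product.

    `sweep` holds one pair per t_clk-q target, where E is the minimal energy
    found under that constraint and D already includes the setup time.
    """
    return min(sweep, key=lambda point: point[0] * point[1])


if __name__ == "__main__":
    # Hypothetical sweep results, for illustration only.
    sweep = [(150.0, 14.0), (180.0, 10.0), (220.0, 9.0), (260.0, 8.5)]
    d, e = minimal_edp_point(sweep)
    print(f"optimal sizing at D = {d} ps, E = {e} fJ, EDP = {d * e} fJ*ps")
```

Geometrically, this corresponds to the point where the energy-delay curve of a flip-flop touches the lowest constant-EDP contour.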

## C. Conventional Level-Converting Flip-Flops

The master–slave (M-S) type conventional LCFF [5], denoted as MSCC, is shown in Fig. 6(a). This flip-flop shifts its $V_{DDL}$ input to the $V_{DDH}$ level by using a cross-coupled level converter. The shaded gates in all the schematics represent $V_{DDL}$ gates and the underlined nodes show $V_{DDL}$-swing signals. Fig. 6(b) shows SPICE waveforms of the flip-flop.

Fig. 5. LCFF optimization in energy-delay space. Minimal EDP point for each LCFF is shown by a solid symbol.

Pulsed flip-flops frequently exhibit smaller d-q delay D than M-S flip-flops [17]. By designing a pulsed LCFF, more timing slack from the reduced delay D can be utilized for the additional substitution of $V_{DDH}$ gates by $V_{DDL}$ gates for increased power savings. Fig. 7(a) shows the schematic of a pulsed sense-amplifier LCFF (PSA), which incorporates the improved RS latch stage introduced by [18] into another conventional LCFF reported in [5]. This structure is expected to yield small delay D at the expense of increased energy consumption E due to repeated charging-discharging operations on nodes sb and rb. SPICE waveforms shown in Fig. 7(b) illustrate such repeated voltage swings on node sb even with two consecutive high inputs on node d.

## D. Proposed Level-Converting Flip-Flops

Fig. 8(a) depicts the first of the designed LCFFs, MSHL, which is an M-S latch pair with a half-latch level converter embedded on its slave side. The high-level output from the master stage experiences a $V_{TH}$ drop across the clocked nMOS (MN1), and the full voltage is restored by the pull-up inverter loop, which is triggered by the series nMOS pull-down path (MN2 and MN3). This technique is commonly used for level restoration in pass-transistor networks. The SPICE waveforms are shown in Fig. 8(b). As compared to MSCC, this simple half-latch implementation has a smaller transistor count and reduced clock loading.

Figs. 9(a) and 10(a) show two types of proposed pulsed LCFFs. In these two cases, the outputs are inverted in order to decouple the feedback inverter loop by an output inverter from the external loading. The pulsed half-latch (PHL) in Fig. 9(a) has the same topology as the slave portion of MSHL, but its nMOS pass gate (MN1) is driven by a pulsed clock *ck*, generated from *clk* using the NAND gate (ND1) and the inverter delay line (IV1–3). Fig. 9(b) shows the SPICE waveforms.

In contrast to PHL, the pulsed-precharged level converter (PPR) in Fig. 10(a) realizes level conversion by the precharged circuit, where the $V_{DDL}$ signals, d and db, drive only the nMOS evaluation networks to prevent the dc current from flowing through pMOS transistors. The $V_{DDH}$ output is generated by the precharged level on node x. Precharge operation on node x is completed by the combination of the nMOS precharge device (MN1) and the back-to-back inverter loop. MN2 in this inverter loop needs to be clocked to avoid serious contention between MN1 and MN2 at the beginning of the precharge cycle. Since MN1 has a source-follower connection, it quickly loses its pull-up current as the voltage on node x approaches $V_{DDL} - V_{TH}$. The inverter loop takes over the remaining precharge operation. This transition is observed as the slight kink on the rising edge of node x in Fig. 10(b). IV1 is skewed to have an inversion threshold well below $V_{DDL} - V_{TH}$ so that it can be flipped before MN1 loses its pull-up current.

Fig. 6. Conventional LCFF, MSCC, from [5] (master–slave, cross-coupled level converter). (a) Schematic. (b) SPICE waveforms.

The conditional data capture capability [19] is added to avoid unnecessary discharging of node x when the flip-flop captures two consecutive high inputs on d. The waveforms in Fig. 10(b) show the conditional capture operation, in which unnecessary discharging of node x at the second rising edge of clk is effectively suppressed by the NOR gate (NR1) detecting the high level on node qb from the previous cycle.

Fig. 7. PSA (pulsed, sense amplifier-based level converter) based on [5] and [18]. (a) Schematic. (b) SPICE waveforms.

An alternative LCFF from [6] employs a self-precharging mechanism instead of using the clocked precharge device. The circuit needs to have a noninverting output to trigger self-precharging and incurs additional delay and energy penalties.

#### V. COMPARISON

## A. Level-Converter Performance

Fig. 11 compares the three timing metrics of the optimally sized LCFFs. The full length of each bar represents the d-q delay D, which is divided into the sampling window S and the race immunity R. The timing of a normal $V_{DDH}$ D-flip-flop and of a $V_{DDL}$ D-flip-flop together with an asynchronous level converter is also shown. The CCLC in Fig. 1(a) is employed as the asynchronous level converter based on the comparison results shown in Table II.

Fig. 8. Master-slave, half-latch level converter (MSHL). (a) Schematic. (b) SPICE waveforms.

The delay sum of the $V_{DDL}$ D-flip-flop and the asynchronous level converter represents the delay penalty of performing level conversion in combinational logic in an ECVS design, and its value is found to be far larger than any of the LCFF delay values. To compensate for this delay penalty, ECVS needs to be able to place many more gates in the $V_{DDL}$ domain, which is often not possible. All the proposed LCFFs exhibit smaller d-q delay D values than the conventional MSCC. A larger reduction in delay is accomplished by PHL and PPR than by MSHL. The delay improvement of these flip-flops comes at the expense of a large sampling window S (or small race immunity R) due to their pulse-driven nature. Race caused by the widened window S, however, should not be a serious issue in a CVS design since all the short paths preceding the LCFFs are slowed down by replacing $V_{DDH}$ gates with $V_{DDL}$ gates. The small delay values of the two proposed pulsed LCFFs are even comparable to that of the conventional fast LCFF, PSA. The notable advantage of these circuits over PSA is that they have a much smaller energy penalty, as shown in Table III. The table summarizes energy, delay, and area parameters of each LCFF obtained at its optimal transistor sizing.

Fig. 9. PHL (pulsed, half-latch level converter). (a) Schematic. (b) SPICE waveforms.

The unique benefit of the precharged flip-flop (PPR) is that its  $t_{clk-q}$  is comparable to that of the  $V_{DDH}$  D-flip-flop. As mentioned in Section II-B, all the  $V_{DDH}$  flip-flops are replaced by LCFFs for reduced clocking power in CVS designs and this small  $t_{clk-q}$  property of PPR is very attractive if a path that follows the LCFF is timing critical.

According to Table III, an 11% reduction in EDP is achieved by MSHL over MSCC. PPR has the smallest EDP due to its significant decrease in the d-q delay D in spite of the larger energy E than MSCC. Both of the pulsed LCFFs—PHL and PPR—show more than 30% improvement in EDP. The conventional PSA has increased EDP since its high energy consumption cannot be compensated by the delay reduction.
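The normalized EDP figures quoted above can be reproduced directly from the Table III entries. The sketch below uses only the tabulated values; the dictionary and function names are hypothetical. The product of the listed E and D differs from the listed EDP by a fraction of a percent, presumably because the tabulated E and D are rounded.

```python
# (E [fJ], D [ps], EDP [fJ*ps]) as listed in Table III.
TABLE_III = {
    "MSCC": (9.13, 287, 2618),
    "PSA": (15.56, 184, 2863),
    "MSHL": (9.03, 259, 2341),
    "PHL": (8.84, 204, 1806),
    "PPR": (9.72, 181, 1755),
}


def edp_relative(name: str, reference: str = "MSCC") -> float:
    """Listed EDP of `name` normalized to the listed EDP of `reference`."""
    return TABLE_III[name][2] / TABLE_III[reference][2]


def edp_mismatch(name: str) -> float:
    """Relative difference between E*D and the listed EDP (rounding check)."""
    e, d, edp = TABLE_III[name]
    return abs(e * d - edp) / edp


if __name__ == "__main__":
    for ff in TABLE_III:
        print(f"{ff}: EDP ratio {edp_relative(ff):.2f}, "
              f"E*D mismatch {edp_mismatch(ff):.2%}")
```

The ratios of 0.89 for MSHL and 0.69 and 0.67 for PHL and PPR correspond to the 11% and more-than-30% EDP reductions stated in the text, while the ratio of 1.09 for PSA reflects its increased EDP.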

## B. Level Converter Robustness

A dual- $V_{DD}$  CVS system must be carefully designed to minimize supply bounces on both  $V_{DDL}$  and  $V_{DDH}$  rails. Other-



Fig. 10. Pulsed, precharged level converter (PPR). (a) Schematic. (b) SPICE waveforms.

system. Fluctuation of d-q delay D caused by $\pm 10\%$ bounce of $V_{DDL}$ and $V_{DDH}$ is shown in Fig. 12. Since the delay spread needs to be budgeted as an uncertainty component with respect to cycle time $T_c$, its absolute values are compared. The figure also includes the fluctuation value for the combination of the $V_{DDL}$ D-flip-flop and the asynchronous level converter. This confirms that level conversion in combinational logic for ECVS, using an asynchronous level converter separate from a flip-flop, suffers from a large delay fluctuation penalty (+27%) due to supply bounce and that an LCFF is more robust to supply noise. The three proposed LCFFs yield comparable or smaller fluctuations compared with MSCC. A maximum reduction of 24% in delay spread is obtained by PHL among the proposed LCFFs. PSA is significantly more robust against the supply noise due to its differential nature, but this merit comes with the energy penalty mentioned in Section IV-C.

TABLE III FLIP-FLOP ENERGY, DELAY, AND AREA PARAMETERS

|      | E [fJ] | D [ps] | EDP [fJ·ps] | Total W [µm] | # of Tr. | Area [track²] |
|------|--------|--------|-------------|--------------|----------|---------------|
| MSCC | 9.13   | 287    | 2618 (1.00) | 9.53         | 26       | 264 (1.00)    |
| PSA  | 15.56  | 184    | 2863 (1.09) | 8.18         | 33       | 288 (1.09)    |
| MSHL | 9.03   | 259    | 2341 (0.89) | 9.86         | 23       | 216 (0.82)    |
| PHL  | 8.84   | 204    | 1806 (0.69) | 8.25         | 23       | 216 (0.82)    |
| PPR  | 9.72   | 181    | 1755 (0.67) | 10.88        | 31       | 288 (1.09)    |

Fig. 11. LCFF timing comparison. Flip-flop d-q delay D is divided into sampling window S and race immunity R. Horizontal axis: timing with respect to the clock edge (ps).

Fig. 12. Delay spread with $\pm 10\%$ $V_{DDH}/V_{DDL}$ bounce.

Fig. 6(b) shows that the conventional MSCC experiences a severe glitch on the master-latch feedback node mf whose magnitude reaches as high as 20% of $V_{DDL}$. The glitch appears on the rising edge of the clock clk due to charge sharing between mf and the level converter input via the clocked pass gate (PG2). Such a large glitch may disturb the logic value stored in the master latch, especially when it coincides with other disturbances such as coupling. As a consequence, the noise margin of the flip-flop may deteriorate. MSHL and PHL avoid this problem, as shown in Figs. 8(b) and 9(b), since their latch feedback nodes have no loading gates that cause similar charge sharing. Although PPR also exhibits a significant glitch on precharge node x due to charge sharing for consecutive high inputs to d, as shown in Fig. 10(b), sufficient noise margin is still guaranteed since IV1 in Fig. 10(a) is skewed to have a low inversion threshold (below $V_{DDL} - V_{TH}$) so that it can take over the precharge pull-up operation triggered by the source-follower nMOS, MN1.

## C. Level Converter Layout

Robust level converter design requires both  $V_{DDL}$  and  $V_{DDH}$  to be supplied to the cell. If the cell is implemented in the  $V_{DDH}$  domain, one possible solution is to route the  $V_{DDL}$  wire to it [11], and the router must guarantee required IR drop and electromigration constraints. An interesting alternative is to use the SSLC [12], shown in Fig. 1(b), but the circuit is found to have robustness problems in terms of delay and leakage as discussed in Section III-B.

A more robust solution is to implement the dual-rail cell in which the two supply rails travel side-by-side to provide the two voltages to the cell. Such a layout does not comply with the conventional ASIC standard-cell power routing. In this work, we employ a double-cell-height architecture in which  $V_{DDH}$  and  $V_{DDL}$  supplies are available through the top and the bottom metal-1 rails, respectively, while the shared ground rail travels at the center of the cell. The width of the ground rail is twice the width of the other rails in order to have consistent abutment with neighboring single-height ASIC cells. The double-height architecture allows us to place pMOS transistors driven by  $V_{DDL}$  supply in a different standard-cell row from those driven by  $V_{DDH}$  supply and the area penalty caused by well separation can be avoided [3].

Layout patterns of MSCC, PSA, MSHL, PHL, and PPR based on the double-height topology are shown in Fig. 13 and the layout areas are summarized in Table III. The doubled cell height of  $2 \times 12$  tracks is shared by all the layouts. MSHL and PHL have smaller area by 18% compared to MSCC thanks to their simple circuit topologies, while PPR and PSA show 9% area increase due to their more complex transistor connections.
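The area figures quoted above follow directly from the Table III entries. As a worked check, and assuming that the listed area is simply the shared cell height of $2 \times 12 = 24$ tracks multiplied by the cell width $W_{cell}$ in tracks (an interpretation, not a statement from the text):

$$
\frac{A_{\mathrm{MSHL}}}{A_{\mathrm{MSCC}}} = \frac{A_{\mathrm{PHL}}}{A_{\mathrm{MSCC}}} = \frac{216}{264} \approx 0.82,
\qquad
\frac{A_{\mathrm{PPR}}}{A_{\mathrm{MSCC}}} = \frac{A_{\mathrm{PSA}}}{A_{\mathrm{MSCC}}} = \frac{288}{264} \approx 1.09
$$

$$
W_{cell} = \frac{A}{24}: \quad \frac{264}{24} = 11 \ \text{(MSCC)}, \quad \frac{216}{24} = 9 \ \text{(MSHL, PHL)}, \quad \frac{288}{24} = 12 \ \text{(PSA, PPR)}
$$

Under that assumption, the 18% saving corresponds to a cell that is two tracks narrower than MSCC, and the 9% increase to a cell that is one track wider.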

## D. System-Level Performance

The impact of each LCFF on system-level power is investigated by using the simple dual- $V_{DD}$  CVS simulator described in Section II-B and its results are plotted in Fig. 14. The power of the CVS structure normalized to the initial single- $V_{DD}$  power is simulated at different logic depths. Two path delay distributions shown by the insets are tested. Since PHL and PPR have the output inverted, FO1 inverter delay and power are added in the CVS simulation for fair comparison.



Fig. 13. LCFF layout patterns based on the double-height architecture. (a) MSCC. (b) PSA. (c) MSHL. (d) PHL. (e) PPR.



Fig. 14. Dual- $V_{DD}$  CVS system power at different logic depths for two delay distributions: (a) lambda shaped; (b) wedge shaped. CVS power values are normalized to the power of the initial single- $V_{DD}$  design.

For both path delay distributions, all the proposed LCFFs lower the CVS power below that of the CVS design using the conventional MSCC. The power savings become larger as the logic depth decreases; therefore, the proposed LCFFs are more attractive for higher-performance, more deeply pipelined designs. PHL exhibits the lowest power, and its power saving over the MSCC design reaches 9% for the lambda-shaped delay distribution and 11% for the wedge-shaped distribution. Since the wedge-shaped delay distribution contains more critical paths, the LCFFs having smaller d-q delay D are more effective. Although PPR shows a lower D than PHL, it consumes more energy than PHL (Table III), thus losing its advantage in the CVS system. The severe energy penalty of PSA causes the system-level power to exceed that of the conventional MSCC-based CVS design. This suggests that a balanced reduction of both delay and energy of an LCFF is the key to achieving improved power saving in a CVS system, which is best realized by the proposed LCFF, PHL.

Fig. 15 plots the power component breakdown of the initial single- $V_{DD}$  design, the MSCC-based CVS, and the PHL-based CVS for a logic depth of 12 with the lambda-shaped delay distribution. Total power of each design is divided into three components: flip-flop logic power, flip-flop clocking power, and combinational logic power. Each component is normalized to the total power of the single- $V_{DD}$  design. It should be noted that the largest power saving of the CVS designs comes from the clocking power reduction due to low-swing clocking of the LCFF circuits, not from the  $V_{DDH} \rightarrow V_{DDL}$  gate replacement in the combinational logic portion. The 9% improvement of the total CVS system power of the PHL-based design over the MSCC-based design shown in Fig. 14(a) is accomplished by a reduction of the combinational logic power  $(0.41 \rightarrow 0.28)$  that is nearly twice as large as in the MSCC case  $(0.41 \rightarrow 0.34)$ . This results mainly from the reduced delay penalty of PHL.
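The "nearly two times" comparison can be checked directly from the three normalized combinational logic power values quoted above. The following is a minimal sketch of that arithmetic only; the variable and function names are illustrative and do not come from the CVS simulator.

```python
# Normalized combinational logic power values quoted in the text
# (logic depth of 12, lambda-shaped delay distribution, Fig. 15).
SINGLE_VDD = 0.41  # initial single-V_DD design
MSCC_CVS = 0.34    # CVS design using the conventional MSCC
PHL_CVS = 0.28     # CVS design using the proposed PHL


def reduction(before: float, after: float) -> float:
    """Absolute drop in normalized power (illustrative helper)."""
    return before - after


mscc_drop = reduction(SINGLE_VDD, MSCC_CVS)
phl_drop = reduction(SINGLE_VDD, PHL_CVS)
ratio = phl_drop / mscc_drop

print(f"MSCC-based CVS reduction: {mscc_drop:.2f}")  # prints 0.07
print(f"PHL-based CVS reduction:  {phl_drop:.2f}")   # prints 0.13
print(f"PHL / MSCC ratio:         {ratio:.2f}x")     # prints 1.86x
```

The ratio of about 1.86 is what the text rounds to "nearly twice".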

#### E. System-Level Robustness

In a dual- $V_{DD}$  CVS system, the critical paths have different numbers of  $V_{DDL}$  gates depending on how much timing slack is available on each of the original single- $V_{DD}$  paths. This varies the delay sensitivity of a critical path to supply bounce since the  $V_{DDL}$  gate has larger supply-bounce sensitivity than the  $V_{DDH}$  gate as shown in Fig. 2. The worst-case delay spread occurs for a critical path having only  $V_{DDL}$  gates.
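The trend described above can be illustrated with a simple analytical stand-in. The sketch below assumes the alpha-power-law delay model of [7], in which gate delay is proportional to $V_{DD}/(V_{DD}-V_{TH})^{\alpha}$, in place of circuit simulation. The supply, threshold, and $\alpha$ values are hypothetical and chosen only for illustration. All gates are treated as identical, and both supplies are assumed to bounce in the same direction by the same fraction.

```python
# Hypothetical parameters for illustration only (not taken from the paper).
V_DDH = 1.2    # high supply, volts (assumed)
V_DDL = 0.8    # low supply, volts (assumed)
V_TH = 0.35    # threshold voltage, volts (assumed)
ALPHA = 1.3    # velocity-saturation index (assumed)
BOUNCE = 0.10  # +/-10% supply bounce, as in the text


def gate_delay(vdd: float) -> float:
    """Alpha-power-law delay up to a constant factor: V / (V - Vth)^alpha."""
    return vdd / (vdd - V_TH) ** ALPHA


def relative_spread(vdd: float) -> float:
    """Delay spread under +/-BOUNCE, normalized to the nominal gate delay."""
    slow = gate_delay(vdd * (1 - BOUNCE))
    fast = gate_delay(vdd * (1 + BOUNCE))
    return (slow - fast) / gate_delay(vdd)


def path_spread(n_low: int, n_high: int) -> float:
    """Spread of a path with n_low V_DDL gates and n_high V_DDH gates,
    normalized to the nominal path delay."""
    d_low, d_high = gate_delay(V_DDL), gate_delay(V_DDH)
    nominal = n_low * d_low + n_high * d_high
    spread = (n_low * d_low * relative_spread(V_DDL)
              + n_high * d_high * relative_spread(V_DDH))
    return spread / nominal


for n_low in (0, 6, 12):
    print(n_low, 12 - n_low, round(path_spread(n_low, 12 - n_low), 3))
```

Under these assumptions the normalized spread increases with the number of $V_{DDL}$ gates and is largest for the all- $V_{DDL}$  path, which matches the worst case identified in the text. The absolute numbers depend entirely on the assumed parameters.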

Fig. 16 shows the worst-case delay spread of a critical path at different logic depths for various level conversion styles assuming  $\pm 10\%$  supply bounce. Delay spread values are normalized to cycle time  $T_c$ . Since the spread includes the contribution from the level converter circuits, the critical path containing a less robust LCFF becomes less robust to supply bounce as well. The figure includes the result corresponding to an ECVS design using the  $V_{DDL}$  D-flip-flop and the asynchronous level converter placed separately in the critical path. As compared to the MSCC-based CVS design, the critical path sensitivity can be improved by 13% by employing the proposed PHL whereas the sensitivity is degraded by 14% for the D-flip-flop and the asynchronous level converter combination. A 30% sensitivity improvement is possible with PSA, but its CVS power performance is very poor as indicated in Fig. 14(a) and (b).

Fig. 15. Power component breakdown of the initial single- $V_{DD}$  design, the MSCC-based CVS design, and the PHL-based CVS design. Results are for logic depth of 12 for lambda-shaped delay distribution.

Fig. 16. Dual- $V_{DD}$  CVS critical path delay spread for  $\pm 10\% V_{DDH}/V_{DDL}$  bounce at different logic depths. Delay spread values are normalized to cycle time  $T_c$ .

#### VI. CONCLUSIONS

Level conversion for ECVS using asynchronous level converters and that for CVS using LCFFs are compared. The advantages of the latter method are presented in terms of delay and robustness to supply bounce. Based on this comparison, three new LCFF circuits are proposed. Each circuit is optimally sized in the energy-delay design space to minimize EDP. Timing, energy, and robustness parameters of the optimized flip-flops are characterized and compared with those of the two conventional LCFFs. Layout patterns are generated for all the flip-flops to compare the area impact of the circuits accurately. Finally, the simple dual- $V_{DD}$  CVS simulator is used to quantify the system-level power saving of each flip-flop structure at various logic depths. The best overall performance is achieved by the PHL. This LCFF yields over 30% reduction in EDP and about 10% improvement in system-level CVS power together with 24% better robustness and 18% smaller layout size. In addition, the flip-flop reduces the critical path delay spread by 13% in a CVS design. The flip-flop also eliminates the charge-sharing glitch on the latch feedback node, which is a signal integrity risk in the conventional LCFF.

#### REFERENCES

- [1] R. W. Brodersen, M. A. Horowitz, D. Marković, B. Nikolić, and V. Stojanović, "Methods for true power minimization," in *Int. Conf. Computer-Aided Design Dig. Tech. Papers*, San Jose, CA, Nov. 2002, pp. 35–42.
- [2] A. Chandrakasan, S. Sheng, and R. W. Brodersen, "Low-power CMOS digital design," *IEEE J. Solid-State Circuits*, vol. 27, pp. 473–484, Apr. 1992.
- [3] Y. Shimazaki, R. Zlatanovici, and B. Nikolić, "A shared-well dual-supply-voltage 64-bit ALU," in *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers*, San Francisco, CA, Feb. 2003, pp. 104–105.
- [4] K. Usami and M. Horowitz, "Clustered voltage scaling technique for low-power design," in *Proc. Int. Symp. Low Power Design*, Dana Point, CA, Apr. 1995, pp. 3–8.
- [5] M. Hamada *et al.*, "A top-down low power design technique using clustered voltage scaling with variable supply-voltage scheme," in *Proc. IEEE Custom Integrated Circuits Conf.*, Santa Clara, CA, May 1998, pp. 495–498.
- [6] H. Mahmoodi-Meimand and K. Roy, "Self-precharging flip-flop (SPFF): A new level converting flip-flop," in *Proc. Eur. Solid-State Circuits Conf.*, Florence, Italy, Sept. 2002, pp. 407–410.
- [7] T. Sakurai and R. A. Newton, "Alpha-power law MOSFET model and its application to CMOS inverter delay and other formulas," *IEEE J. Solid-State Circuits*, vol. 25, pp. 584–594, Apr. 1990.
- [8] F. Ishihara, F. Sheikh, and B. Nikolić, "Level conversion for dual-supply systems," in *Proc. Int. Symp. Low Power Electronics and Design*, Seoul, Korea, Aug. 2003, pp. 164–167.
- [9] J. Tschanz *et al.*, "Design optimizations of a high performance microprocessor using combination of dual- $V_T$  allocation and transistor sizing," in *Symp. VLSI Circuits Dig. Tech. Papers*, Honolulu, HI, June 2002, pp. 218–219.
- [10] A. Srivastava and D. Sylvester, "Minimizing total power by simultaneous  $V_{DD}/V_{th}$  assignment," in *Proc. Asia and South Pacific Design Automation Conf.*, Kitakyushu, Japan, Jan. 2003, pp. 400–403.
- [11] K. Usami *et al.*, "Automated low-power technique exploiting multiple supply voltages applied to a media processor," *IEEE J. Solid-State Circuits*, vol. 33, pp. 463–472, Mar. 1998.
- [12] R. Puri *et al.*, "Pushing ASIC performance in a power envelope," in *Proc. Design Automation Conf.*, Anaheim, CA, June 2003, pp. 788–793.
- [13] C. Yu, W. Wang, and B. Liu, "A new level converter for low-power applications," in *Proc. Int. Symp. Circuits and Systems*, Sydney, Australia, May 2001, pp. 113–116.
- [14] D. Marković, B. Nikolić, and R. W. Brodersen, "Analysis and design of low-energy flip-flops," in *Proc. Int. Symp. Low Power Electronics and Design*, Huntington Beach, CA, Aug. 2001, pp. 52–55.
- [15] R. Gonzalez, B. A. Gordon, and M. A. Horowitz, "Supply and threshold voltage scaling for low power CMOS," *IEEE J. Solid-State Circuits*, vol. 32, pp. 1210–1216, Aug. 1997.
- [16] V. Stojanović and V. G. Oklobdzija, "Comparative analysis of master-slave latches and flip-flops for high-performance and low-power systems," *IEEE J. Solid-State Circuits*, vol. 34, pp. 536–548, Apr. 1999.
- [17] J. Tschanz, S. Narendra, Z. Chen, S. Borkar, M. Sachdev, and V. De, "Comparative delay and energy of single edge-triggered and dual edge-triggered pulsed flip-flops for high-performance microprocessors," in *Proc. Int. Symp. Low Power Electronics and Design*, Huntington Beach, CA, Aug. 2001, pp. 147–152.
- [18] B. Nikolić, V. Stojanović, V. G. Oklobdzija, W. Jia, J. Chiu, and M. Leung, "Improved sense amplifier-based flip-flop: Design and measurements," *IEEE J. Solid-State Circuits*, vol. 35, pp. 876–884, June 2000.
- [19] B.-S. Kong, S.-S. Kim, and Y.-H. Jun, "Conditional-capture flip-flop technique for statistical power reduction," in *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers*, San Francisco, CA, Feb. 2000, pp. 290–291.

**Fujio Ishihara** received the B.E. and M.E. degrees in electrical engineering from Keio University, Yokohama, Japan, in 1991 and 1993, respectively.

He joined Toshiba Corporation, Kawasaki, Japan, in 1993 and has been involved in high-performance RISC microprocessor development since then. From 2001 to 2003, he studied at the University of California, Berkeley, as a Visiting Industrial Fellow. His research interests include low-power circuit design and high-speed clocking.

**Farhana Sheikh** (M'93) received the B.Eng. degree in systems and computer engineering (Chancellor's Medal) from Carleton University, Ottawa, ON, Canada in 1993 and the M.Sc. degree in electrical engineering and computer sciences from the University of California, Berkeley, in 1996, where she is currently working toward the Ph.D. degree in electrical engineering and computer sciences.

From 1993 to 1994, she worked for Nortel Networks as a Software Engineer in firmware and embedded systems design. In 1996, she joined the Research and Development Department of Cadabra Design Automation, Ottawa, where she spent two years as a Software Engineer and three years as a Senior R&D Manager specializing in automated synthesis of digital CMOS standard cells. Her research interests include low-power digital CMOS design, algorithms and design flows for automated design of multiple supply and multiple threshold CMOS circuits, and physical design for dual-supply CMOS circuits.

Ms. Sheikh received the NSERC'67 scholarship for graduate studies in 1994 and the Association of Professional Engineers of Ontario Medal for Academic Achievement in 1993.

**Borivoje Nikolić** (S'93–M'99) received the Dipl.Ing. and M.Sc. degrees in electrical engineering from the University of Belgrade, Belgrade, Yugoslavia, in 1992 and 1994, respectively, and the Ph.D. degree from the University of California, Davis, in 1999.

He was on the faculty of the University of Belgrade from 1992 to 1996. He spent two years with Silicon Systems, Inc., Texas Instruments Storage Products Group, San Jose, CA, working on disk-drive signal processing electronics. In 1999, he joined the Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, as an Assistant Professor. His research activities include high-speed and low-power digital integrated circuits and VLSI implementation of communications and signal-processing algorithms. He is a coauthor of *Digital Integrated Circuits: A Design Perspective* (2nd ed., Englewood Cliffs, NJ: Prentice-Hall, 2003).

Dr. Nikolić received the National Science Foundation CAREER award in 2003, the College of Engineering Best Doctoral Dissertation Prize and the Anil K. Jain Prize for the Best Doctoral Dissertation in Electrical and Computer Engineering from the University of California, Davis, in 1999, and the City of Belgrade Award for the Best Diploma Thesis in 1992.