
Nasserian, Mahshid; Maymandi-Nejad, Mohammd. *2013 21st Iranian Conference on Electrical Engineering (ICEE)*, pp. 1–6, 2013 (IEEE Conference Publications). DOI: 10.1109/IranianCEE.2013.6599736.


# A Low-Power Area-Efficient Dynamic Circuit Using Conditionally Charging Pattern with Embedded Latching Capability

Mahshid Nasserian, and Mohammd Maymandi-Nejad Electrical Engineering Department Ferdowsi University of Mashhad Mashhad, Iran Mahshid.Nasserian@stu-mail.um.ac.ir, maymandi@um.ac.ir

*Abstract*— High speed and small area are the main advantages of dynamic logic for digital circuits, while power consumption is the main drawback of this logic family. In this paper, a new method for reducing the power consumption of dynamic circuits is presented. The proposed technique is especially suitable for large fan-in gates, where the dynamic node discharges very frequently. Such gates are widely used in high-performance applications like microprocessors. The proposed method is applied to an 8-input NOR gate and an 8-input OR gate. In a 90-nm CMOS technology, the power-delay product of these gates is reduced by 46.7% and 35.15%, respectively, compared to their conventional dynamic counterparts. Meanwhile, we show that the proposed circuit has an inherent data-latching capability that can reduce silicon area in pipelined structures.

**Keywords:** Dynamic logic; power reduction; data latching; pipelined architecture; wide fan-in gate.

#### I. INTRODUCTION

Digital dynamic circuits can provide higher speed and better area efficiency than static CMOS circuits, and have been the subject of much research [1-5]. The dynamic architecture becomes more attractive as the fan-in of a gate grows. For example, a 40-bit tag comparator employs an OR gate with a fan-in of 80. If this wide fan-in OR gate were implemented in static CMOS, its pull-up network would need 80 series-stacked PMOS transistors, which is impractical. However, wide fan-in dynamic gates are very power hungry, because the capacitance of the dynamic node is very large and the node discharges in almost every clock cycle. Dynamic circuits also have a lower noise margin than static logic, and many approaches have been presented to enhance the noise margin of wide fan-in dynamic circuits [1, 6]. Such gates are extensively used in multiplexers [6], tag comparators [7], register files [8], fast adders [9], SRAM predecoders [10], programmable logic arrays (PLAs) [11], and programmable encoders [12].

The techniques that have been proposed for reducing the power of dynamic gates can generally be divided into two categories:

a) Reducing the voltage swing of the dynamic node

b) Improving the performance of the keeper

The first approach is effective because the energy drawn from the supply in each cycle decreases with the voltage swing of the dynamic node. In dynamic gates, the voltage of the dynamic node may change due to leakage current or noise, and keepers are used as a remedy for this problem. The problem with a keeper transistor is that it generates a contention current when the dynamic node is to be discharged, which leads to more power dissipation. The main goal of the second approach is therefore to reduce the contention current by using smaller or smarter keeper circuitry. Examples of both techniques are discussed in Section II.

In this paper, a new charging scheme for dynamic circuits is proposed that is suitable for reducing the power dissipation of large fan-in OR gates. In this approach, the dynamic node experiences a full swing only if it is to evaluate to 1 in the succeeding evaluation phase. The paper is organized as follows. Section II discusses a few state-of-the-art techniques for power saving in dynamic circuits. Section III presents the proposed architecture, and Section IV discusses its inherent latching property. Simulation results and discussions are provided in Section V, and Section VI concludes the paper.

#### II. PRIOR TECHNIQUES

As mentioned above, the main techniques for reducing power dissipation of dynamic circuits can be divided into two categories. In this section some of the previous methods are discussed.



Figure 1. Schematic diagram of the existing works for improving dynamic circuit characteristics by techniques: (a) Current comparison based Domino. (b) Conditional keeper. (c) High speed Domino. (d) Reduced dynamic swing domino. (e) Three-phase reduced-swing dynamic logic. (f) Noise-tolerant XOR-based conditional keeper.

#### A. Current-Comparison-Based Domino

An approach has recently been proposed to enhance the robustness and speed of wide fan-in dynamic circuits [1]. The main idea is to compare the current of the dynamic gate with its leakage current during the evaluation phase (Fig. 1-a). The reference circuit is a copy of the OR gate when all the inputs are high. This reference circuit produces the leakage current that is mirrored by M<sub>3</sub>. In the precharge phase, M<sub>D</sub> turns on and the *Out* node charges to  $V_{DD}$; in this phase, M<sub>1</sub> and the main keeper K<sub>1</sub> are off. In the evaluation phase, M<sub>D</sub> and M<sub>P</sub> turn off. If the PDN turns on, the gate voltage of M<sub>2</sub> drops and this transistor turns on. The difference between the drain current of M<sub>2</sub> and the mirrored leakage current flows into the *Dyn* node, so the *Dyn* node charges up and the *Out* signal goes low. The main drawback of this technique is the large number of extra transistors.

#### B. Conditional Keeper Domino (CKD)

In [13], a method is presented to enhance the keeper performance of wide fan-in dynamic OR gates. As shown in Fig. 1-b, the keeper transistor is divided into two smaller keepers,  $K_1$  and  $K_2$. At the beginning of the evaluation phase only  $K_1$  is active while  $K_2$  is off, so the PDN needs to overpower only  $K_1$. If the PDN stays off,  $K_2$  turns on after a delay equal to  $T_{delay\_element} + T_{NAND}$. To obtain the desired unity noise gain (UNG), the combined size of  $K_1$  and  $K_2$  is chosen equal to that of a conventional keeper.

#### C. High Speed Domino (HS-Domino)

Anis *et al.* have proposed a technique called high-speed domino that reduces the contention current at the beginning of the evaluation phase (Fig. 1-c) [14]. In this circuit, the keeper transistor is off at the rising edge of the *CLK* signal, because transistor  $M_1$  charges the gate of  $K_1$  in the precharge phase. After a delay determined by two inverters, if the PDN stays off,  $M_1$  turns off and  $M_2$  turns on, which activates  $K_2$. Although this technique is very effective in reducing the contention current, the circuit operation is not reliable at the beginning of the *CLK* cycle, when the dynamic node is not yet protected by the keeper.

#### D. Reduced Dynamic Swing Domino Logic

Another technique for reducing the power consumption of dynamic circuits is to reduce the voltage swing of the dynamic nodes by lowering the high level and raising the low level, as shown in Fig. 1-d [15]. In this approach, two voltage levels  $(V_{DDL} < V_{DD}, V_{GNDH} > V_{GND})$  are generated by diode-connected transistors, and the voltage of the dynamic node swings between  $V_{DDL}$  and  $V_{GNDH}$. This saves power at the dynamic node, since the power consumption is proportional to the voltage swing. The main drawback of this technique is the leakage current generated in the succeeding stage when that stage is a full-swing gate, because an input high level of only  $V_{DDL}$  does not fully turn off its PMOS transistors. In addition, converting the low-swing voltage levels back to full-swing levels is a non-trivial task. Moreover, if  $V_{DDL}$  and  $V_{GNDH}$  are generated by diode-connected MOSFETs, the voltage swing is affected by process variations.

The idea of reduced voltage swing is applied to a wide fan-in OR gate in [16] (Fig. 1-e). In this case, only the voltage swing across the large capacitance at the output of the PDN is reduced, and the rest of the circuit experiences a full swing. This technique adds a pre-evaluation phase to the timing of the circuit, which increases the delay of the gate. Moreover, the low voltage swing at the dynamic node reduces the noise margin of the circuit. Another constraint of this logic is that it requires two extra power supplies and two additional clock pulses.

#### E. Other Methods

In [6], parameters such as the power consumption and speed of traditional dynamic logic are optimized to achieve a targeted noise level. This is done by using resonant tunneling diodes and depletion-mode NMOS transistors to implement smart keepers. The main drawback is that these devices are not compatible with standard CMOS technology. In [17], an XOR gate is used instead of an inverter to turn the keeper on and off, as shown in Fig. 1-f. This gives the designer greater noise immunity, but the XOR gate makes the circuit more complex. Moreover, considerable noise may be coupled onto the dynamic node during the clock transition.

The reader can refer to [1] and [3] for more detailed comparisons between previous works.

#### III. THE PROPOSED LOW POWER DYNAMIC ARCHITECTURE

In dynamic logic circuits, the dynamic nodes are charged to  $V_{DD}$  in every clock cycle. This results in high power consumption, especially when the capacitance of the dynamic node is large and the node discharges very frequently in the evaluation phase, as is the case in wide fan-in NOR/OR gates. In the proposed circuit, the dynamic node is charged to  $V_{DD}$  only if it is not to be discharged in the succeeding evaluation phase.

Fig. 2 shows the basic circuit. In this circuit, transistor  $M_1$  and the pull-down network (PDN) behave as in conventional dynamic logic. However, an NMOS pull-up transistor ( $M_n$ ) is used instead of a PMOS one, so the dynamic node charges only to  $V_{DD}$  -  $V_{th,n}$  in the precharge phase. In low-voltage circuits, this voltage can be set to around  $V_{DD}/2$  by sizing  $M_n$  appropriately with respect to the duration of the precharge phase. Transistor  $M_b$  in Fig. 2 is the bleeder transistor, which compensates for the leakage current and for power-supply, substrate, and input noise. Another difference from the conventional counterpart is that the bleeder is driven by a clocked inverter, which helps reduce the direct-path current in this inverter.

The proposed circuit operates as follows. During the precharge phase (CLK = 0), the dynamic node (*Out*) is charged to  $V_{DD}$ - $V_{th,n}$, which is denoted  $V_S$  throughout this paper, and the bleeder transistor is off. In the evaluation phase (CLK = 1), the clocked inverter is enabled, and node *Out* is either discharged to *GND* through the PDN or pulled up to  $V_{DD}$  through the bleeder transistor. Each case is discussed separately below.

a) The dynamic node (*Out*) is to be discharged in the evaluation phase:

Since this node discharges to *GND* from a voltage below  $V_{DD}$, the power consumption is reduced compared with the case in which *Out* is precharged to  $V_{DD}$. Ignoring leakage and direct-path currents, and assuming the dynamic node is discharged in every cycle, the power consumption of a conventional dynamic circuit is given by the following equation [18].



Figure 2. The schematic of the proposed logic.

$$P_{dyn} = C_L V_{DD}^2 f_{CLK} \tag{1}$$

However, in the proposed circuit the dynamic power is:

$$P_{dyn} = \alpha_{0 \to V_{DD}} \times C_L V_{DD}^2 f_{CLK} + \alpha_{0 \to V_S} \times C_L V_{DD} V_S f_{CLK} \tag{2}$$

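where  $\alpha_{0\to V_{DD}}$  is the probability of the dynamic node being charged from 0 to  $V_{DD}$, and  $\alpha_{0\to V_S}$  is the probability of it being charged only from 0 to  $V_S$. In the proposed architecture, the  $0\to V_{DD}$  transition happens only when the output node is to evaluate to 1 during the evaluation phase. In wide fan-in gates this is rare: an $N$-input NOR evaluates to 1 only when all $N$ inputs are 0, which for uniformly distributed inputs happens with probability $2^{-N}$. Hence  $\alpha_{0\to V_S}$  is much greater than  $\alpha_{0\to V_{DD}}$, the second term in (2) is dominant, and we have: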

$$\frac{P_{conventional}}{P_{proposed}} \approx \frac{V_{DD}}{V_S} \tag{3}$$

According to (3), depending on the value of  $V_S$, a dynamic power reduction of up to  $1 - V_S/V_{DD}$  can be expected; for example, with  $V_S \approx V_{DD}/2$  the reduction approaches 50%. Note that in this case the delay of the circuit is also less than that of a conventional dynamic gate, since the output is discharged to *GND* from  $V_S$, which is less than  $V_{DD}$.
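The estimate in (1) to (3) can be checked with a few lines of arithmetic. The Python sketch below is a minimal behavioral model, not the authors' simulation setup: it assumes independent, uniformly distributed inputs in every cycle (so an $N$-input NOR keeps its dynamic node high with probability $2^{-N}$), ignores leakage and direct-path currents as (1) and (2) do, and uses hypothetical values for $C_L$, $V_S$ and the function name `dynamic_power`. The conventional power is (1) scaled by the probability that the node was discharged in the previous cycle.

```python
from typing import Tuple


def dynamic_power(n_inputs: int, c_load: float, vdd: float, vs: float,
                  f_clk: float) -> Tuple[float, float]:
    """Return (P_conventional, P_proposed) in watts for an N-input dynamic NOR.

    Illustrative assumptions (not from the paper's simulations):
    - inputs are independent and uniformly distributed in every cycle,
      so the PDN stays off (node evaluates to 1) with probability 2**-N;
    - leakage and direct-path currents are ignored, as in (1) and (2).
    """
    p_one = 2.0 ** -n_inputs   # node evaluates to 1 (PDN off)
    p_zero = 1.0 - p_one       # node is discharged by the PDN

    # Conventional gate: every discharge is followed by a full 0 -> V_DD
    # recharge drawing C_L * V_DD**2; (1) is the special case p_zero = 1.
    p_conv = p_zero * c_load * vdd**2 * f_clk

    # Proposed gate, following (2). After a discharge the NMOS pull-up
    # precharges the node 0 -> V_S (charge C_L*V_S drawn from V_DD). If the
    # node then evaluates to 1, the bleeder completes the swing to V_DD, so
    # the whole 0 -> V_DD transition costs C_L * V_DD**2. A node already at
    # V_DD stays there through precharge and draws no further charge.
    alpha_0_to_vdd = p_zero * p_one
    alpha_0_to_vs = p_zero * p_zero
    p_prop = (alpha_0_to_vdd * c_load * vdd**2
              + alpha_0_to_vs * c_load * vdd * vs) * f_clk
    return p_conv, p_prop


if __name__ == "__main__":
    # Hypothetical operating point: 8 inputs, 10 fF dynamic node,
    # V_DD = 1 V, V_S = 0.5 V (about V_DD/2), 2.4 GHz clock.
    p_conv, p_prop = dynamic_power(8, 10e-15, 1.0, 0.5, 2.4e9)
    print(f"P_conv = {p_conv * 1e6:.2f} uW, P_prop = {p_prop * 1e6:.2f} uW")
    print(f"ratio = {p_conv / p_prop:.3f}  (V_DD / V_S = {1.0 / 0.5:.1f})")
```

With these assumed values the printed ratio is about 1.992, just below $V_{DD}/V_S = 2$; the rare $0 \to V_{DD}$ cycles (probability $2^{-8}$ here) are what keep it from reaching (3) exactly.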

b) The dynamic node (*Out*) is not to be discharged in the evaluation phase:

In the evaluation phase, the PDN is off and the clocked inverter is enabled since *CLK* is 1. The voltage  $V_S$  is large enough to turn on the NMOS transistor of the inverter, which pulls the gate of the keeper transistor to *GND*, so the dynamic node charges to  $V_{DD}$. Since this node is charged to  $V_{DD}$, the proposed circuit behaves like a full-swing gate when connected to the following gates. In this regard, the keeper transistor plays a crucial role.

It should be emphasized that charging the dynamic node to a voltage below  $V_{DD}$  does not increase the power consumption of the clocked inverter that drives the keeper. This can be explained as follows. During the precharge phase (*CLK = 0*), the pull-down path of the clocked inverter is off, so no direct path exists between  $V_{DD}$  and *GND*. At the beginning of the evaluation phase (*CLK = 1*), the output of the clocked inverter ( $\overline{Out}$ ) is high, so no current flows through M<sub>2</sub> irrespective of its gate voltage. In this way, the direct-path current in the inverter is not larger than that of a conventional gate, even though the gate of M<sub>2</sub> is not fully charged to  $V_{DD}$.

To verify the functionality of the proposed architecture, an 8-input OR/NOR gate is implemented, and the waveforms of its important nodes are shown in Fig. 3. As can be seen in this figure, most of the time node *Out* is not charged to  $V_{DD}$, which leads to a considerable power saving. Also note that node  $\overline{Out}$  stays at  $V_{DD}$  most of the time instead of discharging to *GND* in every clock cycle as in the conventional dynamic circuit, which leads to a further power saving. Moreover, if  $\overline{Out}$  is connected to a static gate, the switching activity of the cascaded gate is reduced, leading to lower power consumption.

Figure 3. Voltage waveforms of an 8-input NOR gate.

Note that in the reduced-swing technique, the propagation delay of the gate increases because of the lower effective gate-source voltage. In the proposed architecture, the propagation delay is not degraded since the supply voltage is not lowered.

#### IV. INTRINSIC PROPERTY OF DATA-LATCHING IN THE PROPOSED CIRCUIT

Pipelining is one of the most suitable choices for high-performance applications, especially when the delay through the critical path is long. While pipelining allows a higher clock frequency, it requires extra registers between stages, at the expense of extra area, power consumption, and timing issues. A typical pipeline structure consists of a chain of alternating CLK and  $\overline{CLK}$  modules. These modules consist of cascaded dynamic or static gates and are separated by latches. In the NORA structure, such a latch can be realized with C<sup>2</sup>MOS logic, as shown in Fig. 4 [19]. When a CLK module is in the precharge phase, the following  $\overline{CLK}$  module is in the evaluation phase and the intermediate latch is in the hold mode. The proposed dynamic architecture has a data-latching property that can be exploited in pipelined circuits. This section describes how this property is embedded in the proposed dynamic structure.

Fig. 5 illustrates the voltage waveforms of the clock signals and the output nodes of the proposed dynamic circuit (Fig. 2). During the evaluation phase, the output of the circuit may go to 0 or  $V_{DD}$  depending on the input. If the output voltage is  $V_{DD}$  in the evaluation phase, it remains at  $V_{DD}$  in the next precharge phase, as if this output value were latched in the circuit. On the other hand, if the output is zero in the evaluation phase, it rises to  $V_S$  in the next precharge phase. Assuming the next stage is a P-block dynamic circuit (operating with  $\overline{CLK}$ ), this voltage  $V_S$  is applied to the gate of a PMOS transistor, whose source-gate voltage is therefore  $V_{DD}$ - $V_S$. This voltage is large enough to keep the transistor on, as if a 0 were applied to the next stage; that is, the 0 is also latched in the proposed dynamic circuit. Note, however, that the source-gate voltage of the PMOS transistor of the following dynamic P-block is reduced from  $V_{DD}$  to  $V_{DD}$ - $V_S$, which degrades the performance of this block. This can be avoided by upsizing the PMOS transistor. As can be seen in Fig. 5-b, the source-gate voltage of the PMOS transistor is  $V_{DD}$  on the rising edge of the *CLK* cycle and approximately  $V_{DD}$ - $V_S$  on the falling edge, so the average source-gate voltage can be taken as  $V_{DD} - V_S/2$, which we call  $V_{Avg}$. To compensate for the lower effective voltage of the PMOS transistor (based on the square-law saturation current, with  $V_{Th,p}$  denoting the magnitude of the PMOS threshold voltage), its aspect ratio should be

$$\left(\frac{V_{DD} - V_{Th,p}}{V_{Avg} - V_{Th,p}}\right)^2$$

times that of a conventional one. As a result, the register between pipelined stages can be omitted when the proposed dynamic circuit is used (Fig. 6). It is worth noting that this technique is race-free, since the registers are omitted and no race can occur.

Figure 5. (a) Illustration of the self data-latching property of the proposed circuit. (b) The source-gate voltage of the PMOS transistor of the following dynamic P-block.

Figure 6. Pipelined structure. (a) Conventional dynamic logic. (b) Proposed logic.
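The latching argument and the PMOS upsizing expression can be illustrated with an idealized behavioral model. The Python sketch below is an assumption-laden illustration, not a transistor-level simulation: the function names are hypothetical, the NMOS pull-up is modeled as only able to raise the node to $V_S$, and the example values ($V_{DD} = 1$ V, $V_S = 0.5$ V, $|V_{Th,p}| = 0.3$ V) are hypothetical. The latched 0 is valid only while $V_{DD} - V_S > |V_{Th,p}|$.

```python
def next_node_voltage(v_node: float, phase: str, pdn_on: bool,
                      vdd: float, vs: float) -> float:
    """Voltage of the dynamic node Out at the end of one clock phase.

    Idealized behavioral model (assumption, not a transistor-level model):
    - precharge (CLK = 0): the NMOS pull-up can raise Out to V_S but cannot
      pull it down, so a node already at V_DD stays at V_DD;
    - evaluate (CLK = 1): the PDN discharges Out to 0, otherwise the
      bleeder pulls it to V_DD. pdn_on is ignored during precharge.
    """
    if phase == "precharge":
        return max(v_node, vs)
    if phase == "evaluate":
        return 0.0 if pdn_on else vdd
    raise ValueError(f"unknown phase: {phase!r}")


def value_seen_by_p_block(v_gate: float, vdd: float, vth_p: float) -> int:
    """Logic value a following P-block PMOS interprets at its gate.

    The PMOS is on (input read as 0) when V_SG = V_DD - v_gate exceeds |V_th,p|.
    """
    return 0 if vdd - v_gate > vth_p else 1


def pmos_upsizing_factor(vdd: float, vs: float, vth_p: float) -> float:
    """Aspect-ratio multiplier ((V_DD - V_th,p) / (V_Avg - V_th,p))**2.

    V_Avg = V_DD - V_S/2; vth_p is the magnitude of the PMOS threshold.
    """
    v_avg = vdd - vs / 2.0
    if v_avg <= vth_p:
        raise ValueError("V_Avg must exceed |V_th,p|")
    return ((vdd - vth_p) / (v_avg - vth_p)) ** 2


if __name__ == "__main__":
    VDD, VS, VTH_P = 1.0, 0.5, 0.3   # hypothetical values, not from the paper
    for pdn_on in (True, False):
        v_eval = next_node_voltage(VS, "evaluate", pdn_on, VDD, VS)
        v_held = next_node_voltage(v_eval, "precharge", pdn_on, VDD, VS)
        seen = value_seen_by_p_block(v_held, VDD, VTH_P)
        print(f"evaluated {v_eval:.2f} V -> held {v_held:.2f} V, next stage reads {seen}")
    print(f"PMOS upsizing factor: {pmos_upsizing_factor(VDD, VS, VTH_P):.2f}")
```

With these assumed values, an evaluated 0 is held at 0.50 V and still read as 0, an evaluated 1 is held at 1.00 V, and the square-law upsizing factor comes out at about 2.42. In a short-channel 90-nm device the square law overestimates the required upsizing, so the factor is best treated as a starting point for simulation.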

#### V. SIMULATION RESULTS

#### A. Basic Gates

To explore the performance of the proposed technique, we have implemented an 8-input OR/NOR gate (OR-8, NOR-8) in a 90-nm CMOS technology. The supply voltage is 1 V, the load is a static inverter with a fan-out of 4 (FO4), and the clock frequency is 2.4 GHz. The power consumption, delay, and unity noise gain (UNG) of the circuit are obtained. In the simulations, the 8-bit input is swept uniformly from 0 to 255, and the UNG is calculated using the definition in [3]. The average power consumption is obtained over a period long enough for all possible input combinations to occur. The results are shown in Table I and compared with those of a similar conventional dynamic circuit.

The results are provided for two cases: for the NOR-8 gate the output is taken from *Out*, while for the OR-8 gate it is taken from  $\overline{Out}$. In the case of OR-8, the transistors of the output inverter are sized up because the load is connected to  $\overline{Out}$.

According to Table I, the proposed structure is very effective in reducing the power consumption, while the other parameters do not change considerably; thus, the normalized figure of merit (FOM) of the proposed circuit is larger. As can also be seen in this table, the power saving is larger for the NOR gate (about 42%) than for the OR gate (about 36%). This is because in the OR gate the load is connected to  $\overline{Out}$, so the capacitance of the dynamic node is smaller.

TABLE I. COMPARISON OF THE PROPOSED CIRCUIT PERFORMANCE PARAMETERS WITH CONVENTIONAL DYNAMIC LOGIC

| Gate  | Conventional<br>Power (µW) | Conventional<br>Delay (ps) | Conventional<br>UNG (V) | Proposed<br>Power (µW) | Proposed<br>Delay (ps) | Proposed<br>UNG (V) |
|-------|----------------------------|----------------------------|-------------------------|------------------------|------------------------|---------------------|
| NOR-8 | 21.56                      | 77                         | 0.345                   | 12.44                  | 71                     | 0.353               |
| OR-8  | 11.06                      | 98                         | 0.343                   | 7.10                   | 99                     | 0.347               |
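The power-delay product (PDP) reductions quoted in the abstract follow from Table I as PDP = power × delay. The short Python check below recomputes them from the rounded table entries; the dictionary layout and helper names are illustrative only. It gives about 46.8% for NOR-8 and 35.15% for OR-8, which agree with the 46.7% and 35.15% reported in the abstract to within the rounding of the tabulated values.

```python
# Table I values: (power in uW, delay in ps), transcribed from the table above.
TABLE_I = {
    "NOR-8": {"conventional": (21.56, 77.0), "proposed": (12.44, 71.0)},
    "OR-8": {"conventional": (11.06, 98.0), "proposed": (7.10, 99.0)},
}


def pdp_fj(power_uw: float, delay_ps: float) -> float:
    """Power-delay product in fJ (1 uW x 1 ps = 1e-18 J = 1e-3 fJ)."""
    return power_uw * delay_ps * 1e-3


def pdp_reduction(conventional, proposed) -> float:
    """Fractional PDP reduction of the proposed gate relative to the conventional one."""
    return 1.0 - pdp_fj(*proposed) / pdp_fj(*conventional)


if __name__ == "__main__":
    for gate, cols in TABLE_I.items():
        print(f"{gate}: PDP {pdp_fj(*cols['conventional']):.3f} fJ -> "
              f"{pdp_fj(*cols['proposed']):.3f} fJ, "
              f"reduction {pdp_reduction(**cols):.2%}")
```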

In order to check the robustness of the proposed technique against process variations, the NOR-8 circuit is simulated in all process corners. The results are shown in Table II. The circuit operates properly in all process corners and the power saving is still considerable in all corners.

TABLE II. PERFORMANCE PARAMETERS OF THE NOR-8 IN DIFFERENT PROCESS CORNERS

| Process<br>corner | Conventional<br>Power (µW) | Conventional<br>Delay (ps) | Conventional<br>UNG (mV) | Proposed<br>Power (µW) | Proposed<br>Delay (ps) | Proposed<br>UNG (mV) |
|-------------------|----------------------------|----------------------------|--------------------------|------------------------|------------------------|----------------------|
| FF                | 22.24                      | 52                         | 323                      | 13.58                  | 51                     | 331                  |
| FS                | 20.50                      | 44                         | 270                      | 13.94                  | 41                     | 278                  |
| SF                | 22.92                      | 155                        | 470                      | 11.27                  | 140                    | 476                  |
| SS                | 19.93                      | 96                         | 368                      | 11.82                  | 87                     | 375                  |

 
TABLE III. PERFORMANCE PARAMETERS OF A 16-BIT OR GATE USING DIFFERENT PRIOR TECHNIQUES

| Gate                     | Power<br>(µW) | Delay<br>(ps) | UNG<br>(mV) | Normalized<br>PDP | No. of<br>Transistors |
|--------------------------|---------------|---------------|-------------|-------------------|-----------------------|
| Conventional             | 24.82         | 108           | 305         | 1                 | 21                    |
| CKP-domino<br>[13]       | 40.78         | 84            | 305         | 1.27              | 32                    |
| HS-domino<br>[14]        | 33.39         | 79            | 305         | 0.984             | 29                    |
| Reduced<br>swing [15]    | 25.62         | 151           | 376         | 1.44              | 23                    |
| XOR-based<br>keeper [17] | 34.93         | 88            | 305         | 1.043             | 25                    |
| Proposed                 | 13.96         | 119           | 305         | 0.619             | 22                    |

For comparison, an OR-16 gate is simulated and its performance is compared with that of a few state-of-the-art dynamic circuits. The results are shown in Table III. The simulations are performed under the same conditions as for the OR-8 gate described above, and all the gates are designed to have the same UNG (305 mV). As can be seen in Table III, HS domino, CKP domino, and the XOR-based keeper achieve lower delay than the conventional dynamic circuit. Although the reduced dynamic swing circuit inherently offers a higher UNG (due to the diode inserted in the foot of the circuit), its speed and power dissipation are not as good as those of the other circuits. According to Table III, the proposed circuit is very effective in saving power and PDP, while requiring the fewest extra transistors (one more than the conventional gate).

#### B. Pipelined Architecture Using the Concept of Self-Data Latching

A 4-input NOR gate pipelined with a 4-input OR gate is selected as the benchmark circuit to demonstrate the inherent data-latching property of the proposed circuit. The designed circuit is depicted in Fig. 7. As can be seen in this figure, the NOR-4 gate operates with the rising edge of *CLK* and the 4-input OR gate operates with the rising edge of  $\overline{CLK}$. This structure can be used as a *PLA* in which the first stage (NOR gate) operates as the *AND* plane [11], because by De Morgan's theorem  $AB = \overline{\overline{A} + \overline{B}}$; for this purpose, the inputs of the NOR gate are assumed to be inverted. The second stage works as the *OR* plane.
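The AND-plane argument can be checked exhaustively at the logic level. The Python sketch below is a behavioral illustration only (it ignores timing, precharge, and latching): the literal encoding, the helper names, and the example function $f = AB + \overline{C}D$ are hypothetical choices, not taken from the paper. Each product term is formed as a NOR of complemented literals, and the second stage ORs the product terms.

```python
from itertools import product
from typing import List, Sequence, Tuple

# Hypothetical encoding: (input index, True if the literal is complemented)
Literal = Tuple[int, bool]


def nor(values) -> bool:
    """Behavioral NOR gate (first stage)."""
    return not any(values)


def product_term(x: Sequence[bool], term: List[Literal]) -> bool:
    """AND of the literals in `term`, built as a NOR of their complements.

    By De Morgan's theorem, AB = NOR(A', B'); this is why the inputs of the
    NOR stage are assumed to be inverted.
    """
    literals = [(not x[i]) if negated else x[i] for i, negated in term]
    return nor(not lit for lit in literals)


def pla_output(x: Sequence[bool], terms: List[List[Literal]]) -> bool:
    """Second stage: OR plane combining the product terms."""
    return any(product_term(x, t) for t in terms)


if __name__ == "__main__":
    # Example function f = A*B + C'*D on inputs (A, B, C, D).
    terms = [[(0, False), (1, False)], [(2, True), (3, False)]]
    for a, b, c, d in product((False, True), repeat=4):
        expected = (a and b) or ((not c) and d)
        assert pla_output((a, b, c, d), terms) == expected
    print("NOR-OR PLA matches the sum of products for all 16 input vectors")
```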



Figure 7. A pipelined PLA using the proposed structure

Simulation results are reported in Table IV. This table shows the benefits of the proposed architecture in PDP reduction without considerable speed degradation. It is assumed that the inputs of the second stage are delayed by half a clock period. This simulation is carried out in the 1-V 90-nm CMOS technology at a clock frequency of 1 GHz, and the output load is a 4× minimum-size inverter.

TABLE IV. CHARACTERISTICS OF A PIPELINED 4-BIT PLA

| Architecture | Power (µW)<br>Gates | Power (µW)<br>Register | Power (µW)<br>Total | Delay<br>(ps) | No. of<br>Transistors |
|--------------|---------------------|------------------------|---------------------|---------------|-----------------------|
| Conventional | 24.39               | 2.02                   | 26.41               | 78.5          | 22                    |
| Proposed     | 17.35               | -                      | 17.35               | 79.6          | 18                    |
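To quantify "PDP reduction without considerable speed degradation", the relative changes can be computed directly from Table IV. The Python sketch below only does this arithmetic on the tabulated totals; the names `TABLE_IV`, `relative_change`, and `summarize` are illustrative.

```python
# Table IV values (total power in uW, delay in ps), transcribed from the table above.
TABLE_IV = {
    "conventional": {"power_uw": 26.41, "delay_ps": 78.5, "transistors": 22},
    "proposed": {"power_uw": 17.35, "delay_ps": 79.6, "transistors": 18},
}


def relative_change(new: float, old: float) -> float:
    """Fractional change from old to new (negative means a reduction)."""
    return new / old - 1.0


def summarize(conv: dict, prop: dict) -> dict:
    """Relative power, delay and PDP change, plus the transistor-count difference."""
    pdp_conv = conv["power_uw"] * conv["delay_ps"]   # uW * ps = 1e-18 J
    pdp_prop = prop["power_uw"] * prop["delay_ps"]
    return {
        "power": relative_change(prop["power_uw"], conv["power_uw"]),
        "delay": relative_change(prop["delay_ps"], conv["delay_ps"]),
        "pdp": relative_change(pdp_prop, pdp_conv),
        "transistors": prop["transistors"] - conv["transistors"],
    }


if __name__ == "__main__":
    s = summarize(TABLE_IV["conventional"], TABLE_IV["proposed"])
    print(f"power {s['power']:+.1%}, delay {s['delay']:+.1%}, "
          f"PDP {s['pdp']:+.1%}, transistors {s['transistors']:+d}")
```

On the tabulated values this amounts to roughly a 34% power reduction and a 33% PDP reduction for a delay increase of about 1.4%, with four fewer transistors.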

#### VI. CONCLUSIONS

A new technique for designing wide fan-in dynamic logic circuits is presented that modifies the charging scheme of the highly capacitive dynamic node. The dynamic node is precharged only to  $V_S < V_{DD}$, and it is charged to  $V_{DD}$  in the evaluation phase only if the output is to evaluate to 1.

This technique is very efficient in reducing the power consumption of wide fan-in dynamic gates. The proposed technique is applied to 8- and 16-input OR/NOR gates, and its performance is compared with that of several previously presented circuits. Moreover, we show that the proposed charging scheme makes it possible to reduce the silicon area of pipelined architectures by omitting the positive latches between stages.

#### REFERENCES

- [1] A. Peiravi and M. Asyaei, "Current-comparison-based domino: New low-leakage high-speed domino circuit for wide fan-in gates," *IEEE Trans. Very Large Scale Integr. Syst.*, 2012.

- [2] H. Mostafa, M. Anis, and M. Elmasry, "Novel timing yield improvement circuits for high-performance low-power wide fan-in dynamic or gates," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 58, no. 8, pp. 1785-1797, Aug. 2011.
- [3] H. F. Dadgour, and K. Banerjee, "A novel variation-tolerant keeper architecture for high-performance low-power wide fan-in dynamic or gates," *IEEE Trans. Very Large Scale Integr. Syst.*, vol. 18, no. 11, pp. 1567-1567, Nov. 2010.
- [4] Y. Lih, N. Tzartzanis, and W. W. Walker, "A leakage current replica keeper for dynamic circuits," *IEEE J. Solid-State Circuits*, vol. 42, no. 1, pp. 48–55, Jan. 2007.
- [5] A. Amirabadi, A. Afzali-Kusha, Y. Mortazavi, and M. Nourani, "Clock delayed domino logic with efficient variable threshold voltage keeper," *IEEE Trans. Very Large Scale Integr. Syst.*, vol. 15, no. 2, pp. 125 -134, Feb. 2007.
- [6] L. Ding and P. Mazumder, "On circuit techniques to improve noise immunity of CMOS dynamic logic," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 12, no. 9, pp. 910-925, Sep. 2004.
- [7] H. Suzuki, C. H. Kim, and K. Roy, "Fast tag comparator using diode partitioned domino for 64-bit microprocessors," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 54, no. 2, pp. 322-328, Feb. 2007.
- [8] R. K. Krishnamurthy, A. Alvandpour, G. Balamurugan, N. R. Shanbhag, K. Soumyanath, and S. Y. Borkar, "A 130-nm 6-GHz 256×32 bit leakage-tolerant register file," *IEEE J. Solid-State Circuits*, vol. 37, pp. 624–632, May 2002.
- [9] C. Wang, C. Huang, C. Lee, and T. Cheng, "A Low Power High-Speed 8-Bit Pipelining CLA Design Using Dual-Threshold Voltage Domino Logic," *IEEE Trans. Very Large Scale Integr. Syst.*, vol. 16, no. 5, May 2008.
- [10] K. Mai, E. Alon, D. Liu, Y. Kim, D. Patil, and M. A. Horowitz, "Architecture and circuit techniques for a 1.1-GHz 16-Kb reconfigurable memory in 0.18-µm CMOS," *IEEE J. Solid-State Circuits*, vol. 40, no. 1, pp. 261-275, Jan. 2005.
- [11] K. Oh and L. Kim, "A high performance low power dynamic PLA with conditional evaluation scheme," *IEEE Int. Symp. Circuits Syst.* (ISCAS'04), vol. 2, pp. 881-884, 2004.
- [12] P. Petrov, and A. Orailoglu, "Tag compression for low power in dynamically customizable embedded processors," *IEEE Trans. Comp-Aided Des. Integr. Circuits Syst.*, vol. 23, no. 7, pp. 1031-1047, July 2004.
- [13] A. Alvandpour, R. Krishnamurthy, K. Soumyanath, and S. Y. Borkar, "A sub-130-nm conditional-keeper technique," *IEEE J. Solid-State Circuits*, vol. 37, no. 5, pp. 633–638, May 2002.
- [14] M. H. Anis, M. W. Allam, and M. I. Elmasry, "Energy-efficient noise-tolerant dynamic styles for scaled-down CMOS and MTCMOS technologies," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 10, no. 2, pp. 71–78, Apr. 2002.
- [15] Zh. Liu and V. Kursun, "High speed low swing dynamic circuits with multiple supply and threshold voltages," in *Proc.* 2006 *Emerg. VLSI Tech. Arch. (ISVLSI'06).*
- [16] A. Rao, Th. Haniotakis, Y. Tsiatouhas, and H. Djemi, "The use of pre-evaluation phase in dynamic CMOS logic," in *Proc. 2005 IEEE Computer Society Annual Symposium on VLSI New Frontiers in VLSI Design*, pp. 270 – 271.
- [17] C. Hua, W. Hwang and C. Chen, "Noise-tolerant XOR-based conditional keeper for high fan-in dynamic circuits," *IEEE Int. Symp. Circuits Syst. (ISCAS'05)*, vol. 1, pp. 444 – 447, 2005.
- [18] J. M. Rabaey, A. Chandrakasan, and B. Nikolić, *Digital Integrated Circuits: A Design Perspective*, 2nd ed. Upper Saddle River, NJ: Prentice-Hall, 2003.
- [19] N. Goncalves and H. De Man, "NORA: a race-free dynamic CMOS technique for pipelined logic structures," *IEEE J. Solid-State Circuits*, vol. 18, no. 3, pp. 261–266, June 1983.