# Subthreshold Source-Coupled Logic Circuits for Ultra-Low-Power Applications

Armin Tajalli, Student Member, IEEE, Elizabeth J. Brauer, Member, IEEE, Yusuf Leblebici, Senior Member, IEEE, and Eric Vittoz, Life Fellow, IEEE

Abstract-This paper presents a novel approach for implementing ultra-low-power digital components and systems using source-coupled logic (SCL) circuit topology, operating in weak inversion (subthreshold) regime. Minimum size pMOS transistors with shorted drain-substrate contacts are used as gate-controlled, very high resistivity load devices. Based on the proposed approach, the power consumption and the operation frequency of logic circuits can be scaled down linearly by changing the tail bias current of SCL gates over a very wide range spanning several orders of magnitude, which is not achievable in subthreshold CMOS circuits. Measurements in conventional 0.18  $\mu$ m CMOS technology show that the tail bias current of each gate can be set as low as 10 pA, with a supply voltage of 300 mV, resulting in a power-delay product of less than 1 fJ. Fundamental circuits such as ring oscillators and frequency dividers, as well as more complex digital blocks such as parallel multipliers designed by using the STSCL topology have been experimentally characterized.

*Index Terms*—CMOS integrated circuits, CMOS logic circuit, current-mode logic (CML), pipelining, power-delay product, source-coupled logic (SCL), subthreshold CMOS, subthreshold SCL, ultra-low-power circuits, weak inversion.

## I. INTRODUCTION

THE demand for implementing ultra-low-power digital systems in many modern applications such as mobile systems [1], [2], sensor networks [3], [4], and implanted biomedical systems [5], has increased the importance of designing logic circuits in subthreshold regime [6]. In subthreshold MOSFET operation, current density is very low and the ratio of the transconductance to bias current of the device  $(g_m/I_D)$  is maximum [7], [8]. Meanwhile, the exponential relationship between drain current and gate voltage makes this mode of operation very suitable for implementing widely adjustable circuits [7], [9]. Conventional CMOS logic circuits utilizing subthreshold transistors can typically operate with a very low power consumption [10]–[13], which is mainly due to the dynamic (switching) power consumption and is quadratically dependent on the supply voltage as  $C \cdot f_{\rm op} \cdot V_{\rm DD}^2$  (where  $f_{\rm op}$  is the frequency of operation and  $V_{\rm DD}$  indicates the supply voltage). Hence, reducing the supply voltage will result in reduction of power dissipation [1], [14] as well as the output logic swing. Supply voltage reduction, on the other hand, increases the delay in each gate, which means the power dissipation, logic swing, and speed of operation are tightly related to each other. Meanwhile, the exponential relationship between power dissipation and supply voltage in subthreshold regime makes the accurate control of power consumption difficult. To implement very low power digital systems, it is necessary to minimize the energy dissipation at the system level in addition to the gate level to achieve the desired performance [10].

Manuscript received November 19, 2007; revised February 10, 2008.

E. J. Brauer is with Electrical Engineering Department, Northern Arizona University, Flagstaff, AZ 86911 USA (e-mail: elizabeth.brauer@nau.edu).

Digital Object Identifier 10.1109/JSSC.2008.922709

Source-coupled logic (SCL) circuits are widely used in mixed-mode integrated circuits where supply noise and substrate noise injection are crucial [15]. Reduced output voltage swing in SCL circuits compared to the CMOS logic gates has made this topology very suitable for high frequency applications [16], [17]. This paper explores the potential of subthreshold SCL circuits as an alternative solution for implementing ultra-low-power digital systems. In this approach, the power consumption and maximum speed of operation can be adjusted linearly through the tail bias current of each gate over a very wide range [18], [19], thus efficiently decoupling the choice of output voltage swing from power dissipation and delay.

To enable operation at very low current levels and to achieve the desired performance specifications, special circuit techniques have to be applied [18]–[21] for implementing very low power SCL circuits. In [20], the intrinsically limited output impedance of deep-submicron, short-channel pMOS devices has been used to implement very high value load resistances for SCL topology. Here, a more general approach with much less sensitivity to process and technology variations will be introduced [19].

This paper presents novel techniques for implementing subthreshold SCL (STSCL) gates where the bias current of each cell can be set as low as 10 pA. In Section II, after a brief review of SCL circuits, the proposed technique for implementing subthreshold SCL gates will be introduced. Section III discusses the power-delay performance of the proposed circuit configuration. Experimental results and comparison with conventional CMOS circuits are presented in Section IV, followed by conclusions in Section V.

## II. SUBTHRESHOLD SOURCE-COUPLED LOGIC CIRCUITS

### A. Conventional SCL Topology

In an SCL gate, the logic operation takes place mainly in current domain. Therefore, the speed of operation can be inherently high. As shown in Fig. 1, a logic network composed of nMOS source-coupled differential pair switches steers the tail current  $(I_{\rm SS})$  to one of the output branches based on the input logic levels. The output load resistance  $(R_L)$  converts the branch current back to the voltage domain in order to drive the subsequent SCL gates. The voltage swing at the output node  $(V_{SW} = R_L \cdot I_{\rm SS})$  should be high enough to switch completely the input differential pair of the next stage (i.e.,  $V_{SW} > V_{SW,IN,min}$ ). Based on this observation, the voltage swing should be larger than  $\sqrt{2n} \cdot V_{\text{DSsat}}$  ( $V_{\text{DSsat}}$  is the drain-source overdrive voltage of the input nMOS devices when  $V_{IN} = 0$ ) when the input nMOS devices are in strong inversion [22], and larger than  $4 \cdot n \cdot U_T$  when the devices are in weak inversion [7] ( $U_T = kT/q$  is the thermal voltage and n is the subthreshold slope factor). Therefore, the required voltage swing when the devices are in subthreshold regime can be as low as  $4 \cdot n \cdot U_T$ , which is about 150 mV at room temperature (assuming n = 1.5). This swing in the subthreshold regime depends on the subthreshold slope factor n and is independent of the threshold voltage of the nMOS switching devices. Provided that the load resistance can be made sufficiently high, this means that the switching operation of nMOS devices has low dependence on the fabrication process variations. Therefore, as long as the tail bias current  $I_{\rm SS}$  is higher than junction leakage currents and output impedance of the devices is much higher than the load resistance, the proposed topology can operate properly as a logic circuit, even in aggressively scaled deep-submicron technologies. Unlike CMOS logic circuits where the subthreshold channel leakage current is the dominant leakage component, in STSCL topology the main leakage currents are due to the p-n junctions of the MOS devices.

A. Tajalli, Y. Leblebici, and E. Vittoz are with the Swiss Federal Institute of Technology (EPFL), CH-1015 Lausanne, Switzerland (e-mail: armin.tajalli@epfl.ch; yusuf.leblebici@epfl.ch; eric.vittoz@epfl.ch).

Fig. 1. A conventional SCL-based inverter/buffer circuit. The switching part can be composed of a network of nMOS source-coupled pairs to implement more complex logic functions [15]. The load resistances can be implemented using pMOS devices biased in triode region.
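The weak-inversion swing bound $4 \cdot n \cdot U_T$ quoted above can be checked numerically. The following is a minimal sketch; the function names and the example temperatures are illustrative assumptions and do not come from the paper.

```python
# Minimal sketch: minimum STSCL input swing 4*n*U_T in weak inversion.
# Function names and the example temperatures are illustrative assumptions.

BOLTZMANN = 1.380649e-23           # J/K
ELECTRON_CHARGE = 1.602176634e-19  # C


def thermal_voltage(temp_kelvin: float) -> float:
    """Return the thermal voltage U_T = kT/q in volts."""
    return BOLTZMANN * temp_kelvin / ELECTRON_CHARGE


def min_swing_weak_inversion(n: float, temp_kelvin: float = 300.0) -> float:
    """Return 4*n*U_T in volts, the swing needed to steer the tail current."""
    return 4.0 * n * thermal_voltage(temp_kelvin)


if __name__ == "__main__":
    for temp in (250.0, 300.0, 350.0):
        swing_mv = 1e3 * min_swing_weak_inversion(n=1.5, temp_kelvin=temp)
        print(f"T = {temp:.0f} K: minimum swing = {swing_mv:.1f} mV")
```

With n = 1.5 at 300 K the bound evaluates to roughly 155 mV, in line with the "about 150 mV" figure in the text, and it scales linearly with absolute temperature.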

The speed of operation in an SCL gate is mainly limited by the time constant at the output node which is

$$\tau_{\rm SCL} = R_L \cdot C_L = V_{\rm SW} \cdot C_L / I_{\rm SS}.$$
 (1)

Based on this, the propagation delay is inversely proportional to the tail bias current. Meanwhile, the circuit power–delay product (PDP) is independent of  $I_{\rm SS}$  [15], [16], [23].

### B. Load Device Concept

To maintain the desired output voltage swing at very low bias current levels, it is necessary to increase the load resistance value in inverse proportion to the reducing tail bias current as

$$R_L = V_{\rm SW} / I_{\rm SS}.$$
 (2)
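To give a feel for the magnitudes implied by (1) and (2), the sketch below tabulates the required load resistance and the resulting output time constant over several decades of tail current. The swing of 0.2 V and the load capacitance of 10 fF are assumed example values, not measured data.

```python
# Minimal sketch of Eqs. (1) and (2): load resistance and output time constant
# versus tail current. V_SW = 0.2 V and C_L = 10 fF are assumed example values.


def load_resistance(v_sw: float, i_ss: float) -> float:
    """Eq. (2): R_L = V_SW / I_SS, in ohms."""
    return v_sw / i_ss


def output_time_constant(v_sw: float, c_load: float, i_ss: float) -> float:
    """Eq. (1): tau = R_L * C_L = V_SW * C_L / I_SS, in seconds."""
    return load_resistance(v_sw, i_ss) * c_load


if __name__ == "__main__":
    v_sw, c_load = 0.2, 10e-15
    for i_ss in (10e-12, 1e-9, 100e-9):
        r_l = load_resistance(v_sw, i_ss)
        tau = output_time_constant(v_sw, c_load, i_ss)
        print(f"I_SS = {i_ss:.0e} A: R_L = {r_l:.2e} ohm, tau = {tau:.2e} s")
```

For instance, a 1 nA tail current with a 200 mV swing calls for a 200 M$\Omega$ load, and every tenfold reduction of $I_{\rm SS}$ raises both $R_L$ and $\tau$ by a factor of ten.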



Fig. 2. (a) Conventional pMOS load device, (b) proposed load device, (c) *I–V* characteristics of the conventional pMOS load (dotted) in comparison to the proposed device (solid line), (d) measured *I–V* characteristics of the proposed load device in comparison to the BSIM model (all data obtained using 0.18  $\mu$m CMOS technology).

In subthreshold operation, the tail bias current would be in the range of a few nA or even less. Therefore, to obtain a reasonable output voltage swing, the load resistance should be in the range of hundreds of M$\Omega$. Meanwhile, this resistance should be controlled very accurately based on the  $I_{\rm SS}$  value. Hence, a well controlled high resistivity load device with a very small area is required. For this range of resistivity, conventional pMOS devices biased in triode region cannot be utilized since the required channel length of the transistor would be impractically large [Fig. 2(a)]. Fig. 2(c) (dotted line) shows the I-V characteristics of a pMOS device realized in 0.18  $\mu$m technology for different  $V_{SG}$  values, indicating that the configuration of Fig. 2(a) results in a current source with almost infinite output impedance, even for deep-submicron devices. Hence, neither the gain nor the amplitude would be limited. Fig. 2(b) shows the proposed load device, where the drain of the pMOS device is connected to its bulk. In this way, as illustrated in Fig. 2(c), the configuration shown in Fig. 2(b) produces a finite and controllable differential resistance, which, associated with the transconductance of the differential pair, will provide a controlled, limited gain and amplitude. Thus, it is possible to implement a very high resistivity load device using a single minimum size pMOS device. The fact that each individual pMOS load device must be confined in its own n-well also does not have a severe impact on area, as will be demonstrated later. The measured DC I-V characteristics of the device are shown in Fig. 2(d). For  $V_{\rm SD} > 0$  (bulk tied to the drain), the device operates as a very high resistivity element as expected. This plot also shows that the measurement results are very close to the resistance values predicted by simulations.

Fig. 3. Cross-section view of the proposed pMOS load device, showing the parasitic components that contribute to operation in subthreshold regime.

The cross section view of the proposed pMOS load device can be seen in Fig. 3. Connecting the drain to the bulk of the pMOS load device ties the cathode of the n-well-to-substrate reverse-biased diode to the output node. However, since the devices are minimum size, the parasitic capacitance associated with this diode is very small and can usually be neglected (in this design using 0.18  $\mu$ m technology:  $C_d < 1$  fF). The other important parasitic element is the forward biased source-bulk diode. Illustrated in Fig. 3, this diode can limit the possible voltage swing at the drain of the device to 400–500 mV. However, as the required voltage swing for subthreshold SCL gates is well below this value, the source-bulk diode does not influence the operation of the circuit.

Using the EKV model, the I-V characteristics of the subthreshold pMOS device can be expressed by [7], [8]

$$I_{\rm SD} = I_0 \cdot e^{\frac{V_{\rm BG} - V_{T0}}{n_p U_T}} \left( e^{\frac{-V_{\rm BS}}{U_T}} - e^{\frac{-V_{\rm BD}}{U_T}} \right)$$
(3)

in which  $I_0 = 2n_p \mu C_{\text{ox}} \cdot (W/L_{\text{eff}}) \cdot U_T^2$ . In the proposed configuration illustrated in Fig. 2(b),  $V_{\text{BD}} = 0$ , hence

$$I_{\rm SD} = I_0 \cdot e^{\frac{V_{\rm DG} - V_{T0}}{n_p U_T}} \left( e^{\frac{V_{\rm SD}}{U_T}} - 1 \right).$$
(4)




Fig. 4. A very high value floating resistor composed of two back to back pMOS devices: (a) circuit schematic, and (b) measured I-V characteristics of the controlled floating resistor.

Therefore, the output small signal resistance of the proposed load device is

$$R_{\rm SD} = \left(\frac{\partial I_{\rm SD}}{\partial V_{\rm SD}}\right)^{-1} = \left(\frac{n_p U_T}{I_b}\right) \cdot \left((n_p - 1) \cdot e^{(n_p - 1)v_{\rm SD}} + e^{-v_{\rm SD}}\right)^{-1} \tag{5}$$

$$R_{\rm SD} = \left(\frac{n_p U_T}{I_{\rm SD}}\right) \cdot \left(\frac{e^{V_{\rm SD}/U_T} - 1}{(n_p - 1)e^{V_{\rm SD}/U_T} + 1}\right) \tag{6}$$

in which  $v_{\rm SD} = V_{\rm SD}/(n_p U_T)$  and  $I_b = I_0 \cdot e^{(V_{\rm SG} - V_{T0})/(n_p U_T)}$ . Thus,  $R_{\rm SD}$  can be controlled through the source-gate voltage  $(V_{\rm SG})$  of the device through  $I_{\rm SD}$ . Because of exponential dependence of the output resistance on  $V_{\rm SG}$ , it can be adjusted in a very wide range. To avoid process-related deviations, a replica bias generator is required for  $V_{\rm SG}$ , as explained in the next section. The wide tuning range of  $R_{\rm SD}$  means that the proposed STSCL gate can be used in a very wide range of operating conditions without the need for modifying the size of devices. Meanwhile, as long as the matching requirements are respected, the frequency of operation would be linearly proportional to the bias current.
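The two forms of the load resistance, (5) in terms of $I_b$ and (6) in terms of $I_{\rm SD}$, can be cross-checked numerically. The sketch below is a minimal illustration under assumed parameter values ($n_p = 1.3$, $U_T = 25.9$ mV, $I_b = 1$ nA); these numbers are hypothetical and are not extracted from the measured devices.

```python
import math

# Minimal sketch of Eqs. (4)-(6) for the bulk-drain-connected pMOS load.
# n_p, U_T and I_b used below are assumed example values, not device data.


def load_current(v_sd, i_b, n_p, u_t):
    """Eq. (4) with V_DG = V_SG - V_SD: I_SD = I_b*(e^{(n_p-1)v} - e^{-v})."""
    v = v_sd / (n_p * u_t)
    return i_b * (math.exp((n_p - 1.0) * v) - math.exp(-v))


def r_sd_eq5(v_sd, i_b, n_p, u_t):
    """Eq. (5): small-signal resistance expressed through I_b."""
    v = v_sd / (n_p * u_t)
    denom = (n_p - 1.0) * math.exp((n_p - 1.0) * v) + math.exp(-v)
    return (n_p * u_t / i_b) / denom


def r_sd_eq6(v_sd, i_b, n_p, u_t):
    """Eq. (6): the same resistance expressed through I_SD (needs V_SD != 0)."""
    i_sd = load_current(v_sd, i_b, n_p, u_t)
    e = math.exp(v_sd / u_t)
    return (n_p * u_t / i_sd) * (e - 1.0) / ((n_p - 1.0) * e + 1.0)


if __name__ == "__main__":
    i_b, n_p, u_t = 1e-9, 1.3, 0.0259
    for v_sd in (0.05, 0.10, 0.20):
        print(f"V_SD = {v_sd:.2f} V: "
              f"Eq.(5) = {r_sd_eq5(v_sd, i_b, n_p, u_t):.3e} ohm, "
              f"Eq.(6) = {r_sd_eq6(v_sd, i_b, n_p, u_t):.3e} ohm")
```

Both expressions agree for any nonzero $V_{\rm SD}$, and (5) reduces to $U_T/I_b$ at $V_{\rm SD} = 0$.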

Note that when  $V_{SD}$  becomes negative, the current direction is reversed and the device switches to conventional configuration in which the bulk is connected to source. In this case, the drain current will increase rapidly. This property can help implement high valued *floating resistors* with a very wide adjusting range by connecting two pMOS transistors in series as shown in



Fig. 5. Subthreshold SCL gate and the replica bias circuit used to control the output voltage swing.

Fig. 4. The measured I-V characteristics of this floating resistor show moderate linearity in a wide voltage range, which can be exploited in various analog circuit applications.

### C. STSCL Gates

The proposed pMOS load device can be utilized to implement an SCL gate biased in subthreshold. Fig. 5 shows the basic structure of the proposed STSCL gate. A simplified circuit diagram of the replica bias circuit used to control the output voltage swing is also shown. In this schematic, all devices operate in subthreshold regime and the tail bias current can be reduced until it becomes comparable in magnitude to the leakage currents that exist in the circuit.

Since the input differential pair transistors are operating in subthreshold, it can be shown that the transconductance of the input differential pair is

$$G_m = \frac{\partial I_{\rm OUT}}{\partial V_{\rm IN}} = \left(\frac{I_{\rm SS}}{2n_n U_T}\right) \cdot \frac{1}{\cosh^2\left(V_{\rm IN}/(2n_n U_T)\right)} \quad (7)$$

in which  $V_{\rm IN}$  indicates the input differential voltage and  $n_n$  is the subthreshold slope of nMOS devices. Based on (7), for  $V_{\rm IN} > 4n_n U_T$  the entire current will be switched to one of the branches. Therefore, a voltage swing of more than  $4n_n U_T$  would be sufficient to make sure that the gain of STSCL circuit is enough to be used as a logic gate. Combining (7) with (6) results in

$$A_V = \frac{\partial V_{\text{OUT}}}{\partial V_{\text{IN}}} \le A_V |_{V_{\text{IN}}=0} \simeq \frac{n_p}{n_n \cdot (n_p - 1)}.$$
 (8)
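The steering behavior behind (7) and the gain bound (8) can be illustrated with a few lines of code. In this minimal sketch the differential output current is taken as $I_{\rm SS} \tanh(V_{\rm IN}/(2 n_n U_T))$, which is the expression whose derivative gives (7); the slope factors and $U_T$ are assumed example values.

```python
import math

# Minimal sketch of Eqs. (7) and (8). The slope factors and U_T are assumed
# example values; I_OUT is taken as the differential output current.


def diff_output_current(v_in, i_ss, n_n, u_t):
    """Weak-inversion pair: I_OUT = I_SS * tanh(V_IN / (2 n_n U_T))."""
    return i_ss * math.tanh(v_in / (2.0 * n_n * u_t))


def transconductance(v_in, i_ss, n_n, u_t):
    """Eq. (7): G_m = I_SS / (2 n_n U_T) / cosh^2(V_IN / (2 n_n U_T))."""
    return i_ss / (2.0 * n_n * u_t) / math.cosh(v_in / (2.0 * n_n * u_t)) ** 2


def max_voltage_gain(n_n, n_p):
    """Eq. (8): gain at V_IN = 0, approximately n_p / (n_n * (n_p - 1))."""
    return n_p / (n_n * (n_p - 1.0))


if __name__ == "__main__":
    i_ss, n_n, n_p, u_t = 1e-9, 1.3, 1.3, 0.0259
    v_in = 4.0 * n_n * u_t
    steered = diff_output_current(v_in, i_ss, n_n, u_t) / i_ss
    print(f"Fraction of I_SS steered at V_IN = 4 n U_T: {steered:.3f}")
    print(f"G_m at V_IN = 0: {transconductance(0.0, i_ss, n_n, u_t):.3e} S")
    print(f"Gain bound from Eq. (8): {max_voltage_gain(n_n, n_p):.2f}")
```

At $V_{\rm IN} = 4 n_n U_T$ the differential current reaches $\tanh(2)$, about 96% of $I_{\rm SS}$, which is the sense in which the current is considered fully switched.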

Fig. 6(a) illustrates the DC transfer characteristics of an STSCL gate as well as the stage gain. The simulated DC gain of 3.2 at the cross-over point is very close to the value estimated by (8). The measured input–output transfer characteristics of an STSCL buffer stage are shown in Fig. 6(b). Since all the devices are operating in subthreshold regime, the transfer characteristics of the circuit are independent of the bias current. In this plot, the deviation from the ideal DC characteristics is mainly due to the leakage currents in the test circuit coming from electrostatic discharge (ESD) protection circuitry. To measure the DC characteristics, output voltage swing has been adjusted manually.

Meanwhile, based on (5) it can be shown that the equivalent output resistance of the pMOS load for  $V_{SD} = 0$  V is finite and equal to

$$R_{\rm SD}|_{V_{\rm SD}=0} = \frac{U_T}{I_0} \cdot e^{-\frac{V_{\rm SG}-V_{T0}}{n_p U_T}}$$
(9)

which means the load devices are capable of pulling up the output node completely to  $V_{\rm DD}$ .

Concerning the area overhead associated with the pMOS load devices, actual mask layout examples using 0.18  $\mu$ m CMOS technology design rules provide an accurate assessment. The layout of a three-input XOR gate is shown in Fig. 7 where the area required for the pMOS load devices is demonstrated to be small compared to the remaining parts of the circuit.

### D. Voltage Swing Control

A controlling circuit is necessary to keep the voltage swing at the output of the SCL gates on the desired value. Fig. 5 shows the simplified schematic of a replica bias (RB) circuit [15]. This circuit should be well matched to the SCL gates to have very low deviation in operating point. Meanwhile, amplifier  $A_{VR}$  should provide enough gain with a very low offset to have the desired accuracy. In this work, a folded-cascode amplifier has been used to provide a large swing at the output node and to be able to test the SCL gates in a very wide range of bias current values.

Any mismatch in the bias current or devices of the SCL gates and RB circuit will result in variation of the desired output voltage swing ( $\Delta V_{\rm SW}$ ) and it can be shown that the sensitivity of this circuit to the mismatches is

$$\left(\frac{\Delta V_{\rm SW}}{U_T}\right)^2 \simeq \left(\frac{n_p}{n_p - 1}\right)^2 \cdot \left(\left(\frac{\Delta I_{\rm SD}}{I_{\rm SD}}\right)^2 + \left(\frac{\Delta\beta}{\beta}\right)^2 + \left(\frac{\Delta V_{T0}}{n_p U_T}\right)^2\right) \quad (10)$$



Fig. 6. (a) Simulated DC transfer characteristics of an STSCL gate biased at  $I_{\rm SS} = 1$  nA and its DC gain. (b) Measured transfer characteristics of an STSCL buffer stage for two different supply voltages ( $V_{\rm DD} = 0.6$  V and 1.0 V) and different bias currents ( $I_{\rm SS} = 1$  nA, 10 nA, and 100 nA).



Fig. 7. Mask layout of the three-input XOR gate showing the area occupied by the major components. Note that the pMOS load devices with their isolated n-wells occupy a relatively small area compared to the nMOS logic network.

in which  $\beta = \mu C_{\text{ox}} W / L_{\text{eff}}$ . Monte Carlo simulations show that for minimum size devices,  $\Delta V_{\text{SW}}$  can be as high as 20–40 mV in a typical 0.18  $\mu$m process considered in this work. To compensate the influence of device mismatch,  $V_{\rm SW}$  should be selected a little larger than the minimum value.
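The sensitivity expression (10) can be evaluated directly once mismatch figures are available. The sketch below is a minimal illustration; the mismatch values in the example (5% current and $\beta$ mismatch, 5 mV threshold mismatch) are hypothetical placeholders, not the Monte Carlo data reported in the text.

```python
import math

# Minimal sketch of Eq. (10): swing deviation caused by mismatch between the
# replica bias circuit and the gates. All mismatch numbers are hypothetical.


def swing_deviation(u_t, n_p, rel_di, rel_dbeta, dvt0):
    """Return Delta V_SW in volts from Eq. (10).

    rel_di    : relative bias current mismatch, Delta I_SD / I_SD
    rel_dbeta : relative beta mismatch, Delta beta / beta
    dvt0      : threshold voltage mismatch in volts
    """
    factor = n_p / (n_p - 1.0)
    spread = math.sqrt(rel_di ** 2 + rel_dbeta ** 2 + (dvt0 / (n_p * u_t)) ** 2)
    return u_t * factor * spread


if __name__ == "__main__":
    dv = swing_deviation(u_t=0.0259, n_p=1.3, rel_di=0.05, rel_dbeta=0.05, dvt0=5e-3)
    print(f"Delta V_SW = {1e3 * dv:.1f} mV")
```

Because of the factor $n_p/(n_p - 1)$, a slope factor close to unity amplifies every mismatch contribution, which is one reason to leave some margin in $V_{\rm SW}$.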

Meanwhile, it can be shown that the voltage gain from gate to drain of transistor MPR in Fig. 5 is small  $(|A_{V,MPR}| = g_{m,MPR} \cdot R_{SD} \simeq 1/(n_p - 1))$ . Therefore, in spite of the exponential relationship between  $I_{SD,MPR}$  and  $V_{SG,MPR}$ , the gain of this stage is low and the RB circuit can be stabilized without difficulty. Finally, note that a single replica bias circuit can be used for a large number of STSCL gates. Therefore, its area overhead would be negligible in large scale applications.

## III. PERFORMANCE ANALYSIS AND OBSERVATIONS

*Power-Delay Product:* The power dissipation of the STSCL gate is  $P = V_{DD} \cdot I_{SS}$  where  $I_{SS}$  is the tail bias current, and the typical delay of the gate is

$$t_d = \ln(2) \times C_L V_{\rm SW} / I_{\rm SS}. \tag{11}$$

Thus, the Power×Delay product (PDP) is found as

$$\mathrm{PDP}_{\rm STSCL} = \ln(2) \times C_L V_{\rm SW} V_{\rm DD}. \tag{12}$$

Meanwhile, the power-to-frequency ratio can be calculated as

$$(P/f)_{\rm STSCL} = \frac{V_{\rm DD}I_{\rm SS}}{f} \tag{13}$$

where the operating frequency f is defined as

$$f = \alpha \cdot f_{\max} \tag{14}$$

with  $\alpha$  being the activity rate factor (duty rate) and  $f_{\text{max}}$  being the maximum possible operating frequency:  $f_{\text{max}} = 1/(2 \times t_d)$ . Thus, the (P/f) ratio is

$$(P/f)_{\rm STSCL} = \frac{2\ln(2) C_L V_{\rm DD} V_{\rm SW}}{\alpha} \tag{15}$$

which provides a more practical measure for the power/frequency tradeoff of any functional block.
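The chain from (11) to (15) can be verified numerically: the product of power and delay should not depend on $I_{\rm SS}$, and the power-to-frequency ratio should equal $2 \cdot \mathrm{PDP}/\alpha$. The following minimal sketch uses assumed example values ($C_L = 10$ fF, $V_{\rm SW} = 0.2$ V, $V_{\rm DD} = 0.4$ V, $\alpha = 0.5$).

```python
import math

# Minimal sketch of Eqs. (11)-(15) for an STSCL gate.
# C_L, V_SW, V_DD and alpha below are assumed example values.


def gate_delay(c_load, v_sw, i_ss):
    """Eq. (11): t_d = ln(2) * C_L * V_SW / I_SS."""
    return math.log(2.0) * c_load * v_sw / i_ss


def gate_power(v_dd, i_ss):
    """Static power of one gate: P = V_DD * I_SS."""
    return v_dd * i_ss


def pdp(c_load, v_sw, v_dd):
    """Eq. (12): PDP = ln(2) * C_L * V_SW * V_DD."""
    return math.log(2.0) * c_load * v_sw * v_dd


def power_per_frequency(c_load, v_sw, v_dd, alpha):
    """Eq. (15): (P/f) = 2 ln(2) * C_L * V_DD * V_SW / alpha."""
    return 2.0 * math.log(2.0) * c_load * v_dd * v_sw / alpha


if __name__ == "__main__":
    c_load, v_sw, v_dd, alpha = 10e-15, 0.2, 0.4, 0.5
    for i_ss in (10e-12, 1e-9, 100e-9):
        t_d = gate_delay(c_load, v_sw, i_ss)
        f_op = alpha / (2.0 * t_d)  # Eq. (14) with f_max = 1/(2 t_d)
        print(f"I_SS = {i_ss:.0e} A: t_d = {t_d:.2e} s, "
              f"P*t_d = {gate_power(v_dd, i_ss) * t_d:.3e} J, "
              f"P/f = {gate_power(v_dd, i_ss) / f_op:.3e} J")
```

The printed $P \cdot t_d$ and $P/f$ columns stay constant while the delay spans four decades, which is the decoupling property the STSCL topology relies on.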

*Observation 1:* The delay (or the maximum operating frequency) in an STSCL gate depends on the tail bias current ( $I_{\rm SS}$ ), but not on  $V_{\rm DD}$ . Therefore, the delay of a logic block can be controlled without influencing PDP, which is not possible in conventional CMOS topologies. More importantly, the speed and the operation (supply) voltage can be effectively decoupled in STSCL circuits. This point will be further elaborated in Section IV-B.

*Observation 2:* To reduce the (P/f) ratio,  $\alpha$  should be kept as large as possible. This observation does not contradict similar results for conventional CMOS, where

$$(P/f)_{\rm CMOS} = C_L V_{\rm DD}^2 \left( 1 + \frac{2}{\alpha} e^{-\frac{V_{\rm DD}}{nU_T}} \right) \tag{16}$$

as shown in [6]. However, the influence of  $V_{\rm DD}$  on (P/f) is quite different in conventional CMOS, where an optimum  $V_{\rm DD}$  value to minimize (P/f) can be found, especially for small  $\alpha$  values, due to the significant leakage in CMOS.
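The existence of an optimum supply voltage in (16) can be seen with a simple grid search. This is a minimal sketch under assumed values ($C_L = 1$ fF, $n = 1.5$, $U_T = 25.9$ mV); the lower search bound of 0.1 V is also an assumption standing in for a minimum functional supply, since (16) by itself tends to zero as $V_{\rm DD}$ approaches zero.

```python
import math

# Minimal sketch of Eq. (16): (P/f) of subthreshold CMOS versus V_DD.
# C_L, n, U_T and the 0.1 V lower search bound are assumed example values.


def cmos_power_per_frequency(v_dd, alpha, c_load=1e-15, n=1.5, u_t=0.0259):
    """Eq. (16): C_L * V_DD^2 * (1 + (2/alpha) * exp(-V_DD / (n U_T)))."""
    leakage_term = (2.0 / alpha) * math.exp(-v_dd / (n * u_t))
    return c_load * v_dd ** 2 * (1.0 + leakage_term)


def optimum_vdd(alpha, v_min=0.1, v_max=1.0, steps=9000):
    """Grid search for the V_DD in [v_min, v_max] that minimizes Eq. (16)."""
    grid = [v_min + (v_max - v_min) * k / steps for k in range(steps + 1)]
    return min(grid, key=lambda v_dd: cmos_power_per_frequency(v_dd, alpha))


if __name__ == "__main__":
    # For larger alpha the leakage term is too weak to create an interior
    # minimum and the search simply returns the lower bound v_min.
    for alpha in (0.01, 0.001):
        v_opt = optimum_vdd(alpha)
        print(f"alpha = {alpha}: optimum V_DD is about {1e3 * v_opt:.0f} mV")
```

Under these assumptions the optimum moves to higher $V_{\rm DD}$ as the activity factor shrinks, because leakage energy per operation grows relative to switching energy.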

*Observation 3:* Assuming that the system clock frequency is dictated by the longest delay path between two consecutive register stages, and assuming that the activity rate depends inversely on the maximum logic depth between two registers, it is most beneficial to keep the logic depth as shallow as possible, and thus, increase  $\alpha$ . This calls for very short (one stage) pipelining in STSCL systems, which is demonstrated with an example in Section IV-D.

Fig. 8. Photomicrograph of the test circuits: (a) ring oscillator; (b) frequency divider.

The output load capacitance  $C_L$  is partially due to the device parasitic capacitances such as the capacitance of n-well to p-substrate reverse biased diode  $(C_d)$  and wiring capacitance related to interconnections  $(C_W)$ . Since n-well to p-substrate capacitance  $(C_d)$  for small size pMOS devices is less than 1 fF, it can be ignored in comparison to the wiring capacitance  $C_W$  which can be much larger even for simple circuits.

Regarding (12), one can conclude that the achievable power-delay product per unit capacitance would be  $\ln(2) \cdot V_{\rm SW} \cdot V_{\rm DD}$ . This means that for a supply voltage of 400 mV and  $V_{\rm SW}$  of 150 to 200 mV, the minimum achievable PDP would be 0.04–0.06 [fJ/fF/Gate]. Since the total parasitic capacitance due to the STSCL gate itself (including  $C_d$ ) is less than 1 fF, the minimum PDP that can be expected for an unloaded gate is  $\mathrm{PDP}_{\min} > 0.04$  [fJ/Gate]. Notice that PDP also depends on temperature through  $V_{\rm SW}$  and can be reduced by reducing the temperature.
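The per-unit-capacitance figures quoted above follow from (12) with the $\ln(2)$ factor included. A minimal arithmetic sketch (the function name is an illustrative assumption):

```python
import math

# Minimal sketch: PDP per unit load capacitance from Eq. (12).
# The result in J/F is numerically equal to fJ/fF.


def pdp_per_unit_capacitance(v_sw, v_dd):
    """Return ln(2) * V_SW * V_DD, i.e. Eq. (12) divided by C_L."""
    return math.log(2.0) * v_sw * v_dd


if __name__ == "__main__":
    for v_sw in (0.15, 0.20):
        value = pdp_per_unit_capacitance(v_sw, v_dd=0.4)
        print(f"V_SW = {1e3 * v_sw:.0f} mV: PDP/C_L = {value:.4f} fJ/fF")
```

For $V_{\rm DD} = 0.4$ V this gives roughly 0.042 fJ/fF at 150 mV swing and 0.055 fJ/fF at 200 mV swing, matching the 0.04–0.06 range in the text.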

## IV. TEST STRUCTURES AND MEASUREMENT RESULTS

### A. Ring Oscillator and Divider Operation

To measure the delay versus power consumption for the proposed STSCL gates, a test chip has been designed and fabricated in conventional 0.18  $\mu$m CMOS technology. The test structures consist of eight-stage ring oscillator and frequency divider (divide-by-8) circuits, both of which are implemented based on a two-input multiplexer (MUX) STSCL gate. The microphotographs of the test circuits are shown in Fig. 8. To control the operation of the test circuits, the tail bias current of the SCL gates can be adjusted externally. Internal current mirrors with the ratio of 1/100 are used to simplify the measurement process. The supply voltages of the test blocks are directly accessible to measure the total power consumption of each block using an HP4156A Semiconductor Analyzer. An internal replica bias circuit has been applied to control the voltage swing at the output of the gates, as described in Section II-D, ensuring a minimum output swing of 100 mV. The die-to-die variation of the gate bias voltage ( $V_{\rm BP}$  in Fig. 5) required to ensure a fixed voltage swing of 150 mV at a given tail current was found to be less than  $\pm 8\%$ , in conventional 0.18  $\mu$m CMOS technology.

Fig. 9. Measured oscillation frequency versus power dissipation of the eight-stage ring oscillator based on the proposed STSCL topology for  $V_{\rm DD} = 0.3$  V, 0.4 V, and 1.0 V.

Fig. 9 illustrates the measured oscillation frequency of an eight-stage ring oscillator with differential STSCL NAND gates (which are constructed based on two-input MUX) in comparison to the simulation results. The conventional CMOS oscillator used for comparison is built with two-input standard NAND gates in the same 0.18  $\mu$ m CMOS technology with driving strength of  $\times$  1. As depicted in this figure, the measurement results of the STSCL oscillator are very close to the simulation results, and consistent over a range of several orders of magnitude. Meanwhile, PDP is very well predictable by (12). This figure also shows the results for the CMOS ring oscillator, operating in subthreshold regime with different supply voltage values between 0.1 and 0.4 V.

The divide-by-8 circuit has been realized using the source-coupled latch structure as shown in Fig. 10. Since all transistors operate in weak inversion, the device dimensions can be kept close to minimum size. The measured maximum operating (input) frequency of the divider is plotted against power dissipation in Fig. 11(a) at  $V_{\rm DD} = 0.4$  V and  $V_{\rm DD} = 1.0$  V, comparing the results with the performance of an optimized CMOS frequency divider operating in subthreshold regime. While the CMOS divider cannot sustain correct operation below 200 mV supply voltage, the SCL divider with the bulk-drain connected pMOS load continues its operation down to 10 pA/Gate of tail current, and 3 kHz of input frequency. The resulting (measured) PDP corresponds to less than 1 fJ/Gate.

Fig. 10. (a) STSCL latch circuit schematic, and (b) the topology of the divide-by-8 circuit used for measurement, consisting of three D-flip-flop (DFF) stages.

To compare the performance of the STSCL gates at scaled technology nodes, the maximum operating frequency of a divide-by-8 circuit has been simulated using technology parameters for 90 nm, 130 nm, and 180 nm CMOS processes [Fig. 11(b)]. Here, it is assumed that the DFF gates are loaded with the same amount of interconnect capacitance, and all leakage components are taken into account. It can be seen that the STSCL frequency divider exhibits very similar performance in different technology nodes. It is possible to reduce the tail bias current of the circuit down to 10 pA in a controlled manner both in 130 nm and 90 nm technologies, whereas the subthreshold leakage current would be very difficult to limit in conventional CMOS logic circuits.

Considering the results presented in Figs. 9 and 11, it can be observed that the STSCL solution can successfully extend the range of operation by two orders of magnitude along the power axis, and by about one order of magnitude along the frequency axis, while allowing completely separate control of voltage swing and power dissipation.

Fig. 11. (a) Measured maximum frequency of operation versus power dissipation of the divide-by-8 frequency divider shown in Fig. 10 for  $V_{\rm DD} = 0.4$  V, and 1.0 V. (b) Simulated maximum operating frequency of STSCL divider in different technologies (CMOS 90 nm, 130 nm, and 180 nm).

### B. Carry-Save Multiplier Using SCL Gates

To illustrate the use of the proposed circuit topology for more complex functions, a second test chip containing an  $(8 \times 8)$  bit parallel carry–save multiplier has been designed and fabricated using 0.18  $\mu$m CMOS technology (Fig. 12). Fig. 13 shows the measured input-to-output delay of the STSCL-based multiplier, operating at  $V_{\rm DD} = 0.3$  V, 0.4 V, and 1.0 V, in comparison to the simulation results. It can be seen that the performance of the STSCL multiplier is accurately predicted by the simulations. The supply voltage can be reduced to 0.3 V while the circuit remains operational over a very wide range of tail bias current. The saturation behavior of the delay at higher bias currents is mainly due to the limited swing of the replica bias circuit that is used to produce the proper gate voltage for the pMOS load devices. To illustrate the independent control of the delay and the voltage supply, the PDP versus the delay of the STSCL multiplier circuit is plotted in Fig. 14 for different bias current levels, and compared with the variation of PDP of an equivalent CMOS multiplier circuit, also operating in subthreshold regime. In this example, the power supply voltage and the output voltage swing of the STSCL circuit are kept at 0.35 V and 0.15 V, respectively, resulting in nearly constant PDP of less than 1 pJ over the entire operating range. The PDP of the CMOS circuit, on the other hand, varies significantly with  $V_{\rm DD}$ , due to the quadratic dependence of PDP on  $V_{\rm DD}$ , and increasing dominance of leakage at low  $V_{\rm DD}$  values.

Fig. 12. Photomicrograph of the measured STSCL-based  $8 \times 8$  bit carry–save multiplier.

### C. Compound SCL Gates to Improve Power–Delay Performance

Using STSCL topology, the power consumption of a functional block is directly proportional to the number of logic gates to be biased with a tail current. Therefore, implementing more complex logic functions in a single stage SCL gate can be expected to result in smaller number of gates and hence, reduced power consumption. In this approach, since the time constant at the common source nodes (i.e.,  $\tau_{\rm CS} \simeq C_{\rm CS}/g_m$ , in which  $C_{\rm CS}$  indicates the parasitic capacitance in each common source node) is much smaller than the time constant at the output node ( $\tau$  as shown in (1)), the speed degradation due to the stacking will be negligible for  $N \ll \tau/\tau_{\rm CS}$  (N indicates the number of stacked stages in nMOS switching network) where

$$\tau/\tau_{\rm CS} = g_m R_L \cdot \frac{C_L}{C_{\rm CS}}.$$
(17)
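The stacking condition can be estimated from (17) once the capacitance ratio is known. The sketch below is a minimal illustration; it additionally assumes that a switching transistor carrying $I_{\rm SS}$ in weak inversion has $g_m = I_{\rm SS}/(n U_T)$, so that $g_m R_L = V_{\rm SW}/(n U_T)$. That relation and all numerical values are assumptions made for the example, not figures from the paper.

```python
# Minimal sketch of Eq. (17): ratio of output to common-source time constants.
# The weak-inversion g_m relation and all numbers below are assumptions.


def gm_rl_weak_inversion(v_sw, n, u_t):
    """Assumed g_m = I_SS/(n U_T) with R_L = V_SW/I_SS gives V_SW/(n U_T)."""
    return v_sw / (n * u_t)


def time_constant_ratio(gm_rl, c_load, c_cs):
    """Eq. (17): tau / tau_CS = g_m R_L * C_L / C_CS."""
    return gm_rl * c_load / c_cs


if __name__ == "__main__":
    gm_rl = gm_rl_weak_inversion(v_sw=0.2, n=1.3, u_t=0.0259)
    ratio = time_constant_ratio(gm_rl, c_load=10e-15, c_cs=1e-15)
    print(f"g_m R_L = {gm_rl:.2f}, tau/tau_CS = {ratio:.1f}")
    for stacked in (2, 3, 4):
        print(f"N = {stacked}: N / (tau/tau_CS) = {stacked / ratio:.3f}")
```

With these example numbers the ratio is several tens, so a stack of three or four levels remains well inside the region where the output node dominates the delay.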

Fig. 13. Measured total propagation delay of the proposed STSCL multiplier versus tail bias current $(I_{\rm SS})$ for different supply voltages and in comparison to the simulation results.

Fig. 14. Comparing the power-delay product versus delay for two $8 \times 8$ bit carry-save multiplier circuits built with conventional CMOS and STSCL components.

Fig. 15. (a) Unit block needed to implement a carry–save multiplier, consisting of a two-input AND gate and a full-adder (FA) [24]. (b) Possible implementation of the unit block based on STSCL logic (only the part generating $S\langle n \rangle$ is shown) in which an AND gate is followed by an adder stage. The total current consumption in this case would be $2 \times I_{SS}$, while the total delay of this block would be approximately twice that of a single STSCL gate. (c) Alternative implementation: Merging the adder and AND functions in a single compound STSCL gate to improve the PDP. All switching nMOS transistors are minimum size devices.

Fig. 15(a) shows a unit cell which is required to implement the carry–save multiplier [24]. This unit block consists of a two-input AND gate and a full-adder (FA), and it can be implemented by two separate SCL gates, as shown in Fig. 15(b). Alternatively, Fig. 15(c) shows an STSCL gate implemented by merging the two logic functions of AND and XOR on a single branch and realizing the compound logic operation $S\langle n \rangle = (A\langle i \rangle \cdot B\langle j \rangle) \oplus C\langle n-1 \rangle \oplus S\langle n-1 \rangle$. Using the merged STSCL gate topology [Fig. 15(c)] results in a significant improvement of the power-frequency performance of the $8 \times 8$ multiplier, as illustrated in Fig. 16(a). The multiplier in this example is built out of 56 adders and 64 AND gates (total number of gates is 120), of which 49 AND gates can be merged with the corresponding adder as described above. This modification alone results in approximately 40% power reduction. In the general case of an $N \times N$ multiplier, the total number of gates is $N(2N - 1)$, and it is possible to merge $(N-1)^2$ AND gates with adders, resulting in almost 50% power reduction for large $N$. In addition to the obvious reduction of tail currents, the merging of AND gates with adders also reduces the layout area of the unit cell, and hence, lowers the parasitic capacitance due to wiring. Finally, the operating frequency is further increased by reducing the overall logic depth, resulting in about 80% total improvement of speed at iso-power. While the results are difficult to generalize for random logic topologies, the merging of complex logic gates clearly presents a valuable opportunity that can be exploited for improving the power-frequency performance in STSCL circuits.
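The gate-count arithmetic quoted above can be checked with a short sketch. It uses only the formulas given in the text, $N(2N-1)$ total gates and $(N-1)^2$ mergeable AND gates. The function name and the assumption that every gate draws the same tail current (so that each removed gate saves an equal share of the static power) are illustrative and not taken from the paper.

```python
def multiplier_gate_counts(n: int) -> dict:
    """Gate counts for an n x n carry-save multiplier, per the formulas in the text."""
    if n < 2:
        raise ValueError("n must be at least 2")
    and_gates = n * n            # one AND gate per partial-product bit (64 for n = 8)
    adders = n * (n - 1)         # 56 adders for n = 8
    total = and_gates + adders   # equals n * (2n - 1)
    mergeable = (n - 1) ** 2     # AND gates that can be folded into an adder
    return {
        "and_gates": and_gates,
        "adders": adders,
        "total": total,
        "mergeable": mergeable,
        # Assumption: all gates use the same tail current, so each merged
        # AND gate removes 1/total of the static power.
        "power_saving": mergeable / total,
    }


if __name__ == "__main__":
    for n in (8, 16, 64):
        c = multiplier_gate_counts(n)
        print(f"N={n}: {c['total']} gates, {c['mergeable']} mergeable, "
              f"saving ~ {c['power_saving']:.1%}")
```

For $N = 8$ this gives 49 of 120 gates, or about 40.8%, in line with the roughly 40% reduction stated above, and the ratio approaches 50% as $N$ grows.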



Fig. 16. (a) Power-frequency improvement that can be achieved in the $8 \times 8$ carry-save multiplier circuit, by using compound gates as described in Fig. 15. (b) Comparison of maximum operation frequency versus power consumption for two identical $8 \times 8$ carry-save multipliers, implemented with merged STSCL gates ($V_{\rm DD} = 0.35$ V) and with CMOS gates ($V_{\rm DD} = 0.2$ V to 0.5 V).

Fig. 16(b) compares the maximum frequency of operation for an $8 \times 8$ bit STSCL carry–save multiplier with merged components (operating at $V_{\rm DD} = 0.35$ V) and that of a conventional CMOS multiplier, operating at $V_{\rm DD} = 0.2$ V to 0.5 V. It can be seen that the power–frequency performance of the STSCL circuit is comparable to, and in many cases better than, the CMOS equivalent, over a wide frequency range. The main drawback of the merged-gate approach is a slight increase of the minimum usable supply voltage, since compound gates with more stacked levels typically require a higher supply voltage. However, this is a relatively minor limitation as long as the nMOS network transistors are biased in the subthreshold regime.

# D. Shallow Pipelining to Improve Activity Rate

Fig. 17. (a) Section of the parallel multiplier where the signal flow is regulated using a two-phase micro-pipelining technique for improving the performance of SCL gates. Note that every FA stage output is followed by a keeper/latch stage. (b) Eye diagram of the output of the multiplier circuit. This plot shows the output after the SCL-to-CMOS level converter circuit. Input is a $2^7-1$ pseudo-random bit stream (PRBS). Here, the period of input data is $T_p = 1.5 \ \mu s$, $I_{\rm SS} = 10$ nA and $I_{\rm SS,L} = 100$ pA, i.e., the keeper stages dissipate only 1% of the power dissipated by the FA stages.

Fig. 18. Power–frequency improvement that can be achieved in the $8 \times 8$ carry–save multiplier circuit, by using shallow pipelining with keeper-latch stages.

As already discussed in Section III, the power-to-frequency ratio of STSCL circuits (i.e., the power efficiency to operate at a given frequency) can be significantly improved by increasing the activity rate using shallow pipelining and by reducing logic depth as much as possible. One possibility is to implement two-phase latch-based pipelining where the output of each gate is latched during one clock phase, and passed on to the next stage during the other clock phase, effectively reducing the maximum logic depth to two consecutive gates. Instead of using explicit latch stages, such two-phase pipelining can be achieved by increasing (and reducing) the source (tail) current bias of alternating stages, using the gate terminal of the tail current bias transistor of each stage as the "clock" input. In this approach, illustrated in Fig. 17 for the example of the carry-save multiplier architecture, the current bias of odd stages is reduced to a low (yet non-zero) level to retain (hold) their output while the current bias of even stages is raised to the nominal operating value to enable evaluation. Very simple cross-coupled "keeper" stages connected to each gate output ensure that the output levels do not degrade significantly during the "hold" phase. Fig. 17(a) shows the circuit topology of an adder (sum generator) stage and the output keeper stage, where the pulsed tail bias achieves a very robust dynamic latching effect, augmented by the output keeper with a tail bias current of 100 pA. In an $8 \times 8$ bit carry-save multiplier circuit, taking into account the additional power overhead of pipelining (which is only 1%), shallow pipelining using keeper-latch stages will result in an overall improvement of the power-to-frequency ratio $(P/f)$ by a factor of 5 (Fig. 18).
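As a rough illustration of the bookkeeping behind these numbers, the sketch below computes the keeper overhead from the two bias currents quoted in Fig. 17 and applies a first-order model of the $(P/f)$ gain. The model is an assumption made here, not the authors' analysis: it takes the maximum frequency to scale inversely with logic depth and the static power to change only by the keeper overhead, ignoring the power saved by lowering the bias of the holding stages. The function names and the unpipelined depth of 16 are hypothetical.

```python
import math


def keeper_power_overhead(i_ss: float, i_keeper: float) -> float:
    """Keeper tail current relative to the gate tail current (same supply voltage)."""
    if i_ss <= 0 or i_keeper < 0:
        raise ValueError("currents must be positive")
    return i_keeper / i_ss


def pf_gain(depth_unpipelined: int, depth_pipelined: int = 2,
            overhead: float = 0.0) -> float:
    """First-order (P/f) improvement from cutting the maximum logic depth.

    Assumptions (not from the paper): f_max scales as 1/depth, and power
    rises only by the fractional keeper overhead.
    """
    if depth_unpipelined < depth_pipelined or depth_pipelined < 1:
        raise ValueError("invalid logic depths")
    return depth_unpipelined / (depth_pipelined * (1.0 + overhead))


if __name__ == "__main__":
    # Bias currents quoted in the Fig. 17 caption: I_SS = 10 nA, I_SS,L = 100 pA.
    overhead = keeper_power_overhead(10e-9, 100e-12)
    print(f"keeper overhead: {overhead:.1%}")
    # Hypothetical unpipelined logic depth, for illustration only.
    print(f"model (P/f) gain: {pf_gain(16, 2, overhead):.2f}")
```

The factor of 5 reported in Fig. 18 is a result from the paper and is not derived from this simplified model; the sketch only shows that a 1% keeper overhead has a negligible effect on the achievable gain.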

The pipelining technique described above can certainly be applied in combination with the gate-merging approach discussed in Section IV-C, to improve the power–frequency performance of subthreshold SCL circuits considerably.

#### V. CONCLUSION

A new approach for implementing ultra-low-power source-coupled logic circuits biased in the subthreshold regime has been demonstrated. The new topology uses compact high-resistance pMOS load devices to provide the required voltage swing at the output for proper logic operation. Measurement results show that the tail bias current of each logic gate can be reduced to less than 10 pA, while the power-delay product of the gate remains less than 1 fJ, using 0.18 $\mu$m CMOS technology. Robust operation of ring oscillator and frequency divider circuits, as well as more complex logic blocks ($8 \times 8$ bit carry–save multiplier), has been demonstrated over a very wide range of frequencies. Among other advantages, the proposed approach effectively decouples the circuit propagation delay from the operating voltage, resulting in near-constant PDP versus frequency. The bias current of the STSCL gate can be scaled over several decades using the same device dimensions, which makes this circuit topology very suitable for ultra-low-power configurable digital systems.

#### ACKNOWLEDGMENT

The authors would like to thank M. Stanisavljevic, B. Rey, M. Mercaldi, and S. Badel for their valuable contributions in block design and layout, and S. Hauser for preparing the test setup.

#### REFERENCES

- [1] M. Horowitz *et al.*, "Low-power digital design," in *Proc. IEEE Int. Symp. Low Power Electronics and Design (ISLPED)*, 1994, pp. 8–11.
- [2] D. Suvakovic and C. A. T. Salama, "A low $V_t$ CMOS implantation of an LPLV digital filter core for portable audio applications," *IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process.*, vol. 47, no. 11, pp. 1297–1300, Nov. 2000.
- [3] G. Gielen, "Ultra-low-power sensor networks in nanometer CMOS," in *Int. Symp. Signals, Circuits and Systems (ISSCS)*, Jul. 2007, vol. 1, pp. 1–2.
- [4] B. A. Warneke and K. S. J. Pister, "An ultra-low energy microcontroller for smart dust wireless sensor networks," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, 2004, pp. 316–317.
- [5] L. S. Wong *et al.*, "A very low-power CMOS mixed-signal IC for implantable pacemaker applications," *IEEE J. Solid-State Circuits*, vol. 39, no. 12, pp. 2446–2456, Dec. 2004.
- [6] E. Vittoz, "Weak inversion for ultimate low-power logic," in *Low-Power Electronics Design*, C. Piguet, Ed. Boca Raton, FL: CRC Press, 2005.
- [7] C. Enz and E. Vittoz, *Charge-Based MOS Transistor Modeling: The EKV Model for Low-Power and RF IC Design*. New York: Wiley, 2006.
- [8] C. Enz, F. Krummenacher, and E. Vittoz, "An analytical MOS transistor model valid in all regions of operation and dedicated to low-voltage and low-current applications," *Analog Integr. Circuits Signal Process. J.*, vol. 8, pp. 83–114, Jun. 1995.
- [9] C. Enz, M. Punzenberger, and D. Python, "Low-voltage log-domain signal processing in CMOS and BiCMOS," *IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process.*, vol. 46, no. 3, pp. 279–289, Mar. 1999.
- [10] B. H. Calhoun, A. Wang, and A. Chandrakasan, "Modeling and sizing for minimum energy operation in subthreshold circuits," *IEEE J. Solid-State Circuits*, vol. 40, no. 9, pp. 1778–1786, Sep. 2005.
- [11] B. H. Calhoun and A. Chandrakasan, "Ultra-dynamic voltage scaling (UDVS) using subthreshold operation and local voltage dithering," *IEEE J. Solid-State Circuits*, vol. 41, no. 1, pp. 238–245, Jan. 2006.
- [12] R. Amirtharajah and A. Chandrakasan, "A micropower programmable DSP using approximate signal processing based on distributed arithmetic," *IEEE J. Solid-State Circuits*, vol. 39, no. 2, pp. 337–347, Feb. 2004.
- [13] H. Soeleman, K. Roy, and B. C. Paul, "Robust subthreshold logic for ultra-low-power operation," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 9, no. 1, pp. 90–99, Sep. 2001.
- [14] A. Chandrakasan and R. Brodersen, "Minimizing power consumption in digital CMOS circuits," *Proc. IEEE*, vol. 83, no. 4, pp. 498–523, Apr. 1995.
- [15] J. M. Musicer and J. Rabaey, "MOS current mode logic for low power, low noise CORDIC computation in mixed-signal environment," in *Proc. IEEE Int. Symp. Low Power Electronics and Design (ISLPED)*, 2000, pp. 102–107.

- [16] S. Badel and Y. Leblebici, "Breaking the power-delay tradeoff: Design of low-power high-speed MOS current-mode logic circuits operating with reduced supply voltage," in *Proc. IEEE Int. Symp. Circuits and Systems (ISCAS)*, May 2007, pp. 1871–1874.
- [17] M. Alioto and G. Palumbo, *Model and Design of Bipolar and MOS Current-Mode Logic (CML, ECL and SCL Digital Circuits)*. New York: Springer, 2005.
- [18] E. Brauer and Y. Leblebici, "Semiconductor based high-resistance device and logic application," European Patent Application No. 07104895.3-1235, Mar. 26, 2007.
- [19] A. Tajalli, E. Vittoz, Y. Leblebici, and E. J. Brauer, "Ultra low power subthreshold MOS current mode logic circuits using a novel load device concept," in *Proc. European Solid-State Circuits Conf. (ESSCIRC)*, Munich, Germany, Sep. 2007, pp. 281–284.
- [20] F. Cannillo and C. Toumazou, "Nano-power subthreshold current-mode logic in sub-100 nm technologies," *IEE Electron. Lett.*, vol. 41, no. 23, pp. 1268–1269, Nov. 2005.
- [21] F. Cannillo, C. Toumazou, and T. S. Lande, "Bulk-drain connected load for subthreshold MOS current-mode logic," *IEE Electron. Lett.*, vol. 43, no. 12, pp. 662–664, Jun. 2007.
- [22] P. R. Gray, P. J. Hurst, S. H. Lewis, and R. G. Meyer, *Analysis and Design of Analog Integrated Circuits*, 4th ed. New York: Wiley, 2000.
- [23] M. Alioto and G. Palumbo, "Power-aware design techniques for nanometer MOS current-mode logic gates: A design framework," *IEEE Circuits Syst. Mag.*, vol. 6, no. 4, pp. 40–59, 2006.
- [24] J. M. Rabaey, A. Chandrakasan, and B. Nikolic, *Digital Integrated Circuits: A Design Perspective*. New York: Prentice-Hall, 2003.



**Yusuf Leblebici** (M'90–SM'98) received the B.S. and M.S. degrees in electrical engineering from Istanbul Technical University, Istanbul, Turkey, in 1984 and 1986, respectively, and the Ph.D. degree in electrical and computer engineering from the University of Illinois at Urbana-Champaign in 1990.

From 1991 to 1993, he was Visiting Assistant Professor of electrical and computer engineering at the University of Illinois at Urbana-Champaign. From 1993 to 1998, he was on the faculty of Istanbul Technical University as Associate Professor of electrical engineering. He was Associate Professor of electrical and computer engineering at Worcester Polytechnic Institute (WPI) in Massachusetts between 1998 and 2001, where he established and directed the VLSI Design Laboratory, and served as Project Director at the New England Center for Analog and Mixed-Signal IC Design. From 2000 to 2001, he also took the responsibility of developing the microelectronics degree program at Sabanci University, as the Microelectronics Program Coordinator. Since 2002, he has been a Chair Professor at the Swiss Federal Institute of Technology in Lausanne (EPFL), Switzerland, and Director of Microelectronic Systems Laboratory. His research interests include design of high-speed CMOS digital and mixed-signal integrated circuits, computer-aided design of VLSI systems, intelligent sensor interfaces, modeling and simulation of semiconductor devices, and VLSI reliability issues. He is the coauthor of two textbooks, *Hot-Carrier Reliability of MOS VLSI Circuits* (Kluwer Academic Publishers, 1993) and *CMOS Digital Integrated Circuits: Analysis and Design* (McGraw Hill, 1996, 1998, and 2002), as well as more than 150 scientific articles published in international journals and conferences.

Dr. Leblebici has been on the organizing and steering committees of several international conferences in microelectronics. He served as an Associate Editor of IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II between 1998 and 2000, and as an Associate Editor of IEEE TRANSACTIONS ON VLSI between 2001 and 2003. He received the Young Scientist Award of the Turkish Scientific and Technological Research Council in 1995, and the Joseph Samuel Satin Distinguished Fellow Award of the Worcester Polytechnic Institute in 1999.



**Armin Tajalli** (S'04) received the B.S. and M.S. degrees (Hons.) in electrical engineering from Sharif University of Technology, Tehran, Iran, and Tehran Polytechnic University in 1997 and 1999, respectively, and the Ph.D. degree from Sharif University of Technology in 2006 (Hons.).

From 1998 to 2004, he was with Emad Semicon as a Senior Analog Design Engineer. In 2006, he joined Microelectronic Systems Laboratory (LSM) in Ecole Polytechnique Fédérale de Lausanne (EPFL), Switzerland, working on ultra-low-power circuit design techniques.

Dr. Tajalli received the award of the Best Design Engineer from Emad Semicon, 2001, the Kharazmi Award on Research and Development, 2002, and the Presidential Award of the best Iranian researchers, 2003.



**Elizabeth J. Brauer** (M'94) received the Ph.D. degree in electrical engineering from the University of Illinois at Urbana-Champaign in 1994.

She is presently an Associate Professor of Electrical Engineering at Northern Arizona University, Flagstaff. She has worked for Motorola and Fairchild Semiconductor, and taught at the University of Kentucky. Her technical interests are in computer-aided design, verification, and testing of integrated circuits, microelectronics and biomimetic circuits.



**Eric Vittoz** (M'72–SM'87–F'89–LF'04) received the electrical engineering degree from the Polytechnical School, University of Lausanne, Switzerland, in 1961 and the Ph.D. degree from EPFL (Swiss Institute of Technology Lausanne) in 1969.

He joined the Watchmakers Electronic Center (CEH) in 1962 as a member of the team that developed the first quartz watch. He became head of the Advanced Circuit Department at CEH in 1967 and was appointed Vice Director and head of the Applications Division in 1971. In 1984, he took the responsibility of the Circuits and Systems Research Division of the newly founded CSEM (Swiss Center for Electronics and Microtechnology), where he was appointed Executive Vice-President in 1991, head of Integrated Circuits and Systems, then head of Advanced Microelectronics after 1997. He has been retired from CSEM since 2004, after spending three years of partial retirement as a Fellow researcher. Since 1975, he has been teaching analog circuit design, and supervising undergraduate and graduate student projects at EPFL, where he became Professor in 1982. He has authored or co-authored more than 140 papers and holds 26 patents in the fields of very low-power microelectronics, compact transistor modeling, analog CMOS circuit design and biology-inspired analog VLSI.

Dr. Vittoz has been involved in the formation of the IEEE Solid-State Circuits Society and was a member of its AdCom from 1996 to 1999. He was a member of the European Program Committee of ISSCC from 1977 to 1989, and for more than 25 years, a member of the Steering Committee of ESSCIRC, the European Solid-State Circuits Conference. A Life Fellow of the IEEE, he is the recipient of the 2004 IEEE Solid-State Circuits Award.