# A Slew Controlled LVDS Output Driver Circuit in 0.18 $\mu$m CMOS Technology

Armin Tajalli, Student Member, IEEE, and Yusuf Leblebici, Senior Member, IEEE

Abstract—This article presents a power-efficient low-voltage differential signaling (LVDS) output driver circuit. The proposed approach helps to reduce the total input capacitance of the LVDS driver circuit and hence relaxes the tradeoffs in designing a low-power pre-driver stage. A slew control technique has also been introduced to reduce the impedance mismatch effect between the output driver circuit and the line. The pre-driver stage shows a total input capacitance of 50 fF and also controls the voltage swing and common-mode voltage at the input of the LVDS driver output stage. This makes the operation at low supply voltages using a conventional 0.18  $\mu$ m CMOS technology feasible. The output driver circuit consumes 4.5 mA while driving an external 100  $\Omega$ resistor with an output voltage swing of  $V_{OD} = 400$  mV, achieving a normalized power dissipation of 3.42 mW/Gbps. The area of the LVDS driver circuit is 0.067 mm<sup>2</sup> and the measured output jitter is  $\sigma_{rms}$  = 4.5 ps. Measurements show that the proposed LVDS driver can be used at frequencies as high as 2.5 Gbps where the speed will be limited by the load RC time constant.

Index Terms-CMOS integrated circuits, current-mode logic (CML), low-voltage differential signaling (LVDS), output driver, source-coupled logic (SCL).

#### I. INTRODUCTION

High-performance serial transmitters and receivers are key components in modern chip-to-chip interconnections. To provide a high-density link, tens or hundreds of such circuits are typically integrated on a single chip [1], [2]. Therefore, power consumption, crosstalk, and integration density are emerging as the key design issues for implementing these building blocks. These issues underline the existing challenges in the design of fully differential low-power CMOS transmitters and receivers.

This paper introduces a power-efficient output driver (OD) circuit based on the low-voltage differential signaling (LVDS) standard [3], [4]. This standard has been developed for high-performance chip-to-chip interconnections, with the advantage that it can be applied at very high data rates. The limited voltage swing and the differential signaling scheme of this standard help to achieve very low noise generation together with good noise immunity. Based on the LVDS requirements, the circuit should be able to drive an external 100  $\Omega$  termination resistor with a voltage swing of  $V_{OD} = 247$  to 454 mV (Fig. 1). Therefore, a large amount of current should be driven to the load by the output driver circuit, which makes the driver circuit design at low supply voltages very challenging. Meanwhile, the common-mode voltage of the output signal should remain within the range of  $V_{OS} = 1.125$  to 1.375 V [4]. Thus, it is necessary to balance these conflicting requirements and at the same time ensure correct current switching (or voltage switching [5]) at the output stage.

The authors are with the Microelectronic Systems Laboratory (LSM), Ecole Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland (e-mail: armin.tajalli@epfl.ch, yusuf.leblebici@epfl.ch).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JSSC.2008.2010788

Recently demonstrated LVDS drivers operating at reduced supply voltages either rely on a diminished output swing [6], [7], or sacrifice the fully differential topology [8]. It is also necessary to apply proper termination schemes to avoid fast transitions or ringing caused by incomplete impedance matching in different points of the transmission system [5], [6]. The fast transitions contain high-frequency components that directly affect the electromagnetic radiation (EMR) created by the signal and may create electromagnetic compatibility (EMC) problems [3], [9]. On the other hand, since the total input capacitance of the output driver circuit could be very high, a pre-driver circuit is generally necessary to reduce the total input capacitance.

In this work, the entire circuit including the pre-driver circuit is designed based on a fully differential topology to reduce the current spikes on the supply lines and to achieve a low sensitivity to supply and substrate noise. Meanwhile, a new technique is introduced that reduces the total input capacitance of the LVDS output driver circuit and at the same time compensates for the impedance mismatch effect, thereby keeping the total power consumption low. In the following, after a brief review and discussion of the existing topologies in Section II, the proposed LVDS driver circuit is described in Section III and the measurement results are presented in Section IV.

#### **II. LVDS DRIVER CHARACTERISTICS**

Fig. 1 shows a general view of an output buffer circuit followed by the output driver stages. In this configuration, the predriver (PD) stage isolates the internal circuitry from the output driver circuit which can have a very high input capacitance. The total power dissipation in PD stage depends on the speed of operation ( $f_D = 1/T_D$  or input data rate) and total input capacitance of the OD stage  $(C_{in,OD})$ . Therefore, to have a low-power PD stage, it is necessary to keep the  $C_{in,OD}$  as low as possible. The OD stage may switch a differential current or voltage to the output [10]. In this figure, it is assumed that  $Z_T$  provides the necessary matching properties between the OD and external circuitry. An extended discussion on different types of LVDS drivers can be found in [5].

Manuscript received September 05, 2007; revised August 14, 2008. Current version published January 27, 2009.



Fig. 1. Complete output driving circuit including pre-driver stage and matching circuitry  $(Z_T)$ .

Fig. 2 shows two common topologies that can be utilized as an LVDS driver. In very high-frequency applications, source coupled logic (SCL)-based circuits (Fig. 2(a)) are generally used to drive the pads and external components [11]. In this circuit topology,  $R_{CM}$  and  $I_{CM}$  are used to adjust the output common-mode voltage while  $R_L$  is the load resistance and also acts as the internal termination resistance. With proper control of the biasing, it is possible to apply this topology as an output driver that is compatible with the LVDS standard requirements. Due to the internal termination resistors, this circuit shows very good performance from the impedance-matching point of view. However, it dissipates a relatively high amount of power. Indeed, this circuit draws  $I_{SS} = 4 \times V_{sw,T}/R_T$  from the supply voltage to drive the output termination resistance  $(R_T)$  for a voltage swing of  $V_{sw,T}$ . The total input capacitance of this circuit can be estimated approximately as

$$C_{\text{in},SCL} \approx C_{gs} + \gamma_M \cdot C_{gd} \tag{1}$$

in which  $\gamma_M$  stands for the Miller effect ( $\gamma_M \approx V_{sw,out}/V_{sw,in}$ ),  $C_{gs} \approx C_{ox} \cdot W \cdot (L_{min} + L_{ov})$  (W and  $L_{min}$  are the effective width and length of M1-M2 in Fig. 2(a), and  $L_{ov}$  indicates the gate-drain and gate-source overlap length), and  $C_{gd} \simeq C_{ox} \cdot W \cdot L_{ov}$ . Here, it is assumed that for all MOS devices, the minimum possible device length ( $L_{min}$ ) is selected. Then, regarding the required bias current in this stage, the total input capacitance would be

$$C_{\text{in},SCL} \simeq 8 \times \frac{C_{ox} \cdot L_{\min}^2}{k'_n} \cdot \frac{V_{sw,\text{out}}}{V_{sw,\text{in}}^2} \cdot \gamma_{sat}^2 \cdot \frac{1}{R_T} \cdot \left(1 + (1 + \gamma_M) \cdot \frac{L_{ov}}{L_{\min}}\right) \tag{2}$$

where  $V_{sw,in}$  and  $V_{sw,out}$  are voltage swings at the input and output of this circuit,  $k'_n = \mu_n \cdot C_{ox}$, and  $\mu_n$  indicates the effective electron mobility in NMOS devices. To derive (2), it is assumed that the drain-source saturation voltage of all MOS devices ( $V_{DSsat}$ ) is  $\gamma_{sat}$  times smaller than the input voltage swing or:  $V_{sw,in} = \gamma_{sat} \cdot V_{DSsat}$ . This input capacitance should be driven by the PD stage and hence a larger capacitance at the input means that more power will be consumed by the PD stage to drive the OD at the desired speed of operation.

Fig. 2. (a) SCL-based buffer circuit. (b) Conventional LVDS driver circuit [4].

Fig. 3. Simulated impulse response of a practical line including non-ideality effects (solid line that uses distributed model for line) in comparison to the response of an ideal line which includes only a termination resistor and parasitic capacitor (dashed line). In both cases:  $\Delta I_{\rm OUT} = 8 \,\mathrm{mA}$ ,  $R_T = 100 \,\Omega$ .

To reduce the power consumption, it is possible to use the LVDS driver topology shown in Fig. 2(b) [4], [10]. In this case one fourth of the bias current of an SCL-based driver is sufficient (i.e.,  $I_{SS} = V_{sw,T}/R_T$ ) to achieve the same voltage swing at the output.

In this circuit, the total input capacitance is

$$C_{\text{in},LVDS} \simeq 2 \times \frac{C_{ox} \cdot L_{\min}^2}{k'_n \cdot \eta} \cdot \frac{V_{sw,\text{out}}}{V_{sw,\text{in}}^2} \cdot \gamma_{sat}^2 \cdot \frac{1}{R_T} \cdot \left(1 + (1 + \gamma_M) \cdot \frac{L_{ov}}{L_{\min}}\right) \tag{3}$$

where  $\eta = k'_p / (k'_p + k'_n)$ ,  $k'_p = \mu_p \cdot C_{ox}$ , and  $\mu_p$  indicates the effective hole mobility in PMOS devices and it is assumed that the drain-source saturation voltage  $(V_{DSsat})$  is the same for all NMOS and PMOS devices.

Comparing (2) and (3), it can be seen that  $C_{in,LVDS}$  can be as high as  $C_{in,SCL}$ , due to the additional gate capacitance of the PMOS devices (see Fig. 2(b)) (in this technology  $\eta \approx 1/4$ ). Since the power consumption in the PD stage is proportional to the input capacitance of the OD stage, the PD stage in both cases would dissipate almost the same amount of power. As shown in [5], using an internal termination resistor in the topology of Fig. 2(b) would increase the bias current of this stage by a factor of as high as two (i.e.,  $I_{SS} = 2 \times V_{sw}/R_T$ ). This will also increase the total input capacitor by almost the same factor, accordingly.
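The comparison above can be checked numerically. The following is a minimal sketch, assuming an example swing of 400 mV (a value inside the LVDS range, chosen here for illustration) and treating the factor of two for internal termination as the upper bound quoted in the text. The function names are hypothetical and are not part of the original work.

```python
def bias_current_scl(v_sw, r_t):
    """Tail current of the SCL-based driver of Fig. 2(a): I_SS = 4 * V_sw / R_T."""
    return 4.0 * v_sw / r_t


def bias_current_lvds(v_sw, r_t, internal_termination=False):
    """Tail current of the driver of Fig. 2(b): I_SS = V_sw / R_T.

    With internal termination the text quotes an increase of up to a
    factor of two; the upper bound is used here.
    """
    factor = 2.0 if internal_termination else 1.0
    return factor * v_sw / r_t


def cin_ratio_lvds_to_scl(eta):
    """Ratio of (3) to (2); all common factors cancel, leaving 2 / (8 * eta)."""
    return 2.0 / (8.0 * eta)


V_SW = 0.4    # V, assumed example swing
R_T = 100.0   # ohm, external termination
ETA = 0.25    # eta quoted in the text for this technology

i_scl = bias_current_scl(V_SW, R_T)
i_lvds = bias_current_lvds(V_SW, R_T)
i_lvds_term = bias_current_lvds(V_SW, R_T, internal_termination=True)
ratio = cin_ratio_lvds_to_scl(ETA)

print(f"SCL tail current:             {i_scl * 1e3:.1f} mA")
print(f"LVDS tail current:            {i_lvds * 1e3:.1f} mA")
print(f"LVDS with int. termination:   {i_lvds_term * 1e3:.1f} mA")
print(f"C_in,LVDS / C_in,SCL:         {ratio:.2f}")
```

With $\eta \approx 1/4$ the capacitance ratio evaluates to one, which is the sense in which $C_{in,LVDS}$ "can be as high as" $C_{in,SCL}$ even though the bias current is four times smaller.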

In Section III, a technique for avoiding internal termination, and hence keeping the power consumption of the LVDS driver shown in Fig. 2(b) low, will be introduced.

# III. PROPOSED CIRCUIT

## A. Topology Description

Owing to several non-idealities such as non-ideal transmission line characteristics, imperfect termination, and pad parasitic effects, voltage spikes or ringing can be observed at the output of the LVDS driver. The typical impulse response of a practical line is shown in Fig. 3. As illustrated in this figure, the pulse response in the presence of line parasitics shows a large peaking followed by ringing.

Fig. 4. Compensating the step response of the driver circuit: (a) a simple driver model, (b) controlling the output voltage slew by introducing some delay [6]. All  $G_m$  cells are in practice non-linear current switches as shown in Fig. 2(b).

To achieve an acceptable output waveform, either internal termination should be applied (which increases the power consumption considerably), or the output voltage slew should be controlled to compensate for the effect of impedance mismatch between the OD stage and its load. Fig. 4(a) shows the simplified topology of a differential driver where  $G_m$  cells are implemented with non-linear differential pair MOS current switches as shown in Fig. 2(b). Based on this figure, as long as the input voltage swing is higher than  $V_{SW,min}$ , the tail current will be switched between the two branches and will be delivered to the load. As explained before, any impedance mismatch between the OD and line will cause some reflection and hence it exhibits overshoot and undershoot at the output. Fig. 4(b) introduces a possible remedy to control the output slew in fast transitions and thus, to control the overshoot at the output of the OD circuit [6]. Based on this approach, a part of the output current is delivered to the load by a delay through  $G_{m2}$  while the total current delivered to the output load remains unchanged. In this configuration, the total transconductance is

$$G_{m}(s) = \frac{I_{\text{OUT}}}{V_{\text{IN}}} = (G_{m1} + G_{m2}) \cdot \frac{1 + s \cdot R_{D}C_{D} \cdot \frac{G_{m1}}{(G_{m1} + G_{m2})}}{1 + s \cdot R_{D}C_{D}}. \tag{4}$$

Fig. 5. Compensating the line transfer characteristics using the configuration shown in Fig. 4(b). Here,  $C_L = 1$  pF,  $C_P = 1$  pF,  $L_S = 4$  nH,  $C_D = 1$  pF,  $R_D = 160 \ \Omega$ ,  $G_{m1}/(G_{m1} + G_{m2}) = 0.625$ . The circuit transfer characteristic is normalized to  $(G_{m1} + G_{m2})R_T$ .

Based on (4), the zero of the transfer function (i.e.,  $|z| = (G_{m1} + G_{m2})/(R_D C_D G_{m1})$ ) is larger than its pole (i.e.,  $|p| = 1/(R_D C_D)$ ). Hence, this topology has a lower transconductance at high frequencies (or equivalently in the fast transitions) which is helpful to control the output voltage slew. Using a simplified model for the line (see Fig. 5 inset), it is possible to show that the transfer function of the system is

$$\frac{V_{\rm OUT}(s)}{V_{\rm IN}(s)} = \frac{(G_{m1} + G_{m2}) + G_{m1}R_DC_Ds}{a_0 + a_1s + a_2s^2 + a_3s^3 + a_4s^4} \tag{5}$$

in which

$$a_0 = \frac{1}{R_T}$$

$$a_1 = C_P + C_L + \frac{R_D C_D}{R_T}$$

$$a_2 = R_D C_D (C_P + C_L) + \frac{L_S C_P}{R_T}$$

$$a_3 = L_S C_L C_P + \frac{R_D C_D C_P L_S}{R_T}$$

$$a_4 = C_L R_D C_D C_P L_S.$$

By properly choosing the position of the pole and zero of  $G_m(s)$  as well as the ratio  $G_{m1}/(G_{m1} + G_{m2})$  with respect to the line specifications, it is possible to reduce the amount of overshoot at the output. A closed-form solution for the proper values of these parameters cannot be derived; however, (5) can be simplified with respect to the line specifications to find suitable design parameters. As an approximation, the peak of the line impedance occurs at  $\omega_n \approx \sqrt{(C_P + C_L)/(L_S C_P C_L)}$  and its value is approximately the quality factor  $Q \approx \omega_n L_S/R_T$ . Therefore, one can set  $|G_m(j\omega_n)| \cdot Q \simeq G_m(j0) \cdot R_T$  to reduce the peaking in the transfer function as shown in Fig. 5. It is worth noting that  $Q \propto 1/C_P$  and that for  $C_P \ll C_L$  the quality factor of the system can be very large.
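The effect of the delayed path on the line response can be examined by evaluating (5) directly. The sketch below is illustrative only: it uses the element values from the caption of Fig. 5, sets the total transconductance to an arbitrary value (it cancels in the normalization), and compares the compensated response with the bare line driven through an undelayed transconductor. The helper names are assumptions introduced for this example.

```python
import math

# Element values taken from the caption of Fig. 5
R_T = 100.0     # ohm
C_L = 1e-12     # F
C_P = 1e-12     # F
L_S = 4e-9      # H
C_D = 1e-12     # F
R_D = 160.0     # ohm
RATIO = 0.625   # G_m1 / (G_m1 + G_m2)

GM_TOTAL = 1.0            # arbitrary: cancels after normalization
GM1 = RATIO * GM_TOTAL


def denominator_coefficients():
    """Coefficients a_0 ... a_4 of (5)."""
    tau_d = R_D * C_D
    a0 = 1.0 / R_T
    a1 = C_P + C_L + tau_d / R_T
    a2 = tau_d * (C_P + C_L) + L_S * C_P / R_T
    a3 = L_S * C_L * C_P + tau_d * C_P * L_S / R_T
    a4 = C_L * tau_d * C_P * L_S
    return [a0, a1, a2, a3, a4]


def compensated_gain(omega):
    """|V_OUT/V_IN| from (5), normalized to (G_m1 + G_m2) * R_T."""
    s = 1j * omega
    numerator = GM_TOTAL + GM1 * R_D * C_D * s
    denominator = sum(a * s**k for k, a in enumerate(denominator_coefficients()))
    return abs(numerator / denominator) / (GM_TOTAL * R_T)


def line_gain(omega):
    """Normalized line impedance |Z(jw)| / R_T without the R_D-C_D delay path."""
    s = 1j * omega
    denominator = (1.0 / R_T + (C_P + C_L) * s
                   + (L_S * C_P / R_T) * s**2 + L_S * C_L * C_P * s**3)
    return abs(1.0 / denominator) / R_T


omega_n = math.sqrt((C_P + C_L) / (L_S * C_P * C_L))

for f_ghz in (0.1, 1.0, 2.0, 3.0, omega_n / (2 * math.pi * 1e9), 5.0):
    w = 2 * math.pi * f_ghz * 1e9
    print(f"{f_ghz:5.2f} GHz  line = {line_gain(w):.3f}  compensated = {compensated_gain(w):.3f}")
```

Both curves start from unity at DC; around $\omega_n$ the compensated response sits below the bare line response because $|G_m(j\omega)|$ falls toward $G_{m1}$ above the pole.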

As described in the previous section, the other issue associated with the circuit topologies of Figs. 2 and 4 is their relatively high input capacitance, which makes the design of the pre-driver stage difficult. To alleviate this issue, the circuit topology of Fig. 6(a) can be applied. Here, only the first part of the transconductance (i.e.,  $G_{m1}$ ) is driven by the input voltage,  $V_{IN}$ , so the total input capacitance is much smaller than in the previous approach. This makes the design of low-power pre-driver buffers much simpler. In this case, the second part of the current is provided by the cross-coupled transconductor  $G_{m2}$ . As in Fig. 4(b),  $R_D$  and  $C_D$  provide the delay required to control the output overshoot. In this configuration, switching a large enough current by  $G_{m1}$  produces a large voltage swing at the output that switches the current flow in  $G_{m2}$  and, consequently, the total required amount of current is delivered to the output load. After this switching takes place,  $G_{m2}$  in Fig. 6(a) operates as a current source. The behavioral analysis of this circuit is very complicated. Assuming simplified linear models for the transconductors and the line as shown in Fig. 6(a) and (b), the system transfer function can be represented as

$$\frac{V_{\rm OUT}(s)}{V_{\rm IN}(s)} = \frac{G_{m1} \cdot (1 + R_D C_D s)}{b_0 + b_1 s + b_2 s^2 + b_3 s^3 + b_4 s^4} \tag{6}$$

in which

$$b_{0} = \frac{1}{R_{L}} - G_{m2}$$

$$b_{1} = C_{D} + C_{P} + C_{L} + \frac{(R_{D}C_{D} - G_{m2}L_{S})}{R_{T}}$$

$$b_{2} = R_{D}C_{D}(C_{P} + C_{L}) + L_{S}\left(\frac{C_{D}}{R_{T}} + \frac{C_{P}}{R_{T}} - G_{m2}C_{L}\right)$$

$$b_{3} = L_{S}C_{L}(C_{P} + C_{D}) + \frac{R_{D}C_{D}C_{P}L_{S}}{R_{T}}$$

$$b_{4} = C_{L}R_{D}C_{D}C_{P}L_{S}$$

This equation is valid only in transition before  $G_{m2}$  completely switches. Regarding (6), to make sure that the cross-coupled device at the output (i.e.,  $G_{m2}$ ) will not cause instability, it is necessary that

$$G_{m2} < \frac{1}{R_T}. \tag{7}$$
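Condition (7) corresponds to keeping the constant term $b_0$ of (6) positive. The following is a minimal sketch of that check and of the resulting DC gain of the linearized model, assuming $R_L = R_T$ in $b_0$. The transconductance values are hypothetical, chosen only so that $G_{m1}/(G_{m1}+G_{m2}) = 0.625$.

```python
def satisfies_condition_7(gm2, r_t):
    """True when the cross-coupled transconductance obeys G_m2 < 1/R_T."""
    return gm2 < 1.0 / r_t


def dc_gain(gm1, gm2, r_t):
    """DC value of (6): G_m1 / b_0, with b_0 = 1/R_T - G_m2 (R_L taken equal to R_T)."""
    b0 = 1.0 / r_t - gm2
    if b0 <= 0.0:
        raise ValueError("G_m2 >= 1/R_T violates (7): b_0 is not positive")
    return gm1 / b0


R_T = 100.0   # ohm
GM1 = 5e-3    # S, hypothetical
GM2 = 3e-3    # S, hypothetical; gives G_m1 / (G_m1 + G_m2) = 0.625

print("condition (7) met:", satisfies_condition_7(GM2, R_T))
print("DC gain with cross-coupling:   ", dc_gain(GM1, GM2, R_T))
print("DC gain without cross-coupling:", GM1 * R_T)
```

The positive feedback of $G_{m2}$ raises the small-signal gain above $G_{m1} R_T$; as $G_{m2}$ approaches $1/R_T$ the gain of this linear model grows without bound, which is the instability that (7) excludes.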

Fig. 6(b) shows the small-signal model for the proposed driver circuit and the load. For better understanding, the Norton equivalent circuit of the driver is shown. During transitions, capacitors exhibit a very low impedance while inductors exhibit a very high impedance. Therefore, in each transition the load seen by the transconductor  $G_{m1}$  is approximately equal to  $R_D$ . Hence, by a proper choice of  $G_{m1}R_D$  and of the time constant  $\tau_D = R_DC_D$  it is possible to limit the output voltage swing during transitions. As a complementary effect of compensating the line characteristics, the second part of the current is delivered to the output by  $G_{m2}$  with a delay determined by  $\tau_D$ . In other words, it can be shown that the maximum value of the transfer characteristics shown in (6) occurs at

$$\begin{aligned}
\omega_n &\approx \sqrt{\frac{b_1}{b_3}} \\
&\approx \sqrt{\frac{C_L + C_P}{L_S C_L C_P}} \cdot \left(1 - \frac{G_{m2} L_S}{R_L C_L + R_L C_D + R_D C_D}\right) \\
&\approx \frac{1}{\sqrt{L_S C_P}}.
\end{aligned} \tag{8}$$
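To get a feel for the last simplification in (8), the two expressions can be evaluated for the line values listed in the caption of Fig. 6. This is only a sketch: the correction factor containing $G_{m2}$ is ignored, and the variable names are chosen for this example.

```python
import math

# Line values from the caption of Fig. 6
C_L = 1e-12     # F, external parasitic capacitance
C_P = 0.5e-12   # F, internal parasitic capacitance
L_S = 4e-9      # H, series inductance

# Middle expression of (8) without the G_m2 correction factor
omega_full = math.sqrt((C_L + C_P) / (L_S * C_L * C_P))

# Final approximation of (8), exact only in the limit C_L >> C_P
omega_approx = 1.0 / math.sqrt(L_S * C_P)

print(f"sqrt((C_L + C_P)/(L_S C_L C_P)) : {omega_full / (2 * math.pi * 1e9):.2f} GHz")
print(f"1/sqrt(L_S C_P)                 : {omega_approx / (2 * math.pi * 1e9):.2f} GHz")
print(f"ratio                           : {omega_full / omega_approx:.3f}")
```

The ratio of the two values is $\sqrt{1 + C_P/C_L}$, so for these particular values, where $C_L$ is only twice $C_P$, the final form underestimates the peaking frequency noticeably; the approximation tightens as $C_L/C_P$ grows.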

Fig. 6. (a) Proposed topology to control the output slew (all  $G_m$  cells are in practice non-linear current switches as shown in Fig. 2(b)). (b) Small signal model of the driver and line. Here, the Norton equivalent circuit of the cross-coupled transconductor is shown. (c) Line transfer characteristics and the compensated system transfer function after compensation using the proposed topology.  $V_M$  stands for voltage at the output of the OD circuit. Here:  $C_L = 1$  pF,  $C_P = 0.5$  pF,  $L_S = 4$  nH,  $C_D = 0.5$  pF,  $R_D = 100 \Omega$ ,  $r_f = G_{m1}/(G_{m1} + G_{m2}) R_T$ .

Generally  $C_P$  (internal parasitic capacitance) is much smaller than the external parasitic capacitance  $C_L$ . Increasing  $C_P$  can help to reduce the ringing at the output; however, a larger  $C_P$  reduces the total bandwidth in the signal path, which is not desirable. Assuming  $C_L \gg C_P$:

$$|H(j\omega_n)| \approx G_{m1}R_T \frac{\sqrt{1 + (R_D C_D \omega_n)^2}}{\left|\frac{C_D}{C_P} + \frac{C_D R_D R_T}{L_S} + G_{m2}R_T \left(1 - \frac{C_L}{C_P}\right)\right|}. \tag{9}$$

Therefore, by equating  $|H(j\omega_n)| \approx |H(j0)|$  it is possible to control the peaking at the output. An example for compensating the peaking in the line transfer characteristics is shown in Fig. 6(c). For a large enough voltage swing in  $V_M$ , the tail bias current of transconductor  $G_{m2}$  switches completely to one branch and hence  $G_{m2}$  will act as a simple current source. This means that after switching,  $Z_C$  (in Fig. 6(c)) can be replaced by a constant current source.

In high frequency applications, the overshoot in the transfer characteristics of the output driver can be created deliberately to increase the bandwidth of the circuit. This issue can be also taken into account in the design techniques introduced in this Section. Indeed, the goal here is to reduce the peaking in the transfer function of the entire system and have minimum ringing on the load side (i.e., at the output node  $V_T$ ). At the same time, as shown in Fig. 6(c), the peaking on  $V_M/V_{\rm IN}$  can be maintained to have a good performance in high frequencies.

Meanwhile, the low-pass filter formed by  $R_D$  and  $C_D$  at the input of  $G_{m2}$  slows the transitions at this node. Because of the smaller voltage slew at the input of  $G_{m2}$ , the jitter contributed by this element at the output will increase [12]. Hence, the values of  $R_D$  and  $C_D$  in Fig. 6(a) should be selected very carefully to minimize the transition time degradation at the input of  $G_{m2}$ . This implies that when there is little or no peaking in the line transfer characteristics,  $\tau_D = R_D C_D$  should be chosen as small as possible to reduce the contribution of  $G_{m2}$  to the output jitter.

## B. Circuit Structure

Fig. 7 shows the driver circuit implemented based on Fig. 6. In this figure,  $M_{PDO}$  and  $M_{NDO}$  are implementing the  $G_{m1}$  while  $M_{PDX}$  and  $M_{NDX}$  are implementing the cross-coupled transconductor of  $G_{m2}$ . The input capacitance of these transistors is large enough to implement the  $C_D$  in Fig. 6(a).

The entire circuit including the pre-driver stage benefits from a fully differential topology, which results in very good power supply rejection (PSR) as well as very low current spikes on the power supply lines. As shown in Fig. 8, the proposed circuit settles smoothly and quickly while driving the realistic line. The step response of the circuit compares favorably with the ideal RC line case (i.e., the case where  $L_S \approx 0$  and hence there is no problem due to the near-end impedance mismatch). Indeed, with the set of element values shown in Fig. 8 and without peaking in the line transfer function, the transient response would be similar to the one shown in Fig. 8 (dashed line). However, because the proposed topology avoids an internal termination resistance, the peaking will always occur. Meanwhile, the values of  $R_D$  and  $C_D$  are selected with respect to the line specifications (as explained in Section III-A). This means that if there is no peaking in the transfer characteristics, then very small  $R_D$  and  $C_D$  could be selected in order to have a proper transient response similar to Fig. 8 (solid line).

As explained earlier, utilizing the cross-coupled switches at the output helps us to reduce the total input capacitance of the output stage. Defining:

$$r_{f} = \frac{W_{M_{NDO}}}{W_{M_{NDO}} + W_{M_{NXO}}} = \frac{W_{M_{PDO}}}{W_{M_{PDO}} + W_{M_{PXO}}} = \frac{G_{m1}}{G_{m1} + G_{m2}} \tag{10}$$



Fig. 7. Proposed output driver stage and common-mode feedback circuit.



Fig. 8. Simulated step response of the proposed LVDS driver with realistic line model (solid line) in comparison to the ideal termination case (dashed curve).

Fig. 9 shows how much the input capacitance can be reduced by reducing the  $r_f$ . The minimum possible value for  $r_f$  depends on the line specifications and it should be high enough to make sure that by switching of  $G_{m1}$ , the cross-coupled nonlinear transconductor  $G_{m2}$  will also be switched. In this design  $r_f = 0.625$  and hence the total input capacitance has been reduced by approximately 30%.
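As a rough cross-check of the figure quoted above, one can estimate the reduction under the idealized assumption that the input capacitance scales linearly with the width of the input-driven devices and that nothing else loads the input. This first-order model is an assumption made for this sketch, not the simulation behind Fig. 9.

```python
def ideal_input_cap_reduction(r_f):
    """Fractional input-capacitance reduction if C_in scales with r_f alone.

    r_f = G_m1 / (G_m1 + G_m2) as defined in (10); only the G_m1 devices
    are driven by the pre-driver, so the ideal reduction is 1 - r_f.
    """
    if not 0.0 < r_f <= 1.0:
        raise ValueError("r_f must lie in (0, 1]")
    return 1.0 - r_f


reduction = ideal_input_cap_reduction(0.625)
print(f"ideal reduction for r_f = 0.625: {reduction * 100:.1f} %")
```

The idealized estimate is somewhat larger than the simulated reduction of approximately 30% reported in the text; the text does not break down where the difference comes from.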

Fig. 10(a) shows the Monte Carlo simulation results (including process variations and also mismatch effect of all the components in Fig. 7). Fig. 10(b) depicts the behavior of the circuit in different process corners and temperatures to show the stability of the circuit over process and temperature variations.

## C. Power Dissipation

Using a two stage pre-driver stage as shown in Fig. 11(a), the total current consumption of the circuit can be calculated by

$$I_T = I_{PD} + I_{OD} \tag{11}$$



Fig. 9. Simulated input capacitance reduction of the output stage by increasing the size of cross-coupled switches  $(M_{PDX} \text{ and } M_{NDX})$  with respect to the size of  $M_{NDO}$  and  $M_{PDO}$  [see Fig. 7].

Here,  $I_{PD} = I_{PD1} + I_{PD2}$  is the total bias current of the predriver stage which is implemented by a two stage SCL-based buffer [as shown in Fig. 2(a)], and  $I_{OD}$  is the bias current of the output LVDS driver:

$$I_{OD} = \frac{V_{sw,T}}{R_T}. \tag{12}$$

It can be shown that  $I_{PD2}$  is proportional to the  $C_{in,LVDS}$  as well as to the voltage swing at the input of LVDS stage called  $V_{sw,2}$  (as depicted in Fig. 11(a)). This voltage ( $V_{sw,2}$ ) should be higher than the drain over-drive voltage of the differential switching transistors to have complete current switching. It can be shown that these two currents can be expressed approximately by

$$I_{PD2} \approx \frac{V_{sw,2}}{R_{L2}} = \beta \cdot C_{ox} \cdot L_{\min}^2 \cdot \frac{m_T}{T_D} \cdot \frac{\gamma_{sat}^2}{V_{sw,2}} \cdot \frac{V_{sw,out}}{R_T} \cdot \frac{k'_p + k'_n}{k'_p \cdot k'_n} \tag{13}$$



Fig. 10. (a) Monte Carlo simulation results including process variations and mismatch effect of all elements ( $T = +85 \,^{\circ}$ C). (b) Corner simulation results in different temperature values ( $T = -25 \,^{\circ}$ C to  $+125 \,^{\circ}$ C). In these simulations, the length of PCB line is assumed to be 20 cm,  $C_L = 1 \,$  pF,  $C_P = 0.5 \,$  pF,  $R_T = 100 \,\Omega$ , and  $L_S = 4 \,$  nH [see Fig. 5].

and for the first stage:

$$I_{PD1} \approx \frac{V_{sw,1}}{R_{L1}} = \alpha \cdot C_{ox} \cdot L_{\min}^2 \cdot \frac{m_T}{T_D} \cdot \frac{\gamma_{sat}^2}{V_{sw,1}} \cdot \frac{V_{sw,2}}{R_{L2}} \cdot \frac{1}{k'_n} \tag{14}$$

where  $V_{sw}$  indicates the voltage swing at the corresponding nodes,  $\gamma_{sat} = V_{sw}/V_{DSsat}$  (it is assumed that  $\gamma_{sat}$  is the same for all devices in all stages),  $R_L$  is the load resistance in each stage,  $T_D$  is the period of the input data or clock,  $k' = \mu_{\text{eff}} \cdot C_{ox}$ , the parameters  $\alpha$  and  $\beta$  are added to take into account the effect of wiring or other parasitic capacitors ( $\alpha, \beta \geq 1$ ), and  $m_T = T_D / \tau$  ( $\tau = R_L \cdot C_L$  is the time constant of the corresponding node in Fig. 11(a)). To derive these equations, the settling time in each node is estimated and then the corresponding bias current is determined such that the time constant at the proposed output node be  $m_T$  times smaller than  $T_D$ . The other important parameter for estimating (13) and (14) is the time constant at the output of pre-driver stage  $(\tau)$  which should be much smaller than the input pulse period  $(T_D)$ , i.e.,  $m_T \gg 1$ . Considering that in the proposed topology  $C_{in,LVDS}$  reduces by reducing the  $r_f$ , the total current drawn from supply can be indicated by

$$I_{T} = I_{OD} \cdot \left( 1 + \frac{m_{T} \cdot C_{ox} \cdot L_{\min}^{2} \cdot \gamma_{sat}^{2}}{T_{D} \cdot k_{n}' \cdot \eta} \cdot \frac{r_{f} \cdot \beta}{V_{sw,2}} + \left( \frac{m_{T} \cdot C_{ox} \cdot L_{\min}^{2} \cdot \gamma_{sat}^{2}}{T_{D} \cdot k_{n}' \cdot \eta} \right)^{2} \cdot \frac{r_{f} \cdot \alpha}{V_{sw,1} \cdot V_{sw,2}} \right). \tag{15}$$

Based on (15), the power dissipation increases with the frequency of operation through  $T_D$ . It can also be seen that the total power dissipation can be reduced by increasing the voltage swing at the intermediate nodes  $(V_{sw,1} \text{ and } V_{sw,2})$ . The Appendix shows in more detail the main tradeoffs in the design of an SCL buffer chain. Fig. 11(b) shows the estimated total current consumption of the driver circuit (including the PD stages but excluding the biasing and common-mode feedback circuits) for different output voltage swing values. Based on this plot, the total power consumption can be reduced either by increasing the settling time (which is not desirable), or by choosing a smaller output voltage swing. As illustrated in Fig. 11(b), the current consumption of the proposed topology with  $r_f = 0.625$  is significantly lower than that of the conventional topology for a given settling time and output swing. It should be mentioned that this plot does not take into account the power needed to satisfy the impedance matching requirement in the conventional topologies. The plot also compares the estimated power consumption to the measured circuit power consumption for  $V_{OD} = 430 \text{ mV}$.
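The trends read from (15) can be reproduced with a small numerical sketch. Equation (15) is implemented as written; every process and design value below ($C_{ox}$, $k'_n$, $\gamma_{sat}$, $m_T$, $\alpha$, $\beta$, and the intermediate swings) is a hypothetical placeholder chosen only to exercise the formula, so the printed currents are not the paper's results. Only $\eta \approx 1/4$, $r_f = 0.625$, $R_T$ = 100 $\Omega$ and $f_D$ = 250 Msps come from the text.

```python
def output_stage_current(v_sw_out, r_t):
    """Bias current of the LVDS output stage, (12)."""
    return v_sw_out / r_t


def total_current(v_sw_out, r_t, r_f, v_sw1, v_sw2, t_d, m_t,
                  alpha, beta, c_ox, l_min, k_n, eta, gamma_sat):
    """Total supply current according to (15), implemented term by term."""
    i_od = output_stage_current(v_sw_out, r_t)
    # Common factor of (15); it has the dimension of a voltage.
    x = m_t * c_ox * l_min**2 * gamma_sat**2 / (t_d * k_n * eta)
    second_stage = x * r_f * beta / v_sw2               # I_PD2 / I_OD
    first_stage = x**2 * r_f * alpha / (v_sw1 * v_sw2)  # I_PD1 / I_OD
    return i_od * (1.0 + second_stage + first_stage)


# Hypothetical parameter set (placeholders, not extracted from the paper)
params = dict(
    v_sw_out=0.4, r_t=100.0, v_sw1=0.4, v_sw2=0.4,
    t_d=1.0 / 250e6,      # f_D = 250 Msps as in Fig. 11
    m_t=5.0, alpha=1.5, beta=1.5,
    c_ox=8.6e-3,          # F/m^2, assumed
    l_min=0.18e-6,        # m
    k_n=300e-6,           # A/V^2, assumed
    eta=0.25, gamma_sat=2.0,
)

i_conventional = total_current(r_f=1.0, **params)
i_proposed = total_current(r_f=0.625, **params)
i_faster = total_current(r_f=0.625, **{**params, "t_d": params["t_d"] / 4})

print(f"I_OD                : {output_stage_current(0.4, 100.0) * 1e3:.2f} mA")
print(f"I_T, r_f = 1        : {i_conventional * 1e3:.3f} mA")
print(f"I_T, r_f = 0.625    : {i_proposed * 1e3:.3f} mA")
print(f"I_T, 4x higher rate : {i_faster * 1e3:.3f} mA")
```

In this form the pre-driver terms scale with $r_f$ and with $1/T_D$, so lowering $r_f$ reduces the total current and raising the data rate increases it, which matches the qualitative statements above.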

## D. Common-Mode Feedback

As shown in Fig. 7, a simple common-mode (CM) feedback circuit has been applied to control the output CM value [13]–[15]. Because of the large size of the output transistors, stabilization of this CM feedback loop is difficult. In the proposed circuit,  $R_C$  and  $C_C$  are used to compensate the CM feedback loop frequency response. Meanwhile, the tail current of the PMOS switching transistors is divided into two parts provided by  $M_{PBO}$  and  $M_{PCO}$  in order to reduce the total CM feedback loop gain, and thereby improve the stability. It should be mentioned that the common-mode feedback control circuit has a low sensitivity to the output differential load. Hence, the line impedance has only a minor effect on the stability of the CM feedback.

Operating with a low supply voltage requires a very careful control on the input CM voltage and also voltage swing at the input of the LVDS driver circuit. The input CM voltage of the OD circuit should be controlled such that both NMOS- and PMOS-side tail bias transistors stay in saturation region. For this reason, a separate CM feedback loop controls the output CM voltage of the pre-driver stage (or input CM voltage of the output stage) which is an SCL based buffer.  $R_{CM}$  and  $I_{CM}$  in Fig. 2(a) are used for this purpose. To make sure that both current sources are in saturation region:

$$0.5 \times V_{sw,in} + V_{CM,in} \ge 2 \times V_{DSsat} + V_{th,n} \tag{16}$$

and

$$V_{CM,in} - 0.5 \times V_{sw,in} \le V_{DD} - (|2 \times V_{DSsat}| + |V_{th,p}|) \tag{17}$$

meaning that the  $V_{sw,in}$  should be high enough to ensure that circuit is operational for supply voltages as low as 1.8 V ( $V_{th}$  stands for threshold voltage of MOS devices).
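The two saturation conditions can be combined into a lower bound on the input swing. The sketch below assumes that the PMOS-side condition acts as an upper limit on the low input level, that is, $V_{CM,in} - 0.5 \times V_{sw,in}$ must not exceed $V_{DD} - (2 V_{DSsat} + |V_{th,p}|)$, which is the direction that keeps the PMOS tail device saturated. Subtracting this from (16) gives $V_{sw,in} \ge 4 V_{DSsat} + V_{th,n} + |V_{th,p}| - V_{DD}$. The threshold and saturation voltages used are hypothetical values, not data from the paper.

```python
def min_input_swing(v_dd, v_dssat, v_th_n, v_th_p_abs):
    """Smallest V_sw,in for which a valid input common-mode level exists."""
    return max(0.0, 4.0 * v_dssat + v_th_n + v_th_p_abs - v_dd)


def common_mode_window(v_dd, v_dssat, v_th_n, v_th_p_abs, v_sw_in):
    """Allowed range of V_CM,in for a given input swing.

    lower: from (16), NMOS-side tail device in saturation
    upper: PMOS-side tail device in saturation (assumed direction, see lead-in)
    """
    lower = 2.0 * v_dssat + v_th_n - 0.5 * v_sw_in
    upper = v_dd - (2.0 * v_dssat + v_th_p_abs) + 0.5 * v_sw_in
    return lower, upper


V_DD = 1.8       # V, lowest supply mentioned in the text
V_DSSAT = 0.25   # V, hypothetical
V_TH_N = 0.45    # V, hypothetical
V_TH_P = 0.45    # V, hypothetical magnitude

swing_min = min_input_swing(V_DD, V_DSSAT, V_TH_N, V_TH_P)
cm_low, cm_high = common_mode_window(V_DD, V_DSSAT, V_TH_N, V_TH_P, v_sw_in=0.4)

print(f"minimum input swing  : {swing_min * 1e3:.0f} mV")
print(f"V_CM,in window @0.4 V: {cm_low:.2f} V to {cm_high:.2f} V")
```

A larger input swing widens the common-mode window from both sides, which is why the swing at the input of the output stage has to be controlled together with its common-mode level.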



Fig. 11. (a) Block diagram of the proposed output driver and pre-driver stages, (b) estimated current consumption of the entire driver circuit including a two-stage SCL-based PD (excluding the biasing and common-mode feedback circuits). Dashed line shows the current consumption for the conventional topology while the solid line shows it for the proposed topology with  $r_f = 0.625$  (here:  $f_D = 250$  Msps).

#### **IV. MEASUREMENT RESULTS**

The proposed circuit has been designed in a conventional 0.18  $\mu$ m CMOS technology with 6 metal layers. Fig. 12 shows the chip micro-photograph of the driver circuit which occupies  $250 \times 270 \ \mu m^2$ . Fig. 13 shows the eye diagram of the output signal in two different data rates measured by LeCroy SDA6000 Serial Data Analyzer. The total equivalent series inductance at the output of the OD is 3.5 nH and the input signal is a PRBS (pseudo random bit stream)  $2^{31}$ -1 random data stream. Fig. 13(a) shows the oscilloscope snapshot. For 1 Gbps input data stream, as shown in Fig. 13(b), the eye diagram of the output signal is quite open and there is no overshoot at the output. The rise and fall time of the output signal are 500 ps and 750 ps, respectively. The measured output rms (root mean square) jitter is  $\sigma_{rms} = 4.5$  ps. The total current consumption of the OD stage and common-mode feedback to have a swing of 430 mV<sub>pp</sub> at the output is 4.8 mA (6.0 mA including PD). Fig. 13(c) shows the output eye diagram for 2.5 Gbps input data rate. The RC time constant of the load is the main limiting factor for increasing the speed beyond the 2.5 Gbps.

Table I compares these results with some previously reported works. As can be seen in this table, the proposed approach achieves a power-efficient design while satisfying the LVDS standard requirements, and also overcoming the impedance mismatch problem.

## V. CONCLUSION

In this paper, a low-power LVDS driver circuit for serial link applications has been presented. The proposed circuit includes input buffers to isolate the input digital circuitry from the output driver circuit. A technique that reduces the input capacitance of the output LVDS driver stage, and hence the power consumption of the input buffers, has been demonstrated. The output driver stage draws 4.5 mA (2.5 mA) while driving a 100  $\Omega$  off-chip differential termination resistor with a swing of 400 mV (200 mV) from a supply voltage of 1.9 V at 2.5 Gbps. To our knowledge, this is significantly lower than the power dissipation of most LVDS drivers reported earlier. A new pre-emphasis circuit is also suggested to improve the matching properties of the circuit.
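The headline figures can be tied together with a short calculation. The sketch below only uses numbers stated in the text (4.5 mA, 1.9 V, 2.5 Gbps, and the 250 × 270 µm² layout); the variable names are chosen for this example.

```python
I_OD_A = 4.5e-3        # A, output driver current for a 400 mV swing
V_DD = 1.9             # V, supply voltage
DATA_RATE_GBPS = 2.5   # Gbps

power_mw = I_OD_A * V_DD * 1e3
normalized_mw_per_gbps = power_mw / DATA_RATE_GBPS

# Layout dimensions quoted in Section IV, converted from micrometers to millimeters
area_mm2 = (250e-3) * (270e-3)

print(f"power                : {power_mw:.2f} mW")
print(f"normalized power     : {normalized_mw_per_gbps:.2f} mW/Gbps")
print(f"area                 : {area_mm2:.4f} mm^2")
```

These values agree with the normalized power dissipation and the area given in the abstract.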



Fig. 12. Chip micro-photograph of the proposed LVDS output driver circuit implemented in 0.18  $\mu m$  CMOS technology.

## APPENDIX TRADEOFFS IN DESIGN OF SCL BUFFER CHAINS

Consider that $n$ consecutive SCL-based buffer stages have been utilized to drive a load capacitance of $C_L$ (Fig. 14). If the maximum acceptable input capacitance is $C_{\rm IN,Max}$, then it is possible to determine the value of $n$ for minimum possible power consumption. Assuming that the time constant at the output of the *i*th stage is $m_T$ times smaller than $T_D$ (the input data period), then:

$$R_{L,i} \cdot C_i \le \frac{T_D}{m_T}, \quad i \in \{1, \dots, n\} \tag{18}$$

By applying this constraint to all the intermediate nodes, it can be shown that the input capacitance of each stage with respect to the input capacitance of the next stage can be represented by

$$C_i = P \cdot S \cdot D_i \cdot C_{i+1} \tag{19}$$

in which P is a process-dependent constant defined as

$$P = \frac{2L_{\min}^2}{\mu_n}. \tag{20}$$

Here, the parameter S depends on the speed of operation as

$$S = \frac{m_T}{T_D} \tag{21}$$

and  $D_i$  is

$$D_i = \left(1 + (1 + \gamma_M) \cdot \frac{L_{ov}}{L_{\min}}\right) \cdot \gamma_{sat}^2 \cdot \frac{V_{sw,i}}{V_{sw,i-1}^2}. \tag{22}$$

Therefore, the total input capacitance can be found as

$$C_{\rm IN} = \left(P^n \cdot S^n \cdot \prod_{i=1}^n D_i\right) \cdot C_L < C_{\rm IN,Max}. \tag{23}$$
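As a rough illustration of (19)-(23), the following sketch evaluates the per-stage capacitance ratio $P \cdot S \cdot D_i$ and searches for the smallest number of identical stages that meets $C_{\rm IN} < C_{\rm IN,Max}$. Only $C_L = 2$ pF, $C_{\rm IN,Max} = 50$ fF and the 0.4 V swing come from the paper (Fig. 15); the mobility, overlap length, $\gamma_M$, $m_T$ and $T_D$ values are placeholder assumptions, the period in (21) is taken to be $T_D$, and all function names are hypothetical.

```python
import math


def stage_ratio(l_min, mu_n, m_t, t_d, l_ov, gamma_m, gamma_sat, v_sw_i, v_sw_prev):
    """Return P*S*D_i, i.e. the ratio C_i / C_{i+1} of (19)."""
    p = 2.0 * l_min**2 / mu_n  # (20), in V*s
    s = m_t / t_d              # (21), in 1/s
    d = ((1.0 + (1.0 + gamma_m) * l_ov / l_min)
         * gamma_sat**2 * v_sw_i / v_sw_prev**2)  # (22), in 1/V
    return p * s * d


def input_capacitance(c_load, ratios):
    """Apply (23): C_IN is C_L scaled by the product of all stage ratios."""
    return math.prod(ratios) * c_load


def min_stages(c_load, c_in_max, ratio, n_limit=20):
    """Smallest n of identical stages with C_IN < C_IN,Max, or None if unreachable."""
    if ratio >= 1.0:
        return None  # buffering cannot reduce the input capacitance
    for n in range(1, n_limit + 1):
        if input_capacitance(c_load, [ratio] * n) < c_in_max:
            return n
    return None


ratio = stage_ratio(
    l_min=0.18e-6,             # minimum channel length [m]
    mu_n=0.03,                 # assumed electron mobility [m^2/(V*s)]
    m_t=4.0,                   # assumed time-constant margin
    t_d=400e-12,               # data period at 2.5 Gbps [s]
    l_ov=0.02e-6,              # assumed overlap length [m]
    gamma_m=1.0,               # assumed value for gamma_M
    gamma_sat=math.sqrt(2.0),  # lower limit quoted from [16]
    v_sw_i=0.4,
    v_sw_prev=0.4,
)
n_min = min_stages(c_load=2e-12, c_in_max=50e-15, ratio=ratio)
print(f"P*S*D = {ratio:.3f}, minimum number of stages = {n_min}")
```

With these placeholder numbers the ratio is well below one, so each added stage shrinks the input capacitance; with a ratio at or above one the search returns `None`, which corresponds to the condition $P \cdot S \cdot D_i < 1$ discussed in the text.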

Fig. 13. (a) Transient of rise- and fall-times, and measured output eye diagram for input data rate of: (b) 1 Gbps (*x*-axis: 125 ps/div, *y*-axis: 145 mV/div), (c) 2.5 Gbps (*x*-axis: 50 ps/div, *y*-axis: 100 mV/div).

Regarding (22) and (23), it can be seen that a larger voltage swing at the preceding stages leads to a smaller input capacitance or, in other words, a smaller number of stages is needed to achieve the desired input capacitance. Meanwhile, (19) implies that to be able to reduce the total input capacitance by buffering, it is necessary that $P \cdot S \cdot D_i < 1$. Assuming that all the stages have the same voltage swing ($V_{sw,i} = V_{sw}$ for $i = 1$ to $n$), this criterion puts an upper limit on the maximum operation speed of the circuit as

$$f_D < \frac{\mu_n}{2L_{\min} \cdot (L_{\min} + (1 + \gamma_M) \cdot L_{ov})} \cdot \frac{V_{sw}}{\gamma_{sat}^2} \cdot \frac{1}{m_T}. \tag{24}$$

This equation means that the voltage swing at the intermediate stages should be maximized to achieve a higher speed of operation. The main reason is that by increasing the voltage swing at the input of each stage by a factor of  $k_V$ , it is possible to reduce the size of switching transistors of that stage by a factor of  $k_V^2$  without affecting the switching process. This voltage scaling leads to  $k_V^2$  times smaller input capacitance.
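For reference, the step from the condition $P \cdot S \cdot D_i < 1$ to (24) can be written out as follows. This is a sketch that takes the period in (21) to be the input data period $T_D$, so that $f_D = 1/T_D$, and assumes equal swings so that $V_{sw,i}/V_{sw,i-1}^2 = 1/V_{sw}$:

$$P \cdot S \cdot D_i = \frac{2L_{\min}^2}{\mu_n} \cdot \frac{m_T}{T_D} \cdot \left(1 + (1 + \gamma_M) \cdot \frac{L_{ov}}{L_{\min}}\right) \cdot \frac{\gamma_{sat}^2}{V_{sw}} < 1$$

$$\frac{1}{T_D} < \frac{\mu_n}{2L_{\min}^2 \cdot \left(1 + (1 + \gamma_M) \cdot \frac{L_{ov}}{L_{\min}}\right)} \cdot \frac{V_{sw}}{\gamma_{sat}^2} \cdot \frac{1}{m_T}$$

Using $L_{\min}^2 \cdot \left(1 + (1 + \gamma_M) \cdot L_{ov}/L_{\min}\right) = L_{\min} \cdot (L_{\min} + (1 + \gamma_M) \cdot L_{ov})$ gives the form in (24). The same expression for $D_i$ also shows the scaling argument: replacing $V_{sw,i-1}$ by $k_V \cdot V_{sw,i-1}$ in (22) divides $D_i$, and hence $C_i$ in (19), by $k_V^2$.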

| Ref.        | Tech.                   | $V_{DD}$ [V] | $V_{OD}$ [mV] | $I_{DD}$ [mA] | $f_D$ [Gbps] | $\sigma_{Jitter}$ [ps<sub>rms</sub>] | Area [mm<sup>2</sup>] | Normalized Pow. Diss. [mW/Gbps] |
|-------------|-------------------------|--------------|---------------|---------------|--------------|--------------------------------------|-----------------------|---------------------------------|
| Bratov [5]  | 0.35 $\mu$m SiGe BiCMOS | 1.7-3.5      | 265-300       | 6-7           | 1-2          | 14.2                                 | 0.068                 | 10.2-12.25                      |
| Jamasb [6]  | 0.18 $\mu$m CMOS        | 1.8          | 152-212       | 12.8          | 1.244        |                                      | 0.022                 | 18.5                            |
| Yan [7]     | 0.18 $\mu$m CMOS        | 1.8          | 80-250        | 2.8-7.4       | 4            |                                      | 0.011                 | 1.26-3.33                       |
| Chen [8]    | 0.35 $\mu$m CMOS        | 1.8          | 340           | 7.1-12.8      | 1.2-1.4      |                                      | 0.14-0.11             | 10.7-16.4                       |
| Boni [13]   | 0.35 $\mu$m CMOS        | 3.3          | 400-425       | 13            | 2            |                                      | 0.175                 | 21.45                           |
| Mandal [14] | 0.35 $\mu$m CMOS        | 3.3          | 320           | 5.5           | 1            |                                      | 0.039                 | 18.15                           |
| This Work   | 0.18 $\mu$m CMOS        | 1.9          | 200-430       | 2.5-4.8       | 2.5          | 4.5                                  | 0.067                 | 1.9-3.7                         |

TABLE I Performance Comparison to Similar Designs



Fig. 14. SCL-based buffer chain to drive the load capacitance of  $C_L$  at the desired data rate. The load resistance of the stage (i) is  $R_{L,i}$  and  $C_i$  is the total capacitance seen by  $R_{L,i}$ .

Meanwhile, $\gamma_{sat}$ should be selected as small as possible to raise the upper limit on $f_D$ given by (24). The lower limit on $\gamma_{sat}$ is $\sqrt{2}$ [16].

In addition, based on (24), $m_T$ should be selected as small as possible. In a configuration with $n$ identical stages, the total circuit bandwidth ($BW_n$) can be estimated by $BW_n = BW \cdot \sqrt{\sqrt[n]{2}-1}$ ($BW$ is the bandwidth of each stage) [17]. Then $m_T$ should be high enough to satisfy the general requirement of $BW \ge 0.7 \times f_D$ [10].
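The bandwidth shrinkage of a cascade of $n$ identical single-pole stages, $BW_n = BW \cdot \sqrt{2^{1/n}-1}$, can be tabulated with a few lines of code. The sketch below also estimates the per-stage bandwidth needed at a given data rate, under the assumption (not stated explicitly in the text) that the $0.7 \times f_D$ requirement is applied to the bandwidth of the whole chain. The function names are illustrative.

```python
import math


def cascade_shrinkage(n):
    """Return BW_n / BW for n identical single-pole stages."""
    return math.sqrt(2.0 ** (1.0 / n) - 1.0)


def required_stage_bandwidth(n, f_d, factor=0.7):
    """Per-stage bandwidth so that the cascade satisfies BW_n >= factor * f_D.

    Applying the requirement to the overall bandwidth is an assumption.
    """
    return factor * f_d / cascade_shrinkage(n)


f_d = 2.5e9  # data rate in bit/s, as in the measurements
for n in range(1, 5):
    shrink = cascade_shrinkage(n)
    bw_stage = required_stage_bandwidth(n, f_d)
    print(f"n = {n}: BW_n/BW = {shrink:.3f}, per-stage BW = {bw_stage / 1e9:.2f} GHz")
```

Since the per-stage bandwidth must grow as stages are added, a longer chain needs a larger $m_T$, which works against the speed limit in (24).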

To calculate the power consumption, one can show that

$$I_{i} = k_{I,i} \cdot I_{i+1} = P \cdot S \cdot D_{i} \cdot \frac{V_{sw,i}}{V_{sw,i+1}} \cdot I_{i+1}. \tag{25}$$

This expression is derived assuming that the time constants of all the intermediate nodes satisfy (18). Equation (25) also shows that the bias current in each stage depends on the voltage swing at the input ($V_{sw,i-1}$) and output ($V_{sw,i}$) of that stage, as well as the voltage swing at the output of the next stage ($V_{sw,i+1}$). Assuming a constant voltage swing for all the stages, the total current drawn from the supply voltage can be evaluated as

$$I_{tot} = \frac{V_{sw,\text{out}} \cdot m_T \cdot C_L}{T_D} \cdot \frac{1 - k_I^n}{1 - k_I} \tag{26}$$

which is dominated by the last stages of the buffer chain and also increases with $V_{sw,out}$. Based on (24) and (26), choosing a low voltage swing for the last stage and, at the same time, a higher voltage swing at the intermediate stages can help achieve a good compromise between speed and power consumption. Fig. 15 shows the total current consumption calculated based on (26) for different numbers of stages and different voltage swing values. Based on Fig. 15, to get the desired input capacitance ($C_{IN,Max} = 50$ fF), it is possible to increase the number of stages or increase the voltage swing at the intermediate stages. To have small $n$ values, the only possibility is to increase the voltage swing to 0.5 V. Also, it can be seen that it is possible to reduce the total current consumption by increasing the voltage swing for high $n$ values.
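Equation (26) is a geometric sum of the stage currents in (25): the last stage draws $I_n = V_{sw,out} \cdot m_T \cdot C_L / T_D$ when (18) holds with equality, and each preceding stage draws $k_I$ times the current of the stage after it. The sketch below evaluates both the closed form and the explicit sum. Only $C_L = 2$ pF and the 0.4 V swing are taken from Fig. 15; the values of $m_T$, $T_D$ and $k_I$ are placeholder assumptions and the function names are hypothetical.

```python
def last_stage_current(v_sw_out, m_t, c_load, t_d):
    """Bias current of the final stage, from (18) with equality and V_sw = R_L * I."""
    return v_sw_out * m_t * c_load / t_d


def stage_currents(v_sw_out, m_t, c_load, t_d, k_i, n):
    """Currents from the last stage back to the first: I_n, k_I*I_n, k_I^2*I_n, ..."""
    i_last = last_stage_current(v_sw_out, m_t, c_load, t_d)
    return [i_last * k_i**j for j in range(n)]


def total_current(v_sw_out, m_t, c_load, t_d, k_i, n):
    """Closed form of (26) for a constant current ratio k_I between stages."""
    i_last = last_stage_current(v_sw_out, m_t, c_load, t_d)
    if k_i == 1.0:
        return i_last * n  # limit of the geometric sum
    return i_last * (1.0 - k_i**n) / (1.0 - k_i)


params = dict(v_sw_out=0.4, m_t=4.0, c_load=2e-12, t_d=400e-12, k_i=0.13)
for n in range(1, 5):
    i_tot = total_current(n=n, **params)
    print(f"n = {n}: I_tot = {i_tot * 1e3:.3f} mA")
```

For $k_I < 1$ the sum converges quickly, which is why the last stages dominate the total current.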



Fig. 15. Current consumption in an SCL buffer chain for different number of stages $n$ and different voltage swing values at the intermediate nodes ($V_{sw,i}$) based on (26). In this simulation: $C_L = 2$ pF, $V_{sw,in} = 0.4$ V and it is assumed that $C_{\rm IN}$ should be smaller than 50 fF. In the gray area, it is not possible to achieve the desired $C_{\rm IN}$.

#### ACKNOWLEDGMENT

The authors would like to thank Dr. A. Schmid for his help during the measurements.

#### REFERENCES

- [1] E. Yeung and M. A. Horowitz, "A 2.4 Gb/s/pin simultaneous bidirectional parallel link with per-pin skew compensation," *IEEE J. Solid-State Circuits*, vol. 35, no. 11, pp. 1619–1628, Dec. 2000.
- [2] A. Tajalli, P. Muller, and Y. Leblebici, "A power-efficient clock and data recovery circuit in 0.18-µm CMOS technology for multi-channel short-haul optical data communication," *IEEE J. Solid-State Circuits*, vol. 42, no. 10, pp. 2235–2244, Oct. 2007.
- [3] IEEE Standard for Low-Voltage Differential Signals (LVDS) for Scalable Coherent Interface (SCI), IEEE Std 1596.3-1996, Mar. 1996.
- [4] Chapter 1: Introduction to LVDS [Online]. Available: http://lvds.national.com
- [5] V. Bratov, J. Binkley, V. Katzman, and J. Choma, "Architecture and implementation of a low-power LVDS output buffer for high-speed applications," *IEEE Trans. Circuits Syst. I: Reg. Papers*, vol. 53, no. 10, pp. 2101–2108, Oct. 2005.

- [6] S. Jamasb et al., "A 622 MHz stand-alone LVDS driver pad in 0.18-µm CMOS," in Proc. IEEE Midwest Conf. Circuits and Systems (*MWSCAS*), 2001, vol. 2, pp. 610–613.
- [7] Y. Yan and T. H. Szymanski, "Low power high speed I/O interface in 0.18-µm CMOS," in Proc. IEEE ICECS, Dec. 2003.
- [8] M. Chen et al., "Low-voltage low-power LVDS driver," IEEE J. Solid-State Circuits, vol. 40, no. 2, pp. 472-479, Feb. 2005.
- [9] M. Bartolini et al., "A reduced output ringing CMOS buffer," IEEE Trans. Circuits Syst. II: Express Briefs, vol. 54, no. 2, pp. 102-106, Feb. 2007.
- [10] T. Gabara et al., "LVDS I/O buffers with a controlled reference circuit," in Proc. IEEE ASIC Conf., Sep. 1997, pp. 311-315.
- [11] S. Galal and B. Razavi, "10-Gb/s limiting amplifier and laser/modulator driver in 0.18-µm CMOS technology," *IEEE J. Solid-State Circuits*, vol. 38, no. 12, pp. 2138–2146, Dec. 2003.
- [12] T. C. Weigandt, B. Kim, and P. R. Gray, "Analysis of timing jitter in CMOS ring oscillators," in Proc. ISCAS, May 1994, vol. 4, pp. 27-30.
- [13] A. Boni et al., "LVDS I/O interface for Gb/s-per-pin operation in 0.35-µm CMOS," IEEE J. Solid-State Circuits, vol. 36, no. 4, pp. 706-711, Apr. 2001.
- [14] G. Mandal and P. Mandal, "Low power LVDS transmitter with low common mode variation for 1 Gb/s-per pin operation," in Proc. ISCAS, May 2004, vol. I, pp. 1120-1123.
- [15] A. Tajalli and Y. Leblebici, "A power-efficient LVDS driver circuit in 0.18-µm CMOS technology," in Proc. Ph.D. Research in Microelectronics and Electronics Conf. (PRIME), Jul. 2007, pp. 145–148.
- [16] C. H. Doan, "Design and implementation of a highly-integrated low-power CMOS frequency synthesizer for an indoor wireless wideband-CDMA direct-conversion receiver," Master's thesis, Electr. Eng. Comput. Sci. Dept., Univ. California, Berkeley, 2000.
- [17] B. Razavi, Design of Integrated Circuits for Optical Communications. New York: McGraw-Hill, 2004.



Armin Tajalli (S'04) received the B.S. and M.S. degrees (Hons.) in electrical engineering from Sharif University of Technology, Tehran, Iran, and Tehran Polytechnic University in 1997 and 1999, respectively, and the Ph.D. degree (Hons.) from Sharif University of Technology in 2006.

From 1998 to 2004, he was with Emad Semicon as a Senior Analog Design Engineer. In 2006, he joined Microelectronic Systems Laboratory (LSM) in the Ecole Polytechnique Fédérale de Lausanne (EPFL) working on ultra-low power circuit design techniques.

Dr. Tajalli received the Award of the Best Design Engineer from Emad Semicon, 2001, the Kharazmi Award on Research and Development, 2002, and the Presidential Award of the Best Iranian Researchers, 2003.



Yusuf Leblebici (M'90-SM'98) received the B.S. and M.S. degrees in electrical engineering from Istanbul Technical University in 1984 and 1986, respectively, and the Ph.D. degree in electrical and computer engineering from the University of Illinois at Urbana-Champaign (UIUC) in 1990.

Between 1991 and 2001, he worked as a faculty member at UIUC, at Istanbul Technical University, and at Worcester Polytechnic Institute, where he established and directed the VLSI Design Laboratory. Since 2002, he has been a Chair Professor at the Swiss Federal Institute of Technology in Lausanne (EPFL), and director of Microelectronic Systems Laboratory. His research interests include design of high-speed CMOS digital and mixed-signal integrated circuits, computer-aided design of VLSI systems, intelligent sensor interfaces, modeling and simulation of semiconductor devices, and VLSI reliability issues. He is the coauthor of three textbooks, Hot-Carrier Reliability of MOS VLSI Circuits (Kluwer Academic, 1993), CMOS Digital Integrated Circuits: Analysis and Design (McGraw-Hill, 1996, 1998, and 2002), and CMOS Multi-Channel Single-Chip Receivers for Multi-Gigabit Optical Data Communications (Springer, 2007), as well as more than 150 scientific articles published in international journals and conferences.

Dr. Leblebici was on the organizing and steering committees of several international conferences in microelectronics. He has served as an Associate Editor of IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II between 1998 and 2000, and as an Associate Editor of IEEE TRANSACTIONS ON VLSI between 2001 and 2003. He received the Young Scientist Award of the Turkish Scientific and Technological Research Council in 1995, and the Joseph Samuel Satin Distinguished Fellow Award of the Worcester Polytechnic Institute in 1999.