# A 20-Gb/s Burst-Mode Clock and Data Recovery Circuit Using Injection-Locking Technique

Jri Lee, Member, IEEE, and Mingchung Liu

Abstract—A 20-Gb/s clock and data recovery circuit incorporates injection-locking technique to achieve high-speed operation with low power dissipation. The circuit creates spectral line at the frequency of data rate and injection-locks two cascaded *LC* oscillators. A frequency-monitoring mechanism is employed to ensure a close matching between the VCO natural frequency and data rate. Fabricated in 90-nm CMOS technology, this circuit achieves a bit error rate of less than  $10^{-9}$  in both continuous (PRBS of  $2^{31} - 1$ ) and burst modes while consuming 175 mW from a 1.5-V supply.

*Index Terms*—Burst mode, clock and data recovery (CDR), operational amplifier, phase-locked loop (PLL), injection-locked, voltage-controlled oscillator (VCO).

## I. INTRODUCTION

The rapid growth of last-mile solutions such as passive optical networks (PONs) has stimulated research on instantaneous locking techniques. Usually manifesting itself as burst-mode operation, fast locking plays a critical role in various applications. Fig. 1(a) shows one example of a typical PON system, where the optical line terminal (OLT) must deal with asynchronous packets of different amplitudes and lengths in the upstream direction, necessitating clock and data recovery (CDR) circuits with immediate clock extraction and data retiming. Other electrical data links, such as broadband correlators, also require CDR circuits that can tolerate hundreds of consecutive ONEs or ZEROs. As illustrated in Fig. 1(b), two broadband signals coming from different sources (e.g., two antennae) are first digitized by high-speed ADCs and then transmitted to a correlator for further processing. Since no scrambler or encoder can easily be implemented at such high speed, the CDR circuit in the correlator front end must handle the raw data from the ADCs, which may contain very long runs. In other words, the CDR needs to respond and lock promptly whenever a data transition arrives, and to stay as close to lock as possible during long runs.

Unlike synchronous optical network (SONET) systems, which impose strict specifications on jitter transfer, the above applications have few or no repeaters in their data paths. This allows us to trade loop bandwidth for fast locking in phase and frequency. Among the existing solutions, [1] incorporates gated voltage-controlled oscillators (GVCOs) to acquire rapid phase locking [Fig. 2(a)], but the ring structure of the GVCO results in higher phase noise and lower operation speed. More seriously, the gating behavior causes momentary fluctuation on the recovered clock, potentially incurring undesired jitter and intersymbol interference (ISI). In addition, the truncation or prolongation of the clock cycle during phase alignment introduces other uncertainties, such as in the locking (settling) time. The circuit in [2] employs an oversampling technique; however, its complexity and power consumption limit its potential at high speed. Conventional injection-locked CDRs [Fig. 2(b)] also suffer from issues such as limited locking range, sensitivity to process, supply voltage, and temperature (PVT) variations, and weak injection signals.

Digital Object Identifier 10.1109/JSSC.2007.916598

In this paper, we propose a new and simple approach based on the injection-locking technique. Two voltage-controlled *LC* oscillators in cascade are injection-locked to the input data transitions, instantaneously generating a recovered clock with constant amplitude. To overcome PVT variations, a reference phase-locked loop (PLL), consisting of a duplicate VCO along with a tunable buffer, dynamically provides the control voltage. This technique forces the free-running frequency of the injection-locked VCOs to track the data rate closely, allowing the CDR to endure runs of at least hundreds of consecutive identical bits. A 20-Gb/s prototype realized in 90-nm CMOS technology achieves a bit error rate (BER) of less than  $10^{-9}$  in both burst mode and continuous mode (with a PRBS of  $2^{31} - 1$ ) while consuming less than 175 mW from a 1.5-V supply.

Section II develops the foundation of injection-locked CDRs, describing the architecture as well as its advantages and limitations. Section III presents the design and analysis of each building block. Section IV discusses the effect of finite frequency offset, and Section V summarizes the experimental results.

## II. ARCHITECTURE AND CONSIDERATIONS

The injection-locking technique has found extensive use in many aspects of serial-link receivers where instantaneous locking is required. The circuit in [3] extracts the clock by injection-locking the local oscillator to the tiny embedded clock signal, which primarily arises from leakage coupling. No data recovery is performed in this design. The work in [4] utilizes an edge detector to reproduce the clock frequency. However, the limited gain of the differentiator and rectifier would substantially deteriorate the performance at high speed<sup>1</sup>. Note that both designs may suffer from severe performance degradation when the natural frequency of the VCOs deviates from the data rate due to PVT variations. In other words, the lack of a frequency-tracking mechanism prevents them from being used in practice.

<sup>1</sup>It is especially true in CMOS technologies.

Manuscript received October 12, 2006; revised October 2, 2007.

The authors are with the Electrical Engineering Department, National Taiwan University, Taipei, Taiwan, R.O.C. (e-mail: jrilee@cc.ee.ntu.edu.tw).



Fig. 1. Applications of burst-mode CDRs: (a) passive optical networks, (b) broadband correlator.



Fig. 2. Clock recovery with fast locking techniques: (a) GVCO, (b) injection-locked VCO.





To overcome the above difficulties, we propose the CDR architecture depicted in Fig. 3. Here, the input  $D_{in}$  and its delayed replica  $D'_{in}$  are XORed to create a pulse upon each data transition. As will be shown in the next paragraph, a pulsewidth of half a bit period yields an optimal injection into VCO<sub>1</sub>. Two identical oscillators, VCO<sub>1</sub> and VCO<sub>2</sub>, are coupled in cascade to purify the clock. In contrast to the gating circuits in [1], this two-stage coupling ensures a constant amplitude in the output clock  $CK_{out}$  and suppresses more noise through the filtering nature of the *LC* tanks. Driven by  $CK_{out}$, the flipflop retimes the input data with proper phase alignment, owing to the variable delay buffer that provides an adjustable delay. The reference PLL, consisting of another duplicate VCO (i.e., VCO<sub>3</sub>) and a divider chain of modulus 128, produces a control voltage  $V_{ctrl}$  for VCO<sub>1</sub> and VCO<sub>2</sub>. A unity gain buffer reproduces this control voltage as  $V'_{ctrl}$  before it is sent to VCO<sub>1,2</sub>. Note that a bypass capacitor  $C_p$  (= 10 pF) is placed on chip to stabilize  $V'_{ctrl}$  without disturbing the loop filter of the reference PLL. For testing purposes, a 2-to-1 MUX is added in this prototype so that the CDR can also work with a fixed control voltage for comparison. A novel phase and frequency detector (PFD), described in Section III, is employed to minimize the ripple on the control voltage. The input buffer incorporates the broadband matching technique described in [5]. Owing to the background frequency tracking, this architecture achieves a wide operation range of 800 Mb/s.



Fig. 4. Spectra for different pulse width: (a)  $T_b/2$ , (b)  $T_b/4$ .



Fig. 5. Simulated injection level.

The pulse generated by the XOR gate not only indicates the data transition but also creates spectral lines at the data rate and its harmonics, facilitating the injection locking of the subsequent VCOs. Since the transitions of a random data sequence are themselves random, the spectrum of the generated pulses resembles that of return-to-zero (RZ) data, as shown in Fig. 4. That is, the spectrum has a squared-sinc envelope with strong clock spectral lines at the data rate and its harmonics. It can be proven that the breadth of the lobes and the appearance of the clock lines vary significantly with pulse width: for a half bit-period  $(T_b/2)$  pulse, the spectrum has its first null at  $2/T_b$  [Fig. 4(a)], whereas for a quarter bit-period  $(T_b/4)$  pulse, the lobes are twice as wide but lower in magnitude. The spectral lines remain at the harmonics of  $1/T_b$, except where they coincide with the nulls. As a result, the VCOs can easily injection-lock to the data rate or even its harmonics. It is worth noting that these clock lines always exist regardless of the pulse width, but maintaining a  $T_b/2$  delay between the two inputs of the XOR gate yields the strongest injection. In fact, the normalized magnitude of the  $1/T_b$  line can be expressed as  $(\sin x\pi)/\pi$, where x represents the relative pulsewidth and 0 < x < 1. In this design, we choose the delay to be around  $T_b/2$  (= 25 ps), and the XOR gate, input buffer, and variable delay buffer are realized in current mode logic (CML) with a 500-mV logic level. Transistor-level simulation suggests that the  $1/T_b$  line is about -9.78 dBm (Fig. 5). It is more than  $10^4$  times larger than that of [3].<sup>2</sup>
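The dependence of the  $1/T_b$  line on pulse width can be checked numerically. The following is a minimal Python sketch that evaluates the expression  $(\sin x\pi)/\pi$  quoted above; the function names are illustrative and not part of the original design.

```python
import math


def clock_line_magnitude(x: float) -> float:
    """Normalized magnitude of the 1/T_b spectral line, sin(pi*x)/pi.

    x is the XOR pulse width relative to the bit period (0 < x < 1).
    """
    if not 0.0 < x < 1.0:
        raise ValueError("relative pulse width must satisfy 0 < x < 1")
    return math.sin(math.pi * x) / math.pi


def relative_db(x: float, x_ref: float = 0.5) -> float:
    """Line magnitude at width x relative to width x_ref, in dB."""
    return 20.0 * math.log10(clock_line_magnitude(x) / clock_line_magnitude(x_ref))


if __name__ == "__main__":
    for width in (0.25, 0.5, 0.75):
        print(f"x = {width:4.2f}: magnitude = {clock_line_magnitude(width):.4f}, "
              f"relative to T_b/2 = {relative_db(width):+.2f} dB")
```

Under this expression the line peaks at x = 0.5 and falls by about 3 dB for a quarter-period pulse, which is consistent with choosing a  $T_b/2$  delay.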

In actual applications, severe PVT variations would shift the VCO natural frequency significantly away from the data rate, degrading the recovered clock or even driving the CDR out of lock. Fig. 6(a) depicts the simulated VCO frequency under different conditions. The variation between extreme cases exceeds 1 GHz, a value well beyond the locking range (typically a few tens of MHz). Even with proper control of the supply voltage, temperature alone still shifts the natural frequency by 100 MHz, as shown in Fig. 6(b). That is, a CDR circuit without a proper frequency-tracking mechanism would either suffer from severe jitter or simply lose lock. The design in [4], which employs a fixed control voltage, is impractical for this reason.

<sup>2</sup>Abrupt rising/falling edge would further raise the  $1/T_b$  line by approximately 5 dB.

Fig. 6. VCO tuning curves for (a) PVT variations, (b) temperature variation only  $(V_{ctrl} = 0 V)$ .

Fig. 7. VCO and clock buffer.

Owing to the injection-locking behavior, one edge of the recovered clock aligns with the input data transition and the other coincides with the eye center when the VCO resonance frequency is equal to the data rate [6], [7]. However, a finite phase error may still exist if the intermediate buffer or imbalanced routing causes skews and pushes the sampling points away from the eye center. Fortunately, the variable delay buffer inherently provides data with different delays, and one optimal output is selected and sent to the flipflop. The variable delay buffer contains 8 identical inductively-peaked differential pairs, each stage corresponding to a delay of around 3 ps. A 3-bit selector is preset manually to pick the optimal data phase.<sup>3</sup> Simulation shows that the 0.5-UI (3 ps × 8 = 24 ps  $\approx$  0.5 UI) coverage can accommodate all the PVT variations. For a given selector input, the maximum phase error over temperature (0°C ~ 70°C) and supply (±10%) variations is about 0.5 ps. This implies that the delay of the clock path tracks that of the data path very well, and the flipflop always samples in the vicinity of the data eye center.
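The tap arithmetic quoted above can be sketched in a few lines of Python. This is only an illustration: it assumes that selector code k picks the output of stage k + 1 (the text does not say how the codes map to taps) and uses the nominal 3-ps stage delay.

```python
DATA_RATE_HZ = 20e9                        # 20-Gb/s data rate from the text
BIT_PERIOD_PS = 1e12 / DATA_RATE_HZ        # one unit interval (UI) = 50 ps
STAGE_DELAY_PS = 3.0                       # nominal per-stage delay from the text
NUM_STAGES = 8                             # eight cascaded differential pairs


def tap_delay_ps(code: int) -> float:
    """Delay seen at the tap chosen by a 3-bit selector code.

    Assumption (not stated in the text): code k selects the output of
    stage k + 1, so code 0 gives 3 ps and code 7 gives 24 ps.
    """
    if not 0 <= code < NUM_STAGES:
        raise ValueError("selector code must fit in 3 bits (0..7)")
    return STAGE_DELAY_PS * (code + 1)


def coverage_ui() -> float:
    """Total delay coverage of the buffer as a fraction of one UI."""
    return tap_delay_ps(NUM_STAGES - 1) / BIT_PERIOD_PS


if __name__ == "__main__":
    for code in range(NUM_STAGES):
        print(f"code {code:03b}: {tap_delay_ps(code):4.1f} ps")
    print(f"coverage = {coverage_ui():.2f} UI")
```

With these nominal numbers the coverage evaluates to 0.48 UI, matching the approximately 0.5 UI stated in the text.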

## III. BUILDING BLOCKS

### A. VCO and Clock Buffer

The VCO and buffer design is shown in Fig. 7, where the injection pairs  $M_{1,2}$  and  $M_{5,6}$  translate the input signal into current to lock the oscillator. Two identical VCOs are coupled in cascade, and are preceded and followed by the XOR gate and clock buffer, respectively. The buffer isolates the VCOs from data transitions of the flipflop. It has an input loading approximately equal to that of VCO<sub>1</sub> (or VCO<sub>2</sub>) with routing capacitance included (i.e.,  $C_{in1} = C_{in2}$ ). For a fixed control voltage, the lock range of the VCO can be given by [7]

$$\omega_L = \frac{\omega_0}{2Q} \cdot \frac{I_{inj}}{I_{osc}} \cdot \frac{1}{\sqrt{1 - \frac{I_{inj}^2}{I_{osc}^2}}},\tag{1}$$

<sup>3</sup>In a future design, an automatic selector should be added to increase robustness.

where  $\omega_0$  denotes the oscillation frequency, $Q$ the quality factor of the tank, and  $I_{inj}$  and  $I_{osc}$  the injection and oscillation currents, respectively. Since the pulling between VCO<sub>1</sub> and VCO<sub>2</sub> is quite strong, the overall lock range is primarily determined by the coupling between the XOR gate and VCO<sub>1</sub><sup>4</sup>. Measurement shows that the lock range (for a fixed control voltage) is 22 MHz. This value is insufficient for many applications, underscoring the importance of the frequency-tracking PLL.
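To get a feel for the magnitudes in (1), the short Python sketch below evaluates the lock range for a 20-GHz oscillator. The tank quality factor and the injection-to-oscillation current ratio used here are hypothetical values chosen only for illustration; they are not reported in the text.

```python
import math


def lock_range_hz(f0_hz: float, q: float, inj_ratio: float) -> float:
    """One-sided lock range from (1), expressed in Hz.

    f0_hz     : oscillation frequency
    q         : tank quality factor
    inj_ratio : I_inj / I_osc, which must be below 1 for (1) to be real-valued
    """
    if not 0.0 < inj_ratio < 1.0:
        raise ValueError("I_inj / I_osc must lie between 0 and 1")
    return f0_hz / (2.0 * q) * inj_ratio / math.sqrt(1.0 - inj_ratio ** 2)


if __name__ == "__main__":
    # Hypothetical numbers: Q = 10 and I_inj/I_osc = 0.02 are assumptions.
    f_lock = lock_range_hz(20e9, q=10.0, inj_ratio=0.02)
    print(f"lock range = {f_lock / 1e6:.1f} MHz")
```

With these assumed values the result is on the order of 20 MHz, the same order as the measured lock range, which illustrates why a weak injection into a moderate-Q tank cannot cover PVT-induced frequency shifts by itself.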

One important aspect of the cascaded VCOs is the relatively constant output swing. Fig. 8 shows the output waveforms of the two VCOs injection-locked to a PRBS of  $2^7 - 1$ . VCO<sub>2</sub> oscillates with almost uniform magnitude, since VCO<sub>1</sub> keeps swinging during long runs. Compared with the single-VCO [4] and gated-VCO [1] structures, which may suffer from significant clock fluctuations, this design stabilizes the sampling in the flipflop and improves signal integrity.

### B. Variable Delay Buffer, XOR Gate, and Flipflop

<sup>4</sup>The output swing of the XOR gate is about 150 mV, whereas that of VCO<sub>1</sub> is 750 mV.

Fig. 8. Simulated output waveforms of (a)  $VCO_1$, (b)  $VCO_2$.

Fig. 9. Variable delay buffer.

The 20-Gb/s operation speed necessitates CML designs throughout. Fig. 9 depicts the variable delay buffer design. Eight identical differential pairs are cascaded to provide the 25-ps delay. One of the 8 phases is manually selected by means of the 8-to-1 MUX and sent to the flipflop. A single-ended swing of 500 mV is employed as the logic level in each block, and the complementary technique of [8] is incorporated in the XOR gate. Meanwhile, on-chip inductors are added in these circuits to extend the bandwidth without sacrificing voltage swing or increasing power consumption. The inductors are implemented as 3-layer stacked spirals [10] to facilitate the routing in layout. Here, a 0.5-nH inductor occupies only  $14 \times 14 \ \mu m^2$.

### C. Unity Gain Buffer

The unity gain buffer isolates the two control voltages to allow more flexibility on design. A tunable offset compensator is required here because (1) mismatch may exist between VCO<sub>1,2</sub> and VCO<sub>3</sub>; (2) gain error and intrinsic offset of the opamp itself need to be balanced. In other words, an adjustable offset is required to provide finite difference between  $V_{\rm ctrl}$  and  $V_{\rm ctrl}^{\prime}$ .

The unity gain buffer is illustrated in Fig. 10(a). Here, we employ a two-stage opamp structure with source degeneration  $(R_1)$  and tunable tail currents  $(I_{SSA} \text{ and } I_{SSB})$  in the first stage. The ratio of these two current sources can be manually adjusted to provide an artificial offset between  $V_{in}$  ( $V_{ctrl}$ ) and  $V_{out}$  ( $V'_{ctrl}$ ).  $I_{SSA}$  and  $I_{SSB}$  are nominally equal, and they are tuned while keeping their sum constant at 200  $\mu$A. For a ratio of 1:4 (i.e., 40  $\mu$A:160  $\mu$A), the buffer can provide an offset of  $\pm 15$  mV between the input and the output. A compensation capacitor  $C_1$  (= 10 pF) together with a zero-removing resistor  $R_2$  (= 1.1 k$\Omega$) is placed between the two stages to ensure stability. Fig. 10(b) depicts the Bode plot of the open-loop opamp, revealing a dc gain of 56 dB and a phase margin of 80°. Note that no common-mode feedback is needed in this design. The closed-loop bandwidth is around 65 MHz. Since the opamp operates at near-dc frequency (control voltage drift due to temperature variation is very slow), this bandwidth is more than adequate. The single-ended output inevitably suffers from supply fluctuation. A large bypass capacitor is added on chip to minimize this effect.

The input-output deviation due to gain error is also analyzed. Fig. 10(c) shows the gain error of the opamp as a function of temperature for different process corners. The worst case occurs at  $FF + 70^{\circ}C$, which is equivalent to an input-output difference of 1.1 mV. Since statistical data for the 90-nm process are not available from the foundry, Monte Carlo analysis cannot be used to estimate the opamp offset. We instead extrapolate it from older technologies and predict an rms value of around 3 mV.
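The link between finite open-loop gain and input-output deviation can be illustrated with the standard unity-gain feedback relation  $V_{out} = V_{in} A/(1 + A)$. The Python sketch below applies it to the nominal 56-dB dc gain; the 0.7-V control level is a hypothetical operating point chosen for illustration, and the worst-case corner gain is not given in the text.

```python
def db_to_linear(gain_db: float) -> float:
    """Convert a voltage gain in dB to a linear ratio."""
    return 10.0 ** (gain_db / 20.0)


def unity_gain_error_v(v_in: float, open_loop_gain_db: float) -> float:
    """Input-output difference of an ideal unity-gain buffer.

    Uses V_out = V_in * A / (1 + A), so V_in - V_out = V_in / (1 + A).
    Offset and mismatch are ignored; only finite gain is modeled.
    """
    a = db_to_linear(open_loop_gain_db)
    return v_in / (1.0 + a)


if __name__ == "__main__":
    # 56 dB is the nominal dc gain from the text; 0.7 V is an assumed level.
    err = unity_gain_error_v(v_in=0.7, open_loop_gain_db=56.0)
    print(f"gain-error deviation = {err * 1e3:.2f} mV")
```

For these inputs the deviation is roughly 1 mV, the same order as the worst-case figure quoted above, and well inside the  $\pm 15$-mV range of the tunable offset.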

### D. Reference PLL

The PLL needs to provide a control voltage that tracks the data rate with minimum disturbance. Reference feedthrough would be a serious issue in this application, since VCO<sub>1,2</sub> would experience the same control-line ripple during long runs, and substantial jitter would appear in the recovered clock and data. Many attempts have been made to suppress this nonideality [11]–[13]. However, they are either too complicated or consume too much power or area. Fractional-N architectures with  $\Sigma$-$\Delta$ modulation are also not an option for the same reason.

Fig. 10. (a) Unity gain buffer, (b) Bode Plot of the open-loop opamp, (c) simulated gain error for different process corners.

The proposed PLL is shown in Fig. 11. It consists of a 20-GHz oscillator VCO<sub>3</sub> (duplicated from VCO<sub>1,2</sub>), a chain of frequency dividers ( $\div$ 128), a novel phase and frequency detector (PFD) along with two V/I converters [(V/I)<sub>PD</sub> and (V/I)<sub>FD</sub>], and a third-order loop filter. The phase and frequency detections are decomposed into two loops, similar to that in [15]. Here, we present a novel phase detection technique that significantly reduces the effect of reference feedthrough. As illustrated in Fig. 12(a), a "quiet" phase detection can be accomplished by mixing two quadrature signals, one from the reference input ( $CK_{ref}$ ) and the other from the last divider stage ( $CK_{div}$ ). Denoting the magnitudes of these two inputs as  $A_1$  and  $A_2$, the mixer gain as k, and the phase error as  $\theta$, we arrive at the phase detector output as:

$$V_{PD} = kA_1 A_2 \sin \theta, \qquad (2)$$

given that the single-sideband (SSB) mixer is perfectly symmetric. That is, the phase detector exhibits a sinusoidal input-output characteristic, which can be considered approximately linear in the vicinity of the origin. The V/I converter thus pumps a proportional current, either positive or negative, into the loop filter and changes the control voltage accordingly. Note that a static divider is used in front of the PFD to generate the quadrature phases of  $CK_{\rm ref}$  (Fig. 11).
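The ripple-free behavior expressed by (2) can be reproduced with a short numerical experiment. The Python sketch below models the SSB mixer as the difference of two products of quadrature sinusoids; the `mismatch` parameter, the sample count, and the function name are illustrative assumptions rather than details of the actual circuit.

```python
import numpy as np


def ssb_phase_detector(theta, k=1.0, a1=1.0, a2=1.0, mismatch=0.0,
                       f=156.25e6, periods=4, n=4096):
    """Behavioral model of phase detection with an SSB mixer.

    The two product terms are sin(wt + theta)cos(wt) and cos(wt + theta)sin(wt).
    With perfect symmetry their difference is k*a1*a2*sin(theta) at every
    instant. `mismatch` (an assumption) scales the first product by
    (1 + mismatch) to mimic an asymmetric mixer.
    """
    t = np.arange(n) * (periods / (f * n))
    w = 2.0 * np.pi * f
    ref_i, ref_q = a1 * np.cos(w * t), a1 * np.sin(w * t)
    div_i, div_q = a2 * np.cos(w * t + theta), a2 * np.sin(w * t + theta)
    return k * ((1.0 + mismatch) * div_q * ref_i - div_i * ref_q)


if __name__ == "__main__":
    theta = 0.2
    ideal = ssb_phase_detector(theta)
    skewed = ssb_phase_detector(theta, mismatch=0.01)
    print(f"ideal output mean   = {ideal.mean():.6f} (sin(theta) = {np.sin(theta):.6f})")
    print(f"ideal ripple (p-p)  = {np.ptp(ideal):.2e}")
    print(f"1% mismatch ripple  = {np.ptp(skewed):.2e}")
```

In the symmetric case the output is constant at  $kA_1A_2\sin\theta$  with no component at the input frequency or its harmonics, whereas a gain mismatch leaves a residual tone at twice the input frequency.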

In the presence of mismatches, finite "image" could be observed at  $2\omega_{in}$ . Thus, we add a low-pass filter with corner frequency of 8.3 MHz right after the SSB mixer to suppress this image by 31.5 dB. To evaluate the ripple reduction, we compare the control voltages of two PLLs with the proposed and conventional type IV PFDs [14] under locked condition, and plot the result in Fig. 12(b). Simulation shows that the maximum control-line ripple of the proposed phase detector is only 20  $\mu$ V.
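The quoted suppression can be cross-checked with a simple calculation. The Python sketch below assumes that the filter is a first-order low-pass and that the image sits at twice the 156.25-MHz phase detector input frequency (20 GHz divided by 128); both are assumptions made for illustration, since the text does not state the filter order.

```python
import math


def first_order_lpf_attenuation_db(f_hz: float, f_corner_hz: float) -> float:
    """Attenuation of a first-order low-pass filter at frequency f_hz, in dB."""
    return 10.0 * math.log10(1.0 + (f_hz / f_corner_hz) ** 2)


if __name__ == "__main__":
    f_pd = 20e9 / 128          # phase detector input frequency, 156.25 MHz
    f_image = 2.0 * f_pd       # image at twice the input frequency
    att = first_order_lpf_attenuation_db(f_image, 8.3e6)
    print(f"image at {f_image / 1e6:.1f} MHz is attenuated by {att:.1f} dB")
```

Under these assumptions the attenuation comes out at about 31.5 dB, in line with the figure given above.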



Fig. 11. Proposed PLL.

The periodic characteristic of the phase detector implies a limited capture range. Fortunately, we can obtain the frequency error by introducing an additional SSB mixer. As shown in Fig. 13(a), the two outputs  $V_1$  and  $V_2$  appear orthogonally and are given by

$$V_1 = V_{PD} = kA_1A_2\sin(\Delta\omega_{\rm in}t + \theta) \tag{3}$$

$$V_2 = kA_1A_2\cos(\Delta\omega_{\rm in}t + \theta). \tag{4}$$

Fig. 12. (a) Phase detection based on SSB mixer and its characteristic, (b) simulated control line ripples.

Here,  $\Delta \omega_{in}$  represents the frequency difference between  $CK_{ref}$ and  $CK_{div}$ . Obviously, whether  $V_1$  is leading or lagging  $V_2$ depends on the sign of  $\Delta \omega_{in}$, which can be easily examined by sampling one signal with the other in a flipflop [15]. Note that the very slow sinusoids  $V_1$  and  $V_2$  may cause malfunction of FF<sub>1</sub> if they drive the flipflop directly, because the transitions of  $V_1$  and  $V_2$  become extremely slow when the loop is close to lock. Fluctuation caused by unwanted coupling or additive noise then makes the transition ambiguous, i.e., multiple crossovers may occur. To remedy this issue, two hysteresis buffers are employed to sharpen the waveforms. Fig. 13(b) depicts the buffer design. The cross-coupled pair  $M_3 - M_4$  provides different switching thresholds for low-to-high and high-to-low transitions, and the positive feedback helps to create square waves. Here,  $(W/L)_{1,2} = (W/L)_{3,4} = 8/0.25$, and a threshold difference of 46 mV is observed. The hysteresis buffer introduces a 120-ps delay, which is negligible compared with the operation period of the PFD (= 1/156 MHz = 6.4 ns).
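The lead/lag test described above can be sketched behaviorally. The following Python example samples  $V_2$  at the rising zero crossings of  $V_1$, as an idealized stand-in for the flipflop, and returns the sign of the frequency error. It omits the hysteresis buffers and noise, and the function name and parameters are illustrative assumptions.

```python
import numpy as np


def frequency_error_sign(delta_f_hz, theta=0.3, periods=3, n=20000):
    """Return +1 or -1 according to the sign of the frequency difference.

    V1 = sin(2*pi*delta_f*t + theta) and V2 = cos(2*pi*delta_f*t + theta)
    follow (3) and (4) with k*A1*A2 normalized to 1. V2 is read at each
    rising zero crossing of V1, mimicking an ideal edge-triggered flipflop.
    """
    if delta_f_hz == 0:
        raise ValueError("no beat signal exists when the frequency error is zero")
    t = np.linspace(0.0, periods / abs(delta_f_hz), n)
    phase = 2.0 * np.pi * delta_f_hz * t + theta
    v1, v2 = np.sin(phase), np.cos(phase)
    rising = np.where((v1[:-1] < 0.0) & (v1[1:] >= 0.0))[0] + 1
    return int(np.sign(v2[rising].mean()))


if __name__ == "__main__":
    for df in (+1e6, -1e6):
        print(f"delta f = {df:+.0e} Hz -> sampled sign = {frequency_error_sign(df):+d}")
```

A positive frequency difference yields positive samples of  $V_2$  and a negative difference yields negative samples, which is the polarity information the frequency loop needs.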

The complete PFD design is shown in Fig. 13(c). The frequency acquisition should be turned off upon lock so as to minimize the disturbance. Fortunately, the frequency detector inherently provides this automatic switch-off function. It is clear from (3) and (4) that, upon lock,  $V_1$  approaches zero and  $V_2$  settles at a positive constant  $kA_1A_2$ . We thus apply the ENFD signal to  $(V/I)_{FD}$  [Fig. 14(a)] and have it disabled when the loop is locked. Similar to [15],  $(V/I)_{FD}$  is active for 50% of the time during tracking, and automatically switches itself off when the frequency acquisition is accomplished. A pumping current 4 times larger than that of  $(V/I)_{PD}$  [Fig. 14(b)] is used here to ensure smooth frequency tracking. In Fig. 14(b), a source degeneration resistor  $R_1$  (= 200  $\Omega$ ) is employed to retain the linearity.

## IV. FINITE FREQUENCY OFFSET

A common issue of open-loop CDR circuits arises from the nonzero difference between the data rate and the multiplied reference frequency. The local oscillator (usually implemented with a crystal) inevitably resonates at a frequency that deviates from the desired value by a few tens of ppm. The frequency error thus accumulates during consecutive ONEs and ZEROs, resulting in jitter in the time domain. To quantify the jitter, we define the frequency deviation  $\Delta f$  as

$$\Delta f = f_b - M \cdot f_{\text{ref}},\tag{5}$$

where  $f_b = 1/T_b$  denotes the data rate,  $f_{ref}$  the reference frequency, and M the corresponding divide ratio. Since  $\Delta f$  is typically much less than  $f_b$ , the clock zero crossing shifts  $\Delta f/f_b$ UI per bit period during long runs [positions 3, 6, and 7 in Fig. 15(a)]. Here we assume the clock zero crossing aligns to data transition immediately whenever it occurs (positions 1, 2, 4, 5, and 8). For N consecutive bits, the phase error accumulates up to  $(N-1)\Delta f/f_b$  in the last bit, and a bit error would occur if it exceeds 0.5 UI. That is, in the presence of frequency offset, the maximum tolerable length of consecutive bits is given by

$$N_{max} = \frac{1}{2} \cdot \frac{f_b}{\Delta f} + 1. \tag{6}$$

This is of course an optimistic estimate, since the VCO's phase noise would degrade the result considerably.
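As a quick numerical illustration of (6), the Python sketch below evaluates the maximum tolerable run length for a 20-Gb/s link. The 50-ppm offset is an assumed example within the "few tens of ppm" range mentioned above, and the function name is illustrative.

```python
def max_run_length(data_rate_hz: float, offset_hz: float) -> float:
    """Maximum number of consecutive identical bits tolerated, from (6).

    Assumes the only impairment is the frequency offset: phase noise and
    other jitter sources are ignored, so this is an upper bound.
    """
    if offset_hz == 0:
        return float("inf")
    return 0.5 * data_rate_hz / abs(offset_hz) + 1.0


if __name__ == "__main__":
    f_b = 20e9                       # data rate
    delta_f = 50e-6 * f_b            # assumed 50-ppm offset, i.e. 1 MHz
    print(f"offset = {delta_f / 1e6:.1f} MHz -> N_max = {max_run_length(f_b, delta_f):.0f} bits")
```

For this assumed offset the bound is on the order of ten thousand bits, which shows why tens-of-ppm offsets are compatible with runs of several hundred bits when phase noise is small.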

Moreover, for a random sequence, the probability of a phase deviation of  $n\Delta f/f_b$  occurring is equal to  $2^{-(n+1)}$ . Fig. 15(b) illustrates the probability distribution. That is, the clock zero-crossing points accumulate at equally spaced positions with different probabilities, and the average position is therefore given by

$$\frac{\Delta f}{f_b} \sum_{n=0}^{\infty} \frac{n}{2^{n+1}} = \frac{\Delta f}{f_b}. \tag{7}$$

Authorized licensed use limited to: National Taiwan University. Downloaded on February 27, 2009 at 00:54 from IEEE Xplore. Restrictions apply



Fig. 13. (a) Frequency detection, (b) hysteresis buffer and its characteristic, and (c) complete PFD design.



Fig. 14. Realization of V-to-I converters, (a)  $(V/I)_{FD}$ , (b)  $(V/I)_{PD}$ .

The rms jitter due to this effect can be obtained as

$$J_{rms} = \left[ (-1)^2 \cdot \frac{1}{2} + 0^2 \cdot \frac{1}{4} + 1^2 \cdot \frac{1}{8} + 2^2 \cdot \frac{1}{16} + \dots \right]^{1/2} \cdot \left| \frac{\Delta f}{f_b} \right| \tag{8}$$

$$= \left(\frac{1}{2} + 0 + \sum_{n=1}^{\infty} n^2 \cdot \frac{1}{2^{n+2}}\right)^{1/2} \cdot \left|\frac{\Delta f}{f_b}\right| \tag{9}$$

$$=\sqrt{2} \cdot \left|\frac{\Delta f}{f_b}\right|.\tag{10}$$

Since it is proportional to  $\Delta f/f_b$ , keeping the frequency offset small is desirable in critical applications. Fig. 15(c) depicts the simulated rms jitter as a function of frequency deviation, verifying the prediction of (10). Fig. 15(d) illustrates one possible realization with no frequency offset in wireline systems where reference is provided by the transmitter rather than a local crystal.
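The sums in (7) through (10) are easy to confirm numerically. The Python sketch below truncates the distribution  $2^{-(n+1)}$  after a finite number of terms (an approximation chosen here; the tail is negligible) and reports the mean and rms deviation in units of  $\Delta f/f_b$.

```python
import math


def zero_crossing_statistics(n_terms: int = 200):
    """Mean and rms deviation of the zero-crossing position.

    Positions are n = 0, 1, 2, ... (in units of delta_f / f_b) with
    probability 2**-(n + 1). The series is truncated after n_terms terms.
    """
    probs = [2.0 ** -(n + 1) for n in range(n_terms)]
    mean = sum(n * p for n, p in enumerate(probs))
    variance = sum((n - mean) ** 2 * p for n, p in enumerate(probs))
    return mean, math.sqrt(variance)


if __name__ == "__main__":
    mean, rms = zero_crossing_statistics()
    print(f"mean position = {mean:.6f}  (expected 1 from (7))")
    print(f"rms deviation = {rms:.6f}  (expected sqrt(2) from (10))")
```

The truncated sums converge to 1 and  $\sqrt{2}$, in agreement with (7) and (10).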




Fig. 15. (a) Phase error due to finite frequency offset for different data pattern, (b) probability of zero-crossing positions, (c) simulated rms jitter using behavior model, (d) example of offset-free realization.

## V. EXPERIMENTAL RESULTS

The CDR circuit has been designed and fabricated in 90-nm CMOS technology. Fig. 16(a) shows a photo of the die, which occupies  $0.8 \times 1.2 \text{ mm}^2$. The circuit has been tested on a high-speed probe station with an Anritsu random data generator providing the input. The test setup is illustrated in Fig. 16(b). The circuit achieves a wide operation range of 800 Mb/s, across which no performance degradation is observed. The chip consumes a total power of 175 mW from a 1.5-V supply, of which 102 mW is dissipated in the CDR core, 70 mW in the reference PLL, and 3 mW in the unity-gain buffer.

Fig. 17 depicts the time and frequency domain measurements on the 20-GHz output clock of the reference PLL. The rms and peak-to-peak jitters are 0.89 ps and 6.89 ps, respectively. The spectrum reveals reference spurs of less than -60 dBc. The loop bandwidth of the reference PLL is 1 MHz.

Fig. 16. (a) Chip micrograph, (b) testing setup.

Fig. 17. Measurements of the reference PLL: (a) clock waveform, (b) spectrum under locked condition.

Fig. 18. Recovered data and clock for (a)  $2^7 - 1$  (b)  $2^{31} - 1$  PRBS (horizontal scale: 20 ps/div, vertical scale: 100 mV/div).

Fig. 18 shows the recovered data and clock in response to continuous-mode PRBS of length  $2^7-1$  and  $2^{31}-1$, suggesting data jitter of 1.27 ps,rms/8.0 ps,pp and 1.87 ps,rms/13.77 ps,pp, respectively. The recovered clock jitter is recorded as 1.2 ps,rms. As expected, the waveforms for the longer pattern look a little shakier because the finite frequency offset accumulates over a longer period of time. The burst-mode operation has been verified by compiling the input data pattern as that in [1] and having it preceded and followed by long runs of 500 bits. Here, the sub-rate (1/64) clock from the PRBS generator provides the reference input so that no frequency offset is expected. The input-output waveforms around the edge of data arrival are plotted in Fig. 19, demonstrating immediate locking without any missing bit. The CDR circuit achieves a BER of less than  $10^{-9}$  in both continuous ( $2^{31}-1$  PRBS) and burst modes. The free-running and injection-locked spectra of VCO<sub>2</sub> are shown in Fig. 20. The noise shaping phenomenon is observed, suggesting that the VCO locking range for a fixed control voltage is about 22 MHz. Note that with the help of the frequency tracking PLL, our circuit achieves an operation range 36 times larger than this value.



Fig. 19. Input and output waveforms under burst-mode operation.



Fig. 20. Free-running and locked spectra.

The bit error rate measurement is also conducted here. With a fixed control voltage, we plot the BER as a function of the deviation frequency (Fig. 21). An error-free region of approximately  $\pm 20$  MHz verifies the estimation of locking range. We also measure the jitter performance as a function of frequency offset. It is conducted by fixing the control voltage, and deliberately altering the input data rate. Fig. 22 shows the rms and peak-to-peak jitter obtained with  $2^7-1$  PRBS input. Table I summarizes the performance of this work and some other burst-mode CDRs recently published in the literature.

TABLE I CDR PERFORMANCE SUMMARY

|                   | [1]                                                   | [4]                                                  | [16]                                          | This Work                                                                                      |
|-------------------|-------------------------------------------------------|------------------------------------------------------|-----------------------------------------------|------------------------------------------------------------------------------------------------|
| Data Rate         | 10 Gb/s                                               | 10.3 Gb/s                                            | 10 Gb/s                                       | 20 Gb/s                                                                                        |
| Rec. Clock Jitter | N/A                                                   | 1.47 ps, rms<br>(with 2<sup>7</sup>−1 PRBS)          | 1.35 ps, rms<br>(with 2<sup>7</sup>−1 PRBS)   | 1.2 ps, rms<br>(with 2<sup>7</sup>−1 PRBS)<br>1.26 ps, rms<br>(with 2<sup>31</sup>−1 PRBS)     |
| BER               | < 10<sup>-12</sup><br>(with 2<sup>31</sup>−1 PRBS)    | < 10<sup>-12</sup><br>(with 2<sup>7</sup>−1 PRBS)    | N/A                                           | < 10<sup>-12</sup><br>(with 2<sup>7</sup>−1 PRBS)<br>< 10<sup>-9</sup><br>(with 2<sup>31</sup>−1 PRBS) |
| Operation Range   | N/A                                                   | 160 MHz                                              | N/A                                           | 800 MHz                                                                                        |
| Locking Time      | 5 bits                                                | N/A                                                  | 32 bits                                       | 1 bit                                                                                          |
| Supply Voltage    | 2.5 V                                                 | 3.3 V                                                | 1.8 V                                         | 1.5 V                                                                                          |
| Power Diss.       | 1.2 W                                                 | 405 mW                                               | 200 mW                                        | 175 mW                                                                                         |
| Chip Area         | 2.5 mm × 2.5 mm                                       | 1.25 mm × 1.05 mm                                    | 2.0 mm × 1.7 mm                               | 0.8 mm × 1.2 mm                                                                                |
| Technology        | 0.13-µm CMOS                                          | SiGe BiCMOS                                          | 0.18-µm CMOS                                  | 90-nm CMOS                                                                                     |

Fig. 21. BER as a function of deviation frequency with  $2^7 - 1$  PRBS input.

Fig. 22. Measured jitter on the recovered clock: (a) rms, (b) peak-to-peak, with  $2^7 - 1$  PRBS input.

## VI. CONCLUSION

A new approach to realize clock and data recovery from NRZ data stream has been introduced. Based on injection locking technique, this circuit simplifies the CDR design significantly and provides instant locking for burst-mode systems. With the help of frequency tracking PLL, it reaches a truly wide operation range (800 Mb/s), accommodating severe frequency deviations caused by PVT variations. This work holds great promise for future burst-mode communication systems running at tens of gigabits per second.

## REFERENCES

- [1] M. Nogawa et al., "A 10 Gb/s burst-mode CDR IC in 0.13 µm CMOS," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2005, pp. 228–229.

- [2] M. van Ierssel et al., "A 3.2 Gb/s semi-blind-oversampling CDR," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2006, pp. 334–335.
- [3] T. Gabara, "A 3.25 Gb/s injection locked CMOS clock recovery cell," in *Proc. IEEE Custom Integrated Circuits Conf. (CICC)*, Sep. 1999, pp. 521–524.
- [4] J. Zhan et al., "A full-rate injection-locked 10.3 Gb/s clock and data recovery circuit in a 45 GHz-$f_T$ SiGe process," in *Proc. IEEE Custom Integrated Circuits Conf. (CICC)*, Sep. 2005, pp. 557–560.
- [5] J. Lee, "A 20-Gb/s adaptive equalizer in 0.13-µm CMOS technology," *IEEE J. Solid-State Circuits*, vol. 41, pp. 2058–2066, Sep. 2006.
- [6] R. Adler, "A study of locking phenomena in oscillators," *Proc. IEEE*, vol. 61, pp. 1380–1385, Oct. 1973.
- [7] B. Razavi, "A study of injection locking and pulling in oscillators," *IEEE J. Solid-State Circuits*, vol. 39, pp. 1415–1424, Sep. 2004.
- [8] J. Lee, "A 3-to-8-GHz fast-hopping frequency synthesizer in 0.18-µm CMOS technology," *IEEE J. Solid-State Circuits*, vol. 41, pp. 566–573, Mar. 2006.
- [9] J. Lee and B. Razavi, "A 40-Gb/s clock and data recovery circuit in 0.18-µm CMOS technology," *IEEE J. Solid-State Circuits*, vol. 38, pp. 2181–2190, Dec. 2003.
- [10] S. Park et al., "A 4 GS/s 4b Flash ADC in 0.18 µm CMOS," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2006, pp. 570–571.
- [11] A. Maxim, "A -86 dBc reference spurs 1-5 GHz 0.13 µm CMOS PLL using a dual-path sampled loop filter architecture," in *Symp. VLSI Circuits Dig. Tech. Papers*, Jun. 2005, pp. 248–251.
- [12] T. Lee and W. Lee, "A spur suppression technique for phase-locked frequency synthesizers," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2006, pp. 592–593.
- [13] R. Gu et al., "A 6.25 GHz 1 V LC-PLL in 0.13 µm CMOS," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2006, pp. 594–595.
- [14] B. Razavi, *Design of Integrated Circuits for Optical Communications*. New York: McGraw-Hill, 2002.
- [15] J. Lee and S. Wu, "Design and analysis of a 20-GHz clock multiplication unit in 0.18-µm CMOS technology," in *Symp. VLSI Circuits Dig. Tech. Papers*, Jun. 2005, pp. 140–143.
- [16] C. Liang et al., "A 10 Gbps burst-mode CDR circuit in 0.18 µm CMOS," in *Proc. IEEE Custom Integrated Circuits Conf. (CICC)*, Sep. 2006.



**Jri Lee** (S'03–M'04) received the B.Sc. degree in electrical engineering from National Taiwan University (NTU), Taipei, Taiwan, in 1995, and the M.S. and Ph.D. degrees in electrical engineering from the University of California, Los Angeles (UCLA), both in 2003.

After military service (1995-1997), he was with Academia Sinica, Taipei, Taiwan, from 1997 to 1998, and subsequently with Intel Corporation from 2000 to 2002. He has been with National Taiwan University since 2004, where he is currently Associate Professor of electrical engineering. His current research interests include high-speed wireless and wireline transceivers, phase-locked loops, and data converters.

Prof. Lee is currently serving in the Technical Program Committees of the IEEE International Solid-State Circuits Conference (ISSCC), the Symposium on VLSI Circuits, and the Asian Solid-State Circuits Conference (A-SSCC). He has received the Beatrice Winner Award for Editorial Excellence at the 2007 ISSCC, the Takuo Sugano Award for Outstanding Far-East Paper at the 2008 ISSCC, and the NTU Outstanding Teaching Award in 2007.



**Mingchung Liu** was born in Taipei, Taiwan, in 1982. He received the B.S. and M.S. degrees in electrical engineering from National Taiwan University, Taipei, in 2005 and 2007, respectively.

His research interests include broadband data communication circuits, phase-locked loops and clock and data recovery circuits.