# LOW JITTER LOW POWER PHASE LOCKED LOOPS USING SUB-SAMPLING PHASE DETECTION

Xiang Gao

#### **Promotion Committee:**

| Chairman:           | prof.dr.ir. A. J. Mouthaan    | Universiteit Twente            |
|---------------------|-------------------------------|--------------------------------|
| Secretary:          | prof.dr.ir. A. J. Mouthaan    | Universiteit Twente            |
| Promotor:           | prof.dr.ir. B. Nauta          | Universiteit Twente            |
| Assistant Promotor: | dr.ing. E. A. M. Klumperink   | Universiteit Twente            |
| Members:            | prof.dr.ir. F. E. van Vliet   | Universiteit Twente            |
|                     | prof.ir. A. J. M. van Tuijl   | Universiteit Twente            |
|                     | prof.dr.ir. M. S. J. Steyaert | Katholieke Universiteit Leuven |
|                     | prof.dr.ir. R. B. Staszewski  | Technische Universiteit Delft  |

| Title:  | Low Jitter Low Power Phase Locked Loops Using Sub-Sampling Phase Detection |
|---------|----------------------------------------------------------------------------|
| Author: | Xiang Gao                                                                  |
| ISBN:   | 978-90-365-3022-4                                                          |
| DOI:    | 10.3990/1.9789036530224                                                    |

© 2010, Xiang Gao. All rights reserved.

This work was supported by National Semiconductor Corporation, Santa Clara, California.

## LOW JITTER LOW POWER PHASE LOCKED LOOPS USING SUB-SAMPLING PHASE DETECTION

DISSERTATION

to obtain the degree of doctor at the University of Twente, on the authority of the rector magnificus, prof.dr. H. Brinksma, on account of the decision of the graduation committee, to be publicly defended on Wednesday 9 June 2010 at 16:45

by

Xiang Gao

born on 5 June 1983 in Zhejiang, China

This dissertation is approved by

the promotor, prof.dr.ir. B. Nauta

and the assistant promotor, dr.ing. E. A. M. Klumperink

"Everything should be made as simple as possible, but not simpler." – Albert Einstein

To Xiaoyan, Ruoxi and my parents

## Abstract

A periodic clock signal is required in many ICs. These clocks are for instance used to define the sampling moments in data converters; to up-convert and down-convert the wanted signals in wireless transceivers and to synchronize the data flow in wireline and optical serial data communication links. The clock timing/phase accuracy affects the overall system performance and therefore a clock generator should have low jitter/phase-noise. Moreover, a clock generator is also desired to dissipate low power to save energy.

This thesis aims to design a clock generation phase-locked loop (PLL) with low jitter as well as low power. It starts with the classical PLL phase noise and jitter analysis. Different sources of PLL phase noise are identified and analyzed. The overall PLL phase noise and output jitter are calculated and optimization methods are discussed. The scaling of the PLL jitter and power with the input frequency, output frequency and the division ratio *N* is examined and a benchmark figure-of-merit is proposed to evaluate the overall PLL jitter and power performance.

In some applications, e.g. time-interleaved ADCs and image and harmonic rejection radio transceivers, a group of clocks with multiple phases is needed. Two competing techniques to realize such clocks, one based on a shift register (SR) and the other on a delay-locked loop (DLL), are discussed. The relative merits of the two techniques are compared, primarily based on their jitter and power performance. Analysis shows that an SR is not only more flexible, but also almost always generates less jitter than a DLL for a given power, when both are realized with current mode logic. The analytical results are verified with simulation results. To generate high quality multi-phase clocks, both methods need a reference clock with low jitter. Such a reference clock can be generated using a low jitter PLL, which is the main topic of this thesis.

In a classical PLL, the phase detector (PD), charge pump (CP) and divider noise is multiplied by  $N^2$  due to the existence of the divide-by-N in the feedback path. This is often the bottleneck for low PLL in-band phase noise. This work proposes to use a sub-sampling PLL (SSPLL) architecture to break this bottleneck. The SSPLL exploits a sub-sampling phase detector (SSPD) that directly samples the high frequency VCO output with the low frequency reference clock and converts the VCO phase error into sampled voltage variation. The SSPLL is divider-less in the locked state and thus has no divider noise. Furthermore, analysis shows that the PD and CP noise is not multiplied by  $N^2$  in this PLL, resulting in very low in-band phase noise. To prove the concepts, a fully integrated 2.2 GHz SSPLL is implemented in 0.18-µm CMOS. It achieves -126 dBc/Hz at 200 kHz in-band phase noise, 0.15 ps rms output jitter (10 kHz to 40 MHz), -46 dBc reference spur while consuming 7.6 mW. When normalized to the same output jitter, it is an order of magnitude more power efficient than the state-of-art classical PLLs.

In order to improve the power efficiency of the SSPLL even further, a buffer-less direct VCO sampling scheme is proposed. No buffer is used between the VCO and SSPD, while dummy samplers keep the disturbance of the SSPD sampling to the VCO low. Furthermore, a modified inverter with separate gate control for the NMOS and PMOS transistors is proposed to convert the sine-wave reference clock to a square wave in a power efficient way. By making the conduction times of the NMOS and PMOS non-overlapping, the direct current path from the supply to ground is eliminated, which removes the inverter short-circuit current and drastically reduces its power consumption. Measurements show that a 2.2 GHz SSPLL designed with these techniques can achieve -125 dBc/Hz in-band phase noise at 200 kHz while only dissipating 700 $\mu$W excluding the VCO. The whole SSPLL consumes 2.5 mW and its rms jitter is 0.16 ps (10 kHz to 100 MHz). The reference spurs measured from 20 chips are lower than -56 dBc.

Although the previous two SSPLL designs achieve very low phase-noise/jitter with low power, the reference spurs are relatively high. In the classical PLL, the major source of spurs is usually the mismatch between the CP up- and down-current sources. In contrast, analysis reveals that the CP in the SSPLL is actually insensitive to mismatch due to its amplitude-controlled nature. The main source of the SSPLL spurs is instead the SSPD sampler, which periodically disturbs the VCO operation via charge injection, charge sharing and frequency modulation by periodically changing the VCO capacitive load. Dummy samplers and isolation buffers are then used to minimize the disturbance of the SSPD to the VCO. A duty cycle controlled reference buffer with DLL tuning is proposed to further reduce the worst case spur. To verify the spur reduction concepts, a new SSPLL design optimized for low spur is fabricated in 0.18-µm CMOS. While using a high loop-bandwidth-to-reference-frequency ratio of 1/20, the reference spurs measured from 20 chips are <-80 dBc. The rms output jitter is 0.3 ps (10 kHz to 100 MHz) while the power consumption is 3.8 mW.

## Summary (Samenvatting)

Many ICs (integrated circuits, "chips") need a periodic clock signal. This clock is used, for example, to define the sampling moment in data conversion; to set the frequency translation in wireless transmitters and/or receivers; and to synchronize data streams over cables and optical communication links. The accuracy of the timing, i.e. the phase, of the clock affects the achievable system performance, so a clock generator must have low jitter and phase error. Moreover, a clock generator should preferably consume little power to save energy.

The work described in this thesis aims to design a clock generation phase-locked loop (PLL) with low jitter. The thesis first analyzes the phase noise and jitter in a classical PLL, identifying and quantifying the various noise sources. The total phase noise and jitter of the output signal are calculated, and methods to optimize them are discussed. The dependence of PLL jitter and power dissipation on the PLL input and output frequencies and the division ratio *N* is examined, and a figure of merit (FoM) is proposed to evaluate the total jitter in relation to the power dissipation of a PLL.

In some applications, e.g. time-interleaved ADCs and image and harmonic rejection in radio transmitters and receivers, a group of clocks with equidistant phases is needed. Two competing techniques to realize such clocks are compared, one based on a shift register (SR) and the other on a delay-locked loop (DLL). The relative merits of the two techniques are compared, primarily on the basis of their jitter and power consumption. The analysis shows that an SR is not only more flexible, but also almost always produces less jitter than a DLL consuming the same power, assuming both are realized in current mode logic. The analytical results are verified with simulation results. To produce good-quality multi-phase clocks, a high frequency reference clock with low jitter is needed, which is the main goal of this thesis.

In a classical PLL, the noise contributions of the phase detector (PD), the charge pump (CP) and the frequency divider are multiplied by $N^2$ at the output, because of the divider in the feedback path. These noise contributions are often the bottleneck for the low-frequency phase noise of the PLL, within the loop bandwidth. This thesis proposes a sub-sampling PLL (SSPLL) architecture to break this bottleneck. The SSPLL exploits a sub-sampling phase detector (SSPD) that directly samples the high frequency VCO output at the rate of the reference clock, converting the VCO phase error into a voltage. The SSPLL has no divider and hence no phase noise contribution from a divider. Furthermore, analysis shows that, unlike in the classical PLL, the noise of the PD and CP is not multiplied by $N^2$ in the SSPLL, which results in very low phase noise within the loop bandwidth. To prove the practical value of the architecture, a 2.2 GHz SSPLL is designed and fully integrated on a 0.18-µm CMOS chip. Measurements show a phase noise of -126 dBc/Hz at 200 kHz within the loop bandwidth, 0.15 ps rms output jitter (10 kHz to 40 MHz), a spur of -46 dBc and a power consumption of 7.6 mW. Normalized to the same output jitter, the SSPLL is an order of magnitude more energy efficient than the classical PLL.

To further reduce the power consumption of the SSPLL, it is proposed to sample the VCO directly without buffering it. Adding dummy samplers minimizes the disturbance of the VCO by the sampling process. Furthermore, a modified reference buffer is proposed in which the NMOS and PMOS are switched separately to convert the sine wave into a square wave efficiently. By avoiding simultaneous conduction of the NMOS and PMOS, the direct current path between the supply and ground is eliminated, which drastically reduces the power consumption. Measurements on a 2.2 GHz SSPLL using these techniques show an in-band phase noise of -125 dBc/Hz at 200 kHz, while the loop components (the PLL excluding the VCO) use only 700 $\mu$W. The total power consumption of this SSPLL is 2.5 mW and its rms jitter is 0.16 ps (10 kHz to 100 MHz). The spurs are <-56 dBc for 20 measured chips.

Although the previous two SSPLL designs have very low phase noise and jitter at low power consumption, the spurs are still fairly strong. In the classical PLL, the main source of spurs is usually the mismatch between the charging and discharging currents of the CP. Analysis shows, however, that the CP in an SSPLL is insensitive to this mismatch because of its amplitude-controlled nature. Analysis shows that the main source of spurs in the SSPLL is the periodic disturbance of the VCO by the sampling action. This happens via charge injection, charge sharing and frequency modulation by the periodically varying capacitive load. Using dummy samplers and isolating buffers, the disturbance of the VCO by the SSPD can be minimized. To improve the worst-case spur, the duty cycle of the reference buffer signal is tuned via a DLL. To verify the spur reduction, an SSPLL is designed in 0.18-µm CMOS. With a high loop-bandwidth-to-reference-frequency ratio of 1/20, the SSPLL chip shows a spur <-80 dBc in all cases for 20 chips. The rms output jitter is 0.3 ps (10 kHz to 100 MHz) while the power consumption is only 3.8 mW.

# Contents

| Chapter | Section                                                     |
|---------|-------------------------------------------------------------|
|         | Abstract                                                    |
|         | Samenvatting                                                |
|         | List of Abbreviations                                       |
| 1       | Introduction                                                |
|         | 1.1 Low Jitter PLL: Motivation                              |
|         | 1.2 A Brief PLL Review                                      |
|         | 1.3 Research Objectives                                     |
|         | 1.4 Thesis Organization                                     |
|         | 1.5 References                                              |
| 2       | Classical PLL Jitter Analysis                               |
|         | 2.1 Introduction                                            |
|         | 2.2 Classical PLL Phase Domain Model                        |
|         | 2.3 VCO Phase Noise and Benchmarking                        |
|         | 2.4 Loop Phase Noise and Benchmarking                       |
|         | 2.4.1 Phase Noise due to the Reference Path, Divider and PD |
|         | 2.4.2 Phase Noise due to the CP                             |
|         | 2.4.3 Loop Phase Noise Benchmarking                         |
|         | 2.5 PLL Jitter and Benchmarking                             |
|         | 2.5.1 PLL Output Jitter                                     |
|         | 2.5.2 PLL Jitter Optimization                               |
|         | 2.5.3 PLL Benchmarking                                      |
|         | 2.6 Conclusion                                              |
|         | 2.7 References                                              |
| 3       | Low Jitter Multi-phase Clock Generation                     |
|         | 3.1 Introduction                                            |
|         | 3.2 DLL MPCG Jitter                                         |
|         | 3.2.1 DLL MPCG Architecture                                 |
|         | 3.2.2 DLL MPCG Output Jitter                                |
|         | 3.3 SR MPCG Jitter                                          |
|         | 3.3.1 SR MPCG Architecture                                  |
|         | 3.3.2 SR MPCG Output Jitter                                 |
|         | 3.4 Comparison between DLL and SR MPCG Jitter               |
|         | 3.4.1 Comparing Jitter Transferred from the Reference Clock |
|         | 3.4.2 Comparing Jitter Generated due to Thermal Noise       |
|         | 3.4.3 Comparing Jitter Generated due to Mismatch            |
|         | 3.4.4 Discussion                                            |
|         | 3.5 Simulation Results                                      |
|         | 3.6 Conclusion                                              |
|         | 3.7 References                                              |
| 4       | Low Jitter Sub-Sampling PLL                                 |
|         | 4.1 Introduction                                            |
|         | 4.2 Low Noise Phase Detection                               |
|         | 4.2.1 Classical 3-state PFD/CP                              |
|         | 4.2.2 Proposed Sub-Sampling PD/CP                           |
|         | 4.2.3 CP Noise Comparison                                   |
|         | 4.3 Sub-Sampling PLL                                        |
|         | 4.3.1 Modeling and Noise Analysis                           |
|         | 4.3.2 Chip Area Considerations                              |
|         | 4.3.3 SSPD/CP with Gain Control                             |
|         | 4.3.4 Frequency Locking                                     |
|         | 4.4 Design and Implementation                               |
|         | 4.4.1 VCO and Measurement Buffer                            |
|         | 4.4.2 Phase Detector and Charge Pump                        |
|         | 4.4.3 3-state PFD/CP with Dead Zone                         |
|         | 4.5 Experimental Results                                    |
|         | 4.6 Conclusion                                              |
|         | 4.7 References                                              |
| 5       | Power Reduction Techniques for SSPLL                        |
|         | 5.1 Introduction                                            |
|         | 5.2 Buffer-less Direct VCO Sampling                         |
|         | 5.3 Low Power Ref Buffer                                    |
|         | 5.4 Design and Implementation                               |
|         | 5.5 Experimental Results                                    |
|         | 5.6 Conclusion                                              |
|         | 5.7 References                                              |
| 6       | Spur Reduction Techniques for SSPLL                         |
|         | 6.1 Introduction                                            |
|         | 6.2 Spur due to Charge Pump                                 |
|         | 6.2.1 Conventional CP                                       |
|         | 6.2.2 Low Spur CP Using Sub-sampling                        |
|         | 6.3 Spur due to VCO Sampling and Techniques to Reduce It    |
|         | 6.3.1 BFSK Effect                                           |
|         | 6.3.2 Charge Sharing/Injection                              |
|         | 6.3.3 Low Spur SSPLL Architecture                           |
|         | 6.4 Design and Implementation                               |
|         | 6.4.1 VCO                                                   |
|         | 6.4.2 SSPD/CP with Pulser                                   |
|         | 6.4.3 SSDLL                                                 |
|         | 6.4.4 Settling Behavior                                     |
|         | 6.5 Experimental Results                                    |
|         | 6.6 Conclusion                                              |
|         | 6.7 References                                              |
| 7       | Conclusions                                                 |
|         | 7.1 Summary and Conclusions                                 |
|         | 7.2 Original Contributions                                  |
|         | 7.3 Recommendations for Future Work                         |
|         | 7.4 References                                              |
|         | List of Publications                                        |
|         | Acknowledgements                                            |
|         | About the Author                                            |

# List of Abbreviations

| ADC   | Analog-to-Digital Converter             |
|-------|-----------------------------------------|
| BFSK  | Binary Frequency Shift Keying           |
| CP    | Charge Pump                             |
| CML   | Current Mode Logic                      |
| CMOS  | Complementary Metal-Oxide-Semiconductor |
| DCO   | Digitally Controlled Oscillator         |
| DFF   | D Flip-Flop                             |
| DLL   | Delay-Locked Loop                       |
| DU    | Delay Unit                              |
| DZ    | Dead Zone                               |
| FLL   | Frequency-Locked Loop                   |
| FOM   | Figure-of-Merit                         |
| IC    | Integrated Circuit                      |
| LF    | Loop Filter                             |
| MPCG  | Multi-Phase Clock Generator             |
| PD    | Phase Detector                          |
| PFD   | Phase-Frequency Detector                |
| PLL   | Phase-Locked Loop                       |
| rms   | Root-Mean-Square                        |
| SNR   | Signal-to-Noise Ratio                   |
| SSPD  | Sub-Sampling Phase Detector             |
| SSPLL | Sub-Sampling PLL                        |
| TDC   | Time-to-Digital Converter               |
| VCDL  | Voltage-Controlled Delay Line           |
| VCO   | Voltage-Controlled Oscillator           |
| XO    | Crystal Oscillator                      |

## Chapter 1

## Introduction

#### **1.1 Low Jitter PLL: Motivation**

The integrated circuit (IC) has enjoyed an exponential growth in the last half century since it was invented in 1959 [1], [2]. Following the famous Moore's law [3], the number of transistors in an IC or "chip" has doubled approximately every two years and reached more than two billion in 2009 [4]. IC products are now ubiquitous and universal in everyday life.

A periodic clock signal is required in many ICs. These clocks are for instance used to define the sampling moments in analog-to-digital or digital-to-analog data converters; to up-convert and down-convert the wanted signals in wireless transceivers; to synchronize the data flow in wireline and optical serial data communication links; and, last but certainly not least, as a metronome to coordinate the actions of internal circuits in digital ICs.

An ideal clock is a periodic signal with a constant frequency. It delivers edge transitions or zero-crossings at precise time intervals. In reality, the frequency of the clock signal fluctuates around its mean value due to e.g. the thermal noise in the electronic devices in the clock generator. In the time domain, these inaccuracies cause the edge transitions of the practical clock to deviate from those of the ideal clock, which is called jitter or timing jitter. In the frequency domain, clock inaccuracies result in spectral components at frequencies other than the desired frequency, referred to as phase noise or spurious signals ("spurs" for short). Jitter and phase noise are related and linked by mathematical equations [5] since they characterize the inaccuracies of the same clock in the time and frequency domains.

In general, the jitter or phase noise on the clock signal results in a degradation of the signal-to-noise ratio (SNR) of the signals clocked by or mixed with it. Therefore, the clock source must exhibit very low levels of jitter or phase noise in high performance ICs. One critical example is a high speed high resolution analog-to-digital converter (ADC). Fig. 1.1 shows a simple model of the sampling process in an ADC, where a sine-wave signal with frequency $f_{sig}$ and amplitude $A_{sig}$ is sampled by a clock. The signal voltage is sampled at the rising edge of the clock and later converted to digital by a quantizer in the ADC.

Figure 1.1. Sampling process in an ADC.

Due to jitter in the clock, the actual sampling moment deviates from the ideal one by $\Delta t$, resulting in an error in the sampled voltage:

$$\Delta v_{sam} = A_{sig} \cdot 2\pi f_{sig} \cdot \Delta t \cdot \cos(2\pi f_{sig} t) \,. \tag{1.1}$$

As a result of this sampled voltage error, the SNR of the ADC is degraded. Defining  $\sigma_t$  as the root-mean-square (rms) value of the clock jitter  $\Delta t$ , the achievable ADC SNR can be calculated to be [6]:

$$SNR_{jitter} = 20\log\left(\frac{1}{2\pi f_{sig} \cdot \sigma_t}\right) . \tag{1.2}$$

The achievable SNR of the ADC is limited by (1.2) even if the quantizer is perfect. Fig. 1.2 plots the ADC SNR and the corresponding effective number of bits as a function of input signal frequency for several values of sampling clock jitter. We see that for an ADC with a higher resolution and higher frequency, the requirement on the sampling clock jitter is more stringent. In order to realize a high performance ADC, a sampling clock with low jitter thus must be available.
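As a rough numerical illustration of (1.2), the sketch below evaluates the jitter-limited SNR and inverts it to find the jitter budget for a target SNR. The function names and the example operating point (100 MHz input, 1 ps rms jitter) are illustrative assumptions, not values from this thesis. The conversion to effective number of bits uses the common ideal-quantizer relation ENOB = (SNR - 1.76) / 6.02, which is assumed here rather than taken from the text.

```python
import math


def snr_jitter_db(f_sig_hz, sigma_t_s):
    """Jitter-limited SNR in dB, following (1.2)."""
    return 20.0 * math.log10(1.0 / (2.0 * math.pi * f_sig_hz * sigma_t_s))


def enob_from_snr(snr_db):
    """Effective number of bits, assuming the ideal-quantizer relation
    SNR = 6.02 * ENOB + 1.76 dB (an assumption, not from the thesis text)."""
    return (snr_db - 1.76) / 6.02


def max_jitter_for_snr(snr_db, f_sig_hz):
    """Largest rms jitter (seconds) that still allows the target SNR,
    obtained by inverting (1.2)."""
    return 1.0 / (2.0 * math.pi * f_sig_hz * 10.0 ** (snr_db / 20.0))


if __name__ == "__main__":
    # Hypothetical operating point: 100 MHz input, 1 ps rms clock jitter.
    snr = snr_jitter_db(100e6, 1e-12)
    print(f"SNR limit: {snr:.2f} dB, ENOB limit: {enob_from_snr(snr):.2f} bits")
    # Jitter budget for a 74 dB target (about 12 bits) at the same input frequency.
    print(f"Jitter budget: {max_jitter_for_snr(74.0, 100e6) * 1e15:.0f} fs rms")
```

Doubling either the signal frequency or the rms jitter lowers the limit by about 6 dB, i.e. roughly one bit, which is why high speed high resolution converters place the tightest demands on the sampling clock.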

Crystal oscillators (XOs) can provide very accurate and stable clocks due to the high quality factor (in the range of  $10^4$  to  $10^6$ ) of the quartz crystal. However, the frequency of a practical crystal is often limited to tens-of-MHz [7]. For frequencies as needed on chip, which are typically in the GHz-range, no crystal is available. The frequency of the XO thus should be multiplied up before it can be used. The most common way of realizing frequency multiplication is using a phase-locked loop (PLL). A delay-locked loop (DLL) with an edge combiner can also be used as a frequency multiplier. However, its jitter performance for a given power budget is worse than for a PLL [8], especially when an LC oscillator is used in the PLL. This thesis focuses on the design of frequency multiplication PLLs for applications that require high speed clocks with very low jitter such as high performance ADCs.



Figure 1.2. Achievable ADC SNR with certain signal frequency and sampling clock jitter.

#### **1.2 A Brief PLL Review**

A PLL is a feedback system in which the feedback signal is used to lock the frequency and phase of the output signal to the frequency and phase of an input signal. The earliest concept of a PLL was provided by de Bellescize in 1932 [9]. However, the PLL did not fall into widespread use until IC technology had advanced enough. The first integrated PLL debuted at ISSCC in 1969 [10] and one of the earliest uses of a feedback divider in a PLL for frequency multiplication appeared in 1970 [11]. Since then the PLL has become a ubiquitous component in modern ICs due to its versatility. Apart from frequency multiplication and clock generation, PLLs can for instance also be used for frequency synthesis, frequency modulation and demodulation, clock and data recovery, synchronization, skew compensation and spread spectrum signal generation.

To date, many different PLL architectures [12-15] have been developed. The one shown in Fig. 1.3(a) is probably the most popular architecture. It consists of an input reference clock Ref, a phase detector (PD) or phase-frequency detector, a charge pump (CP), a loop filter (LF), a voltage controlled oscillator (VCO) which creates an output frequency and a frequency divider with division ratio N ( $\div$ N). The basic operation of the PLL is as follows. The PD compares its two input signals and generates two signals UP and DN with a pulse width proportional to the phase difference at the input, see Fig. 1.3(b). The CP consists of two current sources switched by UP and DN, driving a low pass filter. The filter output drives the VCO and adjusts its oscillation frequency. The frequency divider divides the VCO frequency by N and feeds it back to the input of the PD, producing a negative feedback loop. When the loop reaches steady state, the reference clock and the divider output have the same phase and phase locking is achieved.



Figure 1.3. Classical charge pump PLL (a) schematic; (b) timing diagram.

The reference clock and the divider output then also have the same frequency since frequency is the first derivative of phase. In other words, the VCO frequency is equal to N times the reference frequency and frequency multiplication is achieved. Although the implementation of individual blocks may be different, most of the modern PLLs [16-25] have the same architecture as the one in Fig. 1.3(a). Therefore, we will refer to it as the "classical PLL" architecture.
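The locking behavior described above can be illustrated with a very small behavioral model. The sketch below is a discrete-time, phase-domain approximation updated once per reference period; it is not the circuit of Fig. 1.3. The PD, CP and LF are lumped into a hypothetical proportional-integral controller, and all numbers (50 MHz reference, N = 44, 2.1 GHz free-running frequency, 100 MHz/V VCO gain, and the loop gain constants 0.5 and 0.05) are assumptions chosen only so that the loop settles.

```python
def simulate_pll(f_ref, n_div, f_free, k_vco, steps=400):
    """Return the VCO frequency after each reference period.

    Phases are tracked in cycles. The PD/CP/LF chain is replaced by a
    hypothetical proportional-integral controller (an assumption made
    for illustration, not the loop filter of the thesis).
    """
    t_ref = 1.0 / f_ref
    # Divider-phase change (cycles) per volt of control per reference period.
    gain = k_vco * t_ref / n_div
    k_p = 0.5 / gain    # proportional gain in V/cycle, chosen for a stable loop
    k_i = 0.05 / gain   # integral gain in V/cycle, chosen for a stable loop

    phase_ref = 0.0     # reference phase in cycles
    phase_div = 0.0     # divider output phase in cycles
    integ = 0.0         # integrator state in volts
    history = []
    for _ in range(steps):
        err = phase_ref - phase_div          # phase detector
        integ += k_i * err                   # integral path
        v_ctrl = k_p * err + integ           # VCO control voltage
        f_vco = f_free + k_vco * v_ctrl      # linear VCO tuning curve
        phase_ref += 1.0                     # one reference cycle elapses
        phase_div += f_vco * t_ref / n_div   # divider output advances
        history.append(f_vco)
    return history


if __name__ == "__main__":
    freqs = simulate_pll(f_ref=50e6, n_div=44, f_free=2.1e9, k_vco=100e6)
    print(f"start: {freqs[0] / 1e9:.4f} GHz, end: {freqs[-1] / 1e9:.4f} GHz")
```

In this model the integrator forces the steady-state phase error to zero, so the divider output advances exactly one cycle per reference period and the VCO frequency settles at N times the reference frequency.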

In addition to the PD described in Fig. 1.3, other types of PD such as the mixer and the sample-and-hold PD also exist. For example, the sampling or sample-and-hold PD [12] uses one input to sample the other input, similar to the case in Fig. 1.1. The sampled voltage value represents the phase difference between the two input signals. It will be described in more detail in Chapter 4 and we will see that a sub-sampling PD derived from the sample-and-hold PD can bring significant phase noise benefits.

The PD (and the succeeding CP, LF) can also be implemented digitally, resulting in a digital PLL. Digital PLLs [26-28] have recently become popular because they benefit from the shrinking transistor size and have better programmability and portability over different processes. Since oscillators are inherently analog and the phase information is continuous, a time-to-digital converter (TDC) and a digitally controlled oscillator (DCO) are often used to interface the analog and digital world. The limited resolution of both the TDC and the DCO contributes to quantization noise or limit cycles within the loop, which result in either phase noise or spurs at the PLL output. The resolution of the TDC and DCO improves with the advance of technology. However, the jitter performance of the digital PLL is still fundamentally limited by the inherent noise of the analog components in the TDC and DCO.



Figure 1.4. Jitter and power performance of state-of-art classical PLLs in literature.

For clock generation PLLs, two of the most important parameters are jitter  $\sigma_t$  and power *P*. As we will see from Chapter 2, the PLL performance can be characterized using a figure-of-merit (FOM) defined as:

$$FOM_{PLL} = 10\log\left[\left(\frac{\sigma_t}{1\,\mathrm{s}}\right)^2 \cdot \frac{P}{1\,\mathrm{mW}}\right]. \tag{1.3}$$

A smaller  $FOM_{PLL}$  corresponds to a better PLL design. Fig. 1.4 plots the performance of state-of-the-art classical PLLs in literature [16-28]. We see that they typically output more than 0.2 ps rms jitter while consuming tens-of-mW, which may not be good enough in applications that require very low jitter as well as low power consumption.
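To give a feel for the numbers in (1.3), the sketch below evaluates the FOM for the three SSPLL designs whose measured jitter and power are quoted in the Abstract. The function name and the dictionary labels are illustrative choices; only the jitter and power values come from the text.

```python
import math


def fom_pll_db(jitter_rms_s, power_w):
    """PLL figure-of-merit of (1.3): jitter normalized to 1 s, power to 1 mW."""
    power_mw = power_w * 1e3
    return 10.0 * math.log10((jitter_rms_s / 1.0) ** 2 * power_mw)


# Measured rms jitter and power as quoted in the Abstract.
designs = {
    "first SSPLL (0.15 ps, 7.6 mW)": (0.15e-12, 7.6e-3),
    "low power SSPLL (0.16 ps, 2.5 mW)": (0.16e-12, 2.5e-3),
    "low spur SSPLL (0.3 ps, 3.8 mW)": (0.30e-12, 3.8e-3),
}

if __name__ == "__main__":
    for name, (sigma_t, power) in designs.items():
        print(f"{name}: FOM = {fom_pll_db(sigma_t, power):.1f} dB")
```

Because the jitter enters squared, halving the jitter at constant power improves the FOM by about 6 dB, whereas halving the power at constant jitter improves it by about 3 dB.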

#### **1.3 Research Objectives**

The main objective of this PhD work is to explore the performance limits of existing PLLs and provide solutions to overcome these limits. We focus on the area of clock multiplication for applications that require high speed clocks with very low jitter such as high performance data links and ADCs, making the integer-*N* structure a great candidate due to its simplicity.

Apart from low jitter, a clock generator is also desired to dissipate low power. This is especially important in portable applications, such as cellular phones where lower power consumption leads to longer talk time and a longer battery life. Power dissipated by the PLL may be a small fraction of the total active power in the system. However, during sleep modes where the PLL must remain in lock, it can be a significant fraction of dissipated power. The low power requirement makes the design of low jitter PLL even more challenging due to the fundamental trade-off between power and noise.

In order to design a low jitter PLL, a deep understanding of the PLL noise mechanisms is needed. One of the first goals of this thesis is thus to study the phase noise and jitter of the classical PLL. We will analyze different PLL noise sources and their relative impacts. A relation between jitter, phase noise, power consumption and loop bandwidth will be derived and jitter optimization methods will be described. Based on the insights developed, a benchmark FOM relating jitter and power will be defined, which stimulates the design of power efficient PLLs.

In some applications e.g. time-interleaved ADCs [29], clocks with more than one phase are needed. Therefore, we will study two common multi-phase clock generation methods, one using a delay-locked loop (DLL) and the other a shift register (SR). The jitter and power performance of the two methods will be analyzed and their relative merits will be compared.

The ultimate goal of our research is to develop a fully integrated PLL with low jitter (on the order of 100 fs) and low power consumption (on the order of 10 mW). The title of the thesis refers to this goal. The term "sub-sampling phase detection" refers to the phase detection technique used in this work. In the end we will demonstrate several PLL designs which meet the target and have >10 times less power dissipation than the PLLs in Fig. 1.5 when normalized to the same amount of output jitter.

On the implementation side, we focus on complementary metal-oxide-semiconductor (CMOS) technology. In CMOS technology, both the digital and analog parts of a complete system can be integrated on the same die, leading to smaller size and lower cost, and also to reduced power dissipation, for instance by eliminating power-consuming chip-to-chip interfacing.

#### **1.4 Thesis Organization**

The rest of the thesis is organized as follows.

Chapter 2 describes the classical PLL architecture and analyzes its phase noise performance using the small signal phase domain model. Different sources of PLL phase noise and power consumption are identified and analyzed. The overall PLL output jitter is calculated and jitter optimization methods are discussed. The scaling of the PLL jitter and power with the input frequency, output frequency and the division ratio *N* are examined. Based on the insights developed, a benchmark figure-of-merit to evaluate PLL jitter performance in relation to the consumed power is proposed [30].

Chapter 3 deals with low jitter multi-phase clock generation. Such clocks are for instance needed for time-interleaved ADCs and for image and harmonic rejection radio transceivers exploiting multiple clock phases. Two competing techniques to realize such clocks, one based on a shift register (SR) and the other on a DLL, are discussed [31], [32]. The relative merits of the two techniques are compared, primarily based on their jitter generation and power consumption. Analysis shows that a SR is not only more flexible, but also almost always generates less jitter than a DLL for a given power, assuming both are realized with current mode logic circuits. The analytical results are verified with simulation results. To generate high frequency multi-phase clocks, both methods need a low jitter high frequency reference clock, which can be generated from a crystal oscillator using a clock multiplying PLL. A PLL design with very low jitter is discussed in Chapter 4.

One important conclusion from Chapter 2 is that the PD, CP and divider noise is multiplied by $N^2$ in a classical PLL due to the divide-by-N in the feedback path. This is often the bottleneck for a classical PLL to achieve low phase noise. Chapter 4 proposes a new sub-sampling based PLL architecture which can break this bottleneck [33], [34]. It uses a PD that sub-samples the VCO output with the reference clock. No divider is needed in the locked state and hence divider noise and power can be eliminated. Moreover, analysis shows that the PD and CP noise is not multiplied by $N^2$ in this sub-sampling PLL (SSPLL), resulting in a low noise contribution from the PD and CP. To prove the concept, a 2.2 GHz SSPLL with a frequency division ratio of 40 is implemented in a standard 0.18-$\mu$m CMOS process. The in-band phase noise at 200 kHz offset is measured to be -126 dBc/Hz. The reference spur is -46 dBc. The SSPLL has an rms output jitter of 0.15 ps (integrated over 10 kHz to 40 MHz) while consuming 5.8 mW in the loop components and 1.8 mW in the VCO. When normalized to the same output jitter, this SSPLL is an order of magnitude more power efficient than the state-of-the-art classical PLLs.

Chapter 5 elaborates design techniques that can boost the power efficiency of the SSPLL even further [35]. We aim to reduce the loop-components power of the SSPLL in Chapter 4 by an order of magnitude while keeping its superior in-band phase noise performance. To this end, a buffer-less direct VCO sampling scheme is proposed which eliminates the power hungry VCO buffer. Dummy samplers are added to keep the disturbance of the SSPD sampling to the VCO low. A modified inverter with separate gate control for the NMOS and PMOS transistors is used as a power efficient reference clock buffer. By making the conduction times of the NMOS and PMOS non-overlapping, the direct current path from the supply to ground is removed, thereby eliminating the inverter short-circuit current. Measurements show that a 2.2 GHz SSPLL designed with these techniques can achieve -125 dBc/Hz in-band phase noise at 200 kHz with only 700 $\mu$W loop-components power. The whole SSPLL consumes 2.5 mW while the rms jitter is 0.16 ps (10 kHz to 100 MHz). The reference spurs measured from 20 samples are lower than -56 dBc.

Although the SSPLLs in Chapters 4 and 5 achieve very low phase noise and jitter with low power, the measured reference spurs are relatively high. Chapter 6 analyzes the SSPLL spur mechanisms and proposes design techniques to drastically reduce the spur level [36], [37]. It is discovered that the amplitude-controlled CP in the SSPLL is actually insensitive to mismatch and generates low ripple. The main source of the SSPLL spur is the SSPD sampler, which periodically disturbs the VCO operation via charge injection, charge sharing and frequency modulation by periodically changing the VCO capacitive load. A DLL/PLL dual loop architecture and a duty-cycle controlled reference buffer are then proposed, which together suppress all the SSPD spur mechanisms. To verify the spur reduction concepts, a new SSPLL design is fabricated in 0.18-µm CMOS. The prototype generates 0.3 ps (10 kHz to 100 MHz) rms jitter while consuming 3.8 mW. The reference spurs measured from 20 randomly selected chips are <-80 dBc.

Finally, Chapter 7 summarizes the most important conclusions that were drawn in this thesis, gives an overview of the original contributions and recommends some future work directions.

#### **1.5 References**

- [1] J. S. Kilby, "Miniaturized electronic circuits," U.S. patent 3138743, filed Feb. 6, 1959.
- [2] R. N. Noyce, "Semiconductor device-and-lead structure," U.S. patent 2981877, filed July 30, 1959.
- [3] G. E. Moore, "Cramming more components onto integrated circuits", *Electronics*, vol. 38, no. 8, pp. 114-117, Apr. 1965.
- [4] S. Rusu, S. Tam, H. Muljono, J. Stinson, D. Ayers, J. Chang, R. Varada, M. Ratta and S. Kottapalli, "A 45nm 8-Core Enterprise Xeon® Processor", *IEEE Int. Solid-State Circuits Conf. (ISSCC)*, pp. 56-57, Feb. 2009.
- [5] D. C. Lee, "Analysis of jitter in phase-locked loops," *IEEE Trans. Circuits Syst. II*, vol. 49, pp. 704–711, Nov.2002.
- [6] National Semiconductor, Clock Conditioner Owner's Manual, 2006. Accessed on Mar. 20<sup>th</sup>, 2010 http://www.national.com/appinfo/interface/files/clk\_conditioner\_owners\_manual.pdf
- [7] T. H. Lee, *The Design of CMOS Radio-Frequency Integrated Circuits*. New York, NY: Cambridge University Press, 1998.
- [8] R. C. H. van de Beek, E. Klumperink, C. S. Vaucher and B. Nauta, "Low-jitter clock multiplication: a comparison between PLLs and DLLs," *IEEE Trans. Circuits Syst. II*, vol. 49, pp. 555-566, Aug. 2002.
- [9] H. de Bellescize, "La Reception Synchrone," Onde Electr, Vol. 11, pp. 230-240, Jun. 1932.

- [10] A. B. Grebene and H. R. Camenzind, "Phase Locking as a New Approach for Tuned Integrated Circuits," *IEEE Int. Solid-State Circuits Conf. (ISSCC)*, vol. XII, pp. 100-101, Feb. 1969.
- [11] R. B. Sepe and R. I. Johnston, "Frequency multiplier and frequency waveform generator," U.S. patent 3551826, Dec. 1970.
- [12] V. F. Kroupa, Frequency Synthesis: Theory, Design and Applications. London, U.K.: Griffin, 1973.
- [13] J. A. Crawford, Frequency Synthesizer Design Handbook. Boston, MA: Artech House, 1994.
- [14] W. F. Egan, *Frequency Synthesis By Phase Lock.* 2nd ed., New York: Wiley Interscience, 2000.
- [15] C. S. Vaucher, Architectures for RF Frequency Synthesizers. Boston, MA: Kluwer, 2002.
- [16] J. Craninckx and M. Steyaert, "A Fully Integrated CMOS DCS-1800 Frequency Synthesizer," *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, pp. 372-373, Feb.1998.
- [17] L. Lin and P. R. Gray, "A 1.4 GHz differential low-noise CMOS frequency synthesizer using a wideband PLL architecture," *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, pp. 204–205, Feb. 2000.
- [18] H. Cong, S. M. Logan, M. J. Loinaz, K. J. O'Brien, E. E. Perry, G. D. Polhemus, J. E. Scoggins, K. P. Snowdon and M. G. Ward, "A 10-Gb/s 16:1 multiplexer and 10-GHz clock synthesizer in 0.25-um SiGe BiCMOS," *IEEE J. Solid-State Circuits*, vol. 36, no. 12, pp. 1946–1953, Sep. 2001.
- [19] N. Da Dalt and C. Sandner, "A subpicosecond jitter PLL for clock generation in 0.12 µm digital CMOS," *IEEE J. Solid-State Circuits*, vol. 38, no. 7, pp. 1275–1278, Jul. 2003.
- [20] A. M. Terrovitis, M. Mack, K. Singh and M. Zargari, "A 3.2 to 4 GHz, 0.25 um CMOS frequency synthesizer for IEEE 802.11a/b/g WLAN," *IEEE ISSCC Dig. Tech. Papers*, pp. 98–99, Feb. 2004.
- [21] R. C. H. van de Beek, C. S. Vaucher, D. M. W. Leenaerts, E. A. M. Klumperink and B. Nauta, "A 2.5–10-GHz clock multiplier unit with 0.22-ps RMS jitter in standard 0.18µm CMOS," *IEEE J. Solid-State Circuits*, vol. 39, no. 11, pp. 1862–1872, Nov. 2004.
- [22] R. Nonis, N. Da Dalt, P. Palestri and L. Selmi, "Modeling, design and characterization of a new low-jitter analog dual tuning LC-VCO PLL architecture," *IEEE J. Solid-State Circuits*, vol. 40, pp. 1303-1309, Jun. 2005.
- [23] R. Gu, A. Yee, Y. Xie and W. Lee, "A 6.25GHz 1V LC-PLL in 0.13µm CMOS," IEEE Int. Solid-State Circuits Conf. (ISSCC), pp. 594-595, Feb. 2006.
- [24] A. L. S. Loke, R. K. Barnes, T. T. Wee, M. M. Oshima, C. E. Moore, R. R. Kennedy and M. J. Gilsdorf, "A versatile 90-nm CMOS charge-pump PLL for SerDes transmitter clocking," *IEEE J. Solid-State Circuits*, vol. 41, pp. 1894-1907, Aug. 2006.

- [25] A. Swaminathan, K. J. Wang and I. Galton, "A Wide-Bandwidth 2.4 GHz ISM Band Fractional-N PLL With Adaptive Phase Noise Cancellation," *IEEE J. Solid-State Circuits*, vol. 42, pp. 2639-2650, Dec. 2007.
- [26] R. B. Staszewski, J. L. Wallberg, S. Rezeq, C.-M. Hung, O. E. Eliezer, S. Vemulapalli, K.C. Fernando, K. Maggio, R. Staszewski, N. Barton, M.-C. Lee, P. Cruise, M. Entezari, K. Muhammad and D. Leipold, "All-digital PLL and transmitter for mobile phone," *IEEE J. Solid-State Circuits*, vol. 40, no. 12, pp. 2469–2482, Dec. 2005.
- [27] N. Da Dalt, E. Thaller, P. Gregorius and L. Gazsi, "A Compact Triple-Band Low-Jitter Digital LC PLL With Programmable Coil in 130-nm CMOS," *IEEE J. Solid-State Circuits*, Vol. 40, No. 7, pp.1482-1490, Jul. 2005.
- [28] C. Hsu, M. Z. Straayer and M. H. Perrott, "A Low-Noise, Wide-BW 3.6GHz Digital ∆∑ Fractional-N Frequency Synthesizer with a Noise- Shaping Time-to-Digital Converter and Quantization Noise Cancellation," *IEEE Int. Solid-State Circuits Conf.* (ISSCC), pp. 340-341, Feb. 2008.
- [29] W. C. Black and D. A. Hodges, "Time interleaved converter arrays," *IEEE J. Solid-State Circuits*, vol.15, no. 6, pp. 1022–1029, Dec. 1980.
- [30] X. Gao, E. Klumperink, P. J. F. Geraedts and B. Nauta, "Jitter Analysis and a Benchmarking Figure-of-Merit for Phase-Locked Loops," *IEEE Trans. Circuits Syst. II*, vol. 56, no.2, pp. 117-121, Feb. 2009.
- [31] X. Gao, E. Klumperink, and B. Nauta, "Low-Jitter Multi-phase Clock Generation: A Comparison between DLLs and Shift Registers," *IEEE Int. Symp. Circuits Syst.* (ISCAS), pp. 2854–2857, May 2007.
- [32] X. Gao, E. Klumperink and B. Nauta, "Advantages of Shift Registers Over DLLs for Flexible Low Jitter Multiphase Clock Generation," *IEEE Trans. Circuits Syst. II*, vol. 55, no.3, pp. 244-248, Mar. 2008.
- [33] X. Gao, E. Klumperink, M. Bohsali and B. Nauta, "A 2.2GHz 7.6-mW Sub-Sampling PLL with -126dBc/Hz In-band Phase Noise and 0.15ps<sub>rms</sub> Jitter in 0.18-μm CMOS," *IEEE Int. Solid-State Circuits Conf. (ISSCC)*, pp. 392-393, Feb. 2009.
- [34] X. Gao, E. Klumperink, M. Bohsali and B. Nauta, "A Low Noise Sub-Sampling PLL in Which Divider Noise is Eliminated and PD/CP Noise is not Multiplied by N<sup>2</sup>," *IEEE J. Solid-State Circuits (JSSC)*, vol. 44, no.12, pp. 3253-3263, Dec. 2009.
- [35] X. Gao, E. Klumperink, G. Socci, M. Bohsali and B. Nauta, "A 2.2GHz Sub-Sampling PLL with 0.16ps<sub>rms</sub> Jitter and -125dBc/Hz In-band Phase Noise at 700μW Loop-Components Power," *IEEE Symposium on VLSI Circuits*, paper 14.1, Jun. 2010.
- [36] X. Gao, E. Klumperink, G. Socci, M. Bohsali and B. Nauta, "Spur Reduction Techniques for Phase-Locked Loops Using Sub-Sampling Phase Detection," *IEEE Int. Solid-State Circuits Conf. (ISSCC)*, pp. 474-475, Feb. 2010.
- [37] X. Gao, E. Klumperink, G. Socci, M. Bohsali and B. Nauta, "Spur Reduction Techniques for Phase-Locked Loops Exploiting a Sub-Sampling Phase Detector," accepted to *IEEE J. Solid-State Circuits (JSSC)*.

## Chapter 2

## **Classical PLL Jitter Analysis**

#### **2.1 Introduction**

In the previous chapter, we explained that the goal of this thesis is to design a low jitter phase-locked loop (PLL) with low power. In order to reach this goal, a deep understanding of the PLL noise mechanisms is needed. Of the many known PLL architectures [1-4], the one shown in Fig. 2.1(a) is perhaps the most widely-used one which we call the "classical PLL" architecture. It consists of a voltage controlled oscillator (VCO) which is locked to a reference clock by a feedback loop with the following "loop components": a phase detector (PD) combined with a charge pump (CP), a loop filter (LF) and a frequency divider with division ratio N ( $\div$ N). In this chapter, we will study the classical PLL architecture and analyze its phase noise and jitter performance.

The PLL jitter has been the topic of numerous studies [5-8]. Different from previous work, we focus on finding a systematic relation between the PLL jitter and key design parameters like the reference frequency, output frequency, loop bandwidth and power consumption. As we will see from the analytical results, changing these parameters largely affects the timing error in a systematic way. It thus makes sense to define a benchmark figure-of-merit (FOM) that normalizes for this systematic dependency. A well defined FOM makes it possible to compare different PLL designs and get an indication of their relative merits, in a similar way as for ADCs [9] or VCOs [10], [11], and can stimulate the development of power efficient high performance PLLs.

Following this introduction, Section 2.2 describes the classical PLL phase domain model and the noise transfer functions for different building blocks. Section 2.3 estimates the noise contribution and power consumption of the VCO and Section 2.4 does this for the loop components. Section 2.5 discusses the PLL output jitter and how it can be optimized. Based on the insights developed, a benchmark FOM to evaluate PLL jitter performance in relation to consumed power is proposed. Section 2.6 draws conclusions.



Figure 2.1. Classical PLL (a) architecture; (b) phase domain model.

#### 2.2 Classical PLL Phase Domain Model

The transient response of a PLL is generally a nonlinear process that cannot be formulated easily. Nevertheless, once phase locking is achieved, a linear approximation can be used to gain intuition. A linear phase-domain model for the classical PLL is shown in Fig. 2.1(b) [4], where  $K_d$  is the PD/CP detection gain,  $F_{LF}(s)$  the loop filter trans-impedance transfer function and  $K_{VCO}$  the VCO tuning gain in rad/V. Various noise sources are also shown. The noise transfer function from the VCO to the PLL output can be calculated as

$$H_{VCO}(s) = \frac{1}{1 + \frac{1}{N} \cdot K_d \cdot F_{LF}(s) \cdot \frac{K_{VCO}}{s}} = \frac{1}{1 + G(s)}$$
(2.1)

where G(s) is the PLL open loop transfer function and  $s=j2\pi f$ .

The rest of the noise all originates from the loop components and is therefore called the loop phase noise. When referred to the divider input<sup>1</sup>, the loop phase noise can be calculated as

$$\mathcal{L}_{loop} \approx \frac{S_{\phi loop,n}}{2} = \frac{1}{2} \cdot N^2 \cdot (S_{\phi ref,n} + S_{\phi div,n} + S_{\phi PD,n} + \frac{S_{iCP,n}}{K_d^2})$$
(2.2)

where the phase noise is expressed with the often used single-side-band noise power to carrier power ratio $\mathcal{L}$, which is approximately half of the phase noise power spectral density under practical conditions [12]. In (2.2), we neglected the loop filter noise since it should be made negligible in a well designed low noise PLL. This can be done without adding power, by either properly sizing the filter components [13] or lowering $K_{VCO}$ by design [14]. The reference clock is commonly generated by crystal oscillators whose phase noise is usually also negligible. The reference phase noise $S_{\phi ref,n}$ is mainly contributed by reference dividers or reference clock buffers.

<sup>1</sup> Here the loop phase noise is referred to the divider input (not to the PD input!), so that its level can be directly measured at the PLL output.

Figure 2.2. Overall PLL output phase noise originating from the loop-components and VCO, with 1/f noise neglected.

The noise transfer function from the (divider input referred) loop phase noise to the PLL output can be easily calculated as

$$H_{loop}(s) = \frac{G(s)}{1 + G(s)} = 1 - H_{VCO}(s).$$
(2.3)

Comparing (2.1) and (2.3), the VCO phase noise is high pass filtered while the loop phase noise is low pass filtered. Moreover, the 3-dB bandwidth for the two transfer functions is the same and determined by G(s). We define their 3-dB bandwidth as the PLL bandwidth  $f_c$ .
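The complementary behavior of (2.1) and (2.3) can be checked numerically. The sketch below is only an illustration: it assumes a first-order loop with $G(s) = 2\pi f_c / s$ (the text keeps $G(s)$ general), a 1 MHz bandwidth chosen arbitrarily, and hypothetical function names.

```python
import math

def open_loop_gain(f, fc):
    """Assumed first-order open-loop gain G(s) = 2*pi*fc / s, evaluated at s = j*2*pi*f."""
    return (2 * math.pi * fc) / (1j * 2 * math.pi * f)

def h_vco(f, fc):
    """VCO noise transfer function (2.1): 1 / (1 + G), a high-pass response."""
    return 1 / (1 + open_loop_gain(f, fc))

def h_loop(f, fc):
    """Loop noise transfer function (2.3): G / (1 + G), a low-pass response."""
    g = open_loop_gain(f, fc)
    return g / (1 + g)

fc = 1e6  # assumed loop bandwidth of 1 MHz
for f in (1e4, 1e6, 1e8):
    # Print the offset frequency and both squared magnitudes.
    print(f, abs(h_vco(f, fc)) ** 2, abs(h_loop(f, fc)) ** 2)
```

For this assumed loop both squared magnitudes equal 0.5 at $f = f_c$, and the two complex transfer functions sum to one at every frequency, as (2.3) requires.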

In the following noise analysis, we assume the PLL is implemented with CMOS which is the technology of interest in this thesis. We will focus on the fundamental limitation due to thermal noise and neglect 1/f noise, with similar arguments as in [6] and [7]. In most PLL designs, 1/f noise doesn't contribute much to the output jitter. The VCO 1/f noise is suppressed by the PLL loop since the 1/f corner frequency is normally lower than  $f_c$ . The contribution of the 1/f noise in the loop components is also not significant when the 1/f corner frequency is small compared with the PLL bandwidth<sup>2</sup>. With the 1/f noise neglected, the spectrum of the loop phase noise is flat. The VCO phase noise has a  $1/f^2$  shape due to the integration of white noise. Fig. 2.2 shows the overall PLL output phase noise when a first order low pass loop filter is used. Outside the loop bandwidth (offset frequency  $f_m > f_c$ ) the  $1/f^2$  shape of the VCO noise is visible, as VCO phase noise is hardly affected there by the loop filter. Within the loop bandwidth, for  $f_m < f_c$ , the filtering suppresses the VCO noise and the loop noise dominates. As this happens inside the loop bandwidth, the loop phase noise is sometimes also referred to as PLL in-band phase noise.

#### 2.3 VCO Phase Noise and Benchmarking

The VCO phase noise has been the topic of several studies, e.g. [11], [15-17]. It is found that the phase noise of a VCO in some respects is systematically dependent on design parameters like the oscillation frequency  $f_{VCO}$ , power dissipation  $P_{VCO}$  and the offset frequency  $f_m$  at which the phase noise is measured. To compare the quality of VCO designs, the following benchmark FOM [10], [11] is widely used:

$$FOM_{VCO} = 10\log(\mathcal{L}_{VCO}(f_m) \cdot \frac{f_m^2}{f_{VCO}^2} \cdot \frac{P_{VCO}}{1\text{mW}}).$$
(2.4)

The unit of  $FOM_{VCO}$  is dBc/Hz ( $\mathcal{L} \cdot$  dimensionless factor). A smaller  $FOM_{VCO}$ , i.e. a more negative number, corresponds to a better VCO design<sup>3</sup>. The VCO phase noise can thus be expressed using  $FOM_{VCO}$  as

$$\mathcal{L}_{VCO}(f_m) = \frac{10^{FOM_{VCO}/10}}{P_{VCO}/1\text{mW}} \cdot \frac{f_{VCO}^2}{f_m^2} \,.$$
(2.5)
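As a small worked example of (2.4) and (2.5), the sketch below evaluates both in decibels. The function names and the numbers (a $FOM_{VCO}$ of -180 dBc/Hz, 1 mW, 1 GHz, 1 MHz offset) are illustrative assumptions, not data from this work.

```python
import math

def fom_vco_db(l_vco_dbc_hz, p_vco_mw, f_vco_hz, f_m_hz):
    """FOM_VCO per (2.4), with the phase noise already given in dBc/Hz."""
    return (l_vco_dbc_hz
            + 20 * math.log10(f_m_hz / f_vco_hz)
            + 10 * math.log10(p_vco_mw))

def vco_phase_noise_dbc_hz(fom_db, p_vco_mw, f_vco_hz, f_m_hz):
    """VCO phase noise per (2.5), returned in dBc/Hz."""
    return (fom_db
            - 10 * math.log10(p_vco_mw)
            + 20 * math.log10(f_vco_hz / f_m_hz))

# Assumed example: FOM_VCO = -180 dBc/Hz, 1 mW, 1 GHz carrier, 1 MHz offset.
l_example = vco_phase_noise_dbc_hz(-180.0, 1.0, 1e9, 1e6)
print(l_example)
```

With these assumed numbers the phase noise is -180 + 60 = -120 dBc/Hz at 1 MHz offset. Each tenfold increase in $P_{VCO}$ lowers it by 10 dB.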

#### 2.4 Loop Phase Noise and Benchmarking

In [18], Banerjee found that the classical PLL (in-band) loop phase noise is related to N and the phase detector frequency  $f_{PD}$  as

$$\mathcal{L}_{loop} \propto N^2 \cdot f_{PD} \,. \tag{2.6}$$

To eliminate this dependence, he proposed a normalized phase noise floor $PN_{1Hz}$ to benchmark the quality of a loop design, defined as

$$PN_{1Hz} = \frac{\mathcal{L}_{loop}}{N^2 \cdot f_{PD}}.$$
(2.7)

<sup>2</sup> For example, if the 1/f corner frequency is 100 kHz and the loop bandwidth is 1 MHz, and jitter is integrated over a wide region of [1 kHz, 100 MHz], calculation shows that 1/f noise contributes only about 10% of the total jitter.

<sup>3</sup> Sometimes the negative of (2.4), i.e., a $FOM_{VCO}$ with plus sign is used [11], but this leads to very strange units for $FOM_{VCO}$.

Figure 2.3. Schematic of (a) 3-state PFD/CP; (b) divider with synchronization.

The Banerjee model was applied to a wide range of industrial PLL ICs and was supported by measurement results [18]. However, the theoretical basis for (2.6) is not clear in [18]. Moreover, (2.6) does not take into account the power consumption, while phase noise performance is known to be strongly related to power consumption. The analysis below addresses these issues.

To analyze the loop phase noise, we assume the popular 3-state phase-frequency-detector (PFD) and CP combination as shown in Fig. 2.3(a) is used. For the divider design, synchronization is often used in low noise designs [19], as shown in Fig. 2.3(b). The only noise source of the divider is then the retiming D flip-flop (DFF). The divide-by-N block only acts as an edge selector and does *not* contribute to noise. Its power consumption can thus be progressively scaled down [19]. As we aim to model the power needed to meet a certain phase-noise/jitter requirement, we will ignore the divide-by-N block hereafter and only model the power of the retiming DFF<sup>4</sup> in the divider.

<sup>4</sup> There can be occasions where the power of the divide-by-*N* block becomes significant, e.g. in order to make it fast enough to cover very high VCO frequencies. However, this is not because of jitter or noise requirements.

#### 2.4.1 Phase Noise due to the Reference Path, Divider and PD

Among the loop noise sources, $S_{\phi ref,n}$, $S_{\phi div,n}$ and $S_{\phi PD,n}$ are caused by circuits like the reference buffer, divider retiming DFF and the 3-state PFD, which all (effectively) run at frequency $f_{PD}$. These circuits all respond to zero-crossings at their inputs by producing zero-crossings at their outputs. This time discrete behavior causes sampling of the phase of the output signal at the operation frequency $f_{PD}$. The sampling process folds back any noise component at frequency higher than $f_{PD}/2$ and the phase noise spectrum is thus defined in the Nyquist band between 0 and $f_{PD}/2$. With the white noise assumption, the output phase noise of the circuit is then related to the absolute output jitter $\sigma_t$ as [19]

$$S_{\phi,n} = 8\pi^2 \cdot f_{PD} \cdot \sigma_t^2. \tag{2.8}$$

The output jitter of circuits like DFFs or inverters is related to the output noise voltage  $\overline{v_n^2}$  and the slew rate  $SR_{out}$  of the output voltage at its zero crossing as [19], [20]

$$\sigma_t^2 = \frac{\overline{v_n^2}}{SR_{out}^2} = \frac{F_n \cdot kT / C_{out}}{SR_{out}^2}$$
(2.9)

where  $F_n$  is the noise factor and  $C_{out}$  is the capacitance at the output node.

Assuming the minimum power needed is the dynamic power, the circuit power consumption P can be calculated as

$$P \approx P_{dynamic} = f_{PD} \cdot C_{tot} \cdot V_{dd}^2 \tag{2.10}$$

where  $C_{tot}$  is the total capacitance of the circuit.

Combining (2.9) and (2.10), we get

$$\sigma_t^2 = \frac{f_{PD}}{P} \cdot \left\{ \frac{F_n \cdot kT \cdot V_{dd}^2 \cdot C_{tot} / C_{out}}{SR_{out}^2} \right\}.$$
(2.11)

In order to minimize the output jitter, designers can optimize the circuit by choosing the relative sizes of components, e.g. to maximize $SR_{out}$. Once this optimization has been done, jitter can always be reduced on system level via admittance level scaling [21]. Admittance level scaling puts *n* identical circuits in parallel. As a result, power consumption is *n* times higher and $\overline{v_n^2}$ is *n* times lower while the voltage slope at every node does not change [21]. Thus $C_{tot}/C_{out}$, $F_n$ as well as $SR_{out}$ remain the same as all nodes' admittances scale together. Therefore, on the system level, we can treat the bracketed part in (2.11) as a design dependent constant<sup>5</sup> and we get

$$\sigma_t^2 \propto f_{PD} / P \,. \tag{2.12}$$

For loop noise  $S_{\phi ref,n}$ ,  $S_{\phi div,n}$  and  $S_{\phi PD,n}$ , we can conclude with (2.8) and (2.12) that

$$S_{\phi ref,n} \propto f_{PD}^2 / P_{ref}; \tag{2.13}$$

$$S_{\phi div,n} \propto f_{PD}^2 / P_{div}; \tag{2.14}$$

$$S_{\phi PD,n} \propto f_{PD}^2 / P_{PD}, \tag{2.15}$$

where $P_{ref}$, $P_{div}$ and $P_{PD}$ are respectively the power consumption of the reference buffer, divider and PD.

<sup>5</sup> It is assumed (like in [19], [20]) that $SR_{out}$ is independent of input rise time and frequency (e.g. the inputs are high-slope signals or square-waves).
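The scaling argument of (2.8) and (2.12) can be illustrated with a few lines of code. The sketch below is a minimal example under stated assumptions: the starting point (100 fs rms jitter at 1 mW and $f_{PD}$ = 50 MHz) and the function names are hypothetical, chosen only to show the trend.

```python
import math

def phase_noise_psd(f_pd_hz, sigma_t_s):
    """Phase noise PSD of a sampled circuit from its absolute jitter, per (2.8), in rad^2/Hz."""
    return 8 * math.pi ** 2 * f_pd_hz * sigma_t_s ** 2

def admittance_scale(sigma_t_s, power_w, n):
    """Admittance level scaling with n parallel circuits: power rises n times and the
    jitter variance drops n times, so the rms jitter drops by sqrt(n), per (2.12)."""
    return sigma_t_s / math.sqrt(n), power_w * n

# Assumed starting point: 100 fs rms jitter at 1 mW, f_PD = 50 MHz.
f_pd = 50e6
sigma_1, p_1 = 100e-15, 1e-3
sigma_4, p_4 = admittance_scale(sigma_1, p_1, 4)

s_1 = phase_noise_psd(f_pd, sigma_1)
s_4 = phase_noise_psd(f_pd, sigma_4)
improvement_db = 10 * math.log10(s_1 / s_4)
print(sigma_4, p_4, improvement_db)
```

In this example, spending four times the power halves the rms jitter and lowers the phase noise by about 6 dB at a fixed $f_{PD}$, which is the inverse proportionality to power expressed in (2.13)-(2.15).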

#### 2.4.2 Phase Noise due to the CP

Different from the circuits in Section 2.4.1, the CP outputs current/charge instead of zero-crossing moments. Assuming for simplicity that the CP up- and down-current sources have the same properties, the power spectral density of the (thermal) noise current generated by the CP is

$$S_{i,n} = 2 \times 4kT \gamma \cdot g_{m,CP} = 8kT \gamma \cdot (\alpha I_{CP} / V_{eff,CP})$$
(2.16)

where  $\gamma$  and  $V_{eff,CP}$  are respectively the noise factor and effective gate voltage of the transistors in the current sources,  $I_{CP}$  is the CP current,  $\alpha$  is the transistor model parameter which is equal to 2 for the square-law model, and  $\alpha I/V_{eff}$  represents the transconductance  $g_m$ .

In steady state, the CP is switched on only for a fraction of time  $\tau_{PD}$  of each period  $T_{PD}$  to avoid the dead zone. The equivalent CP (thermal) noise current can be calculated as [8]

$$S_{iCP,n} = S_{i,n} \cdot (\tau_{PD} / T_{PD}).$$
(2.17)

The minimum power needed by a CP is related to the charge delivered in steady state:

$$P_{CP} = I_{CP} V_{dd} \cdot (\tau_{PD} / T_{PD}) = I_{CP} V_{dd} \tau_{PD} \cdot f_{PD} .$$
(2.18)

For a 3-state PFD/CP, it is well known that  $K_d = I_{CP}/2\pi$ . With (2.16)-(2.18) and some manipulations, we get

$$\frac{S_{iCP,n}}{K_d^2} = \frac{f_{PD}^2}{P_{CP}} \cdot \{\tau_{PD}^2 \cdot \frac{32\pi^2 \alpha \gamma \cdot kT \cdot V_{dd}}{V_{eff,CP}}\} \propto \frac{f_{PD}^2}{P_{CP}}$$
(2.19)

where the bracketed part is treated as a design and process dependent constant.
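The manipulations leading to (2.19) can be spelled out as follows, a short sketch that uses only (2.16)-(2.18) and $K_d = I_{CP}/2\pi$. Inserting (2.16) into (2.17) with $\tau_{PD}/T_{PD} = \tau_{PD} \cdot f_{PD}$ and dividing by $K_d^2$ gives

$$\frac{S_{iCP,n}}{K_d^2} = \frac{8kT\gamma \cdot (\alpha I_{CP} / V_{eff,CP}) \cdot \tau_{PD} f_{PD}}{I_{CP}^2 / 4\pi^2} = \frac{32\pi^2 \alpha \gamma \cdot kT \cdot \tau_{PD} f_{PD}}{V_{eff,CP} \cdot I_{CP}}.$$

Solving (2.18) for the CP current, $I_{CP} = P_{CP} / (V_{dd} \tau_{PD} f_{PD})$, and substituting it into the expression above yields

$$\frac{S_{iCP,n}}{K_d^2} = \frac{32\pi^2 \alpha \gamma \cdot kT \cdot V_{dd} \cdot \tau_{PD}^2 f_{PD}^2}{V_{eff,CP} \cdot P_{CP}},$$

which is (2.19) with the terms regrouped.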

#### 2.4.3 Loop Phase Noise Benchmarking

The overall power consumption of the PLL loop $P_{loop}$ is the sum of $P_{ref}$, $P_{div}$, $P_{PD}$ and $P_{CP}$. There should be an optimal way to distribute the total loop-components power $P_{loop}$ over the different blocks. Once this optimization has been done, $P_{ref}$, $P_{div}$, $P_{PD}$ and $P_{CP}$ each remain a constant portion of $P_{loop}$ when admittance level scaling is applied to the whole loop.

With this assumption, we can derive from (2.13-2.15) and (2.19) that:

$$S_{\phi ref,n} \propto f_{PD}^2 / P_{loop}; \qquad (2.20)$$

$$S_{\phi div,n} \propto f_{PD}^2 / P_{loop}; \qquad (2.21)$$

$$S_{\phi PD,n} \propto f_{PD}^2 / P_{loop}; \qquad (2.22)$$

$$\frac{S_{iCP,n}}{K_d^2} \propto f_{PD}^2 / P_{loop}$$
(2.23)

Based on (2.2), and (2.20-2.23) we can conclude that

$$\mathcal{L}_{loop} \propto N^2 \cdot f_{PD} \cdot \frac{f_{PD}}{P_{loop}} = \frac{f_{out}^2}{P_{loop}}.$$
(2.24)

Note that we assumed dynamic power consumption, i.e.  $P_{loop}$  scales with  $f_{PD}$ , so (2.24) shows the same proportionality as the Banerjee model in (2.6). In addition to (2.6), (2.24) also takes into account the power dissipation. For a given  $f_{out}$ , using a larger  $f_{PD}$  reduces the (in-band) loop phase noise but also increases the power consumption.

Based on (2.24), we propose to define a benchmark FOM for PLL loop designs as

$$FOM_{loop} = 10\log[\mathcal{L}_{loop} \cdot (\frac{1\text{Hz}}{f_{out}})^2 \cdot \frac{P_{loop}}{1\text{mW}}]$$
(2.25)

where $f_{out}$ and $P_{loop}$ are normalized to 1 Hz and 1 mW respectively so that the unit of $FOM_{loop}$ is again dBc/Hz, the same as for $FOM_{VCO}$. The normalization to 1 mW is practical and similar to what is used for $FOM_{VCO}$, as the power consumption of circuits is typically expressed in mW, while signal power in RF circuits is often expressed in decibel milliwatts, again with 1 mW as reference [21]. A smaller $FOM_{loop}$ (more negative values in dBc/Hz) again corresponds to a better loop design. The loop phase noise can now be expressed with $FOM_{loop}$ as

$$\mathcal{L}_{loop} = 10^{FOM_{loop}/10} \cdot \left(\frac{f_{out}}{1\text{Hz}}\right)^2 \cdot \frac{1\text{mW}}{P_{loop}}.$$
(2.26)
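A short numerical illustration of (2.25) and (2.26) is sketched below. The function names and the numbers (-100 dBc/Hz in-band phase noise at 1 GHz output with 10 mW loop power) are assumptions picked for round arithmetic, not measured results.

```python
import math

def fom_loop_db(l_loop_dbc_hz, f_out_hz, p_loop_mw):
    """FOM_loop per (2.25), with the in-band phase noise given in dBc/Hz."""
    return (l_loop_dbc_hz
            - 20 * math.log10(f_out_hz)   # normalize f_out to 1 Hz
            + 10 * math.log10(p_loop_mw)) # normalize P_loop to 1 mW

def loop_phase_noise_dbc_hz(fom_db, f_out_hz, p_loop_mw):
    """In-band loop phase noise per (2.26), returned in dBc/Hz."""
    return (fom_db
            + 20 * math.log10(f_out_hz)
            - 10 * math.log10(p_loop_mw))

# Assumed example: -100 dBc/Hz in-band noise, 1 GHz output, 10 mW loop power.
fom_example = fom_loop_db(-100.0, 1e9, 10.0)
print(fom_example)
```

With these assumed numbers, $FOM_{loop}$ = -100 - 180 + 10 = -270 dBc/Hz. For a fixed $FOM_{loop}$ and $f_{out}$, doubling $P_{loop}$ lowers the in-band phase noise by about 3 dB.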

#### 2.5 PLL Jitter and Benchmarking

#### 2.5.1 PLL Output Jitter

Jitter can be characterized in several different ways [5]. This work chooses to use absolute jitter as it is often used in PLL design literature. The relation with other jitter measures can be found in [5]. The variance of the long term PLL absolute jitter is related to the phase noise as

$$\sigma_{t,PLL}^{2} = \frac{2\int_{0}^{\infty} \mathcal{L}_{PLL}(f_{m})df_{m}}{(2\pi f_{out})^{2}} = \frac{1}{2\pi^{2}f_{out}^{2}} \cdot \int_{0}^{\infty} \mathcal{L}_{PLL}(f_{m})df_{m}.$$
(2.27)

The PLL output jitter variance  $\sigma_{t,PLL}^2$  is the sum of the jitter variance caused by the VCO  $\sigma_{t,VCO}^2$  and the loop  $\sigma_{t,loop}^2$ . The jitter variance due to the VCO can be calculated as

$$\sigma_{t,VCO}^{2} = \frac{1}{2\pi^{2} f_{out}^{2}} \cdot \int_{0}^{\infty} \mathcal{L}_{VCO}(f_{m}) \cdot |H_{VCO}(j2\pi f_{m})|^{2} df_{m}.$$
(2.28)

The value of (2.28) is dependent on the bandwidth and shape (related to phase margin) of $H_{VCO}(s)$. Assuming a given open loop transfer function $G_0(s)$ which results in a closed loop transfer function $H_{VCO,0}(s)$ with a 3-dB bandwidth of $f_{c,0}$, scaling the bandwidth to $f_c$ while keeping the same shape (and thus the same phase margin) results in a new transfer function [22]:

$$H_{VCO}(s) = H_{VCO,0}(s \cdot \frac{f_{c,0}}{f_c}).$$
(2.29)

Substituting (2.29) into (2.28) yields

$$\sigma_{t,VCO}^{2} = \frac{1}{2\pi^{2} f_{out}^{2}} \cdot \int_{0}^{\infty} \mathcal{L}_{VCO}(f_{m}) \cdot |H_{VCO,0}(j2\pi f_{m} \cdot \frac{f_{c,0}}{f_{c}})|^{2} df_{m}.$$
(2.30)

Since the VCO phase noise has a  $1/f^2$  shape, it can also be expressed as

$$\mathcal{L}_{VCO}(f_m) = \frac{\mathcal{L}_{VCO}(f_r) \cdot f_r^2}{f_m^2}$$
(2.31)

where  $\mathcal{L}_{VCO}(f_r)$  is the VCO phase noise measured at a certain offset frequency  $f_r$ . We can then re-write (2.30) as

$$\begin{aligned} \sigma_{t,VCO}^{2} &= \frac{\mathcal{L}_{VCO}(f_{r}) \cdot f_{r}^{2}}{2\pi^{2} f_{out}^{2}} \cdot \int_{0}^{\infty} |H_{VCO,0}(j2\pi f_{m} \cdot \frac{f_{c,0}}{f_{c}})|^{2} \frac{df_{m}}{f_{m}^{2}} \\ &= \frac{\mathcal{L}_{VCO}(f_{r}) \cdot f_{r}^{2}}{2\pi^{2} f_{out}^{2}} \cdot \frac{f_{c,0}}{f_{c}} \cdot \int_{0}^{\infty} |H_{VCO,0}(j2\pi f)|^{2} \frac{df}{f^{2}}. \end{aligned}$$
(2.32)

Substituting (2.1) into (2.32) and using  $s=j2\pi f$  yields

$$\sigma_{t,VCO}^{2} = \frac{f_{c,0}}{f_{c}} \cdot \frac{2\mathcal{L}_{VCO}(f_{r}) \cdot f_{r}^{2}}{f_{out}^{2}} \cdot \int_{0}^{\infty} |\frac{1}{s \cdot [1 + G_{0}(s)]}|^{2} df.$$
(2.33)

Using similar analysis as for the VCO, the PLL output jitter variance due to the loop can be calculated as

$$\sigma_{t,loop}^{2} = \frac{f_{c}}{f_{c,0}} \cdot \frac{\mathcal{L}_{loop}}{2\pi^{2} f_{out}^{2}} \cdot \int_{0}^{\infty} |\frac{G_{0}(s)}{1 + G_{0}(s)}|^{2} df.$$
(2.34)

Therefore, the overall PLL output jitter can be calculated with (2.33) and (2.34) as

$$\sigma_{t,PLL}^{2} = \sigma_{t,VCO}^{2} + \sigma_{t,loop}^{2} = \frac{f_{c,0}}{f_{c}} \cdot \frac{2\mathcal{L}_{VCO}(f_{r}) \cdot f_{r}^{2}}{f_{out}^{2}} \cdot \int_{0}^{\infty} \left|\frac{1}{s \cdot [1 + G_{0}(s)]}\right|^{2} df + \frac{f_{c}}{f_{c,0}} \cdot \frac{\mathcal{L}_{loop}}{2\pi^{2} f_{out}^{2}} \cdot \int_{0}^{\infty} \left|\frac{G_{0}(s)}{1 + G_{0}(s)}\right|^{2} df \,.$$
(2.35)

#### 2.5.2 PLL Jitter Optimization

It is clear from (2.33) and (2.34) that a larger value of  $f_c$  will lower the output jitter due to the VCO while raising the jitter contribution of the loop. The optimum PLL bandwidth  $f_{c,opt}$  which gives the minimum PLL output jitter is calculated with (2.35) as

$$f_{c,opt} = \sqrt{\frac{\mathcal{L}_{VCO}(f_r) \cdot f_r^2}{\mathcal{L}_{loop}}} \cdot 2\pi \cdot \sqrt{f_{c,0}^2 \cdot \frac{\int_0^\infty |\frac{1}{s \cdot [1 + G_0(s)]}|^2 df}{\int_0^\infty |\frac{G_0(s)}{1 + G_0(s)}|^2 df}}$$
(2.36)

Substituting (2.36) into (2.31) yields:

$$\mathcal{L}_{VCO}(f_{c,opt}) = \mathcal{L}_{loop} \cdot \frac{1}{4\pi^2 \cdot f_{c,0}^2} \cdot \frac{\int_0^\infty |\frac{G_0(s)}{1+G_0(s)}|^2 df}{\int_0^\infty |\frac{1}{s \cdot [1+G_0(s)]}|^2 df}$$
(2.37)

In (2.37), the results of the integrations are related to the design of the loop filter and the phase margin of the loop transfer function. In a second-order type-II PLL with a simple RC filter, a large phase margin is preferred for less jitter peaking [7]. Since a second-order PLL with a large phase margin (i.e. an over-damped second-order PLL) can be approximated with a first-order loop [7], we can re-write (2.37) as:

$$\mathcal{L}_{VCO}(f_{c,opt}) \approx \mathcal{L}_{loop} \cdot \frac{1}{4\pi^2 \cdot f_{c,0}^2} \cdot \frac{\int_0^\infty \left|\frac{2\pi f_{c,0} / s}{1 + 2\pi f_{c,0} / s}\right|^2 df}{\int_0^\infty \left|\frac{1}{s \cdot [1 + 2\pi f_{c,0} / s]}\right|^2 df} = \mathcal{L}_{loop}$$
(2.38)

which means that  $f_{c,opt}$  is approximately where the spectra of the VCO noise and the loop noise intersect. This conclusion is the same as the one drawn in [4].
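The first-order result in (2.38) can be checked numerically. The sketch below is only an illustration: it assumes the first-order loop $G_0(s) = 2\pi f_{c,0}/s$, maps the infinite integration range onto a finite one with the substitution $f = f_{c,0}\tan\theta$, and uses a plain midpoint rule. The function name `loop_integrals` and the 1 MHz bandwidth are hypothetical choices, not values from this work.

```python
import math

def loop_integrals(fc0, n=20000):
    """Midpoint-rule estimates of the two integrals in (2.37) for G0(s) = 2*pi*fc0/s.

    The substitution f = fc0 * tan(theta) maps [0, inf) onto [0, pi/2).
    Returns (i_vco, i_loop):
      i_vco  = integral of |1 / (s * (1 + G0))|^2 df
      i_loop = integral of |G0 / (1 + G0)|^2 df
    """
    wc = 2 * math.pi * fc0
    h = (math.pi / 2) / n
    i_vco = 0.0
    i_loop = 0.0
    for k in range(n):
        theta = (k + 0.5) * h
        f = fc0 * math.tan(theta)
        df = fc0 / math.cos(theta) ** 2 * h   # df = fc0 * sec^2(theta) * d(theta)
        s = 1j * 2 * math.pi * f
        g0 = wc / s
        i_vco += abs(1 / (s * (1 + g0))) ** 2 * df
        i_loop += abs(g0 / (1 + g0)) ** 2 * df
    return i_vco, i_loop

fc0 = 1e6  # hypothetical 1 MHz closed-loop bandwidth
i_vco, i_loop = loop_integrals(fc0)

# Right-hand side of (2.37) divided by L_loop: should be close to 1.
ratio = i_loop / i_vco / (4 * math.pi ** 2 * fc0 ** 2)
# Square root of the product of the two integrals, the shape factor in (2.41).
shape = math.sqrt(i_vco * i_loop)
print(ratio, shape)
```

Under these assumptions the ratio comes out close to 1, which is the statement $\mathcal{L}_{VCO}(f_{c,opt}) \approx \mathcal{L}_{loop}$, and the shape factor comes out close to 0.25, independent of the chosen bandwidth.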

The jitter variance due to the VCO and loop at  $f_{c,opt}$  can be calculated by substituting (2.36) into (2.33) and (2.34):

$$\sigma_{t,VCO,opt}^{2} = \frac{\sqrt{\mathcal{L}_{loop}\mathcal{L}_{VCO}(f_{r}) \cdot f_{r}^{2}} \cdot \sqrt{\int_{0}^{\infty} |\frac{G_{0}(s)}{1 + G_{0}(s)}|^{2} df \cdot \int_{0}^{\infty} |\frac{1}{s \cdot [1 + G_{0}(s)]}|^{2} df}}{\pi \cdot f_{out}^{2}}, \qquad (2.39)$$

$$\sigma_{t,loop,opt}^{2} = \frac{\sqrt{\mathcal{L}_{loop}\mathcal{L}_{VCO}(f_{r}) \cdot f_{r}^{2}} \cdot \sqrt{\int_{0}^{\infty} |\frac{1}{s \cdot [1 + G_{0}(s)]}|^{2} df \cdot \int_{0}^{\infty} |\frac{G_{0}(s)}{1 + G_{0}(s)}|^{2} df}}{\pi \cdot f_{out}^{2}}.$$
 (2.40)

We get  $\sigma_{t,VCO,opt}^2 = \sigma_{t,loop,opt}^2$ , meaning that the VCO and the loop-components contribute equal jitter in an optimized PLL design.

Given  $f_{c,opt}$  in (2.36), the minimum PLL output jitter variance  $\sigma^2_{t,PLL,min}$  is calculated as:

$$\sigma_{t,PLL,\min}^{2} = \frac{1}{\sqrt{P_{loop} \cdot P_{VCO}}} \cdot 10^{\frac{FOM_{loop} + FOM_{VCO}}{20}} \cdot \frac{2}{\pi} \cdot \sqrt{\int_{0}^{\infty} \left|\frac{G_{0}(s)}{1 + G_{0}(s)}\right|^{2} df \cdot \int_{0}^{\infty} \left|\frac{1}{s \cdot [1 + G_{0}(s)]}\right|^{2} df} \cdot \frac{1\text{mW}}{1\text{Hz}}$$
(2.41)

where the VCO and loop phase noise in (2.35) has been represented with  $FOM_{VCO}$  and  $FOM_{loop}$  using (2.5) and (2.26).

For a fixed PLL power budget  $P_{PLL} = P_{loop} + P_{VCO}$ , it is easy to show that the minimum value of (2.41) occurs when  $P_{loop} = P_{VCO} = P_{PLL}/2$ , when the other conditions are kept the same. This means that *the VCO and the loop components consume equal power in an optimized PLL design*. Under this condition, the minimum PLL jitter variance in (2.41) can be re-written as

$$\sigma_{t,PLL,\min}^{2} = \frac{1}{P_{PLL}} \cdot \left\{ 10^{\frac{FOM_{loop} + FOM_{VCO}}{20}} \cdot \frac{4}{\pi} \cdot \frac{1\text{mW}}{1\text{Hz}} \right\} \cdot \left\{ \sqrt{\int_{0}^{\infty} \left|\frac{G_{0}(s)}{1 + G_{0}(s)}\right|^{2} df \cdot \int_{0}^{\infty} \left|\frac{1}{s \cdot [1 + G_{0}(s)]}\right|^{2} df} \right\}.$$
(2.42)

Figure 2.4. Variations of PLL output jitter when (a) PLL bandwidth is not optimal; (b)  $P_{loop} \neq P_{VCO}$  for a given PLL power budget.

It should be noted that the optimal PLL bandwidth for minimum jitter may not meet the stability or locking time requirements, and spending equal power on the loop and the VCO may also pose practical difficulties. However, these conditions are still the theoretical optimum under the stated assumptions and give designers direction for PLL jitter and power optimization. From a practical point of view, it is useful to know how sensitive the optimum is to parameter variations and how much the PLL jitter increases when the optimum condition is not met. Fig. 2.4(a) plots the relative change in PLL output jitter when  $f_c$  deviates from  $f_{c,opt}$ , and Fig. 2.4(b) shows what happens when the VCO and loop do not consume equal power for a given PLL power budget. We see that setting the PLL bandwidth two times larger/smaller than the optimum, or spending 4 times more/less power on the VCO than on the loop instead of making them equal, increases the output jitter by less than 12%. Therefore, we can conclude that the optimum is relatively flat.
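The flatness of the optimum can be reproduced from (2.35) and (2.41) with a few lines of arithmetic. The sketch below assumes that (2.35) has the form $A/f_c + B \cdot f_c$, so that the variance at $f_c = x \cdot f_{c,opt}$ is $(x + 1/x)/2$ times its minimum, and that (2.41) scales as $1/\sqrt{P_{loop} P_{VCO}}$ for a fixed total power. The function names are hypothetical.

```python
import math

def variance_penalty_bandwidth(x):
    """Ratio sigma^2(f_c) / sigma^2_min for f_c = x * f_c_opt, following (2.35)."""
    return 0.5 * (x + 1.0 / x)

def variance_penalty_power(r):
    """Ratio of (2.41) at P_VCO = r * P_loop to its value at P_VCO = P_loop,
    with the total power P_loop + P_VCO held constant."""
    return (1.0 + r) / (2.0 * math.sqrt(r))

cases = (
    ("f_c = 2 * f_c_opt", variance_penalty_bandwidth(2.0)),
    ("f_c = f_c_opt / 2", variance_penalty_bandwidth(0.5)),
    ("P_VCO = 4 * P_loop", variance_penalty_power(4.0)),
    ("P_VCO = P_loop / 4", variance_penalty_power(0.25)),
)
for label, penalty in cases:
    rms_increase = math.sqrt(penalty) - 1.0   # relative increase in rms jitter
    print(f"{label}: variance x{penalty:.3f}, rms jitter +{100 * rms_increase:.1f}%")
```

In each of the four cases the jitter variance grows by a factor of 1.25, which corresponds to roughly 11.8% more rms jitter and is consistent with the "less than 12%" read from Fig. 2.4.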

### 2.5.3 PLL Benchmarking

In (2.42), the first bracketed part is a constant determined by the quality of the VCO and loop design. The value of the second bracketed part, the integration, is related to the phase margin of the loop transfer function. In an over-damped second-order PLL (for small jitter peaking), the result of the integration is about 0.25 and we get:

$$\sigma_{t,PLL,\min}^{2} = \frac{1}{P_{PLL}} \cdot 10^{\frac{FOM_{loop} + FOM_{VCO}}{20}} \cdot \frac{1}{\pi} \cdot \frac{1 \text{mW}}{1 \text{Hz}}.$$
(2.43)

When the integration part in (2.42) is treated as a (PLL type and order dependent) constant, we can conclude that

$$\sigma_{t,PLL,\min}^2 \propto 1/P_{PLL}.$$
(2.44)



Figure 2.5. ISSCC low jitter PLL designs (Year\_PaperNumber).

We see that when a PLL design is optimized, i.e., when (2.42) holds (equal loop and VCO power, and optimal PLL bandwidth), the minimum PLL jitter is *independent* of  $f_{PD}$  and  $f_{out}$ , given a fixed PLL power budget. Note that for a higher  $f_{out}$ , the loop and VCO phase noise is higher according to (2.5) and (2.26). However, the output clock period is smaller with a higher  $f_{out}$ . When phase noise is converted to jitter using (2.27), these two factors cancel out. A similar observation was also made in [23]. Based on (2.44), we define a PLL benchmark FOM as

$$FOM_{PLL} = 10\log\left[\left(\frac{\sigma_{t,PLL}}{1\text{s}}\right)^2 \cdot \frac{P_{PLL}}{1\text{mW}}\right]$$
(2.45)

The unit of  $FOM_{PLL}$  is dB. A smaller  $FOM_{PLL}$  corresponds to a better PLL design.
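As a small illustration of (2.45), the sketch below evaluates $FOM_{PLL}$ for two design points. The function name `fom_pll_db` and both design points are hypothetical and are not taken from Fig. 2.5.

```python
import math

def fom_pll_db(sigma_t_s, p_pll_mw):
    """FOM_PLL of (2.45): rms jitter in seconds, PLL power in mW, result in dB."""
    return 10.0 * math.log10((sigma_t_s / 1.0) ** 2 * (p_pll_mw / 1.0))

# Hypothetical design points.
fom_a = fom_pll_db(1.0e-12, 10.0)   # 1 ps rms jitter at 10 mW
fom_b = fom_pll_db(0.5e-12, 40.0)   # half the jitter at four times the power
print(fom_a, fom_b)
```

Both points evaluate to about -230 dB: halving the rms jitter at the cost of four times the power moves a design along a line of constant $FOM_{PLL}$, which is the trade-off expressed by (2.44).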

Comparing (2.42) and (2.45), we can see that

$$FOM_{PLL} \propto FOM_{loop} + FOM_{VCO}$$
 (2.46)

Therefore, the design qualities of the loop and the VCO are equally important. This is intuitive since the loop and the VCO have equal contribution to both power and jitter in an optimized PLL design.

With the defined PLL FOM, different PLL designs can be compared using a single number. Fig. 2.5 shows the performance of some PLL designs in recent years' International Solid State Circuits Conference (ISSCC) along with the  $FOM_{PLL}$  lines. We see that the  $FOM_{PLL}$  improves over the years, as we would expect for a conference that claims to present the state-of-the-art work. The state-of-the-art  $FOM_{PLL}$  is close to -240 dB.

# **2.6 Conclusion**

The phase noise and power consumption of the VCO and loop components in a classical PLL are analyzed. A benchmark FOM for loop designs ( $FOM_{loop}$ ) is proposed, complementary to the existing VCO FOM. The absolute PLL output jitter is calculated and an expression for the minimum jitter is derived. It is shown that, to minimize the output jitter for a given power budget, designers should aim at: 1) spending equal power on the loop and the VCO; and 2) setting the loop bandwidth such that the loop and the VCO contribute equally to the total jitter. In such an optimized PLL, the output jitter is independent of the reference frequency and output frequency for a given power budget. Based on these insights, a benchmark FOM for PLL designs ( $FOM_{PLL}$ ) is proposed. This  $FOM_{PLL}$  can be used to compare various PLL designs in applications where jitter and power are important. Moreover, system designers can use it to predict and trade off jitter and power during system level design.

# 2.7 References

- [1] V. F. Kroupa, *Frequency Synthesis: Theory, Design and Applications*. London, U.K.: Griffin, 1973.
- [2] J. A. Crawford, *Frequency Synthesizer Design Handbook*. Boston, MA: Artech House, 1994.
- [3] W. F. Egan, Frequency Synthesis by Phase Lock, 2nd ed., New York: Wiley, 1999.
- [4] C. S. Vaucher, Architectures for RF Frequency Synthesizers. Boston, MA: Kluwer, 2002.
- [5] D. C. Lee, "Analysis of jitter in phase-locked loops," *IEEE Trans. Circuits Syst. II*, vol. 49, pp. 704–711, Nov.2002.
- [6] R. C. H. van de Beek, E. Klumperink, C. S. Vaucher and B. Nauta, "Low-jitter clock multiplication: a comparison between PLLs and DLLs," *IEEE Trans. Circuits Syst. II*, vol. 49, pp. 555-566, Aug. 2002.
- [7] M. Mansuri and C. K. K. Yang, "Jitter optimization based on phase-locked loop design parameters," *IEEE J. Solid-State Circuits*, vol. 37, no. 11, pp. 1375-1382, Nov. 2002.
- [8] H. Arora, N. Klemmer, J. Morizio and P. Wolf, "Enhanced phase noise modeling of fractional-N frequency synthesizers," *IEEE Trans. Circuits Syst. I*, vol. 52, no.2, pp. 379-395, Feb. 2005.
- [9] R. H. Walden, "Analog-to-digital converter survey and analysis," *IEEE J. Sel. Areas Commun.*, vol. 17, no.4, pp. 539-550, Apr. 1999.

- [10] P. G. M. Baltus, A. G. Wagemans, R. Dekker, A. Hoogstraate, H. Maas, A. Tombeur and J. van Sinderen, "A 3.5-mW, 2.5-GHz diversity receiver and a 1.2-mW, 3.6-GHz VCO in silicon on anything," *IEEE J. Solid-State Circuits*, vol. 33, no. 12, pp. 2074-2080, Dec. 1998.
- [11] P. Kinget, "Integrated GHz voltage controlled oscillators," Analog Circuit Design: (X)DSL and Other Communication Systems; RF MOST Models; Integrated Filters and Oscillators, W. Sansen, J. Huijsing and R. van de Plassche, Ed. Boston, MA: Kluwer, pp. 353-381, 1999.
- [12] A. Demir, "Computing timing jitter from phase noise spectra for oscillators and phase-locked loops with white and 1/f noise," *IEEE Trans. Circuits Syst. I*, vol. 53, no. 9, pp. 1869-1884, Sep. 2006.
- [13] H. Rategh, H. Samavati and T. H. Lee, "A CMOS frequency synthesizer with an injection-locked frequency divider for a 5 GHz wireless LAN receiver," *IEEE J. Solid-State Circuits*, vol. 35, pp. 779-786, May 2000.
- [14] R. Nonis, N. Da Dalt, P. Palestri and L. Selmi, "Modeling, design and characterization of a new low-jitter analog dual tuning LC-VCO PLL architecture," *IEEE J. Solid-State Circuits*, vol. 40, pp. 1303-1309, Jun. 2005.
- [15] A. Hajimiri, S. Limotyrakis and T. H. Lee, "Jitter and phase noise in ring oscillators," *IEEE J. Solid-State Circuits*, vol. 34, pp. 790–804, Jun. 1999.
- [16] P. Andreani and A. Fard, "More on the 1/f<sup>2</sup> phase noise performance of CMOS differential-pair LC tank oscillators," *IEEE J. Solid-State Circuits*, vol. 41, no. 12, pp. 2703–2712, Dec. 2006.
- [17] E. Hegazi, H. Sjoland and A. A. Abidi, "A filtering technique to lower LC oscillator phase noise," *IEEE J. Solid-State Circuits*, vol. 36, pp.1921–1930, Dec. 2001.
- [18] D. Banerjee, PLL performance, simulation, and design, 4<sup>th</sup> edition, National Semiconductor, 2006. [on-line] Accessed on Mar. 20<sup>th</sup>, 2010. http://www.national.com/analog/timing/pll designbook
- [19] S. Levantino, L. Romano, S. Pellerano, C. Samori and A. L. Lacaita, "Phase noise in digital frequency dividers," *IEEE J. Solid-State Circuits*, vol. 39, no.5, pp. 775–784, May 2004.
- [20] A. A. Abidi, "Phase noise and jitter in CMOS ring oscillators", *IEEE J. Solid-State Circuits*, vol. 41, no.8, pp. 1803-1816, Aug. 2006.
- [21] E. Klumperink and B. Nauta, "Systematic comparison of HF CMOS transconductors," *IEEE Trans. Circuits Syst. II*, vol. 50, no.10, pp. 728-741, Oct. 2003.
- [22] R. C. H. van de Beek, "High-speed low-jitter clock multiplication in CMOS," PhD thesis, University of Twente, ISBN 90-365-1989-6, 2004.
- [23] X. Gao, E. Klumperink and B. Nauta, "Advantages of shift registers over DLLs for flexible low jitter multiphase clock generation," *IEEE Trans. Circuits Syst. II*, vol. 55, no.3, pp. 244-248, Mar. 2008.

# Chapter 3

# Low Jitter Multi-phase Clock Generation

### 3.1 Introduction

In a GHz PLL design aiming for very low jitter as well as low power, an LC oscillator is a better choice than a ring oscillator since the former is often orders of magnitude more power efficient due to the high quality factor of the LC tank. For a PLL with an LC VCO, the output clock often has a single or differential phase<sup>1</sup>. In some applications, clocks with more phases, i.e., multi-phase clocks, are needed. Multi-phase clocks are a group of *M* clocks with identical waveforms and a phase difference of  $2\pi/M$  between adjacent clocks. Fig. 3.1 shows an example timing diagram for *M* equal to 8. Multi-phase clocks are useful, e.g. in high speed serial links [1] to process data streams at a bit rate higher than the clock frequency, and in time-interleaved ADCs to achieve a high overall sample-rate while keeping the sub-ADC sample-rate low [2]. In wideband wireless communication systems, harmonic rejection mixers and multi-path poly-phase circuits need multi-phase clocking to reject unwanted harmonics and sidebands [3].

To generate multi-phase clocks from a single or differential clock, both delay-locked loops (DLLs) and shift registers (SRs) can be used. A SR multi-phase clock generator (MPCG) also functions as a divide-by-*M* divider for *M*-phase clock generation. It runs at *M* times higher frequency than the DLL MPCG and at first glance seems to consume more power. However, a SR MPCG doesn't have jitter accumulation from one clock phase to the other as in a DLL equivalent, which should be taken into account for a fair comparison. This chapter aims to make a solid comparison between these two MPCGs, primarily based on their jitter and power performance.

The rest of the chapter is arranged as follows. Section 3.2 describes the architecture of the DLL MPCG and analyses its jitter performance, while Section 3.3 addresses the SR MPCG. Section 3.4 makes a comparison between the two MPCGs and Section 3.5 verifies the analysis via simulation results. Section 3.6 presents conclusions.

<sup>1</sup> Quadrature LC VCOs or multi-stage ring type LC VCOs can provide 4 or more phases. However, they occupy a significant amount of chip area due to the use of multiple inductors.



Figure 3.1. Timing diagram of 8-phase clocks.



Figure 3.2. (a) DLL MPCG architecture (b) CML delay unit schematic.

# **3.2 DLL MPCG Jitter**

### 3.2.1 DLL MPCG Architecture

The architecture of a DLL MPCG is shown in Fig. 3.2(a). It consists of a voltage-controlled delay line (VCDL) which has *M* identical delay units (DUs) and a control loop consisting of a phase detector (PD), a charge pump (CP) and a loop filter (LF). In the DLL, a reference clock  $CLK_{ref}$ , generated by a VCO with a frequency of *f*, is propagated through the VCDL. The loop compares the phase of the last output of the VCDL with  $CLK_{ref}$  and controls the VCDL so that its total delay time is one reference clock period. Once locking is achieved, the *M* outputs  $CLK_{1}$ ~ $CLK_{M}$  are multi-phase clocks with  $2\pi/M$  phase spacing.

### **3.2.2 DLL MPCG Output Jitter**

The DLL MPCG output jitter can be divided into three parts: 1) jitter transferred from the reference clock, 2) jitter generated by the VCDL and 3) jitter from the PD/CP/LF control loop. The reference clock jitter is transferred to the DLL outputs with some jitter peaking [5], [6]. The DLL cannot decrease reference clock jitter, but jitter peaking can be made very small by choosing a low DLL loop bandwidth [5], [6]. For an optimal DLL design, the jitter contribution of the control loop is negligible [5] and hence ignored hereafter. Thus, VCDL jitter is our main worry.

In a DLL MPCG, the VCDL generates two types of jitter: random noise jitter caused by *thermal noise* and deterministic mismatch jitter due to *mismatch* of the DUs. The DLL renders no improvement of VCDL noise jitter. Again, the VCDL noise jitter is lowest for low values of the loop bandwidth, in which case it would be almost equal to that of a free-running VCDL [5]. The jitter will thus accumulate from one DU to the other. If the noise jitter variance of one DU is  $\sigma_{t,DU,noise}^2$ , and we assume uncorrelated white noise, the noise jitter variance on the output of the  $m^{th}$  delay unit will be *m* times bigger. For multi-phase clock applications like the software defined radio transmitter in [3], the jitter of every clock phase is equally relevant. To quantify the jitter of a set of *M*-phase clocks, the averaged jitter variance of the *M* clocks is a meaningful quantity. The average noise jitter variance generated by the DLL can be calculated as:

$$(\sigma_{t,DLL,noise}^2)_{avgM} = \frac{1}{M} \cdot \sum_{m=1}^M m \cdot \sigma_{t,DU,noise}^2 = \frac{M+1}{2} \sigma_{t,DU,noise}^2$$
(3.1)

Different from noise jitter, the DLL loop *can* improve the deterministic mismatch jitter. The start and end of the VCDL are both aligned to the reference clock and thus have zero deterministic timing error. The maximum mismatch jitter appears at the middle of the VCDL. If we define the mismatch jitter variance of one delay unit as  $\sigma_{t,DU,mis}^2$ , the jitter variance on the output of the *m*<sup>th</sup> delay unit can be calculated as [5]

$$\sigma_{t,DU_m,mis}^2 = \frac{m(M-m)}{M} \sigma_{t,DU,mis}^2 \,. \tag{3.2}$$

The average mismatch jitter variance generated is then:

$$(\sigma_{t,DLL,mis}^{2})_{avgM} = \frac{M^{2} - 1}{6M} \sigma_{t,DU,mis}^{2} \approx \frac{M}{6} \sigma_{t,DU,mis}^{2} \quad (M^{2} \gg 1).$$
(3.3)
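The closed forms in (3.1) and (3.3) follow from averaging the per-tap factors $m$ and $m(M-m)/M$ of (3.2) over the $M$ taps. The sketch below checks both averages with exact rational arithmetic; the function names and the chosen values of $M$ are only illustrative.

```python
from fractions import Fraction

def dll_avg_noise_factor(M):
    """Average of m over m = 1..M: the multiplier of sigma^2_DU,noise in (3.1)."""
    return Fraction(sum(range(1, M + 1)), M)

def dll_avg_mismatch_factor(M):
    """Average of m*(M-m)/M over m = 1..M: the multiplier of sigma^2_DU,mis in (3.3)."""
    return sum(Fraction(m * (M - m), M) for m in range(1, M + 1)) / M

for M in (4, 8, 16):
    # Compare the explicit averages with the closed forms (M+1)/2 and (M^2-1)/(6M).
    assert dll_avg_noise_factor(M) == Fraction(M + 1, 2)
    assert dll_avg_mismatch_factor(M) == Fraction(M * M - 1, 6 * M)
    print(M, dll_avg_noise_factor(M), dll_avg_mismatch_factor(M))
```

Both factors grow roughly linearly with $M$, which is why a DLL MPCG with many phases accumulates noticeably more average jitter per delay unit.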



Figure 3.3. (a) SR MPCG architecture (b) DFF block schematic.

# **3.3 SR MPCG Jitter**

#### 3.3.1 SR MPCG Architecture

The architecture of a SR MPCG, sometimes referred to as a ring counter, is shown in Fig. 3.3(a). It consists of a D flip-flop (DFF) chain with M identical DFFs. A reference clock  $CLK_{ref}$ , generated by a VCO with frequency  $M \cdot f$ , is fed into the DFF chain. A flip logic (FL) circuit monitors the M outputs of the DFF chain and flips the logic value at the D input of the first DFF twice every M reference clock cycles. In other words, the outputs of the DFF chain run at a frequency of f and the SR based MPCG also functions as a divide-by-M divider. Since a DFF is sensitive to rising or falling edges, the Q output of each DFF is delayed from the previous DFF's output by one reference clock period, so the M-phase clocks  $CLK_{1}$ ~ $CLK_{M}$  are generated. Depending on the implementation of the flip logic, the duty cycle of the M-phase clocks can theoretically vary from 1/M to (M-1)/M. For example, if 18-phase clocks with a 1/3 duty cycle are wanted, the flip logic can simply be a NOR-gate with  $CLK_6$  and  $CLK_{12}$  as its inputs [3]. This gives the SR based MPCG extra flexibility.

### 3.3.2 SR MPCG Output Jitter

The SR MPCG output jitter can be divided into two parts: jitter transferred from the reference clock and jitter generated by the DFF chain. The flip logic is simply a logical "enabler" for the first DFF and will not contribute to jitter.

For the jitter transferred from the reference clock, the SR MPCG renders no improvement. Any timing error at the reference clock will be transferred to the DFF chain outputs.

Similar to the VCDL, the DFF chain also generates two types of jitter: noise jitter and mismatch jitter. However, there is *no jitter accumulation* from one DFF to the other, since each DFF output only acts as an "enabler" for the next DFF, while the VCO defines the timing. A DFF can be designed with two master/slave latches as shown in Fig. 3.3(b). For a proper design, only the second latch contributes to jitter since the first is just an "enabler". If we define the noise and mismatch jitter variance of one latch as  $\sigma_{t,Latch,noise}^2$  and  $\sigma_{t,Latch,mis}^2$  respectively, the average jitter variance for the set of *M*-phase clocks generated by the SR can be easily calculated as

$$(\sigma_{t,SR,noise}^2)_{avgM} = \frac{1}{M} \cdot \sum_{m=1}^M \sigma_{t,Latch,noise}^2 = \sigma_{t,Latch,noise}^2, \qquad (3.4)$$

$$(\sigma_{t,SR,mis}^2)_{avgM} = \frac{1}{M} \cdot \sum_{m=1}^M \sigma_{t,Latch,mis}^2 = \sigma_{t,Latch,mis}^2 \,.$$
(3.5)

### 3.4 Comparison between DLL and SR MPCG Jitter

#### 3.4.1 Comparing Jitter Transferred from the Reference Clock

From the analysis above, we see that both the DLL and SR MPCGs render no improvement on the reference clock jitter. However, the SR MPCG needs a reference clock with M times higher frequency than the DLL. If both clocks are generated by a VCO, the VCO for the SR should work at M times higher frequency, raising the question of how this impacts power consumption. Assuming the VCO has a  $1/f^2$  power spectrum and its quality of design is adequately assessed via the often used VCO figure-of-merit  $FOM_{VCO}$  [7], the single-side-band phase noise to carrier ratio at an offset frequency  $f_m$  can be expressed as

$$\mathcal{L}_{VCO}(f_m) = \frac{10^{FOM_{VCO}/10}}{P_{VCO}/1\text{mW}} \cdot \frac{f_{VCO}^2}{f_m^2}$$
(3.6)

where  $f_{VCO}$  is the frequency and  $P_{VCO}$  is the power dissipation of the VCO. It is well-known that the variance for long term absolute jitter is related to the total area of its power spectrum, i.e. the reference clock jitter variance  $\sigma_{t,ref}^2$  becomes

$$\sigma_{t,ref}^{2} = \frac{2 \times \int_{f_{l}}^{f_{h}} \mathcal{L}_{VCO}(f_{m})\, df_{m}}{(2\pi f_{VCO})^{2}} = \frac{10^{FOM_{VCO}/10}}{2\pi^{2} \cdot P_{VCO}/1\text{mW}} \cdot (\frac{1}{f_{l}} - \frac{1}{f_{h}})$$
(3.7)

where  $[f_l, f_h]$  is the specified integration region. Equation (3.7) indicates that although the VCO in the SR MPCG runs at *M* times higher frequency, it outputs the same jitter, given the same power and the same quality of design. For an *LC* VCO, a higher working frequency may even be preferred, since it leads to a smaller inductor value which in turn requires less chip area [8]. On the other hand, there are limits to increasing the frequency, such as the self-resonant frequency of the inductor and the clock buffer power consumption.
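The frequency independence stated by (3.7) can be checked by integrating the phase noise of (3.6) numerically for two different VCO frequencies. The sketch below is an illustration under stated assumptions: $P_{VCO}$ is expressed in mW, the integration uses a midpoint rule on a logarithmic frequency grid, and the function names and numerical values ($FOM_{VCO}$ of -180 dB, 10 mW, integration from 1 kHz to 100 MHz) are hypothetical.

```python
import math

def vco_phase_noise(f_m, f_vco, fom_vco_db, p_vco_mw):
    """Phase noise of (3.6) in 1/Hz at offset f_m, with P_VCO given in mW."""
    return 10 ** (fom_vco_db / 10) / p_vco_mw * (f_vco / f_m) ** 2

def ref_jitter_variance(fom_vco_db, p_vco_mw, f_l, f_h):
    """Closed form of (3.7) in s^2, with P_VCO given in mW."""
    return 10 ** (fom_vco_db / 10) / (2 * math.pi ** 2 * p_vco_mw) * (1 / f_l - 1 / f_h)

def ref_jitter_variance_numeric(f_vco, fom_vco_db, p_vco_mw, f_l, f_h, n=20000):
    """Left-hand form of (3.7), integrating (3.6) on a logarithmic grid."""
    u_l, u_h = math.log(f_l), math.log(f_h)
    h = (u_h - u_l) / n
    area = 0.0
    for k in range(n):
        f_m = math.exp(u_l + (k + 0.5) * h)
        # With u = ln(f_m), df_m = f_m * du.
        area += vco_phase_noise(f_m, f_vco, fom_vco_db, p_vco_mw) * f_m * h
    return 2 * area / (2 * math.pi * f_vco) ** 2

closed = ref_jitter_variance(-180.0, 10.0, 1e3, 1e8)
for f_vco in (1e9, 8e9):   # e.g. a DLL reference and an 8x faster SR reference
    numeric = ref_jitter_variance_numeric(f_vco, -180.0, 10.0, 1e3, 1e8)
    print(f_vco, numeric, closed)
```

Under these assumptions both VCO frequencies give the same jitter variance as the closed form, since the $f_{VCO}^2$ in the phase noise cancels against the $f_{VCO}^2$ in the conversion to time.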

In most practical designs, the VCO will be part of a PLL where it is locked to a low frequency crystal oscillator. From Chapter 2 we see that running the PLL at a higher frequency will not increase the output jitter for a given power budget. At first glance, the PLL for the SR seems to require an extra divide-by-*M*. However, this is not necessary since the SR itself functions as a divide-by-*M* and can be re-used.

#### 3.4.2 Comparing Jitter Generated due to Thermal Noise

To compare the jitter generated by the two MPCGs, we assume that they both use current mode logic (CML) circuits<sup>2</sup>. The simplified schematic of a CML delay unit is shown in Fig. 3.2(b). It is based on an NMOS source coupled differential pair driving the resistive load  $R_L$  and biased by a current source  $I_B$ . As the loads are *RC* circuits, the propagation delay  $t_d$  can be approximated as:

$$t_d = \ln 2 \cdot R_L C_L = \ln 2 \cdot (V_{SW} / I_B) \cdot C_L \tag{3.8}$$

where  $V_{SW}$  is the differential output swing and is determined by  $R_L$  and  $I_B$  due to the full switching of the tail current.

The CML implementation of a latch is shown in Fig. 3.4(a). For a proper operation, the D inputs of the latch should be already stable before the *CLK* starts to switch. For example, D is high and  $\overline{D}$  is low and therefore, at the switching moment, transistors M4 and M5 are off. M3 and M6 are in their saturation region and work as cascode transistors on top of the differential pair. The noise contribution of M3-M6 can thus be neglected. The schematic of the latch can be simplified to Fig. 3.4(b) which is exactly the same as the schematic of the CML delay unit in Fig. 3.2(b). Therefore, we can apply the same noise jitter analysis for the delay unit and the latch.

The noise jitter variance of a CML delay unit can be predicted using the analysis presented in [9] as:

$$\sigma_{t,noise}^{2} = (1 + \gamma + \gamma_{T} \cdot \frac{2I_{B}}{V_{gs,eff}} \cdot \frac{R_{L}}{2}) \cdot \frac{2kTC_{L}}{I_{B}^{2}}$$
(3.9)

where  $\gamma$  and  $\gamma_T$  are respectively the noise factor of the differential pair transistors and the tail bias transistor,  $V_{gs,eff}$  is the effective gate-source voltage of the tail bias transistor and  $2I_B/V_{gs,eff}$  represents its transconductance assuming a square-law model.

<sup>2</sup> Although the following comparisons are based on CML circuits, the analytical approach developed can also be used when the MPCGs are implemented with other logic families.



Figure 3.4. (a) Schematic of a CML latch at the switching instant. (b) Simplified schematic for jitter analysis.

In most of the clock generator designs, jitter and power are both important. Via admittance level scaling [10], both noise and mismatch jitter can always be reduced at the cost of increasing the power consumption *P*. The tradeoff between jitter and power is also clear from the analysis in Chapter 2. In order to take this tradeoff into account, we define a 1 mW power normalized jitter variance for a fair comparison:

$$(\sigma_t^2)_{NorP} = \sigma_t^2 \cdot (P/1\text{mW}) \tag{3.10}$$

For a given circuit, applying admittance level scaling will not change the value of  $(\sigma_t^2)_{NorP}$ . A circuit with smaller  $(\sigma_t^2)_{NorP}$  means that it generates less jitter for a given amount of power. For a CML circuit, the power consumption is dominated by the static power  $I_B V_{DD}$ . With (3.9) and (3.10), we find for both a CML delay unit and latch:

$$(\sigma_{t,noise}^2)_{NorP} = (1 + \gamma + \gamma_T \cdot \frac{I_B R_L}{V_{gs,eff}}) \cdot \frac{2kT \cdot V_{DD}}{1\text{mW}} \cdot \frac{C_L}{I_B}.$$
(3.11)

Substituting (3.8) into (3.11) yields:

$$(\sigma_{t,noise}^{2})_{NorP} = \left\{ (1 + \gamma + \gamma_{T} \cdot \frac{V_{SW}}{V_{gs,eff}}) \cdot \frac{2kT \cdot V_{DD}}{\ln 2 \cdot V_{SW} \cdot 1\text{mW}} \right\} \times t_{d} \,.$$
(3.12)

Equation (3.12) indicates that the *power normalized noise jitter variance is proportional* to  $t_d$ .
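The step from (3.9) to (3.12) can be verified numerically. The sketch below computes the power normalized noise jitter variance in two ways: directly from (3.9) and (3.10) with $P = I_B V_{DD}$, and from the delay-based form (3.12). All component values, the temperature of 300 K and the function names are hypothetical and only serve to exercise the formulas.

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K

def noise_jitter_variance(i_b, r_l, c_l, gamma, gamma_t, v_gs_eff, temp=300.0):
    """Noise jitter variance of (3.9) in s^2."""
    tail = gamma_t * (2 * i_b / v_gs_eff) * (r_l / 2)
    return (1 + gamma + tail) * 2 * K_BOLTZMANN * temp * c_l / i_b ** 2

def normalized_from_delay(t_d, v_sw, v_dd, gamma, gamma_t, v_gs_eff, temp=300.0):
    """Power normalized noise jitter variance of (3.12) in s^2 (1 mW = 1e-3 W)."""
    factor = 1 + gamma + gamma_t * v_sw / v_gs_eff
    return factor * 2 * K_BOLTZMANN * temp * v_dd / (math.log(2) * v_sw * 1e-3) * t_d

# Hypothetical CML stage.
i_b, r_l, c_l, v_dd = 1e-3, 400.0, 50e-15, 1.8
gamma, gamma_t, v_gs_eff = 1.0, 1.0, 0.3

v_sw = i_b * r_l                   # differential swing with full current switching
t_d = math.log(2) * r_l * c_l      # propagation delay of (3.8)

# (3.10) applied to (3.9), with the static power I_B * V_DD expressed in mW.
direct = noise_jitter_variance(i_b, r_l, c_l, gamma, gamma_t, v_gs_eff) * (i_b * v_dd / 1e-3)
via_delay = normalized_from_delay(t_d, v_sw, v_dd, gamma, gamma_t, v_gs_eff)
print(direct, via_delay)
```

The two values agree, and doubling $C_L$ (and hence $t_d$) at constant $V_{SW}$ doubles the normalized variance, which is the proportionality to $t_d$ noted above.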

In a DLL, if  $t_d$  is tuned by tuning  $R_L$  while keeping  $V_{SW}$  constant,  $I_B$  and thus  $V_{gs,eff}$  in (3.12) will vary with  $t_d$ . Here, to simplify the comparison, we ignore this second-order effect and assume the delay unit and the latch have the same  $V_{SW}$  and  $V_{gs,eff}$ . We will see the consequence of this simplification in Section 3.5. A DLL has *M* delay units contributing to jitter and power while a SR has *M* latches contributing to jitter and 2*M* latches dissipating power. The average noise jitter variance generated by the DLL and the SR MPCGs can then be compared using (3.1), (3.4) and (3.12) as

$$\frac{(\sigma_{t,SR,noise}^2)_{avgM,NorP}}{(\sigma_{t,DLL,noise}^2)_{avgM,NorP}} = \frac{(\sigma_{t,Latch,noise}^2)_{NorP} \times 2M}{\frac{M+1}{2} \times (\sigma_{t,DU,noise}^2)_{NorP} \times M} = \frac{4}{M+1} \cdot \frac{t_{d,Latch}}{t_{d,DU}} \,.$$
(3.13)

The comparison result thus depends on the amount of delay of the delay unit  $t_{d,DU}$  and that of the latch  $t_{d,Latch}$ . In a DLL MPCG, the VCO defines the frequency and the VCDL defines the delay in between the *M* output clocks. Both the VCO and the VCDL need to be tuned for the DLL MPCG to work at a frequency *f*, where the delay of each delay unit should satisfy

$$t_{d,DU} = \frac{T}{M} = \frac{1}{M \cdot f} \,. \tag{3.14}$$

In contrast, the SR MPCG is more flexible. For different f, only the VCO needs to be tuned since both the frequency and the delay in between the M output clocks are defined by the period of the VCO. The only concern is that the DFFs should operate correctly, which requires [11]

$$t_{d,Latch} + t_{su} \le \frac{1}{M \cdot f} \tag{3.15}$$

where  $t_{su}$  is the setup time required by the DFF. Defining the maximum working frequency of a SR MPCG for *M*-phase clock generation in a certain technology as  $f_{max,SR}$ , the latch delay will have its minimum value  $t_{d,Latch,min}$  at  $f_{max,SR}$  given by

$$t_{d,Latch,\min} = \frac{1}{1 + \alpha_{su}} \cdot \frac{1}{M \cdot f_{\max,SR}}$$
(3.16)

with  $\alpha_{su}$  the ratio between  $t_{su}$  and  $t_{d,Latch,min}$ . As a small delay is preferred for a small  $(\sigma_{t,noise}^2)_{NorP}$ , the latch delay can be set to its minimum in (3.16). For a delay unit, the delay is limited by (3.14). Taking this factor into account, (3.13) can be re-written as

$$\frac{(\sigma_{t,SR,noise}^2)_{avgM,NorP}}{(\sigma_{t,DLL,noise}^2)_{avgM,NorP}} = \frac{1}{1+\alpha_{su}} \cdot \frac{f}{f_{\max,SR}} \cdot \frac{4}{M+1} \,.$$
(3.17)

As soon as the wanted number of clock phases is larger than three (M>3), (3.17) is smaller than one since the DFF needs a finite setup time ( $\alpha_{su}>0$ ) and the working frequency of the SR can't surpass the technology limit ( $f \le f_{max,SR}$ ). This means that the SR based MPCG generates less noise jitter than the DLL counterpart for a given power budget. Equation (3.17) also indicates that the noise jitter advantage of the SR based MPCG will be larger if more advanced technologies are used and in applications where clocks with a larger number of phases at lower frequencies are needed.
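To get a feeling for the size of the advantage predicted by (3.17), the ratio can be evaluated for a concrete case. The sketch below uses hypothetical numbers (8 phases at 1 GHz, $f_{max,SR}$ of 2 GHz and a setup time equal to the minimum latch delay); the function name is also an assumption.

```python
def sr_to_dll_noise_ratio(M, f, f_max_sr, alpha_su):
    """Ratio (3.17): power normalized average noise jitter variance, SR over DLL."""
    if f > f_max_sr:
        raise ValueError("f exceeds the maximum SR MPCG working frequency")
    return 1.0 / (1.0 + alpha_su) * (f / f_max_sr) * 4.0 / (M + 1)

# Hypothetical case: M = 8, f = 1 GHz, f_max_SR = 2 GHz, t_su equal to t_d,Latch,min.
ratio_8_phase = sr_to_dll_noise_ratio(M=8, f=1e9, f_max_sr=2e9, alpha_su=1.0)
print(ratio_8_phase)
```

For these assumed numbers the ratio is 1/9, i.e. the SR MPCG would generate about nine times less noise jitter variance than the DLL MPCG at equal power. The ratio shrinks further for larger $M$ or a lower $f/f_{max,SR}$.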

#### 3.4.3 Comparing Jitter Generated due to Mismatch

Based on similar reasoning as for the noise jitter analysis, the latch can be simplified as shown in Fig. 3.4(b) for mismatch jitter analysis and we can apply a similar analysis. In a CML delay unit, there are two mismatch jitter sources: one is the *RC* load which contributes to *RC* delay mismatch  $\sigma_{t,RC,mis}^2$  and the other is the differential pair input referred offset voltage  $\sigma_{Voff}^2$  which makes the switching moment deviate from the actual crossing point of the input clocks. The tail bias transistor mismatch does not lead to jitter since it is a common mode error and we are interested in the crossing points.

Using (3.8), the jitter due to the *RC* load mismatch becomes

$$\left(\frac{\sigma_{t,RC,mis}}{t_d}\right)^2 = \sigma_{\Delta R_L/R_L}^2 + \sigma_{\Delta C_L/C_L}^2$$
(3.18)

with  $\Delta R_L$  and  $\Delta C_L$  the absolute error in the value of  $R_L$  and  $C_L$ .

In a DLL, the *RC* delay must be tunable. For simplicity, we assume that  $C_L$  is tuned by putting fewer or more capacitors in parallel and  $R_L$  is tuned by putting fewer or more resistors in parallel<sup>3</sup>. Since the matching improves with area [10], (3.18) can be rewritten as:

$$\sigma_{t,RC,mis}^{2} = [(A_{R} \cdot \sqrt{R_{L}})^{2} + (A_{C} / \sqrt{C_{L}})^{2}] \times t_{d}^{2}$$
(3.19)

where  $A_R$  and  $A_C$  are IC process constants for the matching property of the load resistor and capacitor, respectively.

The input referred offset voltage of a differential pair can be calculated using the method presented in [12] as

$$\sigma_{Voff}^2 = \sigma_{\Delta Vt}^2 + \frac{I_B}{4K} \times \sigma_{\Delta R_L^{'}/R_L}^2 + \frac{I_B}{4K} \times \sigma_{\Delta K/K}^2$$
(3.20)

where  $\sigma^2_{\Delta V t}$  is the differential pair threshold voltage mismatch variance,  $\Delta R'_L$  is the relative error between the two  $R_L$  loads, *K* is the transconductance parameter of the differential pair with  $\sigma^2_{\Delta K/K}$  describing its mismatch.

The total mismatch jitter variance $\sigma_{t,mis}^2$ can be found by adding $\sigma_{t,RC,mis}^2$ and the jitter variance caused by $\sigma_{Voff}^2$, which is $\sigma_{Voff}^2$ divided by $(I_B/C_L)^2$, the square of the slope of the differential switching voltage at the zero crossing:

$$\sigma_{t,mis}^{2} = A_{R}^{2} \cdot R_{L} \cdot t_{d}^{2} + \frac{A_{C}^{2} \cdot t_{d}^{2}}{C_{L}} + \frac{\sigma_{\Delta V t}^{2} + \frac{I_{B}}{4K} \times A_{R}^{2} \cdot R_{L} + \frac{I_{B}}{4K} \times \sigma_{\Delta K/K}^{2}}{(I_{B}/C_{L})^{2}}.$$
(3.21)

<sup>3</sup> If $R_L$ is realized with a MOS transistor in linear region and $R_L$ is tuned by tuning the gate voltage, it can be shown that the matching property of $R_L$ in a DLL delay unit is even worse.

The power normalized mismatch jitter variance can be derived with (3.10) and (3.21) as

$$(\sigma_{t,mis}^{2})_{NorP} = \frac{V_{DD}}{1\,\mathrm{mW}} \cdot \left\{V_{SW} \cdot A_{R}^{2} \times t_{d}^{2} + \ln 2 \cdot V_{SW} \cdot A_{C}^{2} \times t_{d} + \frac{\sigma_{\Delta V t}^{2}}{\ln 2 \cdot V_{SW}} \times C_{L} \cdot t_{d} + \frac{A_{R}^{2}}{\ln 2 \times 4K} \times C_{L} \cdot t_{d} + \frac{\sigma_{\Delta K/K}^{2}}{4K} \times C_{L}^{2}\right\}.$$
(3.22)

Equation (3.22) shows that the delay unit and latch generate less mismatch jitter for a smaller delay, with a given power. It also suggests that with a constant $V_{SW}$, it is better for a DLL to tune up $R_L$ instead of $C_L$ when a larger delay is needed.

Assuming the terms with  $t_d$  proportionality in (3.22) which include the threshold voltage mismatch are the dominating mismatch jitter sources and setting the other initial conditions the same for a fair comparison, the mismatch jitter generated by the DLL and SR can be compared with (3.3), (3.5) and (3.22) as

$$\frac{(\sigma_{t,SR,mis}^2)_{avgM,NorP}}{(\sigma_{t,DLL,mis}^2)_{avgM,NorP}} \approx \frac{12}{M} \cdot \frac{t_{d,Latch}}{t_{d,DU}}.$$
(3.23)

Substituting (3.14) and (3.16) into (3.23) yields:

$$\frac{(\sigma_{t,SR,mis}^2)_{avgM,NorP}}{(\sigma_{t,DLL,mis}^2)_{avgM,NorP}} = \frac{1}{1 + \alpha_{su}} \cdot \frac{f}{f_{max,SR}} \cdot \frac{12}{M}.$$
(3.24)

The situation where (3.24) is larger than one only occurs when the wanted clock frequency *f* is close to  $f_{max,SR}$  and the wanted number of clock phases *M* is smaller than 12. In other cases, (3.24) is smaller than one, which means that the SR MPCG generates less mismatch jitter than the DLL counterpart for a given power budget. Equation (3.24) also indicates that the mismatch jitter advantage of the SR based MPCG will be larger if more advanced technologies are used and a larger number of clock phases at lower frequencies are needed.
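As a quick numerical illustration of (3.24), the following minimal sketch evaluates the ratio for a few clock frequencies. The function name is hypothetical, and the parameter values are the ones quoted in Section 3.5 for the simulated design (M=8, $\alpha_{su}$ about 0.5, $f_{max,SR}$ about 1.5 GHz); it is not part of the original analysis.

```python
def sr_to_dll_mismatch_ratio(m_phases, f_clk, f_max_sr, alpha_su):
    """Evaluate the SR/DLL power-normalized mismatch jitter ratio of (3.24)."""
    return (1.0 / (1.0 + alpha_su)) * (f_clk / f_max_sr) * (12.0 / m_phases)


if __name__ == "__main__":
    # Values quoted in Section 3.5: M = 8, alpha_su ~ 0.5, f_max,SR ~ 1.5 GHz.
    for f_clk in (0.15e9, 0.75e9, 1.5e9):
        ratio = sr_to_dll_mismatch_ratio(8, f_clk, 1.5e9, 0.5)
        print(f"f = {f_clk / 1e9:.2f} GHz: ratio = {ratio:.2f}")
```

For these values the ratio only approaches one when *f* reaches $f_{max,SR}$ and stays below one at lower clock frequencies, in line with the discussion above.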

#### 3.4.4 Discussion

The analysis above shows that neither a SR MPCG nor a DLL MPCG improves on the reference clock jitter: both transfer the same amount of jitter from the reference clock. It is therefore critical for both MPCGs to have a clean reference clock in order to achieve a low output jitter. The design of a clock generation PLL with very low jitter will be discussed in the next chapter. Apart from the jitter transfer, analysis shows that a SR MPCG almost always generates less jitter<sup>4</sup> than a DLL MPCG for a given power consumption. For mismatch jitter, the DLL MPCG may have a slight advantage in some high frequency cases<sup>5</sup>.

Figure 3.5. Noise jitter simulation results in 0.13-µm CMOS with M=8 for (a) a CML delay unit (b) DLL and SR comparison.

Figure 3.6. Mismatch jitter simulation results in 0.13-µm CMOS with M=8 for (a) a CML delay unit (b) DLL and SR comparison.

From an implementation point of view, the SR MPCG has a simpler architecture since it does not require analog tuning. However, it can be more difficult to implement in applications where M is large and f is high since it works at $M \cdot f$, but this improves as technology advances. Another concern is that the loading of the VCO is more severe in the SR MPCG, since it needs to drive M DFFs. This problem can be alleviated by down-scaling the DFFs by admittance scaling [10], which is acceptable because they generate less jitter than the delay units, thus saving power and chip area.

<sup>4</sup> In case phase noise is important, the SR is also better as both the SR and DLL generate white phase noise, while the reference clock has the same spectrum shape for both cases.

<sup>5</sup> If 50% reference clock duty cycle is guaranteed, both edges can be used. The *M* DFFs in the SR can be replaced with *M* latches as in [3]. The previous analysis then overestimates the SR MPCG power consumption by a factor of two.

Aiming for multi-functionality (e.g. software defined radio), we would like a flexible MPCG to adapt to largely different data rates, sampling rates or radio frequencies. Here the SR MPCG is clearly more attractive. It is basically a digital circuit which can operate from arbitrarily low frequency up to $f_{max,SR}$, while the operating frequency range of a DLL is limited by the tuning range of the delay line. Also, a SR can change its output frequency almost instantaneously, while a DLL settles slowly, due to the preferred low loop bandwidth. Finally, a SR MPCG has the flexibility to generate clocks with different duty cycles.

## 3.5 Simulation Results

In order to verify the calculations, simulations were done for a DLL and a SR for M=8 in 1.2 V 0.13-µm CMOS. The reference clocks are voltage sources with 1 k$\Omega$ source resistance. The VCDL delay is tuned up by tuning the load resistance as suggested by (3.22) while keeping $V_{SW}$ at 0.6 V. For the DFFs, $\alpha_{su}$ is about 0.5. The load capacitance is 100 fF, which is comparable to the parasitic capacitances. In this implementation, $f_{max,SR}$ is about 1.5 GHz for 8-phase clock generation. Fig. 3.5 shows the strobed PNoise analysis results for noise jitter. The simulated values roughly fit the estimated curve. The larger deviation when $t_d$ is larger relates to the simplification we made below (3.12). We see that this simplification is in favor of the DLL, which normally has a larger $t_d$. Therefore, it does not affect the conclusion. Fig. 3.6 shows the Monte Carlo simulation results for mismatch jitter. The bent shape of the simulated values when $t_d$ is tuned from low to high is predicted by (3.22). The simulated values fit the estimated curve well, which means the threshold voltage mismatch dominates in this design.

## 3.6 Conclusion

This chapter discusses two common multi-phase clock generation methods and motivates why a SR MPCG is more attractive for low jitter applications. Analysis shows that a SR MPCG almost always generates less jitter than a DLL equivalent when both are realized with CML circuits, at a given power budget. This is partly because a SR MPCG has no jitter accumulation from one clock phase to the other as in a DLL counterpart. In addition, a SR MPCG can use latches with very small delay time, while jitter generation of a CML circuit is proportional to its (functionally required) delay time. A SR MPCG requires a reference clock with higher frequency, which can be realized in a power neutral way provided that the VCO core determines the power consumption. Furthermore, a SR MPCG is also more attractive for flexible multi-functional circuits than a DLL MPCG as it is easier to change its frequency and duty cycle. The advantages of a SR MPCG will be larger as technology advances.

## 3.7 References

- [1] C. K. Yang and M. A. Horowitz, "A 0.8-μm CMOS 2.5 Gb/s oversampling receiver and transmitter for serial links," *IEEE J. Solid-State Circuits*, vol. 31, pp. 2015-2023, Dec. 1996.
- [2] W. C. Black and D. A. Hodges, "Time interleaved converter arrays", *IEEE J. Solid-State Circuits*, vol.15, no. 6, pp. 1022–1029, Dec. 1980.
- [3] E. Klumperink, R. Shresta, E. Mensink, V. J. Arkesteijn and B. Nauta, "Cognitive radios for dynamic spectrum access - polyphase multipath radio circuits for dynamic spectrum access," *IEEE Communications Magazine*, vol. 45, no.5, pp. 104-112, May 2007.
- [4] X. Gao, E. Klumperink and B. Nauta, "Low-jitter multi-phase clock generation: a comparison between DLLs and shift registers," *IEEE Int. Symp. Circuits Syst.*, pp. 2854-2857, May 2007.
- [5] R. C. H. van de Beek, E. Klumperink, C. Vaucher and B. Nauta, "Low-jitter clock multiplication: a comparison between PLLs and DLLs," *IEEE Trans. Circuits Syst. II*, vol. 49, pp. 555-566, Aug. 2002.
- [6] M.-J. Lee, W. J. Dally, T. Greer, H.-T. Ng, R. Farjad-Rad, J. Poulton and R. Senthinathan, "Jitter transfer characteristics of delay-locked loops-theories and design techniques," *IEEE J. Solid-State Circuits*, vol. 38, pp. 614-621, Apr. 2003.
- [7] P. R. Kinget, "Integrated GHz voltage controlled oscillators," Analog Circuit Design: (X)DSL and Other Communication Systems; RF MOST Models; Integrated Filters and Oscillators, W. Sansen, J. Huijsing and R. van de Plassche, Ed. Boston, MA: Kluwer, pp. 353-381, 1999.
- [8] S.-A. Yu and P. R. Kinget, "Scaling LC oscillators in nanometer CMOS technologies to a smaller area but with constant performance," *IEEE Trans. Circuits Syst. II*, vol. 56, pp. 354-358, May 2009.
- [9] S. Levantino, L. Romano, S. Pellerano, C. Samori and A. L. Lacaita, "Phase noise in digital frequency dividers," *IEEE J. Solid-State Circuits*, vol. 39, no.5, pp. 775-784, May 2004.
- [10] E. Klumperink and B. Nauta, "Systematic comparison of HF CMOS transconductors," *IEEE Trans. Circuits Syst. II*, vol. 50, no.10, pp. 728-741, Oct. 2003.
- [11] J. M. Rabaey, *Digital Integrated Circuits, A Design Perspective*. Englewood Cliffs, NJ: Prentice-Hall, 1996.
- [12] P. R. Gray, P. Hurst, S. Lewis and R. Meyer, *Analysis and Design of Analog Integrated Circuits*, 4th Edition. John Wiley & Sons, Inc., pp. 236-237, 2001.

# Chapter 4

# Low Jitter Sub-Sampling PLL

## 4.1 Introduction

A clock with low jitter/phase-noise is a fundamental requirement in many applications, e.g. in wireless communication systems to up-convert and down-convert the wanted signals and in ADCs to accurately define the sampling moments. The goal of our research is to develop a clock generation PLL with low jitter as well as low power. To date, many different PLL architectures [1-3] have been developed. The classical PLL architecture as shown in Fig. 4.1 is probably the most popular one in modern PLL ICs [4-16]. From the analysis in Chapter 2, we conclude that the PLL phase noise can be divided into two parts: 1) the VCO noise which dominates out-of-band; 2) the loop noise (noise from the reference clock, PD/CP and divider) which dominates in-band as illustrated in Fig. 4.1(c). In an optimized PLL, the two types of noise contribute equally to the output jitter and thus are equally important. The VCO phase noise has been studied in the literature and noise reduction techniques have been addressed, e.g. in [17-19]. The focus of this work is on reducing the loop noise, i.e., the in-band phase noise. In a classical PLL, the main loop noise sources are usually the PD/CP and the divider. Due to the existence of the divide-by-N in the feedback path, the PD/CP and divider noise (in power) is multiplied by $N^2$ when transferred to the PLL output. This is often the bottleneck for a classical PLL to achieve low phase noise.

Phase detection based on the principle of voltage sampling is an old practice [1], [2]. Unlike the widely used 3-state phase-frequency detector (PFD), a sampling or sample-and-hold PD can work without using a divider as we will show later. Thus divider noise and power dissipation can be eliminated. However, using a sampling PD has drawbacks like the need for a large filter capacitor due to its large detection gain and limited acquisition range [2], which have kept it from wide use in fully integrated PLLs. In this chapter, we describe our proposed PLL architecture [20] which utilizes voltage sampling and overcomes the aforementioned drawbacks. In addition to the elimination of divider noise, analysis shows that, in contrast to what happens in a classical PLL, the PD/CP noise is not multiplied by $N^2$ in this (sub-)sampling PLL. As a result, the in-band phase noise is greatly improved, which leads to a PLL design with very low jitter as well as low power.



Figure 4.1. Classical PLL (a) architecture, (b) phase domain model, (c) phase noise spectrum (1/f noise neglected).

Following this introduction, Section 4.2 discusses and compares the CP noise contributions in a PLL using a classical 3-state PFD/CP and a PLL using a sub-sampling PD/CP. Section 4.3 describes the proposed sub-sampling PLL architecture, analyzes its noise performance and discusses the design techniques used to overcome its drawbacks. The circuit level design is described in Section 4.4 and the experimental results are presented in Section 4.5. Finally, Section 4.6 draws conclusions.

## 4.2 Low Noise Phase Detection

In the following sections, we will discuss the PD/CP noise, with focus on the CP noise which often dominates. In order to calculate the CP noise contribution in a feedback system like a PLL, it is convenient to define a CP feedback gain $\beta_{CP}$ as the gain from the PLL output to the CP output. Using the phase domain model in Fig. 4.1(b), the closed-loop CP noise transfer function can be calculated as:

$$H_{CP}(s) = \frac{\phi_{out,n}}{i_{CP,n}} = \frac{1}{\beta_{CP}} \cdot \frac{\beta_{CP} \cdot F_{LF}(s) \cdot K_{VCO} / s}{1 + \beta_{CP} \cdot F_{LF}(s) \cdot K_{VCO} / s} = \frac{1}{\beta_{CP}} \cdot \frac{G(s)}{1 + G(s)}$$
(4.1)

where G(s) is the PLL open loop transfer function.



Figure 4.2. 3-state PFD/CP: (a) schematic, (b) timing diagram, (c) characteristic.

Inside the PLL bandwidth, $|G(s)| \gg 1$ and the PLL in-band phase noise contributed by the CP can be approximated as:

$$\mathcal{L}_{\text{in-band,CP}} \approx \frac{1}{2} \cdot S_{iCP,n} \cdot |H_{CP}(s)|^2 \approx \frac{S_{iCP,n}}{2\beta_{CP}^2}$$
(4.2)

where the phase noise is expressed with the often used single sideband noise-power to carrier-power ratio  $\mathcal{L}$  and  $S_{iCP,n}$  is the power spectral density of the CP current noise.

Equation (4.2) indicates that the CP noise is suppressed by  $(\beta_{CP})^2$  when transferred to the PLL output. A larger  $\beta_{CP}$  is thus desired as it provides more suppression for the CP noise.
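The in-band approximation behind (4.2) can be checked numerically. The sketch below evaluates $H_{CP}$ from (4.1) over frequency; it assumes a series-RC loop filter $F_{LF}(s) = R_1 + 1/sC_1$ and $K_{VCO}$ expressed in rad/s/V, and all component values are hypothetical, chosen only for illustration.

```python
import math


def cp_noise_transfer(f_offset, beta_cp, r1, c1, k_vco):
    """Closed-loop CP noise transfer H_CP(j*2*pi*f) from (4.1).

    Assumes a series-RC loop filter F_LF(s) = R1 + 1/(s*C1)
    and K_VCO given in rad/s/V.
    """
    s = 1j * 2.0 * math.pi * f_offset
    f_lf = r1 + 1.0 / (s * c1)
    g = beta_cp * f_lf * k_vco / s  # open-loop transfer G(s)
    return (1.0 / beta_cp) * g / (1.0 + g)


if __name__ == "__main__":
    # Hypothetical loop parameters, for illustration only.
    beta_cp = 1e-5                # A/rad
    r1, c1 = 10e3, 100e-12        # ohm, farad
    k_vco = 2.0 * math.pi * 50e6  # rad/s/V
    for f in (1e3, 1e5, 1e7, 1e8):
        h = cp_noise_transfer(f, beta_cp, r1, c1, k_vco)
        print(f"f = {f:.0e} Hz: |H_CP| * beta_CP = {abs(h) * beta_cp:.3f}")
```

Well inside the loop bandwidth the product $|H_{CP}| \cdot \beta_{CP}$ stays close to one, so the CP noise reaches the output scaled by $1/\beta_{CP}$; far outside the bandwidth the loop no longer passes it.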

### 4.2.1 Classical 3-state PFD/CP

In the classical 3-state PFD/CP as shown in Fig. 4.2, the VCO output is first divided down so that the divider output Div has the same frequency as the reference clock Ref. The timing/phase of Div and Ref are then compared and the CP outputs a current pulse with width equal to the amount of timing/phase error. The CP feedback gain of the classical 3-state PFD/CP can be calculated as:

$$\beta_{CP,PFD} = \frac{\Delta \overline{i_{CP}}}{\Delta \phi_{VCO}} = \frac{I_{CP} \cdot (\Delta \phi_{div} / 2\pi)}{\Delta \phi_{VCO}} = \frac{I_{CP}}{2\pi} \cdot \frac{1}{N} = K_d \cdot \frac{1}{N}$$
(4.3)

where  $I_{CP}$  is the bias current of the CP current sources,  $\overline{i_{CP}}$  is the mean CP output current,  $\Delta \phi_{VCO}$  and  $\Delta \phi_{div}$  are respectively the VCO and divider phase error. Equation (4.3) indicates that  $\beta_{CP,PFD}$  is reduced by the frequency division ratio N. That is the reason why the CP noise power is multiplied by  $N^2$  as according to (4.2) the CP noise contribution is inversely proportional to  $(\beta_{CP})^2$ .

The reduction of $\beta_{CP,PFD}$ by the division ratio is perhaps easier understood in the time domain where the VCO timing error is directly transferred to the divider output without scaling. When a timing error $\Delta t$ between the VCO/Div and Ref is detected, the CP will output a current pulse with width $\Delta t$. The mean CP output current is then $I_{CP} \Delta t / T_{ref}$ with $T_{ref}$ the period of Ref. If we increase N while keeping $f_{VCO}$ the same, $f_{ref}$ becomes lower and $T_{ref}$ becomes larger. On the other hand, the width of the CP output current pulse remains the same for the same amount of VCO/Div timing error. Consequently, the mean CP output current becomes smaller due to the larger $T_{ref}$, corresponding to a lower $\beta_{CP,PFD}$.
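The time-domain argument can be cross-checked against (4.3) with a few lines of code. This is a minimal sketch with hypothetical function names and example values; it only restates the relations given in the text ($T_{ref} = N/f_{VCO}$ and a mean CP current of $I_{CP} \Delta t / T_{ref}$).

```python
import math


def beta_cp_pfd(i_cp, n_div):
    """CP feedback gain of the 3-state PFD/CP according to (4.3)."""
    return i_cp / (2.0 * math.pi * n_div)


def beta_cp_pfd_time_domain(i_cp, f_vco, n_div, delta_t):
    """Same gain, derived from a timing error delta_t as in the text."""
    t_ref = n_div / f_vco                    # Ref period, T_ref = N / f_VCO
    mean_current = i_cp * delta_t / t_ref    # mean CP output current
    delta_phi_vco = 2.0 * math.pi * f_vco * delta_t  # VCO phase error
    return mean_current / delta_phi_vco


if __name__ == "__main__":
    # Hypothetical example values: 100 uA CP current, 2.2 GHz VCO, 1 ps error.
    for n_div in (10, 40, 100):
        phase = beta_cp_pfd(100e-6, n_div)
        timing = beta_cp_pfd_time_domain(100e-6, 2.2e9, n_div, 1e-12)
        print(f"N = {n_div}: {phase:.3e} A/rad (phase), {timing:.3e} A/rad (time)")
```

Both routes give the same gain, which falls in proportion to 1/N when N is increased at a fixed $f_{VCO}$.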

It is possible to physically eliminate the divider (and its noise contribution) and design a 3-state PFD/CP based divider-less PLL as proposed in [21], where the PFD compares the phase of the VCO and Ref at every rising edge of Ref for only a small time window (aperture). However, since the phase detection mechanism remains the same,  $\beta_{CP,PFD}$  remains proportional to  $\Delta t/T_{ref}$  meaning that it is still reduced by N and the CP noise is still multiplied by  $N^2$ .

In steady state, a CP driven by a PFD is switched on only for  $\tau_{PFD}$  in each period  $T_{ref}$ . Assuming that the noise of the CP UP/DN current source is dominated by a single MOS transistor with transconductance  $g_m$ , the power spectral density of the (thermal) noise generated by the CP can be estimated as:

$$S_{iCP,n,PFD} = 8kT\gamma \cdot g_m \cdot \frac{\tau_{PFD}}{T_{ref}}$$
(4.4)

where  $\gamma$  is a noise model parameter of the MOS transistor typically in the range of 2/3 to 1.5.

### 4.2.2 Proposed Sub-Sampling PD/CP



Figure 4.3. Sampling based PD (a) conceptual diagram (b) timing diagram.



Figure 4.4. Conceptual schematic and characteristic of a sub-sampling based PD/CP.

The sampling based PD has been known for years [1]. Fig. 4.3 shows its conceptual diagram and timing diagram. The VCO output, a sine wave with amplitude $A_{VCO}$ and DC voltage $V_{DC}$, is sampled by a reference clock Ref. When the VCO and Ref are phase aligned and their frequency ratio N is an integer, the sampled voltage $V_{sam}$ has a constant value equal to $V_{DC}$. When there is phase error between the VCO and Ref, $V_{sam}$ will deviate from $V_{DC}$. The voltage difference between $V_{sam}$ and $V_{DC}$ represents the amount of phase error as shown in Fig. 4.3(b). Note that this PD works without using a divider as long as the ratio $f_{VCO}/f_{ref}$ is an integer, which is an often mentioned reason to use it. However, we will show below that a (sub-)sampling PD can also bring a significant phase noise benefit.

In a sampling PD, the timing/phase error is converted into voltage error. Since the high frequency VCO has a high slew rate, $SR_{VCO}=A_{VCO}\cdot 2\pi f_{VCO}$, a high detection gain can be expected. Fig. 4.4(a) shows the first step toward our sub-sampling PD/CP (SSPD/CP) proposal. The working principle of the SSPD is the same as that of the traditional sampling PD. Here we use the name "sub-sampling" to stress the fact that a high frequency VCO is sampled by a low frequency Ref. In order to process $V_{sam}$ via the traditional current driven loop filter, a transconductor converts voltage $V_{sam}$ into current $g_m V_{sam}$, acting as the UP current source. The DN current source is controlled by $V_{DC}$, the expected VCO voltage when sampled at the crossing moment. Thus, in contrast to a traditional CP, the output current is not proportional to $\Delta t/T_{ref}$, but rather amplitude controlled by the difference of $V_{sam}$ and $V_{DC}$, which is proportional to $\Delta t \cdot SR_{VCO}$. The transfer characteristic of the SSPD/CP has the same shape as the VCO waveform, as shown in Fig. 4.4(b). The ideal locking point is the crossing moment of the sine wave (corresponding to $V_{sam}=V_{DC}$) where it is most linear. The sinusoidal characteristic of the SSPD is similar to that of a mixer based phase detector. However, the SSPD is not sensitive to the duty cycle or shape of the sampling reference clock as it only takes one sample per period instead of processing the whole VCO waveform.

The architecture of a PLL utilizing the SSPD/CP, which we call a sub-sampling PLL (SSPLL), is shown in Fig. 4.5(a). In the steady state, the VCO phase error is small. The gain of the SSPD can be calculated to be:

$$K_{SSPD} = \frac{\Delta v_{sam}}{\Delta \phi_{VCO}} = \frac{A_{VCO} \sin(\Delta \phi_{VCO})}{\Delta \phi_{VCO}} \approx A_{VCO}$$
(4.5)

which is *independent* of the reference and VCO frequency. The gain of the CP is equal to its transconductance:

$$K_{CP} = \frac{\Delta \overline{i_{CP}}}{\Delta v_{sam}} = g_m \,. \tag{4.6}$$

Therefore, the CP feedback gain of the SSPLL can be calculated as:

$$\beta_{CP,SS} = \frac{\Delta \overline{i_{CP}}}{\Delta \phi_{VCO}} = K_{SSPD} \cdot K_{CP} \approx A_{VCO} \cdot g_m.$$
(4.7)

We see that there is no N in (4.7), which means that  $\beta_{CP,SS}$  is not related to N. Consequently, the CP noise of the SSPLL is not multiplied by  $N^2$  when transferred to the output.
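The independence of the SSPD gain from N can also be seen by sampling an ideal sine wave directly. The sketch below is an illustration under stated assumptions: an ideal sampler, a Ref edge that arrives a small time $\Delta t$ after the VCO zero crossing, and hypothetical example values ($A_{VCO}$ = 0.4 V, $f_{VCO}$ = 2.2 GHz, a phase error of 0.01 rad).

```python
import math


def sspd_gain(a_vco, f_vco, n_ratio, delta_t, k_edge=3):
    """Gain (V_sam - V_DC) / delta_phi_VCO measured at the k-th Ref edge."""
    t_ref = n_ratio / f_vco              # Ref period for f_VCO = N * f_ref
    t_sample = k_edge * t_ref + delta_t  # Ref edge, late by delta_t
    v_error = a_vco * math.sin(2.0 * math.pi * f_vco * t_sample)
    delta_phi_vco = 2.0 * math.pi * f_vco * delta_t
    return v_error / delta_phi_vco


if __name__ == "__main__":
    a_vco, f_vco = 0.4, 2.2e9                   # hypothetical example values
    delta_t = 0.01 / (2.0 * math.pi * f_vco)    # 0.01 rad of VCO phase error
    for n_ratio in (10, 40, 100):
        gain = sspd_gain(a_vco, f_vco, n_ratio, delta_t)
        print(f"N = {n_ratio}: K_SSPD = {gain:.4f} V/rad")
```

For a small phase error the gain stays close to $A_{VCO}$ for every N, as stated by (4.5); multiplying it by $g_m$ gives $\beta_{CP,SS}$ of (4.7).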

Figure 4.5. Sub-Sampling PLL (a) architecture, (b) phase domain model.

Assuming that the CP current source is implemented with a single square-law MOS transistor, (4.7) can be re-written as:

$$\beta_{CP,SS} = A_{VCO} \cdot \frac{2I_{CP}}{V_{gs,eff}}$$
(4.8)

where  $V_{gs,eff}$  is the effective gate-source voltage of the MOS transistor and  $2I_{CP}/V_{gs,eff}$  represents  $g_m$ .

Unlike the 3-state PFD/CP, the two current sources in the SSPD/CP are always on. The equivalent CP (thermal) noise current can be estimated as:

$$S_{iCP,n,SS} = 8kT\gamma \cdot g_m. \tag{4.9}$$

### 4.2.3 CP Noise Comparison

In this section, we compare the CP noise contribution of a classical PLL using the 3-state PFD/CP and a SSPLL using the SSPD/CP. In both PLLs, the CP noise contribution can be reduced by increasing the CP bias current  $I_{CP}$ . For a fair comparison, we assume the two CPs use equal  $I_{CP}$ .

The CP feedback gain of the classical PLL and the SSPLL can be compared using (4.3) and (4.8) as:

$$\frac{\beta_{CP,SS}}{\beta_{CP,PFD}} = 4\pi \cdot N \cdot \frac{A_{VCO}}{V_{gs,eff}}.$$
(4.10)

It is easy to see that (4.10) is much larger than 1 as  $4\pi \gg 1$ ,  $N \ge 1$  (most often  $\gg 1$ ) and usually  $A_{VCO} > V_{gs,eff}$ . Thus, the SSPLL has a *much larger*  $\beta_{CP}$  than the classical PLL, and thus has much more suppression for the CP noise.

On the other hand, the CP in the SSPLL is always on and continuously injects noise into the loop filter, while the CP in the classical PLL only injects noise for a fraction of time $\tau_{PFD}$ during each $T_{ref}$. Effectively, the (thermal) noise power generated by the CP in the classical PLL is only a fraction $\tau_{PFD}/T_{ref}$ of that generated by the CP in the SSPLL:

$$\frac{S_{iCP,n,PFD}}{S_{iCP,n,SS}} = \frac{\tau_{PFD}}{T_{ref}}.$$
(4.11)

Overall, the in-band phase noise due to the CP of the two PLLs can be compared using (4.2), (4.10) and (4.11) as

$$\frac{\mathcal{L}_{in-band,CP,PFD}}{\mathcal{L}_{in-band,CP,SS}} = \left(4\pi \cdot N \cdot \frac{A_{VCO}}{V_{gs,eff}}\right)^2 \times \frac{\tau_{PFD}}{T_{ref}} = \left(4\pi \cdot \frac{A_{VCO}}{V_{gs,eff}} \cdot \sqrt{\tau_{PFD}}\right)^2 \times \frac{f_{VCO}^2}{f_{ref}}.$$
(4.12)

The value of (4.12) indicates the amount of CP noise reduction we can achieve by using a SSPLL instead of a classical PLL. Assuming $A_{VCO} = 0.4$ V, $V_{gs,eff} = 0.2$ V and $\tau_{PFD} = 200$ ps, the ratio in (4.12) is plotted in Fig. 4.6 for $f_{ref}$ ranging from 1 MHz to 100 MHz and $f_{VCO}$ ranging from 100 MHz to 10 GHz. We see that the SSPLL has orders of magnitude less CP contributed in-band phase noise than the classical PLL. The advantage of the SSPLL is larger when a higher $f_{VCO}$ or a lower $f_{ref}$ is used.
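To make the comparison concrete, the following minimal sketch evaluates (4.10), (4.11) and (4.12) with the parameter values assumed in the text. The function name and the two operating points in the demo are illustrative choices, not values taken from the measurements.

```python
import math


def cp_noise_improvement(f_vco, f_ref, a_vco=0.4, v_gs_eff=0.2, tau_pfd=200e-12):
    """CP in-band phase noise ratio (classical PLL over SSPLL), from (4.12)."""
    n_ratio = f_vco / f_ref
    beta_ratio = 4.0 * math.pi * n_ratio * a_vco / v_gs_eff  # (4.10)
    noise_ratio = tau_pfd * f_ref                            # (4.11), tau_PFD / T_ref
    return beta_ratio ** 2 * noise_ratio


if __name__ == "__main__":
    for f_vco, f_ref in ((1e9, 10e6), (10e9, 1e6)):
        ratio = cp_noise_improvement(f_vco, f_ref)
        print(f"f_VCO = {f_vco / 1e9:.0f} GHz, f_ref = {f_ref / 1e6:.0f} MHz: "
              f"{10.0 * math.log10(ratio):.1f} dB")
```

With a 1 GHz VCO and a 10 MHz reference, the ratio comes out at roughly 41 dB, and it grows with $f_{VCO}^2/f_{ref}$ as (4.12) predicts.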

## 4.3 Sub-Sampling PLL

Although the sampling PD has existed for years, its potential for achieving very low in-band phase noise has, to the best of our knowledge, not been fully appreciated. It also has drawbacks like difficulty of integration (large filter capacitor needed) and limited frequency acquisition range [2], which have kept it from wide use in fully integrated PLLs. The sampling PD has been used in MMIC PLLs [22,28] and a DLL [23]. However, all of them use off-chip loop filters. The CDR in [24] also uses a sampling PD but the division ratio is one. To the best of our knowledge, our design [20] is the first fully integrated sub-sampling PD based PLL. In the following sub-sections, we will build a phase domain model for the SSPLL and compare its phase noise with the classical PLL. We will also discuss SSPLL drawbacks and propose design techniques to overcome them.

### 4.3.1 Modeling and Noise Analysis

A linear phase domain model for the SSPLL is shown in Fig. 4.5(b). Here we model the SSPLL as a continuous-time system, which is valid as long as the PLL bandwidth is an order of magnitude smaller than $f_{ref}$ [25]. In case the bandwidth is higher, the sampling effects will affect loop stability. They can be modeled using the method in [2] and can be added into Fig. 4.5(b).



Figure 4.6. Theoretical CP noise improvement factor (Equation 4.12) as a function of the VCO frequency for various reference frequencies, assuming $A_{VCO} = 0.4$ V, $V_{gs,eff} = 0.2$ V and $\tau_{PFD} = 200$ ps.

Unlike the classical PLL, there is no divide-by-N in the feedback path in the SSPLL model. Instead, a virtual multiplier multiply-by-$N$ ($\times N$) is added to the reference clock path. This (physically non-existing) multiplier originates from the sub-sampling process. When the high frequency VCO is sub-sampled by the low frequency Ref, the baseband alias falling in the loop filter band has a frequency of

$$f_{alias} = f_{VCO} - N \cdot f_{ref} \,. \tag{4.13}$$

Therefore, the sub-sampling process works as if the VCO is sampled by a signal with frequency N times higher than Ref. In other words, the frequency and thus phase of Ref is virtually multiplied by N. Viewed in another way, the sampler output voltage is proportional to the timing error between the VCO and Ref. However, a given timing error corresponds to N times more phase error if we refer it to the VCO instead of Ref since  $f_{VCO} = N \cdot f_{ref}$ . As the phase of the VCO is subtracted at the phase comparison point, a multiplication  $\times N$  of the phase of Ref before this subtraction point is incorporated in the model.
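The aliasing in (4.13) can be reproduced with an ideal sampler. In the sketch below, N is taken as the integer nearest to $f_{VCO}/f_{ref}$ (an assumption, since the text does not define it for the unlocked case), and the 100 kHz frequency offset is a hypothetical example.

```python
import math


def alias_frequency(f_vco, f_ref):
    """Baseband alias of (4.13), with N the integer nearest to f_VCO / f_ref."""
    n_ratio = round(f_vco / f_ref)
    return f_vco - n_ratio * f_ref


def sampled_vco(f_vco, f_ref, k):
    """Ideal sample of a unit-amplitude VCO sine wave at the k-th Ref edge."""
    return math.sin(2.0 * math.pi * f_vco * k / f_ref)


if __name__ == "__main__":
    f_ref = 55e6
    f_vco = 40 * f_ref + 100e3  # hypothetical: 100 kHz away from N = 40
    f_alias = alias_frequency(f_vco, f_ref)
    print(f"alias frequency: {f_alias / 1e3:.1f} kHz")
    for k in range(4):
        direct = sampled_vco(f_vco, f_ref, k)
        baseband = math.sin(2.0 * math.pi * f_alias * k / f_ref)
        print(f"k = {k}: sampled VCO = {direct:+.6f}, alias sine = {baseband:+.6f}")
```

The samples of the high frequency VCO coincide with samples of a sine wave at $f_{alias}$, which is why the sampler behaves as if Ref had been multiplied by N.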

Using this phase domain model for noise analysis, we see that the reference clock phase noise is still multiplied by  $N^2$  when transferred to the output, same as in a classical PLL.

However, due to the absence of the divide-by-*N* in the feedback path<sup>1</sup>, neither the CP nor the PD noise is multiplied by $N^2$. Moreover, the SSPLL does not need a divider in the locked state, thus the divider noise is eliminated. Therefore, we can expect the SSPLL to achieve much lower in-band phase noise than the classical PLL. The CP noise analysis was already done in Section 4.2 (Fig. 4.6). The noise contribution of the SSPD can be calculated by relating the voltage noise at the SSPD output $\overline{v_{SSPD,n}^2}$ and the corresponding VCO phase error in steady state:

$$\overline{v_{SSPD,n}^2} = \frac{kT}{C_{sam}} \approx (A_{VCO} \cdot \Delta \phi_{VCO,SSPD})^2$$
(4.14)

where  $C_{sam}$  is the value of the sampling capacitor.

Assuming white noise and using the fact that the SSPD noise is band-limited by  $f_{ref}/2$  due to aliasing, the PLL in-band phase noise due to the SSPD can be calculated as

$$2\mathcal{L}_{in-band,SSPD} \times \frac{f_{ref}}{2} = (\Delta \phi_{VCO,SSPD})^2.$$
(4.15)

Using (4.14) and (4.15), we get

$$\mathcal{L}_{in-band,SSPD} = \frac{kT}{C_{sam} \cdot A_{VCO}^2 \cdot f_{ref}}.$$
(4.16)

We see that the SSPD noise is indeed not multiplied by $N^2$. Because of that, its contribution to the overall in-band phase noise can be small without using a big $C_{sam}$. As a numerical example, with $f_{ref}$=55 MHz and $A_{VCO}$=0.4 V, a 10 fF $C_{sam}$ is sufficient to bring $\mathcal{L}_{in-band,SSPD}$ to be as low as -133 dBc/Hz.
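The numerical example can be reproduced directly from (4.16). The sketch below assumes a temperature of 300 K, which the text does not state explicitly; the function name is hypothetical.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K


def sspd_inband_phase_noise_dbc(c_sam, a_vco, f_ref, temperature=300.0):
    """In-band phase noise due to the SSPD from (4.16), in dBc/Hz.

    The default temperature of 300 K is an assumption.
    """
    noise = BOLTZMANN * temperature / (c_sam * a_vco ** 2 * f_ref)
    return 10.0 * math.log10(noise)


if __name__ == "__main__":
    value = sspd_inband_phase_noise_dbc(c_sam=10e-15, a_vco=0.4, f_ref=55e6)
    print(f"L_in-band,SSPD = {value:.1f} dBc/Hz")
```

Under this assumption the result is close to the -133 dBc/Hz quoted above; each doubling of $C_{sam}$ lowers it by a further 3 dB.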

### 4.3.2 Chip Area Considerations

In a charge pump PLL, the most common implementation of the loop filter is a passive RC filter where a resistor  $R_1$  is in series with a capacitor  $C_1$ . A second capacitor  $C_2$  is often added in parallel to reduce the voltage ripple. In order to integrate the loop filter on chip, the value of  $C_1$  and  $C_2$  should not be too large. In the following discussions we will neglect  $C_2$  since it is much smaller than  $C_1$  and is not the major concern.

<sup>1</sup> Compared with the divider-less PLL in [21], the SSPLL not only eliminates the physical divider, but also eliminates the divider in the phase domain model. In this sense it is a truly divider-less PLL.

Figure 4.7. Relation between $\beta_{CP}$ and $f_{c,opt}$ for the cases of dominant and non-dominant CP noise (conceptual).

Substituting the loop filter transfer function $F_{LF}(s) = R_1 + 1/sC_1$ into the PLL phase domain model in Fig. 4.5(b), the PLL open loop bandwidth $f_c$ and the frequency of the loop gain zero $f_{zero}$ can be expressed as:

$$f_c = \frac{\omega_c}{2\pi} \approx \frac{\beta_{CP} \cdot R_1 \cdot K_{VCO}}{2\pi}, \tag{4.17}$$

$$f_{zero} = \frac{1}{2\pi \cdot R_1 C_1}. \tag{4.18}$$

Combining (4.17) and (4.18), we get:

$$C_1 = \left\{ \frac{K_{VCO}}{4\pi^2} \cdot \frac{f_c}{f_{zero}} \right\} \cdot \frac{\beta_{CP}}{f_c^2}.$$
(4.19)

In (4.19), $K_{VCO}$ is related to the VCO analog tuning range requirement and $f_c/f_{zero}$ is related to the phase margin requirement. Once they are specified, the bracketed part is a constant. The value of $C_1$ is thus proportional to $\beta_{CP}$ and inversely proportional to the square of $f_c$.
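The scaling of $C_1$ in (4.19) can be illustrated with a small sizing helper. This is a sketch under stated assumptions: $K_{VCO}$ is taken in rad/s/V so that it is consistent with (4.17), and all numerical values are hypothetical.

```python
import math


def loop_filter_r1(beta_cp, k_vco, f_c):
    """R1 from (4.17): f_c = beta_CP * R1 * K_VCO / (2*pi)."""
    return 2.0 * math.pi * f_c / (beta_cp * k_vco)


def loop_filter_c1(beta_cp, k_vco, f_c, f_zero):
    """C1 from (4.19), with K_VCO in rad/s/V."""
    bracket = k_vco / (4.0 * math.pi ** 2) * (f_c / f_zero)
    return bracket * beta_cp / f_c ** 2


if __name__ == "__main__":
    # Hypothetical values: K_VCO = 2*pi*50 MHz/V, f_c = 1 MHz, f_c / f_zero = 4.
    k_vco, f_c, f_zero = 2.0 * math.pi * 50e6, 1e6, 0.25e6
    for beta_cp in (1e-5, 1e-4, 1e-3):
        r1 = loop_filter_r1(beta_cp, k_vco, f_c)
        c1 = loop_filter_c1(beta_cp, k_vco, f_c, f_zero)
        print(f"beta_CP = {beta_cp:.0e} A/rad: R1 = {r1:.3g} ohm, C1 = {c1:.3g} F")
```

For a fixed bandwidth and phase margin, every tenfold increase in $\beta_{CP}$ requires a tenfold larger $C_1$, which is the integration penalty discussed in this section.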

In order to achieve low output jitter, the PLL bandwidth $f_c$ needs to be carefully chosen. From the analysis in Chapter 2 we conclude that the optimal bandwidth $f_{c,opt}$ for minimum jitter is roughly where the spectra of the VCO noise and the loop noise intersect. For lower loop noise, $f_{c,opt}$ is thus higher, requiring a smaller $C_1$. When the loop noise is dominated by the CP noise, having a larger $\beta_{CP}$ reduces the loop noise and increases $f_{c,opt}$ as shown in Fig. 4.7(a). However, when the CP noise becomes negligible and the noise of other loop components starts dominating the loop noise, having an even larger $\beta_{CP}$ still reduces the CP noise but will hardly reduce the overall loop noise as shown in Fig. 4.7(b). In the latter case, increasing $\beta_{CP}$ further cannot increase $f_{c,opt}$, but does require a larger $C_1$ to stabilize the PLL. Such an "unnecessarily high" $\beta_{CP}$ will thus make full integration difficult. Fig. 4.6 shows that the SSPLL reduces the CP noise contribution so much that it easily becomes negligible. Therefore, $\beta_{CP,SS}$ easily enters the "unnecessarily high" region and it is actually desired to reduce $\beta_{CP,SS}$ in order to reduce filter capacitor area.

Figure 4.8. Schematic of proposed sub-sampling PD/CP with pulse width gain reduction.

### 4.3.3 SSPD/CP with Gain Control

Fig. 4.8 shows the proposed SSPD/CP with gain reduction. Instead of leaving the CP always on, two switches and a block called "Pulser" are added. Also, anti-phase VCO outputs and differential sampling are used. The locking point is then the crossing moment of the differential VCO outputs with no need for a reference voltage  $V_{DC}$  as in Fig. 4.4. Using differential sampling also alleviates charge injection and charge sharing issues and helps to reject supply noise.

The Pulser generates a pulse with width  $\tau_{pul}$  and simultaneously switches on the UP- and DN- current sources for a fraction of time  $\tau_{pul}$  in each Ref period  $T_{ref}$ . In this way, the mean CP output current and thus  $\beta_{CP,SS}$  is reduced by  $\tau_{pul}/T_{ref}$ .

$$\beta_{CP,SS} = 2A_{VCO} \cdot g_m \cdot \frac{\tau_{pul}}{T_{ref}}. \tag{4.20}$$

The additional factor of 2 in (4.20) compared with (4.7) is due to the use of differential sampling. On the other hand, switching on the current sources only for a fraction of time also reduces CP noise:

$$S_{iCP,n,SS} = 8kT\gamma \cdot g_m \cdot \frac{\tau_{pul}}{T_{ref}}. \tag{4.21}$$

Since the reduction of the CP noise suppression factor $(\beta_{CP,SS})^2$ is stronger than the reduction of the CP noise $S_{iCP,n,SS}$, the overall effect is that the in-band phase noise due to the CP increases with $T_{ref}/\tau_{pul}$:

$$\mathcal{L}_{\text{in-band,CP,SS}} \approx \frac{S_{iCP,n,SS}}{2\beta_{CP,SS}^2} = \frac{kT\gamma}{A_{VCO}^2 \cdot g_m} \cdot \frac{T_{ref}}{\tau_{pul}}. \tag{4.22}$$

By a careful choice of $\tau_{pul}/T_{ref}$, the value of $\beta_{CP,SS}$ will not be "unnecessarily high" but still high enough to keep the CP a negligible source of loop noise. In this way, the low noise feature of the SSPD/CP can be exploited without spending unnecessary filter capacitor area.
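The scaling expressed by (4.20) to (4.22) can be checked numerically. The sketch below is illustrative only: the values chosen for `A_VCO` and `GM`, the default `gamma` and the temperature are hypothetical placeholders, not the values of this design.

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K


def sspd_cp_gain(a_vco, gm, duty):
    """Eq. (4.20): detection gain with differential sampling; duty = tau_pul / T_ref."""
    return 2.0 * a_vco * gm * duty


def cp_noise_psd(gm, duty, gamma=1.0, temp=300.0):
    """Eq. (4.21): CP output current noise PSD in A^2/Hz."""
    return 8.0 * K_BOLTZMANN * temp * gamma * gm * duty


def inband_cp_phase_noise_dbc(a_vco, gm, duty, gamma=1.0, temp=300.0):
    """Eq. (4.22): in-band phase noise due to the CP, in dBc/Hz."""
    beta = sspd_cp_gain(a_vco, gm, duty)
    s_i = cp_noise_psd(gm, duty, gamma, temp)
    return 10.0 * math.log10(s_i / (2.0 * beta ** 2))


# Hypothetical example values (assumptions, not taken from the design).
A_VCO = 0.4  # V, VCO amplitude
GM = 1e-3    # S, CP transconductance

for duty in (1.0, 0.1, 0.01):
    beta = sspd_cp_gain(A_VCO, GM, duty)
    pn = inband_cp_phase_noise_dbc(A_VCO, GM, duty)
    print(f"duty={duty:5.2f}  beta={beta:.2e} A/rad  L_CP={pn:7.2f} dBc/Hz")
```

Each tenfold reduction of $\tau_{pul}/T_{ref}$ lowers the gain by 20 dB but raises the CP in-band phase noise by only 10 dB, which is the margin that the gain control trades against filter capacitor area.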

Apart from gain reduction, the Pulser also has a second role. In a normal sampler implementation, two non-overlapping track-and-hold circuits are needed in order to make the sampled voltage a constant DC value as shown in Fig. 4.3. By designing the Pulser such that its output has no overlap with the sampling clock Ref, only one track-and-hold circuit is needed to implement the sampler; see Fig. 4.8. In other words, adding the Pulser and the two switches eliminates the need for the second track-and-hold circuit.

The proposed CP in Fig. 4.8 may at first sight look similar to the conventional CP in Fig. 4.2. However, a key difference is that in the proposed CP the current source amplitude is controlled by the SSPD while in the conventional CP the current source switch-on time is controlled by the PFD. Combined with the SSPD, the proposed CP has the unique feature that the CP noise is not multiplied by  $N^2$ .

### 4.3.4 Frequency Locking

Due to its sinusoidal characteristic, the SSPD has a limited frequency acquisition range, similar to the case of the mixer based PD. Moreover, the sub-sampling process cannot distinguish between $N \cdot f_{ref}$ and other harmonics of $f_{ref}$, and thus the SSPLL may false lock to an unwanted division ratio. Therefore, measures are needed to guarantee frequency lock.

Fig. 4.9 shows the top-level block diagram of the proposed SSPLL. The core loop consists of a SSPD/CP, a Pulser, a passive loop filter and a VCO. In order to ensure correct locking of the PLL, a frequency-locked loop (FLL) is added. The FLL consists of a divide-by-*N* and a 3-state PFD/CP as in a classical PLL, except that a dedicated dead zone (DZ) is inserted between the PFD and CP. The intended PLL action is as follows. When $f_{VCO}$ differs greatly from $N f_{ref}$, the phase/frequency error between VCO and Ref is large and falls outside of the FLL DZ. The FLL has a larger gain than the core loop, dominates the loop control and brings down $|f_{VCO} - N f_{ref}|$. When the loop is close to locking, the phase error between VCO and Ref is small and falls inside the FLL DZ. The output current of the CP in the FLL will then be zero. The loop settles with a time constant determined by the core loop. The FLL and the divide-by-*N* then have no influence on the core loop and do not degrade the PLL jitter performance. In order to realize the aforementioned functions, the width of the DZ is set larger than the expected jitter at the VCO output in the locked state. The bias current for the FLL CP should be set large enough so that the FLL dominates the loop control outside the DZ. After locking is achieved, the FLL can also be disabled to save power.

Figure 4.9. Block diagram of the proposed SSPLL.

## 4.4 Design and Implementation

### 4.4.1 VCO and Measurement Buffer

Fig. 4.10 shows the schematic of the VCO and the 50 $\Omega$ buffer for measurement. The LC-VCO used in this design is a (NMOS) current-biased one with a double switch pair. It has a tuning gain of 50 MHz/V. To increase the frequency tuning range, digital tuning by means of switching MOS capacitors on and off can easily be applied. The output buffer for measurements consists of a tapered multi-stage CML inverter chain. Each stage has scaled dimensions as shown in the figure. The final stage has 50 $\Omega$ on-chip termination resistors. Note that this output buffer is for measurement purposes only, and would not be considered part of an integrated PLL sub-system.



Figure 4.10. Schematic of the VCO and 50  $\Omega$  measurement buffer.



Figure 4.11. Schematic of (a) sub-sampling phase detector, (b) charge pump.

### 4.4.2 Phase Detector and Charge Pump

Fig. 4.11 shows the SSPD/CP schematic. The differential sampler is implemented simply with two NMOS transistors and two poly-poly capacitors. Two source follower buffers isolate the sampler from the VCO. Since the buffers also add noise, the SSPD noise contribution is larger than the one calculated in (4.16). The value of the sampling capacitor is chosen to be 60 fF so that the noise from the sampler and its buffer contributes less than 20% of the overall loop noise. The CP is realized with a differential pair, which converts voltage into current, and cascode current mirrors, which divert the current into the loop filter. When the Pulser output *Pul* is high, the CP UP- and DN- current sources are connected and inject currents into the loop filter. When *Pul* is low, $\overline{Pul}$ is high. The current sources are then steered away to a voltage $V_{dump}=V_{DD}/2$ instead of being switched off, to alleviate the charge sharing between the loop filter and the current sources. After locking is achieved, the VCO phase error is small and thus the variation on the sampled voltage is also small (a few mV in this design). The SSPD/CP characteristic is thus fairly linear in the locked state.

Figure 4.12. Pulser (a) block diagram, (b) timing diagram, (c) tunable delay cell.

Since the crystal oscillator output is a low slew-rate sine wave, an inverter chain is used as Ref buffer to convert it into a steep square wave. To achieve low PLL in-band phase noise, the Ref noise is critical as it will be multiplied by $N^2$ when transferred to the output. Since a high quality crystal oscillator has low phase noise, the buffer is the major source of the Ref noise. The inverter chain, especially the first inverter in the chain, is sized large to reduce noise, at the expense of power consumption. Interestingly, we observed that the SSPLL has such a low PD/CP noise (and no divider noise) that the Ref buffer becomes the dominant source of the in-band phase noise as well as the dominant source of the power consumption. Simulation shows that it consumes 60% of the total loop power while contributing 50% of the in-band phase noise. We will come back to this in Section 4.5.

The Pulser is implemented using a delay cell and a few logic gates as shown in Fig. 4.12(a). Fig. 4.12(b) illustrates the timing of the signals. The width of *Pul* is determined by the delay $\tau_{pul}$ of the delay cell, which has a nominal value of 1.5 ns. The delay cell is realized with two inverters as shown in Fig. 4.12(c), where the charging and discharging currents of the output capacitance in the first inverter are controlled by $V_{tune}$. Therefore, $\tau_{pul}$ can be controlled by $V_{tune}$. Since the sampling PLL loop gain is proportional to $\tau_{pul}$ as shown in (4.20), the PLL loop bandwidth can be tuned by tuning $V_{tune}$. Note that this bandwidth tuning is done without affecting the operating point of the rest of the circuits. For experimental purposes, $V_{tune}$ is fed from off-chip in this design. In practice, the delay of the delay cell may be subject to process, voltage and temperature (PVT) variations. The delay can then be generated with a DLL, which helps to tune out PVT variations.
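As a quick numerical illustration of the gain reduction in (4.20), the nominal Pulser delay and the reference frequency of this design can be combined as follows. This is a minimal sketch; the function names are chosen here for illustration only.

```python
import math

F_REF = 55.25e6    # Hz, reference frequency of this design
TAU_PUL = 1.5e-9   # s, nominal Pulser delay


def gain_reduction(tau_pul, f_ref):
    """Return tau_pul / T_ref, the factor that scales beta_CP,SS in (4.20)."""
    return tau_pul * f_ref


def cp_noise_penalty_db(tau_pul, f_ref):
    """Increase of the CP in-band phase noise relative to an always-on CP, per (4.22)."""
    return 10.0 * math.log10(1.0 / gain_reduction(tau_pul, f_ref))


ratio = gain_reduction(TAU_PUL, F_REF)
print(f"tau_pul/T_ref    = {ratio:.4f}")
print(f"gain change      = {20.0 * math.log10(ratio):.1f} dB")
print(f"CP noise penalty = {cp_noise_penalty_db(TAU_PUL, F_REF):.1f} dB")
```

With the nominal values, the current sources are on for roughly 8% of each reference period.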



Figure 4.13. 3-state PFD/CP with dead zone: (a) schematic, (b) example timing diagram when Ref lags.

### 4.4.3 3-state PFD/CP with Dead Zone

The schematic of the 3-state PFD/CP with dead zone is shown in Fig. 4.13(a). In addition to the conventional 3-state PFD/CP, two D flip-flops (DFFs) are inserted which re-sample the generated UP and DN pulses. Unlike the DFFs in the 3-state PFD, the two added DFFs are triggered by the falling edges, which are  $T_{ref}/2$  delayed from the rising edges if the clock duty cycle is 50%<sup>2</sup>. In this way, any UP and DN pulses with width smaller than  $T_{ref}/2$  will be 'filtered' out, creating a dead zone of (- $\pi$ ,  $\pi$ ). Fig. 4.13(b) shows one example timing diagram when Ref lags illustrating no activity in the right case.
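The dead-zone behavior described above can be captured in an idealized behavioral model. The sketch below is not the circuit of Fig. 4.13: it assumes an ideal 3-state PFD whose pulse width is proportional to the phase error, a 50% duty cycle clock, and a sign convention (positive error produces UP) that is chosen here only for illustration.

```python
import math


def pfd_pulse_width(phase_error_rad, t_ref):
    """Width of the UP/DN pulse of an ideal 3-state PFD for |error| < 2*pi."""
    return abs(phase_error_rad) / (2.0 * math.pi) * t_ref


def fll_cp_direction(phase_error_rad, t_ref):
    """Return +1 (UP), -1 (DN) or 0 (pulse filtered out, inside the dead zone).

    A pulse survives the falling-edge re-sampling only if it is at least
    T_ref/2 wide, which corresponds to |phase error| >= pi.
    """
    width = pfd_pulse_width(phase_error_rad, t_ref)
    if width < t_ref / 2.0:
        return 0
    return 1 if phase_error_rad > 0 else -1


T_REF = 1.0 / 55.25e6  # s, reference period of this design

for err in (-3.5, -1.0, 0.0, 1.0, 3.5):
    print(f"phase error {err:+.1f} rad -> FLL CP direction {fll_cp_direction(err, T_REF):+d}")
```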

<sup>2</sup> If the clock duty cycle is ill-defined, an extra divide-by-2 can be added to both the Ref and Div before they drive the 3-state PFD. The duty cycle is then well defined to be 50%.



Figure 4.14. Chip microphotograph.



Figure 4.15. Measured PLL output phase noise.

## 4.5 Experimental Results

To verify the ideas presented and demonstrate the low loop noise, a prototype chip was designed and fabricated in a standard 0.18-$\mu$m CMOS process. Fig. 4.14 shows a die microphotograph. The total chip area including the pads is 0.8 × 0.8 mm<sup>2</sup>, while the active area is 0.4 × 0.45 mm<sup>2</sup> and is dominated by the LC VCO. Thanks to the use of the pulse width gain reduction in the SSPD/CP, the loop filter does not require large capacitors and is fully integrated. Aiming at a 60 degree phase margin, the largest filter capacitor $C_1$ has a value of 90 pF. The IC was tested in a 24 pin Quad LLP package. Excluding the 50 $\Omega$ CML buffer for measurements<sup>3</sup>, the PLL core (including the source follower SSPD buffer and the Ref buffer) consumes 4.2 mA from a 1.8 V supply. The VCO dissipates 1 mA, the Ref buffer 1.9 mA, the VCO buffer 1 mA and the remaining circuits 0.3 mA. The FLL consumes 0.8 mA and is disabled after locking is achieved to save power.
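The current breakdown above can be tallied as follows. This is a small bookkeeping sketch using only the currents quoted in the text; the "loop components" share excludes the VCO.

```python
SUPPLY_V = 1.8  # V

# Supply currents in mA, as quoted in the text.
currents_ma = {
    "VCO": 1.0,
    "Ref buffer": 1.9,
    "VCO buffer": 1.0,
    "remaining circuits": 0.3,
}

total_ma = sum(currents_ma.values())
power_mw = total_ma * SUPPLY_V
loop_ma = total_ma - currents_ma["VCO"]  # loop components only

print(f"PLL core: {total_ma:.1f} mA, {power_mw:.2f} mW")
for name, i_ma in currents_ma.items():
    if name == "VCO":
        continue
    print(f"{name:20s} {100.0 * i_ma / loop_ma:5.1f} % of loop-components current")
```

The total corresponds to the 7.6 mW reported in Table 4.1, and the Ref buffer and VCO buffer shares are consistent with the roughly 60% and 30% of loop power mentioned for these blocks.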

The reference clock is derived from an off-chip high quality 55.25 MHz SC Sprinter crystal oscillator from Wenzel Associates. The crystal oscillator output passes through an off-chip attenuator before it is fed into the chip, such that the signal arriving on-chip has a 1.8 V<sub>p-p</sub> amplitude matching the 1.8 V supply. Fig. 4.15 shows the phase noise spectrum of the 2.21 GHz PLL output measured with an Agilent E5501B phase noise measurement setup. The in-band phase noise is -126 dBc/Hz at 200 kHz offset and the out-of-band phase noise is -141 dBc/Hz at 20 MHz offset. Switching the FLL on or off has a negligible effect on the spectrum. The PLL output rms jitter can be related to the phase noise as:

$$\sigma_t^2 = \frac{2 \times \int_{f_i}^{f_h} \mathcal{L}(f)\, df}{\left(2\pi f_{out}\right)^2} \tag{4.23}$$

where  $[f_i, f_h]$  is the integration region. The integrated rms jitter from 10 kHz to 40 MHz of this design is 0.15 ps.
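To illustrate how (4.23) is evaluated, the following sketch integrates a phase noise profile numerically. The two-slope profile used here is a crude assumption, not the measured spectrum of Fig. 4.15: it is flat at the reported -126 dBc/Hz in-band level and then falls at 20 dB/dec through the reported -141 dBc/Hz point at 20 MHz offset.

```python
import math


def rms_jitter(phase_noise_dbc, f_lo, f_hi, f_out, points=2000):
    """Eq. (4.23): trapezoidal integration of L(f) on a logarithmic frequency grid.

    phase_noise_dbc is a callable returning L(f) in dBc/Hz.
    """
    log_lo, log_hi = math.log10(f_lo), math.log10(f_hi)
    freqs = [10 ** (log_lo + (log_hi - log_lo) * i / (points - 1)) for i in range(points)]
    lin = [10 ** (phase_noise_dbc(f) / 10.0) for f in freqs]
    area = sum(
        0.5 * (lin[i] + lin[i + 1]) * (freqs[i + 1] - freqs[i])
        for i in range(points - 1)
    )
    return math.sqrt(2.0 * area) / (2.0 * math.pi * f_out)


# Assumed two-slope model fitted to the two reported points.
INBAND_DBC = -126.0
CORNER_HZ = 20e6 / 10 ** ((INBAND_DBC - (-141.0)) / 20.0)


def two_slope_model(f):
    """Flat in-band level, then -20 dB/dec above the (assumed) corner frequency."""
    if f <= CORNER_HZ:
        return INBAND_DBC
    return INBAND_DBC - 20.0 * math.log10(f / CORNER_HZ)


sigma = rms_jitter(two_slope_model, 10e3, 40e6, 2.21e9)
print(f"model rms jitter (10 kHz to 40 MHz): {sigma * 1e12:.3f} ps")
```

Because the model ignores the detailed shape of the spectrum, it only indicates the order of magnitude; it should not be expected to reproduce the measured value exactly.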

According to the noise summary in Spectre RF PNoise simulations, the inverter chain Ref buffer, the crystal oscillator and the remaining circuits contribute 50%, 20% and 30% to the in-band phase noise at 200 kHz, respectively. The Ref buffer noise is dominated by the first inverter in the chain as its input is a slow 55.25 MHz sine wave. The in-band phase noise due to this inverter can be related to the voltage noise $\overline{v_{out,n}^2}$ and slew-rate $SR_{out}$ at its output crossing moment as [26]:

$$\mathcal{L}_{\text{in-band,Ref-Buff}} \approx \frac{1}{2} \cdot N^2 \cdot S_{\phi,\text{Ref-Buff},n} = 4\pi^2 \cdot N^2 \cdot f_{ref} \cdot \frac{\overline{v_{out,n}^2}}{SR_{out}^2}. \tag{4.24}$$

<sup>&</sup>lt;sup>3</sup> This buffer is needed merely to drive the off-chip 50  $\Omega$  measurement equipment. Similar to all the PLL literature we compared to in this thesis, we exclude this buffer when we report the PLL power consumption. The 50  $\Omega$  buffer also contributes to the PLL phase noise with a white spectrum. However, this contribution is small and is only visible at high offset frequency, e.g. at about 80 MHz in Fig. 4.15.



Figure 4.16. Measured in-band phase noise at 200 kHz offset with different input reference clock amplitudes.



Figure 4.17. Measured PLL output spectrum.

|                                                          | This Work      | [16] *       | [15]         | [11]          | [10]          | [9]            |
|----------------------------------------------------------|----------------|--------------|--------------|---------------|---------------|----------------|
| Output Freq. (GHz)                                       | 2.21           | 3.67         | 4.8          | 3.125         | 2.4           | 10             |
| Reference Freq. (MHz)                                    | 55.25          | 50           | 200          | 62.5          | 25            | 2500           |
| In-band Phase Noise (dBc/Hz)                             | -126@200kHz    | -108@400kHz  | -108@1MHz    | -108@100kHz   | -106@200kHz   | -109@600kHz    |
| Normalized In-band<br>Phase Noise (dBc/Hz <sup>2</sup> ) | -235@200kHz    | -222@400kHz  | -218@1MHz    | -220@100kHz   | -219@200kHz   | -215@600kHz    |
| Power (mW)                                               | 7.6            | 39           | 19.5         | 25            | 32            | 81             |
| RMS Jitter (ps)                                          | 0.15 (10k-40M) | 0.2 (1k-40M) | 0.6 (10k-40M) | 0.56 (1k-50M) | 0.74 (1k-10M) | 0.22 (10k-20M) |
| PLL FOM (dB)                                             | -246           | -238         | -231         | -231          | -229          | -234           |
| Active Area (mm <sup>2</sup> )                           | 0.18           | 0.95         | 0.21         | 0.43          | 0.70          | 0.71           |
| Technology (µm)                                          | 0.18           | 0.13         | 0.13         | 0.13          | 0.12          | 0.18           |

\*It is a fractional-N PLL.

Table 4.1. PLL performance summary and comparison.

Since the input of the inverter is a slow sine wave Ref,  $SR_{out}$  can be calculated as the voltage gain  $G_v$  times the Ref slew-rate<sup>4</sup>:

$$\mathcal{L}_{\text{in-band,Ref-Buff}} = 4\pi^2 \cdot N^2 \cdot f_{ref} \cdot \frac{\overline{v_{out,n}^2}}{(G_v \cdot A_{ref} \cdot 2\pi f_{ref})^2} \tag{4.25}$$

with $A_{ref}$ the Ref amplitude. Therefore, the in-band phase noise due to the Ref buffer will be higher with a smaller $A_{ref}$. The measured in-band phase noise at 200 kHz offset with different $A_{ref}$ is shown in Fig. 4.16. The phase noise is indeed higher with a smaller $A_{ref}$, in a 20 dB/dec manner as predicted by (4.25). This also fits the expectation that the Ref buffer is the dominant source of the in-band phase noise.
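The 20 dB/dec dependence on $A_{ref}$ predicted by (4.25) can be verified with a few lines of code. In the sketch below, the output noise `VN2` and the voltage gain `G_V` are hypothetical placeholders; only $N$ and $f_{ref}$ are taken from this design, and the absolute levels printed are therefore not meaningful.

```python
import math


def ref_buffer_phase_noise_dbc(n, f_ref, vn2, g_v, a_ref):
    """Eq. (4.25): in-band phase noise due to the first Ref buffer inverter, in dBc/Hz."""
    sr_out = g_v * a_ref * 2.0 * math.pi * f_ref  # output slew rate for a sine input
    return 10.0 * math.log10(4.0 * math.pi ** 2 * n ** 2 * f_ref * vn2 / sr_out ** 2)


N = 40          # 2.21 GHz / 55.25 MHz
F_REF = 55.25e6  # Hz
VN2 = 1e-8       # V^2, hypothetical output noise power (assumption)
G_V = 5.0        # hypothetical inverter voltage gain (assumption)

for a_ref in (0.9, 0.45, 0.09):
    pn = ref_buffer_phase_noise_dbc(N, F_REF, VN2, G_V, a_ref)
    print(f"A_ref = {a_ref:4.2f} V -> {pn:7.2f} dBc/Hz")
```

Halving $A_{ref}$ raises the predicted level by about 6 dB, and a tenfold reduction raises it by 20 dB, independent of the placeholder values.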

The PLL reference spur was measured with an Agilent Spectrum Analyzer E4440A to be -46 dBc at 55.25 MHz offset as shown in Fig. 4.17. It is caused by the disturbance of the SSPD sampling activity to the VCO. Design techniques to reduce the spur level will be discussed in Chapter 6.

Table 4.1 summarizes the PLL performance and shows a comparison with a few representative classical PLLs. When directly compared, the in-band phase noise of this work is at least 18 dB lower. However, this direct comparison is unfair since the classical PLL in-band phase noise level is systematically dependent<sup>5</sup> on the choice of *N* and $f_{ref}$, as shown in Chapter 2. The often used normalized in-band phase noise, which normalizes this systematic dependency out, is defined as [27]:

$$\mathcal{L}_{norm} = \mathcal{L}_{in-band} - 20\log N - 10\log f_{ref} \,. \tag{4.26}$$

<sup>4</sup> In order to achieve lower phase noise, we could use a higher $f_{ref}$ or steepen the Ref clock edges before it is fed to the chip. However, this is not done as it only shifts the problem to other blocks, e.g. to the generation of a clean high frequency Ref. When the Ref slew rate is very high, e.g. $f_{ref}$ is very high or Ref is a square wave instead of a sine wave, $SR_{out}$ will eventually be limited by I/C at the inverter output.

<sup>5</sup> In the SSPLL, the PD/CP noise is not related to *N*. However, the Ref buffer noise, which dominates the in-band phase noise, is still multiplied by $N^2$ as in a classical PLL.

Figure 4.18. Jitter and power comparison between this work and the classical PLLs.

After normalization, the in-band phase noise of this design is at least 13 dB lower than the designs in [9-11, 15, 16]. For a fairer comparison, the loop-components power should also be taken into account. This is not done because most of the papers only give the total PLL power consumption and do not break it into VCO power and loop-components power. However, we can see that the total PLL power consumption of this work is several times less than [9-11, 15, 16].
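As a worked example of the normalization in (4.26), the following sketch applies it to the measured values of this design (in-band phase noise of -126 dBc/Hz, 2.21 GHz output, 55.25 MHz reference).

```python
import math


def normalized_inband_phase_noise(pn_dbc, n, f_ref):
    """Eq. (4.26): in-band phase noise normalized to N = 1 and f_ref = 1 Hz."""
    return pn_dbc - 20.0 * math.log10(n) - 10.0 * math.log10(f_ref)


F_OUT = 2.21e9   # Hz
F_REF = 55.25e6  # Hz
N = F_OUT / F_REF  # division ratio, 40 for this design

l_norm = normalized_inband_phase_noise(-126.0, N, F_REF)
print(f"N = {N:.0f}, normalized in-band phase noise = {l_norm:.1f} dBc/Hz^2")
```

The result rounds to the -235 dBc/Hz<sup>2</sup> listed for this work in Table 4.1.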

For the PLL as a whole, it has been shown in Chapter 2 that the PLL jitter performance is systematically related to its power consumption. In order to take the tradeoff between jitter and power into account, the PLL benchmarking figure-of-merit (FOM) defined in Chapter 2 can be used to make a fair comparison:

$$FOM_{PLL} = 20\log\frac{\sigma_t}{1\,\text{s}} + 10\log\frac{P}{1\,\text{mW}}. \tag{4.27}$$

Fig. 4.18 shows the jitter and power performance of this work and the classical PLLs in [4-16]. This work achieves the lowest jitter as well as lowest power and thus has the best PLL FOM.
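The FOM of (4.27) is straightforward to evaluate. The sketch below applies it to the rounded jitter and power entries listed for [9] in Table 4.1. Note that values computed from rounded table entries can differ somewhat from the tabulated FOM figures, which were presumably computed from unrounded data.

```python
import math


def pll_fom_db(sigma_t_s, power_w):
    """Eq. (4.27): PLL figure-of-merit in dB (lower is better)."""
    return 20.0 * math.log10(sigma_t_s / 1.0) + 10.0 * math.log10(power_w / 1e-3)


# Rounded entries for [9] from Table 4.1: 0.22 ps rms jitter, 81 mW.
fom_ref9 = pll_fom_db(0.22e-12, 81e-3)
print(f"FOM for the entries of [9]: {fom_ref9:.1f} dB")
```

The definition makes the tradeoff explicit: a tenfold increase in power must buy at least a factor $\sqrt{10}$ lower jitter to keep the FOM constant.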

## 4.6 Conclusion

Design considerations and measurement results of a fully integrated 2.21 GHz PLL in a standard 0.18-µm CMOS process with reduced in-band phase noise have been presented. This PLL employs a PD/CP that sub-samples a high frequency VCO output with a low frequency reference clock. In contrast to what happens in a classical PLL, the PD/CP noise is not multiplied by $N^2$ in this sub-sampling PLL, resulting in a low noise contribution from the PD/CP. Moreover, no frequency divider is needed in the locked state, so divider noise and power are eliminated. Despite its low noise feature, a traditional sub-sampling PLL has drawbacks such as difficulty of integration (a large filter capacitor is needed due to the high detection gain) and limited frequency acquisition range. In order to overcome these drawbacks, pulse width gain control is added to the sub-sampling PD/CP to reduce the detection gain and thus the needed filter capacitor value. A classical 3-state PFD/CP based PLL with a dedicated dead zone is added as a frequency-locked loop which guarantees correct frequency locking without degrading jitter performance. Operating at 1.8 V with a 55.25 MHz sine wave reference clock, the 2.21 GHz PLL draws 4.2 mA. The measured in-band phase noise is -126 dBc/Hz at 200 kHz offset and the rms output jitter integrated from 10 kHz to 40 MHz is 0.15 ps.

## 4.7 References

- [1] V. F. Kroupa, *Frequency Synthesis: Theory, Design and Applications*. London, U.K.: Griffin, 1973.
- [2] J. A. Crawford, Frequency Synthesizer Design Handbook. Boston: Artech House, 1994.
- [3] C. S. Vaucher, *Architectures for RF Frequency Synthesizers*. Boston, MA: Kluwer, 2002.
- [4] J. Craninckx and M. Steyaert, "A fully integrated CMOS DCS-1800 frequency synthesizer," *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, pp. 372-373, Feb.1998.
- [5] L. Lin and P. R. Gray, "A 1.4 GHz differential low-noise CMOS frequency synthesizer using a wideband PLL architecture," *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, pp. 204–205, Feb. 2000.
- [6] H. Cong, S. M. Logan, M. J. Loinaz, K. J. O'Brien, E. E. Perry, G. D. Polhemus, J. E. Scoggins, K. P. Snowdon and M. G. Ward, "A 10-Gb/s 16:1 multiplexer and 10-GHz clock synthesizer in 0.25-um SiGe BiCMOS," *IEEE J. Solid-State Circuits*, vol. 36, no. 12, pp. 1946-1953, Sep. 2001.

- [7] N. Da Dalt and C. Sandner, "A subpicosecond jitter PLL for clock generation in 0.12 μm digital CMOS," *IEEE J. Solid-State Circuits*, vol. 38, no. 7, pp. 1275–1278, Jul. 2003.
- [8] A. M. Terrovitis, M. Mack, K. Singh and M. Zargari, "A 3.2 to 4 GHz, 0.25 um CMOS frequency synthesizer for IEEE 802.11a/b/g WLAN," *IEEE ISSCC Dig. Tech. Papers*, pp. 98–99, Feb. 2004.
- [9] R. C. H. van de Beek, C. S. Vaucher, D. M. W. Leenaerts, E. Klumperink and B. Nauta, "A 2.5–10-GHz clock multiplier unit with 0.22-ps RMS jitter in standard 0.18µm CMOS," *IEEE J. Solid-State Circuits*, vol. 39, no. 11, pp. 1862–1872, Nov. 2004.
- [10] R. Nonis, N. Da Dalt, P. Palestri and L. Selmi, "Modeling, design and characterization of a new low-jitter analog dual tuning LC-VCO PLL architecture," *IEEE J. Solid-State Circuits*, vol. 40, pp. 1303-1309, Jun. 2005.
- [11] R. Gu, A. Yee, Y. Xie and W. Lee, "A 6.25GHz 1V LC-PLL in 0.13µm CMOS," IEEE Int. Solid-State Circuits Conf. (ISSCC), pp. 594-595, Feb. 2006.
- [12] A. L. S. Loke, R. K. Barnes, T. T. Wee, M. M. Oshima, C. E. Moore, R. R. Kennedy and M. J. Gilsdorf, "A versatile 90-nm CMOS charge-pump PLL for SerDes transmitter clocking," *IEEE J. Solid-State Circuits*, vol. 41, pp. 1894-1907, Aug. 2006.
- [13] A. Swaminathan, K. J. Wang and I. Galton, "A wide-bandwidth 2.4 GHz ISM band fractional-N PLL with adaptive phase noise cancellation," *IEEE J. Solid-State Circuits*, vol. 42, pp. 2639-2650, Dec. 2007.
- [14] R. B. Staszewski, J. L. Wallberg, S. Rezeq, C.-M. Hung, O. E. Eliezer, S. Vemulapalli, K. C. Fernando, K. Maggio, R. Staszewski, N. Barton, M.-C. Lee, P. Cruise, M. Entezari, K. Muhammad and D. Leipold, "All-digital PLL and transmitter for mobile phone," *IEEE J. Solid-State Circuits*, vol. 40, no. 12, pp. 2469–2482, Dec. 2005.
- [15] N. Da Dalt, E. Thaller, P. Gregorius and L. Gazsi, "A compact triple-band low-jitter digital LC PLL with programmable coil in 130-nm CMOS," *IEEE J. Solid-State Circuits*, Vol. 40, No. 7, pp.1482-1490, Jul. 2005.
- [16] C. Hsu, M. Z. Straayer and M. H. Perrott, "A low-noise, wide-BW 3.6GHz digital ∆∑ fractional-N frequency synthesizer with a noise-shaping time-to-digital converter and quantization noise cancellation," *IEEE Int. Solid-State Circuits Conf. (ISSCC)*, pp. 340-341, Feb. 2008.
- [17] P. Andreani and A. Fard, "More on the 1/f<sup>2</sup> phase noise performance of CMOS differential-pair LC tank oscillators," *IEEE J. Solid-State Circuits*, vol. 41, no. 12, pp. 2703–2712, Dec. 2006.
- [18] D. Ham and A. Hajimiri, "Concepts and methods in optimization of integrated LC VCOs," *IEEE J. Solid-State Circuits*, vol. 36, no. 6, pp. 896–909, Jun. 2001.
- [19] E. Hegazi, H. Sjoland, and A. A. Abidi, "A filtering technique to lower LC oscillator phase noise," *IEEE J. Solid-State Circuits*, vol. 36, pp.1921–1930, Dec. 2001.

- [20] X. Gao, E. Klumperink, M. Bohsali and B. Nauta, "A 2.2GHz 7.6mW sub-sampling PLL with -126dBc/Hz in-band phase noise and 0.15ps<sub>rms</sub> jitter in 0.18μm CMOS," *IEEE Int. Solid-State Circuits Conf. (ISSCC)*, pp. 392-393, Feb. 2009.
- [21] A. Shahani, D. Shaeffer, S. Mohan, H. Samavati, H. Rategh, M. Hershenson, M. Xu, C. Yue, D. Eddleman and T. H. Lee, "Low-power dividerless frequency synthesis using aperture phase detector," *IEEE J. Solid-State Circuits*, vol. 33, pp. 2232–2239, Dec. 1998.
- [22] S. Desgrez, D. Langrez, M. Delmond, J.-C. Cayrou and J.-L. Cazaux, "A new MMIC sampling phase detector design for space applications," *IEEE J. Solid State Circuits*, vol. 38, pp. 1438-1442, Sep. 2003.
- [23] S. B. Anand and B. Razavi, "A CMOS clock recovery Circuit for 2.5-Gb/s NRZ Data," IEEE J. Solid State Circuits, vol. 36, pp. 432–439, Mar. 2001.
- [24] P. C. Maulik and D. A. Mercer, "A DLL-based programmable clock multiplier in 0.18-um CMOS with -70 dBc reference spur," *IEEE J. Solid-State Circuits*, vol. 42, no. 8, pp. 1642–1648, Aug. 2007.
- [25] F. M. Gardner, "Charge-pump phase-lock loops," *IEEE Trans. on Communications*, vol. COM-28, no.11, pp. 1849-58, Nov. 1980.
- [26] X. Gao, E. Klumperink, P. F. J. Geraedts and B. Nauta, "Jitter analysis and a benchmarking figure-of-merit for phase-locked loops," *IEEE Trans. Circuits Syst. II*, vol. 56, no.2, pp. 117-121, Feb. 2009.
- [27] D. Banerjee, PLL performance, simulation, and design, 4<sup>th</sup> edition, National Semiconductor, 2006. [on-line] Accessed on Mar. 20<sup>th</sup>, 2010. http://www.national.com/analog/timing/pll designbook
- [28] K. V. Puglia, "Phase-locked DRO uses a sampling phase detector," *Microwaves RF*, pp. 103–111, Jul. 1993.

## **Chapter 5**

# **Power Reduction Techniques for SSPLL**

## **5.1 Introduction**

In the previous chapter, a PLL architecture based on sub-sampling phase detection has been proposed. The generic block diagram of a sub-sampling PLL (SSPLL) is shown in Fig. 5.1. A sub-sampling phase detector (SSPD) samples the high frequency VCO output with the low frequency reference clock Ref and converts the VCO phase error into sampled voltage variation. A charge pump (CP) acts as a transconductor and converts the sampled voltage into a current that drives the loop filter (LF). A frequency-locked loop guarantees correct frequency locking and can be disabled once locking is achieved. Compared with the classical PLL, the SSPLL has the advantages that the divider noise is eliminated and the PD and CP noise is not multiplied by $N^2$. It can thus achieve a much lower in-band phase noise than the classical PLL for a given power budget. In other words, the SSPLL is much more power efficient than the classical PLL while achieving the same output jitter, as shown by the design in Chapter 4. Based on the work in Chapter 4 (referred to as [1] hereinafter), this chapter attempts to improve the power efficiency of the SSPLL even further. We will again focus on the loop design and propose design techniques [2] to push down the loop-components power of the SSPLL in [1] by an order of magnitude, while maintaining its superior in-band phase noise performance.

In the SSPLL as shown in Fig. 5.1, the frequency-locked loop can be disabled once locking is achieved and thus does not consume power. The loop-components power is contributed by the SSPD, the CP, the VCO buffer which may be used to isolate the VCO from the SSPD, and the Ref buffer which is needed to boost the edge steepness of the sampling clock when a sine-wave crystal oscillator (XO) is used as the PLL input. Since the SSPD and CP noise is not multiplied by  $N^2$  in a SSPLL, their noise contribution is low and thus their size and power can be progressively scaled down. The VCO and Ref buffers for the SSPD then become the bottlenecks for low loop-components power. In the design in [1], they respectively account for 30% and 60% of the total loop-components power.

In Sections 5.2 and 5.3, we will propose two techniques to alleviate these bottlenecks: 1) direct sampling of the VCO without buffer while keeping the disturbance to the VCO low; 2) power efficient Ref buffering with drastically reduced short-circuit current. Section 5.4 describes the circuit level design. The experimental results are presented in Section 5.5 and Section 5.6 draws the conclusions.

Figure 5.1. Generic sub-sampling PLL (SSPLL) architecture.

Figure 5.2. (a) Simple model for VCO sampling, (b) VCO sampling with dummy sampler.

## **5.2 Buffer-less Direct VCO Sampling**

In most applications, the VCO frequency $f_{VCO}$ is high, typically in the GHz range. Buffers running at $f_{VCO}$ are thus power consuming. In the SSPLL design in [1], $f_{VCO}$ is 2.2 GHz. The buffer used to isolate the VCO from the SSPD consumes about 30% of the total loop-components power. From the power consumption point of view, it is desirable to remove this high speed buffer and directly interface the SSPD with the VCO. However, a concern of directly sampling the VCO without using a buffer is the disturbance of the SSPD to the VCO operation. Fig. 5.2(a) shows a simplified diagram of the VCO and SSPD, where the VCO is represented by an ideal LC tank. A switched-capacitor SSPD uses Ref as the sampling clock and samples the VCO output. For an ideal sampler, the sampling clock should be a Dirac pulse with an infinitesimally short duration. As this requires an impractical clock with virtually zero duty cycle, a practical sampler is usually implemented using track-and-hold circuits driven by a block waveform with a more practical duty cycle, as in Fig. 5.2(a). When the sampling clock Ref turns on the switch, the sampling capacitor $C_{sam}$ is connected to the VCO and becomes part of the VCO loading. When Ref turns off the switch, $C_{sam}$ is disconnected and the VCO is not loaded by $C_{sam}$. Therefore, the capacitive load of the LC tank and thus $f_{VCO}$ is time varying, as shown in Fig. 5.2(a). The periodic switching of the sampler at frequency $f_{ref}$ modulates $f_{VCO}$ in a way similar to the case of binary frequency shift keying (BFSK), causing spurs at integer multiples of $f_{ref}$.
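The size of the frequency toggling described above can be estimated from the ideal LC tank model of Fig. 5.2(a). The sketch below is illustrative only: the tank inductance, tank capacitance and sampling capacitance are hypothetical values chosen to give an oscillation frequency near 2.2 GHz, not the values of an actual design.

```python
import math


def lc_frequency(l_tank, c_total):
    """Resonance frequency of an ideal LC tank."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_tank * c_total))


def bfsk_deviation(l_tank, c_tank, c_sam):
    """Frequency difference between the switch-off and switch-on states of the sampler."""
    return lc_frequency(l_tank, c_tank) - lc_frequency(l_tank, c_tank + c_sam)


# Hypothetical tank and sampler values (assumptions for illustration).
L_TANK = 2e-9    # H
C_TANK = 2.6e-12  # F
C_SAM = 60e-15   # F

f_off = lc_frequency(L_TANK, C_TANK)
delta_f = bfsk_deviation(L_TANK, C_TANK, C_SAM)
approx = f_off * C_SAM / (2.0 * C_TANK)  # first-order estimate for C_sam << C_tank

print(f"f_VCO with switch off: {f_off / 1e9:.3f} GHz")
print(f"frequency toggling:    {delta_f / 1e6:.1f} MHz (first-order estimate {approx / 1e6:.1f} MHz)")
```

Even a sampling capacitor that is a small fraction of the tank capacitance toggles $f_{VCO}$ by a relative amount of roughly $C_{sam}/(2C_{tank})$, which explains why the effect cannot be neglected when the isolation buffer is removed.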

To suppress the BFSK effect without resorting to a power consuming isolation buffer, we propose to add a dummy sampler as displayed in Fig. 5.2(b). The dummy sampler is a copy of the existing sampler but is controlled by the inverted Ref. Due to the complementary switching of the sampler and its dummy, the VCO is always connected to one $C_{sam}$. The VCO capacitive load thus does not change over time and the BFSK effect can be compensated. In reality, the compensation is not perfect due to capacitor mismatch $\Delta C_{sam}$ between the sampler and its dummy. In a CMOS process, the amount of mismatch $\Delta C_{sam}$ scales with the value of $C_{sam}$. It is thus desirable to have a small $C_{sam}$ for a low spur level. However, a smaller $C_{sam}$ means a larger $kT/C_{sam}$ and more sampler noise. The in-band phase noise contributed by a single SSPD sampler has been calculated as (4.16) in Chapter 4. When differential sampling is used, the noise power is doubled since there are now two samplers in the SSPD contributing noise. Nonetheless, the VCO swing is also doubled. It is easy to show that the noise contribution of the SSPD is 3 dB lower than the one calculated in (4.16), resulting in

$$\mathcal{L}_{in-band,SSPD} = \frac{kT}{2C_{sam} \cdot A_{VCO}^2 \cdot f_{ref}}. \tag{5.1}$$

There is thus a tradeoff between the spur level and the in-band phase noise due to the SSPD. When focusing on low phase noise, a good compromise would be reducing the value of  $C_{sam}$  until the SSPD noise becomes a considerable portion of the overall in-band phase noise.
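The noise side of this tradeoff follows directly from (5.1). In the sketch below, the VCO amplitude and the temperature are hypothetical placeholders, and the reference frequency is the 55 MHz class value used in [1]; the point is the trend with $C_{sam}$, not the absolute numbers.

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K


def sspd_inband_phase_noise_dbc(c_sam, a_vco, f_ref, temp=300.0):
    """Eq. (5.1): in-band phase noise of the differential SSPD, in dBc/Hz."""
    return 10.0 * math.log10(K_BOLTZMANN * temp / (2.0 * c_sam * a_vco ** 2 * f_ref))


A_VCO = 0.4      # V, hypothetical VCO amplitude (assumption)
F_REF = 55.25e6  # Hz

for c_sam in (10e-15, 20e-15, 40e-15, 80e-15):
    pn = sspd_inband_phase_noise_dbc(c_sam, A_VCO, F_REF)
    print(f"C_sam = {c_sam * 1e15:4.0f} fF -> {pn:7.2f} dBc/Hz")
```

Each halving of $C_{sam}$ costs about 3 dB of SSPD noise, so the capacitor can be shrunk until this contribution approaches the rest of the in-band noise.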

Besides the BFSK effect, there are also other spur mechanisms induced by the VCO sampling. They will be discussed in more detail in Chapter 6.



Figure 5.3. Schematic and timing diagram of conventional inverter buffer.

## **5.3 Low Power Ref Buffer**

In order to properly sample the high frequency GHz VCO, the sampling clock Ref should have a steep sampling edge. The Ref slew rate should be higher than the VCO slew rate. In most applications, the PLL input is a sine wave XO which often has a much lower slew rate than the VCO since  $f_{XO} \ll f_{VCO}$ . A buffer converting the slow sine wave XO into a steep square wave Ref is thus needed. In the tens-of-MHz XO frequency range, a CMOS inverter buffer is more power efficient than a CML buffer as it mainly consumes dynamic power. The noise of the Ref buffer is critical for achieving low in-band phase noise as the Ref noise is still multiplied by  $N^2$  when transferred to the SSPLL output. To reduce the Ref buffer noise, large size transistors need to be used at the expense of more power consumption. In the SSPLL design in [1], we see that the inverter Ref buffer consumes about 60% of the total loop-components power.

Fig. 5.3 shows the schematic of a conventional inverter buffer, which converts a low slew rate sine wave clock into a high slew rate square wave. Also shown is a simplified timing diagram, with $V_{th,N1}$ and $V_{th,P1}$ the threshold voltages of the NMOS N1 and PMOS P1, and $V_{SP,inv}$ the switching point voltage of the inverter. A key issue of an inverter with a slow input is the "short-circuit" current [3]. Due to the low slew rate of the input signal, the NMOS and PMOS transistors in the inverter conduct simultaneously for a considerable period of time during switching (see Fig. 5.3), causing a direct current path between the supply and ground. This short-circuit current is significant when the inverter input has a much lower slew rate than the inverter output, which is the case in our Ref buffer. For a lightly loaded inverter with a 1.8-V<sub>p-p</sub> 55 MHz sine wave input in 0.18-µm CMOS, the short-circuit current accounts for as much as 90% of the total inverter power in simulation.
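The duration of the simultaneous conduction can be estimated with a simple threshold model. The sketch below assumes a sine wave input centered at $V_{DD}/2$ and treats each transistor as conducting whenever its gate-source voltage exceeds its threshold; the threshold values are hypothetical, and the result is a conduction-time fraction, not a power figure.

```python
import math


def simultaneous_conduction_fraction(vdd, v_amp, vth_n, vth_p_abs):
    """Fraction of the input period in which both N1 and P1 conduct.

    Input: VDD/2 + v_amp * sin(wt). N1 conducts for V_in > vth_n and
    P1 conducts for V_in < vdd - vth_p_abs.
    """
    lo = (vth_n - vdd / 2.0) / v_amp
    hi = (vdd - vth_p_abs - vdd / 2.0) / v_amp
    lo = max(-1.0, min(1.0, lo))
    hi = max(-1.0, min(1.0, hi))
    if hi <= lo:
        return 0.0
    # The sine crosses the [lo, hi] band twice per period (rising and falling).
    return (math.asin(hi) - math.asin(lo)) / math.pi


# 1.8 Vpp sine on a 1.8 V supply; thresholds of 0.5 V are assumptions.
frac = simultaneous_conduction_fraction(vdd=1.8, v_amp=0.9, vth_n=0.5, vth_p_abs=0.5)
print(f"both transistors conduct during {100.0 * frac:.1f} % of the period")
```

With these assumed thresholds, both transistors are on for well over a quarter of every period, which makes the large short-circuit contribution plausible.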



Figure 5.4. Schematic and timing diagram of the proposed low power Ref buffer.

While a sampling clock has two edges, only the edge that corresponds to the switch-off moment is used for sampling, which we call the sampling edge. The other clock edge corresponds to the switch-on moment and will be referred to as the tracking edge, since it is where voltage tracking starts. For low noise sampling, the sampling edge is highly critical and needs to be clean, while the noise on the tracking edge is hardly relevant. Fig. 5.4 shows the proposed Ref buffer, which exploits this property to drastically reduce the buffer power. The idea is to directly convey the critical edge and re-position the other, non-critical edge at a convenient place to avoid short-circuit currents. The buffer core is an inverter with an NMOS N1 and a PMOS P1. The gate of N1 is directly connected to the input as in a conventional inverter, while a timing control circuit (TCC) is inserted between the input and the gate of P1. The TCC consists of two delay cells $\Delta t_1$ and $\Delta t_2$ and a few standard logic gates. It generates a narrow pulse $V_{GP}$ from the input and controls the gate of P1. As shown in Fig. 5.4, $\Delta t_1$ and $\Delta t_2$ are set such that the time when $V_{GP}$ is low (P1 conducts) and the time when the input is higher than the threshold of N1 (N1 conducts) are non-overlapping. Since the buffer runs at $f_{ref}$, which is often low, this timing plan is easy to achieve. In this way, N1 and P1 never conduct simultaneously, thereby eliminating the short-circuit current.

Fig. 5.4 also shows an example of transistor sizing which is used in this design. Since N1 is used to convey the critical sampling edge (the falling edge in this example), its size is kept large to maintain a low sampling edge noise. The TCC and P1 use much smaller sizes to save power as they only add noise to the non-critical tracking edge. The first block Inv1 in the TCC is a conventional inverter and has the slow sine wave as its input. It thus still has a large portion of short-circuit current, but its contribution to the total buffer power is negligible as its size is small. In a practical design, more inverters may be added following the one in Fig. 5.4 to further boost the clock slew rate or invert the critical edge. These inverters can be conventional ones as their inputs are already close to a square wave after the amplification of N1, and thus their short-circuit current is small. Having a high slew rate square wave as input also means that these inverters generate little jitter and contribute little to the overall buffer phase noise. To sum up, the proposed Ref buffer greatly reduces power by drastically reducing the short-circuit current while maintaining the critical edge's noise performance. This buffer can be used not only in the SSPLL but also in other applications where only one clock edge is critical. This reference clock buffer also has the nice feature that the rising and falling edges of the output clock can be tuned separately, which will be explored in Chapter 6. In case the XO is delivered differentially, one phase of the differential XO can be connected to the gate of N1 and the other phase to the source of N1, in a way similar to [4].

Figure 5.5. Low power sub-sampling PLL architecture.

Figure 5.6. Schematic of the VCO and SSPD.



Figure 5.7. Schematic of CP and Pulser.

## 5.4 Design and Implementation

The overall architecture of the low power SSPLL is displayed in Fig. 5.5, which is similar to the one proposed in Chapter 4. Fig. 5.6 shows the schematic of the LC VCO and SSPD. The 2.2 GHz VCO is tail biased and has a double switched differential pair. It has a 50 MHz/V analog tuning gain and a 3-bit capacitor bank for digital tuning to realize a tuning range of more than 10%. No buffer is used between the VCO and the SSPD samplers to save power, while complementary switched dummy samplers reduce the disturbance of the SSPD to the VCO. The samplers use PMOS switches since the VCO DC level is high. The size of $C_{sam}$ is set such that the SSPD contributes 10% of the total in-band phase noise. Here we aim to achieve the same -126 dBc/Hz in-band phase noise as the design in Chapter 4. With $f_{ref}$=55 MHz and $A_{VCO}$=0.4 V, $C_{sam}$ is chosen to be 10 fF, resulting in $\mathcal{L}_{in-band,SSPD}$=-136 dBc/Hz according to (5.1).

Fig. 5.7 displays the schematic of the CP and the Pulser. The CP consists of a differential pair which converts voltage into current. The currents are then mirrored and injected either into the loop filter or into a dumping node $V_{dump}$, depending on the state of the Pulser. The diode connected transistor M1 is added to improve the drain node voltage matching of the current mirror transistors M2 and M3. Since the differential pair in the CP directly interfaces with the VCO through the sampling switch, it is also part of the VCO loading. A dummy differential pair connected to the dummy sampler is added to balance the VCO loading during switching.


Figure 5.8. Chip microphotograph.



Figure 5.9. Measured PLL output phase noise.

## **5.5 Experimental Results**

To verify the presented ideas, a prototype has been fabricated in a 0.18-µm CMOS process and tested in a 24 pin Quad LLP package with a 1.8 V supply. Fig. 5.8 shows a die microphotograph. The active area is 0.4 × 0.5 mm<sup>2</sup>. The reference clock is derived from an off-chip 55.25 MHz SC Sprinter XO from Wenzel Associates. The XO output is attenuated to 1.8 V<sub>p-p</sub> and DC biased using a bias-T before it is fed into the chip.



Figure 5.10. Measured reference spur of one chip while tuning the Ref duty cycle, via changing the DC bias of the XO before it is fed into the PLL chip.



Figure 5.11. Measured reference spur from 20 chips.

Excluding the 50 $\Omega$ measurement buffer and with the frequency locked loop disabled after locking is achieved, the PLL consumes 2.5 mW, of which the loop-components consume 0.7 mW and the VCO 1.8 mW. Of the 0.7 mW loop-components power, 0.4 mW is consumed by the reference buffer and 0.2 mW by the CP. Fig. 5.9 shows the phase noise spectrum of the 2.21 GHz output measured using an Agilent E5501B phase noise measurement setup. The in-band phase noise is -125 dBc/Hz at 200 kHz offset and the out-of-band phase noise is -140 dBc/Hz at 20 MHz offset. The rms jitter integrated from 10 kHz to 100 MHz is 0.16 ps.

Since no isolation buffer is used between the VCO and SSPD in this design, the VCO spur level is a concern. The VCO spurs are thus measured with an Agilent E4440A Spectrum Analyzer. During the measurement, it is discovered that the spur level changes while tuning the Ref duty cycle via changing the DC bias of the XO output, see Fig. 5.10.

This is because there is charge sharing between the VCO and the sampling capacitor, which is the major cause of VCO spur after the BFSK effect is compensated by the dummy sampler. The amount of charge sharing depends on the Ref duty cycle and so does the VCO spur. In addition to spurs at $f_{ref}$ (reference spur), Fig. 5.10 also shows spurs at $2f_{ref}$ away from the VCO frequency. Due to the addition of the complementary switched dummy sampler, the SSPD switching activity is doubled. There are thus also noticeable spurs at $2f_{ref}$, although the worst case spur still occurs at $f_{ref}$. Fig. 5.11 shows the reference spurs measured from 20 chips. For each chip, the VCO reference spur is measured while changing the input clock duty cycle and the worst case spur level is recorded. The spur related issues will be discussed in more detail in Chapter 6.

Fig. 5.12 and Table 5.1 summarize the PLL performance and display a comparison with the state-of-art low jitter PLLs in literature [5-11]. This design has a PLL FOM of -252 dB, which is the best among the compared designs. In parallel to the development of the SSPLL in this work, sub-harmonic injection locking based PLLs were reported [5], [6] which also achieve very good performance. In fact, the normalized in-band phase noise of the injection locked PLLs in [5], [6] is equal to or a few dB better than that of this work. However, this work consumes more than 10 times less power and thus has a better FOM. Also note that we directly used a 55 MHz sine wave XO (low slew rate) as the PLL input, while [5] used a 50 MHz square wave (high slew rate) and [6] used a 1 GHz sine wave (high slew rate). If this work were also designed and measured with a high slew rate input, the jitter could be even lower, since a high slew rate input reduces the Ref buffer noise, which is the dominant source of in-band phase noise in this SSPLL.

Compared with the SSPLL design in [1], the loop-components power is 8x lower; the reference spur is 10 dB lower while the in-band phase noise is only 1 dB worse, which proves the effectiveness of the proposed power reduction techniques. The PLL FOM is 4 dB better than [1], not as much as the improvement of the loop-components power. That is because the VCO is the same as in [1] and has no improvement, while the quality of the VCO design, i.e.,  $FOM_{VCO}$ , accounts for half of the PLL FOM as discussed in Chapter 2. In fact, the VCO is now the limiting factor for the SSPLL performance. The  $FOM_{VCO}$  of this design can be calculated from the measurement results as

$$FOM_{VCO} = -140\,\text{dBc/Hz} + 20 \log \frac{20\,\text{MHz}}{2.2\,\text{GHz}} + 10 \log \frac{1.8\,\text{mW}}{1\,\text{mW}} \approx -178\,\text{dBc/Hz}. \tag{5.2}$$

This number pales in comparison with the -194 dBc/Hz of the state-of-art VCOs [12], [13], partly due to the inferior quality factor of the LC tank in this design. As stated before, this thesis focuses on improving the loop design, not the VCO design. However, if a state-of-art VCO were available to us and replaced the VCO in this design, the overall PLL FOM would be improved by 16/2=8 dB to -260 dB according to (2.35). In theory this means, for instance, that a 100 fs rms jitter can be achieved with only 1 mW power consumption.
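The arithmetic of this paragraph can be retraced with a short sketch. It assumes that the PLL FOM of Chapter 2 has the jitter-power form $10\log[(\sigma_t/1\,\text{s})^2 \cdot (P/1\,\text{mW})]$ and that half of a $FOM_{VCO}$ improvement carries over to the PLL FOM, as stated in the text; the function names are illustrative.

```python
import math

def fom_vco_dbc_hz(pn_dbc_hz, f_offset_hz, f_osc_hz, power_mw):
    """VCO figure of merit, evaluated as in the FOM_VCO expression of this section."""
    return (pn_dbc_hz
            + 20 * math.log10(f_offset_hz / f_osc_hz)
            + 10 * math.log10(power_mw / 1.0))

def jitter_for_pll_fom(fom_db, power_mw):
    """RMS jitter in seconds, assuming FOM = 10*log10[(sigma/1 s)^2 * (P/1 mW)]."""
    return math.sqrt(10 ** (fom_db / 10.0) / power_mw)

# Measured values quoted in the text.
fom_vco_measured = fom_vco_dbc_hz(-140.0, 20e6, 2.2e9, 1.8)

# Gap to the -194 dBc/Hz state-of-art VCOs, using the rounded -178 dBc/Hz.
fom_vco_gap_db = round(fom_vco_measured) - (-194)

# Half of the VCO improvement is assumed to carry over to the PLL FOM.
projected_pll_fom = -252 - fom_vco_gap_db / 2
projected_jitter = jitter_for_pll_fom(projected_pll_fom, 1.0)

print(f"FOM_VCO           = {fom_vco_measured:.1f} dBc/Hz")
print(f"projected PLL FOM = {projected_pll_fom:.0f} dB")
print(f"rms jitter @ 1 mW = {projected_jitter * 1e15:.0f} fs")
```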



Figure 5.12. Jitter and power comparison with state-of-art low jitter PLLs.

|                                                          | This Work          | [1]               | [5]               | [6-chip A]        | [6-chip B]         |
|----------------------------------------------------------|--------------------|-------------------|-------------------|-------------------|--------------------|
| f<sub>out</sub> (GHz)                                    | 2.21               | 2.21              | 3.2               | 20                | 20                 |
| f<sub>ref</sub> (MHz)                                    | 55.25              | 55.25             | 50                | 1000              | 2500               |
| RMS jitter $\sigma_t$ (ps)<br>(integration range, Hz)    | 0.16<br>(10k-100M) | 0.15<br>(10k-40M) | 0.13<br>(100-40M) | 0.11<br>(50k-80M) | 0.048<br>(50k-80M) |
| In-band phase<br>noise (dBc/Hz)                          | -125<br>@200kHz    | -126<br>@200kHz   | -125<br>@200kHz   | -113<br>@1MHz     | -123<br>@1MHz      |
| Normalized in-band<br>phase noise (dBc/Hz<sup>2</sup>)   | -234<br>@200kHz    | -235<br>@200kHz   | -238<br>@200kHz   | -229<br>@1MHz     | -235<br>@1MHz      |
| Ref spur (dBc)<br>(# of samples)                         | -56<br>(#=20)      | -46<br>(#=1)      | -64<br>(#=1)      | -46<br>(#=1)      | -55<br>(#=1)       |
| PLL power P (mW)                                         | 2.5                | 7.6               | 28.6              | 38                | 105                |
| Loop-components<br>power (mW)                            | 0.7                | 5.8               | ?                 | ?                 | ?                  |
| PLL FOM (dB)                                             | -252               | -248              | -243              | -243              | -246               |
| Active area (mm<sup>2</sup>)                             | 0.20               | 0.18              | 0.40              | <0.45             | <0.32              |
| Technology (CMOS)                                        | 0.18-µm            | 0.18-µm           | 0.13-µm           | 90-nm             | 90-nm              |

Table 5.1. Performance summary and comparison with state-of-art PLL designs.
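The FOM and normalized phase noise rows of Table 5.1 can be cross-checked from the jitter, power, phase noise and frequency rows. The sketch below assumes two definitions that are not restated in this chapter: PLL FOM $= 10\log[(\sigma_t/1\,\text{s})^2 \cdot (P/1\,\text{mW})]$ and normalized in-band phase noise $= \mathcal{L}_{in-band} - 20\log N - 10\log(f_{ref}/1\,\text{Hz})$ with $N = f_{out}/f_{ref}$. Both are assumptions here, chosen because they are consistent with the tabulated entries.

```python
import math

def pll_fom_db(jitter_s, power_mw):
    """Jitter-power figure of merit (assumed definition, see lead-in)."""
    return 10 * math.log10(jitter_s ** 2 * power_mw)

def normalized_inband_pn(pn_dbc_hz, f_out_hz, f_ref_hz):
    """In-band phase noise normalized to N^2 and f_ref (assumed definition)."""
    n = f_out_hz / f_ref_hz
    return pn_dbc_hz - 20 * math.log10(n) - 10 * math.log10(f_ref_hz)

# Rows copied from Table 5.1: (jitter in ps, power in mW, in-band PN, f_out, f_ref).
designs = {
    "This Work":  (0.16,  2.5,   -125, 2.21e9, 55.25e6),
    "[1]":        (0.15,  7.6,   -126, 2.21e9, 55.25e6),
    "[5]":        (0.13,  28.6,  -125, 3.2e9,  50e6),
    "[6-chip A]": (0.11,  38.0,  -113, 20e9,   1000e6),
    "[6-chip B]": (0.048, 105.0, -123, 20e9,   2500e6),
}

results = {}
for name, (jitter_ps, power_mw, pn, f_out, f_ref) in designs.items():
    fom = pll_fom_db(jitter_ps * 1e-12, power_mw)
    norm_pn = normalized_inband_pn(pn, f_out, f_ref)
    results[name] = (round(fom), round(norm_pn))
    print(f"{name:11s} FOM = {fom:7.1f} dB, normalized PN = {norm_pn:7.1f} dBc/Hz^2")
```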

## **5.6 Conclusion**

In this chapter we proposed design techniques to reduce the SSPLL loop-components power while maintaining its in-band phase noise performance. A direct VCO sampling scheme is adopted by removing the power consuming buffer between the VCO and the SSPD sampler. Complementary switched dummy samplers are added to keep the VCO spur still below -56 dBc. A modified inverter based Ref buffer is proposed. By using separate gate control for NMOS and PMOS, non-overlapping conduction of the transistors is guaranteed. The direct-current path from the supply to ground is thus eliminated and the Ref buffer power is drastically reduced. The SSPLL designed in a 0.18-µm CMOS process achieves -125 dBc/Hz in-band phase noise at 200 kHz with only 700 µW loop-components power.

## **5.7 References**

- [1] X. Gao, E. Klumperink, M. Bohsali and B. Nauta, "A low noise sub-sampling PLL in which divider noise is eliminated and PD/CP noise is not multiplied by N<sup>2</sup>," *IEEE J. Solid-State Circuits (JSSC)*, pp. 3253-3263, vol. 44, no.12, Dec. 2009.
- [2] X. Gao, E. Klumperink, G. Socci, M. Bohsali and B. Nauta, "A 2.2GHz sub-sampling PLL with 0.16ps<sub>rms</sub> jitter and -125dBc/Hz in-band phase noise at 700μW loop-components power," *IEEE Symposium on VLSI Circuits*, paper 14.1, Jun. 2010.
- [3] H. Veendrick, "Short-circuit dissipation of static CMOS circuitry and its impact on the design of buffer circuits," *IEEE J. Solid-State Circuits*, vol. SC-19, pp. 468–473, Aug. 1984.
- [4] S. M. Louwsma, A. J. M. van Tuijl, M. Vertregt and B. Nauta, "A 1.35 GS/s, 10 b, 175 mW time-interleaved AD converter in 0.13 um CMOS," *IEEE J. Solid-State Circuits*, vol. 43, pp. 778–786, Apr. 2008.
- [5] B. Helal, C.-M. Hsu, K. Johnson and M. H. Perrott, "A low jitter programmable clock multiplier based on a pulse injection-locked oscillator with a highly-digital tuning loop," *IEEE J. Solid-State Circuits*, pp. 1391–1400, May 2009.
- [6] J. Lee and H. Wang, "Study of subharmonically injection-locked PLLs," *IEEE J. Solid-State Circuits*, pp. 1539–1553, May 2009.
- [7] A. M. Terrovitis, M. Mack, K. Singh and M. Zargari, "A 3.2 to 4 GHz, 0.25 um CMOS frequency synthesizer for IEEE 802.11a/b/g WLAN," *IEEE ISSCC Dig. Tech. Papers*, pp. 98–99, Feb. 2004.
- [8] R. C. H. van de Beek, C. S. Vaucher, D. M. W. Leenaerts, E. Klumperink and B. Nauta, "A 2.5–10-GHz clock multiplier unit with 0.22-ps RMS jitter in standard 0.18μm CMOS," *IEEE J. Solid-State Circuits*, vol. 39, no. 11, pp. 1862–1872, Nov. 2004.

- [9] R. Gu, A. Yee, Y. Xie and W. Lee, "A 6.25GHz 1V LC-PLL in 0.13μm CMOS," IEEE Int. Solid-State Circuits Conf. (ISSCC), pp. 594-595, Feb. 2006.
- [10] N. Da Dalt, E. Thaller, P. Gregorius and L. Gazsi, "A Compact Triple-Band Low-Jitter Digital LC PLL With Programmable Coil in 130-nm CMOS," *IEEE J. Solid-State Circuits*, Vol. 40, No. 7, pp.1482-1490, Jul. 2005.
- [11] C. Hsu, M. Z. Straayer and M. H. Perrott, "A low-noise, wide-BW 3.6GHz digital ∆∑ fractional-N frequency synthesizer with a noise-shaping time-to-digital converter and quantization noise cancellation," *IEEE Int. Solid-State Circuits Conf. (ISSCC)*, pp. 340-341, Feb. 2008.
- [12] Z. Li and K. K. O, "A low-phase-noise and low-power multi-band CMOS voltage-controlled oscillator," *IEEE J. Solid-State Circuits*, vol. 40, no. 6, pp. 1296–1302, Jun. 2005.
- [13] A. Mazzanti and P. Andreani, "A 1.4 mW 4.90–5.65 GHz class-C CMOS VCO with an average FoM of 194.5 dBc/Hz," *IEEE Int. Solid-State Circuits Conf. (ISSCC)*, pp. 474–475, Feb. 2008.

## Chapter 6

## **Spur Reduction Techniques for SSPLL**

## **6.1 Introduction**

In Chapters 4 and 5, we have shown that the sub-sampling PLL (SSPLL) is able to achieve very low in-band phase noise with low power. However, the measured reference spurs, -46 dBc in Chapter 4 and -56 dBc in Chapter 5, are relatively high. The reference spur is also an indication of the spectral purity of a clock signal and can be important in some applications. Clock spurs may cause spectral mask violation in transmitters, mix interferers into the band of interest in receivers [1] and translate to deterministic jitter degrading the ADC signal-to-noise ratio. The measurement results in the previous chapters raise the question whether the high spur level is an inherent drawback of the SSPLL. In this chapter, we will analyze the underlying spur mechanisms in an SSPLL and propose techniques to suppress them.

The discussion of the spur mechanisms starts in section 6.2 with the charge pump (CP) mismatch, which is often the major source of the PLL reference spur [2-8]. In classical PLLs, amplitude mismatches in the CP current sources generate CP output-current ripple which is then converted to ripple on the VCO control voltage by the loop filter (LF), resulting in VCO spurs. A small LF bandwidth can be used to suppress the ripple, but at the expense of a lower PLL bandwidth, slower settling time, larger on-chip LF area and more sensitivity of the VCO to pulling [6]. In order to alleviate the tradeoff between low spur and large bandwidth, various design techniques have been proposed to reduce the CP ripple. Examples are CP designs that improve current source matching [2],[8], detect the current source mismatch and then apply analog [4] or digital [5] calibration, or designs that add a sample-and-hold between the CP and the loop filter [6],[7].

In an SSPLL, the CP acts as a transconductor and converts the sampled voltage into current. In other words, the amount of CP output current depends on the amplitude of the sampled voltage and thus the CP is amplitude controlled. In the SSPLL design in Chapter 4 (referred to as [9] hereinafter), a block Pulser is used to switch on/off the CP in order to lower the CP gain and reduce the filter capacitor area. Unlike a conventional CP, the on-time of this pulsed CP does not depend on the phase difference, but is constant. We will show in section 6.2 that this CP is actually insensitive to mismatch. The CP design can thus be largely simplified while still producing small ripple. Although the SSPD and the amplitude controlled CP have already been used in [9], they did not lead to a low spur level there. We will show that this is because the SSPD periodically disturbs the VCO operation during sampling, which actually causes large VCO spurs. The VCO sampling spur mechanisms will be analyzed in section 6.3 and design techniques will be proposed to mitigate them [10]. Unlike the CP, the SSPD disturbs the VCO without going through the LF and hence there is no tradeoff between low SSPD spur and large PLL bandwidth. As a result, a very low reference spur can be achieved while using a high PLL bandwidth. The circuit implementation details of the low spur SSPLL are presented in Section 6.4. Section 6.5 presents the experimental results, showing that a reference spur lower than -80 dBc can be achieved. Finally, Section 6.6 draws conclusions.

Figure 6.1. (a) 3-state PFD and timing controlled CP, (b) conventional low ripple CP implementation.

### 6.2 Spur due to Charge Pump

We will now first discuss the conventional CP and then the amplitude controlled CP for the SSPLL, to explain why the latter is beneficial in terms of output current ripple generation.

#### 6.2.1 Conventional CP

Fig. 6.1(a) shows the schematic of the conventional phase frequency detector (PFD) and CP. During operation, the PFD compares the phase of the divided-down VCO to the phase of Ref and generates two signals UP and DN to control the CP. It converts the VCO phase error into the on-time difference  $\tau_{UP}$ - $\tau_{DN}$  between the CP up-current-source I<sub>UP</sub> and down-current-source I<sub>DN</sub>. In this conventional CP, I<sub>UP</sub> and I<sub>DN</sub> have a *variable on-time* but a *constant amplitude* fixed by biasing. When the PLL is phase locked, the net charge provided by the CP should be zero. To maintain the steady state locking condition, the following equation must be satisfied:

$$I_{UP} \cdot \tau_{UP} = I_{DN} \cdot \tau_{DN} \,. \tag{6.1}$$

In case there is mismatch between the amplitudes of  $I_{UP}$  and  $I_{DN}$ , we have  $I_{UP} \neq I_{DN}$  and  $\tau_{UP} \neq \tau_{DN}$ . One of the CP current sources thus has to be on for a longer time in order to satisfy (6.1). This causes CP output current ripple as shown in Fig. 6.1(a), which is then converted to ripple on the VCO control voltage by the LF. If  $i_{CP,fref}$  is the amplitude of the fundamental component of the CP output current ripple, the corresponding VCO reference spur  $SP_{fref,CP}$  can be calculated as [1]:

$$SP_{fref,CP} = 20\log \frac{i_{CP,fref} \cdot |F_{LF}(j2\pi \cdot f_{ref})| \cdot K_{VCO}/2\pi}{2f_{ref}} \tag{6.2}$$

where $F_{LF}(s)$ is the LF trans-impedance transfer function and $K_{VCO}$ is the VCO tuning gain in rad/(s·V), so that $K_{VCO}/2\pi$ is the tuning gain in Hz/V. For the commonly used second order RC filter shown in Fig. 6.1(b), we have:

$$F_{LF}(s) = \frac{1}{(C_1 + C_2)s} \cdot \frac{1 + R_1 C_1 s}{1 + R_1 \cdot \frac{C_1 C_2}{C_1 + C_2} \cdot s} = \frac{1}{(C_1 + C_2)s} \cdot \frac{1 + s/2\pi f_{zero}}{1 + s/2\pi f_{pole}} \tag{6.3}$$

where $f_{zero}=1/(2\pi R_1C_1)$ and $f_{pole}=1/[2\pi R_1C_1C_2/(C_1+C_2)]$ are the LF zero and pole frequencies.

In most designs, we have $f_{zero} < f_{pole} \ll f_{ref}$ and $C_1 \gg C_2$. The VCO spur can then be approximated using (6.2) and (6.3) as:

$$SP_{fref,CP} \approx 20\log \frac{i_{CP,fref} \cdot R_1 \cdot K_{VCO}}{4\pi f_{ref}} + 20\log \frac{f_{pole}}{f_{ref}} \tag{6.4}$$
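As a sanity check on the approximation, the sketch below evaluates the exact expression (6.2) with the filter of (6.3) and compares it with (6.4). All component values are hypothetical and only chosen to satisfy $f_{zero} < f_{pole} \ll f_{ref}$ and $C_1 \gg C_2$; $K_{VCO}$ is taken in rad/(s·V).

```python
import math

def lf_transimpedance(f_hz, r1, c1, c2):
    """Second order RC loop filter trans-impedance F_LF(j*2*pi*f), see (6.3)."""
    s = 1j * 2 * math.pi * f_hz
    return (1 / ((c1 + c2) * s)) * (1 + r1 * c1 * s) / (1 + r1 * (c1 * c2 / (c1 + c2)) * s)

def spur_exact_dbc(i_ripple, f_ref, k_vco, r1, c1, c2):
    """Reference spur from (6.2) using the exact filter response."""
    z = abs(lf_transimpedance(f_ref, r1, c1, c2))
    return 20 * math.log10(i_ripple * z * k_vco / (2 * math.pi) / (2 * f_ref))

def spur_approx_dbc(i_ripple, f_ref, k_vco, r1, c1, c2):
    """Reference spur from the approximation (6.4)."""
    f_pole = 1 / (2 * math.pi * r1 * c1 * c2 / (c1 + c2))
    return (20 * math.log10(i_ripple * r1 * k_vco / (4 * math.pi * f_ref))
            + 20 * math.log10(f_pole / f_ref))

# Hypothetical example values, not taken from any design in this thesis.
I_RIPPLE = 1e-6                  # fundamental CP ripple current (A)
F_REF = 55.25e6                  # reference frequency (Hz)
K_VCO = 2 * math.pi * 50e6       # tuning gain (rad/s/V), i.e. 50 MHz/V
R1, C1, C2 = 10e3, 100e-12, 5e-12

exact_db = spur_exact_dbc(I_RIPPLE, F_REF, K_VCO, R1, C1, C2)
approx_db = spur_approx_dbc(I_RIPPLE, F_REF, K_VCO, R1, C1, C2)
print(f"exact (6.2)+(6.3): {exact_db:.2f} dBc, approximate (6.4): {approx_db:.2f} dBc")
```

For these values the two results differ by a fraction of a dB, the residual coming mainly from replacing $C_1/(C_1+C_2)$ by 1.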

Defining a CP feedback gain $\beta_{CP}$ as the gain from the VCO output to the CP output, as in Chapter 4, the PLL open loop bandwidth $f_c$ can be expressed as:

$$f_c \approx \frac{\beta_{CP} \cdot R_1 \cdot K_{VCO}}{2\pi} \,. \tag{6.5}$$

Substituting (6.5) into (6.4) yields:

$$SP_{fref,CP} \approx 20\log\frac{f_{pole}/f_c}{2} + 20\log\frac{i_{CP,fref}}{\beta_{CP}} + 40\log\frac{f_c}{f_{ref}} \quad . \tag{6.6}$$

Therefore, to reduce the CP induced VCO spur, we can: 1) adopt a small $f_{pole}/f_c$, but this is often limited by the phase margin requirement; 2) use a large $\beta_{CP}$, or in other words a small $R_1 \cdot K_{VCO}$ for a given $f_c$, but this increases the filter capacitor area or reduces the VCO analog tuning range; 3) reduce the CP output current ripple $i_{CP,fref}$; 4) use a small loop-bandwidth-to-reference-frequency ratio $f_c/f_{ref}$ for more ripple suppression. For a given $f_{ref}$, there is thus a trade-off between low VCO spur and large $f_c$. For a given spur requirement, a CP design with lower ripple enables the use of a higher $f_c$, which is often desired as it offers faster settling and reduces the on-chip loop filter area and the sensitivity of the VCO to pulling. Fig. 6.1(b) shows a classical low ripple CP design [8]. The current sources are implemented with cascoded transistors to boost the output impedance and improve matching. Another factor which contributes to CP current ripple is the charge sharing between the parasitic capacitances at the current sources' drain nodes d1 and d2 and the LF capacitors, which occurs if their voltages are not equal when they are connected during CP switching. The conventional CP in Fig. 6.1(b) uses a current steering topology, where $I_{UP}$ and $I_{DN}$ are either connected to the LF or dumped to $V_{dump}$. An operational amplifier acting as a unity gain buffer sets $V_{dump} = V_{LF}$. In this way, $I_{UP}$ and $I_{DN}$ are kept on all the time and the voltages on d1 and d2 are kept constant during CP switching, thereby minimizing the LF-CP charge sharing.
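The bandwidth trade-off in (6.6) can be made concrete with a few numbers. The following sketch evaluates (6.6) for several $f_c/f_{ref}$ ratios; the values of $f_{pole}/f_c$ and $i_{CP,fref}/\beta_{CP}$ are hypothetical placeholders.

```python
import math

def cp_spur_dbc(f_pole_over_fc, ripple_over_beta, fc_over_fref):
    """Approximate CP induced reference spur according to (6.6)."""
    return (20 * math.log10(f_pole_over_fc / 2.0)
            + 20 * math.log10(ripple_over_beta)
            + 40 * math.log10(fc_over_fref))

F_POLE_OVER_FC = 4.0       # hypothetical, in practice set by the phase margin target
RIPPLE_OVER_BETA = 1e-2    # hypothetical i_CP,fref / beta_CP (rad)

spurs = {}
for divisor in (10, 20, 40):
    spurs[divisor] = cp_spur_dbc(F_POLE_OVER_FC, RIPPLE_OVER_BETA, 1.0 / divisor)
    print(f"f_c = f_ref/{divisor}: spur = {spurs[divisor]:.1f} dBc")
```

Each halving of $f_c/f_{ref}$ lowers the spur by about 12 dB, which is the $40\log(f_c/f_{ref})$ term at work, while a tenfold reduction of the ripple current buys 20 dB at unchanged bandwidth.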

#### 6.2.2 Low Spur CP Using Sub-sampling

Fig. 6.2(a) shows the top level schematic of the SSPD/CP [9]. During operation, the SSPD directly samples the high frequency VCO with the low frequency Ref without using a frequency divider. It detects the phase difference between the VCO and the Ref sampling edge and converts it into a sampled voltage difference  $(V_{sam+}-V_{sam-})$ , which is then used to control the amplitude of  $I_{UP}$  and  $I_{DN}$ . A block Pulser generates a pulse *Pul*, non-overlapping with Ref, and switches on/off  $I_{UP}$  and  $I_{DN}$  simultaneously. This Pulser controls the CP gain and also functions as the slave track-and-hold for the VCO sampling. Therefore in this CP,  $I_{UP}$  and  $I_{DN}$  have a *variable amplitude* but a *constant on-time* equal to the on-time of the Pulser  $\tau_{pul}$ . Assuming ideal switching, the following equation must be satisfied to meet the steady state locking condition of zero net CP output charge:



Figure 6.2. (a) SSPD and amplitude controlled CP, (b) proposed low ripple CP design.

$$I_{UP} \cdot \tau_{pul} = I_{DN} \cdot \tau_{pul} \implies I_{UP} = I_{DN}. \tag{6.7}$$

In other words, $I_{UP}$ and $I_{DN}$ must be equal and the $I_{UP}$ and $I_{DN}$ amplitude mismatch is eliminated<sup>1</sup>. In practice, there is always mismatch between $I_{UP}$ and $I_{DN}$ if they are implemented with MOS transistors. However, the SSPLL loop tunes $V_{sam+}$ and $V_{sam-}$ until the amplitudes of $I_{UP}$ and $I_{DN}$ match, by shifting the sampling/locking point away from the ideal point ($V_{sam+}=V_{sam-}$, VCO zero-crossing), see Fig. 6.2(a). So the mismatch between the current sources' transistors still causes static phase error as in a conventional CP, but here it does not generate CP output current ripple.
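The difference between the two locking conditions (6.1) and (6.7) can be illustrated with an idealized waveform model. The sketch below computes the amplitude of the $f_{ref}$ component of the CP output current for rectangular current pulses. It assumes ideal switching and, for the conventional CP, that the UP and DN pulses end together at the PFD reset; the 5% current mismatch and the pulse widths are hypothetical.

```python
import cmath
import math

def pulse_fundamental(amplitude, t_start, t_stop, period):
    """Complex Fourier coefficient at 1/period of a rectangular current pulse."""
    w = 2 * math.pi / period
    return (amplitude * (cmath.exp(-1j * w * t_stop) - cmath.exp(-1j * w * t_start))
            / (-1j * w * period))

def conventional_cp_ripple(i_up, i_dn, tau_dn, period):
    """Ripple amplitude at f_ref for a timing controlled CP in lock."""
    tau_up = i_dn * tau_dn / i_up          # on-time forced by (6.1)
    c = (pulse_fundamental(i_up, -tau_up, 0.0, period)
         - pulse_fundamental(i_dn, -tau_dn, 0.0, period))
    return 2 * abs(c)

def amplitude_controlled_cp_ripple(i_up, i_dn, tau_pul, period):
    """Ripple amplitude at f_ref when both sources share the same on-time."""
    c = (pulse_fundamental(i_up, 0.0, tau_pul, period)
         - pulse_fundamental(i_dn, 0.0, tau_pul, period))
    return 2 * abs(c)

T_REF = 1 / 55.25e6
# Conventional CP: 5% amplitude mismatch (hypothetical), 200 ps minimum on-time.
ripple_conventional = conventional_cp_ripple(100e-6, 105e-6, 200e-12, T_REF)
# SSPLL CP in lock: the loop has tuned the amplitudes to be equal, as in (6.7).
ripple_sspll = amplitude_controlled_cp_ripple(100e-6, 100e-6, 1e-9, T_REF)

print(f"conventional CP ripple at f_ref: {ripple_conventional * 1e9:.2f} nA")
print(f"amplitude controlled CP ripple at f_ref: {ripple_sspll * 1e9:.2f} nA")
```

In this idealized model the amplitude controlled CP produces no net output current at any instant, so its ripple vanishes, whereas the conventional CP delivers zero net charge but a nonzero current waveform.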

Fig. 6.2(b) shows the proposed low ripple CP design, which is much simpler than the conventional one in Fig. 6.1(b). Since the $I_{UP}$ and $I_{DN}$ amplitude mismatch will be tuned out by the PLL loop, the current sources' output impedance is not an issue and single transistors are used, which saves voltage headroom. While the conventional CP needs a unity-gain buffer to keep $V_{dump}=V_{LF}$ and minimize CP-LF charge sharing, we discovered that here this can be achieved by just connecting a capacitor $C_{dump}$ to the current dumping node, as explained below. In steady state, the net charge into the LF and into $C_{dump}$ should both be zero. Since $I_{UP}$ and $I_{DN}$ have equal on-time in both the 'connected to LF' and 'connected to $C_{dump}$' cases, they must also have equal amplitude in both cases. This condition is met only when $V_{dump}=V_{LF}$, where the finite current source output impedance is actually the equalizing mechanism. When the drain nodes of the PMOS current source $I_{UP}$ and the NMOS current source $I_{DN}$ are connected together, there is only one drain node voltage satisfying $I_{UP}=I_{DN}$ due to the finite current source output impedance.

<sup>1</sup> This assumes ideal current source switches. In practice, there is also mismatch between the switches. Due to the finite rise and fall time of Pul, this causes mismatch in the $I_{UP}$ and $I_{DN}$ switch-on time and thus mismatch in the $I_{UP}$ and $I_{DN}$ amplitudes. If this is the limiting factor for VCO spur, the Pulser and the two switches which act as the slave track-and-hold for VCO sampling can be removed and a second switch-capacitor circuit can be added to the SSPD instead. The CP is then always connected to the LF and no switching is needed. However, we will see that the CP is no longer the major spur source in this SSPLL. It is therefore still beneficial to keep the Pulser as it simplifies the SSPD design and can be used to control the CP gain [9].

Figure 6.3. (a) Simple model for VCO sampling, (b) VCO sampling with dummy sampler.

## 6.3 Spur due to VCO Sampling and Techniques to Reduce It

In the previous section, we have shown that the amplitude controlled CP in the SSPLL is inherently insensitive to mismatch and produces small ripple. In the design of [9], a CP based on the same principle has been used. However, a rather poor -46 dBc reference spur was measured. Research shows that this is because the SSPD disturbs the VCO operation, via periodically changing the VCO capacitive load, charge injection from the sampling switch to the VCO and charge sharing between the VCO tank and the sampling capacitor. In the sub-sections below, we will analyze these VCO sampling spur mechanisms and propose techniques to suppress them. We will use a simplified diagram as shown in Fig. 6.3(a), where an ideal LC tank is directly sampled by Ref via a switch-and-capacitor SSPD.

In the real design, a buffer will be added between the VCO and SSPD for better isolation and lower spur. To simplify the analysis and gain insight, we will first ignore the buffer and discuss its effect later.

#### 6.3.1 BFSK Effect

For an ideal sampler, the sampling clock should be a Dirac pulse with an infinitesimal duration. As this requires an impractical clock with virtually zero duty cycle, a practical sampler is usually implemented using a track-and-hold driven by a block waveform with a more practical duty cycle, as in Fig. 6.3(a). When Ref turns on the switch, the sampling capacitor $C_{sam}$ is connected to the VCO and becomes part of the VCO loading. When Ref turns off the switch, $C_{sam}$ is disconnected and the VCO is not loaded by $C_{sam}$. Therefore, the periodic switching of the sampler at frequency $f_{ref}$ modulates $f_{VCO}$ in a way similar to binary frequency shift keying (BFSK), as shown in Fig. 6.3(a). The VCO waveform in this case can be expressed as:

$$v_{VCO}(t) = A_{VCO} \cos\left[2\pi f_{VCO,avg} t + \int 2\pi \Delta f_{VCO}(t)\, dt\right] \tag{6.8}$$

where  $A_{VCO}$  is the VCO amplitude and  $f_{VCO,avg}$  is the average VCO frequency which is locked to  $N \cdot f_{ref}$  by the PLL.  $\Delta f_{VCO}(t)$  is the difference between the instantaneous VCO frequency and  $f_{VCO,avg}$  and has the same shape as the Ref waveform. Using Fourier transform, the fundamental harmonic content of  $\Delta f_{VCO}(t)$  can be calculated as:

$$\Delta f_{VCO}(t) = \frac{2}{\pi} \cdot \Delta f_{VCO, p-p} \cdot \sin(\pi \cdot D_{ref}) \cdot \cos(2\pi f_{ref}t) \tag{6.9}$$

where  $D_{ref}$  is the Ref duty cycle and  $\Delta f_{VCO,p-p}$  is the peak-to-peak amplitude of  $\Delta f_{VCO}(t)$ . Assuming  $C_{sam} \ll C_{tank}$ , we have

$$\Delta f_{VCO,p-p} \approx \frac{C_{sam}}{2C_{tank}} \cdot f_{VCO,avg} = \frac{C_{sam}}{2C_{tank}} \cdot N \cdot f_{ref}. \tag{6.10}$$

Substituting (6.9) and (6.10) into (6.8), the VCO spur at  $f_{ref}$  offset, i.e., the VCO reference spur can be derived as:

$$SP_{fref,BFSK} = 20\log\left[\sin(\pi \cdot D_{ref}) \cdot \frac{N}{2\pi} \cdot \frac{C_{sam}}{C_{tank}}\right]. \tag{6.11}$$

When there is a buffer between the VCO and SSPD as in [9],  $C_{sam}$  in (6.11) should be replaced by the effective capacitance-change seen by the VCO due to Ref switching.

Equation (6.11) indicates that the BFSK effect induced reference spur varies with  $\sin(\pi D_{ref})$ , which can be used to verify whether it is the dominant spur source.
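A brief numerical sketch of (6.9) and (6.11) follows. It first checks the $(2/\pi)\sin(\pi D_{ref})$ factor by integrating a rectangular wave numerically, then evaluates (6.11) over duty cycle. $N=40$ and $C_{sam}=10$ fF match the Chapter 5 design; the tank capacitance `C_TANK` is a hypothetical value, since it is not given in the text.

```python
import math

def square_wave_fundamental(duty, samples=100_000):
    """Numerically estimate the fundamental amplitude of a rectangular wave
    with unit peak-to-peak amplitude and the given duty cycle (period = 1)."""
    acc = 0.0
    for k in range(samples):
        t = (k + 0.5) / samples - 0.5      # midpoint sampling over one period
        if abs(t) < duty / 2.0:            # the wave is high around t = 0
            acc += math.cos(2 * math.pi * t)
    return 2.0 * acc / samples

def bfsk_spur_dbc(duty, n, c_sam, c_tank):
    """BFSK induced reference spur according to (6.11)."""
    return 20 * math.log10(math.sin(math.pi * duty) * n / (2 * math.pi) * c_sam / c_tank)

N_DIV = 40          # f_VCO / f_ref for 2.21 GHz and 55.25 MHz
C_SAM = 10e-15      # sampling capacitor of the Chapter 5 design
C_TANK = 1e-12      # hypothetical tank capacitance

numeric = square_wave_fundamental(0.3)
analytic = (2 / math.pi) * math.sin(math.pi * 0.3)
print(f"fundamental at D = 0.3: numeric {numeric:.4f}, (6.9) gives {analytic:.4f}")

for duty in (0.1, 0.3, 0.5):
    print(f"D_ref = {duty:.1f}: spur = {bfsk_spur_dbc(duty, N_DIV, C_SAM, C_TANK):.1f} dBc")
```

The spur is largest at 50% duty cycle and drops as the duty cycle moves away from it, which is the signature used in the measurement to identify the BFSK effect.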



Figure 6.4. (a) Schematic and timing diagram of inverter buffer, (b) measured change of reference spur level with buffer input bias from the design in [9].

In [9], inverters as shown in Fig. 6.4 (a) are used to convert the sine wave crystal oscillator (XO) into a steep square wave Ref. Now, the XO output is DC biased to  $V_{DC,in}$  with an offchip bias-T and  $D_{ref}$  can be tuned by tuning  $V_{DC,in}$ . Fig. 6.4(b) shows the measured reference spur variations of the design of [9] while tuning  $V_{DC,in}$ . The shape matches well with the simulated 20log  $\sin(\pi \cdot D_{ref})$ . We can conclude here that the BFSK effect is the major cause of the poor reference spur in [9].

In order to suppress the BFSK effect, a complementary switched dummy sampler can be added (see Fig. 6.3(b)) as discussed in Chapter 5. Due to the complementary switching of the sampler and its dummy, the VCO is always connected to one  $C_{sam}$ . The VCO capacitive load thus does not change over time and the BFSK effect is compensated. In reality, this compensation is not perfect due to capacitor mismatch between the sampler and its dummy. Since the mismatch in the sampling capacitor  $\Delta C_{sam}$  is proportional to the square root of  $C_{sam}$ , (6.11) becomes

$$SP_{fref,BFSK} = 20\log[\sin(\pi \cdot D_{ref}) \cdot \frac{N}{2\pi} \cdot \frac{A_C \sqrt{2C_{sam}}}{C_{tank}}]$$
(6.12)

where $A_C$ is a process constant describing the matching property of the sampling capacitor. The $\sqrt{2}$ factor arises because it is the mismatch between two $C_{sam}$. It is thus desirable to have a small $C_{sam}$ for a low spur level. However, a smaller $C_{sam}$ means a larger $kT/C_{sam}$ and more sampler noise [9]. There is thus a tradeoff between the spur level and the in-band phase noise due to the SSPD.
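The tradeoff can be made concrete with a small sketch that evaluates (6.12) next to the $kT/C_{sam}$ sampling noise. The value of $A_C$ (in units of $\sqrt{\text{F}}$), the tank capacitance and the temperature are hypothetical placeholders, not values from this design.

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K

def mismatch_spur_dbc(duty_cycle, n_div, c_sam, c_tank, a_c):
    """Residual BFSK spur with a dummy sampler, according to (6.12)."""
    delta_c = a_c * math.sqrt(2 * c_sam)   # mismatch between two C_sam
    ratio = (math.sin(math.pi * duty_cycle)
             * n_div / (2 * math.pi)
             * delta_c / c_tank)
    return 20 * math.log10(ratio)

def ktc_noise_vrms(c_sam, temp_k=300.0):
    """RMS sampled noise voltage sqrt(kT/C_sam)."""
    return math.sqrt(K_BOLTZMANN * temp_k / c_sam)

A_C = 1e-9        # hypothetical matching constant (sqrt(F))
C_TANK = 0.5e-12  # hypothetical tank capacitance (F)

for c_sam in (10e-15, 40e-15, 160e-15):
    spur = mismatch_spur_dbc(0.5, 40, c_sam, C_TANK, A_C)
    noise = ktc_noise_vrms(c_sam)
    print(f"C_sam = {c_sam * 1e15:5.0f} fF: spur = {spur:6.1f} dBc, "
          f"kT/C noise = {noise * 1e3:.2f} mV rms")
```

Under these assumptions, every fourfold increase in $C_{sam}$ raises the residual spur by about 6 dB while halving the rms sampling noise voltage.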



Figure 6.5. Conceptual illustration of (a) the case of minimum charge sharing, (b) the case of maximum charge sharing; (c) amount of charge sharing when the relative position of the Ref falling edge and VCO zero-crossing changes.

### 6.3.2 Charge Sharing/Injection

Figure 6.6. Schematic and timing diagram of the proposed duty cycle controlled Ref buffer.

Apart from the BFSK effect, the VCO sampling activity also brings two other mechanisms which disturb the VCO operation, namely charge injection from the sampling switches to the VCO and charge sharing between the VCO and $C_{sam}$. While the former can be canceled by adding dummy switches [6], [7], the latter needs more effort to deal with. The VCO-$C_{sam}$ charge sharing occurs because the voltages on $C_{sam}$ and the VCO tank capacitor may not be equal when they are connected at the switch-on moment, which can be explained using Fig. 6.5. Without loss of generality, we assume that the sampling switch is on when Ref is low and off when Ref is high (PMOS switches are used in the design for practical reasons). The Ref rising edge is then the sampling edge, i.e., the moment of switch-off where holding starts and the voltage is sampled. The Ref falling edge is the tracking edge, i.e., the moment of switch-on where tracking starts. After the PLL achieves locking, the Ref sampling edge is aligned with a VCO zero-crossing. The voltage on $C_{sam}$ at the switch-on moment is then well-defined and equal to the VCO DC voltage: $V_{sam,on!}=V_{VCO,DC}$, where the symbol '!' is used to stress the specific moment in time. In contrast, the voltage on the VCO tank capacitor at the switch-on moment $V_{VCO,on!}$ depends on the position of the Ref tracking edge which is ill-defined<sup>2</sup>. When the Ref tracking edge occurs at the VCO zero-crossings as shown in Fig. 6.5(a), we have $V_{VCO,on!}=V_{sam,on!}=V_{VCO,DC}$ and hence no VCO-$C_{sam}$ charge sharing. When the Ref tracking edge occurs at the VCO peaks as shown in Fig. 6.5(b), we have $V_{VCO,on!}=V_{sam,on!}-A_{VCO}$ and maximum charge sharing. Using the simplified model in Fig. 6.3(a) and assuming $C_{sam} \ll C_{tank}$, the amount of charge sharing can be calculated as:

$$\Delta q \approx (V_{VCO,on!} - V_{VCO,DC}) \cdot C_{sam} = A_{VCO} \cos(2\pi f_{VCO} \cdot \Delta t_{track-VCO}) \cdot C_{sam}$$
(6.13)

When the relative position of the Ref tracking edge and VCO zero-crossing  $\Delta t_{track-VCO}$  changes,  $\Delta q$  follows the VCO waveform and is periodic as shown in Fig. 6.5(c). Since more charge sharing means more disturbance to the VCO, qualitatively we can expect the VCO spur due to charge sharing to vary in a periodic pattern when we change  $\Delta t_{track-VCO}$ . This is already observed in Chapter 5 and will be discussed further in the measurement part in Section 6.5.
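The two limiting cases of Fig. 6.5 can be checked with the first equality of (6.13). In the sketch below, the function name, the VCO DC level and the VCO amplitude are hypothetical; the 10 fF value for $C_{sam}$ is the one quoted later in Section 6.4.2.

```python
def shared_charge(v_vco_on, v_vco_dc, c_sam):
    """Charge exchanged at switch-on, first equality of (6.13), in coulombs."""
    return (v_vco_on - v_vco_dc) * c_sam

V_VCO_DC = 0.9   # hypothetical VCO DC voltage (V)
A_VCO = 0.4      # hypothetical VCO amplitude (V)
C_SAM = 10e-15   # sampling capacitance (F)

# Fig. 6.5(a): tracking edge at a VCO zero-crossing, V_VCO,on! = V_VCO,DC
q_min = shared_charge(V_VCO_DC, V_VCO_DC, C_SAM)

# Fig. 6.5(b): tracking edge at a VCO peak, V_VCO,on! = V_VCO,DC - A_VCO
q_max = shared_charge(V_VCO_DC - A_VCO, V_VCO_DC, C_SAM)

print(f"minimum charge sharing: {q_min:.2e} C")
print(f"maximum charge sharing: {abs(q_max):.2e} C")
```

The magnitude is bounded by $A_{VCO} \cdot C_{sam}$, so a smaller sampling capacitor also limits the worst-case disturbance.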

It is worth noting that, in contrast to the case with the CP, all the aforementioned SSPD spur mechanisms disturb the VCO without going through the PLL loop filter. In other words, the loop filter renders no filtering for the SSPD caused spur and there is no tradeoff between low (SSPD caused) spur and high PLL bandwidth.

<sup>2</sup> It is determined by the distance between the two Ref edges, i.e., by the Ref duty cycle, which is uncontrolled at this stage.



Figure 6.7. Block diagram of the low spur PLL.

### 6.3.3 Low Spur SSPLL Architecture

From the previous section, it is clear that if we can tune the Ref tracking edge such that it is also aligned to a VCO zero-crossing, there is ideally no VCO-$C_{sam}$ charge sharing. For the SSPLL, the timing of the Ref sampling edge is highly critical while the tracking edge is hardly relevant. It is thus desired to leave the sampling edge alone while tuning the tracking edge. With the simple inverter Ref buffer in Fig. 6.4(a), the Ref falling edge can be tuned by tuning $V_{DC,in}$ but this also changes the timing of the Ref rising edge. Fig. 6.6 shows the modified inverter buffer proposed in Chapter 5 which can solve this problem. As explained in Chapter 5, $\Delta t_1$ and $\Delta t_2$ are set such that the conduction times of P1 and N1 are non-overlapping. Therefore, the Ref rising edge is defined by XO via N1 while the Ref falling edge is independently defined by $V_{GP}$ via P1 (and the inverter thereafter). The Ref falling edge can then be tuned by tuning $\Delta t_1$, without affecting the Ref rising edge.

In order to align the Ref falling edge with the VCO zero-crossing, we also need a phase detector to detect the phase difference between them. The dummy sampler in Fig. 6.3(b) serves this purpose well since it operates in a complementary way and uses the $\overline{\text{Ref}}$ rising edge, i.e., the Ref falling edge, as its sampling edge. Fig. 6.7 shows the proposed low spur PLL architecture. The core is a SSPLL similar to the one in [9]. It uses a SSPD that utilizes the Ref rising edge to sample the VCO and thus aligns the Ref rising edge with a VCO zero-crossing.

On top of the SSPLL, a sub-sampling DLL (SSDLL) is added which uses the same SSPD/CP as the SSPLL, but its sampling clock $\overline{\text{Ref}}$ is the inverse of Ref. A transmission gate compensates the inverter delay. The SSDLL thus uses the $\overline{\text{Ref}}$ rising edge to sample the VCO and aligns the $\overline{\text{Ref}}$ rising edge, i.e., the Ref falling edge, to the VCO zero-crossing. Now, both the Ref rising and falling edges are aligned with the VCO zero-crossings and the condition for no VCO-$C_{sam}$ charge sharing is achieved. Moreover, the SSPD/CP in the SSDLL acts as a dummy for the SSPD/CP in the SSPLL which compensates the BFSK effect and cancels the charge injection from the sampling switches to the VCO. Therefore, all three aforementioned SSPD related spur mechanisms are largely suppressed. Since the SSDLL tuning only affects the timing of the Ref falling edge, which is the non-critical edge for the SSPLL, it will neither disturb the SSPLL operation nor add noise to the SSPLL output.

For simplicity, the above spur analysis assumed that the SSPD is directly connected to the VCO. In practice, buffers can be added between the SSPD and VCO to provide isolation. However, practical buffers have limited isolation due to e.g. parasitic capacitors. The SSPD will still disturb the VCO via parasitic paths and the insights developed for SSPD spur mechanisms in the case of no buffer remain useful design guidelines. The proposed techniques provide extra spur reduction in addition to the use of buffering, and thus relax the buffering needs while achieving a certain spur level. This saves power as buffers running at  $f_{VCO}$  are power consuming. In applications where moderate spur level is tolerable, this power advantage can be exploited to its maximum by removing buffering for isolation completely as we did in Chapter 5. In the design described in this chapter we do use a buffer in order to demonstrate very low spur.

## 6.4 Design and Implementation

### 6.4.1 VCO

Fig. 6.8 shows the schematic of the VCO, which is the same as the one used in Chapter 5. It is a tail-biased VCO with a double switch pair. The inductor has a value of 9 nH<sup>3</sup>. The VCO has a 50 MHz/V analog tuning gain and a 3-bit digitally controlled capacitor bank to increase the frequency tuning range and overcome process spread. It draws 1 mA from a 1.8 V supply.

<sup>3</sup> The inductor used here has a large value. To lower the spur level, a smaller coil could be used so that the tank capacitor can be larger, which reduces the sensitivity of the VCO to the SSPD spur mechanisms.



Figure 6.8. Schematic of the VCO.



Figure 6.9. Schematic of the SSPD/CP with Pulser.



Figure 6.10. Schematic of the SSDLL.

### 6.4.2 SSPD/CP with Pulser

Fig. 6.9 shows the schematic of the SSPD/CP with Pulser. Aiming at very low spur, a 2-stage CML inverter is used as a buffer to isolate the VCO from the SSPD. The sampling capacitor in the SSPD has a value of 10 fF. A 2 k$\Omega$ passive resistance $R_{sam}$ is added in series with the MOS switch on the shared path of the SSDLL and SSPLL, which serves two purposes. First, because $C_{sam}$ is charged and discharged by the MOS switch, the on-resistance of the MOS switch plays a role in the transient behavior. By setting the value of $R_{sam}$ to be larger than the on-resistance of the MOS switch, the overall on-resistance will be governed by $R_{sam}$. Since $R_{sam}$ is shared, the mismatch between the on-resistance of the two SSPDs is reduced, leading to a better matching in the SSPD RC constant. Secondly, the sine-wave VCO becomes more like a square wave after the CML buffer, which reduces the linear range of the SSPD. The added $R_{sam}$ together with $C_{sam}$ also forms a low pass filter and brings the waveform back to sine-wave like before it is sampled by the SSPD. Since the noise contribution of the SSPD is governed by $kT/C$, adding $R_{sam}$ will not increase the SSPD noise.
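The last point can be illustrated with a first-order RC model. The sketch below is an idealization that ignores the switch on-resistance and all parasitics; the function names and the 300 K temperature are assumptions. It computes the corner frequency of $R_{sam}C_{sam}$ and shows that the integrated thermal noise on $C_{sam}$ does not depend on the resistor value.

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K

def rc_cutoff_hz(r_ohm, c_farad):
    """-3 dB frequency of a first-order RC low-pass filter."""
    return 1.0 / (2 * math.pi * r_ohm * c_farad)

def sampled_noise_vrms(r_ohm, c_farad, temp_k=300.0):
    """RMS thermal noise on C: resistor noise PSD times RC noise bandwidth."""
    psd = 4 * K_BOLTZMANN * temp_k * r_ohm   # V^2/Hz
    noise_bw = 1.0 / (4 * r_ohm * c_farad)   # equivalent noise bandwidth (Hz)
    return math.sqrt(psd * noise_bw)         # equals sqrt(kT/C)

C_SAM = 10e-15  # sampling capacitance from the text (F)

for r_sam in (200.0, 2e3):  # 2 kOhm is the value from the text
    print(f"R = {r_sam:6.0f} Ohm: f_c = {rc_cutoff_hz(r_sam, C_SAM) / 1e9:.1f} GHz, "
          f"noise = {sampled_noise_vrms(r_sam, C_SAM) * 1e3:.3f} mV rms")
```

The resistor raises the noise density but narrows the noise bandwidth by the same factor, so the product stays at $kT/C$.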

The CP consists of two stages. The first stage is a differential pair converting the sampled voltage into current and the second stage has been explained in Fig. 6.2(b). The CP up- and down-current sources are biased at 20  $\mu$ A. The current source switches use near minimum size and the dumping capacitor is set to 2.5 pF, to reduce the effect of clock feed-through and charge injection.

### 6.4.3 SSDLL

The schematic of the SSDLL is displayed in Fig. 6.10. The tunable delay cell is implemented with a current starved inverter and its tuning range is designed to cover one VCO period with margin, which is enough for the SSDLL to align the Ref falling edge with a VCO zero-crossing. The rest of the Ref buffer has been shown in Fig. 6.6.



Figure 6.11. Simulated settling of the overall system.

### 6.4.4 Settling Behavior

The overall architecture in Fig. 6.7 includes multiple loops: a SSPLL core loop, a FLL for frequency locking which consists of a divider and a 3-state PFD/CP with a built-in $(-\pi, \pi)$ dead zone (DZ) [9], and a SSDLL for Ref duty cycle tuning. Since the SSDLL only tunes the Ref tracking edge, it will not affect the loop dynamics of the SSPLL. The delay of the DLL delay cell is set to the middle of its tuning range at start-up.

Fig. 6.11 shows the transient simulation results for the overall system. During frequency acquisition, $f_{VCO}$ differs strongly from $N \cdot f_{ref}$. The FLL dominates the loop dynamics and charges up the loop filter. There are several noticeable regions in which the FLL does nothing. That is because even though the frequency error is not yet zero, the instantaneous phase error can be smaller than $\pi$ and fall inside the DZ. The CP in the FLL thus injects no current into the loop filter. Since there is still a frequency error, the phase error keeps accumulating until it becomes larger than $\pi$ and falls outside the DZ. The FLL then takes action again. After the core SSPLL loop achieves locking, the frequency error is zero and the phase error is always small. The FLL stays quiet and injects nothing into the filter. The SSDLL settles later than the SSPLL since we set its bandwidth to be smaller than that of the SSPLL. For experimental purposes, the SSDLL tuning can be disabled from off-chip by connecting its filter capacitor to half supply instead of its CP.
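The alternating active and idle regions of the FLL can be reproduced with a simple behavioral sketch. It is not the simulated circuit: the frequency error is held constant (in reality it shrinks as the FLL acts), the phase error is taken at the PFD input, and the sign convention and function names are assumptions made for illustration.

```python
import math

def fll_cp_current(phase_error_rad, i_cp=1.0):
    """CP output of a 3-state PFD with a (-pi, pi) dead zone."""
    if abs(phase_error_rad) < math.pi:
        return 0.0                                  # inside the dead zone
    return math.copysign(i_cp, phase_error_rad)     # outside: pump up or down

def fll_active_fraction(freq_error_hz, f_ref_hz, n_cycles):
    """Fraction of reference cycles in which the FLL injects current."""
    step = 2 * math.pi * freq_error_hz / f_ref_hz   # phase drift per Ref cycle
    phase = 0.0
    active = 0
    for _ in range(n_cycles):
        phase += step
        # The 3-state PFD characteristic wraps after a full cycle slip.
        if abs(phase) >= 2 * math.pi:
            phase -= math.copysign(2 * math.pi, phase)
        if fll_cp_current(phase) != 0.0:
            active += 1
    return active / n_cycles

# Hypothetical 1% frequency error relative to a 55.25 MHz reference.
print(fll_active_fraction(0.5525e6, 55.25e6, 10000))
```

With a constant frequency error the phase error sweeps repeatedly through the dead zone, so the FLL is idle for roughly half of the time, consistent with the quiet regions visible during acquisition.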



Figure 6.12. Chip microphotograph.




Figure 6.13. Measured PLL phase noise.



Figure 6.14. Jitter and power comparison between this work and other good FOM PLLs.

## 6.5 Experimental Results

To verify the presented ideas, a 2.21 GHz SSPLL according to Fig. 6.7 has been fabricated in a standard 0.18-$\mu$m CMOS process and tested in a 24-pin Quad LLP package. Fig. 6.12 shows a die microphotograph. All circuitry uses a 1.8 V battery supply, while separate supply domains provide isolation. The reference clock is derived from a 55.25 MHz SC Sprinter crystal oscillator from Wenzel Associates. The XO output is attenuated to 1.8 V<sub>p-p</sub> and DC biased using a bias-T before it is fed into the chip.

The PLL core (excluding the 50 $\Omega$ buffer for measurement) consumes 3.8 mW, with less than 0.2 mW in the SSDLL. Fig. 6.13 shows the phase noise spectrum measured with an Agilent E5501B phase noise measurement setup. The in-band phase noise is -121 dBc/Hz at 200 kHz offset and the out-of-band phase noise is -138 dBc/Hz at 20 MHz offset. Enabling the SSDLL does not increase the phase noise level. Compared with [9], the in-band phase noise is 5 dB higher, mainly because we used one more SSPD buffer stage and a 6x smaller $C_{sam}$ in this design, which helps reduce the spur level but raises the noise contribution of the SSPD and its buffer. According to the noise summary in Spectre RF PNoise simulations, the reference clock (XO and buffer), the SSPD and its buffer, and the rest of the circuits contribute 30%, 55% and 15% to the in-band phase noise at 200 kHz, respectively. Due to this higher in-band phase noise and a less optimally designed loop bandwidth, the design also has a higher jitter than [9]: 0.3 ps<sub>rms</sub> integrating from 10 kHz to 100 MHz. However, the PLL FOM [11] of this design is still competitive compared to the best low jitter PLL designs we found in ISSCC and JSSC papers as shown in Fig. 6.14, even though our design is not optimized for jitter but for a low reference spur<sup>4</sup>.
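For orientation, the reported numbers can be turned into a FOM value. The definition of the PLL FOM is given in [11] and is not restated in this section; the sketch below assumes the commonly quoted form $FOM = 10\log[(\sigma_t / 1\,\text{s})^2 \cdot (P / 1\,\text{mW})]$, and the function name is hypothetical.

```python
import math

def pll_fom_db(jitter_rms_s, power_w):
    """Jitter-power FOM, assumed form: 10*log10((sigma_t/1 s)^2 * (P/1 mW))."""
    return 10 * math.log10((jitter_rms_s / 1.0) ** 2 * (power_w / 1e-3))

F_OUT = 2.21e9         # output frequency (Hz)
F_REF = 55.25e6        # reference frequency (Hz)
JITTER_RMS = 0.3e-12   # integrated from 10 kHz to 100 MHz (s)
POWER = 3.8e-3         # PLL core power (W)

n_div = F_OUT / F_REF
fom = pll_fom_db(JITTER_RMS, POWER)
print(f"N = {n_div:.0f}, FOM = {fom:.1f} dB")
```

A more negative value indicates a better jitter-power combination, which is how the designs in Fig. 6.14 are compared.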

To investigate the effect of the SSDLL on the spur level, spurs at $f_{ref}$ (reference spur) as well as spurs at $2f_{ref}$ away from the VCO frequency have been measured with the SSDLL enabled and disabled while tuning the position of the Ref falling edge via changing $V_{DC,in}$. The result is shown in Fig. 6.15(a). When the SSDLL is disabled, the spurs show a periodic pattern when the relative position of the Ref tracking edge and VCO zero-crossing is changed by $T_{VCO}$<sup>5</sup>. Note that when the SSDLL is disabled by disconnecting its loop filter and the tunable delay cell, its SSPD still functions as the dummy for the SSPD of the SSPLL and helps to reduce the spur level. When the SSDLL is enabled, the spurs hardly change with $V_{DC,in}$, which indicates that the DLL tuning works. The spur level with the SSDLL enabled (corresponding to minimum charge sharing in theory) is not the lowest but close to the average. This can be explained if the charge sharing has a contribution comparable to that of the other spur mechanisms. Depending on the relative position of the Ref falling edge and the VCO zero-crossing, the charge sharing sign can be positive or negative ($C_{sam}$ injects charge to or absorbs charge from the VCO, see Fig. 6.5). It thus may add to or cancel the other spur sources, thereby increasing or reducing the spur level. Although enabling the SSDLL does not result in the lowest spur, it improves the worst case spur. The improvement is limited in this case, but the reduced variability is still valuable. The power and area overhead of the DLL tuning is also small.

Another observation from Fig. 6.15(a) is that there are significant spurs at  $2f_{ref}$ . That is because with the complementary switched dummy sampler added, the SSPD switching on/off activity is doubled. This does not affect the BFSK effect since  $f_{VCO}$  still changes once every Ref period. However, the charge injection/sharing now happens twice every Ref period. Therefore we can expect to see spurs at  $f_{ref}$  as well as  $2f_{ref}$ . In the design of Chapter 5 with no SSPD buffer, spurs at  $2f_{ref}$  are slightly lower than the spurs at  $f_{ref}$  (Fig. 5.10). While in this design with SSPD buffer added, we see from Fig. 6.15(a) that spurs at  $2f_{ref}$  are actually a few dB higher than the spurs at  $f_{ref}$ . This can be explained as practical buffers have limited isolation due to e.g. parasitic capacitors. Therefore they provide less isolation at higher frequencies where parasitic effects are more prominent. Fig. 6.15(b) shows the measured spurs from 20 chips with the SSDLL enabled. The worst sample has <-76 dBc at  $2f_{ref}$  and <-80 dBc at  $f_{ref}$ . The reference spur is thus >34 dB better than [9]. The spectrum of the chip with the lowest spurs is shown in Fig. 6.16.

<sup>4</sup> The reference spurs for the low jitter PLL designs in [9] and [12-16] are either not reported or larger than -65 dBc. Therefore they are not included in the reference spur comparison in Table 6.1.

<sup>5</sup> In measurement, it is not possible to see how much the Ref tracking edge is shifted on-chip with a certain change in $V_{DC,in}$. Simulation is thus used to estimate the shift of the Ref falling edge when $V_{DC,in}$ is tuned from 0.5 V to 0.6 V in Fig. 6.15(a). It can only be a coarse estimation as the measured sample is subject to PVT variations.



Figure 6.15. (a) Measured spur variations while tuning the position of Ref tracking edge via tuning  $V_{DC,in}$ ; (b) Spurs measured from 20 chips with SSDLL tuning enabled.

[Spectrum analyzer screenshot (Agilent, Mar 2, 2010): center 2.210 GHz, span 240 MHz, resolution bandwidth 20 kHz, sweep 9.373 s (601 pts). Reference marker at 2.210 GHz reads -11.09 dBm; delta marker 1 at -55.2 MHz reads -84.48 dB; delta marker 2 reads -80.55 dB.]

Figure 6.16. Spectrum of the chip with the lowest spur in Fig. 6.15(b).

|                                                        | This work           | [2]             | [3]             | [5]             | [6]*            | [7]*            | [17]            | [18]            |
|--------------------------------------------------------|---------------------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|
| f <sub>out</sub> (GHz)                                 | 2.21                | 5.5             | 5.4             | 5.2             | 2.4             | 3.6             | 5.24            | 5.3             |
| f <sub>ref</sub> (MHz)                                 | 55.25               | 43              | 10              | 10              | 12              | 50              | 13.33           | 20              |
| $f_c/f_{ref}$                                          | 1/20                | 1/540           | 1/400           | 1/50            | 1/12            | 1/50            | 1/66            | 1/333           |
| Spur@f <sub>ref</sub> /2f <sub>ref</sub><br>(Sample #) | <-80/<-76<br>(#=20) | -69<br>(#=1)    | -70<br>(#=1)    | -69<br>(#=1)    | -70<br>(#=4)    | -74<br>(#=1)    | -66<br>(#=1)    | -74<br>(#=1)    |
| In-band phase<br>noise (dBc/Hz)                        | -121@<br>200kHz     | -88@<br>40kHz   | -63@<br>10kHz   | -76@<br>20kHz   | -103@<br>100kHz | -98@<br>100kHz  | -95@<br>40kHz   | -79@<br>10kHz   |
| Power (mW)                                             | 3.8                 | 23              | 13.5            | 19.8            | 39              | 110             | 56              | 36              |
| Active area (mm <sup>2</sup> )                         | 0.20                | ?               | 0.49            | 0.64            | <4.8            | 2.7             | ?               | 0.8             |
| Technology                                             | 0.18-µm<br>CMOS     | 0.25-µm<br>CMOS | 0.25-µm<br>CMOS | 0.18-µm<br>CMOS | 0.18-µm<br>CMOS | 0.18-µm<br>CMOS | 0.18-µm<br>CMOS | 0.18-µm<br>CMOS |

\*This is a fractional-N PLL with the numbers measured in integer-N mode.

Table 6.1. LS-SSPLL performance summary and comparison with low spur PLL designs.

Table 6.1 summarizes the PLL performance and displays a comparison with other low spur PLLs. This design has the lowest spur combined with lower in-band phase noise and power consumption. Note that we measured 20 samples and that the low spur is achieved with a high $f_c/f_{ref}$ of 1/20. The measurement results in Fig. 6.16 suggest that the spur level is still limited by the SSPD, not the CP. The PLL bandwidth can thus be increased even further without increasing the spur level. When an even lower spur level is desired, more buffering or buffers with better isolation (than the 2-stage CML buffer here) may be used to further isolate the VCO from the SSPD.

## 6.6 Conclusion

In a SSPLL, the CP is amplitude controlled and insensitive to mismatch. Low CP ripple can thus be achieved with a simple design. With the CP ripple reduced, the main source of VCO spur is the SSPD sampler, which periodically disturbs the VCO operation via charge injection, charge sharing and frequency modulation due to a change in the VCO capacitive load. In contrast to the CP-induced spurs, the spur due to periodic sampling of the VCO is not related to the loop filter and there is thus no tradeoff between high loop bandwidth and low spur. Dummy samplers and isolation buffers are used to minimize the disturbance of the VCO by the SSPD. A duty-cycle controlled reference buffer with DLL tuning is proposed to further reduce the worst case spur level. While using a high loop-bandwidth-to-reference-frequency ratio of 1/20, the reference spurs measured from 20 chips are <-80 dBc. Since the frequency divider noise is eliminated and the SSPD and CP noise is not multiplied by $N^2$, the sub-sampling based PLL also has good phase noise performance. It achieves -121 dBc/Hz in-band phase noise at 200 kHz with only 3.8 mW power. The output jitter integrated from 10 kHz to 100 MHz is 0.3 ps<sub>rms</sub>.

## 6.7 References

- [1] C. S. Vaucher, *Architectures for RF Frequency Synthesizers*. Boston, MA: Kluwer, 2002.
- [2] C. M. Hung and K. K. O, "A fully integrated 1.5-V 5.5-GHz CMOS phase-locked loop," *IEEE J. Solid-State Circuits*, vol. 37, pp. 521-525, Apr. 2002.
- [3] S. Pellerano, S. Levantino, C. Samori and A. L. Lacaita, "A 13.5-mW 5-GHz frequency synthesizer with dynamic-logic frequency divider," *IEEE J. Solid-State Circuits*, vol. 39, no. 2, pp. 378-383, Feb. 2004.
- [4] S. L. J. Gierkink, "Low-spur, low-phase-noise clock multiplier based on a combination of PLL and recirculating DLL with dual-pulse ring oscillator and self-correcting charge pump," *IEEE J. Solid-State Circuits*, vol. 43, pp. 2967-2976, Dec. 2008.
- [5] C.-F. Liang, S.-H. Chen and S.-I. Liu, "A digital calibration technique for charge pumps in phase-locked systems," *IEEE J. Solid-State Circuits*, vol. 43, no. 2, pp. 390-398, Feb. 2008.
- [6] K. J. Wang, A. Swaminathan and I. Galton, "Spurious tone suppression techniques applied to a wide-bandwidth 2.4 GHz fractional-N PLL," *IEEE J. Solid-State Circuits*, vol. 43, no. 12, pp. 2787-2797, Dec. 2008.
- [7] S. E. Meninger and M. H. Perrott, "A 1 MHz bandwidth 3.6 GHz 0.18 µm CMOS fractional-N synthesizer utilizing a hybrid PFD/DAC structure for reduced broadband phase noise," *IEEE J. Solid-State Circuits*, vol. 41, no. 4, pp. 966-980, Apr. 2006.
- [8] M. G. Johnson and E. L. Hudson, "A variable delay line PLL for CPU coprocessor synchronization," *IEEE J. Solid-State Circuits*, vol. 23, pp. 1218-1223, Oct. 1988.
- [9] X. Gao, E. Klumperink, M. Bohsali and B. Nauta, "A low noise sub-sampling PLL in which divider noise is eliminated and PD/CP noise is not multiplied by N<sup>2</sup>," *IEEE J. Solid-State Circuits*, vol. 44, no. 12, pp. 3253-3263, Dec. 2009.
- [10] X. Gao, E. Klumperink, G. Socci, M. Bohsali and B. Nauta, "Spur-reduction techniques for PLLs using sub-sampling phase detection," *IEEE Int. Solid-State Circuits Conf. (ISSCC)*, pp. 474-475, Feb. 2010.
- [11] X. Gao, E. Klumperink and B. Nauta, "Jitter analysis and a benchmarking figure-of-merit for phase-locked loops," *IEEE Trans. Circuits Syst. II*, vol. 56, no. 2, pp. 117-121, Feb. 2009.
- [12] B. Helal, C.-M. Hsu, K. Johnson and M. Perrott, "A low jitter programmable clock multiplier based on a pulse injection-locked oscillator with a highly-digital tuning loop," *IEEE J. Solid-State Circuits*, vol. 44, pp. 1391-1400, May 2009.
- [13] J. Lee and H. Wang, "Study of subharmonically injection-locked PLLs," *IEEE J. Solid-State Circuits*, vol. 44, pp. 1539-1553, May 2009.
- [14] R. C. H. van de Beek, C. S. Vaucher, D. M. W. Leenaerts, E. Klumperink and B. Nauta, "A 2.5–10-GHz clock multiplier unit with 0.22-ps RMS jitter in standard 0.18µm CMOS," *IEEE J. Solid-State Circuits*, vol. 39, no. 11, pp. 1862-1872, Nov. 2004.
- [15] C.-M. Hsu, M. Z. Straayer and M. H. Perrott, "A low-noise, wide-BW 3.6GHz digital ΔΣ fractional-N frequency synthesizer with a noise-shaping time-to-digital converter and quantization noise cancellation," *IEEE J. Solid-State Circuits*, vol. 43, no. 12, pp. 2776-2786, Dec. 2008.
- [16] R. Gu, A. Yee, Y. Xie and W. Lee, "A 6.25GHz 1V LC-PLL in 0.13µm CMOS," *IEEE Int. Solid-State Circuits Conf. (ISSCC)*, pp. 594-595, Feb. 2006.
- [17] P. Zhang, T. Nguyen, C. Lam, D. Gambetta, C. Soorapanth, B. Cheng, S. Hart, I. Sever, T. Bourdi, A. Tham and B. Razavi, "A direct conversion CMOS transceiver for IEEE 802.11a WLANs," *IEEE Int. Solid-State Circuits Conf. (ISSCC)*, pp. 354-355, Feb. 2003.
- [18] C.-Y. Kuo, J.-Y. Chang and S.-I. Liu, "A spur-reduction technique for a 5-GHz frequency synthesizer," *IEEE Trans. Circuits Syst. I*, vol. 53, no. 3, pp. 526-533, Mar. 2006.

## Chapter 7

# Conclusions

## 7.1 Summary and Conclusions

### Chapter 1

A periodic clock signal is required in many ICs. These clocks are for instance used to define the sampling moments in analog-to-digital or digital-to-analog data converters, to up-convert and down-convert the wanted signals in wireless transceivers, and to synchronize the data flow in wireline and optical serial data communication links. The timing/phase accuracy of the clock affects the overall system performance and therefore the jitter/phase-noise of the clock generator should be low. Moreover, a clock generator should also dissipate little power to save energy. This thesis aims to design a clock generation PLL with low jitter (on the order of 100 fs) as well as low power consumption (on the order of 10 mW).

### Chapter 2

In Chapter 2 we described the classical PLL and analyzed its phase noise, jitter and power consumption. The phase noise of a classical PLL can be classified into two parts: 1) the VCO phase noise which is high pass filtered and dominates out-of-band and 2) the loop phase noise contributed by the reference clock, phase detector (PD), charge pump (CP) and frequency divider which is multiplied by  $N^2$ , low pass filtered and dominates in-band. The VCO phase noise in some respects systematically depends on the oscillation frequency and power consumption. It can be characterized using the well-known VCO figure-of-merit (FOM), which normalizes for this systematic dependency. For the loop phase noise, we found that it scales with the reference frequency, frequency division ratio N and the power consumption of the loop components when only taking into account the minimum power. A benchmark FOM is proposed to characterize and normalize the loop phase noise, complementary to the existing VCO FOM.

Using the calculated VCO and loop noise and their noise transfer functions, the overall PLL output jitter is calculated. Jitter optimization is discussed and an expression for the minimum jitter is derived. It is shown that, to minimize the PLL output jitter for a given power budget, designers should aim at: 1) spending equal power on the loop and the VCO; and 2) setting the loop bandwidth such that the loop and the VCO contribute equally to the total jitter. In such an optimized PLL, the output jitter is independent of the reference frequency and output frequency for a given power budget. Based on these insights, a benchmark FOM to evaluate PLL jitter performance in relation to the consumed power is proposed. This PLL FOM can be used to benchmark various PLL designs and assist making design decisions. Moreover, system designers can use it to estimate jitter and power during system level design and explore design trade-offs.

### Chapter 3

In Chapter 3 we discussed low jitter multi-phase clock generation. Multi-phase clocks are for instance needed in harmonic and image rejection receivers, high speed serial links and time-interleaved ADCs. We discussed and compared two competing multi-phase clock generator (MPCG) architectures, one based on a delay-locked loop (DLL) and the other on a shift register (SR) or ring counter. For M-phase clock generation, the DLL uses M delay units (DUs) and the SR uses M D flip-flops (DFFs). From an implementation point of view, the SR has a simpler architecture since it does not require analog tuning as in a DLL. However, the SR works at M times higher frequency than the DLL and at first glance seems to consume more power. Nonetheless, analysis with focus on the output jitter and power consumption reveals a different story.

The MPCG output jitter can be divided into two parts: 1) jitter transferred from the reference clock and 2) jitter generated by the MPCG circuits. Analysis shows that neither a SR MPCG nor a DLL MPCG reduces the reference clock jitter; both transfer the same amount of jitter from the reference clock to the output. It is therefore critical for both MPCGs to have a clean reference clock, which can be generated by a PLL from a low frequency crystal oscillator. The PLL for the SR needs a VCO with M times higher frequency, but it can be realized in a power neutral way. We also showed that a SR almost always generates less jitter than a DLL at a given power budget, when both are realized with current mode logic (CML) circuits. This is partly because there is no jitter accumulation from one DFF to the next in a SR while jitter accumulation does occur among the DUs in a DLL. In addition, analysis showed that the jitter generation of a CML circuit is proportional to its delay time. The DFFs in a SR can be designed to have very small delay, while the delay of each DU in a DLL is functionally fixed to be 1/M of the clock period. These results hold for both the jitter due to thermal noise and jitter due to mismatch. The jitter advantage of the SR is larger if more advanced technologies are used and in applications where clocks with a larger number of phases at lower frequencies are needed.

From a multi-functionality point of view, the SR MPCG is clearly more attractive than a DLL: it is basically a digital circuit which can operate from an arbitrarily low frequency up to the limit set by the technology, while a DLL requires tuning of an "analog" delay. Also, a SR can change its output frequency almost instantaneously, while a DLL settles slowly, due to the preferred low loop bandwidth. Finally, a SR has the flexibility to generate clocks with different duty cycles.

### Chapter 4

The analysis in Chapter 2 reveals one important bottleneck for the classical PLL to achieve low in-band phase noise: the PD, CP and divider noise is multiplied by $N^2$ when transferred to the PLL output due to the divide-by-N in the feedback path. In Chapter 4, we proposed a new sub-sampling PLL (SSPLL) architecture which breaks this bottleneck. This SSPLL employs a sub-sampling PD (SSPD) that samples the high frequency VCO output with a low frequency reference clock without using a frequency divider. The VCO phase error is converted into sampled voltage variation. Due to the high slew rate of the high frequency VCO, the phase detection gain of this SSPD is very high, providing large suppression for the PD and CP noise. Phase noise analysis and a phase domain model reveal that, in contrast to what happens in a classical PLL, the PD and CP noise is not multiplied by $N^2$ in the SSPLL. Moreover, no divider is needed for the phase detection, so divider noise and power can be eliminated. As a result, the SSPLL can achieve very low in-band phase noise. Interestingly, the SSPLL has such low PD and CP noise (and no divider noise) that the inverter buffer for the reference clock becomes the dominant source of the in-band phase noise as well as the dominant consumer of loop-component power.

Despite its low noise, a SSPLL has drawbacks such as difficulty of integration (a large filter capacitor is needed due to the very high detection gain) and a limited frequency acquisition range. In order to overcome these drawbacks, pulse width gain control is added to the CP to reduce the detection gain. By a careful choice of the pulse width, the detection gain will not be "unnecessarily high" but still high enough to provide substantial suppression for the CP noise. In this way, the low noise feature of the SSPD/CP can be exploited without paying for unnecessary filter capacitor area. The pulse width gain control block also functions as the slave track-and-hold for the VCO sampling and simplifies the SSPD to a single switch-capacitor circuit. To guarantee correct frequency locking, a classical 3-state PFD/CP based PLL with a dedicated dead zone creator is added as a frequency-locked loop (FLL). In the locked state, the VCO phase error is small and lies inside the dead zone. The FLL thus does not add noise to the PLL output and can be powered down after locking is achieved.

To prove the concept, a 2.2 GHz SSPLL with a frequency division ratio of 40 is implemented in a standard 0.18-$\mu$m CMOS process. The in-band phase noise at 200 kHz offset is measured to be -126 dBc/Hz. The reference spur is -46 dBc due to insufficient isolation between the VCO and the SSPD. The SSPLL has an rms output jitter of 0.15 ps (integrated from 10 kHz to 40 MHz) while consuming 5.8 mW in the loop components and 1.8 mW in the VCO. The PLL benchmarking FOM is -248 dB, more than 10 dB better than the FOM of state-of-the-art classical PLLs. In other words, the designed SSPLL is more than ten times more power efficient when generating the same amount of output jitter.
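To give a feeling for the size of the $N^2$ bottleneck that the SSPLL avoids, the short Python sketch below evaluates it for the prototype's division ratio. The function names are illustrative only, and the reference frequency is simply inferred from the stated output frequency and division ratio.

```python
import math


def n_squared_multiplication_db(n: int) -> float:
    """Return the N^2 noise-power multiplication in dB, i.e. 10*log10(N^2)."""
    return 10.0 * math.log10(n ** 2)


def reference_frequency_hz(f_out_hz: float, n: int) -> float:
    """Reference frequency of an integer-N loop: f_ref = f_out / N."""
    return f_out_hz / n


# Values quoted in the text for the Chapter 4 prototype.
F_OUT_HZ = 2.2e9
N = 40

print(f"f_ref = {reference_frequency_hz(F_OUT_HZ, N) / 1e6:.0f} MHz")
print(f"N^2 multiplication = {n_squared_multiplication_db(N):.1f} dB")
```

For N = 40 the multiplication amounts to about 32 dB. In a classical PLL this factor applies to the PD, CP and divider noise, whereas in the SSPLL the PD and CP noise is not multiplied by it.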

### Chapter 5

From the SSPLL design in Chapter 4, we observed that the inverter based reference clock buffer is the dominant source of the loop-component power consumption. This buffer is needed to convert the sine-wave crystal oscillator output into a steep square wave for VCO sampling. In Chapter 5, we investigated this inverter buffer and observed that most of the buffer power is wasted due to the short-circuit current caused by simultaneous conduction of the NMOS and PMOS transistors during the (finite slew-rate) zero-crossing transitions. A modified inverter buffer is proposed, where the gates of the NMOS and PMOS are controlled separately. An extra timing control circuit is added between the inverter input and the gate of the PMOS, such that the conduction times of the NMOS and PMOS transistors do not overlap. In this way, the direct-current path from the supply to ground and thus the short-circuit current is eliminated. Besides the reference clock buffer, the second important source of the SSPLL loop-component power is the high speed isolation buffer between the VCO and the SSPD sampler. We proposed to remove this buffer and directly sample the VCO. Complementary switched dummy samplers are added to keep the disturbance of the sampler to the VCO low.

The two aforementioned power reduction techniques have been applied to a new 2.2 GHz SSPLL prototype fabricated in 0.18-$\mu$m CMOS. It achieves -125 dBc/Hz in-band phase noise at 200 kHz while consuming only 700 $\mu$W loop-component power. The whole PLL consumes 2.5 mW and the rms output jitter integrated from 10 kHz to 100 MHz is 0.16 ps, resulting in a PLL FOM of -252 dB. The worst case reference spur measured from 20 chips is -56 dBc. Compared with the design in Chapter 4, the reference spur is 10 dB lower; the loop-component power is 8x lower while the in-band phase noise is only 1 dB worse.
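The quoted FOM values can be cross-checked with a few lines of Python. This is a minimal sketch that assumes the PLL FOM of Chapter 2 has the form $10\log_{10}[(\sigma_t / 1\,\mathrm{s})^2 \cdot (P / 1\,\mathrm{mW})]$; the function and variable names are illustrative, and the jitter and power figures are the ones given in the text.

```python
import math


def pll_fom_db(jitter_rms_s: float, power_w: float) -> float:
    """PLL FOM in dB, assumed as 10*log10((sigma_t / 1 s)^2 * (P / 1 mW))."""
    return 10.0 * math.log10(jitter_rms_s ** 2 * (power_w / 1e-3))


# (rms jitter in seconds, total PLL power in watts), as quoted in the text.
designs = {
    "Chapter 4 prototype": (0.15e-12, 5.8e-3 + 1.8e-3),  # loop + VCO power
    "Chapter 5 prototype": (0.16e-12, 2.5e-3),
}

for name, (sigma_t, power) in designs.items():
    print(f"{name}: FOM = {pll_fom_db(sigma_t, power):.1f} dB")
```

Under this assumed definition the two designs evaluate to roughly -247.7 dB and -251.9 dB, which round to the -248 dB and -252 dB reported above. The same function gives -260 dB for 100 fs rms jitter at 1 mW.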

### Chapter 6

In Chapters 4 and 5 we showed that the SSPLL is able to achieve very low in-band phase noise at low power. However, the measured reference spurs are relatively high. In Chapter 6 we analyzed the SSPLL spur mechanisms and presented design techniques to drastically reduce the reference spur level.

In a classical PLL, the CP up-current source and down-current source have a constant value fixed by biasing. The amplitude mismatch between the up- and down-current sources introduces CP output current-ripple which is then converted to ripple on the VCO control voltage by the loop filter, resulting in VCO spurs. A small filter bandwidth can be used to suppress the ripple but at the expense of a lower PLL bandwidth. In a SSPLL, the CP is amplitude controlled. We showed that the up- and down-current mismatch is automatically tuned out by the PLL and therefore the CP is insensitive to mismatch. Low CP ripple can thus be achieved with a simple CP design. Given the reduced CP ripple, the main source of spurs is now found to be the SSPD sampler which periodically disturbs the VCO operation. Different mechanisms were identified, namely charge injection, charge sharing and frequency modulation by periodically changing the VCO capacitive load. In contrast to the CP, the SSPD induces spurs without going through the loop filter. There is thus no trade-off between high PLL bandwidth and low spur level.

In order to suppress all three SSPD spur mechanisms, a DLL/PLL dual loop architecture and a duty cycle controlled Ref buffer are proposed. The DLL uses the same SSPD/CP as in the SSPLL, which acts as a dummy to compensate the frequency modulation effect and charge injection. The DLL also tunes the sampling clock duty-cycle such that the voltages on the VCO and sampling capacitor are the same at the switch-on moments, thereby minimizing charge sharing. The DLL only tunes the non-critical tracking edge without affecting the critical sampling edge. Thus it neither disturbs the SSPLL operation nor adds noise to the SSPLL output. Aiming at very low spur, buffers are used to further isolate the SSPD and VCO, at the expense of extra loop-component power and in-band phase noise added by the buffer.

The spur reduction concepts have been verified by a chip in 0.18-$\mu$m CMOS technology. Running at 2.2 GHz, the prototype achieves -121 dBc/Hz in-band phase noise at 200 kHz with 3.8 mW power consumption. The output jitter integrated from 10 kHz to 100 MHz is 0.3 ps<sub>rms</sub>. While using a high loop-bandwidth-to-reference-frequency ratio of 1/20, the reference spurs measured from 20 chips are <-80 dBc.

## 7.2 Original Contributions

- The derivation of the relation between in-band phase noise and the reference frequency, frequency division ratio *N* and the power consumption in a classical PLL, providing a theoretical basis for the "Banerjee benchmarking" for PLL in-band phase noise. (Chapter 2)
- The proposal of a PLL FOM to evaluate the PLL jitter and power performance, with a theoretical basis for the FOM definition. (Chapter 2)
- The jitter-variance-and-power product comparison between Shift Register and DLL for multi-phase clock generation. The analysis of noise and mismatch jitter in CML circuits. (Chapter 3)
- The phase noise analysis based on a phase domain model for a PLL utilizing a sub-sampling phase detector. It is shown that the PD/CP noise in a sub-sampling PLL is not multiplied by $N^2$ when transferred to the output, leading to low in-band phase noise. (Chapter 4)
- The design of the first fully integrated sub-sampling PLL exploiting the in-band phase noise benefit. The introduction of a sub-sampling PD/CP with pulse width gain control which simplifies the sampler design and reduces the on-chip loop filter area, and a classical PLL with a dead zone creator as frequency locked loop which guarantees correct frequency locking of the sub-sampling PLL without adding noise. (Chapter 4)
- The introduction of a buffer-less direct VCO sampling scheme to realize a low power sub-sampling PLL. Complementary switched dummy samplers are added to keep the disturbance of the sampler to the VCO low. (Chapter 5)
- The introduction of an inverter based reference clock buffer with low short-circuit current for converting a sine-wave reference into a square wave. This buffer also enables separate tuning for the clock rising and falling edges. (Chapter 5)
- The analysis of the spur generation due to the CP and due to the SSPD in a sub-sampling PLL. (Chapter 6)
- The introduction of a DLL/PLL dual loop architecture to reduce the spur due to the SSPD. Dummy samplers compensate the frequency modulation effect and charge injection while DLL tuning minimizes charge sharing between the VCO and SSPD. (Chapter 6)

## 7.3 Recommendations for Future Work

- In some applications, the PLL settling time is an important specification. In the current design, a classical PLL with dead zone functions as the FLL. Having a dead zone during frequency acquisition slows down the PLL settling, which may be problematic. It is worthwhile to investigate the settling behavior of the SSPLL further.
- The SSPLL designs in this work achieve very low in-band phase noise. However, the quality of the VCO design, with a FOM of about -178 dBc/Hz, is not very high. It is recommended to focus on improving the VCO in order to improve the SSPLL performance even further. If the VCO in our work is replaced by a VCO with a state-of-the-art FOM of -194 dBc/Hz [1], [2], our model in Chapter 2 predicts that the SSPLL can achieve a PLL FOM of -260 dB. This means that, for instance, a 100 fs rms jitter can be achieved with only 1 mW power consumption.
- The investigation of a sampling based time-to-digital converter (TDC). TDCs are useful in many applications, e.g. in digital PLLs to digitize the VCO timing/phase error. Most of the existing TDCs are based on delay lines and counters and therefore the resolution is limited by the intrinsic gate delay. Using the SSPD, the VCO timing error is converted into sampled voltage variation. By adding an ADC at the SSPD output, the sampled voltage can be digitized. The SSPD together with an ADC thus realizes the function of a TDC. If the frequency is high, the VCO can have a very high slew rate, and the gain from time to voltage can be very high. Thus the sampling TDC resolution, which is a key design challenge in many designs, can be very fine. A simple calculation shows that for a 1 V<sub>p-p</sub> 2 GHz sine-wave VCO with a 1 mV LSB ADC, the resolution of the sampling TDC is about 0.16 ps, which is an order of magnitude finer than that of a state-of-the-art TDC [3].
- The investigation of a SSPD in a Type-I PLL. A Type-I PLL has no integration path in the loop filter but only a proportional path. The Type-I operation is useful in some applications [4], [5]. The SSPD will be very suitable for a Type-I PLL since it outputs a DC voltage, which can be used to directly control the VCO.
- The phase detection gain of the SSPD is independent of the reference and VCO frequency, see (4.5). This feature can be exploited in applications where the PLL bandwidth should be constant over a wide range of input and output frequencies.
- The SSPD works well for integer-*N* PLLs. It will be interesting to investigate whether it can also work in fractional-*N* PLLs since fractional-*N* PLLs are more versatile than integer-*N* PLLs. In a fractional-*N* SSPLL, the SSPD and the following circuitry need to handle the full VCO swing even in the locked state. Therefore, the linearity of the SSPD and its following circuitry will probably be a key challenge.
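The "simple calculation" behind the sampling-TDC recommendation above can be reproduced as follows. This is a minimal sketch that assumes the VCO is sampled at the zero crossing of an ideal sine wave, where the slew rate is at its maximum of $2\pi f A$; the function names are illustrative only.

```python
import math


def sine_max_slew_rate(v_pp: float, freq_hz: float) -> float:
    """Maximum slew rate (V/s) of an ideal sine wave, reached at its zero crossing."""
    amplitude = v_pp / 2.0
    return 2.0 * math.pi * freq_hz * amplitude


def sampling_tdc_resolution_s(v_pp: float, freq_hz: float, adc_lsb_v: float) -> float:
    """Time step (s) that corresponds to one ADC LSB at the sampled zero crossing."""
    return adc_lsb_v / sine_max_slew_rate(v_pp, freq_hz)


# Values quoted in the text: 1 Vp-p, 2 GHz sine-wave VCO, 1 mV ADC LSB.
resolution = sampling_tdc_resolution_s(v_pp=1.0, freq_hz=2e9, adc_lsb_v=1e-3)
print(f"sampling TDC resolution = {resolution * 1e12:.2f} ps")
```

With these numbers the slew rate is about 6.3 V/ns and one LSB corresponds to about 0.16 ps, in agreement with the figure quoted in the recommendation. A real sampler would sit slightly off the zero crossing and see a somewhat lower slew rate, so this is an idealized estimate.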

## 7.4 References

- [1] Z. Li and K. K. O, "A low-phase-noise and low-power multi-band CMOS voltage-controlled oscillator," *IEEE J. Solid-State Circuits*, vol. 40, no. 6, pp. 1296–1302, Jun. 2005.
- [2] A. Mazzanti and P. Andreani, "A 1.4 mW 4.90–5.65 GHz class-C CMOS VCO with an average FoM of 194.5 dBc/Hz," *IEEE Int. Solid-State Circuits Conf. (ISSCC)*, pp. 474–475, Feb. 2008.
- [3] M. Lee and A. A. Abidi, "A 9b, 1.25 ps Resolution Coarse–Fine Time-to-Digital Converter in 90 nm CMOS that Amplifies a Time Residue," *IEEE J. Solid-State Circuits*, vol. 43, no. 4, pp. 769–777, Apr. 2008.
- [4] B. Zhang, P. E. Allen and J. M. Huard, "A Fast Switching PLL Frequency Synthesizer With an On-Chip Passive Discrete-Time Loop Filter in 0.25-um CMOS," *IEEE J. Solid-State Circuits*, vol. 38, pp. 855–865, Jun. 2003.
- [5] P.-Y. Wang, J.-H. C. Zhan, H.-H. Chang and H.-M. S. Chang, "A Digital Intensive Fractional-N PLL and All-Digital Self-Calibration Schemes," *IEEE J. Solid-State Circuits*, vol. 44, pp. 2182–2192, Aug. 2009.

# **List of Publications**

### Papers

- 1. X. Gao, E. Klumperink, G. Socci, M. Bohsali and B. Nauta, "Spur Reduction Techniques for Phase-Locked Loops Exploiting a Sub-Sampling Phase Detector," accepted to *IEEE J. Solid-State Circuits*.
- 2. X. Gao, E. Klumperink, G. Socci, M. Bohsali and B. Nauta, "A 2.2GHz Sub-Sampling PLL with 0.16ps<sub>rms</sub> Jitter and -125dBc/Hz In-band Phase Noise at 700μW Loop-Components Power," *IEEE Symposium on VLSI Circuits*, paper 14.1, Jun. 2010.
- 3. X. Gao, E. Klumperink, G. Socci, M. Bohsali and B. Nauta, "Spur Reduction Techniques for Phase-Locked Loops Using Sub-Sampling Phase Detection," *IEEE ISSCC Dig. Tech. Papers*, pp. 474-475, Feb. 2010.
- 4. R. Dutta, T. K. Bhattacharyya, X. Gao and E. Klumperink, "Optimized Stage Ratio of Tapered CMOS Inverters for Minimum Power and Mismatch Jitter Product," *23rd International Conference on VLSI Design*, pp. 152-157, Jan. 2010.
- 5. **X. Gao**, E. Klumperink, M. Bohsali and B. Nauta, "A Low Noise Sub-Sampling PLL in Which Divider Noise is Eliminated and PD/CP Noise is not Multiplied by N<sup>2</sup>," *IEEE J. Solid-State Circuits*, vol. 44, no. 12, pp. 3253-3263, Dec. 2009.
- 6. X. Gao, E. Klumperink, M. Bohsali and B. Nauta, "A 2.2GHz 7.6-mW Sub-Sampling PLL with -126dBc/Hz In-band Phase Noise and 0.15ps<sub>rms</sub> Jitter in 0.18μm CMOS," *IEEE ISSCC Dig. Tech. Papers*, pp. 392-393, Feb. 2009.
- 7. X. Gao, E. Klumperink, P. J. F. Geraedts and B. Nauta, "Jitter Analysis and a Benchmarking Figure-of-Merit for Phase-Locked Loops," *IEEE Trans. Circuits Syst. II*, vol. 56, no. 2, pp. 117-121, Feb. 2009.
- 8. X. Gao, E. Klumperink and B. Nauta, "Advantages of Shift Registers Over DLLs for Flexible Low Jitter Multiphase Clock Generation," *IEEE Trans. Circuits Syst. II*, vol. 55, no. 3, pp. 244-248, Mar. 2008.
- 9. X. Gao, E. Klumperink and B. Nauta, "Low-Jitter Multi-phase Clock Generation: A Comparison between DLLs and Shift Registers," *IEEE Int. Symp. Circuits Syst.*, pp. 2854-2857, May 2007.

### Book Chapters

- 10. E. Klumperink, X. Gao and B. Nauta, "Polyphase Multipath Circuits for Cognitive Radio and Flexible Multi-phase Clock Generation," Chapter 7 in *Circuits and Systems for Future Generations of Wireless Communications*, Editors A. Tasic, W. A. Serdijn, L. E. Larson and G. Setti, ISBN 978-1-4020-9918-2, e-ISBN 978-1-4020-9917-5, Springer, 2009.

### Patents

- 11. X. Gao, E. Klumperink, B. Nauta, M. Bohsali, G. Socci and A. Djabbari, "Spur Reduction Technique for Sampling PLLs," National Semiconductor Patent IDF 57459082, May 2009.
- 12. X. Gao, E. Klumperink, B. Nauta, M. Bohsali, G. Socci and A. Djabbari, "Low Power and Low spur Sampling PLL," National Semiconductor Patent IDF 57290082, May 2009.
- 13. X. Gao, E. Klumperink, B. Nauta, M. Bohsali, A. Kiaei, G. Socci and A. Djabbari, "Sampling Phase Detector and Charge Pump with Pulse Width Control," U.S. patent application No. 12/044522, Filed Mar. 2008.

### Workshop Contributions

- 14. X. Gao, E. Klumperink, M. Bohsali and B. Nauta, "A PLL Exploiting Sub-Sampling of the VCO Output to Reduce In-band Phase Noise," *20th Annual Workshop on Circuits, Systems and Signal Processing*, pp. 326-329, Nov. 2009.
- 15. E. Klumperink, X. Gao, P. J. F. Geraedts, E. van Tuijl and B. Nauta, "Recent Advances in Low Jitter CMOS Clock Generation Stimulated by FoM Definitions," invited lecture at the *IEEE MTT-S International Microwave Symposium (IMS)* 2009 workshop "WSB: Current and Future Trends in Frequency Generation Circuits", Jun. 2009.
- 16. E. Klumperink, R. Shrestha, E. Mensink, **X. Gao** and B. Nauta, "Multipath Polyphase Circuits and Multi-Phase Clock Generation," invited lecture at the *Caltech RF/Microwave seminar*, Pasadena, Feb. 2009.
- 17. E. Klumperink, R. Shrestha, E. Mensink, **X. Gao** and B. Nauta, "Multipath Polyphase Circuits for Cognitive Radio Transmitters", invited lecture at the *IEEE MTT-S International Microwave Symposium (IMS)* 2008 workshop "WMB: Enabling Technologies for Wireless Transceivers Beyond-3G", Jun. 2008.
- 18. X. Gao, E. Klumperink and B. Nauta, "Comparing DLLs and Shift Registers for Low-Jitter Multi-phase Clock Generation," *18th Annual Workshop on Circuits Systems and Signal Processing*, pp. 29-30, Nov. 2007.

# Acknowledgements

I would like to express my most sincere gratitude and appreciation to the following people who have helped and supported me over the years. They, directly or indirectly, contributed to the completion of my PhD work here at the University of Twente and made it such a rewarding experience.

First of all, I want to thank my promotor Bram Nauta for giving me this enjoyable project and for his continuous support, encouragement and help over the years. It has been a great privilege to be a member of his group. He has given me the freedom to conduct and enjoy my research, yet guided me in the right direction with his keen insight into IC design. I want to deeply thank my daily supervisor Eric Klumperink for his invaluable guidance and encouragement. He has been a great source of ideas and knowledge. His attitude toward work and his scientific way of thinking have inspired me to continuously challenge myself to reach new levels.

I want to thank National Semiconductor for sponsoring this project. I thank Mounir Bohsali, Gerard Socci and Ali Djabbari for the fruitful cooperation during this project. I want to thank Bijoy Chatterjee for connecting me with National and giving wise personal advice. I also want to thank Kim Wong and Bengyong Zhang for useful technical discussions, Glen Wells for layout assistance and the members of the NS Labs where I spent two enjoyable summers developing my chips.

I thank Gerard Wienk and Henk de Vries for their technical support. Gerard helped to set up the Cadence environment and designed PCBs for my test chips. Henk organized the measurement equipment and taught me how to use it. I must also thank Gerdien Lammers for her countless help over the years, be it with work related or personal matters. I thank Frederik Reenders and the ITC service center for helping me out with the PC and network issues.

I would like to thank all the members of the IC Design group for creating a friendly, creative and interactive working environment. I really enjoyed and appreciated it. I thank my office mates Fabian van Houwelingen, Niels Moseley and Shadi Youssef. They have made the office a lot more fun. I would like to thank Paul Geraedts for technical discussions and the joint work about the PLL figure of merit. I also want to thank Anne-Johan Annema, Ed van Tuijl and Ronan van der Zee for teaching me various courses and Frank van Vliet for his valuable comments about the thesis. I also thank the members of the Semiconductor Components group on the same floor for offering numerous coffee plus and the fun in the coffee room. I thank Annemiek Janssen for her generous help. I want to extend my appreciation to my Chinese group mates Wei Cheng and Zhiyu Ru and all my Chinese friends in the Netherlands. Their help and friendship have made my study abroad a lot easier and more enjoyable.

Finally, my most special thanks go to my beloved family. I thank my wife Xiaoyan for her unconditional love and support, and for taking care of our little angel Ruoxi when I am busy. Her presence made my difficult times bearable and my good times more special. I am deeply grateful to my two brothers and sisters-in-law for being so supportive and encouraging throughout the years. I am greatly indebted to my parents. The example they have set and the support they have given me have been overwhelming. If I spent every page trying to thank my parents for what they have done for me over the years, I would run out of pages.

Xiang Gao Enschede, May 2010

# **About the Author**



**Xiang Gao** was born in Yongkang, Zhejiang, China, in 1983. He received the B.E. degree (*with honor*) from Zhejiang University, Hangzhou, China, in 2004 and the M.Sc. degree (*cum laude*) from the University of Twente, Enschede, The Netherlands, in 2006, both in electrical engineering. Since 2006, he has been with the IC-Design group of the University of Twente, working towards the Ph.D. degree on the subject of low jitter clock generation. During summer 2007 and summer 2008, he was a visiting scholar at National Semiconductor Labs, Santa Clara, California.