# CMOS Signal Synthesizers for Emerging RF-to-Optical Applications

#### Jahnavi Sharma

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Graduate School of Arts and Sciences

COLUMBIA UNIVERSITY

#### **ABSTRACT**

The need for clean and powerful signal generation is ubiquitous, with applications spanning the spectrum from RF to mm-Wave, and into and beyond the terahertz gap. RF applications including mobile telephony and microprocessors have effectively harnessed mixed-signal integration in CMOS to realize robust on-chip signal sources calibrated against adverse ambient conditions. Thanks to low manufacturing cost and high yield, the CMOS component of hand-held devices costs a few cents per part at volumes of a million parts. This low cost, together with integrated digital processing, makes CMOS an attractive option for applications like high-resolution imaging and ranging, and the emerging 5G communication space. RADAR techniques, when extended to optical frequencies, can enable micrometer-scale resolution for 3D imaging. These applications, however, impose up to 100x more exacting specifications on power and spectral purity, at much higher frequencies, than conventional RF synthesizers face.

This generation of applications will present unconventional challenges for transistor technologies, whether in squeezing more performance out of the conventionally used spectrum, already wrung dry, or in signal generation and system design in the relatively emptier mm-Wave to sub-mm-Wave spectrum, much of which falls in the "Terahertz Gap". Indeed, transistor scaling and innovative device physics leading to new transistor topologies have yielded higher cut-off frequencies in CMOS, though these still lag well behind SiGe and III-V semiconductors. To avoid multi-module solutions with functionality partitioned across different technologies, CMOS must be pushed out of its comfort zone, and technology scaling must be accompanied by breakthroughs in design approaches not only at the system level but also at the block level. In this thesis, while not targeting a specific application, we seek to formulate the obstacles in synthesizing high-frequency, high-power and low-noise signals in CMOS and construct a coherent design methodology to address them. Based on this methodology, we present three novel prototypes, each overcoming the limiting factors in its case.

The first half of this thesis deals with high frequency signal synthesis and power generation in CMOS. Outside the range of frequencies where the transistor has gain, frequency generation necessitates harmonic extraction either as harmonic oscillators or as frequency multipliers. We augment the traditional maximum oscillation frequency metric ($f_{max}$), which only accounts for transistor losses, with passive component loss to derive an effective $f_{max}$ metric. We then present a methodology for building oscillators at this $f_{max}$, the Maximum Gain Ring Oscillator. Next, we explore generating large signals beyond $f_{max}$ through harmonic extraction in multipliers. Applying concepts of waveform shaping, we demonstrate a Power Mixer that engineers transistor nonlinearity by manipulating the amplitudes and relative phase shifts of different device nodes to maximize performance at a specific harmonic beyond device cut-off.

The second half proposes a new architecture for an ultra-low noise phase-locked loop (PLL), the Reference-Sampling PLL. In conventional PLLs, a noisy buffer converts the slow, low-noise sine-wave reference signal to a jittery square-wave clock against which the phase of a noisy voltage-controlled oscillator (VCO) is corrected. We eliminate this reference buffer and measure phase error by sampling the reference sine-wave with the 50x faster VCO waveform already available on chip, and selecting the relevant sample, whose voltage is proportional to phase error. By avoiding the N-squared multiplication of the high-power reference buffer noise, and directly using voltage-mode phase error to control the VCO, we eliminate several noisy components in the control loop and achieve ultra-low integrated jitter for a given power consumption. Further, isolation of the VCO tank from any varying load, unlike other contemporary divider-less PLL architectures, results in an architecture with record performance in the low-noise and low-spur space.
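As a rough illustration of the N-squared penalty noted above (a standard linear-PLL result, not a derivation taken from this thesis), in-band phase noise contributed by the reference path appears at the PLL output multiplied by the square of the frequency multiplication factor $N$. The symbols $\mathcal{L}_{ref}$ and $\mathcal{L}_{out}$ are notation assumed here for the reference-path and output phase noise at offset $\Delta f$:

$$
\mathcal{L}_{out}(\Delta f) = N^2\,\mathcal{L}_{ref}(\Delta f)
\quad\Longleftrightarrow\quad
\mathcal{L}_{out}\,[\mathrm{dBc/Hz}] = \mathcal{L}_{ref}\,[\mathrm{dBc/Hz}] + 20\log_{10} N
$$

Assuming $N = 50$, to match the 50x ratio between the VCO and reference frequencies quoted above, the penalty is $20\log_{10} 50 \approx 34\,\mathrm{dB}$.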

We conclude with work that brings together concepts developed for clean, high-power signal generation towards a hybrid CMOS-Optical approach to Frequency-Modulated Continuous-Wave (FMCW) Light-Detection-And-Ranging (LIDAR). Cost-effective tunable lasers are temperature-sensitive and have nonlinear tuning profiles, rendering precise frequency modulations or 'chirps' untenable. Locking them to an electronic reference through an electro-optic PLL, and electronically calibrating the control signal for nonlinearity and ambient sensitivity, can make such chirps possible. We propose approaches that build on the body of advances in electrical PLLs to control the performance, and ease the design specifications, of optical systems. Eventually, we seek to leverage the twin advantages of silicon-intensive integration and low-cost, high-yield manufacturing towards developing a single-chip solution that uses on-chip signal processing and phased arrays to generate precise and robust chirps for an electronically-steerable fine LIDAR beam.
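To make concrete why chirp precision matters, the following is a minimal sketch of the textbook relations for an ideal, perfectly linear FMCW chirp: the beat frequency is the chirp slope times the round-trip delay, and the range resolution is $c/(2B)$. The function names and the numerical values (100 GHz chirp bandwidth, 1 ms chirp duration, 1.5 m target) are assumptions chosen for illustration and are not taken from this thesis.

```python
"""Ideal linear-chirp FMCW relations (illustrative sketch, assumed values)."""

C = 299_792_458.0  # speed of light in vacuum, m/s


def beat_frequency(range_m: float, bandwidth_hz: float, duration_s: float) -> float:
    """Beat frequency of an ideal linear chirp: slope times round-trip delay."""
    slope = bandwidth_hz / duration_s      # chirp slope in Hz per second
    round_trip_delay = 2.0 * range_m / C   # time to the target and back, s
    return slope * round_trip_delay


def range_resolution(bandwidth_hz: float) -> float:
    """Range resolution of an ideal FMCW system: c / (2 B)."""
    return C / (2.0 * bandwidth_hz)


if __name__ == "__main__":
    bandwidth = 100e9  # assumed chirp bandwidth, Hz
    duration = 1e-3    # assumed chirp duration, s
    target = 1.5       # assumed target distance, m
    f_b = beat_frequency(target, bandwidth, duration)
    d_r = range_resolution(bandwidth)
    print(f"beat frequency:   {f_b / 1e6:.3f} MHz")
    print(f"range resolution: {d_r * 1e3:.3f} mm")
```

Because the range estimate is read off the beat frequency, any deviation of the laser's tuning from a constant slope smears the beat tone across frequency, which is the nonlinearity problem that locking the laser to an electronic reference is meant to address.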

## Table of Contents

| Section | Title |
|---------|-------|
|         | List of Figures |
|         | List of Tables |
|         | Acknowledgement |
|         | Dedication |
| 1       | Introduction |
| 2       | THz Frequency Synthesis: Maximum Gain Ring Oscillator |
| 2.1     | Technologies for high frequency signal generation |
| 2.2     | 45 nm SOI CMOS Technology Characterization |
| 2.2.1   | Active Devices |
| 2.2.2   | Transmission Line |
| 2.2.3   | Capacitor (VNCAP) |
| 2.3     | Maximum Gain Ring Oscillator Topology |
| 2.3.1   | Accounting for Passive Element Loss |
| 2.3.2   | Determining the Matching Network |
| 2.3.3   | Circuit-Level Implementation |
| 2.4     | Harmonic Power Extraction and Spurious Mode Suppression |
| 2.4.1   | Extracting the Kth harmonic |
| 2.4.2   | Increasing the Output Power |
| 2.4.3   | Maximizing the harmonic output power |
| 2.4.4   | Suppression of spurious modes |
| 2.5     | Oscillator Measurement |
| 2.6     | Conclusion |
| 3       | THz Power Generation: Frequency Multipliers |
| 3.1     | Scaling Trends in CMOS Multipliers |
| 3.2     | A 134 GHz Doubler in 130 nm CMOS |
| 3.3     | Conclusion |
| 4       | THz Power Generation: Power Mixers |
| 4.1     | Concept of Nonlinearity Engineering in beyond-$f_{max}$ Power Mixers |
| 4.1.1   | Waveform shaping and output harmonic current |
| 4.1.2   | Harmonic output power |
| 4.1.3   | Input power requirement |
| 4.1.4   | Effect of non-ideal input and output terminations |
| 4.2     | 180-200 GHz 130 nm CMOS Power Mixer Implementation |
| 4.2.1   | 130 nm CMOS 180-200 GHz Power Mixer |
| 4.2.2   | 130 nm CMOS 120 GHz Frequency Doubler |
| 4.2.3   | 130 nm CMOS Fundamental-frequency V-band PAs |
| 4.2.4   | 60 GHz Reflection-Type Phase Shifter (RTPS) |
| 4.2.5   | 60 GHz Variable Gain Amplifier |
| 4.2.6   | 60 GHz Marchand Balun |
| 4.3     | Measurement |
| 4.3.1   | 60 GHz RTPS and VGA Breakout |
| 4.3.2   | 120 GHz Frequency Doubler Breakout |
| 4.3.3   | 180-200 GHz Power Mixer |
| 4.4     | Conclusion |
| 5       | Low Noise and Low Spur RF PLL: Reference-Sampling PLL |
| 5.1     | Review |
| 5.1.1   | Conventional Type-II Second Order PLLs |
| 5.1.2   | Sub-sampling PLLs |
| 5.1.3   | Injection-Locked Clock Multipliers (ILCM) |
| 5.2     | New Sampled RF-PLL approach: Reference-Sampling Phase Locked Loop (RSPLL) |
| 5.2.1   | Motivation: Low noise and Low Spur |
| 5.2.2   | Sampled Phase detector (PD) |
| 5.2.3   | Sample Edge Selection Circuit (SESCi) |
| 5.2.4   | Frequency Tracking Loop |
| 5.2.5   | Proposed PLL Architecture |
| 5.2.6   | Noise and Power Analysis |
| 5.2.7   | Effect of nonidealities |
| 5.3     | Proposed Loop Implementation |
| 5.3.1   | Loop parameter selection |
| 5.3.2   | Switch size |
| 5.3.3   | SESCi |
| 5.3.4   | Reference Buffer |
| 5.3.5   | LC-VCO Implementation |
| 5.3.6   | VCO Buffer |
| 5.3.7   | Frequency Tracking Loop |
| 5.3.8   | Output Test Buffer |
| 5.3.9   | Ground isolation and ESD protection |
| 5.4     | Simulated performance and Measurement |
| 5.4.1   | Phase noise simulation |
| 5.4.2   | Measured Performances: VCO |
| 5.4.3   | Measured Performances: RSPLL |
| 5.5     | Comparison |
| 5.6     | Future work: Loop Bandwidth Modification |
| 6       | Wide Bandwidth Electro-optic PLLs for FMCW LIDAR |
| 6.1     | Theory of FMCW detection |
| 6.2     | Photo-electric interface |
| 6.3     | EO-PLL Basics |
| 6.4     | Proposed EO-PLL for FMCW LIDAR |
| 6.4.1   | Motivation: Reduction in MZI delay and implementation area |
| 6.4.2   | Loop architecture |
| 6.4.3   | Implementation |
| 6.4.4   | Acquisition of mixer-based phase detector |
| 6.4.5   | Measured Performance |
| 6.5     | Future work |
| 6.5.1   | Mixer-based EO-PLL |
| 6.5.2   | Conventional EO-PLL around a Laser Phase Shifter |
| 7       | Conclusion |
| I       | Bibliography |
|         | Bibliography |
| II      | Appendices |
| A       | Effect of Oscillator Non-ideality in OFDM |
| A.1     | Inter-carrier interference |
| A.2     | Phase Noise |
| B       | Race Conditions in Multi loop DLLs |

## List of Figures

| Figure | Caption |
|--------|---------|
| 1.1 | Phase noise and $\text{FoM}_{PN} = \left(\frac{f_c}{\Delta f_c}\right)^2 / \left(L_{\Delta f_c} P_{DC,mW}\right)$ for state-of-the-art CMOS PLLs for different carrier frequencies $f_c$ at different offsets $\Delta f_c$. Maximum phase noise limits for m-QAM at the OFDM-based standard's carrier spacing are shown for PLLs with loop bandwidth of 100 kHz. 3GPP to WiGig impose phase noise requirements at increasing carrier spacing $\Delta f_c$. Requirements for evolving OFDM-based standards are met by CMOS for at least 64 QAM, but with poorer $\text{FoM}_{PN}$. |
| 1.2 | Output power and efficiency roll-off in CMOS with increasing frequency. |
| 1.3 | (a) "Terahertz Gap" between Electronics and Photonics [1]. (b) Comparison of maximum oscillation frequency ($f_{max}$) across scaling technology nodes. |
| 2.1 | Layout of a BC NFET Device. This allows the gate to be doubly contacted in a symmetric fashion. |
| 2.2 | The model for the NFET BC device. In the FB version, there is no 'b' node and $r_{body}$ is absent. The FB node is $b_1$. |
| 2.3 | (a) A line fitted by linear regression to the measured $U$ and $h_{21}$ plots of the $10 \times 1\,\mu\mathrm{m}/56\,\mathrm{nm}$ BC device with $J = 0.56\,\mathrm{mA}/\mu\mathrm{m}$. (b) Comparison of the $f_{max}$ across $J$ from measurement for the $10 \times 1\,\mu\mathrm{m}/56\,\mathrm{nm}$ and $20 \times 1\,\mu\mathrm{m}/56\,\mathrm{nm}$ BC devices and the $10 \times 1\,\mu\mathrm{m}/40\,\mathrm{nm}$ FB device. (c) Measured $f_T$ across $J$ for all three devices. |
| 2.4 | Measured performance of a $70\,\Omega$ CPW in 45 nm SOI CMOS. A comparison with the simulated performance in IE3D is also shown. |
| 2.5 | (a) Model used for the VNCAP in 45 nm SOI CMOS. $La$ and $Ra$ are added to capture the via effect and the inductances on either plate are coupled. (b) IE3D VNCAP simulation setup. |
| 2.6 | Measured and simulated series capacitance ($= \frac{\mathrm{Im}(y_{11})}{\omega}$) and Quality Factor ($= \frac{\mathrm{Im}(y_{11})}{\mathrm{Re}(y_{11})}$) of (a) a 214 fF and (b) a 70 fF VNCAP in 45 nm SOI CMOS. |
| 2.7 | (a) Cross-coupled oscillator as a two-stage tuned ring oscillator with a single inter-stage matching inductor. (b) MGRO concept. |
| 2.8 | Power gain ($PG$) circles on the $G_v$ plane at 100 GHz for a $10 \times 1\,\mu\mathrm{m}/56\,\mathrm{nm}$ body-contacted NMOS device including estimated layout parasitics in a common source configuration. Current density, $J = 0.56\,\mathrm{mA}/\mu\mathrm{m}$. |
| 2.9 | Power gain ($PG'$) circles on the $G_v$ plane at 100 GHz for the device in Fig. 2.8 with inductor quality factor taken to be 14 at 100 GHz. Current density, $J = 0.56\,\mathrm{mA}/\mu\mathrm{m}$. |
| 2.10 | (a) MAG' of the $10 \times 1\,\mu\mathrm{m}/56\,\mathrm{nm}$ body-contacted NMOS device versus frequency for different $Q_L$ values. The annotated $Q_L$ values are for a frequency of 100 GHz, and $Q_L$ is assumed to scale linearly with frequency. (b) Maximum oscillation frequencies of the device in the MGRO and XCO topologies as a function of $Q_L$. |
| 2.11 | Circuit diagram of the 216 GHz signal source. |
| 2.12 | Circuit diagram of the 316 GHz signal source. |
| 2.13 | Chip microphotographs of the (a) 216 GHz and (b) 316 GHz signal sources. |
| 2.14 | Ring Oscillator with $K$ stages to extract the $K$th harmonic. |
| 2.15 | Ring Oscillator with $K \times p$ stages to extract the $K$th harmonic. $\phi = \frac{2\pi}{K \times p}$ to ensure an inductive matching network. |
| 2.16 | Load-pull plot of the $K=2$, $p=2$ oscillator at 100 GHz. |
| 2.17 | 216 GHz oscillator frequency and power measurement setup with WR-3 second-harmonic mixer-downconverter (SHMD). |
| 2.18 | 216 GHz oscillator frequency and power measurement setup with WR-3 SHMD including two additional WR-3 bends and one additional WR-3 straight. This setup is used to measure the loss of the latter three components. |
| 2.19 | Power meter measurement setup for the 216 GHz oscillator. The inset shows the measured loss of the additional two WR-3 bends and the 1" WR-3 straight. Details are provided in the text. |
| 2.20 | 216 GHz oscillator frequency and power measured by a WR-3 SHMD and an Erickson PM4 power meter. Measurement details are in the text. |
| 2.21 | (a) Measured downconverted spectrum of the 216 GHz source for a DC power of 57.5 mW, output frequency of 216.2 GHz and calibrated output power of -14.4 dBm. (b) Measured phase noise. |
| 2.22 | 316 GHz oscillator frequency and output power measured using the WR-3 SHMD. |
| 3.1 | (a) Scaling of supply voltage and cutoff frequency ($f_T$) across CMOS nodes. (b) Comparison of this work with state-of-the-art CMOS sources across output frequency normalized to technology $f_T$. |
| 3.2 | Circuit diagram of a simple balanced CMOS frequency doubler. |
| 3.3 | (a) Device size needed to deliver maximum power to a $50\,\Omega$ load in a 130 nm CMOS balanced doubler. (b) Frequency dependence of $R_{sub,p}$. |
| 3.4 | (a) $P_{in}$ of optimal doubler driving $50\,\Omega$ across frequency. (b) Frequency dependence of $2^{nd}$ harmonic current due to NQS effect in 130 nm. |
| 3.5 | Simulated output power for optimal doublers driving $50\,\Omega$ in 130 nm and 65 nm CMOS across (a) absolute $f_{out}$, (b) $f_{out}$ normalized to $f_T$. |
| 3.6 | Block diagram and chip photo of the 130 nm CMOS F-band doubler. The annotated values are at 67 GHz after post-layout simulations. |
| 3.7 | (a) First V-band amplifier stage and (b) the F-band balanced doubler. |
| 3.8 | (a) Measured and simulated saturated output power and efficiency. (b) Output power and conversion gain at 134 GHz. |
| 4.1 | Power mixer technique mixes the first and the second harmonic to generate the third harmonic current. The third harmonic output power can be optimized by controlling the amplitudes and relative phase shifts of the input fundamental and second harmonic signals. |
| 4.2 | (a) Conventional three phase frequency tripler. (b) Device nonlinearity clips the input fundamental sine wave, resulting in a clipped drain current waveform. |
| 4.3 | Third harmonic current generated by a device when it is configured as (a) a frequency tripler (the dashed portion of a curve indicates when the input amplitude violates long-term reliability guidelines), and (b) a power mixer. |
| 4.4 | Gate-source voltage shape and the resultant output current waveforms for the power mixer with $A_{\omega} = A_{2\omega} = V_{dd} = 1.5\,\mathrm{V}$ and a relative phase shift $\phi$ of (a) 90°, (b) 0°. |
| 4.5 | Ratio of third harmonic current to peak current, $F_3 = \frac{i_{3\omega}}{i_{peak}}$, generated by a device when it is configured as (a) a frequency tripler, and (b) a power mixer ($A_{\omega} = 1.5\,\mathrm{V}$). |
| 4.6 | Peak of output current waveform, $i_{peak}$, generated by a device when it is configured as (a) a frequency tripler, and (b) a power mixer ($A_{\omega} = 1.5\,\mathrm{V}$). |
| 4.7 | Output power delivered to the optimal $30\,\Omega$ load by the device as (a) a frequency tripler, and (b) a power mixer with $A_{\omega} = 1.5\,\mathrm{V}$. |
| 4.8 | Fundamental input power required to generate third harmonic when the device is configured as (a) a frequency tripler, and (b) a power mixer with $A_{\omega} = 1.5\,\mathrm{V}$. In the power mixer case, the second harmonic is assumed to be generated by a balanced doubler with a conversion loss of 5 dB. |
| 4.9 | Fundamental to third harmonic conversion loss when the device is configured as (a) a frequency tripler, and (b) a power mixer with $A_{\omega} = 1.5\,\mathrm{V}$. In the power mixer case, the second harmonic is assumed to be generated by a balanced doubler with a conversion loss of 5 dB. |
| 4.10 | Effect of non-ideal input and output terminations on the output power of (a) a frequency tripler, and (b) a power mixer with $A_{\omega} = 1.5\,\mathrm{V}$. |
| 4.11 | Block diagram and chip photograph of the implemented 2.4 mm $\times$ 1.1 mm 180-200 GHz power mixer in 130 nm CMOS with $f_{max} \approx 135$ GHz. |
| 4.12 | (a) BEOL cross-section of the 130 nm CMOS process. (b) Circuit diagram of the implemented 180-200 GHz power mixer. (c) Series resistance, $R$ ($\Omega$), and (d) reactance, $jX$ ($\Omega$), of the 150 fF radial capacitance compared with that of the PDK model of the $8.5\,\mu\mathrm{m} \times 8.5\,\mu\mathrm{m}$ MIMcap. |
| 4.13 | Layout of the power mixer device. The source via is pulled to one side in M2-M4 and then built up to M7 (not shown). The substrate connection is not shown. |
| 4.14 | Circuit diagram of the implemented 120 GHz frequency doubler [2]. |
| 4.15 | Circuit diagram of the last stage of the implemented V-band amplifier chain. |
| 4.16 | Simulated S-parameter performance of the 60 GHz fundamental amplifier chain in the source path of the power mixer. |
| 4.17 | (a) Circuit diagram of the RTPS. It uses a broadside coupled line 3 dB coupler and CLC reflective terminations. (b) Cross-section of the coupled line 3 dB coupler. A slow-wave technique has been used for achieving high even mode impedance as well as simplifying the design procedure. |
| 4.18 | Simulated characteristic impedance of the coupler in the even and odd modes. $W = 12\,\mu\mathrm{m}$, $W_{slot} = 10\,\mu\mathrm{m}$ and $L_{slot}$ is varied. |
| 4.19 | (a) Simulated effective varactor capacitance for different signal amplitudes showing large signal effects. Larger signal amplitude across the varactor causes a reduction in capacitance range and tuning ratio. (b) Simulated phase shift of the RTPS under large signal operation for different input power levels. |
| 4.20 | (a) Block diagram of the variable gain amplifier. (b) Circuit diagram of the amplifiers. (c) Circuit diagram of the variable attenuator. |
| 4.21 | Circuit diagram of the impedance transforming Marchand Balun with a passive cancellation network between the balanced outputs. The passive network improves the output return losses and the isolation between output ports. |
| 4.22 | Simulated performance of the Marchand Balun including (a) input and output return losses and insertion loss, and (b) phase and amplitude imbalance. |
| 4.23 | Die photo of the test structure implemented to characterize the 60 GHz RTPS and VGA cascade. |
| 4.24 | Insertion (a) phase shift and (b) gain of the RTPS-VGA breakout versus RTPS control voltage at 60 and 63 GHz. The VGA has been set to maximum gain ($V_{VGA} = 0\,\mathrm{V}$). |
| 4.25 | Insertion (a) gain and (b) phase shift of the RTPS-VGA breakout versus VGA control voltage. For these measurements, $V_{RTPS} = 0\,\mathrm{V}$. |
| 4.26 | Measurement setup of the power mixer prototype with (a) an Erickson power meter and (b) a second harmonic mixer downconverter (SHMD). |

| 4.27 | Third harmonic output power of the implemented power mixer vs. output frequency measured with the power meter setup. The output power is plotted for the optimal input phase at each frequency with the VGA set to maximum gain. The original power mixer simulation, and the simulation with updated amplifier models that capture the degradation in fundamental power available to the mixer, are shown. For comparison, a simulation of a frequency tripler driven by amplifiers with a frequency mismatch similar to the power mixer implementation is also shown. The annotated input power is at the fundamental frequency. |
|------|------|
| 4.28 | Output power at $189\,\mathrm{GHz}$ vs. input power at $63\,\mathrm{GHz}$ for different input phase shifts (varying $V_{RTPS}$) measured with the power mixer setup. The VGA is set to maximum gain |
| 4.29 | Variation in output power of the power mixer at 189 GHz as the relative phase shift between the inputs is changed by varying $V_{RTPS}$. $V_{VGA}$ is adjusted to compensate for the RTPS gain variation across $V_{RTPS}$ settings. For these measurements, the calibrated fundamental input power at the probe tip is $+12\,\mathrm{dBm}$. |
| 4.30 | Comparison of the output power of the 180-200 GHz power mixer with other signal sources at the same $130\,\mathrm{nm}$ CMOS technology node |
| 4.31 | Visual summary of Table 4.1. Even with frequency mismatch in the driver, the power mixer has one of the highest output powers across technology nodes for $f_{out}/f_T > 2$ amongst multipliers and oscillator-arrays (per-element power). Mismatch can be corrected in a re-spin by using updated EM models for drivers to achieve an efficiency comparable to the efficiency trend of multipliers and multiplier-arrays. |
| 5.1 | Conventional Type-II Second Order PLL |
| 5.2 | (a) Sub-sampling phase detector. (b) Timing diagram |
| 5.3 | Sub-sampling phase detector + charge pump profile and comparison with conventional PFD + charge pump. SSPD has higher gain and restricted monotonicity |
| 5.4 | Sub-sampling PLL architecture with acquisition aid |

| 5.5 | Acquisition process in SSPLL and conventional PLL. The red dot denoting the instantaneous phase error drifts across the phase-detector profile as the VCO frequency varies. Without a separate acquisition loop, the sign of feedback changes repeatedly in an SSPLL, while it remains the same for the conventional PLL |
|------|------|
| 5.6 | Reference buffer power consumption reduction [3]. The on-times of $M_n$ and $M_p$ are offset to reduce short circuit current |
| 5.7 | Spur reduction technique [3]. Dummy switch and load for the VCO tank to prevent changing tank impedance during sampling |
| 5.8 | Spur reduction technique [3]. To prevent spurs from periodic charge injection from the sampler into the VCO, there should be no difference in steady state between the VCO voltage at the start of the tracking phase and the sampled value stored on the capacitor. For this, the other reference edge is locked to a VCO zero crossing through a DLL. The noise on this edge is immaterial, so it can be generated using the low power short-circuit eliminating circuit of Fig. 5.6 |
| 5.9 | Profile of equivalent phase detector in ILCM. This is taken from the simulated profile under large thick pulse injection from [4] |
| 5.10 | From [4]. The injection locking path and the DLL are on simultaneously (blue time period). While the Type-I path works, the DLL matches the reference edges of the injection path and the integral path. When the injection path is gated (red time period), the paths in red are connected, and the accumulated phase error due to frequency drift alone is corrected |
| 5.11 | Basic concept of the Reference-Sampling PLL. It combines the functionality of the power-hungry clock and isolation buffers to eliminate the dual noise penalty of two separate buffers. This helps realize very low jitter for a given power consumption while demonstrating low spur |
| 5.12 | Proposed sampled phase detector and timing diagram. The VCO is used to evaluate phase error in the loop by sampling voltages on the reference sinewave. The relevant sample pertaining to the phase error is selected using the Sample Edge Selection Circuit (SESCi). The noise of SESCi does not affect the sampled value |

| 5.13 | Profile of proposed sampled phase detector. The profile is monotonic over $\pm \pi_{VCO}$ unlike ILCM and SSPD which are only monotonic over $\pm 0.5\pi_{VCO}$ |
|------|------|
| 5.14 | Half rate multiplexing of samples in each differential path. This scheme reduces the area for sample and hold capacitances |
| 5.15 | Sample Edge Selection Circuit (SESCi) selects the sample relevant to estimating the VCO phase error. The timing diagram shows that the reference buffer edge $RefBuff$ in the SESCi does not contribute noise to the $TRACK$ signal |
| 5.16 | Architecture and block diagram of proposed PLL |
| 5.17 | Different sources of noise in the proposed architecture |
| 5.18 | A detailed analysis of noise contributions and power consumption in the RSPLL normalized to 2.21 GHz |
| 5.19 | A detailed analysis of noise contributions and power consumption in the SSPLL normalized to 2.21 GHz |
| 5.20 | (a) When reference buffer generates an advanced edge with respect to reference sinewave zero-crossing, the PD resolves large VCO edge delays as advances. (b) Phase detector profile remains monotonic with advanced reference buffer edge. (c) Delayed reference buffer edge resolves advances as delays. (d) Phase detector profile with delayed reference buffer edge is also monotonic |
| 5.21 | (a) Delay due to gates after $TRACK$ is generated results in sampling due to $DelayedTRACK$. (b) Timing diagram when $RefBuff$ is at $t=0$. All samples are negative and there is no steady state solution for the feedback loop. (c) By advancing $RefBuff$ we can compensate for $\tau_{delay}$ and ensure a solution to the feedback loop |
| 5.22 | Effect of reference buffer noise on the PD profile creates two zones of uncertain samples $\pi$ apart. The timing diagram is shown for noise on an advanced $RefBuff$ edge. By tuning $RefBuff$ position, we can position the zones of uncertainty for higher robustness and place the ideal locking point in the center of the range |
| 5.23 | Common centroid layout of differential Sampling Phase Detector |
| 5.24 | SESCi component sizing and circuit diagram |
| 5.25 | Reference buffer with tunable delay |
| 5.26 | Circuit diagram of Frequency Tracking Loop (FTL) with CMFB |

| 5.27 | Simulation setup for loop noise |
|------|------|
| 5.28 | Comparison of simulated PLL noise with measured performance at 2.55 GHz |
| 5.29 | Phase noise corresponding to the best measured $\text{FoM}_{VCO}$ at 2.3 GHz ($P_{dc} = 3.26\,\mathrm{mW}$, $\text{FoM}_{VCO,1MHz} = 186.7$) and 2.55 GHz ($P_{dc} = 1.6\,\mathrm{mW}$, $\text{FoM}_{VCO,1MHz} = 184.2$) |
| 5.30 | Comparison of measured VCO performance with simulated phase noise at 2.3 GHz |
| 5.31 | Measured performance of the RSPLL at 2.55 GHz. The RSPLL shows a record $\mathrm{FoM}_j$ of $-253.5\,\mathrm{dB}$ amongst explicit PLLs, with the lowest reference spur of $-67\,\mathrm{dBc}$ for such a low jitter-power figure-of-merit. The $25\,\mathrm{MHz}$ spur is a result of half-rate multiplexing, and is not intrinsic to the RSPLL architecture |
| 5.32 | Measured performance across carrier frequency for three different samples |
| 5.33 | Measured performance across VCO supply voltage variation. The desired lock frequency is 2.55 GHz |
| 5.34 | The RSPLL architecture combines the best aspects of subsampling PLL and ILCM architectures to show significant improvement in the jitter versus spur performance space |
| 5.35 | Possible approach to modifying loop bandwidth without changing area (total sampling cap size remains same), power consumption (total switch size remains same) or output noise |
| 6.1 | FMCW with triangular chirp. (a) Stationary object. (b) Max. range for stationary object. (c) Moving object and Doppler shift. (d) Max. velocity for moving object at a given distance |
| 6.2 | EO-PLL photoelectric interface |
| 6.3 | EO-PLL block diagram |
| 6.4 | Proposed EO-PLL architecture with mixer-based phase detector |
| 6.5 | Alcatel A1905LMI laser tuning curve |
| 6.6 | Mixer-based phase detector profile with limited monotonicity |
| 6.7 | Photograph of measurement setup to verify performance of mixer based continuous analog correction loop with bandwidth equal to reference frequency |
| 6.8 | Second harmonic cancellation in mixer-based EO-PLL |
| 6.9 | EO-PLL around laser phase shifter with electronic phase-domain integrator |

| 6.10 | EO-PLL around laser phase shifter with downconverted ramp locked by conventional electronic FMCW PLL techniques |
|------|------|
| A.1 | Single carrier (a) up- and down-converted by matched LO (b) leaking into the other bins when up-converted and down-converted by mismatched LO |
| A.2 | SNR from Inter Carrier Interference (ICI) due to Carrier Frequency Offset (CFO). CFO is represented on the x-axis as a fraction $\alpha$ of the inter-carrier spacing [5] |
| A.3 | The limit on LO phase noise is calculated assuming that the OFDM data is just carriers |

### List of Tables

| 2.1 | Comparison of Reported CMOS-Based Sources Operating Above 200 GHz |
|-----|-----|
| 3.1 | Recent CMOS Multipliers beyond 100 GHz |
| 4.1 | Recent CMOS and SiGe Sources beyond 150 GHz |
| 4.2 | Recent CMOS and SiGe Sources beyond 150 GHz (continued) |
| 5.1 | Comparison of RSPLL with state-of-the-art integer-N frequency synthesizers |

#### ACKNOWLEDGMENT

This thesis has been a long time coming, and it wouldn't be here without the support and encouragement, both academic and interpersonal, of my advisor Dr. Harish Krishnaswamy. Not many advisors would have the patience he has had for my particular brand of maladjustment, and I will always be grateful to him for providing an environment in which I could experiment with ideas, and individuality. I am also grateful for the stability he provided when the waters were tempestuous. Thank you for helping me develop as a researcher, for teaching me, and for leading by example on how to surmount fearsome challenges.

As they say, it takes a village. My mentors at IBM - Alberto Valdes Garcia, Mark Ferriss, and Bodhi Sadhu, and at Bell Labs, Mike Zierdt - thank you for giving me an opportunity to work with you and to learn from you. I am deeply grateful to Mark for all of the time he has spent in discussions with me, and for the intuitive insights he helped reveal.

The foundation of all that is decent and worthwhile in me - my mum, my dad, my brother, my grandparents, and Babaji. They were there for every crisis of faith, when I thought the thread had slipped through my fingers in the dark. My aunts Tanuja and Anupa, and the two people who raised me - Mini Ma and Jija, for their love and affection and concern. And though they will probably not read this, Mekhala, Mayu, Shinjini, Anshuman, Lorna Aunty, David Uncle and Susan.

My wonderful, loving and dear friends Karthik Swaminathan and Nandhini Chandramoorthy, and the Hobbes to my Calvin, Markus Nussbaum. They were my family away from home, and made life so much more equitable and importantly, happy.

Thanks are due to my labmates for all that they have helped me learn, in particular Jeffrey Chuang, Anandaroop Chakrabarti, Negar Reiskarimian and Mahmood Dastjerdi. Tolga Dinc deserves a special mention, for the long hours spent taping out with two weeks to go, and bonding the most stubborn chips with two days to go. Sohail Ahsan, though new to the lab, has been great fun to work with on the recent photonics research, and I wish him the best of luck.

To Ma, Papa, Dadda, my grandparents and Baba

#### Chapter 1

#### Introduction

The need for clean and powerful signal generation is ubiquitous, with applications spanning the spectrum from RF to mm-Wave, and into and beyond the terahertz gap. RF applications including mobile telephony and microprocessors have effectively harnessed mixed-signal integration in CMOS to realize robust on-chip signal sources calibrated against adverse ambient conditions. Owing to low cost and high yield, the CMOS component of hand-held devices costs a few cents per part at volumes of a million parts. This low cost, and integrated digital processing, make CMOS an attractive option for applications like high-resolution imaging and ranging, spectroscopy and the emerging 5G communication space. However, these applications are expected to impose far more exacting specifications on power and spectral purity at much higher frequencies than conventional RF synthesizers.

RF-based detection and ranging (RADAR) techniques, when expanded to mm-Wave and even sub-mmWave frequencies, can enable centimeters to millimeters of resolution, which proves useful in navigation systems and satellite imaging. mm-Wave and sub-mmWave systems can leverage windows in the absorption spectrum at 94 GHz, 140 GHz and 220 GHz for imaging in poor visibility conditions [6, 7]. 24 GHz and 77 GHz systems have found use in automotive radars for parking assistance and automatic cruise control [8]. Fine resolution ranging applications can require very clean signals to minimize reciprocal mixing between the transmitted and received signals, while long range applications require powerful beams.

Expanding ranging techniques to the optical domain can yield very fine angular and distance resolution. Apart from automotive applications, this finds particular use in 3-D imaging. However, cost-effective tunable optical sources are temperature-sensitive and have nonlinear tuning profiles, rendering precise frequency modulations or 'chirps' untenable. Locking them to an electronic reference through an electro-optical PLL, and electronically calibrating the control signal for nonlinearity and ambient sensitivity, can make such chirps possible. To avoid high-cost modular implementations, we seek to leverage the twin advantages of silicon-intensive integration and low cost with high yield towards developing a single-chip solution that uses on-chip signal processing and phased arrays to generate precise and robust chirps for an electronically-steerable fine LIDAR beam.

Figure 1.1: Phase noise and $\text{FoM}_{PN} = \left(\frac{f_c}{\Delta f_c}\right)^2 / (L_{\Delta f_c} P_{DC,mW})$ for state-of-the-art CMOS PLLs for different carrier frequencies $f_c$ at different offsets $\Delta f_c$. Maximum phase noise limits for m-QAM at the OFDM-based standard's carrier spacing are shown for PLLs with a loop bandwidth of 100 kHz. Standards from 3GPP to WiGig impose phase noise requirements at increasing carrier spacing $\Delta f_c$. Requirements for evolving OFDM-based standards are met by CMOS for at least 64-QAM, but with poorer $\text{FoM}_{PN}$.

For next generation communication networks that promise increased connectivity through improved accessibility, lower latency and better reliability, innovations at the network layer and in signal processing will require concomitant advances in hardware. A shift to higher frequencies will bring with it the advantages of higher bandwidth, and new mobile networks are expected to incorporate a large mm-Wave component. Even in current network deployments, point-to-point highly directional mm-Wave links, implemented through small form-factor phased arrays, are used as part of the wireless backhaul of data, augmenting or even replacing wireline links where fiber is hard to route. This has led to a vast research effort in recent years in mm-Wave communication systems, peppered with innovations at the system level and in transceiver chain building blocks such as power amplifiers, power combiners, low-noise amplifiers, mixers and of course synthesizers, much of which will be leveraged in the coming 5G network evolution. For widespread adoption it is necessary that such technology is developed in a low cost, reliable process like CMOS that enables integrated mixed signal processing. Fig. 1.1 shows the phase noise requirement of different m-QAM constellations on a multi-carrier scheme like OFDM, along with the best reported phase noise of CMOS synthesizers for different carrier frequencies. For any m-QAM in OFDM, the integrated noise satisfying EVM is proportional to the in-band noise at the carrier spacing $\Delta f_c$ in the standard, and to the loop bandwidth. For a detailed derivation, see Appendix A. CMOS has been able to keep up with the tighter demands for phase noise for different standards, though with reducing margin. The figure also suggests that 5G, with its push towards 100x end-user data rates, is expected to need even cleaner sources that enable transmission of dense modulation on powerful high frequency carriers.

Figure 1.2: Output power and efficiency roll-off in CMOS with increasing frequency.

A consequence of expanding connectivity in future networks will be high rates of data transfer between devices and the need for large throughput in processing this data. Wide bandwidth wireless chip-to-chip interconnects have been proposed as a replacement for wired connections to enable high density implementations which can pack intensive functionality into a small footprint. In addition to communication applications, high bandwidth chip-to-chip interconnects can also prove very useful for fast data transfer inside data centers and in dense and powerful computing infrastructure. Owing to the short distance and controlled environment in which these links operate, it is possible that these interconnects may even be implemented at sub-mmWave frequencies. Power generation in CMOS at sub-mmWave frequencies is very challenging, as this frequency range is beyond the transistor cut-off frequency and relies on weak nonlinearities for harmonic generation. While the short wavelength leads to small antenna size and the signal can be boosted through integrated phased arrays, the weak individual transmitting element remains a bottleneck. This weakness can drive up the number of elements needed in an array, presenting serious on-chip signal routing and distribution issues. Fig. 1.2 shows output power and efficiencies in recent sub-mmWave CMOS sources. Although not the focus of this thesis, receiver design for high sensitivity and efficiency also remains an open research problem. Fully integrated CMOS centimeter range wireless communication links at 135 GHz and 260 GHz with energy efficiencies of 10 pJ/bit [9] and 30 pJ/bit [10], respectively, have been demonstrated. While promising, this is still short of state-of-the-art energy efficiencies of 4 pJ/bit in on-chip RF interconnects ($\leq 1$ cm) and optical interconnects ($\geq 10$ cm) [11].

Another application in the sub-mmWave range is spectroscopy [12–16]. Certain molecules have resonances in the 100-300 GHz spectrum, and the development of clean, wideband synthesizers will find applications in medical screening such as skin cancer detection, defensive technology such as the detection of poisonous gases like sarin and methyl chloride, product evaluation such as non-destructive pharmaceutical testing, as a useful investigative tool for material scientists, and perhaps even as a "tricorder" in the decades to come [17], [18]. Indeed, development in CMOS, where both sample excitation and data processing can be performed in a miniaturized area, could bring a drastic improvement in the portability and affordability of such technology across healthcare and industry.

All this is to say that the coming generation of applications will present unconventional challenges for transistor technologies - whether it is to squeeze more performance in the conventionally used spectrum, already wrung dry, or signal generation and system design in the relatively emptier mm-Wave to sub-mmWave spectrum, most of the latter falling in the so-called "Terahertz Gap" shown in Fig. 1.3 [1]. Indeed, transistor scaling and innovative device physics leading to new transistor topologies have yielded higher cut-off frequencies in CMOS, as in Fig. 1.3, though still lagging well behind SiGe and III-V semiconductors. Similarly, the noise in CMOS remains high compared to bipolar SiGe technology. Quick roll-off in the gain and reducing power supply can be formidable obstacles for performance at mm-Wave to sub-mmWave frequencies. To avoid multimodule solutions with functionality partitioned across different technologies, CMOS must be pushed out of its comfort zone, and technology scaling has to have accompanying breakthroughs in design approaches not only at the system but also at the block level. In this thesis, while not targeting a specific application, we seek to formulate the obstacles in synthesizing high frequency, high power and low noise signals in CMOS and construct a coherent design methodology to address them.

Figure 1.3: (a) "Terahertz Gap" between Electronics and Photonics [1]. (b) Comparison of maximum oscillation frequency ($f_{max}$) across scaling technology nodes.

The first half of this thesis deals with signal synthesis and power generation in CMOS in the so-called Terahertz Gap. This region lies outside the range of frequencies where the MOS transistor has gain, and necessitates frequency generation through harmonic extraction either as harmonic oscillators or as frequency multipliers.

Chapter 2 presents a technique to implement an oscillator operating at the maximum oscillation frequency of a given technology, hereafter denoted as $f_{max}$. The conventional definition of $f_{max}$ includes only the limit defined by device gain falling to one as a result of internal losses in a transistor. As such, oscillators cannot start up beyond the technology $f_{max}$. Due to the high loss in passive components, especially with increasing frequency, the actual achievable maximum fundamental frequency for oscillators is lower. Interestingly, the presented design methodology results in a closed form expression relating the actual achievable maximum oscillation frequency or effective $f_{max}$ (denoted as $f_{max,eff}$) of a technology to the transistor loss as well as the quality factor of the passives in the BEOL, irrespective of the specific topology of passives used in the resonant network. This chapter also forays into harmonic generation for signals beyond the technology $f_{max}$ and dwells on determining the optimal load for maximizing harmonic power transfer from the oscillator under large signal conditions. Using the discussed techniques, two oscillators generating second harmonic outputs at 200 GHz and 300 GHz are demonstrated in a 45 nm SOI-CMOS technology with $f_{max}$ of 200 GHz.

In Chapter 3, we study the alternate approach of harmonic generation through frequency multipliers. Harmonic power from multipliers is a product of the harmonic current generated by the transistor, and the load to which it is delivered. The most common harmonic current generation technique in frequency multipliers is through a MOS device biased such that the voltage swing of the input fundamental frequency sinusoid at the gate generates a clipped sinusoidal current. Most recent works focus on finding the input bias that yields an optimal duty cycle of the clipped sinusoid to maximize content at a desired harmonic. The optimal load is then usually obtained by a large signal load pull simulation. Several factors can limit the output load, such as the I-V conduction loss, output matching network loss, gate resistance and substrate loss. We investigate the dominant factor that affects the output load and derive a scaling trend for output harmonic power generated from conventional frequency multipliers. A 135 GHz frequency doubler generating +4 dBm output power at $1.1 \times$ technology $f_{max}$ is implemented in 130 nm CMOS to verify the presented theoretical formulation. This study is also useful because it shows that substrate loss is the limiting loss mechanism for output load. Therefore, short of techniques that can circumvent substrate loss, an increase in harmonic power from a given device size can only come from increasing the harmonic current generated through the device transconductance. This necessitates using transistor nonlinearity more effectively than simple sinusoidal clipping and duty cycle optimization, and forms the basis of the work in the next chapter.
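The duty cycle optimization described above can be illustrated with a minimal numerical sketch. It assumes an idealized piecewise-linear device (current proportional to the gate drive above a clipping level, zero below it) and normalizes each harmonic to the peak current; both choices, along with the function names, are assumptions made for illustration and are not the analysis of Chapter 3.

```python
import math


def harmonic_amplitude(n, conduction_deg, steps=2000):
    """n-th harmonic amplitude of a clipped cosine current, normalized to its peak.

    Assumed (idealized) waveform: i(theta) = cos(theta) - cos(alpha) for
    |theta| < alpha and zero elsewhere, where 2*alpha is the conduction angle.
    The Fourier integral is evaluated with a simple midpoint rule.
    """
    alpha = math.radians(conduction_deg) / 2.0
    peak = 1.0 - math.cos(alpha)
    dtheta = alpha / steps
    total = 0.0
    for k in range(steps):
        theta = (k + 0.5) * dtheta
        current = (math.cos(theta) - math.cos(alpha)) / peak
        total += current * math.cos(n * theta) * dtheta
    # The waveform is even, so integrate over [0, alpha] and double.
    return (2.0 / math.pi) * total


def best_conduction_angle(n, angles=range(10, 361)):
    """Conduction angle (degrees) that maximizes the n-th harmonic magnitude."""
    return max(angles, key=lambda a: abs(harmonic_amplitude(n, a)))


if __name__ == "__main__":
    for n in (2, 3):
        angle = best_conduction_angle(n)
        print(n, angle, round(abs(harmonic_amplitude(n, angle)), 4))
```

Under this idealized model the usable harmonic current is only a modest fraction of the peak device current even at the best conduction angle, which is consistent with the motivation for using the device nonlinearity more effectively.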

In Chapter 4, we present a power mixer topology that generates the third harmonic current by mixing the first and second harmonic signals, feeding them to a MOS transistor at the source and gate respectively. By controlling the amplitudes and relative phase shifts of the input signals, we optimize how the device moves through various regions of operation so that it generates waveforms with more third harmonic content than conventional frequency multipliers. This nonlinearity engineering technique is shown to generate three times more current than a frequency tripler for the same fundamental swing. Given that the optimal load is limited by substrate loss in both cases, this translates to nine times higher output power. This chapter also presents a rigorous comparative study of the two approaches in terms of conversion gain and long term reliability, the latter being especially important given the decreasing supply voltages of scaled CMOS processes. A 180 GHz power mixer generating -13.5 dBm at $1.5 \times$ technology $f_{max}$ is demonstrated in a 130 nm CMOS process.
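The step from three times the current to nine times the power follows from the quadratic dependence of delivered power on harmonic current when the load is the same in both cases. As a short worked relation, with $I_3$ denoting the third harmonic current amplitude and $R_L$ the real part of the substrate-loss-limited optimal load (symbols introduced here only for illustration):

$$
P_3 = \tfrac{1}{2}\,|I_3|^2 R_L
\quad\Rightarrow\quad
\frac{P_{3,\mathrm{mixer}}}{P_{3,\mathrm{tripler}}}
= \left(\frac{I_{3,\mathrm{mixer}}}{I_{3,\mathrm{tripler}}}\right)^{2}
= 3^2 = 9 \approx 9.5\ \mathrm{dB}.
$$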

The second part of this thesis is devoted to low noise phase locked loops (PLLs) at RF frequencies, and to electro-optic PLLs (EO-PLLs) in developing robust Light-Detection-And-Ranging (LIDAR) systems.

Chapter 6 reviews state-of-the-art techniques for low jitter FoM CMOS PLLs based on LC-VCOs, including subsampling PLLs (SSPLL) and injection locked clock multipliers (ILCM) with high multiplication ratio. The SSPLL greatly attenuates the noise from the phase detector and charge pump by removing the N<sup>2</sup> multiplication of the phase noise from these blocks. In ILCMs, these blocks are eliminated altogether. The dominant sources of noise in these two techniques are the reference buffer in-band, whose noise is still multiplied by N<sup>2</sup>, and the VCO out-of-band. These two blocks also consume the most power. In recent work, several LC-VCO topologies with phase-noise-and-power FoM close to the theoretically achievable limit have been demonstrated, and the reference logic remains the last hurdle. A new type-I RF PLL approach, the reference-sampling PLL (RSPLL), is presented, which eliminates the reference buffer and samples the reference sinewave using the fast VCO waveform. Of the multiple samples generated, the relevant sample is selected every N VCO cycles through a very low noise and low power selection logic. Further, the isolation of the VCO tank from any varying loads helps simultaneously achieve low spur without additional circuitry. Using this approach, a 2.05-2.55 GHz RSPLL achieves a record FoM<sub>i</sub> of -253.5dB among explicit PLLs and reference spur <-67 dBc. This work achieves record numbers across architectures in the low-jitter versus low-spur performance space.

Chapter 6 discusses ongoing work that brings together concepts developed for signal generation towards a hybrid CMOS-Optical approach to Frequency-Modulated Continuous-Wave (FMCW) LIDAR. In any electro-optic PLL (EO-PLL), a Mach-Zehnder Interferometer (MZI) is used as a delay discriminator to generate low frequency signals with information about optical performance. The low frequency signal can be processed by the electrical component of the loop to provide corrective behavior to the optical component. In the case of FMCW LIDAR, the photodiode generates a low frequency signal with frequency proportional to the modulation slope and the discriminator delay, which can then be locked to an electrical reference. Based on the settling behavior of the triangular frequency chirp, very high reference frequencies may be required. For the photodiode output to match the reference for a given modulation slope, this yields impractically large optical delays in the MZI. We break the tradeoff of settling behavior and on-chip optical delay by proposing a novel PLL architecture which provides continuous error correction and has a bandwidth (settling) equal to the reference, rather than a factor of ten lower as in conventional PLLs. This allows us to reduce the MZI form factor by ten times. A discrete implementation of the proposed PLL is demonstrated. The chapter concludes by proposing future work on several new architectures for EO-PLLs that address different challenges in EO-PLL implementations.

Chapter 8 concludes the dissertation with a summary of the academic contributions of the thesis.

#### Chapter 2

# THz Frequency Synthesis: Maximum Gain Ring Oscillator

#### 2.1 Technologies for high frequency signal generation

Sub-mmWave signals have been dominantly generated using III-V compound semiconductors, [19–26]. In [19], using a 250nm InP HBT technology with a maximum frequency of oscillation  $f_{max} > 800 \text{GHz}$ , the authors have demonstrated fundamental oscillators at 573GHz and 412.9GHz with -19 dBm and -5.6 dBm of output power respectively. Signal generation in the high-mmWave/sub-mmWave range has also been successfully demonstrated using heterostructure barrier varactor (HBV) multipliers [27–31]. Most recently, the authors of [30] have demonstrated an HBV quintupler generating 60mW of power at 175GHz. Signal sources based on GaAs Schottky diode multipliers have also been constructed [32–34]. The authors in [32] generate more than 0dBm of power in the 840-900 GHz range using a frequency multiplier chain. The authors in [33] have shown -17.5 dBm of output power at 2.58 THz.

SiGe technologies have also become an active avenue for signal generation in the mmWave and sub-mmWave regimes, [35–40]. The authors in [37] construct a push-push oscillator generating -4.5dBm at 190GHz in a SiGe:C bipolar technology with an  $f_{max}$  of 275GHz. A 278GHz push-push oscillator in the same technology has been shown in [40]. Recently, in a 250nm SiGe BiCMOS process ( $f_{max}$ =435GHz), the authors in [35] have shown a spatially power combined array of four frequency multipliers generating an EIRP of -17dBm at 820GHz.

Modern CMOS technology nodes have an $f_{max}$ of about 150-300GHz (130nm-65nm CMOS). Loss in passive components is also quite high at these frequencies, and consequently these technologies cannot provide amplification in the sub-mmWave/THz region. Current research focuses on building oscillators close to and below $f_{max}$ (typically below 200GHz), and extracting harmonics to extend the output frequency $(f_{out})$ beyond $f_{max}$. The authors in [41] use the push-push technique with a cross-coupled oscillator (XCO) in 45nm CMOS to generate a second harmonic at 410 GHz with -49dBm power. The authors in [42] feed the four 90° out-of-phase outputs of a quadrature XCO to a rectification circuit that feeds the fourth harmonic to an external 50$\Omega$ load. This shifts the burden of generating harmonic power from the oscillator core to the rectification circuit. A -46dBm signal at 324GHz is shown in 90nm CMOS. In [43], the authors combine the fourth harmonic current at the source node of the coupling transistors of a quadrature XCO to generate a -36.6dBm 553GHz signal in 45nm CMOS. They include a matching network between the common source node and the antenna to increase the fourth harmonic power transmitted. In [44], the inductance of a regular XCO is split between the core and the buffer stage. This mutually couples back the signal from the buffer to the core to improve the oscillator loop gain and thus improve the fundamental oscillation frequency to 300.5GHz in 65nm CMOS. The authors of [45] have designed a travelling-wave oscillator with a 300GHz second harmonic output in 45nm SOI CMOS. The geometry of the oscillator and the ground plane is such that the structure is radiative at the second harmonic. They mutually lock and spatially combine such distributed active radiators to improve the total radiated power. The total radiated power of a $2 \times 1$ array is -19dBm, and that of a $2 \times 2$ array is -10.9dBm.
In [46], the same authors show a 282GHz  $4 \times 4$  beam-steering array of distributed active radiators in 45nm SOI CMOS with 80° electronic beam-scanning in each of the orthogonal axes in 2D space. A total power of -7.2dBm is radiated broadside with an EIRP of 9.4dBm while consuming around 800mW of DC power. A tuning range of 3.2% has been shown around 280GHz. Finally, the authors of [47] attempt to maximize  $f_{osc}$  by increasing the small-signal startup gain of a ring oscillator. They do so by maximizing the small-signal power added per stage. They have shown a 256GHz third harmonic oscillator with -17dBm output power in 130nm CMOS and a 482GHz third harmonic oscillator with -7.9dBm output power in 65nm CMOS. It should be noted that in these works, the output is at the center of the ring oscillator and the on chip routing losses of a practical implementation are not included. In [48], the authors show a VCO in a 65nm CMOS process with a fourth-harmonic output power of about -1.2 dBm at 292GHz. It has a tuning range of 4.5% around 290GHz using variable coupling between injection locked oscillators. In [49] the authors use a 90nm CMOS process and generate -6.5 dBm of power at 228GHz by extracting the third harmonic from a differential VCO in a Colpitts configuration. They show a tuning range of 7%. Finally, the authors of [50] have shown a doubler in 45nm SOI CMOS generating -3 to 0dBm of output power in the 170 - 195GHz range with a conversion gain between -2 to -1dB.

This brief overview suggests that while CMOS based oscillators and sources are now able to operate in the high mmWave/THz regime, additional research needs to be done to produce output frequencies and power comparable to compound semiconductor technologies. Our work introduces a topology that maximizes the frequency of oscillation achievable in a given technology through a ring oscillator configuration with appropriately-designed passive matching networks. There have been many works in the microwave community that lend significant insight into maximizing oscillation frequency and output power. The author in [51] discusses the existence of a specific voltage ratio between the drain and the gate terminal that maximizes the power gain from the gate to the drain, thus improving small-signal loop gain. Our work improves on this by standardizing the methodology of arriving at the passive network to achieve this gain. Our work also explicitly accounts for passive loss in a closed form fashion, and allows for any phase shift per stage for a multi-stage ring oscillator with arbitrary number of stages. This in turn allows power combining of different number of stages for larger output power. The authors in [52] similarly attempt to improve loop gain of a ring oscillator but maximize added power from the gate to the drain. Other works of interest that work on maximizing power gain across a ring oscillator include [53].

This chapter is organized as follows. Section 2.2 discusses the modeling of the IBM 45nm SOI CMOS active and passive devices. Section 2.3 deals with the MGRO concept. Section 2.4 discusses the design and optimization of networks that extract harmonic output power from the MGRO. It also discusses the possibility of spurious-mode oscillations which must be suppressed. Section 2.5 discusses the measurement setup and the performance of the fabricated chips. Section 2.6 concludes the chapter.

#### 2.2 45nm SOI CMOS Technology Characterization

#### 2.2.1 Active Devices

The IBM 45nm SOI CMOS technology offers floating-body (FB) devices with a channel length of 40nm and body-contacted (BC) devices with a channel length of 56nm. The BC devices are slower than their FB counterparts due to their longer channel length and the additional capacitive parasitics introduced by the body contact. We have measured a $10 \times 1\mu\text{m}/56\text{nm}$ BC, a $20 \times 1\mu\text{m}/56\text{nm}$ BC, and a $10 \times 1\mu\text{m}/40\text{nm}$ FB device. The gate-over-device layout used in this work is shown in Fig. 2.1 for a BC device. For the FB device, the body-contact notches [54] are absent. The thick via walls for the gate and the drain reduce wiring resistance. The layout allows a symmetric doubly contacted gate. The concern of capacitance between the gate and drain via walls is mitigated by placing them sufficiently far apart, while a possible increase in drain resistance is reduced by using the winged structures in $M_2$ as shown. The fringing capacitance from the wings to the gate does not add to the gate-drain via capacitance.

Fig. 2.2 shows a simplified BSIMSOI 4.x model (inside the dotted box [55]) to demonstrate some of the important components that model high-frequency effects. The components that are absent in the PDK model, namely the non-quasi static gate resistance  $r_{iir}$ , and  $r_{BDB}$  and  $r_{BSB}$  in the body resistance network, have been marked in dashed boxes. Wiring self-  $(L_{gwire}, L_{dwire})$  and mutual-  $(K_{gd})$  inductances, resistance  $(r_{gwire})$  and capacitances  $(C_{gswire}, C_{gdwire})$  are located outside the dotted box containing the BSIMSOI model, and are also not included in the PDK. The mutual inductance is primarily from the coupling between the drain and gate vias. Wiring parasitics are determined through parasitic extraction using the Calibre extraction tool and through EM simulations using the IE3D field solver [56].

Figure 2.1: Layout of a BC NFET Device. This allows the gate to be doubly contacted in a symmetric fashion.

Figure 2.2: The model for the NFET BC device. In the FB version, there is no 'b' node and $r_{body}$ is absent. The FB node is '$b_1$'.

Figure 2.3: (a) A line fitted by linear regression to the measured U and $h_{21}$ plots of the $10 \times 1\mu\text{m}/56\text{nm}$ BC device with a $J = 0.56\text{mA}/\mu\text{m}$. (b) Comparison of the $f_{max}$ across J from measurement for the $10 \times 1\mu\text{m}/56\text{nm}$ and $20 \times 1\mu\text{m}/56\text{nm}$ BC devices and the $10 \times 1\mu\text{m}/40\text{nm}$ FB device. (c) Measured $f_T$ across J for all three devices.

Open-Short Deembedding [57] was used to deembed the pad and feedlines from the device test structures that were measured. The sufficiency of Open-Short Deembedding up to 67GHz has been verified by EM simulations, which confirm that the pad and the feedline can be treated as lumped components at these frequencies. The deembedding is done up to a reference plane located at the top of the gate and drain vias. The measured Mason's Unilateral Gain (U) and $h_{21}$ for the $10 \times 1\mu\text{m}/56\text{nm}$ BC device at a current density $J = 0.56\text{mA}/\mu\text{m}$ are shown in Fig. 2.3(a). To determine $f_{max}$ and cut-off frequency $f_T$, a 20dB per decade line is extrapolated from the measured U and $|h_{21}|$. A similar approach is applied for the other two devices as well. The extrapolated $f_{max}$ and $f_T$ for all three devices across current density J are shown in Fig. 2.3(b) and (c). The peak $f_{max}$ of the $10 \times 1\mu\text{m}/56\text{nm}$ BC device is around 210GHz for a $J = 0.3\text{mA}/\mu\text{m}$, and that of the $10 \times 1\mu\text{m}/40\text{nm}$ FB device is about 250GHz for a $J = 0.4\text{mA}/\mu\text{m}$.
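As a minimal sketch of the extrapolation described above, the Python snippet below fits a line of fixed -20dB per decade slope to gain samples in dB and returns the frequency at which that line crosses 0dB. The function name and the sample values are hypothetical and are not measured data from this work.

```python
import math

def extrapolate_unity_gain_frequency(freqs_hz, gains_db, slope_db_per_decade=-20.0):
    """Least-squares fit of gain_db = c + slope*log10(f) with the slope held fixed;
    returns the frequency where the fitted line crosses 0 dB."""
    # With a fixed slope, the least-squares intercept is the mean residual.
    c = sum(g - slope_db_per_decade * math.log10(f)
            for f, g in zip(freqs_hz, gains_db)) / len(freqs_hz)
    return 10.0 ** (-c / slope_db_per_decade)

# Hypothetical samples of U lying on an ideal -20 dB/decade roll-off.
freqs = [20e9, 40e9, 60e9]
u_db = [20.0 * math.log10(200e9 / f) for f in freqs]

f_max_est = extrapolate_unity_gain_frequency(freqs, u_db)
print(f"extrapolated f_max = {f_max_est / 1e9:.1f} GHz")
```

The same routine applies to $|h_{21}|$ for $f_T$. It is only as good as the assumed slope, which is why a model-based estimate is preferable when U departs from the ideal roll-off.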

Mason's Unilateral Gain U of devices can have behavior different from a 20dB per decade slope ( [58], [59]). So it is important to predict figures like  $f_{max}$  etc. from models rather than by simply extrapolating the measured plot. In [60], we have fit the model described in Fig. 2.2 to determine  $f_{max}$  and  $f_T$  for the  $10 \times 1 \mu \text{m}/56 \text{nm}$  BC device at a  $J = 0.56 \text{mA}/\mu \text{m}$  and the results are consistent with those presented here.

Recently, the author of [61] has reported an $f_{max}$ of 430GHz in this technology for a floating-body (FB) $20 \times 0.4\mu\text{m}/41\text{nm}$ NFET device deembedded up to the gate and the drain's first metal contact. This finger width is not available in the PDK. Furthermore, the parasitic inductance, capacitance and resistance of the gate and drain vias and interconnects have a significant impact on the device $f_{max}$. The authors of [50] have reported an $f_{max}$ of $200 \pm 5\text{GHz}$ at a current density (J) of 0.2 to $0.5\text{mA}/\mu\text{m}$ for a $30 \times 1\mu\text{m}/40\text{nm}$ FB device referenced to the top of the drain and gate vias. The gate-over-device layout used in this work enables a symmetric doubly-contacted gate, and thus allows us to achieve a higher $f_{max}$ of 250GHz for floating-body devices, and a similar $f_{max}$ of about 200GHz for the slower body-contacted devices. To put this into perspective, in [44], the author has constructed an oscillator with a fundamental frequency of 300GHz in the TSMC 65nm CMOS technology. Consequently, the $f_{max}$ of that technology must exceed 300GHz.

#### 2.2.2 Transmission Line

The inductors in our design are implemented with high characteristic impedance transmission lines. There are no accurate models in the PDK for transmission lines. Consequently, they are simulated in IE3D. Fig. 2.4 shows the measured characteristics of a test Coplanar Waveguide (CPW). We implement most of the transmission lines as CPWs for two reasons: (i) it minimizes interference with nearby components, and (ii) the metal density in the side-shield vias helps meet metal-fill requirements. The CPW's $7.5\mu$m wide signal line is in the $2.1\mu$m thick top aluminium layer (LB). A ground plane approximately $6.96\mu$m below the signal layer is formed by tying the three bottom-most copper layers (M1, M2 and M3) to reduce loss. The metal thicknesses are $0.136\mu\text{m}$, $0.144\mu\text{m}$ and $0.144\mu\text{m}$ respectively, and the distances between M1 and M2, and M2 and M3 are both $0.115\mu\text{m}$. The side ground shields at a separation of $12.5\mu\text{m}$ from the signal line are formed by tying the top layer in LB to the ground plane through a metal and via pattern designed to satisfy metal fill requirements. A characteristic impedance of $Z_c = 70\Omega$ is seen, as expected from simulation. The wavelength $\lambda$ and attenuation constant $\alpha$ also match up well with simulation up to 65GHz. However, the authors of [50], [62] and [63] have documented an increase in transmission line loss beyond 100GHz that is not captured in EM simulations. EM simulations predict an $\alpha$ of -1.17dB/mm at 200GHz and -1.78dB/mm at 300GHz for this line.

Figure 2.4: Measured performance of a $70\Omega$ CPW in 45nm SOI CMOS. A comparison with the simulated performance in IE3D is also shown.

#### 2.2.3 Capacitor (VNCAP)

In this PDK, the Vertical Natural Capacitor (VNCAP), also known as metal-oxide-metal (MOM) or finger capacitor, is available. To test the performance and the accuracy of models for these capacitors, we implemented test structures for 70fF and 214fF VNCAPs. They are implemented between the metals M3 and B3. The capacitors, like the transmission line and devices, were deembedded using the Open-Short technique.

Figure 2.5: (a) Model used for the VNCAP in 45nm SOI CMOS. La and Ra are added to capture the via effect and the inductances on either plate are coupled. (b) IE3D VNCAP simulation setup.

The PDK model for the capacitor is only valid when it is contacted at the centre of the bottommost metal M3 for both plates. In our work, the signals are in the topmost aluminium LB layer and we construct a via to contact the capacitor plates at B3. The deembedding reference plane is placed at the top of the LB via in the test structures. Both capacitors are modeled (Fig. 2.5(a)) by augmenting the PDK capacitor with series inductance and resistance on either plate (5.9pH and $0.5\Omega$ for the 214fF capacitor, and 7pH and $2\Omega$ for the 70fF capacitor). This captures the effect of the via's resistance and inductance on the Quality Factor and self resonance, as well as any series inductance unmodeled in the PDK. The two via inductances are also mutually coupled with a k of 0.7 for both. The smaller capacitor and its vias are also modeled with IE3D (Fig. 2.5(b)). The IE3D model needs to be augmented with 4pH and $2\Omega$ on either plate. This is possibly because the vias internal and external to the VNCAP are modeled as continuous bars in IE3D, as the simulation of discrete vias is too cumbersome. We show a comparison between the measured performance and the models in Fig. 2.6. The 214fF and 70fF capacitors have self resonance frequencies of 80GHz and 115GHz respectively, indicating that they are actually inductive at the frequency of interest.
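To give a feel for the self resonance figures quoted above, the following Python sketch computes the total series inductance that would resonate with each nominal capacitance at the reported frequency. It assumes a simple series LC, which is a simplification and not the coupled-inductor model of Fig. 2.5(a); the function name is hypothetical.

```python
import math

def effective_series_inductance(c_farad, f_srf_hz):
    """Inductance that resonates with c_farad at f_srf_hz: L = 1 / ((2*pi*f)^2 * C)."""
    return 1.0 / ((2.0 * math.pi * f_srf_hz) ** 2 * c_farad)

# Nominal capacitances and self resonance frequencies quoted in the text.
for c, f in [(214e-15, 80e9), (70e-15, 115e9)]:
    l_eff = effective_series_inductance(c, f)
    print(f"C = {c * 1e15:.0f} fF, SRF = {f / 1e9:.0f} GHz -> L_eff ~ {l_eff * 1e12:.1f} pH")
```

Inductances of a few tens of picohenries are enough to bring the self resonance well below the 200-300GHz range of interest, which is consistent with the observation that these capacitors look inductive there.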

#### 2.3 Maximum Gain Ring Oscillator Topology

While oscillators are large-signal circuits, small-signal concepts of  $f_{max}$  and maximum power gain are relevant to compute the startup gain of an oscillator topology and consequently its maximum oscillation frequency.

Figure 2.6: Measured and simulated series capacitance (= $\frac{Im(y_{11})}{\omega}$) and Quality Factor (= $\frac{Im(y_{11})}{Re(y_{11})}$) of (a) a 214fF and (b) a 70fF VNCAP in 45nm SOI CMOS.

Figure 2.7: (a) Cross-coupled oscillator as a two-stage tuned ring oscillator with a single inter-stage matching inductor. (b) MGRO concept.

The conventional way in which high-frequency CMOS oscillators are built is the cross-coupled oscillator topology (XCO) (Fig. 2.7(a)). The XCO may be viewed as a ring of two tuned amplifiers with a single effective inductor acting as the matching component between the amplifiers. This is insufficient to convert the input impedance of the second amplifier to the conjugately matched load impedance needed by the first to deliver the maximum available power gain (MAG), [64] and [65]. Consequently, the XCO achieves sub-optimal power gain and can be expected to exhibit a maximum oscillation frequency that is below the $f_{max}$ of the technology. The MGRO, as shown in Fig. 2.7(b), rectifies the matching problem by including additional reactive components in the matching network between the stages. If the Y-parameters of each device are represented by [Y] with $Y_{ij} = G_{ij} + jB_{ij}$,

$$Y_{in} = \frac{1}{Z_{in}} = Y_{11} + G_v Y_{12} \tag{2.1}$$

$$Y_{load} = \frac{1}{Z_{load}} = -\left(\frac{Y_{21}}{G_v} + Y_{22}\right) \tag{2.2}$$

$$PG = \frac{P_{out}}{P_{in}} = \frac{|G_v|^2 Re(Y_{load})}{Re(Y_{in})} = \frac{-\left((A_v^2 + B_v^2)G_{22} + A_v G_{21} + B_v B_{21}\right)}{(G_{11} + A_v G_{12} - B_v B_{12})} \tag{2.3}$$

where $G_v = \frac{v_2}{v_1} = A_v + jB_v$ is the voltage gain across the device. $P_{in}$ is the power flowing into the gate (port 1) of the device, $P_{out}$ is the power delivered out of the drain (port 2), and $P'_{out}$ is the power delivered after the matching network<sup>1</sup>. $Y_{in}$ is the input admittance of the device and $Y_{load}$ is the admittance to which the input admittance of the subsequent stage is transformed by the matching network. In the absence of passive loss, $P'_{out} = P_{out}$ and hence, device power gain, and consequently oscillator startup gain, can be maximized by determining the complex $G_v$ value that maximizes PG in (2.3) to MAG. This may be performed either analytically or numerically if the device Y-parameters are known. Fig. 2.8 depicts power gain (PG) circles on the real-imaginary plane of $G_v$ at 100GHz for a $10 \times 1\mu\text{m}/56\text{nm}$ body-contacted NMOS device including layout parasitics for a current density of $J = 0.56\text{mA}/\mu\text{m}$. The maximum achievable power gain is 3.3dB. Once the optimal $G_v$ is known, the matching network can be designed to transform $Y_{in}$ to the requisite $Y_{load}$. It is evident that the maximum oscillation frequency of this methodology, namely the highest frequency at which the PG maximized to the MAG crosses 0dB, is the $f_{max}$ of the device.

<sup>1</sup>This formulation and the developed design methodology are general and can be used for device configurations other than common-source, or even configurations involving multiple devices.

Figure 2.8: Power gain (PG) circles on the $G_v$ plane at 100GHz for a $10 \times 1\mu\text{m}/56\text{nm}$ body-contacted NMOS device including estimated layout parasitics in a common source configuration. Current Density, $J = 0.56\text{mA}/\mu\text{m}$.

At lower frequencies, there exist regions of the contour plot where  $P_{in} < 0$  and  $P_{out} > 0$ . The circular area where  $P_{out} > 0$  extends below the  $P_{in} = 0$  line. In such regions, the device "self-oscillates". In other words, with the appropriate source and load terminations, the internal  $C_{gd}$  feedback of the device is sufficient to cause oscillation. Such behavior disappears as the frequency approaches a higher fraction of  $f_{max}$ .
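The numerical maximization of (2.3) over $G_v$ can be sketched in a few lines of Python. The snippet below is an illustration under stated assumptions, not the tool flow used in this work: the Y-parameters are hypothetical values chosen to give an unconditionally stable two-port, the function names are invented for the example, and a brute-force grid stands in for the contour plots of Fig. 2.8. The search keeps only points with $P_{in} > 0$ and $P_{out} > 0$, and its result is compared against the textbook MAG expression in terms of the stability factor K.

```python
import math

def stage_admittances(Y, g_v):
    """Return (Y_in, Y_load) from eqs. (2.1) and (2.2) for a voltage gain g_v."""
    y_in = Y[0][0] + g_v * Y[0][1]
    y_load = -(Y[1][0] / g_v + Y[1][1])
    return y_in, y_load

def power_gain(Y, g_v):
    """PG of eq. (2.3): |G_v|^2 Re(Y_load) / Re(Y_in)."""
    y_in, y_load = stage_admittances(Y, g_v)
    return abs(g_v) ** 2 * y_load.real / y_in.real

def mag_from_y(Y):
    """Textbook MAG = |Y21/Y12| (K - sqrt(K^2 - 1)); only valid for K > 1."""
    y12y21 = Y[0][1] * Y[1][0]
    k = (2.0 * Y[0][0].real * Y[1][1].real - y12y21.real) / abs(y12y21)
    return abs(Y[1][0] / Y[0][1]) * (k - math.sqrt(k * k - 1.0))

def search_best_gv(Y, span=4.0, step=0.02):
    """Brute-force search of the G_v plane, keeping only points where both
    P_in and P_out are positive (Re(Y_in) > 0 and Re(Y_load) > 0)."""
    best_pg, best_gv = float("-inf"), None
    n = int(round(span / step))
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            if i == 0 and j == 0:
                continue  # G_v = 0 makes Y_load undefined
            g_v = complex(i * step, j * step)
            y_in, y_load = stage_admittances(Y, g_v)
            if y_in.real <= 0.0 or y_load.real <= 0.0:
                continue
            pg = abs(g_v) ** 2 * y_load.real / y_in.real
            if pg > best_pg:
                best_pg, best_gv = pg, g_v
    return best_pg, best_gv

# Hypothetical Y-parameters in siemens (NOT extracted from the measured devices).
Y_EXAMPLE = [[4e-3 + 8e-3j, -0.2e-3 - 2e-3j],
             [9e-3 - 4e-3j, 3e-3 + 4e-3j]]

pg_max, gv_opt = search_best_gv(Y_EXAMPLE)
print(f"max PG = {10 * math.log10(pg_max):.2f} dB at G_v = {gv_opt:.2f}")
print(f"MAG    = {10 * math.log10(mag_from_y(Y_EXAMPLE)):.2f} dB")
```

For a potentially unstable device (K < 1), the restricted search has no finite maximum near the $P_{in} = 0$ boundary, which corresponds to the self-oscillation regions discussed above.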

#### 2.3.1 Accounting for Passive Element Loss

When the passive matching components contain loss, the maximum oscillation frequency will be lower than $f_{max}$ and the optimal $G_v$ value might change. Furthermore, additional guidelines are required for the design of the matching network in such a case. We can use Foster's second theorem to arrive at the following expressions, [66]

$$Y_{load} = \frac{2P_{out} + 4j\omega(E_{E,Y_{in}} + E_{E,M} - E_{H,Y_{in}} - E_{H,M})}{|G_v v_1|^2} \tag{2.4}$$

where  $E_{E,M}$  and  $E_{H,M}$  are the stored electric and magnetic energies in the matching network respectively.  $E_{E,Y_{in}}$  and  $E_{H,Y_{in}}$  are the stored electric and magnetic energies in the looking-in impedance of the subsequent stage, which from Fig. 2.7 is  $Y_{in}$ .  $P_{out}$  is the total loss in  $Y_{load}$ , and is a sum of the losses in the matching network  $(P_{loss,M})$  and the subsequent stage  $(P'_{out})$ .

$$P_{out} = P_{loss,M} + P'_{out} \tag{2.5}$$

Assuming that the matching network may be constructed with inductors and capacitors of quality factors  $Q_L$  and  $Q_C$  respectively, the total loss in the matching network is

$$P_{loss,M} = 2\omega \frac{E_{H,M}}{Q_L} + 2\omega \frac{E_{E,M}}{Q_C} \tag{2.6}$$

Also, the loss and the stored energies in  $Y_{in}$  of the subsequent stage are related by the following equations.

$$Y_{in} = \frac{2P'_{out} + 4j\omega(E_{E,Y_{in}} - E_{H,Y_{in}})}{|v'_{1}|^{2}}$$

$$|v'_{1}|^{2} = \frac{2P'_{out}}{Re(Y_{in})}$$

$$(E_{E,Y_{in}} - E_{H,Y_{in}}) = \frac{Im(Y_{in})}{4\omega} \times \frac{2P'_{out}}{Re(Y_{in})} \tag{2.7}$$

Using equations (2.6) and (2.7) in the expression for  $Y_{load}$  in (2.4), we derive an expression for the stored energy in the matching network.

$$E_{E,M} - E_{H,M} = \left(\frac{|G_v v_1|^2 Im\left(Y_{load}\right)}{4\omega} - \frac{2P'_{out} Im\left(Y_{in}\right)}{4\omega Re\left(Y_{in}\right)}\right) \tag{2.8}$$

Figure 2.9: Power gain (PG') circles on the $G_v$ plane at 100GHz for the device in Fig. 2.8 with Inductor Quality Factor taken to be 14 at 100GHz. Current Density, $J = 0.56\text{mA}/\mu\text{m}$.

If the RHS of (2.8) is positive, the matching network must store net electric energy; otherwise the net stored energy is magnetic. As shown in (2.6), $P_{loss,M} = 2\omega \frac{E_{H,M}}{Q_L} + 2\omega \frac{E_{E,M}}{Q_C}$. If the net stored energy is magnetic, to minimize $P_{loss,M}$, only inductors should be used. The use of capacitors will require an increase in the stored magnetic energy to compensate for the non-zero $E_{E,M}$. Similarly, if the net stored energy is electric, only capacitors should be used. For MOSFET devices, the matching network must typically store net magnetic energy. Furthermore, all-inductor matching networks are preferable because they enable convenient gate and drain biasing, and because the quality factor of inductors at mmWave and sub-mmWave frequencies exceeds that of integrated capacitors. Assuming an all-inductor network (setting $E_{E,M} = 0$ in (2.8)), using the value of $P_{out}$ from (2.3), and setting $v_1 = 1 \angle 0^{\circ}$ without loss of generality, the net power gain including matching network loss can be written as

$$P'_{out} = P_{out} - P_{loss,M} = P_{out} - 2\omega \frac{E_{H,M}}{Q_L} \tag{2.9}$$

$$PG' = \frac{P'_{out}}{P_{in}} = |G_v|^2 \left[ \frac{Q_L Re(Y_{load}) + Im(Y_{load})}{Q_L Re(Y_{in}) + Im(Y_{in})} \right] \tag{2.10}$$

Oscillator startup gain is now maximized by determining the $G_v$ value that maximizes PG' in (2.10) to MAG' while restricting oneself to $G_v$ values that result in net stored magnetic energy. MAG' is the value of the device MAG when it is conjugately matched using lossy inductors. This is the first time that closed form design guidelines that maximize startup gain in the presence of passive loss have been derived. MAG' can be thought of as a new technology metric that quantifies achievable device power gain in a ring-oscillator configuration taking into account active and passive device limitations. It is interesting that for such matching networks employing only inductors, $P_{loss}$ and consequently PG' and MAG' are independent of network topology (L-match, pi-match etc.) and of the number of inductors, and depend only on $Q_L$. PG' contour plots similar to Fig. 2.8 have been plotted to maximize PG' with respect to $G_v$ in Fig. 2.9.

If the optimal $G_v$ determined from PG is used to design the matching network with lossy elements instead of the $G_v$ from the loss-inclusive methodology, the oscillator could still work at 100 GHz, but with a startup power gain of 1.8 dB (0.82 dB less than the loss-inclusive MAG'). In a regime where start-up is marginal, the 0.82 dB gain advantage obtained from the loss-inclusive optimization is valuable. Obtaining this MAG' in a prototype is limited only by the accuracy of the device and passive models, which can be corrected by improved modeling and respinning the die.

Fig. 2.10(a) depicts the MAG' of the  $10 \times 1\mu\text{m}/56\text{nm}$  body-contacted NMOS device (including layout parasitics) versus frequency for different  $Q_L$  values. The annotated  $Q_L$  values are for a frequency of 100GHz, and  $Q_L$  is assumed to scale linearly with frequency. Fig. 2.10(b) depicts the maximum oscillation frequency (namely the frequencies at which MAG' = 1) of the MGRO topology as a function of  $Q_L$ , and compares it with the simulated maximum oscillation frequency of the conventional XCO topology. A significant enhancement is observed.
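The loss-inclusive gain can be evaluated with the same ingredients as the lossless case. The Python sketch below is illustrative only: the Y-parameters and the $G_v$ value are hypothetical, the helper names are invented, and PG' is written in the normalisation that reduces to the lossless PG of (2.3) as $Q_L \rightarrow \infty$, which is an assumption of this sketch. It also checks the sign of the right-hand side of (2.8) to confirm that the chosen $G_v$ calls for net magnetic stored energy, and applies the linear scaling of $Q_L$ with frequency assumed in Fig. 2.10.

```python
def stage_admittances(Y, g_v):
    """Return (Y_in, Y_load) from eqs. (2.1) and (2.2)."""
    y_in = Y[0][0] + g_v * Y[0][1]
    y_load = -(Y[1][0] / g_v + Y[1][1])
    return y_in, y_load

def lossless_gain(Y, g_v):
    """PG of eq. (2.3)."""
    y_in, y_load = stage_admittances(Y, g_v)
    return abs(g_v) ** 2 * y_load.real / y_in.real

def lossy_gain(Y, g_v, q_l):
    """PG' for an all-inductor matching network of quality factor q_l,
    normalised so that it tends to the lossless PG as q_l grows."""
    y_in, y_load = stage_admittances(Y, g_v)
    return (abs(g_v) ** 2 * (q_l * y_load.real + y_load.imag)
            / (q_l * y_in.real + y_in.imag))

def stores_net_magnetic_energy(Y, g_v, q_l):
    """Sign test on the RHS of (2.8) with v1 = 1, scaled by 4*omega.
    Uses 2*P'_out/Re(Y_in) = PG' because P_in = Re(Y_in)/2 for v1 = 1."""
    y_in, y_load = stage_admittances(Y, g_v)
    rhs = abs(g_v) ** 2 * y_load.imag - lossy_gain(Y, g_v, q_l) * y_in.imag
    return rhs < 0.0

def q_l_at(f_hz, q_ref=14.0, f_ref_hz=100e9):
    """Inductor Q assumed to scale linearly with frequency."""
    return q_ref * f_hz / f_ref_hz

# Hypothetical Y-parameters (siemens) and voltage gain, for illustration only.
Y_EXAMPLE = [[4e-3 + 8e-3j, -0.2e-3 - 2e-3j],
             [9e-3 - 4e-3j, 3e-3 + 4e-3j]]
GV_EXAMPLE = complex(-1.45, 0.14)

print("PG  =", lossless_gain(Y_EXAMPLE, GV_EXAMPLE))
print("PG' =", lossy_gain(Y_EXAMPLE, GV_EXAMPLE, q_l_at(100e9)))
print("net magnetic:", stores_net_magnetic_energy(Y_EXAMPLE, GV_EXAMPLE, q_l_at(100e9)))
```

Maximizing `lossy_gain` over the $G_v$ plane, restricted to points where `stores_net_magnetic_energy` is true, gives the MAG' of this hypothetical device for a given $Q_L$.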

#### 2.3.2 Determining the Matching Network

While the discussion thus far has focused on maximizing power gain for startup, the total phase shift in the ring must also equal an integral multiple of $2\pi$ at the desired frequency. While two reactances are sufficient to achieve the impedance transformation for optimal $G_v$, a third reactance is required to arbitrarily control the phase shift $\phi$ across a stage. This allows flexibility in the number of stages N ($N\phi = 2n\pi$, $n \in \mathbb{Z}$), allowing freedom in choosing the harmonic to be extracted. Assuming a T matching network as shown in Fig. 2.7, the three matching reactances $X_1$, $X_2$, and $X_3$ may be determined using the following expressions<sup>2</sup>:

<sup>2</sup>Closed form solutions are also available.



Figure 2.10: (a) MAG' of the  $10 \times 1\mu\text{m}/56\text{nm}$  body-contacted NMOS device versus frequency for different  $Q_L$  values. The annotated  $Q_L$  values are for a frequency of 100GHz, and  $Q_L$  is assumed to scale linearly with frequency. (b) Maximum oscillation frequencies of the device in the MGRO and XCO topologies as a function of  $Q_L$ .

$$\frac{\left[\left(Re(Z_{in}) + jIm(Z_{in})\right) + jX_3\right]jX_2}{\left(Re(Z_{in}) + jIm(Z_{in})\right) + jX_3 + jX_2} + jX_1 = Z_{load} \tag{2.11}$$

$$\angle \left[ (G_v Y_{load}) \frac{jX_2}{jX_2 + (jX_3 + Z_{in})} \times Z_{in} \right] = \phi \tag{2.12}$$

The requirement for  $X_1$ ,  $X_2$ , and  $X_3$  to be positive (all-inductor matching network) does place some restrictions on  $\phi$ . This, however, need not restrict our choice of the harmonic to be extracted, as discussed in Section 2.4.1.
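One way to solve (2.11) and (2.12) numerically is to treat $X_3$ as a free parameter, solve (2.11) for $X_1$ and $X_2$, and then evaluate the phase in (2.12). The Python sketch below follows that approach; it is an illustrative assumption rather than the closed form solution mentioned in the footnote, and the impedance and gain values are hypothetical. Equating the real parts of (2.11) gives a quadratic in $X_2$, and $X_1$ then absorbs the remaining reactance.

```python
import cmath
import math

def seen_impedance(z_in, x1, x2, x3):
    """Left-hand side of (2.11): jX1 in series with jX2 || (Z_in + jX3)."""
    zs = z_in + 1j * x3
    return (zs * 1j * x2) / (zs + 1j * x2) + 1j * x1

def stage_phase_deg(g_v, y_load, z_in, x2, x3):
    """Left-hand side of (2.12), returned in degrees."""
    ratio = g_v * y_load * (1j * x2) / (1j * x2 + 1j * x3 + z_in) * z_in
    return math.degrees(cmath.phase(ratio))

def solve_x1_x2(z_in, z_load, x3):
    """Solve (2.11) for (X1, X2) at a given X3. Returns up to two solutions."""
    r, xs, rl = z_in.real, z_in.imag + x3, z_load.real
    # Re-part condition: (rl - r) X2^2 + 2 rl xs X2 + rl (r^2 + xs^2) = 0
    disc = rl * r * (xs * xs - r * (rl - r))
    if disc < 0.0 or rl == r:
        return []  # no real solution (or degenerate case not handled here)
    solutions = []
    for sign in (1.0, -1.0):
        x2 = (-rl * xs + sign * math.sqrt(disc)) / (rl - r)
        x1 = z_load.imag - seen_impedance(z_in, 0.0, x2, x3).imag
        solutions.append((x1, x2))
    return solutions

# Hypothetical values in ohms, for illustration only.
Z_IN = complex(32.9, -78.2)
Z_LOAD = complex(69.2, 124.4)
G_V = complex(-1.45, 0.14)
X3 = 10.0

for x1, x2 in solve_x1_x2(Z_IN, Z_LOAD, X3):
    phi = stage_phase_deg(G_V, 1.0 / Z_LOAD, Z_IN, x2, X3)
    print(f"X1 = {x1:.1f}, X2 = {x2:.1f}, X3 = {X3:.1f} ohm, phase = {phi:.1f} deg")
```

Sweeping `X3` and keeping the solutions with all three reactances positive shows which values of $\phi$ an all-inductor network can reach for a given device.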

In practice, at high-mmWave and terahertz frequencies, these inductors would be implemented using transmission lines, which in general do not behave as pure two-port reactances. This deviation from the purely-inductive assumption can be minimized by choosing transmission lines with high characteristic impedance (e.g. microstrips with small widths). The trade-off is that narrow microstrips exhibit poor  $Q_L$ . Alternately, the equations above can be modified appropriately to capture transmission-line behavior of the three matching components.

The authors in [47] attempt to maximize startup by maximizing the added power ( $P_{added} = P_{out} - P_{in}$ ) per stage. In comparison, our work provides simple expressions, as opposed to an iterative procedure, to arrive at the interstage passive network required to achieve the maximum small signal gain. Our methodology also allows any desired phase shift per stage, allowing flexibility in the number of stages in the ring and the harmonic to be extracted. Our analysis also includes the effect of passive loss on the maximum achievable fundamental oscillation frequency.

#### 2.3.3 Circuit-Level Implementation

Using the theory discussed, we have implemented a 108GHz and a 158GHz oscillator in the 45nm SOI CMOS process. By the contour plotting technique, a  $10 \times 1\mu\text{m}/56\text{nm}$  body-contacted device, including layout parasitics, has a maximum startup gain of 2.62dB at 100GHz in the presence of passive loss, and 0.62dB at 150GHz (with  $Q_L = 14$  at 100GHz). It is to the credit of the technique described that the latter oscillator works well in measurement despite the marginal small-signal gain.

With  $G_v$  for maximum power gain thus determined,  $Y_{in}$  and  $Y_{load}$  can be calculated. We now solve equations (2.11) and (2.12) with different values of  $\phi = \frac{360^{\circ}}{m}$  ( $m \in \mathbb{Z}$ ) until we get positive values of  $X_1$ ,  $X_2$  and  $X_3$ .  $\phi = 90^{\circ}$  gives a purely inductive matching network but  $\phi = 180^{\circ}$  does



Figure 2.11: Circuit diagram of the 216GHz signal source.



Figure 2.12: Circuit diagram of the 316GHz signal source.



Figure 2.13: Chip microphotographs of the (a) 216GHz and (b) 316GHz signal sources.

not. Consequently, we implemented a four-stage oscillator. The choices of phase and number of stages made here are formalized in Section 2.4. To extract the second harmonic, we combine every second stage at the top of  $X_2$ , as shown in Figs. 2.11 and 2.12, to create virtual grounds for the fundamental signal. This yields two outputs 180° out of phase at the second harmonic. Ignoring the box labeled Large Signal Impedance Transformation for now, they are phase shifted as shown and combined using a Wilkinson combiner. The bias for the 108GHz oscillator is provided through the bias-tee of the probe. The bias for the 158GHz oscillator is provided by the mirrored current source isolated in AC through the  $\frac{\lambda}{4}$  (at 300GHz) transmission line.

The chip microphotographs are shown in Fig. 2.13. The 216GHz signal source occupies 0.83mm  $\times$  0.63mm of chip area, and the 316GHz source occupies 0.75mm  $\times$  0.45mm.

### 2.4 Harmonic Power Extraction and Spurious Mode Suppression

Signals beyond  $f_{max}$  must be derived by harmonic extraction. In this section we look at the relationship between power extraction at a desired harmonic and the design of the MGRO and the harmonic extraction network. As discussed in the foregoing section, a key consideration in determining the number of stages remains the feasibility of obtaining a per-stage phase shift which allows for an inductive matching network. The presence of too many stages encourages spurious oscillations, as any frequency at which the total phase shift through the loop is a multiple of  $2\pi$  is a potential mode [67]. These concerns need to be taken into account when choosing the number of stages and the harmonic to be extracted, and while designing the extraction network. The harmonic power delivered to the load can be optimized by using large-signal impedance-transforming and power-combining networks. Loss in the extraction networks must also be accounted for. Depending on the number of stages used, filtering may also be required to remove other harmonics that flow into the extraction network.
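The growth in the number of candidate modes with ring size can be illustrated by listing the per-stage phase shifts for which the total loop phase is a multiple of $2\pi$. This is a minimal sketch; the helper name is hypothetical, and whether a given candidate actually starts up depends on the loop gain at the corresponding frequency, which is not modeled here.

```python
def candidate_phase_shifts_deg(n_stages):
    """Per-stage phase shifts (degrees) for which n_stages * phi is a multiple of 360."""
    return [360.0 * m / n_stages for m in range(n_stages)]


two_stage_modes = candidate_phase_shifts_deg(2)
four_stage_modes = candidate_phase_shifts_deg(4)
```

A four-stage ring admits twice as many candidate phase conditions as a two-stage ring, which is why additional stages call for explicit spurious-mode suppression.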

#### 2.4.1 Extracting the Kth harmonic

To extract the Kth harmonic from an MGRO, K-stages with a per-stage phase shift of  $\phi = \frac{2\pi}{K}$  can be used. All the nodes above the  $X_2$  impedance are joined together as shown in Fig. 2.14. Any other lower harmonics would get suppressed in the extraction path. If  $\phi = \frac{2\pi}{K}$  does not yield



Figure 2.14: Ring Oscillator with K-stages to extract the Kth harmonic.



Figure 2.15: Ring Oscillator with  $K \times p$ -stages to extract the Kth harmonic.  $\phi = \frac{2\pi}{K \times p}$  to ensure an inductive matching network.

a purely inductive interstage matching network, we could implement a per-stage phase shift of  $\phi = \frac{2m\pi}{K}$  ( $m \in \mathbb{Z}$ ) more amenable to this requirement in the K-stage ring. However, lower harmonics would not be inherently suppressed in the extraction path, and filtering would be required. We could also extract the Kth harmonic of a  $\frac{K}{m}$ -stage oscillator with a per-stage phase shift of  $\frac{2m\pi}{K}$ . All the stages would be tied together above  $X_2$  as before, and the output current would consist of the  $\frac{K}{m}$ th harmonic and its multiples. Once again, we would need a filter to reject all but the desired multiple. The advantage of this approach is that the ring is smaller and susceptible to fewer spurious modes.

If we need to reduce  $\phi$  below  $\frac{2\pi}{K}$  to satisfy the inductive network requirements but would like to avoid filtering of unwanted harmonics, we can extract the Kth harmonic from a  $K \times p$ -stage oscillator as shown in Fig. 2.15, with a per-stage phase shift of  $\phi = \frac{2\pi}{K \times p}$ . Every pth stage is joined, yielding p outputs carrying current at the Kth harmonic and separated in phase by  $\frac{2\pi}{p}$ . Appropriate phase-shifting networks are placed in each of the p paths before power combining. The increase in the number of stages in the ring does render it more susceptible to spurious modes, but these can be suppressed through the judicious introduction of loss in the extraction path without affecting the power of the desired harmonic, as is discussed later in this section. The phase-shifting and power-combining networks do introduce additional loss that must be taken into consideration. This approach has been exploited in the implemented prototypes described earlier. The second harmonic (K = 2) has been extracted from a four-stage MGRO (p = 2), yielding a per-stage phase shift of 90°, which satisfies the inductive matching network requirement.
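The cancellation and combining described above can be checked with unit phasors. The sketch below assumes identical stages, each contributing a unit-amplitude current at every harmonic with phase equal to the harmonic number times the stage's fundamental phase; the function name is hypothetical.

```python
import cmath
import math


def combined_phasors(K, p, harmonic):
    """Sum of unit phasors at `harmonic` on each of the p outputs of a (K*p)-stage ring."""
    n_stages = K * p
    phi = 2.0 * math.pi / n_stages  # per-stage phase shift at the fundamental
    outputs = []
    for group in range(p):
        # Every p-th stage is joined: stages group, group + p, group + 2p, ...
        stages = range(group, n_stages, p)
        outputs.append(sum(cmath.exp(1j * harmonic * n * phi) for n in stages))
    return outputs


# Implemented prototype: K = 2, p = 2 (four stages, 90 degrees per stage).
fundamental = combined_phasors(2, 2, harmonic=1)
second = combined_phasors(2, 2, harmonic=2)
```

For K = 2 and p = 2, the fundamental sums to zero on both outputs (a virtual ground), while the second harmonic adds in phase on each output and the two outputs are 180° apart, consistent with the description of Figs. 2.11 and 2.12.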

#### 2.4.2 Increasing the Output Power

To increase the output power in the aforementioned approaches, we could increase the number of elements in the ring while maintaining the same phase shift. Such an implementation is more susceptible to spurious modes because of the larger number of elements in the ring. Instead, we can also synchronize multiple MGROs and combine them on chip. To avoid the losses of an on-chip power-combining network, we could also combine multiple synchronized MGROs through free-space power combining, as in [45]. This would, of course, require the interfacing of the synchronized MGROs with on-chip/off-chip antennas.



Figure 2.16: Load-pull plot of the  $K=2,\,p=2$  oscillator at 100 GHz.

#### 2.4.3 Maximizing the harmonic output power

We now show that each stage of an MGRO needs an optimal output impedance  $Z_{opt}$  at the Kth harmonic, as shown in the inset of Figs. 2.14 and 2.15, to facilitate maximum power transfer. This value is determined through load-pull simulations. For the implemented MGRO, we place an impedance transformation network composed of transmission lines that converts  $50\Omega$  to the optimal impedance for each of the two outputs. This network is inside the box marked Large Signal Impedance Transformation in Figs. 2.11 and 2.12. A load-pull plot of the power as a function of transformed optimal impedance is shown in Fig. 2.16. The peak value differs by 11dB from the simulated value in Fig. 2.20 (Section 2.5). This difference is due to the loss in the matching network, the phase-shifting line, the combiner and the transparent pads.

Changing the number of elements in the ring while maintaining the phase shift (and hence the design of each stage) causes the overall optimal impedance to scale inversely due to the parallel combination of the harmonic currents produced by each stage.  $Z_{opt}$  also scales inversely with device size, as is the case in all power-generating circuits. To minimize loss in the large-signal impedance transformation network, device size and MGRO parameters should be chosen so that the overall optimal impedance is as close to  $50\Omega$  as possible.

#### 2.4.4 Suppression of spurious modes

Reducing the number of stages in the oscillator might not curtail all spurious modes. The harmonic extraction network is normally not seen by the fundamental oscillation as it exists in the common-mode path. However, the phase shifts in some spurious modes may cause the harmonic extraction network to appear appended to the inter-stage passive matching network. These modes can be eliminated by the judicious placement of suppression resistors in the extraction network such that they do not appear in the signal path. To this end, we have placed the R-C networks in the box labeled *Common-Mode Suppression* in Figs. 2.11 and 2.12.



Figure 2.17: 216GHz oscillator frequency and power measurement setup with WR-3 second-harmonic mixer-downconverter (SHMD).



Figure 2.18: 216GHz oscillator frequency and power measurement setup with WR-3 SHMD including two additional WR3 bends and one additional WR3 straight. This setup is used to measure the loss of the latter three components.



Figure 2.19: Power meter measurement setup for the 216GHz oscillator. The inset shows the measured loss of the additional two WR3 bends and the 1" WR3 straight. Details are provided in the text.

### 2.5 Oscillator Measurement

The signal sources are tested in a chip-on-board configuration through on-chip probing. We first discuss the measurement of the 216GHz oscillator. The setup is shown in Fig. 2.17. A GGB WR-5 probe was used in conjunction with a 200 – 320GHz WR-3 second-harmonic mixer-downconverter (SHMD) from Virginia Diodes Inc. through a WR5-WR3 taper. The supply is provided through the bias-tee of the probe. The WR-3 SHMD comprises a second-harmonic mixer and an LO amplifier chain that multiplies a 25-40 GHz input by a factor of four. The probe loss is determined to be about 2.5dB from the measured data provided by GGB. The measured conversion loss of the downconverter falls from 10.5dB to 8.8dB over 212 - 220GHz. The signal at the IF port is small, so two amplifiers are placed in the IF path. The total loss of the amplifiers, cables and connectors in the IF path is measured in a separate setup. The measured oscillation frequency and calibrated output power of the source are depicted in Fig. 2.20. The oscillator generates -14.4dBm of power at 216.2GHz while drawing 57.5mW of DC power. The downconverted spectrum at this power level is depicted in Fig. 2.21. A phase noise measurement of the 216GHz oscillator is performed by measuring the phase noise of the downconverted IF spectrum, yielding -92dBc/Hz at an offset of 10MHz. The phase noise of tuned ring oscillators has been analyzed in [68]. A phase noise analysis of MGROs as their operating frequency approaches  $f_{max}$  is an interesting topic for future research.

We verify these results through a power meter measurement in the configuration shown in Fig. 2.19. The power meter setup requires some additional waveguide components, the losses of which are now discussed. The losses of the 1" WR10-WR10 straight and the 1" WR3-WR10 taper are measured by VDI at 0.2dB and 0.35dB respectively. Only the losses of the two WR3 bends and the 1" WR3 straight remain unknown. Custom Microwave predicts their total loss at 0.4dB per inch. Assuming negligible reflections, we determine their loss by adding them to the WR3 mixer-downconverter measurement setup as shown in Fig. 2.18. The difference between the measured value in this setup with and without the three components is shown as an inset in Fig. 2.19. The average value is 2.5dB. These losses are corrected for in the power meter measurement to obtain the other two curves in Fig. 2.20. A close match is seen between the power measured by the SHMD and the power meter, verifying the accuracy of the mixer downconversion and its linearity with respect to the RF port.
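The loss correction amounts to adding the component losses (in dB) back to the raw reading. The sketch below uses the three waveguide losses quoted above; the raw reading is a hypothetical placeholder, and any other element in the path (for example the probe) would be appended to the list in the same way.

```python
def dbm_to_mw(p_dbm):
    """Convert a power level from dBm to mW."""
    return 10.0 ** (p_dbm / 10.0)


def deembed_dbm(reading_dbm, losses_db):
    """Refer a power reading back to the source by adding the path losses (dB)."""
    return reading_dbm + sum(losses_db)


# Losses quoted in the text for the power-meter path.
WAVEGUIDE_LOSSES_DB = {
    "1-inch WR10 straight": 0.2,
    "WR3-WR10 taper": 0.35,
    "two WR3 bends + 1-inch WR3 straight": 2.5,
}

example_reading_dbm = -20.0  # hypothetical raw power-meter reading, not a measured value
corrected_dbm = deembed_dbm(example_reading_dbm, WAVEGUIDE_LOSSES_DB.values())
```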



Figure 2.20: 216GHz oscillator frequency and power measured by a WR-3 SHMD and an Erickson PM4 power meter. Measurement details are in the text.

Next we measure the 316GHz oscillator using the same SHMD in the configuration of Fig. 2.18 but eliminate the WR3-WR5 taper as a WR3 probe is used. The measured frequency and power are shown in Fig. 2.22. A very good match to the simulated power and frequency is seen. A maximum power of -21dBm is measured at 316.5GHz while drawing a DC power of 46.4mW.

A comparison of recent works is shown in Table 2.1.  $f_{max}$  is strongly dependent on the device layout and may vary from one implementation to another even when designed using the same technology. However, in this table we have indicated some  $f_{max}$  values from our own measurements and those reported in previous works to facilitate a judicious comparison. While the techniques described in this chapter enable the functioning of the implemented oscillators at frequencies close to the limits dictated by active and passive device characteristics, several techniques discussed earlier can be exploited to further increase output power. The use of the 40nm floating-body devices would improve performance. As was discussed earlier, the losses in the large-signal impedance transformation network, phase-shifting line and Wilkinson power combiner are substantial (as high as 11dB in the 200GHz prototype). The phase-shifting line and Wilkinson combiner are necessitated by the choice of a four-element ring with  $90^{\circ}$  per-stage phase shift. Through appropriate device



Figure 2.21: (a) Measured downconverted spectrum of the 216GHz source for a DC power of 57.5mW, output frequency of 216.2GHz and calibrated output power of -14.4dBm. (b) Measured phase noise.



Figure 2.22: 316GHz oscillator frequency and output power measured using the WR3 SHMD.

sizing and MGRO design (number of stages and phase shift through each stage), these losses can be eliminated.

### 2.6 Conclusion

A Maximum-Gain Ring Oscillator topology that raises the small-signal gain per stage to the maximum available gain (MAG) through inter-stage matching, while also taking passive loss into account, has been presented. The topology also affords freedom in choosing the number of stages while satisfying the MAG condition. The robustness of such an approach is verified through the implementation of 108GHz and 158GHz oscillators using the 56-nm body-contacted devices ( $f_{max} \approx 200$ GHz) of IBM's 45nm SOI CMOS technology with per-stage small-signal gains as low as 2.62dB and 0.62dB respectively.

Power at frequencies beyond  $f_{max}$  can only be generated by harmonic extraction. The impact of the choice of the output harmonic on the design of the fundamental ring and the extraction network has been looked at in detail. In particular, the second harmonic of the implemented oscillators has been extracted. A network that transforms the output load to an optimal impedance to maximize harmonic power transfer is determined through load-pull simulations, and an on-chip

Table 2.1: Comparison of Reported CMOS-Based Sources Operating Above  $200\,\mathrm{GHz}$ 

| Ref. | Tech. | $f_{max}$ (GHz) | $f_{osc}$ (GHz) | $f_{out}$ (GHz) | $P_{out}$ (dBm) | Phase noise (dBc/Hz) | $P_{DC}$ (mW) |
|------|-------|-----------------|-----------------|-----------------|-----------------|----------------------|---------------|
| [44] | 65nm | >300 | 300.5 | 300.5 | N/A | N/A | 3.7 |
| [42] | 90nm | N/A | 81 | 324 | -46 | -91 @ 10MHz | 12 |
| [41] | 45nm | N/A | 205 | 410 | -49 (rad.) | N/A | N/A |
| [43] | 45nm | N/A | 133.3 | 533 | -36.5 | N/A | 64 |
| [69] | 130nm | N/A | 96 | 192 | -20 | -100 @ 10MHz | 16.5 |
| [47] | 130nm | 135 | 128 | 256 | -17 | -88 @ 1MHz | 71 |
| [45] | 45nm-SOI (40nm FB-NMOS) | >250 (our meas.) | 150 | 300 | -10.9 ($2 \times 2$ array, EIRP = -1) | N/A | 74.8 |
| [46] | 45nm-SOI (40nm FB-NMOS) | >250 (our meas.) | 90.2-98.5 | 276-285 (3.2%) | -7.2 ($4 \times 4$ array, EIRP = 9.4) | N/A | 817 |
| [47] | 65nm | >300 [44] | 160.7 | 482 | -7.9 | -76 @ 1MHz | 61 |
| [48] | 65nm | >300 [44] | 70.8-74 | 283-296 (4.4%) | -1.2 @ 290GHz | -78 @ 1MHz | 325 |
| [49] | 90nm | 175 | 72.5-77.7 | 217.5-233.3 (7%) | -6.5 @ 228GHz | -90.5 @ 1MHz | 86.4 |
| This work | 45nm-SOI (56nm BC-NMOS) | 200 (our meas.) | 108.1 | 216.2 | -14.4 | -93 @ 10MHz | 57.5 |
| This work | 45nm-SOI (56nm BC-NMOS) | 200 (our meas.) | 158.3 | 316.5 | -21 | N/A | 46.4 |

power combining network that sums the power from multiple stages has been implemented. The oscillators generate -14.4 dBm of output power at 216.2 GHz and -21 dBm of output power at 316.5GHz while drawing 57.5mW and 46.4mW of DC power respectively.

Techniques to interface CMOS terahertz sources with on-chip or off-chip radiators, and the relative merits of these two approaches remain a point of interest.

# Chapter 3

# THz Power Generation: Frequency Multipliers

Modern CMOS nodes have an  $f_{max}$  from 130 – 300 GHz (Fig. 3.1). So, current CMOS high mm-Wave sources use device nonlinearity, in oscillators ([70], [47]) or frequency multipliers [50,71–74], to generate harmonics in this range.

We also present a theoretical analysis of a balanced doubler to identify fundamental performance limits across frequency and technology. In frequency multipliers output power is determined by device harmonic current and optimal load. Existing analyses in [50], [73] and [72] discuss increasing harmonic content through duty-cycle optimizations. To our knowledge, this is the first attempt at identifying the optimal load impedance of mm-Wave frequency multipliers. We also obtain a closed form expression for the output power purely in terms of technology metrics.

## 3.1 Scaling Trends in CMOS Multipliers

The conventional balanced frequency doubler in Fig. 3.2 has two transistors biased for nonlinear (low duty-cycle) operation and driven by anti-phase signals at the fundamental frequency. The second harmonic is extracted and the fundamental and odd harmonics are suppressed by connecting the drains before driving the load. A second harmonic trap (quarter-wavelength open stub) at the inputs forces the gate voltages at the second harmonic to zero, as the second harmonic current



Figure 3.1: (a) Scaling of supply voltage and cutoff frequency ( $f_T$ ) across CMOS nodes. (b) Comparison of this work with state-of-the-art CMOS sources across output frequency normalized to technology  $f_T$ .

generated by a second harmonic voltage at the gate (fed back through  $C_{gd}$ ) is detrimental to output power [50]. For the theoretical study, the devices are sized to drive 50  $\Omega$  optimally without impedance transformation to minimize output side loss.

In the device model in Fig. 3.2, aside from the nonlinear drain-source current  $I_{ds}$ , all capacitances and resistances are linear. Then, if  $\omega_{in} (C_{gs} + C_{gd}) r_g \ll 1$ , the input power is,

$$P_{in,\omega_{in}} \approx 2 \times \frac{v_{amp}^2}{2} \omega_{in}^2 r_g \left( C_{gs} + C_{gd} \right)^2, \tag{3.1}$$

where  $\omega_{in}$ ,  $r_g$ ,  $C_{gs}$ ,  $C_{gd}$  and  $v_{amp}$  are input frequency, gate resistance, gate-source and gate-drain capacitances and fundamental amplitude respectively.

The output power is determined by the second harmonic current from the devices and the optimal load impedance. Assuming a piecewise-linear model for device current in Fig. 3.2, based on the gate bias  $V_{GS,DC}$ , the device transconductance generates a clipped sine-wave current. Authors in [50] and [73] show that the optimal duty-cycle to maximize second harmonic current is 35% if the peak positive gate voltage swing is set by the gate-source voltage limit for long-term reliability ( $2 \times V_{dd} = 3\text{ V}$  between any two device terminals). Given a threshold voltage of 0.45 V, this dictates negative gate bias voltages. The doubler in [50] uses 0 V gate bias. Additionally, gate-drain voltage swing limits must be considered. Simulated conversion gain at peak output power across bias, when both gate-drain and gate-source swing limits are considered, is relatively constant. For simplicity,



Figure 3.2: Circuit diagram of a simple balanced CMOS frequency doubler.

a 0 V bias is used here.
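The dependence of the second harmonic content on duty cycle can be illustrated with an idealized clipped-cosine current of fixed peak. The sketch below numerically integrates the second Fourier coefficient, normalized to the peak current, and scans the duty cycle. It assumes an ideal piecewise-linear device with no swing limits on the drain side, so it is not the analysis of [50] or [73]; the function name and the scan grid are illustrative choices.

```python
import math


def second_harmonic_ratio(duty, n=4000):
    """Second-harmonic amplitude over peak for a clipped cosine conducting for `duty` of the period."""
    alpha = math.pi * duty  # half conduction angle
    d_theta = 2.0 * math.pi / n
    total = 0.0
    for k in range(n):
        theta = -math.pi + (k + 0.5) * d_theta  # midpoint rule over one period
        if abs(theta) < alpha:
            # Current normalized to unit peak while the device conducts.
            i = (math.cos(theta) - math.cos(alpha)) / (1.0 - math.cos(alpha))
            total += i * math.cos(2.0 * theta) * d_theta
    return total / math.pi


duties = [d / 100.0 for d in range(5, 61)]
best_duty = max(duties, key=second_harmonic_ratio)
```

In this idealized model the maximum falls near a one-third duty cycle, which is close to the 35% figure quoted above; the difference is plausibly due to device and reliability constraints that the sketch does not capture.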

Several mechanisms potentially limit the optimal load resistance  $R_{opt}$  (or alternately, the optimal device size that delivers maximum power to 50  $\Omega$ ). The dependence of the device current on the drain voltage through channel length modulation or triode operation yields an optimal load resistance that we term  $R_{opt,DC}$ . As it arises from device DC I-V characteristics,  $R_{opt,DC}$  is largely frequency independent. Other mechanisms include losses in drain inductance  $(R_{p,Ld})$ , substrate resistance and the gate resistance (seen from the drain through  $C_{gd}$ ).

Fig. 3.3(a) depicts load-pull simulations of a balanced doubler in 130 nm CMOS across frequency. The optimal device size to drive 50  $\Omega$  is shown with the various effects sequentially enabled.  $R_{opt,DC}$  arises from I-V characteristics and cannot be turned off but substrate and gate resistance can be disabled in design kit models.  $R_{opt,DC}$  is indeed frequency independent, while gate resistance and losses in  $L_d$  produce negligible effect. Interestingly, beyond an output frequency of 60 GHz and unlike fundamental-frequency small-signal/power amplifiers, the optimal load is dominated by substrate resistance.

The substrate model is shown in Fig. 3.3(b), where  $R_{sub,p}$  and  $C_p$  are the net parallel resistance and capacitance respectively.  $r_{ds}$  arises from channel-length modulation in saturation. The simulated output resistance of a 130 nm device in saturation is compared to the model where  $R_{sub,p}$  is given by  $R_{sub,p} = \frac{1}{\omega^2 C_{db}^2 r_{sub}} + (1 + \frac{C_{sb}}{C_{db}})^2 r_{sub}$ . When substrate resistance alone dominates, the optimal device size to drive 50  $\Omega$  would be  $W_{opt}(\omega) = \frac{1}{\omega_{out}^2 C_{db,u}^2 r_{sub,u} \times 2 \times 50\Omega} + (1 + \frac{C_{sb}}{C_{db}})^2 \frac{r_{sub,u}}{2 \times 50\Omega}$  as the



Figure 3.3: (a) Device size needed to deliver maximum power to a 50  $\Omega$  load in a 130 nm CMOS balanced doubler. (b) Frequency dependence of  $R_{sub,p}$ .



Figure 3.4: (a)  $P_{in}$  of optimal doubler driving 50  $\Omega$  across frequency. (b) Frequency dependence of  $2^{nd}$  harmonic current due to NQS effect in 130 nm.



Figure 3.5: Simulated output power for optimal doublers driving 50  $\Omega$  in 130 nm and 65 nm CMOS across (a) absolute  $f_{out}$ , (b)  $f_{out}$  normalized to  $f_T$ .

50  $\Omega$  load should be conjugate-matched to  $R_{sub,p}/2$  (due to the presence of two devices). Here  $C_{db,u}$  and  $r_{sub,u}$  are per unit length. This  $W_{opt}(\omega)$  value moves from  $1/\omega_{out}^2$  dependence to a constant value. The range of interest, 60 GHz - 200 GHz in 130 nm CMOS, lies in the transition between the two regions. For this range, preserving the value at the transition corner, optimal device size can be modeled with a  $1/\omega_{out}$  dependence.

$$W_{opt}(\omega) \approx \frac{\left(1 + \frac{C_{sb}}{C_{db}}\right)}{\omega_{out}C_{db,u} \times 50 \,\Omega}. \tag{3.2}$$

The accuracy of (3.2) is verified in Fig. 3.3(a).  $W_{opt}(\omega)$  closely follows the optimal device size to drive  $50\,\Omega$  as predicted by large signal simulations in the high mm-Wave range.
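The relationship between the two-term expression for $W_{opt}(\omega)$ and the approximation (3.2) can also be checked numerically. In the sketch below, the per-unit-width values are hypothetical placeholders (not design-kit data) chosen so that the transition corner falls near 110 GHz; the function names are likewise illustrative.

```python
import math


def w_opt_exact(f_out, c_db_u, r_sub_u, csb_over_cdb, r_load=50.0):
    """Two-term optimal width: substrate-limited R_sub,p / 2 matched to r_load."""
    w = 2.0 * math.pi * f_out
    low_freq_term = 1.0 / (w ** 2 * c_db_u ** 2 * r_sub_u * 2.0 * r_load)
    high_freq_term = (1.0 + csb_over_cdb) ** 2 * r_sub_u / (2.0 * r_load)
    return low_freq_term + high_freq_term


def w_opt_approx(f_out, c_db_u, csb_over_cdb, r_load=50.0):
    """Approximation (3.2) with 1/omega_out dependence."""
    w = 2.0 * math.pi * f_out
    return (1.0 + csb_over_cdb) / (w * c_db_u * r_load)


def corner_frequency(c_db_u, r_sub_u, csb_over_cdb):
    """Frequency at which the two terms of the exact expression are equal."""
    return 1.0 / (2.0 * math.pi * c_db_u * r_sub_u * (1.0 + csb_over_cdb))


# Hypothetical per-unit-width values: F/um, ohm*um, and a capacitance ratio.
C_DB_U, R_SUB_U, RATIO = 0.5e-15, 1447.0, 1.0
f_c = corner_frequency(C_DB_U, R_SUB_U, RATIO)
w_exact_at_corner = w_opt_exact(f_c, C_DB_U, R_SUB_U, RATIO)
w_approx_at_corner = w_opt_approx(f_c, C_DB_U, RATIO)
```

Since (3.2) equals twice the geometric mean of the two terms, it coincides with the exact expression at the corner and lies below it elsewhere, which is what "preserving the value at the transition corner" implies.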

The simulated input power versus frequency for 130 nm and 65 nm CMOS designs is in Fig. 3.4(a). In (3.1), to maximize output power within breakdown limits  $v_{amp}$  is set to  $V_{dd}$ . As  $r_g = r_{g,u}/W_{opt}$ ,  $C_{gs} = C_{gs,u} \times W_{opt}$  and  $C_{gd} = C_{gd,u} \times W_{opt}$ , where  $C_{gs,u}$ ,  $C_{gd,u}$  and  $r_{g,u}$  are per unit length, the  $1/\omega_{out}$  dependence of  $W_{opt}$  means the input power is expected to linearly increase with frequency, as seen in Fig. 3.4(a).
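The linear trend can be made explicit with a short worked step that uses only (3.1), (3.2) and $\omega_{out} = 2\omega_{in}$. Substituting the per-unit-width quantities into (3.1) gives

$$P_{in,\omega_{in}} \approx v_{amp}^2\,\omega_{in}^2\,\frac{r_{g,u}}{W_{opt}}\left[\left(C_{gs,u} + C_{gd,u}\right) W_{opt}\right]^2 = v_{amp}^2\,\omega_{in}^2\,r_{g,u}\left(C_{gs,u} + C_{gd,u}\right)^2 W_{opt},$$

and inserting $W_{opt}$ from (3.2) with $v_{amp} = V_{dd}$ yields

$$P_{in,\omega_{in}} \approx V_{dd}^2\,\frac{\omega_{in}}{2}\,r_{g,u}\left(C_{gs,u} + C_{gd,u}\right)^2 \frac{\left(1 + \frac{C_{sb}}{C_{db}}\right)}{C_{db,u} \times 50\,\Omega} \propto \omega_{in}.$$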

The second harmonic current for a clipped-sine-wave model can be written as  $F_2 \times g_{m,u} \times W_{opt} \times (V_{dd} - V_{th})$ .  $g_{m,u}$  is the per unit width transconductance when  $V_{gs} > V_{th}$  and  $F_2$  is the ratio of second harmonic component to peak. Half the current of both devices flows into the load due to conjugate match. The optimality of conjugate match results from the dominance of the substrate network at mm-Wave and enables a closed form expression for output power. At high frequencies,



Figure 3.6: Block diagram and chip photo of the 130 nm CMOS F-band doubler. The annotated values are at 67 GHz after post-layout simulations.

the Non-Quasi Static (NQS) effect, or the finite time of channel charge build-up, produces a roll-off in the output harmonic current. This is modeled as a pole at  $f_{NQS} = 150$  GHz in 130 nm CMOS (Fig. 3.4(b)). The output power then is  $P_{out} = \frac{1}{2} \frac{(F_2 \times g_{m,u} \times W_{opt} \times (V_{dd} - V_{th}))^2}{1 + f_{out}^2 / f_{NQS}^2} \times 50 \Omega$ , which becomes

$$P_{out} = \frac{F_2^2}{100 \Omega} \left(\frac{C_{in}}{C_{out}} \times \frac{C_{sb}}{C_{db}}\right)^2 (V_{dd} - V_{th})^2 \frac{\left(\frac{f_T}{f_{out}}\right)^2}{1 + \frac{f_{out}^2}{f_{NQS}^2}}. \tag{3.3}$$

 $C_{in} = C_{gs} + C_{gd}$  and  $C_{out} = C_{db}C_{sb}/(C_{db} + C_{sb})$ . Equation (3.3) indicates that the output power falls first at 20 dB and then 40 dB per decade. Fig. 3.5 depicts the simulated output power for optimal doublers driving 50  $\Omega$  in 130 nm and 65 nm CMOS. It also plots the theoretical trend from (3.3) for 130 nm CMOS. Fig. 3.1 implies that  $f_T \times (V_{dd} - V_{th})^2$  is constant ( $\approx 90 \text{ GHz-V}^2$ ) across CMOS scaling. Ignoring NQS, (3.3) indicates that at a fixed  $f_{out}$ , a 65 nm doubler surpasses a 130 nm CMOS doubler in output power by the ratio of  $f_T$ , namely 2.2 dB. This is indeed seen in Fig. 3.5(a). If  $f_{out}$  is normalized to  $f_T$ , a 130 nm doubler surpasses its 65 nm counterpart by the ratio of  $(V_{dd} - V_{th})^2$ , which is  $\approx 3 \text{ dB}$  (Fig. 3.5(b)).
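The step leading to (3.3) can be filled in as follows, assuming the usual definition $f_T = g_{m,u}/(2\pi C_{in,u})$, where $C_{in,u}$ is the per-unit-width value of $C_{in}$ (this definition is an assumption, as it is not stated explicitly above). Using (3.2),

$$g_{m,u} W_{opt} = 2\pi f_T C_{in,u} \times \frac{1 + \frac{C_{sb}}{C_{db}}}{2\pi f_{out} C_{db,u} \times 50\,\Omega} = \frac{1}{50\,\Omega}\,\frac{f_T}{f_{out}}\,\frac{C_{in}}{C_{db}}\left(1 + \frac{C_{sb}}{C_{db}}\right).$$

With $C_{out} = C_{db}C_{sb}/(C_{db} + C_{sb})$, the capacitance factor can be rewritten as

$$\frac{C_{in}}{C_{out}} \times \frac{C_{sb}}{C_{db}} = \frac{C_{in}\left(C_{db} + C_{sb}\right)}{C_{db}C_{sb}} \times \frac{C_{sb}}{C_{db}} = \frac{C_{in}}{C_{db}}\left(1 + \frac{C_{sb}}{C_{db}}\right).$$

Substituting both results into $P_{out} = \frac{1}{2}\left(F_2\, g_{m,u} W_{opt} (V_{dd} - V_{th})\right)^2 \times 50\,\Omega \,/\, \left(1 + f_{out}^2/f_{NQS}^2\right)$ leaves a prefactor of $\frac{1}{2} \times \frac{50\,\Omega}{(50\,\Omega)^2} = \frac{1}{100\,\Omega}$, which reproduces (3.3). For $f_{out} \ll f_{NQS}$ the output power scales as $f_{out}^{-2}$ (20 dB per decade), and for $f_{out} \gg f_{NQS}$ it scales as $f_{out}^{-4}$ (40 dB per decade).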

## 3.2 A 134 GHz Doubler in 130 nm CMOS

A +4 dBm doubler at 134 GHz (originally designed for 120 GHz) is demonstrated in 130 nm CMOS ( $f_{max} \approx 135 \text{ GHz}$  [47]).

A Marchand balun splits the input to drive two Class-AB V-band amplifier chains (Fig. 3.6). To mitigate the fundamental power generation challenge, two-way device stacking is used in the amplifiers to enable operation from 3 V and increase output power [75]. In 130 nm CMOS, the



Figure 3.7: (a) First V-band amplifier stage and, (b) the F-band balanced doubler.



Figure 3.8: (a) Measured and simulated saturated output power and efficiency. (b) Output power and conversion gain at 134 GHz.

Maximum Available Gain (MAG) for a device is only 5 dB at 60 GHz. A cascode with identical devices has 7.5 dB MAG. MAG improvement through interstage matching or broadband neutralization shows  $\approx 1.5 \text{dB}$  improvement before layout. We therefore use a simple cascode (Fig. 3.7(a)) laid out as in [75] with stepped gate and drain vias [50]. Device sizes of each stage are shown in Fig. 3.6. They are sized up by  $1.33 \times$  to  $2.25 \times$  to ensure saturation of the stages with compressed gains of  $1.25 \, \text{dB}$  to  $3.5 \, \text{dB}$ . Three stages are conjugate matched for gain. The last stage is designed for output power and efficiency. Each amplifier chain has a post-layout simulated small-signal gain of  $14 \, \text{dB}$  at  $67 \, \text{GHz}$ ,  $12.4 \, \text{dBm}$  saturated power and  $8 \, \%$  drain efficiency.

The amplifier chain drives a doubler designed as in the previous section (Fig. 3.7(b)). The

anti-phase devices are laid out as in [50] but with a shared drain. In the doubler layout, the pad capacitance along with the routing line to the shared drain transforms the probe  $50\,\Omega$  to  $30\,\Omega$ . This inevitable transformation in layout is steepened to  $24\,\Omega$  as this block is used in a larger system where it drives  $24\,\Omega$  and its performance can be directly verified. The reduction in load also proportionally increases output power. Based on Fig. 3.3(a), the optimal device size is around  $100\,\mu\text{m}$ , and the post-layout optimized size of  $90\,\mu\text{m}$  is very close, indicating the strength of our analysis.

The measured saturated output power and efficiency, defined as  $\eta = P_{out}/(P_{DC} + P_{in})$ , are shown across frequency in Fig. 3.8(a). Post-layout EM simulations of the entire matching networks, which capture the effects of bends and T-junctions, improve the correlation between simulations and measurements. A 7% upward frequency shift is still seen and may arise from uncertainties in the device models and metal stack. The measured output power versus input power at 67 GHz in Fig. 3.8(b) shows a peak conversion gain of  $-3.1\,\mathrm{dB}$ . Equation (3.3) predicts a power of 8.5 dBm, which falls to 5.3 dBm in post-layout simulations. A peak power of  $+4.2\,\mathrm{dBm}$  is measured at an output frequency of 134 GHz, with a total power consumption of 708 mW in the amplifiers and 81 mW in the doubler during peak operation. The simulated  $-3\,\mathrm{dB}$  saturated output power BW is 17%.
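The efficiency figure follows directly from the numbers quoted above. The sketch below evaluates $\eta = P_{out}/(P_{DC} + P_{in})$ twice: once neglecting $P_{in}$, and once with $P_{in}$ estimated from the peak conversion gain. The second case is an assumption made only for illustration, since the peak conversion gain need not occur at the peak output power.

```python
def dbm_to_mw(p_dbm):
    """Convert a power level from dBm to mW."""
    return 10.0 ** (p_dbm / 10.0)


def efficiency_percent(p_out_dbm, p_dc_mw, p_in_dbm=None):
    """eta = P_out / (P_DC + P_in), in percent; P_in is ignored when not given."""
    p_in_mw = 0.0 if p_in_dbm is None else dbm_to_mw(p_in_dbm)
    return 100.0 * dbm_to_mw(p_out_dbm) / (p_dc_mw + p_in_mw)


P_OUT_DBM = 4.2            # measured peak output power
P_DC_MW = 708.0 + 81.0     # amplifiers + doubler
PEAK_CG_DB = -3.1          # measured peak conversion gain

eta_without_drive = efficiency_percent(P_OUT_DBM, P_DC_MW)
# Assumed input power: output power divided by the peak conversion gain.
eta_with_drive = efficiency_percent(P_OUT_DBM, P_DC_MW, p_in_dbm=P_OUT_DBM - PEAK_CG_DB)
```

Because the DC power dominates the denominator, both estimates round to the same value.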

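| Ref. | Tech. (nm) | $f_{in}$ (GHz) | $f_{out}$ (GHz) | $f_{out}/f_T$ | $P_{sat}$ (dBm) | Pk. CG (dB) | $P_{DC}$ (mW) | $\frac{P_{out}}{P_{DC} + P_{in}}$ (%) |
|------|------------|----------------|-----------------|---------------|-----------------|-------------|---------------|----------------------------------------|
| This Work | 130 | 67 | 134 | 1.4 | 4.2 | -3.1 | 790 | 0.33 |
| [76] | 130 | 62.5 | 125 | 0.18<sup>†</sup> | -1.5 | -10 | N/A | 10\* |
| [73] | 65 | 10.2 | 91.8 | 0.6 | 8.5 | -5.7 | 438 | 1.54 |
| [71] | 65 | 122 | 244 | 1.62 | -6.6 | -11.4 | 40 | 0.51\* |
| [74] | 65 | 240 | 480 | 3.2 | -6.3 | -14.3 | N/A | 3.7\* |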

Table 3.1: Recent CMOS Multipliers beyond 100 GHz

## 3.3 Conclusion

Through analysis of fundamental limits and scaling trends of doublers across frequency and CMOS nodes, a 134 GHz doubler in 130 nm CMOS is implemented. It achieves  $4 \times$  higher output power than other 130 nm CMOS sources in the same frequency range (Table 3.1) and state-of-the-art

<sup>†</sup>Diode doubler, diode  $f_T = 680 \,\mathrm{GHz}$ 

<sup>\*</sup>Drivers not implemented, and will reduce efficiency.

output power for the same normalized output frequency  $(f_{out}/f_T)$  across all CMOS technologies as seen in Fig. 3.1.

# Chapter 4

# THz Power Generation: Power Mixers

While high-mm-wave and sub-mm-wave transmitters and receivers have been demonstrated in CMOS recently [77–80], the generation of appreciable power at these frequencies remains a fundamental challenge. Modern  $130-65\,\mathrm{nm}$  CMOS technology nodes only have an  $f_{max}$  in the range of  $130-300\,\mathrm{GHz}$  [2]. As the Maximum Available Gain (MAG) falls off rapidly with frequency, most high-mm-wave and sub-mm-wave CMOS signal sources look to extract harmonics from fundamental oscillators [41, 42, 47, 70, 81, 82] or from frequency multiplier circuits [50, 71, 73, 83].

The nonlinearity of the device I-V characteristics is responsible for the generation of harmonics. It is interesting to look at waveform-engineering techniques that optimize the nonlinearity employed towards the generation of power at a desired harmonic. Conventional multiplier topologies extract the desired harmonic signal by exciting a device in common source configuration with a fundamental sinusoid at the gate. Typically, an optimization in terms of the bias and the input amplitude, and hence duty cycle, is done to maximize the harmonic current extracted, [50], [73] and [72]. In the context of harmonic extraction from oscillators, in [81] the authors study the effect of the relative phase shift between the drain and gate voltage on the harmonic current generated by the device. In [70] second harmonic power is extracted from the common mode of an oscillator designed for maximum startup gain. The second harmonic power delivered to the load is optimized by transforming the  $50\,\Omega$  load to an optimal impedance at the second harmonic.

Waveform engineering at harmonics is a hitherto unexplored yet promising area of investigation to enhance the harmonic power extracted from a transistor. In [84], we proposed a power mixer circuit to mix the first and second harmonic to generate the desired third harmonic signal, as shown



Figure 4.1: Power mixer technique mixes the first and the second harmonic to generate the third harmonic current. The third harmonic output power can be optimized by controlling the amplitudes and relative phase shifts of the input fundamental and second harmonic signals.

in Fig. 4.1. By controlling the relative phase shifts between the two harmonics and their amplitudes as well, we can control the effective gate-source waveform such that the desired harmonic generated through the device nonlinearity is enhanced. This work presents a detailed understanding of the advantages of the power mixer technique and a comprehensive comparison with a conventional multiplier approach to harmonic generation. In Section 4.1, we show that our proposed technique can generate  $4\times$  more third harmonic current than a frequency tripler, and hence up to  $16\times$ higher power [84], for the same fundamental to third harmonic conversion loss. We investigate this significant advantage by considering two quantitative measures to understand the harmonic content of the device current waveform, the peak of the current waveform  $i_{peak}$  and the ratio of the third harmonic current to the peak  $F_3$ . We show that the power mixer outperforms a conventional tripler in both these aspects resulting in the improved harmonic power. The proposed waveform engineering also lends the power mixer an advantage in terms of sustainable voltage swings. A frequency tripler cannot generate the higher third harmonic power even with increased fundamental input power without violating long term reliability considerations. It should be noted that while the techniques discussed here are for generating the third harmonic, the ideas can be extended to other harmonics.

The rest of the chapter is organized as follows: Section 4.2 presents the design of the implemented prototype which mixes the first and second harmonic of a 63 GHz signal to generate an output at 189 GHz in a 130 nm CMOS process ( $f_{max}$  is  $\approx 135$  GHz [47]). Section 4.3 presents the measurements of the 189 GHz power mixer which validate the concept of nonlinearity engineering. Section 4.4 concludes the chapter with comparisons to existing literature.



Figure 4.2: (a) Conventional three phase frequency tripler. (b) Device nonlinearity clips the input fundamental sine wave, resulting in a clipped drain current waveform.

# 4.1 Concept of Nonlinearity Engineering in beyond- $f_{max}$ Power Mixers

In recent work [2], we have analyzed the limits on the performance of conventional multiphase frequency multipliers. A conventional three phase frequency tripler is shown in Fig. 4.2(a). The harmonic power depends on the output harmonic current and the output load resistance. The device nonlinearity clips an input fundamental sine wave as in Fig. 4.2(b). The desired harmonic current flows into the optimal load resistance  $R_{opt}$  to generate the desired harmonic output power. The multiphase topology suppresses swing at undesired harmonics at the drain node. Multiple phases can be generated with a ring oscillator topology [70], [47].

In [2], we observed that contemporary conventional multipliers increase harmonic content of the output current by optimizing the gate bias  $V_{GS,DC}$  and the amplitude  $A_{\omega}$ , and hence the duty cycle, of the input fundamental sinusoid. To our knowledge, [2] is the first work that identifies the fundamental limits on the optimal load of conventional multiphase frequency multipliers in bulk CMOS. Several phenomena potentially limit the optimal load, including the drain voltage dependence of the device current in triode or through channel length modulation in saturation, the finite quality factor of the output inductance, the gate resistance seen at the output through  $C_{gd}$  or loss in the substrate network. We conclude that in the high mm-Wave range of output frequency in bulk CMOS, the optimal load is limited by and should be conjugately matched to the equivalent parallel resistance presented by the substrate network. Other than reducing substrate loss, we can improve harmonic output power by increasing the desired harmonic content of the output current.

In [84], we propose a power mixer, as shown in Fig. 4.1, wherein the first and the second harmonic are mixed to generate the desired third harmonic. In such a mixer, the amplitudes of the input fundamental and second harmonic sinusoids,  $A_{\omega}$  and  $A_{2\omega}$ , and their relative phase shift  $\phi$ , provide us control on the device waveforms. In addition to the gate-source bias  $V_{GS,DC}$ , we use these variables to enhance the harmonic output power generated through the device nonlinearity. We show that engineering harmonics in the gate-source waveform is more effective in optimizing harmonic power generated through device nonlinearity than simple duty cycle optimization in frequency multipliers.

In this work, the input harmonics are fed at the gate and source to improve the isolation between the two harmonic paths. Further, it is more challenging to generate second harmonic power than fundamental power. The gate impedance is higher than the impedance looking into the source node (the latter is of the order of  $1/g_m$ ). For the same second harmonic voltage swing, it is easier to drive the gate node as this reduces the power required at the second harmonic. Consequently, the second harmonic input is fed at the gate and the fundamental at the source node in the implemented prototype.

### 4.1.1 Waveform shaping and output harmonic current

To quantify the improvement in harmonic content of the output current, we first study the current waveform and the magnitude of harmonic current generated by a conventional frequency tripler. All simulations in this section are run with extracted device models for the device layout shown in Fig. 4.13 in Section 4.2. The amplitude of the input sinusoid is constrained by the $2V_{dd}$ (3 V in 130 nm CMOS as $V_{dd} = 1.5$ V) limit on the voltage swing between any two device nodes for long term reliability. The clipped sine wave output current for $V_{GS,DC} = 0$ V is shown in Fig. 4.2(b) for a maximum input amplitude of $V_{dd}$ (limited by the gate-drain $2V_{dd}$ limit on the negative half swing). The third harmonic current generated by a $24 \times 2 \,\mu\text{m}/120\,\text{nm}$ device in 130 nm CMOS for different input amplitudes across gate bias is shown in Fig. 4.3(a). This device size is the same as that used for the implemented power mixer prototype of Section 4.2. The multiphase implementation of the tripler requires that each of the three devices is $8 \times 2 \,\mu\text{m}/120\,\text{nm}$. The dashed portion of a curve indicates when the input amplitude causes either the gate-source or gate-drain swing to exceed the $2V_{dd}$ limit on the positive or negative half swing, respectively. We see a maximum of $3.4\,\text{mA}$ for $V_{GS,DC} = 1.5\,\text{V}$ at $A_{\omega} = 1.5\,\text{V}$.

Figure 4.3: Third harmonic current generated by a device when it is configured as (a) frequency tripler (the dashed portion of a curve indicates when the input amplitude violates long-term reliability guidelines), and (b) power mixer.

The waveforms for a power mixer for two different relative phase shifts,  $\phi = 0^{\circ}$  and  $\phi = 90^{\circ}$ , are shown in Fig. 4.4. For simplicity, in this work we have chosen gate and source bias voltages of 0 V. The 1:2 frequency ratio between the input harmonics implies that we only need to consider relative phase shifts between  $0^{\circ}$  and  $180^{\circ}$ . The input amplitudes are set to  $A_{\omega} = 1.5 \,\mathrm{V}$  and  $A_{2\omega} = 1.5 \,\mathrm{V}$  for both cases. The second harmonic swing at the gate node is limited by the gate-drain breakdown and the fundamental swing at the source node is limited by the drain-source breakdown. It is confirmed that for these amplitudes the gate-source waveform also does not violate the condition for long term reliability for any  $\phi$  between  $0^{\circ}$  and  $180^{\circ}$ . For  $\phi = 90^{\circ}$ , the current waveform has almost the same peak as the tripler current waveform in Fig. 4.2(b) but appears to have a richer nonlinearity content. For  $\phi = 0^{\circ}$ , the clipped current waveform is similar in shape to the conventional frequency tripler but has more than twice the peak current in Fig. 4.2(b).

Figure 4.4: Gate-source voltage shape and the resultant output current waveforms for the power mixer with $A_{\omega} = A_{2\omega} = V_{dd} = 1.5 \,\text{V}$ and a relative phase shift $\phi$ of (a) 90°, (b) 0°.

The third harmonic current generated by the device when configured as a power mixer is shown in Fig. 4.3(b). For $A_{2\omega} = 1.5 \,\mathrm{V}$ and $A_{\omega} = 1.5 \,\mathrm{V}$ and a relative phase shift of 0°, the $24 \times 2 \,\mu\mathrm{m}/120 \,\mathrm{nm}$ device generates 13.3 mA third harmonic current. This is four times higher than the maximum third harmonic current of the frequency tripler. If the output load is still determined by the substrate loss mechanism, the power mixer can deliver $16 \times$ higher output power at the third harmonic than the conventional frequency tripler. However, it is challenging to generate such a large voltage swing at the second harmonic, and hence large second harmonic power, even at the device gate node. Fig. 4.3(b) shows the resultant third harmonic current for reduced second harmonic swing as well. Even at half-swing with $A_{2\omega} = 0.7 \,\mathrm{V}$, the maximum third harmonic current of $8.2 \,\mathrm{mA}$ is $2.5 \times$ higher than the frequency tripler, which is a $7.5 \,\mathrm{dB}$ improvement in output power.
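The step from a current ratio to a power ratio follows from $P \propto i_{3\omega}^2 R$ at a fixed load, so the power improvement in dB is $20\log_{10}$ of the current ratio. The sketch below applies this to the simulated currents quoted above; the function name is illustrative, and the fixed-load assumption is the one stated in the text.

```python
import math

def power_gain_db_from_current_ratio(i_new_ma, i_ref_ma):
    """Power improvement in dB at a fixed load, where P is proportional to i^2."""
    return 20 * math.log10(i_new_ma / i_ref_ma)

I_TRIPLER_MA = 3.4  # maximum third harmonic current of the tripler (from the text)

gains_db = {}
for a2w_volts, i_mixer_ma in ((1.5, 13.3), (0.7, 8.2)):
    ratio = i_mixer_ma / I_TRIPLER_MA
    gains_db[a2w_volts] = power_gain_db_from_current_ratio(i_mixer_ma, I_TRIPLER_MA)
    print(f"A_2w = {a2w_volts} V: {ratio:.2f}x current, "
          f"{ratio ** 2:.1f}x power, {gains_db[a2w_volts]:.2f} dB")
```

Evaluated this way, the unrounded ratios come out slightly below $4\times$ and $2.5\times$, so the $16\times$ and $7.5\,\mathrm{dB}$ figures are best read as rounded estimates.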

The improvement in the third harmonic current  $(i_{3\omega})$  can be quantified by studying the peak of the current waveform,  $i_{peak}$ , and the ratio of third harmonic current to the peak of the current waveform,  $F_3 = i_{3\omega}/i_{peak}$ .  $F_3$  is an intuitive measure of the desired nonlinearity in the waveform.  $i_{3\omega}$  can therefore be increased by either increasing  $i_{peak}$  or  $F_3$ , or both.
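The two measures can be illustrated with a short numerical sketch. It assumes an idealized square-law device, $i_D = k\,\max(v_{GS} - V_T, 0)^2$, with normalized $k$ and $V_T$, rather than the extracted device models used for the figures, so its numbers are not expected to match those reported in this chapter. The names `harmonic_metrics` and `mixer_vgs`, and the phase convention (the phase $\phi$ applied to the second harmonic at the gate, with the fundamental at the source), are assumptions made for illustration.

```python
import numpy as np

N = 4096                                  # samples per fundamental period
theta = 2 * np.pi * np.arange(N) / N      # omega * t over one period

def harmonic_metrics(v_gs, v_t=0.0, k=1.0, harmonic=3):
    """Return (i_harmonic, i_peak, F) for one period of gate-source samples.

    Assumed device model: i_d = k * max(v_gs - v_t, 0)^2 (hypothetical, normalized).
    The device must conduct at some point in the period, otherwise i_peak is zero.
    """
    i_d = k * np.clip(v_gs - v_t, 0.0, None) ** 2
    spectrum = np.fft.rfft(i_d) / len(i_d)
    i_harm = 2.0 * np.abs(spectrum[harmonic])   # single-sided amplitude
    i_peak = float(i_d.max())
    return float(i_harm), i_peak, float(i_harm / i_peak)

def mixer_vgs(a_w, a_2w, phi_deg):
    """v_gs = v_g - v_s with the second harmonic at the gate and the
    fundamental at the source, both biased at 0 V (assumed phase convention)."""
    phi = np.deg2rad(phi_deg)
    return a_2w * np.cos(2 * theta + phi) - a_w * np.cos(theta)

# Tripler biased at threshold: the current is a half-wave clipped cos^2 pulse.
_, _, f3_tripler = harmonic_metrics(1.0 * np.cos(theta))
print(f"tripler at threshold bias: F3 = {f3_tripler:.4f}")

# Power mixer with equal normalized amplitudes at a few relative phases.
mixer_results = {}
for phi_deg in (0, 90, 180):
    mixer_results[phi_deg] = harmonic_metrics(mixer_vgs(1.0, 1.0, phi_deg))
    i3, ipk, f3 = mixer_results[phi_deg]
    print(f"mixer, phi = {phi_deg:3d} deg: i3 = {i3:.3f}, i_peak = {ipk:.3f}, F3 = {f3:.3f}")
```

For the threshold-biased tripler case the third harmonic coefficient of the clipped $\cos^2$ pulse has the closed form $4/(15\pi)$, which gives a simple check on the FFT scaling.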

In a frequency tripler, clipping of the sinusoidal current generates the desired harmonic content. For the same amplitude, at low bias, the waveform is more clipped and can have higher harmonic content. This is seen in Fig. 4.5(a) where $F_3$ is consistently largest for a 0 V gate bias across different input amplitudes. It peaks at 0.17 for a 0 V bias and $A_{\omega} = 0.5$ V. As the gate bias increases far beyond the threshold voltage of the device, for a given amplitude, there is less clipping and the output waveform is increasingly sinusoidal with a lower $F_3$. For example, for $V_{GS,DC} = 1.5$ V and $A_{\omega} = 1.5$ V the nonlinearity measure $F_3$ is only 0.04. However, at this high bias and amplitude, the $i_{peak}$ at 74.7 mA is much higher (Fig. 4.6(a)). The higher $i_{peak}$ dominates the improvement in third harmonic current so that at $A_{\omega} = 1.5$ V and $V_{GS,DC} = 1.5$ V, the tripler yields 3.4 mA compared to only 0.3 mA for $V_{GS,DC} = 0$ V and $A_{\omega} = 0.5$ V.

Figure 4.5: Ratio of third harmonic current to peak current, $F_3 = \frac{i_{3\omega}}{i_{peak}}$, generated by a device when it is configured as (a) a frequency tripler, and (b) a power mixer $(A_{\omega} = 1.5 \text{ V})$.

Figure 4.6: Peak of output current waveform, $i_{peak}$, generated by a device when it is configured as (a) a frequency tripler, and (b) a power mixer $(A_{\omega} = 1.5 \text{ V})$.

For the power mixer, the maximum $i_{peak}$ is 85 mA and it appears for $A_{\omega} = 1.5 \,\mathrm{V}$, $A_{2\omega} = 1.5 \,\mathrm{V}$ and $\phi = 0^{\circ}$ (Fig. 4.6(b)). We recall from Fig. 4.4(b) that a relative phase of $\phi = 0^{\circ}$ results in a much higher peak current but a waveform shape similar to that generated by a frequency tripler at a low (0 V) bias voltage. Indeed, $F_3 = 0.16$ at this point (Fig. 4.5(b)) is comparable to the peak $F_3 = 0.17$ for a 0 V bias tripler. However, the $i_{peak}$ of 85 mA is significantly higher and comparable to the maximum $i_{peak}$ of the frequency tripler. Consequently, the net third harmonic current generated is 13.3 mA, significantly higher than the frequency tripler. For a relative phase shift of $\phi = 90^{\circ}$, we observed in Fig. 4.4(a) that the $i_{peak}$ is lower than the $\phi = 0^{\circ}$ case but the waveform appears to have significantly higher nonlinear content. This is seen in Fig. 4.5(b) and Fig. 4.6(b) where $i_{peak}$ has fallen to 42.6 mA, about half its value at $\phi = 0^{\circ}$, and $F_3$ is two times higher at 0.33. As such, an $i_{3\omega}$ of about 13 mA for $A_{2\omega} = A_{\omega} = 1.5$ V is seen at both $\phi = 0^{\circ}$ and $\phi = 90^{\circ}$. While there is a trade-off between $i_{peak}$ and $F_3$ in both circuit configurations, the power mixer outperforms the tripler substantially by exploiting device nonlinearity more effectively to generate higher third harmonic current.

Figure 4.7: Output power delivered to the optimal $30 \Omega$ load by the device as a (a) frequency tripler, and (b) power mixer with $A_{\omega} = 1.5 \text{ V}$.

### 4.1.2 Harmonic output power

We now confirm that the increased harmonic current generated by a device when configured as a power mixer indeed leads to higher third harmonic output power. The following results are for generating 180 GHz from a 60 GHz signal using the two approaches. As mentioned previously, for a frequency tripler, the optimal load is dominated by substrate resistance and is determined to be $30\,\Omega$ across gate bias and input amplitude through a large signal simulation of a $24\times2\,\mu\text{m}/120\,\text{nm}$ device. The peak power for any bias is obtained by pushing the input amplitude to that allowed by the $2V_{dd}$ reliability constraint on the gate-drain and the gate-source swings. Fig. 4.7(a) plots the peak third harmonic power delivered to the optimal load by the frequency tripler. The tripler delivers $-9 \,\mathrm{dBm}$ at $180 \,\mathrm{GHz}$ with $V_{GS,DC} = 1.2 \,\mathrm{V}$ and $A_{\omega} = 1.8 \,\mathrm{V}$.

Figure 4.8: Fundamental input power required to generate third harmonic when the device is configured as (a) a frequency tripler, and (b) a power mixer with $A_{\omega} = 1.5 \,\text{V}$. In the power mixer case, the second harmonic is assumed to be generated by a balanced doubler with a conversion loss of 5 dB.

We determine the optimal load for the power mixer to be $30\,\Omega$ as well through large signal simulations. The peak output power for $A_{\omega} = 1.5\,\mathrm{V}$, and for all $A_{2\omega}$, is observed for a $\phi$ of 0°. A peak power of 2.7 dBm is seen for the maximum allowable amplitude subject to reliability constraints, that is when $A_{2\omega} = 1.5\,\mathrm{V}$ and $A_{\omega} = 1.5\,\mathrm{V}$. This peak power is 11.7 dB or almost $16\times$ higher than the frequency tripler as argued before from the comparison of harmonic current in Fig. 4.3(b). At half the second harmonic swing with $A_{2\omega} = 0.7\,\mathrm{V}$, the output power is $-1.8\,\mathrm{dBm}$, which is still 7.2 dB higher than the peak 180 GHz power delivered by the frequency tripler, as predicted earlier.

### 4.1.3 Input power requirement

In frequency multiplier circuits, the dc power consumption is dominated by the fundamental power generation and amplification circuits, [50], [2] and [45]. This is observed in the power mixer implementation as well, where the fundamental amplifiers dominate the dc power consumption compared to the mixer and the frequency doubler circuit used to generate the second harmonic input. Simulations and measurements show that the power mixer and doubler only consume 2.3% of the total



Figure 4.9: Fundamental to third harmonic conversion loss when the device is configured as (a) a frequency tripler, and (b) a power mixer with  $A_{\omega} = 1.5 \,\text{V}$ . In the power mixer case, the second harmonic is assumed to be generated by a balanced doubler with a conversion loss of 5 dB.

dc power consumption. Therefore, a comparison of fundamental to third harmonic conversion loss is representative of the dc-RF efficiencies of a complete implementation of the two topologies. The fundamental power required by the frequency tripler for peak output power, when the input amplitude is pushed to the limits allowed by reliability considerations, is plotted in Fig. 4.8(a). For the maximum output power of  $-9 \,\mathrm{dBm}$  at  $V_{GS,DC} = 1.2 \,\mathrm{V}$  and  $A_\omega = 1.8 \,\mathrm{V}$ , the tripler needs 8.6 dBm power at 60 GHz. This corresponds to a conversion loss of 17.6 dB in Fig. 4.9(a).

In the power mixer, apart from the fundamental power fed at the source node, additional fundamental power is needed to generate second harmonic power for the gate through a frequency doubler. As the relative phase  $\phi$  of the two inputs changes, it is seen that the fundamental power requirement at the source is almost constant at about 14 dBm across  $\phi$  and  $A_{2\omega}$ . Based on the results reported in [2], we have estimated the fundamental power required to generate the 120 GHz signal assuming a simulated conversion loss of 5 dB from a balanced doubler optimized to drive the impedance looking into the mixer gate. This power is added to the fundamental power fed at the source. The total fundamental power requirement of the power mixer for different  $A_{2\omega}$  as  $A_{\omega}$  is kept fixed at 1.5 V is shown in Fig. 4.8(b). For  $A_{2\omega} = 1.5$  V and  $\phi = 0^{\circ}$ , the power mixer requires 10.8 dB more fundamental input power than the 8.6 dBm requirement of the frequency tripler at peak output power, but generates 11.7dB more third harmonic output power. For  $A_{2\omega} = 0.7$  V and  $\phi = 0^{\circ}$ , the power mixer requires 7.3 dB more input power to yield 7.2 dB improvement in output power. Conversion loss numbers of the power mixer are in Fig. 4.9(b).
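The conversion loss comparison can be reproduced from the simulated powers quoted in this section. The sketch below only restates that arithmetic in dB units; the variable and function names are illustrative.

```python
def conversion_loss_db(p_in_dbm, p_out_dbm):
    """Fundamental-to-third-harmonic conversion loss in dB."""
    return p_in_dbm - p_out_dbm

# Tripler at peak output power (values from the text).
TRIPLER_P_IN_DBM = 8.6
TRIPLER_P_OUT_DBM = -9.0
cl_tripler = conversion_loss_db(TRIPLER_P_IN_DBM, TRIPLER_P_OUT_DBM)

# Power mixer at phi = 0 deg: extra fundamental input power relative to the
# tripler (including the doubler input), and third harmonic output power.
mixer_cases = {
    1.5: {"extra_in_db": 10.8, "p_out_dbm": 2.7},
    0.7: {"extra_in_db": 7.3, "p_out_dbm": -1.8},
}

cl_mixer = {}
for a2w_volts, case in mixer_cases.items():
    p_in_dbm = TRIPLER_P_IN_DBM + case["extra_in_db"]
    cl_mixer[a2w_volts] = conversion_loss_db(p_in_dbm, case["p_out_dbm"])
    print(f"A_2w = {a2w_volts} V: P_in = {p_in_dbm:.1f} dBm, "
          f"CL = {cl_mixer[a2w_volts]:.1f} dB (tripler: {cl_tripler:.1f} dB)")
```

With these numbers the power mixer conversion loss stays within about 1 dB of the tripler's 17.6 dB in both cases, while the output power is 7.2 dB to 11.7 dB higher.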

We conclude that the power mixer improves the output power significantly without deteriorating conversion loss. It is important to note that the same larger output power cannot be generated from the same device when configured as a frequency tripler. This is because the tripler cannot support the larger input gate swing corresponding to the increased input power without violating the conditions on voltage swing for long-term reliability. It is possible to feed the larger input power to a larger tripler device so as to limit the voltage swing to an acceptable amplitude. However, apart from the challenge of laying out a larger device, the smaller optimal load leads to a steeper impedance transformation at the output which will result in an increased loss in the output matching network.

Figure 4.10: Effect of non-ideal input and output terminations on the output power of (a) a frequency tripler, and (b) a power mixer with $A_{\omega} = 1.5 \,\text{V}$.

### 4.1.4 Effect of non-ideal input and output terminations

So far in our comparison, both the power mixer and the frequency tripler were driven by ideal voltage sources. In a practical implementation where the multiplier is driven by an amplifier with finite output impedance, it is typical to place a trap, resonant at the output harmonic, at the gate of multiplier circuits to prevent any feedback of the output harmonic current through the device



Figure 4.11: Block diagram and chip photograph of the implemented  $2.4 \,\mathrm{mm} \times 1.1 \,\mathrm{mm}$   $180 - 200 \,\mathrm{GHz}$  power mixer in  $130 \,\mathrm{nm}$  CMOS with  $f_{max} \approx 135 \,\mathrm{GHz}$ .

 $C_{gd}$  and transconductance [2], [50]. The three-phase tripler of Fig. 4.3 with an added open  $\lambda_{3\omega}/4$  stub at each gate (a third harmonic trap), and driven by a 50  $\Omega$  fundamental source is shown in Fig. 4.10(a). For a gate bias,  $V_{GS,DC}$ , the port power is such that the peak  $A_{\omega}$ , as allowed by long-term reliability, appears at the device gate. Owing to the finite quality factor of the  $\lambda_{3\omega}/4$  open stub, and the interconnect from the gate via to the open stub, the third harmonic trap is not an ideal short.<sup>1</sup> A fraction of the output harmonic current flows through  $C_{gd}$  and the non-ideal harmonic trap, generating a small, but finite, third harmonic voltage at the device gate. This third harmonic voltage creates a third harmonic current through the transconductance. The effect of this feedback on the output power of the frequency tripler is captured through simulation, such that the new peak output power of the tripler falls from  $-9 \, \text{dBm}$  at a gate bias of 1.2 V to  $-11.5 \, \text{dBm}$ .

For the power mixer, a $\lambda_{\omega}/4$ open stub (trap at fundamental and third harmonic) is placed at the gate of the device as in Fig. 4.10(b). An open $\lambda_{3\omega}/4$ stub and a shorted $\lambda_{2\omega}/2$ line at the source act as traps for the third and second harmonic respectively. The source driving port ensures an amplitude of $A_{\omega} = 1.5 \,\mathrm{V}$ while the power at the second harmonic port is varied to impose different $A_{2\omega}$ at the device gate while maintaining an optimal relative phase shift of $\phi = 0^{\circ}$. As seen in Fig. 4.7(b), a relative phase shift of $\phi = 0^{\circ}$ between the inputs is optimal for maximizing output power. Owing to the non-ideal nature of the harmonic traps, they present a finite impedance at the desired resonance frequencies. Small but finite voltages at the first and third harmonic appear at the gate, and at the second and third harmonic at the source. As with the tripler, these finite voltages modify the performance of the power mixer. The effect on the output power is captured through simulation for different $A_{2\omega}$ while $\phi = 0^{\circ}$. The output power for $A_{2\omega} = 1.5 \,\mathrm{V}$ at the gate falls from 2.7 dBm to 0.1 dBm; and for $A_{2\omega} = 0.7 \,\mathrm{V}$, the power falls from $-1.8 \,\mathrm{dBm}$ to $-5.2 \,\mathrm{dBm}$.

<sup>1</sup>The finite impedance of the harmonic trap appears in parallel with the port impedance. The port impedance of $50\,\Omega$ is sufficiently larger than the trap impedance at the third harmonic, and its value does not alter the feedback of the output harmonic current. In the chip implementation a fundamental frequency matching network is included between the driving amplifier and the tripler device.

As discussed before, the output impedance of the circuits is dominated by the substrate resistance, and both circuits seek to drive an optimal impedance of  $30\,\Omega$  while resonating the output capacitance through a transmission line at the drain node. The loss in the output matching network to the final  $50\,\Omega$  load in the two cases is identical, and results in an additional  $0.5\,\mathrm{dB}$  degradation in output power.

After taking into account the effect of both non-ideal input and output terminations on the output power, the power mixer generates  $-0.4\,\mathrm{dBm}$  when  $A_{2\omega}=1.5\,V$  and  $-5.7\,\mathrm{dBm}$  when  $A_{2\omega}=0.7\,V$ , for an optimal relative input phase of  $0^\circ$ . In doing so it continues to outperform the frequency tripler by 11.6 dB when  $A_{2\omega}=1.5\,V$  and by 6.3 dB when  $A_{2\omega}=0.7\,V$ , compared to 11.7 dB and 7.2 dB respectively when driven by ideal voltage sources. For given input amplitudes, the presence of harmonic traps does not alter the input power requirements of the two circuits. Therefore, the change in the output power due to non-ideal terminations is the change in the fundamental-to-third harmonic conversion loss.
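The power budget of this subsection can be tallied in a few lines. The sketch below starts from the simulated powers after the non-ideal harmonic traps and subtracts the 0.5 dB output matching loss that is common to both circuits; the dictionary keys and variable names are illustrative.

```python
OUTPUT_MATCH_LOSS_DB = 0.5  # loss of the 30-ohm to 50-ohm output network (from the text)

# Simulated third harmonic power (dBm) after including the non-ideal traps.
p_after_traps_dbm = {
    "tripler": -11.5,
    "mixer_1p5V": 0.1,   # A_2w = 1.5 V
    "mixer_0p7V": -5.2,  # A_2w = 0.7 V
}

# Power delivered to the final 50-ohm load.
p_delivered_dbm = {name: p - OUTPUT_MATCH_LOSS_DB for name, p in p_after_traps_dbm.items()}

# Advantage of the power mixer over the tripler, in dB.
advantage_db = {
    name: p - p_delivered_dbm["tripler"]
    for name, p in p_delivered_dbm.items()
    if name != "tripler"
}

for name, p in p_delivered_dbm.items():
    print(f"{name}: {p:.1f} dBm delivered")
for name, adv in advantage_db.items():
    print(f"{name}: {adv:.1f} dB above the tripler")
```

Because the output matching loss is identical for both circuits, it cancels in the comparison, and the reduced advantage at $A_{2\omega} = 0.7\,\mathrm{V}$ comes entirely from the non-ideal traps.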

Although we have discussed engineering the device nonlinearity to increase the third harmonic content, the discussion can be extended to other harmonics. Other modifications to the device waveforms, perhaps even controlling the drain swing at other harmonics, can potentially be investigated to further enhance the harmonic content.



Figure 4.12: (a) BEOL cross-section of the 130 nm CMOS process. (b) Circuit diagram of the implemented 180 – 200 GHz power mixer. (c) Series resistance, R ( $\Omega$ ), and (d) reactance, |jX| ( $\Omega$ ), of the 150 fF radial capacitance compared with that of the PDK model of the 8.5  $\mu$ m ×8.5  $\mu$ m MIMcap.

# 4.2 180 – 200 GHz 130 nm CMOS Power Mixer Implementation

The block diagram of the 180 – 200 GHz power mixer implemented in $130\,\mathrm{nm}$ CMOS ( $f_{max}$ is $\approx 135\,\mathrm{GHz}$ [47]) is shown in Fig. 4.11, along with the chip microphotograph. The second harmonic signal is generated on chip using a conventional balanced frequency doubler [2]. The input fundamental signal is split using a Wilkinson divider. One output of the Wilkinson divider feeds the fundamental path and the other feeds a Marchand balun. The balun generates the differential signal required to feed the frequency doubler. The outputs of the Marchand balun are amplified using two chains of five-stage 60 GHz amplifiers. The other output of the Wilkinson divider is amplified with a similar five-stage amplifier chain before feeding the source of the power mixer. A Reflection Type Phase Shifter (RTPS) is also included between the Wilkinson divider and the amplifier chain to adjust the phase shift between the second and first harmonic. A variable gain amplifier (VGA) has been included to compensate for the variable loss of the RTPS.



Figure 4.13: Layout of the power mixer device. The source via is pulled to one side in M2-M4 and then built up to M7 (not shown). The substrate connection is not shown.

In the following sections, we discuss the implementation of the individual blocks. A cross-section of the BEOL of the 130 nm CMOS process is shown in Fig. 4.12(a).

### 4.2.1 130 nm CMOS 180 – 200 GHz Power Mixer

The circuit diagram of the  $180-200\,\mathrm{GHz}$  power mixer is shown in Fig. 4.12(b). The power mixer device is sized to drive  $30\,\Omega$ . The pad capacitance along with the routing transmission line  $(TL_6)$  transform the probe  $50\,\Omega$  to this desired value. The device size required to drive  $30\,\Omega$  instead of directly driving  $50\,\Omega$  provides a good tradeoff between the larger power from a larger device size, the challenge in laying out a larger device and the loss in the impedance transformation network.

The $48 \,\mu\text{m}/120 \,\text{nm}$ device is folded into two $24 \times 1 \,\mu\text{m}/120 \,\text{nm}$ devices. Each device is laid out as in [82]. The layout of the power mixer transistor is shown in Fig. 4.13. The gate is doubly connected to a poly ring around the device to reduce wiring resistance. The poly ring is enhanced with a ring in metal M1, and then pulled to one side before being built up to M7 with a stepped via [50] and connected through an M7 layer transmission line $(TL_3)$ to the doubler drain, which has been built up to M7 as well. This avoids the loss of the M7 - M8 via transition in the second harmonic path.

The ground plane is laid out in M1-M4, while in parts it is built up to the top metal layer M8. This provides the ground to transmission lines implemented as coplanar waveguides and also provides isolation between signal paths. Substrate contacts are placed as a ring around the device, and as in [82], the bulk is connected to the adjacent ground plane. As in [82], the source is pulled to the middle of the two devices in M2. However, here the source carries the fundamental signal and is not tied to the ground. Instead, it is built up to M4 in the middle of the two devices and pulled to one side (opposite to the gate), and the remaining transition from M4 to the M8 transmission line $(TL_2)$ is completed.

At the gate, a $\lambda_{\omega}/4$ open stub serving as first and third harmonic trap is included. An open $\lambda_{3\omega}/4$ line and a shorted $\lambda_{2\omega}/2$ line serve as third and second harmonic traps respectively at the source. As the power mixer is biased at 0 V gate and source voltage, the $\lambda_{2\omega}/2$ line can be shorted directly to the chip ground plane. Matching networks are implemented in M7 at the gate ($TL_3$ and $TL_4$) and in M8 at the source ($TL_1$ and $TL_2$) for the second and first harmonic respectively. They transform the transistor impedance to the optimal value for maximum power transfer from the doubler and from the final stage amplifier of the fundamental path respectively. The 0 V gate bias is provided through the shorted shunt line in the second harmonic matching network in Fig. 4.12(b).
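The choice of stub lengths can be checked with the textbook input impedance of an ideal lossless line, $Z_{open} = -jZ_0\cot(\beta l)$ and $Z_{short} = jZ_0\tan(\beta l)$. The sketch below is a minimal illustration under that assumption: the characteristic impedance of $50\,\Omega$ and the function names are hypothetical, and the implemented stubs have finite quality factor, so they present small but nonzero impedances at the trap frequencies.

```python
import numpy as np

Z0 = 50.0   # assumed characteristic impedance in ohms (not taken from the design)
F0 = 63e9   # fundamental frequency of the prototype in Hz

def open_stub_z(f, f_quarter_wave, z0=Z0):
    """Input impedance of an ideal open stub that is lambda/4 long at f_quarter_wave."""
    beta_l = 0.5 * np.pi * (f / f_quarter_wave)
    return -1j * z0 * np.cos(beta_l) / np.sin(beta_l)

def shorted_line_z(f, f_half_wave, z0=Z0):
    """Input impedance of an ideal shorted line that is lambda/2 long at f_half_wave."""
    beta_l = np.pi * (f / f_half_wave)
    return 1j * z0 * np.tan(beta_l)

traps = {
    "gate: open lambda_w/4 stub": lambda f: open_stub_z(f, F0),
    "source: open lambda_3w/4 stub": lambda f: open_stub_z(f, 3 * F0),
    "source: shorted lambda_2w/2 line": lambda f: shorted_line_z(f, 2 * F0),
}

for name, z_of_f in traps.items():
    mags = [abs(z_of_f(n * F0)) for n in (1, 2, 3)]
    print(name, "|Z| at w, 2w, 3w (ohms):", ", ".join(f"{m:.3g}" for m in mags))
```

In this idealized picture the gate stub is a short at the first and third harmonics but an open at the second harmonic, so it does not load the second harmonic drive, and the shorted line at the source is a short at the second harmonic but an open at the fundamental.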

The drain connection is along the lines of [82]. The drain fingers are pulled to one side, opposite to that of the source contacts and then joined together in M2 to M4, such that the M2-M4 source via is in the middle of the two devices and there are two M2-M4 drain vias on either side of the folded devices. These two M2-M4 drain vias are bridged on top of the two devices in M5 through M7. The device output capacitance is resonated with an output inductance implemented as a microstrip in layer M7 ( $TL_5$ ). The resonating inductance is shorted at its end through a bypass capacitance implemented as a radial capacitance [85] in M7 sandwiched into the ground plane between layers M8 and M6. The capacitance is simulated in an EM simulator (IE3D [56]). As the capacitance is implemented in a lower loss metal than the technology's Metal-Insulator-Metal capacitance (MIMcap), its series resistance is only  $0.5\,\Omega$  compared to the  $3\,\Omega$  of a  $8.5\,\mu$ m  $\times 8.5\,\mu$ m MIMcap which implements the same capacitance value at the output frequency of interest, as shown in Fig. 4.12(c). The loss of the MIMcap is independent of frequency as the Process Design Kit (PDK) model does not capture frequency-dependent skin depth ( $\delta$ ). Consequently, we can expect the loss of the MIMcap to be even higher in practice. Fig. 4.12(d) plots the reactance of the two capacitors. The radial capacitance also has a higher self resonance frequency than the MIMcap



Figure 4.14: Circuit diagram of the implemented 120 GHz frequency doubler [2].

and presents a better short at the output frequency. The mixer drain bias can be routed to  $V_{dd}$  after the bypass radial capacitance without performance degradation. In this work, the bias is provided by the bias T of the output probe. A  $50\,\Omega$  transmission line implemented as a coplanar waveguide connects the drain to the output pad. The signal line metal transitions from M7 to M8 before reaching the pad. It is observed that including the M7-M8 transition as part of the routing transmission line rather than in device's drain via, after the resonating inductance in M7 has been included, has less detrimental effect on the final third harmonic power delivered to the  $50\,\Omega$  load of the probe, improving the performance by about  $0.5\,\mathrm{dB}$  in simulation.
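To put the series resistance of the bypass capacitance in perspective, the sketch below compares it with the reactance of an ideal 150 fF capacitor at the 189 GHz output frequency. This is a rough estimate only: it ignores the parasitic inductance that sets the self resonance frequency, and the variable names are illustrative.

```python
import math

F_OUT_HZ = 189e9        # output frequency of the prototype
C_BYPASS_F = 150e-15    # bypass capacitance value (Fig. 4.12)

# Reactance magnitude of an ideal capacitor (parasitic inductance ignored).
x_c_ohm = 1.0 / (2 * math.pi * F_OUT_HZ * C_BYPASS_F)

# Series resistances quoted in the text.
series_r_ohm = {"radial capacitance": 0.5, "PDK MIMcap": 3.0}

# Quality factor estimate Q = |X| / R under the ideal-capacitor assumption.
q_factor = {name: x_c_ohm / r for name, r in series_r_ohm.items()}

print(f"|X_C| = {x_c_ohm:.2f} ohm at {F_OUT_HZ / 1e9:.0f} GHz")
for name, q in q_factor.items():
    print(f"{name}: R = {series_r_ohm[name]} ohm, Q ~ {q:.1f}")
```

Under this estimate the $3\,\Omega$ of the MIMcap is comparable to the capacitor's own reactance, whereas the $0.5\,\Omega$ of the radial capacitance is roughly an order of magnitude smaller, consistent with it presenting a better short.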

#### 4.2.2 130 nm CMOS 120 GHz Frequency Doubler

The circuit diagram of the balanced frequency doubler is shown in Fig. 4.14. The doubler is sized to optimally drive $24\,\Omega$ and deliver up to $6.5\,\mathrm{dBm}$. This power is intended to generate a swing of half $V_{dd}$ or $0.7\,\mathrm{V}$ at the power mixer gate. A version of the same doubler driving a $50\,\Omega$ load impedance transformed to the required $24\,\Omega$ was implemented and its performance was verified in an earlier work [2]. For further details on the doubler circuit and its design, the reader is directed to [2].

#### 4.2.3 130 nm CMOS Fundamental-frequency V-band PAs

The fundamental signal in both first and second harmonic paths is amplified using a five stage chain of V-band power amplifiers. The Maximum Available Gain (MAG) of the 130 nm CMOS process is only about 5 dB at 60 GHz. A cascode configuration with two equally sized devices has a



Figure 4.15: Circuit diagram of the last stage of the implemented V-band amplifier chain.

MAG of 7.5 dB. To improve the MAG further, narrowband techniques such as interstage matching within the cascode and broadband methods such as neutralization of a differential PA with cross coupled capacitors [86] were evaluated. These techniques at best yield  $1 - 1.5 \,\mathrm{dB}$  improvement before layout. As such, a stacked amplifier design as in [75] is chosen and the circuit diagram of the last amplifier stage is shown in Fig. 4.15. The stacked design is driven from a  $2V_{dd}$  or  $3\,\mathrm{V}$  supply to improve the power delivery capability [75].

The devices of the five stages are sized as two $24\,\mu\mathrm{m}$ stages followed by three stages of size $36\,\mu\mathrm{m}$, $48\,\mu\mathrm{m}$, and $108\,\mu\mathrm{m}$. The input of the first stage is matched to $50\,\Omega$ in the fundamental path but to $25\,\Omega$ in the second harmonic path. This is done to interface with the Marchand balun and is discussed in more detail later in this section. The devices are sized up to ensure saturation of the stages as long as the compressed gain of the stages exceeds $1.24\,\mathrm{dB}$ to $3.5\,\mathrm{dB}$. In [2], we observed that the interstage matching networks in the amplifier chain are not well modeled and can cause mistuning of interfaces. The matching networks were remodeled to include the effect of T-junctions and bends through a complete EM simulation in IE3D. In Fig. 4.16, we have plotted the small signal gain of the amplifier chain with and without the corrected models for the interstage matching networks. With the corrected models, the peak simulated small signal gain of the chain falls from $21\,\mathrm{dB}$ at $60\,\mathrm{GHz}$ to $15\,\mathrm{dB}$ at $66\,\mathrm{GHz}$. The mistuning of the matching network between stages reduces



Figure 4.16: Simulated S-parameter performance of the 60 GHz fundamental amplifier chain in the source path of the power mixer.

the simulated saturated output power of each amplifier chain from 13.4 dBm to 10 dBm and the simulated efficiency from 6.8% to about 3%.
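The 1.24 dB to 3.5 dB range quoted for the amplifier chain appears to correspond to the width ratios between consecutive stages expressed in dB. The sketch below is an interpretation rather than something stated explicitly in the text. It computes those ratios from the listed device widths, taking the last width to be 108 µm; the variable names are illustrative.

```python
import math

# Device widths of the five stacked stages in micrometres (from the text).
stage_widths_um = [24, 24, 36, 48, 108]

def size_ratios_db(widths):
    """Width ratio of each stage to the one driving it, in dB (power ratio)."""
    return [10.0 * math.log10(b / a) for a, b in zip(widths, widths[1:])]

ratios_db = size_ratios_db(stage_widths_um)
for n, r in enumerate(ratios_db, start=1):
    # A stage can saturate the next one if its compressed gain exceeds this ratio.
    print(f"stage {n} -> stage {n + 1}: {r:.2f} dB")
```

Under this reading, the nonzero ratios span roughly 1.25 dB to 3.5 dB, which is close to the range quoted in the text.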

#### 4.2.4 60 GHz Reflection-Type Phase Shifter (RTPS)

Fig. 4.17(a) shows the circuit diagram of the RTPS. It consists of a 3 dB quadrature coupler and two identical reflective loads terminating the through and coupled ports of the coupler. In this work, a broadside coupled line coupler is employed to achieve a high coupling of 3 dB in the presence of tight CMOS BEOL rules. The essential design parameters of a coupled line coupler are the even and odd mode characteristic impedances ($Z_{0,e}$ and $Z_{0,o}$) and the coupling factor ($c$), which are related by the equations shown in Fig. 4.17(b). For 3 dB coupling, irrespective of the input matching, there should be a 1 to 5.8 ratio between $Z_{0,o}$ and $Z_{0,e}$. In a $50\,\Omega$ system, this necessitates $Z_{0,e} = 120\,\Omega$ and $Z_{0,o} = 21\,\Omega$. An even mode impedance of $120\,\Omega$ is a challenging requirement to satisfy in a silicon based process. As a result, the authors in [87] sacrifice input matching, whereas the authors in [88] use thinner metal layers and totally remove the ground plane underneath the coupler to achieve higher inductance per unit length at the expense of increased insertion loss. A differential broadside coupled line coupler with co-planar striplines over floating ground strips was reported in [89] and it harnesses the lateral spacing between coplanar striplines to achieve a high $Z_{0,e}$. However, this approach is not applicable to single-ended broadside couplers. In this work, the RTPS employs a slow-wave technique in the single-ended coupled line coupler design to satisfy the high $Z_{0,e}$ requirement.
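The impedance values quoted above can be reproduced with the standard matched coupled-line relations $Z_{0,e} = Z_0\sqrt{(1+c)/(1-c)}$ and $Z_{0,o} = Z_0\sqrt{(1-c)/(1+c)}$. These are assumed here to be the equations referred to in Fig. 4.17(b); the function name is illustrative.

```python
import math

def coupled_line_impedances(coupling_db: float, z0: float = 50.0):
    """Return (Z0e, Z0o) in ohms for a matched coupled-line coupler.

    coupling_db is the coupling magnitude in dB (3.0 for a 3 dB coupler).
    Uses the textbook relations, which also imply Z0e * Z0o = z0**2.
    """
    c = 10.0 ** (-coupling_db / 20.0)  # voltage coupling factor
    z0e = z0 * math.sqrt((1.0 + c) / (1.0 - c))
    z0o = z0 * math.sqrt((1.0 - c) / (1.0 + c))
    return z0e, z0o

z0e, z0o = coupled_line_impedances(3.0)
print(f"Z0e = {z0e:.1f} ohm, Z0o = {z0o:.1f} ohm, ratio = {z0e / z0o:.2f}")
```

For 3 dB coupling in a 50 Ω system this gives approximately 121 Ω and 21 Ω, a ratio of about 5.8, in line with the values in the text.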



Figure 4.17: (a) Circuit diagram of the RTPS. It uses a broadside coupled line 3 dB coupler and CLC reflective terminations. (b) Cross-section of the coupled line 3 dB coupler. A slow-wave technique has been used for achieving high even mode impedance as well as simplifying the design procedure.



Figure 4.18: Simulated characteristic impedance of the coupler in the even and odd modes.  $W = 12 \,\mu\text{m}$ ,  $W_{slot} = 10 \,\mu\text{m}$  and  $L_{slot}$  is varied.

Fig. 4.17(b) depicts the cross-section of the coupled line coupler. Vertically coupled microstrip lines are implemented using the two topmost thick metal layers, M8 and M7, of the 130 nm CMOS process BEOL to achieve lower loss. A ground plane with slots is employed under the coupled microstrip lines to shield the coupler structure from the lossy silicon substrate. The ground plane is formed by stacking M1, M2 and M3 metals to satisfy the metal density rules. Stacking also provides a thickness greater than $3\delta$, where $\delta$ is the skin depth, at $60\,\text{GHz}$ for the return current flow and reduces the loss. Slots that are $10\,\mu\text{m}$ wide and separated by $10\,\mu\text{m}$ spacing are opened in the ground plane, orthogonal to the signal propagation direction, to create the slow-wave effect [90].

The even and odd mode characteristic impedances of the coupler are simulated in IE3D. Fig. 4.18 depicts the even and odd mode characteristic impedances as the slot length is varied. In the even mode, when the coupled lines are excited with the same polarity, the return current flows through the ground plane. Since the slots in the ground plane are orthogonal to the signal propagation direction, the return current in the signal direction is forced to flow far away from the microstrip lines. This increases the inductance per unit length. As a result, the even mode characteristic impedance increases with increasing slot length. The slots boost the even mode impedance by decreasing the capacitance per unit length as well. On the other hand, in the odd mode the current flows through one of the microstrips and returns through the other. Therefore, the magnetic fields cancel everywhere except between the top and bottom lines. The electric field is also confined between the two parallel lines in this mode. Accordingly, changing the slot length has no effect on the inductance and capacitance per unit length, and thus on the characteristic impedance, in the odd mode.

The slow-wave technique also simplifies the coupler design to a two-step procedure. First, the odd mode impedance can be set by changing the width, $W$, of the signal lines. Then the even mode impedance can be increased up to the desired value by using the slot length, $L_{slot}$. The physical design parameters, $W$ and $L_{slot}$, were found based on this two-step procedure in IE3D. $Z_{0,o}$ of $21\,\Omega$ and $Z_{0,e}$ of $120\,\Omega$ require a width of $12\,\mu\mathrm{m}$ and a slot length of $60\,\mu\mathrm{m}$. A quarter-wavelength in the even and odd modes corresponds to approximately $500\,\mu\mathrm{m}$ length at $60\,\mathrm{GHz}$. The coupled microstrips are bent to conserve chip area. Simulations show that the coupler achieves $3\,\mathrm{dB}$ coupling with $0.8\,\mathrm{dB}$ insertion loss (varies by $0.1\,\mathrm{dB}$ between the coupled and through ports) and $35\,\mathrm{dB}$ isolation at $60\,\mathrm{GHz}$. The simulated phase difference between the through and coupled ports is $84^{\circ}\pm1^{\circ}$ between

40 GHz and 70 GHz. The deviation from 90° is attributed to the difference between the even and odd mode propagation constants. This can be improved by including another parameter in the design procedure which could be asymmetry in the widths as in [89], or asymmetry in the position of the coupled lines as in [91].

The effect of the coupler phase imbalance on the RTPS performance is analyzed to the first order assuming there is no amplitude imbalance. Assuming there is  $\phi$  imbalance between  $S_{21}$  and  $S_{31}$  (e.g.  $\angle S_{21}$  and  $\angle S_{31}$  are -90°+ $\phi$  and 0° respectively) the signals from reflected terminations will be -180°+2 $\phi$  out of phase at the input port instead of 180°. Then,  $S_{11}$  of the RTPS can be expressed as

$$|S_{11,RTPS}| = \sqrt{\frac{1 - \cos(2\phi)}{2}} \tag{4.1}$$

According to (4.1),  $S_{11}$  of the RTPS would be lower than -15 dB up to a coupler phase imbalance of  $10^{\circ}$ . Similarly, the forward transmission coefficient of the RTPS can be calculated in the presence of a coupler phase imbalance as

$$|S_{21,RTPS}| = |S_{21}S_{42}\Gamma_L + S_{31}S_{43}\Gamma_L| \tag{4.2}$$

where $S_{21}$, $S_{31}$, $S_{42}$ and $S_{43}$ are the scattering parameters of the coupler in the presence of phase imbalance and $\Gamma_L$ is the load reflection coefficient. Due to symmetry, $S_{31}$ and $S_{42}$ are equal in both magnitude and phase, and the unitary condition requires $\angle S_{21} + \angle S_{43} = 180^{\circ}$ (modulo $360^{\circ}$). Thus, using $|S_{21}| = |S_{31}| = |S_{42}| = |S_{43}| = 1/\sqrt{2}$, $\angle S_{21} = -90^{\circ} + \phi$ and $\angle S_{43} = -90^{\circ} - \phi$, (4.2) simplifies to

$$S_{21,RTPS} = \cos(\phi)\Gamma_L e^{-j90} \tag{4.3}$$

Equation (4.3) indicates that phase imbalance in the coupler does not cause any variation in the overall RTPS phase shift. The loss of the RTPS in dB is equal to $-20\log_{10}|\cos\phi| - 20\log_{10}|\Gamma_L|$ (the coupler is assumed to be lossless). As a result, the loss of the RTPS increases with increasing phase imbalance in the coupler. A 10° phase imbalance would degrade the RTPS loss by 0.13 dB, which is negligible compared to the loss from the reflective termination.
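The two numerical claims above (return loss better than 15 dB and 0.13 dB of extra loss at 10° imbalance) follow directly from (4.1) and (4.3). A minimal sketch, with illustrative function names and the same lossless, amplitude-balanced coupler assumption as the analysis:

```python
import math

def rtps_s11_db(phi_deg: float) -> float:
    """|S11| of the RTPS in dB from (4.1); equals 20*log10|sin(phi)|.

    phi_deg must be nonzero, since a perfectly balanced coupler gives |S11| = 0.
    """
    phi = math.radians(phi_deg)
    mag = math.sqrt((1.0 - math.cos(2.0 * phi)) / 2.0)
    return 20.0 * math.log10(mag)

def rtps_extra_loss_db(phi_deg: float) -> float:
    """Extra insertion loss in dB caused by the cos(phi) factor in (4.3)."""
    return -20.0 * math.log10(abs(math.cos(math.radians(phi_deg))))

for phi in (2.0, 5.0, 10.0):
    print(f"phi = {phi:4.1f} deg: S11 = {rtps_s11_db(phi):6.1f} dB, "
          f"extra loss = {rtps_extra_loss_db(phi):.3f} dB")
```

At 10° the sketch gives an input reflection of about -15.2 dB and an added loss of about 0.13 dB.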

Using S-parameters of an ideal coupled line coupler (Fig. 4.17(a)), the phase shift of the RTPS can be expressed as [89]



Figure 4.19: (a) Simulated effective varactor capacitance for different signal amplitudes showing large signal effects. Larger signal amplitude across the varactor causes a reduction in capacitance range and tuning ratio (b) Simulated phase shift of the RTPS under large signal operation for different input power levels.

$$\angle S_{21,RTPS} = -90^{\circ} - 2 \tan^{-1} \left(\frac{X}{Z_0}\right) \tag{4.4}$$

where $X$ is the reactance of the reflective loads and $Z_0$ is the characteristic impedance of the coupler. A $\pi$-type C-L-C termination is used as the variable reactance in the RTPS to achieve a 180° continuous phase range. Fig. 4.17(a) depicts the circuit diagram of the reflection termination. Varactors are used as shunt capacitances and are implemented using $30 \times 2\,\mu\text{m}/160\,\text{nm}$ FET devices whose source and drain are connected together. The control voltage is applied at the gate terminal to vary the capacitance. The impedance of the reflective load is given by

$$Z_L = \frac{1 - \omega^2 L_{eff} C_v}{(2 - \omega^2 L_{eff} C_v) j\omega C_v} \tag{4.5}$$

where $C_v$ is the varactor capacitance and $L_{eff}$ is the effective inductance of the transmission line. For the $30 \times 2\,\mu\text{m}/160\,\text{nm}$ MOS varactor, the simulated minimum capacitance, $C_{v,min}$, is 65 fF with a tuning ratio of 1.9 at 60 GHz when the control voltage is swept from 0 to 0.8 V (Fig. 4.19). The quality factor of the varactor varies from 18 to 4. By setting $L_{eff} = 2/(\omega^2 C)$, where $C = (C_{v,max} + C_{v,min})/2$, a phase range of more than 180° is achieved (Fig. 4.19(b)). The effective inductance is implemented using a CPW transmission line with $60\,\Omega$ characteristic impedance and $360\,\mu$m length.
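As a rough small-signal check of this design choice, the sketch below evaluates (4.5) for the quoted varactor range and converts the load reactance to reflection phase. It assumes a lossless load, an ideal coupler and $Z_0 = 50\,\Omega$, so it ignores the finite varactor quality factor. The load is written as an admittance $1/Z_L = jB$, because $B$ stays finite across the tuning range, which avoids manual phase unwrapping. Function names are illustrative.

```python
import math

def clc_susceptance(c_v: float, l_eff: float, omega: float) -> float:
    """Susceptance B of the C-L-C load, where 1/Z_L = jB with Z_L from (4.5)."""
    x = omega ** 2 * l_eff * c_v
    return omega * c_v * (2.0 - x) / (1.0 - x)

def rtps_design(c_min: float, tuning_ratio: float, freq: float, z0: float = 50.0):
    """Return (L_eff in henry, ideal phase range in degrees)."""
    omega = 2.0 * math.pi * freq
    c_max = tuning_ratio * c_min
    c_mid = 0.5 * (c_min + c_max)
    l_eff = 2.0 / (omega ** 2 * c_mid)  # design choice stated in the text
    # For a lossless load jB, angle(Gamma_L) = -2*atan(B*Z0); the fixed
    # -90 degree coupler term in (4.4) does not affect the range.
    ang_min = -2.0 * math.atan(clc_susceptance(c_min, l_eff, omega) * z0)
    ang_max = -2.0 * math.atan(clc_susceptance(c_max, l_eff, omega) * z0)
    return l_eff, abs(math.degrees(ang_max - ang_min))

l_eff, phase_range = rtps_design(c_min=65e-15, tuning_ratio=1.9, freq=60e9)
print(f"L_eff = {l_eff * 1e12:.0f} pH, ideal phase range = {phase_range:.0f} deg")
```

With these idealizations the range comes out somewhat above 200°, consistent with the statement that more than 180° is available before losses and large-signal effects are considered.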

In this work, the power incident on the RTPS was $+8\,\mathrm{dBm}$ in simulation. Therefore, the phase shift of the RTPS is also evaluated under large signal operation. Fig. 4.19(b) depicts the simulated phase shift as $V_{RTPS}$ is varied for different input powers at 60 GHz. There are two important observations from Fig. 4.19(b): 1) the RTPS phase shift becomes more linear with increasing input power and 2) after some point, pushing more power into the RTPS causes compression in phase range. The phase compression becomes worse as the input power is increased. These large signal effects on the RTPS phase shift have been overlooked in the literature although they are due to the same mechanism which causes AM/PM conversion in voltage controlled oscillators. The large signal swing across the varactor device modulates the capacitance throughout the signal period and thus the effective capacitance of the varactor is averaged over each period [92]. The resulting effective capacitance (ratio of the root mean square, RMS, of the current to the RMS of the derivative of the voltage with respect to time) versus RTPS control voltage ($V_{RTPS}$) for different signal amplitudes is shown in Fig. 4.19(a). As can be seen, the effective varactor capacitance varies

more linearly with  $V_{RTPS}$  for larger signal amplitudes, resulting in a more linear RTPS phase shift. Additionally, the varactor tuning ratio reduces as the signal amplitude increases and this explains the phase compression for higher input powers in Fig. 4.19(b). The small signal phase shift of the RTPS should be designed with margin to make sure there is enough phase range under large signal operation. Designing the RTPS for a lower input and output impedance would also mitigate the phase compression issue since the voltage swing across the varactors would be lower for the same input power.
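The averaging mechanism described above can be illustrated numerically using the stated definition of effective capacitance, RMS of the current divided by RMS of $dv/dt$. The C-V curve in this sketch is a hypothetical tanh shape chosen only to span the quoted 65 fF minimum and 1.9 tuning ratio. It is not the PDK varactor model, so the numbers are qualitative.

```python
import math

C_MIN, C_MAX = 65e-15, 123.5e-15  # 65 fF minimum, tuning ratio 1.9 (from the text)

def varactor_c(v: float) -> float:
    """Hypothetical smooth C-V curve (tanh); NOT the PDK varactor model."""
    return C_MIN + (C_MAX - C_MIN) * 0.5 * (1.0 + math.tanh((v - 0.4) / 0.15))

def effective_c(v_ctrl: float, amplitude: float, n: int = 2000) -> float:
    """RMS(i) / RMS(dv/dt) over one period for v = v_ctrl + A*sin(theta).

    amplitude must be positive.
    """
    num = den = 0.0
    for k in range(n):
        theta = 2.0 * math.pi * (k + 0.5) / n
        v = v_ctrl + amplitude * math.sin(theta)
        dvdt = amplitude * math.cos(theta)  # per unit omega; omega cancels
        i = varactor_c(v) * dvdt            # i = C(v) * dv/dt
        num += i * i
        den += dvdt * dvdt
    return math.sqrt(num / den)

def tuning_ratio(amplitude: float) -> float:
    """Effective capacitance ratio between control voltages of 0.8 V and 0 V."""
    return effective_c(0.8, amplitude) / effective_c(0.0, amplitude)

for a in (0.01, 0.3, 0.6):
    print(f"amplitude {a:.2f} V: effective tuning ratio = {tuning_ratio(a):.2f}")
```

Even with this simplified curve, the effective tuning ratio shrinks as the swing grows, which is the trend the text attributes to the phase compression in Fig. 4.19(b).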

#### 4.2.5 60 GHz Variable Gain Amplifier

A variable gain amplifier is used to compensate for the insertion loss variation in the RTPS across the control voltage. Fig. 4.20(a) shows the block diagram of the VGA. Variable gain is achieved by placing a variable attenuator between two amplifier stages as in [93]. Fig. 4.20(b) depicts the circuit diagram of the amplifiers, including bias circuitry. The amplifiers are implemented in a stacked topology due to its higher reverse isolation compared to a common source stage. High reverse isolation helps in keeping the VGA input and output matching independent of the attenuation settings. The supply voltage of the amplifiers is scaled to $3\,\text{V}$ to improve the power handling capability [75]. The input and output of the amplifiers are conjugately matched to $50\,\Omega$ using L-type matching networks.

The schematic of the variable attenuator is shown in Fig. 4.20(c). It uses a variable shunt resistor, implemented as a MOS transistor operating in the deep triode region ( $V_{DS} = 0 V$  for zero power consumption).  $S_{21}$  of the attenuator, neglecting the transmission line loss and assuming that the shorted stub inductance  $L_p$  and the total capacitance at the drain  $C_d$  resonate, is given by

$$S_{21} = \frac{2R_v}{Z_0 + 2R_v} \tag{4.6}$$

where $R_v$ is the channel resistance, which can be varied by the gate voltage. A $24 \times 1\,\mu\text{m}/120\,\text{nm}$ device can provide $16\,\Omega$ on-resistance at $1.5\,\text{V}$, and is used to obtain approximately $8\,\text{dB}$ attenuation range. The control voltage is applied through a $5\,\mathrm{k}\Omega$ resistor to make the gate float at ac. A floating gate reduces the total capacitance at the drain of M1 to the series combination of $C_{gs}$ and $C_{gd}$ in parallel with $C_{db}$, i.e. $C_{gs}C_{gd}/(C_{gs}+C_{gd}) + C_{db}$, and in turn a larger shunt inductance is required to resonate it out. Assuming a constant quality factor, the shunt parasitic resistance of the shunt inductor increases and the loss of the attenuator (insertion loss



Figure 4.20: (a) Block diagram of the variable gain amplifier. (b) Circuit diagram of the amplifiers. (c) Circuit diagram of the variable attenuator.



Figure 4.21: Circuit diagram of the impedance transforming Marchand Balun with a passive cancellation network between the balanced outputs. The passive network improves the output return losses and the isolation between output ports.

when the transistor is OFF) reduces. The attenuator is ac-coupled at the input and output to the amplifier stages with 300 fF MIM capacitors.
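Equation (4.6) can be checked against the quoted attenuation range with a few lines of arithmetic. The sketch assumes $Z_0 = 50\,\Omega$ and the same idealizations as (4.6): a lossless line and the drain capacitance fully resonated out. The function name is illustrative.

```python
import math

def attenuator_s21_db(r_v: float, z0: float = 50.0) -> float:
    """S21 in dB of a shunt resistance r_v in a z0 system, per (4.6)."""
    return 20.0 * math.log10(2.0 * r_v / (z0 + 2.0 * r_v))

on_state_db = attenuator_s21_db(16.0)    # 16 ohm on-resistance from the text
off_state_db = attenuator_s21_db(1.0e6)  # very large R_v stands in for the OFF device
print(f"ON: {on_state_db:.2f} dB, OFF: {off_state_db:.4f} dB, "
      f"range = {off_state_db - on_state_db:.2f} dB")
```

A $16\,\Omega$ shunt resistance gives roughly 8.2 dB of attenuation relative to the ideal OFF state, consistent with the approximately 8 dB range stated above.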

#### 4.2.6 60 GHz Marchand Balun

An impedance transforming Marchand balun was integrated on chip to convert the single-ended output of the Wilkinson to a differential signal. Fig. 4.21 shows the circuit diagram of the Marchand balun. It consists of two identical quarter-wavelength coupled line coupler sections and a passive network between the balanced output ports. The required coupling factor for the couplers is $-4.8\,\mathrm{dB}$ when all the ports are terminated with $50\,\Omega$ [94]. Due to limited time at the design phase, two copies of the 3 dB coupled line coupler designed for the RTPS are used in the Marchand design. Using $3\,\mathrm{dB}$ couplers entails an output impedance of $25\,\Omega$ for achieving $-3\,\mathrm{dB}$ power transfer to each port. The conventional Marchand balun suffers from poor output matching and isolation between the balanced outputs. A passive network consisting of two $25\,\Omega$ resistances and a half-wavelength transmission line was integrated on chip between the balanced outputs to improve the output matching and isolation [95]. Without adding this network, the best attainable output return loss (assuming $25\,\Omega$ output port impedance) and isolation between outputs would be theoretically 6 dB [94]. The passive network introduces another path with 6 dB attenuation and 180 degree phase shift for perfect cancellation between the balanced outputs. This helps to reduce the power drive requirements of the amplifier chains feeding the doubler and improves the overall conversion loss. Additionally, the improved isolation between balanced ports enhances the stability of the



Figure 4.22: Simulated performance of the Marchand Balun including (a) input and output return losses and insertion loss, and (b) phase and amplitude imbalance.



Figure 4.23: Die photo of the test structure implemented to characterize the 60 GHz RTPS and VGA cascade.

differential amplifier chain.
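The link between the coupling factor and the output port impedance can be illustrated with a commonly quoted design relation for impedance-transforming Marchand baluns, $c = 1/\sqrt{2Z_{out}/Z_{in} + 1}$. This relation is assumed here from the general balun literature rather than taken from this chapter, but it reproduces both values quoted above.

```python
import math

def marchand_coupling_db(z_in: float, z_out: float) -> float:
    """Required coupler coupling in dB for a Marchand balun.

    Assumed design relation: c = 1 / sqrt(2 * z_out / z_in + 1), where z_in is
    the unbalanced port impedance and z_out the impedance at each balanced port.
    """
    c = 1.0 / math.sqrt(2.0 * z_out / z_in + 1.0)
    return 20.0 * math.log10(c)

all_50_ohm_db = marchand_coupling_db(50.0, 50.0)
out_25_ohm_db = marchand_coupling_db(50.0, 25.0)
print(f"50 ohm outputs: {all_50_ohm_db:.2f} dB, 25 ohm outputs: {out_25_ohm_db:.2f} dB")
```

With $50\,\Omega$ at every port the relation gives about $-4.8\,\mathrm{dB}$, while $25\,\Omega$ balanced ports call for $-3\,\mathrm{dB}$ coupling, which is why reusing the RTPS coupler implies a $25\,\Omega$ output impedance.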

The simulated performance of the balun is shown in Fig. 4.22. The input and output return losses are better than 15 dB (Fig. 4.22(a)) and the balun achieves an isolation better than 22 dB between balanced ports from 50 GHz to 70 GHz. The simulated insertion loss of the balun (Fig. 4.22(a)) is lower than 1.6 dB in the same frequency range. The simulated phase and amplitude imbalances of the balun are within 1° and 0.1 dB, respectively.

# 4.3 Measurement

We present the measured performance of the  $180 - 200 \,\text{GHz}$  power mixer implemented in  $130 \,\text{nm}$  CMOS. The chip photograph was shown in Fig. 4.11. Breakouts of the RTPS, VGA and frequency doubler have also been measured.

#### 4.3.1 60 GHz RTPS and VGA Breakout

A cascaded RTPS and VGA breakout is tested in a chip-on-board setup to characterize the phase shift and amplitude control capability in the fundamental path. Fig. 4.23 shows the die microphotograph of the RTPS-VGA test breakout, which occupies $1.15 \times 0.36\,\mathrm{mm^2}$, not including pads. S-parameters of the RTPS-VGA breakout are measured up to 65 GHz using dc-67 GHz Cascade Infinity GSG probes and an Anritsu 37397E Lightning VNA. The phase shift versus RTPS control voltage $V_{RTPS}$ is shown in Fig. 4.24 with the VGA set to maximum gain. The RTPS-VGA breakout achieves 158° and 137° phase variation range at 60 GHz and 63 GHz, respectively. The gain varies



Figure 4.24: Insertion (a) phase shift and (b) gain of the RTPS-VGA breakout versus RTPS control voltage at 60 and 63 GHz. The VGA has been set to maximum gain ( $V_{VGA} = 0 \text{ V}$ ).



Figure 4.25: Insertion (a) gain and (b) phase-shift of the RTPS-VGA breakout versus VGA control voltage. For these measurements,  $V_{RTPS} = 0 \text{ V}$ .

from  $-6.8 \,\mathrm{dB}$  to  $-0.3 \,\mathrm{dB}$  and from  $-1.2 \,\mathrm{dB}$  to  $3.6 \,\mathrm{dB}$  across  $V_{RTPS}$  at  $60 \,\mathrm{GHz}$  and  $63 \,\mathrm{GHz}$  respectively. As the VGA control voltage is fixed at its highest gain setting, Fig. 4.24 reveals the phase shift and insertion loss characteristic of the RTPS. Large variation of insertion loss across phase settings is the main drawback of reflection type phase shifters and it is compensated here using the VGA. Fig. 4.25 shows the gain and insertion phase of the RTPS-VGA breakout across frequency for different attenuator control,  $V_{VGA}$ , with  $V_{RTPS}$  fixed at  $0 \,\mathrm{V}$ . The VGA provides  $8.4 \,\mathrm{dB}$  analog gain control with a phase variation  $< 8^{\circ}$ .
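As a quick arithmetic check on the compensation argument, the sketch below compares the measured RTPS gain variation at each frequency with the 8.4 dB VGA control range. The names are illustrative, and it assumes the quoted VGA range applies at both frequencies.

```python
# Measured RTPS-VGA gain extremes across V_RTPS at maximum VGA gain, in dB.
gain_extremes_db = {60: (-6.8, -0.3), 63: (-1.2, 3.6)}
VGA_RANGE_DB = 8.4  # measured analog gain control range of the VGA

def gain_variation_db(freq_ghz: int) -> float:
    """Spread of the RTPS insertion gain across control voltage."""
    lowest, highest = gain_extremes_db[freq_ghz]
    return highest - lowest

def compensation_margin_db(freq_ghz: int) -> float:
    """VGA range left over after equalizing the RTPS gain variation."""
    return VGA_RANGE_DB - gain_variation_db(freq_ghz)

for f in sorted(gain_extremes_db):
    print(f"{f} GHz: variation {gain_variation_db(f):.1f} dB, "
          f"margin {compensation_margin_db(f):.1f} dB")
```

The variation is 6.5 dB at 60 GHz and 4.8 dB at 63 GHz, so under this assumption the VGA range covers both cases with some margin.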

#### 4.3.2 120 GHz Frequency Doubler Breakout

A breakout of the frequency doubler, Marchand balun and driving amplifiers is reported in [2]. As mentioned earlier, the doubler in the power mixer is designed to drive  $24 \Omega$ , and so an impedance transformation from  $50 \Omega$  is included at the output in the breakout. Although the doubler was



Figure 4.26: Measurement setup of the power mixer prototype with (a) an Erickson power meter and (b) a second harmonic mixer downconverter (SHMD).

designed to deliver  $+7\,\mathrm{dBm}$  to the output load at  $120\,\mathrm{GHz}$ , modeling mismatch in the interstage matching networks ultimately yielded  $+4\,\mathrm{dBm}$  power to the output load at  $134\,\mathrm{GHz}$  with a peak conversion loss of  $3.1\,\mathrm{dB}$ . The passive modeling error was corrected by remodeling the interstage matching networks including T-junctions and bends in IE3D as discussed in Section 4.2. The new models of the passive networks help predict the measured up-tuned frequency and final output power closely.



Figure 4.27: Third harmonic output power of the implemented power mixer vs. output frequency measured with the power meter setup. The output power is plotted for the optimal input phase at each frequency with the VGA set to maximum gain. The original power mixer simulation, and the simulation with updated amplifier models that capture the degradation in fundamental power available to the mixer are shown. For comparison, a simulation of a frequency tripler driven by amplifiers with a frequency mismatch similar to the power mixer implementation is also shown. The annotated input power is at the fundamental frequency.

#### 4.3.3 180-200 GHz Power Mixer

The power mixer is measured in a chip-on-board configuration using an Erickson PM4 power meter and in a second configuration using a second harmonic mixer downconverter (SHMD) from Millitech, as shown in Fig. 4.26. The third harmonic at the output pad is probed with a WR5 GSG probe from GGB Industries. For the power meter measurement in Fig. 4.26(a), the first and second harmonics are filtered at the output using a WR4.3 waveguide with a lower cut-off of 137 GHz and a suggested range of operation from 170 GHz to 260 GHz. In Fig. 4.26(b), the Millitech mixer has a WR5 RF input port and downconverts with the second harmonic of a $4 \times 15.6\,\text{GHz}$ LO input. The fundamental input signal is provided by a dc-67 GHz Anritsu MG3697C signal generator through a Cascade Infinity dc-67 GHz probe in both setups. An external 55-66 GHz Quinstar power amplifier is used after the signal generator, before further on-chip amplification.

The saturated output power at the third harmonic is plotted across frequency for the optimal



Figure 4.28: Output power at 189 GHz vs. input power at 63 GHz for different input phase shift (varying  $V_{RTPS}$ ) measured with the power mixer setup. The VGA is set to maximum gain.

$V_{RTPS}$ at each frequency in Fig. 4.27. The VGA is at maximum gain with $V_{VGA} = 0$ V. The input power is 12 dBm as calibrated up to the tip of the input probe to ensure saturation takes place across frequency. Two simulations have been indicated on the graph. The dashed lines represent simulations of the chip when the effect of T-junctions and bends in the amplifier interstage matching networks has not been included. This mixer was observed to have a 1.5 V swing at the source node and a 0.6 V second harmonic swing at the gate. After including the effect of nonideal terminations in Section 4.1.4, Fig. 4.10(b) indicates the maximum output power for these voltage swings is -6.7 dBm, a 5.3 dB advantage over the peak -12 dBm output of the frequency tripler in Fig. 4.10(a). The dashed line simulation of the complete chip with drivers and doubler shows a peak power of $-7\,\mathrm{dBm}$, which is very close to these theoretical predictions. The solid line represents simulations after including the updated EM models for passive interstage matching networks that capture the effects of bends and T-junctions in the amplifier chain as discussed in Section 4.2. This includes the reduction in the saturated power of the amplifiers that adversely impacts the performance of the frequency doubler [2] and the power mixer. These full-chip simulations with updated EM models match measurements of the implemented prototype more closely in terms of both output power and frequency response. In simulations, the reduced saturated power of the amplifiers and the frequency doubler results in a 1.2 V fundamental frequency swing at the source and only 0.43 V second harmonic swing at the gate. The 3-dB bandwidth is from 184 GHz to 194 GHz.

Also shown in the figure is a simulation of the frequency tripler of Fig. 4.2 with post-layout



Figure 4.29: Variation in output power of the power mixer at 189 GHz as the relative phase shift between the input is changed by varying  $V_{RTPS}$ .  $V_{VGA}$  is adjusted to compensate for the RTPS gain variation across  $V_{RTPS}$  settings. For these measurements, the calibrated fundamental input power at the probe tip is  $+12 \, \text{dBm}$ .

parasitics and a mismatched driving amplifier chain that mimics the mismatch seen in our power mixer prototype. This has been added to bring to a conclusion the comparison between the power mixer and the frequency tripler. It can be seen that the peak simulated power mixer performance outperforms the peak tripler simulation by 5.4 dB in the face of EM modeling errors in the driving amplifiers. This is close to the theoretical 5.3 dB benefit in output power, for the designed second harmonic input of 0.6 V, afforded by the power mixer technique in Section 4.1.4.

Fig. 4.28 plots the third harmonic output power as the fundamental input power at 63 GHz is varied for different RTPS control voltages measured with the power meter setup. For this measurement, the VGA has been set to maximum gain. From these plots, it is seen that  $V_{RTPS} = 0.3 \,\mathrm{V}$  results in the optimal phase shift for maximum saturated third harmonic output power. We observe that under maximum VGA gain setting, the output power has not saturated for some RTPS control voltages, in particular  $V_{RTPS} = 0 \,\mathrm{V}$  and  $V_{RTPS} = 0.8 \,\mathrm{V}$ . It is challenging to generate the significant input power required to drive the power mixer into saturation at these RTPS control voltages.

We now verify the proposed nonlinearity engineering concept. The implemented power mixer prototype can be used to verify the effect of the relative phase shift between the input harmonics on third harmonic output power for a fixed second harmonic and fundamental swing at the mixer

inputs. The output power at 189 GHz for a fixed 63 GHz input power level and different relative phase shifts attained by varying $V_{RTPS}$ is shown in Fig. 4.29. As the relative input phase shift is varied, the variation in gain of the RTPS is compensated by adjusting the VGA control voltage ($V_{VGA}$) to the appropriate value, so that the fundamental power delivered at the source of the power mixer is held constant. The VGA voltage required to equalize the gain of the RTPS-VGA block is determined by using the results of Fig. 4.24 in conjunction with the results of Fig. 4.25. From these figures, it is seen that the VGA can be set to maximum gain ($V_{VGA} = 0\,\text{V}$) for $V_{RTPS} = 0.3\,\text{V}$ but it must operate at reduced gain for other $V_{RTPS}$ settings. It is seen that the nature of variation of output power with relative input phase is similar to that seen in simulations. Simulations show a $2.5\,\mathrm{dB}$ variation across $V_{RTPS}$ while a $4\,\mathrm{dB}$ variation is seen in the measured results using the power meter configuration. Based on the output power curves in Fig. 4.28, an input power of 12 dBm was used for this measurement. In Fig. 4.28, we noted that the output power does not saturate for $V_{RTPS} = 0\,\text{V}$ and $V_{RTPS} = 0.8\,\text{V}$ for maximum VGA gain setting. Therefore, it can be expected that in this measurement, where the VGA operates at reduced gain, the output power at these RTPS control voltages will be further away from the saturated value. We expect that if the input power at 63 GHz required to drive the power mixer into saturation across RTPS settings with reduced VGA gain can be generated, the measured 4 dB variation in output third harmonic power with $V_{RTPS}$ will reduce to become closer to the simulated 2.5 dB variation. A second chip was measured using the SHMD-based configuration of Fig. 4.26(b) and it also confirms the variation in output power with relative input phase and shows a maximum output power of $-13\,\mathrm{dBm}$.

The measured dc power consumption of the chip is dominated by the three fundamental amplifier chains. The total power consumption of the implemented 180-200 GHz power mixer when peak output power is delivered at 189 GHz is 967 mW, of which 945 mW is consumed in the fundamental amplification circuits, 14 mW in the frequency doubler and the remaining 8 mW in the power mixer. The area of the chip is also dominated by the fundamental passive and amplification circuits. The area can be reduced through the use of a more scaled CMOS technology, which would reduce the number of amplification stages, through the implementation of differential amplifiers with differential spiral based matching networks, and by replacing the Marchand balun with a spiral transformer-based balun [73].
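The power breakdown above also fixes the dc-to-mm-wave efficiency. The sketch below converts the output powers reported in this chapter from dBm to mW and divides by the total dc power. Applying the measured 967 mW to the simulated $-7\,\mathrm{dBm}$ case is an assumption made here for illustration; the function names are hypothetical.

```python
def dbm_to_mw(p_dbm: float) -> float:
    """Convert a power in dBm to milliwatts."""
    return 10.0 ** (p_dbm / 10.0)

def efficiency_percent(p_out_dbm: float, p_dc_mw: float) -> float:
    """dc-to-RF efficiency in percent."""
    return 100.0 * dbm_to_mw(p_out_dbm) / p_dc_mw

# Breakdown from the text: amplifier chains, doubler, power mixer (mW).
p_dc_mw = 945 + 14 + 8

measured_eff = efficiency_percent(-13.0, p_dc_mw)  # measured peak output power
simulated_eff = efficiency_percent(-7.0, p_dc_mw)  # simulated, same dc power assumed
print(f"dc power = {p_dc_mw} mW, measured = {measured_eff:.4f} %, "
      f"simulated = {simulated_eff:.4f} %")
```

Under that assumption the simulated case comes out near 0.02%, in line with the efficiency quoted in the conclusion, and the measured case near 0.005%.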



Figure 4.30: Comparison of the output power of the 180-200 GHz power mixer with other signal sources at the same $130\,\text{nm}$ CMOS technology node.

# 4.4 Conclusion

A technique for enhancing harmonic current generated by the device transconductance through engineering the harmonic content of the device voltage swings has been presented. By mixing the first and second harmonic signals, the power mixer can generate  $4\times$  more third harmonic current or  $16\times$  more third harmonic output power for the same fundamental to third harmonic conversion loss than a conventional tripler.

A prototype  $180 - 200 \,\text{GHz}$  power mixer was implemented in a  $130 \,\text{nm}$  CMOS process with an  $f_{max}$  of  $135 \,\text{GHz}$  [47]. It generates  $-13 \,\text{dBm}$  output power at  $189 \,\text{GHz}$ . The comparison with other  $130 \,\text{nm}$  CMOS sources is represented graphically in Fig. 4.30. Even with EM modeling errors in the amplifier chain as discussed in Section 4.2, the output power is the highest in this frequency range in this technology. It should also be noted that the  $130 \,\text{nm}$  CMOS oscillator-based sources in [47] and [69] achieve  $4 - 7 \,\text{dB}$  lower output power as they do not leverage the nonlinearity engineering techniques presented here. Power combining multiple such sources would require synchronization of the oscillators and would result in degradation of their dc-mm-wave conversion efficiency due to the loss of the power combiners.

A comparison with other works is shown in Table 4.1. In [2],  $f_{out}/f_T$  is shown as a relevant metric for comparing the output power of frequency multipliers across technology nodes, and a corresponding column has been included. Signal generation beyond  $f_{max}$  has been pursued using different classes of circuits - oscillators with harmonic extraction, frequency multipliers, radiating harmonic oscillator arrays and radiating multiplier arrays. Stand-alone oscillators with harmonic extraction typically exhibit the highest efficiency for the same  $f_{out}/f_T$  since they avoid multi-stage fundamental-frequency amplifiers, which are required as drivers in frequency multipliers and in radiating oscillator arrays for synchronization. Indeed, in the literature, harmonic oscillator arrays have efficiencies comparable to multiplier arrays. Harmonic-oscillator-based sources also impose restrictions at the transceiver and system level. The simultaneous optimization of phase noise, frequency tuning and output harmonic enhancement while maintaining high dc-mm-wave efficiency in these sources is a challenging problem. In frequency multipliers and the power mixer, the fundamental VCO can be optimized for frequency tuning and phase noise. This potential advantage of multipliers for high-mm-wave signal generation has been noted in the literature in [73], [96] and [97]. In [96], the authors demonstrate a 5 dB improvement in phase noise of a VCO driving a frequency doubler compared to a push-push implementation of the same VCO. A tunable low-frequency VCO is also easier to lock in a phase-locked loop. Signal path phase control, and particularly amplitude control, for phased arrays is also rendered challenging in oscillator-based sources.

When compared with the other works across technology nodes in Table 4.1 with  $f_{out}/f_T > 2$ , the measured output power of -13 dBm is among the highest when frequency multipliers, multiplier arrays and harmonic oscillator arrays are considered (e.g. [98] and [78]), and output power per element of an array is used. The simulated output power of -7 dBm, which can be potentially achieved in the absence of frequency mismatches in the driving amplifiers, would be the highest including harmonic oscillators, and would be higher than the output power per element of oscillator arrays and multiplier arrays with a comparable  $f_{out}/f_T$  by 5-6dB (consistent with our theory). The simulated efficiency of 0.02%, in the absence of frequency mismatch, is also comparable to the efficiency-trend of frequency multipliers, multiplier arrays and harmonic oscillator arrays with  $f_{out}/f_T > 2$  when the difference in  $f_{out}/f_T$  is considered (as predicted by our theory). These conclusions on power and efficiency are depicted in Fig. 4.4.
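The efficiency figures quoted for the power mixer can be reproduced from the table entries with a short calculation. The sketch below is illustrative only: it assumes the efficiency definition $P_{out}/(P_{dc}+P_{in})$ from the Table 4.1 header, and it infers $P_{in}$ from the quoted output power and peak conversion gain, which is an assumption because the two need not occur at the same operating point. The function names are hypothetical.

```python
import math


def dbm_to_mw(p_dbm: float) -> float:
    """Convert a power level in dBm to milliwatts."""
    return 10.0 ** (p_dbm / 10.0)


def efficiency_percent(p_out_dbm: float, p_dc_mw: float, gain_db=None) -> float:
    """Return 100 * P_out / (P_dc + P_in).

    If gain_db is given, P_in is inferred as P_out - gain (in dB). This assumes
    the quoted conversion gain applies at the quoted output power.
    If gain_db is None, P_in is taken as zero.
    """
    p_out_mw = dbm_to_mw(p_out_dbm)
    p_in_mw = dbm_to_mw(p_out_dbm - gain_db) if gain_db is not None else 0.0
    return 100.0 * p_out_mw / (p_dc_mw + p_in_mw)


# Entries taken from the "Meas." and "Sim." rows of Table 4.1.
measured = efficiency_percent(-13.0, 967.0, gain_db=-20.5)
simulated = efficiency_percent(-7.0, 1081.0, gain_db=-14.5)

print(f"measured efficiency:  {measured:.4f} %")   # about 0.005 %
print(f"simulated efficiency: {simulated:.4f} %")  # about 0.02 %
```

Because the dc power is two orders of magnitude larger than the inferred input power, the result is insensitive to the assumption made about $P_{in}$.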

The concept was presented in the context of the third harmonic but can be extended to other harmonics as well. Other circuit topologies for engineering the device waveforms to enhance the extracted harmonic power present an interesting avenue for further investigation.

Table 4.1: Recent CMOS and SiGe Sources beyond 150 GHz

| Ref.  | Tech. (CMOS) | $f_{in}/f_{osc}$ | $f_{out}$      | $\frac{f_{out}}{f_T}$ | BW/TR | $P_{sat}$ / $P_{sat}$ per elem. | Peak CG | $P_{dc}$ | $\frac{P_{out}}{P_{dc} + P_{in}}$ | Area               | Notes                                        |
|-------|--------------|------------------|----------------|-----------------------|-------|---------------------------------|---------|----------|-----------------------------------|--------------------|----------------------------------------------|
| Units | nm           | $\mathrm{GHz}$   | $\mathrm{GHz}$ |                       | %     | $\mathrm{dBm}$                  | dB      | mW       | %                                 | $\mathrm{mm}^2$    |                                              |
| Meas. | 130               | 63                  | 189            | 2.7                   | 5.3           | -13                     | -20.5 | 967      | 0.005                             | 2.4×1.1            | V-band PA+Active Doub.                       |
| Sim.  | 130               | 60                  | 180            | 2.6                   | 6.1           | -7                      | -14.5 | 1081     | 0.02                              |                    | +Power Mixer (Our work)                      |
|       |                   |                     |                |                       |               |                         |       |          |                                   |                    | Mult. and Mult. Arrays                       |
| [98]  | $45^{\mathrm{i}}$ | 105                 | 420            | 2.1                   | 10            | -3/-12                  | -15   | 700      | 0.07                              | $2.7 \times 3.8$   | PA+Act. Quad.+2×4 Arr.  •                    |
| [71]  | 65                | 122                 | 244            | 1.2                   | 7.8           | -6.6                    | -11.4 | 40       | 0.51                              | $0.2 \times 0.25$  | Active Doub.*                                |
| [99]  | $45^{\mathrm{i}}$ | 85                  | 170            | 0.9                   | 8             | 3.4/-2.6                | -14.6 | 267      | 0.82                              | $2 \times 2.9$     | PA+Act. Doub.+2×2 Arr. •                     |
|       |                   |                     |                |                       |               |                         |       |          |                                   |                    | Harm. Osc. Arrays •†                         |
| [78]  | 65                | 84.5                | 338            | 1.7                   | 2.1           | -0.9/-12.9              | -     | 1540     | 0.05                              | $2 \times 1.95$    | 4×4 Array <sup>¶</sup>                       |
| [45]  | $45^{\mathrm{i}}$ | 145.5               | 291            | 1.5                   | -             | -10.9/-16.9             | -     | 74.8     | 0.11                              | $0.8 \times 0.8$   | $2\times2$ Distributed Act. Rad.             |
| [46]  | $45^{\rm i}$      | 140                 | 280.5          | 1.4                   | 3.2           | -7.2/-19.2              | -     | 817      | 0.02                              | $2.7{\times}2.7$   | 4×4 Distributed Act. Rad.                    |
| [81]  | 65                | 130                 | 260            | 1.3                   | $1.4^{\circ}$ | 4.1/-4.9                | -     | 800      | 0.33                              | $1.5 \times 1.5$   | $4\times2$ Array                             |
|       |                   |                     |                |                       |               |                         |       |          |                                   |                    | Harmonic Oscillators †                       |
| [47]  | 130               | 85.3                | 256            | 3.6                   | -             | -17                     | -     | 71       | 0.03                              | $0.2 \times 0.26$  | ‡                                            |
| [69]  | 130               | 96                  | 192            | 2.7                   | -             | -20                     | -     | 16.5     | 0.06                              | $0.45 \times 0.39$ | 9                                            |
| [47]  | 65                | 160.7               | 482            | 2.4                   | -             | -7.9                    | -     | 61       | 0.26                              | $0.2 \times 0.11$  |                                              |
| [49]  | 90                | 76                  | 228            | 1.8                   | 10.8          | -6.2                    | -     | 86.4     | 0.27                              | $0.39 \times 0.4$  |                                              |
| [100] | 90                | 72.3                | 217            | 1.7                   | 7.8           | -4                      | -     | 128      | 0.31                              | $0.9 \times 0.59$  | includes antenna                             |
| [70]  | $45^{\rm i}$      | 158                 | 316            | 1.6                   | -             | -21                     | -     | 46.4     | 0.02                              | $0.75 \times 0.4$  | 5                                            |
| [101] | 65                | 97.7                | 293            | 1.5                   | 5.7           | -2.7                    | -     | 19.2     | 2.8                               | $0.11\times0.5$    | includes antenna $\P$                        |
| [82]  | 65                | 96                  | 288            | 1.4                   | -             | -1.5                    | -     | 275      | 0.26                              | $0.65 \times 0.5$  | includes antenna                             |
| [70]  | $45^{i}$          | 108                 | 216            | 1.1                   | -             | -14.4                   | -     | 57.5     | 0.06                              | $0.83 \times 0.6$  |                                              |

 $<sup>^{\</sup>mathrm{i}}$  This is a silicon-on-insulator technology.  $^{*}$  Drivers not implemented, and will reduce efficiency.

<sup>•</sup> For arrays, sum of power available at all antenna (assumes ideal spatial power combining) is used to calculate efficiency.  $P_{sat}$  per element is also reported.

 $<sup>^\</sup>dagger {\rm For}$  oscillators, the efficiency is  $P_{out}/P_{DC}.$   $^{\circ} {\rm Oscillator}$  tuning range.

<sup>¶</sup>Reported efficiency is  $P_{radiated}/P_{DC}$  as antenna radiation efficiency has not been reported in these works.

For works with antenna, power available at the antenna has been reported.

<sup>&</sup>lt;sup>‡</sup>This oscillator is biased through the output pad using the probe bias-T.

<sup>§</sup> For the multipliers driven by a VCO, the reported efficiency is  $P_{out}/P_{DC}$ .

| Ref.  | Tech. (SiGe) | $f_{in}/f_{osc}$ | $f_{out}$      | $\frac{f_{out}}{f_T}$ | BW/TR | $P_{sat}$ / $P_{sat}$ per elem. | Peak CG | $P_{dc}$ | $\frac{P_{out}}{P_{dc} + P_{in}}$ | Area               | Notes                      |
|-------|--------------|------------------|----------------|-----------------------|-------|---------------------------------|---------|----------|-----------------------------------|--------------------|----------------------------|
| Units | nm           | $\mathrm{GHz}$   | $\mathrm{GHz}$ |                       | %     | $\mathrm{dBm}$                  | dB      | mW       | %                                 | $\mathrm{mm}^2$    |                            |
| [97]  | 130          | 162.5            | 325            | 1.3                   | 6.3   | -1                              | 6       | 420      | 0.12                              | $1.2 \times 0.43$  | Active Doubler+PA          |
| [97]  | 130          | 18               | 322.5          | 1.3                   | 3.4   | -3                              | -2      | 1617     | 0.03                              | $2.2 \times 0.43$  | 2 Act. Trip.+PA+Act. Doub. |
| [96]  | 120             | 150            | 300            | 0.99                  | 7.7  | -1.7                    | N/A  | 167      | 0.404                             | N/A                | VCO+Buffer+Act. Doub. $\S$ |
| [102] | 90              | 111.3          | 222.5          | 0.76                  | 20.2 | 2                       | -15  | 35       | 1.86                              | $0.56 \times 0.44$ | 4Active Doubler*           |

Table 4.2: Recent CMOS and SiGe Sources beyond 150 GHz (continued)

<sup>§</sup> For the multipliers driven by a VCO, the reported efficiency is  $P_{out}/P_{DC}$ .



Figure 4.31: Visual summary of Table 4.1. Even with frequency mismatch in the driver, the power mixer has one of the highest output powers across technology nodes for  $f_{out}/f_T > 2$  amongst multipliers and oscillator arrays (per-element power). The mismatch can be corrected in a re-spin by using updated EM models for the drivers to achieve an efficiency comparable to the efficiency trend of multipliers and multiplier arrays.


## Chapter 5

## Low Noise and Low Spur RF PLL: Reference-Sampling PLL

This chapter focuses on high purity frequency synthesizers in CMOS. As discussed in Chapter 1, CMOS lags heavily behind SiGe as an intrinsically low noise technology, but is able to meet the daunting specifications of evolving standards by leveraging its heavy integration and reliability for noise-minimizing and noise-canceling architectures. Two state-of-the-art approaches for low noise PLLs are discussed, and a new simple architecture which matches their jitter figure-of-merit is presented.

## 5.1 Review

This section reviews sub-sampling PLLs and injection-locked multipliers with large integer multiplication ratios. In recent literature, these two techniques have demonstrated the best jitter figures of merit, with the former also demonstrating one of the lowest phase noise levels at close-in offsets from the carrier. While these techniques have recently also been extended to fractional-N PLLs and all-digital PLLs, that discussion is beyond the scope of this thesis, which is limited here to integer-N PLLs.



Figure 5.1: Conventional Type-II Second Order PLL.

## 5.1.1 Conventional Type-II Second Order PLLs

In conventional PLLs using a phase-frequency detector (PFD), as in Fig. 5.1, the phase error between reference and VCO edges is detected in time-domain and converted to a corrective control voltage for the VCO through a charge pump.

#### Frequency Acquisition and Tracking

The PFD has a phase error-to-voltage gain which is monotonic for all phase errors, that is, over  $\pm \pi$  of the reference phase range. For this reason, it can achieve both phase and frequency lock and does not need a separate loop to help with frequency acquisition.

While acquisition is a nonlinear process, a rule of thumb [103] is that the acquisition range is roughly equal to the loop bandwidth. After lock is acquired, slow frequency drifts may still occur due to temperature variation and similar effects, and these are corrected by the Type-II loop so that the static phase error is ideally maintained at zero.

#### Low Noise Performance

The phase transfer functions from the in-band components to the VCO output are shown below

$$H_{n,i} = \frac{L(s)}{1 + L(s)} \times \frac{1}{\beta_i} \tag{5.1}$$

where L(s) is the PLL open loop gain,  $H_{n,i}$  is the transfer function<sup>1</sup> from component i to the output of the PLL, and  $\beta_i$  is the feedback factor from the loop output to the i<sup>th</sup> component output.

$$L(s) = \frac{K_{PD}}{N} \cdot \frac{(1 + sRC)}{sC} \cdot \frac{K_{VCO}}{s}, \quad \text{where } K_{PD} = \frac{I_{CP}}{2\pi} \tag{5.2}$$

<sup>1</sup> For small-signal noise, this is also the noise transfer function.

Here,  $K_{PD}$  is the phase detector gain estimated by averaging the pulse current generated by the charge pump over a reference cycle, and N is the multiplication ratio between the reference and the VCO being phase locked to it.  $I_{CP}$  is the charge pump current feeding onto the loop filter capacitance C through the stabilizing resistor R.

The different feedback factors are <sup>2</sup>

$$\beta_{ref+buffer} = \frac{1}{N} \tag{5.3}$$

$$\beta_{PD/CP} = \frac{K_{PD}}{N} \tag{5.4}$$

$$\beta_{div} = \frac{1}{N} \tag{5.5}$$

$$\beta_{VCO} = L(s) \tag{5.6}$$

Assuming that  $|L(s)| \gg 1$  inside the closed-loop bandwidth ( $\omega_{BW} \approx \omega_u \approx \frac{K_{PD}K_{VCO}R}{N}$ , where  $\omega_u$  is the open-loop unity-gain frequency)<sup>3</sup>, the resultant phase transfer function from in-band components to the PLL output is just  $\frac{1}{\beta_i}$ . The phase noise at the PLL output from different components is then

$$S_{\phi n, out/ref + buffer}(f) = N^2 \times S_{\phi n, ref + buffer}(f) \tag{5.7}$$

$$S_{\phi n, out/CP}(f) = \frac{N^2}{K_{PD}^2} \cdot 2S_{in, CP} \cdot \frac{\tau_{PFD}}{T_{ref}} \quad \text{for thermal noise} \tag{5.8}$$

$$S_{\phi n, out/div}(f) = N^2 \times S_{\phi n, div}(f) \tag{5.9}$$

Here,  $2S_{in,CP}$  is the noise from the UP and DOWN (DN) MOS current sources <sup>4</sup>,  $\tau_{PFD}$  is the width of the UP or DN pulse from the charge pump at lock <sup>5</sup>, and  $T_{ref}$  is the reference period.

Outside the bandwidth ( $|L(s)| < 1$ ), the noise contribution from loop components rolls off with the loop gain, since  $H_{n,i} \approx L(s) \times \frac{1}{\beta_i}$ . So, the VCO noise appears directly as

$$S_{\phi n, out/VCO}(f) = 1 \times S_{\phi n, VCO}(f) \tag{5.10}$$
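The in-band and out-of-band approximations above can be checked numerically. The following is a minimal sketch using Eq. (5.1) and Eq. (5.2); all component values are hypothetical and chosen only so that the stabilizing zero lies well below the unity-gain frequency. The VCO path is modeled with the standard closed-loop high-pass response $1/(1+L(s))$, which is an assumption of this sketch rather than a formula quoted from the text.

```python
import math


def loop_gain(s, i_cp, k_vco, r, c, n):
    """Open-loop gain L(s) of Eq. (5.2), with K_PD = I_CP / (2*pi)."""
    k_pd = i_cp / (2 * math.pi)
    return (k_pd / n) * (1 + s * r * c) / (s * c) * (k_vco / s)


# Hypothetical loop parameters, chosen only for illustration.
LOOP = dict(
    i_cp=100e-6,               # charge pump current, A
    k_vco=2 * math.pi * 50e6,  # VCO gain, rad/s per V
    r=30e3,                    # stabilizing resistor, ohm
    c=200e-12,                 # loop filter capacitance, F
    n=50,                      # multiplication ratio
)


def unity_gain_estimate(i_cp, k_vco, r, c, n):
    """Approximation w_u ~ K_PD * K_VCO * R / N, valid when 1/(RC) << w_u."""
    return (i_cp / (2 * math.pi)) * k_vco * r / n


def h_ref(w):
    """Reference-path transfer from Eq. (5.1) with beta = 1/N."""
    l = loop_gain(1j * w, **LOOP)
    return l / (1 + l) * LOOP["n"]


def h_vco(w):
    """VCO-path transfer 1 / (1 + L), approaching 1 outside the bandwidth."""
    l = loop_gain(1j * w, **LOOP)
    return 1 / (1 + l)


w_u = unity_gain_estimate(**LOOP)
w_zero = 1 / (LOOP["r"] * LOOP["c"])

print(f"estimated w_u      : {w_u:.3e} rad/s")
print(f"stabilizing zero   : {w_zero:.3e} rad/s")
print(f"|L(j w_u)|         : {abs(loop_gain(1j * w_u, **LOOP)):.4f}")
print(f"|H_ref| in band    : {abs(h_ref(w_u / 100)):.2f} (N = {LOOP['n']})")
print(f"|H_vco| out of band: {abs(h_vco(100 * w_u)):.4f}")
```

With these values the loop gain magnitude at the estimated unity-gain frequency is close to one, the in-band reference transfer is close to N, and the out-of-band VCO transfer is close to one.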

<sup>&</sup>lt;sup>2</sup>While there is also noise from loop filter resistance and the digital logic in the the PFD they are not necessary for the discussion here.

<sup>&</sup>lt;sup>3</sup>This is true if the stabilizing zero  $\frac{1}{RC}$  is much less that the open loop unity gain frequency

<sup>&</sup>lt;sup>4</sup>In [104],  $2S_{in,CP}$  is approximated as  $8kT\gamma g_m$  from a single MOS-based current source in saturation, and no noise from the MOS-based UP/DN switches.

<sup>&</sup>lt;sup>5</sup>Ideally, this is zero width, but due to mismatches in the loop, a static phase error leading to a finite pulse width can exist.



Figure 5.2: (a) Sub-sampling phase detector. (b) Timing Diagram.

The in-band noise is dominated by loop components including the reference buffer, phase detector, charge pump and divider, whose noise is multiplied up by a factor of  $N^2$ , while the out-of-band performance is determined by the VCO. For the latter, recent literature has many works that minimize phase noise for a given power, achieving the theoretical limit of the phase-noise-and-power FoM [105–114]. Gao et al. in [104] have proposed a sub-sampling PLL which minimizes the noise contribution from the loop components and achieves the best integrated jitter for a given DC power consumption, and the lowest in-band phase noise of any demonstrated PLL architecture.

#### Spur Performance

Due to mismatch between the UP and DN current sources in the charge pump, a static phase error is introduced to equalize the positive and negative charge deposited on the loop filter capacitance during a reference cycle. At lock, a short pulse of width  $\tau_{PFD}$  (ideally  $\tau_{PFD} = 0$ ) is generated from the PFD each reference cycle and can introduce significant spurs on the control voltage, which then get upconverted around the VCO frequency through the varactors.

Typically, to avoid discrete time effects in PLLs, the bandwidth is kept at  $\omega_{ref}/10$  [103]. To suppress spurs, conventional Type-II PLLs have a bandwidth of  $\omega_{ref}/20$  which increases settling time.

## 5.1.2 Sub-sampling PLLs

The subsampling PLL (SSPLL) estimates the phase error in the voltage domain by sampling the VCO sinewave with the buffered reference. At any falling reference edge (t = 0), the oscillator voltage is  $A_{VCO}\sin(\Delta\phi)$ , where  $\Delta\phi$  is the phase error. This is shown in Fig. 5.2, where the voltage of the VCO output at the reference falling edge is sampled onto the capacitance  $C_{samp}$<sup>6</sup>. The resultant subsampling phase detector (SSPD) profile is shown in Fig. 5.3. The phase detector has an acquisition range where the profile is monotonic, roughly from  $-\pi/2$  to  $\pi/2$  of oscillator phase. The subsampling phase detector samples every N VCO cycles, or once every reference cycle, and does not require a divider, thereby eliminating one source of noise. The complete sub-sampling PLL is shown in Fig. 5.4.

Figure 5.3: Sub-sampling phase detector + charge pump profile and comparison with conventional PFD + charge pump. The SSPD has higher gain and restricted monotonicity.
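The restricted monotonic range of the SSPD follows directly from the sampled voltage $A_{VCO}\sin(\Delta\phi)$. The sketch below evaluates that voltage and its local slope; the amplitude value and the function names are hypothetical and used only for illustration.

```python
import math


def sspd_output(delta_phi: float, a_vco: float) -> float:
    """Voltage sampled at the reference edge: A_VCO * sin(delta_phi)."""
    return a_vco * math.sin(delta_phi)


def sspd_gain(delta_phi: float, a_vco: float) -> float:
    """Local slope dV/d(delta_phi) = A_VCO * cos(delta_phi), in V/rad."""
    return a_vco * math.cos(delta_phi)


A_VCO = 0.4  # hypothetical VCO amplitude in volts

for deg in (-135, -90, -45, 0, 45, 90, 135):
    phi = math.radians(deg)
    v = sspd_output(phi, A_VCO)
    k = sspd_gain(phi, A_VCO)
    region = "monotonic" if abs(deg) < 90 else "outside monotonic range"
    print(f"{deg:5d} deg: V = {v:+.3f} V, slope = {k:+.3f} V/rad ({region})")
```

The slope equals $A_{VCO}$ at zero phase error, falls to zero at $\pm\pi/2$ of oscillator phase, and changes sign beyond that, so the feedback polarity flips once the phase error leaves the monotonic range.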

#### Frequency Acquisition and Tracking

The absence of a frequency divider implies that a separate mechanism for frequency acquisition is required, as without a divider the SSPLL will lock to any integer multiple of the reference frequency. Another pressing reason arises from the limited monotonicity of the phase detector profile, as the output of the phase detector cycles across the extended profile during frequency acquisition, as shown in Fig. 5.5.

In [104] there is an additional conventional Type-II PLL used as a frequency acquisition loop (FAL). The loop has a PFD<sup>7</sup> and charge pump (CP) that adds its correction current to the same loop filter capacitance as the SSPLL transconductance, providing the control voltage for the VCO varactor. The PFD has a dead-zone, so that the loop is only activated when the SSPLL falls out of frequency lock and the phase error exceeds a certain limit. Once the PLL acquires frequency lock, the task of locking the phase falls to the SSPLL. The type-II nature of the SSPLL itself ensures that lock is acquired with zero static phase error.

<sup>6</sup> In [104], the unity gain buffer and the second switch are implemented implicitly through a transconductance and pulser combination. The hold capacitance is the loop filter capacitance.

<sup>7</sup> Monotonic from  $-\pi$  to  $+\pi$  of reference phase.

Figure 5.4: Sub-sampling PLL architecture with acquisition aid.

Figure 5.5: Acquisition process in SSPLL and conventional PLL. The red dot denoting the instantaneous phase error drifts across the phase-detector profile as the VCO frequency varies. Without a separate acquisition loop, the sign of feedback changes repeatedly in an SSPLL, while it remains the same for the conventional PLL.

After lock, any further small disturbances in phase alone are quickly tracked and eliminated by the SSPLL. There are two types of disturbances in frequency that can occur after lock - slow drifts in frequency due to effects like temperature variation, and sudden large disturbances that immediately force the loop out of lock. The former manifest as a slow drift in phase and are corrected by the SSPLL itself with its limited acquisition range <sup>8</sup>, such that the loop remains at lock with the zero static phase error of Type-II loops. However, sudden large disturbances will cause the loop to fall out of lock, and the FAL must kick in to help re-acquire. In [104], the FAL is switched off after the initial frequency acquisition to save power. This means the loop in [104] remains susceptible to large disturbances.

It should be noted that in the event of large disturbances in frequency, even if the FAL were to remain on, the re-acquisition process is slow as the phase error must first increase to  $\pm \pi/2$  of reference phase for the FAL to kick in. A technique for linearizing the SSPD profile and eliminating the dead-zone based PFD is discussed in [115]. The authors are able to re-acquire lock quicker than the approach in [104] when there is a large step disturbance in frequency. This is done without incurring a large DC power penalty and while achieving almost the same jitter-DC power FOM<sub>j</sub>.

#### Low Noise Performance

The SSPLL has drastically better noise performance than a conventional type-II PLL. The divider noise is obviously eliminated. There are other techniques, such as [116] which also eliminate dividers and their noise. The latter however still contains a virtual division by N, and so the noise of loop components other than the VCO is still multiplied by  $N^2$ . As shown below, for the SSPLL, this multiplication is eliminated for everything but the reference source and reference buffer noise.

The loop gain is

$$L(s) = K_{PD} \cdot g_m \cdot \frac{(1+sRC)}{sC} \cdot \frac{K_{VCO}}{s}, \quad \text{where } K_{PD} = A_{VCO} \tag{5.11}$$

<sup>8</sup> The phase error from slow frequency drifts appears slowly and is corrected before it can drift out of the monotonic range of the SSPD.

Here, L(s) is the loop gain,  $K_{PD}$  is the phase detector gain as derived in Fig. 5.3,  $g_m$  is the transconductance converting the sample voltage with phase error information into a current fed onto the loop filter capacitance C through stabilizing resistor R.

$$\beta_{ref} = \frac{\phi_{ref}}{\phi_{VCO}} = \frac{1}{N}$$

$$\beta_{SSPD} = \frac{V_{SSPD}}{\phi_{VCO}} = A_{VCO} \text{ or, indeed, } K_{PD}$$

$$\beta_{CP} = g_m A_{VCO}$$

$$\beta_{VCO} = L(s)$$

It is clear that, unlike the conventional PFD-based type-II PLL, the subsampling PLL only multiplies the reference and reference buffer noise by  $N^2$ . The PLL achieves one of the best jitter figures of merit for a given DC power consumption, with FOM<sub>j</sub> = -248 dB.
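For readers who want to compare FOM<sub>j</sub> values, the sketch below uses the definition commonly adopted in the PLL literature, $\text{FOM}_j = 10\log_{10}\left[(\sigma_t / 1\,\text{s})^2 \cdot (P_{DC} / 1\,\text{mW})\right]$. This definition is assumed here because it is not restated in this section, and the function name and numerical inputs are hypothetical.

```python
import math


def jitter_fom_db(sigma_t_s: float, p_dc_mw: float) -> float:
    """Jitter-power figure of merit in dB.

    Assumed definition: 10*log10((sigma_t / 1 s)^2 * (P_dc / 1 mW)),
    where sigma_t is the rms jitter in seconds and P_dc is in milliwatts.
    More negative values are better.
    """
    return 20.0 * math.log10(sigma_t_s) + 10.0 * math.log10(p_dc_mw)


# Illustrative inputs, not measurements from any cited work.
base = jitter_fom_db(1e-12, 1.0)            # 1 ps rms at 1 mW gives -240 dB
half_jitter = jitter_fom_db(0.5e-12, 1.0)   # halving jitter at the same power
double_power = jitter_fom_db(1e-12, 2.0)    # doubling power at the same jitter

print(f"1 ps, 1 mW   : {base:.1f} dB")
print(f"0.5 ps, 1 mW : {half_jitter:.1f} dB")
print(f"1 ps, 2 mW   : {double_power:.1f} dB")
```

Under this definition, halving the rms jitter at fixed power improves the figure of merit by about 6 dB, while doubling the power at fixed jitter degrades it by about 3 dB.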

#### Spur Performance

At lock, the difference between the UP and DN transconductance currents must necessarily be zero. They are pulsed by the same pulser. As the two current sources are on for the same time, even if there is a mismatch in the UP/DN currents, the mismatch is balanced by the loop by introducing a static phase error at steady state. The spur performance of SSPLL in [104] is still only -46 dBc which is poorer than conventional PLLs even though charge pump mismatch is not a contributing factor in the former. The reason is that even with a buffer, there is sufficient BPSK-like modulation of the VCO tank load at reference frequency between the ON and OFF state of the sampling switch (due to capacitance paths in the buffer) to give strong periodic spur effects.

#### Other notable iterations on subsampling PLLs

As the noise of only the reference path is multiplied by  $N^2$ , a low-noise reference buffer is the dominant source of power consumption in the SSPLL. A very low noise reference crystal with sinewave output was used in [104]. The slow slope of the sinewave makes the reference buffer power consumption very large due to the large short-circuit current. To reduce the time for which short-circuit current flows and hence reduce the power consumption, Gao et al. propose a subsequent iteration in [3], see Fig. 5.6, where they delay the input to the PMOS gate and drive the NMOS and PMOS in a non-overlapping manner. Only the rising edge of the sinewave rectified by the NMOS  $M_n$  matters for the sampling process, so the gate  $G_{small}$  and PMOS  $M_p$  need not be sized for low noise.

Figure 5.6: Reference buffer power consumption reduction [3]. The on times of  $M_n$  and  $M_p$  are offset to reduce short-circuit current.

In this work, the authors have also included a dummy switch with a dummy sampling capacitance, see Fig. 5.7. The VCO is connected to the dummy sampling cap during the OFF-state. This attenuates the BPSK modulation of the VCO tank at reference frequency, in turn reducing the spurs. To further reduce the power consumption, the authors remove the VCO buffer. This helps with the jitter FoM but scales back the improvement in spur reduction due to periodic charge injection from the sampling switch. When sampling starts, the rising edge is not carefully planned, so the VCO value can be different from the steady state sample on the sampling capacitance, leading to charge injection into the VCO tank. Overall, the PLL in [3] has 4 dB better FOM<sub>j</sub> of -252 dB and 10 dB better spur at -56 dBc compared to [104].

In [117], the authors include the dummy path spur suppression mechanism, but retain the VCO buffer to increase isolation of the VCO tank. The authors also show that the load modulation spur is proportional to  $\sqrt{C_{samp}}$  and use a smaller  $C_{samp}$  (the spur is related to the mismatch in  $C_{samp}$  as a fraction of tank capacitance  $C_{tank}$ ). This in turn increases the  $kT/C_{samp}$  noise from the buffer. To further reduce the spur, a method to reduce charge injection from the sampler into the VCO tank through parasitic capacitance in the buffer is proposed, see Fig. 5.8. In steady state the VCO value at the start of the sampling process should be equal to the steady state sample on  $C_{samp}$ . This requires retiming the tracking reference edge as well, so that it aligns with a VCO zero crossing (equal to the steady state sample stored on  $C_{samp}$  at the sampling edge). A DLL is used for this realignment process. The DLL can be low power, as the noise on the tracking edge is not critical <sup>9</sup>. This results in a spur of  $-80 \,\mathrm{dBc}$, but the FoM<sub>j</sub> is higher at  $-244.6 \,\mathrm{dB}$ . The latter is due to the use of a VCO buffer, which is noisy and consumes power, and also the reduced  $C_{samp}$ .

Figure 5.7: Spur reduction technique [3]. Dummy switch and load for the VCO tank to prevent changing tank impedance during sampling.

Figure 5.8: Spur reduction technique [3]. To prevent spurs from periodic charge injection from the sampler into the VCO, there should be no difference in steady state between the VCO voltage at the start of the tracking phase and the sampled value stored on the capacitor. For this, the other reference edge is locked to a VCO zero crossing through a DLL. The noise on this edge is immaterial, so it can be generated using the low power short-circuit eliminating circuit of Fig. 5.6.

In recent literature, the subsampling idea has been extended to all digital PLL approaches [118], [119], ring-VCOs [120], and even to fractional-N PLLs [121], [122].

## 5.1.3 Injection-Locked Clock Multipliers (ILCM)

Injection locking shapes the VCO phase noise like a first-order PLL, and sub-harmonic injection can help lock a noisy VCO to a clean low-frequency reference. The effectiveness of this approach depends on the strength of the injection signal. As the power in the relevant harmonic of the injected signal rolls off with large multiplication ratio N, injection-locked clock multiplier sources had for some time been limited to low multiplication ratios.
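The roll-off of the injected harmonic with N can be illustrated with a simple model. The sketch below assumes the injected signal is an ideal unit-height rectangular pulse train with duty cycle `duty`, which is an assumption made here for illustration and not a description of the injection waveform in any cited work. The function names are hypothetical.

```python
import math


def sinc(x: float) -> float:
    """Normalized sinc: sin(pi*x) / (pi*x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def harmonic_amplitude(n: int, duty: float) -> float:
    """Single-sided amplitude of the n-th harmonic of a unit-height
    rectangular pulse train with the given duty cycle (Fourier series)."""
    return 2.0 * duty * abs(sinc(n * duty))


def harmonic_rolloff_db(n: int, duty: float) -> float:
    """Amplitude of the n-th harmonic relative to the fundamental, in dB."""
    return 20.0 * math.log10(harmonic_amplitude(n, duty) / harmonic_amplitude(1, duty))


DUTY = 0.01  # hypothetical pulse width of 1% of the reference period

for n in (1, 8, 16, 32, 64):
    print(f"N = {n:3d}: relative harmonic level = {harmonic_rolloff_db(n, DUTY):6.2f} dB")
```

In this model the harmonic level stays nearly flat while N times the duty cycle is small and drops as N approaches the inverse of the duty cycle, so a narrower pulse pushes the roll-off to higher N at the cost of a lower absolute amplitude.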

Elkholy et al. in [4] have demonstrated ILCMs with large multiplication ratios by injecting large subharmonic signals. The conventional injection locking analysis is for small injection, so they derive the non-linear behavior of the oscillator under large injection conditions. The exact analysis is not relevant here, and it suffices to present the equivalent phase detector's  $K_{PD}$  profile, as seen in Fig. 5.9. We can see that the profile resembles that of the subsampling PLL and is only monotonic between  $-\pi/2$  and  $+\pi/2$  of the oscillator phase.

#### Frequency Acquisition and Tracking

The complete ILCM loop is shown in Fig. 5.10. The absence of a divider and the limited acquisition range again point to the importance of a Frequency Acquisition Loop (FAL). In [4], to ensure that the ILCM locks to the correct super-harmonic of the reference, a fine-resolution capacitor bank is used instead of a FAL. The loop is therefore still vulnerable, and cannot recover if there is a disturbance that throws it out of lock.

<sup>9</sup> As long as the tracking edge is aligned with a VCO zero-crossing, the duty cycle need not be 50%. This allows the use of the power-saving reference buffer from [3].



Figure 5.9: Profile of equivalent phase detector in ILCM. This is taken from the simulated profile under large thick pulse injection from [4].



Figure 5.10: From [4]. The injection locking path and the DLL are on simultaneously (blue time period). While the Type-I path works, the DLL matches the reference edges of the injection path and the integral path. When the injection path is gated (red time period), the paths in red are connected, and the accumulated phase error due to frequency drift alone is corrected.

In a typical Type-I loop, the static phase error is not zero. There is no charge pump and loop filter capacitance (unlike a Type-II PLL), and the control voltage is adjusted to a value which tracks the VCO frequency as it drifts.  $^{10}$  So while the static phase error will drift with time as the loop stays in lock, this is not a problem for noise performance if the slope of the profile  $K_{PD}$  remains constant at the value designed for optimal noise performance. For the ILCM, slow frequency drifts will be tracked to a certain extent but will cause the locking point to drift closer to the edge of the monotonic region of the phase detector profile, that is, the edge of the injection locking range. The loop then becomes more vulnerable to falling out of lock, and if the drift continues in the same direction, it will eventually fall out of lock. As the slope of the phase detector in the monotonic region is not constant, a drift in the lock point will also reduce  $K_{PD}$  and adversely affect the noise performance.  $^{11}$ 

For this reason, it is necessary to keep the ILCM locked near the ideal locking point in the center of the locking range, and a very low bandwidth frequency tracking loop (integral path) is introduced in addition to the ILCM proportional control path, see Fig. 5.10. The integral path ensures that the static phase error at lock remains zero  $^{12}$  and the proportional path is centered in the middle of the locking range with optimal  $K_{PD}$  for noise performance.

Race Conditions: When using separate paths to lock the same VCO, it is important to ensure that the reference edges for the two loops are not mismatched. Otherwise, a race condition will occur, where the two loops compete to fix each other's error. To avoid this, the injection locking path is gated and turned off periodically in the ILCM. This way the accumulated phase error from frequency drift is not

<sup>10</sup>Actually, a Type-I PLL with an acquisition range of  $\pm \pi$  of reference phase will be able to recover even if it falls out of lock, as long as it is still within the acquisition range.

<sup>11</sup>In reality, Type-I loops are usually specified to lock under certain conditions. The loop can tolerate a limited drift in free-running frequency, and the voltage regulator is designed to ensure that the VCO does not drift out of that range. Similar arguments can be made for temperature control. Recovery from a sudden change in both Type-I and Type-II loops is only possible if the loop is still within the acquisition range. Injection locking has a very narrow range as it is a weak mechanism. In [4], if the FTL is removed, the ILCM will remain in lock only if the voltage regulator maintains the voltage to within 25 mV, which is very small. As seen later, this is not an issue with our Type-I PLL architecture, where the locking mechanism is stronger and is based on explicit phase detection, and the loop can lock and maintain performance across 120 mV, or 8 MHz of frequency variation, without additional assistance (Fig. 5.33).

<sup>12</sup>Assuming no non-idealities such as mismatch.

reset by an injection pulse for the gated cycle, and is corrected by the integral path without simultaneous competition from the proportional path. <sup>13</sup>

This is an important aspect to consider when using separate frequency tracking loops, and is accounted for in the new architecture proposed in this chapter. For the SSPLL, the FAL only runs when the SSPLL is off and does not compete with it.

#### Low Noise Performance

Injection locking removes all in-band components, and directly injects a reference signal into the oscillator. The reference and reference buffer noise is multiplied up by  $N^2$  within the PLL bandwidth. The slow frequency tracking loop has a very low bandwidth and does not contribute much to the in-band noise except at very low offset.

In-band suppression of VCO noise for the same VCO in Type-I PLL is lower (20 dB/dec) than Type-II PLLs (40 dB/dec) <sup>14</sup>, but the elimination of any other noisy loop component and attendant low power consumption means that the  ${\rm FoM}_j = -252\,{\rm dB}$  of the ILCM rivals the subsampling PLL of the previous section.

#### Spur Performance

The periodic gating of the reference injection pulse results in strong sub-harmonic reference spurs. The large periodic injection itself results in large reference spurs. The proposed ILCM has large spurs of  $\approx -40$  dBc.

It is noted that the phase detector and divider are implicit in an ILCM, which means there is no explicit measure of the phase error available outside the oscillator. In the Type I architecture

<sup>13</sup>An additional DLL is used in [4] to match the reference edge of the integral path to that of the proportional path by aligning the former input reference edge with the VCO during the time the integral path is off. This is not to avoid race conditions, but to make sure that the component introduced in the accumulated phase error during the gated cycle (when the integral path is on) due to the phase difference between the references of the two paths is eliminated. This way that component is not fixed periodically by the integral path. Otherwise, while there is no race condition, the loops are still chasing one after the other to correct each other's perceived phase error due to reference edge mismatch.

 $<sup>^{14}</sup>$ While the FTL yields  $\frac{1}{\omega^2}$  open loop gain fall-off and hence 40 dB/dec suppression of VCO noise, this is only true for frequencies below FTL bandwidth. For most of the loop bandwidth VCO noise is suppressed at 20 dB/decade.

proposed in the chapter, an explicit phase detector is used. Based on the phase error, schemes for spur reduction and cancellation have been proposed.

A fractional-N approach of ILCMs has been presented in [123].

# 5.2 New Sampled RF-PLL approach: Reference-Sampling Phase Locked Loop (RSPLL)

Divider-less PLLs, such as the sub-sampling PLL (SSPLL) and the injection-locked clock multiplier (ILCM), substantially reduce loop noise to cross the  $-250\,\mathrm{dB}$  jitter-power figure-of-merit (FoM<sub>j</sub>) barrier. However, there exists a trade-off between  $FoM_j$  and reference spurs in PLLs, although the mechanisms vary across architectures. Narrow PLL bandwidths are necessary for reducing spurs through filtering, but this can conflict with the optimal bandwidth for jitter. In SSPLLs, buffers isolating the VCO from the sub-sampled phase-detector (SSPD) reduce spurs at the expense of noise and power consumption. Smaller sample capacitances in the SSPD reduce spurs generated by mismatch-induced charge sharing, charge injection and tank frequency modulation at the expense of increased kT/C noise. Consequently, the SSPLL of [117] achieves a spur < -80 dBc by using isolation buffers, a small sample capacitance (and another DLL-based technique) but exhibits a FoM<sub>j</sub> of -244.6 dB. In the SSPLL of [3], elimination of this isolation buffer and the use of a larger capacitance results in a better  $FoM_j$  of  $-252 \, dB$  but a spur of  $-56 \, dBc$ . The ILCM in [123] operates with large injection to enable locking to a high multiple of the reference, but this degrades spurs. The absence of noisy loop components yields a very low  $FoM_j$ , but large injection leads to a spur of -43 dBc. Also, ILCMs do not feature explicit phase detectors, limiting optimization of loop dynamics and the use of techniques for spur suppression.

## 5.2.1 Motivation: Low noise and Low Spur

We propose a new divider-less PLL architecture - the reference-sampling PLL (RSPLL) - that combines the best aspects of the SSPLL and the ILCM by (i) merging the sampler clock buffer with the VCO isolation buffer and (ii) eliminating all other noisy loop components to simultaneously achieve low noise and low spur. A  $2.05 - 2.55 \,\text{GHz}$  RSPLL prototype achieves a record FoM<sub>j</sub> of  $-253.5 \,\text{dB}$  among explicit PLLs and a reference spur  $< -67 \,\text{dBc}$ .



#Short-circuit power P<sub>SC</sub> is largely independent of frequency.



Figure 5.11: Basic concept of the Reference-Sampling PLL. It combines the functionality of the power-hungry clock and isolation buffers to eliminate the dual noise penalty of two separate buffers. This helps realize very low jitter for a given power consumption while demonstrating low spur.

The basic premise is outlined in Fig. 5.11. The loop uses a buffered and gated VCO waveform to directly sample the low-noise reference-crystal sine-wave near the reference zero-crossing, as opposed to having a buffered square-wave reference sample the VCO sine-wave as in SSPLLs. This eliminates the large, noisy and power-hungry reference buffer necessary in SSPLLs, as the VCO feedback buffer essentially combines the functionalities of clock buffering for the sampler and isolation of the VCO from the sampler. The noise of this slewing inverter-buffer in the VCO feedback path has  $N^2$  contribution to the PLL output like the reference buffer in an SSPLL or ILCM (see Section 5.2.6). However, it should be emphasized that it does not have higher power consumption due to its  $N \times$  higher frequency of operation, because an inverter driven by a sine-wave is dominated by largely-frequency-independent short-circuit or crowbar current. This improves the spur significantly compared to the VCO-buffer-less SSPLL in [3], and eliminates the duplicate noise penalty of the spur-suppressing VCO buffer and reference buffer in the SSPLL in [117]. Additionally, of multiple samples potentially produced by the VCO edge, only the sample near the reference zero-crossing contains phase error information (Fig. 5.11). Therefore, a sample edge selection circuit (SESCi) terminates sampling after the relevant edge, and in doing so, reduces switching activity in the feedback path to further lower loop power consumption. The penalty of sampling the reference is a virtual division-by-N. The sampler generates a sample  $v_{PD}$  proportional to the phase error between VCO and reference,  $v_{PD} = A_{ref} \sin(\Delta \phi)/N$ , where  $A_{ref}$  is the reference sine-wave amplitude. Consequently, the noise of loop components including and after the phase detector is multiplied by  $N^2$ , unlike SSPLLs. 
However, using Type-I loop dynamics to eliminate the charge pump, realizing essentially a loop with no additional noisy loop components similar to ILCMs, keeps the in-band loop noise, and consequently the  $FoM_j$, very low despite the virtual division by N.

## 5.2.2 Sampled Phase detector (PD)

The phase error can be estimated in the voltage domain and the sampled phase detector (PD) shown in Fig. 5.12 is proposed. Here the VCO sinewave is converted to a square wave using the inverter buffer, and is used to sample the reference sinewave. A logic block, the Sample Edge Selection Circuit (SESCi), discussed later in Section 5.2.3, is used to select the relevant sample from the multiple values sampled onto the capacitor at each VCO edge. Broadly, the VCO edge near the buffered reference edge is selected. As shown in the figure, the noise from the buffered reference inside the SESCi itself does not affect the value sampled on the clean sinewave reference.

#### PD profile

If there is a phase error  $\Delta \phi$  between the VCO and reference, the reference sinewave value sampled by the VCO edge near the buffered reference is

$$v_{PD} = 2A_{ref}\sin(\Delta\phi)/N \tag{5.12}$$

$$v_{PD} \approx 2A_{ref}\Delta\phi/N \quad \text{for small phase errors} \tag{5.13}$$

Here,  $A_{ref}$  is the amplitude of the reference sinewave, and the factor of 2 is from the differential implementation discussed below.

This assumes that the buffered reference edge is itself close to the sinewave zero-crossing. The buffer circuit for satisfying these specifications is discussed in Section 5.3.

The phase detector profile is plotted in Fig. 5.13. Due to the virtual division by N, the proposed phase detector is monotonic over  $\pm \pi$  of the VCO phase unlike SSPLL and ILCM which are only linear in  $\pm \pi/2$  of the VCO phase. As the profile is still not monotonic over the entire reference cycle ( $\pm \pi$  reference phase), an acquisition aid is still needed. The advantage of a very linear profile is that  $K_{PD}$  is almost constant, and the noise performance is independent of the phase error at lock, obviating the need for circuitry that ensures that the loop locks to the center of the lock range (verified in Fig. 5.33).

#### Differential implementation

Further, if the mismatch is low and the loop is Type-II, the static phase error will be close to zero. This means that the samples at lock will be close or equal to the zero crossing of the reference sinewave, and a differential implementation of the phase detector may be used to counter charge injection under steady state condition. A differential input reference is required and is generated from the single-ended crystal reference through an off-chip balun.



Figure 5.12: Proposed sampled phase detector and timing diagram. The VCO is used to evaluate phase error in the loop by sampling voltages on the reference sinewave. The relevant sample pertaining to the phase error is selected using the Sample Edge Selection Circuit (SESCi). The noise of SESCi does not affect the sampled value.



Figure 5.13: Profile of proposed sampled phase detector. The profile is monotonic over  $\pm \pi_{VCO}$, unlike the ILCM and SSPD, which are only monotonic over  $\pm 0.5\pi_{VCO}$.

## Half reference multiplexing

For the sampling phase, a large enough hold capacitance is required to hold the value for half the reference cycle <sup>15</sup>. This in turn will translate to a larger sampling capacitance so that a larger portion of the sampled value moves from the sampling to the hold capacitance during the hold phase. In order to reduce the area, a multiplexing scheme at half the reference rate is proposed, as shown in Fig. 5.14. This avoids a large hold capacitance area.

In this scheme, a half rate 25 MHz signal is generated from the buffered reference (more on this in the SESCi). In each single-ended path, there are two sample capacitances which are muxed to the varactor control voltage. One sample capacitance tracks the signal every alternate reference cycle and presents its sampled value as the control voltage for one reference cycle period. The timing diagram is shown in Fig. 5.14. We discuss later the noise considerations that dictate the size of the sample capacitance. We have used 10 pF sample capacitance in this work, which is large enough to hold the sampled control voltage steady over 20 ns of the reference cycle.

<sup>15</sup>Unlike the SSPLL, there is no implicit pulser + $g_m$  cell S&H.



Figure 5.14: Half rate multiplexing of samples in each differential path. This scheme reduces the area for sample and hold capacitances.

## **Timing Considerations**

In the SSPD a very fast VCO sinewave is sampled for  $0.5/f_{ref}$  time period by the reference square wave. However, the  $R_{on}C_{samp}$  time constant there had to be sufficiently smaller than  $1/f_{VCO}$  to track the oscillator waveform closely and generate a voltage sample proportional to phase error. As SSPD uses a very small sampling capacitance, the switch size can be small, and the reference inverter buffer load remains small. This allows a fast buffering with less noise addition.

In the proposed PD, the sampling time is very small,  $0.5/f_{VCO}$, which is N times lower than the time available for sampling in the SSPD. However, the signal being sampled is N times slower as well. For approximately 86% settling to a step input ($1 - e^{-2}$),  $R_{on}C_{samp}$  need only be half the sampling time <sup>16</sup>. Compared to the VCO, the reference sinewave is slow enough that robust performance is obtained without needing a very large switch size to reduce  $R_{on}C_{samp}$ . Even with a switch larger than in the SSPLL, the power in the driving inverters operating at the VCO frequency can be reduced by gating, as discussed later in this chapter.
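The settling argument above can be checked with a minimal Python sketch of a single-pole RC sampler. The helper names are hypothetical, the 2.21 GHz frequency is the value the text later normalizes to, and the 10 pF capacitance is the value quoted for this work; the resulting on-resistance is only an order-of-magnitude illustration, not a design value from the thesis.

```python
import math

def settled_fraction(t_sample: float, tau: float) -> float:
    """Fraction of a step input reached by a single-pole RC sampler after t_sample."""
    return 1.0 - math.exp(-t_sample / tau)

def max_ron(t_sample: float, c_samp: float, ratio: float = 2.0) -> float:
    """Largest switch on-resistance for which t_sample / (R_on * C_samp) equals ratio."""
    return t_sample / (ratio * c_samp)

f_vco = 2.21e9            # Hz, assumed operating point
t_sample = 0.5 / f_vco    # s, half a VCO period is available for tracking
c_samp = 10e-12           # F, sample capacitance quoted in the text

# A time constant equal to half the sampling window gives 1 - exp(-2) settling.
frac = settled_fraction(t_sample, t_sample / 2.0)
r_on = max_ron(t_sample, c_samp)

print(f"settled fraction with tau = t_sample/2: {frac:.3f}")
print(f"R_on giving that ratio with 10 pF: {r_on:.1f} ohm")
```

Because the reference moves by only a small fraction of its amplitude within one VCO half-period, the step-settling figure is a conservative bound on the tracking error.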

## 5.2.3 Sample Edge Selection Circuit (SESCi)

The LC-VCO has a differential output  $VCO_{+} - VCO_{-}$ . The antiphase signal  $VCO_{-}$  is inverter buffered  $(VCO_{-,buff})$  and used to sample the differential sinewave reference input. This signal, named TRACK, generates several samples, and the idea is to select the sample closest to the zero-crossing of the differential reference  $(v_{PD} = 2A_{ref}\Delta\phi/N)$  through the Sample Edge Selection Circuit (SESCi) shown in Fig. 5.15.

The sample selection process is shown in Fig. 5.15. In the SESCi, the in-phase VCO output  $VCO_{+}$  is ANDed with a reference frequency square wave obtained by buffering the differential reference sinewave. The resultant signal clocks a D flip-flop which generates a rising edge, denoted as SampleEDGE. SampleEDGE is ANDed with the buffered  $VCO_{-}$  signal to gate TRACK and prevent it from transferring any more samples onto the sampling capacitance  $C_{samp}$ .

As the sampling process occurs on the falling edge of  $VCO_{-,buff}$ , and any SampleEDGE can only be generated from the corresponding rising and delay-matched  $VCO_{+,buff}$  edge after some delay (=  $\tau_{AND} + \tau_{DFF}$ ), no sampling process is cut short.

 $<sup>^{16}</sup>$ That is, a ratio of two between the sampling time and the RC time constant is sufficient to track a non-varying input signal



Figure 5.15: Sample Edge Selection Circuit (SESCi) selects the sample relevant to estimating the VCO phase error. The timing diagram shows that the reference buffer edge RefBuff in the SESCi does not contribute noise to the TRACK signal.

The half-rate multiplexing signal that determines which  $C_{samp}$  is visible as the varactor control voltage, is generated from the SampleEDGE signal itself ensuring that multiplexing will only occur after the sampling process is completed. This way sampling transients are not seen on the control voltage. This is also why the noise on the half-rate multiplexing signal is not important.

## 5.2.4 Frequency Tracking Loop

Owing to the limited acquisition range of the proposed proportional loop, a frequency tracking mechanism was also implemented. As discussed in Section 5.1.3, a Type-I loop can track by changing the static phase error, at the expense of noise performance and increased vulnerability to falling out of lock for PLLs using phase detectors with limited and non-linear profiles. Due to the linear nature of the proposed PD, noise performance will not be compromised by a changing static phase error. However, the loop can eventually drift out of lock if the frequency error due to environmental and supply variations is larger than a prescribed value <sup>17</sup>. The integral-correction FTL path addresses this by tracking drift, and maintains lock in the center of the phase detector profile.

## 5.2.5 Proposed PLL Architecture

Based on our discussion, the PLL block diagram controlling an LC-VCO is shown in Fig. 5.16. The LC-VCO has two varactor banks controlled by the main proportional path and the integral FTL path respectively.

Once the initial capacitance bank digital setting is programmed into the LC-VCO, the loop must lock to the closest integral multiple of the reference. In the absence of FTL, the proportional loop will lock with some static phase error. With the FTL, the differential loop filter capacitance develops the correct control voltage to keep static phase error zero. This process is slow due to the low loop bandwidth of the FTL. After this, the FTL tracks slow frequency drifts maintaining lock in the center of the PD profile. As the bandwidth of the FTL is very low, noise behavior is expected to be dominated by the proportional Type-I loop.

<sup>17</sup>The prescribed value is obtained by measurement in Fig. 5.33.



Figure 5.16: Architecture and block diagram of proposed PLL.

Loop gain for the complete loop is  $^{18}$ 

$$L(s) = K_{PD} \left[ \frac{K_{VCO,I}}{s} + \frac{g_m r_o}{(1 + s r_o C_{FTL})} \cdot \frac{K_{VCO,II}}{s} \right] \tag{5.14}$$

 $r_o$ , the output impedance of the transconductance cell should be made large enough, such that the practical integrator pole  $\omega_i = 1/r_o C_{FTL}$  is much smaller than the unity gain frequency  $g_m/C_{FTL}$  of the gm-C integrator. A large  $r_o$  is also essential to prevent  $C_{FTL}$  from discharging too quickly. For frequencies well beyond the very low integrator pole at  $1/r_o C_{FTL}$ ,

$$L(s) = K_{PD} \left[ \frac{K_{VCO,I}}{s} + \frac{g_m}{(sC_{FTL})} \cdot \frac{K_{VCO,II}}{s} \right] \tag{5.15}$$

$$=\frac{K_{PD}K_{VCO,I}}{s^2}\left(s + \frac{g_m K_{VCO,II}}{C_{FTL}K_{VCO,I}}\right) \tag{5.16}$$

The second loop is Type-II and has a very low bandwidth. The overall system is the sum of a first-order Type-I proportional path and a second-order Type-II integral path. To prevent the loop gain from crossing the 0 dB axis at  $40\,\mathrm{dB/decade}$, which risks instability, a zero is required. The sum of the proportional and integral paths yields a zero  $\omega_z$  at  $\frac{g_m K_{VCO,II}}{C_{FTL}K_{VCO,I}}$.
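As an illustration of how the zero location constrains the FTL design, the following Python sketch solves the zero expression for the transconductance and compares the result with the integrator pole $1/r_o C_{FTL}$. All numerical values (`OMEGA_U`, `C_FTL`, `K_VCO_I`, `K_VCO_II`, `R_O`) and the helper names are hypothetical placeholders chosen only to exercise the formulas; they are not taken from the implemented design.

```python
import math

def ftl_zero(g_m, c_ftl, k_vco_i, k_vco_ii):
    """Zero of the summed paths: g_m * K_VCO,II / (C_FTL * K_VCO,I), in rad/s."""
    return g_m * k_vco_ii / (c_ftl * k_vco_i)

def gm_for_zero(omega_z, c_ftl, k_vco_i, k_vco_ii):
    """Transconductance that places the zero at omega_z."""
    return omega_z * c_ftl * k_vco_i / k_vco_ii

# Hypothetical values, for illustration only.
OMEGA_U = 2.0 * math.pi * 1.0e6    # rad/s, assumed unity-gain frequency
C_FTL = 100e-12                    # F, assumed integrator capacitance
K_VCO_I = 2.0 * math.pi * 50e6     # rad/s/V, assumed proportional-path gain
K_VCO_II = 2.0 * math.pi * 50e6    # rad/s/V, assumed integral-path gain
R_O = 10e6                         # ohm, assumed g_m-cell output resistance

omega_z = OMEGA_U / 10.0           # zero placed a decade below omega_u
g_m = gm_for_zero(omega_z, C_FTL, K_VCO_I, K_VCO_II)
omega_i = 1.0 / (R_O * C_FTL)      # practical integrator pole

print(f"required g_m: {g_m * 1e6:.1f} uS")
print(f"zero: {omega_z:.3e} rad/s, integrator pole: {omega_i:.3e} rad/s")
```

With these placeholder numbers the integrator pole sits well below the zero, which is the regime assumed by the high-frequency approximation of the loop gain.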

The loop phase starts at  $-180^{\circ}$  and reaches  $-90^{\circ}$  with the help of the zero, resulting in first order behavior near the unity gain frequency. If the zero is chosen to be sufficiently smaller than the unity gain frequency,  $\omega_u$  is given by

$$|L(j\omega_u)| = \left| \frac{K_{PD} K_{VCO,I}}{(j\omega_u)^2} \left( j\omega_u + \frac{g_m K_{VCO,II}}{C_{FTL} K_{VCO,I}} \right) \right| = 1 \tag{5.17}$$

$$\approx \left| \frac{K_{PD} K_{VCO,I}}{j\omega_u} \right| = 1 \text{ if } \omega_u \gg \omega_z \tag{5.18}$$

$$\omega_u = K_{PD} K_{VCO,I} = \frac{2A_{ref} K_{VCO,I}}{N} \tag{5.19}$$

If  $\omega_z$  is chosen to be a tenth of  $\omega_u$ , the closed loop behaves like a first-order system well beyond the zero. The  $\omega_u$  of the open loop is then also the PLL 3-dB bandwidth.
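The approximation $\omega_u \approx K_{PD}K_{VCO,I}$ can be sanity-checked numerically. The sketch below evaluates the loop gain of Eq. (5.16) at $\omega = K_{PD}K_{VCO,I}$ with the zero placed a decade lower. The value of `K` is an arbitrary assumption, and the zero-order-hold term of the sampled loop is ignored, so this is a continuous-time illustration rather than a model of the implemented loop.

```python
import cmath
import math

def loop_gain(omega, k, omega_z):
    """Open-loop gain of Eq. (5.16) at s = j*omega, with k = K_PD * K_VCO,I."""
    s = 1j * omega
    return k / (s * s) * (s + omega_z)

def phase_margin_deg(omega, k, omega_z):
    """Phase margin in degrees if omega is taken as the crossover frequency."""
    return 180.0 + math.degrees(cmath.phase(loop_gain(omega, k, omega_z)))

K = 2.0 * math.pi * 1.0e6    # rad/s, assumed value of K_PD * K_VCO,I
OMEGA_Z = K / 10.0           # zero a decade below the nominal crossover

mag_at_k = abs(loop_gain(K, K, OMEGA_Z))
pm_at_k = phase_margin_deg(K, K, OMEGA_Z)

print(f"|L| at omega = K_PD*K_VCO,I: {mag_at_k:.4f}")
print(f"phase margin there: {pm_at_k:.2f} deg")
```

The magnitude comes out within about half a percent of unity and the phase margin is roughly 84 degrees, which supports treating the loop as first order near crossover when $\omega_z = \omega_u/10$.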

## 5.2.6 Noise and Power Analysis

In order to understand the advantages of the proposed architecture, we discuss both the noise and power trade-offs. The different sources of noise in the proposed architecture are shown in Fig. 5.17.

<sup>18</sup>There is also a ZOH term for the sampled loop, but it does not affect the performance discussed here.



Figure 5.17: Different sources of noise in the proposed architecture.

The loop gain of the proposed PLL without the Frequency Tracking Loop (FTL) is

$$L(s) = \frac{K_{PD}K_{VCO}}{s} = \frac{2A_{ref}K_{VCO}}{sN} \tag{5.20}$$

 $A_{ref}$  is the single sided reference amplitude, and the factor of 2 is from differential implementation. Unlike the SSPLL, there is a virtual division by the multiplication ratio N. The unity gain bandwidth of the loop, and hence the closed loop PLL bandwidth is  $\omega_{BW} = \frac{2A_{ref}K_{VCO}}{N}$ .

#### Sampled Phase Detector (PD) noise and power

The feedback factor to the PD output from the PLL output is

$$\beta_{PD} = \frac{2A_{ref}}{N} \tag{5.21}$$

The transfer function to the output phase noise from PD output is

$$H_{n,PD} = \frac{\phi_{outn,PD}}{v_{n,PD}} = \frac{L(s)}{1 + L(s)} \frac{1}{\beta_{PD}} \tag{5.22}$$

$$= \frac{1}{\beta_{PD}} \text{ within } \omega_{BW} \tag{5.23}$$

$$= \frac{L(s)}{\beta_{PD}} \text{ falls off as 20 dB per dec outside } \omega_{BW} \tag{5.24}$$

The PSD of voltage noise at the output of the PD is  $\frac{kT}{C_{samp}f_{VCO}} \cdot \frac{f_{VCO}}{f_{ref}}$ <sup>19</sup>, such that the output

<sup>19</sup>PD noise is derived through noise analysis of a sampled system. See [124].

phase noise at an offset  $\Delta f$  from the carrier due to voltage noise from PD will be<sup>20</sup>

$$L_{\phi n, out/PD}(\Delta f) = \frac{1}{2} \cdot \frac{N^2}{4A_{ref}^2} \cdot \frac{2kT}{C_{samp} f_{ref}} \tag{5.25}$$

In addition to the kT/C mean square voltage noise, there is also switch noise  $4kTR_{sw2}$  of the second switch which communicates the sample voltage to the varactor control. This can be very small compared to other noise sources in the loop.

Unlike the subsampled phase detector, the noise of the sampled phase detector is multiplied up by  $N^2$ . So, while the SSPLL can do with a sampling cap as small as  $10 \,\mathrm{fF}$ , the  $N^2$  multiplication of PD noise in the proposed PLL must be compensated by increasing the sample capacitance. This is really a repurposing of the large loop filter capacitance of the SSPLL towards sampling capacitance in the proposed PLL. In this work we use a sample capacitance of  $10 \,\mathrm{pF}$ .
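To make the capacitance trade-off concrete, the following Python sketch evaluates Eq. (5.25) for two sample capacitances. The reference frequency of 50 MHz is inferred from the 20 ns reference period mentioned earlier, `N = 44` is an assumed ratio near 2.21 GHz / 50 MHz, and `A_REF = 0.5` V and `temp_k = 300` K are illustrative assumptions rather than design values.

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K

def pd_phase_noise_dbc(n, a_ref, c_samp, f_ref, temp_k=300.0):
    """In-band output phase noise from PD kT/C noise per Eq. (5.25), in dBc/Hz."""
    linear = (0.5 * n ** 2 / (4.0 * a_ref ** 2)
              * 2.0 * K_BOLTZMANN * temp_k / (c_samp * f_ref))
    return 10.0 * math.log10(linear)

N = 44          # assumed multiplication ratio
A_REF = 0.5     # V, assumed single-sided reference amplitude
F_REF = 50e6    # Hz, inferred from the 20 ns reference period

noise_10pf = pd_phase_noise_dbc(N, A_REF, 10e-12, F_REF)   # capacitance used in this work
noise_10ff = pd_phase_noise_dbc(N, A_REF, 10e-15, F_REF)   # SSPLL-like capacitance

print(f"C_samp = 10 pF: {noise_10pf:.1f} dBc/Hz")
print(f"C_samp = 10 fF: {noise_10ff:.1f} dBc/Hz")
print(f"penalty of the smaller capacitor: {noise_10ff - noise_10pf:.1f} dB")
```

Under these assumptions the thousandfold larger capacitor lowers the PD contribution by 30 dB, which is of the same order as the $N^2$ penalty (about 33 dB for $N = 44$) that it is meant to offset.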

## Charge pump and Divider noise elimination

As mentioned, the in-band noise from loop components is multiplied by  $N^2$ . To get around this and maintain competitive noise performance, we eliminate the charge pump and divider similar to the ILCM. The ILCM gets performance like SSPLL by simply eliminating loop components instead of looking to heavily suppress their noise. Indeed, except for the edge-selection logic, the actual controlling loop of the proposed loop is completely passive.

## VCO Buffer noise and power

The VCO buffers can be categorized as the noisy slewing first stage buffer which converts the VCO sinewave to a sampling clock, and the later hard switching buffers. The power consumption of a slewing buffer is dominated by short circuit power consumption  $P_{SC}$  given as in [125]

$$P_{SC} = t_{SC} I_{SC} V_{dd} f_{out} \tag{5.26}$$

Here,  $t_{SC}$  is the short-circuit time when both PMOS and NMOS are simultaneously on,  $I_{SC}$  is determined by the load capacitance, and  $f_{out}$  is the output frequency of the buffer.  $P_{SC}$  is largely

<sup>20</sup>Phase noise is upconverted to around the carrier through the nonlinearity  $A_{VCO}\sin(\omega_{out}t + \phi_{n,out})$ . The factor of half is because PLL and VCO phase noise is reported as double sided noise.

independent of frequency because the short circuit time is approximately inversely proportional to output frequency.

The power consumption of the hard switching buffers is given by

$$P_{switching} = \alpha_{0 \to 1} f_{out} C_{load} V_{dd}^2 \tag{5.27}$$

where  $C_{load}$  is the total switching inverter load in the feedback path, and  $\alpha_{0\to 1}$  is the switching activity at the output frequency  $f_{out}$ . With increasing output frequency  $f_{out}$ , this can increase drastically and is  $N \times$  higher at  $f_{VCO}$  than  $f_{ref}$ . However, as only one sample is relevant per cycle the hard switching activity can be heavily gated, and in the limit  $min(\alpha_{0\to 1}) = 1/N$ . In this work, we have implemented  $\alpha_{0\to 1} = 0.5$  by gating the hard switching buffers with SampleEDGE in the process of terminating sampling after the relevant edge, as seen in Fig. 5.15.
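The effect of gating on the hard-switching buffer power can be illustrated with a short Python sketch of Eq. (5.27). The load capacitance `C_LOAD` is a hypothetical value and `N = 44` is an assumed multiplication ratio; the 1.2 V supply and 2.21 GHz frequency follow values quoted elsewhere in this chapter.

```python
def switching_power(alpha, f_out, c_load, v_dd):
    """Dynamic power of the hard-switching buffers per Eq. (5.27), in watts."""
    return alpha * f_out * c_load * v_dd ** 2

F_VCO = 2.21e9     # Hz, frequency the text normalizes to
N = 44             # assumed multiplication ratio
C_LOAD = 50e-15    # F, hypothetical total switched load
V_DD = 1.2         # V, buffer supply quoted in the text

p_ungated = switching_power(1.0, F_VCO, C_LOAD, V_DD)      # every VCO edge propagates
p_half = switching_power(0.5, F_VCO, C_LOAD, V_DD)         # activity implemented in this work
p_limit = switching_power(1.0 / N, F_VCO, C_LOAD, V_DD)    # one edge per reference cycle

for label, power in (("ungated", p_ungated), ("alpha = 0.5", p_half), ("alpha = 1/N", p_limit)):
    print(f"{label:12s}: {power * 1e6:7.2f} uW")
```

In the limiting case the switching power falls back to what the same load would dissipate at the reference frequency, so the $N\times$ frequency penalty applies only to edges that are allowed to propagate.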

In the SESCi, we need to delay match  $VCO_{+,buff}$  and  $VCO_{-,buff}$  so that a sample obtained on the falling edge of the latter is selected by the rising edge of the former. If the VCO buffer noise is low enough, these two edges are well matched for the sample edge selection process.  $VCO_{+,buff}$  can be treated as a clean edge, and the single phase  $VCO_{-,buff}$  inverter noise can be multiplied by a factor of two as the noise in the inverter buffers in each differential path are uncorrelated. This is the noise on the TRACK sampling signal from VCO buffering. Therefore,

$$H_{n,VCO\,buff} = \frac{1}{\beta_{VCO\,buff}} = 2 \tag{5.28}$$

$$L_{\phi n, out/VCO \text{ buff}}(\Delta f) = \frac{1}{2} \cdot 4 \cdot S_{\phi n, VCO \text{ buff}}(\Delta f) \tag{5.29}$$

Next, we show that despite the transfer function above the noise of the first-stage VCO buffer in the feedback is multiplied at the output as  $N^2$  similar to the reference buffer in all PLLs.

Phase noise at the output of the VCO buffer is [126]

$$\phi_{n,\text{VCO buff}}^2 = \omega_{out}^2 \Delta t_{n,\text{VCO buff}}^2 \tag{5.30}$$

$$= \omega_{out}^2 \frac{v_{n,\text{VCO buff}}^2}{\text{SL}_{out}^2} \tag{5.31}$$

Here the buffer output frequency is  $\omega_{out} = \omega_{VCO}$ . The output slope  $SL_{out}$  is given as  $Gain_{VCObuff}A_{VCO}\omega_{VCO}$  when the input signal has a small swing. For large swings, for example here  $A_{VCO} = 0.5 \,\text{V}$  with  $V_{dd, \text{VCO buff}} = 1.2\,\text{V}$ , the buffer output will slew at a rate determined by the buffer load capacitance and will be largely independent of frequency.

The phase noise of the buffer edge is sampled at  $f_{VCO}$  when the TRACK edge clocks the sampling phase detector.

$$S_{\phi n, \text{VCO buff}} = \frac{\phi_{n, \text{VCO buff}}^2}{f_{VCO}} \tag{5.32}$$

It is then further sampled at  $f_{ref}$  when only one sample's information is communicated to the VCO varactor control voltage folding the noise by a factor of  $\frac{f_{VCO}}{f_{ref}}$ . The VCO buffer phase noise is therefore multiplied by a factor of  $\frac{1}{f_{VCO}} \cdot \frac{f_{VCO}}{f_{ref}}$  similar to the sampler's capacitance noise. <sup>21</sup>

$$L_{\phi n, out/\text{VCO buff}}(\Delta f) = N^2 \omega_{ref}^2 \frac{v_{n, \text{VCO buff}}^2}{(I/C)^2} \cdot \frac{1}{f_{VCO}} \cdot \frac{f_{VCO}}{f_{ref}} \tag{5.33}$$

$$\propto N^2 \cdot W_{\text{VCO buff}} \cdot f_{ref} \tag{5.34}$$

Essentially, if the VCO buffer is not slewing its output slope will scale with frequency and its noise will not be multiplied by  $N^2$  at the PLL output. However, if the VCO swing is large, the output slope will slew with a slope independent of frequency and buffer noise will appear as  $N^2 \times$  higher at the output.
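The distinction between the two regimes can be illustrated with a small Python sketch based on Eqs. (5.30) and (5.31). The noise voltage, buffer gain, input amplitude and slew rate below are hypothetical values chosen only to show the scaling with operating frequency, and `N = 44` is an assumed multiplication ratio.

```python
import math

def phase_var_small_swing(omega, v_n, gain, a_in):
    """Eq. (5.31) with output slope = gain * a_in * omega (buffer not slewing)."""
    slope = gain * a_in * omega
    return (omega * v_n / slope) ** 2

def phase_var_slewing(omega, v_n, slew_rate):
    """Eq. (5.31) with a fixed output slope I/C set by the load (slewing buffer)."""
    return (omega * v_n / slew_rate) ** 2

# Hypothetical values, for illustration only.
V_N = 1e-3          # V rms, assumed buffer output noise voltage
GAIN = 5.0          # assumed small-signal buffer gain
A_IN = 0.05         # V, assumed small input swing
SLEW = 1e10         # V/s, assumed load-limited output slope I/C
N = 44              # assumed multiplication ratio
OMEGA_REF = 2.0 * math.pi * 50e6
OMEGA_VCO = N * OMEGA_REF

ratio_small = (phase_var_small_swing(OMEGA_VCO, V_N, GAIN, A_IN)
               / phase_var_small_swing(OMEGA_REF, V_N, GAIN, A_IN))
ratio_slewing = (phase_var_slewing(OMEGA_VCO, V_N, SLEW)
                 / phase_var_slewing(OMEGA_REF, V_N, SLEW))

print(f"small-swing buffer, phase variance ratio (f_VCO vs f_ref): {ratio_small:.3f}")
print(f"slewing buffer, phase variance ratio (f_VCO vs f_ref): {ratio_slewing:.1f}")
```

With a frequency-proportional slope the phase variance is unchanged between the two operating frequencies, whereas a fixed slope makes it grow as $N^2$, matching the behavior described in the text.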

The VCO clock-and-isolation buffer is sized to ensure the required noise performance, as discussed further in Section 5.3, and the voltage noise is proportional to the buffer size  $W_{\text{VCO buff}}$ . As it buffers a noisier sinewave (VCO) than the reference buffer in the SSPLL or ILCM, it is significantly smaller. Therefore, despite the  $N^2$  multiplication and differential implementation (factor of 4), the VCO buffer contribution (normalized to 2.21 GHz) is much lower than that of the single reference buffer in the SSPLL. It is dominated by the VCO as discussed next.

<sup>21</sup>The assumption that sampled buffer phase noise is flat over  $\pm f_{VCO}/2$  is only true if it is white. The folding factor of  $f_{VCO}/f_{ref}$  is only valid for white noise, see [127].

#### In-band VCO noise suppression

The transfer function from VCO phase noise to the PLL output is

$$H_{n,VCO} = \frac{L(s)}{1 + L(s)} \frac{1}{\beta_{VCO}} \text{ where } \beta_{VCO} = L(s) \tag{5.35}$$

$$= \frac{1}{L(s)} \text{ which is } 20 \, \text{dB/dec suppression within } \omega_{BW} \tag{5.36}$$

$$= 1 \text{ outside } \omega_{BW} \tag{5.37}$$

The remaining issue is the in-band suppression of VCO noise. Due to the Type-I nature of the loop, the VCO noise dominates the in-band noise of the loop. The SSPLL can suppress it by 40 dB per decade, so its VCO can be noisier than in the ILCM. However, in the Type-I ILCM with only 20 dB per decade in-band suppression of VCO noise, the VCO must be designed for a very good phase-noise-to-power-consumption ratio, so that the integrated jitter  $FoM_j$  is competitive with the SSPLL. In [4], the authors have designed an LC-VCO with a FoM of 193.6 dB. <sup>22</sup> In this work the VCO only has a FoM of 184.6 dB, and it is expected that if this is improved, the integrated jitter will improve significantly.

$$L_{\phi n, out/VCO}(f) = \frac{1}{L(s)} \cdot L_{\phi n, VCO}(f) \tag{5.38}$$
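The 20 dB/dec in-band suppression can be illustrated numerically by evaluating $|1/(1+L)|^2$ for the Type-I loop gain $L(s) = \omega_{BW}/s$ of Eq. (5.20). In the sketch below the 1 MHz loop bandwidth is an arbitrary assumption, and the FTL and zero-order-hold terms are ignored.

```python
import math

def vco_noise_suppression_db(f_offset, f_bw):
    """|1/(1 + L)|^2 in dB for the Type-I loop gain L(s) = omega_BW / s."""
    s = 1j * 2.0 * math.pi * f_offset
    loop_gain = 2.0 * math.pi * f_bw / s
    return 20.0 * math.log10(abs(1.0 / (1.0 + loop_gain)))

F_BW = 1.0e6  # Hz, assumed closed-loop bandwidth

for f_offset in (1e3, 1e4, 1e5, 1e6, 1e7, 1e8):
    suppression = vco_noise_suppression_db(f_offset, F_BW)
    print(f"offset {f_offset:9.0f} Hz: VCO noise scaled by {suppression:7.2f} dB")
```

Each decade closer to the carrier adds about 20 dB of suppression inside the bandwidth, while offsets well above the bandwidth see essentially the free-running VCO noise.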

The noise contribution and block-level power consumption in the RSPLL and SSPLL is summarized in Figs. 5.18 and 5.19. The observations in this section are summarized in the pie chart where noise contributions (normalized to 2.21 GHz) are shown at 200 kHz. The VCO clock-and-isolation buffer contributes much lower noise than the SSPLL, and its output phase noise is dominated by the VCO.

<sup>22</sup>Theoretical limit on 2.4 GHz LC cross-coupled oscillator FoM is as high as 195 dB in the technology used here. VCO FoM is defined as FoM =  $\frac{(\omega/\Delta\omega)^2}{L(\Delta\omega)P_{DC[mW]}} = \frac{2\eta Q^2}{kTF}10^{-3}$ , where  $\Delta\omega$  is the offset from carrier at which FoM is calculated,  $\eta$  is RF-to-DC power efficiency, F is the oscillator noise factor, Q is the tank quality



Figure 5.18: A detailed analysis of noise contributions and power consumption in the RSPLL normalized to 2.21 GHz.

## Noise Analysis - FTL

The transfer function from the transconductance cell to the output of the PLL, for  $|s| = \omega \gg \frac{1}{r_o C_{FTL}}$ , i.e. for frequencies well beyond the practical integrator pole  $\omega_i$ , is

$$H_{n,FTL} = \frac{K_{VCO,II}}{s} \cdot \frac{r_o}{1 + sr_o C_{FTL}} \cdot \frac{1}{1 + L(s)} \text{ rad/Amp} \tag{5.39}$$

$$=\frac{K_{VCO,II}}{s^2C_{FTL}} \cdot \frac{1}{1+L(s)} \tag{5.40}$$

We also note the following approximation for the loop gain

$$L(s) = K_{PD} \left[ \frac{K_{VCO,I}}{s} + \frac{g_m r_o}{1 + s r_o C_{FTL}} \cdot \frac{K_{VCO,II}}{s} \right] \tag{5.41}$$

$$\approx \frac{K_{PD}K_{VCO,I}}{s} \left[ 1 + \frac{g_m}{sC_{FTL}} \cdot \frac{K_{VCO,II}}{K_{VCO,I}} \right], \quad |s| = \omega \gg \omega_i = \frac{1}{r_o C_{FTL}} \tag{5.42}$$

$$\approx \frac{K_{PD}K_{VCO,I}}{s^2} \cdot \frac{g_m K_{VCO,II}}{C_{FTL}K_{VCO,I}}, \quad \omega_z = \frac{g_m K_{VCO,II}}{C_{FTL}K_{VCO,I}} \gg \omega \gg \omega_i \tag{5.43}$$

$$\approx \frac{K_{PD}K_{VCO,I}}{s}, \quad \omega \gg \omega_z \tag{5.44}$$
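The two asymptotes of the loop gain can be checked numerically against the full expression (5.41). The sketch below is a minimal check, not the simulation setup used in this work. It assumes $K_{PD} = 2A_{ref}/N$ (implied by the $N^2/4A_{ref}^2$ factor used later in this section), takes $N$, $A_{ref}$, $K_{VCO,I}$, $K_{VCO,II}$, $g_m$ and $C_{FTL}$ from the values quoted in Section 5.3.1, and uses a purely hypothetical $r_o = 1\,\text{G}\Omega$ to place $\omega_i$ well below $\omega_z$.

```python
import math

# Values quoted in Section 5.3.1; R_O is a hypothetical placeholder.
N, A_REF_V = 51, 0.5
K_PD = 2 * A_REF_V / N            # V/rad, assumed sampler gain 2*A_ref/N
K_VCO_I = 2 * math.pi * 72.5e6    # rad/s/V
K_VCO_II = 2 * math.pi * 50e6     # rad/s/V
G_M = 4e-6                        # S
C_FTL = 80e-12                    # F
R_O = 1e9                         # ohm (hypothetical)

OMEGA_I = 1.0 / (R_O * C_FTL)                      # practical integrator pole
OMEGA_Z = G_M * K_VCO_II / (C_FTL * K_VCO_I)       # loop zero


def loop_gain_exact(omega):
    """|L(j*omega)| from the full two-path expression (5.41)."""
    s = 1j * omega
    proportional = K_VCO_I / s
    integral = (G_M * R_O / (1 + s * R_O * C_FTL)) * (K_VCO_II / s)
    return abs(K_PD * (proportional + integral))


def loop_gain_below_zero(omega):
    """Asymptote (5.43), valid for omega_i << omega << omega_z."""
    return K_PD * G_M * K_VCO_II / (omega ** 2 * C_FTL)


def loop_gain_above_zero(omega):
    """Asymptote (5.44), valid for omega >> omega_z."""
    return K_PD * K_VCO_I / omega


for omega, asymptote in ((1e3, loop_gain_below_zero), (1e6, loop_gain_above_zero)):
    print(f"omega = {omega:.0e} rad/s: exact {loop_gain_exact(omega):.4g}, "
          f"asymptote {asymptote(omega):.4g}")
```

With these values the test frequencies sit more than a decade away from both $\omega_i$ and $\omega_z$, so each asymptote should agree closely with the exact magnitude.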



Figure 5.19: A detailed analysis of noise contributions and power consumption in the SSPLL normalized to 2.21 GHz.

The noise from the FTL can be considered as  $S_{n,FTL}(f) = 2(4kT\gamma_p g_{mp} + 4kT\gamma_n g_{mn})$ , where  $g_{mp}$  and  $g_{mn}$  are the transconductances of the second-stage PMOS and NMOS bias current sources in Fig. 5.26.<sup>23</sup>

Below the zero, where  $(1 + L(s)) \approx L(s)$ 

$$H_{n,FTL}(s) = \frac{K_{VCO,II}}{s^2 C_{FTL}} \cdot \frac{s^2 C_{FTL} K_{VCO,I}}{g_m K_{PD} K_{VCO,II} K_{VCO,I}}$$
(5.45)

$$S_{\phi n, out/\text{FTL}}(f) = \left[ \frac{1}{g_m K_{PD}} \right]^2 \times 8kT(\gamma_p g_{mp} + \gamma_n g_{mn}) \tag{5.46}$$

$$S_{\phi n, out/\text{FTL}}(f) = \frac{N^2}{4A_{ref}^2} \times \frac{8kT(\gamma_p g_{mp} + \gamma_n g_{mn})}{g_m^2} \tag{5.47}$$

Between the zero and loop bandwidth  $(1 + L(s)) \approx L(s)$ 

$$H_{n,FTL}(s) = \frac{K_{VCO,II}}{s^2 C_{FTL}} \cdot \frac{s}{K_{PD} K_{VCO,I}} \tag{5.48}$$

$$S_{\phi n, out/\text{FTL}}(f) = \left[ \frac{K_{VCO, II}/K_{VCO, I}}{sC_{FTL}K_{PD}} \right]^2 \times 8kT(\gamma_p g_{mp} + \gamma_n g_{mn}) \tag{5.49}$$

$$S_{\phi n, out/\text{FTL}}(f) = \frac{N^2}{4A_{ref}^2} \cdot \left[ \frac{K_{VCO, II}/K_{VCO, I}}{sC_{FTL}} \right]^2 \times 8kT(\gamma_p g_{mp} + \gamma_n g_{mn})$$
 (5.50)

Beyond the loop bandwidth  $(1 + L(s)) \approx 1$ 

$$H_{n,FTL}(s) = \frac{K_{VCO,II}}{s^2 C_{FTL}} \cdot 1 \tag{5.51}$$

$$S_{\phi n, out/\text{FTL}}(f) = \left[ \frac{K_{VCO, II}}{s^2 C_{FTL}} \right]^2 \times 8kT(\gamma_p g_{mp} + \gamma_n g_{mn}) \tag{5.52}$$
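The three regions of $|H_{n,FTL}|$ given in (5.45), (5.48) and (5.51) can be assembled into a single piecewise asymptote. The sketch below does this and can be used to confirm that the pieces meet at the zero $\omega_z$ and at the loop bandwidth. It assumes $K_{PD} = 2A_{ref}/N$ and takes the loop bandwidth as $\omega_u = K_{PD}K_{VCO,I}$, the unity-gain frequency of (5.44). The numerical values are those quoted in Section 5.3.1, and the function names are hypothetical.

```python
import math

N, A_REF_V = 51, 0.5
K_PD = 2 * A_REF_V / N            # V/rad, assumed sampler gain 2*A_ref/N
K_VCO_I = 2 * math.pi * 72.5e6    # rad/s/V
K_VCO_II = 2 * math.pi * 50e6     # rad/s/V
G_M = 4e-6                        # S
C_FTL = 80e-12                    # F

OMEGA_Z = G_M * K_VCO_II / (C_FTL * K_VCO_I)  # loop zero
OMEGA_U = K_PD * K_VCO_I                      # assumed loop bandwidth (rad/s)


def gain_below_zero(omega):
    """(5.45): flat, 1/(g_m * K_PD), in rad/A."""
    return 1.0 / (G_M * K_PD)


def gain_zero_to_bw(omega):
    """(5.48): first-order roll-off between the zero and the bandwidth."""
    return (K_VCO_II / K_VCO_I) / (omega * C_FTL * K_PD)


def gain_beyond_bw(omega):
    """(5.51): second-order roll-off outside the loop bandwidth."""
    return K_VCO_II / (omega ** 2 * C_FTL)


def ftl_noise_gain(omega):
    """Piecewise asymptote of |H_n,FTL(j*omega)| in rad/A."""
    if omega < OMEGA_Z:
        return gain_below_zero(omega)
    if omega < OMEGA_U:
        return gain_zero_to_bw(omega)
    return gain_beyond_bw(omega)


for omega in (1e3, 1e5, 1e8):
    print(f"omega = {omega:.0e} rad/s: |H_n,FTL| ~ {ftl_noise_gain(omega):.3e} rad/A")
```

Multiplying the square of this gain by $8kT(\gamma_p g_{mp} + \gamma_n g_{mn})$ gives the FTL contribution to the output phase noise in each region.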

We are interested in analyzing the loop noise beyond 10 kHz. By positioning the zero  $\frac{g_m K_{VCO,II}}{C_{FTL}K_{VCO,I}}$  we can get first-order suppression of the FTL noise. However, with the programmable  $g_m$  in this prototype, it is possible to move the zero to a higher frequency, so that VCO noise is suppressed to second order within the loop at the expense of letting in some more FTL noise. If the latter is made low enough, by lowering  $g_{mp/n}$  through longer-channel bias current sources, this can lead to an overall improvement in FoM<sub>j</sub>. This relaxes the VCO phase noise FoM requirement mentioned in the previous section.

However, the final implementation is measured without turning on the FTL, as its flicker noise was improperly captured in the PDK. This led to an undesired increase in the low-frequency noise, increasing the integrated jitter value. Without the FTL, the VCO noise dominates in the final

<sup>23</sup> Source follower noise is rejected by the gm-cell degenerated input pair. Only the top PMOS and bottom-most NMOS bias current contribute noise.

implementation reported here. As discussed in measurement Fig. 5.33, due to the strong explicit locking mechanism, unlike ILCM Type-I dynamics, a large locking range is observed despite turning off the FTL.

#### 5.2.7 Effect of nonidealities

# Vertical Offsets in PD profile

In the process of reference buffering, the buffered reference edge will be  $\Delta t$  away from the zero-crossing of the reference. The effect of a reference buffer edge delay or advance is shown in Fig. 5.20.<sup>24</sup> While the phase detector profile is still completely monotonic over  $\pm \pi$  of oscillator phase as before, the profile is vertically asymmetric. This is not an issue: as long as the phase detector profile crosses zero, the loop will lock. To control  $\Delta t$ , we include a reference buffer with tunable delay to ensure that the profile always crosses zero. It should be noted that setting the delay or advance of the buffered reference edge is a one-time procedure to calibrate against process corner.

Delays in the TRACK sampling signal after the edge-selection process (ANDing of Sample EDGE and  $VCO_{-,buff}$ ) can cause the PD profile to be completely positive. The timing diagram for this is shown in Fig. 5.21. This can be adjusted by advancing the reference buffer edge, so that combined with the TRACK path delay, the phase detector profile has a zero crossing.

## Horizontal Offsets in PD profile

We have only talked of effects which cause the PD profile to move vertically up or down. All the while, the zero sample control voltage coincided with PLL phase error  $\Delta \phi = 0^{\circ}$ . The loop locks such that there is zero phase error between the sinewave at the sample switch source node and TRACK waveform at its gate. The matching resistance and bypass capacitance  $^{25}$  at the reference pads introduce a phase shift between the crystal phase and the sinewave reference on-chip. There is also feedback path delay from the VCO to the TRACK signal. So the ideal locking phase error between crystal and LC-VCO is horizontally offset and is not zero.

<sup>24</sup> Advanced reference buffer edges can be generated using a complementary CMOS differential reference buffer discussed in Section 5.3.

<sup>25</sup> These are added to attenuate charge injection into the reference sinewave at the critical zero-crossing time instant.



Figure 5.20: (a) When the reference buffer generates an advanced edge with respect to the reference sinewave zero-crossing, the PD resolves large VCO edge delays as advances. (b) The phase detector profile remains monotonic with an advanced reference buffer edge. (c) A delayed reference buffer edge resolves advances as delays. (d) The phase detector profile with a delayed reference buffer edge is also monotonic.



Figure 5.21: (a) Delay due to gates after TRACK is generated results in sampling due to DelayedTRACK. (b) Timing diagram when RefBuff is at t=0. All samples are negative and there is no steady-state solution for the feedback loop. (c) By advancing RefBuff we can compensate for  $\tau_{delay}$  and ensure a solution to the feedback loop.

# SESCi component noise

The blocks after the AND gate, including the DFF, do not add much noise as they operate on sharp edges. We have already discussed the effect of noise on the TRACK signal from the VCO buffer in each differential path due to the sampling and sample edge selection process.

Next, we discuss the effect of noise in the SESCi reference buffer on the PD profile. The SampleEDGE is generated by ANDing the VCO with the buffered reference (RefBuff). The SampleEDGE is therefore defined either by the rising  $VCO_{+,buff}$  edge or the rising RefBuff. The noise of RefBuff only interferes with the sample for PLL phase errors at which  $VCO_{+,buff}$  is near RefBuff. Fig. 5.22(a) shows the effect of buffer noise for a RefBuff advanced from the reference sinewave zero-crossing by  $|t_{adv}|$ . The reference buffer noise interferes with SampleEDGE definition when the desired rising  $VCO_{+,buff}$  edge (falling  $VCO_{-,buff}$  edge) is located at  $-t_{adv}$ , and when it is at  $-t_{adv} + 0.5T_{VCO}$ . This introduces two zones of uncertainty,  $\pi$  apart, in the phase detector profile as shown in Fig. 5.22(b). By adjusting RefBuff advance or delay we can position the ideal locking point half way between these two zones. After lock is achieved, the FTL would prevent the loop from drifting too far off from this ideal locking point.

As long as the noise of the reference buffer in SESCi does not overwhelm the  $\pi/2$  zone, the noise performance at steady state can be robust. At lock, RefBuff and  $VCO_{+/-,buff}$  track the reference sinewave with some additional error; that is, their rising edges track the zero-crossing of the reference sinewave. So, the integrated jitter added by the reference buffer circuit should be about a tenth of  $0.5T_{VCO}$ , which for a  $2.4\,\text{GHz}$  VCO ( $T_{VCO}=416.7\,\text{ps}$ ) is about 21 ps. As this is not a very low noise specification, the reference buffer design can be low power.<sup>26</sup>
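The jitter budget quoted above follows directly from the VCO period. A minimal sketch of the arithmetic, taking the one-tenth margin from the text and a 2.4 GHz VCO (the variable names are illustrative):

```python
F_VCO_HZ = 2.4e9          # VCO frequency used in the text
MARGIN = 0.1              # "about a tenth" of half the VCO period

t_vco_s = 1.0 / F_VCO_HZ                      # VCO period
ref_buffer_jitter_budget_s = MARGIN * 0.5 * t_vco_s

print(f"T_VCO = {t_vco_s * 1e12:.1f} ps")
print(f"reference-buffer jitter budget = {ref_buffer_jitter_budget_s * 1e12:.1f} ps")
```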

We want  $VCO_{+,buff}$  to have the same jitter as  $VCO_{-,buff}$  so that the rising edge of the former in the sample selection process identically matches the falling edge of the latter in the sampling process. This prevents a premature termination of the sampling process. At lock, they both have the same jitter at the LC-VCO output, equal to  $N^2 \cdot XO_{noise}$  plus some uncorrelated inverter buffer noise. We can assume the buffer in  $VCO_{+,buff}$  path is noiseless, and refer all the inverter buffer error to  $VCO_{-,buff}$ . This means the noise on  $VCO_{+,buff}$  is just  $N^2 \cdot XO_{noise}$ . The noise on RefBuff is  $XO_{noise}$  plus the noise of the reference buffer. The remaining  $N^2 \cdot XO_{noise}$  noise on  $VCO_{+,buff}$  is not error, as it is what enables us to track  $VCO_{-,buff}$  identically. The uncertainty zone in PD profile is therefore due to the difference between the additive noise of the reference buffer.

 $<sup>^{26}</sup>$ As  $VCO_{\pm,buff}$  and RefBuff both track the reference sinewave, we need to consider the sources of difference between these two waveforms to estimate the effect of reference buffer noise on the PD profile.



Figure 5.22: Effect of reference buffer noise on the PD profile creates two zones of uncertain samples  $\pi$  apart. The timing diagram is shown for noise on an advanced RefBuff edge. By tuning RefBuff position, we can position the zones of uncertainty for higher robustness and place the ideal locking point in the center of the range.

# 5.3 Proposed Loop Implementation

The proposed  $2.05 - 2.55 \,\text{GHz}$  PLL is implemented in a  $65 \,\text{nm}$  TSMC CMOS technology. The prototype has a functional area of  $0.36 \,\text{mm}^2$ . Of this,  $0.3 \,\text{mm}^2$  is occupied by the LCVCO with 21.7% tuning range. The loop components only occupy  $0.06 \,\text{mm}^2$  which is comparable to the SSPLL implementations.
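The tuning-range and area figures above are mutually consistent, as the short check below illustrates. It assumes the tuning range is defined relative to the center of the 2.05 to 2.55 GHz band; the variable names are illustrative.

```python
F_LOW_HZ, F_HIGH_HZ = 2.05e9, 2.55e9
AREA_TOTAL_MM2, AREA_VCO_MM2 = 0.36, 0.30

# Fractional tuning range relative to the band center (assumed definition).
f_center_hz = 0.5 * (F_LOW_HZ + F_HIGH_HZ)
tuning_range = (F_HIGH_HZ - F_LOW_HZ) / f_center_hz

# Area left for the loop components once the LC-VCO is accounted for.
area_loop_mm2 = AREA_TOTAL_MM2 - AREA_VCO_MM2

print(f"tuning range = {100 * tuning_range:.1f} %")
print(f"loop area    = {area_loop_mm2:.2f} mm^2")
```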

# 5.3.1 Loop parameter selection

In the proposed implementation, we have chosen the VCO frequency to be 2.55 GHz and the reference frequency to be 50 MHz, yielding a multiplication factor of 51.<sup>27</sup>

**Optimal Noise Bandwidth**  $A_{ref}$  is determined by the capability of the crystal oscillator sinewave to have an amplitude of 0.5 V (0.7 V max). Too large an amplitude will forward-bias the body-source junction of the reference sampling switch and increase the spurs coupling through the substrate and affect designed performance. Given N and  $A_{ref}$ , the main loop  $K_{VCO,I}$  is determined by the PLL loop bandwidth for optimal noise performance. Ideally, we would like the loop bandwidth to be adjustable. This is usually harder in Type-I PLLs and more straightforward in Type-II loops with a charge pump where the loop bandwidth can be adjusted by tuning the pump current. We choose the optimal bandwidth, the frequency at which in-band noise contribution to the PLL output is equal to the VCO phase noise<sup>28</sup>, as 1.57 MHz ( $\approx \frac{f_{ref}}{30}$ ). For this, we need  $K_{VCO,I}$  of 72.5 MHz/V.
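The definition of the optimal bandwidth as the offset where the in-band noise equals the VCO phase noise can be illustrated with a short sketch. It assumes a flat in-band noise floor and a free-running VCO profile falling at 20 dB per decade. The numbers used below are hypothetical and are not measurements from this work; the function names are illustrative.

```python
import math


def vco_noise_dbc(f_hz, l_vco_dbc, f_offset_hz):
    """Free-running VCO phase noise at f_hz, assuming a -20 dB/dec profile
    anchored at l_vco_dbc (dBc/Hz) at offset f_offset_hz."""
    return l_vco_dbc - 20.0 * math.log10(f_hz / f_offset_hz)


def crossover_offset_hz(l_inband_dbc, l_vco_dbc, f_offset_hz):
    """Offset at which the VCO profile meets a flat in-band floor."""
    return f_offset_hz * 10 ** ((l_vco_dbc - l_inband_dbc) / 20.0)


# Hypothetical numbers, NOT taken from the measurements in this chapter.
L_INBAND_DBC = -124.0     # flat in-band floor, dBc/Hz
L_VCO_DBC = -120.0        # VCO phase noise at 1 MHz offset, dBc/Hz
F_OFFSET_HZ = 1e6

f_opt_hz = crossover_offset_hz(L_INBAND_DBC, L_VCO_DBC, F_OFFSET_HZ)
print(f"crossover offset = {f_opt_hz / 1e6:.2f} MHz")
```

A lower in-band floor pushes the crossover, and hence the preferred loop bandwidth, to a higher offset.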

 $C_{samp}$  is chosen such that the total in-band noise matches the SSPLL sampler contribution (see Figs. 5.18 and 5.19). Here we have  $C_{samp} = 10 \,\mathrm{pF}$ .

**PLL Area** Of the free variables N,  $K_{VCO,I}$ ,  $g_m$ ,  $K_{VCO,II}$  and  $C_{FTL}$  we have determined the first two.  $C_{FTL}$  is chosen based on area limitations. The FTL bandwidth has to be very low, which leads to a large  $C_{FTL}$  and an area penalty. This issue can be addressed in future work by implementing

<sup>27</sup> These are chosen close to the SSPLL in [104] to enable a direct performance comparison.

<sup>28</sup> Too small a bandwidth leads to noise peaking and the VCO noise is sub-optimally suppressed. If the bandwidth is too large, we integrate more of the in-band PLL noise than if we followed the VCO phase-noise profile after the point of intersection. At the PLL bandwidth, the VCO and the in-band noise contribute the same noise. The noise value is equal to half the VCO noise at an offset of  $\omega_{PLLBW}$  from carrier frequency. Outside the loop bandwidth the output noise follows the VCO phase noise profile.

a digital FTL path controlling a fine capacitor bank rather than a varactor in the oscillator. In this design with single ended loop filter cap of 80 pF, we mitigate the area requirement to a certain degree because we use a differential loop filter. By implementing  $C_{FTL}$  differentially, we halve the physical capacitance to 40 pF.

**Acquisition**  $K_{VCO,II}$  is chosen based on desired acquisition range and the output voltage swing of the FTL. With  $\pm 200 \,\mathrm{mV}$  output swing, and a  $K_{VCO,II}$  of  $50 \,\mathrm{MHz/V}$ , an acquisition range of  $\approx \pm 10 \,\mathrm{MHz}$  is possible.<sup>29</sup> With this, and  $\omega_z \approx 0.1 \omega_u = 10 \,\mathrm{kHz}$ , the transconductance  $g_m$  can be determined as  $5 \,\mu\mathrm{S}$ . In implementation, we get  $g_m = 4 \,\mu\mathrm{S}$  and  $\omega_z = 8 \,\mathrm{kHz}$ .
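The acquisition range and zero location quoted above can be reproduced with the short calculation below. It uses $\omega_z = g_m K_{VCO,II}/(C_{FTL}K_{VCO,I})$ from the noise analysis and assumes, for illustration, that $K_{VCO,II}/K_{VCO,I} = 1$ and that the single-ended value $C_{FTL} = 80\,\text{pF}$ applies; under these assumptions the results land close to the quoted 10 kHz and 8 kHz. The function names are hypothetical.

```python
import math

C_FTL_F = 80e-12          # single-ended loop filter capacitance (assumed to apply)
K_RATIO = 1.0             # K_VCO,II / K_VCO,I, assumed equal for illustration
K_VCO_II_HZ_PER_V = 50e6
FTL_SWING_V = 0.2         # +/- 200 mV output swing


def zero_frequency_hz(g_m_s):
    """f_z = g_m * (K_VCO,II / K_VCO,I) / (2 * pi * C_FTL)."""
    return g_m_s * K_RATIO / (2 * math.pi * C_FTL_F)


def acquisition_range_hz():
    """One-sided acquisition range set by the FTL output swing."""
    return FTL_SWING_V * K_VCO_II_HZ_PER_V


print(f"acquisition range = +/- {acquisition_range_hz() / 1e6:.0f} MHz")
for g_m in (5e-6, 4e-6):
    print(f"g_m = {g_m * 1e6:.0f} uS -> f_z = {zero_frequency_hz(g_m) / 1e3:.2f} kHz")
```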

**Spur Performance** The values of  $K_{VCO,II}$  and  $K_{VCO,I}$  are also adjusted to get the desired spur performance, as higher values of  $K_{VCO,I/II}$  lead to larger spurs. For a fixed  $\omega_z$ , to reduce  $g_m$ ,  $K_{VCO,II}$  can be made larger only by a factor of two or so. Beyond this, it has a negative effect on spurs. For this work, the chosen  $K_{VCO,I/II}$  gives acceptable spur performance.

Spur performance is also expected to be better than conventional PLLs as there are no narrow pulses. The mismatch in the differential pair of the FTL charge pump doesn't affect the spur. This is because, similar to the SSPLL, the two paths (positive and negative current) are on simultaneously. At lock, no current may flow into the loop filter capacitance, and the loop adjusts the steady state phase error (away from ideal zero crossing lock point) to adjust  $V_{PD+}$  and  $V_{PD-}$  to neutralize current mismatch.

Further, unlike the SSPLL, the VCO is well protected from charge injection through the chain of buffers to the sampling switch. There is also no BPSK-like load modulation of the VCO tank, as the VCO outputs are fed to the gate and not the source of the sampling switch.

The half reference multiplexing causes sub-harmonic reference spurs in the loop which can be quite large. In the future, we propose a randomization of the signal selecting between two multiplexed sample capacitances to reduce this, at the expense of adding some quantization noise to the loop. A detailed analysis of such a scheme would be in order.

Finally, the PLL is operated in Type-I mode with a very reasonable acquisition range across environmental and supply variations obviating the need for the FTL area and other FTL considerations discussed in this section.

 $<sup>^{29}</sup>$ The exact symmetry of the 20 MHz band depends on the symmetry of the output swing.

## 5.3.2 Switch size

From noise considerations, the sampling capacitance in the PD is N times larger than  $C_{samp}$  of the SSPD, so we would need an N times larger switch for the same  $R_{on}C_{samp}$ . However, as discussed in Section 5.2.2, the  $R_{on}C_{samp}$  time constant need only be as small as half the VCO cycle (sampling time of  $0.5/f_{VCO}$ ), so it can be considerably larger than the SSPD  $R_{on}C_{samp}$  time constant, which must be very much smaller than  $1/f_{VCO}$ . This moderates the switch size considerably.

A switch size of  $64 \,\mu\text{m}/60 \,\text{nm}$  and  $R_{on}$  of  $10 \,\Omega$  is chosen to use with  $C_{samp}$  of  $10 \,\text{pF}$ . This switch size is not too large and can be driven without excessive power consumption in the TRACK generation scheme. The power in the buffers used to drive it is gated as discussed in Section 5.2.6.
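As a quick sanity check on these values, the sketch below compares the $R_{on}C_{samp}$ time constant with the half-cycle sampling window $0.5/f_{VCO}$ across the tuning range. It is only an arithmetic illustration of the sizing argument; the variable and function names are hypothetical.

```python
R_ON_OHM = 10.0
C_SAMP_F = 10e-12

tau_s = R_ON_OHM * C_SAMP_F   # sampling time constant R_on * C_samp


def half_period_s(f_vco_hz):
    """Sampling window of half a VCO cycle, 0.5 / f_VCO."""
    return 0.5 / f_vco_hz


print(f"R_on * C_samp = {tau_s * 1e12:.0f} ps")
for f_vco_hz in (2.05e9, 2.4e9, 2.55e9):
    window_s = half_period_s(f_vco_hz)
    print(f"f_VCO = {f_vco_hz / 1e9:.2f} GHz: window = {window_s * 1e12:.0f} ps, "
          f"window / tau = {window_s / tau_s:.2f}")
```

With these numbers the time constant stays below the half-cycle window over the whole 2.05 to 2.55 GHz range.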

**Grounded body node** The body of the switch is grounded, the source is connected to the input reference sinewave, the drain to the sample capacitance and the gate to the switching clock. While tying the body to the source would reduce the threshold voltage and hence the device  $R_{on}$ , allowing for a smaller switch size, we choose to keep the body grounded. This is to prevent the drain-body junction from getting forward-biased if  $A_{ref}$  exceeds 0.7 V, although this design nominally uses  $A_{ref}$  of 0.5 V around  $V_{DC,ref}$  of 0.5 V. In such a case, during the hold mode the drain node would be around  $V_{DC,ref}$ , that is 0.5 V. If the input swings up by 0.7 V, the body-drain junction will get forward biased.

**Layout** The switches in each differential path are split into two  $32 \,\mu\text{m}/60 \,\text{nm}$  switches, and the MIM sampling caps are split as two 5 pF caps, which are then laid out in common centroid configuration, as shown in Fig. 5.23. Within each quarter of a differential phase, each  $32 \,\mu\text{m}/60 \,\text{nm}$  switch for a mux path is split as two  $16 \,\mu\text{m}/60 \,\text{nm}$  switches, and each quarter's two 5 pF mux caps are split as two  $2.5 \,\text{pF}$  caps. The two half-rate multiplexed paths in each quarter for a particular differential phase are also laid out in common centroid. This layout allows the clocks to be routed symmetrically and easily to all devices. The clocks from the SESCi are routed to the center of the block and then routed to the center of each quarter. From there, they are again routed to switches.

## 5.3.3 SESCi

The SESCi uses standard digital library cells. The routing parasitics are carefully minimized by studying the Calibre R+C+CC extracted layout <sup>30</sup>, especially for the cells processing the sensitive

<sup>30</sup> R+C+CC is used to include all timing and associated phase noise issues.



Figure 5.23: Common centroid layout of differential Sampling Phase Detector.

VCO and reference sinewaves, so as to quickly create fast edges more resilient to the digital noise. After following an initial 4x scaling scheme to size the chain for minimum delay and fast edges, some empirical gate size resizing based on the extracted layout leads to the circuit shown in Fig. 5.24. This step of extraction and optimization was critical to the PLL jitter performance.

The clocking signals from the SESCi to SSPD are in  $M_7$  as it allows us to minimize capacitance on these lines for the same resistance as would be in lower  $M_1$  to  $M_6$  layers. A quasi-distributed RC model is used for the clock routing signals. The model is constructed through a combination of lumping the results from Calibre C+CC calculations and hand-calculated metal line resistances. The final buffers in the SESCi chain are sized to be able to drive the two large  $64 \,\mu\text{m}/60 \,\text{nm}$  switches in each path, and the routing cap, while not degrading phase noise of the TRACK signal.

### 5.3.4 Reference Buffer

As the reference input is a differential sinewave, a differential reference buffer is required. To ensure that the buffered reference edge RefBuff is close to the zero crossing of the sinewave itself, a complementary differential CMOS buffer with a tunable delay is used, as shown in Fig. 5.25. Such a buffer has static power consumption, unlike an inverter buffer. However, the noise



Figure 5.24: SESCi component sizing and circuit diagram.

of the reference buffer in the SESCi path can be quite large before it affects the control voltage sample in the sampled phase detector, and the power consumption of this reference buffer can still be low, see Section 5.2.7. The buffer power consumption is kept to  $170 \,\mu\mathrm{W}$  in this design.

**Tunable Delay** To tune the time difference between the RefBuff and the zero-crossing, we employ the common mode circuit shown in Fig. 5.25. By varying the common mode reference voltage, the duty cycle of its output square wave must change to match it, thereby changing its average value and the position of the rising and falling edge. If we reduce the common mode reference we decrease the duty cycle, and can create an advanced edge. By increasing the common mode reference, we can increase the duty cycle, make the positive swing longer, and get a delayed edge.
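The size of the duty-cycle adjustment needed for a given edge shift can be estimated with a simple model. The sketch below assumes, purely for illustration, an ideal square wave at the 50 MHz reference rate whose duty-cycle change is split equally between its rising and falling edges. It returns magnitudes only and does not model the direction of the shift or the actual buffer dynamics; the function names are hypothetical.

```python
import math

T_REF_S = 1.0 / 50e6      # reference period for a 50 MHz crystal


def edge_shift_s(duty, t_ref_s=T_REF_S):
    """Magnitude of the shift of each edge relative to a 50 % square wave,
    assuming the duty-cycle change is shared equally by both edges."""
    return abs(duty - 0.5) * t_ref_s / 2.0


def duty_change_for_shift(shift_s, t_ref_s=T_REF_S):
    """Duty-cycle change (as a fraction) needed to move each edge by shift_s."""
    return 2.0 * shift_s / t_ref_s


# Example: duty-cycle change needed to move the edge by 100 ps (illustrative target).
delta_duty = duty_change_for_shift(100e-12)
print(f"duty-cycle change = {100 * delta_duty:.1f} %")
print(f"edge shift at 51 % duty = {edge_shift_s(0.51) * 1e12:.0f} ps")
```

Because the reference period is long compared with the VCO period, a small duty-cycle change corresponds to an edge shift that is a sizeable fraction of a VCO cycle.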

An inverter buffer can introduce a lot of delay in a sinewave apart from the large short circuit current. An inverter buffer will also include any single ended errors which are eliminated when the sinewave reference is maintained differentially in the buffer. If the input signal is a slow small signal differential sinewave, ideally we expect a differential gm-cell to introduce very little phase shift for low frequencies. In reality, the input wave is slow but large signal, and the buffer response is quite nonlinear. As such, some phase shift is introduced based on how the devices recover from their excursions into cut-off, triode etc. However, we leverage the square wave output to introduce tunable delay through the mechanism described above.

It should be noted that setting the delay or advance of the buffered reference edge RefBuff is



Figure 5.25: Reference buffer with tunable delay.

a one time procedure to calibrate against process corner. For prototyping, we provide the variable common mode reference through the self-bias of a self-biased inverter. By switching the size of the PMOS and NMOS devices we can change the self-bias. For additional freedom in defining RefBuff, we will supply the voltage of the self-bias inverter separately. However, ideally, for noise reduction it is better to share the supply of the common mode circuit, the common mode reference generator, and the differential buffer.

**Charge injection from PD sampler** Due to the sampling process in the PD, some charge injection onto the sinewave reference occurs at every TRACK edge. At lock, this injection will disturb the sinewave at the point of highest slope, near the zero crossing. The common mode error due to this on reference sinewave is immaterial for the differentially sensed control voltage sample. However, if common mode error is not adequately rejected in the reference buffer, the resultant RefBuff may not be close to the reference sinewave zero crossing. Differential mode error in the reference sinewave due to charge injection is critical for both the sampled control voltage and RefBuff generation.<sup>31</sup>

The buffer is operated in large signal mode. For small signal input the top and tail current

<sup>31</sup> As the reference sinewave is slow, it is quite susceptible to disturbances and may also pick up differential/common mode errors from sources other than charge injection from the sampler.

source provides sufficient common mode rejection. Clearly, when the input swing is large, the current source will be periodically crushed into triode. However, it will recover near the zero crossing of the input sinewave. This is sufficient, as for RefBuff rising edge definition, we are only concerned with common mode rejection near the input sinewave zero crossing.

The charge injection occurs at 2.4 GHz every half reference cycle, so if it is modeled as a delta train, it has components from 50 MHz to 2.4 GHz and 2.4 GHz±50 MHz, and further harmonics. At low frequencies the wirebond is a short, and the crystal imposes a voltage. However for the high frequency component, the wirebond is open. To attenuate charge injection into the reference sinewave at the critical zero-crossing time instant, large 40 pF capacitances are included. The 40 pF capacitors attenuate both the differential and common mode error from charge injection.

# 5.3.5 LC-VCO Implementation

A pseudo-differential cross coupled VCO was implemented with a  $1.3\,\mathrm{nH}$  inductor with quality factor of 16 at  $2.4\,\mathrm{GHz}$ . This is the maximum attainable quality factor.  $^{32}$ 

The cross coupled devices are each  $14 \times 1 \,\mu\text{m}/120\,\mathrm{nm}$  with multiplicity two, so that a low supply of  $0.5\,\mathrm{V}$  can be used.

There are two differential varactors controlled by the FTL path and Type-I path, implemented using the NMOS-in-n-well device. The n-wells are connected to the positive control voltage, and the gates are connected to the negative control voltage. The varactors are placed such that the n-well is at the center, away from the large swing of the oscillator output which can potentially forward bias the p-substrate/varactor n-well junction. The FTL bias of 1.3 V is not an issue due to the differential implementation which prevents a large voltage from breaking the device. The varactors are sized to achieve  $50 \,\mathrm{MHz/V}$  for  $K_{VCO,I}$  and  $K_{VCO,II}$ .

The acquisition range of the PLL is largely determined by the main Type-I loop and is about 10 MHz. In order to set the frequency of the VCO within the acquisition range of the PLL, a very fine 7-bit digital capacitor bank is implemented using custom 30 fF Metal-Oxide-Metal or MOM

<sup>32</sup> [113] is also a pseudo-differential implementation. However, it requires a much larger inductor value to satisfy both common and differential mode conditions. Implementing such a large inductor degrades the inductor quality factor at 2.4 GHz. This technique was found to achieve the same  $FoM_{VCO}$  performance as the cross coupled VCO at low frequencies where it is hard to achieve high quality factor.

cap in layers 4-7, so that the largest frequency step (at the high end of the tuning curve) is half the acquisition range. For ease of routing, the cap and collocated switch are implemented in a single-ended fashion. By replacing the MOM vertical finger capacitors with MOS capacitors as in [123], we can get a significant improvement in VCO area, as currently the size of the 7-bit digital bank is equal to the inductor area.

The VCO has a simulated FoM $_{VCO}$  of at best 186.5 dB with further details in Section 5.4.

#### 5.3.6 VCO Buffer

In this topology, we have chosen to use the inverter buffer to buffer the LC-VCO sinewave and not the reference sinewave. Ideally, the VCO buffers are the only noise contributors in the proposed PLL. By using a large sampling cap, we have matched the phase detector noise suppression level in the SSPLL. However, unlike the SSPLL, the VCO noise is not attenuated by  $1/\omega^2$  but only by  $1/\omega$  in the proposed PLL. The FTL also contributes noise, to a smaller degree.

As discussed previously, due to slewing the noise of the VCO buffer gets multiplied by  $N^2$ , so it must be designed with care. The inverter buffer is implemented using long-channel devices to reduce flicker noise, and is sized to ensure that it is not the dominant contributor to the PLL phase noise. The power consumption in the buffer is  $500 \,\mu\text{W}$ .

#### 5.3.7 Frequency Tracking Loop

The frequency tracking loop is run off a  $2.4\,\mathrm{V}$  supply while the main Type-I loop is run off 1/1.2 V. A PMOS source follower buffer is used to translate the low bias of 0.5 V DC at the output of the PD to a higher bias for the input of the  $g_m$  cell. To obtain large output impedance for the transconductance cell, we use thick-oxide<sup>33</sup> long-channel devices in a cascoded structure. The FTL circuit diagram, along with annotated device sizes, is shown in Fig. 5.26.

**Small  $g_m$  implementation** To achieve kHz bandwidth without an unacceptable increase in  $C_{FTL}$  value, we implement a degenerated  $g_m$  cell. This cascoding of the input pair transistors significantly attenuates their noise contribution to the FTL output current. The cascode nature also attenuates the noise of the preceding source follower circuit. For this reason, the main contributors

 $<sup>^{33}</sup>$ high  $V_{TH}$ 



Figure 5.26: Circuit diagram of Frequency Tracking Loop (FTL) with CMFB.

of noise are only the PMOS and NMOS current sources of the telescopic cascode topology. This fits into the model used for FTL noise analysis earlier.

An option for shorting the sources of the input differential pair in the telescopic cascode is provided. This higher-noise option with higher FTL bandwidth is provided for prototyping. To change  $g_m$  in the low or high gain setting, the PMOS current source size can be changed by switching in devices. Simulations ensure that all devices are maintained in saturation irrespective of the current setting.

**Input and Output swing** Once the initial capacitance bank digital setting is programmed into the LC-VCO, the loop must lock to the closest integral multiple of the reference. In the absence of FTL, the proportional loop will lock with some static phase error. With the FTL, the differential loop filter capacitance develops the correct control voltage to keep static phase error zero. This process is slow due to the low loop bandwidth of the FTL. The range and symmetry of the output swing determines the acquisition range and its symmetry around the desired VCO lock frequency through  $|V_{out,max} - V_{out,min}| \times K_{VCO,II}$ <sup>34</sup>. The output range will also determine how far the FTL can track slow frequency errors. The output swing is designed to be  $\pm 200 \,\mathrm{mV}$ .

Ideally, at lock the phase error is zero and the PD control voltage is 0 V. However, while tracking frequency drift error or due to nonideality at lock, the PD control voltage can have single ended excursion between  $\approx \frac{A_{ref}}{N} \times \pm \pi$  that is  $\pm 50 \,\mathrm{mV}$  for this implementation. This must be accounted for in design.

**CMFB** The loop filter is implemented differentially. As discussed previously, this helps halve the physical area for the loop filter capacitance. The differential voltage across the capacitor is determined through the loop feedback so that the differential varactor control voltage for maintaining the oscillation frequency under slow drift is achieved. There is no mechanism for setting the common mode voltage of the output, and a CMFB is required for the transconductance to function properly. The CMFB circuit is shown in Fig. 5.26.

To maintain high output common mode resistance, the CMFB should not load the output. Therefore, instead of a resistive sensing network, MOS devices are used. However, a source follower

 $<sup>^{34}</sup>$ This is one factor that determines the acquisition range. Another rule of thumb measure is the loop bandwidth, though this is only a rough approximation as the actual acquisition process is nonlinear. We must ensure that for the desired acquisition range, the  $g_m$  cell does not go into triode

architecture for the MOS sensing limits the differential output swing to avoid pushing the current sources in the CMFB into triode. By using the architecture in Fig. 5.26, where each differential node  $v_{FTL\pm}$  is compared with the common mode voltage  $v_{CM,FTL}$  instead of with each other, we can double the allowed differential swing.

We note that a portion of the loop filter capacitance is implemented in a single-ended fashion. This is to help adjust phase margin and stabilize the CMFB circuit.

# 5.3.8 Output Test Buffer

The output buffer is a chain of large inverter buffers designed for low noise and to drive a  $50 \Omega$  load. The buffer power is not included in the  $\text{FoM}_j$  calculation. The buffers are also placed in a separate ground to ensure that the large bounce due to the large current consumption in the test buffers does not introduce heavy spurs in the PLL output.

# 5.3.9 Ground isolation and ESD protection

To reduce spurs and minimize noise, digital and analog grounds are separated. The p-substrate grounds are isolated by using deep n-well trenches between the grounds. After power up, the deep n-well is connected to the  $V_{ESD}$  of the digital and mixed signal ground island. Each ground island provides an alternate path through a solid ground plane to the respective island's ground pads. The islands are bridged through differential control signals from the loop to the VCO, and differential output from the LC-VCO to the loop and test buffers.

The analog LC-VCO and the scan chain are located in a separate analog ground. The output test buffer is located in its own ground to minimize spurs. The VCO buffer, SESCi along with its reference buffer, main proportional path i.e. the PD, and the FTL are in the digital/mixed-signal ground. The differential buffer processes a slow input sinewave, so it should not be placed in the noisy ground island. However, the single ended output is taken and fed to the SESCi. As the interface between the reference buffer and SESCi is not differential, they must share a ground. Further, the reference buffer noise is largely immaterial as discussed in Section 5.2.6. Next, the FTL is a mixed signal block which takes its input from the sampled phase detector and puts out an analog differential control voltage. As the block is implemented differentially with strong common mode rejection and differential interface with the VCO, it is placed in the digital/mixed signal ground. Finally, the VCO buffers process the sinewave output of the LC-VCO. To reduce static power consumption, two pseudo-differential inverter buffers are used to buffer and generate the sampling waveforms. To mitigate the effect of ground noise, they should share a ground with the subsequent digital gates.

Figure 5.27: Simulation setup for loop noise.

Each ground island has its own ESD ring. ESD rings are composed of a reverse-biased diode from ground to the node of interest, and one from the node to the 2.6 V node, i.e.  $V_{ESD}$. A clamp of four series diodes then completes the diode ring to provide a discharge path for accumulated charge. The input and output reference and VCO pads have small diodes tying to  $V_{dd}$  and ground to reduce RF capacitance while still providing some ESD protection.

Eventually, all grounds are shorted off-chip. Before wirebonding, if the grounds are isolated, an ESD event between them can cause breakdown of the deep n-well/p-substrate junction. For this reason, back to back diodes are connected between all ground islands, tying the grounds to within 0.7 V of each other. Nominally, the diodes are off and small noisy disturbances are isolated between different domains.

#### 5.4 Simulated performance and Measurement

#### 5.4.1 Phase noise simulation

To simulate the PLL phase noise performance we simulate the noise of the loop and the VCO separately and combine them in MATLAB using the transfer functions derived in the previous sections.

Figure 5.28: Comparison of simulated PLL noise with measured performance at 2.55 GHz.

We run a periodic noise simulation with a 25 MHz beat frequency and 100 to 300 harmonics to determine the noise on the control voltage from the VCO buffer, from SESCi non-idealities such as reference buffer noise (including the  $\times 2$  multiplication of single phase VCO buffer noise due to the sample selection process, see Section 5.2.7), and from the PD itself, see Fig. 5.27. The ideal vsin reference sinewave is fed through 2 nH wirebonds. Spectre yields a single-sided noise spectrum for .noise and .pnoise analysis. However, phase noise simulation of VCOs through .pnoise yields single side-band phase noise  $^{35}$ . To calculate output phase noise, we multiply this single-sided spectrum by  $H_{n,PD} = \frac{N^2}{4A_{ref}^2}$ .

VCO phase noise is simulated using .pnoise analysis while ensuring sufficient number of sidebands for noise folding.

The spectral densities of these three simulations and the total noise at the PLL output are shown in Fig. 5.28 along with a comparison to the measured performance. The latter is discussed in more detail next.

<sup>35</sup>SSB phase noise is a double-sided spectrum. A single-sided noise spectrum is 3 dB higher than a double-sided spectrum. This is different from receiver noise terminology, where SSB refers to downconversion of RF located on only one side of the LO and DSB refers to downconversion with RF located on both sides of the LO. As such, SSB noise figures are 3 dB higher than DSB noise figures.



Figure 5.29: Phase noise corresponding to the best measured FoM<sub>VCO</sub> at 2.3 GHz ( $P_{dc} = 3.26mW$ ,  $\text{FoM}_{VCO,1MHz} = 186.7$ ) and 2.55 GHz ( $P_{dc} = 1.6mW$ ,  $\text{FoM}_{VCO,1MHz} = 184.2$ ).



Figure 5.30: Comparison of measured VCO performance with simulated phase noise at 2.3 GHz.

We have not simulated the effect of supply noise on the output phase noise. A battery provides the supply voltage of the VCO and the scanchain. E3632A Agilent supplies provide the bias for the digital supply and output buffer (without an additional voltage regulator on board).

#### 5.4.2 Measured Performances: VCO

The measured VCO performance is shown in Fig. 5.29. The best measured  $\text{FoM}_{VCO}$  is close to 186.7 dB at 1 MHz offset (thermal noise region) from a 2.3 GHz carrier, with a supply of 0.47 V and bias current of 6.93 mA. For the PLL data reported at 2.55 GHz, the VCO has a best measured  $\text{FoM}_{VCO}$  of 184.2 dB with a supply of 0.38 V and current of 4.21 mA. It is noted that the bias point and carrier frequency for optimal VCO phase noise FoM may not be the best for the overall loop jitter  $\text{FoM}_j$. A comparison with simulation at 2.3 GHz is shown in Fig. 5.30.
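The operating points quoted above can be sanity-checked numerically. The following sketch recomputes the DC power from the stated supply and bias current, and back-calculates the phase noise implied by each reported figure of merit. It assumes the commonly used oscillator figure of merit, $\text{FoM}_{VCO} = -\mathcal{L}(\Delta f) + 20\log_{10}(f_0/\Delta f) - 10\log_{10}(P_{dc}/1\,\mathrm{mW})$; the function names are illustrative and are not taken from this work.

```python
import math

def dc_power_mw(vdd_v, ibias_ma):
    """DC power in mW from the supply voltage (V) and bias current (mA)."""
    return vdd_v * ibias_ma

def implied_phase_noise_dbc_hz(fom_db, f0_hz, offset_hz, pdc_mw):
    """Phase noise implied by the assumed FoM definition (see lead-in)."""
    return 20 * math.log10(f0_hz / offset_hz) - 10 * math.log10(pdc_mw) - fom_db

# Operating points quoted in the text.
p_2g3 = dc_power_mw(0.47, 6.93)    # about 3.26 mW
p_2g55 = dc_power_mw(0.38, 4.21)   # about 1.6 mW

pn_2g3 = implied_phase_noise_dbc_hz(186.7, 2.3e9, 1e6, p_2g3)
pn_2g55 = implied_phase_noise_dbc_hz(184.2, 2.55e9, 1e6, p_2g55)

print(f"P_dc = {p_2g3:.2f} mW, implied L(1 MHz) = {pn_2g3:.1f} dBc/Hz at 2.3 GHz")
print(f"P_dc = {p_2g55:.2f} mW, implied L(1 MHz) = {pn_2g55:.1f} dBc/Hz at 2.55 GHz")
```

The recomputed powers agree with the $P_{dc}$ values given in the caption of Fig. 5.29; the implied phase noise values are only as good as the assumed FoM definition.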

#### 5.4.3 Measured Performances: RSPLL

The RSPLL is packaged in a QFN 48 package and mounted on a PCB for testing. A battery provides the supply voltage of the VCO and the scanchain. E3632A Agilent supplies provide the bias for the digital supply and output buffer (without an additional voltage regulator on board). The reference is a 50 MHz crystal from Wenzel Associates with a phase noise of -170 dBc/Hz at offsets above 10 kHz. Using a -165 dBc/Hz crystal as in [3, 104, 117] only degrades the  $\text{FoM}_j$  by 0.1 dB in simulation, and is not a concern. The phase noise and spurs are measured on an Agilent E4448A spectrum analyzer with phase noise personality option 226.

The measured performance at 2.55 GHz, when the VCO has a free running frequency of 2.544 GHz (that is, an initial frequency error of 6 MHz), is shown in Fig. 5.31. An integrated jitter of 109.63 fs is seen. The RSPLL shows a record  $\text{FoM}_j$  of -253.5 dB amongst explicit PLLs, with the lowest reference spur of  $-67 \, dBc$  for such a low jitter-power figure-of-merit. A sub-harmonic spur of -63 dBc is observed at 25 MHz due to half-rate multiplexing in the phase detector. It is expected that the 25 MHz spurs can be eliminated by using a sample-and-hold implementation. <sup>36</sup>
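As a quick cross-check of the reported jitter-power figure of merit, the sketch below evaluates the widely used definition $10\log_{10}\left[(\sigma_t/1\,\mathrm{s})^2 \cdot (P_{dc}/1\,\mathrm{mW})\right]$ for the measured jitter of 109.63 fs and the total DC power of 3.7 mW listed in Table 5.1. The definition is assumed here rather than quoted from this section, and the function name is illustrative.

```python
import math

def jitter_fom_db(sigma_t_s, pdc_mw):
    """Jitter-power figure of merit in dB (assumed standard definition)."""
    return 10 * math.log10(sigma_t_s ** 2 * pdc_mw)

# Measured integrated jitter and total DC power (VCO + loop) from Table 5.1.
fom_j_db = jitter_fom_db(109.63e-15, 3.7)
print(f"FoM_j = {fom_j_db:.1f} dB")
```

Under this assumed definition the result lands within 0.1 dB of the reported -253.5 dB.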

Apart from the match to simulated results, it is difficult to verify the observation that VCO noise dominates in the loop. Increasing the VCO power to lower its in-band noise contribution and observe the in-band contribution of the loop components is not possible in measurement, as the  $FoM_{VCO}$  worsens with increasing  $V_{dd}$  and it is not possible to lower the in-band VCO noise indefinitely.

<sup>36</sup>Intuitively, this may result in a slight increase in the reference spur due to charge injection which will not be eliminated by differential implementation if the Type-I loop locks to a non-zero condition.

Figure 5.31: Measured performance of the RSPLL at 2.55 GHz. The RSPLL shows a record FoM<sub>j</sub> of  $-253.5 \, dB$  amongst explicit PLLs, with the lowest reference spur of  $-67 \, dBc$  for such a low jitter-power figure-of-merit. The 25 MHz spur is a result of half-rate multiplexing, and is not intrinsic to the RSPLL architecture.

Figure 5.32: Measured performance across carrier frequency across three different samples.

The measured performance across carrier frequency (for about the same initial frequency error in each case) is shown across three different samples in Fig. 5.32. Note that the voltages on  $V_{ddTUNE}$  and  $V_{ddCM}$  both of which control the position of the reference buffer edge (RefBUFF) <sup>37</sup> with respect to the reference zero-crossing were only calibrated once for this frequency range. <sup>38</sup>

As this is a Type-I implementation we must verify the conditions under which the RSPLL can be locked without an integral path. Typically before the loop fails catastrophically, the degradation will manifest in the jitter and spur performance. As such, a measurement of the PLL performance across VCO supply variation is shown in Fig. 5.33. In this variation the PLL can lock from a maximum error of 8 MHz from the desired integer-N multiple of the reference. This is fairly robust and comparable to the robustness of the proportional-integral dynamics of the ILCM+FTL in [4]. This measurement shows that the RSPLL does not need an additional integral path (FTL) to ensure lock at zero phase error for best performance or robustness.

<sup>37</sup>As discussed before in Section 5.3,  $V_{ddTUNE}$  controls the delay or advance of the reference buffer path and  $V_{ddCM}$  controls the bias of the input reference sinewave to the reference buffer, implicitly controlling the reference buffer delay or advance, although the latter can be shorted to  $V_{ddDIGITAL}$  in this implementation, and the bias controlled through a programmable resistive divider.

<sup>38</sup>Broadly speaking, as the spur is low across carrier frequency, this shows that the spur is not low at 2.55 GHz as a result of some coincidental cancellation in parasitic paths. The actual optimum may lie at a slightly different  $V_{ddTUNE}$  or  $V_{ddCM}$  setting, but largely this verifies the monotonicity argument and the irrelevance of the lock at zero voltage condition.



Figure 5.33: Measured performance across VCO supply voltage variation. The desired lock frequency is 2.55 GHz.

#### 5.5 Comparison



Figure 5.34: The RSPLL architecture combines the best aspects of subsampling PLL and ILCM architectures to show significant improvement in the jitter versus spur performance space.

A comparison with int-N PLLs and ILCMs exhibiting state-of-the-art jitter-power figure of merit is shown in Table 5.1. The work has the record jitter  $FoM_j$  and the lowest spur for low jitter performance. Fig. 5.34 shows how this work improves performance and achieves record numbers across architectures in the low-jitter versus low-spur performance space.

# 5.6 Future work: Loop Bandwidth Modification



Figure 5.35: Possible approach to modifying loop bandwidth without changing area (total sampling cap size remains same), power consumption (total switch size remains same) or output noise.

The unity gain bandwidth of the loop, and hence the closed loop PLL bandwidth, is  $\frac{A_{ref}K_{VCO}}{N}$ . For a fixed reference, this can only be modified by changing  $K_{VCO}$  which is also fixed. This is a usual problem in Type-I loops, where there is no easily modifiable parameter such as charge pump current in conventional Type-II loops (or SSPLL). To modify the loop gain, and hence the loop bandwidth, we propose the scheme in Fig. 5.35. The figure shows an example of  $3 \times$  loop bandwidth obtained by partitioning the same  $C_{samp}$  across three paths. The same sample is stored on the three capacitors in the sampling phase, and then summed to increase the loop gain by connecting the three capacitors in series in the hold phase. As a result, the overall area is not increased. Each switch is also a third of its original size, as its on resistance can be three times larger. The overall switch size remains the same, as does the power consumption in the clock buffer driving them. Finally, as shown in the figure, the noise contribution of the sampled phase detector remains the same.

In conclusion, the proposed loop bandwidth modification technique for Type-I sampled PLLs can modify bandwidth with the same area, noise or power consumption as the nominal bandwidth condition.
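The area, switch-size and noise argument can be illustrated with a small bookkeeping sketch. It is a simplified model rather than the circuit of Fig. 5.35: it assumes each partial capacitor carries an uncorrelated $kT/C$ sampling noise, that series stacking in the hold phase is lossless (no parasitic capacitance), and that each switch is scaled so the sampling time constant is preserved. The function name, the dictionary keys and the 1 pF example value are hypothetical.

```python
K_B = 1.380649e-23  # Boltzmann constant in J/K

def partitioned_sampler(c_samp_f, m, r_on_ohm=1.0, temp_k=300.0):
    """Bookkeeping for C_samp split into m equal paths, stacked in series on hold."""
    c_unit = c_samp_f / m                        # each path samples on C_samp / m
    gain = m                                     # m equal samples add up in series
    noise_var_out = m * K_B * temp_k / c_unit    # m uncorrelated kT/C_unit terms add
    noise_var_in = noise_var_out / gain ** 2     # referred back to one sampled voltage
    r_unit = r_on_ohm * m                        # each switch is 1/m the size
    return {
        "gain": gain,
        "total_cap_f": m * c_unit,               # total area is unchanged
        "noise_var_in": noise_var_in,
        "tau_s": r_unit * c_unit,                # sampling time constant per path
    }

nominal = partitioned_sampler(1e-12, 1)
tripled = partitioned_sampler(1e-12, 3)
print(tripled["gain"], tripled["noise_var_in"] / nominal["noise_var_in"])
```

In this model the loop gain scales with the number of partitions while the input-referred sampling noise, total capacitance and sampling time constant stay at their nominal values, which is the behaviour claimed for the proposed scheme.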

Table 5.1: Comparison of RSPLL with state-of-the-art integer-N frequency synthesizers

|                                      | Gao [3], VLSI '10                  | Gao [117], ISSCC '10                | Helal [128], JSSC '09   | Elkholy [123], ISSCC '16     | This Work                                          |
|--------------------------------------|------------------------------------|-------------------------------------|-------------------------|------------------------------|----------------------------------------------------|
| Architecture                         | SSPLL with no VCO isolation buffer | SSPLL with spur cancellation scheme | Pulse-injection locking | Frac-N ILCM with freq. track | RSPLL                                              |
| Output freq. (GHz)                   | 2.21                               | 2.21                                | 3.2                     | 6.75 - 8.25 (20%)            | 2.05 - 2.55 (21.7%)                                |
| Mul. factor N                        | 40                                 | 40                                  | 64                      | 64                           | 50                                                 |
| PN @ 200 kHz <sup>(1)</sup> (dBc/Hz) | -125                               | -121                                | -119.2 <sup>(2)</sup>   | -122.2 <sup>(2)</sup>        | -122.8                                             |
| PN @ 1 MHz <sup>(1)</sup> (dBc/Hz)   | -124 <sup>(2)</sup>                | -120.1 <sup>(2)</sup>               | -130.2                  | -126.2                       | -125.2                                             |
| Int. jitter (fs)                     | 160 (10k-100M)                     | 300 (10k-100M)                      | 130 (100-40M)           | 104 (10k-30M)                | 110 (10k-100M)                                     |
| Ref. Spur (dBc)                      | -56                                | -80                                 | -63.9                   | -43                          | -67 @ $f_{ref}$, -63 @ $f_{ref}/2$ <sup>(3)</sup>  |
| B/W frac                             | $f_{ref}/20$                       | $f_{ref}/20$                        |                         | $< f_{ref}/50$               | $f_{ref}/30$                                       |
| DC Power (mW) (VCO + Loop)           | 2.5 (1.8+0.7)                      | 3.8 (1.8+2)                         | 28.6 (N/R)              | 2.25 (2.2+0.45)              | 3.7 (1.6+1.1)                                      |
| Area ($mm^2$)                        | 0.2 <sup>(4)</sup>                 | 0.2 <sup>(4)</sup>                  | 0.4 <sup>(5)</sup>      | 0.27 <sup>(6)</sup>          | 0.36 <sup>(7)</sup> [VCO: 0.3, Loop: 0.06]         |

<sup>(1)</sup> Normalized to 2.21 GHz center frequency.

<sup>(2)</sup> From the measurement figure in the paper.

<sup>(3)</sup> Due to half-rate multiplexing and not intrinsic to RSPLL architecture.

<sup>(4)</sup> VCO area is dominated by inductor and has limited tuning range.

<sup>(5)</sup> External loop filter.

<sup>(6)</sup> VCO tuning range digital bank is implemented with MOS caps.

<sup>(7)</sup> VCO has 21.7% tuning range implemented with MoM cap. Ind. area = Cap bank area = 0.15 mm<sup>2</sup>.

# Chapter 6

# Wide Bandwidth Electro-optic PLLs for FMCW LIDAR

Electro-optic PLLs have become quite popular, as they leverage the intensive integrated processing of electronics to control performance of optical components and provide robust functionality in changing ambient conditions. Optical phased arrays with beam steering capability have found use in optical communications, and imaging and ranging systems. In [129], the authors lock several inexpensive and noisy high power laser sources to a single clean source through an electro-optic PLL. In [130], the authors introduce a tunability mechanism between the clean-reference and controlled-noisy laser sources, which allows an additional RF source to modulate the latter's locked center frequency. In [131], the authors modulate the laser frequency electronically for a very high resolution FMCW 3-D imaging system.

This chapter focuses on electro-optic PLLs for light based ranging systems, known as LIDARs (Light Detection And Ranging), which have garnered increased attention in autonomous system applications. Silicon-based optical phased arrays with solid-state beam steering which can be integrated with CMOS-based electronics are being explored as they can generate very narrow beam-widths with smaller apertures than RF or microwave systems [132–134]. An advantage of silicon-based systems is that apart from the electronics required to control the optical beam, EO-PLLs that control the laser waveform and increase ranging precision can also be integrated into the LIDAR system. However, in a power-starved free-space application, shunting laser power to the EO-PLL is burdensome, and we discuss below how discrete-time EO-PLLs exacerbate the problem especially in low chirp-bandwidth applications. We describe an alternate continuous-time approach to the EO-PLL that relaxes the specification for the optical components in terms of area and power loss.

We first discuss the theory of FMCW detection, the photo-electric interface and then present the proposed PLL implementation. The chapter concludes with short- and long-term future work for electro-optic PLLs.

#### 6.1 Theory of FMCW detection

In typical pulse-based radar, the delay between transmitting and receiving a narrow pulse is used to determine object distance. FMCW is an alternate approach to ranging and detection which has recently become very popular in a host of applications including automotive and fine resolution 3-D imaging. Here, a continuous wave source with a varying frequency illuminates the object, and the received signal is mixed with the instantaneous frequency of the source. The resulting beat waveform contains information on both the distance of the object and the velocity of a moving object.

Fig. 6.1 shows the FMCW concept with a triangular chirp modulation where the chirp bandwidth is B and the up- or down-ramp time is  $T_{ramp} = 1/f_{ramp}$, so that the repetition rate of the triangular chirp is  $f_{ramp}/2$. The slope of the ramp is given by  $\gamma = \frac{B}{T_{ramp}}$. For a stationary object, the received waveform is just a delayed version of the transmitted one. For a moving object, the received waveform is shifted in time and frequency due to the Doppler effect.

# Stationary objects

Distance of stationary object

$$\Delta t = \frac{2D}{c} \tag{6.1}$$

$$\frac{\Delta f}{\Delta t} = \frac{B}{T_{ramp}} \tag{6.2}$$

$$D = \frac{c\Delta f T_{ramp}}{2B} \tag{6.3}$$

The ranging resolution (the smallest distance that can be detected, or the smallest distance between two distinct stationary objects) depends on the lowest beat frequency that can be detected. The lowest frequency must have one complete cycle in  $T_{ramp}$  to be counted correctly. This will be the beat frequency between received signals from two different stationary objects. It is also the closest an object can be placed to the transmitter.

Figure 6.1: FMCW with triangular chirp. (a) Stationary object. (b) Max. range for stationary object. (c) Moving object and Doppler shift. (d) Max. velocity for moving object at a given distance.

$$\Delta f_{min} = \frac{1}{T_{ramp}} = f_{ramp} \tag{6.4}$$

$$D_{min} = \frac{c}{2B} \tag{6.5}$$

The range of the LIDAR, that is the maximum distance that can be detected, is based on Fig. 6.1 and occurs when  $\Delta f_{max}$  is B. This beat frequency only lasts for a very short time, and represents the theoretical limit. Beyond this beat frequency the results are corrupted. The coherence length of the laser also limits the maximum range that can be measured.

$$\Delta f_{max} = B \tag{6.6}$$

$$D_{max} = \frac{cT_{ramp}}{2} \tag{6.7}$$
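The stationary-object relations in Eqs. 6.3, 6.5 and 6.7 can be evaluated directly. The sketch below does so for $B = 3\,\mathrm{GHz}$ and $T_{ramp} = 0.3\,\mu\mathrm{s}$, the chirp parameters quoted for this project in Section 6.4.1; the function names are illustrative.

```python
C_LIGHT = 299_792_458.0  # speed of light in m/s

def fmcw_range_m(beat_hz, bandwidth_hz, t_ramp_s):
    """Distance of a stationary object from its beat frequency (Eq. 6.3)."""
    return C_LIGHT * beat_hz * t_ramp_s / (2.0 * bandwidth_hz)

def range_resolution_m(bandwidth_hz):
    """Smallest resolvable distance (Eq. 6.5)."""
    return C_LIGHT / (2.0 * bandwidth_hz)

def max_range_m(t_ramp_s):
    """Theoretical maximum range, reached when the beat frequency equals B (Eq. 6.7)."""
    return C_LIGHT * t_ramp_s / 2.0

B = 3e9          # chirp bandwidth in Hz
T_RAMP = 0.3e-6  # up- or down-ramp time in s

d_min = range_resolution_m(B)
d_max = max_range_m(T_RAMP)
print(f"resolution = {d_min * 100:.1f} cm, max range = {d_max:.1f} m")
```

Evaluating Eq. 6.3 at the two limiting beat frequencies, $1/T_{ramp}$ and $B$, reproduces the same two numbers, which is a useful consistency check on the derivation.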

Note that the sampling rate of the post-processing ADC is determined by the maximum possible beat frequency. According to the Nyquist criterion, this should be  $2 \times B$ . The number of bits N is determined by the requirement that sampling must complete within one  $T_{ramp}$ .

$$\text{Sampling time} = 2^N \cdot \frac{1}{f_{sample}} \le T_{ramp} \tag{6.8}$$

$$\text{Min. FFT resolution} = \frac{1}{2^N} \cdot f_{sample} \ge \frac{1}{T_{ramp}} \tag{6.9}$$

This means that if the sampling time is limited to  $T_{ramp}$ , and the number of bits N is high enough, the minimum frequency that can be resolved by the FFT is  $1/T_{ramp}$  as was shown in Eq. 6.4 as well, where we determined that at least one complete cycle must occur in  $T_{ramp}$  to be counted.
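A minimal sketch of this sampling budget follows, assuming that $2^N$ samples are captured per ramp as in Eq. 6.8 and that the ADC runs at the Nyquist rate $2B$. The chirp values are those quoted for this project in Section 6.4.1; the function name and return layout are illustrative.

```python
import math

def adc_plan(bandwidth_hz, t_ramp_s):
    """Largest N such that 2**N samples at f_sample = 2B fit inside one ramp."""
    f_sample = 2.0 * bandwidth_hz                           # Nyquist rate for beat up to B
    n = math.floor(math.log2(f_sample * t_ramp_s))          # Eq. 6.8 solved for N
    capture_time = 2 ** n / f_sample                        # must not exceed T_ramp
    fft_resolution = f_sample / 2 ** n                      # Eq. 6.9
    return f_sample, n, capture_time, fft_resolution

B = 3e9
T_RAMP = 0.3e-6
f_sample, n, capture_time, fft_resolution = adc_plan(B, T_RAMP)
print(f_sample, n, capture_time, fft_resolution)
```

Because $N$ must be an integer, the capture window is generally somewhat shorter than $T_{ramp}$, so the achievable FFT resolution is slightly coarser than the $1/T_{ramp}$ bound.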

# Moving objects

A moving object shifts the received profile in both time and frequency.

$$f_L = \left[ f_c - \gamma \frac{T_{ramp}}{2} + \gamma \Delta t \right] - \left[ f_c - \gamma \frac{T_{ramp}}{2} \right] \cdot \left(1 + \frac{v}{c}\right) \tag{6.11}$$

$$f_H = -\left[f_c + \gamma \frac{T_{ramp}}{2} - \gamma \Delta t\right] + \left[f_c + \gamma \frac{T_{ramp}}{2}\right] \cdot (1 + \frac{v}{c}) \tag{6.12}$$

Here v is the velocity of the object <sup>1</sup>,  $f_c$  is the center frequency of the ramp.  $f_L$  and  $f_H$  are the low and high beat frequencies.  $f_L$  and  $f_H$  appear in the up and down ramp respectively for an object moving away, and vice versa for an object moving towards the LIDAR source.

As the chirp bandwidth B is a very small fraction of the center frequency  $f_c$  we make the following approximation

$$f_c - \gamma \frac{T_{ramp}}{2} \approx f_c \approx f_c + \gamma \frac{T_{ramp}}{2} \tag{6.13}$$

We replace the terms and obtain

$$f_L = [f_c + \gamma \Delta t] - f_c \cdot (1 + \frac{v}{c}) \tag{6.14}$$

$$f_H = -\left[f_c - \gamma \Delta t\right] + f_c \cdot \left(1 + \frac{v}{c}\right) \tag{6.15}$$

Adding and subtracting, we obtain expressions for the distance and velocity of the object

$$D = \frac{cT_{ramp}}{2B} \frac{f_H + f_L}{2} \tag{6.16}$$

$$v = \frac{c}{2f_c}(f_H - f_L) \tag{6.17}$$

As before, the smallest frequency that can be detected is  $1/T_{ramp}$ . This is the minimum for  $(f_H - f_L)$ , and the slowest detectable speed is

$$(f_H - f_L)_{min} = \frac{1}{T_{ramp}} \tag{6.18}$$

$$v_{min} = \frac{c}{2f_c T_{ramp}} \tag{6.19}$$
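The step from the two beat frequencies to distance and velocity can be checked with a round trip: generate $f_L$ and $f_H$ from Eqs. 6.14 and 6.15, then recover $D$ and $v$ with Eqs. 6.16 and 6.17. In the sketch below the chirp parameters are those of Section 6.4.1, while the optical carrier of 193.4 THz (a 1550 nm laser) and the 10 m, 5 m/s target are assumed values chosen only for illustration.

```python
C_LIGHT = 299_792_458.0  # speed of light in m/s

def beat_frequencies(distance_m, velocity_mps, bandwidth_hz, t_ramp_s, f_c_hz):
    """Forward model of Eqs. 6.14 and 6.15: returns (f_L, f_H) in Hz."""
    gamma = bandwidth_hz / t_ramp_s              # chirp slope in Hz/s
    delay = 2.0 * distance_m / C_LIGHT           # round-trip delay, Eq. 6.1
    doppler = f_c_hz * velocity_mps / C_LIGHT    # f_c * v / c term
    return gamma * delay - doppler, gamma * delay + doppler

def distance_and_velocity(f_low_hz, f_high_hz, bandwidth_hz, t_ramp_s, f_c_hz):
    """Inverse model: Eq. 6.16 for distance and Eq. 6.17 for velocity."""
    distance = C_LIGHT * t_ramp_s / (2.0 * bandwidth_hz) * (f_high_hz + f_low_hz) / 2.0
    velocity = C_LIGHT / (2.0 * f_c_hz) * (f_high_hz - f_low_hz)
    return distance, velocity

def min_velocity_mps(f_c_hz, t_ramp_s):
    """Slowest detectable speed (Eq. 6.19)."""
    return C_LIGHT / (2.0 * f_c_hz * t_ramp_s)

B, T_RAMP = 3e9, 0.3e-6
F_C = 193.4e12  # assumed optical carrier (hypothetical, not from the text)

f_l, f_h = beat_frequencies(10.0, 5.0, B, T_RAMP, F_C)
d_est, v_est = distance_and_velocity(f_l, f_h, B, T_RAMP, F_C)
v_min = min_velocity_mps(F_C, T_RAMP)
print(d_est, v_est, v_min)
```

The sum of the beat frequencies carries only the range term and the difference carries only the Doppler term, which is why the two quantities separate cleanly.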

Finally, the situation for maximum detectable velocity is shown in Fig. 6.1. The maximum detectable velocity depends on the distance of the object: it is highest for the furthest object and lowest for the closest object.

<sup>1</sup>The Doppler effect shifts a frequency as  $f_D = \frac{c+v_r}{c+v_s} f_T$  where  $v_r$  and  $v_s$  are object and source velocities respectively.  $f_T$  is the original frequency and  $f_D$  is the Doppler shifted version.

Figure 6.2: EO-PLL photoelectric interface.

$$f_{c} - \gamma \frac{T_{ramp}}{2} + \gamma \Delta t = \left[ f_{c} - \gamma \frac{T_{ramp}}{2} \right] \cdot \left( 1 + \frac{v_{max}}{c} \right)$$

$$f_{c} + \gamma \Delta t \approx f_{c} \cdot \left( 1 + \frac{v_{max}}{c} \right)$$

$$\frac{B}{T_{ramp}} \cdot \frac{2D}{c} = f_{c} \cdot \frac{v_{max}}{c}$$

$$v_{max} = \frac{2BD}{f_{c}T_{ramp}} \tag{6.21}$$
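Eq. 6.21 corresponds to the velocity at which the low beat frequency of Eq. 6.14 collapses to zero. The sketch below checks this numerically; the chirp parameters are those of Section 6.4.1, and the 193.4 THz carrier and 10 m distance are assumed values used only for illustration.

```python
C_LIGHT = 299_792_458.0  # speed of light in m/s

def max_velocity_mps(distance_m, bandwidth_hz, t_ramp_s, f_c_hz):
    """Maximum detectable velocity at a given distance (Eq. 6.21)."""
    return 2.0 * bandwidth_hz * distance_m / (f_c_hz * t_ramp_s)

def low_beat_hz(distance_m, velocity_mps, bandwidth_hz, t_ramp_s, f_c_hz):
    """Low beat frequency f_L from Eq. 6.14."""
    gamma = bandwidth_hz / t_ramp_s
    range_term = gamma * 2.0 * distance_m / C_LIGHT
    return range_term - f_c_hz * velocity_mps / C_LIGHT

B, T_RAMP = 3e9, 0.3e-6
F_C = 193.4e12  # assumed optical carrier (hypothetical)

v_max_10m = max_velocity_mps(10.0, B, T_RAMP, F_C)
f_l_at_vmax = low_beat_hz(10.0, v_max_10m, B, T_RAMP, F_C)
print(v_max_10m, f_l_at_vmax)
```

The limit scales linearly with distance, consistent with the statement that the furthest object tolerates the highest velocity.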

# 6.2 Photo-electric interface

In order to control the laser frequency through a stable low frequency reference, we must first discuss the photo-electric interface, shown in Fig. 6.2. The laser frequency is modulated by changing its bias current<sup>2</sup> and the laser source can be modeled as a current controlled oscillator (CCO).

$$H_{laser}(s) = \frac{K_{CCO}}{s} \text{ rad/Amp} \tag{6.22}$$

To detect the laser modulation a Mach-Zehnder Interferometer (MZI) is used. This consists of combining the laser's instantaneous frequency with a delayed version of itself in an optical coupler.<sup>3</sup> The combined light is then incident on a photodiode which generates a current proportional to the power incident on it. The quadratic relationship between light power and amplitude generates a frequency at twice the laser center frequency, and a low frequency component.

<sup>2</sup>Carrier injection in diode laser.

<sup>3</sup>This is actually the same mathematical operation discussed in the FMCW concept. In the imaging path, an MZI is used for performing the FMCW ranging function.

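$$\begin{aligned}
i_{PD}(t) &= \left[\cos\left(\omega t + 0.5\frac{\gamma}{2\pi}t^{2}\right) + \cos\left(\omega(t - \tau_{MZI}) + 0.5\frac{\gamma}{2\pi}(t - \tau_{MZI})^{2}\right)\right]^{2} \\
&= \cos^{2}\left(\omega t + 0.5\frac{\gamma}{2\pi}t^{2}\right) + \cos^{2}\left(\omega(t - \tau_{MZI}) + 0.5\frac{\gamma}{2\pi}(t - \tau_{MZI})^{2}\right) \\
&\quad + 2\cos\left(\omega t + 0.5\frac{\gamma}{2\pi}t^{2}\right)\cos\left(\omega(t - \tau_{MZI}) + 0.5\frac{\gamma}{2\pi}(t - \tau_{MZI})^{2}\right) \\
&= DC + \frac{1}{2}\cos\left(2\omega t + \frac{\gamma}{2\pi}t^{2}\right) + \frac{1}{2}\cos\left(2\omega(t - \tau_{MZI}) + \frac{\gamma}{2\pi}(t - \tau_{MZI})^{2}\right) \\
&\quad + \cos\left(2\omega t + \frac{\gamma}{2\pi}t^{2} - \left(\frac{\gamma}{2\pi}\tau_{MZI}\right)t - \omega\tau_{MZI} + 0.5\frac{\gamma}{2\pi}\tau_{MZI}^{2}\right) \\
&\quad + \underbrace{\cos\left(\left(\frac{\gamma}{2\pi}\tau_{MZI}\right)t + \omega\tau_{MZI} - 0.5\frac{\gamma}{2\pi}\tau_{MZI}^{2}\right)}_{\text{relevant component}}
\end{aligned} \tag{6.24}$$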

Here  $\gamma$  is the slope of the triangular modulation, such that the phase of the source ramps at  $\gamma t^2$ , and  $\tau_{MZI}$  is the MZI delay. The high frequency components are filtered out. If the photodiode has a low enough capacitance, it will generate an AC current at a frequency of  $\gamma \tau_{MZI}$  Hz. If the laser modulation is stable and well-controlled, this frequency should not change with time.

For small delays, the MZI along with the filtering of high frequency components is a delay discriminator, that is an inexact differentiator with a gain of  $\tau_{MZI}$ . The transfer function for the MZI can be modeled as follows for small  $\tau_{MZI}$ 

$$i_{PD}(j\omega) = j\omega\tau_{MZI} \tag{6.25}$$
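The key property of the interface, that a stable linear chirp produces a constant photodiode frequency of $\gamma \tau_{MZI}$, can be illustrated by differencing the instantaneous frequency of the chirp and of its delayed copy. The sketch below works in terms of frequency rather than phase, with $\gamma$ in Hz/s. The chirp slope follows from the project values in Section 6.4.1, the delay of $100/B$ is used as an example, and the 193.4 THz start frequency is an assumed value.

```python
def chirp_frequency_hz(t_s, f0_hz, gamma_hz_per_s):
    """Instantaneous frequency of a linear up-ramp that starts at f0."""
    return f0_hz + gamma_hz_per_s * t_s

def mzi_beat_hz(t_s, tau_s, f0_hz, gamma_hz_per_s):
    """Frequency difference between the direct and the delayed MZI arm."""
    direct = chirp_frequency_hz(t_s, f0_hz, gamma_hz_per_s)
    delayed = chirp_frequency_hz(t_s - tau_s, f0_hz, gamma_hz_per_s)
    return direct - delayed

GAMMA = 3e9 / 0.3e-6   # chirp slope B / T_ramp in Hz/s
TAU_MZI = 100 / 3e9    # example MZI delay of 100 / B, in seconds
F0 = 193.4e12          # assumed optical start frequency (hypothetical)

# Sample the beat at three instants inside one up-ramp.
beats = [mzi_beat_hz(t, TAU_MZI, F0, GAMMA) for t in (0.05e-6, 0.15e-6, 0.25e-6)]
print(beats)
```

The beat does not depend on the sampling instant or on the optical start frequency, only on the product of slope and delay. Any drift in the slope therefore shows up directly as a frequency error at the photodiode.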

# 6.3 EO-PLL Basics

We can use the observation, that the frequency of the photodiode output current is constant under a stable chirp modulation, to motivate the electro-optic PLL architecture. By comparing the frequency of the photodiode current to a clean reference crystal we can provide an additive correction to the nominal ramping modulation current of the laser. This is shown conceptually in Fig. 6.3.

The loop gain of the EO-PLL is

$$L(s) = K_{PD} \cdot \frac{K_i}{s} \cdot \frac{K_{CCO}}{s} \cdot s\tau_{MZI} \tag{6.26}$$

The additional integrator  $\frac{K_i}{s}$  is used to cancel the zero of the MZI.  $K_{PD}$  is the phase detector gain. Note that the responsivity of the photodiode and TIA gain are important factors in determining signal detection. However, they do not affect the phase transfer function and are not included in L(s).

Figure 6.3: EO-PLL block diagram.

The loop can also be a Type-II loop, where we use a PFD followed by a charge pump with the loop gain

$$L(s) = K_{PD} \cdot \frac{I_{cp}(1 + sRC)}{2\pi C} \cdot \frac{K_i}{s} \cdot \frac{K_{CCO}}{s} \cdot s\tau_{MZI} \tag{6.27}$$

One such Type-II implementation was demonstrated recently by Behroozpour et al. in [131].

#### Periodic reacquisition and loop BW

It should be noted that periodically the photodiode output undergoes a 180° phase shift as the slope of the laser signal changes from  $+\gamma$  to  $-\gamma$  and back. The time domain response of the photodiode for a stable chirp is shown in Fig. 6.3. The frequency remains constant for some time at  $\gamma \tau_{MZI}$, then slows down and falls to zero and starts up again with a phase inversion. This means that every  $T_{ramp}$, the PLL must reacquire. To ensure that the time for reacquisition,  $t_{settling}$, takes a very small fraction of  $T_{ramp}$, the following condition is imposed

$$t_{settling} \approx \frac{1}{f_{BW}} \le 0.1\, T_{ramp} \quad \text{or} \quad f_{BW} \ge 10 \times f_{ramp} \tag{6.28}$$

where  $f_{BW}$  is EO-PLL bandwidth.

It should be noted that for each  $T_{ramp}$, the loop can reacquire from a different initial condition, and the reacquisition waveform and time can vary from cycle to cycle. In [131], the authors have included a retiming mechanism where the photodiode output is retimed with the reference, so that the initial condition is identical for each  $T_{ramp}$  cycle, as is the reacquisition process.

#### Sign inversion

The EO-PLL for FMCW requires a periodic inversion of the feedback sign to maintain stability. This is explained as follows. Consider a Type-II PLL with a phase-and-frequency detector (PFD). If the modulation slope  $\gamma$  is exact, the photodiode output frequency is  $\gamma \tau_{MZI}$  which matches the reference exactly and the control voltage output is zero. If on the up-ramp there is a positive error  $\gamma + \delta \gamma$, the photodiode frequency is higher than the reference and the control voltage is negative,  $-V_{cont}$. This subtracts in the adder shown in Fig. 6.3 and attenuates  $+\delta \gamma$. A similar process occurs for  $-\delta \gamma$. However, in the down ramp with slope  $-\gamma$, a positive error  $+\delta \gamma$  manifests as a reduced frequency  $(\gamma - \delta \gamma)\tau_{MZI}$, and the control voltage from the frequency detector is  $+V_{cont}$, exacerbating the modulation slope error. A similar process occurs for  $-\delta \gamma$. The loop goes into positive feedback for the down-ramp.

For this reason, the sign of the control voltage  $V_{cont}$  from the P/FD should be periodically inverted and synched to the up/down ramp generator.
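The sign argument can be illustrated with a toy iteration. This is a normalized, first-order model rather than the actual loop dynamics: the slope error is updated once per correction by an amount proportional to the detected frequency error, and `loop_gain`, the step count and the function name are hypothetical.

```python
def slope_error_after(n_steps, delta0, ramp_sign, invert_on_down, loop_gain=0.5):
    """Normalized slope error after n_steps corrections.

    ramp_sign is +1 on the up-ramp and -1 on the down-ramp.
    """
    delta = delta0
    for _ in range(n_steps):
        # The photodiode sees the slope magnitude, so a positive slope error
        # raises the beat frequency on the up-ramp and lowers it on the down-ramp.
        freq_err = ramp_sign * delta
        # The detector output is negative when the beat is above the reference.
        v_cont = -loop_gain * freq_err
        if invert_on_down and ramp_sign < 0:
            v_cont = -v_cont  # periodic sign inversion synced to the ramp
        delta += v_cont       # the adder applies the correction to the slope
    return delta

up = slope_error_after(10, 1.0, +1, invert_on_down=True)
down_fixed = slope_error_after(10, 1.0, -1, invert_on_down=True)
down_broken = slope_error_after(10, 1.0, -1, invert_on_down=False)
print(up, down_fixed, down_broken)
```

With the inversion enabled the error shrinks on both ramps, while without it the down-ramp error grows at every step, which is the positive feedback described above.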

#### Loop stability and reference frequency

Due to their discrete nature, PLLs only correct phase error every reference cycle at rate  $f_{ref}$ . The error in the phase continues to accumulate between two corrections. The high frequency drift in error is attenuated by the loop. If the error that passes through the low pass filtering action of L(s) accumulates faster than the correction rate, the loop will not lock. To ensure stability, the discrete time nature of the PLL should be masked by imposing the so-called Gardner limit  $f_{BW} = 1/5 \cdot f_{ref}$ .

Approximated analog domain analysis of Type-I loops confirms that very high bandwidths can also result in unacceptable phase margin. Typically

$$f_{BW} = 0.1 f_{ref}$$

$$f_{ref} = 10 \cdot f_{BW} = 100 \cdot f_{ramp} \quad \text{for Type-I} \tag{6.29}$$

Type-II PLLs have high spurs and typically  $f_{BW} = 1/20 \cdot f_{ref}$  to attenuate spurs. In Type-II loops, analog domain analysis confirms that a bandwidth much higher than the 1/RC zero is good for stability, as it serves to make the phase margin closer to 90°. This competes with the Gardner limit and the spur attenuation requirements.

$$f_{BW} = 0.05 f_{ref}$$

$$f_{ref} = 20 \cdot f_{BW} = 200 \cdot f_{ramp} \quad \text{for Type-II} \tag{6.30}$$

For a given  $f_{ramp}$ , calculated from application specifications, faster periodic reacquisition requires a higher bandwidth  $f_{BW}$ , and a higher reference frequency  $f_{ref}$  due to the stability considerations just discussed. Chirp bandwidth B does not affect these timing considerations.
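The chain from ramp time to loop bandwidth to reference frequency in Eqs. 6.28 to 6.30 can be written as a small planning helper. The sketch uses the 0.1 settling fraction and the reference-to-bandwidth ratios of 10 (Type-I) and 20 (Type-II) from the text; the function name, argument names and return layout are illustrative, and the ramp time is the project value from Section 6.4.1.

```python
def eo_pll_timing(t_ramp_s, loop_type="I", settle_fraction=0.1):
    """Return ramp rate, minimum loop bandwidth and reference frequency in Hz."""
    if loop_type not in ("I", "II"):
        raise ValueError("loop_type must be 'I' or 'II'")
    f_ramp = 1.0 / t_ramp_s
    f_bw = f_ramp / settle_fraction             # Eq. 6.28 with equality
    ref_to_bw = 10 if loop_type == "I" else 20  # Eq. 6.29 or Eq. 6.30
    return {"f_ramp": f_ramp, "f_bw": f_bw, "f_ref": ref_to_bw * f_bw}

T_RAMP = 0.3e-6
type1 = eo_pll_timing(T_RAMP, "I")
type2 = eo_pll_timing(T_RAMP, "II")
print(type1["f_ref"] / 1e6, type2["f_ref"] / 1e6)  # reference frequencies in MHz
```

Note that the chirp bandwidth does not appear anywhere in the helper, in line with the observation that B does not affect these timing considerations.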

#### MZI delay and area

Once the chirp repetition rate  $f_{ramp}/2$  and chirp bandwidth B, and hence the modulation slope  $\gamma$  are fixed by the application, the MZI delay  $\tau_{MZI}$  must be modified till the photodiode output frequency under lock,  $\gamma \tau_{MZI}$ , equals the chosen  $f_{ref}$ .

The relationship between $f_{ref}$ and $f_{BW}$ is fixed to mask the discrete nature of the PLL. However, a faster reacquisition specification results in a larger $f_{ref}$, as in Eq. 6.29 and 6.30, and hence a larger MZI delay $\tau_{MZI}$ for a given chirp bandwidth B and ramp rate $f_{ramp}/2$. For the Type-I case, the MZI delay is then given by

$$\gamma \tau_{MZI} = f_{ref}$$

$$\tau_{MZI} = f_{ref} \cdot \frac{T_{ramp}}{B} = \frac{100}{T_{ramp}} \cdot \frac{T_{ramp}}{B} \tag{6.31}$$

$$\tau_{MZI} = \frac{100}{B} \tag{6.32}$$

#### Loop delay

In the presence of electrical and optical delay in the EO-PLL, phase margin and stability are compromised. For example, in a Type-I PLL the open loop gain in the presence of delay  $T_d$  is

$$L(s) = K_{PD} \cdot \frac{K_{VCO,eq}}{s} \cdot e^{-sT_d} \tag{6.33}$$

At  $f_{UGF}$ , which is equal to  $f_{BW}$  for first order behavior, this yields a phase margin of

$$\begin{split} PM &= \left(-90^{\circ} - 2\pi f_{BW} \cdot T_d \cdot \frac{180}{\pi}\right) + 180^{\circ} \\ &= 90^{\circ} - 2\pi \cdot \frac{10}{T_{ramp}} \cdot T_d \cdot \frac{180}{\pi} \end{split} \tag{6.34}$$

For a phase margin of  $60^{\circ}$ , the maximum optical and electrical interconnect delay can be  $T_{ramp}/120$ .
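The delay budget implied by Eq. 6.34 can be evaluated directly. The sketch below assumes the first-order loop of Eq. 6.33, where the only phase contributions at the unity-gain frequency are the integrator and the pure delay; the function names are hypothetical and the ramp time is an illustrative value.

```python
def phase_margin_deg(f_bw_hz, delay_s):
    """Phase margin of a first-order loop with pure delay (Eq. 6.34)."""
    # 2*pi*f*T_d radians converted to degrees is 360*f*T_d.
    return 90.0 - 360.0 * f_bw_hz * delay_s


def max_delay_s(f_bw_hz, target_pm_deg):
    """Largest loop delay that still meets the target phase margin."""
    return (90.0 - target_pm_deg) / (360.0 * f_bw_hz)


t_ramp = 0.3e-6                 # illustrative ramp time in seconds
f_bw = 10.0 / t_ramp            # bandwidth for settling in a tenth of the ramp
t_d_max = max_delay_s(f_bw, 60.0)
print(t_d_max, t_ramp / 120.0)  # both evaluate to the same delay
```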

#### 6.4 Proposed EO-PLL for FMCW LIDAR

#### 6.4.1 Motivation: Reduction in MZI delay and implementation area

In this project,  $B=3\,\mathrm{GHz}$ , chirp repetition rate is 1.67 MHz or up/down ramp time is  $T_{ramp}=0.3\mu\mathrm{sec}$ , giving a  $\gamma$  of  $10\,\mathrm{GHz}/\mu\mathrm{sec}$ . If the settling time is a tenth of the ramp time, and a Type-I EO-PLL with bandwidth a tenth of the reference is chosen,  $f_{ref}=100\times f_{ramp}=333.33\,\mathrm{MHz}$ . This yields an MZI delay of

$$\gamma \tau_{MZI} = f_{ref}$$

$$\tau_{MZI} = f_{ref} \cdot \frac{T_{ramp}}{B} = \frac{100}{T_{ramp}} \cdot \frac{T_{ramp}}{B}$$

$$\tau_{MZI} = \frac{100}{B} = 33.33\,\mathrm{ns}$$

An MZI delay of 33.33 ns corresponds to an optical waveguide about 1 m long, which is impractical in integrated photonics.
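The arithmetic behind this delay is reproduced below as a minimal sketch, using only the chirp specifications quoted in this subsection. The function name `mzi_delay_s` is hypothetical.

```python
def mzi_delay_s(chirp_bw_hz, t_ramp_s, f_ref_hz):
    """MZI delay that makes the locked photodiode frequency equal f_ref."""
    gamma = chirp_bw_hz / t_ramp_s      # modulation slope in Hz/s
    return f_ref_hz / gamma             # from gamma * tau_MZI = f_ref


B = 3e9                  # chirp bandwidth in Hz
T_RAMP = 0.3e-6          # up/down ramp time in seconds
f_ref = 100.0 / T_RAMP   # Type-I choice: f_ref = 100 * f_ramp

tau = mzi_delay_s(B, T_RAMP, f_ref)
print(f_ref / 1e6, tau * 1e9)   # reference in MHz, delay in ns
```

Because the ramp time cancels, the result depends only on B and on the number of reference cycles per ramp; evaluating the same function with `f_ref = 10.0 / T_RAMP` gives one tenth of this delay.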

We note that, given freedom to change the chirp specifications, there are two ways to reduce the MZI delay, which we discuss below.

#### Case I: Changing modulation slope

$$\frac{B \times m}{T_{ramp}} \tau_{MZI} = f_{ref}$$

$$\frac{B \times m}{T_{ramp}} \tau_{MZI} = \frac{100}{T_{ramp}}$$

$$\tau_{MZI} = \frac{1}{m} \cdot \frac{100}{B} \tag{6.35}$$

Increasing B to reduce  $\tau_{MZI}$  without changing  $T_{ramp}$  will keep the same reference and reacquisition proportion. The modulation slope has increased and the laser needs to sweep a larger bandwidth in the same time. This is a more challenging specification for the laser source optical component and is not practical for large optical phased array imaging with thousands of elements.

#### Case II: Constant modulation slope

$$\frac{B \times m}{T_{ramp} \times m} \tau_{MZI} = f_{ref}$$

$$\frac{B \times m}{T_{ramp} \times m} \tau_{MZI} = \frac{100}{T_{ramp} \times m}$$

$$\tau_{MZI} = \frac{1}{m} \cdot \frac{100}{B} \tag{6.36}$$

Here the modulation slope remains the same, but both the chirp bandwidth and chirp time increase. The repetition rate reduces and the reference frequency can be lower to maintain the same reacquisition proportion.
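Both cases lead to the same $1/m$ reduction in delay, which the following sketch confirms numerically. It assumes the Type-I relation of 100 reference cycles per ramp; the scale factor `m = 4` and the function name `scaled_mzi_delay_s` are illustrative.

```python
def scaled_mzi_delay_s(chirp_bw_hz, t_ramp_s, m, scale_ramp_time,
                       ref_cycles_per_ramp=100):
    """MZI delay after scaling B by m (Case I) or both B and T_ramp by m (Case II)."""
    bw = m * chirp_bw_hz
    t_ramp = m * t_ramp_s if scale_ramp_time else t_ramp_s
    f_ref = ref_cycles_per_ramp / t_ramp    # same reacquisition proportion
    gamma = bw / t_ramp                     # modulation slope after scaling
    return f_ref / gamma


B, T_RAMP, m = 3e9, 0.3e-6, 4
case_1 = scaled_mzi_delay_s(B, T_RAMP, m, scale_ramp_time=False)  # steeper slope
case_2 = scaled_mzi_delay_s(B, T_RAMP, m, scale_ramp_time=True)   # same slope
baseline = 100.0 / B
print(case_1 / baseline, case_2 / baseline)
```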

However, B and $f_{ramp}$ are fixed by the application and usually cannot be changed to reduce $\tau_{MZI}$. To reduce the MZI delay, we must reduce the reference frequency. As the bandwidth is determined by how fast we want the loop to reacquire, $f_{BW}$ is still $10 \times f_{ramp}$, but we can seek to change the $10 \times$ relationship between bandwidth and reference.

We note that in a first-order PLL with no loop filter, the phase margin is 90°, and the only thing preventing a higher bandwidth is the limit imposed by the discrete-time nature of PLL correction. The PLL only corrects the error every reference cycle, or every N VCO cycles, so that on average the VCO frequency is $N \cdot f_{ref}$. For this reason, errors faster than $f_{BW} = 0.1 \times f_{ref}$ are suppressed inside the loop, so that the detected error does not change before the correction can be applied. When N = 1, we can use a mixer-based PLL to provide continuous analog correction and do away with the bandwidth limit needed to suppress discrete-time effects. Theoretically, in such a case $f_{BW} = f_{ref}$.

The absence of loop filter also means that a separate second harmonic spur suppression method will be required. If the loop has a bandwidth  $f_{BW} = f_{ref}$ , including an additional first order RC filter for  $2f_{ref}$  suppression is not sufficient. Further, an RC pole below  $2f_{ref}$  will severely degrade the phase margin of the loop.



Figure 6.4: Proposed EO-PLL architecture with mixer-based phase detector.

If all other poles are sufficiently large, the phase margin is ideally $90^{\circ}$ and the loop is stable. The phase margin is then limited only by the interconnect delay. From Equation 6.34, a delay of up to $T_{ramp}/120$, or 2.5 ns, can be tolerated for a 60° phase margin with $f_{ref} = f_{BW} = 10/T_{ramp}$.

With this reduction in reference frequency, the MZI delay for a given loop is reduced by a factor of 10 to 3.33 ns, and the corresponding area also reduces.

#### 6.4.2 Loop architecture

The block diagram of the proposed mixer-based PLL is shown in Fig. 6.4.

#### 6.4.3 Implementation

To complete verification of the proposed architecture, a discrete component version was implemented. It is difficult to complete such a verification in simulation due to the extensive time required to determine reacquisition from all possible initial conditions. Due to the discrete implementation, the chirp specifications are modified. This is still sufficient to verify whether the proposed technique for reducing reference frequency and MZI delay is feasible.



Figure 6.5: Alcatel A1905LMI laser tuning curve.

#### Laser frequency tuning curve

A tunable laser source, the Alcatel A1905LMI, with a center wavelength of 1550 nm is used. The laser tuning curve is shown in Fig. 6.5. The flat portions in the curve are attributed to the resolution of the Thorlabs OSA202C optical spectrum analyzer. The laser cut-in current is $20 \,\mathrm{mA}$ and it saturates at $110 \,\mathrm{mA}$. A straight line fit to the tuning curve yields a $K_{CCO}$ of $275.52 \,\mathrm{GHz/A}$.

#### Chirp specifications and reference choice

The laser is placed in a Thorlabs CLD1015 laser controller driver. The controller can accept control voltage modulations within a frequency range of DC to $250 \,\mathrm{kHz}$. For this reason, we choose the loop bandwidth to be 25 kHz, so that the controller cut-off does not introduce a stray pole in the implemented first-order Type-I loop. This, in turn, sets the ramp rate $f_{ramp}$ to 2.5 kHz, so that the reacquisition at the start of each up and down ramp occupies only a tenth of the $T_{ramp} = 400 \,\mu\mathrm{sec}$ period.

A tunable Newport delay FVDL26FAS, with a maximum delay of 600 ps, has been chosen for this experiment. This gives us $\tau_{MZI} = 600 \, \mathrm{psec}$.

Figure 6.6: Mixer-based phase detector profile with limited monotonicity.

With a reference frequency of 25 kHz, this means we require a chirp bandwidth of $f_{ref} \cdot T_{ramp}/\tau_{MZI}$, or 16.7 GHz.

#### Mixer PD transfer function

Minicircuits passive diode-based mixer ZAD8+ is used for this experiment. It supports a maximum RF port signal of $50\,\mathrm{mW}$, or $17\,\mathrm{dBm}$, into a $50\,\Omega$ load, corresponding to an amplitude of $2.2\,\mathrm{V}$. We use $40\,\mathrm{mW}$, or $16\,\mathrm{dBm}$, into $50\,\Omega$, an amplitude of $2\,\mathrm{V}$, for this work. The output of the equivalent VCO, that is, the photodiode-TIA combination, drives the RF port. The mixer has an LO requirement of $+7\,\mathrm{dBm}$. The LO port is connected to the reference frequency generated by a Keysight RF signal generator. The mixer has a conversion loss of $8.5\,\mathrm{dB}$, which means the IF output signal is $7.5\,\mathrm{dBm}$, that is, $5.6\,\mathrm{mW}$ into $50\,\Omega$ or an amplitude of $0.75\,\mathrm{V}$.
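The power and amplitude figures quoted above follow from standard dBm conversions for a sinusoid into a resistive load. The sketch below reproduces them; the helper names are hypothetical and only the $50\,\Omega$ load, the 16 dBm drive and the 8.5 dB conversion loss come from the text.

```python
import math


def dbm_to_watts(p_dbm):
    """Convert a power level in dBm to watts."""
    return 1e-3 * 10.0 ** (p_dbm / 10.0)


def peak_amplitude_v(p_watts, r_ohm=50.0):
    """Peak amplitude of a sinusoid delivering p_watts into r_ohm."""
    return math.sqrt(2.0 * p_watts * r_ohm)


rf_dbm = 16.0                      # RF drive used in this work
conversion_loss_db = 8.5           # mixer conversion loss
if_dbm = rf_dbm - conversion_loss_db

v_rf = peak_amplitude_v(dbm_to_watts(rf_dbm))
v_if = peak_amplitude_v(dbm_to_watts(if_dbm))
v_rf_max = peak_amplitude_v(dbm_to_watts(17.0))
print(round(v_rf, 2), round(v_if, 2), round(v_rf_max, 2))
```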

We see that the  $K_{PD}$  V/rad of the mixer based phase detector is (see also Fig. 6.6)

$$V_{cont} = -0.5 A_{VCO} A_{ref} \cos(\Delta\phi_{VCO})$$

$$= 0.5 A_{VCO} A_{ref} \sin\left(\Delta\phi_{VCO} - \frac{\pi}{2}\right)$$

$$= 0.5 A_{VCO} A_{ref} \sin(\phi_{\epsilon})$$

$$\approx 0.5 A_{VCO} A_{ref}\, \phi_{\epsilon} = K_{PD}\, \phi_{\epsilon}$$

$$K_{PD} = 0.5 A_{VCO} A_{ref} = V_{IF} = V_{RF} \cdot \frac{1}{\sqrt{CL}} \tag{6.37}$$

This is valid in steady state, after frequency lock, when $\omega_{VCO} = \omega_{ref}$. For a Type-I PLL the static phase error depends on the initial frequency condition of the VCO, and may be different from zero. $\phi_{\epsilon}$ is the deviation from the phase difference of $\pi/2$ at which the control voltage is zero. $V_{IF}$ is the amplitude of the IF signal from the mixer, CL is the conversion loss expressed as a linear power ratio, and $V_{RF}$ is the amplitude $A_{VCO}$ generated when the photodiode-TIA combination drives the 50 $\Omega$ RF port of the mixer.

From this we conclude that the ZAD8+ mixer has a $K_{PD}$ of 0.75 V/rad when used as a phase detector driven by an amplitude of 2 V into 50 $\Omega$. In practice, we have used the photodiode and adjustable-gain TIA module from Thorlabs. We adjust the gain of the TIA until we obtain an amplitude of 2 V on the oscilloscope in 50 $\Omega$ mode.

#### Integrator design

The integrator unity gain frequency $K_i$ is chosen so that the first-order loop has a unity-gain frequency or bandwidth of 25 kHz. We note that

$$L(s) = K_{PD} \cdot \frac{K_i}{s} \cdot m \frac{K_{CCO}}{s} \cdot s \tau_{MZI}$$

$$\omega_u = K_{PD} \cdot K_i \cdot m \cdot K_{CCO} \cdot \tau_{MZI} \tag{6.38}$$

Here, m is the modulation coefficient of the laser controller: the output voltage of the summing integrator $K_i/s$ is converted to a laser control current within the driver with a transconductance of 150 mA/V. With $\omega_u = 2\pi \times 25$ kHz, and the other parameters as derived in the previous sections, we get

$$K_i = 1.34 \,\mathrm{krad/s} = 0.213 \,\mathrm{kHz} \tag{6.39}$$
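The value in Eq. 6.39 can be reproduced from Eq. 6.38 with the parameters quoted in this section. The sketch assumes that $K_{CCO}$, given in GHz/A, must be multiplied by $2\pi$ to express the equivalent VCO gain in rad/s; this unit handling is an assumption of the sketch, and the function name is hypothetical.

```python
import math


def integrator_gain_rad_s(f_bw_hz, k_pd_v_per_rad, m_a_per_v,
                          k_cco_hz_per_a, tau_mzi_s):
    """Solve Eq. 6.38 for K_i given a target loop bandwidth."""
    omega_u = 2.0 * math.pi * f_bw_hz
    # Photodiode phase per volt at the controller input, in rad/s per volt.
    k_vco_eq = 2.0 * math.pi * k_cco_hz_per_a * m_a_per_v * tau_mzi_s
    return omega_u / (k_pd_v_per_rad * k_vco_eq)


k_i = integrator_gain_rad_s(
    f_bw_hz=25e3,              # target loop bandwidth
    k_pd_v_per_rad=0.75,       # mixer phase detector gain
    m_a_per_v=0.150,           # controller transconductance
    k_cco_hz_per_a=275.52e9,   # laser tuning gain
    tau_mzi_s=600e-12,         # MZI delay
)
print(round(k_i), round(k_i / (2.0 * math.pi)))   # rad/s and Hz
```

The result is close to 1.34 krad/s, or roughly 0.21 kHz, consistent with Eq. 6.39.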

An LM741 discrete op-amp with an intrinsic bandwidth of 1.8 MHz, well beyond 25 kHz, is chosen. The integrator circuit is shown in Fig. 6.4.

$$K_i = \frac{1}{R_{in}C_f} \tag{6.40}$$

$$R_{in} = 470\,\Omega\tag{6.41}$$

$$C_f = 1\,\mu\mathrm{F} \quad \text{(based on component availability)} \tag{6.42}$$

The values of $R_{in}$, and hence $C_f$, are chosen so that the practical integrator provides the required integrating functionality over the majority of the 25 kHz bandwidth. The opamp must have DC feedback so that it does not latch to a rail, and the finite integrator pole $1/R_fC_f$ should be much lower than 25 kHz. The DC gain $R_f/R_{in}$ should also be very large. We choose $R_f = 60.4 \,\mathrm{k}\Omega$, yielding roughly 40 dB of DC gain $^4$. The integrator pole is at 2.63 Hz, and the actual implemented unity gain frequency $K_i$ is 0.34 kHz. $^5$
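The implemented integrator can be checked from its component values. The sketch below evaluates the unity gain frequency, the finite pole and the DC gain of an inverting integrator with a feedback resistor across the capacitor, which is the assumed circuit; the variable names are hypothetical.

```python
import math

R_IN = 470.0      # input resistor in ohms (Eq. 6.41)
C_F = 1e-6        # feedback capacitor in farads (Eq. 6.42)
R_F = 60.4e3      # DC feedback resistor in ohms

k_i_rad_s = 1.0 / (R_IN * C_F)                # integrator unity gain frequency
k_i_hz = k_i_rad_s / (2.0 * math.pi)
pole_hz = 1.0 / (2.0 * math.pi * R_F * C_F)   # finite pole set by R_f and C_f
dc_gain_db = 20.0 * math.log10(R_F / R_IN)    # low-frequency gain

print(round(k_i_hz, 1), round(pole_hz, 2), round(dc_gain_db, 1))
```

With these values the integrator pole sits several decades below the 25 kHz loop bandwidth, so the circuit behaves as an ideal integrator over essentially the whole band of interest.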

#### Nominal modulation and error summation

A chirp bandwidth of  $16.53\,\mathrm{GHz}$  is chosen corresponding to  $\Delta i$  of  $\pm 30\mathrm{mA}$  for the laser. With a controller modulation transconductance of  $150\,\mathrm{mA/V}$ , the input to the controller is a  $1.25\,\mathrm{kHz}$  triangular wave of amplitude  $\pm 0.2\,\mathrm{V}$ . This is a slope of  $0.4\,\mathrm{V/400\,\mu s}$ .
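The nominal modulation numbers can be cross-checked against the reference choice. The sketch below uses only values quoted in this section and computes the chirp bandwidth from the current swing, the required drive amplitude, and the photodiode frequency $\gamma \tau_{MZI}$ that results; the variable names are hypothetical.

```python
K_CCO = 275.52e9       # laser tuning gain in Hz/A
DELTA_I_PEAK = 0.030   # peak current excursion in A (the swing is +/- this value)
G_M = 0.150            # controller transconductance in A/V
T_RAMP = 400e-6        # ramp time in seconds
TAU_MZI = 600e-12      # MZI delay in seconds

chirp_bw = K_CCO * 2.0 * DELTA_I_PEAK   # frequency swept over one ramp
v_peak = DELTA_I_PEAK / G_M             # triangular wave peak amplitude in volts
gamma = chirp_bw / T_RAMP               # modulation slope in Hz/s
f_pd = gamma * TAU_MZI                  # photodiode frequency under lock

print(chirp_bw / 1e9, v_peak, f_pd / 1e3)   # GHz, V, kHz
```

The resulting photodiode frequency is within about 1 % of the 25 kHz reference, which the loop correction is expected to absorb.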

To close the loop, we use a summing integrator. It has one input for a square wave which, once integrated, generates the required nominal triangular chirp; one input for the mixer IF output, which carries the PLL loop correction; and a third input for a fixed DC voltage to cancel offsets.

<sup>&</sup>lt;sup>4</sup>largely based on component availability

<sup>&</sup>lt;sup>5</sup>We anticipate that the  $K_{PD}$  itself will be attenuated somewhat, and instead of 40 kHz bandwidth as the implemented  $K_i$  suggests, we will get something closer to  $f_{ref}$  of 25 kHz. The latter is best calculated in measurement by observing the settling time of the control voltage per chirp cycle.

The swing  $\pm v_{pulse}$  of the 1.25 kHz square wave input to the summing integrator is

$$\frac{i_{in} \cdot t}{C_f} = v_{out}$$

$$\frac{v_{pulse}/R_{in} \cdot t}{C_f} = v_{out} \tag{6.43}$$

$$v_{pulse} \cdot \frac{1}{R_{in}C_f} = \frac{v_{out}}{t} = \frac{0.4}{400} \,\text{V}/\mu\text{sec}$$

$$v_{pulse} = 0.47 \,\text{V} \tag{6.44}$$
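Eq. 6.43 and Eq. 6.44 amount to inverting the ideal integrator relation. A minimal sketch, assuming an ideal integrator with the component values given earlier (the function names are hypothetical):

```python
def ramp_swing_v(v_pulse, r_in, c_f, t_ramp):
    """Voltage change at the integrator output after t_ramp of constant input."""
    return v_pulse * t_ramp / (r_in * c_f)


def pulse_for_swing_v(v_swing, r_in, c_f, t_ramp):
    """Square wave level needed to produce v_swing over one ramp."""
    return v_swing * r_in * c_f / t_ramp


R_IN, C_F, T_RAMP = 470.0, 1e-6, 400e-6
v_pulse = pulse_for_swing_v(0.4, R_IN, C_F, T_RAMP)   # 0.4 V swing per ramp
print(round(v_pulse, 2), round(ramp_swing_v(v_pulse, R_IN, C_F, T_RAMP), 2))
```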

We generate this pulse from the same Keysight signal generator generating the reference, through the synchronized modulation output option  $^6$ . A sinewave at 1.25 kHz is passed through another LM741 configured as an open loop comparator with a resistive divider at the output. The  $\pm 15 \,\mathrm{V}$  square wave at the output of the comparator is scaled to  $\pm 0.47 \,\mathrm{V}$ .

The mixer output must be terminated with  $50\,\Omega$ . A  $50\,\Omega$  resistor placed to actual ground in parallel with  $470\,\Omega$  to the op-amp virtual ground provides about  $45\,\Omega$  which is sufficient for matching.

Finally, there are two offsets that need to be corrected to ensure that the input ramp to the controller has zero DC voltage. The first is the opamp intrinsic offset, which can first be calibrated in open loop. The second comes from the chirp itself: as the square wave starts switching between $\pm 0.47 \,\mathrm{V}$, the output of the integrator ramps from 0 V to 0.4 V and back. As such, an external DC voltage based on the amplitude of the ramp must be applied to zero the bias of the chirp.

#### Sign inversion

The sign of the mixer output should be switched periodically before being connected to the summing integrator input. For this, we connect the mixer output to an inverting and a non-inverting buffer. The direct and inverted mixer outputs are then multiplexed using discrete CD4066B MOS switches. The switches are clocked using the comparator which generates the nominal modulation. This synchronizes the sign of the feedback to the rising or falling modulation slope, so that the feedback remains negative on both ramps.

<sup>&</sup>lt;sup>6</sup>This is essential, as  $\gamma \tau_{MZI} = f_{ref}$  is the basis for all foregoing calculations, and the repetition rate has a well defined relationship to reference, which in turn sets the MZI delay of 600 psec.

#### 6.4.4 Acquisition of mixer-based phase detector

The mixer phase detector profile has limited monotonicity. Further, it would appear that it puts out zero DC control voltage when the two input frequencies are not matched. We would expect such a phase detector to have almost no acquisition range. In practice, mixer-based phase detectors exhibit a finite acquisition range closely related to the loop bandwidth, as shown below.

$$V_{cont} = -A_{VCO}A_{ref}\sin(\omega_{VCO}t)\sin(\omega_{ref}t)$$

$$= 0.5A_{VCO}A_{ref}\left[\cos((\omega_{VCO} + \omega_{ref})t) - \cos(\Delta\omega t)\right] \tag{6.45}$$

The high frequency component at $\omega_{VCO} + \omega_{ref}$ is well beyond the loop bandwidth and is completely filtered. The low frequency component at $\Delta\omega$ passes with some attenuation and would average to zero, so there is no correcting component. However, the instantaneous VCO frequency varies as follows

$$\begin{split} V_{out} &= A_{VCO} \sin \left[ \omega_{VCO} t + K_{VCO,eq} \cdot G_{\Delta\omega} \cdot \int V_{cont}\, dt \right] \\ &= A_{VCO} \sin \left[ \omega_{VCO} t - 0.5 A_{VCO} A_{ref} \cdot K_{VCO,eq} \cdot G_{\Delta\omega} \cdot \int \cos(\Delta\omega t)\, dt \right] \\ &= A_{VCO} \sin \left[ \omega_{VCO} t - 0.5 A_{VCO} A_{ref} \cdot \frac{K_{VCO,eq} \cdot G_{\Delta\omega}}{\Delta\omega} \cdot \sin(\Delta\omega t) \right] \\ &\approx A_{VCO} \sin(\omega_{VCO} t) - 0.5 A_{VCO}^2 A_{ref} \cdot \frac{K_{VCO,eq} \cdot G_{\Delta\omega}}{\Delta\omega} \cdot \cos(\omega_{VCO} t) \sin(\Delta\omega t) \end{split} \tag{6.46}$$

$G_{\Delta\omega}$ is the attenuation experienced by the $0.5A_{VCO}A_{ref}\cos(\Delta\omega t)$ component of the mixer output. The last step assumes that $0.5A_{VCO}A_{ref}\cdot\frac{K_{VCO,eq}\cdot G_{\Delta\omega}}{\Delta\omega}$ is small, and uses $\cos(\alpha)\approx 1$ and $\sin(\alpha)\approx \alpha$ for small $\alpha$.

The beat component at $\omega_{VCO} \pm \Delta \omega$ goes around the loop and mixes with the reference to generate a DC component which can change the VCO frequency. The acquisition range of a mixer-based loop is therefore related to the loop bandwidth, so that a sufficient $\Delta \omega$ component is passed through. To truly have an acquisition range wide enough to accommodate the order of drifts in optical components, an aided acquisition loop is required, as discussed in Section 6.5.1.
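The appearance of a DC term can be illustrated numerically. The sketch below is a simplified, open-loop illustration with unit amplitudes, not a model of the implemented loop: it phase-modulates an oscillator at the beat frequency with a small hypothetical index `beta`, multiplies it by the reference, and averages over a window holding an integer number of cycles of every component. The frequencies and `beta` are arbitrary illustrative values.

```python
import math


def mixer_dc(f_vco, f_ref, beta, n=100_000):
    """Average of sin(w_ref t) * sin(w_vco t - beta*sin(dw t)) over one second."""
    dw = 2.0 * math.pi * (f_vco - f_ref)
    total = 0.0
    for k in range(n):
        t = k / n
        vco = math.sin(2.0 * math.pi * f_vco * t - beta * math.sin(dw * t))
        total += vco * math.sin(2.0 * math.pi * f_ref * t)
    return total / n


dc_unmodulated = mixer_dc(110.0, 100.0, 0.0)   # no beat modulation on the VCO
dc_modulated = mixer_dc(110.0, 100.0, 0.1)     # small beat modulation
print(dc_unmodulated, dc_modulated)
```

Without modulation the product averages to zero. With modulation, the sideband that lands on the reference frequency produces an average close to `beta / 4` for unit amplitudes, which is the mechanism that lets the loop pull toward lock.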

#### 6.4.5 Measured Performance

A photograph of the measurement setup is shown in Fig. 6.7.



Figure 6.7: Photograph of measurement setup to verify performance of mixer based continuous analog correction loop with bandwidth equal to reference frequency.

Due to the absence of aided acquisition, the loop only stays in lock for about a minute. However, this time was sufficient to compare the laser chirp locked with an EO-PLL against the laser modulated in open loop by a nominal triangular control waveform. The results of the measurement are shown in http://theeigenrhythm.weebly.com/lidar-video.html. The chirp rate is $1.25\,\mathrm{kHz}$ and each ramp is $400\,\mu\mathrm{sec}$ long. Without the EO-PLL, the chirp is not stabilized, and the frequency of the photodiode output varies with time. With the EO-PLL, the reference is $25\,\mathrm{kHz}$ and we can clearly discern ten cycles of the locked downconverted modulation at $\gamma \tau_{MZI}$ in the photodiode output within each ramp.

#### 6.5 Future work

#### 6.5.1 Mixer-based EO-PLL

High-bandwidth mixer-based EO-PLLs have three main remaining challenges: acquisition, suppression of the second harmonic spur, and noise.

#### Aided acquisition

In the short term, the problem of acquisition and frequency tracking in the proposed mixer-based EO-PLL must be addressed through a low bandwidth aided acquisition loop. A simple counter and comparator based frequency detector, as in [135], is sufficient for this.

Apart from the ability to acquire lock from an unfavorable initial condition, EO-PLLs have additional challenges in acquisition. Unlike in conventional electronic PLLs, if the control voltage latches to a rail, the laser can be cut off or saturate. This means that there is no ramp-like modulation and the photodiode output is a DC current. Alternatively, the laser may not be saturated and may have a ramp-like modulation, but the drift in optical components may be large enough to push the relevant low frequency component beyond the bandwidth of the photodiode. In conventional electronic PLLs, zero output at the oscillator input node means the VCO frequency is too low, and the loop will put out a positive control voltage correction to accelerate the VCO. This is not the case here, as zero output from the photodiode may arise because the frequency is either too high or too low. A special mechanism will be needed to distinguish between these two instances.

#### Frequency Tracking

The low bandwidth aided acquisition loop can also help with frequency tracking. Large changes in functionality of optical components, such as MZI delay variation with temperature drift, can change the photodiode output frequency enough to push the loop out of lock. In [130], the authors have estimated that a  $1\,\mu$ m optical fiber can introduce a phase variation of  $7\pi\,\text{rad}/^{\circ}\text{C}$ . The center frequency of a cavity based laser can vary by 25 GHz per 2 °C. The tracking bandwidth need not be very large, but should be large enough to correct slow drifts before they accumulate and push the loop out of lock. As discussed in Chapter 5, FTL bandwidth should remain much smaller than that of the main Type-I loop.

#### Second harmonic spurs

In a mixer-based phase detector, once frequencies are matched, the control voltage is

$$V_{cont} = 0.5 A_{VCO} A_{ref} \left[ \cos(2\omega_{ref} t + \Delta\phi_{VCO}) - \cos(\Delta\phi_{VCO}) \right] \tag{6.47}$$

With a wide bandwidth of $f_{ref}$, this loop can have a strong second harmonic spur. For bandwidth $f_{BW} = f_{ref}$, including an additional first order RC filter between $f_{ref}$ and $2f_{ref}$ for $2f_{ref}$ suppression is not sufficient. Further, an RC pole below $2f_{ref}$ will severely degrade the phase margin of the loop.

Figure 6.8: Second harmonic cancellation in mixer-based EO-PLL.

For this reason we introduce a cancellation scheme, as shown in Fig. 6.8. As this is only a Type-I loop, a finite static phase error $\Delta\phi_{VCO}$ other than zero may appear at the mixer PD output in steady state under frequency lock. The issue is that the phase of the second harmonic in the control voltage will vary as $\Delta\phi_{VCO}$ drifts to keep the frequency locked, so the cancellation is inexact.
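The sensitivity of the cancellation to phase drift can be estimated with a minimal sketch. It assumes the replica has exactly the same amplitude as the spur and differs only in phase, in which case the difference of two equal-amplitude sinusoids has amplitude $2|\sin(\delta/2)|$ for a phase mismatch $\delta$. The function name and the mismatch values are hypothetical.

```python
import math


def residual_spur_db(phase_mismatch_rad):
    """Residual spur, relative to the uncancelled spur, after subtracting an
    equal-amplitude replica that is off by phase_mismatch_rad."""
    residual = 2.0 * abs(math.sin(phase_mismatch_rad / 2.0))
    if residual == 0.0:
        return float("-inf")      # perfect cancellation
    return 20.0 * math.log10(residual)


for mismatch in (0.01, 0.1, math.pi / 3.0):
    print(mismatch, round(residual_spur_db(mismatch), 1))
```

Under this assumption, a mismatch of 0.1 rad leaves only about 20 dB of suppression, and a mismatch of 60° gives no suppression at all, which is why the replica phase must follow $\Delta\phi_{VCO}$.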

Two possible solutions are proposed toward this. The first, shown in Fig. 6.8, follows the replica $2f_{ref}$ signal with a variable phase shifter. The control voltage of the phase shifter is derived from the loop DC control voltage, and the phase shifter generates $\Delta\phi_{VCO}$ in response to it. In effect, the phase shifter is the inverse of the phase detector and must implement an inverse cosine function.

We propose another solution where instead of just a counter based frequency detector in the AAC/FTL, we use a very low bandwidth phase-and-frequency detector. This will impose a fixed phase error condition between the VCO and the reference at lock, so that the output current of the charge pump is zero. This static phase error does not change as the loop tracks frequency drift, and the phase shifter then just has to be calibrated once for maximum cancellation.

#### Race conditions

In Chapter 5 and Appendix B, we discussed race conditions for PLLs with two control mechanisms. Here, we have a Type-I and Type-II loop simultaneously controlling the VCO. The Type-II loop sets the static phase error to null the integrator input in steady state. It adjusts the bias of the loop filter so that the bias control voltage from the Type-II loop, and the voltage output of the Type-I loop in response to the fixed phase error, together bring the VCO to frequency lock. As such, we do not anticipate a race condition in this solution.

#### Noise analysis

While noise analysis of the laser source and the EO-PLL is beyond the scope of this thesis, a future overall system implementation will need to address this issue.

#### 6.5.2 Conventional EO-PLL around a Laser Phase Shifter

We seek to leverage the vast body of work in electronic PLLs to come up with solutions for electro-optic PLLs which will ease the design and implementation of the silicon photonics component. Two possible architectures are proposed, wrapped around a laser phase shifter as opposed to around the laser source.

#### 6.5.2.1 Proposal I

The first architecture is shown in Fig. 6.9. Here, a laser phase modulator is used to adjust the output phase of a clean fixed frequency laser source. This solution can scale well in silicon photonics and is particularly applicable to optical phased arrays with a very large number of elements. By providing a quadratic modulation in phase we get a linear ramp in frequency. The phase modulator should respond to all control voltage corrections, so it should have a bandwidth sufficiently greater than the EO-PLL bandwidth.

In addition to the phase detector and charge pump, we have shown that the loop requires an extra integrator to cancel the MZI differentiator zero. The phase modulator requires an additional integrator to provide the quadratic phase modulation. The phase modulator in conjunction with the delay discriminator and photodiode presents as an equivalent VCO.

Figure 6.9: EO-PLL around laser phase shifter with electronic phase-domain integrator.

$$L(s) = K_{PD} \cdot \frac{I_{CP}(1 + sRC_{LF})}{sC_{LF}} \cdot \left[ \frac{K_{i1}}{s} \cdot \frac{K_{i2}}{s} \cdot K_{PM} \cdot s\tau_{MZI} \right] \tag{6.48}$$

$$= K_{PD} \cdot \frac{I_{CP}(1 + sRC_{LF})}{sC_{LF}} \cdot \left[\frac{K_{VCO,eq}}{s}\right] \tag{6.49}$$

$$\text{where } K_{VCO,eq} = K_{i1} \cdot K_{i2} \cdot K_{PM} \cdot \tau_{MZI} \tag{6.50}$$

The charge pump and loop filter  $\frac{I_{CP}(1+sRC_{LF})}{sC_{LF}}$  are only present in a Type-II loop.

It should be noted that the output phase of the modulator $(V_{cont} + \frac{\gamma}{K_{VCO,eq} \cdot s}) \frac{K_{i2}K_{PM}}{s}$ must wrap every $2\pi$. As such, there should be a way to wrap the voltage input to the phase modulator, that is, the output of the integrator $K_{i2}/s$. Any comparator-based wrapping will introduce periodic disturbances, or spurs, in the process. We therefore propose a phase-domain integrator, similar to [136], where the output of the phase detector is converted to a current. This current is integrated by a current controlled oscillator <sup>7</sup> in the phase domain. In [136], the authors use a CCO as an integrator to reduce the area needed for loop filter capacitance. To convert the phase-domain integral back to the voltage domain, it is compared against the '0' phase, that is, the reference edge, using a phase detector. A pulse with width proportional to the phase-domain integral is generated. This can be used to control the main equivalent voltage or current controlled oscillator by using it directly or through an I-DAC.

<sup>7</sup>This is a separate oscillator, different from the equivalent laser current controlled oscillator.

Figure 6.10: EO-PLL around laser phase shifter with downconverted ramp locked by conventional electronic FMCW PLL techniques.

Here, we are using it as a phase wrapping integrator. After the PFD we get a waveform with a width between 0 and $T_{ref}$ which is proportional to $\int \int \phi_{\epsilon} \cdot dt \cdot dt$. This contains information on the integral of the phase error $\phi_{\epsilon}$.

$$\text{Ring VCO output} = \mathrm{sgn}\left[\cos\left(\omega_{ref}t + \int \int \phi_{\epsilon} \cdot dt \cdot dt\right)\right] \tag{6.51}$$

The second integral will be removed by the MZI zero, so that the overall block from the loop filter to the photodiode-TIA looks like an equivalent VCO. Next, the V-DAC maps the PFD output to a voltage between $-V_0$ and $+V_0$, that is, the input voltage range of the phase modulator. The phase modulator then shifts the laser by a phase proportional to the input control. The phase is updated once every cycle for the EO-PLL path. The nominal modulation is a quadratic waveform in the appropriate voltage range $(-V_0, +V_0)$.

#### 6.5.2.2 Proposal II

If we use a laser phase shifter instead of a laser source, we have access to the chirped laser waveform as well as the original unmodulated signal. By mixing these two in the delay discriminator plus photodiode combination, we can translate the triangular modulation to a low frequency, as shown in Fig. 6.10. It is now possible to lock this electrical domain chirp to a reference chirped using a Direct Digital Frequency Synthesizer (DDFS). Other than a DDFS, it is also possible to leverage the vast body of work on electrical FMCW PLLs to lock the chirp, such as the variable modulus fractional-N dividers in [8]. This approach is not feasible when the loop is locked around the laser source instead of the laser phase modulator.

As there is no longer a need for $\gamma \tau_{MZI}$ to equal the reference frequency, this technique may offer scope to reduce the MZI delay even below the 3.33 ns achieved by the mixer-based PLL.

## Chapter 7

## Conclusion

This thesis has studied and proposed solutions for several problems in signal synthesis across the spectrum. Specifically, we have investigated challenges in implementing these solutions in CMOS, to contribute towards making low-cost integrated CMOS viable for emerging technologies.

To study the challenge with generating signals in the terahertz gap, we first studied the problem of maximizing fundamental frequency oscillation. A theoretical analysis of the interstage matching network between the stages of a ring oscillator, shows that at high frequencies all-inductive matching networks are least lossy, and that the loss is independent of the matching network topology. This analysis also enabled us to upgrade the conventional maximum oscillation frequency metric to include both the dominant sources of loss for THz design - transistor loss and the passive quality factor. A reproducible design methodology, known as the Maximum Gain Ring Oscillator (or MGRO) for implementing a 3-element matching network which maximizes startup was introduced. This approach to oscillator design can also be extended to implement multi-element matching networks that optimize other metrics such as phase noise, and even to implement wideband matching networks for spectroscopic applications. We also looked at the problem of optimally extracting harmonics from the fundamental oscillators to generate signals in the THz-gap region beyond technology  $f_{max}$ .

Next, we studied harmonic signal generation in conventional frequency multipliers. We noted that the optimal load for the harmonic signal in bulk CMOS is limited by substrate loss, and the path to increase harmonic power lies in the ability to generate large harmonic current from the device transconductance. For this, we propose a power mixer topology that engineers the device nonlinearity by shaping the waveforms incident on the device nodes, and controlling the device region of operation to maximize current at a specific harmonic. Specifically, we show that mixing a first and second harmonic with appropriate amplitudes and relative phase shift can generate up to $3\times$ more current, or $9\times$ more power, than a conventional duty-cycle optimized frequency tripler. Broadly, this non-linearity engineering paradigm for signal generation beyond device cut-off is a departure from the conventional linearity-focused fundamental frequency transceiver design. Through waveform shaping we can dictate how a device moves through different regions of its I-V curve to optimize signal generation at other harmonics, and also to optimize other metrics such as DC-to-harmonic-power generation efficiency.

We proposed a low-noise and simultaneously low-spur RF PLL, the Reference-Sampling PLL (or RSPLL), which eliminates the N-squared output phase-noise from a traditional PLL reference buffer by sampling the reference sinewave directly with the VCO. A VCO buffer is still used to generate the sampling clock but consumes much less power. Of the multiple samples generated, the loop only corrects VCO phase error with the sample near the reference zero-crossing, and most of the loop components are gated to operate only for a short period around this event. Further, by using a Type-I loop with fewer components, the loop noise-power FoM [137] is lower than even that of the sub-sampling PLL (SSPLL) although, unlike the SSPLL, the RSPLL has a virtual division-by-N. Combined with VCO techniques that have state-of-the-art VCO FoM, this approach yields record jitter-power performance for the PLL. It also displays much lower spurs than other contemporary ultra-low noise PLLs, as it inherently isolates the VCO tank from a varying load. Methods for programming the PLL bandwidth without compromising on area or performance, despite the Type-I implementation, have been suggested. Lastly, in this prototype we have used a crystal sinewave, but it can be reasonably conjectured that this approach may be useful in cascaded PLLs where the intermediate PLL output is a sinewave.

We conclude with some approaches towards Frequency-Modulated Continuous-Wave (FMCW) Light-Detection-And-Ranging (LIDAR). FMCW techniques with optical imagers can enable micrometers of resolution. However, cost-effective tunable optical sources are temperature-sensitive and have nonlinear tuning profiles, rendering precise frequency modulations or 'chirps' untenable. Locking them to an electronic reference through an electro-optic PLL, and electronically calibrating the control signal for nonlinearity and ambient sensitivity, can make such chirps possible. Further, by building on existing electrical PLL works we can also ease the design constraints on optical systems. As an example, we showed an EO-PLL where, by using continuous error correction rather than correction once per reference cycle, we were able to reduce the form-factor of the optical delay discriminator in the photo-electric interface by a factor of ten. To avoid high-cost modular implementations, we seek to leverage the twin advantages of CMOS, intensive integration and low-cost high yield, towards developing a single-chip solution that uses on-chip signal processing and phased arrays to generate precise and robust chirps for an electronically-steerable LIDAR beam.

# Part I

Bibliography

## Bibliography

- [1] T. W. Crowe, W. L. Bishop, D. W. Porterfield, J. L. Hesler, and R. M. Weikle, "Opening the terahertz window with integrated diode circuits," *IEEE J. Solid-State Circuits*, vol. 40, no. 10, pp. 2104–2110, 2005.
- [2] J. Sharma, T. Dinc, and H. Krishnaswamy, "A +4 dBm frequency doubler at $f_{max}$ in 130 nm CMOS," *IEEE Microw. Wireless Compon. Lett.*, pp. 1–4, 2014.
- [3] X. Gao, E. Klumperink, G. Socci, M. Bohsali, and B. Nauta, "A 2.2 GHz sub-sampling PLL with 0.16 ps$_{rms}$ jitter and -125 dBc/Hz in-band phase noise at $700\,\mu\mathrm{W}$ loop-components power," *Proc. IEEE VLSI Circuits Symp.*, pp. 139–140, 2010.
- [4] A. Elkholy, M. Talegaonkar, T. Anand, and P. K. Hanumolu, "Design and analysis of low-power high-frequency robust sub-harmonic injection-locked clock multipliers," *IEEE J. Solid-State Circuits*, vol. 50, no. 12, pp. 3160–3174, 2015.
- [5] C. Muschalik, "Influence of RF oscillators on an OFDM signal," *IEEE Transactions on Consumer Electronics*, vol. 41, no. 3, pp. 592–603, 1995.
- [6] A. Valdes-Garcia, A. Natarajan, D. Liu, M. Sanduleanu, X. Gu, M. Ferriss, B. Parker, C. Baks, J.-O. Plouchart, H. Ainspan, B. Sadhu, M. Islam, and S. Reynolds, "A fully-integrated dual-polarization 16-element W-band phased-array transceiver in SiGe BiCMOS," *IEEE Proc. of Radio Frequency Integrated Circuits*, pp. 375–378, 2013.
- [7] R. W. McMillan, "Terahertz imaging, millimeter-wave radar," NATO Security through Science Series A: Chemistry and Biology, 2005.
- [8] J. Lee, Y. A. Li, M. H. Hung, and S. J. Huang, "A fully-integrated 77-GHz FMCW radar transceiver in 65 nm CMOS technology," *IEEE J. Solid-State Circuits*, vol. 45, no. 12, pp. 2746–2756, 2010.
- [9] M. Fujishima, M. Motoyoshi, K. Katayama, K. Takano, N. Ono, and R. Fujimoto, "98 mW 10 Gbps wireless transceiver chipset with D-band CMOS circuits," *IEEE J. Solid-State Circuits*, vol. 48, no. 10, pp. 2273–2284, 2013.
- [10] S. Thyagarajan, "Millimeter-wave/terahertz circuits and systems for wireless communication," Ph.D. dissertation, U. California, Berkeley.
- [11] Q. Gu, "THz interconnect: The last centimeter communication," *IEEE Communications Magazine*, pp. 206–215, 2015.
- [12] K. Statnikov, "60-GHz to 1-THz multi-color active imaging with a lens-coupled SiGe HBT chip-set," *IEEE Trans. on Microw. Theory and Techn.*, vol. 63, no. 2, pp. 520–532, 2015.
- [13] Q. Zhong, "A 210-to-305 GHz CMOS receiver for rotational spectroscopy," *IEEE Int. Solid-State Circuits Conf. Tech. Dig.*, pp. 426–427, 2016.
- [14] C. Jiang, "A 320 GHz subharmonic-mixing coherent imager in 0.13 μm SiGe BiCMOS," *IEEE Int. Solid-State Circuits Conf. Tech. Dig.*, pp. 432–433, 2016.
- [15] N. Sharma, "160–310 GHz frequency doubler in 65-nm CMOS with 3-dBm peak output power for rotational spectroscopy," *IEEE Proc. of Radio Frequency Integrated Circuits*, pp. 186–189, 2016.
- [16] C.-I. Chang, *Hyperspectral Imaging: Techniques for Spectral Detection and Classification*. Springer Science and Business Media, 2003.
- [17] Qualcomm, "Project x," http://tricorder.xprize.org/, 2012, [Online; accessed 1-April-2017].
- [18] G. Roddenberry, "Star trek," http://www.startrek.com/, 2151, [Online; accessed 1-April-2017].
- [19] M. Seo, M. Urteaga, J. Hacker, A. Young, Z. Griffith, V. Jain, R. Pierson, P. Rowell, A. Skalare, A. Peralta, R. Lin, D. Pukala, and M. Rodwell, "InP HBT IC technology for terahertz frequencies: Fundamental oscillators up to 0.57 THz," *IEEE J. Solid-State Circuits*, vol. 46, no. 10, pp. 2203–2214, 2010.
- [20] M. Seo, M. Urteaga, A. A. Young, V. Jain, Z. Griffith, J. Hacker, P. Rowell, R. Pierson, and M. Rodwell, "> 300GHz fixed-frequency and voltage-controlled fundamental oscillators in an InP DHBT process," *IEEE Proc. of Int. Microw. Symp. Dig.*, pp. 272–275, 2010.
- [21] V. Radisic, X. Mei, W. Deal, W. Yoshida, P. Liu, J. Uyeda, M. Barsky, L. Samoska, A. Fung, T. Gaier, and R. Lai, "Demonstration of sub-millimeter wave fundamental oscillators using 35-nm InP HEMT technology," *IEEE Microw. Wireless Compon. Lett.*, vol. 17, no. 3, pp. 223–225, 2007.
- [22] V. Radisic, L. Samoska, W. Deal, X. Mei, W. Yoshida, P. Liu, J. Uyeda, A. Fung, T. Gaier, and R. Lai, "A 330-GHz MMIC oscillator module," *IEEE Proc. of Int. Microw. Symp.*, pp. 395–398, 2008.
- [23] Y. Baeyens, N. Weimann, V. Houtsma, J. Weiner, Y. Yang, J. Frackoviak, P. Roux, A. Tate, and Y. Chen, "Highly efficient harmonically tuned InP D-HBT push-push oscillators operating up to 287GHz," *IEEE Proc. of Int. Microw. Symp.*, pp. 341–344, 2007.
- [24] R. Makon, R. Driad, K. Schneider, R. Aidam, M. Schlechtweg, and G. Weimann, "Fundamental W-band InP DHBT-based VCOs with low phase noise and wide tuning range," *IEEE Proc. of Int. Microw. Symp.*, pp. 649–652, 2007.
- [25] I. Kallfass, A. Tessmann, H. Massler, D. Lopez-Diaz, A. Leuther, M. Schlechtweg, and O. Ambacher, "A 300GHz active frequency-doubler and integrated resistive mixer MMIC," IEEE European Microw. Integrated Symp., pp. 200–203, 2009.
- [26] W. Weibo, W. Zhigong, Z. Bin, K. Yaohui, W. Liqun, and Y. Naibin, "A 108GHz GaAs MHEMT VCO MMIC," IEEE Proc. of Int. Symp. on Microw., Antenna, Propag. and EMC Techn. for Wireless Commun., pp. 127–130, 2009.
- [27] X. Melique, A. Maestrini, R. Farre, P. Mounaix, M. Favreau, O. Vanbesien, J. Goutoule, F. Mollot, G. Beaudin, T. Nahri, and D. Lippens, "Fabrication and performance of InP-based heterostructure barrier varactors in 250-GHz waveguide tripler," *IEEE Trans. Microw. Theory Techn.*, vol. 48, no. 6, pp. 1000–1006, 2000.
- [28] X. Qun, J. Hesler, T. Crowe, B. Deaver, and R. Weikle, "A 270-GHz tuner-less heterostructure barrier varactor frequency tripler," *IEEE Microw. Wireless Compon. Lett.*, vol. 17, no. 4, pp. 241–243, 2007.
- [29] —, "A 5-mW and 5% efficiency 210GHz InP-based heterostructure barrier varactor quintupler," *IEEE Microw. Wireless Compon. Lett.*, vol. 14, no. 4, pp. 159–161, 2004.
- [30] T. Bryllert, A. Malko, J. Vukusic, and J. Stake, "A 175 GHz HBV frequency quintupler with 60mW output power," *IEEE Microw. Wireless Compon. Lett.*, vol. 22, no. 2, pp. 76–78, 2012.
- [31] J. Vukusic, T. Bryllert, O. Olsen, J. Hanning, and J. Stake, "Monolithic HBV-based 282 GHz tripler with 31mW output power," *IEEE Electron Device Lett.*, vol. 33, no. 6, pp. 800–802, 2012.
- [32] A. Maestrini, J. Ward, J. Gill, C. Lee, B. Thomas, R. Lin, G. Chattopadhyay, and I. Mehdi, "A frequency-multiplied source with more than 1mW of power across the 840–900 GHz band," *IEEE Trans. Microw. Theory Techn.*, vol. 58, no. 7, pp. 1925–1932, 2010.
- [33] A. Maestrini, I. Mehdi, J. Siles, J. Ward, R. Lin, B. Thomas, C. Lee, J. Gill, G. Chattopadhyay, E. Schlecht, J. Pearson, and P. Siegel, "Design and characterization of a room temperature all-solid-state electronic source tunable from 2.48 to 2.75THz," *IEEE Trans. on Terahertz Sci. and Techn.*, vol. 2, no. 2, pp. 177–185, 2012.
- [34] C. Zhao, "Modelling and characterisation of a broadband 85/170GHz Schottky varactor frequency doubler," Ph.D. dissertation, Chalmers University of Technology, 2011.
- [35] E. Öjefors, J. Grzyb, Y. Zhao, B. Heinemann, B. Tillack, and U. Pfeiffer, "A 820 GHz SiGe chipset for terahertz active imaging applications," *IEEE Int. Solid-State Circuits Conf. Tech. Dig.*, pp. 224–226, 2011.
- [36] E. Laskin, K. Tang, K. Yau, P. Chevalier, A. Chantre, B. Sautreuil, and S. Voinigescu, "170-GHz transceiver with on-chip antennas in SiGe technology," *IEEE Proc. of Radio Frequency Integrated Circuits Symp.*, pp. 637–640, 2008.
- [37] R. Wanner, R. Lachner, and G. Olbrich, "A monolithically integrated 190 GHz SiGe push-push oscillator," *IEEE Microw. Wireless Compon. Lett.*, vol. 15, no. 12, pp. 862–864, 2005.
- [38] Y. Zhao, B. Heinemann, and U. Pfeiffer, "Fundamental mode Colpitts VCOs at 115 and 165 GHz," *IEEE Proc. of Bipolar Circuits and Techn. Meeting*, pp. 33–36, 2011.
- [39] E. Laskin, P. Chevalier, A. Chantre, B. Sautreuil, and S. Voinigescu, "80/160-GHz transceiver and 140-GHz amplifier in SiGe technology," *IEEE Proc. of Radio Frequency Integrated Circuits Symp.*, pp. 153–156, 2007.
- [40] R. Wanner, R. Lachner, G. Olbrich, and P. Russer, "A SiGe monolithically integrated 278 GHz push-push oscillator," *IEEE Proc. of Int. Microw. Symp.*, pp. 333–336, 2007.
- [41] E. Seok, C. Cao, D. Shim, D. Arenas, D. Tanner, C. Hu, and K. K.O., "A 410 GHz CMOS push-push oscillator with an on-chip patch antenna," *IEEE Int. Solid-State Circuits Conf. Tech. Dig.*, pp. 472–629, 2008.
- [42] D. Huang, T. R. LaRocca, M. F. Chang, L. Samoska, A. Fung, R. Campbell, and M. Andrews, "Terahertz CMOS frequency generator using linear superposition technique," *IEEE J. Solid-State Circuits*, pp. 2730–2738, 2008.
- [43] D. Shim, D. Koukis, D. Arenas, D. Tanner, and K. K.O., "553 GHz signal generation in CMOS using a quadruple-push oscillator," *Symp. on VLSI Circuits*, pp. 154–155, 2011.
- [44] B. Razavi, "A 300 GHz fundamental oscillator in 65 nm CMOS technology," IEEE J. Solid-State Circuits, vol. 46, no. 4, pp. 894–903, 2011.
- [45] K. Sengupta and A. Hajimiri, "Distributed active radiation for THz signal generation," IEEE Int. Solid-State Circuits Conf. Tech. Dig., pp. 288–289, 2011.
- [46] —, "A 0.28 THz power-generation and beam-steering array in CMOS based on distributed active radiators," *IEEE J. Solid-State Circuits*, vol. 47, no. 12, pp. 3013–3031, 2012.
- [47] O. Momeni and E. Afshari, "High power terahertz and millimeter-wave oscillator design: A systematic approach," *IEEE J. Solid-State Circuits*, vol. 46, no. 3, pp. 583–597, 2011.
- [48] Y. Tousi, O. Momeni, and E. Afshari, "A 283-to-296GHz VCO with 0.76mW peak output power in 65nm CMOS," *IEEE Int. Solid-State Circuits Conf. Tech. Dig.*, pp. 258–260, 2012.
- [49] B. Khamaisi and E. Socher, "A 209-233GHz frequency source in 90nm CMOS technology," *IEEE Microw. Wireless Compon. Lett.*, pp. 260–262, 2012.
- [50] B. Cetinoneri, Y. Atesal, A. Fung, and G. Rebeiz, "W-band amplifiers with 6-dB noise figure and milliwatt-level 170-200 GHz doublers in 45nm CMOS," *IEEE Trans. Microw. Theory Techn.*, vol. 60, no. 3, pp. 286–287, 2011.
- [51] M. Vehovec, L. Houselander, and R. Spence, "On oscillator design for maximum power," IEEE Trans. on Circuit Theory, vol. 15, no. 3, pp. 281–283, 1968.
- [52] B. Kormanyos and G. Rebeiz, "Oscillator design for maximum added power," IEEE Microw. Guided Wave Lett., vol. 4, no. 6, pp. 205–207, 1994.
- [53] R. J. Gilmore and F. J. Rosenbaum, "An analytic approach to optimum oscillator design using s-parameters," *IEEE Trans. Microw. Theory Techn.*, vol. 31, no. 8, pp. 205–207, 1983.
- [54] S. Lee, B. Jagannathan, S. Narasimha, A. Chou, N. Zamdmer, J. Johnson, R. Williams, L. Wagner, J. Kim, J. Plouchart, J. Pekarik, S. Springer, and G. Freeman, "Record of performance of 45nm SOI CMOS technology," *IEEE Int. Electron Devices Meeting*, pp. 255–258, 2007.
- [55] BSIM4SOIv4.4 MOSFET Model Users Manual, University of California at Berkeley, 2010.
- [56] HyperLynx® User's Manual Version 15.2, Mentor Graphics Corp., 2012.
- [57] M. Koolen, J. Geelen, and M. Versleijen, "An improved de-embedding technique for on-wafer high-frequency characterization," *IEEE Proc. of Bipolar Circuits and Techn. Meeting*, pp. 188–191, 1991.
- [58] U. Gogineni, H. Li, J. A. del Alamo, S. L. Sweeney, J. Wang, and B. Jagannathan, "Effect of substrate contact shape and placement on RF characteristics of 45 nm low power CMOS devices," *IEEE J. Solid-State Circuits*, vol. 45, no. 5, pp. 998–1006, 2010.
- [59] Z. Huszka, K. Molnar, and E. Seebacher, "Estimation of $f_{max}$ by the common-intercept method," *IEEE Proc. of Bipolar/BiCMOS Circuits and Techn. Meeting*, pp. 233–236, 2003.
- [60] J. Sharma and H. Krishnaswamy, "215GHz CMOS signal source based on a maximum gain ring oscillator topology," *IEEE Proc. of Int. Microw. Symp.*, pp. 1–3, 2012.
- [61] J. Plouchart, "Applications of SOI technologies to communication," IEEE Proc. of Comp. Semiconductor Integrated Circuit Symp., pp. 1–4, 2011.
- [62] K. Yau, I. Sarkas, A. Tomkins, P. Chevalier, and S. Voinigescu, "On-wafer S-parameter de-embedding of silicon active and passive devices up to 170GHz," *IEEE Proc. of Int. Microw. Symp.*, pp. 600–603, 2010.
- [63] M. Seo, B. Jagannathan, C. Carta, J. Pekarik, L. Chen, C. Yue, and M. Rodwell, "A 1.1V 150GHz amplifier with 8 dB gain and +6 dBm saturated output power in standard digital 65nm CMOS using dummy prefilled microstrip lines," *IEEE Int. Solid-State Circuits Conf. Tech. Dig.*, pp. 484–485, 2009.
- [64] S. Venkateswaran and P. Sarma, "Maximum available power gain of active 2-ports," IEEE Electron. Lett., vol. 4, no. 13, pp. 278–279, 1968.
- [65] M. Gupta, "Power gain in feedback amplifiers, a classic revisited," IEEE Trans. Microw. Theory Techn., vol. 40, no. 5, pp. 864–879, 1992.
- [66] H. Krishnaswamy, "Architectures and integrated circuits for RF and mmwave multiple-antenna systems on silicon," Ph.D. dissertation, University of Southern California, 2009.
- [67] H. Krishnaswamy and H. Hashemi, "A variable-phase ring oscillator and PLL architecture for integrated phased array transceivers," *IEEE J. Solid-State Circuits*, vol. 43, no. 11, pp. 2446–2463, 2008.
- [68] —, "A rigorous phase noise analysis of tuned ring oscillators," *Proc. of IEEE Radio and Wireless Symp.*, pp. 43–46, 2007.
- [69] C. Cao, E. Seok, and K. K.O., "192 GHz push-push VCO in 0.13 μm CMOS," *IEEE Electron. Lett.*, vol. 42, no. 4, pp. 208–209, 2006.
- [70] J. Sharma and H. Krishnaswamy, "216- and 316 GHz 45 nm SOI CMOS signal sources based on a maximum-gain ring oscillator topology," *IEEE Trans. Microw. Theory Techn.*, vol. 61, no. 1, pp. 492–504, 2013.
- [71] O. Momeni and E. Afshari, "A 220 to 275GHz traveling-wave frequency doubler with -6.6 dBm power at 244GHz in 65nm CMOS," *IEEE Int. Solid-State Circuits Conf. Tech. Dig.*, pp. 286–287, 2011.
- [72] E. Ciardha, S. Lidholm, and B. Lyons, "Generic-device frequency-multiplier analysis: a unified approach," *IEEE Trans. Microw. Theory Techn.*, pp. 1134–1141, 2000.
- [73] N. Mazor and E. Socher, "Analysis and design of an X-band-to-W-band CMOS active multiplier with improved harmonic rejection," *IEEE Trans. Microw. Theory Techn.*, vol. 61, no. 5, pp. 1924–1933, 2013.
- [74] R. Han and E. Afshari, "A broadband 480 GHz passive frequency doubler in 65 nm bulk CMOS with 0.23 mW output power," *IEEE Proc. of Radio Frequency Integrated Circuits*, pp. 203–206, 2012.
- [75] A. Chakrabarti and H. Krishnaswamy, "High-power, high-efficiency, class-E-like, stacked mm-wave PAs in SOI and bulk CMOS: theory and implementation," *IEEE Trans. Microw. Theory Techn.*, vol. 62, no. 8, pp. 1686–1704, 2014.
- [76] C. Mao, C. S. Nallani, S. Sankaran, E. Seok, and K. K. O, "125 GHz diode frequency doubler in 0.13 μm CMOS," *IEEE J. Solid-State Circuits*, vol. 44, no. 5, pp. 1531–1538, 2009.
- [77] S. Kang, S. Thyagarajan, and A. Niknejad, "A 240 GHz wideband QPSK transmitter in 65 nm CMOS," *IEEE Proc. of Radio Frequency Integrated Circuits Symp.*, pp. 353–356, 2014.
- [78] Y. Tousi and E. Afshari, "A scalable THz 2D phased array with +17 dBm of EIRP at 338 GHz in 65 nm bulk CMOS," *IEEE Int. Solid-State Circuits Conf. Tech. Dig.*, pp. 258–259, 2014.
- [79] Y. Yang, S. Zihir, H. Lin, O. Inac, and G. Rebeiz, "A 155 GHz 20 Gbit/s QPSK transceiver in 45 nm CMOS," *IEEE Proc. of Radio Frequency Integrated Circuits Symp.*, pp. 365–368, 2014.
- [80] M. Fujishima, M. Motoyoshi, K. Katayama, K. Takano, N. Ono, and R. Fujimoto, "98 mW 10 Gbps wireless transceiver chipset with D-band CMOS circuits," *IEEE J. Solid-State Circuits*, vol. 48, no. 10, pp. 2273–2284, 2013.
- [81] R. Han and E. Afshari, "A CMOS high-power broadband 260 GHz radiator array for spectroscopy," *IEEE J. Solid-State Circuits*, vol. 48, no. 12, pp. 3090–3104, 2013.
- [82] Y. Zhao, J. Grzyb, and U. R. Pfeiffer, "A 288-GHz lens-integrated balanced triple-push source in a 65-nm CMOS technology," *IEEE Proc. of ESSIRC*, pp. 289–292, 2012.
- [83] F. Golcuk, A. Fung, and G. Rebeiz, "A 0.37-0.43 THz wideband quadrupler with 160 μW peak output power in 45 nm CMOS," *IEEE Proc. of Int. Microw. Symp.*, pp. 1–4, 2013.
- [84] J. Sharma, T. Dinc, and H. Krishnaswamy, "A 200 GHz power mixer in 130 nm-CMOS employing nonlinearity engineering," *IEEE Proc. of Radio Frequency Integrated Circuits*, pp. 1–4, 2014.
- [85] M. Seo, B. Jagannathan, J. Pekarik, and M. Rodwell, "A 150 GHz amplifier with 8 dB gain and 6 dBm in digital 65 nm CMOS using dummy-prefilled microstrip lines," *IEEE J. Solid-State Circuits*, vol. 44, no. 12, pp. 3410–3421, 2009.
- [86] N. Deferm and P. Reynaert, "A 100 GHz transformer-coupled fully differential amplifier in 90 nm CMOS," *IEEE Proc. of Radio Frequency Integrated Circuits*, pp. 359–362, 2010.
- [87] M.-D. Tsai and A. Natarajan, "60 GHz passive and active RF-path phase shifters in silicon," in *IEEE Proc. of Radio Frequency Integrated Circuits*, Jun. 2009, pp. 223–226.
- [88] J.-C. Wu, C.-C. Chang, S.-F. Chang, and T.-Y. Chin, "A 24 GHz full-360° CMOS reflection-type phase shifter MMIC with low loss-variation," in *IEEE Proc. of Radio Frequency Integrated Circuits*, Jun. 2008, pp. 365–368.
- [89] H. Krishnaswamy, A. Valdes-Garcia, and J.-W. Lai, "A silicon-based, all-passive, 60 GHz, 4-element, phased-array beamformer featuring a differential, reflection-type phase shifter," in *IEEE Proc. of Int. Symp. Phased Array Systems and Techn.*, Oct. 2010, pp. 225–232.
- [90] J.-J. Lee and C.-S. Park, "A slow-wave microstrip line with a high-Q and a high dielectric constant for millimeter-wave CMOS application," *IEEE Microw. Wireless Compon. Lett.*, vol. 20, no. 7, pp. 381–383, Jul. 2010.
- [91] B. Biglarbegian, M. R. Nezhad-Ahmadi, M. Fakharzadeh, and S. Safavi-Naeini, "Millimeter-wave reflective type phase shifter in CMOS technology," *IEEE Microw. Wireless Compon. Lett.*, vol. 19, no. 9, pp. 560–562, Sep. 2009.
- [92] R. L. Bunch and S. Raman, "Large-signal analysis of MOS varactors in CMOS $-G_m$ LC VCOs," *IEEE J. Solid-State Circuits*, vol. 38, no. 8, pp. 1325–1332, 2003.
- [93] A. Natarajan, S. Nicolson, M.-D. Tsai, and B. Floyd, "A 60 GHz variable-gain LNA in 65 nm CMOS," in *IEEE Asian Solid-State Circuits Conf. Tech. Dig.*, Nov. 2008, pp. 117–120.
- [94] K. S. Ang and I. D. Robertson, "Analysis and design of impedance-transforming planar Marchand baluns," *IEEE Trans. on Microw. Theory and Techn.*, vol. 49, no. 2, pp. 402–406, 2001.
- [95] J.-X. Liu, C.-Y. Hsu, H.-R. Chuang, and C.-Y. Chen, "A 60 GHz millimeter-wave CMOS Marchand balun," in *IEEE Proc. of Radio Frequency Integrated Circuits*, Jun. 2007, pp. 445–448.
- [96] S. P. Voinigescu, A. Tomkins, E. Dacquay, P. Chevalier, J. Hasch, A. Chantre, and B. Sautreuil, "A study of SiGe HBT signal sources in the 220-330 GHz range," *IEEE J. Solid-State Circuits*, vol. 48, no. 9, pp. 2011–2021, 2013.
- [97] E. Ojefors, B. Heinemann, and U. Pfeiffer, "Active 220 and 325 GHz frequency multiplier chains in an SiGe HBT technology," *IEEE Trans. Microw. Theory Techn.*, vol. 59, no. 5, pp. 1311–1318, 2011.
- [98] F. Golcuk, O. Gurbuz, and G. Rebeiz, "A 0.39-0.44THz 2x4 amplifier quadrupler array with peak EIRP of 3–4 dBm," *IEEE Trans. Microw. Theory Techn.*, vol. 61, no. 12, pp. 4483–4491, 2013.
- [99] —, "A 163-180GHz 2×2 amplifier-doubler array with peak EIRP of +5 dBm," *IEEE Proc. of Radio Frequency Integrated Circuits*, pp. 363–366, 2013.
- [100] B. Khamaisi, S. Jameson, and E. Socher, "A 210-227 GHz transmitter with integrated on-chip antenna in 90nm CMOS technology," *IEEE Trans. Terahertz Science and Techn.*, vol. 3, no. 2, pp. 141–150, 2013.
- [101] S. Jameson and E. Socher, "High efficiency 293 GHz radiating source in 65 nm CMOS," *IEEE Microw. Wireless Compon. Lett.*, pp. 463–465, 2014.
- [102] H. Lin and G. M. Rebeiz, "A 200–245 GHz balanced frequency doubler with peak output power of +2 dBm," *IEEE Proc. of Comp. Semiconductor Integrated Circuit Symp.*, pp. 1–4, 2013.
- [103] F. Gardner, "Charge-pump phase-lock loops," IEEE Trans. on Communications, vol. 28, no. 11, pp. 1849–1858, 1980.
- [104] X. Gao, E. Klumperink, M. Bohsali, and B. Nauta, "A low noise sub-sampling PLL in which divider noise is eliminated and PD/CP noise is not multiplied by $N^2$," *IEEE J. Solid-State Circuits*, vol. 44, no. 12, pp. 3253–3263, 2009.
- [105] E. Hegazi, H. Sjöland, and A. Abidi, "A filtering technique to lower LC oscillator phase noise," *IEEE J. Solid-State Circuits*, vol. 36, no. 12, pp. 1921–1930, 2001.
- [106] D. Ham and A. Hajimiri, "Concepts and methods in optimization of integrated LC VCOs," *IEEE J. Solid-State Circuits*, vol. 36, no. 6, pp. 896–909, 2001.
- [107] B. Soltanian and P. Kinget, "Tail current-shaping to improve phase noise in LC voltage-controlled oscillators," *IEEE J. Solid-State Circuits*, vol. 41, no. 8, pp. 1792–1802, 2006.
- [108] A. Mazzanti and P. Andreani, "Class-C harmonic CMOS VCOs, with a general result on phase noise," *IEEE J. Solid-State Circuits*, vol. 43, no. 12, pp. 2716–2729, 2008.
- [109] L. Fanori and P. Andreani, "A 2.5-to-3.3GHz CMOS class-D VCO," IEEE Int. Solid-State Circuits Conf. Tech. Dig., pp. 346–348, 2013.
- [110] M. Babaie and R. B. Staszewski, "A class-F CMOS oscillator," *IEEE J. Solid-State Circuits*, vol. 48, no. 12, pp. 3120–3133, 2013.
- [111] M. Garampazzi, P. M. Mendes, N. Codega, D. Manstretta, and R. Castello, "Analysis and design of a 195.6 dBc/Hz peak FoM p-n class-B oscillator with transformer-based tail filtering," *IEEE J. Solid-State Circuits*, vol. 50, no. 7, pp. 1657–1668, 2015.
- [112] M. Shahmohammadi, M. Babaie, and R. B. Staszewski, "A 1/f noise upconversion reduction technique applied to class-D and class-F oscillators," *IEEE Int. Solid-State Circuits Conf. Tech. Dig.*, pp. 444–445, 2015.
- [113] D. Murphy, H. Darabi, and H. Wu, "A VCO with implicit common-mode resonance," *IEEE Int. Solid-State Circuits Conf. Tech. Dig.*, pp. 442–443, 2015.
- [114] D. Murphy and H. Darabi, "A complementary VCO for IoE that achieves a 195 dBc/Hz FOM and flicker noise corner of 200 kHz," *IEEE Int. Solid-State Circuits Conf. Tech. Dig.*, pp. 44–45, 2016.
- [115] R. J. W. et al., "A sub-sampling-assisted phase-frequency detector for low-noise PLLs with robust operation under supply interference," *IEEE Trans. on Circuits Syst.*, vol. 62, no. 1, pp. 90–99, 2015.
- [116] A. Shahani, D. Shaeffer, S. Mohan, H. Samavati, H. Rategh, M. Hershenson, M. X. C. Yue, D. Eddleman, and T. Lee, "Low-power dividerless frequency synthesis using aperture phase detector," *IEEE J. Solid-State Circuits*, vol. 33, no. 12, pp. 2232–2239, 1998.
- [117] X. Gao, E. A. M. Klumperink, G. Socci, M. Bohsali, and B. Nauta, "Spur reduction techniques for phase-locked loops exploiting a sub-sampling phase detector," *IEEE J. Solid-State Circuits*, vol. 45, no. 9, pp. 1809–1820, 2010.
- [118] Z. Z. Chen, Y. Wang, J. Shin, Y. Zhao, S. A. Mirhaj, Y. Kuan, H. N. Chen, C. Jou, M. Tsai, F. Hsueh, and M. Chang, "A sub-sampling all-digital fractional-N frequency synthesizer with -111 dBc/Hz in-band phase noise and an FOM of -242 dB," *IEEE Int. Solid-State Circuits Conf. Tech. Dig.*, pp. 268–269, 2015.
- [119] T. Siriburanon, S. Kondo, K. Kimura, T. Ueno, S. Kawashima, T. Kaneko, W. Deng, M. Miyahara, K. Okada, and A. Matsuzawa, "A 2.2 GHz -242 dB-FOM 4.2 mW ADC-PLL using digital sub-sampling architecture," *IEEE Int. Solid-State Circuits Conf. Tech. Dig.*, pp. 440–441, 2015.
- [120] T. Chuang and H. Krishnaswamy, "A 0.0049 mm$^2$ 2.3 GHz sub-sampling ring-oscillator PLL with time-based loop filter achieving -236.2 dB jitter-FOM," *IEEE Int. Solid-State Circuits Conf. Tech. Dig.*, pp. 192–193, 2017.
- [121] K. Raczkowski, N. Markulic, B. Hershberg, and J. Craninckx, "A 9.2–12.7 GHz wideband fractional-N subsampling PLL in 28 nm CMOS with 280 fs RMS jitter," *IEEE J. Solid-State Circuits*, vol. 50, no. 5, pp. 1203–1213, 2015.
- [122] N. Markulic, K. Raczkowski, E. Martens, P. E. P. Filho, B. Hershberg, P. Wambacq, and J. Craninckx, "A DTC-based subsampling PLL capable of self-calibrated fractional synthesis and two-point modulation," *IEEE J. Solid-State Circuits*, vol. 50, no. 12, pp. 3160–3174, 2015.
- [123] A. Elkholy, A. Elmallah, M. Elzeftawi, K. Chang, and P. K. Hanumolu, "A 6.75-to-8.25 GHz, 250 fsrms-integrated-jitter 3.25 mW rapid on/off PVT-insensitive fractional-N injection-locked clock multiplier in 65 nm CMOS," *IEEE Int. Solid-State Circuits Conf. Tech. Dig.*, pp. 192–193, 2016.
- [124] K. Kundert, Simulating Switched-Capacitor Filters with SpectreRF, Designers Guide Consulting, Inc., 2015.
- [125] J. M. Rabaey, A. Chandrakasan, and B. Nikolic, *Digital Integrated Circuits: A Design Perspective*. Hoboken, New Jersey: Pearson Education, 2002.
- [126] X. Gao, E. Klumperink, G. Socci, M. Bohsali, and B. Nauta, "A 2.2 GHz sub-sampling PLL with 0.16 ps$_{rms}$ jitter and -125 dBc/Hz in-band phase noise at $700\,\mu\mathrm{W}$ loop-components power," *Proc. IEEE VLSI Circuits Symp.*, pp. 139–140, 2010.
- [127] S. Levantino, L. Romano, S. Pellerano, C. Samori, and A. L. Lacaita, "Phase noise in digital frequency dividers," *IEEE Journal of Solid-State Circuits*, vol. 39, no. 5, pp. 775–784, 2004.
- [128] B. Helal, C. Hsu, K. Johnson, and M. Perrot, "A low jitter programmable clock multiplier based on a pulse injection-locked oscillator with a highly-digital tuning loop," *IEEE Journal of Solid-State Circuits*, vol. 44, no. 5, pp. 1391–1400, 2009.
- [129] F. Aflatouni and H. Hashemi, "An electronically controlled semiconductor laser phased array," *Proc. IEEE/MTT-S Int. Microwave Symp.*, pp. 1–3, 2012.
- [130] F. Aflatouni, O. Momeni, and H. Hashemi, "A heterodyne phase locked loop with GHz acquisition range for coherent locking of semiconductor lasers in 0.13 μm CMOS," Proc. Custom Integrated Circuits Conf., pp. 463–466, 2007.
- [131] B. Behroozpour, P. A. M. Sandborn, N. Quack, T.-J. Seok, Y. Matsui, M. C. Wu, and B. E. Boser, "Electronic-photonic integrated circuit for 3D microimaging," *IEEE J. Solid-State Circuits*, vol. 52, no. 1, pp. 161–172, 2017.
- [132] C. Poulton, A. Yaacobi, M. Byrd, M. Raval, D. Vermeulen, and M. Watts, "Visual coherent solid-state LIDAR with silicon photonic optical phased arrays," *Optics Letters*, vol. 42, no. 20, pp. 4091–4094, 2017.
- [133] T. Komljenovic, R. Helkey, L. Coldren, and J. E. Bowers, "Sparse aperiodic arrays for optical beam forming and lidar," *Optics Express*, vol. 25, pp. 2511–2528, 2017.
- [134] M. Zadka, Y. Chang, A. Mohanty, C. Phare, S. Roberts, and M. Lipson, "Millimeter long grating coupler with uniform spatial output," OSA Conf. on Laser and Electro-Optics Tech. Dig., pp. 1–2, 2017.
- [135] V. Kratyuk, P. Hanumolu, U.-K. Moon, and K. Mayaram, "Frequency detector for fast frequency lock of digital PLLs," *Electronics Letters*, vol. 43, no. 1, pp. 13–14, 2007.
- [136] J. Zhu, R. K. Nandwana, G. Shu, A. Elkholy, S.-J. Kim, and P. K. Hanumolu, "A 0.0021 mm$^2$ 1.8 mW 2.2 GHz PLL using time-based integral control in 65 nm CMOS," *IEEE Int. Solid-State Circuits Conf. Tech. Dig.*, pp. 238–339, 2016.
- [137] X. Gao, E. Klumperink, P. Geraedts, and B. Nauta, "Jitter analysis and a benchmarking figure-of-merit for phase-locked loops," *IEEE Trans. on Circuits Syst. II: Express Briefs*, vol. 56, no. 2, pp. 117–121, 2009.
- [138] O. Edfors, M. Sandell, J. van de Beek, D. Landström, and F. Sjöberg, "An introduction to orthogonal frequency division multiplexing," 1996.
- [139] R. Shafik, M. Rahman, and A. R. Islam, "On the extended relationships among EVM, BER and SNR as performance metrics," *Proc. Int. Conf. on Electrical and Computer Engineering*, pp. 408–411, 2006.
- [140] T. Siriburanon, T. Ueno, K. Kimura, S. Kondo, W. Deng, K. Okada, and A. Matsuzawa, "A 60-GHz sub-sampling frequency synthesizer using subharmonic injection-locked quadrature oscillators," Proc. Radio Frequency Integrated Circuits, pp. 105–108, 2014.

## Part II

# Appendices

## Appendix A

# Effect of Oscillator Non-ideality in OFDM

Inaccuracies in the synthesized signal include Carrier Frequency Offset (CFO) and Phase Noise (PN). These inaccuracies degrade the signal-to-noise ratio (SNR) of the OFDM signal in two ways: Inter-carrier Interference (ICI) and phase noise.

#### A.1 Inter-carrier interference

CFO between the upconverting and downconverting synthesizers will cause the carriers to leak into the bins of other carriers. Each carrier carries stochastic data, and so, this leakage into other bins is stochastic and appears as noise degrading the SNR. ICI can cause SNR degradation even in the absence of oscillator phase noise.

Fig. A.2, from [5], shows the SNR which results from a certain CFO. CFO is represented on the x-axis as a fraction  $\alpha$  of the inter-carrier spacing. For a detailed analysis of how SNR is calculated from the fractional CFO,  $\alpha$ , see Section 6 of [5].
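The ICI mechanism described above can be illustrated numerically. The following is a minimal Python sketch and not the analysis of [5]: it assumes an N-point DFT receiver, equal-power carriers carrying uncorrelated data, and no impairment other than the CFO, and it uses the standard DFT leakage kernel for a tone offset by a fraction $\alpha$ of a bin. The function name `ici_snr_db` and the default of 1024 carriers are illustrative assumptions.

```python
import math


def ici_snr_db(alpha, n_carriers=1024):
    """ICI-limited SNR (dB) on one carrier for a fractional CFO `alpha`.

    `alpha` is the CFO as a fraction of the inter-carrier spacing.
    Assumes all carriers have equal power and carry uncorrelated data.
    """
    if alpha == 0:
        return math.inf  # no offset, no leakage in this idealized model

    def leak(m):
        # Normalized DFT magnitude of a tone that sits (m + alpha) bins away.
        x = m + alpha
        return math.sin(math.pi * x) / (
            n_carriers * math.sin(math.pi * x / n_carriers)
        )

    signal = leak(0) ** 2                                    # wanted carrier
    ici = sum(leak(m) ** 2 for m in range(1, n_carriers))    # all other bins
    return 10.0 * math.log10(signal / ici)


if __name__ == "__main__":
    for a in (0.005, 0.01, 0.02, 0.05):
        print(f"alpha = {a:5.3f}  ->  SNR = {ici_snr_db(a):5.1f} dB")
```

Under these assumptions, `ici_snr_db(0.02)` evaluates to roughly 29 dB, and for small $\alpha$ the SNR drops by about 6 dB for each doubling of the offset, since the leaked power scales as $\alpha^2$.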

Downconversion by an oscillator with CFO also causes a steady phase rotation of the signal constellation. This phase error is the same across all carriers and is known as the Common Phase Error (CPE). Very often, this effect of the CFO can be calibrated out using algorithms, some of which use the cyclic prefix of OFDM and some of which use separate training symbols. A CFO of any fraction of the spacing between two carrier frequencies can be corrected. Even CFOs equal to a few integer multiples of the inter-carrier spacing can be calibrated.

Figure A.1: Single carrier (a) up- and down-converted by a matched LO and (b) leaking into the other bins when up- and down-converted by a mismatched LO.

Figure A.2: SNR from Inter-Carrier Interference (ICI) due to Carrier Frequency Offset (CFO). CFO is represented on the x-axis as a fraction $\alpha$ of the inter-carrier spacing [5].

Figure A.3: The limit on LO phase noise is calculated assuming that the OFDM data is just carriers.

According to [138], after the algorithms correcting for CFO have run, the residual carrier offset must be within 2% of the inter-carrier spacing ($\alpha = 0.02$). For a 15 kHz carrier spacing this is a 300 Hz error, or about 0.14 ppm around 2.1 GHz. From Fig. A.2, this corresponds to an SNR (C/N, carrier-to-noise ratio) of 30 dB. In base stations (BS), crystal oscillators are chosen to meet a 0.1 ppm accuracy, so CFO should not be an issue there, but it can be an issue in user equipment (UE).

#### A.2 Phase Noise

Another culprit for SNR degradation is oscillator phase noise. To determine the phase noise profile for which the PLL components should be designed, we assume that the OFDM data is just a collection of carriers, as shown in Fig. A.3. The presence of data on the carriers can be expected to tighten the phase noise specification, and there will also be SNR degradation from other components in the chain. For these reasons we target an EVM of 1-2%, a quarter or less of the 8% prescribed by the standard for 64-QAM. An EVM of 1-2% translates to an SNR of $40$-$34\,\mathrm{dB}$, using $SNR = 1/EVM^2$ (or $SNR_{dB10} = -EVM_{dB20}$), from [139], [140].
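The EVM-to-SNR conversion quoted above is simple enough to check directly. The short Python sketch below applies $SNR = 1/EVM^2$; the helper name `evm_to_snr_db` is an illustrative assumption, and the relation itself ignores any implementation margin.

```python
import math


def evm_to_snr_db(evm_percent):
    """Convert an RMS EVM given in percent to SNR in dB via SNR = 1 / EVM^2."""
    evm = evm_percent / 100.0
    return -20.0 * math.log10(evm)


if __name__ == "__main__":
    for evm in (1.0, 2.0, 8.0):
        print(f"EVM = {evm:3.0f}%  ->  SNR = {evm_to_snr_db(evm):4.1f} dB")
```

This gives 40 dB for 1% EVM and about 34 dB for 2% EVM, matching the range quoted in the text, while the 8% limit of the standard corresponds to about 22 dB.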

Now, the phase noise of the up- or down-converting oscillator is transferred onto each carrier. The skirts of the noise extend to the other carriers. So each carrier experiences the noise from the other carriers. see Fig. A.3. The other carriers are located at spaces of 15 kHz from (15 kHz to 20 MHz) away. To calculate the noise, we need to sum the synthesizer noise from 15 kHz to 20 MHz in steps of 15 kHz. This step is fine enough to treat it as a continuous integration of the PLL

$$\text{Total noise from other carriers} = \int_{15k}^{20M} \phi_{PLL}^2(f)\, df \tag{A.1}$$

In-band, the PLL output tracks the reference; outside the loop bandwidth B, it follows the VCO. VCO phase noise falls off as $\frac{1}{f^3}$, or 30 dB per decade, due to flicker noise, and then as $\frac{1}{f^2}$, or 20 dB per decade, due to thermal noise. So outside the bandwidth of the PLL, the noise falls off at 20 to 30 dB per decade, depending on whether the PLL bandwidth B is larger or smaller than the VCO's corner frequency. As long as the VCO phase noise at an offset of B is lower than the PLL's in-band noise, there is no peaking in the phase noise response. For optimal noise, the bandwidth should be chosen as the offset frequency at which the in-band noise contribution equals the VCO noise contribution.

To integrate the PLL phase noise, we assume that the VCO phase noise falls off at $20\,\mathrm{dB}$ per decade outside the bandwidth B. We also assume that there is no peaking in the phase noise response. Because of the rapid $20\,\mathrm{dB/dec}$ fall-off beyond B,

$$\int_{15k}^{20M} \phi_{PLL}^2(f)\, df \approx \int_{15k}^{\infty} \phi_{PLL}^2(f)\, df \tag{A.2}$$

PLL phase noise can be modeled as a first-order transfer function with bandwidth B:

$$\phi_{PLL}^2(f) = \frac{\phi_{flat}^2}{1 + \frac{f^2}{B^2}} \tag{A.3}$$

On integration,

$$\int_{15k}^{\infty} \phi_{PLL}^2(f)\, df = \left[ B \phi_{flat}^2 \tan^{-1} \frac{f}{B} \right]_{15k}^{\infty} \approx \left[ B \phi_{flat}^2 \tan^{-1} \frac{f}{B} \right]_0^{\infty} = \frac{\pi}{2} B \phi_{flat}^2 \tag{A.4}$$

If the VCO noise instead fell off at 30 dB/decade beyond the bandwidth B, this integral would overestimate the integrated PLL noise. The approximation also assumes that the noise stays at $\phi_{flat}^2$ from 0 to 15 kHz. This is not strictly correct, as PLL phase noise usually rises very steeply at close-in offsets from the carrier. As long as the noise flattens to $\phi_{flat}^2$ within 15 kHz, this high close-in phase noise does not cause signal degradation, and we need not include it in studying the effect of phase noise on OFDM SNR.
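The size of the approximations in (A.2) and (A.4) can be gauged numerically. The sketch below compares the discrete sum over carrier offsets (15 kHz steps up to 20 MHz, as in the text) with the closed form $\frac{\pi}{2} B \phi_{flat}^2$, using the first-order model of (A.3). The loop bandwidths are hypothetical values chosen for illustration, and $\phi_{flat}^2$ is normalized to 1 since it cancels in the ratio.

```python
import math

def lorentzian(f, bandwidth, phi_flat_sq=1.0):
    """First-order PLL phase noise model of Eq. (A.3)."""
    return phi_flat_sq / (1.0 + (f / bandwidth) ** 2)

def carrier_sum(bandwidth, spacing=15e3, f_max=20e6):
    """Noise summed over every carrier offset, each weighted by the spacing."""
    n_carriers = int(f_max // spacing)
    return sum(lorentzian(k * spacing, bandwidth) * spacing
               for k in range(1, n_carriers + 1))

def closed_form(bandwidth, phi_flat_sq=1.0):
    """Right-hand side of Eq. (A.4): (pi / 2) * B * phi_flat^2."""
    return 0.5 * math.pi * bandwidth * phi_flat_sq

def overestimate_db(bandwidth):
    """How far the closed form exceeds the discrete carrier sum, in dB."""
    return 10.0 * math.log10(closed_form(bandwidth) / carrier_sum(bandwidth))

# Hypothetical loop bandwidths, not values from the thesis.
for bandwidth in (100e3, 200e3, 500e3):
    print(f"B = {bandwidth / 1e3:4.0f} kHz: closed form exceeds carrier sum by "
          f"{overestimate_db(bandwidth):.2f} dB")
```

For bandwidths of this order the closed form is pessimistic by only a fraction of a dB, which suggests the simplification costs little in the resulting specification.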

To evaluate the PLL requirement shown in Chapter 1 for the modulation schemes of different standards, we use the following equation:

$$10\log_{10}\left(\frac{\pi}{2} \cdot B\phi_{flat}^2\right) = -SNR_{dB10} = EVM_{dB20} \tag{A.5}$$

Note that phase noise is relative to the carrier level, so the "S" of SNR is built into the LHS.
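Equation (A.5) can be rearranged to give the in-band phase noise floor required for a target SNR. The following sketch does this in dB; the 200 kHz bandwidth is a hypothetical example value, not one taken from the thesis, and the function names are illustrative.

```python
import math

def flat_phase_noise_dbc_hz(snr_db, bandwidth_hz):
    """In-band phase noise (dBc/Hz) that meets a target SNR, from Eq. (A.5)."""
    return -snr_db - 10.0 * math.log10(0.5 * math.pi * bandwidth_hz)

def integrated_noise_db(phi_flat_dbc_hz, bandwidth_hz):
    """Left-hand side of Eq. (A.5): 10*log10((pi / 2) * B * phi_flat^2)."""
    return phi_flat_dbc_hz + 10.0 * math.log10(0.5 * math.pi * bandwidth_hz)

BANDWIDTH_HZ = 200e3  # hypothetical PLL bandwidth for illustration

# SNR targets of 40 dB and 34 dB correspond to 1% and 2% EVM.
for snr_db in (40.0, 34.0):
    floor = flat_phase_noise_dbc_hz(snr_db, BANDWIDTH_HZ)
    print(f"SNR = {snr_db:.0f} dB, B = {BANDWIDTH_HZ / 1e3:.0f} kHz "
          f"->  phi_flat = {floor:.1f} dBc/Hz")
```

With this assumed bandwidth, a 40 dB SNR target calls for an in-band floor of roughly -95 dBc/Hz. Widening B tightens the floor by 10 dB per decade of bandwidth.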

## Appendix B

## Race Conditions in Multi-loop PLLs

Race conditions can appear when two phase locked loops simultaneously control a VCO.

#### Two Type-II PLLs

Both loops will insist on locking the frequency with a static phase error which ensures that the input to the additional integrator (other than the VCO) in the loop is zero at steady state. As such the following two conditions must be satisfied simultaneously,

$$\phi_{VCO,IIA} = \phi_{ref,IIA} + \Delta\phi_{IIA} \tag{B.1}$$

where  $\phi_{VCO,IIA}$  and  $\phi_{ref,IIA}$  are the VCO and reference phase respectively at the input of the PFD in Type-II loop A, and  $\Delta\phi_{IIA}$  is the static phase error to ensure that the integrator in loop A does not blow up. For example, this may be the static phase error required to ensure the charge pump current  $i_{CP}=0$  at lock.

Similarly,

$$\phi_{VCO,IIB} = \phi_{ref,IIB} + \Delta\phi_{IIB} \tag{B.2}$$

Even if divider and reference buffer delays in the two loops are identical (in itself impossible), such that  $\phi_{VCO,IIA} = \phi_{VCO,IIB}$  and  $\phi_{ref,IIA} = \phi_{ref,IIB}$ , the integrators will have some mismatch, such that  $\Delta\phi_{IIA} \neq \Delta\phi_{IIB}$ .

The two loops compete to correct each other's lock condition, which each recognizes as an error, and a race condition occurs.
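The conflict can be illustrated with a toy discrete-time model. This is a sketch under simplifying assumptions, not the thesis's analysis: the VCO is an ideal phase accumulator, each loop contributes a proportional and an integral term, the two control signals simply add, and the gains `kp` and `ki` are arbitrary stable values. Here `theta` stands for the VCO phase relative to the reference, and `delta_a`, `delta_b` for the static phase errors $\Delta\phi_{IIA}$ and $\Delta\phi_{IIB}$.

```python
def simulate_two_type2(delta_a, delta_b, steps, kp=0.2, ki=0.01):
    """Two Type-II loops driving one VCO (toy model with assumed gains).

    Returns the final relative VCO phase and the two integrator states.
    """
    theta = 0.0          # VCO phase relative to the reference
    int_a = int_b = 0.0  # integrator states of loops A and B
    for _ in range(steps):
        err_a = delta_a - theta  # loop A wants theta == delta_a
        err_b = delta_b - theta  # loop B wants theta == delta_b
        theta += kp * (err_a + err_b) + ki * (int_a + int_b)
        int_a += err_a
        int_b += err_b
    return theta, int_a, int_b

# Mismatched static phase errors: the phase settles between the two targets,
# but each integrator keeps seeing a non-zero input and ramps without bound.
for steps in (1000, 2000, 4000):
    theta, int_a, int_b = simulate_two_type2(0.02, 0.0, steps)
    print(f"steps={steps}: theta={theta:.4f}, int_a={int_a:.2f}, int_b={int_b:.2f}")
```

In this model the two integrators drift in opposite directions for as long as the simulation runs. In a real circuit they would eventually hit their limits. With `delta_a == delta_b` both integrators settle instead.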

#### Type-I and Type-II PLL

We consider a general case, unlike our proposed RF-PLL in Chapter 5, where the Type-II and Type-I loops do not share a phase detector.

The Type-II loop determines the static phase error condition such that the integrator input under steady state is zero. Due to mismatch, this static phase error can be different from zero. The presence of static phase error will generate a non-zero control voltage in the Type-I loop. The bias voltage on the loop filter (generated by charge accumulation during acquisition), in combination with this Type-I control voltage, must maintain the frequency lock. Even as frequency drifts, the loop will revert to the fixed static phase error needed by the Type-II loop, and in the process adjust the bias voltage on the loop filter during tracking.

As such, the two loops working simultaneously can reach a steady state, and no race condition appears.
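The same kind of toy model shows why this combination settles. As before, this is only a sketch under assumed conditions: an ideal phase-accumulating VCO, additive control paths, and arbitrary stable gains `k1`, `k2` and `ki`. The Type-I loop contributes only a proportional term with its own phase target `delta_i`. The Type-II loop adds an integrator with target `delta_ii`, and `freq_offset` models a free-running frequency error.

```python
def simulate_type1_type2(delta_i, delta_ii, freq_offset, steps,
                         k1=0.1, k2=0.2, ki=0.01):
    """Type-I and Type-II loops driving one VCO (toy model with assumed gains).

    Returns the final relative VCO phase and the Type-II integrator state.
    """
    theta = 0.0  # VCO phase relative to the reference
    integ = 0.0  # Type-II integrator (bias stored on the loop filter)
    for _ in range(steps):
        err_i = delta_i - theta    # Type-I phase error
        err_ii = delta_ii - theta  # Type-II phase error
        theta += freq_offset + k1 * err_i + k2 * err_ii + ki * integ
        integ += err_ii
    return theta, integ

theta, integ = simulate_type1_type2(delta_i=0.0, delta_ii=0.02,
                                    freq_offset=0.005, steps=2000)
print(f"theta = {theta:.4f} (Type-II target 0.02), integrator = {integ:.3f}")
print(f"residual Type-I control = {0.1 * (0.0 - theta):.4f}")
```

Here the phase converges to the Type-II target. The integrator settles at a finite value that absorbs both the frequency offset and the non-zero Type-I control term, consistent with the steady state described above.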

#### ILCM and Type-II PLL

ILCMs behave like Type-I PLLs in their dynamics and noise-rejection behavior. However, when combined with Type-II PLLs to prevent the VCO from drifting out of the injection-locking range, they can create race conditions.

The Type-II PLL locks the VCO frequency exactly with some static phase error

$$\phi_{VCO,IIB} = \phi_{ref,IIB} + \Delta\phi_{IIB} \tag{B.3}$$

When VCO frequency is matched to the injected reference frequency, the ILCM will try to lock the VCO with zero phase error between the two.

$$\phi_{VCO,IA} = \phi_{ref,IA} \tag{B.4}$$

Further,  $\phi_{ref,IIB} = \phi_{ref,IA} + \Delta\phi_{Pulse\,Generator}$ , where  $\Delta\phi_{Pulse\,Generator}$  is the delay in the block generating the injection pulse.  $\phi_{VCO,IA}$  and  $\phi_{VCO,IIB}$  are related by the feedback delay in the Type-II PLL.
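Combining these relations makes the conflict explicit. As a sketch, assume the feedback delay of the Type-II PLL can be written as a phase shift $\Delta\phi_{FB}$ (a symbol introduced here for illustration, not used elsewhere in the text), with the assumed sign convention $\phi_{VCO,IIB} = \phi_{VCO,IA} + \Delta\phi_{FB}$. Substituting this and the pulse-generator relation into (B.3), and then applying (B.4), gives

$$\phi_{VCO,IA} + \Delta\phi_{FB} = \phi_{ref,IA} + \Delta\phi_{Pulse\,Generator} + \Delta\phi_{IIB} \;\;\Rightarrow\;\; \Delta\phi_{IIB} = \Delta\phi_{FB} - \Delta\phi_{Pulse\,Generator}$$

Under these assumptions, both lock conditions hold together only if the static phase error demanded by the Type-II integrator happens to equal the difference between two unrelated delays.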

As there is no guarantee that these conditions will be simultaneously satisfied, a race condition may result. In [4], the authors multiplex the two paths periodically.

ILCMs have the additional problem that the injection path is fast and sets the phase error to zero (the locking phase when the VCO and injection frequencies are matched) every reference cycle. Even if the integral path turns on only periodically, not enough phase error accumulates to provide a large enough correction. For this reason, gating and a DLL are used, as described in the review of prior art in Chapter 5.