# Modeling and Analysis of High-Speed Links

Vladimir Stojanovic<sup>1,2</sup>, Mark Horowitz<sup>1</sup>

<sup>1</sup> Stanford University, <sup>2</sup> Rambus Inc.

#### **High-Speed Link Research**

- Used to focus on making the chip fast
  - Required precision-timing PLLs
  - Required high-speed transmitter and receiver circuitry
  - Many papers on these topics

[Figure: first pages of representative link-circuit papers in the IEEE Journal of Solid-State Circuits: J. G. Maneatis, "Low-Jitter Process-Independent DLL and PLL Based on Self-Biased Techniques" (November 1996); S. Sidiropoulos and M. Horowitz, "A 700-Mb/s/pin CMOS Signaling Interface Using Current Integrating Receivers" (May 1997); C.-K. K. Yang, R. Farjad-Rad, and M. A. Horowitz, "A 0.5-µm CMOS 4.0-Gbit/s Serial Link Transceiver with Data Recovery Using Oversampling" (May 1998)]

#### **Present Problem:**



#### High-speed link chip

Now the bandwidth limit is in the wires

#### **New Link Research:**

Dealing with bandwidth-limited channels

- This is an old research area
  - Textbooks on digital communications
  - Think modems, DSL
- But we can't directly apply their solutions
  - The standard approach requires high-speed A/Ds and digital signal processing
  - 20-GS/s A/Ds are expensive
- (Un)fortunately, we need to rethink the issues

# **Outline Of This Talk**

- Create a framework to evaluate trade-offs
  - For practical multi-Gb/s digital communication systems
- Channel
  - How is the signal degraded?
- Noise (voltage and timing)
  - How large must the received signal be?
- Communication techniques
  - How much of the noise can be reduced
  - While maintaining a reasonable cost

## **Backplane Environment**



- Line attenuation
- Reflections from stubs (vias)

## **Backplane Channel**

- Loss is variable
  - Same backplane
  - Different lengths
  - Different stubs
    - Top vs. bottom
- Attenuation is large
  - More than 30 dB at 3 GHz
  - But is that bad?
- Required signal amplitude set by noise
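For scale, a minimal sketch converting the quoted loss to a linear voltage ratio; the 1-V transmit swing is a hypothetical value, not one from the slides.

```python
import math

def db_to_voltage_ratio(loss_db: float) -> float:
    """Convert an insertion loss in dB (positive number) to a linear voltage gain."""
    return 10 ** (-loss_db / 20)

tx_swing_v = 1.0   # hypothetical transmit swing (V)
loss_db = 30.0     # attenuation quoted for 3 GHz
rx_swing_v = tx_swing_v * db_to_voltage_ratio(loss_db)
print(f"{rx_swing_v * 1e3:.1f} mV")  # prints: 31.6 mV
```

Whether roughly 32 mV is "bad" cannot be judged from the loss alone; it depends on how much noise the receiver must overcome.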



# What We Will Call Noise

- Deterministic errors
  - Things we could in theory correct but don't
- Random noise
  - Have no choice
- Noise comes in two dimensions
  - Voltage
  - Timing
    - Will convert to an effective voltage noise

# Inter-symbol Interference (ISI)

- Channel is low pass
  - Our nice short pulse gets spread out



- Dispersion: short latency (skin effect, dielectric loss)
- Reflections: long latency (impedance mismatches from connectors, via stubs, device parasitics, package)





- The middle sample is corrupted by 0.2 trailing ISI (from the previous symbol) and 0.1 leading ISI (from the next symbol), for 0.3 total ISI
- As a result, the middle symbol is detected in error
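The example can be reproduced by summing the cursors of a sampled pulse response. This sketch assumes NRZ symbols in {-1, +1} and a slicer threshold at 0; the 0.1 and 0.2 ISI values come from the slide, while the 0.25 main cursor is a hypothetical value chosen so that the error occurs.

```python
# Hypothetical sampled pulse response for NRZ symbols in {-1, +1}:
# 0.1 leading (pre-cursor) and 0.2 trailing (post-cursor) ISI as in the example;
# the main-cursor value 0.25 is an assumption chosen for illustration.
PRE, MAIN, POST = 0.1, 0.25, 0.2

def received_sample(prev_bit, cur_bit, next_bit):
    """Sample of the middle symbol: main cursor plus ISI from both neighbors."""
    return MAIN * cur_bit + POST * prev_bit + PRE * next_bit

def decide(sample):
    """Binary slicer with its threshold at 0."""
    return 1 if sample > 0 else -1

# A +1 surrounded by -1s sees 0.3 of ISI against a 0.25 main cursor.
sample = received_sample(-1, +1, -1)
print(round(sample, 3), decide(sample))  # prints: -0.05 -1
```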

#### Crosstalk

- Don't just receive the signal you want
  - Get versions of signals "close" to you
  - Vertical connections have worse coupling
    - "Close" in these vertical connection regions



#### **Frequency View of Crosstalk**



- For this example, above 4 GHz the crosstalk noise is as large as the signal

## **Random Voltage Noise**

- Thermal noise
  - Resistor and device noise
- Quantization
- Estimation error
- Supply noise
- Receiver offset

### **Modeling "Noise" Sources**

Generally use one of two methods:

- Worst case analysis
  - Used for deterministic "noise" like ISI
  - Find the worst case and subtract it from the signal
  - Then apply Gaussian noise to the result
- Assume Gaussian Distribution
  - Rely on Central-Limit Theorem
  - Most noise looks Gaussian, right?
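The worst-case method (often called peak-distortion analysis) fits in a few lines. This sketch assumes binary +/-1 symbols and a hypothetical set of pulse-response cursors and noise level; the function names are ours. It subtracts the worst-case ISI from the main cursor, then applies Gaussian noise to what is left.

```python
import math

def worst_case_eye(cursors, main_index):
    """Peak-distortion estimate for binary +/-1 symbols:
    main cursor minus the sum of the magnitudes of all other cursors."""
    isi = sum(abs(c) for i, c in enumerate(cursors) if i != main_index)
    return cursors[main_index] - isi

def worst_case_symbols(cursors, main_index):
    """Symbol multiplying each cursor that realizes the worst case for a +1."""
    return [1 if i == main_index else (-1 if c > 0 else 1)
            for i, c in enumerate(cursors)]

def q_function(x):
    """Gaussian tail probability P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

# Hypothetical pulse-response samples: one pre-cursor, the main cursor, three post-cursors.
cursors = [0.05, 1.0, 0.3, 0.1, -0.05]
eye = worst_case_eye(cursors, main_index=1)
sigma = 0.05                            # hypothetical rms Gaussian noise (same units)
p_err_worst = q_function(eye / sigma)   # error probability for the worst-case pattern
```

The estimate only describes the single worst pattern; if that pattern is rare, the result is pessimistic.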

#### **Accuracy Issues**

- Worst case analysis
  - Can be too pessimistic
    - If the probability of the worst case is very small
- Gaussian distributions
  - Work well near the mean
  - Often way off at the tails
    - E.g., the ISI distribution is bounded
- We will use direct noise statistics

# **Effect of Timing Noise**

Need to map from time to voltage

Voltage noise when the receiver clock is off

[Figure: ideal vs. jittered sampling and the resulting voltage noise]

The effect depends on the size of the jitter, the input sequence, and the channel

### **Effect of Transmitter Jitter**

#### Jittered pulse decomposition



- Decompose output into ideal and noise
- The noise consists of pulses at the front and end of the symbol
  - The width of each pulse equals the jitter

## **Transmitter Jitter Noise**

- Approximate the noise pulses with deltas
  - Valid when the jitter is small



#### Channel output

- Output with no jitter
- Response to the noise deltas

### **Jitter Propagation Model**



Channel bandwidth matters

- If h(T/2) is small, the noise is small
- If h(nT + T/2) is not small, many noise pulses add
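Written out under the delta approximation, the model reads as follows. This is a sketch in our own notation, not taken from the slides: $b_m$ are the transmitted symbols, $\epsilon_m$ is the timing error of the edge between $b_{m-1}$ and $b_m$ (ideally at $mT$), $h(t)$ is the channel impulse response, and sampling happens at symbol centers $kT + T/2$. A late edge ($\epsilon_m > 0$) leaves the old value $b_{m-1}$ on the line for $\epsilon_m$ longer than intended, which gives the sign below.

$$
x_{\mathrm{jit}}(t) - x(t) \approx \sum_m (b_{m-1} - b_m)\,\epsilon_m\,\delta(t - mT)
$$

$$
v_{\mathrm{noise}}(kT + T/2) \approx \sum_m (b_{m-1} - b_m)\,\epsilon_m\,h\big((k-m)T + T/2\big)
$$

The $m = k$ term is weighted by $h(T/2)$; an edge $n = k - m$ symbols earlier is weighted by $h(nT + T/2)$, so a slow channel lets many of these terms add.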

# **Jitter Effect On Voltage Noise**

#### Transmitter jitter

- High-frequency (cycle-to-cycle) jitter is bad
  - Changes the energy (area) of the symbol
  - The noise contributions that sum are uncorrelated
- Low-frequency jitter is less bad
  - Effectively shifts the waveform
  - Correlated noise gives partial cancellation
- Receiver jitter
  - Modeled as a shift of the transmitted sequence
  - Same as low-frequency transmitter jitter
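The correlation argument can be checked numerically with the delta-pulse model. This sketch uses made-up channel samples `G` (noise volts per UI of edge displacement, sampled at symbol centers) and a hypothetical 0.01-UI jitter level. It compares independent per-edge (cycle-to-cycle) jitter with the same displacement applied to every edge, which stands in for very low-frequency jitter. For equiprobable independent bits the two rms values are predicted in closed form: $\sigma\sqrt{2\sum_j g_j^2}$ for white jitter and $\sigma\sqrt{\sum_j (g_j - g_{j-1})^2}$ for a common shift.

```python
import numpy as np

# Hypothetical channel samples g[j] = T * h(jT + T/2): volts of noise per UI of
# edge displacement for an edge j symbols before the sampled one.
G = np.array([0.3, 0.5, 0.3, 0.15, 0.05])

def tx_jitter_noise(bits, eps, g=G):
    """Delta-pulse model: n_k = sum_j g[j] * (b[k-j-1] - b[k-j]) * eps[k-j].

    bits[m] in {-1, +1}; eps[m] is the timing error (UI) of the edge into symbol m.
    """
    d = np.zeros(len(bits))
    d[1:] = (bits[:-1] - bits[1:]) * eps[1:]   # no edge modeled into symbol 0
    return np.convolve(d, g)[: len(bits)]     # full convolution: sum_j g[j] d[k-j]

rng = np.random.default_rng(0)
n, sigma = 200_000, 0.01                  # hypothetical: 0.01 UI rms jitter
bits = rng.choice([-1.0, 1.0], size=n)

# Cycle-to-cycle (white) jitter: every edge moves independently.
white = tx_jitter_noise(bits, rng.normal(0.0, sigma, n))[10:]
# Stand-in for low-frequency jitter: every edge moves by the same amount.
shift = tx_jitter_noise(bits, np.full(n, sigma))[10:]

# Closed-form rms values for equiprobable, independent bits.
white_pred = sigma * np.sqrt(2 * np.sum(G**2))
shift_pred = sigma * np.sqrt(np.sum(np.diff(np.concatenate(([0.0], G, [0.0]))) ** 2))
print(f"white: {white.std():.4f} V (pred {white_pred:.4f}), "
      f"shift: {shift.std():.4f} V (pred {shift_pred:.4f})")
```

With a smooth (low-pass) channel, neighboring $g_j$ are similar, so the differences in the shifted case are small and the noise partially cancels, as the bullets above state.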

# **Voltage Noise From Jitter**

- White jitter
  - Noise from Tx jitter is much larger than from Rx jitter
  - From Rx jitter, the noise is white
  - From Tx jitter, the noise is filtered by the channel
- Y-axis is noise $\sigma$ (in volts)
  - If the noise were white, $\sigma$ = 10 mV corresponds to -40 dBV



- Bandwidth of the jitter is critical
  - It sets the magnitude of the noise created

## **Jitter Source From PLL Clocks**

#### Noise sources

- Reference clock phase noise
- VCO supply noise
- Clock buffer supply noise



Mansuri '02

#### Stationary phase-space model

### **Noise Transfer Functions**



- Low-pass from reference (input clock)
- Band-pass from VCO supply
- High-pass from clock buffer supply
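The three shapes follow from a textbook linear PLL model. This sketch uses a generic second-order type-II loop with natural frequency `fn_hz` and damping `zeta` (both assumed values; this is not the model from Mansuri '02). VCO supply noise is treated as frequency noise, with its transfer normalized by $\omega_n$ (our choice), and the clock-buffer path is modeled as phase noise injected after the VCO inside the feedback path, which is an assumption about where the buffer sits.

```python
import numpy as np

def pll_noise_tfs(f_hz, fn_hz=10e6, zeta=0.707):
    """Noise transfer magnitudes of a generic 2nd-order type-II PLL (illustrative).

    Returns (|reference -> output|, |VCO supply -> output| normalized by wn,
    |phase injected after the VCO -> output|) evaluated at f_hz.
    """
    s = 2j * np.pi * np.asarray(f_hz, dtype=float)
    wn = 2 * np.pi * fn_hz
    den = s**2 + 2 * zeta * wn * s + wn**2
    h_ref = (2 * zeta * wn * s + wn**2) / den   # low-pass
    h_vco = wn * s / den                         # band-pass (frequency noise -> phase)
    h_buf = s**2 / den                           # high-pass (1 - h_ref)
    return np.abs(h_ref), np.abs(h_vco), np.abs(h_buf)

f = np.array([1e3, 10e6, 1e10])   # well below, at, and well above the loop bandwidth
ref, vco, buf = pll_noise_tfs(f)
```

Well below the loop bandwidth the loop tracks the reference and suppresses the other two paths; well above it, only noise injected after the VCO passes through.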

#### **Jitter Spectrum**

- f < loop bandwidth: set by the reference clock
- f ≥ loop bandwidth: set by supply noise

Spectrum using power-supply noise with a 100-MHz bandwidth



# 2x Oversampled Bang-Bang CDR





#### Generate early/late from d<sub>n</sub>,d<sub>n-1</sub>,e<sub>n</sub>

Simple 1<sup>st</sup> order loop, cancels receiver setup time
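One common (Alexander-style) way to form the early/late decision is sketched below. It assumes the edge sample e<sub>n</sub> is taken at the boundary between d<sub>n-1</sub> and d<sub>n</sub>, and that a larger phase index means a later sampling instant; the function names are hypothetical.

```python
def bang_bang_pd(d_prev, e, d):
    """Early/late decision from data samples d_prev (d_{n-1}) and d (d_n)
    and the edge sample e (e_n) taken between them.

    Returns +1 if the clock is late, -1 if early, 0 if there is no transition.
    """
    if d_prev == d:
        return 0                  # no data transition: no timing information
    # Edge sample already shows the new bit -> we sampled after the crossing.
    return 1 if e == d else -1

def first_order_update(phase, pd_out, n_phases):
    """First-order loop: step the sampling phase (index into n_phases
    selector positions) one step against the detected error."""
    return (phase - pd_out) % n_phases

phase = first_order_update(5, bang_bang_pd(0, 1, 1), n_phases=64)  # late -> step to 4
```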

Now we need the jitter on the data clock, not on the PLL output

#### **Data Clk Noise**

- Model phase selector and PLL
  - Base linear PLL jitter
  - Add non-linear phase selector noise from CDR
- Model the CDR loop as a state machine
  - The current phase position is the state
  - State transitions are caused by early/late
  - Jitter on the input data and the PLL means
    - The clock can be late and still get an early PD result
    - Early/late decisions are often filtered to generate up/down

## **Transition Probabilities**



- Example system: CDR loop
  - Residual ISI at the edge: -30 dBV
  - Desired phaseState = 133

On average, the loop moves to the correct position

- But the probability of a wrong movement is not small
- Need to find the probability of being at each phase location

#### **Bang-Bang CDR Statistical Model**



- Need steady state probabilities of the states
  - Have the transition probabilities
- Iteratively apply the transition probabilities
  - The results converge to a steady state
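Iterating the transition probabilities is a power iteration on a Markov chain. The sketch below builds a hypothetical chain: state k is the phase-selector position, Gaussian timing noise of `sigma_steps` phase steps decides each early/late outcome, data transitions occur with probability 0.5, the ends of the phase range are clamped, and early/late filtering is omitted. All of these are assumptions, not the exact model behind the example system above.

```python
import math
import numpy as np

def cdr_transition_matrix(n_states, target, sigma_steps, p_transition=0.5):
    """Hypothetical bang-bang CDR phase chain (first-order loop, no filtering).

    The phase error is k - target steps; on a data transition a 'late'
    decision occurs with probability Phi((k - target) / sigma_steps).
    """
    P = np.zeros((n_states, n_states))
    for k in range(n_states):
        z = (k - target) / sigma_steps
        p_late = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
        p_down = p_transition * p_late if k > 0 else 0.0                  # late -> step earlier
        p_up = p_transition * (1.0 - p_late) if k < n_states - 1 else 0.0  # early -> step later
        if k > 0:
            P[k, k - 1] = p_down
        if k < n_states - 1:
            P[k, k + 1] = p_up
        P[k, k] = 1.0 - p_down - p_up
    return P

def steady_state(P, tol=1e-13, max_iter=100_000):
    """Repeatedly apply the transition probabilities: pi <- pi @ P."""
    pi = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(max_iter):
        nxt = pi @ P
        if np.max(np.abs(nxt - pi)) < tol:
            return nxt
        pi = nxt
    return pi

P = cdr_transition_matrix(n_states=64, target=32, sigma_steps=2.0)
phase_pmf = steady_state(P)   # steady-state phase (CDR jitter) distribution
```

Solving for the eigenvector of `P.T` with eigenvalue 1 gives the same distribution; the iteration simply mirrors the procedure on the slide.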

#### Bang-Bang CDR Model



Gives the probability distribution of the phase, which is the CDR jitter distribution

#### **Noise Summary**

- Many important sources of noise
  - ISI, crosstalk, quantization, estimation, etc.
- The largest noise comes from ISI
  - By a factor of 10
- Timing is noisy too
  - High frequency transmitter jitter is bad
  - CDR jitter needs to be considered
    - Especially if the data input is noisy
- How much noise can we eliminate?

# **Removing ISI**

#### Linear transmit equalizer



Transmit and Receive Equalization

- Changes signal to correct for ISI
- Often easier to work at the transmitter
  - DACs are easier than ADCs

# **Equalization Mechanisms**



#### Tx equalization

- Pre-filter the pulse with the inverse of the channel
- Attenuates low frequencies to match the attenuation of high frequencies
- Rx feedback equalization
  - Subtract the error from the signal
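A sketch of transmit pre-filtering: convolving a hypothetical 2-tap FIR with a made-up sampled pulse response shows the post-cursor ISI shrinking. The tap values (chosen so their magnitudes sum to 1, keeping the peak swing fixed) and the pulse samples are assumptions.

```python
import numpy as np

# Hypothetical symbol-spaced pulse response: main cursor, then two post-cursors.
pulse = np.array([0.5, 0.3, 0.1])
# Hypothetical 2-tap transmit FIR (de-emphasis); sum(|taps|) = 1 keeps peak swing fixed.
taps = np.array([0.75, -0.25])

eq_pulse = np.convolve(taps, pulse)   # Tx FIR followed by the channel

def isi_to_main(p):
    """Sum of |post-cursor| ISI relative to the main cursor (index 0 here)."""
    return np.sum(np.abs(p[1:])) / abs(p[0])

before, after = isi_to_main(pulse), isi_to_main(eq_pulse)
```

With these numbers the ISI-to-main ratio drops from 0.8 to about 0.33, while the main cursor itself shrinks from 0.5 to 0.375: the equalizer trades signal amplitude for lower ISI.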

### **Residual Error**

- Cannot correct all the ISI
  - Equalizers are finite length
  - EQ coefficients are quantized
  - Channel estimation error
- The error affects both voltage and timing

Need to find the distribution of this error

## **Generating ISI Distributions**
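One way to get the ISI distribution directly, rather than assuming a shape, is to convolve the per-cursor distributions. This sketch assumes independent, equiprobable binary symbols and uses made-up residual cursors in normalized units; the function names are ours. It also evaluates a Gaussian with the same variance for comparison.

```python
import math
from collections import defaultdict

def isi_pmf(isi_cursors, step=1e-6):
    """Exact PMF of sum_k b_k * c_k for independent, equiprobable b_k in {-1, +1}.

    Each cursor contributes a two-point distribution (+c or -c, probability 1/2);
    the total is their convolution. Values are kept as integer multiples of
    `step` so that equal sums merge exactly.
    """
    pmf = {0: 1.0}
    for c in isi_cursors:
        q = round(c / step)
        nxt = defaultdict(float)
        for k, p in pmf.items():
            nxt[k + q] += 0.5 * p
            nxt[k - q] += 0.5 * p
        pmf = nxt
    return {k * step: p for k, p in pmf.items()}

def tail_probability(pmf, threshold):
    """P(ISI <= threshold) from the exact PMF."""
    return sum(p for v, p in pmf.items() if v <= threshold)

def gaussian_tail(sigma, threshold):
    """P(X <= threshold) for X ~ N(0, sigma^2), the Gaussian approximation."""
    return 0.5 * math.erfc(-threshold / (sigma * math.sqrt(2)))

# Hypothetical residual-ISI cursors left after equalization.
CURSORS = [0.3, 0.15, 0.1, 0.05, -0.04, 0.02]
pmf = isi_pmf(CURSORS)
sigma = math.sqrt(sum(c * c for c in CURSORS))   # same variance as the exact PMF
bound = sum(abs(c) for c in CURSORS)             # |ISI| can never exceed this

print(tail_probability(pmf, -0.7), gaussian_tail(sigma, -0.7))
```

Here the exact probability of ISI below -0.7 is zero, because the ISI is bounded by 0.66, while the equal-variance Gaussian still places about 2.5% of its mass there. That bounded tail is why a Gaussian fit becomes pessimistic at low probabilities.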



# **Estimated Residual Error**

#### **5 Tap Transmitter Equalizer**



#### Edge sample distribution

**Data sample distribution** 

#### **Comparison w/ Gaussian Model**

#### **Cumulative ISI distribution**

#### Impact on CDR phase



The Gaussian model is only accurate down to about 10<sup>-3</sup> probability

It is far too pessimistic at much lower probabilities

### **Equalizer Related Error Sources**



Residual ISI is the biggest source of error

- Quantization error and equalizer estimation error
  - Are significant under reasonable accuracy assumptions

### **ISI and CDR Phase Distributions**



- In an ideal world, there would be only two dots
- This plot shows how these dots spread out
  - Vertical slice: ISI distribution for each time offset
  - Horizontal weight: CDR phase distribution

# **Tx Equalization**



Transmit equalization attenuates low frequencies

- Output swing is constrained (peak power constraint)
- Reduces ISI, but also decreases SNR (decreases signal)
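To see the low-frequency attenuation, this sketch evaluates the frequency response of a hypothetical 2-tap symbol-spaced FIR whose tap magnitudes sum to 1 (the peak-swing constraint). The tap values are assumptions.

```python
import numpy as np

def fir_gain(taps, f_over_baud):
    """|H(f)| of a symbol-spaced FIR, with f given as a fraction of the symbol rate."""
    k = np.arange(len(taps))
    return abs(np.sum(taps * np.exp(-2j * np.pi * f_over_baud * k)))

taps = np.array([0.75, -0.25])   # hypothetical taps, sum(|taps|) = 1 (peak constraint)
dc = fir_gain(taps, 0.0)         # 0.75 - 0.25 = 0.5
nyq = fir_gain(taps, 0.5)        # 0.75 + 0.25 = 1.0
print(f"low-frequency attenuation: {20 * np.log10(dc / nyq):.1f} dB")
# prints: low-frequency attenuation: -6.0 dB
```

The high-frequency content keeps the full swing, while low frequencies (long runs of identical bits) are cut in half, which is where the SNR loss comes from.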

### **Receive Equalization**

- Feedback equalization (DFE)
  - Subtracts error from input
  - No attenuation
- Problem with DFE
  - Need to know the values of the interfering bits
  - ISI must be causal
    - And latency in the decision circuit is a problem
    - Receiver latency + DAC settling < bit time
  - Can increase the allowable time by loop unrolling
    - Start deciding the next bit before the previous one is resolved
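A minimal sketch of ideal DFE operation, assuming a noise-free channel with a made-up main cursor and two post-cursors, and perfectly known tap values. Subtracting the ISI predicted from past decisions removes the errors a plain slicer makes, without attenuating the signal.

```python
import numpy as np

rng = np.random.default_rng(1)
bits = rng.choice([-1.0, 1.0], size=2000)

# Hypothetical channel: main cursor 1.0 and two causal post-cursors (no pre-cursor).
MAIN = 1.0
POST = np.array([0.6, 0.5])

# Noise-free received samples y_k = MAIN*b_k + POST[0]*b_{k-1} + POST[1]*b_{k-2}.
y = MAIN * bits
y[1:] += POST[0] * bits[:-1]
y[2:] += POST[1] * bits[:-2]

def slicer(v):
    return 1.0 if v > 0 else -1.0

def dfe(y, post_taps):
    """Decision-feedback equalizer: subtract the ISI predicted from past decisions."""
    d = np.zeros(len(y))
    for k in range(len(y)):
        isi = sum(post_taps[j] * d[k - 1 - j]
                  for j in range(len(post_taps)) if k - 1 - j >= 0)
        d[k] = slicer(y[k] - isi)
    return d

plain = np.array([slicer(v) for v in y])
errors_plain = int(np.sum(plain != bits))
errors_dfe = int(np.sum(dfe(y, POST) != bits))
```

The loop assumes each decision is available before the next sample arrives, which is exactly the latency constraint listed above.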

# **1 Bit Loop Unrolling**



Instead of subtracting the error

- Move the slicer level to include the ISI
- Slice at each possible level, since the previous value is unknown

### **Loop Unrolling Implementation**

#### Parhi '90 and Kasturia '91



- Offset slicer levels by ±α
- The previous symbol selects the correct value
- M<sup>L</sup> receivers for L taps and an M-level signal
- Each receiver has M-1 comparators
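For the binary, one-tap case (M = 2, L = 1) the unrolled receiver can be checked against a direct DFE. In this sketch the function names are hypothetical and α is the single DFE tap weight: both speculative slicers decide with thresholds +α and -α, and the previous decision only drives a selector instead of a subtraction.

```python
import numpy as np

def dfe_1tap(y, alpha):
    """Direct 1-tap DFE: subtract alpha * previous decision, then slice at 0."""
    d, prev = [], 1.0            # assume the symbol before the first sample was +1
    for yk in y:
        prev = 1.0 if yk - alpha * prev > 0 else -1.0
        d.append(prev)
    return np.array(d)

def dfe_1tap_unrolled(y, alpha):
    """Loop-unrolled 1-tap DFE for binary symbols: two slicers run in parallel."""
    y = np.asarray(y)
    if_prev_pos = np.where(y > alpha, 1.0, -1.0)    # slicer assuming previous = +1
    if_prev_neg = np.where(y > -alpha, 1.0, -1.0)   # slicer assuming previous = -1
    d, prev = [], 1.0
    for k in range(len(y)):
        prev = if_prev_pos[k] if prev > 0 else if_prev_neg[k]   # mux only
        d.append(prev)
    return np.array(d)

def unrolled_comparator_count(m_levels, taps):
    """M**L speculative receivers, each with M-1 comparators."""
    return m_levels ** taps * (m_levels - 1)
```

For example, a 4-PAM signal with one unrolled tap needs 4 receivers with 3 comparators each, 12 comparators in total.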

# **Putting It All Together**

- To compare different designs
  - Compare the voltage margin at a given BER
- Need to include all noise sources
  - Accurate ISI distribution
  - Transmit and receive jitter
  - CDR jitter
  - EQ quantization noise
  - Receiver offset

## **BER Contours**

#### 5 tap Tx Eq

#### 5 tap Tx Eq + 1 tap DFE



### Voltage margin

Minimum distance between the receiver threshold and the contour of a given BER

# **Pulse Amplitude Modulation**

### Binary (NRZ)

- 1 bit / symbol
- Symbol rate = bit rate

### 4-PAM

- 2 bits / symbol
- Symbol rate = bit rate/2
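A minimal sketch of the 2-bits-per-symbol mapping. The Gray-coded level assignment and the normalized levels (±1, ±1/3) are a common choice assumed here; the slides do not specify a mapping.

```python
# Hypothetical Gray-coded 4-PAM mapping: adjacent levels differ in one bit.
GRAY_4PAM = {(0, 0): -1.0, (0, 1): -1 / 3, (1, 1): 1 / 3, (1, 0): 1.0}

def pam4_modulate(bits):
    """Map a bit sequence (even length) to 4-PAM symbols, 2 bits per symbol."""
    if len(bits) % 2:
        raise ValueError("4-PAM needs an even number of bits")
    return [GRAY_4PAM[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]

symbols = pam4_modulate([0, 0, 1, 0, 1, 1])   # 6 bits -> 3 symbols
```

Half as many symbols carry the same bits, so the symbol rate is half the bit rate; the price is that adjacent levels are only one third as far apart as the binary levels.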





# When Does 4-PAM Make Sense?



First-order criterion: slope of S21

Zerbe et al. '03

- 3 eyes vs. 1 eye: about 10 dB (20 log<sub>10</sub> 3 ≈ 9.5 dB)
- Loss > 10 dB/octave: 4-PAM should be considered
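The 10 dB figure is the eye-height penalty of 4-PAM: with the same peak swing, each of its three eyes is one third of the binary eye. Because 4-PAM halves the symbol rate, its Nyquist frequency is one octave lower. The first-order check below compares the loss saved over that octave with the penalty; the function name and the loss values are hypothetical.

```python
import math

EYE_PENALTY_DB = 20 * math.log10(3)   # 4-PAM eye is 1/3 of binary for equal swing

def prefer_4pam(loss_db_at_binary_nyquist, loss_db_at_pam4_nyquist):
    """First-order check: does moving the Nyquist frequency down one octave save
    more channel loss than the ~9.5 dB eye-height penalty of 4-PAM?"""
    return (loss_db_at_binary_nyquist - loss_db_at_pam4_nyquist) > EYE_PENALTY_DB

# Hypothetical 10-Gb/s link: binary Nyquist 5 GHz, 4-PAM Nyquist 2.5 GHz.
steep_channel = prefer_4pam(24.0, 12.0)    # 12 dB/octave -> consider 4-PAM
gentle_channel = prefer_4pam(15.0, 10.0)   # 5 dB/octave -> stay binary
```

This ignores differences in residual ISI, crosstalk, and equalizer complexity, which is why it is only a first-order guide.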

## **BER Contours**

PAM2 DFE

PAM4 linear equalization



### Voltage Margins [mV] at BER=10<sup>-12</sup>

| Equalization / modulation vs. backplane length | 3" | 10" | 20" |
|------------------------------|----|-----|-----|
| 2PAM                         | 32 | 17  | 19  |
| 2PAM w. DFE                  | 79 | 49  | 44  |
| 4PAM                         | 10 | 37  | 31  |

- Longer backplane channels have more ISI
- PAM2 with DFE effectively combats ISI
- PAM4 makes better use of available bandwidth
  - Less ISI

## Conclusions

- Backplane links limited by the channel
- ISI is large
  - Can't completely compensate
    - (At least not with reasonable area/power)
  - Residual ISI also increases CDR jitter
- Links generally have very low BER requirements
  - Accurate noise statistics are important
  - Many of the large noise sources are bounded
- Power-constrained transmitter
  - 4-PAM and simple DFE are attractive solutions