# A PWM and PAM Signaling Hybrid Technology for Serial-Link Transceivers

Ching-Yuan Yang, Member, IEEE, and Yu Lee

Abstract—A 1-Gb/s 0.18-$\mu$m CMOS serial-link transceiver using multilevel pulse-width and pulse-amplitude modulation (PWAM) signaling and a pre-emphasis technique is presented. Based on the PWAM technique, the transmit signaling is implemented to effectively push high data rates through bandwidth-limited channels. The clock is implicitly embedded in the 4-bit data stream, and the associated overhead needed in the clock-and-data recovery circuitry can be mitigated. In addition, the pin count can be reduced by transferring the data channels and the clock channel over a single transmitted channel. The recovered clock has an rms jitter of 5.9 ps at 250 MHz, and the retimed data have an rms jitter of 13.7 ps at 250 MHz. The occupied die area is $1.65 \times 1.40$ mm<sup>2</sup>. The transmitter and receiver power consumption is 86 and 45 mW, respectively.

*Index Terms*—Chip-to-chip communication, clock recovery, intersymbol interference (ISI), pulse-amplitude modulation (PAM), pulse-width modulation (PWM), serial link.

#### I. INTRODUCTION

In chip-to-chip communication, the per-pin interconnection bandwidth must scale with the speed and integration level of the integrated circuits to maintain a high-speed, low-cost, and less complex system. To achieve a high data transfer rate, the internal bus bandwidth must be increased. Increasing the bus bandwidth, however, increases the pin count and enlarges the chip area. It also leads to complicated routing between different modules on the same printed circuit board (PCB). Hence, the concept of transferring multiple bits in each symbol through modulation techniques has been proposed to solve these problems.

Manuscript received March 6, 2007; revised June 13, 2007. This work was supported by the National Science Council, Taiwan, R.O.C., under Contract NSC93-2215-E-005-001.

C.-Y. Yang is with the Department of Electrical Engineering, National Chung Hsing University, Taichung 40254, Taiwan, R.O.C. (e-mail: ycy@dragon.nchu.edu.tw).

Y. Lee is with the SoC Technology Center, Industrial Technology Research Institute, Hsinchu 310, Taiwan, R.O.C.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIM.2007.915134

In the transmission of digital information over a communication channel, the modulator is the interface device that maps the digital information into analog waveforms. One common method is the pulse-amplitude modulation (PAM) technique, which uses multilevel amplitudes rather than binary signals to increase the data rate. For example, if every two consecutive bits in the sequence are grouped and converted to one of four levels, then each level is twice as long as a bit period, demanding only half the bandwidth required for transmission of the binary stream. As shown in [1]–[5], the data rate of transmitters based on PAM signaling has reached several gigabits per second.

The other concept of analog-digital merged data stream was addressed in [6] and [7] by using pulse-modulated signals, including pulse-width modulation (PWM), pulse-phase modulation (PPM), and pulse-density modulation (PDM). The pulse modulation techniques proved to be effective for the interface both on multichip modules and PCBs. The PPM was adopted in [8], and the bit rate of 160 Mb/s was achieved. In [9], a CMOS 400-Mb/s interface circuit using the PWM scheme was presented. The asynchronous compressed PWM, which originates from PWM, was discussed in [10] and [11]. Among them, the data and clock channels are combined in a single channel to reduce the pin count. The binary data are encoded into pulses with different widths while ensuring a periodic rising edge during each period. Thus, the clock signal could be easily recovered in the receiver using a simple phase-locked loop (PLL), as compared to more complicated techniques used for PAM signals.

In this paper, to achieve a high data transfer rate with a simplified scheme and pin-count reduction, the data and clock channels were merged into a single channel using both PWM and PAM (PWAM) schemes [12]. The binary data are encoded into pulses with different widths and amplitudes. This paper is organized as follows: A proposed 4-bit PWAM signaling concept is reviewed in Section II. Sections III and IV describe the implementation of the transmitter and receiver, respectively. Experimental results are presented in Section V, followed by a conclusion.

# II. CONCEPTS OF THE PWM AND PAM TRANSCEIVER

## A. Traditional Signaling in Links

A conventional two-level PAM (2-PAM) transmitter converts the input parallel data word to a serial output. In Fig. 1(a), 2-bit parallel input data are serialized to an output with a double rate, i.e., $1/T_b$. However, the high data rate in the channel can lead to problems with intersymbol interference (ISI). To overcome these problems, a multilevel transmitter can be employed. As an example, waveforms for a four-level or 4-PAM transmitter are shown in Fig. 1(b). The transmitted output data rate is the same as the parallel data rate. Since a PAM-encoded signal has no spectral line at $f_{\rm CK}$, a traditional PLL will not lock to the data to produce the clock signal. As the need for complicated clock-and-data recovery (CDR) or timing recovery circuits arises, PWM signaling is an effective way to represent information because of its digital magnitude and analog form [9]. The example shown in Fig. 1(c) demonstrates PWM-encoded signaling with a periodic rising edge. Since the PWM-encoded signal possesses a periodic rising edge, it simplifies the design of the CDR circuit in the receiver.

Fig. 1. Waveforms of a serialized two-bit transmitter. (a) Conventional 2-PAM. (b) 4-PAM. (c) 4-PWM.

#### B. PWAM Signaling Structure

Fig. 2(a) shows the proposed PWAM transceiver scheme. *Chip A* uses a 4-bit PWAM transmitter to transmit the merged data and clock across a channel, whereas *chip B* recovers the data and the clock. Fig. 2(b) shows a PWAM signaling scheme, which merges a four-level PWM (4-PWM) and a five-level PAM (5-PAM) format and transmits four bits of data with the system clock across a channel. The PWAM-encoded signals can not only achieve a high data rate due to the PAM format but can also reduce the pin count and allow the clock signal to be easily recovered due to the embedded PWM function.

The PWAM transmitter consists of a 2-bit PWM modulator and a 2-bit PAM modulator. The PWM-encoded signal has pulses with four different widths. The pulse width is quantized into four levels to represent Tx-bit0 and Tx-bit1. Then, the PAM modulator converts Tx-bit2 and Tx-bit3 into a PWAM-encoded signal [Fig. 2(c)]. The output is converted to 5-PAM signaling, which is quantized into four amplitude levels to represent Tx-bit2 and Tx-bit3 and one level for PWM-encoded return-to-initial (RI) discrimination. Consequently, the waveforms can be viewed as a 2-D modulation technique using pulse modulation on the x-axis and amplitude modulation on the y-axis.

Each PWAM symbol carries an RI level for PWM signaling. Since the RI level is placed in the middle of the PAM signal, as illustrated in Fig. 2(c), the amplitude levels of PWAM data eye are symmetric. Fig. 3 shows other PWAM formats, which are asymmetric. They will result in larger transition differences from the RI level going up or down, as compared with that in Fig. 2(c). Thus, the PWAM format of Fig. 2(c) benefits from smaller amplitude differences from the RI level and can provide sharper transitions and a better ISI performance.

# C. Signaling Expression

A 2-PAM nonreturn-to-zero (NRZ) signal can be formulated as

$$x_{2-\text{PAM}}(t) = \sum_{k=-\infty}^{\infty} a_k \cdot P_{i1}(t - kT_b)$$
(1)

where  $a_k \in \{-1/2, 1/2\}$ , and  $P_{i1}(t)$  is the unit pulse function that is defined as

$$P_{i1}(t) = \begin{cases} 1, & 0 \le t \le T_b \\ 0, & \text{otherwise.} \end{cases}$$
(2)

The average power spectrum density, i.e., the Fourier transform of the time-averaged autocorrelation function, can be calculated as [13]

$$S_{2-\text{PAM}}(f) = \frac{1}{4} T_b \left[ \text{sinc}(fT_b) \right]^2.$$
 (3)

Due to the zeros of the sinc function, the spectrum has nulls at integer multiples of the data rate $1/T_b$. This indicates that the synchronization mechanism in the receiver should be a nonlinear process because the received signal itself does not carry any information at the clock frequency. Similarly, 4-PAM and 4-PWM signals can be, respectively, expressed as

$$x_{4-\text{PAM}}(t) = \sum_{k=-\infty}^{\infty} b_k \cdot P_{i2}(t - 2kT_b)$$
$$P_{i2}(t) = \begin{cases} 1, & 0 \le t \le 2T_b \\ 0, & \text{otherwise} \end{cases}$$
$$b_k \in \left\{ -\frac{1}{2}, -\frac{1}{6}, \frac{1}{6}, \frac{1}{2} \right\}$$
(4)

and

$$x_{4-\text{PWM}}(t) = \sum_{k=-\infty}^{\infty} P_{i3}(c_k, t - 2kT_b)$$

$$P_{i3}(c_k, t) = \begin{cases} 1, & 0 \le t \le 2c_kT_b \\ 0, & \text{otherwise} \end{cases}$$

$$c_k \in \left\{ \frac{1}{5}, \frac{2}{5}, \frac{3}{5}, \frac{4}{5} \right\}.$$
(5)

Note that the duration of the unit pulse functions in (4) and (5) is  $2T_b$ , and the encoded signals are assumed to introduce a uniform quantization.
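As a minimal sketch of (1), (4), and (5), the following Python snippet maps a bit sequence to per-symbol parameters: an amplitude $a_k$ per bit for 2-PAM, an amplitude $b_k$ per bit pair for 4-PAM, and a pulse width per bit pair for 4-PWM. The level sets come from the equations above. The binary-weighted assignment of bit pairs to levels and all function names are assumptions made only for illustration.

```python
from fractions import Fraction

# Level sets taken from (1), (4), and (5).
A_LEVELS = [Fraction(-1, 2), Fraction(1, 2)]
B_LEVELS = [Fraction(-1, 2), Fraction(-1, 6), Fraction(1, 6), Fraction(1, 2)]
C_LEVELS = [Fraction(1, 5), Fraction(2, 5), Fraction(3, 5), Fraction(4, 5)]


def pair_index(bits):
    """Group bits in pairs and return one index 0..3 per pair.

    Assumption: the first bit of each pair is the more significant one.
    """
    if len(bits) % 2:
        raise ValueError("need an even number of bits")
    return [2 * bits[i] + bits[i + 1] for i in range(0, len(bits), 2)]


def encode_2pam(bits):
    """One amplitude a_k per bit, each held for T_b."""
    return [A_LEVELS[b] for b in bits]


def encode_4pam(bits):
    """One amplitude b_k per bit pair, each held for 2 T_b."""
    return [B_LEVELS[i] for i in pair_index(bits)]


def encode_4pwm(bits):
    """One pulse width per bit pair, in units of T_b (width = 2 c_k T_b)."""
    return [2 * C_LEVELS[i] for i in pair_index(bits)]


bits = [0, 0, 0, 1, 1, 0, 1, 1]
print("2-PAM amplitudes:", encode_2pam(bits))
print("4-PAM amplitudes:", encode_4pam(bits))
print("4-PWM widths / T_b:", encode_4pwm(bits))
```

Eight input bits yield eight 2-PAM symbols but only four 4-PAM or 4-PWM symbols, which is the halving of the symbol rate described above.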



Fig. 2. Proposed 4-bit PWAM serial-link transceiver. (a) Block diagram. (b) Waveforms. (c) PWAM-encoded format with detecting levels in a symbol time.



Fig. 3. Other different PWAM formats.



Fig. 4. Power spectrum density of the (a) 2-PAM (NRZ), (b) 4-PAM, (c) 4-PWM, and (d) proposed PWAM on a logarithmic axis for 1-Gb/s NRZ inputs.

The proposed 4-bit PWAM-encoded signal with a uniformly quantized level of amplitudes can be formulated as

$$x_{\text{PWAM}}(t) = \sum_{k=-\infty}^{\infty} d_k \cdot P_{i4}(e_k, t - 4kT_b)$$

$$P_{i4}(e_k, t) = \begin{cases} 1, & 0 \le t \le 4e_kT_b \\ 0, & \text{otherwise} \end{cases}$$

$$d_k \in \left\{ -\frac{1}{2}, -\frac{1}{4}, \frac{1}{4}, \frac{1}{2} \right\}$$
(6)

where the unit pulse function $P_{i4}$ has a duration of $4T_b$, and the RI level is represented as 0. If a uniform quantization is applied to the PWM format in the PWAM technique, then we have $e_k = c_k$, and the minimum pulse width is equal to $4T_b/5$. Because this pulse is narrower than in the conventional NRZ (2-PAM) case, the transmitted signal suffers more ISI and cannot be transmitted efficiently at high speed. In the receiver, the digital data are recovered from a waveform that has propagated over a bandwidth-limited channel, which reduces the amplitude and pulse width of the received signal. This ISI effect is an inevitable consequence of the low-pass characteristics of the channel. One intuitive way to reduce ISI is to enlarge the pulse width of the transmitted signal. For a reasonable pulse width in a PWM format, $e_k$ in (6) can be designed as

$$e_k \in \left\{\frac{2}{7}, \frac{3}{7}, \frac{4}{7}, \frac{5}{7}\right\}.$$
 (7)

In this case, the minimum pulse width $T_{p,\min}$ is equal to $8T_b/7$, which yields less ISI than the conventional 2-PAM and than the PWAM with $e_k = c_k$. Moreover, maximizing the minimum pulse-width difference $\Delta T_{p,\min}$ [shown in Fig. 2(c)] is another consideration for the eye opening. In (7), $\Delta T_{p,\min}$ is equal to $4T_b/7$. Other choices of $e_k$, such as $\{3/9, 4/9, 5/9, 6/9\}$, $\{4/11, 5/11, 6/11, 7/11\}$, etc., can produce a wider minimum pulse width than (7), but $\Delta T_{p,\min}$ becomes smaller, thereby reducing the eye opening. In practice, a reasonable $\Delta T_{p,\min}$ depends on the data rate and the transmission medium. Considering the tradeoff between $T_{p,\min}$ and $\Delta T_{p,\min}$, as well as the simplified circuitry, we utilize (7) in this paper.
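The tradeoff between $T_{p,\min}$ and $\Delta T_{p,\min}$ can be tabulated with a few lines of Python. This is only an arithmetic sketch of the candidate $e_k$ sets named above, with a symbol period of $4T_b$ as in (6). The function and variable names are hypothetical.

```python
from fractions import Fraction


def pulse_metrics(numerators, denominator, symbol_bits=4):
    """Return (T_p_min, dT_p_min) in units of T_b for e_k = n / denominator.

    The pulse width of each symbol is symbol_bits * e_k * T_b, as in (6).
    """
    widths = sorted(symbol_bits * Fraction(n, denominator) for n in numerators)
    t_min = widths[0]
    d_min = min(b - a for a, b in zip(widths, widths[1:]))
    return t_min, d_min


candidates = {
    "e_k = c_k (fifths)": ((1, 2, 3, 4), 5),
    "(7) (sevenths)": ((2, 3, 4, 5), 7),
    "ninths": ((3, 4, 5, 6), 9),
    "elevenths": ((4, 5, 6, 7), 11),
}

for name, (nums, den) in candidates.items():
    t_min, d_min = pulse_metrics(nums, den)
    print(f"{name}: T_p,min = {t_min} T_b, dT_p,min = {d_min} T_b")
```

Moving from fifths to elevenths, the minimum pulse width grows while the minimum width difference shrinks, which is the tradeoff that motivates choosing (7).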

Fig. 4 shows the power spectrum density on a log–log scale under the same transmitted data rate. The input rate is assumed to be 1 Gb/s for NRZ data [shown in Fig. 4(a)], and the data are also applied to the modulated techniques: the 4-PAM, the 4-PWM, and the PWAM [shown in Fig. 4(b)–(d)]. It is interesting to note that PWM and PWAM signals have a line component at the transmitted symbol rate due to the edge-periodic characteristics.
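The presence or absence of a line component can be checked numerically. The sketch below evaluates a single DFT bin at the bit rate for a random NRZ waveform and at the symbol rate for a random PWM waveform with duty cycles $2/7$ to $5/7$. It does not reproduce Fig. 4; the sampling grid, sequence lengths, random seed, and helper names are all assumptions chosen for illustration.

```python
import cmath
import random


def dft_bin(samples, cycles_per_sample):
    """Normalized magnitude of one DFT bin at the given frequency."""
    acc = sum(x * cmath.exp(-2j * cmath.pi * cycles_per_sample * n)
              for n, x in enumerate(samples))
    return abs(acc) / len(samples)


def nrz_waveform(bits, samples_per_bit):
    """2-PAM NRZ: each bit is held at -1/2 or +1/2 for one bit period."""
    out = []
    for b in bits:
        out.extend([b - 0.5] * samples_per_bit)
    return out


def pwm_waveform(symbols, samples_per_symbol=7):
    """PWM with duty (n + 2) / 7: high for n + 2 samples, then low."""
    out = []
    for n in symbols:
        high = n + 2
        out.extend([1.0] * high + [0.0] * (samples_per_symbol - high))
    return out


rng = random.Random(0)  # fixed seed so the run is repeatable
bits = [rng.randint(0, 1) for _ in range(2000)]
symbols = [rng.randint(0, 3) for _ in range(500)]

nrz_line = dft_bin(nrz_waveform(bits, 8), 1 / 8)   # bin at the bit rate
pwm_line = dft_bin(pwm_waveform(symbols), 1 / 7)   # bin at the symbol rate

print(f"NRZ component at the bit rate:    {nrz_line:.2e}")
print(f"PWM component at the symbol rate: {pwm_line:.2e}")
```

Each NRZ bit integrates to zero over one full cycle of the bit-rate tone, so its contribution vanishes regardless of the data. Every PWM symbol starts with a rising edge at the same phase, so the contributions add coherently and a spectral line remains.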

#### D. Summarized Features of the PWAM

In this paper, the binary data are encoded into pulses with different widths and amplitudes. The important features of the proposed PWAM scheme are summarized in the following.

Fig. 5. Transmitter block diagram.

- 1) With the characteristics of PWM, the necessary clock component is embedded in the encoded signals. Thus, in the receiver, a conventional PLL could easily extract the clock from the incoming data stream, and the multiphase outputs of the voltage-controlled oscillator (VCO) in the PLL could be used to demodulate the PWM-encoded signal.

- 2) For a given data rate, the multilevel PAM scheme reduces the symbol rate, as compared to a conventional 2-PAM system. The symbol rate reduction lowers not only the ISI in the channel but the maximum required on-chip clock frequency as well. By transmitting multiple bits in each symbol time, the required bandwidth of the channel for a given bit rate decreases, and the system channel efficiency increases.
- 3) For a 4-bit modulated technique, the proposed 4-bit PWAM, which merges a 2-bit PWM and a 2-bit PAM, is suited to reasonably high-speed applications. Its data rate is lower than that of 4-PAM or 8-PAM but higher than that of conventional PWM. Furthermore, a 4-bit PAM (16-PAM) technique is difficult to use in a high-speed link because of its worse ISI response in amplitude.
- 4) The pin count can be further reduced by transferring the 4-bit data channels and the clock channel over a single transmitted channel through modulation techniques, e.g., PAM and PWM.
- 5) A preemphasis scheme at the transmitter is used to compensate for the loss of the channel [1]–[5]. It illustrates the reduction in the received eye height due to the transmit preemphasis event for a channel.

Moreover, the chip cost and power consumption are also important considerations.

#### **III. TRANSMITTER DESIGN**

The PWAM transmitter is shown in detail in Fig. 5. It contains three main building blocks: a PWM modulator, a PAM modulator, and a preemphasis scheme. A delay-locked loop (DLL) provides evenly spaced clock phases that are used to produce PWM-encoded signals. After processing Tx-bit0 and Tx-bit1 using the PWM technique, PAM signaling is used to modulate the information from Tx-bit2 and Tx-bit3. In addition to the PWAM modulator, the transmitter incorporates a preemphasis block, generating preemphasis for both step-up and step-down code changes. Preemphasis compensates for the limited bandwidth of the package leads and channel medium [14].

Fig. 6. Timing diagram of the PWM modulator.

## A. PWM Modulator

The PWM modulator consists of a DLL and a phase controller. A DLL is a circuit that synchronizes the output clock to its input clock. It consists of a phase detector (PD), a charge pump, a low-pass filter (LPF), and a seven-stage voltage-controlled delay line (VCDL). The input clock Tx-CK is fed into both the PD and the VCDL, and the output clock is a delayed version of Tx-CK. The PD detects the phase error between its two inputs, which is used to control the charge-pump current that charges and discharges the LPF. The VCDL is controlled by the filtered control voltage, and the adjusted output clock is fed back to the PD. Through the feedback operation, the closed loop tends to insert a delay of one clock period between the two inputs for clock synchronization. The DLL generates seven clock phases, of which five are actually used to form the PWM signaling. By changing the phase-switching sequence, as illustrated in Fig. 6, the phase controller produces the PWM signal. The pulse width depends on the input data bits Tx-bit0 and Tx-bit1. The output (Tx-pwm) duty cycle is $(n+2)/7$, where $n = 0, \ldots, 3$. The phase controller, as shown in Fig. 7, provides the function of the phase selector and the phase-combined generator to produce the PWM signals.

Fig. 7. Phase controller merging the phase selector and the phase-combined generator.

Fig. 9. Timing diagram. (a) Normal operation ($\Delta t < t_1$). (b) and (c) Erroneous operations ($\Delta t > t_1$ and $\Delta t < 0$, respectively).
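The phase arithmetic of the PWM modulator can be sketched as follows. The sketch assumes seven DLL phases $\phi_0, \ldots, \phi_6$ spaced $T_{\rm CK}/7$ apart and a rising edge on $\phi_1$. The falling-edge phase is inferred from the stated duty cycle $(n+2)/7$ rather than read from Fig. 6, and the weighting $n = 2 \cdot \text{Tx-bit1} + \text{Tx-bit0}$ is an assumption, as are the function and constant names.

```python
from fractions import Fraction

NUM_PHASES = 7   # seven-stage VCDL, phases spaced T_CK / 7 apart
RISE_PHASE = 1   # assumed: rising edge of Tx-pwm taken from phi_1


def pwm_edges(tx_bit0, tx_bit1):
    """Return (rise_phase, fall_phase, duty) for one symbol.

    Assumption: n = 2 * Tx-bit1 + Tx-bit0; the bit weighting is illustrative.
    """
    n = 2 * tx_bit1 + tx_bit0
    fall_phase = RISE_PHASE + n + 2  # the pulse lasts n + 2 phase steps
    duty = Fraction(fall_phase - RISE_PHASE, NUM_PHASES)
    return RISE_PHASE, fall_phase, duty


used = {RISE_PHASE}
for bit1 in (0, 1):
    for bit0 in (0, 1):
        rise, fall, duty = pwm_edges(bit0, bit1)
        used.add(fall)
        print(f"Tx-bit1={bit1} Tx-bit0={bit0}: "
              f"phi_{rise} -> phi_{fall}, duty {duty}")

print("phases used:", sorted(used))
```

Under these assumptions exactly five of the seven phases are needed, one for the common rising edge and four for the data-dependent falling edges, which agrees with the statement that five phases form the PWM signaling.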

## B. PAM Modulator

The PAM signaling is used to modulate the information from Tx-bit2 and Tx-bit3. Fig. 8(a) shows the PAM scheme, which is composed of a 3-bit current-mode digital-to-analog converter (DAC) that generates five amplitude levels to represent the PAM signal. The output current of the 5-PAM signaling circuit can be given by

$$I_{\text{out}} = \begin{cases} 2I + \text{Tx-bit2} \times 3I + \text{Tx-bit3} \times I, & \text{if Tx-pwm} = 1\\ 4I, & \text{otherwise.} \end{cases}$$
(8)

The output has five levels to illustrate the PAM function. When the PWM signal is at a high level, the output is driven to 2I, 3I, 5I, and 6I by Tx-bit2 and Tx-bit3. When the PWM signal returns to zero, the output current becomes 4I, which means that the output signaling returns to a common-mode level. Note that the reference current I is generated by  $V_R/(4R)$ . The load element is terminated with a value of R in the channel. The PAM-encoding states are illustrated in Fig. 8(b). Since the value of  $\Delta V$  is equal to IR, i.e.,  $V_R/4$ , the RI level becomes  $V_R$ , which is the reference voltage, and is tunable.
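A short sketch of (8) makes the five output levels explicit. Currents are expressed in units of the reference current $I$. The numeric values $V_R = 1.4$ V and $R = 50\ \Omega$ are taken from the measurement setup in Section V and are used here only as an example; the function and constant names are hypothetical.

```python
def pam_output_current(tx_pwm, tx_bit2, tx_bit3):
    """Output current in units of the reference current I, following (8)."""
    if tx_pwm:
        return 2 + 3 * tx_bit2 + tx_bit3
    return 4  # return-to-initial (RI) level


V_R = 1.4       # volts, RI level used in the measurements (example value)
R_LOAD = 50.0   # ohms, termination used in the measurements (example value)
I_REF = V_R / (4 * R_LOAD)  # reference current I = V_R / (4R)

levels = sorted({pam_output_current(1, b2, b3)
                 for b2 in (0, 1) for b3 in (0, 1)})
ri_level = pam_output_current(0, 0, 0)

print("pulse levels (units of I):", levels)
print("RI level (units of I):", ri_level)
print(f"I = {I_REF * 1e3:.1f} mA")
for units in levels + [ri_level]:
    print(f"{units}I -> {units * I_REF * R_LOAD:.2f} V")
```

The four data levels sit symmetrically around the RI level of $4I$, one or two steps of $\Delta V = IR$ away on either side.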

The skew effect may result in the uncertainty of the arrival of the outgoing signals. A simplified timing diagram of the PAM scheme in Fig. 9 is outlined with its operating mechanism. It is useful to introduce the time interval  $\Delta t$ , which is defined as the time difference from the beginning of each bit in Tx-bit2 and Tx-bit3 to the positive edge of the Tx-PWM signal.



Fig. 10. Transmitter output waveforms. (a) Without preemphasis. (b) With preemphasis.

When $\Delta t$ is negative or larger than $t_1$, as shown in Fig. 9(b) and (c), the PAM scheme fails to produce the correct output signals. Note that $t_1$ is equal to $(2/7)T_{\rm CK}$, which is the shortest interval during which Tx-PWM stays at the low level. To avoid this problem, the trigger time of the system clock for the incoming data Tx-bit2 and Tx-bit3 should fall within $t_1$, i.e., $0 < \Delta t < t_1$, as illustrated in Fig. 9(a). One approach that addresses this problem can be seen in Fig. 6. The digital data are synchronized by the phase $\phi_0$, which is in phase with the system clock, whereas the rising edge of the PWM signal is triggered by the phase $\phi_1$. This guarantees that the time difference $\Delta t$ is equal to $T_{\rm CK}/7$ and smaller than $t_1$; therefore, the PAM scheme can correctly produce the PWAM-encoded signals.

#### C. Preemphasis Scheme

For the same timing margin, the transmitted signals need sharper transitions to make the system as tolerant to the sampling phase error as that in Fig. 2(c). In other words, signal transitions must become sharper to increase the eye width. The eye opening can be increased through ISI cancellation, signal power increase, and noise and jitter reduction. Another approach, called preemphasis, can also increase the eye width by increasing the slope of the signal transitions [3], [14]. Fig. 10 shows the transmit and receive waveforms with and without preemphasis. The preemphasized waveform closely tracks the original transmitted pattern. Preemphasis is performed using a current-mode charge pump that can entail an increase in the resolution of the DAC (PAM). Once a signal transition is detected, the charge pump creates a short pulse with a constant duration of $T_{\rm CK}/7$ and with an amplitude of $I_{\rm RP} + \overline{\text{Tx-bit2}} \oplus \overline{\text{Tx-bit3}} \cdot I_{\rm RP}$. The output pump current is programmed according to the edge direction and the value of the PWAM-encoded data transition, providing a preemphasized amplitude proportional to the change. With the preemphasis current added to compensate for attenuation over longer channels, the received signal has a more open eye.



Fig. 11. PAM demodulator. (a) Scheme. (b) Comparator with a buffer. (c) Timing diagram.

# IV. RECEIVER DESIGN

# A. PAM Demodulator

The receiver performs demultiplexing of the serial stream prior to digital processing. The received data are decoded using the threshold levels, i.e., voltages and phases. The PWAM receiver includes a PAM demodulator and a PWM demodulator to recover the data and the system clock. A block diagram of the PAM demodulator is shown in Fig. 11(a). The demodulator mainly comprises a 2-bit flash analog-to-digital converter (ADC). The ADC consists of four comparators and regenerative latches, followed by a thermometer-to-binary decoder. The comparator shown in Fig. 11(b) is similar to that of [15], which exhibits a self-biased characteristic with a wider input common-mode range and can achieve a low bit error rate (BER) at 1 Gb/s. Since the comparator operates under a 3-V supply voltage $V_{DDA}$, a buffer is employed to drive a 1.8-V output for digital signals. A resistor ladder is employed to generate the reference voltages VR0, VR1, VR2, and VR3 [also shown in Fig. 2(c)]. These voltages are equal to $2.5\Delta V$, $3.5\Delta V$, $4.5\Delta V$, and $5.5\Delta V$, respectively. The simplified timing of the PAM decoding mechanism is illustrated in Fig. 11(c). It is interesting to note that the PWM-encoded signal can be recovered by $B1 \oplus B2$, and its rising edge appears periodically. Hence, we can use the PWM signal to trigger the outputs of the comparators and regenerate the digital data. Additionally, a delay buffer can be employed to meet the timing requirement. Finally, a decision circuit, which is formed by D flip-flops, is driven by the system clock and retimes the decoded data. The clock is provided by the PWM demodulator, as discussed in Section IV-B.

Fig. 12. PWM demodulator. (a) Block diagram. (b) Timing. (c) VCO.
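The PAM decision logic described in Section IV-A can be sketched in Python. The thresholds are those stated for VR0 to VR3, in units of $\Delta V$. The sketch assumes that the comparator outputs are indexed $B0, \ldots, B3$ in threshold order; under that assumption, $B1 \oplus B2$ is high only at the RI level, so it reproduces the PWM waveform with inverted polarity while keeping its periodic edges. The bit mapping follows (8), and all names are hypothetical.

```python
THRESHOLDS = [2.5, 3.5, 4.5, 5.5]  # VR0..VR3 in units of Delta V


def flash_adc(level):
    """Thermometer code (b0, b1, b2, b3): b_i = 1 when level > VR_i."""
    return tuple(int(level > vr) for vr in THRESHOLDS)


def decode(level):
    """Return (at_ri, tx_bit2, tx_bit3) for one sampled level.

    Assumption: comparator outputs are indexed b0..b3 in threshold order,
    so b1 XOR b2 is 1 only at the RI level (4 Delta V). The two data bits
    are meaningful only when at_ri is 0.
    """
    b0, b1, b2, b3 = flash_adc(level)
    at_ri = b1 ^ b2
    tx_bit2 = b2                      # upper half of the amplitude range
    tx_bit3 = (b0 & (1 - b1)) | b3    # levels 3 Delta V and 6 Delta V
    return at_ri, tx_bit2, tx_bit3


for level in (2, 3, 4, 5, 6):
    print(f"{level} Delta V -> thermometer {flash_adc(level)}, "
          f"(at_ri, bit2, bit3) = {decode(level)}")
```

The decoded pairs invert the mapping of (8): levels $2\Delta V$, $3\Delta V$, $5\Delta V$, and $6\Delta V$ return (Tx-bit2, Tx-bit3) of 00, 01, 10, and 11, respectively.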

#### B. PWM Demodulator

Fig. 12(a) shows the PWM demodulator. The system clock can be recovered by a PLL from the PWM-encoded signals. The PLL here acts similarly to the transmitter's DLL. It produces 14 intermediate phases by alternating the polarity of the inputs for the differential-to-single-ended circuit following every fully differential VCO ring stage. The PLL synchronizes the VCO to the PWM-encoded signal by comparing their phases and controlling the VCO in a manner that tends to maintain an in-phase relationship between the two. Since the phase comparison is performed on every cycle, the VCO phase and frequency do not substantially drift. Thus, phases $Q_4$, $Q_6$, $Q_8$, and $Q_{10}$ correspond to the position of the PWM-encoded pulses. As illustrated in Fig. 12(b), the additional five phases, i.e., $Q_3$, $Q_5$, $Q_7$, $Q_9$, and $Q_{11}$, are used to detect the location of the falling edge of each PWM-encoded signal. Once the falling edges are recognized, the symbol can be decoded back to two bits. The PWM decoder is implemented using a digital circuit [9]. Fig. 12(c) shows a seven-stage fully differential ring oscillator for the VCO. The differential structure has superior supply and substrate noise immunity. A replica bias circuit adjusts the load of the delay cells over a wide range in response to a variable supply current [16]. It ensures that the output swing of the delay cells remains constant and takes a variable bias current to cover a suitable range of different output frequencies.

Fig. 13. Microphotograph of the PWAM transceiver.
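The falling-edge detection of the PWM demodulator can be illustrated with a small model. The sketch samples an ideal pulse at the phases $Q_3$, $Q_5$, $Q_7$, $Q_9$, and $Q_{11}$ of a 14-phase clock and counts how many of the middle three samples are high. It assumes that the recovered clock places the rising edge at $Q_0$ and that $n$ follows the transmitter duty cycle $(n+2)/7$. The function names are hypothetical, and the digital decoder of [9] is not reproduced.

```python
from fractions import Fraction

SAMPLE_PHASES = (3, 5, 7, 9, 11)  # Q3, Q5, Q7, Q9, Q11 out of 14 phases


def sample_pulse(duty):
    """Sample an ideal pulse of the given duty cycle at the five phases.

    Assumption: the pulse rises at Q0 and is high for duty * T_CK.
    """
    return tuple(int(Fraction(q, 14) < duty) for q in SAMPLE_PHASES)


def decode_pwm(samples):
    """Recover n (0..3) from the five samples, where duty = (n + 2) / 7."""
    q3, q5, q7, q9, q11 = samples
    if not q3 or q11:
        raise ValueError("pulse width outside the expected range")
    return q5 + q7 + q9


for n in range(4):
    duty = Fraction(n + 2, 7)
    samples = sample_pulse(duty)
    print(f"duty {duty}: samples {samples} -> n = {decode_pwm(samples)}")
```

Because each sampling phase sits midway between two nominal falling-edge positions, a falling edge may shift by up to $T_{\rm CK}/14$ in either direction before the decision changes.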

#### V. EXPERIMENTAL RESULTS

The proposed circuit was fabricated in a 0.18-µm N-well CMOS technology. Fig. 13 shows the microphotograph of the PWAM transceiver. The chip area is $1.65 \times 1.40$ mm<sup>2</sup>. This transceiver is fully integrated, with the exception of the PLL loop filter. Since the sampling clock used in the receiver is provided by the PLL, which is a closed loop, the clock is more immune to unwanted timing uncertainty, as well as to process and temperature variations. Owing to the characteristics of the PLL, the task of the phase-locked receiver is to adequately reproduce the original clock signal while removing as much of the noise as possible [17]. The external filter can be used to adjust the loop characteristics of the PLL, thereby providing a better dynamic performance. The supply voltage for the interface circuits of the PAM modulator and demodulator was 3 V, whereas the PWM circuits used digital levels from a 1.8-V supply. Given the ISI that the transmitted signal suffers over the channel, enlarging the transmitted amplitude yields a better eye diagram while keeping a reasonable output swing. Note that the device process could provide dual supply voltages, i.e., 1.8 and 3 V, but $f_T$ is degraded under the 3-V option.

An Anritsu pulse-pattern generator MP1763C generated a 1-Gb/s pseudorandom binary sequence (PRBS) of  $2^{11} - 1$  as the transmitted data. A serial-to-parallel converter built on chip converted every four adjacent bits of the data stream to form Tx-bit0, Tx-bit1, Tx-bit2, and Tx-bit3. The parallel data were synchronized with a 250-MHz system clock, i.e., Tx-CK. The waveforms and the associated quality were obtained using a Tektronix digital phosphor oscilloscope TDS7704B. The BER was measured with an Anritsu error detector MP1764C through an external parallel-to-serial converter.






Fig. 14. Measured transmitted PWAM waveforms for a long PCB trace. (a) Without preemphasis. (b) With preemphasis.

The PCB trace was a microstrip transmission line with a characteristic impedance of 50 $\Omega$. Subminiature version A (SMA) connectors were attached to both ends of the trace. The input signal was applied to the transmitter on the test board through an SMA connector and a coaxial cable. The transmit termination resistance *R* was set to 50 $\Omega$ by an external 100-$\Omega$ resistor in parallel with an on-chip 100-$\Omega$ resistor. The bias current was tuned to set the RI level to 1.4 V. The output voltage level of the PWAM transmitter varied from 0.7 to 2.1 V.

To verify the impact of preemphasis, it was enabled in the transmitter driver to compensate for rising- and falling-edge losses and to reduce ISI in the received signal after a long PCB trace of around 1 m. Fig. 14(a) shows the eye diagram when preemphasis is disabled. The advantage of preemphasis can be seen in Fig. 14(b).

The test channel between the transmitter and the receiver was a 30-cm trace on the PCB. Fig. 15 shows the waveforms and eye diagrams of the PWM and PAM signals for 1-Gb/s



Fig. 15. Measured PWM and PWAM signals under transmit emphasis at a 30-cm PCB trace.





Fig. 17. Recovered data performance. (a) Histogram. (b) Bathtub curve.

channel effects, and driving mismatch in the output buffer. As expected, the PWAM output exhibits four-level PWM and five-level PAM signaling.

The recovered clock and data are measured in the receiver, which contains PAM and PWM demodulators. The PAM demodulator reverts two transmitted bits and the PWAMencoded signals. Following the PAM demodulation, the PWM demodulator recovers the clock and the other two bits of data. Their waveforms are shown in Fig. 16(a). Fig. 16(b) shows the measured rms and peak-to-peak jitter of the recovered clock to be 5.9 and 39 ps, respectively. The measured jitter performance of the demodulated data is shown in Fig. 17(a), i.e., rms jitter of 13.7 ps  $(3.425 \times 10^{-3} \text{ UI})$  and peak-to-peak jitter of 107 ps  $(26.75 \times 10^{-3} \text{ UI})$ . The jitter distribution can be used to find its impact on the BER. Fig. 17(b) shows the associated bathtub curve. The eye diagram opening at the BER =  $10^{-12}$  is about 0.88 UI. In addition, the 4-bit demodulated data were multiplexed into a test data through a parallel-to-serial converter. Using this output, the BER of the system could be measured. With a random sequence of  $2^{11} - 1$ , the BER was smaller than  $10^{-12}$ . Table I summarizes the performance of the proposed PWAM transceiver with several comparable PAM [2], [3] and PWM [9] over the past few years.

# VI. CONCLUSION

In this paper, a serial link for chip-to-chip communications fabricated in 0.18- $\mu$ m standard CMOS technology is presented. This serial link utilizes PWAM signaling, combining the advantages of PWM and PAM technologies, to improve the quality of communication. By transmitting the PAM-encoded signal and incorporating multilevels, the total bandwidth per channel is increased. Additionally, due to the presence of periodic rising edges in the PWM-encoded signal, the system

Fig. 16. Measured recovered waveforms in the receiver. (a) Recovered clock and data. (b) Jitter of the recovered clock.

PRBS transmission. The measured duty cycles of PWM signals are 27.1%, 41.0%, 54.8%, and 68.1%, respectively. The results somewhat deviate from the ideal value due to process variation,

| Ref                     | [2]          | [3]                            | [9]                             | This work                       |
|-------------------------|--------------|--------------------------------|---------------------------------|---------------------------------|
| Technology              | 0.35-µm CMOS | 0.5-µm CMOS                    | 0.25-µm CMOS                    | 0.18-µm CMOS                    |
| Supply voltage          | N/A          | 3.3V                           | 2.5V                            | 1.8V/3 V                        |
| Data rate / channel     | 1.6 Gb/s     | 1.3 Gb/s                       | 400 Mb/s                        | 1 Gb/s                          |
| Modulated technique     | 4-PAM        | 8-PAM                          | 4-PWM                           | PWAM                            |
| Embedded data / channel | 2-bit data   | 3-bit data                     | 2-bit data + clock              | 4-bit data + clock              |
| Transmit pre-emphasis   | yes          | yes                            | no                              | yes                             |
| Transmit medium         | PCB trace    | 15-cm PCB trace                | 50-cm cable                     | 30-cm PCB trace                 |
| Chip size               | N/A          | 2 mm <sup>2</sup>              | Tx: 0.823×0.482 mm <sup>2</sup> | 1.65×1.40 mm <sup>2</sup>       |
|                         |              |                                | Rx: 0.678×0.338 mm <sup>2</sup> |                                 |
|                         |              |                                | Tx/Rx PLL:                      |                                 |
|                         |              |                                | 0.432×0.290 mm <sup>2</sup>     |                                 |
| Power dissipation       | N/A          | 400 mW                         | Tx: 15.31 mW                    | Tx:                             |
|                         |              |                                | Rx: 14.63 mW                    | 86 mW (with pre-emphasis)       |
|                         |              |                                | Tx/Rx PLL: 18.27 mW             | 67 mW (without pre-emphasis)    |
|                         |              |                                |                                 | Rx: 45 mW                       |
| Jitter performance      | N/A          | Rx synthesizer                 | Tx output:                      | Recovered clock:                |
|                         |              | clock*:                        | $\Delta T_{\rm rms}$ : 26.67 ps | $\Delta T_{\rm rms}$ : 5.9 ps   |
|                         |              | $\Delta T_{\rm rms}$ : 3.1 ps  | $\Delta T_{\rm pk-pk}$ : 180 ps | $\Delta T_{\rm pk-pk}$ : 39 ps  |
|                         |              | $\Delta T_{\rm pk-pk}$ : 20 ps | Recovered clock:                | Recovered data:                 |
|                         |              | * External reference           | $\Delta T_{\rm rms}$ : 22.84 ps | $\Delta T_{\rm rms}$ : 13.7 ps  |
|                         |              | clock source                   | $\Delta T_{\rm pk-pk}$ : 156 ps | $\Delta T_{\rm pk-pk}$ : 107 ps |
|                         |              | Recovered data: N/A            | Recovered data: N/A             |                                 |

TABLE I PWAM TRANSCEIVER PERFORMANCE SUMMARY AND COMPARISON

clock can be embedded in the data stream, and the associated overhead needed for the CDR can be mitigated. In the PWAM transceiver, the symbol rate is 250 MS/s; because each symbol carries 4 bits of data, the equivalent data rate is 1 Gb/s. The transceiver is suitable for chip-to-chip interconnection, where designers of high-throughput chips are restricted by the limited number of package pins and PCB traces.
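The throughput figure above follows from simple arithmetic, which the minimal sketch below reproduces. The symbol rate, bits per symbol, and power values are those quoted in this paper. The helper names `data_rate_bps` and `energy_per_bit_pj` are illustrative assumptions, and the energy-per-bit figure is a derived quantity that the paper does not report directly.

```python
# Sketch: throughput and energy-per-bit arithmetic for the PWAM link.
# Input numbers are quoted from this paper; helper names are hypothetical.

def data_rate_bps(symbol_rate_sps: float, bits_per_symbol: int) -> float:
    """Equivalent bit rate of a multilevel link."""
    return symbol_rate_sps * bits_per_symbol


def energy_per_bit_pj(power_mw: float, rate_bps: float) -> float:
    """Energy per transferred bit in pJ (1 mW at 1 b/s equals 1e9 pJ/bit)."""
    return power_mw * 1e9 / rate_bps


SYMBOL_RATE = 250e6   # 250 MS/s
BITS_PER_SYMBOL = 4   # 4-bit data per PWAM symbol
TX_POWER_MW = 86.0    # transmitter, with pre-emphasis
RX_POWER_MW = 45.0    # receiver

rate = data_rate_bps(SYMBOL_RATE, BITS_PER_SYMBOL)
total_pj = energy_per_bit_pj(TX_POWER_MW + RX_POWER_MW, rate)

print(f"data rate: {rate / 1e9:.2f} Gb/s")
print(f"energy per bit (Tx + Rx): {total_pj:.0f} pJ/bit")
```

Under these numbers, the combined transmitter and receiver power of 131 mW at 1 Gb/s corresponds to roughly 131 pJ per transferred bit.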

#### ACKNOWLEDGMENT

The authors would like to thank the Chip Implementation Center and the Taiwan Semiconductor Manufacturing Company, Taiwan, R.O.C., for the fabrication of the chip.

#### REFERENCES

- [1] F.-R. Ramin, C.-K. Yang, M. A. Horowitz, and T. H. Lee, "A 0.3-μm CMOS 8-Gb/s 4-PAM serial link transceiver," *IEEE J. Solid-State Circuits*, vol. 35, no. 5, pp. 757–764, May 2000.
- [2] J. L. Zerbe, P. S. Chau, C. W. Werner, T. P. Thrush, H. J. Liaw, B. W. Garlepp, and K. S. Donnelly, "1.6 Gb/s/pin 4-PAM signaling and circuits for a multidrop bus," *IEEE J. Solid-State Circuits*, vol. 36, no. 5, pp. 752–760, May 2001.
- [3] D. J. Foley and M. P. Flynn, "A low-power 8-PAM serial transceiver in 0.5-µm digital CMOS," *IEEE J. Solid-State Circuits*, vol. 37, no. 3, pp. 310–316, Mar. 2002.
- [4] K. Farzan and D. A. Johns, "A CMOS 10-Gb/s power efficient 4-PAM transmitter," *IEEE J. Solid-State Circuits*, vol. 39, no. 3, pp. 529–532, Mar. 2004.
- [5] V. Stojanovic, A. Ho, B. W. Garlepp, F. Chen, J. Wei, G. Tsang, E. Alon, R. Kollipara, C. Werner, J. Zerbe, and M. Horowitz, "Autonomous dual-mode (PAM2/4) serial link transceiver with adaptive equalization and data recovery," *IEEE J. Solid-State Circuits*, vol. 40, no. 4, pp. 1012–1026, Apr. 2005.

- [6] A. Iwata and M. Nagata, "A concept of analog–digital merged circuit architecture for future VLSIs," *IEICE Trans. Fundam.*, vol. E79-A, no. 2, pp. 145–157, Feb. 1996.
- [7] A. Iwata, T. Morie, and M. Nagata, "Merged analog-digital circuits using pulse modulation for intelligent SoC applications," *IEICE Trans. Fundam.*, vol. E84-A, no. 2, pp. 486–496, Feb. 2001.
- [8] K. Nogami and A. E. Gamal, "A CMOS 160-Mb/s phase modulation I/O interface circuit," in *ISSCC Dig. Tech. Papers*, Feb. 1994, pp. 108–109.
- [9] W.-H. Chen, G.-K. Dehng, J.-W. Chen, and S.-I. Liu, "A CMOS 400-Mb/s serial link for AS-memory systems using a PWM scheme," *IEEE J. Solid-State Circuits*, vol. 36, no. 10, pp. 1498–1505, Oct. 2001.
- [10] T. Yamauchi, Y. Morooka, and H. Ozaki, "A low power and high speed data transfer scheme with asynchronous compressed pulse width modulation for AS-memory," *IEEE J. Solid-State Circuits*, vol. 31, no. 4, pp. 523–530, Apr. 1996.
- [11] M. Nagata, J. Funakoshi, and A. Iwata, "A PWM signal processing core circuit based on a switched current integration technique," *IEEE J. Solid-State Circuits*, vol. 33, no. 1, pp. 53–60, Jan. 1998.
- [12] C.-Y. Yang and Y. Lee, "A 0.18-μm CMOS 1-Gb/s serial link transceiver by using PWM and PAM techniques," in *Proc. IEEE Int. Symp. Circuits Syst.*, May 2005, pp. 1150–1153.
- [13] B. P. Lathi, *Modern Digital and Analog Communication Systems*, 2nd ed. London, U.K.: Oxford Univ. Press, 1995.
- [14] J. Everitt, J. F. Parker, P. Hurst, and K. R. Konda, "A CMOS transceiver for 10-Mb/s and 100-Mb/s Ethernet," *IEEE J. Solid-State Circuits*, vol. 33, no. 12, pp. 2177–2196, Dec. 1998.
- [15] M. Bazes, "Two novel fully complementary self-biased CMOS differential amplifiers," *IEEE J. Solid-State Circuits*, vol. 26, no. 2, pp. 165–168, Feb. 1991.
- [16] J. G. Maneatis, "Low-jitter process-independent DLL and PLL based on self-biased techniques," *IEEE J. Solid-State Circuits*, vol. 31, no. 11, pp. 1723–1732, Nov. 1996.
- [17] F. M. Gardner, *Phaselock Techniques*, 3rd ed. Hoboken, NJ: Wiley, 2005.



**Ching-Yuan Yang** (S'97–M'01) was born in Miaoli, Taiwan, R.O.C., in 1967. He received the B.S. degree in electrical engineering from the Tatung Institute of Technology, Taipei, Taiwan, in 1990 and the M.S. and Ph.D. degrees in electrical engineering from the National Taiwan University, Taipei, in 1996 and 2000, respectively.

During 2000–2002, he was on the faculty of Huafan University, Taipei. Since 2002, he has been on the faculty of National Chung Hsing University, Taichung, Taiwan, where he is currently an Associate Professor with the Department of Electrical Engineering. His research interests are in the area of mixed-signal integrated circuits and systems for wireline and wireless communications.



**Yu Lee** was born in Pingtung, Taiwan, R.O.C., on October 9, 1980. He received the B.S. degree in electronic engineering from Lunghwa University of Science and Technology, Taoyuan, Taiwan, in 2003 and the M.S. degree in electrical engineering from the National Chung Hsing University, Taichung, Taiwan, in 2005.

In 2005, he joined the SoC Technology Center, Industrial Technology Research Institute, Hsinchu, Taiwan. His research interests include high-speed mixed-signal circuit design and analog chip testing, particularly built-in self-test and design-for-test techniques.