Author

E-Hung Chen

Bio: E-Hung Chen is an academic researcher from MediaTek. The author has contributed to research in topics: CMOS & SerDes. The author has an h-index of 11 and has co-authored 24 publications receiving 460 citations. Previous affiliations of E-Hung Chen include Rambus & University of California, Los Angeles.

Papers
Journal ArticleDOI
TL;DR: A SerDes operating at 40 Gb/s optimized for chip-to-chip communication is presented and equalization consists of 2-tap feed-forward equalizers in both transmitter and receiver.
Abstract: A 40 Gb/s serial link interface is presented that includes four transceiver lanes optimized for chip-to-chip communication while compensating for 20 dB of channel loss. Transmit equalization consists of a 2-tap feed-forward equalizer (FFE) while receive equalization includes a 2-tap FFE using a transversal filter, a 3-stage continuous-time linear equalizer with active feedback, and discrete-time equalizers consisting of a 17-tap decision feedback equalizer (DFE) and a 3-tap sampled FFE. The receiver uses quarter-rate double integrate-and-hold sampling. The clock and data recovery (CDR) unit uses a split-path CDR/DFE design which facilitates wider bandwidth and lower jitter simultaneously. A phase detection scheme that filters out edges affected by residual inter-symbol interference allows recovering a low-jitter clock from a partially-equalized eye. A fractional-N PLL is implemented for frequency offset tracking. Combining these techniques, the digital CDR recovers a stable 10 GHz clock from an eye containing 0.8 UI p-p input jitter and achieves 1-10 MHz of tracking bandwidth. The transceiver achieves horizontal and vertical eye openings of 0.27 UI and 120 mV, respectively, at BER = $10^{-9}$. The quad SerDes is realized in 28 nm CMOS technology. Amortizing common blocks, it occupies 0.81 mm$^{2}$ per lane and achieves 23.2 mW/Gb/s power efficiency at 40 Gb/s.

101 citations
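
The headline figures in the abstract above are given in relative units (UI, mW/Gb/s). As a rough illustration, the sketch below converts them to absolute units. The function name and dictionary keys are hypothetical, and the calculation assumes that 40 Gb/s and 23.2 mW/Gb/s are per-lane figures, which the quarter-rate sampling and the 10 GHz recovered clock suggest but the abstract does not state outright.

```python
def link_figures(bit_rate_gbps, eye_width_ui, efficiency_mw_per_gbps, clock_division=4):
    """Convert relative serial-link figures into absolute units.

    clock_division=4 models quarter-rate sampling (an assumption taken
    from the abstract's description of the receiver).
    """
    ui_ps = 1000.0 / bit_rate_gbps  # one unit interval in picoseconds
    return {
        "ui_ps": ui_ps,
        "eye_width_ps": eye_width_ui * ui_ps,
        # mW per Gb/s is numerically the same as pJ per bit
        "lane_power_mw": efficiency_mw_per_gbps * bit_rate_gbps,
        "sampling_clock_ghz": bit_rate_gbps / clock_division,
    }


# Values quoted in the abstract: 40 Gb/s, 0.27 UI eye width, 23.2 mW/Gb/s
figures = link_figures(40.0, 0.27, 23.2)
for name, value in figures.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions one UI is 25 ps, so the 0.27 UI horizontal opening is under 7 ps, and the quarter-rate sampling clock works out to the 10 GHz quoted in the abstract.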

Journal ArticleDOI
TL;DR: An ADC-based receiver that uses a low-gain analog and mixed-mode pre-equalizer in conjunction with non-uniform reference levels for the ADC, which compensates for both the frontend non-ideality and the channel response while maintaining low ADC resolution and hence enables low power consumption is presented.
Abstract: Implementing serial I/O receivers based on analog-to-digital converters (ADCs) and digital signal post-processing has drawn growing interest with technology scaling, but power consumption remains among the key issues for such digital receivers in high-speed applications. This paper presents an ADC-based receiver that uses a low-gain analog and mixed-mode pre-equalizer in conjunction with non-uniform reference levels for the ADC. The combination compensates for both the frontend non-ideality and the channel response while maintaining low ADC resolution and hence enables low power consumption. The receiver is fabricated in a 65 nm CMOS technology with 10 Gb/s data rate, and has 13 pJ/bit and 10.6 pJ/bit power efficiency for a 29 dB and a 23 dB loss channel, respectively.

63 citations

Journal ArticleDOI
TL;DR: A new adaptation strategy of I/O link equalizers is presented based on minimizing the bit error rate (BER) as the objective function to maximize the receiver voltage margin and requires almost no additional hardware compared to SS-LMS adaptation.
Abstract: A new adaptation strategy of I/O link equalizers is presented based on minimizing the bit error rate (BER) as the objective function to maximize the receiver voltage margin. The adaptation strategy is verified in a 90-nm test chip on both the transmitter finite-impulse response filter (Tx-FIR) and the receiver decision-feedback equalizer (Rx-DFE). The performance is compared with the commonly used sign-sign least mean square (SS-LMS) adaptation and demonstrates significant improvements especially in the case of the Tx-FIR. This paper also demonstrates that in a highly attenuating system that contains both a Tx-FIR and Rx-DFE, using a Tx-FIR subject to peak output power constraint to compensate pre-cursor ISI is worse than solely using an Rx-DFE. The adaptation strategy is further applied to adapt the sampling phase of the clock-and-data recovery loop (CDR). The technique enables near-optimal BER performance by substantially reducing the pre-cursor ISI and requires almost no additional hardware compared to SS-LMS adaptation.

52 citations
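
For context on the baseline that the abstract above compares against, here is a minimal sketch of textbook sign-sign LMS (SS-LMS) adapting a single DFE tap. It is not the paper's BER-based method. The toy two-tap channel, the step size, the noise level, the seed, and the assumption that the target signal level is known and equal to the main cursor are all illustrative choices, not values from the paper.

```python
import random


def sign(x):
    """Return +1 for non-negative input and -1 otherwise."""
    return 1 if x >= 0 else -1


def adapt_dfe_tap(num_symbols=20000, h0=1.0, h1=0.4, mu=0.001,
                  noise_sigma=0.02, seed=1):
    """Adapt one DFE tap with sign-sign LMS on a toy channel.

    h0 is the main cursor and h1 the first post-cursor of a hypothetical
    channel; all default values are illustrative assumptions.
    """
    rng = random.Random(seed)
    w = 0.0            # DFE tap weight being adapted
    prev_bit = 1       # transmitted symbol d[n-1]
    prev_decision = 1  # receiver decision for d[n-1]
    for _ in range(num_symbols):
        bit = rng.choice((-1, 1))
        y = h0 * bit + h1 * prev_bit + rng.gauss(0.0, noise_sigma)
        z = y - w * prev_decision      # subtract the estimated post-cursor ISI
        decision = sign(z)
        error = z - h0 * decision      # deviation from the assumed target level
        # Sign-sign update: only the signs of the error and of the
        # previous decision are used, so no multiplier is needed.
        w += mu * sign(error) * prev_decision
        prev_bit, prev_decision = bit, decision
    return w


tap = adapt_dfe_tap()
print(f"adapted tap: {tap:.3f} (channel post-cursor: 0.400)")
```

In this toy setup the tap drifts toward the channel's post-cursor value and then dithers around it by an amount set by the step size, which is the residual-error behaviour that motivates looking at other objective functions.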

Proceedings ArticleDOI
01 Feb 2020
TL;DR: Channel reflection and cross-talk are excessive at 100Gb/s, which puts a ceiling on attainable BER, and considering practical equalization capabilities of a long-reach system (>30dB), 10dB package loss significantly limits the available channel reach.
Abstract: Explosive growth in mega-scale data centers drives switch chips to transition from 12.8Tb/s to 51.2Tb/s throughput. A 51.2Tb/s switch requires 512 lanes operating at 106Gb/s PAM-4. Such a massive integration of electrical SERDES is restrained by three factors: First, a large switch die size (>25×25 mm$^{2}$) substantially lowers yield and prohibitively increases cost. Second, a large-size package suffers more than 10dB insertion loss from combined TX and RX traces. Considering practical equalization capabilities of a long-reach system (>30dB), 10dB package loss significantly limits the available channel reach. Lastly, channel reflection and cross-talk are excessive at 100Gb/s, which puts a ceiling on attainable BER.

49 citations
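
The lane count and line rate quoted in the abstract above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the function and key names are hypothetical, and the gap between the per-lane payload and the 106 Gb/s line rate is simply reported as overhead, since the abstract does not say what it consists of.

```python
def lane_budget(throughput_tbps, lanes, line_rate_gbps, bits_per_symbol=2):
    """Split an aggregate switch throughput into per-lane figures.

    bits_per_symbol=2 corresponds to PAM-4 signalling.
    """
    payload_gbps = throughput_tbps * 1000.0 / lanes
    symbol_rate_gbd = line_rate_gbps / bits_per_symbol
    return {
        "payload_gbps": payload_gbps,
        "overhead_pct": (line_rate_gbps / payload_gbps - 1.0) * 100.0,
        "symbol_rate_gbd": symbol_rate_gbd,
        "nyquist_ghz": symbol_rate_gbd / 2.0,
    }


# Values quoted in the abstract: 51.2 Tb/s, 512 lanes, 106 Gb/s PAM-4
budget = lane_budget(51.2, 512, 106.0)
for name, value in budget.items():
    print(f"{name}: {value:.2f}")
```

With these inputs each lane carries about 100 Gb/s of payload at a symbol rate of 53 GBd, so channel loss is typically evaluated around 26.5 GHz.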

Journal ArticleDOI
TL;DR: This work proposes a forward FIR equalizer and a decision-feedback equalizer (DFE) that compensate for ISI at both data and edge samples, adapted with a modified LMS algorithm.
Abstract: Limited channel bandwidth introduces inter-symbol interference (ISI) at both data and edge samples. In addition to the ISI at data samples, ISI at the edge samples (edge ISI) increases the bit error rate (BER) by degrading the eye diagram and increasing the jitter of the clock and data recovery (CDR). This work proposes a forward FIR equalizer and a decision-feedback equalizer (DFE) that compensate for both data and edge samples. To adapt both the data and edge equalizers, a modified LMS adaptation algorithm is introduced to achieve convergence. A transmitter and receiver are implemented in 0.13 μm and 0.18 μm technologies, respectively. The edge ISI is improved by 20% and the jitter is improved by 10% in measurement. The link operates over a 120'' FR4 channel with 24 dB attenuation at Nyquist frequency, and the BER is below $10^{-14}$ at 3.6 Gb/s.

47 citations


Cited by
Journal ArticleDOI
TL;DR: A 56-Gb/s PAM4 wireline transceiver testchip is implemented in 16-nm FinFET, and the ADC-based receiver incorporates hybrid analog and digital equalizations.
Abstract: A 56-Gb/s PAM4 wireline transceiver testchip is implemented in 16-nm FinFET. The current mode logic transmitter incorporates an auxiliary current injection at the output nodes to maintain PAM4 amplitude linearity. The ADC-based receiver incorporates hybrid analog and digital equalizations. The analog equalization is performed using two identical stages of continuous time linear equalizer, each having a constant ~0-dB dc gain and a maximum peaking of ~7 dB at 14 GHz. A 28-GSample/s 32-way time-interleaved SAR ADC converts the equalized analog signal into the digital domain for further equalization using digital signal processing. The transceiver achieves <1e-8 bit error rate over a backplane channel with 31-dB loss at 14 GHz and 3.5-mVrms additional crosstalk, using a fixed ~10-dB TX equalization and an adaptive hybrid RX equalization, with the DSP configured to have a 24-tap feed forward equalizer and a 1-tap decision feedback equalizer. The transceiver consumes 550-mW power at 56 Gb/s, excluding the power of the on-chip configurable DSP that cannot be accurately measured as it is implemented as part of a larger test structure.

95 citations

Journal ArticleDOI
TL;DR: In this article, the authors presented a new power-proportional sensing paradigm and the use of machine-learning-assisted moderate-precision analog analytics for classification of speech and non-speech.
Abstract: This work presents a sub-6 μW acoustic frontend for speech/non-speech classification in a voice activity detection (VAD) in 90 nm CMOS. Power consumption of the VAD system is minimized by architectural design around a new power-proportional sensing paradigm and the use of machine-learning-assisted moderate-precision analog analytics for classification. Power-proportional sensing allows for hierarchical and context-aware scaling of the frontend’s power consumption depending on the complexity of the ongoing information extraction, while the use of analog analytics brings increased power efficiency through switching on/off the computation of individual features depending on the features’ usefulness in a particular context. The proposed VAD system reduces the power consumption by 10× as compared to state-of-the-art (SotA) systems and yet achieves an 89% average hit rate (HR) for a 12 dB signal-to-acoustic-noise ratio (SANR) in babble context, which is at par with software-based VAD systems.

92 citations

Journal ArticleDOI
TL;DR: A novel all-digital CDR scheme in 90 nm CMOS with generalized sampling and retiming architecture is used in an efficient sharing technique that reduces the number of clocks required, saving power and area in high-density interconnect.
Abstract: This paper presents a novel all-digital CDR scheme in 90 nm CMOS. Two independently adjustable clock phases are generated from a delay line calibrated to 2 UI. One clock phase is placed in the middle of the eye to recover the data (“data clock”) and the other is swept across the delay line (“search clock”). As the search clock is swept, its samples are compared against the data samples to generate eye information. This information is used to determine the best phase for data recovery. After placing the search clock at this phase, search and data functions are traded between clocks and eye monitoring repeats. By trading functions, infinite delay range is realized using only a calibrated delay line, instead of a PLL or DLL. Since each clock generates its own alignment information, mismatches in clock distribution can be tolerated. The scheme's generalized sampling and retiming architecture is used in an efficient sharing technique that reduces the number of clocks required, saving power and area in high-density interconnect. The shared CDR is implemented using static CMOS logic in a 90 nm bulk process, occupying 0.15 mm2. It operates from 6 to 9 Gb/s, and consumes 2.5 mW/Gb/s of power at 6 Gb/s and 3.8 mW/Gb/s at 9 Gb/s.

80 citations
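
The eye-monitoring step described in the abstract above (sweep the search clock, compare its samples with the data samples, then pick the best phase) can be illustrated with a short sketch. The abstract does not say how the "best phase" is chosen, so the rule used here, the midpoint of the widest run of phases with no mismatches, is an assumption, as are the function name and the example mismatch counts.

```python
def best_sampling_phase(mismatch_counts):
    """Return the index at the centre of the widest error-free phase window.

    mismatch_counts[i] is the number of search-clock samples at delay-line
    tap i that disagreed with the data-clock samples (hypothetical input).
    """
    best_start, best_len = None, 0
    run_start = None
    # A trailing non-zero sentinel closes a run that reaches the last tap.
    for i, count in enumerate(list(mismatch_counts) + [1]):
        if count == 0:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            run_len = i - run_start
            if run_len > best_len:
                best_start, best_len = run_start, run_len
            run_start = None
    if best_start is None:
        raise ValueError("no error-free phase found")
    return best_start + (best_len - 1) // 2


# Hypothetical sweep across a delay line: two eyes, the first one wider
counts = [9, 4, 0, 0, 0, 0, 0, 3, 8, 5, 0, 0, 2]
print(best_sampling_phase(counts))
```

Choosing the middle of the widest clean window maximizes timing margin on both sides under this simple model; a real implementation would also have to handle noise in the mismatch counts.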

Journal ArticleDOI
TL;DR: This paper introduces a fully-integrated wireline transceiver operating at 40 Gb/s that incorporates a 5-tap finite-impulse response (FIR) filter with LC-based delay lines precisely adjusted by a closed-loop delay controller.
Abstract: This paper introduces a fully-integrated wireline transceiver operating at 40 Gb/s. The transmitter incorporates a 5-tap finite-impulse response (FIR) filter with LC-based delay lines precisely adjusted by a closed-loop delay controller. The receiver employs a similar 3-tap FIR filter as an equalizer front-end with digital adaptation, and a sub-rate clock and data recovery circuit using majority voting phase detection. The transceiver delivers 40-Gb/s $2^{7}-1$ PRBS data across a Rogers channel of 20 cm (19-dB loss at 20 GHz) with BER < $10^{-12}$ while consuming a total power of 655 mW.

80 citations