Author

Ahmed Ragab

Bio: Ahmed Ragab is an academic researcher from Nvidia. The author has contributed to research in topics: Low-power electronics & Clock rate. The author has an h-index of 2, co-authored 2 publications receiving 25 citations.

Papers
Journal ArticleDOI
TL;DR: A half-duplex serial link design capable of 22 Gbps operation over PCB channels with up to 20 dB of loss is presented; its current-mode transmitter can be configured as either a pre-cursor or post-cursor 2-tap FIR filter.
Abstract: A half-duplex serial link design that is capable of 22 Gbps operation over PCB channels with up to 20 dB of loss is presented. A current-mode transmitter can be configured either as a pre-cursor or post-cursor 2-tap FIR filter. The receiver consists of a trans-admittance-trans-impedance single-stage linear equalizer that can provide 10 dB of high-frequency gain without the use of peaking inductors. The CTLE is followed by a half-rate 2-tap decision feedback equalizer with direct feedback. To mitigate long-tail intersymbol interference in a power-efficient manner, a third DFE tap employs a single-pole IIR filter. A 15-22 GHz LC-PLL provides quadrature clocks to a 16-lane macro. The 16-lane macro occupies 1.66 mm × 1.6 mm in a 28 nm CMOS process and is packaged in a 45 mm × 45 mm flip-chip MCM module. The link operates from two power supplies at 1.35 V and 0.9 V with a BER and a power efficiency of 6.5 mW/Gbps at 20 Gbps.

16 citations

Proceedings ArticleDOI
06 Mar 2014
TL;DR: This work targets reliable, differential, bi-directional links at 20 Gb/s over 6” FR4 PCB trace and flip-chip packages with a total loss budget of 20 dB at Nyquist.
Abstract: As the processing power and clock rate of CPUs and GPUs increase, there is a need for increased I/O bandwidth to enable chip-to-chip communication. I/O pin limitations demand faster links at low power to enable integration of high chip-to-chip bandwidth. However, the channel losses and impedance discontinuities increase at high data rates, making it difficult to equalize the channel at low power. In this work, we target reliable, differential, bi-directional links at 20 Gb/s over 6” FR4 PCB trace and flip-chip packages with a total loss budget of 20 dB at Nyquist. In a half-duplex link, one TX and RX are connected on each side and the link direction can be turned around by the controller. A link-turnaround latency of <10 ns is achieved by placing several key circuits on standby when not in use and by designing fast bias circuits. When fast turnaround is not required, the circuits not in use are powered down permanently and the link is reduced to the simplex case. The top-level transceiver architecture is shown. An LC-VCO-based PLL oscillates at 20 GHz and generates quadrature I/Q clocks at 10 GHz. Both TX and RX use a half-rate architecture to optimize power. The clocks are distributed through an on-chip transmission line to 16 I/O lanes arranged in 2 rows. The links are capable of data rates as low as 14 Gb/s to save power when full bandwidth is not required.

11 citations


Cited by
Journal ArticleDOI
TL;DR: A 28-Gb/s receiver IC with self-contained adaptive equalization and sampling point control using an on-chip stochastic sigma-tracking eye-opening monitor (SSEOM) that accurately detects the bit-error-rate (BER)-related eye contour efficiently without the use of an external microcontroller.
Abstract: This paper describes a 28-Gb/s receiver IC with self-contained adaptive equalization and sampling point control using an on-chip stochastic sigma-tracking eye-opening monitor (SSEOM). The proposed SSEOM accurately detects the bit-error-rate (BER)-related eye contour efficiently without the use of an external microcontroller. The SSEOM determines the BER-optimal sampling point and equalizer coefficients on the basis of pattern-filtered eye diagrams. It also features a background adaptation scheme for robust long-term operation by tracking temperature variations and device aging. The proposed SSEOM is integrated in a 28-Gb/s receiver that is designed to compensate for channel loss up to 25 dB at the Nyquist rate by using a continuous time linear equalizer (CTLE) and a one-tap decision feedback equalizer (DFE) together with a one-tap pre-emphasis at a transmitter. The time required for complete adaptation and the total power consumption are 364 ms and 43.9 mW, respectively. The proposed 28-Gb/s receiver is fabricated in 40 nm CMOS.

44 citations

Journal ArticleDOI
TL;DR: A novel 4:1 multiplexer (MUX) is used as the final stage of the serializer to reduce power and a novel LC-based FFE structure is proposed to improve the bandwidth of the delay line and the output combiner.
Abstract: This paper presents a complete 50–64 Gb/s serializing transmitter including a 4-tap equalizer. An LC-based FFE structure is proposed. The FFE improves the bandwidth of the delay line and the output combiner by applying the design methodology of LC-ladder filters. Proper arrangement of the output combiner reduces the required number of inductors and hence reduces the area. In addition, a novel 4:1 multiplexer (MUX) is used as the final stage of the serializer to reduce power. Designed and fabricated in 65 nm CMOS technology, the transmitter achieves a maximum data rate of 64.5 Gb/s with an energy efficiency of 3.1 pJ/bit.

44 citations

Journal ArticleDOI
TL;DR: This paper presents a quad-lane serial transceiver that supports virtually all data center communication standards around 8.5-13 Gbps, implemented in 28 nm CMOS technology and represents the lowest reported power in its class to date.
Abstract: This paper presents a quad-lane serial transceiver that supports virtually all data center communication standards around 8.5–13 Gbps, implemented in 28 nm CMOS technology. The transmitter consists of a 20:2 mux followed by a half-rate source-series terminated (SST) driver embedded with a 4-tap FFE and an analog equalizer. The receiver has an adaptive CTLE, a 5-tap DFE, and a fully digital CDR followed by a 2:20 demux. At 13 Gbps, the transceiver can equalize 35 dB Nyquist loss at a BER of 10^-12. At 1.0 V supply, the transceiver consumes 49 mW/lane at 13 Gbps with full equalization capability. An LC VCO-based fractional PLL provides the clocking to quad TX/RX lanes using a low-power inductively tuned clock routing channel. The transceiver architecture not only enables baud-rate operation from 8.5 to 13 Gbps but also supports a wide range of oversampled subrates. This work represents the lowest reported power in its class to date, and the transceiver is suitable for many applications due to its comprehensive flexibility and power efficiency.

32 citations

Proceedings ArticleDOI
19 Mar 2015
TL;DR: The signal conditioner demonstrates a BER <10^-12 with PRBS31 at 100G-KR4 in a 40 dB chip-to-chip backplane with two connectors by using a 36-tap DFE to cancel reflections, and operates across a wide range of data rates from 0.3 to 28.05 Gb/s.
Abstract: As processing and network speeds are accelerated to support data-rich services, the bandwidth of backplane interconnects needs to be increased while maintaining the channel length and multi-rate links. However, channel losses and impedance discontinuities increase at high data rates, making it difficult to compensate the channel. In this work, we target serial links from auto-negotiation in 100G-KR4 at 0.3 Gb/s to 32GFC at 28.05 Gb/s in a 40 dB backplane architecture [1-3]. To meet this challenge, we use two key techniques. First, we introduce a 36-tap decision-feedback equalizer (DFE) to cancel reflections due to connectors, because these reflections close the eye. To operate the 36-tap DFE, we need to fix the CDR lock-point and calculate the 36 tap coefficients accurately. Thus, we develop a pattern-captured CDR with a 4b pattern filter to fix the lock-point, and a 3b pattern-matched adaptive equalizer (AEQ) to optimize the 36 tap coefficients. These techniques enable our chip to compensate 40 dB of channel loss. Second, we target 100G-KR4/40G-KR4/10G-KR/25G-KR and 32GFC/16GFC/8GFC/4GFC. To operate across a wide range of data rates, from 0.3 to 28.05 Gb/s, with low jitter, we develop a PLL architecture with two LC-VCOs and one ring VCO, with a data-rate-adjustment technique that controls an LDO. Our test chip is fabricated in 28 nm CMOS. Our signal conditioner demonstrates a BER <10^-12 with PRBS31 at 100G-KR4 in a 40 dB chip-to-chip backplane with two connectors by using the 36-tap DFE to cancel reflections, and operates across a wide range of data rates from 0.3 to 28.05 Gb/s.

22 citations

Proceedings ArticleDOI
01 Feb 2018
TL;DR: It is shown that Si-IF-based packageless processors outperform their packaged counterparts by up to 58% (16% average), 136% (103% average), and 295% (80% average) due to increased memory bandwidth, increased allowable TDP, and reduced area, respectively.
Abstract: Demand for increasing performance is far outpacing the capability of traditional methods for performance scaling. Disruptive solutions are needed to advance beyond incremental improvements. Traditionally, processors reside inside packages to enable PCB-based integration. We argue that packages reduce the potential memory bandwidth of a processor by at least one order of magnitude, allowable thermal design power (TDP) by up to 70%, and area efficiency by a factor of 5 to 18. Further, silicon chips have scaled well while packages have not. We propose packageless processors: processors where packages have been removed and dies are directly mounted on a silicon board using a novel integration technology, Silicon Interconnection Fabric (Si-IF). We show that Si-IF-based packageless processors outperform their packaged counterparts by up to 58% (16% average), 136% (103% average), and 295% (80% average) due to increased memory bandwidth, increased allowable TDP, and reduced area, respectively. We also extend the concept of packageless processing to the entire processor and memory system, where the area footprint reduction was up to 76%.

20 citations