Journal Article

A 7 Gb/s Embedded Clock Transceiver for Energy Proportional Links

TL;DR: A rapid-on/off transceiver for embedded clock architecture enables energy proportional communication over a serial link; it demonstrates power scalability across a wide range of link utilization and therefore helps improve overall system efficiency.
Abstract: A rapid-on/off transceiver for embedded clock architecture that enables energy proportional communication over the serial link is presented. In an energy proportional link, the energy consumed by the serial link is proportional to the amount of data communicated. Energy proportionality can be achieved by scaling the serial link power linearly with the link utilization, and fine-grained rapid power state transition (rapid-on/off) is one technique that can achieve this objective. In this paper, architecture and circuit techniques to achieve rapid-on/off in the PLL, transmitter, and receiver are discussed. A background phase calibration technique in the PLL and CDR phase calibration logic in the receiver enable instantaneous lock on power-on. The proposed transceiver demonstrates power scalability with a wide range of link utilization and, therefore, helps in improving overall system efficiency. Fabricated in 65 nm CMOS technology, the 7 Gb/s transceiver achieves power-on-lock in less than 20 ns. The proposed PLL achieves power-on-lock in 1 ns. The transceiver achieves power scaling by 44× (63.7 mW to 1.43 mW) and energy efficiency degradation by only 2.2× (9.1 pJ/bit to 20.5 pJ/bit) when the effective data rate (link utilization) changes by 100× (7 Gb/s to 70 Mb/s). The proposed transceiver occupies an active die area of 0.39 mm².
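
The energy-efficiency figures quoted in the abstract follow from dividing power by effective data rate. A minimal Python sketch of that arithmetic, using only the numbers reported above (the function and variable names are illustrative, not taken from the paper):

```python
def energy_per_bit_pj(power_mw: float, data_rate_gbps: float) -> float:
    """Energy per bit in pJ/bit: mW divided by Gb/s gives pJ/bit."""
    return power_mw / data_rate_gbps


# Reported operating points: full utilization and 1% utilization.
full_rate_gbps, full_power_mw = 7.0, 63.7
low_rate_gbps, low_power_mw = 0.07, 1.43

full_epb = energy_per_bit_pj(full_power_mw, full_rate_gbps)
low_epb = energy_per_bit_pj(low_power_mw, low_rate_gbps)

rate_scaling = full_rate_gbps / low_rate_gbps      # about 100x
power_scaling = full_power_mw / low_power_mw       # about 44x
efficiency_degradation = low_epb / full_epb        # about 2.2x

print(f"energy/bit at 7 Gb/s:  {full_epb:.1f} pJ/bit")
print(f"energy/bit at 70 Mb/s: {low_epb:.1f} pJ/bit")
print(f"power scaling: {power_scaling:.1f}x for {rate_scaling:.0f}x rate change")
print(f"efficiency degradation: {efficiency_degradation:.2f}x")
```

A perfectly energy proportional link would keep energy per bit constant, so the degradation factor would be 1×. The low-utilization value computes to about 20.4 pJ/bit rather than the quoted 20.5 pJ/bit; the small gap is presumably due to rounding in the reported power.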
Citations
Journal Article
TL;DR: An all-digital phase-locked loop (ADPLL) for Bluetooth low energy (BLE) that eliminates the need for a crystal oscillator (XO) other than a 32.768-kHz real-time clock (RTC) already present in wireless systems is introduced.
Abstract: In this paper, we introduce an all-digital phase-locked loop (ADPLL) for Bluetooth low energy (BLE) that eliminates the need for a crystal oscillator (XO) other than a 32.768-kHz real-time clock (RTC) already present in wireless systems. Specifically, we propose to replace the conventional channel settling with a band settling that would be carried out only once per global device power up. The ADPLL locks to the center of the Bluetooth band (2440 MHz) upon system power-up and jointly performs an instantaneous channel hopping and Gaussian frequency shift keying (GFSK) modulation in a two-point manner to overcome the narrow PLL bandwidth (BW) due to the 32.768-kHz reference. Extensive calibrations linearize the effective cubic digitally controlled oscillator (DCO) transfer function to achieve a precise frequency range of hopping and modulation. Realized in 16-nm FinFET, it consumes <1 mW at ≤0.45 V, while achieving best-in-class performance and <100-ns hopping time.

27 citations


Cites background or methods from "A 7 Gb/s Embedded Clock Transceiver..."

  • ...The high reference frequency (fR) ensures a sufficiently wide bandwidth (BW) of tens to hundreds of kilohertz of the surrounding phase-locked loop (PLL) to: 1) quickly acquire a new channel, preferably with loop presetting techniques [19], [28], [29] and 2) suppress lower frequency phase noise (PN) of the RF oscillator....

    [...]

  • ...Having adopted the ADPLL architecture for its potential to perform the ultra-fast [28], [29] or even instantaneous and...

    [...]

  • ...channel frequency at wakeups [19], [28], [29]....

    [...]

Journal Article
TL;DR: A rapid ON/OFF LC-based fractional-N injection-locked clock multiplier (ILCM) is presented, which employs a high-resolution digital-to-time converter to align the injected pulses to the oscillator’s zero crossings, illustrating almost instantaneous settling.
Abstract: A rapid ON/OFF LC-based fractional-N injection-locked clock multiplier (ILCM) is presented. The proposed architecture extends the merits of ILCMs to fractional-N operation. It employs a high-resolution digital-to-time converter to align the injected pulses to the oscillator’s zero crossings. An all-digital frequency-tracking loop continuously tunes the oscillator free-running frequency toward the target output frequency. The proposed clock multiplier can be powered ON from a completely OFF state almost instantaneously. Background calibration techniques ensure robust operation across process, voltage, and temperature. Fabricated in a 65-nm CMOS process with an active area of 0.27 mm², the prototype ILCM generates an output clock in the range of 6.75–8.25 GHz using a 115-MHz reference clock. It achieves integrated jitter performance of 109 fs rms (integer-N) and 177 fs rms (fractional-N), while consuming only 2.65 mW (integer-N) and 3.25 mW (fractional-N). This translates to the best-reported FoMJ of −255 dB (integer-N) and −250 dB (fractional-N). The turn-on time is less than 4 ns in both the integer- and fractional-N modes, illustrating almost instantaneous settling.
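
The quoted FoMJ values are consistent with the commonly used jitter figure of merit, FoMJ = 10·log10[(σ / 1 s)² · (P / 1 mW)]. The sketch below assumes that definition (the abstract does not state which one was used) and plugs in the reported jitter and power; the function name is illustrative.

```python
import math


def jitter_fom_db(jitter_rms_s: float, power_mw: float) -> float:
    """Jitter FoM in dB, assuming 10*log10((sigma/1 s)^2 * (P/1 mW))."""
    return 10.0 * math.log10(jitter_rms_s ** 2 * power_mw)


# Reported operating points from the abstract above.
fom_integer = jitter_fom_db(109e-15, 2.65)     # integer-N mode
fom_fractional = jitter_fom_db(177e-15, 3.25)  # fractional-N mode

print(f"integer-N FoMJ:    {fom_integer:.1f} dB")
print(f"fractional-N FoMJ: {fom_fractional:.1f} dB")
```

Under this assumed definition, both results round to the −255 dB and −250 dB figures given in the abstract.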

21 citations


Cites background or methods from "A 7 Gb/s Embedded Clock Transceiver..."

  • ...[17] showed that output phase trajectory of an oscillator can be made to take a deterministic path....

    [...]

  • ...This rapid ON/OFF capability is beneficial in implementing energy proportional links [17], [18]....

    [...]

  • ...Compared to [17] and [29], the proposed rapid power-ON locking technique controls only the start pulse of the DCO, so it has no impact on ILCM loop dynamics and phase noise performance....

    [...]

Journal Article
TL;DR: Ways to improve the startup time of XOs are presented; using a two-step injection technique in a three-step process, the proposed technique reduces the XO startup time to within 1.5× the theoretical minimum.
Abstract: Fast startup crystal oscillators (XOs) are needed in heavily duty-cycled communication systems for implementing aggressive dynamic power management schemes. This article presents the ways to improve startup time of XOs. Using a two-step injection technique in a three-step process, the proposed technique reduces the XO startup time to within 1.5× the theoretical minimum. By solving the differential equation governing crystal resonator under injection for arbitrary injection frequency, the behavior of energy build-up inside a crystal resonator is analyzed and used to determine optimum injection time as a function of the desired XO steady-state amplitude and injection frequency error. Bounds on tolerable injection frequency error to guarantee the existence of optimal timing are provided. Fabricated in a 65-nm CMOS process, the proposed 54-MHz fast startup XO occupies an active area of 0.075 mm² and achieves a startup time of less than 20 μs across a temperature range of −40 °C to 85 °C while consuming a startup energy of 34.9 nJ and operating from a 1.0-V supply.

13 citations


Cites background from "A 7 Gb/s Embedded Clock Transceiver..."

  • ...Because conventional DPLLs suffer from long locking time, techniques to reduce it are needed [23]–[26]....

    [...]

Journal Article
TL;DR: A baud-rate rapid ON/OFF (ROO) DFE receiver that can turn on in just 10 ns (~120 UI); its baud-rate CDR is implemented using a new timing function that is amenable to operation with a loop un-rolled decision feedback equalizer (DFE).
Abstract: Rapid ON/OFF (ROO) operation helps scale power in accordance with link utilization. In this article, we present a baud-rate ROO receiver that can turn on in just 10 ns (~120 UI). Baud-rate clock and data recovery (CDR) is implemented using a new timing function that is amenable to operation with a loop un-rolled decision feedback equalizer (DFE). The receiver is turned on rapidly by sweeping the recovered clock phase across the received data bit by offsetting the digitally controlled oscillator (DCO) frequency at each power-ON event. This first ROO DFE receiver also includes a continuous-time linear equalizer (CTLE) and three-tap DFE to compensate up to 20-dB channel loss at Nyquist. Fabricated in a 65-nm CMOS process, the prototype receiver recovers 12 Gb/s with BER 30-MHz JTOL corner, 377 fs rms recovered clock jitter, and 3.8-pJ/bit energy efficiency.
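
The ~120 UI figure is the turn-on time expressed in unit intervals at the 12 Gb/s data rate. A small Python sketch of that conversion, which also derives the power implied by the quoted energy efficiency (an inferred value, not one stated in the abstract; all names are illustrative):

```python
def time_to_ui(time_s: float, data_rate_bps: float) -> float:
    """Convert a duration to unit intervals (one UI = one bit period)."""
    return time_s * data_rate_bps


def implied_power_mw_from(energy_pj_per_bit: float, data_rate_gbps: float) -> float:
    """pJ/bit multiplied by Gb/s gives mW."""
    return energy_pj_per_bit * data_rate_gbps


turn_on_ui = time_to_ui(10e-9, 12e9)                # 10 ns at 12 Gb/s
implied_power_mw = implied_power_mw_from(3.8, 12.0)  # inferred, not reported

print(f"turn-on time: {turn_on_ui:.0f} UI")
print(f"implied receiver power: {implied_power_mw:.1f} mW")
```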

8 citations


Cites background or methods from "A 7 Gb/s Embedded Clock Transceiver..."

  • ...circuits have been adequately addressed [1], many bottle-...

    [...]

  • ...tecture [1] and is controlled by four different paths: wakeup,...

    [...]

  • ...temperature compensation schemes [1], [25]....

    [...]

  • ...Other approaches that rely on a fixed delay relationship between the wakeup detector and the initial oscillator phase [1] [see Fig....

    [...]

  • ...Rapid ON/OFF (ROO) transceivers offer an attractive means to reduce power consumption in data center network switches [1], mobile interfaces, and reconfigurable optical networks [2], [3]....

    [...]

References
Journal Article
Luiz Andre Barroso, Urs Hölzle
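TL;DR: Energy-proportional designs would enable large energy savings in servers, potentially doubling their efficiency in real-life use, but achieving this requires significant improvements in the energy usage profile of every system component, particularly the memory and disk subsystems.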
Abstract: Energy-proportional designs would enable large energy savings in servers, potentially doubling their efficiency in real-life use. Achieving energy proportionality will require significant improvements in the energy usage profile of every system component, particularly the memory and disk subsystems.

2,499 citations


"A 7 Gb/s Embedded Clock Transceiver..." refers background in this paper

  • ...Consequently, serial link power may become the bottleneck to increasing processor's computational capacity....

    [...]

Proceedings Article
Dennis Abts, Michael R. Marty, Philip M. Wells, Peter Michael Klausler, Hong Liu
19 Jun 2010
TL;DR: It is demonstrated that energy proportional datacenter communication is indeed possible and that there is a significant power advantage to having independent control of each unidirectional channel comprising a network link.
Abstract: Numerous studies have shown that datacenter computers rarely operate at full utilization, leading to a number of proposals for creating servers that are energy proportional with respect to the computation that they are performing. In this paper, we show that as servers themselves become more energy proportional, the datacenter network can become a significant fraction (up to 50%) of cluster power. In this paper we propose several ways to design a high-performance datacenter network whose power consumption is more proportional to the amount of traffic it is moving -- that is, we propose energy proportional datacenter networks. We first show that a flattened butterfly topology itself is inherently more power efficient than the other commonly proposed topology for high-performance datacenter networks. We then exploit the characteristics of modern plesiochronous links to adjust their power and performance envelopes dynamically. Using a network simulator, driven by both synthetic workloads and production datacenter traces, we characterize and understand design tradeoffs, and demonstrate an 85% reduction in power --- which approaches the ideal energy-proportionality of the network. Our results also demonstrate two challenges for the designers of future network switches: 1) We show that there is a significant power advantage to having independent control of each unidirectional channel comprising a network link, since many traffic patterns show very asymmetric use, and 2) system designers should work to optimize the high-speed channel designs to be more energy efficient by choosing optimal data rate and equalization technology. Given these assumptions, we demonstrate that energy proportional datacenter communication is indeed possible.

473 citations


"A 7 Gb/s Embedded Clock Transceiver..." refers background in this paper

  • ...Consequently, serial link power may become the bottleneck to increasing processor's computational capacity....

    [...]

Journal Article
TL;DR: A multi-core processor that integrates 48 cores, 4 DDR3 memory channels, and a voltage regulator controller in a 6×4 2D-mesh network-on-chip architecture, with message-passing core-to-core communication that exploits 384 KB of on-die shared memory and fine-grain power management across 8 voltage and 28 frequency islands.
Abstract: This paper describes a multi-core processor that integrates 48 cores, 4 DDR3 memory channels, and a voltage regulator controller in a 6×4 2D-mesh network-on-chip architecture. Located at each mesh node is a five-port virtual cut-through packet-switched router shared between two IA-32 cores. Core-to-core communication uses message passing while exploiting 384 KB of on-die shared memory. Fine grain power management takes advantage of 8 voltage and 28 frequency islands to allow independent DVFS of cores and mesh. At the nominal 1.1 V supply, the cores operate at 1 GHz while the 2D-mesh operates at 2 GHz. As performance and voltage scale, the processor dissipates between 25 W and 125 W. The processor is implemented in 45 nm Hi-K CMOS and has 1.3 billion transistors.

415 citations


"A 7 Gb/s Embedded Clock Transceiver..." refers background in this paper

  • ...This results in energy proportional operation, where the energy consumed to transfer data is directly proportional to the amount of data transferred and is independent of link utilization....

    [...]

Journal Article
TL;DR: In this article, an exact analysis for third-order charge-pump phase-locked loops using state equations is presented, and the effect of the loop parameters and the reference frequency on the loop phase margin and stability is analyzed.
Abstract: In this paper, we present an exact analysis for third-order charge-pump phase-locked loops using state equations. Both the large-signal lock acquisition process and the small-signal linear tracking behavior are described using this analysis. The nonlinear state equations are linearized for the small-signal condition and the z-domain noise transfer functions are derived. A comparison to some of the existing analysis methods such as the impulse-invariant transformation and s-domain analysis is provided. The effect of the loop parameters and the reference frequency on the loop phase margin and stability is analyzed. The analysis is verified using behavioral simulations in MATLAB and SPECTRE.

152 citations