Showing papers on "Clock gating" published in 2009


Book
03 Apr 2009
TL;DR: This book addresses CMOS logic gates, cell library, timing arcs, waveform slew, cell capacitance, timing modeling, interconnect parasitics and coupling, pre- and post-layout interconnect modeling, delay calculation, specification of timing constraints for analysis of internal paths as well as IO interfaces.
Abstract: The book covers topics such as cell timing and power modeling; interconnect modeling and analysis, delay calculation, crosstalk, noise and the chip timing verification using static timing analysis. For each of these topics, the book provides a theoretical background as well as detailed examples to elaborate the concepts. The static timing analysis topics covered start from verification of simple blocks useful for a beginner to this field. The topics then extend to complex nanometer designs with in-depth treatment of concepts such as modeling of on-chip variation, clock gating, half-cycle paths, as well as timing of source-synchronous interfaces such as DDR. The impact of crosstalk on timing and noise is covered as is the usage of hierarchical design methodology. This book addresses CMOS logic gates, cell library, timing arcs, waveform slew, cell capacitance, timing modeling, interconnect parasitics and coupling, pre- and post-layout interconnect modeling, delay calculation, specification of timing constraints for analysis of internal paths as well as IO interfaces. Advanced modeling and analysis concepts such as controlled current source timing and noise models for nanometer technologies, power modeling including active and leakage power, crosstalk timing and crosstalk glitch calculation, verification of half-cycle and multi-cycle paths, false paths, synchronous interfaces are also covered.

208 citations


Journal ArticleDOI
TL;DR: Applying clock gating to the energy recovery clocked flip-flops reduces their power by more than 1000 times in the idle mode with negligible power and delay overhead in the active mode.
Abstract: A significant fraction of the total power in highly synchronous systems is dissipated over clock networks. Hence, low-power clocking schemes are promising approaches for low-power design. We propose four novel energy recovery clocked flip-flops that enable energy recovery from the clock network, resulting in significant energy savings. The proposed flip-flops operate with a single-phase sinusoidal clock, which can be generated with high efficiency. In the TSMC 0.25-μm CMOS technology, we implemented 1024 proposed energy recovery clocked flip-flops through an H-tree clock network driven by a resonant clock-generator to generate a sinusoidal clock. Simulation results show a power reduction of 90% on the clock-tree and total power savings of up to 83% as compared to the same implementation using the conventional square-wave clocking scheme and flip-flops. Using a sinusoidal clock signal for energy recovery prevents application of existing clock gating solutions. In this paper, we also propose clock gating solutions for energy recovery clocking. Applying our clock gating to the energy recovery clocked flip-flops reduces their power by more than 1000 times in the idle mode with negligible power and delay overhead in the active mode. Finally, a test chip containing two pipelined multipliers, one designed with conventional square-wave clocked flip-flops and the other with the proposed energy recovery clocked flip-flops, is fabricated and measured. Based on measurement results, the energy recovery clocking scheme and flip-flops show a power reduction of 71% on the clock-tree and 39% on flip-flops, resulting in an overall power savings of 25% for the multiplier chip.

173 citations


Patent
09 Jul 2009
TL;DR: In this paper, an integrated circuit device includes an open-loop clock distribution circuit and a transmit circuit that cooperate to enable high-speed transmission of information-bearing symbols unaccompanied by source-synchronous timing references.
Abstract: In a low-power signaling system, an integrated circuit device includes an open-loop clock distribution circuit and a transmit circuit that cooperate to enable high-speed transmission of information-bearing symbols unaccompanied by source-synchronous timing references. The open-loop clock distribution circuit generates a transmit clock signal in response to an externally-supplied clock signal, and the transmit circuit outputs a sequence of symbols onto an external signal line in response to transitions of the transmit clock signal. Each of the symbols is valid at the output of the transmit circuit for a symbol time and a phase offset between the transmit clock signal and the externally-supplied clock signal is permitted to drift by at least the symbol time.

93 citations


Proceedings ArticleDOI
06 Mar 2009
TL;DR: A new technique, called Common Activity-based Model for Power (CAMP), is proposed to estimate activity factors and power for microarchitectural structures, using relatively few input parameters based on general microprocessor utilization statistics.
Abstract: Microprocessor power has become a first-order constraint at run-time. Designers must employ aggressive power-management techniques at run-time to keep a processor's ballooning power requirements under control. Effective power management benefits from knowledge of run-time microprocessor power consumption in both the core and individual microarchitectural structures, such as caches, queues, and execution units. Increasingly feasible per-structure power-control techniques, such as fine-grain clock gating, power gating, and dynamic voltage/frequency scaling (DVFS), become more effective with run-time estimates of per-structure power. However, run-time computation of per-structure power estimates based on utilization requires daunting numbers of input statistics, which makes per-structure monitoring of run-time power a challenging problem. To address the challenges of estimating per-structure power in hardware, we propose a new technique, called Common Activity-based Model for Power (CAMP), to estimate activity factors and power for microarchitectural structures. Despite using relatively few input parameters (specifically nine) based on general microprocessor utilization statistics (e.g., IPC and load rate), our linear-regression-based model estimates activity and dynamic power for over 100 structures in an out-of-order x86 pipeline and core power with an average error of 8%. Because the computations utilize few inputs, CAMP is simple enough to implement in hardware, providing run-time structure and core power estimates for dynamic power management. Because the input statistics are generic in nature and the model remains accurate across incremental microarchitectural refinements, CAMP provides simple intuitive equations relating global microarchitectural statistics to structure activity and power. These equations provide a simple technique that can equate changes in one structure's activity to power variations in other structures across the pipeline.

84 citations
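
As a rough illustration of the linear-regression idea in this abstract (not the CAMP model itself), the sketch below fits a linear power model to a few utilization statistics with ordinary least squares. Everything concrete here is an assumption made for the example: the two inputs (IPC and load rate) stand in for the paper's nine, and the sample values and the coefficients 0.4, 0.8, and 2.0 are synthetic.

```python
def fit_linear(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y.

    Assumes the inputs are not linearly dependent, so every pivot is non-zero.
    """
    # Prepend a constant 1.0 so the first weight acts as the intercept.
    xs = [[1.0] + list(r) for r in rows]
    n = len(xs[0])
    # Build the augmented matrix [X^T X | X^T y].
    a = [[sum(x[i] * x[j] for x in xs) for j in range(n)]
         + [sum(x[i] * y for x, y in zip(xs, targets))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def predict(weights, row):
    """Evaluate intercept + sum(weight_i * statistic_i)."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], row))


# Synthetic (IPC, load rate) samples; power is generated from a known
# linear rule so the fit can be checked against it.
samples = [(0.5, 0.10), (1.0, 0.20), (1.5, 0.15), (2.0, 0.30), (1.2, 0.25)]
targets = [0.4 + 0.8 * ipc + 2.0 * load for ipc, load in samples]

weights = fit_linear(samples, targets)
estimate = predict(weights, (1.8, 0.22))
print([round(w, 3) for w in weights], round(estimate, 3))
```

Once the weights are fixed, evaluating the model is a short multiply-accumulate over the inputs, which is in line with the abstract's point that a model with few inputs is simple enough to implement in hardware.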


Journal ArticleDOI
TL;DR: This work describes how the Cell Broadband Engine (Cell BE) processor is experimentally transformed to have a resonant-load global clock distribution similar to the one in (Chan et al., 2004).
Abstract: Resonant clock distributions have the potential to save power by recycling energy from cycle to cycle while at the same time improving performance by reducing the clock distribution latency and filtering out non-periodic noise. While these features have been successfully demonstrated in several small-scale experiments, there remained a number of concerns about whether these techniques would scale to a product application. By modifying the Cell Broadband Engine processor to incorporate a large resonant global clock network, power savings with full functionality are demonstrated over a 20% range in clock frequencies, with a 6-8 W power savings at 4 GHz. This was achieved by changing one wiring level and adding an additional thick copper level to create inductors and capacitors.

74 citations


Proceedings ArticleDOI
02 Nov 2009
TL;DR: The algorithm minimizes the overall wirelength and clock power consumption while providing pre-bond testability and post-bond operability under given skew and slew constraints.
Abstract: Pre-bond testing of 3D stacked ICs involves testing individual dies before bonding. The overall yield of 3D ICs improves with pre-bond testability because designers can avoid stacking defective dies with good ones. However, pre-bond testability presents unique challenges to 3D clock tree design. First, each die needs a complete 2D clock tree for the pre-bond testing. In addition, the entire 3D stack needs a complete 3D clock tree for post-bond testing and normal operations. In the case of a two-die stack, a straightforward solution is to have two complete 2D clock trees connected with a single Through-Silicon-Via (TSV). We show that this solution suffers from long wirelength and high clock power consumption. Instead, our algorithm minimizes the overall wirelength and clock power consumption while providing pre-bond testability and post-bond operability under given skew and slew constraints. Compared with the single-TSV solution, SPICE simulation results show that our multi-TSV approach reduces the clock power by up to 15.9% for two-die and 29.7% for four-die stacks. In addition, the wirelength reduction is up to 24.4% and 42.0%, respectively.

69 citations


Journal ArticleDOI
TL;DR: A 20-Gb/s full-rate clock and data recovery circuit employing a mixer-type linear phase detector and automatic frequency locking technique is described, revealing rms and peak-to-peak jitter of 480 fs and 4.22 ps in response to a 2^31 - 1 PRBS on the recovered clock while consuming 154 mW from a 1.5-V supply.
Abstract: A 20-Gb/s full-rate clock and data recovery circuit employing a mixer-type linear phase detector and automatic frequency locking technique is described. The phase detector achieves high-speed operation by mixing the clock with the data-transition pulses, providing output proportional to the phase error. The frequency acquisition loop utilizes the data phases rather than the clock phases to distill the frequency difference, and no external reference is used in this design. Fabricated in 90-nm CMOS technology, this circuit reveals rms and peak-to-peak jitter of 480 fs and 4.22 ps in response to a 2^31 - 1 PRBS on the recovered clock while consuming 154 mW from a 1.5-V supply.

68 citations


Proceedings ArticleDOI
29 Sep 2009
TL;DR: This work considers and evaluates FPGA clock network architectures with built-in clock gating capability and describes a flexible placement algorithm that can operate with various gating granularities (various sizes of device regions containing clock loads that can be gated together).
Abstract: Clock gating is a power reduction technique that has been used successfully in the custom ASIC domain. Clock and logic signal power are saved by temporarily disabling the clock signal on registers whose outputs do not affect circuit outputs. We consider and evaluate FPGA clock network architectures with built-in clock gating capability and describe a flexible placement algorithm that can operate with various gating granularities (various sizes of device regions containing clock loads that can be gated together). Results show that depending on the clock gating architecture and the fraction of time clock signals are enabled, clock power can be reduced by over 50%, and results suggest that a fine granularity gating architecture yields significant power benefits.

59 citations


Proceedings ArticleDOI
20 Apr 2009
TL;DR: This work proposes an event-guided, adaptive method for avoiding voltage emergencies, which exploits the fact that most emergencies are correlated with unique microarchitectural events, such as cache misses or the pipeline flushes that follow branch mispredictions.
Abstract: Supply voltage fluctuations that result from inductive noise are increasingly troublesome in modern microprocessors. A voltage "emergency", i.e., a swing beyond tolerable operating margins, jeopardizes the safe and correct operation of the processor. Techniques aimed at reducing power consumption, e.g., by clock gating or by reducing nominal supply voltage, exacerbate this noise problem, requiring ever-wider operating margins. We propose an event-guided, adaptive method for avoiding voltage emergencies, which exploits the fact that most emergencies are correlated with unique microarchitectural events, such as cache misses or the pipeline flushes that follow branch mispredictions. Using checkpoint and rollback to handle unavoidable emergencies, our method adapts dynamically by learning to trigger avoidance mechanisms when emergency-prone events recur. After tightening supply voltage margins to increase clock frequency and accounting for all costs, the net result is a performance improvement of 8% across a suite of fifteen SPEC CPU2000 benchmarks.

47 citations


Proceedings ArticleDOI
26 Jul 2009
TL;DR: This work argues that runtime adaptation of micro-architectural parameters, such as instruction window size and issue width, is a more effective mechanism for DTM and synergistically combining architectural adaptation with DVFS and fetch gating can achieve the best performance under thermal constraints.
Abstract: Exponentially rising cooling/packaging costs due to high power density call for architectural and software-level thermal management. Dynamic thermal management (DTM) techniques continuously monitor the on-chip processor temperature. Appropriate mechanisms (e.g., dynamic voltage or frequency scaling (DVFS), clock gating, fetch gating, etc.) are engaged to lower the temperature if it exceeds a threshold. However, all these mechanisms incur significant performance penalty. We argue that runtime adaptation of micro-architectural parameters, such as instruction window size and issue width, is a more effective mechanism for DTM. If the architectural parameters can be tailored to track the available instruction-level parallelism of the program, the temperature is reduced with minimal performance degradation. Moreover, synergistically combining architectural adaptation with DVFS and fetch gating can achieve the best performance under thermal constraints. The key difficulty in using multiple mechanisms is to select the optimal configuration at runtime for time varying workloads. We present a novel software-level thermal management framework that searches through the configuration space at regular intervals to find the best performing design point that is thermally safe. The central components of our framework are (1) a neural-network based classifier that filters the thermally unsafe configurations, (2) a fast performance prediction model for any configuration, and (3) an efficient configuration space search algorithm. Experimental results indicate that our adaptive scheme achieves 59% reduction in performance overhead compared to DVFS and 39% reduction in overhead compared to DVFS combined with fetch gating.

39 citations
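
The framework described in this abstract combines a filter for thermally unsafe configurations, a performance predictor, and a search over the configuration space. The sketch below shows only that overall shape, under loud assumptions: `predicted_temp_c` and `predicted_perf` are hypothetical closed-form stand-ins for the paper's neural-network classifier and prediction model, the parameter values are invented, and the search is a plain exhaustive scan rather than the efficient algorithm the authors mention.

```python
from itertools import product

# Hypothetical configuration space (values invented for the example).
WINDOW_SIZES = (32, 64, 128)   # instruction window entries
ISSUE_WIDTHS = (2, 4)          # instructions issued per cycle
FREQS_GHZ = (1.0, 1.5, 2.0)    # DVFS operating points


def predicted_temp_c(window, width, freq):
    """Stand-in thermal model; the paper uses a learned classifier instead."""
    return 40.0 + 0.05 * window + 3.0 * width + 12.0 * freq


def predicted_perf(window, width, freq):
    """Stand-in performance model in arbitrary units."""
    return freq * (1.0 + 0.002 * window + 0.15 * width)


def best_safe_config(limit_c):
    """Return the best-performing configuration predicted to stay under limit_c."""
    all_configs = product(WINDOW_SIZES, ISSUE_WIDTHS, FREQS_GHZ)
    # Step 1: filter out configurations predicted to be thermally unsafe.
    safe = [c for c in all_configs if predicted_temp_c(*c) <= limit_c]
    # Step 2: among the safe ones, pick the highest predicted performance.
    return max(safe, key=lambda c: predicted_perf(*c)) if safe else None


cfg = best_safe_config(80.0)
print(cfg)
```

In a runtime setting this selection would be repeated at regular intervals, as the abstract describes, so that the chosen design point tracks the time-varying workload.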


Proceedings ArticleDOI
22 Feb 2009
TL;DR: In this paper, two complementary approaches for clock power reduction in the Xilinx Virtex-5 FPGA are presented, where clock enable signals on flip-flops are selectively migrated to use the dedicated clock enable available on the built-in clock network.
Abstract: Clock network power in field-programmable gate arrays (FPGAs) is considered and two complementary approaches for clock power reduction in the Xilinx Virtex-5 FPGA are presented. The approaches are unique in that they leverage specific architectural aspects of Virtex-5 to achieve reductions in dynamic power consumed by the clock network. The first approach comprises a placement-based technique to reduce interconnect resource usage on the clock network, thereby reducing capacitance and power (up to 12%). The second approach borrows the "clock gating" notion from the ASIC domain and applies it to FPGAs. Clock enable signals on flip-flops are selectively migrated to use the dedicated clock enable available on the FPGA's built-in clock network, leading to reduced toggling on the clock interconnect and lower power (up to 28%). Power reductions are achieved without any performance penalty, on average.

Patent
17 Nov 2009
TL;DR: In this article, a phase mixer receives two intermediate clocks and generates the final output clock having a phase between the phases of the intermediate clocks, and the output clock from the phase mixer is time synchronized with the input reference clock and does not exhibit any jitter or noise even at high clock frequency inputs.
Abstract: A clock synchronization system and method avoids output clock jitter at high frequencies and also achieves a smooth phase transition at the boundary of the coarse and fine delays. The system may use a single coarse delay line configured to generate two intermediate clocks from the input reference clock and having a fixed phase difference therebetween. The coarse delay line may have a hierarchical or a non-hierarchical structure. A phase mixer receives these two intermediate clocks and generates the final output clock having a phase between the phases of the intermediate clocks. The coarse shifting in the delay line at high clock frequencies does not affect the phase relationship between the intermediate clocks fed into the phase mixer. The output clock from the phase mixer is time synchronized with the input reference clock and does not exhibit any jitter or noise even at high clock frequency inputs. Because of the rules governing abstracts, this abstract should not be used to construe the claims.

Patent
14 May 2009
TL;DR: In this paper, a clock gating system and method is described, which includes an input logic circuit having at least one input to receive at least one input signal and having an output at an internal enable node.
Abstract: A clock gating system and method is disclosed. In a particular embodiment, the system includes an input logic circuit having at least one input to receive at least one input signal and having an output at an internal enable node. A keeper circuit includes at least one switching element that is responsive to a gated clock signal and is coupled to the internal enable node to selectively hold a logical voltage level at the internal enable node. The system further includes a gating element responsive to an input clock signal and to the logical voltage level at the internal enable node to generate the gated clock signal.
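
For readers unfamiliar with the general idea, the sketch below is a behavioral model of a conventional latch-based clock gate, where an enable value is held while the clock is high and combined with the clock to form the gated clock. It is an assumption-laden illustration and not the circuit claimed in this patent: the keeper circuit is reduced to a single held bit, and the function names and sample waveforms are invented.

```python
def gate_clock(clk_samples, enable_samples):
    """Return the gated clock for sampled clk/enable waveforms (0 or 1)."""
    held_enable = 0
    gated = []
    for clk, enable in zip(clk_samples, enable_samples):
        if clk == 0:
            # The enable is captured only while the clock is low, so a
            # change during the high phase cannot cut or create a pulse.
            held_enable = enable
        gated.append(clk & held_enable)
    return gated


def rising_edges(samples):
    """Count 0 -> 1 transitions, a proxy for clock switching activity."""
    return sum(1 for prev, cur in zip(samples, samples[1:])
               if prev == 0 and cur == 1)


clk = [0, 1, 0, 1, 0, 1, 0, 1]
enable = [1, 1, 0, 0, 0, 1, 1, 0]
gated = gate_clock(clk, enable)
print(gated, rising_edges(clk), rising_edges(gated))
```

In this trace the enable rises at the sixth sample while the clock is already high, and the model suppresses that partial pulse; the gated clock then shows two rising edges against four on the free-running clock.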

Proceedings ArticleDOI
Eli Arbel1, Cindy Eisner1, Oleg Rokhlenko1
26 Jul 2009
TL;DR: This paper proposes two optimization techniques for resurrecting infeasible clock gating functions that can be used as a generic post-processing phase in an automatic clock gating tool; the second aims at generating large gating domains by clustering similar clock gating functions.
Abstract: In this paper we consider the problem of exploiting infeasible clock gating functions. Analysis of industrial designs reveals a large margin of potential for power saving based on clock gating functions that initially appear to be useless due to timing violation or excessive power consumption. We propose two optimization techniques for resurrecting such functions that can be used as a generic post-processing phase in an automatic clock gating tool. The first provides timing-aware approximation and the second aims at generating large gating domains by clustering similar clock gating functions. Our experimental results show that the combination of these two techniques yields an additional power saving of up to 78% in industrial designs.

Proceedings ArticleDOI
24 May 2009
TL;DR: It is found that >70% wasted power reduction (including both short-circuit and leakage powers) as compared to the conventional asynchronous-logic pipeline stage can be achieved with all gating configurations.
Abstract: In this paper, a fine-grained power gating technique for an asynchronous-logic pipeline stage is proposed using locally controlled gating transistors. The proposed power gating technique is implemented with minimal control overheads (one additional inverter per pipeline stage for driving PMOS Gating) and delay overheads (within 15% more than the conventional asynchronous-logic pipeline stage). Different types of gating configurations using only PMOS transistor (PMOS Gating), only NMOS transistor (NMOS Gating), and both types of transistors (Dual Gating) are examined and compared. The effectiveness of the proposed power gating technique to the Combinational Block therein with different data input rates is investigated. Based on the computer simulation results, we have found that >70% wasted power reduction (including both short-circuit and leakage powers) as compared to the conventional asynchronous-logic pipeline stage can be achieved with all gating configurations. In particular, Dual Gating achieves the best wasted power reduction of 86% for short-circuit power and 99% for leakage power at a 10 Mbps input rate.

01 Sep 2009
TL;DR: This study identifies the precise conditions under which partial reconfiguration reduces the total energy consumption, and proposes solutions that minimize the configuration energy overhead.
Abstract: In this paper we investigate whether partial reconfiguration can be used to reduce FPGA energy consumption. The core idea is that within a hardware design there are a number of independent circuits, and some can be idle for long periods of time. Idle circuits still consume power though, especially through clock oscillation and static leakage. Using partial reconfiguration we can replace these circuits during their idle time with others that consume much less power. Since the reconfiguration process itself introduces energy overhead, it is unclear whether this approach actually leads to an overall energy saving or to a loss. This study identifies the precise conditions under which partial reconfiguration reduces the total energy consumption, and proposes solutions that minimize the configuration energy overhead. Partial reconfiguration is compared against clock gating to evaluate its effectiveness. We apply these techniques to an existing embedded microprocessor design, and show how FPGAs can be used to accelerate application performance while also reducing overall energy consumption.
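
The trade-off in this abstract (power saved while an idle circuit is swapped out versus the energy spent on reconfiguration) can be pictured with a simple break-even calculation. This is a minimal sketch under stated assumptions, not the conditions derived in the paper: it assumes constant power in each state and a single lumped overhead energy, and all names and numbers are hypothetical.

```python
def net_energy_saved(idle_s, p_idle_w, p_low_w, e_overhead_j):
    """Energy saved (J) by swapping out an idle circuit for idle_s seconds.

    p_idle_w:     power of the original circuit while idle
    p_low_w:      power of the low-power replacement
    e_overhead_j: total energy spent on reconfiguring out and back in
    """
    return (p_idle_w - p_low_w) * idle_s - e_overhead_j


def breakeven_idle_time(p_idle_w, p_low_w, e_overhead_j):
    """Idle duration (s) above which reconfiguration starts to pay off."""
    saving_rate_w = p_idle_w - p_low_w
    if saving_rate_w <= 0:
        # The replacement is no better than idling, so it never pays off.
        return float("inf")
    return e_overhead_j / saving_rate_w


# Invented numbers: 50 mW idle, 10 mW replacement, 2 mJ overhead.
t_star = breakeven_idle_time(0.050, 0.010, 0.002)
print(round(t_star, 4))
print(net_energy_saved(0.1, 0.050, 0.010, 0.002) > 0)
```

Under these assumptions, idle periods shorter than the break-even time lose energy overall, which is why the abstract treats the outcome as unclear until the overhead is accounted for.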

Proceedings ArticleDOI
01 Nov 2009
TL;DR: In this article, a stacking power gating structure is introduced which minimizes the leakage power and provides a way to control the ground bounce noise in transition mode, and the conditions for the important design goals such as (i) minimum ground bounce noise and (ii) minimum wakeup latency have been derived.
Abstract: Power gating is an effective method to reduce leakage current in logic circuits during sleep mode. However, the conventional power gating technique for minimizing leakage current introduces ground bounce noise during the sleep-to-active mode transition. In this paper, a high performance stacking power gating structure is introduced which minimizes the leakage power and provides a way to control the ground bounce noise in transition mode. The stacking power gating technique has been analyzed and the conditions for the important design goals such as (i) minimum ground bounce noise and (ii) minimum wakeup latency have been derived. The tradeoff between the ground bounce noise and wakeup latency has been explored for high performance power gating logic circuits. Further, to evaluate the efficacy of the proposed stacking power gating technique, simulation has been done using the proposed technique implemented on a basic 2-input NAND gate circuit with BPTM 90nm technology. The leakage current is reduced by 81.1% over the conventional power gating technique. Ground bounce noise has also been reduced to 76.28% in comparison to the conventional power gating technique.

Patent
Tetsu Hasegawa1
16 Mar 2009
TL;DR: In this paper, a scan chain circuit with flip-flops acting as shift registers was proposed to allow a scan shift to be executed based on the logic of a scan enable signal.
Abstract: A scan chain circuit causes a plurality of flip-flops to function as shift registers during execution of a scan test and can execute a scan shift that serially transfers test pattern data for the scan test. A clock gating circuit controls output of a pulse of a clock signal supplied to the scan chain circuit in accordance with a clock gating signal, and disables the clock gating signal based on a logic of a scan enable signal authorizing the scan shift. A first clock gating circuit included in the clock gating circuit disables the clock gating signal during the scan shift based on the logic of the scan enable signal and also inverts the clock signal and outputs the inverted result.

Proceedings Article
23 Nov 2009
TL;DR: In this paper, the authors provide a comprehensive knowledge of structural and algorithmic solutions that can be used to alleviate power dissipation during manufacturing test, and show how low-power circuits and systems can be tested safely without affecting yield and reliability.
Abstract: Managing the power consumption of circuits and systems is now considered as one of the most important challenges for the semiconductor industry. Elaborate power management strategies, such as voltage scaling, clock gating or power gating techniques, are used today to control the power dissipation during functional operation. The usage of these strategies has various implications on manufacturing test, and power-aware test is therefore increasingly becoming a major consideration during design-for-test and test preparation for low-power devices. This tutorial provides the fundamental and advanced knowledge in this area. It is organized into three main parts. The first one gives necessary background and discusses issues arising from excessive power dissipation during manufacturing test. The second part provides comprehensive knowledge of structural and algorithmic solutions that can be used to alleviate such problems. The last part surveys low-power design techniques and shows how low-power circuits and systems can be tested safely without affecting yield and reliability. Electronic Design Automation (EDA) solutions for testing low-power devices are also covered in the last part of the tutorial.

Proceedings ArticleDOI
18 Dec 2009
TL;DR: By taking advantage of the existing clock gating circuitry and selectively holding the value of some scan flip-flops, switching activity during the capture cycles of a test can be reduced.
Abstract: Scan-based manufacturing test of low power designs often exceeds the very tight functional constraints on average and instantaneous logic switching. The logic activity during the shift and launch-capture of test pattern data may lead to excessive power consumption and voltage droop. This paper focuses on the management of instantaneous power during the capture phase. By taking advantage of the existing clock gating circuitry and selectively holding the value of some scan flip-flops, switching activity during the capture cycles of a test can be reduced. The effectiveness of this technique is demonstrated on several industrial designs that show up to 30% (55%) reduction in instantaneous (average) capture switching.

Proceedings ArticleDOI
29 Sep 2009
TL;DR: The proposed power reduction technique achieves better power reduction than the pixel truncation technique with a similar PSNR loss, and its impact is quantified on the power consumption of two full search ME hardware implementations on a Xilinx Virtex II FPGA using the Xilinx XPower tool.
Abstract: Motion Estimation (ME) is the most computationally intensive and the most power consuming part of video compression and video enhancement systems. In this paper, we propose a novel power reduction technique for ME hardware. We quantified the impact of glitch reduction, clock gating and the proposed technique on the power consumption of two full search ME hardware implementations on a Xilinx Virtex II FPGA using the Xilinx XPower tool. Glitch reduction and clock gating together achieved an average of 21% dynamic power reduction. The proposed technique achieved an average of 23% dynamic power reduction with an average of 0.4 dB PSNR loss. The proposed technique achieves better power reduction than the pixel truncation technique with a similar PSNR loss.

Patent
12 Aug 2009
TL;DR: In this paper, a second-level power gating controller monitors a set of events for each unit in the set of units within the data processing system and determines preceding sequences of a predetermined length that precede the idle sequences.
Abstract: A mechanism is provided for predictively power gating a set of units within the data processing system. A second-level power gating controller monitors a set of events for each unit in a set of units within the data processing system. The second-level power gating controller identifies idle sequences of a predetermined set of cycles within the events from each unit where the unit is idle. The second-level power gating controller determines preceding sequences of a predetermined length that precede the idle sequences. The second-level power gating controller determines an accuracy of the preceding sequences. Responsive to the accuracy being above a threshold, the second-level power gating controller sends a permit command to a first-level power gating mechanism associated with the unit to permit power gating of the unit.
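
The steps in this abstract (find idle sequences, look at the event patterns that precede them, score how reliably each pattern predicts idleness, and permit power gating above a threshold) can be sketched in software as follows. This is only an illustration of the described flow under assumptions: the per-cycle trace encoding (1 = busy, 0 = idle), the function names, the window lengths, and the 0.8 threshold are all hypothetical, and the patent describes a hardware controller rather than this code.

```python
from collections import Counter


def pattern_accuracy(trace, history_len, idle_len):
    """Score each history pattern by how often an idle run follows it.

    trace:       per-cycle activity of one unit (1 = busy, 0 = idle)
    history_len: length of the preceding sequence that is inspected
    idle_len:    number of consecutive idle cycles that counts as an idle sequence
    """
    seen = Counter()
    followed_by_idle = Counter()
    for i in range(history_len, len(trace) - idle_len + 1):
        pattern = tuple(trace[i - history_len:i])
        seen[pattern] += 1
        # An idle sequence starts here if the next idle_len cycles are all 0.
        if not any(trace[i:i + idle_len]):
            followed_by_idle[pattern] += 1
    return {p: followed_by_idle[p] / seen[p] for p in seen}


def permitted_patterns(accuracy, threshold):
    """Patterns accurate enough to permit power gating of the unit."""
    return {p for p, acc in accuracy.items() if acc > threshold}


trace = [1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1]
accuracy = pattern_accuracy(trace, history_len=2, idle_len=3)
permitted = permitted_patterns(accuracy, threshold=0.8)
print(permitted)
```

In this invented trace, two consecutive busy cycles are always followed by a three-cycle idle run, so that pattern alone clears the threshold and would trigger the permit command in the scheme described above.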

Journal ArticleDOI
TL;DR: An adaptive circuit technique is presented that senses the temperature of different parts of the clock tree and adjusts the driving strengths of the corresponding clock buffers dynamically to reduce the clock skew, leading to much improved clock synchronization and design performance.
Abstract: On-chip temperature gradient has emerged as a major design concern for high-performance integrated circuits for the current and future technology nodes. Clock skew is an undesirable phenomenon for synchronous digital circuits that is exacerbated by the temperature difference between various parts of the clock tree. The main aim of this paper is to provide an intelligent solution for minimizing the temperature-dependent clock skew by designing dynamically adaptive circuit elements, particularly the clock buffers. Using an RLC model of the clock tree, we investigate the effect of on-chip temperature gradient on the clock skew for a number of temperature profiles that can arise in practice due to different architectures and applications. As an effective way of mitigating the variable clock skew, we present an adaptive circuit technique that senses the temperature of different parts of the clock tree and adjusts the driving strengths of the corresponding clock buffers dynamically to reduce the clock skew. Simulation results demonstrate that our adaptive technique is capable of reducing the skew by up to 92.4%, leading to much improved clock synchronization and design performance.

Proceedings ArticleDOI
20 Apr 2009
TL;DR: This paper demonstrates a mathematical framework to compute the impact of NBTI on gating-enabled clock trees considering their workload-dependent temperature variation and achieves up to 70% reduction in clock skew degradation with minuscule power and area penalty.
Abstract: NBTI (Negative Bias Temperature Instability) has emerged as the dominant PMOS device failure mechanism for sub-100nm VLSI designs. There is little research to quantify its impact on the skew of clock trees. This paper demonstrates a mathematical framework to compute the impact of NBTI on gating-enabled clock trees considering their workload-dependent temperature variation. Circuit design techniques are proposed to deal with NBTI-induced clock skew by achieving balance in NBTI degradation of clock devices. Our technique achieves up to 70% reduction in clock skew degradation with minuscule (≪0.1%) power and area penalty.

Patent
19 Mar 2009
TL;DR: In this paper, a phase-locked loop (PLL) and a calibrator are used to adjust the frequency of the output clock signal according to a control signal, and a frequency calibration is performed.
Abstract: A clock generation circuit is provided and includes a phase locked loop (PLL) and a calibrator. The PLL is arranged to receive a first clock signal and generate the output clock signal. The PLL adjusts the frequency of the output clock signal according to a control signal. The calibrator is arranged to receive the output clock signal and a second clock signal, execute a frequency calibration between the output clock signal and the second clock signal, and generate the control signal according to results of the frequency calibration.

Patent
09 Mar 2009
TL;DR: In this paper, the authors proposed a power reduction in microcontrollers by reactivating a clock in the microcontroller for one or more peripheral modules in response to an internal or external trigger event.
Abstract: The disclosed implementations provide for power reduction in microcontrollers by reactivating a clock in the microcontroller for one or more peripheral modules in response to an internal or external trigger event, thus allowing the one or more peripheral modules to respond to events while operating in a low-power sleep mode. In some implementations, one or more peripheral modules in a microcontroller provide a clock request signal to a clock generator in the microcontroller. In response to the clock request signal, the clock generator reactivates one or more oscillator sources. The clock generator resumes clock generation only for the one or more requesting peripheral modules, keeping power consumption in the microcontroller to a minimum and not disturbing other modules in the microcontroller.
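The request-driven clocking described in this abstract can be modeled behaviorally as follows. This is a minimal sketch under stated assumptions, not the disclosed circuit: the class name, method names, and module names are hypothetical, and a single shared oscillator stands in for the "one or more oscillator sources".

```python
class ClockGenerator:
    """Toy model: the oscillator runs only while some peripheral requests a
    clock, and only requesting peripherals receive it (hypothetical API)."""

    def __init__(self, modules):
        # One clock-request line per peripheral module, initially deasserted.
        self.requests = {name: False for name in modules}

    def request(self, module):
        """Peripheral asserts its clock request (e.g. on a trigger event)."""
        self.requests[module] = True

    def release(self, module):
        """Peripheral deasserts its clock request once it has finished."""
        self.requests[module] = False

    @property
    def oscillator_running(self):
        # The oscillator is reactivated if any request line is asserted.
        return any(self.requests.values())

    def clocked_modules(self):
        # Clock is delivered only to requesters; other modules stay gated.
        return {name for name, req in self.requests.items() if req}
```

For example, after `gen = ClockGenerator(["uart", "adc", "timer"])` and `gen.request("adc")`, only `"adc"` appears in `gen.clocked_modules()`, while the other modules remain unclocked.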

Journal ArticleDOI
TL;DR: This work proposes an adjustable power management scheme using hierarchical cascaded power gating (HCPG) and hierarchical multi-level clock gating (HMLCG) with special regard for the DRX framework and shows that optimal power saving was achieved.
Abstract: Long term evolution (LTE) of 3GPP provides high bandwidth for data transfer. From the viewpoint of the LTE terminal baseband modem chip, these high data rates require more complex logic and memories than previous baseband modem chips in preceding technologies such as Wideband Code Division Multiple Access (W-CDMA), High Speed Downlink Packet Access (HSDPA), and High Speed Packet Access (HSPA). This may exhaust the user equipment (UE)'s battery power quickly. Thus, LTE provides an enhanced discontinuous reception (DRX) to extend the UE's battery lifetime. In this work, we propose an adjustable power management scheme using hierarchical cascaded power gating (HCPG) and hierarchical multi-level clock gating (HMLCG) with special regard for the DRX framework. The test results from the designed ASIC show that optimal power saving was achieved by using the proposed adjustable power management.

Patent
Feng Lin
23 Oct 2009
TL;DR: In this paper, a delay-lock loop with a voltage-controlled delay line was proposed, in which a plurality of delayed clock signals having different phases were combined to generate output signals, and an initialization circuit set the delay of the delay line to a minimum delay value and then compared this delay value to the period of the input clock signal.
Abstract: A delay-lock loop receives an input clock signal from the output of a programmable divider that receives a reference clock signal. The delay-lock loop includes a voltage-controlled delay line generating a plurality of delayed clock signals having different phases. A plurality of the delayed clock signals are combined to generate a plurality of output signals. During an initialization period, an initialization circuit sets the delay of the delay line to a minimum delay value and then compares this delay value to the period of the input clock signal. Based on this comparison, the initialization circuit programs the programmable divider and adjusts the number of delayed clock signals combined to generate the output signals. More specifically, as the frequency of the reference clock signal increases, the divider is programmed to divide by a greater number, and a larger number of delay clock signals are combined to generate the output signals.

Patent
02 Jul 2009
TL;DR: In this paper, a clock receiver includes a bias circuit for establishing a bias voltage in the differential clock signal and a differential amplifier for amplifying the differential signal, which functions as a negative feedback signal for rejecting common-mode noise in the clock signal.
Abstract: A clock receiver includes a capacitive coupling circuit for filtering out direct-current voltages from a differential clock signal. In this way, the capacitive coupling circuit rejects common-mode noise in the differential clock signal. The clock receiver also includes a bias circuit for establishing a bias voltage in the differential clock signal and a differential amplifier for amplifying the differential clock signal. Further, the differential amplifier generates a feedback differential clock signal and provides the feedback differential clock signal to the bias circuit for further rejecting common-mode noise in the differential clock signal. The feedback differential clock signal functions as a negative feedback signal for rejecting common-mode noise in the differential clock signal and as a positive feedback signal for amplifying the differential clock signal. In some embodiments, the clock receiver includes a capacitive coupling circuit with a cut-off frequency above the frequency of the differential clock signal.