Topic

Clock gating

About: Clock gating is a research topic. Over its lifetime, 7,838 publications have been published within this topic, receiving 107,903 citations.


Papers
Proceedings Article
26 Jul 2009
TL;DR: This work argues that runtime adaptation of micro-architectural parameters, such as instruction window size and issue width, is a more effective mechanism for DTM, and that synergistically combining architectural adaptation with DVFS and fetch gating can achieve the best performance under thermal constraints.
Abstract: Exponentially rising cooling/packaging costs due to high power density call for architectural and software-level thermal management. Dynamic thermal management (DTM) techniques continuously monitor the on-chip processor temperature. Appropriate mechanisms (e.g., dynamic voltage or frequency scaling (DVFS), clock gating, fetch gating, etc.) are engaged to lower the temperature if it exceeds a threshold. However, all these mechanisms incur a significant performance penalty. We argue that runtime adaptation of micro-architectural parameters, such as instruction window size and issue width, is a more effective mechanism for DTM. If the architectural parameters can be tailored to track the available instruction-level parallelism of the program, the temperature is reduced with minimal performance degradation. Moreover, synergistically combining architectural adaptation with DVFS and fetch gating can achieve the best performance under thermal constraints. The key difficulty in using multiple mechanisms is to select the optimal configuration at runtime for time-varying workloads. We present a novel software-level thermal management framework that searches through the configuration space at regular intervals to find the best-performing design point that is thermally safe. The central components of our framework are (1) a neural-network-based classifier that filters the thermally unsafe configurations, (2) a fast performance prediction model for any configuration, and (3) an efficient configuration space search algorithm. Experimental results indicate that our adaptive scheme achieves a 59% reduction in performance overhead compared to DVFS and a 39% reduction in overhead compared to DVFS combined with fetch gating.

39 citations
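
The three components named in the abstract above (a safety filter, a performance predictor, and a search over configurations) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Config` fields, the candidate values, and the two toy models are hypothetical, the safety filter is a hand-written threshold rather than the paper's neural-network classifier, and the search is plain exhaustive enumeration rather than the paper's efficient search algorithm.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Config:
    """One candidate design point (hypothetical fields and units)."""
    window_size: int        # instruction window entries
    issue_width: int        # instructions issued per cycle
    freq_ghz: float         # DVFS frequency setting
    fetch_gate_duty: float  # fraction of cycles in which fetch is enabled


def enumerate_configs() -> list:
    """Build a small, made-up configuration space."""
    return [
        Config(w, i, f, g)
        for w, i, f, g in product((32, 64, 128), (2, 4), (1.0, 2.0), (0.5, 1.0))
    ]


def pick_config(
    configs: Iterable[Config],
    is_safe: Callable[[Config], bool],
    predict_perf: Callable[[Config], float],
) -> Optional[Config]:
    """Drop thermally unsafe points, then return the best predicted performer."""
    safe = [c for c in configs if is_safe(c)]
    if not safe:
        return None  # no thermally safe point exists in this space
    return max(safe, key=predict_perf)


def toy_is_safe(c: Config) -> bool:
    """Stand-in for the thermal classifier: a made-up heat score and threshold."""
    heat = c.freq_ghz * c.issue_width * c.fetch_gate_duty + c.window_size / 64
    return heat <= 5.0


def toy_perf(c: Config) -> float:
    """Stand-in for the performance model: throughput limited by width or window."""
    return c.freq_ghz * c.fetch_gate_duty * min(c.issue_width, c.window_size / 32)


best = pick_config(enumerate_configs(), toy_is_safe, toy_perf)
print(best)
```

In a runtime setting, `pick_config` would be called at each adaptation interval with models refreshed for the current workload phase; the toy models here are fixed only to keep the sketch self-contained.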

Proceedings Article
12 Nov 2007
TL;DR: The results show that the placement techniques used to make placement clock-aware have a significant influence on power and delay, and that the clock network architecture is also important.
Abstract: The programmable clock networks in FPGAs have a significant impact on overall power, area, and delay. Not only does the clock network itself dissipate a significant amount of power, since it connects to every latch on the FPGA and toggles every cycle, but the design of the clock network also affects how efficiently the rest of the application can be implemented, since it imposes constraints on the CAD tools which map the application onto the FPGA. To examine this tradeoff, this paper describes and compares new clock-aware placement techniques and then examines how the clock network architecture affects overall power, area, and delay. Our results show that the placement techniques used to make placement clock-aware have a significant influence on power and delay. On average, circuits placed using the most effective techniques dissipated 9.9% less energy and were 2.4% faster than circuits placed using the least effective techniques. Moreover, the results show that the clock network architecture is also important. On average, FPGAs with an efficient clock network were up to 12.5% more energy efficient and 7.2% faster than other FPGAs.

39 citations

Patent
21 Jun 2005
TL;DR: A variable speed data processor includes a clock generator that generates a plurality of clocks at different clock rates; clock select circuitry synchronously selects one of the clocks as the output clock signal to data processing circuitry, based on a data activity indication.
Abstract: A variable speed data processor includes a clock generator generating a plurality of clocks at different clock rates. Clock select circuitry synchronously selects one of the clocks as an output clock signal to data processing circuitry, based on a data activity indication. Activity logic generates the data activity indication based at least in part on the existence of data processing activity targeted to the data processing circuitry. When the data processing circuitry experiences bursty data processing activity, the clock rate can shift rapidly between the multiple clock rates, conserving power without substantially diminishing the availability of the data processing circuitry.

39 citations
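
The activity-driven clock selection described in the patent abstract above can be modeled behaviorally. This is a sketch under stated assumptions, not the patented circuit: it assumes only two clock rates, a boolean activity indication sampled once per interval, and a selection that takes effect at the next interval boundary as a stand-in for synchronous switching. The function name and the default rates are hypothetical.

```python
from typing import Iterable, List


def select_clock_rates(
    activity: Iterable[bool],
    fast_mhz: int = 400,  # hypothetical high clock rate
    slow_mhz: int = 50,   # hypothetical low clock rate
) -> List[int]:
    """Return the clock rate in effect during each interval.

    The activity indication sampled in one interval chooses the rate
    for the following interval, so a switch never lands mid-interval.
    """
    rates = []
    current = slow_mhz  # start idle at the low rate
    for active in activity:
        rates.append(current)
        current = fast_mhz if active else slow_mhz
    return rates


# A burst of activity in intervals 1 and 2 raises the rate one interval later.
print(select_clock_rates([False, True, True, False, False]))
# prints [50, 50, 400, 400, 50]
```

With bursty input, the rate follows activity with a one-interval lag and drops back as soon as the burst ends, which is the power-saving behavior the abstract describes.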

Patent
06 Oct 2000
TL;DR: A dual mode clock alignment device includes a clock buffer cell, a PLL, and a first and second set of buffers. The buffer sets receive the delayed first clock from the clock buffer cell in a first clock mode and the second clock from the PLL in a second clock mode.
Abstract: The present invention provides a dual mode clock alignment device including a clock buffer cell, a PLL, and a first set and second set of buffers. The clock buffer cell is arranged to receive a first clock and delays the first clock. The PLL is arranged to receive the delayed first clock from the clock buffer and outputs a second clock. The first and second sets of buffers are arranged to receive the delayed first clock from the clock buffer cell for operating in a first clock mode. The first and second sets of buffers are further arranged to receive the second clock from the PLL for operating in a second clock mode. In this arrangement, the first set of buffers delays the received clock by a first delay to output a third clock and the second set of buffers delays the delayed clock by a second delay to output a fourth clock. When operating in the second clock mode, the first, third, and fourth clocks are all aligned.

39 citations

Patent
03 Apr 2012
TL;DR: The clock delay between the paths that carry a clock to a circuit under DVFS control and to a second circuit on a separate power supply can be adjusted, so that the operating performance of the DVFS-controlled region is ensured at low cost and with high precision while its power-supply voltage is changed.
Abstract: There is a need to ensure the operating performance of a circuit region under DVFS control at low cost and with high precision while a power-supply voltage change is made to the region. A first circuit (FVA) uses a first power-supply voltage (VDDA) for operation. A second circuit (NFVA) uses a second power-supply voltage (VDDB) for operation. A clock delay may be adjusted between paths for transmitting a clock to these circuits. When VDDA equals VDDB, a clock is distributed to FVA through a path that does not contain a delay device for phase adjustment. When the power-supply voltage for the FVA region is reduced, a clock is distributed to the FVA region with its phase displaced by the equivalent of one or two clock cycles. Synchronization control is provided to synchronize the clocks (CKAF and CKBF) and ensures operation so that the phase difference between the two compared clocks stays within a range of design values while the power-supply voltage for the first circuit is changed.

39 citations


Network Information
Related Topics (5)
CMOS: 81.3K papers, 1.1M citations, 89% related
Integrated circuit: 82.7K papers, 1M citations, 85% related
Electronic circuit: 114.2K papers, 971.5K citations, 85% related
Semiconductor memory: 45.4K papers, 663.1K citations, 83% related
Transistor: 138K papers, 1.4M citations, 81% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    24
2022    31
2021    37
2020    50
2019    68
2018    84