Showing papers on "Clock gating published in 2003"


Journal ArticleDOI
James W. Tschanz1, Siva G. Narendra1, Y. Ye1, B. Bloechel1, S. Borkar1, Vivek De1 
27 Oct 2003
TL;DR: In this paper, the authors used dynamic sleep transistors and body bias to control active leakage for a 32-bit integer execution core in 130-nm CMOS technology in order to manage the active power consumption of high-performance digital designs.
Abstract: In order to manage the active power consumption of high-performance digital designs, active leakage control techniques are required to provide significant leakage power savings coupled with fast time constants for entering and exiting idle mode. In this paper, dynamic sleep transistors and body bias are used in conjunction with clock gating to control active leakage for a 32-bit integer execution core in 130-nm CMOS technology. Measurements on pMOS sleep transistor reveal that lowest-leakage state is reached in less than 1 μs, resulting in 37× reduction in leakage power, while reactivation of block is achieved in less than two clock cycles. PMOS body bias reduces leakage power by 2× with no performance penalty, and similar reactivation time. Power measurements at 4 GHz, 1.3 V, 75°C demonstrate 8% total power reduction using dynamic body bias and 15% power reduction using a pMOS sleep transistor, for a typical activity profile.

332 citations


Proceedings ArticleDOI
25 Aug 2003
TL;DR: In this article, the ground bounce due to power mode transition in power gating structures was introduced and analyzed, and power gate switching noise reduction techniques were proposed to reduce ground bounce.
Abstract: We introduce and analyze the ground bounce due to power mode transition in power gating structures. To reduce the ground bounce, we propose novel power gating structures in which sleep transistors are turned on in a non-uniform stepwise manner. Our power gating structures reduce the magnitude of peak current and voltage glitches in the power distribution network as well as the minimum time required to stabilize power and ground. Experimental simulation results with PowerSpice fixtured in a package model demonstrate the effectiveness of the proposed power gate switching noise reduction techniques.

202 citations


Journal ArticleDOI
TL;DR: In this paper, a global clock network that incorporates standing waves and coupled oscillators to distribute a high-frequency clock signal with low skew and low jitter is described, along with the key design issues involved in generating standing waves on a chip, such as minimizing wire loss within an available technology.
Abstract: In this paper, a global clock network that incorporates standing waves and coupled oscillators to distribute a high-frequency clock signal with low skew and low jitter is described. The key design issues involved in generating standing waves on a chip are discussed, including minimizing wire loss within an available technology. A standing-wave oscillator, which is a distributed oscillator that sustains ideal standing waves on lossy wires, is introduced. A clock grid architecture comprised of coupled standing-wave oscillators and differential low-swing clock buffers is presented, along with a compact circuit model for networks of oscillators. The measured results for a prototyped standing-wave clock grid operating at 10 GHz and fabricated in a 0.18-μm 6M CMOS logic process are presented. A technique is proposed for on-chip skew measurements with subpicosecond precision.

170 citations


Patent
17 Jan 2003
TL;DR: In this paper, a clock applying circuit for a synchronous memory is described, comprising a clock input for receiving a clock input signal and a tapped delay line for delivering a driving clock signal to the memory in synchronism with but delayed from the clock input signal, the delay being a small fraction of the clock period of the clock input signal.
Abstract: A clock applying circuit for a synchronous memory is comprised of a clock input for receiving a clock input signal, apparatus connected to the synchronous memory for receiving a driving clock signal, and a tapped delay line for receiving the clock input signal and for delivering the clock driving signal to the synchronous memory in synchronism with but delayed from the clock input signal, the delay being a small fraction of the clock period of the clock input signal.

156 citations


Proceedings ArticleDOI
08 Feb 2003
TL;DR: The resonant frequencies most relevant to current microprocessor packages are discussed, a "dI/dt stressmark" is produced that exercises the system at its resonant frequency, and the behavior of more mainstream applications is characterized.
Abstract: Increasing focus on power dissipation issues in current microprocessors has led to a host of proposals for clock gating and other power-saving techniques. While generally effective at reducing average power, many of these techniques have the undesired side-effect of increasing both the variability of power dissipation and the variability of current drawn by the processor. This increase in current variability, often referred to as the dI/dt problem, can cause supply voltage fluctuations. Such voltage fluctuations lead to unreliable circuits if not addressed, and increasingly expensive chip packaging techniques are needed to mitigate them. This paper proposes and evaluates a methodology for augmenting packaging techniques for dI/dt with microarchitectural control mechanisms. We discuss the resonant frequencies most relevant to current microprocessor packages, produce and evaluate a "dI/dt stressmark" that exercises the system at its resonant frequency, and characterize the behavior of more mainstream applications. Based on these results plus evaluations of the impact of controller error and delay, our microarchitectural control proposals offer bounds on supply voltage fluctuations, with nearly negligible impact on performance and energy. With the ITRS roadmap predicting aggressive drops in supply voltage and power supply impedances in coming chip generations, novel voltage control techniques will be required to stay on track. Our microarchitectural dI/dt controllers represent a step in this direction.

144 citations


Proceedings ArticleDOI
09 Feb 2003
TL;DR: In this paper, a clock-gating substitute achieves a 200 ps wake-up time and 3 orders of magnitude leakage reduction for leakage dominant LSI's using Zigzag super cut-off CMOS.
Abstract: A block activation/deactivation technique uses zigzag super cut-off CMOS to improve wake up time by 8× in 0.6 μm CMOS versus super cutoff CMOS. Our clock-gating substitute achieves a 200 ps wake-up time and 3 orders of magnitude leakage reduction for leakage dominant LSI's.

133 citations


Bishop Brock1, Karthick Rajamani1
01 Jan 2003
TL;DR: This paper discusses several of the SOC design issues pertaining to dynamic voltage and frequency scalable systems, and how these issues were resolved in the IBM PowerPC 405LP processor, and introduces DPM, a novel architecture for policy-guided dynamic power management.
Abstract: This paper discusses several of the SOC design issues pertaining to dynamic voltage and frequency scalable systems, and how these issues were resolved in the IBM PowerPC 405LP processor. We also introduce DPM, a novel architecture for policy-guided dynamic power management. We illustrate the utility of DPM by its ability to implement several classes of power management strategies and demonstrate practical results for a 405LP embedded system. I. INTRODUCTION Advances in low-power components and system design have brought general purpose computation into watches, wireless telephones, PDAs and tablet computers. Power management of these systems has traditionally focused on sleep modes and device power management (1). Embedded processors for these applications are highly integrated system-on-a-chip (SOC) devices that also support aggressive power management through techniques such as programmable clock gating and dynamic voltage and frequency scaling (DVFS). This paper describes one of these processors, and the development of a software architecture for policy-guided dynamic power management. II. 405LP DESIGN AND POWER MANAGEMENT FEATURES The IBM PowerPC 405LP is a dynamic voltage and frequency scalable embedded processor targeted at high-performance battery-operated devices. The 405LP is an SOC ASIC design in a 0.18 μm bulk CMOS process, integrating a PowerPC 405 CPU core modified for operation over a 1.0 V to 1.8 V range with off-the-shelf IP cores. The chip includes a flexible clock generation subsystem, new hardware accelerators for speech recognition and security, as well as a novel standby power management controller (2). In a system we normally operate the CPU/SDRAM at 266/133 MHz above 1.65 V and at 66/33 MHz above 0.9 V, typically providing a 13:1 SOC core power range over the 4:1 performance range.
From a system design and active power management perspective the most interesting facets of the 405LP SOC design concern the way the clocks are generated and controlled. These features of the processor are described in the remainder of this Section.

129 citations


Proceedings ArticleDOI
02 Jun 2003
TL;DR: This paper presents a methodology that, starting from an RTL description, automatically generates a set of constraints for driving the construction of the clock tree by the clock synthesis tool, fully integrated into an industry-strength design flow.
Abstract: As power consumption of the clock tree in modern VLSI designs tends to dominate, measures must be taken to keep it under control. This paper introduces an approach for reducing clock power based on clock gating. We present a methodology that, starting from an RTL description, automatically generates a set of constraints for driving the construction of the clock tree by the clock synthesis tool. The methodology has been fully integrated into an industry-strength design flow, based on Synopsys DesignCompiler (front-end) and Cadence Silicon Ensemble (back end). The power savings achieved on some industrial examples show that, when the size of the circuits is significant, savings on the power consumption of the clock tree are up to 75% larger than those achieved by applying traditional clock gating at the clock inputs of the RTL modules of the designs.

117 citations


Proceedings ArticleDOI
Hai Li1, Swarup Bhunia1, Yi Chen1, T. N. Vijaykumar1, Kaushik Roy1 
08 Feb 2003
TL;DR: Deterministic clock gating (DCG) is introduced based on the key observation that for many of the stages in a modern pipeline, a circuit block's usage in a specific cycle in the near future is deterministically known a few cycles ahead of time.
Abstract: With the scaling of technology and the need for higher performance and more functionality, power dissipation is becoming a major bottleneck for microprocessor designs. Pipeline balancing (PLB), a previous technique, is essentially a methodology to clock-gate unused components whenever a program's instruction-level parallelism is predicted to be low. However, no nonpredictive methodologies are available in the literature for efficient clock gating. This paper introduces deterministic clock gating (DCG) based on the key observation that for many of the stages in a modern pipeline, a circuit block's usage in a specific cycle in the near future is deterministically known a few cycles ahead of time. Our experiments show an average of 19.9% reduction in processor power with virtually no performance loss for an 8-issue, out-of-order superscalar processor by applying DCG to execution units, pipeline latches, D-Cache wordline decoders, and result bus drivers. In contrast, PLB achieves 9.9% average power savings at 2.9% performance loss.

115 citations


Proceedings ArticleDOI
02 Jun 2003
TL;DR: This work presents a technique, based on well-known power-reducing transformations coupled with randomized clock gating, that introduces a significant amount of scrambling in the power profile without increasing (and, in some cases, by even reducing) circuit power consumption.
Abstract: Differential power analysis is a very effective cryptanalysis technique that extracts information on secret keys by monitoring instantaneous power consumption of cryptoprocessors. To protect against differential power analysis, power supply noise is added in cryptographic computations, at the price of an increase in power consumption. We present a novel technique, based on well-known power-reducing transformations coupled with randomized clock gating, that introduces a significant amount of scrambling in the power profile without increasing (and, in some cases, by even reducing) circuit power consumption.

99 citations


Journal ArticleDOI
Takamoto Watanabe1, S. Yamauchi
TL;DR: An all-digital phase-locked loop (PLL) circuit in which resolution in the phase detector and digitally controlled oscillator (DCO) exactly matches the gate-delay time is presented, which can withstand a broad range of operating environments and is suitable for making a programmable clock generator on a chip.
Abstract: An all-digital phase-locked loop (PLL) circuit in which resolution in the phase detector and digitally controlled oscillator (DCO) exactly matches the gate-delay time is presented. The pulse delay circuit is connected in a ring shape with 32 inverters (2^5 inverters). With the inverter gate-delay time as the time base, the pulse phase difference is detected simultaneously with the generation of the output clock. In this system, the phase detector and oscillator share a single ring-delay-line (RDL). This means the resolution is the same at all times, making a high-speed response possible. In a prototype integrated circuit (IC) using 0.65-μm CMOS, the generation of a frequency multiplication clock was achieved with four reference clocks, and that of a phase-locked clock with seven reference clocks, for a high-speed response. The cell size was 1.08 × 1.08 mm², and the output clock frequency had a wide range of 50 kHz to 60 MHz. The multiplication range of the clock frequency was also a very wide 4 to 1022, and a high level of precision was achieved with a clock jitter standard deviation of 234 ps. This digital PLL can withstand a broad range of operating environments, from -30°C to 140°C, and is suitable for making a programmable clock generator on a chip.

Journal ArticleDOI
TL;DR: A new technique of injecting clocks optically onto CMOS chips without the use of a receiver amplifier is presented and the benefits of such a direct approach are discussed and proof-of-principle experiments of the technique are presented.
Abstract: We present a new technique of injecting clocks optically onto CMOS chips without the use of a receiver amplifier. We discuss the benefits of such a direct approach and present proof-of-principle experiments of the technique. We analytically compare a receiver-less optical clock distribution and an electrical clock distribution in a fan-out-of-four clock tree to evaluate the timing and power benefits of the optical approach for present microprocessors. We also compare receiver-less direct injection of optical clocks to trans-impedance receiver based injection within the same distribution framework.

Proceedings ArticleDOI
Allan M. Hartstein1, Thomas R. Puzak1
03 Dec 2003
TL;DR: The theory shows that the more important power is to the metric, the shorter the optimum pipeline length that results, and that as dynamic power grows, the optimal design point shifts to shorter pipelines, while clock gating pushes the optimum to deeper pipelines.
Abstract: The impact of pipeline length on both the power and performance of a microprocessor is explored both theoretically and by simulation. A theory is presented for a wide range of power/performance metrics, BIPS^m/W. The theory shows that the more important power is to the metric, the shorter the optimum pipeline length that results. For typical parameters neither BIPS/W nor BIPS^2/W yield an optimum, i.e., a non-pipelined design is optimal. For BIPS^3/W the optimum, averaged over all 55 workloads studied, occurs at a 22.5 FO4 design point, a 7 stage pipeline, but this value is highly dependent on the assumed growth in latch count with pipeline depth. As dynamic power grows, the optimal design point shifts to shorter pipelines. Clock gating pushes the optimum to deeper pipelines. Surprisingly, as leakage power grows, the optimum is also found to shift to deeper pipelines. The optimum pipeline depth varies for different classes of workloads: SPEC95 and SPEC2000 integer applications, traditional (legacy) database and on-line transaction processing applications, modern (e.g. Web) applications, and floating point applications.
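As a rough illustration of how the exponent m in the BIPS^m/W family shifts the balance between performance and power, the following Python sketch compares two hypothetical design points. The names `bips_m_per_watt` and `preferred`, and all numeric values, are assumptions made for illustration only; none of them come from the paper.

```python
def bips_m_per_watt(bips: float, watts: float, m: int) -> float:
    """Return the BIPS^m/W figure of merit for one design point."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return bips ** m / watts


# Hypothetical design points (NOT from the paper): (BIPS, watts).
SHALLOW = (2.0, 20.0)  # slower, lower-power design
DEEP = (3.0, 50.0)     # faster, higher-power design


def preferred(m: int) -> str:
    """Name the design point that scores higher under BIPS^m/W."""
    shallow_score = bips_m_per_watt(*SHALLOW, m)
    deep_score = bips_m_per_watt(*DEEP, m)
    return "shallow" if shallow_score >= deep_score else "deep"


if __name__ == "__main__":
    for m in (1, 2, 3):
        print(m, preferred(m))
```

With these made-up numbers the lower-power design scores higher for m = 1 and m = 2, and the faster design wins only at m = 3, which is consistent with the abstract's qualitative point that weighting power more heavily favors shorter pipelines.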

Proceedings ArticleDOI
03 Dec 2003
TL;DR: This paper defines a power similarity metric as an intersection of both magnitude based and ratio-wise similarities in the power dissipation of processor components and develops a thresholding algorithm in order to partition the power behavior into similarity groups.
Abstract: Characterizing program behavior is important for both hardware and software research. Most modern applications exhibit distinctly different behavior throughout their runtimes, which constitute several phases of execution that share a greater amount of resemblance within themselves compared to other regions of execution. These execution phases can occur at very large scales, necessitating prohibitively long simulation times for characterization. Due to the implementation of extensive clock gating and additional power and thermal management techniques in modern processors, these program phases are also reflected in program power behavior, which can be used as an alternative means of program behavior characterization for power-oriented research. In this paper, we present our methodology for identifying phases in program power behavior and determining execution points that correspond to these phases, as well as defining a small set of power signatures representative of overall program power behavior. We define a power similarity metric as an intersection of both magnitude based and ratio-wise similarities in the power dissipation of processor components. We then develop a thresholding algorithm in order to partition the power behavior into similarity groups. We illustrate our methodology with the gzip benchmark for its whole runtime and characterize gzip power behavior with both the selected execution points and defined signature vectors.

Patent
Jay W. Gustin1
24 Jun 2003
TL;DR: In this paper, a packet passing from an Ethernet Media Access Controller (MAC) to a Physical Interface Transceiver (PHY) is detected as a time synchronization packet from the time master.
Abstract: A device that recognizes the time synchronization packet and substitutes a real-time value from the master internal counter into the proper place in a data packet is placed between an Ethernet Media Access Controller (MAC) and a Physical Interface Transceiver (PHY). A second device monitors the packet passing from the MAC to the PHY and determines when it is a time synchronization packet from the time master. Upon recognition of the proper packet, the second device simultaneously captures the master's time value and captures the value of a local real-time clock. The results of these captures are presented to the local host computer which controls the time base clock that increments the local real-time clock to either speed up or slow down this local clock, thereby synchronizing the local clock to the time master clock. The offset and skew of the local clock to the master clock are reduced to only the network latency plus variability due to network congestion.
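One way to picture the capture-and-steer loop described above is a proportional correction driven by the difference between the captured master time and the captured local time. The Python sketch below is only an illustration under stated assumptions: the names `clock_offset` and `rate_correction_ppm`, the gain, and the clamp are hypothetical and are not taken from the patent.

```python
def clock_offset(master_ns: int, local_ns: int) -> int:
    """Offset in ns between the two captured values; positive means the local clock is behind."""
    return master_ns - local_ns


def rate_correction_ppm(offset_ns: int, interval_ns: int,
                        gain: float = 0.5, limit_ppm: float = 100.0) -> float:
    """Proportional rate tweak, in parts per million, for the local time base.

    A positive result asks the local clock to speed up, a negative one to
    slow down. The gain and clamp are illustrative assumptions.
    """
    if interval_ns <= 0:
        raise ValueError("interval_ns must be positive")
    raw_ppm = gain * offset_ns / interval_ns * 1e6
    # Clamp so that one noisy capture cannot swing the clock rate too far.
    return max(-limit_ppm, min(limit_ppm, raw_ppm))


if __name__ == "__main__":
    offset = clock_offset(master_ns=1_000_005_000, local_ns=1_000_000_000)
    print(offset, rate_correction_ppm(offset, interval_ns=1_000_000_000))
```

Using a gain below 1 removes only part of the measured offset per synchronization interval, a common way to keep such a loop from overreacting to variable network delay.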

Proceedings ArticleDOI
13 Oct 2003
TL;DR: A new approach to global clock distribution is presented in which traditional tree-driven grids are augmented with on-chip inductors to resonate the clock capacitance at the fundamental frequency of the clock node.
Abstract: We present a new approach to global clock distribution in which traditional tree-driven grids are augmented with on-chip inductors to resonate the clock capacitance at the fundamental frequency of the clock node. Rather than being dissipated as heat, the energy of the fundamental resonates between electric and magnetic forms. The clock drivers must only provide the energy necessary to overcome losses. As a result, power reduction of over 80% is possible depending on the Q of the resonant system. Clock latency is also improved because the effective capacitance of the grid is lower, and fewer buffer stages are necessary to drive the grid. Skew and jitter reductions come about because of this reduced buffer latency.
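The resonance idea can be made concrete with the standard lossless LC relation f = 1/(2π√(LC)). The Python sketch below solves it for the inductance that would resonate a given clock capacitance at the clock frequency. The function names and the 100 pF / 1 GHz example values are assumptions for illustration, not figures from the paper, and the model ignores the losses (the Q) that the abstract says determine the achievable power savings.

```python
import math


def resonant_inductance(capacitance_f: float, frequency_hz: float) -> float:
    """Inductance in henries that resonates with capacitance_f (farads) at frequency_hz."""
    if capacitance_f <= 0 or frequency_hz <= 0:
        raise ValueError("capacitance and frequency must be positive")
    omega = 2.0 * math.pi * frequency_hz
    return 1.0 / (omega ** 2 * capacitance_f)


def resonant_frequency(inductance_h: float, capacitance_f: float) -> float:
    """Resonant frequency in hertz of an ideal (lossless) LC tank."""
    if inductance_h <= 0 or capacitance_f <= 0:
        raise ValueError("inductance and capacitance must be positive")
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


if __name__ == "__main__":
    # Hypothetical values: 100 pF of clock load resonated at 1 GHz.
    inductance = resonant_inductance(100e-12, 1e9)
    print(f"{inductance * 1e9:.4f} nH")
```

For the hypothetical 100 pF load at 1 GHz this works out to roughly a quarter of a nanohenry; real grids would need the distributed inductors and loss modeling that the paper addresses.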

Patent
06 Jun 2003
TL;DR: In this paper, a method for dynamically varying a clock frequency in a processor was proposed, which consisted of driving a clock distribution network with a clock output from a phased locked loop (PLL).
Abstract: A method for dynamically varying a clock frequency in a processor. The method of one embodiment comprises driving a clock distribution network with a clock output from a phased locked loop (PLL). An adjustable clock generator is locked with the phased locked loop. The adjustable clock generator is substituted for the PLL on the clock distribution network.

Patent
29 Sep 2003
TL;DR: An adaptive temperature dependent clock feedback control system and method for adaptively varying a frequency of a clock signal to a circuit such that the circuit may operate at a maximum safe operating clock frequency based on a circuit junction temperature is presented in this article.
Abstract: An adaptive temperature dependent clock feedback control system and method for adaptively varying a frequency of a clock signal to a circuit such that the circuit may operate at a maximum safe operating clock frequency based on a circuit junction temperature. The clock control system includes a thermal sensor and a temperature dependent dynamic overclock generator circuit. The thermal sensor detects a junction temperature corresponding to at least a portion of the circuit on a semiconductor die. The temperature dependent dynamic overclock generator circuit varies the clock signal based on the semiconductor die junction temperature, such that the clock signal operates at the highest possible operating frequency associated with the detected junction temperature. The frequency of the clock signal is increased from a first frequency to at least a second frequency and a third frequency if the junction temperature is below a lower junction temperature threshold.

Proceedings ArticleDOI
03 Dec 2003
TL;DR: Two state machines that track parallelism on-the-fly are introduced, and the supply voltage is scaled depending on the level of parallelism. The work also considers circuit-level complexity concerns, stability and signal-propagation speed issues which limit how fast VSV may transition between the voltages, and energy overhead factors which disallow supply-voltage scaling of large RAM structures such as caches and register file.
Abstract: Energy efficient processor design is becoming more and more important with technology scaling and with high performance requirements. Supply-voltage scaling is an efficient way to reduce energy by lowering the operating voltage and the clock frequency of processor simultaneously. We propose a variable supply-voltage scaling (VSV) technique based on the following key observation: upon an L2 miss, the pipeline performs some independent computations but almost always ends up stalling and waiting for data, despite out-of-order issue and other latency-hiding techniques. Therefore, during an L2 miss we scale down the supply voltage of certain sections of the processor in order to reduce power dissipation while it carries on the independent computations at a lower speed. However, operating at a lower speed may degrade performance, if there are sufficient independent computations to overlap with the L2 miss. Similarly, returning to high speed may degrade power savings, if there are multiple outstanding misses and insufficient independent computations to overlap with them. To avoid these problems, we introduce two state machines that track parallelism on-the-fly, and we scale the supply voltage depending on the level of parallelism. We also consider circuit-level complexity concerns which limit VSV to two supply voltages, stability and signal-propagation speed issues which limit how fast VSV may transition between the voltages, and energy overhead factors which disallow supply-voltage scaling of large RAM structures such as caches and register file. Our simulation shows that VSV achieves an average of 20.7% total processor power reduction with 2.0% performance degradation in an 8-way, out-of-order-issue processor that implements deterministic clock gating and software prefetching, for those SPEC2K benchmarks that have high L2 miss rates. Averaging across all the benchmarks, VSV reduces total processor power by 7.0% with 0.9% performance degradation.

Journal ArticleDOI
20 Oct 2003
TL;DR: A 121 mm² graphics LSI for portable 2D/3D graphics and MPEG4 applications is presented; it contains a RISC processor with MAC, a 3D rendering engine, and 29 Mb DRAM, and is built in a 0.16 μm pure DRAM technology.
Abstract: A 121-mm² graphics LSI is designed and implemented for portable two-dimensional (2-D) and three-dimensional (3-D) graphics and MPEG-4 applications. The LSI contains a RISC processor with a multiply-accumulate unit (MAC), a 3-D rendering engine, a programmable power optimizer, and 29-Mb embedded DRAM. The chip is built in a 0.16-μm pure DRAM technology to reduce the fabrication cost. Texture-mapped 3-D graphics with perspective-correct address calculation and bilinear MIPMAP filtering can be realized while consuming low power with the help of depth-first clock gating, address alignment logic, and embedded DRAM. Programmable clocking allows the LSI to operate in lower power modes for various applications. The chip consumes less than 210 mW, delivering 66 Mpixels/s and 264 Mtexel/s texture-mapped pixels with real-time special effects such as full-scene antialiasing and motion blur.

Patent
26 Feb 2003
TL;DR: In this paper, a micromachined inductor and a pair of varactors are developed in metal layers on a silicon substrate to realize the high quality factor LC-tank apparatus.
Abstract: MEMS-based, computer system, clock generation and oscillator circuits and LC-tank apparatus for use therein are provided and which are fabricated using a CMOS-compatible process. A micromachined inductor (L) and a pair of varactors (C) are developed in metal layers on a silicon substrate to realize the high quality factor LC-tank apparatus. This micromachined LC-tank apparatus is incorporated with CMOS transistor circuitry in order to realize a digital, tunable, low phase jitter, and low power clock, or time base, for synchronous integrated circuits. The synthesized clock signal can be divided down with digital circuitry from several GHz to tens of MHz—a systemic approach that substantially improves stability as compared to the state of the art. Advanced circuit design techniques have been utilized to minimize power consumption and mitigate transistor flicker noise upconversion, thus enhancing clock stability.

Patent
Phillip J. Restle1
24 Nov 2003
TL;DR: In this article, an inductance value is selected to resonate with the capacitive clock distribution circuit; at resonance, power dissipation is reduced while skew and jitter performance can be improved.
Abstract: An integrated circuit (IC), IC assembly and circuit for distributing a clock signal in an integrated circuit includes a capacitive clock distribution circuit having at least one conductor therein. At least one inductor is formed in a metal layer of the integrated circuit and is coupled to the clock distribution circuit. The inductor, generally in the form of a number of spiral inductors distributed throughout the integrated circuit, provides an inductance value selected to resonate with the capacitive clock distribution circuit. At resonance, power dissipation is reduced while skew and jitter performance can be improved.

Proceedings ArticleDOI
12 Jun 2003
TL;DR: In this article, a post-silicon clock timing adjustment architecture utilizing GA was proposed, which has three advantages: enhanced clock frequency leading to improved operating yields, lower power supply voltages while maintaining operating yield, and reductions in design times.
Abstract: A post-silicon clock timing adjustment architecture utilizing genetic algorithms (GA) is proposed, which has three advantages: (1) enhanced clock frequency leading to improved operating yields, (2) lower power supply voltages while maintaining operating yield, and (3) reductions in design times. Experiments with two different developed LSI chips and a design experiment demonstrated these advantages with a clock frequency enhancement of 25% (max), a power supply voltage reduction of 33%, and 21% shorter design times.

Patent
Elad Alon1, Scott C. Best1
08 Sep 2003
TL;DR: In this article, a circuit and method for digital control of delay lines in a delay-locked loop (DLL) system is presented, where an output tap from one delay line is used to produce a rising edge in an output clock signal while a corresponding tap in the complementary delay line produces a falling edge in the output signal to correct for distortion.
Abstract: A circuit and method is shown for digital control of delay lines in a delay locked loop (DLL) system. A pair of multiplexors (MUXes) is used to select output taps from a pair of complementary delay lines that delay a reference clock signal in order to lock onto a received clock signal. An output tap from one delay line is used to produce a rising edge in an output clock signal while a corresponding tap in the complementary delay line is used to produce a falling edge in the output signal in order to correct for distortion. The MUXes are controlled based on a phase difference detected between the received clock signal and a feedback clock corresponding to the output clock signal. Another aspect of the present invention provides for generation of a quadrature clock by interpolating between the rising and falling edges selected for the output clock signal. Still another aspect of the present invention provides for selectively disabling unused elements of the delay lines to reduce power consumption.

Patent
23 Sep 2003
TL;DR: In this paper, a tunable clock distribution system is used to minimize the power dissipation of a clock distribution network in an integrated circuit, where the inductance is tuned so that the resonant frequency approaches the frequency of the clock signal.
Abstract: A tunable clock distribution system is used to minimize the power dissipation of a clock distribution network in an integrated circuit. The tunable clock distribution system provides a tunable inductance on the clock distribution network to adjust a resonant frequency in the tunable clock distribution system. The inductance is tuned so that the resonant frequency of the tunable clock distribution system approaches the frequency of the clock signal on the clock distribution network. As the resonant frequency of the tunable clock distribution system approaches the frequency of the clock signal, the power dissipation of the clock distribution network decreases. Some embodiments also provide a tunable capacitance on the clock distribution network to adjust the resonant frequency of the tunable clock distribution system.

Patent
Takashi Hirata1, Toru Iwata1
11 Feb 2003
TL;DR: In this paper, a multi-phase clock transmission circuit includes a clock generator for generating a clock synchronizing with a reference clock, and a control signal responsive to the phase difference between the reference clock and the generated clock.
Abstract: A multi-phase clock transmission circuit includes: a clock generator for generating a clock synchronizing with a reference clock and a control signal responsive to the phase difference between the reference clock and the generated clock; and a delay circuit for generating a multi-phase clock based on the clock and the control signal. The clock generator generates a signal having a frequency equal to an integral multiple of the frequency of the reference clock and outputs the signal as the clock. The delay circuit has a circuit receiving the clock and including a plurality of delay elements in cascade connection each giving a delay according to the control signal to an input signal. Signals output from the plurality of delay elements are used as signals constituting the multi-phase clock.
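
The cascaded delay line can be pictured with a small numeric sketch. It assumes, hypothetically, that the control signal has settled every stage to the same delay of one clock period divided by the number of stages; the function name and the numbers are illustrative only.

```python
def multiphase_offsets(clock_period_ps, num_stages, stage_delay_ps):
    """Time offset of each tap of a cascaded delay line, wrapped to one period."""
    return [((k + 1) * stage_delay_ps) % clock_period_ps for k in range(num_stages)]


# Hypothetical numbers: a 4 ns reference multiplied by 4 gives a 1000 ps clock.
ref_period_ps, multiple, stages = 4000, 4, 4
clock_period_ps = ref_period_ps // multiple
# Assume the control signal has settled each stage to period / stages.
offsets = multiphase_offsets(clock_period_ps, stages, clock_period_ps // stages)
```

With these values the four taps sit a quarter period apart, and the last tap realigns with the input clock.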

Proceedings ArticleDOI
Stephan Held, Bernhard Korte, J. Massberg, M. Ringe, Jens Vygen
09 Nov 2003
TL;DR: A new method for clock scheduling and clocktree construction that significantly improves the performance of high-end ASICs: it computes a clock schedule with optimum cycle time, then constructs a clocktree that realizes arrival times within the computed intervals and exploits positive slacks to save power.
Abstract: In this paper we present a new method for clock scheduling and clocktree construction that improves the performance of high-end ASICs significantly. First, we compute a clock schedule that yields the optimum cycle time and the best possible clock distribution with respect to early and late mode; in particular the number of critical tests is minimized. Second, individual arrival time intervals are computed for all endpoints of the clocktree. Finally, we construct a clocktree that realizes arrival times within these intervals and exploits positive slacks to save power consumption. We demonstrate the superiority of our method to previous approaches by experimental results on industrial ASICs with up to 194000 registers and more than 160 clock domains. We improved the clock frequencies by 5-28% up to 1.033 GHz (in hardware).
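
The first step, computing a clock schedule with optimum cycle time, can be illustrated with a textbook formulation. The sketch below is not the authors' algorithm: it considers only late-mode constraints of the form a[u] + d <= a[v] + T, ignores setup and hold times, and finds the smallest feasible T by binary search with a Bellman-Ford feasibility check. All names and the example delays are hypothetical.

```python
def feasible(num_regs, paths, cycle_time):
    """Return (True, arrival_times) if a[u] + d <= a[v] + cycle_time can hold
    for every path (u, v, d), else (False, None).

    Each constraint a[u] - a[v] <= cycle_time - d is relaxed Bellman-Ford style;
    a relaxation that never settles means a negative cycle, i.e. infeasible.
    """
    a = [0.0] * num_regs  # as if a virtual source reached every register at 0
    for _ in range(num_regs):
        changed = False
        for u, v, d in paths:
            bound = a[v] + cycle_time - d
            if bound < a[u] - 1e-12:
                a[u] = bound
                changed = True
        if not changed:
            return True, a
    return False, None


def min_cycle_time(num_regs, paths, iterations=60):
    """Binary search for the smallest cycle time that admits a schedule."""
    lo, hi = 0.0, max(d for _, _, d in paths)  # hi works with zero skew
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if feasible(num_regs, paths, mid)[0]:
            hi = mid
        else:
            lo = mid
    return hi, feasible(num_regs, paths, hi)[1]


# Hypothetical register-to-register delays (from, to, max delay in ns).
paths = [(0, 1, 4.0), (1, 2, 2.0), (2, 0, 3.0), (1, 0, 1.0)]
best_t, schedule = min_cycle_time(3, paths)
```

In this example a zero-skew clock would need a 4 ns cycle, while scheduling the arrival times brings the cycle down to the largest mean delay around any loop, here 3 ns.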

Patent
Hoe-ju Chung, Kyu-hyoun Kim
04 Sep 2003
TL;DR: A semiconductor memory device with a duty cycle correction circuit in which two delay locked loops feed an interpolation circuit, and a control circuit controls the interpolation in response to the clock frequency information of the external clock.
Abstract: A semiconductor memory device having a duty cycle correction circuit and an interpolating circuit interpolating a clock signal in the semiconductor memory device are disclosed. The semiconductor memory device comprises a duty cycle correction circuit, which receives an external clock, corrects the duty cycle of the external clock, and outputs the corrected duty cycle. The duty cycle correction circuit comprises a first delay locked loop that receives the external clock, inverts the external clock, synchronizes the external clock with the inverted external clock, and outputs the synchronized clock; a second delay locked loop that receives the inverted external clock, synchronizes the inverted external clock with the external clock and outputs the synchronized clock; an inverting circuit that inverts the output signal of the first delay locked loop; an interpolation circuit that interpolates the output signal of the inverting circuit with the output signal of the second delay locked loop, and outputs the interpolated signal; and a control circuit that controls the interpolation circuit in response to the clock frequency information of the external clock.

Patent
03 Dec 2003
TL;DR: A banked register file with power dissipation control, in which enable logic selectively disables unused banks of registers by gating off the clock, address, and data inputs supplied to them.
Abstract: A circuit arrangement and method of controlling power dissipation utilize a register file (60) with power dissipation control capabilities through a banked register design coupled with enable logic (62, 82) that is configured to selectively disable unused banks (70) of registers by selectively gating off clock (74), address (76) and data (78) inputs supplied thereto.
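
A small behavioural model can make the bank-gating idea concrete. The sketch assumes the enable is decoded per access from the upper bits of the register index, which the abstract does not state; the class name and the per-bank clock counter are hypothetical and exist only to show which banks would see clock activity.

```python
class BankedRegisterFile:
    """Toy model: only the addressed bank receives clock, address, and data."""

    def __init__(self, num_banks, regs_per_bank):
        self.num_banks = num_banks
        self.regs_per_bank = regs_per_bank
        self.banks = [[0] * regs_per_bank for _ in range(num_banks)]
        self.clock_events = [0] * num_banks  # clock edges delivered per bank

    def write(self, reg_index, value):
        bank, offset = divmod(reg_index, self.regs_per_bank)
        for b in range(self.num_banks):
            enabled = (b == bank)  # assumed enable logic: decode the bank number
            if enabled:
                self.clock_events[b] += 1
                self.banks[b][offset] = value
            # A disabled bank sees no clock edge and keeps its contents.

    def read(self, reg_index):
        bank, offset = divmod(reg_index, self.regs_per_bank)
        return self.banks[bank][offset]


rf = BankedRegisterFile(num_banks=4, regs_per_bank=8)
for index, value in [(0, 11), (1, 22), (7, 33), (17, 44)]:
    rf.write(index, value)
```

After these four writes, only banks 0 and 2 have been clocked; an ungated design would have clocked all four banks on every write.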

Patent
Kazi Asaduzzaman, Wilson Wong
24 Sep 2003
TL;DR: Clock and data recovery circuitry for integrated circuits such as programmable logic devices, with automatic switching between a reference mode and a data mode; override signals can force the circuit out of automatic mode and into either the reference or the data mode.
Abstract: Clock and data recovery circuitry is provided that is used in integrated circuits such as programmable logic device integrated circuits. The clock and data recovery circuitry may recover digital data and an embedded clock from a high-speed differential input data stream. The clock and data recovery circuitry may have automatic mode switching capabilities. When operated in reference mode, the clock and data recovery circuit may use a first phase-locked loop to lock onto a reference clock. When operated in data mode, the clock and data recovery circuit may use a second phase-locked loop to lock onto the phase of the differential data stream. A control circuit may automatically switch the clock and data recovery circuit between the reference mode and the data mode. Override signals may be used to force the clock and data recovery circuit out of the automatic mode and into either the reference or data mode.
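
The mode control can be sketched as a small state machine. The abstract does not say what triggers the automatic switch, so the lock flags used below (`ref_locked`, `data_locked`) and the fallback to reference mode on loss of data lock are assumptions made purely for illustration.

```python
def next_mode(mode, ref_locked, data_locked, force_ref=False, force_data=False):
    """Return the next CDR mode, either "reference" or "data".

    Override signals take priority over the automatic decision.
    """
    if force_ref:
        return "reference"
    if force_data:
        return "data"
    # Automatic mode (assumed criteria): move to data mode once the reference
    # loop has locked, and fall back if the data loop loses lock.
    if mode == "reference" and ref_locked:
        return "data"
    if mode == "data" and not data_locked:
        return "reference"
    return mode


mode = "reference"
mode = next_mode(mode, ref_locked=True, data_locked=False)  # switches to data
```

Giving the overrides priority mirrors the abstract's statement that they force the circuit out of automatic mode.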