
Showing papers on "Clock gating" published in 2001


Proceedings ArticleDOI
20 Jan 2001
TL;DR: This work investigates dynamic thermal management as a technique to control CPU power dissipation and explores the tradeoffs between several mechanisms for responding to periods of thermal trauma and the effects of hardware and software implementations.
Abstract: With the increasing clock rate and transistor count of today's microprocessors, power dissipation is becoming a critical component of system design complexity. Thermal and power-delivery issues are becoming especially critical for high-performance computing systems. In this work, we investigate dynamic thermal management as a technique to control CPU power dissipation. With the increasing usage of clock gating techniques, the average power dissipation typically seen by common applications is becoming much less than the chip's rated maximum power dissipation. However, system designers still must design thermal heat sinks to withstand the worst-case scenario. We define and investigate the major components of any dynamic thermal management scheme. Specifically, we explore the tradeoffs between several mechanisms for responding to periods of thermal trauma, and we consider the effects of hardware and software implementations. With appropriate dynamic thermal management, the CPU can be designed for a much lower maximum power rating, with minimal performance impact for typical applications.

882 citations
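The trigger-and-respond loop that the abstract describes can be illustrated with a minimal sketch. It assumes a first-order thermal RC model, a single temperature trigger, and clock throttling as the response mechanism; the function name, thermal constants, and throttle factor are hypothetical and not taken from the paper.

```python
def simulate_dtm(power_w, trigger_c, throttle=0.5, ambient_c=45.0,
                 r_th=0.8, alpha=0.1):
    """Simulate die temperature under a trigger-based DTM policy (toy model).

    power_w: requested power per interval (W).
    Returns (temps, throttled): temperature after each interval and whether
    the response mechanism was engaged during that interval.
    """
    temp = ambient_c
    temps, throttled = [], []
    for p in power_w:
        engaged = temp > trigger_c               # trigger: sensor reading above threshold
        p_eff = p * throttle if engaged else p   # response: e.g. clock throttling
        steady = ambient_c + r_th * p_eff        # steady-state temperature at this power
        temp += alpha * (steady - temp)          # first-order thermal lag toward it
        temps.append(temp)
        throttled.append(engaged)
    return temps, throttled
```

With a sustained 100 W load, the unmanaged die in this toy model settles near 125 °C, while a 100 °C trigger holds it at or below 102.5 °C. That is the sense in which DTM lets the thermal solution be sized below the worst case.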


Journal ArticleDOI
Nasser A. Kurd, J.S. Barkarullah, R.O. Dizon, Thomas D. Fletcher, P.D. Madland
TL;DR: Core and I/O clock design for the Pentium(R) 4 microprocessor is described and Silicon speed path tools and clock debug features are designed to enable a short debug cycle.
Abstract: Core and I/O clock design for the Pentium(R) 4 microprocessor is described. Two phase-locked loops generate core and I/O clocks supporting concurrent multiple frequencies. A clock distribution network with skew optimization and jitter reduction is designed to achieve low clock inaccuracies for processors at frequencies ≥2 GHz for the core and ≥4 GHz for the rapid execution engine. A global medium clock frequency is distributed. Local clock drivers generate pulsed or regular (nonpulsed) clocks at fast, medium, and slow frequencies. A 3.2-GB/s system bus is achieved using a dedicated I/O phase-locked loop with glitch protection and detection. Silicon speed path tools and clock debug features are designed to enable a short debug cycle.

189 citations


Proceedings ArticleDOI
29 Mar 2001
TL;DR: Experimental results are shown indicating that the proposed approach to minimizing power during scan testing can significantly reduce both logic and clock power during testing.
Abstract: A novel approach for minimizing power during scan testing is presented. The idea is that given a full scan module or core that has multiple scan chains, the test set is generated and ordered in such a way that some of the scan chains can have their clock disabled for portions of the test set. Disabling the clock prevents flip-flops from transitioning, and hence reduces switching activity in the circuit. Moreover, disabling the clock also reduces power dissipation in the clock tree, which often is a major source of power. The only hardware modification that is required to implement this approach is to add the capability for the tester to gate the clock for one subset of the scan chains in the core. A procedure for generating and ordering the test set to maximize the use of scan disable is described. Experimental results are shown indicating that the proposed approach can significantly reduce both logic and clock power during testing.

159 citations
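Why test-set ordering matters can be shown with a simplified model. It assumes the usual scan overlap, in which unloading one vector's response shares the same shift with loading the next vector, so the gated subset of chains can have its clock disabled only when neither vector uses it. `disabled_shifts` and `group_order` are hypothetical helpers, and grouping is a simple stand-in for the paper's generation-and-ordering procedure, which is not reproduced here.

```python
def disabled_shifts(order, needs_subset):
    """Count scan shifts during which the gated chain subset can have its clock off.

    Each vector's load overlaps the unload of the previous response, and the
    last response needs one final unload shift.
    """
    count = 0
    prev_needs = False  # nothing to unload before the first vector
    for v in order:
        if not prev_needs and not needs_subset[v]:
            count += 1
        prev_needs = needs_subset[v]
    if not prev_needs:  # final unload of the last response
        count += 1
    return count


def group_order(needs_subset):
    """Place vectors that never touch the gated subset next to each other."""
    free = [v for v, n in enumerate(needs_subset) if not n]
    busy = [v for v, n in enumerate(needs_subset) if n]
    return free + busy


needs = [True, False, True, False, False, True]
print(disabled_shifts(range(len(needs)), needs))   # original order: 1
print(disabled_shifts(group_order(needs), needs))  # grouped order: 3
```

The number of vectors that use the gated chains is the same in both orders; only the adjacency changes, and that alone triples the clock-disabled shifts in this example.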


Patent
20 Mar 2001
TL;DR: In this paper, a method and system for synchronizing a receiver's clock to the clock of a transmitter is described, which employs a digital clock synthesizer that uses patterns superimposed upon a receiver oscillator to synthesize a clock rate that approximates the clock rate of the transmitter.
Abstract: A method and system for synchronizing a receiver's clock to the clock of a transmitter is disclosed herein. The disclosed system employs a digital clock synthesizer that uses patterns superimposed upon a receiver oscillator to synthesize a clock rate that approximates the clock rate of the transmitter. The pattern superimposed upon the receiver oscillator can be varied to allow for tracking of the variation in the transmitter clock.

154 citations


Proceedings ArticleDOI
19 Nov 2001
TL;DR: A novel approach for minimizing power consumption during scan testing of integrated circuits or embedded cores is presented, based on a gated clock scheme for the scan path and the clock tree feeding thescan path.
Abstract: Test power is now a big concern in large system-on-chip designs. In this paper, we present a novel approach for minimizing power consumption during scan testing of integrated circuits or embedded cores. The proposed low power technique is based on a gated clock scheme for the scan path and the clock tree feeding the scan path. The idea is to reduce the clock rate on scan cells during shift operations without increasing the test time. Numerous advantages can be found in applying such a technique.

153 citations
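The claim that the scan clock rate can drop without lengthening the test can be checked with a short arithmetic sketch. It assumes the chain is split into equal segments that are clocked in turn, each at f divided by the number of segments. The abstract itself only states that the clock rate on scan cells is reduced, so this segmentation and the function name are assumptions.

```python
def scan_shift_cost(num_cells, num_segments=1):
    """Return (cell_clock_pulses, test_clock_cycles) to load a full scan chain.

    The chain is split into num_segments equal segments; each segment is
    clocked at f / num_segments and the segments take turns, so exactly one
    segment receives a clock pulse per test-clock cycle.
    """
    if num_cells % num_segments:
        raise ValueError("num_cells must divide evenly into segments")
    cells_per_segment = num_cells // num_segments
    shifts_per_segment = cells_per_segment            # one shift per cell in the segment
    test_cycles = num_segments * shifts_per_segment   # equals num_cells: test time unchanged
    cell_pulses = test_cycles * cells_per_segment     # cells clocked per cycle x cycles
    return cell_pulses, test_cycles
```

Splitting a 1000-cell chain into two segments halves the clock pulses delivered to scan cells (from 1,000,000 to 500,000), while the number of test-clock cycles stays at 1000.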


Patent
29 Jun 2001
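TL;DR: In this article, a delay-locked loop uses binary-coupled capacitors to vary the delay of its delay line, and a race detection circuit compares the delayed output clock signal with the input clock signal to increment or decrement the counter that controls the capacitors.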
Abstract: A delay-locked loop incorporates binary-coupled capacitors in a capacitor bank to produce a variable capacitance along a delay line. The variable capacitance allows a delay of the variable delay line to be varied. In response to an input clock signal, the variable delay line produces a delayed output clock signal that is compared at a race detection circuit to the input clock signal. If the delayed clock signal leads the input clock signal, the race detection circuit increments a counter that controls the binary-coupled capacitors. The incremented counter increases the capacitance by coupling additional capacitance to the variable delay line to delay propagation of the delayed clock signal. If the delayed clock signal lags the original clock signal, the race detection circuit decrements the counter to decrease the capacitance, thereby decreasing the delay of the variable delay line. The race detection circuit includes an arbitration circuit that detects when the delayed clock signal and the variable clock signal are substantially synchronized and disables incrementing or decrementing of the counter in response.

151 citations
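The counter-driven locking loop can be modeled behaviorally. This sketch assumes that the binary-coupled capacitors are binary weighted, so the switched-in capacitance (and therefore the added delay) is proportional to the counter value, and it models the arbitration circuit as a hold window of half a delay step. The function name and all numbers are hypothetical.

```python
def dll_lock(period_ps, base_delay_ps, unit_delay_ps, bits=6, max_steps=500):
    """Return (counter, delay_ps) once the loop holds or the counter saturates."""
    window_ps = unit_delay_ps / 2      # arbitration: "substantially synchronized"
    max_count = (1 << bits) - 1
    counter = 0
    for _ in range(max_steps):
        # Capacitors weighted 1, 2, 4, ... switched in by the counter bits add
        # capacitance (and delay) proportional to the counter value.
        delay = base_delay_ps + counter * unit_delay_ps
        error = delay - period_ps      # negative: delayed edge leads the next input edge
        if abs(error) <= window_ps:
            break                      # arbitration circuit stops the counter
        if error < 0 and counter < max_count:
            counter += 1               # delayed clock leads: add capacitance
        elif error > 0 and counter > 0:
            counter -= 1               # delayed clock lags: remove capacitance
        else:
            break                      # counter saturated; target outside delay range
    return counter, base_delay_ps + counter * unit_delay_ps
```

For a 1000 ps reference period, a 600 ps base delay, and 10 ps steps, the counter climbs to 40 and holds at exactly 1000 ps. If the period lies outside the delay range, the counter saturates instead of locking.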


Patent
13 Jul 2001
TL;DR: In this article, a DLL circuit or the like is configured so as to be capable of measuring the optimum number of cycles for a delay amount from the input of an external clock to the output of data through the use of a variable delay circuit.
Abstract: A DLL circuit or the like is configured so as to be capable of measuring the optimum number of cycles for a delay amount from the input of an external clock to the output of data through the use of a variable delay circuit and performing lock according to the measured number of cycles, whereby a clock generation circuit having a wide lock range can be implemented regardless of the performance of the variable delay circuit and a clock access time.

116 citations


Journal ArticleDOI
TL;DR: This paper constructs a clock-tree topology based on the locations and the activation frequencies of the modules, while the locations of the internal nodes of the clock tree are determined using a dynamic programming approach followed by a gate reduction heuristic.
Abstract: This paper presents a zero-skew gated clock routing technique for VLSI circuits. Gated clock trees include masking gates at the internal nodes of the clock tree, which are selectively turned on and off by the gate control signals during the active and idle times of the circuit modules to reduce the switched capacitance of the clock tree. We construct a clock-tree topology based on the locations and the activation frequencies of the modules, while the locations of the internal nodes of the clock tree (and, hence, the masking gates) are determined using a dynamic programming approach followed by a gate reduction heuristic. This work assumes that the gates are turned on/off by a centralized controller. Therefore, the additional power and routing area incurred by the controller and the gate control signal routing are examined. Various tradeoffs between power and area for different design options and module activities are discussed and detailed experimental results are presented. Finally, good design practices for implementing the gated clocks are suggested.

105 citations


Patent
05 Feb 2001
TL;DR: In this paper, a variable clock rate device and a method of operating the device are discussed, and a pixel clock and a memory read clock are set to the largest values when the display device is first initialized.
Abstract: A variable clock rate device and a method of operating the device. When the display device is first initialized, a pixel clock and a memory read clock are set to their largest values. If the CPU reads from the memory area, the frequencies of the pixel clock and the memory read clock are adjusted according to how often the CPU updates the on-screen memory and how the on-screen memory blocks changed by the CPU vary. Conversely, if the CPU does not initiate any updating, the pixel clock and the memory read clock are tuned down to the smallest possible values to conserve electricity.

102 citations


Patent
26 Dec 2001
TL;DR: In this article, a synchronous semiconductor memory device operates an input/output buffer circuit in synchronization with an external clock signal in a single data rate SDRAM operation mode, and an internal clock signal of a frequency two times that of the external clock signals is generated.
Abstract: A synchronous semiconductor memory device operates an input/output buffer circuit in synchronization with an external clock signal in a single data rate SDRAM operation mode. In a double data rate SDRAM operation mode, an internal clock signal of a frequency two times that of the external clock signal is generated. The input/output buffer circuit is operated in synchronization with the internal clock signal.

101 citations


Patent
18 Apr 2001
TL;DR: In this article, a low power reconfigurable processor core includes one or more processing units, each unit having a clock input that controls the performance of the unit; and a controller having a plurality of clock outputs each coupled to the clock inputs of the processing units.
Abstract: A low-power reconfigurable processor core includes one or more processing units, each unit having a clock input that controls the performance of the unit; and a controller having a plurality of clock outputs each coupled to the clock inputs of the processing units, the controller varying the clock frequency of each processing unit to optimize power consumption and processing power for a task.

Journal ArticleDOI
TL;DR: In this paper, an activity-driven clock gate insertion problem was proposed to minimize the system's power consumption by constructing an activity-driven clock tree, where sections of the clock tree are turned off by gating the clock signals.
Abstract: In this paper, we investigate reducing the power consumption of a synchronous digital system by minimizing the total power consumed by the clock signals. We construct activity-driven clock trees wherein sections of the clock tree are turned off by gating the clock signals. Since gating the clock signal implies that additional control signals and gates are needed, there exists a tradeoff between the amount of clock tree gating and the total power consumption of the clock tree. We exploit similarities in the switching activity of the clocked modules to reduce the number of clock gates. Assuming a given switching activity of the modules, we propose three novel activity-driven problems: a clock tree construction problem, a clock gate insertion problem, and a zero-skew clock gate insertion problem. The objective of these problems is to minimize the system's power consumption by constructing an activity-driven clock tree. We propose an approximation algorithm based on recursive matching to solve the clock tree construction problem. We also propose an exact algorithm employing the dynamic programming paradigm to solve the gate insertion problems. Finally, we present experimental results that verify the effectiveness of our approach. This paper is a step in understanding how high-level decisions (e.g., behavioral design) can affect a low-level design (e.g., clock design).
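The idea of pairing modules with similar switching activity can be sketched as repeated greedy matching. This is not the paper's approximation algorithm: it is a hypothetical simplification that pairs nodes by the Hamming distance between activity bit vectors and ignores wire length, skew, and gate cost. A parent node is taken to be active whenever either child is, so its gate can be turned off only when both subtrees are idle.

```python
from itertools import combinations


def build_activity_tree(activities):
    """activities: dict name -> int bitmask (bit t set = module active in cycle t).

    Returns (tree, activity): tree is a nested tuple of module names and
    activity is the root's bitmask (OR of all leaves).
    """
    nodes = list(activities.items())
    while len(nodes) > 1:
        # Rank candidate pairs by the number of cycles in which they disagree.
        pairs = sorted(
            combinations(range(len(nodes)), 2),
            key=lambda ij: bin(nodes[ij[0]][1] ^ nodes[ij[1]][1]).count("1"),
        )
        used, merged = set(), []
        for i, j in pairs:
            if i in used or j in used:
                continue
            used.update((i, j))
            (ti, ai), (tj, aj) = nodes[i], nodes[j]
            merged.append(((ti, tj), ai | aj))  # parent active if either child is
        # An unmatched node (odd count) is carried up to the next level.
        merged += [nodes[k] for k in range(len(nodes)) if k not in used]
        nodes = merged
    return nodes[0]
```

Modules A (`0b1100`) and B (`0b1110`) are paired, as are C (`0b0011`) and D (`0b0001`), because each pair differs in only one cycle, so each subtree's gate can stay off for most of the cycles in which its modules are idle.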

Proceedings ArticleDOI
05 Feb 2001
TL;DR: The clocking methodology of the present Alpha microprocessor handles challenges by radically departing from a single chip-wide clock distribution, to better control clock skew, jitter and power dissipation.
Abstract: Single-wire, synchronous clocking systems for increasingly large and complex microprocessors present major technical challenges: Die size increases whereas target clock skew and jitter typically remain a constant percentage of a decreasing cycle time. The clocking methodology of the present Alpha microprocessor handles such challenges by radically departing from a single chip-wide clock distribution, to better control clock skew, jitter and power dissipation. Four major clocks (one reference and three derived) are used to clock separate chip sections.

Proceedings ArticleDOI
05 Feb 2001
TL;DR: This work implements an on-chip dynamic skew calibration multi-phase clock generator with ultra-low jitter that does not require any additional calibration cycle and therefore does not interrupt the clock output at any time.
Abstract: Time-interleaved architectures employ multiple signal processing paths in parallel to achieve high overall speed while maintaining relaxed speed requirements on individual channels. This type of architecture requires a precise multi-phase clock generator to implement the interleaving function with the performance of the system depending upon the uniformity of the clock signals. This work implements an on-chip dynamic skew calibration multi-phase clock generator with ultra-low jitter. This scheme does not require any additional calibration cycle and therefore does not interrupt the clock output at any time.

Patent
02 Mar 2001
TL;DR: In this paper, a source synchronous type interface circuit is proposed in which a source synchronous clock indicating the data transmission timing is transmitted from the transmission side to the reception side along with the data, and a reception clock derived from it defines the operation timing of a first reception flip-flop that takes in the received data.
Abstract: A source synchronous type interface circuit in which, for fetching transmitted data, a source synchronous clock indicating the data transmission timing is transmitted from the transmission side to the reception side along with the data, so that a reception clock is generated from the received source synchronous clock to define the operation timing of a first reception flip-flop that takes in the data. The interface further includes a second reception flip-flop that takes in the output of the first reception flip-flop in synchronization with a common system clock, and a variable delay circuit for absorbing phase fluctuations of the first reception flip-flop that depend on the transmission delay time, to assure the phase difference required for correctly receiving the data. The variable delay circuit has a delay amount automatically controlled according to the phase difference between the system clock and the received source synchronous clock.

Patent
19 Sep 2001
TL;DR: In this article, a clock circuit for supporting a plurality of memory module types is provided, which includes a clock generator for producing a clock signal and a clock buffer having doubly defined clock pins for outputting the first or second type memory clock signal.
Abstract: A clock circuit for supporting a plurality of memory module types is provided. The clock circuit is connected to a first type memory module slot, and a second type memory module slot. The clock circuit includes a clock generator for producing a clock signal and a clock buffer having doubly defined clock pins for outputting the first type memory clock signal or the second type memory clock signal. The clock buffer receives the clock signal and outputs a first type memory clock signal to the first type memory clock pin. The doubly defined clock pin is also capable of outputting a second type memory clock signal to the second type memory clock pin. This invention is capable of using just a single clock buffer to drive a plurality of different memory module types.

Patent
Toshikazu Nakamura
07 Aug 2001
TL;DR: In this paper, a synchronous dynamic memory (SDM) has a clock input buffer receiving an external clock and outputting an input external clock, a command input buffer receiving commands, an address input buffer receiving addresses, and a data input buffer receiving data.
Abstract: A synchronous dynamic memory has a clock input buffer receiving an external clock and outputting an input external clock, a command input buffer receiving commands, an address input buffer receiving addresses, and a data input buffer receiving data. During normal operation mode, the clock input buffer supplies the clock to the command, address, and data input buffers. During data hold modes, such as power down mode, the clock input buffer supplies the clock to the command input buffer but not to the address and data input buffers.

Patent
22 Jun 2001
TL;DR: A low-power reconfigurable processor core includes one or more processing units, each unit having a clock input that controls the performance of the unit; a wireless transceiver transmitting and receiving at a frequency based on a wireless clock input; and a controller having a plurality of clock outputs each coupled to the clock inputs of the processing units and the wireless clock input, the clock outputs being generated from a common master clock.
Abstract: A low power reconfigurable processor core includes one or more processing units, each unit having a clock input that controls the performance of the unit; a wireless transceiver transmitting and receiving at a frequency based on a wireless clock input; and a controller having a plurality of clock outputs each coupled to the clock inputs of the processing units and the wireless clock input, the clock outputs being generated from a common master clock.

Patent
09 Oct 2001
TL;DR: In this article, a delay-locked loop adjusts a delay of a clock signal that is generated in response to an external clock signal, where the clock signal is applied to an output buffer to clock the buffer so that data or clock signals from the buffer are synchronized with the external clock signals.
Abstract: A delay-locked loop adjusts a delay of a clock signal that is generated in response to an external clock signal. The clock signal is applied to an output buffer to clock the buffer so that data or clock signals from the buffer are synchronized with the external clock signal. The output buffer operates in a full-drive and reduced-drive mode in response to an output drive strength bit having first and second logic states, respectively. The delay-locked loop adjusts the delay of the clock signal in response to the state of the output drive strength bit to keep the data or clock signals from the buffer synchronized during both modes of operation.

Patent
David J. Ayers, Edward T. Grochowski, David J. Sager, Vivek Tiwari, Ian Young
14 Jun 2001
TL;DR: In this article, a mechanism for adjusting the activity of an integrated digital circuit such as a processor to reduce voltage changes attributable to current changes triggered by clock gating is presented, where the processor includes one or more functional units and a current control circuit that monitors activity states of the processor's functional units to estimate the current consumed over n clock cycles.
Abstract: The present invention provides a mechanism for adjusting the activity of an integrated digital circuit such as a processor to reduce voltage changes attributable to current changes triggered by clock gating. The processor includes one or more functional units and a current control circuit that monitors activity states of the processor's functional units to estimate the current consumed over n clock cycles. The current control circuit estimates the current change for a given clock cycle from the n activity states and compares the estimated current change with first and second thresholds. The processor's activity is decreased if the estimated current change is greater than the first threshold, and the processor's activity is increased if the estimated current change is less than the second threshold.
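The threshold comparison can be sketched as follows. The abstract does not say how the current change is derived from the n activity states, so this sketch assumes it is the latest cycle's estimated current minus the mean of the previous (up to n) cycles. The per-unit current weights, the class name, and the "decrease"/"increase"/"hold" labels are hypothetical.

```python
from collections import deque


class CurrentControl:
    """Toy di/dt controller: estimate current from active units, compare the change."""

    def __init__(self, unit_currents, n, high, low):
        self.unit_currents = unit_currents  # estimated current per active unit (arbitrary units)
        self.window = deque(maxlen=n)       # estimates for the last n cycles
        self.high, self.low = high, low     # first and second thresholds

    def step(self, active_units):
        current = sum(self.unit_currents[u] for u in active_units)
        if self.window:
            change = current - sum(self.window) / len(self.window)
        else:
            change = 0.0                    # no history yet
        self.window.append(current)
        if change > self.high:
            return "decrease"               # current ramping up too fast: throttle activity
        if change < self.low:
            return "increase"               # current dropping too fast: add activity
        return "hold"
```

Going from one active unit to all three produces an estimated change of +7 and requests a decrease; dropping to zero active units afterward produces a change of -6.5 and requests an increase (with thresholds of +4 and -4).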

Patent
18 Oct 2001
TL;DR: In this paper, a duty-cycle regulation method for deriving an output clock signal having a predetermined duty cycle from an input clock signal with an arbitrary duty cycle is presented.
Abstract: A duty-cycle regulation method for deriving an output clock signal having a predetermined duty cycle from an input clock signal having an arbitrary duty cycle. Once the input clock signal is received, an output clock storage element is switched to a first state upon detecting a transition in the input clock signal for driving the output clock signal to a first signal level. The output clock storage element is then switched to a second state after a delay interval equal to a fraction of the period for driving the output clock signal to a second signal level. The fraction of the period can be programmed to a pre-selected value.
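The regulation scheme uses only input transitions, which is why the input duty cycle does not matter. The minimal sketch below assumes rising edges as the detected transitions and applies the programmable fraction to the most recently measured period, since a real circuit cannot know the current period in advance; the function name is hypothetical.

```python
def regulate_duty_cycle(rising_edges, fraction=0.5):
    """Map input rising-edge times to (rise, fall) times of the regulated output.

    Only rising edges are used, so the input's own duty cycle is irrelevant.
    The fall time uses the most recently measured period (a causal choice).
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be between 0 and 1")
    out = []
    for prev, t in zip(rising_edges, rising_edges[1:]):
        period = t - prev                        # last measured input period
        out.append((t, t + fraction * period))   # set on the edge, reset after the delay
    return out
```

With rising edges every 10 time units and `fraction=0.25`, each output pulse is 2.5 units wide, a 25% duty cycle, regardless of how long the input stayed high.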

Patent
02 Mar 2001
TL;DR: In this paper, an emulation controller is provided with timing information indicative of operation of an internal clock of an integrated circuit that drives internal data processing activity of the integrated circuit, and the digital bits are output to the emulation controller at an output clock rate that differs from the clock rate of the internal clock.
Abstract: An emulation controller located outside an integrated circuit can be provided with timing information indicative of operation of an internal clock of the integrated circuit that drives internal data processing activity of the integrated circuit. In response to each cycle of the internal clock, a corresponding digital bit is produced to represent the internal clock cycle, and the digital bits are output to the emulation controller at an output clock rate that differs from the clock rate of the internal clock.

Patent
25 Oct 2001
TL;DR: In this paper, a charge pump system and associated variable-amplitude clock generation circuitry are provided for generating high voltages from a low initial voltage in applications such as erasing and programming electrically erasable programmable read only memory (EEPROM) arrays.
Abstract: A charge pump system and associated variable-amplitude clock generation circuitry are provided for generating high voltages from a low initial voltage in applications such as erasing and programming electrically erasable programmable read only memory (EEPROM) arrays. The charge pump system uses a power supply voltage and a clock and includes a first phase bootstrapping circuit, an inverter, a second phase bootstrapping circuit, and charge pump circuitry. The two phase bootstrapping circuits are both responsive to the clock and provide first and second phase clock signals. The inverter is connected to the second phase bootstrapping circuit, causing the second phase clock signal to be opposite in phase from the first phase clock signal. The charge pump circuitry is responsive to the power supply voltage and the first and second phase clocks and uses native transistors, which have lower threshold voltages. A high voltage is produced from the charge pump circuitry by alternately adding charge to the power supply voltage in each cycle of the first and second phase clock signals. The first and second phase clock signals increase in voltage as the voltage level in the charge pump increases, in order to overcome increased effective transistor threshold voltages.

Patent
17 Oct 2001
TL;DR: In this paper, a semiconductor device for generating first and second internal clocks complementary with each other from an external clock and usable for both a system of a type using a complementary clock and a system that generates a 180° phase clock internally, is disclosed.
Abstract: A semiconductor device for generating first and second internal clocks complementary with each other from an external clock and usable for both a system of a type using a complementary clock and a system of a type generating a 180° phase clock internally, is disclosed. A first clock input circuit (buffer) is supplied with a first external clock and outputs a first internal clock. A second clock input circuit (buffer) is supplied with a second external clock complementary with the first external clock and outputs a second clock. A ½ phase clock generating circuit generates a ½ phase shift signal 180° out of phase with the first internal clock. A second external clock state detection circuit judges whether the second external clock is input to the second clock input buffer. A switch is operated to produce the second clock as the second internal clock when the second external clock is input and to produce the ½ phase shift signal as the second internal clock when the second external clock is not input, in accordance with the judgement at the second external clock state detection circuit.

Patent
28 Jun 2001
TL;DR: In this article, a simultaneous bidirectional port combines a synchronization circuit and a clock circuit; its clock driver circuit can be turned on and off, staying off before synchronization and on after synchronization, and a clock receiver circuit includes a clock detection circuit to detect the presence of an input clock signal.
Abstract: A simultaneous bidirectional port coupled to a bus combines a synchronization circuit and a clock circuit. The synchronization and clock circuit synchronizes the port with another simultaneous data port coupled to the same bus. A clock driver circuit is provided that is capable of being turned on and off. Prior to synchronization, the clock driver is off, and after synchronization, the clock driver is on. A clock receiver circuit includes a clock detection circuit to detect the presence of an input clock signal. When an integrated circuit is ready to communicate, the output clock driver is turned on and the clock detection circuit is monitored to determine when an input clock signal is received. When both the output clock driver is turned on, and an input clock signal is being received, the simultaneous bidirectional port is synchronized, and communication between integrated circuits can take place.

Patent
Frederic Boutaud1
15 Nov 2001
TL;DR: In this article, a clock selection circuit for selecting one of a plurality of clocks as an output clock is presented, where the selection circuit switches between two of the plurality of clock candidates for output.
Abstract: A clock selection circuit for selecting one of a plurality of clocks as an output clock. When the selection circuit switches between two of the plurality of clocks for output, the currently output clock is removed from the output. The removal of the currently output clock is performed synchronously to the currently selected clock. The newly selected clock is then coupled to the output. Coupling of the newly selected clock is performed synchronously to the newly selected clock.
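The two-step, self-synchronized handover can be simulated at tick resolution. This sketch assumes ideal 50% duty-cycle clocks and approximates "synchronous to the clock" by allowing each enable to change only while its own clock is low; all names are hypothetical. A naive combinational mux is included for contrast.

```python
def clock(period, t):
    """Ideal 50% duty-cycle clock sampled at integer tick t (even period in ticks)."""
    return 1 if t % period < period // 2 else 0


def glitch_free_switch(period_a, period_b, switch_at, ticks):
    """Switch the output from clock A to clock B, requested at tick switch_at."""
    en_a, en_b = 1, 0
    out = []
    for t in range(ticks):
        a, b = clock(period_a, t), clock(period_b, t)
        want_b = t >= switch_at
        # 1) Remove the old clock only while it is low (synchronous to clock A).
        if want_b and en_a and a == 0:
            en_a = 0
        # 2) Once A is removed, attach B only while B is low (synchronous to clock B).
        if want_b and not en_a and not en_b and b == 0:
            en_b = 1
        out.append((a & en_a) | (b & en_b))
    return out


def naive_switch(period_a, period_b, switch_at, ticks):
    """Combinational mux for comparison: can truncate a pulse at the switch."""
    return [clock(period_a, t) if t < switch_at else clock(period_b, t)
            for t in range(ticks)]


def high_pulse_widths(samples):
    """Widths (in ticks) of each run of 1s."""
    widths, run = [], 0
    for s in samples:
        if s:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    if run:
        widths.append(run)
    return widths
```

Switching from a 4-tick to a 6-tick clock at tick 5, the naive mux emits a 1-tick runt pulse, while every pulse from the synchronized switch is a full half-period of one of the two clocks.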

Patent
09 Apr 2001
TL;DR: In this article, a clock synchronization circuit is used to synchronize a reference or system clock signal in a programmable logic device or field programmable gate array (FPGA).
Abstract: A programmable logic device or field programmable gate array includes an on-chip clock synchronization circuit to synchronize a reference or system clock signal. The clock synchronization circuit is a delay-locked loop (DLL) circuit in one implementation and a phase-locked loop (PLL) circuit in another implementation. The DLL or PLL circuits may be analog or digital. The clock synchronization circuit generates a synchronized clock signal that is distributed throughout the programmable integrated circuit. The synchronized clock signal is programmably connected to the programmable logic elements or logic array blocks (LABs) of the integrated circuit. The clock synchronization circuit reduces or minimizes clock skew when distributing a clock signal within the integrated circuit. The clock synchronization circuit improves the overall performance of the programmable logic integrated circuit.

Patent
12 Nov 2001
TL;DR: In this paper, a load-switching transistor connects a capacitor to an intermediate clock node; turning the transistor on increases the load and the delay, extending the output clock period, and turning it off reduces the delay, shortening the output clock period.
Abstract: A clock modulator spreads the frequency spectrum of an input clock to generate an output clock. A capacitor is connected to an intermediate clock node by a load-switching transistor. When the transistor is turned on, the capacitor increases the loading on the intermediate clock node, increasing delay. When the transistor is turned off, the delay is reduced. Output clock cycle periods are extended when delay is added, and reduced when the transistor turns off. A counter or sequencer is clocked by the input clock and drives the load-switching transistor. The transistor is turned on and off for alternate cycles when the counter is a toggle flip-flop, spreading the frequency over two frequencies every two clock cycles. Two capacitors of different sizes, connected to the intermediate clock node by two transistors, can be switched by a 2-bit sequencer, spreading the output clock over 7 frequencies every 7 clock cycles.
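How a per-cycle load change turns into period modulation can be expressed in a few lines. The sketch assumes each load state adds a fixed delay to the corresponding output edge, so each output period equals the input period plus the difference between consecutive added delays; the function name and numbers are hypothetical.

```python
def output_periods(input_period, added_delays):
    """Output clock periods over one repetition of a per-cycle added-delay pattern.

    added_delays[i] is the extra delay applied to output edge i (set by the
    load state in that cycle); the pattern repeats, so the last period wraps
    around to the first delay.
    """
    n = len(added_delays)
    return [input_period + added_delays[(i + 1) % n] - added_delays[i]
            for i in range(n)]


# Toggle flip-flop: capacitor switched in on alternate cycles (50 ps extra delay).
print(output_periods(1000, [0, 50]))  # [1050, 950]
```

A toggle flip-flop alternating a 50 ps load delay on a 1000 ps input clock yields periods of 1050 ps and 950 ps, while the output still completes two cycles every 2000 ps. A longer delay pattern, such as one produced by a sequencer driving two capacitors, is handled by passing a longer `added_delays` list.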

Patent
14 Dec 2001
TL;DR: In this article, a clock tree insertion method for distributing a clock signal in an integrated circuit design includes providing a physical design representative of the integrated circuit (IC) design, specifying a location for a root node of the clock tree in the physical design, and constructing an array of buffers as the clock tree, where the array of buffers is constructed to minimize the maximum insertion delay from the root node to the clock signal endpoints.
Abstract: A clock tree insertion method for distributing a clock signal in an integrated circuit design includes providing a physical design representative of the integrated circuit design, specifying a location for a root node of the clock tree in the physical design, constructing an array of buffers as the clock tree where the array of buffers is constructed to minimize the maximum insertion delay from the root node to the clock signal endpoints and to meet a predefined maximum insertion delay constraint, identifying locations in the clock tree where clock skew violations occur and correcting the clock skew violations by introducing delay at buffer locations in the clock tree having the fastest clock signal arrival times, and identifying locations in the clock tree where minimum insertion delay violations occur and correcting the minimum insertion delay violations by slowing down the arrival times of clock signal endpoints of the clock tree.
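The skew and minimum-insertion-delay corrections can be sketched at the endpoint level. This simplification pads each endpoint rather than individual buffers in the tree, and the function name and picosecond values are hypothetical. It only illustrates the rule of delaying the fastest arrivals until the spread fits the skew target and no arrival is below the minimum insertion delay.

```python
def skew_padding(arrivals, max_skew, min_insertion):
    """Extra delay to add at each clock endpoint (all times in ps).

    Fast endpoints are slowed until they are within max_skew of the latest
    arrival, and every endpoint is slowed to at least min_insertion.
    """
    latest = max(arrivals.values())
    earliest_allowed = max(latest - max_skew, min_insertion)
    return {ep: max(0, earliest_allowed - t) for ep, t in arrivals.items()}
```

With arrivals of 400, 520, and 450 ps, a 50 ps skew target, and a 300 ps minimum insertion delay, the two early endpoints are slowed by 70 ps and 20 ps, leaving a 50 ps spread.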

Patent
20 Nov 2001
TL;DR: In this article, the authors propose a heuristic algorithm and reordering of the buffers of the clock domain to balance clock delays in the domain, and to equalize clock delays of several domains of a group that have timing paths between them.
Abstract: Clock delays are changed in a clock network of an ASIC. Global skew optimization is achieved by restructuring a clock domain to balance clock delays in the domain, and by equalizing clock delays of several domains of a group that have timing paths between them. Clock delays are equalized using buffer chains affecting all leaves of the respective domain, and an additional delay coefficient that equalizes clock delay. The clock insertion delays are changed for each group by restructuring the buffers in the group, based on both the data and clock logics to optimize the paths. Local skew optimization is achieved by restructuring the clock domain using a heuristic algorithm and re-ordering the buffers of the domain. A computer program enables a processor to carry out the processes.