
Showing papers on "Clock gating" published in 1999


Proceedings ArticleDOI
09 Jan 1999
TL;DR: This work proposes hardware mechanisms that dynamically recognize and capitalize on "narrow-bitwidth" instances and reduces processor power consumption by using aggressive clock gating to turn off portions of integer arithmetic units that will be unnecessary for narrow bitwidth operations.
Abstract: In general-purpose microprocessors, recent trends have pushed towards 64 bit word widths, primarily to accommodate the large addressing needs of some programs. Many integer problems, however, rarely need the full 64 bit dynamic range these CPUs provide. In fact, another recent instruction set trend has been increased support for sub-word operations (that is, manipulating data in quantities less than the full word size). In particular, most major processor families have introduced "multimedia" instruction set extensions that operate in parallel on several sub-word quantities in the same ALU. This paper notes that across the SPECint95 benchmarks, over half of the integer operation executions require 16 bits or less. With this as motivation, our work proposes hardware mechanisms that dynamically recognize and capitalize on these "narrow-bitwidth" instances. Both optimizations require little additional hardware, and neither requires compiler support. The first, power-oriented, optimization reduces processor power consumption by using aggressive clock gating to turn off portions of integer arithmetic units that will be unnecessary for narrow bitwidth operations. This optimization results in an over 50% reduction in the integer unit's power consumption for the SPECint95 and MediaBench benchmark suites. The second optimization improves performance by merging together narrow integer operations and allowing them to share a single functional unit. Conceptually akin to a dynamic form of MMX, this optimization offers speedups of 4.3%-6.2% for SPECint95 and 8.0%-10.4% for MediaBench.
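As a rough illustration of the narrow-bitwidth idea in the abstract above, the sketch below checks whether both operands of an integer operation fit in 16 bits, which is the condition under which the upper part of a 64-bit unit could be clock gated. The function names and the simple unsigned zero-detect on the upper bits are assumptions made for illustration, not the paper's actual hardware.

```python
WORD_BITS = 64    # full datapath width discussed in the abstract
NARROW_BITS = 16  # width the abstract reports as sufficient for over half the operations


def is_narrow(value, narrow_bits=NARROW_BITS, word_bits=WORD_BITS):
    """Return True if the upper bits of the word are all zero (hypothetical zero-detect)."""
    value &= (1 << word_bits) - 1      # view the value as an unsigned machine word
    return (value >> narrow_bits) == 0


def gateable_fraction(operations):
    """Fraction of (a, b) operand pairs for which the upper ALU bits could be gated."""
    narrow = sum(1 for a, b in operations if is_narrow(a) and is_narrow(b))
    return narrow / len(operations)
```

A negative value is treated as wide here because its sign-extended upper bits are non-zero; handling that case would need an extra all-ones detect, which this sketch leaves out.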

304 citations


Patent
25 May 1999
TL;DR: In this paper, a disk drive includes a disk having a recording surface and a write element for writing a sequence of symbols in a continuous-time signal on the recording surface, and a programmable phase synthesizer for generating a secondary phase write clock signal.
Abstract: A disk drive includes a disk having a recording surface and a write element for writing a sequence of symbols in a continuous-time signal on the recording surface. The disk drive includes a frequency generator for generating a plurality of primary phase write clock signals having a channel frequency f_ch. Each of the primary phase write clock signals has a selected primary phase shift from another one of the primary phase write clock signals. The disk drive includes a programmable phase synthesizer for generating a secondary phase write clock signal having the channel frequency f_ch and a selected secondary phase shift from one of the primary phase write clock signals. The programmable phase synthesizer includes programmable means for selecting two of the primary phase write clock signals and an element for performing vector addition of the selected primary phase write clock signals to generate the secondary phase write clock signal. The secondary phase write clock signal is used for providing at least one of the symbols in the sequence of symbols to the write element.

157 citations


Patent
09 Apr 1999
TL;DR: In this paper, a delay lock loop uses a clock phase shifter with a delay line to synchronize a reference clock signal with a skewed clock signal, which is received on a feedback input terminal of the delay lock loop.
Abstract: A delay lock loop uses a clock phase shifter with a delay line to synchronize a reference clock signal with a skewed clock signal. The delay line is coupled to a reference input terminal of the delay lock loop and generates a delayed clock signal that is provided to the clock phase shifter. The clock phase shifter generates one or more phase-shifted clock signals from the delayed clock signal. An output generator coupled to the delay line, the clock phase shifter, and an output terminal of the delay lock loop provides either the delayed clock signal or one of the phase-shifted clock signals as an output clock signal of the delay lock loop. The propagation delay of the delay line is set to synchronize the reference clock signal with the skewed clock signal, which is received on a feedback input terminal of the delay lock loop. A phase detector compares the reference clock signal and the skewed clock signal to determine the appropriate propagation delay for the delay line.

136 citations


Patent
16 Mar 1999
TL;DR: In this paper, an application specific integrated circuit (ASIC) has a clock controller that dynamically selects an appropriate clock frequency for a resource, which is determined by the bandwidth utilization of the controllers requesting access to the resource.
Abstract: An application specific integrated circuit (ASIC) has a clock controller that dynamically selects an appropriate clock frequency for a resource. The ASIC includes a central processing unit (CPU), on-chip memory, a memory controller controlling external memory devices, a system bus, and various peripheral controllers. Devices that can be accessed by other devices, such as the on-chip memory, the memory controller, and the system bus are “resources.” The devices that access the resources are “controllers.” The ASIC generates a master clock and the clock controller derives clocks for driving the resources and controllers from the master clock. A multiplexer (MUX) in the clock controller selects the clock that is passed to a resource. Each controller has a request line to the clock controller for signaling when the controller is accessing a resource. The clock controller has a programmable register for each controller holding a value representing the bandwidth utilization of the controller and an adder and a frequency table. The adder sums the contents of the bandwidth registers of the controllers that are accessing a resource. The sum is an index to an entry in a frequency table. The value held in the frequency table is applied to the selection inputs of the MUX to select the clock for the resource. If no controllers are requesting access to the memory controller, the clock controller shuts down the memory clock. Accordingly, the clock frequency of the resource is determined by the bandwidth utilization of the controllers requesting access to the resource.
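The selection logic described in the abstract above (sum the bandwidth registers of the requesting controllers, use the sum as an index into a frequency table, shut the clock down when nobody is requesting) can be sketched as follows. The controller names, the table contents, and the saturation of the index at the last table entry are assumptions for illustration only.

```python
def select_clock(requesting, bandwidth_regs, freq_table):
    """Return the selected clock frequency, or None when the clock is shut down.

    requesting     -- controllers currently asserting their request line
    bandwidth_regs -- programmable per-controller bandwidth values (hypothetical units)
    freq_table     -- frequencies indexed by summed bandwidth (hypothetical contents)
    """
    if not requesting:
        return None  # no controller is accessing the resource: stop its clock
    index = sum(bandwidth_regs[name] for name in requesting)  # the "adder"
    index = min(index, len(freq_table) - 1)                   # assumption: saturate at the top entry
    return freq_table[index]                                  # value that would drive the MUX select
```

For example, with registers `{"cpu": 2, "dma": 1}` and a table `[10, 20, 40, 80, 100]`, a lone DMA request would select entry 1, and simultaneous CPU and DMA requests would select entry 3.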

124 citations


Patent
14 Jun 1999
TL;DR: In this article, a frequency determination circuit and a fine adjust circuit for phase synchronization with an external clock signal at coarse and fine precision, respectively, are presented, together with a clock reproduction circuit that generates an internal clock signal phase-locking with a reference clock signal stably even when the operating environment changes.
Abstract: A frequency determination circuit generating a clock signal phase-locking with an external clock signal at a coarse precision and a fine adjust circuit generating an internal synchronizing signal phase-locking with the external clock signal at a fine precision are provided. The fine adjust circuit has a function of adjusting the phase of the frequency determination circuit when phase synchronization is to be carried out exceeding the adjust range thereof. The frequency determination circuit and the fine adjust circuit receive a clock power supply voltage. A clock reproduction circuit is provided which generates an internal clock signal phase-locking with an external clock signal or a reference clock signal stably even when the operating environment changes.

103 citations


Patent
David I. Poisner
29 Sep 1999
TL;DR: In this paper, an over-clock deterrent mechanism of a chipset is presented, which comprises an over-clock detection circuit for detecting over-clocking of a system (processor) clock signal based on comparison of the ratio of the system clock signal, which is likely to be over-clocked, and a fixed, stable reference clock signal, which is highly unlikely to be over-clocked.
Abstract: An over-clock deterrent mechanism of a chipset which comprises an over-clock detection circuit for detecting over-clocking of a system (processor) clock signal based on comparison of ratio of the system (processor) clock signal which is likely to be over-clocked and a fixed, stable reference clock signal which is highly unlikely to be over-clocked, and an over-clock prevention (thwarting) circuit for deterring such an over-clocking by either disabling operations of a computer system or significantly undermining key operations of a computer system.
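The ratio comparison mentioned in the abstract above might look like the following minimal sketch, assuming the detection circuit counts system clock cycles during a window of reference clock cycles. The counting scheme, the function name, and the 2% tolerance are hypothetical and not taken from the patent.

```python
def is_overclocked(sys_cycles, ref_cycles, expected_ratio, tolerance=0.02):
    """Compare the measured system/reference clock ratio against the expected one.

    sys_cycles     -- system clock cycles counted in the measurement window
    ref_cycles     -- reference clock cycles that define the window
    expected_ratio -- nominal ratio of system to reference frequency
    tolerance      -- assumed margin for normal frequency variation
    """
    measured_ratio = sys_cycles / ref_cycles
    return measured_ratio > expected_ratio * (1.0 + tolerance)
```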

101 citations


Patent
23 Mar 1999
TL;DR: In this article, a programmable delay module is proposed to ensure validity of data accessed from synchronous memory during a read operation, wherein the synchronous memories are operating synchronously at a high frequency system clock.
Abstract: A novel apparatus and method is disclosed to assure validity of data accessed from synchronous memory during a "read" operation, wherein the synchronous memory is operating synchronously at a high frequency system clock. The invention comprises a programmable delay module which generates a skewed clock signal which is used to clock in data read from the synchronous memory. The programmable delay module generates the skewed clock signal by adding programmable time delays to the system clock signal. The inserted delay increases the data valid window time available for the "read" operation and allows sufficient setup and hold time for valid data to be read by a memory controller.

97 citations


Journal ArticleDOI
TL;DR: Techniques that attempt to reduce glitching power consumption by minimizing propagation of glitches in the RTL circuit are developed, which include restructuring multiplexer networks, clocking control signals, and inserting selective rising/falling delays, in order to kill the propagation of glitches from control as well as data signals.
Abstract: We present design-for-low-power techniques for register-transfer level (RTL) controller/data path circuits. We analyze the generation and propagation of glitches in both the control and data path parts of the circuit. In data-flow intensive designs, glitching power is primarily due to the chaining of arithmetic functional units. In control-flow intensive designs, on the other hand, multiplexer networks and registers dominate the total circuit power consumption, and the control logic can generate a significant amount of glitches at its outputs, which in turn propagate through the data path to account for a large portion of the glitching power in the entire circuit. Our analysis also highlights the relationship between the propagation of glitches from control signals and the bit-level correlation between data signals. Based on the analysis, we develop techniques that attempt to reduce glitching power consumption by minimizing propagation of glitches in the RTL circuit. Our techniques include restructuring multiplexer networks (to enhance data correlations and eliminate glitchy control signals), clocking control signals, and inserting selective rising/falling delays, in order to kill the propagation of glitches from control as well as data signals. In addition, we present a procedure to automatically perform the well-known power-reduction technique of clock gating through an efficient structural analysis of the RTL circuit, while avoiding the introduction of glitches on the clock signals. Application of the proposed power optimization techniques to several RTL circuits shows significant power savings, with negligible area and delay overheads.

96 citations


Patent
06 Oct 1999
TL;DR: In this paper, a phase-locked loop compares the phase of a reference clock with that of another clock to generate a control signal whose value corresponds to the phase difference between the two clocks, and generates the other clock using a plurality of delay elements connected into a loop.
Abstract: Delay circuitry includes a phase-locked loop or PLL for comparing the phase of a reference clock applied thereto with that of another clock to be compared to generate a control signal having a value corresponding to the phase difference between the phases of the reference clock and other clock, for generating the other clock using at least a plurality of delay elements connected into a loop, a time delay provided by each of the plurality of delay elements being controlled by the control signal, and for changing the value of the control signal so that the other clock is made to be in phase with the reference clock. The delay circuitry further includes a register for storing information to set a certain time delay, and a delay unit including a plurality of delay elements each of which provides an input with a time delay that is controlled by the control signal from the PLL, for determining the number of delay elements through which an input signal is to be passed according to the information stored in the register, so as to provide the input signal with the predetermined time delay.

95 citations


Journal ArticleDOI
TL;DR: A chip has been designed and tested to demonstrate the feasibility of an ultra-low-power, two-dimensional inverse discrete cosine transform (IDCT) computation unit in a standard 3.3-V process, which meets the sample rate requirements for MPEG-2 MP@ML.
Abstract: A chip has been designed and tested to demonstrate the feasibility of an ultra-low-power, two-dimensional inverse discrete cosine transform (IDCT) computation unit in a standard 3.3-V process. A data-driven computation algorithm that exploits the relative occurrence of zero-valued DCT coefficients coupled with clock gating has been used to minimize switched capacitance. In addition, circuit and architectural techniques such as deep pipelining have been used to lower the voltage and reduce the energy dissipation per sample. A Verilog-based power tool has been developed and used for architectural exploration and power estimation. The chip has a measured power dissipation of 4.65 mW at 1.3 V and 14 MHz, which meets the sample rate requirements for MPEG-2 MP@ML. The power dissipation improves significantly at lower bit rates (coarser quantization), which makes this implementation ideal for emerging quality-on-demand protocols that trade off energy efficiency and video quality.
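To make the data-driven idea in the abstract above concrete, the sketch below computes a one-dimensional inverse DCT by accumulating the contribution of each coefficient and skipping zero-valued coefficients entirely, so the amount of work tracks the number of non-zero inputs. It is a software analogy under assumed orthonormal scaling, not the chip's architecture, and the multiply-accumulate counter is only there to show the saving.

```python
import math


def idct_1d_sparse(coeffs):
    """Orthonormal 1-D inverse DCT that skips zero coefficients.

    Returns (samples, macs), where macs counts the multiply-accumulates performed.
    """
    n = len(coeffs)
    samples = [0.0] * n
    macs = 0
    for k, c in enumerate(coeffs):
        if c == 0:
            continue  # zero coefficient: no computation (the unit could stay clock gated)
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        for x in range(n):
            samples[x] += scale * c * math.cos(math.pi * (2 * x + 1) * k / (2 * n))
            macs += 1
    return samples, macs
```

With an 8-point block that contains only a DC coefficient, this performs 8 multiply-accumulates instead of the 64 a dense evaluation would need.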

93 citations


Patent
19 Mar 1999
TL;DR: In this paper, a charge pump for providing a desired boosted output voltage, a plurality of boosting stages are connected in series, and the pump also has a clock signal supply circuit for providing clock signals and a boost circuit for boosting the clock signals.
Abstract: In a charge pump for providing a desired boosted output voltage, a plurality of boosting stages are connected in series. The pump also has a clock signal supply circuit for providing clock signals and a boost circuit for boosting the clock signals. Clock signals derived from the clock signal supply circuit are supplied to each of the boosting stages on a former side. In contrast, a boosted clock signal derived from the clock signal boost circuit and a clock signal derived from the clock signal supply circuit are supplied to each of the boosting stages on a latter side.

Patent
28 Apr 1999
TL;DR: In this article, a synchronizing circuit accomplishes phase synchronization between a signal from a node nearest to the clock distributing circuit and an external clock signal through a plurality of clock transmission nodes arranged in a tree.
Abstract: To input buffers included in a peripheral pad group inputting an external signal and a DQ pad group for data input/output, clock signals from a synchronizing circuit are transmitted through a clock distributing circuit having a plurality of clock transmission nodes arranged in a shape of a tree. The synchronizing circuit accomplishes phase synchronization between a signal from a node nearest to the clock distributing circuit and an external clock signal. Thus, a skew in clock signals applied to the input and output buffers can be eliminated.

Patent
John T. Orton, Cau L. Nguyen, Gurbir Singh, Xia Dai, Raviprakash Nagaraj, J. Pole Ii Edwin
30 Apr 1999
TL;DR: In this article, the clock generator is reset by the clock frequency change indication to change the clock's frequency while the component is in a low activity state (e.g., deep sleep, stop grant, or other state).
Abstract: A system includes a component (e.g., a processor) that includes a clock generator that generates an internal clock running at a frequency. A controller generates a clock frequency change indication and places the component into a low activity state (e.g., deep sleep, stop grant, or other state). The clock generator is reset by the clock frequency change indication to change the clock's frequency while the component is in the low activity state. Storage elements containing different values are selectable to set the clock frequency. The storage elements include fuse banks and input pins.

Patent
23 Nov 1999
TL;DR: In this paper, a flip-flop uses only one p-channel transistor to drive the output node strongly to achieve fast results, and the clock becomes the dominant controller of the output when it is located closest to the output.
Abstract: Techniques for providing improved memory flip-flops and other logic circuits are described. A flip-flop uses only one p-channel transistor to drive the output node strongly to achieve fast results. To reduce diffusion area, parallel logic is substantially eliminated and only series branches are used, in critical areas. This allows all pull-up transistors and/or all pull-down transistors to be formed from contiguous active areas. The D-to-Q path is reduced, and the clock is used to control the output. The clock becomes the dominant controller of the output when it is located closest to the output. Placing the clock devices closest to the clocked nodes reduces clock skew. The rising D response time and falling D response time are caused to be as close as possible to reduce the overall cycle time. To reduce parasitics in the circuit, complex-gates are used which are asymmetric. Even multiples of series branches per gate are used to share contacts and eliminate breaks in the layout diffusion. Adding complex-gates to a circuit while using asymmetric gates for smaller layouts achieves additional functionality. One component of the clock, along with the master drive circuit, is used to drive the slave latch of a flip-flop to avoid inserting additional gates into the logic of the fast output path. Reset and set circuitry is designed to be outside the critical path of the clock, and outside the slave latch, to provide rapid Q output response time to the clock and D inputs.

Journal ArticleDOI
TL;DR: A bottom-up approach for the automatic extraction and synthesis of dynamic power management circuitry starting from structural logic-level specifications is proposed, which leverages the compact BDD-based representation of Boolean and pseudo-Boolean functions to detect idle conditions where the clock can be stopped without compromising functional correctness.
Abstract: Recent results have shown that dynamic power management is effective in reducing the total power consumption of sequential circuits. In this paper, we propose a bottom-up approach for the automatic extraction and synthesis of dynamic power management circuitry starting from structural logic-level specifications. Our techniques leverage the compact BDD-based representation of Boolean and pseudo-Boolean functions to detect idle conditions where the clock can be stopped without compromising functional correctness. Moreover, symbolic techniques allow accurate probabilistic computations; in particular, they enable the use of non-equiprobable primary input distributions, a key step in the construction of models that match the behavior of real hardware devices with a high degree of fidelity. The results are encouraging, since power savings of up to 34% have been obtained on standard benchmark circuits.

Patent
26 Apr 1999
TL;DR: In this article, an apparatus to power up an integrated device from a low power state wherein the clock circuit for generating the internal clocks has been disabled is provided, where a small set of programmable registers is reserved inside the CPU interface unit (CIF).
Abstract: An apparatus to power up an integrated device from a low power state wherein the clock circuit for generating the internal clocks has been disabled is provided. A small set of programmable registers is reserved inside the CPU interface unit (CIF) of an integrated device (e.g., a display/graphics controller) which can be accessed by the CPU even during a low power state mode (e.g., software controlled sleep mode D3 in the preferred embodiment). The programmable registers store programmed bits that are used in indicating to the Power Management Unit (PMU) the desired power state and whether the clock circuits are to be enabled or disabled. The programmable registers also store multiplication and division factors to be used by the clock circuits in determining their clock rate. Using this information, the integrated device can go through a predetermined power sequence to transition from the low power state to the normal state which includes powering up the clock circuits (e.g., PLLs and oscillator).

Patent
Peter H. Alfke, Alvin Y. Ching, Scott O. Frake, Jennifer Wong, Steven P. Young
18 Jun 1999
TL;DR: In this paper, a clock gating circuit is provided for a logic device that reduces device resource requirements, eliminates the need for users to define their own clock gate, and eliminates undesirable clock signal disturbances, such as glitches and runt pulses.
Abstract: A clock gating circuit is provided for a logic device that reduces device resource requirements, eliminates the need for users to define their own clock gating circuit, and eliminates undesirable clock signal disturbances, such as glitches and runt pulses. In one embodiment, the clock gating circuit includes an input terminal for receiving an input clock signal; an input terminal for receiving a clock enable signal; a storage latch coupled to receive the input clock signal and the clock enable signal, and in response, provide a clock gate control signal; and a logic gate coupled to receive the input clock signal and the clock gate control signal. The logic gate selectively routes the input clock signal in response to the clock gate control signal, thereby providing an output clock signal.
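The latch-plus-gate structure described in the abstract above can be modelled with a small sample-by-sample simulation. The sketch assumes a latch that is transparent while the clock is low and an AND gate as the logic gate; those polarities, and the function names, are assumptions for illustration. A plain AND of clock and enable is included only for contrast.

```python
def latch_gated_clock(clk_samples, en_samples):
    """Gate the clock with an enable captured by a latch that is transparent when clk is low."""
    latched = 0
    out = []
    for clk, en in zip(clk_samples, en_samples):
        if clk == 0:
            latched = en           # latch follows the enable only while the clock is low
        out.append(clk & latched)  # AND gate: the clock passes only if the held enable is 1
    return out


def naive_gated_clock(clk_samples, en_samples):
    """Plain AND of clock and enable; an enable change during clk high cuts or creates a pulse."""
    return [clk & en for clk, en in zip(clk_samples, en_samples)]
```

When the enable toggles while the clock is high, the naive version truncates one pulse and starts another mid-phase (runt pulses), whereas the latched version only changes state between pulses.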

Patent
04 Nov 1999
TL;DR: In this paper, the authors proposed a technique for activating an active-mode high frequency clock following a sleep period for use within a mobile station wherein selected components of the mobile station operate using a low power, low frequency sleep-mode clock during the sleep period and the faster high frequency active frequency clock during non-sleep periods.
Abstract: A technique for activating an active-mode high frequency clock following a sleep period for use within a mobile station wherein selected components of the mobile station operate using a low power, low frequency sleep-mode clock during the sleep period and the faster high frequency active-mode clock during non-sleep periods. In one embodiment, the technique is implemented by a device having a wake-up estimation unit for estimating a wake up time using the sleep-mode clock and a frequency drift compensation unit for compensating for any error in the estimated wake up time caused by frequency drift in the sleep-mode clock. An off-set time compensation unit is also provided for compensating for a lack of precision in the low frequency sleep-mode clock resulting in a possible error in the estimated wake up time. The lack of precision can result in an initial timing off-set error at the beginning of the sleep period and a final timing off-set error at the end of the sleep period. Both the frequency drift compensation unit and the off-set time compensation unit employ a high frequency transition-mode clock signal for use in calculating the time required to adjust the wake-up time. The transition-mode clock, which may have the same frequency as the active-mode clock, is employed only at the beginning and end of the sleep period and is deactivated throughout most of the sleep period to reduce power consumption.

Proceedings ArticleDOI
17 Aug 1999
TL;DR: This paper introduces an architectural approach for reducing inductive noise due to clock-gating through gradual activation/deactivation of units, which provides a 2× reduction in ground bounce on a 16 bit ALU simulated in SPICE, while reducing simulated SPEC95 performance by less than 5% on a typical superscalar architecture.
Abstract: As we approach Gigascale Integration, chip power consumption is becoming a critical system parameter. Clock-gating idle units provides needed reductions in power consumption. However, it introduces inductive noise that can limit voltage scaling. This paper introduces an architectural approach for reducing inductive noise due to clock-gating through gradual activation/deactivation of units. This technique provides a 2× reduction in ground bounce on a 16 bit ALU simulated in SPICE, while reducing simulated SPEC95 performance by less than 5% on a typical superscalar architecture.

Proceedings ArticleDOI
17 Aug 1999
TL;DR: Modifications of zero-skew tree algorithms are looked at to consider both the physical and logical aspects of hierarchical gating, applied to data taken from a low power ASIC design.
Abstract: Gating the clock is an important technique used in low power design to disable unused modules of a circuit. Gating can save power by both preventing unnecessary activity in the logic modules as well as by eliminating power dissipation in the clock distribution network. There is an inherent pitfall though in implementing gating groups for hierarchical gated clock distribution because the groups are typically developed at the logic level with no information of the physical layout of the clocktree. Depending on the distribution of underlying sinks, maintaining gating groups can cause a wiring overhead that is potentially greater than the savings due to reduced switching. We look at modifications of zero-skew tree algorithms to consider both the physical and logical aspects of hierarchical gating. The algorithms are applied to data taken from a low power ASIC design. The best gated clocktree is created using both physical and logical information.

Patent
13 Jul 1999
TL;DR: In this paper, a power supply controlling circuit by which further reduction of power consumption in a circuit can be achieved includes a clock controller, which detects a processing state of a module based on an amount of data stored in a FIFO memory.
Abstract: A power supply controlling circuit by which further reduction of power consumption in a circuit can be achieved includes a clock controller. The clock controller detects a processing state of a module based on an amount of data stored in a FIFO memory. For example, when the load to the module is not very high, the clock controller continuously lowers the frequency of a system clock signal to be supplied to the module and continuously lowers the power supply voltage to the module.

Proceedings ArticleDOI
10 Jan 1999
TL;DR: This paper presents a methodology to identify registers and flip flops in a circuit for which the clock input can be gated with a control signal and presents an algorithm to estimate the power saving obtained by gating the clock and the performance penalty associated with the introduction of gating logic.
Abstract: In synchronous circuits, the clock signal switches at every clock cycle and drives a large capacitance. As a result, the clock signal is a major source of dynamic power dissipation. Significant power savings can be obtained by identifying periods of inactivity in parts of the circuit, and disabling the clock to those parts of the circuit at the appropriate times. Selectively disabling the clock in this manner is referred to as clock gating. In this paper, we present a methodology to identify registers and flip flops in a circuit for which the clock input can be gated with a control signal. We also generate the combinational logic to produce this control signal. We present an algorithm to estimate the power saving obtained by gating the clock and the performance penalty (if any) associated with the introduction of gating logic. The algorithm generates the clock gating logic which is inserted appropriately into the original circuit to produce a low power, gated clock version of the circuit.
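A back-of-the-envelope version of the power-saving estimate mentioned in the abstract above can be written with the standard switched-capacitance model P = a·C·V²·f. The sketch below is not the paper's algorithm: the split into a register clock load and a gating-logic load, and all parameter names, are assumptions chosen to show why gating pays off only when the register is idle often enough.

```python
def switching_power(cap_farads, vdd_volts, freq_hz, activity=1.0):
    """Dynamic power of a node: activity * C * V^2 * f."""
    return activity * cap_farads * vdd_volts ** 2 * freq_hz


def gating_saving(c_register, c_gate_logic, vdd_volts, freq_hz, enable_probability):
    """Estimated clock power saved by gating one register bank (negative means a loss).

    c_register         -- clock capacitance of the gated registers (assumed known)
    c_gate_logic       -- extra capacitance switched by the gating logic every cycle (assumed)
    enable_probability -- fraction of cycles in which the registers must still be clocked
    """
    ungated = switching_power(c_register, vdd_volts, freq_hz)
    gated = (switching_power(c_register, vdd_volts, freq_hz, enable_probability)
             + switching_power(c_gate_logic, vdd_volts, freq_hz))
    return ungated - gated
```

If the registers are enabled in every cycle, the estimate turns negative, because the gating logic adds load without removing any clock activity.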

Proceedings ArticleDOI
24 May 1999
TL;DR: In this article, a wireless interconnect system has been proposed for global clock signal distribution, which transmits and receives signals at 20 GHz or higher, and the received signal is then amplified, frequency divided to 4 GHz or lower, and buffered to provide a clock signal to the local clock distribution system.
Abstract: A wireless interconnect system has been proposed for global clock signal distribution. The system transmits and receives signals at 20 GHz or higher. The received signal is then amplified, frequency divided to 4 GHz or lower, and buffered to provide a clock signal to the local clock distribution system. An analysis comparing the projected power dissipation of a wireless clock distribution system to conventional grid-based and H-tree based distribution systems for 0.1 μm generation microprocessors is performed, based on the total capacitive loading of the global distribution system. The results show that in terms of power dissipation, the wireless clock distribution system should be comparable to conventional systems.

Patent
16 Aug 1999
TL;DR: In this article, the efficiency of clock gating within low power clock trees has been investigated, where a correlation level between a plurality of clock gate signals and their corresponding gates which gate a source clock is determined.
Abstract: Methods are provided for improving the efficiency of clock gating within low power clock trees. In a first aspect, a correlation level between a plurality of clock gating signals and their corresponding gates which gate a source clock is determined. The clock gating signals and their corresponding gates are combined into a single clock gating signal and a single corresponding gate if a preselected level of correlation exists therebetween. In a second aspect, an area overlap is determined for a plurality of sinks, and one of the gated drivers of the sinks is removed. The sinks of the removed gated driver then are connected to a remaining gated driver driven by a single clock gating signal and a single corresponding gate. In a third aspect, physically proximate sink clusters are rewired to generate a pure clock gating group within each sink cluster if rewiring the clusters increases wiring length by less than a predetermined amount. In a fourth aspect, a clock gating group is selected and the power dissipation is computed for all sinks within the selected group assuming all the sinks therein are wired without clock gating. The power dissipation also is computed assuming all the sinks therein are gated. If the power dissipation for all sinks within the selected group is reduced by individually wiring the sinks therein, the group is ungated. A computer program product also is provided having a computer readable medium with means for performing the first, second, third and/or fourth aspects of the invention.
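The first aspect in the abstract above (combine gating signals when they are sufficiently correlated) can be illustrated with enable traces sampled per cycle. In this sketch the correlation measure is simply the fraction of cycles in which two traces agree, and the merged signal is their OR so that no sink loses a clock edge it needs; both choices, the threshold value, and the function names are assumptions rather than the patent's definitions.

```python
def agreement(trace_a, trace_b):
    """Fraction of cycles in which two equal-length enable traces have the same value."""
    matches = sum(1 for a, b in zip(trace_a, trace_b) if a == b)
    return matches / len(trace_a)


def merge_if_correlated(trace_a, trace_b, threshold=0.9):
    """Return a single merged enable trace, or None if the traces are not correlated enough."""
    if agreement(trace_a, trace_b) < threshold:
        return None
    # OR the enables: the shared gate is open whenever either group needs the clock
    return [a | b for a, b in zip(trace_a, trace_b)]
```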

Patent
David L. Thompson
09 Apr 1999
TL;DR: In this article, a data monitor is used to measure, sense or detect signals, inputs or outputs in a medical device before they are input to a principal or main digital signal processor, controller or microprocessor.
Abstract: Power consumption in medical and battery powered devices is reduced through the use and operation of a data monitor which measures, senses or detects signals, inputs or outputs in a medical device before they are input to a principal or main digital signal processor, controller or microprocessor. In response to detecting or measuring such a signal which meets certain amplitude, frequency and/or phase characteristics, the data monitor directs or controls clock or voltage supply circuits to increase or decrease clock frequency, or to increase or decrease the voltage provided to certain circuits within the medical device. The clock frequencies and/or voltages so employed are tailored to reduce the amount of power consumed by the medical device while preserving computational performance.
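
A minimal sketch of the monitoring idea follows. It assumes the monitor looks only at peak amplitude and switches between two operating points; the `PowerMode` type, the two modes, and their clock and voltage values are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class PowerMode:
    clock_hz: int    # clock frequency supplied to the main processor
    supply_v: float  # supply voltage supplied to the main processor


# Hypothetical operating points, chosen only for illustration.
LOW = PowerMode(clock_hz=32_768, supply_v=1.8)
HIGH = PowerMode(clock_hz=1_000_000, supply_v=3.0)


def select_mode(samples: Iterable[float], amplitude_threshold: float) -> PowerMode:
    """Pick the high-power mode only when the monitored signal's peak
    amplitude reaches the threshold; otherwise stay in the low-power mode."""
    peak = max((abs(s) for s in samples), default=0.0)
    return HIGH if peak >= amplitude_threshold else LOW
```

The abstract also mentions frequency and phase characteristics as triggers; this sketch omits them.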

Proceedings ArticleDOI
30 May 1999
TL;DR: This work describes a methodology for partitioning a design into large synchronous blocks each having its own clock and presents results of applying it to a realistic design done in 0.25 micron, showing that the net power savings compared to fully synchronous designs are on average about 30%.
Abstract: Clock nets are the major source of power consumption in large, high-performance ASICs and a design bottleneck when it comes to tolerable clock skew. A way to obviate the global clock net is to partition the design into large synchronous blocks each having its own clock. Data is exchanged with other blocks asynchronously using handshake signals. Adopting such a strategy requires a methodology that supports: 1) a partitioning method dividing a design into a number of synchronous blocks such that the gain due to global clock net removal exceeds the communication overhead and 2) synthesis of handshake protocols to implement the data transfer between synchronous blocks. We describe this methodology and present results of applying it to a realistic design done in 0.25 micron, ranging in operating frequencies from 20 MHz to 1 GHz. The results show that the net power savings compared to fully synchronous designs are on average about 30%.

Patent
Larry B. Li, Akbar Ali, Matteo Conta
30 Apr 1999
TL;DR: In this article, a gated delay-locked loop is proposed to generate an output clock in phase with and having a frequency which is an integer multiple of the frequency of a reference clock.
Abstract: A gated delay-locked loop that generates an output clock in phase with and having a frequency which is an integer multiple of the frequency of a reference clock. The gated delay-locked loop includes a voltage-controlled gated oscillator having first and second serially connected voltage-controlled delay elements that each introduce a time delay to produce a first delayed clock and the output clock. An S-R flip-flop receives the first delayed clock on its R-input and either the output clock or the reference clock on its S-input to produce a loop clock. The loop clock is provided to the first delay element. A multiplexer selects the reference clock as the S input to the flip-flop once every N cycles, and selects the output clock as the S input for the remaining N−1 cycles. A phase detector, a charge pump and a loop filter compare the phase of the output clock to the phase of the reference clock and apply a voltage to the delay elements to correct any phase differences.
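
The multiplexer's selection pattern can be sketched as below. This models only which source feeds the S input on each output-clock cycle, not the analog loop. The function name is hypothetical, and placing the reference selection on the first cycle of each group of N is an assumption.

```python
from typing import List


def s_input_schedule(n: int, cycles: int) -> List[str]:
    """Source selected for the flip-flop's S input on each output-clock cycle.

    "ref" marks a cycle re-aligned to the reference clock, "out" a cycle in
    which the oscillator recirculates its own output clock.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return ["ref" if k % n == 0 else "out" for k in range(cycles)]
```

For N = 4, the reference is selected on one cycle in four, which matches an output clock running at four times the reference frequency.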

Patent
04 Jan 1999
TL;DR: In this article, an integrated circuit chip comprises a plurality of clock distribution sub-networks each including a clock input for receiving a clock signal, each of the clock distribution sub-networks having a capacitance, as seen from the clock input, substantially equivalent to others of the sub-networks.
Abstract: An integrated circuit chip comprises a plurality of clock distribution sub-networks each including a clock input for receiving a clock signal, each of the clock distribution sub-networks having a capacitance, as seen from the clock input, substantially equivalent to others of the clock distribution sub-networks; and a structured clock buffer having a size based on a load of the clock distribution sub-networks, and providing the clock signal to the clock distribution sub-networks.

Patent
28 May 1999
TL;DR: In this article, an outer clock is inputted to a first phase conversion part 51 and the first phase part 51 selects an upper limit to which a feedback clock belongs in four upper limits.
Abstract: PROBLEM TO BE SOLVED: To provide a circuit for reducing the size of a jitter by outputting an inner clock in a clock phase correction circuit, outputting a feedback clock, comparing the phases of them, outputting a detection signal, outputting a control signal, inverting an outer clock, receiving the feedback clock and reducing the phase difference. SOLUTION: An outer clock is inputted to a first phase conversion part 51 and the first phase conversion part 51 selects an upper limit to which a feedback clock belongs in four upper limits. Two outputs Out 11 and Out 12 being the references of the upper limit are outputted. Outer clock Extclk and the inverse of Extclk are inputted to the first phase conversion part 51, and it outputs a signal A having the phase of 90 degrees and the signal, the inverse of A, which has the phase of 270 degrees, to a first multiplexer. A second phase conversion part 53 selects the signal A outputted by the first phase conversion part 51 and a signal Out 21 having the intermediate phase of the signal, the inverse of Extclk signal and transmits them to output Out 22.

Journal ArticleDOI
TL;DR: A nonfeedback CMOS digital-clock-generator, direct-skew-detect synchronous-mirror-delay (direct SMD) circuit has been developed that achieves clock-skew suppression in only two clock cycles for application-specific integrated circuits having unfixed and various clock paths.
Abstract: A nonfeedback CMOS digital-clock-generator, direct-skew-detect synchronous-mirror-delay (direct SMD) circuit has been developed that achieves clock-skew suppression in only two clock cycles for application-specific integrated circuits having unfixed and various clock paths. The direct SMD circuit detects both clock skew and clock cycle by using a direct-skew detector and clock-suspension circuitry. The skew-detection scheme removes the phase errors caused by delay in the clock-driver circuit. Measurements demonstrated that the direct SMD circuit eliminates various amounts of clock skew (2.0-3.0 ns) at 200 MHz in two clock cycles.