Showing papers on "Clock gating" published in 2008


Journal ArticleDOI
TL;DR: This paper describes an integrated network-on-chip architecture containing 80 tiles arranged as an 8x10 2-D array of floating-point cores and packet-switched routers, both designed to operate at 4 GHz.
Abstract: This paper describes an integrated network-on-chip architecture containing 80 tiles arranged as an 8x10 2-D array of floating-point cores and packet-switched routers, both designed to operate at 4 GHz. Each tile has two pipelined single-precision floating-point multiply accumulators (FPMAC) which feature a single-cycle accumulation loop for high throughput. The on-chip 2-D mesh network provides a bisection bandwidth of 2 Terabits/s. The 15-FO4 design employs mesochronous clocking, fine-grained clock gating, dynamic sleep transistors, and body-bias techniques. In a 65-nm eight-metal CMOS process, the 275 mm² custom design contains 100 M transistors. The fully functional first silicon achieves over 1.0 TFLOPS of performance on a range of benchmarks while dissipating 97 W at 4.27 GHz and 1.07 V supply.

645 citations
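
As a rough sanity check on the figures in the entry above, the theoretical peak throughput can be estimated from the tile count, the two FPMACs per tile, and the clock frequency. This is only a sketch: counting each multiply-accumulate as two floating-point operations per cycle is an assumption not stated in the abstract, and the function name is illustrative.

```python
def peak_flops(tiles, fpmacs_per_tile, clock_hz, flops_per_mac=2):
    """Theoretical peak, assuming every FPMAC retires one multiply-accumulate per cycle.

    flops_per_mac=2 (one multiply plus one add) is an assumption, not a figure
    taken from the abstract.
    """
    return tiles * fpmacs_per_tile * flops_per_mac * clock_hz


# Figures quoted in the abstract: 80 tiles, 2 FPMACs per tile, 4.27 GHz, 97 W.
peak = peak_flops(tiles=80, fpmacs_per_tile=2, clock_hz=4.27e9)

# The abstract reports "over 1.0 TFLOPS" at 97 W, which gives a lower bound
# on energy efficiency in FLOPS per watt.
min_flops_per_watt = 1.0e12 / 97

print(f"peak estimate: {peak / 1e12:.2f} TFLOPS")
print(f"efficiency lower bound: {min_flops_per_watt / 1e9:.1f} GFLOPS/W")
```

Under these assumptions the peak estimate is an upper bound that sits above the reported "over 1.0 TFLOPS", as expected for measured benchmark performance.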


Patent
17 Sep 2008
TL;DR: A Chien search apparatus evaluates an error locator polynomial having a known rank and including a sequence of terms for each element in a finite field whose elements correspond respectively to bits in each of a stream of data blocks to be decoded; a power saving unit de-activates functional units that would compute terms whose degree exceeds the rank.
Abstract: Chien search apparatus operative to evaluate an error locator polynomial having a known rank and including a sequence of terms for each element in a finite field whose elements correspond respectively to bits in each of a stream of data blocks to be decoded, the apparatus comprising a sequence of functional units each operative to compute a corresponding term in the sequence of terms included in the error locator polynomial, each term having a degree; and a power saving unit operative to de-activate at least one individual functional unit from among the sequence of functional units, the individual functional unit being operative, when active, to compute a term whose degree exceeds the rank.

153 citations
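
The power-saving idea in the Chien search entry above can be sketched in software: one "functional unit" per polynomial term, with the units for degrees above the polynomial's rank left idle. This is a minimal illustration, not the patented apparatus. The field GF(2^4), its primitive polynomial, the reading of "rank" as the polynomial's actual degree, and all function names are assumptions.

```python
# GF(2^4) built from the primitive polynomial x^4 + x + 1 (an assumed choice).
PRIM_POLY = 0b10011
FIELD_BITS = 4
FIELD_SIZE = (1 << FIELD_BITS) - 1  # 15 nonzero field elements

# Exponential and logarithm tables for multiplication in the field.
EXP = [0] * (2 * FIELD_SIZE)
LOG = [0] * (FIELD_SIZE + 1)
_value = 1
for _power in range(FIELD_SIZE):
    EXP[_power] = _value
    LOG[_value] = _power
    _value <<= 1
    if _value & (1 << FIELD_BITS):
        _value ^= PRIM_POLY
for _power in range(FIELD_SIZE, 2 * FIELD_SIZE):
    EXP[_power] = EXP[_power - FIELD_SIZE]


def gf_mul(a, b):
    """Multiply two field elements using the log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def chien_search(locator):
    """Find exponents i such that locator(alpha^i) == 0.

    locator holds coefficients [c0, c1, ..., c_max]; len(locator) - 1 models
    the number of hardware term units. Units for degrees above the rank
    (here: the highest nonzero coefficient) are skipped, mimicking
    de-activation. Returns (root_exponents, active_units).
    """
    rank = max(d for d, c in enumerate(locator) if c != 0)
    roots = []
    for i in range(FIELD_SIZE):
        total = locator[0]
        for degree in range(1, rank + 1):  # only the active units compute
            term = gf_mul(locator[degree], EXP[(i * degree) % FIELD_SIZE])
            total ^= term  # addition in GF(2^m) is XOR
        if total == 0:
            roots.append(i)
    return roots, rank


# Hypothetical two-error locator (1 + alpha^3 x)(1 + alpha^7 x), padded to
# four term units as if the decoder could correct up to four errors.
c1 = EXP[3] ^ EXP[7]
c2 = gf_mul(EXP[3], EXP[7])
roots, active_units = chien_search([1, c1, c2, 0, 0])
```

Here only two of the four modeled units do any work, and the roots are the inverses of the assumed error locators alpha^3 and alpha^7.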


Proceedings ArticleDOI
10 Mar 2008
TL;DR: This book explores existing solutions for power-aware test and design-for-test of conventional circuits and systems, and surveys test strategies and EDA solutions for testing low power devices.
Abstract: Managing the power consumption of circuits and systems is now considered one of the most important challenges for the semiconductor industry. Elaborate power management strategies, such as dynamic voltage scaling, clock gating or power gating techniques, are used today to control the power dissipation during functional operation. The usage of these strategies has various implications on manufacturing test, and power-aware test is therefore increasingly becoming a major consideration during design-for-test and test preparation for low power devices. This book explores existing solutions for power-aware test and design-for-test of conventional circuits and systems, and surveys test strategies and EDA solutions for testing low power devices.

148 citations


Book ChapterDOI
27 Jan 2008
TL;DR: This paper investigates the influence of register file partitions, register file sizes and the interconnection topology of ADRES, and proposes an enhanced architecture instantiation that improves performance by 60-70% and reduces energy by 50%.
Abstract: Reconfigurable architectures provide power efficiency, flexibility and high performance for next generation embedded multimedia devices. ADRES, the IMEC Coarse-Grained Reconfigurable Array architecture, and its compiler DRESC enable the design of reconfigurable 2D array processors with arbitrary functional units, register file organizations and interconnection topologies. This creates an enormous design space, making it difficult to find optimized architectures. Therefore, architectural explorations aiming at energy and performance trade-offs become a major effort. In this paper we investigate the influence of register file partitions, register file sizes and the interconnection topology of ADRES. We analyze power, performance and energy-delay trade-offs using IDCT and FFT as benchmarks while targeting 90nm technology. We also explore quantitatively the influences of several hierarchical optimizations for power by applying specific hardware techniques, i.e., clock gating and operand isolation. As a result, we propose an enhanced architecture instantiation that improves performance by 60-70% and reduces energy by 50%.

92 citations


Proceedings ArticleDOI
07 Apr 2008
TL;DR: This paper proposes a generic way to instrument a SystemC/TLM platform in order to model power consumption at a coarse grain; the approach has been applied to model an advanced DVFS architecture based on a network-on-chip.
Abstract: With growing integration, power consumption is becoming a challenging issue for mobile systems. Today's complex SoCs integrate advanced power management strategies, at both HW and SW level. HW mechanisms such as clock gating, power switches or voltage and frequency scaling dynamically optimize the power profile. In such architectures, power estimation at application level is a major concern for proper power optimization. SystemC at the transaction level is adapted and largely adopted by the industry as a simulation tool. We propose in this paper a generic way to instrument a SystemC/TLM platform in order to model power consumption at a coarse grain. The proposed approach has been applied to model an advanced DVFS architecture based on a network-on-chip.

63 citations


Patent
16 May 2008
TL;DR: An IC card includes a low precision clock, within the subset of electronic components that stay powered, for measuring time during the main clock stop status, in which the main clock signal is suspended to avoid exceeding a maximum power consumption threshold.
Abstract: An IC Card may include electronic components to receive a power supply and a main clock signal by a reader device. The power supply may be provided to a subset of the electronic components during a main clock stop status wherein the main clock signal is suspended for avoiding a maximum power consumption threshold. The IC Card may also include a low precision clock included in the subset of electronic components for measuring time in the main clock stop status.

63 citations


Journal ArticleDOI
TL;DR: In this paper, the authors describe RF1 and RF2, two level-clocked test-chips that deploy resonant clocking to reduce power consumption in their clock distribution networks.
Abstract: This paper describes RF1 and RF2, two level-clocked test-chips that deploy resonant clocking to reduce power consumption in their clock distribution networks. It also highlights RCL, a novel resonant-clock latch-based methodology that was used to design the two test-chips. RF1 and RF2 are 8-bit 14-tap finite-impulse response (FIR) filters with identical architectures. Designed using a fully automated ASIC design flow, they have been fabricated in a commercial 0.13 µm bulk silicon process. RF1 operates at clock frequencies in the 0.8-1.2 GHz range and uses a single-phase clocking scheme with a driven clock generator. Resonating its 42 pF clock load at 1.03 GHz with Vdd = 1.13 V, RF1 dissipates 132 mW, achieving a clock power reduction of 76% over conventional switching. RF2 achieves higher clock power efficiency than RF1 by relying on a two-phase clocking scheme with a distributed self-resonant clock generator. Resonating 38 pF of clock load per phase at 1.01 GHz with Vdd = 1.08 V, RF2 dissipates 124 mW and achieves 84% reduction in clock power over conventional switching. At 133 nW/MHz/Tap/InBit/CoeffBit, RF2 features the lowest figure of merit for FIR filters published to date.

58 citations
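
To put the RF1 figures in the entry above in context, the textbook LC-tank relation f = 1 / (2π√(LC)) gives the inductance needed to resonate a given clock load, and C·Vdd²·f gives the power a conventional full-swing clock would burn on the same load. This is an idealized sketch: the abstract does not state the inductance actually used, and the lossless-tank model and function names are assumptions.

```python
import math


def resonant_inductance(capacitance_f, frequency_hz):
    """Inductance (henries) that resonates with the given capacitance.

    Derived from f = 1 / (2 * pi * sqrt(L * C)) for an ideal, lossless tank.
    """
    omega = 2 * math.pi * frequency_hz
    return 1.0 / (omega ** 2 * capacitance_f)


def conventional_clock_power(capacitance_f, vdd, frequency_hz):
    """Dynamic power (watts) of a conventionally switched clock: C * Vdd^2 * f."""
    return capacitance_f * vdd ** 2 * frequency_hz


# RF1 figures quoted in the abstract: 42 pF clock load, 1.03 GHz, Vdd = 1.13 V.
rf1_inductance = resonant_inductance(42e-12, 1.03e9)
rf1_conventional = conventional_clock_power(42e-12, 1.13, 1.03e9)

print(f"ideal tank inductance: {rf1_inductance * 1e9:.2f} nH")
print(f"conventional clock power: {rf1_conventional * 1e3:.1f} mW")
```

Under these idealized assumptions the tank inductance comes out at roughly 0.57 nH and the conventional clock power at about 55 mW; the 76% reduction quoted in the abstract is a measured result and is not derived here.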


Proceedings ArticleDOI
01 Feb 2008
TL;DR: By modifying the Cell Broadband Engine processor to incorporate a large resonant global clock network, power savings with full functionality are demonstrated over a 20% range in clock frequencies, including a 6-8 W power saving at 4 GHz.
Abstract: Resonant clocking techniques show promise in reducing global clock power and timing uncertainty (skew and jitter). By resonating the large global clock capacitance with an inductance, the energy used to charge the clock node each period can be recycled within the LC tank network, resulting in lower clock power. Additional power savings are realized by reducing the strength of clock drivers because only losses need to be overcome at resonance. Skew and jitter are improved due to the bandpass characteristic of the LC network and the use of fewer clock buffering stages. We describe how the Cell Broadband Engine (Cell BE) processor is experimentally transformed to have a resonant-load global clock distribution similar to the one in (Chan et al., 2004).

50 citations


Patent
22 Feb 2008
TL;DR: A method of synchronizing counters in two different clock domains within a memory device comprises generating a start signal that initiates a running count of read clock pulses in a first counter downstream of a locked loop, and delaying the input of the start signal to a second counter upstream of the locked loop so that the running count of control clock pulses starts later by a predetermined delay.
Abstract: A method of synchronizing counters in two different clock domains within a memory device is comprised of generating a start signal for initiating production of a running count of clock pulses of a read clock signal in a first counter downstream of a locked loop and delaying the input of the start signal to a second counter upstream of the locked loop to delay the initiation of a running count of control clock pulses by an amount equal to a predetermined delay. Another disclosed method is for controlling the output of data from a memory device comprising deriving from an external clock signal a control clock for operating an array of storage cells and a read clock, both the control clock and the read clock being comprised of clock pulses. A start signal is generated for initiating production of a running count of the read clock pulses in a first counter. The start signal may be produced when a locked loop achieves a lock between the read clock and the control clock. The input of the start signal to a second counter is delayed to delay the initiation of a running count of the control clock pulses. The delay, which may be expressed as an integer number of clock cycles, may be equal to an input/output delay of the memory device. The method may be modified by inputting the start signal to an offset counter before initiating the production of the running count of the read clock pulses in the first counter. The offset counter may be loaded with a value equal to a programmed latency less a synchronization overhead. Once the running counts are initiated, each time a read command is received, a then current value of the running count of control clock pulses from the second counter is latched or held. The held value is compared to the running count of read clock pulses from the first counter, with the read clock signal being used to output data in response to the comparison. Apparatus for implementing the disclosed methods are also disclosed. 

50 citations


Patent
Yantao Ma
28 Oct 2008
TL;DR: Memories, multi-phase clock signal generators, and methods for generating multi-phase duty cycle corrected clock signals are disclosed; one such generator uses a delay-locked loop with a first multi-tap adjustable delay line and takes the corrected clock signals from the taps of a second multi-tap adjustable delay line.
Abstract: Memories, multi-phase clock signal generators, and methods for generating multi-phase duty cycle corrected clock signals are disclosed. For example, one such clock signal generator includes a delay-locked loop having a first multi-tap adjustable delay line configured to delay a reference signal to provide a plurality of clock signals having different phases relative to the reference clock signal. A periodic signal generated by the delay-locked loop is provided to a second multi-tap adjustable delay line as an input clock signal. Clock signals from taps of the second multi-tap adjustable delay line are provided as the multi-phase duty cycle corrected clock signals.

47 citations


Proceedings ArticleDOI
01 Nov 2008
TL;DR: This paper proposes the design methodology for reducing the delay of the logic circuits during active mode, which limits the maximum value of transition current to a specified value and eliminates short circuit current.
Abstract: Multi-threshold CMOS (MTCMOS) power gating is a design technique in which a power gating transistor is connected between the logic transistors and either power or ground, thus creating a virtual supply rail or virtual ground rail, respectively. Power gating transistor sizing, transition (sleep mode to active mode) current, short circuit current and transition time are design issues for power gating design. The use of power gating results in a delay overhead in the active mode. If both nMOS and pMOS sleep transistors are used in power gating, the delay overhead increases. This paper proposes a design methodology for reducing the delay of the logic circuits during active mode. This methodology limits the maximum value of transition current to a specified value and eliminates short circuit current. Experimental results show a 16.83% reduction in the delay.

Proceedings ArticleDOI
08 Jun 2008
TL;DR: This paper proposes to exploit the existing clock gating in order to extract stronger gating conditions for blocks that are poorly gated or not gated at all, and presents a uniform treatment of unobservability and stability as dual approaches for propagating gating conditions forward and backward.
Abstract: Clock gating has become a standard practice for saving dynamic power in the clock network. Due to design reuse, it is common to find designs that already have some partial clock gating. We propose to exploit the existing clock gating in order to extract stronger gating conditions for blocks that are poorly gated or not gated at all. A second contribution of our paper is a robust and scalable approach to extract stability conditions for clock gating. Finally, we present a uniform treatment of unobservability and stability as dual approaches for propagating gating conditions forward and backward. Experimental results demonstrate significant power reduction (in the range of 14%-55% of the clock power) on Intel microprocessor designs.

Patent
24 Jun 2008
TL;DR: A charge pump system is formed on an integrated circuit that can be connected to an external power supply; its clock generator circuit provides the clock at whose frequency the charge pump operates, and that frequency is a decreasing function of the external supply voltage, which reduces power consumption.
Abstract: A charge pump system is formed on an integrated circuit that can be connected to an external power supply. The system includes a charge pump and a clock generator circuit. The clock circuit is coupled to provide a clock output, at whose frequency the charge pump operates and generates an output voltage from an input voltage. The clock frequency is a decreasing function of the voltage level of the external power supply. This allows for reducing power consumption in the charge pump system formed on a circuit connectable to an external power supply.

Journal ArticleDOI
TL;DR: Low-power designs for the synchronizer and channel estimator units of the Inner Receiver in wireless local area network systems are proposed and the use of multiple clock domains and clock gating reduces the power consumption.
Abstract: In this paper, we propose low-power designs for the synchronizer and channel estimator units of the Inner Receiver in wireless local area network systems. The objective of the work is the optimization, with respect to power, area, and latency, of both the signal processing algorithms themselves and their implementation. Novel circuit design strategies have been employed to realize optimal hardware and power efficient architectures for the fast Fourier transform, arc tangent computation unit, numerically controlled oscillator, and the decimation filters. The use of multiple clock domains and clock gating reduces the power consumption further. These blocks have been integrated into an experimental digital baseband processor for the IEEE 802.11a standard implemented in the 0.25-µm 5-metal-layer BiCMOS technology from Institute for High Performance Microelectronics.

Proceedings ArticleDOI
08 Jun 2008
TL;DR: This paper presents a novel clock tree design style, called type-matching clock tree, to ensure that the logic gates at the same level are in the same type, and proposes a zero skew gated clock tree synthesis algorithm that can significantly reduce the clock skew in every process corner.
Abstract: Clock skew minimization is always very important in clock tree synthesis. Due to clock gating, the clock tree may include different types of logic gates, e.g., AND gates, OR gates, and buffer gates. If the logic gates at the same level are of different types, which have different timing behaviors, the control of clock skew becomes difficult. Based on that observation, in this paper, we present a novel clock tree design style, called type-matching clock tree, to ensure that the logic gates at the same level are of the same type. We prove that any clock control logic can always be transformed to our type-matching clock tree. Then, based on the idea of type-matching clock tree, we propose a zero skew gated clock tree synthesis algorithm. Compared with the industry-strength gated clock tree synthesis, experimental data show that our approach can significantly reduce the clock skew in every process corner with a small penalty on the clock tree area and the clock tree power consumption.

Proceedings ArticleDOI
02 Jul 2008
TL;DR: This work presents a compact and low power NTRU design that is suitable for pervasive security applications such as RFIDs and sensor nodes and is also the first one to present a complete NTRU design with encryption/decryption circuitry.
Abstract: NTRU is a public-key cryptosystem based on the shortest vector problem in a lattice which is an alternative to RSA and ECC. This work presents a compact and low power NTRU design that is suitable for pervasive security applications such as RFIDs and sensor nodes. We have designed two architectures: one is only capable of encryption and the other performs both encryption and decryption. The strategy for the designs includes clock gating of registers, operand isolation and precomputation. This work is also the first one to present a complete NTRU design with encryption/decryption circuitry. Our encryption-only NTRU design has a gate-count of 2.8 kgates and dynamic power consumption of 1.72 µW. Moreover, the encryption-decryption NTRU design consumes about 6 µW dynamic power and consists of 10.5 kgates.

Proceedings ArticleDOI
01 Apr 2008
TL;DR: This work proposes task activity vectors describing which functional units a task uses to what degree and implemented several vector-based scheduling strategies for Linux, showing that vector- based scheduling considerably reduces hotspots.
Abstract: Non-uniform utilization of functional units in combination with hardware mechanisms such as clock gating leads to different power consumptions in different parts of a processor chip. This in turn leads to non-uniform temperature distributions and problematic local hotspots, depending on the characteristics of the currently running task. The operating system's scheduler, responsible for deciding which task to run at what time, can influence temperature distribution. Our work investigates what the operating system can do to alleviate the problem of hotspots. We propose task activity vectors describing which functional units a task uses to what degree. With the knowledge provided by these vectors, the scheduler can schedule tasks using different units successively, distribute tasks using a particular unit excessively over the system's processors, or mix tasks using different units on a SMT processor. We implemented several vector-based scheduling strategies for Linux. Our evaluations show that vector-based scheduling considerably reduces hotspots.
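
The scheduling idea in the entry above can be sketched as follows: each task carries an activity vector (utilization per functional unit), and the scheduler prefers the ready task that puts the least load on units that are currently hot. This is only an illustration of the concept, not the authors' Linux implementation. The unit names, the dot-product score, the exponential decay model, and all function names are assumptions.

```python
def heat_score(vector, unit_heat):
    """Weighted overlap between a task's unit usage and current unit heat."""
    return sum(usage * unit_heat.get(unit, 0.0) for unit, usage in vector.items())


def pick_next_task(ready_tasks, unit_heat):
    """Return the name of the ready task that stresses hot units the least.

    ready_tasks maps task name -> activity vector (unit name -> utilization).
    """
    return min(ready_tasks, key=lambda name: heat_score(ready_tasks[name], unit_heat))


def update_heat(unit_heat, vector, decay=0.5):
    """Crude thermal model (an assumption): old heat decays, new activity adds."""
    units = set(unit_heat) | set(vector)
    return {u: decay * unit_heat.get(u, 0.0) + vector.get(u, 0.0) for u in units}


# Hypothetical tasks and starting temperatures, in arbitrary units.
ready = {
    "fp_heavy": {"fpu": 0.9, "alu": 0.1},
    "int_heavy": {"fpu": 0.05, "alu": 0.8},
}
heat = {"fpu": 0.9, "alu": 0.2}

first = pick_next_task(ready, heat)       # the FPU is hot, so avoid fp_heavy
heat = update_heat(heat, ready[first])
second = pick_next_task(ready, heat)      # now the ALU is the hotter unit
```

With these made-up numbers the scheduler alternates between the integer-heavy and the floating-point-heavy task, which corresponds to the abstract's strategy of scheduling tasks that use different units successively.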

Proceedings ArticleDOI
24 Nov 2008
TL;DR: CTX effectively reduces launch switching activity, thus yield loss risk, even with a small number of don't care (X) bits as in test compression, without any impact on test data volume, fault coverage, performance, and circuit design.
Abstract: At-speed scan testing is susceptible to yield loss risk due to power supply noise caused by excessive launch switching activity. This paper proposes a novel two-stage scheme, namely CTX (Clock-Gating-Based Test Relaxation and X-Filling), for reducing switching activity when test stimulus is launched. Test relaxation and X-filling are conducted (1) to make as many FFs inactive as possible by disabling corresponding clock-control signals of clock-gating circuitry in Stage-1 (Clock-Disabling), and (2) to make as many remaining active FFs as possible have equal input and output values in Stage-2 (FF-Silencing). CTX effectively reduces launch switching activity, thus yield loss risk, even with a small number of don't care (X) bits as in test compression, without any impact on test data volume, fault coverage, performance, and circuit design.

Patent
29 May 2008
TL;DR: In this paper, a system and method for time synchronization on a network is provided, where a slave clock device does not continuously receive a time synchronization message periodically transferred from a master clock device and thus does not correct its time upon all such occasions.
Abstract: A system and method for time synchronization on a network is provided. According to the system and method for time synchronization, a slave clock device does not continuously receive a time synchronization message periodically transferred from a master clock device and thus does not correct its time upon all such occasions. Rather, the slave clock device requests time information from the master clock device only when the slave clock device needs to correct its time, and receives a time synchronization message transferred from the master clock device and compensates for its time deviation only while the slave clock device is activated, thereby reducing its power consumption and amount of computation.

Patent
21 May 2008
TL;DR: In this paper, a high-voltage complementary metal-oxide semiconductor (CMOS) charge pump is presented, which includes a first Dickson charge pump for doubling a supply voltage based on an input clock signal and a complementary input signal with reversed phases to each other.
Abstract: Provided is a high-voltage complementary metal-oxide semiconductor (CMOS) charge pump. The high-voltage CMOS charge pump includes a first Dickson charge pump for doubling a supply voltage based on an input clock signal and a complementary input clock signal with reversed phases to each other; a level shifter for doubling voltage levels of the input clock signal and the complementary input clock signal based on an output signal and a complementary output signal of the first Dickson charge pump as power sources, to thereby output a doubled-output clock signal and a doubled-complementary output clock signal; and a second Dickson charge pump for doubling voltage levels of the output signal and the complementary output signal based on the doubled-output clock signal and the doubled-complementary output clock signal from the level shifter.

Journal ArticleDOI
TL;DR: An all-digital fast-locking programmable DLL-based clock generator is presented, and by resetting the output clock every two input clock periods, the initial minimal delay constraint in the conventional architecture is eliminated.
Abstract: An all-digital fast-locking programmable DLL-based clock generator is presented. By resetting the output clock every two input clock periods, the initial minimal delay constraint in the conventional architecture is eliminated. Compared with previous work, a short locking time is also achieved. The proposed circuit has been fabricated in a 0.35-µm CMOS process and occupies an active area of 0.216 mm². The clock multiplication ratio is programmable from 2 to 15. The frequency ranges of the input and output clocks are 4-200 MHz and 60-450 MHz, respectively. It dissipates less than 17 mW at all operating frequencies from a 3.3-V supply.

Patent
06 Aug 2008
TL;DR: In this paper, the authors propose to gate the input clock signal at the clock input to each domain of the processor device rather than at the output of the PLL clock, which can provide a more natural state for the circuit during testing as well as allow the test control unit to test the different domains of the device individually.
Abstract: An apparatus or method for testing of a SOC processor device may minimize interference that is caused by interfacing a comparatively low-speed testing device with the high-speed processor during testing. Implementations may gate the input clock signal at the clock input to each domain of the SOC processor device rather than at the output of the PLL clock. The gating of the clock signal to each domain may then be controlled by clock stop signals generated by the testing device and sent to the individual domains of the processor device. Gating the clock signal at the domain may provide a more natural state for the circuit during testing as well as allow the test control unit to test the different domains of the SOC device individually.

Patent
05 Feb 2008
TL;DR: A system includes a system controller and a configuration of series-connected semiconductor devices, where each such device has an input for receiving a clock signal originating from a previous device and an output for providing a synchronized clock signal destined for a succeeding device.
Abstract: A system includes a system controller and a configuration of series-connected semiconductor devices. Such a device includes an input for receiving a clock signal originating from a previous device, and an output for providing a synchronized clock signal destined for a succeeding device. The device further includes a clock synchronizer for producing the synchronized clock signal by processing the received clock signal and an earlier version of the synchronized clock signal. The device further includes a device controller for adjusting a parameter used by the clock synchronizer in processing the earlier version of the synchronized clock signal. The system controller has an output for providing a first clock signal to a first device, and an input for receiving a second clock signal from a second device. The second clock signal corresponds to a version of the first clock signal that has undergone processing by a clock synchronizer in at least one of the devices. The system controller further includes a detector for processing the first and second clock signals to detect a phase difference therebetween; and a synchronization controller for commanding an adjustment to the clock synchronizer in at least one of the devices based on the phase difference detected by the detector.

Proceedings ArticleDOI
07 Apr 2008
TL;DR: An innovative power gating approach is proposed, which in addition to targeting maximum reduction of major leakage currents will provide a way to control ground bounce during power mode transition, which will have an additional intermediate HOLD mode along with conventional CUTOFF and RUN modes.
Abstract: Conventional power gating techniques for minimizing leakage currents introduce ground bounce noise during power mode transition. Here an analysis of ground bounce due to power mode transition in power gating structures is presented. An innovative power gating approach is proposed, which in addition to targeting maximum reduction of major leakage currents will provide a way to control ground bounce during power mode transition. The proposed power gating technique will have an additional intermediate HOLD mode along with conventional CUTOFF and RUN modes. Its stepwise turning on feature will provide higher reduction of the magnitude of peak current and voltage glitches in the power distribution network as well as the minimum time required to stabilize power and ground as compared to other similar techniques.

Patent
28 Oct 2008
TL;DR: In this article, a multi-reference phase-locked loop (MPLL) is proposed to generate a high speed clock frequency and phase lock it to a lowest common reference frequency derived from a selected one of at least two reference clocks.
Abstract: A multi reference phase locked loop (MPLL) generates a high speed clock frequency and phase locks it to a lowest common reference frequency derived from a selected one of at least two reference clocks. One of the reference clocks is a system reference clock in a FBDIMM system; another may be a forwarded clock in an AMB 2. A prescaler reduces the frequency of at least the forwarded clock to the lowest common reference frequency which is the frequency of the system reference clock. A PLL at the core of the MPLL may be locked to the forwarded clock or the system reference clock for generating a high speed clock. A feedback divider generates the feedback clock for the PLL as well as other clocks required in the system. Furthermore, the MPLL provides a number of clocking modes, including modes to facilitate testing and powering down of sections of the circuitry for conserving power.

Proceedings ArticleDOI
08 Jun 2008
TL;DR: A new method is introduced for automatically synthesizing conditions under which the transition of a register may be safely blocked in a way that minimizes netlist perturbation and is both timing- and physical-aware.
Abstract: Clock gating is the insertion of combinational logic along the clock path to prevent the unnecessary switching of registers and reduce dynamic power consumption. The conditions under which the transition of a register may be safely blocked can either be explicitly specified by the designer or detected automatically. We introduce a new method for automatically synthesizing these conditions in a way that minimizes netlist perturbation and is both timing- and physical-aware. Our automatic method is also scalable, utilizing simulation and satisfiability tests and necessitating no symbolic representation. On a set of benchmarks, our technique successfully reduces the dynamic clock power by 14.5% on average. Furthermore, we demonstrate how to apply a straightforward logic simplification to utilize resulting don't cares and reduce the logic by 7.0% on average.
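
The notion of a "safe" gating condition in the entry above can be made concrete with a toy register: a condition is safe if, whenever it holds, the register's next value equals its current value, so the clock pulse can be blocked without changing behavior. This is a sketch only. Exhaustive enumeration over a 3-bit register stands in for the paper's simulation and satisfiability tests, and the register model and function names are assumptions.

```python
from itertools import product

WIDTH = 3  # small register width so every case can be enumerated


def next_state(q, data_in, enable):
    """Register with a feedback mux: loads data_in only when enable is set."""
    return data_in if enable else q


def gating_condition_is_safe(condition):
    """True if the register never needs to change while the condition holds."""
    values = range(1 << WIDTH)
    for q, data_in, enable in product(values, values, (0, 1)):
        if condition(q, data_in, enable) and next_state(q, data_in, enable) != q:
            return False
    return True


def hold_when_disabled(q, data_in, enable):
    return not enable


def hold_when_unchanged(q, data_in, enable):
    # Stronger condition: also gate when the value being loaded equals q.
    return (not enable) or data_in == q


def unsafe_guess(q, data_in, enable):
    return data_in == 0  # wrong: loading 0 into a nonzero register changes it


def clock_pulses_delivered(inputs, condition, q=0):
    """Run (data_in, enable) pairs; return (pulses reaching the register, final q)."""
    pulses = 0
    for data_in, enable in inputs:
        if not condition(q, data_in, enable):
            pulses += 1
            q = next_state(q, data_in, enable)
    return pulses, q


trace = [(5, 1), (5, 1), (2, 0), (7, 1)]
gated = clock_pulses_delivered(trace, hold_when_unchanged)
ungated = clock_pulses_delivered(trace, lambda q, d, en: False)
```

On this made-up trace the gated register receives two clock pulses instead of four and ends in the same state, which is the dynamic-power saving clock gating aims for.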

Proceedings ArticleDOI
04 May 2008
TL;DR: This paper proposes a clock distribution network for a 3D multilayer core microprocessor that reduces the power lost in long interconnects at block level and in the clock distribution, along with a methodology for turning off the global clock grid and the logic for an entire layer in a 3D stack.
Abstract: Clock distribution networks are extremely critical from a performance and power standpoint. They account for about 20-30% of the total power dissipated in current generation microprocessors. Many three-dimensional (3D) schemes propose to reduce interconnect length to improve performance and decrease power consumption. In this paper we propose a clock distribution network for a 3D multilayer core microprocessor. The 3D microprocessor floor plan has a single core folded onto multiple layers. A separate layer for the clock distribution network is proposed in the 3D microprocessor. This arrangement of a 3D chip stack reduces (a) power lost in long interconnects at block level and (b) in the clock distribution. Simulation results indicate a 15-20% power saving for this clock distribution scheme as compared to a 2D structure. A methodology for turning off the global clock grid along with the logic for an entire layer in a 3D stack is also proposed. Simulation results indicate an additional 8-10% savings in power with minimal impact on the critical parameters of the clock grid.

Proceedings ArticleDOI
08 Dec 2008
TL;DR: This paper describes a process for using Boolean tester measurements for determining the settings of the tunable buffers and shows that frequency improvements of 10% or more are possible by appropriate setting of tunable clock buffers.
Abstract: Optical shrink for process migration, manufacturing process variation and dynamic voltage control leads to clock skew as well as path delay variation in a manufactured chip. Since such variations are difficult to predict in pre-silicon phase, tunable clock buffers have been used in several microprocessor designs. The buffer delays are tuned to improve maximum operating clock frequency of a design. This however shifts the burden of finding tuning settings for individual clock buffers to the test process. In this paper, we describe a process for using Boolean tester measurements for determining the settings of the tunable buffers. The results show that frequency improvements of 10% or more are possible by appropriate setting of tunable clock buffers.

Patent
28 May 2008
TL;DR: In this paper, a method of optimizing clock-gated circuitry in an integrated circuit (IC) design is provided, where the clock gates gate a plurality of sequential elements in the IC design.
Abstract: A method of optimizing clock-gated circuitry in an integrated circuit (IC) design is provided. A plurality of signals which feed into enable inputs of a plurality of clock gates is determined, where the clock gates gate a plurality of sequential elements in the IC design. Combinational logic which is shared among the plurality of signals is identified. The clock-gated circuitry is transformed into multiple levels of clock-gating circuitry based on the shared combinational logic.
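
The transformation described in the entry above can be illustrated with a simplified model in which each clock-gate enable is a pure AND of named signals. Logic shared by all enables is factored out into a first-level gate, and the remainder drives second-level gates. This is a sketch under that assumption, not the patented method; real enable logic is arbitrary combinational logic, and the signal and function names are hypothetical.

```python
from itertools import product


def factor_shared_enable(enables):
    """Split AND-only enables into a shared first level and per-group second level.

    enables maps a register group name -> set of signal names AND-ed together.
    Returns (shared_terms, second_level) where second_level maps each group to
    the terms left over after removing the shared ones.
    """
    term_sets = [set(terms) for terms in enables.values()]
    shared = set.intersection(*term_sets)
    second_level = {name: set(terms) - shared for name, terms in enables.items()}
    return shared, second_level


def and_of(terms, assignment):
    """Evaluate the AND of the named signals under a signal -> bool assignment."""
    return all(assignment[t] for t in terms)


# Hypothetical enables for two groups of sequential elements.
enables = {
    "regs_a": {"mode_on", "valid", "sel_a"},
    "regs_b": {"mode_on", "valid", "sel_b"},
}
shared, second_level = factor_shared_enable(enables)

# Check equivalence: the two-level gating must enable each group exactly when
# the original single-level enable did, for every input combination.
signals = sorted(set().union(*enables.values()))
equivalent = all(
    and_of(enables[g], a) == (and_of(shared, a) and and_of(second_level[g], a))
    for bits in product((False, True), repeat=len(signals))
    for a in [dict(zip(signals, bits))]
    for g in enables
)
```

In this example the first-level gate is driven by `mode_on AND valid`, so both register groups stop receiving clock activity from a single shared gate whenever either signal is low.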

Proceedings ArticleDOI
01 Apr 2008
TL;DR: A clock synchronization algorithm with drift compensation that implements this symmetric error paradigm is presented and it is shown that the remaining error is symmetric and in the range of the clock granularity.
Abstract: In this paper we argue that achieving symmetric errors is the key to an improved understanding of clock synchronization. We present a clock synchronization algorithm with drift compensation that implements this symmetric error paradigm. The performance of the algorithm is evaluated by measurements in an indoor testbed using the TinyNode hardware platform. We show that the remaining error is symmetric and in the range of the clock granularity.