Showing papers on "Clock gating" published in 2008

Journal Article

An 80-Tile Sub-100-W TeraFLOPS Processor in 65-nm CMOS

Sriram R. Vangal¹, Jason Howard¹, Greg Ruhl¹, Saurabh Dighe¹, H. Wilson¹, James W. Tschanz¹, D. Finan¹, A. Singh¹, Tiju Jacob¹, Shailendra Jain¹, Vasantha Erraguntla¹, Clark Roberts¹, Yatin Hoskote¹, Nitin Borkar¹, Shekhar Borkar¹

Intel¹

28 Jan 2008-IEEE Journal of Solid-state Circuits

TL;DR: This paper describes an integrated network-on-chip architecture containing 80 tiles arranged as an 8x10 2-D array of floating-point cores and packet-switched routers, both designed to operate at 4 GHz.

Abstract: This paper describes an integrated network-on-chip architecture containing 80 tiles arranged as an 8x10 2-D array of floating-point cores and packet-switched routers, both designed to operate at 4 GHz. Each tile has two pipelined single-precision floating-point multiply accumulators (FPMAC) which feature a single-cycle accumulation loop for high throughput. The on-chip 2-D mesh network provides a bisection bandwidth of 2 Terabits/s. The 15-FO4 design employs mesochronous clocking, fine-grained clock gating, dynamic sleep transistors, and body-bias techniques. In a 65-nm eight-metal CMOS process, the 275 mm² custom design contains 100 M transistors. The fully functional first silicon achieves over 1.0 TFLOPS of performance on a range of benchmarks while dissipating 97 W at 4.27 GHz and 1.07 V supply.

645 citations
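
A quick back-of-the-envelope check relates the abstract's figures to one another. The sketch assumes each FPMAC retires one multiply-accumulate per cycle and counts it as two floating-point operations; that counting convention is our assumption, not a number stated in the abstract.

```python
# Back-of-the-envelope throughput figures for the 80-tile processor.
# Assumption: each FPMAC retires one multiply-accumulate per cycle,
# counted as 2 floating-point operations (1 multiply + 1 add).
TILES = 80
FPMACS_PER_TILE = 2
FLOPS_PER_FPMAC_PER_CYCLE = 2


def peak_flops(clock_hz: float) -> float:
    """Peak FLOP/s if every FPMAC issues on every cycle."""
    return TILES * FPMACS_PER_TILE * FLOPS_PER_FPMAC_PER_CYCLE * clock_hz


def flops_per_watt(flops: float, watts: float) -> float:
    return flops / watts


print(f"peak @ 4.00 GHz: {peak_flops(4.00e9) / 1e12:.2f} TFLOPS")
print(f"peak @ 4.27 GHz: {peak_flops(4.27e9) / 1e12:.2f} TFLOPS")
# Reported operating point: over 1.0 TFLOPS while dissipating 97 W.
print(f"efficiency: {flops_per_watt(1.0e12, 97.0) / 1e9:.1f} GFLOPS/W")
```

Under that convention the peak works out to 1.28 TFLOPS at 4 GHz and about 1.37 TFLOPS at 4.27 GHz, so the measured figure is at least roughly 73% of peak, at around 10 GFLOPS/W.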

Patent

Chien-search system employing a clock-gating scheme to save power for error correction decoder and other applications

Hanan Weingarten, Eli Sterin, Ofir Avraham Kanter

17 Sep 2008

TL;DR: A Chien search apparatus evaluates an error locator polynomial of known rank for each element of a finite field whose elements correspond to bits of the data blocks being decoded, and a power saving unit de-activates the functional units that compute terms whose degree exceeds the rank.

Abstract: Chien search apparatus operative to evaluate an error locator polynomial having a known rank and including a sequence of terms for each element in a finite field whose elements correspond respectively to bits in each of a stream of data blocks to be decoded, the apparatus comprising a sequence of functional units each operative to compute a corresponding term in the sequence of terms included in the error locator polynomial, each term having a degree; and a power saving unit operative to de-activate at least one individual functional unit from among the sequence of functional units, the individual functional unit being operative, when active, to compute a term whose degree exceeds the rank.

153 citations
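
The power-saving idea can be illustrated with a toy Chien search over GF(2^4). In this sketch, which is an illustration under stated assumptions rather than the patented circuit, each term of the error locator polynomial has its own functional unit holding a register that is multiplied by α^k on every step, and units for terms whose degree exceeds the polynomial's rank (taken here to be its degree) are never activated. The field polynomial x^4 + x + 1 and all helper names are hypothetical choices.

```python
# Toy Chien search over GF(2^4) with primitive polynomial x^4 + x + 1.
# Hypothetical illustration, not the patented circuit: "unit" k holds
# r_k = L_k * alpha^(i*k) for the term L_k x^k of
# Lambda(x) = 1 + L_1 x + ... + L_t x^t, and is clocked only if k <= rank.

M = 4
N = (1 << M) - 1            # 15 non-zero field elements
PRIM = 0b10011              # x^4 + x + 1

EXP = [0] * (2 * N)         # EXP[i] = alpha^i (doubled to skip a modulo)
LOG = [0] * (N + 1)         # LOG[alpha^i] = i
value = 1
for i in range(N):
    EXP[i] = value
    LOG[value] = i
    value <<= 1
    if value & (1 << M):
        value ^= PRIM
for i in range(N, 2 * N):
    EXP[i] = EXP[i - N]


def gf_mul(a: int, b: int) -> int:
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def chien_search(coeffs, t):
    """coeffs[k] = L_k for k = 0..t, with coeffs[0] == 1.

    Returns (exponents i with Lambda(alpha^i) == 0, unit-steps executed).
    """
    rank = max(k for k, c in enumerate(coeffs) if c)    # degree of Lambda
    active = [k for k in range(1, t + 1) if k <= rank]  # others stay gated
    regs = {k: coeffs[k] for k in active}
    roots, unit_steps = [], 0
    for i in range(N):
        acc = coeffs[0]                         # field addition is XOR
        for k in active:
            acc ^= regs[k]
            regs[k] = gf_mul(regs[k], EXP[k])   # now L_k * alpha^((i+1)k)
            unit_steps += 1
        if acc == 0:
            roots.append(i)
    return roots, unit_steps


# Two errors at positions 2 and 5: Lambda(x) = (1 + a^2 x)(1 + a^5 x).
lam = [1, EXP[2] ^ EXP[5], gf_mul(EXP[2], EXP[5]), 0, 0]   # capacity t = 4
roots, steps = chien_search(lam, t=4)
print(roots, steps)   # roots are alpha^-2 and alpha^-5, i.e. i = 13 and 10
```

With an error-correction capacity of t = 4 but only two errors, half of the term units stay idle (30 unit-steps instead of 60), which is the kind of saving the abstract describes.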

Proceedings Article

Power-Aware Testing and Test Strategies for Low Power Devices

Dimitris Gizopoulos, Kaushik Roy¹, Patrick Girard, Nicola Nicolici², Xiaoqing Wen³

Purdue University¹, McMaster University², Kyushu Institute of Technology³

10 Mar 2008

TL;DR: This book explores existing solutions for power-aware test and design-for-test of conventional circuits and systems, and surveys test strategies and EDA solutions for testing low power devices.

Abstract: Managing the power consumption of circuits and systems is now considered one of the most important challenges for the semiconductor industry. Elaborate power management strategies, such as dynamic voltage scaling, clock gating or power gating techniques, are used today to control the power dissipation during functional operation. The usage of these strategies has various implications on manufacturing test, and power-aware test is therefore increasingly becoming a major consideration during design-for-test and test preparation for low power devices. This book explores existing solutions for power-aware test and design-for-test of conventional circuits and systems, and surveys test strategies and EDA solutions for testing low power devices.

148 citations

Book Chapter

Architecture enhancements for the ADRES coarse-grained reconfigurable array

Frank Bouwens¹, Mladen Berekovic¹, Bjorn De Sutter², Georgi Gaydadjiev¹

Delft University of Technology¹, Katholieke Universiteit Leuven²

27 Jan 2008

TL;DR: This paper investigates the influence of register file partitions, register file sizes and the interconnection topology of ADRES, and proposes an enhanced architecture instantiation that improves performance by 60 - 70% and reduces energy by 50%.

Abstract: Reconfigurable architectures provide power efficiency, flexibility and high performance for next generation embedded multimedia devices. ADRES, the IMEC Coarse-Grained Reconfigurable Array architecture, and its compiler DRESC enable the design of reconfigurable 2D array processors with arbitrary functional units, register file organizations and interconnection topologies. This creates an enormous design space, making it difficult to find optimized architectures. Therefore, architectural explorations aiming at energy and performance trade-offs become a major effort. In this paper we investigate the influence of register file partitions, register file sizes and the interconnection topology of ADRES. We analyze power, performance and energy delay trade-offs using IDCT and FFT as benchmarks while targeting 90nm technology. We also explore quantitatively the influences of several hierarchical optimizations for power by applying specific hardware techniques, i.e., clock gating and operand isolation. As a result, we propose an enhanced architecture instantiation that improves performance by 60-70% and reduces energy by 50%.

92 citations
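
Operand isolation, one of the two hardware techniques the abstract names, can be sketched with a toy toggle count. The model below is hypothetical (random 16-bit operands, a result that is needed on about 25% of cycles, latch-style isolation that holds the previous input) and is not the ADRES implementation; it only shows why freezing unused inputs saves switching.

```python
import random


def hamming(a: int, b: int) -> int:
    """Number of bit positions that differ, i.e. input-bus toggles."""
    return bin(a ^ b).count("1")


def input_toggles(operands, used, isolate: bool) -> int:
    """Count bit toggles on a functional unit's input bus.

    With isolate=True the bus holds its last value on cycles whose result
    is not needed (latch-style operand isolation).
    """
    prev = 0
    toggles = 0
    for value, needed in zip(operands, used):
        if isolate and not needed:
            continue              # isolation latch closed: bus keeps prev
        toggles += hamming(prev, value)
        prev = value
    return toggles


rng = random.Random(1)
operands = [rng.randrange(1 << 16) for _ in range(10_000)]
used = [rng.random() < 0.25 for _ in operands]    # result needed ~25% of cycles
plain = input_toggles(operands, used, isolate=False)
isolated = input_toggles(operands, used, isolate=True)
print(f"input toggles: {plain} without isolation, {isolated} with isolation")
```

In this model isolation can never increase toggles: by the triangle inequality for Hamming distance, one jump from the held value to the next used operand costs at most as many bit flips as the path through all the intermediate, unused operands.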

Proceedings Article

Power Modeling in SystemC at Transaction Level, Application to a DVFS Architecture

H. Lebreton, P. Vivet

07 Apr 2008

TL;DR: This paper proposes a generic way to instrument a SystemC/TLM platform in order to model power consumption at a coarse grain, and applies it to model an advanced DVFS architecture based on a network-on-chip.

Abstract: With growing integration, power consumption is becoming a challenging issue for mobile systems. Today's complex SoCs integrate advanced power management strategies, at both HW and SW level. HW mechanisms such as clock gating, power switches or voltage and frequency scaling optimize dynamically the power profile. In such architectures, power estimation at application level is a major concern for proper power optimization. SystemC at the transaction level is adapted and largely adopted by the industry as a simulation tool. We propose in this paper a generic way to instrument a SystemC/TLM platform in order to model power consumption at a coarse grain. The proposed approach has been applied to model an advanced DVFS architecture based on a network-on-chip.

63 citations
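
A coarse-grain power model of the kind the abstract describes can be reduced to per-interval energy bookkeeping. The Python sketch below stands in for SystemC instrumentation; the class name, the effective capacitance, and the activity factor are hypothetical, leakage is ignored, and the dynamic power estimate P = a·C·V²·f is the textbook CMOS approximation rather than the paper's model.

```python
from dataclasses import dataclass, field


@dataclass
class PowerModel:
    """Per-component energy bookkeeping (hypothetical, coarse grain)."""
    c_eff: float                                  # effective switched capacitance (F)
    energy_j: float = 0.0
    log: list = field(default_factory=list)       # (duration, vdd, f, power) records

    def account(self, duration_s: float, vdd: float, freq_hz: float,
                activity: float) -> None:
        """Add the dynamic energy of one interval at a fixed operating point."""
        power_w = activity * self.c_eff * vdd ** 2 * freq_hz   # P = a*C*V^2*f
        self.energy_j += power_w * duration_s
        self.log.append((duration_s, vdd, freq_hz, power_w))


# The same 1e6-cycle task at two hypothetical DVFS operating points.
CYCLES = 1_000_000
fast = PowerModel(c_eff=1e-9)
fast.account(CYCLES / 1.0e9, vdd=1.2, freq_hz=1.0e9, activity=0.5)
slow = PowerModel(c_eff=1e-9)
slow.account(CYCLES / 0.5e9, vdd=0.9, freq_hz=0.5e9, activity=0.5)
print(f"fast: {fast.energy_j * 1e3:.3f} mJ in {CYCLES / 1.0e9 * 1e3:.1f} ms")
print(f"slow: {slow.energy_j * 1e3:.3f} mJ in {CYCLES / 0.5e9 * 1e3:.1f} ms")
```

Because energy per cycle scales with V², running the same number of cycles at the lower voltage/frequency point costs (0.9/1.2)² ≈ 56% of the energy here while taking twice as long; that trade-off is what a DVFS-aware power model lets an architect explore.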

Patent

IC card with low precision clock

Francesco Varone¹, Pasquale Vastano¹, Amedeo Veneroso¹

STMicroelectronics¹

16 May 2008

TL;DR: An IC card keeps a subset of its electronic components powered while the main clock signal from the reader is suspended to avoid exceeding a maximum power consumption threshold, and includes a low precision clock in that subset to measure time during the main clock stop status.

Abstract: An IC Card may include electronic components to receive a power supply and a main clock signal by a reader device. The power supply may be provided to a subset of the electronic components during a main clock stop status wherein the main clock signal is suspended for avoiding a maximum power consumption threshold. The IC Card may also include a low precision clock included in the subset of electronic components for measuring time in the main clock stop status.

63 citations

Journal Article

Resonant-Clock Latch-Based Design

Visvesh S. Sathe¹, Jerry Chang-Jui Kao², Marios C. Papaefthymiou

Advanced Micro Devices¹, University of Michigan²

31 Mar 2008-IEEE Journal of Solid-state Circuits

TL;DR: In this paper, the authors describe RF1 and RF2, two level-clocked test-chips that deploy resonant clocking to reduce power consumption in their clock distribution networks.

Abstract: This paper describes RF1 and RF2, two level-clocked test-chips that deploy resonant clocking to reduce power consumption in their clock distribution networks. It also highlights RCL, a novel resonant-clock latch-based methodology that was used to design the two test-chips. RF1 and RF2 are 8-bit 14-tap finite-impulse response (FIR) filters with identical architectures. Designed using a fully automated ASIC design flow, they have been fabricated in a commercial 0.13-µm bulk silicon process. RF1 operates at clock frequencies in the 0.8-1.2 GHz range and uses a single-phase clocking scheme with a driven clock generator. Resonating its 42 pF clock load at 1.03 GHz with Vdd = 1.13 V, RF1 dissipates 132 mW, achieving a clock power reduction of 76% over conventional switching. RF2 achieves higher clock power efficiency than RF1 by relying on a two-phase clocking scheme with a distributed self-resonant clock generator. Resonating 38 pF of clock load per phase at 1.01 GHz with Vdd = 1.08 V, RF2 dissipates 124 mW and achieves 84% reduction in clock power over conventional switching. At 133 nW/MHz/Tap/InBit/CoeffBit, RF2 features the lowest figure of merit for FIR filters published to date.

58 citations
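
The clock loads, supplies, and frequencies quoted above are enough for a rough estimate of what conventional (non-resonant) switching would cost, using the idealized P = C·V²·f for a load fully charged and discharged every cycle. Treating the paper's conventional-switching baseline as exactly C·V²·f is our simplifying assumption, so the outputs are estimates, not reported figures.

```python
# Idealized clock power of a conventionally switched clock: P = C * V^2 * f
# (full charge and discharge of the clock load every cycle).  Inputs come
# from the RF1/RF2 abstract; the exact baseline model is an assumption.

def cv2f(c_farads: float, vdd: float, f_hz: float) -> float:
    return c_farads * vdd ** 2 * f_hz


rf1_conv = cv2f(42e-12, 1.13, 1.03e9)            # single-phase clock load
rf2_conv = 2 * cv2f(38e-12, 1.08, 1.01e9)        # two phases, 38 pF each

for name, p_conv, reduction in (("RF1", rf1_conv, 0.76), ("RF2", rf2_conv, 0.84)):
    print(f"{name}: ~{p_conv * 1e3:.1f} mW conventional clock power, "
          f"~{p_conv * (1 - reduction) * 1e3:.1f} mW after a {reduction:.0%} cut")
```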

Proceedings Article

A Resonant Global Clock Distribution for the Cell Broadband-Engine Processor

S. Chan¹, Phillip J. Restle¹, T.J. Bucelot¹, S. Weitzel¹, J. Keaty¹, John S. Liberty¹, Brian Flachs¹, R. Volant¹, P. Kapusta¹, J.S. Zimmerman¹

IBM¹

01 Feb 2008

TL;DR: By modifying the Cell broadband engine processor to incorporate a large resonant global clock network, power savings with full functionality is demonstrated over a 20% range in clock frequencies, and a 6-8 Watt power savings at 4 GHz.

Abstract: Resonant clocking techniques show promise in reducing global clock power and timing uncertainty (skew and jitter). By resonating the large global clock capacitance with an inductance, the energy used to charge the clock node each period can be recycled within the LC tank network, resulting in lower clock power. Additional power savings are realized by reducing the strength of clock drivers because only losses need to be overcome at resonance. Skew and jitter are improved due to the bandpass characteristic of the LC network and the use of fewer clock buffering stages. We describe how the Cell Broadband Engine (Cell BE) processor is experimentally transformed to have a resonant-load global clock distribution similar to the one in (Chan et al., 2004).

50 citations
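
The LC-tank argument above rests on the ideal resonance condition f0 = 1/(2π·sqrt(L·C)). The sketch below solves it for the inductance needed at 4 GHz; the 100 pF capacitance per inductor is a hypothetical placeholder, since the abstract does not give the Cell BE clock-grid capacitance.

```python
import math


def resonant_frequency(l_henry: float, c_farad: float) -> float:
    """Ideal LC tank: f0 = 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_henry * c_farad))


def inductance_for(f_hz: float, c_farad: float) -> float:
    """Inductance that resonates c_farad at f_hz (same relation, solved for L)."""
    return 1.0 / ((2.0 * math.pi * f_hz) ** 2 * c_farad)


C_SECTOR = 100e-12      # hypothetical clock capacitance served by one inductor
L = inductance_for(4.0e9, C_SECTOR)
print(f"L = {L * 1e12:.1f} pH resonates {C_SECTOR * 1e12:.0f} pF at "
      f"{resonant_frequency(L, C_SECTOR) / 1e9:.2f} GHz")
```

Real resonant grids spread the load over many inductors and must also account for losses and for the frequency range over which resonance remains useful; the sketch covers only the ideal lossless relation.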

Patent

Method and apparatus for initialization of read latency tracking circuit in high-speed DRAM

James Brian Johnson¹, Brent Keeth¹, Feng Dan Lin¹

Micron Technology¹

22 Feb 2008

TL;DR: A method synchronizes counters in two different clock domains within a memory device: a start signal initiates a running count of read clock pulses in a first counter downstream of a locked loop, while the start signal's input to a second counter upstream of the locked loop is delayed so that the running count of control clock pulses begins after a predetermined delay.

Abstract: A method of synchronizing counters in two different clock domains within a memory device is comprised of generating a start signal for initiating production of a running count of clock pulses of a read clock signal in a first counter downstream of a locked loop and delaying the input of the start signal to a second counter upstream of the locked loop to delay the initiation of a running count of control clock pulses by an amount equal to a predetermined delay. Another disclosed method is for controlling the output of data from a memory device comprising deriving from an external clock signal a control clock for operating an array of storage cells and a read clock, both the control clock and the read clock being comprised of clock pulses. A start signal is generated for initiating production of a running count of the read clock pulses in a first counter. The start signal may be produced when a locked loop achieves a lock between the read clock and the control clock. The input of the start signal to a second counter is delayed to delay the initiation of a running count of the control clock pulses. The delay, which may be expressed as an integer number of clock cycles, may be equal to an input/output delay of the memory device. The method may be modified by inputting the start signal to an offset counter before initiating the production of the running count of the read clock pulses in the first counter. The offset counter may be loaded with a value equal to a programmed latency less a synchronization overhead. Once the running counts are initiated, each time a read command is received, a then current value of the running count of control clock pulses from the second counter is latched or held. The held value is compared to the running count of read clock pulses from the first counter, with the read clock signal being used to output data in response to the comparison. Apparatus for implementing the disclosed methods are also disclosed. 

50 citations
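
The compare-against-a-latched-count mechanism can be modeled in a single cycle-time base. In this simplified sketch (hypothetical parameters; the two clock domains, the locked loop, and the offset counter are not modeled) the read-side counter starts `offset` cycles after the control-side counter, so data for a command is released exactly `offset` cycles after the command arrives. How the patent derives that offset from the programmed latency, synchronization overhead, and I/O delay is not reproduced.

```python
# Simplified single-time-base model of latency tracking with two free-running
# counters.  Hypothetical parameters; clock domains and the locked loop are
# not modeled.

def release_cycles(read_cmds, ctrl_start: int, offset: int, horizon: int):
    read_start = ctrl_start + offset
    pending = []          # latched control counts awaiting release (in order)
    released = []
    for cycle in range(horizon):
        ctrl_count = cycle - ctrl_start if cycle >= ctrl_start else None
        read_count = cycle - read_start if cycle >= read_start else None
        if cycle in read_cmds and ctrl_count is not None:
            pending.append(ctrl_count)           # latch ("hold") the count
        while pending and read_count is not None and pending[0] == read_count:
            pending.pop(0)
            released.append(cycle)               # drive data out this cycle
    return released


cmds = {12, 13, 20}
out = release_cycles(cmds, ctrl_start=3, offset=7, horizon=40)
print(out)
```

Because each latched value is compared with a free-running count, back-to-back commands need no per-command timers: the commands at cycles 12, 13, and 20 are released at 19, 20, and 27.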

Patent

Multi-phase duty-cycle corrected clock signal generator and memory having same

Yantao Ma¹

Micron Technology¹

28 Oct 2008

TL;DR: Memories, multi-phase clock signal generators, and methods for generating multi-phase duty-cycle corrected clock signals are disclosed, including a clock signal generator whose delay-locked loop has a first multi-tap adjustable delay line.

Abstract: Memories, multi-phase clock signal generators, and methods for generating multi-phase duty cycle corrected clock signals are disclosed. For example, one such clock signal generator includes a delay-locked loop having a first multi-tap adjustable delay line configured to delay a reference signal to provide a plurality of clock signals having different phases relative to the reference clock signal. A periodic signal generated by the delay-locked loop is provided to a second multi-tap adjustable delay line as an input clock signal. Clock signals from taps of the second multi-tap adjustable delay line are provided as the multi-phase duty cycle corrected clock signals.

47 citations
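
Once a delay-locked loop has locked the total delay of its line to one reference period, equally spaced taps yield equally spaced phases. The sketch below computes those ideal tap delays and phases; it assumes N identical taps spanning exactly one period and leaves out mismatch, jitter, and the second multi-tap delay line the patent uses to produce the duty-cycle corrected outputs. The 400 MHz reference is an arbitrary example.

```python
# Ideal multi-phase generation from a DLL-locked delay line (hypothetical
# model): with the line locked to one period T, tap k lags the reference by
# k*T/N, i.e. 360*k/N degrees.

def tap_phases(period_s: float, n_taps: int):
    """Return (delay_seconds, phase_degrees) for taps 0..n_taps-1."""
    step = period_s / n_taps
    return [(k * step, 360.0 * k / n_taps) for k in range(n_taps)]


for delay, phase in tap_phases(period_s=1 / 400e6, n_taps=4):
    print(f"{delay * 1e12:7.1f} ps  {phase:5.1f} deg")
```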

Proceedings Article

Multi-threshold CMOS design for low power digital circuits

S. Hemantha¹, Amit Dhawan¹, Haranath Kar¹

Motilal Nehru National Institute of Technology Allahabad¹

01 Nov 2008

TL;DR: This paper proposes a design methodology for reducing the active-mode delay of power-gated logic circuits; the methodology limits the maximum transition current to a specified value and eliminates short circuit current.

Abstract: Multi-threshold CMOS (MTCMOS) power gating is a design technique in which a power gating transistor is connected between the logic transistors and either power or ground, thus creating a virtual supply rail or virtual ground rail, respectively. Power gating transistor sizing, transition (sleep mode to active mode) current, short circuit current and transition time are design issues for power gating design. The use of power gating results in a delay overhead in the active mode. If both nMOS and pMOS sleep transistors are used in power gating, the delay overhead increases. This paper proposes a design methodology for reducing the delay of the logic circuits during active mode. This methodology limits the maximum value of transition current to a specified value and eliminates short circuit current. Experimental results show a 16.83% reduction in delay.

Proceedings Article

A new paradigm for synthesis and propagation of clock gating conditions

Ranan Fraer¹, Gila Kamhi¹, Muhammad K. Mhameed¹

Intel¹

08 Jun 2008

TL;DR: This paper proposes to exploit the existing clock gating in order to extract stronger gating conditions for blocks that are poorly gated or not gated at all, and presents a uniform treatment of unobservability and stability as dual approaches for propagating gating conditions forward and backward.

Abstract: Clock gating has become a standard practice for saving dynamic power in the clock network. Due to design reuse, it is common to find designs that already have some partial clock gating. We propose to exploit the existing clock gating in order to extract stronger gating conditions for blocks that are poorly gated or not gated at all. A second contribution of our paper is a robust and scalable approach to extract stability conditions for clock gating. Finally, we present a uniform treatment of unobservability and stability as dual approaches for propagating gating conditions forward and backward. Experimental results demonstrate significant power reduction (in the range of 14% to 55% of the clock power) on Intel microprocessor designs.
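
The intuition behind propagating an existing gating condition forward can be shown with a two-register toy. This is a hypothetical illustration, not the paper's algorithm: register A already has an enable, register B loads a function of A every cycle, and B's clock is gated whenever A was not clocked on the previous edge, because B would then only reload the value it already holds. The simulation checks the gated B against an always-clocked reference copy.

```python
import random


def f(a: int) -> int:
    return (3 * a + 1) & 0xFF          # arbitrary combinational logic D_B = f(A)


def simulate(n_cycles: int, p_enable: float, seed: int = 0):
    rng = random.Random(seed)
    a = 0
    b_ref = b_gated = f(a)             # both copies of B start consistent
    prev_en_a = True                   # conservative: clock B on the first edge
    gated_edges = mismatches = 0
    for _ in range(n_cycles):
        en_a = rng.random() < p_enable
        new_a = rng.randrange(256)
        # --- clock edge: all registers sample their current D inputs ---
        d_b = f(a)
        b_ref = d_b                    # reference B is clocked every cycle
        if prev_en_a:                  # derived condition: delayed enable of A
            b_gated = d_b
        else:
            gated_edges += 1           # B's clock suppressed on this edge
        if en_a:
            a = new_a                  # existing clock gating on A
        prev_en_a = en_a
        mismatches += b_ref != b_gated
    return gated_edges, mismatches


gated, bad = simulate(10_000, p_enable=0.3)
print(f"B clock gated on {gated} of 10000 edges, mismatches: {bad}")
```

The backward (unobservability) direction, gating a register whose value no downstream logic will use, is the dual case and is not modeled here.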

Patent

Clock Generator Circuit for a Charge Pump

Qui Vi Nguyen¹, Feng Pan¹, Jonathan Huynh¹

SanDisk¹

24 Jun 2008

TL;DR: A charge pump system on an integrated circuit connectable to an external power supply includes a clock generator whose output frequency, at which the charge pump operates, is a decreasing function of the external supply voltage, reducing the system's power consumption.

Abstract: A charge pump system is formed on an integrated circuit that can be connected to an external power supply. The system includes a charge pump and a clock generator circuit. The clock circuit is coupled to provide a clock output, at whose frequency the charge pump operates and generates an output voltage from an input voltage. The clock frequency is a decreasing function of the voltage level of the external power supply. This allows for reducing power consumption in the charge pump system formed on a circuit connectable to an external power supply.

Journal Article

Low-Power VLSI Implementation of the Inner Receiver for OFDM-Based WLAN Systems

Alfonso Troya, Koushik Maharatna¹, Milos Krstic¹, Eckhard Grass¹, Ulrich Jagdhold¹, Rolf Kraemer¹

Innovations for High Performance Microelectronics¹

14 Mar 2008-IEEE Transactions on Circuits and Systems I-regular Papers

TL;DR: Low-power designs are proposed for the synchronizer and channel estimator units of the Inner Receiver in wireless local area network systems; the use of multiple clock domains and clock gating further reduces power consumption.

Abstract: In this paper, we propose low-power designs for the synchronizer and channel estimator units of the Inner Receiver in wireless local area network systems. The objective of the work is the optimization, with respect to power, area, and latency, of both the signal processing algorithms themselves and their implementation. Novel circuit design strategies have been employed to realize optimal hardware and power efficient architectures for the fast Fourier transform, arc tangent computation unit, numerically controlled oscillator, and the decimation filters. The use of multiple clock domains and clock gating reduces the power consumption further. These blocks have been integrated into an experimental digital baseband processor for the IEEE 802.11a standard implemented in the 0.25-µm 5-metal-layer BiCMOS technology from the Institute for High Performance Microelectronics.
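
Of the blocks listed, the numerically controlled oscillator is the easiest to illustrate in isolation. The sketch is a generic phase-accumulator NCO, not the paper's architecture (which the abstract does not describe); the 24-bit accumulator and 1024-entry sine table are arbitrary choices, and the 20 MHz sample rate and 312.5 kHz subcarrier spacing of IEEE 802.11a serve only as example numbers.

```python
import math

ACC_BITS = 24                       # phase accumulator width (arbitrary)
LUT_BITS = 10                       # sine table has 2**10 entries (arbitrary)
SINE_LUT = [math.sin(2 * math.pi * k / (1 << LUT_BITS))
            for k in range(1 << LUT_BITS)]


def fcw_for(f_out: float, f_s: float) -> int:
    """Frequency control word: f_out = fcw * f_s / 2**ACC_BITS."""
    return round(f_out * (1 << ACC_BITS) / f_s)


def nco(fcw: int, n_samples: int, phase: int = 0):
    """Generate n_samples of a sine wave from a wrapping phase accumulator."""
    mask = (1 << ACC_BITS) - 1
    out = []
    for _ in range(n_samples):
        out.append(SINE_LUT[phase >> (ACC_BITS - LUT_BITS)])  # top bits -> table
        phase = (phase + fcw) & mask                            # wraps modulo 2**24
    return out


F_S = 20e6                          # IEEE 802.11a baseband sample rate
fcw = fcw_for(312.5e3, F_S)         # one subcarrier spacing, as an example
samples = nco(fcw, 64)              # 64 samples = exactly one period here
print(fcw, fcw * F_S / (1 << ACC_BITS))
```

Frequency resolution is f_s / 2^24 ≈ 1.19 Hz here; in OFDM synchronizers an NCO of this kind commonly drives the derotation that removes an estimated carrier frequency offset.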

Proceedings Article

Type-matching clock tree for zero skew clock gating

Chia-Ming Chang¹, Shih-Hsu Huang¹, Yuan-Kai Ho¹, Jia-Zong Lin¹, Hsin-Po Wang, Yu-Sheng Lu

Chung Yuan Christian University¹

08 Jun 2008

TL;DR: This paper presents a novel clock tree design style, called type-matching clock tree, to ensure that the logic gates at the same level are in the same type, and proposes a zero skew gated clock tree synthesis algorithm that can significantly reduce the clock skew in every process corner.

Abstract: Clock skew minimization is always very important in the clock tree synthesis. Due to clock gating, the clock tree may include different types of logic gates, e.g., AND gates, OR gates, and buffer gates. If the logic gates at the same level are in different types, which have different timing behaviors, the control of clock skew becomes difficult. Based on that observation, in this paper, we present a novel clock tree design style, called type-matching clock tree, to ensure that the logic gates at the same level are in the same type. We prove that any clock control logic can always be transformed to our type-matching clock tree. Then, based on the idea of type-matching clock tree, we propose a zero skew gated clock tree synthesis algorithm. Compared with the industry-strength gated clock tree synthesis, experimental data show that our approach can significantly reduce the clock skew in every process corner with a small penalty on the clock tree area and the clock tree power consumption.
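
The type-matching property itself is easy to state as a check over tree levels. The sketch below only verifies the property on a toy tree encoded as nested (gate_type, children) tuples, a hypothetical representation; it does not reproduce the paper's transformation proof or its zero-skew synthesis algorithm. In the matched example the second branch uses an AND gate where the mixed tree had a buffer (an AND whose enable is held high behaves logically like a buffer), one illustrative way to equalize types at a level.

```python
from collections import defaultdict


def level_types(tree):
    """Map depth -> set of gate types found at that depth."""
    levels = defaultdict(set)
    stack = [(tree, 0)]
    while stack:
        (gate, children), depth = stack.pop()
        levels[depth].add(gate)
        stack.extend((child, depth + 1) for child in children)
    return dict(levels)


def is_type_matching(tree) -> bool:
    """True if every level of the clock tree uses a single gate type."""
    return all(len(types) == 1 for types in level_types(tree).values())


leaf = ("BUF", ())
mixed = ("BUF", (("AND", (leaf, leaf)), ("BUF", (leaf, leaf))))
matched = ("BUF", (("AND", (leaf, leaf)), ("AND", (leaf, leaf))))
print(is_type_matching(mixed), is_type_matching(matched))
```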

Proceedings Article

Low-cost implementations of NTRU for pervasive security

A.C. Atici, Lejla Batina¹, Junfeng Fan¹, Ingrid Verbauwhede¹, S.B.O. Yalcin²

Katholieke Universiteit Leuven¹, Istanbul Technical University²

02 Jul 2008

TL;DR: This work presents a compact and low power NTRU design that is suitable for pervasive security applications such as RFIDs and sensor nodes and is also the first one to present a complete NTRU design with encryption/decryption circuitry.

Abstract: NTRU is a public-key cryptosystem based on the shortest vector problem in a lattice, and is an alternative to RSA and ECC. This work presents a compact and low power NTRU design that is suitable for pervasive security applications such as RFIDs and sensor nodes. We have designed two architectures: one is only capable of encryption, and the other performs both encryption and decryption. The strategy for the designs includes clock gating of registers, operand isolation and precomputation. This work is also the first to present a complete NTRU design with encryption/decryption circuitry. Our encryption-only NTRU design has a gate count of 2.8 kgates and a dynamic power consumption of 1.72 µW. The encryption-decryption NTRU design consumes about 6 µW of dynamic power and consists of 10.5 kgates.
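
NTRU's arithmetic is built on the cyclic convolution ("star multiplication") of polynomials in Z_q[x]/(x^N - 1). The toy sketch below uses N = 5 and q = 32 purely for illustration; real parameter sets are far larger, and the paper's datapath is not described in the abstract. When one operand is ternary (coefficients in {-1, 0, 1}), as NTRU's blinding polynomial typically is, every product term reduces to an addition, a subtraction, or nothing, which is one reason compact NTRU hardware can avoid general multipliers.

```python
def star_mul(a, b, q):
    """Cyclic convolution of coefficient lists a and b in Z_q[x]/(x^N - 1)."""
    n = len(a)
    if len(b) != n:
        raise ValueError("operands must have the same length N")
    c = [0] * n
    for i, ai in enumerate(a):
        if ai == 0:
            continue                      # zero coefficient: no work at all
        for j, bj in enumerate(b):
            k = (i + j) % n               # x^N wraps around to 1
            c[k] = (c[k] + ai * bj) % q   # ai is +/-1 for a ternary operand
    return c


r = [1, 0, -1, 0, 1]      # ternary operand (toy)
h = [3, 1, 4, 1, 5]       # toy "public key" coefficients mod q
print(star_mul(r, h, q=32))
```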

Proceedings Article

Task activity vectors: a new metric for temperature-aware scheduling

Andreas Merkel¹, Frank Bellosa¹

Karlsruhe Institute of Technology¹

01 Apr 2008

TL;DR: This work proposes task activity vectors describing which functional units a task uses to what degree and implemented several vector-based scheduling strategies for Linux, showing that vector- based scheduling considerably reduces hotspots.

Abstract: Non-uniform utilization of functional units in combination with hardware mechanisms such as clock gating leads to different power consumptions in different parts of a processor chip. This in turn leads to non-uniform temperature distributions and problematic local hotspots, depending on the characteristics of the currently running task. The operating system's scheduler, responsible for deciding which task to run at what time, can influence temperature distribution. Our work investigates what the operating system can do to alleviate the problem of hotspots. We propose task activity vectors describing which functional units a task uses to what degree. With the knowledge provided by these vectors, the scheduler can schedule tasks using different units successively, distribute tasks using a particular unit excessively over the system's processors, or mix tasks using different units on an SMT processor. We implemented several vector-based scheduling strategies for Linux. Our evaluations show that vector-based scheduling considerably reduces hotspots.
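
The first strategy the abstract lists, running tasks that use different units successively, can be sketched as a selection rule over activity vectors. Using the dot product as the similarity measure, the three-unit vector layout, and the task numbers are all our assumptions for illustration; the paper's actual metric and Linux implementation are not reproduced here.

```python
# Hypothetical sketch of one vector-based policy.  Each activity vector gives
# the fraction of time a task keeps each functional unit busy, in the order
# (integer ALU, FPU, load/store).  After a task runs, pick the runnable task
# whose vector overlaps least with it, so the same units do not stay hot.

def overlap(u, v) -> float:
    """Dot product of two activity vectors: high when they stress the same units."""
    return sum(x * y for x, y in zip(u, v))


def pick_next(prev_vec, runnable):
    """runnable: dict task name -> activity vector.  Returns a task name."""
    return min(runnable, key=lambda name: overlap(prev_vec, runnable[name]))


tasks = {
    "fp_kernel":   (0.2, 0.9, 0.3),
    "int_kernel":  (0.9, 0.0, 0.2),
    "memcpy_like": (0.1, 0.0, 0.9),
}
others = {name: vec for name, vec in tasks.items() if name != "fp_kernel"}
print(pick_next(tasks["fp_kernel"], others))
```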

Proceedings Article

CTX: A Clock-Gating-Based Test Relaxation and X-Filling Scheme for Reducing Yield Loss Risk in At-Speed Scan Testing

Hiroshi Furukawa¹, Xiaoqing Wen¹, Kohei Miyase¹, Yuta Yamato¹, Seiji Kajihara¹, Patrick Girard, Laung-Terng Wang, Mohammad Tehranipoor²

Kyushu Institute of Technology¹, University of Connecticut²

24 Nov 2008

TL;DR: CTX effectively reduces launch switching activity, thus yield loss risk, even with a small number of don't care (X) bits as in test compression, without any impact on test data volume, fault coverage, performance, and circuit design.

Abstract: At-speed scan testing is susceptible to yield loss risk due to power supply noise caused by excessive launch switching activity. This paper proposes a novel two-stage scheme, namely CTX (Clock-Gating-Based Test Relaxation and X-Filling), for reducing switching activity when test stimulus is launched. Test relaxation and X-filling are conducted (1) to make as many FFs inactive as possible by disabling corresponding clock-control signals of clock-gating circuitry in Stage-1 (Clock-Disabling), and (2) to make as many remaining active FFs as possible to have equal input and output values in Stage-2 (FF-Silencing). CTX effectively reduces launch switching activity, thus yield loss risk, even with a small number of don't care (X) bits as in test compression, without any impact on test data volume, fault coverage, performance, and circuit design.

Patent

System and method for time synchronization on network

Seung-Woo Lee¹, Bhum-Cheol Lee¹, Young-Ho Park¹, Jung-Hee Lee¹, Dae-Geun Park¹, Hyun-yong Hwang¹

Electronics and Telecommunications Research Institute¹

29 May 2008

TL;DR: In this network time synchronization system, a slave clock device does not receive and act on every periodic time synchronization message from the master clock device; instead, it requests time information only when it needs to correct its time, reducing its power consumption and amount of computation.

Abstract: A system and method for time synchronization on a network is provided. According to the system and method for time synchronization, a slave clock device does not continuously receive a time synchronization message periodically transferred from a master clock device and thus does not correct its time upon all such occasions. Rather, the slave clock device requests time information from the master clock device only when the slave clock device needs to correct its time, and receives a time synchronization message transferred from the master clock device and compensates for its time deviation only while the slave clock device is activated, thereby reducing its power consumption and amount of computation.
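
The abstract does not say how the slave turns a requested synchronization message into a correction. A common choice, used by NTP and PTP, is a request/reply exchange with four timestamps and the assumption of a symmetric path delay; the sketch below applies that formula to a simulated exchange in which the slave clock runs 250 ms ahead. The timestamps and helper names are hypothetical.

```python
# On-demand two-way time transfer.  Assumption: the symmetric-delay formula
# used by NTP/PTP; the patent's own computation is not given in the abstract.
#   t1: slave sends request    (slave clock)
#   t2: master receives it     (master clock)
#   t3: master sends reply     (master clock)
#   t4: slave receives reply   (slave clock)

def offset_and_delay(t1: float, t2: float, t3: float, t4: float):
    """Return (master minus slave clock offset, one-way path delay)."""
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = ((t4 - t1) - (t3 - t2)) / 2
    return offset, delay


def simulate_exchange(slave_ahead_s, one_way_s, turnaround_s, send_at_s=100.0):
    """Timestamps for one request/reply when the slave runs ahead of the master."""
    t1 = send_at_s + slave_ahead_s                     # slave's clock reading
    t2 = send_at_s + one_way_s                         # master's clock reading
    t3 = t2 + turnaround_s
    t4 = t3 + one_way_s + slave_ahead_s                # back on the slave clock
    return t1, t2, t3, t4


# The slave performs this exchange only when it decides it needs correcting.
offset, delay = offset_and_delay(*simulate_exchange(0.25, 0.010, 0.002))
print(f"offset {offset * 1e3:.1f} ms, one-way delay {delay * 1e3:.1f} ms")
```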

Patent

High-voltage CMOS charge pump

Sung Eun Kim¹, Jin Kyung Kim¹, Chang-Hee Hyoung¹, Kang Sung Weon¹

Electronics and Telecommunications Research Institute¹

21 May 2008

TL;DR: A high-voltage complementary metal-oxide semiconductor (CMOS) charge pump is presented; it includes a first Dickson charge pump that doubles the supply voltage based on an input clock signal and a complementary input clock signal with phases reversed relative to each other.

Abstract: Provided is a high-voltage complementary metal-oxide semiconductor (CMOS) charge pump. The high-voltage CMOS charge pump includes a first Dickson charge pump for doubling a supply voltage based on an input clock signal and a complementary input clock signal with reversed phases to each other; a level shifter for doubling voltage levels of the input clock signal and the complementary input clock signal based on an output signal and a complementary output signal of the first Dickson charge pump as power sources, to thereby output a doubled-output clock signal and a doubled-complementary output clock signal; and a second Dickson charge pump for doubling voltage levels of the output signal and the complementary output signal based on the doubled-output clock signal and the doubled-complementary output clock signal from the level shifter.

Journal Article

An All-Digital Fast-Locking Programmable DLL-Based Clock Generator

Chuan-Kang Liang¹, Rong-Jyi Yang¹, Shen-Iuan Liu¹

National Taiwan University¹

14 Mar 2008-IEEE Transactions on Circuits and Systems I-regular Papers

TL;DR: An all-digital fast-locking programmable DLL-based clock generator is presented, and by resetting the output clock every two input clock periods, the initial minimal delay constraint in the conventional architecture is eliminated.

Abstract: An all-digital fast-locking programmable DLL-based clock generator is presented. By resetting the output clock every two input clock periods, the initial minimal delay constraint in the conventional architecture is eliminated. A short locking time is also achieved compared with previous work. The proposed circuit has been fabricated in a 0.35-µm CMOS process and occupies an active area of 0.216 mm². The clock multiplication ratio is programmable from 2 to 15. The frequency ranges of the input and output clocks are 4 to 200 MHz and 60 to 450 MHz, respectively. It dissipates less than 17 mW at all operating frequencies from a 3.3-V supply.

Patent

Method and apparatus for testing delay faults

Thomas Alan Ziaja¹, Kevin D. Woodling², Robert F. Molyneaux¹

Sun Microsystems¹, Oracle Corporation²

06 Aug 2008

TL;DR: The input clock signal is gated at the clock input of each domain of the processor device rather than at the output of the PLL clock, which can provide a more natural state for the circuit during testing and allows the test control unit to test the different domains of the device individually.

Abstract: An apparatus or method for testing of a SOC processor device may minimize interference that is caused by interfacing a comparatively low-speed testing device with the high-speed processor during testing. Implementations may gate the input clock signal at the clock input to each domain of the SOC processor device rather than at the output of the PLL clock. The gating of the clock signal to each domain may then be controlled by clock stop signals generated by the testing device and sent to the individual domains of the processor device. Gating the clock signal at the domain may provide a more natural state for the circuit during testing as well as allow the test control unit to test the different domains of the SOC device individually.

Patent

Methods and apparatus for clock signal synchronization in a configuration of series connected semiconductor devices

Oh Hak June

05 Feb 2008

TL;DR: A system includes a system controller and a configuration of series-connected semiconductor devices, each of which has an input for receiving a clock signal originating from a previous device and an output for providing a synchronized clock signal destined for a succeeding device.

Abstract: A system includes a system controller and a configuration of series-connected semiconductor devices. Such a device includes an input for receiving a clock signal originating from a previous device, and an output for providing a synchronized clock signal destined for a succeeding device. The device further includes a clock synchronizer for producing the synchronized clock signal by processing the received clock signal and an earlier version of the synchronized clock signal. The device further includes a device controller for adjusting a parameter used by the clock synchronizer in processing the earlier version of the synchronized clock signal. The system controller has an output for providing a first clock signal to a first device, and an input for receiving a second clock signal from a second device. The second clock signal corresponds to a version of the first clock signal that has undergone processing by a clock synchronizer in at least one of the devices. The system controller further includes a detector for processing the first and second clock signals to detect a phase difference therebetween; and a synchronization controller for commanding an adjustment to the clock synchronizer in at least one of the devices based on the phase difference detected by the detector.
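
The controller loop described above can be sketched as follows: sum the delays each device's synchronizer adds, measure the phase of the returned clock against the sent clock, and nudge one device's synchronizer parameter until the two line up. This is a minimal sketch under stated assumptions: the clock period, the step size, the fixed-step control law and the choice of a single adjusted device are all hypothetical, not taken from the patent.

```python
PERIOD_PS = 1000  # hypothetical clock period in picoseconds


def phase_difference(delays_ps):
    """Phase of the returned clock relative to the sent clock, wrapped to (-P/2, P/2]."""
    total = sum(delays_ps) % PERIOD_PS
    return total - PERIOD_PS if total > PERIOD_PS / 2 else total


def synchronize(delays_ps, adjust_index=0, step_ps=5, tolerance_ps=5, max_iters=1000):
    """Nudge one device's synchronizer delay until the loop is phase aligned."""
    delays = list(delays_ps)
    for _ in range(max_iters):
        diff = phase_difference(delays)  # detector output
        if abs(diff) <= tolerance_ps:
            return delays, diff
        # Synchronization controller: step against the measured error.
        correction = -step_ps if diff > 0 else step_ps
        delays[adjust_index] = (delays[adjust_index] + correction) % PERIOD_PS
    raise RuntimeError("phase alignment did not converge")
```

With device delays of 120, 340 and 260 ps, the returned clock lags by 720 ps, which wraps to -280 ps, so the controller lengthens the first device's delay until the residual error falls within the 5-ps tolerance.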

Proceedings Article•DOI•

Controlling Ground Bounce Noise in Power Gating Scheme for System-on-a-Chip

Masud H. Chowdhury¹, J. Gjanci¹, P. Khaled¹•Institutions (1)

University of Illinois at Chicago¹

07 Apr 2008

TL;DR: An innovative power gating approach is proposed that, besides targeting maximum reduction of the major leakage currents, controls ground bounce during power mode transitions by adding an intermediate HOLD mode alongside the conventional CUTOFF and RUN modes.

Abstract: Conventional power gating techniques for minimizing leakage currents introduce ground bounce noise during power mode transitions. Here an analysis of ground bounce due to power mode transitions in power gating structures is presented. An innovative power gating approach is proposed which, in addition to targeting maximum reduction of the major leakage currents, provides a way to control ground bounce during power mode transitions. The proposed technique adds an intermediate HOLD mode to the conventional CUTOFF and RUN modes. Its stepwise turn-on provides a larger reduction in the magnitude of peak current and voltage glitches in the power distribution network, as well as a shorter time to stabilize power and ground, compared with other similar techniques.
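
A first-order way to see why a stepwise turn-on helps is the inductive noise term V = L·di/dt. If the block's current rises in two equal steps, say through an intermediate HOLD stage, each step ramps only half the current in the same rise time, so the peak bounce is halved and full turn-on takes twice as long. This sketch uses hypothetical values (1 nH, 1 ns, 100 mA) and ignores resistance and decoupling; it is not the paper's analysis.

```python
def peak_ground_bounce(current_steps_a, rise_time_s, inductance_h):
    """Peak L*di/dt when the supply current walks through current_steps_a.

    Each step ramps linearly from the previous level in rise_time_s,
    starting from 0 A (CUTOFF).
    """
    peak = 0.0
    previous = 0.0
    for level in current_steps_a:
        peak = max(peak, inductance_h * abs(level - previous) / rise_time_s)
        previous = level
    return peak


L_PKG = 1e-9   # hypothetical ground-path inductance: 1 nH
T_RISE = 1e-9  # hypothetical ramp time per step: 1 ns
I_RUN = 0.1    # hypothetical RUN-mode current: 100 mA

single_step = peak_ground_bounce([I_RUN], T_RISE, L_PKG)         # CUTOFF -> RUN
two_step = peak_ground_bounce([I_RUN / 2, I_RUN], T_RISE, L_PKG)  # CUTOFF -> HOLD -> RUN
```

Here `single_step` is about 0.1 V and `two_step` about 0.05 V. The price of the lower peak is a longer total wake-up sequence.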

Patent•

Multiple reference phase locked loop

Dirk Pfaff, Claus Reitlingshoefer, Stephen Robert Hobbs

28 Oct 2008

TL;DR: A multi-reference phase-locked loop (MPLL) is proposed that generates a high-speed clock and phase-locks it to a lowest common reference frequency derived from a selected one of at least two reference clocks.

Abstract: A multi-reference phase-locked loop (MPLL) generates a high-speed clock frequency and phase-locks it to a lowest common reference frequency derived from a selected one of at least two reference clocks. One of the reference clocks is a system reference clock in an FBDIMM system; another may be a forwarded clock in an AMB2. A prescaler reduces the frequency of at least the forwarded clock to the lowest common reference frequency, which is the frequency of the system reference clock. A PLL at the core of the MPLL may be locked to the forwarded clock or the system reference clock for generating a high-speed clock. A feedback divider generates the feedback clock for the PLL as well as other clocks required in the system. Furthermore, the MPLL provides a number of clocking modes, including modes to facilitate testing and powering down of sections of the circuitry for conserving power.
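
Prescaling the forwarded clock down to the system reference frequency lets the core PLL see the same comparison frequency whichever reference is selected, so one feedback divider setting serves both. The minimal sketch below illustrates this. The frequencies (100 MHz, 400 MHz) and the divider value 48 are hypothetical, and an integer prescale ratio is assumed.

```python
def prescaler_ratio(f_forwarded_mhz: int, f_sysref_mhz: int) -> int:
    """Integer ratio that brings the forwarded clock down to the system reference frequency."""
    ratio, remainder = divmod(f_forwarded_mhz, f_sysref_mhz)
    if remainder or ratio < 1:
        raise ValueError("forwarded clock is not an integer multiple of the system reference")
    return ratio


def mpll_output_mhz(selected: str, f_sysref_mhz: int, f_forwarded_mhz: int, feedback_div: int) -> int:
    """Locked output frequency of the core PLL for the selected reference."""
    if selected == "system":
        f_ref = f_sysref_mhz
    elif selected == "forwarded":
        f_ref = f_forwarded_mhz // prescaler_ratio(f_forwarded_mhz, f_sysref_mhz)
    else:
        raise ValueError(f"unknown reference: {selected}")
    # In lock, f_out / feedback_div == f_ref, hence f_out = feedback_div * f_ref.
    return f_ref * feedback_div
```

With these numbers, both `mpll_output_mhz("system", 100, 400, 48)` and `mpll_output_mhz("forwarded", 100, 400, 48)` give 4800 MHz, because the prescaler divides the forwarded clock by 4.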

Proceedings Article•DOI•

Automatic synthesis of clock gating logic with controlled netlist perturbation

Aaron P. Hurst¹•Institutions (1)

University of California, Berkeley¹

08 Jun 2008

TL;DR: A new method is introduced for automatically synthesizing conditions under which the transition of a register may be safely blocked in a way that minimizes netlist perturbation and is both timing- and physical-aware.

Abstract: Clock gating is the insertion of combinational logic along the clock path to prevent the unnecessary switching of registers and reduce dynamic power consumption. The conditions under which the transition of a register may be safely blocked can either be explicitly specified by the designer or detected automatically. We introduce a new method for automatically synthesizing these conditions in a way that minimizes netlist perturbation and is both timing- and physical-aware. Our automatic method is also scalable, utilizing simulation and satisfiability tests and necessitating no symbolic representation. On a set of benchmarks, our technique successfully reduces the dynamic clock power by 14.5% on average. Furthermore, we demonstrate how to apply a straightforward logic simplification to utilize resulting don't cares and reduce the logic by 7.0% on average.
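
The core test behind automatic gating-condition synthesis is this: a candidate signal g may gate a register if, whenever g holds, the register's next state equals its current state. The minimal sketch below follows the simulate-then-prove flow on a toy register. Random simulation cheaply discards candidates, and an exhaustive check stands in for the paper's SAT tests. The circuit, the candidate names and the helper functions are hypothetical. Unlike the paper's method, this sketch is not timing- or physical-aware.

```python
import itertools
import random

SIGNALS = ("a", "b", "en", "q")  # primary inputs plus the register output q


def next_state(v):
    # D input of the toy register: hold when en is low, else load a AND b.
    return v["q"] if not v["en"] else (v["a"] and v["b"])


# Existing netlist signals considered as gating candidates (hypothetical).
CANDIDATES = {
    "not_en": lambda v: not v["en"],
    "a": lambda v: v["a"],
    "b": lambda v: v["b"],
    "not_a_and_not_q": lambda v: (not v["a"]) and (not v["q"]),
}


def blocks_safely(cond, v):
    """Gating is safe at v if, whenever cond holds, D equals Q."""
    return (not cond(v)) or next_state(v) == v["q"]


def simulate_filter(n_vectors=64, seed=0):
    """Cheap filter: drop any candidate refuted by a random vector."""
    rng = random.Random(seed)
    survivors = dict(CANDIDATES)
    for _ in range(n_vectors):
        v = {name: rng.random() < 0.5 for name in SIGNALS}
        survivors = {k: c for k, c in survivors.items() if blocks_safely(c, v)}
    return survivors


def prove(cond):
    """Exhaustive stand-in for a SAT check over all input/state combinations."""
    for bits in itertools.product((False, True), repeat=len(SIGNALS)):
        if not blocks_safely(cond, dict(zip(SIGNALS, bits))):
            return False
    return True


def synthesize():
    return sorted(k for k, c in simulate_filter().items() if prove(c))
```

`synthesize()` keeps `not_en` and also finds the less obvious `not_a_and_not_q`: when `a` and `q` are both low, the register would load or hold 0 either way.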

Proceedings Article•DOI•

Low-power clock distribution in a multilayer core 3D microprocessor

Venkatesh Arunachalam¹, Wayne Burleson¹•Institutions (1)

University of Massachusetts Amherst¹

04 May 2008

TL;DR: A clock distribution network for a 3D multilayer-core microprocessor is proposed that reduces power lost in long block-level interconnects and in the clock distribution, together with a methodology for turning off the global clock grid, along with the logic, for an entire layer in a 3D stack.

Abstract: Clock distribution networks are extremely critical from a performance and power standpoint. They account for about 20-30% of the total power dissipated in current-generation microprocessors. Many three-dimensional (3D) schemes propose to reduce interconnect length to improve performance and decrease power consumption. In this paper we propose a clock distribution network for a 3D multilayer-core microprocessor. The 3D microprocessor floor plan has a single core folded onto multiple layers. A separate layer for the clock distribution network is proposed in the 3D microprocessor. This arrangement of a 3D chip stack reduces (a) power lost in long interconnects at block level and (b) power lost in the clock distribution. Simulation results indicate a 15-20% power saving for this clock distribution scheme compared with a 2D structure. A methodology for turning off the global clock grid along with the logic for an entire layer in a 3D stack is also proposed. Simulation results indicate an additional 8-10% power saving with minimal impact on the critical parameters of the clock grid.
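
Switching off one layer's clock grid removes that layer's switched capacitance from the clock's dynamic power, P = C·V²·f. A clock net toggles every cycle, so the activity factor is 1. The first-order sketch below uses hypothetical numbers (four layers of 250 pF each, 1.0 V, 2 GHz). It covers only the grid itself, not the layer's logic, and it does not reproduce the paper's simulation results.

```python
def clock_power_w(cap_f: float, vdd_v: float, freq_hz: float) -> float:
    """Dynamic clock power with activity factor 1: P = C * V^2 * f."""
    return cap_f * vdd_v ** 2 * freq_hz


def power_with_layers_off(layer_caps_f, layers_off, vdd_v, freq_hz):
    """Clock-grid power when the grids of the layers in layers_off are turned off."""
    active_cap = sum(c for i, c in enumerate(layer_caps_f) if i not in layers_off)
    return clock_power_w(active_cap, vdd_v, freq_hz)


LAYER_CAPS = [250e-12] * 4  # hypothetical grid capacitance per layer
all_on = power_with_layers_off(LAYER_CAPS, set(), 1.0, 2e9)
one_off = power_with_layers_off(LAYER_CAPS, {0}, 1.0, 2e9)
```

With these values the grid dissipates about 2.0 W with every layer on and about 1.5 W with one layer off. How much of the chip total this represents depends on the clock's share of total power.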

Proceedings Article•DOI•

An Automatic Post Silicon Clock Tuning System for Improving System Performance based on Tester Measurements

K. Nagaraj¹, Sandip Kundu¹•Institutions (1)

University of Massachusetts Amherst¹

08 Dec 2008

TL;DR: This paper describes a process for using Boolean tester measurements for determining the settings of the tunable buffers and shows that frequency improvements of 10% or more are possible by appropriate setting of tunable clock buffers.

Abstract: Optical shrink for process migration, manufacturing process variation and dynamic voltage control lead to clock skew as well as path delay variation in a manufactured chip. Since such variations are difficult to predict in the pre-silicon phase, tunable clock buffers have been used in several microprocessor designs. The buffer delays are tuned to improve the maximum operating clock frequency of a design. This, however, shifts the burden of finding tuning settings for individual clock buffers to the test process. In this paper, we describe a process for using Boolean tester measurements for determining the settings of the tunable buffers. The results show that frequency improvements of 10% or more are possible by appropriate setting of tunable clock buffers.

Patent•

Method for optimized automatic clock gating

Yunjian (William) Jiang¹, Arvind Srinivasan¹, Joy Banerjee¹, Yinghua Li¹, Partha Das¹, Samit Chaudhuri¹•Institutions (1)

Magma Design Automation¹

28 May 2008

TL;DR: A method is provided for optimizing clock-gated circuitry in an integrated circuit (IC) design, where the clock gates gate a plurality of sequential elements in the IC design.

Abstract: A method of optimizing clock-gated circuitry in an integrated circuit (IC) design is provided. A plurality of signals which feed into enable inputs of a plurality of clock gates is determined, where the clock gates gate a plurality of sequential elements in the IC design. Combinational logic which is shared among the plurality of signals is identified. The clock-gated circuitry is transformed into multiple levels of clock-gating circuitry based on the shared combinational logic.
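
A small illustration of the idea of sharing logic among gate enables: when several enables are ANDs of signals, their common factor can drive one upstream clock gate, and each original gate keeps only its residual term. The sketch handles only AND-only enables over positive signals, and the names are hypothetical. It then checks exhaustively that the two-level structure reproduces every original enable. It is not the patented method, which identifies shared combinational logic more generally.

```python
from itertools import product


def factor_enables(enables):
    """Split AND-only enables into one shared upstream term and per-gate leaf terms."""
    shared = frozenset.intersection(*enables.values())
    leaves = {name: literals - shared for name, literals in enables.items()}
    return shared, leaves


def check_equivalent(enables, shared, leaves):
    """Exhaustively confirm (upstream gate AND leaf gate) equals each original enable."""
    signals = sorted(set().union(*enables.values()))
    for bits in product((False, True), repeat=len(signals)):
        value = dict(zip(signals, bits))
        top = all(value[s] for s in shared)
        for name, literals in enables.items():
            original = all(value[s] for s in literals)
            two_level = top and all(value[s] for s in leaves[name])
            if original != two_level:
                return False
    return True
```

For enables `a & b & c` and `a & b & d`, the shared term `a & b` moves to an upstream gate and the leaf gates keep only `c` and `d`. When clock gating is off upstream, the downstream gates and their clock subtree stop switching as well.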

Proceedings Article•DOI•

Symmetric clock synchronization in sensor networks

Philipp Sommer¹, Roger Wattenhofer¹•Institutions (1)

ETH Zurich¹

01 Apr 2008

TL;DR: A clock synchronization algorithm with drift compensation that implements this symmetric error paradigm is presented and it is shown that the remaining error is symmetric and in the range of the clock granularity.

Abstract: In this paper we argue that achieving symmetric errors is the key to an improved understanding of clock synchronization. We present a clock synchronization algorithm with drift compensation that implements this symmetric error paradigm. The performance of the algorithm is evaluated by measurements in an indoor testbed using the TinyNode hardware platform. We show that the remaining error is symmetric and in the range of the clock granularity.
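
Drift compensation generally means estimating both the offset and the rate difference (skew) between a node's local clock and a reference clock, rather than the offset alone. A common generic way to do this is a least-squares line fit over (local, reference) timestamp pairs, sketched below. This is not necessarily the algorithm in the paper, and the function names and synthetic data are hypothetical.

```python
def fit_drift(pairs):
    """Least-squares fit of reference_time = skew * local_time + offset."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    skew = sxy / sxx  # rate of the reference clock relative to the local clock
    offset = mean_y - skew * mean_x
    return skew, offset


def to_reference(local_time, skew, offset):
    """Convert a local timestamp to estimated reference time."""
    return skew * local_time + offset
```

On noise-free synthetic pairs generated with a skew of 1.00005 (50 ppm) and an offset of 250 ticks, the fit recovers both parameters. With real timestamps, the residual error is bounded by effects such as clock granularity.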