Showing papers on "Clock gating" published in 2015

Patent

Circuit, system and method for selectively turning off internal clock drivers

17 Mar 2015

TL;DR: In this paper, an internal clock is disabled if a no operation command is detected during periods of time when no read or write burst operation is taking place, which may be used to reduce power consumption by reducing operating current in memory devices.

Abstract: The present invention includes a circuit, system and method for selectively turning off internal clock drivers to reduce operating current. The present invention may be used to reduce power consumption by reducing operating current in a memory device. Operating current may be reduced by turning off internal clock drivers that deliver a clock signal during selected periods of time. According to an embodiment of clock control circuitry of the present invention, an internal clock is disabled if a no operation command is detected during periods of time when no read or write burst operation is taking place. Methods, memory devices and computer systems including the clock control circuitry and its functionality are also disclosed.
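The gating rule in this abstract can be illustrated with a small behavioral model. This is a minimal sketch under stated assumptions, not the patented circuit: the command names, the `ClockControl` class, and the fixed burst length of 4 cycles are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ClockControl:
    """Hypothetical model: gate the internal clock on NOP when no burst is active."""
    burst_cycles_left: int = 0  # remaining cycles of an active read/write burst

    def step(self, command: str, burst_length: int = 4) -> bool:
        """Advance one external clock cycle; return True if the internal clock runs."""
        if command in ("READ", "WRITE"):
            # Assumption: a burst occupies `burst_length` cycles starting now.
            self.burst_cycles_left = burst_length
        burst_active = self.burst_cycles_left > 0
        # The internal clock is disabled only for a NOP outside any burst.
        enabled = not (command == "NOP" and not burst_active)
        if burst_active:
            self.burst_cycles_left -= 1
        return enabled


def simulate(commands, burst_length=4):
    """Return the per-cycle internal clock enable for a command sequence."""
    ctrl = ClockControl()
    return [ctrl.step(c, burst_length) for c in commands]


trace = simulate(["NOP", "READ", "NOP", "NOP", "NOP", "NOP", "NOP"])
print(trace)
```

In this model the NOPs issued while the read burst is still running keep the clock enabled, and the clock is gated again once the burst has drained.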

59 citations

Journal Article

Near-Threshold Energy- and Area-Efficient Reconfigurable DWPT/DWT Processor for Healthcare-Monitoring Applications

Chao Wang¹, Jun Zhou¹, Lei Liao¹, Jingjing Lan¹, Jianwen Luo², Xin Liu¹, Minkyu Je³

Agency for Science, Technology and Research¹, Hewlett-Packard², Daegu Gyeongbuk Institute of Science and Technology³

01 Jan 2015-IEEE Transactions on Circuits and Systems II: Express Briefs

TL;DR: An energy- and area-efficient discrete wavelet packet transform (DWPT) processor design for power-constrained and cost-sensitive healthcare-monitoring applications that employs recursive memory-shared architecture to achieve low hardware complexity while performing required arbitrary-basis DWPT decomposition.

Abstract: This brief presents an energy- and area-efficient discrete wavelet packet transform (DWPT) processor design for power-constrained and cost-sensitive healthcare-monitoring applications. This DWPT processor employs recursive memory-shared architecture to achieve low hardware complexity while performing required arbitrary-basis DWPT decomposition. By exploiting inherent characteristics of different physiological signals through an entropy statistic engine, the DWPT processor core can be reconfigured to compute multilevel wavelet decomposition with effective time and frequency resolution. Various design techniques from algorithm to circuit levels, including reconfigurable computing, lifting scheme, dual-port pipeline processing, near-threshold operation, and clock gating, are applied to achieve energy efficiency. With a 0.18-µm CMOS technology at 0.5 V and 1 MHz, the DWPT core only consumes 26 µW for performing three-level 256-point DWPT decomposition with entropy statistic calculation. When integrated in an ARM Cortex-M0-based biomedical system-on-a-chip test platform, the DWPT processor achieves processing acceleration by three orders of magnitude and reduces energy consumption by four orders of magnitude compared with CPU-only implementations.

36 citations

Journal Article

A Fully Parallel Nonbinary LDPC Decoder With Fine-Grained Dynamic Clock Gating

Youn Sung Park¹, Yaoyu Tao¹, Zhengya Zhang¹

University of Michigan¹

01 Feb 2015-IEEE Journal of Solid-State Circuits

TL;DR: This work presents a 1.22 Gb/s fully parallel decoder of a GF(64) (160, 80) regular-(2, 4) NB-LDPC code in 65 nm CMOS, and allows each processing node to detect its own convergence and apply dynamic clock gating to save power.

Abstract: Nonbinary LDPC (NB-LDPC) codes, defined over Galois field, offer better coding gain and a lower error floor than binary LDPC codes. However, the complex decoding and large memory requirement have prevented any practical chip implementations. We present a 1.22 Gb/s fully parallel decoder of a GF(64) (160, 80) regular-(2, 4) NB-LDPC code in 65 nm CMOS. The reduced number of edges in NB-LDPC code's factor graph permits a low wiring overhead in the fully parallel architecture. The throughput is further improved by a one-step look-ahead check node design that increases the clock frequency to 700 MHz, and the interleaving of variable node and check node operations that shortens one decoding iteration to 47 clock cycles. We allow each processing node to detect its own convergence and apply dynamic clock gating to save power. When all processing nodes have been clock gated, the decoder terminates and continues with the next input to increase the throughput to 1.22 Gb/s. The dynamic clock gating and decoder termination improve the energy efficiency to 3.03 nJ/b, or 259 pJ/b/iteration, at 1.0 V and 700 MHz. Voltage scaling to 675 mV improves the energy efficiency to 89 pJ/b/iteration for a throughput of 698 Mb/s at 400 MHz.
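The figures quoted in this abstract can be cross-checked against each other with a little arithmetic. The sketch below is a reader's consistency check, not something taken from the paper. It assumes that throughput counts all coded bits of a codeword (160 symbols of 6 bits each for GF(64)), and the function and constant names are hypothetical.

```python
import math

# Figures quoted in the abstract.
SYMBOLS_PER_CODEWORD = 160
FIELD_SIZE = 64
CYCLES_PER_ITERATION = 47
ENERGY_PER_BIT_J = 3.03e-9             # 3.03 nJ/b
ENERGY_PER_BIT_ITERATION_J = 259e-12   # 259 pJ/b/iteration

# Assumption: throughput is measured over all coded bits of a codeword.
BITS_PER_CODEWORD = SYMBOLS_PER_CODEWORD * int(math.log2(FIELD_SIZE))  # 960


def implied_iterations(throughput_bps: float, clock_hz: float) -> float:
    """Average decoding iterations per codeword implied by throughput and clock."""
    cycles_per_codeword = BITS_PER_CODEWORD * clock_hz / throughput_bps
    return cycles_per_codeword / CYCLES_PER_ITERATION


# Average iteration count implied by the two energy figures.
iterations_from_energy = ENERGY_PER_BIT_J / ENERGY_PER_BIT_ITERATION_J

# Throughput predicted at 700 MHz from that iteration count.
predicted_throughput_bps = BITS_PER_CODEWORD * 700e6 / (
    iterations_from_energy * CYCLES_PER_ITERATION
)

print(round(iterations_from_energy, 2))
print(round(implied_iterations(1.22e9, 700e6), 2))
print(round(implied_iterations(698e6, 400e6), 2))
print(round(predicted_throughput_bps / 1e9, 3))
```

Under that assumption, the energy ratio and both throughput/frequency pairs point to an average of roughly 11.7 iterations per codeword, which suggests the early termination enabled by clock gating is what sets the reported throughput.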

35 citations

Journal Article

Power Reduction by Clock Gating Technique

Nandita Srinivasan¹, Navamitha S. Prakash¹, D Shalakha¹, D Sivaranjani¹, G Swetha Sri Lakshmi¹, B. Bala Tripura Sundari¹

Amrita Vishwa Vidyapeetham¹

01 Jan 2015-Procedia Technology

TL;DR: A VHDL-based technique is proposed to insert clock gating circuits, and the resulting dynamic power is estimated; the dynamic power is reduced for the sequential benchmark circuits considered.

30 citations

Journal Article

Clock Tree Resynthesis for Multi-Corner Multi-Mode Timing Closure

Subhendu Roy¹, Pavlos M. Mattheakis², Laurent Masse-Navette², David Z. Pan¹

University of Texas at Austin¹, Mentor Graphics²

20 Jan 2015-IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

TL;DR: A novel clock tree resynthesis methodology which is based on a skew scheduling engine which works on an already built clock tree and demonstrates the effectiveness of the offsets at the output pins of the leaf-level clock drivers in comparison to the traditional clock scheduling in the clock pin of the flip-flops due to the better implementability and lesser area overhead.

Abstract: With aggressive technology scaling and complex design scenarios, timing closure has become a challenging and tedious job for the designers. Timing violations persist for multi-corner, multi-mode designs in the deep-routing stage although careful optimization has been applied at every step after synthesis. Useful clock skew optimization has been suggested as an effective way to achieve design convergence and timing closure. Existing approaches on useful skew optimization: 1) calculate clock skew at sequential elements before the actual tree is synthesized and 2) do not account for the implementability of the calculated schedules at the later stages of design cycle. In this paper, we propose a novel clock tree resynthesis methodology which is based on a skew scheduling engine which works on an already built clock tree. The output of the engine is a set of positive and negative offsets which translate to the delay and accelerations, respectively in clock arrival at the clock tree pins. We demonstrate the effectiveness of the offsets at the output pins of the leaf-level clock drivers in comparison to the traditional clock scheduling in the clock pins of the flip-flops due to the better implementability and lesser area overhead and present an algorithm to accurately realize these offsets in the clock tree. Experimental results on large-scale industrial designs demonstrate that our clock tree resynthesis methodology achieves respectively 57%, 12%, and 42% average improvement in total negative slack, worst negative slack, and failure-end-point with an average overhead of 26% in clock tree area. We also experimentally study the impact of on-chip-variation-derates on our approach in terms of the timing metric improvement and clock tree overhead.

28 citations

Journal Article

Statistical Timing Analysis and Criticality Computation for Circuits With Post-Silicon Clock Tuning Elements

Bing Li¹, Ulf Schlichtmann¹

Technische Universität München¹

12 May 2015-IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

TL;DR: This paper proposes an alternative method using graph transformation, which computes a parametric minimum clock period and is more than $10^{4}$ times faster than Monte Carlo simulation while maintaining a good accuracy.

Abstract: Post-silicon clock tuning elements are widely used in high-performance designs to mitigate the effects of process variations and aging. Located on clock paths to flip-flops, these tuning elements can be configured through the scan chain so that clock skews to these flip-flops can be adjusted after manufacturing. Owing to the delay compensation across consecutive register stages enabled by the clock tuning elements, higher yield and enhanced robustness can be achieved. These benefits are, nonetheless, attained by increasing die area due to the inserted clock tuning elements. For balancing performance improvement and area cost, an efficient timing analysis algorithm is needed to evaluate the performance of such a circuit. So far this evaluation is only possible by Monte Carlo simulation which is very time-consuming. In this paper, we propose an alternative method using graph transformation, which computes a parametric minimum clock period and is more than $10^{4}$ times faster than Monte Carlo simulation while maintaining a good accuracy. This method also identifies the gates that are critical to circuit performance, so that a fast analysis-optimization flow becomes possible.

22 citations

Proceedings Article

An overlap-contention free true-single-phase clock dual-edge-triggered flip-flop

Andrea Bonetti¹, Adam Teman¹, Andreas Burg¹

École Polytechnique Fédérale de Lausanne¹

24 May 2015

TL;DR: This paper presents a novel, static DET flip-flop with a true-single-phase clock that completely avoids clock overlap hazards by eliminating the need for an inverted clock edge for functionality.

Abstract: Dual-edge-triggered (DET) synchronous operation is a very attractive option for low-power, high-performance designs. Compared to conventional single-edge synchronous systems, DET operation is capable of providing the same throughput at half the clock frequency. This can lead to significant power savings on the clock network that is often one of the major contributors to total system power. However, in order to implement DET operation, special registers need to be introduced that sample data on both clock-edges. These registers are more complex than their single-edge counterparts, and often suffer from a certain amount of clock-overlap between the main clock and the internally generated inverted clock. This overlap can cause contention inside the cell and lead to logic failures, especially when operating at scaled power supplies and under process variations that characterize nanometer technologies. This paper presents a novel, static DET flip-flop (DET-FF) with a true-single-phase clock that completely avoids clock overlap hazards by eliminating the need for an inverted clock edge for functionality. The proposed DET-FF was implemented in a standard 40nm CMOS technology, showing full functionality at low-voltage operating points, where conventional DET-FFs fail. Under a near-threshold, 500mV supply voltage, the proposed cell also provides a 35% lower CK-to-Q delay and the lowest power-delay-product compared to all considered DET-FF implementations.
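The claim that DET operation delivers the same throughput at half the clock frequency can be seen in a toy behavioral model. This is a minimal sketch with hypothetical helper names (`square_wave`, `capture`, `count_toggles`). It models only when data is sampled and says nothing about the proposed cell's internal circuit.

```python
def square_wave(n_steps, half_period):
    """Ideal clock sampled on a discrete time grid; toggles every `half_period` steps."""
    return [(t // half_period) % 2 for t in range(n_steps)]


def capture(clock, data, dual_edge):
    """Return the data values sampled at clock edges.

    Single-edge mode samples on rising edges only; dual-edge mode samples
    on both rising and falling edges.
    """
    captured = []
    for prev, curr, value in zip(clock, clock[1:], data[1:]):
        rising = prev == 0 and curr == 1
        falling = prev == 1 and curr == 0
        if rising or (dual_edge and falling):
            captured.append(value)
    return captured


def count_toggles(clock):
    """Number of clock transitions, a rough proxy for clock network switching."""
    return sum(a != b for a, b in zip(clock, clock[1:]))


data = list(range(9))
fast_clock = square_wave(9, half_period=1)   # full-rate clock
slow_clock = square_wave(9, half_period=2)   # half-rate clock

single_edge = capture(fast_clock, data, dual_edge=False)
dual_edge = capture(slow_clock, data, dual_edge=True)
print(len(single_edge), count_toggles(fast_clock))
print(len(dual_edge), count_toggles(slow_clock))
```

Both configurations capture four samples over the same window, while the half-rate clock makes half as many transitions, which is where the clock network power saving comes from.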

22 citations

Journal Article

Low-Power Clock Distribution Using a Current-Pulsed Clocked Flip-Flop

Riadul Islam¹, Matthew R. Guthaus¹

University of California, Santa Cruz¹

27 Mar 2015-IEEE Transactions on Circuits and Systems I: Regular Papers

TL;DR: A new paradigm for clock distribution that uses current, rather than voltage, to distribute a global clock signal with reduced power consumption is proposed and a new high-performance current-mode pulsed flip-flop with enable (CMPFFE) is created.

Abstract: We propose a new paradigm for clock distribution that uses current, rather than voltage, to distribute a global clock signal with reduced power consumption. While current-mode (CM) signaling has been used in one-to-one signals, this is the first usage in a one-to-many clock distribution network. To accomplish this, we create a new high-performance current-mode pulsed flip-flop with enable (CMPFFE) using 45 nm CMOS technology. When the CMPFFE is combined with a CM transmitter, the first CM clock distribution network exhibits 62% lower average power compared to traditional voltage mode clocks.

20 citations

Patent

Clock gating with an asynchronous wrapper cell

Kenneth S. Stevens¹, Dipanjan Bhadra¹

University of Utah¹

15 Jun 2015

TL;DR: In this paper, an asynchronous wrapper circuit for a clock gating cell (CGC) is described, which includes circuitry configured to sample a data channel via sampling circuitry for a communication start signal to enable the CGC to start a gated clock for a data message on the data channel.

Abstract: Technology is described for an asynchronous wrapper circuit for a clock gating cell (CGC). In one example, the asynchronous wrapper cell for CGC includes circuitry configured to (1) sample a data channel via sampling circuitry for a communication start signal to enable the CGC to start a gated clock for a data message on the data channel, and (2) reset an enable of the CGC to an idle mode via idle mode control circuitry after the data message has been clocked via the CGC through function cell circuitry. The idle mode control circuitry generates an output for the sampling circuitry from the function cell. Various other computing circuitries are also disclosed.

19 citations

Proceedings Article

Delay window blind oversampling clock and data recovery algorithm with wide tracking range

Travis Bartley¹, Shuji Tanaka¹, Yutaka Nonomura², Takahiro Nakayama², Masanori Muroyama¹

Tohoku University¹, Toyota²

24 May 2015

TL;DR: The algorithm is capable of recovering data over a wide tracking range or when the precise oversampling rate (β) is not known a priori, for any real-valued oversampling rate, β ≥ 3, making this BO-CDR algorithm the first to not require integer-valued β.

Abstract: A new blind oversampling clock and data recovery (BO-CDR) algorithm is proposed. It has high tolerance to low-frequency jitter (148 unit intervals at 10 kHz, measured at 640 Mbps) and is suitable for systems where the receiver clock has high drift with respect to the transmission. The algorithm is capable of recovering data over a wide tracking range or when the precise oversampling rate (β) is not known a priori, for any real-valued oversampling rate, β ≥ 3, making this BO-CDR algorithm the first to not require integer-valued β. To demonstrate the utility of the algorithm, two implementations are designed and evaluated. The first is used in a low-power, low-data rate sensor node IC with a low-performance single phase clock source. The second is a high-speed receiver with a multiple phase clock source implemented on FPGA. The CDR core consists of just 47 logic cells and 19 registers and has an estimated power consumption of 070 mW at 640 Mbps. The properties of this CDR algorithm make it appropriate for a wide range of applications in serial communication.

18 citations

Journal Article

A reliable ground bounce noise reduction technique for nanoscale CMOS circuits

Vijay Kumar Sharma¹, Manisha Pattanaik¹

Indian Institute of Information Technology and Management, Gwalior¹

22 Jan 2015-International Journal of Electronics

TL;DR: In this paper, a power gating structure is proposed to reduce ground bounce noise (GBN), the high voltage fluctuation on the real ground rail during sleep-mode to active-mode transitions of power gating circuits.

Abstract: Power gating is the most effective method to reduce the standby leakage power by adding header/footer high-VTH sleep transistors between actual and virtual power/ground rails. When a power gating circuit transitions from sleep mode to active mode, a large instantaneous charge current flows through the sleep transistors. Ground bounce noise (GBN) is the high voltage fluctuation on real ground rail during sleep mode to active mode transitions of power gating circuits. GBN disturbs the logic states of internal nodes of circuits. A novel and reliable power gating structure is proposed in this article to reduce the problem of GBN. The proposed structure contains low-VTH transistors in place of high-VTH footer. The proposed power gating structure not only reduces the GBN but also improves other performance metrics. A large mitigation of leakage power in both modes eliminates the need of high-VTH transistors. A comprehensive and comparative evaluation of proposed technique is presented in this article for a chai...

Patent

System and method for testing and configuration of an FPGA

Laurent Rouge, Eydoux Julien, Giuffre Marcello

15 Oct 2015

TL;DR: In this article, a clock gating architecture is proposed for loading data to or reading data from specific selected shift registers, where bitstreams are provided at one end of the shift register, and clocked through until the last flip flop receives its value.

Abstract: Configuration values for Lookup tables (LUTs) and programmable routing switches in an FPGA are provided by means of a number of flip flops arranged in a shift register. This shift register may receive test values in a factory test mode, and operational configuration values (implementing whatever functionality the client requires of the FPGA) in an operational mode. The bitstreams are provided at one end of the shift register, and clocked through until the last flip flop receives its value. Values may also be clocked out at the other end of the shift register to be compared to the initial bitstream in order to identify corruption of stored values e.g. due to radiation exposure. A clock gating architecture is proposed for loading data to or reading data from specific selected shift registers.
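The load, read-back, and compare flow described here can be sketched behaviorally. This is a minimal illustration under stated assumptions, not the patented architecture: the `GatedShiftRegister` class, the `select`, `load`, and `readback` helpers, the shared data line, and the choice to refill the register with the reference bitstream during read-back are all hypothetical.

```python
class GatedShiftRegister:
    """Hypothetical chain of flip-flops whose clock can be gated off."""

    def __init__(self, length):
        self.bits = [0] * length
        self.clock_enabled = False

    def clock(self, bit_in):
        """Apply one clock pulse; return the bit shifted out, or None if gated."""
        if not self.clock_enabled:
            return None  # gated: contents are untouched
        bit_out = self.bits[-1]
        self.bits = [bit_in] + self.bits[:-1]
        return bit_out


def select(registers, index):
    """Enable the clock of exactly one register and gate all the others."""
    for i, reg in enumerate(registers):
        reg.clock_enabled = (i == index)


def load(registers, index, bitstream):
    """Shift a bitstream into the selected register over a shared data line."""
    select(registers, index)
    for bit in bitstream:
        for reg in registers:  # every register sees the pulse; gating decides who shifts
            reg.clock(bit)


def readback(registers, index, refill):
    """Clock out the selected register while shifting `refill` back in."""
    select(registers, index)
    return [registers[index].clock(bit) for bit in refill]


registers = [GatedShiftRegister(4) for _ in range(3)]
bitstream = [1, 0, 1, 1]
load(registers, 1, bitstream)
print(readback(registers, 1, bitstream) == bitstream)
```

Comparing the read-back sequence with the original bitstream flags a corrupted stored value, and refilling with the reference bitstream leaves the register holding a clean copy afterwards.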

Journal Article

Power Efficient High-Level Synthesis by Centralized and Fine-Grained Clock Gating

Mohsen Riahi Alam¹, Mostafa Ersali Salehi Nasab¹, Sied Mehdi Fakhraie¹

University of Tehran¹

15 Jun 2015-IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

TL;DR: A centralized and fine-grained microarchitecture-level clock gating technique for low-power hardware accelerators that are automatically designed by a high-level synthesis (HLS) tool; a 47%-86% reduction in power dissipation is observed.

Abstract: Nowadays, power is a primary concern in digital circuits, and clock distribution networks are a particularly significant power consumer. Therefore, clock gating is an effective technique for saving dynamic power by reducing switching activity. In this paper, we propose a centralized and fine-grained microarchitecture-level clock gating for low power hardware accelerators which are automatically designed by a high-level synthesis (HLS) tool. The basic principle of our idea is not to use any extra computation for generating clock enable signals and to exploit existing signals of the finite state machine for controlling the datapath clock network. After determining the current state in the finite state machine, the clock sub-tree of the current state is enabled and the other sub-trees are disabled, with a slight increase in circuit area. Our approach is implemented within an HLS design flow for automatic low power hardware accelerator generation in application specific integrated circuit design. Experimental results are obtained on a set of representative benchmark programs. Depending on the circuit size and number of registers, it is shown that 47%–86% reduction in power dissipation is observed.
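The idea of reusing the finite state machine's state as the clock enable can be illustrated by counting the clock pulses delivered to registers. This is a toy sketch, not the authors' HLS flow: the state names, the register grouping, and the state trace are invented for illustration, and the resulting ratio is unrelated to the 47%–86% figure reported above.

```python
def clock_pulses(state_trace, registers_by_state):
    """Count register clock pulses without gating and with per-state gating.

    Without gating, every register is clocked on every cycle. With gating,
    only the clock sub-tree that belongs to the current FSM state is enabled.
    """
    all_registers = set().union(*registers_by_state.values())
    ungated = len(state_trace) * len(all_registers)
    gated = sum(len(registers_by_state[state]) for state in state_trace)
    return ungated, gated


# Hypothetical accelerator: each state writes a distinct group of registers.
registers_by_state = {
    "LOAD": {"a", "b"},
    "ACCUMULATE": {"acc"},
    "STORE": {"out"},
}
state_trace = ["LOAD", "ACCUMULATE", "ACCUMULATE", "ACCUMULATE", "STORE"]

ungated, gated = clock_pulses(state_trace, registers_by_state)
print(ungated, gated)
```

Because the enables come directly from the existing state decoding, the gating in this model needs no additional enable logic, which mirrors the principle stated in the abstract.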

Journal Article

A Wide-Range, Low-Power, All-Digital Delay-Locked Loop With Cyclic Half-Delay-Line Architecture

Jinn-Shyan Wang¹, Chun-Yuan Cheng¹, Pei-Yuan Chou¹, Tzu-Yi Yang²

National Chung Cheng University¹, Industrial Technology Research Institute²

14 Sep 2015-IEEE Journal of Solid-State Circuits

TL;DR: A cyclic half-delay-line architecture that uses the same type of delay lines for cyclic delay determination and coarse locking is proposed and used to achieve the design goals of small footprint and fast locking for a large operating frequency range.

Abstract: A 3 MHz-to-1.8 GHz, 94 µW-to-9.5 mW, all-digital delay-locked loop (ADDLL) using 65-nm CMOS technology is presented. In this paper, a cyclic half-delay-line architecture that uses the same type of delay lines for cyclic delay determination and coarse locking is proposed and used to achieve the design goals of small footprint and fast locking for a large operating frequency range. In addition, a new delay structure is developed for the cyclic delay units and coarse delay line. In addition to clock gating, which is used to reduce power consumption in the lock-in state regardless of the clock frequency, the automatic bypassing of the cyclic operation is developed and used to reduce power consumption during high-frequency operation. Through the use of proposed techniques, the active area is reduced to only 0.0153 mm², and the operating frequency range is from 3 MHz to 1.8 GHz. The measurement results show that the proposed ADDLL achieves a peak-to-peak jitter of 3 ps with 9.5 mW power consumption when operated at 1.8 GHz.

Journal Article

Synchronous sampling and clock recovery of internal oscillators for side channel analysis and fault injection

Colin O'Flynn¹, Zhizhang Chen¹

Dalhousie University¹

01 Apr 2015-Journal of Cryptographic Engineering

TL;DR: In this article, the performance of a synchronous sampling system attacking a modern microcontroller running a software AES implementation is characterized under four conditions: with a stable crystal oscillator-based clock, with a clock that is randomly varied between 3.9 and 13 MHz, with an internal oscillator that is randomly varied between 7.2 and 8.1 MHz, and with an internal oscillator that has slight random variation due to natural "drift" in the oscillator.

Abstract: Measuring power consumption for side channel analysis typically uses an oscilloscope, which measures the data relative to an internal sample clock. By synchronizing the sampling clock to the clock of the target device, the sample rate requirements are considerably relaxed; the attack will succeed with a much lower sample rate. This work characterizes the performance of a synchronous sampling system attacking a modern microcontroller running a software AES implementation. This attack is characterized under four conditions: with a stable crystal oscillator-based clock, with a clock that is randomly varied between 3.9 and 13 MHz, with an internal oscillator that is randomly varied between 7.2 and 8.1 MHz, and with an internal oscillator that has slight random variation due to natural ‘drift’ in the oscillator. Traces captured with the synchronous sampling technique can be processed with a standard Differential Power Analysis style attack in all four cases, whereas when an oscilloscope is used only the stable oscillator setup is successful. This work also develops the hardware to recover the internal clock of a device which does not have an externally available clock. It is possible to implement this scheme in software only, allowing it to work with existing oscilloscope-based test environments. Performing the recovery in hardware allows the use of fault injection with excellent temporal stability relative to a sensitive event. This is demonstrated with a power glitch inserted into a microcontroller, where the glitch is triggered based on a signature in the measured power consumption.

Proceedings Article

A scan shifting method based on clock gating of multiple groups for low power scan testing

Sungyoul Seo¹, Yong Lee¹, Joohwan Lee², Sungho Kang¹

Yonsei University¹, Samsung²

02 Mar 2015

TL;DR: A new scan shifting method based on clock gating of multiple groups by reducing toggle rate of the internal combinational logic is presented, which prevents cumulative transitions caused by shifting operations of the scan cells.

Abstract: Since the advent of very large scale integration (VLSI) design, the high power consumption of scan-based testing has been one of the most serious problems. The large number of scan cells leads to excessive switching activity during scan shifting operations. In this paper, we present a new scan shifting method based on clock gating of multiple groups that reduces the toggle rate of the internal combinational logic. This method prevents cumulative transitions caused by shifting operations of the scan cells. In addition, existing compression schemes are compatible with the proposed method without modification of the decompression architecture. Experimental results on ITC'99 benchmark circuits and industrial circuits show that this shifting method reduces the scan shifting power in all cases. Despite the improved power, the burden of the extra logic is small enough that it need not be a concern.

Patent

Current-mode clock distribution

Matthew R. Guthaus¹, Riadul Islam¹

University of California¹

29 Jan 2015

TL;DR: A new paradigm for clock distribution that uses current, rather than voltage, to distribute a global clock signal with reduced power consumption is proposed, and the first CM clock distribution network exhibits 45.2% lower average power compared to traditional voltage mode clocks.

Abstract: Current-mode signaling for a one-to-many clock signal distribution providing significantly less dynamic power use and improved noise immunity compared to traditional VM signaling schemes.

Patent

Multi-bit flip-flops and scan chain circuits

Min-Su Kim¹, Matthew S. Berzins¹, Jong-woo Kim¹

Samsung¹

13 Feb 2015

TL;DR: In this paper, a multi-bit flip-flop operating as a master-slave flip-flop may minimize power consumption occurring in a clock path through which the clock signal is transmitted.

Abstract: A multi-bit flip-flop includes a plurality of multi-bit flip-flop blocks that share a clock signal. Each of the multi-bit flip-flop blocks includes a single inverter and a plurality of flip-flops. The single inverter generates an inverted clock signal by inverting the clock signal. Each of the flip-flops includes a master latch part and a slave latch part and operates the master latch part and the slave latch part based on the clock signal and the inverted clock signal. Here, the flip-flops are triggered at rising edges of the clock signal. Thus, the multi-bit flip-flop operating as a master-slave flip-flop may minimize (or, reduce) power consumption occurring in a clock path through which the clock signal is transmitted.

Journal Article

Chao Deng¹, Yici Cai¹, Qiang Zhou¹

Tsinghua University¹

13 Mar 2015-Journal of Computer Science and Technology

TL;DR: A novel register clustering methodology in generating the leaf level topology of the clock tree to reduce the power consumption and a buffer allocation algorithm is proposed to satisfy the slew constraint within the clusters at a minimum cost of power consumption.

Abstract: Clock networks dissipate a significant fraction of the entire chip power budget. Therefore, the optimization for power consumption of clock networks has become one of the most important objectives in high performance IC designs. In contrast to most of the traditional studies that handle this problem with clock routing or buffer insertion strategy, this paper proposes a novel register clustering methodology in generating the leaf level topology of the clock tree to reduce the power consumption. Three register clustering algorithms called KMR, KSR and GSR are developed and a comprehensive study of them is discussed in this paper. Meanwhile, a buffer allocation algorithm is proposed to satisfy the slew constraint within the clusters at a minimum cost of power consumption. We integrate our algorithms into a classical clock tree synthesis (CTS) flow to test the register clustering methodology on ISPD 2010 benchmark circuits. Experimental results show that all the three register clustering algorithms achieve more than 20% reduction in power consumption without affecting the skew and the maximum latency of the clock tree. As the most effective method among the three algorithms, GSR algorithm achieves a 31% reduction in power consumption as well as a 4% reduction in skew and a 5% reduction in maximum latency. Moreover, the total runtime of the CTS flow with our register clustering algorithms is significantly reduced by almost an order of magnitude.

Proceedings Article

Power-Management Specification in SystemC

Dominik Macko¹, Katarina Jelemenska¹, Pavel Cicak¹

Slovak University of Technology in Bratislava¹

22 Apr 2015

TL;DR: This paper targets the unified specification of power-management techniques early in the design flow; the efficiency of the proposed approach is illustrated by comparing the unified power-management specification with the standardized approach.

Abstract: Power consumption is the greatest concern in current highly-integrated hardware-system design. The power reduction is targeted mostly through power management, implementing such techniques as clock gating, power gating, or voltage and frequency scaling. Due to growing complexity, the start-point in the design has moved from the register-transfer level to the system level. However, the power management lacks the abstraction needed for the system level. Also, different power-management techniques are specified differently, complicating the specification even more. This paper targets the unified specification of power-management techniques early in the design flow. SystemC is used for describing the system functionality along with the power management. Efficiency of the proposed approach is illustrated by comparison of the unified power-management specification and the standardized approach.

Patent

Automatic calibration circuits for operational calibration of critical-path time delays in adaptive clock distribution systems, and related methods and systems

Keith Bowman¹, Jeffrey Todd Bridges¹, Sarthak Raina¹, Yeshwant Nagaraj Kolla¹, Jihoon Jeong¹, Francois Ibrahim Atallah¹, William R. Flederbach¹, Jeffrey Herbert Fischer¹

Qualcomm¹

25 Mar 2015

TL;DR: In this paper, the adaptive clock distribution system includes a tunable-length delay circuit to delay distribution of a clock signal provided to a clocked circuit, to prevent timing margin degradation of the clocked circuit after a voltage droop occurs in a power supply supplying power to the clocked circuit.

Abstract: Automatic calibration circuits for operational calibration of critical-path time delays in adaptive clock distribution systems, and related methods and systems, are disclosed. The adaptive clock distribution system includes a tunable-length delay circuit to delay distribution of a clock signal provided to a clocked circuit, to prevent timing margin degradation of the clocked circuit after a voltage droop occurs in a power supply supplying power to the clocked circuit. The adaptive clock distribution system also includes a dynamic variation monitor to reduce frequency of the delayed clock signal provided to the clocked circuit in response to the voltage droop in the power supply, so that the clocked circuit is not clocked beyond its performance limits during a voltage droop. An automatic calibration circuit is provided in the adaptive clock distribution system to calibrate the dynamic variation monitor during operation based on operational conditions and environmental conditions of the clocked circuit.

Patent•

Clock conditioner circuitry with improved holdover exit transient performance

Benyong Zhang¹, Anjin Du¹, Junqiang Shi¹•Institutions (1)

Texas Instruments¹

02 Dec 2015

TL;DR: In this article, the authors propose a circuit that provides an improved ability to exit from holdover operations, most notably during conditions where the clock signal inputs to a PLL of the clock conditioner are significantly out of phase.

Abstract: Disclosed is a circuit, such as a clock conditioner, that provides an improved ability to exit from holdover operations, most notably during conditions where the clock signal inputs to a PLL of the clock conditioner are significantly out of phase. The circuit utilizes the PLL to generate output clocks based on a reference clock and a feedback clock. During holdover mode, the PLL is unlocked. When the reference clock becomes available and holdover mode can be exited, a holdover controller issues a reset signal that triggers a synchronization of the phases of the inputs to the PLL. The reset signal causes the feedback divider component that generates the feedback clock input to reset its phase and adjust its divide ratio for at least the first divide cycle after restart so that its next rising edge will be phase-aligned with the reference clock. Once the two inputs of the PLL phase detector are phase-aligned, the PLL is re-enabled and the PLL smoothly resumes normal operation.

Journal Article•DOI•

Aggressive Voltage Scaling Through Fast Correction of Multiple Errors With Seamless Pipeline Operation

Insup Shin¹, Jae-Joon Kim², Youngsoo Shin¹•Institutions (2)

KAIST¹, Pohang University of Science and Technology²

26 Jan 2015-IEEE Transactions on Circuits and Systems

TL;DR: An improved Razor flip-flop is introduced which makes more effective use of its shadow latch, so that a pipeline stage can correct an error while continuing to receive data, and avoids the need for repeated clock gating when timing errors happen simultaneously at different stages.

Abstract: Aggressive reduction of timing margins, called timing speculation, is an effective way of reducing the supply voltage for a pipeline circuit and thereby its power consumption. However, the probability of a timing error increases with voltage scaling, and hence errors must be corrected with a small cycle penalty. We introduce an improved Razor flip-flop which makes more effective use of its shadow latch, so that a pipeline stage can correct an error while continuing to receive data. This avoids the need for repeated clock gating when timing errors happen simultaneously at different stages, or when an error persists. The new flip-flop also facilitates time-borrowing. Our technique uses less energy than the state-of-the-art technique, and the energy saving increases with pipeline length: with 10 stages, 4–9% less energy is used.
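
The timing-speculation idea in this abstract can be illustrated with a minimal sketch of baseline Razor-style error detection. This is not the improved flip-flop proposed in the paper, and the function name `razor_sample` and its parameters are hypothetical. It assumes a main flip-flop that samples at the clock edge and a shadow latch that samples `shadow_delay` later, with all times in picoseconds.

```python
def razor_sample(arrival, period, shadow_delay):
    """Classify one pipeline-stage capture for a given data arrival time.

    arrival      -- time at which the stage's data settles (assumed, in ps)
    period       -- clock period; the main flip-flop samples at this time
    shadow_delay -- extra time before the shadow latch samples
    """
    if arrival <= period:
        # Data settled before the clock edge: both samples agree.
        return "ok"
    if arrival <= period + shadow_delay:
        # Main flip-flop caught stale data, shadow latch caught the
        # correct value: the mismatch flags a correctable timing error.
        return "error_detected"
    # Data settled after the shadow latch sampled: not detectable here.
    return "undetected"


if __name__ == "__main__":
    for t in (900, 1200, 1500):
        print(t, razor_sample(t, period=1000, shadow_delay=300))
```

Lowering the supply voltage pushes `arrival` later, so more captures fall into the `error_detected` band, which is why the cost of each correction matters.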

Proceedings Article•DOI•

Design & analysis of 16 bit RISC processor using low power pipelining

Priyanka Trivedi¹, Rajan Prasad Tripathi²•Institutions (2)

Galgotias University¹, Amity University²

15 May 2015

TL;DR: This paper proposes a 16-bit low-power pipelined RISC processor whose main blocks are an ALU, a universal shift register, and a barrel shifter.

Abstract: This paper proposes a 16-bit low-power pipelined RISC processor whose main blocks are an ALU, a universal shift register, and a barrel shifter. We use a modified Harvard architecture, which has separate instruction and data memories, whereas the von Neumann architecture has a single shared memory for instructions and data, with one data bus and one address bus between the memory and the processor. An architectural modification has been made to the incrementer circuit used in the carry-select adder unit of the ALU. The core operations of the processor (fetch, decode, execute, and write back) are implemented in a 2-stage pipeline using the positive and negative clock edges. The design has been realized using Xilinx ISE Design Suite 13.2. Dynamic power in the RISC core is minimized through clock gating, an efficient power-reduction technique, and total power is estimated with XPower Analyzer. The implementation targets a Xilinx Kintex XC7K1607-3fbg676 device in 28 nm technology. Simulation shows the total power dissipated by the processor to be 0.220 W and the latency to be 1.5 cycles.

Patent•

Systems and methods of phase frequency detection involving features such as improved clock edge handling circuitry/aspects

Yi-Chi Cheng

13 Jul 2015

TL;DR: In this paper, a phase frequency detector (PFD) circuit comprises first circuitry including an output that outputs a missing edge signal, and second circuitry that is coupled to the first circuitry and may include components arranged to generate one or both of a reference clock blocking signal and a feedback clock blocking signal based on the missing edge signal.

Abstract: Systems and methods herein may include or involve control circuitry that detects missing edges of reference and/or feedback clocks and may block the next N rising edges of the feedback clock or reference clock, respectively. In some implementations, a phase frequency detector (PFD) circuit comprises first circuitry including an output that outputs a missing edge signal. The first circuitry may include components arranged to detect a missing rising edge of one or both of a reference clock signal and a feedback clock signal. Second circuitry is coupled to the first circuitry and may include components arranged to generate one or both of a reference clock blocking signal and a feedback clock blocking signal based on the missing edge signal. Further, in some implementations, the blocking of the next N rising edges of the opposite clock may effectively increase the positive gain of the PFD.

Patent•

Clock generation with non-integer clock dividing ratio

Gil Stoler¹, Yaniv Shapira¹•Institutions (1)

Amazon.com¹

19 Jun 2015

TL;DR: In this paper, a clock generator for generating a clock equivalent to a target clock which is an input clock divided by a non-integer ratio is presented.

Abstract: A clock generator for generating a clock equivalent to a target clock which is an input clock divided by a non-integer ratio is disclosed. The clock generator comprises a clock divider configured to receive the input clock and divide the input clock with a reconfigurable dividing ratio; and a control circuit controlling operations of the clock divider to divide the input clock by a first dividing ratio to generate a first number of cycles of a first clock in a frame, and divide the input clock by a second dividing ratio to generate a second number of cycles of a second clock in the frame, wherein a difference between a period of the frame and a cumulative time of the first number of cycles of the first clock and the second number of cycles of the second clock is less than a threshold value.
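
The two-ratio scheme described in this abstract can be sketched with a short worked example. It assumes, beyond what the abstract states, that the target ratio is a rational number of at least 1, that the two dividing ratios are the adjacent integers around it, and that the frame is chosen so the mismatch is exactly zero (the abstract only requires it to be below a threshold). The name `dual_ratio_schedule` is hypothetical.

```python
from fractions import Fraction


def dual_ratio_schedule(ratio):
    """Return (n1, a, n2, b) for a target dividing ratio p/q >= 1.

    Per frame, the divider produces `a` output cycles at divide-by-n1
    and `b` output cycles at divide-by-n2, so that the frame spans
    exactly p input cycles and contains q output cycles.
    """
    if ratio < 1:
        raise ValueError("sketch assumes a dividing ratio of at least 1")
    p, q = ratio.numerator, ratio.denominator
    n1 = p // q          # lower integer dividing ratio
    n2 = n1 + 1          # upper integer dividing ratio
    b = p - q * n1       # cycles that need the longer period
    a = q - b            # remaining cycles use the shorter period
    return n1, a, n2, b


if __name__ == "__main__":
    n1, a, n2, b = dual_ratio_schedule(Fraction(5, 2))
    # One divide-by-2 cycle plus one divide-by-3 cycle: 5 input cycles
    # yield 2 output cycles, an average ratio of 2.5.
    print(n1, a, n2, b, a * n1 + b * n2)
```

The average output frequency matches the target, although individual output periods differ by one input cycle, so the generated clock is only equivalent to the target over a frame.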

Journal Article•DOI•

Range Unlimited Delay-Interleaving and -Recycling Clock Skew Compensation and Duty-Cycle Correction Circuit

Yi-Ming Wang¹, Shih-Nung Wei²•Institutions (2)

National Chi Nan University¹, National Chung Cheng University²

01 May 2015-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: Preliminary research results prove the feasibility of the proposed technique and show that the operating frequency ranges from 110 MHz to 1.75 GHz, with the corrected duty cycle varying from 51.2% to 48.9% based on 0.18-μm CMOS technology.

Abstract: A clock skew-compensation and duty-cycle correction circuit (CSADC) is used as the second-level clock distributing circuit to align a system global clock while maintaining a 50% duty cycle. A power-efficient, range-unlimited, and accuracy-enhanced CSADC, designed mainly with a new delay-interleaving and -recycling technique that mitigates operating frequency limitations while keeping overhead costs low, is proposed in this paper. Our preliminary research results prove the feasibility of the proposed technique and show that the operating frequency ranges from 110 MHz to 1.75 GHz, with the corrected duty cycle varying from 51.2% to 48.9% based on 0.18-μm CMOS technology. Meanwhile, the lock-in time, static phase error, and power consumption are, respectively, 26 clock cycles, 4.2 ps, and 5.58 mW at 1.75 GHz.

Patent•

Integrated clock gater (ICG) using clock cascode complimentary switch logic

Matthew S. Berzins¹, Prashant U. Kenkare•Institutions (1)

Samsung¹

03 Feb 2015

TL;DR: In this article, an integrated clock gater (ICG) circuit having clocked complimentary voltage switched logic (CICG), which delivers high performance while maintaining low power consumption characteristics, is presented.

Abstract: Inventive aspects include an integrated clock gater (ICG) circuit having clocked complimentary voltage switched logic (CICG) that delivers high performance while maintaining low power consumption characteristics. The CICG circuit provides a small enable setup time and a small clock-to-enabled-clock delay. A significant reduction in clock power consumption is achieved in both enabled and disabled modes, but particularly in the disabled mode. Complimentary latches work in tandem to latch different voltage levels at different nodes depending on the voltage level of the received clock signal and whether or not an enable signal is asserted. An inverter takes the voltage level from one of the nodes, inverts it, and outputs a gated clock signal. The gated clock signal may be active or quiescent depending on the various voltage levels. Time is “borrowed” from an evaluation window and added to a setup time to provide greater tolerances for receiving the enable signal.
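
For readers unfamiliar with what an integrated clock gater does, the following is a minimal behavioral sketch of a conventional latch-plus-AND clock gate. It does not model the CICG circuit of this patent, and the function name `gated_clock` and the sampled-waveform representation are assumptions made for illustration.

```python
def gated_clock(clk_samples, en_samples):
    """Behavioral model of a conventional latch-based clock gate.

    The enable is passed through a latch that is transparent only while
    the clock is low, so changes of the enable during the high phase
    cannot produce glitches on the gated clock.
    """
    latched = 0
    out = []
    for clk, en in zip(clk_samples, en_samples):
        if clk == 0:
            latched = en          # latch is transparent while clock is low
        out.append(clk & latched)  # AND the clock with the held enable
    return out


if __name__ == "__main__":
    clk = [0, 1, 0, 1, 0, 1]
    en = [1, 1, 0, 1, 1, 0]
    print(gated_clock(clk, en))
```

In the second high phase the enable rises while the clock is already high, and the latch ignores it, so that pulse stays suppressed. The setup time mentioned in the abstract corresponds to how late the enable may arrive before the rising clock edge and still be captured.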

Patent•

Tunable sector buffer for wide bandwidth resonant global clock distribution

T.J. Bucelot¹, Alan J. Drake¹, Robert A. Groves¹, Jason D. Hibbeler¹, Yong I. Kim¹, Liang-Teck Pang¹, William Robert Reohr¹, Phillip J. Restle¹, M. G. R. Thomson¹•Institutions (1)

IBM¹

07 May 2015

TL;DR: A wide bandwidth resonant clock distribution comprises a clock grid configured to distribute a clock signal to a plurality of components of an integrated circuit and a tunable sector buffer configured to receive the clock signal and provide an output to the clock grid.

Abstract: A wide bandwidth resonant clock distribution comprises a clock grid configured to distribute a clock signal to a plurality of components of an integrated circuit and a tunable sector buffer configured to receive the clock signal and provide an output to the clock grid. The tunable sector buffer is configured to set latency and slew rate of the clock signal based on an identified resonant or non-resonant mode.

Proceedings Article•DOI•

Low power methodology for an ASIC design flow based on high-level synthesis

Fahad Bin Muslim¹, Affaq Qamar¹, Luciano Lavagno¹•Institutions (1)

Polytechnic University of Turin¹

02 Nov 2015

TL;DR: This work presents a methodology using both clock gating and power gating to save power of an inverse discrete cosine transform (IDCT) design when the register transfer level (RTL) is generated automatically by high-level synthesis (HLS).

Abstract: Power management in system-on-chip (SoC) design has become very important in modern nanometric technologies. It is desirable to consider power optimization at the system-level for maximum power savings due to its higher level of abstraction. Clock gating and power gating are two well-known techniques for dynamic and leakage power reduction respectively. They can even be integrated to get maximum power reduction by using the same signal to control both. This work presents a methodology using both these techniques to save power of an inverse discrete cosine transform (IDCT) design when the register transfer level (RTL) is generated automatically by high-level synthesis (HLS). Power gating is implemented by capturing the power intent using common power format (CPF). This work mainly highlights the prospects of integrating CPF with automatically generated RTL using HLS flow. Saving in dynamic power by a factor of around 10× is obtained through clock gating while more than 50% saving in static power is obtained through power gating. Power gating also results in some area overhead.
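
As a rough illustration of why the two techniques target different power components, the standard first-order CMOS power model can be written as below. This is a textbook approximation, not taken from the paper, and the symbols (switching activity α, switched capacitance C, supply voltage V_DD, clock frequency f, leakage current I_leak) are introduced here.

```latex
P_{\text{total}}
  = \underbrace{\alpha \, C \, V_{DD}^{2} \, f}_{\text{dynamic}}
  + \underbrace{I_{\text{leak}} \, V_{DD}}_{\text{static}}
```

Under this model, clock gating lowers the effective switching activity of the gated registers and their clock network, while power gating reduces the leakage current of a switched-off domain.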
