
Showing papers on "Clock gating" published in 2015


Patent
17 Mar 2015
TL;DR: An internal clock is disabled if a no operation command is detected while no read or write burst operation is taking place, which may be used to reduce power consumption by reducing operating current in memory devices.
Abstract: The present invention includes a circuit, system and method for selectively turning off internal clock drivers to reduce operating current. The present invention may be used to reduce power consumption by reducing operating current in a memory device. Operating current may be reduced by turning off internal clock drivers that deliver a clock signal during selected periods of time. According to an embodiment of clock control circuitry of the present invention, an internal clock is disabled if a no operation command is detected during periods of time when no read or write burst operation is taking place. Methods, memory devices and computer systems including the clock control circuitry and its functionality are also disclosed.

59 citations
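
The gating rule in the abstract above (disable the internal clock on a no-operation command when no read or write burst is in flight) can be sketched as a small cycle-level model. This is only an illustration under stated assumptions: the class name `ClockControl`, the command strings, and the fixed burst length of 4 cycles are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass
class ClockControl:
    """Hypothetical model: gate the internal clock on NOP when no burst is active."""
    burst_cycles_left: int = 0

    def step(self, command: str, burst_length: int = 4) -> bool:
        """Advance one external clock cycle; return True if the internal clock runs."""
        if command in ("READ", "WRITE"):
            # Assumption: a burst occupies `burst_length` cycles, command cycle included.
            self.burst_cycles_left = burst_length
        # The clock is gated only when a NOP arrives and no burst is in progress.
        clock_enabled = not (command == "NOP" and self.burst_cycles_left == 0)
        if self.burst_cycles_left > 0:
            self.burst_cycles_left -= 1
        return clock_enabled


def simulate(commands, burst_length=4):
    """Return the per-cycle internal clock enable for a list of commands."""
    ctrl = ClockControl()
    return [ctrl.step(cmd, burst_length) for cmd in commands]
```

In this model a NOP that arrives during a burst leaves the clock running, and only NOPs outside any burst gate it, which is the condition the abstract describes.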


Journal ArticleDOI
TL;DR: An energy- and area-efficient discrete wavelet packet transform (DWPT) processor design for power-constrained and cost-sensitive healthcare-monitoring applications that employs recursive memory-shared architecture to achieve low hardware complexity while performing required arbitrary-basis DWPT decomposition.
Abstract: This brief presents an energy- and area-efficient discrete wavelet packet transform (DWPT) processor design for power-constrained and cost-sensitive healthcare-monitoring applications. This DWPT processor employs recursive memory-shared architecture to achieve low hardware complexity while performing required arbitrary-basis DWPT decomposition. By exploiting inherent characteristics of different physiological signals through an entropy statistic engine, the DWPT processor core can be reconfigured to compute multilevel wavelet decomposition with effective time and frequency resolution. Various design techniques from algorithm to circuit levels, including reconfigurable computing, lifting scheme, dual-port pipeline processing, near-threshold operation, and clock gating, are applied to achieve energy efficiency. With a 0.18-μm CMOS technology at 0.5 V and 1 MHz, the DWPT core only consumes 26 μW for performing three-level 256-point DWPT decomposition with entropy statistic calculation. When integrated in an ARM Cortex-M0-based biomedical system-on-a-chip test platform, the DWPT processor achieves processing acceleration by three orders of magnitude and reduces energy consumption by four orders of magnitude compared with CPU-only implementations.

36 citations


Journal ArticleDOI
TL;DR: This work presents a 1.22 Gb/s fully parallel decoder of a GF(64) (160, 80) regular-(2, 4) NB-LDPC code in 65 nm CMOS, and allows each processing node to detect its own convergence and apply dynamic clock gating to save power.
Abstract: Nonbinary LDPC (NB-LDPC) codes, defined over Galois field, offer better coding gain and a lower error floor than binary LDPC codes. However, the complex decoding and large memory requirement have prevented any practical chip implementations. We present a 1.22 Gb/s fully parallel decoder of a GF(64) (160, 80) regular-(2, 4) NB-LDPC code in 65 nm CMOS. The reduced number of edges in NB-LDPC code's factor graph permits a low wiring overhead in the fully parallel architecture. The throughput is further improved by a one-step look-ahead check node design that increases the clock frequency to 700 MHz, and the interleaving of variable node and check node operations that shortens one decoding iteration to 47 clock cycles. We allow each processing node to detect its own convergence and apply dynamic clock gating to save power. When all processing nodes have been clock gated, the decoder terminates and continues with the next input to increase the throughput to 1.22 Gb/s. The dynamic clock gating and decoder termination improve the energy efficiency to 3.03 nJ/b, or 259 pJ/b/iteration, at 1.0 V and 700 MHz. Voltage scaling to 675 mV improves the energy efficiency to 89 pJ/b/iteration for a throughput of 698 Mb/s at 400 MHz.

35 citations
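
The figures quoted in the NB-LDPC abstract above can be cross-checked with a few lines of arithmetic. The sketch below rests on assumptions that are not stated in the abstract: that the average iteration count can be inferred as the ratio of the two energy figures, that throughput counts coded bits (160 symbols of 6 bits each), and that there are no overhead cycles between codewords.

```python
# Figures quoted in the abstract.
FIELD_BITS = 6                   # a GF(64) symbol carries log2(64) = 6 bits
CODE_SYMBOLS = 160               # (160, 80) code: 160 symbols per codeword
CLOCK_HZ = 700e6                 # clock frequency at 1.0 V
CYCLES_PER_ITERATION = 47        # clock cycles per decoding iteration
ENERGY_PER_BIT_J = 3.03e-9       # 3.03 nJ/b
ENERGY_PER_BIT_ITER_J = 259e-12  # 259 pJ/b/iteration


def implied_average_iterations():
    """Assumption: average iterations = energy per bit / energy per bit per iteration."""
    return ENERGY_PER_BIT_J / ENERGY_PER_BIT_ITER_J


def estimated_throughput_bps(iterations):
    """Assumption: coded bits per codeword divided by decoding time, no overhead cycles."""
    bits_per_codeword = CODE_SYMBOLS * FIELD_BITS
    seconds_per_codeword = iterations * CYCLES_PER_ITERATION / CLOCK_HZ
    return bits_per_codeword / seconds_per_codeword


iterations = implied_average_iterations()
print(f"implied average iterations: {iterations:.1f}")
print(f"estimated throughput: {estimated_throughput_bps(iterations) / 1e9:.2f} Gb/s")
```

Under these assumptions the energy ratio implies roughly 11.7 iterations on average, and the resulting estimate is close to the 1.22 Gb/s quoted in the abstract. This is a consistency check on the published numbers, not a derivation given by the authors.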


Journal ArticleDOI
TL;DR: A VHDL-based technique is proposed to insert clock gating circuits, and the resulting dynamic power is estimated, showing that dynamic power is reduced for the sequential benchmark circuits considered.

30 citations


Journal ArticleDOI
TL;DR: A novel clock tree resynthesis methodology which is based on a skew scheduling engine which works on an already built clock tree and demonstrates the effectiveness of the offsets at the output pins of the leaf-level clock drivers in comparison to the traditional clock scheduling in the clock pin of the flip-flops due to the better implementability and lesser area overhead.
Abstract: With aggressive technology scaling and complex design scenarios, timing closure has become a challenging and tedious job for the designers. Timing violations persist for multi-corner, multi-mode designs in the deep-routing stage although careful optimization has been applied at every step after synthesis. Useful clock skew optimization has been suggested as an effective way to achieve design convergence and timing closure. Existing approaches on useful skew optimization: 1) calculate clock skew at sequential elements before the actual tree is synthesized and 2) do not account for the implementability of the calculated schedules at the later stages of design cycle. In this paper, we propose a novel clock tree resynthesis methodology which is based on a skew scheduling engine which works on an already built clock tree. The output of the engine is a set of positive and negative offsets which translate to the delay and accelerations, respectively in clock arrival at the clock tree pins. We demonstrate the effectiveness of the offsets at the output pins of the leaf-level clock drivers in comparison to the traditional clock scheduling in the clock pins of the flip-flops due to the better implementability and lesser area overhead and present an algorithm to accurately realize these offsets in the clock tree. Experimental results on large-scale industrial designs demonstrate that our clock tree resynthesis methodology achieves respectively 57%, 12%, and 42% average improvement in total negative slack, worst negative slack, and failure-end-point with an average overhead of 26% in clock tree area. We also experimentally study the impact of on-chip-variation-derates on our approach in terms of the timing metric improvement and clock tree overhead.

28 citations


Journal ArticleDOI
TL;DR: This paper proposes an alternative method using graph transformation, which computes a parametric minimum clock period and is more than 10⁴ times faster than Monte Carlo simulation while maintaining a good accuracy.
Abstract: Post-silicon clock tuning elements are widely used in high-performance designs to mitigate the effects of process variations and aging. Located on clock paths to flip-flops, these tuning elements can be configured through the scan chain so that clock skews to these flip-flops can be adjusted after manufacturing. Owing to the delay compensation across consecutive register stages enabled by the clock tuning elements, higher yield and enhanced robustness can be achieved. These benefits are, nonetheless, attained by increasing die area due to the inserted clock tuning elements. For balancing performance improvement and area cost, an efficient timing analysis algorithm is needed to evaluate the performance of such a circuit. So far this evaluation is only possible by Monte Carlo simulation which is very time-consuming. In this paper, we propose an alternative method using graph transformation, which computes a parametric minimum clock period and is more than 10⁴ times faster than Monte Carlo simulation while maintaining a good accuracy. This method also identifies the gates that are critical to circuit performance, so that a fast analysis-optimization flow becomes possible.

22 citations


Proceedings ArticleDOI
24 May 2015
TL;DR: This paper presents a novel, static DET flip-flop with a true-single-phase clock that completely avoids clock overlap hazards by eliminating the need for an inverted clock edge for functionality.
Abstract: Dual-edge-triggered (DET) synchronous operation is a very attractive option for low-power, high-performance designs. Compared to conventional single-edge synchronous systems, DET operation is capable of providing the same throughput at half the clock frequency. This can lead to significant power savings on the clock network that is often one of the major contributors to total system power. However, in order to implement DET operation, special registers need to be introduced that sample data on both clock-edges. These registers are more complex than their single-edge counterparts, and often suffer from a certain amount of clock-overlap between the main clock and the internally generated inverted clock. This overlap can cause contention inside the cell and lead to logic failures, especially when operating at scaled power supplies and under process variations that characterize nanometer technologies. This paper presents a novel, static DET flip-flop (DET-FF) with a true-single-phase clock that completely avoids clock overlap hazards by eliminating the need for an inverted clock edge for functionality. The proposed DET-FF was implemented in a standard 40nm CMOS technology, showing full functionality at low-voltage operating points, where conventional DET-FFs fail. Under a near-threshold, 500mV supply voltage, the proposed cell also provides a 35% lower CK-to-Q delay and the lowest power-delay-product compared to all considered DET-FF implementations.

22 citations


Journal ArticleDOI
TL;DR: A new paradigm for clock distribution that uses current, rather than voltage, to distribute a global clock signal with reduced power consumption is proposed and a new high-performance current-mode pulsed flip-flop with enable (CMPFFE) is created.
Abstract: We propose a new paradigm for clock distribution that uses current, rather than voltage, to distribute a global clock signal with reduced power consumption. While current-mode (CM) signaling has been used in one-to-one signals, this is the first usage in a one-to-many clock distribution network. To accomplish this, we create a new high-performance current-mode pulsed flip-flop with enable (CMPFFE) using 45 nm CMOS technology. When the CMPFFE is combined with a CM transmitter, the first CM clock distribution network exhibits 62% lower average power compared to traditional voltage mode clocks.

20 citations


Patent
15 Jun 2015
TL;DR: In this paper, an asynchronous wrapper circuit for a clock gating cell (CGC) is described, which includes circuitry configured to sample a data channel via sampling circuitry for a communication start signal to enable the CGC to start a gated clock for a data message on the data channel.
Abstract: Technology is described for an asynchronous wrapper circuit for a clock gating cell (CGC). In one example, the asynchronous wrapper cell for CGC includes circuitry configured to (1) sample a data channel via sampling circuitry for a communication start signal to enable the CGC to start a gated clock for a data message on the data channel, and (2) reset an enable of the CGC to an idle mode via idle mode control circuitry after the data message has been clocked via the CGC through function cell circuitry. The idle mode control circuitry generates an output for the sampling circuitry from the function cell. Various other computing circuitries are also disclosed.

19 citations


Proceedings ArticleDOI
24 May 2015
TL;DR: The algorithm is capable of recovering data over a wide tracking range or when the precise oversampling rate is not known a priori, for any real-valued oversampling rate, β ≥ 3, making this BO-CDR algorithm the first to not require integer-valued β.
Abstract: A new blind oversampling clock and data recovery (BO-CDR) algorithm is proposed. It has high tolerance to low-frequency jitter (148 unit intervals at 10 kHz, measured at 640 Mbps) and is suitable for systems where the receiver clock has high drift with respect to the transmission. The algorithm is capable of recovering data over a wide tracking range or when the precise oversampling rate (β) is not known a priori, for any real-valued oversampling rate, β ≥ 3, making this BO-CDR algorithm the first to not require integer-valued β. To demonstrate the utility of the algorithm, two implementations are designed and evaluated. The first is used in a low-power, low-data rate sensor node IC with a low-performance single phase clock source. The second is a high-speed receiver with a multiple phase clock source implemented on FPGA. The CDR core consists of just 47 logic cells and 19 registers and has an estimated power consumption of 070 mW at 640 Mbps. The properties of this CDR algorithm make it appropriate for a wide range of applications in serial communication.

18 citations


Journal ArticleDOI
TL;DR: In this paper, a power gating structure is proposed to reduce ground bounce noise (GBN), the high voltage fluctuation on the real ground rail during sleep mode to active mode transitions of power gating circuits.
Abstract: Power gating is the most effective method to reduce the standby leakage power by adding header/footer high-VTH sleep transistors between actual and virtual power/ground rails. When a power gating circuit transitions from sleep mode to active mode, a large instantaneous charge current flows through the sleep transistors. Ground bounce noise (GBN) is the high voltage fluctuation on real ground rail during sleep mode to active mode transitions of power gating circuits. GBN disturbs the logic states of internal nodes of circuits. A novel and reliable power gating structure is proposed in this article to reduce the problem of GBN. The proposed structure contains low-VTH transistors in place of high-VTH footer. The proposed power gating structure not only reduces the GBN but also improves other performance metrics. A large mitigation of leakage power in both modes eliminates the need of high-VTH transistors. A comprehensive and comparative evaluation of proposed technique is presented in this article for a chai...

Patent
15 Oct 2015
TL;DR: In this article, a clock gating architecture is proposed for loading data to or reading data from specific selected shift registers, where bitstreams are provided at one end of the shift register, and clocked through until the last flip flop receives its value.
Abstract: Configuration values for Lookup tables (LUTs) and programmable routing switches in an FPGA are provided by means of a number of flip flops arranged in a shift register. This shift register may receive test values in a factory test mode, and operational configuration values (implementing whatever functionality the client requires of the FPGA) in an operational mode. The bitstreams are provided at one end of the shift register, and clocked through until the last flip flop receives its value. Values may also be clocked out at the other end of the shift register to be compared to the initial bitstream in order to identify corruption of stored values e.g. due to radiation exposure. A clock gating architecture is proposed for loading data to or reading data from specific selected shift registers.
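
The load-and-compare flow described above can be sketched with a toy model in which only one selected shift register receives clock pulses. Everything named here is hypothetical (`ConfigChain`, `load`, `read_back`, `corrupted_positions`), and feeding each clocked-out bit back into the chain during readback is an assumption made so that the stored values survive the check. The patent abstract does not say how readback preserves the contents.

```python
class ConfigChain:
    """Hypothetical model of one configuration shift register."""

    def __init__(self, length):
        self.bits = [0] * length

    def clock(self, bit_in):
        """One ungated clock pulse: shift by one flip-flop, return the bit that falls out."""
        bit_out = self.bits[-1]
        self.bits = [bit_in] + self.bits[:-1]
        return bit_out


def load(chains, selected, bitstream):
    """Clock the bitstream into the selected chain only; gated chains see no pulses."""
    for bit in bitstream:
        chains[selected].clock(bit)


def read_back(chains, selected):
    """Clock the selected chain once around, feeding each output bit back in (assumption)."""
    chain = chains[selected]
    return [chain.clock(chain.bits[-1]) for _ in range(len(chain.bits))]


def corrupted_positions(expected, observed):
    """Indices in the bitstream where the read-back value differs from what was loaded."""
    return [i for i, (e, o) in enumerate(zip(expected, observed)) if e != o]
```

Because the first bit shifted in ends up in the last flip-flop, the read-back sequence comes out in the same order as the original bitstream, so the two can be compared position by position.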

Journal ArticleDOI
TL;DR: A centralized and fine-grained microarchitecture-level clock gating for low power hardware accelerators that are automatically designed by a high-level synthesis (HLS) tool; a 47%–86% reduction in power dissipation is observed.
Abstract: Power is a primary concern in digital circuits, and clock distribution networks are a particularly significant power consumer. Clock gating is therefore an effective technique for saving dynamic power by reducing switching activity. In this paper, we propose a centralized and fine-grained microarchitecture-level clock gating for low power hardware accelerators which are automatically designed by a high-level synthesis (HLS) tool. The basic principle of our idea is to avoid any extra computation for generating clock enable signals and instead exploit existing signals of the finite state machine to control the datapath clock network. After determining the current state in the finite state machine, the clock sub-tree of the current state is enabled and the other sub-trees are disabled, with a slight increase in circuit area. Our approach is implemented within an HLS design flow for automatic low power hardware accelerator generation in application specific integrated circuit design. Experimental results are obtained on a set of representative benchmark programs. Depending on the circuit size and number of registers, a 47%–86% reduction in power dissipation is observed.
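
The state-driven gating idea in this abstract can be illustrated by counting register clock pulses over an FSM state trace, with and without gating. This is a minimal sketch under assumptions: the function `count_clock_pulses`, the state names, and the register sets are hypothetical, and the model counts pulses only, which is not the power figure reported by the authors.

```python
def count_clock_pulses(state_trace, registers_by_state, gated=True):
    """Count register clock pulses over an FSM state trace.

    registers_by_state maps each FSM state to the set of registers in the
    clock sub-tree enabled for that state (a hypothetical partition).
    """
    all_registers = set().union(*registers_by_state.values())
    pulses = 0
    for state in state_trace:
        if gated:
            # Only the sub-tree belonging to the current state is clocked.
            pulses += len(registers_by_state[state])
        else:
            # Without gating, every register is clocked in every cycle.
            pulses += len(all_registers)
    return pulses


registers_by_state = {"S0": {"a", "b"}, "S1": {"c"}, "S2": {"a", "d"}}
trace = ["S0", "S1", "S2", "S1"]
ungated = count_clock_pulses(trace, registers_by_state, gated=False)
gated = count_clock_pulses(trace, registers_by_state, gated=True)
print(ungated, gated)
```

For this made-up trace the count drops from 16 pulses to 6. The enable for each sub-tree is just a decode of the current state, which reflects the abstract's point that no extra computation is needed to generate the enable signals.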

Journal ArticleDOI
TL;DR: A cyclic half-delay-line architecture that uses the same type of delay lines for cyclic delay determination and coarse locking is proposed and used to achieve the design goals of small footprint and fast locking for a large operating frequency range.
Abstract: A 3 MHz-to-1.8 GHz, 94 μW-to-9.5 mW, all-digital delay-locked loop (ADDLL) using 65-nm CMOS technology is presented. In this paper, a cyclic half-delay-line architecture that uses the same type of delay lines for cyclic delay determination and coarse locking is proposed and used to achieve the design goals of small footprint and fast locking for a large operating frequency range. In addition, a new delay structure is developed for the cyclic delay units and coarse delay line. In addition to clock gating, which is used to reduce power consumption in the lock-in state regardless of the clock frequency, the automatic bypassing of the cyclic operation is developed and used to reduce power consumption during high-frequency operation. Through the use of proposed techniques, the active area is reduced to only 0.0153 mm², and the operating frequency range is from 3 MHz to 1.8 GHz. The measurement results show that the proposed ADDLL achieves a peak-to-peak jitter of 3 ps with 9.5 mW power consumption when operated at 1.8 GHz.

Journal ArticleDOI
TL;DR: In this article, the performance of a synchronous sampling system attacking a modern microcontroller running a software AES implementation is characterized under four conditions: with a stable crystal oscillator-based clock, with a clock that is randomly varied between 3.9 and 13 MHz, with an internal oscillator that is randomly varied between 7.2 and 8.1 MHz, and with an internal oscillator that has slight random variation due to natural "drift" in the oscillator.
Abstract: Measuring power consumption for side channel analysis typically uses an oscilloscope, which measures the data relative to an internal sample clock. By synchronizing the sampling clock to the clock of the target device, the sample rate requirements are considerably relaxed; the attack will succeed with a much lower sample rate. This work characterizes the performance of a synchronous sampling system attacking a modern microcontroller running a software AES implementation. This attack is characterized under four conditions: with a stable crystal oscillator-based clock, with a clock that is randomly varied between 3.9 and 13 MHz, with an internal oscillator that is randomly varied between 7.2 and 8.1 MHz, and with an internal oscillator that has slight random variation due to natural ‘drift’ in the oscillator. Traces captured with the synchronous sampling technique can be processed with a standard Differential Power Analysis style attack in all four cases, whereas when an oscilloscope is used only the stable oscillator setup is successful. This work also develops the hardware to recover the internal clock of a device which does not have an externally available clock. It is possible to implement this scheme in software only, allowing it to work with existing oscilloscope-based test environments. Performing the recovery in hardware allows the use of fault injection with excellent temporal stability relative to a sensitive event. This is demonstrated with a power glitch inserted into a microcontroller, where the glitch is triggered based on a signature in the measured power consumption.

Proceedings ArticleDOI
02 Mar 2015
TL;DR: A new scan shifting method based on clock gating of multiple groups is presented, which reduces the toggle rate of the internal combinational logic and prevents cumulative transitions caused by shifting operations of the scan cells.
Abstract: Since the advent of very large scale integration (VLSI) design, the high power consumption of scan-based testing has been one of the most serious problems. The large number of scan cells leads to excessive switching activity during scan shifting operations. In this paper, we present a new scan shifting method based on clock gating of multiple groups, which reduces the toggle rate of the internal combinational logic. This method prevents cumulative transitions caused by shifting operations of the scan cells. In addition, existing compression schemes are compatible with the proposed method without modification of the decompression architecture. Experimental results on ITC'99 benchmark circuits and industrial circuits show that this shifting method reduces the scan shifting power in all cases. Despite the power improvement, the overhead of the extra logic is negligible.

PatentDOI
29 Jan 2015
TL;DR: A new paradigm for clock distribution that uses current, rather than voltage, to distribute a global clock signal with reduced power consumption is proposed, and the first CM clock distribution network exhibits 45.2% lower average power compared to traditional voltage mode clocks.
Abstract: Current-mode signaling for a one-to-many clock signal distribution providing significantly less dynamic power use and improved noise immunity compared to traditional VM signaling schemes.

Patent
13 Feb 2015
TL;DR: In this paper, a multi-bit flip-flop operating as a master-slave flip-flop may minimize power consumption occurring in a clock path through which the clock signal is transmitted.
Abstract: A multi-bit flip-flop includes a plurality of multi-bit flip-flop blocks that share a clock signal. Each of the multi-bit flip-flop blocks includes a single inverter and a plurality of flip-flops. The single inverter generates an inverted clock signal by inverting the clock signal. Each of the flip-flops includes a master latch part and a slave latch part and operates the master latch part and the slave latch part based on the clock signal and the inverted clock signal. Here, the flip-flops are triggered at rising edges of the clock signal. Thus, the multi-bit flip-flop operating as a master-slave flip-flop may minimize (or, reduce) power consumption occurring in a clock path through which the clock signal is transmitted.

Journal ArticleDOI
TL;DR: A novel register clustering methodology in generating the leaf level topology of the clock tree to reduce the power consumption and a buffer allocation algorithm is proposed to satisfy the slew constraint within the clusters at a minimum cost of power consumption.
Abstract: Clock networks dissipate a significant fraction of the entire chip power budget. Therefore, the optimization for power consumption of clock networks has become one of the most important objectives in high performance IC designs. In contrast to most of the traditional studies that handle this problem with clock routing or buffer insertion strategy, this paper proposes a novel register clustering methodology in generating the leaf level topology of the clock tree to reduce the power consumption. Three register clustering algorithms called KMR, KSR and GSR are developed and a comprehensive study of them is discussed in this paper. Meanwhile, a buffer allocation algorithm is proposed to satisfy the slew constraint within the clusters at a minimum cost of power consumption. We integrate our algorithms into a classical clock tree synthesis (CTS) flow to test the register clustering methodology on ISPD 2010 benchmark circuits. Experimental results show that all the three register clustering algorithms achieve more than 20% reduction in power consumption without affecting the skew and the maximum latency of the clock tree. As the most effective method among the three algorithms, GSR algorithm achieves a 31% reduction in power consumption as well as a 4% reduction in skew and a 5% reduction in maximum latency. Moreover, the total runtime of the CTS flow with our register clustering algorithms is significantly reduced by almost an order of magnitude.

Proceedings ArticleDOI
22 Apr 2015
TL;DR: This paper targets the unified specification of power-management techniques early in the design flow; the efficiency of the proposed approach is illustrated by comparing the unified power-management specification with the standardized approach.
Abstract: Power consumption is the greatest concern in current highly-integrated hardware-system design. The power reduction is targeted mostly through power management, implementing such techniques as clock gating, power gating, or voltage and frequency scaling. Due to growing complexity, the start-point in the design has moved from the register-transfer level to the system level. However, the power management lacks the abstraction needed for the system level. Also, different power-management techniques are specified differently, complicating the specification even more. This paper targets the unified specification of power-management techniques early in the design flow. SystemC is used for describing the system functionality along with the power management. Efficiency of the proposed approach is illustrated by comparison of the unified power-management specification and the standardized approach.

Patent
25 Mar 2015
TL;DR: In this paper, the adaptive clock distribution system includes a tunable-length delay circuit to delay distribution of a clock signal provided to a clocked circuit, to prevent timing margin degradation of the clocked circuit after a voltage droop occurs in a power supply supplying power to the clocked circuit.
Abstract: Automatic calibration circuits for operational calibration of critical-path time delays in adaptive clock distribution systems, and related methods and systems, are disclosed. The adaptive clock distribution system includes a tunable-length delay circuit to delay distribution of a clock signal provided to a clocked circuit, to prevent timing margin degradation of the clocked circuit after a voltage droop occurs in a power supply supplying power to the clocked circuit. The adaptive clock distribution system also includes a dynamic variation monitor to reduce frequency of the delayed clock signal provided to the clocked circuit in response to the voltage droop in the power supply, so that the clocked circuit is not clocked beyond its performance limits during a voltage droop. An automatic calibration circuit is provided in the adaptive clock distribution system to calibrate the dynamic variation monitor during operation based on operational conditions and environmental conditions of the clocked circuit.

Patent
02 Dec 2015
TL;DR: In this article, the authors propose a circuit that provides an improved ability to exit from holdover operations, most notably during conditions where the clock signal inputs to a PLL of the clock conditioner are significantly out of phase.
Abstract: Disclosed is a circuit, such as a clock conditioner, that provides an improved ability to exit from holdover operations, most notably during conditions where the clock signal inputs to a PLL of the clock conditioner are significantly out of phase. The circuit utilizes the PLL to generate output clocks based on a reference clock and a feedback clock. During holdover mode, the PLL is unlocked. When the reference clock becomes available and holdover mode can be exited, a holdover controller issues a reset signal that triggers a synchronization of the phases of the inputs to the PLL. The reset signal causes the feedback divider component that generates the feedback clock input to reset its phase and adjust its divide ratio for at least the first divide cycle after restart so that its next rising edge will be phase-aligned with the reference clock. Once the two inputs of the PLL phase detector are phase-aligned, the PLL is re-enabled and the PLL smoothly resumes normal operation.

Journal ArticleDOI
TL;DR: An improved Razor flip-flop is introduced which makes more effective use of its shadow latch, so that a pipeline stage can correct an error while continuing to receive data, and avoids the need for repeated clock gating when timing errors happen simultaneously at different stages.
Abstract: Aggressive reduction of timing margins, called timing speculation, is an effective way of reducing the supply voltage for a pipeline circuit and thereby its power consumption. However, the probability of timing errors increases with voltage scaling, and hence the errors must be corrected with a small cycle penalty. We introduce an improved Razor flip-flop which makes more effective use of its shadow latch, so that a pipeline stage can correct an error while continuing to receive data. This avoids the need for repeated clock gating when timing errors happen simultaneously at different stages, or when an error persists. The new flip-flop also facilitates time-borrowing. Our technique uses less energy than the state-of-the-art technique, and the energy saving increases with pipeline length: with 10 stages, 4–9% less energy is used.

Proceedings ArticleDOI
15 May 2015
TL;DR: A 16-bit low-power pipelined RISC processor is proposed, consisting mainly of an ALU, a universal shift register, and a barrel shifter.
Abstract: This paper proposes a 16-bit low-power pipelined RISC processor whose main blocks are an ALU, a universal shift register, and a barrel shifter. It uses a modified Harvard architecture with separate instruction and data memories, whereas the von Neumann architecture has a single shared memory for instructions and data, with one data bus and one address bus between the memory and the processor. A remedial architectural modification is made to the incrementer circuit used in the carry-select adder unit of the ALU. The core operations (fetch, decode, execute, and write back) are implemented with two-stage pipelining on the positive and negative clock edges. The design is realized using Xilinx ISE Design Suite 13.2; dynamic power in the RISC core is minimized through clock gating, an efficient power-reduction technique, and the total power is estimated with the XPower Analyzer. The implementation targets a XILINX KINTEX XC7K1607-3fbg676 device in 28 nm technology. The simulation shows a total power dissipation of 0.220 W and a latency of 1.5 cycles.

Patent
13 Jul 2015
TL;DR: In this paper, a phase frequency detector (PFD) circuit comprises first circuitry including an output that outputs a missing edge signal, and second circuitry that is coupled to the first circuitry and may include components arranged to generate one or both of a reference clock blocking signal and a feedback clock blocking signal based on the missing edge signal.
Abstract: Systems and methods herein may include or involve control circuitry that detects missing edges of reference and/or feedback clocks and may block the next N rising edges of the feedback clock or reference clock, respectively. In some implementations, a phase frequency detector (PFD) circuit comprises first circuitry including an output that outputs a missing edge signal. The first circuitry may include components arranged to detect a missing rising edge of one or both of a reference clock signal and a feedback clock signal. Second circuitry is coupled to the first circuitry and may include components arranged to generate one or both of a reference clock blocking signal and a feedback clock blocking signal based on the missing edge signal. Further, in some implementations, the blocking of the next N rising edges of the opposite clock may effectively increase the positive gain of the PFD.

Patent
Gil Stoler1, Yaniv Shapira1
19 Jun 2015
TL;DR: In this paper, a clock generator is presented that generates a clock equivalent to a target clock, which is an input clock divided by a non-integer ratio.
Abstract: A clock generator for generating a clock equivalent to a target clock which is an input clock divided by a non-integer ratio is disclosed. The clock generator comprises a clock divider configured to receive the input clock and divide the input clock with a reconfigurable dividing ratio; and a control circuit controlling operations of the clock divider to divide the input clock by a first dividing ratio to generate a first number of cycles of a first clock in a frame, and divide the input clock by a second dividing ratio to generate a second number of cycles of a second clock in the frame, wherein a difference between a period of the frame and a cumulative time of the first number of cycles of the first clock and the second number of cycles of the second clock is less than a threshold value.
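The frame-based division described in this abstract can be illustrated with a small sketch. It assumes the two dividing ratios are the integers just below and just above the target ratio, and that the target ratio is an exact fraction, so the frame matches the target with zero error. The abstract only requires the error to stay below a threshold. The function names `plan_frame` and `frame_schedule` are hypothetical and do not come from the patent.

```python
from fractions import Fraction


def plan_frame(ratio: Fraction):
    """Return (first_ratio, first_count, second_ratio, second_count).

    Assumption: a frame spans `ratio.denominator` output cycles and
    `ratio.numerator` input cycles, so the average division equals `ratio`.
    """
    first_ratio = ratio.numerator // ratio.denominator  # floor of the target
    total_cycles = ratio.denominator
    # Each cycle divided by (first_ratio + 1) contributes one extra input cycle.
    second_count = ratio.numerator - first_ratio * total_cycles
    first_count = total_cycles - second_count
    return first_ratio, first_count, first_ratio + 1, second_count


def frame_schedule(ratio: Fraction):
    """Spread the longer cycles evenly over the frame with an accumulator."""
    first_ratio, first_count, second_ratio, second_count = plan_frame(ratio)
    total_cycles = first_count + second_count
    accumulator = 0
    schedule = []
    for _ in range(total_cycles):
        accumulator += second_count
        if accumulator >= total_cycles:
            accumulator -= total_cycles
            schedule.append(second_ratio)
        else:
            schedule.append(first_ratio)
    return schedule


if __name__ == "__main__":
    target = Fraction(10, 3)  # divide the input clock by 3.33...
    print(plan_frame(target))
    print(frame_schedule(target))
```

Over one frame the dividing ratios in the schedule sum to the numerator of the target ratio, so the frame period equals the cumulative time of the generated cycles. A real design would also have to bound the cycle-to-cycle jitter that alternating between the two ratios introduces.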

Journal ArticleDOI
TL;DR: Preliminary research results prove the feasibility of the proposed technique and show that the operating frequency ranges from 110 MHz to 1.75 GHz, with the corrected duty cycle varying from 51.2% to 48.9% based on 0.18-μm CMOS technology.
Abstract: A clock skew-compensation and duty-cycle correction circuit (CSADC) is used as the second-level clock distributing circuit to align a system global clock while maintaining a 50% duty cycle. A power-efficient, range-unlimited, and accuracy-enhanced CSADC, designed mainly with a new delay-interleaving and -recycling technique that mitigates operating frequency limitations while keeping overhead costs low, is proposed in this paper. Our preliminary research results prove the feasibility of the proposed technique and show that the operating frequency ranges from 110 MHz to 1.75 GHz, with the corrected duty cycle varying from 51.2% to 48.9% based on 0.18-$\mu$m CMOS technology. Meanwhile, the lock-in time, static phase error, and power consumption are, respectively, 26 clock cycles, 4.2 ps, and 5.58 mW at 1.75 GHz.

Patent
03 Feb 2015
TL;DR: In this article, an integrated clock gater (ICG) circuit having clocked complementary voltage switched logic (CICG), which delivers high performance while maintaining low power consumption characteristics, is presented.
Abstract: Inventive aspects include an integrated clock gater (ICG) circuit having clocked complementary voltage switched logic (CICG) that delivers high performance while maintaining low power consumption characteristics. The CICG circuit provides a small enable setup time and a small clock-to-enabled-clock delay. A significant reduction in clock power consumption is achieved in both enabled and disabled modes, but particularly in the disabled mode. Complementary latches work in tandem to latch different voltage levels at different nodes depending on the voltage level of the received clock signal and whether or not an enable signal is asserted. An inverter takes the voltage level from one of the nodes, inverts it, and outputs a gated clock signal. The gated clock signal may be active or quiescent depending on the various voltage levels. Time is “borrowed” from an evaluation window and added to a setup time to provide greater tolerances for receiving the enable signal.
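For readers unfamiliar with clock gaters, the following is a behavioral sketch of a conventional latch-based integrated clock gater. It is not the CICG circuit claimed in this patent. It assumes the textbook structure in which a latch that is transparent while the clock is low captures the enable, and the gated clock is the AND of the clock and the latched enable. The function name `gated_clock` is hypothetical.

```python
def gated_clock(clk, enable):
    """Model a conventional latch-plus-AND clock gater, one sample per step.

    clk and enable are equal-length sequences of 0/1 samples.
    The latch is transparent only while clk is 0, so changes on enable
    during the high phase cannot glitch or truncate the output pulse.
    """
    latched_enable = 0
    out = []
    for c, e in zip(clk, enable):
        if c == 0:
            latched_enable = e  # latch follows enable during the low phase
        out.append(c & latched_enable)
    return out


if __name__ == "__main__":
    clk = [0, 1, 0, 1, 0, 1]
    enable = [1, 1, 0, 1, 0, 0]
    # The enable pulse that arrives while clk is high (4th sample) is ignored.
    print(gated_clock(clk, enable))
```

In this model the enable must settle before the rising clock edge, which is the setup constraint that the abstract says the CICG circuit relaxes by borrowing time from the evaluation window.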

Patent
07 May 2015
TL;DR: A wide bandwidth resonant clock distribution comprises a clock grid configured to distribute a clock signal to a plurality of components of an integrated circuit and a tunable sector buffer configured to receive the clock signal and provide an output to the clock grid.
Abstract: A wide bandwidth resonant clock distribution comprises a clock grid configured to distribute a clock signal to a plurality of components of an integrated circuit and a tunable sector buffer configured to receive the clock signal and provide an output to the clock grid. The tunable sector buffer is configured to set latency and slew rate of the clock signal based on an identified resonant or non-resonant mode.

Proceedings ArticleDOI
02 Nov 2015
TL;DR: This work presents a methodology using both clock gating and power gating to save power of an inverse discrete cosine transform (IDCT) design when the register transfer level (RTL) is generated automatically by high-level synthesis (HLS).
Abstract: Power management in system-on-chip (SoC) design has become very important in modern nanometric technologies. It is desirable to consider power optimization at the system-level for maximum power savings due to its higher level of abstraction. Clock gating and power gating are two well-known techniques for dynamic and leakage power reduction respectively. They can even be integrated to get maximum power reduction by using the same signal to control both. This work presents a methodology using both these techniques to save power of an inverse discrete cosine transform (IDCT) design when the register transfer level (RTL) is generated automatically by high-level synthesis (HLS). Power gating is implemented by capturing the power intent using common power format (CPF). This work mainly highlights the prospects of integrating CPF with automatically generated RTL using HLS flow. Saving in dynamic power by a factor of around 10× is obtained through clock gating while more than 50% saving in static power is obtained through power gating. Power gating also results in some area overhead.
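As a rough illustration of why clock gating reduces dynamic power, the sketch below uses the standard first-order model $P_{dyn} = \alpha C V^2 f$ and scales it by the fraction of cycles in which the clock is enabled. All numeric values are hypothetical placeholders chosen for illustration and are not taken from the paper. The function names and the `enabled_fraction` parameter are likewise assumptions.

```python
def dynamic_power(alpha, cap_farads, vdd_volts, freq_hz):
    """First-order switching power: activity * capacitance * Vdd^2 * frequency."""
    return alpha * cap_farads * vdd_volts ** 2 * freq_hz


def gated_dynamic_power(alpha, cap_farads, vdd_volts, freq_hz, enabled_fraction):
    """Scale switching power by the fraction of cycles the clock is enabled.

    Assumption: the gated logic does not switch at all while its clock is off,
    and the overhead of the gating cells themselves is ignored.
    """
    if not 0.0 <= enabled_fraction <= 1.0:
        raise ValueError("enabled_fraction must be between 0 and 1")
    return enabled_fraction * dynamic_power(alpha, cap_farads, vdd_volts, freq_hz)


if __name__ == "__main__":
    # Hypothetical block: 1 nF switched capacitance, 1.0 V supply, 100 MHz clock.
    ungated = dynamic_power(1.0, 1e-9, 1.0, 100e6)
    gated = gated_dynamic_power(1.0, 1e-9, 1.0, 100e6, enabled_fraction=0.1)
    print(ungated, gated, ungated / gated)
```

Under this model, a block whose clock is enabled in one cycle out of ten dissipates one tenth of the ungated switching power. Leakage is unaffected, which is why the paper pairs clock gating with power gating.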